Ecoinformatics Redmine (https://projects.ecoinformatics.org/ecoinfo/)
Kepler - Bug #4270: RExpression exports special characters in strings as the 2 character escaped sequence
https://projects.ecoinformatics.org/ecoinfo/issues/4270?journal_id=14496 | 2009-07-28T20:22:25Z | ben leinfelder (leinfelder@nceas.ucsb.edu)
<p>It's easy enough to strip out the backslashes, which makes sense in the case of \" where we are just escaping the " <br />But for \n and \t I don't think we ever get an actual newline or tab character, so we'd end up with a random-looking n or t in the string if we just removed the backslash.<br />Thoughts?</p>
https://projects.ecoinformatics.org/ecoinfo/issues/4270?journal_id=14497 | 2009-07-28T23:25:12Z | Oliver Soong (soong@nceas.ucsb.edu)
<p>I think it'd be better to handle this more rigorously by searching specifically for the special escape sequences (\", \\, \n, and \t at least) and converting them. The java.lang.String.replaceAll function would probably work. Even better would be to figure out exactly which sequences R's dput produces.</p>
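<p>As a rough illustration of the conversion described above, here is a minimal Java sketch. It is not taken from RExpression.java; the class and method names (REscapeSketch, unescape) are hypothetical, and it only covers the four sequences listed above. It scans the string once instead of chaining replaceAll calls, because chained replacements would misread an escaped backslash followed by n (\\n) as containing a newline, whichever order they run in.</p>

```java
// Hypothetical helper for illustration only; not part of Kepler or RExpression.java.
final class REscapeSketch {

    /** Converts \", \\, \n and \t into the characters they stand for, in one left-to-right pass. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char next = s.charAt(++i); // consume the character after the backslash
            switch (next) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                default:   out.append('\\').append(next); // unknown sequence: leave untouched
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Input is the four characters e, backslash, n, f; the result has three characters.
        System.out.println(unescape("e\\nf").length()); // prints 3
    }
}
```

<p>If replaceAll is used instead, keep in mind that it takes a regular expression, so a literal backslash has to be escaped twice in the pattern; String.replace does plain literal matching.</p>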
<p>I haven't studied the RExpression.java code about this, but I think Kepler's catching the R stdout stream and parsing it. In that case, I think another alternative might be to replace the dput with a cat. I believe it would replace the escape sequences with the appropriate characters in stdout. Consider these R commands:</p>
<p>a <- c("a\\b", "c\"d", "e\nf", "g\th")<br />dput(a)<br />cat("c(\"", paste(a, collapse = "\", \""), "\")\n", sep = "")<br />cat("{\"", paste(a, collapse = "\", \""), "\"}\n", sep = "")</p>
<p>The only problem is what exactly happens on the RExpression side when it encounters one of these special characters. I think it might be easier to just continue with the dput and careful use of replaceAll.</p>
https://projects.ecoinformatics.org/ecoinfo/issues/4270?journal_id=14498 | 2009-07-29T13:34:24Z | ben leinfelder (leinfelder@nceas.ucsb.edu)
<p>Thanks to Apache Commons, I think this is the answer:<br /><a class="external" href="http://commons.apache.org/lang/api-2.4/org/apache/commons/lang/StringEscapeUtils.html#unescapeJava(java.lang.String)">http://commons.apache.org/lang/api-2.4/org/apache/commons/lang/StringEscapeUtils.html#unescapeJava(java.lang.String)</a></p>
<p>It will convert the "special" sequences into the actual characters we want: \t will be a tab, \n a newline, \" a quote, etc.</p>
https://projects.ecoinformatics.org/ecoinfo/issues/4270?journal_id=14499 | 2009-07-29T13:36:51Z | ben leinfelder (leinfelder@nceas.ucsb.edu)
<p>This change is committed to trunk.</p>
https://projects.ecoinformatics.org/ecoinfo/issues/4270?journal_id=14500 | 2009-07-30T16:37:20Z | Oliver Soong (soong@nceas.ucsb.edu)
<p>I did a little digging, and the escape sequences R uses in deparsing (the core of dput) seem to be handled in the EncodeString function in printutils.c. There's a very minor discrepancy with the escape sequences used by Java (see <a class="external" href="http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#101089">http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#101089</a>).</p>
<p>The discrepancies seem to be \a, \v, and \0 (audible bell, vertical tab, and null). A fix would be to use replaceAll after the unescapeJava, but it's complicated because these three aren't among Java's named escape sequences, so we would have to look up the appropriate octal or Unicode escape in Java.</p>
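<p>A minimal sketch of how those three sequences might be handled, assuming the code points U+0007 (bell), U+000B (vertical tab), and U+0000 (null). Instead of the replaceAll-after-unescapeJava ordering mentioned above, this variation is a pre-pass that rewrites the R-only sequences into Java Unicode escapes as text, so that a later Java-style unescaping step could pick them up. The class and method names are hypothetical, and none of this comes from the committed change.</p>

```java
// Hypothetical pre-pass for illustration only; not part of Kepler or RExpression.java.
final class RExtraEscapesSketch {

    /**
     * Rewrites the two-character sequences backslash-a, backslash-v and backslash-0
     * into six-character Java Unicode escape text. Every other backslash sequence,
     * including an escaped backslash, is copied through unchanged.
     */
    static String toJavaEscapes(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char next = s.charAt(++i); // consume the character after the backslash
            switch (next) {
                case 'a': out.append("\\u0007"); break; // audible bell, U+0007
                case 'v': out.append("\\u000B"); break; // vertical tab, U+000B
                case '0': out.append("\\u0000"); break; // null, U+0000
                default:  out.append('\\').append(next); // leave for the later unescaping step
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Shows the rewritten text for an input containing backslash-a.
        System.out.println(toJavaEscapes("x\\ay"));
    }
}
```

<p>Consuming the character after each backslash keeps an escaped backslash followed by a literal a, v, or 0 from being rewritten by mistake.</p>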
<p>I doubt anybody will actually use these in practice, so it's unlikely to cause any real problems. I'm deferring judgment and leaving this as closed.</p>
https://projects.ecoinformatics.org/ecoinfo/issues/4270?journal_id=14501 | 2013-03-27T21:26:15Z | Redmine Admin
<p>Original Bugzilla ID was 4270</p>