Bug #3815

Ampersand character not correctly encoded

Added by Shaun Walbridge almost 11 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Category: metacat
Target version:
Start date: 02/09/2009
Due date:
% Done: 0%
Estimated time:
Bugzilla-Id: 3815

Description

Ampersands are encoded as "&amp;" by normalize() in register-dataset.cgi, but uploaded documents contain "%26amp;" instead. "%26" is the URL-encoded form of "&", and 0x0026 is its Unicode code point.

An example document exhibiting the behavior:
http://knb.ecoinformatics.org/knb/metacat?action=read&qformat=nceas&docid=nceas.912.8

The organization is set to "U.S. Fish %26amp; Wildlife Service".

History

#1 Updated by Shaun Walbridge almost 11 years ago

To properly fix this bug, we'll need to add tests for character conversions, to make sure that we aren't introducing regressions with our fixes. Add test documents which break the system, and then afterward add the necessary fixes.

register-dataset.cgi has no testing, and may also be the source of this bug.
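
A character-conversion test of the kind proposed here could look roughly like the sketch below. It is written in Python purely for illustration and is not part of Metacat.pm or register-dataset.cgi; `build_document` is a hypothetical stand-in for whatever step serializes form input into XML, and the sample strings are assumptions apart from the organization name reported in this bug.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Sample inputs that tend to break hand-rolled escaping (illustrative only).
SAMPLES = [
    "U.S. Fish & Wildlife Service",
    "Testing & Stuff",
    "a < b > c",
    "already &amp; escaped",
]


def build_document(value: str) -> str:
    # Hypothetical stand-in for the document-building step under test.
    return "<organizationName>" + escape(value) + "</organizationName>"


def round_trips(value: str) -> bool:
    # Parse the serialized XML and check that the original text comes back.
    parsed = ET.fromstring(build_document(value))
    return parsed.text == value


results = {sample: round_trips(sample) for sample in SAMPLES}
for sample, ok in results.items():
    print("ok  " if ok else "FAIL", repr(sample))
```

A round-trip check like this fails as soon as a value is encoded twice or URL-encoded by mistake, which is the regression this bug describes.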

#2 Updated by Matt Jones almost 11 years ago

Register-dataset.cgi may not have any testing itself, but it is based on Metacat.pm which does have a test suite in the module definition library. So, I think that extending the tests in Metacat.pm should help in covering multiple scripts like register-dataset.cgi that might make use of it for inserting and updating data to metacat.

#3 Updated by ben leinfelder about 8 years ago

Using the dev skin to load an XML document with an ampersand encoded as &amp; kept the character intact. To me, this indicates that Metacat's servlet API is handling the character correctly. If register-dataset.cgi or Metacat.pm encodes it a second time, that might account for the double encoding.

#4 Updated by ben leinfelder about 8 years ago

Using the Java MetacatClient API to insert a document with &amp; also worked fine (no additional encoding of the & symbol).

#5 Updated by Shaun Walbridge about 8 years ago

The specific example looks to have its origins in delNormalize (sic) within register-dataset.cgi -- the first regex operator replaces '&' with '&amp;', but the last regex operator then replaces '&' with '%26', which also rewrites the '&' at the start of the '&amp;' just produced and yields '%26amp;'. I'd recommend removing these functions and using existing modules for encoding/decoding of the XML. There are a couple of options [1], though perhaps just fixing the symptom is good enough for the time being.

1. http://stackoverflow.com/questions/1137790/how-can-i-escape-text-for-an-xml-document-in-perl
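
The ordering problem described in this comment can be reproduced with a short sketch. It is written in Python for illustration only and is not the Perl source of delNormalize; `buggy_normalize` is a hypothetical reconstruction based solely on the two substitutions mentioned above, and `xml_escape` shows the "use an existing module" alternative with Python's standard library as an assumed analogue.

```python
import re
from xml.sax.saxutils import escape


def buggy_normalize(text: str) -> str:
    # Hypothetical reconstruction, not the actual register-dataset.cgi code.
    # First substitution: XML-escape the ampersand.
    text = re.sub(r"&", "&amp;", text)
    # Last substitution: URL-encode the ampersand. This also matches the
    # '&' that begins the '&amp;' entity produced by the first step.
    text = re.sub(r"&", "%26", text)
    return text


def xml_escape(text: str) -> str:
    # Library routine: escapes '&', '<' and '>' exactly once.
    return escape(text)


broken = buggy_normalize("U.S. Fish & Wildlife Service")
fixed = xml_escape("U.S. Fish & Wildlife Service")
print(broken)  # U.S. Fish %26amp; Wildlife Service
print(fixed)   # U.S. Fish &amp; Wildlife Service
```

The broken output matches the organization name seen in nceas.912.8, which is consistent with this diagnosis but does not prove that delNormalize is the only code path involved.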

#6 Updated by ben leinfelder about 8 years ago

I only see delNormalize() being called from the deleteData() function in register-dataset.cgi -- but perhaps similar code is lurking somewhere else?
Is this something you (Shaun) can look into?

#7 Updated by ben leinfelder about 8 years ago

I just tried the registry on a test machine running most recent Metacat trunk and the ampersand was correctly encoded for XML (and only once):

<title>Testing &amp; Stuff</title>

#8 Updated by Redmine Admin over 6 years ago

Original Bugzilla ID was 3815
