Bug #5273
closed
docs with inline-data allow invalid xml into metacat
Added by Chad Berkley almost 14 years ago.
Updated about 13 years ago.
Description
If you insert a document with inline-data, the data is stripped out of the document before it is validated. However, when you do a GET on the document, it is read off of the disk with the inline-data intact. So if you insert a doc with inline-data that has invalid characters in it (like unescaped ampersands), Metacat will not recognize that it is invalid, but when you GET the document and try to parse it, you will get a parser error.
We should be validating the document first before stripping inline-data out of it.
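To illustrate the ordering problem, here is a minimal sketch in Python rather than Metacat's own Java code. The sample document, the regex-based `strip_inline` helper, and both `accept_*` functions are hypothetical stand-ins, and a well-formedness check stands in for full validation. It only shows that checking after stripping hides the bad ampersand, while checking first catches it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: the <inline> element holds an unescaped ampersand.
DOC = "<dataset><title>Example</title><inline>salt & pepper</inline></dataset>"

# Crude stand-in for Metacat's inline-data stripping (an assumption, not its real logic).
INLINE_RE = re.compile(r"<inline>.*?</inline>", re.DOTALL)


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on a parser error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def strip_inline(xml_text):
    """Replace each inline element with an empty one."""
    return INLINE_RE.sub("<inline/>", xml_text)


def accept_strip_then_check(xml_text):
    """Order described in this bug: strip first, then check what is left."""
    return is_well_formed(strip_inline(xml_text))


def accept_check_then_strip(xml_text):
    """Proposed order: check the full document before stripping anything."""
    return is_well_formed(xml_text)
```

With this sample, `accept_strip_then_check(DOC)` accepts the document even though the text that would be written to disk does not parse, while `accept_check_then_strip(DOC)` rejects it.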
Files
Is inline data not contained in CDATA? I thought you could put anything in CDATA and have it be ignored by parsers.
People certainly can use CDATA sections within their inline element, in which case escaping would be taken care of. But in this case, the data in the CDR document is not in a CDATA element, has reserved XML characters in it, and Metacat is not properly rejecting it as invalid.
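As a rough illustration of the difference (Python's standard parser, not Metacat's; the three sample strings and the `inline_text` helper are made up for this sketch), a bare ampersand is a parse error, while the same text inside a CDATA section or written as an entity parses cleanly:

```python
import xml.etree.ElementTree as ET

# Hypothetical inline elements carrying the same data three ways.
RAW = "<inline>a & b</inline>"
CDATA = "<inline><![CDATA[a & b]]></inline>"
ESCAPED = "<inline>a &amp; b</inline>"


def inline_text(xml_text):
    """Return the element's text, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None
```

Here `inline_text(RAW)` returns `None`, while the CDATA and escaped forms both yield the original data.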
This file uses unescaped ampersand (&) in the inline data section.
I tried this with the attached inline.xml file that has an invalid unescaped ampersand in the inline section -- Metacat rejected it as invalid.
If there is a specific CDR file that is still causing this issue, let's reopen and figure out how it is slipping by. Otherwise, I believe this is not currently a problem in trunk.
Original Bugzilla ID was 5273