Bug #1427
xml_index constrains depth of paths that can be inserted
Status: closed
Description
When an XML document contains a deeply nested structure, Metacat accepts the
document for storage in xml_nodes. During the subsequent indexing phase, however,
it throws an exception because the composite paths to the deep nodes are too long
to fit in the path column of the xml_index table. This column was limited to a
few hundred characters so that it could be indexed (Oracle had a limit on the
total width of indexable columns).
These problems were discovered and reported by Wade Sheldon (GCE LTER) when he
submitted EML documents with fully populated taxonomic coverage entries. We
definitely need to support realistically populated EML documents.
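To see how quickly a taxonomic coverage entry can outgrow a fixed-width path column, here is a minimal sketch. In EML, taxonomicClassification elements nest inside one another, one level per rank, so each rank appends another "/taxonomicClassification" to the composite path. The 255-character limit and the "/" separator are assumptions for illustration only; the real column was only described as "a few hundred characters".

```python
# Hypothetical width of the xml_index path column (assumption; the real
# Metacat limit was only described as "a few hundred characters").
PATH_COLUMN_WIDTH = 255


def composite_paths(root_path, child_name, depth):
    """Yield the composite path for each level of a recursively nested element."""
    path = root_path
    for _ in range(depth):
        path = path + "/" + child_name
        yield path


def first_overflow(paths, width=PATH_COLUMN_WIDTH):
    """Return (level, path) for the first path longer than the column, or None."""
    for level, path in enumerate(paths, start=1):
        if len(path) > width:
            return level, path
    return None


# Twelve nested ranks under an EML taxonomic coverage entry.
base = "eml/dataset/coverage/taxonomicCoverage"
result = first_overflow(composite_paths(base, "taxonomicClassification", 12))
# -> level 10 overflows (278 characters) under the assumed 255-character limit
```

Each level adds 24 characters here, so widening the column (option 1 below) only moves the failure a few ranks deeper; it does not remove it.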
There are two possible solutions:
1) make the column much wider
-- this is only a partial solution, because the column still might not be big
enough for very deep docs or docs with long element names
-- if it's wider, it may no longer be indexable, which defeats the purpose of
the xml_index table
2) eliminate the dependency on the xml_index table altogether
-- the recursive search needed isn't that much slower, and may not be
slower at all as we tune the database
-- insert/update/delete should be MUCH faster
-- simpler database structure
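Option (2) replaces the stored path lookup with a search over xml_nodes itself. The sketch below is a minimal illustration of that idea, assuming a simplified xml_nodes schema with only nodeid, parentnodeid and nodename columns (the real table has more columns and holds many documents). It descends recursively from the root, following only children whose names match the next path step, so no full path string is ever stored and no column width limits the depth.

```python
import sqlite3


def _children(conn, parent_id, name):
    """Return ids of nodes named `name` whose parent is `parent_id` (None = root)."""
    if parent_id is None:
        sql = "SELECT nodeid FROM xml_nodes WHERE parentnodeid IS NULL AND nodename = ?"
        args = (name,)
    else:
        sql = "SELECT nodeid FROM xml_nodes WHERE parentnodeid = ? AND nodename = ?"
        args = (parent_id, name)
    return [row[0] for row in conn.execute(sql, args)]


def find_by_path(conn, steps, parent_id=None):
    """Recursively descend xml_nodes, matching one path step per level."""
    if not steps:
        return [parent_id]
    matches = []
    for child in _children(conn, parent_id, steps[0]):
        matches.extend(find_by_path(conn, steps[1:], child))
    return matches


# Hypothetical, simplified xml_nodes table for one small document.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xml_nodes (nodeid INTEGER PRIMARY KEY, parentnodeid INTEGER, nodename TEXT)"
)
conn.executemany("INSERT INTO xml_nodes VALUES (?, ?, ?)", [
    (1, None, "eml"),
    (2, 1, "dataset"),
    (3, 2, "title"),
    (4, 2, "coverage"),
    (5, 4, "taxonomicCoverage"),
])
hits = find_by_path(conn, "eml/dataset/coverage/taxonomicCoverage".split("/"))
```

Each level costs one indexed lookup on (parentnodeid, nodename), which is why the search need not be much slower than a precomputed path table.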
We have decided to pursue (2) because of the advantages listed. Rather than
removing the xml_index code completely, we are going to make its use
configurable, and ship with it turned off by default.
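One way to keep the xml_index code but ship it disabled is to guard path generation behind a configuration switch. This is a minimal sketch only: the IndexConfig class and its use_xml_index flag are hypothetical names, not Metacat's actual configuration properties. It also simplifies indexing to one full root path per node, with nodes given as (nodeid, parentnodeid, nodename) tuples in document order so that parents appear before their children.

```python
from dataclasses import dataclass


@dataclass
class IndexConfig:
    # Hypothetical switch; defaults to False, mirroring the decision to
    # ship with xml_index turned off.
    use_xml_index: bool = False


def paths_to_index(nodes, config):
    """Return (nodeid, path) rows for xml_index, or [] when indexing is disabled."""
    if not config.use_xml_index:
        return []
    paths = {}
    rows = []
    for nodeid, parent_id, name in nodes:
        # Parents come first, so the parent's path is already known.
        path = name if parent_id is None else paths[parent_id] + "/" + name
        paths[nodeid] = path
        rows.append((nodeid, path))
    return rows


nodes = [(1, None, "eml"), (2, 1, "dataset"), (3, 2, "title")]
default_rows = paths_to_index(nodes, IndexConfig())
enabled_rows = paths_to_index(nodes, IndexConfig(use_xml_index=True))
```

With the default configuration no rows are produced, so inserts, updates and deletes skip the xml_index table entirely.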