Bug #3174
closed
Metacat performance issue in Sanparks skin
Added by Jing Tao over 16 years ago. Updated over 16 years ago.
0%
Description
Matt and Mike reported that a search in the sanparks skin on the production server takes about 4 or 5 minutes. We should fix this before the 1.8.1 release.
Related issues
Updated by Jing Tao over 16 years ago
Bug 3032 has been marked as a duplicate of this bug.
Updated by ben leinfelder over 16 years ago
duane's comments about the indexPath in LTER pointed me toward the search pathquery in the sanparks/saeon skins.
Perhaps the fix is as simple as adding the additional returnfields that are requested (from FGDC documents).
We had previously only added the FGDC field "placekey" because it was used as part of the queryterm/pathexpr element.
The additional returnfields are:
<returnfield>idinfo/citation/citeinfo/title</returnfield>
<returnfield>idinfo/citation/citeinfo/origin</returnfield>
<returnfield>idinfo/keywords/theme/themekey</returnfield>
Updated by ben leinfelder over 16 years ago
adding those extra returnfields to the index paths did not seem to make a difference with searches on dev.
The sanparks skin ends up generating SQL that uses an "intersect" to handle organization filters for both EML and FGDC document types. I believe this is where the performance hit is being introduced.
It might be nice to include a search option so that you could search for only one document type at a time. If you wanted all types it would just take longer...
*dev.nceas.ucsb.edu is giving about 85 seconds for a 'kruger' search vs. the default skin's 23 seconds for the same search term.
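For illustration, the shape of that query (a term match intersected with an organization filter) can be sketched against a toy SQLite table. This is only a sketch: the three-column `xml_path_index` below is a simplified stand-in, not Metacat's real schema, and the `search` helper and sample rows are hypothetical.

```python
import sqlite3

# Toy stand-in for Metacat's xml_path_index (assumption: the real table has more columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xml_path_index (docid TEXT, path TEXT, nodedata TEXT)")
conn.executemany(
    "INSERT INTO xml_path_index VALUES (?, ?, ?)",
    [
        ("eml.1", "title", "Kruger vegetation survey"),
        ("eml.1", "keyword", "SANParks, South Africa"),
        ("fgdc.2", "idinfo/citation/citeinfo/title", "Kruger rivers"),
        ("fgdc.2", "placekey", "SAEON, South Africa"),
        ("eml.3", "title", "Kruger rainfall"),
        ("eml.3", "keyword", "Some other organization"),
    ],
)


def search(term, organizations):
    """Return docids that match the term AND carry one of the organization tags."""
    org_clause = " OR ".join("UPPER(nodedata) LIKE ?" for _ in organizations)
    sql = f"""
        SELECT DISTINCT docid FROM xml_path_index
        WHERE UPPER(nodedata) LIKE ?
          AND path IN ('title', 'idinfo/citation/citeinfo/title')
        INTERSECT
        SELECT DISTINCT docid FROM xml_path_index
        WHERE ({org_clause}) AND path IN ('placekey', 'keyword')
    """
    params = [f"%{term.upper()}%"] + [f"%{org.upper()}%" for org in organizations]
    return sorted(row[0] for row in conn.execute(sql, params))


organizations = ["SANParks, South Africa", "SAEON, South Africa"]
result = search("kruger", organizations)
print(result)
```

Both halves of the INTERSECT are evaluated in full before the common docids are kept, so the organization filter adds a second pass over the index on every search.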
Updated by ben leinfelder over 16 years ago
oh! one more thing:
looks like sanparks is always searching for the search term in any nodedata (not limited to title, abstract, etc....)
that means it does not use the xml_path_index table to find some of the valid docids....
It is usually an option in the other skins' search interface to specify that searches only look in certain [predefined] fields. Perhaps we should add this to the sanparks and saeon skins
Updated by Jing Tao over 16 years ago
Here is the selection query:
SELECT docid,docname,doctype,date_created, date_updated, rev FROM xml_documents WHERE docid IN (((SELECT DISTINCT docid FROM xml_path_index WHERE (UPPER LIKE '%PLANT%' AND path IN ('abstract/para','surName','givenName','organizationName','title','keyword','para','geographicDescription','literalLayout','@packageId','abstract','idinfo/citation/citeinfo/title','idinfo/citation/citeinfo/origin','idinfo/keywords/theme/themekey')) UNION (SELECT DISTINCT docid FROM xml_nodes WHERE UPPER LIKE '%PLANT%' AND parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE 'idinfo/keywords/theme/placekey') ) ) INTERSECT (SELECT DISTINCT docid FROM xml_path_index WHERE (UPPER LIKE '%SANPARKS, SOUTH AFRICA%' AND path IN ('placekey','keyword')) OR (UPPER LIKE '%SAEON, SOUTH AFRICA%' AND path IN ('placekey','keyword'))))) AND (docid IN (SELECT docid from xml_access WHERE = 'public') AND perm_type = 'allow' AND permission > 3)) AND docid NOT IN (SELECT docid from xml_access WHERE = 'public') AND perm_type = 'deny' AND perm_order ='allowFirst' AND permission > 3) ))
Updated by ben leinfelder over 16 years ago
Added optional "Search All fields" checkbox for SANParks and SAEON skins. Default search (unchecked) searches only a handful of indexed document elements (as specified by the indexPath metacat property).
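A minimal sketch of how such a toggle might choose between the two kinds of query, using toy SQLite tables. The schemas, the `INDEXED_PATHS` tuple, and the `search` helper are assumptions made for illustration; they are not the skin's actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of the two tables.
conn.execute("CREATE TABLE xml_nodes (docid TEXT, nodedata TEXT)")
conn.execute("CREATE TABLE xml_path_index (docid TEXT, path TEXT, nodedata TEXT)")
conn.executemany(
    "INSERT INTO xml_nodes VALUES (?, ?)",
    [
        ("doc.1", "Plant cover in Kruger"),         # text of a title element
        ("doc.2", "Methods: plant samples dried"),  # text of a non-indexed element
    ],
)
conn.executemany(
    "INSERT INTO xml_path_index VALUES (?, ?, ?)",
    [("doc.1", "title", "Plant cover in Kruger")],
)

# Assumption: stands in for the paths listed in the indexPath property.
INDEXED_PATHS = ("title", "abstract", "keyword")


def search(term, all_fields=False):
    """Search every node when all_fields is set, otherwise only the indexed paths."""
    pattern = f"%{term.upper()}%"
    if all_fields:
        sql = "SELECT DISTINCT docid FROM xml_nodes WHERE UPPER(nodedata) LIKE ?"
        params = [pattern]
    else:
        marks = ", ".join("?" for _ in INDEXED_PATHS)
        sql = (
            "SELECT DISTINCT docid FROM xml_path_index "
            f"WHERE UPPER(nodedata) LIKE ? AND path IN ({marks})"
        )
        params = [pattern, *INDEXED_PATHS]
    return sorted(row[0] for row in conn.execute(sql, params))


default_hits = search("plant")
all_field_hits = search("plant", all_fields=True)
print(default_hits, all_field_hits)
```

The default search returns fewer documents because it only sees indexed elements; that is the trade-off the checkbox exposes to the user.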
Updated by Jing Tao over 16 years ago
Modified the search query: changed the search field from "idinfo/keywords/theme/placekey", which is not in the path index, to "placekey", which is in the path index. The search now takes only about 10 seconds the first time it is run.
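The difference between the two query shapes can be sketched on toy SQLite tables. All three schemas and the sample rows below are simplified assumptions, not Metacat's real tables; the point is only that the long path goes through xml_nodes and xml_index, while the short path is answered from xml_path_index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xml_nodes (nodeid INTEGER, parentnodeid INTEGER, docid TEXT, nodedata TEXT);
    CREATE TABLE xml_index (nodeid INTEGER, path TEXT);
    CREATE TABLE xml_path_index (docid TEXT, path TEXT, nodedata TEXT);

    -- Node 10 is a placekey element; node 11 is the text node holding its value.
    INSERT INTO xml_nodes VALUES (10, 1, 'fgdc.2', NULL);
    INSERT INTO xml_nodes VALUES (11, 10, 'fgdc.2', 'Kruger National Park');
    INSERT INTO xml_index VALUES (10, 'idinfo/keywords/theme/placekey');
    INSERT INTO xml_index VALUES (10, 'placekey');
    -- Assumption: only the short path is configured as an index path.
    INSERT INTO xml_path_index VALUES ('fgdc.2', 'placekey', 'Kruger National Park');
""")

# Path not in the path index: match text nodes, then check the parent's path.
NODE_SCAN_SQL = """
    SELECT DISTINCT docid FROM xml_nodes
    WHERE UPPER(nodedata) LIKE ?
      AND parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE ?)
"""

# Path in the path index: a single lookup on xml_path_index.
PATH_INDEX_SQL = """
    SELECT DISTINCT docid FROM xml_path_index
    WHERE UPPER(nodedata) LIKE ? AND path IN (?)
"""

via_nodes = [
    row[0]
    for row in conn.execute(NODE_SCAN_SQL, ("%KRUGER%", "idinfo/keywords/theme/placekey"))
]
via_path_index = [
    row[0] for row in conn.execute(PATH_INDEX_SQL, ("%KRUGER%", "placekey"))
]
print(via_nodes, via_path_index)
```

On this toy data both forms return the same docid; the second avoids the nested subquery over the node table.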
Updated by Jing Tao over 16 years ago
Callie did some testing too, and she is fine with closing the bug.