Bug #2155: Metacat Performace: Rewrite the xml_nodes queries - Metacat - Ecoinformatics Redmine

Actions

Copy link

Bug #2155

open

Metacat Performace: Rewrite the xml_nodes queries

Added by Saurabh Garg over 19 years ago. Updated over 15 years ago.

Status:

In Progress

Priority:

Immediate

Assignee:

Jing Tao

Category:

metacat

Target version:

Unspecified

Start date:

07/13/2005

Due date:

% Done:

Estimated time:

Bugzilla-Id:

2155

Description

From Matt's email...

Rewrite the xml_nodes queries. In general we use the IN clause a lot
which is less than efficient. We need to evaluate how our current
queries are working and rewrite them. With some systematic work we can
probably come up with some similar ideas for improvements

Files

Download all files

soil-query1.sql (3.93 KB) soil-query1.sql		Matt Jones, 09/28/2006 08:46 AM
soil-query3.sql (2.98 KB) soil-query3.sql		Matt Jones, 09/28/2006 08:49 AM
soil-query3a.sql (2.98 KB) soil-query3a.sql		Matt Jones, 09/28/2006 08:51 AM
MetacatOriginalQueryWithIntersectFrom1.7+ (3.75 KB) MetacatOriginalQueryWithIntersectFrom1.7+		Jing Tao, 04/12/2007 06:01 PM
metacat-intersect-new.txt (2.62 KB) metacat-intersect-new.txt		Matt Jones, 04/13/2007 07:35 AM
extendedQuery (833 Bytes) extendedQuery		Jing Tao, 05/09/2007 02:41 PM
extendedQuery (839 Bytes) extendedQuery		Jing Tao, 05/09/2007 02:47 PM
AttributeQuery (573 Bytes) AttributeQuery		Jing Tao, 05/09/2007 02:48 PM
XML_QueryResultQuery (448 Bytes) XML_QueryResultQuery		Jing Tao, 05/09/2007 02:50 PM
timing_result (6.79 KB) timing_result		Jing Tao, 05/09/2007 10:18 PM
land_delimitered.ods (22.9 KB) land_delimitered.ods		Jing Tao, 05/14/2007 02:35 PM
land_with_attribute_delimitered.ods (20.3 KB) land_with_attribute_delimitered.ods		Jing Tao, 05/14/2007 02:38 PM
metacat.debug.ods (22.8 KB) metacat.debug.ods		Jing Tao, 06/06/2007 11:09 AM
metacat.debug2.ods (22.4 KB) metacat.debug2.ods		Jing Tao, 06/08/2007 10:39 AM
metacat.debug.ods (16 KB) metacat.debug.ods		Jing Tao, 06/08/2007 03:47 PM
metacat.debug.ods (19.5 KB) metacat.debug.ods		Jing Tao, 06/11/2007 10:56 AM
metacat.debug.ods (19.1 KB) metacat.debug.ods		Jing Tao, 06/12/2007 09:30 AM
metacat.debug.ods (11 KB) metacat.debug.ods		Jing Tao, 06/19/2007 11:55 AM
metacat.debug.ods (11.3 KB) metacat.debug.ods		Jing Tao, 06/21/2007 03:07 PM
knb-default-selection-query (1.06 KB) knb-default-selection-query		Jing Tao, 06/21/2007 03:54 PM
oringalIntersectQuery (1.55 KB) oringalIntersectQuery		Jing Tao, 07/13/2007 04:40 PM
newIntersectQuery (1.46 KB) newIntersectQuery		Jing Tao, 07/13/2007 05:14 PM
metacat.debug.ods (11.3 KB) metacat.debug.ods		Jing Tao, 07/27/2007 02:39 PM
metacat.debug.ods (10.4 KB) metacat.debug.ods		Jing Tao, 07/27/2007 04:28 PM
metacat.debug.ods (9.92 KB) metacat.debug.ods		Jing Tao, 07/27/2007 05:19 PM
metacat.debug.ods (10 KB) metacat.debug.ods		Jing Tao, 07/27/2007 06:24 PM
graph.ods (9.37 KB) graph.ods		Jing Tao, 07/27/2007 06:43 PM
orgin (977 Bytes) orgin		Jing Tao, 10/31/2007 11:39 AM
new (994 Bytes) new		Jing Tao, 10/31/2007 11:41 AM
Query1 (157 Bytes) Query1		Jing Tao, 11/01/2007 07:13 PM
Query1 (303 Bytes) Query1		Jing Tao, 11/01/2007 09:38 PM
Query2 (1001 Bytes) Query2		Jing Tao, 11/01/2007 09:40 PM
Query3 (1.22 KB) Query3		Jing Tao, 11/01/2007 09:44 PM
Query4 (288 Bytes) Query4		Jing Tao, 11/01/2007 09:46 PM

Related issues

Actions

Copy link

Updated by Saurabh Garg over 18 years ago

IN clause is not so efficient. I was able to improve performance by using outer joins instead of IN to improve performance in some of the other queries and those might be worth checking out. Also I think IN clause performance is much improved in Postgres 8 as opposed to Postgres 7

Actions

Copy link

Updated by Matt Jones over 18 years ago

Just changing the 'LIKE' operator to the '=' operator allows the optimizer to use indices if they exist. Unfortunately, this has the negative side effect of not working for wildcard substring queries, thereby significantly reducing functionality.

Actions

Copy link

Updated by Saurabh Garg over 18 years ago

Following query can be used to replace the first IN clause

select distinct a.docid, a.docname, a.doctype, a.date_created, a.date_updated, a.rev from xml_documents a LEFT JOIN xml_path_index b ON a.docid=b.docid where UPPER LIKE '%WATER%' AND path IN ('abstract/para','givenName','keyword','organizationName','title','surName', 'para','geographicDescription','literalLayout') and b.docid is NOT NULL

Actions

Copy link

Updated by Jing Tao almost 18 years ago

Now I found a problem in query modification.
If path query looks like:

<pathquery version="1.0">
<meta_file_id>test</meta_file_id>
<returndoctype>eml://ecoinformatics.org//eml-2.0.0</returndoctype>
<returnfield>dataset/title</returnfield>
<querygroup operator="UNION">
<queryterm casesensitive="false" searchmode="contains">
<value>soil</value>
<pathexpr>dataset/title</pathexpr>
</queryterm>
<queryterm casesensitive="false" searchmode="contains">
<value>soil</value>
<pathexpr>keyword</pathexpr>
</queryterm>
<queryterm casesensitive="false" searchmode="contains">
<value>soil</value>
<pathexpr>organizationName</pathexpr>
</queryterm>
</querygroup>
</pathquery>

Then the query:
SELECT docid,docname,doctype,date_created, date_updated, rev
FROM xml_documents
WHERE docid IN (

SELECT DISTINCT docid FROM xml_path_index WHERE UPPER(nodedata) LIKE '%SOIL%' AND path LIKE 'dataset/title'  
                   UNION 
                  SELECT DISTINCT docid FROM xml_path_index WHERE UPPER(nodedata) LIKE '%SOIL%' AND path LIKE 'keyword'
                   UNION 
                  SELECT DISTINCT docid FROM xml_path_index WHERE UPPER(nodedata) LIKE '%SOIL%' AND path LIKE 'organizationName' 
                 )
 )

can be modified to
SELECT docid,docname,doctype,date_created, date_updated, rev
FROM xml_documents
WHERE docid IN (
(
SELECT DISTINCT docid FROM xml_path_index
WHERE UPPER LIKE '%SOIL%'
AND path IN ("dataset/title','keyword','organizationName'
)
)
)

However, from the path query definition, it is not necessary that every path will hold the same value, e.g. dataset/title can be soil, keyword can be grass land and organization can be NCEAS. In this case the above modification doesn't work. So our modification now only fits a special case instead of a general case.

Actions

Copy link

Updated by Matt Jones almost 18 years ago

Jing,

We were aware of this before. But still, the special case where the same query term is being searched is the most common query case (most of our clients have this mode of searching multiploe fields for the same query term). So, I think that modifying QuerySpecification to produce the simpler, non-unioned query when the query term is the same for several fields will result in noticeable performance improvements.

In the more general case when the query terms do not match, you could still eliminate the UNION with a query like this:

SELECT docid,docname,doctype,date_created, date_updated, rev
FROM xml_documents
WHERE docid IN (
(
SELECT DISTINCT docid FROM xml_path_index
WHERE
(UPPER LIKE '%SOIL%'
AND path ="dataset/title')
OR
(UPPER LIKE '%NITROGEN%'
AND path = 'keyword')
)
)

I haven't tested this for performace improvements, but I suspect it is faster than the UNION.

So QuerySpecification would have to detect when clusters of paths are being searched for the same term and use whichever form is appropriate. You can also mix the two approaches (ie, search a bunch of fields for one term, and OR that with a search of one or more fields for different terms.

Actions

Copy link

Updated by Jing Tao almost 18 years ago

It took about 39 seconds to run above query in knb database and got 982 rows.
If run the union query:
SELECT docid,docname,doctype,date_created, date_updated, rev
FROM xml_documents
WHERE docid IN (
(

SELECT DISTINCT docid FROM xml_path_index WHERE         UPPER(nodedata) LIKE '%SOIL%' AND path LIKE 'dataset/title'
                   UNION 
                  SELECT DISTINCT docid FROM xml_path_index WHERE UPPER(nodedata) LIKE '%NITROGEN%' AND path LIKE 'keyword' 
                 )
                );

It took 29 seconds and got 982 rows too. So your guess is right. So QuerySpecification would have to detect when clusters of paths are being
searched for the same term and use whichever form is appropriate.

Actions

Copy link

Updated by Jing Tao almost 18 years ago

Hi, matt:

The query modification looks good for "UNION" operator in Query Group. Do you have any idea if the operator is "INTERCECT"?

Actions

Copy link

#10

Updated by Jing Tao almost 18 years ago

A readable format of metacat original query with INTERSECT operator.

Actions

Copy link

#12

Updated by Jing Tao almost 18 years ago

Here is the timing for query:
1. In default search page, search "datos" and got 8 hits.

total search time 33 seconds:
a. Selection docid total took 32 seconds. 32 of 32 seconds was used to run selection query (note this query only select xml_nodes table).
b. Preparing return fields took 1.0 second. (1.0 was taken to run the query of selection of xml_queryresult table).
c.Transform from xml to html took 0 second

2. Ran query:

<pathquery version="1.0">
<meta_file_id>test</meta_file_id>
<returndoctype>eml://ecoinformatics.org//eml-2.0.0</returndoctype>
<returnfield>dataset/title</returnfield>
<querygroup operator="UNION">
<queryterm casesensitive="false" searchmode="contains">
<value>datos</value>
<pathexpr>dataset/title</pathexpr>
</queryterm>
<queryterm casesensitive="false" searchmode="contains">
<value>datos</value>
<pathexpr>keyword</pathexpr>
</queryterm>
<queryterm casesensitive="false" searchmode="contains">
<value>datos</value>
<pathexpr>originator/individualName/surName</pathexpr>
</queryterm>
</querygroup>
</pathquery>

It took 21 seconds and got 0 hits.

a. Selection docid total took 17 seconds. 17 of 17 seconds was used to run selection query.
b. Preparing return fields took 0.0 second.
c.Transform from xml to html took 4 second

Actions

Copy link

#14

Updated by Jing Tao almost 18 years ago

select docid, path, nodedata, parentnodeid from xml_path_index where
(path like 'originator/individualName/surName' or
path like 'originator/individualName/givenName' or
path like 'creator/individualName/surName' or
path like 'creator/individualName/givenName' or
path like 'originator/organizationName' or
path like 'creator/organizationName' or
path like 'dataset/title' or

path like 'keyword' )

AND docid in
('ABS.4','access.10','access.7','access.8','access.9',
'ADCP_template.50','adler.3','ALASDatabaseAccess.3','ALEXXX_015ADCP015R00_19990817.50',
'ALEXXX_015ADCP015R00_19990906.50','ALEXXX_015ADCP015R00_19991108.50',
'ALEXXX_015ADCP015R00_20000106.50','ALEXXX_015ADCP015R00_20000302.50',
'ALEXXX_015ADCP015R00_20000523.50','ALEXXX_015ADCP015R00_20000815.50')

Actions

Copy link

#18

Updated by Jing Tao almost 18 years ago

<pathquery version="1.2">
<returndoctype>-//ecoinformatics.org//eml-dataset-2.0.0beta6//EN</returndoctype>
<returndoctype>-//ecoinformatics.org//eml-dataset-2.0.0beta4//EN</returndoctype>
<returndoctype>eml://ecoinformatics.org/eml-2.0.0</returndoctype>
<returndoctype>eml://ecoinformatics.org/eml-2.0.1</returndoctype>
<returndoctype>-//NCEAS//eml-dataset-2.0//EN</returndoctype>
<returndoctype>-//NCEAS//resource//EN</returndoctype>
<returnfield>originator/individualName/surName</returnfield>
<returnfield>originator/individualName/givenName</returnfield>
<returnfield>originator/organizationName</returnfield>
<returnfield>creator/individualName/surName</returnfield>
<returnfield>creator/organizationName</returnfield>
<returnfield>dataset/title</returnfield>
<returnfield>keyword</returnfield>
<querygroup operator="UNION">
<queryterm casesensitive="false" searchmode="contains">
<value>datos</value>
</queryterm>
</querygroup>
</pathquery>

Value Hits Total(seconds) Selection[selection query] GetReturnFields[xml_queryresult, extend(extend query, attribute query), storeBackXml_queryresult] xml-html
% 14751 448 401² 24 [13, 8 (?, 0), 0] 23
land 6253 169 38³² 8 [4, 3 (?, 0), 0] 123
soil 1612 42 33³² 3 [2, 0 (0, 0), 0] 6
datos 7 32 32³¹ 0 [0, 0 (0, 0), 0] 0

<pathquery version="1.2">
<querytitle>Web-Search</querytitle>
<returndoctype>eml://ecoinformatics.org/eml-2.0.1</returndoctype>
<returndoctype>eml://ecoinformatics.org/eml-2.0.0</returndoctype>
<returndoctype>-//ecoinformatics.org//eml-dataset-2.0.0beta6//EN</returndoctype>
<returndoctype>-//ecoinformatics.org//eml-dataset-2.0.0beta4//EN</returndoctype>
<returndoctype>-//NCEAS//resource//EN</returndoctype><returndoctype>-//NCEAS//eml-dataset//EN</returndoctype>
<returnfield>originator/individualName/surName</returnfield>
<returnfield>originator/individualName/givenName</returnfield>
<returnfield>creator/individualName/surName</returnfield>
<returnfield>creator/individualName/givenName</returnfield>
<returnfield>originator/organizationName</returnfield>
<returnfield>creator/organizationName</returnfield>
<returnfield>dataset/title</returnfield>
<returnfield>keyword</returnfield>
<returnfield>@packageId</returnfield>
<querygroup operator="UNION">
<queryterm searchmode="contains" casesensitive="false">
<value>datos</value>
<pathexpr>surName</pathexpr>
</queryterm>
<queryterm searchmode="contains" casesensitive="false">
<value>datos</value>
<pathexpr>givenName</pathexpr>
</queryterm>
<queryterm searchmode="contains" casesensitive="false">
<value>datos</value>
<pathexpr>keyword</pathexpr>
</queryterm>
<queryterm searchmode="contains" casesensitive="false">
<value>datos</value>
<pathexpr>para</pathexpr></queryterm>
<queryterm searchmode="contains" casesensitive="false">
<value>datos</value>
<pathexpr>geographicDescription</pathexpr>
</queryterm>
<queryterm searchmode="contains" casesensitive="false">
<value>datos</value>
<pathexpr>literalLayout</pathexpr>
</queryterm>
<queryterm searchmode="contains" casesensitive="false">
<value>datos</value>
<pathexpr>title</pathexpr>
</queryterm>
<queryterm searchmode="contains" casesensitive="false">
<value>datos</value>
<pathexpr>@packageId</pathexpr>
</queryterm>
<queryterm searchmode="contains" casesensitive="false">
<value>datos</value>
<pathexpr>abstract/para</pathexpr>
</queryterm>
</querygroup>
</pathquery>

Value Hits Total(seconds) Selection[selection query] GetReturnFields[xml_queryresult, extend(extend query, attribute query), relationship, storeBackXml_queryresult] xml-html
% ? 10075 94⁷¹ 8581 [1, 8347 (132, 89), ? 63] 1400
land 5944 112 14¹¹ 39 [2, 35 (1, 32), ? 0] 59
soil 1509 50 10¹⁰ 34 [1, 32 (0, 30), ? 0] 5
datos 7 10 10¹⁰ 0 [0, 0 (0, 0), ? 0] 0

Above query after removing a attribute return field
Value Hits Total(seconds) Selection[selection query] GetReturnFields[xml_queryresult, extend(extend query, attribute query), relationship, storeBackXml_queryresult] xml-html
land 5944 141 65(61) 59 [9, 10 (7, 0), 1, 0] 16
soil 1509 104 44(44) 45 [3, 5 (2, 0), 0, 0] 14
datos 7 49 46(46) 2 [0, 0 (0, 0), 0, 0] 1

After removing xml_queryresult and xml_returnfield table
Value Hits Total(seconds) Selection[selection query] GetReturnFields[xml_queryresult, extend(extend query, attribute query), relationship, storeBackXml_queryresult] xml-html
land ? 3113 49(40) 1668 [1, 1530 (24, 0), 51, 63] 1396
soil 1509 471 36(33) 360 [0, 101 (13, 0), 11, 225] 75
datos 7 59 50(49) 7 [0, 0 (0, 0), 0, 5] 2

Actions

Copy link

#19

Updated by Jing Tao almost 18 years ago

The previous comment has the same content as this attachment. However, the comment is not readable. So I added this attachment.

Actions

Copy link

#40

Updated by Jing Tao over 17 years ago

SELECT docid, nodedata, path FROM xml_path_index WHERE path IN ('dataset/title','entityName','individualName/surName','keyword');
SELECT docid, rev, docname, doctype, date_created,date_updated from xml_documents;
INSERT INTO xml_queryresult (returnfield_id, docid, queryresult_string) VALUES (?, ?, ?);

Actions

Copy link

#45

Updated by Jing Tao over 17 years ago

Move to 1.7.1 release

Actions

Copy link

#46

Updated by Jing Tao over 17 years ago

Move to release 1.7.1

Actions

Copy link

#47

Updated by Jing Tao over 17 years ago

Move this bug from 1.7.1 to 1.8. Continue to improve metacat performance.

Actions

Copy link

#48

Updated by Jing Tao about 17 years ago

I played around Postgresql new feature - text search. Here are some results:

Using metacat 1.8.0 search query (base on xml_path_index):
SELECT docid,docname,doctype,date_created, date_updated, rev
FROM xml_documents
WHERE docid IN
(
(
SELECT DISTINCT docid FROM xml_path_index WHERE
( UPPER like '%TEST%'
AND path IN ('abstract/para','givenName','keyword','organizationName','title','surName','para','geographicDescription','literalLayout')
)
)
)
AND (
docid IN (
SELECT docid from xml_access WHERE = 'public') AND perm_type = 'allow' AND permission > 3)
)
AND
docid NOT IN (
SELECT docid from xml_access WHERE = 'public') AND perm_type = 'deny' AND perm_order ='allowFirst' AND permission > 3)
)
)
It took 5.5 second in dev.nceas.

Then I modified the xml_path_index table:

BEGIN;
ALTER TABLE xml_path_index ADD COLUMN nodedatavector TSVECTOR;
UPDATE xml_path_index SET nodedatavector = to_tsvector(nodedata);
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON xml_path_index FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(nodedatavector, 'pg_catalog.english', nodedata);
CREATE INDEX xml_path_index_vector ON xml_path_index USING gin(nodedatavector);
COMMIT;

Then using new modified query
SELECT docid,docname,doctype,date_created, date_updated, rev
FROM xml_documents
WHERE docid IN
(
(
SELECT DISTINCT docid FROM xml_path_index WHERE
( UPPER like '%TEST%'
AND path IN ('abstract/para','givenName','keyword','organizationName','title','surName','para','geographicDescription','literalLayout')
)
)
)
AND (
docid IN (
SELECT docid from xml_access WHERE = 'public') AND perm_type = 'allow' AND permission > 3)
)
AND
docid NOT IN (
SELECT docid from xml_access WHERE = 'public') AND perm_type = 'deny' AND perm_order ='allowFirst' AND permission > 3)
)
)

It took 33 seconds. It slows down.

Actions

Copy link

#49

Updated by Jing Tao about 17 years ago

Here is the 1.8.0 query base on xml_nodes(not specifying the search path):
SELECT docid,docname,doctype,date_created, date_updated, rev FROM xml_documents
WHERE docid IN
(((SELECT DISTINCT docid FROM xml_nodes WHERE UPPER LIKE '%TEST%' ) ))
AND
(docid IN
(SELECT docid from xml_access WHERE = 'public') AND perm_type = 'allow' AND permission > 3))
AND docid NOT IN
(SELECT docid from xml_access WHERE = 'public') AND perm_type = 'deny' AND perm_order ='allowFirst' AND permission > 3))

It took 60 seconds and got 895 hits.

xml_nodes table was modified by
BEGIN;
ALTER TABLE xml_nodes ADD COLUMN nodedatavector TSVECTOR;
UPDATE xml_nodes SET nodedatavector = to_tsvector(nodedata);
CREATE TRIGGER xml_node_tsvectorupdate BEFORE INSERT OR UPDATE ON xml_nodes FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(nodedatavector, 'pg_catalog.english', nodedata);
CREATE INDEX xml_node_vector ON xml_nodes USING gin(nodedatavector);
COMMIT;

Then run the following query:
ELECT docid,docname,doctype,date_created, date_updated, rev FROM xml_documents
WHERE docid IN
(((SELECT DISTINCT docid FROM xml_nodes WHERE nodedatavector @@ to_tsquery('english', 'test') ) ))
AND
(
docid IN (SELECT docid from xml_access WHERE = 'public') AND perm_type = 'allow' AND permission > 3))
AND
docid NOT IN (SELECT docid from xml_access WHERE = 'public') AND perm_type = 'deny' AND perm_order ='allowFirst' AND permission > 3))
)

It took 1.8 seconds and got 736 hits. Oh, much faster!

Actions

Copy link

#50

Updated by Jing Tao almost 17 years ago

Here is the comparison when use search a eml path which is not in xml_path_index table:
SELECT docid,docname,doctype,date_created, date_updated, rev FROM
xml_documents WHERE docid IN
((
SELECT DISTINCT docid FROM xml_nodes
WHERE UPPER LIKE '%VALUE1%'
AND
parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE 'path1')
))
AND (docid IN (SELECT docid from xml_access WHERE = 'public') AND perm_type = 'allow' AND permission > 3)) AND docid NOT IN (SELECT docid from xml_access WHERE = 'public') AND perm_type = 'deny' AND perm_order ='allowFirst' AND permission > 3) ))

30 seconds

SELECT docid,docname,doctype,date_created, date_updated, rev FROM
xml_documents WHERE docid IN
((
SELECT DISTINCT docid FROM xml_nodes
WHERE nodedatavector @@ to_tsquery('english', 'test')
AND
parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE 'path1')
))
AND (docid IN (SELECT docid from xml_access WHERE = 'public') AND perm_type = 'allow' AND permission > 3)) AND docid NOT IN (SELECT docid from xml_access WHERE = 'public') AND perm_type = 'deny' AND perm_order ='allowFirst' AND permission > 3) ))

12 seconds

We can see ts_vector help the search.

Actions

Copy link

#51

Updated by Jing Tao almost 17 years ago

Now we can conclude:
ts search helps the query 1. without specifying xpath.
2. xpath is not in xml_path_index.
However, for xpath is in xml_path_index, the search speed is slower.

In order to figure out why it is slower I did following test. We can see if i remove the line : path IN ('abstract/para','givenName','keyword','organizationName','title','surName','para','geographicDescription','literalLayout')

The speed will jump to 0.8 seconds from 47 seconds. Any one have idea about this?

Here is the query and query plan in postgresql:
SELECT queryresult_string
FROM xml_queryresult
WHERE docid IN
(
(
SELECT DISTINCT docid FROM xml_path_index WHERE
(path IN ('abstract/para','givenName','keyword','organizationName','title','surName','para','geographicDescription','literalLayout')
AND nodedatavector @@ to_tsquery('english', 'test')
)
)
)
AND (
docid IN (
SELECT docid from xml_access WHERE = 'public') AND perm_type = 'allow' AND permission > 3)
)
AND
docid NOT IN (
SELECT docid from xml_access WHERE = 'public') AND perm_type = 'deny' AND perm_order ='allowFirst' AND permission > 3)
)
)
AND (
returnfield_id=1
)

Nested Loop IN Join  (cost=11787.67..27048.77 rows=1 width=969) (actual time=145.877..47048.898 rows=691 loops=1)
   Join Filter: ((xml_queryresult.docid)::text = (public.xml_access.docid)::text)
   ->  Nested Loop  (cost=11787.67..19651.83 rows=1 width=1510) (actual time=142.646..175.635 rows=691 loops=1)
         ->  Unique  (cost=3942.40..3944.56 rows=2 width=25) (actual time=17.997..25.338 rows=737 loops=1)
               ->  Sort  (cost=3942.40..3943.48 rows=432 width=25) (actual time=17.997..20.641 rows=1464 loops=1)
                     Sort Key: xml_path_index.docid
                     Sort Method:  quicksort  Memory: 93kB
                     ->  Bitmap Heap Scan on xml_path_index  (cost=60.88..3923.49 rows=432 width=25) (actual time=0.000..5.040 rows=1464 loops=1)
                           Recheck Cond: (nodedatavector @ '''test'''::tsquery)
                           Filter: ((path)::text = ANY (('{abstract/para,givenName,keyword,organizationName,title,surName,para,geographicDescription,literalLayout}'::character varying[])::text[]))
                           ->  Bitmap Index Scan on xml_path_index_vector  (cost=0.00..60.77 rows=1069 width=0) (actual time=0.000..0.000 rows=1569 loops=1)
                                 Index Cond: (nodedatavector @ '''test'''::tsquery)
         ->  Index Scan using xml_queryresult_idx1 on xml_queryresult  (cost=7845.27..7853.61 rows=1 width=994) (actual time=0.191..0.197 rows=1 loops=737)
               Index Cond: ((xml_queryresult.returnfield_id = 1) AND ((xml_queryresult.docid)::text = (xml_path_index.docid)::text))
               Filter: (NOT (hashed subplan))
               SubPlan
                 ->  Seq Scan on xml_access  (cost=0.00..7845.27 rows=1 width=30) (actual time=0.000..123.606 rows=124 loops=1)
                       Filter: ((permission > 3) AND ((perm_type)::text = 'deny'::text) AND ((perm_order)::text = 'allowFirst'::text) AND (lower((principal_name)::text) = 'public'::text))
   ->  Seq Scan on xml_access  (cost=0.00..7385.46 rows=918 width=30) (actual time=0.013..59.893 rows=3482 loops=691)
         Filter: ((public.xml_access.permission > 3) AND ((public.xml_access.perm_type)::text = 'allow'::text) AND (lower((public.xml_access.principal_name)::text) = 'public'::text))
 Total runtime: 47050.383 ms

SELECT queryresult_string
FROM xml_queryresult
WHERE docid IN
(
(
SELECT DISTINCT docid FROM xml_path_index WHERE
(nodedatavector @@ to_tsquery('english', 'test')

)
                     )
                    )  
      AND (
           docid IN (
                      SELECT docid from xml_access WHERE( (lower(principal_name) = 'public') AND perm_type = 'allow' AND permission > 3)
                     ) 
            AND 
            docid NOT IN (
                           SELECT docid from xml_access WHERE( (lower(principal_name) = 'public') AND perm_type = 'deny' AND perm_order ='allowFirst' AND permission > 3) 
                         )
          )
      AND (
           returnfield_id=1
          )

Hash Join  (cost=19200.43..42775.07 rows=1 width=969) (actual time=758.885..807.122 rows=691 loops=1)
   Hash Cond: ((xml_queryresult.docid)::text = (public.xml_access.docid)::text)
   ->  Nested Loop  (cost=11810.67..35385.30 rows=2 width=1510) (actual time=141.284..171.137 rows=691 loops=1)
         ->  Unique  (cost=3965.40..3970.75 rows=4 width=25) (actual time=19.300..26.762 rows=739 loops=1)
               ->  Sort  (cost=3965.40..3968.08 rows=1069 width=25) (actual time=19.297..22.357 rows=1552 loops=1)
                     Sort Key: xml_path_index.docid
                     Sort Method:  quicksort  Memory: 97kB
                     ->  Bitmap Heap Scan on xml_path_index  (cost=61.04..3911.62 rows=1069 width=25) (actual time=1.029..5.876 rows=1552 loops=1)
                           Recheck Cond: (nodedatavector @ '''test'''::tsquery)
                           ->  Bitmap Index Scan on xml_path_index_vector  (cost=0.00..60.77 rows=1069 width=0) (actual time=0.882..0.882 rows=1569 loops=1)
                                 Index Cond: (nodedatavector @ '''test'''::tsquery)
         ->  Index Scan using xml_queryresult_idx1 on xml_queryresult  (cost=7845.27..7853.61 rows=1 width=994) (actual time=0.182..0.185 rows=1 loops=739)
               Index Cond: ((xml_queryresult.returnfield_id = 1) AND ((xml_queryresult.docid)::text = (xml_path_index.docid)::text))
               Filter: (NOT (hashed subplan))
               SubPlan
                 ->  Seq Scan on xml_access  (cost=0.00..7845.27 rows=1 width=30) (actual time=0.594..120.731 rows=124 loops=1)
                       Filter: ((permission > 3) AND ((perm_type)::text = 'deny'::text) AND ((perm_order)::text = 'allowFirst'::text) AND (lower((principal_name)::text) = 'public'::text))
   ->  Hash  (cost=7388.65..7388.65 rows=89 width=30) (actual time=617.598..617.598 rows=23194 loops=1)
         ->  HashAggregate  (cost=7387.76..7388.65 rows=89 width=30) (actual time=488.550..539.284 rows=23194 loops=1)
               ->  Seq Scan on xml_access  (cost=0.00..7385.46 rows=918 width=30) (actual time=0.000..406.257 rows=23194 loops=1)
                     Filter: ((permission > 3) AND ((perm_type)::text = 'allow'::text) AND (lower((principal_name)::text) = 'public'::text))
 Total runtime: 811.484 ms

Actions

Copy link

#52

Updated by Redmine Admin almost 12 years ago

Original Bugzilla ID was 2155

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Metacat

Custom queries

Bug #2155

Metacat Performace: Rewrite the xml_nodes queries

Updated by Saurabh Garg over 18 years ago

Updated by Matt Jones over 18 years ago

Updated by Saurabh Garg over 18 years ago

Updated by Jing Tao almost 18 years ago

Updated by Matt Jones almost 18 years ago

Updated by Jing Tao almost 18 years ago

Updated by Jing Tao almost 18 years ago

Updated by Jing Tao almost 18 years ago

Updated by Jing Tao almost 18 years ago

Updated by Jing Tao almost 18 years ago

Updated by Jing Tao almost 18 years ago

Updated by Jing Tao almost 18 years ago

Updated by Jing Tao over 17 years ago

Updated by Jing Tao over 17 years ago

Updated by Jing Tao over 17 years ago

Updated by Jing Tao over 17 years ago

Updated by Jing Tao about 17 years ago

Updated by Jing Tao about 17 years ago

Updated by Jing Tao almost 17 years ago

Updated by Jing Tao almost 17 years ago

Updated by Redmine Admin almost 12 years ago