Bug #5929: replication update action times out - Metacat - Ecoinformatics Redmine

Actions

Copy link

Bug #5929

closed

replication update action times out

Added by Matt Jones over 11 years ago. Updated over 11 years ago.

Status:

Closed

Priority:

Urgent

Assignee:

Matt Jones

Category:

metacat

Target version:

2.0.7

Start date:

04/30/2013

Due date:

% Done:

Estimated time:

Bugzilla-Id:

Description

With large database sizes, the replications "update" action times out with normal settings for HTTP timeouts. On DataONE CNs, this action can take more than 4 minutes. I traced this down to the SQL query used to find deleted documents:

select distinct docid from xml_revisions where docid not in (select docid from xml_documents) and server_location = 1;

which takes an excessive amount of time because it materializes a large table. See the DataONE ticket for details (https://redmine.dataone.org/issues/3740).

Actions

Copy link

Updated by Matt Jones over 11 years ago

By replacing the subselect with a LEFT JOIN, we can massively improve performance of this query. Here's the same query rewritten as a left join:

SELECT t1.docid FROM xml_revisions t1
LEFT JOIN xml_documents t2 on t1.docid = t2.docid
WHERE t2.docid IS NULL
AND t1.server_location = 1;

This seems to produce the same results as the original query (verified by running both queries on the same corpus of data, producing identical 6311 result records). While the old query took 724940.251 ms, the new query takes 593.911 ms, representing a 1220 to 1 performance improvement.

Committed fix in r7636.

Actions

Copy link