replication update action times out
With large database sizes, the replications "update" action times out with normal settings for HTTP timeouts. On DataONE CNs, this action can take more than 4 minutes. I traced this down to the SQL query used to find deleted documents:
select distinct docid from xml_revisions where docid not in (select docid from xml_documents) and server_location = 1;
which takes an excessive amount of time because it materializes a large table. See the DataONE ticket for details (https://redmine.dataone.org/issues/3740).
#1 Updated by Matt Jones over 8 years ago
By replacing the subselect with a LEFT JOIN, we can massively improve performance of this query. Here's the same query rewritten as a left join:
SELECT t1.docid FROM xml_revisions t1
LEFT JOIN xml_documents t2 on t1.docid = t2.docid
WHERE t2.docid IS NULL
AND t1.server_location = 1;
This seems to produce the same results as the original query (verified by running both queries on the same corpus of data, producing identical 6311 result records). While the old query took 724940.251 ms, the new query takes 593.911 ms, representing a 1220 to 1 performance improvement.
Committed fix in r7636.