Bug #2060

Documents not indexed because of error generated during indexing of documents

Added by Saurabh Garg over 14 years ago. Updated almost 14 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: metacat
Target version:
Start date: 04/04/2005
Due date:
% Done: 0%
Estimated time:
Bugzilla-Id: 2060

Description

The errors below sometimes appear in the log files. They are caused by a lack
of coordination between the thread updating xml_nodes and the thread updating
the xml_index table. As a result, some documents are never indexed.

MetaCat: Error in DBSAXHandler.checkDocumentTable Couldn't find the docid for index build in reseaonable time!
MetaCat: SQL Exception while inserting path index in DocumentImpl.buildIndex for document test.200594171957
MetaCat: ORA-02291: integrity constraint (SGARG.XML_INDEX_DOCID_FK) violated - parent key not found

java.sql.SQLException: ORA-02291: integrity constraint (SGARG.XML_INDEX_DOCID_FK) violated - parent key not found

at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:125)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:305)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:272)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:623)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:181)
at oracle.jdbc.driver.T4CPreparedStatement.execute_for_rows(T4CPreparedStatement.java:543)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1028)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:2888)
at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:2960)
at edu.ucsb.nceas.metacat.DocumentImpl.updateNodeIndex(DocumentImpl.java:1345)
at edu.ucsb.nceas.metacat.DocumentImpl.buildIndex(DocumentImpl.java:1214)
at edu.ucsb.nceas.metacat.DBSAXHandler.run(DBSAXHandler.java:444)
at java.lang.Thread.run(Thread.java:534)

MetaCat: SQL Exception while inserting path index in DocumentImpl.buildIndex for document test.20059417224
MetaCat: ORA-00001: unique constraint (SGARG.XML_INDEX_PK) violated

java.sql.SQLException: ORA-00001: unique constraint (SGARG.XML_INDEX_PK) violated

at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:125)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:305)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:272)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:623)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:181)
at oracle.jdbc.driver.T4CPreparedStatement.execute_for_rows(T4CPreparedStatement.java:543)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1028)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:2888)
at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:2960)
at edu.ucsb.nceas.metacat.DocumentImpl.updateNodeIndex(DocumentImpl.java:1345)
at edu.ucsb.nceas.metacat.DocumentImpl.buildIndex(DocumentImpl.java:1214)
at edu.ucsb.nceas.metacat.DBSAXHandler.run(DBSAXHandler.java:444)
at java.lang.Thread.run(Thread.java:534)

History

#1 Updated by Saurabh Garg about 14 years ago

Changing the severity of the bug to Critical because, with the new changes, if
the indexing thread fails, the paths will not be indexed in xml_path_index.
This means the document won't show up in searches on the specified paths.

#2 Updated by Saurabh Garg about 14 years ago

Fixed.

Moved the call that starts the indexing thread from endDocument to
DocumentImpl, after the commit has been done. This way, whenever a document is
indexed, it has already been entered in xml_nodes and xml_documents.
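
The intended ordering (commit first, then start the indexing thread) can be sketched as below. This is only an illustration, not Metacat's actual code: the class CommitThenIndex, the method insertAndIndex, and the in-memory sets standing in for xml_documents and xml_index are hypothetical names made up for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class CommitThenIndex {
    // Hypothetical in-memory stand-ins for the xml_documents and xml_index tables.
    public static final Set<String> committedDocs = ConcurrentHashMap.newKeySet();
    public static final Set<String> indexedDocs = ConcurrentHashMap.newKeySet();

    // Index a document. Fails if the parent row is not visible yet,
    // mimicking the foreign key violation seen in the logs.
    public static void buildIndex(String docid) {
        if (!committedDocs.contains(docid)) {
            throw new IllegalStateException("parent key not found: " + docid);
        }
        indexedDocs.add(docid);
    }

    // Insert the document, "commit", and only then start the indexing thread.
    public static boolean insertAndIndex(String docid) {
        committedDocs.add(docid);              // stands in for INSERT + COMMIT
        Thread indexer = new Thread(() -> buildIndex(docid));
        indexer.start();                       // started strictly after the commit
        try {
            indexer.join();                    // wait only so the example can report the result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return indexedDocs.contains(docid);
    }

    public static void main(String[] args) {
        System.out.println(insertAndIndex("test.1")); // prints "true"
    }
}
```

In a real deployment the indexing thread would not be joined; the join is there only to make the example deterministic.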

Closing the bug.

#3 Updated by Saurabh Garg about 14 years ago

Still getting the same error...

Metacat: [ERROR]: Error in DBSAXHandler.checkDocumentTable Couldn't find the docid for index build in reseaonable time! [edu.ucsb.nceas.metacat.DBSAXHandler]
Metacat: [ERROR]: SQL Exception while inserting path index in DocumentImpl.buildIndex for document sgarg.1130 [edu.ucsb.nceas.metacat.DocumentImpl]
Metacat: [ERROR]: ERROR: insert or update on table "xml_index" violates foreign key constraint "xml_index_nodeid_fk" [edu.ucsb.nceas.metacat.DocumentImpl]

#4 Updated by Saurabh Garg almost 14 years ago

Increasing severity to Blocking. This is after the update on KNB. The build
index thread is now failing on a more regular basis, maybe because of the size
of the current KNB database.

The result is that after a document is inserted or updated, it is not
searchable using the new web searches. The modified web searches use the new
table, xml_path_index. If indexing of a document fails, the document doesn't
show up in xml_path_index. Hence, when a search is done on xml_path_index, the
document is not found.

If I am not able to figure out the reason for the buildIndex failure, I suggest
an alternative approach. (The indexing thread is now started after the document
has been inserted into the database, but the indexing thread still complains
that the document was not found in xml_documents. So the changes to
xml_documents are not committed at that point; this is what I am not able to
figure out.)

Metacat can run a separate thread that does the indexing at a set interval, for
example every 2 minutes. So every 2 minutes the following query is run:

SELECT d.docid FROM xml_documents d, xml_index i WHERE d.docid = i.docid(+) AND
i.docid is NULL and (d.doctype like 'eml://ecoinformatics.org/eml-2.0.0' or
d.doctype like 'eml://ecoinformatics.org/eml-2.0.1');

The above query is fast as it is a left outer join. (The (+) notation is
Oracle-specific; the syntax for PostgreSQL is different.) It is even faster if
we have an index on xml_documents.doctype.
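
For reference, the same join can be written with the standard LEFT OUTER JOIN syntax, which PostgreSQL accepts and which Oracle also accepts from 9i onward. The sketch below builds that form as a parameterized string for java.sql.PreparedStatement. The class name UnindexedQuery and the method build are hypothetical, and IN replaces the LIKE tests on the assumption that exact doctype matches are intended (the patterns in the query above contain no wildcards).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UnindexedQuery {
    // Builds the portable query with one "?" placeholder per doctype,
    // so the caller can bind the doctype values on a PreparedStatement.
    public static String build(List<String> doctypes) {
        if (doctypes.isEmpty()) {
            throw new IllegalArgumentException("at least one doctype is required");
        }
        String placeholders =
                String.join(", ", Collections.nCopies(doctypes.size(), "?"));
        return "SELECT d.docid FROM xml_documents d"
                + " LEFT OUTER JOIN xml_index i ON d.docid = i.docid"
                + " WHERE i.docid IS NULL"
                + " AND d.doctype IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        List<String> doctypes = Arrays.asList(
                "eml://ecoinformatics.org/eml-2.0.0",
                "eml://ecoinformatics.org/eml-2.0.1");
        System.out.println(build(doctypes));
    }
}
```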

The documents found in the above query can be indexed.

The downside of the above approach is that there is a delay of 2-3 minutes
between when a document is inserted and when it is searchable.

As indexing also fails when documents are updated, for the above change to
work, the indexing information in xml_index and xml_path_index will have to be
deleted for a document after it has been updated.
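
A minimal sketch of such a periodic indexing thread, using java.util.concurrent. The class PeriodicIndexer and its two callbacks are hypothetical: findUnindexed is assumed to run the query above and return the docids, and indexDocument is assumed to wrap the existing index build. Neither is an existing Metacat API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class PeriodicIndexer {
    private final Supplier<List<String>> findUnindexed; // assumed: runs the query above
    private final Consumer<String> indexDocument;       // assumed: wraps the index build
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public PeriodicIndexer(Supplier<List<String>> findUnindexed,
                           Consumer<String> indexDocument) {
        this.findUnindexed = findUnindexed;
        this.indexDocument = indexDocument;
    }

    // One pass: try to index every document reported as unindexed.
    // Returns the docids that were indexed successfully.
    public List<String> runOnce() {
        List<String> done = new ArrayList<>();
        for (String docid : findUnindexed.get()) {
            try {
                indexDocument.accept(docid);
                done.add(docid);
            } catch (RuntimeException e) {
                // Leave the document for the next pass instead of aborting this one.
                System.err.println("Indexing failed for " + docid + ": " + e.getMessage());
            }
        }
        return done;
    }

    // Repeat runOnce with a fixed delay between passes (e.g. 2 minutes).
    public void start(long delay, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                runOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and continue.
                System.err.println("Indexing pass failed: " + e.getMessage());
            }
        }, delay, delay, unit);
    }

    public void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        List<String> indexed = new ArrayList<>();
        PeriodicIndexer indexer = new PeriodicIndexer(
                () -> Arrays.asList("sgarg.1130", "test.1"), indexed::add);
        System.out.println(indexer.runOnce());
        indexer.stop();
    }
}
```

A caller would use start(2, TimeUnit.MINUTES) for the 2 minute interval suggested above. The sketch does not cover updated documents, whose old rows in xml_index and xml_path_index would first have to be deleted so that the query picks them up again.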

#5 Updated by Saurabh Garg almost 14 years ago

Closing for now. The code seems to be working for now.

#6 Updated by Redmine Admin over 6 years ago

Original Bugzilla ID was 2060
