Bug #111: reading large documents from metacat is slow - Metacat - Ecoinformatics Redmine


Bug #111

closed

reading large documents from metacat is slow

Added by Matt Jones over 24 years ago. Updated over 22 years ago.

Status: Resolved
Priority: Immediate
Assignee: Matt Jones
Category: metacat
Target version: Beta1 (AnnMeet2000)
Start date: 08/31/2000
Bugzilla-Id: 111

Description

Read time for documents from metacat seems to scale poorly with document size, and
becomes extremely slow even for medium-sized documents. This is probably because we
recursively create a whole tree of objects representing every element,
attribute, and text node when reading from the database, which is an expensive
operation. We need to change this so that the conversion from the result set into
XML is done entirely within a single instance of a class (probably within
DocumentImpl.java).


Updated by Matt Jones over 24 years ago

Fixed document reading bug (bugzilla bug #111) so that read time is
no longer a power function of the number of nodes in the document (which
used to be the case). Now, reading a document occurs entirely within
DocumentImpl, by making a single SQL call to get the document data, and then
using the NodeComparator class to return a TreeSet of the nodes sorted in
a depth-first traversal order. This TreeSet is then processed by the new
DocumentImpl.toXml() methods, which format and output a text representation
of the document to the Writer that is passed in. The DocumentImpl.toString()
method has been rewritten to use DocumentImpl.toXml() as well.
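
The approach described above can be illustrated with a minimal sketch. This is not the actual DocumentImpl or NodeComparator code: the class FlatDocumentReader, the Node fields, and the DOCUMENT_ORDER comparator are hypothetical names. The sketch assumes that node ids were assigned in document order when the document was stored, so that sorting by id yields a depth-first traversal, and that each row carries its parent's id. The rows are sorted once into a TreeSet and serialized in a single pass using a stack of open elements, without building a tree of per-node objects.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.TreeSet;

public class FlatDocumentReader {

    enum NodeType { ELEMENT, ATTRIBUTE, TEXT }

    /** One row of a hypothetical node table, as a single query might return it. */
    static final class Node {
        final long id;
        final long parentId;
        final NodeType type;
        final String name;   // element or attribute name, null for text
        final String data;   // attribute value or text content, null for elements

        Node(long id, long parentId, NodeType type, String name, String data) {
            this.id = id;
            this.parentId = parentId;
            this.type = type;
            this.name = name;
            this.data = data;
        }
    }

    /** Assumption: ids were assigned in document order, so id order is depth-first order. */
    static final Comparator<Node> DOCUMENT_ORDER =
            Comparator.comparingLong((Node n) -> n.id);

    /** Sorts the rows once, then writes XML in a single pass over the sorted set. */
    static void toXml(Iterable<Node> rows, Writer out) throws IOException {
        TreeSet<Node> nodes = new TreeSet<>(DOCUMENT_ORDER);
        for (Node row : rows) {
            nodes.add(row);
        }

        Deque<Node> open = new ArrayDeque<>(); // elements whose end tag is not yet written
        boolean tagPending = false;            // "<name" written, closing ">" not yet written

        for (Node node : nodes) {
            if (node.type == NodeType.ATTRIBUTE) {
                // Assumption: attribute rows directly follow their owning element.
                out.write(" " + node.name + "=\"" + escape(node.data) + "\"");
                continue;
            }
            if (tagPending) {
                out.write(">");
                tagPending = false;
            }
            // Close every open element that is not this node's parent.
            while (!open.isEmpty() && open.peek().id != node.parentId) {
                out.write("</" + open.pop().name + ">");
            }
            if (node.type == NodeType.ELEMENT) {
                out.write("<" + node.name);
                open.push(node);
                tagPending = true;
            } else {
                out.write(escape(node.data));
            }
        }
        if (tagPending) {
            out.write(">");
        }
        while (!open.isEmpty()) {
            out.write("</" + open.pop().name + ">");
        }
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) throws IOException {
        // Rows deliberately out of order, as a result set might be.
        List<Node> rows = Arrays.asList(
                new Node(4, 3, NodeType.TEXT, null, "A & B"),
                new Node(1, 0, NodeType.ELEMENT, "doc", null),
                new Node(3, 1, NodeType.ELEMENT, "title", null),
                new Node(2, 1, NodeType.ATTRIBUTE, "id", "x1"),
                new Node(5, 1, NodeType.ELEMENT, "empty", null));
        StringWriter out = new StringWriter();
        toXml(rows, out);
        System.out.println(out); // <doc id="x1"><title>A &amp; B</title><empty></empty></doc>
    }
}
```

In this sketch each node is handled once after sorting, so the cost is dominated by the sort rather than by recursive object construction or repeated queries.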

The old algorithm for searching (that utilized the ElementNode, textNode,
CommentNode, and PINode classes) is still implemented for comparison
purposes, and can be accessed by calling the readUsingSlowAlgorithm() method.
A timing option has been added to DocumentImpl.main() so that the methods
can be compared (see the -t and -old options). Although the difference
in read time is only a fraction of a second for small documents (< 1K),
the new method of reading is 72 times faster than the old method for a
34K document (1.9 seconds versus 144 seconds). This difference continues
to grow as the node count increases.


Updated by Redmine Admin over 11 years ago

Original Bugzilla ID was 111
