Diff - Metacat - Ecoinformatics Redmine


Revision 429

Added by Matt Jones about 24 years ago

Fixed document reading bug (bugzilla bug #111) so that the time to read a
document is no longer a power function of the number of nodes in the document,
as it used to be. Reading a document now happens entirely within DocumentImpl:
a single SQL call fetches the document data, and the NodeComparator class is
used to return a TreeSet of the nodes sorted in depth-first traversal order.
The new DocumentImpl.toXml() methods then process this TreeSet, formatting a
text representation of the document and writing it to the Writer that is
passed in. The DocumentImpl.toString() method has been rewritten to use
DocumentImpl.toXml() as well.
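The read path described above can be sketched in a few lines of self-contained Java. This is an illustration under stated assumptions, not the Metacat code: `NodeRecord` here is a hypothetical stand-in holding only the fields toXml() reads, nodeids are assumed to have been assigned in depth-first order (the same assumption NodeComparator makes), and DOCUMENT, COMMENT, and PI records are skipped. The stack handling follows the idea used in toXml(): before a node is written, elements are popped and closed until the top of the stack is that node's parent.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.TreeSet;

public class DepthFirstSerializerSketch {

  // Hypothetical stand-in for a row of the node table (not Metacat's NodeRecord).
  static final class NodeRecord {
    final long nodeid;
    final long parentnodeid;
    final String nodetype;
    final String nodename;
    final String nodedata;

    NodeRecord(long nodeid, long parentnodeid, String nodetype,
               String nodename, String nodedata) {
      this.nodeid = nodeid;
      this.parentnodeid = parentnodeid;
      this.nodetype = nodetype;
      this.nodename = nodename;
      this.nodedata = nodedata;
    }
  }

  // Assumes nodeids were assigned in depth-first order when the document was stored.
  static final Comparator<NodeRecord> BY_NODEID =
      Comparator.comparingLong(r -> r.nodeid);

  // Writes ELEMENT, ATTRIBUTE and TEXT records as XML in one pass over the sorted set.
  static void toXml(TreeSet<NodeRecord> records, Writer w) {
    PrintWriter out = new PrintWriter(w);
    Deque<NodeRecord> open = new ArrayDeque<>();
    boolean startTagOpen = false;  // true while "<name ..." still lacks its ">"
    for (NodeRecord n : records) {
      // Close elements until the top of the stack is this node's parent.
      while (!open.isEmpty() && open.peek().nodeid != n.parentnodeid) {
        if (startTagOpen) { out.print(">"); startTagOpen = false; }
        out.print("</" + open.pop().nodename + ">");
      }
      switch (n.nodetype) {
        case "ELEMENT":
          if (startTagOpen) { out.print(">"); }
          out.print("<" + n.nodename);
          open.push(n);
          startTagOpen = true;
          break;
        case "ATTRIBUTE":
          out.print(" " + n.nodename + "=\"" + n.nodedata + "\"");
          break;
        case "TEXT":
          if (startTagOpen) { out.print(">"); startTagOpen = false; }
          out.print(n.nodedata);
          break;
        default:
          break;  // DOCUMENT, COMMENT and PI records are skipped in this sketch
      }
    }
    // Close every element still open, innermost first.
    while (!open.isEmpty()) {
      if (startTagOpen) { out.print(">"); startTagOpen = false; }
      out.print("</" + open.pop().nodename + ">");
    }
    out.flush();
  }

  public static void main(String[] args) {
    TreeSet<NodeRecord> records = new TreeSet<>(BY_NODEID);
    // Added out of order on purpose: the TreeSet restores nodeid order.
    records.add(new NodeRecord(4, 3, "TEXT", null, "hi"));
    records.add(new NodeRecord(1, 0, "ELEMENT", "root", null));
    records.add(new NodeRecord(5, 1, "ELEMENT", "c", null));
    records.add(new NodeRecord(2, 1, "ATTRIBUTE", "a", "1"));
    records.add(new NodeRecord(3, 1, "ELEMENT", "b", null));
    StringWriter sw = new StringWriter();
    toXml(records, sw);
    System.out.println(sw);
  }
}
```

Running `main` prints `<root a="1"><b>hi</b><c></c></root>`. The sketch also closes a start tag that is still open when an element ends with no content, and it empties the whole stack at the end so every open element receives its end tag.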

The old algorithm for reading documents (which used the ElementNode, TextNode,
CommentNode, and PINode classes) is still implemented for comparison purposes
and can be accessed by calling the readUsingSlowAlgorithm() method. A timing
option has been added to DocumentImpl.main() so that the two methods can be
compared (see the -t and -old options). Although the difference in read time
is only a fraction of a second for small documents (< 1K), the new method is
72 times faster than the old one for a 34K document (1.9 seconds versus 144
seconds), and the gap continues to grow as the node count increases.
BugID: 111
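To see why the old reader scales so badly, a rough sketch can count how many records each strategy examines. It assumes, as the readUsingSlowAlgorithm() javadoc describes, that every node scans the whole record list for its children; the single-pass side simply files each record under its parent once. Both methods are hypothetical illustrations rather than Metacat code, and the real new reader also pays an O(N log N) cost to sort the records in a TreeSet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadCostSketch {

  // parent[i] is the parent of node i; node 0 is the root (parent -1).

  // Per-node scan (sketch of the old approach): every node examines every
  // record looking for its own children, so N nodes cost N * N examinations.
  static long perNodeScan(int[] parent) {
    long examined = 0;
    for (int node = 0; node < parent.length; node++) {
      List<Integer> children = new ArrayList<>();
      for (int r = 0; r < parent.length; r++) {
        examined++;
        if (parent[r] == node) {
          children.add(r);
        }
      }
      // children would be used to build this node's subtree here
    }
    return examined;
  }

  // Single pass (sketch): every record is examined once and filed under its parent.
  static long singlePass(int[] parent, Map<Integer, List<Integer>> childrenByParent) {
    long examined = 0;
    for (int r = 0; r < parent.length; r++) {
      examined++;
      childrenByParent.computeIfAbsent(parent[r], k -> new ArrayList<>()).add(r);
    }
    return examined;
  }

  public static void main(String[] args) {
    for (int n : new int[] {10, 100, 1000}) {
      int[] parent = new int[n];
      parent[0] = -1;
      for (int i = 1; i < n; i++) {
        parent[i] = (i - 1) / 2;  // arbitrary binary-tree shape
      }
      long slow = perNodeScan(parent);
      long fast = singlePass(parent, new HashMap<>());
      System.out.println(n + " nodes: " + slow + " vs " + fast + " records examined");
    }
  }
}
```

For 10, 100, and 1000 nodes this prints 100, 10000, and 1000000 records examined for the per-node scan versus 10, 100, and 1000 for the single pass; the quadratic term is what dominates on larger documents.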

     package edu.ucsb.nceas.metacat;
     import java.sql.*;
     import java.io.File;
     import java.io.FileReader;
     import java.io.IOException;
     import java.io.PrintWriter;
     import java.util.TreeSet;
     import java.io.Reader;
     import java.io.StringWriter;
     import java.io.Writer;
     import java.io.*;
     import java.util.Iterator;
     import java.util.Stack;
     import java.util.TreeSet;
     import org.xml.sax.AttributeList;
     import org.xml.sax.ContentHandler;
-...
       private long rootnodeid;
       private ElementNode rootNode = null;
       private String doctitle = null;
       private TreeSet nodeRecordList = null;
       /**
        * Constructor, creates document from database connection, used
-...
           getDocumentInfo(docid);
           // Download all of the document nodes using a single SQL query
-          TreeSet nodeRecordList = getNodeRecordList(rootnodeid);
           // The sort order of the records is determined by the NodeComparator
           // class, and needs to represent a depth-first traversal for the
           // toXml() method to work properly
           nodeRecordList = getNodeRecordList(rootnodeid);
-          // Create the elements from the downloaded data in the TreeSet
-          rootNode = new ElementNode(nodeRecordList, rootnodeid);
         } catch (McdbException ex) {
           throw ex;
         } catch (Throwable t) {
-...
         return docid;
+      }
       /**
        * Create an XML document from the database for the document with ID docid
        * Print a string representation of the XML document
        */
       public String toString()
+      {
         StringWriter docwriter = new StringWriter();
         this.toXml(docwriter);
         String document = docwriter.toString();
         return document;
+      }
       /**
        * Get a text representation of the XML document as a string
        * This older algorithm uses a recursive tree of Objects to represent the
        * nodes of the tree.  Each object is passed the data for the document
        * and searches all of the document data to find its children nodes and
        * recursively build.  Thus, because each node reads the whole document,
        * this algorithm is extremely slow for larger documents, and the time
        * to completion is O(N^N) wrt the number of nodes.  See toXml() for a
        * better algorithm.
        */
       public String readUsingSlowAlgorithm()
+      {
         StringBuffer doc = new StringBuffer();
         // Create the elements from the downloaded data in the TreeSet
         rootNode = new ElementNode(nodeRecordList, rootnodeid);
         // Append the resulting document to the StringBuffer and return it
         doc.append("<?xml version=\"1.0\"?>\n");
-...
+      }
       /**
        * Print a text representation of the XML document to a Writer
+       *
        * @param pw the Writer to which we print the document
        */
       public void toXml(Writer pw)
+      {
         PrintWriter out = null;
         if (pw instanceof PrintWriter) {
           out = (PrintWriter)pw;
         } else {
           out = new PrintWriter(pw);
+        }
         MetaCatUtil util = new MetaCatUtil();
         Stack openElements = new Stack();
         boolean atRootElement = true;
         boolean previousNodeWasElement = false;
         // Step through all of the node records we were given
         Iterator it = nodeRecordList.iterator();
         while (it.hasNext()) {
           NodeRecord currentNode = (NodeRecord)it.next();
           //util.debugMessage("[Got Node ID: " + currentNode.nodeid +
                               //" (" + currentNode.parentnodeid +
                               //", " + currentNode.nodeindex +
                               //", " + currentNode.nodetype +
                               //", " + currentNode.nodename +
                               //", " + currentNode.nodedata + ")]");
           // Print the end tag for the previous node if needed.
           //
           // This is determined by inspecting the parent nodeid of the
           // currentNode. If it is the same as the nodeid of the last element
           // that was pushed onto the stack, then we are still inside that
           // parent element and we do nothing. If it differs, we have returned
           // to a level above the previous parent, so we loop, popping nodes
           // and printing their end tags until the nodeid on top of the stack
           // matches the parentnodeid of the currentNode.
           //
           // This relies on the list of elements having been sorted in a
           // depth-first traversal of the nodes, which is handled by the
           // NodeComparator class used by the TreeSet.
           if (!atRootElement) {
             NodeRecord currentElement = (NodeRecord)openElements.peek();
             if ( currentNode.parentnodeid != currentElement.nodeid ) {
               while ( currentNode.parentnodeid != currentElement.nodeid ) {
                 currentElement = (NodeRecord)openElements.pop();
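                 util.debugMessage("\n POPPED: " + currentElement.nodename);
                 // Close a start tag left open by an element with no content
                 if (previousNodeWasElement) {
                   out.print(">");
                   previousNodeWasElement = false;
                 }
                 out.print("</" + currentElement.nodename + ">" );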
                 currentElement = (NodeRecord)openElements.peek();
+              }
+            }
+          }
           // Handle the DOCUMENT node
           if (currentNode.nodetype.equals("DOCUMENT")) {
             out.println("<?xml version=\"1.0\"?>");
             if (docname != null) {
               if ((doctype != null) && (system_id != null)) {
                 out.println("<!DOCTYPE " + docname + " PUBLIC \"" + doctype +
                            "\" \"" + system_id + "\">");
               } else {
                 out.println("<!DOCTYPE " + docname + ">");
+              }
+            }
           // Handle the ELEMENT nodes
           } else if (currentNode.nodetype.equals("ELEMENT")) {
             if (atRootElement) {
               atRootElement = false;
             } else {
               if (previousNodeWasElement) {
                 out.print(">");
+              }
+            }
             openElements.push(currentNode);
             util.debugMessage("\n PUSHED: " + currentNode.nodename);
             previousNodeWasElement = true;
             out.print("<" + currentNode.nodename);
           // Handle the ATTRIBUTE nodes
           } else if (currentNode.nodetype.equals("ATTRIBUTE")) {
             out.print(" " + currentNode.nodename + "=\""
                      + currentNode.nodedata + "\"");
           } else if (currentNode.nodetype.equals("TEXT")) {
             if (previousNodeWasElement) {
               out.print(">");
+            }
             out.print(currentNode.nodedata);
             previousNodeWasElement = false;
           // Handle the COMMENT nodes
           } else if (currentNode.nodetype.equals("COMMENT")) {
             if (previousNodeWasElement) {
               out.print(">");
+            }
             out.print("<!--" + currentNode.nodedata + "-->");
             previousNodeWasElement = false;
           // Handle the PI nodes
           } else if (currentNode.nodetype.equals("PI")) {
             if (previousNodeWasElement) {
               out.print(">");
+            }
             out.print("<?" + currentNode.nodename + " " +
                             currentNode.nodedata + "?>");
             previousNodeWasElement = false;
           // Handle any other node type (do nothing)
           } else {
             // Any other types of nodes are not handled.
             // This should probably throw an exception to indicate that.
+          }
           out.flush();
+        }
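         // Print the end tags for any elements still open, innermost first,
         // finishing with the root element
         while (!openElements.empty()) {
           NodeRecord currentElement = (NodeRecord)openElements.pop();
           util.debugMessage("\n POPPED: " + currentElement.nodename);
           // Close a start tag left open by an element with no content
           if (previousNodeWasElement) {
             out.print(">");
             previousNodeWasElement = false;
           }
           out.print("</" + currentElement.nodename + ">" );
         }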
         out.flush();
+      }
       /**
        * Look up the document type information from the database
+       *
        * @param docid the id of the document to look up
-...
           String filename = null;
           String action   = null;
           String docid    = null;
           boolean showRuntime = false;
           boolean useOldReadAlgorithm = false;
           // Parse the command line arguments
           for ( int i=0 ; i < args.length; ++i ) {
-...
               action =  args[++i];
             } else if ( args[i].equals( "-d" ) ) {
               docid =  args[++i];
             } else if ( args[i].equals( "-t" ) ) {
               showRuntime = true;
             } else if ( args[i].equals( "-old" ) ) {
               useOldReadAlgorithm = true;
             } else {
               System.err.println
                 ( "   args[" +i+ "] '" +args[i]+ "' ignored." );
-...
           if (!argsAreValid) {
             System.err.println("Wrong number of arguments!!!");
             System.err.println(
-                  "USAGE: java DocumentImpl <-a INSERT> [-d docid] <-f filename>");
               "USAGE: java DocumentImpl [-t] <-a INSERT> [-d docid] <-f filename>");
             System.err.println(
-                  "   OR: java DocumentImpl <-a UPDATE -d docid -f filename>");
               "   OR: java DocumentImpl [-t] <-a UPDATE -d docid -f filename>");
             System.err.println(
-                  "   OR: java DocumentImpl <-a DELETE -d docid>");
               "   OR: java DocumentImpl [-t] <-a DELETE -d docid>");
             System.err.println(
-                  "   OR: java DocumentImpl <-a READ -d docid>");
               "   OR: java DocumentImpl [-t] [-old] <-a READ -d docid>");
             return;
+          }
           // Time the request if asked for
           double startTime = System.currentTimeMillis();
           // Open a connection to the database
           MetaCatUtil util = new MetaCatUtil();
           Connection dbconn = util.openDBConnection();
-...
           // Execute the action requested (READ, INSERT, UPDATE, DELETE)
           if (action.equals("READ")) {
               DocumentImpl xmldoc = new DocumentImpl( dbconn, docid );
-              System.out.println(xmldoc.toString());
               if (useOldReadAlgorithm) {
                 System.out.println(xmldoc.readUsingSlowAlgorithm());
               } else {
                 xmldoc.toXml(new PrintWriter(System.out));
+              }
           } else if (action.equals("DELETE")) {
             DocumentImpl.delete(dbconn, docid, null, null);
             System.out.println("Document deleted: " + docid);
-...
                   + " (" + newdocid + ")");
+          }
           double stopTime = System.currentTimeMillis();
           double executionTime = (stopTime - startTime)/1000;
           if (showRuntime) {
             System.out.println("\n\nExecution time was: " +
                                executionTime + " seconds");
+          }
         } catch (McdbException me) {
           me.toXml(new PrintWriter(System.err));
         } catch (AccessionNumberException ane) {

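The -t option brackets the request with System.currentTimeMillis() and divides by 1000 to report seconds. A minimal sketch of the same measurement follows, with flag parsing reduced to -t and -old; the two read strategies are hypothetical placeholders that only print a label. It uses System.nanoTime(), which is intended for measuring elapsed time, whereas currentTimeMillis() reports wall-clock time that can be adjusted while a request is running.

```java
public class ReadTimerSketch {

  // Times a single run of a read strategy and returns the elapsed seconds.
  static double timeSeconds(Runnable read) {
    long start = System.nanoTime();
    read.run();
    return (System.nanoTime() - start) / 1e9;
  }

  public static void main(String[] args) {
    boolean showRuntime = false;
    boolean useOldReadAlgorithm = false;
    for (String arg : args) {
      if (arg.equals("-t")) {
        showRuntime = true;
      } else if (arg.equals("-old")) {
        useOldReadAlgorithm = true;
      }
    }
    final boolean old = useOldReadAlgorithm;
    // Hypothetical placeholders for readUsingSlowAlgorithm() and toXml().
    double seconds = timeSeconds(() ->
        System.out.println(old ? "(old read algorithm)" : "(toXml read)"));
    if (showRuntime) {
      System.out.println("Execution time was: " + seconds + " seconds");
    }
  }
}
```

Running the same document once with `-t` and once with `-t -old` gives the two timings to compare.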
     import java.util.Comparator;
     /**
-     * A utility class that sorts two node records
      * A utility class that sorts two node records.
      * <p>
      * The order of the records
      * determines how the XML document is printed from DocumentImpl.toXml(),
      * so it is important that the sort order specified here results in a
      * depth-first traversal of the nodes in the tree.  Currently, the nodes
      * are inserted into the database in this depth-first order, so the nodeid
      * identifiers are a good indicator of the proper sort order.
      * <p>
      * However, if we modify data loading semantics to allow document nodes to
      * be rearranged, or otherwise change the nodeindex value, this current
      * sort algorithm will fail to work.
      */
     public class NodeComparator implements Comparator {
-...
       public int compare(NodeRecord o1, NodeRecord o2) {
         if (o1.nodeid == o2.nodeid) {
           return EQUALS;
         } else if (o1.nodeid < o2.nodeid) {
           return LESS;
         } else if (o1.nodeid > o2.nodeid) {
           return GREATER;
     /*  // This is old code that used to sort the records into breadth-first
         // traversal order, based on the parentnodeid and the nodeindex.
         //
         if (o1.nodeid == o2.nodeid) {
           return EQUALS;
         } else if (o1.parentnodeid < o2.parentnodeid) {
           return LESS;
         } else if (o1.parentnodeid > o2.parentnodeid) {
-...
             // this should never happen because (parentnodeid,nodeindex) is unique
             return EQUALS;
+          }
     */
         } else {
           // this should never happen because two nodeids are always <, >, or =
           return EQUALS;

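The NodeComparator javadoc notes that ordering by nodeid stops producing a depth-first traversal if nodes can be rearranged or their nodeindex changes. One possible alternative, sketched here under stated assumptions, is to compute each node's depth-first (preorder) rank explicitly from parentnodeid and nodeindex and compare ranks instead. The `NodeRecord` class and the `depthFirstRanks()` helper are hypothetical, not part of Metacat.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class DepthFirstRankSketch {

  // Hypothetical record: nodeindex is the node's position among its siblings.
  static final class NodeRecord {
    final long nodeid;
    final long parentnodeid;
    final int nodeindex;
    final String nodename;

    NodeRecord(long nodeid, long parentnodeid, int nodeindex, String nodename) {
      this.nodeid = nodeid;
      this.parentnodeid = parentnodeid;
      this.nodeindex = nodeindex;
      this.nodename = nodename;
    }
  }

  // Assigns each node reachable from rootid its preorder (depth-first) rank,
  // visiting siblings in nodeindex order rather than nodeid order.
  static Map<Long, Integer> depthFirstRanks(List<NodeRecord> records, long rootid) {
    Map<Long, List<NodeRecord>> children = new HashMap<>();
    for (NodeRecord r : records) {
      children.computeIfAbsent(r.parentnodeid, k -> new ArrayList<>()).add(r);
    }
    for (List<NodeRecord> siblings : children.values()) {
      siblings.sort(Comparator.comparingInt((NodeRecord r) -> r.nodeindex));
    }
    Map<Long, Integer> rank = new HashMap<>();
    Deque<Long> stack = new ArrayDeque<>();
    stack.push(rootid);
    while (!stack.isEmpty()) {
      long id = stack.pop();
      rank.put(id, rank.size());
      List<NodeRecord> kids = children.getOrDefault(id, Collections.emptyList());
      // Push in reverse so the child with the lowest nodeindex is visited first.
      for (int i = kids.size() - 1; i >= 0; i--) {
        stack.push(kids.get(i).nodeid);
      }
    }
    return rank;
  }

  public static void main(String[] args) {
    // root (10) has children a (30, index 0) and b (20, index 1); c (40) is under b.
    List<NodeRecord> records = Arrays.asList(
        new NodeRecord(10, 0, 0, "root"),
        new NodeRecord(20, 10, 1, "b"),
        new NodeRecord(30, 10, 0, "a"),
        new NodeRecord(40, 20, 0, "c"));
    Map<Long, Integer> rank = depthFirstRanks(records, 10L);
    TreeSet<NodeRecord> ordered =
        new TreeSet<>(Comparator.comparingInt((NodeRecord r) -> rank.get(r.nodeid)));
    ordered.addAll(records);
    StringBuilder names = new StringBuilder();
    for (NodeRecord r : ordered) {
      names.append(r.nodename).append(' ');
    }
    System.out.println(names.toString().trim());
  }
}
```

With the sample data, nodeid order would be root, b, a, c, but ranking by nodeindex prints `root a b c`. Because a TreeSet treats records that compare as equal as duplicates and keeps only one of them, the comparator has to give every distinct node a distinct position; unique ranks satisfy that, which is also why the EQUALS branches in NodeComparator should only ever fire for the same nodeid.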