Indexing for Performance

Back | Home | Next

Metacat DB stores indeces for the current version of all documents in xml_index table. All relative and absolute paths to any node in a document are stored in this table. These paths are used for structured query searches to a specified location in an XML tree. Path expressions specified in the pathquery document are queried from this table.

Indeces become necessary because relational databases are not efficient at querying tree structures. By slightly denormalizing the database and listing the tree structure in a flat table, the relational database engine can more effectively handle large path queries.

Example:

    <?xml version="1.0"?>
    <!DOCTYPE employee>
    <employee>
      <name>
        <first>Chad</first>
        <last>Berkley</last>
      </name>
      <address>
        <street>735 State St. Ste. 303</street>
        <city>Santa Barbara</city>
        <state>California</state> 
        <zip>93101</zip>
      </address>
      <occupation>
        <title>Metadata Systems Developer</title>
        <location>
          <type>
            <grant>
              <name>DBA</name>
              <grantor>NSF</grantor>
              <PI>Jim Reichman</PI>
              <PI>Matt Jones</PI>
              <PI>Mark Schildhauer</PI>
            </grant>
          </type>
          <system>UC</system>
          <location>Santa Barbara</location>
        </location>
      </occupation>
    </employee>
  

The XML document above is logically deconstructed into all possible paths through the document. Each path (relative and absolute) is entered as a record in xml_index along with the nodeid of the deepest node. In this way, any path can be quickly queried. It's contents can be assertained from cross linking the nodeid back to the xml_nodes table.
PATHNODEIDPARENTNODEID
occupation/location/type/grant/grantor200150200145
location/type/grant/grantor200150200145
grantor200150200145
location/type/grant/PI200153200145
grant/PI200153 200145
/employee/occupation/location/type/grant/PI200153 200145
employee/occupation/location/type/grant/PI200153 200145
occupation/location/type/grant/PI200153 200145
type/grant/PI200153 200145
PI200153 200145
location/type/grant/PI200156 200145
grant/PI200156 200145
/employee/occupation/location/type/grant/PI200156 200145
employee/occupation/location/type/grant/PI200156 200145
occupation/location/type/grant/PI200156 200145
type/grant/PI200156 200145
PI 200156 200145
location/type/grant/PI200159 200145
grant/PI200159 200145
/employee/occupation/location/type/grant/PI200159 200145
employee/occupation/location/type/grant/PI200159 200145
occupation/location/type/grant/PI200159 200145
type/grant/PI200159 200145
PI200159 200145
occupation/location/system200164 200141
system200164 200141
employee/occupation/location/system200164 200141
location/system200164 200141
/employee/occupation/location/system200164 200141
/employee/occupation/location/location200167 200141
employee/occupation/location/location200167 200141
occupation/location/location200167 200141
location200167 200141
location/location200167 200141
/employee/name200112 200110
name200112 200110
employee/name200112 200110
/employee/name/first200114 200112
name/first200114 200112
/employee200110 200109
employee200110 200109
employee/name/first200114 200112
first200114 200112
name/last200117 200112
/employee/name/last200117 200112
employee/name/last200117 200112
last200117 200112
employee/address200121 200110
address200121 200110
/employee/address200121 200110
/employee/address/street200123 200121
employee/address/street200123 200121
address/street200123 200121
street200123 200121
employee/address/city200126 200121
address/city200126 200121
/employee/address/city200126 200121
city200126 200121
address/state200129 200121
/employee/address/state200129 200121
employee/address/state200129 200121
state200129 200121
employee/address/zip200132 200121
zip200132 200121
/employee/address/zip200132 200121
address/zip200132 200121
employee/occupation200136 200110
/employee/occupation200136 200110
occupation200136 200110
/employee/occupation/title200138 200136
employee/occupation/title200138 200136
occupation/title200138 200136
title200138 200136
location200141 200136
employee/occupation/location200141 200136
/employee/occupation/location200141 200136
occupation/location200141 200136
occupation/location/type200143 200141
type200143 200141
/employee/occupation/location/type200143 200141
location/type200143 200141
employee/occupation/location/type200143 200141
type/grant200145 200143
occupation/location/type/grant200145 200143
grant200145 200143
location/type/grant200145 200143
/employee/occupation/location/type/grant200145 200143
employee/occupation/location/type/grant200145 200143
grant/name200147 200145
/employee/occupation/location/type/grant/name200147 200145
occupation/location/type/grant/name200147 200145
name200147 200145
type/grant/name200147 200145
location/type/grant/name200147 200145
employee/occupation/location/type/grant/name200147 200145
/employee/occupation/location/type/grant/grantor200150 200145
type/grant/grantor200150 200145
grant/grantor200150 200145
employee/occupation/location/type/grant/grantor200150 200145

The following is the document in xml_nodes.


nodeidnodetypenodename nodedatarootnodeidparentnodeid
200164 ELEMENT system 200109200141
200165 TEXT  UC200109200164
200166 TEXT   200109200141
200167 ELEMENT location 200109200141
200168 TEXT  Santa Barbara200109200167
200169 TEXT   200109200141
200170 TEXT   200109200136
200171 TEXT   200109200110
200109 DOCUMENT employee 200109 
200110 ELEMENT employee 200109200109
200111 TEXT   200109200110
200112 ELEMENT name 200109200110
200113 TEXT   200109200112
200114 ELEMENT first 200109200112
200115 TEXT  Chad200109200114
200116 TEXT   200109200112
200117 ELEMENT last 200109200112
200118 TEXT  Berkley200109200117
200119 TEXT   200109200112
200120 TEXT   200109200110
200121 ELEMENT address 200109200110
200122 TEXT   200109200121
200123 ELEMENT street 200109200121
200124 TEXT  735 State St. Ste. 303200109200123
200125 TEXT   200109200121
200126 ELEMENT city 200109200121
200127 TEXT  Santa Barbara200109200126
200128 TEXT   200109200121
200129 ELEMENT state 200109200121
200130 TEXT  California200109200129
200131 TEXT   200109200121
200132 ELEMENT zip 200109200121
200133 TEXT  93101200109200132
200134 TEXT   200109200121
200135 TEXT   200109200110
200136 ELEMENT occupation 200109200110
200137 TEXT   200109200136
200138 ELEMENT title 200109200136
200139 TEXT  Metadata Systems Developer200109200138
200140 TEXT   200109200136
200141 ELEMENT location 200109200136
200142 TEXT   200109200141
200143 ELEMENT type 200109200141
200144 TEXT   200109200143
200145 ELEMENT grant 200109200143
200146 TEXT   200109200145
200147 ELEMENT name 200109200145
200148 TEXT  DBA200109200147
200149 TEXT   200109200145
200150 ELEMENT grantor 200109200145
200151 TEXT  NSF200109200150
200152 TEXT   200109200145
200153 ELEMENT PI 200109200145
200154 TEXT  Jim Reichman200109200153
200155 TEXT   200109200145
200156 ELEMENT PI 200109200145
200157 TEXT  Matt Jones200109200156
200158 TEXT   200109200145
200159 ELEMENT PI 200109200145
200160 TEXT  Mark Schilhauer200109200159
200161 TEXT   200109200145
200162 TEXT   200109200143
200163 TEXT   200109200141

Back | Home | Next