Bug #2084
closedPathquery support for temporal search on date fields
0%
Description
Metacat pathquery relational search modes ("greater-than", "less-than", etc.) do
not currently support temporal searches on date fields. The reasons for this are
described in the email correspondence to metacat-dev below. This enhancement
would make it possible to do temporal searches using date ranges, which would be
a important feature in an "Advanced Search" form (such as the one currently
under development at LTER), and could also be added to the search dialog in Morpho.
---
On 5/17/2005, Duane Costa wrote:
Metacat supports the following pathquery search modes: contains, starts-with,
ends-with, equals, isnot-equal, greater-than, less-than, greater-than-equals,
less-than-equals.
For the search modes that are equivalent to relational operators (equals,
isnot-equal, greater-than, less-than, greater-than-equals, less-than-equals), is
it possible to use these search modes in EML fields that contain non-numeric
string values? In particular, is it possible to use the relational search modes
for date strings?
For example, here is a pathquery that attempts to find all documents with
temporal coverage between January 1, 1900 and January 1, 2005. It reads like
this: “Return all documents that have a beginDate or a singleDateTime greater
than or equal to 1900-01-01, and an endDate or a singleDateTime less than or
equal to 2005-01-01.”
<query>
<pathquery version="1.2">
<querytitle>LTER Query</querytitle>
<returnfield>dataset/title</returnfield>
<returnfield>originator/individualName/surName</returnfield>
<returnfield>creator/individualName/surName</returnfield>
<returnfield>originator/organizationName</returnfield>
<returnfield>creator/organizationName</returnfield>
<returnfield>keyword</returnfield>
<querygroup operator="INTERSECT">
<querygroup operator="INTERSECT">
<querygroup operator="INTERSECT">
<querygroup operator="UNION">
<queryterm searchmode="greater-than-equals" casesensitive="false">
<value>1900-01-01</value>
<pathexpr>temporalCoverage/rangeOfDates/beginDate/calendarDate</path
expr>
</queryterm>
<queryterm searchmode="greater-than-equals" casesensitive="false">
<value>1900-01-01</value>
<pathexpr>temporalCoverage/singleDateTime/calendarDate</pathexpr>
</queryterm>
</querygroup>
<querygroup operator="UNION">
<queryterm searchmode="less-than-equals" casesensitive="false">
<value>2005-01-01</value>
<pathexpr>temporalCoverage/rangeOfDates/endDate/calendarDate</pathex
pr>
</queryterm>
<queryterm searchmode="less-than-equals" casesensitive="false">
<value>2005-01-01</value>
<pathexpr>temporalCoverage/singleDateTime/calendarDate</pathexpr>
</queryterm>
</querygroup>
</querygroup>
</querygroup>
</querygroup>
</pathquery>
</query>
When I run this against a test Metacat with an Oracle database, this pathquery
fails and the resultset contains zero documents. The Tomcat output shows that
the pathquery triggers a SQL error in Oracle:
MetaCat: SQL Error in DBQuery.findDocuments: ORA-01722: invalid number
So my question is this: can this problem with temporal search be fixed in the
same way that Sid fixed a similar bug for spatial search (Bug 1703, 1718), or is
this a different situation because of the fact that the temporal fields contain
non-numeric strings while the spatial fields contain numeric values? That is, is
it illegal to use the relational pathquery modes for EML fields that contain
non-numeric strings? If that is the case, it seems that there would be no
practical way to use pathquery for a temporal search involving date ranges: is
that correct?
On 5/18/2005, Chris Jones wrote:
Duane,
I've thought about this myself, and it seems that an internal metacat solution
needs to happen where date/time strings are converted or linked to a universal
date representation in order to do comparisons. This might be tough. Each
vendor's database seems to store dates internally in very different ways. It
would be difficult to create another column in the xml_nodes table that is of
type 'date' (depending on the vendor), because an EML (or other XML) date string
isn't necessarily recognizable as a date, whereas an integer or float is much
more discernible (in the case of the nodedatanumerical column) in xml_nodes.
However, with strong typing used in XMLSchema, metacat could in theory glean
'date' strings as type 'date' by referring to the element definition in the
XMLSchema document to which the instance document adheres. In that case, a
nodedatadate column in xml_nodes might work out where, upon insert or update,
the leaf node values get put into that column as a converted date. Of course,
this leads to: which date formats will be supported for conversion?
Anyone have a better solution?
Chris
On 5/20/2005, Matt Jones wrote:
Chris and Duane,
I think Chris' analysis of the issues with adding support for datetime
comparisons is just about right on. It would be a nice feature to have, but
implementation would need to accomodate multiple database systems.
I've even been contemplating adding support for Sleepycat XMLdb or other XML
databases instead of relational backends, and this would further complicate the
implementation issues for datetime values. But it would be worthwhile. Lets
get it into bugzilla as a feature request and we'll see where it falls out in
the priority order.
Matt