EML: Issues | https://projects.ecoinformatics.org/ecoinfo/ | 2011-02-15T15:57:28Z | Ecoinformatics Redmine
Bug #5308 (In Progress): Data Manager Library: storageType content should be stored and used | https://projects.ecoinformatics.org/ecoinfo/issues/5308 | 2011-02-15T15:57:28Z | Duane Costa (dcosta@lternet.edu)
<p>'storageType' is an optional, repeatable element within the EML 'attribute' element. In addition to the documentation available in the EML normative documents, several old bug tickets describe the rationale behind this element: <a class="issue tracker-1 status-3 priority-5 priority-highest closed" title="Bug: eml-attribute changes needed (Resolved)" href="https://projects.ecoinformatics.org/ecoinfo/issues/484">#484</a>, <a class="issue tracker-1 status-3 priority-2 priority-default closed" title="Bug: issues about storageType and attributeDomain (Resolved)" href="https://projects.ecoinformatics.org/ecoinfo/issues/544">#544</a>, <a class="issue tracker-1 status-3 priority-2 priority-default closed" title="Bug: storageType is repeatable in eml-attribute (Resolved)" href="https://projects.ecoinformatics.org/ecoinfo/issues/599">#599</a>.</p>
<p>When the Data Manager Library parses EML attributes, it does not record any 'storageType' content that may be present. As a result, any hints from the metadata provider about how the attribute is best stored (say, in a relational database table) are ignored by the Data Manager Library, which instead relies entirely on the 'measurementScale' content for this purpose.</p>
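<p>As a rough illustration of what recording 'storageType' during parsing could look like, the sketch below reads every storageType element of one attribute alongside its measurementScale. It is written in Python with the standard library only and is not the Data Manager Library's parser; the function name, the returned dictionary layout, and the abbreviated EML fragment are all assumptions made for this example.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated EML attribute modeled on the case described in
# this ticket (storageType 'integer', measurementScale 'dateTime').
ATTRIBUTE_XML = """
<attribute id="att.1">
  <attributeName>year</attributeName>
  <storageType>integer</storageType>
  <measurementScale>
    <dateTime>
      <formatString>YYYY</formatString>
    </dateTime>
  </measurementScale>
</attribute>
"""


def parse_attribute(xml_text):
    """Return the attribute name, all storageType hints, and the scale kind."""
    node = ET.fromstring(xml_text)

    # storageType is repeatable, so collect every occurrence in document order.
    storage_types = [
        {"type": (st.text or "").strip(), "typeSystem": st.get("typeSystem")}
        for st in node.findall("storageType")
    ]

    # The kind of measurement scale is the tag of the first child element.
    scale = node.find("measurementScale")
    scale_kind = scale[0].tag if scale is not None and len(scale) else None

    return {
        "name": node.findtext("attributeName"),
        "storageTypes": storage_types,
        "measurementScale": scale_kind,
    }


parsed = parse_attribute(ATTRIBUTE_XML)
```

<p>Keeping the hints as a list preserves their order, which leaves room for a later step to decide among several storageType values.</p>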
<p>To cite a specific example of how 'storageType' content can be helpful, the document knb-lter-gce.1.9 (<a class="external" href="http://metacat.lternet.edu/knb/metacat/knb-lter-gce.1.9">http://metacat.lternet.edu/knb/metacat/knb-lter-gce.1.9</a>) contains three attributes for year, month, and day, respectively. Each of the attributes has storageType set to 'integer' and measurementScale set to 'dateTime'. When loading the data table into a relational database, the Data Manager Library sets the corresponding database fields to type 'timestamp' (in Postgres), having no knowledge that the storage type "hint" was to set the fields to type integer ('int4' in Postgres). The result is that in the original data table entity, the fields appear like this:</p>
<p>2000 8 26</p>
<p>while in the relational database, they appear like this:</p>
<pre><code>        year         |         month          |          day           <br />---------------------+------------------------+------------------------<br /> 2000-01-01 00:00:00 | 0001-08-01 00:00:00 BC | 0001-01-26 00:00:00 BC</code></pre>
<p>It's clear that in this particular case, the Data Manager Library could have used the storageType hint to select a more appropriate data type for these attributes.</p>
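<p>One possible shape for such a decision is sketched below in Python. Only two mappings come from this ticket ('integer' to 'int4', and 'dateTime' to 'timestamp'); every other table entry, the function name, and the "first recognized hint wins" rule are assumptions chosen for illustration, not the library's behavior.</p>

```python
# Hypothetical lookup tables. Only 'integer' -> 'int4' and
# 'dateTime' -> 'timestamp' are taken from the ticket; the rest are guesses.
STORAGE_TYPE_TO_SQL = {
    "integer": "int4",
    "float": "float4",
    "double": "float8",
    "string": "text",
}

MEASUREMENT_SCALE_TO_SQL = {
    "dateTime": "timestamp",
    "nominal": "text",
    "ordinal": "text",
    "interval": "float8",
    "ratio": "float8",
}


def choose_sql_type(measurement_scale, storage_types):
    """Prefer the first recognized storageType hint, else fall back to the scale."""
    for hint in storage_types:
        sql_type = STORAGE_TYPE_TO_SQL.get(hint.strip().lower())
        if sql_type is not None:
            return sql_type
    # No usable hint: behave as before and rely on measurementScale alone.
    return MEASUREMENT_SCALE_TO_SQL.get(measurement_scale, "text")


with_hint = choose_sql_type("dateTime", ["integer"])
without_hint = choose_sql_type("dateTime", [])
```

<p>With the hint present, the year, month, and day attributes from knb-lter-gce.1.9 would map to 'int4'; without it, the fallback reproduces the 'timestamp' choice described above.</p>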
<p>The goal of this task is to:</p>
<p>1. Enhance the EML parsing phase of the Data Manager Library, so that it parses and stores all storageType elements that are provided for an attribute.</p>
<p>2. Enhance the data loading phase of the Data Manager Library, so that it uses storageType content, if provided, to make a more informed decision about which data type to define for the attribute. This may involve the need for heuristics to determine which data type is most appropriate under a given set of circumstances, particularly in cases where more than one storageType element is provided for an attribute.</p>
Bug #2702 (New): Data Manager Library: Support for online URL references | https://projects.ecoinformatics.org/ecoinfo/issues/2702 | 2006-12-15T16:44:49Z | Duane Costa (dcosta@lternet.edu)
<p>Next release. Again, this will be rare. Not much to be gained from a URL reference.</p>
<p>Matt</p>
<p>Duane Costa wrote:</p>
<blockquote>
<p>Matt, Mark:</p>
<p>Do you think that handling references to online URLs should be a requirement for the first release of the Data Manager Library (1.0.0), or recorded as an enhancement for the next release (1.1.0)?</p>
<p>Thanks,<br />Duane</p>
<blockquote>
<p>-----Original Message-----<br />From: Jing Tao [mailto:<a class="email" href="mailto:tao@nceas.ucsb.edu">tao@nceas.ucsb.edu</a>]<br />Sent: Wednesday, December 13, 2006 9:06 PM<br />To: Duane Costa<br />Cc: 'inigo san gil'; 'Mark Servilla'<br />Subject: RE: In-line data</p>
<p>Hi, Duane:</p>
<p>Yeah, the current EML parser can't handle references for online URLs. It can handle references for attributeList and attribute. We can add support for online URL references as a new feature in our Data Manager Library.</p>
<p>Thanks,</p>
<p>Jing</p>
<p>Jing Tao<br />National Center for Ecological<br />Analysis and Synthesis (NCEAS)<br />735 State St. Suite 204<br />Santa Barbara, CA 93101</p>
<p>On Wed, 13 Dec 2006, Duane Costa wrote:</p>
<blockquote>
<p>Date: Wed, 13 Dec 2006 15:37:27 -0700<br />From: Duane Costa &lt;<a class="email" href="mailto:dcosta@lternet.edu">dcosta@lternet.edu</a>&gt;<br />To: 'Jing Tao' &lt;<a class="email" href="mailto:tao@nceas.ucsb.edu">tao@nceas.ucsb.edu</a>&gt;<br />Cc: 'inigo san gil' &lt;<a class="email" href="mailto:isangil@lternet.edu">isangil@lternet.edu</a>&gt;,<br />'Mark Servilla' &lt;<a class="email" href="mailto:servilla@lternet.edu">servilla@lternet.edu</a>&gt;<br />Subject: RE: In-line data</p>
<p>Hi Jing,</p>
<p>Inigo and I have looked into the second issue below a little more (the question about FTP protocol). The problem was not the FTP protocol -- we changed to HTTP and the Data Manager library had the same problem downloading the data. The problem is that the metadata is using a reference to the URL to the data like this:</p>
<p>&lt;dataTable&gt;<br />.<br />.<br />.<br />&lt;distribution&gt;<br />&lt;references&gt;distributionReference&lt;/references&gt;<br />&lt;/distribution&gt;</p>
<p>In another part of the EML, we have:</p>
<p>&lt;distribution id="distributionReference"&gt;<br />&lt;online&gt;<br />&lt;url&gt;<br /><a class="external" href="http://lternet.lternet.edu/~isangil/NIN/nin_met_1982.txt">http://lternet.lternet.edu/~isangil/NIN/nin_met_1982.txt</a><br />&lt;/url&gt;<br />&lt;/online&gt;<br />&lt;/distribution&gt;</p>
<p>Because of the reference, Data Manager has no value for the entity identifier, and the download handler is not able to download the data. So it seems that this is a legal EML document but the EML parser is not able to follow the reference to the URL for the data.</p>
<p>Here is a link to the document that is having the problem:</p>
<p><a class="external" href="http://lternet.lternet.edu/~isangil/NIN/nin_lter_met_1982.xml">http://lternet.lternet.edu/~isangil/NIN/nin_lter_met_1982.xml</a></p>
<p>Could you take a look?</p>
<p>Thanks,<br />Duane</p>
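<p>Editorial sketch, not part of the original message: following a 'references' element to the 'distribution' that carries the matching id could look roughly like the Python below. The function name is hypothetical, and the EML fragment is a simplification of the snippets quoted in this message (the 'dataset' wrapper and the flattened nesting are assumptions); it is not how the Data Manager Library's parser is implemented.</p>

```python
import xml.etree.ElementTree as ET

# Simplified fragment built from the snippets quoted in the message above.
EML_FRAGMENT = """
<dataset>
  <distribution id="distributionReference">
    <online>
      <url>http://lternet.lternet.edu/~isangil/NIN/nin_met_1982.txt</url>
    </online>
  </distribution>
  <dataTable>
    <distribution>
      <references>distributionReference</references>
    </distribution>
  </dataTable>
</dataset>
"""


def resolve_distribution_url(root, data_table):
    """Return the online URL for a dataTable, following a reference if present."""
    # Take the first distribution found anywhere under the dataTable.
    distribution = next(data_table.iter("distribution"), None)
    if distribution is None:
        return None

    ref_id = distribution.findtext("references")
    if ref_id:
        # The element only points elsewhere: look up the distribution by id.
        ref_id = ref_id.strip()
        distribution = next(
            (d for d in root.iter("distribution") if d.get("id") == ref_id),
            None,
        )
        if distribution is None:
            return None  # dangling reference

    url = distribution.findtext("online/url")
    return url.strip() if url else None


root = ET.fromstring(EML_FRAGMENT)
resolved_url = resolve_distribution_url(root, root.find("dataTable"))
```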
</blockquote></blockquote></blockquote>
Bug #2701 (New): Data Manager Library: Support for inline data | https://projects.ecoinformatics.org/ecoinfo/issues/2701 | 2006-12-15T16:42:45Z | Duane Costa (dcosta@lternet.edu)
<p>Wait for the next release -- as far as I know there is very little or no inline data out there in the KNB collection.</p>
<p>Matt</p>
<p>Duane Costa wrote:</p>
<blockquote>
<p>Matt, Mark:</p>
<p>Do you think that handling inline data should be a priority for release 1.0.0 of the Data Manager Library, or something that should be recorded in Bugzilla as an enhancement for the next release, 1.1.0?</p>
<p>Thanks,<br />Duane</p>
<blockquote>
<p>-----Original Message-----<br />From: Jing Tao [mailto:<a class="email" href="mailto:tao@nceas.ucsb.edu">tao@nceas.ucsb.edu</a>]<br />Sent: Wednesday, December 13, 2006 8:59 PM<br />To: Duane Costa<br />Subject: Re: In-line data</p>
<p>Hi, Duane:</p>
<p>Our datamanager couldn't handle inline data so far. Do you think this feature has very high priority?</p>
</blockquote>
<p>.<br />.<br />.</p>
<blockquote>
<p>Jing</p>
<p>Jing Tao<br />National Center for Ecological<br />Analysis and Synthesis (NCEAS)<br />735 State St. Suite 204<br />Santa Barbara, CA 93101</p>
<p>On Wed, 13 Dec 2006, Duane Costa wrote:</p>
<blockquote>
<p>Date: Wed, 13 Dec 2006 12:20:05 -0700<br />From: Duane Costa &lt;<a class="email" href="mailto:dcosta@lternet.edu">dcosta@lternet.edu</a>&gt;<br />To: 'Jing Tao' &lt;<a class="email" href="mailto:tao@nceas.ucsb.edu">tao@nceas.ucsb.edu</a>&gt;<br />Subject: In-line data</p>
<p>Hi Jing,</p>
<p>We have some metadata that contains &lt;inline&gt; tags to the data. Is the Data Manager download handler able to use this to download the data?</p>
</blockquote></blockquote>
<p>.<br />.<br />.</p>
<blockquote><blockquote>
<p>Thanks,<br />Duane</p>
</blockquote></blockquote></blockquote>
Bug #2674 (In Progress): Data Manager Library: Set database table life-span priority | https://projects.ecoinformatics.org/ecoinfo/issues/2674 | 2006-11-21T20:36:35Z | Duane Costa (dcosta@lternet.edu)
<p>Provide an API for the Calling Application to set a database table life-span priority on specific database tables.</p>
<p>When the upper limit on the database size is reached (see Bug <a class="issue tracker-1 status-2 priority-2 priority-default" title="Bug: Data Manager Library: Set upper limit on database size (In Progress)" href="https://projects.ecoinformatics.org/ecoinfo/issues/2673">#2673</a>), the Data Manager Library will free up space by reducing the number of cached data tables in the database based on a "least used" removal algorithm. However, the Calling Application should be able to protect specific tables from removal by setting them as high priority. This is a boolean setting: a table is either protected from removal or it is not.</p>
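<p>A minimal sketch of this removal rule, in Python, is given below. It reads "least used" as "least recently used", which is an assumption, as are the class name, its fields, and the function name; none of them are part of the Data Manager Library API.</p>

```python
from dataclasses import dataclass


@dataclass
class CachedTable:
    """Hypothetical bookkeeping record for one cached data table."""
    name: str
    size_bytes: int
    last_used: float          # timestamp of the most recent access
    protected: bool = False   # True means high priority: never removed


def tables_to_drop(tables, limit_bytes):
    """Pick unprotected tables, least recently used first, until the total fits."""
    total = sum(t.size_bytes for t in tables)
    victims = []
    for table in sorted(tables, key=lambda t: t.last_used):
        if total <= limit_bytes:
            break
        if table.protected:
            continue  # the Calling Application asked to keep this table
        victims.append(table.name)
        total -= table.size_bytes
    return victims


cache = [
    CachedTable("a", 40, last_used=1.0, protected=True),
    CachedTable("b", 30, last_used=2.0),
    CachedTable("c", 30, last_used=3.0),
]
to_drop = tables_to_drop(cache, limit_bytes=75)
```

<p>In this example table "a" is the oldest but is protected, so "b" is selected instead. If every remaining table is protected, the function returns what it can and the total may stay above the limit; how to handle that case is left open here.</p>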
<p>This task supports Use Case #6 in the Data Manager Library UML documentation.</p>
Bug #2673 (In Progress): Data Manager Library: Set upper limit on database size | https://projects.ecoinformatics.org/ecoinfo/issues/2673 | 2006-11-21T20:25:42Z | Duane Costa (dcosta@lternet.edu)
<p>Provide a means for the Calling Application to set an upper limit on the database size to prevent overloading the database. The table monitor component of the library must abide by this limit, and must include routines to drop tables when the limit is reached.</p>
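<p>To make the idea concrete, the sketch below enforces a size limit against an in-memory SQLite database using Python's standard sqlite3 module. SQLite stands in for the real database purely for illustration; the function names, the table names, and the "drop oldest table first" order are assumptions, not the table monitor's actual design.</p>

```python
import sqlite3


def used_bytes(conn):
    """Bytes held by live pages: (page_count - freelist_count) * page_size."""
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return (page_count - free_pages) * page_size


def enforce_limit(conn, limit_bytes):
    """Drop tables in creation order until usage is within limit_bytes."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY rowid"
        )
    ]
    dropped = []
    for name in names:
        if used_bytes(conn) <= limit_bytes:
            break
        conn.execute(f'DROP TABLE "{name}"')
        conn.commit()
        dropped.append(name)
    return dropped


# Demo: three equally sized cached tables, then a limit below current usage.
conn = sqlite3.connect(":memory:")
for i in range(3):
    conn.execute(f"CREATE TABLE cache_{i} (payload TEXT)")
    conn.executemany(
        f"INSERT INTO cache_{i} VALUES (?)", [("x" * 100,)] * 2000
    )
conn.commit()

limit = int(used_bytes(conn) * 0.6)
dropped = enforce_limit(conn, limit)
```

<p>The limit is checked before each drop, so the routine stops as soon as usage fits and never removes more tables than needed.</p>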
<p>This task supports Use Case #5 in the Data Manager Library UML documentation.</p>