EML: Issueshttps://projects.ecoinformatics.org/ecoinfo/https://projects.ecoinformatics.org/ecoinfo/ecoinfo/favicon.ico?14691340362013-09-06T18:12:45ZEcoinformatics Redmine
Redmine Feature #6079 (New): Support JSON or XML output from emlparserhttps://projects.ecoinformatics.org/ecoinfo/issues/60792013-09-06T18:12:45Zben leinfelderleinfelder@nceas.ucsb.edu
<p>The online parser servlet returns HTML, but there has been a request to support alternate output formats for programmatic interactions.</p>
<p>Matt's proposed schema<br /><pre>
<!ELEMENT response (validation+)>
<!ELEMENT validation (message*)>
<!ATTLIST validation type CDATA #REQUIRED>
<!ATTLIST validation status (passed | failed) #REQUIRED>
<!ELEMENT message (#PCDATA)>
</pre></p>
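<p>A client might consume a response shaped like this schema roughly as follows. This is only a sketch: the sample document and the <code>summarize</code> helper are illustrative assumptions, not part of the emlparser servlet.</p>

```python
import xml.etree.ElementTree as ET

# Illustrative response document that follows the proposed schema.
SAMPLE = """<response>
  <validation type="emlparse" status="failed">
    <message>Missing key for reference to node "154A12"</message>
  </validation>
  <validation type="saxparse" status="passed" />
</response>"""


def summarize(xml_text):
    """Return {validation type: (status, [messages])} for a response document."""
    root = ET.fromstring(xml_text)
    result = {}
    for validation in root.findall("validation"):
        messages = [m.text or "" for m in validation.findall("message")]
        result[validation.get("type")] = (validation.get("status"), messages)
    return result


summary = summarize(SAMPLE)
for kind, (status, messages) in summary.items():
    print(kind, status, messages)
```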
<p>and example:</p>
<pre>
<response>
<validation type="emlparse" status="failed">
<message>Missing key for reference to node "154A12"</message>
<message>Missing key for reference to node "26A467"</message>
</validation>
<validation type="saxparse" status="passed" />
</response>
</pre> Bug #5475 (New): Make the data manager handle multiple physical representations in an entityhttps://projects.ecoinformatics.org/ecoinfo/issues/54752011-08-19T16:23:30ZJing Taotao@nceas.ucsb.edu
<p>I entered a bug for the EML actor in Kepler:<br /><a class="external" href="http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5474">http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5474</a></p>
<p>I guess the data manager has the same issue.</p> Bug #5427 (New): round-trip encoding of missing values uploaded then queried from a db table is losthttps://projects.ecoinformatics.org/ecoinfo/issues/54272011-06-23T21:37:02Zgastil gastilmarygastil@yahoo.com
<p>Round-trip encoding of missing values in EML datasets uploaded to and queried from a database table is lost with the current version of the DML.</p>
<p>Short link to this doc is <a class="external" href="http://goo.gl/2mq9T">http://goo.gl/2mq9T</a></p>
<p>Might pertain to:<br /> EML Data Manager Library<br /> PASTA workflows<br /> EML-parsed data delivered from a DML-loaded database</p>
<p>Might be a feature request for a future iteration,... Far in the future!</p>
<p>This is about handling missing values in data tables, whether they are stored in a database table as codes or as nulls, and how they are then coalesce()-ed in a VIEW of that table.</p>
<p>Background:<br />EML allows multiple missing value codes for the same data table column, as there may be more than one reason for missing a value. This is good. The EML Data Manager Library (DML) compares missing value codes with a string comparison, not a numeric comparison. So -9999.0 does not match -9999. A numeric column may have a text missing value code such as NaN or na in a column of float type.</p>
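<p>The difference between the two comparison styles can be sketched as follows. The function names and the code list are hypothetical; this is not the DML's Java implementation, only an illustration of why -9999.0 fails to match -9999 under a literal string comparison.</p>

```python
# Hypothetical missing value codes declared for one numeric column.
CODES = ["-9999", "NaN", "na"]


def matches_as_string(value, codes):
    """Literal string comparison, as described for the DML."""
    return value in codes


def matches_numerically(value, codes):
    """Literal match first, then a numeric match where both sides parse."""
    if value in codes:
        return True
    try:
        number = float(value)
    except ValueError:
        return False
    for code in codes:
        try:
            if float(code) == number:
                return True
        except ValueError:
            continue  # text codes such as "na" cannot match numerically
    return False


print(matches_as_string("-9999.0", CODES))
print(matches_numerically("-9999.0", CODES))
```

<p>The literal check runs first so that a text code such as NaN still matches itself, since a floating point NaN never compares equal to anything.</p>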
<p>Issue:<br />When the data is inserted into the database table, obviously the non-numeric string missing value codes cannot be inserted literally. They are inserted as nulls. I looked at that part of the DML code. Any datum which matches one of the missing value codes for its column gets inserted as a null.*</p>
<p>Since every missing value code is collapsed into a simple null, the original information about which kind of missing value code it was is lost.</p>
<p>When querying that data table, either a VIEW specifically written for that table or the code constructing that query could use the EML to assign a missing value code to nulls using coalesce(), but only if there were only one missing value code per column. Where multiple codes exist, it would be wrong to just arbitrarily assign the first-listed code to all nulls of a column.</p>
<p>Proposed Solution:<br />Alternatively, the DML could store missing value codes, assigning numeric codes to replace non-numeric codes where necessary (a tricky feat since it implies knowledge of the range of valid values, which may not be specified in the EML.) Then a corresponding query would have to be stored as a VIEW, with a CASE wrapping that column to translate back to the original codes.</p>
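<p>A minimal sketch of this idea, using Python's sqlite3 module instead of the DML and Postgres. The table name, the column name, and the sentinel values are assumptions made for illustration; in particular, the sketch assumes the sentinels fall outside the valid data range, which is exactly the tricky part noted above.</p>

```python
import sqlite3

# Hypothetical mapping from the original EML missing value codes to numeric
# stand-ins; it assumes these sentinels lie outside the valid data range.
CODE_TO_SENTINEL = {"NaN": -9999.0, "na": -8888.0}


def build_decoding_view(table, column, code_to_sentinel):
    """Build CREATE VIEW SQL whose CASE maps sentinels back to the codes."""
    whens = " ".join(
        "WHEN {!r} THEN '{}'".format(sentinel, code.replace("'", "''"))
        for code, sentinel in code_to_sentinel.items()
    )
    return (
        f"CREATE VIEW {table}_decoded AS "
        f"SELECT id, CASE {column} {whens} "
        f"ELSE CAST({column} AS TEXT) END AS {column} FROM {table}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (id INTEGER PRIMARY KEY, temp REAL)")

# Raw values as they might appear in the data file.
raw_values = ["12.5", "NaN", "na", "13.0"]
encoded = [
    (CODE_TO_SENTINEL[v] if v in CODE_TO_SENTINEL else float(v),)
    for v in raw_values
]
conn.executemany("INSERT INTO obs (temp) VALUES (?)", encoded)

conn.execute(build_decoding_view("obs", "temp", CODE_TO_SENTINEL))
decoded = [row[0] for row in conn.execute("SELECT temp FROM obs_decoded ORDER BY id")]
print(decoded)
```

<p>Because each code keeps its own sentinel, the view can restore "NaN" and "na" separately, which a single coalesce() over nulls cannot do.</p>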
<p>Details:<br />*Notes relating to actual java code are below.</p>
<p>In the DML DatabaseAdapter class, the method generateInsertSQL() takes three inputs: the attributeList, the tableName, and oneRowData. Each attribute value is compared as a literal string to the possible missing value codes for that attribute using the private method issMissingValue on line 523.<br />In generateInsertSQL(), at line 267, if a value is a missing value, the code skips to the next attribute in the list. The insert statement then does not upload that column, which is equivalent to inserting a null into that column. Since the DML does not seem to put any NOT NULL constraints (or any constraints at all) on any columns, that in itself does not generate an error. It does, however, mean that missing value codes are not stored in the database table, so if there are multiple codes for a column, that information is lost.</p>
<p>This was looking at svn revision number 2195 of Duane's branch<br /><a class="external" href="https://code.ecoinformatics.org/code/eml/branches/DATAMANAGER_QUALITY/src/org/ecoinformatics/datamanager/database/DatabaseAdapter.java">https://code.ecoinformatics.org/code/eml/branches/DATAMANAGER_QUALITY/src/org/ecoinformatics/datamanager/database/DatabaseAdapter.java</a></p>
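<p>The behavior described in these notes can be paraphrased in a few lines. This is a Python sketch of the logic as described, not a translation of the Java source; the function signature and the shape of the attribute list are assumptions.</p>

```python
def generate_insert(table, attributes, row):
    """Build a parameterized INSERT that omits columns holding a missing value.

    attributes: list of (column name, list of missing value codes)
    row: list of raw string values, one per attribute
    """
    columns, params = [], []
    for (name, missing_codes), value in zip(attributes, row):
        if value in missing_codes:  # literal string comparison
            continue  # column omitted, so the database stores NULL
        columns.append(name)
        params.append(value)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, params


# Hypothetical attribute list: "temp" declares two missing value codes.
ATTRIBUTES = [("site", []), ("temp", ["NaN", "-9999"])]

skipped_sql, skipped_params = generate_insert("obs", ATTRIBUTES, ["A", "NaN"])
kept_sql, kept_params = generate_insert("obs", ATTRIBUTES, ["B", "-9999.0"])
print(skipped_sql, skipped_params)
print(kept_sql, kept_params)
```

<p>In the first call the code "NaN" is dropped and nothing records which code was present. In the second call "-9999.0" is loaded as ordinary data because it does not literally equal "-9999". The sketch does not handle a row in which every value is missing.</p>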
<p>on 23June2011</p> Bug #5308 (In Progress): Data Manager Library: storageType content should be stored and usedhttps://projects.ecoinformatics.org/ecoinfo/issues/53082011-02-15T15:57:28ZDuane Costadcosta@lternet.edu
<p>'storageType' is an optional, repeatable element within the EML 'attribute' element. In addition to the documentation available in the EML normative documents, several old bug tickets describe the rationale behind this element: <a class="issue tracker-1 status-3 priority-5 priority-highest closed" title="Bug: eml-attribute changes needed (Resolved)" href="https://projects.ecoinformatics.org/ecoinfo/issues/484">#484</a>, <a class="issue tracker-1 status-3 priority-2 priority-default closed" title="Bug: issues about storageType and attributeDomain (Resolved)" href="https://projects.ecoinformatics.org/ecoinfo/issues/544">#544</a>, <a class="issue tracker-1 status-3 priority-2 priority-default closed" title="Bug: storageType is repeatable in eml-attribute (Resolved)" href="https://projects.ecoinformatics.org/ecoinfo/issues/599">#599</a>.</p>
<p>When the Data Manager Library parses EML attributes, it does not record any 'storageType' content that may be present. This means that the hints that may have been provided by the metadata provider pertaining to how the attribute should be stored optimally (say, in a relational database table), are completely ignored by the Data Manager Library, which instead relies entirely on the 'measurementScale' content for this purpose.</p>
<p>To cite a specific example of how 'storageType' content can be helpful, the document knb-lter-gce.1.9 (<a class="external" href="http://metacat.lternet.edu/knb/metacat/knb-lter-gce.1.9">http://metacat.lternet.edu/knb/metacat/knb-lter-gce.1.9</a>) contains three attributes for year, month, and day, respectively. Each of the attributes has storageType set to 'integer' and measurementScale set to 'dateTime'. When loading the data table into a relational database, the Data Manager Library sets the corresponding database fields to type 'timestamp' (in Postgres), having no knowledge that the storage type "hint" was to set the fields to type integer ('int4' in Postgres). The result is that in the original data table entity, the fields appear like this:</p>
<p>2000 8 26</p>
<p>while in the relational database, they appear like this:</p>
<pre><code>        year         |         month          |          day
---------------------+------------------------+------------------------
 2000-01-01 00:00:00 | 0001-08-01 00:00:00 BC | 0001-01-26 00:00:00 BC</code></pre>
<p>It's clear that in this particular case, the Data Manager Library could have used the storageType hint to select a more appropriate data type for these attributes.</p>
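<p>One possible heuristic for using the hint is sketched here. Only the 'integer' to 'int4' and 'dateTime' to 'timestamp' pairings come from this ticket; every other entry in the two lookup tables, and the function itself, is an assumption for illustration.</p>

```python
# Hypothetical lookup tables. Only "integer" -> "int4" and
# "dateTime" -> "timestamp" are taken from the ticket text.
STORAGE_TYPE_HINTS = {
    "integer": "int4",
    "float": "float8",
    "string": "text",
}
MEASUREMENT_SCALE_DEFAULTS = {
    "dateTime": "timestamp",
    "ratio": "float8",
    "interval": "float8",
    "nominal": "text",
    "ordinal": "text",
}


def choose_column_type(measurement_scale, storage_types):
    """Prefer the first recognized storageType hint, else fall back to the scale."""
    for hint in storage_types:
        column_type = STORAGE_TYPE_HINTS.get(hint.strip().lower())
        if column_type is not None:
            return column_type
    return MEASUREMENT_SCALE_DEFAULTS.get(measurement_scale, "text")


# The year/month/day attributes of knb-lter-gce.1.9 as described above.
print(choose_column_type("dateTime", ["integer"]))
print(choose_column_type("dateTime", []))
```

<p>Taking the first recognized hint is the simplest rule when storageType is repeated; a real implementation might instead rank hints by the target database system.</p>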
<p>The goal of this task is to:</p>
<p>1. Enhance the EML parsing phase of the Data Manager Library, so that it parses and stores all storageType elements that are provided for an attribute.</p>
<p>2. Enhance the data loading phase of the Data Manager Library, so that it uses storageType content, if provided, to make a more informed decision about which data type to define for the attribute. This may involve the need for heuristics to determine which data type is most appropriate under a given set of circumstances, particularly in cases where more than one storageType element is provided for an attribute.</p> Bug #4393 (New): Use datamanager for EML QA/QChttps://projects.ecoinformatics.org/ecoinfo/issues/43932009-09-17T20:24:06Zben leinfelderleinfelder@nceas.ucsb.edu
<p>As discussed at the LTER meeting this year:<br />------------<br />Work Group: Metrics and reports for EML data package quality<br />The EML data manager library (contributors: Costa, Tao, Leinfelder, Servilla) was created to parse EML metadata documents and insert the described data entity into a relational database. Our experience using the library with data packages contributed to the LTER NIS indicates that a large fraction do not have metadata of sufficient quality for the data to be used in this way. The primary contribution from LTER sites to the NIS is data sets, which are intended to be used in cross-site synthesis projects. Clearly, for cross-site synthesis to make use of the NIS a certain minimum level of metadata and data quality is required.<br />The goals for this group:<br />1. establish a set of metrics for LTER EML data package quality,<br />2. recommend content for a report to be produced by the EML data manager library, and<br />3. consider implementation strategies, e.g. should the report be another choice on the EML parser page? a shell script similar to that included with the EML parser?</p>
<p>The quality reports can be used to<br />1. inform the dataset contributor about the content of the data package, and indicate whether data are of sufficient quality to be machine-readable. Our data catalog (metacat) has no quality standards beyond basic XML and EML compliance, so a data package that fails these quality metrics can still be uploaded or harvested, although its usefulness is limited.<br />2. in the LTER context, reports can produce a list of failure modes for LTER metadata and data entities. Such a list could provide input for the design of specific tools for data providers, or help identify gaps in a site's IM system. A site requesting supplemental funding for its IMS could use the reports as part of the proposal justification.</p>
<p>As a starting point for our discussion, I have started a flowchart based on my own experience with the data manager library and SBC's EML data packages.</p>
<p>Here is the current membership (on this cc list, and present in Estes Park):<br />Margaret O'Brien, SBC<br />Emery Boose, HFR<br />Dan Bahauddin, CDR<br />James Brunt, LNO<br />Mark Servilla, LNO<br />Duane Costa, LNO<br />Mark Shildhauer, NCEAS<br />Ben Leinfelder, NCEAS</p> Bug #3181 (New): xs:string to ComplexType TextType, minOccurs=0, judiciously appliedhttps://projects.ecoinformatics.org/ecoinfo/issues/31812008-03-21T23:13:51ZMargaret O'Brienmob@msi.ucsb.edu
<p>This is a summary of a recent discussion on eml-dev which does not appear to have been entered in bugzilla.<br />Several people have expressed a need for additional structure in leaf nodes that are currently designated xs:string, generally to accommodate formatting for species binomials, chemical notation and lists. Examples include <title>, <method>, and <protocol>.</p>
<p>One solution is to change these from xs:string to txt:TextType. Since TextType is mixed content, it will not affect existing documents containing strings. The nodes to apply this change should be agreed on by this group, and this is not meant to be a work-around for eml which needs enhancement. Database implementations will need to correctly interpret the data typing when searching these elements. For more info on TextType, see bug 2703, and the docbook schema (<a class="external" href="http://www.docbook.org/specs/">http://www.docbook.org/specs/</a>).</p>
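<p>For search indexing, an implementation could flatten mixed content to plain text so that string and TextType titles compare equal. The sketch below uses an illustrative title and a hypothetical helper; it is not taken from any existing EML tool.</p>

```python
import xml.etree.ElementTree as ET

# Illustrative fragments: the same title as a plain string and as mixed content.
PLAIN = (
    "<title>Uptake of nitrogen by Alnus tenuifolia and Alnus crispa "
    "in six different successional habitats</title>"
)
MIXED = (
    "<title>Uptake of nitrogen by\n"
    "  <emphasis>Alnus tenuifolia</emphasis> and\n"
    "  <emphasis>Alnus crispa</emphasis>\n"
    "  in six different successional habitats</title>"
)


def searchable_text(xml_fragment):
    """Concatenate all text nodes and collapse runs of whitespace."""
    element = ET.fromstring(xml_fragment)
    return " ".join("".join(element.itertext()).split())


print(searchable_text(MIXED))
```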
<p>EML 2.0.1 title element:<br /><xs:element name="title" type="xs:string" maxOccurs="unbounded"></p>
<p>EML 2.0.2 proposed title element:<br /><xs:element name="title" type="txt:TextType" maxOccurs="unbounded"></p>
<p>Either of these is valid:<br /><eml><br /> <dataset><br /> <title>Uptake of nitrogen by Alnus tenuifolia and Alnus crispa in six different successional habitats</title><br /> ...<br /> </dataset><br /></eml></p>
<p><eml><br /> <dataset><br /> <title>Uptake of nitrogen by<br /> <emphasis>Alnus tenuifolia</emphasis> and<br /> <emphasis>Alnus crispa</emphasis><br /> in six different successional habitats</title><br /> ...<br /> </dataset><br /></eml></p> Bug #2758 (New): datamanager does not respect precision on nominal day attributeshttps://projects.ecoinformatics.org/ecoinfo/issues/27582007-02-05T20:46:36ZChad Burtcburt@msi.ucsb.edu
<p>When importing a data table with many floats set to a precision of 0.0001 and a unit of nominal day, the library was rounding to 1 or 3 decimal places.</p>
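<p>As a sketch of the expected behavior, the number of decimal places to keep can be derived from the declared precision. This assumes the precision arrives as a decimal string such as "0.0001"; the helper names and the sample value are hypothetical and unrelated to the library's code.</p>

```python
from decimal import Decimal


def decimal_places(precision):
    """Number of digits after the decimal point implied by a precision string."""
    exponent = Decimal(precision).normalize().as_tuple().exponent
    return max(0, -exponent)


def format_value(value, precision):
    """Round a decimal string to the declared precision without using floats."""
    return str(Decimal(value).quantize(Decimal(precision)))


print(decimal_places("0.0001"))
print(format_value("733000.12345678", "0.0001"))
```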
<p>The affected attributes were matlab_datenum, and Decimal time<br />The dataset was : knb-lter-sbc.2003.2 :: arroyoburro_mooring_arb.txt</p> Bug #2756 (New): Single quote characters from data are not escaped when performing insertshttps://projects.ecoinformatics.org/ecoinfo/issues/27562007-02-01T23:38:17ZChad Burtcburt@msi.ucsb.edu
<p>Received this error:<br />DatabaseLoader.run(): Error message: ERROR: syntax error at or near "only" <br />regarding this line:<br />INSERT into ... calm','"adrift; CTD dropped to 100' only; slight breeze"'</p>
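<p>A comparable failure, and one way to avoid it, can be sketched with Python's sqlite3 module instead of the library's Java and Postgres code. The table and column names are assumptions; the data value is the one from the error above.</p>

```python
import sqlite3

# The second value contains a single quote (100').
row = ("calm", '"adrift; CTD dropped to 100\' only; slight breeze"')

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (sea_state TEXT, notes TEXT)")

# Naive approach: splice the raw values into the statement text.
naive_sql = "INSERT INTO log VALUES ('%s','%s')" % row
try:
    conn.execute(naive_sql)
    naive_failed = False
except sqlite3.Error:
    # The quote after 100 ends the string early, so "only" is read as SQL.
    naive_failed = True

# Safer approach: bind the values as parameters so no escaping is needed.
conn.execute("INSERT INTO log VALUES (?, ?)", row)
stored = conn.execute("SELECT notes FROM log").fetchone()[0]
print(naive_failed, stored == row[1])
```

<p>Where parameter binding is not available, doubling each single quote inside the value is the standard SQL way to escape it.</p>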
<p>It seems that if a single quote is present within the data being entered it is not escaped. On this line "only" is seen as a postgres command since "100'" came before it.</p> Bug #2702 (New): Data Manager Library: Support for online URL referenceshttps://projects.ecoinformatics.org/ecoinfo/issues/27022006-12-15T16:44:49ZDuane Costadcosta@lternet.edu
<p>Next release. Again, this will be rare. Not much to be gained from a URL reference.</p>
<p>Matt</p>
<p>Duane Costa wrote:</p>
<blockquote>
<p>Matt, Mark:</p>
<p>Do you think that handling references to online URLs should be a <br />requirement for the first release of the Data Manager Library (1.0.0), or recorded as an enhancement for the next release (1.1.0)?</p>
<p>Thanks,<br />Duane</p>
<blockquote>
<p>-----Original Message-----<br />From: Jing Tao [mailto:<a class="email" href="mailto:tao@nceas.ucsb.edu">tao@nceas.ucsb.edu</a>]<br />Sent: Wednesday, December 13, 2006 9:06 PM<br />To: Duane Costa<br />Cc: 'inigo san gil'; 'Mark Servilla'<br />Subject: RE: In-line data</p>
<p>Hi, Duane:</p>
<p>Yeah, current eml parser couldn't handle the reference for online url. <br />It can handle reference for attributeList and attribute. We can add <br />supporting online url reference as new feature into our data manager <br />library.</p>
<p>Thanks,</p>
<p>Jing</p>
<p>Jing Tao<br />National Center for Ecological<br />Analysis and Synthesis (NCEAS)<br />735 State St. Suite 204<br />Santa Barbara, CA 93101</p>
<p>On Wed, 13 Dec 2006, Duane Costa wrote:</p>
<blockquote>
<p>Date: Wed, 13 Dec 2006 15:37:27 -0700<br />From: Duane Costa <<a class="email" href="mailto:dcosta@lternet.edu">dcosta@lternet.edu</a>><br />To: 'Jing Tao' <<a class="email" href="mailto:tao@nceas.ucsb.edu">tao@nceas.ucsb.edu</a>><br />Cc: 'inigo san gil' <<a class="email" href="mailto:isangil@lternet.edu">isangil@lternet.edu</a>>,<br />'Mark Servilla' <<a class="email" href="mailto:servilla@lternet.edu">servilla@lternet.edu</a>><br />Subject: RE: In-line data</p>
<p>Hi Jing,</p>
<p>Inigo and I have looked into the second issue below a little more (the question about FTP protocol). The problem was not the FTP protocol -- we changed to HTTP and the Data Manager library had the same problem downloading the data. The problem is that the metadata is using a reference to the URL to the data like this:</p>
<p><dataTable><br />.<br />.<br />.<br /><distribution><br /><references>distributionReference</references><br /></distribution></p>
<p>In another part of the EML, we have:</p>
<p><distribution id="distributionReference"> <online><br /><url><br /><a class="external" href="http://lternet.lternet.edu/~isangil/NIN/nin_met_1982.txt">http://lternet.lternet.edu/~isangil/NIN/nin_met_1982.txt</a><br /></url><br /></online><br /></distribution></p>
<p>Because of the reference, Data Manager has no value for the entity identifier, and the download handler is not able to download the data. So it seems that this is a legal EML document but the EML parser is not able to follow the reference to the URL for the data.</p>
<p>Here is a link to the document that is having the problem:</p>
<p><a class="external" href="http://lternet.lternet.edu/~isangil/NIN/nin_lter_met_1982.xml">http://lternet.lternet.edu/~isangil/NIN/nin_lter_met_1982.xml</a></p>
<p>Could you take a look?</p>
<p>Thanks,<br />Duane</p>
</blockquote></blockquote></blockquote> Bug #2674 (In Progress): Data Manager Library: Set database table life-span priorityhttps://projects.ecoinformatics.org/ecoinfo/issues/26742006-11-21T20:36:35ZDuane Costadcosta@lternet.edu
<p>Provide an API for the Calling Application to set a database table life-span priority on specific database tables.</p>
<p>When the upper limit on the database size is reached (see Bug <a class="issue tracker-1 status-2 priority-2 priority-default" title="Bug: Data Manager Library: Set upper limit on database size (In Progress)" href="https://projects.ecoinformatics.org/ecoinfo/issues/2673">#2673</a>), the Data Manager Library will free up space by reducing the number of cached data tables in the database based on a "least used" removal algorithm. However, the Calling Application should be able to protect specific tables from removal by setting them as high priority. This is a boolean setting, either a table is protected from removal or it isn't.</p>
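<p>The interaction between the size limit and table protection might look roughly like this. The class and method names are hypothetical, and the sketch evicts the least recently used table, which is one possible reading of the "least used" algorithm mentioned above.</p>

```python
from collections import OrderedDict


class TableCache:
    """Tracks cached tables and drops unprotected ones when over the limit."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self._tables = OrderedDict()  # name -> size, least recently used first
        self._protected = set()

    def set_protected(self, name, protected=True):
        """Boolean priority: a table is either protected from removal or not."""
        if protected:
            self._protected.add(name)
        else:
            self._protected.discard(name)

    def touch(self, name):
        """Record a use of the table so it is evicted later."""
        self._tables.move_to_end(name)

    def add(self, name, size_bytes):
        """Register a table, then return the names dropped to respect the limit."""
        self._tables[name] = size_bytes
        self._tables.move_to_end(name)
        return self._evict()

    def names(self):
        return list(self._tables)

    def total_bytes(self):
        return sum(self._tables.values())

    def _evict(self):
        dropped = []
        for name in list(self._tables):
            if self.total_bytes() <= self.max_bytes:
                break
            if name in self._protected:
                continue  # high-priority tables are never dropped
            del self._tables[name]
            dropped.append(name)
        return dropped


cache = TableCache(max_bytes=100)
cache.add("a", 40)
cache.set_protected("a")
cache.add("b", 40)
dropped = cache.add("c", 40)
print(dropped, cache.names())
```

<p>Here table "a" is the oldest entry but survives because it is protected, so "b" is dropped instead. If every remaining table is protected, the cache can stay over the limit, which a real implementation would need to report to the Calling Application.</p>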
<p>This task supports Use Case <a class="issue tracker-1 status-3 priority-2 priority-default closed" title="Bug: MCAT won't build under IRIX with Oracle 8.0.5 (Resolved)" href="https://projects.ecoinformatics.org/ecoinfo/issues/6">#6</a> in the Data Manager Library UML documentation.</p> Bug #2673 (In Progress): Data Manager Library: Set upper limit on database sizehttps://projects.ecoinformatics.org/ecoinfo/issues/26732006-11-21T20:25:42ZDuane Costadcosta@lternet.edu
<p>Provide a means for the Calling Application to set an upper limit on the database size to prevent overloading the database. The table monitor component of the library must abide by the upper limit size constraint, and must include routines to drop tables when size constraints are met.</p>
<p>This task supports Use Case <a class="issue tracker-1 status-5 priority-5 priority-highest closed" title="Bug: mde won't load because of hardcoded image paths (Closed)" href="https://projects.ecoinformatics.org/ecoinfo/issues/5">#5</a> in the Database Manager Library UML documentation.</p> Bug #2578 (In Progress): Data Manager Library: Release and Distributionhttps://projects.ecoinformatics.org/ecoinfo/issues/25782006-10-27T20:28:05ZDuane Costadcosta@lternet.edu
<p>The Data Manager Library should be assigned a release number and distributed with the following components:</p>
<p>1. jar file (datamanager.jar)<br />2. javadoc API documentation<br />3. overview document which describes the API and provides usage examples<br />4. UML documents<br /> a. class diagram<br /> b. sequence of operations</p>
<p>The expected end-user is a programmer who will integrate the library into a calling application. The Data Manager Library could be distributed on the EML product site.</p> Bug #1128 (In Progress): Distribution element requested for projecthttps://projects.ecoinformatics.org/ecoinfo/issues/11282003-08-11T23:00:01ZDavid Blankmandblankman@lternet.edu
<p>At the LTER EML Implementation workshop the LTER Information Managers working on<br />"best practices" identified a desire to have a distribution element under<br />dataset/project. This would allow an LTER site to reference the location of<br />their site wide data catalog.</p> Bug #1036 (In Progress): Develop errata section for eml technical documents on webhttps://projects.ecoinformatics.org/ecoinfo/issues/10362003-04-10T23:59:48ZDavid Blankmandblankman@lternet.edu
<p>There is at least one and perhaps several places where the eml documentation on<br />the web is either incorrect or misleading. For example, eml-methods refers to<br />references which is inconsistent with the schema.</p> Bug #365 (In Progress): Eml documentation for Seminars & LTER siteshttps://projects.ecoinformatics.org/ecoinfo/issues/3652001-12-03T17:54:10ZDavid Blankmandblankman@lternet.edu
<p>This will be an overview of EML and its relationship to Morpho. It will also be a practical guide for <br />users to understand how to take conventionally reported metadata, such as text documents or <br />other legacy systems, and manually enter it into Morpho. (This is not to be confused with <br />automated conversions of metadata into EML).</p>