EML: Issueshttps://projects.ecoinformatics.org/ecoinfo/https://projects.ecoinformatics.org/ecoinfo/ecoinfo/favicon.ico?14691340362009-09-17T20:24:06ZEcoinformatics Redmine
Redmine Bug #4393 (New): Use datamanager for EML QA/QChttps://projects.ecoinformatics.org/ecoinfo/issues/43932009-09-17T20:24:06Zben leinfelderleinfelder@nceas.ucsb.edu
<p>As discussed at the LTER meeting this year:<br />------------<br />Work Group: Metrics and reports for EML data package quality<br />The EML data manager library (contributors: Costa, Tao, Leinfelder, Servilla) was created to parse EML metadata documents and insert the described data entity into a relational database. Our experience using the library with data packages contributed to the LTER NIS indicates that a large fraction do not have metadata of sufficient quality for the data to be used in this way. The primary contribution from LTER sites to the NIS is data sets, which are intended to be used in cross-site synthesis projects. Clearly, for cross-site synthesis to make use of the NIS a certain minimum level of metadata and data quality is required.<br />The goals for this group:<br />1. establish a set of metrics for LTER EML data package quality,<br />2. recommend content for a report to be produced by the EML data manager library, and<br />3. consider implementation strategies, e.g. should the report be another choice on the EML parser page? a shell script similar to that included with the EML parser?</p>
<p>The quality reports can be used to<br />1. inform the dataset contributor about the content of the data package, and indicate whether data are of sufficient quality to be machine-readable. Our data catalog (metacat) has no quality standards beyond basic XML and EML compliance, so a data package that fails these quality metrics can still be uploaded or harvested, although its usefulness is limited.<br />2. produce, in the LTER context, a list of failure modes for LTER metadata and data entities. Such a list could provide input for the design of specific tools for data providers, or help identify gaps in a site's IM system. A site requesting supplemental funding for its IMS could use the reports as part of the proposal justification.</p>
<p>As a starting point for our discussion, I have started a flowchart based on my own experience with the data manager library and SBC's EML data packages.</p>
<p>Here is the current membership (on this cc list, and present in Estes Park):<br />Margaret O'Brien, SBC<br />Emery Boose, HFR<br />Dan Bahauddin, CDR<br />James Brunt, LNO<br />Mark Servilla, LNO<br />Duane Costa, LNO<br />Mark Schildhauer, NCEAS<br />Ben Leinfelder, NCEAS</p> Bug #2835 (Resolved): Data Manager Library: Run-time errors involving Xalan classeshttps://projects.ecoinformatics.org/ecoinfo/issues/28352007-05-01T20:46:50ZDuane Costadcosta@lternet.edu
<p>Two developers (myself and Chad Burt) are getting run-time errors when our applications try to use the Data Manager library. The errors are NoClassDefFoundError involving Xalan classes.</p>
<hr />
<p>On 4/30/2007, Chad Burt wrote:</p>
<p>Hi guys,<br />I am trying to deploy my sbc app with the datamanager library on a different machine and am having some problems. It seems to be missing a dependency that is not on this Fedora Linux machine. I get this error:</p>
<p>Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/xpath/axes/PredicatedNodeTest</p>
<p>I never ran into this problem on my Mac; I just built the jar and it worked fine. Originally I thought it was because the jar needed to be recompiled because I was on a different machine. So I copied the whole eml tree over, hit "ant clean", then "ant dist-datamanager-lib", and uncompressed the zip to get my jar. No error messages. Ran my dataset import method based off the sample applications and I got the same error.</p>
<p>Is there some kind of apache library I need to have on this machine to get the datamanager library working?</p>
<p>Here is the full stack trace:<br />Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/xpath/axes/PredicatedNodeTest<br /> at java.lang.ClassLoader.defineClass1(Native Method)<br /> at java.lang.ClassLoader.defineClass(ClassLoader.java:620)<br /> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)<br /> at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)<br /> at java.net.URLClassLoader.access$100(URLClassLoader.java:56)<br /> at java.net.URLClassLoader$1.run(URLClassLoader.java:195)<br /> at java.security.AccessController.doPrivileged(Native Method)<br /> at java.net.URLClassLoader.findClass(URLClassLoader.java:188)<br /> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)<br /> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268)<br /> at java.lang.ClassLoader.loadClass(ClassLoader.java:251)<br /> at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)<br /> at org.apache.xpath.XPath.<init>(XPath.java:199)<br /> at org.apache.xpath.CachedXPathAPI.eval(CachedXPathAPI.java:322)<br /> at org.apache.xpath.CachedXPathAPI.selectNodeIterator(CachedXPathAPI.java:216)<br /> at org.apache.xpath.CachedXPathAPI.selectSingleNode(CachedXPathAPI.java:177)<br /> at org.apache.xpath.CachedXPathAPI.selectSingleNode(CachedXPathAPI.java:157)<br /> at org.ecoinformatics.datamanager.parser.eml.Eml200Parser.parseDocument(Eml200Parser.java:182)<br /> at org.ecoinformatics.datamanager.parser.eml.Eml200Parser.parse(Eml200Parser.java:160)<br /> at org.ecoinformatics.datamanager.DataManager.parseMetadata(DataManager.java:585)<br /> at org.ecoinformatics.datamanager.sample.ImportDataset.testParseMetadata(ImportDataset.java:331)<br /> at org.ecoinformatics.datamanager.sample.ImportDataset.main(ImportDataset.java:132)</p>
<p>-Chad Burt</p>
<hr />
<p>On 4/30/2007, Duane Costa wrote:</p>
<p>Hi Chad,</p>
<p>I recently started experiencing a very similar run-time error on my Windows machine in an application I am developing that uses the datamanager library too. The only difference is that in my case the NoClassDefFoundError was for a different class:</p>
<p>org.apache.xpath.patterns.NodeTest</p>
<p>After a little trial and error, I found that I could resolve the error by incorporating a newer version of xalan.jar, based on Xalan-Java Version 2.7.0, into my classpath. I am attaching the xalan.jar file that fixed the problem for me. It seems that the NodeTest class was missing from the older xalan.jar but present in the newer xalan.jar. I'm guessing that the same might be true for the PredicatedNodeTest class.</p>
<p>I don't really understand what caused the error to start occurring; it may have something to do with the Java version I am running (I upgraded from Java 1.4.2 to Java 1.5.0 fairly recently). Jing and I will need to investigate this further. Meanwhile, as a temporary fix, could you try including the new xalan.jar file in your sbc application's classpath and let us know if that resolves the error for you?</p>
<p>Thanks,<br />Duane</p>
<hr />
<p>On 4/30/2007, Chad Burt wrote:</p>
<p>Thanks Duane,<br />I'm not too familiar with java and classpaths. I'm calling the datamanager.jar from the command line. I replaced eml/lib/apache/xalan.jar with the one you gave me. Is that correct? I'm getting the same error.<br />-Chad</p>
<hr />
<p>On 4/30/2007, Duane Costa wrote:</p>
<p>Hi Chad,</p>
<p>Interesting, I did exactly what you did, replaced eml/lib/apache/xalan.jar with the one I sent you, and rebuilt datamanager.jar using 'ant jar-datamanager-lib'. Now I'm getting the same error you've been getting! So the new xalan.jar obviously is not the solution, but at least we're getting the same error now. This is going to take more investigation. I'll try to get this figured out. Meanwhile, could you please send me the following?:</p>
<p>(1) The output from running 'java -version' in a command window on your system.<br />(2) The output from running 'echo $CLASSPATH' in a command window on your system.<br />(3) The exact command you use when you run the Data Manager library code.</p>
<p>Thanks,<br />Duane</p>
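<p>Alongside the three items requested above, it can help to ask the JVM itself which jar, if any, supplies the Xalan classes named in the two errors. The following is a minimal diagnostic sketch, not part of the Data Manager Library: the class name <code>ClassOrigin</code> and its <code>describe</code> method are made up for illustration, and only standard JDK calls are used.</p>

```java
import java.security.CodeSource;

/** Diagnostic sketch (hypothetical helper): reports where a class is loaded from. */
final class ClassOrigin {

    /** Returns a one-line description of the jar or directory that supplies className. */
    static String describe(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            // Classes that ship with the Java runtime often have no code source.
            String where = (source == null || source.getLocation() == null)
                    ? "the Java runtime itself"
                    : source.getLocation().toString();
            return className + " loaded from " + where;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is reported here as well.
            return className + " NOT FOUND (" + e + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        // Class names taken from the two error reports and the stack trace in this thread.
        String[] names = {
            "org.apache.xpath.axes.PredicatedNodeTest",
            "org.apache.xpath.patterns.NodeTest",
            "org.apache.xpath.CachedXPathAPI"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

<p>Running this with the same <code>-cp</code> value as the failing application (plus the location of the compiled helper) should show whether each class resolves, and from which jar, under that exact classpath.</p>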
<hr />
<p>On 4/30/2007, Chad Burt wrote:</p>
<p>Here's my info:</p>
<p>java -version:<br />java version "1.5.0_11" <br />Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_11-b03)<br />Java HotSpot(TM) Client VM (build 1.5.0_11-b03, mixed mode, sharing)</p>
<p>$CLASSPATH doesn't seem to be set, nor $CLASS_PATH. It's not set on my mac either.</p>
<p>I am running a custom method based on the sample apps. I've attached the file. It's usually under /src/org/ecoinformatics/datamanager/sample/ImportDataset.java.<br />I call it via:<br />java -cp "datamanager.jar" org.ecoinformatics.datamanager.sample.ImportDataset > /dev/null</p>
<p>Thanks for the help,<br />Chad</p> Bug #2700 (Resolved): Data Manager Library: Sample Calling Applicationhttps://projects.ecoinformatics.org/ecoinfo/issues/27002006-12-15T16:39:34ZDuane Costadcosta@lternet.edu
<blockquote>
<p>-----Original Message-----<br />From: Matthew Jones [mailto:<a class="email" href="mailto:jones@nceas.ucsb.edu">jones@nceas.ucsb.edu</a>] <br />Sent: Friday, December 15, 2006 12:37 AM<br />To: Duane Costa<br />Cc: 'Jing Tao'<br />Subject: Re: Sample calling application</p>
<p>Hi Duane and Jing,</p>
<p>A sample app sounds great. Comments inline...</p>
<p>Matt</p>
<p>Duane Costa wrote:</p>
<blockquote>
<p>Matt,</p>
<p>Could you add your comments to this discussion about a</p>
</blockquote>
<p>sample calling</p>
<blockquote>
<p>application in the Data Manager Library code? Jing and I both agree <br />that a sample calling application (as opposed to Junit</p>
</blockquote>
<p>tests) would be</p>
<blockquote>
<p>a useful addition to the distribution, even if it's limited</p>
</blockquote>
<p>to just the user documentation. However, there are a couple <br />of loose ends Jing and I feel unsure about (see below). After <br />you add your comments, I'll open a Bugzilla entry for this.</p>
<blockquote>
<p>Thanks,<br />Duane</p>
<blockquote>
<p>On Wed, 13 Dec 2006, Duane Costa wrote:</p>
<blockquote>
<p>Date: Wed, 13 Dec 2006 11:32:27 -0700<br />From: Duane Costa <<a class="email" href="mailto:dcosta@lternet.edu">dcosta@lternet.edu</a>><br />To: 'Jing Tao' <<a class="email" href="mailto:tao@nceas.ucsb.edu">tao@nceas.ucsb.edu</a>><br />Subject: Sample calling application</p>
<p>Hi Jing,</p>
<p>I think it would be nice to provide a sample calling</p>
</blockquote></blockquote></blockquote>
<p>application in</p>
<blockquote><blockquote><blockquote>
<p>the Data Manager Library source code distribution. It would</p>
</blockquote>
<p>just be a</p>
<blockquote>
<p>small program, together with implementations of the</p>
</blockquote>
<p>call-back interfaces for database connection pool and Ecogrid end <br />point, to demonstrate the different use cases. Do you</p>
</blockquote></blockquote>
<p>think this is a</p>
<blockquote><blockquote>
<p>good idea? If so, there are a few minor things to decide:<br />It is a great idea.</p>
</blockquote>
<p>Good! I'll work on the sample program. I'll also add a new</p>
</blockquote>
<p>Bugzilla bug to document these ideas after Matt adds his comments.</p>
<blockquote>
<blockquote><blockquote>
<ul>
<li>Where to put the source code -- One possible package would be:<br />org.ecoinformatics.datamanager.sample</li>
</ul>
</blockquote></blockquote></blockquote>
<p>This package sounds good to me.</p>
<blockquote><blockquote><blockquote>
</blockquote>
<p>I am not sure. But I think since it is sample and it will</p>
</blockquote></blockquote>
<p>be good to</p>
<blockquote><blockquote>
<p>be easily found by the user. So we can still use the package you <br />proposed, but can we put them into another dir<br />- sample, which is parallel to src? The dir structure will</p>
</blockquote></blockquote>
<p>look like</p>
<blockquote><blockquote>
<p>sample/org/ecoinformatics/datamanager/sample.</p>
</blockquote>
<p>This sounds fine. Maybe we need a separate ant target in</p>
</blockquote>
<p>build.xml to</p>
<blockquote>
<p>compile the sample code, something like 'ant</p>
</blockquote>
<p>compile-datamanager-sample'.<br />Sounds fine. If it's easier to just include the code in src <br />then I might just do that instead of making the parallel <br />hierarchy. But either way is fine.</p>
<blockquote>
<blockquote><blockquote>
<ul>
<li>How to set properties -- The main program could hard-code the <br />database values as constants, or the main program could</li>
</ul>
</blockquote></blockquote></blockquote>
<p>read values</p>
<blockquote><blockquote><blockquote>
<p>from the lib/datamanager/datamanager.properties file. The</p>
</blockquote>
<p>advantage of</p>
<blockquote>
<p>the first approach is that it keeps the database values</p>
</blockquote>
<p>together in the same file with the main program; the</p>
</blockquote></blockquote>
<p>second approach</p>
<blockquote><blockquote>
<p>has the advantage that users can edit</p>
</blockquote></blockquote>
<p>datamanager.properties and run</p>
<blockquote><blockquote>
<p>the sample program without needing to recompile. Which approach do <br />you like better?<br />First, I have a question. How do you plan to run this</p>
</blockquote></blockquote>
<p>sample code? It</p>
<blockquote><blockquote>
<p>will be compiled and distributed too? Or user should compile it by <br />himself or through build.xml? Or even just give user an</p>
</blockquote></blockquote>
<p>idea how to</p>
<blockquote><blockquote>
<p>use the library and we don't have plan let user run it? If we just <br />want to show user how to use the library and I think it is okay to <br />hard code in main program.<br />If we plan to let user run it (like our test file, it is</p>
</blockquote></blockquote>
<p>better put</p>
<blockquote><blockquote>
<p>those values in the property file.</p>
</blockquote>
<p>I don't know which approach is best. Maybe we just want to include <br />sample code primarily as part of the documentation, without any <br />expectation that the user will actually compile and execute</p>
</blockquote>
<p>it. Or maybe we do want the end user to try it out <br />themselves. I think we need Matt's input on this.<br />Let's use a properties file. Hardcoding these values in code <br />is a bad example to set.</p>
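<p>A minimal sketch of the properties-file approach settled on here, using only <code>java.util.Properties</code>. The class name <code>SampleSettings</code> and the keys <code>dbDriver</code>, <code>dbURL</code> and <code>dbUser</code> are assumptions made for illustration; the actual keys in lib/datamanager/datamanager.properties are not given in this thread.</p>

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch (hypothetical class): database settings read from a properties source. */
final class SampleSettings {
    final String dbDriver;
    final String dbUrl;
    final String dbUser;

    private SampleSettings(String dbDriver, String dbUrl, String dbUser) {
        this.dbDriver = dbDriver;
        this.dbUrl = dbUrl;
        this.dbUser = dbUser;
    }

    /** Loads settings from any Reader, e.g. a FileReader on the properties file. */
    static SampleSettings load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Key names below are assumptions, not the library's documented keys.
        return new SampleSettings(
                require(props, "dbDriver"),
                require(props, "dbURL"),
                require(props, "dbUser"));
    }

    /** Fails fast with a clear message when a required key is absent or blank. */
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Inline text stands in for the properties file so the sketch runs anywhere.
        String text = "dbDriver=org.postgresql.Driver\n"
                + "dbURL=jdbc:postgresql://localhost/datamanager\n"
                + "dbUser=datamanager\n";
        SampleSettings settings = SampleSettings.load(new StringReader(text));
        System.out.println(settings.dbDriver + " " + settings.dbUrl + " " + settings.dbUser);
    }
}
```

<p>With this arrangement a user can edit the properties file and rerun the sample without recompiling, which is the advantage noted earlier in the thread.</p>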
</blockquote> Bug #2578 (In Progress): Data Manager Library: Release and Distributionhttps://projects.ecoinformatics.org/ecoinfo/issues/25782006-10-27T20:28:05ZDuane Costadcosta@lternet.edu
<p>The Data Manager Library should be assigned a release number and distributed with the following components:</p>
<p>1. jar file (datamanager.jar)<br />2. javadoc API documentation<br />3. overview document which describes the API and provides usage examples<br />4. UML documents<br /> a. class diagram<br /> b. sequence of operations</p>
<p>The expected end-user is a programmer who will integrate the library into a calling application. The Data Manager Library could be distributed on the EML product site.</p> Bug #2577 (Resolved): Data Manager Library: API to enumerate table and field nameshttps://projects.ecoinformatics.org/ecoinfo/issues/25772006-10-27T20:17:31ZDuane Costadcosta@lternet.edu
<p>Some applications may want to do direct queries on the data tables in the database. The application will need to map entity names to table names, and attribute names to field names. Extend the Data Manager Library API to provide a method to enumerate the table and field names for a given entity.</p> Bug #2576 (In Progress): Data Manager Library: Database Connection Poolinghttps://projects.ecoinformatics.org/ecoinfo/issues/25762006-10-27T20:11:08ZDuane Costadcosta@lternet.edu
<p>Rework the design and implementation of database connection pooling in the Data Manager Library. Provide a callback mechanism for the calling application to manage its own connection pool. This should include a mechanism for returning a "Connection not available" status to the Data Manager so that it will know that it needs to wait until a connection is available. The Data Manager should generally use one connection per operation, though if the operation has several steps it could re-use the same connection in more than one step if it's safe to do so.</p> Bug #2575 (Resolved): Data Manager Library: Support for query object APIhttps://projects.ecoinformatics.org/ecoinfo/issues/25752006-10-27T20:01:08ZDuane Costadcosta@lternet.edu
<p>The original design of the Data Manager Library was somewhat vague in its support for querying data tables. After further discussion (Matt, Jing, Duane, and Mark Servilla), we think that allowing the calling application to pass in an ANSI SQL string would be too problematic because of the parsing requirements. The problems arise from needing to parse non-standard entity names into database table names, and non-standard attribute names into database field names. For example:</p>
<p>SELECT SPECIES NAME, SPECIES ID FROM SPECIES</p>
<p>means one thing from the perspective of entities and attributes, but something else from a database perspective, where "NAME" and "ID" would be interpreted as column aliases.</p>
<p>Instead, we will design a query class that the calling application can use to <br />construct its queries in a more structured way by setting various attributes of the query object. At some later point, we may also support queries in an XML format that could be mapped onto the query object by the Data Manager Library. This would facilitate passing queries between two or more processes (e.g. first from Morpho to Metacat, and then from Metacat to the Data Manager Library code).</p>
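<p>To illustrate the idea of a structured query object, here is a minimal sketch. It is not the library's actual API: the class <code>EntityQuery</code>, its methods, and the database names <code>SPECIES_NAME</code> and <code>SPECIES_ID</code> are all hypothetical. The caller names entities and attributes as they appear in the metadata, and the translation to table and field names happens in one place, so no SQL string has to be parsed.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch (hypothetical class): a query expressed in entity and attribute names. */
final class EntityQuery {
    private final String entityName;
    private final List<String> attributeNames = new ArrayList<>();

    EntityQuery(String entityName) {
        this.entityName = entityName;
    }

    /** Adds an attribute to the select list, by its metadata name. */
    EntityQuery select(String attributeName) {
        attributeNames.add(attributeName);
        return this;
    }

    /** Translates the query to SQL using name maps that the library would maintain. */
    String toSql(Map<String, String> tableNames, Map<String, String> fieldNames) {
        if (attributeNames.isEmpty()) {
            throw new IllegalStateException("No attributes selected");
        }
        String table = tableNames.get(entityName);
        if (table == null) {
            throw new IllegalArgumentException("Unknown entity: " + entityName);
        }
        StringBuilder sql = new StringBuilder("SELECT ");
        for (int i = 0; i < attributeNames.size(); i++) {
            String field = fieldNames.get(attributeNames.get(i));
            if (field == null) {
                throw new IllegalArgumentException("Unknown attribute: " + attributeNames.get(i));
            }
            if (i > 0) {
                sql.append(", ");
            }
            sql.append(field);
        }
        return sql.append(" FROM ").append(table).toString();
    }

    public static void main(String[] args) {
        // Hypothetical mappings from metadata names to database-safe names.
        Map<String, String> tables = new HashMap<>();
        tables.put("SPECIES", "SPECIES");
        Map<String, String> fields = new HashMap<>();
        fields.put("SPECIES NAME", "SPECIES_NAME");
        fields.put("SPECIES ID", "SPECIES_ID");

        String sql = new EntityQuery("SPECIES")
                .select("SPECIES NAME")
                .select("SPECIES ID")
                .toSql(tables, fields);
        System.out.println(sql); // prints: SELECT SPECIES_NAME, SPECIES_ID FROM SPECIES
    }
}
```

<p>Because the attribute name "SPECIES NAME" is passed as a single value rather than embedded in SQL text, the ambiguity with column aliases described above does not arise.</p>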
<p>The JDBC ResultSet object that is returned could also pose a problem, since it contains references to the database table field names, not the original attribute names. The calling application could get around this by restricting itself to accessing the fields by position rather than by name.</p> Bug #2573 (In Progress): Data manager library need to handle binary data file formathttps://projects.ecoinformatics.org/ecoinfo/issues/25732006-10-27T17:54:02ZJing Taotao@nceas.ucsb.edu
<p>Currently the Data Manager Library can load text data, in both simple delimited and more complex text formats (e.g., fixed width), into a relational database. We want a new feature: the library should also handle binary data file formats (e.g. Excel).</p> Bug #2507 (Resolved): Data Manager Library: Create a EML parser lib to digest eml documenthttps://projects.ecoinformatics.org/ecoinfo/issues/25072006-08-01T17:52:42ZJing Taotao@nceas.ucsb.edu
<p>Currently, the EML actor in Kepler can download an EML document and parse it. After parsing, the entity information in the EML document is stored in Java objects, and the data file is downloaded to the local file system and also stored in a relational database. <br />We want to separate this process from Kepler and make it a library in the eml module, so that it can be used in Kepler, Metacat, and other projects.</p>