Ecoinformatics Redmine: Issues
https://projects.ecoinformatics.org/ecoinfo/ 2009-11-11T22:05:02Z
Morpho - Bug #4542 (New): Feature Request: Import / update metadata from MySQL database
https://projects.ecoinformatics.org/ecoinfo/issues/4542 2009-11-11T22:05:02Z David LeBauer dlebauer@gmail.com
<p>I am writing to request a feature for Morpho that would facilitate creating and updating metadata from an existing database. I am using MySQL, but I imagine that the feature could be more general.</p>
<p>I would like for it to:</p>
<p>a) query a database to generate relevant metadata <br />or <br />b) generate metadata from an SQL CREATE script</p>
<p>For each table<br /> give table name to entityName<br /> Add comments to Description<br /> Make list of attributeNames from column headers<br />For each attributeName<br /> attributeName: is column name<br /> attributeDefinition: state if it is a primary key or foreign key; if foreign key, write foreign key and the name of referenced table<br /> storageType derived from datatype<br /> measurementScale/unit/numericDomain/range etc. should be derived from datatype (and data if connected to database)</p>
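<p>The mapping above could be sketched roughly as follows for option (b), reading an SQL CREATE script. This is only an illustration, not Morpho code: it assumes one column or key definition per line (as mysqldump writes them), the function name <code>metadata_from_create</code> and the type-to-scale table are hypothetical, and the scale guesses are crude starting points that a person would still need to review.</p>

```python
import re

# Hypothetical, coarse mapping from MySQL types to a measurement scale guess.
SCALE_BY_TYPE = {
    "int": "ratio", "bigint": "ratio", "float": "ratio", "double": "ratio",
    "decimal": "ratio", "date": "dateTime", "datetime": "dateTime",
    "timestamp": "dateTime", "varchar": "nominal", "char": "nominal",
    "text": "nominal",
}
SKIP_PREFIXES = ("KEY ", "UNIQUE ", "INDEX ", "CONSTRAINT ")

TABLE_RE = re.compile(
    r"CREATE TABLE\s+(?:IF NOT EXISTS\s+)?`?(\w+)`?\s*\((.*)\)",
    re.IGNORECASE | re.DOTALL,
)
COLUMN_RE = re.compile(r"`?(\w+)`?\s+(\w+)")
PK_RE = re.compile(r"PRIMARY KEY\s*\(([^)]*)\)", re.IGNORECASE)
FK_RE = re.compile(
    r"FOREIGN KEY\s*\(`?(\w+)`?\)\s*REFERENCES\s*`?(\w+)`?", re.IGNORECASE
)


def metadata_from_create(sql):
    """Turn one CREATE TABLE statement into an entity/attribute dictionary."""
    match = TABLE_RE.search(sql)
    if match is None:
        raise ValueError("no CREATE TABLE statement found")
    table, body = match.group(1), match.group(2)

    columns, primary_keys, foreign_keys = [], set(), {}
    for raw in body.splitlines():
        line = raw.strip().rstrip(",")
        if not line:
            continue
        fk = FK_RE.search(line)
        pk = PK_RE.match(line)
        if fk:
            foreign_keys[fk.group(1)] = fk.group(2)
        elif pk:
            primary_keys.update(c.strip(" `") for c in pk.group(1).split(","))
        elif line.upper().startswith(SKIP_PREFIXES):
            continue  # plain index definitions carry no attribute metadata
        else:
            col = COLUMN_RE.match(line)
            if col:
                columns.append((col.group(1), col.group(2).upper()))

    attributes = []
    for name, sql_type in columns:
        if name in primary_keys:
            definition = "primary key"
        elif name in foreign_keys:
            definition = "foreign key referencing table " + foreign_keys[name]
        else:
            definition = ""
        # Key columns are identifiers, so treat them as nominal.
        is_key = name in primary_keys or name in foreign_keys
        scale = "nominal" if is_key else SCALE_BY_TYPE.get(sql_type.lower(), "nominal")
        attributes.append({
            "attributeName": name,
            "attributeDefinition": definition,
            "storageType": sql_type,
            "measurementScale": scale,
        })
    return {"entityName": table, "attributes": attributes}
```

<p>Option (a), querying a live database, could fill the same dictionary from the database's own schema catalogue and would additionally allow ranges to be computed from the data.</p>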
<p>Thanks!</p>
<p>-David</p>
Metacat - Bug #3835 (In Progress): design and implement OAI-PMH compliant harvest subsystem
https://projects.ecoinformatics.org/ecoinfo/issues/3835 2009-02-24T02:06:53Z Matt Jones jones@nceas.ucsb.edu
<p>Metacat's current harvest mechanism works well but is a proprietary system. The Dryad project has proposed to implement an OAI-PMH compliant harvest subsystem for Metacat in order to allow Metacat to interact more effectively with other systems that implement this protocol. This is a tracking bug for the design and implementation of this feature. Other more detailed bugs will be filed for specific tasks. It would be useful if the final system allowed Metacat to act as both an OAI-PMH Data Provider and as an OAI-PMH Service Provider, allowing us to both serve and harvest documents from OAI-PMH servers.</p>
<p>Some issues to consider and discuss:<br />1) lack of record authorization mechanisms in OAI-PMH. Metacat currently allows harvest with access controls on harvested records. Reverting to a purely OAI-PMH system would eliminate this capability that is used by many of our harvest clients (especially for data, but somewhat for metadata as well). So the design needs to consider a hybrid that allows both public records to be exposed through OAI-PMH and restricted records to be exposed through a protocol like Metacat's that supports access control. What is our design goal here?</p>
<p>2) A corollary of (1) is how to determine who is allowed to update a given record. Does OAI-PMH assume providers always originate from a constant URL endpoint in order to get around authenticating data providers? This is probably not reasonable for even short periods of time (a few years). A number of sites change domain names over short periods of time, and the harvester needs to be able to adjust to these changes, update endpoints, and still handle record replacement. Maybe this is a non-issue if PMH allows provider endpoints to be updated.</p>
<p>3) Date-based change detection in OAI-PMH versus GUID-based versioning in metacat. How should these be reconciled? If a PMH harvest occurs every ten days, but a metadata document is revised three times in that interval, does OAI-PMH only get the most recent version? How are the other versions archived and made accessible over time?</p>
<p>4) Data objects. The Metacat harvester allows one to transfer objects of any type, which is used to harvest both metadata objects of various formats (e.g., EML and FGDC) as well as the associated data objects. Each of these objects has their own unique identifier. How would this be handled under OAI-PMH?</p>
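<p>To make point (3) concrete, here is a small sketch of date-based selective harvesting. The <code>ListRecords</code> verb and its <code>metadataPrefix</code>, <code>from</code> and <code>until</code> arguments are part of OAI-PMH; the base URL, the docids and the helper names are made up for illustration. The second function assumes that each document is exposed as a single OAI item carrying only the datestamp of its latest change, which is one possible mapping and not a settled design.</p>

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None, until_date=None):
    """Build an OAI-PMH ListRecords request for a date window."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return base_url + "?" + urlencode(params)


def latest_in_window(revisions, from_date, until_date):
    """Simulate what a date-based harvest sees.

    revisions: iterable of (docid, revision_number, datestamp) tuples, with
    datestamps as ISO dates so that string comparison orders them correctly.
    Only the newest revision of each docid is visible, and only if its
    datestamp falls inside the window.
    """
    latest = {}
    for docid, revision, stamp in revisions:
        if docid not in latest or revision > latest[docid][0]:
            latest[docid] = (revision, stamp)
    return {
        docid: revision
        for docid, (revision, stamp) in latest.items()
        if from_date <= stamp <= until_date
    }
```

<p>Under that assumption, a document revised three times between two harvests shows up once, at its third revision. Keeping the intermediate revisions reachable would need something extra, for example exposing each revision as its own item.</p>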
<p>A nice background set of slides is here:<br /><a class="external" href="http://www.oaforum.org/otherfiles/berl_oai-tutorial_e.ppt">http://www.oaforum.org/otherfiles/berl_oai-tutorial_e.ppt</a></p>
Metacat - Bug #3402 (In Progress): internal dtds are not handled
https://projects.ecoinformatics.org/ecoinfo/issues/3402 2008-06-19T18:52:33Z Chad Berkley berkley@nceas.ucsb.edu
<p>XML documents with internal DTDs are not handled by metacat. The internal dtd entity callback in the sax parser is blank and doesn't do anything when presented with an internal dtd.</p>
Metacat - Bug #3396 (In Progress): Enable event notification feature
https://projects.ecoinformatics.org/ecoinfo/issues/3396 2008-06-14T17:34:35Z Chris Jones cjones@nceas.ucsb.edu
<p>We would like to propose some changes to Metacat's event logging <br />feature to extend the functionality and provide a notification feature <br />that alerts data set owners and/or interested parties of downloads and <br />other events. We plan on prototyping the changes, and would like <br />input and suggestions from other metacat developers on the features <br />and implementation.</p>
<p>For an email notification system (or other, such as RSS) to work, it <br />would require a mechanism for the end user to 'subscribe' to <br />notifications based on events. In brainstorming this, we thought that <br />the subscription could be based on, perhaps, a hand chosen <br />notification list of packageIds by data set or data set group (e.g. <br />'notify me about events on: PISCO intertidal/subtidal/physical ocean/ <br />data packages' ...). Expressing these groupings might be done via a <br />pathquery document or a cached query that produces a packageId list. <br />Suggestions are welcome on the best method to associate a data package <br />docid list and an email address of a person to be notified.</p>
<p>The information that's logged in metacat's access_log table is <br />sufficient for general reporting:</p>
<p>- registered user LDAP DN<br /> user name<br /> affiliated organization name<br />- event date/time stamp<br />- event type<br />- docid<br />(However, in building an email [or an RSS feed], the data package <br />title would be a more friendly way of displaying which package was <br />downloaded, etc.)</p>
<p>The changes to metacat would also likely include a mechanism to <br />register an event listener that monitors changes to the model backed <br />by the access_log table. For instance, a researcher might post the <br />following to metacat:</p>
<p>action=monitor&\<br />username=uid=rcore,o=PISCO,dc=ecoinformatics,dc=org&\<br />qformat=email&\<br />event=read&\<br />query=< the pathquery document that produces a package list ></p>
<p>By doing so, this action would register the listener, and the listener <br />would provide a callback used to handle the event notification. At <br />the moment, only metacat administrators have access to the logging <br />information via the getlog action.</p>
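<p>A rough sketch of the two pieces described above: encoding the proposed request, and a listener registry that invokes a callback when a matching access_log event is recorded. Note that the <code>monitor</code> action is only the proposal made in this report and does not exist in Metacat, and the class and function names here are hypothetical.</p>

```python
from urllib.parse import urlencode


def monitor_request_body(user_dn, event, pathquery, qformat="email"):
    """Form-encode the proposed (not yet existing) 'monitor' action."""
    return urlencode({
        "action": "monitor",
        "username": user_dn,
        "qformat": qformat,
        "event": event,
        "query": pathquery,
    })


class AccessLogNotifier:
    """Keeps subscriptions and calls back when a matching event is logged."""

    def __init__(self):
        # event type -> list of (set of docids, callback)
        self._listeners = {}

    def subscribe(self, event_type, docids, callback):
        entry = (frozenset(docids), callback)
        self._listeners.setdefault(event_type, []).append(entry)

    def record(self, event_type, docid, principal):
        """Call this when a row is written to the access log.

        Returns the number of listeners that were notified.
        """
        notified = 0
        for docids, callback in self._listeners.get(event_type, ()):
            if docid in docids:
                callback(event_type, docid, principal)
                notified += 1
        return notified
```

<p>In this sketch the docid set would come from running the subscriber's pathquery, and the callback would be the place to format and send the email (or write an RSS entry).</p>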
<p>Once someone is registered to monitor events, metacat would have to <br />then provide notification over specific protocols. The notification <br />process may be easiest if metacat includes an SMTP send-only server, <br />such as Aspirin, an embeddable SMTP server.</p>
<p><a class="external" href="https://aspirin.dev.java.net/">https://aspirin.dev.java.net/</a></p>
<p>There are other push mechanisms that could be used (like RSS), but the <br />researchers we work with specifically asked for email-based <br />notification.</p>
<p>We'll enter a placeholder bugzilla report to keep track of this <br />feature, but thought that people would have suggestions on both the <br />design and implementation before we get started.</p>
<p>Please let us know what you think.</p>
<p>Rex, Chris, Mike, Jordan</p>
Metacat - Bug #3367 (New): Harvester stores passwords in clear text
https://projects.ecoinformatics.org/ecoinfo/issues/3367 2008-06-05T20:18:24Z Chad Berkley berkley@nceas.ucsb.edu
<p>The harvester stores the user's password in clear text in the database. Passwords need to be stored as md5s or use some other secure form of encryption.</p>
Metacat - Bug #3142 (New): metacat client uses in-memory buffer for posting data
https://projects.ecoinformatics.org/ecoinfo/issues/3142 2008-02-08T19:41:52Z Matt Jones jones@nceas.ucsb.edu
<p>The size of XML files (and probably data files) that can be sent to metacat is memory limited in client applications because the MetacatClient implementation assumes the payload can be loaded into a memory buffer before it is sent. This is done to calculate the size of the payload before POSTing it. We need new insert(), update(), and upload() methods that take a size parameter so that the Reader or InputStream can be streamed directly over the http connection instead of being accumulated in an in-memory buffer.</p>
<p>We have code that does this in Morpho already using Apache's httpclient library, but this should make its way into MetacatClient. With JDK after 1.5.x, Sun's http protocol handler now supports streaming POSTs, but you have to set up a separate HttpURLConnection with a new protocol handler and call setFixedLengthStreamingMode(). See:<br /> <a class="external" href="http://java.sun.com/j2se/1.5.0/docs/api/java/net/HttpURLConnection.html#setFixedLengthStreamingMode(int">http://java.sun.com/j2se/1.5.0/docs/api/java/net/HttpURLConnection.html#setFixedLengthStreamingMode(int</a></p>
<p>This would be an alternative to using httpclient, but probably still requires registering a newly configured protocol handler.</p>
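<p>The idea of passing the size up front so the payload never has to sit in memory can be shown in a few lines. This is a language-neutral sketch written in Python, not MetacatClient code: <code>stream_payload</code> is a hypothetical helper, and <code>sink</code> stands in for whatever the HTTP connection exposes for writing the request body once the length has been declared.</p>

```python
import io


def stream_payload(source, sink, size, chunk_size=64 * 1024):
    """Copy exactly `size` bytes from source to sink in bounded chunks.

    At most `chunk_size` bytes are held in memory at any time. The caller
    supplies `size`, so nothing has to be buffered just to measure it.
    """
    remaining = size
    while remaining > 0:
        chunk = source.read(min(chunk_size, remaining))
        if not chunk:
            raise IOError(
                "payload ended %d bytes short of the declared size" % remaining
            )
        sink.write(chunk)
        remaining -= len(chunk)
    return size
```

<p>The short-read check matters because a declared length that does not match the bytes actually sent leaves the server waiting for the remainder.</p>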
<p>We also may have trouble with Metacat, because it also reads data using a string, as described in bug # 1122.</p>
Morpho - Bug #2473 (In Progress): Morpho uses too much memory
https://projects.ecoinformatics.org/ecoinfo/issues/2473 2006-06-27T18:12:08Z Will Tyburczy tyburczy@nceas.ucsb.edu
<p>Currently, Morpho needs 512M of RAM to run successfully. If it is set to a lower value, Morpho will freeze in the Data Table Wizard when importing a table with many columns.</p>
Metacat - Bug #2155 (In Progress): Metacat Performance: Rewrite the xml_nodes queries
https://projects.ecoinformatics.org/ecoinfo/issues/2155 2005-07-14T01:17:37Z Saurabh Garg sgarg@nceas.ucsb.edu
<p>From Matt's email...</p>
<p>Rewrite the xml_nodes queries. In general we use the IN clause a lot<br />which is less than efficient. We need to evaluate how our current<br />queries are working and rewrite them. With some systematic work we can<br />probably come up with some similar ideas for improvements</p>
Metacat - Bug #1879 (New): Metacat Performance: Summary
https://projects.ecoinformatics.org/ecoinfo/issues/1879 2005-01-18T21:42:07Z Saurabh Garg sgarg@nceas.ucsb.edu
<p>These are notes based on the changes I made in the Metacat source to improve<br />performance. I was not able to make the changes listed below for lack of time<br />and because they would require more thorough testing.</p>
<p>1. xml_index is a large table and most of the time we are searching for paths<br />which are needed by the web interface and Morpho for displaying the results. So<br />it might be a good idea to create a separate table similar to xml_index table<br />which has only got some predefined paths in it. For current knb skin and morpho<br />this table would have about 1/200th the number of records that xml_index has<br />right now. The code that would need to be modified would include both insertion<br />and deletion of documents.</p>
<p>2. For searching data in particular given paths (e.g. geographic query) the<br />current query uses both xml_index and xml_nodes. This can be improved by just<br />using xml_index table which has nodedata in it. But there is a lot of repetition<br />of data in xml_index table. So it has to be tested and checked if this would<br />result in better performance or otherwise. This would require rewriting<br />QueryTerm.java.</p>
Morpho - Bug #1702 (New): spatial search is not saved
https://projects.ecoinformatics.org/ecoinfo/issues/1702 2004-09-28T23:47:07Z Chad Berkley berkley@nceas.ucsb.edu
<p>when you revise a search that you have just performed, the spatial bounding box<br />is reset to the default instead of being left in the state that you left it. <br />When opening a saved search or revising an existing search, the bounding box<br />should retain its position.</p>
Metacat - Bug #1542 (New): SQL Server support broken
https://projects.ecoinformatics.org/ecoinfo/issues/1542 2004-04-30T15:35:59Z Matt Jones jones@nceas.ucsb.edu
<p>Support for the MS SQL Server database was maintained in versions prior to 1.3<br />of metacat. Now the xmltables-sqlserver.sql and the associated<br />upgrade*-sqlserver.sql are either not up to date or are missing entirely. Need<br />to port the database changes to SQL Server and test all functions, including<br />upgrades from 1.3 to 1.4 before releasing 1.4.</p>
Morpho - Bug #1426 (In Progress): closing unsaved package does not close package
https://projects.ecoinformatics.org/ecoinfo/issues/1426 2004-03-30T22:19:51Z Chad Berkley berkley@nceas.ucsb.edu
<p>if you have an unsaved package open and you close the window, morpho asks you <br />if you'd like to save the package. if you click yes, it saves the package but <br />forgets that your original request was to close the window and leaves it open.</p>
Metacat - Bug #421 (In Progress): create simple turnkey installer for metacat Phase II
https://projects.ecoinformatics.org/ecoinfo/issues/421 2002-02-13T18:32:47Z Chad Berkley berkley@nceas.ucsb.edu
<p>we need to use the previously prototyped metacat installer to build a robust, one<br />click installer for metacat that includes Tomcat, Ant, Metacat, PostgreSQL and<br />any other tools that are necessary.</p>
<p>We should do this for the next release of Metacat.</p>
Utilities - Bug #324 (In Progress): perl implementation of harvester api
https://projects.ecoinformatics.org/ecoinfo/issues/324 2001-11-08T19:34:03Z Matt Jones jones@nceas.ucsb.edu
<p>Need to implement registry and harvester.</p>
Utilities - Bug #323 (In Progress): Establish API for communication with src and dest
https://projects.ecoinformatics.org/ecoinfo/issues/323 2001-11-08T19:33:09Z Matt Jones jones@nceas.ucsb.edu
<p>Need an API for the harvester communication between the source sites that are<br />being harvested and the destination sites that are receiving the data. See the<br />Visio diagram for initial design ideas.</p>