Metacat: Issues https://projects.ecoinformatics.org/ecoinfo/ 2017-11-17T21:19:00Z Ecoinformatics Redmine
Bug #7229 (New): Mis-Formatting of Data Package Contents https://projects.ecoinformatics.org/ecoinfo/issues/7229 2017-11-17T21:19:00Z Thomas Thelen
<p>In MetacatUI we're getting a slight mis-formatting when displaying data package contents. This can be seen in the attached images. The issue was initially reported in MetacatUI as issue 379.</p>
<p><a class="external" href="https://github.com/NCEAS/metacatui/issues/379">https://github.com/NCEAS/metacatui/issues/379</a></p>
<p>From Bryce,</p>
<p>"The HTML in question is actually produced by Metacat and MetacatUI is just rendering it without modification from Metacat's View Service. ... The fix would involve changing the underlying eml-2 XSLT."</p>
Bug #7228 (New): Error in sorting data-sets based on title in MetaCatUI https://projects.ecoinformatics.org/ecoinfo/issues/7228 2017-11-15T17:35:09Z Rushiraj Nenuji
<p>Reference: Issue <a class="external" href="https://github.com/NCEAS/metacatui/issues/350">https://github.com/NCEAS/metacatui/issues/350</a> MetaCatUI<br /><br />When the datasets are sorted by title (a-z), titles that start with an upper-case letter and titles that start with a lower-case letter are sorted separately.</p>
<p>Please find attached an image showing the results of the sort.</p>
<p>Possible Solution: <a class="external" href="https://stackoverflow.com/questions/2053214/how-to-create-a-case-insensitive-copy-of-a-string-field-in-solr">https://stackoverflow.com/questions/2053214/how-to-create-a-case-insensitive-copy-of-a-string-field-in-solr</a></p>
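<p>The behaviour described above is what a plain code-point comparison produces, and the usual remedy is to sort on a case-folded copy of the title. The following is a minimal Python sketch of that idea only; it is not Solr or MetacatUI code, and the titles are made up for illustration.</p>

```python
# Hypothetical dataset titles, mixing upper- and lower-case initials.
titles = ["soil moisture", "Arctic lakes", "Zooplankton counts", "alpine flora"]

# A plain sort compares code points, so every upper-case initial
# sorts before any lower-case one.
case_sensitive = sorted(titles)

# Sorting on a case-folded key interleaves the two groups, which is
# what a user expects from an "a-z" sort.
case_insensitive = sorted(titles, key=str.casefold)

print(case_sensitive)
print(case_insensitive)
```

<p>The linked suggestion applies the same principle on the index side: keep the original title for display and sort on a separate, case-normalized copy.</p>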
<p>Thanks!</p>
<p>Best,<br />Rushiraj Nenuji.</p>
Bug #7223 (New): EZID metadata registration doesn't seem to work with SIDs https://projects.ecoinformatics.org/ecoinfo/issues/7223 2017-10-24T22:11:37Z Bryce Mecum mecum@nceas.ucsb.edu
<p>Earlier today, a DOI was generated using R's `generateIdentifier` (which calls MNStorage.generateIdentifier()). Then the newly-minted DOI was set as the Series ID of an EML 2.1.1 Object. The DOI was successfully registered with EZID but the EZID metadata was not correctly set on the object, as shown when I logged in. See the attached screenshot.</p>
<p>I expected the EZID metadata to get filled in like it normally does for DOIs that get used as PIDs. I took a quick glance at the relevant part of Metacat and it doesn't look like anything special is done to handle SIDs.</p>
Support #6793 (In Progress): Update DOIs from KNB to redirect to view service https://projects.ecoinformatics.org/ecoinfo/issues/6793 2015-07-02T19:00:00Z Matt Jones jones@nceas.ucsb.edu
<p>In <a class="issue tracker-2 status-3 priority-2 priority-default closed" title="Feature: Create a new service to update DOI pointers (Resolved)" href="https://projects.ecoinformatics.org/ecoinfo/issues/6530">#6530</a> and <a class="issue tracker-5 status-5 priority-2 priority-default closed" title="Story: Keep DOI registrations current and resolvable (Closed)" href="https://projects.ecoinformatics.org/ecoinfo/issues/6440">#6440</a>, we added features to update DOI registrations, but we still have many originally assigned DOIs that redirect to the raw EML document rather than our landing page for a data set. We need to fix all of the /AA/ DOI registrations in the KNB and ensure they point to the right View service page. For DOIs for metadata, that would be the associated /view url for that DOI. For data files and resource maps, it is the view for the associated metadata. E.g.,</p>
<ul>
<li>Metadata doi:10.5063/AA/nceas.227.15 should redirect to <a class="external" href="https://knb.ecoinformatics.org/#view/doi:10.5063/AA/nceas.227.15">https://knb.ecoinformatics.org/#view/doi:10.5063/AA/nceas.227.15</a></li>
<li>Data doi:10.5063/AA/wtyburczy.30.1 should redirect to the same metadata file <a class="external" href="https://knb.ecoinformatics.org/#view/doi:10.5063/AA/nceas.227.15">https://knb.ecoinformatics.org/#view/doi:10.5063/AA/nceas.227.15</a></li>
</ul>
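<p>The redirect rule in the two examples above can be sketched as follows. This is a hedged Python illustration, not Metacat code: the <code>DESCRIBED_BY</code> lookup and the <code>redirect_target</code> function are hypothetical names, and a real implementation would obtain the data-to-metadata relationship from the package's resource map rather than a hard-coded table.</p>

```python
VIEW_BASE = "https://knb.ecoinformatics.org/#view/"

# Hypothetical lookup from a data or resource-map DOI to the DOI of the
# metadata document that describes it (taken from the example above).
DESCRIBED_BY = {
    "doi:10.5063/AA/wtyburczy.30.1": "doi:10.5063/AA/nceas.227.15",
}


def redirect_target(doi: str) -> str:
    """Return the view URL that a DOI registration should point to.

    Metadata DOIs point to their own view page; data and resource-map
    DOIs point to the view page of their describing metadata.
    """
    metadata_doi = DESCRIBED_BY.get(doi, doi)
    return VIEW_BASE + metadata_doi


print(redirect_target("doi:10.5063/AA/nceas.227.15"))
print(redirect_target("doi:10.5063/AA/wtyburczy.30.1"))
```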
<p>Also, when a user updates metadata for a package (but doesn't change the data), the DOI redirect for the data will need to be updated to point to the new metadata. Let's verify that this is happening automatically in Metacat.</p>
Task #6040 (New): Metacat-index does not handle <references> https://projects.ecoinformatics.org/ecoinfo/issues/6040 2013-07-26T00:01:50Z ben leinfelder leinfelder@nceas.ucsb.edu
<p>I indexed a document from EVOS that uses a reference for a creator rather than the details of the person:<br /><pre>
<creator><references>1359152217358</references></creator>
</pre><br />But in the index it shows up as "||" instead of following the reference back to the id where it was declared:<br /><pre>
<associatedParty id="1359152217358">...
</pre></p>
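<p>Following a reference back to its declaration could look roughly like the sketch below. It is written in Python with the standard library's ElementTree, not in the language or code of metacat-index. The XML fragment is a simplified, namespace-free stand-in for EML, the name "Jane Doe" is invented, and <code>resolve</code> is a hypothetical helper.</p>

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment; real EML uses namespaces and has
# considerably more structure.
DOC = """
<dataset>
  <creator><references>1359152217358</references></creator>
  <associatedParty id="1359152217358">
    <individualName><givenName>Jane</givenName><surName>Doe</surName></individualName>
  </associatedParty>
</dataset>
"""


def resolve(element, root):
    """Follow a <references> child to the element that declares that id."""
    ref = element.find("references")
    if ref is None or not (ref.text or "").strip():
        return element  # nothing to follow; the element carries its own content
    target_id = ref.text.strip()
    for candidate in root.iter():
        if candidate.get("id") == target_id:
            return candidate
    raise LookupError(f"no element declares id {target_id!r}")


root = ET.fromstring(DOC)
creator = resolve(root.find("creator"), root)
sur_name = creator.findtext("individualName/surName")
print(creator.tag, sur_name)
```

<p>An indexer that resolved the reference first would then read the name fields from the declaring element instead of producing empty values.</p>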
<p><a class="external" href="http://evos.nceas.ucsb.edu/evos/metacat/df35c.9.14/default">http://evos.nceas.ucsb.edu/evos/metacat/df35c.9.14/default</a></p>
Bug #3835 (In Progress): design and implement OAI-PMH compliant harvest subsystem https://projects.ecoinformatics.org/ecoinfo/issues/3835 2009-02-24T02:06:53Z Matt Jones jones@nceas.ucsb.edu
<p>Metacat's current harvest mechanism works well but is a proprietary system. The Dryad project has proposed to implement an OAI-PMH compliant harvest subsystem for Metacat in order to allow Metacat to interact more effectively with other systems that implement this protocol. This is a tracking bug for the design and implementation of this feature. Other more detailed bugs will be filed for specific tasks. It would be useful if the final system allowed Metacat to act as both an OAI-PMH Data Provider and as an OAI-PMH Service Provider, allowing us to both serve and harvest documents from OAI-PMH servers.</p>
<p>Some issues to consider and discuss:<br />1) lack of record authorization mechanisms in OAI-PMH. Metacat currently allows harvest with access controls on harvested records. Reverting to a purely OAI-PMH system would eliminate this capability that is used by many of our harvest clients (especially for data, but somewhat for metadata as well). So the design needs to consider a hybrid that allows both public records to be exposed through OAI-PMH and restricted records to be exposed through a protocol like Metacat's that supports access control. What is our design goal here?</p>
<p>2) A corollary of (1) is how to determine who is allowed to update a given record. Does OAI-PMH assume providers always originate from a constant URL endpoint in order to get around authenticating data providers? This is probably not reasonable for even short periods of time (a few years). A number of sites change domain names over short periods of time, and the harvester needs to be able to adjust to these changes, update endpoints, and still handle record replacement. Maybe this is a non-issue if PMH allows provider endpoints to be updated.</p>
<p>3) Date-based change detection in OAI-PMH versus GUID-based versioning in metacat. How should these be reconciled? If a PMH harvest occurs every ten days, but a metadata document is revised three times in that interval, does OAI-PMH only get the most recent version? How are the other versions archived and made accessible over time?</p>
<p>4) Data objects. The Metacat harvester allows one to transfer objects of any type, which is used to harvest both metadata objects of various formats (e.g., EML and FGDC) as well as the associated data objects. Each of these objects has its own unique identifier. How would this be handled under OAI-PMH?</p>
<p>A nice background set of slides is here:<br /><a class="external" href="http://www.oaforum.org/otherfiles/berl_oai-tutorial_e.ppt">http://www.oaforum.org/otherfiles/berl_oai-tutorial_e.ppt</a></p>
Bug #3402 (In Progress): internal dtds are not handled https://projects.ecoinformatics.org/ecoinfo/issues/3402 2008-06-19T18:52:33Z Chad Berkley berkley@nceas.ucsb.edu
<p>XML documents with internal DTDs are not handled by metacat. The internal dtd entity callback in the sax parser is blank and doesn't do anything when presented with an internal dtd.</p>
Bug #3396 (In Progress): Enable event notification feature https://projects.ecoinformatics.org/ecoinfo/issues/3396 2008-06-14T17:34:35Z Chris Jones cjones@nceas.ucsb.edu
<p>We would like to propose some changes to Metacat's event logging <br />feature to extend the functionality and provide a notification feature <br />that alerts data set owners and/or interested parties of downloads and <br />other events. We plan on prototyping the changes, and would like <br />input and suggestions from other metacat developers on the features <br />and implementation.</p>
<p>For an email notification system (or other, such as RSS) to work, it <br />would require a mechanism for the end user to 'subscribe' to <br />notifications based on events. In brainstorming this, we thought that <br />the subscription could be based on, perhaps, a hand chosen <br />notification list of packageIds by data set or data set group (e.g. <br />'notify me about events on: PISCO intertidal/subtidal/physical ocean/ <br />data packages' ...). Expressing these groupings might be done via a <br />pathquery document or a cached query that produces a packageId list. <br />Suggestions are welcome on the best method to associate a data package <br />docid list and an email address of a person to be notified.</p>
<p>The information that's logged in metacat's access_log table is <br />sufficient for general reporting:</p>
<p>- registered user LDAP DN<br /> user name<br /> affiliated organization name<br />- event date/time stamp<br />- event type<br />- docid<br />(However, in building an email [or an RSS feed], the data package <br />title would be a more friendly way of displaying which package was <br />downloaded, etc.)</p>
<p>The changes to metacat would also likely include a mechanism to <br />register an event listener that monitors changes to the model backed <br />by the access_log table. For instance, a researcher might post the <br />following to metacat:</p>
<p>action=monitor&\<br />username=uid=rcore,o=PISCO,dc=ecoinformatics,dc=org&\<br />qformat=email&\<br />event=read&\<br />query=< the pathquery document that produces a package list ></p>
<p>By doing so, this action would register the listener, and the listener <br />would provide a callback used to handle the event notification. At <br />the moment, only metacat administrators have access to the logging <br />information via the getlog action.</p>
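<p>The register-then-callback flow described above can be sketched as a small observer pattern. This is a Python illustration under stated assumptions, not a proposal for Metacat's actual classes: <code>AccessEvent</code>, <code>EventMonitor</code>, and the docids are hypothetical, and the callback appends to a list where a real listener would send an email.</p>

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class AccessEvent:
    """One logged event: who did what to which document."""
    principal: str
    event_type: str
    docid: str


Callback = Callable[[AccessEvent], None]


class EventMonitor:
    """Hypothetical registry of listeners keyed by event type."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Tuple[FrozenSet[str], Callback]]] = {}

    def register(self, event_type: str, docids: Iterable[str], callback: Callback) -> None:
        # A subscription is an event type plus the package list it applies to.
        self._listeners.setdefault(event_type, []).append((frozenset(docids), callback))

    def publish(self, event: AccessEvent) -> None:
        # Notify only the listeners whose event type and docid list match.
        for docids, callback in self._listeners.get(event.event_type, ()):
            if event.docid in docids:
                callback(event)


sent: List[str] = []
monitor = EventMonitor()
monitor.register("read", {"pisco.1.1"}, lambda e: sent.append(f"{e.docid} was read by {e.principal}"))

user = "uid=someone,o=PISCO,dc=ecoinformatics,dc=org"
monitor.publish(AccessEvent(user, "read", "pisco.1.1"))    # matches
monitor.publish(AccessEvent(user, "read", "other.2.1"))    # wrong docid
monitor.publish(AccessEvent(user, "update", "pisco.1.1"))  # wrong event type
print(sent)
```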
<p>Once someone is registered to monitor events, metacat would have to <br />then provide notification over specific protocols. The notification <br />process may be easiest if metacat includes an SMTP send-only server, <br />such as Aspirin, an embeddable SMTP server.</p>
<p><a class="external" href="https://aspirin.dev.java.net/">https://aspirin.dev.java.net/</a></p>
<p>There are other push mechanisms that could be used (like RSS), but the <br />researchers we work with specifically asked for email-based <br />notification.</p>
<p>We'll enter a placeholder bugzilla report to keep track of this <br />feature, but thought that people would have suggestions on both the <br />design and implementation before we get started.</p>
<p>Please let us know what you think.</p>
<p>Rex, Chris, Mike, Jordan</p>
Bug #3367 (New): Harvester stores passwords in clear text https://projects.ecoinformatics.org/ecoinfo/issues/3367 2008-06-05T20:18:24Z Chad Berkley berkley@nceas.ucsb.edu
<p>The harvester stores the user's password in clear text in the database. Passwords need to be stored as md5s or use some other secure form of encryption.</p>
Bug #3142 (New): metacat client uses in-memory buffer for posting data https://projects.ecoinformatics.org/ecoinfo/issues/3142 2008-02-08T19:41:52Z Matt Jones jones@nceas.ucsb.edu
<p>The size of XML files (and probably data files) that can be sent to metacat is memory limited in client applications because the MetacatClient implementation assumes the payload can be loaded into a memory buffer before it is sent. This is done to calculate the size of the payload before POSTing it. We need new insert(), update(), and upload() methods that take a size parameter so that the Reader or InputStream can be streamed directly over the http connection instead of being accumulated in an in-memory buffer.</p>
<p>We have code that does this in Morpho already using Apache's httpclient library, but this should make its way into MetacatClient. With JDK after 1.5.x, Sun's http protocol handler now supports streaming POSTs, but you have to set up a separate HttpURLConnection with a new protocol handler and call setFixedLengthStreamingMode(). See:<br /> <a class="external" href="http://java.sun.com/j2se/1.5.0/docs/api/java/net/HttpURLConnection.html#setFixedLengthStreamingMode(int">http://java.sun.com/j2se/1.5.0/docs/api/java/net/HttpURLConnection.html#setFixedLengthStreamingMode(int</a></p>
<p>This would be an alternative to using httpclient, but probably still requires registering a newly configured protocol handler.</p>
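<p>The underlying idea, knowing the payload size up front so the body can be sent in fixed-size chunks rather than buffered whole, is sketched below in Python. This is not MetacatClient code and does not use the Java API named above; <code>payload_size</code> and <code>iter_chunks</code> are hypothetical helpers, and an in-memory stream stands in for a file so the example is self-contained.</p>

```python
import io
import os


def payload_size(stream) -> int:
    """Size of a seekable binary stream, found by seeking rather than reading."""
    start = stream.tell()
    end = stream.seek(0, os.SEEK_END)
    stream.seek(start)  # restore the position so the body can still be read
    return end - start


def iter_chunks(stream, chunk_size=8192):
    """Yield the stream in pieces so only one chunk is held in memory at a time."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in payload; a real client would open the document file in binary mode.
body = io.BytesIO(b"<eml>" + b"x" * 20000 + b"</eml>")
size = payload_size(body)        # would be sent as the Content-Length
chunks = list(iter_chunks(body))  # a real client would write each chunk to the socket
print(size, len(chunks))
```

<p>With the size supplied by the caller, as the proposed insert(), update(), and upload() signatures would allow, the client never needs to accumulate the payload just to measure it.</p>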
<p>We also may have trouble with Metacat, because it also reads data using a string, as described in bug # 1122.</p>
Bug #2155 (In Progress): Metacat Performance: Rewrite the xml_nodes queries https://projects.ecoinformatics.org/ecoinfo/issues/2155 2005-07-14T01:17:37Z Saurabh Garg sgarg@nceas.ucsb.edu
<p>From Matt's email...</p>
<p>Rewrite the xml_nodes queries. In general we use the IN clause a lot<br />which is less than efficient. We need to evaluate how our current<br />queries are working and rewrite them. With some systematic work we can<br />probably come up with some similar ideas for improvements.</p>
Bug #1879 (New): Metacat Performance: Summary https://projects.ecoinformatics.org/ecoinfo/issues/1879 2005-01-18T21:42:07Z Saurabh Garg sgarg@nceas.ucsb.edu
<p>These are notes based on the changes I made to the Metacat source to improve<br />performance. I was not able to make the changes given below due to lack of time<br />and because these changes would require more thorough testing.</p>
<p>1. xml_index is a large table and most of the time we are searching for paths<br />which are needed by the web interface and Morpho for displaying the results. So<br />it might be a good idea to create a separate table similar to the xml_index table<br />which contains only some predefined paths. For the current knb skin and morpho<br />this table would have about 1/200th the number of records that xml_index has<br />right now. The code that would need to be modified would include both insertion<br />and deletion of documents.</p>
<p>2. For searching data in particular given paths (e.g. geographic query) the<br />current query uses both xml_index and xml_nodes. This can be improved by just<br />using xml_index table which has nodedata in it. But there is a lot of repetition<br />of data in xml_index table. So it has to be tested and checked if this would<br />result in better performance or otherwise. This would require rewriting<br />QueryTerm.java.</p>
Bug #1542 (New): SQL Server support broken https://projects.ecoinformatics.org/ecoinfo/issues/1542 2004-04-30T15:35:59Z Matt Jones jones@nceas.ucsb.edu
<p>Support for the MS SQL Server database was maintained in versions prior to 1.3<br />of metacat. Now the xmltables-sqlserver.sql and the associated<br />upgrade*-sqlserver.sql are either not up to date or are missing entirely. Need<br />to port the database changes to SQL Server and test all functions, including<br />upgrades from 1.3 to 1.4 before releasing 1.4.</p>
Bug #421 (In Progress): create simple turnkey installer for metacat Phase II https://projects.ecoinformatics.org/ecoinfo/issues/421 2002-02-13T18:32:47Z Chad Berkley berkley@nceas.ucsb.edu
<p>We need to use the previously prototyped metacat installer to build a robust,<br />one-click installer for metacat that includes Tomcat, Ant, Metacat, PostgreSQL and<br />any other tools that are necessary.</p>
<p>We should do this for the next release of Metacat.</p>
Bug #213 (New): transaction support for packages https://projects.ecoinformatics.org/ecoinfo/issues/213 2001-04-09T22:35:17Z Matt Jones jones@nceas.ucsb.edu
<p>Need to build in transaction support for packages. A client should be able to<br />insert (or update) a bunch of components of a package and be sure that they all<br />succeed or all fail. This is especially important if we allow submissions as<br />"jar" files or otherwise. Still need to be able to insert individual components<br />though.</p>
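<p>The all-or-nothing behaviour requested here is what a database transaction provides. The sketch below shows it in Python with an in-memory SQLite database; it is an illustration only, and the <code>documents</code> table, its columns, and <code>insert_package</code> are hypothetical rather than Metacat's schema or API.</p>

```python
import sqlite3


def insert_package(conn, components):
    """Insert every (docid, content) pair of a package, or none of them."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.executemany(
            "INSERT INTO documents (docid, content) VALUES (?, ?)", components
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (docid TEXT PRIMARY KEY, content TEXT NOT NULL)")

# A valid two-component package is stored as a unit.
insert_package(conn, [("pkg.1.1", "<eml/>"), ("pkg.2.1", "a,b,c")])

try:
    # The second component is invalid (NULL content), so the first one
    # must not persist either.
    insert_package(conn, [("pkg.3.1", "<eml/>"), ("pkg.4.1", None)])
    failed = False
except sqlite3.IntegrityError:
    failed = True

stored = [row[0] for row in conn.execute("SELECT docid FROM documents ORDER BY docid")]
print(failed, stored)
```

<p>Inserting an individual component remains possible in this scheme: it is simply a package of one.</p>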