Metacat: Issues (Ecoinformatics Redmine)
https://projects.ecoinformatics.org/ecoinfo/ (updated 2016-11-22T23:10:50Z)
Bug #7160 (Closed): replicationPolicy missing numberReplicas and replicationAllowed attributes
https://projects.ecoinformatics.org/ecoinfo/issues/7160 | 2016-11-22T23:10:50Z | Matt Jones <jones@nceas.ucsb.edu>
<p>While technically optional, the ReplicationPolicy class in SystemMetadata is not useful without the replicationAllowed and numberReplicas attributes. We encountered a bug in the R client where the parser assumed those attributes would be present (<a class="external" href="https://github.com/ropensci/datapack/issues/63">https://github.com/ropensci/datapack/issues/63</a>). We fixed that on the client side, but it would also be useful for Metacat to generate ReplicationPolicy instances with these attributes set. This is handled in SystemMetadataFactory.getDefaultReplicationPolicy().</p>

Bug #6662 (Closed): Metacat fails large-file upload
https://projects.ecoinformatics.org/ecoinfo/issues/6662 | 2015-02-06T06:53:40Z | Matt Jones <jones@nceas.ucsb.edu>
<p>Metacat seems to have a hard limit set on file upload size, at least for the DataONE MN.create() API. I attempted to call create() on a 4GiB file, which produced the error below in the logs.</p>
<p>Looking into the code, for Metacat 2.4.2, it appears the size limit is hardcoded on line 677 of D1ResourceHandler.java:</p>
<pre><code>MultipartRequestResolver mrr =
    new MultipartRequestResolver(tmpDir.getAbsolutePath(), 1000000000, 0);</code></pre>
<p>To fix this, we should raise the limit to a size that accommodates typical multi-gigabyte files. At a minimum, the limit should be configurable rather than hard-coded.</p>
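A minimal sketch of making the limit configurable: read it from metacat.properties, falling back to the old hard-coded value when the property is absent or malformed. The property name <code>dataone.max.upload.size</code> is an assumption for illustration, not an actual Metacat property.

```java
import java.util.Properties;

// Sketch only: the "dataone.max.upload.size" property key is hypothetical.
public class UploadLimit {
    static final long DEFAULT_LIMIT = 1000000000L; // former hard-coded value

    static long maxUploadSize(Properties props) {
        String raw = props.getProperty("dataone.max.upload.size");
        if (raw == null) {
            return DEFAULT_LIMIT; // property not set: keep old behavior
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            return DEFAULT_LIMIT; // malformed value: fall back rather than fail
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("dataone.max.upload.size", "5368709120"); // 5 GiB
        System.out.println(maxUploadSize(props));
    }
}
```

The configured value would then be passed to the MultipartRequestResolver constructor in place of the literal 1000000000.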
<p>The error produced was:</p>
<pre><code>org.dataone.service.exceptions.ServiceFailure: Could not resolve multipart files: the request was rejected because its size (1000001678) exceeds the configured maximum (1000000000)
    at edu.ucsb.nceas.metacat.restservice.D1ResourceHandler.collectMultipartFiles(D1ResourceHandler.java:683)
    at edu.ucsb.nceas.metacat.restservice.MNResourceHandler.putObject(MNResourceHandler.java:1381)
    at edu.ucsb.nceas.metacat.restservice.MNResourceHandler.handle(MNResourceHandler.java:255)
    at edu.ucsb.nceas.metacat.restservice.D1RestServlet.doPost(D1RestServlet.java:84)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at edu.ucsb.nceas.metacat.restservice.D1URLFilter.doFilter(D1URLFilter.java:48)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:193)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)</code></pre>

Bug #6136 (Closed): files left open causes too many file descriptors on OS
https://projects.ecoinformatics.org/ecoinfo/issues/6136 | 2013-10-09T20:03:37Z | Matt Jones <jones@nceas.ucsb.edu>
<p>Metacat writes temp files to disk, and in the process has been failing to close file handles. Over time, especially with operations that touch many files, the number of file handles in use by Metacat increases and eventually exceeds the operating system's hard limit, causing exceptions when Metacat tries to open any additional files. We need to be sure to close all file handles properly after use.</p>

Bug #5938 (Closed): sitemap format is deprecated
https://projects.ecoinformatics.org/ecoinfo/issues/5938 | 2013-05-22T02:34:25Z | Matt Jones <jones@nceas.ucsb.edu>
<p>The sitemap format used by Metacat has been deprecated, and should be updated to the current release (0.9) as published by <a class="external" href="http://sitemaps.org">http://sitemaps.org</a>.</p>

Bug #5929 (Closed): replication update action times out
https://projects.ecoinformatics.org/ecoinfo/issues/5929 | 2013-04-30T21:35:33Z | Matt Jones <jones@nceas.ucsb.edu>
<p>With large database sizes, the replication "update" action times out with normal settings for HTTP timeouts. On DataONE CNs, this action can take more than 4 minutes. I traced this down to the SQL query used to find deleted documents:</p>
<pre><code>select distinct docid from xml_revisions
where docid not in (select docid from xml_documents)
  and server_location = 1;</code></pre>
<p>which takes an excessive amount of time because it materializes a large table. See the DataONE ticket for details (<a class="external" href="https://redmine.dataone.org/issues/3740">https://redmine.dataone.org/issues/3740</a>).</p>

Story #5811 (Closed): Redesign KNB look and feel
https://projects.ecoinformatics.org/ecoinfo/issues/5811 | 2013-01-24T20:28:28Z | ben leinfelder <leinfelder@nceas.ucsb.edu>
<p>Not necessarily a "Metacat" bug, but we will likely want to include the KNB skin in Metacat with an updated look and feel.</p>
<p>We may even be rethinking the entire skin-based approach - discussion to ensue!</p>

Bug #5531 (Resolved): Remove/make optional the DataONE MN registration that occurs during Metacat...
https://projects.ecoinformatics.org/ecoinfo/issues/5531 | 2011-11-04T17:26:31Z | ben leinfelder <leinfelder@nceas.ucsb.edu>
<p>Right now when I [re]configure Metacat, it hits the DataONE CN in an attempt to register as a MN. This should be optional.</p>

Bug #1984 (Resolved): add support for LSID identifiers
https://projects.ecoinformatics.org/ecoinfo/issues/1984 | 2005-02-18T01:50:30Z | Matt Jones <jones@nceas.ucsb.edu>
<p>Metacat currently supports identifiers of the form 'scope.id.revision' with a 'system' attribute in the metadata. We need to modify Metacat to support the Life Science Identifier (LSID) specification. LSIDs have the form:</p>
<p><code>urn:lsid:authority:namespace:localidentifier:revision</code></p>
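For illustration, the LSID form might be mapped onto Metacat's scope.id.revision docid scheme roughly as follows; the class and method names here are hypothetical, and how the authority component is handled is an open design question, not something this sketch resolves.

```java
// Illustration only: maps an LSID onto Metacat's scope.id.revision form.
// Not Metacat's actual code; discarding the authority is an assumption.
public class LsidMapper {
    /** Convert urn:lsid:authority:namespace:localid:revision to namespace.localid.revision. */
    static String toDocid(String lsid) {
        String[] parts = lsid.split(":");
        if (parts.length != 6 || !parts[0].equalsIgnoreCase("urn")
                || !parts[1].equalsIgnoreCase("lsid")) {
            throw new IllegalArgumentException("not an LSID: " + lsid);
        }
        // parts[2] is the authority; the remaining parts map to scope.id.revision
        return parts[3] + "." + parts[4] + "." + parts[5];
    }

    public static void main(String[] args) {
        System.out.println(toDocid("urn:lsid:knb.ecoinformatics.org:knb:42:3"));
    }
}
```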
<p>The LSID form maps directly onto our existing scheme, so there should be a one-to-one correspondence <strong>except</strong> that our existing scheme fails to store the EML system attribute. To support LSIDs fully, we need to:</p>
<p>1) Accept EML documents that use an LSID in their packageId or other Id fields<br />2) Accept LSIDs in the docid input parameters to the Metacat interfaces, including insert, update, delete, and read, among others<br />3) Implement an LSID resolver associated with Metacat that allows the standard LSID web services to respond with information about the document<br />4) Allow LSIDs to be placed as identifiers within the "url" field of EML to reference the data objects that are associated with metadata documents</p>
<p>We probably also need to:<br />5) Allow data objects to be inserted with LSID identifiers<br />6) Allow LSID resolver calls for a dataset id to locate the metadata associated with the data file, in order to properly respond to the resolver request</p>
<p>The last point requires some new information to be tracked, because right now we maintain the linkage between metadata and data only in the EML documents; we probably need to extract this linkage when ingesting an EML document and store it for later retrieval.</p>

Bug #1658 (Resolved): tracking bug for 1.4.0 release
https://projects.ecoinformatics.org/ecoinfo/issues/1658 | 2004-08-20T20:00:48Z | Matt Jones <jones@nceas.ucsb.edu>
<p>This is the tracking bug for last-minute details for the 1.4.0 release of Metacat. Before releasing, we need to:</p>
<p>1) Review and revise documentation, including installation instructions<br />2) Check that the harvester code is properly integrated and documented<br />3) Create a function for re-indexing xml_nodes into xml_index<br />3a) Make sure the new xml_index with the larger path column is created in each db<br />4) Finish testing access control support for 2.0.1<br />5) Change use_xmlindex to 'true'<br />6) Develop a test case for inserting and reading both an EML 2.0.0 doc and an EML 2.0.1 doc (the access control tests may satisfy this)<br />7) Update README with the appropriate contributors list</p>

Bug #1452 (In Progress): dtd filenames clash if reused for multiple PUBLIC identifiers
https://projects.ecoinformatics.org/ecoinfo/issues/1452 | 2004-04-05T23:44:29Z | Matt Jones <jones@nceas.ucsb.edu>
<p>Problem reported by Rod Spears:</p>
<p>Ok, this is partially intended behavior. Metacat takes the following approach to establishing the relationship between a PUBLIC identifier/namespace and an associated DTD or schema:</p>
<pre><code>1) When a document is submitted, check its PUBLIC id/namespace
   a) if it is not registered, try to retrieve the DTD from the
      passed-in parameters, from the provided SYSTEM identifier, or
      from an xsi:schemaLocation. If the schema is obtained, cache it
      and record its location and the public identifier. Fail with an
      error if the schema can't be obtained.
   b) if it is already registered, look up the cached version of the
      schema and use it for validation, ignoring any data the user
      passes in.</code></pre>
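The register-or-lookup behavior above can be sketched as follows; this is a minimal in-memory stand-in for Metacat's actual DTD/schema handling (the class name and the use of a plain map as the cache are assumptions for illustration).

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the registration logic described above; Metacat's real
// implementation caches schema files on disk, not in a map.
public class SchemaRegistry {
    private final Map<String, String> cache = new HashMap<>(); // public id -> cached location

    /** Returns the schema location to validate against, registering it on first use. */
    String resolve(String publicId, String suppliedLocation) {
        String cached = cache.get(publicId);
        if (cached != null) {
            // (b) already registered: use the cached copy, ignore what the user passed in
            return cached;
        }
        if (suppliedLocation == null) {
            // (a) not registered and nothing to retrieve it from: fail
            throw new IllegalStateException("no schema available for " + publicId);
        }
        // (a) not registered: retrieve (elided here), cache, and record the location
        cache.put(publicId, suppliedLocation);
        return suppliedLocation;
    }

    public static void main(String[] args) {
        SchemaRegistry reg = new SchemaRegistry();
        System.out.println(reg.resolve("-//ecoinformatics.org//eml-2.0.1//EN", "eml.xsd"));
        // A later submission supplying a different location still gets the cached copy:
        System.out.println(reg.resolve("-//ecoinformatics.org//eml-2.0.1//EN", "other.xsd"));
    }
}
```

Note that keying the on-disk cache by the supplied filename rather than by the public identifier is exactly what produces the collision described in this bug.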
<p>This means that the first submitted document with a given type determines the DTD/schema used for validation for all subsequent documents submitted as that type. This allows an administrator to pre-register the document types that are important to them and be sure that any submitted documents are valid with respect to the schema they provided. Metacat ships with several pre-registered schemas and DTDs for EML.</p>
<p>So, your issue is this: the first time you registered the DTD, it uploaded the ecogridregistry.205.22.dtd file to Metacat's dtd cache. Later, when you tried to upload a new document using a different public ID but the same system ID, Metacat tried to save the file ecogridregistry.205.22.dtd but found that it already existed in the dtd cache, so it couldn't. This is a bug. There is no reason we must store the DTD under exactly the filename passed in to us, so we should gracefully rename the DTD file when its name is already in use. This hasn't cropped up before because we haven't had people using the same DTD for different PUBLIC identifiers. You can work around it by simply renaming your DTD (to anything other than its current name) and then resubmitting. I'll file this as yet another bug -- yikes.</p>

Bug #1451 (Resolved): null returndoctype fails to return all documents
https://projects.ecoinformatics.org/ecoinfo/issues/1451 | 2004-04-05T23:23:26Z | Matt Jones <jones@nceas.ucsb.edu>
<p>When the pathquery "returndoctype" field is omitted, Metacat is supposed to return all matching documents. In fact, under version 1.3.1 no documents are returned, because Metacat filters out all documents when the returndoctype list is empty or null. Fixing this will require changing the logic in the DBQuery module to include results even when the returndoctype field is not present.</p>

Bug #1427 (Resolved): xml_index constrains depth of paths that can be inserted
https://projects.ecoinformatics.org/ecoinfo/issues/1427 | 2004-03-30T22:20:05Z | Matt Jones <jones@nceas.ucsb.edu>
<p>When an XML document contains a deeply nested structure, Metacat accepts the document for storage in xml_nodes, but during the subsequent indexing phase it throws an exception, because the composite paths to the deep nodes are too long to fit in the space allocated for the path column in the xml_index table. This column was limited to a few hundred characters so that it is indexable (Oracle had a limit on the total indexable width of columns).</p>
<p>These problems were discovered and reported by Wade Sheldon (GCE LTER) when he<br />submitted EML documents with fully filled out taxonomic coverage entries. We<br />definitely need to support realistically filled out EML documents.</p>
<p>So, two possible solutions:<br />1) make the column much wider<br />-- this is a partial solution, because the column still might not be big enough for very deep docs or docs with long element names<br />-- if it is wider, it may not be indexable, which is why the column exists<br />2) eliminate the dependency on the xml_index table altogether<br />-- the recursive search needed isn't that much slower, and may not be slower at all as we tune the database<br />-- insert/update/delete should be MUCH faster<br />-- simpler database structure</p>
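Option (2) relies on being able to reconstruct a node's path at query time instead of reading a precomputed path from xml_index. A minimal sketch of that idea, using simplified in-memory stand-ins for the parent/name columns of xml_nodes rather than real SQL:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of option (2): derive a node's full path by walking parent links,
// as the real work would be done with recursive joins against xml_nodes.
public class PathWalker {
    private final Map<Integer, Integer> parent = new HashMap<>(); // nodeid -> parent nodeid (0 = root)
    private final Map<Integer, String> name = new HashMap<>();    // nodeid -> element name

    void addNode(int id, int parentId, String elementName) {
        parent.put(id, parentId);
        name.put(id, elementName);
    }

    /** Build the full path for a node by recursing up through its ancestors. */
    String pathOf(int id) {
        Integer p = parent.get(id);
        String prefix = (p == null || p == 0) ? "" : pathOf(p);
        return prefix + "/" + name.get(id);
    }

    public static void main(String[] args) {
        PathWalker w = new PathWalker();
        w.addNode(1, 0, "eml");
        w.addNode(2, 1, "dataset");
        w.addNode(3, 2, "title");
        System.out.println(w.pathOf(3)); // /eml/dataset/title
    }
}
```

Because the path is derived on demand, no column has to hold the full path string, so arbitrarily deep documents impose no width limit.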
<p>We have decided to pursue (2) above because of the advantages listed. Rather than completely removing the xml_index code, we are going to make its use optional, but ship with it turned off by default.</p>

Bug #1235 (Resolved): enable passthrough parameters to support stylesheet params
https://projects.ecoinformatics.org/ecoinfo/issues/1235 | 2003-12-10T09:22:21Z | Matt Jones <jones@nceas.ucsb.edu>
<p>Many different skins for Metacat could take advantage of custom parameters in the stylesheets. For example, the OBFS registry needs to add Edit and Delete buttons to the resultset listing. A simple way to do this is to pass parameters through Metacat into the stylesheets to control the behavior of the rendered output. This is currently hindered by the DBQuery.createSQuery() function, because it interprets all unknown parameters as XPaths to be written as additional constraints in an squery. We need to partially circumvent this feature in order for passthrough stylesheet parameters to work.</p>

Bug #1230 (Resolved): move metacat.properties out of jar file
https://projects.ecoinformatics.org/ecoinfo/issues/1230 | 2003-12-05T20:56:06Z | Matt Jones <jones@nceas.ucsb.edu>
<p>The current configuration file for Metacat (metacat.properties) is installed inside the metacat.jar JAR file. This makes changing the configuration difficult for most users. We need to move it out of the jar, probably to a location like ${context}/WEB-INF/metacat.properties. I have started code to accomplish this change.</p>

Bug #1137 (Resolved): add a metacat-info action
https://projects.ecoinformatics.org/ecoinfo/issues/1137 | 2003-08-28T17:53:16Z | Chad Berkley <berkley@nceas.ucsb.edu>
<p>I think we need to add a metacat-info action so that you can send a request to Metacat and it will print selected properties from the properties file, as well as the actual Metacat version that is running. The version is probably the most important piece of information, but other things that could be returned include the database name, the JDBC connection string, etc. This would be very useful for debugging.</p>
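A rough sketch of what the response body for such an action might look like; the class name, property keys, and key=value output format here are all illustrative assumptions, not Metacat's actual API.

```java
import java.util.Properties;

// Sketch only: "database.name" and "jdbc.url" are hypothetical property keys.
public class MetacatInfo {
    static String infoResponse(String version, Properties props) {
        StringBuilder sb = new StringBuilder();
        sb.append("metacat.version=").append(version).append('\n');
        // Echo back only a fixed whitelist of properties, never the whole file,
        // since the properties file may contain credentials.
        for (String key : new String[] {"database.name", "jdbc.url"}) {
            String value = props.getProperty(key);
            if (value != null) {
                sb.append(key).append('=').append(value).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("database.name", "metacat");
        System.out.print(infoResponse("1.4.0", props));
    }
}
```

A servlet action handler would simply write this string as the response to a `?action=metacat-info` request.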