Metacat: Issues | https://projects.ecoinformatics.org/ecoinfo/ | 2017-07-22T23:14:48Z | Ecoinformatics Redmine
Bug #7203 (In Progress): Improve D1NodeService.isAuthorized() performance
https://projects.ecoinformatics.org/ecoinfo/issues/7203 | 2017-07-22T23:14:48Z | Chris Jones (cjones@nceas.ucsb.edu)
<p>We're seeing poor performance in calls to <code>D1NodeService.isAuthorized()</code> on <a class="external" href="https://arcticdata.io">https://arcticdata.io</a>. When Metacat is under light load (fewer than 10 requests per second), calls to <code>isAuthorized()</code> take up to 35 seconds to return either an <code>HTTP 200</code> response or an <code>HTTP 403</code> exception.</p>
<p>Change <code>isAuthorized()</code> to check user-based authorization first and CN or MN authorization last. This should improve performance for end users, while MN-to-MN replication calls and CN administrative calls will be given slightly lower priority.</p>
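<p>A minimal sketch of the proposed check ordering, not Metacat's actual implementation. Only <code>userHasPermission()</code> and <code>NotAuthorized</code> are named in this issue; the class name, the <code>isCNAdmin</code> and <code>isMNAdmin</code> predicates, and the subject strings are hypothetical stand-ins. It illustrates that with the user check placed first, the CN and MN checks run only when the user check fails.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Sketch only: names other than userHasPermission are hypothetical. */
public class AuthorizationOrderSketch {

    /** Builds the checks in the proposed order: user first, then CN, then MN. */
    public static LinkedHashMap<String, Predicate<String>> proposedOrder(
            Predicate<String> userHasPermission,
            Predicate<String> isCNAdmin,
            Predicate<String> isMNAdmin) {
        LinkedHashMap<String, Predicate<String>> checks = new LinkedHashMap<>();
        checks.put("user", userHasPermission);
        checks.put("cn", isCNAdmin);
        checks.put("mn", isMNAdmin);
        return checks;
    }

    /**
     * Runs the checks in insertion order and returns the name of the first
     * one that passes, or null if none passes (the caller would then throw
     * NotAuthorized). Checks after the first success are never evaluated.
     */
    public static String firstPassing(
            LinkedHashMap<String, Predicate<String>> checks, String subject) {
        for (Map.Entry<String, Predicate<String>> check : checks.entrySet()) {
            if (check.getValue().test(subject)) {
                return check.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Predicate<String>> checks = proposedOrder(
                subject -> subject.startsWith("uid="),  // stand-in user check
                subject -> subject.equals("cn-node"),   // stand-in CN check
                subject -> subject.equals("mn-node"));  // stand-in MN check
        System.out.println(firstPassing(checks, "uid=rcore,o=PISCO"));
        System.out.println(firstPassing(checks, "mn-node"));
        System.out.println(firstPassing(checks, "unknown"));
    }
}
```

<p>The ordered map keeps the policy (which check runs first) separate from the checks themselves, so the order can be changed without touching the individual checks.</p>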
<p>Note that calls to <code>userHasPermission()</code> involve token verification using the <code>PortalCertificateManager</code> and the <code>TokenGenerator</code>. These calls may be repeatedly making a call to the CN to get the SSL certificate for verification if it is not cached. If this change doesn't significantly improve performance, look into refactoring those classes in <code>d1_portal</code> to cache and use the certificate, unless there is a verification exception, in which case we make the call to <code>fetchCertificate()</code> again, re-cache it, and attempt to re-verify the token. If it still fails, throw <code>NotAuthorized</code>.</p>
Support #6838 (In Progress): LTER user can't log in
https://projects.ecoinformatics.org/ecoinfo/issues/6838 | 2015-08-28T23:41:47Z | Jing Tao (tao@nceas.ucsb.edu)
<p>marco: ldap.lternet.edu should still work<br />[4:32pm] Jing: but why the search doesn’t work?<br />[4:32pm] Jing: and i can’t log in it from knb web page.<br />[4:34pm] marco: my guess is that the connection is trying to connect to 389, which IIRC is where startTLS initiates<br />[4:34pm] marco: port 389 is now blocked - not my decision<br />[4:34pm] Jing: aha.<br />[4:35pm] Jing: thanks, marco<br />[4:35pm] marco: if necessary, 389 can be opened for a specific IP or range<br />[4:35pm] marco: and startTLS enabled<br />[4:37pm] marco: we'll work with mark schildhauer next week to figure out the disposition of LDAP</p>
Feature #6416 (In Progress): Do not allow restrictive access control change to content with a DOI
https://projects.ecoinformatics.org/ecoinfo/issues/6416 | 2014-02-11T18:34:16Z | ben leinfelder (leinfelder@nceas.ucsb.edu)
Bug #4245 (In Progress): Harvester command line scripts don't execute
https://projects.ecoinformatics.org/ecoinfo/issues/4245 | 2009-07-13T21:55:42Z | Duane Costa (dcosta@lternet.edu)
<p>Metacat Harvester is normally launched as a Java servlet, but also has the option of being invoked manually from a pair of command-line scripts ('lib/harvester/runHarvester.bat' on Windows, 'lib/harvester/runHarvester.sh' on Linux). As of Metacat 1.9.x, execution of Metacat Harvester via the command-line scripts is not working.</p>
<p>Solution:<br /> 1. Additional dependencies need to be specified in the Java CLASSPATH:<br /> a. METACAT_LIB/log4j-1.2.12.jar<br /> b. METACAT_LIB/xalan.jar<br /> c. METACAT_LIB/postgresql-8.0-312.jdbc3.jar (for POSTGRESQL)<br /> 2. The Harvester.java class needs the following changes:<br /> a. Add support for log4j initialization in the 'main' method.<br /> b. In the 'loadProperties()' method, change the PropertyService constructor from 'PropertyService.getInstance();' to 'PropertyService.getTestInstance(configDir);' where 'configDir' is a relative path to the directory where 'metacat.properties' resides.</p>
<p>Note: The solution implemented to resolve this problem for Metacat Harvester will also be beneficial toward the implementation of the new Metacat OAI-PMH Harvester described in Bug <a class="issue tracker-1 status-2 priority-5 priority-highest" title="Bug: design and implement OAI-PMH compliant harvest subsystem (In Progress)" href="https://projects.ecoinformatics.org/ecoinfo/issues/3835">#3835</a>.</p>
Bug #4243 (In Progress): Harvester db errors due to fixed character length overflow
https://projects.ecoinformatics.org/ecoinfo/issues/4243 | 2009-07-13T16:59:04Z | Duane Costa (dcosta@lternet.edu)
<p>In a recent release of Metacat (1.9.0), the Harvester property names were<br />refactored to begin with the prefix 'harvester.'. Some of the Harvester property<br />names are used as operation codes in metacat's 'harvest_log' table,<br />'harvest_operation_code' field, which is declared with a fixed length of 30<br />characters. The 'harvester.ValidateHarvestListSuccess' code is 36 chars, which<br />exceeds the limit and results in DB errors on record insertion during a harvest.</p>
Bug #3835 (In Progress): design and implement OAI-PMH compliant harvest subsystem
https://projects.ecoinformatics.org/ecoinfo/issues/3835 | 2009-02-24T02:06:53Z | Matt Jones (jones@nceas.ucsb.edu)
<p>Metacat's current harvest mechanism works well but is a proprietary system. The Dryad project has proposed to implement an OAI-PMH compliant harvest subsystem for Metacat in order to allow Metacat to interact more effectively with other systems that implement this protocol. This is a tracking bug for the design and implementation of this feature. Other more detailed bugs will be filed for specific tasks. It would be useful if the final system allowed Metacat to act as both an OAI-PMH Data Provider and as an OAI-PMH Service Provider, allowing us to both serve and harvest documents from OAI-PMH servers.</p>
<p>Some issues to consider and discuss:<br />1) lack of record authorization mechanisms in OAI-PMH. Metacat currently allows harvest with access controls on harvested records. Reverting to a purely OAI-PMH system would eliminate this capability that is used by many of our harvest clients (especially for data, but somewhat for metadata as well). So the design needs to consider a hybrid that allows both public records to be exposed through OAI-PMH and restricted records to be exposed through a protocol like Metacat's that supports access control. What is our design goal here?</p>
<p>2) A corollary of (1) is how to determine who is allowed to update a given record. Does OAI-PMH assume providers always originate from a constant URL endpoint in order to get around authenticating data providers? This is probably not reasonable for even short periods of time (a few years). A number of sites change domain names over short periods of time, and the harvester needs to be able to adjust to these changes, update endpoints, and still handle record replacement. Maybe this is a non-issue if PMH allows provider endpoints to be updated.</p>
<p>3) Date-based change detection in OAI-PMH versus GUID-based versioning in metacat. How should these be reconciled? If a PMH harvest occurs every ten days, but a metadata document is revised three times in that interval, does OAI-PMH only get the most recent version? How are the other versions archived and made accessible over time?</p>
<p>4) Data objects. The Metacat harvester allows one to transfer objects of any type, which is used to harvest both metadata objects of various formats (e.g., EML and FGDC) as well as the associated data objects. Each of these objects has its own unique identifier. How would this be handled under OAI-PMH?</p>
<p>A nice background set of slides is here:<br /><a class="external" href="http://www.oaforum.org/otherfiles/berl_oai-tutorial_e.ppt">http://www.oaforum.org/otherfiles/berl_oai-tutorial_e.ppt</a></p>
Bug #3402 (In Progress): internal dtds are not handled
https://projects.ecoinformatics.org/ecoinfo/issues/3402 | 2008-06-19T18:52:33Z | Chad Berkley (berkley@nceas.ucsb.edu)
<p>XML documents with internal DTDs are not handled by metacat. The internal dtd entity callback in the sax parser is blank and doesn't do anything when presented with an internal dtd.</p>
Bug #3396 (In Progress): Enable event notification feature
https://projects.ecoinformatics.org/ecoinfo/issues/3396 | 2008-06-14T17:34:35Z | Chris Jones (cjones@nceas.ucsb.edu)
<p>We would like to propose some changes to Metacat's event logging <br />feature to extend the functionality and provide a notification feature <br />that alerts data set owners and/or interested parties of downloads and <br />other events. We plan on prototyping the changes, and would like <br />input and suggestions from other metacat developers on the features <br />and implementation.</p>
<p>For an email notification system (or other, such as RSS) to work, it <br />would require a mechanism for the end user to 'subscribe' to <br />notifications based on events. In brainstorming this, we thought that <br />the subscription could be based on, perhaps, a hand chosen <br />notification list of packageIds by data set or data set group (e.g. <br />'notify me about events on: PISCO intertidal/subtidal/physical ocean/ <br />data packages' ...). Expressing these groupings might be done via a <br />pathquery document or a cached query that produces a packageId list. <br />Suggestions are welcome on the best method to associate a data package <br />docid list and an email address of a person to be notified.</p>
<p>The information that's logged in metacat's access_log table is <br />sufficient for general reporting:</p>
<p>- registered user LDAP DN<br /> user name<br /> affiliated organization name<br />- event date/time stamp<br />- event type<br />- docid<br />(However, in building an email [or an RSS feed], the data package <br />title would be a more friendly way of displaying which package was <br />downloaded, etc.)</p>
<p>The changes to metacat would also likely include a mechanism to <br />register an event listener that monitors changes to the model backed <br />by the access_log table. For instance, a researcher might post the <br />following to metacat:</p>
<p>action=monitor&\<br />username=uid=rcore,o=PISCO,dc=ecoinformatics,dc=org&\<br />qformat=email&\<br />event=read&\<br />query=< the pathquery document that produces a package list ></p>
<p>By doing so, this action would register the listener, and the listener <br />would provide a callback used to handle the event notification. At <br />the moment, only metacat administrators have access to the logging <br />information via the getlog action.</p>
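<p>A rough sketch of the listener registration described above, under stated assumptions: every class, method, and docid name below is hypothetical and not part of Metacat. It assumes the pathquery has already been resolved to a set of docids, and that the callback stands in for whatever sends the email or RSS item.</p>

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch only: every name here is hypothetical, not Metacat's API. */
public class AccessLogMonitorSketch {

    /** One logged event, reduced to fields the access_log table records. */
    public static final class AccessEvent {
        public final String userDn;
        public final String eventType;
        public final String docid;

        public AccessEvent(String userDn, String eventType, String docid) {
            this.userDn = userDn;
            this.eventType = eventType;
            this.docid = docid;
        }
    }

    /** Callback that handles the notification (email, RSS, ...). */
    public interface EventListener {
        void onEvent(AccessEvent event);
    }

    /** One registration: an event type, a docid list, and a callback. */
    private static final class Subscription {
        final String eventType;
        final Set<String> docids;
        final EventListener listener;

        Subscription(String eventType, Set<String> docids, EventListener listener) {
            this.eventType = eventType;
            this.docids = new HashSet<>(docids); // defensive copy
            this.listener = listener;
        }

        boolean matches(AccessEvent event) {
            return eventType.equals(event.eventType) && docids.contains(event.docid);
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();

    /** Counterpart of the proposed action=monitor request. */
    public void register(String eventType, Set<String> docids, EventListener listener) {
        subscriptions.add(new Subscription(eventType, docids, listener));
    }

    /** Called when an event is logged; returns how many listeners were notified. */
    public int publish(AccessEvent event) {
        int notified = 0;
        for (Subscription subscription : subscriptions) {
            if (subscription.matches(event)) {
                subscription.listener.onEvent(event);
                notified++;
            }
        }
        return notified;
    }

    public static void main(String[] args) {
        AccessLogMonitorSketch monitor = new AccessLogMonitorSketch();
        Set<String> packageDocids = new HashSet<>();
        packageDocids.add("pisco.1.1"); // made-up docid
        monitor.register("read", packageDocids,
                event -> System.out.println(event.userDn + " read " + event.docid));
        monitor.publish(new AccessEvent("uid=someone,o=unaffiliated", "read", "pisco.1.1"));
    }
}
```

<p>Matching on a precomputed docid set keeps the per-event work small; re-running the pathquery on every logged event would likely be far more expensive.</p>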
<p>Once someone is registered to monitor events, metacat would have to <br />then provide notification over specific protocols. The notification <br />process may be easiest if metacat includes an SMTP send-only server, <br />such as Aspirin, an embeddable SMTP server.</p>
<p><a class="external" href="https://aspirin.dev.java.net/">https://aspirin.dev.java.net/</a></p>
<p>There are other push mechanisms that could be used (like RSS), but the <br />researchers we work with specifically asked for email-based <br />notification.</p>
<p>We'll enter a placeholder bugzilla report to keep track of this <br />feature, but thought that people would have suggestions on both the <br />design and implementation before we get started.</p>
<p>Please let us know what you think.</p>
<p>Rex, Chris, Mike, Jordan</p>
Bug #3383 (In Progress): Create RPM/Deb installation utilities
https://projects.ecoinformatics.org/ecoinfo/issues/3383 | 2008-06-09T16:23:24Z | Michael Daigle (daigle@nceas.ucsb.edu)
<p>Phase II of the turnkey installation project is the creation of an install utility for Linux.</p>
Bug #2841 (In Progress): ESA Registry edit feature does not work with submitted data sets
https://projects.ecoinformatics.org/ecoinfo/issues/2841 | 2007-05-14T20:59:12Z | Callie Bowdish (bowdish@nceas.ucsb.edu)
<p>I save a data set using the ESA Registry and then try to edit it (still with the registry). The following error comes up. "More occurrences of the tag dataset/access/allow found than that can be shown in the form. Please use Morpho to edit this document."</p>
<p>Jing suggests that since we now have two moderator groups, we have two access rules, which is more than the form can show.</p>
<p>The Online form will automatically add access rules. The user does not enter them in. A couple of weeks ago the new group esa-moderators was created: dc=org:cn=esa-moderators,dc=ecoinformatics,dc=org</p>
<p>Now the access rules generated look something like:<br /> ALLOW:</p>
<p>[all]<br /> uid=esaadmin,o=NCEAS,dc=ecoinformatics,dc=org<br />ALLOW:</p>
<p>[all]<br /> cn=knb-prod,o=NCEAS,dc=ecoinformatics,dc=org<br />ALLOW:</p>
<p>[all]<br /> cn=esa-moderators,dc=ecoinformatics,dc=org<br />ALLOW:</p>
<p>[read]<br />[write]<br /> uid=Bowdish1,o=unaffiliated,dc=ecoinformatics,dc=org</p>
<p>Bowdish1 is not my moderator account, but a user test account.</p>
<p>Please note that I am unsure if this feature ever worked. Most of my testing was done with my regular account with moderator rights, and we thought that the moderator could not use the edit feature online. Also, the NCEAS skin has a bug listed (2644) about the edit feature not working. I do not know whether that bug is related to the ESA problem.</p>
Bug #2180 (In Progress): Make it easier for admin to add new layers
https://projects.ecoinformatics.org/ecoinfo/issues/2180 | 2005-09-06T16:29:26Z | John Harris (harris@nceas.ucsb.edu)
<p>Currently, adding new spatial layers to the metacat map server requires a<br />sysadmin-type individual with some Unix skills. This needs to be easier so that<br />any Metacat manager can register new themes. Ideas for making this easier are:</p>
<p>2c1) manually by uploading and configuring a layer (web interface?)</p>
<p>2c2) automatically by picking a layer from Metacat</p>
Bug #1452 (In Progress): dtd filenames clash if reused for multiple PUBLIC identifiers
https://projects.ecoinformatics.org/ecoinfo/issues/1452 | 2004-04-05T23:44:29Z | Matt Jones (jones@nceas.ucsb.edu)
<p>Problem reported by Rod Spears:</p>
<p>Ok, this is partially intended behavior. Metacat takes the following attitude<br />towards establishing the relationship between a PUBLIC identifier/namespace and<br />an associated DTD or schema:</p>
<pre><code>1) When a document is submitted, check its PUBLIC id/namespace<br /> a) if it is not registered, then try to retrieve the DTD from<br /> either the passed in parameters, or from the provided<br /> SYSTEM identifier or from an xsi:schemaLocation. If schema<br /> is obtained, cache it and record its location and the public <br /> identifier. Fail with error if schema can't be obtained.<br /> b) if we already have it registered, look up the cached version of <br /> the schema and use it for validation, ignoring any data the <br /> user passes in.</code></pre>
<p>This means that the first submitted document with a given type determines the<br />DTD/schema used for validation for all subsequent documents submitted as that<br />type. This allows an administrator to pre-register several document types that<br />are important to him and be sure that any submitted documents are valid with<br />respect to the schema he provided. Metacat ships with several pre-registered<br />schemas and DTDs for EML.</p>
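<p>The registration rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not Metacat's code: the class and method names are hypothetical, the cache is reduced to an in-memory map, and retrieving the schema is reduced to accepting a location string.</p>

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: class and method names are hypothetical, not Metacat's API. */
public class SchemaRegistrySketch {

    // PUBLIC identifier or namespace -> location of the cached DTD/schema
    private final Map<String, String> cachedByPublicId = new HashMap<>();

    /**
     * Returns the schema location to validate against.
     * submittedLocation is whatever the submitter supplied (a parameter,
     * SYSTEM identifier, or xsi:schemaLocation); it may be null.
     */
    public String resolve(String publicId, String submittedLocation) {
        String cached = cachedByPublicId.get(publicId);
        if (cached != null) {
            return cached; // case (b): ignore what the user passed in
        }
        if (submittedLocation == null) {
            // case (a), failure branch: nothing to retrieve the schema from
            throw new IllegalStateException("No schema available for " + publicId);
        }
        cachedByPublicId.put(publicId, submittedLocation); // case (a): register it
        return submittedLocation;
    }

    public static void main(String[] args) {
        SchemaRegistrySketch registry = new SchemaRegistrySketch();
        // The first submission registers its DTD for this PUBLIC identifier.
        System.out.println(registry.resolve("-//example//type 1//EN", "first.dtd"));
        // A later submission of the same type is validated against the cached DTD.
        System.out.println(registry.resolve("-//example//type 1//EN", "second.dtd"));
    }
}
```

<p>Keying the cache on the PUBLIC identifier alone is what makes the first registration authoritative for every later document of that type.</p>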
<p>So, your issue is this: the first time you registered the DTD, it uploaded the<br />ecogridregistry.205.22.dtd file to metacat's dtd cache. Later, when you tried<br />to upload a new document using a different public ID but the same system ID, it<br />tried to save the file ecogridregistry.205.22.dtd but found that it already<br />existed in the dtd cache, so it couldn't. This is a bug. There's no reason<br />that we should use the identical filename as is passed in to us for the dtd<br />filename, and so we should be gracefully renaming the DTD file when a name is<br />already in use. This hasn't cropped up before because we haven't had people<br />using the same DTD for different PUBLIC identifiers. You can work around it by<br />simply renaming your DTD (to anything other than its current name) and then<br />resubmitting. I'll file this as yet another bug -- yikes.</p>
Bug #1213 (In Progress): emlbeta6 to eml2 conversion stylesheets should be relocatable
https://projects.ecoinformatics.org/ecoinfo/issues/1213 | 2003-11-20T00:23:55Z | Saurabh Garg (sgarg@nceas.ucsb.edu)
<p>The emlb6 to eml2 conversion stylesheets that are used by webmdentry are <br />presently in a temp directory. This is the directory where all the emlb6 files <br />are downloaded and converted. But the stylesheets should go in a permanent <br />directory. They are presently in the temp directory because they are not able <br />to convert emlb6 files if the emlb6 files are downloaded to any directory <br />other than the directory in which the stylesheets are stored. This is the bug <br />which needs to be fixed so that the stylesheets are able to transform the <br />emlb6 files irrespective of where they are downloaded.</p>
Bug #421 (In Progress): create simple turnkey installer for metacat Phase II
https://projects.ecoinformatics.org/ecoinfo/issues/421 | 2002-02-13T18:32:47Z | Chad Berkley (berkley@nceas.ucsb.edu)
<p>We need to use the previously prototyped metacat installer to build a robust,<br />one-click installer for metacat that includes Tomcat, Ant, Metacat, PostgreSQL and<br />any other tools that are necessary.</p>
<p>We should do this for the next release of Metacat.</p>
Bug #195 (In Progress): allow metacat to store files on multiple fs
https://projects.ecoinformatics.org/ecoinfo/issues/195 | 2001-04-09T19:53:06Z | Matt Jones (jones@nceas.ucsb.edu)
<p>Metacat currently stores files on a single file system. We need to change this so<br />that Metacat can be configured to store files on multiple file systems in case<br />space management by the administrator requires this.</p>