Metacat: Issues - Ecoinformatics Redmine (https://projects.ecoinformatics.org/ecoinfo/)
Bug #5239 (New): Difficulties configuring and viewing Metacat replication log file
https://projects.ecoinformatics.org/ecoinfo/issues/5239
2010-11-12T17:31:01Z, Duane Costa (dcosta@lternet.edu)
<p>While working with Chad to restore replication between KNB and LTER, I encountered issues with configuring and viewing the replication log file. As detailed in the email exchanges below, I configured a replication log directory and file name in the following three places:</p>
<p>1. Metacat administrative interface, 'Metacat Properties Configuration' page:</p>
<pre><code>Replication Log Directory /home/tcat/local/metacat_data/metacat/logs<br /> [The directory where replication log should be located.]</code></pre>
<p>2. metacat.properties file</p>
<pre><code>replication.logdir=/home/tcat/local/metacat_data/metacat/logs</code></pre>
<p>3. log4j.properties file</p>
<pre><code>log4j.appender.replication.File=/home/tcat/local/metacat_data/metacat/logs/metacatreplication.log</code></pre>
<p>However, Metacat seems to ignore these configuration values and instead sends the output to the Tomcat log file, 'catalina.out'.</p>
<hr />
<p>In an email to Chad Berkley on 2010-11-12, Duane Costa wrote:</p>
<pre><code>Hi Chad,<br /> .<br /> .<br /> .<br /> There's still a problem with the replication log file. I configured it<br /> exactly as described below, but the file has still not been touched since<br /> 4/26/2010. There is, however, a bunch of replication output in the Tomcat<br /> catalina.out file. Apparently, Metacat is ignoring my configuration<br /> settings and sending the output to the default location of catalina.out.<br /> This is a fairly minor issue, but I'll open a Bugzilla bug report on it so<br /> that it can be recorded.</code></pre>
<pre><code>Thanks,<br /> Duane</code></pre>
<hr />
<p>In an email to Duane Costa on 2010-11-10, Chad Berkley wrote:</p>
<p>On 11/10/10 2:19 PM, Duane Costa wrote:</p>
<blockquote>
<p>Ours is configured as follows:</p>
<p>log4j.appender.replication.File = ${replication.logfile.name}</p>
<p>On the other hand, the metacat.properties file has:</p>
<p>replication.logdir=/home/tcat/local/metacat_data/metacat/logs</p>
<p>And the Metacat administrative interface displays the following:</p>
<p>Replication Log Directory /home/tcat/local/metacat_data/metacat/logs<br />[The directory where replication log should be located.]</p>
<p>This raises a few questions. You don't have to try to answer them. I'm<br />just trying to point out that the current situation is confusing, and<br />hopefully this can be simplified in future versions of Metacat.</p>
<p>1. Do we configure a directory name or a file name? (A file name<br />according to the first setting; a directory name according to the next<br />two.)</p>
</blockquote>
<p>I'm not sure of the original intent here. It would be nice if we just gave it a directory and all of the log files were simply put there.</p>
<blockquote>
<p>2. There used to be two different replication output files:<br />'metacatreplicationerror.log', 'metacatreplication.log'. Is there now<br />only one?</p>
</blockquote>
<p>Again, not sure. Seems like there should really only be one.</p>
<blockquote>
<p>3. If log4j.properties now controls where the replication output is<br />sent, are the other two settings no longer used by the system?</p>
</blockquote>
<p>It's always controlled the output; I just think there's something wrong with the token system that isn't updating the log4j file. I think there's a bug in the admin interface where it's not updating the log4j.properties file as well as the metacat.properties file. That token is supposed to be replaced by the actual path to the file.</p>
<blockquote>
<p>4. Should '${replication.logfile.name}' in log4j.properties be manually<br />edited, or is 'replication.logfile.name' the name of a property that is<br />being set somewhere else? If the latter, where is it being controlled? I<br />don't see it in either build.properties or metacat.properties.</p>
</blockquote>
<p>It's safe to manually edit it. I just did and it worked fine.</p>
<blockquote>
<p>For now I will explicitly set the value in log4j.properties to the<br />following:</p>
<p>log4j.appender.replication.File=/home/tcat/local/metacat_data/metacat/logs/metacatreplication.log</p>
</blockquote>
<p>Yep, that should work.</p>
<p>I'm seeing a bunch of files coming from LTER now; unfortunately, there are also a lot of errors, but I'm not sure if they're critical. Here's an example:</p>
<p>knb 20101110-14:42:45: [ERROR]: ReplicationHandler.handleSingleXMLDocument - Failed to write doc knb-lter-gce.113.11 into db because The file you are trying to write already exists in metacat. Please update your version number. [ReplicationLogging]</p>
<p>It might just be trying to get all of your files and using that exception to handle the "already exists" case, but I'm not exactly sure. Let's see if it mirrors correctly when it's done.</p>
<p>chad</p>
<blockquote>
<p>Thanks,<br />Duane</p>
<p>On 11/10/2010 4:43 PM, Chad Berkley wrote:</p>
<blockquote>
<p>Check your webapps/knb/WEB-INF/log4j.properties file. It should show<br />you where the output for replication is going. I've attached the<br />log4j.properties file from knb for reference. By default, all logs<br />will be sent to catalina.out in tomcat/logs. If you configure the<br />replication log, it will go to whatever file you configure<br />(/var/metacat/logs/metacatreplication.log in the attached file).</p>
<p>chad</p>
</blockquote></blockquote>

Bug #5199 (New): OAI-PMH: Improve memory management of data provider catalog metadata
https://projects.ecoinformatics.org/ecoinfo/issues/5199
2010-10-11T16:16:44Z, Duane Costa (dcosta@lternet.edu)
<p>On October 6, 2010, Marco Fahmi wrote:</p>
<p>What scares me about this code is that it stores the whole metadata catalog<br />in memory (static member fields docTypeMap and dateMap) at class-load time,<br />which is on the first call to the OAIHandler servlet.</p>
<p>So that check for the refreshDate is checking whether the entire catalog<br />needs to be reloaded into memory! I can't see that scaling to any extent.</p>
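<p>For illustration, the static fields Marco refers to can be sketched roughly as follows. This is a simplified stand-in, not the actual OAIHandler/MetacatCatalog source; the field names come from the discussion, but the value types and sample entries are assumptions.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Rough sketch of the in-memory catalog metadata under discussion.
// Field names (docTypeMap, dateMap) come from the bug report; the
// value types and sample entries are illustrative assumptions.
public class CatalogSketch {
    // docid -> EML doctype of the document
    static final Map<String, String> docTypeMap = new HashMap<>();
    // docid -> date the document was last updated
    static final Map<String, String> dateMap = new HashMap<>();

    public static void main(String[] args) {
        // Populated once at class-load time (i.e., on the first call to
        // the OAIHandler servlet), then held in memory until a refresh.
        docTypeMap.put("knb-lter-gce.113.11", "eml://ecoinformatics.org/eml-2.1.0");
        dateMap.put("knb-lter-gce.113.11", "2010-10-01");
        System.out.println(docTypeMap.size() + " entries cached");
    }
}
```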
<hr />
<p>On October 11, 2010, Duane Costa and Mark Servilla wrote in a reply to Marco Fahmi:</p>
<p>With regard to the memory management issue -- the OAI-PMH data provider code has been tested on a repository of 375 EML documents with no apparent memory problem. We will soon be applying the code to a much larger repository (~10,000 documents), so we'll have a better sense of whether the current implementation scales. One distinction we'd like to make: when you state that "the entire catalog" is loaded into memory, note that only the documents' identifiers, their EML versions, and their revision dates are loaded into memory; the full contents of the documents themselves are not stored in memory but are instead retrieved from the Metacat database only as needed. In any case, we agree that a better memory management solution should ultimately be implemented for storing the subset of catalog metadata that is currently held in memory.</p>

Bug #5198 (New): OAI-PMH: Data provider may fail to trigger reharvest of documents
https://projects.ecoinformatics.org/ecoinfo/issues/5198
2010-10-11T16:11:47Z, Duane Costa (dcosta@lternet.edu)
<p>On Oct 6, 2010, Marco Fahmi (<a class="email" href="mailto:hani.fahmi@qut.edu.au">hani.fahmi@qut.edu.au</a>) wrote:</p>
<p>From my tests I found that the Metacat OAI-PMH Provider stores something called "refreshDate", which is the last time the XML was generated, and compares it to maxDateUpdated (the date the data package records table was last modified). If maxDateUpdated is newer than refreshDate, the OAI-PMH XML will be refreshed. I'm sure you can see the problem with this immediately, but I'll elaborate for our documentation about Morpho and Metacat.</p>
<p>Sample Scenario:</p>
<p>a. 1st October, 10 AM: A harvester/user accesses the OAI-PMH Provider URL and XML is returned for the first time; refreshDate is set to 1st October.<br />b. 1st October, 11 AM: A researcher deletes/updates/adds a data package through Morpho to Metacat, and maxDateUpdated = '1st October'.<br />c. 1st October, 12 AM: The harvester/user accesses the provider URL again and refreshDate is compared with maxDateUpdated. There is no indication of time, so the OAI-PMH XML is not refreshed since the date is the same.<br />d. 2nd October, 12 AM: The harvester/user accesses the provider URL again and refreshDate is compared with maxDateUpdated. There is still no difference between the dates, and the bug is not resolved until someone updates on this day or a future day, or Tomcat is restarted.</p>
<p>Conclusion: Any update made on the same day as the refreshDate will not be reflected, and may return a NullPointerException in the case of data package deletion, until a data package is modified/added.</p>
<p>If you change the String comparison in MetacatCatalog.shouldRefeshCatalog() to be:</p>
<pre><code>else if (maxDateUpdated.compareTo(refreshDate) >= 0) {</code></pre>
<p>Then you will get fresh data when refreshDate is equal to maxDateUpdated. Not an ideal fix, but it should get you past that bug.</p>
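<p>To make the day-granularity failure concrete, here is a minimal sketch of the two comparisons. This is a simplified, hypothetical stand-in for the real method, not the actual Metacat source.</p>

```java
// Minimal sketch of the comparison discussed above. Because the stored
// dates carry no time component, a same-day update leaves maxDateUpdated
// equal to refreshDate, and the strict '>' test never fires.
public class RefreshCheck {
    // Original behavior: refresh only when maxDateUpdated is strictly newer.
    static boolean shouldRefreshStrict(String maxDateUpdated, String refreshDate) {
        return maxDateUpdated.compareTo(refreshDate) > 0;
    }

    // Suggested workaround: also refresh when the two dates are equal.
    static boolean shouldRefreshInclusive(String maxDateUpdated, String refreshDate) {
        return maxDateUpdated.compareTo(refreshDate) >= 0;
    }

    public static void main(String[] args) {
        String refreshDate = "2010-10-01";    // catalog XML generated at 10 AM
        String maxDateUpdated = "2010-10-01"; // package updated at 11 AM, same day
        System.out.println(shouldRefreshStrict(maxDateUpdated, refreshDate));    // false: update missed
        System.out.println(shouldRefreshInclusive(maxDateUpdated, refreshDate)); // true: catalog refreshed
    }
}
```

<p>Note that the inclusive test trades the missed-update bug for spurious refreshes whenever the dates tie, which is why the author calls it "not an ideal fix".</p>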
<hr />
<p>On October 11, 2010, Duane Costa and Mark Servilla wrote in a reply to Marco Fahmi:</p>
<p>The bug relating to the sample scenario you outline is a known limitation of the OAI-PMH data provider code. It originates from the fact that Metacat stores the 'date_created' and 'date_updated' fields as type 'date', rather than type 'timestamp', in the database table that stores metadata about EML documents. This makes it difficult for the data provider to know when to re-harvest based on anything more fine-grained than the current date. We're not certain that the change you suggest would actually resolve the issue: we think it would re-load the data catalog into memory, but it wouldn't necessarily trigger the re-harvest of a document. We agree that this is a significant limitation of the data provider code and we think further analysis will be needed before a good solution can be implemented.</p>

Bug #4245 (In Progress): Harvester command line scripts don't execute
https://projects.ecoinformatics.org/ecoinfo/issues/4245
2009-07-13T21:55:42Z, Duane Costa (dcosta@lternet.edu)
<p>Metacat Harvester is normally launched as a Java servlet, but also has the option of being invoked manually from a pair of command-line scripts ('lib/harvester/runHarvester.bat' on Windows, 'lib/harvester/runHarvester.sh' on Linux). As of Metacat 1.9.x, execution of Metacat Harvester via the command-line scripts is not working.</p>
<p>Solution:</p>
<p>1. Additional dependencies need to be specified in the Java CLASSPATH:</p>
<pre><code>METACAT_LIB/log4j-1.2.12.jar<br /> METACAT_LIB/xalan.jar<br /> METACAT_LIB/postgresql-8.0-312.jdbc3.jar  (for PostgreSQL)</code></pre>
<p>2. The Harvester.java class needs the following changes:<br /> a. Add support for log4j initialization in the 'main' method.<br /> b. In the 'loadProperties()' method, change the PropertyService constructor from 'PropertyService.getInstance();' to 'PropertyService.getTestInstance(configDir);', where 'configDir' is a relative path to the directory where 'metacat.properties' resides.</p>
<p>Note: The solution implemented to resolve this problem for Metacat Harvester will also be beneficial toward the implementation of the new Metacat OAI-PMH Harvester described in Bug <a class="issue tracker-1 status-2 priority-5 priority-highest" title="Bug: design and implement OAI-PMH compliant harvest subsystem (In Progress)" href="https://projects.ecoinformatics.org/ecoinfo/issues/3835">#3835</a>.</p>

Bug #4243 (In Progress): Harvester db errors due to fixed character length overflow
https://projects.ecoinformatics.org/ecoinfo/issues/4243
2009-07-13T16:59:04Z, Duane Costa (dcosta@lternet.edu)
<p>In a recent release of Metacat (1.9.0), the Harvester property names were refactored to begin with the prefix 'harvester.'. Some of the Harvester property names are used as operation codes in Metacat's 'harvest_log' table, 'harvest_operation_code' field, which is declared with a fixed length of 30 characters. The 'harvester.ValidateHarvestListSuccess' code is 35 chars, which exceeds the limit and results in DB errors on record insertion during a harvest.</p>

Bug #3416 (New): ArrayIndexOutOfBoundsException in DBConnectionPool
https://projects.ecoinformatics.org/ecoinfo/issues/3416
2008-06-25T18:15:35Z, Duane Costa (dcosta@lternet.edu)
<p>The following error has appeared in the Tomcat log files at LTER several times:</p>
<pre><code>Jun 25, 2008 9:55:54 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet metacat threw exception
java.lang.ArrayIndexOutOfBoundsException: 9 &gt;= 9
    at java.util.Vector.elementAt(Vector.java:432)
    at edu.ucsb.nceas.metacat.DBConnectionPool.printMethodNameHavingBusyDBConnection(DBConnectionPool.java:617)
    at edu.ucsb.nceas.metacat.MetaCatServlet.handleGetOrPost(MetaCatServlet.java:481)
    at edu.ucsb.nceas.metacat.MetaCatServlet.doPost(MetaCatServlet.java:312)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.vfny.geoserver.filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
    at java.lang.Thread.run(Thread.java:595)</code></pre>
<p>The relevant source code in DBConnectionPool.java is:</p>
<pre><code>/**
 * Method to print out the method name which have busy DBconnection
 */
public void printMethodNameHavingBusyDBConnection()
{
    DBConnection db = null;  //single DBconnection
    int poolSize = 0;        //size of connection pool
    //get the size of DBConnection pool
    poolSize = connectionPool.size();
    //Check every DBConnection in the pool
    for ( int i=0; i&lt;poolSize; i++)
    {
        db = (DBConnection) connectionPool.elementAt(i);
        //check the status of db. If it is free, count it
        if (db.getStatus() == BUSY)
        {
            logMetacat.warn("This method having a busy DBConnection: "
                + db.getCheckOutMethodName());
            logMetacat.warn("The busy DBConnection tag is: "
                + db.getTag());
        }//if
    }//for
}//printMethodNameHavingBusyDBConnection</code></pre>
<p>It looks like this could be a thread safety issue. Perhaps the poolSize changes between the time that it is assigned:</p>
<pre><code>poolSize = connectionPool.size();</code></pre>
<p>and the time that the Vector is accessed:</p>
<pre><code>db = (DBConnection) connectionPool.elementAt(i);</code></pre>
<p>If this is the case, then it seems that a synchronized() block might be needed.</p>
<p>Thanks,<br />Duane</p>
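<p>A sketch of the suggested fix, assuming the race described above. This is a simplified stand-alone class, not the actual DBConnectionPool: Vector's individual methods synchronize on the Vector itself, so holding that same monitor across both the size() read and the elementAt() calls makes the scan atomic with respect to pool changes.</p>

```java
import java.util.Vector;

// Simplified stand-in for DBConnectionPool illustrating the suggested
// synchronized block. Locking the Vector keeps its length from changing
// between the size() read and the elementAt() accesses, which is the
// suspected cause of the ArrayIndexOutOfBoundsException.
public class PoolScan {
    private final Vector<Boolean> connectionPool = new Vector<>(); // true == BUSY

    void add(boolean busy) { connectionPool.add(busy); }

    int countBusy() {
        synchronized (connectionPool) {
            int poolSize = connectionPool.size(); // cannot change while we hold the lock
            int busy = 0;
            for (int i = 0; i < poolSize; i++) {
                // Safe: the pool cannot shrink under us inside this block,
                // so elementAt(i) cannot throw ArrayIndexOutOfBoundsException.
                if (connectionPool.elementAt(i)) {
                    busy++;
                }
            }
            return busy;
        }
    }
}
```

<p>An alternative with the same effect is an enhanced for loop over the Vector inside the synchronized block, which avoids indexing entirely.</p>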