
Bug #4462

Component search in remote repositories not working

Added by Aaron Aaron about 10 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Immediate
Assignee:
Category:
core
Target version:
Start date:
10/15/2009
Due date:
% Done:

0%

Estimated time:
Bugzilla-Id:
4462

Description

Selecting the checkboxes next to remote repositories from the source button of the component library tab does not include any new components in the search results.


Related issues

Blocked by Kepler - Bug #2493: actor repository tracking bug (Resolved, 07/18/2006)

Blocks Kepler - Bug #4781: Use the KarXmlGenerator to create the xml metadata sent to the metacat server during "Upload To Repository" action (Resolved, 02/08/2010)

Blocks Kepler - Bug #4798: Create parser for KAR XML specification (Resolved, 02/16/2010)

Blocked by Kepler - Bug #4517: Verify that the search capabilities work (Resolved, 10/29/2009)

Blocks Kepler - Bug #4857: Search on ecogrid server does not return proper results (Resolved, 03/01/2010)

History

#1 Updated by Aaron Aaron about 10 years ago

Adding to this bug the feature enhancement of being able to search in multiple remote repositories for components...

#2 Updated by Aaron Aaron about 10 years ago

The problem here was that org.kepler.objectmanager.repository.EcogridRepositoryLibrarySearcher was set as the _repositorySearcher configuration parameter, but the class had been moved to org.kepler.gui, so it was not getting instantiated at startup. I moved the class to the right package, and now it gets instantiated.
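For reference, the fix amounts to keeping the configured class name in sync with its actual package. A configuration entry along these lines is what was being resolved at startup (the surrounding element names here are illustrative, not the exact Kepler config file syntax):

```xml
<!-- Illustrative sketch only; the real Kepler configuration layout may differ. -->
<param>
  <name>_repositorySearcher</name>
  <value>org.kepler.objectmanager.repository.EcogridRepositoryLibrarySearcher</value>
</param>
```

If the fully qualified class name in the value does not match the class's real package, reflective instantiation at startup silently fails and no repository searcher is created.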

But there are errors coming back from the Ecogrid which are very similar to errors we saw at some other point during our wrp testing session...

ERROR (org.ecoinformatics.seek.dataquery.DBTablesGenerator:generateDBTextTable:481) The error in generateDBTable is bad TEXT table source file - line number: 1 java.lang.NumberFormatException: For input string: "61.05.21" in statement [SET TABLE T1939104900 SOURCE "urn.lsid.localdata.db73f738.0.0;fs=,"]
ERROR (org.ecoinformatics.seek.dataquery.DBTablesGenerator:generateDBTextTable:481) The error in generateDBTable is bad TEXT table source file - line number: 1 java.lang.NumberFormatException: For input string: "99.05.01" in statement [SET TABLE T1939127964 SOURCE "urn.lsid.localdata.ca5b9640.0.0;fs=,"]
ERROR (org.ecoinformatics.seek.dataquery.DBTablesGenerator:generateDBTextTable:481) The error in generateDBTable is bad TEXT table source file - line number: 1 java.lang.NumberFormatException: For input string: "87.11.01" in statement [SET TABLE T1939106822 SOURCE "urn.lsid.localdata.bedf2026.0.0;fs=,"]
ERROR (org.ecoinformatics.seek.dataquery.DBTablesGenerator:generateDBTextTable:481) The error in generateDBTable is bad TEXT table source file - line number: 1 java.lang.NumberFormatException: For input string: "59.09.01" in statement [SET TABLE T1939103939 SOURCE "urn.lsid.localdata.7258e813.0.0;fs=,"]
ERROR (org.ecoinformatics.seek.dataquery.DBTablesGenerator:generateDBTextTable:481) The error in generateDBTable is bad TEXT table source file - line number: 1 java.lang.NumberFormatException: For input string: "70.06.01" in statement [SET TABLE T1939105861 SOURCE "urn.lsid.localdata.fd9d1a92.0.0;fs=,"]
ERROR (org.ecoinformatics.seek.dataquery.DBTablesGenerator:generateDBTextTable:481) The error in generateDBTable is bad TEXT table source file - line number: 1 java.lang.NumberFormatException: For input string: "42.11.19" in statement [SET TABLE T1939102978 SOURCE "urn.lsid.localdata.c920ce0f.0.0;fs=,"]
[remainder of log clipped]

#3 Updated by Sean Riddle almost 10 years ago

Trying to search the kepler dev repository yielded the following errors:

[run] trying to clone tpc02-water-flow-base but you can't clone this object for some reason: Problem cloning 'output'
[run] ERROR (org.ecoinformatics.seek.datasource.EcogridDataCacheItem:getDataItemFromEcoGrid:326) EcogridDataCacheItem - error connecting to Ecogrid
[run] AxisFault
[run] faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
[run] faultSubcode:
[run] faultString: java.rmi.RemoteException: Error reading document: judithk.628
[run] faultActor:
[run] faultNode:
[run] faultDetail:
[run] {http://xml.apache.org/axis/}hostname:127.0.0.1
[run]
[run] java.rmi.RemoteException: Error reading document: judithk.628
[run] at org.apache.axis.message.SOAPFaultBuilder.createFault(SOAPFaultBuilder.java:222)
[run] at org.apache.axis.message.SOAPFaultBuilder.endElement(SOAPFaultBuilder.java:129)
[run] at org.apache.axis.encoding.DeserializationContext.endElement(DeserializationContext.java:1087)
[run] at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
[run] at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
[run] at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
[run] at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
[run] at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[run] at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[run] at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
[run] at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
[run] at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
[run] at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227)
[run] at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:696)
[run] at org.apache.axis.Message.getSOAPEnvelope(Message.java:435)
[run] at org.apache.axis.handlers.soap.MustUnderstandChecker.invoke(MustUnderstandChecker.java:62)
[run] at org.apache.axis.client.AxisClient.invoke(AxisClient.java:206)
[run] at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
[run] at org.apache.axis.client.Call.invoke(Call.java:2767)
[run] at org.apache.axis.client.Call.invoke(Call.java:2443)
[run] at org.apache.axis.client.Call.invoke(Call.java:2366)
[run] at org.apache.axis.client.Call.invoke(Call.java:1812)
[run] at org.ecoinformatics.ecogrid.queryservice.stub.QueryServiceStub.get(Unknown Source)
[run] at org.ecoinformatics.ecogrid.queryservice.QueryServiceGetToStreamClient.get(Unknown Source)
[run] at org.ecoinformatics.seek.datasource.EcogridDataCacheItem.getDataItemFromEcoGrid(EcogridDataCacheItem.java:316)
[run] at org.ecoinformatics.seek.datasource.EcogridDataCacheItem.getContentFromSource(EcogridDataCacheItem.java:159)
[run] at org.ecoinformatics.seek.datasource.EcogridDataCacheItem.downloadDataFromSource(EcogridDataCacheItem.java:94)
[run] at org.ecoinformatics.seek.datasource.EcogridDataCacheItem.doWork(EcogridDataCacheItem.java:85)
[run] at org.kepler.objectmanager.cache.DataCacheObject.run(DataCacheObject.java:422)
[run] at java.lang.Thread.run(Thread.java:613)
[run] ERROR (org.ecoinformatics.seek.datasource.EcogridDataCacheItem:getDataItemFromEcoGrid:326) EcogridDataCacheItem - error connecting to Ecogrid
[run] AxisFault
[run] faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
[run] faultSubcode:
[run] faultString: java.rmi.RemoteException: /var/metacat/data/judithk.894.1 (Too many open files)
[run] faultActor:
[run] faultNode:
[run] faultDetail:
[run] {http://xml.apache.org/axis/}hostname:127.0.0.1
[run]
[run] java.rmi.RemoteException: /var/metacat/data/judithk.894.1 (Too many open files)
[run] at org.apache.axis.message.SOAPFaultBuilder.createFault(SOAPFaultBuilder.java:222)
[run] at org.apache.axis.message.SOAPFaultBuilder.endElement(SOAPFaultBuilder.java:129)
[run] at org.apache.axis.encoding.DeserializationContext.endElement(DeserializationContext.java:1087)
[run] at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
[run] at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
[run] at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
[run] at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
[run] at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[run] at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[run] at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
[run] at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
[run] at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
[run] at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227)
[run] at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:696)
[run] at org.apache.axis.Message.getSOAPEnvelope(Message.java:435)
[run] at org.apache.axis.handlers.soap.MustUnderstandChecker.invoke(MustUnderstandChecker.java:62)
[run] at org.apache.axis.client.AxisClient.invoke(AxisClient.java:206)
[run] at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
[run] at org.apache.axis.client.Call.invoke(Call.java:2767)
[run] at org.apache.axis.client.Call.invoke(Call.java:2443)
[run] at org.apache.axis.client.Call.invoke(Call.java:2366)
[run] at org.apache.axis.client.Call.invoke(Call.java:1812)
[run] at org.ecoinformatics.ecogrid.queryservice.stub.QueryServiceStub.get(Unknown Source)
[run] at org.ecoinformatics.ecogrid.queryservice.QueryServiceGetToStreamClient.get(Unknown Source)
[run] at org.ecoinformatics.seek.datasource.EcogridDataCacheItem.getDataItemFromEcoGrid(EcogridDataCacheItem.java:316)
[run] at org.ecoinformatics.seek.datasource.EcogridDataCacheItem.getContentFromSource(EcogridDataCacheItem.java:159)
[run] at org.ecoinformatics.seek.datasource.EcogridDataCacheItem.downloadDataFromSource(EcogridDataCacheItem.java:94)
[run] at org.ecoinformatics.seek.datasource.EcogridDataCacheItem.doWork(EcogridDataCacheItem.java:85)
[run] at org.kepler.objectmanager.cache.DataCacheObject.run(DataCacheObject.java:422)
[run] at java.lang.Thread.run(Thread.java:613)
[run] ERROR (org.ecoinformatics.seek.datasource.EcogridDataCacheItem:getDataItemFromEcoGrid:326) EcogridDataCacheItem - error connecting to Ecogrid
[run] AxisFault
[run] faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
[run] faultSubcode:
[run] faultString: java.rmi.RemoteException: /var/metacat/data/judithk.630.1 (Too many open files)
[run] faultActor:
[run] faultNode:
[run] faultDetail:
[run] {http://xml.apache.org/axis/}hostname:127.0.0.1
[run]
[run] java.rmi.RemoteException: /var/metacat/data/judithk.630.1 (Too many open files)
[run] at org.apache.axis.message.SOAPFaultBuilder.createFault(SOAPFaultBuilder.java:222)
[run] at org.apache.axis.message.SOAPFaultBuilder.endElement(SOAPFaultBuilder.java:129)
[run] at org.apache.axis.encoding.DeserializationContext.endElement(DeserializationContext.java:1087)
[run] at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
[run] at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
[run] at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
[run] at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
[run] at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[run] at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[run] at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
[run] at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
[run] at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
[run] at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227)
[run] at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:696)
[run] at org.apache.axis.Message.getSOAPEnvelope(Message.java:435)
[run] at org.apache.axis.handlers.soap.MustUnderstandChecker.invoke(MustUnderstandChecker.java:62)
[run] at org.apache.axis.client.AxisClient.invoke(AxisClient.java:206)
[run] at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
[run] at org.apache.axis.client.Call.invoke(Call.java:2767)
[run] at org.apache.axis.client.Call.invoke(Call.java:2443)
[run] at org.apache.axis.client.Call.invoke(Call.java:2366)
[run] at org.apache.axis.client.Call.invoke(Call.java:1812)
[run] at org.ecoinformatics.ecogrid.queryservice.stub.QueryServiceStub.get(Unknown Source)
[run] at org.ecoinformatics.ecogrid.queryservice.QueryServiceGetToStreamClient.get(Unknown Source)
[run] at org.ecoinformatics.seek.datasource.EcogridDataCacheItem.getDataItemFromEcoGrid(EcogridDataCacheItem.java:316)
[run] at org.ecoinformatics.seek.datasource.EcogridDataCacheItem.getContentFromSource(EcogridDataCacheItem.java:159)
[run] at org.ecoinformatics.seek.datasource.EcogridDataCacheItem.downloadDataFromSource(EcogridDataCacheItem.java:94)
[run] at org.ecoinformatics.seek.datasource.EcogridDataCacheItem.doWork(EcogridDataCacheItem.java:85)
[run] at org.kepler.objectmanager.cache.DataCacheObject.run(DataCacheObject.java:422)
[run] at java.lang.Thread.run(Thread.java:613)
[remainder clipped]

#4 Updated by ben leinfelder almost 10 years ago

looks like each and every search result is being fully parsed, which then kicks off the data download for EML actors in the compositeActor...which then overwhelms Metacat...

#5 Updated by ben leinfelder almost 10 years ago

kepler-dev has been wiped clean.
This means we shouldn't be at risk of hosing up the KNB server when issuing component searches from Kepler pointed at kepler-dev.
The search result processing needs to avoid running each result through the MOML parser. I know Aaron was able to do this for local repositories, and I'm hopeful some of that work can be applied to processing remote results.

#6 Updated by Oliver Soong almost 10 years ago

I think the local repositories might still be running the MOML parser. At the very least, they seem to be downloading and caching all the data needed by the workflows.

#7 Updated by ben leinfelder almost 10 years ago

if synchronizing the local repositories is still running the moml parser, then that means you'll be prompted for authentication credentials if you are using protected data in your workflows. is that the case?

#8 Updated by ben leinfelder almost 10 years ago

In regard to replicating problematic components on Kepler-dev, I sent this email to Sean:
----
If I was correct about what they contained, a "safe" version would be to search for a dataset in Kepler ("Datos Meteorologicos" is a good clean test data set) and drag it onto the canvas.
Save that "workflow" to the Kepler-dev repository and you'll have a testbed that hopefully won't cripple the KNB when you search Kepler-dev for that workflow component.
To clarify - I don't think there are any "damaged metacat records" - but there are certainly "problematic moml" files that reference lots of data on the KNB and bring it to a grinding halt when Kepler parses them in a whirlwind of new threads.

#9 Updated by ben leinfelder almost 10 years ago

Need to verify with Aaron, but I believe:
components are now being uploaded using a new XML metadata format (a wrapped version of the entity MOML that has a bunch of other KAR manifest information in it),
We need to change the search. Possibly quite a lot.
Most crucial is the document type ("namespace" in ecogrid) being searched. Right now it is looking for 'entity' doc types, whereas we will have to change it to search for 'kar' doctypes. Then I believe we'll have to actually download the KAR file that contains the component and process that (rather than, say, a plain old XML MOML file as it is now).

#10 Updated by ben leinfelder almost 10 years ago

if you go on Kepler-dev's "dev skin" and search using this squery, you can see the "kar" metadata is returned.
(URL: http://kepler-dev.nceas.ucsb.edu/kepler/style/skins/dev/querymetacat.html)

<pathquery version='1.2'>
  <returndoctype>kar</returndoctype>
  <returnfield>/entity/@name</returnfield>
  <returnfield>entity/property[@name='KeplerDocumentation']/property[@name='author']/configure</returnfield>
  <returnfield>entity/property[@name='KeplerDocumentation']/property[@name='description']/configure</returnfield>
  <returnfield>entity/property[@name='KeplerDocumentation']/property[@name='createDate']/configure</returnfield>
  <returnfield>entity/property[@name='KeplerDocumentation']/property[@name='workflowId']/configure</returnfield>
  <returnfield>entity/property[@name='karLSID']/@value</returnfield>
  <returnfield>entity/property[@name='entityId']/@value</returnfield>
  <querygroup operator='INTERSECT'>
    <queryterm casesensitive='false' searchmode='contains'>
      <value>%</value>
    </queryterm>
  </querygroup>
</pathquery>

#11 Updated by ben leinfelder almost 10 years ago

then there's a sample of this "kar metadata" file that aaron uploaded here:
http://kepler-dev.nceas.ucsb.edu/kepler/metacat/7885.93.1/xml

we'll want to search for /kar/mainAttributes/lsid to actually get the KAR file's lsid so we can download it as a component [say, when we drag the search result to the canvas].
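Pulling the KAR LSID out of that metadata document is a one-line XPath query once the XML is parsed. A minimal sketch, assuming the /kar/mainAttributes/lsid layout described above (the class name and sample LSID here are hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class KarLsidExtractor {

    /** Extract the KAR file's LSID from a kar metadata document. */
    public static String extractLsid(String karMetadataXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        karMetadataXml.getBytes(StandardCharsets.UTF_8)));
        // The element path is taken from the sample kar metadata record.
        return XPathFactory.newInstance().newXPath()
                .evaluate("/kar/mainAttributes/lsid", doc);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical fragment; real records live on kepler-dev.
        String xml = "<kar><mainAttributes>"
                + "<lsid>urn:lsid:kepler-project.org:kar:123:1</lsid>"
                + "</mainAttributes></kar>";
        System.out.println(extractLsid(xml));
    }
}
```

With the LSID in hand, the client can fetch the KAR archive itself from the repository when the search result is dragged to the canvas.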

#12 Updated by Jing Tao over 9 years ago

sean:

After searching the remote repository, the search result shows the docid, such as 8128:69:1 rather than the kar name.

#13 Updated by Sean Riddle over 9 years ago

Yes, I'm aware of that odd naming convention. The problem is, the KARs themselves don't have any other name on the repository, as far as I know. The only name or ID given to the KAR file itself, as opposed to something contained within it, is the LSID. It's far from perfect, and I encourage suggestions of what to use instead. Aaron, you might know a good way to generate a useful name.

#14 Updated by Aaron Aaron over 9 years ago

ummmm, the filename gets lost in metacat somewhere? You should just display the filename of the KAR file in the search results. If the file name gets lost for some reason going through Metacat, then we could save the name as a Main attribute next to the lsid in the MANIFEST.MF
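Stashing the file name next to the lsid in the manifest is straightforward with java.util.jar. A sketch of the idea; the "lsid" and "KAR-File-Name" attribute names are illustrative, not necessarily what Kepler's KAR writer actually uses:

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class KarManifestSketch {

    /** Build a KAR manifest carrying both the LSID and the original file name. */
    public static Manifest buildManifest(String lsid, String karFileName) {
        Manifest mf = new Manifest();
        Attributes main = mf.getMainAttributes();
        // Manifest-Version must be present for the manifest to be written out.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Hypothetical attribute names for illustration.
        main.putValue("lsid", lsid);
        main.putValue("KAR-File-Name", karFileName);
        return mf;
    }

    public static void main(String[] args) {
        Manifest mf = buildManifest(
                "urn:lsid:kepler-project.org:kar:123:1", "MyWorkflow.kar");
        System.out.println(mf.getMainAttributes().getValue("KAR-File-Name"));
    }
}
```

The search UI could then read the attribute back out of the downloaded metadata and display the original file name instead of the docid.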

#15 Updated by Aaron Aaron over 9 years ago

Also, Sean, I don't think it makes sense to have the ontology hierarchy inside of the KAR search results. Can you return kar results in addition to ontology results, in a similar manner to how the local components are displayed?

#16 Updated by Sean Riddle over 9 years ago

I sent out an email about this a week ago. Ben and Chad thought this (semantic types under a KAR file) would be clearest. I can take another look at it, though.

#17 Updated by Aaron Aaron over 9 years ago

Sorry I wasn't following that thread... My input would be that if the search results are going to be displayed in the Component Library, then they should be organized in a similar fashion. If a different organization is desired, then the results (and perhaps the search function itself) should be completely separated from the local component search and results.

#18 Updated by Sean Riddle over 9 years ago

Personally, Aaron, I agree with you. I'll look into reorganizing the tree. Ben, Chad or others, let me know if you are really married to the old representation we discussed.

#19 Updated by Chad Berkley over 9 years ago

Sean says this is completed. Closing. Reopen if problems are found.

#20 Updated by Redmine Admin over 6 years ago

Original Bugzilla ID was 4462
