Bug #2507
Data Manager Library: Create an EML parser library to digest EML documents
Status: closed
Description
Currently, the EML actor in Kepler can download an EML document and parse it. After parsing, the entity information in the EML document is stored in Java objects, and the data file is downloaded to the local file system and also loaded into a relational database.
We want to separate this process from Kepler and make it a library in the eml module, so that it can be used in Kepler, Metacat, and other projects.
Updated by Jing Tao over 18 years ago
Here is our plan:
Create three packages in the eml src directory:
1. org.ecoinformatics.eml.digestor package, with main class EML200Parser. This class can be copied from the kepler module.
2. org.ecoinformatics.eml.download package, with main class DataDistributionHandler. This class will implement the Runnable interface, and its API is:
DataDistributionHandler(Entity entity);
run();
3. org.ecoinformatics.eml.db package, with main class TableGenerator. Its API is:
TableGenerator(Entity entity, File localFile);
generateTable();
getTableName();
loadDataToTable();
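The TableGenerator API in item 3 can be sketched as follows. This is illustrative only: the Entity class (a name plus an ordered map of attribute names to simple types), the type mapping, the identifier cleanup, and the helpers insertStatement() and readRows() are assumptions, not part of the proposal. Here generateTable() only builds the CREATE TABLE text; executing it, and binding each row from readRows() to the INSERT through a JDBC PreparedStatement, is what loadDataToTable() would add.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical entity: a name plus ordered attribute name -> simple type ("integer", "real", "text").
class Entity {
    final String name;
    final Map<String, String> attributes = new LinkedHashMap<>();

    Entity(String name) {
        this.name = name;
    }
}

// Sketch of the proposed TableGenerator; helper names and type mapping are assumptions.
class TableGenerator {
    private final Entity entity;
    private final File localFile;

    TableGenerator(Entity entity, File localFile) {
        this.entity = entity;
        this.localFile = localFile;
    }

    // Turn free-text EML names into safe, unquoted SQL identifiers.
    static String toIdentifier(String raw) {
        String id = raw.trim().toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "_")
                .replaceAll("^_+|_+$", "");
        return (id.isEmpty() || Character.isDigit(id.charAt(0))) ? "t_" + id : id;
    }

    String getTableName() {
        return toIdentifier(entity.name);
    }

    // Build the CREATE TABLE statement; running it over JDBC is left to the caller.
    String generateTable() {
        List<String> cols = new ArrayList<>();
        for (Map.Entry<String, String> a : entity.attributes.entrySet()) {
            cols.add(toIdentifier(a.getKey()) + " " + sqlType(a.getValue()));
        }
        return "CREATE TABLE " + getTableName() + " (" + String.join(", ", cols) + ")";
    }

    // Parameterized INSERT suitable for java.sql.PreparedStatement.
    String insertStatement() {
        List<String> cols = new ArrayList<>();
        List<String> marks = new ArrayList<>();
        for (String a : entity.attributes.keySet()) {
            cols.add(toIdentifier(a));
            marks.add("?");
        }
        return "INSERT INTO " + getTableName() + " (" + String.join(", ", cols)
                + ") VALUES (" + String.join(", ", marks) + ")";
    }

    // Read the local data file, assuming comma-delimited text, one header line, no quoted fields.
    List<String[]> readRows() throws IOException {
        List<String> lines = Files.readAllLines(localFile.toPath(), StandardCharsets.UTF_8);
        List<String[]> rows = new ArrayList<>();
        for (String line : lines.subList(Math.min(1, lines.size()), lines.size())) {
            if (!line.isEmpty()) {
                rows.add(line.split(",", -1));
            }
        }
        return rows;
    }

    private static String sqlType(String emlType) {
        switch (emlType) {
            case "integer": return "INTEGER";
            case "real":    return "DOUBLE PRECISION";
            default:        return "VARCHAR(255)";
        }
    }
}
```

Sanitizing names matters because EML entity and attribute names are free text and may contain spaces, punctuation, or a leading digit, none of which are safe as unquoted SQL identifiers.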
The download package's function is very similar to Kepler's cache system. I am thinking about how to reuse that code from Kepler.
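Because the proposed DataDistributionHandler implements Runnable, a caller can start each download on its own thread. Below is a minimal sketch of that idea, not the library's actual code: the Entity class is a hypothetical stand-in with only a name and a data URL, and saving the data to a temporary file (exposed through hypothetical getLocalFile() and getFailure() accessors) is an assumption, since the one-argument constructor does not say where the file should go.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical minimal entity: just a name and the URL of its data file.
class Entity {
    final String name;
    final String url;

    Entity(String name, String url) {
        this.name = name;
        this.url = url;
    }
}

// Sketch of the proposed handler: run() copies the entity's data to a local temp file.
class DataDistributionHandler implements Runnable {
    private final Entity entity;
    private volatile File localFile;     // set once the download succeeds
    private volatile Exception failure;  // run() cannot throw checked exceptions, so record them

    DataDistributionHandler(Entity entity) {
        this.entity = entity;
    }

    @Override
    public void run() {
        try (InputStream in = URI.create(entity.url).toURL().openStream()) {
            File target = File.createTempFile("entity-", ".dat");
            // createTempFile already created the file, so overwriting must be allowed.
            Files.copy(in, target.toPath(), StandardCopyOption.REPLACE_EXISTING);
            localFile = target;
        } catch (IOException | IllegalArgumentException e) {
            failure = e;
        }
    }

    File getLocalFile() {
        return localFile;
    }

    Exception getFailure() {
        return failure;
    }
}
```

A caller could start one thread per entity and join them all before building tables, then check getFailure() on each handler.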
Updated by Jing Tao over 18 years ago
To make the download package more configurable, I would like to change the constructor to:
DataDistributionHandler(Entity entity, File cacheDir, File fileName);
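With a cache directory and file name supplied, the handler can behave like a simple version of Kepler's cache: reuse a local copy when one exists and download otherwise. The sketch below assumes that interpretation. The Entity class, the ".part" staging file, and the isServedFromCache() and getFailure() accessors are hypothetical, and only the last path component of fileName is used, relative to cacheDir.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical minimal entity: just a name and the URL of its data file.
class Entity {
    final String name;
    final String url;

    Entity(String name, String url) {
        this.name = name;
        this.url = url;
    }
}

// Sketch: downloads into cacheDir/fileName and skips the download if that file already exists.
class DataDistributionHandler implements Runnable {
    private final Entity entity;
    private final File target;
    private volatile boolean servedFromCache;
    private volatile Exception failure;

    DataDistributionHandler(Entity entity, File cacheDir, File fileName) {
        this.entity = entity;
        // Assumption: fileName is interpreted relative to cacheDir.
        this.target = new File(cacheDir, fileName.getName());
    }

    @Override
    public void run() {
        if (target.isFile()) {  // cache hit: reuse the local copy
            servedFromCache = true;
            return;
        }
        try {
            Files.createDirectories(target.getParentFile().toPath());
            // Stage the download so a failed transfer never looks like a valid cache entry.
            File part = new File(target.getPath() + ".part");
            try (InputStream in = URI.create(entity.url).toURL().openStream()) {
                Files.copy(in, part.toPath(), StandardCopyOption.REPLACE_EXISTING);
            }
            Files.move(part.toPath(), target.toPath(), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException | IllegalArgumentException e) {
            failure = e;
        }
    }

    File getLocalFile() {
        return target;
    }

    boolean isServedFromCache() {
        return servedFromCache;
    }

    Exception getFailure() {
        return failure;
    }
}
```

Letting the caller choose cacheDir is what makes the handler reusable across projects: Kepler can point it at its existing cache location, while another application can use a directory of its own.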
Updated by Jing Tao over 18 years ago
Here is the change in the org.ecoinformatics.eml.db package:
The main class is SQLCommandHandler, and its API is:
SQLCommandHandler(DBConnection conn, String plugInName)
generateTable(Entity entity, File fileName); returns the generated table name as a String;
dropTable(String tableName);
executeSelectionSQLCommand(String sqlCommand); returns a ResultSet object;
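The plugInName argument suggests that SQL details vary by database. The sketch below assumes plugInName selects a database-specific adapter; the DatabaseAdapter interface, the "postgres" and "hsql" plug-in names, and the type mappings are hypothetical, and java.sql.Connection stands in for DBConnection. generateTable(Entity, File) is omitted here; it would ask the adapter for each column type.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical plug-in contract: each database supplies its own SQL dialect details.
interface DatabaseAdapter {
    String typeFor(String emlType);
    String dropTableSql(String tableName);
}

class PostgresAdapter implements DatabaseAdapter {
    public String typeFor(String t) {
        switch (t) {
            case "integer": return "INT4";
            case "real":    return "FLOAT8";
            default:        return "TEXT";
        }
    }
    public String dropTableSql(String n) { return "DROP TABLE IF EXISTS " + n; }
}

class HsqlAdapter implements DatabaseAdapter {
    public String typeFor(String t) {
        switch (t) {
            case "integer": return "INTEGER";
            case "real":    return "DOUBLE";
            default:        return "LONGVARCHAR";
        }
    }
    public String dropTableSql(String n) { return "DROP TABLE " + n + " IF EXISTS"; }
}

// Sketch of the proposed handler: plugInName picks the adapter, JDBC runs the SQL.
class SQLCommandHandler {
    private static final Map<String, Supplier<DatabaseAdapter>> PLUGINS = new HashMap<>();
    static {
        PLUGINS.put("postgres", PostgresAdapter::new);
        PLUGINS.put("hsql", HsqlAdapter::new);
    }

    private final Connection conn;
    private final DatabaseAdapter adapter;

    SQLCommandHandler(Connection conn, String plugInName) {
        Supplier<DatabaseAdapter> factory = PLUGINS.get(plugInName);
        if (factory == null) {
            throw new IllegalArgumentException("unknown plug-in: " + plugInName);
        }
        this.conn = conn;
        this.adapter = factory.get();
    }

    DatabaseAdapter getAdapter() { return adapter; }

    void dropTable(String tableName) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(adapter.dropTableSql(tableName));
        }
    }

    // The caller must close the ResultSet; closeOnCompletion() then also closes the Statement.
    ResultSet executeSelectionSQLCommand(String sqlCommand) throws SQLException {
        Statement st = conn.createStatement();
        try {
            st.closeOnCompletion();
            return st.executeQuery(sqlCommand);
        } catch (SQLException e) {
            st.close();
            throw e;
        }
    }
}
```

Keeping dialect differences behind one small interface lets each project that embeds the library select its database by name without changing the handler itself.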
The org.ecoinformatics.eml.digestor package API is:
EML200Parser(InputStream stream);
EML200Parser(InputSource source);
getEntityList(); returns a Vector;
parse();
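A minimal DOM-based sketch of this parser API, assuming the standard JAXP parser that ships with the JDK. The Entity class is hypothetical, and the sketch extracts only each dataTable's entityName and the first url element beneath it; a real EML 2.0 parser would read far more (attributes, physical format, other entity types).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Vector;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical minimal entity holding what the download and table steps need.
class Entity {
    final String name;
    final String url;

    Entity(String name, String url) {
        this.name = name;
        this.url = url;
    }
}

// Sketch of the proposed parser: both constructors end up with an InputSource.
class EML200Parser {
    private final InputSource source;
    private final Vector<Entity> entityList = new Vector<>();

    EML200Parser(InputStream stream) {
        this(new InputSource(stream));
    }

    EML200Parser(InputSource source) {
        this.source = source;
    }

    // Collect every dataTable with its entityName and first distribution URL (if any).
    void parse() throws IOException, SAXException, ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        NodeList tables = doc.getElementsByTagName("dataTable");
        for (int i = 0; i < tables.getLength(); i++) {
            Element table = (Element) tables.item(i);
            entityList.add(new Entity(firstText(table, "entityName"), firstText(table, "url")));
        }
    }

    Vector<Entity> getEntityList() {
        return entityList;
    }

    // Text of the first descendant element with the given tag, or null if there is none.
    private static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

Both constructors funnel into an org.xml.sax.InputSource, which DocumentBuilder.parse() accepts directly, so callers can pass either a raw stream or a source that carries its own encoding or system ID.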
Updated by Jing Tao over 18 years ago
New package names are suggested:
org.ecoinformatics.eml.digestor.parser
org.ecoinformatics.eml.digestor.download
org.ecoinformatics.eml.digestor.db
Updated by Matt Jones over 18 years ago
digestor is a bit of a crude name. How about "loader"?
org.ecoinformatics.eml.loader.parser
org.ecoinformatics.eml.loader.download
org.ecoinformatics.eml.loader.database
This is an improvement but still not totally great. Suggestions welcome.
Updated by James Brunt over 18 years ago
I like "loader", but it's not a perfect fit for the way the work is divided, which is more like parse (EML) -> create (table) -> source (data). Correct?
Updated by Duane Costa about 18 years ago
We named the top-level package "org.ecoinformatics.datamanager". The complete set of packages is:
org.ecoinformatics.datamanager
org.ecoinformatics.datamanager.database
org.ecoinformatics.datamanager.download
org.ecoinformatics.datamanager.parser
org.ecoinformatics.datamanager.parser.eml
Updated by Duane Costa about 18 years ago
Bug 2504 has been marked as a duplicate of this bug.
Updated by ben leinfelder almost 15 years ago
This has been completed. Moreover, it has been extended to support any XML schema that makes use of the EML dataSet module.