Bug #2507
closed
Data Manager Library: Create an EML parser lib to digest EML documents
Added by Jing Tao over 18 years ago.
Updated almost 15 years ago.
Description
Currently, the EML actor in Kepler can download an EML document and parse it. After parsing, the entity information in the EML document is stored in Java objects, and the data file is downloaded to the local file system and also stored in a relational database.
We want to separate this process from Kepler and make it a library in the eml module, so that it can be used in Kepler, Metacat, and other projects.
Here is our plan:
Create 3 packages in the eml src dir:
1. The org.ecoinformatics.eml.digestor package, whose main class is EML200Parser. The main class can be copied from the kepler module.
2. The org.ecoinformatics.eml.download package, whose main class is DataDistributionHandler. This class will implement the Runnable interface, and its API is:
DataDistributionHandler(Entity entity);
run();
3. The org.ecoinformatics.eml.db package, whose main class is TableGenerator. The API of the class is:
TableGenerator(Entity entity, File localFile);
generateTable();
getTableName();
loadDataToTable();
The function of the download package is very similar to the cache system of Kepler. I am thinking about how to reuse that code from Kepler.
In order to make the download package more configurable, I would like to change the constructor to:
DataDistributionHandler(Entity entity, File cacheDir, File fileName);
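To make the proposed constructor concrete, here is a minimal sketch of how such a handler might look. It is not the actual implementation: the Entity class below is a hypothetical stand-in (the real one is not shown in this report), and getDistributionUrl() and getTargetFile() are names assumed for illustration only.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in for the parsed entity; only what the sketch needs.
class Entity {
    private final String name;
    private final String distributionUrl;

    Entity(String name, String distributionUrl) {
        this.name = name;
        this.distributionUrl = distributionUrl;
    }

    String getName() { return name; }

    // Assumed accessor: where the entity's data file can be fetched from.
    String getDistributionUrl() { return distributionUrl; }
}

// Sketch of the proposed handler: downloads one entity's data into cacheDir.
class DataDistributionHandler implements Runnable {
    private final Entity entity;
    private final File cacheDir;
    private final File fileName;

    DataDistributionHandler(Entity entity, File cacheDir, File fileName) {
        this.entity = entity;
        this.cacheDir = cacheDir;
        this.fileName = fileName;
    }

    // Assumed helper: the local file the data is written to.
    File getTargetFile() {
        return new File(cacheDir, fileName.getName());
    }

    @Override
    public void run() {
        try {
            Files.createDirectories(cacheDir.toPath());
            URI source = URI.create(entity.getDistributionUrl());
            try (InputStream in = source.toURL().openStream()) {
                // Overwrite any stale copy already in the cache directory.
                Files.copy(in, getTargetFile().toPath(),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            // run() cannot throw checked exceptions, so wrap the failure.
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the handler is a Runnable, a caller could either invoke run() directly or hand it to a Thread or executor so several entities download in parallel.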
Here is the change in org.ecoinformatics.eml.db package:
Main class is SQLCommandHandler and API is:
SQLCommandHandler(DBConnection conn, String plugInName)
generateTable(Entity entity, File fileName), which returns the generated table name as a String;
dropTable(String tableName);
excuteSelectionSQLComman(String sqlCommand), which returns a ResultSet object;
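The table-generation step above could be sketched roughly as follows. This covers only the SQL text that a generateTable or dropTable call might build, not the database access itself. The class TableSqlSketch, its method names, and the idea of passing columns as a name-to-type map are all assumptions for illustration; the report does not say how the real class derives table names or column types.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: builds SQL text for an entity's table.
class TableSqlSketch {

    // Turn an arbitrary entity or column name into a conservative SQL
    // identifier: letters, digits and underscores, not starting with a digit.
    static String toSqlName(String rawName) {
        String cleaned = rawName.trim().replaceAll("[^A-Za-z0-9_]", "_");
        if (cleaned.isEmpty() || Character.isDigit(cleaned.charAt(0))) {
            cleaned = "t_" + cleaned;
        }
        return cleaned.toLowerCase(Locale.ROOT);
    }

    // Build a CREATE TABLE statement; columns maps column name to SQL type
    // and should preserve insertion order (for example a LinkedHashMap).
    static String createTableSql(String entityName, Map<String, String> columns) {
        StringBuilder sql = new StringBuilder("CREATE TABLE ")
                .append(toSqlName(entityName))
                .append(" (");
        boolean first = true;
        for (Map.Entry<String, String> column : columns.entrySet()) {
            if (!first) {
                sql.append(", ");
            }
            sql.append(toSqlName(column.getKey()))
               .append(' ')
               .append(column.getValue());
            first = false;
        }
        return sql.append(")").toString();
    }

    // Build the matching DROP TABLE statement.
    static String dropTableSql(String tableName) {
        return "DROP TABLE " + toSqlName(tableName);
    }
}
```

Keeping the SQL construction separate from the connection handling would let the same statements be sent through whatever DBConnection wrapper the library settles on.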
The org.ecoinformatics.eml.digestor package API is:
EML200Parser(InputStream stream);
EML200Parser(InputSource source);
getEntityList() and it will return a vector;
parse();
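As a rough illustration of the parser API listed above, the following sketch reads an EML document from an InputStream and collects the entityName of each dataTable element. It is an assumption-laden simplification: the class name EmlParserSketch is hypothetical, it uses the JDK's DOM parser rather than whatever the Kepler code uses, and it returns a List of names where the proposal mentions a vector of entities.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of the parse() / getEntityList() pair.
class EmlParserSketch {
    private final InputStream stream;
    private final List<String> entityNames = new ArrayList<>();

    EmlParserSketch(InputStream stream) {
        this.stream = stream;
    }

    // Parse the document and remember the name of every dataTable entity.
    void parse() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(stream);
            NodeList tables = doc.getElementsByTagName("dataTable");
            for (int i = 0; i < tables.getLength(); i++) {
                Element table = (Element) tables.item(i);
                NodeList names = table.getElementsByTagName("entityName");
                if (names.getLength() > 0) {
                    entityNames.add(names.item(0).getTextContent().trim());
                }
            }
        } catch (Exception e) {
            // Wrapped so callers of the sketch need no checked-exception handling.
            throw new IllegalStateException("Could not parse EML document", e);
        }
    }

    // Entity names in document order; empty until parse() has been called.
    List<String> getEntityList() {
        return entityNames;
    }
}
```

A caller would construct the parser with the document stream, call parse() once, and then read getEntityList().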
New package names are suggested:
org.ecoinformatics.eml.digestor.parser
org.ecoinformatics.eml.digestor.download
org.ecoinformatics.eml.digestor.db
digestor is a bit of a crude name. How about "loader"?
org.ecoinformatics.eml.loader.parser
org.ecoinformatics.eml.loader.download
org.ecoinformatics.eml.loader.database
This is an improvement but still not totally great. Suggestions welcome.
I like loader but it's not a perfect fit for the way the work is divided which is more like parse (eml) -> create (table) -> source (data). Correct?
We named the top-level package "org.ecoinformatics.datamanager". The complete set of packages is:
org.ecoinformatics.datamanager
org.ecoinformatics.datamanager.database
org.ecoinformatics.datamanager.download
org.ecoinformatics.datamanager.parser
org.ecoinformatics.datamanager.parser.eml
*** Bug 2504 has been marked as a duplicate of this bug. ***
This has been completed. Moreover, it has been extended to support any XML schema that makes use of the EML dataSet module.
Original Bugzilla ID was 2507