consolidating data access user interfaces
Currently Kepler contains several distinct methods for binding data sources to a
workflow. These include the EML200DataSource actor, the JDBC data source
actor(s), the incipient EcoGrid access interfaces, the GridFTP actor, and
probably others. Each of these exposes the data in a different way, so the same
data ends up represented in multiple, confusing ways. We need to consolidate
these approaches to find a single UI that can encapsulate all of the data access.
This proposal is to use and adapt the user interface described in
kepler/docs/dev/screenshots and related design documents to data access in
EcoGrid, GridFTP, JDBC, and other sources. This would allow a user to view data
uniformly in the workflow, regardless of which data access protocol is used to
get the data. This would also allow the user to specify subsetting constraints
(WHERE clause) uniformly, and to choose which attributes from the joined
relations are exposed to the workflow. Finally, it would allow us to use richer
metadata descriptions of underspecified data sources (like those found at the
other end of JDBC connections) so that the user (and ultimately the SEEK SMS
system) can reason about these data sources effectively.
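As a rough illustration of what a uniform subsetting specification could look like, the sketch below models a protocol-neutral request (relation, exposed attributes, optional WHERE clause) and shows one possible translation for a JDBC-backed source. The names SubsetSpecSketch, SubsetSpec, and toSql are hypothetical and do not exist in Kepler; other back ends (EcoGrid, GridFTP) would need their own translation or client-side filtering.

```java
import java.util.Arrays;
import java.util.List;

final class SubsetSpecSketch {

    /** Hypothetical protocol-neutral description of what the workflow wants from a source. */
    static final class SubsetSpec {
        final String relation;          // table or entity name
        final List<String> attributes;  // attributes exposed to the workflow; empty means all
        final String whereClause;       // subsetting constraint, may be null

        SubsetSpec(String relation, List<String> attributes, String whereClause) {
            this.relation = relation;
            this.attributes = attributes;
            this.whereClause = whereClause;
        }

        /** One possible translation, for a JDBC-backed source (assumption, not Kepler code). */
        String toSql() {
            String columns = attributes.isEmpty() ? "*" : String.join(", ", attributes);
            String sql = "SELECT " + columns + " FROM " + relation;
            boolean hasConstraint = whereClause != null && !whereClause.trim().isEmpty();
            return hasConstraint ? sql + " WHERE " + whereClause.trim() : sql;
        }
    }

    public static void main(String[] args) {
        SubsetSpec spec = new SubsetSpec("survey", Arrays.asList("site", "count"), "count > 10");
        System.out.println(spec.toSql());
    }
}
```

Plain string concatenation is used only to keep the sketch short; a real implementation would have to validate or parameterize user-supplied constraints before sending them to a database.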
Updated by Jing Tao almost 19 years ago
Here is the summary after the meeting on March 11:
Federate Metadata across different communities: Create a unified metadata
object, called DataProxy. The DataProxy can get the metadata (probably using a
DataSystem class as described below) and parse it using interpreters for
different metadata formats, such as EML, Darwin Core, ADN, FGDC, etc. After
parsing the metadata, the DataProxy object will have the info to download the
data object as described by the metadata specification and pass the info to the
proper DataSystem class to download the data. The API will include the
following:
InputStream getFullMetadata(String id, String endPoints);
DataSystem parseMetadata(InputStream metadata);
void downloadData(DataSystem object);
The DataSystem class: a generic class that handles getting data objects
(including metadata objects) from different data sources (data systems).
InputStream getData(String identifier, String endPoints);
InputStream getData(other signatures).
Extending classes: EcoGridDataSystem, MetacatDataSystem (for Metacats that
don't implement the EcoGrid interface), JDBCDataSystem, etc.
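The flow described above (get the metadata, parse it, hand the download to the proper DataSystem) could be sketched roughly as follows. This is only an illustration under stated assumptions: InMemoryDataSystem, the "protocol|endPoints|identifier" metadata line, and the demo method are hypothetical stand-ins, and downloadData returns the stream rather than void so that the result is visible. It uses InputStream.readAllBytes, so it needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class DataProxySketch {

    /** Generic access to data or metadata objects held by one kind of data system. */
    interface DataSystem {
        InputStream getData(String identifier, String endPoints) throws IOException;
    }

    /** Hypothetical in-memory stand-in for EcoGridDataSystem, JDBCDataSystem, etc. */
    static final class InMemoryDataSystem implements DataSystem {
        private final Map<String, String> objects = new HashMap<>();

        void put(String endPoints, String identifier, String content) {
            objects.put(endPoints + "#" + identifier, content);
        }

        @Override
        public InputStream getData(String identifier, String endPoints) throws IOException {
            String content = objects.get(endPoints + "#" + identifier);
            if (content == null) {
                throw new IOException("No object " + identifier + " at " + endPoints);
            }
            return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Unified metadata object: fetches metadata, parses it, then delegates the download. */
    static final class DataProxy {
        private final DataSystem metadataSystem;
        private final Map<String, DataSystem> systemsByProtocol;
        private String dataEndPoints;
        private String dataIdentifier;

        DataProxy(DataSystem metadataSystem, Map<String, DataSystem> systemsByProtocol) {
            this.metadataSystem = metadataSystem;
            this.systemsByProtocol = systemsByProtocol;
        }

        InputStream getFullMetadata(String id, String endPoints) throws IOException {
            return metadataSystem.getData(id, endPoints);
        }

        // Toy interpreter (assumption): metadata is one line "protocol|endPoints|identifier".
        // A real interpreter would read EML, Darwin Core, ADN, FGDC, and so on.
        DataSystem parseMetadata(InputStream metadata) throws IOException {
            String text = new String(metadata.readAllBytes(), StandardCharsets.UTF_8).trim();
            String[] parts = text.split("\\|");
            if (parts.length != 3) {
                throw new IOException("Unrecognized metadata: " + text);
            }
            DataSystem system = systemsByProtocol.get(parts[0]);
            if (system == null) {
                throw new IOException("No DataSystem for protocol " + parts[0]);
            }
            dataEndPoints = parts[1];
            dataIdentifier = parts[2];
            return system;
        }

        // Returns the stream instead of void (a deviation from the listed API).
        InputStream downloadData(DataSystem system) throws IOException {
            if (dataIdentifier == null) {
                throw new IllegalStateException("parseMetadata must be called first");
            }
            return system.getData(dataIdentifier, dataEndPoints);
        }
    }

    /** Runs the three steps end to end and returns the downloaded text. */
    static String demo() {
        InMemoryDataSystem store = new InMemoryDataSystem();
        store.put("site-a", "meta-1", "memory|site-a|data-1");
        store.put("site-a", "data-1", "site,count\nA,12");
        Map<String, DataSystem> systems = new HashMap<>();
        systems.put("memory", store);
        DataProxy proxy = new DataProxy(store, systems);
        try (InputStream metadata = proxy.getFullMetadata("meta-1", "site-a")) {
            DataSystem system = proxy.parseMetadata(metadata);
            try (InputStream data = proxy.downloadData(system)) {
                return new String(data.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Keeping DataSystem as a narrow interface means a new source such as a JDBCDataSystem only has to implement getData, while the metadata format handling stays inside DataProxy.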
Certificate authority: Create a single centralized certificate authority to
provide a shared infrastructure to access and maintain different sites'
certificate authorities (CAs), e.g., those of the GEON portal and SEEK.
- Follow up with Karan for more information about the Grid Account Management
Architecture (GAMA) used in the GEON portal authentication.
A unified web service access to the datasources:
- In order to support other clients than the Kepler interface to access the
- A web service access to datasources with no additional requirements (such as
registering). Communities can benefit from accessing each
other's datasources directly.
The GEON and SEEK datasource access architectures are very similar - a
follow-up meeting is required on consolidating datasource access through a
unified web service with Kai, Ashraf, Karan, Sandeep, Efrat, Chaitan from GEON
and folks from SEEK.
Unified query for data sources in Kepler, either by adding more datasource
querying classes (besides EMLDataSource and DCDataSource) or, once there is a
unified web service access, by using a generic web service actor to query all
the data sources.