DataONE Member Node Support
===========================
DataONE_ is a federation of data repositories that aims to improve
interoperability among data repository software systems and advance the
preservation of scientific data for future use.
Metacat deployments can be configured to participate in DataONE_. This
chapter describes the DataONE_ data federation, its architecture, and the
way in which Metacat can be used to participate as a node in the DataONE system.

.. _DataONE: http://dataone.org/

What is DataONE?
----------------
The DataONE_ project is a collaboration among scientists, technologists, librarians,
and social scientists to build a robust, interoperable, and sustainable system for
preserving and accessing Earth observational data at national and global scales.
Supported by the U.S. National Science Foundation, DataONE partners focus on
technological, financial, and organizational sustainability approaches to
building a distributed network of data repositories that are fully interoperable,
even when those repositories use divergent underlying software and support different
data and metadata content standards. DataONE defines a common web-service
programming interface that allows the main software components of the DataONE system
to communicate seamlessly. The components of the DataONE system include:

* DataONE Service Interface
* Member Nodes
* Coordinating Nodes
* Investigator Toolkit

Metacat implements the services needed to operate as a DataONE Member Node,
as described below.  The service interface allows many different scientific
software tools for data management, analysis, visualization, and other parts of
the scientific lifecycle to communicate directly with Metacat without being
further specialized beyond the support needed for DataONE.  This streamlines the
process of writing scientific software both for servers and client tools.

The DataONE Service Interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DataONE achieves interoperability by defining a lightweight but powerful set of
REST_ web services that can be implemented by various data management software
systems, allowing those systems to communicate effectively with one another and
exchange data, metadata, and other scientific objects.  This `DataONE Service Interface`_
is an open standard that defines the communication protocols and technical
expectations for software components that wish to participate in the DataONE
federation. This service interface is divided into `four distinct tiers`_, with the
intention that any given software system may implement only those tiers that are
relevant to its repository; for example, a data aggregator might only implement
the Tier 1 interfaces that provide anonymous access to public data sets, while
a complete data management system like Metacat can implement all four tiers:

1. **Tier 1:** Read-only, anonymous data access
2. **Tier 2:** Read-only, with authentication and access control
3. **Tier 3:** Full write access
4. **Tier 4:** Replication target services

.. _REST: http://en.wikipedia.org/wiki/Representational_state_transfer

.. _DataONE Service Interface: http://releases.dataone.org/online/d1-architecture-1.0.0

.. _four distinct tiers: http://releases.dataone.org/online/d1-architecture-1.0.0/apis/index.html
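
As a rough illustration of what a Tier 1 (read-only, anonymous) client might do,
the sketch below builds request URLs for a Member Node using only the Python
standard library. The base URL is hypothetical, and the ``v1`` path segment and
the ``object``, ``meta``, and ``monitor/ping`` resource names are assumptions
that do not come from this chapter; verify them against the service interface
documentation before relying on them.

.. code-block:: python

   from urllib.parse import quote

   # Hypothetical Member Node base URL; substitute your own deployment.
   MN_BASE_URL = "https://mn.example.org/metacat/d1/mn"

   # Assumed API version segment (not stated in this chapter).
   API_VERSION = "v1"


   def member_node_url(resource, pid=None, base_url=MN_BASE_URL):
       """Build a URL for a read-only Member Node resource.

       Identifiers may contain characters such as "/" or ":", so the
       identifier is percent-encoded as a single path segment.
       """
       url = "/".join([base_url.rstrip("/"), API_VERSION, resource])
       if pid is not None:
           url += "/" + quote(pid, safe="")
       return url


   # Made-up identifier used purely for illustration.
   ping_url = member_node_url("monitor/ping")
   object_url = member_node_url("object", pid="doi:10.1234/AB/5678")
   meta_url = member_node_url("meta", pid="doi:10.1234/AB/5678")

If the assumed paths hold, an anonymous HTTP GET on ``object_url`` would be
expected to return the object itself, and the same request on ``meta_url`` its
descriptive system record.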

Member Nodes
~~~~~~~~~~~~
In DataONE, Member Nodes form the core of the network: they represent
particular scientific communities, manage and preserve their data and metadata, and
provide tools to their community for contributing, managing, and accessing data.
DataONE provides a standard way for these individual repositories to interact, and helps
to coordinate among the Member Nodes in the federation.  This allows Member Nodes
to provide services to each other, such as replication of data for backup and failover.
To be a Member Node, a repository must implement the Member Node service interface,
and then register with DataONE.  Metacat provides this implementation automatically,
and provides an easy configuration option to register a Metacat instance as a
DataONE Member Node (see configuration section below). If you are deploying a Metacat
instance, it is relatively simple to become a Member Node, but keep in mind that
DataONE is aiming for longevity and preservation, and so is selecting for nodes
that have long-term data preservation as part of their mission.

Coordinating Nodes
~~~~~~~~~~~~~~~~~~
The DataONE Coordinating Nodes provide a set of services to Member Nodes that
allow Member Nodes to easily interact with one another and to provide a unified
view of the whole DataONE Federation.  The main services provided by Coordinating
Nodes are:

* Global search index for all metadata and web portal for data discovery
* Resolution service to map unique identifiers to the Member Nodes that hold data
* Authentication against a shared set of accounts based on CILogon_ and InCommon_
* Replication management services to reliably replicate data according to
  policies set by the Member Nodes
* Fixity checking to ensure that preserved objects remain valid
* Member Node registration and management
* Aggregated logging for data access across the whole federation

Three geographically distributed Coordinating Nodes replicate these coordinating
services at UC Santa Barbara, the University of New Mexico, and the Oak Ridge Campus.
Coordinating Nodes are set up in a fully redundant manner, such that any of the Coordinating
Nodes can be offline and the others will continue to provide the services
without interruption.  The Coordinating Nodes expose their services at::

  https://cn.dataone.org/cn

And the DataONE search portal is available at:

  https://cn.dataone.org/

.. _CILogon: http://www.cilogon.org

.. _InCommon: http://incommon.org
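
Fixity checking, one of the services listed above, generally means recomputing
an object's checksum and comparing it with the value recorded when the object
was stored. The sketch below shows that general idea with Python's ``hashlib``.
The function names, the sample bytes, and the choice of SHA-1 are illustrative
assumptions; this is not a description of how the Coordinating Nodes actually
implement the check.

.. code-block:: python

   import hashlib


   def compute_checksum(data, algorithm="sha1"):
       """Return the hex digest of ``data`` (bytes) using ``algorithm``."""
       digest = hashlib.new(algorithm)
       digest.update(data)
       return digest.hexdigest()


   def is_valid(data, recorded_checksum, algorithm="sha1"):
       """Compare a freshly computed checksum with the recorded one."""
       return compute_checksum(data, algorithm) == recorded_checksum.lower()


   # Sample object content, invented for this example.
   original = b"site,temp_c\nA,12.5\n"

   # Checksum as it would be recorded when the object is first archived.
   recorded = compute_checksum(original)

   intact = is_valid(original, recorded)            # unchanged bytes
   corrupted = is_valid(original + b"x", recorded)  # altered bytes

Here ``intact`` is true and ``corrupted`` is false, since any change to the
stored bytes changes the digest.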

Investigator Toolkit
~~~~~~~~~~~~~~~~~~~~
To give scientists convenient access to the data and metadata in
DataONE, the Investigator Toolkit provides a library of software tools that have been
adapted to work with DataONE via the service interface and can be used to
discover, manage, analyze, and visualize data in DataONE.  For example, DataONE
plans to release metadata editors (e.g., Morpho), data search tools (e.g., Mercury),
data access tools (e.g., ONEDrive), and data analysis tools (e.g., R) that all
know how to interact with DataONE Member Nodes and Coordinating Nodes.  Consequently,
scientists will be able to access data from any DataONE Member Node, such as a Metacat
node, directly from within the R environment.  In addition, software tools that
are written to work with one Member Node should also work with others, thereby
greatly increasing the efficiency of creating an entire toolkit of software that
is useful to investigators.

Because DataONE services are REST web services, software written in any
programming language can be adapted to interact with DataONE.
In addition, to ease the process of adapting tools to work with DataONE, libraries
for common programming languages such as Java (d1-libclient-java)
and Python (d1_libclient-python) are provided that allow simple function calls
to be used to access any DataONE service.
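
To make the point about plain REST access concrete, the following sketch
prepares an HTTP request with nothing but the Python standard library, without
using either client library. It targets the Coordinating Node base URL given
earlier in this chapter; the ``v1`` segment, the ``resolve`` resource name, and
the example identifier are assumptions made for illustration and should be
checked against the service interface documentation.

.. code-block:: python

   from urllib.parse import quote
   from urllib.request import Request

   # Coordinating Node base URL as given earlier in this chapter.
   CN_BASE_URL = "https://cn.dataone.org/cn"


   def build_resolve_request(pid, version="v1"):
       """Prepare (but do not send) a GET request asking where ``pid`` is held."""
       # Percent-encode the identifier so it stays a single path segment.
       url = "{}/{}/resolve/{}".format(CN_BASE_URL, version, quote(pid, safe=""))
       return Request(url, method="GET")


   # Made-up identifier; nothing is sent over the network here.
   request = build_resolve_request("example-dataset.1.1")

Passing ``request`` to ``urllib.request.urlopen`` would send it. A client in any
other language with an HTTP library could issue the equivalent call.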

Configuring Metacat as a Member Node
------------------------------------
Configuring Metacat as a DataONE Member Node is accomplished with the standard
Metacat Administrative configuration utility. To access the utility, visit the
following URL::

  http://<yourhost.org>/<context>/admin

where ``<yourhost.org>`` represents the hostname of your webserver running Metacat,
and ``<context>`` is the name of the web context in which Metacat was installed.
Once at the administrative utility, click on the DataONE configuration link, which
should show the following screen:

.. figure:: images/screenshots/screen-dataone-config.png
   :align: center

   The configuration screen for configuring Metacat as a DataONE node.
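
As a small worked example of the administrative URL pattern above, the
following sketch fills in its two placeholders. The host
``metacat.example.org`` and the context ``metacat`` are placeholder values
chosen for illustration, and the helper name is hypothetical; use the values
from your own installation.

.. code-block:: python

   def admin_url(host, context, scheme="http"):
       """Build the administrative utility URL from a host and web context."""
       # Strip stray slashes so the pieces join cleanly.
       return "{}://{}/{}/admin".format(scheme, host.strip("/"), context.strip("/"))


   # Placeholder host and context, not a real deployment.
   url = admin_url("metacat.example.org", "metacat")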

Being a replication target
~~~~~~~~~~~~~~~~~~~~~~~~~~
TODO: Describe the configuration for acting as a replication target.

Replication Policies
--------------------
TODO: Describe the replication policies for objects in DataONE.

Access Control Policies
-----------------------
TODO: Describe access control for objects in DataONE.