DataONE Member Node Support
===========================
DataONE_ is a federation of data repositories that aims to improve
interoperability among data repository software systems and advance the
preservation of scientific data for future use.
Metacat deployments can be configured to participate in DataONE_. This
chapter describes the DataONE_ data federation, its architecture, and the
way in which Metacat can be used to participate as a node in the DataONE system.

.. _DataONE: http://dataone.org/

What is DataONE?
----------------
The DataONE_ project is a collaboration among scientists, technologists, librarians,
and social scientists to build a robust, interoperable, and sustainable system for
preserving and accessing Earth observational data at national and global scales.
Supported by the U.S. National Science Foundation, DataONE partners focus on
technological, financial, and organizational sustainability approaches to
building a distributed network of data repositories that are fully interoperable,
even when those repositories use divergent underlying software and support different
data and metadata content standards. DataONE defines a common web-service
programming interface that allows the main software components of the DataONE system
to communicate seamlessly. The components of the DataONE system include:

* DataONE Service Interface
* Member Nodes
* Coordinating Nodes
* Investigator Toolkit

Metacat implements the services needed to operate as a DataONE Member Node,
as described below. The service interface then allows many different scientific
software tools for data management, analysis, visualization, and other parts of
the scientific lifecycle to communicate directly with Metacat without being
further specialized beyond the support needed for DataONE. This streamlines the
process of writing scientific software for both servers and client tools.

The DataONE Service Interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DataONE achieves interoperability by defining a lightweight but powerful set of
REST_ web services that can be implemented by various data management software
systems, allowing those systems to communicate with one another and to
exchange data, metadata, and other scientific objects. This `DataONE Service Interface`_
is an open standard that defines the communication protocols and technical
expectations for software components that wish to participate in the DataONE
federation. This service interface is divided into `four distinct tiers`_, with the
intention that any given software system may implement only those tiers that are
relevant to its repository; for example, a data aggregator might only implement
the Tier 1 interfaces that provide anonymous access to public data sets, while
a complete data management system like Metacat can implement all four tiers:

1. **Tier 1:** Read-only, anonymous data access
2. **Tier 2:** Read-only, with authentication and access control
3. **Tier 3:** Full write access
4. **Tier 4:** Replication target services

.. _REST: http://en.wikipedia.org/wiki/Representational_state_transfer

.. _DataONE Service Interface: http://releases.dataone.org/online/d1-architecture-1.0.0

.. _four distinct tiers: http://releases.dataone.org/online/d1-architecture-1.0.0/apis/index.html

Member Nodes
~~~~~~~~~~~~
In DataONE, Member Nodes form the core of the network: they represent
particular scientific communities, manage and preserve their data and metadata, and
provide tools to their community for contributing, managing, and accessing data.
DataONE provides a standard way for these individual repositories to interact, and helps
to coordinate among the Member Nodes in the federation. This allows Member Nodes
to provide services to each other, such as replication of data for backup and failover.
To be a Member Node, a repository must implement the Member Node service interface,
and then register with DataONE. Metacat provides this implementation automatically,
and provides an easy configuration option to register a Metacat instance as a
DataONE Member Node (see the configuration section below). If you are deploying a Metacat
instance, it is relatively simple to become a Member Node, but keep in mind that
DataONE aims for longevity and preservation, and so selects for nodes
that have long-term data preservation as part of their mission.

Coordinating Nodes
~~~~~~~~~~~~~~~~~~
The DataONE Coordinating Nodes provide a set of services to Member Nodes that
allow Member Nodes to easily interact with one another and that provide a unified
view of the whole DataONE Federation. The main services provided by Coordinating
Nodes are:

* Global search index for all metadata and web portal for data discovery
* Resolution service to map unique identifiers to the Member Nodes that hold data
* Authentication against a shared set of accounts based on CILogon_ and InCommon_
* Replication management services to reliably replicate data according to
  policies set by the Member Nodes
* Fixity checking to ensure that preserved objects remain valid
* Member Node registration and management
* Aggregated logging for data access across the whole federation

Three geographically distributed Coordinating Nodes replicate these coordinating
services at UC Santa Barbara, the University of New Mexico, and the Oak Ridge Campus.
Coordinating Nodes are set up in a fully redundant manner, such that any of the
Coordinating Nodes can be offline and the others will continue to provide the services
without interruption. The Coordinating Nodes expose their services at::

    https://cn.dataone.org/cn

The DataONE search portal is available at::

    https://cn.dataone.org/

.. _CILogon: http://www.cilogon.org

.. _InCommon: http://incommon.org

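As a minimal sketch of how a client might address the Coordinating Node
services described above, the following Python builds request URLs from the
base URL ``https://cn.dataone.org/cn``. The ``v1`` version prefix and the
``node`` and ``resolve`` path names are assumptions about the REST layout and
should be checked against the DataONE architecture documentation. The helper
names are hypothetical and are not part of any DataONE library.

```python
from urllib.parse import quote

# Base URL of the Coordinating Node services, as given in this chapter.
CN_BASE_URL = "https://cn.dataone.org/cn"

# Assumption: version 1 of the REST API is mounted under "/v1".
API_VERSION = "v1"


def cn_service_url(*path_segments: str) -> str:
    """Join percent-encoded path segments onto the Coordinating Node base URL."""
    # safe="" also encodes "/" so that an identifier stays one path segment.
    encoded = [quote(segment, safe="") for segment in path_segments]
    return "/".join([CN_BASE_URL, API_VERSION, *encoded])


def node_list_url() -> str:
    """URL assumed to list the nodes registered with DataONE."""
    return cn_service_url("node")


def resolve_url(pid: str) -> str:
    """URL assumed to resolve an identifier to the Member Nodes that hold it."""
    return cn_service_url("resolve", pid)
```

Identifiers often contain characters such as ``:`` and ``/``, which is why
each segment is percent-encoded before it is placed in the URL path.
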
Investigator Toolkit
~~~~~~~~~~~~~~~~~~~~
To give scientists convenient access to the data and metadata in
DataONE, the Investigator Toolkit provides a library of software tools that have been
adapted to work with DataONE via the service interface and can be used to
discover, manage, analyze, and visualize data in DataONE. For example, DataONE
plans to release metadata editors (e.g., Morpho), data search tools (e.g., Mercury),
data access tools (e.g., ONEDrive), and data analysis tools (e.g., R) that all
know how to interact with DataONE Member Nodes and Coordinating Nodes. Consequently,
scientists will be able to access data from any DataONE Member Node, such as a Metacat
node, directly from within the R environment. In addition, software tools that
are written to work with one Member Node should also work with others, thereby
greatly increasing the efficiency of creating an entire toolkit of software that
is useful to investigators.

Because DataONE services are REST web services, software written in any
programming language can be adapted to interact with DataONE.
In addition, to ease the process of adapting tools to work with DataONE, libraries
for common programming languages such as Java (d1-libclient-java)
and Python (d1_libclient-python) are provided that allow simple function calls
to be used to access any DataONE service.

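To illustrate the point that any language with an HTTP client can talk to a
Member Node, here is a small Python sketch that uses only the standard library
rather than the DataONE client libraries. It builds, but does not send, a
``GET`` request for a single object. The Member Node base URL is a placeholder,
and the ``v1/object`` path is an assumption about the REST layout that should
be confirmed against the service interface documentation.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical Member Node base URL; substitute the address of a real node.
MN_BASE_URL = "https://metacat.example.org/metacat/d1/mn"


def build_get_object_request(pid: str, base_url: str = MN_BASE_URL) -> Request:
    """Build (but do not send) a GET request for the object identified by pid."""
    # Assumption: objects are exposed at <base>/v1/object/<percent-encoded pid>.
    url = f"{base_url.rstrip('/')}/v1/object/{quote(pid, safe='')}"
    return Request(url, method="GET")
```

Passing the returned request to ``urllib.request.urlopen`` would send it. The
client libraries named above wrap calls of this kind behind ordinary function
calls.
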
Configuring Metacat as a Member Node
------------------------------------
Configuring Metacat as a DataONE Member Node is accomplished with the standard
Metacat Administrative configuration utility. To access the utility, visit the
following URL::

    http://<yourhost.org>/<context>/admin

where ``<yourhost.org>`` represents the hostname of your web server running Metacat,
and ``<context>`` is the name of the web context in which Metacat was installed.
Once at the administrative utility, click on the DataONE configuration link, which
should show the following screen:

.. figure:: images/screenshots/screen-dataone-config.png
   :align: center

   The configuration screen for configuring Metacat as a DataONE node.

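The administrative URL pattern given above can be assembled programmatically,
for example in a deployment script. The following is a minimal sketch. The
function name is hypothetical, and ``metacat`` in the usage note is only an
example context name.

```python
def metacat_admin_url(host: str, context: str, scheme: str = "http") -> str:
    """Return the administrative utility URL for a Metacat deployment."""
    # Strip stray slashes so "/metacat/" and "metacat" give the same result.
    host = host.strip("/")
    context = context.strip("/")
    if not host or not context:
        raise ValueError("both host and context are required")
    return f"{scheme}://{host}/{context}/admin"
```

For instance, ``metacat_admin_url("yourhost.org", "metacat")`` yields
``http://yourhost.org/metacat/admin``.
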
Being a replication target
~~~~~~~~~~~~~~~~~~~~~~~~~~
TODO: Describe the configuration for acting as a replication target.

Replication Policies
--------------------
TODO: Describe the replication policies for objects in DataONE.

Access Control Policies
-----------------------
TODO: Describe access control for objects in DataONE.