DataONE Member Node Support
===========================
DataONE_ is a federation of data repositories that aims to improve
interoperability among data repository software systems and advance the
preservation of scientific data for future use.
Metacat deployments can be configured to participate in DataONE_. This
chapter describes the DataONE_ data federation, its architecture, and the
way in which Metacat can be used to participate as a node in the DataONE system.

.. _DataONE: http://dataone.org/

What is DataONE?
----------------
The DataONE_ project is a collaboration among scientists, technologists, librarians,
and social scientists to build a robust, interoperable, and sustainable system for
preserving and accessing Earth observational data at national and global scales.
Supported by the U.S. National Science Foundation, DataONE partners focus on
technological, financial, and organizational sustainability approaches to
building a distributed network of data repositories that are fully interoperable,
even when those repositories use divergent underlying software and support different
data and metadata content standards. DataONE defines a common web service
programming interface that allows the main software components of the DataONE system
to communicate seamlessly. The components of the DataONE system include:

* DataONE Service Interface
* Member Nodes
* Coordinating Nodes
* Investigator Toolkit

Metacat implements the services needed to operate as a DataONE Member Node,
as described below. The service interface then allows many different scientific
software tools for data management, analysis, visualization, and other parts of
the scientific lifecycle to communicate directly with Metacat without being
further specialized beyond the support needed for DataONE. This streamlines the
process of writing scientific software for both servers and client tools.

The DataONE Service Interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DataONE achieves interoperability by defining a lightweight but powerful set of
REST_ web services that can be implemented by various data management software
systems, allowing those systems to communicate effectively with one another and to
exchange data, metadata, and other scientific objects. This `DataONE Service Interface`_
is an open standard that defines the communication protocols and technical
expectations for software components that wish to participate in the DataONE
federation. This service interface is divided into `four distinct tiers`_, with the
intention that any given software system may implement only those tiers that are
relevant to its repository; for example, a data aggregator might only implement
the Tier 1 interfaces that provide anonymous access to public data sets, while
a complete data management system like Metacat can implement all four tiers:

1. **Tier 1:** Read-only, anonymous data access
2. **Tier 2:** Read-only, with authentication and access control
3. **Tier 3:** Full write access
4. **Tier 4:** Replication target services
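
The tier structure can be sketched as a simple lookup table. This is a minimal,
hypothetical Python sketch, not part of Metacat: the API group names (``MNCore``,
``MNRead``, ``MNAuthorization``, ``MNStorage``, ``MNReplication``) follow the DataONE
architecture documentation, but the exact mapping, and the assumption that each tier
includes every tier below it, should be checked against the specification version
you target. ``TIER_APIS`` and ``apis_for_tier`` are illustrative names only.

.. code-block:: python

    # Hypothetical mapping of the API groups each tier adds.
    # Assumption: tiers are cumulative (a Tier N node also implements Tiers 1..N-1).
    TIER_APIS = {
        1: ("MNCore", "MNRead"),
        2: ("MNAuthorization",),
        3: ("MNStorage",),
        4: ("MNReplication",),
    }


    def apis_for_tier(tier: int) -> list[str]:
        """Return every API group a node at the given tier is expected to implement."""
        if tier not in TIER_APIS:
            raise ValueError(f"unknown tier: {tier}")
        apis: list[str] = []
        for level in range(1, tier + 1):
            apis.extend(TIER_APIS[level])
        return apis


    # A public-data aggregator stops at Tier 1; a full repository goes to Tier 4.
    print(apis_for_tier(1))  # ['MNCore', 'MNRead']
    print(apis_for_tier(4))

Under these assumptions, a Tier 1 aggregator needs only the core and read APIs,
while a Tier 4 node such as Metacat implements all five groups.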

.. _REST: http://en.wikipedia.org/wiki/Representational_state_transfer

.. _DataONE Service Interface: http://releases.dataone.org/online/d1-architecture-1.0.0

.. _four distinct tiers: http://releases.dataone.org/online/d1-architecture-1.0.0/apis/index.html

Member Nodes
~~~~~~~~~~~~
In DataONE, Member Nodes form the core of the network: they represent
particular scientific communities, manage and preserve their data and metadata, and
provide tools to their community for contributing, managing, and accessing data.
DataONE provides a standard way for these individual repositories to interact, and helps
to coordinate among the Member Nodes in the federation. This allows Member Nodes
to provide services to each other, such as replication of data for backup and failover.
To be a Member Node, a repository must implement the Member Node service interface
and then register with DataONE. Metacat provides this implementation automatically,
and provides an easy configuration option to register a Metacat instance as a
DataONE Member Node (see the configuration section below). If you are deploying a Metacat
instance, it is relatively simple to become a Member Node, but keep in mind that
DataONE emphasizes longevity and preservation, and therefore favors nodes
that have long-term data preservation as part of their mission.
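
Registration revolves around a description of the node and the services it offers.
The sketch below parses a simplified, hypothetical node description of the kind a
Member Node publishes about itself. The element and attribute names, the namespace
URI, the ``urn:node:EXAMPLE`` identifier, and the ``example.org`` base URL are
assumptions modelled loosely on the DataONE types, not a verbatim copy of the schema;
``summarize_node`` is an illustrative helper.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Simplified, hypothetical node description (names are assumptions).
    NODE_XML = """\
    <d1:node xmlns:d1="http://ns.dataone.org/service/types/v1"
             type="mn" state="up" replicate="true" synchronize="true">
      <identifier>urn:node:EXAMPLE</identifier>
      <name>Example Metacat Member Node</name>
      <baseURL>https://example.org/metacat/d1/mn</baseURL>
      <services>
        <service name="MNCore" version="v1" available="true"/>
        <service name="MNRead" version="v1" available="true"/>
        <service name="MNReplication" version="v1" available="false"/>
      </services>
    </d1:node>
    """


    def summarize_node(xml_text: str) -> dict:
        """Extract the identifier, base URL, and available services of a node."""
        root = ET.fromstring(xml_text)
        # Only the root is namespaced here, so children are found by plain tag name.
        services = [
            svc.get("name")
            for svc in root.findall("./services/service")
            if svc.get("available") == "true"
        ]
        return {
            "identifier": root.findtext("identifier"),
            "baseURL": root.findtext("baseURL"),
            "replicate": root.get("replicate") == "true",
            "services": services,
        }


    print(summarize_node(NODE_XML))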

Coordinating Nodes
~~~~~~~~~~~~~~~~~~
The DataONE Coordinating Nodes provide a set of services to Member Nodes that
allow Member Nodes to easily interact with one another and to provide a unified
view of the whole DataONE Federation. The main services provided by Coordinating
Nodes are:

* Global search index for all metadata and web portal for data discovery
* Resolution service to map unique identifiers to the Member Nodes that hold data
* Authentication against a shared set of accounts based on CILogon_ and InCommon_
* Replication management services to reliably replicate data according to
  policies set by the Member Nodes
* Fixity checking to ensure that preserved objects remain valid
* Member Node registration and management
* Aggregated logging for data access across the whole federation
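
Fixity checking comes down to recomputing an object's checksum and comparing it with
the value recorded when the object was stored. A minimal sketch using Python's
standard ``hashlib``, assuming the recorded checksum is stored as a hex digest
together with an algorithm label such as ``MD5`` or ``SHA-1``. The ``ALGORITHMS``
table and ``verify_fixity`` function are hypothetical helpers, not part of any
DataONE library.

.. code-block:: python

    import hashlib

    # Hypothetical mapping from checksum algorithm labels to hashlib names.
    ALGORITHMS = {"MD5": "md5", "SHA-1": "sha1", "SHA-256": "sha256"}


    def verify_fixity(data: bytes, algorithm: str, expected_hex: str) -> bool:
        """Recompute the checksum of data and compare it with the recorded value."""
        name = ALGORITHMS.get(algorithm)
        if name is None:
            raise ValueError(f"unsupported checksum algorithm: {algorithm}")
        digest = hashlib.new(name, data).hexdigest()
        # hexdigest() is lower case; the recorded value may not be.
        return digest == expected_hex.lower()


    recorded = "900150983cd24fb0d6963f7d28e17f72"  # MD5 of b"abc"
    print(verify_fixity(b"abc", "MD5", recorded))   # True
    print(verify_fixity(b"abc!", "MD5", recorded))  # False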

Three geographically distributed Coordinating Nodes replicate these coordinating
services at UC Santa Barbara, the University of New Mexico, and the Oak Ridge Campus.
Coordinating Nodes are set up in a fully redundant manner, such that any one of the
Coordinating Nodes can go offline while the others continue to provide the services
without interruption. The Coordinating Nodes expose their services at::

    https://cn.dataone.org/cn

The DataONE search portal is available at::

    https://cn.dataone.org/
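
The resolution service can be pictured as two steps: build a request against the
Coordinating Node base URL, then read the list of Member Nodes holding the object.
The sketch below does both offline. The ``v1/resolve/{pid}`` path, the
``objectLocationList`` element names, and the ``example.org`` nodes are assumptions
to verify against the DataONE API reference; ``resolve_url`` and
``object_locations`` are hypothetical helpers.

.. code-block:: python

    import xml.etree.ElementTree as ET
    from urllib.parse import quote

    CN_BASE = "https://cn.dataone.org/cn"


    def resolve_url(pid: str, cn_base: str = CN_BASE) -> str:
        """Build a resolve request URL (the path layout is an assumption)."""
        # Percent-encode the whole identifier so '/' and ':' cannot split the path.
        return f"{cn_base}/v1/resolve/{quote(pid, safe='')}"


    # Simplified, hypothetical resolution response listing two copies of one object.
    LOCATIONS_XML = """\
    <d1:objectLocationList xmlns:d1="http://ns.dataone.org/service/types/v1">
      <identifier>doi:10.1234/example</identifier>
      <objectLocation>
        <nodeIdentifier>urn:node:EXAMPLE1</nodeIdentifier>
        <url>https://mn1.example.org/metacat/d1/mn/v1/object/doi%3A10.1234%2Fexample</url>
      </objectLocation>
      <objectLocation>
        <nodeIdentifier>urn:node:EXAMPLE2</nodeIdentifier>
        <url>https://mn2.example.org/d1/mn/v1/object/doi%3A10.1234%2Fexample</url>
      </objectLocation>
    </d1:objectLocationList>
    """


    def object_locations(xml_text: str) -> list[tuple[str, str]]:
        """Return (node identifier, download URL) pairs in document order."""
        root = ET.fromstring(xml_text)
        return [
            (loc.findtext("nodeIdentifier"), loc.findtext("url"))
            for loc in root.findall("objectLocation")
        ]


    print(resolve_url("doi:10.1234/example"))
    # https://cn.dataone.org/cn/v1/resolve/doi%3A10.1234%2Fexample
    for node, url in object_locations(LOCATIONS_XML):
        print(node, url)

Because each copy is listed with the node that holds it, a client can try the
locations in order and fall back to the next one if a Member Node is unavailable.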

.. _CILogon: http://www.cilogon.org

.. _InCommon: http://incommon.org

Investigator Toolkit
~~~~~~~~~~~~~~~~~~~~
In order to provide scientists with convenient access to the data and metadata in
DataONE, the Investigator Toolkit provides a library of software tools that have been
adapted to work with DataONE via the service interface and can be used to
discover, manage, analyze, and visualize data in DataONE. For example, DataONE
plans to release metadata editors (e.g., Morpho), data search tools (e.g., Mercury),
data access tools (e.g., ONEDrive), and data analysis tools (e.g., R) that all
know how to interact with DataONE Member Nodes and Coordinating Nodes. Consequently,
scientists will be able to access data from any DataONE Member Node, such as a Metacat
node, directly from within the R environment. In addition, software tools that
are written to work with one Member Node should also work with others, thereby
greatly increasing the efficiency of creating an entire toolkit of software that
is useful to investigators.

Because DataONE services are REST web services, software written in any
programming language can be adapted to interact with DataONE.
In addition, to ease the process of adapting tools to work with DataONE,
libraries are provided for common programming languages such as Java (``d1-libclient-java``)
and Python (``d1_libclient-python``) that allow simple function calls
to be used to access any DataONE service.
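
To illustrate that any language with an HTTP library can talk to DataONE, here is a
minimal Python sketch that uses only the standard library instead of
``d1_libclient-python``. The ``MemberNodeClient`` class, the ``v1/meta/{pid}`` and
``v1/object/{pid}`` paths, and the ``example.org`` base URL are assumptions for
illustration; the ``opener`` parameter lets the HTTP call be swapped out, for
example in tests.

.. code-block:: python

    from urllib.parse import quote
    from urllib.request import Request, urlopen


    class MemberNodeClient:
        """Hypothetical minimal client for a few read-only Member Node calls."""

        def __init__(self, base_url: str, opener=urlopen):
            self.base_url = base_url.rstrip("/")
            self.opener = opener  # any callable that accepts a Request, like urlopen

        def _url(self, resource: str, pid: str) -> str:
            # Assumed layout: <baseURL>/v1/<resource>/<percent-encoded pid>
            return f"{self.base_url}/v1/{resource}/{quote(pid, safe='')}"

        def _fetch(self, request: Request) -> bytes:
            with self.opener(request) as response:
                return response.read()

        def get_system_metadata(self, pid: str) -> bytes:
            """GET the raw system metadata document for pid."""
            return self._fetch(Request(self._url("meta", pid), method="GET"))

        def get_object(self, pid: str) -> bytes:
            """GET the bytes of the object identified by pid."""
            return self._fetch(Request(self._url("object", pid), method="GET"))


    client = MemberNodeClient("https://example.org/metacat/d1/mn")
    # client.get_object("doi:10.1234/example") would issue
    # GET https://example.org/metacat/d1/mn/v1/object/doi%3A10.1234%2Fexample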

Configuring Metacat as a Member Node
------------------------------------
Configuring Metacat as a DataONE Member Node is accomplished with the standard
Metacat administrative configuration utility. To access the utility, visit the
following URL::

    http://<yourhost.org>/<context>/admin

where ``<yourhost.org>`` represents the hostname of your web server running Metacat,
and ``<context>`` is the name of the web context in which Metacat was installed.
Once at the administrative utility, click on the DataONE configuration link, which
should show the following screen:

.. figure:: images/screenshots/screen-dataone-config.png
   :align: center

   The configuration screen for configuring Metacat as a DataONE node.
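
If you script deployment checks, the admin URL can be assembled from the same two
values, ``<yourhost.org>`` and ``<context>``. This is a hypothetical helper: the
``metacat_urls`` function is illustrative, and the second URL assumes the Member
Node services are published under ``<context>/d1/mn`` with a ``v1`` ping endpoint,
which you should confirm for your Metacat version before relying on it.

.. code-block:: python

    from urllib.parse import urlunsplit


    def metacat_urls(host: str, context: str, scheme: str = "http") -> dict:
        """Build the admin URL and an assumed Member Node ping URL."""
        context = context.strip("/")  # accept "metacat", "/metacat", or "metacat/"
        admin = urlunsplit((scheme, host, f"/{context}/admin", "", ""))
        # Assumption: Member Node services live under /<context>/d1/mn/v1.
        ping = urlunsplit((scheme, host, f"/{context}/d1/mn/v1/monitor/ping", "", ""))
        return {"admin": admin, "ping": ping}


    urls = metacat_urls("yourhost.org", "/metacat/")
    print(urls["admin"])  # http://yourhost.org/metacat/admin
    print(urls["ping"])   # http://yourhost.org/metacat/d1/mn/v1/monitor/ping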

Being a replication target
~~~~~~~~~~~~~~~~~~~~~~~~~~
TODO: Describe the configuration for acting as a replication target.

Replication Policies
--------------------
TODO: Describe the replication policies for objects in DataONE.

Access Control Policies
-----------------------
TODO: Describe access control for objects in DataONE.