
Revision 6885

Added by Matt Jones over 12 years ago

Continued authoring the description of DataONE in Metacat. More to come.

docs/user/metacat/source/replication.rst
1 1
Replication
2 2
===========
3

  
4
.. Note:: 
5
  
6
  Note that much of the functionality provided by the replication subsystem in Metacat
7
  has now been generalized and standardized by DataONE, so consider utilizing the
8
  DataONE services for replication as it is a more general and standardized approach
9
  than this Metacat-specific replication system.  The Metacat replication system
10
  will be supported for a while longer, but will likely be deprecated in a future
11
  release in favor of using the DataONE replication approach. 
12

  
3 13
Metacat has a built-in replication feature that allows different Metacat servers 
4 14
to share data (both XML documents and data files) between each other. Metacat 
5 15
can replicate not only its home server's original documents, but also those 
docs/user/metacat/source/dataone.rst
1 1
DataONE Member Node Support
2 2
===========================
3

  
4 3
DataONE_ is a federation of data repositories that aims to improve 
5 4
interoperability among data repository software systems and advance the
6 5
preservation of scientific data for future use.
......
16 15
and social scientists to build a robust, interoperable, and sustainable system for
17 16
preserving and accessing Earth observational data at national and global scales.  
18 17
Supported by the U.S. National Science Foundation, DataONE partners focus on
19
technological, finalncial, and organizational sustainability approaches to 
18
technological, financial, and organizational sustainability approaches to 
20 19
building a distributed network of data repositories that are fully interoperable,
21 20
even when those repositories use divergent underlying software and support different
22 21
data and metadata content standards. DataONE defines a common web-service service 
......
33 32
software tools for data management, analysis, visualization and other parts of 
34 33
the scientific lifecycle to directly communicate with Metacat without being
35 34
further specialized beyond the support needed for DataONE.  This streamlines the
36
process of writing scientific software on both for servers and client tools.
35
process of writing scientific software both for servers and client tools.
37 36

  
38 37
The DataONE Service Interface
39 38
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
40
DataONE achieves interoperability
41
by defining a lightweight but powerful set of web services that can be
42
implemented by various data management software systems to allow those systems
43
to effectively communicate with one another, exchange data, metadata, and other
44
scientific objects.  This `DataONE Service Interface`_
39
DataONE achieves interoperability by defining a lightweight but powerful set of 
40
REST_ web services that can be implemented by various data management software 
41
systems to allow those systems to effectively communicate with one another, 
42
exchange data, metadata, and other scientific objects.  This `DataONE Service Interface`_
45 43
is an open standard that defines the communication protocols and technical 
46 44
expectations for software components that wish to participate in the DataONE
47 45
federation. This service interface is divided into `four distinct tiers`_, with the 
......
55 53
3. **Tier 3:** Full Write access
56 54
4. **Tier 4:** Replication target services
57 55

  
56
.. _REST: http://en.wikipedia.org/wiki/Representational_state_transfer
57

  
58 58
.. _DataONE Service Interface: http://releases.dataone.org/online/d1-architecture-1.0.0
59 59

  
60 60
.. _four distinct tiers: http://releases.dataone.org/online/d1-architecture-1.0.0/apis/index.html
61 61

  
62 62
Member Nodes
63 63
~~~~~~~~~~~~
64
In DataONE, Member Nodes represent the core of the network, in that they represent
65
particular scientific communities, manage and preserve their data and metadata, and
66
provide tools to their community for contributing, managing, and accessing data.
67
DataONE provides a standard way for these individual repositories to interact, and helps
68
to coordinate among the Member Nodes in the federation.  This allows Member Nodes
69
to provide services to each other, such as replication of data for backup and failover.
70
To be a Member Node, a repository must implement the Member Node service interface, 
71
and then register with DataONE.  Metacat provides this implementation automatically,
72
and provides an easy configuration option to register a Metacat instance as a 
73
DataONE Member Node (see configuration section below). If you are deploying a Metacat
74
instance, it is relatively simple to become a Member Node, but keep in mind that 
75
DataONE is aiming for longevity and preservation, and so is selecting for nodes
76
that have long-term data preservation as part of their mission. 
64 77

  
65 78
Coordinating Nodes
66 79
~~~~~~~~~~~~~~~~~~
80
The DataONE Coordinating Nodes provide a set of services to Member Nodes that
81
allow Member Nodes to easily interact with one another and to provide a unified
82
view of the whole DataONE Federation.  The main services provided by Coordinating
83
Nodes are:
67 84

  
85
* Global search index for all metadata and web portal for data discovery
86
* Resolution service to map unique identifiers to the Member Nodes that hold data
87
* Authentication against a shared set of accounts based on CILogon_ and InCommon_
88
* Replication management services to reliably replicate data according to 
89
  policies set by the Member Nodes
90
* Fixity checking to ensure that preserved objects remain valid
91
* Member Node registration and management
92
* Aggregated logging for data access across the whole federation
93

  
94
Three geographically distributed Coordinating Nodes replicate these coordinating 
95
services at UC Santa Barbara, the University of New Mexico, and the Oak Ridge Campus.
96
Coordinating Nodes are set up in a fully redundant manner, such that any of the coordinating
97
nodes can be offline and the others will continue to provide availability of the services
98
without interruption.  The Coordinating Nodes expose their services at::
99

  
100
  https://cn.dataone.org/cn
101
  
102
The DataONE search portal is available at::
103

  
104
  https://cn.dataone.org/
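
As a rough illustration of how a client might address the Coordinating Node
services at the base URL given above, the sketch below builds REST URLs using
only the Python standard library. The ``v1`` version segment and the ``node``
and ``resolve`` path names are assumptions based on version 1 of the DataONE
Service Interface and should be checked against that specification. The
helper names (``cn_url``, ``node_list_url``, ``resolve_url``) are hypothetical.

```python
from urllib.parse import quote

CN_BASE_URL = "https://cn.dataone.org/cn"  # base URL quoted in the text above
API_VERSION = "v1"  # assumption: version 1 of the service interface


def cn_url(*segments, base_url=CN_BASE_URL, version=API_VERSION):
    """Join path segments onto the base URL, percent-encoding each segment."""
    encoded = [quote(str(segment), safe="") for segment in segments]
    return "/".join([base_url.rstrip("/"), version, *encoded])


def node_list_url():
    """URL assumed to list the nodes registered with the federation."""
    return cn_url("node")


def resolve_url(pid):
    """URL assumed to resolve an identifier to the Member Nodes holding it."""
    return cn_url("resolve", pid)
```

Identifiers are percent-encoded as a whole path segment because they may
contain characters such as ``/`` or ``:`` that would otherwise change the
meaning of the URL.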
105

  
106
.. _CILogon: http://www.cilogon.org
107

  
108
.. _InCommon: http://incommon.org
109

  
68 110
Investigator Toolkit
69 111
~~~~~~~~~~~~~~~~~~~~
112
In order to provide scientists with convenient access to the data and metadata in
113
DataONE, the third component represents a library of software tools that have been 
114
adapted to work with DataONE via the service interface and can be used to
115
discover, manage, analyze, and visualize data in DataONE.  For example, DataONE
116
plans to release metadata editors (e.g., Morpho), data search tools (e.g., Mercury), 
117
data access tools (e.g., ONEDrive), and data analysis tools (e.g., R) that all 
118
know how to interact with DataONE Member Nodes and Coordinating Nodes.  Consequently,
119
scientists will be able to access data from any DataONE Member Node, such as a Metacat
120
node, directly from within the R environment.  In addition, software tools that 
121
are written to work with one Member Node should also work with others, thereby
122
greatly increasing the efficiency of creating an entire toolkit of software that
123
is useful to investigators.  
70 124

  
71
Metacat as a Member Node
72
------------------------
125
Because DataONE services are REST web services, software written in any
126
programming language can be adapted to interact with DataONE.
127
In addition, to ease the process of adapting tools to work with DataONE, libraries
128
are provided for common programming languages such as Java (d1-libclient-java) 
129
and Python (d1_libclient-python) that allow simple function calls 
130
to be used to access any DataONE service.
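
To illustrate the point that any language with an HTTP client can talk to a
DataONE node, here is a minimal sketch that uses only the Python standard
library rather than the client libraries named above. The Member Node base
URL is a hypothetical placeholder, and the ``v1/object/<identifier>`` path is
an assumption based on version 1 of the DataONE Service Interface, so verify
both against your own deployment before relying on them.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Hypothetical Member Node base URL; substitute the one for your own node.
MN_BASE_URL = "https://yourhost.org/metacat/d1/mn"


def build_get_object_request(pid, base_url=MN_BASE_URL, version="v1"):
    """Build (but do not send) a GET request for the object identified by pid."""
    url = f"{base_url.rstrip('/')}/{version}/object/{quote(pid, safe='')}"
    return Request(url, method="GET")


def fetch_object(pid, **kwargs):
    """Send the request and return the raw bytes; requires network access."""
    request = build_get_object_request(pid, **kwargs)
    with urlopen(request, timeout=30) as response:
        return response.read()
```

Separating request construction from sending keeps the URL logic easy to
inspect without contacting a server.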
73 131

  
132
Configuring Metacat as a Member Node
133
------------------------------------
134
Configuring Metacat as a DataONE Member Node is accomplished with the standard
135
Metacat Administrative configuration utility. To access the utility, visit the 
136
following URL::
137

  
138
  http://<yourhost.org>/<context>/admin
139
  
140
where ``<yourhost.org>`` represents the hostname of your webserver running metacat,
141
and ``<context>`` is the name of the web context in which Metacat was installed.
142
Once at the administrative utility, click on the DataONE configuration link, which
143
should show the following screen:
144

  
145
.. figure:: images/screenshots/screen-dataone-config.png
146
   :align: center
147
   
148
   The configuration screen for configuring Metacat as a DataONE node.
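
The administrative URL pattern described above can be assembled
programmatically, for example when scripting checks across several
installations. This is only a sketch: the function name ``admin_url`` is
hypothetical, and the host and context values in the usage line are example
placeholders rather than values taken from any real deployment.

```python
from urllib.parse import quote


def admin_url(host, context, scheme="http"):
    """Return the admin utility URL for a given host and web context."""
    # Strip stray slashes so "/metacat/" and "metacat" give the same result.
    clean_context = quote(context.strip("/"))
    return f"{scheme}://{host}/{clean_context}/admin"


# Example placeholders for <yourhost.org> and <context>.
example = admin_url("yourhost.org", "metacat")
```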
149
   
150
Being a replication target
151
~~~~~~~~~~~~~~~~~~~~~~~~~~
152
TODO: Describe the configuration for acting as a replication target.
153

  
154
Replication Policies
155
--------------------
156
TODO: Describe the replication policies for objects in DataONE.
157

  
158
Access Control Policies
159
-----------------------
160
TODO: Describe access control for objects in DataONE.
161

  
162

  
163

  
164

  
docs/user/metacat/source/configuration.rst
336 336

  
337 337
   Configuring Geoserver.
338 338

  
339
DataONE Configuration
340
~~~~~~~~~~~~~~~~~~~~~
341
Metacat can be configured to operate as a Member Node within the DataONE
342
federation of data repositories.  See :doc:`dataone` for background on DataONE
343
and for details about configuring Metacat to act as a DataONE Member Node.
344

  
345
Replication Configuration
346
~~~~~~~~~~~~~~~~~~~~~~~~~
347
Metacat can be configured to replicate its metadata and/or data content to another
348
Metacat instance for backup and redundancy purposes, as well as to share data across
349
sites.  This feature has been used to create the Knowledge Network for Biocomplexity
350
(KNB), as well as other networks.  See :doc:`replication` for details on
351
the replication system and how to configure Metacat to replicate with another node.
352

  
353
.. Note:: 
354
  
355
  Note that much of the functionality provided by the replication subsystem in Metacat
356
  has now been generalized and standardized by DataONE, so consider utilizing the
357
  DataONE services for replication as it is a more general and standardized approach
358
  than this Metacat-specific replication system.  The Metacat replication system
359
  will be supported for a while longer, but will likely be deprecated in a future
360
  release in favor of using the DataONE replication approach. 
361

  
339 362
Additional Configuration
340 363
------------------------
341 364
The most dynamic Metacat properties are managed and modified with the 
