DataONE Member Node Support
===========================
DataONE_ is a federation of data repositories that aims to improve
interoperability among data repository software systems and advance the
preservation of scientific data for future use. Metacat deployments can be
configured to participate in DataONE_. This chapter describes the DataONE_
data federation, its architecture, and how Metacat can participate as a node
in the DataONE system.

.. _DataONE: http://dataone.org/

What is DataONE?
----------------
The DataONE_ project is a collaboration among scientists, technologists,
librarians, and social scientists to build a robust, interoperable, and
sustainable system for preserving and accessing Earth observational data at
national and global scales. Supported by the U.S. National Science Foundation,
DataONE partners focus on technological, financial, and organizational
sustainability approaches to building a distributed network of data
repositories that are fully interoperable, even when those repositories use
divergent underlying software and support different data and metadata content
standards.

DataONE defines a common web-service programming interface that allows the
main software components of the DataONE system to communicate seamlessly. The
components of the DataONE system include:

* DataONE Service Interface
* Member Nodes
* Coordinating Nodes
* Investigator Toolkit

Metacat implements the services needed to operate as a DataONE Member Node, as
described below. The service interface then allows many different scientific
software tools for data management, analysis, visualization, and other parts
of the scientific lifecycle to communicate directly with Metacat without being
specialized beyond the support needed for DataONE. This streamlines the
process of writing scientific software for both servers and client tools.

The DataONE Service Interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DataONE achieves interoperability by defining a lightweight but powerful set
of REST_ web services that can be implemented by various data management
software systems, allowing those systems to communicate with one another and
to exchange data, metadata, and other scientific objects. This `DataONE
Service Interface`_ is an open standard that defines the communication
protocols and technical expectations for software components that wish to
participate in the DataONE federation. The service interface is divided into
`four distinct tiers`_, with the intention that any given software system may
implement only those tiers that are relevant to its repository. For example, a
data aggregator might implement only the Tier 1 interfaces that provide
anonymous access to public data sets, while a complete data management system
like Metacat can implement all four tiers:

1. **Tier 1:** Read-only, anonymous data access
2. **Tier 2:** Read-only, with authentication and access control
3. **Tier 3:** Full write access
4. **Tier 4:** Replication target services

.. _REST: http://en.wikipedia.org/wiki/Representational_state_transfer

.. _DataONE Service Interface: http://releases.dataone.org/online/d1-architecture-1.0.0

.. _four distinct tiers: http://releases.dataone.org/online/d1-architecture-1.0.0/apis/index.html

Member Nodes
~~~~~~~~~~~~
In DataONE, Member Nodes form the core of the network: they represent
particular scientific communities, manage and preserve those communities' data
and metadata, and provide tools for contributing, managing, and accessing
data. DataONE provides a standard way for these individual repositories to
interact, and helps to coordinate among the Member Nodes in the federation.
This allows Member Nodes to provide services to each other, such as
replication of data for backup and failover.
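
Because the service interface is a set of REST calls, a client reaches a
Member Node by composing URLs from the node's base URL, an API version, a
resource name, and, where needed, an object identifier. The following is a
minimal sketch of that composition, not part of Metacat itself. The helper
name ``mn_url``, the example base URL, and the ``v1`` version prefix are
assumptions made for illustration; the authoritative resource paths are those
listed in the DataONE architecture documentation.

```python
from urllib.parse import quote

# Hypothetical base URL of a Metacat Member Node; substitute your own deployment.
MN_BASE_URL = "https://example.org/metacat/d1/mn"


def mn_url(base_url, resource, pid=None, version="v1"):
    """Compose a Member Node REST URL such as <base>/<version>/<resource>/<pid>.

    `resource` is a REST resource name (for example "object" for object
    bytes or "meta" for system metadata); check the names you use against
    the DataONE API documentation.
    """
    parts = [base_url.rstrip("/"), version, resource.strip("/")]
    if pid is not None:
        # Identifiers may contain '/', ':' and spaces, so percent-encode
        # every reserved character before using the pid as a path segment.
        parts.append(quote(pid, safe=""))
    return "/".join(parts)


if __name__ == "__main__":
    # URL for retrieving the bytes of one object (a Tier 1 read operation).
    print(mn_url(MN_BASE_URL, "object", "doi:10.1/AB C"))
```

Encoding the identifier as a single path segment matters because persistent
identifiers such as DOIs routinely contain slashes and colons, which would
otherwise be read as part of the URL structure.
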
To be a Member Node, a repository must implement the Member Node service
interface and then register with DataONE. Metacat provides this implementation
automatically, and provides a configuration option to register a Metacat
instance as a DataONE Member Node (see the configuration section below). If
you are deploying a Metacat instance, becoming a Member Node is relatively
simple, but keep in mind that DataONE is aiming for longevity and
preservation, and so is selecting for nodes that have long-term data
preservation as part of their mission.

Coordinating Nodes
~~~~~~~~~~~~~~~~~~
The DataONE Coordinating Nodes provide a set of services that allow Member
Nodes to interact easily with one another and that present a unified view of
the whole DataONE federation. The main services provided by Coordinating Nodes
are:

* Global search index for all metadata and web portal for data discovery
* Resolution service to map unique identifiers to the Member Nodes that hold
  data
* Authentication against a shared set of accounts based on CILogon_ and
  InCommon_
* Replication management services to reliably replicate data according to
  policies set by the Member Nodes
* Fixity checking to ensure that preserved objects remain valid
* Member Node registration and management
* Aggregated logging for data access across the whole federation

Three geographically distributed Coordinating Nodes replicate these
coordinating services at UC Santa Barbara, the University of New Mexico, and
the Oak Ridge Campus. Coordinating Nodes are set up in a fully redundant
manner, such that any one of them can be offline while the others continue to
provide the services without interruption. The Coordinating Nodes expose
their services at::

  https://cn.dataone.org/cn

The DataONE search portal is available at https://cn.dataone.org/.

.. _CILogon: http://www.cilogon.org

.. _InCommon: http://incommon.org

Investigator Toolkit
~~~~~~~~~~~~~~~~~~~~
To give scientists convenient access to the data and metadata in DataONE, the
Investigator Toolkit is a library of software tools that have been adapted to
work with DataONE via the service interface and can be used to discover,
manage, analyze, and visualize data in DataONE. For example, DataONE plans to
release metadata editors (e.g., Morpho), data search tools (e.g., Mercury),
data access tools (e.g., ONEDrive), and data analysis tools (e.g., R) that all
know how to interact with DataONE Member Nodes and Coordinating Nodes.
Consequently, scientists will be able to access data from any DataONE Member
Node, such as a Metacat node, directly from within the R environment. In
addition, software tools that are written to work with one Member Node should
also work with others, which makes it far more efficient to build an entire
toolkit of software that is useful to investigators.

Because DataONE services are REST web services, software written in any
programming language can be adapted to interact with DataONE. To ease the
process of adapting tools to work with DataONE, libraries are provided for
common programming languages such as Java (d1-libclient-java) and Python
(d1_libclient-python); these allow simple function calls to be used to access
any DataONE service.

Configuring Metacat as a Member Node
------------------------------------
Configuring Metacat as a DataONE Member Node is accomplished with the standard
Metacat administrative configuration utility. To access the utility, visit the
following URL::

  http://<hostname>/<context>/admin

where ``<hostname>`` represents the hostname of your web server running
Metacat, and ``<context>`` is the name of the web context in which Metacat was
installed. Once at the administrative utility, click on the DataONE
configuration link, which should show the following screen:

.. figure:: images/screenshots/screen-dataone-config.png
   :align: center

   The configuration screen for configuring Metacat as a DataONE node.

Being a replication target
~~~~~~~~~~~~~~~~~~~~~~~~~~
TODO: Describe the configuration for acting as a replication target.

Replication Policies
--------------------
TODO: Describe the replication policies for objects in DataONE.

Access Control Policies
-----------------------
TODO: Describe access control for objects in DataONE.