Introduction
============

Metacat is a repository for data and metadata (descriptions of data) that helps
scientists find, understand and effectively use the data sets they manage or
that have been created by others. Thousands of data sets are currently
documented in a standardized way and stored in Metacat systems. This gives the
scientific community a broad range of science data that, because the data are
well and consistently described, can be easily searched, compared, merged, or
used in other ways.

Not only is the Metacat repository a reliable place to store metadata and data
(the database can be replicated over a secure connection so that every record is
stored on multiple machines and protected against loss from technical failures),
it also provides a user-friendly interface for information entry and retrieval.
Scientists can search the repository via the Web using a customizable search
form. Searches return results based on user-specified criteria, such as desired
geographic coverage, taxonomic coverage, and/or keywords that appear in fields
like the data set's title or the owner's name. Users need only click a linked
search result to open the corresponding data set documentation in a browser
window and discover whom to contact to obtain the data themselves (or how to
immediately download the data via the Web).

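The criteria-based search described above can be illustrated with a small,
self-contained Java sketch. It is not Metacat's query engine or API. The
``BoundingBox``, ``DatasetSummary``, ``SearchCriteria``, and ``CatalogSearch``
types are hypothetical simplifications, as are the two sample records and the
matching rules: bounding-box overlap for geographic coverage, an exact taxon
match, and a case-insensitive keyword match against title and owner. The sketch
only shows how several optional criteria can combine.

.. code-block:: java

   import java.util.List;
   import java.util.Locale;
   import java.util.Set;

   // Hypothetical bounding box in decimal degrees; assumes it does not cross the antimeridian.
   record BoundingBox(double west, double east, double south, double north) {
       // Two boxes overlap unless one lies entirely to one side of the other.
       boolean overlaps(BoundingBox other) {
           return west <= other.east() && east >= other.west()
               && south <= other.north() && north >= other.south();
       }
   }

   // Hypothetical summary of a data set's metadata (not Metacat's data model).
   record DatasetSummary(String title, String owner, BoundingBox coverage, Set<String> taxa) {}

   // Hypothetical search form: a null field means "no constraint on this criterion".
   record SearchCriteria(String keyword, BoundingBox area, String taxon) {
       boolean matches(DatasetSummary d) {
           if (keyword != null) {
               String k = keyword.toLowerCase(Locale.ROOT);
               boolean hit = d.title().toLowerCase(Locale.ROOT).contains(k)
                   || d.owner().toLowerCase(Locale.ROOT).contains(k);
               if (!hit) return false;
           }
           if (area != null && !area.overlaps(d.coverage())) return false;
           return taxon == null || d.taxa().contains(taxon);
       }
   }

   class CatalogSearch {
       // Returns the titles of the data sets that satisfy every supplied criterion.
       static List<String> search(List<DatasetSummary> catalog, SearchCriteria criteria) {
           return catalog.stream().filter(criteria::matches).map(DatasetSummary::title).toList();
       }

       // Two made-up records, used only for demonstration.
       static List<DatasetSummary> sampleCatalog() {
           return List.of(
               new DatasetSummary("Kelp forest survey", "Jones",
                   new BoundingBox(-120.5, -119.5, 34.0, 34.5), Set.of("Macrocystis pyrifera")),
               new DatasetSummary("Grassland plant census", "Smith",
                   new BoundingBox(-97.0, -96.0, 39.0, 40.0), Set.of("Andropogon gerardii")));
       }

       public static void main(String[] args) {
           List<DatasetSummary> catalog = sampleCatalog();
           // Keyword only.
           System.out.println(search(catalog, new SearchCriteria("kelp", null, null)));
           // Geographic coverage only: a box over eastern Kansas.
           System.out.println(search(catalog,
               new SearchCriteria(null, new BoundingBox(-100.0, -95.0, 38.0, 41.0), null)));
       }
   }

With the sample records, a keyword search for ``kelp`` returns only the kelp
survey. A bounding box over eastern Kansas returns only the grassland census.
Treating an empty field as "no constraint" mirrors a search form in which users
fill in only the criteria they care about.
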
Metacat's user-friendly Registry application allows data providers to enter
data set documentation into Metacat using a Web form. When the form is
submitted, Metacat compiles the provided documentation into the required format
and saves it. Information providers need never work directly with the XML_
format in which the metadata are stored or with the database records themselves.
In addition, the Metacat application can easily be extended to provide a
customized data-entry interface that suits the particular requirements of each
project. Metacat users can also choose to enter metadata using the Morpho
application, which provides data entry wizards that guide information providers
through the process of documenting each data set.

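The following Java sketch gives a concrete sense of what compiling form input
into a stored format involves. It converts a few form fields into a small XML
document using the JDK's built-in DOM and ``javax.xml.transform`` APIs. The
``FormToXml`` class and the element names (``dataset``, ``title``, ``abstract``,
``contact``) are hypothetical. The output is a simplified, EML-inspired layout
rather than a schema-valid EML document, and this is not the Registry's actual
implementation.

.. code-block:: java

   import java.io.StringWriter;
   import java.util.LinkedHashMap;
   import java.util.Map;
   import javax.xml.parsers.DocumentBuilderFactory;
   import javax.xml.parsers.ParserConfigurationException;
   import javax.xml.transform.OutputKeys;
   import javax.xml.transform.Transformer;
   import javax.xml.transform.TransformerException;
   import javax.xml.transform.TransformerFactory;
   import javax.xml.transform.dom.DOMSource;
   import javax.xml.transform.stream.StreamResult;
   import org.w3c.dom.Document;
   import org.w3c.dom.Element;

   class FormToXml {
       // Builds a hypothetical <dataset> element with one child per form field, in map order.
       // createElement throws a DOMException if a field name is not a valid XML name.
       static String toXml(Map<String, String> formFields) {
           try {
               Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
               Element dataset = doc.createElement("dataset");
               doc.appendChild(dataset);
               for (Map.Entry<String, String> field : formFields.entrySet()) {
                   Element child = doc.createElement(field.getKey());
                   // Raw text is stored here; the serializer escapes characters such as & and <.
                   child.setTextContent(field.getValue());
                   dataset.appendChild(child);
               }
               Transformer transformer = TransformerFactory.newInstance().newTransformer();
               transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
               StringWriter out = new StringWriter();
               transformer.transform(new DOMSource(doc), new StreamResult(out));
               return out.toString();
           } catch (ParserConfigurationException | TransformerException e) {
               // Only expected if the JDK's XML implementation is misconfigured.
               throw new IllegalStateException("Could not serialize form to XML", e);
           }
       }

       public static void main(String[] args) {
           // LinkedHashMap keeps the fields in the order the form presents them.
           Map<String, String> form = new LinkedHashMap<>();
           form.put("title", "Kelp forest survey");
           form.put("abstract", "Counts of kelp & urchins");
           form.put("contact", "Jones");
           System.out.println(toXml(form));
       }
   }

Building the document through a DOM API, rather than concatenating strings,
leaves escaping to the serializer. In the example, the ``&`` in the abstract is
written out as ``&amp;``.
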
The metadata stored in Metacat include all of the information needed to
understand what the described data are and how to use them: a descriptive data
set title; an abstract; the temporal, spatial, and taxonomic coverage of the
data; the data collection methods; distribution information; and contact
information. Each information provider decides who has access to this
information (the public, or only specified users) and whether to upload the
data set itself along with its documentation. Information providers can also
edit their metadata or delete them from the repository, again using Metacat's
straightforward Web interface.

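The access choice described above (the public, or only specified users) can be
expressed as a small read-permission check. The Java sketch below is a
hypothetical illustration, not Metacat's access-control API. The
``AccessPolicy`` record, the rule that the owner can always read, and the use of
``null`` for an anonymous visitor are assumptions made for this example. The
sketch models read access only.

.. code-block:: java

   import java.util.Set;

   // Hypothetical read-access rule for one metadata document (not Metacat's API).
   // The owner can always read; otherwise access is either public or limited
   // to an explicit set of user names chosen by the information provider.
   record AccessPolicy(String owner, boolean isPublic, Set<String> allowedUsers) {

       // A null user stands for an anonymous (not logged in) visitor.
       boolean canRead(String user) {
           if (isPublic) return true;
           if (user == null) return false;
           return user.equals(owner) || allowedUsers.contains(user);
       }

       public static void main(String[] args) {
           AccessPolicy restricted = new AccessPolicy("jones", false, Set.of("smith"));
           System.out.println(restricted.canRead("smith")); // true: listed user
           System.out.println(restricted.canRead("brown")); // false: not listed
           System.out.println(restricted.canRead(null));    // false: anonymous visitor
       }
   }

The explicit ``null`` check matters because an immutable set created with
``Set.of(...)`` throws a ``NullPointerException`` when asked whether it
contains ``null``.
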
Metacat is a `Java servlet`_ application that runs on Linux, Mac OS, and
Windows platforms in conjunction with a database, such as PostgreSQL_ (or
Oracle_), and a Web server. Metacat stores metadata in XML_ format using
`Ecological Metadata Language`_ (EML) or other metadata standards such as
`ISO 19139`_ or the `FGDC Biological Data Profile`_. For more information
about Metacat or for examples of projects currently using Metacat, please see
http://knb.ecoinformatics.org.

.. _XML: http://en.wikipedia.org/wiki/XML
.. _Java servlet: http://en.wikipedia.org/wiki/Java_Servlet
.. _PostgreSQL: http://www.postgresql.org/
.. _Oracle: http://www.oracle.com/
.. _Ecological Metadata Language: http://knb.ecoinformatics.org/software/eml
.. _ISO 19139: http://marinemetadata.org/references/iso19139
.. _FGDC Biological Data Profile: http://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/biometadata

What's in this Guide
--------------------

This Administrator's Guide includes information for installing, configuring,
managing, and extending Metacat on Linux, Mac OS, and Windows systems.
Chapter Two contains instructions for downloading and installing Metacat and the
applications required to run the software on Linux and Windows platforms.
Chapter Three covers how to configure Metacat, both for new and upgraded
installations. Chapter Four details the ways in which you can customize the
Metacat interface so users can access and submit information easily: using
Metacat's generic Web interface (the Registry), creating your own HTML forms,
and creating your own desktop client (like Morpho). Chapter Five discusses how
to work with Metacat's embedded GeoServer. Chapter Six describes how to set up
Metacat's replication service, which permits Metacat servers to share data with
each other, effectively backing up metadata and data files. Chapter Seven looks
at the Metacat Harvester, a program that automates the retrieval of EML
documents from one or more sites and their subsequent upload (insert or update)
to Metacat. Chapter Eight discusses logging, and Chapter Nine contains
instructions for creating a site map, which makes individual metadata entries
discoverable through Web search engines. Metacat's Java API is documented in an
appendix at the end of the guide.

Metacat Features
----------------

Metacat is a repository for data and metadata (documentation about data) that
helps scientists find, understand and effectively use the data sets they manage
or that have been created by others. Specifically:

* Metacat is an open source Web application, written in Java, that runs on Linux, Mac OS, and Windows operating systems
* Metacat's Web interface facilitates the input and retrieval of data
* Metacat's optional mapping functionality enables you to query and visualize the geographic coverage of stored data sets
* Metacat's replication feature allows all Metacat data and metadata to be stored safely on multiple Metacat servers
* The Metacat interface can be easily extended and customized via Web forms, skins, and/or user-developed client tools in Java and other languages
* The Metacat Harvester automates the process of retrieving and storing EML documents from one or more sites
* Metacat can be configured to use Life Science Identifiers (LSIDs), uniquely identifying every data record
* Metacat has a built-in logging system for tracking events such as document insertions, updates, deletions, and reads
* The appearance of Metacat's Web interface can be customized via skins
* Metacat fully supports the DataONE Member Node interface, allowing Metacat deployments to easily participate in the DataONE federation

.. figure:: images/screenshots/image007.png
   :align: center

   Metacat's default home page. Users can customize the appearance using skins.