Replication
===========

    
.. Note::

  Much of the functionality provided by the replication subsystem in Metacat
  has now been generalized and standardized by DataONE. Consider using the
  DataONE services for replication, as they offer a more general and
  standardized approach than this Metacat-specific replication system. The
  Metacat replication system will be supported for a while longer, but will
  likely be deprecated in a future release in favor of the DataONE
  replication approach.

    
Metacat has a built-in replication feature that allows different Metacat servers
to share data (both XML documents and data files) with each other. Metacat
can replicate not only its home server's original documents, but also those
that were replicated from partner Metacat servers. When changes are made to
one server in a replication network, the changes are automatically propagated
to the network, even if the network is down.

    
Replication allows users to manage their data locally and (by replicating them
to a shared Metacat repository) to make those data available to the greater
scientific community via a centralized search. In other words, your Metacat can
be part of a broader network, but you retain control over the local repository
and how it is managed.

    
For example, the KNB Network (Figure 6.1), which currently consists of ten
different Metacat servers from around the world, uses replication to "join"
the disparate servers to form a single robust and searchable data
repository--facilitating data discovery, while leaving the data ownership and
management with the local administrators.

    
.. figure:: images/screenshots/image059.jpg
   :align: center

   A map of the KNB Metacat network.

    
When properly configured, Metacat's replication mechanism can be triggered by
several types of events that occur on either the home or partner server: a
document insertion, an update, or an automatic replication (i.e., Delta-T
monitoring), which is set at a user-specified time interval.

    
+----------------------+----------------------------------------------------------+
| Replication Triggers | Description                                              |
+======================+==========================================================+
| Insert               | Whenever a document is inserted into Metacat, the server |
|                      | notifies each server in its replication list             |
|                      | that it has a new file available.                        |
+----------------------+----------------------------------------------------------+
| Update               | Whenever a document is updated, the server notifies      |
|                      | each server in its replication list of the update.       |
+----------------------+----------------------------------------------------------+
| Delta-T monitoring   | At a user-specified time interval, Metacat checks each   |
|                      | of the servers in its replication list                   |
|                      | for updated documents.                                   |
+----------------------+----------------------------------------------------------+

    
Configuring Replication
-----------------------
To configure replication, you must configure both the home and partner servers:

1. Create a list of partner servers on your home server using the Replication Control Panel
2. Create certificate files for the home server
3. Create certificate files for the partner server
4. Import partner certificate files to the home server
5. Import home certificate files to the partner server
6. Update your Metacat database

Each step is discussed in more detail in the following sections.

    
Using the Replication Control Panel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To add, remove, or alter servers on your home server's replication list, or to
activate and customize the Delta-T handler, use the Replication Control Panel,
which is accessed via the Metacat Administration interface at the following URL::

   http://somehost.somelocation.edu/context/admin

Replace ``http://somehost.somelocation.edu/context`` with the name
of your Metacat server and context (e.g., http://knb.ecoinformatics.org/knb/).
You must be logged in to Metacat as an administrator.

    
.. figure:: images/screenshots/image061.jpg
   :align: center

   Replication control panel.

    
Note that currently, you cannot use the Replication Control Panel to remove a
server after a replication has occurred. To stop replication between two servers,
update the flags that control whether metadata and/or data are replicated.
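
As a rough sketch of what updating those flags could look like in SQL, the
helper below prints an ``UPDATE`` statement that switches off both flags for
one partner. It assumes that the flags are the ``replicate`` and
``datareplicate`` columns of the ``xml_replication`` table used in the SQL
instructions later in this chapter; the function name and the example host are
hypothetical. Review the printed statement before running it against your
database.

::

   # Hypothetical helper: print SQL that stops replication to one partner.
   # Assumption: the flags are xml_replication.replicate and .datareplicate.
   stop_replication_sql() {
     partner="$1"   # partner server and context, e.g. partner.example.edu/knb
     printf "UPDATE xml_replication SET replicate = 0, datareplicate = 0 WHERE server = '%s/servlet/replication';\n" "$partner"
   }

   # Print the statement for an example partner.
   stop_replication_sql "partner.example.edu/knb"

The printed statement can be pasted into a ``psql`` session on the home server.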

    
Generating and Exchanging Security Certificates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before you can take advantage of Metacat's replication feature, you must
generate security certificates on both the replication partner and home servers.
Depending on how the certificates are generated, the certificates may need to be
exchanged so that each machine "trusts" that the other has replication access.
Certificates that are purchased from a commercial and well-recognized
Certificate Authority do not need to be exchanged with the other replication
partner before replication takes place.  Metacat replication relies on SSL with
client certificate authentication enabled.  When a replication partner server
communicates with another replication partner, it presents a certificate that
serves to verify and authenticate that the server is trusted.

If you must generate a self-signed certificate, the partner replication server
will need that public certificate added to its existing Certificate Authorities.

    
Generate Certificates for Metacat running under Apache/Tomcat
.............................................................
Note: Instructions are for Ubuntu/Debian systems.

    
1. Generate a private key and a certificate request (``REQ.pem``) using
   openssl. The key will be named ``<hostname>-apache.key``, where
   ``<hostname>`` is the name of your Metacat server. Example values for the
   individual key fields are included in the table below.

   ::

     openssl req -new -out REQ.pem -keyout <hostname>-apache.key

   +--------------------------+-------------------------------------------------------------------------+
   | Key Field                | Description and Example Value                                           |
   +==========================+=========================================================================+
   | Country Name             | Two letter country code  (e.g., US)                                     |
   +--------------------------+-------------------------------------------------------------------------+
   | State or Province Name   | The name of your state or province spelled in full (e.g., California)   |
   +--------------------------+-------------------------------------------------------------------------+
   | Locality Name            | The name of your city (e.g., Santa Barbara)                             |
   +--------------------------+-------------------------------------------------------------------------+
   | Organization Name        | The company or organization name (e.g., UCSB)                           |
   +--------------------------+-------------------------------------------------------------------------+
   | Organizational Unit Name | The department or section name (e.g., NCEAS)                            |
   +--------------------------+-------------------------------------------------------------------------+
   | Common Name              | The host server name without port numbers (e.g., myserver.mydomain.edu) |
   +--------------------------+-------------------------------------------------------------------------+
   | Email Address            | Administrator's contact email (e.g., administrator@mydomain.edu)        |
   +--------------------------+-------------------------------------------------------------------------+
   | A challenge password     | --leave this field blank--                                              |
   +--------------------------+-------------------------------------------------------------------------+
   | An optional company name | --leave this field blank--                                              |
   +--------------------------+-------------------------------------------------------------------------+

    
2. Create the local certificate file by running the command:

   ::

     openssl req -x509 -days 800 -in REQ.pem -key <hostname>-apache.key -out <hostname>-apache.crt

   Use the same ``<hostname>`` you used when you generated the key. A file named
   ``<hostname>-apache.crt`` will be created in the directory from which you
   ran the openssl command. Note: You can name the certificate file anything
   you'd like, but keep in mind that the file will be sent to the partner
   machine used for replication. The certificate name should have enough
   meaning that someone who sees it on that machine can figure out where it
   came from and for what purpose it should be used.

    
3. Register the certificate in the local Apache instance. It will be used
   to identify your server to a replication partner. Note that the
   security files may be in a different directory from the one used in these
   instructions, depending on how you installed Apache. Copy the certificate and
   key file using the following commands:

   ::

     sudo cp <hostname>-apache.crt /etc/ssl/certs
     sudo cp <hostname>-apache.key /etc/ssl/private

    
4. Apache needs to be configured to request a client certificate when the
   replication API is utilized. The helper file named "knb-ssl" has default
   rules that configure Apache for SSL and client certificate authentication.
   Set up these SSL settings by copying the knb-ssl file into the ``sites-available``
   directory, editing the settings that are specific to your system, and running
   ``a2ensite`` to enable the site.

   ::

     sudo cp <metacat_helper_dir>/knb-ssl <apache_install_dir>/sites-available
     sudo a2ensite knb-ssl

    
5. Enable the ssl module:

   ::

     sudo a2enmod ssl

6. Restart Apache to bring in changes by typing:

   ::

     sudo /etc/init.d/apache2 restart

    
7. If using a self-signed certificate, SCP ``<hostname>-apache.crt`` to the
   replication partner machine where it will be added as an additional
   Certificate Authority.
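
The interactive prompts in steps 1 and 2 can also be answered on the command
line. The following is a minimal sketch rather than the procedure above
verbatim: it uses OpenSSL's ``-subj``, ``-newkey`` and ``-nodes`` options to
create the key and a self-signed certificate in a single command, and it leaves
the key unencrypted (an assumption made here so that Apache could start
unattended). The host name and subject values are the example values from the
table in step 1, and the files are written to a scratch directory created with
``mktemp``.

::

   host="myserver.mydomain.edu"      # example value; use your server's name
   workdir=$(mktemp -d)              # scratch directory for this sketch
   key="$workdir/$host-apache.key"
   crt="$workdir/$host-apache.crt"

   # Create an unencrypted 2048-bit RSA key and a self-signed certificate
   # valid for 800 days, supplying the subject fields non-interactively.
   openssl req -x509 -newkey rsa:2048 -nodes -days 800 \
     -subj "/C=US/ST=California/L=Santa Barbara/O=UCSB/OU=NCEAS/CN=$host/emailAddress=administrator@mydomain.edu" \
     -keyout "$key" -out "$crt" 2>/dev/null

   # Print the subject and expiry date so the result can be checked
   # before the files are copied into place.
   openssl x509 -in "$crt" -noout -subject -enddate

Checking the subject and expiry date before copying the files makes it easier
to catch a mistyped Common Name, which should match the server's host name.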

    
If using self-signed certificates, after you have created and SCP'd a
certificate file to each replication partner, and received a certificate file
from each partner in return, both home and partner servers must add the
respective partner certificates as Certificate Authorities.

    
To import a certificate
.......................
1. Copy the partner's certificate file into the Apache certificate directory,
   where ``<remotehostfilename>`` is the name of the certificate file
   created on the remote partner machine and SCP'd to the home machine.

   ::

     sudo cp <remotehostfilename> /etc/ssl/certs/

2. Rehash the certificates for Apache by running:

   ::

     cd /etc/ssl/certs
     sudo c_rehash
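
When it rehashes a directory, ``c_rehash`` creates symbolic links named after
each certificate's subject hash (typically ``<hash>.0``). The sketch below is
offered as an illustration rather than as part of the required procedure: it
computes that hash, plus a SHA-256 fingerprint that the two administrators
could compare out of band. So that it runs on its own, it first creates a
throwaway self-signed certificate standing in for the partner's file; the file
names and host name are hypothetical.

::

   workdir=$(mktemp -d)
   partner_crt="$workdir/partner-apache.crt"   # stand-in for <remotehostfilename>

   # Throwaway certificate so that the sketch is self-contained.
   openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
     -subj "/CN=partner.example.edu" \
     -keyout "$workdir/partner-apache.key" -out "$partner_crt" 2>/dev/null

   # Subject hash: the name under which c_rehash links the certificate.
   cert_hash=$(openssl x509 -in "$partner_crt" -noout -hash)
   echo "expected link name: $cert_hash.0"

   # Fingerprint to compare with the value reported by the partner.
   openssl x509 -in "$partner_crt" -noout -fingerprint -sha256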

    
Update your Metacat database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The simplest way to update the Metacat database to use replication is to use
the Replication Control Panel. You can also update the database using SQL.
Instructions for both options are included in this section.

.. figure:: images/screenshots/image063.jpg
   :align: center

   Using the Replication Control Panel to update the Metacat database.

To update your Metacat database to use replication, select the "Add this server"
radio button from the Replication Control Panel, enter the partner server name,
and specify how the replication should occur (whether to replicate XML, data,
or use the local machine as a hub).

    
To update the database using SQL
................................

1. Log in to the database

   ::

     psql -U metacat -W -h localhost metacat

2. Select all rows from the replication table

   ::

     select * from xml_replication;

3. Insert the partner server.

   ::

     INSERT INTO xml_replication (server,last_checked,replicate,datareplicate,hub) VALUES ('<partner.server/context>/servlet/replication',NULL,1,1,0);

   Here ``<partner.server/context>`` is the name of the partner server and
   context. The values ``NULL,1,1,0`` indicate (respectively) the last time
   replication occurred, that XML docs should be replicated to the partner
   server, that data files should be replicated to the partner server, and
   that the local server should not act as a hub. Use the values ``NULL,0,0,0``
   if your Metacat is only receiving documents from the partner site and not
   replicating to that site.

4. Exit the database
5. Restart Apache and Tomcat on both home and partner replication machines
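
The statement in step 3 can also be generated by a small shell helper. The
following is a sketch under stated assumptions: the function name and the
example host are hypothetical, and the ``psql`` invocation in the trailing
comment reuses the connection options from step 1 together with ``psql``'s
``-c`` option. The helper only prints the SQL, so the statement can be reviewed
before it is run.

::

   # Hypothetical helper: print the INSERT for one partner server.
   #   $1  partner server and context (e.g. partner.example.edu/knb)
   #   $2  replicate flag      (1 = replicate XML docs to the partner)
   #   $3  datareplicate flag  (1 = replicate data files to the partner)
   #   $4  hub flag            (1 = local server acts as a hub)
   partner_insert_sql() {
     printf "INSERT INTO xml_replication (server,last_checked,replicate,datareplicate,hub) VALUES ('%s/servlet/replication',NULL,%d,%d,%d);\n" "$1" "$2" "$3" "$4"
   }

   # Replicate both XML and data to the partner, without acting as a hub.
   partner_insert_sql "partner.example.edu/knb" 1 1 0

   # To run the statement directly instead of pasting it into psql:
   #   psql -U metacat -W -h localhost metacat -c "$(partner_insert_sql partner.example.edu/knb 1 1 0)"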