Replication
===========

.. Note::

  Much of the functionality provided by the replication subsystem in Metacat
  has now been generalized and standardized by DataONE, so consider using the
  DataONE services for replication, which are a more general and standardized
  approach than this Metacat-specific replication system. The Metacat
  replication system will be supported for a while longer, but will likely be
  deprecated in a future release in favor of the DataONE replication approach.

Metacat has a built-in replication feature that allows different Metacat servers
to share data (both XML documents and data files) with each other. Metacat
can replicate not only its home server's original documents, but also those
that were replicated from partner Metacat servers. When changes are made to
one server in a replication network, the changes are automatically propagated
to the network, even if the network is down.

Replication allows users to manage their data locally and (by replicating them
to a shared Metacat repository) to make those data available to the greater
scientific community via a centralized search. In other words, your Metacat can
be part of a broader network, but you retain control over the local repository
and how it is managed.

For example, the KNB Network (Figure 6.1), which currently consists of ten
different Metacat servers from around the world, uses replication to "join"
the disparate servers to form a single robust and searchable data
repository--facilitating data discovery, while leaving the data ownership and
management with the local administrators.

.. figure:: images/screenshots/image059.jpg
   :align: center

   A map of the KNB Metacat network.

When properly configured, Metacat's replication mechanism can be triggered by
several types of events that occur on either the home or partner server: a
document insertion, an update, or an automatic replication (i.e., Delta-T
monitoring), which is set at a user-specified time interval.

+----------------------+----------------------------------------------------------+
| Replication Triggers | Description                                              |
+======================+==========================================================+
| Insert               | Whenever a document is inserted into Metacat, the server |
|                      | notifies each server in its replication list             |
|                      | that it has a new file available.                        |
+----------------------+----------------------------------------------------------+
| Update               | Whenever a document is updated, the server notifies      |
|                      | each server in its replication list of the update.       |
+----------------------+----------------------------------------------------------+
| Delta-T monitoring   | At a user-specified time interval, Metacat checks each   |
|                      | of the servers in its replication list                   |
|                      | for updated documents.                                   |
+----------------------+----------------------------------------------------------+

Configuring Replication
-----------------------
To configure replication, you must configure both the home and partner servers:

1. Create a list of partner servers on your home server using the Replication Control Panel
2. Create certificate files for the home server
3. Create certificate files for the partner server
4. Import partner certificate files to the home server
5. Import home certificate to the partner server
6. Update your Metacat database

Each step is discussed in more detail in the following sections.
69
70
Using the Replication Control Panel
71
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To add, remove, or alter servers on your home server's replication list, or to
activate and customize the Delta-T handler, use the Replication Control Panel,
which is accessed via the Metacat Administration interface at the following URL::

   http://somehost.somelocation.edu/context/admin

Replace "http://somehost.somelocation.edu/context" with the name
of your Metacat server and context (e.g., http://knb.ecoinformatics.org/knb/).
You must be logged in to Metacat as an administrator.

.. figure:: images/screenshots/image061.jpg
   :align: center

   Replication Control Panel.

Note that currently, you cannot use the Replication Control Panel to remove a
server after a replication has occurred. To stop replication between two servers,
update the flags that control whether metadata and/or data are replicated.
90 6845 jones
91
Generating and Exchanging Security Certificates
92
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before you can take advantage of Metacat's replication feature, you must
generate security certificates on both the replication partner and home servers.
Depending on how the certificates are generated, the certificates may need to be
exchanged so that each machine "trusts" that the other has replication access.
Certificates that are purchased from a commercial and well-recognized
Certificate Authority do not need to be exchanged with the other replication
partner before replication takes place. Metacat replication relies on SSL with
client certificate authentication enabled. When a replication partner server
communicates with another replication partner, it presents a certificate that
serves to verify and authenticate that the server is trusted.

If you must generate a self-signed certificate, the partner replication server
will need that public certificate (or the certificate of the signing CA) added
to its existing Certificate Authorities.

Generate Certificates for Metacat running under Apache/Tomcat
.............................................................
Note: These instructions are for Ubuntu/Debian systems.

1. Generate a private key and a certificate request using openssl. The key
   will be named ``<hostname>-apache.key``, where ``<hostname>`` is the name of
   your Metacat server, and the request is written to ``REQ.pem``. Example
   values for the individual key fields are included in the table below.

   ::

     openssl req -new -out REQ.pem -keyout <hostname>-apache.key

   +--------------------------+-------------------------------------------------------------------------+
   | Key Field                | Description and Example Value                                           |
   +==========================+=========================================================================+
   | Country Name             | Two letter country code  (e.g., US)                                     |
   +--------------------------+-------------------------------------------------------------------------+
   | State or Province Name   | The name of your state or province spelled in full (e.g., California)   |
   +--------------------------+-------------------------------------------------------------------------+
   | Locality Name            | The name of your city (e.g., Santa Barbara)                             |
   +--------------------------+-------------------------------------------------------------------------+
   | Organization Name        | The company or organization name (e.g., UCSB)                           |
   +--------------------------+-------------------------------------------------------------------------+
   | Organizational Unit Name | The department or section name (e.g., NCEAS)                            |
   +--------------------------+-------------------------------------------------------------------------+
   | Common Name              | The host server name without port numbers (e.g., myserver.mydomain.edu) |
   +--------------------------+-------------------------------------------------------------------------+
   | Email Address            | Administrator's contact email (e.g., administrator@mydomain.edu)        |
   +--------------------------+-------------------------------------------------------------------------+
   | A challenge password     | --leave this field blank--                                              |
   +--------------------------+-------------------------------------------------------------------------+
   | An optional company name | --leave this field blank--                                              |
   +--------------------------+-------------------------------------------------------------------------+

2. Create the local certificate file by running the command:

   ::

     openssl req -x509 -days 800 -in REQ.pem -key <hostname>-apache.key -out <hostname>-apache.crt

   Use the same ``<hostname>`` you used when you generated the key. A file named
   ``<hostname>-apache.crt`` will be created in the directory from which you
   ran the openssl command. Note: You can name the certificate file anything
   you'd like, but keep in mind that the file will be sent to the partner
   machine used for replication. The certificate name should have enough
   meaning that someone who sees it on that machine can figure out where it
   came from and for what purpose it should be used.

3. Register the certificate in the local Apache instance. It will be used to
   identify your server to a replication partner. Note that the security files
   may be in a different directory from the one used in these instructions,
   depending on how you installed Apache. Copy the certificate and key file
   using the following commands:

   ::

     sudo cp <hostname>-apache.crt /etc/ssl/certs
     sudo cp <hostname>-apache.key /etc/ssl/private

4. Apache needs to be configured to request a client certificate when the
   replication API is used. The helper file named "metacat-site-ssl" has default
   rules that configure Apache for SSL and client certificate authentication.
   Set up these SSL settings by copying the metacat-site-ssl file into the
   ``sites-available`` directory, editing the values that are specific to your
   system and Metacat deployment, and running ``a2ensite`` to enable the site.

   ::

     sudo cp <metacat_helper_dir>/metacat-site-ssl <apache_install_dir>/sites-available
     sudo a2ensite metacat-site-ssl

5. Enable the ssl module:

   ::

     sudo a2enmod ssl

6. Restart Apache to bring in changes by typing:

   ::

     sudo /etc/init.d/apache2 restart

7. If using a self-signed certificate, SCP ``<hostname>-apache.crt`` to the
   replication partner machine where it will be added as an additional
   Certificate Authority.
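
Steps 1 and 2 prompt for each key field interactively. As a minimal sketch,
assuming a hypothetical host name and the example field values from the table
above, the same key, request, and self-signed certificate can be produced
without prompts and then inspected. The ``-newkey``, ``-nodes``, and ``-subj``
options and the temporary working directory are illustrative additions, not
part of the procedure above:

::

  # Assumed values for illustration; replace with your own host and organization.
  HOST="myserver.mydomain.edu"
  SUBJECT="/C=US/ST=California/L=Santa Barbara/O=UCSB/OU=NCEAS/CN=${HOST}"

  # Work in a scratch directory so nothing under /etc/ssl is touched.
  WORKDIR="$(mktemp -d)"
  cd "$WORKDIR"

  # Step 1 equivalent: new private key and certificate request, no prompts.
  # -nodes leaves the key unencrypted; omit it to be asked for a passphrase.
  openssl req -new -newkey rsa:2048 -nodes -subj "$SUBJECT" \
    -out REQ.pem -keyout "${HOST}-apache.key"

  # Step 2 equivalent: self-sign the request, valid for 800 days.
  openssl req -x509 -days 800 -in REQ.pem -key "${HOST}-apache.key" \
    -out "${HOST}-apache.crt"

  # Inspect the subject and expiry date before installing or sending the file.
  openssl x509 -in "${HOST}-apache.crt" -noout -subject -enddate

If the key is left unencrypted as in this sketch, the private key password
setting in metacat.properties would presumably be left blank.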

If using self-signed certificates, after you have created and SCP'd a
certificate file to each replication partner, and received a certificate file
from each partner in return, both home and partner servers must add the
respective partner certificates as Certificate Authorities.
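
For orientation, the kind of Apache settings that step 4 refers to might look
like the fragment below. This is an illustrative sketch built from standard
mod_ssl directives, not a copy of the metacat-site-ssl helper file; the host
name, the ``/knb`` context, and the verification depth are assumptions, so
treat the helper file shipped with Metacat as the authority:

::

  <VirtualHost *:443>
      # Hypothetical host name; use your own server name.
      ServerName myserver.mydomain.edu

      SSLEngine on
      # Certificate and key copied in step 3.
      SSLCertificateFile    /etc/ssl/certs/myserver.mydomain.edu-apache.crt
      SSLCertificateKeyFile /etc/ssl/private/myserver.mydomain.edu-apache.key

      # Directory holding trusted CA and partner certificates (hashed form).
      SSLCACertificatePath  /etc/ssl/certs

      # Ask for a client certificate on the replication endpoint only.
      # The /knb context is an assumption; use your Metacat context.
      <Location /knb/servlet/replication>
          SSLVerifyClient optional
          SSLVerifyDepth  10
      </Location>
  </VirtualHost>

In a setup like this, the partner certificates added as Certificate
Authorities are the ones Apache consults through ``SSLCACertificatePath``.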

To import a certificate
.......................
1. Copy it into the Apache directory

   ::

     sudo cp <remotehostfilename> /etc/ssl/certs/

2. Rehash the certificates for Apache by running:

   ::

     cd /etc/ssl/certs
     sudo c_rehash

   where ``<remotehostfilename>`` is the name of the certificate file
   created on the remote partner machine and SCP'd to the home machine.
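
After rehashing, ``openssl verify`` can be used to check that the imported
certificate is found in the certificate directory. The helper below is a
sketch; the function name ``verify_partner_cert`` and the file name in the
usage comment are hypothetical:

::

  # verify_partner_cert CERT_DIR CERT_FILE
  # Checks CERT_FILE against the hashed certificates in CERT_DIR.
  # openssl reports "<file>: OK" and exits with status 0 on success.
  verify_partner_cert() {
    openssl verify -CApath "$1" "$2"
  }

  # Typical call on the home server (partner-apache.crt is a made-up name):
  #   verify_partner_cert /etc/ssl/certs /etc/ssl/certs/partner-apache.crt

A failure here usually means that the certificate was not copied into the
directory or that the rehash step has not been run since it was added.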

To import a certificate into Java keystore (for self-signed certificates)
.........................................................................
1. Use Java's keytool to import to the default Java keystore

   ::

     sudo keytool -import -alias <remotehostname_alias> -file <remotehostfilename> -keystore $JAVA_HOME/lib/security/cacerts

2. Restart Tomcat

   ::

     sudo /etc/init.d/tomcat6 restart

   where ``<remotehostfilename>`` is the name of the certificate file
   created on the remote partner machine and SCP'd to the home machine,
   ``<remotehostname_alias>`` is a short memorable alias for this certificate,
   and ``$JAVA_HOME`` is the same as configured for running Tomcat. NOTE: the
   cacerts path may be different depending on your exact Java installation.
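
To confirm that the import succeeded, the alias can be listed from the same
keystore. This is a sketch only: the function name ``list_partner_alias`` is
hypothetical, and the fallback store password ``changeit`` is the usual
default for a stock ``cacerts`` file, which may have been changed on your
system:

::

  # list_partner_alias ALIAS [KEYSTORE]
  # Prints the keystore entry stored under ALIAS, or fails if it is missing.
  # STOREPASS may be exported to override the assumed default password.
  list_partner_alias() {
    keytool -list -alias "$1" \
      -keystore "${2:-$JAVA_HOME/lib/security/cacerts}" \
      -storepass "${STOREPASS:-changeit}"
  }

  # Typical call (partner-knb is a made-up alias):
  #   list_partner_alias partner-knb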

Update Metacat properties
.........................
Metacat needs to be configured with the path to both the server certificate and the private key.

1. Edit metacat.properties, modifying these properties to match your specific deployment.

   ::

     replication.certificate.file=/etc/ssl/certs/<hostname>-apache.crt
     replication.privatekey.file=/etc/ssl/private/<hostname>-apache.key
     replication.privatekey.password=<password, or blank if not protected>
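
Before restarting, it can be worth confirming that the certificate and private
key named in these properties belong together. A minimal sketch, assuming both
files are PEM encoded; the function name ``cert_matches_key`` is hypothetical,
and a password-protected key will cause openssl to prompt for its passphrase:

::

  # cert_matches_key CERT_FILE KEY_FILE
  # Succeeds (exit status 0) when the public key embedded in the certificate
  # is the same as the public key derived from the private key.
  cert_matches_key() {
    [ "$(openssl x509 -in "$1" -noout -pubkey)" = "$(openssl pkey -in "$2" -pubout)" ]
  }

  # Typical call, using the paths from metacat.properties:
  #   cert_matches_key /etc/ssl/certs/myserver-apache.crt \
  #                    /etc/ssl/private/myserver-apache.key && echo "match"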
255
256
257 6845 jones
Update your Metacat database
258
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The simplest way to update the Metacat database to use replication is to use
the Replication Control Panel. You can also update the database using SQL.
Instructions for both options are included in this section.

.. figure:: images/screenshots/image063.jpg
   :align: center

   Using the Replication Control Panel to update the Metacat database.

To update your Metacat database to use replication, select the "Add this server"
radio button from the Replication Control Panel, enter the partner server name,
and specify how the replication should occur (whether to replicate XML, data,
or use the local machine as a hub).

To update the database using SQL
................................
1. Log in to the database

   ::

     psql -U metacat -W -h localhost metacat

2. Select all rows from the replication table

   ::

     SELECT * FROM xml_replication;

3. Insert the partner server.

   ::

     INSERT INTO xml_replication (server,last_checked,replicate,datareplicate,hub) VALUES ('<partner.server/context>/servlet/replication',NULL,1,1,0);

   where ``<partner.server/context>`` is the name of the partner server and
   context. The values 'NULL,1,1,0' indicate (respectively) the last time
   replication occurred, that XML docs should be replicated to the partner
   server, that data files should be replicated to the partner server, and
   that the local server should not act as a hub. Set a value of 'NULL,0,0,0'
   if your Metacat is only receiving documents from the partner site and not
   replicating to that site.

4. Exit the database
5. Restart Apache and Tomcat on both home and partner replication machines
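
The flags set in step 3 can also be changed later, for example to stop
replicating to a partner that can no longer be removed through the Replication
Control Panel. The following is a sketch only: the partner address is
hypothetical, and the statement assumes the ``replicate`` and ``datareplicate``
columns carry the meanings described in step 3:

::

  # Hypothetical partner entry; it must match the "server" value stored in
  # xml_replication exactly (check with the SELECT from step 2).
  PARTNER="partner.example.org/knb/servlet/replication"

  # Build the statement first so it can be reviewed before it is run.
  SQL="UPDATE xml_replication SET replicate = 0, datareplicate = 0 WHERE server = '${PARTNER}';"
  echo "$SQL"

  # To apply it, pipe the statement to psql with the options from step 1:
  #   echo "$SQL" | psql -U metacat -W -h localhost metacat

Printing the statement before running it makes it easier to confirm that the
``WHERE`` clause names only the intended partner.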