Replication
===========
Metacat has a built-in replication feature that allows different Metacat servers
to share data (both XML documents and data files) with each other. Metacat
can replicate not only its home server's original documents, but also those
that were replicated from partner Metacat servers. When changes are made to
one server in a replication network, the changes are automatically propagated
to the network, even if the network is down.

Replication allows users to manage their data locally and (by replicating them
to a shared Metacat repository) to make those data available to the greater
scientific community via a centralized search. In other words, your Metacat can
be part of a broader network, but you retain control over the local repository
and how it is managed.

For example, the KNB Network (Figure 6.1), which currently consists of ten
different Metacat servers from around the world, uses replication to "join"
the disparate servers to form a single robust and searchable data
repository. This facilitates data discovery while leaving data ownership and
management with the local administrators.

.. figure:: images/screenshots/image059.jpg
   :align: center

   A map of the KNB Metacat network.

When properly configured, Metacat's replication mechanism can be triggered by
several types of events that occur on either the home or partner server: a
document insertion, an update, or an automatic replication (i.e., Delta-T
monitoring), which is set at a user-specified time interval.

+----------------------+----------------------------------------------------------+
| Replication Triggers | Description                                              |
+======================+==========================================================+
| Insert               | Whenever a document is inserted into Metacat, the server |
|                      | notifies each server in its replication list             |
|                      | that it has a new file available.                        |
+----------------------+----------------------------------------------------------+
| Update               | Whenever a document is updated, the server notifies      |
|                      | each server in its replication list of the update.       |
+----------------------+----------------------------------------------------------+
| Delta-T monitoring   | At a user-specified time interval, Metacat checks each   |
|                      | of the servers in its replication list                   |
|                      | for updated documents.                                   |
+----------------------+----------------------------------------------------------+

Configuring Replication
-----------------------
To configure replication, you must configure both the home and partner servers:

1. Create a list of partner servers on your home server using the Replication Control Panel
2. Create certificate files for the home server
3. Create certificate files for the partner server
4. Import partner certificate files to the home server
5. Import home certificate to the partner server
6. Update your Metacat database

Each step is discussed in more detail in the following sections.

Using the Replication Control Panel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To add, remove, or alter servers on your home server's Replication list, or to
activate and customize the Delta-T handler, use the Replication Control Panel,
which is accessed via the Metacat Administration interface at the following URL::

   http://somehost.somelocation.edu/context/admin

Replace ``http://somehost.somelocation.edu/context`` with the name
of your Metacat server and context (e.g., http://knb.ecoinformatics.org/knb/).
You must be logged in to Metacat as an administrator.

.. figure:: images/screenshots/image061.jpg
   :align: center

   Replication Control Panel.

Note that you currently cannot use the Replication Control Panel to remove a
server after a replication has occurred. At that point, the only way to
remove a replication server is to remove the certificates.

Generating and Exchanging Security Certificates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before you can take advantage of Metacat's replication feature, you must
generate security certificates on both the replication partner and home servers.
Depending on how the certificates are generated, they may need to be
exchanged so that each machine "trusts" that the other has replication access.
Certificates that are purchased from a commercial and well-recognized
Certificate Authority do not need to be exchanged with the other replication
partner before replication takes place. Metacat replication relies on SSL with
client certificate authentication enabled. When a replication partner server
communicates with another replication partner, it presents a certificate that
serves to verify and authenticate that the server is trusted.

If you must generate a self-signed certificate, the partner replication server
will need the public certificate added to its existing Certificate Authorities.

Generate Certificates for Metacat running under Apache/Tomcat
.............................................................
Note: Instructions are for Ubuntu/Debian systems.

1. Generate a private key and a certificate request (``REQ.pem``) using
   openssl. The key will be named
   ``<hostname>-apache.key``, where ``<hostname>`` is the name of your Metacat
   server. Example values for the individual key fields are included in the
   table below.

   ::

     openssl req -new -out REQ.pem -keyout <hostname>-apache.key

   +--------------------------+-------------------------------------------------------------------------+
   | Key Field                | Description and Example Value                                           |
   +==========================+=========================================================================+
   | Country Name             | Two letter country code  (e.g., US)                                     |
   +--------------------------+-------------------------------------------------------------------------+
   | State or Province Name   | The name of your state or province spelled in full (e.g., California)   |
   +--------------------------+-------------------------------------------------------------------------+
   | Locality Name            | The name of your city (e.g., Santa Barbara)                             |
   +--------------------------+-------------------------------------------------------------------------+
   | Organization Name        | The company or organization name (e.g., UCSB)                           |
   +--------------------------+-------------------------------------------------------------------------+
   | Organizational Unit Name | The department or section name (e.g., NCEAS)                            |
   +--------------------------+-------------------------------------------------------------------------+
   | Common Name              | The host server name without port numbers (e.g., myserver.mydomain.edu) |
   +--------------------------+-------------------------------------------------------------------------+
   | Email Address            | Administrator's contact email (e.g., administrator@mydomain.edu)        |
   +--------------------------+-------------------------------------------------------------------------+
   | A challenge password     | --leave this field blank--                                              |
   +--------------------------+-------------------------------------------------------------------------+
   | An optional company name | --leave this field blank--                                              |
   +--------------------------+-------------------------------------------------------------------------+

2. Create the local certificate file by running the command:

   ::

     openssl req -x509 -days 800 -in REQ.pem -key <hostname>-apache.key -out <hostname>-apache.crt

   Use the same ``<hostname>`` you used when you generated the key. A file named
   ``<hostname>-apache.crt`` will be created in the directory from which you
   ran the openssl command. Note: You can name the certificate file anything
   you'd like, but keep in mind that the file will be sent to the partner
   machine used for replication. The certificate name should have enough
   meaning that someone who sees it on that machine can figure out where it
   came from and for what purpose it should be used.

3. Register the certificate in the security configuration of the local Apache
   instance. It will be used to identify your server to a replication
   partner. Note that the
   security files may be in a different directory from the one used in the
   instructions depending on how you installed Apache. Copy the certificate and
   key file using the following commands:

   ::

     sudo cp <hostname>-apache.crt /etc/ssl/certs
     sudo cp <hostname>-apache.key /etc/ssl/private

4. Apache needs to be configured to request a "client certificate" when the
   replication API is utilized. The helper file named "knb-ssl" has default
   rules that configure Apache for SSL and client certificate authentication.
   Set up these SSL settings by copying the knb-ssl file into the ``sites-available``
   directory, editing its values to match the specifics of your system, and
   running ``a2ensite`` to enable the site.

   ::

     sudo cp <metacat_helper_dir>/knb-ssl <apache_install_dir>/sites-available
     sudo a2ensite knb-ssl

5. Restart Apache to apply the changes:

   ::

     sudo /etc/init.d/apache2 restart

6. If using a self-signed certificate, copy ``<hostname>-apache.crt`` (for
   example, with SCP) to the
   replication partner machine where it will be added as an additional
   Certificate Authority.

If using self-signed certificates, after you have created and copied a
certificate file to each replication partner, and received a certificate file
from each partner in return, both home and partner servers must add the
respective partner certificates as Certificate Authorities.
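
As a rough, non-interactive sketch of steps 1 and 2 above, the following script
creates a key, a request, and a self-signed certificate in a scratch directory,
then prints the certificate's subject and expiry date so it can be checked
before it is sent to a partner. The ``-newkey``, ``-nodes``, and ``-subj``
options, the ``myserver`` file prefix, and the subject values are assumptions
made for illustration and are not part of the procedure above. In particular,
``-nodes`` leaves the private key unencrypted, which may not suit your site.

::

   #!/bin/sh
   # Illustrative only: the host name and subject fields are placeholder values.
   host=myserver                 # hypothetical file prefix
   workdir=$(mktemp -d)          # scratch directory, not /etc/ssl
   cd "$workdir" || exit 1

   # Step 1: private key plus certificate request, without prompts.
   # -nodes writes the key unencrypted (an assumption of this sketch).
   openssl req -new -newkey rsa:2048 -nodes \
     -subj "/C=US/ST=California/L=Santa Barbara/O=UCSB/OU=NCEAS/CN=myserver.mydomain.edu" \
     -out REQ.pem -keyout "${host}-apache.key"

   # Step 2: self-signed certificate valid for 800 days.
   openssl req -x509 -days 800 -in REQ.pem \
     -key "${host}-apache.key" -out "${host}-apache.crt"

   # Inspect the result before distributing it.
   openssl x509 -in "${host}-apache.crt" -noout -subject -enddate

The subject line printed by the last command should contain the Common Name
given in ``-subj``; its exact formatting varies between OpenSSL versions.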

To import a certificate
.......................
1. Copy it into the Apache certificate directory

   ::

     sudo cp <remotehostfilename> /etc/ssl/certs/

   where ``<remotehostfilename>`` is the name of the certificate file
   created on the remote partner machine and copied (via SCP) to the home machine.

2. Rehash the certificates for Apache by running:

   ::

     cd /etc/ssl/certs
     sudo c_rehash
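
One way to confirm that an imported certificate is trusted is to verify it
against the rehashed directory. The sketch below is an illustration under
stated assumptions rather than part of the Metacat procedure: it builds a
throwaway self-signed certificate, uses a scratch directory that stands in for
``/etc/ssl/certs``, and calls ``openssl rehash`` (available in OpenSSL 1.1.0
and later) in place of ``c_rehash``. All file and host names are hypothetical.

::

   #!/bin/sh
   scratch=$(mktemp -d)                        # stands in for /etc/ssl/certs
   partner_crt="$scratch/partner-apache.crt"   # hypothetical partner certificate

   # Create a throwaway self-signed certificate to play the partner's role.
   openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
     -subj "/CN=partner.example.org" \
     -keyout "$scratch/partner-apache.key" -out "$partner_crt" 2>/dev/null

   # Rebuild the hash links, as c_rehash does for /etc/ssl/certs.
   openssl rehash "$scratch"

   # Verify the certificate against the directory of trusted certificates.
   verify_output=$(openssl verify -CApath "$scratch" "$partner_crt")
   echo "$verify_output"

On a real server the equivalent check would be
``openssl verify -CApath /etc/ssl/certs /etc/ssl/certs/<remotehostfilename>``,
which reports ``OK`` for the file when the certificate is trusted.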

Update your Metacat database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The simplest way to update the Metacat database to use replication is to use
the Replication Control Panel. You can also update the database
using SQL. Instructions for both options are included in this section.

.. figure:: images/screenshots/image063.jpg
   :align: center

   Using the Replication Control Panel to update the Metacat database.

To update your Metacat database to use replication, select the "Add this server"
radio button from the Replication Control Panel, enter the partner server name,
and specify how the replication should occur (whether to replicate XML, data,
or use the local machine as a hub).

To update the database using SQL
................................

1. Log in to the database

   ::

     psql -U metacat -W -h localhost metacat

2. Select all rows from the replication table

   ::

     SELECT * FROM xml_replication;

3. Insert the partner server.

   ::

     INSERT INTO xml_replication (server,last_checked,replicate,datareplicate,hub) VALUES ('<partner.server/context>/servlet/replication',NULL,1,1,0);

   Where ``<partner.server/context>`` is the name of the partner server and
   context. The values ``NULL,1,1,0`` indicate (respectively) the last time
   replication occurred, that XML docs should be replicated to the partner
   server, that data files should be replicated to the partner server, and
   that the local server should not act as a hub. Use the values ``NULL,0,0,0``
   if your Metacat is only receiving documents from the partner site and not
   replicating to that site.

4. Exit the database (type ``\q``)
5. Restart Apache and Tomcat on both home and partner replication machines
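
The two variants of the INSERT statement described in step 3 can be produced
with a small helper. This is a hypothetical convenience sketch, not a Metacat
tool: the function name ``replication_insert_sql``, its argument order, and the
example host ``partner.example.org/knb`` are invented here. The function only
prints SQL, so the statement can be reviewed before it is run.

::

   #!/bin/sh
   # Print an INSERT statement for the xml_replication table.
   # usage: replication_insert_sql <partner.server/context> <replicate> <datareplicate> <hub>
   # Note: the server name is inserted as-is, with no SQL escaping.
   replication_insert_sql() {
     printf "INSERT INTO xml_replication (server,last_checked,replicate,datareplicate,hub) VALUES ('%s/servlet/replication',NULL,%d,%d,%d);\n" \
       "$1" "$2" "$3" "$4"
   }

   # Replicate XML and data files to the partner; the local server is not a hub.
   replication_insert_sql "partner.example.org/knb" 1 1 0

   # Receive-only: nothing is replicated to the partner.
   replication_insert_sql "partner.example.org/knb" 0 0 0

To apply a statement, its output could be piped to the login command from
step 1, for example
``replication_insert_sql partner.example.org/knb 1 1 0 | psql -U metacat -W -h localhost metacat``.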