Replication
===========
Metacat has a built-in replication feature that allows different Metacat servers
to share data (both XML documents and data files) with one another. Metacat
can replicate not only its home server's original documents, but also those
that were replicated from partner Metacat servers. When changes are made to
one server in a replication network, the changes are automatically propagated
to the rest of the network. A server that is unreachable when a change is made
can still pick it up later, for example through Delta-T monitoring (described
below).

    
Replication allows users to manage their data locally while making those data
available to the broader scientific community through a centralized search, by
replicating them to a shared Metacat repository. In other words, your Metacat
can be part of a broader network, but you retain control over the local
repository and how it is managed.

    
For example, the KNB Network (Figure 6.1), which currently consists of ten
different Metacat servers from around the world, uses replication to "join"
the disparate servers into a single robust and searchable data repository.
This facilitates data discovery while leaving data ownership and management
with the local administrators.

    
.. figure:: images/screenshots/image059.jpg
   :align: center

   A map of the KNB Metacat network.

    
When properly configured, Metacat's replication mechanism can be triggered in
three ways on either the home or the partner server: by a document insertion,
by a document update, or by Delta-T monitoring, an automatic check that runs at
a user-specified time interval.

    
+----------------------+----------------------------------------------------------+
| Replication Triggers | Description                                              |
+======================+==========================================================+
| Insert               | Whenever a document is inserted into Metacat, the server |
|                      | notifies each server in its replication list             |
|                      | that it has a new file available.                        |
+----------------------+----------------------------------------------------------+
| Update               | Whenever a document is updated, the server notifies      |
|                      | each server in its replication list of the update.       |
+----------------------+----------------------------------------------------------+
| Delta-T monitoring   | At a user-specified time interval, Metacat checks each   |
|                      | of the servers in its replication list                   |
|                      | for updated documents.                                   |
+----------------------+----------------------------------------------------------+

    
Configuring Replication
-----------------------
To configure replication, you must configure both the home and partner servers:

    
1. Create a list of partner servers on your home server using the Replication Control Panel.
2. Create certificate files for the home server.
3. Create certificate files for the partner server.
4. Import the partner certificate files to the home server.
5. Import the home certificate to the partner server.
6. Update your Metacat database.

    
Each step is discussed in more detail in the following sections.

    
Using the Replication Control Panel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To add, remove, or alter servers on your home server's replication list, or to
activate and customize the Delta-T handler, use the Replication Control Panel,
which is accessed via the Metacat Administration interface at the following URL::

   http://somehost.somelocation.edu/context/admin

Replace ``http://somehost.somelocation.edu/context`` with the name of your
Metacat server and context (e.g., http://knb.ecoinformatics.org/knb/).
You must be logged in to Metacat as an administrator.

    
.. figure:: images/screenshots/image061.jpg
   :align: center

   Replication control panel.

    
Note that you currently cannot use the Replication Control Panel to remove a
server after replication has occurred. At that point, the only way to remove a
replication server is to remove its certificates.

    
Generating and Exchanging Security Certificates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before you can take advantage of Metacat's replication feature, you must
generate security certificates on both the home and partner servers. Metacat
replication relies on SSL with client certificate authentication enabled: when
one replication partner communicates with another, it presents a certificate
that verifies and authenticates it as a trusted server. Depending on how the
certificates are generated, they may need to be exchanged so that each machine
"trusts" that the other has replication access. Certificates purchased from a
commercial, well-recognized Certificate Authority do not need to be exchanged
with the other replication partner before replication takes place.

    
If you must generate a self-signed certificate, the partner replication server
will need to add your public certificate to its existing Certificate Authorities.

    
Generate Certificates for Metacat running under Apache/Tomcat
.............................................................
Note: These instructions are for Ubuntu/Debian systems.

    
1. Generate a private key and a certificate signing request using openssl. The
   key will be named ``<hostname>-apache.key``, where ``<hostname>`` is the
   name of your Metacat server, and the request will be written to
   ``REQ.pem``. Example values for the individual key fields are included in
   the table below.

   ::

     openssl req -new -out REQ.pem -keyout <hostname>-apache.key

   +--------------------------+-------------------------------------------------------------------------+
   | Key Field                | Description and Example Value                                           |
   +==========================+=========================================================================+
   | Country Name             | Two letter country code  (e.g., US)                                     |
   +--------------------------+-------------------------------------------------------------------------+
   | State or Province Name   | The name of your state or province spelled in full (e.g., California)   |
   +--------------------------+-------------------------------------------------------------------------+
   | Locality Name            | The name of your city (e.g., Santa Barbara)                             |
   +--------------------------+-------------------------------------------------------------------------+
   | Organization Name        | The company or organization name (e.g., UCSB)                           |
   +--------------------------+-------------------------------------------------------------------------+
   | Organizational Unit Name | The department or section name (e.g., NCEAS)                            |
   +--------------------------+-------------------------------------------------------------------------+
   | Common Name              | The host server name without port numbers (e.g., myserver.mydomain.edu) |
   +--------------------------+-------------------------------------------------------------------------+
   | Email Address            | Administrator's contact email (e.g., administrator@mydomain.edu)        |
   +--------------------------+-------------------------------------------------------------------------+
   | A challenge password     | --leave this field blank--                                              |
   +--------------------------+-------------------------------------------------------------------------+
   | An optional company name | --leave this field blank--                                              |
   +--------------------------+-------------------------------------------------------------------------+

    
2. Create the local (self-signed) certificate file by running the command:

   ::

     openssl req -x509 -days 800 -in REQ.pem -key <hostname>-apache.key -out <hostname>-apache.crt

   Use the same ``<hostname>`` you used when you generated the key. A file named
   ``<hostname>-apache.crt``, valid for 800 days, will be created in the
   directory from which you ran the openssl command. Note: You can name the
   certificate file anything you like, but keep in mind that the file will be
   sent to the partner machine used for replication. The certificate name
   should be meaningful enough that someone who sees it on that machine can
   tell where it came from and what it is used for.

    
3. Enter the certificate into Apache's security configuration. This certificate
   identifies your server to a replication partner, so you must register it in
   the local Apache instance. Depending on how you installed Apache, the
   security files may be in a different directory from the one used in these
   instructions. Copy the certificate and key file using the following
   commands:

   ::

     sudo cp <hostname>-apache.crt /etc/ssl/certs
     sudo cp <hostname>-apache.key /etc/ssl/private

    
4. Apache needs to be configured to request a "client certificate" when the
   replication API is used. The helper file named ``knb-ssl`` contains default
   rules that configure Apache for SSL and client certificate authentication.
   To set up these SSL settings, copy the ``knb-ssl`` file into the
   ``sites-available`` directory, edit the pertinent values to match your
   system (some settings in ``knb-ssl`` must be changed to match the specifics
   of your system), and run ``a2ensite`` to enable the site.

   ::

     sudo cp <metacat_helper_dir>/knb-ssl <apache_install_dir>/sites-available
     sudo a2ensite knb-ssl

    
5. Restart Apache to bring in the changes:

   ::

     sudo /etc/init.d/apache2 restart

    
6. If you are using a self-signed certificate, copy ``<hostname>-apache.crt``
   to the replication partner machine (e.g., with ``scp``), where it will be
   added as an additional Certificate Authority.
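Steps 1 and 2 can also be run without the interactive prompts. The following is a minimal sketch, not part of Metacat: the function name ``make_replication_cert`` is hypothetical, the ``-subj`` values are placeholders copied from the example column of the key-field table, and it adds two options that the original commands do not use. ``-newkey rsa:2048`` fixes the key type and size, and ``-nodes`` leaves the private key unencrypted so that Apache can start without asking for a pass phrase. An unencrypted key should be readable only by root.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Metacat): non-interactive version of
# steps 1 and 2. Usage: make_replication_cert <hostname> [outdir] [days]
make_replication_cert() {
  local host="$1" outdir="${2:-.}" days="${3:-800}"
  # Placeholder subject fields; replace them with your own values.
  local subj="/C=US/ST=California/L=Santa Barbara/O=UCSB/OU=NCEAS/CN=${host}"
  local key="${outdir}/${host}-apache.key"
  local req="${outdir}/REQ.pem"
  local crt="${outdir}/${host}-apache.crt"

  if [ -z "$host" ]; then
    echo "usage: make_replication_cert <hostname> [outdir] [days]" >&2
    return 2
  fi

  # Step 1: new RSA key plus certificate signing request.
  # -nodes (an assumption, not in the original command) skips key encryption.
  openssl req -new -newkey rsa:2048 -nodes -subj "$subj" \
    -keyout "$key" -out "$req" || return 1
  chmod 600 "$key"

  # Step 2: self-sign the request, exactly as in the original step 2 command.
  openssl req -x509 -days "$days" -in "$req" -key "$key" -out "$crt" || return 1

  # Print the identity and expiry date so the Common Name can be checked.
  openssl x509 -in "$crt" -noout -subject -enddate
}
```

For example, ``make_replication_cert myserver.mydomain.edu`` writes ``REQ.pem``, ``myserver.mydomain.edu-apache.key``, and ``myserver.mydomain.edu-apache.crt`` to the current directory, which can then be copied as described in step 3.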

    
If you are using self-signed certificates, then after you have created a
certificate file, copied it to each replication partner, and received a
certificate file from each partner in return, both the home and partner
servers must add the respective partner certificates as Certificate
Authorities, as described below.
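Once both sides have imported each other's certificates, it can help to test the TLS handshake from the command line before relying on replication. The sketch below uses ``openssl s_client`` to connect to a partner while presenting your own certificate and key. The function name ``check_partner_tls``, the default port 443, and the ``/etc/ssl/certs`` CA directory are assumptions about your setup. The ``Verify return code: 0 (ok)`` line it looks for only tells you that *your* side trusts the partner's certificate. Whether the partner accepted your client certificate may only show up in the partner's Apache error log, because with TLS 1.3 a rejected client certificate can be reported after the handshake has completed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TLS handshake with the partner completes
# and the partner's certificate verifies against the local CA directory.
# Usage: check_partner_tls <partner-host> <your-cert> <your-key> [port] [ca-dir]
check_partner_tls() {
  local partner="$1" cert="$2" key="$3" port="${4:-443}" cadir="${5:-/etc/ssl/certs}"
  local out

  # Present this server's certificate (-cert/-key) as the client certificate
  # and trust the hashed CA directory (-CApath). </dev/null closes the session
  # right after the handshake; timeout guards against a partner that never answers.
  out="$(timeout 15 openssl s_client -connect "${partner}:${port}" \
           -servername "$partner" -cert "$cert" -key "$key" \
           -CApath "$cadir" </dev/null 2>&1)"

  # s_client reports the result of verifying the partner's certificate chain.
  printf '%s\n' "$out" | grep -q 'Verify return code: 0 (ok)'
}
```

For example, on the home server (as root, since the key under ``/etc/ssl/private`` is normally readable only by root), ``check_partner_tls partner.example.org /etc/ssl/certs/myserver.mydomain.edu-apache.crt /etc/ssl/private/myserver.mydomain.edu-apache.key && echo trusted`` checks the connection to a hypothetical partner host.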

    
To import a certificate
.......................
1. Copy the partner's certificate into the certificate directory used by Apache:

   ::

     sudo cp <remotehostfilename> /etc/ssl/certs/

   where ``<remotehostfilename>`` is the name of the certificate file created
   on the remote partner machine and copied to the home machine.

2. Rehash the certificates for Apache by running:

   ::

     cd /etc/ssl/certs
     sudo c_rehash
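``c_rehash`` makes each certificate in the directory findable by OpenSSL through a symbolic link named after the certificate's subject hash (``<hash>.0``, ``<hash>.1``, and so on). The sketch below checks that an imported partner certificate actually received such a link. The function name ``ca_is_hashed`` is hypothetical, and it assumes that your Apache configuration looks up trusted certificates in ``/etc/ssl/certs``. If you are not sure, check the ``SSLCACertificatePath`` (or ``SSLCACertificateFile``) setting in your ``knb-ssl`` site.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if <cert> is reachable in <dir> through a
# subject-hash link (<hash>.0, <hash>.1, ...) such as those c_rehash creates.
# Usage: ca_is_hashed <cert-file> [cert-dir]
ca_is_hashed() {
  local cert="$1" dir="${2:-/etc/ssl/certs}" hash link
  # Subject hash that OpenSSL uses to look the certificate up.
  hash="$(openssl x509 -hash -noout -in "$cert" 2>/dev/null)" || return 1
  [ -n "$hash" ] || return 1
  for link in "$dir/$hash".*; do
    # If nothing matches, the glob stays literal and the file does not exist.
    [ -e "$link" ] || continue
    # Several certificates can share a hash, so compare contents
    # (cmp follows the symlink to its target file).
    cmp -s "$link" "$cert" && return 0
  done
  return 1
}
```

For example, ``ca_is_hashed /etc/ssl/certs/<remotehostfilename> && echo "partner certificate is hashed"`` should succeed after step 2 and fail if the rehash was skipped.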

    
Update your Metacat database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The simplest way to update the Metacat database to use replication is to use
the Replication Control Panel. You can also update the database using SQL.
Instructions for both options are included in this section.

    
.. figure:: images/screenshots/image063.jpg
   :align: center

   Using the Replication Control Panel to update the Metacat database.

    
To update your Metacat database to use replication, select the "Add this server"
radio button in the Replication Control Panel, enter the partner server name,
and specify how replication should occur: whether to replicate XML documents,
whether to replicate data files, and whether to use the local machine as a hub.

    
To update the database using SQL
................................

    
1. Log in to the database:

   ::

     psql -U metacat -W -h localhost metacat

2. Select all rows from the replication table:

   ::

     select * from xml_replication;

3. Insert the partner server:

   ::

     INSERT INTO xml_replication (server,last_checked,replicate,datareplicate,hub) VALUES ('<partner.server/context>/servlet/replication',NULL,1,1,0);

   where ``<partner.server/context>`` is the name of the partner server and
   context. The values ``NULL,1,1,0`` indicate (respectively) the last time
   replication occurred, that XML documents should be replicated to the
   partner server, that data files should be replicated to the partner server,
   and that the local server should not act as a hub. Use the values
   ``NULL,0,0,0`` if your Metacat only receives documents from the partner
   site and does not replicate to that site.

4. Exit the database (for example, by typing ``\q`` in ``psql``).
5. Restart Apache and Tomcat on both the home and partner replication machines.
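Step 3 can also be scripted so that the flag values are checked before anything reaches the database. This is a minimal sketch under stated assumptions. The function name ``replication_insert_sql`` is hypothetical, and it only builds the same ``INSERT`` statement shown in step 3, with ``last_checked`` left as ``NULL``. It rejects flag values other than ``0`` or ``1``, as well as partner names containing a single quote, which would otherwise break the SQL string literal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the xml_replication INSERT from step 3.
# Usage: replication_insert_sql <partner.server/context> [replicate] [datareplicate] [hub]
# Flags default to 1 1 0, matching the example in step 3.
replication_insert_sql() {
  local partner="$1" xml="${2:-1}" data="${3:-1}" hub="${4:-0}" flag

  if [ -z "$partner" ]; then
    echo "usage: replication_insert_sql <partner.server/context> [xml] [data] [hub]" >&2
    return 2
  fi
  # A single quote would terminate the SQL string literal early.
  case "$partner" in
    *"'"*) echo "partner name must not contain a single quote" >&2; return 1 ;;
  esac
  # replicate, datareplicate and hub are 0/1 switches.
  for flag in "$xml" "$data" "$hub"; do
    case "$flag" in
      0|1) ;;
      *) echo "flag must be 0 or 1, got: $flag" >&2; return 1 ;;
    esac
  done

  printf "INSERT INTO xml_replication (server,last_checked,replicate,datareplicate,hub) VALUES ('%s/servlet/replication',NULL,%s,%s,%s);\n" \
    "$partner" "$xml" "$data" "$hub"
}
```

For a hypothetical partner ``partner.example.org/knb`` that this server only receives documents from, ``psql -U metacat -W -h localhost metacat -c "$(replication_insert_sql partner.example.org/knb 0 0 0)"`` runs the generated statement with the same connection options as step 1.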