<!--
  * replication.html
  *
  *      Authors: Chad Berkley
  *    Copyright: 2000 Regents of the University of California and the
  *               National Center for Ecological Analysis and Synthesis
  *  For Details: http://www.nceas.ucsb.edu/
  *      Created: 2001 January 23
  *      Version: 
  *    File Info: '$ '
  * 
  * 
-->
<HTML>
<HEAD>
<TITLE>Metacat Replication</TITLE>
<link rel="stylesheet" type="text/css" href="./common.css">
<link rel="stylesheet" type="text/css" href="./default.css">
</HEAD> 
<BODY>
  <table width="100%">
    <tr>
      <td class="tablehead" colspan="2"><p class="label">Replication</p></td>
      <td class="tablehead" colspan="2" align="right">
        <a href="./packages.html">Back</a> | <a href="./metacattour.html">Home</a> | 
        <a href="./datafiles.html">Next</a>
      </td>
    </tr>
  </table>
  
  <div class="header1">Table of Contents</div>
  <div class="toc1"><a href="#Intro">Metacat Replication</a></div>
    <div class="toc2"><a href="#Overview">Overview</a></div>
    <div class="toc2"><a href="#DatabasedInfo">Databased Information</a></div>
    <div class="toc2"><a href="#Example">Example</a></div>
      <div class="toc3"><a href="#gamma">What happens with gamma?</a></div>
      <div class="toc3"><a href="#alpha">What happens with alpha?</a></div>
      <div class="toc3"><a href="#lamda">What happens with lamda?</a></div>
  <div class="toc1"><a href="#ControlPanel">The Replication Control Panel</a></div>
  <div class="toc1"><a href="#Certificates">Certificates</a></div>
    <div class="toc2"><a href="#GenerateCertificates">Generate Certificates on both the replication client and server</a></div> 
      <div class="toc3"><a href="#GenerateCertTomcat">Generate Certificate for Tomcat standalone (no Apache)</a></div>
      <div class="toc3"><a href="#GenerateCertApache">Generate Certificate for Apache/Tomcat</a></div>
    <div class="toc2"><a href="#RegisterPartner">Register the partner machine's certificate</a></div> 
  
  <a name="Intro"></a><div class="header1">Metacat Replication</div>
  <a name="Overview"></a><div class="header2">Overview</div>
  <p>Metacat has built-in replication to allow different Metacat servers to 
  share data between themselves. Metacat replicates not only XML documents but 
  also data files. </p>
  
  <p>Metacat's hub feature allows it to replicate not only its own server's original
  documents, but also those that were replicated from other servers.  This functionality
  allows for a more complex chaining replication structure.</p>
  
  <p>The replication scheme that Metacat uses is both push and pull.  There are 
  several triggers that can start a replication mechanism: </p>
  <ul class="list1">
    <li><b>Delta-T monitoring</b> - At a set time interval, a server checks each of the
    other servers in its list for updated documents.</li>
    <li><b>INSERT trigger</b> - Whenever a document is inserted, the server notifies
    the remote hosts in its list that it has a new file available.</li>
    <li><b>UPDATE trigger</b> - Whenever a document is updated, the server notifies
    each server in its list of the update.</li>
    <li><b>File locking</b> - When a local user tries to alter a document on a local 
    server that belongs to a remote server, the local server must first
    obtain a lock on that file.  Once the lock is obtained, the file can 
    be updated, and it is then force replicated out to each server in the list.
    The lock ensures that the remote copy is up to date and that an older
    file does not overwrite a newer one.  Only a document's home server
    can give a lock for that file to be altered.</li>
  </ul>
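  <p>The file-locking rule can be sketched in a few lines of Python.  This is an
  illustrative model only, not Metacat's implementation: the <b>HomeServer</b> class,
  its method names and the use of integer revision numbers are all assumptions made
  for the example.  It shows the two guarantees described in the list: only the home
  server grants the lock, and a stale copy is never allowed to overwrite a newer one.</p>

```python
from dataclasses import dataclass, field


@dataclass
class HomeServer:
    """Hypothetical model of a document's home server, the only lock issuer."""
    name: str
    revisions: dict = field(default_factory=dict)  # docid -> latest revision (assumed integer)
    locks: dict = field(default_factory=dict)      # docid -> server currently holding the lock

    def request_lock(self, docid: str, requester: str, requester_rev: int) -> bool:
        # Refuse if a different server already holds the lock.
        if self.locks.get(docid) not in (None, requester):
            return False
        # Refuse if the requester's copy is older than the home copy,
        # so an older file can never overwrite a newer one.
        if requester_rev < self.revisions.get(docid, 0):
            return False
        self.locks[docid] = requester
        return True

    def accept_update(self, docid: str, requester: str, new_rev: int) -> bool:
        # Only the current lock holder may push an update.
        if self.locks.get(docid) != requester:
            return False
        self.revisions[docid] = new_rev
        del self.locks[docid]  # release the lock once the update is stored
        return True


home = HomeServer("gamma", revisions={"doc.1": 3})
stale = home.request_lock("doc.1", "lamda", requester_rev=2)  # copy is out of date
fresh = home.request_lock("doc.1", "lamda", requester_rev=3)  # copy is current
updated = home.accept_update("doc.1", "lamda", new_rev=4)
```

  <p>In this sketch the first request is refused because the requester holds revision 2
  while the home server holds revision 3; the second request succeeds and the update is
  then accepted.</p>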
  
  <a name="DatabasedInfo"></a><div class="header2">Databased Information</div>
  <p>Each server contains a list of servers to which it can replicate.  One-way
  replication is enabled by the 'replicate' and 'datareplicate' flags in the 
  list.  The server list may look like the following.</p>
  <table border="1">
    <tr>
      <td><b>serverid</b></td>
      <td><b>server</b></td>
      <td><b>last_checked</b></td>
      <td><b>replicate</b></td>
      <td><b>datareplicate</b></td>
      <td><b>hub</b></td>
    </tr>
    <tr>
      <td>1</td>
      <td>localhost</td>
      <td>null</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
    </tr>
    <tr>
      <td>2</td>
      <td>alpha.nceas.ucsb.edu:8080/berkley/servlet/replication</td>
      <td>2001-01-22 14:52:12.1</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
    </tr>
    <tr>
      <td>3</td>
      <td>dev.nceas.ucsb.edu/Metacat/servlet/replication</td>
      <td>2001-01-23 9:10:02.5</td>
      <td>1</td>
      <td>1</td>
      <td>0</td>
    </tr>
  </table>
  
  <br>
  The server list is kept in a table in the database called xml_replication.
  Localhost must always be the first entry in the table and have a serverid of 1.
  The database fields are:
  <ul class="list1">
  <li><b>serverid</b> - a unique ID that is generated by the database when a new entry is added.</li>
  <li><b>server</b> - this field always points to the partner server's replication servlet,
  hence the "servlet/replication" on the end of both of the sample servers.  Note
  that the port number must also be included if your servlet engine is not running
  on port 80.</li>
  <li><b>last_checked</b> - a system-generated value that holds the last time that a check was 
  made to see if replication needed to be performed.</li>
  <li><b>replicate</b> - flag that is set to 1 if you want this server to replicate XML 
  metadata documents TO the remote host.  Note that if this flag is set to 0, the datareplicate
  and hub fields have no meaning.</li>
  <li><b>datareplicate</b> - flag that is set to 1 if you want this server to copy data 
  files to the remote host.  Note that this field has no meaning if replicate is not set to 1.</li>
  <li><b>hub</b> - flag that is set to 1 if this server is a hub to the remote host.  When it
  is set, this server will replicate not only its own original documents but also documents
  that were replicated to it.  Thus it acts as a replication hub to one or more other Metacat
  servers.</li>
  </ul>
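  <p>The way the three flags combine can be summarized with a small Python sketch.
  The function name <b>describe_row</b> and the keys of the returned dictionary are
  made up for this illustration; only the flag semantics come from the field
  descriptions above.</p>

```python
def describe_row(replicate: int, datareplicate: int, hub: int) -> dict:
    """Return what this server sends to the partner described by one table row."""
    if replicate != 1:
        # With replicate off, datareplicate and hub have no meaning.
        return {"metadata": False, "data": False, "forward_replicated": False}
    return {
        "metadata": True,                  # XML metadata documents are sent
        "data": datareplicate == 1,        # data files are sent as well
        "forward_replicated": hub == 1,    # documents received from others are passed on
    }


# Flags copied from the sample server list (serverid 2 and 3).
alpha_row = describe_row(replicate=0, datareplicate=0, hub=0)
dev_row = describe_row(replicate=1, datareplicate=1, hub=0)
# datareplicate and hub are ignored when replicate is 0.
ignored = describe_row(replicate=0, datareplicate=1, hub=1)
```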
  
  <a name="Example"></a><div class="header2">Example</div>
  Here we show an example setup of three replication servers.  We will discuss each.<br><br>
  
  First, note that in order for replication to occur, both partner servers must have 
  each other in their respective tables or replication will not take place.  Also, 
  certificates must be set up correctly on both servers in order for replication to 
  work.  See the <a href="#Certificates">certificates</a> section below.<br><br>

  <table border="1">
    <tr>
      <td>host</td>
      <td>replication table</td>
    </tr>
    <tr>
     <td>gamma.nceas.ucsb.edu</td>
     <td>
      <table border="2">
        <tr>
          <td><b>server</b></td>
          <td><b>last_checked</b></td>
          <td><b>replicate</b></td>
          <td><b>datareplicate</b></td>
          <td><b>hub</b></td>
        </tr>
        <tr>
          <td>localhost</td>
          <td>null</td>
          <td>0</td>
          <td>0</td>
          <td>0</td>
        </tr>
        <tr>
          <td>alpha.nceas.ucsb.edu:8080/berkley/servlet/replication&nbsp;&nbsp;&nbsp;</td>
          <td>2001-01-22 14:52:12.1</td>
          <td>0</td>
          <td>0</td>
          <td>0</td>
        </tr>
        <tr>
          <td>lamda.nceas.ucsb.edu/Metacat/servlet/replication</td>
          <td>2001-01-23 9:10:02.5</td>
          <td>1</td>
          <td>1</td>
          <td>0</td>
        </tr>
      </table>
     </td>
    </tr>
    <tr>
      <td>alpha.nceas.ucsb.edu</td>
      <td>
        <table border="2">
          <tr>
            <td><b>server</b></td>
            <td><b>last_checked</b></td>
            <td><b>replicate</b></td>
            <td><b>datareplicate</b></td>
            <td><b>hub</b></td>
          </tr>
          <tr>
            <td>localhost</td>
            <td>null</td>
            <td>0</td>
            <td>0</td>
            <td>0</td>
          </tr>
          <tr>
            <td>gamma.nceas.ucsb.edu:8080/berkley/servlet/replication</td>
            <td>2001-01-21 11:33:12.7</td>
            <td>0</td>
            <td>1</td>
            <td>0</td>
          </tr>
          <tr>
            <td>lamda.nceas.ucsb.edu/Metacat/servlet/replication</td>
            <td>2001-01-23 10:22:02.5</td>
            <td>1</td>
            <td>0</td>
            <td>0</td>
          </tr>
        </table>
      </td>
    </tr>
    <tr>
      <td>lamda.nceas.ucsb.edu</td>
      <td>
        <table border="2">
          <tr>
            <td><b>server</b></td>
            <td><b>last_checked</b></td>
            <td><b>replicate</b></td>
            <td><b>datareplicate</b></td>
            <td><b>hub</b></td>
          </tr>
          <tr>
            <td>localhost</td>
            <td>null</td>
            <td>0</td>
            <td>0</td>
            <td>0</td>
          </tr>
          <tr>
            <td>gamma.nceas.ucsb.edu:8080/berkley/servlet/replication</td>
            <td>2001-01-21 11:33:12.7</td>
            <td>0</td>
            <td>0</td>
            <td>0</td>
          </tr>
          <tr>
            <td>alpha.nceas.ucsb.edu:8080/Metacat/servlet/replication</td>
            <td>2001-01-22 12:15:32.5</td>
            <td>1</td>
            <td>1</td>
            <td>1</td>
          </tr>
        </table>
      </td>
    </tr>
  </table>
  
  <a name="gamma"></a><div class="header3">What happens with gamma?</div>
  <ul class="list1">
  <li>The localhost entry is required internally for replication to work on 
      gamma.  As long as we see it there, we can safely disregard it.</li>
  <li>We see the entry for the alpha machine has all zeros in the replicate, 
      datareplicate and hub columns.  This means that gamma is configured to
      accept replication information from alpha.  (As we will see in a moment,
      alpha is not actually correctly configured to send data to gamma.)</li>
  <li>We see that the entry for the lamda machine has ones in the replicate
      and datareplicate columns and a zero in the hub column.  This tells us
      that gamma will replicate its original documents to lamda, assuming that
      lamda is configured to accept replication from gamma (we will see that it
      is).  However, because the hub value is zero, any documents that replicate 
      to gamma will not be further replicated to lamda.</li>
  </ul>
   
  <a name="alpha"></a><div class="header3">What happens with alpha?</div>
  <ul class="list1">
  <li>The localhost entry is required internally for replication to work on 
      alpha.  As long as we see it there, we can safely disregard it.</li>
  <li>We see that the entry for gamma has a zero in the replicate column.  
      This means that the other flags in that entry are meaningless and can be
      disregarded.  Even though there is a one in the datareplicate column on alpha
      and gamma is configured to accept replication from alpha, no replication will
      happen from alpha to gamma.</li>
  <li>We see that the entry for lamda has a one in the replicate column and zeros
      in the datareplicate and hub columns.  Assuming lamda is configured to 
      accept replication from alpha, alpha will replicate metadata only to lamda 
      (and indeed, we will see that lamda is set up to accept replication from 
      alpha). </li>
  </ul>
      
  <a name="lamda"></a><div class="header3">What happens with lamda?</div>
  <ul class="list1">
  <li>The localhost entry is required internally for replication to work on 
      lamda.  As long as we see it there, we can safely disregard it.</li>
  <li>We see that the entry for gamma has all zeros in replicate, datareplicate
      and hub, so lamda is set up to accept replication from gamma.  As we have
      already seen, gamma is correctly configured to replicate metadata and data
      to lamda.  We should see data and metadata replication from gamma to lamda.</li>
  <li>We see that the entry for alpha has ones in the replicate, datareplicate and 
      hub columns.  There's a lot going on here:
    <ul class="list2">
    <li>First, lamda will replicate original metadata and data to alpha if 
        alpha is configured to accept replication from lamda.  Because alpha 
        has an entry for lamda, lamda will be allowed to replicate to alpha. </li>
    <li>Second, because the alpha entry has a one in the hub column, lamda 
        will not only replicate its original data, it will also replicate 
        data that was replicated to it.  Remember that gamma was configured 
        to replicate to lamda.  So any data or metadata that gamma sends to 
        lamda will get further replicated to alpha.</li>
    <li>Finally, the alpha entry in the table allows the alpha server to 
        replicate to lamda.  Since the alpha server is set up to replicate
        metadata only, we would expect any original metadata on alpha to 
        wind up on lamda.</li>
    </ul>
  </li>
  </ul>
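  <p>The walkthrough above can be checked with a short Python model.  This is a sketch
  under simplifying assumptions, not Metacat code: the <b>TABLES</b> dictionary holds only
  the replicate, datareplicate and hub flags from the three example tables (localhost rows
  omitted), and the helper names <b>flows</b> and <b>reachable</b> are invented for the
  illustration.</p>

```python
# Each host's table: partner -> (replicate, datareplicate, hub), copied from the example.
TABLES = {
    "gamma": {"alpha": (0, 0, 0), "lamda": (1, 1, 0)},
    "alpha": {"gamma": (0, 1, 0), "lamda": (1, 0, 0)},
    "lamda": {"gamma": (0, 0, 0), "alpha": (1, 1, 1)},
}


def flows(tables):
    """List (source, target, sends_data, acts_as_hub) for every active link."""
    result = []
    for source, partners in tables.items():
        for target, (replicate, datareplicate, hub) in partners.items():
            # Both partners must list each other, and replicate must be 1.
            if replicate == 1 and source in tables.get(target, {}):
                result.append((source, target, datareplicate == 1, hub == 1))
    return result


def reachable(tables, origin):
    """Hosts that end up holding documents that originate on `origin`."""
    links = flows(tables)
    holders = {origin}
    # First hop: the origin replicates its own original documents.
    frontier = [target for source, target, _, _ in links if source == origin]
    while frontier:
        host = frontier.pop()
        if host in holders:
            continue
        holders.add(host)
        # Further hops happen only over links whose hub flag is set.
        frontier.extend(t for s, t, _, hub in links if s == host and hub)
    holders.discard(origin)
    return holders


active = set(flows(TABLES))
from_gamma = reachable(TABLES, "gamma")
from_alpha = reachable(TABLES, "alpha")
```

  <p>Under these assumptions the model yields three active links (gamma to lamda with data,
  alpha to lamda with metadata only, and lamda to alpha as a hub), and documents that
  originate on gamma reach both lamda and alpha.</p>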

<a name="ControlPanel"></a><div class="header1">The Replication Control Panel:</div>      
  <p>There is an HTML control panel for controlling replication.  After
  installing Metacat, you can access it by calling replControl.html.  For instance, if you 
  set up a Metacat application context called 'knb', you would probably type:</p>
  
  <div class="code">http://server.domain.com/knb/style/skins/dev/replControl.html</div>  
  
  <p>The control panel is an easy interface for adding, removing and altering servers and 
  starting the delta-T handler.  It will also allow you to 'force replicate' your 
  server list.  This is useful if you want to initialize the state of one Metacat 
  server from the existing state of another (i.e. copy all of the data from an existing
  server).</p>
  
  <a name="Certificates"></a><div class="header1">Certificates:</div>
  You will need to generate security certificates on both the replication client 
  and server.  The certificates will be exchanged so that each machine understands
  that the other has access for replication.<br><br>
  The following are the steps to generate and exchange certificates on systems
  running Tomcat 5 and Java 1.5.  Note that if Tomcat is running in conjunction with
  Apache, the process is somewhat different than if it is running standalone.

  <a name="GenerateCertificates"></a><div class="header2">Generate Certificates on both the replication client and server.</div>  

  <a name="GenerateCertTomcat"></a><div class="header3">Generate Certificate for Tomcat standalone (no Apache)</div>
  <ul class="list1">
  <li>Generate keys in the Java default keystore - this will create a secure key and put it
    into the binary certificates file located at $JAVA_HOME/lib/security/cacerts
    <ul class="list2">
    <li>Run the command: 
             <div class="code">keytool -genkey -alias &lt;aliasname&gt; -keyalg RSA -validity 800 -keystore $JAVA_HOME/lib/security/cacerts</div>
     where &lt;aliasname&gt; is a unique name that you choose for this cert.  Something like "&lt;hostname&gt;-tomcat"
     might be appropriate, where &lt;hostname&gt; is the name of this host.</li>
    </ul>
  </li>
  <li>
    Password - keytool will ask for a password.  If this is a pre-existing keystore, you will need
    to know its password to modify it.  If you are creating a new keystore, the password you enter
    will become the keystore password.
  </li>
  <li>Sample values when creating certificate
    <ul class="list2">
    <li>What is your first and last name? <b>myserver.nceas.ucsb.edu</b>
        (note: use the host name without port number)</li>
    <li>What is the name of your organizational unit? <b>NCEAS</b></li>
    <li>What is the name of your organization? <b>UCSB</b></li>
    <li>What is the name of your City or Locality? <b>Santa Barbara</b></li>
    <li>What is the name of your State or Province? <b>California</b> 
        (note: this is spelled in full)</li>
    <li>What is the two-letter country code for this unit? <b>US</b></li>
    </ul>
  </li>
  <li>Generate certificate - this will pull the certificate you created from the cacerts file
      and put it into a local file
    <ul class="list2">
    <li>Run the command:
      <div class="code">keytool -export -alias &lt;aliasname&gt; -file &lt;outputfile&gt;.cert -keystore $JAVA_HOME/lib/security/cacerts</div>
      where &lt;aliasname&gt; is the same name you used when you created the certificate.  </li>
    <li>A file named &lt;outputfile&gt;.cert will be created in the same directory where you run the keytool 
      command.  You can name the output file anything you like, but keep in mind that it will get sent to the 
      partner machine used for replication.  The filename should have enough meaning that someone who sees 
      it on that machine can have some idea where it came from.  Again, something like "&lt;hostname&gt;-tomcat.cert"
      will suffice.</li>   
    </ul>
  </li>
  <li>Enable SSL in Tomcat 
    <ul class="list2">
    <li>Edit the Tomcat server file at $TOMCAT_HOME/conf/server.xml</li>
    <li>
      Uncomment the section that starts with "&lt;Connector port="8443" ... (Note: comments start with
      &lt;!-- and end with --&gt;).
    </li>
    <li>Add two attributes to that section that read:
      <div class="code">keystoreFile="&lt;JAVA_HOME&gt;/lib/security/cacerts"</div>
      <div class="code">keystorePass="&lt;keystore_password&gt;"</div>
      where &lt;JAVA_HOME&gt; should be the actual Java path and &lt;keystore_password&gt; should be the 
      password you used when you created the keystore.
    </li>
    </ul>
  </li>
  </ul>  
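  <p>As a rough illustration of the last step, the following Python sketch adds the two
  keystore attributes to a connector element.  The connector string is a trimmed-down
  stand-in (the real element in server.xml carries more attributes), and the Java path
  and password passed in are placeholders chosen for the example, not recommended
  values.  In practice the file is normally edited by hand; the sketch only shows what
  the element should contain afterwards.</p>

```python
import xml.etree.ElementTree as ET

# Stand-in for the uncommented SSL connector; the attributes other than
# port are assumptions about what the element typically contains.
connector_xml = '<Connector port="8443" scheme="https" secure="true" />'


def add_keystore(connector: str, java_home: str, password: str) -> str:
    """Return the connector element with keystoreFile and keystorePass set."""
    element = ET.fromstring(connector)
    element.set("keystoreFile", f"{java_home}/lib/security/cacerts")
    element.set("keystorePass", password)
    return ET.tostring(element, encoding="unicode")


# Placeholder values: substitute the actual Java path and keystore password.
result = add_keystore(connector_xml, "/path/to/java", "keystore_password")
parsed = ET.fromstring(result)
```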
    
  <a name="GenerateCertApache"></a><div class="header3">Generate Certificate for Apache/Tomcat</div>
  <ul class="list1">
  <li>Generate keys using openssl
    <ul class="list2">
    <li>Run the command: 
             <div class="code">openssl req -new -out REQ.pem -keyout &lt;hostname&gt;-apache.key</div>
    </li>
    </ul>
  </li>
  <li>Sample values when creating certificate
    <ul class="list2">
    <li>Enter PEM pass phrase: (note: for example, the first part of the host name)</li>
    <li>Country Name (2 letter code) [AU]: <b>US</b></li>
    <li>State or Province Name (full name) [Some-State]: <b>California</b> 
        (note: this is spelled in full)</li>
    <li>Locality Name (eg, city) []: <b>Santa Barbara</b></li>
    <li>Organization Name (eg, company) [Internet Widgits Pty Ltd]: <b>UCSB</b></li>
    <li>Organizational Unit Name (eg, section) []: <b>NCEAS</b></li>
    <li>Common Name (eg, YOUR name) []: <b>myserver.mydomain.edu</b>
        (note: use the host name without port number)</li>
    <li>Email Address []:  <b>administrator@mydomain.edu</b></li>
    <li>A challenge password []: (note: leave blank)</li>
    <li>An optional company name []: (note: leave blank)</li>
    </ul>
  </li>    
  <li>Generate certificate - this will create a local file with your certificate
    <ul class="list2">
    <li>Run the command:
      <div class="code">openssl req -x509 -days 800 -in REQ.pem -key &lt;hostname&gt;-apache.key -out &lt;hostname&gt;-apache.crt</div>
      where &lt;hostname&gt; is the same name you used when you created the key.  </li>
    <li>A file named &lt;hostname&gt;-apache.crt will be created in the same directory where you run the openssl 
      command.  You can name the output file anything you like, but keep in mind that it will get sent to the 
      partner machine used for replication.  The filename should have enough meaning that someone who sees 
      it on that machine can have some idea where it came from.  Again, something like "&lt;hostname&gt;-apache.crt"
      will suffice.</li>   
    </ul>
  </li>   
  <li>Enter the certificate into the Apache security configuration - you need to register the certificate
      in the local Apache instance.  Note that the security files may be in a different place depending
      on how you installed Apache.
    <ul class="list2">
    <li>Copy the certificate and key file to the Apache SSL directories and enable SSL.</li>
    <li>For Ubuntu/Debian based systems:
      <ul class="list3">
      <li>sudo cp &lt;hostname&gt;-apache.crt /etc/ssl/certs</li>
      <li>sudo cp &lt;hostname&gt;-apache.key /etc/ssl/private</li>
      <li>As root, edit /etc/apache2/sites-available/default.  In the VirtualHost section
          after the DocumentRoot line, add:<br>
          SSLEngine on<br>
          SSLOptions +FakeBasicAuth +ExportCertData +CompatEnvVars +StrictRequire<br>
          SSLCertificateFile /etc/ssl/certs/&lt;hostname&gt;-apache.crt<br>
          SSLCertificateKeyFile /etc/ssl/private/&lt;hostname&gt;-apache.key<br>
      </li>
      </ul>
    </li>  
    <li>For other systems:
      <ul class="list3">
      <li>sudo cp &lt;hostname&gt;-apache.crt $APACHE_HOME/conf/ssl.crt</li>
      <li>sudo cp &lt;hostname&gt;-apache.key $APACHE_HOME/conf/ssl.key</li> 
      <li>(Steps to enable SSL on non-Debian systems are not yet documented here.)</li>
      </ul>
    </li>  
    </ul>
  </li>
  <li>scp &lt;hostname&gt;-apache.crt to the replication partner machine.</li>
  </ul>  
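  <p>The Debian-style directives can be generated from the host name so that the paths
  always match the files that were copied.  The sketch below is illustrative only:
  the function name <b>apache_ssl_directives</b> is invented, and it assumes the
  certificate and key keep their &lt;hostname&gt;-apache.crt and
  &lt;hostname&gt;-apache.key names after being copied into /etc/ssl/certs and
  /etc/ssl/private.</p>

```python
def apache_ssl_directives(hostname: str) -> list:
    """Build the VirtualHost SSL lines for a certificate named after the host."""
    # Assumed locations: the Debian/Ubuntu directories used in the cp steps.
    cert = f"/etc/ssl/certs/{hostname}-apache.crt"
    key = f"/etc/ssl/private/{hostname}-apache.key"
    return [
        "SSLEngine on",
        "SSLOptions +FakeBasicAuth +ExportCertData +CompatEnvVars +StrictRequire",
        f"SSLCertificateFile {cert}",
        f"SSLCertificateKeyFile {key}",
    ]


# "myserver" is a placeholder host name.
directives = apache_ssl_directives("myserver")
```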
  
  <a name="RegisterPartner"></a><div class="header2">Register the partner machine's certificate.</div>   
  At this point, you have created a certificate for each replication server and 
  copied them across to each other with scp.  Now you need to import the remote server's
  certificate on the local machine.  Perform the following steps for each 
  replication server.
  <ul class="list1">
  <li>Import the remote certificate by running:
    <div class="code">keytool -import -alias &lt;remotehostalias&gt; -file &lt;remotehostfilename&gt;.crt -keystore $JAVA_HOME/jre/lib/security/cacerts</div>
    where &lt;remotehostfilename&gt; is the certificate file you created on the remote machine and
    copied to this machine.  The &lt;remotehostalias&gt; is the name the certificate will use in
    the keystore.  It should be something that identifies the remote host.  
  </li>
  <li>Restart Apache and Tomcat on both replication machines.</li>
  </ul>
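  <p>When a server has several replication partners, the import step is repeated once
  per partner certificate.  The following Python sketch only assembles the keytool
  argument lists and does not run them.  The function name <b>import_commands</b>, the
  partner aliases, the file names and the Java path are placeholders invented for the
  example; the keytool options are the ones shown in the command above.</p>

```python
def import_commands(partners: dict, java_home: str) -> list:
    """Build one 'keytool -import' argument list per partner certificate.

    `partners` maps a keystore alias to the certificate file copied from
    that partner machine.
    """
    keystore = f"{java_home}/jre/lib/security/cacerts"
    return [
        ["keytool", "-import", "-alias", alias, "-file", cert_file, "-keystore", keystore]
        for alias, cert_file in sorted(partners.items())
    ]


# Placeholder aliases and file names for two hypothetical partners.
commands = import_commands(
    {"gamma-apache": "gamma-apache.crt", "alpha-apache": "alpha-apache.crt"},
    "/path/to/java",
)
```

  <p>Each list could then be handed to a process runner, or printed and reviewed before
  being run by hand.</p>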

  <a href="./packages.html">Back</a> | <a href="./metacattour.html">Home</a> | 
  <a href="./datafiles.html">Next</a>

</BODY>
</HTML>