Metacat has built-in replication to allow different Metacat servers to
share data between themselves. Metacat not only replicates XML documents but
also data files.
Metacat's hub feature allows it to replicate not only its own server's original
documents, but also those that were replicated to it from other servers. This
makes more complex, chained replication structures possible.
The replication scheme that Metacat uses is both push and pull. There are
several triggers that can start a replication mechanism:
- Delta-T monitoring - at a set time interval a server checks each of the
other servers in its list for updated documents
- INSERT trigger - Whenever a document is inserted, the server notifies
the remote hosts in its list that it has a new file available.
- UPDATE trigger - Whenever a document is updated, the server notifies
each server in its list of the update.
- File locking - When a local user tries to alter a document on a local
server that belongs to a remote server, the local server must first
obtain a lock on that file. Once the lock is obtained, the file can
be updated, then it is force replicated out to each server in the list.
The lock ensures that the remote copy is up to date and that an older
file does not overwrite a newer one. Only a document's home server
can grant a lock that allows that file to be altered.
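The file-locking rule can be illustrated with a small model. This is only a sketch under stated assumptions, not Metacat's actual implementation: the class name `HomeServer`, the integer revision numbers, and the rule that a stale requester is refused are all hypothetical, chosen to show how a lock held by the home server can keep an older file from overwriting a newer one.

```python
from dataclasses import dataclass, field


@dataclass
class HomeServer:
    """Hypothetical model of a document's home server (not Metacat code)."""
    name: str
    revisions: dict = field(default_factory=dict)  # docid -> current revision (assumed integer)
    locks: dict = field(default_factory=dict)      # docid -> server currently holding the lock

    def request_lock(self, docid: str, requester: str, requester_rev: int) -> bool:
        # Refuse if a different server already holds the lock.
        if self.locks.get(docid, requester) != requester:
            return False
        # Assumed rule: refuse if the requester's copy is older than ours.
        if requester_rev < self.revisions[docid]:
            return False
        self.locks[docid] = requester
        return True

    def accept_update(self, docid: str, requester: str, new_rev: int) -> bool:
        # Only the lock holder may update, and only with a newer revision.
        if self.locks.get(docid) != requester or new_rev <= self.revisions[docid]:
            return False
        self.revisions[docid] = new_rev
        del self.locks[docid]  # release the lock after the update
        return True
```

In this model a server holding revision 2 of a document whose home copy is at revision 3 is refused the lock, so it has to refresh its copy before it can alter the document.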
Each server contains a list of servers to which it can replicate. One-way
replication is enabled by the 'replicate' and 'datareplicate' flags in the
list. The server list may look like the following.
| serverid | server | last_checked | replicate | datareplicate | hub |
|----------|--------|--------------|-----------|---------------|-----|
| 1 | localhost | null | 0 | 0 | 0 |
| 2 | alpha.nceas.ucsb.edu:8080/berkley/servlet/replication | 2001-01-22 14:52:12.1 | 0 | 0 | 0 |
| 3 | dev.nceas.ucsb.edu/Metacat/servlet/replication | 2001-01-23 9:10:02.5 | 1 | 1 | 0 |
The server list is kept in a table in the database called xml_replication.
Localhost must always be the first entry in the table and have a serverid of 1.
The database fields are:
- serverid - a unique ID that is generated by the database when a new entry is added.
- server - this field always points to the partner server's replication servlet,
hence the "servlet/replication" on the end of both of the sample servers. Note
that any port numbers (if your servlet engine is not running on port 80) must
also be included.
- last_checked - a system-generated value that holds the last time that a check was
made to see if replication needed to be performed.
- replicate - flag that is set to 1 if you want this server to replicate XML
metadata documents TO the remote host. Note that if this flag is set to 0, datareplicate
and hub fields have no meaning.
- datareplicate - flag that is set to 1 if you want this server to copy data
files to the remote host. Note that this field has no meaning if replicate is not set to 1.
If this server is a hub to the remote host, the hub flag should also be set.
- hub - if this flag is set to true, this server will replicate not only its own
original documents but also documents that were replicated to it. Thus it
acts as a replication hub to one or more other Metacat servers.
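The field rules above can be summarized in a short Python model. This is a sketch for illustration only: the names `ReplicationEntry` and `server_field` are hypothetical and are not part of Metacat, and the model simply encodes the rules stated in the field descriptions (datareplicate and hub have no effect unless replicate is 1, and the port is included only when it is not 80).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplicationEntry:
    """Hypothetical in-memory view of one xml_replication row."""
    server: str
    replicate: int = 0
    datareplicate: int = 0
    hub: int = 0

    def sends_metadata(self) -> bool:
        # XML metadata documents are sent only when replicate is 1.
        return self.replicate == 1

    def sends_data(self) -> bool:
        # datareplicate has no meaning unless replicate is also 1.
        return self.replicate == 1 and self.datareplicate == 1

    def forwards_replicated(self) -> bool:
        # hub has no meaning unless replicate is also 1.
        return self.replicate == 1 and self.hub == 1


def server_field(host: str, context: str, port: Optional[int] = None) -> str:
    """Build a 'server' value pointing at the partner's replication servlet."""
    # The port must be included unless the servlet engine runs on port 80.
    authority = host if port in (None, 80) else f"{host}:{port}"
    return f"{authority}/{context}/servlet/replication"
```

For example, `server_field("alpha.nceas.ucsb.edu", "berkley", 8080)` produces the value shown for serverid 2 in the sample table.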
Here we show an example setup of three replication servers. We will discuss each.
First, note that in order for replication to occur, both partner servers must have
each other in their respective tables or replication will not take place. Also,
certificates must be set up correctly on both servers in order for replication to
work. See the certificates section below.
Host: gamma.nceas.ucsb.edu

| server | last_checked | replicate | datareplicate | hub |
|--------|--------------|-----------|---------------|-----|
| localhost | null | 0 | 0 | 0 |
| alpha.nceas.ucsb.edu:8080/berkley/servlet/replication | 2001-01-22 14:52:12.1 | 0 | 0 | 0 |
| lamda.nceas.ucsb.edu/Metacat/servlet/replication | 2001-01-23 9:10:02.5 | 1 | 1 | 0 |

Host: alpha.nceas.ucsb.edu

| server | last_checked | replicate | datareplicate | hub |
|--------|--------------|-----------|---------------|-----|
| localhost | null | 0 | 0 | 0 |
| gamma.nceas.ucsb.edu:8080/berkley/servlet/replication | 2001-01-21 11:33:12.7 | 0 | 1 | 0 |
| lamda.nceas.ucsb.edu/Metacat/servlet/replication | 2001-01-23 10:22:02.5 | 1 | 0 | 0 |

Host: lamda.nceas.ucsb.edu

| server | last_checked | replicate | datareplicate | hub |
|--------|--------------|-----------|---------------|-----|
| localhost | null | 0 | 0 | 0 |
| gamma.nceas.ucsb.edu:8080/berkley/servlet/replication | 2001-01-21 11:33:12.7 | 0 | 0 | 0 |
| alpha.nceas.ucsb.edu:8080/Metacat/servlet/replication | 2001-01-22 12:15:32.5 | 1 | 1 | 1 |
On gamma.nceas.ucsb.edu:
- The localhost entry is required internally for replication to work on
gamma. As long as we see it there, we can safely disregard it.
- The entry for the alpha machine has all zeros in the replicate,
datareplicate and hub columns. This means that gamma is configured to
accept replication from alpha but not to send anything to it. (As we will
see in a moment, alpha is not actually correctly configured to send data
to gamma.)
- The entry for the lamda machine has ones in the replicate and
datareplicate columns and a zero in the hub column. This tells us
that gamma will replicate its original documents to lamda, assuming that
lamda is configured to accept replication from gamma (we will see that it
is). However, because the hub value is zero, any documents that were
replicated to gamma will not be further replicated to lamda.

On alpha.nceas.ucsb.edu:
- The localhost entry is required internally for replication to work on
alpha. As long as we see it there, we can safely disregard it.
- The entry for gamma has a zero in the replicate column. This means that
the other flags in that entry are meaningless and can be disregarded.
Even though there is a one in the datareplicate column on alpha and gamma
is configured to accept replication from alpha, no replication will happen
from alpha to gamma.
- The entry for lamda has a one in the replicate column and zeros
in the datareplicate and hub columns. Assuming lamda is configured to
accept replication from alpha, alpha will replicate metadata only to lamda
(and indeed, we will see that lamda is set up to accept replication from
alpha).

On lamda.nceas.ucsb.edu:
- The localhost entry is required internally for replication to work on
lamda. As long as we see it there, we can safely disregard it.
- The entry for gamma has all zeros in replicate, datareplicate
and hub, so lamda is set up to accept replication from gamma. As we have
already seen, gamma is correctly configured to replicate metadata and data
to lamda. We should see data and metadata replication from gamma to lamda.
- The entry for alpha has ones in the replicate, datareplicate and
hub columns. There's a lot going on here:
  - First, lamda will replicate original metadata and data to alpha if
  alpha is configured to accept replication from lamda. Because alpha
  has an entry for lamda, lamda will be allowed to replicate to alpha.
  - Second, because the alpha entry has a one in the hub column, lamda
  will not only replicate its original data, it will also replicate
  data that was replicated to it. Remember that gamma was configured
  to replicate to lamda. So any data or metadata that gamma sends to
  lamda will get further replicated to alpha.
  - Finally, the alpha entry in the table allows the alpha server to
  replicate to lamda. Since the alpha server is set up to replicate
  metadata only, we would expect any original metadata on alpha to
  wind up on lamda.
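The reasoning in this walkthrough can be checked mechanically. The following Python sketch encodes the three example tables and applies the rules described above; it is an illustration only, not Metacat code. The short host names, the `TABLES` layout, and the functions `direct_flow` and `reaches` are assumptions introduced for this example.

```python
from collections import deque

# Each host maps partner -> (replicate, datareplicate, hub), taken from the
# example tables. Short host names stand in for the full nceas.ucsb.edu names.
TABLES = {
    "gamma": {"alpha": (0, 0, 0), "lamda": (1, 1, 0)},
    "alpha": {"gamma": (0, 1, 0), "lamda": (1, 0, 0)},
    "lamda": {"gamma": (0, 0, 0), "alpha": (1, 1, 1)},
}


def direct_flow(tables, src, dst):
    """Return what src sends to dst as a set of 'metadata', 'data', 'hub'."""
    entry = tables[src].get(dst)
    # Both partners must list each other, or no replication takes place.
    if entry is None or src not in tables.get(dst, {}):
        return set()
    replicate, datareplicate, hub = entry
    if not replicate:
        return set()  # the other flags have no meaning when replicate is 0
    sent = {"metadata"}
    if datareplicate:
        sent.add("data")
    if hub:
        sent.add("hub")
    return sent


def reaches(tables, origin):
    """Hosts that end up with origin's original metadata, directly or via hubs."""
    reached = set()
    queue = deque()
    for dst in tables[origin]:
        if direct_flow(tables, origin, dst):
            reached.add(dst)
            queue.append(dst)
    while queue:
        via = queue.popleft()
        for dst in tables[via]:
            if dst == origin or dst in reached:
                continue
            # Only a hub entry forwards documents that were replicated to it.
            if "hub" in direct_flow(tables, via, dst):
                reached.add(dst)
                queue.append(dst)
    return reached
```

Under these assumptions, `reaches(TABLES, "gamma")` contains both lamda and alpha (the latter through lamda's hub entry), while `direct_flow(TABLES, "alpha", "gamma")` is empty, which matches the discussion above.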
There is an HTML control panel for controlling replication. After
installing Metacat, you can access
it by going through the Metacat servlet context you have set up and calling up
replControl.html. For instance, with a servlet context named 'Metacat' you would
type the following URL; if your context has a different name (such as 'knb'),
substitute that name:
http://server.domain.com:8080/Metacat/style/skins/dev/replControl.html
The control panel is an easy interface for adding/removing/altering servers and
starting the delta-T handler. It will also allow you to 'force replicate' your
server list. This is useful if you want to initialize the state of one Metacat
server from an existing state of another (i.e. copy all of the data from an existing
server).
You will need to generate security certificates on both the replication client
and server. The certificates will be exchanged so that each machine understands
that the other has access for replication.
The following are the steps to generate and exchange certificates on systems
running Tomcat 5 and Java 1.5. Note that if Tomcat is running in conjunction with
Apache, the process is somewhat different than if it is running standalone.
If Tomcat is running standalone:
- Generate keys in the Java default keystore - this will create a secure key and put it
into the binary certificates file located at $JAVA_HOME/lib/security/cacerts
  - Password - keytool will ask for a password. If this is a pre-existing keystore, you will need
  to know its password to modify it. If you are creating a new keystore, the password you enter
  will become the keystore password.
  - Sample values when creating the certificate:
    - What is your first and last name? myserver.nceas.ucsb.edu
    (note: use the host name without port number)
    - What is the name of your organizational unit? NCEAS
    - What is the name of your organization? UCSB
    - What is the name of your City or Locality? Santa Barbara
    - What is the name of your State or Province? California
    (note: this is spelled in full)
    - What is the two-letter country code for this unit? US
- Generate certificate - this will pull the certificate you created from the cacerts file
and put it into a local file
- Enable SSL in Tomcat

If Tomcat is running in conjunction with Apache:
- Generate keys using openssl
  - Sample values when creating the certificate:
    - Country Name (2 letter code) [AU]: US
    - State or Province Name (full name) [Some-State]: California
    (note: this is spelled in full)
    - Locality Name (eg, city) []: Santa Barbara
    - Organization Name (eg, company) [Internet Widgits Pty Ltd]: UCSB
    - Organizational Unit Name (eg, section) []: NCEAS
    - Common Name (eg, YOUR name) []: myserver.mydomain.edu
    (note: use the host name without port number)
    - Email Address []: administrator@mydomain.edu
    - A challenge password []: (note: leave blank)
    - An optional company name []: (note: leave blank)
- Generate certificate - this will create a local file with your certificate
- Enter the certificate into the Apache security configuration - you need to register the certificate
in the local Apache instance. Note that the security files may be in a different place depending
on how you installed Apache.
  - Copy the certificate and key file to the Apache SSL directories and enable SSL.
    - For Ubuntu/Debian based systems:
      - sudo cp <hostname>-apache.crt /etc/ssl/certs
      - sudo cp <hostname>-apache.key /etc/ssl/private
      - As root, edit /etc/apache2/sites-available/default. In the VirtualHost section,
      after the DocumentRoot line, add:
          SSLEngine on
          SSLOptions +FakeBasicAuth +ExportCertData +CompatEnvVars +StrictRequire
          SSLCertificateFile /etc/ssl/certs/server.crt
          SSLCertificateKeyFile /etc/ssl/private/server.key
    - For other systems:
      - sudo cp <hostname>-apache.crt $APACHE_HOME/conf/ssl.crt
      - sudo cp <hostname>-apache.key $APACHE_HOME/conf/ssl.key
      - ADD STEPS TO ENABLE SSL ON NON_DEBIAN SYSTEMS HERE
- scp <hostname>-apache.crt to the replication partner machine.
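Both sets of sample values note that the name entered for the certificate is the host name without the port number. As a small illustration, the helper below derives that name from a 'server' value of the kind stored in the replication table. The function name `common_name` is hypothetical, and the sketch assumes the entry has the host[:port]/context/servlet/replication form used in the examples in this document.

```python
def common_name(server_entry: str) -> str:
    """Return the host name (without port) from a replication 'server' value."""
    # Everything before the first '/' is the host, optionally followed by ':port'.
    authority = server_entry.split("/", 1)[0]
    # Drop the port, if any, since the certificate name must not include it.
    return authority.split(":", 1)[0]
```

For the sample entry alpha.nceas.ucsb.edu:8080/berkley/servlet/replication, this yields alpha.nceas.ucsb.edu, which is the value to enter at the "first and last name" or "Common Name" prompt.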
At this point, you have created a certificate for each replication server and
scp-ed them across to each other. Now you need to import the remote server's
certificate on the local machine. Perform the following steps for each
replication server.