Metacat UNIX Installation Instructions

***Disclaimer***

These installation instructions are meant for a systems administrator/DBA or an advanced computer user. They are NOT meant for the average computer user. Be aware that while following these instructions you may have to troubleshoot many advanced issues yourself.

Operating System Specific Instructions

These documents outline the Metacat installation process on specific platforms. They are not a substitute for the instructions below and are meant only as supplemental guidelines.

Pre-Installation

Minimum Requirements

Installing Metacat requires a server running an SQL92-compliant database (Oracle 8i or PostgreSQL recommended), at least 128 MB of RAM, and a Pentium III class processor or higher. The disk space required depends on the size of your RDBMS tablespace, which should be at least 10 MB; Metacat itself requires only about 1 MB of free space after installation. These instructions assume a Linux environment. They may work on other UNIX-type environments, but this has not been tested.

Additional Required Software

The server on which you wish to install Metacat must have the following software installed and running correctly before attempting to install Metacat.

  • Oracle 8i (or another SQL92 compliant RDBMS like Postgres)
  • Apache Jakarta-Ant
  • Apache Jakarta-Tomcat

    Note: For a more robust web serving environment, Apache web server should be installed along with Tomcat and the two should be integrated as described on the Apache web site.

Additional Software Setup

Java

You'll need a recent Java SDK; J2SE 1.4.2 or later is required. The latest Metacat release has been tested most extensively with J2SE 5.0, which is the recommended version. Make sure that the JAVA_HOME environment variable is properly set and that both java and javac are on your PATH.
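
One way to confirm these settings before continuing is a small shell check along the following lines. This is a minimal sketch, not part of the Metacat distribution; the function name check_java_env is made up for illustration.

```shell
# Hypothetical helper: report whether JAVA_HOME is set and whether
# java and javac can be found on PATH. Returns non-zero if anything is missing.
check_java_env() {
    rc=0
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        rc=1
    fi
    for tool in java javac; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool is not on PATH"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling check_java_env prints one line per problem found and returns a non-zero status if anything is missing, so it can be followed by java -version to confirm the installed release.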

Oracle 8i or Postgres

Oracle:
The Oracle RDBMS must be installed and running as a daemon on the system. In addition the JDBC listener must be enabled. You can enable it by logging in as your Oracle user and typing the following:

lsnrctl start
Your instance should have a tablespace of at least 5 MB (10 MB or higher recommended). You should also have a username specific to Metacat created and enabled. This user must have most normal permissions, including CREATE SESSION, CREATE TABLE, CREATE INDEX, CREATE TRIGGER, EXECUTE PROCEDURE, EXECUTE TYPE, etc. If Metacat inexplicably rejects an action, the most likely cause is that the user permissions are not set correctly.

Postgres:
Postgres can be easily installed on most Linux distributions, on Windows (using Cygwin), and on Mac OS X. On Fedora Core or Red Hat Linux, install the Postgres RPMs and then run /etc/init.d/postgresql start to start the database. On Ubuntu and other Debian-based Linux distributions, install Postgres with apt-get (sudo apt-get install postgresql-8.0) and then run /etc/init.d/postgresql-8.0 start. This initializes the data files. You then need to do a bit of configuration to create a database, set up a user account, and allow network access via JDBC. See the Postgres documentation for details, but here is a quick start:

  • Switch to the "postgres" user account and edit "data/pg_hba.conf", adding the following line to the file:
    host metacat metacat 127.0.0.1 255.255.255.255 password
    If your host uses IPv6 addresses, you may need this line instead: host metacat metacat ::1 ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff password
  • If you are using a PostgreSQL version earlier than 8.0, you must edit the "data/postgresql.conf" file and uncomment and edit the line starting with "tcpip_socket" so that it reads tcpip_socket = true
  • Run createdb metacat to create a new database
  • Run psql metacat to log in using the postgres account and create a new "metacat" user account
    • In postgres, run CREATE USER metacat WITH UNENCRYPTED PASSWORD 'apasswordyoulike';
    • This creates a new account called metacat on the database named metacat
    • Note: there are many ways to do this, so others such as using ENCRYPTED passwords will work fine.
  • Exit the postgres account back to root and restart the postgres database with /etc/init.d/postgresql restart
  • Test logging into the postgres db using the metacat account with the following command: psql -U metacat -W -h localhost metacat
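
The command-line steps above can be collected into a short script. The following is a minimal sketch under stated assumptions: the function names and the DRY_RUN switch are hypothetical, the pg_hba.conf edit and the restart are still done by hand as described above, and the functions are meant to be run from the "postgres" account.

```shell
# Hypothetical wrapper around the quick-start commands above.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Create the "metacat" database and the "metacat" user account.
setup_metacat_db() {
    db_password=$1   # placeholder argument; choose your own password
    run createdb metacat
    run psql -c "CREATE USER metacat WITH UNENCRYPTED PASSWORD '$db_password';" metacat
}

# After restarting Postgres, test a TCP login as the metacat user.
# -W forces a password prompt; -h localhost connects over TCP.
verify_metacat_login() {
    run psql -U metacat -W -h localhost metacat
}
```

With the default dry-run setting, setup_metacat_db apasswordyoulike only echoes the two commands, which makes it easy to review them before running anything against the database.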

Ant

Ant is a Java-based build tool similar to Make on UNIX systems. It takes its installation parameters from a file named "build.xml" in the root installation directory. The Metacat CVS module contains a default build.xml file that may require some modification upon installation. Ant should be installed on the system, and the "ant" executable shell script should be available in the user's PATH. The latest Metacat release was tested with Ant 1.6.5.

Tomcat

Install Tomcat into the directory of your choice. The directory in which you install Tomcat itself will be referred to as the "$CATALINA_HOME". We recommend that you install Tomcat version 5.5. More details about Tomcat installation are available here.

Edit build.properties File

Once all of the prerequisite software is installed as described above, the installation of Metacat can begin. First you must have a current version of the source distribution of Metacat. You can get it two ways. Authorized users can check it out of the NCEAS CVS system. You'll need both the "metacat" module and the "utilities" module to be checked out in sibling directories. The command is as follows:

mkdir knb-software
cd knb-software
cvs checkout -P metacat
cvs checkout -P utilities
Or you can download a gzipped tar file from this site.

Once you have either checked out or unzipped and untarred the source distribution, you can begin the installation process. Change into the metacat directory and edit the file called "build.properties". You will need to change a number of configuration properties to match the setup on your system. The property values that you will likely need to change are described in detail in the following table:


Property Description Default value and examples of other values
tomcat The tomcat property is the location in which tomcat is installed. Default:   /usr/local/devtools/jakarta-tomcat

Example:   C:/Tomcat-5.5
deploy.dir The deploy.dir property is the location in which your tomcat servlet contexts are deployed. This is typically "${tomcat}/webapps", where ${tomcat} is the same value that you entered for the 'tomcat' property above. Default:   /var/www/org.ecoinformatics.knb

Example:   C:/Tomcat-5.5/webapps
tomcatversion The tomcatversion property is the version of Tomcat in which you want Metacat to run. This will determine the location of some jar files coming with Tomcat.

Note: Tomcat 3 and 4 are no longer tested or supported by Metacat. Users are highly encouraged to upgrade to Tomcat 5.5.

Also note: a property value of 'tomcat5' can be set when using either Tomcat 5.0 or Tomcat 5.5.

Default:   tomcat5

Other possible values (deprecated):   tomcat3 tomcat4
metacat.context The metacat.context property is the name of the servlet context in which you want Metacat to be installed. This will determine the installation directory for the servlet and many of the URLs that are used to access the installed Metacat server. Default:   knb

Example:   mycontext
config.hostname The config.hostname property is the hostname of the server on which Metacat is running (note that you should not include the 'http://' in the config.hostname property). Default:   knb.ecoinformatics.org

Example:   somehost.university.edu
config.port The config.port property is the HTTP plain port number that is used to connect to Metacat. If Tomcat is running stand-alone, the value will typically be 8080. Default:   80

Example:   8080
config.port.https The config.port.https property is the HTTP secure port number that is used to connect to Metacat, generally when replicating documents to and from other Metacat servers. If Tomcat is running stand-alone, the value will typically be 8443. Default:   443

Example:   8443
ldapUrl URL to the LDAP server. The LDAP server is used in the default authentication module to authenticate and identify users of the system. To participate in the KNB network, you should leave this at the default. But it can be changed if you want to use a different directory of users. Default:   ldap://ldap.ecoinformatics.org/dc=ecoinformatics,dc=org
database Select the database to use for metadata storage.

The build file is preconfigured to install Metacat either using Oracle, PostgreSQL, or Microsoft SQL Server as a backend database. To change the database system, simply change the value of the 'database' property to be the name of the database target that you wish to use.

Valid values are oracle, postgresql, or sqlserver. Note that sqlserver support is minimal and probably will not work without substantial changes on your part, possibly including code changes. We have not recently tested on sqlserver.

Default:   postgresql

Other possible values:   oracle   sqlserver
jdbc-connect The JDBC connection string used to connect to the database. Default:   jdbc:postgresql://localhost/metacat

Example:   jdbc:oracle:thin:@somehost.university.edu:1521:metacat
jdbc-base The base directory for locating JDBC jar files. When using the postgresql database, the default setting of './lib' can be used, while oracle and sqlserver databases will require a different setting since these jar files are not included in the Metacat distribution. Default:   ./lib

Example:   /usr/oracle/jdbc/lib
user The database user name that you set up to use Metacat. Default:   metacat

Example:   metacatuser
password The database password that you set up to use Metacat. Default:   yourPasswordHere

Example:   metacat123
datafilepath The datafilepath is the directory to store data files. Default:   /var/metacat/data

Example:   C:/Tomcat-5.5/data/metacat/data
inlinedatafilepath The inlinedatafilepath is the directory to store inline data that has been extracted from EML documents. Default:   /var/metacat/inline-data

Example:   C:/Tomcat-5.5/data/metacat/inlinedata
default-style The default-style parameter defines the "style-set" that is to be used by default when the qformat parameter is missing or set to "html" during a query. It is set to "default", which is one of the styles that ships with the default metacat distribution. Other possible settings are shown in the examples to the right. Default:   default

Examples:   esa kepler knb knb2 knp lter ltss nceas nrs obfs pisco specnet
administrators The administrators parameter lists the accounts that are allowed to perform administrative actions such as rebuilding indices for documents. The list can contain more than one account separated by colons. Default:   uid=jones,o=NCEAS,dc=ecoinformatics,dc=org

Examples:   uid=localadmin,o=ucnrs.org
authority.context This is the context for the Life Sciences Identifier (LSID) authority. LSID support is an optional feature which can be configured to provide metacat access to LSID clients. For more information on LSIDs, see the TDWG site. Default: authority
config.lsidauthority This is the name of the LSID authority that this metacat should use. This authority needs to be defined as an SRV record in DNS.

Default: ecoinformatics.org

Examples: esa.org or sulphur.ecoinformatics.org

install.ecogrid Enables EarthGrid web services. EarthGrid web services are disabled by default. To enable EarthGrid web services (including query, put, authentication and identifier interface), set this value to true, and also set the value of the metacat.dir property as detailed below. Default:  false
metacat.dir If the install.ecogrid property (see above) is set to true, this property should be set to the absolute path of the top-level metacat directory. If install.ecogrid is set to false (the default setting), the value of metacat.dir is ignored. Default:  /home/tao/project/metacat
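
To make the table concrete, the sketch below prints a build.properties fragment for a stand-alone Tomcat with a local PostgreSQL database. The values are the defaults and examples from the table above; the key=value layout is assumed from the Java properties format that Ant reads, and the function name sample_build_properties is hypothetical. Compare the output with your own build.properties rather than copying it blindly.

```shell
# Hypothetical helper: print an example set of build.properties values.
# The quoted 'EOF' keeps ${tomcat} literal so that Ant, not the shell, expands it.
sample_build_properties() {
    cat <<'EOF'
tomcat=/usr/local/devtools/jakarta-tomcat
deploy.dir=${tomcat}/webapps
tomcatversion=tomcat5
metacat.context=knb
config.hostname=somehost.university.edu
config.port=8080
config.port.https=8443
database=postgresql
jdbc-connect=jdbc:postgresql://localhost/metacat
jdbc-base=./lib
user=metacat
password=yourPasswordHere
datafilepath=/var/metacat/data
inlinedatafilepath=/var/metacat/inline-data
EOF
}
```

Note that none of the path values ends with a slash, and that the two ports are the stand-alone Tomcat values (8080 and 8443) rather than the defaults of 80 and 443.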

Other properties in build.properties that you can (but generally need not) change:

Property Description Default value and examples of other values
server The server property is the hostname and port number of the server that Metacat uses for replicating documents to and from other Metacat servers, which should be with the secure (HTTPS) port. Since this property is usually composed of the config.hostname and config.port.https properties (described above), the default setting can be used in most cases. Default:  ${config.hostname}:${config.port.https}
httpserver httpserver is the plain HTTP address and port number that Metacat uses for purposes other than replication. Since this property is usually composed of the config.hostname and config.port properties (described above), the default setting can be used in most cases. Default:  ${config.hostname}:${config.port}
http.protocol http.protocol is the string used in the leading part of a URL to indicate use of the HTTP protocol. Default:  http
config.metacatserver The URL to the metacat server, composed in part from three other properties (http.protocol, httpserver, and metacat.context). Default:  ${http.protocol}://${httpserver}/${metacat.context}/metacat
inst.cgi.dir Installation directory for registry CGI scripts Default:   /var/www/cgi-knb
cgi-prefix The URL used for executing CGI scripts Default:   http://${httpserver}/cgi-bin
cvsroot CVS access to retrieve latest EML. Only used by developers in building the release. Default:   :ext:${env.USER}@cvs.ecoinformatics.org:/cvs

Example:   :ext:myaccount@cvs.ecoinformatics.org:/cvs
knb-site-url This is the URL to the web context root for the knb site. It is used for the qformat=knb skin only. Default:   http://knb.ecoinformatics.org
timedreplication Determines whether timed replication to other metacat servers is being used. Default:   false

Other possible values:   true
firsttimedreplication The time for starting first timed replication if timedreplication is true. (See comments in build.properties file for additional details.) Default:   10:00 PM  
timedreplicationinterval The interval to next timed replication if timedreplication is true. The value is in milliseconds and default value is 48 hours. Default:   172800000  
forcereplicationwaitingtime The waiting time before replication is forced to begin after uploading a package. The default value should usually suffice. Default:   30000  
log.dir The directory where replication log files are to be written by Metacat. Default:  ${tomcat}/logs
moderators Moderator accounts, in a colon-separated list. Specifies a list of special users who can review a general user's submission. Moderators can approve, revise and reject the submission after reviewing. This property is only used in the ESA skin. Default:  cn=knb-prod,o=NCEAS,dc=ecoinformatics,dc=org
allowedSubmitters Specifies the list of users who should be allowed to submit documents. If no value is specified (the default setting), all users will be allowed to submit documents. Default:  (no value)
deniedSubmitters Specifies the list of users who should not be allowed to submit documents. If no value is specified (the default setting), all users will be allowed to submit documents. Default:  (no value)
config.metadataLabelLsid Default:  ${config.lsidauthority}
build.dir The name of the subdirectory that is created when metacat is built by the 'ant' tool. Default:  build
lsid.build.dir The name of the subdirectory for building the LSID component of metacat. Default:  ${build.dir}/lsid
lib.dir The name of the subdirectory where library (.jar) files and a number of other important files are located. Default:  lib
lsid.lib.dir The name of the subdirectory where LSID library (.jar) files are located. Default:  ${lib.dir}/lsid_lib
lsid.classes.dir The relative path to the location of Java classes that support LSID. Default:  edu/ucsb/nceas/metacat/lsid
conf.dir The name of the directory where LSID configuration files are located. Default:  lib/lsid_conf
services.dir The name of the directory where LSID services configuration files are located. Default:  ${conf.dir}/services
webinf.dir The name of the directory where additional LSID web services files are located. Default:  ${conf.dir}/WEB-INF
compile.debug Indicates whether Java source should be compiled with debug information. Default:  true
compile.deprecation Indicates whether Java source should be compiled with deprecation information. Default:  false
compile.optimize Indicates whether Java source should be compiled with optimization. Default:  true
indexPaths The indexPaths property specifies a comma-separated (potentially long) list of indexed paths that can be utilized to improve the performance of metacat queries. Each component of the indexPaths property should specify an absolute or relative path (using an XPath-like syntax) to an XML element or attribute present in the XML documents being queried. Metacat stores XML element and attribute values for indexed paths in a special database table that optimizes search performance.

Metacat queries allow you to specify (using the <pathexpr> tag in search query) an exact path to which you want to restrict the search. When the <pathexpr> path that is specified in the query is a member of the indexPaths list, search results are returned faster because metacat can conduct its search using the optimized database table.

The default value for the indexPaths property contains numerous paths to EML elements and attributes that are commonly queried by metacat search tools. For example, keyword is a member of this list because it is common for search tools to query the keyword field in EML documents. For most purposes, the default value will optimize various types of searches on EML documents and need not be changed.

For more information about metacat queries, see Queries and Results.

Default:  organizationName,originator/individualName/surName,...
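
Several of the defaults in this table are built from other properties. The sketch below mirrors two of them in shell: how config.metacatserver is composed from http.protocol, httpserver, and metacat.context, and how the timedreplicationinterval value in milliseconds is derived from hours. The function names are made up for illustration; Ant performs the property substitution itself during the build.

```shell
# Compose the Metacat URL the way the default config.metacatserver does:
# ${http.protocol}://${httpserver}/${metacat.context}/metacat
metacat_url() {
    protocol=$1; host=$2; port=$3; context=$4
    httpserver="$host:$port"   # httpserver = ${config.hostname}:${config.port}
    echo "$protocol://$httpserver/$context/metacat"
}

# Convert hours to the milliseconds expected by timedreplicationinterval.
hours_to_ms() {
    echo $(( $1 * 60 * 60 * 1000 ))
}

metacat_url http knb.ecoinformatics.org 80 knb   # prints http://knb.ecoinformatics.org:80/knb/metacat
hours_to_ms 48                                   # prints 172800000
```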

Metacat has a number of additional settable properties in file lib/metacat.properties. Under most circumstances, you will not need to modify this file because the properties of interest to you can be controlled by editing build.properties as described above. To learn more about Metacat's additional properties, see Metacat Properties File.

Note: When setting properties, DO NOT add a trailing slash [/] to the end of any paths that are specified. Metacat will not function correctly if you do so.
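
A quick way to catch this mistake is to search the properties file for values that end in a slash. The following is a minimal sketch; the function name find_trailing_slashes is hypothetical, and the pattern assumes one key=value pair per line with comments starting with "#".

```shell
# Hypothetical helper: print (with line numbers) every non-comment property
# whose value ends in "/". Exit status is 0 when at least one such line exists.
find_trailing_slashes() {
    grep -n -E '^[^#][^=]*=.*/[[:space:]]*$' "$1"
}
```

For example, running find_trailing_slashes build.properties from the metacat directory lists the offending lines, and prints nothing when every path is clean.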

Compilation and Installation

Ant allows compilation and installation to be done in one step. Change into the metacat directory and type:

ant install
or, if you are upgrading an existing installation, type:
ant clean upgrade

You should see a series of messages reporting the progress of compilation and installation. When it is done, you should see the message BUILD SUCCESSFUL and be returned to a UNIX command prompt. If you do not see BUILD SUCCESSFUL, there was an error that you need to resolve. This can happen if you are logged in as a user who does not have write access to one or more of the directories listed in the build.xml file, or if any file paths are not configured correctly in the "config" target.
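
Because the build output can be long, it may help to keep it in a log file and test for the success message afterwards. This is a minimal sketch; the function names and the default log file name are hypothetical.

```shell
# Succeeds only if the log contains Ant's BUILD SUCCESSFUL message.
build_succeeded() {
    grep -q "BUILD SUCCESSFUL" "$1"
}

# Run an Ant target, show the output on screen, and save a copy in a log file.
# The pipeline's status comes from tee, so success is judged from the log itself.
run_ant_target() {
    target=$1
    log=${2:-ant-$target.log}
    ant "$target" 2>&1 | tee "$log"
    build_succeeded "$log"
}
```

For example, run_ant_target install || echo "build failed; see ant-install.log" runs the install target and points to the saved log when the success message is missing.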

Note: The 'data' directories that are indicated in the 'datafilepath' and 'inlinedatafilepath' build properties must be writable by the user account under which Tomcat runs, or you will not be able to upload data files to the system.
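
The directories can be created and handed over to the Tomcat account with a few commands. The sketch below assumes the default paths from build.properties and a Tomcat account named "tomcat"; both are assumptions to adjust for your system, and the function name prepare_data_dirs is hypothetical.

```shell
# Hypothetical helper: create each directory, give it to the given owner,
# and make sure the owner can read, write, and enter it.
prepare_data_dirs() {
    owner=$1; shift
    for dir in "$@"; do
        mkdir -p "$dir" && chown "$owner" "$dir" && chmod u+rwx "$dir" || return 1
    done
}

# Typical use as root, assuming Tomcat runs as a user named "tomcat":
#   prepare_data_dirs tomcat /var/metacat/data /var/metacat/inline-data
```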

To install metacat LSID support, adjust the LSID-related properties in the build.properties file and type:

ant deploy-lsid

SQL Scripts

You now need to set up the table structure in your database. You can do this either with the ant build system or by manually running the scripts using a SQL utility.

WARNING: Do NOT run this on an existing metacat installation as it will delete all of your data. If you have an existing metacat installation, see the instructions for "Upgrading" below.

To run the scripts using ant, type ant installdb. This does not work for postgres, so you'll need to run the xmltables-postgres.sql script manually (see next paragraph).

To run the scripts manually, change to the metacat/build/src directory. Then run your RDBMS's SQL utility. In Oracle it is SQLPlus. This tutorial assumes an Oracle database, so this example is for SQLPlus. Log in as the Oracle user that was set up for use with Metacat. At the SQLPlus prompt type the following:

@xmltables.sql;
For postgres, use a command like the following, run from the top-level metacat directory: psql -U metacat -W -h localhost -f build/src/xmltables-postgres.sql metacat

Either way, you should see a series of messages showing the creation of the Metacat tables. The first time you run this script you will get several errors at the beginning saying that a table, index, or trigger cannot be dropped because it does not exist. This is normal. Any other errors need to be resolved before continuing. The script file name for PostgreSQL is xmltables-postgres.sql and for Microsoft SQL Server is xmltables-sqlserver.sql.

If the script has run correctly, you should be able to type the following at the SQLPlus prompt:

describe xml_documents
and it should show:
    Name            Null?         Type
    --------------  ------------  ---------------- 
     DOCID          NOT NULL      VARCHAR2(250)
     ROOTNODEID                   NUMBER(20)
     DOCNAME                      VARCHAR2(100)
     DOCTYPE                      VARCHAR2(100)
     DOCTITLE                     VARCHAR2(1000)
     USER_OWNER                   VARCHAR2(100)
     USER_UPDATED                 VARCHAR2(100)
     SERVER_LOCATION              NUMBER(20)
     REV                          NUMBER(10)
     DATE_CREATED                 DATE
     DATE_UPDATED                 DATE
     PUBLIC_ACCESS                NUMBER(1)
     UPDATED                      NUMBER(1)
   

Registering schemas and DTDs

Once the tables have been created, you should also register the Ecological Metadata Language (EML) DTDs and schemas. However, note that you should NOT do this if you are upgrading an existing installation -- the upgrade scripts take care of it for you (see the next section). If you are installing new, you can register the schema documents by running:

ant register-schemas

This command registers the locations of the EML DTDs and schemas in the metacat server. Your database username and password have to be set correctly for this to work. If running this target from ant does not work for some reason, you can instead try running "build/src/loaddtdschema.sql" from your SQL utility (but be sure to use the version in the 'build' directory, which has been customized for your installation).

Upgrading SQL Scripts

If you have an existing metacat installation, you should not run the install script because it will replace all of the older tables with new, empty copies of the tables. Thus you would lose your data! Instead, you can run some upgrade scripts that will change the table structure as needed for the new version. If you are skipping versions, run each upgrade script for the intermediate versions as well. Currently the upgrade scripts are:

  • build/src/upgrade-db-to-1.2.sql
  • build/src/upgrade-db-to-1.3.sql
  • build/src/upgrade-db-to-1.4.sql
  • build/src/upgrade-db-to-1.5.sql
  • build/src/upgrade-db-to-1.6.sql
  • build/src/upgrade-db-to-1.7.sql
  • build/src/upgrade-db-to-1.7.1.sql

For example, if you had an existing metacat 1.4 installation and you were upgrading to metacat 1.7, you would need to run three scripts in sequence: upgrade-db-to-1.5.sql, upgrade-db-to-1.6.sql, and upgrade-db-to-1.7.sql. However, if you were starting from a Metacat 1.6 installation, you would only need to run the upgrade-db-to-1.7.sql script. Be sure to use the version of the scripts from the 'build/src' directory: they are customized for your installation in that directory.
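
The selection rule can be written down as a short loop over the version list. This is a minimal sketch with a hypothetical function name; it assumes the starting version is one of the versions in the list above and only prints the script names, leaving it to you to run them with your SQL utility.

```shell
# Hypothetical helper: list the upgrade scripts needed to go from one
# Metacat version to another, in the order they must be run.
upgrade_scripts() {
    from=$1
    to=$2
    emit=0
    for v in 1.2 1.3 1.4 1.5 1.6 1.7 1.7.1; do
        # Print every script after the installed version...
        if [ "$emit" -eq 1 ]; then
            echo "build/src/upgrade-db-to-$v.sql"
        fi
        if [ "$v" = "$from" ]; then emit=1; fi
        # ...and stop once the target version has been reached.
        if [ "$v" = "$to" ]; then break; fi
    done
}
```

Calling upgrade_scripts 1.4 1.7 prints the 1.5, 1.6, and 1.7 scripts in sequence, while upgrade_scripts 1.6 1.7 prints only the 1.7 script, matching the examples above.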

Restart Tomcat

Once you have successfully installed Metacat, there is one more step. Tomcat (and Apache, if you have Tomcat integrated with it) must be restarted. To do this, log in as the user that runs your Tomcat server (often "tomcat"), go to $CATALINA_HOME/bin and type:

   ./shutdown.sh 
   ./startup.sh 
   
In the Tomcat log file you should see a startup message like:
	Metacat: [WARN]: Metacat (1.7.0) initialized. [edu.ucsb.nceas.metacat.MetaCatServlet]
   
If you see that message, Tomcat is successfully loading the Metacat servlet. Next, try to run your new servlet. Go to a web browser and type:
http://yourserver.yourdomain.com/yourcontext/
You should substitute your context name for "yourcontext" in the URL above. If everything is working correctly, you should see a query page followed by an empty result set. Note that if you do not have Tomcat integrated with Apache, you will probably have to type
http://yourserver.yourdomain.com:8080/yourcontext/
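
Both checks can also be made from the command line. The sketch below is a suggestion rather than part of Metacat: the function names are hypothetical, the log location $CATALINA_HOME/logs/catalina.out is an assumption about where your Tomcat writes its startup messages, and the second function requires the curl tool.

```shell
# Print the Metacat initialization message(s) found in a Tomcat log file.
# Exit status is non-zero if no such message is present.
metacat_initialized() {
    grep -E 'Metacat \([^)]*\) initialized' "$1"
}

# Print the HTTP status code returned for a Metacat URL (requires curl).
metacat_http_status() {
    curl -s -o /dev/null -w '%{http_code}\n' "$1"
}

# Example use (adjust the log path and URL to your installation):
#   metacat_initialized "$CATALINA_HOME/logs/catalina.out"
#   metacat_http_status http://yourserver.yourdomain.com:8080/yourcontext/
```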

Troubleshooting: You may see an error like java.lang.InternalError: Can't connect to X11 window server using 'yourservername:0.0' as the value of the DISPLAY variable.

If so, add the line JAVA_OPTS="-Djava.awt.headless=true $JAVA_OPTS" at the top of the catalina.sh file in the Tomcat bin directory. The error occurs because GeoServer uses X11 to draw graphics.

Deploy wsdl file (Only for EarthGrid-enabled Metacat installation)

Once Tomcat is running successfully, there is one more step for EarthGrid-enabled Metacat installations. In the metacat directory, type:

ant deploy-ecogrid
This generates the WSDL files for the EarthGrid services.