Non-XML Data Files


Metacat can proxy connections to a data file server. This capability is provided through an abstract class called DataFileUploadInterface. Currently, one data file server is implemented for this interface; it stores non-XML data files on the local UNIX file system. In principle, any file storage system could be proxied through this interface.

[Figure: architecture diagram of the Metacat Data File Upload process]

Reasoning

Metacat was designed as a metadata storage system for ecological data, which raised the concern that data and metadata files were being stored separately and that users would find it impractical to use two different systems to retrieve their data and metadata. Proxying data files through Metacat also allows users to specify access control constraints on their data files in the same way that they are specified on the XML metadata files.

Data Download (GET)

The Metacat server can retrieve data files that are stored on the Metacat file system or on any other file system reachable over the Internet.
Metacat stores data files in a directory under the servlet context. It records information about these data files in xml_documents, just as it does for any XML file.
An example HTTP request for downloading a data file stored on Metacat is shown below:

http://server.domain.com/metacat?action=read&docid=nceas.55

An example HTTP request for downloading a data file stored elsewhere on the Internet is shown below:

http://server.domain.com/metacat?action=read&docid=http://otherserver.domain.com/filename

Note that docid=http://otherserver.domain.com/filename uses the HTTP protocol; currently, HTTP is the only protocol supported for file download.
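
The two request forms above can be sketched from the client side as follows. This is a minimal illustration, not Metacat code: the class name DataDownloadSketch and its methods are hypothetical, and the sketch assumes the docid is passed in the query string exactly as written in the examples, with no extra encoding and no session handling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical client-side helper; not part of Metacat.
class DataDownloadSketch {

    /** Builds a read request in the form shown in the examples above. */
    static String readUrl(String metacatUrl, String docid) {
        // Assumption: the docid is appended as-is, as in the documented examples.
        return metacatUrl + "?action=read&docid=" + docid;
    }

    /** Fetches the data file over HTTP and saves it to the target path. */
    static void download(String metacatUrl, String docid, Path target) throws IOException {
        try (InputStream in = URI.create(readUrl(metacatUrl, docid)).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

For a file stored on Metacat, the docid would be an identifier such as nceas.55; for a file stored elsewhere, it would be the file's full HTTP URL.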

Data Upload (PUT)

Due to a Java limitation on the HTTP PUT command, the data upload portion of Metacat deviates from the standard HTTP interface. A standard bidirectional TCP/IP socket is used for transferring the data. The procedure for uploading a file is as follows.

  1. The client must log in to Metacat and obtain a session_id.
  2. The client sends a request to the servlet with an action of 'getdataport'.
  3. The server responds with an XML message that includes a port number. The message looks like:
    <?xml version="1.0"?><port>xxxx</port>
    where xxxx is an open port between 0 and 65000.
  4. The client then can create a socket connection to the returned port. Note that the client must make this connection within 30 seconds or the port will close.
  5. The data can now be sent, but first some extra information must be prepended to the data stream. The extra information looks like:
     [filename]0[sessionID]0[filelength]0[DATA]
    The filename, sessionID and filelength must each be converted into a byte string, terminated with a 0 (zero) byte, and inserted into the stream in the order shown. The filelength is in bytes. The DATA stream does not need to be terminated with a 0 byte.
  6. After the upload, the server returns either an error message or a success message that includes the docid of the new data file. Both messages are encoded in XML (like the port message). The success message looks like:
    <?xml version="1.0"?><docid>yyyy</docid>
    where yyyy is the new docid.
  7. The file DataStreamTest.java is a test class that shows how a client must operate to successfully upload a data file to Metacat.
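
The socket portion of the procedure above (steps 3 to 6) might look roughly like the following sketch. It is not taken from DataStreamTest.java, which remains the reference for client behavior. The class name DataUploadSketch and its methods are hypothetical, and the sketch assumes that header fields are ASCII text, that the file length is written as a decimal string, and that the server closes the connection after sending its reply.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client-side helper; not part of Metacat.
class DataUploadSketch {

    private static final Pattern PORT = Pattern.compile("<port>\\s*(\\d+)\\s*</port>");

    /** Extracts the port number from the server's reply to 'getdataport'. */
    static int parsePort(String xml) {
        Matcher m = PORT.matcher(xml);
        if (!m.find()) {
            throw new IllegalArgumentException("no <port> element in: " + xml);
        }
        return Integer.parseInt(m.group(1));
    }

    /** Builds [filename]0[sessionID]0[filelength]0[DATA] as raw bytes. */
    static byte[] frame(String filename, String sessionId, byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String[] header = {filename, sessionId, Integer.toString(data.length)};
        for (String field : header) {
            // Assumption: header fields are ASCII; length is a decimal string.
            byte[] bytes = field.getBytes(StandardCharsets.US_ASCII);
            out.write(bytes, 0, bytes.length);
            out.write(0); // a zero byte terminates each header field
        }
        out.write(data, 0, data.length); // DATA is not zero-terminated
        return out.toByteArray();
    }

    /** Sends the framed file to the data port and returns the server's XML reply. */
    static String upload(String host, int port, String filename, String sessionId, byte[] data)
            throws IOException {
        // The connection must be opened within 30 seconds of receiving the port.
        try (Socket socket = new Socket(host, port)) {
            OutputStream os = socket.getOutputStream();
            os.write(frame(filename, sessionId, data));
            os.flush();

            // Assumption: the server closes the connection after its reply,
            // so the reply can be read until end of stream.
            InputStream is = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = is.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

A client would pass the body of the 'getdataport' response to parsePort, then call upload with that port and the session_id obtained at login. The returned string is either the success message containing the new docid or an error message.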

