.. raw:: latex

  \newpage

Metacat Indexing
===========================
Lorem ipsum

SOLR background information
---------------------------
Features:

* something
* something
* more
* even more

Something to explain the advantage of SOLR over the old Metacat index approach

Indexed documents and fields
-----------------------------
Metacat reuses the default DataONE index, which includes many common metadata formats out-of-the-box:

1. EML
2. FGDC
3. Dryad

Default indexed fields
-----------------------
Describe the existing fields like in the DataONE docs, with link to them

Index configuration overview
----------------------------
Describe the configuration files and extension points for the implementation

Adding additional document types and fields
--------------------------------------------
Step-by-step guide for adding new documents and indexed fields.

Querying the index
--------------------
Provide example SOLR queries and expected results.
Show a variety of return types and query facets.

Access Policy enforcement
-------------------------
Explain how access control is processed and honored when utilizing the index.

Regenerating the index from scratch
-----------------------------------
When the SOLR index has been drastically modified, a complete regeneration of the
index may be necessary. In order to accomplish this:

Step-by-step instructions

NOTE: this may take a long time depending on the size of your Metacat store.

Class design overview
----------------------

.. figure:: images/indexing-class-diagram.png

   Figure 1. Class design overview.

..
  @startuml images/indexing-class-diagram.png

    package cn-index-processor.parser {

      interface IDocumentSubprocessor {
        + boolean canProcess(Document doc)
        + initExpression(XPath xpath)
        + Map processDocument(String identifier, Map docs, Document doc)
      }
      class AbstractDocumentSubprocessor {
        - List fields
        + setMatchDocument(String matchDocument)
        + setFieldList(List fieldList)
      }
      class ResourceMapSubprocessor {
      }
      class ScienceMetadataDocumentSubprocessor {
      }

      interface ISolrField {
        + initExpression(XPath xpathObject)
        + List getFields(Document doc, String identifier)
      }
      class SolrField {
        - String name
        - String xpath
        - boolean multivalue
      }
      class CommonRootSolrField {
      }
      class FullTextSolrField {
      }
      class MergeSolrField {
      }
      class ResolveSolrField {
      }
      class SolrFieldResourceMap {
      }

      class SolrDoc {
        - List fieldList
      }
      class SolrElementField {
        - String name
        - String value
      }

    }

    IDocumentSubprocessor <|-- AbstractDocumentSubprocessor
    AbstractDocumentSubprocessor <|-- ResourceMapSubprocessor
    AbstractDocumentSubprocessor <|-- ScienceMetadataDocumentSubprocessor

    ISolrField <|-- SolrField
    SolrField <|-- CommonRootSolrField
    SolrField <|-- FullTextSolrField
    SolrField <|-- MergeSolrField
    SolrField <|-- ResolveSolrField
    SolrField <|-- SolrFieldResourceMap

    AbstractDocumentSubprocessor o--"*" ISolrField
    IDocumentSubprocessor --> SolrDoc
    SolrDoc o--"*" SolrElementField

    package solr {

      abstract class SolrServer {
        + add(SolrInputDocument doc)
        + deleteByQuery(String id)
        + query(SolrQuery query)
      }
      class EmbeddedSolrServer {
      }
      class HttpSolrServer {
      }

    }

    SolrServer <|-- EmbeddedSolrServer
    SolrServer <|-- HttpSolrServer

    package edu.ucsb.nceas.metacat.indexer {

      class MetacatSolrIndex {
        - List subprocessors
        - SolrFieldParser solrFieldParser
        - EmbeddedSolrServer solrServer
        + insert(String pid, InputStream data)
        + update(String pid, InputStream data)
        + remove(String pid)
        + OutputStream query(String solrQuery)
      }
      class SolrFieldParser {
        - List solrFields
        + SolrFieldParser(InputStream config)
        + List getSolrFields()
      }

    }

    MetacatSolrIndex *--"1" EmbeddedSolrServer
    MetacatSolrIndex --> SolrFieldParser
    MetacatSolrIndex o--"*" IDocumentSubprocessor
    SolrFieldParser --> SolrField

  @enduml
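
The relationship between subprocessors, fields, and index entries in the class
diagram can be illustrated with a small Java sketch. It is a simplified,
self-contained stand-in and not the actual cn-index-processor code: the
``IndexingSketch`` wrapper, the ``demo`` method, the sample record, and the
field names ``id``, ``title``, ``author`` and ``keywords`` are hypothetical,
and the rule that a single-valued field keeps only its first match is an
assumption. Only ``ISolrField``, ``SolrField`` and ``SolrElementField`` mirror
names from the diagram.

.. code-block:: java

   import java.io.ByteArrayInputStream;
   import java.nio.charset.StandardCharsets;
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   import javax.xml.parsers.DocumentBuilderFactory;
   import javax.xml.xpath.XPath;
   import javax.xml.xpath.XPathConstants;
   import javax.xml.xpath.XPathExpression;
   import javax.xml.xpath.XPathFactory;
   import org.w3c.dom.Document;
   import org.w3c.dom.NodeList;

   // Simplified stand-ins for the diagram's types; NOT the real cn-index-processor classes.
   final class IndexingSketch {

       // One name/value pair destined for the index (SolrElementField in the diagram).
       static final class SolrElementField {
           final String name;
           final String value;
           SolrElementField(String name, String value) { this.name = name; this.value = value; }
       }

       // Extracts zero or more index fields from a parsed document (ISolrField).
       interface ISolrField {
           void initExpression(XPath xpathObject) throws Exception;
           List<SolrElementField> getFields(Document doc, String identifier) throws Exception;
       }

       // XPath-backed field carrying the name, xpath and multivalue attributes of SolrField.
       static class SolrField implements ISolrField {
           private final String name;
           private final String xpath;
           private final boolean multivalue;
           private XPathExpression expression;

           SolrField(String name, String xpath, boolean multivalue) {
               this.name = name;
               this.xpath = xpath;
               this.multivalue = multivalue;
           }

           public void initExpression(XPath xpathObject) throws Exception {
               expression = xpathObject.compile(xpath);
           }

           public List<SolrElementField> getFields(Document doc, String identifier) throws Exception {
               NodeList nodes = (NodeList) expression.evaluate(doc, XPathConstants.NODESET);
               List<SolrElementField> out = new ArrayList<>();
               // Assumption: a single-valued field keeps only the first match.
               int limit = multivalue ? nodes.getLength() : Math.min(1, nodes.getLength());
               for (int i = 0; i < limit; i++) {
                   out.add(new SolrElementField(name, nodes.item(i).getTextContent().trim()));
               }
               return out;
           }
       }

       // Plays the role of a document subprocessor: it is handed a list of fields
       // and runs each of them against the document to build the index entry.
       static List<SolrElementField> processDocument(
               String identifier, Document doc, List<ISolrField> fields) throws Exception {
           XPath xpath = XPathFactory.newInstance().newXPath();
           List<SolrElementField> result = new ArrayList<>();
           result.add(new SolrElementField("id", identifier)); // hypothetical field name
           for (ISolrField field : fields) {
               field.initExpression(xpath);
               result.addAll(field.getFields(doc, identifier));
           }
           return result;
       }

       // Indexes a tiny made-up metadata record with three hypothetical fields.
       static List<SolrElementField> demo() {
           String xml = "<dataset><title>Lake data</title><creator>A. Smith</creator>"
                   + "<creator>B. Jones</creator><keyword>lake</keyword>"
                   + "<keyword>ice</keyword></dataset>";
           List<ISolrField> fields = Arrays.asList(
                   new SolrField("title", "/dataset/title", false),
                   new SolrField("author", "/dataset/creator", false),
                   new SolrField("keywords", "/dataset/keyword", true));
           try {
               Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                       .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
               return processDocument("doc.1", doc, fields);
           } catch (Exception e) {
               throw new IllegalStateException("sketch failed", e);
           }
       }

       public static void main(String[] args) {
           demo().forEach(f -> System.out.println(f.name + "=" + f.value));
       }
   }

Running ``main`` prints one ``name=value`` line per extracted field. The
multi-valued ``keywords`` field contributes one entry per matching element,
while the single-valued ``author`` field contributes only the first creator.
In the real design the list of fields would come from configuration rather
than being hard-coded as it is here.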