.. raw:: latex

  \newpage

Metacat Indexing
===========================
Lorem ipsum

Solr background information
---------------------------
Features:

* something
* something
* more
* even more

Something to explain the advantage of Solr over the old Metacat index approach

Indexed documents and fields
-----------------------------
Metacat reuses the default DataONE index, which includes many common metadata formats
out of the box:

1. EML
2. FGDC
3. Dryad

Default indexed fields
-----------------------
Describe the existing fields like in the DataONE docs, with link to them

Index configuration overview
----------------------------
Describe the configuration files and extension points for the implementation

Adding additional document types and fields
--------------------------------------------
Step-by-step guide for adding new documents and indexed fields.

Querying the index
--------------------
Provide example Solr queries and expected results. Show a variety of return types
and query facets.

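
As a rough illustration of what such queries could look like, the sketch below
assembles a Solr select URL from standard Solr request parameters (``q``, ``fq``,
``fl``, ``rows``, ``wt``, ``facet``, ``facet.field``). The base URL, the helper
name ``build_query_url`` and the field names ``id``, ``title`` and ``formatId``
are assumptions made for the example, not values taken from a Metacat
deployment. The code only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; substitute the query URL of your own deployment.
BASE_URL = "https://metacat.example.org/solr/select"


def build_query_url(q, fields=None, filters=(), facet_fields=(), rows=10, wt="json"):
    """Return a select URL built from standard Solr request parameters."""
    params = [("q", q), ("rows", str(rows)), ("wt", wt)]
    if fields:
        # fl limits the fields returned for each matching document
        params.append(("fl", ",".join(fields)))
    for fq in filters:
        # each fq narrows the result set without affecting scoring
        params.append(("fq", fq))
    if facet_fields:
        params.append(("facet", "true"))
        for name in facet_fields:
            params.append(("facet.field", name))
    return BASE_URL + "?" + urlencode(params)


# Field names below are assumed for illustration.
url = build_query_url(
    'title:"soil moisture"',
    fields=["id", "title"],
    filters=['formatId:"eml://ecoinformatics.org/eml-2.1.1"'],
    facet_fields=["formatId"],
    rows=5,
)
decoded = parse_qs(urlsplit(url).query)
print(url)
```

Changing ``wt`` selects a different response writer (for example ``xml`` or
``csv``), which is one way to show a variety of return types.
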
Access policy enforcement
-------------------------
Explain how access control is processed and honored when utilizing the index.

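
One possible approach, shown here only as a sketch, is to restrict every query
with a filter clause that matches the caller's subjects against permission
fields stored with each indexed document. Whether Metacat enforces access
exactly this way is not stated in this section. The field names
``readPermission`` and ``rightsHolder``, the ``public`` subject and the helper
names are assumptions made for the example.

```python
def quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes for Solr."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def access_filter(subjects, fields=("readPermission", "rightsHolder")):
    """Build a filter clause matching documents that any given subject may read.

    The field names are assumed; an empty subject list falls back to the
    assumed anonymous subject "public".
    """
    subjects = list(subjects) or ["public"]
    clauses = [f"{name}:{quote(s)}" for name in fields for s in subjects]
    return "(" + " OR ".join(clauses) + ")"


anonymous_fq = access_filter([])
user_fq = access_filter(["CN=Jane Doe,O=Example", "public"])
print(anonymous_fq)
print(user_fq)
```

Passing the resulting string as an ``fq`` parameter keeps the access
restriction separate from the user's own query text.
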
Regenerating the index from scratch
-----------------------------------
When the Solr index has been drastically modified, a complete regeneration of the
index may be necessary. In order to accomplish this:

Step-by-step instructions

NOTE: this may take a long time depending on the size of your Metacat store.

Class design overview
----------------------

.. figure:: images/indexing-class-diagram.png

   Figure 1. Class design overview.

..
  @startuml images/indexing-class-diagram.png

	package cn-index-processor.parser {

		interface IDocumentSubprocessor {
			+ boolean canProcess(Document doc)
			+ initExpression(XPath xpath)
			+ Map<String, SolrDoc> processDocument(String identifier, Map<String, SolrDoc> docs, Document doc)
		}
		class AbstractDocumentSubprocessor {
			- List<SolrField> fields
		}
		class ResourceMapSubprocessor {
		}
		class ScienceMetadataDocumentSubprocessor {
		}

		interface ISolrField {
			+ initExpression(XPath xpathObject)
			+ List<SolrElementField> getFields(Document doc, String identifier)
		}
		class SolrField {
			- String name
			- String xpath
			- boolean multivalue
		}
		class CommonRootSolrField {
		}
		class FullTextSolrField {
		}
		class MergeSolrField {
		}
		class ResolveSolrField {
		}
		class SolrFieldResourceMap {
		}

	}

	IDocumentSubprocessor <|-- AbstractDocumentSubprocessor
	AbstractDocumentSubprocessor <|-- ResourceMapSubprocessor
	AbstractDocumentSubprocessor <|-- ScienceMetadataDocumentSubprocessor

	ISolrField <|-- SolrField
	SolrField <|-- CommonRootSolrField
	SolrField <|-- FullTextSolrField
	SolrField <|-- MergeSolrField
	SolrField <|-- ResolveSolrField
	SolrField <|-- SolrFieldResourceMap

	AbstractDocumentSubprocessor o--"*" ISolrField

	package edu.ucsb.nceas.metacat.indexer {

		class MetacatIndex {
			- List<IDocumentSubprocessor> subprocessors
			- List<SolrField> sysmetaFields
			- SolrFieldParser solrFieldParser
			- EmbeddedSolrServer solrServer
			+ insert(String pid, InputStream data)
			+ update(String pid, InputStream data)
			+ remove(String pid)
			+ OutputStream query(String solrQuery)
		}

		class SolrFieldParser {
			- List<SolrField> solrFields
			+ SolrFieldParser(InputStream config)
			+ List<SolrField> getSolrFields()
		}

	}

	package solr {

		abstract class SolrServer {
			+ add(SolrInputDocument doc)
			+ deleteByQuery(String id)
			+ query(SolrQuery query)
		}
		class EmbeddedSolrServer {
		}
		class HttpSolrServer {
		}

	}

	SolrServer <|-- EmbeddedSolrServer
	SolrServer <|-- HttpSolrServer

  @enduml
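
To make the relationship between a subprocessor and its fields more concrete,
the following sketch mimics the roles of ``SolrField`` (``name``, ``xpath``,
``multivalue``) and a document subprocessor (``canProcess``,
``processDocument``) from the diagram. It is written in Python for brevity
rather than the project's Java, it simplifies the signatures shown in the
diagram, and the sample document, the ``root_tag`` check and the field names
are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class SolrField:
    """Simplified stand-in for the SolrField class in the diagram."""
    name: str
    xpath: str  # restricted to the XPath subset ElementTree supports
    multivalue: bool = False

    def get_fields(self, doc):
        """Return (field name, value) pairs extracted from the document."""
        values = [el.text.strip() for el in doc.findall(self.xpath)
                  if el.text and el.text.strip()]
        if not self.multivalue:
            values = values[:1]  # single-valued fields keep the first match
        return [(self.name, value) for value in values]


@dataclass
class DocumentSubprocessor:
    """Simplified stand-in for a subprocessor that owns a list of fields."""
    root_tag: str  # assumed criterion for deciding whether a document applies
    fields: list = field(default_factory=list)

    def can_process(self, doc):
        return doc.tag == self.root_tag

    def process_document(self, identifier, doc):
        solr_doc = {"id": [identifier]}
        for solr_field in self.fields:
            for name, value in solr_field.get_fields(doc):
                solr_doc.setdefault(name, []).append(value)
        return solr_doc


# Hypothetical metadata document, not a real EML or FGDC record.
SAMPLE = """<dataset>
  <title>Soil moisture 2012</title>
  <keyword>soil</keyword>
  <keyword>moisture</keyword>
</dataset>"""

doc = ET.fromstring(SAMPLE)
processor = DocumentSubprocessor("dataset", [
    SolrField("title", "./title"),
    SolrField("keywords", "./keyword", multivalue=True),
])
result = processor.process_document("doc.1", doc) if processor.can_process(doc) else {}
print(result)
```

Keeping the extraction rules in field objects means that a new indexed field
can be described as data (a name plus an XPath expression) instead of new
processing code.
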