.. raw:: latex

  \newpage


Metacat Indexing
===========================
Metacat v2.1 introduces support for building a SOLR index of Metacat content.
While we continue to support the "pathquery" search mechanism, it will be phased out
in favor of the more efficient SOLR query interface.

Metacat deployments that opt to use the Metacat SOLR index can take advantage
of:

* fast search performance
* built-in paging features
* customizable return formats (for advanced administrators)


Indexed documents and fields
-----------------------------
Metacat integrates the existing DataONE index library, which supports many common metadata formats
out of the box:

1. EML
2. FGDC
3. Dryad*


Default indexed fields
-----------------------
For a complete listing of the indexed fields, please see the DataONE documentation:

http://mule1.dataone.org/ArchitectureDocs-current/design/SearchMetadata.html

Metacat also reports the currently indexed fields. To retrieve them, call the method described at:

http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html#MNQuery.getQueryEngineDescription

with "solr" as the engine.

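As a rough illustration, the sketch below extracts field names from a query engine description document. The abbreviated sample XML, the element names (``queryField``, ``name``, ``searchable``) and the helper ``list_indexed_fields`` are assumptions made for this example; the DataONE API documentation linked above is the authority on the actual response schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated response body; the real document may differ.
SAMPLE_DESCRIPTION = """\
<queryEngineDescription>
  <name>solr</name>
  <queryField>
    <name>id</name>
    <searchable>true</searchable>
  </queryField>
  <queryField>
    <name>title</name>
    <searchable>true</searchable>
  </queryField>
</queryEngineDescription>
"""


def list_indexed_fields(xml_text):
    """Return the field names found in a query engine description."""
    root = ET.fromstring(xml_text)
    # Collect the text of each <queryField>/<name> element, in document order.
    return [field.findtext("name") for field in root.iter("queryField")]


if __name__ == "__main__":
    for field_name in list_indexed_fields(SAMPLE_DESCRIPTION):
        print(field_name)
```

In practice the XML text would come from the response of the deployment's own query engine description endpoint rather than from a constant.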

Index configuration
----------------------------
Metacat-index is deployed as a separate web application (metacat-index.war) and should be deployed
as a sibling of the Metacat webapp (metacat.war). Deploying metacat-index.war is required only when SOLR support
is desired (e.g., for MetacatUI); it can safely be omitted from any Metacat deployment that will not use it.

During the initial installation or upgrade, an empty index is initialized in the configured "solr-home" location.
Metacat-index indexes all existing Metacat content the next time the webapp initializes.
Note: the configured solr-home directory must not exist before Metacat is configured with indexing for the first time;
otherwise the blank index will not be created for Metacat-index to use.

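The note above can be turned into a small pre-flight check before configuring indexing for the first time. This is only a sketch: the function name ``solr_home_ready_for_first_setup`` and the example path are hypothetical and are not part of Metacat.

```python
from pathlib import Path


def solr_home_ready_for_first_setup(solr_home):
    """Return True if the path does not exist yet, so an empty index can be created there."""
    return not Path(solr_home).exists()


if __name__ == "__main__":
    candidate = "/var/metacat/solr-home"  # hypothetical location
    if solr_home_ready_for_first_setup(candidate):
        print(f"{candidate} does not exist yet; indexing can be configured.")
    else:
        print(f"{candidate} already exists; move or remove it before first-time setup.")
```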

Additional advanced configuration options are available in the metacat.properties file (shared between Metacat and Metacat-index).


Adding additional document types and fields
--------------------------------------------
TBD: Step-by-step guide for adding new documents and indexed fields.


Querying the index
--------------------
The SOLR index can be queried using standard SOLR syntax and return options.
The DataONE query interface exposes the SOLR query engine.

http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html#MNQuery.query

Please see the SOLR documentation for examples and exhaustive syntax information.

http://lucene.apache.org/solr/

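For illustration, a query URL with paging can be assembled from the standard SOLR parameters ``q``, ``start``, ``rows`` and ``fl``. The base URL below is a hypothetical placeholder, and ``build_solr_query_url`` is a helper invented for this sketch; check your own deployment for the actual query endpoint.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, query, start=0, rows=10, fields=None):
    """Build a query URL using standard SOLR parameters (q, start, rows, fl)."""
    params = {"q": query, "start": start, "rows": rows}
    if fields:
        # fl limits the fields returned for each matching record.
        params["fl"] = ",".join(fields)
    return f"{base_url.rstrip('/')}/?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint; substitute the query URL of your own deployment.
    base = "https://example.org/metacat/d1/mn/v1/query/solr"
    # Second page of 10 results: skip the first 10 matches.
    print(build_solr_query_url(base, "title:soil", start=10, rows=10, fields=["id", "title"]))
```

Paging works by advancing ``start`` in steps of ``rows`` while keeping the query itself unchanged.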

Access Policy enforcement
-------------------------
Access control is enforced by the index so that only records readable by the
user performing the query are returned. Every SOLR query submitted is
augmented with access control criteria that depend on whether and how the user is currently
authenticated. Certificate-based (DataONE API) and JSESSIONID-based (Metacat API)
authentication are supported simultaneously.

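Conceptually, the augmentation amounts to adding a filter that restricts results to records the caller may read. The sketch below is not Metacat's implementation; the field names ``readPermission`` and ``rightsHolder``, the ``public`` subject and the function ``build_access_filter`` are assumptions chosen to illustrate the idea.

```python
def build_access_filter(subjects):
    """Build a SOLR filter clause matching records readable by any given subject."""
    # Every caller, authenticated or not, may read records open to "public".
    effective = ["public"] + [s for s in subjects if s != "public"]
    clauses = []
    for subject in effective:
        # Escape backslashes and double quotes so the subject stays one quoted term.
        escaped = subject.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'readPermission:"{escaped}"')
        clauses.append(f'rightsHolder:"{escaped}"')
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    # Anonymous caller: only public records match.
    print(build_access_filter([]))
    # Authenticated caller with one (hypothetical) subject.
    print(build_access_filter(["CN=Jane Doe,O=Example"]))
```

A clause like this would typically be passed as a separate filter (the SOLR ``fq`` parameter) so that the user's own query text is left untouched.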

Regenerating the index from scratch
-----------------------------------
When the SOLR index has been drastically modified, a complete regeneration of the
index may be necessary.

Step-by-step instructions:

1. Entirely remove the solr-home directory.
2. Step through the Metacat admin interface main properties screen, specifying the solr-home directory you wish to use.
3. Restart the webapp container (Tomcat).

Content can also be submitted for index regeneration by using the Metacat API:

1. Log in as the Metacat administrator.
2. Navigate to: <host>/<metacat_context>/metacat?action=reindex[&pid={pid}]

If the pid parameter is omitted, all objects in Metacat will be submitted for reindexing.

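The reindex URL pattern above can be assembled programmatically. This sketch only builds the URL; it does not log in, so the request itself would still need to be sent from an authenticated administrator session. The helper name ``build_reindex_url`` and the example host, context and pid are hypothetical.

```python
from urllib.parse import urlencode


def build_reindex_url(host, metacat_context, pid=None):
    """Build the reindex URL; omit pid to request reindexing of all objects."""
    params = {"action": "reindex"}
    if pid is not None:
        # urlencode percent-encodes characters such as ':' and '/' in the pid.
        params["pid"] = pid
    return f"{host.rstrip('/')}/{metacat_context.strip('/')}/metacat?{urlencode(params)}"


if __name__ == "__main__":
    # Reindex everything.
    print(build_reindex_url("https://example.org", "metacat"))
    # Reindex a single object.
    print(build_reindex_url("https://example.org", "metacat", pid="doi:10.5063/F1"))
```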


Class design overview
----------------------

.. figure:: images/indexing-class-diagram.png

   Figure 1. Class design overview.

..
  @startuml images/indexing-class-diagram.png

  package "Current cn-index-processor (library)" {

    interface IDocumentSubprocessor {
      + boolean canProcess(Document doc)
      + initExpression(XPath xpath)
      + Map<String, SolrDoc> processDocument(String identifier, Map<String, SolrDoc> docs, Document doc)
    }
    class AbstractDocumentSubprocessor {
      - List<SolrField> fields
      + setMatchDocument(String matchDocument)
      + setFieldList(List<SolrField> fieldList)
    }
    class ResourceMapSubprocessor {
    }
    class ScienceMetadataDocumentSubprocessor {
    }

    interface ISolrField {
      + initExpression(XPath xpathObject)
      + List<SolrElementField> getFields(Document doc, String identifier)
    }
    class SolrField {
      - String name
      - String xpath
      - boolean multivalue
    }
    class CommonRootSolrField {
    }
    class RootElement {
    }
    class LeafElement {
    }
    class FullTextSolrField {
    }
    class MergeSolrField {
    }
    class ResolveSolrField {
    }
    class SolrFieldResourceMap {
    }

    class SolrDoc {
      - List<SolrElementField> fieldList
    }

    class SolrElementField {
      - String name
      - String value
    }

  }

  IDocumentSubprocessor <|-- AbstractDocumentSubprocessor
  AbstractDocumentSubprocessor <|-- ResourceMapSubprocessor
  AbstractDocumentSubprocessor <|-- ScienceMetadataDocumentSubprocessor

  ISolrField <|-- SolrField
  SolrField <|-- CommonRootSolrField
  CommonRootSolrField o--"1" RootElement
  RootElement o--"*" LeafElement
  SolrField <|-- FullTextSolrField
  SolrField <|-- MergeSolrField
  SolrField <|-- ResolveSolrField
  SolrField <|-- SolrFieldResourceMap

  AbstractDocumentSubprocessor o--"*" ISolrField

  IDocumentSubprocessor --> SolrDoc

  SolrDoc o--"*" SolrElementField

  package "SOLR (library)" {

    abstract class SolrServer {
      + add(SolrInputDocument doc)
      + deleteByQuery(String id)
      + query(SolrQuery query)
    }
    class EmbeddedSolrServer {
    }
    class HttpSolrServer {
    }

  }

  SolrServer <|-- EmbeddedSolrServer
  SolrServer <|-- HttpSolrServer

  package "Metacat-index (webapp)" {

    class ApplicationController {
      - List<SolrIndex> solrIndex
      + regenerateIndex()
    }

    class SolrIndex {
      - List<IDocumentSubprocessor> subprocessors
      - SolrServer solrServer
      + insert(String pid, InputStream data)
      + update(String pid, InputStream data)
      + remove(String pid)
    }

    class SystemMetadataEventListener {
      - SolrIndex solrIndex
      + itemAdded(ItemEvent<SystemMetadata>)
      + itemRemoved(ItemEvent<SystemMetadata>)
    }

  }

  package "Metacat (webapp)" {

    class MetacatSolrIndex {
      - SolrServer solrServer
      + InputStream query(SolrQuery)
    }

    class HazelcastService {
      - IMap hzIndexQueue
      - IMap hzSystemMetadata
      - IMap hzObjectPath
    }

  }

  MetacatSolrIndex o--"1" SolrServer
  HazelcastService .. SystemMetadataEventListener

  ApplicationController o--"*" SolrIndex
  SolrIndex o--"1" SolrServer
  SolrIndex "1"--o SystemMetadataEventListener
  SolrIndex o--"*" IDocumentSubprocessor: Assembled using Spring bean configuration

  @enduml

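To make the relationship between ``SolrIndex`` and its subprocessors more concrete, here is a deliberately simplified Python analogue of the dispatch shown in the diagram. The real classes are Java; the method bodies, the use of plain dictionaries in place of XML documents and a SOLR server, and the ``TitleSubprocessor`` class are all assumptions made for illustration.

```python
class TitleSubprocessor:
    """Stand-in for an IDocumentSubprocessor implementation (hypothetical)."""

    def can_process(self, doc):
        # Accept only documents that carry a title.
        return "title" in doc

    def process_document(self, identifier, docs, doc):
        # Add a field to the index document keyed by the identifier.
        docs.setdefault(identifier, {})["title"] = doc["title"]
        return docs


class SolrIndex:
    """Holds subprocessors and applies each one that accepts the document."""

    def __init__(self, subprocessors):
        self.subprocessors = subprocessors
        self.store = {}  # a dictionary standing in for the SolrServer

    def insert(self, pid, doc):
        # Start with a minimal index document, then let each matching
        # subprocessor contribute its fields.
        docs = {pid: {"id": pid}}
        for subprocessor in self.subprocessors:
            if subprocessor.can_process(doc):
                docs = subprocessor.process_document(pid, docs, doc)
        self.store.update(docs)


if __name__ == "__main__":
    index = SolrIndex([TitleSubprocessor()])
    index.insert("pid.1", {"title": "Soil moisture survey"})
    print(index.store)
```

Because the list of subprocessors is passed in, supporting another document type in this sketch means adding another object to that list, which mirrors the diagram's note that the subprocessors are assembled through configuration.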