5 |
5 |
|
6 |
6 |
Metacat Indexing
|
7 |
7 |
===========================
|
8 |
|
Lorem ipsum
|
|
8 |
Metacat v2.1 introduces support for building a SOLR index of Metacat content.
|
|
9 |
While we continue to support the "pathquery" search mechanism, it will be phased out
|
|
10 |
in favor of the more efficient SOLR query interface.
|
9 |
11 |
|
10 |
|
SOLR background information
|
11 |
|
---------------------------
|
12 |
|
Features:
|
13 |
12 |
|
14 |
|
* something
|
15 |
|
* something
|
16 |
|
* more
|
17 |
|
* even more
|
|
13 |
Metacat deployments that opt to use the Metacat SOLR index will be able to take advantage
|
|
14 |
of:
|
18 |
15 |
|
19 |
|
Something to explain the advantage of solr over the old metacat index approach
|
|
16 |
* fast search performance
|
|
17 |
* built-in paging features
|
|
18 |
* customizable return formats (for advanced admins)
|
20 |
19 |
|
21 |
20 |
Indexed documents and fields
|
22 |
21 |
-----------------------------
|
23 |
|
Metacat reuses the default DataONE index which includes many common metadata formats
|
24 |
|
out-of-the-box
|
|
22 |
Metacat integrates the existing DataONE index library, which supports many common metadata formats
|
|
23 |
out-of-the-box:
|
25 |
24 |
|
26 |
25 |
1. EML
|
27 |
26 |
2. FGDC
|
28 |
|
3. Dryad
|
|
27 |
3. Dryad*
|
29 |
28 |
|
30 |
29 |
|
31 |
30 |
Default indexed fields
|
32 |
31 |
-----------------------
|
33 |
|
Describe the existing fields like in the DataONE docs, with link to them
|
|
32 |
For a complete listing of the indexed fields, please see the DataONE documentation.
|
34 |
33 |
|
|
34 |
http://mule1.dataone.org/ArchitectureDocs-current/design/SearchMetadata.html
|
35 |
35 |
|
36 |
|
Index configuration overview
|
|
36 |
Metacat also reports the currently indexed fields; simply call the DataONE API method documented at:
|
|
37 |
|
|
38 |
http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html#MNQuery.getQueryEngineDescription
|
|
39 |
|
|
40 |
with "solr" as the engine.
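As a rough illustration of the call described above, the sketch below builds the request URL for the query engine description. It assumes the DataONE REST convention of ``GET /query/{queryEngine}`` under the Member Node base URL; the host name and the ``/knb/d1/mn/v1`` base path are placeholders, not values taken from this document.

```python
def query_engine_description_url(member_node_base_url, engine="solr"):
    """Build the URL for MNQuery.getQueryEngineDescription.

    Assumption: the Member Node exposes the description at
    <base>/query/<engine>, following the DataONE REST convention.
    """
    return f"{member_node_base_url.rstrip('/')}/query/{engine}"


# Hypothetical base URL; substitute the one for your deployment.
print(query_engine_description_url("https://example.org/knb/d1/mn/v1"))
```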
|
|
41 |
|
|
42 |
Index configuration
|
37 |
43 |
----------------------------
|
38 |
|
Describe the configuration files and extension points for the implementation
|
|
44 |
Metacat-index is deployed as a separate web application (metacat-index.war) and should be deployed
|
|
45 |
as a sibling of the Metacat webapp (knb.war). Deploying metacat-index.war is only required when SOLR support
|
|
46 |
is desired and can safely be omitted if it will not be utilized for any given Metacat deployment.
|
39 |
47 |
|
|
48 |
During the initial installation/upgrade, an empty index will be initialized in the configured "solr-home" location.
|
|
49 |
Metacat-index will index all the existing Metacat content when the webapp next initializes.
|
|
50 |
Note: the configured solr-home directory should not exist before configuring Metacat with indexing for the first time,
|
|
51 |
otherwise the blank index will not be created for metacat-index to utilize.
|
40 |
52 |
|
|
53 |
Additional advanced configuration options are available in the metacat.properties file (shared between Metacat and Metacat-index).
|
|
54 |
|
|
55 |
|
41 |
56 |
Adding additional document types and fields
|
42 |
57 |
--------------------------------------------
|
43 |
|
Step-by-step guide for adding new documents and indexed fields.
|
|
58 |
TBD: Step-by-step guide for adding new documents and indexed fields.
|
44 |
59 |
|
45 |
60 |
|
46 |
61 |
Querying the index
|
47 |
62 |
--------------------
|
48 |
|
Provide example SOLR queries and expected results. Show a variety of return types
|
49 |
|
and query facets.
|
|
63 |
The SOLR index can be queried using standard SOLR syntax and return options.
|
|
64 |
The DataONE query interface exposes the SOLR query engine.
|
50 |
65 |
|
|
66 |
http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html#MNQuery.query
|
51 |
67 |
|
|
68 |
Please see the SOLR documentation for examples and exhaustive syntax information.
|
|
69 |
|
|
70 |
http://lucene.apache.org/solr/
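A minimal sketch of composing a query URL with paging and a return format follows. The parameters ``q``, ``start``, ``rows``, ``wt`` and ``fl`` are standard SOLR request parameters; the ``/query/solr/`` path, the host name, and the field names ``id`` and ``title`` are assumptions made for illustration and should be checked against your deployment and the indexed-field listing.

```python
from urllib.parse import urlencode


def solr_query_url(member_node_base_url, q, start=0, rows=10, wt="json", fields=None):
    """Build a SOLR query URL against the DataONE query interface.

    Assumption: queries are served at <base>/query/solr/.
    """
    params = {"q": q, "start": start, "rows": rows, "wt": wt}
    if fields:
        # 'fl' limits the fields returned for each matching record.
        params["fl"] = ",".join(fields)
    return f"{member_node_base_url.rstrip('/')}/query/solr/?{urlencode(params)}"


# Third page of ten results, returning only two (assumed) fields as JSON.
print(solr_query_url("https://example.org/knb/d1/mn/v1", "title:soil",
                     start=20, rows=10, fields=["id", "title"]))
```

Changing ``start`` and ``rows`` pages through the result set, and ``wt`` selects the response format.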
|
|
71 |
|
|
72 |
|
52 |
73 |
Access Policy enforcement
|
53 |
74 |
-------------------------
|
54 |
|
Explain how access control is processed and honored when utilizing the index.
|
|
75 |
Access control is enforced by the index such that only records that are readable by the
|
|
76 |
user performing the query are returned to the user. Any SOLR query submitted will be
|
|
77 |
augmented with access control criteria corresponding to if and how the user is currently
|
|
78 |
authenticated. Both certificate-based (DataONE API) and JSESSIONID-based (Metacat API)
|
|
79 |
authentication are simultaneously supported.
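To make the idea of query augmentation concrete, here is a simplified sketch of how a filter could be appended to an incoming query. This is not Metacat's actual implementation: the ``readPermission`` field name, the ``public`` subject, and both function names are assumptions for illustration only. ``fq`` is the standard SOLR filter-query parameter.

```python
def access_filter(subjects):
    """Return a SOLR filter clause restricting results to readable records.

    `subjects` lists the identities of the authenticated user; an empty
    list stands for an unauthenticated (public-only) request.
    """
    all_subjects = ["public"] + [s for s in subjects if s != "public"]
    clauses = []
    for subject in all_subjects:
        # Escape backslashes and quotes so the subject stays one phrase.
        escaped = subject.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'readPermission:"{escaped}"')
    return " OR ".join(clauses)


def augment(params, subjects):
    """Return a copy of `params` with the access filter added to 'fq'."""
    augmented = dict(params)
    augmented["fq"] = list(augmented.get("fq", [])) + [access_filter(subjects)]
    return augmented


print(augment({"q": "*:*"}, ["CN=Jane,O=Example"]))
```

Because the filter is added on the server side, a client cannot widen its own access by editing the query it submits.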
|
55 |
80 |
|
56 |
81 |
|
57 |
82 |
Regenerating the index from scratch
|
58 |
83 |
-----------------------------------
|
59 |
|
When the SOLR index has been drastically modified, a complete regenration of the
|
|
84 |
When the SOLR index has been drastically modified, a complete regeneration of the
|
60 |
85 |
index may be necessary. In order to accomplish this:
|
61 |
86 |
|
62 |
|
Step-by-step instructions
|
|
87 |
Step-by-step instructions:
|
63 |
88 |
|
64 |
|
NOTE: this may take a long time depending on the size of your Metacat store.
|
|
89 |
1. Entirely remove the solr-home directory
|
|
90 |
2. Step through the Metacat admin interface main properties screen, specifying the solr-home directory you wish to use
|
|
91 |
3. Restart the webapp container (Tomcat).
|
65 |
92 |
|
|
93 |
Content can also be submitted for index regeneration by using the Metacat API:
|
66 |
94 |
|
|
95 |
1. Login as the Metacat administrator
|
|
96 |
2. Navigate to: <host>/<metacat_context>/metacat?action=reindex[&pid={pid}]
|
|
97 |
3. If the pid parameter is omitted, all objects in Metacat will be submitted for reindexing.
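The URL pattern in step 2 can be assembled as in the sketch below. The host, context, and pid values are placeholders, and the request itself must still be made from a session logged in as the Metacat administrator (step 1); this helper only builds the URL.

```python
from urllib.parse import urlencode


def reindex_url(host, metacat_context, pid=None):
    """Build <host>/<metacat_context>/metacat?action=reindex[&pid={pid}]."""
    params = {"action": "reindex"}
    if pid is not None:
        # Omitting pid submits every object in Metacat for reindexing.
        params["pid"] = pid
    return f"{host.rstrip('/')}/{metacat_context.strip('/')}/metacat?{urlencode(params)}"


# Hypothetical values for illustration.
print(reindex_url("https://example.org", "knb"))
print(reindex_url("https://example.org", "knb", pid="example.1.1"))
```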
|
67 |
98 |
|
|
99 |
|
|
100 |
|
68 |
101 |
Class design overview
|
69 |
102 |
----------------------
|
70 |
103 |
|
... | ... | |
163 |
196 |
SolrServer <|-- EmbeddedSolrServer
|
164 |
197 |
SolrServer <|-- HttpSolrServer
|
165 |
198 |
|
166 |
|
package "Stand-alone indexer (webapp or daemon)" {
|
|
199 |
package "Metacat-index (webapp)" {
|
167 |
200 |
|
168 |
201 |
class ApplicationController {
|
169 |
202 |
- List<SolrIndex> solrIndex
|
... | ... | |
180 |
213 |
|
181 |
214 |
class SystemMetadataEventListener {
|
182 |
215 |
- SolrIndex solrIndex
|
183 |
|
- IMap hzSystemMetadata
|
184 |
|
- IMap hzObjectPath
|
185 |
|
+ entryAdded(EntryEvent<Identifier, SystemMetadata>)
|
186 |
|
+ entryUpdated(EntryEvent<Identifier, SystemMetadata>)
|
187 |
|
+ entryRemoved(EntryEvent<Identifier, SystemMetadata>)
|
|
216 |
+ itemAdded(ItemEvent<SystemMetadata>)
|
|
217 |
+ itemRemoved(ItemEvent<SystemMetadata>)
|
188 |
218 |
}
|
189 |
219 |
|
190 |
220 |
}
|
... | ... | |
197 |
227 |
}
|
198 |
228 |
|
199 |
229 |
class HazelcastService {
|
|
230 |
- IMap hzIndexQueue
|
200 |
231 |
- IMap hzSystemMetadata
|
|
232 |
- IMap hzObjectPath
|
201 |
233 |
}
|
202 |
234 |
|
203 |
|
class ObjectPathMap {
|
204 |
|
- IMap hzObjectPath
|
205 |
|
}
|
206 |
235 |
}
|
207 |
236 |
|
208 |
237 |
MetacatSolrIndex o--"1" SolrServer
|
209 |
238 |
HazelcastService .. SystemMetadataEventListener
|
210 |
|
ObjectPathMap .. SystemMetadataEventListener
|
211 |
239 |
|
212 |
240 |
ApplicationController o--"*" SolrIndex
|
213 |
241 |
SolrIndex o--"1" SolrServer
|
clean-up and flesh-out the metacat-index docs. https://projects.ecoinformatics.org/ecoinfo/issues/5884