Revision 8101
Added by ben leinfelder almost 11 years ago
query-index.rst | ||
---|---|---|
5 | 5 |
|
6 | 6 |
Metacat Indexing |
7 | 7 |
=========================== |
8 |
Lorem ipsum |
|
8 |
Metacat v2.1 introduces support for building a SOLR index of Metacat content. |
|
9 |
While we continue to support the "pathquery" search mechanism, this will be phased out |
|
10 |
in favor of the more efficient SOLR query interface. |
|
9 | 11 |
|
10 |
SOLR background information |
|
11 |
--------------------------- |
|
12 |
Features: |
|
13 | 12 |
|
14 |
* something |
|
15 |
* something |
|
16 |
* more |
|
17 |
* even more |
|
13 |
Metacat deployments that opt to use the Metacat SOLR index will be able to take advantage |
|
14 |
of: |
|
18 | 15 |
|
19 |
Something to explain the advantage of solr over the old metacat index approach |
|
16 |
* fast search performance |
|
17 |
* built-in paging features |
|
18 |
* customizable return formats (for advanced admins) |
|
20 | 19 |
|
21 | 20 |
Indexed documents and fields |
22 | 21 |
----------------------------- |
23 |
Metacat reuses the default DataONE index which includes many common metadata formats
|
|
24 |
out-of-the-box |
|
22 |
Metacat integrates the existing DataONE index library which includes many common metadata formats
|
|
23 |
out-of-the-box:
|
|
25 | 24 |
|
26 | 25 |
1. EML |
27 | 26 |
2. FGDC |
28 |
3. Dryad |
|
27 |
3. Dryad*
|
|
29 | 28 |
|
30 | 29 |
|
31 | 30 |
Default indexed fields |
32 | 31 |
----------------------- |
33 |
Describe the existing fields like in the DataONE docs, with link to them
|
|
32 |
For a complete listing of the indexed fields, please see the DataONE documentation.
|
|
34 | 33 |
|
34 |
http://mule1.dataone.org/ArchitectureDocs-current/design/SearchMetadata.html |
|
35 | 35 |
|
36 |
Index configuration overview |
|
36 |
Metacat also reports on the currently indexed fields; simply navigate to: |
|
37 |
|
|
38 |
http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html#MNQuery.getQueryEngineDescription |
|
39 |
|
|
40 |
with "solr" as the engine. |
|
41 |
|
|
42 |
Index configuration |
|
37 | 43 |
---------------------------- |
38 |
Describe the configuration files and extension points for the implementation |
|
44 |
Metacat-index is deployed as a separate web application (metacat-index.war) and should be deployed |
|
45 |
as a sibling of the Metacat webapp (knb.war). Deploying metacat-index.war is only required when SOLR support |
|
46 |
is desired and can safely be omitted if it will not be utilized for any given Metacat deployment. |
|
39 | 47 |
|
48 |
During the initial installation/upgrade, an empty index will be initialized in the configured "solr-home" location. |
|
49 |
Metacat-index will index all the existing Metacat content when the webapp next initializes. |
|
50 |
Note: the configured solr-home directory should not exist before configuring Metacat with indexing for the first time, |
|
51 |
otherwise the blank index will not be created for metacat-index to utilize. |
|
40 | 52 |
|
53 |
Additional advanced configuration options are available in the metacat.properties file (shared between Metacat and Metacat-index). |
|
54 |
|
|
55 |
|
|
41 | 56 |
Adding additional document types and fields |
42 | 57 |
-------------------------------------------- |
43 |
Step-by-step guide for adding new documents and indexed fields. |
|
58 |
TBD: Step-by-step guide for adding new documents and indexed fields.
|
|
44 | 59 |
|
45 | 60 |
|
46 | 61 |
Querying the index |
47 | 62 |
-------------------- |
48 |
Provide example SOLR queries and expected results. Show a variety of return types
|
|
49 |
and query facets.
|
|
63 |
The SOLR index can be queried using standard SOLR syntax and return options.
|
|
64 |
The DataONE query interface exposes the SOLR query engine.
|
|
50 | 65 |
|
66 |
http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html#MNQuery.query |
|
51 | 67 |
|
68 |
Please see the SOLR documentation for examples and exhaustive syntax information. |
|
69 |
|
|
70 |
http://lucene.apache.org/solr/ |
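
As a minimal sketch of a query using standard SOLR parameters (``q``, ``fl``, ``start``, ``rows``, ``wt``), the following builds a request URL. The host name, the ``/d1/mn/v1`` base path, the ``/query/solr/`` endpoint layout, and the field names ``formatType``, ``title`` and ``id`` are assumptions for illustration; check them against your own deployment and the DataONE documentation linked above.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, query, fields=None, start=0, rows=10, wt="xml"):
    """Build a query URL for an assumed DataONE-style /query/solr/ endpoint."""
    # q, start, rows and wt are standard SOLR request parameters.
    params = {"q": query, "start": start, "rows": rows, "wt": wt}
    if fields:
        # fl limits the fields returned for each matching record.
        params["fl"] = ",".join(fields)
    return base_url.rstrip("/") + "/query/solr/?" + urlencode(params)


# Hypothetical host and base path; replace with your own deployment.
url = build_solr_query_url(
    "https://metacat.example.org/knb/d1/mn/v1",
    "formatType:METADATA AND title:soil",
    fields=["id", "title"],
    rows=25,
    wt="json",
)
print(url)
```

Paging through a result set then amounts to repeating the request with ``start`` increased by ``rows`` each time.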
|
71 |
|
|
72 |
|
|
52 | 73 |
Access Policy enforcement |
53 | 74 |
------------------------- |
54 |
Explain how access control is processed and honored when utilizing the index. |
|
75 |
Access control is enforced by the index such that only records that are readable by the |
|
76 |
user performing the query are returned to the user. Any SOLR query submitted will be |
|
77 |
augmented with access control criteria corresponding to if and how the user is currently |
|
78 |
authenticated. Both certificate-based (DataONE API) and JSESSIONID-based (Metacat API) |
|
79 |
authentication are simultaneously supported. |
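
The idea of augmenting a query with access control criteria can be sketched as follows. This is an illustration only, not Metacat's actual implementation: the ``readPermission`` field name, the ``public`` subject, and both helper functions are assumptions made for the example.

```python
def access_filter(subjects):
    """Return a SOLR filter clause matching records readable by any given subject."""
    # Assumption: "public" stands for read access granted to unauthenticated users.
    all_subjects = ["public"] + [s for s in subjects if s != "public"]
    # Quote each subject so characters such as "=" and "," are treated literally.
    quoted = ['"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"' for s in all_subjects]
    return "readPermission:(" + " OR ".join(quoted) + ")"


def augment_query(params, subjects):
    """Return a copy of the query parameters with an access control filter added."""
    augmented = dict(params)
    existing = augmented.get("fq", [])
    if isinstance(existing, str):
        existing = [existing]
    # Append rather than replace, so filters supplied by the user are preserved.
    augmented["fq"] = list(existing) + [access_filter(subjects)]
    return augmented


# An unauthenticated user is limited to publicly readable records.
print(augment_query({"q": "*:*"}, []))
# An authenticated user also sees records readable by their own subject.
print(augment_query({"q": "*:*"}, ["uid=jdoe,o=NCEAS,dc=ecoinformatics,dc=org"]))
```

Applying the restriction as a separate filter query leaves the user's own ``q`` untouched, which is one plausible reason to design it this way.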
|
55 | 80 |
|
56 | 81 |
|
57 | 82 |
Regenerating the index from scratch |
58 | 83 |
----------------------------------- |
59 |
When the SOLR index has been drastically modified, a complete regenration of the |
|
84 |
When the SOLR index has been drastically modified, a complete regeneration of the
|
|
60 | 85 |
index may be necessary. In order to accomplish this: |
61 | 86 |
|
62 |
Step-by-step instructions |
|
87 |
Step-by-step instructions:
|
|
63 | 88 |
|
64 |
NOTE: this may take a long time depending on the size of your Metacat store. |
|
89 |
1. Entirely remove the solr-home directory |
|
90 |
2. Step through the Metacat admin interface main properties screen, specifying the solr-home directory you wish to use |
|
91 |
3. Restart the webapp container (Tomcat). |
|
65 | 92 |
|
93 |
Content can also be submitted for index regeneration by using the Metacat API: |
|
66 | 94 |
|
95 |
1. Login as the Metacat administrator |
|
96 |
2. Navigate to: <host>/<metacat_context>/metacat?action=reindex[&pid={pid}] |
|
97 |
3. If the pid parameter is omitted, all objects in Metacat will be submitted for reindexing. |
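
The reindex URL from step 2 can be assembled as in the sketch below. The host, context name and pid are placeholders, and the helper function is hypothetical; the request itself must still be sent from a session that is logged in as the Metacat administrator, which this example does not cover.

```python
from urllib.parse import urlencode


def build_reindex_url(host, metacat_context, pid=None):
    """Build <host>/<metacat_context>/metacat?action=reindex[&pid={pid}]."""
    params = {"action": "reindex"}
    if pid is not None:
        # With a pid, only that object is submitted for reindexing.
        params["pid"] = pid
    base = host.rstrip("/") + "/" + metacat_context.strip("/") + "/metacat"
    return base + "?" + urlencode(params)


# Placeholder host, context and identifier for illustration.
print(build_reindex_url("https://metacat.example.org", "knb"))
print(build_reindex_url("https://metacat.example.org", "knb", pid="doi:10.5063/F1"))
```

URL-encoding the pid matters because identifiers often contain characters such as ``:`` and ``/``.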
|
67 | 98 |
|
99 |
|
|
100 |
|
|
68 | 101 |
Class design overview |
69 | 102 |
---------------------- |
70 | 103 |
|
... | ... | |
163 | 196 |
SolrServer <|-- EmbeddedSolrServer |
164 | 197 |
SolrServer <|-- HttpSolrServer |
165 | 198 |
|
166 |
package "Stand-alone indexer (webapp or daemon)" {
|
|
199 |
package "Metacat-index (webapp)" {
|
|
167 | 200 |
|
168 | 201 |
class ApplicationController { |
169 | 202 |
- List<SolrIndex> solrIndex |
... | ... | |
180 | 213 |
|
181 | 214 |
class SystemMetadataEventListener { |
182 | 215 |
- SolrIndex solrIndex |
183 |
- IMap hzSystemMetadata |
|
184 |
- IMap hzObjectPath |
|
185 |
+ entryAdded(EntryEvent<Identifier, SystemMetadata>) |
|
186 |
+ entryUpdated(EntryEvent<Identifier, SystemMetadata>) |
|
187 |
+ entryRemoved(EntryEvent<Identifier, SystemMetadata>) |
|
216 |
+ itemAdded(ItemEvent<SystemMetadata>) |
|
217 |
+ itemRemoved(ItemEvent<SystemMetadata>) |
|
188 | 218 |
} |
189 | 219 |
|
190 | 220 |
} |
... | ... | |
197 | 227 |
} |
198 | 228 |
|
199 | 229 |
class HazelcastService { |
230 |
- IMap hzIndexQueue |
|
200 | 231 |
- IMap hzSystemMetadata |
232 |
- IMap hzObjectPath |
|
201 | 233 |
} |
202 | 234 |
|
203 |
class ObjectPathMap { |
|
204 |
- IMap hzObjectPath |
|
205 |
} |
|
206 | 235 |
} |
207 | 236 |
|
208 | 237 |
MetacatSolrIndex o--"1" SolrServer |
209 | 238 |
HazelcastService .. SystemMetadataEventListener |
210 |
ObjectPathMap .. SystemMetadataEventListener |
|
211 | 239 |
|
212 | 240 |
ApplicationController o--"*" SolrIndex |
213 | 241 |
SolrIndex o--"1" SolrServer |
clean-up and flesh-out the metacat-index docs. https://projects.ecoinformatics.org/ecoinfo/issues/5884