Task #6448
Feature #5975 (closed): Implement 'Statistics' view
Task #6429: Create data year coverage chart in user profile
Create Solr query to retrieve temporal coverage data
Added by Lauren Walker over 10 years ago. Updated over 10 years ago.
% Done: 0%
Updated by Lauren Walker over 10 years ago
- Status changed from New to In Progress
After looking at the Solr docs, it seems the only way to retrieve facets over a range of dates like this is to issue one facet query per year. Each query counts the docs whose begin date falls before or within that year AND whose end date falls within or after that year. Example:
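A minimal sketch of how such per-year facet queries could be assembled, assuming the index exposes `beginDate` and `endDate` date fields (the names used later in this thread) and using Solr's standard range syntax. The helper names `year_facet_queries` and `facet_query_string` are hypothetical, and the `q`, `rows`, and `facet` values are only illustrative defaults:

```python
from urllib.parse import urlencode


def year_facet_queries(first_year, last_year):
    """Build one facet.query per year, matching docs whose coverage overlaps it."""
    queries = []
    for year in range(first_year, last_year + 1):
        year_start = f"{year}-01-01T00:00:00Z"
        year_end = f"{year}-12-31T23:59:59Z"
        # Begin date before or within the year AND end date within or after it.
        queries.append(
            f"(beginDate:[* TO {year_end}] AND endDate:[{year_start} TO *])"
        )
    return queries


def facet_query_string(first_year, last_year):
    """URL-encode the facet queries into a single Solr query string."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    params += [("facet.query", fq) for fq in year_facet_queries(first_year, last_year)]
    return urlencode(params)
```

The query string grows by one `facet.query` parameter per year, which is why the length of the year range matters for the URL length.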
Updated by Lauren Walker over 10 years ago
I should note that this is still a work in progress, because that query can get very long. It may exceed the browser's max URL length, or it may generate over a hundred facet queries if someone enters their metadata wrong or has a relatively early date. For example, a user with one dataset from 1910 and one from 2010 could generate a query that facets every year from 1910 to 2014, even though they have nothing in between.
Updated by ben leinfelder over 10 years ago
I was thinking more about this after we talked and wonder if we could combine startDate and endDate into the facet "field" and then process the ranges client-side to get a better visualization. You could end up with (numberOfYearsInRange^2) facets, but that's probably safer than constructing a GET request of indeterminate length.
Here's a snippet of the response I am imagining:
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="startYearToEndYear">
      <int name="2010-2011">14</int>
      <int name="2010-2012">0</int>
      <int name="2010-2013">1</int>
      <int name="2011-2011">3</int>
      <int name="2011-2014">2</int>
      etc...
    </lst>
  </lst>
</lst>
I think we'd have to modify the Solr schema and/or indexing rules so that we can populate the "startYearToEndYear" field (possibly a dynamic field, so we don't need to add it to the schema explicitly) with the values: beginDate/YEAR-endDate/YEAR. I tried doing this at query time, but that didn't seem to work.
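The client-side step described above could look roughly like this. It is only a sketch: it assumes the facet labels always have the form `startYear-endYear` as in the imagined response, and the function name `counts_per_year` is hypothetical. Python is used for illustration; the same logic would apply in whatever language the client uses.

```python
from collections import Counter


def counts_per_year(range_facets):
    """Expand 'start-end' facet counts into a count of docs covering each year."""
    per_year = Counter()
    for label, count in range_facets.items():
        start, end = (int(part) for part in label.split("-"))
        # Every doc in this facet covers each year from start to end inclusive.
        for year in range(start, end + 1):
            per_year[year] += count
    return dict(sorted(per_year.items()))


# Values taken from the response snippet above.
facets = {
    "2010-2011": 14,
    "2010-2012": 0,
    "2010-2013": 1,
    "2011-2011": 3,
    "2011-2014": 2,
}
print(counts_per_year(facets))
```

For the five facets in the snippet, this yields 15 docs covering 2010, 20 covering 2011, 3 each for 2012 and 2013, and 2 for 2014.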
Updated by Lauren Walker over 10 years ago
It looks like most servers and browsers can accept URLs up to ~2,000 characters. (http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers)
So I can create a fairly long URL with facet.queries before I run into trouble, but to be safe, I should widen my year facet bins as the total year range of the query grows:
range of years is 10 or fewer: 1 year
range of years is between 11 and 20: 2 years
range of years is between 21 and 50: 5 years
range of years is between 51 and 100: 10 years
range of years > 100: 25 years
So I will never have to send more than 10 facet.queries in one URL. (One facet.query is 89 characters = &facet.query=(endDate:[*%20TO%20NOW-10YEARS/YEAR]+beginDate:[NOW-10YEARS/YEAR%20TO%20*]))
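The binning rule above can be sketched as a small lookup. This assumes the range is counted as an inclusive number of years and that each boundary value (10, 20, 50, 100) belongs to the narrower bin, which the list does not state outright. The names `bin_width` and `bin_starts` are hypothetical.

```python
def bin_width(year_count):
    """Pick a facet bin width in years from the total number of years covered."""
    if year_count <= 10:
        return 1
    if year_count <= 20:
        return 2
    if year_count <= 50:
        return 5
    if year_count <= 100:
        return 10
    return 25


def bin_starts(first_year, last_year):
    """Return the first year of each bin; one facet.query would be sent per bin."""
    year_count = last_year - first_year + 1  # inclusive count (assumption)
    return list(range(first_year, last_year + 1, bin_width(year_count)))


print(bin_starts(1915, 2014))  # 100 years -> 10-year bins
print(bin_starts(2005, 2014))  # 10 years -> 1-year bins
```

Under these assumptions, each tier up to 100 years produces at most 10 bins. With a fixed 25-year width, a span longer than 250 years would produce more than 10.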
Updated by Lauren Walker over 10 years ago
- Status changed from In Progress to Resolved
- Remaining hours set to 0.0
Here is an example query that would cover all of KNB:
Response:
1800: 0
1825: 1
1850: 2
1875: 9
1900: 15
1925: 31
1950: 73
1975: 182
2000: 434
2014: 18