Task #6448

Feature #5975: Implement 'Statistics' view

Task #6429: Create data year coverage chart in user profile

Create Solr query to retrieve temporal coverage data

Added by Lauren Walker over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 03/07/2014
Due date:
% Done: 0%
Estimated time:

History

#1 Updated by Lauren Walker over 8 years ago

  • Status changed from New to In Progress

After looking at the Solr docs, it seems the only way to facet on a range of dates like this is to run one facet query per year, each counting the docs whose begin date falls in or before that year AND whose end date falls in or after that year. Example:

https://knb.ecoinformatics.org/knb/d1/mn/v1/query/solr/rows=100000&fl=beginDate,endDate&q=testing&facet=true&facet.query=(endDate:[*%20TO%202010-12-31T23:59:59Z]+beginDate:[2010-01-01T00:00:00Z%20TO%20*])&facet.query=(endDate:[*%20TO%202011-12-31T23:59:59Z]+beginDate:[2011-01-01T00:00:00Z%20TO%20*])
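
The per-year facet queries could also be generated programmatically. Below is a minimal Python sketch, not the code used in this project: `year_overlap_query` and `facet_params` are hypothetical names. It follows the description above (begin date on or before the end of the year, end date on or after the start of the year) and joins the two clauses with an explicit AND. The example URL arranges the fields differently, so adjust the clause to whichever semantics are intended.

```python
from urllib.parse import quote


def year_overlap_query(year):
    """Solr query for docs whose beginDate..endDate span overlaps `year`."""
    year_start = f"{year}-01-01T00:00:00Z"
    year_end = f"{year}-12-31T23:59:59Z"
    # Assumption: overlap means the doc begins by year end and ends after year start.
    return f"(beginDate:[* TO {year_end}] AND endDate:[{year_start} TO *])"


def facet_params(first_year, last_year):
    """Build 'facet=true&facet.query=...' with one facet.query per year (inclusive)."""
    parts = ["facet=true"]
    for year in range(first_year, last_year + 1):
        # Percent-encode spaces but keep Solr range syntax readable.
        encoded = quote(year_overlap_query(year), safe="()[]:*")
        parts.append("facet.query=" + encoded)
    return "&".join(parts)


print(facet_params(2010, 2011))
```

The returned string is meant to be appended to the rest of the query string (`q`, `fl`, `rows`, and so on).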

#2 Updated by Lauren Walker over 8 years ago

I should note that this is still a work in progress. The query can get very long and may exceed the browser's maximum URL length, or it may generate over a hundred facet queries if someone enters their metadata incorrectly or has a relatively early date. For example, a user with one dataset from 1910 and one from 2010 would generate a query that facets every year from 1910 to 2014, even though nothing falls in between.
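
To get a feel for how quickly the one-query-per-year approach grows, the sketch below counts the facet queries and the characters they add to the URL for a given span of years. It is only an illustration: `estimate_facet_cost` is a hypothetical helper, and the template simply mirrors the facet.query format from the example URL in comment #1.

```python
# Template mirroring the facet.query format of the example URL (year filled in twice).
FACET_TEMPLATE = (
    "&facet.query=(endDate:[*%20TO%20{year}-12-31T23:59:59Z]"
    "+beginDate:[{year}-01-01T00:00:00Z%20TO%20*])"
)


def estimate_facet_cost(first_year, last_year):
    """Return (number of facet queries, total characters they add to the URL)."""
    years = range(first_year, last_year + 1)
    total_chars = sum(len(FACET_TEMPLATE.format(year=y)) for y in years)
    return len(years), total_chars


count, chars = estimate_facet_cost(1910, 2014)
print(f"{count} facet queries adding {chars} characters")
```

For the 1910 to 2014 case this yields 105 facet queries, one for each year in the span.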

#3 Updated by ben leinfelder over 8 years ago

I was thinking more about this after we talked and wonder if we could combine beginDate and endDate into a single facet field and then process the ranges client-side to get a better visualization. You could end up with (numberOfYearsInRange^2) facets, but that's probably safer than constructing a GET request of indeterminate length.

Here's a snippet of the response I am imagining:


<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="startYearToEndYear">
      <int name="2010-2011">14</int>
      <int name="2010-2012">0</int>
      <int name="2010-2013">1</int>
      <int name="2011-2011">3</int>
      <int name="2011-2014">2</int>
      etc...
    </lst>
  </lst>
</lst>

I think we'd have to modify the Solr schema and/or indexing rules so that we can populate the "startYearToEndYear" field (possibly a dynamic field so we don't need to add it to the schema explicitly) with the values: beginDate/YEAR-endDate/YEAR. I tried doing this at query time, but that didn't seem to work.
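
As a rough illustration of both halves of this idea, the Python sketch below builds the proposed field value from a begin and end date, then expands a set of range facets into per-year counts the way a client might before charting. The function names are hypothetical, the input dictionary stands in for the parsed "startYearToEndYear" facet, and nothing here reflects an actual indexing rule.

```python
from collections import Counter
from datetime import date


def range_label(begin, end):
    """Value for the proposed field: begin year and end year joined by '-'."""
    return f"{begin.year}-{end.year}"


def coverage_per_year(range_facets):
    """Expand 'startYear-endYear' facet counts into docs-per-year totals."""
    per_year = Counter()
    for label, doc_count in range_facets.items():
        if doc_count == 0:
            continue  # empty ranges contribute nothing
        start, end = (int(part) for part in label.split("-"))
        for year in range(start, end + 1):
            per_year[year] += doc_count
    return dict(sorted(per_year.items()))


# Counts taken from the response snippet above.
facets = {"2010-2011": 14, "2010-2012": 0, "2010-2013": 1,
          "2011-2011": 3, "2011-2014": 2}
print(range_label(date(2010, 3, 1), date(2013, 7, 15)))
print(coverage_per_year(facets))
```

Each document is counted once for every year its range covers, so the per-year totals can exceed the number of documents.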

#4 Updated by Lauren Walker over 8 years ago

It looks like most servers and browsers can accept URLs up to ~2,000 characters. (http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers)

So I can create a fairly long URL with facet.query parameters before I run into trouble, but to be safe, I should widen the year facet bins once the total year range of the query passes certain thresholds:

range of years is 10 or fewer: 1 year
range of years is between 11 and 20: 2 years
range of years is between 21 and 50: 5 years
range of years is between 51 and 100: 10 years
range of years > 100: 25 years

So I will never have to send more than 10 facet.query parameters in one URL. (One facet.query is 88 characters = &facet.query=(endDate:[*%20TO%20NOW-10YEARS/YEAR]+beginDate:[NOW-10YEARS/YEAR%20TO%20*]))
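
The binning rule above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implemented logic: `bin_width` and `facet_bins` are hypothetical names, the range is taken as the inclusive number of years, and a range of exactly 10 or 50 years is assigned to the narrower bin.

```python
def bin_width(year_range):
    """Facet bin width in years for a query spanning `year_range` years."""
    if year_range <= 10:
        return 1
    if year_range <= 20:
        return 2
    if year_range <= 50:
        return 5
    if year_range <= 100:
        return 10
    return 25


def facet_bins(first_year, last_year):
    """Split the inclusive year span into (bin_start, bin_end) pairs."""
    width = bin_width(last_year - first_year + 1)
    bins = []
    for start in range(first_year, last_year + 1, width):
        # The final bin is clipped so it never runs past last_year.
        bins.append((start, min(start + width - 1, last_year)))
    return bins


print(facet_bins(1910, 2014))
```

Each pair would become one facet.query. With these widths the number of bins stays at or below 10 for spans of up to 250 years.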

#5 Updated by Lauren Walker over 8 years ago

  • Remaining hours set to 0.0
  • Status changed from In Progress to Resolved
