Story #6548

Expand ORE model to allow relationships for derived datasets

Added by Lauren Walker over 6 years ago. Updated almost 5 years ago.


Design a new model for Metacat's OREs where relationships:

- wasGeneratedBy
- derivedFrom
- used
- etc.

are used to describe datasets that are derived from raw data and metadata. These relationships may span OREs (e.g. an analyst may create visualizations of another scientist's data and publish them in a new package).
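To make the cross-package case concrete, here is a minimal sketch using plain tuples as triples. All identifiers (raw_map, viz_map, plot.png, and so on) are hypothetical, and the predicate names assume the W3C PROV-O spellings (prov:wasDerivedFrom rather than derivedFrom). It is not Metacat's actual model, just an illustration of a derived package pointing at data aggregated elsewhere.

```python
# Triples are (subject, predicate, object) tuples.
# All identifiers below are hypothetical placeholders, not real DataONE PIDs.
ORE_AGGREGATES = "ore:aggregates"
PROV_DERIVED_FROM = "prov:wasDerivedFrom"
PROV_GENERATED_BY = "prov:wasGeneratedBy"
PROV_USED = "prov:used"

# Package published by the original scientist.
raw_package = {
    ("raw_map", ORE_AGGREGATES, "raw_data.csv"),
    ("raw_map", ORE_AGGREGATES, "raw_metadata.xml"),
}

# Package published later by an analyst; it refers back to raw_data.csv.
derived_package = {
    ("viz_map", ORE_AGGREGATES, "plot.png"),
    ("plot.png", PROV_DERIVED_FROM, "raw_data.csv"),
    ("plot.png", PROV_GENERATED_BY, "plotting_activity"),
    ("plotting_activity", PROV_USED, "raw_data.csv"),
}


def aggregated(triples):
    """Return the resources a package aggregates."""
    return {o for _s, p, o in triples if p == ORE_AGGREGATES}


def external_sources(triples, predicate=PROV_DERIVED_FROM):
    """Return derivation sources that live outside this package."""
    local = aggregated(triples)
    return {o for _s, p, o in triples if p == predicate and o not in local}
```

Calling external_sources(derived_package) picks out raw_data.csv, which is aggregated only by the other package. That is the "spanning OREs" situation described above.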


Task #6555: Convert PNG diagrams in ORE Model Expansion doc to plantUML diagrams (Closed, Lauren Walker)

Task #6586: Index PROV relationships (Resolved, Lauren Walker)


#1 Updated by ben leinfelder over 6 years ago

  • Target version set to 2.5.0

Looking at the ORE spec, this is certainly acceptable practice - they just want to make sure that there isn't any orphaned node so that everything can trace back to either the aggregation or one of the aggregated resources within it. I believe we are also allowed to refer to objects that are aggregatedBy other resourceMaps, though I don't know if the derived resource map needs to also state that it "aggregates" the resource that it is deriving products from. Either way, I think this will be great.
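A rough sketch of how the "no orphaned node" condition could be checked, assuming "trace back" means plain connectivity (ignoring edge direction) to the aggregation. The function name find_orphans and the sample identifiers are made up for illustration, and the ORE spec may intend a stricter reading.

```python
from collections import defaultdict, deque


def find_orphans(triples, aggregation):
    """Return nodes that cannot be reached from the aggregation.

    Edges are treated as undirected, which is an assumption about
    what "trace back to the aggregation" should mean.
    """
    neighbours = defaultdict(set)
    for s, _p, o in triples:
        neighbours[s].add(o)
        neighbours[o].add(s)

    # Breadth-first search starting at the aggregation. Aggregated
    # resources are reached through their ore:aggregates edges.
    seen = {aggregation}
    queue = deque([aggregation])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)

    return set(neighbours) - seen


# Hypothetical example: "x" and "y" are connected to each other only.
example = [
    ("map", "ore:aggregates", "a"),
    ("b", "prov:wasDerivedFrom", "a"),
    ("x", "prov:used", "y"),
]
orphans = find_orphans(example, "map")
```

In this example "b" is fine because it connects to the aggregated resource "a", while "x" and "y" would be flagged.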

Another thing to consider is doing ALL the semantic annotation assertions in the OREs. Not that it would be required, but it could be convenient since we already have a good precedent with folks starting to generate OREs for DataONE. My one concern is that the index parser would need to know how to handle the existing ORE packaging assertions as well as any SPARQL-based index processing we would want to do.

#2 Updated by Lauren Walker over 6 years ago

  • Status changed from New to In Progress

A page in the metacat docs has been added to describe changes to Metacat's ORE model.

#3 Updated by Lauren Walker about 6 years ago

The model will need a second revision. Matt and I talked today about the "activities" in our model, which represent programs/scripts. There needs to be a place in the model for "runs."

A run would represent a single execution of a program. It would have properties like a start time, end time, parameters used, etc. Parameters may differ from run to run, especially if one or more functions generate a random number.

A program (e.g. an R script) is not exactly an activity but another entity/data object.

If our model could store runs and distinguish between programs and runs, data output could be reproduced, since each run would record all the information needed to execute the program again with exactly the same parameters.
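The program/run split could look something like the sketch below. The class and field names (Program, Run, rerun_spec, seed) are assumptions for illustration only, not part of the documented model.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Program:
    """A script or program, treated as an entity/data object."""
    identifier: str


@dataclass
class Run:
    """A single execution of a program."""
    program: Program
    started: datetime
    ended: datetime
    parameters: dict = field(default_factory=dict)

    def rerun_spec(self):
        """Return what is needed to execute the same program again."""
        return {
            "program": self.program.identifier,
            "parameters": dict(self.parameters),
        }


# Hypothetical example: one script, two runs with different random seeds.
script = Program("analysis.R")
run_a = Run(script, datetime(2015, 1, 5, 9, 0), datetime(2015, 1, 5, 9, 5),
            {"seed": 42})
run_b = Run(script, datetime(2015, 1, 6, 9, 0), datetime(2015, 1, 6, 9, 4),
            {"seed": 7})
```

Both runs point at the same Program, but each keeps its own times and parameters, so an output can be tied to the run that produced it.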

#4 Updated by ben leinfelder almost 5 years ago

  • Status changed from In Progress to Closed
