Bug #5319

The workflow that archives sensor data into Metacat can upload incorrect data sets when new data arrives

Added by Jing Tao about 8 years ago. Updated about 8 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
sensor-view
Target version:
Start date:
02/22/2011
Due date:
% Done:

0%

Estimated time:
Bugzilla-Id:
5319

Description

I used the sensor simulator to create a data set, then killed the simulator. It created data with timestamps from 2011-02-22 03:16:54 to 2011-02-22 03:17:52.

I ran the workflow and got an EML document with the title:
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-22 03:16:54" and "2011-02-22 03:17:52"

I checked the data file in Metacat and it contains the data from 2011-02-22 03:16:54 to 2011-02-22 03:17:52.

Everything looks good.

Then I ran the sensor simulator again to create some new data, and killed the simulator. It created data with timestamps from 2011-02-22 03:57:35 to 2011-02-22 03:58:33.

I ran the workflow again and two EML documents were uploaded:
1. Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-22 03:16:54" and "2011-02-22 03:58:33"

2. Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-22 03:57:35" and "2011-02-22 03:58:33"

I double-checked the data files in Metacat and found that they match the metadata.

From the titles, we can see that dataset 1 combines dataset 2 with the dataset that was uploaded in the previous run.

So we have duplicated data. The second run should have created only dataset 2.

History

#1 Updated by Jing Tao about 8 years ago

I used the sensor simulator to create data with timestamps from 2011-02-22 04:32:02 to 2011-02-22 04:33:00.

Then I ran the workflow and three EML documents (with their data files) were uploaded to Metacat.

The time intervals are:
1. from 2011-02-22 03:16:54 to 2011-02-22 04:33:00
2. from 2011-02-22 03:57:35 to 2011-02-22 04:33:00
3. from 2011-02-22 04:32:02 to 2011-02-22 04:33:00

Documents 1 and 2 contain duplicated data.
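
A quick way to confirm the duplication is to check the reported time intervals for overlap. The following is only a sketch in Python; `parse` and `overlapping_pairs` are hypothetical helper names and are not part of the workflow. It treats each interval as closed on both ends.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # timestamp format used in the dataset titles


def parse(ts):
    """Parse a timestamp string such as '2011-02-22 03:16:54'."""
    return datetime.strptime(ts, FMT)


def overlapping_pairs(intervals):
    """Return index pairs (i, j), i < j, of closed intervals that share at least one instant."""
    parsed = [(parse(start), parse(end)) for start, end in intervals]
    pairs = []
    for i in range(len(parsed)):
        for j in range(i + 1, len(parsed)):
            (s1, e1), (s2, e2) = parsed[i], parsed[j]
            if s1 <= e2 and s2 <= e1:
                pairs.append((i, j))
    return pairs


# The three intervals reported above (0-based indices in the result).
intervals = [
    ("2011-02-22 03:16:54", "2011-02-22 04:33:00"),
    ("2011-02-22 03:57:35", "2011-02-22 04:33:00"),
    ("2011-02-22 04:32:02", "2011-02-22 04:33:00"),
]
print(overlapping_pairs(intervals))
```

Every pair is reported as overlapping, which matches the observation that the earlier documents re-include data that is also in the later ones.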

#2 Updated by Jing Tao about 8 years ago

Today I fixed the issue that "last update" wasn't persisted.
I ran the workflow again and found:
First time:
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-23 03:56:15" and "2011-02-23 03:58:13"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-23 03:57:15" and "2011-02-23 03:58:13"

Second time:
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-23 03:56:15" and "2011-02-23 04:23:44"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-23 03:57:15" and "2011-02-23 03:58:13"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-23 04:21:46" and "2011-02-23 04:23:44"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-23 04:22:46" and "2011-02-23 04:23:44"

Third time:
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-23 03:56:15" and "2011-02-23 04:38:35"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-23 03:57:15" and "2011-02-23 03:58:13"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-23 04:21:46" and "2011-02-23 04:38:35"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-23 04:22:46" and "2011-02-23 04:23:44"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-23 04:36:37" and "2011-02-23 04:38:35"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-23 04:37:37" and "2011-02-23 04:38:35"

#3 Updated by Jing Tao about 8 years ago

This is a logic issue with two inputs, start time and interval, of DataTurbineActor 3.

The start time was set to the current time, but the interval was set to current time - previous time. We changed it so that the start time is the previous time and the interval is current time - previous time.
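
To make the change concrete, here is a minimal sketch in Python. It assumes that the actor reads the window from `start` to `start + interval`; that interpretation, the function names, and the example times are assumptions for illustration, not the actor's documented behavior.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def window_before_fix(previous, current):
    """Old inputs: start = current time, interval = current - previous."""
    return current, current - previous


def window_after_fix(previous, current):
    """New inputs: start = previous time, interval = current - previous."""
    return previous, current - previous


previous = datetime.strptime("2011-02-25 10:07:22", FMT)
current = datetime.strptime("2011-02-25 10:33:22", FMT)

old_start, old_interval = window_before_fix(previous, current)
new_start, new_interval = window_after_fix(previous, current)

# Under the assumed interpretation, only the new inputs end the window at "current".
print(old_start + old_interval)
print(new_start + new_interval)
```

With the old inputs the window would begin at the current time and extend past it, so it would not line up with the data recorded since the previous run.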

It works. Here are the results after running the workflow 3 times:
1
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-25 10:05:24" and "2011-02-25 10:06:22"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-25 10:06:24" and "2011-02-25 10:07:22"

2
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-25 10:07:22" and "2011-02-25 10:07:22"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-25 10:07:24" and "2011-02-25 10:32:22"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-25 10:32:24" and "2011-02-25 10:33:22"

3
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-25 10:33:22" and "2011-02-25 10:33:22"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-25 10:33:24" and "2011-02-25 10:47:06"
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-25 10:47:08" and "2011-02-25 10:48:06"

But there is still an issue at the boundary:
the data with timestamp 2011-02-25 10:33:22 was uploaded in the second run:
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-25 10:32:24" and "2011-02-25 10:33:22"

However, it was uploaded again in the third run:
Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-25 10:33:22" and "2011-02-25 10:33:22"

You can see that the document Dataset for sensor:"sensor0" at site:"gpp" for time period "2011-02-25 10:33:22" and "2011-02-25 10:33:22" in the third run contains only a single data point.
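
One common way to avoid re-uploading the boundary point is to treat the last uploaded time as an exclusive lower bound. The sketch below only illustrates that idea in Python; `new_samples` is a hypothetical helper, the sample timestamps are chosen for illustration, and this is not necessarily how the workflow handles the boundary.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def new_samples(timestamps, last_uploaded):
    """Keep only timestamps strictly later than the last uploaded time."""
    cutoff = datetime.strptime(last_uploaded, FMT)
    return [ts for ts in timestamps if datetime.strptime(ts, FMT) > cutoff]


# Illustrative values: the first entry equals the last uploaded time.
samples = [
    "2011-02-25 10:33:22",
    "2011-02-25 10:47:08",
    "2011-02-25 10:48:06",
]
print(new_samples(samples, "2011-02-25 10:33:22"))
```

Because the comparison is strict, the point at 10:33:22 is left out of the next upload.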

I dug around and found:

When lastUploadedTime 2011-02-25 10:33:22 was passed to DataTurbineActor 2, three outputs came out:
2011-02-25 10:33:24
2011-02-25 10:47:07
2011-02-25 10:48:07

Actually, there is no data or metadata with the timestamp 2011-02-25 10:33:24.
Is this a bug in DataTurbineActor?

#4 Updated by Jing Tao about 8 years ago

This was a bug in the TimeDifference class. I wrote another class, MetadataRangesDeterminer, to replace it. The issue was fixed by storing the time format in the database.

#5 Updated by Redmine Admin about 6 years ago

Original Bugzilla ID was 5319
