Bug #5582

DataTurbine server crashing the JRE

Added by Derik Barseghian about 7 years ago. Updated almost 7 years ago.

Status: New
Priority: Normal
Category: sensor-view
Target version:
Start date: 03/28/2012
Due date:
% Done: 0%
Estimated time:
Bugzilla-Id: 5582

Description

I don't think this is a Kepler bug but I want to record my notes somewhere:

I was able to get the DataTurbine server to crash the JRE around 10 times today. Sometimes the JRE would leave a log file; I submitted one to Sun. I started on DT 3.2b5 and moved to 3.2b6, but this didn't seem to help. Likewise, I moved the client (Kepler DT actor) from 3.2b5 to 3.2b6, to no avail. I was able to crash the server on both Ubuntu w/ java 1.6.0_26-b03 and OS X 10.6 w/ java 1.6.0_29-b11-402-10M3527. I'm in the process of doing some cleanup on the DT actor, but I verified that these crashes occur without any of the new changes. Generally, the procedure was to run the growingDegreeDays workflow requesting 1 day of data. After about 6-10 iterations of this workflow in succession, the server would crash.

This hasn't been well vetted, but what seems to have "fixed" it is starting the server with a larger archiveSize: I bumped it from 500000 to 1000000. Recently I loaded a lot more data to the reap02 channels, so perhaps that's why this has only recently cropped up. I've now run the GDD workflow requesting 100 days of data 10 times, with no crash w/ the 3.2b6 DT server on my Mac. I'm going to run a long stress test overnight and see what happens.

hs_err_pid2306.log (29.5 KB), Derik Barseghian, 07/27/2012 11:33 AM
hs_err_pid7368.log (25.6 KB), Derik Barseghian, 07/27/2012 11:33 AM
hs_err_pid3520.log (26.7 KB), Derik Barseghian, 07/27/2012 11:35 AM
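
For anyone collecting several of these hs_err_pid logs, a rough Python sketch for pulling out the "Problematic frame" line from each one might help compare crashes. It assumes the usual HotSpot layout, where the log starts with a block of '#' comment lines that includes a "# Problematic frame:" line followed by the frame itself. The function names here are made up for illustration and are not part of DT or Kepler.

```python
import glob
import os


def read_crash_header(path):
    """Return the leading '#' comment lines of a HotSpot fatal error log."""
    header = []
    with open(path, errors="replace") as fh:
        for line in fh:
            if line.startswith("#"):
                header.append(line.rstrip("\n"))
            elif header:
                break  # the header block has ended
    return header


def problematic_frame(header):
    """Return the line following '# Problematic frame:', or None if absent."""
    for i, line in enumerate(header):
        if line.startswith("# Problematic frame:") and i + 1 < len(header):
            return header[i + 1].lstrip("# ").strip()
    return None


def summarize_crash_logs(directory="."):
    """Map each hs_err_pid*.log in `directory` to its problematic frame."""
    pattern = os.path.join(directory, "hs_err_pid*.log")
    return {
        os.path.basename(p): problematic_frame(read_crash_header(p))
        for p in sorted(glob.glob(pattern))
    }
```

Calling summarize_crash_logs() in the directory the server was started from gives one entry per log, which makes it easier to see whether the crashes share a frame.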

History

#1 Updated by Derik Barseghian about 7 years ago

A correction: I changed cacheSize to 1000000, not archiveSize (which my Source sets to cacheSize*10).

With that change, DT 3.2b6 has stopped crashing the JRE on my Mac: I ran 500 iterations of GDD requesting 300 days each time with no failure.

However, the change doesn't help nibbler (Ubuntu 10.04 in a VM w/ 500MB RAM). On nibbler I also tried doubling the cache size to 2000000; it continued to crash. After that I tried recompiling my feeder program and running RBNB under jdk1.6.0_31; it continued to crash. I also tried halving the original size to 50000, but same story.

I tried on my dedicated Ubuntu 10.04 box w/ java 1.6.0_26-b03 and cacheSize 1000000, and successfully ran 500 iterations of GDD requesting 300 days each time.

So the remaining crashing seems specific to nibbler. VM? RAM? I doubt it's java at this point, but to be thorough:

Working Mac:
nceas-macbook05:~ derik$ java -version
java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_29-b11-402-10M3527)
Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02-402, mixed mode)

Working Ubuntu PC:
barseghian@ubuntu:~$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) Client VM (build 20.1-b02, mixed mode, sharing)

Not working nibbler Ubuntu VM:
barseghian@nibbler:~$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

Not working alternative nibbler:
barseghian@nibbler:~$ ./jdk-6u31/jdk1.6.0_31/bin/java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
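
To line these four environments up side by side, something like the following Python sketch could parse the `java -version` text into fields. The listings are copied from above; the function name and the field names are just illustrative, and the regexes only assume the three-line Sun/Apple HotSpot format shown here.

```python
import re

# The `java -version` listings from this comment, keyed by machine.
LISTINGS = {
    "mac (working)": '''java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_29-b11-402-10M3527)
Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02-402, mixed mode)''',
    "ubuntu pc (working)": '''java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) Client VM (build 20.1-b02, mixed mode, sharing)''',
    "nibbler (crashing)": '''java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)''',
    "nibbler jdk-6u31 (crashing)": '''java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)''',
}


def parse_java_version(text):
    """Extract version, runtime build, and VM details from `java -version` output."""
    version = re.search(r'java version "([^"]+)"', text)
    runtime = re.search(r"Runtime Environment \(build ([^)]+)\)", text)
    vm = re.search(r"Java HotSpot\(TM\) (.+?) VM \(build ([^,]+), (.+)\)", text)
    return {
        "version": version.group(1) if version else None,
        "runtime_build": runtime.group(1) if runtime else None,
        "vm": vm.group(1) if vm else None,
        "vm_build": vm.group(2) if vm else None,
        "vm_mode": vm.group(3) if vm else None,
        "is_64bit": bool(vm and "64-Bit" in vm.group(1)),
    }


if __name__ == "__main__":
    for label, text in LISTINGS.items():
        info = parse_java_version(text)
        print(label, info["version"], info["vm"], info["vm_build"], sep=" | ")
```

This only tabulates what the listings already say; it doesn't by itself explain why nibbler crashes and the other machines don't.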

#2 Updated by Derik Barseghian about 7 years ago

Nick allocated 1 GB, and then 2 GB, of RAM to nibbler, but I can still crash DT with a few large requests.

#6 Updated by Derik Barseghian almost 7 years ago

Crashing on nibbler resolves when using an older version of java (jdk-6u20-linux-x64.bin).
There's probably an issue w/ newer versions of 64-bit java and RBNB. See:
https://lists.sdsc.edu/pipermail/rbnb-dev/2012/001028.html

#7 Updated by Derik Barseghian almost 7 years ago

I ran 500 iterations of the GDD workflow, configured to request 300 days of data each time, as a stress test. No crash using the older version of java on nibbler.
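
For reference, the shape of this kind of repeat-until-failure stress run can be sketched in Python as below. This is not how the workflow was actually driven in Kepler; `run_once` stands in for a hypothetical callable that executes one workflow request and raises an exception if the server goes away.

```python
import time


def stress(run_once, iterations, pause_s=0.0):
    """Call run_once() up to `iterations` times, stopping at the first exception.

    Returns (completed_count, error); error is None if every run succeeded.
    """
    for i in range(iterations):
        try:
            run_once()
        except Exception as exc:
            # Report how many runs completed before the failure.
            return i, exc
        if pause_s:
            time.sleep(pause_s)
    return iterations, None
```

A call such as stress(run_gdd_workflow, 500), with run_gdd_workflow being whatever hypothetical function wraps one GDD request, would return (500, None) for a clean run, or the number of completed iterations plus the exception otherwise.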

#8 Updated by Redmine Admin about 6 years ago

Original Bugzilla ID was 5582
