October 21, 2004

In a telephone conference call (participants: M. Jones, Schildhauer, Higgins, Berkley, Spears, Pereira, Zhang, Tao) the status of the BEAM ENM/GARP Kepler pipeline was discussed, with the goal of determining what can and should be completed before the December ENM meeting with domain scientists.

A major change decided on during the conversation was to drop the requirement for a parallelized workflow capable of handling thousands of species. The desire is to develop a more general-purpose method for handling parallel computations, which will take more time to develop than is available before the ENM working group meeting. The plan is thus to get a workflow running that will handle a few species (~10) at spatial resolutions that do not require very large amounts of computer time (i.e., can be run on a single computer in less than 1 day).

Another decision was to drop the effort to do a statistical analysis of GARP output to calculate omission/commission matrices and use this data to find the 'best' rule sets for species distribution predictions. It was noted that this is not currently part of the GARP code being used, but it is included in the new OpenModeler ENM code under development. Reproducing that effort for the Kepler workflow thus seemed to be a low priority, since the existing GARP actor will eventually be replaced with the new code.

Remaining tasks are thus in 3 major areas:

I) Creation of species occurrence data tables
   A) DiGIR data source modification to return tables of Darwin Core long/lat information by species (Rod Spears)
      1) Uses the Data tab to do a search by species
      2) Resulting data source to be dragged to the work area and connected to the ENM workflow
   B) DiGIR data table with lat/long info to be used to create a species occurrence file (CSV long,lat text file) for input to the GARP actor in the workflow.
(Higgins)

II) Collection of environmental data layers, conversion to the *.raw format needed by GARP, and creation of the *.dxl summary file
   A) Layer descriptions
      Layers include current climate layers, DEM physical layers, and climate change projections (plus a mask layer that may clip to a region around the existing occurrence distribution). IPCC (http://ipcc-ddc.cru.uea.ac.uk/asres/baseline/climate_download.html) is the source for current climate data and projected climate changes; Hydro1K (http://edcdaac.usgs.gov/gtopo30/hydro/) is the source for physical layers (e.g., elevation info). Environmental layer data come in widely different resolutions, while the *.raw files input to GARP must all have the same resolution (input layers are binary; one byte per grid point, with data scaled to 0-255).
      Existing *.raw layers used in the Kepler GARP example: 0.1 degree resolution worldwide (3600 x 1800 cells)
      IPCC baseline climate data: 0.5 degree resolution worldwide (720 x 360 cells); data on SRB, described by EML on the EcoGrid (also can be downloaded directly from the IPCC web site); monthly averages included in a single file; custom processing needed to get seasonal/annual averages
      IPCC climate change models: 7 different climate change models, each for 3 different periods; data is a delta from the current values; current values at 1 degree resolution; change data at 4-5 degree resolution; no EML descriptions yet; data can be downloaded from the web
      Hydro1K physical layers: 30 second (~0.0083 deg, ~1 km) resolution, by continent; no EML descriptions yet
   B) Tasks/Actors
      1) Create EML (metadata) for the Hydro1K data and the climate change data; enter it in the EcoGrid and reference data copies on SRB (Deana/Jianting?)
      2) Get SRB references (SRB URLs) in EML working so that climate data can be retrieved via EML data sources (Jing)
      3) Create specialized processing to convert grid information to ASCII row/col data (GRASS ASCII grid format?)
         for each type of layer (Java code for the baseline climate data is done; it converts to a single file for each month, season, or year; for other layers, processing is TBD) (Dan?; Chad - Hydro1K)
      4) Preprocess the climate change difference data so it can be used as layers.
      5) Create a regridding (resampling) actor that takes various layer files in GRASS ASCII format and converts them to a common-resolution output (for GARP input); must handle missing data; what algorithm?; what output format?
      6) Create a rescaling actor that meets the GARP byte output requirements (binary file output); create the layer description needed for layer input in the *.dxl file, which summarizes the GARP input layers.
      7) Build the 'mask' layer. [This can be a simple region (e.g., non-oceans), or we may want to mask to the convex hull around known data points plus some additional area for migration. Will need a convex hull actor plus some means of specifying additional area around the hull. This region is probably defined by a vector; it will need to be converted to a raster.]
      8) Create an actor/composite workflow for building all the layer *.raw files and the *.dxl file that must be used as GARP inputs.

III) Actual GARP calculations
   A) Hook the outputs of I) and II) into the existing GARP actors and produce a rule set and a predicted occurrence map (for a single species)
   B) Use the rule set created in A) to recalculate for the climate change data; prepare maps for comparison
   C) Build a workflow to iterate over several species
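The occurrence-file step in task I.B can be sketched as below: group (species, long, lat) records, such as might come back from a DiGIR Darwin Core query, into one long,lat CSV file per species for input to the GARP actor. This is only an illustrative sketch; the record layout, function name, and file-naming scheme are assumptions, not part of the DiGIR or Kepler design.

```python
import csv
from collections import defaultdict

def write_occurrence_files(records, out_prefix="occurrence"):
    """Group (species, longitude, latitude) records by species and write
    one long,lat CSV file per species.  Returns a dict mapping each
    species name to the path of the file written for it."""
    by_species = defaultdict(list)
    for species, lon, lat in records:
        by_species[species].append((lon, lat))
    paths = {}
    for species, points in by_species.items():
        path = "%s_%s.csv" % (out_prefix, species.replace(" ", "_"))
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(points)   # one "long,lat" row per point
        paths[species] = path
    return paths
```

The species names and coordinates in any use of this would come from the DiGIR search results; nothing here depends on the transport details.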
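Task 5) under II.B leaves the resampling algorithm as an open question. As one candidate, a nearest-neighbor sketch is shown below (resolutions given in degrees per cell); nearest-neighbor has the convenient property that missing-data sentinels pass through unchanged, though it is only one of the possible choices.

```python
def resample_nearest(grid, src_res, dst_res):
    """Resample a row-major 2-D grid from src_res to dst_res (degrees per
    cell) by nearest-neighbor lookup.  Missing-data cells need no special
    treatment: whatever sentinel marks them is carried through as-is."""
    nrows, ncols = len(grid), len(grid[0])
    out_rows = max(1, int(round(nrows * src_res / dst_res)))
    out_cols = max(1, int(round(ncols * src_res / dst_res)))
    out = []
    for i in range(out_rows):
        si = min(nrows - 1, int(i * dst_res / src_res))      # nearest source row
        row = []
        for j in range(out_cols):
            sj = min(ncols - 1, int(j * dst_res / src_res))  # nearest source col
            row.append(grid[si][sj])
        out.append(row)
    return out
```

For example, resample_nearest(layer, 0.5, 0.1) would bring a 0.5 degree layer up to the 0.1 degree resolution of the existing Kepler GARP example layers.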
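For the rescaling actor in task 6) under II.B, a minimal sketch of linearly scaling a layer to one byte per grid point (0-255) and writing it row-major as a binary *.raw file. The convention of writing missing-data cells as a fixed byte value, and returning the min/max for use in the *.dxl layer description, are assumptions of this sketch, not documented GARP requirements.

```python
def rescale_to_raw(grid, path, nodata=None, nodata_byte=0):
    """Linearly rescale a 2-D grid to 0-255 one-byte values and write
    them row-major to a binary *.raw file.  Cells equal to `nodata` are
    written as `nodata_byte` (an assumed convention).  Returns the
    (min, max) of the real data, e.g. for the *.dxl layer description."""
    values = [v for row in grid for v in row if v != nodata]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1                    # avoid divide-by-zero on flat layers
    with open(path, "wb") as f:
        for row in grid:
            f.write(bytes(
                nodata_byte if v == nodata
                else int(round((v - lo) / span * 255))
                for v in row))
    return lo, hi
```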