h1. Workshop Notes 

h1. Breakout Session 1: Metrics brainstorming

* What are you currently using?
* What would you like to use?
* How widely is it used?
* Can it be applied to different biological community datasets (sampling approaches)?
* Is it already coded in R?


 h2. Metrics 

# *Diversity* (all of these are generally available in R, mostly in vegan; see the sketch after this list)
 ## Jaccard index 
 ## Simpson's diversity  
## Shannon's index
 ## Turnover - different ways to calculate 
 ## Dominance  
 ## Evenness 
 ## Richness 
 ## Rank abundance shift 
 ## Proportion of overall diversity 
 ## Beta diversity 
 # *Community metrics/ordination* 
 ## NMDS (vegan) 
 ## PCA (vegan) 
## Bray-Curtis (vegan)
 ## Variance tracking, quantify variability change 
 ## Position in ordination-space 
 # *Spatial* 
 ## patch scale  
 ## spatial autoregression 
 ## Endemism 
 ## Summary of species' positions within their ranges 
## metacommunity statistics
 # *Mechanistic models* 
## MAR; needs a driver matrix; autocorrelation is a problem; mostly applied to freshwater or marine systems (Eli Holmes has state-space MAR implemented in R as the MARSS package: http://cran.r-project.org/web/packages/MARSS/index.html)
## MANOVA (base R; PERMANOVA is available in vegan)
 ## Ecosystem function (e.g. N deposition) 
## interaction population models - interspecific competition (Ben Bolker's book and companion package)
 ## Economically/legally relevant metrics (e.g. Maximum sustainable yield) 
 # *Food webs* 
 ## connectance 
 ## network analysis 
# *Traits/phylogenetic*
 ## functional/phylogenetic diversity 
## species aggregation (functional groups, trophic levels)
 ## phylogenetic dispersion 
 ## Native/exotic 
 ## Phylogeographic history 
 # *Temporal indices* 
 ## species turnover 
 ## rate of return 
 ## Variance ratio 
 ## Mean-variance scaling 
 ## Spectral analysis 
## Regression windows (strucchange)
 ## time series models of abundance -- metric would be parameters of model 
# *Null models*
 # *Comparative analysis of small noise vs large noise systems. What drives differences?* 
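A minimal sketch of how several of the diversity and ordination metrics above map onto vegan calls, using vegan's bundled @dune@ data (any site-by-species abundance matrix would work in its place):

<pre><code class="r">
# Diversity indices and NMDS ordination with the vegan package
library(vegan)
data(dune)  # example site-by-species abundance matrix

# Diversity indices (one value per site/sample)
shannon  <- diversity(dune, index = "shannon")
simpson  <- diversity(dune, index = "simpson")
richness <- specnumber(dune)
evenness <- shannon / log(richness)  # Pielou's evenness

# Bray-Curtis dissimilarity and NMDS; site scores give the
# "position in ordination space" metric
bc   <- vegdist(dune, method = "bray")
nmds <- metaMDS(dune, distance = "bray", k = 2)
scores(nmds)
</code></pre>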

 h2. Coded in R 

 * Richness/diversity metrics: http://cran.r-project.org/web/packages/vegan/index.html 
 * Diversity metrics (alpha, beta, gamma): http://cran.r-project.org/web/packages/vegetarian/index.html 
* Hubbell metrics (unified neutral theory): http://cran.r-project.org/web/packages/untb/index.html
* Leading indicators, variance, autocorrelation, skew, heteroscedasticity (see the sketch below): http://cran.at.r-project.org/web/packages/earlywarnings/index.html

Not yet coded:
 * state-space models and community level resilience 
 * variance components analysis 
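A hedged sketch of the leading-indicator package listed above; it assumes the @generic_ews()@ interface of earlywarnings, and @abund@ is a placeholder abundance series:

<pre><code class="r">
# Rolling-window early-warning indicators on a single time series
library(earlywarnings)
abund <- cumsum(rnorm(200))  # placeholder; substitute a real series
ews <- generic_ews(abund, winsize = 50, detrending = "gaussian")
head(ews)  # rolling autocorrelation, variance, skewness, etc.
</code></pre>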

 h1. Breakout Session 2: Identify research questions 

# Group 1
## Data set transformation to allow computing many metrics
## Time series analysis of community-level metrics; consider higher-frequency data too (earlywarnings R package)
# Group 2
## New R code for capturing climate variance at seasonal and interannual scales, plus residuals (see the sketch after this list)
## R model for analyzing more spatial variability (Eric's LTER project)
## Review of non-stationarity
### Variance partitioning
### Temporal and spatial variance
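One possible base-R starting point for Group 2's seasonal/interannual variance item (a sketch only; @precip@ is a placeholder monthly series):

<pre><code class="r">
# Partition a monthly climate series into seasonal, interannual
# (trend), and residual variance with base R's stl()
precip <- ts(rnorm(240, mean = 50, sd = 10), frequency = 12)
dec    <- stl(precip, s.window = "periodic")
comps  <- dec$time.series      # columns: seasonal, trend, remainder
apply(comps, 2, var)           # variance captured at each scale
</code></pre>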

 h1. Discussion and Feedback: Collaboration Approaches 

 *Most Important Limitations* 

 * Data 
 ** Lack of coordinated long-term measurements 
 ** Time necessary to find data 
 ** Determine usability of data, e.g. stations within a boundary envelope with at least 2 samples over 2 years 
 ** Time necessary to clean data 
 ** Quality control data and deal with problems 
 ** Data sharing permission issues  

 * Workflows 
** Need incentives to document as you work; this would be different if work were pushed to KNB as it progresses and people got credit for that work

 * Collaboration 
** Scattered resources: data and code in different locations; hard to move back and forth, hard to work on the code together, hard to know who is working on which parts of the code
 ** Workspace integration and accessibility 
 ** Project management/tool integration 
 ** Time investment in learning different tools, training needs 
** GitHub is too technical

 *Recommendations* 

 * Data 
** Dataset format: long format with columns for species and count/biomass, plus columns for site (plot, subplot, etc.) and date. Separate table with species names to allow adding functional groups, taxonomic rank, etc. Separate table for site descriptions (manipulations, land use, etc.) (see the sketch after this list)
 ** Gather additional data on biogeochemistry, climate etc. 
 ** Develop standard methods for dealing with outliers, large gaps, species names and spellings 
** Develop standards for classifying data points into aggregated groups
 ** Create library of cleaned data sets that are massaged into one format 
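A minimal base-R sketch of the recommended long format and the reshape into the site-by-species matrix most metrics expect (all column names here are illustrative):

<pre><code class="r">
# Long format: one row per site/date/species observation
comm_long <- data.frame(
  site    = c("A", "A", "B", "B"),
  date    = as.Date(rep("2013-06-01", 4)),
  species = c("sp1", "sp2", "sp1", "sp3"),
  count   = c(10, 3, 7, 1)
)
# Reshape to a wide site-by-species matrix, zero-filling absences;
# species traits and site descriptions live in separate lookup tables
comm_wide <- xtabs(count ~ site + species, data = comm_long)
comm_wide
</code></pre>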

 * Workflows 
 ** Create library of workflows that provide general cleaning routines that can be applied to arbitrary data, possibly interactive with some user input 
 ** Create library of workflows that make reshaping more accessible to people with little coding experience 
** Create library of workflows specifically for dealing with taxonomic names (see the sketch after this list)
 ** Link workflows to publications, e.g., via a website (repository) where scientists can publish citeable workflows (ecologicalworkflows.org, like myexperiment.org, but possibly more agnostic with respect to dependencies/tools that connect to it (package descriptions)) 
 ** Make this repository more accessible by keeping the 'ecology' emphasis, make workflows much more visible in existing repositories (KNB, DataONE) by linking to datasets. 
 ** Create library of workflows for training purposes (e.g. Dan Bunker's R tutorial), link to datasets in a repository 
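A sketch of one simple spelling-cleanup step such a taxonomic-name workflow could include: fuzzy-matching raw names against a master species list with base R's @agrep()@. The names here are illustrative; services like taxize/TNRS (mentioned below) do this more robustly:

<pre><code class="r">
# Fuzzy-match raw species names against a master list
master <- c("Poa pratensis", "Bouteloua gracilis", "Andropogon gerardii")
raw    <- c("Poa pratense", "Andropogon gerardi")
matches <- sapply(raw, function(nm) {
  hit <- agrep(nm, master, max.distance = 0.2, value = TRUE)
  if (length(hit) == 1) hit else NA  # flag ambiguous or missing matches
})
matches
</code></pre>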


 * Collaboration Tool 
 ** Pair programming: changes how you work; divide and conquer worked well 
** Git repository: used successfully in this workshop; some people were familiar with it and could quickly bootstrap its use for others
** Way to replicate or interface with services like Google Open Refine, database constraints, taxize, TNRS
 ** Develop a 'Redmine' that is more useful for academics; becomes the point for integration of multiple tools; also BaseCamp/Trello, Digital notebook environments 
 ** Run workflows, organize outputs, communicate with collaborators 
 ** Ability to couple models at multiple scales (e.g., spatial or temporal scales), scale up computing as well 
 ** Incorporate writing process, version control for documents (Google docs is not sufficient) 
 ** Incorporate mechanisms to maintain social connection even in absence of face to face meetings 

 *Datasets* 

 * small mammal (VCR, SEV) 
 * arthropod data (CAP, KNZ, FCE) 
 * datasets on kelp published in ESA journal 
* Cedar Creek
** species composition data accessible at: http://doi.org/10.6073/pasta/50db8bde41c9ea8b32dfbdde8bb0fad2
** climate data accessible at: http://doi.org/10.6073/pasta/24eb99ad3102cdcb2f8d02de93dd551e

 * PISCO intertidal biodiversity surveys 
 ** Methods: http://cbsurveys.ucsc.edu/sampling/images/dataprotocols.pdf 
 ** Point contact data (percent cover, good for sessile/common spp): https://knb.ecoinformatics.org/m/#view/doi:10.6085/AA/pisco_intertidal.50.6 
 ** Quadrat data (percent cover, good for mobile spp): https://knb.ecoinformatics.org/m/#view/doi:10.6085/AA/pisco_intertidal.52.7 
 ** Swath data (extensive, only select rare species like seastars): https://knb.ecoinformatics.org/m/#view/doi:10.6085/AA/pisco_intertidal.51.6 

 * Konza 
 ** climate data (KNZ headquarters): doi:10.6073/pasta/ac19b27f2c28a63890d59ece32f5116b 
** Konza species composition (belowground experiment for N addition contrasts): doi:10.6073/pasta/b6653594d336bddf9d5f7f72c7d9200c. Note: Konza only collects cover for N addition treatments every 5 years, so we will abandon this dataset for now


 *Detailed notes are on etherpad: https://epad.nceas.ucsb.edu/p/commdyn-20140105*