
Educational Material

4. Mastering Species Distribution Modelling in R

These scripts and video descriptions provide a variety of additional things to think about or try beyond what is available within the BCCVL modelling wizard.

The scripts in R & Jupyter formats are here

HTML version of all four scripts available: EcoCommons_steps_1_to_4

A link to another example of running an SDM in R using Maxent, as well as links to Python scripts for processing remote sensing data from the Microsoft Planetary Computer (passed along to us from NASA), can be found here.


Step 1: This module highlights the many important things to consider when working with occurrence data. It discusses the power of systematically collected data and of representative sampling in both geographic and environmental space, as well as the importance of matching your pseudo-absence or background points to the bias within your dataset.

Examples in R include:
1. Set up your environment
2. Download and filter occurrence data from the Atlas of Living Australia using the galah R package
3. Conduct spatial thinning to reduce record density in areas with extremely high sampling effort
4. Create a bias layer from which background points can be sampled
5. Create a targeted background layer from which background or pseudo-absence points can be sampled (identifying areas where similar species were observed but the target species was not)
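Steps 1–3 of the list above can be sketched as follows. This is a minimal, illustrative sketch, not the workshop's actual script: the species name, email address, filters, and 5 km thinning distance are placeholder choices, and it assumes the galah and spThin packages are installed.

```r
# Sketch only: assumes galah and spThin are installed and that you have
# registered an email address with the Atlas of Living Australia.
library(galah)
library(spThin)

galah_config(email = "you@example.com")  # placeholder email

# 1-2. Download and filter occurrence records (example species and filters)
occ <- galah_call() |>
  galah_identify("Litoria fallax") |>
  galah_filter(year >= 2000) |>
  atlas_occurrences()

occ <- occ[!is.na(occ$decimalLatitude) & !is.na(occ$decimalLongitude), ]

# 3. Spatial thinning: keep records at least 5 km apart. The distance is a
#    modelling choice you should match to your sampling bias and grain size.
thinned <- thin(
  loc.data  = occ,
  lat.col   = "decimalLatitude",
  long.col  = "decimalLongitude",
  spec.col  = "scientificName",
  thin.par  = 5,     # km
  reps      = 1,
  write.files = FALSE,
  locs.thinned.list.return = TRUE
)
```

Because this downloads live ALA data, the record count you get back will differ from run to run.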

The scripts in R & Jupyter formats are here

Slides available here

HTML version of all four scripts available: EcoCommons_steps_1_to_4

Step 2: In this module, we remind modellers to consider spatial autocorrelation, the importance of setting the extent, and the value of using neighbourhood functions to maintain fine resolutions while capturing larger spatial patterns.

R steps include:
1. Match extent, resolution and coordinate reference system
2. Use the focal function to run neighbourhood functions
3. Stack environmental variables
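The three steps above might look like this with the terra package (the workshop scripts may use a different package; the layers, extents and names here are synthetic so the sketch runs without any downloads):

```r
# Illustrative sketch using terra; all rasters are synthetic.
library(terra)

# Two layers that disagree in extent and resolution, as downloads often do
template <- rast(xmin = 0, xmax = 10, ymin = 0, ymax = 10,
                 resolution = 0.1, crs = "EPSG:4326")
coarse   <- rast(xmin = -1, xmax = 11, ymin = -1, ymax = 11,
                 resolution = 0.5, crs = "EPSG:4326")
values(template) <- runif(ncell(template))
values(coarse)   <- runif(ncell(coarse))

# 1. Match extent, resolution and CRS by resampling onto a common template
matched <- resample(coarse, template, method = "bilinear")

# 2. Neighbourhood (focal) function: mean within a 5 x 5 cell window,
#    keeping the fine resolution while capturing a broader spatial pattern
neigh <- focal(template, w = matrix(1, 5, 5), fun = mean, na.rm = TRUE)

# 3. Stack the environmental variables into one multi-layer raster
env_stack <- c(template, matched, neigh)
names(env_stack) <- c("fine_var", "resampled_var", "neigh_mean")
```

Resampling to a template is the simplest way to guarantee all predictors align cell-for-cell before stacking.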

The scripts in R & Jupyter formats are here

Slides available here

HTML version of all four scripts available: EcoCommons_steps_1_to_4

Step 3: In this module we briefly provide an overview of the types of algorithms used in SDM and highlight some considerations when choosing among them.

In R we demonstrate:

1. Fitting a Maxent model and using either bias layers or targeted background layers to select background points.

2. Fitting a BRT model with pseudo-absence points selected from a targeted background layer. We then touch on some of the BRT tuning that can be done, as presented in https://doi.org/10.1111/j.1365-2656.2…

3. Fitting a GLM

4. Fitting a GAM with mgcv and penalised regression splines
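To give a flavour of points 3 and 4, here is a self-contained sketch fitting a GLM and an mgcv GAM to simulated presence/background data. The variable names and coefficients are invented for illustration and are not the workshop's data or settings.

```r
# Toy sketch: GLM and penalised-spline GAM on simulated data.
library(mgcv)

set.seed(42)
n     <- 500
bio1  <- runif(n, 5, 30)      # e.g. mean annual temperature (invented)
bio12 <- runif(n, 200, 2000)  # e.g. annual precipitation (invented)

# Simulate a unimodal response to temperature
p    <- plogis(-8 + 0.9 * bio1 - 0.02 * bio1^2 + 0.001 * bio12)
pres <- rbinom(n, 1, p)
dat  <- data.frame(pres, bio1, bio12)

# GLM: the quadratic term lets a linear model capture the unimodal response
m_glm <- glm(pres ~ bio1 + I(bio1^2) + bio12,
             family = binomial, data = dat)

# GAM: penalised regression splines with REML smoothness selection,
# so the wiggliness is estimated rather than fixed in advance
m_gam <- gam(pres ~ s(bio1) + s(bio12),
             family = binomial, data = dat, method = "REML")
```

The GAM's `s()` terms recover the curved temperature response without you having to specify its shape, which is the main practical advantage over the GLM here.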

The scripts in R & Jupyter formats are here

Slides available here

HTML version of all four scripts available: EcoCommons_steps_1_to_4

Step 4: In this module we highlight how to evaluate your model, stressing the importance of visual and expert review. We explain the confusion matrix and how it is used to generate a variety of evaluation statistics, as well as AUC and the value of using independent data for validation.

R steps using a BRT model example include:
1. Identify a threshold to generate binary (presence / absence) predictions
2. Generate a confusion matrix for training data
3. Generate evaluation statistics from the confusion matrix, including TSS & F1
4. Download independent data
5. Repeat the confusion matrix and evaluation statistics with independent data
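The confusion-matrix arithmetic behind steps 1–3 can be shown with a small base R example. The observations, probabilities and the 0.5 threshold below are made up for illustration; in practice you would choose the threshold deliberately, for example to maximise TSS.

```r
# Self-contained sketch: confusion matrix, TSS and F1 from toy data.
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)                # observed pres/abs
prob <- c(0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1)  # model output

threshold <- 0.5                      # placeholder threshold
pred <- as.integer(prob >= threshold) # binary predictions

# Confusion matrix cells
tp <- sum(pred == 1 & obs == 1)   # true presences
fp <- sum(pred == 1 & obs == 0)   # false presences
fn <- sum(pred == 0 & obs == 1)   # missed presences
tn <- sum(pred == 0 & obs == 0)   # true absences

sensitivity <- tp / (tp + fn)     # 0.75
specificity <- tn / (tn + fp)     # 0.833...
tss <- sensitivity + specificity - 1            # TSS ~= 0.583
precision <- tp / (tp + fp)
f1 <- 2 * precision * sensitivity / (precision + sensitivity)  # F1 = 0.75
```

Repeating exactly this calculation on independent data (step 5) tells you how much the training-data statistics flattered the model.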

The scripts in R & Jupyter formats are here

Slides available here

HTML version of all four scripts available: EcoCommons_steps_1_to_4

Our partners

 

  • Australian Research Data Commons
  • National Collaborative Research Infrastructure Strategy
  • EcoCommons Australia received investment (https://doi.org/10.47486/PL108) from the Australian Research Data Commons (ARDC). The ARDC is funded by the National Collaborative Research Infrastructure Strategy (NCRIS).
