# Educational Material

Videos, links to PowerPoint presentations, and code for running models.

Additional written support materials can be found here.

The following tutorials are included below:

1. Practical guides to using the EcoCommons modelling dashboard tools

2. BCCVL overview of Species Distribution Modelling

3. R for Ecologists, an introduction to writing code in R

4. Species distribution modelling in R – learn how to go beyond the dashboard tools

### A practical guide to using EcoCommons dashboard modelling tools

### BCCVL overview of Species Distribution Modelling

An overview of SDM and each of the algorithms used

Welcome to the first module of this species distribution modelling course. In this module, we will give you an introduction to what species distribution modelling is and the steps involved in calibrating and mapping these models.

In the first module of this species distribution modelling course, we had a quick look at what species distribution modelling is. In this second module, I am going to explain the theory behind these models in a bit more detail.

In the first two modules we gained a better understanding of how you can use species distribution models and some of the ecological theory underpinning these models. In this module, we will have a closer look at the different types of data that you need to run a species distribution model, where to get this data from, things to be aware of and some standard good practices when dealing with data.

Now that we know more about the theoretical background of species distribution models, and the different types of data that you need to build a model, it is time to design your model. By that I mean that you need to think about the question you are trying to answer, and what kind of data and algorithm are best suited to finding that answer. In this module we explore the main components of an SDM and the things you need to think about.

In the previous modules of this online open course in species distribution modelling, we have learnt a great deal about the kind of data that you need to build a model, and we had a quick look at the different kinds of models that you can use. So, now it is time to explain the algorithms in more detail. In this module we will look at models that only use presence data: geographic and profile models, and Maxent, a popular machine learning model.

In the previous module, we looked at models for which you only need to provide occurrence data to predict the distribution of a species. In this module, we will focus on statistical regression models, which use both presence and absence data.

In the previous two modules we have looked at geographic, profile and statistical regression models to predict species distributions. In this module, we will look at another group of species distribution models: machine learning models.

Now that we have looked at the different models you can use to predict species distributions, it is important to understand how to interpret the output of a model. A vital step in modelling is assessing the accuracy of the model prediction, commonly called ‘validation’ or ‘evaluation’.

Welcome to module 9 of the online open course about species distribution modelling. In the first 8 modules of this course, we have learnt about the different aspects of designing a species distribution model: the data that you need, the different algorithms that you can use to predict species distributions and how to evaluate the outcomes of your model. With all that knowledge in our back pocket, we can now look at an important application of species distribution models: the prediction of species distribution under future climate change projections.

Welcome to the last module of this Online Open Course in Species Distribution Modelling. In this module, I am going to show you four different case studies that highlight the variety of research questions and applications that can be addressed with species distribution models. And I will show you how you can run these models in the BCCVL, the Biodiversity and Climate Change Virtual Laboratory, an online tool that lets you run species distribution models in a few easy steps.

### R for Ecologists

An introduction to R for participants with no experience programming in R.

Based on this module: https://datacarpentry.org/R-ecology-lesson/

In this module we show you how you can use R to manipulate your data.

- 0:00 How to open R on a high performance computer in the cloud
- 2:10 Setting up your working directory
- 4:30 Download files and read in .csv files
- 7:30 Selecting columns and filtering rows
- 9:38 Using the pipe operator
- 14:52 Using mutate to create new columns based on existing ones
- 22:24 Summarise and group_by function
- 37:50 Exporting .csv files
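The workflow in the timestamps above can be sketched as a short dplyr pipeline. The file and column names below follow the Data Carpentry surveys dataset linked above and are placeholders for your own data.

```r
library(dplyr)

surveys <- read.csv("data/surveys.csv")            # read in a .csv file

surveys_small <- surveys %>%
  select(species_id, weight, year) %>%             # select columns
  filter(!is.na(weight)) %>%                       # filter rows
  mutate(weight_kg = weight / 1000)                # new column from existing one

mean_weights <- surveys_small %>%
  group_by(species_id) %>%                         # group by species
  summarise(mean_weight = mean(weight))            # summarise per group

# export the result as a .csv file
write.csv(mean_weights, "data_output/mean_weights.csv", row.names = FALSE)
```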

Find the R script here

In this second module of R for ecologists we show you how to visualize your data using the package ggplot2.

- Skip to 7:18 if you have watched the first module.
- 0:00 – Setting up working directory and recap on module 1
- 7:18 – Start with ggplot2
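The basic ggplot2 pattern covered from 7:18 looks like the sketch below; the data frame and column names again follow the Data Carpentry surveys dataset and are placeholders.

```r
library(ggplot2)

surveys <- read.csv("data/surveys.csv")   # placeholder file name

# Build a plot in layers: data and aesthetics first, then a geometry
ggplot(surveys, aes(x = weight, y = hindfoot_length, colour = species_id)) +
  geom_point(alpha = 0.3) +
  labs(x = "Weight (g)", y = "Hindfoot length (mm)")
```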

You can find the R script here

### Species Distribution Modelling in R

These scripts and video descriptions provide a variety of additional things to either think about or try beyond what is available within the BCCVL modelling wizard.

The scripts in R & Jupyter formats are here

HTML version of all four scripts available: EcoCommons_steps_1_to_4

A link to another example of running an SDM in R using Maxent, as well as links to Python scripts (passed along to us from NASA) for processing remote sensing data from the Microsoft Planetary Computer, can be found here.


Step 1: This module highlights the many important things to consider when working with occurrence data. We discuss the power of systematically collected data, the importance of representative sampling in both geographic and environmental space, and the need to match your pseudo-absence or background points to the bias within your dataset.

Examples in R include:

1. Set up your environment

2. Download and filter occurrence data from the Atlas of Living Australia using the galah R package

3. Conduct spatial thinning on your data to reduce over-representation of areas with extremely high levels of sampling.

4. Create a bias layer from which background points can be sampled.

5. Create a targeted background layer from which background or pseudo-absence points can be sampled (identifying areas where similar species were observed, but where the target species was not observed).
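Items 1–2 above can be sketched with the galah package. The email, species name and filters below are illustrative placeholders, not the values used in the course scripts.

```r
library(galah)

# galah requires the email you registered with the Atlas of Living Australia
galah_config(email = "you@example.org")

# Download filtered occurrence records for an example species
occ <- galah_call() |>
  galah_identify("Litoria fallax") |>                       # placeholder species
  galah_filter(year >= 2000,
               basisOfRecord == "HUMAN_OBSERVATION") |>     # example filters
  atlas_occurrences()
```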

The scripts in R & Jupyter formats are here

Slides available here

HTML version of all four scripts available: EcoCommons_steps_1_to_4

Step 2: In this module, we remind modellers to consider spatial autocorrelation, the importance of setting the extent, and the value of using neighbourhood functions to maintain fine resolutions while capturing larger spatial patterns.

R steps include:

1. Match the extent, resolution and coordinate reference system of your layers

2. Use the focal function to run neighbourhood analyses

3. Stack environmental variables
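A minimal sketch of these three steps using the terra package; the file and layer names are placeholders.

```r
library(terra)

temp <- rast("temperature.tif")   # placeholder rasters
rain <- rast("rainfall.tif")

# 1. Match extent, resolution and CRS by resampling one layer onto the other's grid
rain <- resample(rain, temp)

# 2. Neighbourhood (focal) function: mean within a 5 x 5 cell moving window,
#    keeping the fine resolution while capturing a larger spatial pattern
temp_smooth <- focal(temp, w = 5, fun = mean, na.rm = TRUE)

# 3. Stack the environmental variables into one multi-layer raster
env <- c(temp_smooth, rain)
names(env) <- c("temp_5x5_mean", "rain")
```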

The scripts in R & Jupyter formats are here

Slides available here

HTML version of all four scripts available: EcoCommons_steps_1_to_4

Step 3: In this module we briefly provide an overview of the types of algorithms used in SDM and highlight some of the things to consider.

In R we demonstrate:

1. Fitting a Maxent model and using either bias layers or targeted background layers to select background points.

2. Fitting a BRT model with pseudo-absence points selected from a targeted background layer. We then touch on some of the BRT tuning that can be done, as presented in https://doi.org/10.1111/j.1365-2656.2…

3. Fit a GLM

4. Fit a GAM with mgcv and penalised regression splines
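The regression models in items 3–4 can be sketched as below. This is a hedged outline only: `pa` is assumed to be a data frame with a 0/1 response column `occ` and environmental covariates `temp` and `rain` extracted at each point, which are placeholder names.

```r
library(mgcv)

# 3. GLM with linear and quadratic terms for a binary response
glm_fit <- glm(occ ~ temp + I(temp^2) + rain,
               family = binomial, data = pa)

# 4. GAM with penalised regression splines (mgcv's default thin-plate smooths)
gam_fit <- gam(occ ~ s(temp) + s(rain),
               family = binomial, method = "REML", data = pa)

# Predicted occurrence probability at the training points
p <- predict(gam_fit, type = "response")
```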

The scripts in R & Jupyter formats are here

Slides available here

HTML version of all four scripts available: EcoCommons_steps_1_to_4

Step 4: In this module we highlight how to evaluate your model, stressing the importance of visual and expert review. We explain the confusion matrix and how it is used to generate a variety of evaluation statistics, as well as AUC and the value of using independent data for validation.

R steps using a BRT model example include:

1. Identifying a threshold to generate binary (presence / absence) predictions

2. How to generate a confusion matrix for training data

3. Generate evaluation statistics from a confusion matrix including TSS & F1

4. Demonstrate downloading independent data

5. Repeat confusion matrix and evaluation statistics with independent data
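Steps 1–3 above can be computed by hand in base R, which makes the confusion matrix explicit. The `obs` and `pred` vectors below are tiny made-up placeholders standing in for observed presences/absences and model-predicted probabilities.

```r
obs  <- c(1, 1, 1, 0, 0, 0, 0, 1)            # observed presence (1) / absence (0)
pred <- c(0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.3, 0.8)   # predicted probabilities

# 1. Apply a threshold to get binary presence/absence predictions
threshold <- 0.5                              # e.g. chosen to maximise TSS
bin <- as.integer(pred >= threshold)

# 2. Confusion matrix counts
TP <- sum(bin == 1 & obs == 1)                # true positives
FP <- sum(bin == 1 & obs == 0)                # false positives
FN <- sum(bin == 0 & obs == 1)                # false negatives
TN <- sum(bin == 0 & obs == 0)                # true negatives

# 3. Evaluation statistics derived from the confusion matrix
sensitivity <- TP / (TP + FN)                 # true positive rate
specificity <- TN / (TN + FP)                 # true negative rate
TSS <- sensitivity + specificity - 1          # true skill statistic

precision <- TP / (TP + FP)
F1 <- 2 * precision * sensitivity / (precision + sensitivity)
```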

The scripts in R & Jupyter formats are here

Slides available here

HTML version of all four scripts available: EcoCommons_steps_1_to_4

## Our partners

- EcoCommons Australia received investment (https://doi.org/10.47486/PL108) from the Australian Research Data Commons (ARDC). The ARDC is funded by the National Collaborative Research Infrastructure Strategy (NCRIS).