# Example workflow of STEPCAM

#### 2016-09-21

The aim of this vignette is to show the user what the STEPCAM workflow looks like, and to provide a worked-out example, including output.

## What is STEPCAM?

STEPCAM is a STEPwise Community Assembly Model. It uses approximate Bayesian computation (ABC) to infer the relative contributions of Dispersal Assembly (DA), Habitat Filtering (HF) and Limiting Similarity (LS) from a combination of abundance and trait data. Starting from the full species pool, STEPCAM removes species in a stepwise fashion in order to recover the trait distribution observed in the focal community. Which species is removed at each step depends on the relative contributions of dispersal assembly, habitat filtering, and limiting similarity. At a dispersal assembly removal step, a species is removed with probability inversely proportional to its abundance across all communities. At a habitat filtering removal step, the species most dissimilar from the community's optimal trait value is removed. At a limiting similarity removal step, the species most similar to the other remaining species is removed. The ABC framework uses a sequential Monte Carlo scheme (ABC-SMC) to estimate the relative contributions of DA, HF and LS. The output is a posterior distribution across the three parameters of interest.
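As a toy illustration of the three removal rules (this is a minimal sketch for a single trait, not STEPCAM's actual implementation; all variable names here are hypothetical):

```r
# Toy sketch of the three removal rules on a single trait.
# 'abund' is each species' total abundance across all communities,
# 'trait' its trait value, and 'optimum' the community's optimal trait value.
set.seed(1)
abund   <- c(10, 8, 6, 4, 2)
trait   <- c(0.1, 0.4, 0.5, 0.6, 0.9)
optimum <- 0.5

# Dispersal assembly: removal probability inversely proportional to abundance
da_remove <- sample(seq_along(abund), 1, prob = 1 / abund)

# Habitat filtering: remove the species farthest from the optimal trait value
hf_remove <- which.max(abs(trait - optimum))

# Limiting similarity: remove the species most similar to the others,
# here taken as the one closest to its nearest neighbour in trait space
d <- as.matrix(dist(trait))
diag(d) <- Inf
ls_remove <- which.min(apply(d, 1, min))
```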

## Required data

Although in typical use the user will have their own data available, for clarity and portability we will generate artificial data and use that to perform the STEPCAM analysis. Using the function generate.Artificial.Data we generate a dataset with 3 traits, 10 plots and 10 species:

    library(STEPCAM)
    set.seed(42)
    simul_data <- generate.Artificial.Data(n_species = 10, n_traits = 3,
                                           n_communities = 10,
                                           occurence_distribution = 0.5,
                                           average_richness = 0.8,
                                           sd_richness = 0.25,
                                           mechanism_random = FALSE)

For the sake of this vignette we want to display the ability of STEPCAM to estimate habitat filtering. Habitat filtering causes species with trait values that deviate too much from the local optimum to be selected against in a community, or in a plot. Because generate.Artificial.Data generates random trait data, for our example it is more convenient to set specific trait values that will aid in detecting habitat filtering:

    data_species <- simul_data$traits
    data_species$trait1 <- c(runif(8, 0, 1), 5, 10)
    data_species$trait2 <- c(runif(8, 0, 1), -10, 30)
    data_species$trait3 <- c(runif(8, 0, 1), -20, 40)

The three traits used here do not represent any specific trait, but we can clearly see that the last two entries (species9 and species10) have values that deviate strongly from those of the rest of the species:

    data_species
    ##      species      trait1       trait2      trait3
    ## 1   species1  0.42137842   0.14957900   0.7987603
    ## 2   species2  0.06763682   0.49927288   0.1103351
    ## 3   species3  0.56143796   0.94056488   0.5397983
    ## 4   species4  0.07072189   0.33423133   0.5712339
    ## 5   species5  0.21139194   0.18843433   0.6189515
    ## 6   species6  0.54962041   0.26971618   0.7148537
    ## 7   species7  0.48198145   0.53074408   0.1233006
    ## 8   species8  0.15946985   0.02145023   0.3110496
    ## 9   species9  5.00000000 -10.00000000 -20.0000000
    ## 10 species10 10.00000000  30.00000000  40.0000000
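We can verify numerically that species9 and species10 are trait outliers, for instance by computing each species' Euclidean distance to the centroid of all species in trait space (a simple standalone check, not part of STEPCAM itself; the trait data are rebuilt here so the snippet is self-contained, and the exact random draws for the first eight species do not matter):

```r
# Rebuild trait data as in the example above: eight species with traits
# drawn on [0, 1], plus two species with strongly deviating trait values
set.seed(42)
data_species <- data.frame(species = paste0("species", 1:10),
                           trait1 = c(runif(8, 0, 1), 5, 10),
                           trait2 = c(runif(8, 0, 1), -10, 30),
                           trait3 = c(runif(8, 0, 1), -20, 40))

traits   <- as.matrix(data_species[, c("trait1", "trait2", "trait3")])
centroid <- colMeans(traits)

# Euclidean distance of each species to the trait-space centroid
dists <- sqrt(rowSums(sweep(traits, 2, centroid)^2))
names(dists) <- data_species$species

# species9 and species10 are by far the most distant species
sort(dists, decreasing = TRUE)[1:2]
```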

To complete the construction of our artificial dataset, we have to set the abundance data to something a bit more meaningful. Our focus here is on plot 1, and for plot 1 we want to "trick" the dataset into showing strong habitat filtering. To do so, we include all species except species 9 and 10:

    data_abundances <- simul_data$abundances
    for (i in 1:8) {
      data_abundances[1, i] <- 1
    }
    data_abundances[1, 9] <- 0
    data_abundances[1, 10] <- 0
    data_abundances
    ##    species1 species2 species3 species4 species5 species6 species7 species8
    ## 1         1        1        1        1        1        1        1        1
    ## 2         0        0        1        0        1        1        1        1
    ## 3         1        1        1        1        1        1        1        1
    ## 4         1        0        0        0        0        1        1        1
    ## 5         0        0        0        1        1        1        1        0
    ## 6         1        0        1        1        1        1        1        1
    ## 7         1        0        0        1        1        1        1        0
    ## 8         1        1        1        1        1        1        1        1
    ## 9         1        0        1        1        0        1        1        1
    ## 10        1        1        1        1        1        1        1        1
    ##    species9 species10
    ## 1         0         0
    ## 2         1         1
    ## 3         1         1
    ## 4         1         1
    ## 5         0         1
    ## 6         1         1
    ## 7         0         1
    ## 8         1         1
    ## 9         0         1
    ## 10        1         1

We now have all the basic ingredients to start a STEPCAM analysis. What STEPCAM basically does is pick a random combination of parameters (DA, HF, LS) and simulate the stepwise removal (especially for a high relative contribution of DA, the outcome can be very stochastic). After the stepwise removal, the resulting species pool is used to compute four trait-distribution summary statistics, which in turn are compared with the values obtained for the original data. If they are similar enough, the parameter combination is accepted, and the algorithm moves on. As you will see, the algorithm goes through a series of iterations. At each iteration, the similarity threshold is lowered (i.e. the difference between the simulated and observed data has to be smaller). As the threshold is lowered, the acceptance rate typically drops, and good results are best obtained at very low acceptance rates, if computation time permits.
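To build intuition for this accept/reject logic, here is a minimal generic ABC rejection sketch on a toy problem (estimating the mean of a normal distribution from a single summary statistic). This is not STEPCAM's ABC-SMC implementation, which additionally shrinks the threshold over a sequence of iterations:

```r
# Minimal ABC rejection sketch: accept parameter draws whose simulated
# summary statistic (the sample mean) is close enough to the observed one.
set.seed(123)
observed <- rnorm(50, mean = 2, sd = 1)
obs_stat <- mean(observed)

threshold <- 0.1
accepted  <- c()
while (length(accepted) < 200) {
  theta <- runif(1, -5, 5)                     # draw from the prior
  sim   <- rnorm(50, mean = theta, sd = 1)     # simulate data
  if (abs(mean(sim) - obs_stat) < threshold) { # compare summary statistics
    accepted <- c(accepted, theta)
  }
}

mean(accepted)  # posterior mean, close to the true value of 2
```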

### Starting STEPCAM

    output <- STEPCAM_ABC(data_abundances, data_species,
                          numParticles = 100, n_traits = 3, plot_number = 1,
                          stopRate = 0.05, stop_at_iteration = 12,
                          continue_from_file = FALSE)
    ##
    ## Generating Particles for iteration    1
    ## 0--------25--------50--------75--------100
    ## *****************************************  0.59 0.74 0.67     accept rate =  1
    ##
    ## Generating Particles for iteration    2
    ## 0--------25--------50--------75--------100
    ## *****************************************  0.56 0.75 0.69     accept rate =  1
    ##
    ## Generating Particles for iteration    3
    ## 0--------25--------50--------75--------100
    ## *****************************************  0.63 0.9 0.47      accept rate =  1
    ##
    ## Generating Particles for iteration    4
    ## 0--------25--------50--------75--------100
    ## *****************************************  0.46 1.07 0.47     accept rate =  0.7769231
    ##
    ## Generating Particles for iteration    5
    ## 0--------25--------50--------75--------100
    ## *****************************************  0.09 1.91 0    accept rate =  0.3136646
    ##
    ## Generating Particles for iteration    6
    ## 0--------25--------50--------75--------100
    ## *****************************************  0.07 1.93 0    accept rate =  0.6778523
    ##
    ## Generating Particles for iteration    7
    ## 0--------25--------50--------75--------100
    ## *****************************************  0.08 1.92 0    accept rate =  0.6558442
    ##
    ## Generating Particles for iteration    8
    ## 0--------25--------50--------75--------100
    ## *****************************************  0.1 1.9 0      accept rate =  0.6392405
    ##
    ## Generating Particles for iteration    9
    ## 0--------25--------50--------75--------100
    ## *****************************************  0.07 1.93 0    accept rate =  0.6516129
    ##
    ## Generating Particles for iteration    10
    ## 0--------25--------50--------75--------100
    ## *****************************************  0.02 1.98 0    accept rate =  0.7112676
    ##
    ## Generating Particles for iteration    11
    ## 0--------25--------50--------75--------100
    ## *****************************************  0.06 1.94 0    accept rate =  0.7372263
    ##
    ## Generating Particles for iteration    12
    ## 0--------25--------50--------75--------100
    ## *  50.5 50.5 50.5     accept rate =  1

The results above show that the algorithm ran for 12 iterations; it stopped because it reached stop_at_iteration = 12, not because the acceptance rate of newly proposed parameter values dropped below 1 in 20 (stopRate = 0.05). Next to the progress bars, the simulation provides some intermediate output: the mean estimates for DA, HF and LS, and the acceptance rate. Furthermore, we see that while the acceptance rate is still 1, the relative contributions of our three parameters of interest are more or less equal to the prior. What is the prior, then? The prior in this case is the non-informative starting point that STEPCAM takes as the parameter combination in the first iteration. In our case we remove two species in a stepwise fashion (remember that we set the presence of two species to 0), and hence the initial expected frequency of the three processes is 2 * (1/3, 1/3, 1/3), which is about (0.67, 0.67, 0.67). From iteration 4 onwards we see that the acceptance rate starts to drop, and from iteration 5 the relative contributions of the three processes clearly deviate from an equal distribution. By iteration 10 we have a very nice result of (0.02, 1.98, 0), which is very much in line with our expectation of (0, 2, 0) for (DA, HF, LS)! STEPCAM has thus correctly inferred the relative contributions of the three processes.

We can plot our results in a ternary plot:

    TernPlot(output)

Because we have only 2 species to remove, only a limited number of combinations of (DA, HF, LS) are possible, hence the low number of points in the ternary plot. For a more extensive example of the plotting abilities of the STEPCAM package, please see the relevant vignette.