workflow

Thijs Janzen

8/12/2021

NodeSub workflow

Given a tree, we can generate a NodeSubstitution alignment as follows (please note that for this example, we use an extremely short sequence length to reduce simulation time. Such short sequence lengths might not be appropriate for more serious applications.)

seq_length <- 100
sub_rate <- 1 / seq_length

input_tree <- TreeSim::sim.bd.taxa(n = 100,
                                   numbsim = 1,
                                   lambda = 1,
                                   mu = 0.1,
                                   complete = TRUE)[[1]]

target_alignment <- sim_unlinked(phy = input_tree,
                                 rate1 = sub_rate,
                                 rate2 = sub_rate,
                                 l = seq_length,
                                 node_time = 0.3)

We can then proceed to create a Twin alignment (e.g. an alignment with the exact same number of accumulated substitutions, given the same tree, but using a ‘normal’ substitution model)

comp_alignment <- create_equal_alignment(
                        input_tree = geiger::drop.extinct(input_tree),
                        # can only work on trees without extinct branches
                        sub_rate = sub_rate,
                        alignment_result = target_alignment)

Now we have two alignments, one generated using the Node Substitution model, and one using standard strict clock model. For both we can perform phylogenetic inference to get the resulting tree. The aim is to compare the posterior distribution of trees with the true tree that we started with ‘input_tree’, and estimate the error invoked by the node substitution model.

if (beastier::is_beast2_installed()) {

  node_posterior <- infer_phylogeny(target_alignment$alignment,
                                    "node_posterior",
                                    clock_prior = beautier::create_strict_clock_model(clock_rate_param =    beautier::create_clock_rate_param(value = sub_rate)),
                                    chain_length = 1e5,
                                    burnin = 0.1,
                                    working_dir = getwd(),
                                    sub_rate = sub_rate)

  reference_posterior <- infer_phylogeny(comp_alignment$alignment,
                                         "reference_posterior",
                                         burnin = 0.1,
                                         clock_prior = beautier::create_strict_clock_model(clock_rate_param = beautier::create_clock_rate_param(value = comp_alignment$adjusted_rate)),
                                         chain_length = 1e5,
                                         working_dir = getwd,
                                         sub_rate = sub_rate)
}
## Warning in infer_phylogeny(target_alignment$alignment, "node_posterior", : WARNING! MCMC chain not converged!
## remaining 2 
## total_trees 21
## Warning in infer_phylogeny(comp_alignment$alignment, "reference_posterior", : WARNING! MCMC chain not converged!
## remaining 2 
## total_trees 21

Having two posterior distributions of trees, we can compare them based on summary statistics (note 09-14-2023: the beta statistic is no longer supported, as the calculating package was removed from CRAN).

if (beastier::is_beast2_installed()) {
node_stats <- calc_sum_stats(node_posterior$all_trees,
                             input_tree)
ref_stats  <- calc_sum_stats(reference_posterior$all_trees,
                            input_tree)

node_stats$differences$method <- "node_sub"
ref_stats$differences$method  <- "reference"

results <- rbind(node_stats$differences,
                 ref_stats$differences)
}
## Warning in calc_sum_stats(node_posterior$all_trees, input_tree): Found extinct lineages, removed these from tree
## Warning: `calc_beta()` was deprecated in nodeSub 1.2.7.
## ℹ Unfortunately, the original package calculating beta is no longer on CRAN
##   (apTreeshape), please install the package treestats from
##   https://github.com/thijsjanzen/treestats as a temporary fix
## ℹ The deprecated feature was likely used in the nodeSub package.
##   Please report the issue at <https://github.com/thijsjanzen/nodeSub>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, will retire in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## The sp package is now running under evolution status 2
##      (status 2 uses the sf package in place of rgdal)
## Warning in calc_sum_stats(reference_posterior$all_trees, input_tree): Found extinct lineages, removed these from tree

We can now visualize the differences.

if (beastier::is_beast2_installed()) {
  results %>%
  tidyr::gather(key = "statistic", value = "val", -method) %>%
  dplyr::filter(statistic != "num_tips") %>%
  ggplot(aes(x = method, y = val)) +
    geom_boxplot() +
    facet_wrap(~statistic, scales = "free")
}

We find that the beta, gamma and nLTT statistic yield very similar results. However, the JSD, crown age and mean branch length statistics indicate that when using the node substitution model, errors arise, where the crown age of the trees inferred from the node substitution alignment is too high, resulting in a higher mean branch length and a difference in the JSD statistic.