# Testing RRphylo methods overfit

## overfitRR basics

Methods using a large number of parameters risk being overfit. This usually translates in poor fitting with data and trees other than the those originally used. With RRphylo methods this risk is usually very low. However, the user can assess how robust the results got by applying search.shift, search.trend, search.conv, or PGLS_fossil are by running overfitRR. With the latter, the original tree and data are subsampled by specifying a s parameter, that is the proportion of tips to be removed from the tree. Internally, overfitRR further shuffles the tree by using the function swapONE. Thereby, both the potential for overfit and phylogenetic uncertainty are accounted for straight away.

overfitRR always takes an object generated by RRphylo and all the data used to produce it (besides necessary phenotypic data, any other argument such as covariate, predictor, and so on, passed to RRphylo). The arguments s and swap.args can be used to set the intensity of subsampling and phylogenetic alterations to be applied. Depending on which tool is under testing, the user supplies to the funcion one or more among trend.args, shift.args, and conv.args, each of them being a list of arguments specific to the namesake function (see the examples below).

## Results

The output of overfitRR is a RRphyloList object whose elements are different depending on the case under testing (see below).

### search.trend results

When testing for search.trend robustness, overfitRR returns results for both the entire tree and specific clades if indicated ($trend.results). Results for the entire tree (tree) summarize the proportion of simulations producing significant and positive (p.slope+) or significant and negative (p.slope-) trends in either phenotypes or absolute rates versus time regressions. Such evaluations is based on p.random only (see Temporal trends on the entire tree,for further details). When specific clades are under testing, the same set of results as for the whole tree is returned for each node (node). In this case, for phenotype versus age regression through nodes, the proportion of significant and positive/negative slopes (p.slope+ and p.slope-) is accompanied by the same figures for the estimated marginal mean differences (p.emm+ and p.emm-). As for the temporal trend in absolute rates through node, the proportion of significant and positive/negative estimated marginal means differences (p.emm+ and p.emm-) and the same figure for slope difference (p.slope+ and p.slope-) are reported (see Temporal trends at clade level). Finally when more than one node is tested, the $trend.results object also includes results for the pairwise comparison between nodes.

## Guided examples

library(ape)

# load the RRphylo example dataset including Ornithodirans tree and data
DataOrnithodirans$treedino->treedino # phylogenetic tree DataOrnithodirans$massdino->massdino # body mass data
DataOrnithodirans$statedino->statedino # locomotory type data ### Testing search.shift # perform RRphylo Ornithodirans tree and data RRphylo(tree=treedino,y=massdino)->dinoRates # perform search.shift under both "clade" and "sparse" condition search.shift(RR=dinoRates, status.type= "clade",filename=tempdir())->SSnode search.shift(RR=dinoRates, status.type= "sparse", state=statedino, filename=tempdir())->SSstate # test the robustness of search.shift results overfitRR(RR=dinoRates,y=massdino,swap.args =list(si=0.2,si2=0.2), shift.args = list(node=rownames(SSnode$single.clades),state=statedino),
nsim=10)

### Testing search.trend
# Extract Pterosaurs tree and data
massdino[match(treeptero$tip.label,names(massdino))]->massptero # body mass data massptero[match(treeptero$tip.label,names(massptero))]->massptero

# perform RRphylo and search.trend on Pterosaurs tree and data
# by specifying a clade to be tested
RRphylo(tree=treeptero,y=log(massptero))->RRptero

search.trend(RR=RRptero, y=log(massptero),node=143,filename=tempdir(),
cov=NULL,ConfInt=FALSE)->STnode

# test the robustness of search.trend results
overfitRR(RR=RRptero,y=log(massptero),trend.args = list(node=143),nsim=10)

### Applying overfitRR on multiple RRphylo
# load the RRphylo example dataset including Cetaceans tree and data
data("DataCetaceans")
DataCetaceans$treecet->treecet # phylogenetic tree DataCetaceans$masscet->masscet # logged body mass data
DataCetaceans$brainmasscet->brainmasscet # logged brain mass data DataCetaceans$aceMyst->aceMyst # known phenotypic value for the most recent
# common ancestor of Mysticeti

# cross-reference the phylogenetic tree and body and brain mass data. Remove from
# both the tree and vector of body sizes the species whose brain size is missing
drop.tip(treecet,treecet$tip.label[-match(names(brainmasscet), treecet$tip.label)])->treecet1
masscet[match(treecet1$tip.label,names(masscet))]->masscet1 # peform RRphylo on the variable (body mass) to be used as additional predictor RRphylo(tree=treecet1,y=masscet1)->RRmass RRmass$aces[,1]->acemass1

# create the predictor vector: retrieve the ancestral character estimates
# of body size at internal nodes from the RR object ($aces) and collate them # to the vector of species' body sizes to create c(acemass1,masscet1)->x1.mass # peform RRphylo and search.trend on the brain mass # by using the body mass as additional predictor RRphylo(tree=treecet1,y=brainmasscet,x1=x1.mass)->RRmulti search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass,filename=tempdir())->STcet # test the robustness of search.trend results overfitRR(RR=RRmulti,y=brainmasscet,trend.args = list(),x1=x1.mass,nsim=10) ### Testing PGLS_fossil # peform RRphylo on cetaceans brain mass RRphylo(tree=treecet1,y=brainmasscet)->RRbrain # perform PGLS_fossil by using the original tree PGLS_fossil(y~x,data=list(y=brainmasscet,x=masscet1),tree=treecet1)->pgls_noRR # perform PGLS_fossil rescaling the tree according to RRphylo rates PGLS_fossil(y~x,data=list(y=brainmasscet,x=masscet1),tree=RRbrain$tree,RR=RRbrain)->pgls_RR

# test the robustness of PGLS_fossil results
overfitRR(RR=RRbrain,y=brainmasscet,
pgls.args=list(modform=y~x,data=list(y=brainmasscet,x=masscet1),tree=TRUE,RR=TRUE),
nsim=10)

### Testing search.conv
# load the RRphylo example dataset including Felids tree and data
data("DataFelids")
DataFelids$PCscoresfel->PCscoresfel # mandible shape data DataFelids$treefel->treefel # phylogenetic tree
DataFelids\$statefel->statefel # conical-toothed or saber-toothed condition

# perform RRphylo on Felids tree and data
RRphylo(tree=treefel,y=PCscoresfel)->RRfel

# search for morphologicl convergence between clades (automatic mode)
# and within the category
search.conv(RR=RRfel, y=PCscoresfel, min.dim=5, min.dist="node9",
list(node=conv.nodes,state=statefel,declust=TRUE),nsim=10)