We typically create fast-and-frugal trees (FFTs) from data by using the FFTrees() function (see the Main guide and the vignette on Creating FFTs with FFTrees() for details). However, we sometimes want to design and test a specific FFT (e.g., to check a hypothesis or to use variables chosen on theoretical grounds).
There are two ways to define fast-and-frugal trees manually when using the FFTrees() function:
as a sentence using the my.tree argument (the easier way), or
as a data frame using the tree.definitions argument (the harder way).
Both of these methods will bypass the tree construction algorithms built into the FFTrees package.
my.tree
The first method is to use the my.tree argument, where my.tree is a sentence describing a (single) FFT. When this argument is specified in FFTrees(), the function (specifically, an auxiliary fftrees_wordstofftrees() function) will try to convert the verbal description into the definition of an FFT (of an FFTrees object).
For example, let’s look at the heartdisease data to find out how some predictor variables (e.g., sex, age, etc.) predict the criterion variable (diagnosis):
sex | age | thal | cp | ca | diagnosis |
---|---|---|---|---|---|
1 | 63 | fd | ta | 0 | FALSE |
1 | 67 | normal | a | 3 | TRUE |
1 | 67 | rd | a | 2 | TRUE |
1 | 37 | normal | np | 0 | FALSE |
0 | 41 | normal | aa | 0 | FALSE |
1 | 56 | normal | aa | 0 | FALSE |
Here’s how we could verbally describe an FFT by using the first three cues in conditional sentences:
in_words <- "If sex = 1, predict True.
             If age < 45, predict False.
             If thal = {fd, normal}, predict True.
             Otherwise, predict False."
As we will see shortly, the FFTrees() function accepts such descriptions (assigned here to a character string in_words) as its my.tree argument, creates a corresponding FFT, and evaluates it on a corresponding dataset.
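To make the decision logic of this description concrete, the following base R sketch applies the three nodes by hand to the six example rows shown in the table above. It does not use the FFTrees package; the names heart_head and predict_in_words are hypothetical and introduced only for illustration.

```r
# Six example rows copied from the table above (not the full dataset)
heart_head <- data.frame(
  sex       = c(1, 1, 1, 1, 0, 1),
  age       = c(63, 67, 67, 37, 41, 56),
  thal      = c("fd", "normal", "rd", "normal", "normal", "normal"),
  diagnosis = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk through the nodes in order for a single case.
# The first node whose condition holds triggers its exit.
predict_in_words <- function(sex, age, thal) {
  if (sex == 1) return(TRUE)       # Node 1: If sex = 1, predict True
  if (age < 45) return(FALSE)      # Node 2: If age < 45, predict False
  thal %in% c("fd", "normal")      # Node 3 (final): True if matched, else False
}

heart_head$prediction <- mapply(predict_in_words,
                                heart_head$sex,
                                heart_head$age,
                                heart_head$thal)

heart_head$prediction
# TRUE TRUE TRUE TRUE FALSE TRUE
```

Five of the six cases exit at the first node because sex = 1, which already hints that this tree will classify most cases as positive.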
Here are some instructions for manually specifying trees:
Each node must start with the word “If” and should correspond to the form: If <CUE> <DIRECTION> <THRESHOLD>, predict <EXIT>.
Numeric thresholds should be specified directly (without brackets), like age > 21.
For categorical variables, factor thresholds must be specified within curly braces, like sex = {male}. For factors with sets of values, categories within a threshold should be separated by commas like eyecolor = {blue,brown}.
To specify cue directions, standard logical comparisons =, !=, <, >= (etc.) are valid. For numeric cues, only use >, >=, <, or <=. For factors, only use = or !=.
Positive exits are indicated by True, while negative exits are specified by False.
The final node of an FFT is always bi-directional (i.e., has both a positive and a negative exit). The description of the final node always mentions its positive (True) exit first. The text Otherwise, predict EXIT that we have included in the example above is actually not necessary (and ignored).
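As a rough illustration of the sentence form described above, the following base R sketch splits a description into sentences and checks which of them follow the pattern If <CUE> <DIRECTION> <THRESHOLD>, predict <EXIT>. This is not the parser used by the FFTrees package; the regular expression and the variable names are assumptions made for this example, and splitting at periods would break for thresholds with decimal points.

```r
in_words <- "If sex = 1, predict True.
             If age < 45, predict False.
             If thal = {fd, normal}, predict True.
             Otherwise, predict False."

# Split the description at periods and drop surrounding whitespace
nodes <- trimws(unlist(strsplit(in_words, ".", fixed = TRUE)))
nodes <- nodes[nodes != ""]

# Illustrative pattern: "If", a cue, a direction, a threshold, and an exit
node_pattern <- "^If .+ (=|!=|<|<=|>|>=) .+, predict (True|False)$"
is_node <- grepl(node_pattern, nodes)

data.frame(sentence = nodes, is_node = is_node)
```

The first three sentences match the node form, whereas the closing Otherwise sentence does not, which is consistent with the note that it is not needed.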
Now, let’s use our verbal description of an FFT (assigned to in_words above) as the my.tree argument of the FFTrees() function. This creates a corresponding FFT and applies it to the heartdisease data:
library(FFTrees)  # provides FFTrees() and the heartdisease data

# Create FFTrees from a verbal FFT description (as my.tree):
my_fft <- FFTrees(diagnosis ~.,
                  data = heartdisease,
                  main = "My 1st FFT",
                  my.tree = in_words)
Let’s see how well our manually constructed FFT (my_fft) did:
# Inspect FFTrees object:
plot(my_fft)
When manually constructing a tree, the resulting FFTrees object only contains a single FFT. Hence, the ROC plot (in the right bottom panel of Figure 1) cannot show a range of FFTs, but locates the constructed FFT in ROC space.
As it turns out, the performance of our first FFT created from a verbal description is a mixed affair: The tree has a rather high sensitivity (of 91%), but its low specificity (of only 10%) allows for many false alarms. Consequently, its accuracy measures are only around baseline level.
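For readers who want to see how such measures are derived, here is a minimal base R sketch that computes sensitivity, specificity, and accuracy from a vector of predictions and a vector of true values. The inputs are the six example rows from the table above, with predictions obtained by applying the first verbal FFT by hand, so the resulting values differ from those reported for the full heartdisease data. All variable names are illustrative.

```r
# True criterion values and hand-derived predictions for the six example rows
truth <- c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
pred  <- c(TRUE,  TRUE, TRUE, TRUE,  FALSE, TRUE)

# Frequency counts of the 2x2 confusion table
hi <- sum(pred & truth)     # hits (true positives)
fa <- sum(pred & !truth)    # false alarms (false positives)
mi <- sum(!pred & truth)    # misses (false negatives)
cr <- sum(!pred & !truth)   # correct rejections (true negatives)

sens <- hi / (hi + mi)             # sensitivity
spec <- cr / (cr + fa)             # specificity
acc  <- (hi + cr) / length(truth)  # overall accuracy

c(sens = sens, spec = spec, acc = acc)
# sens = 1, spec = 0.25, acc = 0.5
```

Even in this tiny sample, the pattern matches the description above: the tree catches every positive case but produces many false alarms.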
Let’s see if we can come up with a better FFT. The following example uses the cues thal, cp, and ca in the my.tree argument:
# Create 2nd FFTrees from an alternative FFT description (as my.tree):
my_fft_2 <- FFTrees(diagnosis ~.,
                    data = heartdisease,
                    main = "My 2nd FFT",
                    my.tree = "If thal = {rd,fd}, predict True.
                               If cp != {a}, predict False.
                               If ca > 1, predict True.
                               Otherwise, predict False.")
# Inspect FFTrees object:
plot(my_fft_2)
This alternative FFT balances sensitivity and specificity nicely and performs much better overall. Nevertheless, it is still far from perfect, so check out whether you can create even better ones!
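As a quick sanity check of how this second description routes individual cases, the following base R sketch applies its nodes in vectorized form to the six example rows from the table near the top of this vignette. It is a hand-written approximation rather than the package's own evaluation, and the variable names are assumptions for this example.

```r
# Cue and criterion values of the six example rows
thal      <- c("fd", "normal", "rd", "normal", "normal", "normal")
cp        <- c("ta", "a", "a", "np", "aa", "aa")
ca        <- c(0, 3, 2, 0, 0, 0)
diagnosis <- c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Nested ifelse() mirrors the order of the nodes:
# Node 1: thal = {rd,fd} -> True; Node 2: cp != {a} -> False;
# Node 3 (final): ca > 1 -> True, otherwise False
pred_2 <- ifelse(thal %in% c("rd", "fd"), TRUE,
                 ifelse(cp != "a", FALSE, ca > 1))

pred_2
# TRUE TRUE TRUE FALSE FALSE FALSE

mean(pred_2 == diagnosis)  # proportion of correct classifications
```

On these six rows, only the first case is misclassified, but such a small sample says little about performance on the full data.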
For details on understanding and changing tree definitions, see the section on Tree definitions in the Creating FFTs with FFTrees() vignette.
Here is a complete list of the vignettes available in the FFTrees package:
|   | Vignette | Description |
|---|---|---|
|   | Main guide | An overview of the FFTrees package |
| 1 | Tutorial: FFTs for heart disease | An example of using FFTrees() to model heart disease diagnosis |
| 2 | Accuracy statistics | Definitions of accuracy statistics used throughout the package |
| 3 | Creating FFTs with FFTrees() | Details on the main function FFTrees() |
| 4 | Specifying FFTs directly | How to directly create FFTs with my.tree without using the built-in algorithms |
| 5 | Visualizing FFTs with plot() | Plotting FFTrees objects, from full trees to icon arrays |
| 6 | Examples of FFTs | Examples of FFTs from different datasets contained in the package |