Combine several predictive models via weights
Source: R/caretEnsemble.R, R/package.R
Find a greedy, positive-only linear combination of several train objects.
Functions for creating ensembles of caret models: caretList and caretStack.
Arguments
- all.models: an object of class caretList
- excluded_class_id: The integer level to exclude from binary classification or multiclass problems. By default no classes are excluded, as the greedy optimizer requires all classes because it cannot use negative coefficients.
- tuneLength: The size of the grid to search for tuning the model. Defaults to 1, as the only parameter to optimize is the number of iterations, and the default of 100 works well.
- ...: additional arguments to pass to caret::train
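A minimal sketch (not from the original documentation) of how these arguments fit together, assuming a caretList object like the models built in the Examples below; trControl here is one of the extra arguments forwarded on to caret::train:

library(caret)
library(caretEnsemble)

ens <- caretEnsemble(
  models,                                              # all.models: an object of class caretList
  tuneLength = 1L,                                     # iterations are the only tuned parameter
  trControl = trainControl(method = "cv", number = 5)  # forwarded to caret::train via ...
)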
Details
greedyMSE works well when you want an ensemble that will never be worse than any single model in the dataset. In the worst case, it will simply select the single best model if no combination of models improves the overall score. It will also never assign any model a negative coefficient, which helps avoid unintuitive behavior at prediction time (e.g. if the correlations between predictors break down on new data, negative coefficients can lead to bad results).
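As an illustration only (the predictions below are made up, and the weights merely mirror the Examples output), the ensemble prediction is a non-negative weighted combination of the component models' predictions:

# Illustration: blend two models' predictions with non-negative weights that sum to 1
pred_rpart <- c(1.4, 1.5, 1.3)
pred_rf    <- c(1.5, 1.4, 1.6)
w <- c(rpart = 0.952, rf = 0.048)  # non-negative weights
pred_ens <- w["rpart"] * pred_rpart + w["rf"] * pred_rf
# In the worst case the optimizer can put all weight on the single best model
# (w = c(1, 0)), so the ensemble never does worse than that model in-sample.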
Note
Every model in the "library" must be a separate train object. For example, if you wish to combine random forests with several different values of mtry, you must build a model for each value of mtry. If you use several values of mtry in one train model (e.g. tuneGrid = expand.grid(.mtry = 2:5)), caret will select the best value of mtry before we get a chance to include it in the ensemble. By default, RMSE is used to ensemble regression models, and AUC is used to ensemble classification models. This function does not currently support multi-class problems.
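A hedged sketch of the workaround described above: build one train object per fixed mtry value via tuneList and caretModelSpec, so each setting can receive its own ensemble weight. The model names and the choice of iris columns are assumptions for illustration:

library(caret)
library(caretEnsemble)

set.seed(42)
models_by_mtry <- caretList(
  iris[1:50, 1:2], iris[1:50, 3],
  tuneList = list(
    rf_mtry1 = caretModelSpec(method = "rf", tuneGrid = data.frame(mtry = 1)),
    rf_mtry2 = caretModelSpec(method = "rf", tuneGrid = data.frame(mtry = 2))
  )
)
# Each element is a separate train object, so the greedy optimizer can weight
# the two mtry settings independently.
ens_mtry <- caretEnsemble(models_by_mtry)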
Author
Maintainer: Zachary A. Deane-Mayer zach.mayer@gmail.com [copyright holder]
Other contributors:
Jared E. Knowles jknowles@gmail.com [contributor]
Antón López anton.gomez.lopez@rai.usc.es [contributor]
Examples
set.seed(42)
models <- caretList(iris[1:50, 1:2], iris[1:50, 3], methodList = c("rpart", "rf"))
#> Warning: There were missing values in resampled performance measures.
#> note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
#>
ens <- caretEnsemble(models)
summary(ens)
#> The following models were ensembled: rpart, rf
#>
#> Model Importance:
#> rpart rf
#> 0.952 0.048
#>
#> Model accuracy:
#> model_name metric value sd
#> <char> <char> <num> <num>
#> 1: ensemble RMSE 0.1725561 0.03171894
#> 2: rpart RMSE 0.1679543 0.04899049
#> 3: rf RMSE 0.1869204 0.03174027
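Predictions on new data can then be obtained with predict(); a brief sketch (the choice of rows 51:100 is an arbitrary assumption):

preds <- predict(ens, newdata = iris[51:100, 1:2])
head(preds)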