Usage
caretStack(
  all.models,
  new_X = NULL,
  new_y = NULL,
  metric = NULL,
  trControl = NULL,
  excluded_class_id = 1L,
  ...
)
Arguments
- all.models
A caretList, or an object coercible to a caretList (such as a list of train objects).
- new_X
Data to predict on for the caretList, prior to training the stack (for transfer learning). If NULL, the stacked predictions will be extracted from the caretList models.
- new_y
The outcome variable to predict on for the caretList, prior to training the stack (for transfer learning). If NULL, will use the observed levels from the first model in the caret stack. If 0, will include all levels.
- metric
The metric to use for grid search on the stacking model.
- trControl
A trainControl object to use for training the ensemble model. If NULL, will use defaultControl.
- excluded_class_id
The integer level to exclude from binary classification or multiclass problems.
- ...
Additional arguments to pass to the stacking model.
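The arguments above can be combined as in the following minimal sketch. It assumes a caretEnsemble version whose `caretStack()` matches the Usage shown here. The 10-fold control object, the `"MAE"` metric, and the names `stack_ctrl` and `stack` are illustrative choices, not package defaults.

```r
library(caret)
library(caretEnsemble)

set.seed(1)

# Base models, fitted as in the Examples section of this page.
models <- caretList(
  x = iris[1:50, 1:2],
  y = iris[1:50, 3],
  methodList = c("rpart", "glm")
)

# Illustrative (assumed) resampling scheme for the stacking model:
# 10-fold cross-validation instead of the package's default control.
stack_ctrl <- trainControl(
  method = "cv",
  number = 10,
  savePredictions = "final"
)

stack <- caretStack(
  models,
  metric = "MAE",          # metric used to pick the stacking model
  trControl = stack_ctrl,  # replaces the default control object
  method = "glm"           # forwarded through ... to the stacking model
)

print(stack)
```

Because a `glm` stacking model has no tuning grid, `metric` only changes which resampled statistic is reported as primary here; it matters more for stacking methods that have tuning parameters.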
Details
Uses either transfer learning or stacking to stack models. Assumes that all models were trained on the same number of rows of data, with the same target values. The features, cross-validation strategies, and model types (class vs reg) may vary, however. If the models in your stack were trained on different numbers of rows, please provide new_X and new_y so the models can predict on a common set of data for stacking.
If your models were trained on different columns, you should use stacking.
If you have both differing rows and columns in your model set, you are out of luck. You need at least a common set of rows during training (for stacking) or a common set of columns at inference time for transfer learning.
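The transfer-learning path described above can be sketched as follows, assuming a caretEnsemble version whose `caretStack()` matches the Usage on this page. The row split and the names `base_rows`, `stack_rows`, and `stack` are hypothetical and chosen only for illustration: the base models are fitted on one set of rows, and the stacking model is then trained on their predictions for a separate, common set of rows supplied through `new_X` and `new_y`.

```r
library(caretEnsemble)

set.seed(42)

# Hypothetical split: 100 rows for the base models, 50 held back for the stack.
shuffled <- sample(nrow(iris))
base_rows <- shuffled[1:100]
stack_rows <- shuffled[101:150]

# Base models predicting Petal.Length from the two sepal measurements.
models <- caretList(
  x = iris[base_rows, 1:2],
  y = iris[base_rows, 3],
  methodList = c("rpart", "glm")
)

# Train the stacking model on predictions for the held-back rows.
# new_X must contain the columns the base models were trained on.
stack <- caretStack(
  models,
  new_X = iris[stack_rows, 1:2],
  new_y = iris[stack_rows, 3],
  method = "glm"
)

print(stack)
```

This is the setup to reach for when the base models do not share a common set of training rows. It still requires that every model can predict from the columns in `new_X`.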
References
Caruana, R., Niculescu-Mizil, A., Crew, G., & Ksikes, A. (2004). Ensemble Selection from Libraries of Models. https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf
Examples
models <- caretList(
  x = iris[1:50, 1:2],
  y = iris[1:50, 3],
  methodList = c("rpart", "glm")
)
#> Warning: There were missing values in resampled performance measures.
caretStack(models, method = "glm")
#> The following models were ensembled: rpart, glm
#>
#> caret::train model:
#> Generalized Linear Model
#>
#> No pre-processing
#> Resampling: Cross-Validated (5 fold)
#> Summary of sample sizes: 40, 40, 40, 40, 40
#> Resampling results:
#>
#>   RMSE       Rsquared   MAE
#>   0.1810296  0.1078031  0.1348091
#>
#>
#> Final model:
#>
#> Call: NULL
#>
#> Coefficients:
#> (Intercept)        rpart          glm
#>      0.8108       0.1772       0.2663
#>
#> Degrees of Freedom: 49 Total (i.e. Null); 47 Residual
#> Null Deviance: 1.478
#> Residual Deviance: 1.455 AIC: -26.96