Computes feature contribution components of raw score prediction.

lgb.interprete(model, data, idxset, num_iteration = NULL)

Arguments

model

Object of class lgb.Booster.

data

A matrix object or a dgCMatrix object.

idxset

An integer vector of indices of the rows of interest.

num_iteration

Number of iterations to use for prediction. NULL or <= 0 means use the best iteration.

Value

For regression, binary classification, and ranking models, a list of data.table objects, each with the following columns:

  • Feature: Feature names in the model.

  • Contribution: The total contribution of this feature's splits.

For multiclass classification, a list of data.table objects, each with a Feature column and one Contribution column per class.
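The binary case is covered in the Examples below. As a minimal sketch of the multiclass return shape (the dataset, parameter values, and object names here are illustrative and not part of the package's shipped examples), a small multiclass model can be trained on iris and interpreted for a few rows; each element of the result then carries one contribution column per class:

library(lightgbm)

# illustrative multiclass sketch: labels must be integers in 0..(num_class - 1)
data(iris)
X <- as.matrix(iris[, 1L:4L])
y <- as.integer(iris$Species) - 1L
dtrain_multi <- lgb.Dataset(X, label = y)
params_multi <- list(
    objective = "multiclass"
    , num_class = 3L
    , learning_rate = 0.1
    , num_threads = 2L
)
model_multi <- lgb.train(params = params_multi, data = dtrain_multi, nrounds = 3L)

# one data.table per requested row, with a contribution column per class
multi_interpretation <- lgb.interprete(model_multi, X, 1L:3L)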

Examples

# \donttest{
library(lightgbm)

# Logit() maps a probability to its log-odds (the raw score scale)
Logit <- function(x) log(x / (1.0 - x))
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
# initialize the training scores with the logit of the mean label
set_field(
  dataset = dtrain
  , field_name = "init_score"
  , data = rep(Logit(mean(train$label)), length(train$label))
)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test

params <- list(
    objective = "binary"
    , learning_rate = 0.1
    , max_depth = -1L
    , min_data_in_leaf = 1L
    , min_sum_hessian_in_leaf = 1.0
    , num_threads = 2L
)
model <- lgb.train(
    params = params
    , data = dtrain
    , nrounds = 3L
)
#> [LightGBM] [Info] Number of positive: 3140, number of negative: 3373
#> [LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000789 seconds.
#> You can set `force_row_wise=true` to remove the overhead.
#> And if memory is not enough, you can set `force_col_wise=true`.
#> [LightGBM] [Info] Total Bins 232
#> [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 116
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf

# feature contributions for the first five rows of the test data
tree_interpretation <- lgb.interprete(model, test$data, 1L:5L)
# }
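Continuing the example above, the returned list can be inspected directly, and a single element can be passed to lgb.plot.interpretation() to visualise the largest contributions (the value of top_n below is an arbitrary choice):

# contributions for the first requested row, as a data.table
tree_interpretation[[1L]]

# bar plot of the features with the largest absolute contributions
lgb.plot.interpretation(tree_interpretation[[1L]], top_n = 5L)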