Parse a LightGBM model JSON dump into a data.table structure.

lgb.model.dt.tree(model, num_iteration = NULL, start_iteration = 1L)

Arguments

model

Object of class lgb.Booster.

num_iteration

Number of iterations to include. NULL or <= 0 means use the best iteration.

start_iteration

Index (1-based) of the first boosting round to include in the output. For example, passing start_iteration=5, num_iteration=3 for a regression model means "return information about the fifth, sixth, and seventh trees".

New in version 4.4.0
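As a concrete sketch of the start_iteration/num_iteration example above (here `model` stands for any trained lgb.Booster with at least seven trees):

```r
# Return rows describing only the 5th, 6th, and 7th trees (1-based indexing).
trees_5_to_7 <- lgb.model.dt.tree(
  model
  , start_iteration = 5L
  , num_iteration = 3L
)
```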

Value

A data.table with detailed information about the model's tree nodes and leaves.

The columns of the data.table include:

  • tree_index: ID of a tree in the model (integer)

  • split_index: ID of a node in a tree (integer)

  • split_feature: for a node, the feature name (character); for a leaf, it is simply labeled "NA"

  • node_parent: ID of the parent of the current node (integer)

  • leaf_index: ID of a leaf in a tree (integer)

  • leaf_parent: ID of the parent of the current leaf (integer)

  • split_gain: split gain of a node

  • threshold: split threshold of a node

  • decision_type: decision type of a node

  • default_left: determines how missing (NA) values are handled: TRUE -> left, FALSE -> right

  • internal_value: node value

  • internal_count: number of observations collected by a node

  • leaf_value: leaf value

  • leaf_count: number of observations collected by a leaf

Examples

# \donttest{
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)

params <- list(
  objective = "binary"
  , learning_rate = 0.01
  , num_leaves = 63L
  , max_depth = -1L
  , min_data_in_leaf = 1L
  , min_sum_hessian_in_leaf = 1.0
  , num_threads = 2L
)
model <- lgb.train(params, dtrain, 10L)
#> [LightGBM] [Info] Number of positive: 3140, number of negative: 3373
#> [LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000726 seconds.
#> You can set `force_row_wise=true` to remove the overhead.
#> And if memory is not enough, you can set `force_col_wise=true`.
#> [LightGBM] [Info] Total Bins 232
#> [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 116
#> [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580
#> [LightGBM] [Info] Start training from score -0.071580
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf

tree_dt <- lgb.model.dt.tree(model)
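The returned data.table can be queried with standard data.table syntax. As a hedged sketch (assuming the column names documented above), this counts split nodes and leaves in each tree:

```r
# Each row of tree_dt describes either a split node (split_index is set)
# or a leaf (leaf_index is set); summarise both counts per tree.
tree_summary <- tree_dt[
  , .(
    splits = sum(!is.na(split_index))
    , leaves = sum(!is.na(leaf_index))
  )
  , by = tree_index
]
print(tree_summary)
```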
# }