diff --git a/docs/mindformers/docs/source_en/feature/configuration.md b/docs/mindformers/docs/source_en/feature/configuration.md
index 61c145cb4ba03af352a81e6b68ddf39ad9ec8fa2..b96d108828df02de0ab8dea8f34fcddf77d5b078 100644
--- a/docs/mindformers/docs/source_en/feature/configuration.md
+++ b/docs/mindformers/docs/source_en/feature/configuration.md
@@ -255,7 +255,7 @@ In order to improve the performance of the model, it is usually necessary to con
    | recompute_config.select_recompute                  | Turn on recomputation to recompute only for the operators in the attention layer.                       | bool/list       |
    | recompute_config.parallel_optimizer_comm_recompute | Whether to recompute AllGather communication introduced in parallel by the optimizer.                   | bool/list       |
    | recompute_config.mp_comm_recompute                 | Whether to recompute communications introduced by model parallel.                                       | bool            |
-   | recompute_config.recompute_slice_activation        | Whether to output slices for Cells kept in memory.                                                      | bool            |
+   | recompute_config.recompute_slice_activation        | Whether to slice the Cell outputs kept in memory. Supported only in legacy models.                      | bool            |
    | recompute_config.select_recompute_exclude          | Disable recomputation for the specified operator, valid only for the Primitive operators.               | bool/list       |
    | recompute_config.select_comm_recompute_exclude     | Disable communication recomputation for the specified operator, valid only for the Primitive operators. | bool/list       |
 
diff --git a/docs/mindformers/docs/source_en/feature/memory_optimization.md b/docs/mindformers/docs/source_en/feature/memory_optimization.md
index cd26139e84635b2af0ca42446c71d449000ed6df..990105717e5d03c6f24ef17285b62f948a2bd014 100644
--- a/docs/mindformers/docs/source_en/feature/memory_optimization.md
+++ b/docs/mindformers/docs/source_en/feature/memory_optimization.md
@@ -59,14 +59,14 @@ Then the configuration of each layer recompute will be printed.
 
 The main parameters for recomputation configuration are listed in the following table:
 
-| Parameter                         | Description                                                                                                            | Value Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
-|-----------------------------------|------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| recompute                         | (By layer) Full recompute.                                                                                             | Can be configured as bool, list or tuple of integers, or 2D list or tuple. <br>When configured as bool type, turn on or off full recompute for all layers; <br>When configured as list or tuple of integers, it indicates how many layers in each `pipeline_stage` have full recompute enabled. When `pp_interleave_num > 1`, the number of recompute layers enabled will be evenly distributed to each interleave; <br>When configured as a 2D list or tuple of integers, it indicates how many layers in each mini stage have full recompute enabled.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
-| select_recompute                  | (By operator) Select recompute.                                                                                        | Can be configured as bool, list or tuple of integers, or two-dimensional list or tuple, list or tuple of strings, and dict. <br>The default selection recalculation operator is `['feed_forward\\.mul', 'feed_forward\\.w1\\.activation\\.silu']` . <br>When configured as bool type, it turns on or off the selection recalculation of the default operator for all layers; <br>When configured as an integer list or tuple, it represents how many layers in each `pipeline_stage` turn on the selection recalculation of the default operator. When `pp_interleave_num > 1`, the number of selection recalculation layers turned on will be evenly distributed to each interleave; <br>When configured as an integer two-dimensional list or tuple, it represents how many layers in each mini stage turn on the selection recalculation of the default operator. <br>When configured as a string list or tuple, it indicates which operators are enabled for selective recomputation. The operator names are matched by regular expressions, and the hierarchical relationships are separated by `'\\.'`; <br>When configured as a dict, the key value corresponds to the operator name, and the value corresponds to the configuration method for selective recomputation. This method can fine-tune the recomputation strategy for each operator. |
-| select_comm_recompute             | Select communication recomputation (by operator).                                                                      | The configuration method is the same as **select_recompute**. The default selection of communication recomputation operators is `['.*\\.norm']` . Generally, it is only configured for layer_norm or similar layers.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
-| parallel_optimizer_comm_recompute | Optimizer parallel communication recomputation. Whether to recompute AllGather communication in optimizer parallelism. | (bool, optional) - After enabling, in automatic parallelism or semi-automatic parallelism mode, specify whether AllGather communication introduced by optimizer parallelism in Cell is recomputed. Default value: `False`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
-| mp_comm_recompute                 | Model parallel communication recomputation, whether to recompute communication operators in model parallelism.         | (bool, optional) - After turning on, in automatic parallelism or semi-automatic parallelism mode, specify whether to recompute the communication operations introduced by model parallelism in the cell. Default value: `True`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
-| recompute_slice_activation        | Slice recomputation, whether to slice the cell output that will be kept in memory.                                     | (bool, optional) - Default value: `False`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+| Parameter                         | Description                                                                                                                           | Value Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
+|-----------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| recompute                         | (By layer) Full recompute.                                                                                                            | Can be configured as bool, list or tuple of integers, or 2D list or tuple. <br>When configured as bool type, turn on or off full recompute for all layers; <br>When configured as list or tuple of integers, it indicates how many layers in each `pipeline_stage` have full recompute enabled. When `pp_interleave_num > 1`, the number of recompute layers enabled will be evenly distributed to each interleave; <br>When configured as a 2D list or tuple of integers, it indicates how many layers in each mini stage have full recompute enabled.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+| select_recompute                  | (By operator) Select recompute.                                                                                                       | Can be configured as bool, list or tuple of integers, or two-dimensional list or tuple, list or tuple of strings, and dict. <br>The default selection recalculation operator is `['feed_forward\\.mul', 'feed_forward\\.w1\\.activation\\.silu']` . <br>When configured as bool type, it turns on or off the selection recalculation of the default operator for all layers; <br>When configured as an integer list or tuple, it represents how many layers in each `pipeline_stage` turn on the selection recalculation of the default operator. When `pp_interleave_num > 1`, the number of selection recalculation layers turned on will be evenly distributed to each interleave; <br>When configured as an integer two-dimensional list or tuple, it represents how many layers in each mini stage turn on the selection recalculation of the default operator. <br>When configured as a string list or tuple, it indicates which operators are enabled for selective recomputation. The operator names are matched by regular expressions, and the hierarchical relationships are separated by `'\\.'`; <br>When configured as a dict, the key value corresponds to the operator name, and the value corresponds to the configuration method for selective recomputation. This method can fine-tune the recomputation strategy for each operator. |
+| select_comm_recompute             | Select communication recomputation (by operator).                                                                                     | The configuration method is the same as **select_recompute**. The default selection of communication recomputation operators is `['.*\\.norm']` . Generally, it is only configured for layer_norm or similar layers.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+| parallel_optimizer_comm_recompute | Optimizer parallel communication recomputation. Whether to recompute AllGather communication in optimizer parallelism.                | (bool, optional) - After enabling, in automatic parallelism or semi-automatic parallelism mode, specify whether AllGather communication introduced by optimizer parallelism in Cell is recomputed. Default value: `False`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+| mp_comm_recompute                 | Model parallel communication recomputation, whether to recompute communication operators in model parallelism.                        | (bool, optional) - After turning on, in automatic parallelism or semi-automatic parallelism mode, specify whether to recompute the communication operations introduced by model parallelism in the cell. Default value: `True`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+| recompute_slice_activation        | Slice recomputation, whether to slice the cell output that will be kept in memory. This parameter is only supported in legacy models. | (bool, optional) - Default value: `False`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
 
 ## Fine-Grained Activations SWAP
 
diff --git a/docs/mindformers/docs/source_zh_cn/feature/configuration.md b/docs/mindformers/docs/source_zh_cn/feature/configuration.md
index 303b61b100afff95ed9ef39f71a5b26e96d26278..ca09a052e6c29de4b016b9fa10575a4a9b778484 100644
--- a/docs/mindformers/docs/source_zh_cn/feature/configuration.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/configuration.md
@@ -249,15 +249,15 @@ Context配置主要用于指定[mindspore.set_context](https://www.mindspore.cn/
 
 1. MindSpore Transformers提供重计算相关配置，以降低模型在训练时的内存占用，详情可参考[重计算](https://www.mindspore.cn/mindformers/docs/zh-CN/master/advanced_development/performance_optimization.html#重计算)。
 
-   | 参数                                                 | 说明                             | 类型              |
-   |----------------------------------------------------|--------------------------------|-----------------|
-   | recompute_config.recompute                         | 是否开启重计算。                       | bool/list/tuple |
-   | recompute_config.select_recompute                  | 开启选择重计算，只针对attention层的算子进行重计算。 | bool/list       |
-   | recompute_config.parallel_optimizer_comm_recompute | 是否对由优化器并行引入的AllGather通信进行重计算。  | bool/list       |
-   | recompute_config.mp_comm_recompute                 | 是否对由模型并行引入的通信进行重计算。            | bool            |
-   | recompute_config.recompute_slice_activation        | 是否对保留在内存中的Cell输出切片。            | bool            |
-   | recompute_config.select_recompute_exclude          | 关闭指定算子的重计算，只对Primitive算子有效。    | bool/list       |
-   | recompute_config.select_comm_recompute_exclude     | 关闭指定算子的通讯重计算，只对Primitive算子有效。  | bool/list       |
+   | 参数                                                 | 说明                                 | 类型              |
+   |----------------------------------------------------|------------------------------------|-----------------|
+   | recompute_config.recompute                         | 是否开启重计算。                           | bool/list/tuple |
+   | recompute_config.select_recompute                  | 开启选择重计算，只针对attention层的算子进行重计算。     | bool/list       |
+   | recompute_config.parallel_optimizer_comm_recompute | 是否对由优化器并行引入的AllGather通信进行重计算。      | bool/list       |
+   | recompute_config.mp_comm_recompute                 | 是否对由模型并行引入的通信进行重计算。                | bool            |
+   | recompute_config.recompute_slice_activation        | 是否对保留在内存中的Cell输出切片。该参数仅支持legacy模型。 | bool            |
+   | recompute_config.select_recompute_exclude          | 关闭指定算子的重计算，只对Primitive算子有效。        | bool/list       |
+   | recompute_config.select_comm_recompute_exclude     | 关闭指定算子的通讯重计算，只对Primitive算子有效。      | bool/list       |
 
 2. MindSpore Transformers提供细粒度激活值SWAP相关配置，以降低模型在训练时的内存占用，详情可参考[细粒度激活值SWAP](https://www.mindspore.cn/mindformers/docs/zh-CN/master/feature/memory_optimization.html#%E7%BB%86%E7%B2%92%E5%BA%A6%E6%BF%80%E6%B4%BB%E5%80%BCswap)。
 
diff --git a/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md b/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md
index 06afbb52ec3660484c0b0bedf2bc70bc00542ed8..0d1523e31549f5c623935ab21224190cca954225 100644
--- a/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md
@@ -59,14 +59,14 @@ INFO - Formative select_comm_recompute: {'ffn_norm\.norm': [[4, 5, 5, 5, 5], [5,
 
 有关重计算配置的主要参数如下表所列：
 
-| 参数                                | 描述                                                       | 取值说明                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
-|-----------------------------------|----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
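-| recompute                         | Full recomputation (per layer).         | Can be set to a bool, a list or tuple of integers, or a two-dimensional list or tuple.<br>When set to a bool, full recomputation is enabled or disabled for all layers;<br>when set to a list or tuple of integers, each entry gives the number of layers in the corresponding `pipeline_stage` for which full recomputation is enabled, and when `pp_interleave_num > 1` the recomputed layers are distributed evenly across the interleaves;<br>when set to a two-dimensional list or tuple of integers, each entry gives the number of layers in the corresponding mini stage for which full recomputation is enabled. |
-| select_recompute                  | Selective recomputation (per operator).         | Can be set to a bool, a list or tuple of integers, a two-dimensional list or tuple, a list or tuple of strings, or a dict.<br>The default selective recomputation operators are `['feed_forward\\.mul', 'feed_forward\\.w1\\.activation\\.silu']`.<br>When set to a bool, selective recomputation of the default operators is enabled or disabled for all layers;<br>when set to a list or tuple of integers, each entry gives the number of layers in the corresponding `pipeline_stage` for which selective recomputation of the default operators is enabled, and when `pp_interleave_num > 1` those layers are distributed evenly across the interleaves;<br>when set to a two-dimensional list or tuple of integers, each entry gives the number of layers in the corresponding mini stage for which selective recomputation of the default operators is enabled.<br>When set to a list or tuple of strings, it specifies which operators use selective recomputation; operator names are matched by regular expression, and hierarchy levels are separated by `'\\.'`;<br>when set to a dict, each key is an operator name and each value is the selective recomputation setting for that operator, which allows the recomputation strategy to be tuned per operator. |
-| select_comm_recompute             | Selective communication recomputation (per operator).         | Configured in the same way as **select_recompute**. The default selective communication recomputation operators are `['.*\\.norm']`. Typically configured only for layer_norm or similar layers. |
-| parallel_optimizer_comm_recompute | Optimizer-parallel communication recomputation: whether to recompute AllGather communication under optimizer parallelism.         | (bool, optional) - In auto-parallel or semi-auto-parallel mode, specifies whether the AllGather communication introduced by optimizer parallelism inside the Cell is recomputed. Default: `False`. |
-| mp_comm_recompute                 | Model-parallel communication recomputation: whether to recompute communication operators under model parallelism. | (bool, optional) - In auto-parallel or semi-auto-parallel mode, specifies whether the communication operations introduced by model parallelism inside the Cell are recomputed. Default: `True`. |
-| recompute_slice_activation        | Slice recomputation: whether to slice the Cell outputs that are kept in memory.         | (bool, optional) - Default: `False`. |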
+| Parameter                         | Description | Value description |
+|-----------------------------------|-----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
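+| recompute                         | Full recomputation (per layer). | Can be set to a bool, a list or tuple of integers, or a two-dimensional list or tuple.<br>When set to a bool, full recomputation is enabled or disabled for all layers;<br>when set to a list or tuple of integers, each entry gives the number of layers in the corresponding `pipeline_stage` for which full recomputation is enabled, and when `pp_interleave_num > 1` the recomputed layers are distributed evenly across the interleaves;<br>when set to a two-dimensional list or tuple of integers, each entry gives the number of layers in the corresponding mini stage for which full recomputation is enabled. |
+| select_recompute                  | Selective recomputation (per operator). | Can be set to a bool, a list or tuple of integers, a two-dimensional list or tuple, a list or tuple of strings, or a dict.<br>The default selective recomputation operators are `['feed_forward\\.mul', 'feed_forward\\.w1\\.activation\\.silu']`.<br>When set to a bool, selective recomputation of the default operators is enabled or disabled for all layers;<br>when set to a list or tuple of integers, each entry gives the number of layers in the corresponding `pipeline_stage` for which selective recomputation of the default operators is enabled, and when `pp_interleave_num > 1` those layers are distributed evenly across the interleaves;<br>when set to a two-dimensional list or tuple of integers, each entry gives the number of layers in the corresponding mini stage for which selective recomputation of the default operators is enabled.<br>When set to a list or tuple of strings, it specifies which operators use selective recomputation; operator names are matched by regular expression, and hierarchy levels are separated by `'\\.'`;<br>when set to a dict, each key is an operator name and each value is the selective recomputation setting for that operator, which allows the recomputation strategy to be tuned per operator. |
+| select_comm_recompute             | Selective communication recomputation (per operator). | Configured in the same way as **select_recompute**. The default selective communication recomputation operators are `['.*\\.norm']`. Typically configured only for layer_norm or similar layers. |
+| parallel_optimizer_comm_recompute | Optimizer-parallel communication recomputation: whether to recompute AllGather communication under optimizer parallelism. | (bool, optional) - In auto-parallel or semi-auto-parallel mode, specifies whether the AllGather communication introduced by optimizer parallelism inside the Cell is recomputed. Default: `False`. |
+| mp_comm_recompute                 | Model-parallel communication recomputation: whether to recompute communication operators under model parallelism. | (bool, optional) - In auto-parallel or semi-auto-parallel mode, specifies whether the communication operations introduced by model parallelism inside the Cell are recomputed. Default: `True`. |
+| recompute_slice_activation        | Slice recomputation: whether to slice the Cell outputs that are kept in memory. This parameter is supported only in legacy models. | (bool, optional) - Default: `False`. |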
 
 ## Fine-Grained Activation SWAP