diff --git a/docs/mindformers/docs/source_en/feature/configuration.md b/docs/mindformers/docs/source_en/feature/configuration.md
index 61c145cb4ba03af352a81e6b68ddf39ad9ec8fa2..b96d108828df02de0ab8dea8f34fcddf77d5b078 100644
--- a/docs/mindformers/docs/source_en/feature/configuration.md
+++ b/docs/mindformers/docs/source_en/feature/configuration.md
@@ -255,7 +255,7 @@ In order to improve the performance of the model, it is usually necessary to con
| recompute_config.select_recompute | Turn on recomputation to recompute only for the operators in the attention layer. | bool/list |
| recompute_config.parallel_optimizer_comm_recompute | Whether to recompute AllGather communication introduced in parallel by the optimizer. | bool/list |
| recompute_config.mp_comm_recompute | Whether to recompute communications introduced by model parallel. | bool |
- | recompute_config.recompute_slice_activation | Whether to output slices for Cells kept in memory. | bool |
+ | recompute_config.recompute_slice_activation | Whether to slice the Cell outputs that are kept in memory. This parameter is supported only in legacy models. | bool |
| recompute_config.select_recompute_exclude | Disable recomputation for the specified operator, valid only for the Primitive operators. | bool/list |
| recompute_config.select_comm_recompute_exclude | Disable communication recomputation for the specified operator, valid only for the Primitive operators. | bool/list |
diff --git a/docs/mindformers/docs/source_en/feature/memory_optimization.md b/docs/mindformers/docs/source_en/feature/memory_optimization.md
index cd26139e84635b2af0ca42446c71d449000ed6df..990105717e5d03c6f24ef17285b62f948a2bd014 100644
--- a/docs/mindformers/docs/source_en/feature/memory_optimization.md
+++ b/docs/mindformers/docs/source_en/feature/memory_optimization.md
@@ -59,14 +59,14 @@ Then the configuration of each layer recompute will be printed.
The main parameters for recomputation configuration are listed in the following table:
-| Parameter | Description | Value Description |
-|-----------------------------------|------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| recompute | (By layer) Full recompute. | Can be configured as bool, list or tuple of integers, or 2D list or tuple.<br>When configured as bool type, turn on or off full recompute for all layers;<br>When configured as list or tuple of integers, it indicates how many layers in each `pipeline_stage` have full recompute enabled. When `pp_interleave_num > 1`, the number of recompute layers enabled will be evenly distributed to each interleave;<br>When configured as a 2D list or tuple of integers, it indicates how many layers in each mini stage have full recompute enabled. |
-| select_recompute | (By operator) Select recompute. | Can be configured as bool, list or tuple of integers, or two-dimensional list or tuple, list or tuple of strings, and dict.<br>The default selection recalculation operator is `['feed_forward\\.mul', 'feed_forward\\.w1\\.activation\\.silu']` .<br>When configured as bool type, it turns on or off the selection recalculation of the default operator for all layers;<br>When configured as an integer list or tuple, it represents how many layers in each `pipeline_stage` turn on the selection recalculation of the default operator. When `pp_interleave_num > 1`, the number of selection recalculation layers turned on will be evenly distributed to each interleave;<br>When configured as an integer two-dimensional list or tuple, it represents how many layers in each mini stage turn on the selection recalculation of the default operator.<br>When configured as a string list or tuple, it indicates which operators are enabled for selective recomputation. The operator names are matched by regular expressions, and the hierarchical relationships are separated by `'\\.'`;<br>When configured as a dict, the key value corresponds to the operator name, and the value corresponds to the configuration method for selective recomputation. This method can fine-tune the recomputation strategy for each operator. |
-| select_comm_recompute | Select communication recomputation (by operator). | The configuration method is the same as **select_recompute**. The default selection of communication recomputation operators is `['.*\\.norm']` . Generally, it is only configured for layer_norm or similar layers. |
-| parallel_optimizer_comm_recompute | Optimizer parallel communication recomputation. Whether to recompute AllGather communication in optimizer parallelism. | (bool, optional) - After enabling, in automatic parallelism or semi-automatic parallelism mode, specify whether AllGather communication introduced by optimizer parallelism in Cell is recomputed. Default value: `False`. |
-| mp_comm_recompute | Model parallel communication recomputation, whether to recompute communication operators in model parallelism. | (bool, optional) - After turning on, in automatic parallelism or semi-automatic parallelism mode, specify whether to recompute the communication operations introduced by model parallelism in the cell. Default value: `True`. |
-| recompute_slice_activation | Slice recomputation, whether to slice the cell output that will be kept in memory. | (bool, optional) - Default value: `False`. |
+| Parameter | Description | Value Description |
+|-----------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| recompute | (By layer) Full recompute. | Can be configured as a bool, a list or tuple of integers, or a 2D list or tuple of integers.<br>When configured as a bool, it turns full recompute on or off for all layers;<br>When configured as a list or tuple of integers, it indicates how many layers in each `pipeline_stage` have full recompute enabled. When `pp_interleave_num > 1`, the recompute layers enabled are evenly distributed across the interleaves;<br>When configured as a 2D list or tuple of integers, it indicates how many layers in each mini stage have full recompute enabled. |
+| select_recompute | (By operator) Selective recompute. | Can be configured as a bool, a list or tuple of integers, a 2D list or tuple of integers, a list or tuple of strings, or a dict.<br>The default selective recomputation operators are `['feed_forward\\.mul', 'feed_forward\\.w1\\.activation\\.silu']`.<br>When configured as a bool, it turns selective recomputation of the default operators on or off for all layers;<br>When configured as a list or tuple of integers, it indicates how many layers in each `pipeline_stage` have selective recomputation of the default operators enabled. When `pp_interleave_num > 1`, the selective recomputation layers enabled are evenly distributed across the interleaves;<br>When configured as a 2D list or tuple of integers, it indicates how many layers in each mini stage have selective recomputation of the default operators enabled;<br>When configured as a list or tuple of strings, it indicates which operators have selective recomputation enabled. Operator names are matched by regular expressions, and hierarchy levels are separated by `'\\.'`;<br>When configured as a dict, each key is an operator name and each value is the selective recomputation configuration for that operator. This form allows the recomputation strategy to be fine-tuned for each operator. |
+| select_comm_recompute | (By operator) Selective communication recomputation. | Configured in the same way as **select_recompute**. The default communication recomputation operators are `['.*\\.norm']`. Generally, it is configured only for layer_norm or similar layers. |
+| parallel_optimizer_comm_recompute | Optimizer parallel communication recomputation: whether to recompute the AllGather communication in optimizer parallelism. | (bool, optional) - Specifies whether the AllGather communication introduced by optimizer parallelism within the Cell is recomputed in automatic or semi-automatic parallel mode. Default value: `False`. |
+| mp_comm_recompute | Model parallel communication recomputation: whether to recompute the communication operators in model parallelism. | (bool, optional) - Specifies whether the communication operations introduced by model parallelism within the Cell are recomputed in automatic or semi-automatic parallel mode. Default value: `True`. |
+| recompute_slice_activation | Slice recomputation: whether to slice the Cell output that will be kept in memory. This parameter is supported only in legacy models. | (bool, optional) - Default value: `False`. |
## Fine-Grained Activations SWAP
diff --git a/docs/mindformers/docs/source_zh_cn/feature/configuration.md b/docs/mindformers/docs/source_zh_cn/feature/configuration.md
index 303b61b100afff95ed9ef39f71a5b26e96d26278..ca09a052e6c29de4b016b9fa10575a4a9b778484 100644
--- a/docs/mindformers/docs/source_zh_cn/feature/configuration.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/configuration.md
@@ -249,15 +249,15 @@ Context配置主要用于指定[mindspore.set_context](https://www.mindspore.cn/
1. MindSpore Transformers提供重计算相关配置,以降低模型在训练时的内存占用,详情可参考[重计算](https://www.mindspore.cn/mindformers/docs/zh-CN/master/advanced_development/performance_optimization.html#重计算)。
- | 参数 | 说明 | 类型 |
- |----------------------------------------------------|--------------------------------|-----------------|
- | recompute_config.recompute | 是否开启重计算。 | bool/list/tuple |
- | recompute_config.select_recompute | 开启选择重计算,只针对attention层的算子进行重计算。 | bool/list |
- | recompute_config.parallel_optimizer_comm_recompute | 是否对由优化器并行引入的AllGather通信进行重计算。 | bool/list |
- | recompute_config.mp_comm_recompute | 是否对由模型并行引入的通信进行重计算。 | bool |
- | recompute_config.recompute_slice_activation | 是否对保留在内存中的Cell输出切片。 | bool |
- | recompute_config.select_recompute_exclude | 关闭指定算子的重计算,只对Primitive算子有效。 | bool/list |
- | recompute_config.select_comm_recompute_exclude | 关闭指定算子的通讯重计算,只对Primitive算子有效。 | bool/list |
+ | 参数 | 说明 | 类型 |
+ |----------------------------------------------------|------------------------------------|-----------------|
+ | recompute_config.recompute | 是否开启重计算。 | bool/list/tuple |
+ | recompute_config.select_recompute | 开启选择重计算,只针对attention层的算子进行重计算。 | bool/list |
+ | recompute_config.parallel_optimizer_comm_recompute | 是否对由优化器并行引入的AllGather通信进行重计算。 | bool/list |
+ | recompute_config.mp_comm_recompute | 是否对由模型并行引入的通信进行重计算。 | bool |
+ | recompute_config.recompute_slice_activation | 是否对保留在内存中的Cell输出切片。该参数仅支持legacy模型。 | bool |
+ | recompute_config.select_recompute_exclude | 关闭指定算子的重计算,只对Primitive算子有效。 | bool/list |
+ | recompute_config.select_comm_recompute_exclude | 关闭指定算子的通讯重计算,只对Primitive算子有效。 | bool/list |
2. MindSpore Transformers提供细粒度激活值SWAP相关配置,以降低模型在训练时的内存占用,详情可参考[细粒度激活值SWAP](https://www.mindspore.cn/mindformers/docs/zh-CN/master/feature/memory_optimization.html#%E7%BB%86%E7%B2%92%E5%BA%A6%E6%BF%80%E6%B4%BB%E5%80%BCswap)。
diff --git a/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md b/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md
index 06afbb52ec3660484c0b0bedf2bc70bc00542ed8..0d1523e31549f5c623935ab21224190cca954225 100644
--- a/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md
@@ -59,14 +59,14 @@ INFO - Formative select_comm_recompute: {'ffn_norm\.norm': [[4, 5, 5, 5, 5], [5,
有关重计算配置的主要参数如下表所列:
-| 参数 | 描述 | 取值说明 |
-|-----------------------------------|----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| recompute | (按层)完全重计算。 | 可配置为 bool,整数型的 list 或 tuple,或二维 list 或 tuple。<br>配置为 bool 类型时,对所有层开启或关闭完全重计算;<br>配置为整数型 list 或 tuple 时,代表每个 `pipeline_stage` 中有多少层开启完全重计算, `pp_interleave_num > 1` 时开启的重计算层数会均匀分配到各 interleave 中;<br>配置为整数型二维 list 或 tuple 时,代表每个 mini stage 中有多少层开启完全重计算。 |
-| select_recompute | (按算子)选择重计算。 | 可配置为 bool,整数型的 list 或 tuple,或二维 list 或 tuple,字符串的 list 或 tuple,以及 dict。<br>默认选择重计算算子为 `['feed_forward\\.mul', 'feed_forward\\.w1\\.activation\\.silu']` 。<br>配置为 bool 类型时,对所有层开启或关闭默认算子的选择重计算;<br>配置为整数型 list 或 tuple 时,代表每个 `pipeline_stage` 中有多少层开启默认算子的选择重计算, `pp_interleave_num > 1` 时开启的选择重计算层数会均匀分配到各 interleave 中;<br>配置为整数型二维 list 或 tuple 时,代表每个 mini stage 中有多少层开启默认算子的选择重计算。<br>配置为字符串 list 或 tuple 时,代表对哪些算子开启选择重计算,算子名通过正则表达式匹配,层级关系通过 `'\\.'` 分割;<br>配置为 dict 时,key 值对应算子名,value 值对应选择重计算的配置方式,这种配法可以对每个算子精细配置重计算策略。 |
-| select_comm_recompute | (按算子)选择通信重计算。 | 配置方式与 **select_recompute** 相同,默认选择通信重计算算子为 `['.*\\.norm']` 。一般仅对 layer_norm 或类似层进行配置。 |
-| parallel_optimizer_comm_recompute | 优化器并行通信重计算。在优化器并行下,是否重计算 AllGather 通信。 | (bool, 可选) - 开启后在自动并行或半自动并行模式下,指定 Cell 内部由优化器并行引入的 AllGather 通信是否重计算。默认值: `False` 。 |
-| mp_comm_recompute | 模型并行通信重计算,在模型并行下,是否重计算通信算子。 | (bool, 可选) - 开启后在自动并行或半自动并行模式下,指定 Cell 内部由模型并行引入的通信操作是否重计算。默认值: `True` 。 |
-| recompute_slice_activation | 切片重计算,是否对将保留在内存中的 Cell 输出进行切片。 | (bool, 可选) - 默认值: `False` 。 |
+| 参数 | 描述 | 取值说明 |
+|-----------------------------------|-----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| recompute | (按层)完全重计算。 | 可配置为 bool,整数型的 list 或 tuple,或二维 list 或 tuple。<br>配置为 bool 类型时,对所有层开启或关闭完全重计算;<br>配置为整数型 list 或 tuple 时,代表每个 `pipeline_stage` 中有多少层开启完全重计算, `pp_interleave_num > 1` 时开启的重计算层数会均匀分配到各 interleave 中;<br>配置为整数型二维 list 或 tuple 时,代表每个 mini stage 中有多少层开启完全重计算。 |
+| select_recompute | (按算子)选择重计算。 | 可配置为 bool,整数型的 list 或 tuple,或二维 list 或 tuple,字符串的 list 或 tuple,以及 dict。<br>默认选择重计算算子为 `['feed_forward\\.mul', 'feed_forward\\.w1\\.activation\\.silu']` 。<br>配置为 bool 类型时,对所有层开启或关闭默认算子的选择重计算;<br>配置为整数型 list 或 tuple 时,代表每个 `pipeline_stage` 中有多少层开启默认算子的选择重计算, `pp_interleave_num > 1` 时开启的选择重计算层数会均匀分配到各 interleave 中;<br>配置为整数型二维 list 或 tuple 时,代表每个 mini stage 中有多少层开启默认算子的选择重计算。<br>配置为字符串 list 或 tuple 时,代表对哪些算子开启选择重计算,算子名通过正则表达式匹配,层级关系通过 `'\\.'` 分割;<br>配置为 dict 时,key 值对应算子名,value 值对应选择重计算的配置方式,这种配法可以对每个算子精细配置重计算策略。 |
+| select_comm_recompute | (按算子)选择通信重计算。 | 配置方式与 **select_recompute** 相同,默认选择通信重计算算子为 `['.*\\.norm']` 。一般仅对 layer_norm 或类似层进行配置。 |
+| parallel_optimizer_comm_recompute | 优化器并行通信重计算。在优化器并行下,是否重计算 AllGather 通信。 | (bool, 可选) - 开启后在自动并行或半自动并行模式下,指定 Cell 内部由优化器并行引入的 AllGather 通信是否重计算。默认值: `False` 。 |
+| mp_comm_recompute | 模型并行通信重计算,在模型并行下,是否重计算通信算子。 | (bool, 可选) - 开启后在自动并行或半自动并行模式下,指定 Cell 内部由模型并行引入的通信操作是否重计算。默认值: `True` 。 |
+| recompute_slice_activation | 切片重计算,是否对将保留在内存中的 Cell 输出进行切片。该参数仅支持legacy模型。 | (bool, 可选) - 默认值: `False` 。 |
## 细粒度激活值SWAP