diff --git a/docs/get_started/model_optimization.md b/docs/get_started/model_optimization.md index f419eeb0f0baf256af077811e85f6923294e599f..d63ddb884d036642cf680607c657ce9bbc1f6628 100644 --- a/docs/get_started/model_optimization.md +++ b/docs/get_started/model_optimization.md @@ -48,6 +48,37 @@ tcmalloc(即Thread-Caching Malloc)是一个通用的内存分配器,通过 | bev_pool_v3 | from mx_driving import bev_pool_v3 | | SparseConv3d | from mx_driving import SparseConv3d | | SubMConv3d | from mx_driving import SubMConv3d | +#### 【算子替换示例】 + +当采集完profiling后,查看算子耗时统计,分析耗时占比大的算子,进行替换,以BEVFormer模型为例,multi_scale_deformable_attn算子正反向在模型中占比很高,需要替换成mxDriving中的亲和算子: +算子耗时占比 + + +替换过程如下,算子详细参数参考[算子清单](../api/README.md),算子使用位置为projects/mmdet3d_plugin/bevformer/modules/decoder.py: +- 头文件引入mx_driving +```python +import mx_driving +``` +- 找到算子使用的位置,位置在325行,原始代码为: +```python + if torch.cuda.is_available() and value.is_cuda: + + # using fp16 deformable attention is unstable because it performs many sum operations + if value.dtype == torch.float16: + MultiScaleDeformableAttnFunction = MultiScaleDeformableAttnFunction_fp32 + else: + MultiScaleDeformableAttnFunction = MultiScaleDeformableAttnFunction_fp32 + output = MultiScaleDeformableAttnFunction.apply( + value, spatial_shapes, level_start_index, sampling_locations, + attention_weights, self.im2col_step) + else: + output = multi_scale_deformable_attn_pytorch( + value, spatial_shapes, sampling_locations, attention_weights) +``` +替换为mx_driving仓中算子: +```python + output = mx_driving.multi_scale_deformable_attn(value, spatial_shapes, level_start_index, sampling_locations, attention_weights) +``` ### 2.3 自驾模型host bound问题优化 自动驾驶算法有很多slice、gather、sort等小算子,2D图像特征转换到BEV空间,涉及大量投影、插值、采样等操作,点云数据的体素化处理、稀疏卷积等。自动驾驶算法中涉及很多逻辑处理,如检测算法的target assign,规控类算法lovaz loss,以及一些对groundtruth处理的操作。Host bound问题:小算子下发多,CPU计算处理逻辑多、负载大,算子没有NPU高性能替换等造成host bound问题严重。 diff --git a/docs/get_started/op_statistic.png b/docs/get_started/op_statistic.png new file mode 100644 index 0000000000000000000000000000000000000000..6ea78b1cc9a7167be9bf3a936500eb2ca34c494b Binary files /dev/null and b/docs/get_started/op_statistic.png differ