From 88fffcf450b36221799ee7592bb9829399a5f14b Mon Sep 17 00:00:00 2001
From: chenmingkai <chenmingkai1@huawei.com>
Date: Fri, 8 Mar 2024 17:14:58 +0800
Subject: [PATCH] [Doc] add api list, add security claim.

---
 LICENSE                                  |   4 +-
 README.md                                | 217 ++++--
 Third_Party_Open_Source__Software_Notice | 360 +++++++++
 docs/api/README.md                       | 938 ++++++++++++++++++++++-
 4 files changed, 1456 insertions(+), 63 deletions(-)
 create mode 100644 Third_Party_Open_Source__Software_Notice

diff --git a/LICENSE b/LICENSE
index 278befbe..43e15a43 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,8 @@
 BSD 3-Clause License
 
-Copyright (c) 2023, ckirchhoff
+Copyright (c) 2023, Huawei Technologies Co., Ltd.
+Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
diff --git a/README.md b/README.md
index 69e38538..55fe5ba9 100644
--- a/README.md
+++ b/README.md
@@ -1,62 +1,35 @@
-### Introduction
+# ADS-Accelerator
 
-This project provides high-performance operators for autonomous driving scenarios, developed on the Ascend NPU.
+# Introduction
 
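-### Building and Installing ADS
+ADS-Accelerator is an operator and model acceleration library for autonomous driving scenarios, developed on the Ascend NPU platform. It provides a set of high-performance operators and model acceleration interfaces and supports the PyTorch framework.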
 
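-#### Installing from a Release Package
-
-Not officially released yet.
-
-#### Installing from Source
-
-**Install dependencies**
-
-> Install the matching versions of the torch, torch_npu, and CANN packages. For the version compatibility matrix, see the README on the home page of the pytorch repository (https://gitee.com/ascend/pytorch).
->
-> Then source the environment variables of the CANN package.
-
-##### Download ADS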
 
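+# Installation
+This project depends on the pytorch_npu and CANN packages provided by Ascend. Install the matching versions of pytorch_npu and CANN first. For the version compatibility matrix, see the pytorch repository [README](https://gitee.com/ascend/pytorch).
+> For security reasons, we recommend performing the following steps as a non-root user.
+## Preparing the Environment
+See the official Ascend document [PyTorch Framework Training Environment Preparation](https://hiascend.com/document/detail/zh/ModelZoo/pytorchframework/ptes/ptes_00001.html). After the environment is ready, we recommend setting the umask to `0027` so that newly created files get appropriate permissions.
+## Installing from a Release Package
+No whl package has been officially released yet. Use the source installation method instead.
+## Installing from Source
+1. Clone the repository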
 ```shell
-# Download the ads repository
-git clone https://gitee.com/ascend/ads.git
+git clone https://gitee.com/ascend/ADS.git
 ```
-
-##### Build ADS
-
+2. Build ADS
+> Note: run the build command from the repository root directory.
 ```shell
-# Build
-# NOTE: run the build command from the repository root directory
-cd ads
 bash ci/build.sh --python=3.7
 ```
-
-| Architecture | PyTorch version | Packaged Python versions                             |
-| ------------ | --------------- | ---------------------------------------------------- |
-| x86          | pytorch1.11     | Python3.7(\>=3.7.5), Python3.8, Python3.9, Python3.10 |
-| x86          | pytorch2.0.1    | Python3.8, Python3.9, Python3.10                     |
-| x86          | pytorch2.1.0    | Python3.8, Python3.9, Python3.10                     |
-| aarch64      | pytorch1.11     | Python3.7(\>=3.7.5), Python3.8, Python3.9, Python3.10 |
-| aarch64      | pytorch2.0.1    | Python3.8, Python3.9, Python3.10                     |
-| aarch64      | pytorch2.1.0    | Python3.8, Python3.9, Python3.10                     |
+The generated whl package is placed in the `ADS/dist` directory and is named `ads_accelerator-1.0.0+git{commit_id}-cp{python_version}-linux_{arch}.whl`.
+The `--python` parameter specifies the Python version used during the build. Version 3.7 and later are supported:
 
 | Parameter | Value range | Description | Default | Remarks |
 | ------ | ------------------------------------------------------------ | ------------------------------ | ------ | ---------------------------------------------- |
 | python | For pytorch1.11: 3.7 and later; for versions later than pytorch1.11: 3.8 and later | Python version used during the build | 3.7 | Python 3.7 can be specified only with pytorch1.11 |
 
-##### Install ADS
-
-```shell
-cd ads/dist
-pip3 install ads-1.0-cp37-cp37m-linux_aarch64.whl
-```
-
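-#### Installing a Package from CMC
-
-The ADS package has not been commercially released yet. Go to https://cmc-szv.clouddragon.huawei.com/cmcversion/index/search and search for FrameworkPTAdapter V100R001C01B001 to get the latest package. Choose the download that matches the torch and Python versions of your environment, for example ADS_v1.11.0_py37.tar.gz, where v1.11.0 is the torch version and py37 is the Python version.
-
-Package versions planned for future release
+The supported CPU architectures and the corresponding Python and torch versions are as follows: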
 
 | Architecture | PyTorch version | Packaged Python versions |
 | ------- | ------------ | -------------------------------------------------------- |
@@ -66,31 +39,155 @@ pip3 install ads-1.0-cp37-cp37m-linux_aarch64.whl
 | aarch64 | pytorch1.11  | Python3.7(\>=3.7.5), Python3.8, Python3.9, Python3.10 |
 | aarch64 | pytorch2.0.1 | Python3.8, Python3.9, Python3.10                       |
 | aarch64 | pytorch2.1.0 | Python3.8, Python3.9, Python3.10                       |
+3. Install ADS
+```shell
+cd ADS/dist
+pip3 install ads_accelerator-1.0.0+git{commit_id}-cp{python_version}-linux_{arch}.whl
+```
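+To keep an installation log, append the `--log <PATH>` option to the `pip3 install` command, and restrict access to the directory `<PATH>` you specify.
+# Uninstallation
+To uninstall the PyTorch framework training environment, see the official Ascend document [PyTorch Framework Training Environment Uninstallation](https://hiascend.com/document/detail/zh/ModelZoo/pytorchframework/ptes/ptes_00032.html).
+To uninstall ADS-Accelerator, run the following command: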
+```shell
+pip3 uninstall ADS-accelerator
+```
 
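-### Calling ADS Operators
-
-##### Set environment variables
-
-Note: xxx denotes the Python installation path in the current environment
-
-```bash
+# Quick Start
+1. Set the environment variables
+```shell
 # Show the ads installation path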
-pip3 show ads-accelerator
-export ASCEND_CUSTOM_OPP_PATH=xxx/site-packages/ads/packages/vendors/customize/
-export LD_LIBRARY_PATH=xxx/site-packages/ads/packages/vendors/customize/op_api/lib/:$LD_LIBRARY_PATH
+pip3 show ADS-accelerator
+export ASCEND_CUSTOM_OPP_PATH=xxx/site-packages/ADS/packages/vendors/customize/
+export LD_LIBRARY_PATH=xxx/site-packages/ADS/packages/vendors/customize/op_api/lib/:$LD_LIBRARY_PATH
 ```
-
-Call an operator
-
+2. Call an operator
 ```python
 import torch
 import torch_npu
 import numpy as np
-import ads.common
+import ADS.common
 device = torch.device("npu:5")
 a=torch.rand([8, 2048]).half().npu()
 b=torch.rand([8, 2048]).half().npu()
-c = ads.common.npu_ads_add(a,b)
+c = ADS.common.npu_ads_add(a,b)
 print(c)
 ```
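
The `xxx` prefix in step 1 stands for the Python installation path of the current environment. As a rough sketch, the two `export` lines could be generated from the running interpreter as shown below. It assumes the package was installed into the interpreter's default `platlib` site-packages directory, which may not hold for `--user` or `--target` installs; the location reported by `pip3 show` remains authoritative.

```python
import sysconfig
from pathlib import Path

# Assumption: ADS lives in the interpreter's default site-packages (platlib).
site_packages = Path(sysconfig.get_paths()["platlib"])
customize = site_packages / "ADS" / "packages" / "vendors" / "customize"
op_api_lib = customize / "op_api" / "lib"

# Print shell lines that mirror the two exports in step 1.
print(f"export ASCEND_CUSTOM_OPP_PATH={customize.as_posix()}/")
print(f"export LD_LIBRARY_PATH={op_api_lib.as_posix()}/:$LD_LIBRARY_PATH")
```

The printed lines can be pasted into the shell, or evaluated there, before importing `ADS`.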
 
+# Features
+## Directory Structure
+```
+.
+├── ADS
+│  ├── __init__.py
+│  ├── common                   # common module
+│  │  ├── __init__.py
+│  │  ├── CMakeLists.txt
+│  │  ├── components            # common components
+│  │  └── ops                   # common operators
+│  ├── motion                   # motion module
+│  │  ├── __init__.py
+│  │  ├── CMakeLists.txt
+│  │  ├── components            # motion components
+│  │  └── ops                   # motion operators
+│  └── perception               # perception module
+│     ├── __init__.py
+│     ├── CMakeLists.txt
+│     ├── fused                 # fusion module
+│     ├── point                 # point cloud module
+│     └── vision                # vision module
+├── bind                        # torch bindings
+├── ci                          # CI scripts
+├── cmake                       # CMake scripts
+├── CMakeLists.txt              # CMake configuration file
+├── CMakePresets.json           # CMake configuration file
+├── docs                        # documentation
+├── include                     # header files
+├── LICENSE                     # open source license
+├── MANIFEST.in                 # whl packaging configuration
+├── OWNERS                      # code reviewers
+├── README.md                   # project description
+├── requirements.txt            # dependencies
+├── scripts                     # project scripts
+├── setup.py                    # whl packaging configuration
+├── tests                       # test files
+└── utils                       # utility scripts
+```
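+## Operator List
+See the [operator list](./docs/api/README.md).
+## Supported Features
+- [x] Supports PyTorch 1.11.0, 2.0.1, and 2.1.0
+- [x] Supports ONNX model conversion, unifying training and inference
+- [ ] Supports graph mode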
+
+
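+# Security Statement
+## System Security Hardening
+
+1. We recommend enabling ASLR at level 2, also known as **full address space layout randomization**, in the system configuration to improve system security. It can be configured as follows: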
+    ```shell
+    echo 2 > /proc/sys/kernel/randomize_va_space
+    ```
+2. Because ADS-Accelerator must be compiled by the user, we recommend running `strip` on the compiled so files to **remove debugging symbol information**, as follows:
+    ```shell
+    strip -s <so_file>
+    ```
+   The so files concerned are:
+    - ADS/packages/vendors/customize/op_api/lib/libcust_opapi.so
+    - ADS/packages/vendors/customize/op_proto/lib/linux/aarch64/libcust_opsproto_rt2.0.so
+    - ADS/packages/vendors/customize/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64/libcust_opsproto_rt2.0.so
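+## Recommended Runtime User
+For security and in keeping with the principle of least privilege, we do not recommend running ADS under `root` or any other administrator account.
+
+## File Permission Control
+When using `ADS`, you may perform operations such as profiling and debugging. We recommend restricting permissions on the related directories and files to keep them secure.
+1. When using `ADS`, we recommend setting the umask to `0027` or stricter, so that newly created directories get at most `750` permissions and newly created files at most `640`.
+2. We recommend restricting permissions on sensitive content such as personal data, business assets, source files, and the various files saved during training. See the table below for recommended permissions.
+### File Permission Reference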
+
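+|   Type                                                                          |   Recommended maximum Linux permissions   |
+|---------------------------------------------------------------------------------|-------------------------------------------|
+|  User home directory                                                            |   750 (rwxr-x---)     |
+|  Program files (including scripts and libraries)                                |   550 (r-xr-x---)     |
+|  Program file directories                                                       |   550 (r-xr-x---)     |
+|  Configuration files                                                            |   640 (rw-r-----)     |
+|  Configuration file directories                                                 |   750 (rwxr-x---)     |
+|  Log files (completed or archived)                                              |   440 (r--r-----)     |
+|  Log files (being written)                                                      |   640 (rw-r-----)     |
+|  Log file directories                                                           |   750 (rwxr-x---)     |
+|  Debug files                                                                    |   640 (rw-r-----)     |
+|  Debug file directories                                                         |   750 (rwxr-x---)     |
+|  Temporary file directories                                                     |   750 (rwxr-x---)     |
+|  Maintenance and upgrade file directories                                       |   770 (rwxrwx---)     |
+|  Business data files                                                            |   640 (rw-r-----)     |
+|  Business data file directories                                                 |   750 (rwxr-x---)     |
+|  Directories for key components, private keys, certificates, and ciphertext files |   700 (rwx------)     |
+|  Key components, private keys, certificates, and encrypted ciphertext           |   600 (rw-------)     |
+|  Encryption/decryption interfaces and scripts                                   |   500 (r-x------)     |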
+    
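
The umask recommendation in the file permission section follows from simple bit arithmetic: a new directory is normally requested with mode 777 and a new regular file with mode 666, and the bits set in the umask are cleared from that request. A minimal sketch of the calculation (the helper name `effective_mode` is hypothetical and used only for illustration):

```python
def effective_mode(requested: int, umask: int) -> int:
    """Return the permission bits that remain after the umask clears its bits."""
    return requested & ~umask & 0o777


UMASK = 0o027
dir_mode = effective_mode(0o777, UMASK)   # directories are requested as 777
file_mode = effective_mode(0o666, UMASK)  # regular files are requested as 666
print(oct(dir_mode), oct(file_mode))
```

With a umask of `0027` this yields `750` for directories and `640` for files, matching the maxima recommended above.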
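+## Build Security Statement
+When installing ADS-Accelerator from source, you compile it yourself, and the build generates intermediate files. After the build completes, we recommend restricting permissions on these intermediate files to keep them secure.
+## Runtime Security Statement
+1. Write your training scripts according to the resources available in the runtime environment. If a training script does not match the available resources, for example when the dataset loaded into memory exceeds the memory capacity or the script generates more local data than the disk can hold, errors may occur and the process may exit unexpectedly.
+2. When ADS encounters a runtime exception (such as an input validation failure (see the API documentation), a misconfigured environment variable, or an operator execution error), it exits the process and prints an error message. This is expected behavior. Use the error message to locate the cause, for example by setting operators to execute synchronously, checking the CANN logs, or analyzing the generated core dump file.
+## Public Network Address Statement
+
+The ADS configuration files and scripts contain [public network addresses](#public-network-addresses).
+
+### Public Network Addresses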
+
+|   Type   |   Open source code address   | File name                           |   Public IP address/public URL/domain name/email address   | Purpose                       |
+|-------------------------|-------------------------|-------------------------------------|-------------------------|-------------------------------|
+|   In-house   |   N/A   | ci/docker/ARM/Dockerfile            |   https://mirrors.huaweicloud.com/repository/pypi/simple   | Docker configuration file; configures the pip index |
+|   In-house   |   N/A   | ci/docker/X86/Dockerfile            |   https://mirrors.huaweicloud.com/repository/pypi/simple   | Docker configuration file; configures the pip index |
+|   In-house   |   N/A   | ci/docker/ARM/Dockerfile            |   https://dl.fedoraproject.org/pub/epel/7/aarch64/Packages/n/ninja-build-1.7.2-2.el7.aarch64.rpm   | Docker configuration file; downloads ninja-build |
+|   In-house   |   N/A   | ci/docker/ARM/build_protobuf.sh     |   https://gitee.com/it-monkey/protocolbuffers.git   | url argument used when packaging the whl |
+|   In-house   |   N/A   | setup.cfg                           |   https://gitee.com/ascend/pytorch/tags   | download_url argument used when packaging the whl |
+|   In-house   |   N/A   | third_party\op-plugin\ci\build.sh   |   https://gitee.com/ascend/pytorch.git   | Build script; pulls code from the torch_npu repository address for compilation |
+|   In-house   |   N/A   | third_party\op-plugin\ci\exec_ut.sh |   https://gitee.com/ascend/pytorch.git   | UT script; pulls code from the torch_npu repository address for unit testing |
+|   Third-party open source   |   https://gitee.com/it-monkey/protocolbuffers.git    | ci/docker/ARM/build_protobuf.sh     |   https://gitee.com/it-monkey/protocolbuffers.git   | Used to build protobuf |
+|   Third-party open source   |   https://gitee.com/it-monkey/protocolbuffers.git    | ci/docker/X86/build_protobuf.sh     |   https://gitee.com/it-monkey/protocolbuffers.git   | Used to build protobuf |
+
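+## Public Interface Statement
+See the [API list](./docs/api/README.md) for the custom interfaces that ADS exposes. A function is a public interface if it appears in the documentation. Otherwise, before using it, ask in the community whether it is genuinely public or was exposed by accident, because such undocumented interfaces may be changed or removed in the future.
+## Communication Security Hardening and Communication Matrix
+At runtime ADS depends on the `PyTorch_npu` framework. For its communication security hardening and communication matrix, see [PyTorch Framework Communication Security Hardening and Communication Matrix](https://gitee.com/ascend/pytorch/blob/master/SECURITYNOTE.md#%E9%80%9A%E4%BF%A1%E5%AE%89%E5%85%A8%E5%8A%A0%E5%9B%BA).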
\ No newline at end of file
diff --git a/Third_Party_Open_Source__Software_Notice b/Third_Party_Open_Source__Software_Notice
new file mode 100644
index 00000000..93966ca9
--- /dev/null
+++ b/Third_Party_Open_Source__Software_Notice
@@ -0,0 +1,360 @@
+OPEN SOURCE SOFTWARE NOTICE
+
+Please note we provide an open source software notice along with this product and/or this product firmware (in the following just “this product”). The open source software licenses are granted by the respective right holders. And the open source licenses prevail all other license information with regard to the respective open source software contained in the product, including but not limited to End User Software Licensing Agreement. This notice is provided on behalf of Huawei Technologies Co. Ltd. and any of its local subsidiaries which may have provided this product to you in your local country.
+
+Warranty Disclaimer
+THE OPEN SOURCE SOFTWARE IN THIS PRODUCT IS DISTRIBUTED IN THE HOPE THAT IT WILL BE USEFUL, BUT WITHOUT ANY WARRANTY, WITHOUT EVEN THE IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. SEE THE APPLICABLE LICENSES FOR MORE DETAILS.
+
+Copyright Notice and License Texts
+Software: ads v1.0.0
+Copyright notice:
+                    GNU GENERAL PUBLIC LICENSE
+                       Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+                            Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it.  By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users.  This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it.  (Some other Free Software Foundation software is covered by
+the GNU Lesser General Public License instead.)  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+  To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have.  You must make sure that they, too, receive or can get the
+source code.  And you must show them these terms so they know their
+rights.
+
+  We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+  Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software.  If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+  Finally, any free program is threatened constantly by software
+patents.  We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary.  To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+                    GNU GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License.  The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language.  (Hereinafter, translation is included without limitation in
+the term "modification".)  Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope.  The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+  1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+  2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+    a) You must cause the modified files to carry prominent notices
+    stating that you changed the files and the date of any change.
+
+    b) You must cause any work that you distribute or publish, that in
+    whole or in part contains or is derived from the Program or any
+    part thereof, to be licensed as a whole at no charge to all third
+    parties under the terms of this License.
+
+    c) If the modified program normally reads commands interactively
+    when run, you must cause it, when started running for such
+    interactive use in the most ordinary way, to print or display an
+    announcement including an appropriate copyright notice and a
+    notice that there is no warranty (or else, saying that you provide
+    a warranty) and that users may redistribute the program under
+    these conditions, and telling the user how to view a copy of this
+    License.  (Exception: if the Program itself is interactive but
+    does not normally print such an announcement, your work based on
+    the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole.  If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works.  But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+    a) Accompany it with the complete corresponding machine-readable
+    source code, which must be distributed under the terms of Sections
+    1 and 2 above on a medium customarily used for software interchange; or,
+
+    b) Accompany it with a written offer, valid for at least three
+    years, to give any third party, for a charge no more than your
+    cost of physically performing source distribution, a complete
+    machine-readable copy of the corresponding source code, to be
+    distributed under the terms of Sections 1 and 2 above on a medium
+    customarily used for software interchange; or,
+
+    c) Accompany it with the information you received as to the offer
+    to distribute corresponding source code.  (This alternative is
+    allowed only for noncommercial distribution and only if you
+    received the program in object code or executable form with such
+    an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it.  For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable.  However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License.  Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+  5. You are not required to accept this License, since you have not
+signed it.  However, nothing else grants you permission to modify or
+distribute the Program or its derivative works.  These actions are
+prohibited by law if you do not accept this License.  Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+  6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions.  You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+  7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all.  For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices.  Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded.  In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+  9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number.  If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation.  If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+  10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission.  For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this.  Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+                            NO WARRANTY
+
+  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License along
+    with this program; if not, write to the Free Software Foundation, Inc.,
+    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+    Gnomovision version 69, Copyright (C) year name of author
+    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary.  Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  <signature of Ty Coon>, 1 April 1989
+  Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs.  If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library.  If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.
+
+
+Written Offer
+This product contains software whose rights holders license it on the terms of the GNU General Public License, version 2 (GPLv2) and/or other open source software licenses. We will provide you and any third party with the source code of the software licensed under an open source software license if you send us a written request by mail or email to the following addresses:
+foss@huawei.com
+detailing the name of the product and the firmware version for which you need the source code and indicating how we can contact you.
+
+Please note you need to make a payment before you obtain the complete Corresponding Source Code from us. For how much you will pay and how we will deliver the complete Corresponding Source Code to you, we will further discuss it by mail or email.
+This offer is valid to anyone in receipt of this information.
+
+THIS OFFER IS VALID FOR THREE YEARS FROM THE MOMENT WE DISTRIBUTED THE PRODUCT OR FIRMWARE.
diff --git a/docs/api/README.md b/docs/api/README.md
index 79ec13df..841ddde4 100644
--- a/docs/api/README.md
+++ b/docs/api/README.md
@@ -1,2 +1,936 @@
-## Description
-+ The folder provides api informations.
\ No newline at end of file
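+# Common Operators
+## scatter_max
+### Interface Prototype
+```python
+ads.common.scatter_max(Tensor updates, Tensor indices, Tensor out=None) -> (Tensor out, Tensor argmax)
+```
+### Description
+Scatters the elements of `updates` along dimension 0 according to the indices in `indices`, takes the maximum along dimension 0, and returns the maxima together with their indices. For 1-D tensors:
+$$out_i = \max(out_i, \max_j(updates_j))$$
+$$argmax_i = \operatorname{argmax}_j(updates_j)$$
+Here $i = indices_j$, that is, $j$ ranges over the positions whose index equals $i$.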
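+
+A hand-worked illustration of the 1-D formulas, using made-up values rather than output captured from the operator: take `updates = [3, 1, 5, 2]`, `indices = [0, 1, 0, 1]` and an initial `out = [0, 0]`. Positions 0 and 2 map to slot 0, and positions 1 and 3 map to slot 1:
+$$out_0 = \max(0, \max(3, 5)) = 5, \quad argmax_0 = 2$$
+$$out_1 = \max(0, \max(1, 2)) = 2, \quad argmax_1 = 3$$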
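+### Parameters
+- `updates`: source tensor of updates, data type `float32`.
+- `indices`: index tensor, data type `int32`, subject to:
+  - `indices` must be 1-dimensional.
+  - The length of dimension 0 of `indices` must equal the length of dimension 0 of `updates`.
+  - The maximum value in `indices` must be less than `491520`.
+- `out`: tensor to be updated, data type `float32`. Defaults to `None`.
+### Returns
+- `out`: the updated tensor, data type `float32`.
+- `argmax`: tensor holding the indices of the maxima, data type `int32`.
+### Supported Models
+- Atlas A2 training series products
+### Example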
+```python
+import torch, torch_npu
+from ads.common import scatter_max
+updates = torch.tensor([[2, 0, 1, 4, 4], [0, 2, 1, 3, 4]], dtype=torch.float32).npu()
+indices = torch.tensor([4, 1, 2, 3], dtype=torch.int32).npu()
+out = updates.new_zeros((2, 6))
+out, argmax = scatter_max(updates, indices, out)
+print(out)
+print(argmax)
+```
+```text
+tensor([[0., 0., 0., 0., 0., 0.],
+        [0., 0., 3., 4., 0., 0.]])
+tensor([[2, 2,  2,  2,  2,  2],
+        [ 1,  1,  1, 1, 0, 0]])
+```
+## npu_rotated_box_decode
+### Interface Prototype
+```python
+ads.common.npu_rotated_box_decode(Tensor anchor_boxes, Tensor deltas, Tensor weight) -> Tensor
+```
+### Description
+Decodes rotated-box coordinates.
+### Parameters
+- `anchor_boxes(Tensor)`: anchor-box tensor, data type `float32, float16`, shape `[B, 5, N]`, where `B` is the batch size, `N` is the number of anchor boxes, and the `5` entries are `x0, x1, y0, y1, angle`.
+- `deltas(Tensor)`: offset tensor, data type `float32, float16`, shape `[B, 5, N]`, where `B` is the batch size, `N` is the number of anchor boxes, and the `5` entries are `dx, dy, dw, dh, dangle`.
+- `weight(Tensor)`: weight tensor, data type `float32, float16`, shape `[5]`, whose entries are `wx, wy, ww, wh, wangle`. Defaults to `[1, 1, 1, 1, 1]`.
+### Returns
+- `Tensor`: decoded rotated-box coordinates, data type `float32, float16`, shape `[B, 5, N]`.
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+from ads.common import npu_rotated_box_decode
+anchor_boxes = torch.tensor([[[4.137], [33.72], [29.4], [54.06], [41.28]]], dtype=torch.float16).npu()
+deltas = torch.tensor([[[0.0244], [-1.992], [0.2109], [0.315], [-37.25]]], dtype=torch.float16).npu()
+weight = torch.tensor([1, 1, 1, 1, 1], dtype=torch.float16).npu()
+out = npu_rotated_box_decode(anchor_boxes, deltas, weight)
+print(out)
+```
+```text
+tensor([[[1.7861], [-10.5781], [33.0000], [17.2969], [-88.4375]]], dtype=torch.float16)
+```
+## npu_rotated_box_encode
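+### Interface Prototype
+```python
+ads.common.npu_rotated_box_encode(Tensor anchor_boxes, Tensor gt_bboxes, Tensor weight) -> Tensor
+```
+### Description
+Encodes rotated-box coordinates.
+### Parameters
+- `anchor_boxes(Tensor)`: anchor-box tensor, data type `float32, float16`, shape `[B, 5, N]`, where `B` is the batch size, `N` is the number of anchor boxes, and the `5` entries are `x0, x1, y0, y1, angle`.
+- `gt_bboxes(Tensor)`: ground-truth box tensor, data type `float32, float16`, shape `[B, 5, N]`, where `B` is the batch size, `N` is the number of anchor boxes, and the `5` entries are `x0, x1, y0, y1, angle`.
+- `weight(Tensor)`: weight tensor, data type `float32, float16`, shape `[5]`, whose entries are `wx, wy, ww, wh, wangle`. Defaults to `[1, 1, 1, 1, 1]`.
+### Returns
+- `Tensor`: encoded rotated-box coordinates, data type `float32, float16`, shape `[B, 5, N]`.
+### Supported Models
+- Atlas A2 training series products
+### Example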
+```python
+import torch, torch_npu
+from ads.common import npu_rotated_box_encode
+anchor_boxes = torch.tensor([[[30.69], [32.6], [45.94], [59.88], [-44.53]]], dtype=torch.float16).npu()
+gt_bboxes = torch.tensor([[[30.44], [18.72], [33.22], [45.56], [8.5]]], dtype=torch.float16).npu()
+weight = torch.tensor([1, 1, 1, 1, 1], dtype=torch.float16).npu()
+out = npu_rotated_box_encode(anchor_boxes, gt_bboxes, weight)
+print(out)
+```
+```text
+tensor([[[-0.4253], [-0.5166], [-1.7021], [-0.0162], [1.1328]]], dtype=torch.float16)
+```
+## npu_rotated_iou
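+### Interface Prototype
+```python
+ads.common.npu_rotated_iou(Tensor self, Tensor query_boxes, bool trans=False, int mode=0, bool is_cross=True, float v_threshold=0.0, float e_threshold=0.0) -> Tensor
+```
+### Description
+Computes the IoU of rotated boxes.
+### Parameters
+- `self(Tensor)`: input box tensor, data type `float32, float16`, shape `[B, 5, N]`.
+- `query_boxes(Tensor)`: query-box tensor, data type `float32, float16`, shape `[B, 5, M]`.
+- `trans(bool)`: whether to apply a coordinate transformation. Defaults to `False`. `True` means the boxes are in `xyxyt` format; `False` means `xywht`.
+- `mode(int)`: IoU computation mode. Defaults to `0`. `0` computes `IoU`; `1` computes `IoF`.
+- `is_cross(bool)`: whether to compute the intersection area. Defaults to `True`. `True` computes the intersection area; `False` computes the union area.
+- `v_threshold(float)`: threshold in the vertical direction. Defaults to `0.0`.
+- `e_threshold(float)`: threshold in the horizontal direction. Defaults to `0.0`.
+### Returns
+- `Tensor`: IoU tensor, data type `float32, float16`, shape `[B, N, M]`.
+### Supported Models
+- Atlas A2 training series products
+### Example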
+```python
+import torch, torch_npu
+import numpy as np
+from ads.common import npu_rotated_iou
+a = np.random.uniform(0, 1, (2, 2, 5)).astype(np.float16)
+b = np.random.uniform(0, 1, (2, 3, 5)).astype(np.float16)
+box1 = torch.from_numpy(a).npu()
+box2 = torch.from_numpy(b).npu()
+iou = npu_rotated_iou(box1, box2, trans=False, mode=0, is_cross=True, v_threshold=0.0, e_threshold=0.0)
+print(iou)
+```
+```text
+tensor([[[3.3325e-01, 1.0162e-01],
+         [1.0162e-01, 1.0000e+00]],
+
+        [[0.0000e+00, 0.0000e+00],
+         [0.0000e+00, 5.9605e-08]]], dtype=torch.float16)
+```
+## npu_rotated_overlaps
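+### Interface Prototype
+```python
+ads.common.npu_rotated_overlaps(Tensor self, Tensor query_boxes, bool trans=False) -> Tensor
+```
+### Description
+Computes the overlap area of rotated boxes.
+### Parameters
+- `self(Tensor)`: input box tensor, data type `float32, float16`, shape `[B, 5, N]`.
+- `query_boxes(Tensor)`: query-box tensor, data type `float32, float16`, shape `[B, 5, M]`.
+- `trans(bool)`: whether to apply a coordinate transformation. Defaults to `False`. `True` means the boxes are in `xyxyt` format; `False` means `xywht`.
+### Returns
+- `Tensor`: overlap-area tensor, data type `float32, float16`, shape `[B, N, M]`.
+### Supported Models
+- Atlas A2 training series products
+### Example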
+```python
+import torch, torch_npu
+import numpy as np
+from ads.common import npu_rotated_overlaps
+a = np.random.uniform(0, 1, (1, 3, 5)).astype(np.float16)
+b = np.random.uniform(0, 1, (1, 2, 5)).astype(np.float16)
+box1 = torch.from_numpy(a).npu()
+box2 = torch.from_numpy(b).npu()
+output = npu_rotated_overlaps(box1, box2)
+print(output)
+```
+```text
+tensor([[[0.0000, 0.1562, 0.0000],
+         [0.1562, 0.3713, 0.0611],
+         [0.0000, 0.0611, 0.0000]]], dtype=torch.float16)
+```
+## npu_sign_bits_pack
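+### Interface Prototype
+```python
+ads.common.npu_sign_bits_pack(Tensor self, int size) -> Tensor
+```
+### Description
+Packs the input tensor bit by bit into `uint8` values, one bit per input element.
+### Parameters
+- `self(Tensor)`: 1-D input tensor, data type `float32, float16`.
+- `size(int)`: first dimension of the output tensor after reshaping.
+### Returns
+- `Tensor`: the packed tensor, data type `uint8`.
+### Constraints
+`size` must be an integer that evenly divides the number of packed output elements. If the number of elements in `self` is divisible by 8, the output has `self.size / 8` elements; otherwise it has `self.size // 8 + 1` elements, and `-1` float values are appended at the little-endian positions to pad the input to a multiple of 8.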
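+
+As a hand-worked check of the packing rule, take the 14-element input `[5, 4, 3, 2, 0, -1, -2, 4, 3, 2, 1, 0, -1, -2]` with `size = 2` from the example in this section. The bit convention assumed here is inferred from the documented output rather than stated by the source: non-negative values (including `0`) map to bit `1`, negative values map to bit `0`, and the first element of each group of 8 is the least significant bit. Two `-1` padding values complete the second group:
+$$[5, 4, 3, 2, 0, -1, -2, 4] \rightarrow 2^0 + 2^1 + 2^2 + 2^3 + 2^4 + 2^7 = 159$$
+$$[3, 2, 1, 0, -1, -2, -1, -1] \rightarrow 2^0 + 2^1 + 2^2 + 2^3 = 15$$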
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+from ads.common import npu_sign_bits_pack
+a = torch.tensor([5, 4, 3, 2, 0, -1, -2, 4, 3, 2, 1, 0, -1, -2], dtype=torch.float32).npu()
+out = npu_sign_bits_pack(a, 2)
+print(out)
+```
+```text
+tensor([[159], [15]], dtype=torch.uint8)
+```
+## npu_sign_bits_unpack
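+### Interface Prototype
+```python
+ads.common.npu_sign_bits_unpack(Tensor x, int dtype, int size) -> Tensor
+```
+### Description
+Unpacks the bits of the input tensor into floating-point values.
+### Parameters
+- `x(Tensor)`: 1-D input tensor, data type `uint8`.
+- `dtype(int)`: data type of the output tensor. `1` selects `float32`; `0` selects `float16`.
+- `size(int)`: first dimension of the output tensor after reshaping.
+### Returns
+- `Tensor`: the unpacked tensor, data type `float32, float16`.
+### Constraints
+`size` must be an integer that evenly divides the number of unpacked output elements. The output contains `(number of elements in x) * 8` elements.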
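+
+A hand-worked sketch of the unpacking, assuming a little-endian convention in which bit `1` becomes `1` and bit `0` becomes `-1`. The convention is inferred from the documented example output (input `[159, 15]`), not stated by the source:
+$$159 = 10011111_2 \rightarrow [1, 1, 1, 1, 1, -1, -1, 1]$$
+$$15 = 00001111_2 \rightarrow [1, 1, 1, 1, -1, -1, -1, -1]$$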
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+from ads.common import npu_sign_bits_unpack
+a = torch.tensor([159, 15], dtype=torch.uint8).npu()
+out = npu_sign_bits_unpack(a, 0, 2)
+print(out)
+```
+```text
+tensor([[1., 1., 1., 1., 1., -1., -1., 1.], [1., 1., 1., 1., -1., -1., -1., -1.]], dtype=torch.float16)
+```
+## npu_softmax_cross_entropy_with_logits
+### Interface Prototype
+```python
+ads.common.npu_softmax_cross_entropy_with_logits(Tensor features, Tensor labels) -> Tensor
+```
+### Description
+Computes the softmax cross entropy.
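+
+The source does not spell out the formula. Assuming the standard definition of softmax cross entropy, which agrees with the example output in this section, each sample $b$ yields:
+$$loss_b = -\sum_{n} labels_{b,n} \cdot \log\left(\frac{\exp(features_{b,n})}{\sum_{k} \exp(features_{b,k})}\right)$$
+For `features = [1, 2, 3]` with the one-hot label `[0, 1, 0]`:
+$$loss = \log(e^1 + e^2 + e^3) - 2 \approx 3.4076 - 2 = 1.4076$$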
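+### Parameters
+- `features(Tensor)`: input tensor, data type `float32, float16`, shape `[B, N]`, where `B` is the batch size and `N` is the number of classes.
+- `labels(Tensor)`: label tensor with the same shape as `features`.
+### Returns
+- `Tensor`: cross-entropy tensor, data type `float32, float16`, shape `[B]`.
+### Supported Models
+- Atlas A2 training series products
+### Example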
+```python
+import torch, torch_npu
+from ads.common import npu_softmax_cross_entropy_with_logits
+features = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32).npu()
+labels = torch.tensor([[0, 1, 0], [1, 0, 0]], dtype=torch.float32).npu()
+out = npu_softmax_cross_entropy_with_logits(features, labels)
+print(out)
+```
+```text
+tensor([1.4076, 2.4076], dtype=torch.float32)
+```
+## npu_stride_add
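+### Interface Prototype
+```python
+ads.common.npu_stride_add(Tensor x1, Tensor x2, int offset1, int offset2, int c1_len) -> Tensor
+```
+### Description
+Adds two tensors at the specified offsets. The tensors use the `NC1HWC0` format.
+### Parameters
+- `x1(Tensor)`: input tensor in `5HD` format, data type `float32, float16`.
+- `x2(Tensor)`: input tensor with the same shape as `x1`, data type `float32, float16`.
+- `offset1(int)`: offset into `x1`.
+- `offset2(int)`: offset into `x2`.
+- `c1_len(int)`: `C1` dimension of the output tensor. It must be less than the difference between `C1` and the offset for both `x1` and `x2`.
+### Returns
+- `Tensor`: the sum tensor, data type `float32, float16`.
+### Supported Models
+- Atlas A2 training series products
+### Example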
+```python
+import torch, torch_npu
+from ads.common import npu_stride_add
+x1 = torch.tensor([[[[[1]]]]], dtype=torch.float32).npu()
+out = npu_stride_add(x1, x1, 0, 0, 1)
+print(out)
+```
+```text
+tensor([[[[[2]]], [[[0]]], [[[0]]], [[[0]]], [[[0]]], [[[0]]], [[[0]]], [[[0]]], [[[0]]], [[[0]]], [[[0]]], [[[0]]], [[[0]]], [[[0]]], [[[0]]], [[[0]]]]], dtype=torch.float32)
+```
+## npu_transpose
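+### Interface Prototype
+```python
+ads.common.npu_transpose(Tensor x, List[int] perm, bool require_contiguous=True) -> Tensor
+```
+### Description
+Permutes the dimensions of the input tensor into the specified order. `FakeTensor` mode is supported.
+### Parameters
+- `x(Tensor)`: input tensor, data type `float32, float16`.
+- `perm(List[int])`: the order of the dimensions after transposition.
+- `require_contiguous(bool)`: whether the output tensor must be contiguous. Defaults to `True`.
+### Returns
+- `Tensor`: the transposed tensor, data type `float32, float16`.
+### Supported Models
+- Atlas A2 training series products
+### Example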
+```python
+import torch, torch_npu
+from ads.common import npu_transpose
+x = torch.tensor([[[1, 2, 3], [4, 5, 6]]], dtype=torch.float32).npu()
+y = npu_transpose(x, [0, 2, 1])
+print(y)
+```
+```text
+tensor([[[1., 4.], [2., 5.], [3., 6.]]], dtype=torch.float32)
+```
+## npu_yolo_boxes_encode
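+### Interface Prototype
+```python
+ads.common.npu_yolo_boxes_encode(Tensor anchors, Tensor gt_bboxes, Tensor stride, bool perfermance_mode=False) -> Tensor
+```
+### Description
+Generates encoded boxes from the YOLO anchor boxes (`anchors`) and ground-truth boxes (`gt_bboxes`).
+### Parameters
+- `anchors(Tensor)`: anchor-box tensor, data type `float32, float16`, shape `[N, 4]`, where `N` is the number of `ROI`s and the `4` entries are `tx, ty, tw, th`.
+- `gt_bboxes(Tensor)`: ground-truth box tensor, data type `float32, float16`, shape `[N, 4]`, where `N` is the number of `ROI`s and the `4` entries are `dx, dy, dw, dh`.
+- `stride(Tensor)`: stride tensor, data type `int32`, shape `[N]`, where `N` is the number of `ROI`s.
+- `perfermance_mode(bool)`: whether to use performance mode. Defaults to `False`. `True` selects performance mode: with `float16` input the latest performance mode applies, but the error is only guaranteed to be below `0.005`. `False` selects precision mode: with `float32` input the output error is below `0.0001`.
+### Returns
+- `Tensor`: encoded box tensor, data type `float32, float16`, shape `[N, 4]`.
+### Constraints
+- `anchors` and `gt_bboxes` must have the same `N`, and `N` must be less than `20480`.
+### Supported Models
+- Atlas A2 training series products
+### Example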
+```python
+import torch, torch_npu
+from ads.common import npu_yolo_boxes_encode
+anchors = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=torch.float32).npu()
+gt_bboxes = torch.tensor([[5, 6, 7, 8], [1, 2, 3, 4]], dtype=torch.float32).npu()
+stride = torch.tensor([1, 2], dtype=torch.int32).npu()
+out = npu_yolo_boxes_encode(anchors, gt_bboxes, stride)
+print(out)
+```
+```text
+tensor([[ 1.0000,  1.0000,  0.0000,  0.0000],
+        [1.0133e-06, 1.0133e-06,  0.0000,  0.0000]], dtype=torch.float32)
+```
+## npu_scatter
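+### Interface Prototype
+```python
+ads.common.npu_scatter(Tensor self, Tensor indices, Tensor updates, int dim) -> Tensor
+```
+### Description
+Scatters the elements of `updates` according to the indices in `indices`, then adds the scattered elements to `self`.
+### Parameters
+- `self(Tensor)`: tensor to be updated, data type `float32, float16`.
+- `indices(Tensor)`: index tensor, data type `int32`. It may be empty or have the same number of dimensions as `updates`. When it is empty, the operation returns `self` unchanged.
+- `updates(Tensor)`: source tensor of updates, data type `float32, float16`.
+- `dim(int)`: dimension along which to scatter.
+### Returns
+- `Tensor`: the updated tensor, data type `float32, float16`.
+### Supported Models
+- Atlas A2 training series products
+### Example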
+```python
+import torch, torch_npu
+from ads.common import npu_scatter
+input = torch.tensor([[1.6279, 0.1226], [0.9041, 1.0980]], dtype=torch.float32).npu()
+indices = torch.tensor([0, 1], dtype=torch.int32).npu()
+updates = torch.tensor([-1.1993, -1.5247], dtype=torch.float32).npu()
+out = npu_scatter(input, indices, updates, 0)
+print(out)
+```
+```text
+tensor([[-1.1993, 0.1226], [ 0.9041, -1.5247]], dtype=torch.float32)
+```
+## npu_silu
+### Interface Prototype
+```python
+ads.common.npu_silu(Tensor x) -> Tensor
+```
+### Description
+Computes the Sigmoid Linear Unit (SiLU) activation function:
+$$f(x) = x \cdot sigmoid(x)$$
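+
+A hand-worked evaluation of the formula for two inputs, with $sigmoid(x) = 1/(1 + e^{-x})$:
+$$f(1) = 1 \cdot \frac{1}{1 + e^{-1}} \approx 1 \cdot 0.73106 \approx 0.7311$$
+$$f(3) = 3 \cdot \frac{1}{1 + e^{-3}} \approx 3 \cdot 0.95257 \approx 2.8577$$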
+### Parameters
+- `x(Tensor)`: input tensor, data type `float32, float16`.
+### Returns
+- `Tensor`: the activated tensor, data type `float32, float16`.
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+from ads.common import npu_silu
+x = torch.tensor([1, 2, 3, 4], dtype=torch.float32).npu()
+out = npu_silu(x)
+print(out)
+```
+```text
+tensor([0.7311, 1.7616, 2.8577, 3.9281], dtype=torch.float32)
+```
+> Note: the `npu_silu_` interface performs the same operation in place.
+## npu_rotary_mul
+### Interface Prototype
+```python
+ads.common.npu_rotary_mul(Tensor x, Tensor r1, Tensor r2) -> Tensor
+```
+### Description
+Computes the rotary multiplication, where $C$ is the size of the last dimension of $x$:
+$$x1, x2 = x[..., :C//2], x[..., C//2:]$$
+$$x_{new} = [-x2, x1]$$
+$$y = x * r1 + x_{new} * r2$$
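+
+A hand-worked instance of the formulas for a single row with $C = 2$, using values from the example in this section: $x = [1, 2]$, $r1 = [0.1, 0.2]$, $r2 = [0.2, 0.3]$:
+$$x_{new} = [-2, 1]$$
+$$y = [1 \cdot 0.1 + (-2) \cdot 0.2,\ 2 \cdot 0.2 + 1 \cdot 0.3] = [-0.3, 0.7]$$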
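+### Parameters
+- `x(Tensor)`: input tensor, data type `float32, float16`. It must be 4-dimensional.
+- `r1(Tensor)`: rotation-factor tensor representing `cos`, data type `float32, float16`.
+- `r2(Tensor)`: rotation-factor tensor representing `sin`, data type `float32, float16`.
+### Returns
+- `Tensor`: the result of the rotary multiplication, data type `float32, float16`.
+### Constraints
+- `x` must be 4-dimensional, typically `[B, N, S, D]`, `[B, S, N, D]`, or `[S, B, N, D]`.
+- `r1` and `r2` must be 4-dimensional, typically `[1, 1, S, D]` or `[S, 1, 1, D]`.
+### Supported Models
+- Atlas A2 training series products
+### Example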
+```python
+import torch, torch_npu
+from ads.common import npu_rotary_mul
+x = torch.tensor([[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]], dtype=torch.float32).npu()
+r1 = torch.tensor([[[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]]], dtype=torch.float32).npu()
+r2 = torch.tensor([[[[0.2, 0.3], [0.4, 0.5]], [[0.6, 0.7], [0.8, 0.9]]]], dtype=torch.float32).npu()
+out = npu_rotary_mul(x, r1, r2)
+print(out)
+```
+```text
+tensor([[[[-0.3000, 0.7000], [-0.7000, 3.1000]], [[-1.1000, 7.1000], [-1.5000, 12.7000]]]], dtype=torch.float32)
+```
+## npu_abs
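+### Interface Prototype
+```python
+ads.common.npu_abs(Tensor x) -> Tensor
+```
+### Description
+Computes the absolute value of the input tensor.
+### Parameters
+- `x(Tensor)`: input tensor, data type `float32, float16`.
+### Returns
+- `Tensor`: absolute-value tensor, data type `float32, float16`.
+### Supported Models
+- Atlas A2 training series products
+### Example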
+```python
+import torch, torch_npu
+from ads.common import npu_abs
+x = torch.tensor([1, -2, 3, -4], dtype=torch.float32).npu()
+out = npu_abs(x)
+print(out)
+```
+```text
+tensor([1., 2., 3., 4.], dtype=torch.float32)
+```
+## fast_gelu
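+### Interface Prototype
+```python
+ads.common.fast_gelu(Tensor x) -> Tensor
+```
+### Description
+Computes the GELU activation function of the input tensor using the following approximation:
+$$f(x) = \frac{x}{1 + \exp(-1.702 |x|)} \cdot \exp(0.851 (x - |x|))$$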
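+
+A hand-worked evaluation of the formula at $x = -1$, where $|x| = 1$ and $x - |x| = -2$:
+$$f(-1) = \frac{-1}{1 + e^{-1.702}} \cdot e^{-1.702} \approx \frac{-1}{1.1823} \cdot 0.1823 \approx -0.1542$$
+For $x \ge 0$ the second factor is $\exp(0) = 1$, so the formula reduces to $x / (1 + \exp(-1.702 x))$.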
+### Parameters
+- `x(Tensor)`: input tensor, data type `float32, float16`.
+### Returns
+- `Tensor`: the activated tensor, data type `float32, float16`.
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+import numpy as np
+from ads.common import fast_gelu
+x = torch.from_numpy(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]])).float().npu()
+output = fast_gelu(x)
+print(output)
+```
+```text
+tensor([[-1.5418735e-01, 3.9921875e+00, -9.7473649e-06], [1.9375000e+00, -1.0052517e-03, 8.9824219e+00]], dtype=torch.float32)
+```
+## npu_anchor_response_flags
+### Interface Prototype
+```python
+ads.common.npu_anchor_response_flags(Tensor gt_bboxes, List[int] featmap_size, List[int] strides, int num_base_anchors) -> Tensor
+```
+### Description
+Generates anchor response flags from the ground-truth boxes (`gt_bboxes`) and the feature-map size (`featmap_size`).
+### Parameters
+- `gt_bboxes(Tensor)`: ground-truth box tensor, data type `float32, float16`, shape `[N, 4]`, where `N` is the number of `ROI`s and the `4` entries are `x0, y0, x1, y1`.
+- `featmap_size(List[int])`: feature-map size, shape `[2]`, with entries `H, W`.
+- `strides(List[int])`: strides, shape `[2]`, with entries `stride_h, stride_w`.
+- `num_base_anchors(int)`: number of base anchors.
+### Returns
+- `Tensor`: anchor response flag tensor, data type `uint8`.
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+from ads.common import npu_anchor_response_flags
+gt_bboxes = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=torch.float32).npu()
+featmap_size = [2, 3]
+strides = [1, 2]
+num_base_anchors = 2
+out = npu_anchor_response_flags(gt_bboxes, featmap_size, strides, num_base_anchors)
+print(out)
+```
+```text
+tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1], dtype=torch.uint8)
+```
+## npu_bounding_box_decode
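+### Interface Prototype
+```python
+ads.common.npu_bounding_box_decode(Tensor rois, Tensor deltas, float means0, float means1, float means2, float means3, float stds0, float stds1, float stds2, float stds3, int max_shape, float wh_ratio_clip) -> Tensor
+```
+### Description
+Generates decoded boxes from `rois` and `deltas`.
+### Parameters
+- `rois(Tensor)`: ROIs produced by the region proposal network (RPN), data type `float32, float16`, shape `[N, 4]`, where `N` is the number of `ROI`s and the `4` entries are `x0, y0, x1, y1`.
+- `deltas(Tensor)`: offset tensor, data type `float32, float16`, shape `[N, 4]`, where `N` is the number of `ROI`s and the `4` entries are `dx, dy, dw, dh`.
+- `means0(float)`: mean used to normalize `dx`.
+- `means1(float)`: mean used to normalize `dy`.
+- `means2(float)`: mean used to normalize `dw`.
+- `means3(float)`: mean used to normalize `dh`.
+- `stds0(float)`: standard deviation used to normalize `dx`.
+- `stds1(float)`: standard deviation used to normalize `dy`.
+- `stds2(float)`: standard deviation used to normalize `dw`.
+- `stds3(float)`: standard deviation used to normalize `dh`.
+  - All of the parameters above are `float32`. Each `means` value defaults to `0` and each `stds` value defaults to `1`. `delta` is normalized as `delta = (delta - means) / stds`.
+- `max_shape(int)`: maximum shape, used to ensure that the converted bbox does not exceed it. Defaults to `0`.
+- `wh_ratio_clip(float)`: width/height ratio clip. The values of `dw` and `dh` lie within `(-wh_ratio_clip, wh_ratio_clip)`.
+### Returns
+- `Tensor`: decoded box tensor, data type `float32, float16`, shape `[N, 4]`.
+### Supported Models
+- Atlas A2 training series products
+### Example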
+```python
+import torch, torch_npu
+from ads.common import npu_bounding_box_decode
+rois = torch.tensor([[1, 2, 3, 4], [3, 4, 5, 6]], dtype=torch.float32).npu()
+deltas = torch.tensor([[5, 6, 7, 8], [7, 8, 9, 6]], dtype=torch.float32).npu()
+out = npu_bounding_box_decode(rois, deltas, 0, 0, 0, 0, 1, 1, 1, 1, (10, 10), 0.1)
+print(out)
+```
+```text
+tensor([[ 2.5000,  6.5000,  9.0000,  9.0000], [ 9.0000,  9.0000,  9.0000,  9.0000]], dtype=torch.float32)
+```
+## npu_bounding_box_encode
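+### Interface Prototype
+```python
+ads.common.npu_bounding_box_encode(Tensor anchor_boxes, Tensor gt_bboxes, float means0, float means1, float means2, float means3, float stds0, float stds1, float stds2, float stds3) -> Tensor
+```
+### Description
+Generates encoded boxes from `anchor_boxes` and `gt_bboxes`.
+### Parameters
+- `anchor_boxes(Tensor)`: anchor-box tensor, data type `float32, float16`, shape `[N, 4]`, where `N` is the number of `ROI`s and the `4` entries are `x0, y0, x1, y1`.
+- `gt_bboxes(Tensor)`: ground-truth box tensor, data type `float32, float16`, shape `[N, 4]`, where `N` is the number of `ROI`s and the `4` entries are `x0, y0, x1, y1`.
+- `means0(float)`: mean used to normalize `dx`.
+- `means1(float)`: mean used to normalize `dy`.
+- `means2(float)`: mean used to normalize `dw`.
+- `means3(float)`: mean used to normalize `dh`.
+- `stds0(float)`: standard deviation used to normalize `dx`.
+- `stds1(float)`: standard deviation used to normalize `dy`.
+- `stds2(float)`: standard deviation used to normalize `dw`.
+- `stds3(float)`: standard deviation used to normalize `dh`.
+  - All of the parameters above are `float32`. Each `means` value defaults to `0` and each `stds` value defaults to `1`. `delta` is normalized as `delta = (delta - means) / stds`.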
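+
+A hand-worked check of the normalization formula against the first row of the example in this section (`anchor_boxes = [1, 2, 3, 4]`, `gt_bboxes = [5, 6, 7, 8]`, `means0 = 0`, `stds0 = 0.1`). The box parameterization used here, center $(x0 + x1)/2$ and width $x1 - x0 + 1$, is an assumption inferred from the documented output, not something the source states:
+$$dx = \frac{gx - px}{pw} = \frac{6 - 2}{3} \approx 1.3333$$
+$$\frac{dx - means0}{stds0} = \frac{1.3333 - 0}{0.1} \approx 13.33$$
+The documented value `13.3281` differs slightly from this figure, which may reflect reduced-precision arithmetic on the device.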
+### Returns
+- `Tensor`: encoded box tensor, data type `float32, float16`, shape `[N, 4]`.
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+from ads.common import npu_bounding_box_encode
+anchor_boxes = torch.tensor([[1, 2, 3, 4], [3, 4, 5, 6]], dtype=torch.float32).npu()
+gt_bboxes = torch.tensor([[5, 6, 7, 8], [7, 8, 9, 6]], dtype=torch.float32).npu()
+out = npu_bounding_box_encode(anchor_boxes, gt_bboxes, 0, 0, 0, 0, 0.1, 0.1, 0.2, 0.2)
+print(out)
+```
+```text
+tensor([[13.3281, 13.3281,  0.0000,  0.0000], [ 13.3281,  6.6641,  0.0000,  -5.4922]], dtype=torch.float32)
+```
+## npu_batch_nms
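+### Interface Prototype
+```python
+ads.common.npu_batch_nms(Tensor self, Tensor scores, float score_threshold, float iou_threshold, int max_size_per_class, int max_total_size, bool change_coordinate_frame=False, bool transpose_box=False) -> (Tensor, Tensor, Tensor, Tensor)
+```
+### Description
+Computes scores for the input boxes per class within each batch, sorts the boxes by score, and discards those whose score is below `score_threshold`. NMS then removes boxes whose overlap exceeds `iou_threshold`.
+### Parameters
+- `self(Tensor)`: input box tensor, data type `float16`, shape `[B, N, q, 4]`, where `B` is the batch size, `N` is the number of boxes, and `q=1` or `q=num_classes`.
+- `scores(Tensor)`: score tensor, data type `float16`, shape `[B, N, num_classes]`.
+- `score_threshold(float)`: score threshold. Boxes scoring below it are filtered out.
+- `iou_threshold(float)`: IoU threshold. Boxes whose overlap exceeds it are filtered out.
+- `max_size_per_class(int)`: maximum number of boxes per class.
+- `max_total_size(int)`: maximum total number of boxes.
+- `change_coordinate_frame(bool)`: whether to normalize the output box coordinate matrix. Defaults to `False`.
+- `transpose_box(bool)`: whether to transpose the output box coordinate matrix. Defaults to `False`.
+### Returns
+- `nmsed_boxes(Tensor)`: boxes kept after NMS, data type `float16`, shape `[B, max_total_size, 4]`.
+- `nmsed_scores(Tensor)`: scores kept after NMS, data type `float16`, shape `[B, max_total_size]`.
+- `nmsed_classes(Tensor)`: classes kept after NMS, data type `float16`, shape `[B, max_total_size]`.
+- `nmsed_num(Tensor)`: number of boxes kept after NMS, data type `int32`, shape `[B]`.
+### Supported Models
+- Atlas A2 training series products
+### Example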
+```python
+import torch, torch_npu
+from ads.common import npu_batch_nms
+self = torch.tensor([[[[1, 2, 3, 4]]]], dtype=torch.float16).npu()
+scores = torch.tensor([[[1, 2, 3]]], dtype=torch.float16).npu()
+nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num = npu_batch_nms(self, scores, 0.5, 0.5, 1, 1)
+print(nmsed_boxes)
+print(nmsed_scores)
+print(nmsed_classes)
+print(nmsed_num)
+```
+```text
+tensor([[[1.0000, 2.0000, 3.0000, 4.0000]]], dtype=torch.float16)
+tensor([[3.]], dtype=torch.float16)
+tensor([[2.]], dtype=torch.float16)
+tensor([1], dtype=torch.int32)
+```
+## npu_confusion_transpose
+### Interface Prototype
+```python
+ads.common.npu_confusion_transpose(Tensor self, List[int] perm, List[int] shape, bool transpose_first) -> Tensor
+```
+### Description
+Transposes the input tensor according to `perm` and `shape`.
+### Parameters
+- `self(Tensor)`: input tensor, data type `float32, float16, int8, int16, int32, int64, uint8, uint16, uint32, uint64`.
+- `perm(List[int])`: the order of the dimensions after transposition.
+- `shape(List[int])`: shape of the input tensor.
+- `transpose_first(bool)`: whether to transpose first. Defaults to `False`.
+### Returns
+- `Tensor`: the transposed tensor, data type `float32, float16, int8, int16, int32, int64, uint8, uint16, uint32, uint64`.
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+from ads.common import npu_confusion_transpose
+x = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], dtype=torch.float32).npu()
+out = npu_confusion_transpose(x, [0, 2, 1], [2, 2, 2], False)
+print(out)
+```
+```text
+tensor([[[1., 3.], [2., 4.]], [[5., 7.], [6., 8.]]], dtype=torch.float32)
+```
+## npu_broadcast
+### Interface Prototype
+```python
+ads.common.npu_broadcast(Tensor self, List[int] size) -> Tensor
+```
+### Description
+Broadcasts the input tensor to the shape given by `size`.
+### Parameters
+- `self(Tensor)`: input tensor, data type `float32, float16, int8, int16, int32, int64, uint8, uint16, uint32, uint64`.
+- `size(List[int])`: shape after broadcasting.
+### Returns
+- `Tensor`: the broadcast tensor, data type `float32, float16, int8, int16, int32, int64, uint8, uint16, uint32, uint64`.
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+from ads.common import npu_broadcast
+x = torch.tensor([[1], [2], [3]], dtype=torch.float32).npu()
+out = npu_broadcast(x, [3, 4])
+print(out)
+```
+```text
+tensor([[1., 1., 1., 1.], [2., 2., 2., 2.], [3., 3., 3., 3.]], dtype=torch.float32)
+```
+## npu_moe_tutel
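+### Interface Prototype
+```python
+ads.common.npu_moe_tutel(Tensor x, Tensor gates, Tensor indices, Tensor locations, int capacity)
+```
+### Description
+Expert parallelism assigns experts to different compute resources; for example, one expert is assigned 1 to N NPUs.
+### Parameters
+- `x(Tensor)`: all tokens output by the MHA layer, data type `float32, float16, bf16`.
+- `gates(Tensor)`: output of the gating function, data type `float32, float16, bf16`.
+- `indices(Tensor)`: indices corresponding to the batch values, data type `int32`.
+- `locations(Tensor)`: indices corresponding to the capacity values, data type `int32`.
+- `capacity(int)`: capacity value. It sets the second dimension of the output.
+### Returns
+- `y(Tensor)`: output of the experts, data type `float32, float16, bf16`, shape `[B, capacity, x[1]]`.
+### Supported Models
+- Atlas A2 training series products
+### Example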
+```python
+import torch, torch_npu
+from ads.common import npu_moe_tutel
+x = torch.tensor([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]], dtype=torch.float32).npu()
+gates = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], dtype=torch.float32).npu()
+indices = torch.tensor([1, 2], dtype=torch.int32).npu()
+locations = torch.tensor([1, 2], dtype=torch.int32).npu()
+out = npu_moe_tutel(x, gates, indices, locations, 2)
+print(out)
+```
+## npu_dynamic_scatter
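+### Interface Prototype
+```python
+ads.common.npu_dynamic_scatter(Tensor feats, Tensor coors_map, string reduce_type) -> Tensor
+```
+### Description
+Reduces the features of the points within their corresponding voxels.
+### Parameters
+- `feats(Tensor)`: feature tensor, data type `float32, float16`.
+- `coors_map(Tensor)`: voxel coordinate mapping tensor, data type `int32`.
+- `reduce_type(string)`: reduction type. Allowed values are `0, 1, 2`: `0` means `sum`, `1` means `mean`, and `2` means `max`.
+### Returns
+- `Tensor`: the reduced feature tensor, data type `float32, float16`.
+### Supported Models
+- Atlas A2 training series products
+### Example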
+```python
+import torch, torch_npu
+from ads.common import npu_dynamic_scatter
+feats = torch.tensor([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]], dtype=torch.float32).npu()
+coors_map = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], dtype=torch.int32).npu()
+output_feats, output_coors = npu_dynamic_scatter(feats, coors_map, 1)
+print(output_feats)
+```
+## npu_points_in_box
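+### Interface Prototype
+```python
+ads.common.npu_points_in_box(Tensor boxes, Tensor points) -> Tensor
+```
+### Description
+Determines whether each point lies inside a box.
+### Parameters
+- `boxes(Tensor)`: box tensor, data type `float32, float16`, shape `[B, M, 7]`. The `7` entries are `x, y, z, x_size, y_size, z_size, rz`.
+- `points(Tensor)`: point tensor, data type `float32, float16`, shape `[B, N, 3]`. The `3` entries are `x, y, z`.
+### Returns
+- `boxes_idx_of_points(Tensor)`: for each point, the index of the box that contains it, data type `int32`, shape `[B, N]`.
+### Constraints
+- `boxes` and `points` must have the same `B`, and `B` must be `1`.
+### Supported Models
+- Atlas A2 training series products
+### Example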
+```python
+import torch, torch_npu
+from ads.common import npu_points_in_box
+boxes = torch.tensor([[[1, 2, 3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8, 9]]], dtype=torch.float32).npu()
+points = torch.tensor([[[1, 2, 3], [3, 4, 5]]], dtype=torch.float32).npu()
+out = npu_points_in_box(boxes, points)
+print(out)
+```
+```text
+tensor([[0, 1]], dtype=torch.int32)
+```
+## npu_ads_add
+### Interface Prototype
+```python
+ads.common.npu_ads_add(Tensor x, Tensor y) -> Tensor
+```
+### Description
+Computes the sum of two tensors.
+### Parameters
+- `x(Tensor)`: input tensor, data type `float32, float16`.
+- `y(Tensor)`: input tensor, data type `float32, float16`.
+### Returns
+- `Tensor`: the sum tensor, data type `float32, float16`.
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+from ads.common import npu_ads_add
+x = torch.tensor([1, 2, 3, 4], dtype=torch.float32).npu()
+y = torch.tensor([5, 6, 7, 8], dtype=torch.float32).npu()
+out = npu_ads_add(x, y)
+print(out)
+```
+```text
+tensor([6., 8., 10., 12.], dtype=torch.float32)
+```
+## npu_multi_scale_deformable_attn_function
+### Interface Prototype
+```python
+ads.common.npu_multi_scale_deformable_attn_function(Tensor value, Tensor shape, Tensor offset, Tensor locations, Tensor weight) -> Tensor
+```
+### Description
+Multi-scale deformable attention, which fuses the feature maps of multiple views.
+### Parameters
+- `value(Tensor)`: feature tensor, data type `float32, float16`, shape `[bs, num_keys, num_heads, embed_dim]`, where `bs` is the batch size, `num_keys` is the number of feature maps, `num_heads` is the number of heads, and `embed_dim` is the feature-map dimension.
+- `shape(Tensor)`: feature-map shapes, data type `int32`, shape `[num_levels, 2]`, where `num_levels` is the number of feature maps and the `2` entries are `H, W`.
+- `offset(Tensor)`: offset tensor, data type `int32`, shape `[num_levels]`.
+- `locations(Tensor)`: location tensor, data type `int32`, shape `[bs, num_queries, num_heads, num_levels, num_points, 2]`, where `bs` is the batch size, `num_queries` is the number of queries, `num_heads` is the number of heads, `num_levels` is the number of feature maps, `num_points` is the number of sampling points, and the `2` entries are `y, x`.
+- `weight(Tensor)`: weight tensor, data type `float32, float16`, shape `[bs, num_queries, num_heads, num_levels, num_points]`, with the same dimension meanings as `locations`.
+### Returns
+- `Tensor`: the fused feature tensor, data type `float32, float16`, shape `[bs, num_queries, num_heads*embed_dim]`.
+### Constraints
+- The values of `locations` lie in `[0, 1]`.
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+from ads.common import npu_multi_scale_deformable_attn_function
+value = torch.tensor([[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]], dtype=torch.float32).npu()
+shape = torch.tensor([[1, 1]], dtype=torch.int32).npu()
+offset = torch.tensor([1], dtype=torch.int32).npu()
+locations = torch.tensor([[[[[0.1, 0.2], [0.3, 0.4]]]]], dtype=torch.float32).npu()
+weight = torch.tensor([[[[[1, 2], [3,4]]]]], dtype=torch.float32).npu()
+out = npu_multi_scale_deformable_attn_function(value, shape, offset, locations, weight)
+print(out)
+```
+```text
+tensor([[[9.3002, 11.1603, 0.0000, 0.0000]]], dtype=torch.float32)
+```
+## voxelization
+### Interface Prototype
+```python
+ads.common.voxelization(Tensor points, List[int] voxel_size, List[int] coors_range, int max_points=-1, int max_voxels=-1, bool deterministic=True) -> Tensor
+```
+### Description
+Voxelizes point-cloud data.
+### Parameters
+- `points(Tensor)`: point-cloud data, data type `float32, float16`, shape `[3, N]`, where `N` is the number of points and the `3` entries are `x, y, z`.
+- `voxel_size(List[int])`: voxel size, data type `float32, float16`, shape `[3]`, with entries `x, y, z`.
+- `coors_range(List[int])`: voxel range, data type `float32, float16`, shape `[6]`, with entries `x_min, y_min, z_min, x_max, y_max, z_max`.
+- `max_points(int)`: maximum number of points per voxel. Defaults to `-1`.
+- `max_voxels(int)`: maximum number of voxels. Defaults to `-1`.
+- `deterministic(bool)`: whether to run deterministically. Defaults to `True`.
+### Returns
+- `Tensor`: the voxelized tensor, data type `int32`, shape `[max_voxels, max_points]`.
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+from ads.common import voxelization
+points = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float32).npu()
+out = voxelization(points, [1, 1, 1], [1, 2, 3, 4, 5, 6])
+print(out)
+```
+## npu_nms3d_normal
+### Interface Prototype
+```python
+ads.common.npu_nms3d_normal(Tensor boxes, Tensor scores, float iou_threshold) -> Tensor
+```
+### Description
+3D non-maximum suppression.
+### Parameters
+- `boxes(Tensor)`: box tensor, data type `float32, float16`, shape `[N, 7]`. The `7` entries are `x, y, z, x_size, y_size, z_size, rz`.
+- `scores(Tensor)`: score tensor, data type `float32, float16`, shape `[N]`.
+- `iou_threshold(float)`: IoU threshold.
+### Returns
+- `Tensor`: the boxes kept after NMS, data type `int32`.
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+from ads.common import npu_nms3d_normal
+boxes = torch.tensor([[1, 2, 3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8, 9]], dtype=torch.float32).npu()
+scores = torch.tensor([1, 2], dtype=torch.float32).npu()
+out = npu_nms3d_normal(boxes, scores, 0.5)
+print(out)
+```
+```text
+tensor([[1, 0]], dtype=torch.int32)
+```
+## npu_nms3d
+### Interface Prototype
+```python
+ads.common.npu_nms3d(Tensor boxes, Tensor scores, float iou_threshold) -> Tensor
+```
+### Description
+3D non-maximum suppression. In the BEV view, it removes 3D boxes whose IoU with other boxes exceeds the threshold.
+### Parameters
+- `boxes(Tensor)`: box tensor, data type `float32, float16`, shape `[N, 7]`. The `7` entries are `x, y, z, x_size, y_size, z_size, rz`.
+- `scores(Tensor)`: score tensor, data type `float32, float16`, shape `[N]`.
+- `iou_threshold(float)`: IoU threshold.
+### Returns
+- `Tensor`: the boxes kept after NMS, data type `int32`.
+### Supported Models
+- Atlas A2 training series products
+### Example
+```python
+import torch, torch_npu
+from ads.common import npu_nms3d
+boxes = torch.tensor([[1, 2, 3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8, 9]], dtype=torch.float32).npu()
+scores = torch.tensor([1, 2], dtype=torch.float32).npu()
+out = npu_nms3d(boxes, scores, 0.5)
+print(out)
+```
+```text
+tensor([[1]], dtype=torch.int32)
+```
+## npu_furthest_point_sampling
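+### Interface Prototype
+```python
+ads.common.npu_furthest_point_sampling(Tensor points, int num_points) -> Tensor
+```
+### Description
+Furthest point sampling of point-cloud data.
+### Parameters
+- `points(Tensor)`: point-cloud data, data type `float32, float16`, shape `[B, N, 3]`, where `B` is the batch size, `N` is the number of points, and the `3` entries are `x, y, z`.
+- `num_points(int)`: number of points to sample.
+### Returns
+- `Tensor`: the sampled point-cloud data, data type `float32, float16`, shape `[B, num_points]`.
+### Supported Models
+- Atlas A2 training series products
+### Example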
+```python
+import torch, torch_npu
+from ads.common import npu_furthest_point_sampling
+points = torch.tensor([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]], dtype=torch.float32).npu()
+out = npu_furthest_point_sampling(points, 2)
+print(out)
+```
+```text
+tensor([[0, 2]], dtype=torch.int32)
+```
+## furthest_point_sample_with_dist
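+### Interface Prototype
+```python
+ads.common.furthest_point_sample_with_dist(Tensor points, int num_points) -> (Tensor, Tensor)
+```
+### Description
+Provides the same functionality as `npu_furthest_point_sampling`, with a slightly different input.
+### Parameters
+- `points(Tensor)`: point-cloud data given as the pairwise distances between points, data type `float32, float16`, shape `[B, N, N]`, where `B` is the batch size and `N` is the number of points.
+- `num_points(int)`: number of points to sample.
+### Returns
+- `Tensor`: the sampled point-cloud data, data type `float32, float16`, shape `[B, num_points]`.
+### Supported Models
+- Atlas A2 training series products
+### Example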
+```python
+import torch, torch_npu
+from ads.common import furthest_point_sample_with_dist
+points = torch.tensor([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]], dtype=torch.float32).npu()
+out = furthest_point_sample_with_dist(points, 2)
+print(out)
+```
+```text
+tensor([[0, 2]], dtype=torch.int32)
+```
\ No newline at end of file
-- 
Gitee