diff --git a/docs/vllm_mindspore/docs/source_en/faqs/faqs.md b/docs/vllm_mindspore/docs/source_en/faqs/faqs.md
index b2eaf55735840bab21e09e6058ebe352fd975919..35b131b6a7dde5177a9ce4e39698b686d81f7a03 100644
--- a/docs/vllm_mindspore/docs/source_en/faqs/faqs.md
+++ b/docs/vllm_mindspore/docs/source_en/faqs/faqs.md
@@ -37,14 +37,14 @@
 - Solution:
 
     1. Check if the model path exists and is valid;
-    2. If the model path exists and the model files are in `safetensors` format, confirm whether the yaml file contains the `load_ckpt_format: "safetensors"` field:
-        1. Print the path of the yaml file used by the model:
+    2. If the model path exists and the model files are in `safetensors` format, confirm whether the YAML file contains the `load_ckpt_format: "safetensors"` field:
+        1. Print the path of the YAML file used by the model:
 
            ```bash
           echo $MINDFORMERS_MODEL_CONFIG
           ```
 
-        2. Check the yaml file. If the `load_ckpt_format` field is missing, add it:
+        2. Check the YAML file. If the `load_ckpt_format` field is missing, add it:
 
           ```text
           load_ckpt_format: "safetensors"
diff --git a/docs/vllm_mindspore/docs/source_en/general/security.md b/docs/vllm_mindspore/docs/source_en/general/security.md
index df1cea8cd8fe1d01ad068bda9294367100d49cfd..af25481bdd4a55cd12e45d8b8c08808ae76ba4f6 100644
--- a/docs/vllm_mindspore/docs/source_en/general/security.md
+++ b/docs/vllm_mindspore/docs/source_en/general/security.md
@@ -2,7 +2,7 @@
 
 [![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.7.0/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/r2.7.0/docs/vllm_mindspore/docs/source_en/general/security.md)
 
-When enabling inference services using vLLM-MindSpore Plugin on Ascend, there may be some security-related issues due to the need for certain network ports for necessary functions such as serviceification, node communication, and model execution.
+When enabling inference services using vLLM-MindSpore Plugin on Ascend, security-related issues may arise because certain network ports must be opened for necessary functions such as service deployment, node communication, and model execution.
 
 ## Service Port Configuration
 
@@ -28,15 +28,15 @@ For security, it should be deployed in a sufficiently secure isolated network environment.
 
 1. Environment Variables:
     * `VLLM_HOST_IP`: Sets the IP address for vLLM processes to communicate on, main scenario is to communicate in MindSpore distributed network.
-    * `VLLM_DP_MASTER_IP`: Sets the IP address for data parallel(not for online-serving, default: `127.0.0.1`).
-    * `VLLM_DP_MASTER_PORT`: Sets the port for data parallel(not for online-serving, default: `0`).
+    * `VLLM_DP_MASTER_IP`: Sets the IP address for data parallel (not for online-serving, default: `127.0.0.1`).
+    * `VLLM_DP_MASTER_PORT`: Sets the port for data parallel (not for online-serving, default: `0`).
 
 2. Data Parallel Configuration:
-    * `data_parallel_master_ip`: Sets the IP address for data parallel(default: `127.0.0.1`).
-    * `data_parallel_master_port`: Sets the port for data parallel(default: `29500`).
+    * `data_parallel_master_ip`: Sets the IP address for data parallel (default: `127.0.0.1`).
+    * `data_parallel_master_port`: Sets the port for data parallel (default: `29500`).
 
 ### Executing Framework Distributed Communication
 
-It should be noted that vLLM-MindSpore Plugin use MindSpore's distributed communication. For detailed security information about MindSpore, please refer to the [MindSpore](https://www.mindspore.cn/en).
+It should be noted that vLLM-MindSpore Plugin uses MindSpore's distributed communication. For detailed security information about MindSpore, please refer to [MindSpore](https://www.mindspore.cn/en).
 
 ## Security Recommendations
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
index b4f917c9ff37ff1d740001f5b4b53fc1edd04585..5480705a82713d2bf87208c39a0e019e43112d8a 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
@@ -150,7 +150,7 @@ vLLM-MindSpore Plugin can be installed in the following two ways. **vLLM-MindSpo
 
 - **vLLM-MindSpore Plugin Manual Installation**
 
-  If user need to modify the components or use other versions, components need to be manually installed in a specific order. Version compatibility of vLLM-MindSpore Plugin can be found [Version Compatibility](#version-compatibility), abd vLLM-MindSpore Plugin requires the following installation sequence:
+  If users require custom modifications to dependent components such as vLLM, MindSpore, Golden Stick, or MSAdapter, they can prepare the modified installation packages locally and install them manually in a specific sequence. The required installation sequence is as follows:
 
   1. Install vLLM
 
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md b/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md
index c7871b66d62e81b3030be0198f82701c1fd6ae3d..21f40d50e943eba06a0c91626caa182070dbdb78 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md
@@ -140,13 +140,11 @@ Here is an explanation of these environment variables:
 
 - `vLLM_MODEL_BACKEND`: The backend of the model to run. User could find supported models and backends for vLLM-MindSpore Plugin in the [Model Support List](../../user_guide/supported_models/models_list/models_list.md).
 - `MINDFORMERS_MODEL_CONFIG`: The model configuration file. User can find the corresponding YAML file in the [MindSpore Transformers repository](https://gitee.com/mindspore/mindformers/tree/r1.6.0/research/qwen2_5). For Qwen2.5-7B, the YAML file is [predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml).
 
-Additionally, users need to ensure that MindSpore Transformers is installed. Users can add it by running the following command:
+Additionally, users need to ensure that MindSpore Transformers is installed. Users can make MindSpore Transformers available by adding it to `PYTHONPATH`:
 
 ```bash
 export PYTHONPATH=/path/to/mindformers:$PYTHONPATH
-```
-
-This will include MindSpore Transformers in the Python path.
+```
 
 ### Offline Inference
 
@@ -209,7 +209,7 @@ INFO: Application startup complete.
 Additionally, performance metrics will be logged, such as:
 
 ```text
-Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg gereration throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
+Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
 ```
 
 #### Sending Requests
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
index 1fd6e9e3f43d26c22fed9227a720b12e1bf56791..2c78b4a27c87dedf0e78429fa26b31f79511b221 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
@@ -128,13 +128,11 @@ parallel_config:
   expert_parallel: 1
 ```
 
-Additionally, users need to ensure that MindSpore Transformers is installed. Users can add it by running the following command:
+Additionally, users need to ensure that MindSpore Transformers is installed. Users can make MindSpore Transformers available by adding it to `PYTHONPATH`:
 
 ```bash
 export PYTHONPATH=/path/to/mindformers:$PYTHONPATH
-```
-
-This will include MindSpore Transformers in the Python path.
+```
 
 ### Starting Ray for Multi-Node Cluster Management
 
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md
index 4854876a6bff491c56f21f11ee824d449c4da365..0243541e58915ed061d0da989b8c9e280cb5aa2b 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md
@@ -207,7 +207,7 @@ INFO: Application startup complete.
 另外,日志中还会打印服务的性能数据信息,如:
 
 ```text
-Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg gereration throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
+Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
 ```
 
 #### 发送请求
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
index 18c53d93e693b18f3992e50985f855478133b5a7..75248816f482471d3b7c69c00adb36a3b5af1f94 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
@@ -204,7 +204,7 @@ INFO: Application startup complete.
 另外,日志中还会打印出服务的性能数据信息,如:
 
 ```text
-Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg gereration throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
+Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
 ```
 
 ### 发送请求
diff --git a/tutorials/source_en/model_infer/ms_infer/ms_infer_model_serving_infer.md b/tutorials/source_en/model_infer/ms_infer/ms_infer_model_serving_infer.md
index e050b680c94356c7db19b189c7d14ef2b2428692..d6678a96bb2e1e7617e2f9978469733e71685fe9 100644
--- a/tutorials/source_en/model_infer/ms_infer/ms_infer_model_serving_infer.md
+++ b/tutorials/source_en/model_infer/ms_infer/ms_infer_model_serving_infer.md
@@ -125,7 +125,7 @@ export MODEL_ID="/path/to/model/Qwen2-7B"
 Run the following command to start the vLLM-MindSpore Plugin service backend:
 
 ```shell
-vllm-mindspore serve --model=${MODEL_ID} --port=${VLLM_HTTP_PORT} --trust_remote_code --max-num-seqs=256 --max_model_len=32768 --max-num-batched-tokens=4096 --block_size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 1 --data-parallel-size 1 --data-parallel-size-local 1 --data-parallel-start-rank 0 --data-parallel-address ${VLLM_MASTER_IP} --data-parallel-rpc-port ${VLLM_RPC_PORT} &> vllm-mindspore.log &
+vllm-mindspore serve --model=${MODEL_ID} --port=${VLLM_HTTP_PORT} --trust_remote_code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block_size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 1 --data-parallel-size 1 --data-parallel-size-local 1 --data-parallel-start-rank 0 --data-parallel-address ${VLLM_MASTER_IP} --data-parallel-rpc-port ${VLLM_RPC_PORT} &> vllm-mindspore.log &
 ```
 
 After the backend service is loaded, the listening port and provided APIs of the backend service are displayed.