# pyvideotrans
**Repository Path**: mrerror/pyvideotrans
## Basic Information
- **Project Name**: pyvideotrans
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: GPL-3.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-17
- **Last Updated**: 2026-04-17
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
> Sponsors: **[Recall.ai](https://www.recall.ai/product/meeting-transcription-api?utm_source=github&utm_medium=sponsorship&utm_campaign=jianchang512-pyvideotrans) - Meeting Transcription API**
>
> If you're looking for a transcription API for meetings, consider checking out **[Recall.ai](https://www.recall.ai/product/meeting-transcription-api?utm_source=github&utm_medium=sponsorship&utm_campaign=jianchang512-pyvideotrans)**, an API that works with Zoom, Google Meet, Microsoft Teams, and more.
# pyVideoTrans
**A Powerful Open Source Video Translation / Audio Transcription / AI Dubbing / Subtitle Translation Tool**
[中文](docs/README_CN.md) | [**Documentation**](https://pyvideotrans.com) | [**Online Q&A**](https://bbs.pyvideotrans.com)
**pyVideoTrans** is dedicated to seamlessly converting videos from one language to another, offering a complete workflow that includes speech recognition, subtitle translation, multi-role dubbing, and audio-video synchronization. It supports both local offline deployment and a wide variety of mainstream online APIs.
---
## Core Features
- **Fully Automatic Video Translation**: One-click workflow: Speech Recognition (ASR) -> Subtitle Translation -> Speech Synthesis (TTS) -> Video Synthesis.
- **Audio Transcription / Subtitle Generation**: Batch convert audio/video to SRT subtitles, with **Speaker Diarization** to distinguish between different roles.
- **Multi-Role AI Dubbing**: Assign a different AI dubbing voice to each speaker.
- **Voice Cloning**: Integrates models such as **F5-TTS, CosyVoice, GPT-SoVITS** for zero-shot voice cloning.
- **Powerful Model Support**:
  - **ASR**: Faster-Whisper (Local), OpenAI Whisper, Alibaba Qwen, ByteDance Volcano, Azure, Google, etc.
  - **LLM Translation**: DeepSeek, ChatGPT, Claude, Gemini, MiniMax, Ollama (Local), Alibaba Bailian, etc.
  - **TTS**: Edge-TTS (Free), OpenAI, Azure, Minimaxi, ChatTTS, ChatterBox, etc.
- **Interactive Editing**: Pause for manual proofreading at each stage (recognition, translation, dubbing) to ensure accuracy.
- **Utility Toolkit**: Auxiliary tools such as vocal separation, video/subtitle merging, audio-video alignment, and transcript matching.
- **Command Line Interface (CLI)**: Headless operation, convenient for server deployment or batch processing.
---
## Quick Start (Windows Users)
We provide a pre-packaged `.exe` version for Windows 10/11 users, requiring no Python environment configuration.
1. **Download**: [Click to download the latest pre-packaged version](https://github.com/jianchang512/pyvideotrans/releases)
2. **Unzip**: Extract the compressed file to a path (e.g., `D:\pyVideoTrans`).
3. **Run**: Double-click `sp.exe` inside the folder to launch.
> **Note**:
> * Do not run directly from within the compressed archive.
> * To use GPU acceleration, ensure **CUDA 12.8** and **cuDNN 9.11** are installed.
---
## Source Deployment (macOS / Linux / Windows Developers)
We recommend using **[`uv`](https://docs.astral.sh/uv/)** for package management for faster speed and better environment isolation.
### 1. Prerequisites
* **Python**: 3.10 to 3.12 recommended.
* **FFmpeg**: Must be installed and available on PATH.
* **macOS**: `brew install ffmpeg libsndfile git`
* **Linux (Ubuntu/Debian)**: `sudo apt-get install ffmpeg libsndfile1-dev`
* **Windows**: [Download FFmpeg](https://ffmpeg.org/download.html) and configure Path, or place `ffmpeg.exe` and `ffprobe.exe` directly in the project directory.
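To confirm FFmpeg is actually reachable before launching, a quick check can be run from Python. This is a minimal sketch; `check_tool` is a hypothetical helper for illustration, not part of pyVideoTrans:

```python
import shutil

def check_tool(name: str) -> bool:
    """Return True if the named executable is found on PATH."""
    return shutil.which(name) is not None

for tool in ("ffmpeg", "ffprobe"):
    status = "found" if check_tool(tool) else "MISSING - install it or add it to PATH"
    print(f"{tool}: {status}")
```

On Windows, placing `ffmpeg.exe` in the project directory also works because the software looks there first, but a PATH-based install keeps the check above meaningful from any working directory.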
### 2. Install uv (If not installed)
```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```
### 3. Clone and Install
```bash
# 1. Clone the repository (Ensure path has no spaces/Chinese characters)
git clone https://github.com/jianchang512/pyvideotrans.git
cd pyvideotrans
# 2. Install dependencies (uv automatically syncs environment)
uv sync
# Optional: for local qwen-tts / qwen-asr channels, run: uv sync --extra qwen-tts --extra qwen-asr
```
### 4. Launch Software
**Launch GUI**:
```bash
uv run sp.py
```
**Use CLI**:
> [View documentation for detailed parameters](https://pyvideotrans.com/cli)
```bash
# Video Translation Example
uv run cli.py --task vtv --name "./video.mp4" --source_language_code zh --target_language_code en
# Audio to Subtitle Example
uv run cli.py --task stt --name "./audio.wav" --model_name large-v3
```
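For batch processing, the CLI can be driven from a small script. The sketch below only assembles argument lists mirroring the examples above and runs them with `subprocess`; `build_cmd` and `run_batch` are hypothetical helper names, and flags beyond those shown here are documented at the CLI reference linked above:

```python
import subprocess
from pathlib import Path

def build_cmd(video: Path, source: str, target: str) -> list[str]:
    """Assemble a vtv (video translation) command line for cli.py."""
    return [
        "uv", "run", "cli.py",
        "--task", "vtv",
        "--name", str(video),
        "--source_language_code", source,
        "--target_language_code", target,
    ]

def run_batch(folder: str, source: str = "zh", target: str = "en") -> None:
    """Translate every .mp4 in a folder, one video at a time."""
    for video in sorted(Path(folder).glob("*.mp4")):
        subprocess.run(build_cmd(video, source, target), check=True)
```

Running the jobs sequentially (rather than in parallel) avoids contending for the GPU and disk with several ASR/TTS pipelines at once.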
### 5. (Optional) GPU Acceleration Configuration
If you have an NVIDIA graphics card, execute the following commands to install the CUDA-supported PyTorch version:
```bash
# Uninstall CPU version
uv remove torch torchaudio
# Install CUDA version (Example for CUDA 12.x)
uv add torch==2.7 torchaudio==2.7 --index-url https://download.pytorch.org/whl/cu128
uv add nvidia-cublas-cu12 nvidia-cudnn-cu12
```
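After switching to the CUDA build, it is worth verifying that PyTorch actually sees the GPU. A small diagnostic, written to degrade gracefully when `torch` is not importable:

```python
def cuda_status() -> str:
    """Report whether the installed torch build can use CUDA."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        # Name of the first visible GPU, e.g. an RTX-series card
        return f"CUDA OK: {torch.cuda.get_device_name(0)}"
    return "torch installed, but CUDA unavailable (CPU-only build or driver issue)"

print(cuda_status())
```

If this reports a CPU-only build after the steps above, the usual cause is that the CUDA wheel index was not used during installation or the NVIDIA driver is older than the CUDA runtime the wheel targets.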
---
## Supported Channels & Models (Partial)
| Category | Channel/Model | Description |
| :--- | :--- | :--- |
| **ASR (Speech Recognition)** | **Faster-Whisper** (Local) | Recommended; fast and accurate |
| | WhisperX / Parakeet | Supports timestamp alignment & speaker diarization |
| | Alibaba Qwen3-ASR / ByteDance Volcano | Online API, excellent for Chinese |
| **Translation (LLM/MT)** | **DeepSeek** / ChatGPT | Supports context understanding, more natural translation |
| | MiniMax AI | MiniMax M2.7 LLM, latest flagship model, OpenAI-compatible |
| | Google / Microsoft | Traditional machine translation; very fast |
| | Ollama / M2M100 | Fully local offline translation |
| **TTS (Speech Synthesis)** | **Edge-TTS** | Microsoft free interface, natural effect |
| | **F5-TTS / CosyVoice** | Supports **Voice Cloning**, requires local deployment |
| | GPT-SoVITS / ChatTTS | High-quality open-source TTS |
| | 302.AI / OpenAI / Azure | High-quality commercial API |
---
## Documentation & Support
* **Official Documentation**: [https://pyvideotrans.com](https://pyvideotrans.com) (Includes detailed tutorials, API configuration guides, FAQ)
* **Online Q&A Community**: [https://bbs.pyvideotrans.com](https://bbs.pyvideotrans.com) (Submit error logs for automated AI analysis and answers)
## Disclaimer
This software is an open-source, free, non-commercial project. Users are solely responsible for any legal consequences arising from the use of this software (including but not limited to calling third-party APIs or processing copyrighted video content). Please comply with local laws and regulations and the terms of use of relevant service providers.
## Acknowledgements
This project builds on the following open-source projects (partial list):
* [FFmpeg](https://github.com/FFmpeg/FFmpeg)
* [PySide6](https://pypi.org/project/PySide6/)
* [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
* [openai-whisper](https://github.com/openai/whisper)
* [edge-tts](https://github.com/rany2/edge-tts)
* [F5-TTS](https://github.com/SWivid/F5-TTS)
* [CosyVoice](https://github.com/FunAudioLLM/CosyVoice)
---
*Created by [jianchang512](https://github.com/jianchang512)*