Run a Real-Time Speech-to-Speech AI Model Locally

Original source: KDnuggets · Summarized and analyzed by Genesis Park

Summary

This guide explains the steps required to execute a speech-to-speech artificial intelligence model directly on local hardware. It emphasizes that this approach facilitates real-time audio processing and generation without relying on cloud-based services, offering improved privacy and reduced latency for users.

Article

# Run a Real-Time Speech-to-Speech AI Model Locally

In this guide, you will learn how to install and run PersonaPlex locally, step by step, so you can experience real-time, interruptible speech-to-speech AI directly on your own machine.

*Image by Author*

## Introduction

Before we start anything, watch the demo video. Isn't it amazing? You can now run a fully local model that you can talk to on your own machine, and it works out of the box. It feels like talking to a real person, because the system can listen and speak at the same time, just like a natural conversation. This is not the usual "you speak, then it waits, then it replies" pattern.

PersonaPlex is a real-time speech-to-speech conversational AI that handles interruptions, overlaps, and natural conversation cues like "uh-huh" or "right" while you are talking. It is designed to be full duplex, so it can listen and generate speech simultaneously without forcing the user to pause first. This makes conversations feel much more fluid and human-like than traditional voice assistants.

In this tutorial, we will set up the Linux environment, install PersonaPlex locally, and then start the PersonaPlex web server so you can interact with the AI in your browser in real time.

## Using PersonaPlex Locally: A Step-by-Step Guide

In this section, we will walk through installing PersonaPlex on Linux, launching the real-time WebUI, and talking to a full-duplex speech-to-speech AI model running locally on our own machine.

### Step 1: Accepting the Model Terms and Generating a Token

Before you can download and run PersonaPlex, you must accept the usage terms for the model on Hugging Face. The speech-to-speech model PersonaPlex-7B-v1 from NVIDIA is gated, which means you cannot access the weights until you agree to the license conditions on the model page.

Go to the PersonaPlex model page on Hugging Face and log in.
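As a small safety net for this step, the token export described below can be made fail-fast so a typo does not resurface later as a confusing authorization error mid-download. This is my addition, not part of the original guide, and the token value is a hypothetical placeholder:

```bash
# Hypothetical placeholder -- paste the real Read token from
# Settings -> Access Tokens on Hugging Face instead
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxx"

# Fail fast if the export did not take effect, rather than hitting a
# 401/403 later during the 16+ GB model download
if [ -z "$HF_TOKEN" ]; then
  echo "HF_TOKEN is not set" >&2
  exit 1
fi
echo "HF_TOKEN is set"
```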
You will see a notice saying that you need to agree to share your contact information and accept the license terms to access the files. Review the NVIDIA Open Model License and accept the conditions to unlock the repository.

Once access is granted, create a Hugging Face access token:

- Go to Settings → Access Tokens
- Create a new token with Read permission
- Copy the generated token

Then export it in your terminal:

```bash
export HF_TOKEN="YOUR_HF_TOKEN"
```

This token allows your local machine to authenticate and download the PersonaPlex model.

### Step 2: Installing the Linux Dependency

Before installing PersonaPlex, you need to install the Opus audio codec development library. PersonaPlex relies on Opus for real-time audio encoding and decoding, so this dependency must be available on your system. On Ubuntu or Debian-based systems, run:

```bash
sudo apt update
sudo apt install -y libopus-dev
```

### Step 3: Building PersonaPlex from Source

Next, clone the PersonaPlex repository and install the required Moshi package from source. Clone the official NVIDIA repository:

```bash
git clone https://github.com/NVIDIA/personaplex.git
cd personaplex
```

Once inside the project directory, install Moshi:

```bash
pip install moshi/.
```

This compiles and installs the PersonaPlex components along with all required dependencies, including PyTorch, CUDA libraries, NCCL, and audio tooling. You should see packages such as torch, nvidia-cublas-cu12, nvidia-cudnn-cu12, sentencepiece, and moshi-personaplex being installed successfully.

Tip: Do this inside a virtual environment if you are on your own machine.

### Step 4: Starting the WebUI Server

Before launching the server, install the faster Hugging Face downloader:

```bash
pip install hf_transfer
```

Now start the PersonaPlex real-time server:

```bash
python -m moshi.server --host 0.0.0.0 --port 8998
```

The first run downloads the full PersonaPlex model, which is approximately 16.7 GB, so it may take some time depending on your internet speed.
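One caveat worth noting: to my knowledge, huggingface_hub treats hf_transfer as opt-in, so installing the package alone is not enough; downloads only go through the accelerated path when an environment variable enables it. A minimal sketch:

```bash
# huggingface_hub only routes downloads through hf_transfer when this
# opt-in flag is set, so export it in the same shell as the server
export HF_HUB_ENABLE_HF_TRANSFER=1
echo "hf_transfer enabled: ${HF_HUB_ENABLE_HF_TRANSFER}"
```

Launch `python -m moshi.server --host 0.0.0.0 --port 8998` from that same shell so the server process inherits the flag.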
After the download completes, the model loads into memory and the server starts.

### Step 5: Talking to PersonaPlex in the Browser

Now that the server is running, it is time to actually talk to PersonaPlex. If you are running this on your local machine, open this link in your browser: http://localhost:8998. This loads the WebUI interface. Once the page opens:

- Select a voice
- Click Connect
- Allow microphone permissions
- Start speaking

The interface includes conversation templates. For this demo, we selected the Astronaut (fun) template to make the interaction more playful. You can also create your own template by editing the initial system prompt text, which lets you fully customize the personality and behavior of the AI.

For voice selection, we switched from the default and chose Natural F3 just to try something different. And honestly, it feels surprisingly natural. You can interrupt it while it is speaking, ask follow-up questions, and change topics mid-sentence. It handles conversational flow smoothly and responds intelligently in real time.
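Before opening the browser, a quick reachability probe can confirm that something is actually listening on the WebUI port. This check is my addition rather than part of the original guide, and it assumes `curl` is installed:

```bash
# Hypothetical sanity check: succeeds when something is listening on the
# WebUI port, prints a hint otherwise (assumes curl is available)
if curl -s -o /dev/null --max-time 2 http://localhost:8998/; then
  echo "WebUI reachable at http://localhost:8998"
else
  echo "not reachable yet -- is moshi.server still starting up?"
fi
```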

This analysis was written by the Genesis Park editorial team using AI. The original article is available via the source link.
