Upload your audio and two speaker images to generate a two-person talking podcast video with synchronized lip sync.
Lipsync Image (Two Speakers): Recommended — ideal for podcasts and dialogues; supports two speakers. Supports realistic humans, animals, cartoons, or stylized characters. Maximum duration: 500s
*1. Upload, Generate, or Edit Photo
Choose whether to upload two finished role tracks, split one existing podcast, or generate podcast audio first.
Upload audio for at least one role. If no audio is uploaded for the left role, it remains silent.
Upload audio for at least one role. If no audio is uploaded for the right role, it remains silent.
The order of the two audio sources in the output video: 'meanwhile' plays both audio sources at the same time; 'left_right' plays the left audio first, then the right audio; 'right_left' plays the right audio first, then the left audio.
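The three order values can be read as a simple timeline rule. The sketch below is illustrative only: the function name `schedule`, the returned keys, and the way total length is computed are assumptions, not the tool's documented behavior. It takes the length of each role's audio in seconds and returns when each role would start speaking.

```python
def schedule(order: str, left_s: float, right_s: float) -> dict:
    """Return assumed start offsets (seconds) for each role and the total length.

    A role with no uploaded audio can be passed as 0.0 seconds (it stays silent).
    """
    if order == "meanwhile":
        # Both tracks start together; the longer one sets the length.
        left_start, right_start = 0.0, 0.0
        total = max(left_s, right_s)
    elif order == "left_right":
        # Left plays first, right starts when left ends.
        left_start, right_start = 0.0, left_s
        total = left_s + right_s
    elif order == "right_left":
        # Right plays first, left starts when right ends.
        left_start, right_start = right_s, 0.0
        total = left_s + right_s
    else:
        raise ValueError(f"unknown order: {order!r}")
    return {"left_start": left_start, "right_start": right_start, "total": total}
```

Under these assumptions, the sequential modes add the two lengths together, so two long tracks reach the stated 500s maximum sooner than they would with 'meanwhile'.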
Log in to get daily credits and start generating videos. Your tasks will continue in the background if you close the page. Please do not submit the same task repeatedly. You can find your previous generations on the My Creations page.
Pick the workflow that matches your source media and goal, then use the model, upload, and masking tips to get cleaner lip sync results.
Create a podcast-style video where two people speak naturally. Upload a two-person image and provide one audio track for each speaker, or split a full podcast recording into separate speaker tracks first.
Standard
* Annual credits are issued in full upon purchase and refreshed annually.
Pro
Basic
Pay as you go. Credits never expire.
Upload audio and speaker images, adjust settings like duration and quality, and generate professional multi-speaker lip sync podcast videos instantly.
Our AI Podcast Video Generator uses advanced technology to align phonemes and facial motion, creating compelling multi-speaker lip sync for professional podcast visuals.
Add your podcast audio (MP3/WAV) and speaker images (PNG/JPG) to the AI Podcast Video Generator.
Configure duration, quality, and multi-speaker lip sync preferences.
Create professional podcast videos with synchronized multi-speaker lip sync.
Review your AI-generated podcast video and download or share instantly.
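Before uploading, the file requirements from the steps above can be checked locally. This is a minimal sketch under stated assumptions: the helper `check_inputs` is hypothetical, it looks only at file extensions (MP3/WAV for audio, PNG/JPG for images, as listed above), and it does not inspect file contents or call the service.

```python
from pathlib import Path
from typing import List, Optional

AUDIO_EXTS = {".mp3", ".wav"}  # audio formats named on the page
IMAGE_EXTS = {".png", ".jpg"}  # image formats named on the page


def check_inputs(
    left_audio: Optional[str],
    right_audio: Optional[str],
    images: List[str],
) -> List[str]:
    """Return a list of problems; an empty list means the inputs look acceptable."""
    problems = []
    # A role with no audio stays silent, but at least one role needs audio.
    audio = [p for p in (left_audio, right_audio) if p]
    if not audio:
        problems.append("upload audio for at least one role")
    for p in audio:
        if Path(p).suffix.lower() not in AUDIO_EXTS:
            problems.append(f"unsupported audio format: {p}")
    if not images:
        problems.append("upload a speaker image")
    for p in images:
        if Path(p).suffix.lower() not in IMAGE_EXTS:
            problems.append(f"unsupported image format: {p}")
    return problems
```

For example, `check_inputs("host.mp3", None, ["hosts.png"])` returns an empty list, since a missing right-role track only means that role stays silent.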
Upload audio and images to create professional multi-speaker lip sync podcast videos. Free tier available with limits.