An end-to-end, multi-service pipeline that automates YouTube content production: AI voiceover, 9-language dubbing, subtitle synchronization, Shorts and avatar generation, and automated video assembly and publishing.
Private project — source code is not publicly available.

The YouTube Content Automation Suite is a Dockerized collection of specialized services that takes raw input and produces finished, multilingual video content. It handles AI text-to-speech narration, dubbing into nine languages, synchronized subtitles, short-form and avatar clips, branded overlays, and automated assembly and upload. Each capability is its own containerized service, so the pipeline can scale and evolve one stage at a time.
AI text-to-speech narration with quality checks and tempo normalization
Automated dubbing into 9 languages
Subtitle generation and synchronization
Short-form (Shorts) and avatar-based clip generation
Automated video assembly with branded overlays and watermarks
Image generation service for thumbnails and visuals
Containerized microservices, each independently deployable
Orchestrated, automated path from raw input to published video
The YouTube Content Automation Suite is a production content factory for video. It is built as a set of independent, Dockerized microservices (text-to-speech, dubbing, subtitles, Shorts, avatar Shorts, image generation, overlays, and video assembly) that together turn source material into finished, publish-ready content with minimal human involvement.
The audio path focuses on natural-sounding narration: a TTS service is paired with an audio optimizer and a chunk tempo normalizer, plus quality checkers that validate generated audio before it moves downstream. A dubbing service then localizes that narration into nine languages.
The video path adds synchronized subtitles, branded overlays and watermarks, and both long-form and short-form assembly. Orchestration ties the stages together and connects generation to publishing, so the suite can run as an automated pipeline rather than a set of manual tools. It is the engine behind a real content operation, designed for throughput, repeatability, and multilingual reach.
The suite is split into separate Dockerized services (TTS, dubbing, subtitle, Shorts, avatar Shorts, image, overlay, and video), each with its own Dockerfile and all composed together. Dedicated quality checkers validate audio chunks so problems are caught early in the pipeline.
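To make the idea of a chunk quality checker concrete, here is a minimal Python sketch. The source code is private, so everything here is an assumption for illustration: the `ChunkReport` and `check_chunk` names, the 16-bit mono PCM input, and the threshold values are all hypothetical, and the real checkers may look at entirely different signals.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ChunkReport:
    """Result of validating one audio chunk (hypothetical structure)."""
    duration_s: float
    rms: float
    peak: int
    problems: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # A chunk passes only when no problem was recorded.
        return not self.problems


def check_chunk(samples, sample_rate, min_duration_s=0.3,
                min_rms=200.0, clip_level=32767):
    """Validate a chunk given as a sequence of 16-bit mono PCM sample values.

    The thresholds are illustrative defaults, not values from the project.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")

    problems = []
    duration_s = len(samples) / sample_rate
    if duration_s < min_duration_s:
        problems.append("too_short")

    # Root-mean-square level: a very low value suggests silence or a failed render.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
    if rms < min_rms:
        problems.append("near_silent")

    # Samples at full scale suggest clipping.
    peak = max((abs(s) for s in samples), default=0)
    if peak >= clip_level:
        problems.append("clipping")

    return ChunkReport(duration_s, rms, peak, problems)
```

A checker of this shape could gate each chunk before it moves downstream: chunks whose report is not `ok` would be regenerated or flagged instead of reaching the dubbing and assembly stages.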
A text-to-speech service is combined with an audio optimizer and a chunk tempo normalizer to keep narration natural and consistent, and a dubbing service localizes that narration into nine languages.
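One way a chunk tempo normalizer could work is sketched below in Python. This is an assumption, not the project's actual method: it estimates each chunk's speaking rate from its word count and duration, derives a bounded tempo factor toward a target rate, and renders that factor as a chain of FFmpeg `atempo` filters. The function names, the words-per-second heuristic, and the 15% adjustment cap are hypothetical. Each `atempo` stage is conservatively kept within 0.5 to 2.0, which is why larger factors are split across several stages.

```python
def tempo_factor(word_count, duration_s, target_wps, max_adjust=0.15):
    """Return a speed multiplier that nudges a chunk toward the target rate.

    target_wps is the desired words per second. The adjustment is clamped to
    +/- max_adjust so that narration is never stretched into sounding unnatural.
    """
    if word_count <= 0 or duration_s <= 0 or target_wps <= 0:
        raise ValueError("word_count, duration_s and target_wps must be positive")
    current_wps = word_count / duration_s
    raw = target_wps / current_wps  # > 1 speeds the chunk up, < 1 slows it down
    return max(1.0 - max_adjust, min(1.0 + max_adjust, raw))


def atempo_chain(factor):
    """Build an FFmpeg audio filter string for a tempo change of `factor`.

    Factors outside [0.5, 2.0] are split into several chained atempo stages.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    stages = []
    remaining = factor
    while remaining > 2.0:
        stages.append(2.0)
        remaining /= 2.0
    while remaining < 0.5:
        stages.append(0.5)
        remaining /= 0.5
    stages.append(remaining)
    return ",".join(f"atempo={stage:.4f}" for stage in stages)
```

The resulting string is meant to be passed as an audio filter, for example `ffmpeg -i chunk.wav -filter:a "<chain>" out.wav`, where the file names are placeholders.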
A subtitle system synchronizes captions, an overlay and watermark service applies branding, and the video assembly service produces both long-form and short-form output ready for publishing.
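As an illustration of caption synchronization, the Python sketch below derives SRT cues from narration chunks and their measured durations, so that each caption starts exactly where the previous chunk's audio ends. It assumes the subtitle text maps one-to-one onto audio chunks, which the source does not state; `srt_timestamp` and `build_srt` are hypothetical names.

```python
def srt_timestamp(total_ms):
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(int(total_ms), 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(chunks):
    """Build SRT text from (caption_text, duration_seconds) pairs.

    Time is accumulated in integer milliseconds so rounding error does not
    drift across a long video.
    """
    cues = []
    cursor_ms = 0
    for index, (text, duration_s) in enumerate(chunks, start=1):
        start_ms = cursor_ms
        end_ms = cursor_ms + round(duration_s * 1000)
        cues.append(
            f"{index}\n{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n{text}\n"
        )
        cursor_ms = end_ms
    # A blank line separates consecutive cues in the SRT format.
    return "\n".join(cues)
```

Deriving cue times from the final chunk durations, after any tempo adjustment, is one way to keep captions aligned with the audio that actually ships.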
Workflow orchestration ties generation to publishing across the services. The stack is Python and Docker throughout, with FFmpeg-based media processing, text-to-speech engines, and Gemini for AI-assisted steps.
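The orchestration layer itself is not described in detail, so the following Python sketch only illustrates the general pattern of chaining stages from generation to publishing. It assumes each stage can be modeled as a function that takes a job dictionary and returns an updated one; `run_pipeline` and the job keys are hypothetical, and a real deployment would call the containerized services rather than local functions.

```python
from typing import Callable, Dict, List, Tuple

Job = Dict[str, object]
Stage = Callable[[Job], Job]


def run_pipeline(stages: List[Tuple[str, Stage]], job: Job) -> Job:
    """Run named stages in order, stopping at the first failure.

    The returned job records which stages completed and, if a stage raised,
    which one failed and why, so a run can be inspected or resumed.
    """
    current = dict(job)  # copy so the caller's dictionary is not mutated
    completed: List[str] = []
    for name, stage in stages:
        try:
            current = stage(current)
        except Exception as exc:  # any stage error halts the run
            return {
                **current,
                "completed": completed,
                "failed_stage": name,
                "error": str(exc),
            }
        completed.append(name)
    return {**current, "completed": completed, "failed_stage": None}
```

Stopping at the first failed stage keeps a bad intermediate artifact, such as an invalid audio chunk, from propagating into assembly and upload.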