Engineering

Why we built Perfect-Sync narration

When we could not find an existing solution for synchronizing visuals with the narrator's voice, we built our own.

Simon Hedberg · February 20, 2026 · 5 min read

The problem

When you produce an explainer video, the visual elements must stay in sync with the narrator's voice. A graph that appears too early, or text that shows up after the speaker has moved on, ruins the experience. In traditional production, an animator handles this by hand, frame by frame.

Why manual sync does not scale

Manual synchronization works fine for one video. But what happens when you have 50 videos in 14 languages? Suddenly you have 700 unique timelines to synchronize. Each language has its own speech tempo, word lengths, and natural pauses, so you cannot simply copy the timing from one version to another.

How Perfect-Sync works

Perfect-Sync analyzes the narration audio and identifies keywords, pauses, and points of emphasis. It then adjusts the timeline so that visual elements (animations, text blocks, transitions) appear at the right moment.
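Perfect-Sync's own code is not public. As a minimal sketch of what the analysis step could look like, assume a speech-to-text engine has already returned word-level timestamps. Pauses then fall out as gaps between consecutive words, and keywords as the start times of matching words. The `Word` type, the `find_pauses` and `find_keywords` functions, and the `min_gap` threshold are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One word from a (hypothetical) speech-to-text transcript."""
    text: str
    start: float  # seconds from the start of the narration
    end: float    # seconds from the start of the narration


def find_pauses(words, min_gap=0.5):
    """Return (start, end) spans of silence at least min_gap seconds long.

    A pause is the gap between the end of one word and the start of the next.
    """
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        if nxt.start - prev.end >= min_gap:
            pauses.append((prev.end, nxt.start))
    return pauses


def find_keywords(words, keywords):
    """Map each keyword (lowercased) to the start times where it is spoken.

    Matching ignores case and trailing/leading punctuation such as "graph.".
    """
    wanted = {k.lower() for k in keywords}
    hits = {k: [] for k in wanted}
    for w in words:
        token = w.text.lower().strip(".,!?;:")
        if token in wanted:
            hits[token].append(w.start)
    return hits
```

For the transcript "Sales grew fast. Look at this graph." with a 0.9-second silence between the two sentences, `find_pauses` reports that single silence, and `find_keywords(words, ["graph"])` returns the moment "graph." starts. A real system would also need emphasis detection from the audio itself, which word timestamps alone cannot provide.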

It works with both AI-generated voices and recorded studio voices, and in every language, whether Swedish with its long compound words or Japanese with its short, rhythmic speech.

The technology behind it

We combine speech-to-text analysis, which identifies word boundaries, with a rule-based system that determines which visual elements belong to which sections of the script. The system is deliberately conservative: an animation that appears slightly early is better than one that appears too late.
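To make the "early rather than late" rule concrete, here is a minimal sketch of one possible scheduling rule, not Perfect-Sync's actual logic. It assumes each visual element is tied in the script to a single anchor word. Visuals are matched in script order against the transcript's word boundaries, and each one starts a fixed `lead` before its anchor word is spoken. `Visual`, `schedule`, `anchor`, and the 0.25-second default lead are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcript word with its boundaries in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class Visual:
    """A visual element tied to one anchor word in the script (hypothetical)."""
    name: str
    anchor: str


def normalize(token):
    """Lowercase and strip surrounding punctuation so "Graph." matches "graph"."""
    return token.lower().strip(".,!?;:")


def schedule(visuals, words, lead=0.25):
    """Return {visual name: start time in seconds}.

    Visuals are given in script order. Each anchor is matched to the next
    occurrence of that word after the previous match, so repeated words
    resolve in order. Each visual starts `lead` seconds before its anchor
    word (clamped at 0), which biases every cue slightly early.
    """
    times = {}
    i = 0
    for v in visuals:
        target = normalize(v.anchor)
        while i < len(words) and normalize(words[i].text) != target:
            i += 1
        if i == len(words):
            raise ValueError(f"anchor {v.anchor!r} for {v.name!r} not found")
        times[v.name] = max(0.0, words[i].start - lead)
        i += 1  # the next visual must anchor on a later word
    return times
```

Matching anchors strictly in order is the "rule-based" part of this sketch: it keeps a repeated word such as "graph" from pulling a later visual back onto an earlier mention. Subtracting a fixed lead is the simplest way to err early. A production system would likely tune the lead per element type.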

We have iterated on the algorithm for two years, and it now handles most edge cases, including long silent pauses, fast enumerations, and questions that end with rising intonation.
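Fast enumerations are a good example of why edge cases need their own rules. When list items are spoken only a fraction of a second apart, anchoring each animation to its word would fire them almost on top of each other. One way to handle this, shown here as a hypothetical sketch rather than the rule Perfect-Sync uses, is to enforce a minimum gap between consecutive cues by moving earlier cues earlier and never moving any cue later. `enforce_min_spacing` and the 0.4-second `min_gap` are assumptions.

```python
def enforce_min_spacing(starts, min_gap=0.4):
    """Spread out cues that arrive too close together, e.g. in a fast list.

    `starts` are cue start times in seconds, sorted ascending. Walking from
    the last cue backwards, any cue closer than min_gap to the next one is
    pulled earlier. No cue ever moves later, so nothing appears after the
    word it belongs to. Times are clamped at 0, so cues at the very start of
    the narration may stay closer than min_gap if there is no room before them.
    """
    adjusted = list(starts)
    for i in range(len(adjusted) - 2, -1, -1):
        latest_allowed = adjusted[i + 1] - min_gap
        if adjusted[i] > latest_allowed:
            adjusted[i] = max(0.0, latest_allowed)
    return adjusted
```

With four list items spoken at 5.0, 5.1, 5.2, and 5.3 seconds, the last cue stays put and the others are pulled back to roughly 4.9, 4.5, and 4.1 seconds. Anchoring on the last item keeps the whole enumeration consistent with the "early rather than late" principle.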

Results

Perfect-Sync has reduced our localization time by more than 80%. Where an animator used to spend two days on each language version, the work now takes under an hour. The quality is also more consistent, because manual timing errors are eliminated.

Is it perfect? No. That is why we call it Perfect-Sync and not Auto-Sync: it is not meant to run entirely on its own. It is a tool that does 90% of the job so that people can focus on the 10% that requires creative judgment.

Want to learn more?

Book a call and we will show you how SkillGround can help.