
For media publishers, producing written content from audio and video sources is a core part of daily operations. Podcasts, interviews, and live recordings all need to be transcribed before they can be edited, published, or repurposed.
However, this preprocessing phase is one of the most time-consuming steps in the content production pipeline. Transcribing audio manually requires significant effort, and the process must balance accuracy against cost.
Beyond the immediate operational burden, media teams also face a broader challenge in adopting new technologies into established workflows. Introducing automation tools into day-to-day operations requires not only technical capability but also alignment with how content teams actually work.
- Time-consuming preprocessing
Transcribing audio and video content manually is a slow and labor-intensive process. For organizations producing large volumes of content, this creates a bottleneck that delays publishing and limits how much content can be repurposed.
- Balancing accuracy and cost
Higher transcription accuracy typically requires more resources, whether through manual review or more sophisticated tools. Finding the right balance between quality and efficiency is a persistent challenge.
- Difficulty integrating new tools into existing workflows
Even when promising technologies exist, embedding them into the daily routines of content teams requires careful design and iteration based on real user feedback.
The Approach: Semi-Automated Transcription Powered by Typhoon
To address these challenges, SCB 10X and Typhoon partnered with a leading media publisher to develop Rhapsode, a semi-automated speech-to-transcript tool designed specifically for media content workflows.
First iteration: From lab to user prototype
The initial version used Typhoon ASR to transcribe audio files such as podcasts and interviews. In controlled testing, the system achieved strong results, reaching 96 percent accuracy for solo podcast recordings and 83 percent accuracy in noisy environments.
However, when the tool was tested with real users, the experience did not match the lab results. Users reported approximately 60 percent accuracy on average, with issues including misspelled proper nouns and personal names, incorrect spacing, long processing times, and inconsistent error highlighting that made manual review difficult.
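The write-up does not say how these accuracy figures were computed. A common convention for speech recognition is accuracy = 1 − error rate, where the error rate is the word-level (WER) or character-level (CER) edit distance between the system output and a reference transcript, divided by the reference length. The minimal sketch below assumes that convention; the function names are illustrative, and the lab and user figures above may have been measured differently. Character-level scoring is often preferred for languages written without spaces between words, where whitespace tokenization is unreliable, and the usage lines show how the same spacing error is penalized very differently at the two levels.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the ref prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)]


def accuracy(reference: str, hypothesis: str, unit: str = "word") -> float:
    """Assumed metric: 1 - (edit distance / reference length), clamped at 0.

    unit="word" splits on whitespace (WER-style); unit="char" compares characters (CER-style).
    """
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    elif unit == "char":
        ref, hyp = list(reference), list(hypothesis)
    else:
        raise ValueError("unit must be 'word' or 'char'")
    if not ref:
        raise ValueError("reference transcript is empty")
    error_rate = edit_distance(ref, hyp) / len(ref)
    # Error rates can exceed 1.0 when the output has many insertions, so clamp accuracy at 0.
    return max(0.0, 1.0 - error_rate)


if __name__ == "__main__":
    print(accuracy("the quick brown fox", "the quick brown box"))  # -> 0.75
    # A single missing space costs two word-level edits but only one character-level edit.
    print(accuracy("the quick brown fox", "the quick brownfox"))  # -> 0.5
    print(round(accuracy("the quick brown fox", "the quick brownfox", unit="char"), 3))  # -> 0.947
```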
Second iteration: A multi-agent approach
Based on this user feedback, the team redesigned the system into a multi-agent pipeline to address each pain point systematically.
The revised process begins with voice activity detection (VAD), which identifies where speech occurs and breaks the audio into smaller chunks to speed up processing and optimize resource usage. The audio is then transcribed using Typhoon ASR. From there, a first LLM agent reviews and corrects the transcript, fixing errors in spelling, names, and formatting. A second LLM agent consolidates the output and automatically highlights low-confidence words for human review. The final result is exported as a ready-to-use transcript file.
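The stages above can be sketched as a chain of pluggable functions. In this minimal Python sketch, `energy_vad` is a simple RMS-energy detector standing in for a trained VAD model, and `asr`, `correct`, and `consolidate` are hypothetical callables standing in for Typhoon ASR and the two LLM agents. None of these names come from Rhapsode, and the thresholds are illustrative defaults rather than tuned values. Where the low-confidence signal comes from is not described in the source, so the sketch leaves highlighting to the consolidation callable.

```python
import math
from typing import Callable, List, Optional, Tuple

# Hypothetical stage signatures (assumptions, not Rhapsode's actual interfaces):
ASR = Callable[[List[float]], str]               # stand-in for a Typhoon ASR call on one chunk
CorrectionAgent = Callable[[str], str]           # first LLM agent: fixes spelling, names, formatting
ConsolidationAgent = Callable[[List[str]], str]  # second LLM agent: merges chunks, marks low-confidence words


def energy_vad(samples: List[float], sample_rate: int, frame_ms: int = 30,
               threshold: float = 0.02, max_gap_frames: int = 5) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) speech chunks based on per-frame RMS energy.

    A chunk closes once more than max_gap_frames consecutive frames fall below
    threshold. Any trailing partial frame is ignored.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("sample_rate * frame_ms is too small to form a frame")
    chunks: List[Tuple[int, int]] = []
    start: Optional[int] = None        # first frame of the currently open chunk
    last_voiced: Optional[int] = None  # most recent frame above the energy threshold
    for f in range(len(samples) // frame_len):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= threshold:
            if start is None:
                start = f
            last_voiced = f
        elif start is not None and f - last_voiced > max_gap_frames:
            chunks.append((start * frame_len, (last_voiced + 1) * frame_len))
            start = None
    if start is not None:
        chunks.append((start * frame_len, (last_voiced + 1) * frame_len))
    return chunks


def transcribe(samples: List[float], sample_rate: int, asr: ASR,
               correct: CorrectionAgent, consolidate: ConsolidationAgent) -> str:
    """VAD -> ASR per chunk -> correction agent per chunk -> one consolidation pass."""
    corrected = [correct(asr(samples[s:e])) for s, e in energy_vad(samples, sample_rate)]
    return consolidate(corrected)


if __name__ == "__main__":
    sr = 1000  # toy sample rate: 30 ms frames are 30 samples
    tone = [0.5 * math.sin(2 * math.pi * 50 * t / sr) for t in range(300)]
    audio = tone + [0.0] * 300 + tone  # speech, silence, speech
    print(energy_vad(audio, sr))  # -> [(0, 300), (600, 900)]
    # Toy stand-ins for the ASR and the two agents, just to show the data flow.
    print(transcribe(audio, sr, asr=lambda c: f"chunk of {len(c)} samples",
                     correct=str.upper, consolidate=" | ".join))
```

Keeping each stage behind a plain function signature is one way to let the VAD, ASR engine, or agent prompts be swapped independently between iterations. A production version would likely also cap chunk length, since continuous speech with no pauses would otherwise yield a single very long chunk.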
This semi-automated approach reduces the manual effort required from content teams while improving overall output quality through layered AI review.
From Experimentation to Organizational Impact
The improvements introduced in the second iteration delivered measurable gains. Transcription accuracy rose to over 88 percent, and the system is estimated to deliver roughly a 60 percent productivity improvement over fully manual transcription workflows. Because the project is still at the proof-of-concept stage, these productivity gains are projections based on initial testing rather than results from a full-scale production deployment.
Beyond immediate efficiency gains, the tool opens up broader opportunities for media organizations. Semi-automated transcription makes audio and video content indexable and searchable, improving discoverability and SEO. It also allows publishers to convert large volumes of non-text content, such as podcasts and video archives, into reusable written knowledge that can be repurposed across formats.
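To illustrate what "indexable and searchable" can mean in practice, here is a hypothetical minimal inverted index that maps words in timestamped transcript segments back to positions in the original recording. The `Segment` type, the punctuation stripping, and the all-terms-must-match search are assumptions for illustration only; a production system would more likely rely on a dedicated search engine, and whitespace tokenization would need to be replaced by a proper word segmenter for languages written without spaces.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set

PUNCT = ".,!?;:\"'()"


@dataclass(frozen=True)
class Segment:
    start_s: float  # where the segment begins in the source recording, in seconds
    text: str       # reviewed transcript text for that segment


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, strip surrounding punctuation, drop empty tokens."""
    tokens = (t.strip(PUNCT) for t in text.lower().split())
    return [t for t in tokens if t]


def build_index(segments: List[Segment]) -> Dict[str, Set[int]]:
    """Inverted index: token -> indices of the segments that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for i, seg in enumerate(segments):
        for token in tokenize(seg.text):
            index[token].add(i)
    return dict(index)


def search(index: Dict[str, Set[int]], segments: List[Segment], query: str) -> List[float]:
    """Start times of segments containing every query token.

    Results follow segment order, which assumes segments are stored chronologically.
    """
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return [segments[i].start_s for i in sorted(hits)]


if __name__ == "__main__":
    segs = [
        Segment(0.0, "Welcome to the show."),
        Segment(12.5, "Today we discuss speech recognition."),
        Segment(40.0, "Speech recognition for podcasts."),
    ]
    idx = build_index(segs)
    print(search(idx, segs, "speech recognition"))  # -> [12.5, 40.0]
    print(search(idx, segs, "podcasts"))            # -> [40.0]
```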
Through close collaboration between SCB 10X, Typhoon, and their media publishing partner, the Rhapsode project demonstrates how iterating on real user feedback can transform a promising technology into a practical tool that fits naturally into existing content production workflows.

