Automated Captioning in the Multi-Platform Delivery Age

By Ken Frommert and Eduardo Martinez

Automation has been infused into innumerable elements of our daily lives. From production and assembly lines to broadcast facilities around the world, the transition to automated processes and workflows now has deep roots and has forever changed the way we work, shop and entertain.

In broadcasting, the production and transmission of live, manual captioning has long been challenged by high costs, limited availability, varied latency and inconsistent accuracy rates. And while perfection is impossible given the speed of live captioning, the transition to more automated, software-defined captioning workflows has introduced a new series of challenges.

Closed-captioning is in large part driven by government mandates worldwide to ensure that deaf and hearing-impaired viewers can fully understand and enjoy on-air programming. Closed captions are typically encoded within the video stream and decoded by the TV, set-top box or other viewing/receiving device.

While different mandates on closed captioning in broadcast television exist around the world, the unifying purpose is to ensure that deaf and hearing-impaired viewers can fully understand and enjoy the shows they watch. Beyond the hearing impaired, statistics show that one in six viewers worldwide prefers to receive closed captions with their content. This means that as viewers continue to consume content in different ways, technology must evolve to serve changing viewer habits.

Deep Neural Benefits

A common concern across all applications of automation is the reduction, or outright elimination, of the human element. Closed captioning is just the latest field to which these conversations have shifted.

These concerns are beginning to subside as speed and accuracy of speech-to-text conversion continues to improve with the emergence of deep neural network advances. The statistical algorithms associated with these advances, coupled with larger multi-lingual databases to mine, more effectively interpret – and accurately spell out – the speech as it passes through the automated workflow.

Ken Frommert, President of ENCO

Today’s strongest automated captioning systems, like ENCO’s enCaption4, now approach accuracy rates of 90 percent or higher, whether the speech arrives through an air feed or a mix-minus microphone.

Meanwhile, faster and more powerful processing within automated captioning engines has significantly reduced latency to near real-time. This achievement is particularly impressive given that automated captions took between 30 and 60 seconds on many systems as recently as one or two generations ago.

Additionally, as closed captioning software matures, emerging capabilities that eliminate crosstalk, improve speaker identification and ignore interruptions are improving the overall quality and experience for hearing-impaired viewers. The technology is also advancing to support closed-captioning transmission across multiple delivery platforms.

New Efficiencies, New Services

One recent innovation is the introduction of multi-speaker identification, which isolates separate microphone feeds to reduce confusion from cross-talk.

Live talk shows represent an ideal use case. In this scenario, each speaker on the stage is incorporated into the captioning workflow based on their assigned microphone position, while the software ignores distractions such as low voices and interruptions. The result is a seamless transition as the conversation shifts between speakers, eliminating cross-talk and other events detrimental to the viewer experience.
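The channel-following behavior described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual algorithm; the speaker names, the speech threshold and the hold margin are all hypothetical values chosen for the example.

```python
SPEECH_THRESHOLD_DB = -30.0  # mic levels below this are treated as background or low voices

def active_speaker(mic_levels, current=None, hold_margin_db=6.0):
    """Pick which microphone feed the captioner should follow.

    mic_levels: dict mapping speaker name -> momentary level in dBFS.
    current:    the speaker currently being captioned, if any.
    The current speaker keeps the floor unless another mic is louder by
    hold_margin_db, which suppresses brief interruptions and cross-talk.
    """
    # Ignore mics that fall below the speech threshold entirely
    candidates = {s: lvl for s, lvl in mic_levels.items() if lvl >= SPEECH_THRESHOLD_DB}
    if not candidates:
        return current  # nobody speaking clearly; stay on the last speaker
    loudest = max(candidates, key=candidates.get)
    if current in candidates and candidates[loudest] - candidates[current] < hold_margin_db:
        return current  # interruption not loud enough to take the floor
    return loudest
```

Feeding momentary levels such as `{"host": -12.0, "guest": -28.0}` into `active_speaker` keeps the captions locked to the host until the guest's mic clearly dominates.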

Many of the above improvements are related to recent breakthroughs in machine learning technology for voice recognition. Machine learning not only strengthens accuracy, but it also provides value through detection of different languages and the different ways that people speak.

That intelligence, as it relates to different dialects, will provide an overall boost to accuracy in closed captioning. Consider a live news operation, where on-premise automated captioning software now integrates directly with newsroom computer systems without the need for a network connection. This helps broadcasters strengthen availability – no concerns about a network outage taking the system down – and take advantage of news scripts and rundowns to learn and validate the spelling of local names and terminology.
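The rundown-validation idea can be illustrated with a simple spelling-correction pass over recognized text. This is a hedged sketch only: the rundown terms are invented, the similarity cutoff is arbitrary, and real systems integrate with newsroom computer systems far more deeply.

```python
import difflib

# Hypothetical proper nouns pulled from the day's news scripts and rundowns
RUNDOWN_TERMS = ["Ypsilanti", "Gratiot Avenue", "Kalamazoo"]

def correct_term(asr_text, vocabulary=RUNDOWN_TERMS, cutoff=0.8):
    """Snap a recognized word or phrase to the closest rundown spelling.

    If nothing in the vocabulary is similar enough (ratio >= cutoff),
    the speech-to-text output is left untouched.
    """
    match = difflib.get_close_matches(asr_text, vocabulary, n=1, cutoff=cutoff)
    return match[0] if match else asr_text
```

For example, an engine that hears the local place name as "Ipsilanti" would have it snapped back to the rundown spelling "Ypsilanti", while ordinary words pass through unchanged.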

Automated captioning also allows these applications to scale efficiently. Costs are lowered by the transition from human stenographers to computer automation, and as the amount of content requiring captions grows, economies of scale drive the cost down even further as broadcasters automate these processes.

As systems grow more reliable and broadcasters grow more comfortable with the technology, they will also find new efficiencies and opportunities along the way. For one, broadcasters that need to cut into a regularly scheduled program with breaking news or weather alerts will no longer be forced to find qualified (and expensive) live captioners on short notice.

Streaming, the Cloud and Closed-Captioning

As with many technologies, captioning systems are applicable in both on-premise and cloud configurations. In the latter case, some systems are now offered as SaaS platforms, with monthly fees that include hardware costs and come out to as low as approximately $15 per hour at average rates of use. With stenographer rates sitting at approximately $150 per hour, that is a tenfold savings that can return tens to hundreds of thousands of dollars to the broadcaster annually.
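The savings claim is easy to check with back-of-the-envelope math using the two rates cited above; the annual hours figure below is an assumption for illustration, not a number from either company.

```python
SAAS_RATE = 15.0    # USD per captioned hour (approximate SaaS figure cited above)
STENO_RATE = 150.0  # USD per hour for a live stenographer (approximate figure cited above)

def annual_savings(captioned_hours_per_year):
    """Difference between stenographer and SaaS cost over a year of captioning."""
    return (STENO_RATE - SAAS_RATE) * captioned_hours_per_year

# A hypothetical station captioning about 6 hours of live programming a day:
hours = 6 * 365
print(annual_savings(hours))  # 135 USD/hour saved * 2190 hours = 295650.0
```

Even a station captioning only 1,000 hours a year lands at $135,000 saved, consistent with the "tens to hundreds of thousands of dollars" range.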

Establishing captioning software in the cloud also extends the service to online audiences outside the local facility, opening the door to efficient delivery of captioned content over streaming networks and delivery platforms. One emerging opportunity is the automatic generation of transcriptions for live and archived, pre-recorded content.

Eduardo Martinez, Director of Technology at StreamGuys

As more systems move to software-defined platforms, the captioning workflow for pre-recorded and/or long-form content has been greatly simplified. Post-production staff can essentially drag-and-drop video files into a file-based workflow that extracts the audio track for text conversion. These files can then be delivered in various lengths and formats for a TV broadcast, the web, mobile and other platforms.
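The final step of that file-based workflow, turning timed speech-to-text segments into a caption file, can be sketched as below. SRT is used purely for illustration; broadcast workflows also emit formats such as SCC or WebVTT, and the segment data here is hypothetical.

```python
def to_srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as numbered SRT cues."""
    cues = []
    for i, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{i}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)

print(segments_to_srt([(0.0, 2.5, "Good evening."),
                       (2.5, 5.0, "Here is tonight's news.")]))
```

The same segment list can be re-rendered into each target format, which is what lets one extraction pass feed a TV broadcast, the web, mobile and other platforms.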

This trend aligns especially well with broadcasters and content producers holding large volumes of stored media, providing tremendous flexibility to quickly archive, search, find and recast content tailored to specific audiences and on-demand requests.

Content repurposing software from companies like StreamGuys, previously used for podcasting and specialty broadcast streams, is being tailored for closed captioning in streaming applications. In this architecture, previously ingested content is recalled through an archive search process. This level of integration also enables users to label and search for specific speakers for improved recognition and tracking, and later search the system for all content related to a program – down to exact spoken sentences.
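Sentence-level search over archived transcripts can be illustrated with a small in-memory index. This is a conceptual stand-in for the archive integration described above, not StreamGuys' implementation; the program names, speakers and sentences are invented.

```python
class TranscriptIndex:
    """Toy index of captioned sentences, searchable by phrase, program and speaker."""

    def __init__(self):
        self.sentences = []  # tuples of (program, timestamp_seconds, speaker, text)

    def add(self, program, timestamp, speaker, text):
        self.sentences.append((program, timestamp, speaker, text))

    def search(self, phrase, program=None, speaker=None):
        """Return every (program, timestamp, speaker, text) hit containing the phrase."""
        phrase = phrase.lower()
        return [
            hit for hit in self.sentences
            if phrase in hit[3].lower()
            and (program is None or hit[0] == program)
            and (speaker is None or hit[2] == speaker)
        ]

idx = TranscriptIndex()
idx.add("Morning Show", 12.0, "Host", "Welcome back to the program.")
idx.add("Morning Show", 45.5, "Guest", "Streaming changed everything.")
print(idx.search("streaming"))  # hits land with program, timestamp and speaker attached
```

Because every hit carries a timestamp and speaker label, a match can jump straight to the exact moment in the archived recording.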

With multiplatform reach, broadcasters now have opportunities to caption live and on-demand streams, ensuring that hearing-impaired and multi-lingual audiences watching online are properly served as well. The future here is very exciting, especially with the knowledge that we’re just beginning to reap the fruits of this technology.

Ken Frommert is President of ENCO, and Eduardo Martinez is Director of Technology at StreamGuys. 

National Broadcaster of Nepal Selects FOR-A for Full HD Upgrade by January 2019

Given a mandate from the country’s Ministry of Information and Communications Technology, Nepal Television, the national broadcaster of Nepal, must fully upgrade to high-definition by January of next year – which also marks the broadcaster’s 34th anniversary.

By January 31, 2019, Nepal Television will be running three HD studios, having upgraded all of its current equipment from SD to HD. As of that date, all new equipment must be installed and on-air so that Nepalese viewers can enjoy HDTV broadcasts from the country’s national broadcaster.


FOR-A’s Southeast Asia office was selected in October to fulfill the network’s quick turnaround, future 4K upgrade path and budgetary requirements. Nepal Television will utilize two FOR-A HVS-2000 2 M/E video switchers for two of its studios, one HVS-2000 3 M/E video switcher for its biggest production studio, and dual-channel FA-9520 signal processors for frame synchronization, color correction, and up/down/cross/aspect conversion. Nepal Television has been using HVS-390 video switchers in two studios, including one regional studio in Kohalpur.


Nepal Television will also upgrade the master control room (MCR) and program control room (PCR) for its two national channels (NTV and NTV PLUS) to full HD. It will use two FOR-A MFR-3000 routing switchers – one in the MCR and one as a redundant router. In the PCR, FOR-A’s MFR-1616 routing switcher will serve as the upstream router, the HVS-100 as the on-air production switcher, and FA-9520s for all signal processing.


“FOR-A is known for the ruggedness and reliability of its product range and its modern design,” said Mr. Chinta Mani Baral, HOD, Engineering and Mr. Hari Prasad Bhandari, Chief of Studio Transmission and Maintenance of Nepal Television. “From video switchers, to routing switchers to signal processing, we were able to deal with one premier manufacturer in Japan, which was a big benefit to us. They’re able to meet our pricing and tight turnaround time and make sure we’re fully HD by January 31st 2019. We expect to enjoy a mutually beneficial relationship with FOR-A for a long time.”

More info

LMG Adds Riedel’s Bolero and Artist to Rental Portfolio for Comprehensive and Reliable Comms

With a national reputation of going beyond technology to provide innovative audiovisual solutions, LMG will leverage the seamless integration of Bolero and Artist to deliver comprehensive and crystal-clear communications to productions of all sizes.


LMG will deploy its initial purchase of 10 Bolero beltpacks in stand-alone mode for smaller live events and will integrate the wireless intercom with an Artist 64 mainframe for larger corporate projects and high-profile live productions, such as awards broadcasts.

“At LMG, we’re committed to offering our clients the absolute state of the art in communications solutions. We know we can count on quality products from Riedel, so it was an easy decision to bring in Bolero and an additional Artist frame,” said Shane Smith, Director of Audio Services, LMG. “Our testing at Nationwide Arena convinced us that Bolero is, hands-down, the best intercom solution on the market today. It worked flawlessly, and we had absolutely no issues with dropouts or poor audio quality.”

A key deciding factor for LMG’s choice of Bolero was its ability to support challenging RF environments through Riedel-exclusive ADR (Advanced DECT Receiver) technology, a diversity receiver technology specifically designed to reduce sensitivity to multipath RF reflections. With more efficient use of RF spectrum, each Bolero antenna is able to support twice the number of beltpacks as other DECT-based systems.

“Bolero is an invaluable tool in our arsenal, particularly when we’re faced with RF challenges in large and complex venues. Plus, customers love the sound, look, and feel of Bolero,” Smith added. “We also appreciate how fast and easy it is to get up and running, even with large deployments. With Bolero’s near-field communication (NFC) technology, you register a beltpack simply by touching it to the antenna. For one recent event, we were completely ready to go with 40 Bolero beltpacks in a single afternoon — a huge time savings over previous systems we’ve used.”


Joyce Bente, President and CEO, Riedel North America, commented, “LMG is a longtime Riedel customer, and its choice of Artist and Bolero is a powerful endorsement of our communications solutions. Bolero is an ideal rental product because of its easy Artist integration and its ability to handle productions of any type and size. Add in its ease of use, robust RF performance, and great sound, and you have a winning solution for large rental houses such as LMG.”

More info