How does video on demand work?

In addition to the live video workflow, let’s get a general idea of the video on demand (VOD) workflow. This is one example of a VOD workflow; others are possible. VOD is what you get when you watch a series on Netflix, for instance.

This is a general case for processing video on demand files and delivering them to your users, possibly scaling to hundreds of thousands or even millions of users.

The first step is recording a video with a camera and saving it to a mobile phone’s storage, or starting from a file that already exists in file storage. The file is then delivered to a location the encoder can reach: an S3 location, a NAS on a local network, an FTP or SFTP server, or cloud storage on AWS, Azure or Google Cloud.

Once these files are accessible, the next step of the process is encoding the content. Video encoding is the process of taking a raw video format and compressing it to better handle transmission and delivery with the highest video quality possible. Basically, encoding always tries to maximize video quality and minimize the file size. This is an important step, and different components can come into play to perform the compression. Video encoding can happen either in the cloud (for example in a distributed Kubernetes environment) or on premises, including on hardware encoders (like the Harmonic VOS encoder).


Cloud encoders are essentially VMs that run in the cloud, for example on a Kubernetes cluster, which makes the encoding step scalable and able to autoscale. There is no need to install any hardware in your local office. On-prem encoding requires local hardware, but the encoder still runs on a computer. Hardware encoders, by contrast, are physical cards designed for very high quality encoding, and they can take in very high bitrate streams and files, for example SDI or SMPTE 2022-6 video. In the delivery process, this step reduces the file size while maximizing quality. When the source is already compressed, it is a transcoding pass: compressing an already compressed video even further. The encode can be run as 1-pass, 2-pass or 3-pass encoding, where each additional pass costs more resources but delivers a more optimized set of versions of the video, aiming for the best trade-off between quality and file size.
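To make the multi-pass idea concrete, here is a minimal sketch of a 2-pass encode driven by ffmpeg. It only builds the two command lines and does not run them. The file names, the 3000 kbps target, and the choice of `libx264` and `aac` are illustrative assumptions, not settings taken from this workflow.

```python
import os
import shlex


def two_pass_commands(src, dst, bitrate_kbps, codec="libx264"):
    """Build the two ffmpeg invocations for a 2-pass, bitrate-targeted encode."""
    common = ["ffmpeg", "-y", "-i", src, "-c:v", codec, "-b:v", f"{bitrate_kbps}k"]
    # Pass 1 only analyses the video: audio is dropped and the output is discarded.
    first = common + ["-pass", "1", "-an", "-f", "null", os.devnull]
    # Pass 2 reuses the statistics written by pass 1 to spend bits where they matter.
    second = common + ["-pass", "2", "-c:a", "aac", dst]
    return [first, second]


# Hypothetical input and output names, printed as shell-ready command lines.
for cmd in two_pass_commands("input.mov", "output_3000k.mp4", 3000):
    print(shlex.join(cmd))
```

The second pass is where the extra resource cost shows up: the source is read and processed twice in exchange for a better use of the same bitrate.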

The next part of the chain is creating the files defined by the different encoding profiles. Profiles are either specified before the start of the encoding job or derived automatically by per-title algorithms, which analyze the content and apply content-specific encoding settings to compress it efficiently. The configuration says, in effect: for a given video asset, create several copies of it at multiple bitrates and resolutions, so that the best quality is delivered as efficiently as possible to users who consume the content on different devices and at different connection speeds. Users can then watch the video on devices with different characteristics, like smart TVs or tablets.
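A set of encoding profiles like the one described above is often called a bitrate ladder. The sketch below shows one possible way to represent it and to pick a copy for a given device. The `Rendition` type, the ladder values, and the 0.8 safety factor are hypothetical. In a real deployment the player usually makes this choice itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    width: int
    height: int
    bitrate_kbps: int


# Hypothetical ladder; real values are tuned per service or per title.
LADDER = [
    Rendition("1080p", 1920, 1080, 5000),
    Rendition("720p", 1280, 720, 3000),
    Rendition("480p", 854, 480, 1500),
    Rendition("360p", 640, 360, 800),
]


def renditions_for_source(ladder, source_height):
    """Drop renditions taller than the source: upscaling adds size, not detail."""
    return [r for r in ladder if r.height <= source_height]


def pick_rendition(ladder, bandwidth_kbps, screen_height, safety=0.8):
    """Pick the highest-bitrate rendition that fits the screen and the bandwidth."""
    budget = bandwidth_kbps * safety  # keep headroom for bandwidth fluctuations
    candidates = [
        r for r in ladder
        if r.height <= screen_height and r.bitrate_kbps <= budget
    ]
    if candidates:
        return max(candidates, key=lambda r: r.bitrate_kbps)
    # Nothing fits: fall back to the lightest rendition rather than failing.
    return min(ladder, key=lambda r: r.bitrate_kbps)


print(pick_rendition(LADDER, bandwidth_kbps=20000, screen_height=480).name)
```

With these assumed values, a small screen on a fast connection still gets the 480p copy, because a larger rendition would add weight without visible benefit.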

To be clear, there is no point in sending a 5 Mbps video to a phone or a tablet: it is too heavy, and the screen size of a phone does not require such a high resolution and bitrate. So we create different copies at different resolutions and package them for different streaming protocols. DASH and HLS are the protocols in this example workflow. DASH works on Android devices and generally on smart TVs, whereas HLS works on Android devices but also on Apple devices.

The next step is dynamic packaging, which can add, for example, server-side ad insertion or DRM, depending on the use case and the business need. The media files are then delivered to an origin server and a CDN together with the manifest, which describes how the video should be played: the MPD (Media Presentation Description) file for DASH, or the m3u8 playlist for HLS.

Basically, the files are put on an origin server, and the CDN copies them to servers in different geographical regions, duplicating the content on each server. These edge servers are located near the users, so a user who requests a video from a given device can load it much faster than if the server were far away. This greatly reduces startup time when scaling to many distant users, who would otherwise, without the benefit of a CDN, load the video slowly and suffer buffering.
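As an illustration of the manifest mentioned above, this minimal sketch renders an HLS master playlist (the m3u8 file) that lists one variant per rendition. The tags used are standard HLS tags; the bandwidth values and the `720p/index.m3u8` style paths are made-up examples.

```python
def hls_master_playlist(variants):
    """Render an HLS master playlist from (bandwidth_bps, width, height, uri) tuples."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    # Variants are listed in ascending bandwidth order, purely for readability.
    for bandwidth, width, height, uri in sorted(variants):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(uri)  # the media playlist for this rendition
    return "\n".join(lines) + "\n"


# Hypothetical renditions and paths.
print(hls_master_playlist([
    (3_000_000, 1280, 720, "720p/index.m3u8"),
    (800_000, 640, 360, "360p/index.m3u8"),
]))
```

The player downloads this file first, then chooses among the listed variants according to the bandwidth it measures.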

Video on Demand workflow from ingestion to encoding and delivery.

Also, if your users are geographically far from where you store the files, putting the files on a CDN lets them be fetched sooner and faster than from a distant server. Starting the video as fast as possible is very important: viewers tend to leave if a video has not loaded after about 3 seconds.

The last part of the chain is the users themselves, who access the files through different devices and receive different resolutions and bitrates depending on their context and bandwidth.

Finally, there is the video developer who manages the workflow. Each stage can report analytics, which are then used to tune the encoding profiles. For example, if the video analytics show that nobody is using the one-megabit rendition, we can decide that we don’t need that version and stop encoding it and distributing it to the CDN. We don’t waste space storing it, and because nobody is consuming it, we also avoid the cost of transferring it to the CDN servers.

As we said, bear in mind that every video must be duplicated for each resolution and bitrate, and each copy is divided into the segments that make up the whole video, to achieve maximum quality at the lowest file size for the users. These small chunks, or fragments, allow the quality to be switched very quickly and so avoid buffering.
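The analytics-driven tuning described above can be sketched as a simple rule: drop any rendition whose share of total watch time falls below a threshold. The rendition names, the watch-time figures, and the 1% threshold are hypothetical; a real service would choose its own metric and cut-off.

```python
def prune_renditions(play_seconds, min_share=0.01):
    """Split renditions into (keep, drop) by their share of total watch time."""
    total = sum(play_seconds.values())
    if total == 0:
        # No data yet: keep everything rather than guess.
        return list(play_seconds), []
    keep, drop = [], []
    for name, seconds in play_seconds.items():
        (keep if seconds / total >= min_share else drop).append(name)
    return keep, drop


# Hypothetical watch-time totals per rendition, in seconds.
stats = {"1080p": 52_000, "720p": 38_000, "480p": 9_500, "1000k": 120}
keep, drop = prune_renditions(stats)
print("keep:", keep, "drop:", drop)
```

Renditions in the drop list no longer need to be encoded, stored, or pushed to the CDN, which is where the savings come from.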
The goal with Video on Demand is to get the video to the user at the best possible quality with the smallest file size, even at the cost of increasing the resources needed to encode it as efficiently as possible.