The following article is an excerpt from Chapter 4 of HTTP Live Streaming: A Practical Guide. This is the introductory section of the chapter and is meant to give you the background you need to understand how adaptive streaming works and how to do it with HLS.
In previous chapters, we’ve seen how to stream a video with HLS for both on-demand and live video. However, a single video stream at a particular bit rate isn’t going to work on all the devices that people watch video on today. Some devices are more powerful than others, screen sizes differ, some support different H.264 profiles, and so on. Connection speeds can also vary depending on how you are connected to the internet.
For example, you can watch an HD video stream on your flat-screen TV at home, but you won’t get the same experience watching that stream on a mobile device over a cellular network. Playback will stall because the device won’t be able to download the video data fast enough. In some cases, the video may not play at all because the device won’t be powerful enough to decode it. Then there’s the wasted bandwidth: most mobile screens don’t have the resolution of a TV, so many of those pixels are simply wasted.
The amount of bandwidth available also plays a part. Your internet provider may promise you super-fast connection speeds, but what you actually get may fluctuate at any given time. If everybody in your neighbourhood starts streaming the latest movie from Netflix or watching YouTube videos, you can be fairly certain that your connection speed will drop off. The result? More stalls and buffering.
We need a solution that allows us to deliver a video stream that’s optimal for the device the stream is being watched on, and that is where adaptive streaming comes into play.
Here’s the plan for what we’ll cover in this chapter:
- First we’ll look at what adaptive streaming is, how it works, and why it’s useful.
- Then you’ll learn how to take a video and stream it on-demand using adaptive streaming.
We’ll be doing some video encoding in this chapter, so if you come across any terms you aren’t familiar with, refer to Appendix A, An Encoding Primer.
Understanding Adaptive Streaming
Adaptive streaming is a technique that adjusts the quality of a video in real time based on a number of factors, such as available bandwidth and the capabilities of the device. The player (on the device) monitors the CPU and connection speed and selects an appropriate stream. If anything changes that affects playback, the player can switch to a different one—it adapts.
Figure 4.1 shows an example of how adaptive streaming adjusts to available bandwidth. In practice, this should produce the best possible viewing experience for users; if somebody is watching the stream on a low-end mobile device, the quality of the video won’t be the best but at least it won’t stall during playback.
Bandwidth isn’t the only factor that indicates when to switch streams. Other factors include:
- Buffer conditions. A player typically uses a buffer that can hold a few seconds’ worth of video. The buffer ensures that the video will continue to play if there are intermittent problems with the connection. If the amount of data in the buffer drops below a certain level, it’s a sign that the connection isn’t fast enough to download data at the required rate, which can result in the video stalling.
- CPU usage. If the device cannot decode frames fast enough to sustain the required frame rate, the player will drop some of them. This is an indication that the device’s CPU is not powerful enough to deal with how the video has been encoded. The player can use the number of dropped frames to decide whether it needs to switch to a lower quality stream.
Let’s take a look at adaptive streaming in action, using Apple’s test HLS stream as an example. Figure 4.2 shows two frames taken from the video. If you look closely, you’ll see that the image on the right is a lot sharper than the image on the left. Why is that? A few seconds after the video starts playing, the player calculates that there is sufficient bandwidth available, so it switches to a higher quality version. When it switches, you should notice the difference in quality; playback continues from where the previous stream left off.
To do adaptive streaming, we need to create multiple versions of the source video at different bit rates, and that is what we’ll look at in the next section.
Encoding the Variants
To make use of adaptive streaming, we need to create several video streams (variants) at different bit rates. Apple publishes guidelines on what the video and audio bit rates should be. Choose your streams carefully: there isn’t much point in having both a 250 Kbps and a 300 Kbps stream available, as there will be no significant difference in quality. Apple recommends keeping adjacent bit rates a factor of 1.5 to 2 apart, so a ladder along the lines of 400, 800, and 1600 Kbps gives a noticeable step between each variant. The number of alternative streams you need to create depends (in part) on which devices you want to support. Having lots of streams is not necessarily better, as more streams means more opportunities for the player to switch with little or no difference in quality.
We’ll create the different versions from a single video so it needs to be high quality. In this case, quality refers to the bit rate and resolution of the video, not how entertaining it is. You can create a low bit rate version from an HD video, but not the other way around. We’re going to create the different versions from a 1080p source. There is a 1080p version of the sample video available to download from the book’s website, or you can use your own.
We’ll use QuickTime Player to encode the videos. Open up the video in QuickTime. From the “File” menu, select “Export”. You’ll see a list of options to choose from. Select “720p”. Export the file as “sample_720p” and save it in the same directory as the sample video. Repeat the process for the “480p” option, but save it as “sample_360p”. (As the aspect ratio of the source video is 16:9, the video will actually be encoded as 360p, not 480p.) When you’re finished, you’ll have two additional videos. We’ll use these to create the alternative streams.
Encoding the Hard(er) Way
Using QuickTime to encode our videos may be easy, but it doesn’t give us much control over how the video is encoded; we have to make do with whatever the default settings are. If we want to use custom settings, we need to use something else. There are commercial video encoders available, such as Sorenson Squeeze and Telestream Episode, but these can cost big bucks. Instead, we’ll use ffmpeg.
Let’s take one of Apple’s recommended settings and encode our video accordingly with ffmpeg. The dimensions of the output video should be 640×360, with the video and audio bit rates set to 1200k and 96k respectively. We also want to restrict the encoder to the Baseline Profile (Level 3.1), and there should be a keyframe every 3 seconds.
Here’s how you would invoke ffmpeg to do the encoding using the settings we just described:
$ ffmpeg -i sample_720p.mov \
    -force_key_frames "expr:gte(t,n_forced*3)" \
    -c:v libx264 -vprofile baseline -vlevel 3.1 \
    -s 640x360 \
    -b:v 1200k \
    -strict -2 \
    -c:a aac -ar 44100 -ac 2 -b:a 96k \
    sample_360p.mov
We encode the video (-c:v) with libx264 and set the profile and level accordingly with the -vprofile and -vlevel options. To insert a keyframe every 3 seconds, we use the -force_key_frames option with the expression expr:gte(t,n_forced*3), where t is the time of the current frame and n_forced is the number of frames forced so far. If the expression evaluates to 1, a keyframe will be inserted at the current position.
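If you want to confirm that the keyframes landed where we asked, ffprobe (which ships with ffmpeg) can list the keyframe timestamps. This is just a quick sanity check; note that older ffprobe builds name the field pkt_pts_time rather than pts_time:

# Print one timestamp per keyframe; expect values roughly 0, 3, 6, ...
$ ffprobe -v error -skip_frame nokey -select_streams v:0 \
    -show_entries frame=pts_time -of csv=p=0 sample_360p.mov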
A quick note about the audio settings. We encode the audio (-c:a) with ffmpeg’s native AAC encoder. It’s considered an experimental encoder, so we have to include the -strict -2 flags to enable it. There are other AAC encoders available, such as libfaac, but their licenses are not compatible with the GPL, so you cannot download a pre-built version of ffmpeg that includes them. If you want to use one of these “non-free” encoders, you can always compile ffmpeg yourself, but we won’t be covering how to do that here.
You can check that the video has been encoded correctly by using a tool like MediaInfo.
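If you’d rather stay on the command line, ffprobe can report the same details. A minimal check of the video stream might look like the following (the level is reported as an integer, so 31 means 3.1); swap v:0 for a:0 to inspect the audio stream:

# Show how the video stream was encoded: codec, profile/level, size, bit rate
$ ffprobe -v error -select_streams v:0 \
    -show_entries stream=codec_name,profile,level,width,height,bit_rate \
    -of default=noprint_wrappers=1 sample_360p.mov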
Preparing the Video Streams
The next step is to segment each video and create the playlists. As before, we’ll use mediafilesegmenter to do this but with some additional options. Here’s the command to segment the 720p version:
$ mediafilesegmenter \
    -start-segments-with-iframe \
    -I \
    -f sample_720p \
    -t 9 \
    sample_720p.mov
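The 360p variant is segmented in exactly the same way; only the input file and the output directory passed to -f change:

$ mediafilesegmenter \
    -start-segments-with-iframe \
    -I \
    -f sample_360p \
    -t 9 \
    sample_360p.mov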
The -I option creates a property list file. This file contains information about the video, including how it is encoded, the video dimensions, and so on. The name of this file will be the same as the input file, minus the extension, with .plist added. We’ll see how to use this file in the next section.
The -start-segments-with-iframe option requires an explanation. As you can probably guess from the name, it guarantees that each segment begins with an I-frame. Why is this important? It helps make video playback as seamless as possible when switching streams. For example, the player might start playing the lowest quality stream (sample_360p/fileSequence1.ts), but if it can handle a higher quality stream, it will retrieve and play the next segment from that stream (sample_720p/fileSequence2.ts). If that segment didn’t begin with an I-frame, some of its frames might rely on frames in a different segment; the player would need to download those segments as well in order to reconstruct the current video frame, and this could delay the switch. Although starting each segment with an I-frame is not strictly a requirement, it helps.
…