AV1 Codec - Complete guide for video application devs

In recent years, the demand for web-based video content has surged, and this trend is expected to continue. With a growing market for high-resolution (HD, UHD, and 4K) and high-quality videos (wide color gamut and HDR) and a significant increase in users, there is a pressing need for highly efficient video codecs that do not compromise quality.

AV1 is a current-generation video codec that addresses these growing needs by improving efficiency, adaptability, and quality. It is also open-source and royalty-free, which avoids the licensing constraints of some other codecs.

Here are the key takeaways from this blog:

Get a brief history of the AV1 codec and its emergence.
Understand the key features of AV1, practical differences with other popular codecs, and the basics of how AV1 works.
See where AV1 has been adopted so far.
Learn how to use AV1 and how ImageKit can help you set it up quickly.

A Brief History of AV1

AV1 was officially announced on September 1, 2015, along with the formation of AOMedia with seven founding members: Amazon, Cisco, Google, Intel, Microsoft, Mozilla, and Netflix.
Many components of AV1 were built on top of previous research work done by the Alliance’s members, including Google’s VP10, which was scrapped after the announcement of AV1.
The Alliance’s primary motivation to create a royalty-free and open-source codec was the uncertainty surrounding the licensing of HEVC due to increased royalty fees and the complexity of the licensing process.
Tests conducted by Moscow State University in 2017 found that HEVC needed about 22% more bitrate than AV1 to reach a similar level of quality, although AV1 encoding was much slower at that time due to a lack of optimizations.
As per tests conducted by Facebook in 2018, the AV1 reference encoder had 34% and 50% higher data compression than VP9 and H.264, respectively, in practical use cases.
Major web browsers have added support for AV1, including Google Chrome, Firefox, Edge, and recently Safari on a few devices.

Many providers have added dedicated hardware support for AV1, including Intel, Nvidia, and recently Apple, in their A17 Pro and M3 system-on-chip (SoC).

Key features of AV1

Compression efficiency: AV1 outperforms HEVC, VP9, and AVC. It suits high-resolution content such as HD and UHD because it saves bandwidth and improves the user experience. It may also have future potential for real-time applications.

Comparison of visual quality between H.264, VP9, and AV1 at similar bitrates. [Source: Meta engineering blog]

Quality: AV1 is well-suited for high-resolution (HD, UHD), wide color gamut, and high frame rate applications with better support for 10/12-bit encoding and HDR. Objective metrics from Netflix’s 2017 tests show that AV1 was 25% more efficient than VP9 based on measurements with peak signal-to-noise ratio (PSNR) and video multi-method assessment fusion (VMAF) at 720p.

Netflix’s enabling of AV1 on 4K TVs also improved the quality of the experience (QoE), with smoother streaming at high resolution and a reduced drop in quality for their users.
Open-source and royalty-free: AV1 is a current-generation video codec that is both state-of-the-art and royalty-free. It avoids the royalty and licensing difficulties associated with codecs such as HEVC and AVC, in line with the broader goal of royalty-free web standards.
Compatibility and support: Many browsers now support it, and dedicated hardware support is increasing.
Beyond video, the AVIF image format uses the AV1 codec as its compression algorithm.

A quick comparison of AV1 with other codecs

Metric	AV1	VP9	AVC (H.264)	HEVC (H.265)
Compression Efficiency	Highest. Significantly outperforms VP9, HEVC, and AVC at similar bitrates.	Moderate.	Moderate. Least efficient among VP9, HEVC, and AV1.	High. More efficient than VP9.
Licensing Model	Royalty-Free	Royalty-Free	Patent-Encumbered	Patent-Encumbered
Hardware Support	Limited. Support has been increasing in recent years.	Widespread.	Widespread.	Limited. More support than AV1, but not as widely adopted as VP9.
Computational Complexity	Highest. Complexity can reach 4-10x that of VP9, depending on the hardware, encoder, and decoder used. Rapid improvements in SVT-AV1 and other encoders, along with optimal hardware configurations, are closing the gap between AV1 and libvpx (VP9) and H.265 encoders.	Moderate. Lower encoding times than HEVC and AV1.	Moderate. Least complex among VP9, HEVC, and AV1.	High. Lower encoding time than AV1.

Adoption

AV1 is usually available in an MP4 container with an AAC/Opus audio codec and a WebM container with an Opus audio codec.
In February 2020, Netflix started streaming select titles using the AV1 codec in its Android app. This codec improved compression efficiency by 20% over the VP9 codec. In November 2021, Netflix started streaming in AV1 on TVs with hardware AV1 decoders and the PlayStation 4 Pro.
In 2020, YouTube started streaming videos at 8K resolution on 8K TVs using the AV1 codec. Certain Android TVs with the YouTube app also support AV1 streaming. A user can configure settings in YouTube to use the AV1 codec.
In early 2022, Meta started rolling out AV1 codec for Instagram Reels. They noticed significant reduction in bitrates and improved playback experience. Meta stated that AV1 will be the most viable codec for them over the next several years.

As hardware support grows and software implementations of the codec improve, several other large-scale video platforms have announced or discussed plans to adopt AV1.

Software implementations

libaom: Developed by AOMedia, it is the reference implementation of the AV1 codec. It has demonstrated the efficiency and features of AV1 but has been significantly slower than other encoders in terms of encoding time. It includes an encoder called aomenc and a decoder called aomdec.
SVT-AV1: It is an open-source production encoder that shows significantly improved performance. Netflix and Meta use it in their production applications. It uses multi-threading and vectorization techniques to utilize modern hardware, making it suitable for real-time and offline video encoding applications. On March 13, 2024, SVT-AV1 2.0.0 was released.
dav1d: It is a cross-platform, open-source AV1 decoder focused on speed and correctness. It is used widely in production, for example, in Meta and Netflix. It outperforms other AV1 decoders like libgav1 and libaom. In May 2019, Firefox switched its default AV1 decoder from libaom to dav1d.
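One common way to try these encoders is through FFmpeg. The sketch below only builds the command line; it assumes an FFmpeg build with the libsvtav1 and libopus encoders enabled, and the file names, CRF value, and preset are illustrative choices rather than recommendations.

```python
import shlex

def build_av1_command(src, dst, crf=32, preset=8):
    """Return an FFmpeg argument list that encodes `src` to AV1 using SVT-AV1."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "libsvtav1",      # SVT-AV1 encoder (requires an FFmpeg build that includes it)
        "-crf", str(crf),         # constant-quality target; lower values mean higher quality
        "-preset", str(preset),   # speed/efficiency trade-off; higher values encode faster
        "-c:a", "libopus",        # Opus audio, which works in both WebM and MP4
        dst,
    ]

# Hypothetical input and output file names.
cmd = build_av1_command("input.mp4", "output.webm")
print(shlex.join(cmd))
```

To actually run the encode, the list can be passed to subprocess.run(cmd, check=True). Slower presets generally give smaller files at the same quality, so the right values depend on how much encoding time you can afford.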

Delivering videos using the AV1 codec with ImageKit

Let’s understand how to encode and deliver videos in the AV1 codec.

💡 We recommend combining the latest codecs, like AV1 (or VP9), with a widely supported codec like H.264/AVC as a fallback. This ensures compatibility across devices and platforms while still offering a better viewing experience on modern devices that support the newer codecs.

For example, YouTube encodes content using H.264/AVC, VP9, and AV1. Similarly, Netflix uses H.264/AVC, VP9, AV1, and H.265/HEVC in some cases.

Setting up a video encoding pipeline and delivery infrastructure can quickly get tricky. It includes figuring out and implementing the technical details of:

Configuring applications to load the correct file based on codec support or detecting codec support in the backend service.
Maintaining separate cache copies of video content on the CDN.
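As a rough illustration of these two points, here is a minimal sketch of backend codec selection. It assumes the client reports which codecs it can play (for example, after probing in the browser); the rendition file names, the cache key scheme, and the function names are hypothetical.

```python
# Preference order: most efficient codec first, most widely supported last.
# The MIME strings are example codec identifiers; the file names are hypothetical.
RENDITIONS = [
    ("av1", 'video/mp4; codecs="av01.0.05M.08"', "video.av1.mp4"),
    ("vp9", 'video/webm; codecs="vp09.00.10.08"', "video.vp9.webm"),
    ("h264", 'video/mp4; codecs="avc1.42E01E"', "video.h264.mp4"),
]

def pick_rendition(client_codecs):
    """Return (codec, mime, filename) for the best codec the client supports."""
    supported = {c.lower() for c in client_codecs}
    for codec, mime, filename in RENDITIONS:
        if codec in supported:
            return codec, mime, filename
    # Assumption: H.264 is treated as the universal fallback.
    return RENDITIONS[-1]

def cache_key(path, codec):
    """Give each codec its own cache entry so the CDN never mixes renditions."""
    return f"{path}?codec={codec}"

codec, mime, filename = pick_rendition(["h264", "vp9"])
print(codec, filename, cache_key("/videos/intro", codec))
```

Keeping the codec in the cache key is what forces the separate cache copies mentioned above: the same logical video maps to one cached object per codec.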

ImageKit’s Video API can help you with this and many more problems, with many valuable features that can simplify your video delivery needs. Along with automatically detecting and delivering the best format and codec, you can utilize these powerful features right out of the box:

URL-based video transformation.
Automatically convert video format on the same URL.
Dynamically generate thumbnails.
Seamlessly integrate your external storage, such as AWS S3 or web servers, and deliver optimized videos without much effort.

How does AV1 work?

AV1 is based on Google's VP9. It improves upon traditional block-based encoding techniques with new features for better adaptability to various input video types. Here's a simplified breakdown of the different steps:

Block partitioning

The video frames aren’t processed as a whole. In AV1 video encoding, they are divided into same-sized blocks called "superblocks", which can be either 128x128 or 64x64 pixels. These superblocks can then be split into smaller blocks using different partitioning patterns, and each resulting block is processed individually.

There are five ways in which AV1 partitions the blocks:
1. Two-way split
2. Four-way split
3. T-shaped split
4. Horizontal split (4:1 ratio)
5. Vertical split (1:4 ratio)

When the four-way split pattern is used, its partitions can be recursively subdivided. This enables partitioning down to blocks as small as 4x4 pixels.

The T-shaped splits and the 4:1 and 1:4 splits, which divide a block into four horizontal or vertical stripes, are new in this codec.

Two-way, Four-way, T-shaped, 4:1, 1:4 partitioning of blocks [Source: Wikipedia AV1]
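The recursive four-way split can be sketched in a few lines. This is a toy illustration, not how a real encoder decides: it splits a block whenever its pixel variance exceeds a threshold, whereas production encoders weigh bitrate against distortion and also consider the other partition patterns. The 16x16 block size and the threshold are assumptions chosen to keep the example small.

```python
def block_variance(frame, x, y, size):
    """Variance of the pixel values inside the square block at (x, y)."""
    pixels = [frame[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def partition(frame, x, y, size, threshold, min_size=4):
    """Recursively four-way split a block; return leaves as (x, y, size) tuples."""
    if size <= min_size or block_variance(frame, x, y, size) <= threshold:
        return [(x, y, size)]  # flat enough (or too small) to keep as one block
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += partition(frame, x + dx, y + dy, half, threshold, min_size)
    return leaves

# Toy 16x16 "superblock": flat everywhere except a bright 4x4 patch in one corner.
frame = [[0] * 16 for _ in range(16)]
for r in range(4):
    for c in range(4):
        frame[r][c] = 255

blocks = partition(frame, 0, 0, 16, threshold=1.0)
print(blocks)
```

The detailed corner ends up covered by 4x4 blocks while the flat regions stay as larger 8x8 blocks, which is the point of adaptive partitioning: small blocks only where the content needs them.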

Inter-frame prediction

Inter-frame prediction is a technique used in video compression to predict how pixels will change from one frame to the next by using information from previous frames. For example, in a video of someone walking, this method analyzes how the person moved in earlier frames to predict their position in the upcoming frame.

Here are the key points used by the AV1 codec in inter-prediction:

AV1 uses higher precision processing with 10 or 12 bits per sample to minimize rounding errors. This ensures that subtle details in the video are preserved more accurately during prediction and encoding.
It uses wedge-partitioned prediction, which combines two predictions within a block using smooth or sharp transition gradients in different directions.
AV1 expands frame referencing to utilize 7 out of 8 available frames in the decoded frame buffer. By referencing more frames, AV1 can better analyze motion and predict pixel changes between frames accurately.
Motion vectors tell where to look in the previous frames to find similar pixels. AV1 introduces Warped Motion and Global Motion tools to reduce redundant motion vector information.
AV1 introduces Switch frames (S-frames), a new inter-frame type that can be predicted from already-decoded reference frames of a higher-resolution version of the same video. This allows a player to switch to a lower resolution in adaptive bitrate streaming without requiring a full keyframe at the beginning of each video segment.
AV1 employs overlapped block motion compensation (OBMC) to address visible boundaries along block borders. This technique extends block sizes to overlap neighbouring blocks, blending these overlapping regions together.
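To make the idea of a motion vector concrete, here is a minimal block-matching sketch. It does an exhaustive search using the sum of absolute differences (SAD) as the matching cost. This is a generic textbook technique used here for illustration; it is not AV1's actual search, and the frame contents, block size, and search radius are made up.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in `cur` and a block in `ref`."""
    return sum(
        abs(cur[cy + r][cx + c] - ref[ry + r][rx + c])
        for r in range(size)
        for c in range(size)
    )

def motion_search(cur, ref, x, y, size, radius):
    """Find the offset (dx, dy) into `ref` that best matches the block at (x, y) in `cur`."""
    height, width = len(ref), len(ref[0])
    best = ((0, 0), sad(cur, ref, x, y, x, y, size))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, x, y, rx, ry, size)
            if cost < best[1]:
                best = ((dx, dy), cost)
    return best

def make_frame(sx, sy, width=12, height=12):
    """Black frame with a bright 4x4 square whose top-left corner is (sx, sy)."""
    frame = [[0] * width for _ in range(height)]
    for r in range(sy, sy + 4):
        for c in range(sx, sx + 4):
            frame[r][c] = 200
    return frame

ref = make_frame(2, 2)  # previous frame: square at (2, 2)
cur = make_frame(4, 3)  # current frame: square has moved to (4, 3)
mv, cost = motion_search(cur, ref, 4, 3, 4, radius=3)
print(mv, cost)
```

When the match is exact, the encoder only needs to store the motion vector for that block; when it is not, the leftover difference becomes the residual that the later stages compress.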

Intra-frame prediction

Intra-frame prediction is a method for guessing what the pixels in each part of a frame look like, using only the information from that same frame. Rather than looking at other frames, it's essentially making an educated guess about a block's contents based on the pixels surrounding it.

Like VP9, AV1 uses directional predictors to extrapolate neighbouring pixels based on specified angles. It is like drawing lines from the edges to fill a block in different directions.

AV1 replaces VP9's "TrueMotion" predictor with the Paeth predictor. It computes a simple linear function of the three neighbouring pixels (left, top, top-left), then chooses as predictor the neighbouring pixel closest to the computed value.
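The Paeth rule described above is small enough to write out directly. The sketch below follows that description: the linear function is left + top - top_left, and ties are resolved in the order left, top, top-left. The block-level helper and its argument names are illustrative assumptions.

```python
def paeth(left, top, top_left):
    """Pick whichever neighbour is closest to the estimate left + top - top_left."""
    base = left + top - top_left
    d_left = abs(base - left)
    d_top = abs(base - top)
    d_top_left = abs(base - top_left)
    if d_left <= d_top and d_left <= d_top_left:
        return left
    if d_top <= d_top_left:
        return top
    return top_left

def predict_block(top_row, left_col, top_left):
    """Predict every pixel of a block from the already-decoded pixels on its border."""
    return [[paeth(left, top, top_left) for top in top_row] for left in left_col]

# The row above the block ramps from 10 to 20 while the left column is constant,
# so the prediction carries the ramp downward into the block.
print(predict_block(top_row=[10, 20], left_col=[10, 10], top_left=10))
```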

Transforming the residual data

After the prediction estimations are done, AV1 calculates the difference between the predicted and actual block. This difference is called "residual data." This residual data is then transformed using different mathematical techniques, simplifying the data for size reduction and organizing it in a special pattern that prioritizes important information for better video quality and compression efficiency.

In AV1, encoders use different transform options which include square and rectangular DCTs (Discrete Cosine Transform) and an asymmetric DST (Discrete Sine Transform).

The Discrete Cosine Transform (DCT) is used because it represents the residual in a compact form. It converts the residual signal into frequency coefficients, making it easier to discard less visually relevant information while preserving important details. The asymmetric DST is used for blocks where the top or left edge is expected to have lower error thanks to prediction from nearby pixels. Alternatively, encoders can choose not to transform the residual at all.

AV1 allows the combining of two one-dimensional transforms. This means different transformations can be applied along the horizontal and vertical dimensions. This flexibility helps pick the transform choices for specific block content characteristics.
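The idea of combining two one-dimensional transforms can be sketched as follows. This uses a floating-point DCT-II for clarity; real codecs use integer approximations, and the 4x4 block and helper names are illustrative assumptions.

```python
import math

def dct_1d(values):
    """Orthonormal one-dimensional DCT-II."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values)
        ))
    return out

def identity_1d(values):
    """No transform along this dimension."""
    return list(values)

def transform_2d(block, row_tx, col_tx):
    """Apply `row_tx` along every row, then `col_tx` along every column."""
    rows = [row_tx(row) for row in block]
    cols = [col_tx(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

# A flat 4x4 residual: all of its energy should land in one coefficient.
residual = [[5, 5, 5, 5] for _ in range(4)]
both = transform_2d(residual, dct_1d, dct_1d)        # DCT in both directions
mixed = transform_2d(residual, dct_1d, identity_1d)  # DCT horizontally only
print(round(both[0][0], 6), round(mixed[0][0], 6))
```

With the DCT applied in both directions, the flat block collapses into a single non-zero coefficient in the top-left corner. With the identity transform vertically, each row keeps its own leading coefficient, which can be the better choice when rows differ sharply from one another.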

Quantization

Quantization is a process that further compresses the output of the preceding transform step. It reduces the precision of the transform coefficients through division and rounding. Some video detail is lost in exchange for the smaller file size. (That's why it's called "lossy" compression.)

AV1 allows for different levels of quantization to be applied to different blocks of the video, depending on their importance. This helps maintain quality where it matters most while aggressively compressing less critical parts.
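Division and rounding is easy to demonstrate. In this minimal sketch, the coefficient values and the two step sizes are made up; a real encoder derives its step sizes from its quantizer settings.

```python
def quantize(coeffs, step):
    """Divide each coefficient by the step size and round to the nearest integer."""
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(levels, step):
    """Decoder side: multiply back. The rounding error cannot be recovered."""
    return [[level * step for level in row] for row in levels]

coeffs = [[203.0, 31.0], [-19.0, 3.0]]

fine = dequantize(quantize(coeffs, 4), 4)      # small step: small error, larger numbers to store
coarse = dequantize(quantize(coeffs, 16), 16)  # large step: larger error, smaller numbers to store
print(quantize(coeffs, 4), fine)
print(quantize(coeffs, 16), coarse)
```

Applying a small step to visually important blocks and a large step to less critical ones is the per-block adaptation described above. Note how the smallest coefficient becomes zero under the coarse step, which is where much of the size reduction comes from.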

Filters

Compression processes can introduce artifacts (undesired distortions and anomalies) in the video. Filters are tools used in video encoding to remove such artifacts and enhance visual quality.

AV1 employs several filters to improve video quality:

Constrained Directional Enhancement Filter (CDEF): This filter combines Thor's constrained low-pass filter with Daala's directional de-ringing filter. It smooths each block roughly along the direction of its dominant edge, which removes distracting ringing artifacts around edges and improves clarity.
Loop Restoration Filter: This filter reduces the blur introduced by block processing, recovering sharpness and detail that might have been lost during encoding.
Deblocking Filter: This filter smooths transitions between adjacent blocks, reducing visible block artifacts caused by quantization errors.

In addition, AV1 offers optional Film Grain Synthesis to address the challenges of encoding film grain. The encoder analyzes the grain, determines parameters that describe it, and removes the grain from the video before compression. The parameters are passed along to the decoder, which synthesizes a noise signal resembling the original grain and adds it back to the decoded frames.
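As a rough feel for what a deblocking filter does, here is a one-dimensional toy version. It is not AV1's actual filter, which uses longer filter taps and several thresholds; the block size, threshold, and adjustment amount are assumptions chosen for illustration.

```python
def deblock_row(row, block_size, threshold):
    """Soften small jumps at block boundaries in a single row of pixels."""
    out = list(row)
    for edge in range(block_size, len(row), block_size):
        p0, q0 = row[edge - 1], row[edge]  # last pixel of one block, first pixel of the next
        step = q0 - p0
        if 0 < abs(step) <= threshold:
            # A small jump is probably a quantization artifact: pull both sides together.
            delta = step / 4
            out[edge - 1] = p0 + delta
            out[edge] = q0 - delta
        # A large jump is treated as a real edge in the picture and left untouched.
    return out

artifact = [10, 10, 10, 10, 14, 14, 14, 14]
real_edge = [10, 10, 10, 10, 200, 200, 200, 200]
print(deblock_row(artifact, block_size=4, threshold=8))
print(deblock_row(real_edge, block_size=4, threshold=8))
```

The threshold is the key design choice: it lets the filter hide block seams without blurring genuine edges in the content.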

Entropy Coding

Entropy coding further reduces the encoded data size for efficient storage and transmission. It exploits statistical redundancy in the data by assigning shorter codes to common patterns and longer codes to less common ones.

AV1 uses a multi-symbol arithmetic coding method, which is an improvement over VP9's binary arithmetic coding in terms of efficiency and processing.
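The principle of shorter codes for common symbols can be quantified. The sketch below is not an arithmetic coder; it only computes the ideal code length of each symbol, -log2(probability), which is the limit an arithmetic coder tries to approach. The symbol stream is made up.

```python
import math
from collections import Counter

def ideal_code_lengths(symbols):
    """Ideal code length in bits for each distinct symbol: -log2(probability)."""
    counts = Counter(symbols)
    total = len(symbols)
    return {s: -math.log2(n / total) for s, n in counts.items()}

def ideal_total_bits(symbols):
    """Total bits needed if every symbol could be coded at its ideal length."""
    lengths = ideal_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)

# Four distinct symbols with a skewed distribution.
symbols = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
fixed_bits = len(symbols) * 2  # a fixed-length code needs 2 bits per symbol

print(ideal_code_lengths(symbols))
print(ideal_total_bits(symbols), "bits vs", fixed_bits, "bits fixed-length")
```

The more skewed the distribution, the bigger the saving, which is why codecs invest in accurate, adaptive probability models for their symbols.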

Conclusion

The AV1 codec is a royalty-free video codec that addresses the need for efficient and high-quality video delivery. It outperforms codecs like AVC, VP9, and HEVC in compression efficiency at the expense of longer encoding times. However, with improving hardware and software optimizations, that gap is closing.
AV1 enhances efficiency and quality and gives a better experience for HD, UHD, 4K, and HDR video playback, especially on devices that provide dedicated support for it or in low-bandwidth situations.
It reduces delivery costs associated with larger and more-played videos.
AV1 can be effectively combined with VP9 and H.264 codecs in production applications.