posh.wiki


The intro to video transcoding I wish I had

2026-04-04

Tags: video

There's a lot that goes into storing a video, far more than most people ever need to think about. It's quite complicated to learn, so here's the introduction I wish I had when I started out with video archival, including some commands for working with video in ffmpeg (a free, open-source, popular, and very powerful multimedia framework).

Resolution

A video, as presented on a computer, is simply a stream of images. Each image has a width and a height, known as its "resolution".

Some resolutions are expressed as a whole number suffixed with "p" (for "progressive scan"). In this case, the number is the height in pixels. The width is usually derived from a 16:9 aspect ratio (16 units wide for every 9 high), so 1080p is 1920 pixels wide.

Numbers suffixed with "K" are interpreted differently depending on whether they follow the "Ultra High Definition" (UHD) or Digital Cinema Initiatives (DCI) conventions. The most common is 4K, which is 2160 pixels high at full frame in both. In UHD, the width is calculated at 16:9 (3840 pixels), while DCI allows up to 4096 pixels and the picture's aspect ratio can vary.

The total count of pixels is equal to the number of vertical pixels multiplied by the number of horizontal pixels. A higher total pixel count means more detail can be captured, but also that there's more data to store.
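As a rough sketch of that arithmetic, here's a small shell function that derives the width and total pixel count from a "p" number. The name dims_16x9 is made up for this example, and it assumes square pixels at 16:9, so it won't describe formats like 720 x 480 SD.

```shell
# dims_16x9 HEIGHT: print "WIDTH x HEIGHT = PIXELS" for a 16:9 frame.
# Hypothetical helper; assumes square pixels and a 16:9 aspect ratio.
dims_16x9() {
    height=$1
    width=$(( height * 16 / 9 ))   # integer division, rounds down
    echo "$width x $height = $(( width * height ))"
}

dims_16x9 1080   # prints: 1920 x 1080 = 2073600
dims_16x9 2160   # prints: 3840 x 2160 = 8294400
```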

This table breaks down common resolution names.

| Name | Aliases | Aspect ratio | Dimensions (px) | Pixel count |
|---|---|---|---|---|
| Quarter Video Graphics Array | QVGA | 4:3 | 320 x 240 | 76,800 |
| Video Graphics Array | VGA | 4:3 | 640 x 480 | 307,200 |
| Standard Definition | SD, 480p | 4:3 or 16:9 (non-square pixels) | 720 x 480 | 345,600 |
| High Definition | HD, 720p | 16:9 | 1280 x 720 | 921,600 |
| Full High Definition | FHD, 1080p | 16:9 | 1920 x 1080 | 2,073,600 |
| Consumer 2K | Quad HD, UHD 2K, 1440p | 16:9 | 2560 x 1440 | 3,686,400 |
| Cinema 2K | DCI 2K | Variable | up to 2048 x 1080 | up to 2,211,840 |
| Consumer 4K | UHD 4K, 2160p | 16:9 | 3840 x 2160 | 8,294,400 |
| Cinema 4K | DCI 4K | Variable | up to 4096 x 2160 | up to 8,847,360 |
| Consumer 8K | UHD 8K | 16:9 | 7680 x 4320 | 33,177,600 |
| Cinema 8K | DCI 8K | Variable | variable x 4320 | variable |

To scale a video while preserving the aspect ratio:

ffmpeg -i in.mkv -vf scale=480:-2,setsar=1:1 out.mkv

To set the aspect ratio explicitly, using black bars to fill empty space (substitute $WIDTH and $HEIGHT with the new values):

ffmpeg -i in.mkv -vf "[in]scale=iw*min($WIDTH/iw\,$HEIGHT/ih):ih*min($WIDTH/iw\,$HEIGHT/ih)[scaled]; [scaled]pad=$WIDTH:$HEIGHT:($WIDTH-iw*min($WIDTH/iw\,$HEIGHT/ih))/2:($HEIGHT-ih*min($WIDTH/iw\,$HEIGHT/ih))/2[padded]; [padded]setsar=1:1[out]" out.mkv
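That filter boils down to picking the smaller of two scale factors ($WIDTH/iw and $HEIGHT/ih), then centring the result with equal bars on either side. Here's the same sum as a sketch in plain shell arithmetic, which can be handy for checking what you'll get before running a long encode. The function name fit_box is made up for this example, and it only does integer maths, so odd sizes round down.

```shell
# fit_box IN_W IN_H OUT_W OUT_H
# Prints "SCALED_W SCALED_H PAD_X PAD_Y" for fitting the input inside the output.
# Hypothetical helper; integer maths only, so results round down.
fit_box() {
    iw=$1 ih=$2 ow=$3 oh=$4
    # Compare ow/iw with oh/ih without fractions by cross-multiplying.
    if [ $(( ow * ih )) -le $(( oh * iw )) ]; then
        sc_w=$ow                       # width is the limiting side
        sc_h=$(( ih * ow / iw ))
    else
        sc_h=$oh                       # height is the limiting side
        sc_w=$(( iw * oh / ih ))
    fi
    echo "$sc_w $sc_h $(( (ow - sc_w) / 2 )) $(( (oh - sc_h) / 2 ))"
}

fit_box 1920 1080 640 480    # prints: 640 360 0 60
fit_box 640 480 1920 1080    # prints: 1440 1080 240 0
```

In the first call, a 16:9 source squeezed into a 4:3 frame ends up with 60-pixel bars above and below. In the second, a 4:3 source in a 16:9 frame gets 240-pixel bars at the sides.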

Frame rate

As previously mentioned, a video is just a series of images in rapid succession. The "frame rate" of the video represents how many individual images are displayed each second. It's measured in frames per second (fps) or, equivalently, "Hertz" (Hz), meaning "number of occurrences per second". A higher value makes for a smoother video with less motion blur, but takes more space.

Movies are generally filmed at 24Hz, though some deviate from this as a stylistic choice.

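Analog TV uses either 50Hz (in PAL/Phase Alternating Line regions), or 59.94Hz (in NTSC/National Television System Committee regions). These figures count interlaced fields, each holding half the lines of a picture, so the rate of complete frames is 25Hz or 29.97Hz respectively.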

Digital TV is transmitted at 30 or 60Hz for ATSC (Advanced Television Systems Committee) regions, and 25 or 50Hz for DVB (Digital Video Broadcasting) regions.

You can change framerate using -filter:v fps=$VALUE in ffmpeg, e.g.:

ffmpeg -i in.mkv -filter:v fps=30 out.mkv
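The odd-looking NTSC figures are rounded fractions: 59.94 is really 60000/1001, and 29.97 is 30000/1001. ffprobe (which ships with ffmpeg) reports frame rates in that fractional form, so here's a small sketch for turning one into a decimal. The function name rate_to_decimal is made up for this example, and it leans on awk for the division.

```shell
# rate_to_decimal FRACTION: convert "30000/1001" (or a plain "25") to a decimal.
# Hypothetical helper; uses awk because sh has no floating-point arithmetic.
rate_to_decimal() {
    echo "$1" | awk -F/ '{ d = ($2 == "" ? 1 : $2); printf "%.2f\n", $1 / d }'
}

rate_to_decimal 60000/1001   # prints: 59.94
rate_to_decimal 24000/1001   # prints: 23.98
rate_to_decimal 25           # prints: 25.00

# With ffprobe installed, the fraction for a real file can be read with:
#   ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate \
#     -of default=noprint_wrappers=1:nokey=1 in.mkv
```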

Containers & codecs

Each file type for video, such as .mp4 or .mkv, represents a container for video and audio, plus additional optional features. Both video and audio streams are encoded using a "codec", which defines how the data is stored on disk, with a trade-off between quality and file size.

MP4 (MPEG-4 Part 14) is a partially open container format. It's defined by an ISO standard, but there are some patents related to it. It has limited support for subtitles, chapters, and multiple audio tracks. It's great for compatibility as it's supported by the majority of platforms.

MKV (Matroska Video) is a fully open container standard with very high flexibility. It has excellent support for subtitles, multiple audio tracks, and chapters.

AVI (Audio Video Interleave) is a proprietary container format designed by Microsoft. These days, it's considered legacy, with very limited support for subtitling and multiple audio tracks, and no support for chapters.

MOV is Apple's proprietary container format. It has great support for subtitles, multiple audio tracks, and chapters, and is often used by video editing professionals.

WebM is an emerging and fully open container standard developed by Google. It has good support for subtitles and multiple audio tracks, but limited support for chapters.
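Because the container is separate from the codecs inside it, you can often move streams from one container to another without re-encoding anything, which is fast and loses no quality. Here's a minimal sketch of that; the wrapper name remux is made up for this example, while -map 0 (select every stream from the first input) and -c copy (copy streams as they are) are standard ffmpeg options.

```shell
# remux IN OUT: copy every stream from IN into a new container, OUT.
# Hypothetical wrapper; ffmpeg infers the container from OUT's extension.
remux() {
    ffmpeg -i "$1" -map 0 -c copy "$2"
}

# Usage (not run here): remux in.mkv out.mp4
```

This fails if the target container can't hold one of the streams, for instance some subtitle formats going from MKV to MP4. In that case you'd need to drop or convert the offending stream.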

Video codecs

The video codec defines how a video is stored as data. Storing each frame as an individual image would be terribly inefficient, so codecs employ various strategies to make storage more efficient, at the cost of some quality loss. Note that not all codecs are supported everywhere - VLC media player supports most, but more advanced codecs may not be supported on platforms such as the web.

Many DVDs and other older media are encoded with the patented MPEG-2 codec, which is widely supported for playback but terribly inefficient and yields low quality results in comparison with modern alternatives.

H.264 (AVC, Advanced Video Coding) is widely supported for playback, but subject to many patents.

H.265 (HEVC, High Efficiency Video Coding) is an improvement upon AVC with higher quality and compression efficiency, though it's also subject to many patents.

VP9 is an open standard, developed by Google, with similarly high compression efficiency and quality, optimised for video streaming.

AV1 (AOMedia Video 1) is an open standard that provides the best compression ratio and quality. It's compatible with relatively few devices, but support continues to grow.

You can re-encode a video by passing -vcodec or -c:v to ffmpeg, followed by the name of an encoder. Passing copy as the value keeps the existing video stream without re-encoding it. E.g.:

ffmpeg -i in.mkv -vcodec mpeg2video out.mkv
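A more typical job is re-encoding the video to a modern codec while leaving the audio alone. Here's a sketch that assumes your ffmpeg build includes the libx264 encoder (most do, but it isn't guaranteed). The wrapper name to_h264 is made up for this example; -crf is libx264's quality setting.

```shell
# to_h264 IN OUT [CRF]: re-encode video as H.264, leave audio untouched.
# Hypothetical wrapper; assumes an ffmpeg build that includes libx264.
to_h264() {
    # Lower CRF means higher quality and larger files; 23 is libx264's default.
    ffmpeg -i "$1" -c:v libx264 -crf "${3:-23}" -c:a copy "$2"
}

# Usage (not run here): to_h264 in.mkv out.mkv 18
```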

Audio codecs

Audio codecs, similar to video codecs, define how audio data is stored as bytes. Unlike video codecs, audio codecs can (but don't always) retain the full quality of the original audio.

MP3 (MPEG-1 Audio Layer III) is a low-efficiency lossy audio codec whose patents have expired. It's supported nearly everywhere, but delivers relatively poor quality for its file size.

AAC (Advanced Audio Coding) is a slight improvement upon MP3. It remains lossy, but has a higher quality and efficiency, though it is subject to patents.

Opus is a high-quality lossy codec with an open standard. Its main benefit is its low latency, which makes it better suited for real-time voice communications than media playback.

FLAC (Free Lossless Audio Codec) is a lossless, high-quality, high-efficiency codec with an open standard. It's the go-to option for retaining full quality in audio. Apple has a comparable codec, ALAC (Apple Lossless Audio Codec), used in their own ecosystem.

You can change a video file's audio codec with ffmpeg using -acodec or -c:a, with copy meaning keep the current stream as it is, e.g.:

ffmpeg -i in.mkv -acodec libmp3lame out.mkv
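Before choosing new codecs, it helps to know what a file already contains. ffprobe can list each stream's type and codec; this is a minimal sketch, and the wrapper name list_codecs is made up for this example.

```shell
# list_codecs FILE: print one line per stream with its codec name and type.
# Hypothetical wrapper around ffprobe, which ships alongside ffmpeg.
list_codecs() {
    ffprobe -v error -show_entries stream=codec_type,codec_name -of csv=p=0 "$1"
}

# Usage (not run here): list_codecs in.mkv
```

If a stream is already in the codec you want, passing copy for it avoids a needless re-encode, which would only cost quality for lossy codecs.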

Container codec support comparison

Legend: Y means supported, N means not supported, and L means limited support.

Higher quality codecs are presented further towards the right. More versatile containers are presented further towards the bottom.

Video

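| Container x Codec | MPEG-2 | H.264 (AVC) | H.265 (HEVC) | VP9 | AV1 |
|---|---|---|---|---|---|
| WebM | N | N | N | Y | Y |
| AVI | Y | L | L | N | N |
| MOV | Y | Y | Y | N | L |
| MP4 | Y | Y | Y | L | Y |
| MKV | Y | Y | Y | Y | Y |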

Audio

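| Container x Codec | MP3 | AAC | Opus | FLAC | ALAC |
|---|---|---|---|---|---|
| WebM | N | N | Y | N | N |
| AVI | Y | Y | N | N | N |
| MOV | Y | Y | N | N | Y |
| MP4 | Y | Y | L | N | Y |
| MKV | Y | Y | Y | Y | Y |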
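One way to read the two tables is to take, for each container, the right-most codec in each table marked Y. Here's that lookup written out as a sketch; the function name best_codecs is made up for this example, it assumes Y means full support, and it only reflects the tables above rather than what any particular player or device can handle.

```shell
# best_codecs CONTAINER: print "VIDEO AUDIO", the right-most codecs marked Y
# for that container in the two tables above. Hypothetical helper.
best_codecs() {
    case "$1" in
        webm)    echo "AV1 Opus" ;;
        avi)     echo "MPEG-2 AAC" ;;
        mov)     echo "H.265 ALAC" ;;
        mp4|mkv) echo "AV1 ALAC" ;;
        *)       echo "unknown container: $1" >&2; return 1 ;;
    esac
}

best_codecs webm   # prints: AV1 Opus
best_codecs mov    # prints: H.265 ALAC
```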

Conclusion

Picking the best resolution, codecs, and containers for a video really depends on the data you already have and how much you value quality over file size.

There's a lot that hasn't been mentioned here too, including suitability for streaming, support for digital rights management (DRM), error resilience, and so, so much about colours.