How YouTube delivers video (DASH adaptive streaming)

For most of the resolutions you watch, there is no single MP4 file sitting on YouTube's servers. The architecture does not work that way, and understanding this is the key to understanding why downloading is more complicated than it seems.

YouTube switched to DASH (Dynamic Adaptive Streaming over HTTP) around 2013. Before DASH, the platform served traditional combined streams where audio and video were interleaved in one file. Simple, but inefficient. DASH separated audio and video into independent tracks, each available at multiple quality levels. The player fetches small segments from each track and combines them locally in your browser.

Why DASH exists

Adaptive bitrate streaming means the player can independently adjust video and audio quality in real time. If your connection drops briefly, the video switches to a lower resolution but the audio may stay crisp. If you are on a very fast connection, you can get 4K video and high-quality audio simultaneously. This reduces buffering and improves the experience across the wildly different network conditions YouTube's 2.53 billion monthly users experience.
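
To make the idea concrete, here is a minimal sketch of how a player might pick a video quality level from measured throughput. It is an illustration only, not YouTube's actual algorithm: the bitrate ladder, the `Rendition` type, the `pick_rendition` function, and the 0.8 safety margin are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # approximate bitrate needed for smooth playback


# Hypothetical ladder; real ladders differ per video and codec.
LADDER = [
    Rendition(360, 700),
    Rendition(480, 1200),
    Rendition(720, 3000),
    Rendition(1080, 8000),
    Rendition(2160, 20000),
]


def pick_rendition(throughput_kbps, ladder=LADDER, safety=0.8):
    """Return the highest rendition that fits within a safety margin
    of the measured throughput, or the lowest one if nothing fits."""
    budget = throughput_kbps * safety
    affordable = [r for r in ladder if r.bitrate_kbps <= budget]
    if not affordable:
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(affordable, key=lambda r: r.bitrate_kbps)


if __name__ == "__main__":
    for kbps in (500, 5000, 30000):
        print(kbps, "kbps ->", pick_rendition(kbps).height, "p")
```

Because audio is a separate track, a real player would run a similar decision for audio independently, which is why audio can stay at high quality while video drops.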

YouTube accounts for roughly 44% of US connected TV watch time in 2026, according to platform data. Those large-screen viewers need 4K at high bitrates. Mobile users need smooth 360p on patchy connections. DASH serves both with the same infrastructure.

The practical consequence for downloading

When you use a YouTube downloader at any resolution above 720p, the tool must fetch the video-only stream and the audio-only stream separately, then merge them. That means handling two streams instead of one. Tools that do not implement the merge step simply cannot offer 1080p. They cap at 720p because that is the highest resolution still available as a legacy combined (progressive) stream on most videos.

Why downloads sometimes come as MKV

MP4 has strict codec compatibility requirements. The MP4 container works cleanly with H.264 video paired with AAC audio. It can technically hold VP9 and Opus, but compatibility with non-browser players is inconsistent. When a downloader grabs a VP9 video stream and pairs it with an Opus audio stream (which is what Chrome and Firefox request from YouTube), the cleanest way to contain that combination is MKV. MKV supports any codec combination without complaint. If you need MP4 specifically, you need either H.264+AAC or to accept that one stream gets re-encoded.
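
The container decision described above can be expressed as a small rule. This is a sketch under stated assumptions, not how any specific downloader decides: the function name `choose_container` and the codec labels are hypothetical, and the "safe" sets simply encode the H.264+AAC pairing this article recommends for MP4.

```python
# Codec pairings this article treats as broadly compatible inside MP4.
MP4_SAFE_VIDEO = {"h264"}
MP4_SAFE_AUDIO = {"aac"}


def choose_container(video_codec, audio_codec):
    """Return 'mp4' for H.264 + AAC, otherwise fall back to 'mkv',
    which accepts practically any codec combination."""
    v = video_codec.lower()
    a = audio_codec.lower()
    if v in MP4_SAFE_VIDEO and a in MP4_SAFE_AUDIO:
        return "mp4"
    return "mkv"


if __name__ == "__main__":
    print(choose_container("h264", "aac"))  # mp4
    print(choose_container("vp9", "opus"))  # mkv
```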

Why 1080p and 4K require stream merging

The 720p cutoff is not arbitrary. YouTube provides progressive download streams (video with embedded audio) up to 720p for legacy compatibility. These are the "itag 22" streams in YouTube's format list, usable without any merge step.
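
One way to see the cutoff is to classify the entries in a video's format list. The sketch below assumes each format is a dictionary with `vcodec` and `acodec` fields where the string "none" marks a missing track, which mirrors the convention yt-dlp uses in its format listings. The sample data and the function names are made up for illustration.

```python
def classify(fmt):
    """Label a format as progressive, video-only, or audio-only."""
    has_video = fmt.get("vcodec", "none") != "none"
    has_audio = fmt.get("acodec", "none") != "none"
    if has_video and has_audio:
        return "progressive"
    if has_video:
        return "video-only"
    if has_audio:
        return "audio-only"
    return "unknown"


def best_progressive_height(formats):
    """Highest resolution that needs no merge step, or None."""
    heights = [
        f["height"]
        for f in formats
        if classify(f) == "progressive" and f.get("height")
    ]
    return max(heights) if heights else None


# Illustrative sample, not fetched from YouTube.
SAMPLE = [
    {"format_id": "18", "height": 360, "vcodec": "avc1.42001E", "acodec": "mp4a.40.2"},
    {"format_id": "22", "height": 720, "vcodec": "avc1.64001F", "acodec": "mp4a.40.2"},
    {"format_id": "137", "height": 1080, "vcodec": "avc1.640028", "acodec": "none"},
    {"format_id": "140", "height": None, "vcodec": "none", "acodec": "mp4a.40.2"},
]

if __name__ == "__main__":
    for f in SAMPLE:
        print(f["format_id"], classify(f))
    print("best progressive:", best_progressive_height(SAMPLE))
```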

Above 720p, all streams are DASH-only: video-only and audio-only, downloaded separately. A 1080p video-only stream has no audio data at all. To create a watchable 1080p MP4, a downloader must:

  1. Fetch the best 1080p video-only stream (typically H.264 or VP9)
  2. Fetch the best available audio-only stream (typically AAC or Opus)
  3. Pass both through ffmpeg with the stream copy flag (-c copy) to merge them into one container
  4. Serve the combined file

The stream copy step is fast. It does not re-encode anything. It simply places both streams into a shared container. A 90-minute 1080p video takes roughly 30-60 seconds to merge. A full re-encode at the same resolution would take 15-30 minutes on typical server hardware.
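
The merge step can be sketched in Python as a thin wrapper around ffmpeg. This is a minimal illustration rather than any particular tool's implementation: the file names and the function names `build_mux_command` and `mux` are hypothetical, and it assumes ffmpeg is installed and on your PATH.

```python
import subprocess


def build_mux_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that copies one video stream and one
    audio stream into a single container without re-encoding."""
    return [
        "ffmpeg",
        "-i", video_path,   # input 0: video-only stream
        "-i", audio_path,   # input 1: audio-only stream
        "-map", "0:v:0",    # take the first video stream of input 0
        "-map", "1:a:0",    # take the first audio stream of input 1
        "-c", "copy",       # stream copy: no re-encoding
        output_path,
    ]


def mux(video_path, audio_path, output_path):
    """Run the merge; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(
        build_mux_command(video_path, audio_path, output_path),
        check=True,
    )


if __name__ == "__main__":
    print(" ".join(build_mux_command("video.mp4", "audio.m4a", "merged.mp4")))
```

Separating command construction from execution makes the command easy to inspect or log before anything is run.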

Mux vs transcode: the critical distinction

Muxing (multiplexing) is the stream copy approach. No quality loss. Fast. The encoded video and audio data are bit-for-bit identical to what YouTube served as separate streams; only the container around them changes.

Transcoding re-compresses the video data. Quality loss is guaranteed. The file may end up larger or smaller depending on the target bitrate settings, but the video quality ceiling drops. If a tool offers "1080p" but takes 20+ minutes and produces a large file, it is likely transcoding. If it takes 1-2 minutes, it is muxing.

For details on how bitrate choices affect the output, our bitrate, quality, and file size guide covers the numbers in depth.

Codec comparison: H.264 vs VP9 vs AV1

YouTube serves three video codecs depending on the viewer's browser, device capability, and the video's resolution. Understanding which one you are getting explains why files from different tools vary so dramatically in size.

| Codec | Compatibility | Efficiency vs H.264 | Common Resolutions | Approximate File Size |
| --- | --- | --- | --- | --- |
| H.264 / AVC | Universal (every device, every player) | Baseline | 360p to 1080p | Largest |
| VP9 | Chrome, Firefox, Android, Samsung TVs | 30-50% smaller at equal quality | 1080p to 4K | Medium |
| AV1 | Modern browsers, YouTube TV app, some 2022+ TVs | 50-70% smaller at equal quality | 4K to 8K | Smallest |

H.264: the safe choice

H.264 works everywhere. DVD players, ancient smartphones, Windows Media Player, every smart TV from the last decade, GoPro cameras, car infotainment systems. If you are downloading content to play on a device where compatibility is unclear, H.264 MP4 is the right choice. The file will be larger than VP9 or AV1 for the same quality, but it will play.

H.264 is YouTube's fallback codec. Devices and browsers that announce they cannot handle VP9 receive H.264 streams. Most downloads end up as H.264 even from tools that do not explicitly say so, simply because H.264 is what gets served to their server-side fetching infrastructure.

VP9: the efficiency sweet spot

VP9 is what Chrome and Firefox request from YouTube for most 1080p and 4K content. It is 30-50% more efficient than H.264, meaning you get the same visual quality in a smaller file. The tradeoff is that some older devices and players (especially hardware-based media players and older TVs) cannot decode VP9 in real time.

If your playback environment is a modern computer, phone, or streaming device (Chromecast, Apple TV 2021+, Roku), VP9 is fine. If you are going to play the file on older or embedded hardware, stick with H.264.

AV1: the future that is slowly arriving

AV1 is the most efficient codec YouTube supports, used primarily for 4K and 8K content on modern hardware. It is an open, royalty-free format. YouTube streams AV1 to Chrome and Firefox on capable hardware when watching 4K. The file size savings over VP9 are real (another 20-30% reduction) but require a capable decoder. Hardware AV1 decoding began appearing in GPUs and CPUs around 2020 and is standard in most hardware released since late 2022, so it is now reasonably safe for desktop playback.

For archiving 4K content you want to keep long-term, AV1 is worth the smaller file size if you know your playback devices can handle it. For anything that might end up on older hardware, download VP9 or H.264.
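
If you use yt-dlp, the codec choice can be expressed through its format sorting option (`-S`). The sketch below only builds the command line; it does not run it. The helper name `build_ytdlp_command` is hypothetical, and the sort keys (`res`, `vcodec`, `acodec`) are used as I understand them from the yt-dlp documentation, so check them against your installed version. Note that `-S` expresses a preference, not a hard requirement: if the preferred codec is unavailable, yt-dlp falls back to something else.

```python
# Map the codec names used in this article to yt-dlp sort values.
CODEC_SORT_VALUE = {"h264": "h264", "vp9": "vp9", "av1": "av01"}


def build_ytdlp_command(url, codec="h264", max_height=1080):
    """Build a yt-dlp command preferring a codec and capping resolution."""
    if codec not in CODEC_SORT_VALUE:
        raise ValueError(f"unsupported codec: {codec}")
    sort = f"res:{max_height},vcodec:{CODEC_SORT_VALUE[codec]}"
    cmd = ["yt-dlp"]
    if codec == "h264":
        # Prefer AAC audio as well, so the merged result fits MP4 cleanly.
        sort += ",acodec:aac"
        cmd += ["--merge-output-format", "mp4"]
    cmd += ["-S", sort, url]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_ytdlp_command("VIDEO_URL")))
    print(" ".join(build_ytdlp_command("VIDEO_URL", codec="av1", max_height=2160)))
```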

The complete guide to video formats and codecs has waveform analysis and side-by-side quality comparisons for all three codecs at various bitrates.

Resolution guide: 360p through 8K

Resolution determines how many pixels the video contains, but it is not the only factor in perceived quality. Bitrate matters too. A high-bitrate 720p video can look better than a low-bitrate 1080p video because the encoder had enough bits to represent detail accurately.
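
A rough way to compare the two cases is bits per pixel: the bitrate divided by the number of pixels drawn per second. The sketch below is a back-of-the-envelope illustration with made-up bitrates and an assumed 30 fps, and the comparison is only meaningful between encodes that use the same codec.

```python
def bits_per_pixel(bitrate_mbps, width, height, fps=30.0):
    """Average number of encoded bits available per displayed pixel."""
    return bitrate_mbps * 1_000_000 / (width * height * fps)


if __name__ == "__main__":
    hd = bits_per_pixel(5.0, 1280, 720)        # high-bitrate 720p
    full_hd = bits_per_pixel(2.5, 1920, 1080)  # low-bitrate 1080p
    print(f"720p at 5 Mbps:    {hd:.3f} bits/pixel")
    print(f"1080p at 2.5 Mbps: {full_hd:.3f} bits/pixel")
```

In this example the 720p encode has more than four times as many bits per pixel, which is why it can preserve detail that the starved 1080p encode smears away.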

Practical use cases for each resolution

240p and 360p: These are archival and minimal-bandwidth formats. A 360p video looks acceptable on a phone screen held at arm's length. On a 55-inch TV it looks like someone smeared petroleum jelly on the lens. Use these only for extremely long content where storage is the hard constraint, or for reference copies.

480p: The old DVD standard. Still watchable on medium screens. Reasonable for podcasts, talking-head interviews, or any content where visual detail is secondary to audio. Not recommended if 720p is available.

720p (HD): The minimum for comfortable laptop viewing. Handles most content well. This is also the highest combined-stream resolution YouTube still provides, which means 720p downloads do not require stream merging. Fast and simple.

1080p (Full HD): The sweet spot for most use cases. Looks good on 4K monitors (upscaling is fine at this scale), native on 1080p monitors, fine on phones. Most content on YouTube was uploaded at 1080p or below. Requires stream merging as described above.

1440p (2K / QHD): Rarely necessary for non-gaming content. Useful for presentations, screencasts, and desktop recordings at high DPI. File sizes jump considerably.

2160p (4K / UHD): Worthwhile only if: (a) the original video was shot at 4K, (b) your display is 4K, and (c) you sit close enough to the screen for the extra detail to be visible. If a video was uploaded at 1080p, YouTube does not generate a real 4K version of it; a "4K" file of such a video is an upscale, not real 4K detail.

4320p (8K): Exists on YouTube. Practically: almost no content was originally produced at 8K resolution, and almost no consumer displays show a meaningful difference between 4K and 8K at normal viewing distances. 8K downloads are enormous. They exist mostly as benchmarks.

The YouTube video quality and resolution guide has a full breakdown of how YouTube's upload processing affects each resolution.

Source upload quality limits everything

This bears repeating clearly: if a video was uploaded at 720p, a file labeled "1080p" is an upscale. The actual pixel detail is still 720p. YouTube processes uploads and generates versions at each resolution below the source, not above it. Any higher resolution therefore comes from a software upscale, done either by the uploader before uploading or by the download tool afterwards, and it adds no real detail.

How YTCut handles MP4 downloads

YTCut fetches the requested video from YouTube's DASH streams server-side and processes the merge with ffmpeg. Here is exactly what happens under the hood.

Format selection

YTCut defaults to H.264 video paired with AAC audio. This pairing was chosen deliberately for maximum compatibility. Both codecs are natively supported by the MP4 container specification. The combined output is a standard MP4 file that plays in every media player without requiring codec packs or compatibility modes.

For resolutions at or below 720p, YTCut can use the existing combined stream when available, which is faster because no merge step is needed.

For 1080p and above, YTCut fetches the best available H.264 video-only stream and the best AAC audio-only stream, then runs ffmpeg with stream copy flags to merge them. No re-encoding occurs. The output quality is identical to what YouTube's servers delivered.

Quality vs file size: CRF 21 and what it means

When YTCut needs to re-encode (for format conversion or trim operations), it uses H.264 with CRF (Constant Rate Factor) 21. CRF is a quality-based encoding mode rather than a fixed bitrate: lower values mean higher quality and larger files. CRF 0 is lossless, CRF 51 is maximum compression with severe degradation. CRF 18-23 is the commonly used range, with values near 18 widely treated as "visually lossless" and 23 being the x264 default.

CRF 21 is a reasonable middle ground: excellent perceived quality, moderate file size. A CRF 21 encode of a 1080p video will look indistinguishable from the source on any normal display. Side-by-side pixel-level comparison would show minor differences in very fast motion, but nothing that matters in practice.
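
For reference, a CRF encode of this kind can be expressed as an ffmpeg command along the following lines. This is a sketch, not YTCut's actual command line: the function name, the `medium` preset, and the choice to copy the audio stream untouched (which assumes the input audio is already AAC) are assumptions made for the example.

```python
def build_crf_encode_command(input_path, output_path, crf=21, preset="medium"):
    """Build an ffmpeg command that re-encodes video with libx264 at a
    given CRF and copies the audio stream unchanged."""
    if not 0 <= crf <= 51:
        raise ValueError("CRF for 8-bit x264 must be between 0 and 51")
    return [
        "ffmpeg",
        "-i", input_path,
        "-c:v", "libx264",   # H.264 software encoder
        "-crf", str(crf),    # quality target; lower = better and larger
        "-preset", preset,   # speed vs compression-efficiency tradeoff
        "-c:a", "copy",      # leave audio as-is (assumed AAC already)
        output_path,
    ]


if __name__ == "__main__":
    print(" ".join(build_crf_encode_command("input.mp4", "output.mp4")))
```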

Why processing happens on YTCut's servers

A third-party web page cannot simply point your browser at YouTube's CDN streams. Those stream URLs carry authentication tokens generated by YouTube's backend that expire and are tied to session identifiers. YTCut's server handles the authenticated fetch, the stream selection, the merge, and serves the resulting file over a clean direct-download URL. Nothing is installed on your device. The entire pipeline is remote.
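
Because these stream URLs are time-limited, a server-side fetcher has to check that a URL is still valid before using it. The sketch below assumes the expiry is carried as a Unix timestamp in a query parameter named `expire`. That parameter name, the example URL, and the function name are assumptions for illustration, not a documented YouTube interface.

```python
import time
from urllib.parse import parse_qs, urlparse


def seconds_until_expiry(url, now=None, param="expire"):
    """Return seconds remaining before the URL expires, a negative
    number if it already expired, or None if no expiry is present."""
    values = parse_qs(urlparse(url).query).get(param)
    if not values:
        return None
    current = time.time() if now is None else now
    return int(values[0]) - current


if __name__ == "__main__":
    # Hypothetical URL, used only to exercise the parser.
    url = "https://cdn.example.com/videoplayback?expire=1700000600&id=abc"
    print(seconds_until_expiry(url, now=1700000000))
```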

If you want to trim the video before downloading, YTCut's cutting tool handles that before the final MP4 is assembled.

Tool comparison

Here is an honest side-by-side of the main YouTube-to-MP4 options available in 2026. No affiliate links, no sponsored rankings.

| Tool | Ease | 4K Support | Codec Choice | Batch | Platform | Free? |
| --- | --- | --- | --- | --- | --- | --- |
| YTCut | Excellent (browser, no install) | Up to 1080p | H.264 default | No | Web | Yes, fully |
| yt-dlp | Requires terminal familiarity | Full 8K support | Full control (H.264, VP9, AV1) | Yes (playlists, channels) | Win / Mac / Linux | Yes, open source |
| 4K Video Downloader | Good (GUI) | Yes (4K, 8K) | VP9, H.264, AV1 | Yes (playlists) | Win / Mac | Free tier (limited), paid full |
| cobalt.tools | Very easy (web) | Up to 4K | H.264, AV1 options | No | Web | Yes |
| Browser extensions | Very easy (one click) | Usually 720p max | Limited | No | Chrome / Firefox | Usually free with caveats |

When to use each tool

Use YTCut for quick single-video downloads at 1080p or below where you want MP4 out with no setup. Use yt-dlp when you need playlists, specific codec control, 4K, or you are building scripts. Use 4K Video Downloader if you want a GUI app and are willing to manage an installation. cobalt.tools is a good browser-based alternative for 4K. Browser extensions are convenient but often max at 720p because they rely on the legacy combined streams.

For a step-by-step guide on using yt-dlp specifically for video downloads, see our complete guide to video formats and codecs, which includes yt-dlp format selector syntax.

File size reference table

File size estimates below assume a typical streaming bitrate for each resolution and codec. Actual file sizes vary because YouTube's encoding quality depends on the source content: a screencast with static areas compresses much better than a fast-moving sports broadcast.

| Resolution | Codec | Typical Bitrate | File Size per Hour | Notes |
| --- | --- | --- | --- | --- |
| 360p | H.264 | 400-700 kbps | ~250 MB | Good for speech, rough for video |
| 480p | H.264 | 700-1,200 kbps | ~400 MB | Acceptable, small storage |
| 720p | H.264 | 2-3 Mbps | ~1 GB | HD, combined stream available |
| 1080p | H.264 | 4-8 Mbps | ~2.5 GB | Requires stream merge |
| 1080p | VP9 | 2-4 Mbps | ~1.5 GB | 30-50% smaller than H.264 |
| 1440p | VP9 | 6-12 Mbps | ~4 GB | Niche use, large files |
| 4K (2160p) | H.264 | 20-45 Mbps | ~12 GB | Very large, wide compatibility |
| 4K (2160p) | VP9 / AV1 | 10-20 Mbps | ~6 GB | Better efficiency, modern devices |
| 8K (4320p) | AV1 | 50-100 Mbps | ~22-45 GB | Practically, very limited content |

These numbers explain why a "1080p download" can be 800 MB from one tool and 3 GB from another. A tool using VP9 at the YouTube-native bitrate produces a smaller file. A tool using H.264 at a higher target bitrate produces a larger one. Neither is wrong in absolute terms, but knowing what you are getting helps you make the right choice for your storage situation.
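
The relationship between bitrate and file size is simple arithmetic: bitrate times duration, divided by 8 to convert bits to bytes. The sketch below uses decimal units (1 GB = 1,000 MB) and ignores audio and container overhead, so treat the results as estimates. The function name is just for this example.

```python
def file_size_gb(bitrate_mbps, duration_seconds):
    """Estimate video file size in GB from average bitrate and duration."""
    megabits = bitrate_mbps * duration_seconds
    megabytes = megabits / 8
    return megabytes / 1000


if __name__ == "__main__":
    one_hour = 3600
    for label, mbps in [("1080p VP9 low", 2), ("1080p H.264 low", 4), ("1080p H.264 high", 8)]:
        print(f"{label}: {file_size_gb(mbps, one_hour):.1f} GB per hour")
```

At 2 Mbps an hour of video comes to about 0.9 GB, while at 8 Mbps the same hour is about 3.6 GB, which matches the spread between tools described above.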

For an even deeper breakdown of bitrate, quality targets, and codec efficiency, the bitrate, quality, and file size guide has per-content-type recommendations.

The legal situation for YouTube MP4 downloads mirrors the audio situation. Two separate frameworks apply.

YouTube's Terms of Service prohibit downloading in Section 5B unless you have explicit permission or use YouTube Premium. This is a contract between you and YouTube. Violating it risks account suspension. It is not a criminal matter.

Copyright law applies separately. The video content belongs to whoever created it. Personal use may qualify as fair use in some jurisdictions (US in particular), but this is fact-specific and not a guaranteed protection. Distributing, uploading elsewhere, or monetizing downloaded content is clearly infringement in virtually every country.

Creative Commons licensed content on YouTube (music, some educational videos) can be legally downloaded and reused depending on the specific license. Always check the license in the video description.

Our detailed legal analysis of YouTube downloading covers the jurisdiction-specific nuances including EU private copy exceptions and the DMCA safe harbor framework.

Frequently asked questions

Why does YouTube use separate audio and video streams?

YouTube switched to DASH (Dynamic Adaptive Streaming over HTTP) around 2013. DASH stores audio and video in separate tracks so the player can independently adjust quality for each based on your connection speed. On a slow connection, you might get 480p video but still get high-quality audio. The separate tracks also let YouTube serve different video codecs (H.264, VP9, AV1) to different browsers while reusing the same audio tracks. This is efficient for streaming but means no single combined MP4 file exists at resolutions above 720p on YouTube's servers.

Why do 1080p downloads need merging?

Because YouTube stopped offering combined audio-video streams at 1080p and above when it switched to DASH. The 1080p video stream has no audio track. The audio stream has no video. A downloader must fetch both separately and merge them with a tool like ffmpeg. Tools that skip this step can only offer 720p maximum, which is the highest resolution still available as a legacy combined stream on most YouTube videos.

What is the difference between H.264, VP9, and AV1 on YouTube?

H.264 (AVC) is the oldest and most widely supported video codec. Every device plays it. YouTube serves H.264 to browsers and devices that cannot handle newer codecs. VP9 is Google's codec, roughly 30-50% more efficient than H.264 at equivalent quality. Chrome and Firefox receive VP9 streams by default. AV1 is the newest, most efficient codec, used for 4K and 8K content. It delivers the best quality-to-size ratio but requires more processing power to decode. Older devices and browsers fall back to VP9 or H.264 automatically.

Which resolutions are available for YouTube download in 2026?

Available resolutions depend entirely on what the uploader submitted. YouTube accepts up to 8K (7680x4320) and serves 4320p, 2160p (4K), 1440p, 1080p, 720p, 480p, 360p, and 240p. Not all resolutions are available for all videos. A video uploaded at 1080p will not have a 4K stream. A video uploaded at 4K will have streams at all resolutions below it as well. Check the available formats with yt-dlp -F VIDEO_URL to see exactly what streams exist for a specific video.

Why is my YouTube MP4 download larger than expected?

Several reasons. If the tool used H.264 instead of VP9, the file is roughly 30-50% larger for equivalent quality. If the tool re-encoded rather than merging streams, the bitrate may have been set higher than the source requires. Very long videos at high quality settings produce very large files. A 1080p H.264 video at typical YouTube bitrates runs about 4-8 Mbps, which means a 2-hour movie would be roughly 3.6-7.2 GB (bitrate times 7,200 seconds, divided by 8 bits per byte). This is not a bug. It is just math.

Do downloaders re-encode or just merge?

Good ones merge (remux) without re-encoding. A tool that fetches the video-only and audio-only streams from YouTube and combines them using ffmpeg stream copy is doing a lossless merge. The video and audio data are unchanged. A tool that re-encodes re-compresses the video, which loses quality and takes much longer. You can often tell the difference by timing: a remux of a 1-hour 1080p video takes about 30 seconds. A full re-encode at the same resolution takes 10-30 minutes on typical server hardware.

How can I download without losing quality?

Use a tool that does stream copy (remux) rather than re-encoding. For yt-dlp, the default behavior when formats are compatible is to remux without re-encoding. For web tools, look for ones that explicitly state they use ffmpeg for merging. The output will be the exact same quality as what YouTube streams to your browser, which is itself already a compressed version of whatever the uploader originally submitted.

Why does one downloader give MKV instead of MP4?

When the best video stream uses one codec (say VP9) and the best audio stream uses another (say Opus), the pair can technically be placed in an MP4 container, but many non-browser players will not handle the result reliably. MKV supports virtually any codec combination, so the tool defaults to MKV for the combined file. If you specifically need MP4, you have two options: accept a re-encode of one stream (quality loss), or select H.264 video paired with AAC audio (both are MP4-compatible). YTCut handles this by defaulting to H.264 for maximum compatibility.

If you only need the audio and not the full video, you can extract just the audio as MP3 or M4A without dealing with stream merging at all.