Why People Extract Audio from YouTube (The Actual Reasons)
Let's start here because the reason you're extracting audio matters a lot for which method and format you should pick. Not all use cases are the same, and treating them like they are is how you end up with a 320kbps MP3 of a 45-minute lecture you're only going to listen to once while doing dishes.
Language learning
This is probably the biggest category. YouTube has an enormous amount of native-speaker content in every language imaginable, from French cooking shows to Japanese variety programs to Spanish news broadcasts. Language learners extract short audio segments to practice listening, shadow speakers, and build vocabulary. They need clips. Not the full hour-long video. Just the 90-second segment where the host explains something interesting using vocabulary at their level.
For this use case: MP3 at 192kbps is plenty. You want small files you can load onto a phone. Audio quality matters because speech clarity is important, but you don't need studio-grade WAV files for this.
Podcast clips and reference material
Someone says something genuinely interesting in a three-hour podcast. You want that clip. The whole podcast is on YouTube (most are these days) and you need the two-minute segment where the guest drops a fact you want to share or listen to again. Downloading the entire three-hour video as an MP4 to get 120 seconds of audio is absurd. Extract just what you need.
Music snippets for personal use
Live performances. Rare B-sides that never got an official release. Session recordings that exist only on YouTube. Bootleg concert footage from 2008 that will never be on any streaming platform. People pull these because there's genuinely no other option to have them offline.
For music: quality matters more. Use 320kbps MP3 or M4A if the source is good enough to justify it. Don't bother with anything above 192kbps if the original upload was a potato-quality 240p video from 2009.
Lecture notes and educational content
University lectures uploaded to YouTube. Coding tutorials. Cooking demonstrations. History documentaries. People extract these to listen while commuting, exercising, or doing tasks where they can't watch a screen. Audio-only works perfectly for content that's primarily voice-driven.
Ringtones and notification sounds
Guilty. We all know someone who has done this. A specific sound effect, a memorable line, a piece of music. People grab very short clips (5-30 seconds) for personal use. For this: MP3 works, and you want a precise start and end point so the clip doesn't have a bunch of silence before the sound hits.
Content creation and remixing
Podcasters sampling interview clips. Video editors using B-roll audio. YouTube creators using audio from their own older videos. DJ practice. This is where format choice gets more serious: WAV or FLAC if you're doing further audio processing, because you want the full quality before you start doing anything else to it.
With the "why" established, let's go through every method.
Method 1: YTCut (Precise Segments, Any Format)
This is the method you want when you need a specific segment of a video's audio, not the whole thing. Most tools extract the entire video's audio track. YTCut lets you set an exact start time and end time, then download only that portion as audio. This is genuinely useful.
Step-by-step
- Go to ytcut.org and paste the YouTube URL into the input field.
- The video preview loads. Use the timeline to set your start point. You can type exact timestamps in the fields or drag the handles on the waveform view. Millisecond precision is available.
- Set the end point the same way. If you're grabbing a specific quote that ends at exactly 4:32.8, you can set it there rather than guessing with a rough cut.
- Click the format dropdown. You'll see audio options: MP3, M4A, WAV, OGG. Pick the one you need.
- For MP3: you can select bitrate (128kbps, 192kbps, 320kbps). For most uses, 192kbps is the right call.
- Click Download. Your exact audio segment downloads directly.
Why YTCut is the right tool for segments
Most other tools (including yt-dlp unless you use its --download-sections option) will download the entire audio track of a video and expect you to trim it afterward in a separate audio editor. YTCut cuts the segment server-side before downloading. This means you download a 45-second MP3 instead of a 3-hour WAV file that you then have to open in Audacity to trim. For the average person who just wants a clip, this is significantly faster and easier.
The tradeoff: you need an internet connection and you're relying on the YTCut server to do the processing. For offline workflows, or if you're doing batch extraction of dozens of clips, yt-dlp is a better choice.
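If you'd rather stay offline, yt-dlp can also fetch just a time range: its --download-sections option accepts a value like "*00:12:30-00:15:45" and has FFmpeg cut the stream. Here is a minimal batch sketch, assuming yt-dlp and FFmpeg are on your PATH. The function names clip_audio and clip_all, the clips-file layout (one "URL START END NAME" per line), and the DRY_RUN switch (which prints each command instead of running it) are hypothetical conventions for this example, not part of yt-dlp.

```shell
#!/usr/bin/env bash
# Sketch: cut several audio clips with yt-dlp instead of a web tool.
# clip_audio/clip_all and the "URL START END NAME" file layout are hypothetical.
# Set DRY_RUN=1 to print each command instead of executing it.

clip_audio() {
  local url="$1" start="$2" end="$3" name="$4"
  # The leading "*" tells --download-sections the value is a time range.
  ${DRY_RUN:+echo} yt-dlp -x --audio-format mp3 --audio-quality 192K \
    --download-sections "*${start}-${end}" \
    -o "${name}.%(ext)s" "$url"
}

clip_all() {
  local url start end name
  # Read the list from fd 3 so FFmpeg (which reads stdin) can't swallow it.
  while read -r url start end name <&3; do
    case "$url" in ''|'#'*) continue ;; esac   # skip blank and comment lines
    clip_audio "$url" "$start" "$end" "$name"
  done 3< "$1"
}
```

For example, a clips.txt line such as https://www.youtube.com/watch?v=VIDEOID 00:12:30 00:15:45 quote makes clip_all clips.txt write quote.mp3 containing just those three minutes and fifteen seconds.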
Method 2: yt-dlp Command Line (Best Quality Control)
yt-dlp is a fork of the now largely inactive youtube-dl project. It's actively maintained, significantly faster, and has more features. If you're comfortable with a terminal, this is the most powerful option for audio extraction.
Install yt-dlp
On Mac with Homebrew: brew install yt-dlp
On Windows: Download the .exe from the GitHub releases page and add it to your PATH, or install via pip: pip install yt-dlp
On Linux: sudo apt install yt-dlp or pip install yt-dlp
You also need FFmpeg installed, which yt-dlp uses for audio conversion. On Mac: brew install ffmpeg. On Windows: download from ffmpeg.org and add to PATH. On Linux: sudo apt install ffmpeg (or your distribution's equivalent).
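Before starting a long job, it's worth confirming that every tool can actually be found on your PATH. A small sketch; the function name require_tools is made up for this example, and it only checks that the commands exist, not which versions they are.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every required command that is not on PATH.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use: require_tools yt-dlp ffmpeg ffprobe && yt-dlp --version. ffprobe ships with FFmpeg, and yt-dlp uses it for some post-processing steps.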
The basic extraction commands
Download as MP3 at 192kbps:
yt-dlp -x --audio-format mp3 --audio-quality 192K "https://www.youtube.com/watch?v=VIDEOID"
What each flag does:
- -x: Extract audio. yt-dlp downloads the best available audio stream (falling back to a combined audio/video stream if there is no audio-only one) and keeps only the audio. Without this flag it downloads video.
- --audio-format mp3: Convert the extracted audio to MP3. Without this, yt-dlp gives you whatever format YouTube provides natively (usually M4A or WebM/Opus).
- --audio-quality 192K: Sets the bitrate for the conversion. Valid values: 0 (best) through 10 (worst) for VBR mode, or a specific bitrate like 128K, 192K, 320K.
Download as WAV (uncompressed):
yt-dlp -x --audio-format wav "https://www.youtube.com/watch?v=VIDEOID"
Download as M4A (no re-encode if the downloaded stream is already AAC; otherwise yt-dlp converts it):
yt-dlp -x --audio-format m4a "https://www.youtube.com/watch?v=VIDEOID"
Download best quality audio, no re-encoding:
yt-dlp -f bestaudio "https://www.youtube.com/watch?v=VIDEOID"
This last command is the one audiophiles care about. It downloads the native audio stream from YouTube without any re-encoding. YouTube's best audio is typically 128kbps AAC (M4A) or 160kbps Opus (WebM), depending on the video. No conversion means no generation loss.
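To see which streams a video actually offers, yt-dlp -F "URL" lists every format with its codec and bitrate. If you want the native stream but would rather get AAC than Opus (for an Apple device, say), a format filter can prefer M4A and fall back to the best stream of any type. A sketch; native_audio and the DRY_RUN switch are hypothetical names for this example.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: download YouTube's own audio stream, no re-encode.
# "bestaudio[ext=m4a]/bestaudio" = best M4A (AAC) stream if there is one,
# otherwise the best audio-only stream of any type (often Opus in WebM).
# Set DRY_RUN=1 to print the command instead of running it.
native_audio() {
  local url="$1" prefer="${2:-m4a}"
  ${DRY_RUN:+echo} yt-dlp -f "bestaudio[ext=${prefer}]/bestaudio" \
    -o "%(title)s.%(ext)s" "$url"
}
```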
Download audio with custom filename
yt-dlp -x --audio-format mp3 --audio-quality 192K -o "%(title)s.%(ext)s" "URL"
The -o flag sets the output filename template. %(title)s uses the video title, %(ext)s uses the correct extension.
Batch download audio from a playlist
yt-dlp -x --audio-format mp3 --audio-quality 192K "https://www.youtube.com/playlist?list=PLAYLISTID"
This downloads every video in the playlist as MP3. yt-dlp handles the whole thing automatically. If a download fails partway through, just run it again and it skips already-downloaded files.
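That skipping depends on the finished files still being there under the same names. yt-dlp's --download-archive option records each finished video ID in a text file instead, so a re-run skips those videos even after you rename or move the MP3s. A sketch combining it with a playlist-index prefix that keeps files in playlist order; playlist_mp3, the archive filename downloaded.txt, and DRY_RUN are assumptions for this example.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: playlist to MP3, resumable via a download archive.
# --download-archive appends each finished video ID to the archive file;
# later runs skip any ID already listed there.
# Set DRY_RUN=1 to print the command instead of running it.
playlist_mp3() {
  local url="$1" archive="${2:-downloaded.txt}"
  ${DRY_RUN:+echo} yt-dlp -x --audio-format mp3 --audio-quality 192K \
    --download-archive "$archive" \
    -o "%(playlist_index)s - %(title)s.%(ext)s" "$url"
}
```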
Method 3: VLC Media Player
VLC can open YouTube URLs and convert the stream to audio. This is useful if you already have VLC installed and don't want to install anything new. The quality is decent, but the process is a bit clunky compared to dedicated tools.
Steps
- Open VLC. Go to Media in the top menu.
- Click "Convert/Save" (Ctrl+R on Windows).
- Click the "Network" tab and paste the YouTube URL.
- Click "Convert/Save" (not Open).
- In the Convert dialog, click the dropdown next to Profile and choose "Audio - MP3".
- To customize quality: click the wrench icon next to the profile. Go to the Audio codec tab. Change the bitrate to 192kbps or higher.
- Set a destination file path under "Destination file".
- Click Start. VLC processes the stream and saves the audio file.
The catch with VLC: it's slow for long videos because it processes in real-time by default. A 2-hour video takes roughly 2 hours to convert in VLC. yt-dlp is much faster because it downloads the full stream and then converts.
VLC is good for: occasional conversions, users who hate the command line, situations where you already have VLC open and just need one file quickly.
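VLC can also run the same conversion from the command line, which skips the dialogs. This is a sketch based on VLC's stream-output (--sout) transcode syntax; on Windows the executable is vlc.exe, and YouTube URLs only work while VLC's bundled YouTube script keeps up with site changes, so an already-downloaded file is the more dependable input. The function name vlc_to_mp3 and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around VLC's headless transcoder.
# -I dummy runs without a window, --no-sout-video drops the picture,
# and vlc://quit makes VLC exit after the input has been processed.
# Set DRY_RUN=1 to print the command instead of running it.
vlc_to_mp3() {
  local src="$1" dst="$2" kbps="${3:-192}"
  ${DRY_RUN:+echo} vlc -I dummy --no-sout-video "$src" \
    --sout "#transcode{acodec=mp3,ab=${kbps},channels=2,samplerate=44100}:std{access=file,mux=dummy,dst=${dst}}" \
    vlc://quit
}
```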
Method 4: Audacity
Audacity is a free audio editor that's been around since 2000. It doesn't download from YouTube directly, but once you have an audio file, it's an excellent tool for editing and re-exporting in different formats.
The workflow
- First, get the audio file using any of the other methods (YTCut, yt-dlp, etc.).
- Open Audacity. Drag the audio file into the Audacity window, or use File > Import > Audio.
- The audio loads as a waveform. You can now trim, adjust levels, reduce noise, cut sections, or do anything else.
- To export: File > Export > Export as MP3 (or WAV, FLAC, etc.). In Audacity 3.4 and later, use File > Export Audio and pick the format there.
- Set quality (bitrate for MP3, sample rate, channels) in the export dialog.
Why use Audacity after extracting audio? Because you might want to do things like: normalize the volume so the audio is a consistent level, reduce background noise from a recording, trim silence from the beginning and end, combine multiple clips into one file, or add basic effects. Audacity handles all of this well.
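If you'd rather script two of those chores, FFmpeg's audio filters can handle them without opening an editor. This swaps FFmpeg in for Audacity: silenceremove trims leading silence (running it between two areverse filters trims the tail as well), and loudnorm normalizes loudness toward a target. The -50dB threshold and the -16 LUFS target are assumptions to tune, not Audacity settings, and clean_clip and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: trim silence at both ends, normalize loudness, encode MP3.
# Threshold (-50dB) and loudness target (-16 LUFS) are assumed values.
# areverse buffers the whole clip in memory, so keep this for short clips.
# Set DRY_RUN=1 to print the command instead of running it.
clean_clip() {
  local in="$1" out="$2"
  local trim="silenceremove=start_periods=1:start_threshold=-50dB"
  # Reverse, trim the (former) tail, reverse back, then normalize.
  local chain="${trim},areverse,${trim},areverse,loudnorm=I=-16:TP=-1.5:LRA=11"
  # loudnorm works at 192kHz internally, so -ar 44100 sets the output rate.
  ${DRY_RUN:+echo} ffmpeg -i "$in" -af "$chain" -ar 44100 -acodec libmp3lame -q:a 2 "$out"
}
```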
Audacity export quality settings
When exporting as MP3 from Audacity, you'll see quality options. Use "Preset: Standard (170-210 kbps)" for most uses. Use "Preset: Extreme (220-260 kbps)" for music where quality matters. The "Insane (320 kbps)" preset doesn't sound much better than Extreme for most material and creates significantly larger files.
For WAV export: Audacity exports at the sample rate and bit depth you're working in. If you imported a 44.1kHz file, it exports at 44.1kHz. This is the format to use before sending audio to a professional for mastering or mixing.
Method 5: Online Converters
Y2Mate, OnlineVideoConverter, YTMP3, SaveFrom.net. These sites exist. They work. Sort of.
The honest assessment
Fast. No installation. Paste URL, get MP3. That's genuinely appealing.
The problems, which are real:
- Quality ceiling: Most cap output at 128kbps MP3. That's noticeably worse than 192kbps for music, and noticeably worse than 320kbps for anything where audio fidelity matters. For a spoken-word podcast clip it's fine. For music it's not.
- Privacy: You're submitting a YouTube URL (and sometimes your IP and browser info) to a third-party server that you know nothing about. Most of these sites monetize through aggressive advertising. Some have had malware issues historically.
- Reliability: These sites go down frequently. YouTube regularly blocks their API access. The service that worked last week might not work this week. You get no warning.
- Ad experience: Many of these sites are genuinely unpleasant to use. Multiple popups, fake download buttons, sketchy redirects. If you're going to use one, uBlock Origin in your browser is not optional.
- No segment control: You get the full video's audio. You cannot specify "I want minutes 12:30 through 15:45 only."
For a one-time extraction of a speech or lecture where 128kbps is fine and you just need it done in 30 seconds: these tools are acceptable. For anything where quality or privacy matters, use a better method.
Method 6: Descript
Descript is a paid tool (with a limited free tier) primarily designed for podcast and video editing. It's in this list because it does something the other tools don't: it generates a text transcript of your audio as part of the workflow.
When Descript makes sense
You have a YouTube interview or podcast episode. You want to extract a specific quote as audio, but you're not sure exactly where the quote starts and ends. Descript's workflow is: import the video, it transcribes it automatically, you find the words in the transcript and click them to find the timestamp, then cut the audio around that word.
For podcast producers who regularly extract clips to promote episodes, this is a legitimate workflow. You find the quotable moment by reading the transcript rather than scrubbing through audio. Then you export just that clip.
The free tier of Descript allows a limited number of hours per month. For occasional use it works. For regular use, the subscription cost is significant (pricing varies, check their site for current rates).
For most people who just want to extract audio, Descript is overkill. But if you're building a content workflow where clips are a regular output, the transcript-based editing is genuinely much faster than timeline scrubbing.
Method 7: FFmpeg Direct
FFmpeg is the underlying engine that most of these tools use. You can use it directly for maximum control, including extracting audio from an already-downloaded video file.
Extract audio from a video file to MP3:
ffmpeg -i input_video.mp4 -vn -acodec libmp3lame -q:a 2 output_audio.mp3
Flags explained:
- -i input_video.mp4: Input file.
- -vn: No video (skip the video stream, audio only).
- -acodec libmp3lame: Use the LAME MP3 encoder.
- -q:a 2: VBR quality level 2, which averages roughly 170-210kbps (about 190kbps). The scale runs from 0 (best) to 9 (worst). Quality 0 averages roughly 220-260kbps; for a constant 320kbps, use -b:a 320k instead of -q:a.
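Audacity's MP3 presets correspond to LAME settings you can pass to FFmpeg directly: Standard matches LAME's V2 (-q:a 2), Extreme matches V0 (-q:a 0), and Insane is 320kbps constant bitrate (-b:a 320k). A small sketch mapping those preset names to flags; mp3_quality_args, to_mp3, and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: LAME preset names -> FFmpeg libmp3lame options.
# standard ~ LAME -V 2, extreme ~ -V 0, insane = 320kbps CBR.
mp3_quality_args() {
  case "$1" in
    standard) printf '%s\n' "-q:a 2" ;;
    extreme)  printf '%s\n' "-q:a 0" ;;
    insane)   printf '%s\n' "-b:a 320k" ;;
    *) echo "unknown preset: $1" >&2; return 1 ;;
  esac
}

# Set DRY_RUN=1 to print the command instead of running it.
to_mp3() {
  local in="$1" out="$2" preset="${3:-standard}" q
  q="$(mp3_quality_args "$preset")" || return 1
  # $q is unquoted on purpose so "-q:a 2" becomes two separate arguments.
  ${DRY_RUN:+echo} ffmpeg -i "$in" -vn -acodec libmp3lame $q "$out"
}
```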
Extract audio as WAV (uncompressed, so no further quality loss is added):
ffmpeg -i input_video.mp4 -vn -acodec pcm_s16le output_audio.wav
Extract only a segment (30 seconds starting at 5 minutes):
ffmpeg -i input_video.mp4 -ss 00:05:00 -to 00:05:30 -vn -acodec libmp3lame -q:a 2 output_clip.mp3
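With -ss placed after -i, as in that command, FFmpeg decodes everything from the start up to the cut point, which is accurate but slow on long recordings. Placing -ss before -i seeks the input directly (FFmpeg still cuts accurately when it re-encodes), and -t then sets the clip length. The sketch below accepts timestamps such as 4:32.8 and works out the duration; to_seconds, clip_segment, and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for cutting an audio clip out of a local video file.

# "H:MM:SS", "MM:SS" or plain seconds (fractions allowed) -> seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    n = split(t, part, ":"); s = 0
    for (i = 1; i <= n; i++) s = s * 60 + part[i]
    printf "%.3f\n", s
  }'
}

# Set DRY_RUN=1 to print the command instead of running it.
clip_segment() {
  local in="$1" out="$4" s e dur
  s="$(to_seconds "$2")"
  e="$(to_seconds "$3")"
  if ! awk -v s="$s" -v e="$e" 'BEGIN { exit !(e + 0 > s + 0) }'; then
    echo "end ($3) must come after start ($2)" >&2
    return 1
  fi
  dur="$(awk -v s="$s" -v e="$e" 'BEGIN { printf "%.3f\n", e - s }')"
  # -ss before -i: fast input seek; -t: clip length in seconds.
  ${DRY_RUN:+echo} ffmpeg -ss "$s" -i "$in" -t "$dur" -vn -acodec libmp3lame -q:a 2 "$out"
}
```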
Copy audio stream without re-encoding (fastest, lossless if the source is compatible):
ffmpeg -i input_video.mp4 -vn -acodec copy output_audio.m4a
This last command is the fastest option when the source video already has an AAC audio track (which most MP4 files do). It extracts the audio without any re-encoding, so there's no quality loss and it completes in seconds regardless of file length.
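Whether stream copy works depends on which codec is already inside the file, and ffprobe (installed alongside FFmpeg) can report it. A sketch that checks the first audio stream and picks a matching extension, or refuses so you know to re-encode. The codec-to-extension mapping covers common cases only, and audio_codec, copy_ext_for, extract_copy, and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Print the codec name of the first audio stream (e.g. "aac" or "opus").
audio_codec() {
  ffprobe -v error -select_streams a:0 -show_entries stream=codec_name \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Hypothetical mapping: codec -> extension that can hold it without re-encoding.
copy_ext_for() {
  case "$1" in
    aac)    printf '%s\n' m4a ;;
    opus)   printf '%s\n' opus ;;
    vorbis) printf '%s\n' ogg ;;
    mp3)    printf '%s\n' mp3 ;;
    flac)   printf '%s\n' flac ;;
    *)      return 1 ;;
  esac
}

# Set DRY_RUN=1 to print the command instead of running it.
extract_copy() {
  local in="$1" base="$2" codec ext
  codec="$(audio_codec "$in")" || return 1
  if ! ext="$(copy_ext_for "$codec")"; then
    echo "no stream-copy target for '$codec'; re-encode instead" >&2
    return 1
  fi
  ${DRY_RUN:+echo} ffmpeg -i "$in" -vn -acodec copy "${base}.${ext}"
}
```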
The -acodec copy trick only works if the output format is compatible with the source audio codec. MP4 video with AAC audio: extract to .m4a or .aac with -acodec copy. WebM with Opus audio: extract to .opus or .ogg with -acodec copy. If you need MP3 specifically, you must re-encode.
Audio Format Comparison: MP3 vs M4A vs WAV vs OGG vs FLAC
People get weirdly religious about audio formats. Let's be practical about it.
| Format | Compression | Quality at small size | Compatibility | Best use case |
|---|---|---|---|---|
| MP3 | Lossy | Good at 192kbps+ | Universal (plays literally everywhere) | General sharing, ringtones, portable audio |
| M4A (AAC) | Lossy | Better than MP3 at same bitrate | Excellent (all Apple devices, most Android) | Streaming, Apple ecosystem, better efficiency |
| WAV | Lossless (uncompressed) | Perfect, but huge files | Excellent on desktop, poor on mobile | Audio editing, professional production |
| OGG (Vorbis) | Lossy | Good at 160kbps+ | Poor (not supported by iTunes/iOS natively) | Web streaming, Linux, open-source workflows |
| FLAC | Lossless (compressed) | Perfect, smaller than WAV | Good on desktop, improving on mobile | Archiving, audiophile listening, editing masters |
| Opus | Lossy | Excellent, best at low bitrates | Web and modern apps, not iTunes | Voice calls, web audio, streaming at low bitrates |
The practical summary
Use MP3 when: you're sharing the file with anyone and don't know what they're using. MP3 plays in everything. No exceptions. 192kbps for speech, 320kbps for music.
Use M4A when: you're on an Apple device or ecosystem, and you want slightly better quality than MP3 at the same file size. M4A (AAC) is genuinely more efficient than MP3 at equivalent bitrates. Many YouTube videos offer their audio as AAC, so downloading as M4A can avoid one generation of re-encoding loss.
Use WAV when: you're going to edit the audio further in any professional software. Always do audio editing from lossless source material. If you compress to MP3, edit the MP3, then export again as MP3, you've introduced two generations of lossy compression. That adds up audibly.
Use OGG when: you specifically need an open-source format and know the recipient can play it. Web developers and Linux users reach for this. Most other people don't need it.
Use FLAC when: you want lossless audio but also want a smaller file than WAV. FLAC is lossless but compressed, so a 200MB WAV might be 100MB as FLAC with no quality difference. Good for archiving large audio collections.
File Size Reference Table
This is what people actually want to know: how big is this going to be? The figures are approximate (bitrate × duration), and FLAC sizes depend on the material.
| Format and bitrate | 1 hour of audio | 30 minutes | 10 minutes | 1 minute |
|---|---|---|---|---|
| MP3 at 128kbps | 58 MB | 29 MB | 9.6 MB | 0.96 MB |
| MP3 at 192kbps | 86 MB | 43 MB | 14.4 MB | 1.4 MB |
| MP3 at 320kbps | 144 MB | 72 MB | 24 MB | 2.4 MB |
| M4A at 128kbps (AAC) | 58 MB | 29 MB | 9.6 MB | 0.96 MB |
| WAV at 44.1kHz/16-bit stereo | 635 MB | 318 MB | 106 MB | 10.6 MB |
| FLAC (44.1kHz/16-bit, varies with content) | approx 300-450 MB | approx 150-225 MB | approx 50-75 MB | approx 5-7.5 MB |
| OGG Vorbis at 160kbps | 72 MB | 36 MB | 12 MB | 1.2 MB |
The WAV row is the one that surprises people. An hour of uncompressed audio is over 600MB. That's why MP3 exists. For most use cases, 192kbps MP3 or M4A gives you excellent quality at a very manageable file size. You don't need WAV unless you're doing professional audio work.
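Sizes like these come from simple arithmetic: bytes = bitrate in bits per second ÷ 8 × duration in seconds, and an uncompressed WAV's bitrate is sample rate × bit depth × channels (44,100 × 16 × 2 = 1,411.2kbps). A sketch for estimating other cases; it uses decimal megabytes (1MB = 1,000,000 bytes), ignores container overhead and VBR variation, and the function names are made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for estimating audio file sizes in decimal MB.

# Bitrate of uncompressed PCM in kbps: sample rate x bit depth x channels.
pcm_kbps() {
  awk -v rate="$1" -v bits="$2" -v ch="$3" 'BEGIN { print rate * bits * ch / 1000 }'
}

# size_mb KBPS SECONDS: kbps x 1000 / 8 = bytes per second, times duration.
size_mb() {
  awk -v kbps="$1" -v sec="$2" 'BEGIN { printf "%.2f\n", kbps * 1000 / 8 * sec / 1000000 }'
}
```

For example, size_mb 192 2700 estimates a 45-minute MP3 at 192kbps at about 65MB, and size_mb "$(pcm_kbps 44100 16 2)" 3600 reproduces the 635MB WAV figure.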
How to Get the Best Possible Audio Quality
Here's the core principle that most guides skip: the quality ceiling is set by the source video, not your conversion settings.
YouTube does not store your audio in lossless format. They compress everything they receive during upload. The original uploader might have uploaded a lossless WAV file, but YouTube re-encoded it before storing it. What YouTube serves you is typically 128kbps AAC in M4A format (for most videos) or 160kbps Opus in WebM format (for newer uploads). That's the quality ceiling.
If you convert that to a 320kbps MP3, you have a 320kbps MP3 that contains... 128kbps worth of audio information. The extra bits just describe the already-compressed audio, artifacts included, in more detail. This doesn't add quality. It adds file size.
The native extraction approach
The highest quality extraction method is to get the audio stream as YouTube provides it, without re-encoding:
yt-dlp -f bestaudio "URL"
This gives you YouTube's native audio stream. Usually that's a .webm file with Opus audio at 160kbps, or an .m4a file with AAC audio at 128kbps. These are lossy, but they're the original lossy copy without an additional re-encoding step on top.
When you tell a tool "give me 320kbps MP3," what actually happens is: download YouTube's 128kbps AAC, re-encode that AAC to MP3 at 320kbps. You've added one generation of lossy compression on top of the existing lossy compression. The output is slightly worse than the source, just larger.
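You can check that ceiling for a specific video before picking a bitrate. yt-dlp's --print option outputs fields of the selected format and implies --simulate, so nothing is downloaded. A sketch assuming the format fields format_id, ext, acodec, and abr (average audio bitrate in kbps); audio_ceiling and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show codec and bitrate of the best audio-only stream.
# --print implies --simulate, so this only queries; it downloads nothing.
# Set DRY_RUN=1 to print the command instead of running it.
audio_ceiling() {
  ${DRY_RUN:+echo} yt-dlp -f bestaudio \
    --print "%(format_id)s %(ext)s %(acodec)s %(abr)skbps" "$1"
}
```

If it reports, say, format 251 with opus as the codec, the best you can get is that Opus stream; an MP3 encoded far above its bitrate won't recover anything.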
The practical advice
- If the audio quality of the original video is good (professionally produced podcast, studio recording, etc.): download as M4A or native format to avoid re-encoding. 128kbps AAC is fine.
- If you need MP3 specifically: 192kbps is the sweet spot. Going higher doesn't help, going lower is noticeable.
- If you're going to edit the audio after extracting it: download as WAV to avoid further generation loss during your editing exports.
- If the source video is low quality (phone recording, webcam audio, background noise): no bitrate setting will fix bad source audio. 128kbps MP3 is fine, the audio quality is limited by the source anyway.
Common Mistakes When Extracting Audio
Mistake 1: Downloading the full video when you need 30 seconds
This is the most common one. Someone needs a 45-second clip from a 3-hour recorded event. They download the entire 3-hour MP4 (2-4GB), then try to figure out how to trim it. The better approach: use YTCut to set the start and end point precisely before downloading, so you get a 45-second MP3 file directly.
Mistake 2: Using 128kbps for music
128kbps is genuinely not great for music listening. The artifacts are audible on headphones, particularly in the high frequencies (cymbals, strings) and during quiet passages. Use 192kbps minimum for music. 320kbps if you're particularly sensitive to audio quality or listening through good headphones.
Mistake 3: Using 320kbps for speech
The reverse mistake. A podcast or lecture at 128kbps is fine. Human speech doesn't have the complex frequency content of music. You're wasting file space. 128kbps MP3 for speech sounds indistinguishable from 320kbps to virtually everyone.
Mistake 4: Converting MP3 to WAV and thinking you got lossless
A WAV file made from an MP3 source is not lossless. It's the MP3's decoded audio stored in a WAV container. The file is much larger, but the audio quality is the same as the source MP3: decoding to WAV can't restore what MP3 compression discarded. If you need lossless, you needed to start with a lossless source.
Mistake 5: Trusting "HD audio" labels on sketchy converter sites
Some online converter sites claim to offer "HD 320kbps" or "high quality" extraction. In most cases, they're downloading the same 128kbps stream that everyone else downloads and encoding it to 320kbps MP3. The label means nothing. The actual quality is determined by what YouTube provides, not what the site claims to do.
Mistake 6: Ignoring the sample rate
YouTube audio is typically stereo at 44.1kHz (AAC streams) or 48kHz (Opus streams). Both are standard sample rates for music and speech, and you don't need to change them. However, some converters default to 22.05kHz or mono output, which sounds noticeably worse. Check that your output is at least 44.1kHz stereo before committing to a workflow.
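To check what a converter actually produced, ffprobe (installed alongside FFmpeg) can report the sample rate and channel count. A sketch that accepts anything at or above 44.1kHz in stereo or more, so YouTube's 48kHz Opus streams also pass; the thresholds follow the advice above, and is_cd_stereo and check_audio_file are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: flag low-sample-rate or mono output.

# Succeeds if ffprobe-style "key=value" text shows >= 44.1kHz and >= 2 channels.
is_cd_stereo() {
  local rate=0 ch=0 line
  while IFS= read -r line; do
    case "$line" in
      sample_rate=*) rate="${line#sample_rate=}" ;;
      channels=*)    ch="${line#channels=}" ;;
    esac
  done <<< "$1"
  [ "$rate" -ge 44100 ] && [ "$ch" -ge 2 ]
}

# Reads the first audio stream of a file with ffprobe and checks it.
check_audio_file() {
  local info
  info="$(ffprobe -v error -select_streams a:0 \
    -show_entries stream=sample_rate,channels \
    -of default=noprint_wrappers=1 "$1")" || return 1
  is_cd_stereo "$info"
}
```

For example, check_audio_file clip.mp3 || echo "clip.mp3 is below 44.1kHz stereo" catches a converter that quietly downsampled or went mono.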
Frequently Asked Questions
Can I extract audio from a YouTube video that has copyright-restricted music?
Technically, the extraction tools don't care about copyright. They download what's available on YouTube. However, whether you're allowed to do so is a separate question governed by copyright law and YouTube's Terms of Service. For personal, non-commercial use, the practical risk is very low. For commercial use or redistribution, there are real legal considerations. (Our legal guide covers this in much more detail.)
Why does my extracted MP3 sound muffled or worse than the video?
Most likely reason: the video's audio was better quality than the extraction produced. Check what bitrate you extracted at. If you used 64kbps or 96kbps, that's why. Try again at 192kbps. Second likely reason: you downloaded the video's audio at a lower quality setting than what YouTube had available. Make sure you're selecting "best available" quality in your tool.
Is there a file size limit for extraction on YTCut?
YTCut doesn't impose strict file size limits for audio extraction, but very long segments from very long videos may take longer to process. For extracting audio from full 4-hour videos, yt-dlp is more practical since it runs locally. For clips and segments, YTCut handles it quickly.
What's the difference between extracting audio and downloading audio?
Essentially nothing in this context. Both mean getting the audio track from a YouTube video saved to your device. The word "extract" technically implies separating the audio from the video (which is what happens), while "download" emphasizes getting it onto your device. In practice, people use these interchangeably and they mean the same process.
Can I extract audio from YouTube Live streams?
While a stream is live: yt-dlp can record a live stream, but it's complicated and might not work reliably. After a live stream is archived as a regular video: yes, any method in this guide works on archived streams exactly as it does on regular videos.
Why does yt-dlp give me a .webm file instead of MP3?
You either didn't include the --audio-format mp3 flag, or you don't have FFmpeg installed (yt-dlp needs FFmpeg to convert formats). Install FFmpeg and make sure it's in your system PATH. Then the --audio-format flag will work correctly.
Does the audio quality depend on the video's resolution?
No. YouTube stores audio and video as separate streams. A 1080p video and a 360p video of the same content will usually have identical audio streams. The video resolution does not affect audio quality. What affects audio quality is the original upload quality and YouTube's audio processing, both of which are independent of resolution.