What is a podcast audiogram?

An audiogram is a short video clip — typically 20 to 45 seconds — that combines your podcast audio with an animated waveform visualization, burned-in captions, and a designed background. It is a video file, not a static image, and is designed to perform on social media platforms where most content is consumed silently.

How do I create a podcast audiogram from a YouTube episode?

Extract your desired 20–45 second clip from the YouTube episode using YTCut — paste the URL, set the timestamps, download as MP3 (audio only) or MP4 (with video). Then import into an audiogram tool like Headliner, Descript, or Canva to add the waveform visualization and burned-in captions before exporting.

Do I need video of my podcast to make audiograms?

No. Traditional audiograms work with audio only — the format places audio over a designed background with an animated waveform. No video recording is required. If you do have video of your podcast, you have more options including talking-head clips, but audio-only audiograms perform just as well with good design.

What aspect ratio should I use for podcast audiograms?

Create a 9:16 vertical version for TikTok, Reels, and YouTube Shorts, and a 1:1 square or 16:9 horizontal version for Twitter and LinkedIn. The same clip can be reformatted for each platform. Posting the wrong aspect ratio for a platform significantly reduces performance because the content does not fill the screen correctly.

How to Make Podcast Audiograms from YouTube Videos (2026 Guide)

What an Audiogram Actually Is

Let us get the definition right before anything else, because "audiogram" gets used loosely online and the confusion leads to bad results.

An audiogram is a short video. Not a static image. A video that contains an audio clip, an animated waveform that visualizes the audio in real time, burned-in captions so it works without sound, and usually branding elements like a podcast logo, episode title, and guest name. The whole thing typically runs 20 to 45 seconds, and rarely more than 60.

What an audiogram is NOT: a static image of a soundwave. That is just a picture. It does not play. It cannot be heard. It does not communicate the energy of the conversation. A true audiogram is a video file that social media platforms treat as a video and play in feeds.

The anatomy of a well-made audiogram:

Background layer: a solid color, branded gradient, studio photo, or blurred image. Something that holds attention without competing with the text.
Waveform animation: bars, circles, or lines that animate in sync with the audio. This is the visual "proof" that there is audio happening, and it draws the eye even when someone is watching silently.
Captions: synchronized to the speech, displayed prominently enough to read on a phone screen. This is what actually communicates the content to the many viewers watching without sound.
Branding elements: podcast name, episode number, guest name, logo, and a CTA (call to action) like "Listen to the full episode" with a link or handle.
Progress bar (optional): some audiogram templates include a thin progress bar at the bottom showing where you are in the clip. Viewers often find this satisfying. It also signals that the clip has a defined end, which subtly encourages them to stay and watch it through.

The format is a video export, usually MP4 with H.264 encoding, in whatever aspect ratio suits the target platform. The audio is the star. The visual design is the frame that makes the audio accessible and shareable.

Why Audiograms Still Work in 2026

Every year someone writes a post declaring that audiograms are dead. Every year the data disagrees.

Here is why they still work, specifically in 2026's social media environment.

Social platforms prioritize video above all other content types. Facebook, Instagram, LinkedIn, and Twitter/X all give video significantly more organic reach than static images or text posts. An audiogram is a video. It gets the video treatment from the algorithm. A static quote card gets the image treatment, which is substantially less reach on most platforms. This is not a creative argument. It is an algorithmic one.

Much of social video is watched without sound. Figures of around 80% are widely cited, and muted autoplay has shaped feed viewing habits since Facebook started auto-playing videos in feeds in 2013. Audiograms are designed for exactly this environment. The captions carry the content. The waveform provides visual engagement. The branding identifies the show. All of this works in a muted feed.

Podcast discovery is still primarily social-driven. People find new podcasts because someone they follow mentioned it, or because they scrolled past a clip that made them laugh or think. A 30-second audiogram that delivers one genuinely interesting idea is a functional discovery mechanism. It gives a potential new listener enough information to decide whether the show is for them. A static "New episode out now!" post gives them nothing.

Clips are shareable. Full episodes are not. Nobody shares a two-hour podcast in their Stories. They might share a 30-second clip where the guest said something they found genuinely surprising. Short, quotable, self-contained content gets shared. Long-form does not. Audiograms create shareable units from non-shareable source material.

The competition is still low in most podcast niches. Despite all this being known, the majority of podcasters still do not produce audiograms consistently. They post "new episode" text posts and wonder why growth is slow. Doing the thing that works, consistently, is enough to stand out in most niches. The bar is not high.

Finding the Right Clip to Turn Into an Audiogram

This is where most podcasters go wrong. They pick a clip because they remember it being good when they recorded it, or because it is easy to find, not because it is actually the right clip for social media.

The criteria for an ideal audiogram clip are specific. A good clip meets all four of these:

1. One complete idea. The clip should contain a single, complete thought. Not the setup for an idea that pays off five minutes later. Not a reference to something discussed earlier. A complete thought with a beginning, a middle, and a conclusion. "Most people think X, but actually Y, and here is why that matters: Z." That is a complete idea. "And as I was saying before about the marketing funnel..." is not.

2. Self-contained without context. Someone who has never heard your podcast should be able to watch this clip cold and understand exactly what is being discussed. If the clip requires knowing who the guest is, what the episode is about, or what was said in the previous segment to make sense, it is the wrong clip. Add a text overlay to provide context if you love a clip that does not quite meet this criterion. But ideally, the clip stands alone.

3. Provocative opening line. The first sentence should create a question in the viewer's mind or make a statement that is surprising, funny, or counter-intuitive enough to make them want to hear the next sentence. "The most successful entrepreneurs I have interviewed all have one thing in common that nobody talks about" works. "So, yeah, that is an interesting point and I think what I want to say about that is..." does not work. Good clips start mid-energy, not mid-thought.

4. Emotional peak or surprising fact. The best audiogram moments are the ones where something genuinely surprising or emotionally resonant happens. The guest reveals an unexpected failure. The host says something genuinely funny. Someone admits to something counter-intuitive. An astonishing statistic gets dropped. Audiences share content that made them feel something or that taught them something surprising. Clips that are interesting but not particularly striking do not get shared.

Practical tip for finding these clips: listen to the episode with a text editor open. Every time you hear something that meets the criteria above, write down the timestamp and a one-sentence description of the moment. Do not stop to clip anything yet. Listen through and collect 8 to 12 candidate moments. Then rank them and choose your best 3 to 5. Those are your audiograms for the week from this episode.
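If you keep the candidate list in a plain text file, a few lines of code can turn it into a ranked shortlist. The sketch below is a minimal Python example that assumes a hypothetical log format of one moment per line, `timestamp | score | description`, where the timestamp is `MM:SS` or `HH:MM:SS` and the score is your own 1 to 5 rating. The format is an illustration, not something any tool requires.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    start_seconds: int
    score: int
    description: str


def parse_timestamp(ts: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in ts.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def shortlist(log: str, keep: int = 5) -> list[Candidate]:
    """Parse 'timestamp | score | description' lines and keep the top-rated ones."""
    candidates = []
    for line in log.strip().splitlines():
        ts, score, description = (field.strip() for field in line.split("|", 2))
        candidates.append(Candidate(parse_timestamp(ts), int(score), description))
    # Highest score first; earlier timestamps break ties.
    candidates.sort(key=lambda c: (-c.score, c.start_seconds))
    return candidates[:keep]


log = """
12:40 | 4 | Guest admits the launch failed
1:03:15 | 5 | Counter-intuitive take on trust
27:02 | 3 | Funny story about the first sponsor
"""
for c in shortlist(log, keep=2):
    print(c.start_seconds, c.description)
```

The timestamps come out in seconds, which is convenient if you later need to compute clip start and end points.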

Step 1: Extract the Clip from YouTube Using YTCut

Most podcasters who also record video publish their episodes to YouTube. If that is you, your source material is right there on YouTube and you can skip the step of digging through local audio files.

Go to ytcut.org. The interface is minimal and intentional. Paste the YouTube URL of your podcast episode into the input field. The video loads in the player.

Now you need your timestamp. From your candidate moments list, you have the approximate time of the clip you want. Type it into the start time field. Watch a few seconds of the video to confirm you are in the right place. Adjust the in-point to a moment just before the speaker begins the thought, where the audio is clean. You want to start the clip a half-second or so before the first real word so the audio does not start abruptly.

Set the out-point at the moment the thought completes cleanly. A moment of brief silence, a natural pause, the end of a sentence. Do not cut off mid-word. Do not leave five seconds of dead air at the end. The clip should feel complete when it ends, not truncated.

For most audiogram use cases, the ideal clip length is 20 to 45 seconds. Under 20 seconds is often too short to deliver a complete idea. Over 60 seconds starts losing social media audiences who are in scroll mode. Around 30 to 35 seconds is the sweet spot.
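The in-point, out-point, and length rules above reduce to a small calculation. This Python sketch assumes you have noted, in seconds, when the first real word starts and when the thought ends. It adds the roughly half-second lead-in suggested above and flags clips outside the 20 to 45 second range. The function name and defaults are illustrative and are not part of YTCut.

```python
def clip_window(first_word_at: float, thought_ends_at: float,
                lead_in: float = 0.5, min_len: float = 20.0,
                max_len: float = 45.0) -> tuple[float, float, list[str]]:
    """Return (start, end, warnings) for a clip, padding the start by lead_in seconds."""
    start = max(0.0, first_word_at - lead_in)  # never start before the video begins
    end = thought_ends_at
    length = end - start
    warnings = []
    if length < min_len:
        warnings.append(f"{length:.1f}s is short; the idea may feel incomplete")
    elif length > max_len:
        warnings.append(f"{length:.1f}s is long; look for a tighter cut")
    return start, end, warnings


start, end, warnings = clip_window(first_word_at=754.0, thought_ends_at=786.5)
print(start, end, warnings)  # 753.5 786.5 []
```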

Format choice depends on what you will do next:

If you are creating a pure audiogram (just the audio over a designed background and waveform), download as MP3. The audiogram tool will handle the visual design and you only need the audio.
If you are creating a video clip of the actual recording (talking heads, studio footage), download as MP4. You will then format this in CapCut or Premiere for social platforms.
If you want to create both versions from one download, download as MP4. You can extract the audio from the MP4 in any audio editor or audiogram tool.

Click download. The clip arrives in seconds. Rename it immediately with a descriptive name: "ep142-clip-trust-failure-moment.mp3" tells you exactly what this is when you open your files folder two days later. "clip1.mp3" tells you nothing.
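If you rename clips often, it can help to generate the name instead of typing it. This is a minimal Python sketch of the naming pattern in the example above (`ep142-clip-trust-failure-moment.mp3`). The function name is hypothetical.

```python
import re


def clip_filename(episode: int, description: str, extension: str = "mp3") -> str:
    """Build a name like 'ep142-clip-trust-failure-moment.mp3' from a short description."""
    # Lowercase, replace each run of non-alphanumeric characters with a hyphen,
    # then trim stray hyphens from both ends.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"ep{episode}-clip-{slug}.{extension}"


print(clip_filename(142, "Trust failure moment"))  # ep142-clip-trust-failure-moment.mp3
```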

Repeat for each of the 3 to 5 clips you identified. The whole extraction step for a full week's worth of audiograms takes about 10 minutes.

Step 2: Choose Your Audiogram Tool

Several tools exist specifically for creating audiograms. Each has a different target user, pricing model, and feature set. Here is an honest breakdown.

Headliner is the most popular audiogram-specific tool. It has a clean interface, solid auto-captioning, a library of waveform animation styles, good template variety, and integrates directly with podcast RSS feeds (so it can pull episode audio automatically). Free plan allows a limited number of videos per month with Headliner branding. Paid plans remove the branding and add more exports. The free tier is enough to test the workflow. Good choice for podcasters who want purpose-built audiogram tools.

Wavve is similar to Headliner. Slightly different template aesthetic, slightly different pricing. The interface is clean and the waveform animations are smooth. Less automatic integration with podcast platforms compared to Headliner, but the output quality is comparable. Good for creators who want a Headliner alternative.

Descript is a full audio/video editing tool that also does audiograms as one of its features. Its workflow is transcript-based rather than timeline-based, which is genuinely different and useful if you do a lot of editing. Auto-captions are excellent. If you are already using Descript for podcast editing, doing audiograms in it makes sense because you are already working in the tool. If you are not already a Descript user, subscribing just for audiograms is probably overkill.

Riverside.fm has an audiogram feature built into its platform. If you record your podcast using Riverside (which many podcasters do for its high-quality separate track recording), staying in the ecosystem for audiograms makes workflow sense. The audiogram tool is not as feature-rich as Headliner but it is convenient if your recording is already there.

Canva added video editing capabilities that can serve audiogram creation. You can import an audio file, add an animated waveform element from Canva's library, add captions manually or through Canva's (limited) auto-caption feature, and export as a video. It is more manual than Headliner but gives you more design control, and if you already use Canva for other graphics, the learning curve is minimal. Best for designers who want full control and are comfortable with a less automated workflow.

Adobe Premiere Pro gives you complete control but requires the most time and skill. You can clean up audio in its Essential Sound panel (or in Audition), generate captions with its speech-to-text transcription and restyle them by hand, and build the waveform yourself or with a plugin. Worth it if you are already proficient in Adobe tools and need specific design outcomes that other tools cannot achieve. Not worth it for someone who just wants to get audiograms out efficiently.

Step 3: Design the Audiogram

The design of your audiogram should be consistent across episodes. Consistency builds brand recognition. The third time someone sees your audiogram style in their feed, they know whose podcast it is before they read a word. That recognition is valuable and only develops through repetition of the same design elements.

Background options:

A solid dark color is the cleanest option and works across all platforms and lighting conditions. It puts maximum focus on the waveform and captions. Choose your brand color or a neutral dark tone. A very dark navy, charcoal, or deep green tends to look more sophisticated than pure black, which can look flat on some screens.

A brand gradient (two of your brand colors blending) adds visual interest without the complexity of a photo background. It reads as designed and intentional. Many of the most recognizable podcast audiograms use simple gradients.

A studio or recording environment photo can work, but requires careful treatment. The photo should be blurred or darkened significantly so the captions remain readable over it. A busy, high-detail photo as a background makes the text hard to read. If you use a photo, apply a dark overlay (70% opacity black over the photo) before adding text.

A guest headshot layout is popular for interview podcasts. The guest's photo on the right side, your brand color on the left, waveform across the bottom or in a dedicated band. This works well for clips where the guest is the draw and their face is recognizable to your audience.
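The 70% black overlay suggested for photo backgrounds works because a standard alpha blend keeps only 30% of each background pixel's brightness. This Python sketch shows that blend (the normal "over" operation, per color channel) on a bright photo pixel. It works directly on 8-bit RGB values and ignores gamma, which is a simplification.

```python
def blend_over(top: tuple[int, int, int], bottom: tuple[int, int, int],
               opacity: float) -> tuple[int, int, int]:
    """Composite a solid `top` color at `opacity` over a `bottom` pixel (8-bit RGB)."""
    return tuple(round(opacity * t + (1 - opacity) * b) for t, b in zip(top, bottom))


bright_sky = (200, 220, 250)
print(blend_over((0, 0, 0), bright_sky, 0.70))  # (60, 66, 75)
```

A pixel that was nearly white becomes a dark blue-gray, which is why white captions stay readable on top of it.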

Waveform style:

Bars are the classic choice. A row of vertical bars that extend and contract with audio amplitude. Clean, recognizable, and works at any size. The default choice if you are not sure what to use.

Circles or rings animate outward from a center point. More visually dynamic than bars but can feel dated depending on the design context. Works well with brand colors on a dark background.

Lines (continuous waveform visualization) are more subtle and modern-looking. A single line that undulates with the audio. Tends to look cleaner on minimalist designs.

The rule: the waveform should be visible but not dominant. It is supporting evidence that audio is playing, not the main attraction. The main attraction is the captions. Size your waveform so it reads clearly but does not compete with the text.
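For the curious, a bar-style waveform is typically driven by how loud the audio is around each video frame. This Python sketch is a rough illustration of that idea, not how any particular tool implements it. It splits mono samples into one window per frame at 30fps, computes the RMS (root mean square) level of each window, and scales that level to a bar height in pixels.

```python
import math


def bar_heights(samples: list[float], sample_rate: int, fps: int = 30,
                max_height_px: int = 200) -> list[int]:
    """One bar height per video frame, from the RMS level of that frame's samples.

    `samples` are mono values in the range -1.0 to 1.0.
    """
    per_frame = sample_rate // fps  # number of audio samples covered by one video frame
    heights = []
    for start in range(0, len(samples), per_frame):
        window = samples[start:start + per_frame]
        rms = math.sqrt(sum(s * s for s in window) / len(window))
        heights.append(round(min(rms, 1.0) * max_height_px))
    return heights


# One second of a 440 Hz tone at half volume, then one second of silence.
rate = 48_000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
silence = [0.0] * rate
heights = bar_heights(tone + silence, rate)
print(len(heights), heights[0], heights[-1])
```

Capping `max_height_px` is how you keep the waveform in its own band instead of letting loud passages push it into the captions.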

Step 4: Captions Are Everything

Read this section twice.

Captions are not an accessibility feature for your audiogram. They are the content delivery mechanism for the majority of your viewers. If your captions are wrong, too small, too slow, or too fast, your audiogram has failed regardless of how good the audio is.

Accuracy: Auto-captions in Headliner, Descript, Riverside, and CapCut are all roughly 90 to 95% accurate for clear speech in English. That sounds good until you remember that 1 in every 10 to 20 words may be wrong. Names, technical terms, slang, and words with unusual pronunciations fail most often. Always proofread every word. A caption that says "their" instead of "there" is minor. A caption that mangles a guest's name, misquotes a statistic, or creates an embarrassing homophone error damages credibility. Two minutes of proofreading prevents this.
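To make the accuracy numbers concrete, assume a conversational pace of about 150 words per minute. That figure is an assumption, and speakers vary widely. At that pace a 30-second clip has about 75 words, so 90% accuracy leaves about 7 to 8 wrong words and 95% leaves about 4. A tiny Python sketch of the arithmetic:

```python
def expected_caption_errors(clip_seconds: float, accuracy: float,
                            words_per_minute: float = 150) -> float:
    """Expected number of mis-transcribed words in a clip at a given accuracy.

    words_per_minute is an assumed speaking rate, not a measured one.
    """
    words = words_per_minute * clip_seconds / 60
    return words * (1 - accuracy)


for accuracy in (0.90, 0.95):
    print(accuracy, expected_caption_errors(30, accuracy))
```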

Font size: 36px minimum on a 1080x1920 canvas. Larger is almost always better for readability on small phone screens. Many of the best-performing audiograms use very large, bold fonts where the caption takes up a significant portion of the screen. Do not be timid with text size.

Maximum lines: Two lines of caption on screen at any given time. Three lines becomes crowded and hard to read quickly. One line is fine. Two is the maximum.
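If you prepare captions outside your audiogram tool, the two-line rule can be enforced mechanically. This Python sketch uses the standard `textwrap` module to break a transcript into caption cards of at most two lines each. The 22-character line width is an assumed value for large text on a 1080-pixel-wide canvas, so adjust it to your font and size.

```python
import textwrap


def caption_cards(transcript: str, max_chars_per_line: int = 22,
                  max_lines: int = 2) -> list[str]:
    """Split a transcript into on-screen caption cards of at most `max_lines` lines."""
    lines = textwrap.wrap(transcript, width=max_chars_per_line)
    # Group consecutive wrapped lines into cards of max_lines each.
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


text = "Most people think growth comes from more content, but it comes from better clips."
for card in caption_cards(text):
    print(card)
    print("---")
```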

Contrast: White text on a dark background is the most readable combination and works on virtually every background. Yellow is also popular (high visibility, feels urgent). Avoid light gray, pale blue, or any color that blends with a lighter background. If your background is bright, add a dark semi-transparent box behind the text to ensure contrast regardless of what is behind it.
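"Readable" can also be checked numerically. The Web Content Accessibility Guidelines (WCAG 2.x) define a contrast ratio between two colors, ranging from 1:1 to 21:1, and set 4.5:1 as the minimum for normal text (3:1 for large text). This Python sketch implements that formula. The navy and gray hex values are only examples.

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB color like '#1b2a41'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB transfer curve to get linear light values.
    linear = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
              for c in channels]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a: str, color_b: str) -> float:
    """Contrast ratio from 1 (identical colors) to 21 (black and white)."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast_ratio("#ffffff", "#000000"), 1))  # 21.0
print(round(contrast_ratio("#ffffff", "#1b2a41"), 1))  # white on dark navy
print(round(contrast_ratio("#d3d3d3", "#ffffff"), 1))  # light gray on white
```

White on the dark navy clears the 4.5:1 bar easily. Light gray on white falls far below it, which matches the advice to avoid pale text colors.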

Caption style: The "karaoke" style (individual words highlighted one at a time as they are spoken) has become extremely popular on TikTok and Reels because it draws the eye, creates forward momentum, and is easy to follow at any speaking speed. Tools like Headliner, CapCut, and Descript all support this style. It is worth implementing if your tool supports it, particularly for clips with fast or emphatic speech.

Caption timing: The text should appear exactly as the word is spoken, not half a second early or late. Auto-sync is usually accurate but check it, especially at sentence transitions. A caption that lingers one second after the word was spoken creates a subtle but distracting disconnect that viewers sense even if they cannot articulate it.
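Most audiogram tools handle timing for you, but it helps to see what timed captions actually are. SubRip (`.srt`) is a common caption format that many editors can import or export. Each cue has a number, a start and end time, and the text. This Python sketch turns hypothetical word-level timestamps, of the kind a speech-to-text service can return, into SRT cues. Each cue starts exactly when its first word is spoken and ends when its last word ends, so nothing lingers.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[tuple[str, float, float]], words_per_cue: int = 4) -> str:
    """Group (word, start, end) tuples into numbered SRT cues."""
    cues = []
    for n, i in enumerate(range(0, len(words), words_per_cue), start=1):
        group = words[i:i + words_per_cue]
        text = " ".join(w for w, _, _ in group)
        start, end = group[0][1], group[-1][2]  # first word's start, last word's end
        cues.append(f"{n}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)


words = [("Stop", 0.50, 0.78), ("tracking", 0.80, 1.20), ("the", 1.22, 1.30),
         ("wrong", 1.32, 1.60), ("numbers.", 1.62, 2.10)]
print(words_to_srt(words))
```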

One more thing: deal with the filler words. "Um," "uh," and "you know" in the speech should either be captioned or edited out. If you caption them, they look sloppy. If you leave them in the audio uncaptioned, the captions feel out of sync. The cleanest solution is to edit them out of the clip before creating the audiogram. If the speaker is generally articulate and clean, this is not a big issue. For speakers with a lot of verbal filler, editing the clip down to the cleanest 30-second segment saves you work downstream.

Step 5: Add Context and Branding

The clip is extracted. The waveform is designed. The captions are proofread and timed correctly. Now add the elements that tell people who made this and what to do next.

Episode title: A short version of the episode title, displayed at the top or bottom of the audiogram. Not the full title. A 60-character episode title does not fit in a banner without being unreadably small. Truncate it: "Ep. 142: Why Trust Is Overrated" is enough. Viewers can find the full episode title when they search for it.

Guest name: For interview podcasts, the guest's name should be visible. Either in a lower-third banner (the TV-style name-and-title display at the bottom of the frame) or as a persistent text element near their photo. This matters because often the clip is shareworthy because of who said it, not just what was said. "John Smith, CEO of Acme Corp" gives the viewer a reason to care based on the speaker's authority.

Podcast logo: Your logo should appear somewhere. Corner placement is standard: top-left or top-right. Small enough not to compete with the content, large enough to be identifiable. If someone screenshots or crops the video, your logo should still be visible in the cropped version.

CTA text: A call to action should appear, ideally at the end of the clip or as a persistent element. "Listen to the full episode" or "New episode every Tuesday" or "Subscribe on Spotify" are all functional CTAs. Keep it short. Three to five words maximum. The CTA does not need to be elaborate. It just needs to exist.

Handle or URL: Include your @handle or podcast URL somewhere visible. When the audiogram gets shared, you want the source to be identifiable. A handle is better than a URL because handles are searchable directly on social platforms.

Design principle: every element should earn its space. If adding something makes the design more cluttered without adding meaningful information, leave it out. White space (or in this case, dark negative space) is not wasted space. It gives the eye somewhere to rest and makes the important elements stand out more clearly.

Platform Specs and Export Settings

Getting the technical specs right prevents your carefully designed audiogram from being displayed in the wrong aspect ratio, with black bars, or at degraded quality.

TikTok / Instagram Reels / YouTube Shorts

Resolution: 1080x1920 (9:16 portrait)
Video codec: H.264
Audio codec: AAC
Frame rate: 30fps (24fps is also accepted)
Max file size: TikTok 287.6 MB; Instagram 650 MB; YouTube Shorts 256 GB (not a meaningful limit)
Max duration: TikTok 10 min; Instagram Reels 3 minutes; YouTube Shorts 3 minutes
Safe zone: Keep all important text and waveform between 250px from top and 420px from bottom to avoid UI overlay from each platform
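The safe-zone rule is easy to check if you know where each element sits vertically on the 1080x1920 canvas. This Python sketch uses the 250-pixel top and 420-pixel bottom margins from the list above. The element names and coordinates are made-up examples, and only vertical placement is checked.

```python
CANVAS_HEIGHT = 1920
SAFE_TOP = 250                      # keep clear of the top UI overlay
SAFE_BOTTOM = CANVAS_HEIGHT - 420   # y = 1500; keep clear of the bottom UI overlay


def outside_safe_zone(elements: dict[str, tuple[int, int]]) -> list[str]:
    """Names of elements whose vertical span (top_y, bottom_y) leaves the safe zone."""
    return [name for name, (top, bottom) in elements.items()
            if top < SAFE_TOP or bottom > SAFE_BOTTOM]


layout = {
    "title_banner": (120, 220),   # too close to the top edge
    "captions": (760, 1060),
    "waveform": (1180, 1380),
    "cta": (1460, 1560),          # runs into the bottom overlay
}
print(outside_safe_zone(layout))  # ['title_banner', 'cta']
```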

Instagram Feed (Square)

Resolution: 1080x1080 (1:1 square)
Video codec: H.264
Max duration: 60 seconds for feed video, longer for carousel
Notes: Square audiograms work well for Instagram feed posts where you want the image to take up more vertical space than a landscape video would

Twitter/X

Resolution: Up to 1280x720 (16:9) for landscape, 720x1280 for portrait
Recommended: 1280x720 or 720x720 for audiograms. The Twitter feed is not optimized for 9:16 portrait video the way TikTok and Reels are.
Max file size: 512 MB
Max duration: 140 seconds for standard accounts; longer for verified
Notes: Twitter/X video auto-plays in the feed but does not auto-expand to full screen. Landscape or square tends to show more of the audiogram without requiring the viewer to tap.

LinkedIn

Resolution: Up to 4096x2304 (but 1920x1080 or 1080x1920 are practical maximums)
Max file size: 5 GB
Max duration: 10 minutes
Notes: LinkedIn supports both landscape (16:9) and portrait (9:16). For audiograms, portrait (1080x1920) works well. The LinkedIn audience is professional, so lean more informational and less playful with the design aesthetic.

Export settings to request from your audiogram tool

Most audiogram tools export at the correct resolution for each platform automatically. If you are exporting manually from a tool like Premiere or Canva, use these settings: H.264 video codec, 8 to 12 Mbps video bitrate for 1080p, AAC audio at 192 kbps, 30fps, MP4 container. These settings produce a file that looks great on every platform and is small enough to upload quickly.
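If you render manually from the command line, FFmpeg can apply these settings. The Python sketch below only builds the FFmpeg argument list and does not run it. It assumes a hypothetical `input.mov` and uses FFmpeg's standard `libx264` and `aac` encoders, a 10 Mbps target from the middle of the suggested 8 to 12 Mbps range, 192 kbps audio, and 30fps. It also estimates file size as total bitrate × duration ÷ 8, ignoring container overhead.

```python
def export_command(src: str, dst: str, video_mbps: float = 10,
                   audio_kbps: int = 192, fps: int = 30) -> list[str]:
    """FFmpeg arguments for an H.264/AAC MP4 matching the settings above."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-b:v", f"{video_mbps:g}M",  # target average video bitrate
        "-pix_fmt", "yuv420p",       # widest player compatibility
        "-r", str(fps),
        "-c:a", "aac", "-b:a", f"{audio_kbps}k",
        "-movflags", "+faststart",   # lets playback start before the file fully loads
        dst,
    ]


def estimated_size_mb(seconds: float, video_mbps: float = 10, audio_kbps: int = 192) -> float:
    """Approximate file size in megabytes: (video + audio bitrate) x duration / 8."""
    total_kbps = video_mbps * 1000 + audio_kbps
    return total_kbps * seconds / 8 / 1000


print(" ".join(export_command("input.mov", "audiogram.mp4")))
print(round(estimated_size_mb(35), 1))  # 44.6
```

A 35-second clip at these settings lands around 45 MB, well under every platform limit listed above. With FFmpeg installed, you can run the list with `subprocess.run(cmd, check=True)`.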

Tool Comparison Table

Tool	Best For	Auto-Captions	Free Tier	Waveform Styles
Headliner	Podcasters wanting purpose-built audiogram tool	Yes, excellent	Yes (with branding)	Many (bars, lines, circles)
Wavve	Headliner alternative, clean templates	Yes	Yes (limited exports)	Several
Descript	Existing Descript users, transcript-based editing	Yes, excellent	Yes (limited hours/mo)	Basic
Riverside	Users already recording on Riverside	Yes	Yes (limited)	Basic
Canva	Designers wanting full control, existing Canva users	Limited	Yes (robust free tier)	Elements from library
Premiere Pro	Full control, complex designs, existing Adobe users	Yes (via AI)	No (subscription)	Custom via plugins

The recommendation for most podcasters just starting with audiograms: begin with Headliner. It is purpose-built, the free tier is genuinely useful for testing the workflow, and the learning curve is about 20 minutes. Once you have figured out whether audiograms are worth doing for your show (they almost certainly are), you can evaluate whether to upgrade Headliner or switch to a different tool based on your specific needs.

Best Practices for Hook-Writing on Audiograms

The hook is the first 3 seconds of the audiogram and the caption you write when you post it. These are not the same thing and both need attention.

The first 3 seconds determine whether the viewer keeps watching or swipes. On TikTok and Reels, where active scrollers can swipe past a video within a second or two, you have almost no time. The visual design and the first sentence visible in the captions do all the work.

Here is a crucial insight that separates good audiogram creators from lazy ones: do not use the verbatim opening line of your clip as the hook text overlay. The clip starts wherever the speaker starts talking. That might be a perfectly good opening for an audio listener, but it may not be a great hook for someone seeing a text overlay in a social feed.

Rewrite the hook for social. Take the core idea of the clip and translate it into a pattern-interrupting social hook.

Original clip opening: "Yeah, I think the thing that most people get wrong about building an audience is they focus on the wrong metrics."

That is a solid statement but a mediocre hook. The social hook version might be:

Text overlay: "Stop tracking the wrong numbers." (then the speaker says what they mean)
Or: "The audience metric nobody tells you about"
Or: "97% of creators track this. It is the wrong stat."

The same underlying idea, made into a hook that stops the scroll instead of flowing past it. The actual clip audio stays the same. You are just adding a text layer in the first 1 to 2 seconds that frames what the viewer is about to hear.

Platform-specific hook variations:

TikTok: Bold, personal, direct. "I was wrong about this for 5 years" works better than "Expert discusses common misconceptions." First person and vulnerable tends to outperform third person and formal.
Instagram Reels: Similar to TikTok but slightly more polished. Emojis in the text overlay can work here in ways they might not on LinkedIn.
LinkedIn: Professional framing. "The leadership advice that actually held my career back" works. "SHOCKING thing my boss never told me" does not fit the LinkedIn culture.
Twitter/X: The tweet itself does most of the hook work. Keep it punchy and specific. The clip then supports the tweet rather than needing its own separate hook.

Also: the caption you write when you post matters as much as the hook inside the video. The caption is what appears above or below the audiogram in the feed before someone taps to expand it. Write this with the same care as a headline. One strong sentence. Then details. Then CTA. Not a paragraph of context followed by the interesting bit.

How to Batch-Produce Audiograms

Doing one audiogram at a time is how you burn out. Batch production is how you maintain a consistent posting schedule without audio editing consuming your entire week.

The system: after each episode publishes, immediately identify your 3 to 5 best clips and note their timestamps. Do not clip anything yet. Just note the timestamps.

Once a week, on a dedicated production day (Wednesday works well for many creators), process all the pending clips at once. Open YTCut and extract all clips from that week's episode and any backlogged episodes. This takes 10 to 15 minutes for 5 clips.

Then open your audiogram tool. Import all clips. Add captions to all of them. Check and correct captions for all of them. Apply your template (which is already saved and consistent). Adjust any design elements that are episode-specific (guest name, episode number). Export all of them.

Write all the platform-specific captions in a Google Doc. Five clips, three platforms each = 15 caption variations. This is the part that takes the most creative energy. Budget 45 to 60 minutes for this. But you are doing it all at once rather than in five separate daily sessions, so the overhead (opening tools, finding files, context switching) is paid once, not five times.

Load everything into your scheduler. Buffer, Later, Hootsuite, or any social scheduling tool. Set the dates and times. Done.

Total time for this weekly batch: 2 to 3 hours on production day. Result: a full week of audiogram content across multiple platforms, plus a growing library (a "vault") of past clips you can repost later.

One batch-production tip: create a reusable template in your audiogram tool that requires only three changes per episode: the episode number, the guest name, and the clip audio. Everything else (colors, fonts, logo placement, waveform style, CTA text) is pre-built and locked. Changing only these three things per episode reduces per-audiogram production time from 20 minutes to 5 minutes. Templates are a force multiplier.
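In code terms, a locked template is a fixed set of defaults where only a few fields vary. This Python sketch models that idea with a frozen dataclass. The field names and values are illustrative and do not reflect any audiogram tool's actual template format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AudiogramTemplate:
    # Locked brand settings: identical for every episode.
    background: str = "#1b2a41"
    font: str = "Inter Bold"
    caption_color: str = "#ffffff"
    waveform_style: str = "bars"
    logo_position: str = "top-left"
    cta_text: str = "Listen to the full episode"
    # The only fields that change per episode.
    episode_number: int = 0
    guest_name: str = ""
    clip_audio: str = ""


BRAND = AudiogramTemplate()


def for_episode(number: int, guest: str, audio_file: str) -> AudiogramTemplate:
    """Copy the locked template, filling in only the three per-episode fields."""
    return replace(BRAND, episode_number=number, guest_name=guest, clip_audio=audio_file)


ep142 = for_episode(142, "John Smith", "ep142-clip-trust-failure-moment.mp3")
print(ep142.episode_number, ep142.guest_name, ep142.waveform_style)  # 142 John Smith bars
```

Because the dataclass is frozen, assigning to a brand setting on an episode's copy raises an error instead of letting the design drift quietly.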

Common Mistakes

Learn from these before you waste time on audiograms that will not work.

Waveform too large over captions. The waveform animation catches the eye, which is great. But if it overlaps with the caption text, it becomes a visual conflict that makes both elements harder to process. The waveform should occupy a dedicated band of the design (typically the bottom third) and the captions should be in a separate zone (typically the middle). They should not fight for the same space.

No CTA. An audiogram that ends with the audio and then... nothing. No "follow for more," no episode link, no "subscribe on Spotify." The viewer watched 30 seconds of your content and liked it. Give them somewhere to go. This is not aggressive marketing. It is basic courtesy. You did the work. Ask for the follow.

Wrong aspect ratio for the platform. A 16:9 landscape audiogram posted to Instagram Reels or TikTok sits in a thin strip across the middle of the screen with large empty bars above and below it, and looks like someone posted a YouTube video instead of a native Reel. Make a separate 9:16 version for vertical platforms. It takes 3 extra minutes and looks like you made it for the platform you are posting to.
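The numbers show how much space a landscape clip wastes in a vertical feed. Fitting a 1920x1080 frame into a 1080x1920 canvas without cropping scales it to the full 1080-pixel width, which leaves it only 607.5 pixels tall. About two-thirds of the screen is then empty. A short Python sketch of that fit:

```python
def fit_inside(src_w: int, src_h: int, canvas_w: int = 1080, canvas_h: int = 1920):
    """Scale a frame to fit inside the canvas without cropping.

    Returns (scaled_w, scaled_h, side_bar_px, top_bottom_bar_px, fraction_empty).
    """
    scale = min(canvas_w / src_w, canvas_h / src_h)  # largest scale that still fits
    scaled_w, scaled_h = src_w * scale, src_h * scale
    side_bar = (canvas_w - scaled_w) / 2
    top_bottom_bar = (canvas_h - scaled_h) / 2
    fraction_empty = 1 - (scaled_w * scaled_h) / (canvas_w * canvas_h)
    return scaled_w, scaled_h, side_bar, top_bottom_bar, fraction_empty


w, h, side, top_bottom, empty = fit_inside(1920, 1080)
print(w, h, side, top_bottom, round(empty, 2))  # 1080.0 607.5 0.0 656.25 0.68
```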

Caption font too small. This one is extremely common. The creator designs their audiogram on a laptop screen where 24px text looks perfectly readable. Then they post it and viewers see it on a 5-inch phone screen held at arm's length. 24px text on a phone screen in a social feed is nearly illegible. 36px is your floor. 48px or larger is often better. Err toward bigger.

Choosing the wrong clip. A clip that is interesting in the context of the full episode but requires 20 minutes of prior listening to understand is not a good standalone audiogram. If you find yourself adding multiple text overlays to explain who the guest is, what they meant by that reference, or what the larger point is, the clip is not right for audiogram use. Find a genuinely self-contained moment.

Posting without a caption. The audiogram itself is the video. The post caption is the hook that determines whether someone even clicks to expand and watch the video. "New episode clip" is not a caption. Tell people what this clip is about in one sentence that makes them want to watch. Then they watch. Then the CTA inside the audiogram does the rest.

Giving up after two weeks. Audiograms take time to build momentum. The first five audiograms you post will probably get modest engagement. That is normal. The algorithm needs to learn your audience. Your audience needs to learn your format. Consistency over 60 to 90 days is what builds the flywheel. Two weeks of mediocre numbers is not data. It is a warm-up.

FAQ

Do I need video of my podcast or just audio for audiograms?

Just audio is sufficient for traditional audiograms. The format puts audio over a designed background with an animated waveform. No video recording is required. However, if you do record video of your podcast (many creators record to YouTube), you have more options: you can create talking-head clips in addition to audio-only audiograms, and you can use still frames from the video as background images in the audiogram design.

How long should a podcast audiogram be?

20 to 45 seconds is the practical sweet spot for most use cases. Under 20 seconds often does not deliver a complete thought. Over 60 seconds loses a significant percentage of social media audiences. 30 to 35 seconds is the format that delivers the best combination of complete idea, maintained attention, and platform compatibility across TikTok, Reels, Twitter, and LinkedIn.

Can I use copyrighted music as background in my audiogram?

No, not without a license. Background music in audiograms is tempting from a production-quality standpoint but using commercially released music without a sync license violates copyright and can result in your post being muted, removed, or your account being flagged. Use royalty-free music from platforms like Epidemic Sound, Artlist, or Pixabay Music. Or use no background music at all. The speech and waveform carry the audiogram without needing music underneath.

Should the audiogram show the guest's face or just audio over a graphic?

Both work. A graphic-only audiogram (no face) is faster to produce and more consistent in style. A face-on-background design can perform better when the guest is well-known because their face is a signal of credibility and familiarity. For guests with significant audiences of their own, featuring their face is smart because their followers may discover the clip through tags. For most episodes with lesser-known guests, a well-designed graphic audiogram performs just as well without the complexity.

Should I post the same audiogram on every platform?

The same clip video can go to every platform, but adjust the aspect ratio and caption for each one. A 9:16 version for TikTok, Reels, and Shorts. A 1:1 or 16:9 version for Twitter and LinkedIn. The caption you write in the post should be customized per platform since each platform has different culture and what resonates with your LinkedIn audience is different from what resonates with your TikTok audience.

How do I get guests to share my audiograms?

Make it extremely easy. Send the guest the finished audiogram file directly (not a link that requires them to log in somewhere). Include the caption text already written for them. Tell them which platforms you think it would work best for. Most guests are happy to share content from their appearance but will not do the work of creating the post themselves. Hand them the ready-to-post file and they are much more likely to actually share it.

What if my podcast is video-only and not on YouTube?

If your podcast episode is hosted elsewhere as a video file, you can still create audiograms. Download the episode file, import the audio into your audiogram tool of choice, and proceed as normal. YTCut is specifically for YouTube URLs. For files hosted elsewhere, use your audiogram tool's direct file upload feature or extract audio from the video file using a local tool like FFmpeg or VLC before importing.
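For the FFmpeg route, one command can both trim the clip and strip the video. This Python sketch only builds the arguments and does not run them. It seeks to the start time, keeps the clip length given by `-t`, drops the video stream with `-vn`, and encodes the audio to MP3 with the `libmp3lame` encoder at 192 kbps. The filenames and times are placeholders, and your FFmpeg build must include LAME MP3 support.

```python
def extract_clip_audio(src: str, dst: str, start_s: float, duration_s: float,
                       bitrate_kbps: int = 192) -> list[str]:
    """FFmpeg arguments to cut a clip from a video file and save only its audio as MP3."""
    return [
        "ffmpeg",
        "-ss", f"{start_s:.3f}",    # seek before opening the input (fast)
        "-i", src,
        "-t", f"{duration_s:.3f}",  # clip length in seconds
        "-vn",                      # drop the video stream
        "-c:a", "libmp3lame", "-b:a", f"{bitrate_kbps}k",
        dst,
    ]


cmd = extract_clip_audio("episode142.mp4", "ep142-clip-trust-failure-moment.mp3",
                         start_s=753.5, duration_s=33.0)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to run, if FFmpeg is installed
```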