Hedra Track
Module 1 of 6

Hedra Fundamentals

Understand what Hedra does, create your first character from a portrait, and generate audio-driven animation in minutes.

16 min read

What You'll Learn

  • Understand what Hedra is and how its Character-3 model generates audio-driven character animation
  • Create your first talking avatar video by uploading a portrait and syncing it to audio
  • Navigate the Hedra Studio interface including the workspace, settings panels, and export controls
  • Recognize the difference between Hedra and traditional lip-sync or video editing tools
  • Apply best practices for source image selection to maximize animation quality

What Hedra Does and Why It Matters

Hedra is a generative video platform built around one core idea: audio should drive everything. Rather than animating a character first and then trying to match sound to movement, Hedra's Character-3 model takes your audio as the primary input and generates facial animation, head motion, eye tracking, and natural micro-expressions to match it. The result is a character that feels like it is genuinely speaking, not a puppet with dubbed-over sound.

The platform launched publicly in 2024 and grew to over 350,000 users producing more than 1.6 million video generations within its first year. That adoption rate reflects a real gap Hedra filled: most creators who wanted a talking character on screen previously needed either a camera, an actor, or expensive 3D animation software. Hedra collapsed that requirement to a single portrait image and an audio file.

Character-3 is the current flagship model. It improves on earlier versions with more expressive eye movement, upper-body sway, natural blinking patterns, and the ability to handle a wide range of image styles - from photorealistic portraits to illustrated characters and even stylized art. The model treats audio as an omnimodal signal, meaning it does not just match phonemes to mouth shapes. It reads the emotional content, the pacing, and the stress patterns of speech and adjusts the entire face and upper body accordingly.

Hedra Studio is the browser-based workspace where you build your videos. It uses a node-based workflow model, so you connect an image source, an audio source, and a Character-3 generation node, then export. This approach makes it composable - you can swap inputs, adjust settings, and regenerate specific clips without rebuilding your entire project from scratch. Understanding this node model from the start will save you significant time as your projects grow in complexity.
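The node model described above can be sketched in a few lines of Python. This is a conceptual illustration only; the class and field names are assumptions made for this sketch, not Hedra's actual data model:

```python
from dataclasses import dataclass

# Conceptual sketch of Hedra Studio's three-node workflow.
# Class and field names are illustrative assumptions, not Hedra's API.

@dataclass
class ImageNode:
    path: str  # portrait image from the asset library

@dataclass
class AudioNode:
    path: str  # speech audio (upload or text-to-speech output)

@dataclass
class Character3Node:
    image: ImageNode
    audio: AudioNode

    def describe(self) -> str:
        return f"animate {self.image.path} driven by {self.audio.path}"

# Build the default three-node project...
project = Character3Node(ImageNode("portrait.png"), AudioNode("intro.mp3"))
print(project.describe())

# ...then swap just the audio input and regenerate. The image node and
# settings are untouched, which is the composability the text describes.
project.audio = AudioNode("intro_v2.mp3")
print(project.describe())
```

The point of the sketch is the swap at the end: replacing one input node leaves the rest of the graph intact, so regenerating a clip never means rebuilding the project.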

Quick Test: Complete the Full Hedra Generation Loop

1. Go to hedra.com and sign up for a free account.

2. Upload any clear, front-facing portrait photo (a headshot works perfectly).

3. Type a short sentence into the text-to-speech field and pick any voice.

4. Generate your first video - do not worry about quality settings yet.

5. Observe how the character blinks, shifts, and moves even during pauses in the speech.

The Character-3 Model - How It Works

Character-3 processes your input portrait through a pipeline that extracts a detailed facial geometry map. This map tracks key landmarks - the corners of the mouth, the positions of the eyes, the jawline, and dozens of intermediate points - and uses them as anchors for the animation. When you supply audio, the model predicts the sequence of facial states that would produce that sound naturally, then renders each frame by warping and synthesizing from your original image.

This approach has a few important practical implications. First, image quality matters enormously. A blurry, low-resolution, or heavily shadowed portrait gives the model less data to work with, and the resulting animation will look less convincing. A well-lit, high-resolution, front-facing image produces dramatically better results. Second, the model reliably handles head angles up to approximately 30 degrees off center. Beyond that, quality degrades because the model has less information about the obscured side of the face.

Third, and most usefully, Character-3 is not limited to photorealistic human faces. Illustrated characters, cartoon avatars, stylized portraits, and even some abstract representations can be animated effectively, as long as the image contains recognizable facial landmarks. This makes Hedra valuable for brand mascots, educational characters, animated book covers, and content that explicitly should not look like a real person.

The generation time for a standard clip at 720p is typically 30 to 90 seconds depending on server load and clip length. Hedra's free tier produces videos with a watermark. Paid plans remove the watermark, increase resolution options to 1080p, and provide priority generation queue access.

Navigating the Hedra Studio Interface

The Hedra Studio workspace is organized around a central canvas where your workflow nodes live. On the left panel you will find your asset library - uploaded images and audio files. The right panel shows the properties of whichever node is currently selected. The top bar contains your export and settings controls.

When you create a new project, Hedra gives you a pre-configured starting template with three nodes already connected: an Image node, an Audio node, and a Character-3 node. To get started, you replace the placeholder content in the Image and Audio nodes with your own assets, then click Generate on the Character-3 node.

The Character-3 node has several parameters worth understanding. Motion intensity controls how much the character moves beyond the minimum required by the audio. A low value produces a more composed, stationary delivery; a high value creates more head movement and body sway. Expression sensitivity affects how dramatically the face responds to emotional cues in the audio. Duration is automatically set by the length of your audio but can be manually capped if you want a shorter clip. Aspect ratio and resolution settings are also in this node - 9:16 vertical format is ideal for social platforms, 16:9 landscape for YouTube or presentations.

After generation, the output node shows a preview player. You can scrub through the clip, check lip-sync accuracy at specific words, and decide whether to export or regenerate with different settings. Exports download as MP4 files, which are compatible with every major video editing tool.

Try This Yourself

Generate the same audio clip three times with the motion intensity slider set to low, medium, and high. Download all three clips and play them side by side. Notice how the character's posture and expressiveness change. Then try the same test with expression sensitivity. This gives you an intuitive feel for what each slider does before you build anything serious.

Source Image Best Practices

The single biggest lever you have over output quality in Hedra is the quality of your input portrait. A few straightforward guidelines make a significant difference.

Resolution: Use images at 512x512 pixels or larger. The ideal is around 1024x1024 for a square crop or the equivalent in your target aspect ratio. Higher resolution gives the model more detail to preserve during animation, resulting in sharper skin texture, clearer eye detail, and more convincing lip movement.

Lighting: Even, diffuse lighting that illuminates both sides of the face equally produces the best results. Harsh side lighting creates shadows that the model interprets as facial features, which can distort the animation. If you are generating AI portraits specifically for Hedra, prompt for "studio lighting" or "soft diffuse lighting."

Expression: A neutral or mildly pleasant expression in the source image is preferable to an open-mouthed smile. A wide smile locks the mouth muscles in a position that conflicts with speech animation. Similarly, avoid images where the subject is already mid-sentence or making an exaggerated expression.

Framing: Center the face in the frame with approximately 20 percent clear space around the head. Tight crops that cut off the chin or forehead limit the model's ability to generate natural head movement. Full-body images work but the face becomes small, reducing animation quality.

Background: A clean, non-distracting background helps the model isolate the subject. Complex busy backgrounds do not prevent animation but can appear to warp or shimmer in the output as the model focuses processing on the face. A plain or blurred background gives the cleanest result.
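The checklist above translates into a simple pre-flight check you could run before uploading. The function below is a hypothetical helper, not a Hedra tool: the thresholds mirror this module's guidance, face detection is left to the caller, and the 20 percent margin is interpreted here relative to the detected face size:

```python
def source_image_warnings(width: int, height: int,
                          face_box: tuple[int, int, int, int]) -> list[str]:
    """Check a portrait against this module's source-image guidelines.

    face_box is (left, top, right, bottom) in pixels; obtaining it is up
    to the caller (any face-detection library will do). Thresholds are
    this module's rules of thumb, not Hedra requirements, and the 20%
    clear-space margin is measured relative to the face's size.
    """
    warnings = []
    if min(width, height) < 512:
        warnings.append("use at least 512x512; ~1024x1024 is ideal")
    left, top, right, bottom = face_box
    margin = 0.2 * max(right - left, bottom - top)
    if (left < margin or top < margin
            or (width - right) < margin or (height - bottom) < margin):
        warnings.append("leave ~20% clear space around the head")
    return warnings

# Well-framed 1024x1024 headshot, face centered with generous margins.
print(source_image_warnings(1024, 1024, (300, 300, 724, 724)))  # []

# Tight crop that clips the forehead: triggers the clear-space warning.
print(source_image_warnings(600, 600, (50, 0, 550, 560)))
```

Checks for lighting, expression, and background resist simple measurement, so this sketch covers only the two guidelines that reduce to geometry.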

Core Insights

  • Hedra Character-3 treats audio as the primary driver of all facial and body animation, producing movement that is genuinely synchronized rather than post-processed.
  • Image quality is the single most controllable factor in output quality - a high-resolution, well-lit, front-facing portrait dramatically outperforms a casual snapshot.
  • The node-based Hedra Studio workflow makes it easy to swap inputs and regenerate specific clips without rebuilding entire projects.
  • Character-3 works with illustrated and stylized characters, not just photorealistic faces, opening up use cases for brand mascots and animated content.
  • Motion intensity and expression sensitivity sliders give precise control over how much the character moves, allowing both composed formal delivery and expressive emotional performance.