Video → context for LLMs

Your model can't watch video.
paned lets it read it.

paned turns any video into a rich, frame-by-frame multimodal description — what's on screen, what's said, what's written, and what happens — structured as clean text built to drop straight into your prompts.

interview_clip.mp4 decoding · 00:07
person · standing
marble counter
Dr. Maya Ito — Food Scientist
visual: Daylit kitchen, wide shot. A woman in an apron slices a red onion at a marble island.
on-screen: "Step 1 — Mise en place"
speech: "First, we get all our prep done."
audio: rhythmic knife on board
  • frame-accurate timestamps
  • speaker-labelled speech
  • on-screen text · OCR
  • scene & action
  • camera & shot type
  • JSON or markdown
The gap

Dropping a raw transcript into a prompt throws away most of the video.

A speech transcript captures the words and nothing else: no setting, no gestures, no on-screen text, no idea of what actually happened on camera. Models can only reason over what you give them. Give them the whole scene.

Plain transcript

  • Words only — no visual context whatsoever
  • On-screen text, slides and captions vanish
  • No actions, gestures, or scene changes
  • Speakers blurred together, timing approximate
  • Silent and visual-only moments simply disappear

paned multimodal description

  • Every modality — visuals, speech, text, audio, action
  • On-screen text and slides read and transcribed
  • Actions and scene transitions described per pane
  • Speakers labelled, frame-accurate timestamps
  • Structured so an LLM can cite and reason across it
The output

One video, decomposed into panes.

Each pane is a time-bounded slice of the video with every modality aligned — exactly the shape a language model reasons over best. Export as markdown for prompts or JSON for pipelines.

interview_clip.mp4 → paned.md · 04:12 runtime · 38 panes · markdown
00:00–00:04
PANE 01
visual: Wide shot, modern kitchen, daylight. A woman (30s, apron) stands at a marble island holding a chef's knife.
action: She slices a red onion into even half-moons, left to right.
on-screen: "Step 1 — Mise en place" (lower third)
speech: Speaker A (female): "First, we get all our prep done."
audio: [rhythmic knife on cutting board]
00:04–00:09
PANE 02
visual: Cut to medium close-up. Hands sweep chopped onion into a steel bowl; steam rises from a pan behind.
camera: Shot change — handheld, slight push-in.
speech: Speaker A: "The pan should already be hot — listen for it."
audio: [oil sizzle, rising]
00:09–00:15
PANE 03
visual: Insert graphic over black: an ingredient list animates in, three items at a time.
on-screen: "1 red onion · 2 tbsp olive oil · 1 tsp cumin · pinch of salt"
audio: [soft synth pad, no speech]
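
For readers who want to picture a pane as data, here is a minimal Python sketch of one possible in-memory shape and how it could be rendered back to the text layout shown above. The `Pane` class, its field names, and `fmt_ts` are hypothetical and chosen for illustration only; they are not paned's actual schema or API.

```python
from dataclasses import dataclass, field


def fmt_ts(seconds: int) -> str:
    """Format whole seconds as MM:SS, like the timestamps in the sample panes."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


@dataclass
class Pane:
    # Hypothetical schema: these field names are assumptions, not paned's real format.
    index: int
    start: int  # pane start, in whole seconds
    end: int    # pane end, in whole seconds
    fields: dict[str, str] = field(default_factory=dict)  # modality label -> description

    def to_markdown(self) -> str:
        """Render the pane as a header line followed by one line per modality."""
        header = f"{fmt_ts(self.start)}–{fmt_ts(self.end)} PANE {self.index:02d}"
        lines = [f"{label}: {text}" for label, text in self.fields.items()]
        return "\n".join([header, *lines])


pane = Pane(
    index=3,
    start=9,
    end=15,
    fields={
        "visual": "Insert graphic over black: an ingredient list animates in.",
        "audio": "[soft synth pad, no speech]",
    },
)
print(pane.to_markdown())
```

Keeping the modalities in a plain label-to-text mapping means a pane with no speech or no on-screen text simply omits that key, which mirrors how the sample panes above only list the modalities that are present.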
What it captures

Five modalities, aligned on one timeline.

01

Visuals & scene

Setting, subjects, composition and shot type — described per pane so the model knows what it's looking at.

02

Speech & speakers

Accurate transcription with speaker labels and frame-accurate timing — every line anchored to where it happens.

03

On-screen text

Captions, slides, lower thirds and UI read straight off the frame via OCR — nothing on screen is lost.

04

Action & audio

What people and objects do, plus non-speech sound — sizzles, applause, music cues — captured as events.
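
As a rough illustration of what "aligned on one timeline" can mean in practice, the Python sketch below buckets per-modality events into time-bounded panes by start time. The event tuples, the pane boundaries, and the `align` function are all assumptions made for this example; the source does not describe how paned performs alignment internally.

```python
from collections import defaultdict

# Hypothetical inputs: each event is (start_sec, end_sec, modality, text).
events = [
    (0.0, 4.0, "visual", "Wide shot, modern kitchen, daylight."),
    (0.5, 3.2, "speech", 'Speaker A: "First, we get all our prep done."'),
    (0.0, 4.0, "audio", "[rhythmic knife on cutting board]"),
    (4.0, 9.0, "visual", "Cut to medium close-up."),
    (4.0, 9.0, "audio", "[oil sizzle, rising]"),
]
# Hypothetical pane boundaries as (start_sec, end_sec), end-exclusive.
pane_bounds = [(0.0, 4.0), (4.0, 9.0)]


def align(events, pane_bounds):
    """Assign each event to the pane whose time range contains its start time."""
    panes = [defaultdict(list) for _ in pane_bounds]
    for start, _end, modality, text in sorted(events):
        for i, (p_start, p_end) in enumerate(pane_bounds):
            if p_start <= start < p_end:
                panes[i][modality].append(text)
                break
    return [dict(p) for p in panes]


aligned = align(events, pane_bounds)
for bounds, pane in zip(pane_bounds, aligned):
    print(bounds, pane)
```

Assigning by start time is the simplest possible rule; an event that spans a pane boundary would need a policy of its own (split it, or attach it to the pane with the largest overlap), which this sketch leaves out.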

How it works

From file to prompt-ready in three moves.

1

Point it at a video

Upload a file or pass a URL. Any length, any format — paned segments it into coherent panes automatically.

2

It decodes every pane

Vision, speech, OCR and audio models run per pane and are merged onto a single aligned timeline.

3

Drop it in your prompt

Get back paned.md or paned.json — paste into a prompt or stream it into your pipeline.
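
To make the last step concrete, here is a minimal Python sketch of turning a JSON export into prompt context. The JSON keys (`source`, `panes`, `start`, `end`, and the modality names) and the `build_prompt` helper are hypothetical; the real paned.json layout is not specified here, so treat this as one plausible shape rather than the actual format.

```python
import json

# Hypothetical paned.json payload: the keys below are assumptions for illustration.
paned_json = """
{
  "source": "interview_clip.mp4",
  "panes": [
    {"start": "00:00", "end": "00:04",
     "visual": "Wide shot, modern kitchen, daylight.",
     "speech": "Speaker A: First, we get all our prep done."},
    {"start": "00:04", "end": "00:09",
     "visual": "Cut to medium close-up.",
     "audio": "[oil sizzle, rising]"}
  ]
}
"""


def build_prompt(doc: dict, question: str) -> str:
    """Flatten panes into timestamped text blocks and append the question."""
    blocks = []
    for pane in doc["panes"]:
        lines = [f"[{pane['start']}-{pane['end']}]"]
        lines += [f"{k}: {v}" for k, v in pane.items() if k not in ("start", "end")]
        blocks.append("\n".join(lines))
    context = "\n\n".join(blocks)
    return (
        f"Video: {doc['source']}\n\n{context}\n\n"
        f"Question: {question}\nCite pane timestamps in your answer."
    )


prompt = build_prompt(json.loads(paned_json), "What is the first step?")
print(prompt)
```

Prefixing every block with its time range gives the model something stable to cite, so an answer can point back to the exact pane it relied on.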

Where teams use it

Anywhere an LLM needs to understand footage.

Video RAG & semantic search
Meeting & interview analysis
Content moderation
Course & lecture summarization
Highlight & clip generation
Ad & brand-safety review
Sports & broadcast tagging
Accessibility & audio description
Training-data captioning
Early access

Give your model eyes.

paned is opening up to a first wave of teams building with video. Leave your email and we'll be in touch.

No spam — just an invite when your spot opens.