End-to-end: take a long-form recording (Twitch VOD, podcast session, livestream) and ship five ready-to-post 9:16 clips into a Google shared drive with caption quality that survives iteration. Every file in Drive has a stable webViewLink that never changes, no matter how many times you re-transcribe or re-render.
The pipeline is two project skills working in sequence: portrait-foreignfilm-clips (first pass, end-to-end) and caption-quality-boost (re-transcribe + replace in place). Both live in .claude/skills/.
This guide is written in the order of the pipeline. If you already have landscape-captioned clips on disk, skip to #portrait. If you're starting from a raw recording, begin at #source.
Two common sources: a Twitch VOD / YouTube upload, or a locally-recorded podcast session. Both end in the same place: one large .mp4 on disk.
# youtube / twitch
yt-dlp -f "bv*[height<=1080]+ba/b[height<=1080]" \
-o "source/%(title)s.%(ext)s" \
"https://youtu.be/<id>"
# or just drop a local recording into source/
cp ~/Movies/Session-2026-04-16.mp4 source/
Store recordings in ~/local/source/ or /tmp. MoviePy itself is fine with iCloud paths.
Three ways, pick one.
Transcribe the whole long-form once, then ask Claude to rank the top N moments against a rubric: hook strength, standalone coherence, quotability, visible screen activity. Output: a JSON list of {start, end, slug, title}.
Existing skill at .claude/skills/best-clips/. Scores long-form windows on visible coding activity + transcript energy. Good for stream recordings where the screen right-side moves.
For a curated show, write the timestamps by hand into a CSV.
# clip-list.csv
slug,start,end,title
experiment-until-you-beat-the-record,00:03:14,00:04:19,"Experiment until you beat the record"
i-wasnt-crazy-this-works,00:07:02,00:07:50,"I wasn't crazy — this works"
mcp-reliability-is-a-gamble,00:12:45,00:13:17,"MCP reliability is a gamble"
put-your-expertise-in-the-skills,00:21:30,00:22:19,"Put your expertise in the skills"
same-team-more-clients,00:33:12,00:34:00,"Same team, more clients"
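If you went the transcript-ranking route instead, the {start, end, slug, title} JSON converts mechanically into this same CSV. A stdlib sketch; the function name and the idea of passing the JSON as a string are illustrative:

```python
import csv
import io
import json

def moments_to_csv(moments_json: str) -> str:
    """Convert a ranked-moments JSON list of {start, end, slug, title}
    into the clip-list.csv format used by the extraction loop."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["slug", "start", "end", "title"])
    writer.writeheader()
    for m in json.loads(moments_json):
        # csv handles quoting, so commas inside titles survive round-tripping
        writer.writerow({k: m[k] for k in ("slug", "start", "end", "title")})
    return out.getvalue()
```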
Stream-copy for speed where cuts land on keyframes; re-encode with -c:v libx264 for frame-accurate cuts.
mkdir -p landscape-masters
while IFS=, read -r slug start end title; do
  [[ "$slug" == "slug" ]] && continue   # skip the CSV header row
  # -nostdin: otherwise ffmpeg consumes the rest of the CSV and the loop stops early
  ffmpeg -nostdin -y -ss "$start" -to "$end" -i "source/session.mp4" \
    -c:v libx264 -preset medium -crf 20 \
    -c:a aac -b:a 128k \
    "landscape-masters/${slug}.mp4"
done < clip-list.csv
Output: five 1662×1080 clips. Face PiP is already baked into the long-form recording (mac PiP camera dock), so every extracted clip carries it at roughly (1370, 790) with size 260×260.
openai-whisper on CPU. small is ~real-time; medium is ~3× slower but meaningfully better on proper nouns and jargon — start with small on the first pass, upgrade in iteration.
python3 .claude/skills/portrait-foreignfilm-clips/scripts/transcribe.py \
landscape-masters/
# writes landscape-masters/_transcripts/{slug}.json — whisper raw, word_timestamps=true
Optional but conventional in this project. Many upstream clips already ship with a yellow all-caps caption burned into the landscape master. If yours do, call that folder *-captioned/ and continue. If not, you can skip straight to the portrait stage — the foreignfilm caption layer in stage 7 stands on its own.
The cover bar in stage 6 is sized to fully obscure any existing burned-in captions at y ≈ 860–890 of the 1080-tall source. If your source has no prior captions, you can drop or shrink the bar.
The layout transform. Two independent crops from the same 1662×1080 landscape master, vertically stacked into a 1080×1920 canvas.
| layer | size | y range | source |
|---|---|---|---|
| face | 1080×1080 | 0–1080 | crop 260² @ (1370, 790), upscaled ~4.15× to 1080² |
| screen | 1080×840 | 1080–1920 | crop 1340×1080, scaled to fit width, center-cropped |
| cover bar | 1080×220 | 1700–1920 | solid black, opacity 1.0 — hides any v1 captions |
| caption PNG | ≤1080×180 | ~1780 | pre-rasterized per cue (stage 7) |
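The cover-bar sizing claim from the captioned stage (burned-in captions at y ≈ 860–890 of the source) can be verified with a few lines of arithmetic over the table's numbers; a pure check, no rendering:

```python
# Map the source caption band through the screen-layer transform and confirm
# it lands inside the cover bar's 1700-1920 span on the portrait canvas.
SRC_CROP_W, SRC_CROP_H = 1340, 1080   # screen crop from the landscape master
OUT_W = 1080                          # portrait canvas width
LAYER_Y, LAYER_H = 1080, 840          # screen layer offset/height on the canvas
BAR_TOP, BAR_BOTTOM = 1700, 1920

scale = OUT_W / SRC_CROP_W                   # ~0.806 to fit width
trim = (SRC_CROP_H * scale - LAYER_H) / 2    # ~15 px center-cropped per edge

def to_portrait_y(src_y: float) -> float:
    """Map a source row through scale, center-crop, and the vertical stack."""
    return LAYER_Y + src_y * scale - trim

for y in (860, 890):
    assert BAR_TOP <= to_portrait_y(y) <= BAR_BOTTOM
print(round(to_portrait_y(860)), round(to_portrait_y(890)))  # 1758 1782
```

So the old caption band occupies roughly portrait rows 1758–1782, comfortably inside the 220-pixel bar.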
Homebrew's ffmpeg 8 ships without libass, the subtitles filter, or even drawtext. So SRT/ASS burn-in is off the table. Instead: pre-rasterise each cue to a transparent PNG with PIL, then composite with moviepy.
# group whisper words into screen-cue chunks
groups = []; cur = []
for w in words:
    cur.append(w)
    if len(cur) >= 5 or cur[-1].end - cur[0].start >= 2.5 \
            or len(" ".join(x.word for x in cur)) >= 34:
        groups.append(cur); cur = []
if cur:
    groups.append(cur)  # flush the trailing partial cue
Each group becomes a PNG drawn with Georgia Bold Italic 60pt, fill #F2D21B, 4px black stroke, centered, wrapped at ~22 chars. Positioned y = 1920 - img_h - 30.
| field | value | why |
|---|---|---|
| font | Georgia Bold Italic | foreign-film default; reads warm and serious |
| size | 60pt | legible at thumb scroll scale |
| colour | #F2D21B | warm yellow, pops on dark screens |
| stroke | 4px black | survives bright backgrounds without a box |
| chunk | ≤5 words / ≤2.5s / ≤34 chars | TikTok-pace, readable before it moves |
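A minimal sketch of the per-cue rasterizer under those settings, assuming Pillow. The Georgia font path is a macOS guess, with a fallback so the sketch stays runnable; sizes and offsets mirror the spec table:

```python
import textwrap
from PIL import Image, ImageDraw, ImageFont

# Assumed macOS font path -- adjust per system; falls back to Pillow's default.
FONT_PATH = "/System/Library/Fonts/Supplemental/Georgia Bold Italic.ttf"

def render_cue(text: str, max_width: int = 1080) -> Image.Image:
    """Rasterize one caption cue to a transparent RGBA image."""
    try:
        font = ImageFont.truetype(FONT_PATH, 60)
    except OSError:
        font = ImageFont.load_default()
    lines = textwrap.wrap(text, width=22)   # ~22-char wrap from the spec
    line_h = 72                             # 60pt glyphs plus leading
    img = Image.new("RGBA", (max_width, line_h * len(lines) + 16), (0, 0, 0, 0))
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        w = draw.textlength(line, font=font)
        draw.text(((max_width - w) / 2, 8 + i * line_h), line, font=font,
                  fill="#F2D21B", stroke_width=4, stroke_fill="black")
    return img
```

The renderer then composites each PNG at y = 1920 - img.height - 30, per the positioning rule above.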
gws drive files update --upload replaces the media content of an existing file. The fileId and every webViewLink you've already sent keep working. Never create during iteration.
# list the destination folder, build a name → id inventory
gws drive files list \
--params '{"q":"\"<folder-id>\" in parents and trashed=false",
"driveId":"0AI4JyAzsoqRJUk9PVA",
"corpora":"drive",
"includeItemsFromAllDrives":true,
"supportsAllDrives":true,
"pageSize":200,
"fields":"files(id,name)"}'
# for each local mp4 — replace if match, create if new
cd "$OUT_DIR"
gws drive files update \
--params "{\"fileId\":\"$ID\",\"supportsAllDrives\":true}" \
--upload "$f" --upload-content-type video/mp4
gws won't accept --upload paths outside the current working directory. Always cd into the output folder and pass a basename.
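The replace-if-match / create-if-new branch is a pure function over the Drive inventory; a stdlib sketch with illustrative names:

```python
def plan_uploads(inventory: list[dict], local_files: list[str]) -> list[tuple]:
    """Decide update-vs-create per local file from the folder inventory.

    inventory: the files(id,name) list returned by `gws drive files list`.
    Returns (action, filename, fileId-or-None) triples; "update" keeps the
    existing fileId, so every webViewLink already shared stays valid.
    """
    by_name = {f["name"]: f["id"] for f in inventory}
    return [
        ("update", name, by_name[name]) if name in by_name
        else ("create", name, None)
        for name in sorted(local_files)
    ]
```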
Two-skill orchestration. Skill 1 does the first render and uploads. Skill 2 upgrades and replaces.
SESSION=measure-summit-2026
SRC="public/clips/portrait/podcast-clips/${SESSION}-captioned"
OUT="public/clips/portrait/podcast-clips/${SESSION}-portrait-ff"
python3 .claude/skills/portrait-foreignfilm-clips/scripts/transcribe.py "$SRC"
python3 .claude/skills/portrait-foreignfilm-clips/scripts/render_portrait_ff.py "$SRC" "$OUT"
bash .claude/skills/portrait-foreignfilm-clips/scripts/upload_clips.sh "$OUT" "${SESSION}-portrait-ff"
python3 .claude/skills/caption-quality-boost/scripts/retranscribe.py "$SRC" --model medium
python3 .claude/skills/portrait-foreignfilm-clips/scripts/render_portrait_ff.py "$SRC" "$OUT"
bash .claude/skills/caption-quality-boost/scripts/drive_replace.sh "$OUT" "${SESSION}-portrait-ff"
Render in one shell; the watcher in another uploads each clip the instant its .done sentinel appears.
# shell 1 — renderer touches {out}.done after each clip
python3 .claude/skills/portrait-foreignfilm-clips/scripts/render_portrait_ff.py "$SRC" "$OUT"
# shell 2 — watcher polls, replaces in place, writes .replaced
/bin/bash .claude/skills/caption-quality-boost/scripts/drive_replace_watch.sh "$OUT" "${SESSION}-portrait-ff"
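The watcher's core loop is just sentinel polling. A stdlib sketch; the `<clip>.mp4.done` naming convention is an assumption, so match it to whatever the renderer actually touches:

```python
import pathlib
import time

def watch_done(out_dir: str, seen: set[str], timeout: float = 2.0):
    """Poll out_dir for `<clip>.mp4.done` sentinels; yield each finished clip once."""
    out = pathlib.Path(out_dir)
    deadline = time.time() + timeout
    while time.time() < deadline:
        for marker in sorted(out.glob("*.done")):
            clip = marker.with_suffix("")   # strip `.done` to get the clip path
            if clip.name not in seen and clip.exists():
                seen.add(clip.name)
                yield clip                  # caller runs the Drive replace here
        time.sleep(0.2)
```

In the real script the loop runs until the renderer signals completion, and each yielded clip gets a `.replaced` marker after its Drive update succeeds.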
| lever | from → to | lift | cost |
|---|---|---|---|
| whisper model | small → medium | big win on proper nouns, jargon | ~3× CPU time |
| LLM polish pass | raw → Claude-cleaned | punctuation, split run-ons, fix names | ~1 API call / clip |
| chunk length | 5w / 2.5s → 3w / 1.5s | tighter pacing, more beats | config only |
| caption size | 60pt → 72pt | easier at thumb scale | config only |
| face crop tightness | 260² → 220² | closer face, more emotion | per-session tune |
| moment selection | manual → best-clips skill | finds higher-energy hooks you'd skim past | ~1 min / hr of source |
Gotchas:
-vf subtitles= is unavailable in this ffmpeg build. Use the PIL+moviepy renderer.
Use /bin/bash (Homebrew 5.x) or a case statement.
For uploads, cd first.
drive files create mints a new fileId and breaks every share link you sent. Use drive files update for iteration.
small mishears names. If your content is jargon-dense, jump to medium for the first pass and skip the re-render cycle.

project/
├── source/ # long-form recordings (gitignored)
├── landscape-masters/ # extracted 1662×1080 cuts
│ └── _transcripts/{slug}.json # whisper output
├── public/clips/portrait/podcast-clips/
│ ├── {session}-captioned/{slug}-captioned.mp4 # landscape + v1 caption
│ └── {session}-portrait-ff/{slug}-ff.mp4 # final 1080×1920
├── .claude/skills/
│ ├── portrait-foreignfilm-clips/ # first render + upload
│ └── caption-quality-boost/ # re-transcribe + replace
└── DOCUMENTATION/PORTRAIT-CAPTIONS-GUIDE.html # this doc