My way of practicing English is simple: I watch one episode of the CBS Evening News every day. The hard part is the word "every" — humans are lazy — so I wrote the whole pipeline to take that decision away from me.

What happens before I wake up

Starting at 09:00 local time, a worker on the home server polls @CBSEveningNews every two hours for a new full broadcast. When it finds one, it calls yt-dlp to pull stream 137 (1080p H.264 video) and stream 140 (m4a audio at 128k), then repackages the merged file with ffmpeg:

ffmpeg -i input.mp4 -c copy -movflags +faststart output.mp4

The +faststart flag moves the moov atom, the index that tells a player where every frame sits, to the head of the file. Without it the index stays at the tail, and the browser can't start playing or seeking until it has reached the end of the file, which kills the experience of "rewind to that one sentence I want to listen to again."
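
For illustration, here is a rough Python sketch of the download-and-remux step, assuming the worker simply shells out to the yt-dlp and ffmpeg command-line tools. The function names, the file names, and the video_url argument are placeholders invented for this sketch, not the pipeline's real code. Only the two format numbers and the ffmpeg flags come from the description above.

```python
import subprocess
from pathlib import Path


def ytdlp_command(video_url: str, raw_path: Path) -> list[str]:
    """Build the yt-dlp call: 137 = video, 140 = audio, as in the text."""
    return [
        "yt-dlp",
        "-f", "137+140",                 # download both and merge them
        "--merge-output-format", "mp4",  # keep the merged container as MP4
        "-o", str(raw_path),
        video_url,
    ]


def faststart_command(raw_path: Path, final_path: Path) -> list[str]:
    """Build the ffmpeg remux: copy streams untouched, move moov to the front."""
    return [
        "ffmpeg", "-i", str(raw_path),
        "-c", "copy",
        "-movflags", "+faststart",
        str(final_path),
    ]


def fetch_episode(video_url: str, workdir: Path) -> Path:
    """Download one episode and return the path of the seek-friendly file."""
    raw, final = workdir / "input.mp4", workdir / "output.mp4"
    subprocess.run(ytdlp_command(video_url, raw), check=True)
    subprocess.run(faststart_command(raw, final), check=True)
    return final
```

Building the argument lists in separate functions keeps the commands inspectable without touching the network, and check=True makes a failed download stop the run instead of remuxing a missing file.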

YouTube's auto-generated captions are parsed into a subtitle_cues table, one row per cue with start/end and English text. If captions are missing for a given day, a second worker fires at 02:00 and runs faster-whisper base.en (int8 quantized) at low priority over the audio track. It costs an extra fifteen minutes of CPU time and never starves the daytime web server.
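
The caption step could look roughly like the Python sketch below. It assumes the captions arrive as a WebVTT file with an hours field in every timestamp, and that subtitle_cues lives in SQLite. The column names and the helper names are guesses made for this example. Real auto-generated captions also repeat lines across cues, which this sketch does not deduplicate. The fallback function uses the faster-whisper package with the model and quantization named above; os.nice is Unix-only.

```python
import re
import sqlite3

# "HH:MM:SS.mmm --> HH:MM:SS.mmm", optionally followed by cue settings.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d+):(\d{2}):(\d{2})\.(\d{3})"
)
TAG = re.compile(r"<[^>]+>")  # inline markup such as <c> or per-word timestamps


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the four timestamp fields to integer milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_vtt(text: str) -> list[tuple[int, int, str]]:
    """Return (start_ms, end_ms, text) for each cue in a WebVTT document."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                parts = match.groups()
                body = " ".join(TAG.sub("", ln) for ln in lines[i + 1:])
                body = " ".join(body.split())  # collapse whitespace
                if body:
                    cues.append((to_ms(*parts[:4]), to_ms(*parts[4:]), body))
                break
    return cues


def store_cues(db: sqlite3.Connection, video_id: str, cues) -> None:
    """Write one row per cue; the schema here is an assumption."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS subtitle_cues ("
        "video_id TEXT, start_ms INTEGER, end_ms INTEGER, text_en TEXT)"
    )
    db.executemany(
        "INSERT INTO subtitle_cues VALUES (?, ?, ?, ?)",
        [(video_id, start, end, text) for start, end, text in cues],
    )
    db.commit()


def transcribe_fallback(audio_path: str) -> list[tuple[int, int, str]]:
    """Fallback when captions are missing; needs the faster-whisper package."""
    import os
    from faster_whisper import WhisperModel  # lazy import: optional dependency

    os.nice(19)  # raise niceness so the web server gets the CPU first
    model = WhisperModel("base.en", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    return [(int(s.start * 1000), int(s.end * 1000), s.text.strip()) for s in segments]
```

Because both paths return the same (start_ms, end_ms, text) tuples, store_cues does not need to know whether a cue came from YouTube or from Whisper.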

What I do when I wake up

I open the browser. The video is already there. Bilingual subtitles sit beneath it; click any English word and the offline ECDICT — about 770,000 entries — returns a definition in under 100 ms. The handful of words ECDICT doesn't know fall back to dictionaryapi.dev. New words save to a personal vocab list that exports to CSV or directly to an Anki deck.
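
The offline-first lookup can be sketched in Python as below. This is an illustration under assumptions, not the site's actual code: it supposes ECDICT has been loaded into a SQLite table, and the table name entries and its two columns are hypothetical. The fallback calls the public dictionaryapi.dev endpoint and keeps only the first definition it returns.

```python
import json
import sqlite3
import urllib.parse
import urllib.request
from typing import Callable, Optional

API = "https://api.dictionaryapi.dev/api/v2/entries/en/"


def fetch_online(word: str) -> Optional[str]:
    """Ask dictionaryapi.dev; return the first definition, or None on any failure."""
    try:
        with urllib.request.urlopen(API + urllib.parse.quote(word), timeout=5) as resp:
            data = json.load(resp)
        return data[0]["meanings"][0]["definitions"][0]["definition"]
    except (OSError, ValueError, KeyError, IndexError, TypeError):
        # Network error, 404 for an unknown word, or an unexpected payload.
        return None


def lookup(
    db: sqlite3.Connection,
    word: str,
    online: Callable[[str], Optional[str]] = fetch_online,
) -> tuple[Optional[str], str]:
    """Return (definition, source); try the local table first, then the web."""
    key = word.strip().lower()
    row = db.execute(
        "SELECT definition FROM entries WHERE word = ?", (key,)  # hypothetical schema
    ).fetchone()
    if row:
        return row[0], "ecdict"
    definition = online(key)
    if definition:
        return definition, "dictionaryapi.dev"
    return None, "miss"
```

Passing the online fetcher as a parameter means the local path can be exercised with no network at all, which is also the path that has to stay fast.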

The mode I use most is dictation. The audio plays, the subtitles are hidden, I type what I hear into an input box, and on submit it diffs my line against the cue and highlights what I got wrong. It's the only thing I've found that consistently catches words I think I heard right but didn't.
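
One way to grade a dictation line is a word-level diff. The Python sketch below uses the standard library's difflib for that. It is an assumption about how such a check could work, not a description of the site's real diff, and the function names and status labels are made up for the example.

```python
import re
from difflib import SequenceMatcher


def words(line: str) -> list[str]:
    """Lowercase and drop punctuation so 'Tonight,' matches 'tonight'."""
    return re.findall(r"[a-z0-9']+", line.lower())


def grade(typed: str, cue: str) -> list[tuple[str, str, str]]:
    """Return (status, expected, heard) runs comparing the typed line to the cue.

    status is 'ok', 'wrong' (substituted), 'missed' (left out) or 'extra'.
    """
    want, got = words(cue), words(typed)
    labels = {"equal": "ok", "replace": "wrong", "delete": "missed", "insert": "extra"}
    result = []
    matcher = SequenceMatcher(a=want, b=got, autojunk=False)
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        expected = " ".join(want[a1:a2])
        heard = " ".join(got[b1:b2])
        result.append((labels[op], expected, heard))
    return result
```

Typing "the president signed a bill" against the cue "The president signed the bill." yields an 'ok' run, then ('wrong', 'the', 'a'), then another 'ok' run, which is exactly the kind of small function-word slip that is easy to miss by ear.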

The frontend has no build chain. Just HTML and ES modules. I didn't want a Raspberry Pi running webpack every time I edit a stylesheet.