
The Audio Layer

Type: Reference | Created: | Team: Platform
draft

Why narration is a separate file

Text-to-speech reading raw markdown is unlistenable — it speaks the punctuation, the code, the syntax, and the heading hashes. Every doc that wants audio gets a hand-written .audio.txt companion. The narration adapts the doc for the ear: paraphrasing diagrams, smoothing acronyms, structuring breath. No file → no audio reader on the page.

File shape

  • Path mirrors the markdown URL. docs/foo/bar.md → docusaurus/static/audio/foo/bar.audio.txt.
  • Open with one or two sentences before the first ## . This becomes a synthetic "Introduction" section that the reader speaks under the page title — without it, the listener jumps straight to the first H2 and never hears the doc title. Don't restate the title (the reader speaks it for you at higher pitch); use the intro to add real context — what the doc covers, who it's for.
  • ## Title → H2 section. ### Title → H3 section. Don't go deeper.
  • Section titles must mirror the markdown headings exactly — that's how the per-heading play triggers map to sections.
  • Don't echo a heading in its first sentence. The reader speaks each heading for you at higher pitch.
  • Skip frontmatter, page-level metadata, and visual-only callouts.
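
The file shape above can be sketched in a few lines of Python. This is a minimal illustration, not the reader's actual loader: the function names `audio_path_for` and `parse_sections` are hypothetical, the two root folders are taken from the example path, and representing the opening sentences as a level-0 "Introduction" entry is an assumption about how the synthetic section might be modelled.

```python
from pathlib import PurePosixPath

# Roots taken from the example path in this doc; adjust if the repo differs.
DOCS_ROOT = PurePosixPath("docs")
AUDIO_ROOT = PurePosixPath("docusaurus/static/audio")


def audio_path_for(markdown_path: str) -> str:
    """Map docs/foo/bar.md to docusaurus/static/audio/foo/bar.audio.txt."""
    relative = PurePosixPath(markdown_path).relative_to(DOCS_ROOT)
    return str(AUDIO_ROOT / relative.with_name(relative.stem + ".audio.txt"))


def parse_sections(audio_text: str) -> list[tuple[int, str, str]]:
    """Return (level, title, body) tuples in file order.

    Text before the first heading becomes a level-0 "Introduction" entry
    (hypothetical representation). Only ## and ### start a section;
    deeper headings are not recognised and stay in the body.
    """
    sections: list[tuple[int, str, str]] = []
    level, title, lines = 0, "Introduction", []

    def flush() -> None:
        body = " ".join(part.strip() for part in lines if part.strip())
        # Keep every headed section; keep the intro only if it has text.
        if level > 0 or body:
            sections.append((level, title, body))

    for line in audio_text.splitlines():
        if line.startswith("## ") or line.startswith("### "):
            flush()
            hashes, _, heading = line.partition(" ")
            level, title, lines = len(hashes), heading.strip(), []
        else:
            lines.append(line)
    flush()
    return sections
```

Calling `audio_path_for("docs/foo/bar.md")` gives the companion path from the first bullet, and `parse_sections` makes it easy to spot a file that has no intro text before its first heading.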

Sentence rules

  • A period, exclamation mark, or question mark ends an utterance. The TTS engine plays each utterance independently, which gives prev/next finer granularity and dodges Chrome's silent-cutoff bug on long sentences.
  • 15–25 words per sentence. Break run-ons.
  • Convert mid-sentence colons and semicolons to periods: "Two things: A and B" → "Two things. First, A. Second, B."
  • Em-dashes stay inside an utterance.
  • Aim for 30 seconds to 2 minutes per section. The title sentence is the natural breath — don't try to add pauses with ... (SpeechSynthesis ignores them).
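
A rough way to check narration against these rules before committing is sketched below. It assumes an utterance ends at a period, exclamation mark, or question mark followed by whitespace. The reader's real splitter may differ, and the helper names `split_utterances` and `long_utterances` are hypothetical.

```python
import re

MAX_WORDS = 25  # upper bound of the 15-25 word guideline


def split_utterances(text: str) -> list[str]:
    """Split after ., ! or ? when followed by whitespace.

    Colons, semicolons, and dashes do not split, so they stay inside
    an utterance.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def long_utterances(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return the utterances that exceed the word limit."""
    return [u for u in split_utterances(text) if len(u.split()) > max_words]
```

Anything `long_utterances` returns is a candidate for breaking into shorter sentences. Unconverted abbreviations such as "e.g." would also split here, which is one more reason to rewrite them first.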

Speaking the unspeakable

| Source | Write as |
| --- | --- |
| Times (09:00, 5:30 P M) | 9 A M, 5 30 P M |
| Numbered steps (Step 1) | Step one |
| Acronyms not pronounced as words | Spell with spaces: U R L, A P I, I D |
| Code identifiers in prose (MANUAL_ENTRY_REQUEST) | Natural names: Manual Entry Request. Keep literal flag names when discussing the flag itself: isCompleted flips true. |
| Latin abbreviations | e.g. → for example; i.e. → that is; etc. → and so on |
| Markdown syntax (**bold**, _italic_, backticks, link URLs) | Drop entirely (keep link text) |
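
Narration is hand-written, but a small helper can produce a first draft or flag what still needs rewriting. The sketch below covers only three rows of the table: markdown syntax, Latin abbreviations, and spelled-out acronyms. The function name `speakable` and the acronym set are assumptions for illustration. Times, step numbers, and code identifiers are left to the author.

```python
import re

LATIN = {"e.g.": "for example", "i.e.": "that is", "etc.": "and so on"}
# Examples from the table; a real project would extend this set.
SPELLED_ACRONYMS = {"URL", "API", "ID"}


def speakable(text: str) -> str:
    """Apply a few of the table's substitutions to one line of prose."""
    # Links: keep the link text, drop the URL.
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Bold and backtick markers are dropped; the words stay.
    text = re.sub(r"\*\*|`", "", text)
    # _italic_ markers, but not underscores inside identifiers.
    text = re.sub(r"(?<!\w)_(.+?)_(?!\w)", r"\1", text)
    for short, spoken in LATIN.items():
        text = text.replace(short, spoken)

    def spell(match: re.Match) -> str:
        word = match.group(0)
        return " ".join(word) if word in SPELLED_ACRONYMS else word

    # Spell listed acronyms letter by letter: API becomes A P I.
    return re.sub(r"\b[A-Z]{2,}\b", spell, text)
```

One known gap: an "etc." that closes a sentence loses its full stop when replaced, so sentence ends still need a manual check.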

Embedded content

  • Mermaid diagrams — describe in third person: open with the diagram type, then walk its structure. "Here is a state machine with three states. The flow begins at…"
  • Code blocks — don't read line-by-line. One or two sentences on what the snippet does.
  • Numbered lists → "Step one. Step two." or "First. Second. Third."
  • Bulleted lists — preface with the count if it's known: "Three things to know. First, …". Otherwise just sentences, blank-line separated.
  • Small tables — paraphrase the contrast in prose. Larger tables — summarise and point to the page: "See the table on the page for the full breakdown."

Cross-references

Don't skip "See also" — give each linked item one short sentence so the listener knows what's there without navigating. In body text, use "See the X chapter" / "covered in the Y section" — drop URLs.

Pronunciations

Some product, team, or feature names get mispronounced. Register phonetic spellings here so authors and the AI generator stay consistent.

| Term | Spell as | Notes |
| --- | --- | --- |
| Skapp | (TBD, confirm with team) | Product name |

When adding an entry: pick a phonetic spelling, run a 5-second test on the on-page reader to confirm it sounds right, then commit.

Validation

The audio file's sections must line up exactly with the markdown's headings. Every ## or ### in the markdown should have a matching one in the audio file, and vice versa. CI fails the PR when they don't match.

Two ways things break:

  • A heading exists in the markdown but not the audio file. Clicking the heading's play button would do nothing.
  • A marker exists in the audio file but not the markdown. The reader would announce a section that's gone from the page.

The check is forgiving on capitals and punctuation — ## Foo: Bar and ## foo bar count as the same.
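
The comparison can be approximated locally before pushing. The sketch below is not the CI script itself. It assumes headings are matched by title only, after lowercasing and stripping punctuation, and that headings inside fenced code blocks are ignored. The names `normalize`, `headings`, and `compare` are hypothetical.

```python
import re

# Matches ## and ### headings only; # and #### are ignored.
HEADING = re.compile(r"^(#{2,3})\s+(.+?)\s*$")


def normalize(title: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace."""
    stripped = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(stripped.split())


def headings(text: str) -> set[str]:
    """Collect normalized H2/H3 titles, skipping fenced code blocks."""
    found: set[str] = set()
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        match = None if in_fence else HEADING.match(line)
        if match:
            found.add(normalize(match.group(2)))
    return found


def compare(markdown: str, audio: str) -> tuple[list[str], list[str]]:
    """Return (missing from audio, extra in audio) as sorted title lists."""
    md, au = headings(markdown), headings(audio)
    return sorted(md - au), sorted(au - md)
```

Two empty lists mean the files line up. A non-empty first list corresponds to the first failure mode above, and a non-empty second list to the other.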