The Audio Layer
Why narration is a separate file
Text-to-speech reading raw markdown is unlistenable — it speaks the punctuation, the code, the syntax, and the heading hashes. Every doc that wants audio gets a hand-written .audio.txt companion. The narration adapts the doc for the ear: paraphrasing diagrams, smoothing acronyms, structuring breath. No file → no audio reader on the page.
File shape
- Path mirrors the markdown URL. `docs/foo/bar.md` → `docusaurus/static/audio/foo/bar.audio.txt`.
- Open with one or two sentences before the first `##`. This becomes a synthetic "Introduction" section that the reader speaks under the page title. Without it, the listener jumps straight to the first H2 and never hears the doc title. Don't restate the title (the reader speaks it for you at higher pitch); use the intro to add real context: what the doc covers, who it's for.
- `## Title` → H2 section. `### Title` → H3 section. Don't go deeper.
- Section titles must mirror the markdown headings exactly. That's how the per-heading play triggers map to sections.
- Don't echo a heading in its first sentence. The reader speaks each heading for you at higher pitch.
- Skip frontmatter, page-level metadata, and visual-only callouts.
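
The file shape above can be sketched in a few lines of Python. This is only an illustration, not the site's actual tooling: the helper names `audio_path_for` and `parse_sections`, the section dictionaries, and the use of level 1 for the synthetic introduction are all assumptions made for the example.

```python
import re
from pathlib import PurePosixPath

DOCS_ROOT = PurePosixPath("docs")
AUDIO_ROOT = PurePosixPath("docusaurus/static/audio")

# Only ## and ### open a section; deeper headings are not recognised.
HEADING_RE = re.compile(r"^(#{2,3})\s+(.*\S)\s*$")


def audio_path_for(md_path: str) -> str:
    """Map docs/foo/bar.md to docusaurus/static/audio/foo/bar.audio.txt."""
    rel = PurePosixPath(md_path).relative_to(DOCS_ROOT)
    return str(AUDIO_ROOT / rel.with_name(rel.stem + ".audio.txt"))


def parse_sections(text: str) -> list[dict]:
    """Split narration text into sections (hypothetical structure).

    Text before the first ## becomes a synthetic "Introduction" section.
    """
    sections = [{"level": 1, "title": "Introduction", "body": []}]
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            sections.append(
                {"level": len(match.group(1)), "title": match.group(2), "body": []}
            )
        elif line.strip():
            sections[-1]["body"].append(line.strip())
    # Drop the synthetic introduction when the file opens directly with a heading.
    return [s for s in sections if s["level"] > 1 or s["body"]]
```

Calling `audio_path_for("docs/foo/bar.md")` gives the companion path from the first bullet, and a file that opens straight with `##` simply has no introduction section.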
Sentence rules
- A period (or `!` `?`) ends an utterance. The TTS engine plays each utterance independently, which gives prev/next finer granularity and dodges Chrome's silent-cutoff bug on long sentences.
- 15–25 words per sentence. Break run-ons.
- Convert mid-sentence colons and semicolons to periods: "Two things: A and B" → "Two things. First, A. Second, B."
- Em-dashes stay inside an utterance.
- Aim for 30 seconds to 2 minutes per section. The title sentence is the natural breath. Don't try to add pauses with `...` (SpeechSynthesis ignores them).
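
An author could check these rules before committing with a small lint script. The sketch below is an assumption about how such a check might look, not the reader's real splitting logic: `split_utterances` and `lint_utterances` are hypothetical names, and the 25-word ceiling comes from the upper bound given above.

```python
import re

# An utterance ends at ., ! or ? followed by whitespace.
UTTERANCE_END = re.compile(r"(?<=[.!?])\s+")


def split_utterances(text: str) -> list[str]:
    """Split narration text into utterances at sentence-ending punctuation."""
    return [u.strip() for u in UTTERANCE_END.split(text.strip()) if u.strip()]


def lint_utterances(text: str, max_words: int = 25) -> list[tuple[str, str]]:
    """Return (utterance, problem) pairs for the sentence rules above."""
    problems = []
    for utterance in split_utterances(text):
        if len(utterance.split()) > max_words:
            problems.append((utterance, "too long"))
        # A colon or semicolon followed by a space is a mid-sentence break;
        # times such as 09:00 have no space after the colon and are skipped.
        if re.search(r"[:;]\s", utterance):
            problems.append((utterance, "mid-sentence colon or semicolon"))
        if "..." in utterance:
            problems.append((utterance, "ellipsis pause"))
    return problems
```

The splitter is deliberately naive: an unconverted "e.g." would end an utterance early, which is one more reason to spell Latin abbreviations out.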
Speaking the unspeakable
| Source | Write as |
|---|---|
| Times: `09:00`, `5:30 P M` | 9 A M, 5 30 P M |
| Numbered steps: `Step 1` | Step one |
| Acronyms not pronounced as words | Spell with spaces: U R L, A P I, I D |
| Code identifiers in prose: `MANUAL_ENTRY_REQUEST` | Natural names: Manual Entry Request. Keep literal flag names when discussing the flag itself: isCompleted flips true. |
| Latin abbreviations | e.g. → for example, i.e. → that is, etc. → and so on |
| Markdown syntax: `**bold**`, `_italic_`, backticks, link URLs | Drop entirely (keep link text) |
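
Several rows of this table are mechanical enough to apply as a first pass before hand-editing. The following is a rough sketch under stated assumptions: `speakable` is a hypothetical helper, the acronym list only contains the three examples from the table, and times, numbered steps and italics are left to the author.

```python
import re

LATIN = {"e.g.": "for example", "i.e.": "that is"}
ACRONYMS = ["URL", "API", "ID"]  # examples from the table; extend as needed


def speakable(text: str) -> str:
    """Apply a few of the table's rewrites to one line of prose."""
    # Markdown links: keep the link text, drop the URL.
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Drop bold markers and backticks.
    text = re.sub(r"\*\*|`", "", text)
    # SCREAMING_SNAKE_CASE identifiers become natural names.
    text = re.sub(
        r"\b[A-Z]+(?:_[A-Z]+)+\b",
        lambda m: m.group(0).replace("_", " ").title(),
        text,
    )
    for abbreviation, spoken in LATIN.items():
        text = text.replace(abbreviation, spoken)
    # Keep the sentence-ending period when "etc." closes a sentence.
    text = re.sub(r"\betc\.(\s*$|\s+[A-Z])", r"and so on.\1", text)
    text = text.replace("etc.", "and so on")
    # Spell listed acronyms letter by letter.
    for acronym in ACRONYMS:
        text = re.sub(rf"\b{acronym}\b", " ".join(acronym), text)
    return text
```

The output still needs a human read: a pass like this cannot tell when a flag name should stay literal.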
Embedded content
- Mermaid diagrams — describe in third person: open with the diagram type, then walk its structure. "Here is a state machine with three states. The flow begins at…"
- Code blocks — don't read line-by-line. One or two sentences on what the snippet does.
- Numbered lists — "Step one. Step two." or "First. Second. Third."
- Bulleted lists — preface with the count if it's known: "Three things to know. First, …". Otherwise just sentences, blank-line separated.
- Small tables — paraphrase the contrast in prose. Larger tables — summarise and point to the page: "See the table on the page for the full breakdown."
Cross-references
Don't skip "See also" — give each linked item one short sentence so the listener knows what's there without navigating. In body text, use "See the X chapter" / "covered in the Y section" — drop URLs.
Pronunciations
Some product, team, or feature names get mispronounced. Register phonetic spellings here so authors and the AI generator stay consistent.
| Term | Spell as | Notes |
|---|---|---|
| Skapp | (TBD — confirm with team) | Product name |
When adding an entry: pick a phonetic spelling, run a 5-second test on the on-page reader to confirm it sounds right, then commit.
Validation
The audio file's sections must line up exactly with the markdown's headings. Every ## or ### in the markdown should have a matching one in the audio file, and vice versa. CI fails the PR when they don't match.
Two ways things break:
- A heading exists in the markdown but not the audio file. Clicking the heading's play button would do nothing.
- A marker exists in the audio file but not the markdown. The reader would announce a section that's gone from the page.
The check is forgiving on capitals and punctuation: `## Foo: Bar` and `## foo bar` count as the same.
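
The comparison can be approximated as follows. This is a sketch of the idea rather than the actual CI check: the function names are hypothetical, and matching on heading level as well as title is an assumption. It also does not skip `##` lines that appear inside fenced code blocks.

```python
import re

# Match ## and ### headings only, one per line.
HEADING_RE = re.compile(r"^(#{2,3})[ \t]+(.*\S)[ \t]*$", re.MULTILINE)


def normalise(title: str) -> str:
    """Lowercase, strip punctuation and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", title.lower()).split())


def headings(text: str) -> set[tuple[int, str]]:
    """Return (level, normalised title) for every ## or ### heading."""
    return {(len(hashes), normalise(title)) for hashes, title in HEADING_RE.findall(text)}


def heading_mismatches(markdown: str, audio: str) -> dict[str, list[tuple[int, str]]]:
    """List headings that appear on only one side."""
    md, au = headings(markdown), headings(audio)
    return {
        "missing_in_audio": sorted(md - au),
        "missing_in_markdown": sorted(au - md),
    }
```

Both lists being empty corresponds to a passing check; an entry in `missing_in_audio` is the silent play button, and an entry in `missing_in_markdown` is the orphaned section.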