# seriatim

`seriatim` merges per-speaker WhisperX-style JSON transcripts into a single JSON transcript that preserves speaker identity and chronological order.

The current implementation supports the `merge` command. It reads one or more input JSON files, optionally maps each input file to a canonical speaker using `speakers.yml`, sorts all segments by timestamp, detects and resolves overlaps when word-level timing is available, assigns consecutive numeric `id` values, and writes a merged JSON artifact.

## Usage

Run from source:

```sh
go run ./cmd/seriatim merge \
  --input-file samples/raw/2026-04-19-Eric_Rakestraw.json \
  --input-file samples/raw/2026-04-19-Mike_Brown.json \
  --output-file merged.json
```

Optional report output:

```sh
go run ./cmd/seriatim merge \
  --input-file eric.json \
  --input-file mike.json \
  --output-file merged.json \
  --report-file report.json
```

## CLI

```text
seriatim merge [flags]
```

Required flags for the default pipeline:

- `--input-file`: input transcript JSON file. Repeat once per speaker/input file.
- `--output-file`: merged transcript JSON output path.

Optional flags:

- `--report-file`: write a JSON report with pipeline events.
- `--speakers`: speaker map YAML file. When omitted, input file basenames are used as speaker labels.
- `--autocorrect`: autocorrect rules file. When omitted, the default `autocorrect` module no-ops.
- `--input-reader`: input reader module. Default: `json-files`.
- `--output-modules`: comma-separated output modules. Default: `json`.
- `--preprocessing-modules`: comma-separated preprocessing modules. Default: `validate-raw,normalize-speakers,trim-text`.
- `--postprocessing-modules`: comma-separated postprocessing modules. Default: `detect-overlaps,resolve-overlaps,backchannel,filler,coalesce,detect-overlaps,autocorrect,assign-ids,validate-output`.
- `--coalesce-gap`: maximum same-speaker gap in seconds for `coalesce`. Default: `3.0`.
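
To illustrate what `--coalesce-gap` controls, here is a minimal Go sketch of the gap test `next.start - current.end <= gap` applied to adjacent same-speaker segments. The `Segment` type and `coalesce` function are hypothetical names chosen for this example and are not taken from the seriatim source. The sketch also ignores categories, `source_ref`, and `derived_from` handling.

```go
package main

import "fmt"

// Segment is a hypothetical, simplified stand-in for a transcript segment.
type Segment struct {
	Speaker string
	Start   float64
	End     float64
	Text    string
}

// coalesce merges adjacent same-speaker segments whose gap
// (next.Start - current.End) is no greater than maxGap seconds.
// It assumes segs is already in transcript order.
func coalesce(segs []Segment, maxGap float64) []Segment {
	var out []Segment
	for _, s := range segs {
		if n := len(out); n > 0 {
			last := &out[n-1]
			if last.Speaker == s.Speaker && s.Start-last.End <= maxGap {
				// Within the gap: extend the previous segment.
				last.Text += " " + s.Text
				if s.End > last.End {
					last.End = s.End
				}
				continue
			}
		}
		out = append(out, s)
	}
	return out
}

func main() {
	segs := []Segment{
		{"Eric Rakestraw", 0.0, 2.0, "Hello there."},
		{"Eric Rakestraw", 4.5, 6.0, "How are you?"}, // gap of 2.5s
		{"Eric Rakestraw", 10.0, 11.0, "Anyway."},    // gap of 4.0s
	}
	for _, s := range coalesce(segs, 3.0) {
		fmt.Printf("%s [%.2f-%.2f] %s\n", s.Speaker, s.Start, s.End, s.Text)
	}
}
```

With a gap of `3.0`, the first two segments in this sketch merge and the third stays separate. Lowering the gap to `2.0` would keep all three apart.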
## Input JSON Format

Each input file must be valid JSON with a top-level `segments` array. The current parser accepts the WhisperX segment subset needed for merging:

```json
{
  "segments": [
    {
      "start": 1.25,
      "end": 3.5,
      "text": "Hello there.",
      "words": [
        {"word": "Hello", "start": 1.25, "end": 1.55, "score": 0.98},
        {"word": "there.", "start": 1.7, "end": 2.0}
      ]
    }
  ]
}
```

Required segment fields:

- `start`: number, must be `>= 0`.
- `end`: number, must be `>= start`.
- `text`: string.

Optional word fields:

- `words`: array of word timing objects.
- `words[].word`: string.
- `words[].start`: optional number, must be `>= 0` when present.
- `words[].end`: optional number, must be `>= start` when present with `start`.
- `words[].score`: optional number.
- `words[].speaker`: optional raw speaker label string.

Word-level timing is preserved internally for overlap resolution. If a word is missing `start` or `end`, seriatim keeps the word text, emits a warning in the optional report, and does not use that word as a timing anchor. Word timing is not emitted in the final JSON artifact.

## Speaker Map Format

`speakers.yml` is optional. If `--speakers` is omitted, `seriatim` uses each input file basename as the segment speaker label. When provided, the file maps input files to canonical speaker names using ordered substring rules:

```yaml
match:
  - speaker: "Eric Rakestraw"
    match:
      - "Eric_Rakestraw"
      - "Eric"
  - speaker: "Mike Brown"
    match:
      - "Mike_Brown"
      - "mb"
```

For each `--input-file`, `seriatim` takes the file basename and evaluates the rules in order. The first rule with a matching substring wins, and no later rules are evaluated.

For example, this input:

```text
samples/raw/2026-04-19-Eric_Rakestraw.json
```

matches this rule because the basename contains `Eric_Rakestraw`:

```yaml
- speaker: "Eric Rakestraw"
  match:
    - "Eric_Rakestraw"
```

Important details:

- Matching is against the input file basename, not the full path.
- Matching is case-insensitive.
- Rules are evaluated from first to last.
- Each rule must have a non-empty `speaker`.
- Each rule must have at least one non-empty `match` string.
- Duplicate speaker names are invalid.
- Every input file must match at least one rule or the command fails.

Unsupported old format:

```yaml
inputs:
  eric.json:
    speaker: "Eric Rakestraw"
```

The old `inputs:` direct mapping format is no longer supported.

## Output JSON Format

The merged output uses the current seriatim envelope:

```json
{
  "metadata": {
    "application": "seriatim",
    "version": "dev",
    "input_reader": "json-files",
    "input_files": ["eric.json", "mike.json"],
    "preprocessing_modules": ["validate-raw", "normalize-speakers", "trim-text"],
    "postprocessing_modules": [
      "detect-overlaps",
      "resolve-overlaps",
      "backchannel",
      "filler",
      "coalesce",
      "detect-overlaps",
      "autocorrect",
      "assign-ids",
      "validate-output"
    ],
    "output_modules": ["json"]
  },
  "segments": [
    {
      "id": 1,
      "source": "eric.json",
      "source_segment_index": 0,
      "speaker": "Eric Rakestraw",
      "start": 1.25,
      "end": 3.5,
      "text": "Hello there.",
      "overlap_group_id": 1
    },
    {
      "id": 2,
      "source": "eric.json",
      "source_ref": "word-run:1:1:1",
      "derived_from": ["eric.json#0"],
      "speaker": "Eric Rakestraw",
      "start": 2.0,
      "end": 2.5,
      "text": "Resolved word run",
      "categories": ["backchannel"]
    }
  ],
  "overlap_groups": [
    {
      "id": 1,
      "start": 1.25,
      "end": 4.0,
      "segments": ["eric.json#0", "mike.json#0"],
      "speakers": ["Eric Rakestraw", "Mike Brown"],
      "class": "unknown",
      "resolution": "unresolved"
    }
  ]
}
```

Segments are sorted deterministically by:

```text
(start, end, source, source_segment_index/source_ref, speaker)
```

Final segment IDs are assigned after sorting and start at `1`.

## Overlap Detection

The default postprocessing pipeline detects overlapping segment groups.

Overlap behavior:

- A strict timing overlap is required: `next.start < current_group_end`.
- Segments that only touch at a boundary are not grouped.
- Groups require at least two distinct speakers.
- Transitive overlaps are grouped together.
- Segments in detected groups receive `overlap_group_id`.
- `overlap_groups[].segments` contains stable references in `source#source_segment_index` format.
- `class` is currently `unknown`.
- `resolution` is `unresolved` until `resolve-overlaps` replaces the group.

## Overlap Resolution

The default postprocessing pipeline runs `detect-overlaps`, then `resolve-overlaps`, then `backchannel`, then `filler`, then `coalesce`, then a second `detect-overlaps` pass.

For each detected overlap group, `resolve-overlaps` uses preserved WhisperX word timing to build smaller word-run replacement segments:

- Words are included when their interval intersects the overlap window: `word.end > group.start && word.start < group.end`.
- Untimed words are included in replacement text in original word order when nearby timed words create a replacement run.
- Untimed words do not affect replacement segment start/end times or word-run gap splitting.
- Words for the same speaker are merged into one run when the gap between adjacent words is no greater than `SERIATIM_OVERLAP_WORD_RUN_GAP`.
- The default word-run gap is `0.75` seconds.
- Set `SERIATIM_OVERLAP_WORD_RUN_GAP` to a positive number of seconds to override the default.
- Near-start replacement word runs are reordered so shorter segments come first when adjacent starts are within `SERIATIM_OVERLAP_WORD_RUN_REORDER_WINDOW`.
- The default word-run reorder window is `0.4` seconds.
- Set `SERIATIM_OVERLAP_WORD_RUN_REORDER_WINDOW` to a positive number of seconds to override the default.
- Replacement segment text is built by joining word text with single spaces.
- Replacement segments include `source_ref` and `derived_from`.
- Replacement segments omit `source_segment_index` because they are derived from one or more original segments.
- Resolved overlap groups are removed before the second detection pass.
- Replacement segments are left without `overlap_group_id` until the second detection pass annotates any remaining overlap.
- If a speaker has no usable word timing in a group, that speaker's original segment is kept.
- If no speakers in a group have usable word timing, the original group and annotations remain unchanged.

## Backchannels

The default pipeline runs `backchannel` before `coalesce`. It tags short acknowledgement segments with:

```json
"categories": ["backchannel"]
```

Backchannel matching is case-insensitive, trims surrounding whitespace, and requires a matching acknowledgement phrase, no more than three whitespace-delimited words, and duration no greater than `1.0` second.

## Fillers

The default pipeline runs `filler` after `backchannel` and before `coalesce`. It tags short filler utterances with:

```json
"categories": ["filler"]
```

Filler matching is case-insensitive, trims surrounding whitespace, and requires only filler tokens such as `um`, `uh`, `er`, `erm`, `ah`, `eh`, `hmm`, `mm`, or repeated combinations of those tokens. Matching segments must contain no more than three whitespace-delimited words and have duration no greater than `1.0` second.

## Coalescing

The default pipeline runs `coalesce` before the second overlap detection pass. It merges adjacent same-speaker segments in the transcript's current order when `next.start - current.end <= --coalesce-gap`.

Coalesced segments use `source_ref` values such as `coalesce:1`, include `derived_from`, and omit `source_segment_index`.

Different-speaker backchannel and filler segments do not block coalescing of surrounding same-speaker segments. Same-speaker backchannel and filler segments are merged normally when they are within `--coalesce-gap`. When same-speaker segments are coalesced, any `backchannel` or `filler` category from the merged inputs is dropped from the coalesced segment.

## Autocorrect

Autocorrect is included in the default postprocessing pipeline.
If `--autocorrect` is omitted, the module leaves transcript text unchanged and records a skip event in the optional report.

Enable corrections by passing `--autocorrect`:

```sh
go run ./cmd/seriatim merge \
  --input-file input.json \
  --autocorrect autocorrect.yml \
  --output-file merged.json
```

`autocorrect.yml` format:

```yaml
autocorrect:
  - target: "Hrank"
    match:
      - "hrank"
      - "Frank"
  - target: "Mike Brown"
    match:
      - "Mike Pat"
```

Matching behavior:

- Matching is case-sensitive.
- Matches apply only to whole tokens, not substrings inside larger words.
- Punctuation and whitespace can surround a match.
- Multi-word and hyphenated matches are supported.
- Duplicate match strings are invalid, including duplicates across separate rules.

## Current Limitations

- Only JSON input is supported.
- Overlap resolution depends on WhisperX word timing; groups without usable word timing remain unresolved.
- Alternate output formats are not implemented yet.