seriatim
seriatim merges per-speaker WhisperX-style JSON transcripts into a single JSON transcript that preserves speaker identity and chronological order.
The current implementation supports the merge command. It reads one or more input JSON files, maps each input file to a canonical speaker using speakers.yml, sorts all segments by timestamp, assigns consecutive numeric id values, and writes a merged JSON artifact.
Usage
Run from source:
go run ./cmd/seriatim merge \
  --input-file samples/raw/2026-04-19-Eric_Rakestraw.json \
  --input-file samples/raw/2026-04-19-Mike_Brown.json \
  --speakers samples/speakers.yml \
  --output-file merged.json
Optional report output:
go run ./cmd/seriatim merge \
  --input-file eric.json \
  --input-file mike.json \
  --speakers speakers.yml \
  --output-file merged.json \
  --report-file report.json
CLI
seriatim merge [flags]
Required flags for the default pipeline:
- --input-file: input transcript JSON file. Repeat once per speaker/input file.
- --speakers: speaker map YAML file. Required because normalize-speakers is enabled by default.
- --output-file: merged transcript JSON output path.
Optional flags:
- --report-file: write a JSON report with pipeline events.
- --input-reader: input reader module. Default: json-files.
- --output-modules: comma-separated output modules. Default: json.
- --preprocessing-modules: comma-separated preprocessing modules. Default: validate-raw,normalize-speakers,trim-text.
- --postprocessing-modules: comma-separated postprocessing modules. Default: detect-overlaps,resolve-overlaps,assign-ids,validate-output.
- --autocorrect: autocorrect rules file. Required when the postprocessing autocorrect module is enabled.
Input JSON Format
Each input file must be valid JSON with a top-level segments array. The current parser accepts the WhisperX segment subset needed for merging:
{
  "segments": [
    {
      "start": 1.25,
      "end": 3.5,
      "text": "Hello there."
    }
  ]
}
Required segment fields:
- start: number, must be >= 0.
- end: number, must be >= start.
- text: string.
Other WhisperX fields, including words and raw diarization speaker labels, are ignored for now.
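The field rules above can be sketched in Go as follows. This is a minimal illustration, not seriatim's actual validate-raw module: the names rawSegment, rawTranscript, and validateRaw are hypothetical, and the error messages are invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawSegment holds only the fields listed as required above.
// Pointer fields distinguish a missing field from a zero value.
// (Hypothetical type, not taken from the seriatim source.)
type rawSegment struct {
	Start *float64 `json:"start"`
	End   *float64 `json:"end"`
	Text  *string  `json:"text"`
}

// rawTranscript models the top-level object with its segments array.
type rawTranscript struct {
	Segments []rawSegment `json:"segments"`
}

// validateRaw checks one input document against the documented rules.
// Unknown fields such as words or diarization labels are ignored,
// which is encoding/json's default behavior.
func validateRaw(data []byte) error {
	var t rawTranscript
	if err := json.Unmarshal(data, &t); err != nil {
		return fmt.Errorf("invalid JSON: %w", err)
	}
	if t.Segments == nil {
		return fmt.Errorf("missing top-level segments array")
	}
	for i, s := range t.Segments {
		switch {
		case s.Start == nil || s.End == nil || s.Text == nil:
			return fmt.Errorf("segment %d: start, end, and text are required", i)
		case *s.Start < 0:
			return fmt.Errorf("segment %d: start must be >= 0", i)
		case *s.End < *s.Start:
			return fmt.Errorf("segment %d: end must be >= start", i)
		}
	}
	return nil
}
```

Calling validateRaw on the example document above returns nil, while a segment with "start": 2 and "end": 1 yields an error.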
Speaker Map Format
speakers.yml maps input files to canonical speaker names using ordered substring rules:
match:
  - speaker: "Eric Rakestraw"
    match:
      - "Eric_Rakestraw"
      - "Eric"
  - speaker: "Mike Brown"
    match:
      - "Mike_Brown"
      - "mb"
For each --input-file, seriatim takes the file basename and evaluates the rules in order. The first rule with a matching substring wins, and no later rules are evaluated.
For example, this input:
samples/raw/2026-04-19-Eric_Rakestraw.json
matches this rule because the basename contains Eric_Rakestraw:
- speaker: "Eric Rakestraw"
  match:
    - "Eric_Rakestraw"
Important details:
- Matching is against the input file basename, not the full path.
- Matching is case-insensitive.
- Rules are evaluated from first to last.
- Each rule must have a non-empty speaker.
- Each rule must have at least one non-empty match string.
- Duplicate speaker names are invalid.
- Every input file must match at least one rule or the command fails.
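The matching rules above (basename only, case-insensitive substring, first rule wins) can be sketched as below. This is an illustration under stated assumptions rather than the project's code: speakerRule and matchSpeaker are hypothetical names, and the rules are assumed to be already loaded from speakers.yml.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// speakerRule mirrors one entry of the speaker map: a canonical name
// plus the substrings that select it. (Hypothetical type.)
type speakerRule struct {
	Speaker string
	Match   []string
}

// matchSpeaker returns the speaker of the first rule whose match string
// occurs in the basename of inputPath, compared case-insensitively.
// It returns an error when no rule matches, as the command is
// documented to fail in that case.
func matchSpeaker(rules []speakerRule, inputPath string) (string, error) {
	base := strings.ToLower(filepath.Base(inputPath))
	for _, r := range rules {
		for _, m := range r.Match {
			if m != "" && strings.Contains(base, strings.ToLower(m)) {
				return r.Speaker, nil // first match wins; later rules are skipped
			}
		}
	}
	return "", fmt.Errorf("no speaker rule matches %q", filepath.Base(inputPath))
}
```

With the two rules from the example map, samples/raw/2026-04-19-Eric_Rakestraw.json resolves to "Eric Rakestraw", and a path such as eric/mb.json resolves to "Mike Brown" because only the basename mb.json is examined.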
Unsupported legacy format:
inputs:
  eric.json:
    speaker: "Eric Rakestraw"
The old inputs: direct mapping format is no longer supported.
Output JSON Format
The merged output uses the current seriatim envelope:
{
  "metadata": {
    "application": "seriatim",
    "version": "dev",
    "input_reader": "json-files",
    "input_files": ["eric.json", "mike.json"],
    "preprocessing_modules": ["validate-raw", "normalize-speakers", "trim-text"],
    "postprocessing_modules": ["detect-overlaps", "resolve-overlaps", "assign-ids", "validate-output"],
    "output_modules": ["json"]
  },
  "segments": [
    {
      "id": 1,
      "source": "eric.json",
      "source_segment_index": 0,
      "speaker": "Eric Rakestraw",
      "start": 1.25,
      "end": 3.5,
      "text": "Hello there."
    }
  ],
  "overlap_groups": []
}
Segments are sorted deterministically by:
(start, end, source, source_segment_index, speaker)
Final segment IDs are assigned after sorting and start at 1.
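The ordering and ID assignment described above could look roughly like this in Go. The segment type and sortAndAssignIDs are hypothetical names chosen for this sketch. The field set follows the output example, but the real implementation may be structured differently.

```go
package main

import "sort"

// segment mirrors the fields of one merged output segment.
// (Hypothetical type based on the output example.)
type segment struct {
	ID                 int     `json:"id"`
	Source             string  `json:"source"`
	SourceSegmentIndex int     `json:"source_segment_index"`
	Speaker            string  `json:"speaker"`
	Start              float64 `json:"start"`
	End                float64 `json:"end"`
	Text               string  `json:"text"`
}

// sortAndAssignIDs orders segments by the documented key
// (start, end, source, source_segment_index, speaker) and then
// numbers them consecutively starting at 1.
func sortAndAssignIDs(segs []segment) {
	sort.SliceStable(segs, func(i, j int) bool {
		a, b := segs[i], segs[j]
		switch {
		case a.Start != b.Start:
			return a.Start < b.Start
		case a.End != b.End:
			return a.End < b.End
		case a.Source != b.Source:
			return a.Source < b.Source
		case a.SourceSegmentIndex != b.SourceSegmentIndex:
			return a.SourceSegmentIndex < b.SourceSegmentIndex
		default:
			return a.Speaker < b.Speaker
		}
	})
	for i := range segs {
		segs[i].ID = i + 1 // IDs are assigned only after sorting
	}
}
```

Because every field in the key takes part in the comparison, two segments with identical timestamps from eric.json and mike.json always come out in the same order, regardless of the order in which the input files were passed.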
Autocorrect
Autocorrect is an opt-in postprocessing module. It is not part of the default pipeline.
Enable it by adding autocorrect to --postprocessing-modules and passing --autocorrect:
go run ./cmd/seriatim merge \
  --input-file input.json \
  --speakers speakers.yml \
  --autocorrect autocorrect.yml \
  --postprocessing-modules detect-overlaps,resolve-overlaps,autocorrect,assign-ids,validate-output \
  --output-file merged.json
autocorrect.yml format:
autocorrect:
  - target: "Hrank"
    match:
      - "hrank"
      - "Frank"
  - target: "Mike Brown"
    match:
      - "Mike Pat"
Matching behavior:
- Matching is case-sensitive.
- Matches apply only to whole tokens, not substrings inside larger words.
- Punctuation and whitespace can surround a match.
- Multi-word and hyphenated matches are supported.
- Duplicate match strings are invalid, including duplicates across separate rules.
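One way to read the whole-token rule is sketched below. This is an assumption-laden illustration, not the module's actual algorithm: replaceWholeToken and isWordRune are hypothetical helpers, and the sketch assumes that a token boundary is any position not adjacent to a letter or digit, so punctuation, whitespace, and the ends of the text all count as boundaries.

```go
package main

import (
	"strings"
	"unicode"
	"unicode/utf8"
)

// isWordRune reports whether r would make a neighboring match part of a
// larger word. Treating only letters and digits as word characters is
// an assumption of this sketch.
func isWordRune(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r)
}

// replaceWholeToken replaces each case-sensitive occurrence of match in
// text with target, skipping occurrences embedded in a larger word.
func replaceWholeToken(text, match, target string) string {
	if match == "" {
		return text
	}
	var b strings.Builder
	pos := 0
	for {
		off := strings.Index(text[pos:], match)
		if off < 0 {
			break
		}
		start := pos + off
		end := start + len(match)
		// At either end of the text the decode functions return
		// utf8.RuneError, which is neither a letter nor a digit.
		prev, _ := utf8.DecodeLastRuneInString(text[:start])
		next, _ := utf8.DecodeRuneInString(text[end:])
		b.WriteString(text[pos:start])
		if isWordRune(prev) || isWordRune(next) {
			b.WriteString(match) // part of a larger word: keep as-is
		} else {
			b.WriteString(target)
		}
		pos = end
	}
	b.WriteString(text[pos:])
	return b.String()
}
```

Under these assumptions, replacing "Frank" with "Hrank" changes "Ask Frank, please." but leaves "Frankly speaking" and the lowercase "ask frank" untouched, and the multi-word match "Mike Pat" is handled the same way as a single word.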
Current Limitations
- Only JSON input is supported.
- Word-level timing data is not preserved yet.
- Overlap detection and overlap resolution are currently no-op modules.
- Coalescing and alternate output formats are not implemented yet.