seriatim

seriatim merges per-speaker WhisperX-style JSON transcripts into a single JSON transcript that preserves speaker identity and chronological order.

The current implementation supports the merge command. It reads one or more input JSON files, maps each input file to a canonical speaker using speakers.yml, sorts all segments by timestamp, assigns consecutive numeric id values, and writes a merged JSON artifact.

Usage

Run from source:

go run ./cmd/seriatim merge \
  --input-file samples/raw/2026-04-19-Eric_Rakestraw.json \
  --input-file samples/raw/2026-04-19-Mike_Brown.json \
  --speakers samples/speakers.yml \
  --output-file merged.json

Optional report output:

go run ./cmd/seriatim merge \
  --input-file eric.json \
  --input-file mike.json \
  --speakers speakers.yml \
  --output-file merged.json \
  --report-file report.json

CLI

seriatim merge [flags]

Required flags for the default pipeline:

  • --input-file: input transcript JSON file. Repeat once per speaker/input file.
  • --speakers: speaker map YAML file. Required because normalize-speakers is enabled by default.
  • --output-file: merged transcript JSON output path.

Optional flags:

  • --report-file: write a JSON report with pipeline events.
  • --input-reader: input reader module. Default: json-files.
  • --output-modules: comma-separated output modules. Default: json.
  • --preprocessing-modules: comma-separated preprocessing modules. Default: validate-raw,normalize-speakers,trim-text.
  • --postprocessing-modules: comma-separated postprocessing modules. Default: detect-overlaps,resolve-overlaps,assign-ids,validate-output.
  • --autocorrect: autocorrect rules file. Required when the postprocessing autocorrect module is enabled.
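
The Go sketch below is a minimal illustration of how these flags could be parsed with the standard flag package. It makes two assumptions: --input-file is collected through a custom flag.Value, and each module list is a comma-separated string. It also encodes the conditional requirements: --speakers is needed only while normalize-speakers is enabled, and --autocorrect only when autocorrect is in the postprocessing list. The names stringList, mergeOptions, splitModules, contains, and parseMergeFlags are hypothetical and do not come from the seriatim source.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"strings"
)

// stringList is a hypothetical flag.Value that collects every occurrence of a
// repeatable flag such as --input-file.
type stringList []string

func (s *stringList) String() string     { return strings.Join(*s, ",") }
func (s *stringList) Set(v string) error { *s = append(*s, v); return nil }

// mergeOptions is a hypothetical holder for the documented merge flags.
type mergeOptions struct {
	InputFiles                            stringList
	Speakers, OutputFile, ReportFile      string
	InputReader, Autocorrect              string
	Output, Preprocessing, Postprocessing []string
}

// splitModules turns "a, b,c" into []string{"a", "b", "c"}, dropping blanks.
func splitModules(v string) []string {
	var out []string
	for _, m := range strings.Split(v, ",") {
		if m = strings.TrimSpace(m); m != "" {
			out = append(out, m)
		}
	}
	return out
}

func contains(list []string, want string) bool {
	for _, v := range list {
		if v == want {
			return true
		}
	}
	return false
}

func parseMergeFlags(args []string) (*mergeOptions, error) {
	opts := &mergeOptions{}
	fs := flag.NewFlagSet("merge", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the sketch's output
	fs.Var(&opts.InputFiles, "input-file", "input transcript JSON (repeatable)")
	fs.StringVar(&opts.Speakers, "speakers", "", "speaker map YAML")
	fs.StringVar(&opts.OutputFile, "output-file", "", "merged JSON output path")
	fs.StringVar(&opts.ReportFile, "report-file", "", "optional JSON report path")
	fs.StringVar(&opts.InputReader, "input-reader", "json-files", "input reader module")
	fs.StringVar(&opts.Autocorrect, "autocorrect", "", "autocorrect rules file")
	out := fs.String("output-modules", "json", "output modules")
	pre := fs.String("preprocessing-modules", "validate-raw,normalize-speakers,trim-text", "preprocessing modules")
	post := fs.String("postprocessing-modules", "detect-overlaps,resolve-overlaps,assign-ids,validate-output", "postprocessing modules")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	opts.Output, opts.Preprocessing, opts.Postprocessing = splitModules(*out), splitModules(*pre), splitModules(*post)

	// Required flags, including the ones that depend on which modules are enabled.
	switch {
	case len(opts.InputFiles) == 0:
		return nil, errors.New("at least one --input-file is required")
	case opts.OutputFile == "":
		return nil, errors.New("--output-file is required")
	case contains(opts.Preprocessing, "normalize-speakers") && opts.Speakers == "":
		return nil, errors.New("--speakers is required when normalize-speakers is enabled")
	case contains(opts.Postprocessing, "autocorrect") && opts.Autocorrect == "":
		return nil, errors.New("--autocorrect is required when the autocorrect module is enabled")
	}
	return opts, nil
}

func main() {
	opts, err := parseMergeFlags([]string{
		"--input-file", "eric.json", "--input-file", "mike.json",
		"--speakers", "speakers.yml", "--output-file", "merged.json",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(opts.InputFiles, opts.Postprocessing)
}
```

Go's flag package accepts both -flag and --flag, so the sketch handles the documented double-dash spelling without changes.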

Input JSON Format

Each input file must be valid JSON with a top-level segments array. The current parser accepts the WhisperX segment subset needed for merging:

{
  "segments": [
    {
      "start": 1.25,
      "end": 3.5,
      "text": "Hello there."
    }
  ]
}

Required segment fields:

  • start: number, must be >= 0.
  • end: number, must be >= start.
  • text: string.

Other WhisperX fields, including words and raw diarization speaker labels, are ignored for now.
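
The sketch below shows one way to decode and check a single input file with encoding/json. It assumes validate-raw enforces exactly the field rules listed above. Pointer fields let the code tell a missing field apart from a legitimate zero value. Keys such as words or speaker are dropped automatically, because encoding/json ignores JSON keys that have no matching struct field. The names rawSegment, rawTranscript, and parseAndValidate are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawSegment keeps only the WhisperX fields the merge needs. Pointers let the
// sketch tell a missing field apart from a legitimate zero value.
type rawSegment struct {
	Start *float64 `json:"start"`
	End   *float64 `json:"end"`
	Text  *string  `json:"text"`
}

// rawTranscript is the top-level object; other keys are ignored by encoding/json.
type rawTranscript struct {
	Segments []rawSegment `json:"segments"`
}

// parseAndValidate is a hypothetical stand-in for reading plus validate-raw.
func parseAndValidate(data []byte) ([]rawSegment, error) {
	var t rawTranscript
	if err := json.Unmarshal(data, &t); err != nil {
		return nil, fmt.Errorf("decode input: %w", err)
	}
	if t.Segments == nil {
		return nil, fmt.Errorf("missing top-level segments array")
	}
	for i, s := range t.Segments {
		switch {
		case s.Start == nil || s.End == nil || s.Text == nil:
			return nil, fmt.Errorf("segment %d: start, end and text are required", i)
		case *s.Start < 0:
			return nil, fmt.Errorf("segment %d: start must be >= 0", i)
		case *s.End < *s.Start:
			return nil, fmt.Errorf("segment %d: end must be >= start", i)
		}
	}
	return t.Segments, nil
}

func main() {
	in := `{"segments":[{"start":1.25,"end":3.5,"text":"Hello there.","words":[]}]}`
	segs, err := parseAndValidate([]byte(in))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(segs), *segs[0].Start, *segs[0].End, *segs[0].Text)
}
```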

Speaker Map Format

speakers.yml maps input files to canonical speaker names using ordered substring rules:

match:
  - speaker: "Eric Rakestraw"
    match:
      - "Eric_Rakestraw"
      - "Eric"

  - speaker: "Mike Brown"
    match:
      - "Mike_Brown"
      - "mb"

For each --input-file, seriatim takes the file basename and evaluates the rules in order. The first rule with a matching substring wins, and no later rules are evaluated.

For example, this input:

samples/raw/2026-04-19-Eric_Rakestraw.json

matches this rule because the basename contains Eric_Rakestraw:

- speaker: "Eric Rakestraw"
  match:
    - "Eric_Rakestraw"

Important details:

  • Matching is against the input file basename, not the full path.
  • Matching is case-insensitive.
  • Rules are evaluated from first to last.
  • Each rule must have a non-empty speaker.
  • Each rule must have at least one non-empty match string.
  • Duplicate speaker names are invalid.
  • Every input file must match at least one rule or the command fails.
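
Taken together, these rules amount to an ordered, first-match-wins scan over lower-cased basenames. The Go sketch below makes three assumptions. First, the YAML has already been loaded into a slice of hypothetical speakerRule values; loading is omitted so the sketch stays within the standard library. Second, "non-empty" means non-blank after trimming whitespace. Third, duplicate speaker names are compared exactly. The helpers blank, validateRules, and resolveSpeaker are also hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// speakerRule is a hypothetical in-memory form of one speakers.yml entry.
type speakerRule struct {
	Speaker string
	Match   []string
}

// blank treats whitespace-only strings as empty (an assumption).
func blank(s string) bool { return strings.TrimSpace(s) == "" }

// validateRules enforces the documented constraints on the rule list.
func validateRules(rules []speakerRule) error {
	seen := map[string]bool{}
	for i, r := range rules {
		if blank(r.Speaker) {
			return fmt.Errorf("rule %d: speaker must be non-empty", i)
		}
		if seen[r.Speaker] {
			return fmt.Errorf("rule %d: duplicate speaker %q", i, r.Speaker)
		}
		seen[r.Speaker] = true
		hasMatch := false
		for _, m := range r.Match {
			hasMatch = hasMatch || !blank(m)
		}
		if !hasMatch {
			return fmt.Errorf("rule %d: needs at least one non-empty match string", i)
		}
	}
	return nil
}

// resolveSpeaker returns the speaker of the first rule with a match string that
// is a case-insensitive substring of the file's basename (not its full path).
func resolveSpeaker(rules []speakerRule, inputPath string) (string, error) {
	base := strings.ToLower(filepath.Base(inputPath))
	for _, r := range rules {
		for _, m := range r.Match {
			if !blank(m) && strings.Contains(base, strings.ToLower(m)) {
				return r.Speaker, nil // first match wins; later rules are not checked
			}
		}
	}
	return "", fmt.Errorf("no speaker rule matches %q", filepath.Base(inputPath))
}

func main() {
	rules := []speakerRule{
		{Speaker: "Eric Rakestraw", Match: []string{"Eric_Rakestraw", "Eric"}},
		{Speaker: "Mike Brown", Match: []string{"Mike_Brown", "mb"}},
	}
	if err := validateRules(rules); err != nil {
		fmt.Println("invalid speaker map:", err)
		return
	}
	for _, in := range []string{"samples/raw/2026-04-19-Eric_Rakestraw.json", "samples/raw/2026-04-19-Mike_Brown.json"} {
		speaker, err := resolveSpeaker(rules, in)
		fmt.Println(in, "->", speaker, err)
	}
}
```

Because matching works on substrings, short match strings can catch unintended files. With the sample map, a basename such as notes-december.json would resolve to Mike Brown through the mb rule.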

Legacy format (no longer supported):

inputs:
  eric.json:
    speaker: "Eric Rakestraw"

The legacy inputs: direct-mapping format has been removed. Rewrite such files using the ordered match rules described above.

Output JSON Format

The merged output uses the current seriatim envelope:

{
  "metadata": {
    "application": "seriatim",
    "version": "dev",
    "input_reader": "json-files",
    "input_files": ["eric.json", "mike.json"],
    "preprocessing_modules": ["validate-raw", "normalize-speakers", "trim-text"],
    "postprocessing_modules": ["detect-overlaps", "resolve-overlaps", "assign-ids", "validate-output"],
    "output_modules": ["json"]
  },
  "segments": [
    {
      "id": 1,
      "source": "eric.json",
      "source_segment_index": 0,
      "speaker": "Eric Rakestraw",
      "start": 1.25,
      "end": 3.5,
      "text": "Hello there."
    }
  ],
  "overlap_groups": []
}

Segments are sorted deterministically by:

(start, end, source, source_segment_index, speaker)

Final segment IDs are assigned after sorting and start at 1.
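
The ordering and numbering can be sketched in Go in two steps: a lexicographic comparison over the five-field key, then a 1-based renumbering pass. The segment type and helper names are hypothetical. String fields are compared with Go's byte-wise < operator, which may differ from seriatim's actual comparison.

```go
package main

import (
	"fmt"
	"sort"
)

// segment is a hypothetical merged segment carrying the sort-key fields.
type segment struct {
	ID                 int
	Source             string
	SourceSegmentIndex int
	Speaker            string
	Start, End         float64
	Text               string
}

// less compares two segments by (start, end, source, source_segment_index, speaker).
func less(a, b segment) bool {
	if a.Start != b.Start {
		return a.Start < b.Start
	}
	if a.End != b.End {
		return a.End < b.End
	}
	if a.Source != b.Source {
		return a.Source < b.Source
	}
	if a.SourceSegmentIndex != b.SourceSegmentIndex {
		return a.SourceSegmentIndex < b.SourceSegmentIndex
	}
	return a.Speaker < b.Speaker
}

// sortAndAssignIDs orders the merged segments, then numbers them from 1.
func sortAndAssignIDs(segs []segment) {
	sort.SliceStable(segs, func(i, j int) bool { return less(segs[i], segs[j]) })
	for i := range segs {
		segs[i].ID = i + 1
	}
}

func main() {
	segs := []segment{
		{Source: "mike.json", SourceSegmentIndex: 0, Speaker: "Mike Brown", Start: 2.0, End: 4.0},
		{Source: "eric.json", SourceSegmentIndex: 1, Speaker: "Eric Rakestraw", Start: 2.0, End: 4.0},
		{Source: "eric.json", SourceSegmentIndex: 0, Speaker: "Eric Rakestraw", Start: 1.25, End: 3.5},
	}
	sortAndAssignIDs(segs)
	for _, s := range segs {
		fmt.Printf("%d %s#%d %.2f-%.2f\n", s.ID, s.Source, s.SourceSegmentIndex, s.Start, s.End)
	}
}
```

If each (source, source_segment_index) pair is unique, the key is a total order. The merged output then does not depend on the order in which the input files were given.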

Autocorrect

Autocorrect is an opt-in postprocessing module. It is not part of the default pipeline.

Enable it by adding autocorrect to --postprocessing-modules and passing --autocorrect:

go run ./cmd/seriatim merge \
  --input-file input.json \
  --speakers speakers.yml \
  --autocorrect autocorrect.yml \
  --postprocessing-modules detect-overlaps,resolve-overlaps,autocorrect,assign-ids,validate-output \
  --output-file merged.json

autocorrect.yml format:

autocorrect:
  - target: "Hrank"
    match:
      - "hrank"
      - "Frank"

  - target: "Mike Brown"
    match:
      - "Mike Pat"

Matching behavior:

  • Matching is case-sensitive.
  • Matches apply only to whole tokens, not substrings inside larger words.
  • Punctuation and whitespace can surround a match.
  • Multi-word and hyphenated matches are supported.
  • Duplicate match strings are invalid, including duplicates across separate rules.
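
These matching rules can be sketched in Go as a case-sensitive literal search that accepts a hit only when the characters on either side are not letters or digits. This rests on an assumption about what "whole token" means: here hyphens, punctuation, and whitespace all count as boundaries. Under that assumption, Frank would match inside ex-Frank but not inside Franklin. Rules are applied in file order. The names autocorrectRule, validateAutocorrect, replaceWholeTokens, and applyAutocorrect are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// autocorrectRule is a hypothetical in-memory form of one autocorrect.yml entry.
type autocorrectRule struct {
	Target string
	Match  []string
}

// validateAutocorrect rejects empty match strings and duplicate match strings,
// including duplicates that appear in different rules.
func validateAutocorrect(rules []autocorrectRule) error {
	seen := map[string]string{}
	for _, r := range rules {
		for _, m := range r.Match {
			if m == "" {
				return fmt.Errorf("rule %q: empty match string", r.Target)
			}
			if prev, ok := seen[m]; ok {
				return fmt.Errorf("match %q appears in rules %q and %q", m, prev, r.Target)
			}
			seen[m] = r.Target
		}
	}
	return nil
}

// isTokenRune reports whether r can be part of a word. Assumption: all other
// runes (whitespace, punctuation, hyphens) act as token boundaries.
func isTokenRune(r rune) bool { return unicode.IsLetter(r) || unicode.IsDigit(r) }

// boundaryBefore and boundaryAfter report whether a token boundary sits at pos.
func boundaryBefore(s string, pos int) bool {
	if pos == 0 {
		return true
	}
	r, _ := utf8.DecodeLastRuneInString(s[:pos])
	return !isTokenRune(r)
}

func boundaryAfter(s string, pos int) bool {
	if pos >= len(s) {
		return true
	}
	r, _ := utf8.DecodeRuneInString(s[pos:])
	return !isTokenRune(r)
}

// replaceWholeTokens replaces case-sensitive, whole-token occurrences of match.
func replaceWholeTokens(text, match, target string) string {
	if match == "" {
		return text // an empty pattern would otherwise loop forever
	}
	var b strings.Builder
	i := 0
	for {
		j := strings.Index(text[i:], match)
		if j < 0 {
			b.WriteString(text[i:])
			return b.String()
		}
		start, end := i+j, i+j+len(match)
		if boundaryBefore(text, start) && boundaryAfter(text, end) {
			b.WriteString(text[i:start])
			b.WriteString(target)
			i = end
			continue
		}
		// The hit is embedded in a larger word: copy one rune and keep scanning.
		_, size := utf8.DecodeRuneInString(text[start:])
		b.WriteString(text[i : start+size])
		i = start + size
	}
}

// applyAutocorrect runs every rule, in file order, over one segment's text.
func applyAutocorrect(rules []autocorrectRule, text string) string {
	for _, r := range rules {
		for _, m := range r.Match {
			text = replaceWholeTokens(text, m, r.Target)
		}
	}
	return text
}

func main() {
	rules := []autocorrectRule{{"Hrank", []string{"hrank", "Frank"}}, {"Mike Brown", []string{"Mike Pat"}}}
	fmt.Println(applyAutocorrect(rules, "Frank met Mike Pat; Franklin stayed."))
}
```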

Current Limitations

  • Only JSON input is supported.
  • Word-level timing data is not preserved yet.
  • Overlap detection and overlap resolution are currently no-op modules.
  • Coalescing and alternate output formats are not implemented yet.

License

seriatim is licensed under BSD-3-Clause. Latest release: v1.2.0 (2026-05-09 12:38:06 +00:00).