# seriatim

`seriatim` merges per-speaker WhisperX-style JSON transcripts into a single JSON transcript that preserves speaker identity and chronological order. It can also trim existing seriatim output artifacts by segment ID.

The current implementation supports the `merge` and `trim` commands. `merge` reads one or more input JSON files, optionally maps each input file to a canonical speaker using `speakers.yml`, sorts all segments by timestamp, detects and resolves overlaps when word-level timing is available, assigns consecutive numeric `id` values, and writes a merged JSON artifact. `trim` reads an existing seriatim output artifact and projects it onto a retained subset of segments.

## Usage

Run from source:

```sh
go run ./cmd/seriatim merge \
  --input-file samples/raw/2026-04-19-Eric_Rakestraw.json \
  --input-file samples/raw/2026-04-19-Mike_Brown.json \
  --output-file merged.json
```

Optional report output:

```sh
go run ./cmd/seriatim merge \
  --input-file eric.json \
  --input-file mike.json \
  --output-file merged.json \
  --report-file report.json
```

Trim an existing seriatim artifact:

```sh
go run ./cmd/seriatim trim \
  --input-file merged.json \
  --output-file trimmed.json \
  --keep "1-10, 15, 20-25"
```

## CLI

```text
seriatim merge [flags]
seriatim trim [flags]
```

Global flags:

| Flag | Description |
| --- | --- |
| `--help` | Show command help. |
| `--version` | Show the application version. Local builds default to `dev`; release builds inject the release version. |

`merge` flags:

| Flag | Required | Default | Description |
| --- | --- | --- | --- |
| `--input-file` | Yes | none | Input transcript JSON file. Repeat once per speaker/input file. |
| `--output-file` | Yes | none | Merged transcript JSON output path. |
| `--report-file` | No | none | Optional report JSON output path. |
| `--speakers` | No | none | Speaker map YAML file. When omitted, input file basenames are used as speaker labels. |
| `--autocorrect` | No | none | Autocorrect rules YAML file. When omitted, the default `autocorrect` module leaves text unchanged. |
| `--input-reader` | No | `json-files` | Input reader module. |
| `--output-modules` | No | `json` | Comma-separated output modules. |
| `--output-schema` | No | `seriatim-intermediate` | JSON output contract. Allowed values are `seriatim-minimal`, `seriatim-intermediate`, and `seriatim-full`. If omitted, `SERIATIM_OUTPUT_SCHEMA` is used when set; otherwise the runtime default applies. Consumers that depend on a specific shape should set this explicitly. |
| `--preprocessing-modules` | No | `validate-raw,normalize-speakers,trim-text` | Comma-separated preprocessing modules, evaluated in order. |
| `--postprocessing-modules` | No | `detect-overlaps,resolve-overlaps,backchannel,filler,resolve-danglers,coalesce,detect-overlaps,autocorrect,assign-ids,validate-output` | Comma-separated postprocessing modules, evaluated in order. |
| `--coalesce-gap` | No | `3.0` | Maximum same-speaker gap in seconds for `coalesce`; also used as the `resolve-overlaps` context window. Must be a non-negative float. |

`trim` flags:

| Flag | Required | Default | Description |
| --- | --- | --- | --- |
| `--input-file` | Yes | none | Input seriatim output artifact (JSON). |
| `--output-file` | Yes | none | Trimmed transcript JSON output path. |
| `--keep` | Exactly one of `--keep` or `--remove` | none | Segment ID selector to retain. |
| `--remove` | Exactly one of `--keep` or `--remove` | none | Segment ID selector to drop. |
| `--output-schema` | No | same as input artifact | Optional output schema override: `seriatim-minimal`, `seriatim-intermediate`, or `seriatim-full`. |
| `--report-file` | No | none | Optional report JSON output path. |
| `--allow-empty` | No | `false` | Allow trimming to zero retained segments. |

`trim` selection rules:

- `--keep` and `--remove` are mutually exclusive; exactly one of them is required.
- Selection is by segment ID only.
- Invalid selected segment IDs fail the command by default.
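
The selection rules above can be illustrated with a small Go sketch. It assumes the selector has already been parsed into a list of IDs; `applySelection` and `IDPair` are hypothetical names, not part of seriatim's API. The sketch keeps or drops the selected IDs, rejects IDs that are absent from the input, walks the input in transcript order, and renumbers retained segments from 1, mirroring the documented `trim` behavior and the `{old_id, new_id}` report mapping. Treating an empty result as an error unless `allowEmpty` is set is how it models `--allow-empty`.

```go
package main

import "fmt"

// IDPair is a hypothetical type mirroring the report's {old_id, new_id} pairs.
type IDPair struct {
	Old int `json:"old_id"`
	New int `json:"new_id"`
}

// applySelection is a hypothetical sketch of trim's ID selection.
// ids are the input segment IDs in transcript order; selected are the IDs
// named by --keep (keep=true) or --remove (keep=false).
func applySelection(ids, selected []int, keep, allowEmpty bool) ([]int, []IDPair, error) {
	present := make(map[int]bool, len(ids))
	for _, id := range ids {
		present[id] = true
	}
	chosen := make(map[int]bool, len(selected))
	for _, id := range selected {
		if !present[id] {
			// Selecting an ID that is not in the input fails the command.
			return nil, nil, fmt.Errorf("segment ID %d not found in input", id)
		}
		chosen[id] = true
	}
	var retained []int
	var mapping []IDPair
	// Walk the input in transcript order, so selector order never reorders output.
	for _, id := range ids {
		if chosen[id] == keep {
			retained = append(retained, id)
			// New IDs are assigned sequentially from 1.
			mapping = append(mapping, IDPair{Old: id, New: len(retained)})
		}
	}
	if len(retained) == 0 && !allowEmpty {
		return nil, nil, fmt.Errorf("no segments retained (allowEmpty is false)")
	}
	return retained, mapping, nil
}
```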

`trim` selector syntax:

- Segment IDs are positive 1-based integers.
- Inclusive ranges are supported: `1-10`.
- Comma-separated selectors are supported: `1-10,15,20-25`.
- Whitespace around numbers, commas, and hyphens is allowed: `1 - 10, 15, 20 - 25`.
- Duplicate and overlapping ranges are accepted and normalized as a union.
- Descending ranges (for example `10-1`) are rejected.

`trim` behavior:

- `trim` consumes existing seriatim JSON output artifacts only; it does not accept raw WhisperX transcript JSON as input.
- Retained output segment IDs are renumbered sequentially from `1` to `N`.
- Output preserves the input transcript order; selector order does not reorder segments.
- When the output schema is `seriatim-full`, overlap groups are recomputed from the retained segments.
- `--output-schema seriatim-full` works only when the input artifact already carries full-schema data; `trim` does not synthesize missing full-schema provenance from minimal or intermediate input artifacts.
- `trim` does not run merge postprocessors such as `resolve-overlaps`, `coalesce`, or `autocorrect`.

`trim` report output:

- When `--report-file` is provided, the report includes the standard trim, validation, and output events.
- The report also includes a `trim-audit` event with trim operation metadata: selected IDs, retained and removed counts, removed IDs, and the old-to-new segment ID mapping.
- The old-to-new ID mapping is emitted as a deterministically ordered array of `{old_id, new_id}` pairs.

Environment variables:

| Environment Variable | Default | Description |
| --- | --- | --- |
| `SERIATIM_OUTPUT_SCHEMA` | `seriatim-intermediate` | Output schema used when `--output-schema` is not explicitly provided. Allowed values are `seriatim-minimal`, `seriatim-intermediate`, and `seriatim-full`. The CLI flag takes precedence. |
| `SERIATIM_OVERLAP_WORD_RUN_GAP` | `1.0` | Maximum gap in seconds between adjacent timed words when `resolve-overlaps` builds word-run replacement segments. Must be a positive float. |
| `SERIATIM_OVERLAP_WORD_RUN_REORDER_WINDOW` | `1.0` | Near-start window in seconds for ordering replacement word runs shortest-first. Must be a positive float. |
| `SERIATIM_BACKCHANNEL_MAX_DURATION` | `2.0` | Maximum duration in seconds for `backchannel` classification. Must be a positive float. |
| `SERIATIM_FILLER_MAX_DURATION` | `1.25` | Maximum duration in seconds for `filler` classification. Must be a positive float. |

## Input JSON Format

Each input file must be valid JSON with a top-level `segments` array. The current parser accepts the subset of the WhisperX segment format needed for merging:

```json
{
  "segments": [
    {
      "start": 1.25,
      "end": 3.5,
      "text": "Hello there.",
      "words": [
        {"word": "Hello", "start": 1.25, "end": 1.55, "score": 0.98},
        {"word": "there.", "start": 1.7, "end": 2.0}
      ]
    }
  ]
}
```

Required segment fields:

- `start`: number, must be `>= 0`.
- `end`: number, must be `>= start`.
- `text`: string.

Optional word fields:

- `words`: array of word timing objects.
- `words[].word`: string.
- `words[].start`: optional number, must be `>= 0` when present.
- `words[].end`: optional number, must be `>= start` when both are present.
- `words[].score`: optional number.
- `words[].speaker`: optional raw speaker label string.

Word-level timing is preserved internally for overlap resolution. If a word is missing `start` or `end`, seriatim keeps the word text, emits a warning in the optional report, and does not use that word as a timing anchor. Word timing is not emitted in the final JSON artifact.

## Speaker Map Format

`speakers.yml` is optional. If `--speakers` is omitted, `seriatim` uses each input file basename as the segment speaker label.

When provided, `speakers.yml` maps input files to canonical speaker names using ordered substring rules:
```yaml
match:
  - speaker: "Eric Rakestraw"
    match:
      - "Eric_Rakestraw"
      - "Eric"
  - speaker: "Mike Brown"
    match:
      - "Mike_Brown"
      - "mb"
```

For each `--input-file`, `seriatim` takes the file basename and evaluates the rules in order. The first rule with a matching substring wins, and no later rules are evaluated.

For example, this input:

```text
samples/raw/2026-04-19-Eric_Rakestraw.json
```

matches this rule because the basename contains `Eric_Rakestraw`:

```yaml
- speaker: "Eric Rakestraw"
  match:
    - "Eric_Rakestraw"
```

Important details:

- Matching is against the input file basename, not the full path.
- Matching is case-insensitive.
- Rules are evaluated from first to last.
- Each rule must have a non-empty `speaker`.
- Each rule must have at least one non-empty `match` string.
- Duplicate speaker names are invalid.
- Every input file must match at least one rule, or the command fails.

Legacy format (no longer supported):

```yaml
inputs:
  eric.json:
    speaker: "Eric Rakestraw"
```

The old `inputs:` direct-mapping format is no longer supported.

## Output JSON Format

`--output-modules json` selects the writer. `--output-schema` selects the JSON contract that the writer serializes.

The named schemas are stable public contracts. If a consumer depends on a specific shape, it should request that schema explicitly at runtime, because the runtime default may change in a future release. Currently, `seriatim-intermediate` is the default when neither `--output-schema` nor `SERIATIM_OUTPUT_SCHEMA` is set.

The `seriatim-intermediate` schema stays close to the minimal schema but adds optional `categories` on each segment:

```json
{
  "metadata": {
    "application": "seriatim",
    "version": "dev",
    "output_schema": "seriatim-intermediate"
  },
  "segments": [
    {
      "id": 1,
      "start": 1.25,
      "end": 3.5,
      "speaker": "Eric Rakestraw",
      "text": "Hello there.",
      "categories": ["backchannel"]
    }
  ]
}
```

The `seriatim-full` schema uses the full seriatim envelope:

```json
{
  "metadata": {
    "application": "seriatim",
    "version": "dev",
    "input_reader": "json-files",
    "input_files": ["eric.json", "mike.json"],
    "preprocessing_modules": ["validate-raw", "normalize-speakers", "trim-text"],
    "postprocessing_modules": ["detect-overlaps", "resolve-overlaps", "backchannel", "filler", "resolve-danglers", "coalesce", "detect-overlaps", "autocorrect", "assign-ids", "validate-output"],
    "output_modules": ["json"]
  },
  "segments": [
    {
      "id": 1,
      "source": "eric.json",
      "source_segment_index": 0,
      "speaker": "Eric Rakestraw",
      "start": 1.25,
      "end": 3.5,
      "text": "Hello there.",
      "overlap_group_id": 1
    },
    {
      "id": 2,
      "source": "eric.json",
      "source_ref": "word-run:1:1:1",
      "derived_from": ["eric.json#0"],
      "speaker": "Eric Rakestraw",
      "start": 2.0,
      "end": 2.5,
      "text": "Resolved word run",
      "categories": ["backchannel"]
    }
  ],
  "overlap_groups": [
    {
      "id": 1,
      "start": 1.25,
      "end": 4.0,
      "segments": ["eric.json#0", "mike.json#0"],
      "speakers": ["Eric Rakestraw", "Mike Brown"],
      "class": "unknown",
      "resolution": "unresolved"
    }
  ]
}
```

The `seriatim-minimal` schema emits minimal metadata and compact, ordered segments:

```json
{
  "metadata": {
    "application": "seriatim",
    "version": "dev",
    "output_schema": "seriatim-minimal"
  },
  "segments": [
    {
      "id": 1,
      "start": 1.25,
      "end": 3.5,
      "speaker": "Eric Rakestraw",
      "text": "Hello there."
    }
  ]
}
```

Minimal output intentionally omits categories, overlap groups, source/provenance fields, and pipeline configuration metadata.
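
On the consumer side, the minimal and intermediate shapes map directly onto plain Go structs. The types below are a hand-written, hypothetical illustration rather than the types exported by seriatim's `schema` package. `Categories` uses `omitempty` because the field is absent from minimal output and optional in intermediate output.

```go
package main

import "encoding/json"

// Metadata mirrors the minimal/intermediate metadata block (hypothetical type).
type Metadata struct {
	Application  string `json:"application"`
	Version      string `json:"version"`
	OutputSchema string `json:"output_schema"`
}

// Segment mirrors one minimal/intermediate segment (hypothetical type).
type Segment struct {
	ID         int      `json:"id"`
	Start      float64  `json:"start"`
	End        float64  `json:"end"`
	Speaker    string   `json:"speaker"`
	Text       string   `json:"text"`
	Categories []string `json:"categories,omitempty"` // intermediate only
}

// Transcript is the top-level envelope.
type Transcript struct {
	Metadata Metadata  `json:"metadata"`
	Segments []Segment `json:"segments"`
}

// decodeTranscript parses a minimal or intermediate artifact.
func decodeTranscript(data []byte) (Transcript, error) {
	var t Transcript
	err := json.Unmarshal(data, &t)
	return t, err
}
```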

Intermediate output intentionally omits overlap groups and source/provenance fields, but keeps optional `categories` and minimal metadata.

Segments are sorted deterministically by:

```text
(start, end, source, source_segment_index/source_ref, speaker)
```

Final segment IDs are assigned after sorting and start at `1`.

The public Go output contract is available from:

```go
import "gitea.maximumdirect.net/eric/seriatim/schema"
```

The same package embeds machine-readable JSON Schemas in `schema/full-output.schema.json`, `schema/intermediate-output.schema.json`, and `schema/minimal-output.schema.json`.

The default `validate-output` postprocessor validates the selected output shape and verifies that final segment IDs are present, sequential, and start at `1`.

## Overlap Detection

The default postprocessing pipeline detects groups of overlapping segments.

Overlap behavior:

- A strict timing overlap is required: `next.start < current_group_end`.
- Segments that only touch at a boundary are not grouped.
- Groups require at least two distinct speakers.
- Transitive overlaps are grouped together.
- Segments in detected groups receive `overlap_group_id`.
- `overlap_groups[].segments` contains stable references in `source#source_segment_index` format.
- `class` is currently always `unknown`.
- `resolution` is `unresolved` until `resolve-overlaps` replaces the group.

## Overlap Resolution

The default postprocessing pipeline runs `detect-overlaps`, then `resolve-overlaps`, then `backchannel`, then `filler`, then `resolve-danglers`, then `coalesce`, then a second `detect-overlaps` pass.

For each detected overlap group, `resolve-overlaps` uses the preserved WhisperX word timing to build smaller word-run replacement segments:

- The resolution window expands the detected overlap group by `--coalesce-gap` seconds on both sides.
- Nearby same-speaker context segments are included when they intersect the expanded window and their start or end is within `--coalesce-gap` of the original overlap boundary.
- Once a segment is selected for replacement, all timed words from that segment participate in word-run construction; the window controls segment selection, not per-word clipping.
- Context segments that belong to another detected overlap group are not pulled into the current group.
- Untimed words are included in replacement text in their original word order when nearby timed words create a replacement run.
- Untimed words do not affect replacement segment start/end times or word-run gap splitting.
- Adjacent words from the same speaker are merged into one run when the gap between them is no greater than `SERIATIM_OVERLAP_WORD_RUN_GAP` (default `1.0` seconds; set it to a positive number of seconds to override).
- Near-start replacement word runs are reordered so that shorter segments come first when adjacent starts are within `SERIATIM_OVERLAP_WORD_RUN_REORDER_WINDOW` (default `1.0` seconds; set it to a positive number of seconds to override).
- Replacement segment text is built by joining word text with single spaces.
- Replacement segments include `source_ref` and `derived_from`.
- Replacement segments omit `source_segment_index` because they are derived from one or more original segments.
- Resolved overlap groups are removed before the second detection pass.
- Replacement segments have no `overlap_group_id` until the second detection pass annotates any remaining overlap.
- If a speaker has no usable word timing in a group, that speaker's original segment is kept.
- If no speaker in a group has usable word timing, the original group and its annotations remain unchanged.

## Backchannels

The default pipeline runs `backchannel` before `coalesce`.
It tags short acknowledgement segments with:

```json
"categories": ["backchannel"]
```

Backchannel matching is case-insensitive, ignores punctuation for matching and word-count purposes, and trims surrounding whitespace. A segment matches only if its text is a recognized acknowledgement phrase, it has no more than three whitespace-delimited words, and its duration is no greater than `SERIATIM_BACKCHANNEL_MAX_DURATION` seconds. The default maximum duration is `2.0` seconds.

## Fillers

The default pipeline runs `filler` after `backchannel` and before `coalesce`. It tags short filler utterances with:

```json
"categories": ["filler"]
```

Filler matching is case-insensitive, ignores punctuation for matching and word-count purposes, and trims surrounding whitespace. The text must consist only of filler tokens such as `um`, `uh`, `er`, `erm`, `ah`, `eh`, `hmm`, or `mm`, or repeated combinations of those tokens. Matching segments must contain no more than three whitespace-delimited words and have a duration no greater than `SERIATIM_FILLER_MAX_DURATION` seconds. The default maximum duration is `1.25` seconds.

## Dangler Resolution

The default pipeline runs `resolve-danglers` before `coalesce` and before the second overlap detection pass. It repairs short derived fragments that share provenance with a nearby segment:

- Dangling-end fragments have no more than two words and end in punctuation.
- Dangling-start fragments have no more than two words.
- Matching considers same-speaker segments that share at least one `derived_from` value.
- Merged segments use `source_ref` values such as `resolve-danglers:1`, keep the target segment's transcript position, and union their `derived_from` values.

## Coalescing

The default pipeline runs `coalesce` after `resolve-danglers` and before the second overlap detection pass. It merges adjacent same-speaker segments, in the transcript's current order, when `next.start - current.end <= --coalesce-gap`. Coalesced segments use `source_ref` values such as `coalesce:1`, include `derived_from`, and omit `source_segment_index`.
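
The core gap rule can be sketched as a single pass over the segments in their current order. This is a simplified, hypothetical illustration: `Seg` and `coalesce` are not seriatim's types, and joining texts with one space and taking the later end time as the merged end are assumptions. It does not model provenance fields (`source_ref`, `derived_from`) or the special handling of backchannel and filler segments.

```go
package main

import "strings"

// Seg is a hypothetical, simplified segment without provenance fields.
type Seg struct {
	Speaker    string
	Start, End float64
	Text       string
}

// coalesce merges adjacent same-speaker segments, in their current order,
// when next.Start - current.End <= gap (seconds). The input is not modified.
func coalesce(segs []Seg, gap float64) []Seg {
	out := make([]Seg, 0, len(segs))
	for _, s := range segs {
		if n := len(out); n > 0 && out[n-1].Speaker == s.Speaker && s.Start-out[n-1].End <= gap {
			cur := &out[n-1]
			// Assumption: texts are joined with one space; the merged end is the later end.
			cur.Text = strings.TrimSpace(cur.Text + " " + s.Text)
			if s.End > cur.End {
				cur.End = s.End
			}
			continue
		}
		out = append(out, s)
	}
	return out
}
```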

Different-speaker backchannel and filler segments do not block coalescing of the surrounding same-speaker segments. Same-speaker backchannel and filler segments are merged normally when they fall within `--coalesce-gap`. When same-speaker segments are coalesced, any `backchannel` or `filler` category from the merged inputs is dropped from the coalesced segment.

## Autocorrect

Autocorrect is included in the default postprocessing pipeline. If `--autocorrect` is omitted, the module leaves transcript text unchanged and records a skip event in the optional report.

Enable corrections by passing `--autocorrect`:

```sh
go run ./cmd/seriatim merge \
  --input-file input.json \
  --autocorrect autocorrect.yml \
  --output-file merged.json
```

`autocorrect.yml` format:

```yaml
autocorrect:
  - target: "Hrank"
    match:
      - "hrank"
      - "Frank"
  - target: "Mike Brown"
    match:
      - "Mike Pat"
```

Matching behavior:

- Matching is case-sensitive.
- Matches apply only to whole tokens, not to substrings inside larger words.
- Punctuation and whitespace may surround a match.
- Multi-word and hyphenated matches are supported.
- Duplicate match strings are invalid, including duplicates across separate rules.

## Current Limitations

- Only JSON input is supported.
- Overlap resolution depends on WhisperX word timing; groups without usable word timing remain unresolved.
- Alternate output formats are not implemented yet.

## Release Builds

Local builds record version metadata as `dev`. Release builds should inject the release version with `ldflags`:

```sh
go build -ldflags "-X gitea.maximumdirect.net/eric/seriatim/internal/buildinfo.Version=v1.0.0" ./cmd/seriatim
```
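
The Go linker's `-X` flag only affects a package-level string variable that is uninitialized or initialized to a constant string. That is why local builds fall back to a default such as `dev`. The standalone sketch below shows the pattern in a `main` package. seriatim itself sets `internal/buildinfo.Version`, and the `versionString` helper and its output format are hypothetical.

```go
package main

// version is overridden at link time for release builds, for example:
//
//	go build -ldflags "-X main.version=v1.0.0" .
//
// -X only takes effect because this is a string variable initialized
// to a constant string.
var version = "dev"

// versionString is a hypothetical helper; it is not seriatim's actual
// --version output format.
func versionString() string {
	return "seriatim " + version
}
```

Building this sketch with `go build -ldflags "-X main.version=v1.0.0"` would make `versionString` return `seriatim v1.0.0` instead of `seriatim dev`.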