Document trim command

2026-05-08 14:57:52 +00:00
parent c48b02d2ec
commit 54f7717de8
2 changed files with 95 additions and 3 deletions


@@ -1,8 +1,8 @@
# seriatim
`seriatim` merges per-speaker WhisperX-style JSON transcripts into a single JSON transcript that preserves speaker identity and chronological order.
`seriatim` merges per-speaker WhisperX-style JSON transcripts into a single JSON transcript that preserves speaker identity and chronological order. It also trims existing seriatim output artifacts by segment ID.
The current implementation supports the `merge` command. It reads one or more input JSON files, optionally maps each input file to a canonical speaker using `speakers.yml`, sorts all segments by timestamp, detects and resolves overlaps when word-level timing is available, assigns consecutive numeric `id` values, and writes a merged JSON artifact.
The current implementation supports the `merge` and `trim` commands. `merge` reads one or more input JSON files, optionally maps each input file to a canonical speaker using `speakers.yml`, sorts all segments by timestamp, detects and resolves overlaps when word-level timing is available, assigns consecutive numeric `id` values, and writes a merged JSON artifact. `trim` reads an existing seriatim output artifact and writes a copy reduced to the retained subset of segments.
## Usage
@@ -25,10 +25,20 @@ go run ./cmd/seriatim merge \
--report-file report.json
```
Trim an existing seriatim artifact:
```sh
go run ./cmd/seriatim trim \
--input-file merged.json \
--output-file trimmed.json \
--keep "1-10, 15, 20-25"
```
## CLI
```text
seriatim merge [flags]
seriatim trim [flags]
```
Global flags:
@@ -54,6 +64,50 @@ Global flags:
| `--postprocessing-modules` | No | `detect-overlaps,resolve-overlaps,backchannel,filler,resolve-danglers,coalesce,detect-overlaps,autocorrect,assign-ids,validate-output` | Comma-separated postprocessing modules, evaluated in order. |
| `--coalesce-gap` | No | `3.0` | Maximum same-speaker gap in seconds for `coalesce`; also used as the `resolve-overlaps` context window. Must be a non-negative float. |
`trim` flags:
| Flag | Required | Default | Description |
| --- | --- | --- | --- |
| `--input-file` | Yes | none | Input seriatim output artifact JSON file. |
| `--output-file` | Yes | none | Trimmed transcript JSON output path. |
| `--keep` | Exactly one of `--keep` or `--remove` is required | none | Segment ID selector to retain. |
| `--remove` | Exactly one of `--keep` or `--remove` is required | none | Segment ID selector to drop. |
| `--output-schema` | No | preserve input artifact schema | Optional output schema override: `seriatim-minimal`, `seriatim-intermediate`, or `seriatim-full`. |
| `--report-file` | No | none | Optional report JSON output path. |
| `--allow-empty` | No | `false` | Allow trimming to zero retained segments. |
`trim` selection rules:
- `--keep` and `--remove` are mutually exclusive.
- Exactly one of `--keep` or `--remove` is required.
- Selection is by segment ID only.
- By default, the command fails if the selector includes an invalid segment ID.
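The selection rules above can be sketched in Go as follows. This is an illustration only, not seriatim's actual code: the `selection` type, the `resolveSelection` and `checkIDs` helpers, and the error messages are all hypothetical, and the sketch assumes that an "invalid" ID means one that is absent from the input artifact.

```go
package main

import (
	"errors"
	"fmt"
)

// selection is a hypothetical type for this sketch; seriatim's internal
// representation is not described in this README.
type selection struct {
	Selector string // raw selector text, for example "1-10, 15"
	Keep     bool   // true for --keep, false for --remove
}

// resolveSelection enforces that exactly one of the two flag values is set.
func resolveSelection(keep, remove string) (selection, error) {
	switch {
	case keep != "" && remove != "":
		return selection{}, errors.New("--keep and --remove are mutually exclusive")
	case keep == "" && remove == "":
		return selection{}, errors.New("exactly one of --keep or --remove is required")
	case keep != "":
		return selection{Selector: keep, Keep: true}, nil
	default:
		return selection{Selector: remove, Keep: false}, nil
	}
}

// checkIDs returns an error for the first selected ID that is not present
// in the artifact (assumed meaning of "invalid" in this sketch).
func checkIDs(selected []int, existing map[int]bool) error {
	for _, id := range selected {
		if !existing[id] {
			return fmt.Errorf("segment ID %d not found in input artifact", id)
		}
	}
	return nil
}

func main() {
	sel, err := resolveSelection("1-10, 15", "")
	fmt.Println(sel.Keep, err) // true <nil>

	_, err = resolveSelection("1-10", "3")
	fmt.Println(err) // --keep and --remove are mutually exclusive

	err = checkIDs([]int{2, 9}, map[int]bool{1: true, 2: true, 3: true})
	fmt.Println(err) // segment ID 9 not found in input artifact
}
```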
`trim` selector syntax:
- Segment IDs are positive 1-based integers.
- Inclusive ranges are supported: `1-10`.
- Comma-separated selectors are supported: `1-10,15,20-25`.
- Whitespace around numbers, commas, and hyphens is allowed: `1 - 10, 15, 20 - 25`.
- Duplicate and overlapping ranges are accepted and normalized as a union.
- Descending ranges (for example `10-1`) are rejected.
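As a rough illustration of the selector syntax above, the following Go sketch expands a selector string into a sorted union of segment IDs. It is not seriatim's parser: `parseSelector` and `parseID` are hypothetical names, the error messages are invented for the example, and edge cases the README does not specify (such as a leading `+` sign) are not handled specially.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseSelector is a hypothetical helper. It expands a selector such as
// "1-10, 15, 20-25" into a sorted, de-duplicated list of segment IDs.
func parseSelector(selector string) ([]int, error) {
	seen := map[int]bool{}
	for _, part := range strings.Split(selector, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			return nil, fmt.Errorf("empty selector element")
		}
		// A hyphen marks an inclusive range; otherwise the part is a single ID.
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := parseID(lo)
		if err != nil {
			return nil, err
		}
		end := start
		if isRange {
			if end, err = parseID(hi); err != nil {
				return nil, err
			}
		}
		if end < start {
			return nil, fmt.Errorf("descending range %q", part)
		}
		// Duplicates and overlapping ranges collapse into the same set.
		for id := start; id <= end; id++ {
			seen[id] = true
		}
	}
	ids := make([]int, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids, nil
}

// parseID accepts a positive 1-based integer with optional surrounding whitespace.
func parseID(s string) (int, error) {
	id, err := strconv.Atoi(strings.TrimSpace(s))
	if err != nil || id < 1 {
		return 0, fmt.Errorf("invalid segment ID %q", s)
	}
	return id, nil
}

func main() {
	ids, err := parseSelector("1 - 3, 2, 5")
	fmt.Println(ids, err) // [1 2 3 5] <nil>

	_, err = parseSelector("10-1")
	fmt.Println(err) // descending range "10-1"
}
```

The sketch expands every range into individual IDs, which keeps the union logic trivial; an implementation that expects very large ranges might store intervals instead.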
`trim` behavior:
- `trim` consumes existing seriatim JSON output artifacts only.
- `trim` does not accept raw WhisperX transcript JSON as input.
- Retained output segment IDs are renumbered sequentially from `1` to `N`.
- Retained segments keep their input transcript order; the order of IDs in the selector does not reorder the output.
- When the output schema is `seriatim-full`, overlap groups are recomputed from retained segments.
- `--output-schema seriatim-full` is supported only when `trim` has full-schema artifact data to emit; `trim` does not synthesize missing full-schema provenance from minimal or intermediate input artifacts.
- `trim` does not run merge postprocessors such as `resolve-overlaps`, `coalesce`, or `autocorrect`.
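The ordering and renumbering behavior described above can be sketched in Go. The `Segment` and `IDPair` types and the `trimKeep` function are hypothetical and heavily reduced; real seriatim segments carry more fields, and this sketch ignores schemas and overlap groups entirely.

```go
package main

import "fmt"

// Segment is a hypothetical, reduced segment shape for illustration only.
type Segment struct {
	ID   int
	Text string
}

// IDPair records how one retained segment was renumbered.
type IDPair struct {
	OldID int
	NewID int
}

// trimKeep retains the segments whose IDs are in keep, preserves the input
// transcript order, and renumbers the retained segments from 1 to N.
func trimKeep(segments []Segment, keep map[int]bool) ([]Segment, []IDPair) {
	var out []Segment
	var mapping []IDPair
	for _, seg := range segments {
		if !keep[seg.ID] {
			continue
		}
		newID := len(out) + 1
		mapping = append(mapping, IDPair{OldID: seg.ID, NewID: newID})
		seg.ID = newID // seg is a copy, so the input slice is not modified
		out = append(out, seg)
	}
	return out, mapping
}

func main() {
	in := []Segment{{1, "a"}, {2, "b"}, {3, "c"}, {4, "d"}}

	// The keep set lists 4 before 2, but output order follows the input.
	out, mapping := trimKeep(in, map[int]bool{4: true, 2: true})
	fmt.Println(out)     // [{1 b} {2 d}]
	fmt.Println(mapping) // [{2 1} {4 2}]
}
```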
`trim` report output:
- When `--report-file` is provided, the report includes the standard trim, validation, and output events.
- The report includes a `trim-audit` event containing trim operation metadata, including selected IDs, retained/removed counts, removed IDs, and old-to-new segment ID mapping.
- Old-to-new ID mapping is emitted as a deterministic ordered array of `{old_id, new_id}` pairs.
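To illustrate why an ordered array of pairs gives deterministic output, here is a minimal Go sketch that serializes an old-to-new mapping. Only the `old_id` and `new_id` field names come from the description above; the `idPair` type, the `mappingJSON` helper, and the choice to sort by old ID are assumptions, and the layout of the enclosing `trim-audit` event is not shown because it is not specified here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// idPair is a hypothetical type; only the JSON field names are taken
// from the report description.
type idPair struct {
	OldID int `json:"old_id"`
	NewID int `json:"new_id"`
}

// mappingJSON converts an unordered map into an array of pairs sorted by
// old ID (an assumed ordering), so the serialized form is stable across runs.
func mappingJSON(oldToNew map[int]int) (string, error) {
	pairs := make([]idPair, 0, len(oldToNew))
	for oldID, newID := range oldToNew {
		pairs = append(pairs, idPair{OldID: oldID, NewID: newID})
	}
	sort.Slice(pairs, func(i, j int) bool { return pairs[i].OldID < pairs[j].OldID })

	data, err := json.Marshal(pairs)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	out, err := mappingJSON(map[int]int{4: 2, 2: 1})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // [{"old_id":2,"new_id":1},{"old_id":4,"new_id":2}]
}
```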
Environment variables:
| Environment Variable | Default | Description |