Cleaned up documentation and development artifacts in advance of release

2026-04-27 21:48:04 -05:00
parent 6cb739be55
commit 28c2eea340
15 changed files with 336 additions and 92 deletions

@@ -31,21 +31,34 @@ go run ./cmd/seriatim merge \
seriatim merge [flags]
```
Required flags for the default pipeline:
Global flags:
- `--input-file`: input transcript JSON file. Repeat once per speaker/input file.
- `--output-file`: merged transcript JSON output path.
| Flag | Description |
| --- | --- |
| `--help` | Show command help. |
| `--version` | Show application version. Local builds default to `dev`; release builds inject the release version. |
Optional flags:
`merge` flags:
- `--report-file`: write a JSON report with pipeline events.
- `--speakers`: speaker map YAML file. When omitted, input file basenames are used as speaker labels.
- `--autocorrect`: autocorrect rules file. When omitted, the default `autocorrect` module no-ops.
- `--input-reader`: input reader module. Default: `json-files`.
- `--output-modules`: comma-separated output modules. Default: `json`.
- `--preprocessing-modules`: comma-separated preprocessing modules. Default: `validate-raw,normalize-speakers,trim-text`.
- `--postprocessing-modules`: comma-separated postprocessing modules. Default: `detect-overlaps,resolve-overlaps,backchannel,filler,coalesce,detect-overlaps,autocorrect,assign-ids,validate-output`.
- `--coalesce-gap`: maximum same-speaker gap in seconds for `coalesce`. Default: `3.0`.
| Flag | Required | Default | Description |
| --- | --- | --- | --- |
| `--input-file` | Yes | none | Input transcript JSON file. Repeat once per speaker/input file. |
| `--output-file` | Yes | none | Merged transcript JSON output path. |
| `--report-file` | No | none | Optional report JSON output path. |
| `--speakers` | No | none | Speaker map YAML file. When omitted, input file basenames are used as speaker labels. |
| `--autocorrect` | No | none | Autocorrect rules YAML file. When omitted, the default `autocorrect` module leaves text unchanged. |
| `--input-reader` | No | `json-files` | Input reader module. |
| `--output-modules` | No | `json` | Comma-separated output modules. |
| `--preprocessing-modules` | No | `validate-raw,normalize-speakers,trim-text` | Comma-separated preprocessing modules, evaluated in order. |
| `--postprocessing-modules` | No | `detect-overlaps,resolve-overlaps,backchannel,filler,coalesce,detect-overlaps,autocorrect,assign-ids,validate-output` | Comma-separated postprocessing modules, evaluated in order. |
| `--coalesce-gap` | No | `3.0` | Maximum same-speaker gap in seconds for `coalesce`. Must be a non-negative float. |
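The module flags take comma-separated lists that are evaluated in order, and the default postprocessing list names `detect-overlaps` twice. A minimal sketch of how such a value could be split while preserving order and duplicates follows. `parseModules` is a hypothetical helper, not seriatim's actual code; trimming whitespace and rejecting empty names are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// parseModules splits a comma-separated module list into names.
// Hypothetical helper: order and duplicates are preserved, because the
// same module may legitimately run more than once in a pipeline.
// Trimming whitespace and rejecting empty names are assumptions.
func parseModules(list string) ([]string, error) {
	if strings.TrimSpace(list) == "" {
		return nil, nil // no modules requested
	}
	parts := strings.Split(list, ",")
	mods := make([]string, 0, len(parts))
	for _, p := range parts {
		name := strings.TrimSpace(p)
		if name == "" {
			return nil, fmt.Errorf("empty module name in %q", list)
		}
		mods = append(mods, name)
	}
	return mods, nil
}

func main() {
	mods, err := parseModules("validate-raw,normalize-speakers,trim-text")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the modules in the order they would be evaluated.
	for i, m := range mods {
		fmt.Printf("%d: %s\n", i+1, m)
	}
}
```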
Environment variables:
| Environment Variable | Default | Description |
| --- | --- | --- |
| `SERIATIM_OVERLAP_WORD_RUN_GAP` | `0.75` | Maximum gap in seconds between adjacent timed words when `resolve-overlaps` builds word-run replacement segments. Must be a positive float. |
| `SERIATIM_OVERLAP_WORD_RUN_REORDER_WINDOW` | `0.4` | Near-start window in seconds for ordering replacement word runs shortest-first. Must be a positive float. |
## Input JSON Format
@@ -312,4 +325,12 @@ Matching behavior:
- Only JSON input is supported.
- Overlap resolution depends on WhisperX word timing; groups without usable word timing remain unresolved.
- Coalescing and alternate output formats are not implemented yet.
- Alternate output formats are not implemented yet.
## Release Builds
Local builds record version metadata as `dev`. Release builds should inject the release version with `ldflags`:
```sh
go build -ldflags "-X gitea.maximumdirect.net/eric/seriatim/internal/buildinfo.Version=v1.0.0" ./cmd/seriatim
```
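The `-X` linker flag overwrites a package-level string variable at link time, so the `dev` default most likely comes from an ordinary initializer. The sketch below illustrates the mechanism in a single file. The real variable lives in `internal/buildinfo`, and its declaration and the output format are assumed here rather than taken from the source.

```go
package main

import "fmt"

// Version stands in for buildinfo.Version (assumed declaration).
// The -X flag can only set package-level string variables, so the
// value must be a plain string; "dev" is kept for local builds.
var Version = "dev"

// versionString formats the version for display. The "seriatim " prefix
// is illustrative, not the actual --version output.
func versionString() string {
	return "seriatim " + Version
}

func main() {
	fmt.Println(versionString())
}
```

Because this sketch declares the variable in package `main`, building it with `go build -ldflags "-X main.Version=v1.0.0"` replaces `dev` with `v1.0.0`. The command in the README targets the full import path of the `buildinfo` package instead.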