Cleaned up documentation and development artifacts in advance of release
47
README.md
@@ -31,21 +31,34 @@ go run ./cmd/seriatim merge \
 seriatim merge [flags]
 ```
 
-Required flags for the default pipeline:
+Global flags:
 
-- `--input-file`: input transcript JSON file. Repeat once per speaker/input file.
-- `--output-file`: merged transcript JSON output path.
+| Flag | Description |
+| --- | --- |
+| `--help` | Show command help. |
+| `--version` | Show application version. Local builds default to `dev`; release builds inject the release version. |
 
-Optional flags:
+`merge` flags:
 
-- `--report-file`: write a JSON report with pipeline events.
-- `--speakers`: speaker map YAML file. When omitted, input file basenames are used as speaker labels.
-- `--autocorrect`: autocorrect rules file. When omitted, the default `autocorrect` module no-ops.
-- `--input-reader`: input reader module. Default: `json-files`.
-- `--output-modules`: comma-separated output modules. Default: `json`.
-- `--preprocessing-modules`: comma-separated preprocessing modules. Default: `validate-raw,normalize-speakers,trim-text`.
-- `--postprocessing-modules`: comma-separated postprocessing modules. Default: `detect-overlaps,resolve-overlaps,backchannel,filler,coalesce,detect-overlaps,autocorrect,assign-ids,validate-output`.
-- `--coalesce-gap`: maximum same-speaker gap in seconds for `coalesce`. Default: `3.0`.
+| Flag | Required | Default | Description |
+| --- | --- | --- | --- |
+| `--input-file` | Yes | none | Input transcript JSON file. Repeat once per speaker/input file. |
+| `--output-file` | Yes | none | Merged transcript JSON output path. |
+| `--report-file` | No | none | Optional report JSON output path. |
+| `--speakers` | No | none | Speaker map YAML file. When omitted, input file basenames are used as speaker labels. |
+| `--autocorrect` | No | none | Autocorrect rules YAML file. When omitted, the default `autocorrect` module leaves text unchanged. |
+| `--input-reader` | No | `json-files` | Input reader module. |
+| `--output-modules` | No | `json` | Comma-separated output modules. |
+| `--preprocessing-modules` | No | `validate-raw,normalize-speakers,trim-text` | Comma-separated preprocessing modules, evaluated in order. |
+| `--postprocessing-modules` | No | `detect-overlaps,resolve-overlaps,backchannel,filler,coalesce,detect-overlaps,autocorrect,assign-ids,validate-output` | Comma-separated postprocessing modules, evaluated in order. |
+| `--coalesce-gap` | No | `3.0` | Maximum same-speaker gap in seconds for `coalesce`. Must be a non-negative float. |
 
+Environment variables:
+
+| Environment Variable | Default | Description |
+| --- | --- | --- |
+| `SERIATIM_OVERLAP_WORD_RUN_GAP` | `0.75` | Maximum gap in seconds between adjacent timed words when `resolve-overlaps` builds word-run replacement segments. Must be a positive float. |
+| `SERIATIM_OVERLAP_WORD_RUN_REORDER_WINDOW` | `0.4` | Near-start window in seconds for ordering replacement word runs shortest-first. Must be a positive float. |
+
 ## Input JSON Format
 
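The `merge` flags map naturally onto Go's standard `flag` package, which accepts both `-flag` and `--flag` spellings. The sketch below is hypothetical and is not seriatim's code: `parseMergeFlags`, `mergeOptions`, `repeatedFlag` and `splitModules` are illustrative names, only a subset of the flags is covered, and trimming whitespace around module names is an assumption. It shows a repeatable `--input-file`, a comma-separated module list whose order and repeats (such as the second `detect-overlaps`) are preserved, and the non-negative check on `--coalesce-gap`.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"math"
	"os"
	"strings"
)

// repeatedFlag collects every occurrence of a repeatable flag such as --input-file.
type repeatedFlag []string

func (r *repeatedFlag) String() string { return strings.Join(*r, ",") }

func (r *repeatedFlag) Set(v string) error {
	*r = append(*r, v)
	return nil
}

// mergeOptions holds a hypothetical subset of the documented merge flags.
type mergeOptions struct {
	InputFiles  []string
	OutputFile  string
	PostModules []string
	CoalesceGap float64
}

const defaultPostModules = "detect-overlaps,resolve-overlaps,backchannel,filler,coalesce," +
	"detect-overlaps,autocorrect,assign-ids,validate-output"

// splitModules turns a comma-separated module list into an ordered slice.
// Order is preserved and repeats are kept, since each entry is one pipeline step.
func splitModules(s string) []string {
	var mods []string
	for _, m := range strings.Split(s, ",") {
		if m = strings.TrimSpace(m); m != "" {
			mods = append(mods, m)
		}
	}
	return mods
}

// parseMergeFlags parses the flags that follow the merge subcommand.
func parseMergeFlags(args []string) (mergeOptions, error) {
	fs := flag.NewFlagSet("merge", flag.ContinueOnError)
	var inputs repeatedFlag
	fs.Var(&inputs, "input-file", "input transcript JSON file; repeat once per speaker/input file")
	output := fs.String("output-file", "", "merged transcript JSON output path")
	post := fs.String("postprocessing-modules", defaultPostModules, "comma-separated postprocessing modules")
	gap := fs.Float64("coalesce-gap", 3.0, "maximum same-speaker gap in seconds for coalesce")

	if err := fs.Parse(args); err != nil {
		return mergeOptions{}, err
	}
	if len(inputs) == 0 || *output == "" {
		return mergeOptions{}, errors.New("--input-file and --output-file are required")
	}
	if math.IsNaN(*gap) || *gap < 0 {
		return mergeOptions{}, fmt.Errorf("--coalesce-gap must be a non-negative float, got %v", *gap)
	}
	return mergeOptions{
		InputFiles:  []string(inputs),
		OutputFile:  *output,
		PostModules: splitModules(*post),
		CoalesceGap: *gap,
	}, nil
}

func main() {
	// This sketch treats all program arguments as merge flags, as if the
	// subcommand name had already been consumed.
	opts, err := parseMergeFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

Keeping the module list as an ordered slice, rather than a set, matters here because the default postprocessing pipeline intentionally runs `detect-overlaps` twice.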
@@ -312,4 +325,12 @@ Matching behavior:
 
 - Only JSON input is supported.
 - Overlap resolution depends on WhisperX word timing; groups without usable word timing remain unresolved.
-- Coalescing and alternate output formats are not implemented yet.
+- Alternate output formats are not implemented yet.
+
+## Release Builds
+
+Local builds record version metadata as `dev`. Release builds should inject the release version with `ldflags`:
+
+```sh
+go build -ldflags "-X gitea.maximumdirect.net/eric/seriatim/internal/buildinfo.Version=v1.0.0" ./cmd/seriatim
+```
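With coalescing no longer listed as unimplemented, `--coalesce-gap` bounds how far apart two same-speaker segments may be and still be merged. A hypothetical sketch of that idea follows; `segment` and `coalesce` are illustrative names, the input is assumed sorted by start time, and only segments adjacent in that order are merged, which may differ from seriatim's actual rules.

```go
package main

import "fmt"

// segment is a hypothetical transcript segment; seriatim's real types may differ.
type segment struct {
	Speaker    string
	Start, End float64
	Text       string
}

// coalesce merges consecutive segments from the same speaker when the silence
// between them is at most maxGap seconds. Input is assumed sorted by Start.
func coalesce(segs []segment, maxGap float64) []segment {
	var out []segment
	for _, s := range segs {
		if n := len(out); n > 0 && out[n-1].Speaker == s.Speaker && s.Start-out[n-1].End <= maxGap {
			last := &out[n-1]
			last.Text += " " + s.Text
			if s.End > last.End {
				last.End = s.End
			}
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	segs := []segment{
		{"alice", 0.0, 2.0, "So the plan"},
		{"alice", 4.5, 6.0, "is to merge."},
		{"bob", 6.2, 7.0, "Right."},
		{"bob", 12.0, 13.0, "Later."},
	}
	// With the default 3.0 s gap, the two alice segments merge (2.5 s apart),
	// while the two bob segments stay separate (5.0 s apart).
	for _, s := range coalesce(segs, 3.0) {
		fmt.Printf("%s [%.1f-%.1f] %s\n", s.Speaker, s.Start, s.End, s.Text)
	}
}
```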
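The `ldflags` approach relies on the Go linker's `-X importpath.name=value` option, which only overwrites a package-level string variable that is declared uninitialized or initialized to a constant string expression. A minimal single-file sketch; for self-containment it places `Version` in package `main`, whereas the README's command targets `internal/buildinfo.Version`.

```go
package main

import "fmt"

// Version defaults to "dev" for local builds. It is a string variable (not a
// constant) so that `go build -ldflags "-X main.Version=..."` can overwrite it.
var Version = "dev"

func main() {
	fmt.Println("seriatim", Version)
}
```

Built with a plain `go build`, this prints `seriatim dev`; built with `go build -ldflags "-X main.Version=v1.0.0"`, it prints `seriatim v1.0.0`.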