Added support for a minimal JSON output schema
30
README.md
@@ -49,6 +49,7 @@ Global flags:
 | `--autocorrect` | No | none | Autocorrect rules YAML file. When omitted, the default `autocorrect` module leaves text unchanged. |
 | `--input-reader` | No | `json-files` | Input reader module. |
 | `--output-modules` | No | `json` | Comma-separated output modules. |
+| `--output-schema` | No | `seriatim` | JSON output contract. Allowed values are `seriatim` and `minimal`. |
 | `--preprocessing-modules` | No | `validate-raw,normalize-speakers,trim-text` | Comma-separated preprocessing modules, evaluated in order. |
 | `--postprocessing-modules` | No | `detect-overlaps,resolve-overlaps,backchannel,filler,coalesce,detect-overlaps,autocorrect,assign-ids,validate-output` | Comma-separated postprocessing modules, evaluated in order. |
 | `--coalesce-gap` | No | `3.0` | Maximum same-speaker gap in seconds for `coalesce`; also used as the `resolve-overlaps` context window. Must be a non-negative float. |
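Reviewer note: the table rows above constrain two flag values (`--output-schema` accepts only `seriatim` or `minimal`; `--coalesce-gap` must be a non-negative float). The sketch below shows one way such checks could look using Go's standard `flag` package. It is an illustration only: `Config` and `parseFlags` are hypothetical names, and nothing here is taken from seriatim's actual CLI code.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// Config holds the two flag values this sketch validates.
// Hypothetical type; seriatim's real configuration struct is not shown in the diff.
type Config struct {
	OutputSchema string
	CoalesceGap  float64
}

// parseFlags parses args and applies the constraints stated in the README table.
// Defaults mirror the documented ones: "seriatim" and 3.0.
func parseFlags(args []string) (Config, error) {
	fs := flag.NewFlagSet("seriatim", flag.ContinueOnError)
	var cfg Config
	fs.StringVar(&cfg.OutputSchema, "output-schema", "seriatim", "JSON output contract")
	fs.Float64Var(&cfg.CoalesceGap, "coalesce-gap", 3.0, "maximum same-speaker gap in seconds")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	switch cfg.OutputSchema {
	case "seriatim", "minimal":
		// Allowed values per the README.
	default:
		return Config{}, fmt.Errorf("invalid --output-schema %q: allowed values are seriatim and minimal", cfg.OutputSchema)
	}
	if cfg.CoalesceGap < 0 {
		return Config{}, errors.New("--coalesce-gap must be a non-negative float")
	}
	return cfg, nil
}

func main() {
	cfg, err := parseFlags([]string{"--output-schema", "minimal", "--coalesce-gap", "1.5"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Rejecting an unknown schema name at parse time, rather than in the writer, would keep the failure close to the user's input; whether the commit does this is not visible in the diff.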
@@ -156,7 +157,9 @@ The old `inputs:` direct mapping format is no longer supported.
 
 ## Output JSON Format
 
-The merged output uses the current seriatim envelope:
+`--output-modules json` controls the writer. `--output-schema` controls the JSON contract that writer serializes.
+
+The default `seriatim` schema uses the full seriatim envelope:
 
 ```json
 {
@@ -206,6 +209,29 @@ The merged output uses the current seriatim envelope:
 }
 ```
 
+The `minimal` schema emits minimal metadata and compact ordered segments:
+
+```json
+{
+  "metadata": {
+    "application": "seriatim",
+    "version": "dev",
+    "output_schema": "minimal"
+  },
+  "segments": [
+    {
+      "id": 1,
+      "start": 1.25,
+      "end": 3.5,
+      "speaker": "Eric Rakestraw",
+      "text": "Hello there."
+    }
+  ]
+}
+```
+
+Minimal output intentionally omits overlap groups, categories, source/provenance fields, and pipeline configuration metadata.
+
 Segments are sorted deterministically by:
 
 ```text
@@ -220,7 +246,7 @@ The public Go output contract is available from:
 import "gitea.maximumdirect.net/eric/seriatim/schema"
 ```
 
-The same package embeds the machine-readable JSON Schema in `schema/output.schema.json`. The default `validate-output` postprocessor validates the output shape and verifies final segment IDs are present, sequential, and start at `1`.
+The same package embeds machine-readable JSON Schemas in `schema/output.schema.json` and `schema/minimal-output.schema.json`. The default `validate-output` postprocessor validates the selected output shape and verifies final segment IDs are present, sequential, and start at `1`.
 
 ## Overlap Detection
 
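Reviewer note: a consumer of the `minimal` schema might decode it and re-check the ID rule that `validate-output` is described as enforcing (IDs present, sequential, starting at `1`). The sketch below assumes struct shapes inferred only from the README example in this diff; `minimalOutput`, `minimalSegment`, `checkSegmentIDs`, and `decodeMinimal` are hypothetical names, and the exported types in the `schema` package may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Struct shapes are inferred from the README example only (an assumption).
type minimalMetadata struct {
	Application  string `json:"application"`
	Version      string `json:"version"`
	OutputSchema string `json:"output_schema"`
}

type minimalSegment struct {
	ID      int     `json:"id"`
	Start   float64 `json:"start"`
	End     float64 `json:"end"`
	Speaker string  `json:"speaker"`
	Text    string  `json:"text"`
}

type minimalOutput struct {
	Metadata minimalMetadata  `json:"metadata"`
	Segments []minimalSegment `json:"segments"`
}

// checkSegmentIDs returns an error unless IDs run 1, 2, 3, ... in slice order.
// A missing "id" decodes to 0 and therefore fails the check as well.
func checkSegmentIDs(segs []minimalSegment) error {
	for i, s := range segs {
		if want := i + 1; s.ID != want {
			return fmt.Errorf("segment at index %d has id %d, want %d", i, s.ID, want)
		}
	}
	return nil
}

// decodeMinimal parses a minimal-schema document and checks its segment IDs.
func decodeMinimal(data []byte) (minimalOutput, error) {
	var out minimalOutput
	if err := json.Unmarshal(data, &out); err != nil {
		return minimalOutput{}, err
	}
	if out.Metadata.OutputSchema != "minimal" {
		return minimalOutput{}, fmt.Errorf("unexpected output_schema %q", out.Metadata.OutputSchema)
	}
	if err := checkSegmentIDs(out.Segments); err != nil {
		return minimalOutput{}, err
	}
	return out, nil
}

// sample is the example document from the README diff.
const sample = `{
  "metadata": {"application": "seriatim", "version": "dev", "output_schema": "minimal"},
  "segments": [
    {"id": 1, "start": 1.25, "end": 3.5, "speaker": "Eric Rakestraw", "text": "Hello there."}
  ]
}`

func main() {
	out, err := decodeMinimal([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(out.Segments), out.Segments[0].Speaker)
}
```

This check is hand-written for illustration and does not replace validation against the embedded `schema/minimal-output.schema.json`.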