Minor updates to overlap detection and segment coalescing logic
README.md (13 changes)
@@ -51,7 +51,7 @@ Global flags:
 | `--output-modules` | No | `json` | Comma-separated output modules. |
 | `--preprocessing-modules` | No | `validate-raw,normalize-speakers,trim-text` | Comma-separated preprocessing modules, evaluated in order. |
 | `--postprocessing-modules` | No | `detect-overlaps,resolve-overlaps,backchannel,filler,coalesce,detect-overlaps,autocorrect,assign-ids,validate-output` | Comma-separated postprocessing modules, evaluated in order. |
-| `--coalesce-gap` | No | `3.0` | Maximum same-speaker gap in seconds for `coalesce`. Must be a non-negative float. |
+| `--coalesce-gap` | No | `3.0` | Maximum same-speaker gap in seconds for `coalesce`; also used as the `resolve-overlaps` context window. Must be a non-negative float. |
 
 Environment variables:
 
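The "non-negative float" constraint on `--coalesce-gap` could be enforced at argument-parsing time. The following is a minimal sketch, assuming a Python `argparse` front end; the names `non_negative_float` and `build_parser` are hypothetical, and rejecting NaN and infinite values is an added assumption that the README does not state.

```python
import argparse
import math


def non_negative_float(value: str) -> float:
    """Parse a CLI value, rejecting negatives and (by assumption) NaN/inf."""
    try:
        parsed = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {value!r}")
    if not math.isfinite(parsed) or parsed < 0:
        raise argparse.ArgumentTypeError(
            f"must be a non-negative float: {value!r}"
        )
    return parsed


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the flag name and default come from the README.
    parser = argparse.ArgumentParser(prog="seriatim-sketch")
    parser.add_argument("--coalesce-gap", type=non_negative_float, default=3.0)
    return parser
```

Because the same value now also sizes the `resolve-overlaps` context window, validating it once at the CLI boundary keeps both modules from having to re-check it.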
@@ -59,6 +59,8 @@ Environment variables:
 | --- | --- | --- |
 | `SERIATIM_OVERLAP_WORD_RUN_GAP` | `0.75` | Maximum gap in seconds between adjacent timed words when `resolve-overlaps` builds word-run replacement segments. Must be a positive float. |
 | `SERIATIM_OVERLAP_WORD_RUN_REORDER_WINDOW` | `0.4` | Near-start window in seconds for ordering replacement word runs shortest-first. Must be a positive float. |
+| `SERIATIM_BACKCHANNEL_MAX_DURATION` | `2.0` | Maximum duration in seconds for `backchannel` classification. Must be a positive float. |
+| `SERIATIM_FILLER_MAX_DURATION` | `1.25` | Maximum duration in seconds for `filler` classification. Must be a positive float. |
 
 ## Input JSON Format
 
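All four environment variables share the same "positive float with a default" contract, so one helper could cover them. This is a rough sketch under that assumption; `positive_float_env` and `DEFAULTS` are hypothetical names, the defaults are copied from the table, and treating an unset variable as "use the default" is inferred rather than documented.

```python
import math
import os
from typing import Mapping, Optional

# Defaults as listed in the README table.
DEFAULTS = {
    "SERIATIM_OVERLAP_WORD_RUN_GAP": 0.75,
    "SERIATIM_OVERLAP_WORD_RUN_REORDER_WINDOW": 0.4,
    "SERIATIM_BACKCHANNEL_MAX_DURATION": 2.0,
    "SERIATIM_FILLER_MAX_DURATION": 1.25,
}


def positive_float_env(
    name: str, environ: Optional[Mapping[str, str]] = None
) -> float:
    """Read a positive float from the environment, else the table default."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return DEFAULTS[name]
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{name} must be a positive float, got {raw!r}") from None
    # Zero is rejected: the README says "positive", not "non-negative".
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"{name} must be a positive float, got {raw!r}")
    return value
```

Passing `environ` explicitly makes the helper easy to exercise without mutating the real process environment.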
@@ -241,7 +243,10 @@ The default postprocessing pipeline runs `detect-overlaps`, then `resolve-overla
 
 For each detected overlap group, `resolve-overlaps` uses preserved WhisperX word timing to build smaller word-run replacement segments:
 
-- Words are included when their interval intersects the overlap window: `word.end > group.start && word.start < group.end`.
+- The resolution window expands the detected overlap group by `--coalesce-gap` seconds on both sides.
+- Nearby same-speaker context segments are included when they intersect the expanded window and their start or end is within `--coalesce-gap` of the original overlap boundary.
+- Words are included when their interval intersects the expanded resolution window.
+- Context segments that are part of another detected overlap group are not pulled into the current group.
 - Untimed words are included in replacement text in original word order when nearby timed words create a replacement run.
 - Untimed words do not affect replacement segment start/end times or word-run gap splitting.
 - Words for the same speaker are merged into one run when the gap between adjacent words is no greater than `SERIATIM_OVERLAP_WORD_RUN_GAP`.
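The window expansion, word selection, and run splitting described in this hunk can be sketched as follows. This is an illustration only, not the project's code: the `Word` type and all function names are hypothetical, Python is an assumed language, and the strict intersection test is carried over from the removed line on the assumption that it still applies to the expanded window. Untimed-word placement and context-segment selection are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """Hypothetical word record; untimed words have start/end of None."""
    text: str
    start: Optional[float] = None
    end: Optional[float] = None


def resolution_window(
    group_start: float, group_end: float, coalesce_gap: float
) -> Tuple[float, float]:
    """Expand the overlap group by coalesce_gap seconds on both sides."""
    return group_start - coalesce_gap, group_end + coalesce_gap


def timed_words_in_window(
    words: List[Word], window_start: float, window_end: float
) -> List[Word]:
    """Keep timed words whose interval intersects the window."""
    return [
        w for w in words
        if w.start is not None and w.end is not None
        and w.end > window_start and w.start < window_end
    ]


def split_runs(words: List[Word], max_gap: float) -> List[List[Word]]:
    """Split one speaker's timed words into runs where the gap exceeds max_gap."""
    runs: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.start):
        if runs and word.start - runs[-1][-1].end <= max_gap:
            runs[-1].append(word)  # gap is no greater than max_gap: same run
        else:
            runs.append([word])
    return runs
```

With the defaults, an overlap group spanning 10.0 to 12.0 seconds would be resolved against a window of 7.0 to 15.0 seconds, and `split_runs` would be called with `max_gap=0.75`.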
@@ -266,7 +271,7 @@ The default pipeline runs `backchannel` before `coalesce`. It tags short acknowl
 "categories": ["backchannel"]
 ```
 
-Backchannel matching is case-insensitive, trims surrounding whitespace, and requires a matching acknowledgement phrase, no more than three whitespace-delimited words, and duration no greater than `1.0` second.
+Backchannel matching is case-insensitive, ignores punctuation for matching and word-count purposes, trims surrounding whitespace, and requires a matching acknowledgement phrase, no more than three whitespace-delimited words, and duration no greater than `SERIATIM_BACKCHANNEL_MAX_DURATION` seconds. The default maximum duration is `2.0` seconds.
 
 ## Fillers
 
@@ -276,7 +281,7 @@ The default pipeline runs `filler` after `backchannel` and before `coalesce`. It
 "categories": ["filler"]
 ```
 
-Filler matching is case-insensitive, trims surrounding whitespace, and requires only filler tokens such as `um`, `uh`, `er`, `erm`, `ah`, `eh`, `hmm`, `mm`, or repeated combinations of those tokens. Matching segments must contain no more than three whitespace-delimited words and have duration no greater than `1.0` second.
+Filler matching is case-insensitive, ignores punctuation for matching and word-count purposes, trims surrounding whitespace, and requires only filler tokens such as `um`, `uh`, `er`, `erm`, `ah`, `eh`, `hmm`, `mm`, or repeated combinations of those tokens. Matching segments must contain no more than three whitespace-delimited words and have duration no greater than `SERIATIM_FILLER_MAX_DURATION` seconds. The default maximum duration is `1.25` seconds.
 
 ## Coalescing
 
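The updated backchannel and filler rules from the last two hunks share a normalization step, so they can be sketched together. This is a hedged illustration in Python: `ACK_PHRASES` is a made-up phrase list (the README does not enumerate the acknowledgement phrases), the filler tokens are only those the README names, and deleting punctuation outright, rather than replacing it with spaces, is an assumption.

```python
import string
from typing import List

# Hypothetical phrase list, for illustration only.
ACK_PHRASES = {"yeah", "okay", "right", "uh huh", "i see"}
# Tokens named in the README; its "such as" wording suggests the real list may be longer.
FILLER_TOKENS = {"um", "uh", "er", "erm", "ah", "eh", "hmm", "mm"}
MAX_WORDS = 3
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> List[str]:
    """Drop ASCII punctuation, lowercase, and split on whitespace."""
    return text.translate(_STRIP_PUNCTUATION).lower().split()


def is_backchannel(text: str, duration: float, max_duration: float = 2.0) -> bool:
    """Short acknowledgement: known phrase, at most three words, short duration."""
    words = normalize(text)
    return (
        len(words) <= MAX_WORDS
        and " ".join(words) in ACK_PHRASES
        and duration <= max_duration
    )


def is_filler(text: str, duration: float, max_duration: float = 1.25) -> bool:
    """Filler: one to three tokens, all of them filler tokens, short duration."""
    words = normalize(text)
    return (
        0 < len(words) <= MAX_WORDS
        and all(word in FILLER_TOKENS for word in words)
        and duration <= max_duration
    )
```

The `max_duration` defaults mirror `SERIATIM_BACKCHANNEL_MAX_DURATION` and `SERIATIM_FILLER_MAX_DURATION`; in a real pipeline they would presumably be read from the environment instead of hard-coded.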