Minor updates to overlap detection and segment coalescing logic

2026-04-28 14:11:38 -05:00
parent 28c2eea340
commit a3ca6665a9
14 changed files with 662 additions and 95 deletions
--- a/README.md
+++ b/README.md
@@ -51,7 +51,7 @@ Global flags:
 | `--output-modules` | No | `json` | Comma-separated output modules. |
 | `--preprocessing-modules` | No | `validate-raw,normalize-speakers,trim-text` | Comma-separated preprocessing modules, evaluated in order. |
 | `--postprocessing-modules` | No | `detect-overlaps,resolve-overlaps,backchannel,filler,coalesce,detect-overlaps,autocorrect,assign-ids,validate-output` | Comma-separated postprocessing modules, evaluated in order. |
-| `--coalesce-gap` | No | `3.0` | Maximum same-speaker gap in seconds for `coalesce`. Must be a non-negative float. |
+| `--coalesce-gap` | No | `3.0` | Maximum same-speaker gap in seconds for `coalesce`; also used as the `resolve-overlaps` context window. Must be a non-negative float. |

 Environment variables:

@@ -59,6 +59,8 @@ Environment variables:
 | --- | --- | --- |
 | `SERIATIM_OVERLAP_WORD_RUN_GAP` | `0.75` | Maximum gap in seconds between adjacent timed words when `resolve-overlaps` builds word-run replacement segments. Must be a positive float. |
 | `SERIATIM_OVERLAP_WORD_RUN_REORDER_WINDOW` | `0.4` | Near-start window in seconds for ordering replacement word runs shortest-first. Must be a positive float. |
+| `SERIATIM_BACKCHANNEL_MAX_DURATION` | `2.0` | Maximum duration in seconds for `backchannel` classification. Must be a positive float. |
+| `SERIATIM_FILLER_MAX_DURATION` | `1.25` | Maximum duration in seconds for `filler` classification. Must be a positive float. |
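
A minimal sketch of how a "must be a positive float" environment variable could be read. This is illustrative only and is not taken from the project's source. The helper name `positive_float_env` is hypothetical, and both raising an error on invalid input and falling back to the default on an empty value are assumptions, since the README does not say how invalid or empty values are handled.

```python
import math
import os


def positive_float_env(name, default, environ=None):
    """Read a positive float from the environment (hypothetical helper)."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    # Assumption: an unset or empty variable falls back to the documented default.
    if raw is None or raw.strip() == "":
        return default
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{name} must be a positive float, got {raw!r}") from None
    # Reject zero, negatives, NaN, and infinities.
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"{name} must be a positive float, got {raw!r}")
    return value


backchannel_max = positive_float_env("SERIATIM_BACKCHANNEL_MAX_DURATION", 2.0)
filler_max = positive_float_env("SERIATIM_FILLER_MAX_DURATION", 1.25)
```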

 ## Input JSON Format

@@ -241,7 +243,10 @@ The default postprocessing pipeline runs `detect-overlaps`, then `resolve-overla

 For each detected overlap group, `resolve-overlaps` uses preserved WhisperX word timing to build smaller word-run replacement segments:

- Words are included when their interval intersects the overlap window: `word.end > group.start && word.start < group.end`.
+- The resolution window expands the detected overlap group by `--coalesce-gap` seconds on both sides.
+- Nearby same-speaker context segments are included when they intersect the expanded window and their start or end is within `--coalesce-gap` of the original overlap group's start or end boundary.
+- Words are included when their interval intersects the expanded resolution window.
+- Context segments that are part of another detected overlap group are not pulled into the current group.
 - Untimed words are included in replacement text in original word order when nearby timed words create a replacement run.
 - Untimed words do not affect replacement segment start/end times or word-run gap splitting.
 - Words for the same speaker are merged into one run when the gap between adjacent words is no greater than `SERIATIM_OVERLAP_WORD_RUN_GAP`.
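
The window expansion, word inclusion, and run-splitting rules above can be sketched as follows. This is a rough illustration under stated assumptions, not the project's implementation. The `Word` type and all function names are hypothetical. The sketch also assumes that the expanded window uses the same strict intersection test as the earlier rule (`word.end > start && word.start < end`), and it ignores untimed words and speaker grouping.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float
    end: float


def expand_window(group_start, group_end, coalesce_gap):
    """Expand the detected overlap group by coalesce_gap seconds on both sides."""
    return group_start - coalesce_gap, group_end + coalesce_gap


def words_in_window(words, window_start, window_end):
    """Keep timed words whose interval intersects the window (assumed strict test)."""
    return [w for w in words if w.end > window_start and w.start < window_end]


def split_runs(words, max_gap):
    """Group one speaker's timed words into runs; start a new run when the gap exceeds max_gap."""
    runs = []
    for w in sorted(words, key=lambda w: w.start):
        if runs and w.start - runs[-1][-1].end <= max_gap:
            runs[-1].append(w)
        else:
            runs.append([w])
    return runs
```

With the default `--coalesce-gap` of `3.0`, an overlap group spanning 10.0 to 12.0 seconds would give a resolution window of 7.0 to 15.0 seconds under this sketch.
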
@@ -266,7 +271,7 @@ The default pipeline runs `backchannel` before `coalesce`. It tags short acknowl
 "categories": ["backchannel"]
 ```

-Backchannel matching is case-insensitive, trims surrounding whitespace, and requires a matching acknowledgement phrase, no more than three whitespace-delimited words, and duration no greater than `1.0` second.
+Backchannel matching is case-insensitive, trims surrounding whitespace, and ignores punctuation both when matching and when counting words. A segment qualifies when it matches an acknowledgement phrase, contains no more than three whitespace-delimited words, and lasts no longer than `SERIATIM_BACKCHANNEL_MAX_DURATION` seconds (default `2.0`).

 ## Fillers

@@ -276,7 +281,7 @@ The default pipeline runs `filler` after `backchannel` and before `coalesce`. It
 "categories": ["filler"]
 ```

-Filler matching is case-insensitive, trims surrounding whitespace, and requires only filler tokens such as `um`, `uh`, `er`, `erm`, `ah`, `eh`, `hmm`, `mm`, or repeated combinations of those tokens. Matching segments must contain no more than three whitespace-delimited words and have duration no greater than `1.0` second.
+Filler matching is case-insensitive, trims surrounding whitespace, and ignores punctuation both when matching and when counting words. A segment qualifies only when it consists entirely of filler tokens such as `um`, `uh`, `er`, `erm`, `ah`, `eh`, `hmm`, `mm`, or repeated combinations of those tokens, contains no more than three whitespace-delimited words, and lasts no longer than `SERIATIM_FILLER_MAX_DURATION` seconds (default `1.25`).
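
The filler rule can be sketched as below. This is a hedged illustration rather than the project's code: `normalize_tokens` and `is_filler` are hypothetical names, only the tokens listed in the README are included, punctuation is assumed to be ASCII, and "repeated combinations" is assumed to mean whitespace-separated repeats such as `um um uh`.

```python
import string

# Tokens listed in the README; the real list may be longer.
FILLER_TOKENS = {"um", "uh", "er", "erm", "ah", "eh", "hmm", "mm"}

# Translation table that deletes ASCII punctuation (assumption: ASCII only).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_tokens(text):
    """Trim, lowercase, drop punctuation, then split on whitespace."""
    return text.strip().lower().translate(_PUNCT_TABLE).split()


def is_filler(text, start, end, max_duration=1.25, max_words=3):
    """Return True when a segment looks like a filler under the rules above."""
    tokens = normalize_tokens(text)
    if not tokens or len(tokens) > max_words:
        return False
    if end - start > max_duration:
        return False
    return all(token in FILLER_TOKENS for token in tokens)
```

Under these assumptions, `is_filler(" Um, uh... ", 0.0, 1.0)` is true, while a 1.5-second `um` is rejected by the default `1.25`-second limit. A `backchannel` check could follow the same shape with its own phrase list and `SERIATIM_BACKCHANNEL_MAX_DURATION`.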

 ## Coalescing