Implemented a module to detect backchannel segments, and updated the coalesce module to ignore them when coalescing same-speaker turns

2026-04-27 19:49:25 -05:00
parent aab6d12730
commit bbfb8aba44
10 changed files with 360 additions and 6 deletions
--- a/README.md
+++ b/README.md
@@ -44,7 +44,7 @@ Optional flags:
 - `--input-reader`: input reader module. Default: `json-files`.
 - `--output-modules`: comma-separated output modules. Default: `json`.
 - `--preprocessing-modules`: comma-separated preprocessing modules. Default: `validate-raw,normalize-speakers,trim-text`.
- `--postprocessing-modules`: comma-separated postprocessing modules. Default: `detect-overlaps,resolve-overlaps,coalesce,detect-overlaps,autocorrect,assign-ids,validate-output`.
+- `--postprocessing-modules`: comma-separated postprocessing modules. Default: `detect-overlaps,resolve-overlaps,backchannel,coalesce,detect-overlaps,autocorrect,assign-ids,validate-output`.
 - `--coalesce-gap`: maximum same-speaker gap in seconds for `coalesce`. Default: `3.0`.

 ## Input JSON Format
@@ -151,7 +151,7 @@ The merged output uses the current seriatim envelope:
    "input_reader": "json-files",
    "input_files": ["eric.json", "mike.json"],
    "preprocessing_modules": ["validate-raw", "normalize-speakers", "trim-text"],
-    "postprocessing_modules": ["detect-overlaps", "resolve-overlaps", "coalesce", "detect-overlaps", "autocorrect", "assign-ids", "validate-output"],
+    "postprocessing_modules": ["detect-overlaps", "resolve-overlaps", "backchannel", "coalesce", "detect-overlaps", "autocorrect", "assign-ids", "validate-output"],
    "output_modules": ["json"]
  },
  "segments": [
@@ -173,7 +173,8 @@ The merged output uses the current seriatim envelope:
      "speaker": "Eric Rakestraw",
      "start": 2.0,
      "end": 2.5,
-      "text": "Resolved word run"
+      "text": "Resolved word run",
+      "categories": ["backchannel"]
    }
  ],
  "overlap_groups": [
@@ -215,7 +216,7 @@ Overlap behavior:

 ## Overlap Resolution

-The default postprocessing pipeline runs `detect-overlaps`, then `resolve-overlaps`, then `coalesce`, then a second `detect-overlaps` pass.
+The default postprocessing pipeline runs `detect-overlaps`, then `resolve-overlaps`, then `backchannel`, then `coalesce`, then a second `detect-overlaps` pass.

 For each detected overlap group, `resolve-overlaps` uses preserved WhisperX word timing to build smaller word-run replacement segments:

@@ -236,12 +237,24 @@ For each detected overlap group, `resolve-overlaps` uses preserved WhisperX word
 - If a speaker has no usable word timing in a group, that speaker's original segment is kept.
 - If no speakers in a group have usable word timing, the original group and annotations remain unchanged.

+## Backchannels
+
+The default pipeline runs `backchannel` before `coalesce`. It tags short acknowledgement segments with:
+
+```json
+"categories": ["backchannel"]
+```
+
+Backchannel matching is case-insensitive, trims surrounding whitespace, and requires a matching acknowledgement phrase, no more than three whitespace-delimited words, and duration no greater than `1.0` second.
+
 ## Coalescing

 The default pipeline runs `coalesce` before the second overlap detection pass. It merges adjacent same-speaker segments in the transcript's current order when `next.start - current.end <= --coalesce-gap`.

 Coalesced segments use `source_ref` values such as `coalesce:1`, include `derived_from`, and omit `source_segment_index`.

+Different-speaker backchannel segments do not block coalescing of surrounding same-speaker segments. When same-speaker segments are coalesced, any `backchannel` category from the merged inputs is dropped from the coalesced segment.
+
 ## Autocorrect

 Autocorrect is included in the default postprocessing pipeline. If `--autocorrect` is omitted, the module leaves transcript text unchanged and records a skip event in the optional report.