Implemented an overlap detection module in the postprocessing chain
32
README.md
@@ -148,10 +148,21 @@ The merged output uses the current seriatim envelope:
       "speaker": "Eric Rakestraw",
       "start": 1.25,
       "end": 3.5,
-      "text": "Hello there."
+      "text": "Hello there.",
+      "overlap_group_id": 1
     }
   ],
-  "overlap_groups": []
+  "overlap_groups": [
+    {
+      "id": 1,
+      "start": 1.25,
+      "end": 4.0,
+      "segments": ["eric.json#0", "mike.json#0"],
+      "speakers": ["Eric Rakestraw", "Mike Brown"],
+      "class": "unknown",
+      "resolution": "unresolved"
+    }
+  ]
 }
 ```
 
@@ -163,6 +174,21 @@ Segments are sorted deterministically by:
 
 Final segment IDs are assigned after sorting and start at `1`.
 
+## Overlap Detection
+
+The default postprocessing pipeline detects overlapping segment groups.
+
+Overlap behavior:
+
+- A strict timing overlap is required: `next.start < current_group_end`.
+- Segments that only touch at a boundary are not grouped.
+- Groups require at least two distinct speakers.
+- Transitive overlaps are grouped together.
+- Segments in detected groups receive `overlap_group_id`.
+- `overlap_groups[].segments` contains stable references in `source#source_segment_index` format.
+- `class` is currently `unknown`.
+- `resolution` is currently `unresolved`; overlap resolution is still a no-op.
+
 ## Autocorrect
 
 Autocorrect is included in the default postprocessing pipeline. If `--autocorrect` is omitted, the module leaves transcript text unchanged and records a skip event in the optional report.
@@ -202,5 +228,5 @@ Matching behavior:
 
 - Only JSON input is supported.
 - Word-level timing data is not preserved yet.
-- Overlap detection and overlap resolution are currently no-op modules.
+- Overlap resolution is currently a no-op module.
 - Coalescing and alternate output formats are not implemented yet.
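
The overlap rules this commit adds to the README can be read as a single sweep over segments sorted by start time. The sketch below is one possible reading of those rules, not the module's actual code: the `Segment` dataclass, the `detect_overlaps` function, the sort key, and numbering group IDs from 1 are all assumptions made for illustration. It also assumes that a same-speaker segment can extend a cluster, and that a cluster is dropped if it ends up with only one distinct speaker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    # Hypothetical shape; field names mirror the README's JSON keys where they exist.
    source: str        # input file name, e.g. "eric.json"
    source_index: int  # position of the segment inside its source file
    speaker: str
    start: float
    end: float
    text: str
    overlap_group_id: Optional[int] = None


def detect_overlaps(segments: list[Segment]) -> list[dict]:
    """Set overlap_group_id on grouped segments and return the overlap_groups list."""
    groups: list[dict] = []
    cluster: list[Segment] = []
    cluster_end = 0.0

    def close_cluster() -> None:
        # Keep first-appearance order while dropping duplicate speakers.
        speakers = list(dict.fromkeys(s.speaker for s in cluster))
        if len(speakers) < 2:
            return  # groups require at least two distinct speakers
        group_id = len(groups) + 1  # assumption: IDs start at 1, as in the README example
        for s in cluster:
            s.overlap_group_id = group_id
        groups.append({
            "id": group_id,
            "start": cluster[0].start,  # cluster is in start order
            "end": cluster_end,
            "segments": [f"{s.source}#{s.source_index}" for s in cluster],
            "speakers": speakers,
            "class": "unknown",
            "resolution": "unresolved",
        })

    # Assumed tie-break order; the README's real sort key is not shown in this diff.
    ordered = sorted(segments, key=lambda s: (s.start, s.end, s.source, s.source_index))
    for seg in ordered:
        if cluster and seg.start < cluster_end:
            # Strict overlap with the running group end: this is what makes
            # transitive overlaps land in the same group.
            cluster.append(seg)
            cluster_end = max(cluster_end, seg.end)
        else:
            # A segment that merely touches the boundary starts a new cluster.
            close_cluster()
            cluster = [seg]
            cluster_end = seg.end
    close_cluster()
    return groups
```

Given a segment from `eric.json` spanning 1.25 to 3.5 and a hypothetical segment from `mike.json` spanning 3.0 to 4.0, this sketch yields one group spanning 1.25 to 4.0 with references `eric.json#0` and `mike.json#0`, the same shape as the README example. A third segment starting exactly at 4.0 is left ungrouped because the comparison is strict.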