Added initial segment overlap resolution logic
README.md (56 lines changed)
```diff
@@ -2,7 +2,7 @@
 
 `seriatim` merges per-speaker WhisperX-style JSON transcripts into a single JSON transcript that preserves speaker identity and chronological order.
 
-The current implementation supports the `merge` command. It reads one or more input JSON files, optionally maps each input file to a canonical speaker using `speakers.yml`, sorts all segments by timestamp, assigns consecutive numeric `id` values, and writes a merged JSON artifact.
+The current implementation supports the `merge` command. It reads one or more input JSON files, optionally maps each input file to a canonical speaker using `speakers.yml`, sorts all segments by timestamp, detects and resolves overlaps when word-level timing is available, assigns consecutive numeric `id` values, and writes a merged JSON artifact.
 
 ## Usage
 
```
```diff
@@ -56,7 +56,11 @@ Each input file must be valid JSON with a top-level `segments` array. The curren
     {
       "start": 1.25,
       "end": 3.5,
-      "text": "Hello there."
+      "text": "Hello there.",
+      "words": [
+        {"word": "Hello", "start": 1.25, "end": 1.55, "score": 0.98},
+        {"word": "there.", "start": 1.7, "end": 2.0}
+      ]
     }
   ]
 }
```
```diff
@@ -68,7 +72,16 @@ Required segment fields:
 - `end`: number, must be `>= start`.
 - `text`: string.
 
-Other WhisperX fields, including `words` and raw diarization speaker labels, are ignored for now.
+Optional word fields:
+
+- `words`: array of word timing objects.
+- `words[].word`: string.
+- `words[].start`: optional number, must be `>= 0` when present.
+- `words[].end`: optional number, must be `>= start` when present with `start`.
+- `words[].score`: optional number.
+- `words[].speaker`: optional raw speaker label string.
+
+Word-level timing is preserved internally for overlap resolution. If a word is missing `start` or `end`, seriatim keeps the word text, emits a warning in the optional report, and does not use that word as a timing anchor. Word timing is not emitted in the final JSON artifact.
 
 ## Speaker Map Format
 
```
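The optional word-field rules added in this hunk can be illustrated with a minimal Python sketch. It is not seriatim's implementation. The name `validate_words` and the warning text are hypothetical. The sketch follows the new README paragraph: a word missing `start` or `end` is kept and recorded as a warning instead of being rejected.

```python
from typing import Any


def validate_words(segment: dict[str, Any], warnings: list[str]) -> list[dict[str, Any]]:
    """Check a segment's optional `words` array against the documented rules.

    Hypothetical helper: raises ValueError on invalid timing and records a
    warning (rather than failing) for words missing `start` or `end`.
    """
    words = segment.get("words", [])
    if not isinstance(words, list):
        raise ValueError("`words` must be an array")
    for i, word in enumerate(words):
        if not isinstance(word, dict) or not isinstance(word.get("word"), str):
            raise ValueError(f"words[{i}].word must be a string")
        start, end = word.get("start"), word.get("end")
        if start is not None and start < 0:
            raise ValueError(f"words[{i}].start must be >= 0")
        if start is not None and end is not None and end < start:
            raise ValueError(f"words[{i}].end must be >= start")
        if start is None or end is None:
            # Kept for its text, but not usable as a timing anchor.
            warnings.append(f"words[{i}] ({word['word']!r}) is missing start or end")
    return words
```

For the `Hello there.` segment from the input example, both words are fully timed, so no warning is recorded. A word such as `{"word": "um"}` would be kept and would add one warning.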
```diff
@@ -150,6 +163,16 @@ The merged output uses the current seriatim envelope:
       "end": 3.5,
       "text": "Hello there.",
       "overlap_group_id": 1
+    },
+    {
+      "id": 2,
+      "source": "eric.json",
+      "source_ref": "word-run:1:1:1",
+      "derived_from": ["eric.json#0"],
+      "speaker": "Eric Rakestraw",
+      "start": 2.0,
+      "end": 2.5,
+      "text": "Resolved word run"
     }
   ],
   "overlap_groups": [
```
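The `derived_from` entry in the new output example (`"eric.json#0"`) appears to use the same `source#source_segment_index` form that the README documents for `overlap_groups[].segments`. The Python helpers below are a hypothetical sketch of how such references might be built and read. They are not part of seriatim. They also do not interpret `source_ref` values such as `"word-run:1:1:1"`, because the README does not define that field layout.

```python
def format_segment_ref(source: str, index: int) -> str:
    """Build a `source#source_segment_index` reference, e.g. "eric.json#0"."""
    return f"{source}#{index}"


def parse_segment_ref(ref: str) -> tuple[str, int]:
    """Split a reference on its last '#' so file names containing '#' still parse."""
    source, sep, index = ref.rpartition("#")
    if not sep or not source or not index.isdigit():
        raise ValueError(f"not a source#index reference: {ref!r}")
    return source, int(index)


def originals_for(segment: dict, originals: dict[tuple[str, int], dict]) -> list[dict]:
    """Look up the original segments a replacement segment was derived from.

    `originals` is a hypothetical index keyed by (source, source_segment_index).
    """
    return [originals[parse_segment_ref(ref)] for ref in segment.get("derived_from", [])]
```

This sketch splits on the last `#`. That keeps the index unambiguous even when a source file name contains `#`.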
````diff
@@ -169,7 +192,7 @@ The merged output uses the current seriatim envelope:
 Segments are sorted deterministically by:
 
 ```text
-(start, end, source, source_segment_index, speaker)
+(start, end, source, source_segment_index/source_ref, speaker)
 ```
 
 Final segment IDs are assigned after sorting and start at `1`.
````
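The updated sort key can be sketched in Python. The README does not say how an integer `source_segment_index` compares with a string `source_ref` when `start`, `end`, and `source` tie. This sketch assumes that original segments sort before derived ones. `sort_key` and `assign_ids` are hypothetical names, not seriatim's API.

```python
from typing import Any


def sort_key(seg: dict[str, Any]) -> tuple:
    """Deterministic ordering: (start, end, source, index-or-ref, speaker).

    Assumption: original segments (with source_segment_index) sort before
    derived segments (with source_ref) when start, end, and source tie; the
    tag in the fourth element keeps ints and strings from being compared.
    """
    if "source_segment_index" in seg:
        tie = (0, seg["source_segment_index"], "")
    else:
        tie = (1, 0, seg.get("source_ref", ""))
    return (seg["start"], seg["end"], seg["source"], tie, seg.get("speaker", ""))


def assign_ids(segments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sort, then number segments consecutively starting at 1."""
    ordered = sorted(segments, key=sort_key)
    for new_id, seg in enumerate(ordered, start=1):
        seg["id"] = new_id
    return ordered
```

Tagging the fourth key element keeps the comparison well defined in Python, which would otherwise raise `TypeError` when comparing an `int` with a `str`.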
```diff
@@ -187,7 +210,27 @@ Overlap behavior:
 - Segments in detected groups receive `overlap_group_id`.
 - `overlap_groups[].segments` contains stable references in `source#source_segment_index` format.
 - `class` is currently `unknown`.
-- `resolution` is currently `unresolved`; overlap resolution is still a no-op.
+- `resolution` is `unresolved` until `resolve-overlaps` replaces the group.
+
+## Overlap Resolution
+
+The default postprocessing pipeline runs `resolve-overlaps` after `detect-overlaps`.
+
+For each detected overlap group, `resolve-overlaps` uses preserved WhisperX word timing to build smaller word-run replacement segments:
+
+- Words are included when their interval intersects the overlap window: `word.end > group.start && word.start < group.end`.
+- Untimed words are included in replacement text in original word order when nearby timed words create a replacement run.
+- Untimed words do not affect replacement segment start/end times or word-run gap splitting.
+- Words for the same speaker are merged into one run when the gap between adjacent words is no greater than `SERIATIM_OVERLAP_WORD_RUN_GAP`.
+- The default word-run gap is `0.75` seconds.
+- Set `SERIATIM_OVERLAP_WORD_RUN_GAP` to a positive number of seconds to override the default.
+- Replacement segment text is built by joining word text with single spaces.
+- Replacement segments include `source_ref` and `derived_from`.
+- Replacement segments omit `source_segment_index` because they are derived from one or more original segments.
+- Resolved overlap groups are removed from `overlap_groups`.
+- Replacement segments are left without `overlap_group_id`; future passes can detect any remaining overlap.
+- If a speaker has no usable word timing in a group, that speaker's original segment is kept.
+- If no speakers in a group have usable word timing, the original group and annotations remain unchanged.
 
 ## Autocorrect
 
```
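The word-run rules in the new Overlap Resolution section can be sketched in Python for a single speaker within one group. This is a simplified illustration, not seriatim's code. The `Word` type and the helper names are hypothetical. Two behaviors are also assumptions: an untimed word joins whichever run is open when it appears, and an invalid `SERIATIM_OVERLAP_WORD_RUN_GAP` value falls back to the `0.75`-second default. Beyond that, the sketch applies the documented rules: the window-intersection test, splitting only when a gap between timed words exceeds the threshold, run start and end taken from timed words only, and joining word text with single spaces.

```python
import math
import os
from dataclasses import dataclass
from typing import Optional

DEFAULT_WORD_RUN_GAP = 0.75  # seconds, the documented default


@dataclass
class Word:
    text: str
    start: Optional[float] = None
    end: Optional[float] = None

    @property
    def timed(self) -> bool:
        return self.start is not None and self.end is not None


def word_run_gap() -> float:
    """Read SERIATIM_OVERLAP_WORD_RUN_GAP; assumption: invalid values fall back."""
    raw = os.environ.get("SERIATIM_OVERLAP_WORD_RUN_GAP")
    try:
        value = float(raw) if raw is not None else DEFAULT_WORD_RUN_GAP
    except ValueError:
        return DEFAULT_WORD_RUN_GAP
    return value if math.isfinite(value) and value > 0 else DEFAULT_WORD_RUN_GAP


def word_runs(words: list[Word], win_start: float, win_end: float,
              gap: float) -> list[tuple[float, float, str]]:
    """Build (start, end, text) runs for one speaker inside an overlap window."""
    runs: list[tuple[float, float, str]] = []
    current: list[Word] = []  # words in the open run, timed and untimed
    last_end: Optional[float] = None  # end of the last timed word in the run

    def flush() -> None:
        timed = [w for w in current if w.timed]
        if timed:  # untimed words never set the run's start/end
            runs.append((timed[0].start, timed[-1].end,
                         " ".join(w.text for w in current)))
        current.clear()

    for w in words:
        if not w.timed:
            if current:  # assumption: untimed words join the open run
                current.append(w)
            continue
        in_window = w.end > win_start and w.start < win_end
        if not in_window:
            flush()
            last_end = None
            continue
        if current and last_end is not None and w.start - last_end > gap:
            flush()  # gap too large: close this run and start a new one
        current.append(w)
        last_end = w.end
    flush()
    return runs
```

As a worked case, take an overlap window of 1.0 to 4.0 seconds and the default gap. The timed words are `Hello` (1.25 to 1.55), `there.` (1.7 to 2.0), and `later` (3.2 to 3.4), with an untimed `uh` between the first two. These produce two runs: `Hello uh there.` from 1.25 to 2.0, and `later` from 3.2 to 3.4. The split happens because the 1.2-second gap before `later` exceeds 0.75.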
```diff
@@ -227,6 +270,5 @@ Matching behavior:
 ## Current Limitations
 
 - Only JSON input is supported.
-- Word-level timing data is not preserved yet.
-- Overlap resolution is currently a no-op module.
+- Overlap resolution depends on WhisperX word timing; groups without usable word timing remain unresolved.
 - Coalescing and alternate output formats are not implemented yet.
```
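The new limitation, together with the fallback rules for speakers without word timing, can be sketched as a planning step that runs before any word runs are built. This is a hypothetical Python illustration, not seriatim's code. It assumes that "usable word timing" means at least one word with both `start` and `end` that intersects the overlap window, which the README does not define precisely.

```python
from collections import defaultdict
from typing import Any


def has_usable_timing(seg: dict[str, Any], win_start: float, win_end: float) -> bool:
    """Assumed definition: some word has start and end and intersects the window."""
    for w in seg.get("words", []):
        start, end = w.get("start"), w.get("end")
        if start is not None and end is not None and end > win_start and start < win_end:
            return True
    return False


def plan_group(segments: list[dict[str, Any]], win_start: float,
               win_end: float) -> dict[str, Any]:
    """Decide which speakers in one overlap group can be rebuilt from words."""
    by_speaker: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for seg in segments:
        by_speaker[seg["speaker"]].append(seg)

    rebuild: dict[str, list[dict[str, Any]]] = {}
    keep: list[dict[str, Any]] = []  # original segments left untouched
    for speaker, segs in by_speaker.items():
        if any(has_usable_timing(s, win_start, win_end) for s in segs):
            rebuild[speaker] = segs
        else:
            keep.extend(segs)
    # No usable timing anywhere: the group and its annotations stay as they are.
    return {"resolved": bool(rebuild), "rebuild": rebuild, "keep": keep}
```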