feat(feedkit): add optional normalization hook and document external API
Introduce an optional normalization stage for feedkit pipelines via the new normalize package. This adds:

- normalize.Normalizer interface with flexible Match() semantics
- normalize.Registry for ordered normalizer selection (first match wins)
- normalize.Processor adapter implementing pipeline.Processor
- Pass-through behavior when no normalizer matches (normalization is optional)
- normalize.Func helper for ergonomic normalizer definitions

Update root doc.go to fully document the normalization model, its role in the pipeline, recommended conventions (Schema-based matching, raw vs normalized events), and concrete wiring examples. The documentation now serves as a complete external-facing API specification for downstream daemons such as weatherfeeder.

This change preserves feedkit’s non-framework philosophy while enabling a clean separation between data collection and domain normalization.
doc.go: 160 lines changed
@@ -16,8 +16,7 @@
 // In feedkit today, that maps to:
 //
 // Collect: sources.Source + scheduler.Scheduler
-// Normalize: (today: domain code typically does this inside Source.Poll;
-// future: a normalization Processor is a good fit)
+// Normalize: (optional) normalize.Processor (or domain code inside Source.Poll)
 // Policy: pipeline.Pipeline (Processor chain; dedupe/ratelimit are planned)
 // Emit: dispatch.Dispatcher + dispatch.Fanout
 // Sinks: sinks.Sink (+ sinks.Registry to build from config)
@@ -76,6 +75,147 @@
 //
 // - dedupe/ratelimit processors are placeholders (planned).
 //
+// - normalize
+// Optional normalization hook for splitting "fetch" from "transform".
+//
+// Many domains (like weather) ingest multiple upstream providers whose payloads
+// differ. A common evolution is to keep sources small and focused on polling,
+// and move mapping/normalization into a dedicated stage.
+//
+// feedkit provides this as an OPTIONAL pipeline processor:
+//
+// - normalize.Normalizer: domain-implemented mapping logic
+//
+// - normalize.Registry: holds normalizers and selects one by Match()
+//
+// - normalize.Processor: adapts Registry into a pipeline.Processor
+//
+// Normalization is NOT required:
+//
+// - If you do all normalization inside Source.Poll, you can ignore this package.
+//
+// - If normalize.Processor is not installed in your pipeline, nothing changes.
+//
+// - If normalize.Processor is installed but no Normalizer matches an event,
+// the event passes through unchanged.
+//
+// The key types:
+//
+// type Normalizer interface {
+// // Match returns true if this normalizer should handle the event.
+// // Matching is intentionally flexible: match on Schema, Kind, Source,
+// // or any combination.
+// Match(e event.Event) bool
+//
+// // Normalize converts the incoming event into a new (or modified) event.
+// //
+// // Return values:
+// // - (out, nil) where out != nil: emit the normalized event
+// // - (nil, nil): drop the event (policy drop)
+// // - (nil, err): fail the pipeline
+// Normalize(ctx context.Context, in event.Event) (*event.Event, error)
+// }
+//
+// type Registry struct { ... }
+//
+// func (r *Registry) Register(n Normalizer)
+//
+// // Normalize finds the first matching normalizer (in registration order) and applies it.
+// // If none match, it returns the input event unchanged.
+// func (r *Registry) Normalize(ctx context.Context, in event.Event) (*event.Event, error)
+//
+// // Processor implements pipeline.Processor and calls into the Registry.
+// // Optional behavior:
+// // - If Registry is nil, Processor is a no-op pass-through.
+// // - If RequireMatch is false (default), non-matching events pass through.
+// // - If RequireMatch is true, non-matching events are treated as errors.
+// type Processor struct {
+// Registry *Registry
+// RequireMatch bool
+// }
+//
+// "First match wins":
+// Registry applies the first Normalizer whose Match() returns true.
+// This is intentional: normalization is usually a single mapping step from a
+// raw schema into a canonical schema. If you want multiple sequential transforms,
+// model them as multiple pipeline processors.
+//
+// Recommended convention: match by Event.Schema
+// ------------------------------------------------
+// Schema gives you a versionable selector that doesn't depend on source names.
+//
+// A common pattern is:
+//
+// - sources emit "raw" events with Schema like:
+// "raw.openweather.current.v1"
+// "raw.openmeteo.current.v1"
+// "raw.nws.observation.v1"
+//
+// - normalizers transform them into canonical domain schemas like:
+// "weather.observation.v1"
+// "weather.forecast.v1"
+// "weather.alert.v1"
+//
+// What is a "raw event"?
+// ------------------------------------------------
+// feedkit does not prescribe the raw payload representation.
+// A raw payload is typically one of:
+//
+// - json.RawMessage (recommended for JSON APIs)
+//
+// - []byte (raw bytes)
+//
+// - map[string]any (already-decoded but untyped JSON)
+//
+// The only hard requirement enforced by feedkit is Event.Validate():
+//
+// - ID, Kind, Source, EmittedAt must be set
+//
+// - Payload must be non-nil
+//
+// If you use raw events, you still must provide Event.Kind.
+// Typical approaches:
+//
+// - set Kind to the intended canonical kind (e.g. "observation") even before normalization
+//
+// - or set Kind to a domain-defined "raw_*" kind and normalize it later
+//
+// The simplest approach is: set Kind to the final kind early, and use Schema
+// to describe the raw-vs-normalized payload shape.
+//
+// Wiring example (daemon main.go)
+// ------------------------------------------------
+// Install normalize.Processor at the front of your pipeline:
+//
+// normReg := &normalize.Registry{}
+//
+// normReg.Register(normalize.Func{
+// Name: "openweather current -> weather.observation.v1",
+// MatchFn: func(e event.Event) bool {
+// return e.Schema == "raw.openweather.current.v1"
+// },
+// NormalizeFn: func(ctx context.Context, in event.Event) (*event.Event, error) {
+// // 1) interpret in.Payload (json.RawMessage / []byte / map)
+// // 2) build canonical domain payload
+// // 3) return updated event
+//
+// out := in
+// out.Schema = "weather.observation.v1"
+// // Optionally adjust Kind, EffectiveAt, etc.
+// out.Payload = /* canonical weather observation struct */
+// return &out, nil
+// },
+// })
+//
+// p := &pipeline.Pipeline{
+// Processors: []pipeline.Processor{
+// normalize.Processor{Registry: normReg}, // optional stage
+// // dedupe.New(...), ratelimit.New(...), ...
+// },
+// }
+//
+// If the event does not match any normalizer, it passes through unmodified.
+//
 // - sinks
 // Extension point for output adapters.
 //
@@ -141,13 +281,24 @@
 // // Event bus.
 // bus := make(chan event.Event, 256)
 //
+// // Optional normalization registry + pipeline.
+// normReg := &normalize.Registry{}
+// // domain registers normalizers into normReg...
+//
+// p := &pipeline.Pipeline{
+// Processors: []pipeline.Processor{
+// normalize.Processor{Registry: normReg}, // optional
+// // dedupe/ratelimit/etc...
+// },
+// }
+//
 // // Scheduler.
 // s := &scheduler.Scheduler{Jobs: jobs, Out: bus, Logf: logf}
 //
 // // Dispatcher.
 // d := &dispatch.Dispatcher{
 // In: bus,
-// Pipeline: &pipeline.Pipeline{Processors: nil},
+// Pipeline: p,
 // Sinks: builtSinks,
 // Routes: routes,
 // }
@@ -167,13 +318,12 @@
 // All blocking or I/O work should honor ctx.Done():
 // - sources.Source.Poll should pass ctx to HTTP calls, etc.
 // - sinks.Sink.Consume should honor ctx (Fanout timeouts only help if sinks cooperate).
+// - normalizers should honor ctx if they do expensive work (rare; usually pure transforms).
 //
 // Future additions (likely)
 //
 // - A small Runner helper that performs the standard wiring (load config,
 // build sources/sinks/routes, run scheduler+dispatcher, handle shutdown).
-// - A normalization hook (a Pipeline Processor + registry) that allows sources
-// to emit "raw" payloads and defer normalization to a dedicated stage.
 //
 // # Non-goals
 //