Cleaned up documentation and removed stubs and TODOs throughout the application

2026-03-28 13:02:37 -05:00
parent 3ef93faf69
commit 3281368922
18 changed files with 403 additions and 345 deletions

177
README.md

@@ -1,127 +1,92 @@
# feedkit
`feedkit` provides domain-agnostic plumbing for feed-processing daemons.
`feedkit` is a small Go toolkit for building feed-processing daemons.
A daemon built on feedkit typically:
- ingests upstream input (polling APIs or consuming streams)
It gives you the reusable plumbing around collection, processing, routing, and
emission, while leaving domain concepts, schemas, and application wiring in
your daemon. The intended shape is a family of sibling applications such as
`weatherfeeder`, `newsfeeder`, or `earthquakefeeder` that all share the same
infrastructure patterns without sharing domain logic.
## What It Does
A daemon built on `feedkit` typically:
- ingests upstream input by polling HTTP APIs or consuming streams
- emits domain-agnostic `event.Event` values
- applies optional processing (normalization, dedupe, policy)
- routes events to sinks (stdout, NATS, files, databases, etc.)
- optionally processes those events with stages like dedupe or normalization
- routes events to one or more sinks such as stdout, NATS, or Postgres
Conceptually, the pipeline is:
`Collect -> Process -> Route -> Emit`
## Philosophy
feedkit is not a framework. It provides small composable packages and leaves
lifecycle, domain schemas, and domain-specific validation in your daemon.
`feedkit` is intentionally not a framework.
## Conceptual pipeline
It does not try to own:
- your domain payload schemas
- your domain event kinds
- your daemon lifecycle or `main.go`
- your observability stack or deployment model
Collect -> Process (optional stages, including dedupe + normalize) -> Route -> Emit
Instead, it provides small composable packages that are easy to wire together in
different daemons.
| Stage | Package(s) |
|---|---|
| Collect | `sources`, `scheduler` |
| Process | `pipeline`, `processors`, `processors/dedupe`, `processors/normalize` (optional stages) |
| Route | `dispatch` |
| Emit | `sinks` |
| Configure | `config` |
## When To Use It
## Core packages
`feedkit` is a good fit when you want:
- multiple small ingestion daemons with shared infrastructure patterns
- clear separation between raw upstream payloads and normalized canonical models
- reusable routing and sink behavior across domains
- strong config and event-envelope conventions without centralizing domain rules
### `config`
It is a poor fit if you want a monolithic framework that dictates application
structure end-to-end.
Loads YAML config with strict decoding and domain-agnostic validation.
## Built-In Capabilities
`SourceConfig` supports both source modes:
- `mode: poll` requires `every`
- `mode: stream` forbids `every`
- omitted `mode` means auto (inferred from the registered driver type)
`feedkit` currently includes:
- strict YAML config loading and validation
- polling and streaming source abstractions
- scheduler orchestration for configured sources
- optional pipeline processors
- built-in dedupe and normalization processors
- route compilation and sink fanout
- built-in sinks for `stdout`, `nats`, and `postgres`
It also supports optional expected source kinds:
- `kinds: ["observation", "alert"]` (preferred)
- `kind: "observation"` (legacy fallback)
The Postgres sink is intentionally split between feedkit-owned infrastructure
and daemon-owned schema mapping. `feedkit` manages connection setup, DDL,
writes, and pruning; downstream applications define the schema and event mapper.
### `event`
## Typical Wiring
Defines the domain-agnostic event envelope (`event.Event`) used across the system.
### `sources`
Defines source interfaces and driver registry:
```go
type Input interface {
	Name() string
}

type PollSource interface {
	Input
	Poll(ctx context.Context) ([]event.Event, error)
}

type StreamSource interface {
	Input
	Run(ctx context.Context, out chan<- event.Event) error
}
```
Notes:
- a poll can emit `0..N` events
- stream sources emit events continuously
- a single source may emit multiple event kinds
- driver implementations live in downstream daemons and are registered via `sources.Registry`
### `scheduler`
Runs one goroutine per source job:
- poll sources: cadence driven (`every` + jitter)
- stream sources: continuous run loop
### `pipeline`
Optional processing chain between collection and dispatch.
Processors can transform, drop, or reject events.
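The three outcomes (transform, drop, reject) can be sketched with a function-typed processor. The signature below is hypothetical and chosen only for illustration; the real processor interface is defined in the `processors` package and is not reproduced here. `Event` is a local stand-in with assumed fields.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Event is a local stand-in for event.Event; the fields are assumptions.
type Event struct {
	ID   string
	Kind string
}

// Processor is a hypothetical shape: return the (possibly changed) event,
// keep=false to drop it silently, or a non-nil error to reject it.
type Processor func(Event) (out Event, keep bool, err error)

// runChain applies processors in order and stops at the first drop or reject.
func runChain(e Event, chain ...Processor) (Event, bool, error) {
	for _, p := range chain {
		var keep bool
		var err error
		e, keep, err = p(e)
		if err != nil {
			return Event{}, false, err
		}
		if !keep {
			return Event{}, false, nil
		}
	}
	return e, true, nil
}

// lowerKind transforms the event by normalizing its kind to lowercase.
func lowerKind(e Event) (Event, bool, error) {
	e.Kind = strings.ToLower(e.Kind)
	return e, true, nil
}

// dropHeartbeat silently drops events of kind "heartbeat".
func dropHeartbeat(e Event) (Event, bool, error) {
	return e, e.Kind != "heartbeat", nil
}

// requireID rejects events that have no ID.
func requireID(e Event) (Event, bool, error) {
	if e.ID == "" {
		return e, false, errors.New("event has empty ID")
	}
	return e, true, nil
}

func main() {
	for _, e := range []Event{{ID: "1", Kind: "ALERT"}, {ID: "2", Kind: "Heartbeat"}, {Kind: "alert"}} {
		out, keep, err := runChain(e, lowerKind, dropHeartbeat, requireID)
		fmt.Printf("%+v keep=%v err=%v\n", out, keep, err)
	}
}
```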
### `processors`
Defines the generic processor interface and a named-driver registry used by
daemons to build ordered processor chains.
### `processors/dedupe`
Built-in in-memory LRU dedupe processor that drops repeated events by `Event.ID`.
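To make the idea concrete, here is a minimal sketch of LRU-based dedupe keyed by an ID string, using the standard `container/list` package. It illustrates the general technique only and is not the code in `processors/dedupe`; `Deduper`, `NewDeduper`, and `Seen` are hypothetical names.

```go
package main

import (
	"container/list"
	"fmt"
)

// Deduper remembers up to capacity recently seen IDs.
type Deduper struct {
	capacity int
	order    *list.List               // front = most recently seen
	index    map[string]*list.Element // ID -> node in order
}

// NewDeduper returns a Deduper; capacities below 1 are raised to 1.
func NewDeduper(capacity int) *Deduper {
	if capacity < 1 {
		capacity = 1
	}
	return &Deduper{
		capacity: capacity,
		order:    list.New(),
		index:    make(map[string]*list.Element),
	}
}

// Seen reports whether id is already remembered, and records it either way.
// When the cache is full, the least recently seen ID is evicted.
func (d *Deduper) Seen(id string) bool {
	if el, ok := d.index[id]; ok {
		d.order.MoveToFront(el)
		return true
	}
	d.index[id] = d.order.PushFront(id)
	if d.order.Len() > d.capacity {
		oldest := d.order.Back()
		d.order.Remove(oldest)
		delete(d.index, oldest.Value.(string))
	}
	return false
}

func main() {
	d := NewDeduper(2)
	for _, id := range []string{"a", "a", "b", "c", "a"} {
		fmt.Println(id, "duplicate:", d.Seen(id))
	}
}
```

Because the cache is bounded, an ID that was evicted is treated as new if it shows up again, so the capacity should be sized to the expected replay window.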
### `processors/normalize`
Concrete normalization processor implementation. Typical use: sources emit raw
payload events, then a normalize stage maps them to canonical schemas.
### `dispatch`
Compiles routes and fans out events to sinks with per-sink queue/worker isolation.
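Per-sink isolation can be sketched as one buffered queue and one worker goroutine per sink. The sketch below is an illustration under assumptions, not the `dispatch` implementation: route matching is omitted, a full queue drops the event for that sink only (the real overflow policy is not stated here), and `Dispatcher`, `Emit`, and `Close` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a local stand-in for event.Event.
type Event struct{ ID string }

// Dispatcher fans each event out to one queue per sink.
type Dispatcher struct {
	queues []chan Event
	wg     sync.WaitGroup
}

// NewDispatcher starts one worker goroutine per sink write function.
func NewDispatcher(queueSize int, sinks ...func(Event)) *Dispatcher {
	d := &Dispatcher{}
	for _, write := range sinks {
		q := make(chan Event, queueSize)
		d.queues = append(d.queues, q)
		d.wg.Add(1)
		go func(q <-chan Event, write func(Event)) {
			defer d.wg.Done()
			for e := range q {
				write(e) // a slow sink only backs up its own queue
			}
		}(q, write)
	}
	return d
}

// Emit offers e to every sink and returns how many sinks dropped it
// because their queue was full (an assumed overflow policy).
func (d *Dispatcher) Emit(e Event) (dropped int) {
	for _, q := range d.queues {
		select {
		case q <- e:
		default:
			dropped++
		}
	}
	return dropped
}

// Close stops accepting events and waits for workers to drain their queues.
func (d *Dispatcher) Close() {
	for _, q := range d.queues {
		close(q)
	}
	d.wg.Wait()
}

func main() {
	var a, b int // each counter is touched by one worker, then read after Close
	d := NewDispatcher(8, func(Event) { a++ }, func(Event) { b++ })
	for _, id := range []string{"1", "2", "3"} {
		d.Emit(Event{ID: id})
	}
	d.Close()
	fmt.Println("sink a:", a, "sink b:", b)
}
```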
### `sinks`
Defines sink interface and sink registry. Built-ins include:
- `stdout`
- `nats`
- `postgres`
Detailed Postgres configuration and wiring examples live in package docs:
`sinks/doc.go`.
## Typical wiring
At a high level, a daemon built on `feedkit` does this:
1. Load config.
2. Register/build sources from `cfg.Sources`.
3. Register/build sinks from `cfg.Sinks`.
4. Compile routes.
5. Start scheduler (`sources -> bus`).
6. Start dispatcher (`bus -> pipeline -> sinks`).
2. Register domain-specific source drivers.
3. Register built-in and/or custom sinks.
4. Build sources, sinks, and optional processor chain from config.
5. Compile routes.
6. Start the scheduler and dispatcher.
## Non-goals
The package docs are the better source of truth for code-level details. In
particular, each subpackage `doc.go` describes its external API surface and any
optional helper APIs in `helpers.go`.
feedkit intentionally does not:
- define domain payload schemas
- enforce domain-specific event kinds
- own application lifecycle
- prescribe observability stack choices
## Package Layout
The major packages are:
- `config`: config loading and validation
- `event`: the domain-agnostic event envelope
- `sources`: source interfaces and reusable source helpers
- `scheduler`: source execution and cadence management
- `processors`: processor interfaces and registry
- `processors/dedupe`: built-in in-memory dedupe processor
- `processors/normalize`: built-in normalization processor and helpers
- `pipeline`: optional processor chain
- `dispatch`: route compilation and fanout
- `sinks`: sink interfaces, built-ins, and Postgres registration helpers
The root package docs in `doc.go` provide a concise package-by-package map for
Go documentation consumers.