Cleaned up documentation and removed stubs and TODOs throughout the application
177
README.md
@@ -1,127 +1,92 @@
# feedkit

`feedkit` provides domain-agnostic plumbing for feed-processing daemons.
`feedkit` is a small Go toolkit for building feed-processing daemons.

A daemon built on feedkit typically:
- ingests upstream input (polling APIs or consuming streams)
It gives you the reusable plumbing around collection, processing, routing, and
emission, while leaving domain concepts, schemas, and application wiring in
your daemon. The intended shape is a family of sibling applications such as
`weatherfeeder`, `newsfeeder`, or `earthquakefeeder` that all share the same
infrastructure patterns without sharing domain logic.

## What It Does

A daemon built on `feedkit` typically:
- ingests upstream input by polling HTTP APIs or consuming streams
- emits domain-agnostic `event.Event` values
- applies optional processing (normalization, dedupe, policy)
- routes events to sinks (stdout, NATS, files, databases, etc.)
- optionally processes those events with stages like dedupe or normalization
- routes events to one or more sinks such as stdout, NATS, or Postgres

Conceptually, the pipeline is:

`Collect -> Process -> Route -> Emit`

## Philosophy

feedkit is not a framework. It provides small composable packages and leaves
lifecycle, domain schemas, and domain-specific validation in your daemon.
`feedkit` is intentionally not a framework.

## Conceptual pipeline
It does not try to own:
- your domain payload schemas
- your domain event kinds
- your daemon lifecycle or `main.go`
- your observability stack or deployment model

Collect -> Process (optional stages, including dedupe + normalize) -> Route -> Emit
Instead, it provides small composable packages that are easy to wire together in
different daemons.

| Stage | Package(s) |
|---|---|
| Collect | `sources`, `scheduler` |
| Process | `pipeline`, `processors`, `processors/dedupe`, `processors/normalize` (optional stages) |
| Route | `dispatch` |
| Emit | `sinks` |
| Configure | `config` |
## When To Use It

## Core packages
`feedkit` is a good fit when you want:
- multiple small ingestion daemons with shared infrastructure patterns
- clear separation between raw upstream payloads and normalized canonical models
- reusable routing and sink behavior across domains
- strong config and event-envelope conventions without centralizing domain rules

### `config`
It is a poor fit if you want a monolithic framework that dictates application
structure end-to-end.

Loads YAML config with strict decoding and domain-agnostic validation.
## Built-In Capabilities

`SourceConfig` supports both source modes:
- `mode: poll` requires `every`
- `mode: stream` forbids `every`
- omitted `mode` means auto (inferred from the registered driver type)
`feedkit` currently includes:
- strict YAML config loading and validation
- polling and streaming source abstractions
- scheduler orchestration for configured sources
- optional pipeline processors
- built-in dedupe and normalization processors
- route compilation and sink fanout
- built-in sinks for `stdout`, `nats`, and `postgres`

It also supports optional expected source kinds:
- `kinds: ["observation", "alert"]` (preferred)
- `kind: "observation"` (legacy fallback)
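
One way to read the preferred/legacy pair is as a simple fallback when resolving a source's expected kinds. This is a minimal sketch under assumptions: the function name `expectedKinds` is hypothetical, and the README does not say what happens when both keys are set, so letting the plural form win is a guess.

```go
package main

import "fmt"

// expectedKinds resolves the expected kinds for a source.
// Assumption: the plural list takes precedence when both are set.
func expectedKinds(kinds []string, kind string) []string {
	if len(kinds) > 0 {
		return kinds
	}
	if kind != "" {
		return []string{kind} // legacy single-value fallback
	}
	return nil // no expectation configured
}

func main() {
	fmt.Println(expectedKinds([]string{"observation", "alert"}, ""))
	fmt.Println(expectedKinds(nil, "observation"))
	fmt.Println(expectedKinds(nil, "") == nil)
}
```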
The Postgres sink is intentionally split between feedkit-owned infrastructure
and daemon-owned schema mapping. `feedkit` manages connection setup, DDL,
writes, and pruning; downstream applications define the schema and event mapper.

### `event`
## Typical Wiring

Defines the domain-agnostic event envelope (`event.Event`) used across the system.

### `sources`

Defines source interfaces and driver registry:

```go
type Input interface {
	Name() string
}

type PollSource interface {
	Input
	Poll(ctx context.Context) ([]event.Event, error)
}

type StreamSource interface {
	Input
	Run(ctx context.Context, out chan<- event.Event) error
}
```

Notes:
- a poll can emit `0..N` events
- stream sources emit events continuously
- a single source may emit multiple event kinds
- driver implementations live in downstream daemons and are registered via `sources.Registry`

### `scheduler`

Runs one goroutine per source job:
- poll sources: cadence driven (`every` + jitter)
- stream sources: continuous run loop

### `pipeline`

Optional processing chain between collection and dispatch.
Processors can transform, drop, or reject events.
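
The three outcomes (transform, drop, reject) can be sketched with a chain of functions. The shape below is an assumption made for illustration, not the `processors` package's actual interface: a processor returns a new event to transform, `nil, nil` to drop, or an error to reject. `Event` is again a simplified stand-in for `event.Event`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Event is a simplified stand-in for feedkit's event.Event (assumed fields).
type Event struct {
	ID   string
	Kind string
}

// Processor is a hypothetical shape: return (nil, nil) to drop an event,
// or a non-nil error to reject it.
type Processor func(e Event) (*Event, error)

// runChain applies processors in order and stops at the first drop or reject.
func runChain(chain []Processor, e Event) (*Event, error) {
	cur := &e
	for _, p := range chain {
		next, err := p(*cur)
		if err != nil {
			return nil, err // rejected
		}
		if next == nil {
			return nil, nil // dropped
		}
		cur = next
	}
	return cur, nil
}

// lowerKind transforms the event by lowercasing its kind.
func lowerKind(e Event) (*Event, error) {
	e.Kind = strings.ToLower(e.Kind)
	return &e, nil
}

// dropHeartbeats silently drops heartbeat events.
func dropHeartbeats(e Event) (*Event, error) {
	if e.Kind == "heartbeat" {
		return nil, nil
	}
	return &e, nil
}

// requireID rejects events that have no ID.
func requireID(e Event) (*Event, error) {
	if e.ID == "" {
		return nil, errors.New("missing ID")
	}
	return &e, nil
}

func main() {
	chain := []Processor{lowerKind, dropHeartbeats, requireID}
	inputs := []Event{
		{ID: "1", Kind: "ALERT"},
		{ID: "2", Kind: "Heartbeat"},
		{ID: "", Kind: "alert"},
	}
	for _, e := range inputs {
		out, err := runChain(chain, e)
		fmt.Printf("%+v -> %v, err=%v\n", e, out, err)
	}
}
```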

### `processors`

Defines the generic processor interface and a named-driver registry used by
daemons to build ordered processor chains.

### `processors/dedupe`

Built-in in-memory LRU dedupe processor that drops repeated events by `Event.ID`.
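
For readers unfamiliar with the technique, an LRU dedupe keeps a bounded set of recently seen IDs and evicts the least recently seen one when full. The sketch below shows the general idea using `container/list`; it is not the package's implementation, the names `lruDedupe` and `Seen` are hypothetical, and it is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

// lruDedupe remembers up to capacity recently seen IDs.
type lruDedupe struct {
	capacity int
	order    *list.List               // front = most recently seen
	index    map[string]*list.Element // ID -> element in order
}

func newLRUDedupe(capacity int) *lruDedupe {
	return &lruDedupe{
		capacity: capacity,
		order:    list.New(),
		index:    make(map[string]*list.Element),
	}
}

// Seen reports whether id is already remembered, and records it either way.
func (d *lruDedupe) Seen(id string) bool {
	if el, ok := d.index[id]; ok {
		d.order.MoveToFront(el) // refresh recency on a repeat
		return true
	}
	d.index[id] = d.order.PushFront(id)
	if d.order.Len() > d.capacity {
		oldest := d.order.Back()
		d.order.Remove(oldest)
		delete(d.index, oldest.Value.(string))
	}
	return false
}

func main() {
	d := newLRUDedupe(2)
	for _, id := range []string{"a", "b", "a", "c", "b"} {
		fmt.Printf("%s duplicate=%v\n", id, d.Seen(id))
	}
}
```

With a capacity of 2, the final `b` is not reported as a duplicate because it was evicted when `c` arrived. That bounded memory is the trade-off of any in-memory LRU approach: repeats that arrive after eviction pass through.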

### `processors/normalize`

Concrete normalization processor implementation. Typical use: sources emit raw
payload events, then a normalize stage maps them to canonical schemas.

### `dispatch`

Compiles routes and fans out events to sinks with per-sink queue/worker isolation.

### `sinks`

Defines sink interface and sink registry. Built-ins include:
- `stdout`
- `nats`
- `postgres`

Detailed Postgres configuration and wiring examples live in package docs:
`sinks/doc.go`.

## Typical wiring
At a high level, a daemon built on `feedkit` does this:

1. Load config.
2. Register/build sources from `cfg.Sources`.
3. Register/build sinks from `cfg.Sinks`.
4. Compile routes.
5. Start scheduler (`sources -> bus`).
6. Start dispatcher (`bus -> pipeline -> sinks`).
2. Register domain-specific source drivers.
3. Register built-in and/or custom sinks.
4. Build sources, sinks, and optional processor chain from config.
5. Compile routes.
6. Start the scheduler and dispatcher.

## Non-goals
The package docs are the better source of truth for code-level details. In
particular, each subpackage `doc.go` describes its external API surface and any
optional helper APIs in `helpers.go`.

feedkit intentionally does not:
- define domain payload schemas
- enforce domain-specific event kinds
- own application lifecycle
- prescribe observability stack choices
## Package Layout

The major packages are:
- `config`: config loading and validation
- `event`: the domain-agnostic event envelope
- `sources`: source interfaces and reusable source helpers
- `scheduler`: source execution and cadence management
- `processors`: processor interfaces and registry
- `processors/dedupe`: built-in in-memory dedupe processor
- `processors/normalize`: built-in normalization processor and helpers
- `pipeline`: optional processor chain
- `dispatch`: route compilation and fanout
- `sinks`: sink interfaces, built-ins, and Postgres registration helpers

The root package docs in `doc.go` provide a concise package-by-package map for
Go documentation consumers.