Most retrieval-augmented generation (RAG) stacks end up running four systems: PostgreSQL for relational data, a vector database (pgvector, Qdrant, Weaviate), an inverted index (Elasticsearch, Meilisearch), and some batch layer for analytics. Four systems, four sets of credentials, four clients, four different consistency models.
One PostgreSQL instance with the right extensions can cover all of that — and fit in a single Alpine container.
This post shows the tradeoffs of the oorabona/postgres image family and when each flavor makes sense.
The flavors
The image ships 7 flavors for PostgreSQL 16, 17, and 18, each bundling a different extension set:
| Flavor | Compressed size (amd64) | Extensions |
|---|---|---|
| base | 108 MB | pg_stat_statements only |
| vector | 214 MB | pgvector |
| analytics | 179 MB | paradedb (pg_search with BM25) |
| timeseries | — | timescaledb, pg_partman |
| spatial | — | postgis |
| distributed | — | citus |
| full | 242 MB | pgvector + paradedb + timescale + postgis + pg_cron + pg_ivm + pg_partman + hypopg + pg_qualstats + citus |
Each extension is compiled against its PostgreSQL version — no _pg15 / _pg16 filename gymnastics.
Why Alpine
The underlying base is postgres:X-alpine (musl libc, Alpine Linux). Three reasons:
- Smaller attack surface — the stripped-down package set reduces CVE count relative to Debian slim.
- Multi-arch out of the box — amd64 and arm64 from the same build pipeline.
- ParadeDB Alpine support — upstream ships for glibc; we compile against musl with `RUSTFLAGS="-C target-feature=-crt-static"` so the image stays Alpine-consistent.
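For the curious, the core of that build step looks roughly like this — a sketch only, assuming the cargo-pgrx toolchain that pg_search is built with; the real Dockerfile has more plumbing around pg_config paths and versions:

```sh
# Sketch of the musl-friendly build for a pgrx-based extension (pg_search).
# Without this flag, Rust statically links the C runtime and the resulting .so won't load.
export RUSTFLAGS="-C target-feature=-crt-static"
cargo pgrx install --release --pg-config "$(command -v pg_config)"
```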
A gotcha that cost us half a day: Appender batch inserts on TIMESTAMPTZ columns reject naive timestamps. Ingestion code that writes through the Appender path needs explicit tz handling. Worth knowing if you port a DuckDB ingestion pipeline to this image.
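If your pipeline stages naive timestamps somewhere first, the least painful fix is to attach a zone explicitly on the way into the TIMESTAMPTZ column. A minimal sketch (`staging` and `metrics` are hypothetical tables):

```sql
-- staging.ts_naive is timestamp (no zone); metrics.ts is timestamptz.
-- AT TIME ZONE pins the zone instead of relying on the session's TimeZone setting.
INSERT INTO metrics (ts, value)
SELECT ts_naive AT TIME ZONE 'UTC', value
FROM staging;
```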
When to use which flavor
base — Run regular Postgres. You wouldn’t pick this image for that (official postgres is fine), but it’s the foundation every other flavor builds on.
vector — You have embeddings. Store them as vector(1536) (OpenAI), vector(768) (Sentence Transformers), etc. HNSW indexes are available out of the box:
CREATE EXTENSION vector;
CREATE TABLE docs (
id bigserial PRIMARY KEY,
content text,
embedding vector(1536)
);
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);
-- Nearest neighbors
SELECT id, content
FROM docs
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
analytics — You need BM25 full-text search (the algorithm Elasticsearch uses). ParadeDB wraps Tantivy as a Postgres index, giving you Elastic-quality ranking without running another system:
CREATE EXTENSION pg_search;
CREATE INDEX search_idx ON docs
USING bm25 (id, content)
WITH (key_field='id');
-- Ranked results
SELECT id, content, paradedb.score(id)
FROM docs
WHERE content @@@ 'alpine AND postgres'
ORDER BY paradedb.score(id) DESC;
timeseries — IoT, metrics, anything with a time dimension. TimescaleDB hypertables automatically partition by time; pg_partman manages the partition lifecycle.
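Getting started is one function call. A minimal sketch (the `metrics` table is illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS timescaledb;

CREATE TABLE metrics (
  ts     timestamptz      NOT NULL,
  device text             NOT NULL,
  value  double precision
);

-- Convert to a hypertable; TimescaleDB chunks it by the ts column
SELECT create_hypertable('metrics', 'ts');
```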
spatial — PostGIS. You know if you need it.
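If you do need it, the usual workflow is unchanged — a minimal sketch with a made-up `places` table:

```sql
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE places (
  id   bigserial PRIMARY KEY,
  name text,
  geom geography(Point, 4326)
);

CREATE INDEX ON places USING gist (geom);

-- Everything within 1 km of central Paris
SELECT name
FROM places
WHERE ST_DWithin(geom, ST_MakePoint(2.3522, 48.8566)::geography, 1000);
```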
distributed — Citus for horizontal sharding. Niche but solid when you need it.
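Sharding an existing table is a single call — a sketch, assuming a multi-tenant schema with a `tenant_id` column and at least one worker node configured:

```sql
CREATE EXTENSION IF NOT EXISTS citus;

-- Distribute rows across worker nodes by tenant
SELECT create_distributed_table('docs', 'tenant_id');
```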
full — All of the above. 242 MB compressed. For 12-factor apps that want to experiment without choosing a flavor upfront.
vs stacking separate containers
Could you run postgres:18-alpine + pgvector/pgvector + paradedb/paradedb separately? Sure. But:
- One connection pool — a single `pgbouncer` or `pool.max=20` in your app covers everything
- Transactional consistency — you can write a document AND update its embedding AND update its BM25 index in one transaction (see the sketch after this list)
- One backup — `pg_dump`, `wal-g`, pgBackRest all work unchanged
- One monitoring — `pg_stat_statements` sees every query across features
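The transactional-consistency point is the one that's genuinely hard to replicate across separate systems. A sketch, reusing the `docs` table and indexes from the examples above (embedding literal elided as before):

```sql
BEGIN;

-- The row, its HNSW entry, and its BM25 entry commit or roll back together.
INSERT INTO docs (content, embedding)
VALUES ('Alpine + Postgres notes', '[0.1, 0.2, ...]'::vector);

COMMIT;
```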
The downside: you can’t independently scale the vector workload away from the transactional one. For most teams under 1 TB of data, that’s a non-issue.
Deployment
# docker-compose.yml
services:
  db:
    image: ghcr.io/oorabona/postgres:18-alpine-full
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?required}
      POSTGRES_DB: app
      # Required for extensions that need preload (timescaledb, pg_cron, citus)
      POSTGRES_SHARED_PRELOAD_LIBRARIES: "timescaledb,pg_cron,pg_stat_statements"
    volumes:
      - pgdata:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

volumes:
  pgdata:
After first boot, enable the extensions you want:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_search;
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE EXTENSION IF NOT EXISTS postgis;
Only the ones with shared_preload_libraries need a restart (timescaledb, pg_cron, citus). The others load on CREATE EXTENSION.
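Two quick sanity checks — plain Postgres, nothing image-specific:

```sql
-- What the server was actually started with
SHOW shared_preload_libraries;

-- What can be enabled via CREATE EXTENSION in this database
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('vector', 'pg_search', 'timescaledb', 'postgis', 'citus');
```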
Upgrades
When a new minor version of a preloaded extension ships, the running database may refuse to start (“extension version mismatch”). Fix:
ALTER EXTENSION timescaledb UPDATE;
ALTER EXTENSION pg_cron UPDATE;
The image’s daily CI picks up upstream releases; after a docker pull, run ALTER EXTENSION ... UPDATE for each preloaded extension. Upstream versions are tracked and tested in the build pipeline — no surprises from upstream tag changes.
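In practice the whole bump is two steps — a sketch assuming the compose service and database names from the deployment example above:

```sh
# 1. Pull the refreshed image and restart the container
docker compose pull db && docker compose up -d db

# 2. Align the extension catalog versions with the new .so files
docker compose exec db psql -U postgres -d app -c "ALTER EXTENSION timescaledb UPDATE;"
docker compose exec db psql -U postgres -d app -c "ALTER EXTENSION pg_cron UPDATE;"
```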
Gotchas
- TIMESTAMPTZ + Appender — if you ingest via the binary COPY/Appender path (a common DuckDB → Postgres pattern), timestamps must carry timezone metadata. Naive timestamps are rejected.
- chmod errors on startup — ignore `chmod: /var/lib/postgresql/X/docker: Operation not permitted` when using podman or rootless docker. The initdb script still succeeds.
- Writable tmpfs for temp files — under `read_only: true`, mount `/tmp` and `/var/run/postgresql` as tmpfs, otherwise WAL writes fail (see the compose fragment below).
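A hardened variant of the compose service above might look like this (a sketch; only the read-only bits shown):

```yaml
  db:
    image: ghcr.io/oorabona/postgres:18-alpine-full
    read_only: true
    tmpfs:
      - /tmp
      - /var/run/postgresql
    volumes:
      - pgdata:/var/lib/postgresql/data
```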
Automated upstream tracking
Every extension version is pinned in postgres/extensions/config.yaml and bumped by CI when upstream releases:
- pgvector (pgvector/pgvector) via GitHub releases
- paradedb (paradedb/paradedb) via GitHub releases
- timescaledb (timescale/timescaledb) via GitHub releases
- postgis, citus, pg_cron, pg_ivm, pg_partman, hypopg, pg_qualstats — same
Minor/patch bumps auto-merge; majors wait for manual review. Each build produces an SPDX SBOM listing every bundled extension and its version — handy when you need to prove which CVEs you are (or aren’t) exposed to.
TL;DR
# Just pgvector
docker pull ghcr.io/oorabona/postgres:18-alpine-vector # 214 MB
# Everything
docker pull ghcr.io/oorabona/postgres:18-alpine-full # 242 MB
Live status and all variants: container dashboard.
If this saved you from setting up three systems, drop a ⭐. It’s how we learn that the 29 000 monthly pulls aren’t all CI robots.