System Overview
High-level architecture of the Archipelag.io distributed compute network
Archipelag.io is built as a distributed system with three main components: the Coordinator (control plane), Islands (running the Island software), and the Message Fabric (NATS) connecting them.
Repository Structure
archipelag-io/
├── app/ # Coordinator (Phoenix LiveView)
├── node-agent/ # Island software (Rust)
├── proto-contracts/ # Protobuf definitions
├── workload-containers/ # Docker images for Cargos
├── website/ # Marketing site (Zola)
├── docs/ # This documentation
└── infra/ # Docker Compose, Terraform
Component Diagram
                                      ┌───────────────┐
                                      │    website    │ (public)
                                      └──────┬────────┘
                                             │ links/docs/API
┌──────────────────┐        publishes        │
│ proto-contracts  │◄────────────────────────┘
│ (gRPC/Protobuf)  │
└──────┬───────────┘
       │ versioned schema (tags)
       │
       │ consumes           consumes
┌──────▼───────────┐    ┌─────────────────┐    ┌─────────────────┐
│ app/             │    │ node-agent      │    │ sdk-js/python   │
│ (Phoenix/Elixir) │    │ (Rust)          │    │ (client APIs)   │
└───┬──────────────┘    └───┬─────────────┘    └────────┬────────┘
    │ uses images           │ pulls images              │
    │ and manifests         │                           │ used by
    │                       │                           │ websites/apps
┌───▼───────────────────────▼───┐
│      workload-containers      │ (OCI images: vLLM, SD, etc.)
└───────────────────────────────┘
Runtime Data Flow
  [ End User ]
        │ HTTP/WS
        ▼
┌───────────────┐
│ Coordinator   │ (app/)
│ - LiveView UI │
│ - Oban jobs   │
│ - Placement   │
└─┬───────────┬─┘
  │           │ gRPC (schemas from proto-contracts)
  │           ▼
  │    ┌──────────────┐   telemetry/metrics
  │    │ NATS/JetStrm │◄──────────────┐
  │    └──────┬───────┘               │
  │           │ jobs/heartbeats       │
  │           ▼                       │
  │    ┌───────────────┐  pulls OCI   │
  │    │ node-agent    │──────────────┘
  │    │ (on Island)   │  images from workload-containers
  │    └──────┬────────┘
  │           │ containerd/NVIDIA
  │           ▼
  │    ┌───────────────┐
  │    │ Cargos        │ (vLLM / SD / FFmpeg / WASM)
  │    └───────────────┘
  │
  │  PostgreSQL ⇦ coordinator owns schema & billing
  │  ──────────────────────────────────────────────
  └─► Prometheus/Grafana for observability
Core Components
Coordinator (app/)
The coordinator is the control plane, built with Phoenix LiveView.
Responsibilities:
- Web UI and API
- User authentication and authorization
- Job placement and dispatch
- Island health monitoring
- Billing and credit management
- Background jobs via Oban
- Database migrations
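Job placement and Island health monitoring work together: the coordinator should only consider Islands whose heartbeat is recent and whose reported hardware satisfies the job. The following is a minimal sketch in Python, not the coordinator's actual Elixir code. The `Island` and `Job` fields, the 30-second heartbeat timeout, and the least-loaded selection rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Island:
    # Hypothetical view of an Island as seen by the coordinator.
    island_id: str
    gpu_vram_gb: int
    running_jobs: int
    last_heartbeat: float  # Unix timestamp of the last heartbeat


@dataclass
class Job:
    # Hypothetical job requirements.
    cargo: str
    min_gpu_vram_gb: int


HEARTBEAT_TIMEOUT_S = 30.0  # assumed value, not taken from the docs


def is_healthy(island: Island, now: float) -> bool:
    """An Island counts as healthy if its heartbeat is recent enough."""
    return now - island.last_heartbeat <= HEARTBEAT_TIMEOUT_S


def place(job: Job, islands: list[Island], now: float) -> Optional[Island]:
    """Pick the least-loaded healthy Island that satisfies the job."""
    candidates = [
        i for i in islands
        if is_healthy(i, now) and i.gpu_vram_gb >= job.min_gpu_vram_gb
    ]
    if not candidates:
        return None  # the caller would keep the job queued and retry later
    # Break ties on island_id so the choice is deterministic.
    return min(candidates, key=lambda i: (i.running_jobs, i.island_id))
```

Returning `None` instead of raising leaves the retry policy to the caller, which fits a design where a persistent job queue holds work until capacity appears.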
API endpoints:
- /api/v1/health
- /api/v1/chat/completions (Required)
Island Software (node-agent/)
The Island software is a Rust binary that runs on Islands.
Responsibilities:
- Hardware capability detection (GPU, CPU, memory)
- Heartbeat and health reporting
- Container image management
- Cargo execution and resource isolation
- Output streaming back to coordinator
- Resource metering for billing
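The capability detection and heartbeat responsibilities above can be sketched as follows. This is an illustrative Python stand-in for the Rust agent, not its real code: the field names are hypothetical, JSON is used in place of the Protobuf messages defined in proto-contracts, and GPU and memory detection are left out because they need driver or OS-specific calls.

```python
import json
import os
import platform


def detect_capabilities() -> dict:
    """Collect a few host facts using only the standard library.

    GPU and memory detection are omitted; a real agent would query the
    driver or the operating system for those.
    """
    return {
        "cpu_count": os.cpu_count() or 1,
        "arch": platform.machine(),
        "os": platform.system(),
    }


def build_heartbeat(island_id: str, running_cargos: int, now: float) -> bytes:
    """Encode a heartbeat as JSON bytes (a stand-in for a Protobuf message)."""
    message = {
        "island_id": island_id,            # hypothetical field names
        "sent_at": now,
        "running_cargos": running_cargos,
        "capabilities": detect_capabilities(),
    }
    return json.dumps(message, sort_keys=True).encode("utf-8")
```

An agent loop would call something like `build_heartbeat("island-1", 0, time.time())` on a fixed interval and publish the result to the message fabric.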
Message Fabric (NATS)
NATS with JetStream provides the messaging layer.
Key Features:
- Durable message delivery for job dispatch
- Subject-based routing for Island targeting
- Streaming for real-time output
- Automatic reconnection handling
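Subject-based routing relies on NATS subjects being dot-separated tokens, where `*` matches exactly one token and `>` matches one or more trailing tokens. The sketch below reimplements that matching rule in plain Python to show how a single Island or a whole fleet could be targeted. The subject names (`islands.<id>.jobs` and so on) are hypothetical; the actual subject layout is not given here.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subject pattern matches a subject.

    Tokens are separated by "."; "*" matches exactly one token and ">"
    matches one or more trailing tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for index, p in enumerate(p_tokens):
        if p == ">":
            # ">" must be the last pattern token and needs at least one
            # subject token left to consume.
            return index == len(p_tokens) - 1 and len(s_tokens) > index
        if index >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[index]:
            return False
    return len(p_tokens) == len(s_tokens)
```

With this rule, an Island could subscribe to its own `islands.isl-42.jobs` subject while a monitoring component subscribes to `islands.*.heartbeat` to observe every Island at once.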
Cargo Containers
Pre-built, signed Docker images for specific Cargos.
Current Cargos:
| Cargo | Image | Description |
|---|---|---|
| llm-chat | ghcr.io/archipelag-io/llm-chat:v1 | LLM inference with llama.cpp |
| image-gen | ghcr.io/archipelag-io/image-gen:v1 | Image generation with SD |
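As a small illustration of how the table might be consumed, the sketch below maps a Cargo name to its image reference and splits the reference into registry, repository, and tag. The `CARGO_IMAGES` dictionary and both helper functions are hypothetical, and the parser only handles the simple `registry/repository:tag` form used in the table (no digests, no registry ports).

```python
# Image references copied from the table above.
CARGO_IMAGES = {
    "llm-chat": "ghcr.io/archipelag-io/llm-chat:v1",
    "image-gen": "ghcr.io/archipelag-io/image-gen:v1",
}


def parse_image_ref(ref: str) -> dict:
    """Split 'registry/repository:tag' into its three parts."""
    name, _, tag = ref.rpartition(":")
    registry, _, repository = name.partition("/")
    return {"registry": registry, "repository": repository, "tag": tag}


def image_for(cargo: str) -> dict:
    """Look up a Cargo by name; raises KeyError for unknown Cargos."""
    return parse_image_ref(CARGO_IMAGES[cargo])
```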
Technology Stack
| Component | Technology | Rationale |
|---|---|---|
| Coordinator | Phoenix LiveView (Elixir) | Real-time native, fault-tolerant |
| Job Queue | Oban (Elixir) | Persistent, reliable |
| Message Fabric | NATS JetStream | Durable, fast, automatic reconnection |
| Island Software | Rust | Memory-safe, single binary |
| Database | PostgreSQL | Reliable, Ecto migrations |
| Container Runtime | Docker/containerd | Ubiquitous, GPU support |
| Cargo Protocol | gRPC (Protobuf) | Typed contracts, streaming |
Design Principles
Next Steps
Island
How the Island software manages Cargos.
Cargos
Container specifications and execution model.
