Multi-Island Compute
How Archipelag.io distributes work across multiple Islands — batch fan-out, pipeline parallelism (inference rings), expert routing, and speculative decoding
Archipelag.io can combine multiple Islands into unified compute groups, letting Cargos run that exceed any single Island’s capacity. A 70B LLM can run across four laptops. A thousand images can be captioned in parallel across fifty phones.
This page covers the distribution strategies and their current status.
Overview
The standard job flow is 1 job → 1 Island. Multi-Island compute extends this to 1 job → N Islands, with the coordinator handling splitting, orchestration, and result merging.
Standard: Consumer → Coordinator → Island → Result
Distributed: Consumer → Coordinator ─┬→ Island A → Result A ─┐
├→ Island B → Result B ──┤→ Merge → Consumer
└→ Island C → Result C ─┘
Four distribution strategies target different workload types:
| Strategy | Use Case | Inter-Island Traffic | Status |
|---|---|---|---|
| Batch fan-out | Embarrassingly parallel (image batch, embeddings) | None | Beta |
| Pipeline parallel | Large models that don’t fit on one device | High (activations per layer) | Experimental |
| Expert parallel | MoE models (Mixtral, etc.) | Medium (active expert outputs) | Experimental |
| Speculative decoding | Accelerating autoregressive generation | Low (draft tokens) | Experimental |
Batch Fan-Out (Data Parallelism)
Batch fan-out splits a list of independent inputs across multiple Islands and merges the results when all children complete. There is zero inter-Island communication — each child job runs independently.
How It Works
┌─────────────────┐
│Consumer submits │
│batch of inputs │
└────────┬────────┘
│
┌────────▼────────┐
│Coordinator makes │
│parent job + N │
│child jobs │
└──┬─────┬─────┬──┘
│ │ │
┌────────┘ │ └────────┐
│ │ │
▼ ▼ ▼
Child 0 Child 1 Child 2
→ Island A → Island B → Island C
│ │ │
▼ ▼ ▼
Result 0 Result 1 Result 2
│ │ │
└──────┬───────┘ │
│ ┌───────────────────┘
│ │
┌──────▼──▼──────┐
│Merge by │
│batch_index │
└───────┬─────────┘
│
┌───────▼─────────┐
│Parent job │
│succeeds │
└─────────────────┘
- Consumer submits a batch via POST /api/v1/jobs/batch with a list of inputs
- The coordinator creates a parent job (not dispatched to any Island) that tracks overall progress
- N child jobs are created, each with a single input and a batch_index (0-indexed)
- Each child is dispatched through the normal placement engine — children spread across different Islands for fault isolation
- As each child reaches a terminal state, the coordinator checks whether the batch is complete
- When all children finish, results are merged by batch_index and stored on the parent job
Job Relationships
Parent Job (batch_config set, no host assigned)
├── Child Job 0 (batch_index: 0, parent_job_id: parent.id)
├── Child Job 1 (batch_index: 1, parent_job_id: parent.id)
├── Child Job 2 (batch_index: 2, parent_job_id: parent.id)
└── ...
The parent job has a batch_config field containing:
| Field | Description |
|---|---|
| chunk_count | Total number of child jobs |
| merge_strategy | How to combine results: "concat" or "flatten" |
| fail_mode | What to do on failure: "best_effort" or "fail_fast" |
| completed_count | Number of succeeded children (updated as they finish) |
| failed_count | Number of failed children |
Merge Strategies
| Strategy | Behavior | Best For |
|---|---|---|
| concat | Collects child outputs into an ordered list by batch_index | Most use cases — each child returns a single result |
| flatten | Like concat, but flattens one level if children return lists | Children that return multiple items each |
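As a concrete illustration, the two strategies can be sketched in Python (a hypothetical helper, not the coordinator’s actual code; it assumes each child result arrives as a (batch_index, output) pair):

```python
def merge_results(children, strategy="concat"):
    """Combine child outputs into the parent job's output.

    children: list of (batch_index, output) pairs, in any completion order.
    """
    # Order by batch_index so results line up with the submitted inputs.
    ordered = [out for _, out in sorted(children, key=lambda c: c[0])]
    if strategy == "concat":
        return ordered
    if strategy == "flatten":
        # Flatten exactly one level when a child returns a list of items.
        flat = []
        for out in ordered:
            flat.extend(out if isinstance(out, list) else [out])
        return flat
    raise ValueError(f"unknown merge strategy: {strategy}")
```

With concat, results arriving out of order still merge back into input order; flatten additionally splices any list-valued child outputs into the top level.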
Failure Modes
| Mode | Behavior |
|---|---|
| best_effort (default) | Parent succeeds with partial results. Failed children are noted in the errors array of the output. |
| fail_fast | Parent fails immediately when the first child fails. All remaining non-terminal children are cancelled. |
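The interaction between the two modes can be sketched as a small decision function (a hypothetical helper; state names follow the job lifecycle used elsewhere on this page):

```python
def resolve_parent(child_states, fail_mode="best_effort"):
    """Return (parent_outcome, cancel_remaining) for a batch.

    child_states: list of child job states, e.g. "succeeded", "failed", "started".
    """
    any_failed = "failed" in child_states
    all_terminal = all(s in ("succeeded", "failed") for s in child_states)

    if fail_mode == "fail_fast" and any_failed:
        # First failure sinks the parent; non-terminal children get cancelled.
        return "failed", not all_terminal
    if all_terminal:
        # best_effort: the parent succeeds even with partial results.
        return "succeeded", False
    return "pending", False
```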
API
Submit a Batch
POST /api/v1/jobs/batch
{
"workload": "image-caption",
"inputs": [
{ "image_url": "https://example.com/photo1.jpg" },
{ "image_url": "https://example.com/photo2.jpg" },
{ "image_url": "https://example.com/photo3.jpg" }
],
"merge_strategy": "concat",
"fail_mode": "best_effort",
"max_parallelism": 10,
"region": "us-east"
}
| Field | Required | Default | Description |
|---|---|---|---|
| workload | Yes | — | Cargo slug to run for each input |
| inputs | Yes | — | Array of input objects (1–100 items, max 256 KB each) |
| merge_strategy | No | "concat" | How to combine child outputs |
| fail_mode | No | "best_effort" | How to handle child failures |
| max_parallelism | No | unlimited | Max concurrent children (reserved for future use) |
| region | No | any | Preferred region for placement |
| bid_price | No | — | Per-child bid price for market pricing |
Response (201 Created):
{
"id": "parent-job-uuid",
"state": "started",
"workload": "image-caption",
"batch": {
"chunk_count": 3,
"merge_strategy": "concat",
"fail_mode": "best_effort",
"completed": 0,
"failed": 0
},
"children": [
{ "id": "child-uuid-0", "batch_index": 0, "state": "submitted" },
{ "id": "child-uuid-1", "batch_index": 1, "state": "submitted" },
{ "id": "child-uuid-2", "batch_index": 2, "state": "submitted" }
],
"created_at": "2026-03-16T12:00:00Z"
}
Check Batch Progress
GET /api/v1/jobs/{parent_id}/batch-status
Response:
{
"parent_id": "parent-job-uuid",
"parent_state": "started",
"chunk_count": 3,
"merge_strategy": "concat",
"fail_mode": "best_effort",
"child_states": {
"succeeded": 2,
"started": 1
},
"children": [
{ "id": "child-0", "batch_index": 0, "state": "succeeded", "host_id": "island-a" },
{ "id": "child-1", "batch_index": 1, "state": "succeeded", "host_id": "island-b" },
{ "id": "child-2", "batch_index": 2, "state": "started", "host_id": "island-c" }
]
}
Completed Batch Output
When the parent job reaches "succeeded", its output contains:
{
"batch_results": [
{ "text": "A cat sitting on a windowsill" },
{ "text": "A sunset over the ocean" },
{ "text": "A group of people hiking" }
],
"total": 3,
"failed": 0
}
With best_effort and partial failures:
{
  "batch_results": [
    { "text": "A cat sitting on a windowsill" },
    null,
    { "text": "A group of people hiking" }
  ],
  "total": 3,
  "succeeded": 2,
  "failed": 1,
  "errors": [
    { "batch_index": 1, "error": "OOM error" }
  ]
}
Failed children appear as null at their batch_index, so successful results keep their original positions.
Billing
Each child job is billed individually through the existing per-job billing system. The parent job’s total cost equals the sum of all children’s clearing prices. Credits are checked upfront — the batch is rejected if the Consumer doesn’t have enough credits for the full batch at current pricing.
Real-Time Progress
Subscribe to the parent job’s PubSub channel (job:{parent_id}) to receive progress updates as children complete:
{
"state": "streaming",
"batch_progress": {
"completed": 7,
"failed": 0,
"total": 10
}
}
SSE streaming via GET /api/v1/jobs/{parent_id}/stream also works — you’ll receive state events as the batch progresses.
Limits
| Limit | Value |
|---|---|
| Max batch size | 100 inputs |
| Max input item size | 256 KB (serialized JSON) |
| Max total batch cost | Must not exceed Consumer’s credit balance |
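These limits can be enforced client-side before submitting, avoiding a rejected request on oversized batches. A sketch (the helper name and validation structure are assumptions, not part of any official SDK):

```python
import json

# Documented limits for POST /api/v1/jobs/batch.
MAX_BATCH_SIZE = 100          # max inputs per batch
MAX_ITEM_BYTES = 256 * 1024   # max serialized size per input item

def build_batch_payload(workload, inputs, **options):
    """Validate documented batch limits and assemble the request body."""
    if not 1 <= len(inputs) <= MAX_BATCH_SIZE:
        raise ValueError(f"batch must contain 1-{MAX_BATCH_SIZE} inputs")
    for i, item in enumerate(inputs):
        size = len(json.dumps(item).encode("utf-8"))
        if size > MAX_ITEM_BYTES:
            raise ValueError(f"input {i} is {size} bytes (max {MAX_ITEM_BYTES})")
    return {"workload": workload, "inputs": inputs, **options}
```

The returned dict is the JSON body shown in the Submit a Batch example; credit-balance checks still happen server-side.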
Pipeline Parallelism (Inference Rings)
Pipeline parallelism shards a large model across N Islands in a sequential chain. Each Island holds a subset of layers. Tokens flow through the pipeline — Island 1 processes layers 0–15, sends activations to Island 2 (layers 16–31), and so on. This is the “run 70B across four laptops” feature.
How It Works
Consumer submits job (normal API — no special parameters)
│
▼
Coordinator detects: Cargo is pipeline-capable AND
no single Island has enough VRAM
│
▼
Form Island Group: find N Islands with highest Karma,
sufficient per-shard VRAM, same region preferred
│
▼
Start Ring Session (one per active pipeline job)
│
├──→ Island 1 (position 0, layers 0–10)
│ │ download shard, signal ready
│ │
├──→ Island 2 (position 1, layers 11–21)
│ │ download shard, signal ready
│ │
└──→ Island 3 (position 2, layers 22–31)
│ download shard, signal ready
│
All ready → Coordinator sends "start" + prompt to position 0
│
Island 1: embedding + layers 0–10 → activations
│
Island 2: layers 11–21 → activations
│
Island 3: layers 22–31 → final logits → tokens
│
Tokens stream back to Consumer (same as any other job)
Consumer Transparency
Pipeline execution is completely invisible to Consumers. You submit a job the same way you always do:
POST /api/v1/jobs
{
"workload": "llama-70b-chat",
"input": { "prompt": "Explain quantum computing" }
}
If the model needs pipeline execution, the coordinator handles everything automatically. The streaming response looks identical to a single-Island job. Billing works the same way — you pay the per-job price, and the coordinator splits earnings among participating Islands.
Island Groups
When the coordinator decides to use pipeline execution, it forms an Island Group — a set of Islands working together on a single Cargo.
Formation criteria:
- Each Island must have enough VRAM for its shard (not the full model)
- Islands must support the required runtime (e.g., llmcpp)
- Islands must be online, approved, and not in cooldown
- Higher-Karma Islands are preferred — a pipeline is only as reliable as its weakest link
- Same-region Islands are preferred for lower latency
Group lifecycle:
| Status | Meaning |
|---|---|
| forming | Group created, members downloading shards |
| active | All members ready, accepting jobs |
| degraded | A member failed — group can’t serve new jobs |
| disbanded | Group torn down (timeout, manual, or error) |
Active groups are reused across multiple jobs — Islands keep their shards loaded in memory, so subsequent requests skip the download step and start immediately.
Shard Manifests
For a Cargo to support pipeline execution, it must declare a shard manifest — metadata describing how the model can be split:
{
"distribution_strategies": ["single", "pipeline"],
"shard_manifest": {
"total_layers": 80,
"min_shards": 2,
"max_shards": 8,
"shard_urls": {
"0": "https://cdn.example.com/llama-70b-shard-0.gguf",
"1": "https://cdn.example.com/llama-70b-shard-1.gguf",
"2": "https://cdn.example.com/llama-70b-shard-2.gguf",
"3": "https://cdn.example.com/llama-70b-shard-3.gguf"
}
}
}
| Field | Description |
|---|---|
total_layers | Total number of transformer layers in the full model |
min_shards | Minimum Islands needed (fewer shards = more VRAM per Island) |
max_shards | Maximum Islands supported |
shard_urls | Pre-split GGUF files, one per shard position |
The "single" strategy is always included as a fallback — when a single Island has enough VRAM, the model runs normally without any pipeline overhead.
Preparing a Model for Pipeline Execution
Cargo publishers use the built-in shard splitting tool to prepare models for pipeline execution:
island --split-model llama-70b-Q4_K_M.gguf --shards 4 --output-dir ./shards --layer-aware
The --layer-aware flag parses the GGUF binary format, identifies each tensor’s layer from its name (blk.0.attn_q.weight, blk.1.ffn_gate.weight, etc.), and produces valid sub-GGUF files where each shard contains only its layer range’s tensors. Embedding tensors go in the first shard; the output head goes in the last. Layer tensors are renumbered (blk.16.* → blk.0.*) and the model’s block count is updated, so each shard is a valid standalone model that llama.cpp can load directly.
This produces 4 shard files and a shard_manifest.json with per-shard layer ranges and SHA256 hashes. Upload the shards to your CDN, update the URLs in the manifest, and set it as the Cargo’s shard_manifest field.
Automatic Pipeline Detection
The coordinator automatically decides whether to use pipeline execution based on two conditions:
- The Cargo supports it — distribution_strategies includes "pipeline" and shard_manifest is present
- No single Island can handle it — the maximum VRAM of any online Island is less than the Cargo’s required_vram_mb
If either condition is false, the job dispatches normally to a single Island.
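The two conditions reduce to a simple predicate. A sketch (field names such as vram_mb are illustrative, not the coordinator’s actual schema):

```python
def should_pipeline(cargo, online_islands):
    """Decide pipeline vs. single-Island dispatch per the two documented conditions."""
    supports = ("pipeline" in cargo.get("distribution_strategies", [])
                and "shard_manifest" in cargo)
    if not supports:
        return False  # Cargo can't be sharded: dispatch normally
    # Pipeline only when no single online Island has enough VRAM.
    max_vram = max((i["vram_mb"] for i in online_islands), default=0)
    return max_vram < cargo["required_vram_mb"]
```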
Fault Tolerance
If any Island in the pipeline fails during inference, the entire pipeline fails and the job is retried from scratch. This is the simplest and most robust approach — re-sharding mid-inference is complex, and for most workloads the retry cost is acceptable.
Specifically:
- If a member fails during the forming phase, the group is disbanded and the job is re-queued
- If a member fails during active inference, a stop signal is sent to all members, the job fails, and it retries if attempts remain
- If the session times out (120 seconds with no progress), the group is disbanded
Group Lifecycle & Idle Timeout
Active groups are reused across multiple sequential jobs — Islands keep their model shards cached in memory, so the next request for the same Cargo starts immediately without re-downloading.
Groups are automatically disbanded after 5 minutes of inactivity (no new jobs). This frees Islands to participate in other groups or serve single-Island jobs. The idle timeout is monitored by a background process that checks every 60 seconds.
Activation Transport
Tokens and activations flow between pipeline stages through a pluggable transport layer with two backends:
| Transport | Status | Latency | Best For |
|---|---|---|---|
| NATS | Default | ~1–2ms/hop | All pipelines — simple, reliable, no extra config |
| QUIC with relay | Recommended | ~0.1ms or ~1ms | Production pipelines — tries direct QUIC, falls back to NATS |
| QUIC | Available | ~0.1–0.5ms | Co-located Islands on LAN/VPN where latency is guaranteed |
The coordinator automatically enables the QUIC with relay mode when Islands report a public address. Each Island discovers its public IP via STUN (a lightweight NAT traversal protocol) and reports it in every heartbeat. The coordinator includes the next pipeline member’s address in each Island’s configuration, allowing direct QUIC connections with ephemeral self-signed TLS certificates. If QUIC fails (firewall, symmetric NAT), the transport falls back to NATS seamlessly — no interruption, no error visible to the Consumer.
Position 0 (the first Island in the chain) supports microbatching — collecting multiple tokens before sending each activation message. This reduces message overhead for high-throughput pipelines. Microbatch size is configurable per job (default: 1 token = real-time streaming).
Billing
Pipeline jobs are billed at the same per-job rate as single-Island jobs from the Consumer’s perspective — you pay the clearing price (or the Cargo’s default price) once per job, regardless of how many Islands participate.
The coordinator splits the earned credits among participating Islands proportional to the layers each Island processes. For example, in a 32-layer model split across two Islands:
| Island | Layers | Share | Payout (10 credit job) |
|---|---|---|---|
| Island A | 0–15 (16 layers) | 50% | 5.00 credits |
| Island B | 16–31 (16 layers) | 50% | 5.00 credits |
With an unequal split (e.g., one Island has more VRAM and takes more layers):
| Island | Layers | Share | Payout (12 credit job) |
|---|---|---|---|
| Island A | 0–7 (8 layers) | 25% | 3.00 credits |
| Island B | 8–31 (24 layers) | 75% | 9.00 credits |
Each Island’s earnings are credited immediately on job completion and count toward their payout balance.
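The proportional split in the tables above reduces to a one-liner; a sketch of the arithmetic (rounding to cents is an assumption):

```python
def pipeline_payouts(layer_counts, total_credits):
    """Split job earnings proportional to the layers each Island processed.

    layer_counts: layers held per Island, in pipeline order.
    """
    total_layers = sum(layer_counts)
    return [round(total_credits * n / total_layers, 2) for n in layer_counts]
```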
Expert Routing (MoE Parallelism)
For Mixture-of-Experts models (like Mixtral), only a subset of experts are activated per token. Each Island loads a subset of experts instead of the full model. A Router Island runs the gating network, determines which experts should process each token, dispatches work, and combines the results.
Consumer submits job (normal API — no special parameters)
│
▼
Coordinator detects: Cargo is expert-capable
│
▼
Form Expert Group: 1 Router Island (highest Karma)
+ N Expert Islands (each loads a subset of experts)
│
▼
Start Expert Session
│
├──→ Router Island (position 0, gating network)
│ │ download router model, signal ready
│ │
├──→ Expert Island A (position 1, experts [0-3])
│ │ download expert shards, signal ready
│ │
└──→ Expert Island B (position 2, experts [4-7])
│ download expert shards, signal ready
│
All ready → Coordinator sends "start" + prompt to Router
│
Router: gating network → select top-K experts per token
│
Dispatch tokens to Expert Islands via NATS
│
Expert Islands process tokens, return results
│
Router: combine expert outputs → final tokens
│
Tokens stream back to Consumer
Consumer Transparency
Like pipeline execution, expert routing is completely invisible to Consumers. The same API, same streaming, same billing. The coordinator decides to use expert routing when the Cargo’s distribution_strategies includes "expert".
Expert Manifests
For a Cargo to support expert routing, it must declare an expert manifest in its shard_manifest:
{
"distribution_strategies": ["single", "expert"],
"shard_manifest": {
"total_experts": 8,
"active_experts": 2,
"min_expert_islands": 2,
"max_expert_islands": 4,
"router_url": "https://cdn.example.com/mixtral-router.gguf",
"expert_urls": {
"0": "https://cdn.example.com/expert-0.gguf",
"1": "https://cdn.example.com/expert-1.gguf",
"2": "https://cdn.example.com/expert-2.gguf",
"3": "https://cdn.example.com/expert-3.gguf",
"4": "https://cdn.example.com/expert-4.gguf",
"5": "https://cdn.example.com/expert-5.gguf",
"6": "https://cdn.example.com/expert-6.gguf",
"7": "https://cdn.example.com/expert-7.gguf"
}
}
}
| Field | Description |
|---|---|
total_experts | Number of experts in the model (e.g., 8 for Mixtral) |
active_experts | Experts activated per token — the top-K value (e.g., 2) |
min_expert_islands | Minimum expert Islands needed (excluding router) |
max_expert_islands | Maximum expert Islands supported |
router_url | GGUF model for the gating network / routing |
expert_urls | Map of expert_id → download URL for each expert shard |
Expert Group Formation
The coordinator forms an expert group by selecting:
- The highest-scoring Island as the Router (position 0) — it handles every token, so reliability and latency matter most
- N Expert Islands (positions 1..N) — each assigned a subset of expert IDs
Expert IDs are split evenly into contiguous blocks: for 8 experts across 2 Islands, Island A gets experts [0,1,2,3] and Island B gets [4,5,6,7].
Formation uses the same scoring as pipeline groups: Karma (40%), region affinity (30%), and NATS RTT (30%).
Fault Tolerance
If any Island in the expert group fails during inference, the entire group fails and the job is retried. Expert failover (re-routing tokens to a different Island’s copy of the same expert) is planned for a future version.
Billing
Expert jobs are billed at the same per-job rate as single-Island jobs. The coordinator splits earnings:
| Role | Share | Rationale |
|---|---|---|
| Router | 20% | Processes every token (gating + combination) |
| Each Expert Island | 80% ÷ N | Processes only routed tokens |
Example for a 10.00 credit job with 1 router + 2 expert Islands:
- Router: 2.00 credits
- Expert A: 4.00 credits
- Expert B: 4.00 credits
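The split above reduces to simple arithmetic; a sketch (rounding to cents is an assumption):

```python
def expert_payouts(total_credits, num_expert_islands):
    """Router takes 20%; the remaining 80% is divided equally among expert Islands.

    Returns (router_share, per_expert_share).
    """
    router_share = round(total_credits * 0.20, 2)
    per_expert_share = round(total_credits * 0.80 / num_expert_islands, 2)
    return router_share, per_expert_share
```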
Expert Replication & Load Balancing
Popular experts (those most frequently activated by the gating network) can be replicated across multiple Islands for load balancing. When replicate_popular_experts is enabled in the Cargo manifest, each expert Island also loads a copy of expert 0 (typically the most activated).
The coordinator tracks tokens in-flight per expert Island and uses capacity-aware routing — when multiple Islands can serve the same expert, the router dispatches to the one with the lowest load. This prevents any single Island from becoming a bottleneck.
The router supports multiple gating strategies for expert selection:
- Hash-based (default): deterministic routing via consistent hashing — no model needed
- Embedding-based: routes tokens to experts whose embedding centroids are most similar — learned from training data
- Native MoE gating: uses actual gating layer weights when available — highest accuracy
- Round-robin: sequential assignment for load testing
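The default hash-based strategy can be illustrated with a toy version: deterministic top-K expert selection from the token alone, with no gating model. The hashing scheme below is an assumption for illustration, not the router’s actual consistent-hashing implementation:

```python
import hashlib

def hash_route(token_id, total_experts=8, top_k=2):
    """Pick top_k distinct expert IDs for a token, deterministically."""
    picked = []
    salt = 0
    while len(picked) < top_k:
        digest = hashlib.sha256(f"{token_id}:{salt}".encode()).digest()
        expert = int.from_bytes(digest[:4], "big") % total_experts
        if expert not in picked:  # skip duplicate experts
            picked.append(expert)
        salt += 1
    return picked
```

Because routing depends only on the token, any Island holding the selected experts can serve the request without coordination.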
Islands that already have expert weights cached get a warmth bonus during group formation, avoiding cold starts when experts are reassigned.
Bandwidth Efficiency
Expert routing is more bandwidth-efficient than pipeline parallelism because only active expert outputs travel between Islands — not full activation tensors. With top-2 routing on an 8-expert model, only 25% of expert outputs cross the network per token.
Speculative Decoding Pairs
Speculative decoding pairs a fast Island (small draft model) with a powerful Island (large verifier model) to accelerate autoregressive generation by 2–3x.
Consumer submits job (normal API — no special parameters)
│
▼
Coordinator detects: Cargo is speculative-capable
│
▼
Form pair: Draft Island (TinyLlama 1B, fast)
Verify Island (Llama 70B, accurate)
│
├── Draft generates K tokens quickly (K=5)
│ ↓
├── Verify checks all K in one forward pass
│ ↓
├── Accepts matching prefix + first corrected token
│ ↓
├── Accepted tokens stream to Consumer
│ ↓
└── Draft continues from accepted point
...repeat until done...
How It Works
- The Draft Island generates K candidate tokens autoregressively (K=4–8) using a small, fast model
- All K tokens are sent to the Verify Island
- The Verify Island runs a single forward pass on all K tokens in parallel (same cost as 1 token)
- Accepts tokens that match (within a configurable threshold), rejects divergent ones
- Returns the accepted prefix + first corrected token
- The Draft Island continues from the accepted point
This is transparent to the Consumer — they just see faster token output. The speedup comes from the draft model being 5–10x faster per token than the verifier: it generates K tokens in the time the verifier processes 1.
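The accept/reject step can be sketched with exact token matching; the production path compares log-probs against acceptance_threshold rather than exact token IDs:

```python
def accept_draft(draft_tokens, verify_tokens):
    """Return the tokens to emit this round: the matching prefix,
    plus the verifier's first correction where the draft diverges."""
    accepted = []
    for d, v in zip(draft_tokens, verify_tokens):
        if d == v:
            accepted.append(d)
        else:
            accepted.append(v)  # first corrected token ends the round
            break
    return accepted
```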
Speculative Manifest
For a Cargo to support speculative decoding, it must declare a speculative manifest:
{
"distribution_strategies": ["single", "speculative"],
"shard_manifest": {
"draft_model_url": "https://cdn.example.com/tinyllama-1b.gguf",
"verify_model_url": "https://cdn.example.com/llama-70b.gguf",
"draft_tokens": 5,
"acceptance_threshold": 0.9
}
}
| Field | Description |
|---|---|
| draft_model_url | Small, fast model for generating candidate tokens |
| verify_model_url | Large, accurate model for verification |
| draft_tokens | K — number of tokens per draft round (default: 5) |
| acceptance_threshold | Log-prob match threshold for acceptance (default: 0.9) |
Pair Formation
The coordinator selects:
- Verify Island: the highest-VRAM candidate that can run the target model — accuracy is priority
- Draft Island: the best remaining candidate — any VRAM is fine since the draft model is small
Both Islands are scored by Karma, region affinity, and NATS RTT. Low RTT between the pair is critical because draft tokens must travel to the verifier quickly.
Multi-Draft Mode
For maximum throughput, speculative decoding supports multiple draft Islands generating candidates in parallel. Set draft_count in the Cargo manifest to use N drafts + 1 verifier. Each draft independently generates K tokens per round, and the verifier picks the best batch — the one with the highest acceptance rate. This “best-of-N” selection ensures the verifier always uses the highest-quality draft output.
Adaptive Draft Size
The number of draft tokens (K) is adjusted dynamically based on the acceptance rate:
- High acceptance (>80%): K increases (up to 12) — draft and verifier agree well, generate more tokens per round
- Low acceptance (<50%): K decreases (down to 2) — draft diverges too much, fewer tokens per round
- Moderate (50–80%): K stays the same
This automatic tuning maximizes throughput without requiring manual configuration per model pair.
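A sketch of the tuning rule (the bounds and thresholds come from the text; the one-token step per round is an assumption):

```python
def adapt_draft_size(k, acceptance_rate, k_min=2, k_max=12):
    """Adjust draft token count K from the previous round's acceptance rate."""
    if acceptance_rate > 0.80:
        return min(k + 1, k_max)  # strong agreement: draft more per round
    if acceptance_rate < 0.50:
        return max(k - 1, k_min)  # frequent divergence: draft less
    return k                      # moderate: keep K unchanged
```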
Billing
| Role | Share | Rationale |
|---|---|---|
| Draft Island | 30% | Generates most tokens (fast, cheap model) |
| Verify Island | 70% | Runs the expensive target model |
Example for a 10.00 credit job:
- Draft Island: 3.00 credits
- Verify Island: 7.00 credits
Shared Infrastructure
All multi-Island strategies share common infrastructure:
Island Groups
The island_groups system tracks groups of Islands working together:
| Field | Purpose |
|---|---|
| topology | "pipeline", "expert", or "speculative" |
| status | "forming" → "active" → "degraded" / "disbanded" |
| workload_id | Which Cargo this group runs |
| members | Ordered list of Islands with position and shard assignments |
Groups are reusable — an active group can serve multiple sequential jobs without re-forming. Groups are automatically disbanded after 5 minutes of inactivity (configurable) or when a member goes offline.
Placement Engine Extensions
The placement engine is extended with multi-dimensional scoring for pipeline member selection:
| Dimension | Weight | What It Measures |
|---|---|---|
| Karma | 40% | Island reliability — higher karma = fewer pipeline failures |
| Region affinity | 30% | Geographic proximity — same-region Islands have lower inter-hop latency |
| NATS RTT | 30% | Measured network latency — Islands report round-trip time in every heartbeat |
The coordinator scores all eligible Islands and picks the top N by composite score. Islands with lower measured latency are preferred because every millisecond of hop delay is multiplied by the number of pipeline stages.
Islands also support peer-to-peer RTT probes — each Island responds to latency probes from other Islands via NATS request/reply. The coordinator caches these measurements in a pairwise RTT cache (refreshed every 5 minutes) so formation decisions use real network conditions, not just geographic estimates.
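The composite score combines the three dimensions with the documented 40/30/30 weights. In this sketch all inputs are normalized to [0, 1] and lower RTT scores higher; the normalization details are assumptions, not the coordinator’s exact formula:

```python
def placement_score(karma, region_affinity, rtt_ms, rtt_max_ms=200.0):
    """Composite placement score in [0, 1]: Karma 40%, region 30%, RTT 30%."""
    # Lower measured latency maps to a higher score, capped at rtt_max_ms.
    rtt_score = max(0.0, 1.0 - min(rtt_ms, rtt_max_ms) / rtt_max_ms)
    return 0.40 * karma + 0.30 * region_affinity + 0.30 * rtt_score
```

Candidates are then sorted by this score and the top N become group members.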
Planned extensions:
- Anti-affinity for batch — spread batch children across different Islands for fault isolation
Billing
| Strategy | How Islands Earn | Status |
|---|---|---|
| Batch fan-out | Per child job (existing billing) | Beta |
| Pipeline parallel | Proportional to layers held (split on completion) | Experimental |
| Expert parallel | Router 20%, experts split 80% equally | Experimental |
| Speculative decoding | Draft 30%, verify 70% | Experimental |
Cargo Metadata
Cargos declare their distribution capabilities via the distribution_strategies field and strategy-specific metadata:
{
"distribution_strategies": ["single", "pipeline", "batch"],
"shard_manifest": {
"total_layers": 32,
"min_shards": 2,
"max_shards": 4,
"shard_urls": { ... }
}
}
Observability
Telemetry events for multi-Island compute:
- ring:formed/completed/failed — pipeline group lifecycle
- expert:formed/completed/failed — expert group lifecycle
- speculative:formed/completed/failed — speculative pair lifecycle
- Batch completion progress (existing job:completed events per child)
All distributed jobs carry the same job_id correlation ID through every hop, enabling end-to-end tracing across Islands.