
Multi-Island Compute

How Archipelag.io distributes work across multiple Islands — batch fan-out, pipeline parallelism (inference rings), expert routing, and speculative decoding

Archipelag.io can combine multiple Islands into unified compute groups, making it possible to run Cargos that exceed any single Island’s capacity. A 70B LLM can run across four laptops. A thousand images can be captioned in parallel across fifty phones.

Experimental
Multi-Island compute is in active development. The API endpoints exist and the code is functional, but these features have not been verified in production with real multi-Island workloads. Expect breaking changes during beta. Single-Island job dispatch (the standard flow) is stable and production-ready.

This page covers the distribution strategies and their current status.

Overview

The standard job flow is 1 job → 1 Island. Multi-Island compute extends this to 1 job → N Islands, with the coordinator handling splitting, orchestration, and result merging.

Standard:     Consumer → Coordinator → Island → Result

Distributed:  Consumer → Coordinator ─┬→ Island A → Result A ─┐
                                      ├→ Island B → Result B ─┼→ Merge → Consumer
                                      └→ Island C → Result C ─┘

Four distribution strategies target different workload types:

| Strategy | Use Case | Inter-Island Traffic | Status |
|---|---|---|---|
| Batch fan-out | Embarrassingly parallel (image batch, embeddings) | None | Beta |
| Pipeline parallel | Large models that don’t fit on one device | High (activations per layer) | Experimental |
| Expert parallel | MoE models (Mixtral, etc.) | Medium (active expert outputs) | Experimental |
| Speculative decoding | Accelerating autoregressive generation | Low (draft tokens) | Experimental |

Batch Fan-Out (Data Parallelism)

Beta
Batch fan-out is implemented and available via the API. No changes needed on the Island side — it uses the existing single-job dispatch infrastructure. This feature is in beta and may have rough edges.

Batch fan-out splits a list of independent inputs across multiple Islands and merges the results when all children complete. There is zero inter-Island communication — each child job runs independently.

How It Works

          ┌─────────────────┐
          │Consumer submits │
          │batch of inputs  │
          └────────┬────────┘
                   │
          ┌────────▼────────┐
          │Coordinator makes│
          │parent job + N   │
          │child jobs       │
          └──┬─────┬─────┬──┘
             │     │     │
    ┌────────┘     │     └────────┐
    │              │              │
    ▼              ▼              ▼
 Child 0        Child 1       Child 2
 → Island A     → Island B    → Island C
    │              │              │
    ▼              ▼              ▼
 Result 0       Result 1      Result 2
    │              │              │
    └──────┬───────┘              │
           │  ┌───────────────────┘
           │  │
    ┌──────▼──▼──────┐
    │Merge by        │
    │batch_index     │
    └───────┬────────┘
            │
    ┌───────▼────────┐
    │Parent job      │
    │succeeds        │
    └────────────────┘

  1. Consumer submits a batch via POST /api/v1/jobs/batch with a list of inputs
  2. The coordinator creates a parent job (not dispatched to any Island) that tracks overall progress
  3. N child jobs are created, each with a single input and a batch_index (0-indexed)
  4. Each child is dispatched through the normal placement engine — children spread across different Islands for fault isolation
  5. As each child reaches a terminal state, the coordinator checks if the batch is complete
  6. When all children finish, results are merged by batch_index and stored on the parent job
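The fan-out/merge bookkeeping above can be sketched in a few lines of Python. The class and method names here (`BatchParent`, `Child`, `merge`) are illustrative, not the coordinator’s actual internals:

```python
# Sketch of the coordinator's fan-out/merge bookkeeping (illustrative names).
from dataclasses import dataclass, field

@dataclass
class Child:
    batch_index: int
    state: str = "submitted"   # terminal states: "succeeded" | "failed"
    output: object = None

@dataclass
class BatchParent:
    children: list = field(default_factory=list)

    @classmethod
    def from_inputs(cls, inputs):
        # One child per input, tagged with a 0-indexed batch_index.
        return cls(children=[Child(batch_index=i) for i, _ in enumerate(inputs)])

    def is_complete(self):
        return all(c.state in ("succeeded", "failed") for c in self.children)

    def merge(self):
        # Order results by batch_index; failed children become None placeholders.
        ordered = sorted(self.children, key=lambda c: c.batch_index)
        return [c.output if c.state == "succeeded" else None for c in ordered]

parent = BatchParent.from_inputs(["img1", "img2", "img3"])
parent.children[0].state, parent.children[0].output = "succeeded", {"text": "a cat"}
parent.children[2].state, parent.children[2].output = "succeeded", {"text": "a dog"}
parent.children[1].state = "failed"
assert parent.is_complete()
print(parent.merge())   # [{'text': 'a cat'}, None, {'text': 'a dog'}]
```

The key invariant is that results are keyed by `batch_index`, never by completion order — children finish on different Islands at different times.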

Job Relationships

Parent Job (batch_config set, no host assigned)
├── Child Job 0  (batch_index: 0, parent_job_id: parent.id)
├── Child Job 1  (batch_index: 1, parent_job_id: parent.id)
├── Child Job 2  (batch_index: 2, parent_job_id: parent.id)
└── ...

The parent job has a batch_config field containing:

| Field | Description |
|---|---|
| chunk_count | Total number of child jobs |
| merge_strategy | How to combine results: "concat" or "flatten" |
| fail_mode | What to do on failure: "best_effort" or "fail_fast" |
| completed_count | Number of succeeded children (updated as they finish) |
| failed_count | Number of failed children |

Merge Strategies

| Strategy | Behavior | Best For |
|---|---|---|
| concat | Collects child outputs into an ordered list by batch_index | Most use cases — each child returns a single result |
| flatten | Like concat, but flattens one level if children return lists | Children that return multiple items each |
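A minimal sketch of the two strategies, assuming child outputs are already ordered by `batch_index` (`merge` is a hypothetical helper, not part of the API):

```python
# Illustrative sketch of the "concat" and "flatten" merge strategies.
def merge(results, strategy="concat"):
    # results: child outputs already ordered by batch_index
    if strategy == "concat":
        return list(results)
    if strategy == "flatten":
        # Flatten exactly one level when children return lists.
        flat = []
        for r in results:
            flat.extend(r if isinstance(r, list) else [r])
        return flat
    raise ValueError(f"unknown merge strategy: {strategy}")

assert merge([{"a": 1}, {"b": 2}]) == [{"a": 1}, {"b": 2}]
assert merge([[1, 2], [3], 4], strategy="flatten") == [1, 2, 3, 4]
```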

Failure Modes

| Mode | Behavior |
|---|---|
| best_effort (default) | Parent succeeds with partial results. Failed children are noted in the errors array of the output. |
| fail_fast | Parent fails immediately when the first child fails. All remaining non-terminal children are cancelled. |

API

Submit a Batch

POST /api/v1/jobs/batch
{
  "workload": "image-caption",
  "inputs": [
    { "image_url": "https://example.com/photo1.jpg" },
    { "image_url": "https://example.com/photo2.jpg" },
    { "image_url": "https://example.com/photo3.jpg" }
  ],
  "merge_strategy": "concat",
  "fail_mode": "best_effort",
  "max_parallelism": 10,
  "region": "us-east"
}
| Field | Required | Default | Description |
|---|---|---|---|
| workload | Yes | — | Cargo slug to run for each input |
| inputs | Yes | — | Array of input objects (1–100 items, max 256 KB each) |
| merge_strategy | No | "concat" | How to combine child outputs |
| fail_mode | No | "best_effort" | How to handle child failures |
| max_parallelism | No | unlimited | Max concurrent children (reserved for future use) |
| region | No | any | Preferred region for placement |
| bid_price | No | — | Per-child bid price for market pricing |

Response (201 Created):

{
  "id": "parent-job-uuid",
  "state": "started",
  "workload": "image-caption",
  "batch": {
    "chunk_count": 3,
    "merge_strategy": "concat",
    "fail_mode": "best_effort",
    "completed": 0,
    "failed": 0
  },
  "children": [
    { "id": "child-uuid-0", "batch_index": 0, "state": "submitted" },
    { "id": "child-uuid-1", "batch_index": 1, "state": "submitted" },
    { "id": "child-uuid-2", "batch_index": 2, "state": "submitted" }
  ],
  "created_at": "2026-03-16T12:00:00Z"
}

Check Batch Progress

GET /api/v1/jobs/{parent_id}/batch-status

Response:

{
  "parent_id": "parent-job-uuid",
  "parent_state": "started",
  "chunk_count": 3,
  "merge_strategy": "concat",
  "fail_mode": "best_effort",
  "child_states": {
    "succeeded": 2,
    "started": 1
  },
  "children": [
    { "id": "child-0", "batch_index": 0, "state": "succeeded", "host_id": "island-a" },
    { "id": "child-1", "batch_index": 1, "state": "succeeded", "host_id": "island-b" },
    { "id": "child-2", "batch_index": 2, "state": "started", "host_id": "island-c" }
  ]
}

Completed Batch Output

When the parent job reaches "succeeded", its output contains:

{
  "batch_results": [
    { "text": "A cat sitting on a windowsill" },
    { "text": "A sunset over the ocean" },
    { "text": "A group of people hiking" }
  ],
  "total": 3,
  "failed": 0
}

With best_effort and partial failures:

{
  "batch_results": [
    { "text": "A cat sitting on a windowsill" },
    null,
    { "text": "A group of people hiking" }
  ],
  "total": 3,
  "succeeded": 2,
  "failed": 1,
  "errors": [
    { "batch_index": 1, "error": "OOM error" }
  ]
}

Billing

Each child job is billed individually through the existing per-job billing system. The parent job’s total cost equals the sum of all children’s clearing prices. Credits are checked upfront — the batch is rejected if the Consumer doesn’t have enough credits for the full batch at current pricing.

Real-Time Progress

Subscribe to the parent job’s PubSub channel (job:{parent_id}) to receive progress updates as children complete:

{
  "state": "streaming",
  "batch_progress": {
    "completed": 7,
    "failed": 0,
    "total": 10
  }
}

SSE streaming via GET /api/v1/jobs/{parent_id}/stream also works — you’ll receive state events as the batch progresses.
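For Consumers who prefer polling over PubSub or SSE, a loop against the batch-status endpoint might look like this. `wait_for_batch` is a hypothetical client helper, and `fetch_status` is injected so the sketch runs without a live coordinator:

```python
# Hypothetical polling loop against GET /api/v1/jobs/{parent_id}/batch-status.
import time

def wait_for_batch(parent_id, fetch_status, interval=0.0, max_polls=100):
    """Poll until every child reaches a terminal state; return the last status."""
    for _ in range(max_polls):
        status = fetch_status(parent_id)
        states = status["child_states"]
        done = states.get("succeeded", 0) + states.get("failed", 0)
        if done >= status["chunk_count"]:
            return status
        time.sleep(interval)
    raise TimeoutError(f"batch {parent_id} did not complete")

# Simulated responses: first poll in-flight, second poll complete.
responses = iter([
    {"chunk_count": 3, "child_states": {"succeeded": 2, "started": 1}},
    {"chunk_count": 3, "child_states": {"succeeded": 3}},
])
final = wait_for_batch("parent-job-uuid", lambda _id: next(responses))
assert final["child_states"]["succeeded"] == 3
```

In a real client, `fetch_status` would issue the HTTP GET shown above and a non-zero `interval` would keep the coordinator from being hammered.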

Limits

| Limit | Value |
|---|---|
| Max batch size | 100 inputs |
| Max input item size | 256 KB (serialized JSON) |
| Max total batch cost | Must not exceed Consumer’s credit balance |

Pipeline Parallelism (Inference Rings)

Experimental
Pipeline parallelism code exists but has not been verified with real multi-Island workloads in production. The API surface is subject to change during beta.

Pipeline parallelism shards a large model across N Islands in a sequential chain. Each Island holds a subset of layers. Tokens flow through the pipeline — Island 1 processes layers 0–15, sends activations to Island 2 (layers 16–31), and so on. This is the “run 70B across four laptops” feature.

Prior art: Petals, Exo.

How It Works

Consumer submits job (normal API — no special parameters)
       │
       ▼
Coordinator detects: Cargo is pipeline-capable AND
                     no single Island has enough VRAM
       │
       ▼
Form Island Group: find N Islands with highest Karma,
                   sufficient per-shard VRAM, same region preferred
       │
       ▼
Start Ring Session (one per active pipeline job)
       │
       ├──→ Island 1 (position 0, layers 0–10)
       │         │ download shard, signal ready
       │         │
       ├──→ Island 2 (position 1, layers 11–21)
       │         │ download shard, signal ready
       │         │
       └──→ Island 3 (position 2, layers 22–31)
                 │ download shard, signal ready
                 │
All ready → Coordinator sends "start" + prompt to position 0
                 │
       Island 1: embedding + layers 0–10 → activations
                 │
       Island 2: layers 11–21 → activations
                 │
       Island 3: layers 22–31 → final logits → tokens
                 │
       Tokens stream back to Consumer (same as any other job)

Consumer Transparency

Pipeline execution is completely invisible to Consumers. You submit a job the same way you always do:

POST /api/v1/jobs
{
  "workload": "llama-70b-chat",
  "input": { "prompt": "Explain quantum computing" }
}

If the model needs pipeline execution, the coordinator handles everything automatically. The streaming response looks identical to a single-Island job. Billing works the same way — you pay the per-job price, and the coordinator splits earnings among participating Islands.

Island Groups

When the coordinator decides to use pipeline execution, it forms an Island Group — a set of Islands working together on a single Cargo.

Formation criteria:

  • Each Island must have enough VRAM for its shard (not the full model)
  • Islands must support the required runtime (e.g., llama.cpp)
  • Islands must be online, approved, and not in cooldown
  • Higher-Karma Islands are preferred — a pipeline is only as reliable as its weakest link
  • Same-region Islands are preferred for lower latency

Group lifecycle:

| Status | Meaning |
|---|---|
| forming | Group created, members downloading shards |
| active | All members ready, accepting jobs |
| degraded | A member failed — group can’t serve new jobs |
| disbanded | Group torn down (timeout, manual, or error) |

Active groups are reused across multiple jobs — Islands keep their shards loaded in memory, so subsequent requests skip the download step and start immediately.

Shard Manifests

For a Cargo to support pipeline execution, it must declare a shard manifest — metadata describing how the model can be split:

{
  "distribution_strategies": ["single", "pipeline"],
  "shard_manifest": {
    "total_layers": 80,
    "min_shards": 2,
    "max_shards": 8,
    "shard_urls": {
      "0": "https://cdn.example.com/llama-70b-shard-0.gguf",
      "1": "https://cdn.example.com/llama-70b-shard-1.gguf",
      "2": "https://cdn.example.com/llama-70b-shard-2.gguf",
      "3": "https://cdn.example.com/llama-70b-shard-3.gguf"
    }
  }
}
| Field | Description |
|---|---|
| total_layers | Total number of transformer layers in the full model |
| min_shards | Minimum Islands needed (fewer shards = more VRAM per Island) |
| max_shards | Maximum Islands supported |
| shard_urls | Pre-split GGUF files, one per shard position |

The "single" strategy is always included as a fallback — when a single Island has enough VRAM, the model runs normally without any pipeline overhead.
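One plausible way to turn `total_layers` and a shard count into contiguous per-shard ranges (the real splitter may weight shards by Island VRAM; `layer_ranges` is an illustrative helper, not the actual API):

```python
# Sketch: split total_layers into contiguous, near-equal per-shard ranges.
def layer_ranges(total_layers, num_shards):
    base, extra = divmod(total_layers, num_shards)
    ranges, start = [], 0
    for i in range(num_shards):
        # Earlier shards absorb the remainder when layers don't divide evenly.
        count = base + (1 if i < extra else 0)
        ranges.append((start, start + count - 1))  # inclusive layer range
        start += count
    return ranges

# An 80-layer model across 4 Islands: 20 layers each.
assert layer_ranges(80, 4) == [(0, 19), (20, 39), (40, 59), (60, 79)]
# A 32-layer model across 3 Islands, matching the ring diagram earlier:
assert layer_ranges(32, 3) == [(0, 10), (11, 21), (22, 31)]
```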

Preparing a Model for Pipeline Execution

Cargo publishers use the built-in shard splitting tool to prepare models for pipeline execution:

island --split-model llama-70b-Q4_K_M.gguf --shards 4 --output-dir ./shards --layer-aware

The --layer-aware flag parses the GGUF binary format, identifies each tensor’s layer from its name (blk.0.attn_q.weight, blk.1.ffn_gate.weight, etc.), and produces valid sub-GGUF files where each shard contains only its layer range’s tensors. Embedding tensors go in the first shard; the output head goes in the last. Layer tensors are renumbered (blk.16.* → blk.0.*) and the model’s block count is updated, so each shard is a valid standalone model that llama.cpp can load directly.

This produces 4 shard files and a shard_manifest.json with per-shard layer ranges and SHA256 hashes. Upload the shards to your CDN, update the URLs in the manifest, and set it as the Cargo’s shard_manifest field.

Automatic Pipeline Detection

The coordinator automatically decides whether to use pipeline execution based on two conditions:

  1. The Cargo supports it — distribution_strategies includes "pipeline" and shard_manifest is present
  2. No single Island can handle it — the maximum VRAM of any online Island is less than the Cargo’s required_vram_mb

If either condition is false, the job dispatches normally to a single Island.
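The two-condition check can be sketched as follows (`should_pipeline` and the exact field access are assumptions about the coordinator’s internals, based on the manifest fields documented above):

```python
# Sketch of the coordinator's pipeline-vs-single decision (illustrative).
def should_pipeline(cargo, online_islands):
    # Condition 1: the Cargo declares pipeline support and ships a manifest.
    supports = ("pipeline" in cargo.get("distribution_strategies", [])
                and cargo.get("shard_manifest") is not None)
    if not supports:
        return False
    # Condition 2: no single online Island has enough VRAM for the full model.
    max_vram = max((i["vram_mb"] for i in online_islands), default=0)
    return max_vram < cargo["required_vram_mb"]

cargo = {"distribution_strategies": ["single", "pipeline"],
         "shard_manifest": {"total_layers": 80},
         "required_vram_mb": 40_000}
islands = [{"vram_mb": 16_000}, {"vram_mb": 24_000}]
assert should_pipeline(cargo, islands) is True           # nobody fits 40 GB alone
assert should_pipeline(cargo, [{"vram_mb": 48_000}]) is False  # one Island fits
```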

Fault Tolerance

If any Island in the pipeline fails during inference, the entire pipeline fails and the job is retried from scratch. This is the simplest and most robust approach — re-sharding mid-inference is complex, and for most workloads the retry cost is acceptable.

Specifically:

  • If a member fails during the forming phase, the group is disbanded and the job is re-queued
  • If a member fails during active inference, a stop signal is sent to all members, the job fails, and it retries if attempts remain
  • If the session times out (120 seconds with no progress), the group is disbanded

Group Lifecycle & Idle Timeout

Active groups are reused across multiple sequential jobs — Islands keep their model shards cached in memory, so the next request for the same Cargo starts immediately without re-downloading.

Groups are automatically disbanded after 5 minutes of inactivity (no new jobs). This frees Islands to participate in other groups or serve single-Island jobs. The idle timeout is monitored by a background process that checks every 60 seconds.

Activation Transport

Tokens and activations flow between pipeline stages through a pluggable transport layer with two backends:

| Transport | Status | Latency | Best For |
|---|---|---|---|
| NATS | Default | ~1–2 ms/hop | All pipelines — simple, reliable, no extra config |
| QUIC with relay | Recommended | ~0.1 ms or ~1 ms | Production pipelines — tries direct QUIC, falls back to NATS |
| QUIC | Available | ~0.1–0.5 ms | Co-located Islands on LAN/VPN where latency is guaranteed |

The coordinator automatically enables the QUIC with relay mode when Islands report a public address. Each Island discovers its public IP via STUN (a lightweight NAT traversal protocol) and reports it in every heartbeat. The coordinator includes the next pipeline member’s address in each Island’s configuration, allowing direct QUIC connections with ephemeral self-signed TLS certificates. If QUIC fails (firewall, symmetric NAT), the transport falls back to NATS seamlessly — no interruption, no error visible to the Consumer.

Position 0 (the first Island in the chain) supports microbatching — collecting multiple tokens before sending each activation message. This reduces message overhead for high-throughput pipelines. Microbatch size is configurable per job (default: 1 token = real-time streaming).

Billing

Pipeline jobs are billed at the same per-job rate as single-Island jobs from the Consumer’s perspective — you pay the clearing price (or the Cargo’s default price) once per job, regardless of how many Islands participate.

The coordinator splits the earned credits among participating Islands proportional to the layers each Island processes. For example, in a 32-layer model split across two Islands:

| Island | Layers | Share | Payout (10 credit job) |
|---|---|---|---|
| Island A | 0–15 (16 layers) | 50% | 5.00 credits |
| Island B | 16–31 (16 layers) | 50% | 5.00 credits |

With an unequal split (e.g., one Island has more VRAM and takes more layers):

| Island | Layers | Share | Payout (12 credit job) |
|---|---|---|---|
| Island A | 0–7 (8 layers) | 25% | 3.00 credits |
| Island B | 8–31 (24 layers) | 75% | 9.00 credits |

Each Island’s earnings are credited immediately on job completion and count toward their payout balance.
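The layer-proportional split reduces to a one-line calculation. `pipeline_payouts` is a hypothetical helper, and rounding to two decimal places is an assumption about how credits are quantized:

```python
# Sketch: split a job's clearing price proportional to layers held per Island.
def pipeline_payouts(job_price, layer_counts):
    total = sum(layer_counts)
    return [round(job_price * n / total, 2) for n in layer_counts]

# The 16/16 split of a 10-credit job, then the 8/24 split from the table above.
assert pipeline_payouts(10.0, [16, 16]) == [5.0, 5.0]
assert pipeline_payouts(12.0, [8, 24]) == [3.0, 9.0]
```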


Expert Routing (MoE Parallelism)

Experimental
Expert routing code exists but has not been verified with real MoE workloads in production. The API surface is subject to change during beta.

For Mixture-of-Experts models (like Mixtral), only a subset of experts are activated per token. Each Island loads a subset of experts instead of the full model. A Router Island runs the gating network, determines which experts should process each token, dispatches work, and combines the results.

Consumer submits job (normal API — no special parameters)
       │
       ▼
Coordinator detects: Cargo is expert-capable
       │
       ▼
Form Expert Group: 1 Router Island (highest Karma)
                   + N Expert Islands (each loads a subset of experts)
       │
       ▼
Start Expert Session
       │
       ├──→ Router Island (position 0, gating network)
       │         │ download router model, signal ready
       │         │
       ├──→ Expert Island A (position 1, experts [0-3])
       │         │ download expert shards, signal ready
       │         │
       └──→ Expert Island B (position 2, experts [4-7])
                 │ download expert shards, signal ready
                 │
All ready → Coordinator sends "start" + prompt to Router
                 │
       Router: gating network → select top-K experts per token
                 │
       Dispatch tokens to Expert Islands via NATS
                 │
       Expert Islands process tokens, return results
                 │
       Router: combine expert outputs → final tokens
                 │
       Tokens stream back to Consumer

Consumer Transparency

Like pipeline execution, expert routing is completely invisible to Consumers. The same API, same streaming, same billing. The coordinator decides to use expert routing when the Cargo’s distribution_strategies includes "expert".

Expert Manifests

For a Cargo to support expert routing, it must declare an expert manifest in its shard_manifest:

{
  "distribution_strategies": ["single", "expert"],
  "shard_manifest": {
    "total_experts": 8,
    "active_experts": 2,
    "min_expert_islands": 2,
    "max_expert_islands": 4,
    "router_url": "https://cdn.example.com/mixtral-router.gguf",
    "expert_urls": {
      "0": "https://cdn.example.com/expert-0.gguf",
      "1": "https://cdn.example.com/expert-1.gguf",
      "2": "https://cdn.example.com/expert-2.gguf",
      "3": "https://cdn.example.com/expert-3.gguf",
      "4": "https://cdn.example.com/expert-4.gguf",
      "5": "https://cdn.example.com/expert-5.gguf",
      "6": "https://cdn.example.com/expert-6.gguf",
      "7": "https://cdn.example.com/expert-7.gguf"
    }
  }
}
| Field | Description |
|---|---|
| total_experts | Number of experts in the model (e.g., 8 for Mixtral) |
| active_experts | Experts activated per token — the top-K value (e.g., 2) |
| min_expert_islands | Minimum expert Islands needed (excluding router) |
| max_expert_islands | Maximum expert Islands supported |
| router_url | GGUF model for the gating network / routing |
| expert_urls | Map of expert_id → download URL for each expert shard |

Expert Group Formation

The coordinator forms an expert group by selecting:

  1. The highest-scoring Island as the Router (position 0) — it handles every token, so reliability and latency matter most
  2. N Expert Islands (positions 1..N) — each assigned a subset of expert IDs

Expert IDs are assigned in contiguous blocks: for 8 experts across 2 Islands, Island A gets experts [0,1,2,3] and Island B gets [4,5,6,7].
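That assignment can be sketched for the evenly divisible case (`assign_experts` is illustrative; how the real coordinator handles a remainder is not specified here):

```python
# Sketch: assign expert IDs to Islands in contiguous blocks.
def assign_experts(total_experts, num_islands):
    # Assumes total_experts divides evenly across Islands.
    per = total_experts // num_islands
    return [list(range(i * per, (i + 1) * per)) for i in range(num_islands)]

assert assign_experts(8, 2) == [[0, 1, 2, 3], [4, 5, 6, 7]]
assert assign_experts(8, 4) == [[0, 1], [2, 3], [4, 5], [6, 7]]
```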

Formation uses the same scoring as pipeline groups: Karma (40%), region affinity (30%), and NATS RTT (30%).

Fault Tolerance

If any Island in the expert group fails during inference, the entire group fails and the job is retried. Expert failover (re-routing tokens to a different Island’s copy of the same expert) is planned for a future version.

Billing

Expert jobs are billed at the same per-job rate as single-Island jobs. The coordinator splits earnings:

| Role | Share | Rationale |
|---|---|---|
| Router | 20% | Processes every token (gating + combination) |
| Each Expert Island | 80% ÷ N | Processes only routed tokens |

Example for a 10.00 credit job with 1 router + 2 expert Islands:

  • Router: 2.00 credits
  • Expert A: 4.00 credits
  • Expert B: 4.00 credits

Expert Replication & Load Balancing

Popular experts (those most frequently activated by the gating network) can be replicated across multiple Islands for load balancing. When replicate_popular_experts is enabled in the Cargo manifest, each expert Island also loads a copy of expert 0 (typically the most activated).

The coordinator tracks tokens in-flight per expert Island and uses capacity-aware routing — when multiple Islands can serve the same expert, the router dispatches to the one with the lowest load. This prevents any single Island from becoming a bottleneck.

The router supports multiple gating strategies for expert selection:

  • Hash-based (default): deterministic routing via consistent hashing — no model needed
  • Embedding-based: routes tokens to experts whose embedding centroids are most similar — learned from training data
  • Native MoE gating: uses actual gating layer weights when available — highest accuracy
  • Round-robin: sequential assignment for load testing

Islands that already have expert weights cached get a warmth bonus during group formation, avoiding cold starts when experts are reassigned.

Bandwidth Efficiency

Expert routing is more bandwidth-efficient than pipeline parallelism because only active expert outputs travel between Islands — not full activation tensors. With top-2 routing on an 8-expert model, only 25% of expert outputs cross the network per token.


Speculative Decoding Pairs

Experimental
Speculative decoding code exists but has not been verified with real workloads in production. The API surface is subject to change during beta.

Speculative decoding pairs a fast Island (small draft model) with a powerful Island (large verifier model) to accelerate autoregressive generation by 2–3x.

Consumer submits job (normal API — no special parameters)
       │
       ▼
Coordinator detects: Cargo is speculative-capable
       │
       ▼
Form pair: Draft Island (TinyLlama 1B, fast)
           Verify Island (Llama 70B, accurate)
       │
       ├── Draft generates K tokens quickly (K=5)
       │     ↓
       ├── Verify checks all K in one forward pass
       │     ↓
       ├── Accepts matching prefix + first corrected token
       │     ↓
       ├── Accepted tokens stream to Consumer
       │     ↓
       └── Draft continues from accepted point
             ...repeat until done...

How It Works

  1. The Draft Island generates K candidate tokens autoregressively (K=4–8) using a small, fast model
  2. All K tokens are sent to the Verify Island
  3. The Verify Island runs a single forward pass on all K tokens in parallel (same cost as 1 token)
  4. Accepts tokens that match (within a configurable threshold), rejects divergent ones
  5. Returns the accepted prefix + first corrected token
  6. The Draft Island continues from the accepted point

This is transparent to the Consumer — they just see faster token output. The speedup comes from the draft model being 5–10x faster per token than the verifier: it generates K tokens in the time the verifier processes 1.
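The loop above can be modeled with strings standing in for tokens. `draft_fn` and `verify_fn` are stand-ins for the two models, and the acceptance rule here is exact match rather than the log-prob threshold the real system uses:

```python
# Toy sketch of the draft/verify speculative decoding loop.
def speculative_generate(prompt, draft_fn, verify_fn, k=5, max_tokens=20):
    out = list(prompt)
    while len(out) - len(prompt) < max_tokens:
        draft = draft_fn(out, k)             # K candidate tokens from the drafter
        truth = verify_fn(out, len(draft))   # one parallel pass from the verifier
        accepted = []
        for d, t in zip(draft, truth):
            if d == t:
                accepted.append(d)           # matching prefix is kept
            else:
                accepted.append(t)           # first corrected token, then stop
                break
        out.extend(accepted)
        if not accepted:                     # drafter produced nothing: done
            break
    return "".join(out)

target = "the quick brown fox"
# Both "models" just read off the target string, so every draft is accepted.
verify = lambda ctx, n: list(target[len(ctx):len(ctx) + n])
draft  = lambda ctx, n: list(target[len(ctx):len(ctx) + n])
assert speculative_generate("the ", draft, verify, k=4, max_tokens=15) == target
```

With a perfect drafter every round accepts all K tokens; with a divergent drafter the loop still makes progress one corrected token at a time, which is why the worst case is no slower than plain autoregressive decoding.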

Speculative Manifest

For a Cargo to support speculative decoding, it must declare a speculative manifest:

{
  "distribution_strategies": ["single", "speculative"],
  "shard_manifest": {
    "draft_model_url": "https://cdn.example.com/tinyllama-1b.gguf",
    "verify_model_url": "https://cdn.example.com/llama-70b.gguf",
    "draft_tokens": 5,
    "acceptance_threshold": 0.9
  }
}
| Field | Description |
|---|---|
| draft_model_url | Small, fast model for generating candidate tokens |
| verify_model_url | Large, accurate model for verification |
| draft_tokens | K — number of tokens per draft round (default: 5) |
| acceptance_threshold | Log-prob match threshold for acceptance (default: 0.9) |

Pair Formation

The coordinator selects:

  • Verify Island: the highest-VRAM candidate that can run the target model — accuracy is priority
  • Draft Island: the best remaining candidate — any VRAM is fine since the draft model is small

Both Islands are scored by Karma, region affinity, and NATS RTT. Low RTT between the pair is critical because draft tokens must travel to the verifier quickly.

Multi-Draft Mode

For maximum throughput, speculative decoding supports multiple draft Islands generating candidates in parallel. Set draft_count in the Cargo manifest to use N drafts + 1 verifier. Each draft independently generates K tokens per round, and the verifier picks the best batch — the one with the highest acceptance rate. This “best-of-N” selection ensures the verifier always uses the highest-quality draft output.

Adaptive Draft Size

The number of draft tokens (K) is adjusted dynamically based on the acceptance rate:

  • High acceptance (>80%): K increases (up to 12) — draft and verifier agree well, generate more tokens per round
  • Low acceptance (<50%): K decreases (down to 2) — draft diverges too much, fewer tokens per round
  • Moderate (50–80%): K stays the same

This automatic tuning maximizes throughput without requiring manual configuration per model pair.
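A sketch of the tuning rule, using the 2–12 bounds stated above; moving K one step per round is an assumption, since the step size is not specified here:

```python
# Sketch: adapt the draft size K from the observed acceptance rate.
def adjust_k(k, acceptance_rate, k_min=2, k_max=12):
    if acceptance_rate > 0.8:
        return min(k + 1, k_max)   # models agree well: draft more per round
    if acceptance_rate < 0.5:
        return max(k - 1, k_min)   # models diverge: draft less per round
    return k                       # moderate agreement: leave K unchanged

assert adjust_k(5, 0.9) == 6
assert adjust_k(5, 0.3) == 4
assert adjust_k(5, 0.6) == 5
assert adjust_k(12, 0.95) == 12    # capped at the documented maximum
```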

Billing

| Role | Share | Rationale |
|---|---|---|
| Draft Island | 30% | Generates most tokens (fast, cheap model) |
| Verify Island | 70% | Runs the expensive target model |

Example for a 10.00 credit job:

  • Draft Island: 3.00 credits
  • Verify Island: 7.00 credits

Shared Infrastructure

All multi-Island strategies share common infrastructure:

Island Groups

The island_groups system tracks groups of Islands working together:

| Field | Purpose |
|---|---|
| topology | "pipeline", "expert", or "speculative" |
| status | "forming" → "active" → "degraded" / "disbanded" |
| workload_id | Which Cargo this group runs |
| members | Ordered list of Islands with position and shard assignments |

Groups are reusable — an active group can serve multiple sequential jobs without re-forming. Groups are automatically disbanded after 5 minutes of inactivity (configurable) or when a member goes offline.

Placement Engine Extensions

The placement engine is extended with multi-dimensional scoring for pipeline member selection:

| Dimension | Weight | What It Measures |
|---|---|---|
| Karma | 40% | Island reliability — higher karma = fewer pipeline failures |
| Region affinity | 30% | Geographic proximity — same-region Islands have lower inter-hop latency |
| NATS RTT | 30% | Measured network latency — Islands report round-trip time in every heartbeat |

The coordinator scores all eligible Islands and picks the top N by composite score. Islands with lower measured latency are preferred because every millisecond of hop delay is multiplied by the number of pipeline stages.
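A sketch of the composite score with the documented 40/30/30 weights; how karma and RTT are normalized into [0, 1] is an assumption, as is the 200 ms worst-case RTT:

```python
# Sketch: composite placement score (weights from the table; normalization assumed).
def composite_score(karma, same_region, rtt_ms, rtt_worst=200.0):
    karma_term = min(karma / 100.0, 1.0)            # assume karma scaled to ~100
    region_term = 1.0 if same_region else 0.0
    rtt_term = max(0.0, 1.0 - rtt_ms / rtt_worst)   # lower RTT scores higher
    return 0.4 * karma_term + 0.3 * region_term + 0.3 * rtt_term

a = composite_score(karma=90, same_region=True, rtt_ms=20)
b = composite_score(karma=95, same_region=False, rtt_ms=120)
assert a > b   # region affinity plus low RTT outweigh slightly higher karma
```

Because the region and RTT dimensions together carry 60% of the weight, a nearby mid-karma Island routinely beats a distant high-karma one, which matches the note that hop latency compounds per pipeline stage.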

Islands also support peer-to-peer RTT probes — each Island responds to latency probes from other Islands via NATS request/reply. The coordinator caches these measurements in a pairwise RTT cache (refreshed every 5 minutes) so formation decisions use real network conditions, not just geographic estimates.

Planned extensions:

  • Anti-affinity for batch — spread batch children across different Islands for fault isolation

Billing

| Strategy | How Islands Earn | Status |
|---|---|---|
| Batch fan-out | Per child job (existing billing) | Beta |
| Pipeline parallel | Proportional to layers held (split on completion) | Experimental |
| Expert parallel | Router 20%, experts split 80% equally | Experimental |
| Speculative decoding | Draft 30%, verify 70% | Experimental |

Cargo Metadata

Cargos declare their distribution capabilities via the distribution_strategies field and strategy-specific metadata:

{
  "distribution_strategies": ["single", "pipeline", "batch"],
  "shard_manifest": {
    "total_layers": 32,
    "min_shards": 2,
    "max_shards": 4,
    "shard_urls": { ... }
  }
}

Observability

Telemetry events for multi-Island compute:

  • ring:formed/completed/failed — pipeline group lifecycle
  • expert:formed/completed/failed — expert group lifecycle
  • speculative:formed/completed/failed — speculative pair lifecycle
  • Batch completion progress (existing job:completed events per child)

All distributed jobs carry the same job_id correlation ID through every hop, enabling end-to-end tracing across Islands.