
Multi-Island Compute

How Archipelag.io distributes work across multiple Islands — batch fan-out, pipeline parallelism (inference rings), expert routing, and speculative decoding

Archipelag.io can combine multiple Islands into unified compute groups, making it possible to run Cargos that exceed any single Island’s capacity. A 70B LLM can run across four laptops. A thousand images can be captioned in parallel across fifty phones.

Experimental
Multi-Island compute is in active development. The API endpoints exist and the code is functional, but these features have not been verified in production with real multi-Island workloads. Expect breaking changes during beta. Single-Island job dispatch (the standard flow) is stable and production-ready.

This page covers the distribution strategies and their current status.

Overview

The standard job flow is 1 job → 1 Island. Multi-Island compute extends this to 1 job → N Islands, with the coordinator handling splitting, orchestration, and result merging.

Standard:     Consumer → Coordinator → Island → Result

Distributed:  Consumer → Coordinator ─┬→ Island A → Result A ─┐
                                      ├→ Island B → Result B ─┼→ Merge → Consumer
                                      └→ Island C → Result C ─┘

Four distribution strategies target different workload types:

| Strategy | Use Case | Inter-Island Traffic | Status |
|---|---|---|---|
| Batch fan-out | Embarrassingly parallel (image batch, embeddings) | None | Beta |
| Pipeline parallel | Large models that don’t fit on one device | High (activations per layer) | Experimental |
| Expert parallel | MoE models (Mixtral, etc.) | Medium (active expert outputs) | Experimental |
| Speculative decoding | Accelerating autoregressive generation | Low (draft tokens) | Experimental |

Batch Fan-Out (Data Parallelism)

Beta
Batch fan-out is implemented and available via the API. No changes needed on the Island side — it uses the existing single-job dispatch infrastructure. This feature is in beta and may have rough edges.

Batch fan-out splits a list of independent inputs across multiple Islands and merges the results when all children complete. There is zero inter-Island communication — each child job runs independently.

How It Works

          ┌─────────────────┐
          │Consumer submits │
          │batch of inputs  │
          └────────┬────────┘
                   │
          ┌────────▼────────┐
          │Coordinator makes│
          │parent job + N   │
          │child jobs       │
          └──┬─────┬─────┬──┘
             │     │     │
    ┌────────┘     │     └────────┐
    │              │              │
    ▼              ▼              ▼
 Child 0        Child 1       Child 2
 → Island A     → Island B    → Island C
    │              │              │
    ▼              ▼              ▼
 Result 0       Result 1      Result 2
    │              │              │
    └──────┬───────┘              │
           │  ┌───────────────────┘
           │  │
    ┌──────▼──▼──────┐
    │Merge by        │
    │batch_index     │
    └───────┬────────┘
            │
    ┌───────▼────────┐
    │Parent job      │
    │succeeds        │
    └────────────────┘

  1. Consumer submits a batch via POST /api/v1/jobs/batch with a list of inputs
  2. The coordinator creates a parent job (not dispatched to any Island) that tracks overall progress
  3. N child jobs are created, each with a single input and a batch_index (0-indexed)
  4. Each child is dispatched through the normal placement engine — children spread across different Islands for fault isolation
  5. As each child reaches a terminal state, the coordinator checks if the batch is complete
  6. When all children finish, results are merged by batch_index and stored on the parent job
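The fan-out/merge bookkeeping above can be sketched in a few lines of Python. The class and method names here (`BatchParent`, `Child`, `merge`) are illustrative, not the coordinator’s actual internals:

```python
# Sketch of the coordinator's fan-out/merge bookkeeping (illustrative names).
from dataclasses import dataclass, field

@dataclass
class Child:
    batch_index: int
    state: str = "submitted"   # terminal states: "succeeded" | "failed"
    output: object = None

@dataclass
class BatchParent:
    children: list = field(default_factory=list)

    @classmethod
    def from_inputs(cls, inputs):
        # One child per input, tagged with a 0-indexed batch_index.
        return cls(children=[Child(batch_index=i) for i, _ in enumerate(inputs)])

    def is_complete(self):
        return all(c.state in ("succeeded", "failed") for c in self.children)

    def merge(self):
        # Order results by batch_index; failed children become None placeholders.
        ordered = sorted(self.children, key=lambda c: c.batch_index)
        return [c.output if c.state == "succeeded" else None for c in ordered]

parent = BatchParent.from_inputs(["img1", "img2", "img3"])
parent.children[0].state, parent.children[0].output = "succeeded", {"text": "a cat"}
parent.children[2].state, parent.children[2].output = "succeeded", {"text": "a dog"}
parent.children[1].state = "failed"
assert parent.is_complete()
print(parent.merge())   # [{'text': 'a cat'}, None, {'text': 'a dog'}]
```

The key invariant is that results are keyed by `batch_index`, never by completion order — children finish on different Islands at different times.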

Job Relationships

Parent Job (batch_config set, no host assigned)
├── Child Job 0  (batch_index: 0, parent_job_id: parent.id)
├── Child Job 1  (batch_index: 1, parent_job_id: parent.id)
├── Child Job 2  (batch_index: 2, parent_job_id: parent.id)
└── ...

The parent job has a batch_config field containing:

| Field | Description |
|---|---|
| chunk_count | Total number of child jobs |
| merge_strategy | How to combine results: "concat" or "flatten" |
| fail_mode | What to do on failure: "best_effort" or "fail_fast" |
| completed_count | Number of succeeded children (updated as they finish) |
| failed_count | Number of failed children |

Merge Strategies

| Strategy | Behavior | Best For |
|---|---|---|
| concat | Collects child outputs into an ordered list by batch_index | Most use cases — each child returns a single result |
| flatten | Like concat, but flattens one level if children return lists | Children that return multiple items each |
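A minimal sketch of the two strategies, assuming child outputs are already ordered by `batch_index` (`merge` is a hypothetical helper, not part of the API):

```python
# Illustrative sketch of the "concat" and "flatten" merge strategies.
def merge(results, strategy="concat"):
    # results: child outputs already ordered by batch_index
    if strategy == "concat":
        return list(results)
    if strategy == "flatten":
        # Flatten exactly one level when children return lists.
        flat = []
        for r in results:
            flat.extend(r if isinstance(r, list) else [r])
        return flat
    raise ValueError(f"unknown merge strategy: {strategy}")

assert merge([{"a": 1}, {"b": 2}]) == [{"a": 1}, {"b": 2}]
assert merge([[1, 2], [3], 4], strategy="flatten") == [1, 2, 3, 4]
```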

Failure Modes

| Mode | Behavior |
|---|---|
| best_effort (default) | Parent succeeds with partial results. Failed children are noted in the errors array of the output. |
| fail_fast | Parent fails immediately when the first child fails. All remaining non-terminal children are cancelled. |

API

Submit a Batch

POST /api/v1/jobs/batch
{
  "workload": "image-caption",
  "inputs": [
    { "image_url": "https://example.com/photo1.jpg" },
    { "image_url": "https://example.com/photo2.jpg" },
    { "image_url": "https://example.com/photo3.jpg" }
  ],
  "merge_strategy": "concat",
  "fail_mode": "best_effort",
  "max_parallelism": 10,
  "region": "us-east"
}
| Field | Required | Default | Description |
|---|---|---|---|
| workload | Yes | — | Cargo slug to run for each input |
| inputs | Yes | — | Array of input objects (1–100 items, max 256 KB each) |
| merge_strategy | No | "concat" | How to combine child outputs |
| fail_mode | No | "best_effort" | How to handle child failures |
| max_parallelism | No | unlimited | Max concurrent children (reserved for future use) |
| region | No | any | Preferred region for placement |
| bid_price | No | — | Per-child bid price for market pricing |

Response (201 Created):

{
  "id": "parent-job-uuid",
  "state": "started",
  "workload": "image-caption",
  "batch": {
    "chunk_count": 3,
    "merge_strategy": "concat",
    "fail_mode": "best_effort",
    "completed": 0,
    "failed": 0
  },
  "children": [
    { "id": "child-uuid-0", "batch_index": 0, "state": "submitted" },
    { "id": "child-uuid-1", "batch_index": 1, "state": "submitted" },
    { "id": "child-uuid-2", "batch_index": 2, "state": "submitted" }
  ],
  "created_at": "2026-03-16T12:00:00Z"
}

Check Batch Progress

GET /api/v1/jobs/{parent_id}/batch-status

Response:

{
  "parent_id": "parent-job-uuid",
  "parent_state": "started",
  "chunk_count": 3,
  "merge_strategy": "concat",
  "fail_mode": "best_effort",
  "child_states": {
    "succeeded": 2,
    "started": 1
  },
  "children": [
    { "id": "child-0", "batch_index": 0, "state": "succeeded", "host_id": "island-a" },
    { "id": "child-1", "batch_index": 1, "state": "succeeded", "host_id": "island-b" },
    { "id": "child-2", "batch_index": 2, "state": "started", "host_id": "island-c" }
  ]
}

Completed Batch Output

When the parent job reaches "succeeded", its output contains:

{
  "batch_results": [
    { "text": "A cat sitting on a windowsill" },
    { "text": "A sunset over the ocean" },
    { "text": "A group of people hiking" }
  ],
  "total": 3,
  "failed": 0
}

With best_effort and partial failures:

{
  "batch_results": [
    { "text": "A cat sitting on a windowsill" },
    null,
    { "text": "A group of people hiking" }
  ],
  "total": 3,
  "succeeded": 2,
  "failed": 1,
  "errors": [
    { "batch_index": 1, "error": "OOM error" }
  ]
}

Billing

Each child job is billed individually through the existing per-job billing system. The parent job’s total cost equals the sum of all children’s clearing prices. Credits are checked upfront — the batch is rejected if the Consumer doesn’t have enough credits for the full batch at current pricing.

Real-Time Progress

Subscribe to the parent job’s PubSub channel (job:{parent_id}) to receive progress updates as children complete:

{
  "state": "streaming",
  "batch_progress": {
    "completed": 7,
    "failed": 0,
    "total": 10
  }
}

SSE streaming via GET /api/v1/jobs/{parent_id}/stream also works — you’ll receive state events as the batch progresses.
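For Consumers who prefer polling over PubSub or SSE, a loop against the batch-status endpoint might look like this. `wait_for_batch` is a hypothetical client helper, and `fetch_status` is injected so the sketch runs without a live coordinator:

```python
# Hypothetical polling loop against GET /api/v1/jobs/{parent_id}/batch-status.
import time

def wait_for_batch(parent_id, fetch_status, interval=0.0, max_polls=100):
    """Poll until every child reaches a terminal state; return the last status."""
    for _ in range(max_polls):
        status = fetch_status(parent_id)
        states = status["child_states"]
        done = states.get("succeeded", 0) + states.get("failed", 0)
        if done >= status["chunk_count"]:
            return status
        time.sleep(interval)
    raise TimeoutError(f"batch {parent_id} did not complete")

# Simulated responses: first poll in-flight, second poll complete.
responses = iter([
    {"chunk_count": 3, "child_states": {"succeeded": 2, "started": 1}},
    {"chunk_count": 3, "child_states": {"succeeded": 3}},
])
final = wait_for_batch("parent-job-uuid", lambda _id: next(responses))
assert final["child_states"]["succeeded"] == 3
```

In a real client, `fetch_status` would issue the HTTP GET shown above and a non-zero `interval` would keep the coordinator from being hammered.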

Limits

| Limit | Value |
|---|---|
| Max batch size | 100 inputs |
| Max input item size | 256 KB (serialized JSON) |
| Max total batch cost | Must not exceed Consumer’s credit balance |

Pipeline Parallelism (Inference Rings)

Experimental
Pipeline parallelism code exists but has not been verified with real multi-Island workloads in production. The API surface is subject to change during beta.

Pipeline parallelism shards a large model across N Islands in a sequential chain. Each Island holds a subset of layers. Tokens flow through the pipeline — Island 1 processes layers 0–15, sends activations to Island 2 (layers 16–31), and so on. This is the “run 70B across four laptops” feature.

Prior art: Petals, Exo.

How It Works

Consumer submits job (normal API — no special parameters)
       │
       ▼
Coordinator detects: Cargo is pipeline-capable AND
                     no single Island has enough VRAM
       │
       ▼
Form Island Group: find N Islands with highest Karma,
                   sufficient per-shard VRAM, same region preferred
       │
       ▼
Start Ring Session (one per active pipeline job)
       │
       ├──→ Island 1 (position 0, layers 0–10)
       │         │ download shard, signal ready
       │         │
       ├──→ Island 2 (position 1, layers 11–21)
       │         │ download shard, signal ready
       │         │
       └──→ Island 3 (position 2, layers 22–31)
                 │ download shard, signal ready
                 │
All ready → Coordinator sends "start" + prompt to position 0
                 │
       Island 1: embedding + layers 0–10 → activations
                 │
       Island 2: layers 11–21 → activations
                 │
       Island 3: layers 22–31 → final logits → tokens
                 │
       Tokens stream back to Consumer (same as any other job)

Consumer Transparency

Pipeline execution is completely invisible to Consumers. You submit a job the same way you always do:

POST /api/v1/jobs
{
  "workload": "llama-70b-chat",
  "input": { "prompt": "Explain quantum computing" }
}

If the model needs pipeline execution, the coordinator handles everything automatically. The streaming response looks identical to a single-Island job. Billing works the same way — you pay the per-job price, and the coordinator splits earnings among participating Islands.

Island Groups

When the coordinator decides to use pipeline execution, it forms an Island Group — a set of Islands working together on a single Cargo.

Formation criteria:

  • Each Island must have enough VRAM for its shard (not the full model)
  • Islands must support the required runtime (e.g., llama.cpp)
  • Islands must be online, approved, and not in cooldown
  • Higher-Karma Islands are preferred — a pipeline is only as reliable as its weakest link
  • Same-region Islands are preferred for lower latency

Group lifecycle:

| Status | Meaning |
|---|---|
| forming | Group created, members downloading shards |
| active | All members ready, accepting jobs |
| degraded | A member failed — group can’t serve new jobs |
| disbanded | Group torn down (timeout, manual, or error) |

Active groups are reused across multiple jobs — Islands keep their shards loaded in memory, so subsequent requests skip the download step and start immediately.

Shard Manifests

For a Cargo to support pipeline execution, it must declare a shard manifest — metadata describing how the model can be split:

{
  "distribution_strategies": ["single", "pipeline"],
  "shard_manifest": {
    "total_layers": 80,
    "min_shards": 2,
    "max_shards": 8,
    "shard_urls": {
      "0": "https://cdn.example.com/llama-70b-shard-0.gguf",
      "1": "https://cdn.example.com/llama-70b-shard-1.gguf",
      "2": "https://cdn.example.com/llama-70b-shard-2.gguf",
      "3": "https://cdn.example.com/llama-70b-shard-3.gguf"
    }
  }
}
| Field | Description |
|---|---|
| total_layers | Total number of transformer layers in the full model |
| min_shards | Minimum Islands needed (fewer shards = more VRAM per Island) |
| max_shards | Maximum Islands supported |
| shard_urls | Pre-split GGUF files, one per shard position |

The "single" strategy is always included as a fallback — when a single Island has enough VRAM, the model runs normally without any pipeline overhead.
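One plausible way to turn `total_layers` and a shard count into contiguous per-shard ranges (the real splitter may weight shards by Island VRAM; `layer_ranges` is an illustrative helper, not the actual API):

```python
# Sketch: split total_layers into contiguous, near-equal per-shard ranges.
def layer_ranges(total_layers, num_shards):
    base, extra = divmod(total_layers, num_shards)
    ranges, start = [], 0
    for i in range(num_shards):
        # Earlier shards absorb the remainder when layers don't divide evenly.
        count = base + (1 if i < extra else 0)
        ranges.append((start, start + count - 1))  # inclusive layer range
        start += count
    return ranges

# An 80-layer model across 4 Islands: 20 layers each.
assert layer_ranges(80, 4) == [(0, 19), (20, 39), (40, 59), (60, 79)]
# A 32-layer model across 3 Islands, matching the ring diagram earlier:
assert layer_ranges(32, 3) == [(0, 10), (11, 21), (22, 31)]
```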

Preparing a Model for Pipeline Execution

Cargo publishers use the built-in shard splitting tool to prepare models for pipeline execution:

island --split-model llama-70b-Q4_K_M.gguf --shards 4 --output-dir ./shards --layer-aware

The --layer-aware flag parses the GGUF binary format, identifies each tensor’s layer from its name (blk.0.attn_q.weight, blk.1.ffn_gate.weight, etc.), and produces valid sub-GGUF files where each shard contains only its layer range’s tensors. Embedding tensors go in the first shard; the output head goes in the last. Layer tensors are renumbered (blk.16.* → blk.0.*) and the model’s block count is updated, so each shard is a valid standalone model that llama.cpp can load directly.

This produces 4 shard files and a shard_manifest.json with per-shard layer ranges and SHA256 hashes. Upload the shards to your CDN, update the URLs in the manifest, and set it as the Cargo’s shard_manifest field.

Automatic Pipeline Detection

The coordinator automatically decides whether to use pipeline execution based on two conditions:

  1. The Cargo supports it — distribution_strategies includes "pipeline" and shard_manifest is present
  2. No single Island can handle it — the maximum VRAM of any online Island is less than the Cargo’s required_vram_mb

If either condition is false, the job dispatches normally to a single Island.
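The two-condition check can be sketched as follows (`should_pipeline` and the exact field access are assumptions about the coordinator’s internals, based on the manifest fields documented above):

```python
# Sketch of the coordinator's pipeline-vs-single decision (illustrative).
def should_pipeline(cargo, online_islands):
    # Condition 1: the Cargo declares pipeline support and ships a manifest.
    supports = ("pipeline" in cargo.get("distribution_strategies", [])
                and cargo.get("shard_manifest") is not None)
    if not supports:
        return False
    # Condition 2: no single online Island has enough VRAM for the full model.
    max_vram = max((i["vram_mb"] for i in online_islands), default=0)
    return max_vram < cargo["required_vram_mb"]

cargo = {"distribution_strategies": ["single", "pipeline"],
         "shard_manifest": {"total_layers": 80},
         "required_vram_mb": 40_000}
islands = [{"vram_mb": 16_000}, {"vram_mb": 24_000}]
assert should_pipeline(cargo, islands) is True           # nobody fits 40 GB alone
assert should_pipeline(cargo, [{"vram_mb": 48_000}]) is False  # one Island fits
```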

Fault Tolerance

If any Island in the pipeline fails during inference, the entire pipeline fails and the job is retried from scratch. This is the simplest and most robust approach — re-sharding mid-inference is complex, and for most workloads the retry cost is acceptable.

Specifically:

  • If a member fails during the forming phase, the group is disbanded and the job is re-queued
  • If a member fails during active inference, a stop signal is sent to all members, the job fails, and it retries if attempts remain
  • If the session times out (120 seconds with no progress), the group is disbanded

Group Lifecycle & Idle Timeout

Active groups are reused across multiple sequential jobs — Islands keep their model shards cached in memory, so the next request for the same Cargo starts immediately without re-downloading.

Groups are automatically disbanded after 5 minutes of inactivity (no new jobs). This frees Islands to participate in other groups or serve single-Island jobs. The idle timeout is monitored by a background process that checks every 60 seconds.

Activation Transport

Tokens and activations flow between pipeline stages through a pluggable transport layer with two backends:

| Transport | Status | Latency | Best For |
|---|---|---|---|
| NATS | Default | ~1–2 ms/hop | All pipelines — simple, reliable, no extra config |
| QUIC with relay | Recommended | ~0.1 ms or ~1 ms | Production pipelines — tries direct QUIC, falls back to NATS |
| QUIC | Available | ~0.1–0.5 ms | Co-located Islands on LAN/VPN where latency is guaranteed |

The coordinator automatically enables the QUIC with relay mode when Islands report a public address. Each Island discovers its public IP via STUN (a lightweight NAT traversal protocol) and reports it in every heartbeat. The coordinator includes the next pipeline member’s address in each Island’s configuration, allowing direct QUIC connections with ephemeral self-signed TLS certificates. If QUIC fails (firewall, symmetric NAT), the transport falls back to NATS seamlessly — no interruption, no error visible to the Consumer.

Position 0 (the first Island in the chain) supports microbatching — collecting multiple tokens before sending each activation message. This reduces message overhead for high-throughput pipelines. Microbatch size is configurable per job (default: 1 token = real-time streaming).

Billing

Pipeline jobs are billed at the same per-job rate as single-Island jobs from the Consumer’s perspective — you pay the clearing price (or the Cargo’s default price) once per job, regardless of how many Islands participate.

The coordinator splits the earned credits among participating Islands proportional to the layers each Island processes. For example, in a 32-layer model split across two Islands:

| Island | Layers | Share | Payout (10 credit job) |
|---|---|---|---|
| Island A | 0–15 (16 layers) | 50% | 5.00 credits |
| Island B | 16–31 (16 layers) | 50% | 5.00 credits |

With an unequal split (e.g., one Island has more VRAM and takes more layers):

| Island | Layers | Share | Payout (12 credit job) |
|---|---|---|---|
| Island A | 0–7 (8 layers) | 25% | 3.00 credits |
| Island B | 8–31 (24 layers) | 75% | 9.00 credits |

Each Island’s earnings are credited immediately on job completion and count toward their payout balance.
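The layer-proportional split reduces to a one-line calculation. `pipeline_payouts` is a hypothetical helper, and rounding to two decimal places is an assumption about how credits are quantized:

```python
# Sketch: split a job's clearing price proportional to layers held per Island.
def pipeline_payouts(job_price, layer_counts):
    total = sum(layer_counts)
    return [round(job_price * n / total, 2) for n in layer_counts]

# The 16/16 split of a 10-credit job, then the 8/24 split from the table above.
assert pipeline_payouts(10.0, [16, 16]) == [5.0, 5.0]
assert pipeline_payouts(12.0, [8, 24]) == [3.0, 9.0]
```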


Expert Routing (MoE Parallelism)

Experimental
Expert routing code exists but has not been verified with real MoE workloads in production. The API surface is subject to change during beta.

For Mixture-of-Experts models (like Mixtral), only a subset of experts are activated per token. Each Island loads a subset of experts instead of the full model. A Router Island runs the gating network, determines which experts should process each token, dispatches work, and combines the results.

Consumer submits job (normal API — no special parameters)
       │
       ▼
Coordinator detects: Cargo is expert-capable
       │
       ▼
Form Expert Group: 1 Router Island (highest Karma)
                   + N Expert Islands (each loads a subset of experts)
       │
       ▼
Start Expert Session
       │
       ├──→ Router Island (position 0, gating network)
       │         │ download router model, signal ready
       │         │
       ├──→ Expert Island A (position 1, experts [0-3])
       │         │ download expert shards, signal ready
       │         │
       └──→ Expert Island B (position 2, experts [4-7])
                 │ download expert shards, signal ready
                 │
All ready → Coordinator sends "start" + prompt to Router
                 │
       Router: gating network → select top-K experts per token
                 │
       Dispatch tokens to Expert Islands via NATS
                 │
       Expert Islands process tokens, return results
                 │
       Router: combine expert outputs → final tokens
                 │
       Tokens stream back to Consumer

Consumer Transparency

Like pipeline execution, expert routing is completely invisible to Consumers. The same API, same streaming, same billing. The coordinator decides to use expert routing when the Cargo’s distribution_strategies includes "expert".

Expert Manifests

For a Cargo to support expert routing, it must declare an expert manifest in its shard_manifest:

{
  "distribution_strategies": ["single", "expert"],
  "shard_manifest": {
    "total_experts": 8,
    "active_experts": 2,
    "min_expert_islands": 2,
    "max_expert_islands": 4,
    "router_url": "https://cdn.example.com/mixtral-router.gguf",
    "expert_urls": {
      "0": "https://cdn.example.com/expert-0.gguf",
      "1": "https://cdn.example.com/expert-1.gguf",
      "2": "https://cdn.example.com/expert-2.gguf",
      "3": "https://cdn.example.com/expert-3.gguf",
      "4": "https://cdn.example.com/expert-4.gguf",
      "5": "https://cdn.example.com/expert-5.gguf",
      "6": "https://cdn.example.com/expert-6.gguf",
      "7": "https://cdn.example.com/expert-7.gguf"
    }
  }
}
| Field | Description |
|---|---|
| total_experts | Number of experts in the model (e.g., 8 for Mixtral) |
| active_experts | Experts activated per token — the top-K value (e.g., 2) |
| min_expert_islands | Minimum expert Islands needed (excluding router) |
| max_expert_islands | Maximum expert Islands supported |
| router_url | GGUF model for the gating network / routing |
| expert_urls | Map of expert_id → download URL for each expert shard |

Expert Group Formation

The coordinator forms an expert group by selecting:

  1. The highest-scoring Island as the Router (position 0) — it handles every token, so reliability and latency matter most
  2. N Expert Islands (positions 1..N) — each assigned a subset of expert IDs

Expert IDs are assigned in contiguous blocks: for 8 experts across 2 Islands, Island A gets experts [0,1,2,3] and Island B gets [4,5,6,7].
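That assignment can be sketched for the evenly divisible case (`assign_experts` is illustrative; how the real coordinator handles a remainder is not specified here):

```python
# Sketch: assign expert IDs to Islands in contiguous blocks.
def assign_experts(total_experts, num_islands):
    # Assumes total_experts divides evenly across Islands.
    per = total_experts // num_islands
    return [list(range(i * per, (i + 1) * per)) for i in range(num_islands)]

assert assign_experts(8, 2) == [[0, 1, 2, 3], [4, 5, 6, 7]]
assert assign_experts(8, 4) == [[0, 1], [2, 3], [4, 5], [6, 7]]
```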

Formation uses the same scoring as pipeline groups: Karma (40%), region affinity (30%), and NATS RTT (30%).

Fault Tolerance

If any Island in the expert group fails during inference, the entire group fails and the job is retried. Expert failover (re-routing tokens to a different Island’s copy of the same expert) is planned for a future version.

Billing

Expert jobs are billed at the same per-job rate as single-Island jobs. The coordinator splits earnings:

| Role | Share | Rationale |
|---|---|---|
| Router | 20% | Processes every token (gating + combination) |
| Each Expert Island | 80% ÷ N | Processes only routed tokens |

Example for a 10.00 credit job with 1 router + 2 expert Islands:

  • Router: 2.00 credits
  • Expert A: 4.00 credits
  • Expert B: 4.00 credits

Expert Replication & Load Balancing

Popular experts (those most frequently activated by the gating network) can be replicated across multiple Islands for load balancing. When replicate_popular_experts is enabled in the Cargo manifest, each expert Island also loads a copy of expert 0 (typically the most activated).

The coordinator tracks tokens in-flight per expert Island and uses capacity-aware routing — when multiple Islands can serve the same expert, the router dispatches to the one with the lowest load. This prevents any single Island from becoming a bottleneck.

The router supports multiple gating strategies for expert selection:

  • Hash-based (default): deterministic routing via consistent hashing — no model needed
  • Embedding-based: routes tokens to experts whose embedding centroids are most similar — learned from training data
  • Native MoE gating: uses actual gating layer weights when available — highest accuracy
  • Round-robin: sequential assignment for load testing

Islands that already have expert weights cached get a warmth bonus during group formation, avoiding cold starts when experts are reassigned.

Bandwidth Efficiency

Expert routing is more bandwidth-efficient than pipeline parallelism because only active expert outputs travel between Islands — not full activation tensors. With top-2 routing on an 8-expert model, only 25% of expert outputs cross the network per token.


Speculative Decoding Pairs

Experimental
Speculative decoding code exists but has not been verified with real workloads in production. The API surface is subject to change during beta.

Speculative decoding pairs a fast Island (small draft model) with a powerful Island (large verifier model) to accelerate autoregressive generation by 2–3x.

Consumer submits job (normal API — no special parameters)
       │
       ▼
Coordinator detects: Cargo is speculative-capable
       │
       ▼
Form pair: Draft Island (TinyLlama 1B, fast)
           Verify Island (Llama 70B, accurate)
       │
       ├── Draft generates K tokens quickly (K=5)
       │     ↓
       ├── Verify checks all K in one forward pass
       │     ↓
       ├── Accepts matching prefix + first corrected token
       │     ↓
       ├── Accepted tokens stream to Consumer
       │     ↓
       └── Draft continues from accepted point
             ...repeat until done...

How It Works

  1. The Draft Island generates K candidate tokens autoregressively (K=4–8) using a small, fast model
  2. All K tokens are sent to the Verify Island
  3. The Verify Island runs a single forward pass on all K tokens in parallel (same cost as 1 token)
  4. Accepts tokens that match (within a configurable threshold), rejects divergent ones
  5. Returns the accepted prefix + first corrected token
  6. The Draft Island continues from the accepted point

This is transparent to the Consumer — they just see faster token output. The speedup comes from the draft model being 5–10x faster per token than the verifier: it generates K tokens in the time the verifier processes 1.
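The loop above can be modeled with strings standing in for tokens. `draft_fn` and `verify_fn` are stand-ins for the two models, and the acceptance rule here is exact match rather than the log-prob threshold the real system uses:

```python
# Toy sketch of the draft/verify speculative decoding loop.
def speculative_generate(prompt, draft_fn, verify_fn, k=5, max_tokens=20):
    out = list(prompt)
    while len(out) - len(prompt) < max_tokens:
        draft = draft_fn(out, k)             # K candidate tokens from the drafter
        truth = verify_fn(out, len(draft))   # one parallel pass from the verifier
        accepted = []
        for d, t in zip(draft, truth):
            if d == t:
                accepted.append(d)           # matching prefix is kept
            else:
                accepted.append(t)           # first corrected token, then stop
                break
        out.extend(accepted)
        if not accepted:                     # drafter produced nothing: done
            break
    return "".join(out)

target = "the quick brown fox"
# Both "models" just read off the target string, so every draft is accepted.
verify = lambda ctx, n: list(target[len(ctx):len(ctx) + n])
draft  = lambda ctx, n: list(target[len(ctx):len(ctx) + n])
assert speculative_generate("the ", draft, verify, k=4, max_tokens=15) == target
```

With a perfect drafter every round accepts all K tokens; with a divergent drafter the loop still makes progress one corrected token at a time, which is why the worst case is no slower than plain autoregressive decoding.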

Speculative Manifest

For a Cargo to support speculative decoding, it must declare a speculative manifest:

{
  "distribution_strategies": ["single", "speculative"],
  "shard_manifest": {
    "draft_model_url": "https://cdn.example.com/tinyllama-1b.gguf",
    "verify_model_url": "https://cdn.example.com/llama-70b.gguf",
    "draft_tokens": 5,
    "acceptance_threshold": 0.9
  }
}
| Field | Description |
|---|---|
| draft_model_url | Small, fast model for generating candidate tokens |
| verify_model_url | Large, accurate model for verification |
| draft_tokens | K — number of tokens per draft round (default: 5) |
| acceptance_threshold | Log-prob match threshold for acceptance (default: 0.9) |

Pair Formation

The coordinator selects:

  • Verify Island: the highest-VRAM candidate that can run the target model — accuracy is priority
  • Draft Island: the best remaining candidate — any VRAM is fine since the draft model is small

Both Islands are scored by Karma, region affinity, and NATS RTT. Low RTT between the pair is critical because draft tokens must travel to the verifier quickly.

Multi-Draft Mode

For maximum throughput, speculative decoding supports multiple draft Islands generating candidates in parallel. Set draft_count in the Cargo manifest to use N drafts + 1 verifier. Each draft independently generates K tokens per round, and the verifier picks the best batch — the one with the highest acceptance rate. This “best-of-N” selection ensures the verifier always uses the highest-quality draft output.

Adaptive Draft Size

The number of draft tokens (K) is adjusted dynamically based on the acceptance rate:

  • High acceptance (>80%): K increases (up to 12) — draft and verifier agree well, generate more tokens per round
  • Low acceptance (<50%): K decreases (down to 2) — draft diverges too much, fewer tokens per round
  • Moderate (50–80%): K stays the same

This automatic tuning maximizes throughput without requiring manual configuration per model pair.
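A sketch of the tuning rule, using the 2–12 bounds stated above; moving K one step per round is an assumption, since the step size is not specified here:

```python
# Sketch: adapt the draft size K from the observed acceptance rate.
def adjust_k(k, acceptance_rate, k_min=2, k_max=12):
    if acceptance_rate > 0.8:
        return min(k + 1, k_max)   # models agree well: draft more per round
    if acceptance_rate < 0.5:
        return max(k - 1, k_min)   # models diverge: draft less per round
    return k                       # moderate agreement: leave K unchanged

assert adjust_k(5, 0.9) == 6
assert adjust_k(5, 0.3) == 4
assert adjust_k(5, 0.6) == 5
assert adjust_k(12, 0.95) == 12    # capped at the documented maximum
```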

Billing

| Role | Share | Rationale |
|---|---|---|
| Draft Island | 30% | Generates most tokens (fast, cheap model) |
| Verify Island | 70% | Runs the expensive target model |

Example for a 10.00 credit job:

  • Draft Island: 3.00 credits
  • Verify Island: 7.00 credits

Shared Infrastructure

All multi-Island strategies share common infrastructure:

Island Groups

The island_groups system tracks groups of Islands working together:

| Field | Purpose |
|---|---|
| topology | "pipeline", "expert", or "speculative" |
| status | "forming" → "active" → "degraded" / "disbanded" |
| workload_id | Which Cargo this group runs |
| members | Ordered list of Islands with position and shard assignments |

Groups are reusable — an active group can serve multiple sequential jobs without re-forming. Groups are automatically disbanded after 5 minutes of inactivity (configurable) or when a member goes offline.

Placement Engine Extensions

The placement engine is extended with multi-dimensional scoring for pipeline member selection:

| Dimension | Weight | What It Measures |
|---|---|---|
| Karma | 40% | Island reliability — higher karma = fewer pipeline failures |
| Region affinity | 30% | Geographic proximity — same-region Islands have lower inter-hop latency |
| NATS RTT | 30% | Measured network latency — Islands report round-trip time in every heartbeat |

The coordinator scores all eligible Islands and picks the top N by composite score. Islands with lower measured latency are preferred because every millisecond of hop delay is multiplied by the number of pipeline stages.
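A sketch of the composite score with the documented 40/30/30 weights; how karma and RTT are normalized into [0, 1] is an assumption, as is the 200 ms worst-case RTT:

```python
# Sketch: composite placement score (weights from the table; normalization assumed).
def composite_score(karma, same_region, rtt_ms, rtt_worst=200.0):
    karma_term = min(karma / 100.0, 1.0)            # assume karma scaled to ~100
    region_term = 1.0 if same_region else 0.0
    rtt_term = max(0.0, 1.0 - rtt_ms / rtt_worst)   # lower RTT scores higher
    return 0.4 * karma_term + 0.3 * region_term + 0.3 * rtt_term

a = composite_score(karma=90, same_region=True, rtt_ms=20)
b = composite_score(karma=95, same_region=False, rtt_ms=120)
assert a > b   # region affinity plus low RTT outweigh slightly higher karma
```

Because the region and RTT dimensions together carry 60% of the weight, a nearby mid-karma Island routinely beats a distant high-karma one, which matches the note that hop latency compounds per pipeline stage.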

Islands also support peer-to-peer RTT probes — each Island responds to latency probes from other Islands via NATS request/reply. The coordinator caches these measurements in a pairwise RTT cache (refreshed every 5 minutes) so formation decisions use real network conditions, not just geographic estimates.

Planned extensions:

  • Anti-affinity for batch — spread batch children across different Islands for fault isolation

Billing

| Strategy | How Islands Earn | Status |
|---|---|---|
| Batch fan-out | Per child job (existing billing) | Beta |
| Pipeline parallel | Proportional to layers held (split on completion) | Experimental |
| Expert parallel | Router 20%, experts split 80% equally | Experimental |
| Speculative decoding | Draft 30%, verify 70% | Experimental |

Cargo Metadata

Cargos declare their distribution capabilities via the distribution_strategies field and strategy-specific metadata:

{
  "distribution_strategies": ["single", "pipeline", "batch"],
  "shard_manifest": {
    "total_layers": 32,
    "min_shards": 2,
    "max_shards": 4,
    "shard_urls": { ... }
  }
}

Observability

Telemetry events for multi-Island compute:

  • ring:formed/completed/failed — pipeline group lifecycle
  • expert:formed/completed/failed — expert group lifecycle
  • speculative:formed/completed/failed — speculative pair lifecycle
  • Batch completion progress (existing job:completed events per child)

All distributed jobs carry the same job_id correlation ID through every hop, enabling end-to-end tracing across Islands.