Placement & Fit Scoring

How the coordinator selects the best Island for each job — multi-dimensional scoring, GPU bandwidth estimation, and runtime-specific strategies

When a job is submitted, the coordinator doesn’t just find any qualifying Island — it ranks candidates by how well they can run the Cargo and picks the best one. This is the fit scoring system.

Overview

Traditional placement checks binary requirements: “does this Island have enough VRAM?” Fit scoring goes further by estimating how well the Cargo will perform on each candidate, producing a composite 0–100 score across multiple dimensions.

Job submitted
    │
    ▼
Filter: online, approved, not in cooldown,
        runtime compatible, meets min requirements
    │
    ▼
Score: up to 10 candidates ranked by fit
    │
    ▼
Select: highest composite score wins
    │
    ▼
Region tiebreaker: same region > adjacent > any

The system is implemented in Coordinator.Placement.FitScoring and Coordinator.Placement.GpuSpecs.

Scoring Dimensions

Every candidate Island is scored on three dimensions, each 0–100:

Dimension	What it measures	Example
Speed	Estimated throughput relative to a target	GPU with 100 tok/s scores higher than one with 20 tok/s
Fit	Resource utilization sweet spot — not too tight, not wasted	60% VRAM utilization scores higher than 95% or 20%
Headroom	Capacity for concurrent work	An idle Island scores higher than one running 3 jobs

The composite score is a weighted average. Weights vary by runtime type:

Runtime	Speed	Fit	Headroom
`llmcpp`	50%	35%	15%
`container`	30%	40%	30%
`wasm`	25%	40%	35%
`coreml` / `onnx`	45%	40%	15%

Why different weights?

LLM Cargos prioritize speed because users are waiting for token-by-token responses. Container Cargos prioritize fit and headroom because they need reliable resources and often run concurrently. WASM Cargos are lightweight and benefit most from headroom.
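The weighted average can be sketched as follows. The weights are copied from the table above; the function name `composite_score` and the dictionary layout are illustrative assumptions, not the coordinator's actual API.

```python
# Weights per runtime as (speed, fit, headroom), copied from the table above.
# The names WEIGHTS and composite_score are hypothetical.
WEIGHTS = {
    "llmcpp": (0.50, 0.35, 0.15),
    "container": (0.30, 0.40, 0.30),
    "wasm": (0.25, 0.40, 0.35),
    "coreml": (0.45, 0.40, 0.15),
    "onnx": (0.45, 0.40, 0.15),
}


def composite_score(runtime: str, speed: float, fit: float, headroom: float) -> float:
    """Weighted average of three 0-100 dimension scores for one Island."""
    w_speed, w_fit, w_headroom = WEIGHTS[runtime]
    return w_speed * speed + w_fit * fit + w_headroom * headroom


# The same dimension scores rank differently depending on the runtime.
print(round(composite_score("llmcpp", 80, 60, 40), 1))     # 67.0
print(round(composite_score("container", 80, 60, 40), 1))  # 60.0
```

With identical dimension scores, the LLM weighting rewards the high speed score, while the container weighting pulls the result down because of the lower headroom.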

Fit Levels

Based on dimension scores, each Island gets a fit level:

Level	Meaning	When assigned
Perfect	Optimal match — GPU-accelerated with headroom	Fit ≥ 80, Speed ≥ 60, has GPU (for GPU Cargos)
Good	Solid match with 20%+ headroom	Fit ≥ 50, Speed ≥ 30
Marginal	Will work but may be slow or unstable	Fit ≥ 20
Too tight	Does not meet minimum requirements	Filtered out before scoring

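Islands rated “Too tight” are excluded from ranking; the remaining candidates are sorted by composite score. The only exception is the dispatch fallback described under Job Dispatch, which applies when every candidate is too tight.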
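A minimal sketch of how the thresholds in the table could map to a level. The function name and the level strings are hypothetical, and treating any scored Island with Fit below 20 as too tight is an assumption; the table only says that such Islands are filtered out before scoring.

```python
def fit_level(fit: float, speed: float, has_gpu: bool, needs_gpu: bool) -> str:
    """Map dimension scores to a fit level using the thresholds in the table."""
    # "Perfect" requires a GPU only when the Cargo itself is a GPU Cargo.
    if fit >= 80 and speed >= 60 and (has_gpu or not needs_gpu):
        return "perfect"
    if fit >= 50 and speed >= 30:
        return "good"
    if fit >= 20:
        return "marginal"
    # Assumption: anything below the Marginal threshold counts as too tight.
    return "too_tight"


print(fit_level(85, 70, has_gpu=True, needs_gpu=True))   # perfect
print(fit_level(85, 70, has_gpu=False, needs_gpu=True))  # good
print(fit_level(55, 20, has_gpu=True, needs_gpu=True))   # marginal
```

The second call shows the GPU condition at work: the scores qualify for Perfect, but a GPU Cargo on an Island without a GPU drops to Good.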

Runtime-Specific Scoring

LLM Cargos (`llmcpp`)

The LLM scorer estimates tokens per second using GPU memory bandwidth:

tok/s = (gpu_bandwidth / model_size) × 0.55 × mode_factor

Where:

gpu_bandwidth is the GPU memory bandwidth in GB/s, reported by the Island in heartbeats or looked up from a table of 120+ GPU models
model_size is the model size in GB, estimated as ~70% of the Cargo’s required VRAM
0.55 is an empirical efficiency factor
mode_factor is 1.0 for full GPU, 0.5 for CPU offload, 0.3 for CPU-only

Speed score: (tok/s ÷ 40) × 100, so an Island that reaches the 40 tok/s target for interactive chat earns the full 100.
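The throughput estimate and speed score can be worked through in a few lines. This is a sketch under stated assumptions: the function names and mode keys are hypothetical, and clamping the speed score to 0–100 is inferred from the statement that every dimension is scored 0–100.

```python
EFFICIENCY = 0.55       # empirical efficiency factor from the formula
TARGET_TOK_S = 40.0     # interactive chat target
MODEL_SIZE_RATIO = 0.7  # model size is ~70% of the required VRAM

# Hypothetical mode keys; the factors are the ones listed above.
MODE_FACTORS = {"gpu": 1.0, "offload": 0.5, "cpu": 0.3}


def estimate_tok_s(bandwidth_gb_s: float, required_vram_gb: float, mode: str = "gpu") -> float:
    """Estimate tokens per second from memory bandwidth and model size."""
    if required_vram_gb <= 0:
        raise ValueError("required_vram_gb must be positive")
    model_size_gb = MODEL_SIZE_RATIO * required_vram_gb
    return bandwidth_gb_s / model_size_gb * EFFICIENCY * MODE_FACTORS[mode]


def speed_score(tok_s: float) -> float:
    """Scale tok/s against the 40 tok/s target (assumed clamp to 0-100)."""
    return min(100.0, max(0.0, tok_s / TARGET_TOK_S * 100.0))


# A 1008 GB/s GPU serving a Cargo that requires 10 GB of VRAM (7 GB model).
full_gpu = estimate_tok_s(1008, 10, "gpu")
cpu_only = estimate_tok_s(1008, 10, "cpu")
print(round(full_gpu, 1), round(speed_score(full_gpu), 1))  # 79.2 100.0
print(round(cpu_only, 1), round(speed_score(cpu_only), 1))  # 23.8 59.4
```

The example shows why the mode factor matters: the same hardware drops from well above the target to roughly 60% of it when the model runs CPU-only.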

Fit score: VRAM utilization sweet spot:

50–80% utilization → 100
Under 50% → 60–100 (wasting resources)
Over 90% → 20–70 (risk of OOM, KV cache pressure)

Container Cargos

Speed: CPU ratio, i.e. the Island’s core count divided by the Cargo’s required cores. An Island with 16 cores running a 4-core Cargo (ratio 4) scores higher than one with 4 cores (ratio 1).

Fit: Average of CPU and RAM utilization. The sweet spot is 30–70% utilization.

Headroom: Based on available concurrent slots: max(cpu_cores / required_cores, 1) - active_jobs.
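The raw inputs behind the three container dimensions can be computed as in the sketch below. It stops short of the final 0–100 scores because the mapping from these inputs to scores is not specified here. The names are hypothetical, and treating utilization as required resources divided by Island capacity is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ContainerInputs:
    cpu_ratio: float     # speed input: Island cores / required cores
    utilization: float   # fit input: average of CPU and RAM utilization
    free_slots: float    # headroom input: concurrent slots still available
    in_sweet_spot: bool  # True when utilization falls in the 30-70% band


def container_inputs(cpu_cores: int, ram_mb: int, required_cores: int,
                     required_ram_mb: int, active_jobs: int) -> ContainerInputs:
    cpu_ratio = cpu_cores / required_cores
    # Assumption: utilization is the share of the Island this Cargo would use.
    cpu_util = required_cores / cpu_cores
    ram_util = required_ram_mb / ram_mb
    utilization = (cpu_util + ram_util) / 2
    # Headroom formula from the text: max(cpu_cores / required_cores, 1) - active_jobs.
    free_slots = max(cpu_ratio, 1) - active_jobs
    return ContainerInputs(cpu_ratio, utilization, free_slots,
                           0.30 <= utilization <= 0.70)


# 16 cores / 32 GB Island, 4-core / 8 GB Cargo, one job already running.
print(container_inputs(16, 32768, 4, 8192, active_jobs=1))
```

In this example the Island has a CPU ratio of 4, 25% average utilization (below the sweet spot) and three free slots.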

WASM Cargos

WASM modules are lightweight and single-threaded. Scoring emphasizes:

Speed: CPU core count as a proxy for hardware quality
Fit: RAM utilization (WASM linear memory vs available)
Headroom: Many can run concurrently — scored generously

Mobile Cargos (`coreml`, `onnx`)

Mobile scoring accounts for hardware accelerators and device health:

Speed: Neural Engine (CoreML) or NPU (ONNX) presence is the primary factor. Metal GPU adds a bonus.
Fit: Available device memory vs model size. Considers memory_used_mb if reported.
Penalties: Thermal state (critical = -30, serious = -15) and low battery without charging (-10 to -20).

Battery and thermal awareness

A device in `critical` thermal state or below 20% battery (not charging) will score significantly lower, reducing the chance it receives jobs. This protects user devices from overheating or unexpected shutdowns.

GPU Bandwidth Table

The coordinator maintains a lookup table mapping GPU model names to memory bandwidth in GB/s. This is the primary input for LLM performance estimation.

Category	Examples	Bandwidth Range
NVIDIA Data Center	H100, A100, L40S	300–3350 GB/s
NVIDIA Consumer	RTX 4090, 3090, 3060	224–1008 GB/s
AMD Radeon	RX 7900 XTX, 6800 XT	224–960 GB/s
AMD Instinct	MI300X, MI250X	1228–5300 GB/s
Apple Silicon	M1–M4 (all tiers)	68–819 GB/s
Intel Arc	A770, A750	186–560 GB/s

When a GPU model isn’t found in the table, the system falls back to a conservative estimate based on VRAM size (~40 GB/s per GB of VRAM).

Island-reported bandwidth

Islands running the node agent report their GPU bandwidth in heartbeats. When available, the coordinator prefers this value over the lookup table — it accounts for the actual hardware detected on the machine.
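The resolution order (Island-reported value, then the lookup table, then the VRAM-based fallback) can be sketched as below. The function name and the dictionary keys are hypothetical, and the two table entries are illustrative stand-ins for the real table of 120+ models.

```python
from typing import Optional

FALLBACK_GB_S_PER_GB_VRAM = 40.0  # conservative estimate for unknown GPUs

# Two illustrative entries only; the key format is an assumption.
GPU_BANDWIDTH_GB_S = {
    "H100": 3350.0,
    "RTX 4090": 1008.0,
}


def resolve_bandwidth(model_name: str, vram_gb: float,
                      reported_gb_s: Optional[float] = None) -> float:
    """Pick the memory bandwidth (GB/s) used for LLM estimates."""
    # 1. Prefer the value the Island reported in its heartbeat.
    if reported_gb_s is not None and reported_gb_s > 0:
        return reported_gb_s
    # 2. Otherwise use the lookup table.
    if model_name in GPU_BANDWIDTH_GB_S:
        return GPU_BANDWIDTH_GB_S[model_name]
    # 3. Unknown model: fall back to ~40 GB/s per GB of VRAM.
    return vram_gb * FALLBACK_GB_S_PER_GB_VRAM


print(resolve_bandwidth("RTX 4090", 24))                       # 1008.0
print(resolve_bandwidth("Unknown GPU", 8))                     # 320.0
print(resolve_bandwidth("RTX 4090", 24, reported_gb_s=950.0))  # 950.0
```

The fallback is deliberately low: an unknown 8 GB card is assumed to deliver 320 GB/s, so an unrecognized GPU is unlikely to outrank a known one on speed.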

Performance Estimates in Heartbeats

The Island node agent computes and reports performance estimates with every heartbeat (every 10 seconds):

Field	Type	Description
`gpu_bandwidth_gb_s`	float	GPU memory bandwidth (looked up from model)
`estimated_llm_tok_s`	float	Estimated tok/s for a reference 7B Q4 model
`max_concurrent_containers`	int	Based on CPU cores and RAM
`wasm_memory_limit_mb`	int	Available WASM linear memory
`supported_runtimes`	array	Runtime types this Island can serve

These estimates are stored in the performance_estimates JSONB column on the hosts table and used by the fit scorer to improve accuracy over time.
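For orientation, the fields in the table could be modeled and serialized as shown here. The class name is hypothetical and every value is made up for illustration; only the field names and types come from the table above.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class PerformanceEstimates:
    gpu_bandwidth_gb_s: float
    estimated_llm_tok_s: float
    max_concurrent_containers: int
    wasm_memory_limit_mb: int
    supported_runtimes: List[str]


# Illustrative values only, not measurements from a real Island.
example = PerformanceEstimates(
    gpu_bandwidth_gb_s=1008.0,
    estimated_llm_tok_s=120.0,
    max_concurrent_containers=4,
    wasm_memory_limit_mb=4096,
    supported_runtimes=["llmcpp", "container", "wasm"],
)

# JSON of this shape is what a JSONB column would hold.
payload = json.dumps(asdict(example), indent=2)
print(payload)
```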

Integration Points

Job Dispatch (`find_host_for_workload`)

The main entry point is Coordinator.Hosts.find_host_for_workload/2. It:

Filters candidates by binary requirements (online, approved, runtime, VRAM, CPU, RAM)
Fetches up to 10 qualifying candidates
Scores all candidates with FitScoring.rank_hosts/2
Selects the highest composite score
Falls back to the first candidate in database order if every candidate scores :too_tight

Regional preference is applied as a tiered fallback: same region → adjacent regions → any region.
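The selection steps and the regional fallback can be sketched as follows. This is not the Elixir implementation: the names are hypothetical, candidates are assumed to arrive already filtered and scored, and applying the region tiers before ranking (rather than as a tiebreaker afterwards) is one possible reading of the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

MAX_CANDIDATES = 10  # at most 10 qualifying candidates are scored


@dataclass
class Candidate:
    island_id: str
    region: str
    composite: float  # 0-100 composite score
    level: str        # "perfect", "good", "marginal" or "too_tight"


def select_island(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the highest composite score, falling back to list order."""
    pool = candidates[:MAX_CANDIDATES]
    if not pool:
        return None
    rankable = [c for c in pool if c.level != "too_tight"]
    if rankable:
        return max(rankable, key=lambda c: c.composite)
    # Every candidate is too tight: take the first one in database order.
    return pool[0]


def place(candidates: List[Candidate], job_region: str,
          adjacent: Set[str]) -> Optional[Candidate]:
    """Try same region, then adjacent regions, then any region."""
    tiers = [
        [c for c in candidates if c.region == job_region],
        [c for c in candidates if c.region in adjacent],
        candidates,
    ]
    for tier in tiers:
        chosen = select_island(tier)
        if chosen is not None:
            return chosen
    return None


islands = [
    Candidate("a", "us-east", 70.0, "good"),
    Candidate("b", "us-east", 85.0, "perfect"),
    Candidate("c", "eu-west", 95.0, "perfect"),
]
print(place(islands, "us-east", {"us-west"}).island_id)   # b
print(place(islands, "ap-south", {"eu-west"}).island_id)  # c
```

In the first call the same-region tier wins even though Island `c` has a higher score elsewhere; in the second, no Island is in the job's region, so the adjacent tier supplies the winner.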

Cargo Registry

Each Cargo card in the Cargo Registry shows network availability:

Number of Islands ready to serve it
Best fit level and composite score
Estimated throughput
Fit level breakdown across available Islands

Island Dashboard

The Island dashboard shows a “Compatible Cargos” section listing every Cargo the Island can run, sorted by fit score. This helps Island operators understand what their hardware is best suited for.

Pricing

Fit scoring is separate from pricing. The hardware tier system determines pricing multipliers (enterprise 2.0×, high-end 1.5×, etc.), while fit scoring determines which Island gets the job. A more capable Island earns more per job through tier multipliers, and fit scoring ensures it gets matched to appropriate Cargos.

Source Code

Module	Purpose
`Coordinator.Placement.GpuSpecs`	GPU bandwidth lookup table (120+ models)
`Coordinator.Placement.FitScoring`	Multi-dimensional scoring engine
`Coordinator.Hosts.find_host_for_workload/2`	Placement entry point
`node-agent/src/metrics/gpu.rs`	Island-side bandwidth lookup and estimation
