Open Beta: Archipelag.io is in open beta until June 2026. All credits and earnings are virtual.

Troubleshooting

Common issues and solutions for consumers, Islands, and publishers on Archipelag.io

This guide covers the most common issues encountered by consumers, Islands, and publishers, along with their solutions. Issues are organized by role so you can quickly find what’s relevant to you.

For Consumers

Job stuck in “Processing”

Symptoms: Your job shows “Processing…” but never completes. No output appears.

Possible causes:

| Cause | Solution |
|---|---|
| No Islands available for the Cargo | Wait for an Island to come online, or try again later |
| Island went offline mid-job | The job is automatically requeued after its lease expires (up to 5 minutes) |
| Cargo container crashed | The job is marked failed; check the error message and retry |

Check job status
Check job status
If a job is stuck, refresh the page. LiveView will auto-reconnect and show the latest status. Jobs that exceed their lease are automatically requeued or failed within a few minutes.

No streaming output

Symptoms: Job starts but you don’t see token-by-token output.

Possible causes:

  • The Cargo doesn’t support streaming (some Cargos return all output at once)
  • WebSocket connection was interrupted — refresh the page to reconnect
  • The Cargo is still loading its model — watch for status messages like “Loading model…”

API errors

Common error responses:

| Status | Error | Solution |
|---|---|---|
| 401 | Unauthorized | Check your API key. Ensure it has the correct scopes (read/write). |
| 402 | Insufficient credits | Purchase more credits or check your balance. |
| 422 | Validation error | Check input limits: max 128 messages, 32 KB per message, max_tokens at most 4096. |
| 429 | Rate limited | You've exceeded 100 requests/minute. Wait and retry. |
| 503 | Service unavailable | The coordinator is restarting. Wait 15-30 seconds and retry. |
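Only the 429 and 503 rows describe transient conditions worth retrying automatically; the others need a fix on your side first. A minimal retry sketch in bash using curl, assuming the hypothetical `ARCHIPELAG_API_URL` and `ARCHIPELAG_API_KEY` variables and a Bearer authorization header (neither is documented here). The 60-second wait after a 429 and the five-attempt limit are also assumptions.

```bash
# Classify an HTTP status code using the API errors table:
# 2xx -> ok, 429/503 -> transient (retry), everything else -> fix first.
classify_status() {
  case "$1" in
    2??) echo "ok" ;;
    429|503) echo "retry" ;;
    *) echo "fail" ;;
  esac
}

# Seconds to wait before retrying attempt $2 after receiving status $1.
retry_delay() {
  local status="$1" attempt="$2" delay
  case "$status" in
    429) delay=60 ;;  # assumption: wait out the 100 requests/minute window
    503)              # coordinator restart: 15s, then 30s, capped at 30s
      delay=$(( 15 * attempt ))
      if (( delay > 30 )); then delay=30; fi ;;
    *) delay=0 ;;
  esac
  echo "$delay"
}

# Hypothetical wrapper: the base URL, key variable and Bearer scheme are
# placeholders, not documented values.
call_api() {
  local path="$1" body status attempt
  body=$(mktemp)
  for attempt in 1 2 3 4 5; do
    # curl prints 000 as the status code if the connection itself fails
    status=$(curl -s -o "$body" -w '%{http_code}' \
      -H "Authorization: Bearer ${ARCHIPELAG_API_KEY}" \
      "${ARCHIPELAG_API_URL}${path}") || true
    case "$(classify_status "$status")" in
      ok) cat "$body"; rm -f "$body"; return 0 ;;
      retry)
        if (( attempt < 5 )); then sleep "$(retry_delay "$status" "$attempt")"; fi ;;
      *) echo "HTTP $status: $(cat "$body")" >&2; rm -f "$body"; return 1 ;;
    esac
  done
  rm -f "$body"
  echo "giving up after $attempt attempts (last status: $status)" >&2
  return 1
}
```

For example, `call_api "/api/v1/hosts/$ISLAND_ID/karma"` would fetch the karma history endpoint mentioned under "Karma dropping", with `ISLAND_ID` standing in for your Island's ID. Errors such as 401 or 422 are reported immediately rather than retried, since repeating the same request cannot fix them.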

Billing issues

  • Charged but no result: If a job fails after being charged, credits are automatically refunded.
  • Unexpected charges: Check your job history — each charge is linked to a specific job and Cargo.
  • Credits not appearing after purchase: Stripe webhook processing is usually instant but can take up to a minute. If credits don’t appear after 5 minutes, contact support.

For Islands

Island won’t start

Connection refused to Docker
The Island software needs access to the Docker socket.

```bash
# Check that Docker is running
docker info

# Check socket permissions
ls -la /var/run/docker.sock

# If permission is denied, add your user to the docker group
sudo usermod -aG docker "$USER"

# Then log out and back in for the group change to take effect
```
NATS connection failed
The Island software can't reach the NATS server.

```bash
# Check the NATS URL in your config.toml
grep nats_url config.toml

# Test connectivity
nc -zv <nats-host> 4222

# Common causes:
# - Wrong NATS URL in config.toml
# - Firewall blocking outbound port 4222
# - NATS server is down
```

The Island software retries NATS connections indefinitely with backoff, so it reconnects automatically once the server becomes available.
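To run the connectivity test against exactly what the Island is configured to use, you can read the host and port out of config.toml first. A sketch, assuming `nats_url` is a single quoted `nats://host:port` value on one line (optionally with `user:pass@` credentials); comma-separated cluster URLs are not handled, and the `nats_endpoint` and `check_nats` names are hypothetical.

```bash
# Print "host port" for the nats_url in a config.toml (default: ./config.toml).
# Falls back to port 4222 when the URL has no explicit port.
nats_endpoint() {
  local config="${1:-config.toml}" url hostport host port
  url=$(sed -n 's/^[[:space:]]*nats_url[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "$config" | head -n 1)
  if [ -z "$url" ]; then
    echo "nats_url not found in $config" >&2
    return 1
  fi
  hostport="${url#*://}"      # drop the scheme
  hostport="${hostport##*@}"  # drop any user:pass@ credentials
  hostport="${hostport%%/*}"  # drop any trailing path
  host="${hostport%%:*}"
  if [[ "$hostport" == *:* ]]; then port="${hostport##*:}"; else port=4222; fi
  echo "$host $port"
}

# Test the configured endpoint with the same nc check (5-second timeout).
check_nats() {
  local host port
  read -r host port < <(nats_endpoint "$@") || return 1
  nc -zv -w 5 "$host" "$port"
}
```

Run `check_nats` from the directory containing config.toml, or pass the file path explicitly, e.g. `check_nats /path/to/config.toml`.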
Island ID conflicts
If you see errors about duplicate host IDs, the Island state file may be stale.

```bash
# Check the current state
cat ~/.archipelag/state.json

# To generate a fresh Island ID, remove the state file
rm ~/.archipelag/state.json

# Then restart the Island
```

Not receiving jobs

Possible causes:

| Cause | How to check | Solution |
|---|---|---|
| Island not approved | Check the logs for a pairing code | Pair at /pair, or ask an admin to approve the Island |
| Island not registered | Check the logs for “Registered with coordinator” | Verify the NATS connection URL and credentials |
| Island marked offline | Heartbeat log lines should appear every 10 s | Ensure heartbeats are sent and acknowledged |
| Runtime not advertised | Check `supported_runtimes` for your Island in the database | Rebuild with `--features gguf` (or `all-runtimes`) |
| Cargo requires a GPU but the Island has none | Check whether the GGUF Cargo sets `required_vram_mb > 0` | Set `required_vram_mb = NULL` for CPU-capable models |
| Island doesn't meet Cargo requirements | Compare Island capabilities with Cargo requirements | Upgrade hardware or target different Cargos |
| No warm containers | Other Islands with cached images get priority | Enable preloading in config, or run a test job |
| Low karma | Islands below the threshold get fewer jobs | Complete jobs reliably to build karma |
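The registration and heartbeat rows can be checked from the Island's log output. A sketch, assuming you have saved the logs to a file; the `diagnose_island_log` name and matching heartbeats on the word "heartbeat" are assumptions about the log format, and only the “Registered with coordinator” message is named on this page.

```bash
# Summarize registration and heartbeat activity in a saved Island log file.
diagnose_island_log() {
  local log="$1" heartbeats
  if grep -q "Registered with coordinator" "$log"; then
    echo "registered: yes"
  else
    echo "registered: no (check NATS URL and credentials)"
  fi
  # grep -c prints 0 (and exits 1) when nothing matches; keep the 0
  heartbeats=$(grep -ci "heartbeat" "$log" || true)
  echo "heartbeat lines: $heartbeats"
}
```

With heartbeats every 10 seconds, a log covering one minute of uptime should contain roughly six heartbeat lines; far fewer suggests the Island is missing heartbeat deadlines.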
Native runtime Cargos
For GGUF, ONNX, or diffusers Cargos, your Island must be built with the corresponding feature flag (`--features gguf`, `--features onnx`, `--features diffusers`, or `--features all-runtimes`). The Island advertises its supported runtimes in heartbeats — if a runtime isn't compiled in, the coordinator will never dispatch those Cargos to your Island. See the [Native ML Runtimes guide](/guides/native-runtimes/) for details.
Placement priority
The coordinator prefers Islands that: (1) are in the same region as the consumer, (2) already have the Cargo image cached, (3) have high reputation scores, and (4) have fewer active jobs. If you're not getting jobs, check these factors.
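One way to reason about these factors is as a tie-breaking order: region first, then cached image, then reputation, then load. The sketch below ranks candidate Islands that way with `sort`. The tab-separated input format, the `rank_islands` name, and the strict lexicographic ordering are assumptions for illustration; the coordinator may weigh the factors differently.

```bash
# Rank candidate Islands by the four placement factors, in priority order.
# Input (tab-separated, one Island per line):
#   island_id  same_region(0/1)  image_cached(0/1)  reputation  active_jobs
rank_islands() {
  # Higher region/cache/reputation first, then fewer active jobs.
  # LC_ALL=C keeps "." as the decimal point for the numeric reputation key.
  LC_ALL=C sort -t $'\t' -k2,2nr -k3,3nr -k4,4nr -k5,5n
}

printf 'island-a\t0\t1\t0.99\t0\nisland-b\t1\t0\t0.50\t3\nisland-c\t1\t1\t0.80\t2\nisland-d\t1\t1\t0.80\t0\n' \
  | rank_islands | cut -f1
# prints island-d, island-c, island-b, island-a (one per line)
```

In this model, island-a's high reputation never outweighs being in the wrong region, which is why moving closer to consumers or pre-caching images can matter more than small reputation gains.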

Karma dropping

Karma penalties are applied for failures:

| Event | Penalty | Equivalent trust lost |
|---|---|---|
| Job failure | -5 karma | ~5 hours of compute |
| Job timeout | -3 karma | ~3 hours of compute |
| Disconnect mid-job | -20 karma | ~20 hours of compute |

How to recover:

  • Below the monetization threshold, you earn karma at 1.5x rate (recovery multiplier)
  • Focus on reliability — a stable Island earns 1 karma per hour of compute
  • Check your karma history for patterns: GET /api/v1/hosts/{id}/karma
  • If penalties seem incorrect, contact platform admin for manual adjustment
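These rates make recovery time easy to estimate. At 1 karma per hour, a -20 disconnect penalty takes about 20 hours of compute to earn back; below the monetization threshold, at 1.5x, it takes 20 / 1.5 ≈ 13.3 hours. A small helper for the same arithmetic, assuming the rate stays constant for the whole recovery period (the function names are hypothetical). Bash arithmetic is integer-only, so it works in whole minutes.

```bash
# Minutes of reliable compute needed to earn back a karma penalty.
# Rates from this page: 1 karma/hour, or 1.5 karma/hour below the
# monetization threshold. Assumes the rate does not change mid-recovery.
recovery_minutes() {
  local penalty="$1" below_threshold="$2"
  if [ "$below_threshold" = "yes" ]; then
    # penalty / 1.5 hours = penalty * 60 / 1.5 minutes = penalty * 40 minutes
    echo $(( penalty * 40 ))
  else
    # penalty hours = penalty * 60 minutes
    echo $(( penalty * 60 ))
  fi
}

# Format a minute count as hours and minutes, e.g. 800 -> 13h20m.
format_duration() {
  local minutes="$1"
  printf '%dh%02dm\n' $(( minutes / 60 )) $(( minutes % 60 ))
}

# A -20 disconnect penalty while below the threshold:
format_duration "$(recovery_minutes 20 yes)"   # prints 13h20m
```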

See the Karma Reference for full details on the karma system.

Connection drops

The Island software handles network instability gracefully:

  • NATS reconnect: Infinite retry with backoff, automatic resubscription
  • Heartbeat recovery: After 3 consecutive heartbeat failures, the Island software resubscribes to NATS topics with exponential backoff (up to 30 seconds)
  • Active jobs: Jobs in progress continue executing even during brief disconnects. The Island software renews leases locally and publishes status when the connection recovers.
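The heartbeat recovery step, which begins after 3 consecutive heartbeat failures, follows a capped exponential backoff. A sketch of the delay schedule: the 30-second cap comes from this page, but the 1-second starting delay and the doubling factor are assumptions, and the Island software's actual base delay and any jitter are not documented here.

```bash
# Delay (seconds) before resubscription attempt N: starts at 1s (assumed),
# doubles each attempt (assumed), and is capped at 30s (documented).
backoff_delay() {
  local attempt="$1" delay=1 cap=30 i
  for (( i = 1; i < attempt; i++ )); do
    delay=$(( delay * 2 ))
    if (( delay >= cap )); then
      delay=$cap
      break
    fi
  done
  echo "$delay"
}

for attempt in 1 2 3 4 5 6 7; do
  printf '%s ' "$(backoff_delay "$attempt")"
done
echo
# prints: 1 2 4 8 16 30 30
```

Capping the delay keeps a recovering Island from waiting minutes between attempts, which matters because the coordinator marks an Island offline after 30 seconds of disconnection.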
Extended disconnects cause karma loss
If the Island disconnects for more than 30 seconds, the coordinator marks it as offline. Any active jobs will have their leases expire, resulting in a -20 karma penalty per job. Ensure stable network connectivity for your Island.

If you’re experiencing frequent disconnects:

  1. Check your internet connection stability
  2. Verify no firewall is intermittently blocking port 4222
  3. Check system resources — high CPU/memory pressure can cause the Island software to miss heartbeat deadlines
  4. Review Island logs for NATS reconnection patterns
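For steps 1 and 2, intermittent problems are easier to spot with repeated probes than with a single check. A sketch that reuses the `nc` test from the "NATS connection failed" callout in a loop; the `watch_nats` name and the default sample count and interval are arbitrary choices for illustration.

```bash
# Probe host:port repeatedly and report how many attempts failed.
# Usage: watch_nats <host> [port=4222] [samples=60] [interval_seconds=5]
watch_nats() {
  local host="$1" port="${2:-4222}" samples="${3:-60}" interval="${4:-5}"
  local failed=0 i
  for (( i = 1; i <= samples; i++ )); do
    if ! nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
      failed=$(( failed + 1 ))
      # Timestamped failures go to stderr so they can be matched against Island logs
      echo "$(date '+%Y-%m-%dT%H:%M:%S') probe $i: cannot reach $host:$port" >&2
    fi
    if (( i < samples )); then sleep "$interval"; fi
  done
  echo "failed $failed/$samples"
}
```

For example, `watch_nats <nats-host> 4222 120 5` probes for about ten minutes. A handful of scattered failures points at network instability or a firewall, while failures that line up with high CPU or memory usage point at step 3.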

For Publishers

Security scan failures

Critical or high vulnerabilities found
Update your base image and dependencies. Check for vulnerabilities locally before submitting with `trivy image your-workload:latest`, then apply the common fixes in your Dockerfile:

```dockerfile
# 1. Update the base image (use the latest slim variant)
FROM python:3.12-slim

# 2. Update OS packages
RUN apt-get update && apt-get upgrade -y && rm -rf /var/lib/apt/lists/*

# 3. Update language dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip && pip install --no-cache-dir -r requirements.txt
```

Use multi-stage builds to minimize the final image:

```dockerfile
# Build stage
FROM python:3.12 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt

# Runtime stage
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
COPY . .
CMD ["python", "main.py"]
```
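To catch these findings before submission rather than after, you can make the local scan fail on the same "critical or high" threshold named in the callout title. A sketch, assuming trivy is installed; `--severity` and `--exit-code` are standard `trivy image` flags, while the `scan_before_submit` name is hypothetical and the registry's exact scanner settings are not documented here.

```bash
# Fail locally when an image has HIGH or CRITICAL vulnerabilities.
# Returns 2 on usage errors or a missing trivy binary.
scan_before_submit() {
  local image="$1"
  if [ -z "$image" ]; then
    echo "usage: scan_before_submit <image:tag>" >&2
    return 2
  fi
  if ! command -v trivy >/dev/null 2>&1; then
    echo "trivy not found; install it first" >&2
    return 2
  fi
  # --exit-code 1 turns HIGH/CRITICAL findings into a non-zero exit status
  trivy image --severity HIGH,CRITICAL --exit-code 1 "$image"
}
```

In CI, `scan_before_submit your-workload:latest || exit 1` stops the pipeline before an image that would fail the registry scan is submitted.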
Scan keeps timing out
Large images take longer to scan. To reduce scan time:

- Use a smaller base image (alpine, distroless, slim)
- Remove unnecessary packages and build artifacts
- Use `.dockerignore` to exclude test files, docs, and dev dependencies
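As a starting point for the `.dockerignore` tip, the helper below writes a starter file into a build context. The entries are illustrative for a Python Cargo (for instance, `requirements-dev.txt` assumes you keep dev-only dependencies in a separate file), and the `write_dockerignore` name is hypothetical; adjust both to your project layout.

```bash
# Write a starter .dockerignore into a build context (default: current dir).
# Refuses to overwrite an existing file.
write_dockerignore() {
  local dir="${1:-.}"
  if [ -e "$dir/.dockerignore" ]; then
    echo "$dir/.dockerignore already exists; not overwriting" >&2
    return 1
  fi
  cat > "$dir/.dockerignore" <<'EOF'
# Version control and local tooling
.git
.venv/
__pycache__/

# Tests, docs, and dev-only dependencies
tests/
docs/
requirements-dev.txt
EOF
}
```

Keeping these paths out of the build context also keeps them out of any `COPY . .` layer, so they never reach the scanned image.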

Submission rejected

Common rejection reasons:

| Reason | Solution |
|---|---|
| Missing input/output schema | Add JSON Schema definitions for your Cargo's I/O format |
| Unreasonable resource requirements | Lower requirements to match actual usage; test with profiling |
| Unclear description | Explain what the Cargo does, what inputs it expects, and what output it produces |
| Policy violation | Review the Cargo Registry policies: no malicious code, no data exfiltration |
| Image not on an approved registry | Push to ghcr.io/archipelag-io or docker.io/archipelag |

Reputation dropping

Cargo reputation is tracked automatically:

  • Score below 0.5: Cargo is auto-suspended
  • Success rate below 90% (after 100+ jobs): Trust level is demoted
  • 10+ complaints in 7 days: Flagged for manual review
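If you track your Cargo's numbers yourself, the three thresholds can be expressed as a small check. A sketch, assuming the success rate is given as a fraction between 0 and 1 and that "100+ jobs" means at least 100; the `reputation_flags` name is hypothetical and this is not the platform's own implementation.

```bash
# Report which documented reputation thresholds a Cargo currently trips.
# Usage: reputation_flags <score> <success_rate 0..1> <jobs> <complaints_last_7d>
# Prints "ok" or any of: suspended, demoted, review (space-separated).
reputation_flags() {
  local score="$1" success_rate="$2" jobs="$3" complaints_7d="$4"
  awk -v s="$score" -v r="$success_rate" -v j="$jobs" -v c="$complaints_7d" 'BEGIN {
    out = ""
    if (s + 0 < 0.5) out = out " suspended"                    # auto-suspended
    if (j + 0 >= 100 && r + 0 < 0.90) out = out " demoted"     # trust level demoted
    if (c + 0 >= 10) out = out " review"                       # manual review
    if (out == "") out = " ok"
    print substr(out, 2)
  }'
}
```

For example, `reputation_flags 0.8 0.85 150 12` prints `demoted review`: the score is above 0.5, but an 85% success rate over 150 jobs and 12 recent complaints trip the other two thresholds.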

Common causes of reputation drops:

  • Container crashes on certain inputs — test with edge cases
  • Timeouts under load — optimize or increase resource requirements
  • Out-of-memory kills — increase required_ram_mb or optimize memory usage

Common Error Codes

| Error | Meaning | Action |
|---|---|---|
| ECONNREFUSED | Can't connect to Docker or NATS | Check that the service is running and accessible |
| IMAGE_NOT_ALLOWED | Container image is from an unapproved registry | Use an approved registry |
| SIGNATURE_INVALID | Cosign signature verification failed | Re-sign with a trusted key |
| LEASE_EXPIRED | Job execution exceeded the lease time | Optimize the Cargo or request lease extensions |
| OOM_KILLED | Container exceeded its memory limit | Reduce memory usage or increase requirements |
| QUOTA_EXCEEDED | User hit their job quota | Upgrade tier or wait for the quota to reset |
| KYC_REQUIRED | Identity verification needed | Complete KYC verification at /verify |

Getting Help

If your issue isn’t covered here:

  1. Check the FAQ for common questions
  2. Review the Security page for security-related issues
  3. Contact support with your job ID, Island ID, or submission ID for faster resolution

Next Steps

Karma Reference

Understand how karma is earned, lost, and how it affects your Island's placement.

Cargo Registry Security

Learn about the security model protecting Islands, consumers, and Cargos.

System Overview

Understand the full architecture to better diagnose issues.