Troubleshooting
Common issues and solutions for consumers, Islands, and publishers on Archipelag.io
This guide covers the most common issues encountered by consumers, Islands, and publishers, along with their solutions. Issues are organized by role so you can quickly find what’s relevant to you.
For Consumers
Job stuck in “Processing”
Symptoms: Your job shows “Processing…” but never completes. No output appears.
Possible causes:
| Cause | Solution |
|---|---|
| No Islands available for the Cargo | Wait for an Island to come online, or try again later |
| Island went offline mid-job | The job will be automatically requeued after lease expiry (up to 5 minutes) |
| Cargo container crashed | The job will be marked failed — check the error message and retry |
No streaming output
Symptoms: Job starts but you don’t see token-by-token output.
Possible causes:
- The Cargo doesn’t support streaming (some Cargos return all output at once)
- WebSocket connection was interrupted — refresh the page to reconnect
- The Cargo is still loading its model — watch for status messages like “Loading model…”
API errors
Common error responses:
| Status | Error | Solution |
|---|---|---|
| 401 | Unauthorized | Check your API key. Ensure it has the correct scopes (read/write). |
| 402 | Insufficient credits | Purchase more credits or check your balance. |
| 429 | Rate limited | You’ve exceeded 100 requests/minute. Wait and retry. |
| 422 | Validation error | Check input limits: max 128 messages, 32KB per message, 4096 max_tokens. |
| 503 | Service unavailable | The coordinator is restarting. Wait 15-30 seconds and retry. |
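The table above can be turned into a small client-side guard. The sketch below is only an illustration: the function names are hypothetical, messages are assumed to be plain strings measured in UTF-8 bytes, "32KB" is assumed to mean 32 * 1024 bytes, and the 60-second wait for a 429 is an assumption based on the per-minute limit.

```python
# Limits quoted in the 422 row of the table above.
MAX_MESSAGES = 128
MAX_MESSAGE_BYTES = 32 * 1024  # assumption: "32KB" means 32 * 1024 bytes
MAX_TOKENS = 4096


def validate_request(messages, max_tokens):
    """Return a list of problems that would likely cause a 422 response."""
    problems = []
    if len(messages) > MAX_MESSAGES:
        problems.append(f"too many messages: {len(messages)} > {MAX_MESSAGES}")
    for index, text in enumerate(messages):
        size = len(text.encode("utf-8"))
        if size > MAX_MESSAGE_BYTES:
            problems.append(f"message {index} is {size} bytes > {MAX_MESSAGE_BYTES}")
    if max_tokens > MAX_TOKENS:
        problems.append(f"max_tokens {max_tokens} > {MAX_TOKENS}")
    return problems


def retry_delay(status, attempt):
    """Return seconds to wait before retrying, or None if a retry will not help."""
    if status == 503:
        # The table suggests waiting 15-30 seconds while the coordinator restarts.
        return 15 if attempt == 0 else 30
    if status == 429:
        # Assumption: the 100 requests/minute window clears within a minute.
        return 60
    # 401, 402, and 422 need a change on the caller's side, not a retry.
    return None
```

Running `validate_request` before sending avoids a round trip for requests that exceed the documented limits, and `retry_delay` keeps retries limited to the two statuses the table describes as transient.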
Billing issues
- Charged but no result: If a job fails after being charged, credits are automatically refunded.
- Unexpected charges: Check your job history — each charge is linked to a specific job and Cargo.
- Credits not appearing after purchase: Stripe webhook processing is usually instant but can take up to a minute. If credits don’t appear after 5 minutes, contact support.
For Islands
Island won’t start
Connection refused to Docker
NATS connection failed
Island ID conflicts
Not receiving jobs
Possible causes:
| Cause | How to Check | Solution |
|---|---|---|
| Island not approved | Check logs for pairing code | Pair at /pair or ask admin to approve |
| Island not registered | Check logs for “Registered with coordinator” | Verify NATS connection URL and credentials |
| Island marked offline | Heartbeat logs should appear every 10s | Ensure heartbeats are sent and acknowledged |
| Runtime not advertised | Check DB supported_runtimes for your Island | Rebuild with --features gguf (or all-runtimes) |
| Cargo requires GPU but Island has none | Check whether the GGUF Cargo sets required_vram_mb > 0 | Set required_vram_mb = NULL for CPU-capable models |
| Doesn’t meet Cargo requirements | Compare Island capabilities vs. Cargo requirements | Upgrade hardware or target different Cargos |
| No warm containers | Other Islands with cached images get priority | Enable preloading in config or run a test job |
| Low karma | Islands below threshold get fewer jobs | Complete jobs reliably to build karma |
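To check the "Island marked offline" row, it can help to look for gaps in the heartbeat log. The sketch below assumes you have already extracted heartbeat timestamps from your logs (the log format is not specified here, so parsing is left out); the function name and the 5-second tolerance are hypothetical.

```python
from datetime import datetime, timedelta

HEARTBEAT_INTERVAL = timedelta(seconds=10)  # interval stated in the table above


def heartbeat_gaps(timestamps, tolerance=timedelta(seconds=5)):
    """Return (previous, current) pairs whose spacing exceeds interval + tolerance."""
    ordered = sorted(timestamps)
    limit = HEARTBEAT_INTERVAL + tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


# Example: heartbeats at 0s, 10s, 20s, then a 35-second silence.
start = datetime(2024, 1, 1, 12, 0, 0)
stamps = [start + timedelta(seconds=s) for s in (0, 10, 20, 55, 65)]
for previous, current in heartbeat_gaps(stamps):
    print(f"gap of {(current - previous).seconds}s starting at {previous.time()}")
```

An empty result suggests heartbeats are being sent on schedule, in which case the cause is more likely one of the other rows in the table.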
Karma dropping
Karma penalties are applied for failures:
| Event | Penalty | Equivalent Trust Lost |
|---|---|---|
| Job failure | -5 karma | ~5 hours of compute |
| Job timeout | -3 karma | ~3 hours of compute |
| Disconnect mid-job | -20 karma | ~20 hours of compute |
How to recover:
- Below the monetization threshold, you earn karma at 1.5x rate (recovery multiplier)
- Focus on reliability — a stable Island earns 1 karma per hour of compute
- Check your karma history for patterns:
GET /api/v1/hosts/{id}/karma
- If penalties seem incorrect, contact the platform admin for manual adjustment
See the Karma Reference for full details on the karma system.
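As a rough worked example of the recovery arithmetic, the sketch below estimates how many compute hours it takes to earn back lost karma. It assumes the 1.5x recovery multiplier simply scales the base rate of 1 karma per hour; the event keys and function name are hypothetical, and the real accounting may differ.

```python
# Penalty sizes from the table above (stored as positive karma lost).
PENALTIES = {"job_failure": 5, "job_timeout": 3, "disconnect_mid_job": 20}

BASE_RATE = 1.0            # karma earned per hour of compute
RECOVERY_MULTIPLIER = 1.5  # applies below the monetization threshold


def hours_to_recover(events, below_threshold=False):
    """Estimate compute hours needed to regain the karma lost to the given events."""
    lost = sum(PENALTIES[event] for event in events)
    rate = BASE_RATE * (RECOVERY_MULTIPLIER if below_threshold else 1.0)
    return lost / rate


print(hours_to_recover(["disconnect_mid_job"]))                        # 20.0
print(round(hours_to_recover(["disconnect_mid_job"], True), 1))        # 13.3
print(hours_to_recover(["job_failure", "job_timeout"]))                # 8.0
```

Under these assumptions, a single mid-job disconnect costs about 20 hours of compute at the base rate, or roughly 13.3 hours while the recovery multiplier is active.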
Connection drops
The Island software handles network instability gracefully:
- NATS reconnect: Infinite retry with backoff, automatic resubscription
- Heartbeat recovery: After 3 consecutive heartbeat failures, the Island software resubscribes to NATS topics with exponential backoff (up to 30 seconds)
- Active jobs: Jobs in progress continue executing even during brief disconnects. The Island software renews leases locally and publishes status when the connection recovers.
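To illustrate what exponential backoff with a 30-second ceiling looks like, here is a minimal sketch. Only the 30-second cap comes from the description above; the 1-second starting delay and the doubling factor are assumptions, and the Island software may use different values or add jitter.

```python
def backoff_delays(attempts, base=1.0, factor=2.0, cap=30.0):
    """Return the wait in seconds before each retry, doubling until the cap."""
    return [min(cap, base * factor ** n) for n in range(attempts)]


print(backoff_delays(7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

With these assumed values the delay reaches the cap on the sixth retry, so a long outage settles into one resubscription attempt every 30 seconds.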
If you’re experiencing frequent disconnects:
- Check your internet connection stability
- Verify no firewall is intermittently blocking port 4222
- Check system resources — high CPU/memory pressure can cause the Island software to miss heartbeat deadlines
- Review Island logs for NATS reconnection patterns
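A quick way to test the firewall item is to attempt a plain TCP connection to port 4222 from the Island machine. The sketch below uses only the Python standard library; the function name and the host in the usage line are placeholders, so substitute the NATS host from your own configuration.

```python
import socket


def port_reachable(host, port=4222, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Hypothetical usage: replace the host with your configured NATS endpoint.
# print(port_reachable("nats.example.invalid"))
```

A successful connection only shows that the port is open; it does not confirm that NATS credentials are valid. Running the check repeatedly over several minutes can help reveal intermittent blocking.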
For Publishers
Security scan failures
Critical or high vulnerabilities found
Scan keeps timing out
Submission rejected
Common rejection reasons:
| Reason | Solution |
|---|---|
| Missing input/output schema | Add JSON Schema definitions for your Cargo’s I/O format |
| Unreasonable resource requirements | Lower requirements to match actual usage — test with profiling |
| Unclear description | Explain what the Cargo does, what inputs it expects, and what output it produces |
| Policy violation | Review Cargo Registry policies — no malicious code, no data exfiltration |
| Image not on approved registry | Push to ghcr.io/archipelag-io or docker.io/archipelag |
Reputation dropping
Cargo reputation is tracked automatically:
- Score below 0.5: Cargo is auto-suspended
- Success rate below 90% (after 100+ jobs): Trust level is demoted
- 10+ complaints in 7 days: Flagged for manual review
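Publishers who track their own metrics can apply the same thresholds as an early warning. The sketch below encodes the three rules listed above; the function name, parameter names, and action labels are hypothetical, and how the platform actually computes the score is not covered here.

```python
def reputation_actions(score, successes, total_jobs, complaints_last_7_days):
    """Return the platform actions the listed thresholds would trigger."""
    actions = []
    if score < 0.5:
        actions.append("auto-suspend")
    # The success-rate rule only applies once at least 100 jobs have run.
    if total_jobs >= 100 and successes / total_jobs < 0.90:
        actions.append("demote trust level")
    if complaints_last_7_days >= 10:
        actions.append("manual review")
    return actions


print(reputation_actions(0.8, 89, 100, 10))
# ['demote trust level', 'manual review']
```

Checking these numbers after each release makes it easier to catch a regression before the Cargo crosses a threshold.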
Common causes of reputation drops:
- Container crashes on certain inputs — test with edge cases
- Timeouts under load — optimize or increase resource requirements
- Out-of-memory kills: increase required_ram_mb or optimize memory usage
Common Error Codes
| Error | Meaning | Action |
|---|---|---|
| ECONNREFUSED | Can't connect to Docker or NATS | Check that the service is running and accessible |
| IMAGE_NOT_ALLOWED | Container image from unapproved registry | Use an approved registry |
| SIGNATURE_INVALID | Cosign signature verification failed | Re-sign with a trusted key |
| LEASE_EXPIRED | Job execution exceeded lease time | Optimize Cargo or request lease extensions |
| OOM_KILLED | Container exceeded memory limit | Reduce memory usage or increase requirements |
| QUOTA_EXCEEDED | User hit their job quota | Upgrade tier or wait for quota reset |
| KYC_REQUIRED | Identity verification needed | Complete KYC verification at /verify |
Getting Help
If your issue isn’t covered here:
- Check the FAQ for common questions
- Review the Security page for security-related issues
- Contact support with your job ID, Island ID, or submission ID for faster resolution
Next Steps
Cargo Registry Security
Learn about the security model protecting Islands, consumers, and Cargos.
System Overview
Understand the full architecture to better diagnose issues.
{% end %}
