Sawyer — Distributed MoE Inference Network

Architecture

Mixture-of-Experts, Distributed

MoE models activate only a fraction of their experts per token: 2 of 8 (25%) for Mixtral 8x7B. Each expert is independent, so experts can be placed on separate consumer GPUs. No tensor parallelism required.

Router

Receives your request, runs the gating network locally, and activates only the 2-6 relevant experts across the network. Aggregates their outputs and returns the result to you.
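
As a rough illustration of the gating step, the sketch below applies a softmax to per-expert gate scores and keeps the top k. The function name `select_experts`, the example logits, and the renormalization of the kept weights are assumptions for illustration, not Sawyer's actual router code.

```python
import math

def select_experts(gate_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    gate_logits: one raw gate score per expert (hypothetical values).
    Weights are softmax probabilities renormalized over the selected experts.
    """
    if not 1 <= k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Numerically stable softmax over all experts.
    peak = max(gate_logits)
    exps = [math.exp(x - peak) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts, then renormalize so their weights sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]

# Example: 8 experts (as in Mixtral 8x7B), 2 active per token.
logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]
chosen = select_experts(logits, k=2)
```

Only the experts in `chosen` would receive work; the others stay idle for that token.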

Expert Nodes

Each volunteer GPU hosts 1-3 expert weight files (~1.5GB each for Mixtral). A single RTX 3090 can serve inference requests while you game or work.
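
A back-of-the-envelope capacity check, assuming the ~1.5 GB per-expert figure above and a hypothetical amount of VRAM the owner keeps for games or other work. It ignores activation and runtime overhead, so treat it as an upper bound rather than a placement rule.

```python
def experts_that_fit(vram_gb, reserved_gb, expert_gb=1.5, max_experts=3):
    """Number of expert weight files a node could host.

    vram_gb:     total GPU memory
    reserved_gb: memory the owner keeps for other work (assumption)
    expert_gb:   per-expert size (~1.5 GB for Mixtral, per the text)
    max_experts: upper end of the 1-3 experts-per-node range in the text
    """
    free = max(0.0, vram_gb - reserved_gb)
    return min(max_experts, int(free // expert_gb))

# RTX 3090 (24 GB) with 16 GB kept back for the owner.
count = experts_that_fit(24, 16)
```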

Bedrock Identity

Every node holds a cryptographic identity from Bedrock. Consent tokens gate which models a node will serve. Every inference is audit-logged.

Token Economics

$5/month gives you a token budget. Tokens debit per inference. Hosts earn a 70% share of what you spend, proportional to compute contributed.
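
One way the 70% host share could be divided in proportion to compute is sketched below. The `compute_units` metric, the node names, and the largest-remainder rounding are assumptions; the text only states the 70% share and that it is proportional.

```python
def split_host_share(spend_cents, compute_units, host_share_pct=70):
    """Split the host share of a subscriber's spend across nodes.

    spend_cents:   amount the subscriber spent, in cents
    compute_units: {node_id: units of compute contributed} (hypothetical metric)
    Returns {node_id: payout in cents}; payouts sum to the host pool.
    """
    pool = spend_cents * host_share_pct // 100
    total_units = sum(compute_units.values())
    if total_units <= 0:
        return {node: 0 for node in compute_units}
    # Floor each proportional share, then hand leftover cents to the
    # nodes with the largest fractional remainders.
    payouts = {n: pool * u // total_units for n, u in compute_units.items()}
    leftover = pool - sum(payouts.values())
    by_remainder = sorted(
        compute_units,
        key=lambda n: (pool * compute_units[n]) % total_units,
        reverse=True,
    )
    for node in by_remainder[:leftover]:
        payouts[node] += 1
    return payouts

# A $5.00 spend served by two nodes, one doing three times the work.
payouts = split_host_share(500, {"node-a": 3, "node-b": 1})
```

Working in integer cents avoids floating-point drift, and the remainder step guarantees the payouts add up to exactly the host pool.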

Smart Routing

Adaptive routing balances load (60%) and latency (40%). Falls back to redundant experts on timeout. No single point of failure.
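
The 60/40 weighting can be read as a scoring function over candidate nodes. This is a minimal sketch under stated assumptions: the field names, the 0-1 load scale, and normalizing latency against a 200 ms ceiling are all hypothetical.

```python
def rank_replicas(replicas, load_weight=0.6, latency_weight=0.4, max_latency_ms=200.0):
    """Order candidate nodes for one expert, best first.

    replicas: list of dicts with hypothetical fields
              'node' (str), 'load' (0.0-1.0), 'latency_ms' (float).
    Lower score is better. Latency is scaled to 0-1 by max_latency_ms
    (an assumed normalization) so the two terms are comparable.
    """
    def score(r):
        latency = min(r["latency_ms"] / max_latency_ms, 1.0)
        return load_weight * r["load"] + latency_weight * latency
    return sorted(replicas, key=score)

replicas = [
    {"node": "a", "load": 0.9, "latency_ms": 50},
    {"node": "b", "load": 0.2, "latency_ms": 150},
    {"node": "c", "load": 0.5, "latency_ms": 100},
]
order = [r["node"] for r in rank_replicas(replicas)]
```

The first entry would be the primary target; the remaining entries are the redundant replicas to try on timeout.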

Quantized Models

Q4_K_M quantization fits frontier models on consumer hardware. A Mixtral 8x7B expert is ~1.5 GB; the full model is ~24 GB. Each piece runs independently.

How It Works

From request to result in 6 steps

No setup required. Subscribe, get an API key, and send requests like any other inference API.

Subscribe

Pick a tier. Get an API key instantly. Explorer starts at $5/month with 500K tokens.

Send a Request

Use the Sawyer API like any inference endpoint. The router receives your prompt and token embedding.

Router Selects Experts

The gating network identifies which 2-6 experts are needed. Only those experts activate. The rest stay dormant.

Nodes Compute in Parallel

Expert nodes run forward passes concurrently. Typical latency: 50-200 ms per expert on consumer hardware.
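
Concurrent dispatch with a timeout fallback might look like the sketch below. The remote call is simulated with `asyncio.sleep`, and the node names, delays, and 200 ms timeout are made up for illustration.

```python
import asyncio

async def call_expert(node, delay_s):
    """Stand-in for a remote forward pass; sleeps instead of doing real work."""
    await asyncio.sleep(delay_s)
    return f"output-from-{node}"

async def call_with_fallback(replicas, timeout_s):
    """Try each (node, simulated_delay) replica in order until one answers in time."""
    for node, delay_s in replicas:
        try:
            return await asyncio.wait_for(call_expert(node, delay_s), timeout_s)
        except asyncio.TimeoutError:
            continue  # fall back to the next redundant replica
    raise RuntimeError("all replicas timed out")

async def run_experts(plan, timeout_s=0.2):
    """Run every selected expert concurrently and collect results by expert id."""
    tasks = [call_with_fallback(replicas, timeout_s) for replicas in plan.values()]
    results = await asyncio.gather(*tasks)
    return dict(zip(plan.keys(), results))

# Expert e2's primary node is too slow, so its backup answers instead.
plan = {
    "e2": [("node-slow", 0.5), ("node-backup", 0.01)],
    "e5": [("node-fast", 0.01)],
}
results = asyncio.run(run_experts(plan))
```

Because the experts run under `asyncio.gather`, total wall time is bounded by the slowest expert (including its fallbacks), not the sum of all of them.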

Get Your Result

The router aggregates expert outputs and returns your response. Tokens are debited from your budget.
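
A minimal sketch of the aggregation and debit steps, assuming expert outputs are plain vectors combined by their gate weights and that the budget is a simple token counter. The names and numbers are hypothetical.

```python
def aggregate(expert_outputs, weights):
    """Weighted sum of per-expert output vectors.

    expert_outputs: {expert_id: list of floats}, all the same length
    weights:        {expert_id: gate weight}, assumed to sum to 1
    """
    length = len(next(iter(expert_outputs.values())))
    combined = [0.0] * length
    for expert_id, vector in expert_outputs.items():
        w = weights[expert_id]
        for i, value in enumerate(vector):
            combined[i] += w * value
    return combined

def debit(budget_tokens, tokens_used):
    """Subtract usage from the remaining budget, refusing to go negative."""
    if tokens_used > budget_tokens:
        raise ValueError("token budget exhausted")
    return budget_tokens - tokens_used

combined = aggregate({"e2": [1.0, 0.0], "e5": [0.0, 2.0]}, {"e2": 0.75, "e5": 0.25})
remaining = debit(500_000, 1_200)
```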

Hosts Get Paid

70% of what you spend goes to the nodes that served you. Monthly or quarterly payouts via Stripe Connect.

Pricing

Cheap enough to experiment, powerful enough to ship

No per-token surprises. Fixed monthly budgets with rollover. Cancel anytime.

Explorer

Starter

$5/mo

500,000 tokens per month

All supported models
Token rollover (1 month)
Standard routing
Community support

Get Started

Adventurer

Developer

$15/mo

2,000,000 tokens per month

All supported models
Token rollover (1 month)
Priority routing
Email support

Get Started

Pioneer

Production

$40/mo

10,000,000 tokens per month

All supported models
Token rollover (1 month)
Adaptive routing (priority)
Direct support

Get Started
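
The one-month rollover listed for each tier could work as sketched below. This assumes rolled-over tokens are spent first and expire if unused, so only the current month's fresh allotment can carry forward; the page does not spell out these rules.

```python
def close_month(monthly_tokens, rollover_in, used):
    """Return the tokens carried into the next month.

    monthly_tokens: fresh allotment for the tier (e.g. 500_000 for Explorer)
    rollover_in:    tokens carried over from the previous month
    used:           tokens consumed this month
    Assumption: rolled-over tokens are consumed first and expire at month end.
    """
    if used > monthly_tokens + rollover_in:
        raise ValueError("usage exceeds available budget")
    used_from_fresh = max(0, used - rollover_in)
    return monthly_tokens - used_from_fresh

# Month 1: 300K of 500K used, so 200K rolls over.
r1 = close_month(500_000, 0, 300_000)
# Month 2: only 100K used; the rest of the old rollover expires,
# and the untouched fresh allotment rolls forward.
r2 = close_month(500_000, r1, 100_000)
```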

Host a Node

Turn idle GPU time into real income

One command to join. Your GPU serves expert inference requests in the background. You earn money while your machine sits idle.

70% Host Share

70%

Of the revenue from every token you serve, 70% goes to you. The other 30% sustains network routing and infrastructure.

Min Payout

$10

Minimum balance for a monthly payout, or $25 for quarterly. Paid via Stripe Connect. 1099-K tax reporting is handled automatically.
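
The threshold rule can be expressed as a small check. The sketch assumes a balance below the minimum simply carries forward to the next period; the function and constant names are hypothetical.

```python
# Minimum payout per schedule, in cents ($10 monthly, $25 quarterly).
PAYOUT_MINIMUM_CENTS = {"monthly": 1000, "quarterly": 2500}

def payout_due(balance_cents, schedule):
    """Amount to pay out now, in cents; 0 if the balance is under the minimum."""
    minimum = PAYOUT_MINIMUM_CENTS[schedule]
    return balance_cents if balance_cents >= minimum else 0

monthly = payout_due(1250, "monthly")      # clears the $10 minimum
quarterly = payout_due(1250, "quarterly")  # under the $25 minimum, carries forward
```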

One Command

$ sawyer register

Verified Identity

KYC/AML

Stripe Connect Express handles onboarding, bank verification, and tax reporting. Your personal data stays with Stripe.

Supported Models

Frontier models on consumer hardware

Quantized MoE models that split across GPUs. Each expert runs independently, so the network scales horizontally.

Model             Params  Experts      Active/Token  Q4 Size  Expert Size
Mixtral 8x7B      46.7B   8            2             ~24 GB   ~1.5 GB
DeepSeek-V2 Lite  15.7B   64 (shared)  6             ~9 GB    varies
Qwen2.5 7B MoE    14.3B   60           4             ~7 GB    varies
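
To compare how sparse each model is, the share of experts active per token can be computed straight from the table. Note this counts experts, not parameters: layers outside the experts, such as attention, run for every token, so the active-parameter share is not the same number.

```python
# (total experts, active experts per token), taken from the table above.
models = {
    "Mixtral 8x7B": (8, 2),
    "DeepSeek-V2 Lite": (64, 6),
    "Qwen2.5 7B MoE": (60, 4),
}

def active_fraction(total_experts, active_per_token):
    """Share of experts that run for a single token."""
    return active_per_token / total_experts

fractions = {name: active_fraction(*counts) for name, counts in models.items()}
```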

CLI

From zero to inference in 3 commands

# Create your account
$ sawyer account create --tier explorer

# Register your GPU as a host node
$ sawyer provider register --email you@example.com --name "MyNode"

# Start serving inference requests
$ sawyer serve --gpu

Sawyer Node Started
  Node:       sawyer-node-abc123
  Experts:    mixtral-8x7b/e2, mixtral-8x7b/e5
  GPU:        NVIDIA RTX 3090 (24 GB)
  Status:     Healthy
  Earnings:   $0.00

The load is split.
Friends help.

Start building with frontier AI for $5/month.
