SL
Skeptik Log
youtube

Claude Code for Free: Three Ways to Run It Without Opening Your Wallet

By Skeptik Log

Claude Code costs nothing if you run it on alternative models. Ollama gives you GLM-5.1 from the cloud or Gemma 4 locally, OpenRouter hands you Elephant Alpha for free. The trade-off? No Claude under the hood, but surprisingly solid results at zero cost.

🎬 Article based on a YouTube video. Sources: Julian Goldie, YouTube video, Ollama docs, OpenRouter

Where we’re going

If you’re paying $20/month for Claude Code, you might not know there are three ways to get a similar experience for free. Julian Goldie walks through them in his video: Ollama with a cloud model, Ollama with a local model, and OpenRouter with a free alpha model. The video is heavy on pitches for his community, but the technical content is legit.

Claude Code, minus Claude

Let’s clear this up first. “Claude Code for free” doesn’t mean using Anthropic’s Claude model. It means using the Claude Code client, the command-line coding agent, and pointing it at different models. Claude Code is a client. The model is a separate decision.

That means the experience shifts. Smaller or less capable models will make mistakes that Claude wouldn’t. But for plenty of day-to-day coding tasks, the gap is smaller than you’d think.

Method 1: Ollama + GLM-5.1 (cloud)

The simplest approach: run a cloud model through Ollama.

ollama run glm-5.1:cloud
ollama launch claude --model glm-5.1:cloud

GLM-5.1 comes from Z.AI with solid credentials: SWE-Bench Pro SOTA (at release time) and 68.5% on Terminal-Bench 2.0. It’s built for coding and agentic work, and it shows.

Pros:

  • Zero local setup: no beefy hardware needed, the model runs on Ollama’s cloud
  • Good speed: latency is API-call level, not local-inference level
  • High quality: GLM-5.1 ranks among the strongest open models for coding tasks

Cons:

  • Token limits: the free tier has a per-session and per-week token budget. Long refactors might exhaust it
  • Connection required: no internet, no coding agent
Deep dive. Since January 2026 (Ollama v0.14), Ollama exposes an **Anthropic Messages API compatibility layer** on `localhost:11434`. This means Claude Code can connect directly to any Ollama model without proxies or manual configuration. The `ollama launch` command handles endpoint and model setup automatically.

Method 2: Ollama + Gemma 4 (local)

The second approach is for people who want everything on their own machine. No cloud, no costs, no logs.

ollama run gemma4:31b
ollama launch claude --model gemma4:31b

Gemma 4 by Google is an open-weight model that runs locally. The 31B dense version is the most capable, but there’s also a 26B MoE variant that needs fewer resources.

Pros:

  • Zero cost, unlimited tokens: the model lives on your disk, tokens are infinite
  • Full privacy: your code never leaves your machine
  • Offline: works without an internet connection

Cons:

  • Hardware: the 31B dense needs at least 16GB unified RAM (24GB to be comfortable). The 26B MoE drops to 6GB
  • Speed: on a Mac Mini M4 Pro with 24GB, the 31B runs at roughly 15 tokens/s. Fine for thoughtful work, slow for rapid iteration
  • Quality: Gemma 4 is good, but it’s not Claude. On complex multi-file tasks, the gap shows
Try it yourself - Minimum requirements: 16GB RAM for 31B dense, 6GB for 26B MoE - Command: `ollama run gemma4:31b` then `ollama launch claude --model gemma4:31b` - On Mac Mini M4 Pro 24GB: ~15 token/s, 256K context

Method 3: OpenRouter + Elephant Alpha (free API)

The third one is the most curious. Elephant Alpha is a stealth model that appeared on OpenRouter on April 13, 2026. Nobody knows who trained it: the provider is OpenRouter itself. 100 billion parameters, 256K context, zero cost.

To use it with Claude Code, configure OpenRouter as your API provider and point it at Elephant Alpha.

Pros:

  • 100B parameters: the largest model of the three, potentially the most capable
  • 256K context: plenty of memory for large files and extended codebases
  • Function calling and structured output: natively supported
  • Free: $0/M for both input and output

Cons:

  • Alpha: it’s in testing. It could change, disappear, or become paid at any point
  • Privacy: prompts may be logged by the provider. The model page states: “Prompts and completions may be logged by the provider and used to improve the model.” If you’re working on proprietary code, think twice
  • Mystery: no documentation on architecture, training data, or who’s behind it. “Elephant Alpha” could be anything
Deep dive. Elephant Alpha also supports **prompt caching**, which reduces latency (and cost, already zero) on repetitive contexts. The privacy note is explicit: your prompts are not private. For open-source code or learning, no problem. For company code, it's a risk.

Which one? Depends on context

There’s no clear winner. Each method has its ideal use case:

Scenario Method Why
Quick prototyping, single tasks GLM-5.1 cloud Fast, zero setup, high quality
Sensitive code, offline work Gemma 4 local Full privacy, unlimited tokens
Large codebase, long context Elephant Alpha 256K context, 100B params
Complex multi-file refactoring GLM-5.1 cloud Best agentic reasoning
Open-ended experimentation Gemma 4 local No token limits, iterate as much as you want
Deep dive. You can configure different backends per project via `.claude/settings.local.json`. A hybrid approach works well: **local for single tasks and experimentation**, **cloud for complex multi-file refactoring**. You don't have to pick one for everything.

For the technically inclined

From here on it gets technical. If you care about the idea more than the implementation, skip to the conclusion.

Ollama’s compatibility layer

Since Ollama v0.14 (January 2026), the daemon exposes an endpoint compatible with the Anthropic Messages API at localhost:11434/v1/messages. Any client that speaks the Anthropic protocol, Claude Code included, can point to Ollama as if it were the official API.

The flow:

  1. Ollama downloads and loads the model (local or cloud)
  2. It exposes the compatible endpoint
  3. ollama launch claude automatically sets ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY
  4. Claude Code talks to Ollama as if talking to Anthropic

No proxy, no litellm, no manual configuration. It just works.

Elephant Alpha: what we know

Not much, honestly. The specs published by OpenRouter:

Parameter Value
Parameters 100B
Context 256K tokens
Input $0/M
Output $0/M
Function calling Yes
Structured output Yes
Prompt caching Yes
Provider OpenRouter (unknown)

The “intelligence efficiency” label suggests a model optimized to produce quality responses with fewer tokens. Think reasoning efficiency: think well, waste little. But without a paper, without independent benchmarks, it’s all unverified.

Configuring OpenRouter with Claude Code

To use Elephant Alpha (or any OpenRouter model) with Claude Code:

export ANTHROPIC_BASE_URL=https://openrouter.ai/api/v1
export ANTHROPIC_API_KEY=your-openrouter-api-key
claude --model openrouter/elephant-alpha

Or through the project’s configuration file.

The bottom line

Key points:

  • Claude Code is a client: you can plug any model that speaks the Anthropic protocol
  • Ollama + GLM-5.1 cloud is the easiest path to immediate quality at zero cost
  • Ollama + Gemma 4 local is the privacy and offline choice, at the cost of speed
  • Elephant Alpha is the wildcard: powerful, free, but alpha and with privacy caveats

The free coding agent isn’t an experiment anymore. It’s a real choice with real trade-offs. The question isn’t “can I afford Claude Code?” anymore, it’s “which compromise am I willing to make?”

youtube By Skeptik Log