
macOS 26 ships a local 3B LLM — and you can actually use it

Kostiantyn Lysenko
Author

If you run macOS 26 Tahoe on Apple Silicon, you already have a language model on disk. Apple ships a model of roughly 3 billion parameters with the OS and exposes it through the FoundationModels framework; it’s the same on-device model Apple Intelligence runs on. It’s heavily quantized (Apple uses 2-bit quantization-aware training, or QAT), it runs locally, and it costs nothing to call.
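For a sense of what 2-bit storage buys, here is a back-of-envelope estimate that assumes every one of the ~3 billion weights is stored at exactly 2 bits. It ignores any layers kept at higher precision and all runtime memory, so treat it as a floor for the weights alone, not a measurement:

$$
3\times10^{9}\ \text{weights} \times 2\ \tfrac{\text{bits}}{\text{weight}} = 6\times10^{9}\ \text{bits} = 7.5\times10^{8}\ \text{bytes} \approx 0.75\ \text{GB}
$$

The same weights at 16 bits would need $3\times10^{9}\times 16 / 8 = 6\times10^{9}$ bytes, about 6 GB, so 2-bit storage is roughly 8 times smaller.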

The catch: the API is Swift-only, so you can’t just curl it.

apfel fixes that. It wraps the built-in model in a CLI and an OpenAI-compatible server, so anything that speaks the OpenAI API can talk to your Mac’s local model — no keys, no cloud, no Xcode.

Install

brew install apfel

Needs macOS 26 Tahoe or later on an Apple Silicon Mac (M1 or newer) with Apple Intelligence turned on. The model comes with the OS, so apfel has nothing extra to download.
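If a script should fail fast on unsupported machines, the OS-version and CPU part of those requirements can be checked up front. This is a minimal sketch with hypothetical function names; it does not check System Settings toggles such as Apple Intelligence, and it relies on the `sw_vers` tool that ships with macOS:

```python
import platform
import subprocess


def meets_requirements(mac_version: str, machine: str) -> bool:
    """True if a macOS version string and CPU architecture satisfy the
    stated requirements: macOS 26 or later on Apple Silicon (arm64)."""
    try:
        major = int(mac_version.strip().split(".")[0])
    except ValueError:  # empty or malformed version string
        return False
    return major >= 26 and machine == "arm64"


def this_mac_qualifies() -> bool:
    """Check the current machine. Always False when not running on macOS."""
    if platform.system() != "Darwin":
        return False
    # sw_vers ships with macOS and prints the product version string.
    version = subprocess.run(
        ["sw_vers", "-productVersion"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return meets_requirements(version, platform.machine())
```

Keeping the comparison in a pure function makes it testable on any machine; only `this_mac_qualifies` touches the real system.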

Use it

One-shot from the shell:

apfel "What is the capital of Austria?"

[Screenshot: apfel writing a haiku, fully offline]
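Because the one-shot form is a plain command, it also drops into scripts. A minimal Python sketch, assuming apfel prints only the answer to stdout (the post shows the command but not its exact output format); `ask` and its parameters are hypothetical names:

```python
import subprocess


def ask(prompt: str, binary: str = "apfel", timeout: float = 120.0) -> str:
    """Run a one-shot prompt through the CLI and return its stdout.

    Assumes the CLI writes just the model's answer to stdout.
    Raises CalledProcessError on a non-zero exit and
    TimeoutExpired if the command runs longer than `timeout` seconds.
    """
    result = subprocess.run(
        [binary, prompt],  # argv list, so no shell quoting is needed
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout.strip()
```

Usage: `summary = ask("Summarize this in one line: " + text)`. Passing the prompt as a single argv element avoids shell-quoting problems, and the `binary` parameter makes it easy to substitute a stub such as `echo` in tests.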

Interactive chat:

apfel --chat

OpenAI-compatible server:

apfel --serve   # http://localhost:11434/v1

The server is the part that matters. Point any OpenAI SDK at the local endpoint:

from openai import OpenAI

# apfel doesn't check keys, but the SDK refuses to start without one
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="apple-foundationmodel",  # the single model listed by /v1/models
    messages=[{"role": "user", "content": "Summarize this in one line: ..."}],
)
print(resp.choices[0].message.content)
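Under the hood the SDK just sends JSON over HTTP, so a script with no third-party dependencies can make the same call with Python’s standard library. A minimal sketch, assuming the server follows the usual OpenAI request and response shapes for `/v1/chat/completions`; the helper names are hypothetical:

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"  # endpoint printed by `apfel --serve`


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST to /chat/completions in the OpenAI wire format."""
    body = {
        "model": "apple-foundationmodel",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json: dict) -> str:
    """Pull the assistant text out of a chat completion response."""
    return response_json["choices"][0]["message"]["content"]


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send one prompt to the local server and return the reply text."""
    req = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

With `apfel --serve` running, `print(chat("What is the capital of Austria?"))` should behave like the SDK example. Splitting request building and response parsing out of `chat` keeps both testable without a server.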

/v1/chat/completions supports streaming, tool calling, and JSON-schema output. /v1/models lists the one model you get.
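With the official SDK, passing `stream=True` to `client.chat.completions.create` handles streaming for you. To show what that involves, here is a standard-library sketch that assumes apfel streams in the usual OpenAI server-sent-events format (`data:` lines, ending with `data: [DONE]`). Streaming is listed as supported, but the exact wire format here is an assumption, and the function names are hypothetical:

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text deltas from an OpenAI-style server-sent-event stream.

    Events arrive as 'data: {chunk json}' lines separated by blank lines,
    and the stream ends with 'data: [DONE]'.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # anything after the sentinel is ignored
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # the first delta often carries only the role
                yield text


def stream_chat(prompt: str, base_url: str = "http://localhost:11434/v1") -> Iterator[str]:
    """POST a streaming chat request and yield text as it arrives."""
    body = {
        "model": "apple-foundationmodel",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating the HTTP response yields raw byte lines.
        yield from iter_stream_text(line.decode("utf-8") for line in resp)
```

Usage: `for piece in stream_chat("Write a haiku about Vienna"): print(piece, end="", flush=True)`.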

Where it fits

It’s a 3B model with a 4096-token context window, and input and output share that budget. That rules out long documents and agent workloads, where conversation history and tool output fill the window quickly. But for summarization, entity extraction, classification, short rewrites, and quick local glue in scripts, it’s free, private, and already installed. No token meter running.
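To make the 4096-token limit concrete: if you reserve 256 tokens for the answer and about 100 for chat formatting, roughly 3,740 tokens remain for input. Under the common rule of thumb of about 4 characters per token, that is around 15,000 characters of English text. The sketch below turns that into a pre-flight check and pairs it with a classification request that uses JSON-schema output. The 4-character heuristic, the 100-token margin, the system prompt, the `max_tokens` field being honoured, and all function names are assumptions for illustration, not part of apfel:

```python
import json

CONTEXT_TOKENS = 4096  # shared by input and output


def rough_token_count(text: str) -> int:
    """Estimate tokens with the ~4-characters-per-token rule of thumb.

    This is not Apple's tokenizer; real counts vary by language and
    content, so keep some headroom.
    """
    return (len(text) + 3) // 4  # ceiling division


def fits_budget(prompt: str, max_output_tokens: int, overhead: int = 100) -> bool:
    """True if the prompt, the reserved output, and a margin for chat
    formatting all fit inside the shared context window."""
    return rough_token_count(prompt) + max_output_tokens + overhead <= CONTEXT_TOKENS


def build_classification_body(text: str, labels: list, max_output_tokens: int = 32) -> dict:
    """Chat request asking for {"label": <one of labels>} via a JSON schema."""
    schema = {
        "type": "object",
        "properties": {"label": {"type": "string", "enum": labels}},
        "required": ["label"],
        "additionalProperties": False,
    }
    return {
        "model": "apple-foundationmodel",
        "messages": [
            {"role": "system", "content": "Classify the text. Answer with JSON only."},
            {"role": "user", "content": text},
        ],
        # OpenAI-style structured output; which schema keywords apfel
        # enforces (e.g. "enum") is an assumption to verify.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "classification", "schema": schema},
        },
        "max_tokens": max_output_tokens,  # assumes the server honours this field
    }


def parse_label(reply_content: str) -> str:
    """Extract the label from the model's JSON reply."""
    return json.loads(reply_content)["label"]
```

Usage: check `fits_budget(text, 32)` first, then send the body with the OpenAI client from the example above, e.g. `resp = client.chat.completions.create(**build_classification_body(text, ["bug", "feature", "question"]))`, and read the result with `parse_label(resp.choices[0].message.content)`. Texts that fail the check need chunking or a different model.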

Repo: https://github.com/Arthur-Ficial/apfel

