Reference

The Offline AI Glossary

Nineteen terms, defined precisely enough to build with. Words are load-bearing in this field: "on-premises" and "air-gapped" differ by an entire threat model.

01 Air gap: Physical isolation of a computer system from all networks (no internet, LAN or wireless), so data cannot enter or leave except by removable media. Isolation by physics, not firewall policy.
02 Air-gapped AI: An AI system, typically a large language model, running on air-gapped hardware. The model answers locally, and data cannot leak over a network because no network path out exists. Full treatment in Air-Gapped AI, Explained.
03 Offline LLM: A large language model whose weights and inference run entirely on local hardware with no internet dependency. Every air-gapped LLM is offline; an offline LLM becomes air-gapped when the isolation is engineered and verified.
04 Large language model (LLM): A neural network trained on large text corpora to understand and generate language, and the engine behind modern AI assistants. Mechanically, a file of weights plus software to run them, which is why it can live on your machine.
05 Open-weight model: A model whose trained weights are published under a licence permitting download and local use. The prerequisite for genuine offline deployment: closed API-only models cannot be owned, only rented.
06 NVIDIA Nemotron: NVIDIA's family of open-weight LLMs, optimised for NVIDIA GPUs and ranging from edge-scale Nano variants that run on a Jetson to larger Super-class models for nodes and racks. AIOD's standard stack; see the model stack.
07 Inference: Running a trained model to produce answers, as distinct from training it. Local inference means this computation happens on hardware you control, which is the defining act of an offline appliance.
08 Token: The unit of text an LLM reads and writes, roughly three-quarters of an English word on average. Cloud APIs bill per token; on an owned appliance, the marginal cost of a token is approximately the electricity needed to generate it.
09 Tokens per second (tok/s): Inference throughput. Around 10–20 tok/s sustains fluent conversation; 30+ exceeds comfortable reading speed. Planning ranges by hardware class are in our benchmarks.
10 Quantisation: Storing model weights at reduced numerical precision, commonly 4–8 bits instead of 16, which cuts weight memory to roughly a half (8-bit) or a quarter (4-bit) and lowers power draw, at a modest quality cost. The technique that makes battery-powered LLMs practical.
11 VRAM: GPU memory, and the binding constraint in local inference. The model must fit in VRAM with headroom for context and retrieval; sizing maths is in the Nemotron guide.
12 Retrieval-augmented generation (RAG): Architecture that retrieves relevant passages from a local document corpus and supplies them to the model at question time, grounding answers in authoritative sources and letting smaller models punch far above their weight.
13 Corpus: The curated body of documents an appliance retrieves from: manuals, protocols, case files, reference works. The model is the engine; the corpus decides what the machine is for.
14 Edge AI: AI computation performed on devices at the point of use (vehicles, sensors, portable units) rather than in a data centre. AIOD Field units are edge AI in its most literal form: sealed, portable, battery-fed.
15 Fine-tuning: Further training of a model on domain data to specialise its behaviour. Distinct from RAG, which supplies knowledge at question time without changing weights; most missions want RAG first, and fine-tuning only when justified.
16 Knowledge pack: AIOD's sustainment mechanism. A signed, encrypted drive carries model, software and corpus updates, which are verified and applied locally, so an offline appliance improves on schedule without ever touching a network. Detail under Knowledge Packs.
17 Telemetry: Usage data sent from software back to its vendor. A genuinely offline system has none. Zero telemetry is a defining property of the air gap, not a privacy setting.
18 Sovereign AI: AI capability owned and operated under one's own control (weights, hardware and data custody), independent of any vendor, subscription or foreign jurisdiction. The property all AIOD deployments share, whatever their mission.
19 Google Gemma: Google's family of open-weight LLMs, from compact ~1B variants up to 27B-class models, with strong multilingual coverage. Alongside Nemotron, one of the two families AIOD deploys offline. Guide: Run Gemma Locally.
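The Token and Tokens per second entries can be turned into planning arithmetic. The following is a minimal Python sketch that assumes the glossary's rule of thumb of about 0.75 words per token; the 60 W power draw used in the example is a hypothetical figure, not a measured AIOD number.

```python
# Rule of thumb from the glossary: one token is roughly 3/4 of an English word.
WORDS_PER_TOKEN = 0.75


def tokens_for_words(words: int) -> float:
    """Approximate token count for a passage of the given word count."""
    return words / WORDS_PER_TOKEN


def generation_seconds(words: int, tok_per_s: float) -> float:
    """Time to generate an answer of `words` words at a given throughput."""
    return tokens_for_words(words) / tok_per_s


def kwh_per_million_tokens(watts: float, tok_per_s: float) -> float:
    """Energy per million generated tokens, assuming a steady power draw."""
    joules_per_token = watts / tok_per_s  # (J/s) / (tok/s) = J per token
    return joules_per_token * 1_000_000 / 3_600_000  # 1 kWh = 3.6 MJ


# A 300-word answer is about 400 tokens: 40 s at 10 tok/s, 20 s at 20 tok/s.
print(tokens_for_words(300), generation_seconds(300, 10), generation_seconds(300, 20))
# Hypothetical appliance drawing 60 W while generating 20 tok/s.
print(round(kwh_per_million_tokens(60, 20), 3))
```

Multiplying the kWh figure by a local electricity tariff gives the marginal cost the Token entry describes.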
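To make the Quantisation entry concrete, here is a minimal sketch of symmetric per-tensor quantisation in plain Python. Each weight is divided by one shared scale and rounded to a signed integer. This is one simple scheme chosen for illustration; production formats typically use per-block scales and more careful rounding. The function names and the 8-billion-parameter model are hypothetical.

```python
def quantise_symmetric(weights: list[float], bits: int = 8) -> tuple[list[int], float]:
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights; each error is at most scale / 2."""
    return [v * scale for v in q]


def weight_memory_gb(n_params: float, bits: int) -> float:
    """Weight storage in decimal gigabytes, ignoring per-block scale overhead."""
    return n_params * bits / 8 / 1e9


weights = [0.12, -0.5, 0.33, 0.0, 0.49]
q8, s8 = quantise_symmetric(weights, bits=8)
print(q8)  # [30, -127, 84, 0, 124]

# Hypothetical 8B-parameter model: 16 GB at 16-bit, 8 GB at 8-bit, 4 GB at 4-bit.
for bits in (16, 8, 4):
    print(bits, weight_memory_gb(8e9, bits))
```

The halving and quartering in the glossary entry fall straight out of `bits / 8`; real 4-bit formats add a little overhead for their stored scales.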
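The VRAM entry's sizing rule can be approximated as weights plus key/value (KV) cache plus headroom. The sketch below assumes a hypothetical model with 32 layers, 8 KV heads, a head dimension of 128 and an 8,192-token context stored at 16-bit precision; the 10% headroom default is also an assumption. Real figures should come from the model's published configuration and measured runs.

```python
GIB = 2 ** 30  # bytes per GiB, the unit GPU memory is usually quoted in


def weight_bytes(n_params: float, bits: int) -> float:
    """Bytes needed to hold the model weights at the given precision."""
    return n_params * bits / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Keys and values (the factor of 2) cached per layer, KV head and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def fits_in_vram(n_params: float, bits: int, kv_bytes: int,
                 vram_gib: float, headroom: float = 0.10) -> bool:
    """True if weights plus KV cache fit after reserving a headroom fraction
    (assumed 10%) for activations, runtime buffers and fragmentation."""
    needed = weight_bytes(n_params, bits) + kv_bytes
    return needed <= vram_gib * GIB * (1 - headroom)


# Hypothetical model configuration; an 8,192-token context needs exactly 1 GiB of KV cache.
kv = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_tokens=8192)
print(kv / GIB)                        # 1.0
print(fits_in_vram(8e9, 4, kv, 8))     # 4-bit weights on an 8 GiB GPU: True
print(fits_in_vram(8e9, 16, kv, 8))    # 16-bit weights on an 8 GiB GPU: False
```

Retrieved passages count as context, so a larger retrieval budget grows the KV cache term linearly with `context_tokens`.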
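The RAG entry's retrieve-then-supply loop can be sketched in a few lines. This version swaps in bag-of-words cosine similarity for the embedding models most real systems use. The corpus, document IDs and prompt wording are hypothetical. The sketch shows the shape of the architecture, not AIOD's retrieval stack.

```python
import math
import re
from collections import Counter


def tokenise(text: str) -> list[str]:
    # Lower-case alphanumeric runs: a crude stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    # Rank every document against the question and keep the k best IDs.
    q = Counter(tokenise(question))
    ranked = sorted(corpus, key=lambda doc_id: cosine(q, Counter(tokenise(corpus[doc_id]))),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, corpus: dict[str, str], k: int = 2) -> str:
    # Supply the retrieved passages to the model alongside the question.
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in retrieve(question, corpus, k))
    return f"Answer using only the sources below, citing them by ID.\n{context}\n\nQuestion: {question}"


# Hypothetical three-document corpus.
corpus = {
    "pump-manual": "To prime the pump, open the bleed valve and run the pump for 30 seconds.",
    "radio-sop": "Radio checks are made every hour on the primary channel.",
    "first-aid": "Apply direct pressure to a bleeding wound and elevate the limb.",
}
print(build_prompt("How do I prime the pump?", corpus, k=1))
```

The model's weights never change; swapping the corpus changes what the machine can answer, which is the point the Corpus entry makes.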
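The local verification step in the Knowledge pack entry might look something like the following. The manifest layout (a JSON `files` map from relative path to SHA-256 digest) is a hypothetical format, not AIOD's. The Python standard library has no public-key signature API, so the step that checks the manifest's own signature is deliberately left out; only the per-file integrity check is shown.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    # Stream in 1 MiB chunks so large model files need not fit in RAM.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pack(pack_dir: Path, manifest_name: str = "manifest.json") -> list[str]:
    """Return manifest entries that are missing, escape the pack, or fail their digest."""
    root = pack_dir.resolve()
    manifest = json.loads((root / manifest_name).read_text())
    failures = []
    for rel_path, expected in manifest["files"].items():
        target = (root / rel_path).resolve()
        # Reject entries such as "../outside.txt" that point outside the pack.
        if root not in target.parents or not target.is_file():
            failures.append(rel_path)
        elif sha256_file(target) != expected:
            failures.append(rel_path)
    return failures


def demo() -> tuple[list[str], list[str]]:
    # Build a tiny hypothetical pack, verify it, tamper with one file, verify again.
    with tempfile.TemporaryDirectory() as tmp:
        pack = Path(tmp)
        (pack / "corpus").mkdir()
        doc = pack / "corpus" / "doc.txt"
        doc.write_text("Prime the pump before first use.\n")
        manifest = {"files": {"corpus/doc.txt": sha256_file(doc)}}
        (pack / "manifest.json").write_text(json.dumps(manifest))
        clean = verify_pack(pack)
        doc.write_text("Tampered content.\n")
        tampered = verify_pack(pack)
    return clean, tampered


print(demo())  # prints ([], ['corpus/doc.txt'])
```

In a real pack, a failed signature check or any non-empty failure list would block the update before anything is applied.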

Vocabulary into hardware

Now say it in watts and gigabytes.

When the terms are clear, the spec conversation takes an hour.

DEPLOY@AIOD.APP →