- 01 Air gap
- Physical isolation of a computer system from all networks — no internet, LAN or wireless — so data cannot enter or leave except by removable media. Isolation by physics, not firewall policy.
- 02 Air-gapped AI
- An AI system, typically a large language model, running on air-gapped hardware. The model answers locally; data cannot leak over a network because no network path out exists. Full treatment in Air-Gapped AI, Explained.
- 03 Offline LLM
- A large language model whose weights and inference run entirely on local hardware with no internet dependency. Every air-gapped LLM is offline; an offline LLM becomes air-gapped when the isolation is engineered and verified.
- 04 Large language model (LLM)
- A neural network trained on large text corpora to understand and generate language — the engine behind modern AI assistants. Mechanically, a file of weights plus software to run them, which is why it can live on your machine.
- 05 Open-weight model
- A model whose trained weights are published under a licence permitting download and local use. The prerequisite for genuine offline deployment: closed API-only models cannot be owned, only rented.
- 06 NVIDIA Nemotron
- NVIDIA's family of open-weight LLMs, optimised for NVIDIA GPUs — from edge-scale Nano variants that run on a Jetson to larger Super-class models for nodes and racks. AIOD's standard stack; see the model stack.
- 07 Inference
- Running a trained model to produce answers, as distinct from training it. Local inference means this computation happens on hardware you control — the defining act of an offline appliance.
- 08 Token
- The unit of text an LLM reads and writes, roughly three-quarters of an English word. Cloud APIs bill per token; on an owned appliance the marginal cost of a token is roughly the electricity used to generate it.
- 09 Tokens per second (tok/s)
- Inference throughput. Around 10–20 tok/s sustains fluent conversation; 30+ exceeds comfortable reading speed. Planning ranges by hardware class are in our benchmarks.
- 10 Quantisation
- Storing model weights at reduced numerical precision (commonly 4 to 8 bits instead of 16), which cuts memory and power needs to roughly a half (8-bit) or a quarter (4-bit) of the 16-bit figure for a modest quality cost. The technique that makes battery-powered LLMs practical.
- 11 VRAM
- GPU memory: the binding constraint in local inference. The model must fit in VRAM with headroom for context and retrieval — sizing maths in the Nemotron guide.
- 12 Retrieval-augmented generation (RAG)
- Architecture that retrieves relevant passages from a local document corpus and supplies them to the model at question time — grounding answers in authoritative sources and letting smaller models punch far above their weight.
- 13 Corpus
- The curated body of documents an appliance retrieves from: manuals, protocols, case files, reference works. The model is the engine; the corpus decides what the machine is for.
- 14 Edge AI
- AI computation performed on devices at the point of use — vehicles, sensors, portable units — rather than in a data centre. AIOD Field units are edge AI in its most literal form: sealed, portable, battery-fed.
- 15 Fine-tuning
- Further training of a model on domain data to specialise its behaviour. Distinct from RAG, which supplies knowledge at question time without changing weights; most missions want RAG first, fine-tuning only when justified.
- 16 Knowledge pack
- AIOD's sustainment mechanism: a signed, encrypted drive carrying model, software and corpus updates, verified and applied locally — so an offline appliance improves on schedule without ever touching a network. Detail under Knowledge Packs.
- 17 Telemetry
- Usage data sent from software back to its vendor. A genuinely offline system has none. Zero telemetry is a defining property of the air gap, not a privacy setting.
- 18 Sovereign AI
- AI capability owned and operated under one's own control — weights, hardware and data custody — independent of any vendor, subscription or foreign jurisdiction. The property all AIOD deployments share, whatever their mission.
- 19 Google Gemma
- Google’s family of open-weight LLMs — compact ~1B variants up to 27B-class models with strong multilingual coverage — and, with Nemotron, one of the two families AIOD deploys offline. Guide: Run Gemma Locally.
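The Token and Tokens per second entries above reduce to simple arithmetic. The sketch below is illustrative only: it assumes the glossary's rule of thumb of about three-quarters of an English word per token, and the function names `estimate_tokens` and `generation_seconds` are hypothetical helpers, not part of any AIOD tooling.

```python
# Rough planning arithmetic for tokens and throughput.
# Assumption: about 0.75 English words per token (the glossary's rule of thumb).
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Approximate number of tokens in a text of `word_count` English words."""
    return round(word_count / WORDS_PER_TOKEN)


def generation_seconds(word_count: int, tokens_per_second: float) -> float:
    """Approximate time to generate an answer of `word_count` words."""
    return estimate_tokens(word_count) / tokens_per_second


if __name__ == "__main__":
    # A 300-word answer at the low and high end of the conversational range.
    for speed in (10, 20):
        seconds = generation_seconds(300, speed)
        print(f"300 words at {speed} tok/s: about {seconds:.0f} s")
```

Real token counts depend on the tokenizer and the language, so treat the result as a planning estimate rather than a measurement.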
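The Quantisation and VRAM entries above can be turned into a back-of-envelope sizing check. This is a minimal sketch, assuming weight memory is simply parameters times bits per weight; the 20% headroom default and the function names are hypothetical planning choices, and real deployments also need memory for context and retrieval that this does not model.

```python
# Back-of-envelope weight sizing. Covers weights only, not context or retrieval.

def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(params_billion: float, bits_per_weight: int,
                 vram_gb: float, headroom_fraction: float = 0.2) -> bool:
    """True if the weights fit while leaving `headroom_fraction` of VRAM free.

    The 0.2 default is a hypothetical planning figure, not a vendor number.
    """
    usable_gb = vram_gb * (1 - headroom_fraction)
    return weight_memory_gb(params_billion, bits_per_weight) <= usable_gb


if __name__ == "__main__":
    # An 8-billion-parameter model at 16, 8 and 4 bits per weight.
    for bits in (16, 8, 4):
        print(f"8B model at {bits}-bit: {weight_memory_gb(8, bits):.1f} GB")
```

Halving the bits per weight halves the estimate, which is the "half or quarter" saving the Quantisation entry describes.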
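The Retrieval-augmented generation entry above describes a retrieve-then-answer loop. The toy sketch below shows the shape of that loop under loud assumptions: it ranks passages by plain word overlap instead of the embedding search a real system would likely use, the sample corpus is invented for illustration, and it stops at building the prompt rather than calling any model.

```python
import re
from collections import Counter

# Invented sample corpus, for illustration only.
CORPUS = [
    "Replace the fuel filter every 500 operating hours.",
    "The radio uses channel 16 for distress calls.",
    "Store batteries between 10 and 25 degrees Celsius.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(question: str, passage: str) -> int:
    """Count the words shared by the question and the passage."""
    q_words = Counter(tokenize(question))
    p_words = Counter(tokenize(passage))
    return sum(min(count, p_words[word]) for word, count in q_words.items())


def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages with the highest overlap score."""
    ranked = sorted(corpus, key=lambda p: score(question, p), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble the text that would be handed to the local model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer using only these passages:\n{context}\n\n"
            f"Question: {question}")


if __name__ == "__main__":
    question = "How often should the fuel filter be replaced?"
    print(build_prompt(question, retrieve(question, CORPUS, k=1)))
```

The model's weights never change in this loop; only the prompt does, which is the distinction the Fine-tuning entry draws.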
Vocabulary into hardware
Now say it in watts and gigabytes.
When the terms are clear, the spec conversation takes an hour.
DEPLOY@AIOD.APP