The stack

Coherent from chip to model:
NVIDIA silicon. Nemotron & Gemma weights.

An offline appliance lives or dies on the fit between hardware and model. AIOD builds on NVIDIA GPUs running two open-weight families — NVIDIA's own Nemotron and Google's Gemma — so every watt, every gigabyte of VRAM and every token per second is accounted for at design time.

The model stack

Open weights, engineered for the silicon they run on.

01 // OWNERSHIP

Open-licensed weights

Nemotron ships under NVIDIA's open model licence; Gemma under Google's open-weight Gemma terms. Either way the weights live on your hardware, under your control — no API key that expires, no terms-of-service change that strands your investment, no vendor who can switch you off.

02 // EFFICIENCY

Sized for the edge

The combined line-up runs from ~1B Gemma variants and Nano-class Nemotron models that sit comfortably on a Jetson, up to Super-class and 27B models for workstation and node deployments. With quantisation, serious capability fits in a sealed case that runs on a battery.

03 // COHERENCE

One vendor, chip to weights

Nemotron is tuned by the company that designed the GPUs it runs on, with TensorRT-LLM optimisation paths and predictable performance, and Gemma is deployed on the same NVIDIA silicon. Fewer integration surprises mean tighter power budgets and honest runtime promises.

Choosing between them

Family / size | Natural habitat | Reach for it when
Nemotron Nano ~9–12B | Jetson Orin · single GPU | Maximum performance-per-watt on NVIDIA edge silicon; TensorRT-LLM optimised
Nemotron Super ~49B | Node · rack | Strongest reasoning for DGX-class private deployments
Gemma compact ~1–4B | Ultra-low-power edge | The tightest battery budgets; companion and embedded tasks
Gemma 12B / 27B | Single GPU · node | Multilingual missions, vision-capable variants, deepest quantised-ecosystem maturity

Two families, one rule

Selection is mission-led. Nemotron typically wins raw performance-per-watt on our NVIDIA silicon. Gemma earns its slot on multilingual corpora (Civic and humanitarian nodes especially), on ultra-compact builds, and where its very mature local-deployment ecosystem shortens the path. Both are open-weight, both run fully offline, both are yours at handover. Guides: run Nemotron locally · run Gemma locally.

Hardware guide

Three classes of silicon, sized to the mission.

We spec the smallest machine that does the job properly — smaller means less power, less heat, fewer failure modes, and longer autonomous runtime.
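The link between power draw and autonomous runtime is simple division, sketched below. The 15 W and 60 W figures are the ends of the Edge / portable power envelope in the table that follows; the 480 Wh usable battery capacity and the function name `runtime_hours` are illustrative assumptions, not AIOD specifications, and the sketch ignores conversion losses and idle draw.

```python
def runtime_hours(usable_battery_wh: float, average_draw_w: float) -> float:
    """Hours of autonomous runtime from usable battery energy and average draw."""
    if average_draw_w <= 0:
        raise ValueError("average_draw_w must be positive")
    return usable_battery_wh / average_draw_w


# Hypothetical 480 Wh usable pack (an assumption, not an AIOD specification).
USABLE_WH = 480.0

for draw_w in (15.0, 60.0):  # ends of the Edge / portable power envelope
    hours = runtime_hours(USABLE_WH, draw_w)
    print(f"{draw_w:.0f} W average draw: ~{hours:.0f} h")
```

With that assumed pack, the low end of the envelope runs four times longer than the high end, which is why the smallest machine that does the job is the default choice.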

Class | Compute | Model fit | Power envelope | Typical unit
Edge / portable | NVIDIA Jetson Orin | Nemotron Nano, quantised | 15–60 W | AIOD Field F-1
Node / workstation | DGX-Spark-class · RTX workstation | Nemotron Nano → Super | ~100–500 W | AIOD Civic C-1
Rack / department | Multi-GPU RTX / DGX-class rack | Nemotron Super+, fine-tuned, high concurrency | 1 kW+ | AIOD Private P-1

Sizing rule of thumb

Model parameters × bytes per parameter sets the VRAM floor: a ~12B-parameter model wants roughly 24 GB at FP16 (2 bytes per parameter), close to half that at 8-bit, and about a quarter of it (roughly 6 GB) at 4-bit quantisation, all before context and retrieval overhead. We publish full sizing maths in the Nemotron local deployment guide.
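As a minimal sketch of the rule of thumb, the weights-only floor can be computed as parameters × bits per weight ÷ 8. The function name `vram_floor_gb` is illustrative, gigabytes are taken as decimal (10^9 bytes), and the result deliberately excludes KV cache, activations and retrieval overhead, so real deployments need headroom above it.

```python
def vram_floor_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only VRAM floor in decimal GB.

    Excludes KV cache, activations and retrieval overhead.
    """
    bytes_per_weight = bits_per_weight / 8
    # (params_billion * 1e9 weights) * bytes_per_weight / (1e9 bytes per GB)
    return params_billion * bytes_per_weight


for label, bits in (("FP16", 16), ("8-bit", 8), ("4-bit", 4)):
    print(f"12B model at {label}: ~{vram_floor_gb(12, bits):.0f} GB for weights")
```

Practical quantisation formats also store scales and other metadata, so a 4-bit build usually lands somewhat above the bare figure this returns.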

SIZE YOUR OWN BUILD — HARDWARE CALCULATOR →

Security model

The air gap is engineered, then proven.

"Offline" is a verifiable property, not a marketing word. Every appliance ships with a documented attestation of the controls below.

GAP / NETWORK

No path out

Radios disabled or physically absent; interfaces locked at OS and firmware level; verified with the client present at handover.

SIG / UPDATES

Signed media only

The appliance applies updates exclusively from cryptographically signed AIOD media. Unsigned input is refused, loudly.

ENC / AT REST

Encrypted at rest

Full-disk encryption protects weights, corpus and logs if the hardware is ever lost, stolen or decommissioned.

TEL / ZERO

Zero telemetry

No analytics, no call-home, no usage data. We cannot see your prompts. Nobody can — that is the product.

AIR-GAPPED AI, EXPLAINED →

Benchmarks

What "real-time" means on a battery.

Indicative interactive throughput by class — useful conversation needs roughly 10–20 tokens/second; reading speed is comfortably exceeded above ~30.

Configuration | Model class | Indicative throughput | Experience
Jetson Orin, quantised | Nemotron Nano class | ~20–45 tok/s | Fluid single-user conversation
Single high-end RTX GPU | Nemotron Nano class | ~60–120 tok/s | Instant; multi-user capable
Node / DGX-Spark class | Nemotron Super class | ~30–80 tok/s | Strong reasoning, small-team concurrency
Multi-GPU rack | Nemotron Super+, tuned | scales with GPUs | Department-scale serving

Honest numbers policy

Figures above are indicative planning ranges; real throughput depends on quantisation, context length, concurrency and thermal envelope. Every AIOD proposal includes measured benchmarks from your actual configuration — and every demo runs with the cable out.
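To make the planning ranges tangible, the sketch below converts them into the time needed to stream one answer. The throughput ranges are the indicative figures from the table above; the 300-token answer length and the names `stream_seconds` and `stream_time_range` are assumptions for illustration, and the calculation ignores prompt processing time before the first token.

```python
# Indicative planning ranges from the benchmark table (tokens per second).
INDICATIVE_TOK_PER_S = {
    "Jetson Orin, quantised (Nano class)": (20, 45),
    "Single high-end RTX GPU (Nano class)": (60, 120),
    "Node / DGX-Spark class (Super class)": (30, 80),
}


def stream_seconds(answer_tokens: int, tok_per_s: float) -> float:
    """Seconds to stream an answer at a steady generation rate."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return answer_tokens / tok_per_s


def stream_time_range(answer_tokens: int, rate_range: tuple) -> tuple:
    """Return (fastest, slowest) streaming time for a (low, high) rate range."""
    low, high = rate_range
    return stream_seconds(answer_tokens, high), stream_seconds(answer_tokens, low)


ANSWER_TOKENS = 300  # hypothetical answer length, chosen only for illustration

for config, rates in INDICATIVE_TOK_PER_S.items():
    fastest, slowest = stream_time_range(ANSWER_TOKENS, rates)
    print(f"{config}: {fastest:.1f}-{slowest:.1f} s for {ANSWER_TOKENS} tokens")
```

Measured figures from the actual configuration should replace the indicative ranges once they are available.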

Technical deep-dive

Bring your hardest constraint.

Power budget, VRAM ceiling, concurrency target, accreditation requirement — sizing is the conversation we enjoy most.

DEPLOY@AIOD.APP →