
The stack

Coherent from chip to model:
NVIDIA silicon. Nemotron & Gemma weights.

An offline appliance lives or dies on the fit between hardware and model. AIOD builds on NVIDIA GPUs running two open-weight families — NVIDIA's own Nemotron and Google's Gemma — so every watt, every gigabyte of VRAM and every token per second is accounted for at design time.

The model stack

Open weights, engineered for the silicon they run on.

01 // OWNERSHIP

Open-licensed weights

Nemotron ships under NVIDIA's open model licence; Gemma under Google's open-weight Gemma terms. Either way the weights live on your hardware, under your control — no API key that expires, no terms-of-service change that strands your investment, no vendor who can switch you off.

02 // EFFICIENCY

Sized for the edge

The combined line-up runs from ~1B Gemma variants and Nano-class Nemotron models, which run comfortably on a Jetson, up to Super-class Nemotron and 27B Gemma models for workstation and node deployments. With quantisation, serious capability fits in a sealed case that runs on a battery.

03 // COHERENCE

One vendor, chip to weights

Nemotron models are tuned by the company that designed the GPUs they run on, with TensorRT-LLM optimisation paths and predictable performance; Gemma is supported by TensorRT-LLM as well. Fewer integration surprises mean tighter power budgets and honest runtime promises.

Choosing between them

Family / size	Natural habitat	Reach for it when
Nemotron Nano ~9–12B	Jetson Orin · single GPU	Maximum performance-per-watt on NVIDIA edge silicon; TensorRT-LLM optimised
Nemotron Super ~49B	Node · rack	Strongest reasoning for DGX-class private deployments
Gemma compact ~1–4B	Ultra-low-power edge	The tightest battery budgets; companion and embedded tasks
Gemma 12B / 27B	Single GPU · node	Multilingual missions, vision-capable variants, deepest quantised-ecosystem maturity

Two families, one rule

Selection is mission-led. Nemotron typically wins on raw performance-per-watt on our NVIDIA silicon. Gemma earns its slot on multilingual corpora (Civic and humanitarian nodes especially), on ultra-compact builds, and where its mature local-deployment ecosystem shortens the path. Both are open-weight, both run fully offline, and both are yours at handover. Guides: run Nemotron locally · run Gemma locally.

Hardware guide

Three classes of silicon, sized to the mission.

We spec the smallest machine that does the job properly — smaller means less power, less heat, fewer failure modes, and longer autonomous runtime.

Class	Compute	Model fit	Power envelope	Typical unit
Edge / portable	NVIDIA Jetson Orin	Nemotron Nano, quantised	15–60 W	AIOD Field F-1
Node / workstation	DGX-Spark-class · RTX workstation	Nemotron Nano → Super	~100–500 W	AIOD Civic C-1
Rack / department	Multi-GPU RTX / DGX-class rack	Nemotron Super+, fine-tuned, high concurrency	1 kW+	AIOD Private P-1
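To make the smallest-machine rule concrete, here is a minimal Python sketch that walks the three classes smallest-first and returns the first one that both lists the requested model class and fits a power budget. The model labels (`nemotron-nano`, `nemotron-super`), the `smallest_class` helper and the feasibility test are hypothetical: a class is treated as feasible when the budget covers the low end of its power envelope, on the assumption that the unit can be run power-capped. It illustrates the rule; it is not AIOD's sizing tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HardwareClass:
    name: str
    min_watts: float   # low end of the class's power envelope
    models: frozenset  # model classes listed for this tier (hypothetical labels)


# Transcribed from the hardware table, ordered smallest first.
CLASSES = [
    HardwareClass("Edge / portable", 15, frozenset({"nemotron-nano"})),
    HardwareClass("Node / workstation", 100,
                  frozenset({"nemotron-nano", "nemotron-super"})),
    HardwareClass("Rack / department", 1000, frozenset({"nemotron-super"})),
]


def smallest_class(model: str, power_budget_w: float) -> Optional[HardwareClass]:
    """Return the smallest class that lists `model` and fits the power budget.

    Assumption: a class fits when the budget covers the low end of its
    envelope, i.e. the unit can run in a power-capped mode.
    """
    for hw in CLASSES:  # smallest first, so the first match is the smallest fit
        if model in hw.models and hw.min_watts <= power_budget_w:
            return hw
    return None  # nothing fits: revisit the model choice or the budget
```

For example, a Nano-class model on a 40 W budget lands on the Edge / portable class, while a Super-class model needs at least the ~100 W node tier.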

Sizing rule of thumb

Parameter count × bytes per parameter sets the VRAM floor: a ~12B-parameter model needs roughly 24 GB at FP16 (2 bytes per parameter), about 12 GB at 8-bit, and about 6 GB, a quarter of the FP16 figure, at 4-bit quantisation, all before context and retrieval overhead. We publish full sizing maths in the Nemotron local deployment guide.
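The arithmetic behind the rule of thumb is a single multiplication. A minimal Python sketch, assuming decimal gigabytes (1 GB = 10^9 bytes) and ideal storage of 2, 1 and 0.5 bytes per parameter at FP16, 8-bit and 4-bit; `weight_vram_gb` is a hypothetical helper, and its result is the weights-only floor, excluding the KV cache, activations, runtime buffers and retrieval overhead.

```python
# Ideal storage cost per parameter; real quantised formats add small overheads.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_vram_gb(params_billion: float, precision: str) -> float:
    """Weights-only VRAM floor in decimal GB (1 GB = 1e9 bytes).

    Excludes the KV cache (which grows with context length), activations,
    runtime buffers and retrieval overhead.
    """
    # (params_billion * 1e9) params * bytes each / 1e9 bytes per GB
    # cancels to params_billion * bytes each.
    return params_billion * BYTES_PER_PARAM[precision]


for precision in ("fp16", "int8", "int4"):
    print(precision, weight_vram_gb(12, precision), "GB")  # the ~12B example
```

This prints 24.0, 12.0 and 6.0 GB. Practical 4-bit formats also store per-group scale factors, so real weight files come out somewhat above the 0.5 bytes-per-parameter ideal.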

Security model

The air gap is engineered, then proven.

"Offline" is a verifiable property, not a marketing word. Every appliance ships with a documented attestation of the controls below.

GAP / NETWORK

No path out

Radios disabled or physically absent; interfaces locked at OS and firmware level; verified with the client present at handover.

SIG / UPDATES

Signed media only

The appliance applies updates exclusively from cryptographically signed AIOD media. Unsigned input is refused, loudly.
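The refuse-unless-verified rule can be sketched in a few lines of Python. This is a minimal sketch under a loud assumption: it uses HMAC-SHA256 from the standard library as a stand-in, whereas signed media proper would carry an asymmetric signature (for example Ed25519) so the appliance stores only a public key and cannot itself mint valid updates. The names `apply_update`, `UnsignedUpdateError` and the `install` callback are hypothetical; the point is the shape: verify first, install only on success, and raise rather than fail silently.

```python
import hashlib
import hmac
from collections.abc import Callable


class UnsignedUpdateError(Exception):
    """Raised when update media fails verification; nothing is installed."""


def apply_update(payload: bytes, tag: bytes, key: bytes,
                 install: Callable[[bytes], None]) -> None:
    """Install `payload` only if `tag` authenticates it; otherwise refuse loudly.

    Stand-in: HMAC-SHA256 (a shared-secret MAC) replaces the asymmetric
    signature a real appliance would verify with a public key.
    """
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids content-based short-circuiting, so timing does not
    # reveal where a mismatch occurs.
    if not hmac.compare_digest(expected, tag):
        raise UnsignedUpdateError("update refused: authentication check failed")
    install(payload)  # reached only after verification succeeds
```

A tampered payload or a missing tag raises before `install` is ever called, so a failed check cannot leave a half-applied update behind.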

ENC / AT REST

Encrypted at rest

Full-disk encryption protects weights, corpus and logs if the hardware is ever lost, stolen or decommissioned.

TEL / ZERO

Zero telemetry

No analytics, no call-home, no usage data. We cannot see your prompts. Nobody can — that is the product.

Benchmarks

What "real-time" means on a battery.

Indicative interactive throughput by class. Useful conversation needs at least roughly 10–20 tokens per second; above ~30 tokens per second, output comfortably outpaces reading speed.
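Converting tokens per second into human terms is simple arithmetic. A minimal Python sketch, assuming the common rule of thumb of ~0.75 English words per token (the ratio varies by tokenizer and language, and non-English text often needs more tokens per word); `words_per_minute` and `seconds_for_reply` are hypothetical helpers.

```python
# Assumption: ~0.75 English words per token; varies by tokenizer and language.
WORDS_PER_TOKEN = 0.75


def words_per_minute(tokens_per_second: float) -> float:
    """Approximate streaming rate in words per minute."""
    return tokens_per_second * WORDS_PER_TOKEN * 60


def seconds_for_reply(reply_tokens: int, tokens_per_second: float) -> float:
    """Time to finish generating a reply, ignoring time to first token."""
    return reply_tokens / tokens_per_second


for rate in (10, 20, 30):
    print(f"{rate} tok/s ~ {words_per_minute(rate):.0f} wpm; "
          f"300-token reply in {seconds_for_reply(300, rate):.0f} s")
```

At 10, 20 and 30 tok/s this works out to roughly 450, 900 and 1,350 words per minute, and a 300-token reply finishes in about 30, 15 and 10 seconds. Higher rates mainly shorten the wait for a complete answer, which matters most for long replies and multi-user load.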

Configuration	Model class	Indicative throughput	Experience
Jetson Orin, quantised	Nemotron Nano class	~20–45 tok/s	Fluid single-user conversation
Single high-end RTX GPU	Nemotron Nano class	~60–120 tok/s	Instant; multi-user capable
Node / DGX-Spark class	Nemotron Super class	~30–80 tok/s	Strong reasoning, small-team concurrency
Multi-GPU rack	Nemotron Super+, tuned	scales with GPUs	Department-scale serving

Honest numbers policy

Figures above are indicative planning ranges; real throughput depends on quantisation, context length, concurrency and thermal envelope. Every AIOD proposal includes measured benchmarks from your actual configuration — and every demo runs with the cable out.
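Measured numbers are only comparable if latency is split the same way each time. A minimal Python sketch of one common convention: time to first token (which grows with prompt length) is reported separately from the steady decode rate. The `stream` argument stands for whatever token iterator the serving stack exposes (hypothetical here), and the clock is injectable so the function can be tested deterministically; this is an illustration, not AIOD's benchmarking harness.

```python
import time
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass
class Throughput:
    time_to_first_token_s: float  # prompt processing and queueing latency
    decode_tokens_per_s: float    # generation rate after the first token
    total_tokens: int


def measure(stream: Iterable[str],
            clock: Callable[[], float] = time.perf_counter) -> Throughput:
    """Time a token stream, separating first-token latency from decode rate."""
    start = clock()
    first_at = last_at = 0.0
    count = 0
    for _token in stream:
        now = clock()
        if count == 0:
            first_at = now  # timestamp of the first token
        last_at = now       # timestamp of the most recent token
        count += 1
    if count < 2:
        raise ValueError("need at least two tokens to measure a decode rate")
    return Throughput(
        time_to_first_token_s=first_at - start,
        # tokens after the first, over the interval they took to arrive
        decode_tokens_per_s=(count - 1) / (last_at - first_at),
        total_tokens=count,
    )
```

Reporting the two figures separately keeps a long prompt from masking a fast decoder, and vice versa.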

Technical deep-dive

Bring your hardest constraint.

Power budget, VRAM ceiling, concurrency target, accreditation requirement — sizing is the conversation we enjoy most.

DEPLOY@AIOD.APP →