title: On Device Quantization V4 Analysis
created: 2026-05-27
updated: 2026-05-27
type: concept
tags: [research, whitepaper]
sources: [raw/papers/on-device-quantization-v4.md]

On Device Quantization V4

The paper argues that 2-bit quantization of 70B-parameter models is viable for daily tasks without catastrophic loss of model quality.

The method is ‘Selective Bit-Preserving’: key attention heads remain at 4-bit while all other heads drop to 1.5-bit.
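The mix of 4-bit and 1.5-bit heads determines the average bit-width. The sketch below works through that arithmetic. It is a simplification, not the paper's procedure: it assumes every weight belongs to one of the two groups, and the function names `average_bits` and `high_fraction_for_target` are hypothetical helpers introduced here for illustration.

```python
def average_bits(high_fraction: float,
                 high_bits: float = 4.0,
                 low_bits: float = 1.5) -> float:
    """Weighted mean bit-width when `high_fraction` of the weights
    stay at `high_bits` and the rest drop to `low_bits`."""
    if not 0.0 <= high_fraction <= 1.0:
        raise ValueError("high_fraction must be in [0, 1]")
    return high_fraction * high_bits + (1.0 - high_fraction) * low_bits


def high_fraction_for_target(target_bits: float,
                             high_bits: float = 4.0,
                             low_bits: float = 1.5) -> float:
    """Fraction of weights that can stay at `high_bits` while the
    overall average equals `target_bits` (inverse of average_bits)."""
    if not low_bits <= target_bits <= high_bits:
        raise ValueError("target_bits must lie between low_bits and high_bits")
    return (target_bits - low_bits) / (high_bits - low_bits)


if __name__ == "__main__":
    # Share of weights that may remain 4-bit for a 2-bit average.
    frac = high_fraction_for_target(2.0)
    print(f"4-bit share for a 2.0-bit average: {frac:.2f}")
    print(f"check: average_bits({frac:.2f}) = {average_bits(frac):.2f}")
```

Under these assumptions, an overall 2-bit average leaves room for roughly one fifth of the weights at 4-bit. The source note does not say how the key heads are selected or what share they make up.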

The headline result is Llama-4-70B running on 16GB of VRAM with only 3% performance degradation.
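A rough weights-only estimate helps sanity-check the VRAM figure. The sketch below is back-of-the-envelope arithmetic, not a measurement: it ignores the KV cache, activations, and quantization metadata such as scales and zero points, and it assumes exactly 70e9 parameters and decimal gigabytes. The helper names are hypothetical.

```python
def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8.0


def max_bits_for_budget(num_params: float, budget_bytes: float) -> float:
    """Largest average bit-width whose weights fit in `budget_bytes`."""
    return budget_bytes * 8.0 / num_params


if __name__ == "__main__":
    params = 70e9   # assumed parameter count for a "70B" model
    budget = 16e9   # 16 GB, decimal; an assumption about the source's units

    gb_at_2bit = weight_bytes(params, 2.0) / 1e9
    ceiling = max_bits_for_budget(params, budget)
    print(f"weights at 2.0 bits/weight: {gb_at_2bit:.1f} GB")
    print(f"average bits/weight that fits in 16 GB: {ceiling:.2f}")
```

At exactly 2 bits per weight, the weights alone come to 17.5 GB. Fitting in 16GB would therefore require an average somewhat below 2 bits per weight or some offloading, a detail the source note does not spell out.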

This enables true ‘Local AI’ that does not require a cloud API for reasonable reasoning.

Medium Impact.