Learn the fundamentals of model compression and Quantization-Aware Training (QAT) to optimize large language models like Gemma 4 for …
Tag: ONNX Runtime
Articles tagged with ONNX Runtime. Showing 3 articles.
Chapters
Understand the landscape of on-device AI agents and tiny LLM systems, set up your development environment, and explore core tooling for edge …
Master techniques for optimizing the performance and resource usage of AI agents and tiny LLMs on constrained edge devices for real-world production …