Edge-Optimized Driver State Monitoring

Real-time computer vision on CPU-only edge devices


Overview

This project explores how to deploy a modern deep learning vision model under real-world edge constraints. I built a real-time system to detect driver drowsiness and distraction (e.g. texting, drinking, reaching behind) while optimizing the model to run efficiently on CPU-only hardware with limited memory.

Rather than focusing purely on accuracy, the core challenge was model efficiency, stability, and deployability.

Problem

Most deep learning models assume access to GPUs and abundant memory. In contrast, automotive and IoT deployments often operate with:

  • No GPU acceleration
  • Tight memory budgets
  • Strict latency requirements

The baseline model achieved strong accuracy but was too large and inefficient for edge deployment. The goal was to retain model fidelity while significantly reducing model size and improving CPU inference performance.

Model & Dataset

  • Backbone: MobileNetV3-Large
  • Classes: 10 driver behaviors (safe driving, texting, phone usage, drinking, etc.)
  • Input: 224×224 RGB images normalized with ImageNet statistics

Initial experiments used MobileNetV3-Small, but it struggled with fine-grained distraction classes. MobileNetV3-Large provided stronger feature representation, making it a better candidate for post-training optimization.
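As a rough sketch of this setup (the exact classifier head and training transforms in the project may differ, and the pretrained-weights argument assumes a recent torchvision), the backbone and preprocessing look like:

```python
import torch.nn as nn
from torchvision import models, transforms

# 224x224 RGB input normalized with ImageNet statistics
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# MobileNetV3-Large backbone with the final classifier layer
# swapped out for the 10 driver-behavior classes
model = models.mobilenet_v3_large(weights="IMAGENET1K_V1")
model.classifier[3] = nn.Linear(model.classifier[3].in_features, 10)
```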

Optimization: Post-Training Static Quantization

To enable edge deployment, I applied Post-Training Static Quantization (INT8) in PyTorch. I chose static quantization because convolutional vision models are dominated by activation memory and compute; dynamic quantization mainly benefits linear and recurrent layers and leaves activations in floating point, so it is often insufficient here.

Pipeline

  1. Layer Fusion: Fused Conv-BatchNorm-Activation sequences into single modules to reduce memory access overhead.
  2. Instrumentation: Inserted QuantStub/DeQuantStub to mark where tensors cross the float/INT8 boundary at the model's input and output.
  3. Calibration: Calibrated with real validation data to estimate activation ranges.
  4. Conversion: Converted to a fully INT8 model using PyTorch’s FBGEMM backend.
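A minimal sketch of this eager-mode PTQ flow in PyTorch is shown below. It uses a toy Conv-BN-ReLU block rather than the full MobileNetV3 (whose fusion needs per-block module lists or torchvision's quantizable variant), and the random calibration batches stand in for the real validation data used in the project:

```python
import torch
from torch import nn
import torch.ao.quantization as tq

class TinyBlock(nn.Module):
    """Stand-in model: one Conv-BN-ReLU block plus a small classifier head."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.quant = tq.QuantStub()      # float -> INT8 at the input
        self.conv = nn.Conv2d(3, 16, 3, stride=2, padding=1)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, num_classes)
        self.dequant = tq.DeQuantStub()  # INT8 -> float at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.bn(self.conv(x)))
        x = self.pool(x).flatten(1)
        x = self.fc(x)
        return self.dequant(x)

model = TinyBlock().eval()

# 1. Layer fusion: merge Conv-BN-ReLU into a single module
tq.fuse_modules(model, [["conv", "bn", "relu"]], inplace=True)

# 2. Attach the x86 (FBGEMM) qconfig and insert observers
model.qconfig = tq.get_default_qconfig("fbgemm")
tq.prepare(model, inplace=True)

# 3. Calibration: run representative batches so observers record activation
#    ranges (the project used real validation images; random data shown here)
with torch.no_grad():
    for _ in range(10):
        model(torch.randn(8, 3, 224, 224))

# 4. Convert weights and activations to INT8
tq.convert(model, inplace=True)
```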

Results

| Metric        | Float32       | INT8 (Quantized) |
|---------------|---------------|------------------|
| Model Size    | 16.2 MB       | 4.49 MB          |
| Compression   | 1× (baseline) | 3.6× smaller     |
| Accuracy Loss | baseline      | < 0.5%           |
| Hardware      | GPU / Server  | CPU-only edge    |
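One way to check the size numbers above is to serialize each model's state dict and measure the file on disk; `float_model` and `int8_model` are placeholder names for the two variants:

```python
import os
import torch

def model_size_mb(model, path="tmp_weights.pt"):
    """Save the state dict and report its on-disk size in MB (illustrative check)."""
    torch.save(model.state_dict(), path)
    size_mb = os.path.getsize(path) / 1e6
    os.remove(path)
    return size_mb

# e.g. model_size_mb(float_model) vs. model_size_mb(int8_model)
```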

Real-Time Inference

I built a real-time inference pipeline to validate deployment readiness. The system performs webcam capture at >30 FPS, handles preprocessing on the CPU, and executes fully quantized INT8 inference. The result is a system that runs smoothly on standard x86 CPUs without any GPU acceleration.
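A simplified version of such a loop, assuming `int8_model` is the converted INT8 model from the previous step and `class_names` holds the 10 behavior labels, might look like:

```python
import time
import cv2
import numpy as np
import torch

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    t0 = time.perf_counter()

    # CPU preprocessing: resize, BGR->RGB, scale to [0, 1], ImageNet-normalize
    img = cv2.resize(frame, (224, 224))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    img = (img - MEAN) / STD
    tensor = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)

    # Fully quantized INT8 forward pass on the CPU
    with torch.no_grad():
        logits = int8_model(tensor)
    pred = class_names[logits.argmax(1).item()]

    # Overlay prediction and per-frame FPS, then display
    fps = 1.0 / (time.perf_counter() - t0)
    cv2.putText(frame, f"{pred}  {fps:.1f} FPS", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("Driver State", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```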

Key Learnings

  • High-accuracy models can be made edge-deployable without retraining.
  • Quantization is a systems problem, not just a modeling trick.
  • Calibration quality is critical to preserving accuracy.
  • Architectural choices strongly influence quantization robustness.

Future Work

  • Quantization-Aware Training (QAT) to recover that last 0.5% accuracy.
  • ONNX export for C++ deployment.
  • Testing on ARM-based edge platforms (Raspberry Pi, Jetson Nano).