
Doing the Impossible with Modalix: 16-Channel, 30 fps AI Video Analytics on One Chip in Under 10W

September 24, 2025
David Olsen, Senior Manager, Product Marketing

Traditional video management systems (VMS) weren’t built for AI. To extract meaningful insights, organizations have had to bolt on accelerators, host systems, and layers of software and services. Each element adds cost, power draw, and latency, and introduces new points of failure and data privacy concerns.

SiMa.ai offers a different path. Our Modalix MLSoC™ brings intelligence directly into the VMS, running multiple AI models for people counting, pose estimation, and more on a single chip.

High-Efficiency Inference

SiMa’s Machine Learning Accelerator (MLA) delivers industry-leading vision performance at extreme energy efficiency:

  • Over 1000 FPS YOLOv8n for object detection
  • ~300 FPS YOLOv8m for people/face detection
  • ~1000 FPS ResNet-18 for classification

Unlike conventional VMS boxes that advertise high channel counts but analyze few streams, Modalix MLSoC runs a unique ML pipeline on each stream. That means 16 different camera feeds can simultaneously handle different tasks – person and object detection, safety compliance, vehicle tracking, even pose estimation.
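To put those throughput figures in terms of camera channels, here is a rough back-of-the-envelope sketch. It assumes each quoted number is a single-model rate with the accelerator to itself and that every stream runs at 30 fps; the helper name `max_streams` is ours, purely for illustration. Real headroom will be lower once several models share the MLA and video decode and pre/post-processing are counted.

```python
# Single-model throughput figures quoted above (approximate, per the bullets).
QUOTED_FPS = {
    "YOLOv8n": 1000,    # "over 1000 FPS", treated here as exactly 1000
    "YOLOv8m": 300,     # "~300 FPS"
    "ResNet-18": 1000,  # "~1000 FPS"
}


def max_streams(model_fps: float, stream_fps: float = 30.0) -> int:
    """Upper bound on full-rate streams one model could serve on its own.

    Hypothetical helper: ignores decode, pre/post-processing, and any
    sharing of the accelerator between models.
    """
    return int(model_fps // stream_fps)


if __name__ == "__main__":
    for name, fps in QUOTED_FPS.items():
        print(f"{name}: up to {max_streams(fps)} streams at 30 fps")
```

Under these assumptions, a 1000 FPS model covers roughly 33 streams at 30 fps and a 300 FPS model roughly 10, which is why a 16-channel mix of small and medium models is plausible on one device.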

With an integrated octa-core Arm Cortex-A65 processor, Modalix can natively serve visualizations to any browser – no x86 hardware or PCIe accelerators needed.

This efficiency delivers up to 90% total cost of ownership (TCO) savings at scale versus cloud-based video analytics while simplifying deployment, reducing latency, and keeping sensitive data local.

Proof In Action

At the AI Infra Summit in Santa Clara, CA (September 2025), SiMa demonstrated 16 video streams with four model classes on a single Modalix System on Module (SoM) – pin- and form-factor-compatible with the leading GPU SoM vendor – all under 10 watts.

  • 16 video streams – over Ethernet (RTSP), 640×480@30fps
  • Multiple models in parallel – covering person, safety equipment, vehicle, and pose detection, with support for all 80 COCO dataset classes:
    • Channels 1–4: YOLOv9s
    • Channels 5–8: YOLOv8s-pose
    • Channels 9–14: YOLOv8n
    • Channels 15–16: YOLOv7n
  • Adaptive UI – with real-time overlays of metadata, bounding boxes, and classifications
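The demo layout above can be written down as a simple channel-to-model table. The sketch below is illustrative only: it is plain Python, not the Palette API, and the names `build_channel_map`, `STREAM_FPS`, and `POWER_BUDGET_W` are our own. It treats "under 10 watts" as a 10 W upper bound and tallies the aggregate inference load implied by 16 streams at 30 fps.

```python
from collections import Counter

STREAM_FPS = 30          # each RTSP stream runs at 30 fps in the demo
POWER_BUDGET_W = 10.0    # "under 10 watts", used here as an upper bound


def build_channel_map() -> dict[int, str]:
    """Map each demo channel (1-16) to the model listed for it above."""
    assignments = [
        (range(1, 5), "YOLOv9s"),        # channels 1-4
        (range(5, 9), "YOLOv8s-pose"),   # channels 5-8
        (range(9, 15), "YOLOv8n"),       # channels 9-14
        (range(15, 17), "YOLOv7n"),      # channels 15-16
    ]
    return {ch: model for channels, model in assignments for ch in channels}


channel_map = build_channel_map()

# Frames per second each model must sustain across its channels.
load_per_model = {
    model: count * STREAM_FPS
    for model, count in Counter(channel_map.values()).items()
}
total_fps = sum(load_per_model.values())
watts_per_channel = POWER_BUDGET_W / len(channel_map)

if __name__ == "__main__":
    for model, fps in load_per_model.items():
        print(f"{model}: {fps} frames/s")
    print(f"total: {total_fps} frames/s, <= {watts_per_channel} W per channel")
```

Across the four models this works out to 480 inferences per second in total, at no more than 0.625 W per channel if the full 10 W is assumed.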

What traditionally required racks of servers now fits on a palm-sized (69.6 mm × 45 mm) SiMa module, ready to use in the SiMa SoM development kit or to drop into competing GPU carrier boards, where it delivers a significant reduction in power draw.

SiMa Modalix SoM Dev Kit and Module

Moving Beyond Detection

Modalix can also run larger models, including Large Language Models (LLMs), Large Vision Models (LVMs), and Large Multimodal Models (LMMs). These blend image and video understanding with natural language reasoning, bringing powerful endpoint intelligence for semantic search, contextual insight, and real-world reasoning in physical AI applications. And with SiMa’s LLiMa framework, we make it easy to deploy Hugging Face models directly on Modalix, expanding the scope of what’s possible.

Real-World Applications

The value of physical AI video analytics on Modalix spans many industries:

  • Robotics & automation – Perception-driven control in manufacturing, warehouses, or logistics
  • Smart cities & public safety – Traffic optimization, threat detection
  • Industrial monitoring – Anomaly detection, worker safety, predictive maintenance on the factory floor
  • Retail operations – Customer counting, dwell time tracking, shelf monitoring
  • Hospitality – Table monitoring for unattended guests, empty glasses, or long wait times

 

Fast, Flexible, and Cost-Effective

SiMa’s Palette Software stack orchestrates pipelines, stream management, and UI overlays. The same hardware can be reused across applications with only a model update – no infrastructure overhaul required.

The Takeaway

Intelligent video belongs on-device, not in server racks or the cloud. SiMa.ai’s Modalix MLSoC shows you can run full high-performance ML pipelines – for streaming, decoding, and inference – in real time at under 10 watts. For enterprises, cities, and innovators, that means faster decisions, lower costs, and adaptability that legacy systems just can’t match.

Ready to explore how SiMa.ai Modalix can transform your video solutions?