Split Inference on Networked Microcontrollers

Abstract

With the rapid development of Artificial Intelligence (AI), the size and complexity of models continue to grow. The limited memory and computing power of microcontroller units (MCUs) pose significant challenges for running AI applications on such devices. This thesis presents a distributed method for running deep learning models on MCUs.

First, we identified memory size as the primary constraint for deploying deep learning models on MCUs. To address this constraint, our method splits a model into smaller weight fragments distributed across multiple networked MCUs. A coordinator MCU manages the overall process, including neuron mapping and data relaying. We demonstrated that our approach reduces peak RAM usage during inference compared to existing methods.
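
To make the partitioning concrete, the following C sketch shows one way such a scheme could look: a coordinator maps contiguous blocks of a layer's output neurons to worker MCUs, and each worker computes only its fragment of the layer from the input relayed to it. The data structures and names (Fragment, map_neurons, fragment_forward) are illustrative assumptions, not the thesis's actual implementation; network transport is elided and the "workers" run in-process.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical fragment descriptor: which slice of a layer's
     * output neurons (and hence which weight rows) a worker holds. */
    typedef struct {
        int worker_id;     /* networked MCU that stores this fragment   */
        int neuron_start;  /* first output neuron mapped to this worker */
        int neuron_count;  /* number of output neurons in the fragment  */
    } Fragment;

    /* Coordinator-side mapping: assign a layer's output neurons to
     * workers in contiguous blocks so each MCU stores only the rows
     * of the weight matrix it actually needs. */
    static void map_neurons(int n_neurons, int n_workers, Fragment *frags)
    {
        int base = n_neurons / n_workers;
        int rem  = n_neurons % n_workers;
        int next = 0;
        for (int w = 0; w < n_workers; w++) {
            frags[w].worker_id    = w;
            frags[w].neuron_start = next;
            frags[w].neuron_count = base + (w < rem ? 1 : 0);
            next += frags[w].neuron_count;
        }
    }

    /* Worker-side partial inference: compute only this fragment's
     * output neurons from the input vector the coordinator relays. */
    static void fragment_forward(const Fragment *f, const float *weights,
                                 const float *input, int n_inputs, float *out)
    {
        for (int j = 0; j < f->neuron_count; j++) {
            float acc = 0.0f;
            const float *row = weights + (size_t)j * n_inputs;
            for (int i = 0; i < n_inputs; i++)
                acc += row[i] * input[i];
            out[j] = acc; /* bias and activation omitted for brevity */
        }
    }

    int main(void)
    {
        enum { N_IN = 4, N_OUT = 6, N_WORKERS = 3 };
        Fragment frags[N_WORKERS];
        map_neurons(N_OUT, N_WORKERS, frags);

        /* Toy weights: row j holds the weights of output neuron j. */
        float weights[N_OUT][N_IN];
        for (int j = 0; j < N_OUT; j++)
            for (int i = 0; i < N_IN; i++)
                weights[j][i] = 0.1f * (j + 1);

        float input[N_IN] = {1, 2, 3, 4}, output[N_OUT];

        /* The coordinator would relay `input` over the network; here
         * each "worker" runs in-process on its own weight rows. */
        for (int w = 0; w < N_WORKERS; w++)
            fragment_forward(&frags[w], &weights[frags[w].neuron_start][0],
                             input, N_IN, &output[frags[w].neuron_start]);

        for (int j = 0; j < N_OUT; j++)
            printf("neuron %d -> %.2f\n", j, output[j]);
        return 0;
    }

Because each worker stores only its own weight rows, per-device memory scales with the fragment size rather than the full layer, which is the mechanism behind the reduction in peak RAM usage.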

For optimization, we employed layer fusion and quantization to reduce model size while preserving accuracy. We also introduced a rating system, formalized as a set of equations, that assigns each MCU a capability score used for efficient task allocation.
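
As a loose illustration of what such a capability score might look like (the thesis's actual equations are not reproduced here), the C sketch below combines normalized RAM, flash, and clock-speed figures into a single weighted score per MCU. The chosen metrics, weights, and linear form are assumptions for demonstration only.

    #include <stdio.h>

    /* Hypothetical per-node resource figures for a networked MCU. */
    typedef struct { const char *name; float ram_kb, flash_kb, mhz; } Mcu;

    /* Illustrative capability score: each metric is normalized to the
     * per-metric maximum in the network, then combined with assumed
     * weights. Not the thesis's published formula. */
    static float score(const Mcu *m, const Mcu *max)
    {
        const float w_ram = 0.5f, w_flash = 0.3f, w_clk = 0.2f;
        return w_ram   * (m->ram_kb   / max->ram_kb)
             + w_flash * (m->flash_kb / max->flash_kb)
             + w_clk   * (m->mhz      / max->mhz);
    }

    int main(void)
    {
        Mcu nodes[] = {
            {"node-a", 256, 1024, 120},
            {"node-b", 512, 2048, 168},
            {"node-c",  64,  256,  80},
        };
        Mcu max = {"max", 512, 2048, 168}; /* per-metric maxima above */
        for (int i = 0; i < 3; i++)
            printf("%s score = %.2f\n", nodes[i].name, score(&nodes[i], &max));
        return 0;
    }

A coordinator could then size each worker's weight fragment in proportion to its score, so better-provisioned MCUs receive larger shares of the model.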

Our simulation phase validated the effectiveness of the method, demonstrating successful distributed inference across MCUs and yielding insights for real-world deployment. An implementation on a network of MCUs confirmed the practical applicability and efficiency of the approach.

In conclusion, this thesis presents a feasible and efficient distributed inference method for networked MCUs, addressing their resource limitations and enabling practical AI applications on constrained platforms.

Files

Split_inference_thesis.pdf
Unknown license
File under embargo until 28-06-2026