Split Inference on Networked Microcontrollers

Abstract

With the rapid development of Artificial Intelligence (AI), the size and complexity of models continue to grow. The limited memory and computing power of microcontroller units (MCUs) pose significant challenges for running AI applications on such devices. This thesis presents a distributed method for running deep learning models on MCUs.

First, we identified memory size as the primary constraint for deploying deep learning models on MCUs. To address this constraint, our method splits a model into smaller weight fragments distributed across multiple networked MCUs. A coordinator MCU manages the overall process, including neuron mapping and data relaying. We demonstrated that our approach reduces peak RAM usage during inference compared to existing methods.
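
To make the partitioning concrete, the following C sketch shows one way such a scheme could look: a coordinator maps contiguous blocks of a layer's output neurons to worker MCUs, and each worker computes only its fragment of the layer from the input relayed to it. The data structures and names (Fragment, map_neurons, fragment_forward) are illustrative assumptions, not the thesis's actual implementation; network transport is elided and the "workers" run in-process.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical fragment descriptor: which slice of a layer's
     * output neurons (and hence which weight rows) a worker holds. */
    typedef struct {
        int worker_id;     /* networked MCU that stores this fragment   */
        int neuron_start;  /* first output neuron mapped to this worker */
        int neuron_count;  /* number of output neurons in the fragment  */
    } Fragment;

    /* Coordinator-side mapping: assign a layer's output neurons to
     * workers in contiguous blocks so each MCU stores only the rows
     * of the weight matrix it actually needs. */
    static void map_neurons(int n_neurons, int n_workers, Fragment *frags)
    {
        int base = n_neurons / n_workers;
        int rem  = n_neurons % n_workers;
        int next = 0;
        for (int w = 0; w < n_workers; w++) {
            frags[w].worker_id    = w;
            frags[w].neuron_start = next;
            frags[w].neuron_count = base + (w < rem ? 1 : 0);
            next += frags[w].neuron_count;
        }
    }

    /* Worker-side partial inference: compute only this fragment's
     * output neurons from the input vector the coordinator relays. */
    static void fragment_forward(const Fragment *f, const float *weights,
                                 const float *input, int n_inputs, float *out)
    {
        for (int j = 0; j < f->neuron_count; j++) {
            float acc = 0.0f;
            const float *row = weights + (size_t)j * n_inputs;
            for (int i = 0; i < n_inputs; i++)
                acc += row[i] * input[i];
            out[j] = acc; /* bias and activation omitted for brevity */
        }
    }

    int main(void)
    {
        enum { N_IN = 4, N_OUT = 6, N_WORKERS = 3 };
        Fragment frags[N_WORKERS];
        map_neurons(N_OUT, N_WORKERS, frags);

        /* Toy weights: row j holds the weights of output neuron j. */
        float weights[N_OUT][N_IN];
        for (int j = 0; j < N_OUT; j++)
            for (int i = 0; i < N_IN; i++)
                weights[j][i] = 0.1f * (j + 1);

        float input[N_IN] = {1, 2, 3, 4}, output[N_OUT];

        /* The coordinator would relay `input` over the network; here
         * each "worker" runs in-process on its own weight rows. */
        for (int w = 0; w < N_WORKERS; w++)
            fragment_forward(&frags[w], &weights[frags[w].neuron_start][0],
                             input, N_IN, &output[frags[w].neuron_start]);

        for (int j = 0; j < N_OUT; j++)
            printf("neuron %d -> %.2f\n", j, output[j]);
        return 0;
    }

Because each worker stores only its own weight rows, per-device memory scales with the fragment size rather than the full layer, which is the mechanism behind the reduction in peak RAM usage.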

For optimization, we employed layer fusion and quantization to reduce model size while preserving accuracy. We also introduced a rating system, formalized as a set of equations, that assigns each MCU a capability score used for efficient task allocation.
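
As a loose illustration of what such a capability score might look like (the thesis's actual equations are not reproduced here), the C sketch below combines normalized RAM, flash, and clock-speed figures into a single weighted score per MCU. The chosen metrics, weights, and linear form are assumptions for demonstration only.

    #include <stdio.h>

    /* Hypothetical per-node resource figures for a networked MCU. */
    typedef struct { const char *name; float ram_kb, flash_kb, mhz; } Mcu;

    /* Illustrative capability score: each metric is normalized to the
     * per-metric maximum in the network, then combined with assumed
     * weights. Not the thesis's published formula. */
    static float score(const Mcu *m, const Mcu *max)
    {
        const float w_ram = 0.5f, w_flash = 0.3f, w_clk = 0.2f;
        return w_ram   * (m->ram_kb   / max->ram_kb)
             + w_flash * (m->flash_kb / max->flash_kb)
             + w_clk   * (m->mhz      / max->mhz);
    }

    int main(void)
    {
        Mcu nodes[] = {
            {"node-a", 256, 1024, 120},
            {"node-b", 512, 2048, 168},
            {"node-c",  64,  256,  80},
        };
        Mcu max = {"max", 512, 2048, 168}; /* per-metric maxima above */
        for (int i = 0; i < 3; i++)
            printf("%s score = %.2f\n", nodes[i].name, score(&nodes[i], &max));
        return 0;
    }

A coordinator could then size each worker's weight fragment in proportion to its score, so better-provisioned MCUs receive larger shares of the model.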

Our simulation phase validated the effectiveness of the method, demonstrating successful distributed inference across MCUs and yielding insights for real-world deployment. An implementation on a network of MCUs confirmed the practical applicability and efficiency of the approach.

In conclusion, this thesis presents a feasible and efficient distributed inference method for networked MCUs, addressing their resource limitations and enabling practical AI applications on constrained platforms.

Files

Split_inference_thesis.pdf
Unknown license
File under embargo until 28-06-2026