Towards a Distributed Quantized Machine Learning Inference Using Commodity SoC-FPGAs Using FINN

Authors

  • Mathieu Hannoun, ETIS
  • Stéphane Zuckerman
  • Olivier Romain, ETIS

DOI:

https://doi.org/10.64552/wipiec.v11i1.76

Keywords:

Distributed inference, FPGA, AI, Edge AI, Embedded AI, Split network, Split inference

Abstract

Deep Neural Networks (DNNs) have experienced significant growth over the years, accompanied by a corresponding rise in energy consumption due to their escalating demand for computational resources. To mitigate the environmental impact of AI, and address growing concerns over data privacy, a growing trend is to process data locally at the edge rather than relying on large-scale data centers.
FPGA-based systems are particularly well suited to this kind of application, as they offer a high degree of parallel computation at low power consumption.
The main drawback of commodity FPGAs is their limited hardware resources, which constrain the size of the DNNs that can run efficiently on such targets. This paper presents a methodology for distributing DNNs across multiple commodity FPGAs in order to support models that are usually only suited for larger FPGAs. We support MobileNetV1 inference on six Zedboards with a peak throughput of 118.3 inferences per second at an estimated power consumption of 16.176 Watts.
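
As a rough back-of-the-envelope figure derived solely from the numbers reported above (and assuming the 16.176 W estimate covers all six Zedboards during inference), the energy cost per inference at peak throughput is approximately

$$ E_{\text{inference}} \approx \frac{P}{\text{throughput}} = \frac{16.176\ \text{W}}{118.3\ \text{inf/s}} \approx 0.137\ \text{J}, $$

i.e. on the order of 2.7 W per board on average.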

References

H. F. Atlam, R. J. Walters, et al., “Fog Computing and the Internet of Things: A Review,” Big Data and Cognitive Computing, vol. 2, no. 2, p. 10, June 2018. DOI: https://doi.org/10.3390/bdcc2020010

H. Li, G. Shou, Y. Hu, et al., “Mobile Edge Computing: Progress and Challenges,” in 2016 4th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering (MobileCloud), pp. 83–84, Mar. 2016. DOI: https://doi.org/10.1109/MobileCloud.2016.16

T. Zebin, P. J. Scully, et al., “Design and Implementation of a Convolutional Neural Network on an Edge Computing Smartphone for Human Activity Recognition,” IEEE Access, vol. 7, pp. 133509–133520, 2019. DOI: https://doi.org/10.1109/ACCESS.2019.2941836

M. Merenda, C. Porcaro, et al., “Edge Machine Learning for AI-Enabled IoT Devices: A Review,” Sensors, vol. 20, no. 9, p. 2533, Jan. 2020. DOI: https://doi.org/10.3390/s20092533

N. N. Alajlan and D. M. Ibrahim, “TinyML: Enabling of Inference Deep Learning Models on Ultra-Low-Power IoT Edge Devices for AI Applications,” Micromachines, vol. 13, no. 6, p. 851, June 2022. DOI: https://doi.org/10.3390/mi13060851

Y. Umuroglu, N. J. Fraser, et al., “FINN: A Framework for Fast, Scalable Binarized Neural Network Inference,” in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA ’17, (New York, NY, USA), pp. 65–74, Association for Computing Machinery, 2017. DOI: https://doi.org/10.1145/3020078.3021744

M. Courbariaux, I. Hubara, et al., “Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1,” arXiv preprint arXiv:1602.02830, Mar. 2016.

M. Rastegari, V. Ordonez, et al., “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks,” in Computer Vision – ECCV 2016 (B. Leibe, J. Matas, N. Sebe, and M. Welling, eds.), (Cham), pp. 525–542, Springer International Publishing, 2016. DOI: https://doi.org/10.1007/978-3-319-46493-0_32

M. Blott, T. B. Preußer, et al., “FINN-R: An end-to-end deep-learning framework for fast exploration of quantized neural networks,” ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 11, no. 3, pp. 1–23, 2018. DOI: https://doi.org/10.1145/3242897

T. Alonso, L. Petrica, et al., “Elastic-DF: Scaling performance of DNN inference in FPGA clouds through automatic partitioning,” ACM Trans. Reconfigurable Technol. Syst., vol. 15, Dec. 2021. DOI: https://doi.org/10.1145/3470567

G. Fiscaletti, M. Speziali, et al., “BNNsplit: Binarized neural networks for embedded distributed FPGA-based computing systems,” in 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 975–978, 2020. DOI: https://doi.org/10.23919/DATE48585.2020.9116220

W. Jiang, E. H.-M. Sha, et al., “Achieving Super-Linear Speedup across Multi-FPGA for Real-Time DNN Inference,” ACM Trans. Embed. Comput. Syst., vol. 18, pp. 67:1–67:23, Oct. 2019. DOI: https://doi.org/10.1145/3358192

A. G. Howard, M. Zhu, et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.

J. Deng, W. Dong, et al., “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, IEEE, 2009. DOI: https://doi.org/10.1109/CVPR.2009.5206848

A. Sharma, V. Singh, et al., “Implementation of CNN on Zynq based FPGA for Real-time Object Detection,” in 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–7, July 2019. DOI: https://doi.org/10.1109/ICCCNT45670.2019.8944792

Published

2025-09-02

How to Cite

Hannoun, M., Zuckerman, S., & Romain, O. (2025). Towards a Distributed Quantized Machine Learning Inference Using Commodity SoC-FPGAs Using FINN. WiPiEC Journal - Works in Progress in Embedded Computing Journal, 11(1), 4. https://doi.org/10.64552/wipiec.v11i1.76