TinyML: Accelerating Tiny Machine Learning by Building Your Own Processor (BYOP)



A full-stack workshop for accelerating TinyML
HPCA | February 25, 2023

Welcome

To build your own TinyML accelerator and processor during the tutorial, attendees will need to bring a laptop that runs Ubuntu 18 or 20, has a USB port, and has CFU Playground installed. See repo installation directions here. We will provide the Lattice FPGAs (Option 4b).

Demo Video

Overview

We are running a half-day tutorial on TinyML Acceleration at HPCA 2023 in Montreal!

Embedded FPGAs will be provided for free to in-person attendees to build their own TinyML processor and accelerator during the tutorial! Remote participation at HPCA 2023 is not supported.

About

The need for efficient processing of neural networks has driven the development of hardware accelerators. The growing adoption of specialized hardware has, in turn, highlighted the need for more agile design flows for hardware-software co-design and domain-specific optimization. We present CFU Playground, a full-stack open-source framework that enables rapid, iterative design and evaluation of machine learning (ML) accelerators for embedded ML systems. Our toolchain provides a completely open-source, end-to-end flow for hardware-software co-design on FPGAs and for future systems research. The framework lets users explore experimental and bespoke architectures that are customized and co-optimized for embedded ML. Its rapid deploy-profile-optimize feedback loop lets ML hardware and software developers achieve significant returns from a relatively small investment in customization. Using CFU Playground’s design and evaluation loop, we show substantial speedups in just minutes! Coupling the soft CPU with the accelerator opens up a rich new design space between the two components, which we explore automatically using Vizier, a black-box optimization service.
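To make the idea of black-box design space exploration concrete, here is a minimal sketch in Python. It does not use Vizier or its API; plain random search is swapped in as the simplest black-box optimizer. The design knobs in `DESIGN_SPACE`, the `evaluate` cost model, and the area budget are all hypothetical stand-ins for a real build-and-profile run, and their numbers are made up.

```python
import random

# Hypothetical design knobs (assumption: the real CPU/CFU parameters differ).
DESIGN_SPACE = {
    "icache_bytes": [0, 1024, 2048, 4096],
    "dcache_bytes": [0, 1024, 2048, 4096],
    "cfu_mac_lanes": [1, 2, 4, 8],
}


def evaluate(cfg):
    """Toy stand-in for building and profiling one design.

    Returns (cycles, area) in made-up units. The optimizer treats this
    function as a black box: it only sees inputs and outputs.
    """
    cycles = 1_000_000 / cfg["cfu_mac_lanes"]
    cycles += 200_000 / (1 + cfg["icache_bytes"] / 1024)
    cycles += 300_000 / (1 + cfg["dcache_bytes"] / 1024)
    area = cfg["icache_bytes"] + cfg["dcache_bytes"] + 500 * cfg["cfu_mac_lanes"]
    return cycles, area


def random_search(trials, area_budget, seed=0):
    """Sample random designs and keep the fastest one that fits the budget."""
    rng = random.Random(seed)
    best_cfg, best_cycles = None, float("inf")
    for _ in range(trials):
        cfg = {knob: rng.choice(options) for knob, options in DESIGN_SPACE.items()}
        cycles, area = evaluate(cfg)
        if area <= area_budget and cycles < best_cycles:
            best_cfg, best_cycles = cfg, cycles
    return best_cfg, best_cycles


best_cfg, best_cycles = random_search(trials=200, area_budget=8000)
print(best_cfg, best_cycles)
```

A dedicated optimization service would replace `random_search` with a smarter suggestion strategy, but the contract stays the same: propose a configuration, measure it, and feed the result back.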

What is the goal of the workshop?

Who is the audience for this workshop?

New ML accelerators are announced and released every month for a variety of applications. However, the high cost and complexity of designing an accelerator, integrating it into a larger System-on-Chip, and developing its software stack make rapid iteration difficult. Attendees will be able to deploy their own accelerated ML solutions within minutes, empowering them to explore the breadth of opportunity in hardware acceleration. Combined with the relevance and excitement surrounding ML today, this workshop should appeal to people with many different backgrounds and interests, including ML, FPGAs, embedded systems, computer architecture, hardware design, and software development.

Scope and Topics

Requirements

To build your own TinyML accelerator and processor during the tutorial, attendees will need to bring a Linux laptop with CFU Playground installed. See installation directions here. We will provide the Lattice FPGAs (Option 4b).

Pre-requisites

Hardware

Software

Tutorial Schedule

Time Material/Activity
1:30 PM Welcome & Tiny Machine Learning (TinyML)
  • Tutorial Overview
  • General survey of the field of TinyML
  • What are the common use cases
  • What kind of models do we run
  • What are the typical resource constraints, challenges, etc.
2:00 PM CFU Playground: Full-Stack Framework for TinyML Acceleration Using HW-SW Co-Design
  • General overview of CFUs
  • Tour of CFU Playground
  • End-to-end guide of building an ML Accelerator
2:50 PM Design Space Exploration of CPU vs. CFU Accelerator
  • Renode and Verilator simulation
  • Google's Vizier for Black-Box Search Optimization
  • Integration with CFU Playground for DSE
3:20 PM Coffee Break
3:40 PM TensorFlow Lite for Microcontrollers (TFLM)
  • What is TF Lite Micro
  • TF vs. TF Lite vs. TF Lite Micro
  • Running TF Lite on-device
4:10 PM Benchmarking of TinyML Systems
  • Importance of benchmarking and challenges of benchmarking TinyML systems
  • MLPerf Tiny and its workloads, which are well suited to CFUs
  • CFU Playground for Architects and the Hardware Lottery Problem
  • Demo of Profiling/Microbenchmarking support in CFU Playground
4:40 PM Build Your Own Processor (BYOP)
  • Accelerate Your Own TinyML Model
  • Case study of model and profiling results provided
  • Implement a new instruction in the CFU
  • Use the new instruction in a TFLM kernel
  • Measure the performance speedup
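
The steps in the final session (implement a new instruction, use it in a kernel, measure the gain) can be previewed in software before touching an FPGA. The sketch below is a pure-Python model, not CFU Playground code. The instruction `cfu_simd_mac`, the helper `pack_int8x4`, and the choice of a four-lane int8 multiply-accumulate are hypothetical, chosen only to illustrate how a custom instruction can replace several scalar operations in a dot-product kernel.

```python
def to_int8(byte):
    """Interpret an unsigned byte (0..255) as a signed 8-bit value."""
    return byte - 256 if byte >= 128 else byte


def pack_int8x4(values):
    """Pack four signed 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for lane, value in enumerate(values):
        word |= (value & 0xFF) << (8 * lane)
    return word


def cfu_simd_mac(in0, in1):
    """Hypothetical custom instruction (software model).

    Treats each 32-bit operand as four packed int8 lanes and returns
    the sum of the lane-wise products.
    """
    total = 0
    for lane in range(4):
        a = to_int8((in0 >> (8 * lane)) & 0xFF)
        b = to_int8((in1 >> (8 * lane)) & 0xFF)
        total += a * b
    return total


def dot_scalar(xs, ws):
    """Baseline kernel: one multiply-accumulate per element."""
    return sum(x * w for x, w in zip(xs, ws))


def dot_with_cfu(xs, ws):
    """Same kernel, issuing one modeled custom instruction per four elements."""
    assert len(xs) == len(ws) and len(xs) % 4 == 0
    acc = 0
    calls = 0
    for i in range(0, len(xs), 4):
        acc += cfu_simd_mac(pack_int8x4(xs[i:i + 4]), pack_int8x4(ws[i:i + 4]))
        calls += 1
    return acc, calls


xs = [1, -2, 3, -4, 5, 6, -7, 8]
ws = [-1, 2, 3, 4, -5, 6, 7, -8]
result, calls = dot_with_cfu(xs, ws)
print(result == dot_scalar(xs, ws), calls)
```

In this model the eight-element dot product needs two instruction calls instead of eight scalar multiply-accumulates. The speedup on real hardware depends on the CPU, the CFU implementation, and memory traffic, so it has to be measured rather than inferred from the call count.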