CFU-Playground: Build Your Own Custom TinyML Processor

A full-stack workshop for accelerating TinyML
FCCM May 18, 2022


Running machine learning (ML) on embedded edge devices, as opposed to in the cloud, is gaining attention for reasons such as privacy, latency, security, and accessibility. Given the need for energy efficiency when running ML on these embedded platforms, custom processor support and hardware accelerators could provide the needed efficiency. However, ML acceleration on microcontroller-class hardware is a new area, and there is a need for agile hardware customization for tiny machine learning (tinyML). Building ASICs is both costly and time-consuming; with an FPGA platform, however, the opportunity exists to customize the processor so it performs the application's computation efficiently, adding only a small amount of custom hardware that exploits the bit-level flexibility of FPGAs.

To this end, we present CFU Playground, a full-stack open-source framework that enables rapid, iterative design of tightly coupled accelerators for tinyML systems. Our toolchain integrates open-source software, RTL generators, and FPGA tools for synthesis, place, and route. This full-stack development framework lets engineers explore bespoke architectures that are customized and co-optimized for tinyML. The rapid deploy-profile-optimize feedback loop lets ML hardware and software developers achieve significant returns from a relatively small investment in customization for repetitive ML computations. CFU Playground is available as an open-source project here:

What is the goal of the workshop?

Who is the audience for this workshop?

New ML accelerators are announced and released every month for a variety of applications. However, the large cost and complexity of designing an accelerator, integrating it into a larger System-on-Chip, and developing its software stack make this a non-trivial task that is difficult to iterate on rapidly. Attendees will be able to deploy their very own accelerated ML solutions within minutes, empowering them to explore the breadth of opportunity that exists in hardware acceleration. This, together with the relevance and excitement surrounding ML today, should welcome people with many different backgrounds and interests in ML, FPGAs, embedded systems, computer architecture, hardware design, and software development.

Scope and Topics

Time Material/Activity
1:00 PM Welcome & Tiny Machine Learning (TinyML)
  • General overview of tinyML as a field
  • What are the common use cases
  • What kind of models do we run
  • What are the typical resource constraints, challenges, etc.
1:30 PM Benchmarking of TinyML Systems
  • Importance of benchmarking and challenges of benchmarking TinyML systems
  • MLPerf Tiny and its workloads that are well suited to CFUs
  • FPGA submissions to MLPerf Tiny
  • Demo of Profiling/Microbenchmarking support in CFU-Playground
2:00 PM TensorFlow Lite Microcontrollers (TFLM)
  • What is TF Lite Micro
  • TF vs. TF Lite vs TF Lite Micro
  • Running TF Lite Micro on-device
2:30 PM Custom Function Units
  • General overview of CFU
  • Tour of CFU Playground
  • Build your first CFU in a Colab
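As background for the items above: a Custom Function Unit (CFU) plugs into the RISC-V CPU and computes one 32-bit result from a custom instruction's funct field and two 32-bit register operands. A minimal software model of that interface can be sketched in Python (the two operations here are hypothetical examples for illustration, not part of CFU Playground itself):

```python
def cfu_model(funct3: int, rs1: int, rs2: int) -> int:
    """Software model of a hypothetical two-operation CFU.

    A real CFU is hardware attached to the CPU pipeline: it receives the
    custom instruction's funct field and two 32-bit register operands,
    and returns one 32-bit result to the destination register.
    """
    mask = 0xFFFFFFFF
    if funct3 == 0:  # op 0: plain 32-bit add
        return (rs1 + rs2) & mask
    if funct3 == 1:  # op 1: sum of rs1's four signed bytes, plus rs2
        total = rs2
        for shift in (0, 8, 16, 24):
            b = (rs1 >> shift) & 0xFF
            total += b - 256 if b >= 128 else b  # sign-extend int8
        return total & mask
    return 0  # unimplemented funct values return 0 in this model


# 0x01FF0302 packs the signed bytes (low lane first) 2, 3, -1, 1.
assert cfu_model(0, 2, 3) == 5
assert cfu_model(1, 0x01FF0302, 10) == 15  # 2 + 3 - 1 + 1 + 10
```

Byte-wise operations like op 1 foreshadow the quantized int8 arithmetic that dominates tinyML inference.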
3:00 PM Introduction to Amaranth
  • Using Amaranth to design a CFU
  • Unit testing in Amaranth
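A common way to unit-test a design is to drive it in simulation and compare its outputs against a software golden model. The golden-model half of that pattern is plain Python; the sketch below uses a hypothetical popcount operation as the unit under test (the Amaranth simulator side is omitted):

```python
def popcount32_model(rs1: int) -> int:
    """Golden model for a hypothetical popcount CFU operation."""
    return bin(rs1 & 0xFFFFFFFF).count("1")


def test_popcount_model():
    # Spot checks on known values.
    assert popcount32_model(0) == 0
    assert popcount32_model(0xFFFFFFFF) == 32
    assert popcount32_model(0b1011) == 3
    # Exhaustive cross-check of all 8-bit inputs against a bit-loop reference.
    for x in range(256):
        assert popcount32_model(x) == sum((x >> i) & 1 for i in range(8))


test_popcount_model()
```

In an Amaranth test bench, the simulator would apply the same inputs to the hardware design and assert that each output matches this model.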
3:30 PM Renode/Antmicro
  • Renode's simulation solution
  • Renode Development Lifecycle
  • Verilator Integration
  • Integration with CFU
4:00 PM Accelerate a TinyML Model
  • Case study: a provided model and its profiling results
  • Implement a new instruction in the CFU
  • Use the new instruction in a TFLM kernel
  • Measure the performance speedup
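As a preview of the exercise's shape, the sketch below models a hypothetical 4-way SIMD multiply-accumulate CFU instruction and uses it in an int8 dot product, the kind of inner loop found in a quantized TFLM convolution kernel. One CFU call replaces four multiplies and four adds, which is where a measured speedup would come from (the names and lane layout are illustrative assumptions, not CFU Playground's actual API):

```python
def sext8(b: int) -> int:
    """Interpret a byte value as a signed int8."""
    return b - 256 if b >= 128 else b


def simd_mac4_model(acc: int, packed_a: int, packed_b: int) -> int:
    """Model of a hypothetical CFU instruction: four int8 MACs per call."""
    for shift in (0, 8, 16, 24):
        acc += sext8((packed_a >> shift) & 0xFF) * sext8((packed_b >> shift) & 0xFF)
    return acc


def pack4(vals) -> int:
    """Pack four int8 values into one 32-bit word, lowest byte first."""
    return sum((v & 0xFF) << (8 * i) for i, v in enumerate(vals))


def dot_scalar(a, b) -> int:
    """Baseline kernel: one multiply-add per element."""
    return sum(x * y for x, y in zip(a, b))


def dot_cfu(a, b) -> int:
    """Accelerated kernel: one modeled CFU call per four elements."""
    acc = 0
    for i in range(0, len(a), 4):  # length assumed to be a multiple of 4
        acc = simd_mac4_model(acc, pack4(a[i:i + 4]), pack4(b[i:i + 4]))
    return acc


a = [1, -2, 3, 4, -5, 6, 7, -8]
b = [2, 2, -1, 1, 3, -3, 1, 1]
assert dot_cfu(a, b) == dot_scalar(a, b) == -35
```

On hardware, `simd_mac4_model` would be a single custom instruction, and the speedup would be measured with the profiling support demonstrated earlier in the workshop.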