Hpca_2023 · CFU-Playground

Table of Contents

Welcome
Demo Video
Overview
About
Requirements
Tutorial Schedule

Welcome

To build your own TinyML accelerator and processor during the tutorial, attendees will need to bring a laptop running Ubuntu 18 or 20 with a USB port and CFU Playground installed. See the installation directions in the CFU Playground repository. We will provide the Lattice FPGAs (Option 4b).

Demo Video

Overview

We are running a half-day tutorial on TinyML Acceleration at HPCA 2023 in Montreal!

Embedded FPGAs will be provided for free to in-person attendees to build their own TinyML processor and accelerator during the tutorial! Remote participation at HPCA 2023 is not supported.

About

The need for efficient processing of neural networks has given rise to the development of hardware accelerators. The increased adoption of specialized hardware has highlighted the need for more agile design flows for hardware-software co-design and domain-specific optimizations. We present CFU Playground, a full-stack open-source framework that enables rapid, iterative design and evaluation of machine learning (ML) accelerators for embedded ML systems. Our toolchain provides a completely open-source, end-to-end flow for hardware-software co-design on FPGAs and for future systems research. This full-stack framework lets users explore experimental, bespoke architectures that are customized and co-optimized for embedded ML. Our rapid deploy-profile-optimize feedback loop lets ML hardware and software developers achieve significant returns from a relatively small investment in customization. Using CFU Playground's design and evaluation loop, we show substantial speedups in just minutes! Coupling the soft CPU with the accelerator opens up a new, rich design space spanning the two components, which we explore automatically using Vizier, a black-box optimization service.
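To make the idea of automated design space exploration concrete, here is a minimal sketch of a black-box search loop. It swaps in plain random search for Vizier (Vizier's own API is not shown here), and the design knobs in `SPACE` and the `toy_cost` function are hypothetical. In a real flow, each evaluation would build the SoC and run a model on the FPGA or in simulation to measure cycles and resource usage.

```python
import itertools
import random

# Hypothetical CPU/CFU design knobs, for illustration only; these are NOT
# CFU Playground's real configuration options.
SPACE = {
    "icache_kib": [1, 2, 4, 8],
    "hw_multiplier": [False, True],
    "use_cfu": [False, True],
}


def toy_cost(cfg):
    """Toy stand-in for 'build, simulate, measure'. Lower is better.

    Cycles shrink with a bigger cache, a hardware multiplier, and a CFU;
    area grows with each. All constants are invented for this sketch.
    """
    cycles = 1000.0 / (1 + cfg["icache_kib"] / 4)
    cycles *= 0.6 if cfg["hw_multiplier"] else 1.0
    cycles *= 0.3 if cfg["use_cfu"] else 1.0
    area = 10 * cfg["icache_kib"] + 40 * cfg["hw_multiplier"] + 80 * cfg["use_cfu"]
    return cycles + area


def random_search(space, cost, trials, seed=0):
    """Evaluate up to `trials` distinct random points of `space`; return the best."""
    keys = list(space)
    points = [dict(zip(keys, values))
              for values in itertools.product(*(space[k] for k in keys))]
    rng = random.Random(seed)
    sampled = rng.sample(points, min(trials, len(points)))
    return min(sampled, key=cost)


best = random_search(SPACE, toy_cost, trials=8)
print(best, toy_cost(best))
```

Random search is only a baseline: a service such as Vizier typically picks each new trial based on earlier results, which matters when every evaluation is a full synthesis and benchmark run.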

What is the goal of the workshop?

Learn the challenges and opportunities in designing TinyML hardware.
Design and develop model-specific accelerators quickly on FPGAs.
Get hands-on experience building an ML accelerator and performing design space exploration using CFU Playground!

Who is the audience for this workshop?

New ML accelerators are announced and released every month for a wide variety of applications. However, the high cost and complexity of designing an accelerator, integrating it into a larger System-on-Chip, and developing its software stack make this a non-trivial task that is difficult to iterate on rapidly. Attendees will be able to deploy their own accelerated ML solutions within minutes, empowering them to explore the breadth of opportunity in hardware acceleration. Together with the relevance of and excitement surrounding ML today, this should appeal to people with a wide range of backgrounds and interests, including ML, FPGAs, embedded systems, computer architecture, hardware design, and software development.

Scope and Topics

Custom Hardware Acceleration on FPGAs
Tiny Machine Learning (TinyML)
Open-Source Tools/Frameworks for HW & SW Development (Full-Stack)

Requirements

To build your own TinyML accelerator and processor during the tutorial, attendees will need to bring a Linux laptop with CFU Playground installed. See the installation directions in the CFU Playground repository. We will provide the Lattice FPGAs (Option 4b).

Pre-requisites

Knowledge of computer organization (RISC pipeline, registers, opcodes, etc.)
Basic experience with HDL (being able to read Verilog) & synthesis concepts for FPGAs
Familiarity with C and Python
Familiarity with ML “cycle” (inputs, preprocessing, training, inference, etc.) is helpful but not required

Hardware

No hardware is required for the tutorial (we will provide the FPGAs) or to experiment with CFU Playground using simulators.
To develop on an FPGA, one of the supported FPGA boards is required (or you might be able to add support!)

Software

All software (RISC-V toolchain, SymbiFlow, etc.) is installed via an environment pre-packaged with CFU Playground.

Tutorial Schedule

Time	Material/Activity
1:30 PM	Welcome & Tiny Machine Learning (TinyML) Tutorial Overview. Topics: general survey of the field of TinyML; common use cases; the kinds of models we run; typical resource constraints, challenges, etc.
2:00 PM	CFU Playground: Full-Stack Framework for TinyML Acceleration Using HW-SW Co-Design. Topics: general overview of CFUs; tour of CFU Playground; end-to-end guide to building an ML accelerator
2:50 PM	Design Space Exploration of CPU vs. CFU Accelerator. Topics: Renode and Verilator simulation; Google's Vizier for black-box search optimization; integration with CFU Playground for DSE
3:20 PM	Coffee Break
3:40 PM	TensorFlow Lite for Microcontrollers (TFLM). Topics: what TF Lite Micro is; TF vs. TF Lite vs. TF Lite Micro; running TF Lite on-device
4:10 PM	Benchmarking of TinyML Systems. Topics: importance and challenges of benchmarking TinyML systems; MLPerf Tiny and its workloads, which are well suited to CFUs; CFU Playground for architects and the Hardware Lottery problem; demo of profiling/microbenchmarking support in CFU Playground
4:40 PM	Build Your Own Processor (BYOP): Accelerate Your Own TinyML Model. Topics: case study with the model and profiling results provided; implement a new instruction in the CFU; use the new instruction in a TFLM kernel; measure the performance speedup
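The BYOP steps (implement a new instruction, use it in a TFLM kernel, measure the speedup) can be prototyped with a software golden model before writing any gateware. The sketch below assumes a hypothetical 4-lane int8 SIMD multiply-accumulate instruction that takes two 32-bit register operands and returns a 32-bit result, a common pattern for quantized convolution kernels. The function names are illustrative and are not CFU Playground's API.

```python
MASK32 = 0xFFFFFFFF


def pack_int8x4(vals):
    """Pack four signed 8-bit values into one 32-bit word (lane 0 = low byte)."""
    assert len(vals) == 4 and all(-128 <= v <= 127 for v in vals)
    word = 0
    for lane, v in enumerate(vals):
        word |= (v & 0xFF) << (8 * lane)
    return word


def unpack_int8x4(word):
    """Inverse of pack_int8x4: split a 32-bit word into four signed bytes."""
    lanes = []
    for lane in range(4):
        b = (word >> (8 * lane)) & 0xFF
        lanes.append(b - 256 if b >= 128 else b)
    return lanes


def to_int32(x):
    """Wrap an integer to signed 32 bits, as a 32-bit register would."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def simd_mac4(acc, in0, in1):
    """Hypothetical CFU op: acc + sum of the four int8 x int8 lane products.

    In hardware, `acc` might live in a register inside the CFU; here it is
    passed explicitly so the model stays a pure function.
    """
    a = unpack_int8x4(in0)
    b = unpack_int8x4(in1)
    return to_int32(acc + sum(x * y for x, y in zip(a, b)))


def dot_scalar(xs, ys):
    """Baseline: the dot product a kernel computes one multiply at a time."""
    return to_int32(sum(x * y for x, y in zip(xs, ys)))


def dot_cfu(xs, ys):
    """Same dot product, issuing one simd_mac4 per group of four elements."""
    assert len(xs) == len(ys) and len(xs) % 4 == 0
    acc = 0
    for i in range(0, len(xs), 4):
        acc = simd_mac4(acc, pack_int8x4(xs[i:i + 4]), pack_int8x4(ys[i:i + 4]))
    return acc
```

Checking `dot_cfu` against `dot_scalar` on many inputs is the kind of test that catches lane-ordering or sign-extension bugs before they reach hardware. The speedup itself comes from replacing four multiply-adds with one instruction, and it has to be measured on the board or in simulation, not in this model.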


TinyML: Accelerating Tiny Machine Learning by Building Your Own Processor (BYOP)