Chào mừng bạn đến với Comprehensive Rust

Build workflow GitHub contributors GitHub stars

This is a free Rust course developed by the Android team at Google. The course covers the full spectrum of Rust, from basic syntax to advanced topics like generics and error handling.

The latest version of the course can be found at https://google.github.io/comprehensive-rust/. If you are reading somewhere else, please check there for updates.

The course is also available as a PDF.

The goal of the course is to teach you Rust. We assume you don’t know anything about Rust and hope to:

  • Give you a comprehensive understanding of the Rust syntax and language.
  • Enable you to modify existing programs and write new programs in Rust.
  • Show you common Rust idioms.

We call the first four course days Rust Fundamentals.

Building on this, you’re invited to dive into one or more specialized topics:

  • Android: a half-day course on using Rust for Android platform development (AOSP). This includes interoperability with C, C++, and Java.
  • Chromium: a half-day course on using Rust within Chromium based browsers. This includes interoperability with C++ and how to include third-party crates in Chromium.
  • Bare-metal: a whole-day class on using Rust for bare-metal (embedded) development. Both microcontrollers and application processors are covered.
  • Concurrency: a whole-day class on concurrency in Rust. We cover both classical concurrency (preemptively scheduling using threads and mutexes) and async/await concurrency (cooperative multitasking using futures).

Non-Goals

Rust is a large language and we won’t be able to cover all of it in a few days. Some non-goals of this course are:

Assumptions

The course assumes that you already know how to program. Rust is a statically-typed language and we will sometimes make comparisons with C and C++ to better explain or contrast the Rust approach.

If you know how to program in a dynamically-typed language such as Python or JavaScript, then you will be able to follow along just fine too.

Speaker Notes

This is an example of a speaker note. We will use these to add additional information to the slides. This could be key points which the instructor should cover as well as answers to typical questions which come up in class.

Hướng Dẫn Khóa Học

Trang này là dành cho người hướng dẫn khóa học.

Đây là một số thông tin cơ bản về cách chúng tôi triển khai khóa học trong nội bộ Google.

Chúng tôi mở lớp từ 9 giờ sáng đến 4 giờ chiều, trong đó có 1 tiếng nghỉ trưa. Một ngày học bao gồm lớp buổi sáng và lớp buổi chiều, mỗi buổi kéo dài 3 tiếng. Một buổi học sẽ được chia thành nhiều quãng nghỉ để học sinh có thời gian thực hành.

Trước khi bắt đầu giảng dạy khóa học này, bạn sẽ cần:

  1. Làm quen với cách sử dụng giáo trình của khóa học. Chúng tôi đã lồng ghép các ghi chú thuyết trình để giúp tóm lược lại nội dung từng phần (chúng tôi cũng rất mong bạn có thể giúp đỡ, đóng góp vào các ghi chú này!). Khi giảng dạy, bạn nên chắc chắn rằng các ghi chú thuyết trình sẽ được mở dưới dạng một cửa sổ popup (bằng cách nhấn vào biểu tượng có mũi tên nhỏ bên phải chữ “Speaker Notes”). Làm vậy sẽ giúp bạn có thể thuyết trình với slide rõ ràng nhất.

  2. Xác định lịch học. Vì nội dung khóa học kéo dài bốn ngày, chúng tôi khuyến nghị bạn sắp xếp lịch học dàn trải ra hai tuần. Các học viên đều bảo rằng việc có những “khoảng trống” trong khóa rất có ích để họ có thể tiêu hóa hết lượng thông tin được cung cấp.

  3. Tìm một căn phòng đủ lớn để chứa tất cả học viên. Chúng tôi khuyến nghị một lớp nên từ 15 đến 25 người, đủ nhỏ để mọi người có thể thoải mái đưa ra các câu hỏi — và nó cũng đủ để người hướng dẫn có thể trả lời hết các câu hỏi đó. Hãy đảm bảo rằng phòng học có đủ ghế cho cả bạn và học sinh: bạn đương nhiên cần ngồi dạy với laptop của bạn. Đặc biệt, bạn sẽ hướng dẫn và thực hành mẫu rất nhiều bài live-coding, nên một cái ghế sẽ có ích hơn là một cái bục giảng đó.

  4. Trong ngày học, hãy đến lớp sớm một chút để chuẩn bị sẵn sàng mọi thứ. Chúng tôi khuyến khích chạy trực tiếp lệnh mdbook serve trên laptop của bạn (tham khảo hướng dẫn cài đặt ở đây). Việc này giúp đảm bảo việc chuyển trang của bạn được diễn ra mượt mà nhất, ngoài ra còn cho phép bạn sửa lỗi chính tả trực tiếp khi bạn hoặc ai đó phát hiện ra.

  5. Để mọi người làm các bài luyện tập theo hình thức cá nhân hoặc nhóm nhỏ. Chúng tôi thường bỏ ra 30 đến 45 phút để thực hành vào buối sáng hoặc chiều (tính cả thời gian chữa đáp án). Hãy chắc chắn hỏi các học viên xem có ai đang bị tắc hoặc cần mình giúp không. Nếu bạn thấy vài người đang gặp một vấn đề giống nhau, hay đưa nó lên trước lớp và đề xuất giải pháp, như là, gợi ý mọi người cách để tìm kiếm thông tin liên quan trong thư viện tiêu chuẩn.

Đó là tất cả, chúc bạn may mắn với hành trình hướng dẫn của bạn! Chúng tôi hy vọng bạn có thể tìm thấy niềm vui như cách chúng tôi tìm thấy khi dạy khóa học này!

Sau khi khóa học kết thúc, rất mong nhận được feedback từ bạn để chúng tôi có thể hoàn thiện khóa học hơn nữa. Chúng tôi rất vui lòng khi được biết những gì mang lại hiệu quả cho bạn và những gì cần cải thiện thêm. Ngoài ra, học viên của bạn cũng có thể tích cực gửi feedback cho chúng tôi tại đây!

Cấu Trúc Khóa Học

Trang này là dành cho người hướng dẫn khóa học.

Rust Căn Bản

Ở bốn ngày học đầu tiên, ta sẽ nhanh chóng khái quát rất nhiều khía cạnh căn bản của Rust!

Thời khóa biểu:

  • Ngày 1 Buổi Sáng (2 tiếng và 5 phút, bao gồm thời gian nghỉ)
SegmentThời lượng
Lời Chào Mừng5 minutes
Hello, World15 minutes
Kiểu Dữ Liệu Và Giá Trị40 minutes
Control flow (Điểu khiển luồng) căn bản40 minutes
  • Ngày 1 Buổi Chiều (2 tiếng và 35 phút, bao gồm thời gian nghỉ)
SegmentThời lượng
Tuples and Arrays35 minutes
References55 minutes
User-Defined Types50 minutes
  • Ngày 2 Buổi Sáng (2 tiếng và 55 phút, bao gồm thời gian nghỉ)
SegmentThời lượng
Lời Chào Mừng3 minutes
Pattern Matching1 hour
Methods and Traits50 minutes
  • Ngày 2 Buổi Chiều (3 tiếng và 10 phút, bao gồm thời gian nghỉ)
SegmentThời lượng
Generics40 minutes
Standard Library Types1 hour and 20 minutes
Standard Library Traits1 hour and 40 minutes
  • Ngày 3 Buổi Sáng (2 tiếng và 20 phút, bao gồm thời gian nghỉ)
SegmentThời lượng
Lời Chào Mừng3 minutes
Memory Management1 hour
Smart Pointers55 minutes
  • Ngày 3 Buổi Chiều (1 tiếng và 50 phút, bao gồm thời gian nghỉ)
SegmentThời lượng
Borrowing55 minutes
Lifetimes50 minutes
  • Ngày 4 Buổi Sáng (2 tiếng và 40 phút, bao gồm thời gian nghỉ)
SegmentThời lượng
Lời Chào Mừng3 minutes
Iterators45 minutes
Modules40 minutes
Testing45 minutes
  • Ngày 4 Buổi Chiều (2 tiếng và 10 phút, bao gồm thời gian nghỉ)
SegmentThời lượng
Error Handling55 minutes
Unsafe Rust1 hour and 5 minutes

Chuyên Sâu

Ngoài lớp học “4 ngày” về Rust căn bản ra, chúng tôi còn cung cấp thêm một số chủ đề chuyên sâu sau:

Rust trong Android

Chủ đề chuyên sâu Rust trong Android là khóa học nửa ngày hướng dẫn sử dụng Rust cho phát triển trên nền tảng Android, bao gồm tính tương tác với C, C++ và Java.

Bạn sẽ cần cài sẵn AOSP checkout, rồi tạo một checkout cho repository của khóa học và di chuyển thư mục src/android/ tới thư mục root của AOSP checkout trên. Việc này sẽ đảm bảo hệ thống build của Android tìm được file Android.bp trong thư mục src/android/.

Hãy chắc chắn rằng lệnh adb sync hoạt động được với trình giả lập hoặc thiết bị thực của bạn, và pre-build tất cả ví dụ Android bằng src/android/build_all.sh. Bạn có thể đọc script để xem các lệnh mà nó chạy và xác nhận rằng mọi thứ đều hoạt động khi bạn kích hoạt bằng tay.

Rust trong Chromium

Chủ đề chuyên sâu Rust trong Chromium là khóa học nửa ngày hướng dẫn sử dụng Rust như là một phần của trình duyệt nhân Chromium, bao gồm hệ thống build gn, đưa vào thư viện bên thứ ba (“crates”) và tính tương tác với C++.

Bạn sẽ cần có khả năng build được Chromium — chúng tôi đề xuất build bằng công cụ xây dựng thành phần, debug bởi tốc độ ấn tượng của nó. Dù sao thì bất kỳ cách build nào cũng đều hoạt động được, chỉ cần bạn đảm bảo rằng mình sẽ chạy được trình duyệt Chromium vừa build xong.

Bare-Metal Rust

Chủ đề chuyên sâu Bare-Metal Rust là khóa học một ngày hướng dẫn sử dụng Rust cho phát triển nhúng (bare-metal). Nội dung về vi điều khiển và bộ xử lý ứng dụng sẽ được bao quát trong phần này.

Về phần vi điều khiển, bạn sẽ cần mua trước bo mạch BBC micro:bit v2. Mỗi người sẽ cần cài đặt một số packages dựa theo miêu tả ở trang chào mừng.

Tính đồng thời trong Rust

Chủ đề chuyên sâu Tính đồng thời trong Rust là khóa học một ngày hướng dẫn về khái niệm đồng thời async/await điển hình.

Bạn sẽ cần thiết lập một crate mới và tải xuống các gói phụ thuộc cần thiết để sẵn sàng chạy. Rồi bạn có thể copy/paste các ví dụ vào src/main.rs để thử nghiệm:

cargo init concurrency cd concurrency cargo add tokio --features full cargo run

Thời khóa biểu:

  • Ngày 3 Buổi Sáng (2 tiếng và 20 phút, bao gồm thời gian nghỉ)
SegmentThời lượng
Threads30 minutes
Channels20 minutes
Send and Sync15 minutes
Shared State30 minutes
Exercises1 hour and 10 minutes
  • Buổi Chiều (3 tiếng và 20 phút, bao gồm thời gian nghỉ)
SegmentThời lượng
Async Basics30 minutes
Channels and Control Flow20 minutes
Pitfalls55 minutes
Exercises1 hour and 10 minutes

Quy Chuẩn

Khóa học này hướng tới tính tương tác cao nên chúng tôi khuyến khích để các câu hỏi dẫn bạn đi khám phá nhiều thứ thú vị của Rust!

Phím Tắt

Dưới đây là một vài phím tắt có ích cho việc sử dụng mdBook:

  • Arrow-Left: Navigate to the previous page.
  • Arrow-Right: Navigate to the next page.
  • Ctrl + Enter: Execute the code sample that has focus.
  • s: Activate the search bar.

Bản Dịch

Khóa học này đã được dịch ra các ngôn ngữ sau nhờ các tình nguyện viên tuyệt vời:

Sử dụng nút lựa chọn ngôn ngữ ở góc phía trên bên phải để đổi sang ngôn ngữ khác.

Bản Dịch Chưa Hoàn Chỉnh

Hiện tại, vẫn còn một lượng lớn các bản dịch đang được tiến hành. Chúng tôi liên kết đến các bản dịch được cập nhật gần đây nhất:

Nếu bạn muốn đóng góp giúp đỡ, vui lòng xem hướng dẫn của chúng tôi để biết nên bắt đầu như thế nào. Các bản dịch được hợp tác cùng hoàn thành trên một bản issue tracker.

Sử Dụng Cargo

Khi bạn bắt đầu đọc về Rust, bạn sẽ rất nhanh chóng được thấy Cargo, một công cụ tiêu chuẩn xây dựng và chạy ứng dụng Rust, nằm trong hệ sinh thái của Rust. Trong phần này, chúng tôi sẽ tóm tắt ngắn gọn Cargo là gì, nó phù hợp với hệ sinh thái rộng lớn hơn ra sao, và nó phù hợp với khóa đào tạo như thế nào.

Cài đặt

Vui lòng tuân theo hướng dẫn cài đặt tại https://rustup.rs/.

Bạn sẽ nhận được công cụ xây dựng Cargo (cargo) và trình biên dịch Rust (rustc). Bạn cũng sẽ có rustup, một tiện ích dòng lệnh mà bạn có thể sử dụng để cài đặt các phiên bản khác nhau cho trình biên dịch.

Sau khi cài đặt Rust thành công, bạn nên cấu hình trình Text Editor hoặc IDE của bạn để hoạt động với Rust. Hầu hết các Text Editor như VS Code, Emacs, Vim/Neovim, v.v… đều áp dụng rust-analyzer để thực hiện tính năng auto-completion và jump-to-definition. Bên cạnh đó, vẫn còn một IDE khác khá phổ biến được phát triển bởi JetBrains là RustRover.

Speaker Notes

  • Trong hệ máy Debian/Ubuntu, bạn còn có thể tự cài đặt Cargo, mã nguồn của Rust, và Rust Formatter bằng apt. Tuy nhiên, bạn sẽ chỉ cài được phiên bản đã bị quá hạn và có thể dẫn tới những lỗi không mong muốn. Câu lệnh để cài sẽ là như sau:

    sudo apt install cargo rust-src rustfmt

Hệ Sinh Thái Rust

Hệ sinh thái của Rust bao gồm một số lượng đáng kể các công cụ, trong đó nổi bật nhất như là:

  • rustc: trình biên dịch Rust, thứ mà sẽ “dịch” các file .rs sang dạng nhị phân hoặc các formats bậc trung khác.

  • cargo: trình quản lý các gói phụ thuộc và công cụ xây dựng cho Rust. Cargo sẽ tải các gói phụ thuộc, thường được lưu trữ trên https://crates.io, rồi đưa chúng cho rustc khi bạn xây dựng dự án. Cargo còn có các trình chạy kiểm thử được cài sẵn phục vụ cho việc thực thi các đơn vị kiểm thử.

  • rustup: bộ công cụ cài đặt và cập nhật cho Rust. Công cụ này được sử dụng để cài đặt và cập nhật rustccargo khi các phiên bản mới của Rust được phát hành. Thêm vào đó, rustup còn có thể tải các tài liệu cho thư viện tiêu chuẩn. Bạn có thể cài đặt nhiều phiên bản Rust cùng lúc và rustup sẽ cho bạn chuyển đổi các phiên bản với nhau khi cần.

Speaker Notes

Các điểm chính:

  • Rust có một lịch phát hành phiên bản mới dày đặc với một phiên bản mới mỗi sáu tuần. Các phiên bản mới bảo toàn tính tương thích với các phiên bản cũ — cùng với đó chúng kích hoạt thêm các chức năng mới.

  • Có tổng cộng ba kênh phát hành: “stable”, “beta”, và “nightly”.

  • Các tính năng mới dang được thử nghiệm trên “nightly”, còn “beta” sẽ trở thành “stable” sau mỗi sáu tuần.

  • Các gói phụ thuộc cũng có thể được phân giải từ các hệ thống đăng ký thay thế, git, các thư mục, v.v…

  • Rust còn có nhiều phiên bản lớn: Rust 2021 là bản hiện tại, Rust 2018, và Rust 2015.

    • Các phiên bản lớn được cho phép thực hiện các thay đổi không tương thích đối với ngôn ngữ.

    • Để phòng trừ phá vỡ cấu trúc mã, các phiên bản lớn sẽ được chọn cho crate của bạn thông qua file Cargo.toml.

    • Nhằm tránh hệ sinh thái bị chia nhỏ ra, trình biên dịch Rust có thể trộn lẫn các đoạn mã được viết dưới các phiên bản lớn khác nhau.

    • Hãy đề cập với người học rằng việc sử dụng trình biên dịch trực tiếp mà không thông qua cargo là việc rất hiếm (hầu hết người dùng không bao giờ làm như vậy).

    • Nói không ngoa bản thân Cargo là một công cụ cự kỳ mạnh mẽ và toàn diện. Nó có khả năng bao gồm nhưng không giới hạn tới nhiều tính năng mở rộng sau:

    • Tham khảo thêm tại The Cargo Book.

Code Mẫu Trong Khóa Đào Tạo

Với khóa đào tạo này, chúng ta sẽ hầu như khám phá ngôn ngữ Rust qua các ví dụ có thể thực thi được ngay trên trình duyệt của mình. Việc này giúp cho việc chuẩn bị được thực hiện dễ dàng hơn và đảm bảo trải nghiệm đồng nhất với mọi người.

Cài đặt Cargo vẫn được khuyến khích: nó sẽ giúp bạn làm các bài thực hành dễ hơn. Trong ngày học cuối cùng, chúng tôi sẽ làm một bài tập lớn để cho bạn thấy cách làm việc với các gói phụ thuộc và bạn sẽ cần Cargo để làm nó.

Các khối code trong khóa học này hoàn toàn tương tác được:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

You can use Ctrl + Enter to execute the code when focus is in the text box.

Speaker Notes

Hầu hết các đoạn code mẫu đều có thể chỉnh sửa được như trên. Nhưng có một ít các đoạn mẫu lại không chỉnh sửa được vì vài lý do sau:

  • Các Playgrounds được nhúng không thể thực thi được các đơn vị kiểm thử. Hãy copy-paste đoạn code và mở nó trong một Playground thực sự để minh họa các thử nghiệm.

  • Các Playgrounds được nhúng bị mất trạng thái tạm thời ở thời điểm bạn chuyển sang trang khác! Vì lý do này, các học sinh nên giải các bài thực hành sử dụng Rust đã được cài đặt trong máy tính riêng hoặc qua Playground thực sự.

Chạy Cargo Trong Máy Tính Của Bạn

Nếu bạn muốn thử nghiệm code trong hệ thống riêng của bạn, vậy thì trước hết bạn cần phải cài đặt Rust. Hãy tuân theo các hướng dẫn trong Rust Book. Bạn sẽ học được cách sử dụng rustccargo ở đây. Tại thời điểm bài này được viết, phiên bản ổn định mới nhất của Rust có các số phiên bản sau:

% rustc --version rustc 1.69.0 (84c898d65 2023-04-16) % cargo --version cargo 1.69.0 (6e9a83356 2023-04-12)

Bạn cũng có thể sử dụng bất kỳ phiên bản nào ra sau đó vì Rust luôn bảo trì khả năng tương thích ngược của nó.

Sau khi cài xong, tuân theo các bước sau đây để xây dựng tệp nhị phân Rust từ một trong các ví dụ trong khóa đào tạo:

  1. Nhấn vào nút “Copy to clipboard” trong ví dụ mà bạn muốn copy.

  2. Sử dụng cargo new exercise để tạo một thư mục exercise/ mới cho code của bạn:

    $ cargo new exercise Created binary (application) `exercise` package
  3. Điều hướng tới exercise/ và sử dụng cargo run để build và chạy tệp nhị phân của bạn:

    $ cd exercise $ cargo run Compiling exercise v0.1.0 (/home/mgeisler/tmp/exercise) Finished dev [unoptimized + debuginfo] target(s) in 0.75s Running `target/debug/exercise` Hello, world!
  4. Thay thế khung code trong src/main.rs với code riêng của bạn. Chẳng hạn, sử dụng ví dụ ở trang trước, viết src/main.rs sao cho thành như sau

    fn main() { println!("Chỉnh sửa tôi đê!"); }
  5. Sử dụng cargo run để build và chạy tệp nhị phân vừa được cập nhật:

    $ cargo run Compiling exercise v0.1.0 (/home/mgeisler/tmp/exercise) Finished dev [unoptimized + debuginfo] target(s) in 0.24s Running `target/debug/exercise` Edit me!
  6. Sử dụng cargo check để kiểm tra nhanh dự án của bạn, sử dụng cargo build để biên dịch nó mà không chạy nó luôn. Bạn có thể tìm các tệp được tạo ra ở target/debug/ cho debug build thông thường. Sử dụng cargo build --release để xuất ra bản build tối ưu cho việc xuất bản ở target/release/.

  7. Bạn có thể thêm các gói phụ thuộc vào dự án của bạn bằng cách chỉnh sửa file Cargo.toml. Khi bạn chạy các lệnh cargo, nó sẽ tự động tải về và biên dịch các gói phụ thuộc bị thiếu cho bạn.

Speaker Notes

Hãy cố gắng khuyến khích các học viên cài đặt Cargo và sử dụng trong Text Editor ở máy riêng. Điều này sẽ giúp cuộc sống của họ dễ thở hơn một tý vì họ sẽ được làm việc trong một môi trường phát triển bình thường.

Chào mừng tới Ngày 1

Đây là ngày học đầu tiên của Rust Căn Bản. Sẽ có rất nhiều nội dung mà chúng ta bao quát trong hôm nay:

  • Câu lệnh Rust cơ bản: các biến, kiểu dữ liệu đơn giản và phức hợp, enum, struct, tham chiếu, hàm và phương thức.
  • Các kiểu dữ liệu và suy luận kiểu.
  • Các cấu trúc điều kiển luồng: vòng lặp, điều kiện, vân vân…
  • Các kiểu dữ liệu tự định nghĩa: struct và enum.
  • So khớp mẫu: hủy enum, struct, và mảng.

Mục lục

Buổi học nên kéo dài khoảng 2 tiếng và 5 phút, bao gồm thời gian 10 phút nghỉ trưa. Nội dung buổi học bao gồm:

SegmentThời lượng
Lời Chào Mừng5 minutes
Hello, World15 minutes
Kiểu Dữ Liệu Và Giá Trị40 minutes
Control flow (Điểu khiển luồng) căn bản40 minutes

Speaker Notes

This slide should take about 5 minutes.

Hãy lưu ý các học viên rằng:

  • Học viên nên đặt câu hỏi bất kì khi nào thay vì đợi đến cuối buổi học.
  • Lớp học rất khuyến khích có nhiều sự tương tác và thảo luận!
    • Với vai trò là người hướng dẫn, bạn nên giữ các cuộc thảo luận sao cho không bị lạc đề, chẳng hạn, thử giới hạn các cuộc thảo luận xung quanh cách Rust hoạt động so với vài ngôn ngữ khác. Việc cân bằng này có thể sẽ khá khó khăn, nhưng ta thà chấp nhận nó để học viên thấy hứng thú hơn là việc giao tiếp một chiều.
  • Một phần có thể bị hỏi trước khi ta học tới slide của phần đó.
    • Đây là điều hoàn toàn chấp nhận được! Nhắc lại kiến thức là một phần quan trọng của việc học. Hãy nhớ rằng các slides chỉ là công cụ hỗ trợ cho bạn và bạn có thể bỏ qua chúng nếu muốn.

Ý tưởng cho ngày đầu tiên đó là đưa ra các khía cạnh “cơ bản” của Rust tương tự với các ngôn ngữ khác. Các phần nâng cao hơn sẽ được đề cập tới ở những ngày kế tiếp.

Nếu bạn đang giảng bài này trên lớp, thì bạn nên vắn tắt một lượt mục lục trên. Hãy ghi nhớ rằng cuối mỗi mục nhỏ sẽ có một bài thực hành, sau đó là giờ nghỉ giải lao. Hãy lên kế hoạch để chữa bài thực hành sau giờ nghỉ. Thời gian ở sau mỗi đề mục là nhằm giúp bạn giữ đúng tiến độ khóa học, nhưng bạn cũng có thể thoải mái điều chỉnh linh hoạt nếu cần thiết!

Hello, World

Phần này sẽ kéo dài khoảng 15 phút, bao gồm:

SlideThời lượng
Rust là gì?10 minutes
Benefits of Rust3 minutes
Sân chơi (Playground)2 minutes

Rust là gì?

Rust là một ngôn ngữ lập trình mới được phát hành phiên bản 1.0 vào năm 2015:

  • Rust là một ngôn ngữ được biên dịch tĩnh với vai trò tương tự như C++
    • rustc sử dụng LLVM làm bộ dịch backend.
  • Rust hỗ trợ rất nhiều nền tảng và kiểu kiến trúc:
    • x86, ARM, WebAssembly, …
    • Linux, Mac, Windows, …
  • Rust được sử dụng cho nhiều loại thiết bị:
    • firmware và các chương trình khởi động (boot loaders),
    • màn hình thông minh,
    • điện thoại di động,
    • máy tính,
    • máy chủ.

Speaker Notes

This slide should take about 10 minutes.

Rust phù hợp trong cùng lĩnh vực như C++:

  • Độ linh hoạt cao.
  • Có quyền điều khiển cao.
  • Có thể ứng dụng với các thiết bị có tài nguyên hạn chế như vi điều khiển.
  • Không có thời gian chạy (runtime) hay trình thu dọn rác (garbage collection).
  • Tập trung vào tính đáng tin cậy và độ an toàn mà không làm giảm hiệu suất.

Benefits of Rust

Some unique selling points of Rust:

  • Compile time memory safety - whole classes of memory bugs are prevented at compile time

    • No uninitialized variables.
    • No double-frees.
    • No use-after-free.
    • No NULL pointers.
    • No forgotten locked mutexes.
    • No data races between threads.
    • No iterator invalidation.
  • No undefined runtime behavior - what a Rust statement does is never left unspecified

    • Array access is bounds checked.
    • Integer overflow is defined (panic or wrap-around).
  • Modern language features - as expressive and ergonomic as higher-level languages

    • Enums and pattern matching.
    • Generics.
    • No overhead FFI.
    • Zero-cost abstractions.
    • Great compiler errors.
    • Built-in dependency manager.
    • Built-in support for testing.
    • Excellent Language Server Protocol support.

Speaker Notes

This slide should take about 3 minutes.

Do not spend much time here. All of these points will be covered in more depth later.

Make sure to ask the class which languages they have experience with. Depending on the answer you can highlight different features of Rust:

  • Experience with C or C++: Rust eliminates a whole class of runtime errors via the borrow checker. You get performance like in C and C++, but you don’t have the memory unsafety issues. In addition, you get a modern language with constructs like pattern matching and built-in dependency management.

  • Experience with Java, Go, Python, JavaScript…: You get the same memory safety as in those languages, plus a similar high-level language feeling. In addition you get fast and predictable performance like C and C++ (no garbage collector) as well as access to low-level hardware (should you need it)

Sân chơi (Playground)

Sân chơi cho Rust (Rust Playground) cho phép chúng ta chạy thử những đoạn code ngắn viết bằng Rust, đặc biệt là các ví dụ và bài tập trong khoá học này. Hãy mở Rust Playground và chạy thử chương trình “hello-world”. Rust Playground đi kèm với một vài tính năng hữu ích:

  • Ở phần “Công cụ (Tools)”, sử dụng tuỳ chọnrustfmt để định dạng đoạn code của bạn theo cách “tiêu chuẩn”.

  • Rust có hai “chế độ” chính để tạo code: Gỡ lỗi (Debug) (thêm nhiều thông tin hỗ trợ trong quá trình chạy (runtime), ít tối ưu hơn) và Phát hành (Release) (ít thông tin hỗ trợ trong quá trình chạy (runtime), tối ưu hoá nhiều hơn). Các tuỳ chọn này có trong mục “Gỡ lỗi (Debug)” ở phía trên.

  • Nếu bạn quan tâm, bạn có thể sử dụng chế độ “ASM” ở trong phần “…” để xem đoạn code Assembly được tạo.

Speaker Notes

This slide should take about 2 minutes.

Hãy khuyến khích học viên sử dụng Rust Playground để thử nghiệm trong giờ giải lao. Khuyến khích học viên luôn giữ tab mở và liên tục thử nghiệm mọi thứ trong suốt phần còn lại của khoá học. Điều này đặc biệt hữu ích cho những học viên nâng cao muốn biết thêm về tối ưu hoá của Rust và những đoạn code Assembly được tạo ra.

Kiểu Dữ Liệu Và Giá Trị

Phân đoạn này sẽ kéo dài khoảng 40 phút, bao gồm:

SlideThời lượng
Hello, World5 minutes
Biến5 minutes
Giá trị5 minutes
Số Học3 minutes
Suy Luận Kiểu3 minutes
Thực Hành: Fibonacci15 minutes

Hello, World

Chúng ta hãy cùng bước vào chương trình Rust đơn giản nhất, một chương trình Hello World điển hình:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Những gì bạn thấy được:

  • Khái niệm hàm được giới thiệu với fn.
  • Các khối code được phân tách theo ngoặc nhọn y như C và C++.
  • Hàm main là điểm bắt đầu của cả chương trình.
  • Rust có các hygienic macros, ví dụ như println!.
  • Các chuỗi của Rust được mã hóa theo chuẩn UTF-8 và có thể chứa bất kỳ ký tự Unicode nào.

Speaker Notes

This slide should take about 5 minutes.

Slide này là nhằm cố gắng giúp các học viên thấy thoải mái với Rust code. Họ sẽ được thấy hàng tấn đoạn code tương tự trong bốn ngày tiếp theo nên ta có thể bắt đầu với những thứ nhỏ bé quen thuộc trước.

Các điểm chính:

  • Rust khá giống với các ngôn ngữ truyền thống như C/C++/Java, đó là điều bắt buộc. Nó cũng cố tránh sáng tạo lại những thứ đã có trừ khi thực sự cần thiết.

  • Rust là ngôn ngữ hiện đại với khả năng hỗ trợ đầy đủ các tính năng như là Unicode.

  • Rust sử dụng các macro trong các tình huống như khi bạn muốn có số lượng đối số thay đổi (không bao gồm nạp chồng hàm).

  • Macros được gọi là ‘hygienic’ khi chúng không vô tình bắt các định danh từ phạm vi mà chúng được sử dụng. Rust macros thực chất chỉ là hygienic từng phần.

  • Rust là ngôn ngữ đa mô hình. Ví dụ, nó có các tính năng hướng đối tượng mạnh mẽ, và dù cho nó không phải là một ngôn ngữ lập trình chức năng, nó vẫn bao gồm nhiều khái niệm lập trình hàm.

Biến

Rust cung cấp tính năng an toàn kiểu thông qua kiểu dữ liệu tĩnh. Việc liên kết biến được thực thi bằng let:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • Hãy thử loại bỏ ghi chú x = 20 để cho thấy các biến theo mặc định là bất biến. Thêm từ khóa mut để cho các biến được chấp nhận thay đổi.

  • Ở đây, từ khóa i32 chỉ kiểu dữ liệu của biến. Từ khóa này là bắt buộc cho việc biên dịch, nhưng trong nhiều trường hợp, suy luận kiểu (sẽ được thảo luận sau) cho phép lập trình viên bỏ qua nó.

Giá trị

Dưới đây là các kiểu dữ liệu được định nghĩa sẵn, và nguyên văn cú pháp cho các giá trị của từng loại.

Kiểu Dữ LiệuNguyên Văn Cú Pháp
Số nguyên có dấui8, i16, i32, i64, i128, isize-10, 0, 1_000, 123_i64
Số nguyên không dấuu8, u16, u32, u64, u128, usize0, 123, 10_u16
Số thực dấu phẩy độngf32, f643.14, -10.0e20, 2_f32
Giá trị vô hướng Unicodechar'a', 'α', '∞'
Booleanbooltrue, false

Các kiểu dữ liệu có độ lớn như sau:

  • iN, uN, và fN tương đương N bit,
  • isizeusize tương đương một pointer,
  • char tương đương 32 bit,
  • bool tương đương 8 bit.

Speaker Notes

This slide should take about 5 minutes.

Ngoài ra, còn có một vài cú pháp không được kể đến ở trên:

  • Tất cả dấu gạch dưới trong số đều có thể bỏ qua, chúng chỉ nhằm cho mục đích dễ đọc mã. Vậy nên, 1_000 có thể được viết thành 1000 (hoặc 10_00), và 123_i64 có thể được viết thành 123i64.

Số Học

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 3 minutes.

Đây là lần đầu tiên ta thấy một hàm mới ngoài hàm main, nhưng hàm này cũng có logic rõ ràng dễ hiểu: nó nhận ba số nguyên, và trả lại một số nguyên. Chi tiết về hàm sẽ được bao quát sau.

Số học trong Rust cũng rất giống với các ngôn ngữ khác, với tiền đề giống nhau.

Vậy còn hiên tượng tràn số nguyên thì sao? Trong C và C++, vấn đề tràn số nguyên có dấu thực ra lại không được xác định, và nó có thể xảy ra theo các cách khác nhau trên các nền tảng hoặc trình biên dịch khác nhau. Còn trong Rust, nó được xác định rõ ràng.

Để xem cách tràn số nguyên xảy ra, hãy thử đổi i32 sang i16. Việc này sẽ gây ra lỗi (đã được kiểm tra) trong bản debug build và được điều chỉnh lại kiểu phù hợp trong bản release build. Ngoài ra, còn có các tùy chọn số học khác như tính số tràn (overflowing), tính số bão hòa (saturating), và tính giữ phần dư (carrying). Chúng được truy cập bằng cú pháp riêng, ví dụ, (a * b).saturating_add(b * c).saturating_add(c * a).

Trên thực tế, trình biên dịch sẽ phát hiện ra tràn số nguyên trong các biểu thức hằng số, vì thế nên ta cần phải tách ra các hàm riêng ở trong ví dụ trên.

Suy Luận Kiểu

Rust sẽ kiểm tra xem biến được sử dụng như thế nào để xác định kiểu dữ liệu:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 3 minutes.

Slide này cho ta thấy cách mà trình biên dịch Rust đoán kiểu dữ liệu dựa trên các ràng buộc từ việc khai báo và sử dụng biến.

Một điều quan trọng cần lưu ý đó là việc khai báo các biến như trên không phải là một dạng khai báo động theo kiểu “any” mà có thể gán được bất kỳ dữ liệu nào. Mã máy được sinh ra từ cách khai báo này cũng giống như cách khai báo một kiểu dữ liệu cụ thể. Trình biên dịch sẽ lo liệu những việc này cho ta và giúp ta viết những đoạn mã ngắn gọn hơn.

Khi không có bất cứ ràng buộc nào cho khai báo kiểu số nguyên, Rust sẽ mặc định sử dụng i32. Đôi khi trong thông báo lỗi biến sẽ được hiển thị dưới dạng {integer}. Tương tự, với các số thực dấu phẩy động, Rust mặc định là f64.

fn main() { let x = 3.14; let y = 20; assert_eq!(x, y); // ERROR: no implementation for `{float} == {integer}` }

Thực Hành: Fibonacci

Hai số Fibonacci đầu tiên mang giá trị 1. Với n > 2, số Fibonacci thứ n sẽ được tính đệ quy bằng với tổng của số Fibonacci thứ n - 1 và n - 2.

Hãy viết một hàm fib(n) để tính số Fibonacci thứ n. Khi nào thì hàm này sẽ gây lỗi?

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Control flow (Điểu khiển luồng) căn bản

Phân đoạn này sẽ kéo dài khoảng 40 phút, bao gồm:

SlideThời lượng
Lệnh if4 minutes
Vòng lặp5 minutes
break và continue4 minutes
Blocks và scopes5 minutes
Hàm3 minutes
Macros2 minutes
Thực hành: Chuỗi Collatz15 minutes

Lệnh if

Học viên sử dụng lệnh if giống hệt như lệnh if ở những ngôn ngữ lập trình khác

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Ngoài ra, học viên có thể sử dụng if như một biểu thức. Biểu thức cuối cùng của mỗi khối lệnh sẽ trở thành giá trị của biểu thức if đó

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 4 minutes.

Bởi vì if là một biểu thức và phải có một kiểu dữ liệu nhất định, mỗi nhánh khối lệnh phải có cùng một kiểu dữ liểu. Hãy trình bày nếu học viên thêm dấu ; vào sau "nhỏ" ở ví dụ thứ hai

Khi if được sử dụng cho một biểu thức, biểu thức đó phải có dấu ; để tách biệt nó khỏi lệnh tiếp theo. Loại bỏ dấu ; sau println! để thấy được lỗi biên dịch

Vòng lặp

Có ba từ khóa cho vòng lặp trong Rust: while, loopfor`

while

Từ khóa while hoạt đông tương tự như những ngôn ngữ lập trình khác, thực hiện vòng lặp lại một khối lệnh trong khi điều kiện còn đúng

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

for

[Từ khóa for] lặp lại qua chuỗi dữ liệu hoặc các phần tử trong một tập hợp:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • Cơ chế hoạt động của vòng lặp for sử dụng một khái niệm gọi là “iterators” để xử việc lặp lại qua nhiều dạng dãy/tập hợp khác nhau. Iterators sẽ được nhắc tới chi tiết sau.
  • Lưu ý rằng vòng lặp for chỉ lặp tới 4. Trình bày cú pháp 1..=5 để có một dãy bao gồm

loop

Từ khóa loop là một vòng lặp bất tận, cho tới khi break.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

breakcontinue

Nếu học viên muốn bắt đầu vòng lặp tiếp theo thì hãy sử dụng continue.

Nếu học viên muốn kết thúc một vòng lặp sớm, sử dụng break. Đối với loop, từ khóa này có thể dùng một biểu thức tùy chọn trở thành giá trị của biểu thức loop

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Labels (nhãn)

Cả continuebreak có thể sử dụng một nhãn tùy chọn để được sử dụng để thoát ra khỏi những vòng lặp lồng nhau

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • Lưu ý rằng loop là kiểu vòng lặp duy nhất có thể trả về một giá trị. Đấy là vởi vì vòng lặp sẽ chắc chắn được nhập vào ít nhất một lần (không giống như vòng lặp whilefor)

Blocks và scopes

Blocks (khối lệnh)

Mỗi block trong Rust chứa một chuỗi những biểu thức, được bao bọc bởi dấu {}. Mỗi block có một giá trị và một kiểu dữ liệu tương ứng với biểu thức cuối cùng của block đó:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Nếu biểu thức cuối cùng kết thúc bằng dấu ;, giá trị trả về và kiểu dữ liệu sẽ là ().

Speaker Notes

This slide and its sub-slides should take about 5 minutes.
  • Giáo viên có thể trình bày rằng giá trị của block thay đổi bằng cách thay đổi dòng cuối cùng của block đó. Ví dụ, thêm vào hoặc loại bỏ dấu ; hoặc sử dụng lệnh return

Scopes và shadowing

Mỗi scope của một biến đều được giới hạn trong một block (khối lệnh).

Học viên có thể shadow (che khuất) các biến, bao gồm các biến ở ngoài phạm vi và các biến ở trong cùng một scope:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • Trình bày rằng mỗi phạm vi của một biến bị giới hạn bằng cách thêm một biến b vào block bên trong ở ví dụ trên, sau đó thử truy cập biến đó từ bên ngoài block.
  • Shadow khác với mutation (đột biến), bởi vì sau khi shadow tất cả các vị trí bộ nhớ của biến đó cùng đồng thời tồn tại. Tất cả đều tồn tại dưới cùng một tên, biến nào được sử dụng trong chương trình tùy thuộc vào việc học viên sử dụng chúng ở đâu trong mã.
  • Mỗi một shadowing variable (biến bóng) đều có thể có một kiểu dữ liệu khác nhau
  • Shadowing có thể khó hiểu ban đầu, nhưng hữu dụng trong việc giữ giá trị sau khi .unwrap().

Hàm

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 3 minutes.
  • Những tham số phải được chú thích sau bằng một kiểu dữ liệu (ngược lại với một số ngôn ngữ lập trình khác), rồi một giá trị trả về.
  • Biểu thức cuối cùng trong hàm (hay một khối) sẽ trở thành giá trị tả về. Đơn giản bằng cách bỏ qua dấu ; ở cuối biểu thức. Từ khóa return có thể sử dụng để trả về sớm, nhưng “giá trị trần (bare value)” là triết lí ngôn ngữ (idiomatic) khi đặt ở cuối hàm (tái cấu trúc gcd dể sử dụng return).
  • Một số hàm sẽ không có giá trị trở về, sẽ trả về ‘kiểu dữ liệu’, (). Trình biên dịch sẽ kết luận rằng kiểu dữ liệu sẽ là -> () nếu kiểu dữ liệu trả về được bỏ qua.
  • Overloading (nạp chồng) không được hỗ trợ – mỗi hàm chỉ có duy nhất một triển khai.
    • Luôn luôn lấy một số những tham số nhất định. Đối số mặc định không đưọc hỗ trợ. Macros có thể sử dụng cho hỗ trợ hàm đa tham số.
    • Luôn luôn chỉ lấy một nhóm kiểu tham số dữ liệu. Những kiểu dữ liệu này có thể là generic (tổng quát), và sẽ được đề cập tới ở phần sau.

Macros

Macros được triển khai qua mã Rust trong quá trình biên dịch, và có thể nhận nhiều những biến khác nhau. Macros được nhận diện bởi dấu ! ở sau cùng. Rust standard library (Thư viện Rust căn bản) bao gồm nhiều loại macros hữu dụng khác nhau.

  • println!(format, ..) in nội dung ra standard output, áp dụng format được miêu tả trong std::fmt.
  • format!(format, ..) hoạt động tương tự như println! nhưng trả về dữ liệu ở dạng string.
  • dbg!(expression) logs giá trị của biểu thức rồi trả về giá trị đó.
  • todo!() đánh dấu một phần của mã là not-yet-implemented (chưa được hoàn thiện). Khi được triển khai, đoạn mã đõ sẽ panic.
  • unreachable!() đánh dẫu một đoạn mã là unreachable (không thể tới). Khi được triển khai, đoạn mã đõ sẽ panic.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 2 minutes.

Bài học được rút ra trong chương này là những cú pháp thông dụng tiện nghi trên tồn tại, và các sử dụng chúng. Tại sao chúng được định nghĩa là macros, với cách những macros triền khai qua đều không quan trọng.

Khóa học này sẽ không đi qua về định nghĩ macros, nhưng những chương tiếp theo sẽ đề cập tới sử dụng derive macros.

Thực hành: Chuỗi Collatz

The Collatz Sequence is defined as follows, for an arbitrary n1 greater than zero:

  • If ni is 1, then the sequence terminates at ni.
  • If ni is even, then ni+1 = ni / 2.
  • If ni is odd, then ni+1 = 3 * ni + 1.

For example, beginning with n1 = 3:

  • 3 is odd, so n2 = 3 * 3 + 1 = 10;
  • 10 is even, so n3 = 10 / 2 = 5;
  • 5 is odd, so n4 = 3 * 5 + 1 = 16;
  • 16 is even, so n5 = 16 / 2 = 8;
  • 8 is even, so n6 = 8 / 2 = 4;
  • 4 is even, so n7 = 4 / 2 = 2;
  • 2 is even, so n8 = 1; and
  • chuỗi chấm dứt.

Viết một hàm để tính độ dài chủa chuỗi Collatz với một giá trị ban đầu n

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Welcome Back

Buổi học nên kéo dài khoảng 2 tiếng và 35 phút, bao gồm thời gian 10 phút nghỉ trưa. Nội dung buổi học bao gồm:

SegmentThời lượng
Tuples and Arrays35 minutes
References55 minutes
User-Defined Types50 minutes

Tuples and Arrays

Phần này sẽ kéo dài khoảng 35 phút, bao gồm:

SlideThời lượng
Arrays5 minutes
Tuples5 minutes
Array Iteration3 minutes
Patterns and Destructuring5 minutes
Exercise: Nested Arrays15 minutes

Arrays

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • A value of the array type [T; N] holds N (a compile-time constant) elements of the same type T. Note that the length of the array is part of its type, which means that [u8; 3] and [u8; 4] are considered two different types. Slices, which have a size determined at runtime, are covered later.

  • Try accessing an out-of-bounds array element. Array accesses are checked at runtime. Rust can usually optimize these checks away, and they can be avoided using unsafe Rust.

  • We can use literals to assign values to arrays.

  • The println! macro asks for the debug implementation with the ? format parameter: {} gives the default output, {:?} gives the debug output. Types such as integers and strings implement the default output, but arrays only implement the debug output. This means that we must use debug output here.

  • Adding #, eg {a:#?}, invokes a “pretty printing” format, which can be easier to read.

Tuples

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • Like arrays, tuples have a fixed length.

  • Tuples group together values of different types into a compound type.

  • Fields of a tuple can be accessed by the period and the index of the value, e.g. t.0, t.1.

  • The empty tuple () is referred to as the “unit type” and signifies absence of a return value, akin to void in other languages.

Array Iteration

The for statement supports iterating over arrays (but not tuples).

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 3 minutes.

This functionality uses the IntoIterator trait, but we haven’t covered that yet.

The assert_ne! macro is new here. There are also assert_eq! and assert! macros. These are always checked while, debug-only variants like debug_assert! compile to nothing in release builds.

Patterns and Destructuring

When working with tuples and other structured values it’s common to want to extract the inner values into local variables. This can be done manually by directly accessing the inner values:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

However, Rust also supports using pattern matching to destructure a larger value into its constituent parts:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • The patterns used here are “irrefutable”, meaning that the compiler can statically verify that the value on the right of = has the same structure as the pattern.
  • A variable name is an irrefutable pattern that always matches any value, hence why we can also use let to declare a single variable.
  • Rust also supports using patterns in conditionals, allowing for equality comparison and destructuring to happen at the same time. This form of pattern matching will be discussed in more detail later.
  • Edit the examples above to show the compiler error when the pattern doesn’t match the value being matched on.

Exercise: Nested Arrays

Arrays can contain other arrays:

#![allow(unused)] fn main() { let array = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]; }

What is the type of this variable?

Use an array such as the above to write a function transpose which will transpose a matrix (turn rows into columns):

2584567⎤8⎥9⎦transpose==1473⎤6⎥9⎦123

Copy the code below to https://play.rust-lang.org/ and implement the function. This function only operates on 3x3 matrices.

// TODO: remove this when you're done with your implementation. #![allow(unused_variables, dead_code)] fn transpose(matrix: [[i32; 3]; 3]) -> [[i32; 3]; 3] { unimplemented!() } #[test] fn test_transpose() { let matrix = [ [101, 102, 103], // [201, 202, 203], [301, 302, 303], ]; let transposed = transpose(matrix); assert_eq!( transposed, [ [101, 201, 301], // [102, 202, 302], [103, 203, 303], ] ); } fn main() { let matrix = [ [101, 102, 103], // <-- the comment makes rustfmt add a newline [201, 202, 203], [301, 302, 303], ]; println!("matrix: {:#?}", matrix); let transposed = transpose(matrix); println!("transposed: {:#?}", transposed); }

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

References

Phần này sẽ kéo dài khoảng 55 phút, bao gồm:

SlideThời lượng
Shared References10 minutes
Exclusive References10 minutes
Slices: &[T]10 minutes
Strings10 minutes
Exercise: Geometry15 minutes

Shared References

A reference provides a way to access another value without taking responsibility for the value, and is also called “borrowing”. Shared references are read-only, and the referenced data cannot change.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

A shared reference to a type T has type &T. A reference value is made with the & operator. The * operator “dereferences” a reference, yielding its value.

Rust will statically forbid dangling references:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.
  • A reference is said to “borrow” the value it refers to, and this is a good model for students not familiar with pointers: code can use the reference to access the value, but is still “owned” by the original variable. The course will get into more detail on ownership in day 3.

  • References are implemented as pointers, and a key advantage is that they can be much smaller than the thing they point to. Students familiar with C or C++ will recognize references as pointers. Later parts of the course will cover how Rust prevents the memory-safety bugs that come from using raw pointers.

  • Rust does not automatically create references for you - the & is always required.

  • Rust will auto-dereference in some cases, in particular when invoking methods (try r.is_ascii()). There is no need for an -> operator like in C++.

  • In this example, r is mutable so that it can be reassigned (r = &b). Note that this re-binds r, so that it refers to something else. This is different from C++, where assignment to a reference changes the referenced value.

  • A shared reference does not allow modifying the value it refers to, even if that value was mutable. Try *r = 'X'.

  • Rust is tracking the lifetimes of all references to ensure they live long enough. Dangling references cannot occur in safe Rust. x_axis would return a reference to point, but point will be deallocated when the function returns, so this will not compile.

  • We will talk more about borrowing when we get to ownership.

Exclusive References

Exclusive references, also known as mutable references, allow changing the value they refer to. They have type &mut T.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.

Các điểm chính:

  • “Exclusive” means that only this reference can be used to access the value. No other references (shared or exclusive) can exist at the same time, and the referenced value cannot be accessed while the exclusive reference exists. Try making an &point.0 or changing point.0 while x_coord is alive.

  • Be sure to note the difference between let mut x_coord: &i32 and let x_coord: &mut i32. The first one represents a shared reference which can be bound to different values, while the second represents an exclusive reference to a mutable value.

Slices

A slice gives you a view into a larger collection:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • Slices borrow data from the sliced type.
  • Question: What happens if you modify a[3] right before printing s?

Speaker Notes

This slide should take about 10 minutes.
  • We create a slice by borrowing a and specifying the starting and ending indexes in brackets.

  • If the slice starts at index 0, Rust’s range syntax allows us to drop the starting index, meaning that &a[0..a.len()] and &a[..a.len()] are identical.

  • The same is true for the last index, so &a[2..a.len()] and &a[2..] are identical.

  • To easily create a slice of the full array, we can therefore use &a[..].

  • s is a reference to a slice of i32s. Notice that the type of s (&[i32]) no longer mentions the array length. This allows us to perform computation on slices of different sizes.

  • Slices always borrow from another object. In this example, a has to remain ‘alive’ (in scope) for at least as long as our slice.

  • The question about modifying a[3] can spark an interesting discussion, but the answer is that for memory safety reasons you cannot do it through a at this point in the execution, but you can read the data from both a and s safely. It works before you created the slice, and again after the println, when the slice is no longer used.

Strings

We can now understand the two string types in Rust:

  • &str is a slice of UTF-8 encoded bytes, similar to &[u8].
  • String is an owned buffer of UTF-8 encoded bytes, similar to Vec<T>.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.
  • &str introduces a string slice, which is an immutable reference to UTF-8 encoded string data stored in a block of memory. String literals ("Hello"), are stored in the program’s binary.

  • Rust’s String type is a wrapper around a vector of bytes. As with a Vec<T>, it is owned.

  • As with many other types String::from() creates a string from a string literal; String::new() creates a new empty string, to which string data can be added using the push() and push_str() methods.

  • The format!() macro is a convenient way to generate an owned string from dynamic values. It accepts the same format specification as println!().

  • You can borrow &str slices from String via & and optionally range selection. If you select a byte range that is not aligned to character boundaries, the expression will panic. The chars iterator iterates over characters and is preferred over trying to get character boundaries right.

  • For C++ programmers: think of &str as std::string_view from C++, but the one that always points to a valid string in memory. Rust String is a rough equivalent of std::string from C++ (main difference: it can only contain UTF-8 encoded bytes and will never use a small-string optimization).

  • Byte strings literals allow you to create a &[u8] value directly:

    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • Raw strings allow you to create a &str value with escapes disabled: r"\n" == "\\n". You can embed double-quotes by using an equal amount of # on either side of the quotes:

    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Exercise: Geometry

We will create a few utility functions for 3-dimensional geometry, representing a point as [f64;3]. It is up to you to determine the function signatures.

// Calculate the magnitude of a vector by summing the squares of its coordinates // and taking the square root. Use the `sqrt()` method to calculate the square // root, like `v.sqrt()`. fn magnitude(...) -> f64 { todo!() } // Normalize a vector by calculating its magnitude and dividing all of its // coordinates by that magnitude. fn normalize(...) { todo!() } // Use the following `main` to test your work. fn main() { println!("Magnitude of a unit vector: {}", magnitude(&[0.0, 1.0, 0.0])); let mut v = [1.0, 2.0, 9.0]; println!("Magnitude of {v:?}: {}", magnitude(&v)); normalize(&mut v); println!("Magnitude of {v:?} after normalization: {}", magnitude(&v)); }

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

User-Defined Types

Phần này sẽ kéo dài khoảng 50 phút, bao gồm:

SlideThời lượng
Named Structs10 minutes
Tuple Structs10 minutes
Enums5 minutes
Static5 minutes
Type Aliases2 minutes
Exercise: Elevator Events15 minutes

Named Structs

Like C and C++, Rust has support for custom structs:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.

Key Points:

  • Structs work like in C or C++.
    • Like in C++, and unlike in C, no typedef is needed to define a type.
    • Unlike in C++, there is no inheritance between structs.
  • This may be a good time to let people know there are different types of structs.
    • Zero-sized structs (e.g. struct Foo;) might be used when implementing a trait on some type but don’t have any data that you want to store in the value itself.
    • The next slide will introduce Tuple structs, used when the field names are not important.
  • If you already have variables with the right names, then you can create the struct using a shorthand.
  • The syntax ..avery allows us to copy the majority of the fields from the old struct without having to explicitly type it all out. It must always be the last element.

Tuple Structs

If the field names are unimportant, you can use a tuple struct:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

This is often used for single-field wrappers (called newtypes):

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.
  • Newtypes are a great way to encode additional information about the value in a primitive type, for example:
    • The number is measured in some units: Newtons in the example above.
    • The value passed some validation when it was created, so you no longer have to validate it again at every use: PhoneNumber(String) or OddNumber(u32).
  • Demonstrate how to add a f64 value to a Newtons type by accessing the single field in the newtype.
    • Rust generally doesn’t like inexplicit things, like automatic unwrapping or for instance using booleans as integers.
    • Operator overloading is discussed on Day 3 (generics).
  • The example is a subtle reference to the Mars Climate Orbiter failure.

Enums

The enum keyword allows the creation of a type which has a few different variants:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.

Key Points:

  • Enumerations allow you to collect a set of values under one type.
  • Direction is a type with variants. There are two values of Direction: Direction::Left and Direction::Right.
  • PlayerMove is a type with three variants. In addition to the payloads, Rust will store a discriminant so that it knows at runtime which variant is in a PlayerMove value.
  • This might be a good time to compare structs and enums:
    • In both, you can have a simple version without fields (unit struct) or one with different types of fields (variant payloads).
    • You could even implement the different variants of an enum with separate structs but then they wouldn’t be the same type as they would if they were all defined in an enum.
  • Rust uses minimal space to store the discriminant.
    • If necessary, it stores an integer of the smallest required size

    • If the allowed variant values do not cover all bit patterns, it will use invalid bit patterns to encode the discriminant (the “niche optimization”). For example, Option<&u8> stores either a pointer to an integer or NULL for the None variant.

    • You can control the discriminant if needed (e.g., for compatibility with C):

      XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

      Without repr, the discriminant type takes 2 bytes, because 10001 fits 2 bytes.

More to Explore

Rust has several optimizations it can employ to make enums take up less space.

  • Null pointer optimization: For some types, Rust guarantees that size_of::<T>() equals size_of::<Option<T>>().

    Example code if you want to show how the bitwise representation may look like in practice. It’s important to note that the compiler provides no guarantees regarding this representation, therefore this is totally unsafe.

    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

static

Static variables will live during the whole execution of the program, and therefore will not move:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

As noted in the Rust RFC Book, these are not inlined upon use and have an actual associated memory location. This is useful for unsafe and embedded code, and the variable lives through the entirety of the program execution. When a globally-scoped value does not have a reason to need object identity, const is generally preferred.

Speaker Notes

This slide should take about 5 minutes.
  • static is similar to mutable global variables in C++.
  • static provides object identity: an address in memory and state as required by types with interior mutability such as Mutex<T>.

More to Explore

Because static variables are accessible from any thread, they must be Sync. Interior mutability is possible through a Mutex, atomic or similar.

Thread-local data can be created with the macro std::thread_local.

const

Constants are evaluated at compile time and their values are inlined wherever they are used:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

According to the Rust RFC Book these are inlined upon use.

Only functions marked const can be called at compile time to generate const values. const functions can however be called at runtime.

Speaker Notes

  • Mention that const behaves semantically similar to C++’s constexpr
  • It isn’t super common that one would need a runtime evaluated constant, but it is helpful and safer than using a static.

Type Aliases

A type alias creates a name for another type. The two types can be used interchangeably.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 2 minutes.

C programmers will recognize this as similar to a typedef.

Exercise: Elevator Events

We will create a data structure to represent an event in an elevator control system. It is up to you to define the types and functions to construct various events. Use #[derive(Debug)] to allow the types to be formatted with {:?}.

This exercise only requires creating and populating data structures so that main runs without errors. The next part of the course will cover getting data out of these structures.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Welcome to Day 2

Now that we have seen a fair amount of Rust, today will focus on Rust’s type system:

  • Pattern matching: extracting data from structures.
  • Methods: associating functions with types.
  • Traits: behaviors shared by multiple types.
  • Generics: parameterizing types on other types.
  • Standard library types and traits: a tour of Rust’s rich standard library.

Mục lục

Buổi học nên kéo dài khoảng 2 tiếng và 10 phút, bao gồm thời gian 10 phút nghỉ trưa. Nội dung buổi học bao gồm:

SegmentThời lượng
Lời Chào Mừng3 minutes
Pattern Matching1 hour
Methods and Traits50 minutes

Pattern Matching

Phần này sẽ kéo dài khoảng 1 giờ, bao gồm:

SlideThời lượng
Matching Values10 minutes
Destructuring Structs4 minutes
Destructuring Enums4 minutes
Let Control Flow10 minutes
Exercise: Expression Evaluation30 minutes

Matching Values

The match keyword lets you match a value against one or more patterns. The comparisons are done from top to bottom and the first match wins.

The patterns can be simple values, similarly to switch in C and C++:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The _ pattern is a wildcard pattern which matches any value. The expressions must be exhaustive, meaning that it covers every possibility, so _ is often used as the final catch-all case.

Match can be used as an expression. Just like if, each match arm must have the same type. The type is the last expression of the block, if any. In the example above, the type is ().

A variable in the pattern (key in this example) will create a binding that can be used within the match arm.

A match guard causes the arm to match only if the condition is true.

Speaker Notes

This slide should take about 10 minutes.

Key Points:

  • You might point out how some specific characters are being used when in a pattern

    • | as an or
    • .. can expand as much as it needs to be
    • 1..=5 represents an inclusive range
    • _ is a wild card
  • Match guards as a separate syntax feature are important and necessary when we wish to concisely express more complex ideas than patterns alone would allow.

  • They are not the same as separate if expression inside of the match arm. An if expression inside of the branch block (after =>) happens after the match arm is selected. Failing the if condition inside of that block won’t result in other arms of the original match expression being considered.

  • The condition defined in the guard applies to every expression in a pattern with an |.

Structs

Like tuples, Struct can also be destructured by matching:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 4 minutes.
  • Change the literal values in foo to match with the other patterns.
  • Add a new field to Foo and make changes to the pattern as needed.
  • The distinction between a capture and a constant expression can be hard to spot. Try changing the 2 in the second arm to a variable, and see that it subtly doesn’t work. Change it to a const and see it working again.

Enums

Like tuples, enums can also be destructured by matching:

Patterns can also be used to bind variables to parts of your values. This is how you inspect the structure of your types. Let us start with a simple enum type:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Here we have used the arms to destructure the Result value. In the first arm, half is bound to the value inside the Ok variant. In the second arm, msg is bound to the error message.

Speaker Notes

This slide should take about 4 minutes.
  • The if/else expression is returning an enum that is later unpacked with a match.
  • You can try adding a third variant to the enum definition and displaying the errors when running the code. Point out the places where your code is now inexhaustive and how the compiler tries to give you hints.
  • The values in the enum variants can only be accessed after being pattern matched.
  • Demonstrate what happens when the search is inexhaustive. Note the advantage the Rust compiler provides by confirming when all cases are handled.
  • Save the result of divide_in_two in the result variable and match it in a loop. That won’t compile because msg is consumed when matched. To fix it, match &result instead of result. That will make msg a reference so it won’t be consumed. This “match ergonomics” appeared in Rust 2018. If you want to support older Rust, replace msg with ref msg in the pattern.

Let Control Flow

Rust has a few control flow constructs which differ from other languages. They are used for pattern matching:

  • if let expressions
  • while let expressions
  • match expressions

if let expressions

The if let expression lets you execute different code depending on whether a value matches a pattern:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

let else expressions

For the common case of matching a pattern and returning from the function, use let else. The “else” case must diverge (return, break, or panic - anything but falling off the end of the block).

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Like with if let, there is a while let variant which repeatedly tests a value against a pattern:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Here String::pop returns Some(c) until the string is empty, after which it will return None. The while let lets us keep iterating through all items.

Speaker Notes

This slide should take about 10 minutes.

if-let

  • Unlike match, if let does not have to cover all branches. This can make it more concise than match.
  • A common usage is handling Some values when working with Option.
  • Unlike match, if let does not support guard clauses for pattern matching.

let-else

if-lets can pile up, as shown. The let-else construct supports flattening this nested code. Rewrite the awkward version for students, so they can see the transformation.

The rewritten version is:

#![allow(unused)] fn main() { fn hex_or_die_trying(maybe_string: Option<String>) -> Result<u32, String> { let Some(s) = maybe_string else { return Err(String::from("got None")); }; let Some(first_byte_char) = s.chars().next() else { return Err(String::from("got empty string")); }; let Some(digit) = first_byte_char.to_digit(16) else { return Err(String::from("not a hex digit")); }; return Ok(digit); } }

while-let

  • Point out that the while let loop will keep going as long as the value matches the pattern.
  • You could rewrite the while let loop as an infinite loop with an if statement that breaks when there is no value to unwrap for name.pop(). The while let provides syntactic sugar for the above scenario.

Exercise: Expression Evaluation

Let’s write a simple recursive evaluator for arithmetic expressions.

The Box type here is a smart pointer, and will be covered in detail later in the course. An expression can be “boxed” with Box::new as seen in the tests. To evaluate a boxed expression, use the deref operator (*) to “unbox” it: eval(*boxed_expr).

Some expressions cannot be evaluated and will return an error. The standard Result<Value, String> type is an enum that represents either a successful value (Ok(Value)) or an error (Err(String)). We will cover this type in detail later.

Copy and paste the code into the Rust playground, and begin implementing eval. The final product should pass the tests. It may be helpful to use todo!() and get the tests to pass one-by-one. You can also skip a test temporarily with #[ignore]:

#[test] #[ignore] fn test_value() { .. }

If you finish early, try writing a test that results in division by zero or integer overflow. How could you handle this with Result instead of a panic?

#![allow(unused)] fn main() { /// An operation to perform on two subexpressions. #[derive(Debug)] enum Operation { Add, Sub, Mul, Div, } /// An expression, in tree form. #[derive(Debug)] enum Expression { /// An operation on two subexpressions. Op { op: Operation, left: Box<Expression>, right: Box<Expression> }, /// A literal value Value(i64), } fn eval(e: Expression) -> Result<i64, String> { todo!() } #[test] fn test_value() { assert_eq!(eval(Expression::Value(19)), Ok(19)); } #[test] fn test_sum() { assert_eq!( eval(Expression::Op { op: Operation::Add, left: Box::new(Expression::Value(10)), right: Box::new(Expression::Value(20)), }), Ok(30) ); } #[test] fn test_recursion() { let term1 = Expression::Op { op: Operation::Mul, left: Box::new(Expression::Value(10)), right: Box::new(Expression::Value(9)), }; let term2 = Expression::Op { op: Operation::Mul, left: Box::new(Expression::Op { op: Operation::Sub, left: Box::new(Expression::Value(3)), right: Box::new(Expression::Value(4)), }), right: Box::new(Expression::Value(5)), }; assert_eq!( eval(Expression::Op { op: Operation::Add, left: Box::new(term1), right: Box::new(term2), }), Ok(85) ); } #[test] fn test_error() { assert_eq!( eval(Expression::Op { op: Operation::Div, left: Box::new(Expression::Value(99)), right: Box::new(Expression::Value(0)), }), Err(String::from("division by zero")) ); } }

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Methods and Traits

Phần này sẽ kéo dài khoảng 50 phút, bao gồm:

SlideThời lượng
Methods10 minutes
Traits15 minutes
Deriving3 minutes
Exercise: Generic Logger20 minutes

Methods

Rust allows you to associate functions with your new types. You do this with an impl block:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The self arguments specify the “receiver” - the object the method acts on. There are several common receivers for a method:

  • &self: borrows the object from the caller using a shared and immutable reference. The object can be used again afterwards.
  • &mut self: borrows the object from the caller using a unique and mutable reference. The object can be used again afterwards.
  • self: takes ownership of the object and moves it away from the caller. The method becomes the owner of the object. The object will be dropped (deallocated) when the method returns, unless its ownership is explicitly transmitted. Complete ownership does not automatically mean mutability.
  • mut self: same as above, but the method can mutate the object.
  • No receiver: this becomes a static method on the struct. Typically used to create constructors which are called new by convention.

Speaker Notes

This slide should take about 8 minutes.

Key Points:

  • It can be helpful to introduce methods by comparing them to functions.
    • Methods are called on an instance of a type (such as a struct or enum), the first parameter represents the instance as self.
    • Developers may choose to use methods to take advantage of method receiver syntax and to help keep them more organized. By using methods we can keep all the implementation code in one predictable place.
  • Point out the use of the keyword self, a method receiver.
    • Show that it is an abbreviated term for self: Self and perhaps show how the struct name could also be used.
    • Explain that Self is a type alias for the type the impl block is in and can be used elsewhere in the block.
    • Note how self is used like other structs and dot notation can be used to refer to individual fields.
    • This might be a good time to demonstrate how the &self differs from self by trying to run finish twice.
    • Beyond variants on self, there are also special wrapper types allowed to be receiver types, such as Box<Self>.

Traits

Rust lets you abstract over types with traits. They’re similar to interfaces:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide and its sub-slides should take about 15 minutes.
  • A trait defines a number of methods that types must have in order to implement the trait.

  • In the “Generics” segment, next, we will see how to build functionality that is generic over all types implementing a trait.

Implementing Traits

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • To implement Trait for Type, you use an impl Trait for Type { .. } block.

  • Unlike Go interfaces, just having matching methods is not enough: a Cat type with a talk() method would not automatically satisfy Pet unless it is in an impl Pet block.

  • Traits may provide default implementations of some methods. Default implementations can rely on all the methods of the trait. In this case, greet is provided, and relies on talk.

Supertraits

A trait can require that types implementing it also implement other traits, called supertraits. Here, any type implementing Pet must implement Animal.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This is sometimes called “trait inheritance” but students should not expect this to behave like OO inheritance. It just specifies an additional requirement on implementations of a trait.

Speaker Notes

Associated Types

Associated types are placeholder types which are supplied by the trait implementation.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • Associated types are sometimes also called “output types”. The key observation is that the implementer, not the caller, chooses this type.

  • Many standard library traits have associated types, including arithmetic operators and Iterator.

Deriving

Supported traits can be automatically implemented for your custom types, as follows:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 3 minutes.

Derivation is implemented with macros, and many crates provide useful derive macros to add useful functionality. For example, serde can derive serialization support for a struct using #[derive(Serialize)].

Exercise: Logger Trait

Let’s design a simple logging utility, using a trait Logger with a log method. Code which might log its progress can then take an &impl Logger. In testing, this might put messages in the test logfile, while in a production build it would send messages to a log server.

However, the StderrLogger given below logs all messages, regardless of verbosity. Your task is to write a VerbosityFilter type that will ignore messages above a maximum verbosity.

This is a common pattern: a struct wrapping a trait implementation and implementing that same trait, adding behavior in the process. What other kinds of wrappers might be useful in a logging utility?

use std::fmt::Display; pub trait Logger { /// Log a message at the given verbosity level. fn log(&self, verbosity: u8, message: impl Display); } struct StderrLogger; impl Logger for StderrLogger { fn log(&self, verbosity: u8, message: impl Display) { eprintln!("verbosity={verbosity}: {message}"); } } fn do_things(logger: &impl Logger) { logger.log(5, "FYI"); logger.log(2, "Uhoh"); } // TODO: Define and implement `VerbosityFilter`. fn main() { let l = VerbosityFilter { max_verbosity: 3, inner: StderrLogger }; do_things(&l); }

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Welcome Back

Buổi học nên kéo dài khoảng 4 tiếng, bao gồm thời gian 10 phút nghỉ trưa. Nội dung buổi học bao gồm:

SegmentThời lượng
Generics40 minutes
Standard Library Types1 hour and 20 minutes
Standard Library Traits1 hour and 40 minutes

Generics

Phân đoạn này sẽ kéo dài khoảng 40 phút, bao gồm:

SlideThời lượng
Generic Functions5 minutes
Generic Data Types10 minutes
Trait Bounds10 minutes
impl Trait5 minutes
Exercise: Generic min10 minutes

Generic Functions

Rust supports generics, which lets you abstract algorithms or data structures (such as sorting or a binary tree) over the types used or stored.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • Rust infers a type for T based on the types of the arguments and return value.

  • This is similar to C++ templates, but Rust partially compiles the generic function immediately, so that function must be valid for all types matching the constraints. For example, try modifying pick to return even + odd if n == 0. Even if only the pick instantiation with integers is used, Rust still considers it invalid. C++ would let you do this.

  • Generic code is turned into non-generic code based on the call sites. This is a zero-cost abstraction: you get exactly the same result as if you had hand-coded the data structures without the abstraction.

Generic Data Types

You can use generics to abstract over the concrete field type:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.
  • Q: Why T is specified twice in impl<T> Point<T> {}? Isn’t that redundant?

    • This is because it is a generic implementation section for generic type. They are independently generic.
    • It means these methods are defined for any T.
    • It is possible to write impl Point<u32> { .. }.
      • Point is still generic and you can use Point<f64>, but methods in this block will only be available for Point<u32>.
  • Try declaring a new variable let p = Point { x: 5, y: 10.0 };. Update the code to allow points that have elements of different types, by using two type variables, e.g., T and U.

Generic Traits

Traits can also be generic, just like types and functions. A trait’s parameters get concrete types when it is used.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • The From trait will be covered later in the course, but its definition in the std docs is simple.

  • Implementations of the trait do not need to cover all possible type parameters. Here, Foo::From("hello") would not compile because there is no From<&str> implementation for Foo.

  • Generic traits take types as “input”, while associated types are a kind of “output” type. A trait can have multiple implementations for different input types.

  • In fact, Rust requires that at most one implementation of a trait match for any type T. Unlike some other languages, Rust has no heuristic for choosing the “most specific” match. There is work on adding this support, called specialization.

Trait Bounds

When working with generics, you often want to require the types to implement some trait, so that you can call this trait’s methods.

You can do this with T: Trait or impl Trait:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 8 minutes.
  • Try making a NonClonable and passing it to duplicate.

  • When multiple traits are necessary, use + to join them.

  • Show a where clause, students will encounter it when reading code.

    fn duplicate<T>(a: T) -> (T, T) where T: Clone, { (a.clone(), a.clone()) }
    • It declutters the function signature if you have many parameters.
    • It has additional features making it more powerful.
      • If someone asks, the extra feature is that the type on the left of “:” can be arbitrary, like Option<T>.
  • Note that Rust does not (yet) support specialization. For example, given the original duplicate, it is invalid to add a specialized duplicate(a: u32).

impl Trait

Similar to trait bounds, an impl Trait syntax can be used in function arguments and return values:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.

impl Trait allows you to work with types which you cannot name. The meaning of impl Trait is a bit different in the different positions.

  • For a parameter, impl Trait is like an anonymous generic parameter with a trait bound.

  • For a return type, it means that the return type is some concrete type that implements the trait, without naming the type. This can be useful when you don’t want to expose the concrete type in a public API.

    Inference is hard in return position. A function returning impl Foo picks the concrete type it returns, without writing it out in the source. A function returning a generic type like collect<B>() -> B can return any type satisfying B, and the caller may need to choose one, such as with let x: Vec<_> = foo.collect() or with the turbofish, foo.collect::<Vec<_>>().

What is the type of debuggable? Try let debuggable: () = .. to see what the error message shows.

Exercise: Generic min

In this short exercise, you will implement a generic min function that determines the minimum of two values, using the Ord trait.

use std::cmp::Ordering; // TODO: implement the `min` function used in `main`. fn main() { assert_eq!(min(0, 10), 0); assert_eq!(min(500, 123), 123); assert_eq!(min('a', 'z'), 'a'); assert_eq!(min('7', '1'), '1'); assert_eq!(min("hello", "goodbye"), "goodbye"); assert_eq!(min("bat", "armadillo"), "armadillo"); }

Speaker Notes

This slide and its sub-slides should take about 10 minutes.

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Standard Library Types

Phần này sẽ kéo dài khoảng 1 giờ và 20 phút, bao gồm:

SlideThời lượng
Standard Library3 minutes
Documentation5 minutes
Option10 minutes
Result10 minutes
String10 minutes
Vec10 minutes
HashMap10 minutes
Exercise: Counter20 minutes

Speaker Notes

Hãy dành thời gian xem lại tài liệu cho từng trang trong phần này, đặc biệt là những hàm phổ biến.

Standard Library

Đi kèm với Rust là một thư viện chuẩn tổng hợp các kiểu dữ liệu thông dụng.Nhờ đó, hai thư viện có thể hoạt động cùng nhau một cách mượt mà vì cả hai đều sử dụng cùng một kiểu String của thư viện chuẩn.

Trong thực tế, thư viện chuẩn của Rust chứa nhiều tầng riêng biệt: core, allocstd.

  • core bao gồm các kiểu dữ liệu và hàm cơ bản nhất không phụ thuộc vào libc, bộ cấp phát bộ nhớ hoặc thậm chí là hệ điều hành.
  • alloc bao gồm các kiểu dữ liệu yêu cầu phải được cấp phát trên bộ nhớ heap, như Vec, BoxArc.
  • Phần mềm nhúng viết bằng Rust thường chỉ sử dụng core, và đôi khi alloc.

Documentation

Rust đi kèm với một bộ tài liệu rất chi tiết. Ví dụ:

  • Tất cả chi tiết về vòng lặp.
  • Các kiểu dữ liệu cơ bản như u8.
  • Các kiểu dữ liệu thuộc về thư viện chuẩn như Option hoặc BinaryHeap.

Ngoài ra, bạn cũng có thể tự viết tài liệu cho code của bản thân:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Nội dung tài liệu sẽ được xử lý bằng Markdown. Tất cả các thư viện Rust đã được publish đều sẽ tự động được công cụ rustdoc viết tài liệu, và được lưu trữ tại docs.rs. Tất cả thành phần công khai trong một API nên được viết tài liệu theo phương pháp này.

Để viết tài liệu cho một thành phần từ bên trong thành phần đó (như bên trong một module), hãy sử dụng //! hoặc /*! .. */. Cú pháp trên thường được gọi là “inner doc comments”:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.

Option

Chúng ta đã thấy một số cách dùng Option<T>. Kiểu dữ liệu này hoặc lưu trữ một giá trị kiểu T hoặc không lưu trữ gì cả. Ví dụ, hàm String::find trả về một Option<usize>.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.
  • Option được sử dụng rộng rãi ở nhiều nơi, không chỉ trong thư viện chuẩn.
  • unwrap sẽ trả về giá trị của Option hoặc panic. expect hoạt động tương tự nhưng có thể truyền thêm một thông báo lỗi.
    • Chương trình có thể panic khi gặp phải giá trị None, nhưng bạn không thể “vô tình” quên kiểm tra None.
    • Thông thường, ta có thể thoải mái sử dụng unwrap/expect khi thử nghiệm một ý tưởng mới, nhưng code production thường xử lý None một cách an toàn hơn.
  • Option<T> được tối ưu để chiếm bằng bộ nhớ như Ttrong đa số trường hợp.

Result

Result hoạt động tương tự như Option, nhưng được sử dụng để biểu diễn kết quả thành công hoặc thất bại của một tác vụ. Tương tự như Res trong phần thực hành về biểu thức, Result cũng có hai biến thể khác nhau cho mỗi trường hợp thành công và thất bại, nhưng được tổng quát hóa: Result<T, E> trong đó T được sử dụng trong trường hợp OkE xuất hiện trong trường hợp Err.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.
  • Tương tự với Option, giá trị trả về trong trường hợp thành công được đặt bên trong Result, buộc người dùng phải trích xuất giá trị một cách rõ ràng, đảm bảo mọi lỗi đều được kiểm tra. Trong trường hợp mà lỗi chắc chắn không thể xảy ra, unwrap() hoặc expect() có thể được gọi, bày tỏ rõ ràng ý định của người viết là lỗi không thể xảy ra.
  • Khuyến khích học viên đọc tài liệu về Result. Không nhất thiết phải đọc trong khóa học, nhưng đáng đề cập đến. Tài liệu này chứa nhiều hàm khá tiện lợi, hỗ trợ việc lập trình theo functional-style programming.
  • Result là kiểu dữ liệu chuẩn để xử lý lỗi như chúng ta sẽ thấy trong ngày 4.

String

String là kiểu dữ liệu chuẩn sử dụng để lưu trữ chuỗi UTF-8 trên bộ nhớ heap, có thể được mở rộng:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

String implement Deref<Target = str>, có nghĩa tất cả các hàm của str đều có thể được gọi trên một biến kiểu String.

Speaker Notes

This slide should take about 10 minutes.
  • String::new trả về một chuỗi rỗng. Khi dữ liệu cần đẩy vào chuỗi đã được xác định trước, hãy sử dụng String::with_capacity.
  • String::len trả về kích thước của String theo byte (có thể khác với số ký tự trong chuỗi).
  • String::chars trả về một con trỏ chạy dọc theo các ký tự của chuỗi. Lưu ý rằng một char có thể khác với một “ký tự” thông thường bởi vì một khái niệm gọi là grapheme clusters.
  • Khi nói đến chuỗi, ta có thể sử dụng &str hoặc String đều được.
  • Khi một kiểu dữ liệu implement Deref<Target = T>, trình biên dịch sẽ cho phép gọi các hàm thuộc về T trên kiểu dữ liệu đó.
    • Chúng ta chưa đi sâu vào Deref trait, việc đề cập đến Deref ở đây chủ yếu là để giải thích cấu trúc của mục lục tài liệu.
    • String implement Deref<Target = str>, nên các hàm của str đều có thể được gọi trên các biến kiểu String.
    • So sánh let s3 = s1.deref();let s3 = &*s1;. Cái nào dễ hiểu hơn?
  • String được được viết dựa trên một vector gồm nhiều byte, nên những hàm gọi được trên vector cũng có thể được gọi trên String, nhưng đi kèm với một số điều kiện để đảm bảo các tính chất của chuỗi.
  • So sánh các cách truy cập phần tử trong một biến kiểu String:
    • Truy cập một ký tự bằng cách sử dụng s3.chars().nth(i).unwrap() với i nằm trong phạm vi của chuỗi, và với i nằm ngoài phạm vi của chuỗi.
    • Truy cập một xâu con bằng cách sử dụng s3[0..4], với phạm vi của xâu con con giao thoa với ranh giới của các ký tự cận biên, hoặc với các ký tự cận biên hoàn toàn nằm bên trong xâu con. Ví dụ, truy cập s[5..7] với s = “Xin chào”`.
  • Nhiều kiểu dữ liệu có thể được chuyển đổi thành chuỗi bằng phương thức to_string. Trait này đã được implement sẵn cho tất cả các kiểu dữ liệu implement Display, nên bất cứ giá trị nào có thể được định dạng cũng có thể được chuyển thành chuỗi.

Vec

Vec là kiểu dữ liệu chuẩn dạng mảng có thể tự thay đổi kích thước:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Vec implement Deref<Target = [T]>, nên tất cả các hàm của slice đều có thể được gọi trên một biến kiểu Vec.

Speaker Notes

This slide should take about 10 minutes.
  • Tương tự StringHashMap, Vec là một kiểu dữ liệu sử dụng để lưu trữ nhiều phần tử. Dữ liệu được lưu trữ trên bộ nhớ heap, nên không cần phải biết trước kích thước của dữ liệu tại thời điểm compile. Kích thước của Vec có thể thay đổi tại trong quá trình chạy.
  • Vec<T> cũng là một kiểu dữ liệu generic, nhưng trong đa số trường hợp ta không cần phải chỉ rõ ra T. Hệ thống nội suy kiểu dữ liệu của Rust sẽ tự động xác định T trong lần đầu tiên gọi hàm push.
  • Ta có thể sử dụng macro vec![...] thay vì gọi hàm Vec::new() để khởi tạo một Vec. vec![...] còn có thể được dùng để khởi tạo vector với một số phần tử cho trước.
  • Để truy cập phần tử của một vector ta có thể sử dụng [ ], nhưng nếu phần tử được truy cập cập nằm ngoài phạm vi của vector, chương trình sẽ panic. Trong trường hợp này, ta có thể sử dụng hàm get để trả về một Option. Hàm pop sẽ loại bỏ phần tử cuối cùng của vector.
  • Chúng ta sẽ đi sâu vào chi tiết về slice trong ngày 3. Bây giờ, học viên chỉ cần biết rằng một biến kiểu Vec cũng có thể sử dụng tất cả các hàm của slice.

HashMap

Một bảng băm chuẩn với cơ chế bảo vệ chống lại các cuộc tấn công HashDoS:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.
  • HashMap không được định nghĩa trong prelude và cần phải được import vào scope trước khi sử dụng.

  • Thử chạy đoạn code sau. Dòng đầu tiên sẽ kiểm tra xem một cuốn sách có tồn tại trong hashmap không, và nếu không trả về một giá trị thay thế. Dòng thứ hai sẽ chèn giá trị thay thế vào hashmap nếu cuốn sách không tồn tại.

    let pc1 = page_counts .get("Harry Potter và hòn đá phù thủy") .unwrap_or(&336); let pc2 = page_counts .entry("Trò chơi sinh tử".to_string()) .or_insert(374);
  • Khác với vec!, Rust không cung cấp macro hashmap!.

    • Thay vào đó, kể từ phiên bản Rust 1.56, HashMap implement From<[(K, V); N]>, cho phép chúng ta dễ dàng khởi tạo một hashmap từ một mảng giá trị:

      let page_counts = HashMap::from([ ("Harry Potter và hòn đá phù thủy".to_string(), 336), ("Trò chơi sinh tử".to_string(), 374), ]);
  • HashMap cũng có thể được khởi tạo từ bất kỳ Iterator nào trả về các cặp key-value.

  • Trong những ví dụ trên, HashMap<String, i32> được sử dụng thay vì &str để làm cho ví dụ dễ hiểu hơn. Ta tất nhiên có thể sử dụng tham chiếu trong các cấu trúc dữ liệu, nhưng có thể gặp nhiều vấn đề với hệ thống kiểm tra vay mượn của Rust.

    • Thử xoá hàm to_string() từ ví dụ trên và xem chương trình có vẫn chạy không. Bạn nghĩ chúng ta sẽ gặp vấn đề ở đâu?
  • HashMap đi kèm với nhiều kiểu dữ liệu chỉ sử dụng làm kiểu trả về của một số hàm, như std::collections::hash_map::Keys. Những kiểu dữ liệu này thường xuất hiện trong tài liệu của Rust. Hãy cho học viên xem tài liệu về kiểu dữ liệu Keys, và liên hệ với hàm keys.

Exercise: Counter

Trong bài tập này, ta sẽ chuyển một cấu trúc dữ liệu đơn giản thành một cấu trúc dữ liệu generic. Chúng ta sẽ sử dụng một std::collections::HashMap để theo dõi các giá trị đã xuất hiện và số lần xuất hiện của mỗi giá trị.

Phiên bản đầu tiên của Counter chỉ hoạt động với các giá trị kiểu u32. Hãy biến struct Counter và các hàm của nó thành generic, để Counter có thể theo dõi bất kỳ kiểu dữ liệu nào.

Nếu bạn hoàn thành sớm, hãy thử sử dụng hàm entry để giảm đi một nửa số lần thực hiện hash lookup cần thiết để thực hiện hàm count.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Standard Library Traits

Phần này sẽ kéo dài khoảng 1 giờ và 40 phút, bao gồm:

SlideThời lượng
Comparisons10 minutes
Operators10 minutes
From and Into10 minutes
Casting5 minutes
Read and Write10 minutes
Default, struct update syntax5 minutes
Closures20 minutes
Exercise: ROT1330 minutes

Speaker Notes

Tương tự như trong chương về các kiểu dữ liệu chuẩn, hãy dành thời gian để ôn lại tài liệu cho mỗi trait.

Nội dung của chương này khá dài, hãy cho học sinh nghỉ giải lao giữa chương.

Comparisons

Kiểu dữ liệu implement các trait này có thể được so sánh với nhau. Tất cả các trait này có thể được derive nếu các trường của kiểu dữ liệu sử dụng derive cũng implement các trait này.

PartialEqEq

PartialEq là một quan hệ tương đương một phần. Kiểu dữ liệu implement trait này phải định nghĩa hàm eq, hàm ne tự được định nghĩa dựa trên eq. Các toán tử ==!= sẽ gọi đến các hàm này.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Eq là một quan hệ tương đương toàn phần (tính phản xạ, đối xứng và bắc cầu). Kiểu dữ liệu implement trait này mặc định implement PartialEq. Các hàm yêu cầu quan hệ tương đương toàn phần sẽ sử dụng Eq dưới dạng một trait bound.

PartialOrdOrd

PartialOrd định nghĩa một quan hệ thứ tự một phần, với hàm partial_cmp. Trait này được sử dụng để implement các toán tử <, <=, >=, và >.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Ord là một quan hệ thứ tự toàn phần, với hàm cmp trả về kiểu dữ liệu Ordering.

Speaker Notes

This slide should take about 10 minutes.

PartialEq có thể được implement để so sánh giữa các kiểu dữ liệu khác nhau, nhưng Eq thì không, vì Eq có tính phản xạ:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Trong thực tế, các trait này thường thường được derive, ít khi được implement.

Operators

Ta có thể thực hiện overloading toán tử thông qua các trait trong std::ops:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.

Một số điểm cần thảo luận:

  • Ta có thể implement Add cho &Point. Việc này có ích trong những tình huống nào?
    • Trả lời: Add:add tiêu thụ self. Nếu kiểu T mà ta muốn overloading toán tử không implement Copy, ta cũng nên cân nhắc overload toán tử cho &T. Như vậy ta có thể tránh việc clone không cần thiết tại thời điểm gọi hàm.
  • Tại sao Output là một associated type? Có thể thay thế Output bằng một tham số generic cho hàm không?
    • Trả lời: Tham số generic của hàm được quyết định bởi người gọi hàm, nhưng associated types (như Output) được quyết định bởi người implement trait.
  • Ta có thể implement Add cho hai kiểu dữ liệu khác nhau, ví dụ impl Add<(i32, i32)> for Point sẽ cộng một tuple vào một Point.

From and Into

Ta có thể implement trait FromInto để đơn giản hoá việc chuyển đổi kiểu dữ liệu:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Into sẽ tự động được implement khi From được implement:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.
  • Vì vậy, thông thường ta chỉ cần implement From, vì kiểu dữ liệu của bạn sẽ tự động có Into.
  • Ngược lại, khi ta muốn khai báo 1 hàm nhận “bất kỳ kiểu dữ liệu nào có thể chuyển đổi thành String”, ta nên sử dụng Into. Như vậy, hàm được khai báo sẽ chấp nhận các kiểu dữ liệu implement From và những kiểu dữ liệu chỉ implement Into.

Casting

Rust không hỗ trợ ép kiểu ngầm định, nhưng hỗ trợ ép kiểu một cách tường minh với as. Cú pháp này khá giống với cách ép kiểu trong ngôn ngữ C.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Kết quả của việc ép kiểu bằng as luôn được định nghĩa rõ ràng trong Rust, cố định trên mọi nền tảng. Vì vậy, đôi lúc kết quả có thể không giống với dự kiến của bạn, ví dụ như khi đổi dấu hoặc chuyển về kiểu dữ liệu nhỏ hơn – hãy kiểm tra tài liệu cho chắc chắn.

Ép kiểu dữ liệu với as là một công cụ khá mạnh nên dễ bị sử dụng sai, có thể gây ra những lỗi khó phát hiện khi thay đổi kiểu dữ liệu hoặc phạm vi giá trị của biến. Ép kiểu với as nên được sử dụng chỉ khi ta chắc chắn rằng lỗi không thể xảy ra (ví dụ: lấy 32 bit cuối của một u64 với as u32, bỏ qua giá trị của 32 bit đầu).

Nên sử dụng From hoặc Into thay vì as cho những trường hợp ép kiểu không thể thất bại (ví dụ: u32 sang u64), để đảm bảo rằng quá trình chuyển đổi thật sự không thể thất bại. Đối với những trường hợp ép kiểu có thể thất bại, nên sử dụng TryFromTryInto để xử lý lỗi khi ép kiểu (khi một giá trị không hoàn toàn phù hợp với kiểu dữ liệu đích).

Speaker Notes

This slide should take about 5 minutes.

Hãy cân nhắc cho học viên nghỉ giải lao sau slide này.

as hoạt động tương tự như static cast trong C++. Không nên sử dụng as nếu dữ liệu có thể bị mất khi ép kiểu, ít nhất phải có một comment giải thích rõ ràng lý do sử dụng as.

Một trường hợp ngoại lệ là khi ép kiểu số nguyên sang usize để sử dụng làm index trong array.

Read and Write

Bằng cách sử dụng ReadBufRead, ta có thể đọc dữ liệu văn bản từ một mảng u8:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Tương tự, Write cho phép ta ghi dữ liệu văn bản vào một mảng u8:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Trait Default

Kiểu dữ liệu implement trait Default sẽ sở hữu một giá trị mặc định.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • Người dùng có thể trực tiếp implement trait này, hoặc derive trait này bằng #[derive(Default)].
  • Khi derive trait này, từng miền của giá trị mặc định được tạo ra sẽ được gán bằng giá trị mặc định của kiểu dữ liệu tương ứng.
    • Vì vậy tất cả các miền của struct cũng phải implement Default.
  • Kiểu dữ liệu chuẩn của Rust thường hay implement trait Default (như 0, "", vân vân).
  • Trait này rất hữu dụng khi ta cần khởi tạo struct chỉ với một ít miền.
  • Vì các kiểu dữ liệu thường hay implment Default, thư viện chuẩn của Rust cũng cung cấp một số hàm giúp người dùng có thể tận dụng giá trị mặc định của biến.
  • Dấu .. còn được gọi là ký hiệu update struct.

Closures

Closures or lambda expressions have types which cannot be named. However, they implement special Fn, FnMut, and FnOnce traits:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 20 minutes.

An Fn (e.g. add_3) neither consumes nor mutates captured values, or perhaps captures nothing at all. It can be called multiple times concurrently.

An FnMut (e.g. accumulate) might mutate captured values. You can call it multiple times, but not concurrently.

If you have an FnOnce (e.g. multiply_sum), you may only call it once. It might consume captured values.

FnMut is a subtype of FnOnce. Fn is a subtype of FnMut and FnOnce. I.e. you can use an FnMut wherever an FnOnce is called for, and you can use an Fn wherever an FnMut or FnOnce is called for.

When you define a function that takes a closure, you should take FnOnce if you can (i.e. you call it once), or FnMut else, and last Fn. This allows the most flexibility for the caller.

In contrast, when you have a closure, the most flexible you can have is Fn (it can be passed everywhere), then FnMut, and lastly FnOnce.

The compiler also infers Copy (e.g. for add_3) and Clone (e.g. multiply_sum), depending on what the closure captures.

By default, closures will capture by reference if they can. The move keyword makes them capture by value.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Exercise: ROT13

In this example, you will implement the classic “ROT13” cipher. Copy this code to the playground, and implement the missing bits. Only rotate ASCII alphabetic characters, to ensure the result is still valid UTF-8.

use std::io::Read; struct RotDecoder<R: Read> { input: R, rot: u8, } // Implement the `Read` trait for `RotDecoder`. fn main() { let mut rot = RotDecoder { input: "Gb trg gb gur bgure fvqr!".as_bytes(), rot: 13 }; let mut result = String::new(); rot.read_to_string(&mut result).unwrap(); println!("{}", result); } #[cfg(test)] mod test { use super::*; #[test] fn joke() { let mut rot = RotDecoder { input: "Gb trg gb gur bgure fvqr!".as_bytes(), rot: 13 }; let mut result = String::new(); rot.read_to_string(&mut result).unwrap(); assert_eq!(&result, "To get to the other side!"); } #[test] fn binary() { let input: Vec<u8> = (0..=255u8).collect(); let mut rot = RotDecoder::<&[u8]> { input: input.as_ref(), rot: 13 }; let mut buf = [0u8; 256]; assert_eq!(rot.read(&mut buf).unwrap(), 256); for i in 0..=255 { if input[i] != buf[i] { assert!(input[i].is_ascii_alphabetic()); assert!(buf[i].is_ascii_alphabetic()); } } } }

What happens if you chain two RotDecoder instances together, each rotating by 13 characters?

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Welcome to Day 3

Today, we will cover:

  • Memory management, lifetimes, and the borrow checker: how Rust ensures memory safety.
  • Smart pointers: standard library pointer types.

Mục lục

Buổi học nên kéo dài khoảng 2 tiếng và 20 phút, bao gồm thời gian 10 phút nghỉ trưa. Nội dung buổi học bao gồm:

SegmentThời lượng
Lời Chào Mừng3 minutes
Memory Management1 hour
Smart Pointers55 minutes

Memory Management

Phần này sẽ kéo dài khoảng 1 giờ, bao gồm:

SlideThời lượng
Review of Program Memory5 minutes
Approaches to Memory Management10 minutes
Ownership5 minutes
Move Semantics5 minutes
Clone2 minutes
Copy Types5 minutes
Drop10 minutes
Exercise: Builder Type20 minutes

Review of Program Memory

Programs allocate memory in two ways:

  • Stack: Continuous area of memory for local variables.

    • Values have fixed sizes known at compile time.
    • Extremely fast: just move a stack pointer.
    • Easy to manage: follows function calls.
    • Great memory locality.
  • Heap: Storage of values outside of function calls.

    • Values have dynamic sizes determined at runtime.
    • Slightly slower than the stack: some book-keeping needed.
    • No guarantee of memory locality.

Example

Creating a String puts fixed-sized metadata on the stack and dynamically sized data, the actual string, on the heap:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
StackHeaps1capacity5ptrHellolen5

Speaker Notes

This slide should take about 5 minutes.
  • Mention that a String is backed by a Vec, so it has a capacity and length and can grow if mutable via reallocation on the heap.

  • If students ask about it, you can mention that the underlying memory is heap allocated using the System Allocator and custom allocators can be implemented using the Allocator API

More to Explore

We can inspect the memory layout with unsafe Rust. However, you should point out that this is rightfully unsafe!

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Approaches to Memory Management

Traditionally, languages have fallen into two broad categories:

  • Full control via manual memory management: C, C++, Pascal, …
    • Programmer decides when to allocate or free heap memory.
    • Programmer must determine whether a pointer still points to valid memory.
    • Studies show, programmers make mistakes.
  • Full safety via automatic memory management at runtime: Java, Python, Go, Haskell, …
    • A runtime system ensures that memory is not freed until it can no longer be referenced.
    • Typically implemented with reference counting, garbage collection, or RAII.

Rust offers a new mix:

Full control and safety via compile time enforcement of correct memory management.

It does this with an explicit ownership concept.

Speaker Notes

This slide should take about 10 minutes.

This slide is intended to help students coming from other languages to put Rust in context.

  • C must manage heap manually with malloc and free. Common errors include forgetting to call free, calling it multiple times for the same pointer, or dereferencing a pointer after the memory it points to has been freed.

  • C++ has tools like smart pointers (unique_ptr, shared_ptr) that take advantage of language guarantees about calling destructors to ensure memory is freed when a function returns. It is still quite easy to mis-use these tools and create similar bugs to C.

  • Java, Go, and Python rely on the garbage collector to identify memory that is no longer reachable and discard it. This guarantees that any pointer can be dereferenced, eliminating use-after-free and other classes of bugs. But, GC has a runtime cost and is difficult to tune properly.

Rust’s ownership and borrowing model can, in many cases, get the performance of C, with alloc and free operations precisely where they are required – zero cost. It also provides tools similar to C++’s smart pointers. When required, other options such as reference counting are available, and there are even third-party crates available to support runtime garbage collection (not covered in this class).

Ownership

All variable bindings have a scope where they are valid and it is an error to use a variable outside its scope:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

We say that the variable owns the value. Every Rust value has precisely one owner at all times.

At the end of the scope, the variable is dropped and the data is freed. A destructor can run here to free up resources.

Speaker Notes

This slide should take about 5 minutes.

Students familiar with garbage-collection implementations will know that a garbage collector starts with a set of “roots” to find all reachable memory. Rust’s “single owner” principle is a similar idea.

Move Semantics

An assignment will transfer ownership between variables:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • The assignment of s1 to s2 transfers ownership.
  • When s1 goes out of scope, nothing happens: it does not own anything.
  • When s2 goes out of scope, the string data is freed.

Before move to s2:

StackHeaps1ptrHello!len6capacity6

After move to s2:

StackHeaps1ptrHello!len6capacity6s2ptrlen6capacity6(inaccessible)

When you pass a value to a function, the value is assigned to the function parameter. This transfers ownership:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • Mention that this is the opposite of the defaults in C++, which copies by value unless you use std::move (and the move constructor is defined!).

  • It is only the ownership that moves. Whether any machine code is generated to manipulate the data itself is a matter of optimization, and such copies are aggressively optimized away.

  • Simple values (such as integers) can be marked Copy (see later slides).

  • In Rust, clones are explicit (by using clone).

In the say_hello example:

  • With the first call to say_hello, main gives up ownership of name. Afterwards, name cannot be used anymore within main.
  • The heap memory allocated for name will be freed at the end of the say_hello function.
  • main can retain ownership if it passes name as a reference (&name) and if say_hello accepts a reference as a parameter.
  • Alternatively, main can pass a clone of name in the first call (name.clone()).
  • Rust makes it harder than C++ to inadvertently create copies by making move semantics the default, and by forcing programmers to make clones explicit.

More to Explore

Defensive Copies in Modern C++

Modern C++ solves this differently:

std::string s1 = "Cpp"; std::string s2 = s1; // Duplicate the data in s1.
  • The heap data from s1 is duplicated and s2 gets its own independent copy.
  • When s1 and s2 go out of scope, they each free their own memory.

Before copy-assignment:

StackHeaps1ptrCpplen3capacity3

After copy-assignment:

StackHeaps1ptrCpplen3capacity3s2ptrCpplen3capacity3

Các điểm chính:

  • C++ has made a slightly different choice than Rust. Because = copies data, the string data has to be cloned. Otherwise we would get a double-free when either string goes out of scope.

  • C++ also has std::move, which is used to indicate when a value may be moved from. If the example had been s2 = std::move(s1), no heap allocation would take place. After the move, s1 would be in a valid but unspecified state. Unlike Rust, the programmer is allowed to keep using s1.

  • Unlike Rust, = in C++ can run arbitrary code as determined by the type which is being copied or moved.

Clone

Sometimes you want to make a copy of a value. The Clone trait accomplishes this.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 2 minutes.
  • The idea of Clone is to make it easy to spot where heap allocations are occurring. Look for .clone() and a few others like vec! or Box::new.

  • It’s common to “clone your way out” of problems with the borrow checker, and return later to try to optimize those clones away.

  • clone generally performs a deep copy of the value, meaning that if you e.g. clone an array, all of the elements of the array are cloned as well.

  • The behavior for clone is user-defined, so it can perform custom cloning logic if needed.

Copy Types

While move semantics are the default, certain types are copied by default:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

These types implement the Copy trait.

You can opt-in your own types to use copy semantics:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • After the assignment, both p1 and p2 own their own data.
  • We can also use p1.clone() to explicitly copy the data.

Speaker Notes

This slide should take about 5 minutes.

Copying and cloning are not the same thing:

  • Copying refers to bitwise copies of memory regions and does not work on arbitrary objects.
  • Copying does not allow for custom logic (unlike copy constructors in C++).
  • Cloning is a more general operation and also allows for custom behavior by implementing the Clone trait.
  • Copying does not work on types that implement the Drop trait.

In the above example, try the following:

  • Add a String field to struct Point. It will not compile because String is not a Copy type.
  • Remove Copy from the derive attribute. The compiler error is now in the println! for p1.
  • Show that it works if you clone p1 instead.

The Drop Trait

Values which implement Drop can specify code to run when they go out of scope:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 8 minutes.
  • Note that std::mem::drop is not the same as std::ops::Drop::drop.
  • Values are automatically dropped when they go out of scope.
  • When a value is dropped, if it implements std::ops::Drop then its Drop::drop implementation will be called.
  • All its fields will then be dropped too, whether or not it implements Drop.
  • std::mem::drop is just an empty function that takes any value. The significance is that it takes ownership of the value, so at the end of its scope it gets dropped. This makes it a convenient way to explicitly drop values earlier than they would otherwise go out of scope.
    • This can be useful for objects that do some work on drop: releasing locks, closing files, etc.

Một số điểm cần thảo luận:

  • Why doesn’t Drop::drop take self?
    • Short-answer: If it did, std::mem::drop would be called at the end of the block, resulting in another call to Drop::drop, and a stack overflow!
  • Try replacing drop(a) with a.drop().

Exercise: Builder Type

In this example, we will implement a complex data type that owns all of its data. We will use the “builder pattern” to support building a new value piece-by-piece, using convenience functions.

Fill in the missing pieces.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Smart Pointers

Phần này sẽ kéo dài khoảng 55 phút, bao gồm:

SlideThời lượng
Box10 minutes
Rc5 minutes
Trait Objects10 minutes
Exercise: Binary Tree30 minutes

Box<T>

Box is an owned pointer to data on the heap:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
5StackHeapfive

Box<T> implements Deref<Target = T>, which means that you can call methods from T directly on a Box<T>.

Recursive data types or data types with dynamic sizes need to use a Box:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
StackHeaplistElement1Element2Nil

Speaker Notes

This slide should take about 8 minutes.
  • Box is like std::unique_ptr in C++, except that it’s guaranteed to be not null.

  • A Box can be useful when you:

    • have a type whose size that can’t be known at compile time, but the Rust compiler wants to know an exact size.
    • want to transfer ownership of a large amount of data. To avoid copying large amounts of data on the stack, instead store the data on the heap in a Box so only the pointer is moved.
  • If Box was not used and we attempted to embed a List directly into the List, the compiler would not be able to compute a fixed size for the struct in memory (the List would be of infinite size).

  • Box solves this problem as it has the same size as a regular pointer and just points at the next element of the List in the heap.

  • Remove the Box in the List definition and show the compiler error. We get the message “recursive without indirection”, because for data recursion, we have to use indirection, a Box or reference of some kind, instead of storing the value directly.

More to Explore

Niche Optimization

Though Box looks like std::unique_ptr in C++, it cannot be empty/null. This makes Box one of the types that allow the compiler to optimize storage of some enums.

For example, Option<Box<T>> has the same size, as just Box<T>, because compiler uses NULL-value to discriminate variants instead of using explicit tag (“Null Pointer Optimization”):

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Rc

Rc is a reference-counted shared pointer. Use this when you need to refer to the same data from multiple places:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • See Arc and Mutex if you are in a multi-threaded context.
  • You can downgrade a shared pointer into a Weak pointer to create cycles that will get dropped.

Speaker Notes

This slide should take about 5 minutes.
  • Rc’s count ensures that its contained value is valid for as long as there are references.
  • Rc in Rust is like std::shared_ptr in C++.
  • Rc::clone is cheap: it creates a pointer to the same allocation and increases the reference count. Does not make a deep clone and can generally be ignored when looking for performance issues in code.
  • make_mut actually clones the inner value if necessary (“clone-on-write”) and returns a mutable reference.
  • Use Rc::strong_count to check the reference count.
  • Rc::downgrade gives you a weakly reference-counted object to create cycles that will be dropped properly (likely in combination with RefCell).

Trait Objects

Trait objects allow for values of different types, for instance in a collection:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Memory layout after allocating pets:

<Dog as Pet>::talk<Cat as Pet>::talkStackHeapFidoptrlives9len2capacity2data:name,4,4age5vtablevtablepets: Vec<dyn Pet>data: CatDogProgram text

Speaker Notes

This slide should take about 10 minutes.
  • Types that implement a given trait may be of different sizes. This makes it impossible to have things like Vec<dyn Pet> in the example above.
  • dyn Pet is a way to tell the compiler about a dynamically sized type that implements Pet.
  • In the example, pets is allocated on the stack and the vector data is on the heap. The two vector elements are fat pointers:
    • A fat pointer is a double-width pointer. It has two components: a pointer to the actual object and a pointer to the virtual method table (vtable) for the Pet implementation of that particular object.
    • The data for the Dog named Fido is the name and age fields. The Cat has a lives field.
  • Compare these outputs in the above example:
    println!("{} {}", std::mem::size_of::<Dog>(), std::mem::size_of::<Cat>()); println!("{} {}", std::mem::size_of::<&Dog>(), std::mem::size_of::<&Cat>()); println!("{}", std::mem::size_of::<&dyn Pet>()); println!("{}", std::mem::size_of::<Box<dyn Pet>>());

Exercise: Binary Tree

A binary tree is a tree-type data structure where every node has two children (left and right). We will create a tree where each node stores a value. For a given node N, all nodes in a N’s left subtree contain smaller values, and all nodes in N’s right subtree will contain larger values.

Implement the following types, so that the given tests pass.

Extra Credit: implement an iterator over a binary tree that returns the values in order.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Welcome Back

Buổi học nên kéo dài khoảng 1 tiếng và 55 phút, bao gồm thời gian 10 phút nghỉ trưa. Nội dung buổi học bao gồm:

SegmentThời lượng
Borrowing55 minutes
Lifetimes50 minutes

Borrowing

Phần này sẽ kéo dài khoảng 55 phút, bao gồm:

SlideThời lượng
Borrowing a Value10 minutes
Borrow Checking10 minutes
Borrow Errors3 minutes
Interior Mutability10 minutes
Exercise: Health Statistics20 minutes

Borrowing a Value

As we saw before, instead of transferring ownership when calling a function, you can let a function borrow the value:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • The add function borrows two points and returns a new point.
  • The caller retains ownership of the inputs.

Speaker Notes

This slide should take about 10 minutes.

This slide is a review of the material on references from day 1, expanding slightly to include function arguments and return values.

More to Explore

Notes on stack returns and inlining:

  • Demonstrate that the return from add is cheap because the compiler can eliminate the copy operation, by inlining the call to add into main. Change the above code to print stack addresses and run it on the Playground or look at the assembly in Godbolt. In the “DEBUG” optimization level, the addresses should change, while they stay the same when changing to the “RELEASE” setting:

    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • The Rust compiler can do automatic inlining, that can be disabled on a function level with #[inline(never)].

  • Once disabled, the printed address will change on all optimization levels. Looking at Godbolt or Playground, one can see that in this case, the return of the value depends on the ABI, e.g. on amd64 the two i32 that is making up the point will be returned in 2 registers (eax and edx).

Borrow Checking

Rust’s borrow checker puts constraints on the ways you can borrow values. For a given value, at any time:

  • You can have one or more shared references to the value, or
  • You can have exactly one exclusive reference to the value.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.
  • Note that the requirement is that conflicting references not exist at the same point. It does not matter where the reference is dereferenced.
  • The above code does not compile because a is borrowed as mutable (through c) and as immutable (through b) at the same time.
  • Move the println! statement for b before the scope that introduces c to make the code compile.
  • After that change, the compiler realizes that b is only ever used before the new mutable borrow of a through c. This is a feature of the borrow checker called “non-lexical lifetimes”.
  • The exclusive reference constraint is quite strong. Rust uses it to ensure that data races do not occur. Rust also relies on this constraint to optimize code. For example, a value behind a shared reference can be safely cached in a register for the lifetime of that reference.
  • The borrow checker is designed to accommodate many common patterns, such as taking exclusive references to different fields in a struct at the same time. But, there are some situations where it doesn’t quite “get it” and this often results in “fighting with the borrow checker.”

Borrow Errors

As a concrete example of how these borrowing rules prevent memory errors, consider the case of modifying a collection while there are references to its elements:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Similarly, consider the case of iterator invalidation:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 3 minutes.
  • In both of these cases, modifying the collection by pushing new elements into it can potentially invalidate existing references to the collection’s elements if the collection has to reallocate.

Interior Mutability

In some situations, it’s necessary to modify data behind a shared (read-only) reference. For example, a shared data structure might have an internal cache, and wish to update that cache from read-only methods.

The “interior mutability” pattern allows exclusive (mutable) access behind a shared reference. The standard library provides several ways to do this, all while still ensuring safety, typically by performing a runtime check.

RefCell

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Cell

Cell wraps a value and allows getting or setting the value, even with a shared reference to the Cell. However, it does not allow any references to the value. Since there are no references, borrowing rules cannot be broken.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.

The main thing to take away from this slide is that Rust provides safe ways to modify data behind a shared reference. There are a variety of ways to ensure that safety, and RefCell and Cell are two of them.

  • RefCell enforces Rust’s usual borrowing rules (either multiple shared references or a single exclusive reference) with a runtime check. In this case, all borrows are very short and never overlap, so the checks always succeed.

    • The extra block in the RefCell example is to end the borrow created by the call to borrow_mut before we print the cell. Trying to print a borrowed RefCell just shows the message "{borrowed}".
  • Cell is a simpler means to ensure safety: it has a set method that takes &self. This needs no runtime check, but requires moving values, which can have its own cost.

Exercise: Health Statistics

You’re working on implementing a health-monitoring system. As part of that, you need to keep track of users’ health statistics.

You’ll start with a stubbed function in an impl block as well as a User struct definition. Your goal is to implement the stubbed out method on the User struct defined in the impl block.

Copy the code below to https://play.rust-lang.org/ and fill in the missing method:

// TODO: remove this when you're done with your implementation. #![allow(unused_variables, dead_code)] #![allow(dead_code)] pub struct User { name: String, age: u32, height: f32, visit_count: usize, last_blood_pressure: Option<(u32, u32)>, } pub struct Measurements { height: f32, blood_pressure: (u32, u32), } pub struct HealthReport<'a> { patient_name: &'a str, visit_count: u32, height_change: f32, blood_pressure_change: Option<(i32, i32)>, } impl User { pub fn new(name: String, age: u32, height: f32) -> Self { Self { name, age, height, visit_count: 0, last_blood_pressure: None } } pub fn visit_doctor(&mut self, measurements: Measurements) -> HealthReport { todo!("Update a user's statistics based on measurements from a visit to the doctor") } } fn main() { let bob = User::new(String::from("Bob"), 32, 155.2); println!("I'm {} and my age is {}", bob.name, bob.age); } #[test] fn test_visit() { let mut bob = User::new(String::from("Bob"), 32, 155.2); assert_eq!(bob.visit_count, 0); let report = bob.visit_doctor(Measurements { height: 156.1, blood_pressure: (120, 80) }); assert_eq!(report.patient_name, "Bob"); assert_eq!(report.visit_count, 1); assert_eq!(report.blood_pressure_change, None); let report = bob.visit_doctor(Measurements { height: 156.1, blood_pressure: (115, 76) }); assert_eq!(report.visit_count, 2); assert_eq!(report.blood_pressure_change, Some((-5, -4))); }

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Lifetimes

Phần này sẽ kéo dài khoảng 50 phút, bao gồm:

SlideThời lượng
Lifetime Annotations10 minutes
Lifetime Elision5 minutes
Struct Lifetimes5 minutes
Exercise: Protobuf Parsing30 minutes

Lifetime Annotations

A reference has a lifetime, which must not “outlive” the value it refers to. This is verified by the borrow checker.

The lifetime can be implicit - this is what we have seen so far. Lifetimes can also be explicit: &'a Point, &'document str. Lifetimes start with ' and 'a is a typical default name. Read &'a Point as “a borrowed Point which is valid for at least the lifetime a”.

Lifetimes are always inferred by the compiler: you cannot assign a lifetime yourself. Explicit lifetime annotations create constraints where there is ambiguity; the compiler verifies that there is a valid solution.

Lifetimes become more complicated when considering passing values to and returning values from functions.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.

In this example, the compiler does not know what lifetime to infer for p3. Looking inside the function body shows that it can only safely assume that p3’s lifetime is the shorter of p1 and p2. But just like types, Rust requires explicit annotations of lifetimes on function arguments and return values.

Add 'a appropriately to left_most:

fn left_most<'a>(p1: &'a Point, p2: &'a Point) -> &'a Point {

This says, “given p1 and p2 which both outlive 'a, the return value lives for at least 'a.

In common cases, lifetimes can be elided, as described on the next slide.

Lifetimes in Function Calls

Lifetimes for function arguments and return values must be fully specified, but Rust allows lifetimes to be elided in most cases with a few simple rules. This is not inference – it is just a syntactic shorthand.

  • Each argument which does not have a lifetime annotation is given one.
  • If there is only one argument lifetime, it is given to all un-annotated return values.
  • If there are multiple argument lifetimes, but the first one is for self, that lifetime is given to all un-annotated return values.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.

In this example, cab_distance is trivially elided.

The nearest function provides another example of a function with multiple references in its arguments that requires explicit annotation.

Try adjusting the signature to “lie” about the lifetimes returned:

fn nearest<'a, 'q>(points: &'a [Point], query: &'q Point) -> Option<&'q Point> {

This won’t compile, demonstrating that the annotations are checked for validity by the compiler. Note that this is not the case for raw pointers (unsafe), and this is a common source of errors with unsafe Rust.

Students may ask when to use lifetimes. Rust borrows always have lifetimes. Most of the time, elision and type inference mean these don’t need to be written out. In more complicated cases, lifetime annotations can help resolve ambiguity. Often, especially when prototyping, it’s easier to just work with owned data by cloning values where necessary.

Lifetimes in Data Structures

If a data type stores borrowed data, it must be annotated with a lifetime:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • In the above example, the annotation on Highlight enforces that the data underlying the contained &str lives at least as long as any instance of Highlight that uses that data.
  • If text is consumed before the end of the lifetime of fox (or dog), the borrow checker throws an error.
  • Types with borrowed data force users to hold on to the original data. This can be useful for creating lightweight views, but it generally makes them somewhat harder to use.
  • When possible, make data structures own their data directly.
  • Some structs with multiple references inside can have more than one lifetime annotation. This can be necessary if there is a need to describe lifetime relationships between the references themselves, in addition to the lifetime of the struct itself. Those are very advanced use cases.

Exercise: Protobuf Parsing

In this exercise, you will build a parser for the protobuf binary encoding. Don’t worry, it’s simpler than it seems! This illustrates a common parsing pattern, passing slices of data. The underlying data itself is never copied.

Fully parsing a protobuf message requires knowing the types of the fields, indexed by their field numbers. That is typically provided in a proto file. In this exercise, we’ll encode that information into match statements in functions that get called for each field.

We’ll use the following proto:

message PhoneNumber { optional string number = 1; optional string type = 2; } message Person { optional string name = 1; optional int32 id = 2; repeated PhoneNumber phones = 3; }

A proto message is encoded as a series of fields, one after the next. Each is implemented as a “tag” followed by the value. The tag contains a field number (e.g., 2 for the id field of a Person message) and a wire type defining how the payload should be determined from the byte stream.

Integers, including the tag, are represented with a variable-length encoding called VARINT. Luckily, parse_varint is defined for you below. The given code also defines callbacks to handle Person and PhoneNumber fields, and to parse a message into a series of calls to those callbacks.

What remains for you is to implement the parse_field function and the ProtoMessage trait for Person and PhoneNumber.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Welcome to Day 4

Today we will cover topics relating to building large-scale software in Rust:

  • Iterators: a deep dive on the Iterator trait.
  • Modules and visibility.
  • Testing.
  • Error handling: panics, Result, and the try operator ?.
  • Unsafe Rust: the escape hatch when you can’t express yourself in safe Rust.

Mục lục

Buổi học nên kéo dài khoảng 2 tiếng và 40 phút, bao gồm thời gian 10 phút nghỉ trưa. Nội dung buổi học bao gồm:

SegmentThời lượng
Lời Chào Mừng3 minutes
Iterators45 minutes
Modules40 minutes
Testing45 minutes

Iterators

Phần này sẽ kéo dài khoảng 45 phút, bao gồm:

SlideThời lượng
Iterator5 minutes
IntoIterator5 minutes
FromIterator5 minutes
Exercise: Iterator Method Chaining30 minutes

Iterator

The Iterator trait supports iterating over values in a collection. It requires a next method and provides lots of methods. Many standard library types implement Iterator, and you can implement it yourself, too:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • The Iterator trait implements many common functional programming operations over collections (e.g. map, filter, reduce, etc). This is the trait where you can find all the documentation about them. In Rust these functions should produce the code as efficient as equivalent imperative implementations.

  • IntoIterator is the trait that makes for loops work. It is implemented by collection types such as Vec<T> and references to them such as &Vec<T> and &[T]. Ranges also implement it. This is why you can iterate over a vector with for i in some_vec { .. } but some_vec.next() doesn’t exist.

IntoIterator

The Iterator trait tells you how to iterate once you have created an iterator. The related trait IntoIterator defines how to create an iterator for a type. It is used automatically by the for loop.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.

Click through to the docs for IntoIterator. Every implementation of IntoIterator must declare two types:

  • Item: the type to iterate over, such as i8,
  • IntoIter: the Iterator type returned by the into_iter method.

Note that IntoIter and Item are linked: the iterator must have the same Item type, which means that it returns Option<Item>

The example iterates over all combinations of x and y coordinates.

Try iterating over the grid twice in main. Why does this fail? Note that IntoIterator::into_iter takes ownership of self.

Fix this issue by implementing IntoIterator for &Grid and storing a reference to the Grid in GridIter.

The same problem can occur for standard library types: for e in some_vector will take ownership of some_vector and iterate over owned elements from that vector. Use for e in &some_vector instead, to iterate over references to elements of some_vector.

FromIterator

FromIterator lets you build a collection from an Iterator.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.

Iterator implements

fn collect<B>(self) -> B where B: FromIterator<Self::Item>, Self: Sized

There are two ways to specify B for this method:

  • With the “turbofish”: some_iterator.collect::<COLLECTION_TYPE>(), as shown. The _ shorthand used here lets Rust infer the type of the Vec elements.
  • With type inference: let prime_squares: Vec<_> = some_iterator.collect(). Rewrite the example to use this form.

There are basic implementations of FromIterator for Vec, HashMap, etc. There are also more specialized implementations which let you do cool things like convert an Iterator<Item = Result<V, E>> into a Result<Vec<V>, E>.

Exercise: Iterator Method Chaining

In this exercise, you will need to find and use some of the provided methods in the Iterator trait to implement a complex calculation.

Copy the following code to https://play.rust-lang.org/ and make the tests pass. Use an iterator expression and collect the result to construct the return value.

#![allow(unused)] fn main() { /// Calculate the differences between elements of `values` offset by `offset`, /// wrapping around from the end of `values` to the beginning. /// /// Element `n` of the result is `values[(n+offset)%len] - values[n]`. fn offset_differences<N>(offset: usize, values: Vec<N>) -> Vec<N> where N: Copy + std::ops::Sub<Output = N>, { unimplemented!() } #[test] fn test_offset_one() { assert_eq!(offset_differences(1, vec![1, 3, 5, 7]), vec![2, 2, 2, -6]); assert_eq!(offset_differences(1, vec![1, 3, 5]), vec![2, 2, -4]); assert_eq!(offset_differences(1, vec![1, 3]), vec![2, -2]); } #[test] fn test_larger_offsets() { assert_eq!(offset_differences(2, vec![1, 3, 5, 7]), vec![4, 4, -4, -4]); assert_eq!(offset_differences(3, vec![1, 3, 5, 7]), vec![6, -2, -2, -2]); assert_eq!(offset_differences(4, vec![1, 3, 5, 7]), vec![0, 0, 0, 0]); assert_eq!(offset_differences(5, vec![1, 3, 5, 7]), vec![2, 2, 2, -6]); } #[test] fn test_custom_type() { assert_eq!( offset_differences(1, vec![1.0, 11.0, 5.0, 0.0]), vec![10.0, -6.0, -5.0, 1.0] ); } #[test] fn test_degenerate_cases() { assert_eq!(offset_differences(1, vec![0]), vec![0]); assert_eq!(offset_differences(1, vec![1]), vec![0]); let empty: Vec<i32> = vec![]; assert_eq!(offset_differences(1, empty), vec![]); } }

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Modules

Phân đoạn này sẽ kéo dài khoảng 40 phút, bao gồm:

SlideThời lượng
Modules3 minutes
Filesystem Hierarchy5 minutes
Visibility5 minutes
use, super, self10 minutes
Exercise: Modules for a GUI Library15 minutes

Modules

We have seen how impl blocks let us namespace functions to a type.

Similarly, mod lets us namespace types and functions:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 3 minutes.
  • Packages provide functionality and include a Cargo.toml file that describes how to build a bundle of 1+ crates.
  • Crates are a tree of modules, where a binary crate creates an executable and a library crate compiles to a library.
  • Modules define organization, scope, and are the focus of this section.

Filesystem Hierarchy

Omitting the module content will tell Rust to look for it in another file:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

This tells rust that the garden module content is found at src/garden.rs. Similarly, a garden::vegetables module can be found at src/garden/vegetables.rs.

The crate root is in:

  • src/lib.rs (for a library crate)
  • src/main.rs (for a binary crate)

Modules defined in files can be documented, too, using “inner doc comments”. These document the item that contains them – in this case, a module.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • Before Rust 2018, modules needed to be located at module/mod.rs instead of module.rs, and this is still a working alternative for editions after 2018.

  • The main reason to introduce filename.rs as alternative to filename/mod.rs was because many files named mod.rs can be hard to distinguish in IDEs.

  • Deeper nesting can use folders, even if the main module is a file:

    src/ ├── main.rs ├── top_module.rs └── top_module/ └── sub_module.rs
  • The place rust will look for modules can be changed with a compiler directive:

    #[path = "some/path.rs"] mod some_module;

    This is useful, for example, if you would like to place tests for a module in a file named some_module_test.rs, similar to the convention in Go.

Visibility

Modules are a privacy boundary:

  • Module items are private by default (hides implementation details).
  • Parent and sibling items are always visible.
  • In other words, if an item is visible in module foo, it’s visible in all the descendants of foo.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • Use the pub keyword to make modules public.

Additionally, there are advanced pub(...) specifiers to restrict the scope of public visibility.

  • See the Rust Reference.
  • Configuring pub(crate) visibility is a common pattern.
  • Less commonly, you can give visibility to a specific path.
  • In any case, visibility must be granted to an ancestor module (and all of its descendants).

use, super, self

A module can bring symbols from another module into scope with use. You will typically see something like this at the top of each module:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Paths

Paths are resolved as follows:

  1. As a relative path:

    • foo or self::foo refers to foo in the current module,
    • super::foo refers to foo in the parent module.
  2. As an absolute path:

    • crate::foo refers to foo in the root of the current crate,
    • bar::foo refers to foo in the bar crate.

Speaker Notes

This slide should take about 8 minutes.
  • It is common to “re-export” symbols at a shorter path. For example, the top-level lib.rs in a crate might have

    mod storage; pub use storage::disk::DiskStorage; pub use storage::network::NetworkStorage;

    making DiskStorage and NetworkStorage available to other crates with a convenient, short path.

  • For the most part, only items that appear in a module need to be use’d. However, a trait must be in scope to call any methods on that trait, even if a type implementing that trait is already in scope. For example, to use the read_to_string method on a type implementing the Read trait, you need to use std::io::Read.

  • The use statement can have a wildcard: use std::io::*. This is discouraged because it is not clear which items are imported, and those might change over time.

Exercise: Modules for a GUI Library

In this exercise, you will reorganize a small GUI Library implementation. This library defines a Widget trait and a few implementations of that trait, as well as a main function.

It is typical to put each type or set of closely-related types into its own module, so each widget type should get its own module.

Cargo Setup

The Rust playground only supports one file, so you will need to make a Cargo project on your local filesystem:

cargo init gui-modules cd gui-modules cargo run

Edit the resulting src/main.rs to add mod statements, and add additional files in the src directory.

Source

Here’s the single-module implementation of the GUI library:

pub trait Widget { /// Natural width of `self`. fn width(&self) -> usize; /// Draw the widget into a buffer. fn draw_into(&self, buffer: &mut dyn std::fmt::Write); /// Draw the widget on standard output. fn draw(&self) { let mut buffer = String::new(); self.draw_into(&mut buffer); println!("{buffer}"); } } pub struct Label { label: String, } impl Label { fn new(label: &str) -> Label { Label { label: label.to_owned() } } } pub struct Button { label: Label, } impl Button { fn new(label: &str) -> Button { Button { label: Label::new(label) } } } pub struct Window { title: String, widgets: Vec<Box<dyn Widget>>, } impl Window { fn new(title: &str) -> Window { Window { title: title.to_owned(), widgets: Vec::new() } } fn add_widget(&mut self, widget: Box<dyn Widget>) { self.widgets.push(widget); } fn inner_width(&self) -> usize { std::cmp::max( self.title.chars().count(), self.widgets.iter().map(|w| w.width()).max().unwrap_or(0), ) } } impl Widget for Window { fn width(&self) -> usize { // Add 4 paddings for borders self.inner_width() + 4 } fn draw_into(&self, buffer: &mut dyn std::fmt::Write) { let mut inner = String::new(); for widget in &self.widgets { widget.draw_into(&mut inner); } let inner_width = self.inner_width(); // TODO: Change draw_into to return Result<(), std::fmt::Error>. Then use the // ?-operator here instead of .unwrap(). writeln!(buffer, "+-{:-<inner_width$}-+", "").unwrap(); writeln!(buffer, "| {:^inner_width$} |", &self.title).unwrap(); writeln!(buffer, "+={:=<inner_width$}=+", "").unwrap(); for line in inner.lines() { writeln!(buffer, "| {:inner_width$} |", line).unwrap(); } writeln!(buffer, "+-{:-<inner_width$}-+", "").unwrap(); } } impl Widget for Button { fn width(&self) -> usize { self.label.width() + 8 // add a bit of padding } fn draw_into(&self, buffer: &mut dyn std::fmt::Write) { let width = self.width(); let mut label = String::new(); self.label.draw_into(&mut label); writeln!(buffer, "+{:-<width$}+", "").unwrap(); for line in label.lines() { writeln!(buffer, "|{:^width$}|", &line).unwrap(); } writeln!(buffer, "+{:-<width$}+", "").unwrap(); } } impl Widget for Label { fn width(&self) -> usize { self.label.lines().map(|line| line.chars().count()).max().unwrap_or(0) } fn draw_into(&self, buffer: &mut dyn std::fmt::Write) { writeln!(buffer, "{}", &self.label).unwrap(); } } fn main() { let mut window = Window::new("Rust GUI Demo 1.23"); window.add_widget(Box::new(Label::new("This is a small text GUI demo."))); window.add_widget(Box::new(Button::new("Click me!"))); window.draw(); }

Speaker Notes

This slide and its sub-slides should take about 15 minutes.

Encourage students to divide the code in a way that feels natural for them, and get accustomed to the required mod, use, and pub declarations. Afterward, discuss what organizations are most idiomatic.

Đáp Án

src ├── main.rs ├── widgets │   ├── button.rs │   ├── label.rs │   └── window.rs └── widgets.rs
// ---- src/widgets.rs ---- mod button; mod label; mod window; pub trait Widget { /// Natural width of `self`. fn width(&self) -> usize; /// Draw the widget into a buffer. fn draw_into(&self, buffer: &mut dyn std::fmt::Write); /// Draw the widget on standard output. fn draw(&self) { let mut buffer = String::new(); self.draw_into(&mut buffer); println!("{buffer}"); } } pub use button::Button; pub use label::Label; pub use window::Window;
// ---- src/widgets/label.rs ---- use super::Widget; pub struct Label { label: String, } impl Label { pub fn new(label: &str) -> Label { Label { label: label.to_owned() } } } impl Widget for Label { fn width(&self) -> usize { // ANCHOR_END: Label-width self.label.lines().map(|line| line.chars().count()).max().unwrap_or(0) } // ANCHOR: Label-draw_into fn draw_into(&self, buffer: &mut dyn std::fmt::Write) { // ANCHOR_END: Label-draw_into writeln!(buffer, "{}", &self.label).unwrap(); } }
// ---- src/widgets/button.rs ---- use super::{Label, Widget}; pub struct Button { label: Label, } impl Button { pub fn new(label: &str) -> Button { Button { label: Label::new(label) } } } impl Widget for Button { fn width(&self) -> usize { // ANCHOR_END: Button-width self.label.width() + 8 // add a bit of padding } // ANCHOR: Button-draw_into fn draw_into(&self, buffer: &mut dyn std::fmt::Write) { // ANCHOR_END: Button-draw_into let width = self.width(); let mut label = String::new(); self.label.draw_into(&mut label); writeln!(buffer, "+{:-<width$}+", "").unwrap(); for line in label.lines() { writeln!(buffer, "|{:^width$}|", &line).unwrap(); } writeln!(buffer, "+{:-<width$}+", "").unwrap(); } }
// ---- src/widgets/window.rs ---- use super::Widget; pub struct Window { title: String, widgets: Vec<Box<dyn Widget>>, } impl Window { pub fn new(title: &str) -> Window { Window { title: title.to_owned(), widgets: Vec::new() } } pub fn add_widget(&mut self, widget: Box<dyn Widget>) { self.widgets.push(widget); } fn inner_width(&self) -> usize { std::cmp::max( self.title.chars().count(), self.widgets.iter().map(|w| w.width()).max().unwrap_or(0), ) } } impl Widget for Window { fn width(&self) -> usize { // ANCHOR_END: Window-width // Add 4 paddings for borders self.inner_width() + 4 } // ANCHOR: Window-draw_into fn draw_into(&self, buffer: &mut dyn std::fmt::Write) { // ANCHOR_END: Window-draw_into let mut inner = String::new(); for widget in &self.widgets { widget.draw_into(&mut inner); } let inner_width = self.inner_width(); // TODO: after learning about error handling, you can change // draw_into to return Result<(), std::fmt::Error>. Then use // the ?-operator here instead of .unwrap(). writeln!(buffer, "+-{:-<inner_width$}-+", "").unwrap(); writeln!(buffer, "| {:^inner_width$} |", &self.title).unwrap(); writeln!(buffer, "+={:=<inner_width$}=+", "").unwrap(); for line in inner.lines() { writeln!(buffer, "| {:inner_width$} |", line).unwrap(); } writeln!(buffer, "+-{:-<inner_width$}-+", "").unwrap(); } }
// ---- src/main.rs ---- mod widgets; use widgets::Widget; fn main() { let mut window = widgets::Window::new("Rust GUI Demo 1.23"); window .add_widget(Box::new(widgets::Label::new("This is a small text GUI demo."))); window.add_widget(Box::new(widgets::Button::new("Click me!"))); window.draw(); }

Testing

Phần này sẽ kéo dài khoảng 45 phút, bao gồm:

SlideThời lượng
Test Modules5 minutes
Other Types of Tests5 minutes
Compiler Lints and Clippy3 minutes
Exercise: Luhn Algorithm30 minutes

Unit Tests

Rust and Cargo come with a simple unit test framework:

  • Unit tests are supported throughout your code.

  • Integration tests are supported via the tests/ directory.

Tests are marked with #[test]. Unit tests are often put in a nested tests module, using #[cfg(test)] to conditionally compile them only when building tests.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • This lets you unit test private helpers.
  • The #[cfg(test)] attribute is only active when you run cargo test.

Speaker Notes

This slide should take about 5 minutes.

Run the tests in the playground in order to show their results.

Other Types of Tests

Integration Tests

If you want to test your library as a client, use an integration test.

Create a .rs file under tests/:

// tests/my_library.rs use my_library::init; #[test] fn test_init() { assert!(init().is_ok()); }

These tests only have access to the public API of your crate.

Documentation Tests

Rust has built-in support for documentation tests:

#![allow(unused)] fn main() { /// Shortens a string to the given length. /// /// ``` /// # use playground::shorten_string; /// assert_eq!(shorten_string("Hello World", 5), "Hello"); /// assert_eq!(shorten_string("Hello World", 20), "Hello World"); /// ``` pub fn shorten_string(s: &str, length: usize) -> &str { &s[..std::cmp::min(length, s.len())] } }
  • Code blocks in /// comments are automatically seen as Rust code.
  • The code will be compiled and executed as part of cargo test.
  • Adding # in the code will hide it from the docs, but will still compile/run it.
  • Test the above code on the Rust Playground.

Compiler Lints and Clippy

The Rust compiler produces fantastic error messages, as well as helpful built-in lints. Clippy provides even more lints, organized into groups that can be enabled per-project.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 3 minutes.

Run the code sample and examine the error message. There are also lints visible here, but those will not be shown once the code compiles. Switch to the Playground site to show those lints.

After resolving the lints, run clippy on the playground site to show clippy warnings. Clippy has extensive documentation of its lints, and adds new lints (including default-deny lints) all the time.

Note that errors or warnings with help: ... can be fixed with cargo fix or via your editor.

Exercise: Luhn Algorithm

Luhn Algorithm

The Luhn algorithm is used to validate credit card numbers. The algorithm takes a string as input and does the following to validate the credit card number:

  • Ignore all spaces. Reject number with fewer than two digits.

  • Moving from right to left, double every second digit: for the number 1234, we double 3 and 1. For the number 98765, we double 6 and 8.

  • After doubling a digit, sum the digits if the result is greater than 9. So doubling 7 becomes 14 which becomes 1 + 4 = 5.

  • Sum all the undoubled and doubled digits.

  • The credit card number is valid if the sum ends with 0.

The provided code provides a buggy implementation of the luhn algorithm, along with two basic unit tests that confirm that most the algorithm is implemented correctly.

Copy the code below to https://play.rust-lang.org/ and write additional tests to uncover bugs in the provided implementation, fixing any bugs you find.

#![allow(unused)] fn main() { pub fn luhn(cc_number: &str) -> bool { let mut sum = 0; let mut double = false; for c in cc_number.chars().rev() { if let Some(digit) = c.to_digit(10) { if double { let double_digit = digit * 2; sum += if double_digit > 9 { double_digit - 9 } else { double_digit }; } else { sum += digit; } double = !double; } else { continue; } } sum % 10 == 0 } #[cfg(test)] mod test { use super::*; #[test] fn test_valid_cc_number() { assert!(luhn("4263 9826 4026 9299")); assert!(luhn("4539 3195 0343 6467")); assert!(luhn("7992 7398 713")); } #[test] fn test_invalid_cc_number() { assert!(!luhn("4223 9826 4026 9299")); assert!(!luhn("4539 3195 0343 6476")); assert!(!luhn("8273 1232 7352 0569")); } } }

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Welcome Back

Buổi học nên kéo dài khoảng 2 tiếng và 10 phút, bao gồm thời gian 10 phút nghỉ trưa. Nội dung buổi học bao gồm:

SegmentThời lượng
Error Handling55 minutes
Unsafe Rust1 hour and 5 minutes

Error Handling

Phần này sẽ kéo dài khoảng 55 phút, bao gồm:

SlideThời lượng
Panics3 minutes
Try Operator5 minutes
Try Conversions5 minutes
Error Trait5 minutes
thiserror and anyhow5 minutes
Exercise: Rewriting with Result30 minutes

Panics

Rust handles fatal errors with a “panic”.

Rust will trigger a panic if a fatal error happens at runtime:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • Panics are for unrecoverable and unexpected errors.
    • Panics are symptoms of bugs in the program.
    • Runtime failures like failed bounds checks can panic
    • Assertions (such as assert!) panic on failure
    • Purpose-specific panics can use the panic! macro.
  • A panic will “unwind” the stack, dropping values just as if the functions had returned.
  • Use non-panicking APIs (such as Vec::get) if crashing is not acceptable.

Speaker Notes

This slide should take about 3 minutes.

By default, a panic will cause the stack to unwind. The unwinding can be caught:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • Catching is unusual; do not attempt to implement exceptions with catch_unwind!
  • This can be useful in servers which should keep running even if a single request crashes.
  • This does not work if panic = 'abort' is set in your Cargo.toml.

Try Operator

Runtime errors like connection-refused or file-not-found are handled with the Result type, but matching this type on every call can be cumbersome. The try-operator ? is used to return errors to the caller. It lets you turn the common

match some_expression { Ok(value) => value, Err(err) => return Err(err), }

into the much simpler

some_expression?

We can use this to simplify our error handling code:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.

Simplify the read_username function to use ?.

Các điểm chính:

  • The username variable can be either Ok(string) or Err(error).
  • Use the fs::write call to test out the different scenarios: no file, empty file, file with username.
  • Note that main can return a Result<(), E> as long as it implements std::process::Termination. In practice, this means that E implements Debug. The executable will print the Err variant and return a nonzero exit status on error.

Try Conversions

The effective expansion of ? is a little more complicated than previously indicated:

expression?

works the same as

match expression { Ok(value) => value, Err(err) => return Err(From::from(err)), }

The From::from call here means we attempt to convert the error type to the type returned by the function. This makes it easy to encapsulate errors into higher-level errors.

Example

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.

The ? operator must return a value compatible with the return type of the function. For Result, it means that the error types have to be compatible. A function that returns Result<T, ErrorOuter> can only use ? on a value of type Result<U, ErrorInner> if ErrorOuter and ErrorInner are the same type or if ErrorOuter implements From<ErrorInner>.

A common alternative to a From implementation is Result::map_err, especially when the conversion only happens in one place.

There is no compatibility requirement for Option. A function returning Option<T> can use the ? operator on Option<U> for arbitrary T and U types.

A function that returns Result cannot use ? on Option and vice versa. However, Option::ok_or converts Option to Result whereas Result::ok turns Result into Option.

Dynamic Error Types

Sometimes we want to allow any type of error to be returned without writing our own enum covering all the different possibilities. The std::error::Error trait makes it easy to create a trait object that can contain any error.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.

The read_count function can return std::io::Error (from file operations) or std::num::ParseIntError (from String::parse).

Boxing errors saves on code, but gives up the ability to cleanly handle different error cases differently in the program. As such it’s generally not a good idea to use Box<dyn Error> in the public API of a library, but it can be a good option in a program where you just want to display the error message somewhere.

Make sure to implement the std::error::Error trait when defining a custom error type so it can be boxed. But if you need to support the no_std attribute, keep in mind that the std::error::Error trait is currently compatible with no_std in nightly only.

thiserror and anyhow

The thiserror and anyhow crates are widely used to simplify error handling.

  • thiserror is often used in libraries to create custom error types that implement From<T>.
  • anyhow is often used by applications to help with error handling in functions, including adding contextual information to your errors.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.

thiserror

  • The Error derive macro is provided by thiserror, and has lots of useful attributes to help define error types in a compact way.
  • The std::error::Error trait is derived automatically.
  • The message from #[error] is used to derive the Display trait.

anyhow

  • anyhow::Error is essentially a wrapper around Box<dyn Error>. As such it’s again generally not a good choice for the public API of a library, but is widely used in applications.
  • anyhow::Result<V> is a type alias for Result<V, anyhow::Error>.
  • Actual error type inside of it can be extracted for examination if necessary.
  • Functionality provided by anyhow::Result<T> may be familiar to Go developers, as it provides similar usage patterns and ergonomics to (T, error) from Go.
  • anyhow::Context is a trait implemented for the standard Result and Option types. use anyhow::Context is necessary to enable .context() and .with_context() on those types.

Exercise: Rewriting with Result

The following implements a very simple parser for an expression language. However, it handles errors by panicking. Rewrite it to instead use idiomatic error handling and propagate errors to a return from main. Feel free to use thiserror and anyhow.

HINT: start by fixing error handling in the parse function. Once that is working correctly, update Tokenizer to implement Iterator<Item=Result<Token, TokenizerError>> and handle that in the parser.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Unsafe Rust

Phần này sẽ kéo dài khoảng 1 giờ và 5 phút, bao gồm:

SlideThời lượng
Unsafe5 minutes
Dereferencing Raw Pointers10 minutes
Mutable Static Variables5 minutes
Unions5 minutes
Unsafe Functions5 minutes
Unsafe Traits5 minutes
Exercise: FFI Wrapper30 minutes

Unsafe Rust

The Rust language has two parts:

  • Safe Rust: memory safe, no undefined behavior possible.
  • Unsafe Rust: can trigger undefined behavior if preconditions are violated.

We saw mostly safe Rust in this course, but it’s important to know what Unsafe Rust is.

Unsafe code is usually small and isolated, and its correctness should be carefully documented. It is usually wrapped in a safe abstraction layer.

Unsafe Rust gives you access to five new capabilities:

  • Dereference raw pointers.
  • Access or modify mutable static variables.
  • Access union fields.
  • Call unsafe functions, including extern functions.
  • Implement unsafe traits.

We will briefly cover unsafe capabilities next. For full details, please see Chapter 19.1 in the Rust Book and the Rustonomicon.

Speaker Notes

This slide should take about 5 minutes.

Unsafe Rust does not mean the code is incorrect. It means that developers have turned off some compiler safety features and have to write correct code by themselves. It means the compiler no longer enforces Rust’s memory-safety rules.

Dereferencing Raw Pointers

Creating pointers is safe, but dereferencing them requires unsafe:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.

It is good practice (and required by the Android Rust style guide) to write a comment for each unsafe block explaining how the code inside it satisfies the safety requirements of the unsafe operations it is doing.

In the case of pointer dereferences, this means that the pointers must be valid, i.e.:

  • The pointer must be non-null.
  • The pointer must be dereferenceable (within the bounds of a single allocated object).
  • The object must not have been deallocated.
  • There must not be concurrent accesses to the same location.
  • If the pointer was obtained by casting a reference, the underlying object must be live and no reference may be used to access the memory.

In most cases the pointer must also be properly aligned.

The “NOT SAFE” section gives an example of a common kind of UB bug: *r1 has the 'static lifetime, so r3 has type &'static String, and thus outlives s. Creating a reference from a pointer requires great care.

Mutable Static Variables

It is safe to read an immutable static variable:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

However, since data races can occur, it is unsafe to read and write mutable static variables:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • The program here is safe because it is single-threaded. However, the Rust compiler is conservative and will assume the worst. Try removing the unsafe and see how the compiler explains that it is undefined behavior to mutate a static from multiple threads.

  • Using a mutable static is generally a bad idea, but there are some cases where it might make sense in low-level no_std code, such as implementing a heap allocator or working with some C APIs.

Unions

Unions are like enums, but you need to track the active field yourself:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.

Unions are very rarely needed in Rust as you can usually use an enum. They are occasionally needed for interacting with C library APIs.

If you just want to reinterpret bytes as a different type, you probably want std::mem::transmute or a safe wrapper such as the zerocopy crate.

Unsafe Functions

Calling Unsafe Functions

A function or method can be marked unsafe if it has extra preconditions you must uphold to avoid undefined behaviour:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Writing Unsafe Functions

You can mark your own functions as unsafe if they require particular conditions to avoid undefined behaviour.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.

Calling Unsafe Functions

get_unchecked, like most _unchecked functions, is unsafe, because it can create UB if the range is incorrect. abs is incorrect for a different reason: it is an external function (FFI). Calling external functions is usually only a problem when those functions do things with pointers which might violate Rust’s memory model, but in general any C function might have undefined behaviour under any arbitrary circumstances.

The "C" in this example is the ABI; other ABIs are available too.

Writing Unsafe Functions

We wouldn’t actually use pointers for a swap function - it can be done safely with references.

Note that unsafe code is allowed within an unsafe function without an unsafe block. We can prohibit this with #[deny(unsafe_op_in_unsafe_fn)]. Try adding it and see what happens. This will likely change in a future Rust edition.

Implementing Unsafe Traits

Like with functions, you can mark a trait as unsafe if the implementation must guarantee particular conditions to avoid undefined behaviour.

For example, the zerocopy crate has an unsafe trait that looks something like this:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.

There should be a # Safety section on the Rustdoc for the trait explaining the requirements for the trait to be safely implemented.

The actual safety section for AsBytes is rather longer and more complicated.

The built-in Send and Sync traits are unsafe.

Safe FFI Wrapper

Rust has great support for calling functions through a foreign function interface (FFI). We will use this to build a safe wrapper for the libc functions you would use from C to read the names of files in a directory.

You will want to consult the manual pages:

You will also want to browse the std::ffi module. There you find a number of string types which you need for the exercise:

Kiểu Dữ LiệuEncodingUse
str and StringUTF-8Text processing in Rust
CStr and CStringNUL-terminatedCommunicating with C functions
OsStr and OsStringOS-specificCommunicating with the OS

You will convert between all these types:

  • &str to CString: you need to allocate space for a trailing \0 character,
  • CString to *const i8: you need a pointer to call C functions,
  • *const i8 to &CStr: you need something which can find the trailing \0 character,
  • &CStr to &[u8]: a slice of bytes is the universal interface for “some unknown data”,
  • &[u8] to &OsStr: &OsStr is a step towards OsString, use OsStrExt to create it,
  • &OsStr to OsString: you need to clone the data in &OsStr to be able to return it and call readdir again.

The Nomicon also has a very useful chapter about FFI.

Copy the code below to https://play.rust-lang.org/ and fill in the missing functions and methods:

// TODO: remove this when you're done with your implementation. #![allow(unused_imports, unused_variables, dead_code)] mod ffi { use std::os::raw::{c_char, c_int}; #[cfg(not(target_os = "macos"))] use std::os::raw::{c_long, c_uchar, c_ulong, c_ushort}; // Opaque type. See https://doc.rust-lang.org/nomicon/ffi.html. #[repr(C)] pub struct DIR { _data: [u8; 0], _marker: core::marker::PhantomData<(*mut u8, core::marker::PhantomPinned)>, } // Layout according to the Linux man page for readdir(3), where ino_t and // off_t are resolved according to the definitions in // /usr/include/x86_64-linux-gnu/{sys/types.h, bits/typesizes.h}. #[cfg(not(target_os = "macos"))] #[repr(C)] pub struct dirent { pub d_ino: c_ulong, pub d_off: c_long, pub d_reclen: c_ushort, pub d_type: c_uchar, pub d_name: [c_char; 256], } // Layout according to the macOS man page for dir(5). #[cfg(all(target_os = "macos"))] #[repr(C)] pub struct dirent { pub d_fileno: u64, pub d_seekoff: u64, pub d_reclen: u16, pub d_namlen: u16, pub d_type: u8, pub d_name: [c_char; 1024], } extern "C" { pub fn opendir(s: *const c_char) -> *mut DIR; #[cfg(not(all(target_os = "macos", target_arch = "x86_64")))] pub fn readdir(s: *mut DIR) -> *const dirent; // See https://github.com/rust-lang/libc/issues/414 and the section on // _DARWIN_FEATURE_64_BIT_INODE in the macOS man page for stat(2). // // "Platforms that existed before these updates were available" refers // to macOS (as opposed to iOS / wearOS / etc.) on Intel and PowerPC. #[cfg(all(target_os = "macos", target_arch = "x86_64"))] #[link_name = "readdir$INODE64"] pub fn readdir(s: *mut DIR) -> *const dirent; pub fn closedir(s: *mut DIR) -> c_int; } } use std::ffi::{CStr, CString, OsStr, OsString}; use std::os::unix::ffi::OsStrExt; #[derive(Debug)] struct DirectoryIterator { path: CString, dir: *mut ffi::DIR, } impl DirectoryIterator { fn new(path: &str) -> Result<DirectoryIterator, String> { // Call opendir and return a Ok value if that worked, // otherwise return Err with a message. unimplemented!() } } impl Iterator for DirectoryIterator { type Item = OsString; fn next(&mut self) -> Option<OsString> { // Keep calling readdir until we get a NULL pointer back. unimplemented!() } } impl Drop for DirectoryIterator { fn drop(&mut self) { // Call closedir as needed. unimplemented!() } } fn main() -> Result<(), String> { let iter = DirectoryIterator::new(".")?; println!("files: {:#?}", iter.collect::<Vec<_>>()); Ok(()) }

Đáp Án

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Welcome to Rust in Android

Rust is supported for system software on Android. This means that you can write new services, libraries, drivers or even firmware in Rust (or improve existing code as needed).

We will attempt to call Rust from one of your own projects today. So try to find a little corner of your code base where we can move some lines of code to Rust. The fewer dependencies and “exotic” types the better. Something that parses some raw bytes would be ideal.

Speaker Notes

The speaker may mention any of the following given the increased use of Rust in Android:

Setup

We will be using a Cuttlefish Android Virtual Device to test our code. Make sure you have access to one or create a new one with:

source build/envsetup.sh lunch aosp_cf_x86_64_phone-trunk_staging-userdebug acloud create

Please see the Android Developer Codelab for details.

Speaker Notes

Các điểm chính:

  • Cuttlefish is a reference Android device designed to work on generic Linux desktops. MacOS support is also planned.

  • The Cuttlefish system image maintains high fidelity to real devices, and is the ideal emulator to run many Rust use cases.

Build Rules

The Android build system (Soong) supports Rust via a number of modules:

Module TypeDescription
rust_binaryProduces a Rust binary.
rust_libraryProduces a Rust library, and provides both rlib and dylib variants.
rust_ffiProduces a Rust C library usable by cc modules, and provides both static and shared variants.
rust_proc_macroProduces a proc-macro Rust library. These are analogous to compiler plugins.
rust_testProduces a Rust test binary that uses the standard Rust test harness.
rust_fuzzProduces a Rust fuzz binary leveraging libfuzzer.
rust_protobufGenerates source and produces a Rust library that provides an interface for a particular protobuf.
rust_bindgenGenerates source and produces a Rust library containing Rust bindings to C libraries.

We will look at rust_binary and rust_library next.

Speaker Notes

Additional items speaker may mention:

  • Cargo is not optimized for multi-language repos, and also downloads packages from the internet.

  • For compliance and performance, Android must have crates in-tree. It must also interop with C/C++/Java code. Soong fills that gap.

  • Soong has many similarities to Bazel, which is the open-source variant of Blaze (used in google3).

  • There is a plan to transition Android, ChromeOS, and Fuchsia to Bazel.

  • Learning Bazel-like build rules is useful for all Rust OS developers.

  • Fun fact: Data from Star Trek is a Soong-type Android.

Rust Binaries

Let us start with a simple application. At the root of an AOSP checkout, create the following files:

hello_rust/Android.bp:

rust_binary { name: "hello_rust", crate_name: "hello_rust", srcs: ["src/main.rs"], }

hello_rust/src/main.rs:

//! Rust demo. /// Prints a greeting to standard output. fn main() { println!("Hello from Rust!"); }

You can now build, push, and run the binary:

m hello_rust adb push "$ANDROID_PRODUCT_OUT/system/bin/hello_rust" /data/local/tmp adb shell /data/local/tmp/hello_rust
Hello from Rust!

Rust Libraries

You use rust_library to create a new Rust library for Android.

Here we declare a dependency on two libraries:

  • libgreeting, which we define below,
  • libtextwrap, which is a crate already vendored in external/rust/crates/.

hello_rust/Android.bp:

rust_binary { name: "hello_rust_with_dep", crate_name: "hello_rust_with_dep", srcs: ["src/main.rs"], rustlibs: [ "libgreetings", "libtextwrap", ], prefer_rlib: true, // Need this to avoid dynamic link error. } rust_library { name: "libgreetings", crate_name: "greetings", srcs: ["src/lib.rs"], }

hello_rust/src/main.rs:

//! Rust demo. use greetings::greeting; use textwrap::fill; /// Prints a greeting to standard output. fn main() { println!("{}", fill(&greeting("Bob"), 24)); }

hello_rust/src/lib.rs:

//! Greeting library. /// Greet `name`. pub fn greeting(name: &str) -> String { format!("Hello {name}, it is very nice to meet you!") }

You build, push, and run the binary like before:

m hello_rust_with_dep adb push "$ANDROID_PRODUCT_OUT/system/bin/hello_rust_with_dep" /data/local/tmp adb shell /data/local/tmp/hello_rust_with_dep
Hello Bob, it is very nice to meet you!

AIDL

The Android Interface Definition Language (AIDL) is supported in Rust:

  • Rust code can call existing AIDL servers,
  • You can create new AIDL servers in Rust.

Birthday Service Tutorial

To illustrate how to use Rust with Binder, we’re going to walk through the process of creating a Binder interface. We’re then going to both implement the described service and write client code that talks to that service.

AIDL Interfaces

You declare the API of your service using an AIDL interface:

birthday_service/aidl/com/example/birthdayservice/IBirthdayService.aidl:

/** Birthday service interface. */ interface IBirthdayService { /** Generate a Happy Birthday message. */ String wishHappyBirthday(String name, int years); }

birthday_service/aidl/Android.bp:

aidl_interface { name: "com.example.birthdayservice", srcs: ["com/example/birthdayservice/*.aidl"], unstable: true, backend: { rust: { // Rust is not enabled by default enabled: true, }, }, }

Speaker Notes

  • Note that the directory structure under the aidl/ directory needs to match the package name used in the AIDL file, i.e. the package is com.example.birthdayservice and the file is at aidl/com/example/IBirthdayService.aidl.

Generated Service API

Binder generates a trait corresponding to the interface definition. trait to talk to the service.

birthday_service/aidl/com/example/birthdayservice/IBirthdayService.aidl:

/** Birthday service interface. */ interface IBirthdayService { /** Generate a Happy Birthday message. */ String wishHappyBirthday(String name, int years); }

Generated trait:

trait IBirthdayService { fn wishHappyBirthday(&self, name: &str, years: i32) -> binder::Result<String>; }

Your service will need to implement this trait, and your client will use this trait to talk to the service.

Speaker Notes

  • The generated bindings can be found at out/soong/.intermediates/<path to module>/.
  • Point out how the generated function signature, specifically the argument and return types, correspond the interface definition.
    • String for an argument results in a different Rust type than String as a return type.

Service Implementation

We can now implement the AIDL service:

birthday_service/src/lib.rs:

use com_example_birthdayservice::aidl::com::example::birthdayservice::IBirthdayService::IBirthdayService; use com_example_birthdayservice::binder; /// The `IBirthdayService` implementation. pub struct BirthdayService; impl binder::Interface for BirthdayService {} impl IBirthdayService for BirthdayService { fn wishHappyBirthday(&self, name: &str, years: i32) -> binder::Result<String> { Ok(format!("Happy Birthday {name}, congratulations with the {years} years!")) } }

birthday_service/Android.bp:

rust_library { name: "libbirthdayservice", srcs: ["src/lib.rs"], crate_name: "birthdayservice", rustlibs: [ "com.example.birthdayservice-rust", "libbinder_rs", ], }

Speaker Notes

  • Point out the path to the generated IBirthdayService trait, and explain why each of the segments is necessary.
  • TODO: What does the binder::Interface trait do? Are there methods to override? Where source?

AIDL Server

Finally, we can create a server which exposes the service:

birthday_service/src/server.rs:

//! Birthday service. use birthdayservice::BirthdayService; use com_example_birthdayservice::aidl::com::example::birthdayservice::IBirthdayService::BnBirthdayService; use com_example_birthdayservice::binder; const SERVICE_IDENTIFIER: &str = "birthdayservice"; /// Entry point for birthday service. fn main() { let birthday_service = BirthdayService; let birthday_service_binder = BnBirthdayService::new_binder( birthday_service, binder::BinderFeatures::default(), ); binder::add_service(SERVICE_IDENTIFIER, birthday_service_binder.as_binder()) .expect("Failed to register service"); binder::ProcessState::join_thread_pool() }

birthday_service/Android.bp:

rust_binary { name: "birthday_server", crate_name: "birthday_server", srcs: ["src/server.rs"], rustlibs: [ "com.example.birthdayservice-rust", "libbinder_rs", "libbirthdayservice", ], prefer_rlib: true, // To avoid dynamic link error. }

Speaker Notes

The process for taking a user-defined service implementation (in this case the BirthdayService type, which implements the IBirthdayService) and starting it as a Binder service has multiple steps, and may appear more complicated than students are used to if they’ve used Binder from C++ or another language. Explain to students why each step is necessary.

  1. Create an instance of your service type (BirthdayService).
  2. Wrap the service object in corresponding Bn* type (BnBirthdayService in this case). This type is generated by Binder and provides the common Binder functionality that would be provided by the BnBinder base class in C++. We don’t have inheritance in Rust, so instead we use composition, putting our BirthdayService within the generated BnBinderService.
  3. Call add_service, giving it a service identifier and your service object (the BnBirthdayService object in the example).
  4. Call join_thread_pool to add the current thread to Binder’s thread pool and start listening for connections.

Deploy

We can now build, push, and start the service:

m birthday_server adb push "$ANDROID_PRODUCT_OUT/system/bin/birthday_server" /data/local/tmp adb root adb shell /data/local/tmp/birthday_server

In another terminal, check that the service runs:

adb shell service check birthdayservice
Service birthdayservice: found

You can also call the service with service call:

adb shell service call birthdayservice 1 s16 Bob i32 24
Result: Parcel( 0x00000000: 00000000 00000036 00610048 00700070 '....6...H.a.p.p.' 0x00000010: 00200079 00690042 00740072 00640068 'y. .B.i.r.t.h.d.' 0x00000020: 00790061 00420020 0062006f 0020002c 'a.y. .B.o.b.,. .' 0x00000030: 006f0063 0067006e 00610072 00750074 'c.o.n.g.r.a.t.u.' 0x00000040: 0061006c 00690074 006e006f 00200073 'l.a.t.i.o.n.s. .' 0x00000050: 00690077 00680074 00740020 00650068 'w.i.t.h. .t.h.e.' 0x00000060: 00320020 00200034 00650079 00720061 ' .2.4. .y.e.a.r.' 0x00000070: 00210073 00000000 's.!..... ')

AIDL Client

Finally, we can create a Rust client for our new service.

birthday_service/src/client.rs:

use com_example_birthdayservice::aidl::com::example::birthdayservice::IBirthdayService::IBirthdayService; use com_example_birthdayservice::binder; const SERVICE_IDENTIFIER: &str = "birthdayservice"; /// Call the birthday service. fn main() -> Result<(), Box<dyn Error>> { let name = std::env::args().nth(1).unwrap_or_else(|| String::from("Bob")); let years = std::env::args() .nth(2) .and_then(|arg| arg.parse::<i32>().ok()) .unwrap_or(42); binder::ProcessState::start_thread_pool(); let service = binder::get_interface::<dyn IBirthdayService>(SERVICE_IDENTIFIER) .map_err(|_| "Failed to connect to BirthdayService")?; // Call the service. let msg = service.wishHappyBirthday(&name, years)?; println!("{msg}"); }

birthday_service/Android.bp:

rust_binary { name: "birthday_client", crate_name: "birthday_client", srcs: ["src/client.rs"], rustlibs: [ "com.example.birthdayservice-rust", "libbinder_rs", ], prefer_rlib: true, // To avoid dynamic link error. }

Notice that the client does not depend on libbirthdayservice.

Build, push, and run the client on your device:

m birthday_client adb push "$ANDROID_PRODUCT_OUT/system/bin/birthday_client" /data/local/tmp adb shell /data/local/tmp/birthday_client Charlie 60
Happy Birthday Charlie, congratulations with the 60 years!

Speaker Notes

  • Strong<dyn IBirthdayService> is the trait object representing the service that the client has connected to.
    • Strong is a custom smart pointer type for Binder. It handles both an in-process ref count for the service trait object, and the global Binder ref count that tracks how many processes have a reference to the object.
    • Note that the trait object that the client uses to talk to the service uses the exact same trait that the server implements. For a given Binder interface, there is a single Rust trait generated that both client and server use.
  • Use the same service identifier used when registering the service. This should ideally be defined in a common crate that both the client and server can depend on.

Changing API

Let us extend the API with more functionality: we want to let clients specify a list of lines for the birthday card:

package com.example.birthdayservice; /** Birthday service interface. */ interface IBirthdayService { /** Generate a Happy Birthday message. */ String wishHappyBirthday(String name, int years, in String[] text); }

This results in an updated trait definition for IBirthdayService:

trait IBirthdayService { fn wishHappyBirthday( &self, name: &str, years: i32, text: &[String], ) -> binder::Result<String>; }

Speaker Notes

  • Note how the String[] in the AIDL definition is translated as a &[String] in Rust, i.e. that idiomatic Rust types are used in the generated bindings wherever possible:
    • in array arguments are translated to slices.
    • out and inout args are translated to &mut Vec<T>.
    • Return values are translated to returning a Vec<T>.

Updating Client and Service

Update the client and server code to account for the new API.

birthday_service/src/lib.rs:

impl IBirthdayService for BirthdayService { fn wishHappyBirthday( &self, name: &str, years: i32, text: &[String], ) -> binder::Result<String> { let mut msg = format!( "Happy Birthday {name}, congratulations with the {years} years!", ); for line in text { msg.push('\n'); msg.push_str(line); } Ok(msg) } }

birthday_service/src/client.rs:

let msg = service.wishHappyBirthday( &name, years, &[ String::from("Habby birfday to yuuuuu"), String::from("And also: many more"), ], )?;

Speaker Notes

  • TODO: Move code snippets into project files where they’ll actually be built?

Working With AIDL Types

AIDL types translate into the appropriate idiomatic Rust type:

  • Primitive types map (mostly) to idiomatic Rust types.
  • Collection types like slices, Vecs and string types are supported.
  • References to AIDL objects and file handles can be sent between clients and services.
  • File handles and parcelables are fully supported.

Primitive Types

Primitive types map (mostly) idiomatically:

AIDL TypeRust TypeNote
booleanbool
bytei8Note that bytes are signed.
charu16Note the usage of u16, NOT u32.
inti32
longi64
floatf32
doublef64
StringString

Array Types

The array types (T[], byte[], and List<T>) get translated to the appropriate Rust array type depending on how they are used in the function signature:

PositionRust Type
in argument&[T]
out/inout argument&mut Vec<T>
ReturnVec<T>

Speaker Notes

  • In Android 13 or higher, fixed-size arrays are supported, i.e. T[N] becomes [T; N]. Fixed-size arrays can have multiple dimensions (e.g. int[3][4]). In the Java backend, fixed-size arrays are represented as array types.
  • Arrays in parcelable fields always get translated to Vec<T>.

Sending Objects

AIDL objects can be sent either as a concrete AIDL type or as the type-erased IBinder interface:

birthday_service/aidl/com/example/birthdayservice/IBirthdayInfoProvider.aidl:

package com.example.birthdayservice; interface IBirthdayInfoProvider { String name(); int years(); }

birthday_service/aidl/com/example/birthdayservice/IBirthdayService.aidl:

import com.example.birthdayservice.IBirthdayInfoProvider; interface IBirthdayService { /** The same thing, but using a binder object. */ String wishWithProvider(IBirthdayInfoProvider provider); /** The same thing, but using `IBinder`. */ String wishWithErasedProvider(IBinder provider); }

birthday_service/src/client.rs:

/// Rust struct implementing the `IBirthdayInfoProvider` interface. struct InfoProvider { name: String, age: u8, } impl binder::Interface for InfoProvider {} impl IBirthdayInfoProvider for InfoProvider { fn name(&self) -> binder::Result<String> { Ok(self.name.clone()) } fn years(&self) -> binder::Result<i32> { Ok(self.age as i32) } } fn main() { binder::ProcessState::start_thread_pool(); let service = connect().expect("Failed to connect to BirthdayService"); // Create a binder object for the `IBirthdayInfoProvider` interface. let provider = BnBirthdayInfoProvider::new_binder( InfoProvider { name: name.clone(), age: years as u8 }, BinderFeatures::default(), ); // Send the binder object to the service. service.wishWithProvider(&provider)?; // Perform the same operation but passing the provider as an `SpIBinder`. service.wishWithErasedProvider(&provider.as_binder())?; }

Speaker Notes

  • Note the usage of BnBirthdayInfoProvider. This serves the same purpose as BnBirthdayService that we saw previously.

Parcelables

Binder for Rust supports sending parcelables directly:

birthday_service/aidl/com/example/birthdayservice/BirthdayInfo.aidl:

package com.example.birthdayservice; parcelable BirthdayInfo { String name; int years; }

birthday_service/aidl/com/example/birthdayservice/IBirthdayService.aidl:

import com.example.birthdayservice.BirthdayInfo; interface IBirthdayService { /** The same thing, but with a parcelable. */ String wishWithInfo(in BirthdayInfo info); }

birthday_service/src/client.rs:

fn main() { binder::ProcessState::start_thread_pool(); let service = connect().expect("Failed to connect to BirthdayService"); service.wishWithInfo(&BirthdayInfo { name: name.clone(), years })?; }

Sending Files

Files can be sent between Binder clients/servers using the ParcelFileDescriptor type:

birthday_service/aidl/com/example/birthdayservice/IBirthdayService.aidl:

interface IBirthdayService { /** The same thing, but loads info from a file. */ String wishFromFile(in ParcelFileDescriptor infoFile); }

birthday_service/src/client.rs:

fn main() { binder::ProcessState::start_thread_pool(); let service = connect().expect("Failed to connect to BirthdayService"); // Open a file and put the birthday info in it. let mut file = File::create("/data/local/tmp/birthday.info").unwrap(); writeln!(file, "{name}")?; writeln!(file, "{years}")?; // Create a `ParcelFileDescriptor` from the file and send it. let file = ParcelFileDescriptor::new(file); service.wishFromFile(&file)?; }

birthday_service/src/lib.rs:

impl IBirthdayService for BirthdayService { fn wishFromFile( &self, info_file: &ParcelFileDescriptor, ) -> binder::Result<String> { // Convert the file descriptor to a `File`. `ParcelFileDescriptor` wraps // an `OwnedFd`, which can be cloned and then used to create a `File` // object. let mut info_file = info_file .as_ref() .try_clone() .map(File::from) .expect("Invalid file handle"); let mut contents = String::new(); info_file.read_to_string(&mut contents).unwrap(); let mut lines = contents.lines(); let name = lines.next().unwrap(); let years: i32 = lines.next().unwrap().parse().unwrap(); Ok(format!("Happy Birthday {name}, congratulations with the {years} years!")) } }

Speaker Notes

  • ParcelFileDescriptor wraps an OwnedFd, and so can be created from a File (or any other type that wraps an OwnedFd), and can be used to create a new File handle on the other side.
  • Other types of file descriptors can be wrapped and sent, e.g. TCP, UDP, and UNIX sockets.

Testing in Android

Building on Testing, we will now look at how unit tests work in AOSP. Use the rust_test module for your unit tests:

testing/Android.bp:

rust_library { name: "libleftpad", crate_name: "leftpad", srcs: ["src/lib.rs"], } rust_test { name: "libleftpad_test", crate_name: "leftpad_test", srcs: ["src/lib.rs"], host_supported: true, test_suites: ["general-tests"], }

testing/src/lib.rs:

#![allow(unused)] fn main() { //! Left-padding library. /// Left-pad `s` to `width`. pub fn leftpad(s: &str, width: usize) -> String { format!("{s:>width$}") } #[cfg(test)] mod tests { use super::*; #[test] fn short_string() { assert_eq!(leftpad("foo", 5), " foo"); } #[test] fn long_string() { assert_eq!(leftpad("foobar", 6), "foobar"); } } }

You can now run the test with

atest --host libleftpad_test

The output looks like this:

INFO: Elapsed time: 2.666s, Critical Path: 2.40s INFO: 3 processes: 2 internal, 1 linux-sandbox. INFO: Build completed successfully, 3 total actions //comprehensive-rust-android/testing:libleftpad_test_host PASSED in 2.3s PASSED libleftpad_test.tests::long_string (0.0s) PASSED libleftpad_test.tests::short_string (0.0s) Test cases: finished with 2 passing and 0 failing out of 2 test cases

Notice how you only mention the root of the library crate. Tests are found recursively in nested modules.

GoogleTest

The GoogleTest crate allows for flexible test assertions using matchers:

use googletest::prelude::*; #[googletest::test] fn test_elements_are() { let value = vec!["foo", "bar", "baz"]; expect_that!(value, elements_are!(eq("foo"), lt("xyz"), starts_with("b"))); }

If we change the last element to "!", the test fails with a structured error message pin-pointing the error:

---- test_elements_are stdout ---- Value of: value Expected: has elements: 0. is equal to "foo" 1. is less than "xyz" 2. starts with prefix "!" Actual: ["foo", "bar", "baz"], where element #2 is "baz", which does not start with "!" at src/testing/googletest.rs:6:5 Error: See failure output above

Speaker Notes

This slide should take about 5 minutes.
  • GoogleTest is not part of the Rust Playground, so you need to run this example in a local environment. Use cargo add googletest to quickly add it to an existing Cargo project.

  • The use googletest::prelude::*; line imports a number of commonly used macros and types.

  • This just scratches the surface, there are many builtin matchers. Consider going through the first chapter of “Advanced testing for Rust applications”, a self-guided Rust course: it provides a guided introduction to the library, with exercises to help you get comfortable with googletest macros, its matchers and its overall philosophy.

  • A particularly nice feature is that mismatches in multi-line strings are shown as a diff:

#[test] fn test_multiline_string_diff() { let haiku = "Memory safety found,\n\ Rust's strong typing guides the way,\n\ Secure code you'll write."; assert_that!( haiku, eq("Memory safety found,\n\ Rust's silly humor guides the way,\n\ Secure code you'll write.") ); }

shows a color-coded diff (colors not shown here):

Value of: haiku Expected: is equal to "Memory safety found,\nRust's silly humor guides the way,\nSecure code you'll write." Actual: "Memory safety found,\nRust's strong typing guides the way,\nSecure code you'll write.", which isn't equal to "Memory safety found,\nRust's silly humor guides the way,\nSecure code you'll write." Difference(-actual / +expected): Memory safety found, -Rust's strong typing guides the way, +Rust's silly humor guides the way, Secure code you'll write. at src/testing/googletest.rs:17:5

Mocking

For mocking, Mockall is a widely used library. You need to refactor your code to use traits, which you can then quickly mock:

use std::time::Duration; #[mockall::automock] pub trait Pet { fn is_hungry(&self, since_last_meal: Duration) -> bool; } #[test] fn test_robot_dog() { let mut mock_dog = MockPet::new(); mock_dog.expect_is_hungry().return_const(true); assert_eq!(mock_dog.is_hungry(Duration::from_secs(10)), true); }

Speaker Notes

This slide should take about 5 minutes.
  • Mockall is the recommended mocking library in Android (AOSP). There are other mocking libraries available on crates.io, in particular in the area of mocking HTTP services. The other mocking libraries work in a similar fashion as Mockall, meaning that they make it easy to get a mock implementation of a given trait.

  • Note that mocking is somewhat controversial: mocks allow you to completely isolate a test from its dependencies. The immediate result is faster and more stable test execution. On the other hand, the mocks can be configured wrongly and return output different from what the real dependencies would do.

    If at all possible, it is recommended that you use the real dependencies. As an example, many databases allow you to configure an in-memory backend. This means that you get the correct behavior in your tests, plus they are fast and will automatically clean up after themselves.

    Similarly, many web frameworks allow you to start an in-process server which binds to a random port on localhost. Always prefer this over mocking away the framework since it helps you test your code in the real environment.

  • Mockall is not part of the Rust Playground, so you need to run this example in a local environment. Use cargo add mockall to quickly add Mockall to an existing Cargo project.

  • Mockall has a lot more functionality. In particular, you can set up expectations which depend on the arguments passed. Here we use this to mock a cat which becomes hungry 3 hours after the last time it was fed:

#[test] fn test_robot_cat() { let mut mock_cat = MockPet::new(); mock_cat .expect_is_hungry() .with(mockall::predicate::gt(Duration::from_secs(3 * 3600))) .return_const(true); mock_cat.expect_is_hungry().return_const(false); assert_eq!(mock_cat.is_hungry(Duration::from_secs(1 * 3600)), false); assert_eq!(mock_cat.is_hungry(Duration::from_secs(5 * 3600)), true); }
  • You can use .times(n) to limit the number of times a mock method can be called to n — the mock will automatically panic when dropped if this isn’t satisfied.

Logging

You should use the log crate to automatically log to logcat (on-device) or stdout (on-host):

hello_rust_logs/Android.bp:

rust_binary { name: "hello_rust_logs", crate_name: "hello_rust_logs", srcs: ["src/main.rs"], rustlibs: [ "liblog_rust", "liblogger", ], host_supported: true, }

hello_rust_logs/src/main.rs:

//! Rust logging demo. use log::{debug, error, info}; /// Logs a greeting. fn main() { logger::init( logger::Config::default() .with_tag_on_device("rust") .with_min_level(log::Level::Trace), ); debug!("Starting program."); info!("Things are going fine."); error!("Something went wrong!"); }

Build, push, and run the binary on your device:

m hello_rust_logs adb push "$ANDROID_PRODUCT_OUT/system/bin/hello_rust_logs" /data/local/tmp adb shell /data/local/tmp/hello_rust_logs

The logs show up in adb logcat:

adb logcat -s rust
09-08 08:38:32.454 2420 2420 D rust: hello_rust_logs: Starting program. 09-08 08:38:32.454 2420 2420 I rust: hello_rust_logs: Things are going fine. 09-08 08:38:32.454 2420 2420 E rust: hello_rust_logs: Something went wrong!

Interoperability

Rust has excellent support for interoperability with other languages. This means that you can:

  • Call Rust functions from other languages.
  • Call functions written in other languages from Rust.

When you call functions in a foreign language we say that you’re using a foreign function interface, also known as FFI.

Interoperability with C

Rust has full support for linking object files with a C calling convention. Similarly, you can export Rust functions and call them from C.

You can do it by hand if you want:

extern "C" { fn abs(x: i32) -> i32; } fn main() { let x = -42; // SAFETY: `abs` doesn't have any safety requirements. let abs_x = unsafe { abs(x) }; println!("{x}, {abs_x}"); }

We already saw this in the Safe FFI Wrapper exercise.

This assumes full knowledge of the target platform. Not recommended for production.

We will look at better options next.

Using Bindgen

The bindgen tool can auto-generate bindings from a C header file.

First create a small C library:

interoperability/bindgen/libbirthday.h:

typedef struct card { const char* name; int years; } card; void print_card(const card* card);

interoperability/bindgen/libbirthday.c:

#include <stdio.h> #include "libbirthday.h" void print_card(const card* card) { printf("+--------------\n"); printf("| Happy Birthday %s!\n", card->name); printf("| Congratulations with the %i years!\n", card->years); printf("+--------------\n"); }

Add this to your Android.bp file:

interoperability/bindgen/Android.bp:

cc_library { name: "libbirthday", srcs: ["libbirthday.c"], }

Create a wrapper header file for the library (not strictly needed in this example):

interoperability/bindgen/libbirthday_wrapper.h:

#include "libbirthday.h"

You can now auto-generate the bindings:

interoperability/bindgen/Android.bp:

rust_bindgen { name: "libbirthday_bindgen", crate_name: "birthday_bindgen", wrapper_src: "libbirthday_wrapper.h", source_stem: "bindings", static_libs: ["libbirthday"], }

Finally, we can use the bindings in our Rust program:

interoperability/bindgen/Android.bp:

rust_binary { name: "print_birthday_card", srcs: ["main.rs"], rustlibs: ["libbirthday_bindgen"], }

interoperability/bindgen/main.rs:

//! Bindgen demo. use birthday_bindgen::{card, print_card}; fn main() { let name = std::ffi::CString::new("Peter").unwrap(); let card = card { name: name.as_ptr(), years: 42 }; // SAFETY: The pointer we pass is valid because it came from a Rust // reference, and the `name` it contains refers to `name` above which also // remains valid. `print_card` doesn't store either pointer to use later // after it returns. unsafe { print_card(&card as *const card); } }

Build, push, and run the binary on your device:

m print_birthday_card adb push "$ANDROID_PRODUCT_OUT/system/bin/print_birthday_card" /data/local/tmp adb shell /data/local/tmp/print_birthday_card

Finally, we can run auto-generated tests to ensure the bindings work:

interoperability/bindgen/Android.bp:

rust_test { name: "libbirthday_bindgen_test", srcs: [":libbirthday_bindgen"], crate_name: "libbirthday_bindgen_test", test_suites: ["general-tests"], auto_gen_config: true, clippy_lints: "none", // Generated file, skip linting lints: "none", }
atest libbirthday_bindgen_test

Calling Rust

Exporting Rust functions and types to C is easy:

interoperability/rust/libanalyze/analyze.rs

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

interoperability/rust/libanalyze/analyze.h

#ifndef ANALYSE_H #define ANALYSE_H extern "C" { void analyze_numbers(int x, int y); } #endif

interoperability/rust/libanalyze/Android.bp

rust_ffi { name: "libanalyze_ffi", crate_name: "analyze_ffi", srcs: ["analyze.rs"], include_dirs: ["."], }

We can now call this from a C binary:

interoperability/rust/analyze/main.c

#include "analyze.h" int main() { analyze_numbers(10, 20); analyze_numbers(123, 123); return 0; }

interoperability/rust/analyze/Android.bp

cc_binary { name: "analyze_numbers", srcs: ["main.c"], static_libs: ["libanalyze_ffi"], }

Build, push, and run the binary on your device:

m analyze_numbers adb push "$ANDROID_PRODUCT_OUT/system/bin/analyze_numbers" /data/local/tmp adb shell /data/local/tmp/analyze_numbers

Speaker Notes

#[no_mangle] disables Rust’s usual name mangling, so the exported symbol will just be the name of the function. You can also use #[export_name = "some_name"] to specify whatever name you want.

With C++

The CXX crate makes it possible to do safe interoperability between Rust and C++.

The overall approach looks like this:

The Bridge Module

CXX relies on a description of the function signatures that will be exposed from each language to the other. You provide this description using extern blocks in a Rust module annotated with the #[cxx::bridge] attribute macro.

#[allow(unsafe_op_in_unsafe_fn)] #[cxx::bridge(namespace = "org::blobstore")] mod ffi { // Shared structs with fields visible to both languages. struct BlobMetadata { size: usize, tags: Vec<String>, } // Rust types and signatures exposed to C++. extern "Rust" { type MultiBuf; fn next_chunk(buf: &mut MultiBuf) -> &[u8]; } // C++ types and signatures exposed to Rust. unsafe extern "C++" { include!("include/blobstore.h"); type BlobstoreClient; fn new_blobstore_client() -> UniquePtr<BlobstoreClient>; fn put(self: Pin<&mut BlobstoreClient>, parts: &mut MultiBuf) -> u64; fn tag(self: Pin<&mut BlobstoreClient>, blobid: u64, tag: &str); fn metadata(&self, blobid: u64) -> BlobMetadata; } }

Speaker Notes

  • The bridge is generally declared in an ffi module within your crate.
  • From the declarations made in the bridge module, CXX will generate matching Rust and C++ type/function definitions in order to expose those items to both languages.
  • To view the generated Rust code, use cargo-expand to view the expanded proc macro. For most of the examples you would use cargo expand ::ffi to expand just the ffi module (though this doesn’t apply for Android projects).
  • To view the generated C++ code, look in target/cxxbridge.

Rust Bridge Declarations

#[cxx::bridge] mod ffi { extern "Rust" { type MyType; // Opaque type fn foo(&self); // Method on `MyType` fn bar() -> Box<MyType>; // Free function } } struct MyType(i32); impl MyType { fn foo(&self) { println!("{}", self.0); } } fn bar() -> Box<MyType> { Box::new(MyType(123)) }

Speaker Notes

  • Items declared in the extern "Rust" reference items that are in scope in the parent module.
  • The CXX code generator uses your extern "Rust" section(s) to produce a C++ header file containing the corresponding C++ declarations. The generated header has the same path as the Rust source file containing the bridge, except with a .rs.h file extension.

Generated C++

#[cxx::bridge] mod ffi { // Rust types and signatures exposed to C++. extern "Rust" { type MultiBuf; fn next_chunk(buf: &mut MultiBuf) -> &[u8]; } }

Results in (roughly) the following C++:

struct MultiBuf final : public ::rust::Opaque { ~MultiBuf() = delete; private: friend ::rust::layout; struct layout { static ::std::size_t size() noexcept; static ::std::size_t align() noexcept; }; }; ::rust::Slice<::std::uint8_t const> next_chunk(::org::blobstore::MultiBuf &buf) noexcept;

C++ Bridge Declarations

#[cxx::bridge] mod ffi { // C++ types and signatures exposed to Rust. unsafe extern "C++" { include!("include/blobstore.h"); type BlobstoreClient; fn new_blobstore_client() -> UniquePtr<BlobstoreClient>; fn put(self: Pin<&mut BlobstoreClient>, parts: &mut MultiBuf) -> u64; fn tag(self: Pin<&mut BlobstoreClient>, blobid: u64, tag: &str); fn metadata(&self, blobid: u64) -> BlobMetadata; } }

Results in (roughly) the following Rust:

#[repr(C)] pub struct BlobstoreClient { _private: ::cxx::private::Opaque, } pub fn new_blobstore_client() -> ::cxx::UniquePtr<BlobstoreClient> { extern "C" { #[link_name = "org$blobstore$cxxbridge1$new_blobstore_client"] fn __new_blobstore_client() -> *mut BlobstoreClient; } unsafe { ::cxx::UniquePtr::from_raw(__new_blobstore_client()) } } impl BlobstoreClient { pub fn put(&self, parts: &mut MultiBuf) -> u64 { extern "C" { #[link_name = "org$blobstore$cxxbridge1$BlobstoreClient$put"] fn __put( _: &BlobstoreClient, parts: *mut ::cxx::core::ffi::c_void, ) -> u64; } unsafe { __put(self, parts as *mut MultiBuf as *mut ::cxx::core::ffi::c_void) } } } // ...

Speaker Notes

  • The programmer does not need to promise that the signatures they have typed in are accurate. CXX performs static assertions that the signatures exactly correspond with what is declared in C++.
  • unsafe extern blocks allow you to declare C++ functions that are safe to call from Rust.

Shared Types

#[cxx::bridge] mod ffi { #[derive(Clone, Debug, Hash)] struct PlayingCard { suit: Suit, value: u8, // A=1, J=11, Q=12, K=13 } enum Suit { Clubs, Diamonds, Hearts, Spades, } }

Speaker Notes

  • Only C-like (unit) enums are supported.
  • A limited number of traits are supported for #[derive()] on shared types. Corresponding functionality is also generated for the C++ code, e.g. if you derive Hash also generates an implementation of std::hash for the corresponding C++ type.

Shared Enums

#[cxx::bridge] mod ffi { enum Suit { Clubs, Diamonds, Hearts, Spades, } }

Generated Rust:

#![allow(unused)] fn main() { #[derive(Copy, Clone, PartialEq, Eq)] #[repr(transparent)] pub struct Suit { pub repr: u8, } #[allow(non_upper_case_globals)] impl Suit { pub const Clubs: Self = Suit { repr: 0 }; pub const Diamonds: Self = Suit { repr: 1 }; pub const Hearts: Self = Suit { repr: 2 }; pub const Spades: Self = Suit { repr: 3 }; } }

Generated C++:

enum class Suit : uint8_t { Clubs = 0, Diamonds = 1, Hearts = 2, Spades = 3, };

Speaker Notes

  • On the Rust side, the code generated for shared enums is actually a struct wrapping a numeric value. This is because it is not UB in C++ for an enum class to hold a value different from all of the listed variants, and our Rust representation needs to have the same behavior.

Rust Error Handling

#[cxx::bridge] mod ffi { extern "Rust" { fn fallible(depth: usize) -> Result<String>; } } fn fallible(depth: usize) -> anyhow::Result<String> { if depth == 0 { return Err(anyhow::Error::msg("fallible1 requires depth > 0")); } Ok("Success!".into()) }

Speaker Notes

  • Rust functions that return Result are translated to exceptions on the C++ side.
  • The exception thrown will always be of type rust::Error, which primarily exposes a way to get the error message string. The error message will come from the error type’s Display impl.
  • A panic unwinding from Rust to C++ will always cause the process to immediately terminate.

C++ Error Handling

#[cxx::bridge] mod ffi { unsafe extern "C++" { include!("example/include/example.h"); fn fallible(depth: usize) -> Result<String>; } } fn main() { if let Err(err) = ffi::fallible(99) { eprintln!("Error: {}", err); process::exit(1); } }

Speaker Notes

  • C++ functions declared to return a Result will catch any thrown exception on the C++ side and return it as an Err value to the calling Rust function.
  • If an exception is thrown from an extern “C++” function that is not declared by the CXX bridge to return Result, the program calls C++’s std::terminate. The behavior is equivalent to the same exception being thrown through a noexcept C++ function.

Additional Types

Rust TypeC++ Type
Stringrust::String
&strrust::Str
CxxStringstd::string
&[T]/&mut [T]rust::Slice
Box<T>rust::Box<T>
UniquePtr<T>std::unique_ptr<T>
Vec<T>rust::Vec<T>
CxxVector<T>std::vector<T>

Speaker Notes

  • These types can be used in the fields of shared structs and the arguments and returns of extern functions.
  • Note that Rust’s String does not map directly to std::string. There are a few reasons for this:
    • std::string does not uphold the UTF-8 invariant that String requires.
    • The two types have different layouts in memory and so can’t be passed directly between languages.
    • std::string requires move constructors that don’t match Rust’s move semantics, so a std::string can’t be passed by value to Rust.

Building in Android

Create a cc_library_static to build the C++ library, including the CXX generated header and source file.

cc_library_static { name: "libcxx_test_cpp", srcs: ["cxx_test.cpp"], generated_headers: [ "cxx-bridge-header", "libcxx_test_bridge_header" ], generated_sources: ["libcxx_test_bridge_code"], }

Speaker Notes

  • Point out that libcxx_test_bridge_header and libcxx_test_bridge_code are the dependencies for the CXX-generated C++ bindings. We’ll show how these are setup on the next slide.
  • Note that you also need to depend on the cxx-bridge-header library in order to pull in common CXX definitions.
  • Full docs for using CXX in Android can be found in the Android docs. You may want to share that link with the class so that students know where they can find these instructions again in the future.

Building in Android

Create two genrules: One to generate the CXX header, and one to generate the CXX source file. These are then used as inputs to the cc_library_static.

// Generate a C++ header containing the C++ bindings // to the Rust exported functions in lib.rs. genrule { name: "libcxx_test_bridge_header", tools: ["cxxbridge"], cmd: "$(location cxxbridge) $(in) --header > $(out)", srcs: ["lib.rs"], out: ["lib.rs.h"], } // Generate the C++ code that Rust calls into. genrule { name: "libcxx_test_bridge_code", tools: ["cxxbridge"], cmd: "$(location cxxbridge) $(in) > $(out)", srcs: ["lib.rs"], out: ["lib.rs.cc"], }

Speaker Notes

  • The cxxbridge tool is a standalone tool that generates the C++ side of the bridge module. It is included in Android and available as a Soong tool.
  • By convention, if your Rust source file is lib.rs your header file will be named lib.rs.h and your source file will be named lib.rs.cc. This naming convention isn’t enforced, though.

Building in Android

Create a rust_binary that depends on libcxx and your cc_library_static.

rust_binary { name: "cxx_test", srcs: ["lib.rs"], rustlibs: ["libcxx"], static_libs: ["libcxx_test_cpp"], }

Interoperability with Java

Java can load shared objects via Java Native Interface (JNI). The jni crate allows you to create a compatible library.

First, we create a Rust function to export to Java:

interoperability/java/src/lib.rs:

#![allow(unused)] fn main() { //! Rust <-> Java FFI demo. use jni::objects::{JClass, JString}; use jni::sys::jstring; use jni::JNIEnv; /// HelloWorld::hello method implementation. #[no_mangle] pub extern "system" fn Java_HelloWorld_hello( env: JNIEnv, _class: JClass, name: JString, ) -> jstring { let input: String = env.get_string(name).unwrap().into(); let greeting = format!("Hello, {input}!"); let output = env.new_string(greeting).unwrap(); output.into_inner() } }

interoperability/java/Android.bp:

rust_ffi_shared { name: "libhello_jni", crate_name: "hello_jni", srcs: ["src/lib.rs"], rustlibs: ["libjni"], }

We then call this function from Java:

interoperability/java/HelloWorld.java:

class HelloWorld { private static native String hello(String name); static { System.loadLibrary("hello_jni"); } public static void main(String[] args) { String output = HelloWorld.hello("Alice"); System.out.println(output); } }

interoperability/java/Android.bp:

java_binary { name: "helloworld_jni", srcs: ["HelloWorld.java"], main_class: "HelloWorld", required: ["libhello_jni"], }

Finally, you can build, sync, and run the binary:

m helloworld_jni adb sync # requires adb root && adb remount adb shell /system/bin/helloworld_jni

Exercises

This is a group exercise: We will look at one of the projects you work with and try to integrate some Rust into it. Some suggestions:

  • Call your AIDL service with a client written in Rust.

  • Move a function from your project to Rust and call it.

Speaker Notes

No solution is provided here since this is open-ended: it relies on someone in the class having a piece of code which you can turn in to Rust on the fly.

Welcome to Rust in Chromium

Rust is supported for third-party libraries in Chromium, with first-party glue code to connect between Rust and existing Chromium C++ code.

Today, we’ll call into Rust to do something silly with strings. If you’ve got a corner of the code where you’re displaying a UTF8 string to the user, feel free to follow this recipe in your part of the codebase instead of the exact part we talk about.

Setup

Make sure you can build and run Chromium. Any platform and set of build flags is OK, so long as your code is relatively recent (commit position 1223636 onwards, corresponding to November 2023):

gn gen out/Debug autoninja -C out/Debug chrome out/Debug/chrome # or on Mac, out/Debug/Chromium.app/Contents/MacOS/Chromium

(A component, debug build is recommended for quickest iteration time. This is the default!)

See How to build Chromium if you aren’t already at that point. Be warned: setting up to build Chromium takes time.

It’s also recommended that you have Visual Studio code installed.

About the exercises

This part of the course has a series of exercises which build on each other. We’ll be doing them spread throughout the course instead of just at the end. If you don’t have time to complete a certain part, don’t worry: you can catch up in the next slot.

Comparing Chromium and Cargo Ecosystems

The Rust community typically uses cargo and libraries from crates.io. Chromium is built using gn and ninja and a curated set of dependencies.

When writing code in Rust, your choices are:

From here on we’ll be focusing on gn and ninja, because this is how Rust code can be built into the Chromium browser. At the same time, Cargo is an important part of the Rust ecosystem and you should keep it in your toolbox.

Mini exercise

Split into small groups and:

  • Brainstorm scenarios where cargo may offer an advantage and assess the risk profile of these scenarios.
  • Discuss which tools, libraries, and groups of people need to be trusted when using gn and ninja, offline cargo, etc.

Speaker Notes

Ask students to avoid peeking at the speaker notes before completing the exercise. Assuming folks taking the course are physically together, ask them to discuss in small groups of 3-4 people.

Notes/hints related to the first part of the exercise (“scenarios where Cargo may offer an advantage”):

  • It’s fantastic that when writing a tool, or prototyping a part of Chromium, one has access to the rich ecosystem of crates.io libraries. There is a crate for almost anything and they are usually quite pleasant to use. (clap for command-line parsing, serde for serializing/deserializing to/from various formats, itertools for working with iterators, etc.).

    • cargo makes it easy to try a library (just add a single line to Cargo.toml and start writing code)
    • It may be worth comparing how CPAN helped make perl a popular choice. Or comparing with python + pip.
  • Development experience is made really nice not only by core Rust tools (e.g. using rustup to switch to a different rustc version when testing a crate that needs to work on nightly, current stable, and older stable) but also by an ecosystem of third-party tools (e.g. Mozilla provides cargo vet for streamlining and sharing security audits; criterion crate gives a streamlined way to run benchmarks).

    • cargo makes it easy to add a tool via cargo install --locked cargo-vet.
    • It may be worth comparing with Chrome Extensions or VScode extensions.
  • Broad, generic examples of projects where cargo may be the right choice:

    • Perhaps surprisingly, Rust is becoming increasingly popular in the industry for writing command line tools. The breadth and ergonomics of libraries is comparable to Python, while being more robust (thanks to the rich typesystem) and running faster (as a compiled, rather than interpreted language).
    • Participating in the Rust ecosystem requires using standard Rust tools like Cargo. Libraries that want to get external contributions, and want to be used outside of Chromium (e.g. in Bazel or Android/Soong build environments) should probably use Cargo.
  • Examples of Chromium-related projects that are cargo-based:

    • serde_json_lenient (experimented with in other parts of Google which resulted in PRs with performance improvements)
    • Fontations libraries like font-types
    • gnrt tool (we will meet it later in the course) which depends on clap for command-line parsing and on toml for configuration files.
      • Disclaimer: a unique reason for using cargo was unavailability of gn when building and bootstrapping Rust standard library when building Rust toolchain.
      • run_gnrt.py uses Chromium’s copy of cargo and rustc. gnrt depends on third-party libraries downloaded from the internet, but run_gnrt.py asks cargo that only --locked content is allowed via Cargo.lock.)

Students may identify the following items as being implicitly or explicitly trusted:

  • rustc (the Rust compiler) which in turn depends on the LLVM libraries, the Clang compiler, the rustc sources (fetched from GitHub, reviewed by Rust compiler team), binary Rust compiler downloaded for bootstrapping
  • rustup (it may be worth pointing out that rustup is developed under the umbrella of the https://github.com/rust-lang/ organization - same as rustc)
  • cargo, rustfmt, etc.
  • Various internal infrastructure (bots that build rustc, system for distributing the prebuilt toolchain to Chromium engineers, etc.)
  • Cargo tools like cargo audit, cargo vet, etc.
  • Rust libraries vendored into //third_party/rust (audited by security@chromium.org)
  • Other Rust libraries (some niche, some quite popular and commonly used)

Chromium Rust policy

Chromium does not yet allow first-party Rust except in rare cases as approved by Chromium’s Area Tech Leads.

Chromium’s policy on third party libraries is outlined here - Rust is allowed for third party libraries under various circumstances, including if they’re the best option for performance or for security.

Very few Rust libraries directly expose a C/C++ API, so that means that nearly all such libraries will require a small amount of first-party glue code.

RustExistingcrateLanguageCrateboundaryAPIExistingChromiumChromiumRustRustC++C++wrapper

First-party Rust glue code for a particular third-party crate should normally be kept in third_party/rust/<crate>/<version>/wrapper.

Because of this, today’s course will be heavily focused on:

  • Bringing in third-party Rust libraries (“crates”)
  • Writing glue code to be able to use those crates from Chromium C++.

If this policy changes over time, the course will evolve to keep up.

Build rules

Rust code is usually built using cargo. Chromium builds with gn and ninja for efficiency — its static rules allow maximum parallelism. Rust is no exception.

Adding Rust code to Chromium

In some existing Chromium BUILD.gn file, declare a rust_static_library:

import("//build/rust/rust_static_library.gni") rust_static_library("my_rust_lib") { crate_root = "lib.rs" sources = [ "lib.rs" ] }

You can also add deps on other Rust targets. Later we’ll use this to depend upon third party code.

Speaker Notes

You must specify both the crate root, and a full list of sources. The crate_root is the file given to the Rust compiler representing the root file of the compilation unit — typically lib.rs. sources is a complete list of all source files which ninja needs in order to determine when rebuilds are necessary.

(There’s no such thing as a Rust source_set, because in Rust, an entire crate is a compilation unit. A static_library is the smallest unit.)

Students might be wondering why we need a gn template, rather than using gn’s built-in support for Rust static libraries. The answer is that this template provides support for CXX interop, Rust features, and unit tests, some of which we’ll use later.

Including unsafe Rust Code

Unsafe Rust code is forbidden in rust_static_library by default — it won’t compile. If you need unsafe Rust code, add allow_unsafe = true to the gn target. (Later in the course we’ll see circumstances where this is necessary.)

import("//build/rust/rust_static_library.gni") rust_static_library("my_rust_lib") { crate_root = "lib.rs" sources = [ "lib.rs", "hippopotamus.rs" ] allow_unsafe = true }

Depending on Rust Code from Chromium C++

Simply add the above target to the deps of some Chromium C++ target.

import("//build/rust/rust_static_library.gni") rust_static_library("my_rust_lib") { crate_root = "lib.rs" sources = [ "lib.rs" ] } # or source_set, static_library etc. component("preexisting_cpp") { deps = [ ":my_rust_lib" ] }

Speaker Notes

We'll see that this relationship only works if the Rust code exposes plain C APIs which can be called from C++, or if we use a C++/Rust interop tool.

Visual Studio Code

Types are elided in Rust code, which makes a good IDE even more useful than for C++. Visual Studio code works well for Rust in Chromium. To use it,

  • Ensure your VSCode has the rust-analyzer extension, not earlier forms of Rust support
  • gn gen out/Debug --export-rust-project (or equivalent for your output directory)
  • ln -s out/Debug/rust-project.json rust-project.json
Example screenshot from VSCode

Speaker Notes

A demo of some of the code annotation and exploration features of rust-analyzer might be beneficial if the audience are naturally skeptical of IDEs.

The following steps may help with the demo (but feel free to instead use a piece of Chromium-related Rust that you are most familiar with):

  • Open components/qr_code_generator/qr_code_generator_ffi_glue.rs
  • Place the cursor over the QrCode::new call (around line 26) in `qr_code_generator_ffi_glue.rs
  • Demo show documentation (typical bindings: vscode = ctrl k i; vim/CoC = K).
  • Demo go to definition (typical bindings: vscode = F12; vim/CoC = g d). (This will take you to //third_party/rust/.../qr_code-.../src/lib.rs.)
  • Demo outline and navigate to the QrCode::with_bits method (around line 164; the outline is in the file explorer pane in vscode; typical vim/CoC bindings = space o)
  • Demo type annotations (there are quote a few nice examples in the QrCode::with_bits method)

It may be worth pointing out that gn gen ... --export-rust-project will need to be rerun after editing BUILD.gn files (which we will do a few times throughout the exercises in this session).

Build rules exercise

In your Chromium build, add a new Rust target to //ui/base/BUILD.gn containing:

#![allow(unused)] fn main() { #[no_mangle] pub extern "C" fn hello_from_rust() { println!("Hello from Rust!") } }

Important: note that no_mangle here is considered a type of unsafety by the Rust compiler, so you’ll need to allow unsafe code in your gn target.

Add this new Rust target as a dependency of //ui/base:base. Declare this function at the top of ui/base/resource/resource_bundle.cc (later, we’ll see how this can be automated by bindings generation tools):

extern "C" void hello_from_rust();

Call this function from somewhere in ui/base/resource/resource_bundle.cc - we suggest the top of ResourceBundle::MaybeMangleLocalizedString. Build and run Chromium, and ensure that “Hello from Rust!” is printed lots of times.

If you use VSCode, now set up Rust to work well in VSCode. It will be useful in subsequent exercises. If you’ve succeeded, you will be able to use right-click “Go to definition” on println!.

Where to find help

Speaker Notes

It's really important that students get this running, because future exercises will build on it.

This example is unusual because it boils down to the lowest-common-denominator interop language, C. Both C++ and Rust can natively declare and call C ABI functions. Later in the course, we’ll connect C++ directly to Rust.

allow_unsafe = true is required here because #[no_mangle] might allow Rust to generate two functions with the same name, and Rust can no longer guarantee that the right one is called.

If you need a pure Rust executable, you can also do that using the rust_executable gn template.

Testing

Rust community typically authors unit tests in a module placed in the same source file as the code being tested. This was covered earlier in the course and looks like this:

#![allow(unused)] fn main() { #[cfg(test)] mod tests { #[test] fn my_test() { todo!() } } }

In Chromium we place unit tests in a separate source file and we continue to follow this practice for Rust — this makes tests consistently discoverable and helps to avoid rebuilding .rs files a second time (in the test configuration).

This results in the following options for testing Rust code in Chromium:

  • Native Rust tests (i.e. #[test]). Discouraged outside of //third_party/rust.
  • gtest tests authored in C++ and exercising Rust via FFI calls. Sufficient when Rust code is just a thin FFI layer and the existing unit tests provide sufficient coverage for the feature.
  • gtest tests authored in Rust and using the crate under test through its public API (using pub mod for_testing { ... } if needed). This is the subject of the next few slides.

Speaker Notes

Mention that native Rust tests of third-party crates should eventually be exercised by Chromium bots. (Such testing is needed rarely — only after adding or updating third-party crates.)

Some examples may help illustrate when C++ gtest vs Rust gtest should be used:

  • QR has very little functionality in the first-party Rust layer (it’s just a thin FFI glue) and therefore uses the existing C++ unit tests for testing both the C++ and the Rust implementation (parameterizing the tests so they enable or disable Rust using a ScopedFeatureList).

  • Hypothetical/WIP PNG integration may need to implement memory-safe implementation of pixel transformations that are provided by libpng but missing in the png crate - e.g. RGBA => BGRA, or gamma correction. Such functionality may benefit from separate tests authored in Rust.

rust_gtest_interop Library

The rust_gtest_interop library provides a way to:

  • Use a Rust function as a gtest testcase (using the #[gtest(...)] attribute)
  • Use expect_eq! and similar macros (similar to assert_eq! but not panicking and not terminating the test when the assertion fails).

Example:

use rust_gtest_interop::prelude::*; #[gtest(MyRustTestSuite, MyAdditionTest)] fn test_addition() { expect_eq!(2 + 2, 4); }

GN Rules for Rust Tests

The simplest way to build Rust gtest tests is to add them to an existing test binary that already contains tests authored in C++. For example:

test("ui_base_unittests") { ... sources += [ "my_rust_lib_unittest.rs" ] deps += [ ":my_rust_lib" ] }

Authoring Rust tests in a separate static_library also works, but requires manually declaring the dependency on the support libraries:

rust_static_library("my_rust_lib_unittests") { testonly = true is_gtest_unittests = true crate_root = "my_rust_lib_unittest.rs" sources = [ "my_rust_lib_unittest.rs" ] deps = [ ":my_rust_lib", "//testing/rust_gtest_interop", ] } test("ui_base_unittests") { ... deps += [ ":my_rust_lib_unittests" ] }

chromium::import! Macro

After adding :my_rust_lib to GN deps, we still need to learn how to import and use my_rust_lib from my_rust_lib_unittest.rs. We haven’t provided an explicit crate_name for my_rust_lib so its crate name is computed based on the full target path and name. Fortunately we can avoid working with such an unwieldy name by using the chromium::import! macro from the automatically-imported chromium crate:

chromium::import! { "//ui/base:my_rust_lib"; } use my_rust_lib::my_function_under_test;

Under the covers the macro expands to something similar to:

extern crate ui_sbase_cmy_urust_ulib as my_rust_lib; use my_rust_lib::my_function_under_test;

More information can be found in the doc comment of the chromium::import macro.

Speaker Notes

rust_static_library supports specifying an explicit name via crate_name property, but doing this is discouraged. And it is discouraged because the crate name has to be globally unique. crates.io guarantees uniqueness of its crate names so cargo_crate GN targets (generated by the gnrt tool covered in a later section) use short crate names.

Testing exercise

Time for another exercise!

In your Chromium build:

  • Add a testable function next to hello_from_rust. Some suggestions: adding two integers received as arguments, computing the nth Fibonacci number, summing integers in a slice, etc.
  • Add a separate ..._unittest.rs file with a test for the new function.
  • Add the new tests to BUILD.gn.
  • Build the tests, run them, and verify that the new test works.

Interoperability with C++

The Rust community offers multiple options for C++/Rust interop, with new tools being developed all the time. At the moment, Chromium uses a tool called CXX.

You describe your whole language boundary in an interface definition language (which looks a lot like Rust) and then CXX tools generate declarations for functions and types in both Rust and C++.

Overview diagram of cxx, showing that the same interface definition is used to create both C++ and Rust side code which then communicate via a lowest common denominator C API

See the CXX tutorial for a full example of using this.

Speaker Notes

Talk through the diagram. Explain that behind the scenes, this is doing just the same as you previously did. Point out that automating the process has the following benefits:

  • The tool guarantees that the C++ and Rust sides match (e.g. you get compile errors if the #[cxx::bridge] doesn’t match the actual C++ or Rust definitions, but with out-of-sync manual bindings you’d get Undefined Behavior)
  • The tool automates generation of FFI thunks (small, C-ABI-compatible, free functions) for non-C features (e.g. enabling FFI calls into Rust or C++ methods; manual bindings would require authoring such top-level, free functions manually)
  • The tool and the library can handle a set of core types - for example:
    • &[T] can be passed across the FFI boundary, even though it doesn’t guarantee any particular ABI or memory layout. With manual bindings std::span<T> / &[T] have to be manually destructured and rebuilt out of a pointer and length - this is error-prone given that each language represents empty slices slightly differently)
    • Smart pointers like std::unique_ptr<T>, std::shared_ptr<T>, and/or Box are natively supported. With manual bindings, one would have to pass C-ABI-compatible raw pointers, which would increase lifetime and memory-safety risks.
    • rust::String and CxxString types understand and maintain differences in string representation across the languages (e.g. rust::String::lossy can build a Rust string from non-UTF8 input and rust::String::c_str can NUL-terminate a string).

Example Bindings

CXX requires that the whole C++/Rust boundary is declared in cxx::bridge modules inside .rs source code.

#[cxx::bridge] mod ffi { extern "Rust" { type MultiBuf; fn next_chunk(buf: &mut MultiBuf) -> &[u8]; } unsafe extern "C++" { include!("example/include/blobstore.h"); type BlobstoreClient; fn new_blobstore_client() -> UniquePtr<BlobstoreClient>; fn put(self: &BlobstoreClient, buf: &mut MultiBuf) -> Result<u64>; } } // Definitions of Rust types and functions go here

Speaker Notes

Point out:

  • Although this looks like a regular Rust mod, the #[cxx::bridge] procedural macro does complex things to it. The generated code is quite a bit more sophisticated - though this does still result in a mod called ffi in your code.
  • Native support for C++’s std::unique_ptr in Rust
  • Native support for Rust slices in C++
  • Calls from C++ to Rust, and Rust types (in the top part)
  • Calls from Rust to C++, and C++ types (in the bottom part)

Common misconception: It looks like a C++ header is being parsed by Rust, but this is misleading. This header is never interpreted by Rust, but simply #included in the generated C++ code for the benefit of C++ compilers.

Limitations of CXX

By far the most useful page when using CXX is the type reference.

CXX fundamentally suits cases where:

  • Your Rust-C++ interface is sufficiently simple that you can declare all of it.
  • You’re using only the types natively supported by CXX already, for example std::unique_ptr, std::string, &[u8] etc.

It has many limitations — for example lack of support for Rust’s Option type.

These limitations constrain us to using Rust in Chromium only for well isolated “leaf nodes” rather than for arbitrary Rust-C++ interop. When considering a use-case for Rust in Chromium, a good starting point is to draft the CXX bindings for the language boundary to see if it appears simple enough.

Speaker Notes

In addition, right now, Rust code in one component cannot depend on Rust code in another, due to linking details in our component build. That's another reason to restrict Rust to use in leaf nodes.

You should also discuss some of the other sticky points with CXX, for example:

  • Its error handling is based around C++ exceptions (given on the next slide)
  • Function pointers are awkward to use.

CXX Error Handling

CXX’s support for Result<T,E> relies on C++ exceptions, so we can’t use that in Chromium. Alternatives:

  • The T part of Result<T, E> can be:

    • Returned via out parameters (e.g. via &mut T). This requires that T can be passed across the FFI boundary - for example T has to be:
      • A primitive type (like u32 or usize)
      • A type natively supported by cxx (like UniquePtr<T>) that has a suitable default value to use in a failure case (unlike Box<T>).
    • Retained on the Rust side, and exposed via reference. This may be needed when T is a Rust type, which cannot be passed across the FFI boundary, and cannot be stored in UniquePtr<T>.
  • The E part of Result<T, E> can be:

    • Returned as a boolean (e.g. true representing success, and false representing failure)
    • Preserving error details is in theory possible, but so far hasn’t been needed in practice.

CXX Error Handling: QR Example

The QR code generator is an example where a boolean is used to communicate success vs failure, and where the successful result can be passed across the FFI boundary:

#[cxx::bridge(namespace = "qr_code_generator")] mod ffi { extern "Rust" { fn generate_qr_code_using_rust( data: &[u8], min_version: i16, out_pixels: Pin<&mut CxxVector<u8>>, out_qr_size: &mut usize, ) -> bool; } }

Speaker Notes

Students may be curious about the semantics of the out_qr_size output. This is not the size of the vector, but the size of the QR code (and admittedly it is a bit redundant - this is the square root of the size of the vector).

It may be worth pointing out the importance of initializing out_qr_size before calling into the Rust function. Creation of a Rust reference that points to uninitialized memory results in Undefined Behavior (unlike in C++, when only the act of dereferencing such memory results in UB).

If students ask about Pin, then explain why CXX needs it for mutable references to C++ data: the answer is that C++ data can’t be moved around like Rust data, because it may contain self-referential pointers.

CXX Error Handling: PNG Example

A prototype of a PNG decoder illustrates what can be done when the successful result cannot be passed across the FFI boundary:

#[cxx::bridge(namespace = "gfx::rust_bindings")] mod ffi { extern "Rust" { /// This returns an FFI-friendly equivalent of `Result<PngReader<'a>, /// ()>`. fn new_png_reader<'a>(input: &'a [u8]) -> Box<ResultOfPngReader<'a>>; /// C++ bindings for the `crate::png::ResultOfPngReader` type. type ResultOfPngReader<'a>; fn is_err(self: &ResultOfPngReader) -> bool; fn unwrap_as_mut<'a, 'b>( self: &'b mut ResultOfPngReader<'a>, ) -> &'b mut PngReader<'a>; /// C++ bindings for the `crate::png::PngReader` type. type PngReader<'a>; fn height(self: &PngReader) -> u32; fn width(self: &PngReader) -> u32; fn read_rgba8(self: &mut PngReader, output: &mut [u8]) -> bool; } }

Speaker Notes

PngReader and ResultOfPngReader are Rust types — objects of these types cannot cross the FFI boundary without indirection of a Box<T>. We can’t have an out_parameter: &mut PngReader, because CXX doesn’t allow C++ to store Rust objects by value.

This example illustrates that even though CXX doesn’t support arbitrary generics nor templates, we can still pass them across the FFI boundary by manually specializing / monomorphizing them into a non-generic type. In the example ResultOfPngReader is a non-generic type that forwards into appropriate methods of Result<T, E> (e.g. into is_err, unwrap, and/or as_mut).

Using cxx in Chromium

In Chromium, we define an independent #[cxx::bridge] mod for each leaf-node where we want to use Rust. You’d typically have one for each rust_static_library. Just add

cxx_bindings = [ "my_rust_file.rs" ] # list of files containing #[cxx::bridge], not all source files allow_unsafe = true

to your existing rust_static_library target alongside crate_root and sources.

C++ headers will be generated at a sensible location, so you can just

#include "ui/base/my_rust_file.rs.h"

You will find some utility functions in //base to convert to/from Chromium C++ types to CXX Rust types — for example SpanToRustSlice.

Speaker Notes

Students may ask — why do we still need allow_unsafe = true?

The broad answer is that no C/C++ code is “safe” by the normal Rust standards. Calling back and forth to C/C++ from Rust may do arbitrary things to memory, and compromise the safety of Rust’s own data layouts. Presence of too many unsafe keywords in C/C++ interop can harm the signal-to-noise ratio of such a keyword, and is controversial, but strictly, bringing any foreign code into a Rust binary can cause unexpected behavior from Rust’s perspective.

The narrow answer lies in the diagram at the top of this page — behind the scenes, CXX generates Rust unsafe and extern "C" functions just like we did manually in the previous section.

Exercise: Interoperability with C++

Part one

  • In the Rust file you previously created, add a #[cxx::bridge] which specifies a single function, to be called from C++, called hello_from_rust, taking no parameters and returning no value.
  • Modify your previous hello_from_rust function to remove extern "C" and #[no_mangle]. This is now just a standard Rust function.
  • Modify your gn target to build these bindings.
  • In your C++ code, remove the forward-declaration of hello_from_rust. Instead, include the generated header file.
  • Build and run!

Part two

It’s a good idea to play with CXX a little. It helps you think about how flexible Rust in Chromium actually is.

Some things to try:

  • Call back into C++ from Rust. You will need:
    • An additional header file which you can include! from your cxx::bridge. You’ll need to declare your C++ function in that new header file.
    • An unsafe block to call such a function, or alternatively specify the unsafe keyword in your #[cxx::bridge] as described here.
    • You may also need to #include "third_party/rust/cxx/v1/crate/include/cxx.h"
  • Pass a C++ string from C++ into Rust.
  • Pass a reference to a C++ object into Rust.
  • Intentionally get the Rust function signatures mismatched from the #[cxx::bridge], and get used to the errors you see.
  • Intentionally get the C++ function signatures mismatched from the #[cxx::bridge], and get used to the errors you see.
  • Pass a std::unique_ptr of some type from C++ into Rust, so that Rust can own some C++ object.
  • Create a Rust object and pass it into C++, so that C++ owns it. (Hint: you need a Box).
  • Declare some methods on a C++ type. Call them from Rust.
  • Declare some methods on a Rust type. Call them from C++.

Part three

Now you understand the strengths and limitations of CXX interop, think of a couple of use-cases for Rust in Chromium where the interface would be sufficiently simple. Sketch how you might define that interface.

Where to find help

Speaker Notes

As students explore Part Two, they're bound to have lots of questions about how to achieve these things, and also how CXX works behind the scenes.

Some of the questions you may encounter:

  • I’m seeing a problem initializing a variable of type X with type Y, where X and Y are both function types. This is because your C++ function doesn’t quite match the declaration in your cxx::bridge.
  • I seem to be able to freely convert C++ references into Rust references. Doesn’t that risk UB? For CXX’s opaque types, no, because they are zero-sized. For CXX trivial types yes, it’s possible to cause UB, although CXX’s design makes it quite difficult to craft such an example.

Adding Third Party Crates

Rust libraries are called “crates” and are found at crates.io. It’s very easy for Rust crates to depend upon one another. So they do!

PropertyC++ libraryRust crate
Build systemLotsConsistent: Cargo.toml
Typical library sizeLarge-ishSmall
Transitive dependenciesFewLots

For a Chromium engineer, this has pros and cons:

  • All crates use a common build system so we can automate their inclusion into Chromium…
  • … but, crates typically have transitive dependencies, so you will likely have to bring in multiple libraries.

We’ll discuss:

  • How to put a crate in the Chromium source code tree
  • How to make gn build rules for it
  • How to audit its source code for sufficient safety.

Speaker Notes

All of the things in the table on this slide are generalizations, and counter-examples can be found. But in general it's important for students to understand that most Rust code depends on other Rust libraries, because it's easy to do so, and that this has both benefits and costs.

Configuring the Cargo.toml file to add crates

Chromium has a single set of centrally-managed direct crate dependencies. These are managed through a single Cargo.toml:

[dependencies] bitflags = "1" cfg-if = "1" cxx = "1" # lots more...

As with any other Cargo.toml, you can specify more details about the dependencies — most commonly, you’ll want to specify the features that you wish to enable in the crate.

When adding a crate to Chromium, you’ll often need to provide some extra information in an additional file, gnrt_config.toml, which we’ll meet next.

Configuring gnrt_config.toml

Alongside Cargo.toml is gnrt_config.toml. This contains Chromium-specific extensions to crate handling.

If you add a new crate, you should specify at least the group. This is one of:

# 'safe': The library satisfies the rule-of-2 and can be used in any process. # 'sandbox': The library does not satisfy the rule-of-2 and must be used in # a sandboxed process such as the renderer or a utility process. # 'test': The library is only used in tests.

For instance,

[crate.my-new-crate] group = 'test' # only used in test code

Depending on the crate source code layout, you may also need to use this file to specify where its LICENSE file(s) can be found.

Later, we’ll see some other things you will need to configure in this file to resolve problems.

Downloading Crates

A tool called gnrt knows how to download crates and how to generate BUILD.gn rules.

To start, download the crate you want like this:

cd chromium/src vpython3 tools/crates/run_gnrt.py -- vendor

Although the gnrt tool is part of the Chromium source code, by running this command you will be downloading and running its dependencies from crates.io. See the earlier section discussing this security decision.

This vendor command may download:

  • Your crate
  • Direct and transitive dependencies
  • New versions of other crates, as required by cargo to resolve the complete set of crates required by Chromium.

Chromium maintains patches for some crates, kept in //third_party/rust/chromium_crates_io/patches. These will be reapplied automatically, but if patching fails you may need to take manual action.

Generating gn Build Rules

Once you’ve downloaded the crate, generate the BUILD.gn files like this:

vpython3 tools/crates/run_gnrt.py -- gen

Now run git status. You should find:

  • At least one new crate source code in third_party/rust/chromium_crates_io/vendor
  • At least one new BUILD.gn in third_party/rust/<crate name>/v<major semver version>
  • An appropriate README.chromium

The “major semver version” is a Rust “semver” version number.

Take a close look, especially at the things generated in third_party/rust.

Speaker Notes

Talk a little about semver — and specifically the way that in Chromium it’s to allow multiple incompatible versions of a crate, which is discouraged but sometimes necessary in the Cargo ecosystem.

Resolving Problems

If your build fails, it may be because of a build.rs: programs which do arbitrary things at build time. This is fundamentally at odds with the design of gn and ninja which aim for static, deterministic, build rules to maximize parallelism and repeatability of builds.

Some build.rs actions are automatically supported; others require action:

build script effectSupported by our gn templatesWork required by you
Checking rustc version to configure features on and offYesNone
Checking platform or CPU to configure features on and offYesNone
Generating codeYesYes - specify in gnrt_config.toml
Building C/C++NoPatch around it
Arbitrary other actionsNoPatch around it

Fortunately, most crates don’t contain a build script, and fortunately, most build scripts only do the top two actions.

Build Scripts Which Generate Code

If ninja complains about missing files, check the build.rs to see if it writes source code files.

If so, modify gnrt_config.toml to add build-script-outputs to the crate. If this is a transitive dependency, that is, one on which Chromium code should not directly depend, also add allow-first-party-usage=false. There are several examples already in that file:

[crate.unicode-linebreak] allow-first-party-usage = false build-script-outputs = ["tables.rs"]

Now rerun gnrt.py -- gen to regenerate BUILD.gn files to inform ninja that this particular output file is input to subsequent build steps.

Build Scripts Which Build C++ or Take Arbitrary Actions

Some crates use the cc crate to build and link C/C++ libraries. Other crates parse C/C++ using bindgen within their build scripts. These actions can’t be supported in a Chromium context — our gn, ninja and LLVM build system is very specific in expressing relationships between build actions.

So, your options are:

  • Avoid these crates
  • Apply a patch to the crate.

Patches should be kept in third_party/rust/chromium_crates_io/patches/<crate> - see for example the patches against the cxx crate - and will be applied automatically by gnrt each time it upgrades the crate.

Depending on a Crate

Once you’ve added a third-party crate and generated build rules, depending on a crate is simple. Find your rust_static_library target, and add a dep on the :lib target within your crate.

Specifically,

cratenamemajorsemverversion//third_party/rust/v:lib

For instance,

rust_static_library("my_rust_lib") { crate_root = "lib.rs" sources = [ "lib.rs" ] deps = [ "//third_party/rust/example_rust_crate/v1:lib" ] }

Auditing Third Party Crates

Adding new libraries is subject to Chromium’s standard policies, but of course also subject to security review. As you may be bringing in not just a single crate but also transitive dependencies, there may be a lot of code to review. On the other hand, safe Rust code can have limited negative side effects. How should you review it?

Over time Chromium aims to move to a process based around cargo vet.

Meanwhile, for each new crate addition, we are checking for the following:

  • Understand why each crate is used. What’s the relationship between crates? If the build system for each crate contains a build.rs or procedural macros, work out what they’re for. Are they compatible with the way Chromium is normally built?
  • Check each crate seems to be reasonably well maintained
  • Use cd third-party/rust/chromium_crates_io; cargo audit to check for known vulnerabilities (first you’ll need to cargo install cargo-audit, which ironically involves downloading lots of dependencies from the internet2)
  • Ensure any unsafe code is good enough for the Rule of Two
  • Check for any use of fs or net APIs
  • Read all the code at a sufficient level to look for anything out of place that might have been maliciously inserted. (You can’t realistically aim for 100% perfection here: there’s often just too much code.)

These are just guidelines — work with reviewers from security@chromium.org to work out the right way to become confident of the crate.

Checking Crates into Chromium Source Code

git status should reveal:

  • Crate code in //third_party/rust/chromium_crates_io
  • Metadata (BUILD.gn and README.chromium) in //third_party/rust/<crate>/<version>

Please also add an OWNERS file in the latter location.

You should land all this, along with your Cargo.toml and gnrt_config.toml changes, into the Chromium repo.

Important: you need to use git add -f because otherwise .gitignore files may result in some files being skipped.

As you do so, you might find presubmit checks fail because of non-inclusive language. This is because Rust crate data tends to include names of git branches, and many projects still use non-inclusive terminology there. So you may need to run:

infra/update_inclusive_language_presubmit_exempt_dirs.sh > infra/inclusive_language_presubmit_exempt_dirs.txt git add -p infra/inclusive_language_presubmit_exempt_dirs.txt # add whatever changes are yours

Keeping Crates Up to Date

As the OWNER of any third party Chromium dependency, you are expected to keep it up to date with any security fixes. It is hoped that we will soon automate this for Rust crates, but for now, it’s still your responsibility just as it is for any other third party dependency.

Exercise

Add uwuify to Chromium, turning off the crate’s default features. Assume that the crate will be used in shipping Chromium, but won’t be used to handle untrustworthy input.

(In the next exercise we’ll use uwuify from Chromium, but feel free to skip ahead and do that now if you like. Or, you could create a new rust_executable target which uses uwuify).

Speaker Notes

Students will need to download lots of transitive dependencies.

The total crates needed are:

  • instant,
  • lock_api,
  • parking_lot,
  • parking_lot_core,
  • redox_syscall,
  • scopeguard,
  • smallvec, and
  • uwuify.

If students are downloading even more than that, they probably forgot to turn off the default features.

Thanks to Daniel Liu for this crate!

Bringing It Together — Exercise

In this exercise, you’re going to add a whole new Chromium feature, bringing together everything you already learned.

The Brief from Product Management

A community of pixies has been discovered living in a remote rainforest. It’s important that we get Chromium for Pixies delivered to them as soon as possible.

The requirement is to translate all Chromium’s UI strings into Pixie language.

There’s not time to wait for proper translations, but fortunately pixie language is very close to English, and it turns out there’s a Rust crate which does the translation.

In fact, you already imported that crate in the previous exercise.

(Obviously, real translations of Chrome require incredible care and diligence. Don’t ship this!)

Steps

Modify ResourceBundle::MaybeMangleLocalizedString so that it uwuifies all strings before display. In this special build of Chromium, it should always do this irrespective of the setting of mangle_localized_strings_.

If you’ve done everything right across all these exercises, congratulations, you should have created Chrome for pixies!

Chromium UI screenshot with uwu language

Speaker Notes

Students will likely need some hints here. Hints include:
  • UTF16 vs UTF8. Students should be aware that Rust strings are always UTF8, and will probably decide that it’s better to do the conversion on the C++ side using base::UTF16ToUTF8 and back again.
  • If students decide to do the conversion on the Rust side, they’ll need to consider String::from_utf16, consider error handling, and consider which CXX supported types can transfer a lot of u16s.
  • Students may design the C++/Rust boundary in several different ways, e.g. taking and returning strings by value, or taking a mutable reference to a string. If a mutable reference is used, CXX will likely tell the student that they need to use Pin. You may need to explain what Pin does, and then explain why CXX needs it for mutable references to C++ data: the answer is that C++ data can’t be moved around like Rust data, because it may contain self-referential pointers.
  • The C++ target containing ResourceBundle::MaybeMangleLocalizedString will need to depend on a rust_static_library target. The student probably already did this.
  • The rust_static_library target will need to depend on //third_party/rust/uwuify/v0_2:lib.

Exercise Solutions

Solutions to the Chromium exercises can be found in this series of CLs.

Welcome to Bare Metal Rust

This is a standalone one-day course about bare-metal Rust, aimed at people who are familiar with the basics of Rust (perhaps from completing the Comprehensive Rust course), and ideally also have some experience with bare-metal programming in some other language such as C.

Today we will talk about ‘bare-metal’ Rust: running Rust code without an OS underneath us. This will be divided into several parts:

  • What is no_std Rust?
  • Writing firmware for microcontrollers.
  • Writing bootloader / kernel code for application processors.
  • Some useful crates for bare-metal Rust development.

For the microcontroller part of the course we will use the BBC micro:bit v2 as an example. It’s a development board based on the Nordic nRF51822 microcontroller with some LEDs and buttons, an I2C-connected accelerometer and compass, and an on-board SWD debugger.

To get started, install some tools we’ll need later. On gLinux or Debian:

sudo apt install gcc-aarch64-linux-gnu gdb-multiarch libudev-dev picocom pkg-config qemu-system-arm rustup update rustup target add aarch64-unknown-none thumbv7em-none-eabihf rustup component add llvm-tools-preview cargo install cargo-binutils cargo-embed

And give users in the plugdev group access to the micro:bit programmer:

echo 'SUBSYSTEM=="usb", ATTR{idVendor}=="0d28", MODE="0664", GROUP="plugdev"' |\ sudo tee /etc/udev/rules.d/50-microbit.rules sudo udevadm control --reload-rules

On MacOS:

xcode-select --install brew install gdb picocom qemu brew install --cask gcc-aarch64-embedded rustup update rustup target add aarch64-unknown-none thumbv7em-none-eabihf rustup component add llvm-tools-preview cargo install cargo-binutils cargo-embed

no_std

core

alloc

std

  • Slices, &str, CStr
  • NonZeroU8
  • Option, Result
  • Display, Debug, write!
  • Iterator
  • panic!, assert_eq!
  • NonNull and all the usual pointer-related functions
  • Future and async/await
  • fence, AtomicBool, AtomicPtr, AtomicU32
  • Duration
  • Box, Cow, Arc, Rc
  • Vec, BinaryHeap, BtreeMap, LinkedList, VecDeque
  • String, CString, format!
  • Error
  • HashMap
  • Mutex, Condvar, Barrier, Once, RwLock, mpsc
  • File and the rest of fs
  • println!, Read, Write, Stdin, Stdout and the rest of io
  • Path, OsString
  • net
  • Command, Child, ExitCode
  • spawn, sleep and the rest of thread
  • SystemTime, Instant

Speaker Notes

  • HashMap depends on RNG.
  • std re-exports the contents of both core and alloc.

A minimal no_std program

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • This will compile to an empty binary.
  • std provides a panic handler; without it we must provide our own.
  • It can also be provided by another crate, such as panic-halt.
  • Depending on the target, you may need to compile with panic = "abort" to avoid an error about eh_personality.
  • Note that there is no main or any other entry point; it’s up to you to define your own entry point. This will typically involve a linker script and some assembly code to set things up ready for Rust code to run.

alloc

To use alloc you must implement a global (heap) allocator.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • buddy_system_allocator is a third-party crate implementing a basic buddy system allocator. Other crates are available, or you can write your own or hook into your existing allocator.
  • The const parameter of LockedHeap is the max order of the allocator; i.e. in this case it can allocate regions of up to 2**32 bytes.
  • If any crate in your dependency tree depends on alloc then you must have exactly one global allocator defined in your binary. Usually this is done in the top-level binary crate.
  • extern crate panic_halt as _ is necessary to ensure that the panic_halt crate is linked in so we get its panic handler.
  • This example will build but not run, as it doesn’t have an entry point.

Microcontrollers

The cortex_m_rt crate provides (among other things) a reset handler for Cortex M microcontrollers.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Next we’ll look at how to access peripherals, with increasing levels of abstraction.

Speaker Notes

  • The cortex_m_rt::entry macro requires that the function have type fn() -> !, because returning to the reset handler doesn’t make sense.
  • Run the example with cargo embed --bin minimal

Raw MMIO

Most microcontrollers access peripherals via memory-mapped IO. Let’s try turning on an LED on our micro:bit:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • GPIO 0 pin 21 is connected to the first column of the LED matrix, and pin 28 to the first row.

Run the example with:

cargo embed --bin mmio

Peripheral Access Crates

svd2rust generates mostly-safe Rust wrappers for memory-mapped peripherals from CMSIS-SVD files.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • SVD (System View Description) files are XML files typically provided by silicon vendors which describe the memory map of the device.
    • They are organised by peripheral, register, field and value, with names, descriptions, addresses and so on.
    • SVD files are often buggy and incomplete, so there are various projects which patch the mistakes, add missing details, and publish the generated crates.
  • cortex-m-rt provides the vector table, among other things.
  • If you cargo install cargo-binutils then you can run cargo objdump --bin pac -- -d --no-show-raw-insn to see the resulting binary.

Run the example with:

cargo embed --bin pac

HAL crates

HAL crates for many microcontrollers provide wrappers around various peripherals. These generally implement traits from embedded-hal.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • set_low and set_high are methods on the embedded_hal OutputPin trait.
  • HAL crates exist for many Cortex-M and RISC-V devices, including various STM32, GD32, nRF, NXP, MSP430, AVR and PIC microcontrollers.

Run the example with:

cargo embed --bin hal

Board support crates

Board support crates provide a further level of wrapping for a specific board for convenience.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • In this case the board support crate is just providing more useful names, and a bit of initialisation.
  • The crate may also include drivers for some on-board devices outside of the microcontroller itself.
    • microbit-v2 includes a simple driver for the LED matrix.

Run the example with:

cargo embed --bin board_support

The type state pattern

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • Pins don’t implement Copy or Clone, so only one instance of each can exist. Once a pin is moved out of the port struct nobody else can take it.
  • Changing the configuration of a pin consumes the old pin instance, so you can’t keep use the old instance afterwards.
  • The type of a value indicates the state that it is in: e.g. in this case, the configuration state of a GPIO pin. This encodes the state machine into the type system, and ensures that you don’t try to use a pin in a certain way without properly configuring it first. Illegal state transitions are caught at compile time.
  • You can call is_high on an input pin and set_high on an output pin, but not vice-versa.
  • Many HAL crates follow this pattern.

embedded-hal

The embedded-hal crate provides a number of traits covering common microcontroller peripherals:

  • GPIO
  • PWM
  • Delay timers
  • I2C and SPI buses and devices

Similar traits for byte streams (e.g. UARTs), CAN buses and RNGs and broken out into embedded-io, embedded-can and rand_core respectively.

Other crates then implement drivers in terms of these traits, e.g. an accelerometer driver might need an I2C or SPI device instance.

Speaker Notes

  • The traits cover using the peripherals but not initialising or configuring them, as initialisation and configuration is usually highly platform-specific.
  • There are implementations for many microcontrollers, as well as other platforms such as Linux on Raspberry Pi.
  • embedded-hal-async provides async versions of the traits.
  • embedded-hal-nb provides another approach to non-blocking I/O, based on the nb crate.

probe-rs and cargo-embed

probe-rs is a handy toolset for embedded debugging, like OpenOCD but better integrated.

  • SWD (Serial Wire Debug) and JTAG via CMSIS-DAP, ST-Link and J-Link probes
  • GDB stub and Microsoft DAP (Debug Adapter Protocol) server
  • Cargo integration

cargo-embed is a cargo subcommand to build and flash binaries, log RTT (Real Time Transfers) output and connect GDB. It’s configured by an Embed.toml file in your project directory.

Speaker Notes

  • CMSIS-DAP is an Arm standard protocol over USB for an in-circuit debugger to access the CoreSight Debug Access Port of various Arm Cortex processors. It’s what the on-board debugger on the BBC micro:bit uses.
  • ST-Link is a range of in-circuit debuggers from ST Microelectronics, J-Link is a range from SEGGER.
  • The Debug Access Port is usually either a 5-pin JTAG interface or 2-pin Serial Wire Debug.
  • probe-rs is a library which you can integrate into your own tools if you want to.
  • The Microsoft Debug Adapter Protocol lets VSCode and other IDEs debug code running on any supported microcontroller.
  • cargo-embed is a binary built using the probe-rs library.
  • RTT (Real Time Transfers) is a mechanism to transfer data between the debug host and the target through a number of ringbuffers.

Debugging

Embed.toml:

[default.general] chip = "nrf52833_xxAA" [debug.gdb] enabled = true

In one terminal under src/bare-metal/microcontrollers/examples/:

cargo embed --bin board_support debug

In another terminal in the same directory:

On gLinux or Debian:

gdb-multiarch target/thumbv7em-none-eabihf/debug/board_support --eval-command="target remote :1337"

On MacOS:

arm-none-eabi-gdb target/thumbv7em-none-eabihf/debug/board_support --eval-command="target remote :1337"

Speaker Notes

In GDB, try running:

b src/bin/board_support.rs:29 b src/bin/board_support.rs:30 b src/bin/board_support.rs:32 c c c

Other projects

  • RTIC
    • “Real-Time Interrupt-driven Concurrency”
    • Shared resource management, message passing, task scheduling, timer queue
  • Embassy
    • async executors with priorities, timers, networking, USB
  • TockOS
    • Security-focused RTOS with preemptive scheduling and Memory Protection Unit support
  • Hubris
    • Microkernel RTOS from Oxide Computer Company with memory protection, unprivileged drivers, IPC
  • Bindings for FreeRTOS
  • Some platforms have std implementations, e.g. esp-idf.

Speaker Notes

  • RTIC can be considered either an RTOS or a concurrency framework.
    • It doesn’t include any HALs.
    • It uses the Cortex-M NVIC (Nested Virtual Interrupt Controller) for scheduling rather than a proper kernel.
    • Cortex-M only.
  • Google uses TockOS on the Haven microcontroller for Titan security keys.
  • FreeRTOS is mostly written in C, but there are Rust bindings for writing applications.

Exercises

We will read the direction from an I2C compass, and log the readings to a serial port.

Speaker Notes

After looking at the exercises, you can look at the solutions provided.

Compass

We will read the direction from an I2C compass, and log the readings to a serial port. If you have time, try displaying it on the LEDs somehow too, or use the buttons somehow.

Hints:

  • Check the documentation for the lsm303agr and microbit-v2 crates, as well as the micro:bit hardware.
  • The LSM303AGR Inertial Measurement Unit is connected to the internal I2C bus.
  • TWI is another name for I2C, so the I2C master peripheral is called TWIM.
  • The LSM303AGR driver needs something implementing the embedded_hal::blocking::i2c::WriteRead trait. The microbit::hal::Twim struct implements this.
  • You have a microbit::Board struct with fields for the various pins and peripherals.
  • You can also look at the nRF52833 datasheet if you want, but it shouldn’t be necessary for this exercise.

Download the exercise template and look in the compass directory for the following files.

src/main.rs:

#![no_main] #![no_std] extern crate panic_halt as _; use core::fmt::Write; use cortex_m_rt::entry; use microbit::{hal::uarte::{Baudrate, Parity, Uarte}, Board}; #[entry] fn main() -> ! { let mut board = Board::take().unwrap(); // Configure serial port. let mut serial = Uarte::new( board.UARTE0, board.uart.into(), Parity::EXCLUDED, Baudrate::BAUD115200, ); // Use the system timer as a delay provider. let mut delay = Delay::new(board.SYST); // Set up the I2C controller and Inertial Measurement Unit. // TODO writeln!(serial, "Ready.").unwrap(); loop { // Read compass data and log it to the serial port. // TODO } }

Cargo.toml (you shouldn’t need to change this):

[workspace] [package] name = "compass" version = "0.1.0" edition = "2021" publish = false [dependencies] cortex-m-rt = "0.7.4" embedded-hal = "1.0.0" lsm303agr = "1.0.0" microbit-v2 = "0.14.0" panic-halt = "0.2.0"

Embed.toml (you shouldn’t need to change this):

[default.general] chip = "nrf52833_xxAA" [debug.gdb] enabled = true [debug.reset] halt_afterwards = true

.cargo/config.toml (you shouldn’t need to change this):

[build] target = "thumbv7em-none-eabihf" # Cortex-M4F [target.'cfg(all(target_arch = "arm", target_os = "none"))'] rustflags = ["-C", "link-arg=-Tlink.x"]

See the serial output on Linux with:

picocom --baud 115200 --imap lfcrlf /dev/ttyACM0

Or on Mac OS something like (the device name may be slightly different):

picocom --baud 115200 --imap lfcrlf /dev/tty.usbmodem14502

Use Ctrl+A Ctrl+Q to quit picocom.

Bare Metal Rust Morning Exercise

Compass

(back to exercise)

#![no_main] #![no_std] extern crate panic_halt as _; use core::fmt::Write; use cortex_m_rt::entry; use core::cmp::{max, min}; use embedded_hal::digital::InputPin; use lsm303agr::{ AccelMode, AccelOutputDataRate, Lsm303agr, MagMode, MagOutputDataRate, }; use microbit::display::blocking::Display; use microbit::hal::twim::Twim; use microbit::hal::uarte::{Baudrate, Parity, Uarte}; use microbit::hal::{Delay, Timer}; use microbit::pac::twim0::frequency::FREQUENCY_A; use microbit::Board; const COMPASS_SCALE: i32 = 30000; const ACCELEROMETER_SCALE: i32 = 700; #[entry] fn main() -> ! { let mut board = Board::take().unwrap(); // Configure serial port. let mut serial = Uarte::new( board.UARTE0, board.uart.into(), Parity::EXCLUDED, Baudrate::BAUD115200, ); // Use the system timer as a delay provider. let mut delay = Delay::new(board.SYST); // Set up the I2C controller and Inertial Measurement Unit. writeln!(serial, "Setting up IMU...").unwrap(); let i2c = Twim::new(board.TWIM0, board.i2c_internal.into(), FREQUENCY_A::K100); let mut imu = Lsm303agr::new_with_i2c(i2c); imu.init().unwrap(); imu.set_mag_mode_and_odr( &mut delay, MagMode::HighResolution, MagOutputDataRate::Hz50, ) .unwrap(); imu.set_accel_mode_and_odr( &mut delay, AccelMode::Normal, AccelOutputDataRate::Hz50, ) .unwrap(); let mut imu = imu.into_mag_continuous().ok().unwrap(); // Set up display and timer. let mut timer = Timer::new(board.TIMER0); let mut display = Display::new(board.display_pins); let mut mode = Mode::Compass; let mut button_pressed = false; writeln!(serial, "Ready.").unwrap(); loop { // Read compass data and log it to the serial port. while !(imu.mag_status().unwrap().xyz_new_data() && imu.accel_status().unwrap().xyz_new_data()) {} let compass_reading = imu.magnetic_field().unwrap(); let accelerometer_reading = imu.acceleration().unwrap(); writeln!( serial, "{},{},{}\t{},{},{}", compass_reading.x_nt(), compass_reading.y_nt(), compass_reading.z_nt(), accelerometer_reading.x_mg(), accelerometer_reading.y_mg(), accelerometer_reading.z_mg(), ) .unwrap(); let mut image = [[0; 5]; 5]; let (x, y) = match mode { Mode::Compass => ( scale(-compass_reading.x_nt(), -COMPASS_SCALE, COMPASS_SCALE, 0, 4) as usize, scale(compass_reading.y_nt(), -COMPASS_SCALE, COMPASS_SCALE, 0, 4) as usize, ), Mode::Accelerometer => ( scale( accelerometer_reading.x_mg(), -ACCELEROMETER_SCALE, ACCELEROMETER_SCALE, 0, 4, ) as usize, scale( -accelerometer_reading.y_mg(), -ACCELEROMETER_SCALE, ACCELEROMETER_SCALE, 0, 4, ) as usize, ), }; image[y][x] = 255; display.show(&mut timer, image, 100); // If button A is pressed, switch to the next mode and briefly blink all LEDs // on. if board.buttons.button_a.is_low().unwrap() { if !button_pressed { mode = mode.next(); display.show(&mut timer, [[255; 5]; 5], 200); } button_pressed = true; } else { button_pressed = false; } } } #[derive(Copy, Clone, Debug, Eq, PartialEq)] enum Mode { Compass, Accelerometer, } impl Mode { fn next(self) -> Self { match self { Self::Compass => Self::Accelerometer, Self::Accelerometer => Self::Compass, } } } fn scale(value: i32, min_in: i32, max_in: i32, min_out: i32, max_out: i32) -> i32 { let range_in = max_in - min_in; let range_out = max_out - min_out; cap(min_out + range_out * (value - min_in) / range_in, min_out, max_out) } fn cap(value: i32, min_value: i32, max_value: i32) -> i32 { max(min_value, min(value, max_value)) }

Application processors

So far we’ve talked about microcontrollers, such as the Arm Cortex-M series. Now let’s try writing something for Cortex-A. For simplicity we’ll just work with QEMU’s aarch64 ‘virt’ board.

Speaker Notes

  • Broadly speaking, microcontrollers don’t have an MMU or multiple levels of privilege (exception levels on Arm CPUs, rings on x86), while application processors do.
  • QEMU supports emulating various different machines or board models for each architecture. The ‘virt’ board doesn’t correspond to any particular real hardware, but is designed purely for virtual machines.

Getting Ready to Rust

Before we can start running Rust code, we need to do some initialisation.

.section .init.entry, "ax" .global entry entry: /* * Load and apply the memory management configuration, ready to enable MMU and * caches. */ adrp x30, idmap msr ttbr0_el1, x30 mov_i x30, .Lmairval msr mair_el1, x30 mov_i x30, .Ltcrval /* Copy the supported PA range into TCR_EL1.IPS. */ mrs x29, id_aa64mmfr0_el1 bfi x30, x29, #32, #4 msr tcr_el1, x30 mov_i x30, .Lsctlrval /* * Ensure everything before this point has completed, then invalidate any * potentially stale local TLB entries before they start being used. */ isb tlbi vmalle1 ic iallu dsb nsh isb /* * Configure sctlr_el1 to enable MMU and cache and don't proceed until this * has completed. */ msr sctlr_el1, x30 isb /* Disable trapping floating point access in EL1. */ mrs x30, cpacr_el1 orr x30, x30, #(0x3 << 20) msr cpacr_el1, x30 isb /* Zero out the bss section. */ adr_l x29, bss_begin adr_l x30, bss_end 0: cmp x29, x30 b.hs 1f stp xzr, xzr, [x29], #16 b 0b 1: /* Prepare the stack. */ adr_l x30, boot_stack_end mov sp, x30 /* Set up exception vector. */ adr x30, vector_table_el1 msr vbar_el1, x30 /* Call into Rust code. */ bl main /* Loop forever waiting for interrupts. */ 2: wfi b 2b

Speaker Notes

  • This is the same as it would be for C: initialising the processor state, zeroing the BSS, and setting up the stack pointer.
    • The BSS (block starting symbol, for historical reasons) is the part of the object file which containing statically allocated variables which are initialised to zero. They are omitted from the image, to avoid wasting space on zeroes. The compiler assumes that the loader will take care of zeroing them.
  • The BSS may already be zeroed, depending on how memory is initialised and the image is loaded, but we zero it to be sure.
  • We need to enable the MMU and cache before reading or writing any memory. If we don’t:
    • Unaligned accesses will fault. We build the Rust code for the aarch64-unknown-none target which sets +strict-align to prevent the compiler generating unaligned accesses, so it should be fine in this case, but this is not necessarily the case in general.
    • If it were running in a VM, this can lead to cache coherency issues. The problem is that the VM is accessing memory directly with the cache disabled, while the host has cacheable aliases to the same memory. Even if the host doesn’t explicitly access the memory, speculative accesses can lead to cache fills, and then changes from one or the other will get lost when the cache is cleaned or the VM enables the cache. (Cache is keyed by physical address, not VA or IPA.)
  • For simplicity, we just use a hardcoded pagetable (see idmap.S) which identity maps the first 1 GiB of address space for devices, the next 1 GiB for DRAM, and another 1 GiB higher up for more devices. This matches the memory layout that QEMU uses.
  • We also set up the exception vector (vbar_el1), which we’ll see more about later.
  • All examples this afternoon assume we will be running at exception level 1 (EL1). If you need to run at a different exception level you’ll need to modify entry.S accordingly.

Inline assembly

Sometimes we need to use assembly to do things that aren’t possible with Rust code. For example, to make an HVC (hypervisor call) to tell the firmware to power off the system:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

(If you actually want to do this, use the smccc crate which has wrappers for all these functions.)

Speaker Notes

  • PSCI is the Arm Power State Coordination Interface, a standard set of functions to manage system and CPU power states, among other things. It is implemented by EL3 firmware and hypervisors on many systems.
  • The 0 => _ syntax means initialise the register to 0 before running the inline assembly code, and ignore its contents afterwards. We need to use inout rather than in because the call could potentially clobber the contents of the registers.
  • This main function needs to be #[no_mangle] and extern "C" because it is called from our entry point in entry.S.
  • _x0_x3 are the values of registers x0x3, which are conventionally used by the bootloader to pass things like a pointer to the device tree. According to the standard aarch64 calling convention (which is what extern "C" specifies to use), registers x0x7 are used for the first 8 arguments passed to a function, so entry.S doesn’t need to do anything special except make sure it doesn’t change these registers.
  • Run the example in QEMU with make qemu_psci under src/bare-metal/aps/examples.

Volatile memory access for MMIO

  • Use pointer::read_volatile and pointer::write_volatile.
  • Never hold a reference.
  • addr_of! lets you get fields of structs without creating an intermediate reference.

Speaker Notes

  • Volatile access: read or write operations may have side-effects, so prevent the compiler or hardware from reordering, duplicating or eliding them.
    • Usually if you write and then read, e.g. via a mutable reference, the compiler may assume that the value read is the same as the value just written, and not bother actually reading memory.
  • Some existing crates for volatile access to hardware do hold references, but this is unsound. Whenever a reference exist, the compiler may choose to dereference it.
  • Use the addr_of! macro to get struct field pointers from a pointer to the struct.

Let’s write a UART driver

The QEMU ‘virt’ machine has a PL011 UART, so let’s write a driver for that.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • Note that Uart::new is unsafe while the other methods are safe. This is because as long as the caller of Uart::new guarantees that its safety requirements are met (i.e. that there is only ever one instance of the driver for a given UART, and nothing else aliasing its address space), then it is always safe to call write_byte later because we can assume the necessary preconditions.
  • We could have done it the other way around (making new safe but write_byte unsafe), but that would be much less convenient to use as every place that calls write_byte would need to reason about the safety
  • This is a common pattern for writing safe wrappers of unsafe code: moving the burden of proof for soundness from a large number of places to a smaller number of places.

More traits

We derived the Debug trait. It would be useful to implement a few more traits too.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • Implementing Write lets us use the write! and writeln! macros with our Uart type.
  • Run the example in QEMU with make qemu_minimal under src/bare-metal/aps/examples.

A better UART driver

The PL011 actually has a bunch more registers, and adding offsets to construct pointers to access them is error-prone and hard to read. Plus, some of them are bit fields which would be nice to access in a structured way.

OffsetRegister nameWidth
0x00DR12
0x04RSR4
0x18FR9
0x20ILPR8
0x24IBRD16
0x28FBRD6
0x2cLCR_H8
0x30CR16
0x34IFLS6
0x38IMSC11
0x3cRIS11
0x40MIS11
0x44ICR11
0x48DMACR3

Speaker Notes

  • There are also some ID registers which have been omitted for brevity.

Bitflags

The bitflags crate is useful for working with bitflags.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • The bitflags! macro creates a newtype something like Flags(u16), along with a bunch of method implementations to get and set flags.

Multiple registers

We can use a struct to represent the memory layout of the UART’s registers.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • #[repr(C)] tells the compiler to lay the struct fields out in order, following the same rules as C. This is necessary for our struct to have a predictable layout, as default Rust representation allows the compiler to (among other things) reorder fields however it sees fit.

Driver

Now let’s use the new Registers struct in our driver.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • Note the use of addr_of! / addr_of_mut! to get pointers to individual fields without creating an intermediate reference, which would be unsound.

Using it

Let’s write a small program using our driver to write to the serial console, and echo incoming bytes.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • As in the inline assembly example, this main function is called from our entry point code in entry.S. See the speaker notes there for details.
  • Run the example in QEMU with make qemu under src/bare-metal/aps/examples.

Logging

It would be nice to be able to use the logging macros from the log crate. We can do this by implementing the Log trait.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • The unwrap in log is safe because we initialise LOGGER before calling set_logger.

Using it

We need to initialise the logger before we use it.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • Note that our panic handler can now log details of panics.
  • Run the example in QEMU with make qemu_logger under src/bare-metal/aps/examples.

Exceptions

AArch64 defines an exception vector table with 16 entries, for 4 types of exceptions (synchronous, IRQ, FIQ, SError) from 4 states (current EL with SP0, current EL with SPx, lower EL using AArch64, lower EL using AArch32). We implement this in assembly to save volatile registers to the stack before calling into Rust code:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • EL is exception level; all our examples this afternoon run in EL1.
  • For simplicity we aren’t distinguishing between SP0 and SPx for the current EL exceptions, or between AArch32 and AArch64 for the lower EL exceptions.
  • For this example we just log the exception and power down, as we don’t expect any of them to actually happen.
  • We can think of exception handlers and our main execution context more or less like different threads. Send and Sync will control what we can share between them, just like with threads. For example, if we want to share some value between exception handlers and the rest of the program, and it’s Send but not Sync, then we’ll need to wrap it in something like a Mutex and put it in a static.

Other projects

  • oreboot
    • “coreboot without the C”
    • Supports x86, aarch64 and RISC-V.
    • Relies on LinuxBoot rather than having many drivers itself.
  • Rust RaspberryPi OS tutorial
    • Initialisation, UART driver, simple bootloader, JTAG, exception levels, exception handling, page tables
    • Some dodginess around cache maintenance and initialisation in Rust, not necessarily a good example to copy for production code.
  • cargo-call-stack
    • Static analysis to determine maximum stack usage.

Speaker Notes

  • The RaspberryPi OS tutorial runs Rust code before the MMU and caches are enabled. This will read and write memory (e.g. the stack). However:
    • Without the MMU and cache, unaligned accesses will fault. It builds with aarch64-unknown-none which sets +strict-align to prevent the compiler generating unaligned accesses so it should be alright, but this is not necessarily the case in general.
    • If it were running in a VM, this can lead to cache coherency issues. The problem is that the VM is accessing memory directly with the cache disabled, while the host has cacheable aliases to the same memory. Even if the host doesn’t explicitly access the memory, speculative accesses can lead to cache fills, and then changes from one or the other will get lost. Again this is alright in this particular case (running directly on the hardware with no hypervisor), but isn’t a good pattern in general.

Useful crates

We’ll go over a few crates which solve some common problems in bare-metal programming.

zerocopy

The zerocopy crate (from Fuchsia) provides traits and macros for safely converting between byte sequences and other types.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

This is not suitable for MMIO (as it doesn’t use volatile reads and writes), but can be useful for working with structures shared with hardware e.g. by DMA, or sent over some external interface.

Speaker Notes

  • FromBytes can be implemented for types for which any byte pattern is valid, and so can safely be converted from an untrusted sequence of bytes.
  • Attempting to derive FromBytes for these types would fail, because RequestType doesn’t use all possible u32 values as discriminants, so not all byte patterns are valid.
  • zerocopy::byteorder has types for byte-order aware numeric primitives.
  • Run the example with cargo run under src/bare-metal/useful-crates/zerocopy-example/. (It won’t run in the Playground because of the crate dependency.)

aarch64-paging

The aarch64-paging crate lets you create page tables according to the AArch64 Virtual Memory System Architecture.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • For now it only supports EL1, but support for other exception levels should be straightforward to add.
  • This is used in Android for the Protected VM Firmware.
  • There’s no easy way to run this example, as it needs to run on real hardware or under QEMU.

buddy_system_allocator

buddy_system_allocator is a third-party crate implementing a basic buddy system allocator. It can be used both for LockedHeap implementing GlobalAlloc so you can use the standard alloc crate (as we saw before), or for allocating other address space. For example, we might want to allocate MMIO space for PCI BARs:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • PCI BARs always have alignment equal to their size.
  • Run the example with cargo run under src/bare-metal/useful-crates/allocator-example/. (It won’t run in the Playground because of the crate dependency.)

tinyvec

Sometimes you want something which can be resized like a Vec, but without heap allocation. tinyvec provides this: a vector backed by an array or slice, which could be statically allocated or on the stack, which keeps track of how many elements are used and panics if you try to use more than are allocated.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • tinyvec requires that the element type implement Default for initialisation.
  • The Rust Playground includes tinyvec, so this example will run fine inline.

spin

std::sync::Mutex and the other synchronisation primitives from std::sync are not available in core or alloc. How can we manage synchronisation or interior mutability, such as for sharing state between different CPUs?

The spin crate provides spinlock-based equivalents of many of these primitives.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • Be careful to avoid deadlock if you take locks in interrupt handlers.
  • spin also has a ticket lock mutex implementation; equivalents of RwLock, Barrier and Once from std::sync; and Lazy for lazy initialisation.
  • The once_cell crate also has some useful types for late initialisation with a slightly different approach to spin::once::Once.
  • The Rust Playground includes spin, so this example will run fine inline.

Android

To build a bare-metal Rust binary in AOSP, you need to use a rust_ffi_static Soong rule to build your Rust code, then a cc_binary with a linker script to produce the binary itself, and then a raw_binary to convert the ELF to a raw binary ready to be run.

rust_ffi_static { name: "libvmbase_example", defaults: ["vmbase_ffi_defaults"], crate_name: "vmbase_example", srcs: ["src/main.rs"], rustlibs: [ "libvmbase", ], } cc_binary { name: "vmbase_example", defaults: ["vmbase_elf_defaults"], srcs: [ "idmap.S", ], static_libs: [ "libvmbase_example", ], linker_scripts: [ "image.ld", ":vmbase_sections", ], } raw_binary { name: "vmbase_example_bin", stem: "vmbase_example.bin", src: ":vmbase_example", enabled: false, target: { android_arm64: { enabled: true, }, }, }

vmbase

For VMs running under crosvm on aarch64, the vmbase library provides a linker script and useful defaults for the build rules, along with an entry point, UART console logging and more.

#![no_main] #![no_std] use vmbase::{main, println}; main!(main); pub fn main(arg0: u64, arg1: u64, arg2: u64, arg3: u64) { println!("Hello world"); }

Speaker Notes

  • The main! macro marks your main function, to be called from the vmbase entry point.
  • The vmbase entry point handles console initialisation, and issues a PSCI_SYSTEM_OFF to shutdown the VM if your main function returns.

Exercises

We will write a driver for the PL031 real-time clock device.

Speaker Notes

After looking at the exercises, you can look at the solutions provided.

RTC driver

The QEMU aarch64 virt machine has a PL031 real-time clock at 0x9010000. For this exercise, you should write a driver for it.

  1. Use it to print the current time to the serial console. You can use the chrono crate for date/time formatting.
  2. Use the match register and raw interrupt status to busy-wait until a given time, e.g. 3 seconds in the future. (Call core::hint::spin_loop inside the loop.)
  3. Extension if you have time: Enable and handle the interrupt generated by the RTC match. You can use the driver provided in the arm-gic crate to configure the Arm Generic Interrupt Controller.
    • Use the RTC interrupt, which is wired to the GIC as IntId::spi(2).
    • Once the interrupt is enabled, you can put the core to sleep via arm_gic::wfi(), which will cause the core to sleep until it receives an interrupt.

Download the exercise template and look in the rtc directory for the following files.

src/main.rs:

#![no_main] #![no_std] mod exceptions; mod logger; mod pl011; use crate::pl011::Uart; use arm_gic::gicv3::GicV3; use core::panic::PanicInfo; use log::{error, info, trace, LevelFilter}; use smccc::psci::system_off; use smccc::Hvc; /// Base addresses of the GICv3. const GICD_BASE_ADDRESS: *mut u64 = 0x800_0000 as _; const GICR_BASE_ADDRESS: *mut u64 = 0x80A_0000 as _; /// Base address of the primary PL011 UART. const PL011_BASE_ADDRESS: *mut u32 = 0x900_0000 as _; #[no_mangle] extern "C" fn main(x0: u64, x1: u64, x2: u64, x3: u64) { // SAFETY: `PL011_BASE_ADDRESS` is the base address of a PL011 device, and // nothing else accesses that address range. let uart = unsafe { Uart::new(PL011_BASE_ADDRESS) }; logger::init(uart, LevelFilter::Trace).unwrap(); info!("main({:#x}, {:#x}, {:#x}, {:#x})", x0, x1, x2, x3); // SAFETY: `GICD_BASE_ADDRESS` and `GICR_BASE_ADDRESS` are the base // addresses of a GICv3 distributor and redistributor respectively, and // nothing else accesses those address ranges. let mut gic = unsafe { GicV3::new(GICD_BASE_ADDRESS, GICR_BASE_ADDRESS) }; gic.setup(); // TODO: Create instance of RTC driver and print current time. // TODO: Wait for 3 seconds. system_off::<Hvc>().unwrap(); } #[panic_handler] fn panic(info: &PanicInfo) -> ! { error!("{info}"); system_off::<Hvc>().unwrap(); loop {} }

src/exceptions.rs (you should only need to change this for the 3rd part of the exercise):

#![allow(unused)] fn main() { // Copyright 2023 Google LLC // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. use arm_gic::gicv3::GicV3; use log::{error, info, trace}; use smccc::psci::system_off; use smccc::Hvc; #[no_mangle] extern "C" fn sync_exception_current(_elr: u64, _spsr: u64) { error!("sync_exception_current"); system_off::<Hvc>().unwrap(); } #[no_mangle] extern "C" fn irq_current(_elr: u64, _spsr: u64) { trace!("irq_current"); let intid = GicV3::get_and_acknowledge_interrupt().expect("No pending interrupt"); info!("IRQ {intid:?}"); } #[no_mangle] extern "C" fn fiq_current(_elr: u64, _spsr: u64) { error!("fiq_current"); system_off::<Hvc>().unwrap(); } #[no_mangle] extern "C" fn serr_current(_elr: u64, _spsr: u64) { error!("serr_current"); system_off::<Hvc>().unwrap(); } #[no_mangle] extern "C" fn sync_lower(_elr: u64, _spsr: u64) { error!("sync_lower"); system_off::<Hvc>().unwrap(); } #[no_mangle] extern "C" fn irq_lower(_elr: u64, _spsr: u64) { error!("irq_lower"); system_off::<Hvc>().unwrap(); } #[no_mangle] extern "C" fn fiq_lower(_elr: u64, _spsr: u64) { error!("fiq_lower"); system_off::<Hvc>().unwrap(); } #[no_mangle] extern "C" fn serr_lower(_elr: u64, _spsr: u64) { error!("serr_lower"); system_off::<Hvc>().unwrap(); } }

src/logger.rs (you shouldn’t need to change this):

#![allow(unused)] fn main() { // Copyright 2023 Google LLC // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // ANCHOR: main use crate::pl011::Uart; use core::fmt::Write; use log::{LevelFilter, Log, Metadata, Record, SetLoggerError}; use spin::mutex::SpinMutex; static LOGGER: Logger = Logger { uart: SpinMutex::new(None) }; struct Logger { uart: SpinMutex<Option<Uart>>, } impl Log for Logger { fn enabled(&self, _metadata: &Metadata) -> bool { true } fn log(&self, record: &Record) { writeln!( self.uart.lock().as_mut().unwrap(), "[{}] {}", record.level(), record.args() ) .unwrap(); } fn flush(&self) {} } /// Initialises UART logger. pub fn init(uart: Uart, max_level: LevelFilter) -> Result<(), SetLoggerError> { LOGGER.uart.lock().replace(uart); log::set_logger(&LOGGER)?; log::set_max_level(max_level); Ok(()) } }

src/pl011.rs (you shouldn’t need to change this):

#![allow(unused)] fn main() { // Copyright 2023 Google LLC // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. #![allow(unused)] use core::fmt::{self, Write}; use core::ptr::{addr_of, addr_of_mut}; // ANCHOR: Flags use bitflags::bitflags; bitflags! { /// Flags from the UART flag register. #[repr(transparent)] #[derive(Copy, Clone, Debug, Eq, PartialEq)] struct Flags: u16 { /// Clear to send. const CTS = 1 << 0; /// Data set ready. const DSR = 1 << 1; /// Data carrier detect. const DCD = 1 << 2; /// UART busy transmitting data. const BUSY = 1 << 3; /// Receive FIFO is empty. const RXFE = 1 << 4; /// Transmit FIFO is full. const TXFF = 1 << 5; /// Receive FIFO is full. const RXFF = 1 << 6; /// Transmit FIFO is empty. const TXFE = 1 << 7; /// Ring indicator. const RI = 1 << 8; } } // ANCHOR_END: Flags bitflags! { /// Flags from the UART Receive Status Register / Error Clear Register. #[repr(transparent)] #[derive(Copy, Clone, Debug, Eq, PartialEq)] struct ReceiveStatus: u16 { /// Framing error. const FE = 1 << 0; /// Parity error. const PE = 1 << 1; /// Break error. const BE = 1 << 2; /// Overrun error. const OE = 1 << 3; } } // ANCHOR: Registers #[repr(C, align(4))] struct Registers { dr: u16, _reserved0: [u8; 2], rsr: ReceiveStatus, _reserved1: [u8; 19], fr: Flags, _reserved2: [u8; 6], ilpr: u8, _reserved3: [u8; 3], ibrd: u16, _reserved4: [u8; 2], fbrd: u8, _reserved5: [u8; 3], lcr_h: u8, _reserved6: [u8; 3], cr: u16, _reserved7: [u8; 3], ifls: u8, _reserved8: [u8; 3], imsc: u16, _reserved9: [u8; 2], ris: u16, _reserved10: [u8; 2], mis: u16, _reserved11: [u8; 2], icr: u16, _reserved12: [u8; 2], dmacr: u8, _reserved13: [u8; 3], } // ANCHOR_END: Registers // ANCHOR: Uart /// Driver for a PL011 UART. #[derive(Debug)] pub struct Uart { registers: *mut Registers, } impl Uart { /// Constructs a new instance of the UART driver for a PL011 device at the /// given base address. /// /// # Safety /// /// The given base address must point to the MMIO control registers of a /// PL011 device, which must be mapped into the address space of the process /// as device memory and not have any other aliases. pub unsafe fn new(base_address: *mut u32) -> Self { Self { registers: base_address as *mut Registers } } /// Writes a single byte to the UART. pub fn write_byte(&self, byte: u8) { // Wait until there is room in the TX buffer. while self.read_flag_register().contains(Flags::TXFF) {} // SAFETY: We know that self.registers points to the control registers // of a PL011 device which is appropriately mapped. unsafe { // Write to the TX buffer. addr_of_mut!((*self.registers).dr).write_volatile(byte.into()); } // Wait until the UART is no longer busy. while self.read_flag_register().contains(Flags::BUSY) {} } /// Reads and returns a pending byte, or `None` if nothing has been /// received. pub fn read_byte(&self) -> Option<u8> { if self.read_flag_register().contains(Flags::RXFE) { None } else { // SAFETY: We know that self.registers points to the control // registers of a PL011 device which is appropriately mapped. let data = unsafe { addr_of!((*self.registers).dr).read_volatile() }; // TODO: Check for error conditions in bits 8-11. Some(data as u8) } } fn read_flag_register(&self) -> Flags { // SAFETY: We know that self.registers points to the control registers // of a PL011 device which is appropriately mapped. unsafe { addr_of!((*self.registers).fr).read_volatile() } } } // ANCHOR_END: Uart impl Write for Uart { fn write_str(&mut self, s: &str) -> fmt::Result { for c in s.as_bytes() { self.write_byte(*c); } Ok(()) } } // Safe because it just contains a pointer to device memory, which can be // accessed from any context. unsafe impl Send for Uart {} }

Cargo.toml (you shouldn’t need to change this):

[workspace] [package] name = "rtc" version = "0.1.0" edition = "2021" publish = false [dependencies] arm-gic = "0.1.0" bitflags = "2.5.0" chrono = { version = "0.4.38", default-features = false } log = "0.4.21" smccc = "0.1.1" spin = "0.9.8" [build-dependencies] cc = "1.0.98"

build.rs (you shouldn’t need to change this):

// Copyright 2023 Google LLC // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. use cc::Build; use std::env; fn main() { #[cfg(target_os = "linux")] env::set_var("CROSS_COMPILE", "aarch64-linux-gnu"); #[cfg(not(target_os = "linux"))] env::set_var("CROSS_COMPILE", "aarch64-none-elf"); Build::new() .file("entry.S") .file("exceptions.S") .file("idmap.S") .compile("empty") }

entry.S (you shouldn’t need to change this):

/* * Copyright 2023 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * https://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ .macro adr_l, reg:req, sym:req adrp \reg, \sym add \reg, \reg, :lo12:\sym .endm .macro mov_i, reg:req, imm:req movz \reg, :abs_g3:\imm movk \reg, :abs_g2_nc:\imm movk \reg, :abs_g1_nc:\imm movk \reg, :abs_g0_nc:\imm .endm .set .L_MAIR_DEV_nGnRE, 0x04 .set .L_MAIR_MEM_WBWA, 0xff .set .Lmairval, .L_MAIR_DEV_nGnRE | (.L_MAIR_MEM_WBWA << 8) /* 4 KiB granule size for TTBR0_EL1. */ .set .L_TCR_TG0_4KB, 0x0 << 14 /* 4 KiB granule size for TTBR1_EL1. */ .set .L_TCR_TG1_4KB, 0x2 << 30 /* Disable translation table walk for TTBR1_EL1, generating a translation fault instead. */ .set .L_TCR_EPD1, 0x1 << 23 /* Translation table walks for TTBR0_EL1 are inner sharable. */ .set .L_TCR_SH_INNER, 0x3 << 12 /* * Translation table walks for TTBR0_EL1 are outer write-back read-allocate write-allocate * cacheable. */ .set .L_TCR_RGN_OWB, 0x1 << 10 /* * Translation table walks for TTBR0_EL1 are inner write-back read-allocate write-allocate * cacheable. */ .set .L_TCR_RGN_IWB, 0x1 << 8 /* Size offset for TTBR0_EL1 is 2**39 bytes (512 GiB). */ .set .L_TCR_T0SZ_512, 64 - 39 .set .Ltcrval, .L_TCR_TG0_4KB | .L_TCR_TG1_4KB | .L_TCR_EPD1 | .L_TCR_RGN_OWB .set .Ltcrval, .Ltcrval | .L_TCR_RGN_IWB | .L_TCR_SH_INNER | .L_TCR_T0SZ_512 /* Stage 1 instruction access cacheability is unaffected. */ .set .L_SCTLR_ELx_I, 0x1 << 12 /* SP alignment fault if SP is not aligned to a 16 byte boundary. */ .set .L_SCTLR_ELx_SA, 0x1 << 3 /* Stage 1 data access cacheability is unaffected. */ .set .L_SCTLR_ELx_C, 0x1 << 2 /* EL0 and EL1 stage 1 MMU enabled. */ .set .L_SCTLR_ELx_M, 0x1 << 0 /* Privileged Access Never is unchanged on taking an exception to EL1. */ .set .L_SCTLR_EL1_SPAN, 0x1 << 23 /* SETEND instruction disabled at EL0 in aarch32 mode. */ .set .L_SCTLR_EL1_SED, 0x1 << 8 /* Various IT instructions are disabled at EL0 in aarch32 mode. */ .set .L_SCTLR_EL1_ITD, 0x1 << 7 .set .L_SCTLR_EL1_RES1, (0x1 << 11) | (0x1 << 20) | (0x1 << 22) | (0x1 << 28) | (0x1 << 29) .set .Lsctlrval, .L_SCTLR_ELx_M | .L_SCTLR_ELx_C | .L_SCTLR_ELx_SA | .L_SCTLR_EL1_ITD | .L_SCTLR_EL1_SED .set .Lsctlrval, .Lsctlrval | .L_SCTLR_ELx_I | .L_SCTLR_EL1_SPAN | .L_SCTLR_EL1_RES1 /** * This is a generic entry point for an image. It carries out the operations required to prepare the * loaded image to be run. Specifically, it zeroes the bss section using registers x25 and above, * prepares the stack, enables floating point, and sets up the exception vector. It preserves x0-x3 * for the Rust entry point, as these may contain boot parameters. */ .section .init.entry, "ax" .global entry entry: /* Load and apply the memory management configuration, ready to enable MMU and caches. */ adrp x30, idmap msr ttbr0_el1, x30 mov_i x30, .Lmairval msr mair_el1, x30 mov_i x30, .Ltcrval /* Copy the supported PA range into TCR_EL1.IPS. */ mrs x29, id_aa64mmfr0_el1 bfi x30, x29, #32, #4 msr tcr_el1, x30 mov_i x30, .Lsctlrval /* * Ensure everything before this point has completed, then invalidate any potentially stale * local TLB entries before they start being used. */ isb tlbi vmalle1 ic iallu dsb nsh isb /* * Configure sctlr_el1 to enable MMU and cache and don't proceed until this has completed. */ msr sctlr_el1, x30 isb /* Disable trapping floating point access in EL1. */ mrs x30, cpacr_el1 orr x30, x30, #(0x3 << 20) msr cpacr_el1, x30 isb /* Zero out the bss section. */ adr_l x29, bss_begin adr_l x30, bss_end 0: cmp x29, x30 b.hs 1f stp xzr, xzr, [x29], #16 b 0b 1: /* Prepare the stack. */ adr_l x30, boot_stack_end mov sp, x30 /* Set up exception vector. */ adr x30, vector_table_el1 msr vbar_el1, x30 /* Call into Rust code. */ bl main /* Loop forever waiting for interrupts. */ 2: wfi b 2b

exceptions.S (you shouldn’t need to change this):

/* * Copyright 2023 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * https://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ /** * Saves the volatile registers onto the stack. This currently takes 14 * instructions, so it can be used in exception handlers with 18 instructions * left. * * On return, x0 and x1 are initialised to elr_el2 and spsr_el2 respectively, * which can be used as the first and second arguments of a subsequent call. */ .macro save_volatile_to_stack /* Reserve stack space and save registers x0-x18, x29 & x30. */ stp x0, x1, [sp, #-(8 * 24)]! stp x2, x3, [sp, #8 * 2] stp x4, x5, [sp, #8 * 4] stp x6, x7, [sp, #8 * 6] stp x8, x9, [sp, #8 * 8] stp x10, x11, [sp, #8 * 10] stp x12, x13, [sp, #8 * 12] stp x14, x15, [sp, #8 * 14] stp x16, x17, [sp, #8 * 16] str x18, [sp, #8 * 18] stp x29, x30, [sp, #8 * 20] /* * Save elr_el1 & spsr_el1. This such that we can take nested exception * and still be able to unwind. */ mrs x0, elr_el1 mrs x1, spsr_el1 stp x0, x1, [sp, #8 * 22] .endm /** * Restores the volatile registers from the stack. This currently takes 14 * instructions, so it can be used in exception handlers while still leaving 18 * instructions left; if paired with save_volatile_to_stack, there are 4 * instructions to spare. */ .macro restore_volatile_from_stack /* Restore registers x2-x18, x29 & x30. */ ldp x2, x3, [sp, #8 * 2] ldp x4, x5, [sp, #8 * 4] ldp x6, x7, [sp, #8 * 6] ldp x8, x9, [sp, #8 * 8] ldp x10, x11, [sp, #8 * 10] ldp x12, x13, [sp, #8 * 12] ldp x14, x15, [sp, #8 * 14] ldp x16, x17, [sp, #8 * 16] ldr x18, [sp, #8 * 18] ldp x29, x30, [sp, #8 * 20] /* Restore registers elr_el1 & spsr_el1, using x0 & x1 as scratch. */ ldp x0, x1, [sp, #8 * 22] msr elr_el1, x0 msr spsr_el1, x1 /* Restore x0 & x1, and release stack space. */ ldp x0, x1, [sp], #8 * 24 .endm /** * This is a generic handler for exceptions taken at the current EL while using * SP0. It behaves similarly to the SPx case by first switching to SPx, doing * the work, then switching back to SP0 before returning. * * Switching to SPx and calling the Rust handler takes 16 instructions. To * restore and return we need an additional 16 instructions, so we can implement * the whole handler within the allotted 32 instructions. */ .macro current_exception_sp0 handler:req msr spsel, #1 save_volatile_to_stack bl \handler restore_volatile_from_stack msr spsel, #0 eret .endm /** * This is a generic handler for exceptions taken at the current EL while using * SPx. It saves volatile registers, calls the Rust handler, restores volatile * registers, then returns. * * This also works for exceptions taken from EL0, if we don't care about * non-volatile registers. * * Saving state and jumping to the Rust handler takes 15 instructions, and * restoring and returning also takes 15 instructions, so we can fit the whole * handler in 30 instructions, under the limit of 32. */ .macro current_exception_spx handler:req save_volatile_to_stack bl \handler restore_volatile_from_stack eret .endm .section .text.vector_table_el1, "ax" .global vector_table_el1 .balign 0x800 vector_table_el1: sync_cur_sp0: current_exception_sp0 sync_exception_current .balign 0x80 irq_cur_sp0: current_exception_sp0 irq_current .balign 0x80 fiq_cur_sp0: current_exception_sp0 fiq_current .balign 0x80 serr_cur_sp0: current_exception_sp0 serr_current .balign 0x80 sync_cur_spx: current_exception_spx sync_exception_current .balign 0x80 irq_cur_spx: current_exception_spx irq_current .balign 0x80 fiq_cur_spx: current_exception_spx fiq_current .balign 0x80 serr_cur_spx: current_exception_spx serr_current .balign 0x80 sync_lower_64: current_exception_spx sync_lower .balign 0x80 irq_lower_64: current_exception_spx irq_lower .balign 0x80 fiq_lower_64: current_exception_spx fiq_lower .balign 0x80 serr_lower_64: current_exception_spx serr_lower .balign 0x80 sync_lower_32: current_exception_spx sync_lower .balign 0x80 irq_lower_32: current_exception_spx irq_lower .balign 0x80 fiq_lower_32: current_exception_spx fiq_lower .balign 0x80 serr_lower_32: current_exception_spx serr_lower

idmap.S (you shouldn’t need to change this):

/* * Copyright 2023 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * https://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ .set .L_TT_TYPE_BLOCK, 0x1 .set .L_TT_TYPE_PAGE, 0x3 .set .L_TT_TYPE_TABLE, 0x3 /* Access flag. */ .set .L_TT_AF, 0x1 << 10 /* Not global. */ .set .L_TT_NG, 0x1 << 11 .set .L_TT_XN, 0x3 << 53 .set .L_TT_MT_DEV, 0x0 << 2 // MAIR #0 (DEV_nGnRE) .set .L_TT_MT_MEM, (0x1 << 2) | (0x3 << 8) // MAIR #1 (MEM_WBWA), inner shareable .set .L_BLOCK_DEV, .L_TT_TYPE_BLOCK | .L_TT_MT_DEV | .L_TT_AF | .L_TT_XN .set .L_BLOCK_MEM, .L_TT_TYPE_BLOCK | .L_TT_MT_MEM | .L_TT_AF | .L_TT_NG .section ".rodata.idmap", "a", %progbits .global idmap .align 12 idmap: /* level 1 */ .quad .L_BLOCK_DEV | 0x0 // 1 GiB of device mappings .quad .L_BLOCK_MEM | 0x40000000 // 1 GiB of DRAM .fill 254, 8, 0x0 // 254 GiB of unmapped VA space .quad .L_BLOCK_DEV | 0x4000000000 // 1 GiB of device mappings .fill 255, 8, 0x0 // 255 GiB of remaining VA space

image.ld (you shouldn’t need to change this):

/* * Copyright 2023 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * https://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ /* * Code will start running at this symbol which is placed at the start of the * image. */ ENTRY(entry) MEMORY { image : ORIGIN = 0x40080000, LENGTH = 2M } SECTIONS { /* * Collect together the code. */ .init : ALIGN(4096) { text_begin = .; *(.init.entry) *(.init.*) } >image .text : { *(.text.*) } >image text_end = .; /* * Collect together read-only data. */ .rodata : ALIGN(4096) { rodata_begin = .; *(.rodata.*) } >image .got : { *(.got) } >image rodata_end = .; /* * Collect together the read-write data including .bss at the end which * will be zero'd by the entry code. */ .data : ALIGN(4096) { data_begin = .; *(.data.*) /* * The entry point code assumes that .data is a multiple of 32 * bytes long. */ . = ALIGN(32); data_end = .; } >image /* Everything beyond this point will not be included in the binary. */ bin_end = .; /* The entry point code assumes that .bss is 16-byte aligned. */ .bss : ALIGN(16) { bss_begin = .; *(.bss.*) *(COMMON) . = ALIGN(16); bss_end = .; } >image .stack (NOLOAD) : ALIGN(4096) { boot_stack_begin = .; . += 40 * 4096; . = ALIGN(4096); boot_stack_end = .; } >image . = ALIGN(4K); PROVIDE(dma_region = .); /* * Remove unused sections from the image. */ /DISCARD/ : { /* The image loads itself so doesn't need these sections. */ *(.gnu.hash) *(.hash) *(.interp) *(.eh_frame_hdr) *(.eh_frame) *(.note.gnu.build-id) } }

Makefile (you shouldn’t need to change this):

# Copyright 2023 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. UNAME := $(shell uname -s) ifeq ($(UNAME),Linux) TARGET = aarch64-linux-gnu else TARGET = aarch64-none-elf endif OBJCOPY = $(TARGET)-objcopy .PHONY: build qemu_minimal qemu qemu_logger all: rtc.bin build: cargo build rtc.bin: build $(OBJCOPY) -O binary target/aarch64-unknown-none/debug/rtc $@ qemu: rtc.bin qemu-system-aarch64 -machine virt,gic-version=3 -cpu max -serial mon:stdio -display none -kernel $< -s clean: cargo clean rm -f *.bin

.cargo/config.toml (you shouldn’t need to change this):

[build] target = "aarch64-unknown-none" rustflags = ["-C", "link-arg=-Timage.ld"]

Run the code in QEMU with make qemu.

Bare Metal Rust Afternoon

RTC driver

(back to exercise)

main.rs:

#![no_main] #![no_std] mod exceptions; mod logger; mod pl011; mod pl031; use crate::pl031::Rtc; use arm_gic::gicv3::{IntId, Trigger}; use arm_gic::{irq_enable, wfi}; use chrono::{TimeZone, Utc}; use core::hint::spin_loop; use crate::pl011::Uart; use arm_gic::gicv3::GicV3; use core::panic::PanicInfo; use log::{error, info, trace, LevelFilter}; use smccc::psci::system_off; use smccc::Hvc; /// Base addresses of the GICv3. const GICD_BASE_ADDRESS: *mut u64 = 0x800_0000 as _; const GICR_BASE_ADDRESS: *mut u64 = 0x80A_0000 as _; /// Base address of the primary PL011 UART. const PL011_BASE_ADDRESS: *mut u32 = 0x900_0000 as _; /// Base address of the PL031 RTC. const PL031_BASE_ADDRESS: *mut u32 = 0x901_0000 as _; /// The IRQ used by the PL031 RTC. const PL031_IRQ: IntId = IntId::spi(2); #[no_mangle] extern "C" fn main(x0: u64, x1: u64, x2: u64, x3: u64) { // SAFETY: `PL011_BASE_ADDRESS` is the base address of a PL011 device, and // nothing else accesses that address range. let uart = unsafe { Uart::new(PL011_BASE_ADDRESS) }; logger::init(uart, LevelFilter::Trace).unwrap(); info!("main({:#x}, {:#x}, {:#x}, {:#x})", x0, x1, x2, x3); // SAFETY: `GICD_BASE_ADDRESS` and `GICR_BASE_ADDRESS` are the base // addresses of a GICv3 distributor and redistributor respectively, and // nothing else accesses those address ranges. let mut gic = unsafe { GicV3::new(GICD_BASE_ADDRESS, GICR_BASE_ADDRESS) }; gic.setup(); // SAFETY: `PL031_BASE_ADDRESS` is the base address of a PL031 device, and // nothing else accesses that address range. let mut rtc = unsafe { Rtc::new(PL031_BASE_ADDRESS) }; let timestamp = rtc.read(); let time = Utc.timestamp_opt(timestamp.into(), 0).unwrap(); info!("RTC: {time}"); GicV3::set_priority_mask(0xff); gic.set_interrupt_priority(PL031_IRQ, 0x80); gic.set_trigger(PL031_IRQ, Trigger::Level); irq_enable(); gic.enable_interrupt(PL031_IRQ, true); // Wait for 3 seconds, without interrupts. let target = timestamp + 3; rtc.set_match(target); info!("Waiting for {}", Utc.timestamp_opt(target.into(), 0).unwrap()); trace!( "matched={}, interrupt_pending={}", rtc.matched(), rtc.interrupt_pending() ); while !rtc.matched() { spin_loop(); } trace!( "matched={}, interrupt_pending={}", rtc.matched(), rtc.interrupt_pending() ); info!("Finished waiting"); // Wait another 3 seconds for an interrupt. let target = timestamp + 6; info!("Waiting for {}", Utc.timestamp_opt(target.into(), 0).unwrap()); rtc.set_match(target); rtc.clear_interrupt(); rtc.enable_interrupt(true); trace!( "matched={}, interrupt_pending={}", rtc.matched(), rtc.interrupt_pending() ); while !rtc.interrupt_pending() { wfi(); } trace!( "matched={}, interrupt_pending={}", rtc.matched(), rtc.interrupt_pending() ); info!("Finished waiting"); system_off::<Hvc>().unwrap(); } #[panic_handler] fn panic(info: &PanicInfo) -> ! { error!("{info}"); system_off::<Hvc>().unwrap(); loop {} }

pl031.rs:

#![allow(unused)] fn main() { use core::ptr::{addr_of, addr_of_mut}; #[repr(C, align(4))] struct Registers { /// Data register dr: u32, /// Match register mr: u32, /// Load register lr: u32, /// Control register cr: u8, _reserved0: [u8; 3], /// Interrupt Mask Set or Clear register imsc: u8, _reserved1: [u8; 3], /// Raw Interrupt Status ris: u8, _reserved2: [u8; 3], /// Masked Interrupt Status mis: u8, _reserved3: [u8; 3], /// Interrupt Clear Register icr: u8, _reserved4: [u8; 3], } /// Driver for a PL031 real-time clock. #[derive(Debug)] pub struct Rtc { registers: *mut Registers, } impl Rtc { /// Constructs a new instance of the RTC driver for a PL031 device at the /// given base address. /// /// # Safety /// /// The given base address must point to the MMIO control registers of a /// PL031 device, which must be mapped into the address space of the process /// as device memory and not have any other aliases. pub unsafe fn new(base_address: *mut u32) -> Self { Self { registers: base_address as *mut Registers } } /// Reads the current RTC value. pub fn read(&self) -> u32 { // SAFETY: We know that self.registers points to the control registers // of a PL031 device which is appropriately mapped. unsafe { addr_of!((*self.registers).dr).read_volatile() } } /// Writes a match value. When the RTC value matches this then an interrupt /// will be generated (if it is enabled). pub fn set_match(&mut self, value: u32) { // SAFETY: We know that self.registers points to the control registers // of a PL031 device which is appropriately mapped. unsafe { addr_of_mut!((*self.registers).mr).write_volatile(value) } } /// Returns whether the match register matches the RTC value, whether or not /// the interrupt is enabled. pub fn matched(&self) -> bool { // SAFETY: We know that self.registers points to the control registers // of a PL031 device which is appropriately mapped. let ris = unsafe { addr_of!((*self.registers).ris).read_volatile() }; (ris & 0x01) != 0 } /// Returns whether there is currently an interrupt pending. /// /// This should be true if and only if `matched` returns true and the /// interrupt is masked. pub fn interrupt_pending(&self) -> bool { // SAFETY: We know that self.registers points to the control registers // of a PL031 device which is appropriately mapped. let ris = unsafe { addr_of!((*self.registers).mis).read_volatile() }; (ris & 0x01) != 0 } /// Sets or clears the interrupt mask. /// /// When the mask is true the interrupt is enabled; when it is false the /// interrupt is disabled. pub fn enable_interrupt(&mut self, mask: bool) { let imsc = if mask { 0x01 } else { 0x00 }; // SAFETY: We know that self.registers points to the control registers // of a PL031 device which is appropriately mapped. unsafe { addr_of_mut!((*self.registers).imsc).write_volatile(imsc) } } /// Clears a pending interrupt, if any. pub fn clear_interrupt(&mut self) { // SAFETY: We know that self.registers points to the control registers // of a PL031 device which is appropriately mapped. unsafe { addr_of_mut!((*self.registers).icr).write_volatile(0x01) } } } // SAFETY: `Rtc` just contains a pointer to device memory, which can be // accessed from any context. unsafe impl Send for Rtc {} }

Welcome to Concurrency in Rust

Rust has full support for concurrency using OS threads with mutexes and channels.

The Rust type system plays an important role in making many concurrency bugs compile time bugs. This is often referred to as fearless concurrency since you can rely on the compiler to ensure correctness at runtime.

Mục lục

Buổi học nên kéo dài khoảng 3 tiếng và 20 phút, bao gồm thời gian 10 phút nghỉ trưa. Nội dung buổi học bao gồm:

SegmentThời lượng
Threads30 minutes
Channels20 minutes
Send and Sync15 minutes
Shared State30 minutes
Exercises1 hour and 10 minutes

Speaker Notes

  • Rust lets us access OS concurrency toolkit: threads, sync. primitives, etc.
  • The type system gives us safety for concurrency without any special features.
  • The same tools that help with “concurrent” access in a single thread (e.g., a called function that might mutate an argument or save references to it to read later) save us from multi-threading issues.

Threads

Phần này sẽ kéo dài khoảng 30 phút, bao gồm:

SlideThời lượng
Plain Threads15 minutes
Scoped Threads15 minutes

Plain Threads

Rust threads work similarly to threads in other languages:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • Threads are all daemon threads, the main thread does not wait for them.
  • Thread panics are independent of each other.
    • Panics can carry a payload, which can be unpacked with downcast_ref.

Speaker Notes

This slide should take about 15 minutes.
  • Rust thread APIs look not too different from e.g. C++ ones.

  • Run the example.

    • 5ms timing is loose enough that main and spawned threads stay mostly in lockstep.
    • Notice that the program ends before the spawned thread reaches 10!
    • This is because main ends the program and spawned threads do not make it persist.
      • Compare to pthreads/C++ std::thread/boost::thread if desired.
  • How do we wait around for the spawned thread to complete?

  • thread::spawn returns a JoinHandle. Look at the docs.

    • JoinHandle has a .join() method that blocks.
  • Use let handle = thread::spawn(...) and later handle.join() to wait for the thread to finish and have the program count all the way to 10.

  • Now what if we want to return a value?

  • Look at docs again:

  • Use the Result return value from handle.join() to get access to the returned value.

  • Ok, what about the other case?

    • Trigger a panic in the thread. Note that this doesn’t panic main.
    • Access the panic payload. This is a good time to talk about Any.
  • Now we can return values from threads! What about taking inputs?

    • Capture something by reference in the thread closure.
    • An error message indicates we must move it.
    • Move it in, see we can compute and then return a derived value.
  • If we want to borrow?

    • Main kills child threads when it returns, but another function would just return and leave them running.
    • That would be stack use-after-return, which violates memory safety!
    • How do we avoid this? see next slide.

Scoped Threads

Normal threads cannot borrow from their environment:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

However, you can use a scoped thread for this:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 13 minutes.
  • The reason for that is that when the thread::scope function completes, all the threads are guaranteed to be joined, so they can return borrowed data.
  • Normal Rust borrowing rules apply: you can either borrow mutably by one thread, or immutably by any number of threads.

Channels

Phần này sẽ kéo dài khoảng 20 phút, bao gồm:

SlideThời lượng
Senders and Receivers10 minutes
Unbounded Channels2 minutes
Bounded Channels10 minutes

Senders and Receivers

Rust channels have two parts: a Sender<T> and a Receiver<T>. The two parts are connected via the channel, but you only see the end-points.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 9 minutes.
  • mpsc stands for Multi-Producer, Single-Consumer. Sender and SyncSender implement Clone (so you can make multiple producers) but Receiver does not.
  • send() and recv() return Result. If they return Err, it means the counterpart Sender or Receiver is dropped and the channel is closed.

Unbounded Channels

You get an unbounded and asynchronous channel with mpsc::channel():

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Bounded Channels

With bounded (synchronous) channels, send can block the current thread:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 8 minutes.
  • Calling send will block the current thread until there is space in the channel for the new message. The thread can be blocked indefinitely if there is nobody who reads from the channel.
  • A call to send will abort with an error (that is why it returns Result) if the channel is closed. A channel is closed when the receiver is dropped.
  • A bounded channel with a size of zero is called a “rendezvous channel”. Every send will block the current thread until another thread calls recv.

Send and Sync

Phần này sẽ kéo dài khoảng 15 phút, bao gồm:

SlideThời lượng
Marker Traits2 minutes
Send2 minutes
Sync2 minutes
Examples10 minutes

Marker Traits

How does Rust know to forbid shared access across threads? The answer is in two traits:

  • Send: a type T is Send if it is safe to move a T across a thread boundary.
  • Sync: a type T is Sync if it is safe to move a &T across a thread boundary.

Send and Sync are unsafe traits. The compiler will automatically derive them for your types as long as they only contain Send and Sync types. You can also implement them manually when you know it is valid.

Speaker Notes

This slide should take about 2 minutes.
  • One can think of these traits as markers that the type has certain thread-safety properties.
  • They can be used in the generic constraints as normal traits.

Send

A type T is Send if it is safe to move a T value to another thread.

The effect of moving ownership to another thread is that destructors will run in that thread. So the question is when you can allocate a value in one thread and deallocate it in another.

Speaker Notes

This slide should take about 2 minutes.

As an example, a connection to the SQLite library must only be accessed from a single thread.

Sync

A type T is Sync if it is safe to access a T value from multiple threads at the same time.

More precisely, the definition is:

T is Sync if and only if &T is Send

Speaker Notes

This slide should take about 2 minutes.

This statement is essentially a shorthand way of saying that if a type is thread-safe for shared use, it is also thread-safe to pass references of it across threads.

This is because if a type is Sync it means that it can be shared across multiple threads without the risk of data races or other synchronization issues, so it is safe to move it to another thread. A reference to the type is also safe to move to another thread, because the data it references can be accessed from any thread safely.

Examples

Send + Sync

Most types you come across are Send + Sync:

  • i8, f32, bool, char, &str, …
  • (T1, T2), [T; N], &[T], struct { x: T }, …
  • String, Option<T>, Vec<T>, Box<T>, …
  • Arc<T>: Explicitly thread-safe via atomic reference count.
  • Mutex<T>: Explicitly thread-safe via internal locking.
  • mpsc::Sender<T>: As of 1.72.0.
  • AtomicBool, AtomicU8, …: Uses special atomic instructions.

The generic types are typically Send + Sync when the type parameters are Send + Sync.

Send + !Sync

These types can be moved to other threads, but they’re not thread-safe. Typically because of interior mutability:

  • mpsc::Receiver<T>
  • Cell<T>
  • RefCell<T>

!Send + Sync

These types are thread-safe, but they cannot be moved to another thread:

  • MutexGuard<T: Sync>: Uses OS level primitives which must be deallocated on the thread which created them.

!Send + !Sync

These types are not thread-safe and cannot be moved to other threads:

  • Rc<T>: each Rc<T> has a reference to an RcBox<T>, which contains a non-atomic reference count.
  • *const T, *mut T: Rust assumes raw pointers may have special concurrency considerations.

Shared State

Phần này sẽ kéo dài khoảng 30 phút, bao gồm:

SlideThời lượng
Arc5 minutes
Mutex15 minutes
Example10 minutes

Arc

Arc<T> allows shared read-only access via Arc::clone:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • Arc stands for “Atomic Reference Counted”, a thread safe version of Rc that uses atomic operations.
  • Arc<T> implements Clone whether or not T does. It implements Send and Sync if and only if T implements them both.
  • Arc::clone() has the cost of atomic operations that get executed, but after that the use of the T is free.
  • Beware of reference cycles, Arc does not use a garbage collector to detect them.
    • std::sync::Weak can help.

Mutex

Mutex<T> ensures mutual exclusion and allows mutable access to T behind a read-only interface (another form of interior mutability):

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Notice how we have a impl<T: Send> Sync for Mutex<T> blanket implementation.

Speaker Notes

This slide should take about 14 minutes.
  • Mutex in Rust looks like a collection with just one element — the protected data.
    • It is not possible to forget to acquire the mutex before accessing the protected data.
  • You can get an &mut T from an &Mutex<T> by taking the lock. The MutexGuard ensures that the &mut T doesn’t outlive the lock being held.
  • Mutex<T> implements both Send and Sync iff (if and only if) T implements Send.
  • A read-write lock counterpart: RwLock.
  • Why does lock() return a Result?
    • If the thread that held the Mutex panicked, the Mutex becomes “poisoned” to signal that the data it protected might be in an inconsistent state. Calling lock() on a poisoned mutex fails with a PoisonError. You can call into_inner() on the error to recover the data regardless.

Example

Let us see Arc and Mutex in action:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 8 minutes.

Possible solution:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Notable parts:

  • v is wrapped in both Arc and Mutex, because their concerns are orthogonal.
    • Wrapping a Mutex in an Arc is a common pattern to share mutable state between threads.
  • v: Arc<_> needs to be cloned as v2 before it can be moved into another thread. Note move was added to the lambda signature.
  • Blocks are introduced to narrow the scope of the LockGuard as much as possible.

Exercises

Phần này sẽ kéo dài khoảng 1 giờ và 10 phút, bao gồm:

SlideThời lượng
Dining Philosophers20 minutes
Multi-threaded Link Checker20 minutes
Solutions30 minutes

Dining Philosophers

The dining philosophers problem is a classic problem in concurrency:

Five philosophers dine together at the same table. Each philosopher has their own place at the table. There is a fork between each plate. The dish served is a kind of spaghetti which has to be eaten with two forks. Each philosopher can only alternately think and eat. Moreover, a philosopher can only eat their spaghetti when they have both a left and right fork. Thus two forks will only be available when their two nearest neighbors are thinking, not eating. After an individual philosopher finishes eating, they will put down both forks.

You will need a local Cargo installation for this exercise. Copy the code below to a file called src/main.rs, fill out the blanks, and test that cargo run does not deadlock:

use std::sync::{mpsc, Arc, Mutex}; use std::thread; use std::time::Duration; struct Fork; struct Philosopher { name: String, // left_fork: ... // right_fork: ... // thoughts: ... } impl Philosopher { fn think(&self) { self.thoughts .send(format!("Eureka! {} has a new idea!", &self.name)) .unwrap(); } fn eat(&self) { // Pick up forks... println!("{} is eating...", &self.name); thread::sleep(Duration::from_millis(10)); } } static PHILOSOPHERS: &[&str] = &["Socrates", "Hypatia", "Plato", "Aristotle", "Pythagoras"]; fn main() { // Create forks // Create philosophers // Make each of them think and eat 100 times // Output their thoughts }

You can use the following Cargo.toml:

[package] name = "dining-philosophers" version = "0.1.0" edition = "2021"

Multi-threaded Link Checker

Let us use our new knowledge to create a multi-threaded link checker. It should start at a webpage and check that links on the page are valid. It should recursively check other pages on the same domain and keep doing this until all pages have been validated.

For this, you will need an HTTP client such as reqwest. You will also need a way to find links, we can use scraper. Finally, we’ll need some way of handling errors, we will use thiserror.

Create a new Cargo project and reqwest it as a dependency with:

cargo new link-checker cd link-checker cargo add --features blocking,rustls-tls reqwest cargo add scraper cargo add thiserror

If cargo add fails with error: no such subcommand, then please edit the Cargo.toml file by hand. Add the dependencies listed below.

The cargo add calls will update the Cargo.toml file to look like this:

[package] name = "link-checker" version = "0.1.0" edition = "2021" publish = false [dependencies] reqwest = { version = "0.11.12", features = ["blocking", "rustls-tls"] } scraper = "0.13.0" thiserror = "1.0.37"

You can now download the start page. Try with a small site such as https://www.google.org/.

Your src/main.rs file should look something like this:

use reqwest::blocking::Client; use reqwest::Url; use scraper::{Html, Selector}; use thiserror::Error; #[derive(Error, Debug)] enum Error { #[error("request error: {0}")] ReqwestError(#[from] reqwest::Error), #[error("bad http response: {0}")] BadResponse(String), } #[derive(Debug)] struct CrawlCommand { url: Url, extract_links: bool, } fn visit_page(client: &Client, command: &CrawlCommand) -> Result<Vec<Url>, Error> { println!("Checking {:#}", command.url); let response = client.get(command.url.clone()).send()?; if !response.status().is_success() { return Err(Error::BadResponse(response.status().to_string())); } let mut link_urls = Vec::new(); if !command.extract_links { return Ok(link_urls); } let base_url = response.url().to_owned(); let body_text = response.text()?; let document = Html::parse_document(&body_text); let selector = Selector::parse("a").unwrap(); let href_values = document .select(&selector) .filter_map(|element| element.value().attr("href")); for href in href_values { match base_url.join(href) { Ok(link_url) => { link_urls.push(link_url); } Err(err) => { println!("On {base_url:#}: ignored unparsable {href:?}: {err}"); } } } Ok(link_urls) } fn main() { let client = Client::new(); let start_url = Url::parse("https://www.google.org").unwrap(); let crawl_command = CrawlCommand{ url: start_url, extract_links: true }; match visit_page(&client, &crawl_command) { Ok(links) => println!("Links: {links:#?}"), Err(err) => println!("Could not extract links: {err:#}"), } }

Run the code in src/main.rs with

cargo run

Tasks

  • Use threads to check the links in parallel: send the URLs to be checked to a channel and let a few threads check the URLs in parallel.
  • Extend this to recursively extract links from all pages on the www.google.org domain. Put an upper limit of 100 pages or so so that you don’t end up being blocked by the site.

Solutions

Dining Philosophers

use std::sync::{mpsc, Arc, Mutex}; use std::thread; use std::time::Duration; struct Fork; struct Philosopher { name: String, left_fork: Arc<Mutex<Fork>>, right_fork: Arc<Mutex<Fork>>, thoughts: mpsc::SyncSender<String>, } impl Philosopher { fn think(&self) { self.thoughts .send(format!("Eureka! {} has a new idea!", &self.name)) .unwrap(); } fn eat(&self) { println!("{} is trying to eat", &self.name); let _left = self.left_fork.lock().unwrap(); let _right = self.right_fork.lock().unwrap(); println!("{} is eating...", &self.name); thread::sleep(Duration::from_millis(10)); } } static PHILOSOPHERS: &[&str] = &["Socrates", "Hypatia", "Plato", "Aristotle", "Pythagoras"]; fn main() { let (tx, rx) = mpsc::sync_channel(10); let forks = (0..PHILOSOPHERS.len()) .map(|_| Arc::new(Mutex::new(Fork))) .collect::<Vec<_>>(); for i in 0..forks.len() { let tx = tx.clone(); let mut left_fork = Arc::clone(&forks[i]); let mut right_fork = Arc::clone(&forks[(i + 1) % forks.len()]); // To avoid a deadlock, we have to break the symmetry // somewhere. This will swap the forks without deinitializing // either of them. if i == forks.len() - 1 { std::mem::swap(&mut left_fork, &mut right_fork); } let philosopher = Philosopher { name: PHILOSOPHERS[i].to_string(), thoughts: tx, left_fork, right_fork, }; thread::spawn(move || { for _ in 0..100 { philosopher.eat(); philosopher.think(); } }); } drop(tx); for thought in rx { println!("{thought}"); } }
use std::sync::{mpsc, Arc, Mutex}; use std::thread; use reqwest::blocking::Client; use reqwest::Url; use scraper::{Html, Selector}; use thiserror::Error; #[derive(Error, Debug)] enum Error { #[error("request error: {0}")] ReqwestError(#[from] reqwest::Error), #[error("bad http response: {0}")] BadResponse(String), } #[derive(Debug)] struct CrawlCommand { url: Url, extract_links: bool, } fn visit_page(client: &Client, command: &CrawlCommand) -> Result<Vec<Url>, Error> { println!("Checking {:#}", command.url); let response = client.get(command.url.clone()).send()?; if !response.status().is_success() { return Err(Error::BadResponse(response.status().to_string())); } let mut link_urls = Vec::new(); if !command.extract_links { return Ok(link_urls); } let base_url = response.url().to_owned(); let body_text = response.text()?; let document = Html::parse_document(&body_text); let selector = Selector::parse("a").unwrap(); let href_values = document .select(&selector) .filter_map(|element| element.value().attr("href")); for href in href_values { match base_url.join(href) { Ok(link_url) => { link_urls.push(link_url); } Err(err) => { println!("On {base_url:#}: ignored unparsable {href:?}: {err}"); } } } Ok(link_urls) } struct CrawlState { domain: String, visited_pages: std::collections::HashSet<String>, } impl CrawlState { fn new(start_url: &Url) -> CrawlState { let mut visited_pages = std::collections::HashSet::new(); visited_pages.insert(start_url.as_str().to_string()); CrawlState { domain: start_url.domain().unwrap().to_string(), visited_pages } } /// Determine whether links within the given page should be extracted. fn should_extract_links(&self, url: &Url) -> bool { let Some(url_domain) = url.domain() else { return false; }; url_domain == self.domain } /// Mark the given page as visited, returning false if it had already /// been visited. fn mark_visited(&mut self, url: &Url) -> bool { self.visited_pages.insert(url.as_str().to_string()) } } type CrawlResult = Result<Vec<Url>, (Url, Error)>; fn spawn_crawler_threads( command_receiver: mpsc::Receiver<CrawlCommand>, result_sender: mpsc::Sender<CrawlResult>, thread_count: u32, ) { let command_receiver = Arc::new(Mutex::new(command_receiver)); for _ in 0..thread_count { let result_sender = result_sender.clone(); let command_receiver = command_receiver.clone(); thread::spawn(move || { let client = Client::new(); loop { let command_result = { let receiver_guard = command_receiver.lock().unwrap(); receiver_guard.recv() }; let Ok(crawl_command) = command_result else { // The sender got dropped. No more commands coming in. break; }; let crawl_result = match visit_page(&client, &crawl_command) { Ok(link_urls) => Ok(link_urls), Err(error) => Err((crawl_command.url, error)), }; result_sender.send(crawl_result).unwrap(); } }); } } fn control_crawl( start_url: Url, command_sender: mpsc::Sender<CrawlCommand>, result_receiver: mpsc::Receiver<CrawlResult>, ) -> Vec<Url> { let mut crawl_state = CrawlState::new(&start_url); let start_command = CrawlCommand { url: start_url, extract_links: true }; command_sender.send(start_command).unwrap(); let mut pending_urls = 1; let mut bad_urls = Vec::new(); while pending_urls > 0 { let crawl_result = result_receiver.recv().unwrap(); pending_urls -= 1; match crawl_result { Ok(link_urls) => { for url in link_urls { if crawl_state.mark_visited(&url) { let extract_links = crawl_state.should_extract_links(&url); let crawl_command = CrawlCommand { url, extract_links }; command_sender.send(crawl_command).unwrap(); pending_urls += 1; } } } Err((url, error)) => { bad_urls.push(url); println!("Got crawling error: {:#}", error); continue; } } } bad_urls } fn check_links(start_url: Url) -> Vec<Url> { let (result_sender, result_receiver) = mpsc::channel::<CrawlResult>(); let (command_sender, command_receiver) = mpsc::channel::<CrawlCommand>(); spawn_crawler_threads(command_receiver, result_sender, 16); control_crawl(start_url, command_sender, result_receiver) } fn main() { let start_url = reqwest::Url::parse("https://www.google.org").unwrap(); let bad_urls = check_links(start_url); println!("Bad URLs: {:#?}", bad_urls); }

Lời Chào Mừng

“Async” is a concurrency model where multiple tasks are executed concurrently by executing each task until it would block, then switching to another task that is ready to make progress. The model allows running a larger number of tasks on a limited number of threads. This is because the per-task overhead is typically very low and operating systems provide primitives for efficiently identifying I/O that is able to proceed.

Rust’s asynchronous operation is based on “futures”, which represent work that may be completed in the future. Futures are “polled” until they signal that they are complete.

Futures are polled by an async runtime, and several different runtimes are available.

Comparisons

  • Python has a similar model in its asyncio. However, its Future type is callback-based, and not polled. Async Python programs require a “loop”, similar to a runtime in Rust.

  • JavaScript’s Promise is similar, but again callback-based. The language runtime implements the event loop, so many of the details of Promise resolution are hidden.

Mục lục

Buổi học nên kéo dài khoảng 3 tiếng và 20 phút, bao gồm thời gian 10 phút nghỉ trưa. Nội dung buổi học bao gồm:

SegmentThời lượng
Async Basics30 minutes
Channels and Control Flow20 minutes
Pitfalls55 minutes
Exercises1 hour and 10 minutes

Async Basics

Phần này sẽ kéo dài khoảng 30 phút, bao gồm:

SlideThời lượng
async/await10 minutes
Futures4 minutes
Runtimes10 minutes
Tasks10 minutes

async/await

At a high level, async Rust code looks very much like “normal” sequential code:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 6 minutes.

Các điểm chính:

  • Note that this is a simplified example to show the syntax. There is no long running operation or any real concurrency in it!

  • What is the return type of an async call?

    • Use let future: () = async_main(10); in main to see the type.
  • The “async” keyword is syntactic sugar. The compiler replaces the return type with a future.

  • You cannot make main async, without additional instructions to the compiler on how to use the returned future.

  • You need an executor to run async code. block_on blocks the current thread until the provided future has run to completion.

  • .await asynchronously waits for the completion of another operation. Unlike block_on, .await doesn’t block the current thread.

  • .await can only be used inside an async function (or block; these are introduced later).

Futures

Future is a trait, implemented by objects that represent an operation that may not be complete yet. A future can be polled, and poll returns a Poll.

#![allow(unused)] fn main() { use std::pin::Pin; use std::task::Context; pub trait Future { type Output; fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>; } pub enum Poll<T> { Ready(T), Pending, } }

An async function returns an impl Future. It’s also possible (but uncommon) to implement Future for your own types. For example, the JoinHandle returned from tokio::spawn implements Future to allow joining to it.

The .await keyword, applied to a Future, causes the current async function to pause until that Future is ready, and then evaluates to its output.

Speaker Notes

This slide should take about 4 minutes.
  • The Future and Poll types are implemented exactly as shown; click the links to show the implementations in the docs.

  • We will not get to Pin and Context, as we will focus on writing async code, rather than building new async primitives. Briefly:

    • Context allows a Future to schedule itself to be polled again when an event occurs.

    • Pin ensures that the Future isn’t moved in memory, so that pointers into that future remain valid. This is required to allow references to remain valid after an .await.

Runtimes

A runtime provides support for performing operations asynchronously (a reactor) and is responsible for executing futures (an executor). Rust does not have a “built-in” runtime, but several options are available:

  • Tokio: performant, with a well-developed ecosystem of functionality like Hyper for HTTP or Tonic for gRPC.
  • async-std: aims to be a “std for async”, and includes a basic runtime in async::task.
  • smol: simple and lightweight

Several larger applications have their own runtimes. For example, Fuchsia already has one.

Speaker Notes

This slide and its sub-slides should take about 10 minutes.
  • Note that of the listed runtimes, only Tokio is supported in the Rust playground. The playground also does not permit any I/O, so most interesting async things can’t run in the playground.

  • Futures are “inert” in that they do not do anything (not even start an I/O operation) unless there is an executor polling them. This differs from JS Promises, for example, which will run to completion even if they are never used.

Tokio

Tokio provides:

  • A multi-threaded runtime for executing asynchronous code.
  • An asynchronous version of the standard library.
  • A large ecosystem of libraries.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

  • With the tokio::main macro we can now make main async.

  • The spawn function creates a new, concurrent “task”.

  • Note: spawn takes a Future, you don’t call .await on count_to.

Further exploration:

  • Why does count_to not (usually) get to 10? This is an example of async cancellation. tokio::spawn returns a handle which can be awaited to wait until it finishes.

  • Try count_to(10).await instead of spawning.

  • Try awaiting the task returned from tokio::spawn.

Tasks

Rust has a task system, which is a form of lightweight threading.

A task has a single top-level future which the executor polls to make progress. That future may have one or more nested futures that its poll method polls, corresponding loosely to a call stack. Concurrency within a task is possible by polling multiple child futures, such as racing a timer and an I/O operation.

use tokio::io::{self, AsyncReadExt, AsyncWriteExt}; use tokio::net::TcpListener; #[tokio::main] async fn main() -> io::Result<()> { let listener = TcpListener::bind("127.0.0.1:0").await?; println!("listening on port {}", listener.local_addr()?.port()); loop { let (mut socket, addr) = listener.accept().await?; println!("connection from {addr:?}"); tokio::spawn(async move { socket.write_all(b"Who are you?\n").await.expect("socket error"); let mut buf = vec![0; 1024]; let name_size = socket.read(&mut buf).await.expect("socket error"); let name = std::str::from_utf8(&buf[..name_size]).unwrap().trim(); let reply = format!("Thanks for dialing in, {name}!\n"); socket.write_all(reply.as_bytes()).await.expect("socket error"); }); } }

Speaker Notes

This slide should take about 6 minutes.

Copy this example into your prepared src/main.rs and run it from there.

Try connecting to it with a TCP connection tool like nc or telnet.

  • Ask students to visualize what the state of the example server would be with a few connected clients. What tasks exist? What are their Futures?

  • This is the first time we’ve seen an async block. This is similar to a closure, but does not take any arguments. Its return value is a Future, similar to an async fn.

  • Refactor the async block into a function, and improve the error handling using ?.

Channels and Control Flow

Phần này sẽ kéo dài khoảng 20 phút, bao gồm:

SlideThời lượng
Async Channels10 minutes
Join4 minutes
Select5 minutes

Async Channels

Several crates have support for asynchronous channels. For instance tokio:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 8 minutes.
  • Change the channel size to 3 and see how it affects the execution.

  • Overall, the interface is similar to the sync channels as seen in the morning class.

  • Try removing the std::mem::drop call. What happens? Why?

  • The Flume crate has channels that implement both sync and async send and recv. This can be convenient for complex applications with both IO and heavy CPU processing tasks.

  • What makes working with async channels preferable is the ability to combine them with other futures to combine them and create complex control flow.

Join

A join operation waits until all of a set of futures are ready, and returns a collection of their results. This is similar to Promise.all in JavaScript or asyncio.gather in Python.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 4 minutes.

Copy this example into your prepared src/main.rs and run it from there.

  • For multiple futures of disjoint types, you can use std::future::join! but you must know how many futures you will have at compile time. This is currently in the futures crate, soon to be stabilised in std::future.

  • The risk of join is that one of the futures may never resolve, this would cause your program to stall.

  • You can also combine join_all with join! for instance to join all requests to an http service as well as a database query. Try adding a tokio::time::sleep to the future, using futures::join!. This is not a timeout (that requires select!, explained in the next chapter), but demonstrates join!.

Select

A select operation waits until any of a set of futures is ready, and responds to that future’s result. In JavaScript, this is similar to Promise.race. In Python, it compares to asyncio.wait(task_set, return_when=asyncio.FIRST_COMPLETED).

Similar to a match statement, the body of select! has a number of arms, each of the form pattern = future => statement. When a future is ready, its return value is destructured by the pattern. The statement is then run with the resulting variables. The statement result becomes the result of the select! macro.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • In this example, we have a race between a cat and a dog. first_animal_to_finish_race listens to both channels and will pick whichever arrives first. Since the dog takes 50ms, it wins against the cat that take 500ms.

  • You can use oneshot channels in this example as the channels are supposed to receive only one send.

  • Try adding a deadline to the race, demonstrating selecting different sorts of futures.

  • Note that select! drops unmatched branches, which cancels their futures. It is easiest to use when every execution of select! creates new futures.

    • An alternative is to pass &mut future instead of the future itself, but this can lead to issues, further discussed in the pinning slide.

Pitfalls

Async / await provides convenient and efficient abstraction for concurrent asynchronous programming. However, the async/await model in Rust also comes with its share of pitfalls and footguns. We illustrate some of them in this chapter.

Phần này sẽ kéo dài khoảng 55 phút, bao gồm:

SlideThời lượng
Blocking the Executor10 minutes
Pin20 minutes
Async Traits5 minutes
Cancellation20 minutes

Blocking the executor

Most async runtimes only allow IO tasks to run concurrently. This means that CPU blocking tasks will block the executor and prevent other tasks from being executed. An easy workaround is to use async equivalent methods where possible.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 10 minutes.
  • Run the code and see that the sleeps happen consecutively rather than concurrently.

  • The "current_thread" flavor puts all tasks on a single thread. This makes the effect more obvious, but the bug is still present in the multi-threaded flavor.

  • Switch the std::thread::sleep to tokio::time::sleep and await its result.

  • Another fix would be to tokio::task::spawn_blocking which spawns an actual thread and transforms its handle into a future without blocking the executor.

  • You should not think of tasks as OS threads. They do not map 1 to 1 and most executors will allow many tasks to run on a single OS thread. This is particularly problematic when interacting with other libraries via FFI, where that library might depend on thread-local storage or map to specific OS threads (e.g., CUDA). Prefer tokio::task::spawn_blocking in such situations.

  • Use sync mutexes with care. Holding a mutex over an .await may cause another task to block, and that task may be running on the same thread.

Pin

Async blocks and functions return types implementing the Future trait. The type returned is the result of a compiler transformation which turns local variables into data stored inside the future.

Some of those variables can hold pointers to other local variables. Because of that, the future should never be moved to a different memory location, as it would invalidate those pointers.

To prevent moving the future type in memory, it can only be polled through a pinned pointer. Pin is a wrapper around a reference that disallows all operations that would move the instance it points to into a different memory location.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 20 minutes.
  • You may recognize this as an example of the actor pattern. Actors typically call select! in a loop.

  • This serves as a summation of a few of the previous lessons, so take your time with it.

    • Naively add a _ = sleep(Duration::from_millis(100)) => { println!(..) } to the select!. This will never execute. Why?

    • Instead, add a timeout_fut containing that future outside of the loop:

      #![allow(unused)] fn main() { let mut timeout_fut = sleep(Duration::from_millis(100)); loop { select! { .., _ = timeout_fut => { println!(..); }, } } }
    • This still doesn’t work. Follow the compiler errors, adding &mut to the timeout_fut in the select! to work around the move, then using Box::pin:

      #![allow(unused)] fn main() { let mut timeout_fut = Box::pin(sleep(Duration::from_millis(100))); loop { select! { .., _ = &mut timeout_fut => { println!(..); }, } } }
    • This compiles, but once the timeout expires it is Poll::Ready on every iteration (a fused future would help with this). Update to reset timeout_fut every time it expires.

  • Box allocates on the heap. In some cases, std::pin::pin! (only recently stabilized, with older code often using tokio::pin!) is also an option, but that is difficult to use for a future that is reassigned.

  • Another alternative is to not use pin at all but spawn another task that will send to a oneshot channel every 100ms.

  • Data that contains pointers to itself is called self-referential. Normally, the Rust borrow checker would prevent self-referential data from being moved, as the references cannot outlive the data they point to. However, the code transformation for async blocks and functions is not verified by the borrow checker.

  • Pin is a wrapper around a reference. An object cannot be moved from its place using a pinned pointer. However, it can still be moved through an unpinned pointer.

  • The poll method of the Future trait uses Pin<&mut Self> instead of &mut Self to refer to the instance. That’s why it can only be called on a pinned pointer.

Async Traits

Async methods in traits are were stabilized only recently, in the 1.75 release. This required support for using return-position impl Trait (RPIT) in traits, as the desugaring for async fn includes -> impl Future<Output = ...>.

However, even with the native support today there are some pitfalls around async fn and RPIT in traits:

  • Return-position impl Trait captures all in-scope lifetimes (so some patterns of borrowing cannot be expressed)

  • Traits whose methods use return-position impl trait or async are not dyn compatible.

If we do need dyn support, the crate async_trait provides a workaround through a macro, with some caveats:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 5 minutes.
  • async_trait is easy to use, but note that it’s using heap allocations to achieve this. This heap allocation has performance overhead.

  • The challenges in language support for async trait are deep Rust and probably not worth describing in-depth. Niko Matsakis did a good job of explaining them in this post if you are interested in digging deeper.

  • Try creating a new sleeper struct that will sleep for a random amount of time and adding it to the Vec.

Cancellation

Dropping a future implies it can never be polled again. This is called cancellation and it can occur at any await point. Care is needed to ensure the system works correctly even when futures are cancelled. For example, it shouldn’t deadlock or lose data.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Speaker Notes

This slide should take about 18 minutes.
  • The compiler doesn’t help with cancellation-safety. You need to read API documentation and consider what state your async fn holds.

  • Unlike panic and ?, cancellation is part of normal control flow (vs error-handling).

  • The example loses parts of the string.

    • Whenever the tick() branch finishes first, next() and its buf are dropped.

    • LinesReader can be made cancellation-safe by making buf part of the struct:

      #![allow(unused)] fn main() { struct LinesReader { stream: DuplexStream, bytes: Vec<u8>, buf: [u8; 1], } impl LinesReader { fn new(stream: DuplexStream) -> Self { Self { stream, bytes: Vec::new(), buf: [0] } } async fn next(&mut self) -> io::Result<Option<String>> { // prefix buf and bytes with self. // ... let raw = std::mem::take(&mut self.bytes); let s = String::from_utf8(raw) // ... } } }
  • Interval::tick is cancellation-safe because it keeps track of whether a tick has been ‘delivered’.

  • AsyncReadExt::read is cancellation-safe because it either returns or doesn’t read data.

  • AsyncBufReadExt::read_line is similar to the example and isn’t cancellation-safe. See its documentation for details and alternatives.

Exercises

Phần này sẽ kéo dài khoảng 1 giờ và 10 phút, bao gồm:

SlideThời lượng
Dining Philosophers20 minutes
Broadcast Chat Application30 minutes
Solutions20 minutes

Dining Philosophers — Async

See dining philosophers for a description of the problem.

As before, you will need a local Cargo installation for this exercise. Copy the code below to a file called src/main.rs, fill out the blanks, and test that cargo run does not deadlock:

use std::sync::Arc; use tokio::sync::mpsc::{self, Sender}; use tokio::sync::Mutex; use tokio::time; struct Fork; struct Philosopher { name: String, // left_fork: ... // right_fork: ... // thoughts: ... } impl Philosopher { async fn think(&self) { self.thoughts .send(format!("Eureka! {} has a new idea!", &self.name)) .await .unwrap(); } async fn eat(&self) { // Keep trying until we have both forks println!("{} is eating...", &self.name); time::sleep(time::Duration::from_millis(5)).await; } } static PHILOSOPHERS: &[&str] = &["Socrates", "Hypatia", "Plato", "Aristotle", "Pythagoras"]; #[tokio::main] async fn main() { // Create forks // Create philosophers // Make them think and eat // Output their thoughts }

Since this time you are using Async Rust, you’ll need a tokio dependency. You can use the following Cargo.toml:

[package] name = "dining-philosophers-async-dine" version = "0.1.0" edition = "2021" [dependencies] tokio = { version = "1.26.0", features = ["sync", "time", "macros", "rt-multi-thread"] }

Also note that this time you have to use the Mutex and the mpsc module from the tokio crate.

Speaker Notes

This slide should take about 20 minutes.
  • Can you make your implementation single-threaded?

Broadcast Chat Application

In this exercise, we want to use our new knowledge to implement a broadcast chat application. We have a chat server that the clients connect to and publish their messages. The client reads user messages from the standard input, and sends them to the server. The chat server broadcasts each message that it receives to all the clients.

For this, we use a broadcast channel on the server, and tokio_websockets for the communication between the client and the server.

Create a new Cargo project and add the following dependencies:

Cargo.toml:

[package] name = "chat-async" version = "0.1.0" edition = "2021" [dependencies] futures-util = { version = "0.3.30", features = ["sink"] } http = "1.1.0" tokio = { version = "1.37.0", features = ["full"] } tokio-websockets = { version = "0.8.2", features = ["client", "fastrand", "server", "sha1_smol"] }

The required APIs

You are going to need the following functions from tokio and tokio_websockets. Spend a few minutes to familiarize yourself with the API.

  • StreamExt::next() implemented by WebSocketStream: for asynchronously reading messages from a Websocket Stream.
  • SinkExt::send() implemented by WebSocketStream: for asynchronously sending messages on a Websocket Stream.
  • Lines::next_line(): for asynchronously reading user messages from the standard input.
  • Sender::subscribe(): for subscribing to a broadcast channel.

Two binaries

Normally in a Cargo project, you can have only one binary, and one src/main.rs file. In this project, we need two binaries. One for the client, and one for the server. You could potentially make them two separate Cargo projects, but we are going to put them in a single Cargo project with two binaries. For this to work, the client and the server code should go under src/bin (see the documentation).

Copy the following server and client code into src/bin/server.rs and src/bin/client.rs, respectively. Your task is to complete these files as described below.

src/bin/server.rs:

use futures_util::sink::SinkExt; use futures_util::stream::StreamExt; use std::error::Error; use std::net::SocketAddr; use tokio::net::{TcpListener, TcpStream}; use tokio::sync::broadcast::{channel, Sender}; use tokio_websockets::{Message, ServerBuilder, WebSocketStream}; async fn handle_connection( addr: SocketAddr, mut ws_stream: WebSocketStream<TcpStream>, bcast_tx: Sender<String>, ) -> Result<(), Box<dyn Error + Send + Sync>> { // TODO: For a hint, see the description of the task below. } #[tokio::main] async fn main() -> Result<(), Box<dyn Error + Send + Sync>> { let (bcast_tx, _) = channel(16); let listener = TcpListener::bind("127.0.0.1:2000").await?; println!("listening on port 2000"); loop { let (socket, addr) = listener.accept().await?; println!("New connection from {addr:?}"); let bcast_tx = bcast_tx.clone(); tokio::spawn(async move { // Wrap the raw TCP stream into a websocket. let ws_stream = ServerBuilder::new().accept(socket).await?; handle_connection(addr, ws_stream, bcast_tx).await }); } }

src/bin/client.rs:

use futures_util::stream::StreamExt; use futures_util::SinkExt; use http::Uri; use tokio::io::{AsyncBufReadExt, BufReader}; use tokio_websockets::{ClientBuilder, Message}; #[tokio::main] async fn main() -> Result<(), tokio_websockets::Error> { let (mut ws_stream, _) = ClientBuilder::from_uri(Uri::from_static("ws://127.0.0.1:2000")) .connect() .await?; let stdin = tokio::io::stdin(); let mut stdin = BufReader::new(stdin).lines(); // TODO: For a hint, see the description of the task below. }

Running the binaries

Run the server with:

cargo run --bin server

and the client with:

cargo run --bin client

Tasks

  • Implement the handle_connection function in src/bin/server.rs.
    • Hint: Use tokio::select! for concurrently performing two tasks in a continuous loop. One task receives messages from the client and broadcasts them. The other sends messages received by the server to the client.
  • Complete the main function in src/bin/client.rs.
    • Hint: As before, use tokio::select! in a continuous loop for concurrently performing two tasks: (1) reading user messages from standard input and sending them to the server, and (2) receiving messages from the server, and displaying them for the user.
  • Optional: Once you are done, change the code to broadcast messages to all clients, but the sender of the message.

Solutions

Dining Philosophers — Async

use std::sync::Arc; use tokio::sync::mpsc::{self, Sender}; use tokio::sync::Mutex; use tokio::time; struct Fork; struct Philosopher { name: String, left_fork: Arc<Mutex<Fork>>, right_fork: Arc<Mutex<Fork>>, thoughts: Sender<String>, } impl Philosopher { async fn think(&self) { self.thoughts .send(format!("Eureka! {} has a new idea!", &self.name)) .await .unwrap(); } async fn eat(&self) { // Keep trying until we have both forks let (_left_fork, _right_fork) = loop { // Pick up forks... let left_fork = self.left_fork.try_lock(); let right_fork = self.right_fork.try_lock(); let Ok(left_fork) = left_fork else { // If we didn't get the left fork, drop the right fork if we // have it and let other tasks make progress. drop(right_fork); time::sleep(time::Duration::from_millis(1)).await; continue; }; let Ok(right_fork) = right_fork else { // If we didn't get the right fork, drop the left fork and let // other tasks make progress. drop(left_fork); time::sleep(time::Duration::from_millis(1)).await; continue; }; break (left_fork, right_fork); }; println!("{} is eating...", &self.name); time::sleep(time::Duration::from_millis(5)).await; // The locks are dropped here } } static PHILOSOPHERS: &[&str] = &["Socrates", "Hypatia", "Plato", "Aristotle", "Pythagoras"]; #[tokio::main] async fn main() { // Create forks let mut forks = vec![]; (0..PHILOSOPHERS.len()).for_each(|_| forks.push(Arc::new(Mutex::new(Fork)))); // Create philosophers let (philosophers, mut rx) = { let mut philosophers = vec![]; let (tx, rx) = mpsc::channel(10); for (i, name) in PHILOSOPHERS.iter().enumerate() { let left_fork = Arc::clone(&forks[i]); let right_fork = Arc::clone(&forks[(i + 1) % PHILOSOPHERS.len()]); philosophers.push(Philosopher { name: name.to_string(), left_fork, right_fork, thoughts: tx.clone(), }); } (philosophers, rx) // tx is dropped here, so we don't need to explicitly drop it later }; // Make them think and eat for phil in philosophers { tokio::spawn(async move { for _ in 0..100 { phil.think().await; phil.eat().await; } }); } // Output their thoughts while let Some(thought) = rx.recv().await { println!("Here is a thought: {thought}"); } }

Broadcast Chat Application

src/bin/server.rs:

use futures_util::sink::SinkExt; use futures_util::stream::StreamExt; use std::error::Error; use std::net::SocketAddr; use tokio::net::{TcpListener, TcpStream}; use tokio::sync::broadcast::{channel, Sender}; use tokio_websockets::{Message, ServerBuilder, WebSocketStream}; async fn handle_connection( addr: SocketAddr, mut ws_stream: WebSocketStream<TcpStream>, bcast_tx: Sender<String>, ) -> Result<(), Box<dyn Error + Send + Sync>> { ws_stream .send(Message::text("Welcome to chat! Type a message".to_string())) .await?; let mut bcast_rx = bcast_tx.subscribe(); // A continuous loop for concurrently performing two tasks: (1) receiving // messages from `ws_stream` and broadcasting them, and (2) receiving // messages on `bcast_rx` and sending them to the client. loop { tokio::select! { incoming = ws_stream.next() => { match incoming { Some(Ok(msg)) => { if let Some(text) = msg.as_text() { println!("From client {addr:?} {text:?}"); bcast_tx.send(text.into())?; } } Some(Err(err)) => return Err(err.into()), None => return Ok(()), } } msg = bcast_rx.recv() => { ws_stream.send(Message::text(msg?)).await?; } } } } #[tokio::main] async fn main() -> Result<(), Box<dyn Error + Send + Sync>> { let (bcast_tx, _) = channel(16); let listener = TcpListener::bind("127.0.0.1:2000").await?; println!("listening on port 2000"); loop { let (socket, addr) = listener.accept().await?; println!("New connection from {addr:?}"); let bcast_tx = bcast_tx.clone(); tokio::spawn(async move { // Wrap the raw TCP stream into a websocket. let ws_stream = ServerBuilder::new().accept(socket).await?; handle_connection(addr, ws_stream, bcast_tx).await }); } }

src/bin/client.rs:

use futures_util::stream::StreamExt; use futures_util::SinkExt; use http::Uri; use tokio::io::{AsyncBufReadExt, BufReader}; use tokio_websockets::{ClientBuilder, Message}; #[tokio::main] async fn main() -> Result<(), tokio_websockets::Error> { let (mut ws_stream, _) = ClientBuilder::from_uri(Uri::from_static("ws://127.0.0.1:2000")) .connect() .await?; let stdin = tokio::io::stdin(); let mut stdin = BufReader::new(stdin).lines(); // Continuous loop for concurrently sending and receiving messages. loop { tokio::select! { incoming = ws_stream.next() => { match incoming { Some(Ok(msg)) => { if let Some(text) = msg.as_text() { println!("From server: {}", text); } }, Some(Err(err)) => return Err(err.into()), None => return Ok(()), } } res = stdin.next_line() => { match res { Ok(None) => return Ok(()), Ok(Some(line)) => ws_stream.send(Message::text(line.to_string())).await?, Err(err) => return Err(err.into()), } } } } }

Thanks!

Thank you for taking Comprehensive Rust 🦀! We hope you enjoyed it and that it was useful.

We’ve had a lot of fun putting the course together. The course is not perfect, so if you spotted any mistakes or have ideas for improvements, please get in contact with us on GitHub. We would love to hear from you.

Chú giải thuật ngữ

Bảng từ vựng dưới đây cung cấp định nghĩa ngắn gọn của nhiều thuật ngữ Rust.

  • allocate:
    Cấp phát bộ nhớ động trên bộ nhớ heap.
  • argument:
    Giá trị được truyền vào một hàm. Có thể được gọi là tham số.
  • Bare-metal Rust:
    Việc phát triển chương trình Rust ở cấp thấp, thường để triển khai trên hệ thống không cần hệ điều hành. Xem Bare-metal Rust.
  • block:
    Xem Blocksscope.
  • borrow:
    Xem Borrowing.
  • borrow checker:
    Một bộ phận của Rust compiler kiểm tra xem tất cả các borrow có hợp lệ hay không.
  • brace:
    {}. Được gọi là dấu ngoặc nhọn, phân chia các block.
  • build:
    Quá trình chuyển đổi mã nguồn thành một chương trình executable.
  • call:
    Gọi hoặc thực thi hàm.
  • channel:
    Dùng để truyền thông điệp một cách an toàn giữa các thread.
  • Comprehensive Rust 🦀:
    Các khóa học ở đây được gọi chung là Comprehensive Rust 🦀.
  • concurrency:
    Quá trình thực thi nhiều tác vụ hoặc process cùng một lúc.
  • Concurrency in Rust:
    Xem Concurrency trong Rust.
  • constant:
    Một giá trị không thay đổi trong quá trình thực thi của chương trình. Còn được gọi là hằng số.
  • control flow:
    Thứ tự thực thi của các câu lệnh trong một chương trình. Một vài sách gọi từ này là luồng điều khiển.
  • crash:
    Khi chương trình gặp lỗi không xử lý được và bị dừng đột ngột.
  • enumeration:
    Môt kiểu dữ liệu chứa một hoặc nhiều hằng số, có thể kèm theo một tuple hoặc một struct.
  • error:
    Khi gặp phải một hành vi hoặc kết quả không như dự kiến.
  • error handling:
    Quá trình quản lý và xử lý lỗi xảy ra trong quá trình thực thi chương trình.
  • exercise:
    Bài tập được thiết kế giúp học viên luyện tập và kiểm tra năng lực lập trình.
  • function:
    Một đoạn code có thể được tái sử dụng, nhằm thực hiện một nhiệm vụ cụ thể.Còn được gọi là hàm.
  • garbage collector:
    Một cơ chế tự động giải phóng bộ nhớ bị chiếm bởi các biến không còn được sử dụng nữa.
  • generics:
    Một tính năng của ngôn ngữ lập trình, cho phép code được viết nhưng không cần phải định rõ kiểu dữ liệu, giúp tái sử dụng code với nhiều kiểu dữ liệu khác nhau.
  • immutable:
    Không thể thay đổi sau khi tạo. Có thể được dịch là bất biến.
  • integration test:
    Một quy trình test code để kiểm tra sự tương tác giữa các thành phần của hệ thống.
  • keyword:
    Một từ khóa trong ngôn ngữ lập trình, có ý nghĩa cụ thể và không thể sử dụng như một identifier (tên biến).
  • library:
    Một bộ sưu tập các hàm hoặc code đã được biên dịch trước, có thể được sử dụng trong chương trình. Thường được gọi là thư viện.
  • macro:
    Macros trong Rust có thể được nhận diện bằng việc tên có dấu !. Macros được sử dụng khi hàm thông thường không đáp ứng được nhu cầu của chương trình. Một ví dụ điển hình là format!, macro này có thể nhận một số lượng tham số không cố định (có thể 1 tham số, 2 tham số, hay thậm chí là không có tham số nào). Hàm thông thường trong Rust không làm được điều này.
  • main function:
    Điểm bắt đầu của mọi chương trình Rust là hàm main.
  • match:
    Một control flow trong Rust cho phép so sánh một giá trị với một số pattern cho trước.
  • memory leak:
    Khi chương trình không giải phóng bộ nhớ của biến không còn cần thiết, dẫn đến việc bộ nhớ sử dụng tăng dần trong quá trình thực thi.
  • method:
    Một hàm liên kết với một đối tượng hoặc một kiểu dữ liệu trong Rust.
  • module:
    Một phương pháp đẻ nhóm các định nghĩa, như hàm, kiểu dữ liệu, hoặc traits, trong Rust.
  • move:
    Chuyển quyền sở hữu của một giá trị từ một biến sang một biến khác trong Rust.
  • mutable:
    Một thuộc tính trong Rust cho phép biến có thể được thay đổi sau khi đã được khai báo.
  • ownership:
    Một khái niệm trong Rust xác định phần nào của code chịu trách nhiệm quản lý bộ nhớ liên quan đến một giá trị.
  • panic:
    Khi chương trình gặp phải một lỗi không thể phục hồi được, dẫn đến việc dừng chương trình.
  • parameter:
    Giá trị được truyền vào một hàm khi hàm đó được gọi. Còn được gọi là tham số.
  • pattern:
    Cách sử dụng giá trị, hằng số, hoặc cấu trúc để so sánh với một biểu thức trong Rust.
  • payload:
    Dữ liệu hoặc thông tin được chuyển đi, thường sử dụng khi nhắc đến việc chuyển dữ liệu qua mạng.
  • program:
    Một tập hợp các dòng lệnh máy tính có thể thực thi để thực hiện một nhiệm vụ hoặc giải quyết một vấn đề cụ thể. Còn được gọi là chương trình.
  • programming language:
    Một hệ thống ngôn ngữ được sử dụng để truyền thông điệp cho máy tính, như Rust.
  • receiver:
    Tham số đầu tiên trong một method của Rust, đại diện cho biến mà method đó gọi lên.
  • reference counting:
    Một kỹ thuật quản lý bộ nhớ trong đó số lượng tham chiếu đến một đối tượng được theo dõi, và đối tượng được giải phóng khi số lượng tham chiếu giảm về 0.
  • return:
    Một từ khóa trong Rust dùng để chỉ ra giá trị trả về của một hàm.
  • Rust:
    Ngôn ngữ lập trình hệ thống chú trọng vào an toàn và hiệu suất.
  • Rust Fundamentals:
    Ngày 1 đến ngày 4 của khóa học này.
  • Rust in Android:
    Xem Sử dụng Rust trong Android.
  • Rust in Chromium:
    Xem Sử dụng Rust trong Chromium.
  • safe:
    Đoạn code tuân thủ các quy tắc về ownershipborrowing của Rust, ngăn chặn lỗi liên quan đến bộ nhớ.
  • scope:
    Một phạm vi trong chương trình mà một biến có thể được sử dụng.
  • standard library:
    Một bộ sưu tập các module cung cấp các chức năng cần thiết trong Rust. Một số sách gọi là thư viện chuẩn.
  • static:
    Một từ khóa trong Rust dùng để định nghĩa biến tĩnh hoặc các phần tử với lifetime'static.
  • string:
    Một kiểu dữ liệu lưu trữ dữ liệu văn bản. Xem String vs str để biết thêm chi tiết. Còn được gọi là chuỗi.
  • struct:
    Một kiểu dữ liệu trong Rust, dùng để nhóm các biến khác nhau với nhau thành một kiểu dữ liệu mới.
  • test:
    Một module trong Rust chứa các hàm kiểm tra sự chính xác của các hàm khác.
  • thread:
    Một cơ chế cho phép thực thi nhiều tác vụ cùng một lúc trong chương trình. Còn được gọi là luồng.
  • thread safety:
    Thuộc tính của chương trình đảm bảo chương trình hoạt động chuẩn xác trong môi trường đa luồng.
  • trait:
    Một tập hợp các hành vi mà ta muốn một kiểu dữ liệu cụ thể phải có.
  • trait bound:
    Một cách để yêu cầu một kiểu dữ liệu phải implement một số trait cụ thể.
  • tuple:
    Một kiểu dữ liệu trong Rust chứa các biến khác nhau. Các trường của tuple không có tên, và được truy cập bằng số thứ tự.
  • type:
    Một cơ chế giúp xác định các thao tác có thể thực hiện trên giá trị của một kiểu dữ liệu cụ thể trong Rust.
  • type inference:
    Một chức năng của Rust compiler giúp suy luận kiểu dữ liệu của biến hoặc biểu thức, thay vì phải khai báo kiểu dữ liệu cho từng biến.
  • undefined behavior:
    Một số hành vi và điều kiện trong Rust không có kết quả cụ thể, thường dẫn đến việc chương trình hoạt động bất thường.
  • union:
    Một kiểu dữ liệu có thể chứa giá trị của nhiều kiểu dữ liệu khác nhau, nhưng chỉ một giá trị một lúc.
  • unit test:
    Một loại test trong Rust, giúp kiểm tra từng phần nhỏ của chương trình. Xem Unit Tests.
  • unit type:
    Kiểu dữ liệu không chứa dữ liệu, ký hiệu bằng ().
  • unsafe:
    Một phần của Rust cho phép người lập trình gây ra undefined behavior. Xem Unsafe Rust.
  • variable:
    Một vùng bộ nhớ lưu trữ dữ liệu. Có thể được sử dụng trong một scope. Còn được gọi là biến.

Other Rust Resources

The Rust community has created a wealth of high-quality and free resources online.

Official Documentation

The Rust project hosts many resources. These cover Rust in general:

  • The Rust Programming Language: the canonical free book about Rust. Covers the language in detail and includes a few projects for people to build.
  • Rust By Example: covers the Rust syntax via a series of examples which showcase different constructs. Sometimes includes small exercises where you are asked to expand on the code in the examples.
  • Rust Standard Library: full documentation of the standard library for Rust.
  • The Rust Reference: an incomplete book which describes the Rust grammar and memory model.

More specialized guides hosted on the official Rust site:

  • The Rustonomicon: covers unsafe Rust, including working with raw pointers and interfacing with other languages (FFI).
  • Asynchronous Programming in Rust: covers the new asynchronous programming model which was introduced after the Rust Book was written.
  • The Embedded Rust Book: an introduction to using Rust on embedded devices without an operating system.

Unofficial Learning Material

A small selection of other guides and tutorial for Rust:

Please see the Little Book of Rust Books for even more Rust books.

Credits

The material here builds on top of the many great sources of Rust documentation. See the page on other resources for a full list of useful resources.

The material of Comprehensive Rust is licensed under the terms of the Apache 2.0 license, please see LICENSE for details.

Rust by Example

Some examples and exercises have been copied and adapted from Rust by Example. Please see the third_party/rust-by-example/ directory for details, including the license terms.

Rust on Exercism

Some exercises have been copied and adapted from Rust on Exercism. Please see the third_party/rust-on-exercism/ directory for details, including the license terms.

CXX

The Interoperability with C++ section uses an image from CXX. Please see the third_party/cxx/ directory for details, including the license terms.