In this article I describe the courses I have taken at UT and share my thoughts on each of them.

Hardware and Algorithm Co-Design for ML

This course delved into various methods of accelerating machine learning algorithms. Topics such as quantization and pruning were discussed, with a primary focus on hardware and its configuration, including the use of systolic arrays and the proper arrangement of memory buses and local BRAM for high throughput.
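To make the quantization and pruning ideas concrete, here is a minimal sketch in plain Python. It is not the course's lab code: the function names, the per-tensor symmetric scheme, and the example weights are all assumptions chosen for illustration.

```python
def quantize_symmetric(weights, num_bits=8):
    """Map floats to signed integers using one shared scale (symmetric, per-tensor)."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical weights, purely for illustration.
weights = [0.5, -1.27, 0.01, 0.9, -0.03, 0.3]
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
pruned = prune_by_magnitude(weights, 0.5)
```

The quantization error per weight is bounded by half a step (`scale / 2`), which is why accuracy often survives 8-bit weights, while pruning trades accuracy for fewer multiply-accumulates in hardware.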

There were four labs:

Lab 1: Building and training various neural networks using PyTorch, analyzing network structure and its effect on accuracy. The lab also covered how quantization and pruning impact the model output.

Lab 2: Getting familiar with using HLS to program an FPGA, culminating in the creation of a simple matrix multiplier within the FPGA fabric.

Lab 3: Building an IM2COL implementation within the FPGA and creating a more scalable matrix multiplication implementation.

Lab 4: Training a simple neural network in PyTorch, using the quantized weights in an inference model built on the FPGA. This involved using the IM2COL and matrix multiplication implementations from previous labs, as well as developing data input and output infrastructure. The result was a fully functional image detector inference machine running on an FPGA.
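The IM2COL idea from Labs 3 and 4 can be sketched in software as follows. This is a simplified single-channel, stride-1, no-padding version in plain Python, not the HLS code from the labs; the function names and the tiny integer image are assumptions for illustration.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D image (stride 1, no padding) into one row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj] for di in range(kh) for dj in range(kw)])
    return rows


def matmul(a, b):
    """Plain triple-loop matrix multiply: (n x k) times (k x m)."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


def conv2d_via_im2col(image, kernel):
    """Convolution (cross-correlation, as in PyTorch) expressed as one matrix multiply."""
    kh, kw = len(kernel), len(kernel[0])
    cols = im2col(image, kh, kw)
    # Flatten the kernel into a (kh*kw) x 1 column vector.
    flat_kernel = [[kernel[di][dj]] for di in range(kh) for dj in range(kw)]
    flat_out = matmul(cols, flat_kernel)
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[flat_out[r * out_w + c][0] for c in range(out_w)] for r in range(out_h)]


# Integer inputs stand in for quantized activations and weights.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
result = conv2d_via_im2col(image, kernel)   # [[-4, -4], [-4, -4]]
```

The appeal for hardware is that once the patches are unrolled, every convolution layer reduces to the same matrix multiplication, so a single well-optimized multiplier in the fabric can be reused across layers.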

This class was extremely interesting and informative. However, it offered little instruction and few specific resources on the FPGA development environment and tools. The lectures focused almost entirely on the theory of accelerating neural networks and why it works. A few lectures on the tools used to program these networks on an FPGA would greatly reduce lab headaches and enable more ambitious labs.

 

VLSI 1

VLSI was a very fast-paced course, covering a wide range of topics from the theory of VLSI lithography to the core principles of creating fast and efficient circuits and designing and synthesizing complex systems in Verilog. It included exams, three large lab assignments, and a project for graduate students.

Lab 1: Manually creating a layout design of a 4-bit memory cell in the Cadence layout editor. The main challenge was the competition aspect: the smaller the area of your functional 4-bit memory cell relative to your classmates' designs, the better your grade. This was a highly manual process involving planning, trial and error, and avoiding many hidden DRC errors.

Lab 2: Designing a fast and efficient ALU in the Cadence Schematic Editor using only inverters (NOT), NAND gates, and OR gates. The lab was graded competitively: the faster your ALU ran compared to your classmates' designs, the better the grade. This meant a lengthy but fascinating process of designing fast adders, subtractors, shifters, and so on within that gate constraint.

Lab 3: This lab had two parts:

  • Part 1: Designing a Synchronous Serial Port (SSP) in Verilog, checking the functional correctness by simulating it using VCS, and synthesizing the SSP design in Design Vision.
  • Part 2: Integrating the designed SSP with the given Verilog of an ARM core using the Wishbone interconnect. This involved designing a bus controller conforming to the Wishbone Protocol in Verilog. The complexity stemmed from understanding the design goals within the context of tens of thousands of lines of Verilog operating within the ARM core. Once understood, implementation was straightforward.
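To give a feel for the Lab 2 gate constraint, here is a small behavioral sketch in Python of a 4-bit adder built only from NOT, NAND, and OR. It is a plain ripple-carry design chosen for clarity, not the schematic from the lab; a competitive entry would likely use a faster carry structure. All function names are assumptions for illustration.

```python
# The only primitives allowed by the lab constraint.
def NOT(a):
    return 1 - a


def NAND(a, b):
    return 1 - (a & b)


def OR(a, b):
    return a | b


# Everything below is composed from the three primitives above.
def AND(a, b):
    return NOT(NAND(a, b))


def XOR(a, b):
    # (a OR b) AND NOT(a AND b)
    return AND(OR(a, b), NAND(a, b))


def full_adder(a, b, cin):
    """One-bit full adder returning (sum, carry_out)."""
    p = XOR(a, b)
    s = XOR(p, cin)
    cout = OR(AND(a, b), AND(p, cin))
    return s, cout


def ripple_add(x, y, width=4):
    """Add two `width`-bit numbers bit by bit; returns (sum, carry_out)."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


total, carry_out = ripple_add(9, 8)   # 17 = 0b10001 -> (1, 1)
```

In the ripple-carry version the carry has to pass through every bit position in sequence, which is exactly the kind of critical path the speed competition rewards you for shortening.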

The labs in this class were challenging but very rewarding. The difficulty of the class matched the amount learned, and I would recommend it to anyone interested in the field.

 

IoT For Embedded Systems

This course focused on various IoT communication systems and protocols, how they work, and their implementation in embedded system projects. It covered communication protocols, the physics of various forms of wireless communication, and significant portions of communication theory. Labs were based on pre-built embedded systems with various communication methods designed to accomplish specific tasks.

My favorite lab was on low-frequency sub-GHz radio communication:

Given three embedded systems, each with a sub-GHz radio module, the goal was to create a communication protocol, including addressing, error correction, variable data length, etc. The devices needed to communicate with one another, running the same code (dynamic addressing was necessary). The final part involved benchmarking the protocol's data rate and transmission distance.
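As a rough illustration of what such a protocol's framing can look like, here is a minimal Python sketch with addressing, a variable-length payload, and a checksum. The frame layout, the broadcast address, and the function names are assumptions made for this example, not the protocol from the lab. Note that the CRC here only detects corrupted frames; actual error correction would need forward error correction or retransmissions on top.

```python
import binascii
import struct

HEADER = struct.Struct(">BBB")   # destination, source, payload length (assumed layout)
BROADCAST = 0xFF                 # assumed "everyone" address


def encode_frame(dst, src, payload):
    """Build header + payload + 16-bit CRC."""
    if len(payload) > 255:
        raise ValueError("payload too long for a 1-byte length field")
    body = HEADER.pack(dst, src, len(payload)) + payload
    crc = binascii.crc_hqx(body, 0xFFFF)
    return body + struct.pack(">H", crc)


def decode_frame(frame, my_addr):
    """Return (src, payload), or None if the frame is damaged or not addressed to us."""
    if len(frame) < HEADER.size + 2:
        return None
    dst, src, length = HEADER.unpack_from(frame)
    if len(frame) != HEADER.size + length + 2:
        return None
    body = frame[:-2]
    (crc,) = struct.unpack(">H", frame[-2:])
    if binascii.crc_hqx(body, 0xFFFF) != crc:
        return None
    if dst not in (my_addr, BROADCAST):
        return None
    return src, bytes(frame[HEADER.size:-2])


frame = encode_frame(dst=2, src=1, payload=b"hello")
received = decode_frame(frame, my_addr=2)   # (1, b"hello")
```

Because every node runs the same `decode_frame` and only `my_addr` differs, identical firmware can run on all three devices once each one has picked or been assigned an address.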

Other labs covered LoRa, Bluetooth, WiFi, and more, all of which were interesting and fun. Highly recommended if you are interested in embedded systems.

 

Advanced Embedded Systems

This course covered a broad range of topics related to FPGAs, embedded systems, and their design. It was centered around three labs, where each student worked solo with an Ultra96 FPGA board, and a project where students worked in teams.

Lab 1: Introduction to using Xilinx Vivado to program the FPGA, as well as coding simple applications on the Linux board to interact with the custom FPGA core. The goal was to develop a simple memory tester that would write a pattern to memory and then read it back, verifying that the memory could be read and written correctly. This involved learning about the AXI bus protocol, effectively using and modifying the IP cores within Vivado, and interacting with mapped memory to control the FPGA core.

Lab 2: Building a timer module within the FPGA and modifying the Xilinx Direct Memory Access (DMA) module to interface with the timer module. The task included developing a custom kernel module to receive an interrupt from the timer upon DMA completion. The C code needed to begin various memory transfers of different sizes, verify their validity, and measure their latency using the capture timer.

Lab 3: Creating a Keccak-512 cryptography core in the FPGA fabric. This involved building the state machine to read memory in chunks and pass it into the Keccak core while following the core’s input guidelines. The timer module from Lab 2 was also modified to measure hash latency. The Keccak core needed to throw an interrupt to notify the kernel upon hash completion. The final part was developing a kernel module to receive the Keccak interrupt, writing a program to create a hash of any input text or file, and writing a program to test many hashes and analyze the latency of the hashing core.
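The software side of Lab 3 (hashing an input and measuring latency) can be sketched on a desktop with Python's standard library. This is a reference-style sketch, not the kernel module or test program from the lab, and the function names are assumptions. One caveat: `hashlib.sha3_512` implements the standardized SHA3-512, which uses the Keccak permutation but a different padding than the original Keccak-512 submission, so a hardware core built to the original specification would produce different digests.

```python
import hashlib
import time


def sha3_512_hex(data: bytes) -> str:
    """Software reference digest (standardized SHA3-512) as a hex string."""
    return hashlib.sha3_512(data).hexdigest()


def measure_latency_ns(data: bytes, runs: int = 100):
    """Hash the same input repeatedly and summarize the latency in nanoseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        hashlib.sha3_512(data).digest()
        samples.append(time.perf_counter_ns() - start)
    samples.sort()
    return {
        "min": samples[0],
        "median": samples[len(samples) // 2],
        "max": samples[-1],
    }


digest = sha3_512_hex(b"hello world")
stats = measure_latency_ns(b"x" * 1024, runs=20)
```

A software baseline like this is useful both for checking the hardware core's output bit for bit and for putting the measured hardware latency in context.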

I definitely recommend this course. The content it teaches is very interesting and extremely applicable if you are interested in high-performance embedded systems and FPGA design. The bitcoin miner my group and I designed and deployed as a final project for this class was also very interesting, and details can be found [here].

 

Real Time Operating Systems

This course focused on developing a custom OS for a basic Arm processor. It included a midterm, a final exam, and seven labs, each building upon the previous one to create a fully functioning system. The final two labs featured an RC car race, with the developed OS serving as the brain of the system.

Labs 1-5: The initial labs involved writing the OS in C, with context switching and PendSV handling in assembly. Key features built into the OS included priority scheduling, mutexes, and a custom command line interpreter. Custom implementations of dynamic heap allocation (malloc) and an SD card file system (e.g., FAT32) were also completed.

Labs 6-7: Working in teams, students controlled RC cars with dual processors. One processor interacted with sensors to acquire data, while the other controlled the motors and servos driving the car. CAN communication was implemented between the two boards to support this dual-processor architecture. The RC car race tested the developed OS in a competitive environment.
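The priority scheduling mentioned in the early labs boils down to one decision made at every context switch: which ready thread runs next. Here is a minimal Python model of that decision, assuming fixed priorities where a lower number means higher priority and round-robin among equals. The real OS was written in C and assembly; the class, method, and thread names here are hypothetical.

```python
from collections import deque


class Scheduler:
    """Fixed-priority scheduler model; lower number = higher priority (assumed)."""

    def __init__(self):
        self.ready = {}   # priority -> deque of ready thread names

    def add(self, name, priority):
        """Make a thread ready at the given priority."""
        self.ready.setdefault(priority, deque()).append(name)

    def pick_next(self):
        """Return the next thread to run, rotating it to the back of its level."""
        for priority in sorted(self.ready):
            queue = self.ready[priority]
            if queue:
                name = queue.popleft()
                queue.append(name)
                return name
        return None   # nothing ready: a real OS would run an idle task

    def block(self, name):
        """Remove a thread from the ready set, e.g. while it waits on a mutex."""
        for queue in self.ready.values():
            if name in queue:
                queue.remove(name)
                return


sched = Scheduler()
sched.add("sensor", 1)
sched.add("motor", 1)
sched.add("logger", 3)
order = [sched.pick_next() for _ in range(4)]   # sensor, motor, sensor, motor
```

Note that `logger` never runs while the two higher-priority threads stay ready, which is the starvation behavior you have to keep in mind when assigning priorities.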

I definitely recommend this course. The content it teaches is very interesting and extremely applicable if you are interested in OS development and embedded system programming. The debugging can get frustrating, as tracking down errors in a parallel context is extremely challenging, but the satisfaction that results from building a functional OS is worth it.

 

VLSI Physical Design Automation

This course discussed the methods and algorithms used to go from a Verilog circuit design to an ASIC chip layout. It covered the history of these algorithms and how they advanced over time, as well as the current state-of-the-art methods for VLSI placement. While the content itself is interesting, the teaching methods were not my favorite, and the assignments, tests, and project were not designed to be especially engaging or challenging.