Personal Statement

My name is Bohan Yang, and I am a junior majoring in physical electronics at the School of the Gifted Young, University of Science and Technology of China (USTC). I am applying for a Ph.D. position at IIIS, Tsinghua University. I have a strong interest in dataflow optimization, efficient resource mapping strategies, and innovative architectures for AI chips, especially hardware acceleration dedicated to AI application scenarios such as LLMs, self-driving, and cloud computing.

I have a mixed background in physics and electronics because I value a solid foundation of knowledge and a broad perspective on both theory and engineering. Creative ideas at the logical level must be supported by the underlying physical principles and are heavily shaped by them. With advances in packaging, high-speed interconnects (e.g. optical interconnects), and memory (e.g. HBM and 3D-stacked memory), computer architecture research now presents far more opportunities and challenges than before. I believe my background will serve me well in my future career.

Over the past three years, I have participated in several exciting projects. The first was to design a real digital chip and generate a complete GDS layout. The target was a chip in the TSMC 180nm process that includes an 8-bit MIPS CPU, a systolic array accelerator, and an SRAM controller. I worked through the whole flow: front-end architecture design, such as the efficient systolic array architecture; circuit verification, which ensures that the dataflow is valid and outputs useful matrix data; and back-end layout generation, including floorplanning, clock tree generation, power grid design, and IO pad allocation. Through this project, I learned a great deal about the AI chip design workflow and about EDA tools from both Synopsys and Cadence. The chip has now come back from the foundry and is under test.

Last summer, I interned at Polar Bear Tech, a startup focused on chiplet design and integration. There I learned a lot about the industrial AI compiler workflow and the problems the industry cares about. As a member of the AI compiler research group, I gained a better understanding of how AI frameworks operate. I wrote AI operator descriptions for a dedicated functional simulator and worked on optimizing their performance in C++. I also wrote scripts that use the simulator to verify the functionality of the operators. This task was difficult because the parameter space grows explosively, so I had to prune the list of cases to traverse while preserving coverage and correctness in corner cases. My coding skills improved considerably through this work.

I currently work as a research intern under the supervision of Prof. Mingyu Gao at IIIS, Tsinghua University, seeking a better accelerator architecture for dynamic neural networks. Our idea is motivated by the fact that GPUs and static spatial accelerators are inefficient at data manipulation and control flow such as if-else blocks. As a result, either all branches of a dynamic neural network have to be executed or the input batch size is limited to 1, so these platforms can neither save energy nor improve hardware utilization. This limitation is severe for dynamic neural networks because it dramatically reduces inference throughput and keeps them from reaching their ideal performance. We therefore propose Adyna, an accelerator that supports dynamism while still achieving high hardware utilization. To make better use of the architecture, we follow a HW-SW co-design approach and build a scheduling and dispatch platform. My job is to build a cycle-accurate simulator based on gem5 and SimPy that implements our dataflow schemes, evaluates the architecture, and provides reasonably precise performance estimates. The simulator takes traces from the scheduler as input and generates the relevant statistics. Our work is under review at MICRO’23, and I am the second author of the paper.

In my observation, most research papers and current research hotspots focus on relieving the mismatch between compute and storage demands. For example, various dataflows, such as those of NVDLA and Eyeriss, have been proposed to fully exploit data reuse. Many data movement strategies have also been developed, such as forwarding, buffer sharing, and multicast. But that is not enough. Many tasks still run inefficiently because of low data utilization and load imbalance across resources.

Moreover, with the rapid growth of AI applications and their specific demands, the current computing paradigm is becoming less suitable for future workloads. The GPU, for instance, has proven to be the most practical platform for DNN training and inference because its SIMT architecture is more tensor-friendly than that of a traditional CPU. However, it is far from perfect, since demand is growing for cases such as sparse matrices, mixed precision, different input formats, multimodal tasks, and dynamic models. The GPU has its own drawbacks and historical baggage. Adyna, for example, supports dynamic neural networks with near-SRAM routers that manipulate and distribute tensors at runtime, which fits this application better than GPUs do.

My future research plan therefore centers on dataflow optimization for challenging workloads such as ViT, BEV (point clouds), sparse models, GNNs, and multimodal NNs, designing efficient schemes that increase utilization and reduce energy waste. Beyond dynamic neural networks, I believe other innovative models may also be hindered by the current platform. Freeing them from the drawbacks of the GPU is a very promising research direction. However, innovations confined to microarchitecture are hitting a wall. HW-SW co-design and circuit-level ideas may make a big difference (as in Adyna and DOTA). Computing-in-memory, chiplets, and C2C links are also exciting research directions in both industry and academia for building powerful and scalable server clusters.

I am also interested in universal AI accelerators. They are promising because “universal” implies a large development community and flexible applications. Achieving such universality in a single architecture would be a major breakthrough, and it is a dream of mine. The challenge, however, may lie in the trade-offs between tasks and in making the platform attractive to every user. Will a new universal AI computing platform take the place of the NVIDIA GPU in this Golden Age? That is what I intend to explore during my Ph.D.

In summary, I am confident that I can explore the design space for future workloads and face the challenges of modern computer architecture research. I also hope that my work will one day be applied commercially, as with Tenstorrent and Tesla Dojo, and support the compute-intensive scenarios that will drive the next wave of the AI revolution, so that my research contributes to the development of the industry.

This concludes my statement. If you are interested in the details of my projects, please see my GitHub repo or my website. Thank you!