Efficient and Scalable ML System Design
Designing highly efficient and scalable ML systems, exploring multi-dimensional opportunities for fast training and inference on complex, multi-scale architectures.
System-oriented design methodologies for enabling future planet-scale VR/AR/XR systems, considering latency, perception, energy efficiency, user experience, quality of service and key edge-cloud metrics.
Designing alternative quantum system execution models, with compilation techniques, programming models and a middleware layer spanning various future quantum hardware.
Co-design for emerging HPC applications on extremely heterogeneous architectures, targeting both scale-up and scale-out efficiency.
Please join us for cutting-edge systems research talks from academia and industry labs, ranging from novel ML chip design to scalable metaverse design and optimisation.
SOAR Faculty Fellow, Director of FSA
Our lovely mascot! His specialty is sleeping hard.
Ph.D. Student, Quantum Computing, Xanadu Intern
MPhil and incoming Ph.D. Student, ML System Design (Alibaba Student Fellow and Google Brain collaborator)
Ph.D. Student, Accelerators, ML co-design, sparsity, large-model scalability and acceleration (Alibaba Student Fellow, Google Brain collaborator)
Ph.D. Student, Heterogeneous Memory Architecture Acceleration (ARC DP)
USYD Masters, incoming Ph.D. Student, Software-Hardware Co-Design, Bayesian Acceleration
Ph.D. Student, Comprehensive Optimisation of Deep Learning: Efficient and Effective Algorithm and System Design (Co-supervised by Yibo@JD@KAUST)
Honours Student, Quantum Error Suppression
Honours Student, Quantum Compilation
MPhil student, ML System Design and Optimisation (co-advised with Sara Hooker and Amir Yazdanbakhsh @ Google Brain)
Honours Student, Dalyell scholar, Neural Architecture Search (NAS)
Masters Student, Metaverse VR/AR Co-Design
Visiting student at FSA, Novel Graph Acceleration Engines.
Affiliated Researcher, Postdoc at UTS, Quantum Algorithms.
Masters student, AR/VR Software Layer Design
[VLDB'24]
"Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference With Unstructured Sparsity", VLDB'24
[HPCA'23]
"Post0-VR: Enabling Universal Realistic Rendering for Modern VR via Exploiting Architectural Similarity and Data Sharing, HPCA'23
[EuroSys'23]
"TEA: A General-Purpose Temporal Graph Random Walk Engine, EuroSys'23
[ICS'22]
"Bring Orders into Uncertainty: Enabling Efficient Uncertain Graph Processing via Novel Path Sampling on Multi-Accelerator Systems", ICS'22
[PPoPP'22]
"Vapro: Light-weight Performance Variance Detection and Diagnosis for Production Run Parallel Applications", PPoPP'22
[ICDE'22]
"TeGraph: A Novel General-Purpose Temporal Graph Computing Engine", ICDE'22
[MLSys'22]
"Randomness In Neural Network Training: Characterising The Impact of Tooling"
Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker (Google Brain), to appear in Machine Learning and Systems (MLSys) 2022. Artifact
[ASPLOS'22]
"AStitch: Enabling A New Multi-Dimensional Optimisation Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures"
In the Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'22). Artifact
[VLDB'22]
"COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression"
To appear in the 48th International Conference on Very Large Data Bases (VLDB'22)
[SC'21]
"Dr. Top-k: Delegate-Centric Top-k on Heterogeneous HPC Architectures"
To appear in The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'21). Artifact
[SC'21]
"MAPA: Multi-Accelerator Pattern Allocation Policy for Multi-Tenant GPU Servers"
To appear in The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'21). Artifact
[MICRO'21]
"Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving"
Qiyu Wan, Haojun Xia, Xingyao Zhang, Lening Wang, Shuaiwen Leon Song and Xin Fu, IEEE/ACM International Symposium on Microarchitecture 2021 (MICRO'21). Artifact
[ESEC/FSE'21]
"Toward Efficient Interactions between Python and Native Libraries"
Jialiang Tan, Yu Chen, Zhenming Liu, Bin Ren, Shuaiwen Leon Song, Xipeng Shen, Xu Liu, The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 2021.
[TC'21]
"MalFox: Camouflaged Adversarial Malware Example Generation Based on Conv-GANs Against Black-Box Detectors"
Fangtian Zhong, Xiuzhen Cheng, Dongxiao Yu, Bei Gong, Shuaiwen Song, Jiguo Yu, IEEE Transactions on Computers.
[ICS'21]
"ClickTrain: Efficient and Accurate End-to-End Deep Learning Training via Fine-Grained Architecture-Preserving Pruning"
Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao, The International Conference on Supercomputing.
[ASPLOS YArch'21]
"Quantum Von Neumann Architectural Modelling for Algorithm Analysis"
Alan Robertson, Shuaiwen Leon Song, The third Young Architect Workshop (YArch), in conjunction with ASPLOS'21, April 2021. Talk
[ISCA'21]
"η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities"
Xingyao Zhang, Haojun Xia, Donglin Zhuang, Hao Sun, Xin Fu, Michael Taylor, Shuaiwen Leon Song, The International Symposium on Computer Architecture [ISCA'21]. (Acceptance rate: 18.7%). Presentation
[TC]
"Enabling Highly Efficient Capsule Networks Processing Through Software-Hardware Co-Design"
Xingyao Zhang, Xin Fu, Donglin Zhuang, Chenhao Xie, Shuaiwen Leon Song, IEEE Transactions on Computers, special issue on research highlights of Computer Architecture for 2019-2020.
[ASPLOS'21]
"Q-VR: System-Level Design for Future Collaborative Virtual Reality"
Chenhao Xie, Xie Li, Yang Hu, Huwan Peng, Michael B. Taylor, Shuaiwen Leon Song, The 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'21). (Acceptance rate: 17.5%). This is the first work in the computer architecture community on a possible design for future planet-scale mobile VR systems with low latency and high quality. We transform a complex multi-dimensional problem into a system design problem, providing real-world observations, practical insights and a software-hardware co-design solution.
[HPCA'20]
"Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design"
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). (Acceptance rate: 19.4%; This work starts a conversation on how to handle emerging hybrid network structures in which 2D/3D convolution is no longer the dominant efficiency bottleneck, helping CapsuleNet actually scale and become hardware-friendly via Processing-in-Memory.)
Slides
[ISCA'19]
"OO-VR: NUMA friendly object-oriented VR rendering framework for future NUMA-based multi-GPU systems"
Chenhao Xie, Xin Fu, Mingsong Chen, Shuaiwen Leon Song. In Proceedings of the 46th International Symposium on Computer Architecture (ISCA '19). (Acceptance rate: 16.9%; Investigating what future chiplet-based cloud server systems will look like in order to reach scalable parallel rendering efficiency via a new hardware-software co-design framework.)
Presentation
[HPCA'19]
"PIM-VR: Erasing Motion Anomalies In Highly-Interactive Virtual Reality World with Customised Memory Cube"
Chenhao Xie, Xingyao Zhang, Ang Li, Xin Fu, Shuaiwen Leon Song. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). (Acceptance rate: 19.7%; One of the first papers in the community identifying that state-of-the-art software-level reprojection optimisation for commercial VR devices involuntarily causes significant hardware-level bottlenecks and motion anomalies.)
Video
[SC'19]
"BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets"
Ang Li, Tong Geng, Tianqi Wang, Martin C. Herbordt, Shuaiwen Leon Song, Kevin J. Barker. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '19). (Acceptance rate: 20.9%; Introducing the concept of a software-based tensor core for bit-level parallelism on many-core accelerators; see the sketch below this entry.)
Video
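To give a flavour of the kind of bit-level parallelism this line of work targets, below is a minimal Python sketch of the generic XNOR-and-popcount trick used by binarized-network kernels in general. It is not the BSTC design itself, and the helper names (pack_bits, binary_dot) are invented for this illustration.

def pack_bits(values):
    """Pack a vector of +/-1 values into one integer (bit i set means +1)."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word

def binary_dot(word_a, word_b, n):
    """Dot product of two packed +/-1 vectors of length n via XNOR + popcount."""
    matches = bin(~(word_a ^ word_b) & ((1 << n) - 1)).count("1")
    return 2 * matches - n  # each matching bit contributes +1, each mismatch -1

a, b = [1, -1, 1, 1], [1, 1, -1, 1]
assert binary_dot(pack_bits(a), pack_bits(b), len(a)) == sum(x * y for x, y in zip(a, b))

Packing 32 or 64 elements per machine word is what lets a single logical instruction stand in for dozens of multiply-accumulates on bit-based approximated networks.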
[PPoPP'18]
"Superneurons: dynamic GPU memory management for training deep neural networks"
Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Tim Kraska. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '18). (Acceptance rate: 20%; GPU memory is too small to keep the powerful cores fed when training large non-linear networks; SuperNeurons helps users tackle this through dynamic GPU memory management. See the sketch below this entry.)
Video
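As a rough, hypothetical illustration of the offloading idea behind dynamic GPU memory management for training (not the SuperNeurons implementation), the PyTorch sketch below stashes an intermediate tensor in host memory so its GPU copy can be freed, then fetches it back when it is needed again; the Offloaded class is made up for this example.

import torch

class Offloaded:
    """Stash a tensor in host memory so its GPU copy can be freed, then fetch it back later."""
    def __init__(self, t: torch.Tensor):
        self.device = t.device
        self.cpu_copy = t.detach().to("cpu")

    def fetch(self) -> torch.Tensor:
        return self.cpu_copy.to(self.device)

if torch.cuda.is_available():
    act = torch.randn(1024, 1024, device="cuda")  # an intermediate activation
    stash = Offloaded(act)
    del act                    # the GPU copy can now be reclaimed by the allocator
    restored = stash.fetch()   # brought back when the backward pass needs it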
[HPCA'18]
"Perception-Oriented 3D Rendering Approximation for Modern Graphics Processors."
Chenhao Xie, Xin Fu, Shuaiwen Leon Song. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). (Acceptance rate: 19.7%; Connecting architectural bit stream and its manipulation to user perception impact.)
Video
[CGO'18]
"CUDAAdvisor: LLVM-based Runtime Profiling for Modern GPUs."
Du Shen, Shuaiwen Leon Song, Ang Li, Xu Liu. In ACM International Symposium on Code Generation and Optimisation (CGO).
[ICS'18]
"Warp-Consolidation: A Novel Execution Model for GPUs."
Ang Li, Weifeng Liu, Linnan Wang, Kevin Barker, Shuaiwen Leon Song. In 32nd ACM International Conference on Supercomputing (ICS'18). (A new GPU programming model for synchronisation-critical applications).
[TACO]
"NUMA-Caffe: NUMA-Aware Deep Learning Neural Networks."
Probir Roy, Shuaiwen Leon Song, Sriram Krishnamoorthy, Dipanjan Sengupta, Xu Liu. In ACM Transactions on Architecture and Code Optimization (TACO).
[TPDS]
"Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect."
Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan R. Tallent, Kevin J. Barker. In IEEE Transactions on Parallel and Distributed Systems (TPDS).
[IISWC'18, Best Paper Finalist]
"Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite."
Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan R. Tallent, Kevin J. Barker. In 2018 IEEE International Symposium on Workload Characterisation (IISWC).
[SC'17, Best Paper Finalist, DoE Research Highlight article]
"Exploring and analysing the real impact of modern on-package memory on HPC scientific kernels."
Ang Li, Weifeng Liu, Mads Ruben Burgdorff Kristensen, Brian Vinter, Hao Wang, Kaixi Hou, Andres Marquez, Shuaiwen Leon Song. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17). (Acceptance rate: 61/327=18.6%; The first research investigating the performance impact of introducing on-package memory into the memory hierarchy across a large spectrum of fundamental HPC scientific kernels, providing a formalised modelling scheme for easy performance analysis at large scale. It was an SC'17 Best Paper Finalist and a DoE research highlight.)
[MICRO-50]
"BVF: enabling significant on-chip power savings via bit-value-favour for throughput processors."
Ang Li, Wenfeng Zhao, Shuaiwen Leon Song. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '17). (Acceptance rate: 18.6%).
[ASPLOS-17, HiPEAC paper award]
"Locality-Aware CTA Clustering for Modern GPUs."
Ang Li, Shuaiwen Leon Song, Weifeng Liu, Xu Liu, Akash Kumar, Henk Corporaal. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17). (Acceptance rate: 17.4%)
Video
[HPCA'17]
"Processing-in-Memory Enabled Graphics Processors for 3D Rendering."
Chenhao Xie, Shuaiwen Leon Song, Xin Fu. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). (Acceptance rate: 21.6%; Using processing-in-memory techniques along with a graphics software pipeline re-design to accelerate modern rendering efficiency.)
[ICS'16]
"SFU-Driven Transparent Approximation Acceleration on GPUs."
Ang Li, Shuaiwen Leon Song, Mark Wijtvliet, Akash Kumar, Henk Corporaal. In 30th ACM International Conference on Supercomputing (ICS). (An interesting concept: approximation units on modern many-core architectures, and how they can collaborate within the system to achieve dynamic performance-accuracy trade-offs.)
[HPDC'16]
"SMT-Aware Instantaneous Footprint Optimization."
Probir Roy, Xu Liu, Shuaiwen Leon Song. In 25th ACM international Symposium on High-Performance and Distributed Computing (HPDC). (Identifying, debugging and fixing false sharing on multicore processors.)
[ICS'15]
"Locality-Driven Dynamic GPU Cache Bypassing."
Chao Li, Shuaiwen Leon Song, Hongwen Dai, Albert Sidelnik, Siva Hari, Huiyang Zhou. In 29th ACM International Conference on Supercomputing (ICS).
[SC'15, Best Paper Finalist]
"GraphReduce: processing large-scale graphs on accelerator-based systems."
Dipanjan Sengupta, Shuaiwen Leon Song, Kapil Agarwal, Karsten Schwan. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15). (Acceptance rate: 18.5%)