High Performance Computing Research

FSA aims to explore the challenges and opportunities of future extreme heterogeneous high performance system designs that incorporate a diverse range of general-purpose and customised accelerators, novel memory systems, fast interconnect technologies, and new packaging schemes (e.g., chiplets) that scale beyond Moore's Law. We investigate how scientific applications, simulation and big data analytics can better interact with such complex hardware, which is designed to scale for performance, and we identify the key design challenges involved. In particular, we focus on improving the ecosystem of scale-up solutions for future complex extreme heterogeneous system architectures, targeting both traditional and emerging workloads.

Publications

Post0-VR: Enabling Universal Realistic Rendering for Modern VR via Exploiting Architectural Similarity and Data Sharing, HPCA'23
TEA: A General-Purpose Temporal Graph Random Walk Engine, EuroSys'23
T-GCN: A Sampling Based Streaming Graph Neural Network System With Hybrid Architecture, PACT'22
Bring Orders into Uncertainty: Enabling Efficient Uncertain Graph Processing via Novel Path Sampling on Multi-Accelerator Systems, ICS'22
Vapro: Light-weight Performance Variance Detection and Diagnosis for Production Run Parallel Applications, PPoPP'22
TeGraph: A Novel General-Purpose Temporal Graph Computing Engine, ICDE'22
MAPA: Multi-Accelerator Pattern Allocation Policy for Multi-Tenant GPU Servers, SC'21
Dr.Top-k: Delegate-Centric Top-k on Heterogeneous HPC Architectures, SC'21
An efficient uncertain graph processing framework for heterogeneous architectures, PPoPP'21
Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures, ICPP'21
TSM2X: High-performance tall-and-skinny matrix-matrix multiplication on GPUs, JPDC'21
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect, TPDS'20
Speeding up Collective Communications Through Inter-GPU Rerouting, Computer Architecture Letters, 2020
Energy-Efficient GPU L2 Cache Design Using Instruction-Level Data Locality Similarity, ACM Transactions on Design Automation of Electronic Systems, 2020
LoSCache: Leveraging Locality Similarity to Build Energy-Efficient GPU L2 Cache, DATE 2019
CUDAAdvisor: LLVM-based Runtime Profiling for Modern GPUs, CGO'18
Lightweight Detection of Cache Conflicts, CGO'18
Warp-Consolidation: A Novel Execution Model for GPUs, ICS'18
Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite, IISWC'18, BEST PAPER FINALIST
BVF: Enabling Significant On-Chip Power Savings via Bit-Value-Favor For Throughput Processors, MICRO'17
Exploring and Analysing the Real Impact of Modern On-Package Memory on HPC Scientific Kernels, SC'17, BEST PAPER RUNNERS-UP
Locality-Aware CTA Clustering For Modern GPUs, ASPLOS'17, 2017 HiPEAC PAPER AWARD
ScalaFSM: Enabling Scalability-Sensitive Speculative Parallelisation for FSM Computations, ICS'17
EvoGraph: On-The-Fly Efficient Mining of Evolving Graphs on GPU, ISC'17
ORION: A Framework for GPU Occupancy Tuning, Middleware'16
SFU-Driven Transparent Approximation Acceleration on GPUs, ICS'16
Tag-Split Cache for Efficient GPGPU Cache Utilisation, ICS'16
Combating the Reliability Challenge of GPU Register File at Low Supply Voltage, PACT'16
SMT-Aware Instantaneous Footprint Optimisation, HPDC'16
New-Sum: A Novel Online ABFT Scheme For General Iterative Methods, HPDC'16
X: A Comprehensive Analytic Model for Parallel Machines, IPDPS'16
Scalable Energy Efficiency with Resilience for High Performance Computing Systems: A Quantitative Methodology, HiPEAC'16
Critical Point Based Register-Concurrency Autotuning For GPUs, DATE'16
GraphReduce: Processing Large-scale Graphs on Accelerator-based Systems, SC'15, BEST PAPER RUNNERS-UP
Locality-Driven Dynamic GPU Cache Bypassing, ICS'15
Investigating the interplay between energy efficiency and resilience in high performance computing, IPDPS'15
Gregarious Data Restructuring in a Many Core Architecture, HPDC'15
Scaling Support Vector Machines On Modern HPC Platforms, JPDC'15
MIC-SVM: Designing A Highly Efficient Support Vector Machine For Advanced Modern Multi-Core and Many-Core Architectures, IPDPS'14
A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures, IPDPS'13
Iso-energy-efficiency: An approach to power-constrained parallel computation, IPDPS'11
PowerPack: Energy profiling and analysis of high performance systems and applications, TPDS'10