Rob Clucas

I'm a C++ software developer with experience optimising large-scale GPU applications


I am a final-year PhD student in the Laboratory for Scientific Computing at the University of Cambridge, supervised by Dr Philip Blakely and Professor Nikos Nikiforakis.

My research interests are primarily in distributed computing, particularly in applying multi-GPU systems to challenging problems. My recent research has involved developing Ripple, a framework which makes large-scale multi-GPU programming accessible with minimal GPU knowledge, and using it to solve difficult real-world problems in fluid and solid dynamics which require massive computational resources.

I have been using C++ and CUDA for the past 7 years, and enjoy optimising applications, both for multi-threaded CPUs and especially for GPUs. I am also interested in machine learning, both in keeping up with the latest research and in exploring existing frameworks such as PyTorch and TensorFlow to understand how they work and potentially contribute in the future.

The work I most enjoy is at the intersection of research and application, particularly involving GPUs, and I am interested in any such job opportunities.

Outside of my work, I enjoy reading, playing golf, and triathlon. I am originally from South Africa, but have been in the UK for the past 4 years.

Education


PhD in Physics

University of Cambridge

Jan 2018 - March 2021

Advisors : Dr Philip Blakely & Prof Nikos Nikiforakis
Thesis : Acceleration of Fluid and Solid Dynamics on Massively Parallel Systems

I have developed a general-purpose compute framework which allows parallel programs with flexible data layout (AoS/SoA) to be executed on many-GPU systems with minimal knowledge of distributed programming or GPUs, and without compromising performance. I have used the framework to solve large multi-material interaction problems in computational fluid and solid dynamics on many GPUs, which has not previously been done.

MPhil in Scientific Computing

University of Cambridge

Sept 2016 - Sept 2017

Advisor : Dr Philip Blakely
Thesis : Embedded Boundary Methods on Highly Parallel Architectures

I developed a GPU solution for the cut cell problem in computational fluid dynamics, which gave an order-of-magnitude improvement over a parallel CPU implementation, and identified areas where further performance improvements could be made.

BSc in Electrical and Information Engineering

University of the Witwatersrand

Jan 2012 - Nov 2015

Advisor : Prof Scott Hazelhurst
Thesis : Parahaplo: A Heterogeneous Haplotype Solver for Accelerating Graph Search with GPU Processing

In addition to the four years of coursework, for my final thesis I developed a GPU solver for the haplotype assembly problem, an NP-hard problem in computational biology. The solver was up to 50x faster than the current state-of-the-art implementations, which were all CPU-based.

*Please contact me for copies of any of the above work.

Experience


Netronome Systems

Apr 2016 - Sept 2016

Software Engineer
I rewrote in-house packet generation software to allow configurable bursty traffic patterns at up to 100 Gbps, using a multi-threaded approach with a custom memory allocator. This improved performance by up to 60% over the previous version, allowing in-house servers to be used for packet generation.

I patched an open-source packet generation engine to support Netronome’s network card drivers and in-house traffic pattern generator, allowing the performance of the network cards to be tested using open-source software.

Symmetry Electronics

Feb 2015 - March 2016

Electrical Engineer
I designed and implemented the WAB1 board, a development board which bridges Bluetooth Low Energy and cellular technologies, allowing simple proof-of-concept development for customers.
I also wrote an iOS app, software examples, and documentation for the WAB1 board to allow users to get started quickly.

Research


Submitted/Under Review

R. Clucas, P. Blakely, N. Nikiforakis, Simulation of Multiple Interacting Materials on Many GPUs with the Ghost Fluid Method, Journal of Computational Physics, 2021 [ Coming soon. ]

Published

R. Clucas, P. Blakely, N. Nikiforakis, Ripple : Simplified Large-Scale Computation on Heterogeneous Architectures with Polymorphic Data Layout, Journal of Parallel and Distributed Computing, 2021 [ arXiv:2104.08571 ]

R. Clucas, S. Levitt, CAPP: A C++ Aspect-Oriented Based Framework for Parallel Programming with OpenCL, in Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists, ser. SAICSIT '15. New York, NY, USA: ACM, 2015, pp. 10:1-10:10

Projects


The following is a selected list of projects with a brief overview of each. Click the links to see more.

Ripple

Ripple is a framework for large-scale heterogeneous compute, with a focus on multi-GPU systems. It requires minimal knowledge of parallel, distributed, or GPU programming, and allows simple C++ code to scale to large systems.

It also supports polymorphic data layout, which allows user-defined classes to be stored as SoA or AoS, significantly improving performance on the GPU.
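To give a rough idea of what a switchable data layout means, here is a minimal C++17 sketch. It is not Ripple's API: the names `AoS`, `SoA`, `Particles`, `offset`, and `sum_component` are hypothetical and exist only for illustration. The same user code runs against either layout, and only the mapping from (element, component) to memory changes.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical layout tags (illustration only, not Ripple's API).
struct AoS {};  // array of structures: x0 y0 z0 x1 y1 z1 ...
struct SoA {};  // structure of arrays: x0 x1 ... y0 y1 ... z0 z1 ...

// A container of n three-component elements whose memory layout
// is selected at compile time by the Layout tag.
template <typename Layout>
class Particles {
 public:
  static constexpr std::size_t components = 3;

  explicit Particles(std::size_t n) : size_(n), data_(components * n, 0.0f) {}

  // Index into the flat storage of component c of element i.
  std::size_t offset(std::size_t i, std::size_t c) const {
    if constexpr (std::is_same_v<Layout, SoA>) {
      return c * size_ + i;       // same component is contiguous
    } else {
      return i * components + c;  // same element is contiguous
    }
  }

  float& at(std::size_t i, std::size_t c) { return data_[offset(i, c)]; }
  std::size_t size() const { return size_; }

 private:
  std::size_t size_;
  std::vector<float> data_;
};

// User code is written once and works for both layouts.
template <typename Layout>
float sum_component(Particles<Layout>& p, std::size_t c) {
  float total = 0.0f;
  for (std::size_t i = 0; i < p.size(); ++i) {
    total += p.at(i, c);
  }
  return total;
}
```

With the SoA layout, neighbouring elements of the same component sit next to each other in memory, so consecutive GPU threads reading that component access consecutive addresses. This is the usual reason SoA tends to perform better on GPUs, while AoS can be more convenient when a whole element is processed at once.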

It scales well: real-world computational fluid dynamics problems have achieved speedups of up to 7.3x on an 8-GPU system.

Fracture

Fracture is a library for large-scale simulations of multiple interacting materials, such as fluids, gases, and solids, using finite volume methods. It is currently a work in progress, but it has already been used to correctly simulate high-impact shock waves on air and helium bubbles, as well as underwater explosions.

On multi-GPU systems, it can be used to simulate 3D interactions in domains consisting of billions of cells in real time. Support is currently being added for internal solid dynamics such as stress, strain, fracture, and void.

Flame

Flame is a C++ machine learning library built on PyTorch for fast inference with common object-detection and pose-estimation models. It is very much a work in progress, and I use it as a personal playground for implementing machine learning algorithms in order to understand them.

I intend to add models as I come across them and as time allows, and to test the benefits for inference and training time of using the C++ PyTorch interface with custom CUDA implementations.

Snowflake

Snowflake is a rendering engine built on Vulkan. I have mostly used it to learn about Vulkan and entity component systems. It is far from complete at this stage, but I hope to extend the current functionality to allow real-time rendering of the simulations from Fracture, as I have found existing solutions for rendering large-scale simulation data limiting.

Skills


The following are the programming languages, frameworks, and compilers with which I am familiar to varying degrees.

  • C++
  • CUDA
  • C
  • Python
  • Rust
  • GCC
  • MPI
  • Clang
  • NVCC
  • OpenMP
  • PyTorch
  • TensorFlow

Contact

If you'd like to get in touch, you can reach me at any of the following links.