Michael Goin @mgoin

systems engineer making inference fast

About

I've been working in ML inference since 2019 and am currently focused on making SOTA open-source LLMs run fast on various hardware accelerators in vLLM.

I like working across the stack wherever the bottleneck is - CPU or GPU, compute-bound, memory-bound, or IO-bound - using Python, PyTorch, C++, and CUDA. Most of my time goes into profiling, benchmarking, and figuring out why things are slow.

Before that, my background was in HPC, where I worked on robotics, materials science simulations, and neuromorphic computing.

I'm currently working at Red Hat on vLLM to power the open-source AI ecosystem with fast and easy inference. Before the acquisition by Red Hat, I was at Neural Magic, where I worked on vLLM and originally built a sparsity-aware inference compiler that optimized CNNs, Transformers, and other models for CPUs.

If you want to reach me, the best way is to ping me @mgoin on the vLLM Slack. I'm always happy to collaborate on projects or ideas related to inference performance!

Work

Changelog

Things I've shipped or helped ship.

Talks