Speculative decoding with block diffusion for high-quality, ultra-fast parallel drafting and up to 6x lossless speedup.
Jian Chen
I am a first-year Ph.D. student at UC San Diego, where I work on efficient machine learning with Prof. Zhijian Liu.
Previously, I completed my master's degree at CMU, where I worked with Prof. Beidi Chen on efficient long-context LLMs. I received my bachelor's degree from Zhejiang University.
Recent work
ICLR 2025
Long-context speculative decoding that breaks the latency-throughput tradeoff with sparse-KV drafting.