Fast, Vectorized Quicksort Partitioning Algorithms



I work under Dr. Dmitri Loguinov in the Texas A&M Internet Research Lab.

The primary workload of the quicksort algorithm lies in partitioning the input around a pivot. By leveraging AVX vector instructions and optimizing L1 cache utilization, we can achieve a significantly faster quicksort partition than previous implementations.

Our algorithms are designed to minimize the number of physical instructions, avoid execution port bottlenecks, eliminate branches, and maximize branch prediction accuracy.

Because we’re optimizing at a level far beyond what’s possible in C or C++, the most performance-critical partitioning core is written in x86-64 assembly using Microsoft Macro Assembler.





More Projects