Cache and Multi-Core Efficient Algorithms for High-Degree Permutations

Aug. 2020
Ryan Thomas Gibb and Steve Linton.
Undergraduate Research Assistant Scheme project.

PDF URL BibTeX

The traditional naive permutation composition algorithm is limited by memory latency and not CPU speed. Algorithms to take advantage of current memory hierarchies and multiple cores can be designed so as to be limited by memory bandwidth instead. We have implemented and benchmarked a number of such algorithms. We have found a multithreaded Rust implementation of the naive algorithm is 64% than the naive algorithm on an Intel Xeon E3-1230 v5 with 64GB RAM for permutations of degree 2^32. This can be adapted to improve the performance of other permutation operations, as well as other algorithms that access large amounts of main memory pseudo-randomly