Diviner big-number math benchmark

Revision 10
© 2015-2019 by Zack Smith. All rights reserved.


Some time ago I ran an experiment to compare the performance of two x86_64 assembly language routines that I had written to perform 256-bit division. One routine keeps its operands in x86 registers, whereas the other keeps them in memory, which in practice means the caches. I ended up expanding and generalizing the in-memory division routine to handle any bignum size. At that point I had effectively created the core routines for a benchmark that estimates CPU performance using big-number division, so I went ahead and wrote that benchmark. Diviner presently measures the performance of bignum division for dividend sizes from 256 bits up to 4 megabits.
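
To give a rough idea of what an in-memory bignum division of arbitrary size involves, here is a minimal sketch in C. It is not Diviner's assembly code, and it does not claim to use the same algorithm: it is plain binary shift-and-subtract long division, chosen because it is short and easy to check. The names bignum_divide, shl1, cmp_limbs and sub_limbs, and the layout (little-endian arrays of 64-bit limbs, with the divisor zero-padded to the dividend's length), are assumptions made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shift an n-limb number left by one bit, shifting carry_in into bit 0.
   Returns the bit shifted out of the top limb. (Hypothetical helper.) */
static unsigned shl1(uint64_t *x, size_t n, unsigned carry_in)
{
    unsigned carry = carry_in;
    for (size_t i = 0; i < n; i++) {
        unsigned out = (unsigned)(x[i] >> 63);
        x[i] = (x[i] << 1) | carry;
        carry = out;
    }
    return carry;
}

/* Compare two n-limb numbers: returns -1, 0 or 1. (Hypothetical helper.) */
static int cmp_limbs(const uint64_t *a, const uint64_t *b, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

/* a -= b over n limbs, wrapping modulo 2^(64n). (Hypothetical helper.) */
static void sub_limbs(uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t ai = a[i], bi = b[i];
        a[i] = ai - bi - borrow;
        borrow = (ai < bi) || (ai == bi && borrow);
    }
}

/* Divide an n-limb dividend by an n-limb divisor (least significant limb
   first). Writes n limbs each to quotient and remainder.
   Returns 0 on success, -1 if the divisor is zero. */
int bignum_divide(const uint64_t *dividend, const uint64_t *divisor,
                  uint64_t *quotient, uint64_t *remainder, size_t n)
{
    int nonzero = 0;
    for (size_t i = 0; i < n; i++)
        nonzero |= (divisor[i] != 0);
    if (!nonzero)
        return -1;

    memset(quotient, 0, n * sizeof *quotient);
    memset(remainder, 0, n * sizeof *remainder);

    /* Walk the dividend from its most significant bit downwards. */
    for (size_t bit = 64 * n; bit-- > 0; ) {
        unsigned in = (unsigned)((dividend[bit / 64] >> (bit % 64)) & 1);
        unsigned overflow = shl1(remainder, n, in);
        /* A bit shifted out means the true remainder exceeds n limbs,
           so it is certainly larger than the divisor. */
        if (overflow || cmp_limbs(remainder, divisor, n) >= 0) {
            sub_limbs(remainder, divisor, n);
            quotient[bit / 64] |= (uint64_t)1 << (bit % 64);
        }
    }
    return 0;
}
```

With n = 4 this covers the 256-bit case; larger sizes only change n. The cost grows with the square of the operand size, because each of the 64n dividend bits triggers a pass over n limbs.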

What does it test?

This benchmark effectively tests the CPU's arithmetic speed, the speed of the caches, and the speed of main memory.


Intel Core i5-5257U, 1866 MHz DDR3 RAM:

Dividend size           Divisor size (bits)   Divisions per second
256-bit (registers)                     128           1838024.1763
256-bit (memory)                        128            680377.3001
1024-bit (memory)                       512             46437.9947
4096-bit (memory)                      2048              2737.1900
16384-bit (memory)                     8192               163.5578
65536-bit (memory)                    32768                10.1042
262144-bit (memory)                  131072                 0.5918
1048576-bit (memory)                 524288                 0.0276
4194304-bit (memory)                2097152                 0.0012

Intel Core i5-8250U, 2400 MHz DDR4 RAM:

Dividend size           Divisor size (bits)   Divisions per second
256-bit (registers)                     128           2249776.3646
256-bit (memory)                        128            984890.8786
1024-bit (memory)                       512             55738.5356
4096-bit (memory)                      2048              3153.5974
16384-bit (memory)                     8192               190.3464
65536-bit (memory)                    32768                11.7879
262144-bit (memory)                  131072                 0.7120
1048576-bit (memory)                 524288                 0.0417
4194304-bit (memory)                2097152                 0.0024
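
For readers who want to produce a comparable "divisions per second" figure for their own routine, the following C sketch shows one simple way to do it. How Diviner itself takes its measurements is not described here, so this is only an assumed approach: call the operation repeatedly until a minimum amount of CPU time has passed, then divide the call count by the elapsed time. The names ops_per_second, workload_fn and count_call are hypothetical.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one call performs one operation,
   for example one bignum division on operands reachable through ctx. */
typedef void (*workload_fn)(void *ctx);

/* Trivial example workload: counts how many times it was called. */
void count_call(void *ctx)
{
    (*(unsigned long *)ctx)++;
}

/* Call fn(ctx) repeatedly for at least min_seconds of CPU time and
   return the number of calls per second. Returns 0.0 if min_seconds
   is not positive or the processor clock is unavailable.
   Note: clock() is read after every call, which adds overhead that
   matters for very short operations. */
double ops_per_second(workload_fn fn, void *ctx, double min_seconds)
{
    if (min_seconds <= 0.0)
        return 0.0;

    clock_t start = clock();
    if (start == (clock_t)-1)
        return 0.0;

    size_t count = 0;
    double elapsed;
    do {
        fn(ctx);
        count++;
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    } while (elapsed < min_seconds);

    return (double)count / elapsed;
}
```

For the largest sizes in the tables a single division takes several minutes, so a harness like this would perform just one call and report a rate well below 1.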




  • Diviner requires the Unix 64-bit ABI, so it does not run on Windows.
  • The assembly code uses loop unrolling and can be time-consuming to assemble.