
    Why 64-Bit Processors Are Better

    Revision 14
    © Zack Smith
    All rights reserved.


    The issue comes up from time to time of why 64-bit processors are better than 32-bit CPUs, and whether 64 bits specifically are necessary or even desirable. These questions have been revived by the introduction of mobile devices such as the iPhone 5s and iPad Air, which use the 64-bit ARMv8 architecture in the form of Apple's A7 processor.

    Here is my list of reasons why 64-bit processors are preferable.

    Reason 0: Memory bandwidth

    First, let us compare, side by side, the memory bandwidth of the 32-bit iPhone 5c and the 64-bit iPhone 5s:


    The difference is clear. Without any indication as to which phone is which, the numbers speak for themselves.

    Width of the memory bus

    64-bit computers usually have wider data paths to memory, meaning that typically twice as many bytes can be moved to and from RAM per clock cycle. This is not always the case, but it usually is. Therefore:
    • Programs that need to move lots of data are faster.
    • Caches load faster so all programs spend less time waiting for their data and instructions.

    Transfers using wider registers

    You don't necessarily need a 64-bit CPU to perform fast memory copies or writes, so long as your CPU has vector registers. Even so, my tests have shown that 64-bit CPUs usually improve upon the speed of vector-register copies and writes.

    Reason 1: Computer security

    In order to fend off computer hacking attempts, a larger virtual address space is preferable, even if the amount of RAM inside a device such as a phone remains small. This is because modern operating systems, including those found in phones like iOS, use ASLR: Address Space Layout Randomization. Many exploits rely on being able to locate vulnerable software and data within a computer's address space. ASLR is a response to this. It places data and software at random locations that cannot be as easily found. The bigger the address space, the more randomness can be applied and the harder it is for malware to guess where things are.

    A counterargument is that ASLR is less important now that we have No-Execute (NX) bits for virtual memory pages. However, NX has proven not to be the panacea that was initially hoped for, and not all processors support an NX bit.

    Reason 2: Faster vector operations

    64-bit CPUs sometimes, but not always, have wider vector registers and/or more vector registers than do 32-bit processors. Vector registers are used to perform SIMD (meaning single-instruction, multiple-data) operations, including:
    • Matrix math for 3D graphics
    • Digital signal processing for audio
    • Video decoding and encoding
    • Cryptography
    SIMD is all about loading the registers with useful data as fast as possible, operating on them as fast as possible, and storing any results quickly. Therefore, the more registers available for SIMD the better, especially if memory bandwidth has improved to support the increased vector-register load-store traffic.

    In the case of the 64-bit ARM (AArch64) there are twice as many 128-bit vector registers as in 32-bit ARM.

    Reason 3: Larger transistor budgets

    64-bit CPUs often have substantially more transistors than do 32-bit CPUs, and the rationale is straightforward: if a processor is going to be given a larger number of transistors, more functional units may as well be added to perform arithmetic and similar operations. The more functional units a modern, superscalar processor has, the faster it can run your software.

    Larger transistor budgets may also explain why we see SHA256 instructions and AES support appearing in many newer processors, like the 64-bit ARM. The more transistors you have to spend, the more luxuries you can add.

    Reason 4: New instruction set architecture (ISA)

    The 64-bit Intel and AMD processors have had to maintain backward compatibility, supporting a convoluted, variable-length instruction encoding scheme.

    In contrast, 64-bit ARM CPUs utilize an instruction format that is different from the older 32-bit and 16-bit ARM instruction formats. (ARM64 can also execute 32-bit ARM code.) The new instructions are 32 bits in length and enjoy even better performance, if you believe the marketing materials. Indeed, my (synthetic) iOS benchmarks, embodied in iBenchmark, show that the iPhone 5s's A7 is considerably faster than the same-clocked A6 in the iPhone 5c.

    It is unfortunate that Intel and AMD did not have the will to introduce a new ISA for their 64-bit processors, or at least a fixed-length instruction encoding scheme. AMD led the charge into 64 bits after Intel dragged its feet, but AMD did not really innovate when doing so. The result is that a great many transistors are needed to simply decode variable-length x86_64 instructions, leading to less efficient use of power, more reliance on microcode and higher chip costs, compared to 64-bit ARM chips.

    Intel CPUs actually translate instructions into an intermediate format that the microcode then executes, and which is buffered. In short, bad design decisions earlier on with x86 led to worse ones down the line.

    In contrast, the ARM architecture was originally designed to avoid the use of microcode, which has led to greater power efficiency and smaller chips.

    Reason 5: More registers

    Typically 64-bit processors have twice the number of registers as their 32-bit counterparts. This is true of both the Intel and AMD processors and the ARM processors. Having twice as many registers means the CPU can run common tasks faster, because data does not have to be put on the stack or elsewhere in memory.

    Reason 6: Parameters in registers

    64-bit calling conventions place the first few parameters of a function call in registers rather than on the stack.

    There is a security advantage to placing function parameters in registers rather than on the stack, as 32-bit calling conventions typically require.

    Why is this advantageous? It's simple: Anything that is on the stack can be overwritten using a buffer overflow attack.

    While return addresses are typically the primary target of attacks, parameters can be too.

    In one type of hacking exploit, called return-to-libc, the attacker uses a buffer overflow both to overwrite the current return address and to supply his own parameter. For instance, the return address might be changed to that of the libc function int system(const char*). When the CPU arrives at system, that function looks on the corrupted stack for its first parameter: a pointer to a string (also placed on the corrupted stack) giving a command that enables remote access.

    © Zack Smith