
    Why 64-Bit Processors Are Better

    Revision 13
    © by Zack Smith
    All rights reserved.

    Introduction

    The issue comes up from time to time of why 64-bit processors are better than 32-bit CPUs, and whether 64 bits specifically is necessary or even desirable. These questions have been revived by the introduction of mobile devices like the iPhone 5s, which uses the 64-bit ARMv8 architecture in the form of Apple's A7.

    Here is my list of reasons why 64-bit processors are preferable.

    Reason 0: Memory bandwidth

    First let us compare, side by side, the 32-bit iPhone 5c and the 64-bit iPhone 5s:

    [Memory bandwidth comparison table not reproduced here.]

    The difference is clear. Without any indication as to which phone is which, the numbers speak for themselves.

    Width of the memory bus

    64-bit computers usually, though not always, have wider data paths to memory, meaning that typically twice as many bytes can be moved to and from RAM per clock cycle. This may explain why, in my memory-bandwidth benchmarking of the iPhone 5s (see my app iBenchmark), in which I use my 32-bit ARM assembly-language routines, I have found that the 5s exhibits substantially faster memory performance than the same-clock-speed 32-bit iPhone 5c.

    Transfers using wider registers

    My tests of the memory bandwidth of Intel-based computers (here) have shown that even on 32-bit computers, the vector registers, which are usually 64 or 128 bits wide, can be used to speed up memory transfers. Moving data into and out of 64- and 128-bit registers is often faster than moving it through a 32-bit register, because most processors connect to RAM through a bus that is at least 64 bits wide.

    While you don't necessarily need a 64-bit CPU to perform fast memory copies or writes, so long as your CPU has vector registers, my tests have shown that 64-bit CPUs usually improve on the speed of vector-register copies and writes.

    Reason 1: Computer security

    In order to fend off computer hacking attempts, a larger virtual address space is preferable, even if the amount of RAM inside a device such as a phone remains small. This is because modern operating systems, including phone operating systems like iOS, use ASLR (Address Space Layout Randomization). Many exploits rely on being able to locate vulnerable software and data within a computer's address space; ASLR is a response to this. It places data and software at random locations that cannot be as easily found. The bigger the address space, the more randomness can be applied, and the harder it is for malware to guess where things are.

    A counterargument would be that ASLR is no longer so important now that we have No-Execute (NX) bits for virtual memory pages. However, NX has proven not to be the panacea that was initially hoped for, and not all processors support an NX bit.

    Reason 2: Faster vector operations

    64-bit CPUs sometimes, but not always, have wider vector registers and/or more vector registers than do 32-bit processors. Vector registers are used to perform SIMD (meaning single-instruction, multiple-data) operations, including:
    • Matrix math for 3D graphics
    • Digital signal processing for audio
    • Video decoding and encoding
    • Cryptography
    SIMD is all about loading the registers with useful data as fast as possible, operating on them as fast as possible, and storing any results quickly. Therefore, the more registers available for SIMD, the better, especially if memory bandwidth has improved enough to support the increased vector-register load/store traffic.

    In the case of 64-bit ARM (AArch64), there are twice as many 128-bit vector registers as in 32-bit ARM: 32 rather than 16.

    Reason 3: Larger transistor budgets

    64-bit CPUs often have substantially more transistors than 32-bit CPUs, and the rationale is straightforward: if a processor is going to be given a larger transistor budget, more functional units may as well be added to perform arithmetic and similar operations. The more functional units a modern, superscalar processor has, the faster it can run your software.

    Larger transistor budgets may also explain why we see SHA-256 and AES instructions appearing in many newer processors, like the 64-bit ARM. The more transistors you have to spend, the more luxuries you can add.

    Reason 4: New instruction set architecture

    The 64-bit Intel and AMD processors have had to maintain backward compatibility, supporting a convoluted, variable-length instruction encoding scheme.

    In contrast, 64-bit ARM CPUs use an instruction format that is different from the older 32-bit and 16-bit ARM instruction formats. (ARM64 can also execute 32-bit ARM code.) The new instructions are 32 bits in length, and enjoy even better performance, if you believe the marketing materials. In fact, my (synthetic) benchmarks for iOS embodied in iBenchmark show that the iPhone 5s's A7 is considerably faster than the same-speed A6 in the iPhone 5c.

    It is unfortunate that Intel and AMD did not have the will to introduce a new ISA for their 64-bit processors, or at least a fixed-length instruction encoding scheme. AMD led the charge into 64 bits after Intel dragged its feet, but AMD did not really innovate in doing so. The result is that a great many transistors are needed simply to decode variable-length x86_64 instructions, leading to less efficient use of power, more reliance on microcode, and higher chip costs compared to 64-bit ARM chips.

    Intel CPUs actually translate x86 instructions into an internal micro-operation format, which is buffered and then executed. In short, bad design decisions early on with x86 led to worse ones down the line.

    In contrast, the ARM architecture was originally designed to permit avoidance of the use of microcode, which has led to greater power efficiency and smaller chips.

    Reason 5: More registers

    Typically, 64-bit processors have twice as many registers as their 32-bit counterparts. This is true of both the Intel and AMD processors (16 general-purpose registers in x86_64 versus 8 in 32-bit x86) and the ARM processors. Having twice as many registers means the CPU can run common tasks faster, because data does not have to be spilled to the stack or elsewhere in memory.

    Reason 6: Parameters in registers

    There is a security advantage to be had from placing function parameters in registers rather than on the stack, where 32-bit calling conventions typically place them.

    64-bit calling conventions place the first few parameters of a function call in registers rather than on the stack.

    Why is this advantageous? It's simple: Anything that is on the stack can be overwritten using a buffer overflow attack.

    Return addresses generally go on the stack (unless you have a rare SPARC CPU, with its register windows), but keeping parameters in registers somewhat limits the types of attacks that can be carried out, because the parameters cannot be corrupted by overwriting the stack.

    In one common type of hacking exploit, called return-to-libc, the attacker uses a buffer overrun both to overwrite the current return address and to supply his own parameter(s). The return address is changed to that of a libc function such as int system(const char*), and the parameter is changed to point to something useful, such as a command that grants remote access.

    Because 64-bit systems place the first few parameters in registers rather than on the stack, the important parameters cannot be overwritten using a buffer overflow.




    © Zack Smith