This chapter attempts to cover some of the common issues involved when writing 64-bit code, to run under Win64 or Unix. It covers how to write assembly code to interface with 64-bit C routines, and how to write position-independent code for shared libraries.
All 64-bit code uses a flat memory model, since segmentation is not available in 64-bit mode. The one exception is the FS and GS registers, whose segment base addresses are still added to effective addresses.
Position independence in 64-bit mode is significantly simpler, since the processor supports RIP-relative addressing directly; see the REL keyword (section 3.3). On most 64-bit platforms, it is probably desirable to make that the default, using the directive DEFAULT REL (section 7.2).
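Here is a minimal sketch of what DEFAULT REL does. It assumes an ELF64 target assembled with nasm -f elf64, and the names counter and bump_counter are hypothetical. The routine increments a private counter. Both memory operands are encoded relative to RIP, so no absolute address appears in the instruction stream.

```nasm
; Sketch, assuming an ELF64 target (nasm -f elf64).
; counter and bump_counter are hypothetical names.
        default rel             ; plain [symbol] operands now mean [rel symbol]

        section .data
counter dd      0

        section .text
        global  bump_counter
bump_counter:                   ; int bump_counter(void): returns the new count
        mov     eax, [counter]  ; RIP-relative load: no absolute address encoded
        inc     eax
        mov     [counter], eax  ; RIP-relative store
        ret
```

The same source works in a shared library or a position-independent executable, because the distance from the code to counter is fixed at link time.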
64-bit programming is relatively similar to 32-bit programming, but of course pointers are 64 bits long; additionally, all existing platforms pass the first few arguments in registers rather than on the stack. Furthermore, 64-bit platforms use SSE2 by default for floating point. Please see the ABI documentation for your platform.
64-bit platforms differ in the sizes of the C/C++ fundamental datatypes, not just from 32-bit platforms but from each other. If a data type of a specific size is needed, it is probably best to use the types defined in the standard C header <inttypes.h>.
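For example, a C structure built from the fixed-size types in <inttypes.h> can be mirrored with NASM's STRUC. This sketch assumes the hypothetical declaration struct rec { int32_t id; int64_t value; }; together with the usual x86-64 rule that int64_t is 8-byte aligned, which places value at offset 8. The accessor rec_value is also hypothetical and assumes the System V ABI (pointer argument in RDI).

```nasm
; Mirrors the hypothetical C declaration
;     struct rec { int32_t id; int64_t value; };
        struc   rec
.id:    resd    1               ; int32_t at offset 0
        resd    1               ; 4 bytes of padding: int64_t is 8-byte aligned
.value: resq    1               ; int64_t at offset 8
        endstruc                ; also defines rec_size (16)

        bits    64
        section .text
        global  rec_value
rec_value:                      ; int64_t rec_value(const struct rec *r)
        mov     rax, [rdi + rec.value]  ; load the 64-bit field
        ret
```

Because the layout is expressed with fixed-size types, the same offsets hold under both the Unix and the Win64 data models, even though long differs between them.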
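All known 64-bit platforms except some embedded platforms require the stack to be 16-byte aligned at every CALL instruction. Because CALL pushes an 8-byte return address, the stack pointer (RSP) is an odd multiple of 8 on entry to the called function. A function that calls other functions must therefore lower RSP by an odd multiple of 8 in total (for example, with a single PUSH, or SUB RSP,8) before making calls of its own.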
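The following sketch shows this adjustment in practice. It assumes the System V ABI on ELF64 and a program linked against the C library; say_hello and msg are hypothetical names. On entry RSP is an odd multiple of 8, so subtracting 8 makes it a multiple of 16 before the call to puts.

```nasm
; Sketch, assuming the System V ABI on ELF64 (nasm -f elf64), linked with libc.
; say_hello and msg are hypothetical names.
        default rel
        extern  puts            ; int puts(const char *s);

        section .rodata
msg     db      "hello, world", 0

        section .text
        global  say_hello
say_hello:                      ; void say_hello(void)
        ; On entry RSP is 8 mod 16: the caller's CALL just pushed the return address.
        sub     rsp, 8          ; RSP is now a multiple of 16
        lea     rdi, [msg]      ; first integer/pointer argument, RIP-relative
        call    puts wrt ..plt  ; stack is 16-byte aligned at this CALL
        add     rsp, 8          ; undo the adjustment before returning
        ret
```

A PUSH of a callee-saved register would serve equally well in place of SUB RSP,8, and is the natural choice when the function needs that register anyway.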
In 64-bit mode, the default operand size is still 32 bits. When a value is written to a 32-bit register (but not to an 8- or 16-bit register), the upper 32 bits of the corresponding 64-bit register are set to zero.
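The register-width rule can be seen in a few instructions. The values in the comments follow directly from the rule above. This is a fragment to read, not a complete program.

```nasm
        bits    64
        mov     rax, -1         ; RAX = 0xFFFFFFFFFFFFFFFF
        mov     eax, 1          ; RAX = 0x0000000000000001 (32-bit write clears bits 63..32)

        mov     rax, -1
        mov     ax, 1           ; RAX = 0xFFFFFFFFFFFF0001 (16-bit write keeps upper bits)

        mov     rax, -1
        mov     al, 1           ; RAX = 0xFFFFFFFFFFFFFF01 (8-bit write keeps upper bits)

        xor     eax, eax        ; RAX = 0: common idiom, one byte shorter than xor rax, rax
```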
NASM uses the following names for general-purpose registers in 64-bit mode, for 8-, 16-, 32- and 64-bit references, respectively:
AL/AH, CL/CH, DL/DH, BL/BH, SPL, BPL, SIL, DIL, R8B-R15B
AX, CX, DX, BX, SP, BP, SI, DI, R8W-R15W
EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, R8D-R15D
RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8-R15
This is consistent with the AMD documentation and most other assemblers. The Intel documentation, however, uses the names R8L-R15L for 8-bit references to the higher registers. It is possible to use those names by defining them as macros; similarly, if one wants to use numeric names for the low 8 registers, define them as macros. The standard macro package altreg (see section 6.1) can be used for this purpose.
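Here is a minimal sketch using altreg. The particular instructions are arbitrary examples. If you only need a single name, an ordinary macro such as %define r8l r8b would do the same job.

```nasm
        bits    64
        %use    altreg          ; standard package of alternate register names

        mov     r8l, 5          ; Intel-style name, same as: mov r8b, 5
        mov     r0, r1          ; numeric names, same as: mov rax, rcx
```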
In 64-bit mode, immediates and displacements are generally only 32 bits wide. NASM will therefore truncate most displacements and immediates to 32 bits.
The only instruction which takes a full 64-bit immediate is:
MOV reg64,imm64
NASM will produce this instruction whenever the programmer uses MOV with an immediate into a 64-bit register. If this is not desirable, simply specify the equivalent 32-bit register, which will be automatically zero-extended by the processor, or specify the immediate as DWORD:
      mov rax,foo             ; 64-bit immediate
      mov rax,qword foo       ; (identical)
      mov eax,foo             ; 32-bit immediate, zero-extended
      mov rax,dword foo       ; 32-bit immediate, sign-extended
The first two are the same 10-byte instruction; the last two are 5 and 7 bytes long, respectively.
If optimization is enabled and NASM can determine at assembly time that a shorter instruction will suffice, the shorter instruction will be emitted unless of course STRICT QWORD or STRICT DWORD is specified (see section 3.7):
      mov rax,1               ; Assembles as "mov eax,1" (5 bytes)
      mov rax,strict qword 1  ; Full 10-byte instruction
      mov rax,strict dword 1  ; 7-byte instruction
      mov rax,symbol          ; 10 bytes, not known at assembly time
      lea rax,[rel symbol]    ; 7 bytes, usually preferred by the ABI
Note that lea rax,[rel symbol] is position-independent, whereas mov rax,symbol is not. Most ABIs prefer or even require position-independent code in 64-bit mode. However, the MOV instruction is able to reference a symbol anywhere in the 64-bit address space, whereas LEA is only able to access a symbol within 2 GB of the instruction itself (see below).
The only instructions that take a full 64-bit displacement are loads and stores, using MOV, of AL, AX, EAX or RAX (but no other registers) from or to an absolute 64-bit address. Since this is a relatively rarely used instruction (64-bit code generally uses relative addressing), the programmer has to explicitly declare the displacement size as ABS QWORD:
      default abs

      mov eax,[foo]           ; 32-bit absolute disp, sign-extended
      mov eax,[a32 foo]       ; 32-bit absolute disp, zero-extended
      mov eax,[qword foo]     ; 64-bit absolute disp

      default rel

      mov eax,[foo]           ; 32-bit relative disp
      mov eax,[a32 foo]       ; ditto, address truncated to 32 bits(!)
      mov eax,[qword foo]     ; error
      mov eax,[abs qword foo] ; 64-bit absolute disp
A sign-extended absolute displacement can access from –2 GB to +2 GB; a zero-extended absolute displacement can access from 0 to 4 GB.
On Unix, the 64-bit ABI as well as the x32 ABI (32-bit ABI with the CPU in 64-bit mode) is defined by the documents at:
Although written for AT&T-syntax assembly, the concepts apply equally well for NASM-style assembly. What follows is a simplified summary.
The first six integer arguments (from the left) are passed in RDI, RSI, RDX, RCX, R8, and R9, in that order. Additional integer arguments are passed on the stack. These registers, plus RAX, R10 and R11, are destroyed by function calls, and thus are available for use by the function without saving.
Integer return values are passed in RAX and RDX, in that order.
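Here is a sketch of the integer convention. The function names and prototypes are hypothetical. The first routine uses only register arguments, while the second also reads a seventh argument from the stack. On entry [RSP] holds the return address, so the first stack argument is at [RSP+8].

```nasm
; Sketch for the System V x86-64 ABI (nasm -f elf64). Hypothetical prototypes:
;     long add3(long a, long b, long c);
;     long sum7(long a, long b, long c, long d, long e, long f, long g);
        bits    64
        section .text

        global  add3
add3:
        lea     rax, [rdi + rsi]  ; a (RDI) + b (RSI)
        add     rax, rdx          ; + c (RDX); result returned in RAX
        ret

        global  sum7
sum7:
        lea     rax, [rdi + rsi]  ; a + b
        add     rax, rdx          ; + c
        add     rax, rcx          ; + d
        add     rax, r8           ; + e
        add     rax, r9           ; + f
        add     rax, [rsp + 8]    ; + g: first stack argument, just above the return address
        ret
```

Both routines touch only registers that calls are allowed to destroy, so they need no prologue or epilogue.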
Floating point is done using SSE registers, except for long double, which is 80 bits (TWORD) on most platforms (Android is one exception; there long double is 64 bits and treated the same as double). Floating-point arguments are passed in XMM0 to XMM7; return values are passed in XMM0 and XMM1. long double arguments are passed on the stack, and returned in ST0 and ST1.
All SSE and x87 registers are destroyed by function calls.
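Here is a sketch of the floating-point convention, again with hypothetical function names. sum_squares takes and returns double values in XMM registers. ld_add takes two long double values on the stack and returns its result in ST0. The stack offsets assume a platform where long double is 80 bits and each such argument occupies its own 16-byte aligned stack slot, as the System V ABI specifies for arguments passed in memory.

```nasm
; Sketch for the System V x86-64 ABI. Hypothetical prototypes:
;     double sum_squares(double x, double y);          /* x*x + y*y */
;     long double ld_add(long double a, long double b);
        bits    64
        section .text

        global  sum_squares
sum_squares:
        mulsd   xmm0, xmm0        ; x*x (x arrives in XMM0)
        mulsd   xmm1, xmm1        ; y*y (y arrives in XMM1)
        addsd   xmm0, xmm1        ; result returned in XMM0
        ret

        global  ld_add
ld_add:
        fld     tword [rsp + 8]   ; push a (first 16-byte stack slot)
        fld     tword [rsp + 24]  ; push b (second 16-byte stack slot)
        faddp   st1, st0          ; ST0 = a + b after the pop; returned in ST0
        ret
```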
On 64-bit Unix, long is 64 bits.
Integer and SSE register arguments are counted separately, so for the case of

      void foo(long a, double b, int c)

a is passed in RDI, b in XMM0, and c in ESI.
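Here is a sketch of the separate counting rule. It uses a hypothetical function mix with the same argument types as foo, which returns a*b + c. Because c is an int, only ESI is read; the upper half of RSI is not guaranteed to hold anything meaningful.

```nasm
; Sketch for the System V x86-64 ABI. Hypothetical prototype:
;     double mix(long a, double b, int c);   /* returns a*b + c */
; a -> RDI (1st integer), b -> XMM0 (1st SSE), c -> ESI (2nd integer)
        bits    64
        section .text
        global  mix
mix:
        cvtsi2sd xmm1, rdi        ; (double)a
        mulsd    xmm1, xmm0       ; a*b
        cvtsi2sd xmm0, esi        ; (double)c, from the 32-bit register only
        addsd    xmm0, xmm1       ; a*b + c, returned in XMM0
        ret
```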
The Win64 ABI is described by the document at:
What follows is a simplified summary.
The first four integer arguments are passed in RCX, RDX, R8 and R9, in that order. Additional integer arguments are passed on the stack. These registers, plus RAX, R10 and R11, are destroyed by function calls, and thus are available for use by the function without saving.
Integer return values are passed in RAX only.
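Here is a sketch of the Win64 integer convention, using hypothetical names. Note the stack offset of the fifth argument. The caller always reserves 32 bytes of shadow space above the return address, one slot for each of the four register arguments, so on entry the fifth argument is at [RSP+40].

```nasm
; Sketch for the Win64 ABI (nasm -f win64). Hypothetical prototypes:
;     long long add3(long long a, long long b, long long c);
;     long long sum5(long long a, long long b, long long c,
;                    long long d, long long e);
        bits    64
        section .text

        global  add3
add3:
        lea     rax, [rcx + rdx]  ; a (RCX) + b (RDX)
        add     rax, r8           ; + c (R8); result returned in RAX
        ret

        global  sum5
sum5:
        lea     rax, [rcx + rdx]  ; a + b
        add     rax, r8           ; + c
        add     rax, r9           ; + d
        add     rax, [rsp + 40]   ; + e: return address (8) + shadow space (32)
        ret
```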
Floating point is done using SSE registers, except for long double. Floating-point arguments are passed in XMM0 to XMM3; return is XMM0 only.
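Here is a sketch of a Win64 function with double arguments that calls the C library's sqrt; norm2 is a hypothetical name. Before the call, it reserves the 32 bytes of shadow space the callee is entitled to, plus 8 bytes to bring RSP back to a multiple of 16. For brevity it omits the unwind information that production Win64 code should provide for non-leaf functions.

```nasm
; Sketch for the Win64 ABI (nasm -f win64), linked with the C runtime.
; Hypothetical prototype: double norm2(double x, double y); /* sqrt(x*x + y*y) */
        bits    64
        extern  sqrt              ; double sqrt(double);
        section .text
        global  norm2
norm2:
        sub     rsp, 40           ; 32 bytes shadow space + 8 to realign RSP to 16
        mulsd   xmm0, xmm0        ; x*x (x arrives in XMM0)
        mulsd   xmm1, xmm1        ; y*y (y arrives in XMM1)
        addsd   xmm0, xmm1        ; sqrt's argument goes in XMM0
        call    sqrt              ; result comes back in XMM0
        add     rsp, 40
        ret                       ; return value is already in XMM0
```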
On Win64, long is 32 bits; long long or _int64 is 64 bits.
Integer and SSE register arguments are counted together, so for the case of

      void foo(long long a, double b, int c)

a is passed in RCX, b in XMM1, and c in R8D.
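Here is a sketch of the combined counting rule. It uses a hypothetical function mix with the same argument types as foo, which returns a*b + c. Each argument takes the register for its position: a arrives in RCX, b in XMM1, and c in R8D.

```nasm
; Sketch for the Win64 ABI. Hypothetical prototype:
;     double mix(long long a, double b, int c);   /* returns a*b + c */
; a -> RCX (position 1), b -> XMM1 (position 2), c -> R8D (position 3)
        bits    64
        section .text
        global  mix
mix:
        cvtsi2sd xmm0, rcx        ; (double)a
        mulsd    xmm0, xmm1       ; a*b
        cvtsi2sd xmm1, r8d        ; (double)c, from the 32-bit register only
        addsd    xmm0, xmm1       ; a*b + c, returned in XMM0
        ret
```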