ARM64 Procedure Call Standard
- 1 Machine Registers
- 1.1 General-purpose Registers
- 1.2 SIMD and Floating-Point Registers
- 2 Processes, Memory and the Stack
- 2.1 Memory Addresses
- 2.2 The Stack
- 2.2.1 Universal stack constraints
- 2.2.2 Stack constraints at a public interface
- 2.3 The Frame Pointer
- 3 Subroutine Calls
- 3.1 Use of IP0 and IP1 by the linker
- 4 Parameter Passing
- 4.1 Variadic Subroutines
- 4.2 Parameter Passing Rules
- 5 Result Return
- 6 Interworking
The base standard defines a machine-level calling standard for the A64 instruction set. It assumes the availability of the vector registers for passing floating-point and SIMD arguments. Application code is expected to conform to one of its two defined major variants (SVR4-like or Windows-like).
Note: SVR4 stands for System V Release 4, a variant of the Unix operating system. Although this specification refers to SVR4, many other operating systems, such as Linux or BSD, use similar rules.
1 Machine Registers
The ARM 64-bit architecture defines two mandatory register banks: a general-purpose register bank, which can be used for scalar integer processing and pointer arithmetic, and a SIMD and Floating-Point register bank.
1.1 General-purpose Registers
There are thirty-one 64-bit general-purpose (integer) registers visible to the A64 instruction set; these are labeled r0-r30. In a 64-bit context these registers are normally referred to using the names x0-x30; in a 32-bit context the registers are specified by using w0-w30. Additionally, a stack-pointer register, SP, can be used with a restricted number of instructions. Register names may appear in assembly language in either upper case or lower case. In this specification upper case is used when the register has a fixed role in this procedure call standard. Table 2 (General purpose registers and AAPCS64 usage) summarizes the uses of the general-purpose registers in this standard. In addition to the general-purpose registers there is one status register (NZCV) that may be set and read by conforming code.
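The relationship between the x and w names can be sketched in portable C. This is only a model: the `gp_reg` type and both helper names are invented for illustration, and the zeroing of the upper half on a w write is A64 architectural behavior rather than something stated in the text above.

```c
#include <stdint.h>

/* Model of one general-purpose register rN (hypothetical type name). */
typedef struct { uint64_t bits; } gp_reg;

/* Reading wN yields the low 32 bits of the 64-bit register. */
uint32_t read_w(gp_reg r) {
    return (uint32_t)r.bits;
}

/* Writing wN stores the value in bits 31:0 and zeroes bits 63:32,
   so the previous contents of the register are discarded. */
gp_reg write_w(uint32_t value) {
    gp_reg r = { (uint64_t)value };
    return r;
}
```

Both names refer to the same physical register, so a 32-bit write is not a way to keep the old upper half alive.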
The first eight registers, r0-r7, are used to pass argument values into a subroutine and to return result values from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).
Registers r16 (IP0) and r17 (IP1) may be used by a linker as scratch registers between a routine and any subroutine it calls (for details, see §3.1, Use of IP0 and IP1 by the linker). They can also be used within a routine to hold intermediate values between subroutine calls.
The role of register r18 is platform specific. If a platform ABI needs a dedicated general-purpose register to carry inter-procedural state (for example, the thread context), then it should use this register for that purpose. If the platform ABI has no such requirement, then it should use r18 as an additional temporary register. The platform ABI specification must document the usage of this register.
Note: Software developers creating platform-independent code are advised to avoid using r18 if at all possible. Most compilers provide a mechanism to prevent specific registers from being used for general allocation; portable hand-coded assembler should avoid it entirely. It should not be assumed that treating the register as callee-saved will be sufficient to satisfy the requirements of the platform. Virtualization code must, of course, treat the register as it would any other resource provided to the virtual machine.
A subroutine invocation must preserve the contents of the registers r19-r29 and SP.
In all variants of the procedure call standard, registers r16, r17, r29 and r30 have special roles. In these roles they are labeled IP0, IP1, FP and LR when being used for holding addresses (that is, the special name implies accessing the register as a 64-bit entity).
Note: The special register names (IP0, IP1, FP and LR) should be used only in the context in which they are special. It is recommended that disassemblers always use the architectural names for the registers.
The NZCV register is a global condition flag register with the following properties:
- The N, Z, C and V flags are undefined on entry to and return from a public interface.
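The register roles described in this section can be summarized as a small lookup, shown here as a hedged sketch. The enum and function names are hypothetical. The roles of r8 (indirect result, see section 5) and r9-r15 (temporaries) are taken from the full AAPCS64 rather than from the paragraphs above.

```c
#include <assert.h>

typedef enum {
    ROLE_ARG_RESULT,       /* r0-r7: arguments and results */
    ROLE_INDIRECT_RESULT,  /* r8: address of a memory result (section 5) */
    ROLE_TEMPORARY,        /* r9-r15: assumption from the full AAPCS64 */
    ROLE_INTRA_CALL,       /* r16, r17: IP0, IP1 */
    ROLE_PLATFORM,         /* r18: platform specific */
    ROLE_CALLEE_SAVED,     /* r19-r28 */
    ROLE_FRAME_POINTER,    /* r29: FP */
    ROLE_LINK_REGISTER     /* r30: LR */
} reg_role;

/* Map a general-purpose register number (0-30) to its role. */
reg_role role_of(int r) {
    assert(r >= 0 && r <= 30);
    if (r <= 7)  return ROLE_ARG_RESULT;
    if (r == 8)  return ROLE_INDIRECT_RESULT;
    if (r <= 15) return ROLE_TEMPORARY;
    if (r <= 17) return ROLE_INTRA_CALL;
    if (r == 18) return ROLE_PLATFORM;
    if (r <= 28) return ROLE_CALLEE_SAVED;
    if (r == 29) return ROLE_FRAME_POINTER;
    return ROLE_LINK_REGISTER;
}

/* A subroutine invocation must preserve r19-r29 (and SP, not modeled here). */
int callee_must_preserve(int r) {
    return r >= 19 && r <= 29;
}
```

Note that r18 is deliberately not reported as preserved: as the note above says, treating it as callee-saved may not satisfy a given platform.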
1.2 SIMD and Floating-Point Registers
The ARM 64-bit architecture also has a further thirty-two registers, v0-v31, which can be used by SIMD and Floating-Point operations. The name used for a register changes to indicate the size of the access.
Note: Unlike in AArch32, in AArch64 the 128-bit and 64-bit views of a SIMD and Floating-Point register do not overlap multiple registers in a narrower view, so q1, d1 and s1 all refer to the same entry in the register bank.
The first eight registers, v0-v7, are used to pass argument values into a subroutine and to return result values from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).
Registers v8-v15 must be preserved by a callee across subroutine calls; the remaining registers (v0-v7, v16-v31) do not need to be preserved (or should be preserved by the caller). Additionally, only the bottom 64 bits of each value stored in v8-v15 need to be preserved; it is the responsibility of the caller to preserve larger values.
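The preservation rule for v8-v15 is easy to get wrong, so here is a minimal model of it. The `vreg` type and the function names are illustrative assumptions; the check only encodes what the paragraph above states.

```c
#include <stdint.h>

/* A 128-bit SIMD/FP register modeled as two 64-bit halves (hypothetical type). */
typedef struct { uint64_t lo, hi; } vreg;

/* Only v8-v15 carry a callee-side preservation obligation. */
int v_callee_saved(int n) {
    return n >= 8 && n <= 15;
}

/* Returns nonzero if the callee met its obligation for register vN,
   given the register contents before and after the call. */
int v_preserved_ok(int n, vreg before, vreg after) {
    if (!v_callee_saved(n)) {
        return 1;                    /* nothing to preserve */
    }
    return before.lo == after.lo;    /* upper 64 bits may change */
}
```

A caller that keeps a full 128-bit value in v8-v15 across a call therefore has to save the upper half itself.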
The FPSR is a status register that holds the cumulative exception bits of the floating-point unit. It contains the fields IDC, IXC, UFC, OFC, DZC, IOC and QC. These fields are not preserved across a public interface and may have any value on entry to a subroutine.
The FPCR is used to control the behavior of the floating-point unit. It is a global register with the following properties:
- The exception-control bits (8-12), rounding mode bits (22-23) and flush-to-zero bit (24) may be modified by calls to specific support functions that affect the global state of the application.
- All other bits are reserved and must not be modified. It is not defined whether the bits read as zero or one, or whether they are preserved across a public interface.
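The bit ranges listed for the FPCR translate directly into a mask. The sketch below uses hypothetical macro and function names and simply checks that an update touches none of the reserved bits.

```c
#include <stdint.h>

#define FPCR_EXC_CTRL   (UINT64_C(0x1F) << 8)   /* bits 8-12: exception control */
#define FPCR_RMODE      (UINT64_C(0x3) << 22)   /* bits 22-23: rounding mode */
#define FPCR_FZ         (UINT64_C(0x1) << 24)   /* bit 24: flush-to-zero */
#define FPCR_MODIFIABLE (FPCR_EXC_CTRL | FPCR_RMODE | FPCR_FZ)

/* Returns nonzero if changing the FPCR from old_value to new_value
   leaves every reserved bit untouched. */
int fpcr_update_allowed(uint64_t old_value, uint64_t new_value) {
    uint64_t changed = old_value ^ new_value;
    return (changed & ~FPCR_MODIFIABLE) == 0;
}
```

Because the reserved bits may read as zero or one, a support function would normally read the register, change only the masked fields, and write the result back.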
2 Processes, Memory and the Stack
The AAPCS64 applies to a single thread of execution or process (hereafter referred to as a process). A process has a program state defined by the underlying machine registers and the contents of the memory it can access. The memory a process can access, without causing a run-time fault, may vary during the execution of the process.
The memory of a process can normally be classified into five categories:
- code (the program being executed), which must be readable, but need not be writable, by the process.
- read-only static data.
- writable static data.
- the heap.
- the stack.
Writable static data may be further sub-divided into initialized, zero-initialized and uninitialized data. Except for the stack, there is no requirement for each class of memory to occupy a single contiguous region of memory. A process must always have some code and a stack, but need not have any of the other categories of memory.
The heap is an area (or areas) of memory managed by the process itself (for example, with the C malloc function). It is typically used for the creation of dynamic data objects.
A conforming program must only execute instructions that are in areas of memory designated to contain code.
2.1 Memory Addresses
The address space may consist of one or more disjoint regions. No region may span address zero (although one region may start at zero).
The use of tagged addressing is platform specific. When tagged addressing is disabled, all 64 bits of a pointer are passed to the address translation system. When tagged addressing is enabled, the top eight bits of a pointer are ignored for the purposes of address translation.
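As a rough illustration of tagged addressing, the helpers below separate the top eight bits of a pointer value from the bits that take part in translation. The function names are hypothetical, and the model only expresses "the top eight bits are ignored"; it does not model how the hardware treats those bits internally.

```c
#include <stdint.h>

/* The tag is the top eight bits (63:56) of the pointer value. */
uint8_t pointer_tag(uint64_t p) {
    return (uint8_t)(p >> 56);
}

/* Bits of the pointer that are significant for address translation.
   With tagging disabled, all 64 bits are significant. */
uint64_t translated_bits(uint64_t p, int tagging_enabled) {
    if (tagging_enabled) {
        return p & UINT64_C(0x00FFFFFFFFFFFFFF);
    }
    return p;
}
```

With tagging enabled, two pointers that differ only in their tag refer to the same location.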
2.2 The Stack
The stack is a contiguous area of memory that may be used for storage of local variables and for passing additional arguments to subroutines when there are insufficient argument registers available.
The stack implementation is full-descending, with the current extent of the stack held in the special-purpose register SP. The stack will, in general, have both a base and a limit, though in practice an application may not be able to determine the value of either.
The stack may have a fixed size or be dynamically extendable (by adjusting the stack-limit downwards).
The rules for maintenance of the stack are divided into two parts: a set of constraints that must be observed at all times, and an additional constraint that must be observed at a public interface.
2.2.1 Universal stack constraints
At all times the following basic constraints must hold:
- Stack-limit < SP <= stack-base. The stack pointer must lie within the extent of the stack.
- A process may only access (for reading or writing) the closed interval of the entire stack delimited by [SP, stack-base - 1].
Additionally, at any point at which memory is accessed via SP, the hardware requires that:
- SP mod 16 = 0. The stack must be quad-word aligned.
2.2.2 Stack constraints at a public interface
The stack must also conform to the following constraint at a public interface:
- SP mod 16 = 0. The stack must be quad-word aligned.
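The stack constraints from sections 2.2.1 and 2.2.2 can be written as predicates. This is a sketch with invented names (`stack_extent`, `sp_in_extent`, and so on); addresses are plain integers so the checks can be run on any host.

```c
#include <stdint.h>

/* Stack extent: limit is the lowest bound, base the highest (hypothetical type). */
typedef struct { uint64_t limit, base; } stack_extent;

/* Stack-limit < SP <= stack-base. */
int sp_in_extent(stack_extent s, uint64_t sp) {
    return s.limit < sp && sp <= s.base;
}

/* SP mod 16 = 0: required when memory is accessed via SP and at a public interface. */
int sp_aligned(uint64_t sp) {
    return sp % 16 == 0;
}

/* An access of `size` bytes at `addr` must lie inside [SP, stack-base - 1]. */
int access_allowed(stack_extent s, uint64_t sp, uint64_t addr, uint64_t size) {
    return size > 0 && addr >= sp && addr < s.base && size <= s.base - addr;
}
```

Since the stack is full-descending, anything below SP is outside the accessible interval even if it is still above the stack limit.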
2.3 The Frame Pointer
Conforming code shall construct a linked list of stack-frames. Each frame shall link to the frame of its caller by means of a frame record of two 64-bit values on the stack. The frame record for the innermost frame (belonging to the most recent routine invocation) shall be pointed to by the Frame Pointer register (FP). The lowest addressed double-word shall point to the previous frame record and the highest addressed double-word shall contain the value passed in LR on entry to the current function. The end of the frame record chain is indicated by the address zero in the address for the previous frame. The location of the frame record within a stack frame is not specified.
Note: There will always be a short period during construction or destruction of each frame record during which the frame pointer will point to the caller's record.
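The frame record chain described above is an ordinary singly linked list, which the following sketch walks. The struct and function names are illustrative. The layout matches the two double-words only where pointers are 64 bits wide, as on AArch64, and the sketch assumes the null pointer is address zero.

```c
#include <stddef.h>

/* One frame record: two pointer-sized values on the stack. */
struct frame_record {
    struct frame_record *prev;   /* lowest-addressed double-word: caller's record */
    void *return_address;        /* highest-addressed double-word: LR on entry */
};

/* Count the frame records reachable from fp; the chain ends at address zero. */
size_t frame_chain_depth(const struct frame_record *fp) {
    size_t depth = 0;
    while (fp != NULL) {
        depth++;
        fp = fp->prev;
    }
    return depth;
}
```

A debugger or profiler would follow the same loop and collect `return_address` from each record to build a backtrace, provided the platform requires a valid frame chain.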
A platform shall mandate the minimum level of conformance with respect to the maintenance of frame records. The options are, in decreasing level of functionality:
- It may require the frame pointer to address a valid frame record at all times, except that small subroutines which do not modify the link register may elect not to create a frame record.
- It may require the frame pointer to address a valid frame record at all times, except that any subroutine may elect not to create a frame record.
- It may permit the frame pointer register to be used as a general-purpose callee-saved register, but provide a platform-specific mechanism for external agents to reliably detect this condition.
- It may elect not to maintain a frame chain and to use the frame pointer register as a general-purpose callee-saved register.
3 Subroutine Calls
The A64 instruction set contains primitive subroutine call instructions, BL and BLR, which perform a branch-with-link operation. The effect of executing BL is to transfer the sequentially next value of the program counter (the return address) into the link register (LR) and the destination address into the program counter. The effect of executing BLR is similar, except that the new PC value is read from the specified register.
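The effect of BL and BLR on PC and LR can be modeled in a few lines. This is a toy model with hypothetical names (`cpu_state`, `exec_bl`, `exec_blr`); it relies on the fact that A64 instructions are four bytes long, so the "sequentially next" address is PC + 4.

```c
#include <stdint.h>

#define LR_INDEX 30   /* LR is r30 */

/* Minimal machine state: program counter plus r0-r30 (hypothetical type). */
typedef struct {
    uint64_t pc;
    uint64_t x[31];
} cpu_state;

/* BL target: LR receives the return address, PC receives the destination. */
void exec_bl(cpu_state *cpu, uint64_t target) {
    cpu->x[LR_INDEX] = cpu->pc + 4;
    cpu->pc = target;
}

/* BLR xN: as BL, but the destination is read from register xN. */
void exec_blr(cpu_state *cpu, int rn) {
    uint64_t target = cpu->x[rn];   /* read first, since rn may itself be r30 */
    cpu->x[LR_INDEX] = cpu->pc + 4;
    cpu->pc = target;
}
```

Because LR is overwritten by every call, a routine that makes calls of its own has to save the incoming LR first, typically in its frame record.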
3.1 Use of IP0 and IP1 by the linker
The A64 branch instructions are unable to reach every destination in the address space, so it may be necessary for the linker to insert a veneer between a calling routine and a called subroutine. Veneers may also be needed to support dynamic linking. Any veneer inserted must preserve the contents of all registers except IP0, IP1 (r16, r17) and the condition code flags; a conforming program must assume that a veneer that alters IP0 and/or IP1 may be inserted at any branch instruction that is exposed to a relocation that supports long branches.
Note: R_AARCH64_CALL26 and R_AARCH64_JUMP26 are the ELF relocation types with this property.
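To make "unable to reach every destination" concrete: B and BL encode a signed 26-bit word offset, giving a reach of roughly plus or minus 128 MiB. That figure comes from the A64 instruction encoding, not from the text above, and the names below are hypothetical. The sketch shows the kind of range check a linker might apply before deciding that a veneer is needed.

```c
#include <stdint.h>

/* Reach of a 26-bit signed immediate scaled by 4 (assumption from the A64 ISA). */
#define BRANCH26_MIN (-(INT64_C(1) << 27))
#define BRANCH26_MAX ((INT64_C(1) << 27) - 4)

/* Returns nonzero if a direct branch at `from` cannot reach `to`. */
int needs_veneer(uint64_t from, uint64_t to) {
    int64_t offset = (int64_t)(to - from);
    return offset < BRANCH26_MIN || offset > BRANCH26_MAX;
}
```

When the check fails, the linker can route the call through a nearby veneer that loads the full address into IP0 or IP1 and branches through it, which is why callers must treat those two registers as clobbered.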
4 Parameter Passing
The base standard provides for passing arguments in general-purpose registers (r0-r7), SIMD/floating-point registers (v0-v7) and on the stack. For subroutines that take a small number of small parameters, only registers are used.
4.1 Variadic Subroutines
A variadic subroutine is a routine that takes a variable number of parameters. The full parameter list is known by the caller, but the callee only knows the minimum number of arguments that will be passed and determines the additional arguments based on the values passed in other arguments. The two classes of arguments are known as Named arguments (these form the minimum set) and Anonymous arguments (these are the optional additional arguments).
In this standard a non-variadic subroutine can be considered to be identical to a variadic subroutine that takes no optional arguments.
4.2 Parameter Passing Rules
Parameter passing is defined as a two-level conceptual model:
- A mapping from the type of a source language argument onto a machine type.
- The marshaling of machine types to produce the final parameter list.
The mapping from a source language type onto a machine type is specific for each language and is described separately. The result is an ordered list of arguments that are to be passed to the subroutine.
For a caller, sufficient stack space to hold stacked argument values is assumed to have been allocated prior to marshaling: in practice the amount of stack space required cannot be known until after the argument marshaling has been completed. A callee is permitted to modify any stack space used for receiving parameter values from the caller.
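The marshaling stage can be sketched for the simplest case. The code below is a heavily simplified illustration, not the full rule set: it handles only scalar integer and floating-point arguments of at most eight bytes, assumes each stacked argument occupies an 8-byte slot, and ignores composites and variadic calls. All type and function names are hypothetical; the counters loosely mirror the next-register counters used in the full AAPCS64.

```c
#include <stddef.h>

typedef enum { ARG_INTEGER, ARG_FLOAT } arg_class;
typedef enum { LOC_GP_REG, LOC_SIMD_REG, LOC_STACK } loc_kind;

/* Where one argument ends up: a register number or a stack byte offset. */
typedef struct { loc_kind kind; int index; } arg_loc;

/* Assign each of n scalar arguments to r0-r7, v0-v7 or the stack. */
void assign_args(const arg_class *args, size_t n, arg_loc *out) {
    int next_gp = 0;        /* next general-purpose register number */
    int next_simd = 0;      /* next SIMD/FP register number */
    int stack_offset = 0;   /* next stacked argument offset, in bytes */

    for (size_t i = 0; i < n; i++) {
        if (args[i] == ARG_INTEGER && next_gp < 8) {
            out[i].kind = LOC_GP_REG;
            out[i].index = next_gp++;
        } else if (args[i] == ARG_FLOAT && next_simd < 8) {
            out[i].kind = LOC_SIMD_REG;
            out[i].index = next_simd++;
        } else {
            out[i].kind = LOC_STACK;
            out[i].index = stack_offset;
            stack_offset += 8;   /* assumed 8-byte slot per stacked scalar */
        }
    }
}
```

The two register banks are allocated independently, so a floating-point argument can still land in a register after the integer registers have run out.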
5 Result Return
The manner in which a result is returned from a function is determined by the type of that result:
- If the type, T, of the result of a function is such that
  void func(T arg)
  would require that arg be passed as a value in a register (or set of registers) according to the rules in §4 Parameter Passing, then the result is returned in the same registers as would be used for such an argument.
- Otherwise, the caller shall reserve a block of memory of sufficient size and alignment to hold the result. The address of the memory block shall be passed as an additional argument to the function in x8. The callee may modify the result memory block at any point during the execution of the subroutine (there is no requirement for the callee to preserve the value stored in x8).
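The two cases above can be approximated by a small classifier. The thresholds used here are assumptions drawn from the full AAPCS64 parameter rules, which this excerpt does not reproduce: floating-point scalars and homogeneous floating-point or vector aggregates travel in SIMD registers, other values of up to 16 bytes travel in general-purpose registers, and anything larger is returned through memory addressed by x8. The names are hypothetical, and only plain C types are covered.

```c
#include <stddef.h>

typedef enum { RET_GP_REGS, RET_SIMD_REGS, RET_MEMORY_VIA_X8 } ret_kind;

/* size: size of the result type in bytes.
   is_fp_or_hfa: nonzero for a floating-point/SIMD scalar or a homogeneous
   floating-point or vector aggregate (assumed to have at most four members). */
ret_kind classify_result(size_t size, int is_fp_or_hfa) {
    if (is_fp_or_hfa) {
        return RET_SIMD_REGS;       /* v0, or v0-v3 for an aggregate */
    }
    if (size <= 16) {
        return RET_GP_REGS;         /* x0, or x0 and x1 */
    }
    return RET_MEMORY_VIA_X8;       /* caller-allocated block, address in x8 */
}
```

For example, a struct of three 64-bit integers (24 bytes) would be written by the callee into the block whose address the caller placed in x8.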
6 Interworking
Interworking between the 32-bit AAPCS and the AAPCS64 is not supported within a single process. (In AArch64, all inter-operation between 32-bit and 64-bit machine states takes place across a change of exception level.)