The B3 compiler comprises two intermediate representations: a higher-level SSA-based representation called B3 IR and a lower-level representation that focuses of machine details, like registers. This lower-level form is called Air (Assembly Intermediate Representation).
Air programs are represented using a
Code comprises an array of
blocks. Each basic block comprises an array of
Air has an explicit control flow graph: each basic block has predecessor and successor blocks.
Execution always begins at the first basic block (
in each block are executed in order. Each
Inst has an opcode, some flags, an
array of arguments
and an origin. The opcode and flags are wrapped in a
Kind, to make it
convenient to carry them around together. The origin is simply a B3 IR
Some opcodes use the origin for additional meta-data. This works because Air code always
coexists with the B3 procedure from which it was generated.
This document begins by describing the philosophy of Air. The behavior of
central to Air's execution model, which is described in the section that follows. The last
section describes the way Air opcodes are defined.
B3 is designed to be portable to many kinds of CPUs. Currently, it supports x86-64 and ARM64, which are quite different from each other. In B3 IR, we expose very few instruction set details. It's a goal of B3 IR to ensure that B3 values behave the same way except when the alternative would be counterproductive (like with pointer size, the corner-case behaviors of division, or calling convention customization). But to effectively compile code to different CPUs, the compiler has to eventually make instruction set details explicit. This is where Air comes in. B3 locks in most CPU-specific details at the moment of conversion to Air, and the Air code is irreversibly tied to some specific CPU.
Air is an instruction superset: it recognizes all of the instructions from all CPUs
that Air may target. In its lowest-level form, Air is simply a way of describing an assembly
instruction sequence, and this includes CPU concepts like registers and direct accesses to the
stack. Air also has a higher-level form in which the assembly has not yet undergone register
or stack allocation. Therefore, Air also supports abstract registers (called
Tmp object can either hold an unallocated temporary or a
Air has syntax to speak of all of the CPU instructions we know about. It is possible to speak of an x86-64 instruction while compiling for ARM64, for example. Clients of Air, such as the B3 to Air lowering phase, are allowed to pick any Air opcode and ask if that opcode would be valid on the current CPU. They are also allowed to check if specific forms of any given opcode are valid. This allows clients to optimize for multiple instruction sets by cascading through the possible opcodes that they know of, starting with the one they think is most efficient. Some of those opcodes may only be available on one CPU while others are available everywhere. Instruction selection does not need to know which instructions work on which CPUs; Air will tell you if some instruction happens to not be valid right now for whatever reason.
Air opcodes support overloading. For example, the Add32 opcode has both two-operand and
three-operand overloads, and those overloads have multiple forms: the first
operand may or may not be permitted to be an immediate and depending on the CPU and some of the
other operands may or may not be allowed to be memory addresses. We use opcode overload
to refer to all forms of an opcode that share the same number of arguments, and opcode
form to mean the number of arguments and their types. A fundamental Air operation is
Inst::isValidForm(), which tells the client if the instruction's current form is
valid on the current CPU. This may return false either because the Inst is not well-formed for
any CPU or because it is not valid for the current CPU even though it may be valid on some
other CPU. There is also
Air::isValidForm(), which can answer if the form you are
intending to use will be valid even if you have not created an
Inst yet. This
allows clients to generate Air by experimenting with different forms before settling on the one
that the current CPU supports.
Air doesn't require the client to perform register or stack allocation. Anywhere that Air
accepts a register it will also accept a
Tmp. Anywhere that Air accepts an address
it will also accept a stack slot reference. Air code generation comprises a register allocator
and a stack allocator, which turn
Regs and stack slots into
addresses with the frame pointer (or stack pointer) as the base and some integer offset. Air
allows clients to speak of registers directly even while also using
Tmps, and the
register allocator will ensure that it avoids clobbering registers that the client code already
relies upon. This is possible because Air has precise modeling of how instructions use
registers, so it's always possible to determine which registers are live at any point in the
Air's philosophy allows B3 to use it for converting high-level, mostly-CPU-agnostic SSA
procedures into code for the current CPU. Air is an instruction superset that allows clients to
consider all available instructions on all possible CPUs and query which forms of those
instructions are available on the current CPU. Air also supports for high-level concepts like
Tmps and stack slots, which allows B3 to Air lowering to focus on which
instructions to use without worrying about register allocation or stack layout.
Air can be thought of as an
set. It's possible to construct an
Inst with any combination of opcode and
arguments. The opcode determines what Air will do to the arguments - it may read from them or
write to them, for example. Orthognality implies that any argument that is read may be either
a register (or
Tmp), an address, or an immediate; while any argument that is
written may be either a register or an address. Air constrains orthognality where the target
CPU would. For example, none of Air's target CPUs would support an
instruction that loads its sources from memory and stores its result into memory. Even
x86 doesn't go that far. Either before or after creating an
Inst, the client can
query if a particular combination of arguments (for example, three memory addresses) would be
valid for a given opcode (for example,
Air arguments are represented using the
represent any of the following assembly operands:
Tmprepresents either a register or a temporary.
Tmpfor the base and an immediate for the offset. Stack and CallArg are abstract stack offsets. Index is a base-index address, which has a pair of
Tmps (base and index) as well as an offset immediate and a scaling immediate for the index.
Insts to point to additional meta-data. The Special argument type is used for such meta-data. It holds a
The opcode of an
Inst combined with the overload - i.e. the number of arguments
- determines what the
Inst will do to each argument. The behavior of arguments
comes down to three dimensions that are determined by the opcode and overload:
GP(general purpose) and
FP(floating point). Arguments also have type. It's important to remember that there is both a type for the argument as determined by the opcode and overload, and a type of the argument itself. Some arguments are untyped, which means that they may be used regardless of the type desired by the opcode/overload. For example, addresses are untyped. Other arguments have specific type, like registers and
Tmps. Except for
BigImm, immediates are
The timing of an argument role is important, and brings us to the order of execution of an
Inst can be thought of as executing three
The early actions of one instruction happen immediately after the late actions of the instruction before it. However, many Air analyses view them as happening at the same time. For example, any register usage in the early action of one instruction interferes with the register usage in the late action of the instruction that came before it. All of Air's liveness and interference analyses reason about the fence posts between instructions, where the late actions of the previous instruction and the early actions of the next form an interference clique.
Let's consider a simple example, like
Add32 with two arguments. Let's say that
the first argument is a memory location and the second argument is a register. Air uses the
AT&T style of
placing the destination argument last for most instructions. This Add32 loads from memory and
adds the value to the register. Air writes this as:
Add32 42(%rax), %rcx
This instruction will proceed in three steps:
%rax. The result is stored in an internal, hidden CPU location for the remainder of execution. Even if the instruction later stores to memory and overwrites this value,
Add32will still use the original value it had loaded. We say that this is an early use. At the same time, the CPU will load the value of
%rcxand store it in a hidden CPU location. This is also an early use.
%rcx. This is a late def with zero extension.
Therefore, the two-argument overload of
Add32 ascribes the following to its
42(%rax)) and early warm use with late warm def with zero extension for the second argument (
%rcx). Early warm use is written as
Usefor short. Early warm use with late warm def with zero extension is written as
GP. This matches
42(%rax)because addresses match any type. This matches
%rcxbecause it's a general-purpose register.
Width32. Combined with
UseZDef, this means that the instruction will read the low 32 bits of
%rcxin the early actions and it will store to all bits of
%rcxin the late actions, but it will ensure that all but the low 32 bits are zero.
Air can tell you what role, type, and width is ascribed to each argument by using the
Inst::forEachArg(func) operation. It takes a callback of type
void(Arg&, Arg::Role, Arg::Type, Arg::Width). For our
this callback would get called twice:
func(42(%rax), Use, GP, Width32)
func(%rcx, UseZDef, GP, Width32)
It's important to remember that Air's summaries of what instructions do to arguments are not meant to be exhaustive. For example, if an instruction claims to use an address, this tells you that the instruction will perform a load but it tells you nothing about how that load will be performed. This means that unless you know exactly what it means to use/def an argument, you cannot perform this transformation:
Move32 (%rax), %tmp Foo32 %tmp
Even if you know that Foo32 only uses its argument, you cannot do this because Move32 may not load from the address using exactly the same kind of load that Foo32 would have done. Memory accesses have many dimensions of options: alignment semantics (if you're lucky then misaligned accesses run fast but sometimes they ignore low bits, trap if unaligned, or run super slow when unaligned, and this behavior may depend on the opcode), temporality and memory ordering, determinism of trapping, etc. Just seeing that an instruction uses an address does not tell you what kind of load will happen, and currently Air does not have the ability to answer such questions. Fortunately, Air does not have much need to move memory accesses out of instructions. Uses and defs of temporaries, registers, immediates, and spill slots don't have these caveats and so those arguments can be moved from one instruction to another without worries.
Air's introspection of
Insts tends to be quite fast thanks to the use of template
specialization and C++ lambdas. The
forEachArg() template method uses an efficient
arrangement of switch statements to determine the opcode and overload. If
a C++ lambda, we expect
forEachArg() to be specialized for that lambda. Therefore,
this idiom avoids virtual dispatch or memory allocation.
Air supports exotic roles, such as late uses and early defs. There is even the
Scratch role, which means early def and late use. Speaking of a
Scratch role means that the
Tmp will be assigned a register
that is guaranteed to not interfere with any of the other registers that the instruction
speaks of. Late uses and early defs are crucial for patchpoints, which may require
that one of the incoming values be given a register that does not interfere with whatever
register is used for the result. This can be expressed either as giving the inputs a late use
role or by giving the outputs an early def role. The full list of possible roles is:
UseAddr is interesting for the
Lea (load effective address)
instruction, which evaluates the address and places the result into a temporary or register.
The argument must be an address, but
UseAddr means that we don't actually read
from the address. Note that using an address in any of the other roles always implies that the
components of the address are used early and warm (i.e.
Air arguments are central to Air's execution model. The early and late actions of an
instruction have to do with arguments, and what happens to each argument during the early and
late actions is determined by the opcode and the number of arguments (i.e. the overload).
Clients of Air may create an
Inst with any combination of opcode and arguments
and then query, using
isValidForm() if the opcode, overload, and specific
arguments are valid for the current CPU.
Air has many opcodes and the opcodes have many different overloads and forms. Air makes it
easy to reason about all of them with helpers like
forEachArg(). It also provides a
Inst::generate() function that will
generate code for the instruction, provided that it does not use any non-register
Tmps or any abstract stack slots. If we wrote the
code for validating, iterating, and generating each form by hand, we would have a bad time.
For this reason, Air comes with an
opcode code generator
that uses a
opcode definition file
as input. The opcode definition file use a simple and concise syntax that lets us define many
opcodes at once and constrain them to the CPU kinds that support them. Additionally, Air
supports custom opcodes, where the code generator emits calls to
C++ code. This section describes the opcode definition language.
It's easiest to understand the opcode definitions with an example. Let's use the two-argument
Add32 U:G:32, UZD:G:32 Tmp, Tmp x86: Imm, Addr x86: Imm, Index Imm, Tmp x86: Addr, Tmp x86: Tmp, Addr x86: Tmp, Index
The first line defines the overload. It has two arguments. The first argument serves the
Use role, shorted as
U. It is general-purpose, shortened as
G. It has a 32-bit width. Hence the string
The next lines list the available forms of the overload. A form is a list of possible kinds of
arguments. These use the same terminology for
Arg kinds from the previous section,
with the caveat that
Addr implies that
CallArg would be accepted.
Prefixing any line with
x86: means that this form is only available on x86
CPUs, such as x86 or x86-64.
opcode is automatically given a code generator that calls
MacroAssembler::opcodeName, where opcodeName is derived by
lower-casing the first letter of the Air opcode name.
MacroAssembler::add32, for example.
See the header of AirOpcode.opcodes for a complete list of shorthand used by Air's opcode definition language.