API - Docs - AsmJit

Assembler

Assembler interface and operands.

Overview

AsmJit's Assembler is used to emit machine code directly into a CodeBuffer. In general, code generation with an assembler requires knowledge of the following:

BaseAssembler and architecture-specific assemblers:
- x86::Assembler - Assembler implementation targeting X86 and X86_64 architectures.
- a64::Assembler - Assembler implementation targeting AArch64 architecture.
Operand and its variations:
- BaseReg - Base class for a register operand, inherited by:
  - x86::Reg - Register operand specific to X86 and X86_64 architectures.
  - arm::Reg - Register operand specific to AArch64 architecture.
- BaseMem - Base class for a memory operand, inherited by:
  - x86::Mem - Memory operand specific to X86 architecture.
  - arm::Mem - Memory operand specific to AArch64 architecture.
- Imm - Immediate (value) operand.
- Label - Label operand.

Note: Assembler examples use x86::Assembler as abstract interfaces cannot be used to generate code.

Operand Basics

Let's start with operands. Operand is a data structure that defines the data layout of any operand. It can be inherited, but a class inheriting from it cannot add any members; only the existing layout can be reused. AsmJit lets you construct operands dynamically, store them, and query complete information about them at run-time. Operands are small (always 16 bytes per Operand) and can be copied and passed by value. Never allocate individual operands dynamically with the new keyword: it would work, but you would then be responsible for deleting them. In AsmJit, operands are always part of other data structures, such as InstNode, which is part of the Builder tool.

Operands contain only identifiers, not pointers to any code-generation data. For example, a Label operand only provides a label identifier, not a pointer to a LabelEntry structure. AsmJit uses such IDs to link data structures together without having to deal with pointers.
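The idea of linking by identifier rather than by pointer can be sketched without AsmJit at all. The block below is a minimal standalone illustration, not AsmJit's implementation: `LabelRef`, `LabelRecord`, and `LabelTable` are hypothetical names that stand in for a label operand, its entry, and the container that owns the entries.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; these are NOT AsmJit's Label / LabelEntry types.
struct LabelRef {
  uint32_t id;  // The only thing the operand carries.
};

struct LabelRecord {
  bool bound = false;   // Whether the label has been placed in the code.
  uint64_t offset = 0;  // Code offset, valid only when `bound` is true.
};

// Owns all records; operands refer to them by index.
class LabelTable {
public:
  // Creates a new record and returns a small, copyable reference to it.
  LabelRef create() {
    records_.push_back(LabelRecord{});
    return LabelRef{static_cast<uint32_t>(records_.size() - 1)};
  }

  // Marks the referenced label as bound at `offset`.
  void bind(LabelRef ref, uint64_t offset) {
    LabelRecord& rec = records_.at(ref.id);
    rec.bound = true;
    rec.offset = offset;
  }

  // Resolves a reference back to its record.
  const LabelRecord& lookup(LabelRef ref) const { return records_.at(ref.id); }

private:
  std::vector<LabelRecord> records_;
};
```

Because a `LabelRef` holds only an index, copies of it stay valid even when the vector reallocates, which a raw pointer to a record would not.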

AsmJit's operands all inherit from a base class called Operand. Operands have the following properties that are commonly accessible by getters and setters:

Operand - Base operand, which only provides accessors that are common to all operand types.
BaseReg - Describes either a physical or a virtual register. Physical registers have an id that matches the target's machine id directly, whereas virtual registers must be allocated into physical registers by a register allocator pass. A register operand provides:
- Register Type (RegType) - Unique id that describes each possible register provided by the target architecture - for example X86 backend provides general purpose registers (GPB-LO, GPB-HI, GPW, GPD, and GPQ) and all types of other registers like K, MM, BND, XMM, YMM, ZMM, and TMM.
- Register Group (RegGroup) - Groups multiple register types under a single group - for example all general-purpose registers (of all sizes) on X86 are part of RegGroup::kGp and all SIMD registers (XMM, YMM, ZMM) are part of RegGroup::kVec.
- Register Size - Contains the size of the register in bytes. If the size depends on the mode (32-bit vs 64-bit) then generally the higher size is used (for example RIP register has size 8 by default).
- Register Id - Contains physical or virtual id of the register.
BaseMem - Used to reference a memory location. Memory operand provides:
- Base Register - A base register type and id (physical or virtual).
- Index Register - An index register type and id (physical or virtual).
- Offset - Displacement or absolute address to be referenced (32-bit if base register is used and 64-bit if base register is not used).
- Flags that can describe various architecture dependent information (like scale and segment-override on X86).
Imm - Immediate values are usually part of instructions (encoded within the instruction itself) or data.
Label - Used to reference a location in code or data. Labels must be created by the BaseEmitter or by CodeHolder. Each label has a unique id per CodeHolder instance.
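The relationship between register type, group, and size described above can be illustrated with a small standalone model. This is only a sketch: `DemoRegType`, `DemoRegGroup`, `groupOf()`, and `sizeOf()` are hypothetical names, and AsmJit's real RegType and RegGroup enumerations have more members and different values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enums for illustration only.
enum class DemoRegType : uint8_t { kGpbLo, kGpbHi, kGpw, kGpd, kGpq, kXmm, kYmm, kZmm };
enum class DemoRegGroup : uint8_t { kGp, kVec };

// All general-purpose views share one group; all SIMD views share another.
inline DemoRegGroup groupOf(DemoRegType type) {
  switch (type) {
    case DemoRegType::kXmm:
    case DemoRegType::kYmm:
    case DemoRegType::kZmm:
      return DemoRegGroup::kVec;
    default:
      return DemoRegGroup::kGp;
  }
}

// Register size in bytes for each view.
inline uint32_t sizeOf(DemoRegType type) {
  switch (type) {
    case DemoRegType::kGpbLo:
    case DemoRegType::kGpbHi: return 1;
    case DemoRegType::kGpw:   return 2;
    case DemoRegType::kGpd:   return 4;
    case DemoRegType::kGpq:   return 8;
    case DemoRegType::kXmm:   return 16;
    case DemoRegType::kYmm:   return 32;
    case DemoRegType::kZmm:   return 64;
  }
  return 0;
}

// A register is identified by its type plus its id within the group.
struct DemoReg {
  DemoRegType type;
  uint32_t id;
};
```

In this model, `eax` and `rax` would be `DemoReg{kGpd, 0}` and `DemoReg{kGpq, 0}`: the same id and group, but a different type and size.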

Operand Manipulation

AsmJit lets you construct operands dynamically, store them, and query complete information about them at run-time. Operands are small (always 16 bytes per Operand) and should always be copied (by value) if you intend to store them (creating operands with the new keyword is not recommended). Operands are safe to pass to memcpy() and memset(), which comes in handy when working with arrays of operands. If you set all members of an Operand to zero, it becomes a NONE operand, which is the same as a default-constructed Operand.
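The properties described above (fixed 16-byte size, safe to memcpy() and memset(), all-zero meaning "none") can be demonstrated on a stand-in type. The sketch below is an assumption-laden model: `TinyOperand` and its field names are hypothetical and do not reproduce AsmJit's actual Operand layout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical 16-byte operand; field names are illustrative only.
struct TinyOperand {
  uint32_t signature;  // 0 means "none" in this model.
  uint32_t id;
  uint32_t data[2];
};

static_assert(sizeof(TinyOperand) == 16, "operand must stay 16 bytes");
static_assert(std::is_trivially_copyable<TinyOperand>::value,
              "memcpy()/memset() are only safe on trivially copyable types");

// True when the operand is the "none" operand.
inline bool isNone(const TinyOperand& op) { return op.signature == 0; }

// Bitwise comparison of two operands.
inline bool sameBits(const TinyOperand& a, const TinyOperand& b) {
  return std::memcmp(&a, &b, sizeof(TinyOperand)) == 0;
}
```

With this model, an array cleared by `std::memset(ops, 0, sizeof(ops))` holds operands that are bit-identical to a value-initialized `TinyOperand{}`.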

The example below illustrates how operands can be used and modified even without using any other code generation classes. The example uses X86 architecture-specific operands.

#include <asmjit/x86.h>
 
using namespace asmjit;
 
// Registers can be copied, it's a common practice.
x86::Gp dstRegByValue() { return x86::ecx; }
 
void usingOperandsExample(x86::Assembler& a) {
  // Gets `ecx` register returned by a function.
  x86::Gp dst = dstRegByValue();
  // Gets `rax` register directly from the provided `x86` namespace.
  x86::Gp src = x86::rax;
  // Constructs `r10` dynamically.
  x86::Gp idx = x86::gpq(10);
  // Constructs [src + idx] memory address - referencing [rax + r10].
  x86::Mem m = x86::ptr(src, idx);
 
  // Examine `m`: Returns `RegType::kX86_Gpq`.
  m.indexType();
  // Examine `m`: Returns 10 (`r10`).
  m.indexId();
 
  // Reconstruct `idx` stored in mem:
  x86::Gp idx_2 = x86::Gp::fromTypeAndId(m.indexType(), m.indexId());
 
  // True, `idx` and `idx_2` are identical.
  idx == idx_2;
 
  // Possible - op will still be the same as `m`.
  Operand op = m;
  // True (can be casted to BaseMem or architecture-specific Mem).
  op.isMem();
 
  // True, `op` is just a copy of `m`.
  m == op;
 
  // Static cast is fine and valid here.
  static_cast<BaseMem&>(op).addOffset(1);
  // However, using `as<T>()` to cast to a derived type is preferred.
  op.as<BaseMem>().addOffset(1);
  // False, `op` now points to [rax + r10 + 2], which is not [rax + r10].
  m == op;
 
  // Emitting 'mov' - type safe way.
  a.mov(dst, m);
  // Not possible, `mov` doesn't provide mov(x86::Gp, Operand) overload, so
  // the following line would not compile:
  // a.mov(dst, op);
 
  // Type-unsafe, but possible.
  a.emit(x86::Inst::kIdMov, dst, m);
  // Also possible, `emit()` is type-less and can be used with raw Operand.
  a.emit(x86::Inst::kIdMov, dst, op);
}

Some operands have to be created explicitly by emitters. For example, labels must be created by BaseEmitter::newLabel(), which creates a label entry and returns a Label operand with the id that refers to it. Such a label can then be used by emitters.

Memory Operands

Some architectures, like X86, provide a complex memory addressing model that can encode addresses with a BASE register, an INDEX register with an optional scale (left shift), and a displacement (called offset in AsmJit). A memory address on X86 can also specify a memory segment (segment-override in X86 terminology), and some instructions (gather / scatter) require INDEX to be an x86::Vec register instead of a general-purpose register.

AsmJit can encode and work with all address forms mentioned above and implemented by X86. In addition, it can construct absolute 64-bit memory address operands, which are only allowed in one form of the 'mov' instruction.

#include <asmjit/x86.h>
 
using namespace asmjit;
 
void testX86Mem() {
  // Makes it easier to access x86 stuff...
  using namespace asmjit::x86;
 
  // BASE + OFFSET.
  Mem a = ptr(rax);                 // a = [rax]
  Mem b = ptr(rax, 15);             // b = [rax + 15]
 
  // BASE + INDEX << SHIFT - Shift is in BITS as used by X86!
  Mem c = ptr(rax, rbx);            // c = [rax + rbx]
  Mem d = ptr(rax, rbx, 2);         // d = [rax + rbx << 2]
  Mem e = ptr(rax, rbx, 2, 15);     // e = [rax + rbx << 2 + 15]
 
  // BASE + VM (Vector Index) (encoded as MOD+VSIB).
  Mem f = ptr(rax, xmm1);           // f = [rax + xmm1]
  Mem g = ptr(rax, xmm1, 2);        // g = [rax + xmm1 << 2]
  Mem h = ptr(rax, xmm1, 2, 15);    // h = [rax + xmm1 << 2 + 15]
 
  // Absolute address:
  uint64_t addr = (uint64_t)0x1234;
  Mem i = ptr(addr);                // i = [0x1234]
  Mem j = ptr(addr, rbx);           // j = [0x1234 + rbx]
  Mem k = ptr(addr, rbx, 2);        // k = [0x1234 + rbx << 2]
 
  // LABEL - Will be encoded as RIP (64-bit) or absolute address (32-bit).
  Label L = ...;
  Mem m = ptr(L);                   // m = [L]
  Mem n = ptr(L, rbx);              // n = [L + rbx]
  Mem o = ptr(L, rbx, 2);           // o = [L + rbx << 2]
  Mem p = ptr(L, rbx, 2, 15);       // p = [L + rbx << 2 + 15]
 
  // RIP - 64-bit only (RIP can't use INDEX).
  Mem q = ptr(rip, 24);             // q = [rip + 24]
}
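The comments in the example above write addresses as `[base + index << shift + offset]`, where the shift is given in bits, so a shift of 2 corresponds to an X86 scale of 4. The arithmetic can be sketched as follows; `effectiveAddress()` and `shiftFromScale()` are hypothetical helpers written for this illustration and are not part of AsmJit.

```cpp
#include <cassert>
#include <cstdint>

// Computes the address that [base + (index << shift) + offset] refers to.
// The offset is sign-extended before it is added, so negative
// displacements work as expected.
constexpr uint64_t effectiveAddress(uint64_t base, uint64_t index,
                                    uint32_t shift, int32_t offset) {
  return base + (index << shift) +
         static_cast<uint64_t>(static_cast<int64_t>(offset));
}

// Converts an X86 scale factor (1, 2, 4, or 8) to a shift amount in bits.
// Returns -1 for any other value.
constexpr int shiftFromScale(uint32_t scale) {
  return scale == 1 ? 0
       : scale == 2 ? 1
       : scale == 4 ? 2
       : scale == 8 ? 3
       : -1;
}
```

For instance, with `rax = 0x1000` and `rbx = 3`, the operand `e` from the example would refer to `effectiveAddress(0x1000, 3, 2, 15)`.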

Memory operands can optionally contain memory size. This is required by instructions where the memory size cannot be deduced from other operands, like inc and dec on X86:

#include <asmjit/x86.h>
 
using namespace asmjit;
 
void testX86Mem() {
  // The same as: dword ptr [rax + rbx].
  x86::Mem a = x86::dword_ptr(x86::rax, x86::rbx);
 
  // The same as: qword ptr [rdx + rsi << 0 + 1].
  x86::Mem b = x86::qword_ptr(x86::rdx, x86::rsi, 0, 1);
}

Memory operands provide an API that can be used to access and modify their properties:

#include <asmjit/x86.h>
 
using namespace asmjit;
 
void testX86Mem() {
  // The same as: dword ptr [rax + 12].
  x86::Mem mem = x86::dword_ptr(x86::rax, 12);
 
  mem.hasBase();                    // true.
  mem.hasIndex();                   // false.
  mem.size();                       // 4.
  mem.offset();                     // 12.
 
  mem.setSize(0);                   // Sets the size to 0 (makes it size-less).
  mem.addOffset(-1);                // Adds -1 to the offset and makes it 11.
  mem.setOffset(0);                 // Sets the offset to 0.
  mem.setBase(x86::rcx);            // Changes BASE to RCX.
  mem.setIndex(x86::rax);           // Changes INDEX to RAX.
  mem.hasIndex();                   // true.
}

Modifying a memory operand is convenient when emitting a sequence of loads and stores:

#include <asmjit/x86.h>
 
using namespace asmjit;
 
void testX86Mem(CodeHolder& code) {
  x86::Assembler a(code);           // Your initialized x86::Assembler.
  x86::Mem mSrc = x86::ptr(x86::eax); // Construct [eax] memory operand.
 
  // One way of emitting a bunch of loads is to use `mem.cloneAdjusted()`, which
  // returns a new memory operand and keeps the source operand unchanged.
  a.movaps(x86::xmm0, mSrc);        // No adjustment needed to load [eax].
  a.movaps(x86::xmm1, mSrc.cloneAdjusted(16)); // Loads from [eax + 16].
  a.movaps(x86::xmm2, mSrc.cloneAdjusted(32)); // Loads from [eax + 32].
  a.movaps(x86::xmm3, mSrc.cloneAdjusted(48)); // Loads from [eax + 48].
 
  // ... do something with xmm0-3 ...
 
  // Another way of adjusting memory is to change the operand in-place.
  // If you want to keep the original operand you can simply clone it.
  x86::Mem mDst = mSrc.clone();     // Clone mSrc.
 
  a.movaps(mDst, x86::xmm0);        // Stores xmm0 to [eax].
  mDst.addOffset(16);               // Adds 16 to `mDst`.
 
  a.movaps(mDst, x86::xmm1);        // Stores to [eax + 16].
  mDst.addOffset(16);               // Adds 16 to `mDst`.
 
  a.movaps(mDst, x86::xmm2);        // Stores to [eax + 32].
  mDst.addOffset(16);               // Adds 16 to `mDst`.
 
  a.movaps(mDst, x86::xmm3);        // Stores to [eax + 48].
}

Assembler Examples

x86::Assembler provides many X86/X64 examples.

Classes

class BaseAssembler

struct OperandSignature

struct Operand_

class Operand

class Label

class BaseReg

struct RegOnly

class BaseRegList

class RegListT<RegT>

class BaseMem

class Imm

Typedefs

typedef uint32_t RegMask

typedef Support::EnumValues<RegGroup, RegGroup::kGp, RegGroup::kMaxVirt> RegGroupVirtValues

Enumerations

enum class OperandType : uint32_t

enum class RegType : uint8_t

enum class RegGroup : uint8_t

enum class ImmType : uint32_t

Functions

template<typename T>
static constexpr Imm imm(const T& val) noexcept

Variables

static constexpr const Operand Globals::none

typedef uint32_t RegMask

Register mask is a convenience typedef that describes a mask where each bit describes a physical register id in the same RegGroup.

At the moment 32 bits are enough as AsmJit doesn't support any architecture that would provide more than 32 registers for a register group.
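A mask of this kind can be manipulated with ordinary bit operations. The sketch below is standalone and only mirrors the typedef above; `maskOf()`, `hasReg()`, and `countRegs()` are hypothetical helpers introduced for illustration, not AsmJit functions.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = uint32_t;  // Mirrors the typedef documented above.

// Returns a mask with only the bit for `physId` set (physId must be < 32).
constexpr RegMask maskOf(uint32_t physId) { return RegMask(1) << physId; }

// Tests whether the register with `physId` is present in `mask`.
constexpr bool hasReg(RegMask mask, uint32_t physId) {
  return (mask & maskOf(physId)) != 0;
}

// Counts how many registers are present in `mask`.
inline uint32_t countRegs(RegMask mask) {
  uint32_t n = 0;
  while (mask != 0) {
    mask &= mask - 1;  // Clears the lowest set bit.
    ++n;
  }
  return n;
}
```

A set of registers with ids 0, 3, and 10 in one group would be written as `maskOf(0) | maskOf(3) | maskOf(10)`.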

enum class OperandType : uint32_t

Operand type used by Operand_.

Constant	Description
kNone	Not an operand or not initialized.
kReg	Operand is a register.
kMem	Operand is a memory.
kRegList	Operand is a register-list.
kImm	Operand is an immediate value.
kLabel	Operand is a label.
kMaxValue	Maximum value of `OperandType`.

enum class RegType : uint8_t

Provides a unique type that can be used to identify a register or its view.

Constant	Description
kNone	No register - unused, invalid, multiple meanings.
kLabelTag	This is not a register type. This value is reserved for a Label that's used in BaseMem as a base. Label tag is used as a sub-type, forming a unique signature across all operand types as 0x1 is never associated with any register type. This means that a memory operand's BASE register can be constructed from virtually any operand (register vs. label) by just assigning its type (register type or label-tag) and operand id.
kPC	Universal type describing program counter (PC) or instruction pointer (IP) register, if the target architecture actually exposes it as a separate register type, which most modern architectures do.
kGp8Lo	8-bit low general purpose register (X86).
kGp8Hi	8-bit high general purpose register (X86).
kGp16	16-bit general purpose register (X86).
kGp32	32-bit general purpose register (X86|AArch32|AArch64).
kGp64	64-bit general purpose register (X86|AArch64).
kVec8	8-bit view of a vector register (AArch64).
kVec16	16-bit view of a vector register (AArch64).
kVec32	32-bit view of a vector register (AArch32|AArch64).
kVec64	64-bit view of a vector register (AArch32|AArch64). Note: This is never used for MMX registers on X86; MMX registers have their own category.
kVec128	128-bit view of a vector register (X86|AArch32|AArch64).
kVec256	256-bit view of a vector register (X86).
kVec512	512-bit view of a vector register (X86).
kVec1024	1024-bit view of a vector register (future).
kVecNLen	View of a vector register whose width is implementation specific (AArch64).
kMask	Mask register (X86).
kExtra	Start of architecture dependent register types.
kX86_Rip	Instruction pointer (RIP), only addressable in x86::Mem in 64-bit targets.
kX86_GpbLo	Low GPB register (AL, BL, CL, DL, ...).
kX86_GpbHi	High GPB register (AH, BH, CH, DH only).
kX86_Gpw	GPW register.
kX86_Gpd	GPD register.
kX86_Gpq	GPQ register (64-bit).
kX86_Xmm	XMM register (SSE+).
kX86_Ymm	YMM register (AVX+).
kX86_Zmm	ZMM register (AVX512+).
kX86_KReg	K register (AVX512+).
kX86_Mm	MMX register.
kX86_SReg	Segment register (None, ES, CS, SS, DS, FS, GS).
kX86_CReg	Control register (CR).
kX86_DReg	Debug register (DR).
kX86_St	FPU (x87) register.
kX86_Bnd	Bound register (BND).
kX86_Tmm	TMM register (AMX_TILE).
kARM_PC	Program pointer (PC) register (AArch64).
kARM_GpW	32-bit general purpose register (R or W).
kARM_GpX	64-bit general purpose register (X).
kARM_VecB	8-bit view of VFP/ASIMD register (B).
kARM_VecH	16-bit view of VFP/ASIMD register (H).
kARM_VecS	32-bit view of VFP/ASIMD register (S).
kARM_VecD	64-bit view of VFP/ASIMD register (D).
kARM_VecQ	128-bit view of VFP/ASIMD register (Q).
kARM_VecV	128-bit view of VFP/ASIMD register (V).
kMaxValue	Maximum value of `RegType`.

enum class RegGroup : uint8_t

Provides a unique value that identifies groups of registers and their views.

Constant	Description
kGp	General purpose register group compatible with all backends.
kVec	Vector register group compatible with all backends. Describes X86 XMM|YMM|ZMM registers and ARM/AArch64 V registers.
kMask	Mask register group compatible with all backends that can use masking.
kExtraVirt3	Extra virtual group #3 that can be used by Compiler for register allocation.
kPC	Program counter group.
kExtraNonVirt	Extra non-virtual group that can be used by registers not managed by Compiler.
kX86_K	K register group (KReg) - maps to RegGroup::kMask (X86, X86_64).
kX86_MM	MMX register group (MM) - maps to RegGroup::kExtraVirt3 (X86, X86_64).
kX86_Rip	Instruction pointer (X86, X86_64).
kX86_SReg	Segment register group (X86, X86_64).
kX86_CReg	CR register group (X86, X86_64).
kX86_DReg	DR register group (X86, X86_64).
kX86_St	FPU register group (X86, X86_64).
kX86_Bnd	BND register group (X86, X86_64).
kX86_Tmm	TMM register group (X86, X86_64).
k0	First group - only used in loops.
kMaxVirt	Last value of a virtual register that is managed by BaseCompiler.
kMaxValue	Maximum value of `RegGroup`.

enum class ImmType : uint32_t

Type of an immediate value.

Constant	Description
kInt	Immediate is integer.
kDouble	Immediate is a floating point stored as double-precision.
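One way to picture an immediate that is either an integer or a double is a type tag next to a 64-bit payload. The sketch below is a standalone model under that assumption; `TinyImmType`, `TinyImm`, `makeImm()`, `asInt()`, and `asDouble()` are hypothetical names, and AsmJit's actual Imm storage may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical model of a tagged immediate.
enum class TinyImmType : uint32_t { kInt = 0, kDouble = 1 };

struct TinyImm {
  TinyImmType type;
  uint64_t bits;  // Raw 64-bit payload, interpreted according to `type`.
};

// Integral values are sign-extended (or zero-extended if unsigned) to 64 bits.
template<typename T>
typename std::enable_if<std::is_integral<T>::value, TinyImm>::type
makeImm(T val) {
  return TinyImm{TinyImmType::kInt,
                 static_cast<uint64_t>(static_cast<int64_t>(val))};
}

// Floating point values are widened to double and stored bit-for-bit.
template<typename T>
typename std::enable_if<std::is_floating_point<T>::value, TinyImm>::type
makeImm(T val) {
  double d = static_cast<double>(val);
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof(bits));
  return TinyImm{TinyImmType::kDouble, bits};
}

// Reads the payload back as a signed integer.
inline int64_t asInt(const TinyImm& imm) {
  return static_cast<int64_t>(imm.bits);
}

// Reads the payload back as a double.
inline double asDouble(const TinyImm& imm) {
  double d;
  std::memcpy(&d, &imm.bits, sizeof(d));
  return d;
}
```

Using two constrained templates instead of plain `int64_t` and `double` overloads avoids an ambiguous call when the argument is an `int` literal.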

template<typename T>
static constexpr Imm imm(const T& val) noexcept

Creates a new immediate operand.

static constexpr const Operand Globals::none

A default-constructed operand of OperandType::kNone type.