X86/X64 compiler implementation.
Compiler Basics
The first x86::Compiler example shows how to generate a function that simply returns an integer value. It is analogous to the first Assembler example:
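#include <asmjit/x86.h>
#include <stdio.h>
using namespace asmjit;
// Signature of the generated function.
typedef int (*Func)(void);
int main() {
  JitRuntime rt;                     // Runtime specialized for JIT code execution.
  CodeHolder code;                   // Holds code and relocation information.
  code.init(rt.environment(),        // Initialize code to match the JIT environment.
            rt.cpuFeatures());
  x86::Compiler cc(&code);           // Create and attach x86::Compiler to code.
  cc.addFunc(FuncSignatureT<int>()); // Begin a function of `int fn(void)` signature.
  x86::Gp vReg = cc.newGpd();        // Create a 32-bit virtual register.
  cc.mov(vReg, 1);                   // Move one to the virtual register.
  cc.ret(vReg);                      // Return the virtual register from the function.
  cc.endFunc();                      // End of the function body.
  cc.finalize();                     // Translate and assemble the whole `cc` content.
  Func fn;
  Error err = rt.add(&fn, &code);    // Add the generated code to the runtime.
  if (err) return 1;                 // Handle a possible error returned by AsmJit.
  int result = fn();                 // Execute the generated code.
  printf("%d\n", result);            // Print the resulting "1".
  rt.release(fn);                    // Release the function when it is no longer needed.
  return 0;
}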
BaseCompiler::addFunc() and BaseCompiler::endFunc() mark the beginning and the end of a function. Both must be called for every function, but the body doesn't have to be generated in sequence. An example of generating two functions will be shown later. The next example is more involved: it contains a loop and generates a simple memory copy function that operates on uint32_t items:
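#include <asmjit/x86.h>
#include <stdio.h>
using namespace asmjit;
// Signature of the generated function.
typedef void (*MemCpy32)(uint32_t* dst, const uint32_t* src, size_t count);
int main() {
  JitRuntime rt;                          // Runtime specialized for JIT code execution.
  CodeHolder code;                        // Holds code and relocation information.
  code.init(rt.environment(), rt.cpuFeatures());
  x86::Compiler cc(&code);                // Create and attach x86::Compiler to code.
  FuncNode* funcNode = cc.addFunc(        // Begin the function of the following signature:
    FuncSignatureT<void,                  //   Return value - void.
      uint32_t*,                          //   1st argument - destination pointer.
      const uint32_t*,                    //   2nd argument - source pointer.
      size_t>());                         //   3rd argument - number of items.
  Label L_Loop = cc.newLabel();           // Start of the loop.
  Label L_Exit = cc.newLabel();           // Used to exit early.
  x86::Gp dst = cc.newIntPtr("dst");      // Virtual registers for the arguments.
  x86::Gp src = cc.newIntPtr("src");
  x86::Gp i = cc.newUIntPtr("i");
  funcNode->setArg(0, dst);               // Assign the arguments to the registers.
  funcNode->setArg(1, src);
  funcNode->setArg(2, i);
  cc.test(i, i);                          // Early exit if the count is zero.
  cc.jz(L_Exit);
  cc.bind(L_Loop);                        // Bind the beginning of the loop here.
  x86::Gp tmp = cc.newInt32("tmp");       // Temporary for one copied item.
  cc.mov(tmp, x86::dword_ptr(src));       // Load a DWORD from [src].
  cc.mov(x86::dword_ptr(dst), tmp);       // Store the DWORD to [dst].
  cc.add(src, 4);                         // Advance both pointers by one item.
  cc.add(dst, 4);
  cc.dec(i);                              // Loop until `i` is zero.
  cc.jnz(L_Loop);
  cc.bind(L_Exit);                        // Label used by the early exit.
  cc.endFunc();                           // End of the function body.
  cc.finalize();                          // Translate and assemble the whole `cc` content.
  MemCpy32 memcpy32;
  Error err = rt.add(&memcpy32, &code);   // Add the generated code to the runtime.
  if (err)
    return 1;                             // Handle a possible error returned by AsmJit.
  uint32_t input[6] = { 1, 2, 3, 5, 8, 13 };
  uint32_t output[6];
  memcpy32(output, input, 6);
  for (uint32_t i = 0; i < 6; i++)
    printf("%u\n", output[i]);
  rt.release(memcpy32);                   // Release the function when it is no longer needed.
  return 0;
}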
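For comparison, the control flow of the generated function can be written in plain C++. This is only a reference sketch and not AsmJit code: the name memcpy32_reference is hypothetical, and it simply mirrors the zero-count check and the count-down loop used by the instructions above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical plain C++ counterpart of the generated loop (not part of AsmJit).
void memcpy32_reference(uint32_t* dst, const uint32_t* src, size_t count) {
  // Mirrors `test i, i` followed by `jz L_Exit`: nothing to do for an empty input.
  if (count == 0)
    return;

  // Mirrors the body between L_Loop and `jnz L_Loop`.
  do {
    *dst++ = *src++;  // mov tmp, [src]; mov [dst], tmp; add src, 4; add dst, 4
  } while (--count != 0);
}
```

Comparing the output of the JIT-compiled function against such a reference is one simple way to sanity-check generated code.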
AVX and AVX-512
AVX and AVX-512 code generation must be explicitly enabled via FuncFrame to work properly. If it is not set up correctly, the prolog and epilog will use SSE instead of AVX instructions to save and restore SIMD registers. In addition, Compiler requires AVX-512 to be explicitly enabled via FuncFrame in order to use all 32 SIMD registers.
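#include <asmjit/x86.h>
#include <stdio.h>
using namespace asmjit;
// Signature of the generated function.
typedef void (*Func)(void*);
int main() {
  JitRuntime rt;                          // Runtime specialized for JIT code execution.
  CodeHolder code;                        // Holds code and relocation information.
  code.init(rt.environment(), rt.cpuFeatures());
  x86::Compiler cc(&code);                // Create and attach x86::Compiler to code.
  FuncNode* funcNode = cc.addFunc(FuncSignatureT<void, void*>());
  funcNode->frame().setAvxEnabled();      // Enable AVX in the function frame.
  funcNode->frame().setAvx512Enabled();   // Enable AVX-512 (all 32 SIMD registers).
  x86::Gp addr = cc.newIntPtr("addr");
  x86::Zmm vreg = cc.newZmm("vreg");
  funcNode->setArg(0, addr);
  cc.vmovdqu32(vreg, x86::ptr(addr));     // Load 64 bytes (an unaligned load is assumed here).
  cc.vpaddq(vreg, vreg, vreg);            // Double each 64-bit lane.
  cc.vmovdqu32(x86::ptr(addr), vreg);     // Store the result back (assumed counterpart).
  cc.endFunc();
  cc.finalize();
  Func fn;
  Error err = rt.add(&fn, &code);         // Add the generated code to the runtime.
  if (err) return 1;                      // Handle a possible error returned by AsmJit.
  uint64_t data[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  fn(data);                               // Requires a CPU with AVX-512 support.
  printf("%llu\n", (unsigned long long)data[0]);
  rt.release(fn);
  return 0;
}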
Recursive Functions
It's possible to create multiple functions with the same x86::Compiler instance and to link them together. In such a case it's important to keep the pointer to FuncNode.
The example below creates a simple Fibonacci function that calls itself recursively:
#include <asmjit/x86.h>
#include <stdio.h>
typedef uint32_t (*Fibonacci)(uint32_t x);
int main() {
Label L_Exit = cc.newLabel();
cc.jb(L_Exit);
cc.invoke(&invokeNode,
cc.bind(L_Exit);
cc.endFunc();
cc.finalize();
Fibonacci fib;
if (err) return 1;
printf("Fib(%u) -> %u\n", 8, fib(8));
return 0;
}
Stack Management
The function's stack frame is managed automatically and is used by the register allocator to spill virtual registers. The compiler also provides an interface to allocate a user-defined block of stack, which the generated function can use as temporary storage. In the following example a 256-byte block of stack is allocated, filled with byte values from 0 to 255, and then iterated again to sum all the values.
#include <asmjit/x86.h>
#include <stdio.h>
typedef int (*Func)(void);
int main() {
stackIdx.setIndex(i);
stackIdx.setSize(1);
cc.lea(p, stack);
cc.xor_(i, i);
Label L1 = cc.newLabel();
Label L2 = cc.newLabel();
cc.bind(L1);
cc.mov(stackIdx, i.r8());
cc.inc(i);
cc.cmp(i, 256);
cc.jb(L1);
cc.xor_(i, i);
cc.xor_(sum, sum);
cc.bind(L2);
cc.movzx(val, stackIdx);
cc.add(sum, val);
cc.inc(i);
cc.cmp(i, 256);
cc.jb(L2);
cc.ret(sum);
cc.endFunc();
cc.finalize();
if (err) return 1;
printf("Func() -> %d\n", func());
return 0;
}
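As a cross-check, the computation described above can be expressed in plain C++. This is a reference sketch rather than AsmJit code: the name stack_sum_reference is hypothetical, and a local array stands in for the stack block allocated by the compiler.

```cpp
#include <cstdint>

// Hypothetical plain C++ counterpart of the generated function (not AsmJit code).
int stack_sum_reference() {
  uint8_t stack[256];  // Stands in for the 256-byte stack block.

  // First loop: store the low byte of the index at each position.
  for (unsigned i = 0; i < 256; i++)
    stack[i] = static_cast<uint8_t>(i);

  // Second loop: zero-extend each byte and accumulate it.
  int sum = 0;
  for (unsigned i = 0; i < 256; i++)
    sum += stack[i];

  return sum;  // 0 + 1 + ... + 255 = 32640
}
```

The sum of the integers from 0 to 255 is 255 * 256 / 2 = 32640, which is the value the two loops described above should produce.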
Constant Pool
Compiler provides two constant pools for general purpose code generation:
- Local constant pool - Part of FuncNode, can only be used by a single function and is added after the function epilog sequence (after the ret instruction).
- Global constant pool - Part of BaseCompiler, flushed at the end of the generated code by BaseEmitter::finalize().
The example below illustrates how a built-in constant pool can be used:
#include <asmjit/x86.h>
cc.mov(v0, c0);
cc.mov(v1, c1);
cc.add(v0, v1);
}
Jump Tables
x86::Compiler supports the jmp instruction with a reg/mem operand, which is a commonly used pattern to implement indirect jumps within a function, for example to implement a switch() statement in a programming language. By default AsmJit assumes that every basic block can be a possible jump target, as it's unable to deduce targets from the instruction's operands. This is a very pessimistic default that should be avoided if possible, as it's costly and very unfriendly to liveness analysis and register allocation.
Instead of relying on such pessimistic default behavior, let's use JumpAnnotation to annotate a jump where all targets are known:
#include <asmjit/x86.h>
x86::Gp target = cc.newIntPtr("target");
x86::Gp offset = cc.newIntPtr("offset");
if (cc.is64Bit())
  cc.movsxd(target, x86::dword_ptr(offset, op.cloneAs(offset), 2));
else
  cc.mov(target, x86::dword_ptr(offset, op.cloneAs(offset), 2));
cc.add(target, offset);
cc.jmp(target, annotation);
cc.addss(a, b);
cc.subss(a, b);
cc.mulss(a, b);
cc.divss(a, b);
}
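The dispatch that such a jump table performs can be sketched in plain C++ with a table of function pointers indexed by the operation code. This is not AsmJit code. All names below are hypothetical, and the mapping of index 0..3 to add, sub, mul and div is an assumption taken from the order of the instructions above.

```cpp
#include <cstdint>

// Hypothetical handlers, one per jump target (names are illustrative only).
static float op_add(float a, float b) { return a + b; }
static float op_sub(float a, float b) { return a - b; }
static float op_mul(float a, float b) { return a * b; }
static float op_div(float a, float b) { return a / b; }

// Table of targets indexed by `op`; it plays the role of the embedded jump table.
using BinaryOp = float (*)(float, float);
static const BinaryOp kTargets[] = { op_add, op_sub, op_mul, op_div };

// Dispatches through the table. The caller must keep `op` in the range [0, 3],
// just as an annotated indirect jump may only reach the targets it lists.
float calc_reference(float a, float b, uint32_t op) {
  return kTargets[op](a, b);
}
```

Because every possible target is listed in one place, the set of destinations is known up front, which is the same information a jump annotation gives to the register allocator.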