AsmJit development is currently on hold due to insufficient funding. For more information, please see the Funding Page.

Roadmap

The roadmap provides an overview of planned features for future releases. These items are not guaranteed to be in active development, but represent ideas and goals that would make AsmJit faster and more feature complete.

Please also consult AsmJit's issues page, which contains reported bugs and feature requests possibly not mentioned on this page.

ISA Support

Feature Status Description
X86 & X86_64 Supported Baseline, AVX, AVX-512, and AMX extensions supported. Intel APX is not supported.
AArch32 Not Ready A separate branch provides the ARM32 ISA (no Thumb at the moment), but nobody is working on it and it's not mergeable in its current state (no Compiler and UJIT support).
AArch64 Supported Baseline and NEON extensions are supported. New extensions (SVE/SME/...) are not supported.
RISC-V Pending Not supported.
LoongArch 64 Pending Not supported.

X86 Extensions Support

Extension Status Description
Baseline + Extensions Done Baseline ISA and Extensions, including ADX, BMI, BMI2, CMPXCHG, LZCNT, POPCNT, ...
MMX + SSE[1 to 4.2] Done MMX, 3DNOW, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4A, PCLMULQDQ.
AVX + AVX2 Done AVX, AVX2, F16C, FMA, FMA4, XOP, AVX_IFMA, AVX_NE_CONVERT, AVX_VNNI_xxx.
AVX-512 Done AVX512_[BW|CD|DQ|F|BITALG|FP16|IFMA|VBMI|VBMI2|VNNI|VP2INTERSECT|VPOPCNTDQ].
AVX10.1 Pending Not supported, but AVX10.1 doesn't bring any new instructions (it's re-branded AVX-512).
AVX10.2 Pending Not supported.
AMX Done Baseline AMX instructions are supported (some later extensions might not be supported).
APX Pending Not supported.
Other Extensions Done Other extensions including virtualization, special instructions, and system instructions, are supported.

AArch64 Extensions Support

Extension Status Description
ARMv8.1+ ISA Done Baseline, BTI, CHK, CLRBHB, CRC, CSSC, DGH, FLAGM, FLAGM2, HBC, MTE, LSE
ASIMD Done ASIMD, AES, SHA2, SHA256, SHA3, SHA512, RDMA, FCMA, JSCVT, FHM, SM3, SM4, DOTPROD, BF16, I8MM.
SVE Pending SVE, SVE2, ...
SME Pending SME, SME2, ...
GP Extensions Pending Recently introduced extensions such as LSE128 are not supported.
Miscellaneous Pending Some instructions, such as ADRP, are not fully supported. Some immediate operators, such as |lo12|, are not supported either.

Architecture Independent Features & Ideas

Feature Status Description
Generated Assemblers Pending Generate the assembler's emit functions from the instruction database to reduce the time required to maintain AsmJit and to add new instructions. At the moment most instructions are added manually, which doesn't scale, especially as AsmJit gets more ISA backends. Instead of doing everything by hand, we should have a generator that produces everything the Assembler needs. A generator can also produce faster code than what we have today. The AArch32 backend, which currently lives in a separate branch, already provides such a generator; what we need is to use the same approach for generating the X86 and AArch64 assemblers. Generators can also be used to generate instruction queries (RW info, extensions, etc.).
Table-Driven Assemblers Pending Assembler performance can be improved by using a table-driven approach and SIMD. For example, we can use permutation instructions (PSHUFB / TBL) to permute the data we are going to assemble, and SIMD comparisons to compare instruction operands with table signatures. Based on the index of the match and the permutations, we can assemble instructions much faster than with the current scalar approach. Using SIMD would reduce the number of branches and also the number of encodings each Assembler has to handle (at the moment even a minor difference in instruction signature means using a different encoding identifier). This feature would need a generated assembler, as doing this by hand would not be practical.
Always Validating Assemblers Pending Assemblers should always validate operands and internally match the instruction signature. At the moment instruction validation is a separate feature that, when turned on, slows down assembling significantly because the Assembler explicitly calls the validator. The validator is optimized for size rather than performance, so this combination is not ideal. What we want is validation built into the Assembler so that it doesn't have to be turned on or off. This feature would need a generated assembler, as the validation logic would be generated. It could be table-driven to improve performance. Implicit validation must not make the assembler slower in 99% of cases; a slowdown is acceptable only for complex cases.
RX^W Without Dual-Mapping Pending Implement a virtual memory runtime that uses RX^W pages without dual mapping. In general there are two ways to do this. The first is to allocate at page granularity for each code fragment, which would be costly when AsmJit is used to generate small pieces of code. The second is much more complex but more memory efficient: each time a new executable memory allocation happens, the allocator increments a generation counter and allocates a block into which older code that fits is moved. The user would only have to make sure to wipe out older generations that the code no longer uses. Initially this would be more of an experiment to validate whether it's even possible to design a good API that fits this approach.
Homogeneous Structs in Functions Pending Functions currently don't support passing structs by value. Users usually don't need to pass arbitrary structs, but rather structs that contain, for example, 4 floats and are thus homogeneous. Supporting this restricted case would be feasible.
Canonical Representation of Instructions Pending Only allow a single form of instructions that have implicit operands and provide an API for making such instructions canonical. This would help projects like AsmTK/Zydis that do parsing or disassembling to always get the correct form regardless of whether the instruction was specified using explicit or implicit operands. It would also simplify the instruction query API, which would no longer have to deal with instructions that have both implicit and explicit forms, including DIV, IDIV, MUL, MULX, REP+MOVS, CMPXCHG, ...
Merge AsmTK with AsmJit Pending Merging AsmTK and AsmJit would mean that we would finally be able to have an AArch64 parser as well. What we need before merging is our own code for string-to-double conversion for the parser, and instruction canonicalization.
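
The homogeneous struct item in the table above can be illustrated with a small classifier. This is only a sketch under stated assumptions: it uses a rule modeled on AArch64 homogeneous floating-point aggregates (one to four members, all of the same floating-point type), and `Scalar`, `HomogeneousInfo`, `isFloat`, and `classifyHomogeneous` are hypothetical names that are not part of the AsmJit API.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical scalar type tags (NOT AsmJit's TypeId).
enum class Scalar { Int32, Int64, Float32, Float64 };

// Result of a successful classification: member type and member count.
struct HomogeneousInfo {
  Scalar type;
  std::size_t count;
};

inline bool isFloat(Scalar s) {
  return s == Scalar::Float32 || s == Scalar::Float64;
}

// Returns HomogeneousInfo when the struct has 1 to 4 members that all share
// the same floating-point type, and std::nullopt otherwise. The 1..4 limit is
// an assumption borrowed from the AArch64 calling convention.
std::optional<HomogeneousInfo> classifyHomogeneous(const std::vector<Scalar>& members) {
  if (members.empty() || members.size() > 4)
    return std::nullopt;

  Scalar first = members[0];
  if (!isFloat(first))
    return std::nullopt;

  for (Scalar m : members) {
    if (m != first)
      return std::nullopt;
  }
  return HomogeneousInfo{first, members.size()};
}
```

A struct of four floats would be described as four `Scalar::Float32` members and classified as homogeneous, so a function-call lowering could place each member in its own vector register. A struct that mixes `Float32` and `Float64` would be rejected and would need a different passing strategy.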

Register Allocation

Feature Status Description
Liveness Analysis Performance Improvements Done Improve the performance of liveness analysis by not considering virtual registers that live in only a single basic block. This optimization alone can save around 80% of memory when generating very large functions, because the per-basic-block bit sets used by liveness analysis shrink to cover only virtual registers that span multiple basic blocks.
Small-Code RA Performance Improvements Pending When running RA for small code (machine code under 32 KiB) the biggest overhead is CFG construction and running a local RA. CFG construction could be improved by using a different table to query RW information (basically a faster and inlineable approach). Local RA could be avoided completely if registers can be allocated by bin-packing alone: when we know that virtual ids can simply be replaced with physical ids, a fix-up pass is all that is needed.
Large-Code RA Performance Improvements Pending When running RA for large code (generating a single function that is hundreds of megabytes long), most of the register allocation time is spent in bin-packing. The problem is that bin-packing iterates over ALL virtual registers and packs them into bins (each bin represents a single physical register). When the function is long, each physical register accumulates many records, and every operation takes more and more time as more ranges are added to these bins. The solution is either to split the process into smaller packs (bin-packing N basic blocks at a time, but not more) or to optimize bin-packing of virtual registers that live in only a single basic block.
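
The liveness optimization listed in the table above can be sketched as follows. This is a simplified illustration, not AsmJit's implementation: `VRegUse`, `GlobalVRegMap`, `buildGlobalMap`, and `livenessWords` are hypothetical names, and the sketch assumes a register is block-local whenever all of its reads and writes fall into one basic block (corner cases such as single-block loops are ignored).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: virtual register `vreg` is read or written in basic block `block`.
struct VRegUse {
  uint32_t vreg;
  uint32_t block;
};

constexpr uint32_t kNoBlock    = 0xFFFFFFFFu; // Register not seen yet.
constexpr uint32_t kMultiBlock = 0xFFFFFFFEu; // Register seen in two or more blocks.
constexpr uint32_t kNoIndex    = 0xFFFFFFFFu; // Register is not tracked by liveness sets.

struct GlobalVRegMap {
  std::vector<uint32_t> globalIndex; // Per register: compact bit index or kNoIndex.
  uint32_t globalCount;              // Number of registers that need a liveness bit.
};

// Assigns a compact bit index only to registers referenced in more than one block.
// Block ids are assumed to be smaller than kMultiBlock.
GlobalVRegMap buildGlobalMap(uint32_t vregCount, const std::vector<VRegUse>& uses) {
  std::vector<uint32_t> home(vregCount, kNoBlock);
  for (const VRegUse& use : uses) {
    uint32_t& h = home[use.vreg];
    if (h == kNoBlock)
      h = use.block;          // First block this register appears in.
    else if (h != use.block)
      h = kMultiBlock;        // Appears in a second block: must be tracked.
  }

  GlobalVRegMap map{std::vector<uint32_t>(vregCount, kNoIndex), 0};
  for (uint32_t v = 0; v < vregCount; v++) {
    if (home[v] == kMultiBlock)
      map.globalIndex[v] = map.globalCount++;
  }
  return map;
}

// Number of 64-bit words needed to store ONE bit set per basic block.
std::size_t livenessWords(uint32_t blockCount, uint32_t trackedVRegs) {
  return std::size_t(blockCount) * ((std::size_t(trackedVRegs) + 63) / 64);
}
```

Comparing `livenessWords(blocks, vregCount)` with `livenessWords(blocks, map.globalCount)` gives a rough idea of the memory saved per bit set when most registers are block-local.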

Universal JIT (UJIT)

Feature Status Description
Introduce UJIT Done The UJIT (Universal JIT) project was ported from Blend2D and aims to reduce the number of lines required to write portable backends. It allows generating most of the code through a unified interface and opting in to platform-dependent code generation where it matters.
Leading/Trailing Memory Access Pending Add leading & trailing memory access to UJIT so users don't have to implement their own. Trailing memory access is used for loop epilogs
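
To illustrate what a trailing memory access in a loop epilog looks like, here is a scalar C++ model. It is a sketch only and does not show UJIT's API: `Vec4`, `loadTrailing`, and `sum` are hypothetical names, and a 4-lane vector of floats is assumed. The point is that the epilog loads only the remaining elements and zero-fills the other lanes, so it never reads past the end of the buffer.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;          // Assumed vector width (4 floats).
using Vec4 = std::array<float, kLanes>;

// Models a trailing load: reads `count` floats (clamped to kLanes) from `src`
// and zero-fills the remaining lanes. Memory at or beyond src + count is never touched.
Vec4 loadTrailing(const float* src, std::size_t count) {
  Vec4 v{};
  for (std::size_t i = 0; i < count && i < kLanes; i++)
    v[i] = src[i];
  return v;
}

// Sums `n` floats: the main loop handles full vectors, the epilog handles the tail.
float sum(const float* src, std::size_t n) {
  Vec4 acc{};
  std::size_t i = 0;

  // Main loop: full 4-lane iterations.
  for (; i + kLanes <= n; i += kLanes) {
    for (std::size_t lane = 0; lane < kLanes; lane++)
      acc[lane] += src[i + lane];
  }

  // Epilog: 0 to 3 remaining elements, loaded with a trailing access.
  Vec4 tail = loadTrailing(src + i, n - i);
  for (std::size_t lane = 0; lane < kLanes; lane++)
    acc[lane] += tail[lane];

  return ((acc[0] + acc[1]) + acc[2]) + acc[3];
}
```

In generated code the per-lane loops would be single SIMD instructions, and the trailing load would be whatever sequence the target offers (masked loads, or a combination of narrower loads), which is the part users currently have to write themselves.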