AsmJit is no longer maintained due to the absence of funding. To resume the development, please visit the Funding Page and act accordingly.

Roadmap

The roadmap provides an overview of planned features for future releases. These items are not guaranteed to be in active development, but represent ideas and goals that would make AsmJit faster and more feature complete.

Please also consult AsmJit's issues page, which contains reported bugs and feature requests possibly not mentioned on this page.

ISA Support

Feature	Status	Description
X86 & X86_64	Supported	Baseline, AVX, AVX-512, and AMX extensions supported. Intel APX is not supported.
AArch32	Not Ready	A separate branch provides the ARM32 ISA (no Thumb at the moment), but nobody is working on it and it is not mergeable in its current state (no Compiler or UJIT support).
AArch64	Supported	Baseline and NEON extensions are supported. New extensions (SVE/SME/...) are not supported.
RISC-V	Pending	Not supported.
LoongArch 64	Pending	Not supported.

X86 Extensions Support

Extension	Status	Description
Baseline + Extensions	Done	Baseline ISA and Extensions, including `ADX`, `BMI`, `BMI2`, `CMPXCHG`, `LZCNT`, `POPCNT`, ...
MMX + SSE[1 to 4.2]	Done	`MMX`, `3DNOW`, `SSE`, `SSE2`, `SSE3`, `SSSE3`, `SSE4.1`, `SSE4.2`, `SSE4A`, `PCLMULQDQ`.
AVX + AVX2	Done	`AVX`, `AVX2`, `F16C`, `FMA`, `FMA4`, `XOP`, `AVX_IFMA`, `AVX_NE_CONVERT`, `AVX_VNNI_xxx`.
AVX-512	Done	`AVX512_[BW\|CD\|DQ\|F\|BITALG\|FP16\|IFMA\|VBMI\|VBMI2\|VNNI\|VP2INTERSECT\|VPOPCNTDQ]`.
AVX10.1	Pending	Not supported, but `AVX10.1` doesn't bring any new instructions (it's re-branded `AVX-512`).
AVX10.2	Pending	Not supported.
AMX	Done	Baseline AMX instructions are supported (some later extensions might not be supported).
APX	Pending	Not supported.
Other Extensions	Done	Other extensions including virtualization, special instructions, and system instructions, are supported.

AArch64 Extensions Support

Extension	Status	Description
ARMv8.1+ ISA	Done	Baseline, `BTI`, `CHK`, `CLRBHB`, `CRC`, `CSSC`, `DGH`, `FLAGM`, `FLAGM2`, `HBC`, `MTE`, `LSE`
ASIMD	Done	`ASIMD`, `AES`, `SHA2`, `SHA256`, `SHA3`, `SHA512`, `RDMA`, `FCMA`, `JSCVT`, `FHM`, `SM3`, `SM4`, `DOTPROD`, `BF16`, `I8MM`.
SVE	Pending	`SVE`, `SVE2`, ...
SME	Pending	`SME`, `SME2`, ...
GP Extensions	Pending	Recently introduced extensions such as `LSE128` are not supported.
Miscellaneous	Pending	Some instructions, like `ADRP`, are not fully supported. Some immediate operators, like `\|lo12\|`, are not supported either.

Architecture Independent Features & Ideas

Feature	Status	Description
Generated Assemblers	Pending	Generate the assembler's emit functions from the instruction database to reduce the time required to maintain AsmJit and to add new instructions. At the moment most instructions are added manually, which doesn't scale, especially as AsmJit gets more ISA backends. Instead of doing everything by hand, we should have a generator that produces everything the Assembler needs. Generators can generate faster code than what we have today. The AArch32 backend, which currently lives in a separate branch, already provides such a generator; what we need is to use the same approach for generating the X86 and AArch64 assemblers. Generators can also be used to generate instruction queries (RW info, extensions, etc.).
Table-Driven Assemblers	Pending	Assembler performance can be improved by using a table-driven approach and SIMD. For example, we can use permutation instructions (`PSHUFB` / `TBL`) to permute the data we are going to assemble. We can use SIMD comparisons to compare instruction operands with table signatures, and based on the index of the match and permutations we can assemble instructions much faster than with the current scalar approach. Using SIMD would definitely reduce the number of branches and also the number of encodings each Assembler has to handle (because even a minor difference in instruction signature means using a different encoding identifier). This feature would need a generated assembler, as doing this by hand would not be practical.
Always Validating Assemblers	Pending	Assemblers should always validate operands and internally match the instruction signature. At the moment instruction validation is a separate feature that, when turned on, slows down assembling significantly because the Assembler explicitly calls the validator. The validator is optimized for size, not for performance, so this combination is not ideal. What we want is validation built into the Assembler so it doesn't have to be turned on or off. This feature would need a generated assembler, as the validation logic would be generated. It could be table-driven to improve performance. Implicit validation must not make the assembler slower in 99% of cases; only complex cases are permitted to be slower.
RX^W Without Dual-Mapping	Pending	Implement a virtual memory runtime that uses RX^W pages without dual mapping. In general there are two options. The first is to allocate at page granularity for each code fragment, which would be costly in cases where AsmJit is used to generate small code. The second is much more complex, but would be more memory efficient: each time a new executable memory allocation happens, the memory allocator would increment a generation counter and allocate a block into which older code that fits would be moved. The user would only have to make sure to wipe out older generations the code is not using. Initially this would be more of an experiment to validate whether it's even possible to design a good API that fits this approach.
Homogeneous Structs in Functions	Pending	Functions currently don't support passing structs by value. Usually people don't want to pass arbitrary structs, but structs that contain, for example, 4 floats, and are thus homogeneous. This feature would be possible to implement.
Canonical Representation of Instructions	Pending	Only allow a single form of instructions that have implicit operands and provide an API for making such instructions canonical. This would let projects like AsmTK/Zydis that do parsing or disassembling always get the correct form regardless of whether the instruction was specified with explicit or implicit operands. It would also simplify the instruction query API, which would no longer have to deal with instructions that have both implicit and explicit forms, including `DIV`, `IDIV`, `MUL`, `MULX`, `REP+MOVS`, `CMPXCHG`, ...
Merge AsmTK with AsmJit	Pending	Merging AsmTK and AsmJit would mean that we would finally be able to have an AArch64 parser as well. What we need before merging is our own code for string-to-double conversion for the parser, and instruction canonicalization.
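
The table-driven and always-validating ideas above can be illustrated with a minimal, scalar C++ sketch. It is not AsmJit code: `OpKind`, `packSignature`, `EncodingEntry`, `kAddTable`, and `matchAdd` are hypothetical names invented for this example, and the encoding identifiers are arbitrary. The sketch packs operand kinds into one integer signature and looks it up in a table, so an operand combination that has no table entry is rejected as a side effect of the same lookup that selects the encoding. A SIMD variant, as described in the table, would compare the signature against several table entries at once instead of looping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical operand kinds (not AsmJit's real operand types).
enum class OpKind : std::uint8_t { None = 0, Reg = 1, Mem = 2, Imm = 3 };

// Packs up to three operand kinds into one 32-bit signature, 8 bits each.
constexpr std::uint32_t packSignature(OpKind a,
                                      OpKind b = OpKind::None,
                                      OpKind c = OpKind::None) {
  return std::uint32_t(a) | (std::uint32_t(b) << 8) | (std::uint32_t(c) << 16);
}

struct EncodingEntry {
  std::uint32_t signature;   // packed operand kinds accepted by this form
  std::uint32_t encodingId;  // arbitrary identifier of the encoding to use
};

constexpr std::uint32_t kInvalidEncoding = 0;

// Hypothetical table for an ADD-like instruction.
constexpr EncodingEntry kAddTable[] = {
  { packSignature(OpKind::Reg, OpKind::Reg), 1 },
  { packSignature(OpKind::Reg, OpKind::Mem), 2 },
  { packSignature(OpKind::Mem, OpKind::Reg), 3 },
  { packSignature(OpKind::Reg, OpKind::Imm), 4 },
  { packSignature(OpKind::Mem, OpKind::Imm), 5 },
};

// Returns the encoding id of the first matching signature, or
// kInvalidEncoding when no form accepts the operands (validation failure).
inline std::uint32_t matchEncoding(const EncodingEntry* table,
                                   std::size_t count,
                                   std::uint32_t signature) {
  assert(table != nullptr || count == 0);
  for (std::size_t i = 0; i < count; i++) {
    if (table[i].signature == signature) {
      return table[i].encodingId;
    }
  }
  return kInvalidEncoding;
}

inline std::uint32_t matchAdd(OpKind a, OpKind b) {
  return matchEncoding(kAddTable, sizeof(kAddTable) / sizeof(kAddTable[0]),
                       packSignature(a, b));
}
```

With this table, `matchAdd(OpKind::Reg, OpKind::Imm)` selects encoding 4, while `matchAdd(OpKind::Mem, OpKind::Mem)` returns `kInvalidEncoding` without a separate validator pass.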

Register Allocation

Feature	Status	Description
Liveness Analysis Performance Improvements	Done	Improve the performance of liveness analysis by not considering virtual registers that only live in a single basic block. This optimization alone could save around 80% of memory when generating very large functions, because the bits used by liveness analysis for each basic block are reduced to cover only virtual registers spanning multiple basic blocks.
Small-Code RA Performance Improvements	Pending	When running RA for small code (machine code under 32 KiB) the biggest overhead is CFG construction and running a local RA. CFG construction could be improved by using a different table to query RW information (basically a faster and inlineable approach). Local RA could be avoided completely if registers can be allocated by bin-packing alone and the code just fixed up, in cases where we know that virtual ids can simply be replaced with physical ids.
Large-Code RA Performance Improvements	Pending	When running RA for large code (generating a single function that is hundreds of megabytes long) most of the time spent during register allocation is bin-packing. The problem is that bin-packing iterates over all virtual registers and packs them into bins (each bin represents a single physical register). When the function is long there will be a lot of records for each physical register, and every operation takes more and more time as more ranges are added to these bins. The solution is to either split the process into smaller packs (like bin-packing N basic blocks at a time, but not more) or to optimize bin-packing of virtual registers that only live in a single basic block.
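
To make the bin-packing cost described above more concrete, here is a simplified first-fit sketch in C++. It is not AsmJit's register allocator: `LiveRange`, `binPack`, and `kSpilled` are hypothetical names, each virtual register is assumed to have a single half-open live range, and a register that fits in no bin is simply marked as spilled. Each bin stands for one physical register, and the inner scan over a bin's existing ranges is the part that grows as more ranges are added.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open live range [begin, end) of one virtual register (hypothetical).
struct LiveRange {
  std::uint32_t begin;
  std::uint32_t end;
};

constexpr int kSpilled = -1;

inline bool overlaps(const LiveRange& a, const LiveRange& b) {
  return a.begin < b.end && b.begin < a.end;
}

// Assigns each virtual register to the first bin (physical register) whose
// already packed ranges do not overlap it; returns kSpilled if none fits.
inline std::vector<int> binPack(const std::vector<LiveRange>& vregs,
                                std::size_t physRegCount) {
  std::vector<std::vector<LiveRange>> bins(physRegCount);
  std::vector<int> assignment(vregs.size(), kSpilled);

  for (std::size_t v = 0; v < vregs.size(); v++) {
    assert(vregs[v].begin < vregs[v].end);
    for (std::size_t bin = 0; bin < bins.size(); bin++) {
      bool fits = true;
      // This scan gets slower as the bin accumulates more ranges.
      for (const LiveRange& r : bins[bin]) {
        if (overlaps(r, vregs[v])) {
          fits = false;
          break;
        }
      }
      if (fits) {
        bins[bin].push_back(vregs[v]);
        assignment[v] = int(bin);
        break;
      }
    }
  }
  return assignment;
}
```

For the ranges `[0,10)`, `[5,15)`, `[10,20)`, `[12,18)` and two physical registers, this sketch assigns bins 0, 1, 0 and marks the last register as spilled.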

Universal JIT (UJIT)

Feature	Status	Description
Introduce UJIT	Done	The UJIT (Universal JIT) project was ported from Blend2D and aims to reduce the number of lines required to write portable backends. It allows generating most of the code through a unified interface and opting in to platform-dependent code generation where it matters.
Leading/Trailing Memory Access	Pending	Add leading and trailing memory access to UJIT so users don't have to implement their own. Trailing memory access is used for loop epilogs.
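
As a rough illustration of what a trailing memory access in a loop epilog means, the following plain C++ sketch processes an array four elements at a time and then handles the remaining elements with a partial load. It does not use UJIT or any AsmJit API: `Vec`, `kLanes`, `loadPartial`, and `sumFloats` are hypothetical names, and the 4-lane array only stands in for a SIMD register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kLanes = 4;
using Vec = std::array<float, kLanes>;  // stand-in for a 4-lane SIMD register

// Loads `count` floats (count <= kLanes) and zero-fills the remaining lanes,
// so the access never reads past the end of the source buffer.
inline Vec loadPartial(const float* src, std::size_t count) {
  assert(count <= kLanes);
  Vec v{};  // all lanes start at zero
  std::memcpy(v.data(), src, count * sizeof(float));
  return v;
}

inline float sumFloats(const float* src, std::size_t n) {
  Vec acc{};
  std::size_t i = 0;

  // Main loop: full-width accesses only.
  for (; i + kLanes <= n; i += kLanes) {
    Vec v = loadPartial(src + i, kLanes);
    for (std::size_t lane = 0; lane < kLanes; lane++) acc[lane] += v[lane];
  }

  // Epilog: one trailing access covers the last n % kLanes elements.
  if (i < n) {
    Vec v = loadPartial(src + i, n - i);
    for (std::size_t lane = 0; lane < kLanes; lane++) acc[lane] += v[lane];
  }

  return acc[0] + acc[1] + acc[2] + acc[3];
}
```

A leading access would be the mirror image: a partial access before the main loop, for example to reach an aligned address.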