The recent updates for JIT encompass bug fixes and performance enhancements, addressing issues such as incorrect backpatching of unaligned atomics, improper instruction handling, and optimizing the performance of several instructions. The updates encompass a resolution for float to integer overflow behavior, an adjustment for ModRM decoding of 3DNow! instructions, and a correction for H0F3A table decoding.
FEX-2501
Welcome back to another new FEX-Emu release in the new year! While everyone was out celebrating the holidays, we still managed to get some work done. So let's get into what we did this last month!
Official WINE WoW64 and Arm64ec package support
This month we updated our Ubuntu PPA repository to support a fex-emu-wine package. This package provides wow64 and arm64ec emulator DLL files that can be applied directly to an AArch64 build of WINE, allowing you to do x86 and x86-64 emulation inside of WINE directly and removing a ton of CPU overhead in the emulation! This is relatively fresh, so there will be some teething issues around getting it set up; for example, current upstream WINE may not integrate directly with these builds yet. Check out our wiki for more information about getting this hooked up.
Partial support for inline self-modifying code and trap flag
As we work towards supporting more edge-case behaviour of anti-tamper and anti-debugger software, we have spent some time this month implementing support for inline self-modifying code and the trap flag. In particular, Denuvo uses inline self-modifying code, which is relatively annoying to support, but we can use the fact that it tends to generate invalid instructions to determine early that a block of code is invalid, thus letting it work. There's more work to do to make this more robust, but this gets a decent number of games running.
The trap flag, on the other hand, is interesting because it is an anti-debugger tactic that some badly behaving launchers use. Debuggers treat the trap flag differently from how it works when no debugger is running, which lets the application detect the debugger and throw an error. FEX didn't quite handle this correctly, which was causing these launchers to throw their hands up and stop running.
One note: some of this work is only wired up on the WINE side rather than the FEX-Emu Linux emulation side, so mileage may vary!
JIT bug fixes and performance improvements
As usual, a lot of fixes landed for our JIT, ranging from incorrect backpatching of unaligned atomics, to incorrect instruction handling, to improving performance of a couple of instructions. Let's break down what we fixed this month.
Fixed backpatching of unaligned atomics with small immediates
ARM's FEAT_LRCPC2 extension added TSO instructions for small immediate offsets in the range of -256 to 255. These still have the regular ARM atomic limitation that the address needs to be naturally aligned for the access type (or the access must fall within a 16-byte granule!). FEX needs to emulate unaligned memory accesses from x86 by backpatching these instructions to be a DMB plus a load or store. We were incorrectly patching these instructions when they used the small offsets. This will improve stability of emulation on hardware that supports the new FEAT_LRCPC2 instructions.
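As a rough illustration of the alignment rule described above, here is a minimal Python model. It is not FEX's code: the function names are hypothetical, and whether the 16-byte granule tolerance applies depends on the hardware, so treat that part as an assumption.

```python
def fits_lrcpc2_offset(offset: int) -> bool:
    """True if the offset fits the small immediate range of -256 to 255."""
    return -256 <= offset <= 255


def needs_backpatch(address: int, size: int) -> bool:
    """True if an atomic access of `size` bytes at `address` is neither
    naturally aligned nor contained in a single 16-byte granule.

    Assumption: hardware tolerates unaligned atomics inside one granule.
    """
    naturally_aligned = address % size == 0
    within_granule = (address % 16) + size <= 16
    return not (naturally_aligned or within_granule)


# An 8-byte access at offset 12 within a granule spills into the next one,
# so in this model it would have to become a DMB plus a plain load or store.
print(needs_backpatch(0x100C, 8))
print(needs_backpatch(0x1004, 8))
```

The second call shows the case the granule rule rescues: the access is not naturally aligned, but it does not cross a 16-byte boundary.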
Fix float to integer overflow behaviour
This is a very important change to how FEX handles converting a float value to an integer when an overflow occurs. While we knew of the problem, we didn't realize how wide-reaching it was. In particular, this fixes The Talos Principle's audio cutting out, Animal Well's music having chirping artifacts, SOMA not allowing interactions with things in the world, Satisfactory's server crashing, and Metaphor Refantazio infinite looping before getting in-game!
There are sure to be a bunch of other little problems that this also fixes, because games pervasively rely on this behaviour!
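To make the kind of mismatch concrete, the sketch below models the architectural difference between a truncating x86 conversion such as CVTTSS2SI (which returns the "integer indefinite" value 0x80000000 on overflow or NaN) and an ARM FCVTZS (which saturates, and returns 0 for NaN). That this is exactly the difference the fix addresses is an assumption; the function names are made up for illustration and only the 32-bit case is modelled.

```python
import math

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def x86_convert_truncate(value: float) -> int:
    """Model of x86 behaviour: out-of-range or NaN gives INT32_MIN."""
    if math.isnan(value) or math.isinf(value):
        return INT32_MIN
    truncated = math.trunc(value)
    if truncated < INT32_MIN or truncated > INT32_MAX:
        return INT32_MIN
    return truncated


def arm_convert_truncate(value: float) -> int:
    """Model of ARM behaviour: saturate to the range, NaN gives 0."""
    if math.isnan(value):
        return 0
    if math.isinf(value):
        return INT32_MAX if value > 0 else INT32_MIN
    return max(INT32_MIN, min(INT32_MAX, math.trunc(value)))


for sample in (1.5, 3e9, float("nan")):
    print(sample, x86_convert_truncate(sample), arm_convert_truncate(sample))
```

In-range values agree in both models; only overflow and NaN inputs diverge, which is why an emulator has to patch up the result after the native conversion.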
Fix ModRM decoding of 3DNow! instructions
While 3DNow! isn't used in any recent games, to the point that AMD has removed the instruction set from Zen CPU cores, older games still use this extension if possible. Turns out we had a gap in our testing infrastructure for when a 3DNow! instruction used the SIB encoding form. This would result in crashes and misinterpreted instructions. This will fix some older 32-bit games using 3DNow!, and of course we added new unit tests to our testing infrastructure to make sure it keeps working.
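Part of what makes 3DNow! awkward to decode is that the real opcode is a suffix byte that comes after the ModRM byte, the optional SIB byte, and any displacement. The following sketch, which is not FEX's decoder and uses hypothetical helper names, walks a 32-bit addressing form to find that suffix.

```python
def modrm_fields(modrm: int) -> tuple:
    """Split a ModRM byte into (mod, reg, rm)."""
    return modrm >> 6, (modrm >> 3) & 7, modrm & 7


def operand_length_32bit(code: bytes, modrm_index: int) -> int:
    """Bytes used by ModRM, SIB and displacement with 32-bit addressing."""
    mod, _reg, rm = modrm_fields(code[modrm_index])
    length = 1
    if mod == 3:
        return length  # register form: no SIB, no displacement
    has_sib = rm == 4
    base = None
    if has_sib:
        length += 1
        base = code[modrm_index + 1] & 7
    if mod == 1:
        length += 1  # disp8
    elif mod == 2:
        length += 4  # disp32
    elif rm == 5 or (has_sib and base == 5):
        length += 4  # mod == 0 special cases that carry a disp32
    return length


def decode_3dnow_suffix(code: bytes) -> int:
    """Return the trailing opcode byte of a 0F 0F encoded instruction."""
    if code[0:2] != b"\x0f\x0f":
        raise ValueError("not a 3DNow! instruction")
    return code[2 + operand_length_32bit(code, 2)]


# PFADD mm0, [eax + ecx*4] uses a SIB byte; 0x9E is the PFADD suffix.
print(hex(decode_3dnow_suffix(bytes([0x0F, 0x0F, 0x04, 0x88, 0x9E]))))
```

If the SIB byte is not accounted for, the decoder reads the wrong byte as the opcode suffix, which matches the crashes and misinterpreted instructions described above.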
Fixes H0F3A table decoding
This fix doesn't affect any known applications, but because of how x86 compilers aggressively pad instruction sizes, this could crop up anywhere without us noticing. When the H0F3A instruction table gets decoded, FEX was incorrectly applying the REX_W prefix to instructions that ignore the prefix. Out of all the instructions in the table, only three actually care about the prefix while the others always ignore it. If a compiler padded one of the other instructions with this prefix, FEX would think it was an unknown instruction and crash. This has now been resolved, which should keep us from ever hitting the issue.
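A small sketch of the idea, under stated assumptions: the tables below are illustrative and incomplete (they list only two REX.W-sensitive opcodes that are well known, PEXTRD/PEXTRQ and PINSRD/PINSRQ), and the lookup function is hypothetical rather than FEX's actual table code.

```python
# Opcodes in the 0F 3A map whose meaning changes with REX.W (illustrative).
REX_W_SENSITIVE = {
    0x16: ("PEXTRD", "PEXTRQ"),
    0x22: ("PINSRD", "PINSRQ"),
}

# Opcodes that decode the same way regardless of REX.W (illustrative).
REX_W_IGNORED = {
    0x0A: "ROUNDSS",
    0x0F: "PALIGNR",
}


def rex_w_set(rex_byte: int) -> bool:
    """True if the byte is a REX prefix (0x40-0x4F) with the W bit set."""
    return 0x40 <= rex_byte <= 0x4F and bool(rex_byte & 0x08)


def lookup_0f3a(opcode: int, rex_w: bool):
    """Resolve an opcode, consulting REX.W only where it matters."""
    if opcode in REX_W_SENSITIVE:
        return REX_W_SENSITIVE[opcode][1 if rex_w else 0]
    return REX_W_IGNORED.get(opcode)


# A padded PALIGNR with REX.W still decodes as PALIGNR.
print(lookup_0f3a(0x0F, rex_w_set(0x48)))
```

The design point is that the prefix is only part of the lookup key for the few opcodes that define a distinct REX.W form; for everything else it is dropped before the table is consulted.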
Generate 80-bit SVE loadstores when necessary
For all the users that have SVE-supporting hardware (there aren't a lot of you!), we have added a new optimization that converts two loads or stores into a single 80-bit masked loadstore instruction. While this isn't going to be a huge improvement because this only occurs with x87 code, it's another little optimization in the list of things that SVE improves for x86 emulation.
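To picture why a masked access helps, here is a toy model in Python. It assumes byte-granular predication over a 16-byte vector, which is an assumption for illustration and not a statement about the lane size FEX actually picks; the function names are hypothetical.

```python
X87_BYTES = 10      # an 80-bit x87 value occupies 10 bytes
VECTOR_BYTES = 16   # modelled vector width


def split_load(memory: bytes, address: int) -> bytes:
    """Two accesses: a 64-bit load followed by a 16-bit load."""
    low = memory[address:address + 8]
    high = memory[address + 8:address + 10]
    return low + high


def masked_load(memory: bytes, address: int, predicate: list) -> bytes:
    """One access: only active lanes touch memory, inactive lanes read 0."""
    out = bytearray(len(predicate))
    for lane, active in enumerate(predicate):
        if active:
            out[lane] = memory[address + lane]
    return bytes(out)


# Predicate with the first 10 byte lanes active.
predicate = [lane < X87_BYTES for lane in range(VECTOR_BYTES)]

memory = bytes(range(1, 11))  # exactly 10 bytes, nothing readable after
print(masked_load(memory, 0, predicate)[:X87_BYTES] == split_load(memory, 0))
```

Because inactive lanes never touch memory in this model, the single masked access is safe even when the 10-byte value sits at the very end of a mapping.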
Increase minimum kernel requirement from 5.0 to 5.15
We're moving into the future with some changes that require increasing our minimum kernel version. Because we were allowing such an old version of the Linux kernel, we were hitting some heartburn in some codepaths. In order to make this easier, we are moving the minimum kernel requirement up to an LTS kernel that was released back in 2021 already! We don't expect this to cause too many problems, since this is a kernel supported by Ubuntu for 22.04.
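If you want to check whether a machine meets the new requirement, a minimal sketch could look like the following. The helper names are hypothetical, and this is not how FEX itself performs the check; it simply compares the major and minor numbers of a kernel release string against 5.15.

```python
import re

MINIMUM_KERNEL = (5, 15)


def parse_kernel_release(release: str) -> tuple:
    """Extract (major, minor) from a release string like '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_supported(release: str) -> bool:
    """True if the release is at least the 5.15 minimum."""
    return parse_kernel_release(release) >= MINIMUM_KERNEL


print(kernel_is_supported("5.15.0-91-generic"))
print(kernel_is_supported("5.4.0-42-generic"))
```

On a running Linux system the release string can be obtained with `platform.release()` from the Python standard library, or with `uname -r` in a shell.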
Drop official support for ArchLinux
Due to a clarification from the ArchLinux team this last month, they are no longer allowing packages in the AUR that don't support x86-64. Due to this change and that FEX only supports running on an AArch64 host, they have removed our official packages from AUR. There's nothing that we can do about this besides dropping support for ArchLinux.
Raw Changes
FEX Release FEX-2501
ArchHelpers
Arm64
Fixes LDAPUR and STLUR backpatching ( 1e827ec)
ConstProp
fix 32-bit masking behaviour ( c902b88)
Context
Constify GPRs passed to ReconstructCompactedEFLAGS ( a86c922)
External
Update bundled libfmt ( 7e257cc)
FEXCore
Emulate EFLAGS.TF ( e88c92d)
Override x87 precision control when necessary ( 8111b7c)
Don't WaitForEmptyJobQueue if CodeObjectCacheService isn't used ( 5a4691f)
FEXLoader
Increase minimum kernel requirement from 5.0 to 5.15 ( 6bc7a83)
Enable early logs output to stderr ( e32c538)
Frontend
Fix ModRM handling with 3DNow! ( 15a1a0f)
GdbServer
Fixes encoding of hex ( 735a4f9)
Support 32-bit context definitions ( 072cf4c)
Implement support for $vKill ( 46fb858)
IR
Change convention from number of elements to elementsize ( a6c67ca)
Passes
Adds missing comment that clang-format keeps complaining about locally ( b03b02d)
InstCountCI
Adds more LRCPC2 tests that are missed ( cd6722f)
Implement support for TSO and LRCPC and add hot block that could be optimized ( 9fb69ed)
InstructionCountCI
add some hot blocks from Factorio ( e44d1f1)
Linux
Fixes typo in removing RESOLVE_IN_ROOT flag ( e55b5d0)
FaultSafeUserMemAccess
Break out fault safe handler ( 57178ab)
LinuxEmulation
Don't use clone3 for fork ( 71187d3)
LinuxSyscalls
Log unhandled clone3 fork flags ( c3261b4)
Ensure CSIGNAL is merged back in to flags for clone2 ( c7fb95a)
Fixes exit syscall ( bdae4f6)
OpcodeDispatcher
Fixes FEX's H0F3A table handling of REX.W ( 90b1ac4)
Minor division improvement ( 04e785e)
ThreadManager
Add some sanity asserts ( d8ef702)
Threads
Fix memory leak in joinable() ( f906c6a)
Thunks
gen
Add support for compiling against clang 19 ( 7b2fc37)
Utils
FileLoading
Fix LoadFileImpl ( 527752c)
Windows
Only deinit the thread CRT when destroying the current thread ( d2bac45)
Track RWX regions in mapped images ( 27ededf)
Misc
Just a few things picked up from static analysis ( 8913c59)
Support a merged RootFS (and a bunch of related fixes) ( 2d66bc2)
Fix float->int conversion overflow behaviour ( d2f86e4)
Library Forwarding: Allow reading standard library headers from a development x86 rootfs ( d66cd16)
Support inline self modifying code ( 656477e)
Generate SVE for 80bit load/stores when possible ( 8427731)
docs
Remove Arch from the release process. ( f8b6edf)
unittests
Adds a 3DNow! ModRM SIB encoding test ( 3abe6c1)
ASM
Fix incorrect instruction form test ( b391fe6)
Adds missing MMX PADDQ test ( fc1b500)
gvisor
Disable memfd tests ( 8bee101)
x87StackOptimizationPass
Minor opt to f80 fchs and fabs ( f51812a)