FEX Release FEX-2504
Good weekend emulation and PC gaming enthusiasts! Another month has passed and we have implemented a decent number of features, optimizations, and bug fixes. We cooked up a lot of good things, so let's jump right into it!
Slay the Spire audio fix
Over the past few months of optimizing code, we noticed that Slay the Spire's audio had broken at some point. Bisecting the issue didn't reveal when it had broken; had it always been the case? After a weekend of struggling to debug it, we finally tracked down exactly what was going wrong. A couple of months ago we implemented an optimization that removes x87 stack management from JIT code when we know the state of the stack. This is a great optimization because stack management is very costly. It turns out we had accidentally implemented the F{INC,DEC}STP instructions backwards in one code path, pushing and popping the stack in the opposite directions! This had gone unnoticed because these instructions aren't very common, and the bug only showed up under certain analysis conditions.
With that fixed and new unit tests to ensure it never breaks again, Slay the Spire's audio is now working wonderfully!
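To make the direction problem concrete, here is a toy model of the x87 stack pointer (the 3-bit TOP field). It is only an illustration and not FEX code; the function names, including `fincstp_swapped`, are made up for this sketch.

```python
# Toy model of the x87 stack pointer (TOP). Illustrative only, not FEX code.
# TOP is a 3-bit field, so every adjustment wraps modulo 8.

def push(top: int) -> int:
    """A load such as FLD decrements TOP before writing the new ST(0)."""
    return (top - 1) & 7

def pop(top: int) -> int:
    """A popping instruction such as FSTP increments TOP."""
    return (top + 1) & 7

def fincstp(top: int) -> int:
    """FINCSTP adds one to TOP, i.e. it moves in the same direction as a pop."""
    return (top + 1) & 7

def fdecstp(top: int) -> int:
    """FDECSTP subtracts one from TOP, i.e. it moves like a push."""
    return (top - 1) & 7

def fincstp_swapped(top: int) -> int:
    """Hypothetical buggy variant: moves TOP the wrong way."""
    return (top - 1) & 7
```

With the swapped variant, every later ST(i) reference in the block resolves to the wrong physical register, which is the kind of silent corruption that only shows up in code that actually uses these instructions.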
Windows PE volatile metadata support
Volatile Metadata is an interesting feature that Microsoft implemented in MSVC 2019 and enabled by default. It causes their compiler to generate metadata describing the memory accesses that it identifies as "volatile". This data is then used with their ARM64 Prism emulator to avoid costly emulation of the x86 memory model. As FEX developers and users are well aware, emulating the x86 memory model is the number one most costly feature and is required for games to function in a lot of cases. ARM has implemented FEAT_LRCPC{1,2,3} as a way to improve the performance of emulating the memory model. It's a nice gesture from ARM but the performance is basically a joke on all shipping hardware. This is why Apple has hardware x86-TSO support and their performance is the best in the industry, side-stepping the problem entirely.
Microsoft's compiler approach lessens the burden on our ARM hardware by avoiding the problem altogether: their JIT simply skips the emulation if the compiler says it is safe to do so. Because this feature has been implemented and enabled by default since 2019, there is actually a large number of games that have this metadata! FEX, when running under a WINE-arm64ec environment, will now use this metadata as well in order to improve emulation performance!
Hopefully, once package maintainers start shipping Arm64ec WINE and FEX packages, this will be an easy way for users to get effectively free performance!
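As a rough sketch of the idea, assume the metadata boils down to two things: the code ranges the compiler analyzed, and the addresses of the instructions it flagged as volatile. The `VolatileMetadata` class and `needs_tso` helper below are hypothetical names for illustration and do not mirror FEX's real data structures.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass(frozen=True)
class VolatileMetadata:
    # Hypothetical, simplified view of what the PE metadata provides.
    covered_ranges: tuple         # sorted, non-overlapping (start, end) pairs, end exclusive
    volatile_accesses: frozenset  # addresses of instructions flagged as volatile

    def is_covered(self, addr: int) -> bool:
        """True if the compiler analyzed the code at this address."""
        starts = [start for start, _ in self.covered_ranges]
        i = bisect_right(starts, addr) - 1
        return i >= 0 and addr < self.covered_ranges[i][1]

def needs_tso(meta, instr_addr: int) -> bool:
    """Decide whether a guest memory access needs x86 memory-model emulation."""
    if meta is None or not meta.is_covered(instr_addr):
        return True  # no information about this code: stay conservative
    return instr_addr in meta.volatile_accesses
```

The important design point is the conservative default: code outside the covered ranges, or a binary with no metadata at all, keeps the full memory-model emulation.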
Optimizations as always
This month we optimized the JIT as we always do. This time around we have optimized the SHA256 instructions, which now use ARM's equivalent instructions that provide roughly the same functionality. While these don't tend to get heavily used for gaming, it's nice when some loading code does some trivial hash checking to ensure the data is valid. With this new optimization we have roughly doubled the performance of SHA256 during emulation.
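To give a feel for why the two instruction sets line up, here is a small reference model of one message-schedule step. As we read the instruction references, x86's SHA256MSG1 and ARM's SHA256SU0 both compute this same operation on four words; the Python helper names are ours and this is not how FEX implements it.

```python
MASK32 = 0xFFFFFFFF

def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def small_sigma0(x: int) -> int:
    """SHA-256 message schedule function: ROTR7 ^ ROTR18 ^ SHR3."""
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def sha256_msg1(w0_3: list, w4: int) -> list:
    """Given words W0..W3 and the following word W4, return
    [W0 + s0(W1), W1 + s0(W2), W2 + s0(W3), W3 + s0(W4)] modulo 2^32."""
    following = w0_3[1:] + [w4]
    return [(w + small_sigma0(n)) & MASK32 for w, n in zip(w0_3, following)]
```

When the guest and host instructions match this closely, the JIT can emit a single host instruction instead of a long sequence of rotates, shifts, and XORs.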
In addition, we have implemented support for ARM's FEAT_FRINTTS extension. This provides a handful of instructions for more quickly rounding float values to integral values restricted to the 32-bit or 64-bit integer range. This is a fairly common operation, and games like Factorio can get a small performance improvement because of it.
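The reason this matters for emulation is the out-of-range behaviour. x86's truncating conversions return the "integer indefinite" value (0x80000000 for 32-bit results) for NaN and out-of-range inputs, while a plain ARM FCVTZS saturates and turns NaN into zero. The sketch below models the x86 result for a 32-bit truncating conversion; it is a reference model we wrote for illustration, not FEX's implementation.

```python
import math

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def cvtt_f64_to_i32(value: float) -> int:
    """Model of an x86 truncating double -> int32 conversion (CVTTSD2SI style)."""
    if math.isnan(value) or math.isinf(value):
        return INT32_MIN  # "integer indefinite"
    truncated = math.trunc(value)  # round toward zero
    if truncated < INT32_MIN or truncated > INT32_MAX:
        return INT32_MIN  # out of range also yields the indefinite value
    return truncated
```

FRINT32Z rounds toward zero and, for NaN, infinite, or out-of-range inputs, produces the most negative 32-bit integer value, so a following convert lands on the same result without extra fix-up code.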
There are more optimizations and bug fixes in the JIT this month, but we would be here forever if we talked about all of them. So here's a list of the various bits.
- AVX 128-bit operation zeroing now uses XZR GPR stores
- Fix scalar reciprocal on AFP hardware
- Optimize SVE maskmov loadstores
- Strict BT flag generation
- Remove trivial moves in CMPXCHG
- Remove mask in register CMPXCHG
- VPALIGNR trivial move optimization
- Clear DF/RF flag on signal
- Minor optimization for moving GPRs into x87 registers
- Spill PF/AF in one instruction instead of two
- Fix x18 GPR saving
- Tie VBSL source
- Remove masking in SHLD
- Fix address calculation again for 32-bit applications
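For the last item in the list, the sketch below is not the fix itself, just the general rule that 32-bit guests rely on: an x86 effective address is base + index * scale + displacement, and with 32-bit addressing the result wraps at 32 bits. On a 64-bit host that wrap has to be reproduced explicitly. The `effective_address` helper is a hypothetical name used only for this example.

```python
def effective_address(base: int, index: int, scale: int, disp: int,
                      addr_bits: int = 32) -> int:
    """Compute base + index * scale + disp, wrapped to the guest address size."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale must be 1, 2, 4 or 8")
    mask = (1 << addr_bits) - 1
    return (base + index * scale + disp) & mask
```

For example, `effective_address(0xFFFFFFF0, 0x8, 4, 0)` wraps around to `0x10`, whereas computing the same sum in a 64-bit register without truncation would give `0x100000010` and point outside the guest's address space.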
Raw Changes
FEX Release FEX-2504
AVX128
Addressing
Arm64Emitter
- Removes warning ( d18d043)
CMake
- Propagate cpp-optparse include directories automatically ( af2ee42)
CodeEmitter
Config
- Stop using config values with list when unnecessary ( a8bc20f)
ConstProp
- optimize StoreContext(128-bit zero) ( 3079855)
ELFContainer
- Removes unused debug print functions ( 63a8b66)
ELFParser
- Sanity check ELF program headers ( b145e89)
Externals
- Update fmt to 11.1.4 (from 11.1.0) ( 498c86b)
FEX
- Remove xbyak dependency ( 4929480)
FEXCore
FEXRootFSFetcher
- Move strings in GetDistroInfo() ( b8b6f81)
General
- Replace deprecated std::is_trivial/std::is_trivial_v trait usages ( 25c4fb8)
IR
InstructionCountCI
- add blocks using F->I conversion ( d8e4e00)
IoctlEmulation
- Stop using std::pair and memmove ( 726228c)
JIT
LibraryForwarding
LinuxSyscalls
- Emulate futimesat syscalls ( 31ee8d8)
OpcodeDispatcher
RA
- small cleanups ( fbe3a86)
SMCTracking
- Fix order of VMAs tracked per MappedResource ( 6426428)
Scripts
SignalDelegator
- Clear DF and RF on signal ( 53a55ba)
Softfloat
- Define INLINE ( a2fd3c0)
StructVerifier
- Use isystem for system header includes ( bdd078d)
Syscalls
- Minor cleanup ( 7d1351f)
Telemetry
- Removes unnecessary indirection ( 8ac296b)
VDSOEmulation
- Safe fallback if host VDSO can't be parsed ( aab9e1b)
Various
WINE
- Support sleeping a process ( 53356f1)
Windows
X87
- Minor optimization in how GPRs are moved in to vector registers ( 1700a73)
Misc
- Update tests for fincstp and fdecstp ( acfb24f)
- Initialize OutputFS to -1 ( 6065e7a)
- Optimize float->integer conversions with Feat_FRINTTS ( 8aecdc5)
- Fix 66 90 decoding to a nop ( 0ca34d1)
- Enable maxinst to 1 only if singlestep exists and is enabled ( bf3275b)
- Remove some warnings from Config.cpp ( b744ca1)
- Static analysis warning fixes ( bd1d682)
- Pair PF/AF when spilling static regs ( ec976f3)
- Initialize class fields to null/zero ( b816581)
- Fix pylance warning about possible unbound var ( fdb4a07)
- Fix x18 register saving ( 9672197)
- Improve JSON file validation and error reporting ( 123c5d8)
- Tie VBSL source ( e504a8c)
- Implement PE volatile metadata support ( 7efd827)
- Convert config options once ( 2f6f8b9)
- Drop some masking in shld ( d48413c)
- Fix address modes calculation on 32bit guests ( ab7f648)
- Library Forwarding: Add experimental support for 32-bit Vulkan ( 2d56f5e)
- Add tests for F(U)COMI(P) ( fa1d991)
- Remove some brackets from expected results in tests ( 531ab5b)
actions
- Update to step-security change-files action ( 0f9d791)
unittests
- ASM
- Move test to correct folder ( 2e57ad6)
x87OptimizationPass
- Fixes {Inc,Dec}StackPop ( d5db2cc)
x87StackOptimizationPass
- Initialize a couple of arrays ( 949b205)