FEX Release FEX-2405
One month older and we have a new release for FEX. This month is a little slower for user-facing features, but behind the scenes we did a large amount of refactoring to facilitate future improvements.
Support OpenGL and Vulkan thunking with forwarding X11
A thorn in FEX's side has been forwarding X11 calls when thunking OpenGL and Vulkan. This has caused us pain since X11's API is a fairly leaky abstraction that exposes data symbols which FEX's thunking can't accurately handle. We now instead redirect the X11 Display object directly in OpenGL and Vulkan. This not only reduces the amount of code we need to thunk, but is also required for us to eventually thunk 32-bit libraries.
This may change behaviour in some games when thunks are enabled, so testing for regressions would be useful.
Enable Enhanced REP MOVS when memcpy TSO is disabled
Following last month's optimization of memcpy-style instructions, we now enable the Enhanced REP MOVS CPUID bit when memcpy TSO emulation is disabled. This means that glibc will take advantage of the optimization when it is doing memcpy and memset operations. This is a minor performance improvement depending on the application.
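As a rough illustration of what enabling this bit means: on x86, Enhanced REP MOVSB/STOSB (ERMS) is reported in CPUID leaf 7 (subleaf 0), EBX bit 9. The sketch below gates that bit on a memcpy TSO setting. The function and parameter names are hypothetical and this is not FEX's actual code.

```python
# CPUID.(EAX=7, ECX=0):EBX bit 9 reports Enhanced REP MOVSB/STOSB (ERMS).
ERMS_BIT = 1 << 9


def leaf7_ebx(base_features: int, memcpy_tso_enabled: bool) -> int:
    """Return the EBX feature word for CPUID leaf 7 (hypothetical helper).

    ERMS is advertised only when memcpy TSO emulation is disabled,
    mirroring the behaviour described above.
    """
    if memcpy_tso_enabled:
        return base_features & ~ERMS_BIT
    return base_features | ERMS_BIT
```

A guest library that checks this bit at startup would then pick its REP MOVS based copy routines only when the bit is set.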
Implement support for SMSW instruction
This instruction isn't too notable since all it does on recent x86 CPUs is return the same data no matter what, but on legacy CPUs it was useful for checking if x87 was supported. As this is considered a system-level instruction, FEX didn't implement it originally, but we finally found a game that uses it. The original Far Cry game from 2004 uses this instruction for some reason. Now that we have implemented the instruction, the game at least gets to the menus but still seems to stall out when going in-game. Kind of neat!
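For context, SMSW stores the machine status word, which is the low 16 bits of the CR0 control register. The sketch below shows how such a value could be returned and decoded. The constant is an illustrative assumption (the value FEX actually returns is not stated here), and all function names are hypothetical.

```python
# Low bits of CR0 that are visible through SMSW (the machine status word).
CR0_BITS = {
    "PE": 0,  # Protection Enable
    "MP": 1,  # Monitor Coprocessor
    "EM": 2,  # Emulation: set when no x87 unit is present
    "TS": 3,  # Task Switched
    "ET": 4,  # Extension Type
    "NE": 5,  # Numeric Error
}

# Illustrative constant only (PE, MP, ET and NE set); not taken from FEX.
ASSUMED_MSW = 0x0033


def smsw() -> int:
    """Emulated SMSW: always return the same 16-bit machine status word."""
    return ASSUMED_MSW & 0xFFFF


def decode_msw(msw: int) -> dict:
    """Map each named CR0 bit to True/False."""
    return {name: bool((msw >> bit) & 1) for name, bit in CR0_BITS.items()}


def x87_present(msw: int) -> bool:
    """Legacy-style check: the EM bit is clear when an x87 unit is available."""
    return not decode_msw(msw)["EM"]
```

Because the emulated value never changes, any legacy x87 check built on it always takes the same path.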
Fix disabling TSO emulation on some stack accesses
When emulating the x86 memory model, we can get away with not emulating it when a thread accesses its own stack. This works since we know that stack accesses are usually not shared between threads. While we usually disabled the TSO emulation in these cases, we had accidentally missed some of them. This means that there are some performance improvements for "free."
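The decision described above can be sketched as a small predicate. This is a simplified model, not FEX's implementation: the operand type and function name are hypothetical, and the sketch assumes that any access based on RSP counts as a stack access, including forms that also use an index register (the SIB encoding mentioned in the change log).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemOperand:
    """Simplified x86 memory operand: [base + index*scale + disp]."""
    base: Optional[str]
    index: Optional[str] = None
    scale: int = 1
    disp: int = 0


def needs_tso(op: MemOperand, tso_enabled: bool) -> bool:
    """Decide whether an access needs TSO-ordered loads/stores.

    Assumption for this sketch: any access whose base register is RSP is
    treated as thread-local stack memory and skips the TSO emulation,
    whether or not the encoding also uses an index register.
    """
    if not tso_enabled:
        return False
    return op.base != "rsp"
```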
Fix ADC and SBC instructions
Back in FEX-2403 we landed some optimizations for these instructions. It turns out that we had inadvertently introduced broken behaviour in an edge case. Most games didn't hit this edge case so they were fine, but it completely broke rendering in the Steam release of Final Fantasy 7. With that fixed, we now get correct rendering again!
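The exact edge case isn't spelled out here, but the change log points at the 8-bit and 16-bit forms. One thing that makes narrow add-with-carry and subtract-with-borrow tricky on a wider host is that the carry has to come out of bit 8 or 16 rather than out of the host register. The reference model below shows the expected results; it is a sketch for checking behaviour, not FEX's code, and the function names are made up for this example.

```python
from typing import Tuple


def adc(a: int, b: int, carry_in: int, bits: int) -> Tuple[int, int]:
    """Reference add-with-carry: returns (result, carry_out) for the operand width."""
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask) + (carry_in & 1)
    return total & mask, total >> bits


def sbb(a: int, b: int, borrow_in: int, bits: int) -> Tuple[int, int]:
    """Reference subtract-with-borrow: returns (result, borrow_out)."""
    mask = (1 << bits) - 1
    diff = (a & mask) - (b & mask) - (borrow_in & 1)
    return diff & mask, 1 if diff < 0 else 0
```

For instance, `adc(0xFF, 0x00, 1, 8)` wraps to a result of 0 with the carry set, a case that is easy to get wrong if the carry-in is folded into an operand before the carry-out is computed.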
Option to disable half-barrier TSO emulation on unaligned atomics
On x86 most atomic instructions don't care about alignment, and a feature that Intel calls "split locks" even supports unaligned atomics that cross a cacheline. On ARM we don't have this luxury: all atomic instructions need to be naturally aligned to their size. Newer ARM CPUs added a feature that allows unaligned atomics inside of a 16-byte region, but if the data crosses the edge of that region then FEX needs to handle the instruction in a different way. We backpatch the instruction stream with a "half-barrier" access, which still emulates the x86 memory model but is exceedingly heavy.
Now we have an option to convert these "half-barrier" implementations into non-atomic accesses. While this doesn't match true x86 behaviour, it can accelerate some games that heavily abuse unaligned atomic memory accesses. So if a game is running slow, this is an option to try out!
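To make the alignment cases concrete, the sketch below classifies an access by address and size. It is only an illustration of the categories described above: the function names and category labels are hypothetical, and the 64-byte cacheline size is an assumption.

```python
def crosses_boundary(addr: int, size: int, granule: int) -> bool:
    """True if [addr, addr + size) spans more than one granule-sized block."""
    return (addr // granule) != ((addr + size - 1) // granule)


def classify_atomic(addr: int, size: int) -> str:
    """Classify an atomic access of `size` bytes at `addr`."""
    if addr % size == 0:
        return "aligned"
    if not crosses_boundary(addr, size, 16):
        # Handled natively on newer ARM CPUs, per the description above.
        return "unaligned-within-16B"
    if not crosses_boundary(addr, size, 64):  # 64-byte cacheline assumed
        return "crosses-16B"
    return "crosses-cacheline"
```

Only the last two categories would fall into the slow path that the new option relaxes.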
Refactor code using clang-format
As alluded to at the start, the FEX codebase has now been completely reformatted using clang-format to ensure a more consistent coding style. We provide a helper script and CI enforcement to ensure this stays in place. It took quite a bit of work to get the formatting up to everyone's standards. Major thanks to everyone that worked on this!
Eliminate cross-block liveness for instructions
This is preparation work for future JIT improvements and speedups. Cross-block liveness makes our register allocator more complex and by removing this from our JIT we can start working on improving the register allocator. Look forward to register allocator improvements in the coming months.
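As a simplified picture of the invariant involved, the sketch below checks that every value used inside a block was defined earlier in that same block. The IR representation and function name are invented for this example and do not reflect FEX's actual IR or its validation pass.

```python
from typing import Dict, List, Optional, Tuple

# Each instruction is (defined_value_or_None, [used_values]).
Instr = Tuple[Optional[str], List[str]]


def find_cross_block_uses(blocks: Dict[str, List[Instr]]) -> List[Tuple[str, str]]:
    """Return (block, value) pairs where a block uses a value it did not define first."""
    violations = []
    for name, instrs in blocks.items():
        defined = set()
        for dest, uses in instrs:
            for value in uses:
                if value not in defined:
                    violations.append((name, value))
            if dest is not None:
                defined.add(dest)
    return violations
```

With this property in place, a register allocator can treat each block independently, since no value's live range extends past the end of its block.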
Support arm64ec function calls
This changes how some of FEX's JIT infrastructure works in order to support the ARM64EC Windows ABI. This will get used in conjunction with Wine's ARM64EC support. While still a work in progress, this is starting to become interesting as Wine continues to implement more features to handle this case.
Change log
- Allocator
  - Fixes compiling on Fedora 40 (7b88b0f27)
- AllocatorHooks
  - Mark JIT code memory as EC code on ARM64EC (bb24e1419)
- AppConfig
  - Disable libGL forwarding for steamwebhelper (02ebb6e32)
- Arm64
  - Adds another TSO hack to disable half-barrier TSO (1069cabad)
- CI
  - Drop use of obsolete ENABLE_X86_HOST_DEBUG setting (c126b209f)
- CPUID
  - Removes FEX version string from CPU model name (faa494c28)
  - Fix inverted RDTSCP check (9781b957d)
  - Enable enhanced rep movs in more situations (68e543c81)
- DebugData
  - Remove header (1616b4e77)
- FEXCore
  - Support x64 -> arm64ec calls (1a8b61b9f)
- FHU
  - Switch over win32 file operations to std::filesystem (ae5e388e1)
- IR
  - Remove VectorZero (a0f2cae1c)
- JIT
  - fix neon vec4 faddv (fe70ec727)
  - fix ShiftFlags shuffles (376936c80)
  - Adds support for spilling/Filling GPRPair (eedb120fd)
- LookupCache
  - Track ARM64EC page state in the code cache (f0dad8633)
- OpcodeDispatcher
  - Implement support for SMSW (20bd86473)
  - Fixes disabling TSO access on RSP SIB stores (1ba678f63)
  - Add helper for making segment offset addresses (66cbb6673)
- RCLSE
  - disable store-after-store optimization (5cb11aed3)
- X87
  - Simplify constant loading for FLD family (271700e9f)
- Misc
  - Library Forwarding: Support libGL/libvulkan without forwarding libX11 (b33e0e383)
  - clang-format: allow overriding clang-format (3fda47e87)
  - Fix 8/16-bit ADC/SBC (9c6f749eb)
  - Library Forwarding: Fix issues with libGL's fake X11 dependency (81a420680)
  - Factor out SetWriteCursorBefore (a0bf6a425)
  - WOW64 backend code invalidation fixes (81c219c21)
  - Clang Format file and script (53bed6de6)
  - CI workflow to check clang-format usage on pull requests (f9097c0a9)
  - Add second reformat to git blame ignore file (55c054bd8)
  - Reformat until fixed-point (07e58f8db)
  - Create .git-blame-ignore-revs with whole-tree reformat sha (7e97db8d9)
  - Remove trace of clang-tidy experiment from CMakeLists.txt (7614ac9f1)
  - Whole-tree reformat (1d806ac21)
  - Move const to the left in preparation for reformatting (028c22004)
  - Enable jemalloc for ARM64EC (a9b7ad841)
  - Validate that we have no crossblock liveness (e91e1d553)
  - Eliminate xblock liveness for shifts (cbfa426b5)
The latest version of FEX, FEX-2405, is now available. Among the changes listed above, this release fixes compiling on Fedora 40, disables libGL forwarding for steamwebhelper, and marks JIT code memory as EC code on ARM64EC.