FEX-2302
Read the blog post at FEX-Emu's Site!
This month certainly passed in the blink of an eye. A lot of good bug fixes this month as usual! Continue reading to find out more.
Fix incorrect operation for cache line clears
When emulating the CLFLUSH instruction, FEX was using the wrong cache maintenance operation: we were accidentally using CVAU instead of CIVAC.
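For reference, the difference between the two operations can be sketched as follows. On AArch64, `DC CVAU` only cleans a line to the point of unification, whereas `DC CIVAC` cleans and invalidates it to the point of coherency, which is closer to what CLFLUSH promises. The helper name `FlushLineLikeCLFLUSH` and the trailing barrier are illustrative assumptions, not FEX's actual JIT output. On non-AArch64 hosts the function compiles to a no-op so the snippet builds anywhere.

```cpp
#include <cstdint>

// Hypothetical helper: flush the cache line containing `addr`,
// approximating x86 CLFLUSH semantics on an AArch64 host.
inline void FlushLineLikeCLFLUSH(const void* addr) {
#if defined(__aarch64__)
  // DC CIVAC: clean and invalidate by virtual address to the point of coherency.
  // (DC CVAU would only clean to the point of unification and never invalidate.)
  __asm__ volatile("dc civac, %0" : : "r"(addr) : "memory");
  // Assumed barrier so the maintenance completes before later accesses.
  __asm__ volatile("dsb sy" : : : "memory");
#else
  (void)addr; // No-op on other hosts so the sketch stays portable.
#endif
}
```

Calling `FlushLineLikeCLFLUSH(&value)` leaves the contents of `value` unchanged; only the cache state of its line is affected.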
While this is incorrect, it was hard to find anything that was actually affected by the wrong implementation. With Snapdragon's open source Vulkan driver implementing what is required for VKD3D,
it became evident from Vulkan tests that this was incorrectly implemented. Switching the implementation is easy and will let VKD3D run without hacks
when the required feature is finished.
Bug fixes to 64-bit x87 emulation
A big thanks to CallumDev for finding and fixing these latest bugs in FEX's less accurate x87 emulation. As a
reminder, x87 on original hardware operates using 80-bit float values. This is a feature that ARM doesn't natively support, so FEX needs to emulate
this using a software floating point library. We have a hack in our configuration that removes this software implementation and instead operates
using 64-bit double operations. This can significantly improve performance in some 32-bit games but may introduce rendering artifacts.
This month there were many bug fixes:
- ALU operations that consume integers converted to floats are fixed
- Float comparisons that also consume 16-bit integers are fixed
- FPREM instruction no longer loops infinitely
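For context on why the reduced-precision mode can produce artifacts in the first place: an x87 extended value carries a 64-bit significand, while a double carries 53 bits, so small contributions can vanish once operations run in double. A minimal sketch of that effect follows; the function name is illustrative and unrelated to FEX's code.

```cpp
#include <cmath>
#include <limits>

// double has a 53-bit significand; x87 extended precision has 64 bits.
static_assert(std::numeric_limits<double>::digits == 53,
              "IEEE-754 binary64 expected");

// Returns true when adding 2^-exponent to 1.0 is lost to rounding in double.
inline bool LostInDouble(int exponent) {
  // volatile forces the sum to be rounded to double before the comparison.
  volatile double sum = 1.0 + std::ldexp(1.0, -exponent);
  return sum == 1.0;
}
```

`LostInDouble(60)` reports that the addend disappears, while `LostInDouble(52)` reports that it survives. A 64-bit significand can still represent 1 + 2^-60, which is the kind of difference the software 80-bit path preserves.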
With these fixes in place, a large number of games now actually render correctly with this hack enabled. It will be interesting to see how much this
improves performance or battery savings in 32-bit games!
More AVX instructions emulated
With one of FEX's developers taking some time away, this was a little less involved than the last couple of months.
There was still a handful of instructions implemented:
- VPBLENDD, VBLENDPS, and VPSRAVD
Additionally, while these aren't AVX instructions, we also implemented the CLWB and CLFLUSHOPT instructions. These match their ARM equivalents so it was
mostly an easy implementation that applications can use if they want.
Fix copy and paste error in Arm64 JIT
While this is a fairly minor issue, we had a copy and paste error in FEX's register spilling code. This caused Steam to crash in certain situations,
so fixing this since the previous release helps users who want to run it.
A bunch of minor optimizations
This month had a bunch of small optimizations around the entire project. Alone these are all quite minor, but added together they should result in a couple
of percent of CPU time removed from FEX's JIT.
- Arm64 Dispatcher is slightly faster
- CPUID emulation initialization is faster
- File loading is optimized, improving config loading time
- Frontend instruction decoder is faster
- IR operations are 1 byte smaller, improving memory usage
- Inline IR constants optimization reduces IR memory size
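As a rough illustration of how dropping a per-op member can save a byte, the sketch below compares two hypothetical op headers. The struct names, fields, opcodes, and lookup table are invented for this example and do not mirror FEX's real IR definitions. The idea is only that data derivable from the opcode can live in a shared constant table rather than in every op.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical opcodes, for illustration only.
enum class Op : uint8_t { Constant = 0, Add = 1, Select = 2, Count };

// Before: every op stores its argument count inline.
struct OpHeaderBefore {
  Op Opcode;
  uint8_t Size;
  uint8_t ElementSize;
  uint8_t NumArgs;
};

// After: the argument count is looked up from the opcode instead.
struct OpHeaderAfter {
  Op Opcode;
  uint8_t Size;
  uint8_t ElementSize;
};

// Shared table indexed by opcode; costs one byte per opcode, not per op.
constexpr std::array<uint8_t, static_cast<std::size_t>(Op::Count)> kNumArgs = {0, 2, 4};

constexpr uint8_t NumArgsOf(const OpHeaderAfter& Header) {
  return kNumArgs[static_cast<std::size_t>(Header.Opcode)];
}

static_assert(sizeof(OpHeaderBefore) == 4, "four one-byte members");
static_assert(sizeof(OpHeaderAfter) == 3, "one byte smaller per op");
```

With many ops alive during compilation, a one-byte saving per op adds up, at the cost of a table lookup whenever the argument count is needed.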
Fixing thunk symbol override fetching
FEX's thunks had an issue where if a library was loaded, we would only ever fetch relevant symbols from that library directly. While this worked for
our use case, it breaks when wanting to use MangoHud in OpenGL applications. Resolving this issue fixes most things that will override symbols with
LD_PRELOAD.
Update JEMalloc from 5.2.1 to 5.3.0
While this is a fairly minor change, this release of JEMalloc fixes some bugs and improves performance. Small, but every performance improvement is
welcome.
Support for execveat with AT_EMPTY_PATH
This is an interesting feature where an application can be executed directly through a file descriptor instead of a filepath on disk. This is a fairly
simple idea but has some edge cases that might be interesting to some people. To see the more technical information about implementing
this, check out the pull request.
Raw Changes
ARMEmitter
Handle integer add/subtract vectors (predicated) instruction class ( 9d33bba)
Handle RMIF, SETF8/SETF16 ( a899f9f)
Handle SVE floating-point recursive reduction ( 1cda029)
Add a few missing instructions ( 2c9f99e)
Support helper for long address generation ( f8d56a8)
Removes some warnings that cropped up ( 5fd8fdb)
Arm64
Merge two loads in to an LDP ( a28039f)
Fixes incorrect operation for CacheLineClear ( f8d92aa)
Use switch statement for op handlers instead of jump table ( 565ed45)
Fix SpillRegister C&P error ( 9c93c6f)
Fixes large offset spill slots ( 9acb513)
VectorOps
Clamp shift amount to esize-1 for VSShr ( 9a318ca)
ArmEmitter
Adds two more classes of ASIMD instructions ( 95e544c)
Adds three more classes of ASIMD instructions ( 81e0ac7)
CPUID
Optimize initialization ( f614fc6)
Config
Fix relative execve applications. ( 65971ef)
ConstProp
Pool inline constants ( 1e90ebb)
Core
Adjust virtual memory size for 32-bit ( 7f6a620)
Dispatcher
Extract 64-bit signal frame save and restore ( 65b6b6d)
Fixes x86-64 SA_SIGINFO generation ( 8dae785)
ELFCodeLoader
Don't use std::random_device for RNG ( f5e97f3)
Emitter
Remove unused header ( 90bcb8c)
External
Update JEMalloc to disable 16k pages ( bbf9198)
Externals
Update jemalloc to 5.3.0 ( 9322e55)
F64
Fix integer immediates for add,mul,div,sub ( c2325e1)
FEXCore
Fixup 32-bit signal handling ( fa1193f)
FEXLoader
Adds support for execveat with AT_EMPTY_PATH ( dcce9ad)
Build FEXInterpreter and FEXLoader independently ( 8974509)
FEXRootFSFetcher
Support option to auto select first distro ( a7aeb4a)
FEXServer
Remove POLLREMOVE usage ( d2d5282)
FileLoading
Optimize FileLoad ( 28dd946)
Frontend
Various optimizations ( 787b689)
Github
Add ARM emitter tests to CI ( da88c68)
IR
Removes NumArgs member from IR ops ( 9403c66)
Remove HasDest member ( f8e762f)
JitSymbols
Fixes file opening and writing ( a486797)
Fixes a crash that can occur ( 34e1ba6)
Linux
Fixes shebang file execution ( 477d4b6)
MContext
Insert a stack cookie with assertions enabled ( 7664359)
OpDispatcher
Adds support for CLWB and CLFLUSHOPT ( 7be2e1a)
Fixes a few missing GPR/XMM helper usages ( 4aa984a)
OpcodeDispatcher
Handle VPBLENDD/VBLENDPS ( 62e6ada)
Handle VPSRAVD ( fe79f61)
Scripts
Update InstallFEX.py rootfs links ( df87042)
Syscalls
Fix out-of-bounds read when handling single-line shebang files ( 3d29dac)
Thunks
Fixes host symbol overrides ( 9d35bc0)
X86Tables
Optimize struct layouts ( dfc3297)
Misc
X87_F64: Fixes FICOM ( afaff92)
fix ifdef to use HAS_SYSCALL_TGKILL for tgkill as it was intented ( 8d0329d)
fix tgkill ( 1521e0a)
Fix FPREM flags calculation in F64 ( 632add6)
unittests
Adds negative integer x87 tests ( ee58c5d)
FEX-Emu, which enables the execution of x86 and x86-64 binaries on an AArch64 host, has been updated.