FEX-2412
Read the blog post at FEX-Emu's Site!
Welcome back to another FEX release! With the last month's release being cancelled there is a few more changes than average.
Fix Steam's webhelper process again
On November 5th, Valve's Steam client updated their embedded version of Chromium. This uncovered new bugs in FEX-Emu and caused Steam to constantly restart its UI. Luckily Asahi Lina diagnosed the issues and resolved the problems that were occuring! Now Steam is more stable than it has ever been!
Fix WINE path execution problems
Asahi Lina had encountered a bug when running WINE under Fedora which wasn't seen before. Turns out FEX-Emu was incorrectly concating some filepaths together when child processes were getting executed inside WINE. This usually worked but due to the combination of a newer version of WINE and how Fedora set up its paths caused this to break under FEX. With this issue resolved, WINE is working again. There are likely some more bugs due to the nature of how FEX sets up its path handling, but those will likely get solved in time.
Emulate PAUSE instruction using WFE instead of YIELD
This is mostly just a passing curiousity since it doesn't really matter. FEX has long emulated the x86 instruction PAUSE by using a theoretically functionally equivalent ARM64 instruction called YIELD. It came to our attention over the last couple months that Apple Silicon and ARM Cortex implement the YIELD instruction as a nop, so it effectively does nothing. This kind of makes sense since it is typically useful for SMT systems, and no relevant ARM cores support SMT.
On x86 CPUs this matters more since they do support SMT, but PAUSE depending on CPU will also have the CPU core sleep for some number of nanoseconds that is implementation dependent. To be more similar to how PAUSE operates, we have instead switched over to using the WFE instruction which provides similar guarantees on most ARM64 hardware. The trade-off here is that Qualcomm's new Oryon CPU treats WFE as a nop instead. A reasonable trade-off and just an interesting quirk of their hardware.
Adds support for compat input prctl
A major problem that constantly comes back to bite FEX is emulating 32-bit syscalls with complete accuracy. Due to the Linux Kernel hiding information about a syscall returning opaque data structures, FEX jumps through a lot of hoops to make sure everything works while running as a 64-bit ARM process.
The ioctl syscall is one of these annoying syscalls that completely change how they behave depending on the device that is being communicated with, requiring FEX to track file descriptors and wrap all ioctl syscalls with checks see which ioctl it is, to which device. This is fraught with problems since we can't always know 100% if we are actually communicating with a device that we think it is.
This leads us to this new bug that we encountered. When a 32-bit process is communicating with a gamepad, it uses a combination of the joydev and udev subsystems inside of the Linux kernel. Any 32-bit game was having issues where gamepads where the controls were "sticky". While FEX is catching the ioctl communications and translating the data as necessary, there was a more nefarious problem happening. Turns out the Linux kernel in some instances will check to see if a syscall is coming from either a "compat" process, or is inside of a "compat" syscall. A compatibility process is just a process in which the Linux kernel knows it is a 32-bit process, while the kernel is 64-bit. In x86 land this is easy since you just check if it is x86 or x86-64. When running under FEX, the process is /never/ a compatibility process, because it is always 64-bit.
Turns out that the read and write syscalls, depending on what type of device it is communicating with, will return different data if it is compat or not! This isn't just specific to gamepads, there are various subsystems in Linux that check if they are in a compat process and modify the data. This is a huge problem since we would introduce a ton of overhead tracking every read, write, seek, etc syscall and try to compare it against something that needs modified data or not. slp decided to tackle this for the Asahi Linux distro by adding a new prctl to changing the data to the "compat" versions if enabled, even in a 64-bit process. This is a bandage to a significant problem as it only covers gamepad devices today. We will continue tracking these issues in to more subsystems as we find them over on our 32Bit Syscall Woes wiki page.
Support CPU index through TPIDRRO
This is a fun little feature to make FEX's life easier. When FEX is emulating the CPUID and RDTSCP instructions, we need to know the physical CPU index because the instructions will give that information to the program. The RDTSCP instruction gives the index directly, while the CPUID instruction uses it for per-cpu data lookup and also some indexing data.
The fun thing is that Windows is the only operating system that supports this today. So FEX only gets to enjoy this in a little toy sandbox for when it is running under Windows. The linux implementation on the other hand does a syscall to getcpu for each one of these instructions getting emulated, which adds a bunch of overhead. This is due to Linux not supporting using TPIDRRO in this fashion, instead the register is set to zero.
This has an additional knock-on effect that if WINE wants to run Arm64 native Windows applications, this is going to break any algorithm that relies on that Windows feature. Ideally the Linux kernel implements this feature for both WINE compatibility, and FEX performance improvements some day!
FEXCore cleanups and restructuring
While not user facing at all, there has been some restructuring of how FEXCore, the CPU emulation backend of FEX-Emu, exposes its CPU emulation. Historically FEX-Emu has had a problem where we leak OS information across the boundary, making the interface to the CPU backend more convoluted than it needed to be. Over the last couple of months we have been removing large amounts of the Linux specific code from FEXCore and moving it in to the frontend.
While this doesn't affect the user, it makes our programmer's lives easier since they don't need to dance around OS abstractions quite as much. It also makes it easier for upcoming debugger improvements that FEX-Emu is sorely lacking today.
Fixed some JIT bugs
We uncovered some implementation bugs in our AVX emulation this last month. There were four instructions that had some incorrect behaviour. While fixed, we don't know of any actual bugs in games that happened because of these edge case failures. It's a good testament towards having unittesting infrastructure that can catch behaviours like this, and knowing that even with thousands of hand-written assembly tests, some edge cases still aren't tested.
There were also a couple of other misc optimizations and bug fixes but we've gone on for quite a while now.
Raw Changes
FEX Release FEX-2412
AVX128
Fixes some AVX bugs ( c4306f2)
CMake
Generate test list even when testing is disabled ( ee592ba)
FEXCore
Removes ExitHandler and RunUntilExit ( 41c8731)
Removes GetGPRSize and convert all uses to GetGPROpSize ( 9febdde)
Removes stale function definition ( c7098d0)
Removes CoreShuttingDown from ContextImpl ( 65a162b)
Removes remaining RunningEvents from InternalThreadState ( baddfe0)
Moves InternalThreadState ExecutionThread to the frontend ( 95d5b14)
Move InternalThreadState StartRunning to frontend ( e89f48f)
Removes ExitReason from InternalThreadState ( f59fc0f)
Moves ThreadWaiting to the frontend ( 56fadec)
Constify CTX ptr in InternalThreadState ( 649a494)
Moves StatusCode to the frontend ( aa2180d)
Moves DeferredSignalFrames to the frontend ( fad2214)
Moves SignalThread/SignalEvent to Frontend ( b440e17)
Removes EarlyExit running event ( 56c6b0d)
Moves TLS initialization for Alloc::OSAllocator ( 0596a96)
Adds support for CPU Index through TPIDRRO ( 992d6e8)
Change yield implementation to use wfe ( ff51435)
adds support for compat input prctl ( 6a07ea7)
FEXLoader
Align stack base ( 09bfe58)
Fixes newer wine versions and Fedora ( d66ed71)
FEXServer
Fix auto-shutdown regression ( 1b11f2f)
Listen on both abstract & named sockets ( d5c9655)
FileManagement
Hide the FEX RootFS fd from /proc/self/fd take 2 ( 4278c48)
Hide the FEX RootFS fd from /proc/self/fd ( 55b3d67)
GdbServer
Minor work ( b2e61c3)
IR
Convert OpSize over to enum class ( 5ad7fdb)
Converts base IR operations to store OpSize sizes ( 5c6de4e)
Fix some missing OpSize conversions ( 0c29f8f)
JIT
Remove implicit OpSize conversions ( 493b952)
Moves Arm64 JIT up one folder ( 079e70f)
LinuxEmulation
Personality handling ( 0897cd8)
OpcodeDispatcher
drop PossiblySetNZCV ( f41b9bc)
Various missed OpSize implicit cast fixes ( 00ab3f8)
Convert address size helpers to use OpSize ( 704841f)
Convert flags helpers over to OpSize ( c122f3f)
Convert {Load,Store}GPRRegister to OpSize ( 04c701e)
Minor optimization to small pushf ( f6cdb16)
Unify PSHUF{L,H}W implementations ( 5d1fda7)
TestHarnessRunner
Fixes FS/GS usage in tests ( eeb8eb1)
Utils
StringUtil
Fixes ltrim and adds a unittest ( 0463512)
VDSOEmulation
Support loading VDSO thunk as shared ( 389ad73)
WOW64
Set the software CPU area flag ( 22058c0)
X86Tables
Removes XOP tables ( f60388d)
Misc
Check that LDPath is not empty ( 60c52e3)
Support CLONE_FS and CLONE_FILES with fork() semantics ( bcfdf39)
Add some more tests as unsupported by vixl ( c1db7a7)
Avoid warning on assertionless builds ( 1bf06f8)
Check if a candidate home directory exists before using it ( e675f42)
Convert all of the IR operations to use OpSize ( c0a9463)
Fix IR operation usage to use OpSize when possible ( d2aa521)
X87 Code Simplification ( 0190e1a)
Optimize bsf, bsr, register cmpxchg, pcmpistri ( caaacb6)
Disable RPRES in instcounci files ( 767c61c)
add Ubuntu 24.10 and remove unsupported releases ( 368162d)
Fix FXTRACT for 0.0 and -0.0 ( a421ff1)
Implement explicit state switch between X87 and MMX ( 4b945a9)
Remove file since FXAM_Simple is not a Linux test ( 5026bf8)
instcountci
testing multiple 80bit ldst using SVE ( 731e4d6)
unitests
ASM
Removes unused MemoryRegion configs ( e5ceaa1)
unittests
ASM
Fixes x87 80-bit loads on the edge of page boundaries. ( 2e7fc60)
Adds missing unaligned atomic tests ( b5b34df)
Ensures FNSAVE and FRSTOR only store as much data as required ( c00f781)
A new version of FEX has been released to address several issues. It fixes Steam's webhelper process, which was causing issues with the updated Chromium embedded version of Valve's Steam client. Additionally, it fixes WINE path execution problems, which were previously fixed by Asahi Lina. FEX has also replaced YIELD with WFE, an ARM64 instruction that is typically useful for SMT systems. This change is due to Qualcomm's new Oryon CPU treating WFE as a nop, a trade-off that Qualcomm's new Oryon CPU treats as a nop.