Add support for full 64-bit futex words
This is meant as a direct replacement for userspace-to-userspace io_uring instances: userspace can instead build whatever interface it finds desirable on top of futexes and a helper scheme that allows sharing memory. (Userspace-to-kernel and kernel-to-userspace rings will remain, however, and will probably implement this futex opcode themselves.)
This opens up a lot of new possibilities for atomic structures. For example, applications can wait for atomic pointers to change, implement full 64-bit queues (such as userspace-to-userspace io_urings), or utilize 64-bit rwlocks.
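As a rough illustration of the "wait for a 64-bit word to change" case, here is a minimal sketch in Rust. The names `futex_wait64` and `futex_wake64` are hypothetical stand-ins, not the interface added by this change: they poll in userspace so the example runs anywhere, whereas a real 64-bit futex wait would block in the kernel until woken.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a 64-bit futex wait (assumption, not the real
/// syscall): returns once the word no longer equals `expected`. A real
/// implementation would sleep in the kernel; this one polls and yields.
fn futex_wait64(word: &AtomicU64, expected: u64) {
    while word.load(Ordering::Acquire) == expected {
        thread::yield_now();
    }
}

/// Hypothetical stand-in for a 64-bit futex wake. It is a no-op here because
/// the polling waiter above notices the change on its own.
fn futex_wake64(_word: &AtomicU64, _count: usize) {}

/// Block until the full 64-bit word differs from `old`, then return the new
/// value. The loop re-checks the word after every wait, which is the usual
/// way to tolerate spurious wakeups with futex-style primitives.
fn wait_for_change(word: &AtomicU64, old: u64) -> u64 {
    loop {
        let current = word.load(Ordering::Acquire);
        if current != old {
            return current;
        }
        futex_wait64(word, old);
    }
}

/// One thread publishes a value that does not fit in 32 bits; the caller
/// waits for the word to change and returns what it observed.
pub fn demo() -> u64 {
    let word = Arc::new(AtomicU64::new(0));

    let producer = {
        let word = Arc::clone(&word);
        thread::spawn(move || {
            // Bit 32 is set, so a 32-bit futex word could not hold this value.
            word.store(0x1_0000_0001, Ordering::Release);
            futex_wake64(&word, 1);
        })
    };

    let seen = wait_for_change(&word, 0);
    producer.join().unwrap();
    seen
}
```

The same pattern would apply to an atomic pointer stored in the word, or to a queue that packs its head and tail indices into a single 64-bit value, with the polling stand-ins swapped for the real wait and wake calls.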