Single-stack kernel

Currently, each context has its own 64 KiB kernel stack, which is switched to and from during context switches. However, when a switch is triggered from user mode (by a timer interrupt), the kernel stack is by definition empty (before the interrupt). The kernel stack only holds data at a switch when that switch is made from within the kernel, which almost always means waiting for something to complete, e.g. a scheme operation or a futex. Switching stacks conveniently preserves that state (i.e. local variables), so it is still there when the awaited event completes. Usually, the work done after the event completes is relatively simple. For example, here.
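As a rough userspace analogy of the current model (not Redox code), each OS thread below stands in for a context with its own 64 KiB stack, and a `Condvar` wait stands in for blocking on a scheme reply. The `Reply` type and `sys_read_like` function are hypothetical. The point is only that locals such as `fd` and `buf_len` survive the wait because they sit on that context's stack.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical pending scheme reply: `None` until another "context" fills it in.
struct Reply {
    value: Mutex<Option<usize>>,
    ready: Condvar,
}

// Models a syscall handler running on its context's own stack. `fd` and
// `buf_len` stay on that stack across the wait; nothing is saved explicitly.
fn sys_read_like(reply: &Reply, fd: usize, buf_len: usize) -> usize {
    let mut guard = reply.value.lock().unwrap();
    // The wait point: block until the reply arrives (loop handles spurious wakeups).
    while guard.is_none() {
        guard = reply.ready.wait(guard).unwrap();
    }
    let bytes = guard.take().unwrap();
    // The post-wait work is simple and just uses the preserved locals.
    println!("fd {fd}: reply of {bytes} bytes, buffer holds {buf_len}");
    bytes.min(buf_len)
}

fn main() {
    let reply = Arc::new(Reply { value: Mutex::new(None), ready: Condvar::new() });
    let ctx_reply = Arc::clone(&reply);
    // Each spawned thread gets its own 64 KiB stack, like a context's kernel stack.
    let ctx = thread::Builder::new()
        .stack_size(64 * 1024)
        .spawn(move || sys_read_like(&ctx_reply, 3, 100))
        .unwrap();
    // Another "context" completes the awaited event.
    *reply.value.lock().unwrap() = Some(250);
    reply.ready.notify_one();
    assert_eq!(ctx.join().unwrap(), 100);
}
```

The cost of this convenience is that every blocked context keeps a whole stack reserved, even though in this example only a few words are live across the wait.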

However, one could argue that 64 KiB is far more state than regular scheme ops/futexes/pipes/signal queues need, and that ContextStatus::Blocked could instead hold a state enum storing the local variables used at each wait point. I'm not sure how feasible this change would be in practice, but it would simplify context switching a lot. (Sidenote: although unnecessary given how simple most Redox syscall handlers are, async/await could be used to manage state across such wait points.)
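A minimal sketch of that idea, assuming hypothetical `WaitState`, `Event` and `resume` names (Redox's real `ContextStatus` and scheme/futex types differ): each wait point becomes an enum variant holding only the locals needed afterwards, and when the event arrives, `resume` runs the rest of the syscall on whatever stack the CPU is currently using.

```rust
// Hypothetical types for illustration only; not Redox's actual definitions.

/// Locals a syscall needs after each wait point, saved explicitly
/// instead of living on a per-context kernel stack.
#[derive(Debug, Clone, PartialEq)]
enum WaitState {
    /// Waiting for a scheme daemon to answer request `request_id`.
    SchemeReply { request_id: u64, user_buf_len: usize },
    /// Waiting on the futex word at user address `addr`.
    Futex { addr: usize },
}

#[derive(Debug, Clone, PartialEq)]
enum ContextStatus {
    Runnable,
    Blocked(WaitState),
    /// Hypothetical: syscall finished with this return value, which a real
    /// kernel would write into the context's saved user registers.
    Done(usize),
}

/// Events delivered to a blocked context.
enum Event {
    SchemeReply { request_id: u64, bytes: usize },
    FutexWake { addr: usize },
}

/// Finishes the syscall for a matching event; otherwise keeps the status as-is.
fn resume(status: ContextStatus, event: &Event) -> ContextStatus {
    match (status, event) {
        (
            ContextStatus::Blocked(WaitState::SchemeReply { request_id, user_buf_len }),
            Event::SchemeReply { request_id: id, bytes },
        ) if request_id == *id => ContextStatus::Done((*bytes).min(user_buf_len)),
        (ContextStatus::Blocked(WaitState::Futex { addr }), Event::FutexWake { addr: a })
            if addr == *a =>
        {
            ContextStatus::Done(0)
        }
        // Unrelated event (or not blocked): state is left untouched.
        (other, _) => other,
    }
}

fn main() {
    let blocked = ContextStatus::Blocked(WaitState::SchemeReply { request_id: 7, user_buf_len: 64 });
    // A reply to some other request leaves the context blocked, state intact.
    assert_eq!(resume(blocked.clone(), &Event::SchemeReply { request_id: 8, bytes: 10 }), blocked);
    // The matching reply finishes the syscall, returning min(100, 64) = 64.
    assert_eq!(resume(blocked, &Event::SchemeReply { request_id: 7, bytes: 100 }), ContextStatus::Done(64));
    let on_futex = ContextStatus::Blocked(WaitState::Futex { addr: 0x1000 });
    assert_eq!(resume(on_futex, &Event::FutexWake { addr: 0x1000 }), ContextStatus::Done(0));
    // A runnable context ignores events.
    assert_eq!(resume(ContextStatus::Runnable, &Event::FutexWake { addr: 0x1000 }), ContextStatus::Runnable);
    println!("saved state: {} bytes", std::mem::size_of::<ContextStatus>());
}
```

The saved state here is a few dozen bytes rather than a 64 KiB stack. The async/await alternative from the sidenote would have the compiler generate a state machine much like `WaitState` from straight-line handler code, instead of writing the enum by hand.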

I'm not sure if this will work before the ugly signal stack-switching code is removed. That probably needs to be fixed first.

Each context uses a few hundred bytes AFAIK; with the FXSAVE/FXRSTOR state added, round that up to the 4 KiB page size. Removing the kernel stack would thus reduce the kernel's memory usage per context from 64 + 4 KiB to 4 KiB, i.e. by a factor of 17. It would also reduce kernel UB (like storing kernel stack bytes in a regular Vec), and make the kernel more like a regular program: there would be a 1:1 mapping between CPUs and kernel stacks, so each CPU acts like what Rust calls a "thread", the only difference being the userspace pages. There will obviously be at least one stack per CPU (and probably more when using the x86_64 IST), but the number of stacks won't scale with the number of contexts.
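The arithmetic above, as a small sketch. The 1000-context, 8-CPU figures and the assumption that each per-CPU stack is also 64 KiB are illustrative, not measurements, and any extra IST stacks are ignored.

```rust
/// Per-context estimate from the text: a 64 KiB kernel stack plus one
/// 4 KiB page for the context struct and its FXSAVE area.
const KERNEL_STACK_KIB: u64 = 64;
const CONTEXT_PAGE_KIB: u64 = 4;
/// Assumption: each per-CPU stack in the single-stack design is also 64 KiB.
const PER_CPU_STACK_KIB: u64 = 64;

fn per_context_kib(with_stack: bool) -> u64 {
    CONTEXT_PAGE_KIB + if with_stack { KERNEL_STACK_KIB } else { 0 }
}

/// Total kernel memory for `contexts` contexts; the single-stack design
/// pays for `cpus` stacks instead of one stack per context.
fn total_kib(contexts: u64, cpus: u64, single_stack: bool) -> u64 {
    if single_stack {
        contexts * per_context_kib(false) + cpus * PER_CPU_STACK_KIB
    } else {
        contexts * per_context_kib(true)
    }
}

fn main() {
    // (64 + 4) / 4 = 17
    assert_eq!(per_context_kib(true) / per_context_kib(false), 17);
    // Illustrative machine: 1000 contexts, 8 CPUs.
    let before = total_kib(1000, 8, false); // 1000 * 68 = 68_000 KiB
    let after = total_kib(1000, 8, true); // 1000 * 4 + 8 * 64 = 4_512 KiB
    println!("{before} KiB -> {after} KiB");
}
```

The per-CPU term is fixed, so the savings grow with the number of contexts.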

This idea is called an "event-based kernel" in the seL4 literature (https://dl.acm.org/doi/10.1145/2517349.2522720).

Edited Aug 10, 2023 by Jacob Lorentzon