aarch64 boot on qemu fails with redoxfs panic
When booting on QEMU, redoxfs panics (see redoxfs-crash-log). `ELR_EL1` points at a BRK instruction. Decoding `ESR_EL1` with something like
```rust
println!(
    "ESR_EL1: {:>016X} ISS={:>06X} instr len={:?} class={:>02X}",
    self.esr_el1,
    self.esr_el1 & 0xffffff,   // ISS (low 24 bits shown; the full field is bits [24:0])
    self.esr_el1 >> 25 & 1,    // IL: instruction length (1 = 32-bit instruction)
    self.esr_el1 >> 26 & 0x3f, // EC: exception class
);
```
in `dump` reveals `ISS=000001 instr len=1 class=3C`. Exception class `0x3C` is "BRK instruction execution from AArch64 state".
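For reference, the field extraction can be checked in isolation. The input value below is synthetic (EC = 0x3C, IL = 1, ISS = 1), constructed to match the fields reported in the crash log; it is not the actual register dump.

```rust
// Decode the ESR_EL1 fields used above. Returns (ISS, IL, EC).
fn decode_esr(esr: u64) -> (u64, u64, u64) {
    let iss = esr & 0x1ff_ffff;  // ISS: bits [24:0]
    let il = (esr >> 25) & 1;    // IL: 1 means a 32-bit instruction
    let ec = (esr >> 26) & 0x3f; // EC: exception class
    (iss, il, ec)
}

fn main() {
    // Synthetic ESR value with the same fields as the crash log.
    let (iss, il, ec) = decode_esr(0xF200_0001);
    println!("ISS={:06X} instr len={} class={:02X}", iss, il, ec);
    // 0x3C: "BRK instruction execution from AArch64 state"
    assert_eq!((iss, il, ec), (1, 1, 0x3C));
}
```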
Manually dissecting the backtrace shows that the addresses correspond to the backtrace printing code, not the code that failed.
Early on, a fork happens, and the parent and child both panic. It looks like the child hits this in `src/bin/mount.rs`:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }', src/bin/mount.rs:373:39
And the parent has:
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /Users/wangenent/code/redox/mac/rust/library/alloc/src/collections/btree/navigate.rs:588:48
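For context, the child's error is the generic `std::io` error that `Read::read_exact` returns whenever the source runs out before the buffer is filled. A minimal reproduction, independent of redoxfs:

```rust
use std::io::{Cursor, ErrorKind, Read};

fn main() {
    // A 3-byte source, but read_exact asks for 8 bytes: std::io returns
    // ErrorKind::UnexpectedEof with the message "failed to fill whole
    // buffer" -- the same error the child unwraps on in mount.rs.
    let mut src = Cursor::new(vec![1u8, 2, 3]);
    let mut buf = [0u8; 8];
    let err = src.read_exact(&mut buf).unwrap_err();
    assert_eq!(err.kind(), ErrorKind::UnexpectedEof);
    println!("{}", err);
}
```

So the child most likely read a truncated or garbage structure, which fits the memory-corruption theory discussed in the chat.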
Some notes from conversations in the chat:
- @4lDO2 mentioned that problems in the btree are often caused by memory corruption.
- @microcolonel suggested checking the fork/context-switching code: audit the process-creation, fork, and context-switch code, and come up with tests that stress this path without depending on an initfs, a livedisk, etc.
- @rw_van suggested it's probably memory corruption or a race condition
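As a starting point for such tests, here is a minimal user-space sketch (illustrative only, not existing Redox test infrastructure) that stresses a shared `BTreeMap` from several threads to provoke frequent context switches while the tree is in intermediate states. On a correct kernel this must never panic or lose entries; the `navigate.rs` panic above suggests the tree's nodes were corrupted underneath it.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Spawn `threads` workers that each insert `per_thread` disjoint keys into
// a shared BTreeMap, re-reading every write, and return the final length.
fn stress(threads: u64, per_thread: u64) -> usize {
    let map = Arc::new(Mutex::new(BTreeMap::new()));
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                for i in 0..per_thread {
                    let k = t * per_thread + i;
                    map.lock().unwrap().insert(k, k);
                    // Immediately re-read: our own write must be visible.
                    assert_eq!(map.lock().unwrap().get(&k).copied(), Some(k));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let len = map.lock().unwrap().len();
    len
}

fn main() {
    let n = stress(8, 10_000);
    assert_eq!(n, 80_000); // no entry may be lost
    println!("ok: {} entries", n);
}
```

A companion test that forks repeatedly and verifies parent and child memory after the fork would cover the process-creation path specifically.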
An interesting observation is that if redoxfs is built with `-O0` via a custom profile, e.g.

```toml
[profile.will-debug]
inherits = "release"
opt-level = 0
```

then the problem doesn't manifest and Redox boots fine.