aarch64 boot on qemu fails with redoxfs panic
When booting on QEMU, redoxfs panics (see redoxfs-crash-log). `ELR_EL1` points at a BRK instruction. Decoding `ESR_EL1` with something like
```rust
println!(
    "ESR_EL1: {:>016X} ISS={:>06X} instr len={:?} class={:>02X}",
    self.esr_el1,
    self.esr_el1 & 0xffffff,   // ISS (low 24 bits shown; the full field is bits [24:0])
    self.esr_el1 >> 25 & 1,    // IL: instruction length (1 = 32-bit instruction)
    self.esr_el1 >> 26 & 0x3f, // EC: exception class
);
```
in `dump` reveals `ISS=000001 instr len=1 class=3C`. Exception class `0x3C` is "BRK instruction execution from AArch64 state".
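For reference, the field extraction can be checked in isolation. The input value below is synthetic (EC = 0x3C, IL = 1, ISS = 1), constructed to match the fields reported in the crash log; it is not the actual register dump.

```rust
// Decode the ESR_EL1 fields used above. Returns (ISS, IL, EC).
fn decode_esr(esr: u64) -> (u64, u64, u64) {
    let iss = esr & 0x1ff_ffff;  // ISS: bits [24:0]
    let il = (esr >> 25) & 1;    // IL: 1 means a 32-bit instruction
    let ec = (esr >> 26) & 0x3f; // EC: exception class
    (iss, il, ec)
}

fn main() {
    // Synthetic ESR value with the same fields as the crash log.
    let (iss, il, ec) = decode_esr(0xF200_0001);
    println!("ISS={:06X} instr len={} class={:02X}", iss, il, ec);
    // 0x3C: "BRK instruction execution from AArch64 state"
    assert_eq!((iss, il, ec), (1, 1, 0x3C));
}
```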
Manually dissecting the backtrace shows that the addresses correspond to the backtrace printing code, not the code that failed.
Early on, a fork happens, and the parent and child both panic. It looks like the child hits this in `src/bin/mount.rs`:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }', src/bin/mount.rs:373:39
And the parent has:
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /Users/wangenent/code/redox/mac/rust/library/alloc/src/collections/btree/navigate.rs:588:48
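For context, the child's error is the generic `std::io` error that `Read::read_exact` returns whenever the source runs out before the buffer is filled. A minimal reproduction, independent of redoxfs:

```rust
use std::io::{Cursor, ErrorKind, Read};

fn main() {
    // A 3-byte source, but read_exact asks for 8 bytes: std::io returns
    // ErrorKind::UnexpectedEof with the message "failed to fill whole
    // buffer" -- the same error the child unwraps on in mount.rs.
    let mut src = Cursor::new(vec![1u8, 2, 3]);
    let mut buf = [0u8; 8];
    let err = src.read_exact(&mut buf).unwrap_err();
    assert_eq!(err.kind(), ErrorKind::UnexpectedEof);
    println!("{}", err);
}
```

So the child most likely read a truncated or garbage structure, which fits the memory-corruption theory discussed in the chat.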
Some notes from conversations in the chat:
- @4lDO2 mentioned that problems in the btree are often caused by memory corruption.
- @microcolonel suggested checking the fork/context-switching code: audit the process-creation, fork, and context-switch code, and come up with tests that stress this path without depending on an initfs, a livedisk, etc.
- @rw_van suggested it's probably memory corruption or a race condition
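As a starting point for such tests, here is a minimal user-space sketch (illustrative only, not existing Redox test infrastructure) that stresses a shared `BTreeMap` from several threads to provoke frequent context switches while the tree is in intermediate states. On a correct kernel this must never panic or lose entries; the `navigate.rs` panic above suggests the tree's nodes were corrupted underneath it.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Spawn `threads` workers that each insert `per_thread` disjoint keys into
// a shared BTreeMap, re-reading every write, and return the final length.
fn stress(threads: u64, per_thread: u64) -> usize {
    let map = Arc::new(Mutex::new(BTreeMap::new()));
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                for i in 0..per_thread {
                    let k = t * per_thread + i;
                    map.lock().unwrap().insert(k, k);
                    // Immediately re-read: our own write must be visible.
                    assert_eq!(map.lock().unwrap().get(&k).copied(), Some(k));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let len = map.lock().unwrap().len();
    len
}

fn main() {
    let n = stress(8, 10_000);
    assert_eq!(n, 80_000); // no entry may be lost
    println!("ok: {} entries", n);
}
```

A companion test that forks repeatedly and verifies parent and child memory after the fork would cover the process-creation path specifically.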
An interesting observation is that if redoxfs is built with `-O0` via a custom profile, e.g.

```toml
[profile.will-debug]
inherits = "release"
opt-level = 0
```

then the problem doesn't manifest and Redox boots fine.