XHCI Hardware Issues Tracking
This ticket exists to track the current state of the bugs in the XHCI driver.
The behavior in this issue assumes the latest bug fixes in !199 (merged).
This differs from #29 in that it tracks observed buggy behaviors, rather than unimplemented features.
This draft MR will only be merged once the bugs are fully worked out. In the current version of the driver (without the MR), we do not respond to port status change events at all, which does not follow the specified behavior for enabling a USB device and can lead to some address device commands getting lost due to the device not being ready to receive them. This is fixed in the MR, but other issues were introduced.
The following erroneous behaviors are currently observed:
On All Systems
- No interrupt method currently works (MSI, MSI-X, or INTx). After the first TRB from a single device is received, no further interrupts arrive.
- Physically plugging or unplugging a device after boot will deadlock the system.
- The Cycle Bit in the Event TRB is always 1 at the time of the first received event. Logically, it should be zero because a full cycle has not yet completed. This may play into the lack of interrupts: the cycle bit tells the XHCI which TRBs to ignore as not ready or as belonging to the next cycle. If software and hardware are misaligned, the XHCI may stop responding to commands.
- The software currently places a link TRB that loops back to the start of the buffer as the first element. This also flips the cycle bit to 1. While this is technically valid per the spec, I suspect that some hardware may not expect it (the standard indicates that the cycle bit should be zero).
- With interrupts enabled, the first command TRB never receives a response. QEMU's XHCI trace indicates that no interrupt is ever generated, so the first event TRB should never be processed -- but currently it is.
- The EHB bit in ERDP is always set, even when no interrupt is pending. We need to figure out why this is happening.
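
To make the cycle-bit and EHB items above easier to reason about, here is a minimal sketch of how an event ring consumer is usually written. It is not the driver's code: `Trb`, `EventRing`, and `erdp_value` are hypothetical names, and the ring is assumed to be a single segment of zeroed memory. It also assumes, per my reading of the xHCI specification, that the consumer's expected cycle state starts at 1 and that EHB is bit 3 of ERDP and is cleared by writing 1 to it. Both assumptions bear directly on the observations above and should be checked against the spec.

```rust
/// One 16-byte TRB; the cycle bit is bit 0 of the last dword.
#[derive(Clone, Copy, Default)]
pub struct Trb {
    pub parameter: u64,
    pub status: u32,
    pub control: u32,
}

impl Trb {
    pub fn cycle(&self) -> bool {
        self.control & 1 != 0
    }
}

/// Software-side view of a single-segment event ring (hypothetical type).
pub struct EventRing {
    pub trbs: Vec<Trb>,
    pub dequeue: usize,
    /// Consumer Cycle State: the cycle-bit value that marks a TRB as valid.
    pub ccs: bool,
}

impl EventRing {
    pub fn new(len: usize) -> Self {
        // Assumption: zeroed TRBs plus ccs = true means "nothing valid yet".
        EventRing { trbs: vec![Trb::default(); len], dequeue: 0, ccs: true }
    }

    /// Returns the next event only if its cycle bit matches the expected state.
    pub fn next_event(&mut self) -> Option<Trb> {
        let trb = self.trbs[self.dequeue];
        if trb.cycle() != self.ccs {
            return None; // slot has not been written during this pass
        }
        self.dequeue += 1;
        if self.dequeue == self.trbs.len() {
            self.dequeue = 0;
            self.ccs = !self.ccs; // expected value flips on every wrap
        }
        Some(trb)
    }
}

/// Value to write back to ERDP after processing events: the new dequeue
/// pointer with bit 3 (EHB, assumed write-1-to-clear) set.
pub fn erdp_value(ring_base: u64, dequeue_index: usize) -> u64 {
    const EHB: u64 = 1 << 3;
    (ring_base + (dequeue_index as u64) * 16) | EHB
}
```

If the driver writes ERDP without setting bit 3, EHB would stay set under this assumption, which might be one thing to rule out for the last item.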
In QEMU
- The USB spec indicates that both USB2 and USB3 ports should send a port status change event the moment they detect a connected device. Under QEMU, the ports do not send this event until all of them have been reset. USB2 ports should only be reset after the first port status change is received, to force them into the enabled state. USB3 ports should never be reset, as they advance to enabled automatically. This seems to be a known QEMU issue that was never fixed.
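
The expected per-port handling described above can be sketched as a small decision function. This is an illustration only: `Protocol`, `PortAction`, and `on_port_status_change` are hypothetical names, the protocol is assumed to be known from elsewhere (it is not encoded in PORTSC), and the bit positions for CCS, PED, and PR are my reading of the PORTSC register layout.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Protocol {
    Usb2,
    Usb3,
}

#[derive(Debug, PartialEq)]
pub enum PortAction {
    /// Nothing to do yet (no device, reset in progress, or USB3 still training).
    Ignore,
    /// Issue a port reset (USB2 only, after the first status change).
    Reset,
    /// Port is enabled: continue with slot enable and address device.
    Enumerate,
}

// Assumed PORTSC bit positions.
const CCS: u32 = 1 << 0; // Current Connect Status
const PED: u32 = 1 << 1; // Port Enabled/Disabled
const PR: u32 = 1 << 4; // Port Reset in progress

/// Decide what to do when a port status change event arrives for a port.
pub fn on_port_status_change(protocol: Protocol, portsc: u32) -> PortAction {
    if portsc & CCS == 0 {
        return PortAction::Ignore; // nothing connected
    }
    if portsc & PED != 0 {
        return PortAction::Enumerate; // already enabled
    }
    if portsc & PR != 0 {
        return PortAction::Ignore; // wait for the reset to complete
    }
    match protocol {
        Protocol::Usb2 => PortAction::Reset, // USB2 needs a reset to reach enabled
        Protocol::Usb3 => PortAction::Ignore, // USB3 should enable on its own
    }
}
```

Under QEMU the first status change never arrives without resetting the ports, so a workaround there would have to call something like this without waiting for the event.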
On x86 Desktop Hardware
- With polling enabled, all of the currently attached devices can be enumerated, but the moment you plug in a device the driver deadlocks and neither sends further commands nor receives further TRBs. A lock shared somewhere between the IRQ thread and the Device Enumerator thread is causing this.
- The first one or two devices sometimes get timeout error codes during enumeration as a response to the slot enable request.
- The Linux XHCI driver has several quirks for specific controllers (the Intel one being of particular note to us) that we do not currently implement. These may explain some of the buggy behavior.
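
One possible way to locate the lock shared by the IRQ thread and the Device Enumerator thread is to temporarily replace blocking `lock()` calls with a helper that gives up and reports after a timeout. The sketch below uses only `std::sync::Mutex`; `lock_or_report` and its `name` parameter are hypothetical, and the driver's real lock types may differ.

```rust
use std::sync::{Mutex, MutexGuard, TryLockError};
use std::thread;
use std::time::{Duration, Instant};

/// Try to take `lock` for up to `timeout`. On failure, return a message that
/// names the lock and the waiting thread instead of blocking forever.
pub fn lock_or_report<'a, T>(
    name: &str,
    lock: &'a Mutex<T>,
    timeout: Duration,
) -> Result<MutexGuard<'a, T>, String> {
    let deadline = Instant::now() + timeout;
    loop {
        match lock.try_lock() {
            Ok(guard) => return Ok(guard),
            // A poisoned lock is still usable for debugging purposes.
            Err(TryLockError::Poisoned(poisoned)) => return Ok(poisoned.into_inner()),
            Err(TryLockError::WouldBlock) => {
                if Instant::now() >= deadline {
                    return Err(format!(
                        "lock `{}` still held after {:?} (waiter: {:?})",
                        name,
                        timeout,
                        thread::current().id()
                    ));
                }
                thread::sleep(Duration::from_millis(1));
            }
        }
    }
}
```

Logging the returned message from both threads should show which named lock each one is stuck on when a device is plugged in.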
I intend to build out a debug command interface for interacting with the driver to get more detailed debug information, and I have ordered a JTAG-compatible x86 SBC to begin looking into this.