Potential Issue: Xilinx I2C Polled Write Functions Freeze on Slave NACK
While working on the LIDAR, I noticed what might be a significant flaw in the Xilinx I2C polling functions we are currently using for both the IMU and the LIDAR sensor. As currently implemented, the polling methods appear to get stuck in an infinite loop when the target device doesn’t respond (i.e. a NACK shows up on the bus). I originally saw this issue whenever I issued a reset command to the LIDAR and then tried to execute another command right away.
Example Image: The first 3 bytes are the reset command, then we can see a NACK (high 9th bit) on the first byte of the 2nd command. At the time I captured this image, the code execution would freeze in an infinite loop.
It turns out that the LIDAR is effectively out of commission while it resets internally, which takes about 13 ms total. In this particular case, I was able to circumvent the issue by putting a 15 ms delay after the reset.
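As a side note, the same 15 ms guard could also be expressed without a blocking delay, which might matter inside a control loop. This is only a minimal sketch, not what I actually did: the function name, the constant, and the idea of a free-running millisecond counter are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed guard time: ~13 ms observed reset time plus a small margin. */
#define LIDAR_RESET_GUARD_MS 15u

/* Hypothetical helper: returns true once at least LIDAR_RESET_GUARD_MS
 * have elapsed since the reset command was sent. Both arguments are
 * readings of a free-running millisecond counter. The unsigned
 * subtraction stays correct across a single wraparound of that counter. */
bool lidar_ready(uint32_t now_ms, uint32_t reset_sent_ms)
{
    return (uint32_t)(now_ms - reset_sent_ms) >= LIDAR_RESET_GUARD_MS;
}
```

The caller would record the counter value when it sends the reset and skip LIDAR commands until `lidar_ready` returns true.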
However, if I’m diagnosing the situation correctly, this is a critical issue. If any device on our I2C bus responds with a NACK during flight, we will lose all control over the quad, and it will just fall to the ground (it sounds like we have kill switches for stale motor outputs to at least keep the quad from flying away).
I tried looking further into the issue, and it appears the Xilinx polling function for I2C writes only checks for the completion of the data transfer, so until all the data is sent, it just loops, which would be forever if the first byte was NACKed. It looks like there are interrupts to handle the NACK case, but interrupts are disabled for this Xilinx polling function. Looking at the TRM, it seems like the Xilinx polling function is missing the step to clear the FIFO in the case of a NACK:
"If at any point the slave responds with a NACK, the transfer automatically terminates and a transfer NACK interrupt is generated (the NACK bit set). When a NACK is received the Transfer Size register indicates the number of bytes that still need to be sent minus one. Unless the very last byte written by the host into the FIFO was a NACK byte, TXDV remains High. In this case, the host must clear the FIFO by setting the CLR_FIFO bit in the Control register."
So at the moment, I see two potential options: we can either copy the Xilinx polling I2C write functions and add the ability to set the CLR_FIFO bit in the event of a NACK, or we can use an interrupt-driven I2C write where we would have to handle the FIFO ourselves.
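To make the first option a bit more concrete, here is a rough sketch of a polling loop that also watches for the NACK flag and clears the FIFO, as the TRM passage describes. It is not the Xilinx driver code: the register struct, the bit positions, the FIFO depth, the poll limit, and every name below are assumptions for illustration and need to be checked against the TRM and the driver headers. It also leaves out the slave address and transfer size setup, and it only handles messages that fit in one FIFO load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout and bit positions, for illustration only.
 * Check the real offsets and masks against the TRM before using them. */
struct i2c_regs {
    uint32_t cr;    /* control register */
    uint32_t isr;   /* interrupt status register */
    uint32_t data;  /* write port of the TX FIFO */
};

#define CR_CLR_FIFO (1u << 6)  /* assumed position of the CLR_FIFO bit */
#define ISR_COMP    (1u << 0)  /* assumed: transfer complete flag */
#define ISR_NACK    (1u << 2)  /* assumed: slave NACK flag */
#define FIFO_DEPTH  16u        /* assumed FIFO size in bytes */
#define MAX_POLLS   100000u    /* arbitrary bound so the loop always exits */

enum i2c_status {
    I2C_OK      = 0,
    I2C_NACK    = -1,
    I2C_TIMEOUT = -2,
    I2C_BAD_LEN = -3
};

enum i2c_status i2c_polled_write(volatile struct i2c_regs *regs,
                                 const uint8_t *buf, size_t len)
{
    assert(regs != NULL && buf != NULL);

    if (len == 0 || len > FIFO_DEPTH)
        return I2C_BAD_LEN;          /* sketch only handles one FIFO load */

    for (size_t i = 0; i < len; i++)
        regs->data = buf[i];         /* queue the bytes for transmission */

    for (uint32_t polls = 0; polls < MAX_POLLS; polls++) {
        uint32_t status = regs->isr;

        if (status & ISR_NACK) {
            /* Transfer terminated early: flush the unsent bytes. */
            regs->cr |= CR_CLR_FIFO;
            return I2C_NACK;
        }
        if (status & ISR_COMP)
            return I2C_OK;
    }

    /* Neither flag showed up in time: flush and report it. */
    regs->cr |= CR_CLR_FIFO;
    return I2C_TIMEOUT;
}
```

The key difference from a completion-only loop is that the NACK case returns an error to the caller instead of spinning, and the poll limit gives a second way out if neither flag is ever set. A real version would presumably also need to clear the status flags after reading them.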
I’d still say I’m pretty new to all of this, so if anyone has any suggestions/corrections, I’d greatly appreciate it!