[PATCH v2] PCI: pciehp: Clear LBMS on hot-remove to prevent link speed reduction

From: Smita Koralahalli
Date: Thu May 16 2024 - 16:48:11 EST


Clear Link Bandwidth Management Status (LBMS) if set, on a hot-remove
event.

A hot-remove event can result in a target link speed reduction if LBMS
is set, because the Presence Detect State Change (PDSC) event can be
delayed until after the Data Link Layer State Change (DLLSC) event.

In reality, PDSC and DLLSC events rarely arrive simultaneously. The PDSC
can arrive so late that the slot has already been powered down by the
DLLSC event alone. The delayed PDSC can then be falsely interpreted as an
interrupt raised to turn the slot on. Powering the slot on without a link
forces the kernel, if LBMS is set, to retrain the link at a lower speed in
an attempt to re-establish it, thereby bringing down the link speed [2].

From PCIe r6.2 sec 7.5.3.8 [1] it can be derived that LBMS cannot be set
for an unconnected link and that, if set, it indicates that there is
actually a device down an inactive link. However, hardware could have
already set LBMS while the device was connected to the port, i.e. when
the state was DL_Up or DL_Active. Some hardware may even have attempted a
retrain, going into recovery mode, just before transitioning to DL_Down.

Thus the LBMS bit, once set, is never cleared and might cause software to
reduce the link speed when there is no link [2].
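
The reasoning above can be illustrated with a small userspace model. This
is only a sketch, not kernel code: model_lnksta_write() and
model_would_force_retrain() are hypothetical helpers, and the retrain
condition is an assumption about how the quirk in [2] decides (LBMS set
while Data Link Layer Link Active is clear). The bit values mirror
include/uapi/linux/pci_regs.h; LBMS is a write-1-to-clear (RW1C) bit, so
writing only PCI_EXP_LNKSTA_LBMS clears it and leaves the other bits alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link Status bit values, mirroring include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active (RO) */
#define PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status (RW1C) */
#define PCI_EXP_LNKSTA_LABS	0x8000	/* Link Autonomous Bandwidth Status (RW1C) */

#define LNKSTA_RW1C_MASK	(PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS)

/*
 * Hypothetical model of a write to the Link Status register: RW1C bits
 * written as 1 are cleared, RW1C bits written as 0 keep their value, and
 * read-only bits are unaffected by the write.
 */
static uint16_t model_lnksta_write(uint16_t lnksta, uint16_t val)
{
	return (uint16_t)(lnksta & ~(val & LNKSTA_RW1C_MASK));
}

/*
 * Hypothetical model of the decision assumed for the quirk in [2]:
 * retrain at a reduced speed when LBMS is set but the link is not active.
 */
static bool model_would_force_retrain(uint16_t lnksta)
{
	return (lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
	       PCI_EXP_LNKSTA_LBMS;
}
```

In this model, a register value left with a stale LBMS after hot-remove
satisfies model_would_force_retrain(), while the value returned by
model_lnksta_write(lnksta, PCI_EXP_LNKSTA_LBMS) does not.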

Dmesg before:
pcieport 0000:20:01.1: pciehp: Slot(59): Link Down
pcieport 0000:20:01.1: pciehp: Slot(59): Card present
pcieport 0000:20:01.1: broken device, retraining non-functional downstream link at 2.5GT/s
pcieport 0000:20:01.1: retraining failed
pcieport 0000:20:01.1: pciehp: Slot(59): No link

Dmesg after:
pcieport 0000:20:01.1: pciehp: Slot(59): Link Down
pcieport 0000:20:01.1: pciehp: Slot(59): Card present
pcieport 0000:20:01.1: pciehp: Slot(59): No link

[1] PCI Express Base Specification Revision 6.2, Jan 25 2024.
https://members.pcisig.com/wg/PCI-SIG/document/20590
[2] Commit a89c82249c37 ("PCI: Work around PCIe link training failures")

Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>
---
Link to v1:
https://lore.kernel.org/all/20240424033339.250385-1-Smita.KoralahalliChannabasappa@xxxxxxx/

v2:
Cleared LBMS unconditionally. (Ilpo)
Added Fixes tag. (Lukas)
---
drivers/pci/hotplug/pciehp_pci.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index ad12515a4a12..dae73a8932ef 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -134,4 +134,7 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
 	}
 
 	pci_unlock_rescan_remove();
+
+	pcie_capability_write_word(ctrl->pcie->port, PCI_EXP_LNKSTA,
+				   PCI_EXP_LNKSTA_LBMS);
 }
--
2.17.1