Re: [PATCH] [thermal] adding check if the thermal firmware is running

From: Wysocki, Rafael J
Date: Fri May 17 2024 - 06:21:11 EST



On 5/16/2024 10:05 PM, Guilherme Giacomo Simoes wrote:
in the dmesg is showing the message "failed to read out thermal zone"
as if the temperature read is failed by don't find the thermal zone.

After researching and debugging, I see that this specific error is
occurrence because the thermal try read the temperature when is started,
but the firmware is not running yet.

For more legibiliti i create the NOTLOAD error code in the errno-base.h,
and in the iwl_mvm_tzone_get_temp() on tt.c i check if firmware is
running and I set the NOTLOAD code for ret variable and goto out.

After this, in the update_temperature() in the thermal_code.c i received
the return of thermal_zone_get_temp() and check if return is NOTLOAD,
because if it is, I print the warning message "firmware yet not load"
and return for caller

The thermal_core.c i think that is generic for any thermal drivers and
not only used for tt.c of course.
But if this ipotetic driver not check if firmware is running before read
the temperature, the thermal_code.c is work as a before this change.

After this change, in my computer I compile and install kernel in /boot
and in my dmesg the message "failed to read out thermal zone" is not
show any more. In your place the warning messafe "Firmware yet not
load" is showing.

I would like to thank you in advance for any contribution, suggestion
or criticism of my patch suggestion.
---
drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 10 ++++++++--
drivers/thermal/thermal_core.c | 10 +++++++---
include/uapi/asm-generic/errno-base.h | 1 +
3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
index 8083c4b2ab6b..dd5725db06d2 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
@@ -620,8 +620,14 @@ static int iwl_mvm_tzone_get_temp(struct thermal_zone_device *device,
mutex_lock(&mvm->mutex);
- if (!iwl_mvm_firmware_running(mvm) ||
- mvm->fwrt.cur_fw_img != IWL_UCODE_REGULAR) {
+ const int res = iwl_mvm_firmware_running(mvm);
+
+ if (!res) {
+ ret = -NOTLOAD;
+ goto out;
+ }
+
+ if (mvm->fwrt.cur_fw_img != IWL_UCODE_REGULAR) {
ret = -ENODATA;
goto out;
}
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 34a31bc72023..4116d312d4a1 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -414,10 +414,14 @@ static void handle_thermal_trip(struct thermal_zone_device *tz,
static void update_temperature(struct thermal_zone_device *tz)
{
- int temp, ret;
-
- ret = __thermal_zone_get_temp(tz, &temp);
+ int temp;
+ int ret = __thermal_zone_get_temp(tz, &temp);
if (ret) {
+ if (ret == -NOTLOAD) {
+ pr_warn("Firmware yet not load");
+ return;
+ }
+

The thermal core doesn't need to be modified for this.

Please print the new message from the driver and you may as well return -EAGAIN from it in all cases when the issue is expected to be intermittent to prevent the core from printing the (existing) warning message.

Thanks!


if (ret != -EAGAIN)
dev_warn(&tz->device,
"failed to read out thermal zone (%d)\n",
diff --git a/include/uapi/asm-generic/errno-base.h b/include/uapi/asm-generic/errno-base.h
index 9653140bff92..8b92c41f7993 100644
--- a/include/uapi/asm-generic/errno-base.h
+++ b/include/uapi/asm-generic/errno-base.h
@@ -36,5 +36,6 @@
#define EPIPE 32 /* Broken pipe */
#define EDOM 33 /* Math argument out of domain of func */
#define ERANGE 34 /* Math result not representable */
+#define NOTLOAD 35 /* Firmware yet not load */
#endif