Re: [PATCH 2/4] acpi/ghes, efi/cper: Recognize and process CXL Protocol Errors.

From: Dave Jiang
Date: Thu May 23 2024 - 18:51:55 EST




On 5/23/24 2:19 PM, Smita Koralahalli wrote:
> Hi Dave,
>
> On 5/22/2024 10:59 AM, Dave Jiang wrote:
>>
>>
>> On 5/22/24 8:08 AM, Smita Koralahalli wrote:
>>> UEFI v2.10 section N.2.13 defines a CPER record for CXL Protocol errors.
>>>
>>> Add GHES support to detect CXL CPER Protocol Error Record and Cache Error
>>> Severity, Device ID, Device Serial number and CXL RAS capability struct in
>>> struct cxl_cper_prot_err. Include this struct as a member of struct
>>> cxl_cper_work_data.
>>>
>>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>
>>> ---
>>>   drivers/acpi/apei/ghes.c        | 10 +++++
>>>   drivers/firmware/efi/cper_cxl.c | 66 +++++++++++++++++++++++++++++++++
>>>   include/linux/cxl-event.h       | 26 +++++++++++++
>>>   3 files changed, 102 insertions(+)
>>>
>
> [snip]
>
>
>>> +     * The device ID or agent address is required for CXL RCD, CXL
>>> +     * SLD, CXL LD, CXL Fabric Manager Managed LD, CXL Root Port,
>>> +     * CXL Downstream Switch Port and CXL Upstream Switch Port.
>>> +     */
>>> +    if (prot_err->agent_type <= 0x7 && prot_err->agent_type != RCH_DP) {
>>
>> Perhaps define an enum value CXL_AGENT_TYPE_MAX instead of the 0x7 magic number? Otherwise, if a new type is introduced, it would break this code.
>
> Agreed. I will define a boolean array indexed by agent type, as suggested by Alison. That would avoid all these comparisons and the risk of breaking the code in the future.
>
>>  
>>> +        p_err->segment = prot_err->agent_addr.segment;
>>> +        p_err->bus = prot_err->agent_addr.bus;
>>> +        p_err->device = prot_err->agent_addr.device;
>>> +        p_err->function = prot_err->agent_addr.function;
>>> +    } else {
>>> +        pr_err(FW_WARN "Invalid agent type\n");
>>> +        return -EINVAL;
>>> +    }
>>
>> Up to you if you want to do this or not, but maybe:
>>
>>     if (prot_err->agent_type >= CXL_AGENT_TYPE_MAX || prot_err->agent_type == RCH_DP) {
>>         pr_warn(...);
>>         return -EINVAL;
>>     }
>>
>>     p_err->segment = ...;
>>     p_err->bus = ...;
>
> Noted.
>
>>     ...
>>
>> Although perhaps a helper function cxl_cper_valid_agent_type() that checks invalid agent type by checking the valid_bits, the agent_type boundary, and if agent_type != RCH_DP?
>
> Okay.
>
>>> +
>>> +    if (!(prot_err->valid_bits & PROT_ERR_VALID_ERROR_LOG)) {
>>> +        pr_err(FW_WARN "Invalid Protocol Error log\n");
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    dvsec_start = (u8 *)(prot_err + 1);
>>> +    cap_start = dvsec_start + prot_err->dvsec_len;
>>> +    p_err->cxl_ras = *(struct cxl_ras_capability_regs *)cap_start;
>>> +
>>> +    /*
>>> +     * Set device serial number unconditionally.
>>> +     *
>>> +     * Print a warning message if it is not valid. The device serial
>>> +     * number is required for CXL RCD, CXL SLD, CXL LD and CXL Fabric
>>> +     * Manager Managed LD.
>>> +     */
>>> +    if (!(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER) ||
>>> +          prot_err->agent_type > 0x4 || prot_err->agent_type == RCH_DP)
>>
>> prot_err->agent_type > FM_LD? Although maybe it would be a clearer read if a helper function is defined to identify the agent types such as cxl_cper_prot_err_serial_needed() or cxl_cper_prot_agent_type_device() and with it a switch statement to explicitly identify all the agent types that require serial number. If a future device is defined, the > 0x4 logic may break.
>
> A helper function is probably not required if the boolean array is defined? What do you think?

That works for me. My main concern is making the code clearer and removing the possibility of breakage from future changes.
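
For what it's worth, here is a rough userspace sketch of how I read the boolean array idea. Only RCH_DP and FM_LD are names used in this thread; the other enumerators, the two array names and the two helper names are made up for illustration. The numeric values are inferred from the 0x4 and 0x7 comparisons in the patch, so please check them against the spec rather than trusting this.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Agent types. Only RCH_DP and FM_LD are named in this thread; the
 * rest are hypothetical names, and the ordering is an assumption
 * derived from the "> 0x4" and "<= 0x7" checks in the patch.
 */
enum cxl_agent_type {
	RCD,			/* CXL RCD */
	RCH_DP,			/* RCH downstream port */
	SLD,			/* CXL SLD */
	LD,			/* CXL LD */
	FM_LD,			/* Fabric Manager managed LD */
	RP,			/* CXL Root Port */
	DSP,			/* CXL Downstream Switch Port */
	USP,			/* CXL Upstream Switch Port */
	CXL_AGENT_TYPE_MAX,	/* must stay last */
};

/* Tie the enum to the magic numbers it is meant to replace. */
static_assert(FM_LD == 0x4 && USP == 0x7,
	      "enum does not match the assumed agent type encoding");

/*
 * Entries that are not listed are zero-initialized, i.e. false, so
 * RCH_DP and any type added later default to "not required".
 */
static const bool agent_has_sbdf[CXL_AGENT_TYPE_MAX] = {
	[RCD] = true, [SLD] = true, [LD] = true, [FM_LD] = true,
	[RP] = true, [DSP] = true, [USP] = true,
};

static const bool agent_needs_serial[CXL_AGENT_TYPE_MAX] = {
	[RCD] = true, [SLD] = true, [LD] = true, [FM_LD] = true,
};

/* The bounds check keeps an out-of-range firmware value off the table. */
static bool cxl_agent_has_sbdf(unsigned int type)
{
	return type < CXL_AGENT_TYPE_MAX && agent_has_sbdf[type];
}

static bool cxl_agent_needs_serial(unsigned int type)
{
	return type < CXL_AGENT_TYPE_MAX && agent_needs_serial[type];
}
```

With something along these lines, both call sites reduce to a single lookup, and a new agent type only means adding an enumerator before CXL_AGENT_TYPE_MAX and, if needed, a table entry.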
>
> Thanks,
> Smita
>
> [snip]