Re: [PATCH v1 03/31] x86/resctrl: Move ctrlval string parsing policy away from the arch code

From: James Morse
Date: Thu May 23 2024 - 14:04:35 EST


Hi Reinette, Dave,

On 18/04/2024 06:34, Reinette Chatre wrote:
> On 4/16/2024 9:16 AM, Dave Martin wrote:
>> On Mon, Apr 15, 2024 at 10:44:34AM -0700, Reinette Chatre wrote:
>>> On 4/12/2024 9:16 AM, Dave Martin wrote:
>>>> On Mon, Apr 08, 2024 at 08:14:47PM -0700, Reinette Chatre wrote:
>>>>> On 3/21/2024 9:50 AM, James Morse wrote:
>>>
>>>>>> @@ -195,6 +204,14 @@ int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
>>>>>>  	return 0;
>>>>>>  }
>>>>>>
>>>>>> +static ctrlval_parser_t *get_parser(struct rdt_resource *res)
>>>>>> +{
>>>>>> +	if (res->fflags & RFTYPE_RES_CACHE)
>>>>>> +		return &parse_cbm;
>>>>>> +	else
>>>>>> +		return &parse_bw;
>>>>>> +}
>>>>>
>>>>> This is borderline ... at minimum it expands what fflags means and how it
>>>>> is intended to be used and that needs to be documented because it reads:
>>>>>
>>>>> * @fflags: flags to choose base and info files

Hmm, true, this is used today to select which groups of files appear.


>>>>> I am curious why you picked fflags instead of an explicit check against
>>>>> rid?

Simply because an rid check would need to match both L2 and L3 to parse_cbm(), and I didn't
think that would scale if other cache resources get added. However, with an enum of types
we can get the compiler to bark if an entry is needed here, which is probably good enough.
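
A minimal sketch of that enum-plus-switch idea, for illustration only. The
enum, its values, and the simplified parser signature are hypothetical
stand-ins and are not taken from the series; the real parse_cbm() and
parse_bw() take different arguments. The point is the missing 'default:'
label, which lets -Wswitch (enabled by -Wall in gcc and clang) warn when a
new enumerator is added without a matching case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource types; the names are illustrative only. */
enum demo_res_type {
	DEMO_RES_L2,
	DEMO_RES_L3,
	DEMO_RES_MBA,
	DEMO_RES_SMBA,
};

/* Simplified parser signature; the kernel's ctrlval_parser_t differs. */
typedef int (demo_parser_t)(const char *buf);

/* Stub parsers: they only check that a buffer was supplied. */
static int demo_parse_cbm(const char *buf) { return buf ? 0 : -1; }
static int demo_parse_bw(const char *buf) { return buf ? 0 : -1; }

/*
 * No 'default:' label on purpose: with -Wswitch, adding a new
 * enumerator without a case here produces a compiler warning.
 */
static demo_parser_t *demo_get_parser(enum demo_res_type type)
{
	switch (type) {
	case DEMO_RES_L2:
	case DEMO_RES_L3:
		return &demo_parse_cbm;
	case DEMO_RES_MBA:
	case DEMO_RES_SMBA:
		return &demo_parse_bw;
	}

	return NULL;	/* only reached for out-of-range values */
}
```

Both cache levels share one parser here, which is the "match both L2 and L3"
cost mentioned above; the compiler warning is what keeps that list honest as
resources are added.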

more background: {
In the arm world the cache hierarchy isn't something we can reason about. We
have funny names for where different things converge (Point of Coherency,
Point of Unification etc), but it's up to the platform designer if/where the
L2/L3 or even L9 live. The cache topology is fed to the kernel via an ACPI
table.

I anticipate a 'System Cache' resource and schema eventually being added to
resctrl as it looks to be a popular hardware design. These system caches live
after the L3 (if there is one).
}


>>>> Is fflags already somewhat overloaded? There seem to be a mix of things
>>>> that are independent Boolean flags, while other things seem mutually
>>>> exclusive or enum-like.
>>>>
>>>> Do we expect RFTYPE_RES_CACHE | RFTYPE_RES_MB ever to make sense,
>>>> as David points out?
>>>>
>>>>
>>>> With MPAM, we could in theory have cache population control and egress
>>>> memory bandwidth controls on a single interconnect component.
>>>>
>>>> If that would always be represented through resctrl as two components
>>>> with the MB controls considered one level out from the CACHE controls,
>>>> then I guess these control types remain mutually exclusive from
>>>> resctrl's point of view.
>>>>
>>>> Allowing a single rdt_resource to sprout multiple control types looks
>>>> more invasive in the code, even if it logically makes sense in terms of
>>>> the hardware.

MPAM allows this, but it doesn't fit with resctrl. The MPAM driver's resctrl glue code has
a load of stuff to present these as separate resources to resctrl, even if they are the
same piece of hardware underneath.

So far it looks possible to hide this, so I don't think it's worth changing resctrl's
behaviour to try and cover this.

RFTYPE_RES_CACHE and RFTYPE_RES_MB would remain mutually-exclusive.


>>>> Anyway, for this patch, there seem to be a couple of assumptions:
>>>>
>>>> a) get_parser() doesn't get called except for rdt_resources that
>>>> represent resource controls (so, fflags = RFTYPE_RES_foo for some "foo",
>>>> with no other flags set), and
>>>>
>>>> b) there are exactly two kinds of "foo", so whatever isn't a CACHE is
>>>> a BW.
>>>>
>>>> These assumptions seem to hold today (?)
>>>
>>> (c) the parser for user provided data is based on the resource type.
>>>
>>> As I understand (c) may not be true for MPAM that supports different
>>> partitioning controls for a single resource. For example, for a cache
>>> MPAM supports portion as well as maximum capacity controls that
>>> I expect would need different parsers (perhaps mapping to different
>>> schemata entries?) from user space but will be used to control the
>>> same resource.

Exactly - to maintain compatibility with existing software the driver has to present it as
a totally new thing. I guess it will look something like this:
| L3:0=0xffff;1=0xffff;
| L3_CAP:0=1048576;1=1048576

Where existing software knows about 'L3', and should ignore 'L3_CAP'.
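
To illustrate the "should ignore" part from the user-space side, here is a
small sketch of how a tool might decide whether to act on a schemata line.
Everything in it is hypothetical: the list of known names, the helper name,
and the assumption that the resource name is whatever precedes the first ':'
(with leading blanks skipped in case names are padded for alignment).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Resource names this hypothetical tool understands. */
static const char * const demo_known[] = { "L2", "L3", "MB" };

/*
 * Return true if the schemata line's resource name (the text before
 * the first ':') exactly matches a name the tool knows. Lines for
 * anything else, such as a newer 'L3_CAP', are meant to be skipped.
 */
static bool demo_schema_is_known(const char *line)
{
	const char *colon;
	size_t len, i;

	/* Skip leading blanks in case the name is padded. */
	while (*line == ' ' || *line == '\t')
		line++;

	colon = strchr(line, ':');
	if (!colon)
		return false;

	len = (size_t)(colon - line);
	for (i = 0; i < sizeof(demo_known) / sizeof(demo_known[0]); i++) {
		/* Compare lengths too, so "L3" does not match "L3_CAP". */
		if (strlen(demo_known[i]) == len &&
		    !strncmp(line, demo_known[i], len))
			return true;
	}

	return false;
}
```

The exact-length comparison matters: a prefix match on "L3" would wrongly
claim the 'L3_CAP' line and try to parse a byte count as a bitmap.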


>>> I do not know if the goal is to support this MPAM capability via
>>> resctrl but to accomplish this I wonder if it may not be more appropriate
>>> to associate the parser with the schema entry that is presented to user space.

Even better.

For Tony's resctrl2 I had mused on exposing to user-space whether the controls were a
bitmap/percentage/MBps-value/raw-number. As there is a parser for the first two (or three)
today I think keying these from something in the schemata makes the most sense.


>>>> But the semantics of fflags already look a bit complicated, so I can
>>>> see why it might be best to avoid anything that may add more
>>>> complexity.
>>>
>>> ack.
>>>
>>>> If the main aim is to avoid silly copy-paste errors when coding up
>>>> resources for a new arch, would it make sense to go for a more low-
>>>> tech approach and just bundle up related fields in a macro?
>>>
>>> I understand this as more than avoiding copy-paste errors. I understand
>>> the goal is to prevent architectures from having architecture specific
>>> parsers.

[...]

>>> You do highlight another point though, shouldn't the fs code own the
>>> format_str also? I do not think we want arch code to control the
>>> print format, this is also something that should be consistent between
>>> all archs and owned by fs code, again perhaps more appropriate for
>>> a schema entry.

Good point ... I've still got that as a "TODO: kill these properties off as they are
derivatives" in the MPAM code.

I agree they should live together. We can pull in data_width too, as it is calculated
based on the format used here.
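
A rough sketch of what "derivative" means here: both the print format and
the field width can be computed from the control format plus the largest
value. The enum and helper names are hypothetical, the format strings are
what I believe x86 uses today for cache and bandwidth resources, and the
digit-counting width is a generic stand-in rather than the kernel's own
calculation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical control formats; names are illustrative only. */
enum demo_schema_fmt {
	DEMO_SCHEMA_BITMAP,
	DEMO_SCHEMA_PERCENTAGE,
	DEMO_SCHEMA_MBPS,
};

/* printf format for one "domain=value" entry; no 'default:' on purpose. */
static const char *demo_format_str(enum demo_schema_fmt fmt)
{
	switch (fmt) {
	case DEMO_SCHEMA_BITMAP:
		return "%d=%0*x";	/* zero-padded hex */
	case DEMO_SCHEMA_PERCENTAGE:
	case DEMO_SCHEMA_MBPS:
		return "%d=%*u";	/* space-padded decimal */
	}

	return NULL;	/* only reached for out-of-range values */
}

/* Number of digits needed to print val in the given base. */
static int demo_count_digits(unsigned int val, unsigned int base)
{
	int digits = 1;

	while (val >= base) {
		val /= base;
		digits++;
	}

	return digits;
}

/* Field width needed to print the largest value of this format. */
static int demo_data_width(enum demo_schema_fmt fmt, unsigned int max_val)
{
	switch (fmt) {
	case DEMO_SCHEMA_BITMAP:
		return demo_count_digits(max_val, 16);
	case DEMO_SCHEMA_PERCENTAGE:
	case DEMO_SCHEMA_MBPS:
		return demo_count_digits(max_val, 10);
	}

	return 0;	/* only reached for out-of-range values */
}
```

With helpers along these lines owned by the fs code, the arch code would
only need to report the format and the maximum value.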

Moving default_ctrl is tricky as on AMD platforms the {S,}MBA default value is discovered
from cpuid. But it only makes sense for an architecture to provide this for MBps controls
- bitmaps and percentages have an obvious maximum/default value. Putting that in struct
resctrl_membw as 'max_bw' makes bw_validate()'s use of it clearer.
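
As a sketch of the "obvious default" argument, the default could be derived
like this, with the architecture supplying a value only in the MBps case.
The enum, the helper, and its parameters are hypothetical; treating 100 as
the percentage default and "all bits set" as the bitmap default is my
reading of "obvious", not something taken from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical control types; names are illustrative only. */
enum demo_ctrl_type {
	DEMO_CTRL_BITMAP,
	DEMO_CTRL_PERCENTAGE,
	DEMO_CTRL_MBPS,
};

/*
 * Derive the default (least restrictive) control value.
 * cbm_len is only used for bitmaps, max_bw only for MBps controls.
 */
static uint32_t demo_default_ctrl(enum demo_ctrl_type type,
				  unsigned int cbm_len, uint32_t max_bw)
{
	switch (type) {
	case DEMO_CTRL_BITMAP:
		/* All cbm_len bits set; avoid an undefined shift by 32. */
		if (cbm_len >= 32)
			return UINT32_MAX;
		return (UINT32_C(1) << cbm_len) - 1;
	case DEMO_CTRL_PERCENTAGE:
		return 100;
	case DEMO_CTRL_MBPS:
		/* Not derivable: the arch has to discover and report it. */
		return max_bw;
	}

	return 0;	/* only reached for out-of-range values */
}
```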

bw_validate() has always caught me out as it doesn't just parse percentages, but also AMD's
MBps values. I don't think this needs changing, but having MBps as a control type will make
this less surprising.

Finally, core.c will end up keeping default_ctrl as an arch-specific thing as it's
convenient for the init and reset code.

[...]

> What I was thinking about was something like below that uses the
> enum you introduce later and lets the RF flags stay internal to fs code:
>
> rdtgroup_create_info_dir()
> {
>
> 	...
> 	list_for_each_entry(s, &resctrl_schema_all, list) {
> 		r = s->res;
> 		if (r->res_type == RRESTYPE_CACHE)
> 			fflags = RFTYPE_RES_CACHE;
> 		else if (r->res_type == RRESTYPE_MB)
> 			fflags = RFTYPE_RES_MB;
> 		else /* fail */
>
> 		fflags |= RFTYPE_CTRL_INFO;
>
> 		...
> 	}
> 	/* same idea for monitor info files */

Good point, that would let us remove fflags from the arch code too.


> For this patch the resource type can be used to initialize the schema
> entry.
>
>>
>>
>> /* In include/linux/resctrl_types.h */
>>
>> +#define RFTYPE_RES BIT(8)
>> -#define RFTYPE_RES_CACHE BIT(8)
>> -#define RFTYPE_RES_MB BIT(9)
>
> The goal is to not have to expose any of the RFTYPE flags internals to
> the architecture. RFTYPE_RES_CACHE and RFTYPE_RES_MB stays, but is
> not exposed to arch code. I do not see need for RFTYPE_RES.
> All the RFTYPE flags can be defined in fs/resctrl/internal.h

Yup, these should stay in internal.h - they got swept up as there are #defines either side
that are needed for MPAM to build.


>> /* For RFTYPE_RES: */
>> enum resctrl_resource_type {
>> 	RRESTYPE_INVALID,
>> 	RRESTYPE_MB,
>> 	RRESTYPE_CACHE,
>> };
>
> (I find naming hard ... note the names changed from the beginning of
> pseudo code to here where RESTYPE changing to RRESTYPE)

Before I saw this, my attempt had:
| enum resctrl_schema_fmt {
| 	RESCTRL_SCHEMA_BITMAP,
| 	RESCTRL_SCHEMA_PERCENTAGE,
| 	RESCTRL_SCHEMA_MBPS,
| };

Having 'invalid' as value '0' would catch the arch code forgetting to set this, but it means
any switch over this enum has to handle it... I'd prefer to leave that out so the compiler
can bark about any place that needs updating when a new control scheme is added.



Thanks,

James