Re: [PATCH v5 1/2] dt-bindings: cpufreq: add virtual cpufreq device

From: David Dai
Date: Fri May 17 2024 - 16:59:59 EST


On Tue, May 7, 2024 at 3:21 AM Sudeep Holla <sudeep.holla@xxxxxxx> wrote:
>
> On Thu, May 02, 2024 at 01:17:57PM -0700, David Dai wrote:
> > On Thu, Feb 15, 2024 at 3:26 AM Sudeep Holla <sudeep.holla@xxxxxxx> wrote:
> > >
> > > On Fri, Feb 02, 2024 at 09:53:52AM -0600, Rob Herring wrote:
> > > > On Wed, Jan 31, 2024 at 10:23:03AM -0800, Saravana Kannan wrote:
> > > > >
> > > > > We also need the OPP tables to indicate which CPUs are part of the
> > > > > same cluster, etc. Don't want to invent a new "protocol" and just use
> > > > > existing DT bindings.
> > > >
> > > > Topology binding is for that.
> > > >
> > > > What about when x86 and other ACPI systems need to do this too? You
> > > > define a discoverable interface, then it works regardless of firmware.
> > > > KVM, Virtio, VFIO, etc. are all their own protocols.
> > > >
> > >
> > > +1 for the above. I have mentioned the same couple of times but I am told
> > > it can be taken up later which I fail to understand. Once we define DT
> > > bindings, it must be supported for long time which doesn't provide any
> > > motivation to such a discoverable interface which works on any virtual
> > > platforms irrespective of the firmware.
> > >
> >
> > Hi Sudeep,
> >
> > We are thinking of a discoverable interface like this, where the
> > performance info and performance domain mappings are discoverable
> > through the device registers. This should make it more portable across
> > firmwares. Would this address your concerns?
>
> Yes.
>
> > Also, you asked to document this.
> > Where exactly would you want to document this?
>
> IMO it could go under Documentation/firmware-guide ? Unless someone
> has any other suggestions.
>
> > AFAIK the DT bindings documentation is not supposed to include this level of
> > detail. Would a comment in the driver be sufficient?
>
> Agreed, DT bindings is not the right place. Maybe even a comment in the
> driver would be sufficient.

Alright, I’ll make this into a comment in the driver itself.

>
> Overall it looks good and on the right path IMO.
>

Okay, I’ll submit V6 patches and continue from there.

> >
> > CPU0..CPUn
> > +-------------+-------------------------------+--------+-------+
> > | Register    | Description                   | Offset | Len   |
> > +-------------+-------------------------------+--------+-------+
> > | cur_perf    | read this register to get     | 0x0    | 0x4   |
> > |             | the current perf (integer val |        |       |
> > |             | representing perf relative to |        |       |
> > |             | max performance)              |        |       |
> > |             | that vCPU is running at       |        |       |
> > +-------------+-------------------------------+--------+-------+
> > | set_perf    | write to this register to set | 0x4    | 0x4   |
> > |             | perf value of the vCPU        |        |       |
> > +-------------+-------------------------------+--------+-------+
> > | perftbl_len | number of entries in perf     | 0x8    | 0x4   |
> > |             | table. A single entry in the  |        |       |
> > |             | perf table denotes no table   |        |       |
> > |             | and the entry contains        |        |       |
> > |             | the maximum perf value        |        |       |
> > |             | that this vCPU supports.      |        |       |
> > |             | The guest can request any     |        |       |
> > |             | value between 1 and max perf. |        |       |
>
> Does this have to be per cpu ? It can be simplified by keeping
> just cur_perf, set_perf and perf_domain in per-cpu entries and this
> per domain entries separate. But I am not against per cpu entries
> as well.

I think separating out the perf domain entries may make the device
emulation and the driver slightly more complicated. Emulating the perf
domain regions per CPU is a simpler layout if we need to install eBPF
programs to handle the backend per vCPU. Having each vCPU look up its
own frequency information in its own MMIO region is also a bit easier
when initializing the driver. In addition, each vCPU will be in its own
perf domain in the majority of use cases, so it won’t make much of a
difference most of the time.
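
To make the per-vCPU layout concrete, here is a rough sketch of the
register offsets from the table in this thread, written so it can be
compiled in user space. The back-to-back 0x18-byte stride between vCPU
regions and the helper names vcpu_reg_offset(), vcpu_read32() and
vcpus_share_domain() are assumptions for illustration only; they are not
part of the proposal or of any existing driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets, taken from the table in this thread. */
#define REG_CUR_PERF     0x00
#define REG_SET_PERF     0x04
#define REG_PERFTBL_LEN  0x08
#define REG_PERFTBL_SEL  0x0c
#define REG_PERFTBL_RD   0x10
#define REG_PERF_DOMAIN  0x14

/*
 * ASSUMPTION: the per-vCPU regions are packed back to back, so the
 * stride is the size of the six 4-byte registers. The thread does not
 * specify the stride.
 */
#define VCPU_REGION_SIZE 0x18

/* Byte offset of register 'reg' inside the region of vCPU 'cpu'. */
static inline size_t vcpu_reg_offset(unsigned int cpu, unsigned int reg)
{
	/* Sketch-only sanity check: a known, 4-byte aligned register. */
	assert(reg <= REG_PERF_DOMAIN && (reg & 3) == 0);
	return (size_t)cpu * VCPU_REGION_SIZE + reg;
}

/* 32-bit read from a (mock) MMIO window; 'base' must be 4-byte aligned. */
static inline uint32_t vcpu_read32(const volatile uint8_t *base,
				   unsigned int cpu, unsigned int reg)
{
	return *(const volatile uint32_t *)(base + vcpu_reg_offset(cpu, reg));
}

/* Two vCPUs share a perf domain when their perf_domain values match. */
static inline int vcpus_share_domain(const volatile uint8_t *base,
				     unsigned int a, unsigned int b)
{
	return vcpu_read32(base, a, REG_PERF_DOMAIN) ==
	       vcpu_read32(base, b, REG_PERF_DOMAIN);
}
```

With this layout every vCPU finds its own perf_domain at a fixed offset
in its own region, so grouping vCPUs is just a comparison of those
values; no separate per-domain region has to be parsed.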

>
> Also why do you need the table if the guest can request any value from
> 1 to max perf ? The table will have discrete OPPs ? If so, how do they
> map to the perf range [1 - maxperf] ?

Let me clarify this in the comment: the perf range [1 - maxperf] is
only applicable when the frequency table is not supported. If tables
are used, the cpufreq driver will still vote for discrete levels. The
VMM (Virtual Machine Manager) may choose to use tables depending on the
use case, and the driver will support both cases.
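
As a rough illustration of the two cases, the sketch below models one
vCPU's registers in user space and picks the value a guest would write
to set_perf. Everything other than the register names and offsets is an
assumption: the mock device (struct mock_vcpu_dev, dev_read(),
dev_write()), the 0-based perftbl_sel index, and the rounding policy in
resolve_perf() (lowest table entry at or above the target, otherwise
the highest entry) are hypothetical and not what the driver is
committed to doing.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets, taken from the table in this thread. */
#define REG_SET_PERF     0x04
#define REG_PERFTBL_LEN  0x08
#define REG_PERFTBL_SEL  0x0c
#define REG_PERFTBL_RD   0x10

/* Hypothetical stand-in for the VMM side of one vCPU's region. */
struct mock_vcpu_dev {
	const uint32_t *tbl;  /* perf table exposed by the VMM */
	uint32_t len;         /* value returned by perftbl_len */
	uint32_t sel;         /* last value written to perftbl_sel */
	uint32_t set_perf;    /* last value written to set_perf */
};

static uint32_t dev_read(struct mock_vcpu_dev *d, unsigned int reg)
{
	switch (reg) {
	case REG_PERFTBL_LEN:
		return d->len;
	case REG_PERFTBL_RD:
		/* ASSUMPTION: out-of-range selections read back as 0. */
		return d->sel < d->len ? d->tbl[d->sel] : 0;
	default:
		return 0;
	}
}

static void dev_write(struct mock_vcpu_dev *d, unsigned int reg, uint32_t val)
{
	if (reg == REG_PERFTBL_SEL)
		d->sel = val;
	else if (reg == REG_SET_PERF)
		d->set_perf = val;
}

/* Return the perf value to request for 'target'; perf values are nonzero. */
static uint32_t resolve_perf(struct mock_vcpu_dev *d, uint32_t target)
{
	uint32_t len = dev_read(d, REG_PERFTBL_LEN);
	uint32_t i, v, best = 0, highest = 0;

	/* Sketch-only check; a real driver would return an error instead. */
	assert(len >= 1);

	if (len == 1) {
		/* No table: the single entry is max perf, clamp to [1, max]. */
		dev_write(d, REG_PERFTBL_SEL, 0);
		v = dev_read(d, REG_PERFTBL_RD);
		if (target < 1)
			target = 1;
		return target > v ? v : target;
	}

	/* Table: scan every entry, no ordering of the table is assumed. */
	for (i = 0; i < len; i++) {
		dev_write(d, REG_PERFTBL_SEL, i);
		v = dev_read(d, REG_PERFTBL_RD);
		if (v > highest)
			highest = v;
		if (v >= target && (best == 0 || v < best))
			best = v;
	}
	return best ? best : highest;
}
```

The caller would then write the returned value to set_perf. In the
table case only values that appear in the table are ever requested; in
the no-table case any value in [1 - maxperf] can go through.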

Thanks,
David

>
> > +-------------+-------------------------------+--------+-------+
> > | perftbl_sel | write to this register to     | 0xc    | 0x4   |
> > |             | select perf table entry to    |        |       |
> > |             | read from                     |        |       |
> > +-------------+-------------------------------+--------+-------+
> > | perftbl_rd  | read this register to get     | 0x10   | 0x4   |
> > |             | perf value of the selected    |        |       |
> > |             | entry based on perftbl_sel    |        |       |
> > +-------------+-------------------------------+--------+-------+
> > | perf_domain | performance domain number     | 0x14   | 0x4   |
> > |             | that this vCPU belongs to.    |        |       |
> > |             | vCPUs sharing the same perf   |        |       |
> > |             | domain number are part of the |        |       |
> > |             | same performance domain.      |        |       |
> > +-------------+-------------------------------+--------+-------+
>
> The above are couple of high level questions I have ATM.
>
> --
> Regards,
> Sudeep