Re: [PATCH RFCv1 04/14] iommufd: Add struct iommufd_viommu and iommufd_viommu_ops

From: Nicolin Chen
Date: Thu May 23 2024 - 00:02:23 EST


On Thu, May 23, 2024 at 01:43:45AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Wednesday, May 22, 2024 9:39 PM
> > VIOMMU contains:
> > - A nesting parent
> > - A KVM
> > - Any global per-VM data the driver needs
> > * In ARM case this is VMID, sometimes shared with KVM
>
> In which case is it not shared with KVM? I had the impression that
> VMID always comes from KVM in this VCMDQ usage. 😊

Not actually. I guess that shared VMID is for BTM.

> > On ARM the S2 is not divorced from the VIOMMU, ARM requires a single
> > VMID, shared with KVM, and localized to a single VM for some of the
> > bypass features (vBTM, vCMDQ). So to attach a S2 you actually have to
> > attach the VIOMMU to pick up the correct VMID.
> >
> > I imagine something like this:
> > hwpt_alloc(deva, nesting_parent=true) = shared_s2
> > viommu_alloc(deva, shared_s2) = viommu1
> > viommu_alloc(devb, shared_s2) = viommu2
> > hwpt_alloc(deva, viommu1, vste) = deva_vste
> > hwpt_alloc(devb, viommu2, vste) = devb_vste
> > attach(deva, deva_vste)
> > attach(devb, devb_vste)
> > attach(devc, shared_s2)
>
> I wonder whether we want to make viommu as the 1st-class citizen
> for any nested hwpt if it is desirable to enable it even for VT-d which
> lacks of a hw viommu concept at the moment.

I think Jason is completely using SMMU as an example here.

Also FWIW, I am trying a core-allocated core-managed viommu for
IOMMU_VIOMMU_TYPE_DEFAULT. So VT-d driver doesn't need to hold
a viommu while VMM could still allocate one if it wants. And the
VIOMMU interface can provide some helpers if driver wants some
info from the core-managed viommu: a virtual dev ID to physical
dev ID (returning device pointer) translation for example. And
we can add more after we brain storm.

Sample change:
@@ -623,6 +625,18 @@ struct iommu_ops {
+ * @viommu_alloc: Allocate an iommufd_viommu associating to a nested parent
+ * @domain as a user space IOMMU instance for HW-accelerated
+ * features from the physical IOMMU behind the @dev. The
+ * @viommu_type must be defined in include/uapi/linux/iommufd.h
+ * It is suggested to call iommufd_viommu_alloc() helper for
+ * a bundled allocation of the core and the driver structures,
+ * using the given @ictx pointer.
+ * @default_viommu_ops: Driver can choose to use a default core-allocated core-
+ * managed viommu object by providing a default viommu ops.
+ * Otherwise, i.e. for a driver-managed viommu, viommu_ops
+ * should be passed in via iommufd_viommu_alloc() helper in
+ * its own viommu_alloc op.

[..]

+int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
+{
...
+ if (cmd->type == IOMMU_VIOMMU_TYPE_DEFAULT) {
+ viommu = __iommufd_viommu_alloc(
+ ucmd->ictx, sizeof(*viommu),
+ domain->ops->default_viommu_ops);
+ } else {
+ if (!domain->ops->viommu_alloc) {
+ rc = -EOPNOTSUPP;
+ goto out_put_hwpt;
+ }
+
+ viommu = domain->ops->viommu_alloc(domain, idev->dev,
+ ucmd->ictx, cmd->type);
+ }

[..]
// Helper:
+struct device *
+iommufd_viommu_find_device(struct iommufd_viommu *viommu, u64 id);

> > The driver will then know it should program three different VMIDs for
> > the same S2 page table, which matches the ARM expectation for
> > VMID. That is to say we'd pass in the viommu as the pt_id for the
> > iommu_hwpt_alloc. The viommu would imply both the S2 page table and
> > any meta information like VMID the driver needs.
>
> Can you elaborate the aspect about "three different VMIDs"? They are
> all for the same VM hence sharing the same VMID per the earlier
> description. This is also echo-ed in patch14:
>
> tegra241_cmdqv_viommu_alloc()
> vintf->vmid = smmu_domain->vmid;

The design in the series is still old using a 1:1 relationship
between a viommu and an S2 domain. I think the "three" is from
his SMMU example above? Leaving it to Jason to reply though.

> > now. If someone needs them linked someday we can add a viommu_id to
> > the create pri queue command.
>
> I'm more worried about the potential conflict between the vqueue
> object here and the fault queue object in Baolu's series, if we want
> to introduce vIOMMU concept to platforms which lack of the hw
> support.

I actually see one argument is whether we should use a vqueue
v.s. Baolu's fault queue object, and a counter argument whether
we should also use vqueue for viommu invalidation v.s an array-
based invalidation request that we have similarly for HWPT...

Thanks
Nicolin