Re: [PATCH RFCv1 04/14] iommufd: Add struct iommufd_viommu and iommufd_viommu_ops
From: Jason Gunthorpe
Date: Thu May 23 2024 - 08:59:49 EST
On Thu, May 23, 2024 at 01:43:45AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Wednesday, May 22, 2024 9:39 PM
> >
> > On Wed, May 22, 2024 at 08:58:34AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > Sent: Tuesday, May 14, 2024 11:56 PM
> > > >
> > > > > > So we need the S2 to exist before the VIOMMU is created, but the
> > > > > > drivers are going to need some more fixing before that will fully
> > > > > > work.
> > >
> > > Can you elaborate on this point? VIOMMU is a dummy container when
> > > it's created and the association to S2 comes relevant only until when
> > > VQUEUE is created inside and linked to a device?
> >
> > VIOMMU contains:
> > - A nesting parent
> > - A KVM
> > - Any global per-VM data the driver needs
> > * In ARM case this is VMID, sometimes shared with KVM
>
> In which case is it not shared with KVM? I had the impression that
> VMID always comes from KVM in this VCMDQ usage. 😊
Not quite, only vBTM needs it to be shared with KVM because the CPU
will forward the KVM VMID to the SMMU during invalidations.
Everything else in the nesting space (including vCMDQ) just needs the
VMID to be unique to the VM since it scopes the ASID that is stored in
guest table.s
For non-nesting cases (ie no viommu) the VMID can be unique to the S2.
> > On ARM the S2 is not divorced from the VIOMMU, ARM requires a single
> > VMID, shared with KVM, and localized to a single VM for some of the
> > bypass features (vBTM, vCMDQ). So to attach a S2 you actually have to
> > attach the VIOMMU to pick up the correct VMID.
> >
> > I imagine something like this:
> > hwpt_alloc(deva, nesting_parent=true) = shared_s2
> > viommu_alloc(deva, shared_s2) = viommu1
> > viommu_alloc(devb, shared_s2) = viommu2
> > hwpt_alloc(deva, viommu1, vste) = deva_vste
> > hwpt_alloc(devb, viommu2, vste) = devb_vste
> > attach(deva, deva_vste)
> > attach(devb, devb_vste)
> > attach(devc, shared_s2)
>
> I wonder whether we want to make viommu as the 1st-class citizen
> for any nested hwpt if it is desirable to enable it even for VT-d which
> lacks of a hw viommu concept at the moment.
I think we may as well code it like that, yes. It is easy to get the
S2 out of the viommu and feed that into the intel driver.
> > The driver will then know it should program three different VMIDs for
> > the same S2 page table, which matches the ARM expectation for
> > VMID. That is to say we'd pass in the viommu as the pt_id for the
> > iommu_hwpt_alloc. The viommu would imply both the S2 page table and
> > any meta information like VMID the driver needs.
>
> Can you elaborate the aspect about "three different VMIDs"?
In SMMUv3 the cache is tagged by (VMID,ASID) where ASID is completely
controlled by the guest.
Every time the guest observes a SMMUv3 instance it is allowed to
creates its own private ASID number space for that instance. The guest
could re-use ASID #1 on two instances.
So every SMMUv3 instance plugged into the guest needs to have its own
unique VMID so that the overlapping ASID's are disambiguate. The above
would create a VM where:
deva -> vSMMUv3 #1
devb -> vSMMUv3 #2
devc -> No IOMMU
> tegra241_cmdqv_viommu_alloc()
> vintf->vmid = smmu_domain->vmid;
And same here the 'vintf' is processing ASIDs directly from the guest,
it needs to understand the VMID they are scoped under. Every vSMMUv3
instance in the guest must get a VMID, vintf, and vCMDQ all to its
own. The above needs revising in a viommu world.
> > Both AMD and the vCMDQ thing need to translate some PFNs through the
> > S2 and program them elsewhere, this is manually done by SW, and there
> > are three choices I guess:
> > - Have the VMM do it and provide void __user * to the driver
>
> this sounds redundant to what S2 already provides
Yes, but we don't have to track invalidations here..
> > - Have the driver do it through the S2 directly and track
> > S2 invalidations
>
> this makes more sense to me. Just like the driver already needs to track
> S2 invalidations to flush any nested cache related to the affected S2 range.
I'm a bit worried about 'track invalidations' though..
> > - Have the driver open an access on the IOAS and use the access unmap
>
> it requires adding more iommufd awareness into the iommu driver. I'm
> inclined to do it only at minimal necessity.
Yes, it is certainly annoying because of the modular/builtin problem.
> > Hmm, given we currently have no known hardware entanglement between
> > PRI and VIOMMU it does seem OK for PRI to just exist seperate for
>
> Isn't AMD vPPRLog for directly sending PRI request into the guest?
I think it is, but that would be a vQUEUE on the VIOMMU not adding a
VIOMMU to Lu's patches, which is what I ment.
> > now. If someone needs them linked someday we can add a viommu_id to
> > the create pri queue command.
>
> I'm more worried about the potential conflict between the vqueue
> object here and the fault queue object in Baolu's series, if we want
> to introduce vIOMMU concept to platforms which lack of the hw
> support.
I assume the vPPRLog will steal all the PRI before it reaches the
kernel, so once this is turned on Lu's path won't see anything.
I don't know if AMD can turn vPPRLog on individually or if it is a
whole package once they switch on VIOMMU..
Jason