Re: [PATCH v2 3/3] drm/panfrost: Add initial panfrost driver
From: Alyssa Rosenzweig
Date: Fri Apr 05 2019 - 12:16:36 EST
> I'm also somewhat surprised that you don't need loads of other
> properties from the GPU - in particular knowing the number of shader
> cores is useful for allocating the right amount of memory for TLS (and
> can't be obtained purely from the GPU_ID).
Since I have no idea what TLS is (and in my notes, I've come across the
acronym once ever and have it as a "??"), I'm not sure how to respond to
that... We don't know how to allocate memory for the GPU-internal data
structures (the tiler heap, for instance, but also a few others I've
just named "misc_0" and "scratchpad" -- guessing one of those is for
"TLS"). With kbase, I took the worst-case strategy of allocating
gigantic chunks on startup with tiny commit counts and GROW_ON_GPF set.
With the new driver, well, our memory consumption is scary since
implementing GROW_ON_GPF in an upstream-friendly way is a bit more work
and isn't expected to hit the 5.2 window.
Given this is kernel facing, I'm hoping 're able to share some answers
here?
> I think this comment might have survived since the very earliest version
> of the Midgard driver! :)
^_^
> But I'm not sure anything will attempt to lock a region spanning two
> pages like that.
At least at the moment, I align everything kernel-facing to page
granularity in userspace, so it's not a cornercase I'm going to hit
anytime soon. Still probably better to have it technically correct.
> To be fair only for BASE_HW_ISSUE_6367/T60X - but yes it's not a
> pleasant workaround. There's no way on that hardware to reliably drain
> the write buffer other than waiting.
*wishing T60X disappeared intensifies* ;)
Granted there are enough other errata specific to it that aren't worked
around here that, well, it makes you wonder ;)
> Do we have a good way of user space determining which requirements are
> supported by the driver? At the moment it's just the one. kbase outgeew
> the original u16 and has an ugly "compat_core_req", so I suspect you're
> going to need to add several more along the way.
Oh, so that's why compat_/core_req is split off! I thought somebody just
thought it would be "fun" to break the UABI ;)
I've definitely issues using the wrong core_req field for the kbase I
had setup, that set me back a little bit on RK3399/T860 bringup *purses
lips*
To be fair, bunches of the kbase reqs are for soft jobs, which I don't
feel are a good fit for how the Panfrost kernel works. If we need to
implement functionality corresponding to softjobs, that would likely be
done with dedicated ioctl(s), instead of affecting the core_req field.
On that note
> You might also want to consider being able to submit more than one job
> chain at a time - but that could easily be implemented using a new ioctl
> in the future.
The issue with that at the bottom is having to introduce something akin
to kbase's annoyingly intra-job-chain dependency management (read: I
still don't understand how FBOs are supposed to work with kbase ;) ),
which AFAIK we push off to userspace right now via standard fencing. If
we want to submit batches at a time, we would potentially need to
express those somewhat complex dependency trees, which is a lot of work
for diminishing returns at this stage. Future ioctl indeed...
> There's no SUBMIT_CL in this posting? I think you just need s/_CL//.
+1
> You are probably going to want flags for at least:
>
> * No execute/No read/No write (good for security, especially with
> buffer sharing between processes)
>
> * Alignment (shader programs have quite strict alignment requirements,
> I believe kbase just ensures that the shader memory block doesn't cross
> a 16MB boundary which covers most cases).
>
> * Page fault behaviour (kbase has GROW_ON_GPF)
>
> * Coherency management
+1 for all of these. This is piped through in userspace (for kbase), but
the corresponding functionality isn't there yet in the Panfrost kernel.
You're right there should at least be a flags field for future use.
> One issue that I haven't got to the bottom of is that I can trigger a
> lockdep splat:
Oh, "fun"...
> This is with the below simple reproducer:
@Rob, ideas?
> Other than that in my testing (on a Firefly RK3288) I didn't experience
> any problems pushing jobs from the ARM userspace blob through it.
Nice!
Besides what was mentioned above, any other functionality you'll need
for that? (e.g. the infamous replay workaround...)