On Friday, 21 July 2023 at 16:56 +0800, Hsia-Jun Li wrote:
On 7/17/23 22:00, Nicolas Dufresne wrote:
On Wednesday, 12 July 2023 at 09:31 +0000, Tomasz Figa wrote:
On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
Hi Randy,
On Tuesday, 4 July 2023 at 12:00 +0800, Hsia-Jun Li wrote:
From: Randy Li <ayaka@xxxxxxxxxxx>
For decoders that support Dynamic Resolution Change,
we don't need to allocate any CAPTURE or graphics buffers
for them at the initial CAPTURE setup step.
We need to make the device run or we can't get that
metadata.
Signed-off-by: Randy Li <ayaka@xxxxxxxxxxx>
---
drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
index 0cc30397fbad..c771aba42015 100644
--- a/drivers/media/v4l2-core/v4l2-mem2mem.c
+++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
@@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
- if (!m2m_ctx->out_q_ctx.q.streaming
- || !m2m_ctx->cap_q_ctx.q.streaming) {
+ if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
+ || !(m2m_ctx->cap_q_ctx.q.streaming
+ || m2m_ctx->cap_q_ctx.buffered)) {
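(For context, the buffered mode consulted above is opt-in per queue; a driver
enables it with the existing helpers, roughly like this, where ctx and dev are
illustrative driver-private structures:)

/* in the driver's open() path */
ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx, &queue_init);
if (IS_ERR(ctx->fh.m2m_ctx))
	return PTR_ERR(ctx->fh.m2m_ctx);

/* allow jobs to be scheduled before CAPTURE buffers are queued */
v4l2_m2m_set_dst_buffered(ctx->fh.m2m_ctx, true);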
I have two patches with similar goals in my wave5 tree. It will be easier to
upstream with an actual user, though; I'm probably a month or two away from
submitting this driver again.

https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb
https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/5de4fbe0abb20b8e8d862b654f93e3efeb1ef251
While I'm not going to NAK this series or those 2 patches if you send
them, I'm not really convinced that adding more and more complexity to
the mem2mem helpers is a good idea, especially since all of those seem
to be only needed by stateful video decoders.
The mem2mem framework started as a set of helpers to eliminate boilerplate
from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
run 1 processing job on them and then return both of them to userspace,
and I think it should stay like this.
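(To illustrate that canonical 1:1 flow, here is a minimal device_run() sketch
on top of the existing helpers; the mydrv_* names are made up:)

static void mydrv_device_run(void *priv)
{
	struct mydrv_ctx *ctx = priv;
	struct vb2_v4l2_buffer *src, *dst;

	/* exactly one OUTPUT and one CAPTURE buffer per job */
	src = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
	dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);

	mydrv_hw_start(ctx, src, dst);	/* program and kick the hardware */
}

/* on completion, e.g. in the IRQ handler: */
src = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
dst = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
v4l2_m2m_buf_done(src, VB2_BUF_STATE_DONE);
v4l2_m2m_buf_done(dst, VB2_BUF_STATE_DONE);
v4l2_m2m_job_finish(ctx->dev->m2m_dev, ctx->fh.m2m_ctx);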
It's a bit late to try and bring that argument. It should have been raised a
couple of years ago (before I even started helping with these CODECs). Now that
all the newly written stateful decoders use this framework, it is logical to
keep reducing the boilerplate for these too. In my opinion, the job_ready()
callback should have been a lot more flexible from the start, and allowing
drivers to make it more powerful does not really add that much complexity.

Speaking of complexity, driving the output manually (outside of the job
workqueue) during sequence initialization is way more complex and risky than
this. Finally, sticking with the 1:1 pattern means encoders, detilers, image
enhancers that reduce framerate, etc. would all be unwelcome to use this,
which, in short, means no one should even use this.
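(For what it's worth, a more flexible job_ready() for a stateful decoder could
look roughly like this; seq_established and the mydrv_* names are hypothetical:)

static int mydrv_job_ready(void *priv)
{
	struct mydrv_ctx *ctx = priv;

	/* a bitstream (OUTPUT) buffer is always required */
	if (!v4l2_m2m_num_src_bufs_ready(ctx->fh.m2m_ctx))
		return 0;

	/*
	 * Before the sequence is known, allow running without CAPTURE
	 * buffers so the headers can be parsed; afterwards require one.
	 */
	if (ctx->seq_established &&
	    !v4l2_m2m_num_dst_bufs_ready(ctx->fh.m2m_ctx))
		return 0;

	return 1;
}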
I think those things are m2m, but it would be hard to present in the current
m2m framework:

1. N:1 compositor (it may be implemented as a loop running a 2:1 compositor).
Correct, only SRC/DST/BG types of blitters can be supported for compositing,
which is quite limiting. Currently there is no way to make an N:1 M2M, as M2M
instances are implemented at the video node layer, and not at the MC layer. This
is an entirely new subject and API design space to tackle (same goes for 1:N,
like multi-scalers, SVC decoders, etc.).

The SVC case is the one I mention in the talk, although the major problem may
only happen to SVC-S.
2. AV1 film grain.

For AV1/HEVC film grain, it is handled similarly to inline converters and
scalers: the driver secretly allocates the reference frames and post-processes
into the user-visible buffers.

Hiding internal buffers is the worst case; frame buffers could be large.
It breaks some assumptions made by most protected memory setups though, as not
all allocation is user driven, meaning the decoder needs to know if it is
secure or not. Secure memory is also another API design space to tackle.

The current stateless API won't support DMA buffers for the metadata.
3. HDR with dynamic metadata to SDR.

True, but easy to design around the stateless model. I'm not worried about
these.
The video things to fix for the m2m model could be just a little less complex
than an ISP or camera pipeline. The only difference is just that an ISP won't
have multiple contexts running at the same time.

I thought that having the kernel schedule ISP reprocessing jobs (which requires
instances) would be nice. But this can only be solved after we have solved the
N:N use cases of m2m (with multiple instances).
If we could design a model for the video encoder, I think we could solve
most of the problems.

A video encoder would have:
1. input graphics buffer
2. reconstruction graphics buffer
3. motion vector cache buffer (optional)
4. coded bitstream output
5. encoding statistics report
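(Purely as a sketch, those five per-job resources could be grouped like this;
every type name below is hypothetical:)

struct enc_aux_buf {		/* driver-owned scratch memory */
	dma_addr_t dma;
	size_t size;
};

struct enc_stats {		/* filled by the hardware after the job */
	u32 encoded_bytes;
	u32 avg_qp;
};

struct enc_job {
	struct vb2_v4l2_buffer *src;		/* 1. input graphics buffer */
	struct enc_aux_buf recon;		/* 2. reconstruction buffer */
	struct enc_aux_buf mv_cache;		/* 3. MV cache (optional) */
	struct vb2_v4l2_buffer *bitstream;	/* 4. coded bitstream output */
	struct enc_stats stats;			/* 5. encoding statistics report */
};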
I think we're strongly in need of a stateful video decoder framework that
would actually address the exact problems that those have rather than
bending something that wasn't designed with them in mind to work around the
differences.

The bend is already there. Of course I'd be happy to help with any new
framework, especially on modern stateless, where there is a need to do better
scheduling.

I didn't know about the scheduling problem with stateless codecs. Are they
supposed to be in the job queue when the buffers that the DPB requests are
owned by the driver and its registers are prepared, except the trigger bit?

On RK3588 at least, decoder scheduling is going to be complex. There is an even
number of cores, but when you need to decode 8K, you have to pair two cores
(there is a specific set of cores that are to be paired). We need decent
scheduling logic to ensure we don't starve an 8K decoding session when there
are multiple smaller resolution sessions ongoing.
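(To make the pairing constraint concrete, the admission check could look
roughly like this; the types and the pairing table are entirely hypothetical:)

struct dec_core {
	bool busy;
};

/* which cores are allowed to be paired for 8K, e.g. {0,1} and {2,3} */
static const int pairs_8k[][2] = { { 0, 1 }, { 2, 3 } };

static int find_free_8k_pair(const struct dec_core *cores)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(pairs_8k); i++)
		if (!cores[pairs_8k[i][0]].busy &&
		    !cores[pairs_8k[i][1]].busy)
			return i;

	/* no free pair: hold back single-core jobs or the 8K session starves */
	return -1;
}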
On MTK, the entropy decoding (LAT) and the reconstruction (CORE) are split. MTK
vcodec is using multiple workqueues to move jobs around, which is clearly
expensive. Also, the lifetime of a job is not exactly easy to manage.
On RPi HEVC (not upstream yet, but being worked on), the entropy decoding and
reconstruction are done on the same core, but remain 2 concurrent operations.
That does not impose a complex scheduling issue, but it raises the need for a
way to fully utilize such HW.
These are just some examples of complexity for which the current framework is
not that helpful (even though it's not impossible either).
Just ping me if you have some effort starting; I don't currently have the
budget or bandwidth to write new drivers or port existing drivers onto a
newly written framework.
Nicolas
[...]