Re: [PATCH rfc 0/9] mm: memcg: separate legacy cgroup v1 code and put under config option

From: Yafang Shao
Date: Sun May 19 2024 - 22:15:25 EST


On Sat, May 18, 2024 at 3:33 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> On Thu, May 16, 2024 at 11:35:57AM +0800, Yafang Shao wrote:
> > On Thu, May 9, 2024 at 2:33 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> > >
> >
> [...]
> > Hi Shakeel,
> >
> > Hopefully I'm not too late. We are currently using memcg v1.
> >
> > One specific feature we rely on in v1 is skmem accounting. In v1, we
> > account for TCP memory usage without charging it to memcg v1, which is
> > useful for monitoring the TCP memory usage generated by tasks running
> > in a container. However, in memcg v2, monitoring TCP memory requires
> > charging it to the container, which can easily cause OOM issues. It
> > would be better if we could monitor skmem usage without charging it in
> > memcg v2, allowing us to account for it without the risk of
> > triggering OOM conditions.
> >
>
> Hi Yafang,
>
> No worries. From what I understand, you are not really using skmem
> charging of v1 but just the network memory usage stats and you are
> worried that charging network memory to cgroup memory may cause OOMs. Is
> that correct?

Correct.

> Have you tried charging network memory to cgroup memory
> before and seen OOMs? If yes, then I would really like to see the OOM
> reports.

No, we don't enable the charging for TCP memory in memcg v1 and we
don't have a plan to add support for it currently.
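
For reference, a rough sketch of the kind of monitoring discussed
above, assuming the documented interfaces: in v1 the TCP counter is
exposed separately as memory.kmem.tcp.usage_in_bytes, while in v2
socket memory shows up as the "sock" line of memory.stat and is part
of the cgroup's charge. The helper names and the cgroup directory
passed in are hypothetical, and this is only an illustration, not
what any production setup actually runs.

```python
from pathlib import Path
from typing import Optional


def parse_v2_sock_bytes(memory_stat_text: str) -> Optional[int]:
    """Return the `sock` value (bytes) from cgroup v2 memory.stat content."""
    for line in memory_stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "sock":
            return int(fields[1])
    return None  # no `sock` line found


def read_socket_memory(cgroup_dir: str) -> Optional[int]:
    """Read socket memory usage for one cgroup directory.

    `cgroup_dir` is a hypothetical path chosen by the caller, e.g. a
    container's directory under the memory controller mount point.
    """
    base = Path(cgroup_dir)

    # v1: TCP memory has its own counter, separate from memory.usage_in_bytes.
    v1_file = base / "memory.kmem.tcp.usage_in_bytes"
    if v1_file.exists():
        return int(v1_file.read_text().strip())

    # v2: socket memory is reported as `sock` in memory.stat.
    v2_file = base / "memory.stat"
    if v2_file.exists():
        return parse_v2_sock_bytes(v2_file.read_text())

    return None
```

The v1 file is checked first because a v1 cgroup directory also has a
memory.stat, but without the v2 "sock" line.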

>
> I have two examples where v2's skmem charging is working fine in
> production, namely Google and Meta. Google is still on v1, but for
> skmem charging they have moved to v2 semantics. Actually, I have
> another report from Cloudflare [0] where the TCP throttling mechanism
> for v2's TCP memory accounting is too conservative for their
> production traffic.
>
> Anyway, this just means that we need a more flexible way to provide
> and enforce semantics for TCP memory pressure with a decent default
> behavior. I will follow up on this separately.
>
> [0] https://lore.kernel.org/lkml/CABWYdi0G7cyNFbndM-ELTDAR3x4Ngm0AehEp5aP0tfNkXUE+Uw@xxxxxxxxxxxxxx/

Thanks for your explanation.

--
Regards
Yafang