Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading
From: Willem de Bruijn
Date: Wed May 22 2024 - 14:39:44 EST
Chengen Du wrote:
> Hi Paolo,
>
> Thank you for your useful suggestions and information.
>
> Hi Willem,
>
> The issue initially stems from libpcap [1].
> Upon their further investigation, another issue was discovered,
> leading to a kernel request [2] that describes the problem in detail.
>
> In essence, the kernel does not provide VLAN information if hardware
> VLAN offloading is unavailable in cooked mode.
> The TCI-TPID is missing because the prb_fill_vlan_info() function in
> af_packet.c does not modify the tp_vlan_tci/tp_vlan_tpid values since
> the information is in the payload and not in the sk_buff struct.
> In cooked mode, the L2 header is stripped, preventing the receiver
> from determining the correct TCI-TPID value.
> Additionally, the protocol in SLL is incorrect, which means the
> receiver cannot parse the L3 header correctly.
>
> To reproduce the issue, please follow these steps:
> 1. ip link add link ens18 ens18.24 type vlan id 24
> 2. ifconfig ens18.24 1.0.24.1/24
> 3. ping -n 1.0.24.3 > /dev/null 2>&1 &
> 4. tcpdump -nn -i any -Q out not tcp and not udp
>
> The attached experiment results show that the protocol is incorrectly
> parsed as IPv4, which leads to inaccurate outcomes.
>
> Thanks to Paolo's suggestion, I propose that we add a new bit in the
> status to indicate the presence of VLAN information in the payload and
> modify the header's entry (i.e., tp_vlan_tci/tp_vlan_tpid)
> accordingly.
> For the sll_protocol part, we can introduce a new member in the
> sockaddr_ll struct to represent the VLAN-encapsulated protocol, if
> applicable.
>
> In my humble opinion, this approach will not affect current users who
> rely on the status to handle VLAN parsing, and the sll_protocol will
> remain unchanged.
> Please kindly provide your feedback on this proposal, as there may be
> important points I have overlooked.
> If this approach seems feasible, I will submit a new version next week.
> Your assistance and opinions on this issue are important to me, and I
> truly appreciate them.
>
> Best regards,
> Chengen Du
>
> [1] https://github.com/the-tcpdump-group/libpcap/issues/1105
> [2] https://marc.info/?l=linux-netdev&m=165074467517201&w=4
This is all super helpful context and will have to make it into the
commit message.
So if I understand correctly the issue is inconsistency about whether
VLAN tags are L2 or L3, and information getting lost along the way.
SOCK_DGRAM mode removes everything up to skb_network_offset, which
also removes the VLAN tags. But it does not update skb->protocol.
msg_name includes sockaddr_ll.sll_protocol which is set to
skb->protocol.
So the process gets a packet with purported protocol ETH_P_8021Q
starting beginning at an IP or IPv6 header.
A few alternatives to address this:
1. insert the VLAN tag back into the packet, with an skb_push.
2. prepare the data as if it is a VLAN offloaded packet:
pass the VLAN information through PACKET_AUXDATA.
3. pull not up to skb_network_offset, but pull mac_len.
Your patch does the second.
I think the approach is largely sound, with a few issues to consider:
- QinQ. The current solution just passes the protocol in the outer tag
- Other L2.5, like MPLS. This solution does not work for those.
(if they need a fix, and the same network_offset issue applies.)
3 would solve all these cases, I think. But is a larger diversion from
established behavior.
> On Tue, May 21, 2024 at 9:28 PM Willem de Bruijn
> <willemdebruijn.kernel@xxxxxxxxx> wrote:
> >
> > Paolo Abeni wrote:
> > > On Tue, 2024-05-21 at 11:31 +0800, Chengen Du wrote:
> > > > I would appreciate any suggestions you could offer, as I am not as
> > > > familiar with this area as you are.
> > > >
> > > > I encountered an issue while capturing packets using tcpdump, which
> > > > leverages the libpcap library for sniffing functionalities.
> > > > Specifically, when I use "tcpdump -i any" to capture packets and
> > > > hardware VLAN offloading is unavailable, some bogus packets appear.
> >
> > Bogus how exactly?
> >
> > > > In this scenario, Linux uses cooked-mode capture (SLL) for the "any"
> > > > device, reading from a PF_PACKET/SOCK_DGRAM socket instead of the
> > > > usual PF_PACKET/SOCK_RAW socket.
> >
> > Trying to extract L2 or VLAN information from the any device may be
> > the real issue here.
> >
> > > >
> > > > Using SOCK_DGRAM instead of SOCK_RAW means that the Linux socket code
> > > > does not supply the packet's link-layer header.
> > > > Based on the code in af_packet.c, SOCK_DGRAM strips L2 headers from
> > > > the original packets and provides SLL for some L2 information.
> > >
> > > > From the receiver's perspective, the VLAN information can only be
> > > > parsed from SLL, which causes issues if the kernel stores VLAN
> > > > information in the payload.
> >
> > ETH_HLEN is pulled, but the VLAN tag is still present, right?
> >
> > > >
> > > > As you mentioned, this modification affects existing PF_PACKET receivers.
> > > > For example, libpcap needs to change how it parses VLAN packets with
> > > > the PF_PACKET/SOCK_RAW socket.
> > > > The lack of VLAN information in SLL may prevent the receiver from
> > > > properly decoding the L3 frame in cooked mode.
> > > >
> > > > I am new to this area and would appreciate it if you could kindly
> > > > correct any misunderstandings I might have about the mechanism.
> > > > I would also be grateful for any insights you could share on this issue.
> > > > Additionally, I am passionate about contributing to resolving this
> > > > issue and am willing to work on patches based on your suggestions.
> > >
> > > One possible way to address the above in a less invasive manner, could
> > > be allocating a new TP_STATUS_VLAN_HEADER_IS_PRESENT bit, set it for
> > > SLL when the vlan is not stripped by H/W and patch tcpdump to interpret
> > > such info.
> >
> > Any change must indeed not break existing users. It's not sufficient
> > to change pcap/tcpdump. There are lots of other PF_PACKET users out
> > there. Related, it is helpful to verify that tcpdump agrees to a patch
> > before we change the ABI for it.