From: Ferruh Yigit <ferruh.yigit@intel.com>
To: Tudor Cornea <tudor.cornea@gmail.com>
Cc: <linville@tuxdriver.com>, Thomas Monjalon <thomas@monjalon.net>,
"Mihai Pogonaru" <pogonarumihai@gmail.com>, <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2] net/af_packet: fix ignoring full ring on tx
Date: Tue, 26 Oct 2021 15:30:00 +0100
Message-ID: <7dd5cbb2-dfb8-8c54-9b48-9e271c0e51b1@intel.com>
In-Reply-To: <CAOuQ8vWHLgt=8aNxoSd4XvWeYQqMuJ1Wt4iKpeVmY8tqwVp-_w@mail.gmail.com>
On 10/5/2021 4:11 PM, Tudor Cornea wrote:
> Hi Ferruh,
>
> I have attempted to narrow down the issue.
> I have the following bash script, which computes packet rates on an
> interface.
>
> [root@localhost ~]# cat compute-rates.sh
> #!/usr/bin/env bash
>
> if [[ ${#} -ne 2 ]]; then
>     echo "Usage: ${0} <iface-name> <sleep-interval-seconds>"
>     exit 1
> fi
>
> IFACE_NAME="${1}"
> SLEEP_INTERVAL_SECONDS="${2}"
> TMP_STATS_FILE="/tmp/netstat"
>
> # Clear Previous stats file
> echo "0 0 0 0" > "${TMP_STATS_FILE}"
>
> echo "Press CTRL+C to exit..."
>
> while true; do
>     export "RxB=0" "RxP=0" "TxB=0" "TxP=0"
>
>     # Extract Rx{Bytes,Packets} and Tx{Bytes,Packets} and
>     # format the output. Individual fields will be exported
>     export $(\
>         ifconfig "${IFACE_NAME}" \
>         | grep 'packets' \
>         | awk '{print $5, $3}' \
>         | xargs echo \
>         | sed -E -e \
>             "s/([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+)/RxB=\1 RxP=\2 TxB=\3 TxP=\4/")
>
>     # Print Packet and Byte Rates
>     # Format: | Rx Bytes | Rx Packets | Tx Bytes | Tx Packets |
>
>     echo "${RxB}" "${RxP}" "${TxB}" "${TxP}" $(cat "${TMP_STATS_FILE}") \
>         | awk '{print "RxB="$1-$5, "RxP="$2-$6, "TxB="$3-$7, "TxP="$4-$8}'
>
>     # Save the new values
>     echo "${RxB}" "${RxP}" "${TxB}" "${TxP}" > "${TMP_STATS_FILE}"
>
>     sleep "${SLEEP_INTERVAL_SECONDS}"
>
> done
>
> On the transmit side, I'm using the engine behind [1] with the af_packet
> PMD.
>
> The configuration for the af_packet PMD is the following:
> --vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192,framecnt=2048,qpairs=1,qdisc_bypass=0
>
> I'm configuring a Tx rate of 335 packets / second and a packet size of 300
> Bytes.
> These seem to be the values with which we have the best chance of
> seeing the problem. I suspect it might also be linked with the af_packet
> configuration.
>
> I'm starting traffic using the specified configuration, and in parallel,
> running the script that computes the rates as follows:
> ./compute-rates.sh eth1 0.1
>
> Initially, the packet rates seem steady
>
> RxB=0 RxP=0 TxB=10952 TxP=37
> RxB=0 RxP=0 TxB=10656 TxP=36
> RxB=0 RxP=0 TxB=10656 TxP=36
> RxB=0 RxP=0 TxB=10656 TxP=36
> RxB=0 RxP=0 TxB=10952 TxP=37
> RxB=0 RxP=0 TxB=10952 TxP=37
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10952 TxP=37
>
> [...]
>
> After a while, we toggle the interface up / down with a sleep between the
> steps. I suspect the length of the sleep might be a variable in the
> equation.
>
> ifconfig eth1 down; sleep 7; ifconfig eth1 up
>
>
> What we see is that even after the interface is toggled back up, the rates
> never seem to recover.
>
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=2072 TxP=7
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=521256 TxP=1761
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
>
> [...]
>
>
> I've attempted to mirror the same behavior using dpdk-pktgen [2] on a
> different machine (Ubuntu 20.04). This time, af_packet runs on top of
> a Linux virtio_net interface.
>
> I seem to be getting similar behavior. I have used the following
> dpdk-pktgen configuration and run-time settings:
>
>
> pktgen \
>     -l 1-4 \
>     -n 4 \
>     --proc-type=primary \
>     --no-pci \
>     --no-telemetry \
>     --no-huge \
>     -m 512 \
>     --vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192,framecnt=2048,qpairs=1,qdisc_bypass=0 \
>     -- \
>     -P \
>     -T \
>     -m "3.0" \
>     -f themes/black-yellow.theme
>
> set 0 size 300
> set 0 rate 0.008
> set 0 burst 1
> start 0
>
>
> [1] https://github.com/open-traffic-generator/ixia-c
> [2] http://code.dpdk.org/pktgen-dpdk/pktgen-20.11.2/source/INSTALL.md
>
> On Wed, 29 Sept 2021 at 13:03, Tudor Cornea <tudor.cornea@gmail.com> wrote:
>
Hi Tudor,
I have used testpmd with 'txonly' forwarding. Tx recovers after the
interface comes back up, but by adding some debug logs I can see that
'poll()' returns with POLLOUT even when there is no space in the buffer.

According to the logic in the PMD, when 'poll()' returns success, it expects
to have some space in the Tx buffer.
So I agree to add the check.

I only have a question on POLLERR: should we separate the POLLERR check
to cover the ifdown case? What do you think about the following logic:

	if (!TP_STATUS_AVAILABLE) {
		if (poll() < 0)
			break;
		if (pfd.revents & POLLERR)
			break;
	}

	if (!TP_STATUS_AVAILABLE)
		break;

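
To make the logic above a bit more concrete, here is a minimal standalone
sketch in C. It is only an illustration under assumptions, not the actual
driver change: the helper name 'tx_frame_usable()' is hypothetical, and in
the PMD the checks would sit inline in the Tx loop and 'break' rather than
return. 'ppd' stands for the TPACKET_V2 header of the next Tx frame and
'pfd' for the pollfd of the socket, as in the quoted patch context.

```c
#include <poll.h>
#include <stdbool.h>
#include <linux/if_packet.h>	/* struct tpacket2_hdr, TP_STATUS_* */

/*
 * Hypothetical helper (not part of the PMD): returns true only when the
 * frame described by 'ppd' can be written to.
 */
static bool
tx_frame_usable(const volatile struct tpacket2_hdr *ppd, struct pollfd *pfd)
{
	if (ppd->tp_status != TP_STATUS_AVAILABLE) {
		/* Frame still owned by the kernel: wait for the socket. */
		if (poll(pfd, 1, -1) < 0)
			return false;

		/* Reported, for example, when the interface is down. */
		if (pfd->revents & POLLERR)
			return false;
	}

	/*
	 * poll() may report POLLOUT although the ring is still full,
	 * so re-check the frame status before using the frame.
	 */
	return ppd->tp_status == TP_STATUS_AVAILABLE;
}
```

Keeping the POLLERR test separate from the final status re-check makes the
two exit reasons (socket error versus full ring) distinguishable, for
example if one of them should later be counted or logged differently.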
>> Hi Ferruh,
>>
>>> What you described above looks like a ring buffer with a single producer
>>> and a single consumer, where the producer overwrites items that have not
>>> been consumed yet.
>>
>>
>> Indeed. This is also my understanding of the bug.
>> I am going to try to isolate the issue, and should probably be able to
>> come up with a script in a few days.
>>
>>> Out of curiosity, are you using a modified af_packet implementation in
>>> the kernel for the usage described above?
>>
>>
>> We are currently using an Ubuntu-based distro with a 4.15 Linux kernel.
>> We don't have any kernel patches for the af_packet implementation to my
>> knowledge (probably excepting patches that are back-ported by Ubuntu
>> maintainers from newer releases).
>>
>>
>> On Mon, 20 Sept 2021 at 20:44, Ferruh Yigit <ferruh.yigit@intel.com>
>> wrote:
>>
>>> On 9/13/2021 2:45 PM, Tudor Cornea wrote:
>>>> The poll call can return POLLERR which is ignored, or it can return
>>>> POLLOUT, even if there are no free frames in the mmap-ed area.
>>>>
>>>> We can account for both of these cases by re-checking if the next
>>>> frame is empty before writing into it.
>>>>
>>>> Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
>>>> Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
>>>> ---
>>>> drivers/net/af_packet/rte_eth_af_packet.c | 19 +++++++++++++++++++
>>>> 1 file changed, 19 insertions(+)
>>>>
>>>> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> index b73b211..087c196 100644
>>>> --- a/drivers/net/af_packet/rte_eth_af_packet.c
>>>> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> @@ -216,6 +216,25 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>>> (poll(&pfd, 1, -1) < 0))
>>>> break;
>>>>
>>>> + /*
>>>> + * Poll can return POLLERR if the interface is down
>>>> + *
>>>> + * It will almost always return POLLOUT, even if there
>>>> + * are no extra buffers available
>>>> + *
>>>> + * This happens, because packet_poll() calls datagram_poll()
>>>> + * which checks the space left in the socket buffer and,
>>>> + * in the case of packet_mmap, the default socket buffer length
>>>> + * doesn't match the requested size for the tx_ring.
>>>> + * As such, there is almost always space left in socket buffer,
>>>> + * which doesn't seem to be correlated to the requested size
>>>> + * for the tx_ring in packet_mmap.
>>>> + *
>>>> + * This results in poll() returning POLLOUT.
>>>> + */
>>>> + if (ppd->tp_status != TP_STATUS_AVAILABLE)
>>>> + break;
>>>> +
>>>
>>> If 'POLLOUT' doesn't indicate that there is space in the buffer, what is
>>> the point of the 'poll()' at all?
>>>
>>> How can we test/reproduce the mentioned behavior? Or is there a way to
>>> fix the behavior of poll() or use an alternative to it?
>>>
>>>
>>> OK to break on the 'POLLERR', I guess it can be detected in
>>> 'pfd.revents'.
>>>
>>>
>>>> /* copy the tx frame data */
>>>> pbuf = (uint8_t *) ppd + TPACKET2_HDRLEN -
>>>> sizeof(struct sockaddr_ll);
>>>>
>>>
>>>
Thread overview: 14+ messages
2021-08-20 13:39 [dpdk-dev] [PATCH] " Tudor Cornea
2021-09-01 16:34 ` Ferruh Yigit
2021-09-06 10:23 ` Tudor Cornea
2021-09-20 17:11 ` Ferruh Yigit
2021-09-13 13:45 ` [dpdk-dev] [PATCH v2] " Tudor Cornea
2021-09-20 17:44 ` Ferruh Yigit
2021-09-29 10:03 ` Tudor Cornea
2021-10-05 15:11 ` Tudor Cornea
2021-10-26 14:30 ` Ferruh Yigit [this message]
2021-11-02 15:24 ` Tudor Cornea
2021-11-02 15:47 ` [dpdk-dev] [PATCH v3] " Tudor Cornea
2021-11-02 16:47 ` Ferruh Yigit
2021-11-03 9:31 ` [dpdk-dev] [PATCH v4] " Tudor Cornea
2021-11-04 12:07 ` Ferruh Yigit