From: Wajeeha Javed
Date: Tue, 23 Oct 2018 11:17:21 +0500
To: keith.wiles@intel.com
Cc: users@dpdk.org
Subject: Re: [dpdk-users] users Digest, Vol 155, Issue 7

Hi Keith,

Thanks for your reply. Please find my comments below.

>> You're right, in my application all the packets are stored inside mbufs. The reason for not using the mbuf's next pointer is that it may already be in use by fragmented packets whose size is greater than the MTU.

>> I have tried a small STAILQ linked list per port, where each STAILQ entry holds a pointer to an mbuf from the received burst. I allocate the STAILQ entry, set the mbuf pointer in the entry, then link the entry onto the STAILQ list using the STAILQ macros. I observe millions of packets lost; the STAILQ linked list could hold less than 1 million packets per second at a line rate of 10 Gbit/s.
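A minimal sketch of such a per-port STAILQ of mbuf pointers (illustrative only, not the exact code from the application; names such as delay_entry and hold_packet are made up here):

    #include <sys/queue.h>
    #include <rte_ethdev.h>
    #include <rte_malloc.h>
    #include <rte_mbuf.h>

    struct delay_entry {
            STAILQ_ENTRY(delay_entry) entries;  /* list linkage */
            struct rte_mbuf *pkt;               /* held packet, no data copy */
    };

    STAILQ_HEAD(delay_list, delay_entry);
    /* one list per port; each head must be STAILQ_INIT()ed at startup */
    static struct delay_list delay_q[RTE_MAX_ETHPORTS];

    static inline int
    hold_packet(uint16_t port, struct rte_mbuf *m)
    {
            /* per-packet allocation, a likely bottleneck at line rate */
            struct delay_entry *e = rte_malloc(NULL, sizeof(*e), 0);

            if (e == NULL)
                    return -1;      /* caller drops the packet */
            e->pkt = m;
            STAILQ_INSERT_TAIL(&delay_q[port], e, entries);
            return 0;
    }

Pre-allocating the entries instead of calling rte_malloc() per packet (as suggested further down in the thread) would take that allocation out of the fast path.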
>> I would like to prevent data loss. Could you please guide me on the best way to increase the number of mbufs held, without freeing or overwriting them, for a delay of 2 seconds?

Thanks & Best Regards,

Wajeeha Javed

On Tue, Oct 16, 2018 at 3:02 PM Wiles, Keith wrote:

> Sorry, you must have replied to my screwup of not sending the reply in pure text format. I did send an updated reply to hopefully fix that problem. More comments inline below. All emails to the list must be in 'text' format, not 'Rich Text' format :-(
>
> > On Oct 15, 2018, at 11:42 PM, Wajeeha Javed wrote:
> >
> > Hi,
> >
> > Thanks, everyone, for your reply. Please find my comments below.
> >
> > *I've failed to find explicit limitations from the first glance.
> > NB_MBUF define is typically internal to examples/apps.
> > The question I'd like to double-check is whether the host has enough
> > RAM and hugepages allocated? 5 million mbufs already require about 10G.*
> >
> > Total RAM = 128 GB
> > Available Memory = 23 GB free
> > Total Huge Pages = 80
> > Free Huge Pages = 38
> > Huge Page Size = 1 GB
> >
> > *The mempool uses uint32_t for most sizes and the number of mempool items is uint32_t, so the number of entries can be ~4G as stated, but make sure you have enough memory, as the overhead for mbufs is not just the header + the packet size.*
> >
> > Right. Currently, there are a total of 80 huge pages, 40 for each NUMA node (node 0 and node 1). I observed that I was using only 16 huge pages, while the other 16 huge pages were used by another DPDK application. By running my DPDK application only on NUMA node 0, I was able to increase the mempool size to 14M, which uses all the huge pages of NUMA node 0.
> >
> > *My question is why are you copying the mbuf and not just linking the mbufs into a linked list? Maybe I do not understand the reason. I would try to make sure you do not do a copy of the data and just link the mbufs together using the next pointer in the mbuf header, unless you have chained mbufs already.*
> >
> > The reason for copying the mbuf is a NIC limitation: I cannot have more than 16384 Rx descriptors, whereas I want to withhold all the packets arriving at a line rate of 10 Gbit/s on each port. I created a circular queue running on a FIFO basis. Initially, I thought of using rte_mbuf* packet bursts for a delay of 2 secs. Now at line rate, we receive 14 million
>
> I assume in your driver an mbuf is used to receive the packet data, which means the packet is inside an mbuf (if not, then why not?). The mbuf data does not need to be copied; you can use the 'next' pointer in the mbuf to create a single linked list. If you use fragmented packets in your design, meaning you are using the 'next' pointer in the mbuf to chain the frame fragments into a single packet, then using 'next' will not work. Plus, when you call rte_pktmbuf_free() you need to make sure the next pointer is NULL, or it will free the complete chain of mbufs (not what you want here).
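A rough sketch of that single linked list built from mbuf->next (untested, for illustration; only valid while every packet is a single segment, and the queue_mbuf/dequeue_mbuf names are made up here):

    #include <rte_mbuf.h>

    static struct rte_mbuf *delay_head;
    static struct rte_mbuf *delay_tail;

    static inline void
    queue_mbuf(struct rte_mbuf *m)
    {
            m->next = NULL;
            if (delay_tail == NULL)
                    delay_head = m;
            else
                    delay_tail->next = m;
            delay_tail = m;
    }

    static inline struct rte_mbuf *
    dequeue_mbuf(void)
    {
            struct rte_mbuf *m = delay_head;

            if (m == NULL)
                    return NULL;
            delay_head = m->next;
            if (delay_head == NULL)
                    delay_tail = NULL;
            /* unlink before transmit/free: rte_pktmbuf_free() follows 'next'
             * and would otherwise free every packet still queued behind it */
            m->next = NULL;
            return m;
    }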
> In the case where you are using chained mbufs for a single packet, you can create a set of small buffers to hold the STAILQ pointers and the pointer to the mbuf, then add the small structure onto a linked list. This method may be the best solution in the long run instead of trying to use the mbuf->next pointer.
>
> Have a look at the rte_tailq.h and eal_common_tailqs.c files and rte_mempool.c (plus many other libs in DPDK). Use the rte_tailq_entry structure to create a linked list of mempool structures for searching and debugging mempools in the system. The 'struct rte_tailq_entry' just adds a simple structure pointing to the mempool structure and allows building a linked list with the correct pointer types.
>
> You can create a mempool of rte_tailq_entry structures if you want a fast and clean way to allocate/free the tailq entry structures.
>
> Then you do not need to copy the packet memory anyplace; just allocate a tailq entry structure, set the mbuf pointer in the tailq entry, then link the tailq entry onto the tailq list. The macros for tailq support are not the easiest to understand :-(, but once you understand the idea it becomes clearer.
>
> I hope that helps.
>
> > Packet/s, so the descriptors get full and I have no option left other than copying the mbuf into the circular queue rather than storing an rte_mbuf* pointer. I know I have to make a compromise on performance to achieve a delay for packets. So for copying mbufs, I allocate memory from a mempool to copy the received mbuf and then free it. Please find the code snippet below.
> >
> > How can we chain different mbufs together? According to my understanding, chained mbufs in the API are used for storing segments of fragmented packets that are greater than the MTU. Even if we chain the mbufs together using the next pointer, we still need to free the received mbufs, otherwise we will not get free Rx descriptors at a line rate of 10 Gbit/s, and eventually all the Rx descriptors will be filled and the NIC will not receive any more packets.
> >
> > for (j = 0; j < nb_rx; j++) {
> >         m = pkts_burst[j];
> >         struct rte_mbuf *copy_mbuf = pktmbuf_copy(m, pktmbuf_pool[sockid]);
> >         ....
> >         rte_pktmbuf_free(m);
> > }
> >
> > *The other question is can you drop any packets? If not, then you only have the linking option IMO. If you can drop packets, then you can just start dropping them when the ring is getting full. Holding onto 28M packets for two seconds can cause other protocol-related problems: TCP could be sending retransmitted packets, and now you have caused a bunch of work on the RX side at the end point.*
> >
> > I would like my DPDK application to have zero packet loss; it only delays all the received packets for 2 secs and then transmits them as-is, without any change or processing to the packets. Moreover, the DPDK application is receiving tap traffic (monitoring traffic) rather than real-time traffic, so there will not be any TCP or other protocol-related problems.
> >
> > Looking forward to your reply.
> >
> > Best Regards,
> >
> > Wajeeha Javed
>
> Regards,
> Keith
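Putting the suggestion above together, a minimal untested sketch of the tailq-entry approach: list entries are pre-allocated from a mempool so there is no per-packet malloc, and the mbufs themselves are never copied. It uses a small custom entry struct (in the spirit of struct rte_tailq_entry) with the plain sys/queue.h TAILQ macros; all names (pkt_entry, entry_pool, hold_mbuf, release_oldest) and the timestamp field are illustrative, not from this thread:

    #include <sys/queue.h>
    #include <rte_cycles.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>

    struct pkt_entry {
            TAILQ_ENTRY(pkt_entry) next;
            struct rte_mbuf *pkt;   /* held packet, no data copy */
            uint64_t rx_tsc;        /* arrival time, to release after ~2 s */
    };

    TAILQ_HEAD(pkt_list, pkt_entry);
    static struct pkt_list delay_list = TAILQ_HEAD_INITIALIZER(delay_list);
    static struct rte_mempool *entry_pool;  /* created at init with rte_mempool_create() */

    static inline int
    hold_mbuf(struct rte_mbuf *m)
    {
            struct pkt_entry *e;

            if (rte_mempool_get(entry_pool, (void **)&e) < 0)
                    return -1;      /* out of entries: drop or back off */
            e->pkt = m;
            e->rx_tsc = rte_rdtsc();
            TAILQ_INSERT_TAIL(&delay_list, e, next);
            return 0;
    }

    static inline struct rte_mbuf *
    release_oldest(uint64_t delay_cycles)
    {
            struct pkt_entry *e = TAILQ_FIRST(&delay_list);
            struct rte_mbuf *m;

            if (e == NULL || rte_rdtsc() - e->rx_tsc < delay_cycles)
                    return NULL;    /* nothing held long enough yet */
            TAILQ_REMOVE(&delay_list, e, next);
            m = e->pkt;
            rte_mempool_put(entry_pool, e);
            return m;               /* caller transmits or frees it */
    }

The entry pool would need at least as many elements as packets held during the 2-second delay (on the order of 28M at 14 Mpps), which is the same sizing concern as for the mbuf mempool itself.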