From: Pavel Shirshov
Date: Mon, 17 Jul 2017 13:38:42 -0700
To: Harold Demure
Cc: users@dpdk.org
Subject: Re: [dpdk-users] Strange packet loss with multi-frame payloads

Hi Harold,

Sorry, I don't have a direct answer to your question, but I have a few
questions of my own.

1. What is "packet_id" here? Is it something inside your UDP payload?
2. How do you know you have packet loss? How can you be sure it's packet
loss if you don't see it in your counters? How can you be sure that your
clients actually sent these packets?

Also, I see you're using a 2x8-core server, so your OS uses some of the
cores for itself. Could that be a problem too?

Thanks

On Mon, Jul 17, 2017 at 6:18 AM, Harold Demure wrote:
> Hello,
> I am having a problem with packet loss and I hope you can help me out.
> Below you will find a description of the application and of the problem.
> It is a little long, but I really hope somebody out there can help me,
> because this is driving me crazy.
>
> *Application*
>
> I have a client-server application; single server, multiple clients.
> The machines have 8 active cores which poll 8 distinct RX queues to receive
> packets and use 8 distinct TX queues to burst out packets (i.e.,
> run-to-completion model).
>
> *Workload*
>
> The workload is composed of mostly single-frame packets, but occasionally
> clients send multi-frame packets to the server, and occasionally the server
> sends multi-frame replies back to the client.
> Packets are fragmented at the UDP level (i.e., no IP fragmentation: every
> packet of the same request has frag_id == 0, even though they share the
> same packet_id).
>
> *Problem*
>
> I experience huge packet loss on the server when the occasional multi-frame
> requests of the clients correspond to a big payload (> 300 Kb).
> The eth stats that I gather on the server say that there is no error, nor
> any packet loss (q_errors, imissed, ierrors, oerrors, rx_nombuf are all
> equal to 0). Yet, the application is not seeing some packets of big
> requests that the clients send.
>
> I have recorded some interesting facts:
> 1) The clients do not experience such packet loss, although they also
> receive packets with an aggregate payload of the same size as the packets
> received by the server. The only differences w.r.t. the server are that a
> client machine of course has a lower RX load (it only gets the replies to
> its own requests) and a client thread only receives packets from a single
> machine (the server).
> 2) This behavior does not arise as long as the biggest payload exchanged
> between clients and servers is < 200 Kb. This leads me to conclude that
> fragmentation is not the issue (also, if I implement a stubborn
> retransmission, eventually all packets are received even with bigger
> payloads). Also, I reserve plenty of memory for my mempool, so I don't
> think the server runs out of mbufs (and if that were the case I guess I
> would see this in the dropped packets count, right?).
> 3) If I switch to the pipeline model (on the server only) this problem
> basically disappears. By pipeline model I mean something like the
> load-balancing app, where a single core on the server receives client
> packets on a single RX queue (worker cores reply back to the client using
> their own TX queue). This leads me to think that the problem is on the
> server, and not on the clients.
> 4) It doesn't seem to be a "load" problem. If I run the same tests multiple
> times, in some "lucky" runs the run-to-completion model outperforms the
> pipeline one. Also, the run-to-completion model with single-frame packets
> can handle a number of single-frame packets per second that is much higher
> than the number of frames per second generated by the workload with some
> big packets.
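
The counters quoted above (imissed, ierrors, rx_nombuf) being zero while
the application still misses frames can be narrowed down by comparing
three numbers: frames the clients say they sent, frames the port
counters report, and frames each polling core actually handled. Below is
a minimal sketch of that comparison. The struct port_counters type, the
locate_loss() function and its parameters are hypothetical names made up
for illustration; the field names mirror those of struct rte_eth_stats,
but the values are assumed to be copied in by the caller, and whether a
given PMD fills the per-queue fields at all is something to verify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_QUEUES 8 /* the setup described above polls 8 RX queues */

/* Hypothetical stand-in for the few port counters discussed above.
 * In a real application the caller would copy them from the port
 * statistics; nothing here talks to a NIC. */
struct port_counters {
    uint64_t ipackets;               /* frames the port reports as received */
    uint64_t imissed;                /* frames the port reports as missed */
    uint64_t ierrors;                /* frames the port reports as erroneous */
    uint64_t rx_nombuf;              /* RX mbuf allocation failures */
    uint64_t q_ipackets[NUM_QUEUES]; /* per-queue received frames */
};

enum loss_location {
    LOSS_NONE,                 /* all three views agree */
    LOSS_REPORTED_BY_PORT,     /* a drop/error counter is non-zero */
    LOSS_BETWEEN_PORT_AND_APP, /* a queue counted frames the app never saw */
    LOSS_BEFORE_PORT_COUNTERS  /* clients sent more than the port counted */
};

/* Classify where frames disappear, checking the most specific evidence
 * first. app_rx[q] is the number of frames the core polling queue q has
 * counted on its own (an application-level counter, assumed to exist). */
enum loss_location locate_loss(const struct port_counters *port,
                               uint64_t frames_sent_by_clients,
                               const uint64_t app_rx[NUM_QUEUES])
{
    assert(port != NULL && app_rx != NULL);

    if (port->imissed != 0 || port->ierrors != 0 || port->rx_nombuf != 0)
        return LOSS_REPORTED_BY_PORT;

    for (size_t q = 0; q < NUM_QUEUES; q++) {
        if (app_rx[q] < port->q_ipackets[q])
            return LOSS_BETWEEN_PORT_AND_APP;
    }

    if (port->ipackets < frames_sent_by_clients)
        return LOSS_BEFORE_PORT_COUNTERS;

    return LOSS_NONE;
}
```

A LOSS_BEFORE_PORT_COUNTERS result would mean the frames either never
reached the server or were dropped without being counted, which is the
case the questions in the reply are probing.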
>
> *Question*
>
> Do you have any idea why I am witnessing this behavior? I know that having
> fewer queues can help performance by relieving contention on the NIC, but
> is it possible that the contention is actually causing packets to get
> dropped?
>
> *Platform*
>
> DPDK: v 2.2-0 (I know this is an old version, but I am dealing with
> legacy code I cannot change)
>
> MLNX_OFED_LINUX-3.1-1.0.3-ubuntu14.04-x86_64
>
> My NIC: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>
> My machine runs a 4.4.0-72-generic kernel on Ubuntu 16.04.02
>
> CPU is Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz 2x8 cores
>
>
> Thank you a lot, especially if you went through the whole email :)
> Regards,
> Harold
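
On the question of how to tell that frames of a multi-frame request are
really missing: one option is for the receiver to keep a small bitmap
per packet_id and record which fragments have arrived. The sketch below
assumes that every frame carries, next to packet_id, its own fragment
index and the total fragment count. Only packet_id is mentioned in the
thread; the other two fields, the MAX_FRAGS bound, and all type and
function names are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAGS 512 /* hypothetical upper bound on frames per request */

/* Receiver-side bookkeeping for one multi-frame request. */
struct frag_tracker {
    uint32_t packet_id;           /* request this tracker belongs to */
    uint16_t frag_count;          /* total fragments expected */
    uint16_t received;            /* distinct fragments seen so far */
    uint8_t  seen[MAX_FRAGS / 8]; /* one bit per fragment index */
};

/* Returns 0 on success, -1 if frag_count does not fit the bitmap. */
int tracker_init(struct frag_tracker *t, uint32_t packet_id,
                 uint16_t frag_count)
{
    assert(t != NULL);
    if (frag_count == 0 || frag_count > MAX_FRAGS)
        return -1;
    memset(t, 0, sizeof(*t));
    t->packet_id = packet_id;
    t->frag_count = frag_count;
    return 0;
}

/* Record one arriving fragment.
 * Returns 1 if it is new, 0 if it is a duplicate (e.g. a retransmission),
 * -1 if it belongs to another request or its index is out of range. */
int tracker_mark(struct frag_tracker *t, uint32_t packet_id,
                 uint16_t frag_index)
{
    assert(t != NULL);
    if (packet_id != t->packet_id || frag_index >= t->frag_count)
        return -1;

    uint8_t mask = (uint8_t)(1u << (frag_index % 8));
    if (t->seen[frag_index / 8] & mask)
        return 0;

    t->seen[frag_index / 8] |= mask;
    t->received++;
    return 1;
}

/* Store up to max_out missing fragment indices in out (ascending order)
 * and return the total number of fragments still missing. */
uint16_t tracker_missing(const struct frag_tracker *t, uint16_t *out,
                         uint16_t max_out)
{
    assert(t != NULL);
    uint16_t missing = 0;
    for (uint16_t i = 0; i < t->frag_count; i++) {
        if (t->seen[i / 8] & (uint8_t)(1u << (i % 8)))
            continue;
        if (out != NULL && missing < max_out)
            out[missing] = i;
        missing++;
    }
    return missing;
}
```

Logging the missing indices once a request times out would show whether
the gaps cluster (say, a contiguous run in the middle of a burst) or are
scattered, and the same indices can be matched against send-side logs on
the clients.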