Date: Mon, 12 Oct 2015 22:18:30 -0700
From: Stephen Hemminger
To: "Sanford, Robert"
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] IXGBE RX packet loss with 5+ cores
Message-ID: <20151012221830.6f5f42af@xeon-e3>

On Tue, 13 Oct 2015 02:57:46 +0000
"Sanford, Robert" wrote:

> I'm hoping that someone (perhaps at Intel) can help us understand
> an IXGBE RX packet loss issue we're able to reproduce with testpmd.
>
> We run testpmd with various numbers of cores. We offer line-rate
> traffic (~14.88 Mpps) to one ethernet port, and forward all received
> packets via the second port.
>
> When we configure 1, 2, 3, or 4 cores (per port, with same number RX
> queues per port), there is no RX packet loss. When we configure 5 or
> more cores, we observe the following packet loss (approximate):
> 5 cores - 3% loss
> 6 cores - 7% loss
> 7 cores - 11% loss
> 8 cores - 15% loss
> 9 cores - 18% loss
>
> All of the "lost" packets are accounted for in the device's Rx Missed
> Packets Count register (RXMPC[0]). Quoting the datasheet:
> "Packets are missed when the receive FIFO has insufficient space to
> store the incoming packet. This might be caused due to insufficient
> buffers allocated, or because there is insufficient bandwidth on the
> IO bus."
>
> RXMPC, and our use of API rx_descriptor_done to verify that we don't
> run out of mbufs (discussed below), lead us to theorize that packet
> loss occurs because the device is unable to DMA all packets from its
> internal packet buffer (512 KB, reported by register RXPBSIZE[0])
> before overrun.
>
> Questions
> =========
> 1. The 82599 device supports up to 128 queues. Why do we see trouble
> with as few as 5 queues? What could limit the system (and one port
> controlled by 5+ cores) from receiving at line-rate without loss?
>
> 2. As far as we can tell, the RX path only touches the device
> registers when it updates a Receive Descriptor Tail register (RDT[n]),
> roughly every rx_free_thresh packets. Is there a big difference
> between one core doing this and N cores doing it 1/N as often?
>
> 3. Do CPU reads/writes from/to device registers have a higher priority
> than device reads/writes from/to memory? Could the former transactions
> (CPU <-> device) significantly impede the latter (device <-> RAM)?
>
> Thanks in advance for any help you can provide.

As you add cores, there is more traffic on the PCI bus from each core
polling. There is a fixed number of PCI bus transactions per second
possible. Each core is increasing the number of useless (empty)
transactions. Why do you think adding more cores will help?
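
For context, here is a minimal sketch of the kind of per-core RX/forwarding
loop and miss-counter check being discussed. It is illustrative only: the
1:1 lcore-to-queue mapping, BURST_SIZE, the port numbers (0 -> 1), and the
helper names rx_fwd_loop/port_missed are assumptions, not testpmd's actual
code. As far as I can tell, the imissed field filled in by
rte_eth_stats_get() on ixgbe is accumulated from the same RXMPC registers
mentioned above.

#include <stdint.h>
#include <string.h>

#include <rte_common.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32  /* assumed burst size for illustration */

/*
 * Hypothetical per-lcore loop: lcore N polls RX queue N on port 0 and
 * forwards to queue N on port 1.  Every rte_eth_rx_burst() that returns 0
 * is one of the empty polls mentioned above; the PMD only writes the RDT
 * tail register after replenishing roughly rx_free_thresh descriptors.
 */
static int
rx_fwd_loop(void *arg __rte_unused)
{
	uint16_t queue = (uint16_t)rte_lcore_id(); /* assumed lcore->queue map */
	struct rte_mbuf *bufs[BURST_SIZE];
	uint16_t nb_rx, nb_tx;

	for (;;) {
		nb_rx = rte_eth_rx_burst(0, queue, bufs, BURST_SIZE);
		if (nb_rx == 0)
			continue;              /* empty poll, nothing received */
		nb_tx = rte_eth_tx_burst(1, queue, bufs, nb_rx);
		while (nb_tx < nb_rx)          /* free what TX would not take */
			rte_pktmbuf_free(bufs[nb_tx++]);
	}
	return 0;
}

/* Read the per-port missed counter (backed by RXMPC on ixgbe). */
static uint64_t
port_missed(uint16_t port)
{
	struct rte_eth_stats stats;

	memset(&stats, 0, sizeof(stats));
	rte_eth_stats_get(port, &stats);
	return stats.imissed;
}

Each such loop would be started on a worker lcore with something like
rte_eal_remote_launch(rx_fwd_loop, NULL, lcore_id), one lcore per RX queue,
which matches the 1-to-9 core configurations described in the report.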