From: Thomas Graf <tgraf@redhat.com>
Organization: Red Hat, Inc.
Date: Thu, 30 Jan 2014 00:15:10 +0100
To: François-Frédéric Ozog <ff@ozog.com>, 'Vincent JARDIN' <vincent.jardin@6wind.com>
Cc: dev@openvswitch.org, dev@dpdk.org, 'Gerald Rogers' <gerald.rogers@intel.com>, dpdk-ovs@ml01.01.org
Subject: Re: [dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
Message-ID: <52E98B7E.9010602@redhat.com>
In-Reply-To: <00ef01cf1d33$5e509270$1af1b750$@com>

On 01/29/2014 09:47 PM, François-Frédéric Ozog wrote:
> In the telecom world, if you fix the underlying framework of an app, you
> will still have to validate the solution, i.e. app/framework. In addition, the
> idea of shared libraries introduces the implied requirement to validate apps
> against diverse versions of DPDK shared libraries. This translates into
> development and support costs.
>
> I also expect many DPDK applications to tackle core networking features,
> with sub-microsecond packet handling delays, some even lower than 200ns
> (NAT64...). The lazy binding based on the ELF PLT represents quite a cost, not
> to mention that optimization stops at shared library boundaries (gcc
> whole program optimization can be very effective...). Microsoft DLL linkage
> is an order of magnitude faster. If Linux were to provide that, I would
> probably revise my judgment. (I haven't checked the Linux dynamic linking
> implementation for some time, so my understanding of it may be outdated.)

All very valid points, and I am not suggesting that we stop offering the
static linking option in any way. Dynamic linking will by design cost
more cycles. My sole point is that for a core platform component like
OVS, the shared library benefits _might_ outweigh the performance
difference. For a shared library to be effective, though, some form of
ABI compatibility must be guaranteed.

> I don't think it is so straightforward.
> Many recent cards such as Chelsio
> and Myricom have a very different "packet memory layout" that does not fit
> so easily into the current DPDK architecture.
>
> 1) "traditional" architecture: the driver reserves X buffers and provides the
> card with descriptors of those buffers. Each packet is DMA'ed into exactly
> one buffer. Typically you have 2K buffers; a 64 byte packet consumes exactly
> one buffer.
>
> 2) "alternative" new architecture: the driver reserves a memory zone, say
> 4MB, without any structure, and provides a single zone description and a
> ring buffer to the card (there are no individual buffer descriptors any more).
> The card fills the memory zone with packets, one next to the other, and
> specifies where the packets are by updating the supplied ring. Out of the
> many issues fitting this scheme into DPDK, you cannot free a single mbuf:
> you have to maintain a ref count to the memory zone so that, when all mbufs
> have been "released", the memory zone can be freed.
> That's quite a stretch from the current paradigm.
>
> Apart from this aspect, managing RSS is too tied to Intel's flow director
> concepts and cannot directly accommodate smarter or dumber RSS mechanisms.
>
> That said, I fully agree the PMD API should be revisited.

Fair enough. I don't see a reason why multiple interfaces could not
coexist in order to support multiple memory layouts.

What I'm hearing so far is that while there is no objection to bringing
stability to the APIs, it should not come at a performance cost, and it
is still too early to nail down APIs that are still in flux.
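
To make the "alternative" layout quoted above a bit more concrete, here is a minimal sketch in plain C of a zone whose packets are packed back to back and which can only be freed once every packet view has been released. None of the names (struct zone, struct pkt_view, zone_rx, zone_put, pkt_release) are DPDK APIs; they are made up for illustration. The memcpy stands in for the card's DMA, the freed_flag exists only so the free can be observed, and the ref count is not atomic, so this is single-threaded only.

```c
/* Hypothetical sketch: none of these names are DPDK APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One unstructured memory zone that is filled with packets back to back. */
struct zone {
    uint8_t *mem;        /* contiguous region the card would DMA into */
    size_t   size;       /* total bytes in the zone */
    size_t   write_off;  /* offset of the next free byte */
    unsigned refcnt;     /* owner reference + one per outstanding packet */
    int     *freed_flag; /* demo only: set to 1 when the zone is freed */
};

/* A packet is only a window into the zone; it owns no buffer of its own. */
struct pkt_view {
    struct zone *z;
    size_t off;
    size_t len;
};

/* Allocate a zone holding a single reference (the owner's). */
struct zone *zone_create(size_t size, int *freed_flag)
{
    struct zone *z = malloc(sizeof(*z));
    if (z == NULL)
        return NULL;
    z->mem = malloc(size);
    if (z->mem == NULL) {
        free(z);
        return NULL;
    }
    z->size = size;
    z->write_off = 0;
    z->refcnt = 1;
    z->freed_flag = freed_flag;
    return z;
}

/* Drop one reference; the zone is freed only when the last one goes away. */
void zone_put(struct zone *z)
{
    assert(z->refcnt > 0);
    if (--z->refcnt == 0) {
        if (z->freed_flag != NULL)
            *z->freed_flag = 1;
        free(z->mem);
        free(z);
    }
}

/* Simulate the card placing a packet right after the previous one.
 * Returns 0 on success, -1 if the zone has no room left. */
int zone_rx(struct zone *z, const void *data, size_t len, struct pkt_view *out)
{
    if (len > z->size - z->write_off)
        return -1;
    memcpy(z->mem + z->write_off, data, len);
    out->z = z;
    out->off = z->write_off;
    out->len = len;
    z->write_off += len;
    z->refcnt++; /* each outstanding packet pins the whole zone */
    return 0;
}

/* Releasing a packet cannot free its bytes; it only unpins the zone. */
void pkt_release(struct pkt_view *p)
{
    struct zone *z = p->z;
    p->z = NULL;
    zone_put(z);
}
```

In this sketch a single long-lived packet keeps the entire zone alive, which is the mismatch with the one-buffer-per-packet model described in the quoted text.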