From: François-Frédéric Ozog
To: 'Thomas Graf', 'Vincent JARDIN'
Cc: dev@openvswitch.org, dev@dpdk.org, 'Gerald Rogers', dpdk-ovs@ml01.01.org
Date: Wed, 29 Jan 2014 21:47:47 +0100
Message-ID: <00ef01cf1d33$5e509270$1af1b750$@com>
In-Reply-To: <52E936D9.4010207@redhat.com>
References: <1390873715-26714-1-git-send-email-pshelar@nicira.com> <52E7D13B.9020404@redhat.com> <52E8B88A.1070104@redhat.com> <52E8D772.9070302@6wind.com> <52E8E2AB.1080600@redhat.com> <52E92DA6.9070704@6wind.com> <52E936D9.4010207@redhat.com>
Subject: Re: [dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.

> > First and easy answer: it is open source, so anyone can recompile. So,
> > what's the issue?
>
> I'm talking from a pure distribution perspective here: Requiring to
> recompile all DPDK based applications to distribute a bugfix or to add
> support for a new PMD is not ideal.
>
> So ideally OVS would have the possibility to link against the shared
> library long term.

I agree that distribution of DPDK apps is not covered properly at present.
Identifying the proper scheme requires a specific analysis based on the
constraints of the telecom/cloud/networking markets.

In the telecom world, if you fix the underlying framework of an app, you
will still have to validate the solution, i.e. app + framework. In
addition, the idea of shared libraries introduces the implied requirement
to validate apps against diverse versions of the DPDK shared library. This
translates into development and support costs.

I also expect many DPDK applications to tackle core networking features,
with sub-microsecond packet handling delays, some even below 200ns
(NAT64...). Lazy binding based on the ELF PLT represents quite a cost, not
to mention that optimization stops at shared library boundaries (gcc
whole-program optimization can be very effective...). Microsoft DLL
linkage is an order of magnitude faster. If Linux were to provide that, I
would probably revise my judgment. (I haven't checked the Linux dynamic
linking implementation for some time, so my understanding of it may be
outdated.)
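For illustration, a minimal sketch of one way to keep the lazy PLT
resolution off the fast path, assuming a hypothetical libpmd.so exporting a
hypothetical pmd_rx_burst symbol (neither is a real DPDK name): resolve the
symbol once with dlopen()/dlsym() and call through the cached pointer, so
the resolution cost and the PLT trampoline are paid once instead of per
call.

/* Sketch only: cache a function pointer resolved via dlsym() so the
 * hot path uses a plain indirect call instead of a PLT stub.
 * "libpmd.so" and "pmd_rx_burst" are hypothetical names.
 * Build: gcc -O2 plt_cache.c -ldl */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

typedef unsigned (*rx_burst_fn)(void *queue, void **pkts, unsigned max_pkts);

static rx_burst_fn rx_burst;  /* resolved once at startup */

static void resolve_rx_burst(void)
{
    /* RTLD_NOW binds every symbol up front instead of lazily. */
    void *handle = dlopen("libpmd.so", RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        exit(EXIT_FAILURE);
    }
    /* POSIX idiom for converting the void * returned by dlsym()
     * into a function pointer. */
    *(void **)&rx_burst = dlsym(handle, "pmd_rx_burst");
    if (rx_burst == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    resolve_rx_burst();
    /* the polling loop would now call rx_burst(...) through a plain
     * indirect call, with no PLT stub in between */
    return 0;
}

Linking with -Wl,-z,now (or building with -fno-plt on newer gcc) removes
the lazy-resolution step without source changes, though cross-library
calls still go through one indirection.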
> > I get lost: do you mean ABI + API toward the PMDs or towards the
> > applications using the librte ?
>
> Towards the PMDs is more straightforward at first so it seems logical to
> focus on that first.

I don't think it is so straightforward. Many recent cards, such as Chelsio
and Myricom ones, have a very different "packet memory layout" that does
not fit so easily into the actual DPDK architecture.

1) "traditional" architecture: the driver reserves X buffers and provides
the card with descriptors of those buffers. Each packet is DMA'ed into
exactly one buffer. Typically you have 2K buffers; a 64-byte packet
consumes exactly one buffer.

2) "alternative" new architecture: the driver reserves a memory zone, say
4MB, without any structure, and provides a single zone description and a
ring buffer to the card (there are no individual buffer descriptors any
more). The card fills the memory zone with packets, one next to the other,
and specifies where the packets are by updating the supplied ring. Out of
the many issues fitting this scheme into DPDK, you cannot free a single
mbuf: you have to maintain a reference count on the memory zone so that,
when all mbufs have been "released", the memory zone can be freed. That's
quite a stretch from the current paradigm (see the sketch after my
signature).

Apart from this aspect, RSS management is too tied to Intel's flow
director concepts and cannot directly accommodate smarter or dumber RSS
mechanisms.

That said, I fully agree the PMD API should be revisited.

Cordially,

François-Frédéric
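PS: a minimal sketch of the zone reference counting described above, using
hypothetical types (struct pkt_zone, struct mbuf) rather than the real
rte_mbuf layout: each mbuf carved out of the zone keeps a back-pointer to
it, releasing an mbuf decrements the zone's counter, and the zone is only
reclaimed once the last mbuf is gone.

/* Sketch only: hypothetical types illustrating per-zone refcounting.
 * Real DPDK mbufs and mempools work differently. Uses C11 atomics. */
#include <stdatomic.h>
#include <stdlib.h>

struct pkt_zone {
    void *base;             /* start of the unstructured memory zone */
    size_t len;             /* e.g. 4MB, filled packet after packet */
    atomic_uint refcnt;     /* one reference per outstanding mbuf */
};

struct mbuf {
    struct pkt_zone *zone;  /* back-pointer to the owning zone */
    void *data;             /* packet bytes live inside zone->base */
    unsigned data_len;
};

static struct pkt_zone *zone_create(size_t len)
{
    struct pkt_zone *z = malloc(sizeof(*z));
    if (z == NULL)
        return NULL;
    z->base = malloc(len);  /* a real driver would use DMA-able memory */
    if (z->base == NULL) {
        free(z);
        return NULL;
    }
    z->len = len;
    atomic_init(&z->refcnt, 0);
    return z;
}

/* RX path: one mbuf per packet the card wrote into the zone. */
static void mbuf_attach(struct mbuf *m, struct pkt_zone *z,
                        void *data, unsigned len)
{
    atomic_fetch_add(&z->refcnt, 1);
    m->zone = z;
    m->data = data;
    m->data_len = len;
}

/* Releasing one mbuf cannot free its bytes; only the last release
 * lets the whole zone be reclaimed (or re-posted to the card). */
static void mbuf_release(struct mbuf *m)
{
    struct pkt_zone *z = m->zone;
    if (atomic_fetch_sub(&z->refcnt, 1) == 1) {
        free(z->base);
        free(z);
    }
}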