From: Andrew Rybchenko
Organization: OKTET Labs
Date: Mon, 11 Oct 2021 14:49:32 +0300
To: Xueming Li, dev@dpdk.org
Cc: Jerin Jacob, Ferruh Yigit, Viacheslav Ovsiienko, Thomas Monjalon,
 Lior Margalit, Ananyev Konstantin
Subject: Re: [dpdk-dev] [PATCH v4 0/6] ethdev: introduce shared Rx queue
Message-ID: <8494d5f3-f134-e9d7-d782-dca9a9efaa03@oktetlabs.ru>
In-Reply-To: <20210930145602.763969-1-xuemingl@nvidia.com>
References: <20210727034204.20649-1-xuemingl@nvidia.com>
 <20210930145602.763969-1-xuemingl@nvidia.com>

Hi Xueming,

On 9/30/21 5:55 PM, Xueming Li wrote:
> In the current DPDK framework, all Rx queues are pre-loaded with
> mbufs for incoming packets. When the number of representors in a
> switch domain scales out, the memory consumption becomes significant.
> Furthermore, polling all ports leads to high cache miss rates, high
> latency and low throughput.
>
> This patch introduces the shared Rx queue. A PF and representors with
> the same configuration in the same switch domain can share an Rx
> queue set by specifying the shared Rx queue offload flag and a share
> group.
>
> All ports in a share group actually share one Rx queue, and mbufs are
> pre-loaded only to that single queue, so memory is saved.
>
> Polling any queue that uses the same shared Rx queue receives packets
> from all member ports. The source port is identified by mbuf->port.
>
> Multiple groups are supported via a group ID. The number of port
> queues in a share group should be identical, and queue indexes are
> mapped 1:1 within the group. An example of polling two share groups:
>
>   core  group  queue
>      0      0      0
>      1      0      1
>      2      0      2
>      3      0      3
>      4      1      0
>      5      1      1
>      6      1      2
>      7      1      3
>
> A shared Rx queue must be polled on a single thread or core. If both
> PF0 and representor0 joined the same share group, pf0rxq0 cannot be
> polled on core1 while rep0rxq0 is polled on core2. Actually, polling
> one port within a share group is sufficient, since polling any port
> in the group returns packets for all ports in the group.

I apologize for jumping into the review process this late.
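
Before commenting, let me restate my understanding of the intended
usage model as a small sketch. The loop below is just how I read the
cover letter: it uses only the existing rte_eth_rx_burst() and
mbuf->port, and the per-port counters stand in for real application
processing. Please correct me if I got the model wrong:

#include <rte_common.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static void
poll_shared_rxq(uint16_t any_member_port, uint16_t queue_id,
		uint64_t pkts_per_port[RTE_MAX_ETHPORTS])
{
	struct rte_mbuf *pkts[32];
	uint16_t nb, i;

	/* One burst returns packets for every port in the share group. */
	nb = rte_eth_rx_burst(any_member_port, queue_id, pkts,
			      RTE_DIM(pkts));
	for (i = 0; i < nb; i++) {
		/*
		 * The real input port must be taken from the mbuf,
		 * not from the port_id used for polling.
		 */
		pkts_per_port[pkts[i]->port]++;
		rte_pktmbuf_free(pkts[i]);
	}
}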
Frankly speaking, I doubt that this is the best design to solve the
problem. Yes, I confirm that the problem exists, but I think there is
a better and simpler way to solve it.

The problem with the suggested solution is that it puts all the
headache about consistency on applications and PMDs, without any help
from the ethdev layer to guarantee that consistency. As a result, I
believe we will end up with either missing/lost consistency checks or
huge duplication in every PMD which supports the feature. Shared RxQs
must be configured identically: the same number of queues, the same
offloads (taking device-level Rx offloads into account), the same RSS
settings, etc. So applications must take care of it, and PMDs (or the
ethdev layer) must check it.

The advantage of the suggested solution is that any device may create
a group and subsequent devices simply join it; the absence of a
primary device is nice. But do we really need that? Will the design
work if some representors are configured to use a shared RxQ while
others are not? Theoretically it is possible, but it could require
extra non-trivial code on the fast path.

Also, looking at the first two patches, I don't understand how an
application will find out which devices may share RxQs. E.g. if we
have two different NICs which both support sharing, we can try to set
up just one group 0, but we will still end up with two devices (not
one) which must be polled.

Here is what I would suggest instead:

1. We need an extra flag in dev_info->dev_capa, say
   RTE_ETH_DEV_CAPA_RX_SHARE, to advertise that the device supports
   Rx sharing.

2. I think we need an "rx_domain" in device info (which should be
   treated within the boundaries of the switch_domain) if and only if
   RTE_ETH_DEV_CAPA_RX_SHARE is advertised. Otherwise the rx_domain
   value does not make sense.

(1) and (2) will allow an application to find out which devices can
share Rx.

3. The primary device (the representors' backing device) should
   advertise the shared RxQ offload. Enabling the offload tells the
   device to provide packets to all devices in the Rx domain with
   mbuf->port filled in appropriately. It also allows the application
   to identify the primary device in the Rx domain. When the
   application enables the offload, it must ensure that it does not
   treat the port_id used for polling as the input port, but always
   checks mbuf->port for each packet.

4. A new Rx mode should be introduced for secondary devices. It should
   not allow configuring RSS, specifying any Rx offloads, etc., and
   ethdev must enforce that. It is an open question right now whether
   it should require the primary port_id to be provided. In theory
   representors have it; however, it may still be nice for consistency
   to require it, to ensure that the application is aware of it. If
   the shared Rx mode is specified for a device, the application does
   not need to set up its RxQs, and attempts to do so should be
   rejected by ethdev. For consistency it is better to ensure that the
   numbers of queues match. It is an interesting question what should
   happen if the primary device is reconfigured and shared Rx is
   disabled on reconfiguration.

5. If so, in theory the implementation of the Rx burst in a secondary
   device could simply call Rx burst on the primary device.

Andrew.
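
P.S. To make (5) a bit more concrete, below is a rough, purely
illustrative sketch (not code from the series) of what a secondary
device's Rx burst could look like; struct shared_rxq and its fields
are hypothetical PMD-internal state:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Hypothetical per-queue state kept by a secondary device. */
struct shared_rxq {
	uint16_t primary_port_id;	/* representors' backing device */
	uint16_t queue_id;		/* 1:1 mapped queue index */
};

/* Matches the eth_rx_burst_t prototype used by PMDs. */
static uint16_t
secondary_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts,
		   uint16_t nb_pkts)
{
	struct shared_rxq *rxq = rx_queue;

	/*
	 * Simply delegate to the primary device. Packets for all
	 * devices in the Rx domain come back with mbuf->port
	 * identifying the real input port.
	 */
	return rte_eth_rx_burst(rxq->primary_port_id, rxq->queue_id,
				rx_pkts, nb_pkts);
}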