Date: Mon, 23 May 2022 15:54:37 -0700
From: Stephen Hemminger
To: Spike Du
Cc: Matan Azrad, Slava Ovsiienko, Ori Kam, "NBU-Contact-Thomas Monjalon (EXTERNAL)", "dev@dpdk.org", Raslan Darawsheh
Subject: Re: [RFC v2 3/7] ethdev: introduce Rx queue based limit watermark
Message-ID: <20220523155437.764bea10@hermes.local>
References: <20220506035645.4101714-1-spiked@nvidia.com> <20220522055900.417282-1-spiked@nvidia.com> <20220522055900.417282-4-spiked@nvidia.com> <20220522082321.3cdb7693@hermes.local>
List-Id: DPDK patches and discussions

On Mon, 23 May 2022 03:01:20 +0000
Spike Du wrote:

> Hi, pls see below.
>
> > -----Original Message-----
> > From: Stephen Hemminger
> > Sent: Sunday, May 22, 2022 11:23 PM
> > To: Spike Du
> > Cc: Matan Azrad; Slava Ovsiienko; Ori Kam; NBU-Contact-Thomas Monjalon
> > (EXTERNAL); dev@dpdk.org; Raslan Darawsheh
> > Subject: Re: [RFC v2 3/7] ethdev: introduce Rx queue based limit watermark
> >
> > On Sun, 22 May 2022 08:58:56 +0300
> > Spike Du wrote:
> >
> > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > index 04cff8ee10..687ae5ff29 100644
> > > --- a/lib/ethdev/rte_ethdev.h
> > > +++ b/lib/ethdev/rte_ethdev.h
> > > @@ -1249,7 +1249,16 @@ struct rte_eth_rxconf {
> > >  	 */
> > >  	union rte_eth_rxseg *rx_seg;
> > >
> > > -	uint64_t reserved_64s[2]; /**< Reserved for future fields */
> > > +	/**
> > > +	 * Per-queue Rx limit watermark defined as percentage of Rx queue
> > > +	 * size. If Rx queue receives traffic higher than this percentage,
> > > +	 * the event RTE_ETH_EVENT_RX_LWM is triggered.
> > > +	 */
> > > +	uint8_t lwm;
> > > +
> > > +	uint8_t reserved_bits[3];
> > > +	uint32_t reserved_32s;
> > > +	uint64_t reserved_64s;
> > >  	void *reserved_ptrs[2]; /**< Reserved for future fields */
> > >  };
> > >
> >
> > Ok, but this is an ABI risk because reserved stuff was never
> > required before.
> > Whenever a reserved field is introduced the code (in this case
> > rte_ethdev_configure).
> >
> > Best practice would have been to have the code require all reserved fields be
> > 0 in earlier releases. In this case an application is likely to define a
> > watermark of zero; how will your code handle it?
> Having a watermark of 0 is desired, which is the default. LWM of 0 means the Rx
> queue's watermark is not monitored, hence no LWM event is generated.
>
> >
> > Also, using 8 bits as percentage is different than how other APIs handle this.
> > Since Rx queue size is in packets, why is this not in packets?
> The short answer is to simplify the LWM configuration.
> Rx queue descriptors are complex nowadays.
> For a normal queue, the user may easily configure LWM according to the queue
> descriptor number.
> But for the queues below, it's not easy:
>   Take mprq as an example; the testpmd cmd options can be
>   " -a 0000:03:00.0,rxqs_min_mprq=1,mprq_en=1,mprq_max_memcpy_len=465,mprq_log_stride_size=8,mprq_log_stride_num=3
>   -- --mbcache=512 -i --nb-cores=7 --txd=1024 --rxd=1024 ",
>   For the MLX5 implementation, the minimum "unit" in the queue has 64
>   descriptors, so the "unit" number is 16 if you configure according to the
>   descriptor number (1024).
>   Here, you may easily set LWM to something like 512, but HW doesn't allow
>   it, because 512 > 16. If you want the watermark to be half, the correct
>   value is 8.
>   The same issue happens with features like "Rx queue buffer split", where a
>   packet can be split across multiple descriptors.
> Using a percentage doesn't have such issues; the PMD will cover all the details.
>
> > Also document what behavior of 0 is.
> Sure. The behavior is like the old days without this feature, pls see above.
>
> > Why introduce new query/set operations? This should just be part of the
> > overall device configuration.
> Due to different implementations. LWM can be a dynamic configuration which can
> help the user design flexible flow control.
> The user may feel ok with an LWM of 80% to get high throughput, or later on
> with 50% to throttle the traffic responsively by handling the LWM event in
> order to reduce drops.
> Some drivers like mlx5 may implement the LWM event as one-shot. When you
> receive an LWM event, you need to reconfigure LWM in order to receive the
> event again, thus you are not likely to be overwhelmed by the events.
> These all require a set operation.
>
> For the query operation: the rte_event API rte_eth_dev_callback_process() is a
> per-port API; it doesn't carry much information when an event happens.
> When an LWM event happens, we need to know in which Rx queue it happens or,
> optionally, what the current LWM percentage of this queue is.
> The query operation serves this purpose.
>
>
> Regards,
> Spike.
>
>

The bigger question is why does this have to be just MLX5, and why can't it
fit into the existing DPDK Rx interrupt framework?

Linux and BSD have had this for years in their packet coalescing logic.
Ethtool provides the ability to set a lot of IRQ coalescing parameters, like:

    ethtool -C|--coalesce devname [adaptive-rx on|off] [adaptive-tx on|off]
        [rx-usecs N] [rx-frames N] [rx-usecs-irq N] [rx-frames-irq N]
        [tx-usecs N] [tx-frames N] [tx-usecs-irq N] [tx-frames-irq N]
        [stats-block-usecs N] [pkt-rate-low N] [rx-usecs-low N]
        [rx-frames-low N] [tx-usecs-low N] [tx-frames-low N]
        [pkt-rate-high N] [rx-usecs-high N] [rx-frames-high N]
        [tx-usecs-high N] [tx-frames-high N] [sample-interval N]
        [cqe-mode-rx on|off] [cqe-mode-tx on|off]

It feels like this is just the DPDK version of a small subset of that.
Since many devices already support IRQ coalescing, it would be best to build
one new API that has most of these, rather than an MLX/Nvidia-only API for a
single parameter.
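
For reference, the percentage-to-unit arithmetic from the quoted MPRQ example
(1024 descriptors, 64 descriptors per "unit", 16 units, half = 8) can be
sketched roughly as below. This is only an illustration, not mlx5 driver code:
the helper name lwm_percent_to_units() and its parameters are hypothetical,
and rounding a small non-zero percentage up to one unit is an assumption.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of DPDK or the mlx5 PMD): convert a
 * per-queue LWM percentage into a count of hardware queue "units".
 *
 * lwm_percent   - watermark as a percentage of the queue size; 0 means
 *                 the watermark is not monitored
 * nb_desc       - number of Rx descriptors requested (e.g. --rxd=1024)
 * desc_per_unit - descriptors per hardware unit (64 in the quoted example)
 */
static uint32_t
lwm_percent_to_units(uint8_t lwm_percent, uint32_t nb_desc,
		     uint32_t desc_per_unit)
{
	uint32_t nb_units, units;

	/* 0 keeps the old behavior: no watermark, no event. */
	if (lwm_percent == 0 || desc_per_unit == 0)
		return 0;
	/* Clamp out-of-range percentages to the full queue. */
	if (lwm_percent > 100)
		lwm_percent = 100;

	nb_units = nb_desc / desc_per_unit;
	units = (uint32_t)(((uint64_t)nb_units * lwm_percent) / 100);

	/* Assumption: a non-zero request never rounds down to "disabled". */
	if (units == 0 && nb_units > 0)
		units = 1;
	return units;
}
```

With the numbers quoted above, lwm_percent_to_units(50, 1024, 64) gives 8,
while a plain descriptor queue (desc_per_unit = 1) at 50% gives 512.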