From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.droids-corp.org (zoll.droids-corp.org [94.23.50.67]) by dpdk.org (Postfix) with ESMTP id 156F468EE for ; Tue, 25 Oct 2016 16:41:39 +0200 (CEST) Received: from lfbn-1-5996-232.w90-110.abo.wanadoo.fr ([90.110.195.232] helo=[192.168.1.13]) by mail.droids-corp.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.84_2) (envelope-from ) id 1bz2yI-00014k-Ee; Tue, 25 Oct 2016 16:44:58 +0200 To: Tomasz Kulasek , dev@dpdk.org References: <1477317933-14144-1-git-send-email-tomaszx.kulasek@intel.com> <1477327917-18564-1-git-send-email-tomaszx.kulasek@intel.com> <1477327917-18564-2-git-send-email-tomaszx.kulasek@intel.com> Cc: konstantin.ananyev@intel.com From: Olivier Matz Message-ID: Date: Tue, 25 Oct 2016 16:41:33 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.2.0 MIME-Version: 1.0 In-Reply-To: <1477327917-18564-2-git-send-email-tomaszx.kulasek@intel.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v10 1/6] ethdev: add Tx preparation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Oct 2016 14:41:39 -0000 Hi Tomasz, On 10/24/2016 06:51 PM, Tomasz Kulasek wrote: > Added API for `rte_eth_tx_prep` > > [...] > --- a/lib/librte_ether/rte_ethdev.h > +++ b/lib/librte_ether/rte_ethdev.h > @@ -182,6 +182,7 @@ extern "C" { > #include > #include > #include > +#include > #include "rte_ether.h" > #include "rte_eth_ctrl.h" > #include "rte_dev_info.h" > @@ -699,6 +700,8 @@ struct rte_eth_desc_lim { > uint16_t nb_max; /**< Max allowed number of descriptors. */ > uint16_t nb_min; /**< Min allowed number of descriptors. */ > uint16_t nb_align; /**< Number of descriptors should be aligned to. */ > + uint16_t nb_seg_max; /**< Max number of segments per whole packet. */ > + uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */ Sorry if it was not clear in my previous review, but I think this should be better explained here. You said that the "limitation of number of segments may differ depend of TSO/non TSO". As an application developer, I still have some difficulties to clearly understand what does that mean. Is it the maximum number of mbuf-segments that contain payload for one tcp-segment sent by the device? In that case, it looks quite difficult to verify that in an application. It looks that this field is not used by validate_offload(), so how should it be used by an application? > }; > > /** > @@ -1188,6 +1191,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq, > uint16_t nb_pkts); > /**< @internal Send output packets on a transmit queue of an Ethernet device. */ > > +typedef uint16_t (*eth_tx_prep_t)(void *txq, > + struct rte_mbuf **tx_pkts, > + uint16_t nb_pkts); > +/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */ > + > typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev, > struct rte_eth_fc_conf *fc_conf); > /**< @internal Get current flow control parameter on an Ethernet device */ > @@ -1622,6 +1630,7 @@ struct rte_eth_rxtx_callback { > struct rte_eth_dev { > eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */ > eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */ > + eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */ > struct rte_eth_dev_data *data; /**< Pointer to device data */ > const struct eth_driver *driver;/**< Driver for this device */ > const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */ > @@ -2816,6 +2825,93 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id, > return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts); > } > > +/** > + * Process a burst of output packets on a transmit queue of an Ethernet device. > + * > + * The rte_eth_tx_prep() function is invoked to prepare output packets to be > + * transmitted on the output queue *queue_id* of the Ethernet device designated > + * by its *port_id*. > + * The *nb_pkts* parameter is the number of packets to be prepared which are > + * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them > + * allocated from a pool created with rte_pktmbuf_pool_create(). > + * For each packet to send, the rte_eth_tx_prep() function performs > + * the following operations: > + * > + * - Check if packet meets devices requirements for tx offloads. > + * > + * - Check limitations about number of segments. > + * > + * - Check additional requirements when debug is enabled. > + * > + * - Update and/or reset required checksums when tx offload is set for packet. > + * > + * The rte_eth_tx_prep() function returns the number of packets ready to be > + * sent. A return value equal to *nb_pkts* means that all packets are valid and > + * ready to be sent. > + * > + * @param port_id > + * The port identifier of the Ethernet device. > + * The value must be a valid port id. > + * @param queue_id > + * The index of the transmit queue through which output packets must be > + * sent. > + * The value must be in the range [0, nb_tx_queue - 1] previously supplied > + * to rte_eth_dev_configure(). > + * @param tx_pkts > + * The address of an array of *nb_pkts* pointers to *rte_mbuf* structures > + * which contain the output packets. > + * @param nb_pkts > + * The maximum number of packets to process. > + * @return > + * The number of packets correct and ready to be sent. The return value can be > + * less than the value of the *tx_pkts* parameter when some packet doesn't > + * meet devices requirements with rte_errno set appropriately. > + */ Inserting here the previous comment: >> Can we add the constraint that invalid packets are left untouched? >> >> I think most of the time there will be a software fallback in that case, >> so it would be good to ensure that this function does not change the flags >> or the packet data. > > In current implementation, if packet is invalid, its data is never modified. Only checks are done. The only exception is when checksum needs to be updated or initialized, but it's done after packet validation. > If we want to use/restore packet in application it didn't should be changed in any way for invalid packets. > I think this should be explicitly said in the API comment that valid packets may be modified (*), but invalid packets (whose index is >= than the return value) are left untouched. (*) we still need to discuss that point, see below. Another comment was made: >> Another thing that could be interesting for the caller is to know the >> reason of the failure. Maybe the different errno types could be detailed >> here. For instance: >> - EINVAL: offload flags are not correctly set (i.e. would fail whatever >> the hardware) >> - ENOTSUP: the offload feature is not supported by the hardware >> - ... >> > > Ok. Don't you feel it could go in the API comment too? > [...] > > +/** > + * Fix pseudo header checksum > + * > + * This function fixes pseudo header checksum for TSO and non-TSO tcp/udp in > + * provided mbufs packet data. > + * > + * - for non-TSO tcp/udp packets full pseudo-header checksum is counted and set > + * in packet data, > + * - for TSO the IP payload length is not included in pseudo header. > + * > + * This function expects that used headers are in the first data segment of > + * mbuf, and are not fragmented. There is another requirement about the cloning and reference count. Sorry I did not answer to Konstantin, but I still think this could be an issue. For instance, I think that the zero-copy mode of vhost application references the packet sent by the guest and send the data. The payload should not be modified because it is in guest memory (we don't know, maybe the guest also cloned it for its own purpose). It means that the tx_prep() API must not be used with clones, i.e the headers must not reside in a segment whose RTE_MBUF_INDIRECT(seg) or rte_mbuf_refcnt_read(seg) > 1. - if we really want this API in 16.11, it should be clearly explained in the API comment that it does not work with shared segments - for next versions, we have to take a decision whether it should be supported or not. In my opinion, cloned packets are useful and should be supported properly by all dpdk APIs. Thanks, Olivier