From: Ferruh Yigit <ferruh.yigit@intel.com>
To: Jakub Grajciar <jgrajcia@cisco.com>, dev@dpdk.org
Subject: Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
Date: Mon, 25 Mar 2019 20:58:36 +0000
Message-ID: <0762da59-4a97-474f-7d67-e3bd8daf50f2@intel.com>
In-Reply-To: <20190322115727.4358-1-jgrajcia@cisco.com>
References: <20190220115254.18724-1-jgrajcia@cisco.com> <20190322115727.4358-1-jgrajcia@cisco.com>
List-Id: DPDK patches and discussions
On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
> Memory interface (memif), provides high performance
> packet transfer over shared memory.
>
> Signed-off-by: Jakub Grajciar

<...>

> @@ -0,0 +1,200 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> +   Copyright(c) 2018-2019 Cisco Systems, Inc.
> +
> +======================
> +Memif Poll Mode Driver
> +======================
> +Shared memory packet interface (memif) PMD allows for DPDK and any other client
> +using memif (DPDK, VPP, libmemif) to communicate using shared memory. Memif is
> +Linux only.
> +
> +The created device transmits packets in a raw format. It can be used with
> +Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
> +supported in DPDK memif implementation.
> +
> +Memif works in two roles: master and slave. Slave connects to master over an
> +existing socket. It is also a producer of shared memory file and initializes
> +the shared memory. Master creates the socket and listens for any slave
> +connection requests. The socket may already exist on the system. Be sure to
> +remove any such sockets, if you are creating a master interface, or you will
> +see an "Address already in use" error. Function ``rte_pmd_memif_remove()``,

Could the existing socket be removed on successful termination of the DPDK
application with the 'master' role? Otherwise every run of a DPDK app first
requires deleting the socket file.

> +which removes memif interface, will also remove a listener socket, if it is
> +not being used by any other interface.
> +
> +The method to enable one or more interfaces is to use the
> +``--vdev=net_memif0`` option on the DPDK application command line. Each
> +``--vdev=net_memif1`` option given will create an interface named net_memif0,
> +net_memif1, and so on. Memif uses unix domain socket to transmit control
> +messages. Each memif has a unique id per socket. If you are connecting multiple
> +interfaces using same socket, be sure to specify unique ids ``id=0``, ``id=1``,
> +etc. Note that if you assign a socket to a master interface it becomes a
> +listener socket. Listener socket can not be used by a slave interface on same
> +client.

When I connect two slaves (id=0 and id=1) to the master, both the master and
the second slave crash as soon as the second slave connects. Is this a known
issue? Can you please check? The exact commands I used are a few lines below.
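(Going back to the stale-socket point above, what I have in mind is roughly the
sketch below. This is my own illustration, not code from the patch; the helper
and parameter names are made up.)

  #include <unistd.h>

  /* Reviewer sketch: when the last 'master' interface owning the listener
   * goes away (e.g. via rte_pmd_memif_remove()), unlink the socket file so
   * the next run does not fail with "Address already in use". */
  static void
  memif_listener_cleanup(const char *listener_sock_path)
  {
          if (listener_sock_path != NULL)
                  unlink(listener_sock_path);
  }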
Commands used for the two-slave test:

master: ./build/app/testpmd -w0:0.0 -l20,21 --vdev net_memif0,role=master -- -i
slave1: ./build/app/testpmd -w0:0.0 -l20,22 --file-prefix=slave --vdev net_memif0,role=slave,id=0 -- -i
slave2: ./build/app/testpmd -w0:0.0 -l20,23 --file-prefix=slave2 --vdev net_memif0,role=slave,id=1 -- -i

<...>

> +Example: testpmd and testpmd
> +----------------------------
> +In this example we run two instances of testpmd application and transmit packets over memif.

How does this play with multi-process support? When I run a secondary process
while the memif PMD is enabled in the primary, the secondary process crashes.
Can you please check and ensure at least nothing crashes when a secondary
process runs?

> +
> +First create ``master`` interface::
> +
> +    #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
> +
> +Now create ``slave`` interface (master must be already running so the slave will connect)::
> +
> +    #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
> +
> +Set forwarding mode in one of the instances to 'rx only' and the other to 'tx only'::
> +
> +    testpmd> set fwd rxonly
> +    testpmd> start
> +
> +    testpmd> set fwd txonly
> +    testpmd> start

Would it be useful to add a loopback option to the PMD, for testing/debug?

Also I am getting low performance numbers with the setup above, compared to the
ring PMD for example; is there any performance target for the PMD?
With the same forwarding core for both testpmds the result is terrible, ~40Kpps;
either something is wrong or I am doing something wrong.
With different forwarding cores for each testpmd it is still low, ~18Mpps.

<...>

> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -DALLOW_EXPERIMENTAL_API

Can you please add here, as a comment, which experimental APIs are called? This
helps to keep track of them in the long term.

<...>

> +/**
> + * Buffer descriptor.
> + */
> +typedef struct __rte_packed {
> +	uint16_t flags;				/**< flags */
> +#define MEMIF_DESC_FLAG_NEXT 1			/**< is chained buffer */

Is this define used?

<...>

> +
> +static struct rte_vdev_driver pmd_memif_drv;

Same comment as in the previous review: is this forward declaration required?

<...>

> +static memif_ring_t *
> +memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
> +{
> +	/* rings only in region 0 */
> +	void *p = pmd->regions[0].addr;
> +	int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
> +	    (1 << pmd->run.log2_ring_size);
> +
> +	p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;

According to the code above, I guess the layout is as in [1]; can you please
correct it if it is wrong? Can you please also document this somewhere, possibly
in the comment of the function that allocates it, so that everyone doesn't need
to figure it out?

[1]

region 0:
+-----------------------+
| S2M rings | M2S rings |
+-----------------------+

S2M or M2S rings:
+-----------------------------------------------+
| ring 0 | ring 1 | ... | ring num_s2m_rings - 1|
+-----------------------------------------------+

ring 0:
+-----------------------------------------------------+
| Ring Header | (1 << pmd->run.log2_ring_size) * desc |
+-----------------------------------------------------+

region 1:
+---------------------------------------------------------------------------+
| packet buffer 0 | ... | pb ((1 << pmd->run.log2_ring_size)*(s2m + m2s))-1 |
+---------------------------------------------------------------------------+

Is there any ordering of the packet buffers?
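To make sure we are reading it the same way, below is the address arithmetic I
think the layout in [1] implies. This is my own sketch; the function and
parameter names are mine, not the patch's.

  #include <stddef.h>
  #include <stdint.h>

  /* Reviewer sketch of the assumed layout, not code from the patch. */
  static inline void *
  assumed_ring_addr(void *region0, int type, uint16_t ring_num,
                    uint16_t num_s2m_rings, uint16_t log2_ring_size,
                    size_t ring_hdr_size, size_t desc_size)
  {
          /* S2M rings first, then M2S rings, back to back in region 0. */
          size_t ring_size = ring_hdr_size +
                             desc_size * ((size_t)1 << log2_ring_size);

          return (uint8_t *)region0 +
                 (ring_num + type * num_s2m_rings) * ring_size;
  }

  static inline void *
  assumed_pkt_buf_addr(void *region1, uint32_t global_slot,
                       uint16_t buffer_size)
  {
          /* Assumes buffers are packed back to back in slot order. */
          return (uint8_t *)region1 + (size_t)global_slot * buffer_size;
  }

If the buffer ordering is different (for example per ring rather than one flat
slot index), it would be good to spell that out in the same comment.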
<...>

> +	while (n_slots && n_rx_pkts < nb_pkts) {
> +		mbuf_head = rte_pktmbuf_alloc(mq->mempool);
> +		if (unlikely(mbuf_head == NULL))
> +			goto no_free_bufs;
> +		mbuf = mbuf_head;
> +		mbuf->port = mq->in_port;
> +
> +next_slot:
> +		s0 = cur_slot & mask;
> +		d0 = &ring->desc[s0];
> +
> +		src_len = d0->length;
> +		dst_off = 0;
> +		src_off = 0;
> +
> +		do {
> +			dst_len = mbuf_size - dst_off;
> +			if (dst_len == 0) {
> +				dst_off = 0;
> +				dst_len = mbuf_size + RTE_PKTMBUF_HEADROOM;
> +
> +				mbuf = rte_pktmbuf_alloc(mq->mempool);
> +				if (unlikely(mbuf == NULL))
> +					goto no_free_bufs;
> +				mbuf->port = mq->in_port;
> +				rte_pktmbuf_chain(mbuf_head, mbuf);
> +			}
> +			cp_len = RTE_MIN(dst_len, src_len);
> +
> +			rte_pktmbuf_pkt_len(mbuf) =
> +			    rte_pktmbuf_data_len(mbuf) += cp_len;

Can you please split this into two statements to prevent confusion?

Also, shouldn't 'cp_len' be added to 'mbuf_head->pkt_len' as well?
'rte_pktmbuf_chain()' updates 'mbuf_head', but at that stage 'mbuf->pkt_len' is
not yet set to 'cp_len'...

<...>

> +void
> +memif_free_regions(struct pmd_internals *pmd)
> +{
> +	int i;
> +	struct memif_region *r;
> +
> +	for (i = 0; i < pmd->regions_num; i++) {
> +		r = pmd->regions + i;
> +		if (r == NULL)
> +			return;

'r' can be NULL only when 'pmd->regions' is NULL and 'i == 0', so wouldn't it be
better to check whether 'pmd->regions' is NULL once, before the loop?

> +		if (r->addr == NULL)
> +			return;
> +		munmap(r->addr, r->region_size);
> +		if (r->fd > 0) {
> +			close(r->fd);
> +			r->fd = -1;
> +		}
> +	}
> +	rte_free(pmd->regions);
> +}
> +
> +static int
> +memif_alloc_regions(struct pmd_internals *pmd, uint8_t brn)
> +{
> +	struct memif_region *r;
> +	char shm_name[32];
> +	int i;
> +	int ret = 0;
> +
> +	r = rte_zmalloc("memif_region", sizeof(struct memif_region) * (brn + 1), 0);
> +	if (r == NULL) {
> +		MIF_LOG(ERR, "%s: Failed to allocate regions.",
> +			rte_vdev_device_name(pmd->vdev));
> +		return -ENOMEM;
> +	}
> +
> +	pmd->regions = r;
> +	pmd->regions_num = brn + 1;
> +
> +	/*
> +	 * Create shm for every region. Region 0 is reserved for descriptors.
> +	 * Other regions contain buffers.
> +	 */
> +	for (i = 0; i < (brn + 1); i++) {
> +		r = &pmd->regions[i];
> +
> +		r->buffer_offset = (i == 0) ? (pmd->run.num_s2m_rings +
> +					       pmd->run.num_m2s_rings) *
> +		    (sizeof(memif_ring_t) +
> +		     sizeof(memif_desc_t) * (1 << pmd->run.log2_ring_size)) : 0;

For complex expressions like this, can you please prefer a regular 'if' check
over the ternary operator? With this formatting it is hard to follow.

What exactly is "buffer_offset"? For 'region 0' it evaluates to the size of
'region 0', otherwise 0. An offset starting from where? And it seems to be used
only for the size assignment below.

> +		r->region_size = (i == 0) ? r->buffer_offset :
> +		    (uint32_t)(pmd->run.buffer_size *

I guess 'buffer_size' is the buffer size per packet; to make this clear, what
do you think about renaming it to 'packet_buffer_size'?

> +		     (1 << pmd->run.log2_ring_size) *
> +		     (pmd->run.num_s2m_rings +
> +		      pmd->run.num_m2s_rings));

The comment above gives the impression that packet buffers can live in multiple
regions, but this logic assumes all packet buffers are in a single region other
than region 0, which gives us 'region 1', and that is already hardcoded in a
few places. Is there a benefit in assuming there may be more regions? Would it
be simpler to state that 'region 0' is for rings and 'region 1' is for packet
buffers?

> +
> +		memset(shm_name, 0, sizeof(char) * 32);
> +		sprintf(shm_name, "memif region %d", i);

snprintf() please, and can you please add a define for the shm_name size?
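Something along these lines is what I mean; the define name below is only a
placeholder, pick whatever fits the driver's naming.

  #include <stdio.h>

  #define MEMIF_SHM_NAME_LEN 32	/* placeholder name for the define */

  static void
  format_shm_name(char shm_name[MEMIF_SHM_NAME_LEN], int idx)
  {
          /* snprintf() bounds the write and always NUL-terminates, which
           * also makes the preceding memset() unnecessary. */
          snprintf(shm_name, MEMIF_SHM_NAME_LEN, "memif region %d", idx);
  }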
<...>

> +static void
> +memif_init_rings(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internals *pmd = dev->data->dev_private;
> +	memif_ring_t *ring;
> +	int i, j;
> +	uint16_t slot;
> +
> +	for (i = 0; i < pmd->run.num_s2m_rings; i++) {
> +		ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
> +		ring->head = 0;
> +		ring->tail = 0;
> +		ring->cookie = MEMIF_COOKIE;
> +		ring->flags = 0;
> +		for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
> +			slot = i * (1 << pmd->run.log2_ring_size) + j;
> +			ring->desc[j].region = 1;

Why is 'region' 1 hardcoded for the buffers? Would it be possible to use
multiple regions for buffers?

<...>

> +int
> +memif_connect(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internals *pmd = dev->data->dev_private;
> +	struct memif_region *mr;
> +	struct memif_queue *mq;
> +	int i;
> +
> +	for (i = 0; i < pmd->regions_num; i++) {
> +		mr = pmd->regions + i;
> +		if (mr != NULL) {
> +			if (mr->addr == NULL) {
> +				if (mr->fd < 0)
> +					return -1;
> +				mr->addr = mmap(NULL, mr->region_size,
> +						PROT_READ | PROT_WRITE,
> +						MAP_SHARED, mr->fd, 0);
> +				if (mr->addr == NULL)
> +					return -1;
> +			}
> +		}
> +	}

For one master to work with multiple slaves, there should be an array of
regions, right? One per slave. The same concern applies to the Rx/Tx queues:
master and slave use the same dev->data->tx_queues / dev->data->rx_queues, but
crossed over.

I think more complex logic is required for multiple-slave support. If multiple
slaves are not supported, is the 'id' devarg still valid, or is it used at all?

<...>

> +static int
> +memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
> +	     memif_interface_id_t id, uint32_t flags,
> +	     const char *socket_filename,
> +	     memif_log2_ring_size_t log2_ring_size,
> +	     uint16_t buffer_size, const char *secret,
> +	     struct ether_addr *eth_addr)
> +{
> +	int ret = 0;
> +	struct rte_eth_dev *eth_dev;
> +	struct rte_eth_dev_data *data;
> +	struct pmd_internals *pmd;
> +	const unsigned int numa_node = vdev->device.numa_node;
> +	const char *name = rte_vdev_device_name(vdev);
> +
> +	if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
> +		MIF_LOG(ERR, "Zero-copy not supported.");
> +		return -1;
> +	}

What is the plan for zero copy? Can you please put the status and plan in the
documentation?

<...>

> +struct memif_queue {
> +	struct rte_mempool *mempool;		/**< mempool for RX packets */
> +	uint16_t in_port;			/**< port id */
> +
> +	struct pmd_internals *pmd;		/**< device internals */
> +
> +	struct rte_intr_handle intr_handle;	/**< interrupt handle */
> +
> +	/* ring info */
> +	memif_ring_type_t type;			/**< ring type */
> +	memif_ring_t *ring;			/**< pointer to ring */
> +	memif_log2_ring_size_t log2_ring_size;	/**< log2 of ring size */
> +
> +	memif_region_index_t region;		/**< shared memory region index */
> +	memif_region_offset_t offset;		/**< offset at which the queue begins */

Offset from where? It is from the start of the region, but I think it is better
to spell that out in the comment.

> +
> +	uint16_t last_head;			/**< last ring head */
> +	uint16_t last_tail;			/**< last ring tail */

The ring already has 'head' and 'tail' variables; what is the difference with
the queue's 'last_head' and 'last_tail'?
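My guess at the intent, just to check my understanding (this is an assumption,
not taken from the patch): ring->head/tail live in shared memory and are
advanced by the peer, while last_head/last_tail are this side's private cursors
recording how far it has already processed, something like:

  #include <stdint.h>

  /* Reviewer sketch: slots the peer has completed but this side has not
   * consumed yet. Unsigned 16-bit arithmetic wraps, so the subtraction also
   * works across ring index wrap-around. */
  static inline uint16_t
  assumed_slots_available(uint16_t ring_tail, uint16_t last_tail)
  {
          return (uint16_t)(ring_tail - last_tail);
  }

If that is indeed the idea, a short note on those two fields would answer the
question.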