From: "Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)"
To: Ferruh Yigit, "dev@dpdk.org"
Subject: Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
Date: Thu, 2 May 2019 12:35:56 +0000
Message-ID: <1556800556778.14516@cisco.com>
References: <20190220115254.18724-1-jgrajcia@cisco.com> <20190322115727.4358-1-jgrajcia@cisco.com> <0762da59-4a97-474f-7d67-e3bd8daf50f2@intel.com>
In-Reply-To: <0762da59-4a97-474f-7d67-e3bd8daf50f2@intel.com>

________________________________________
From: Ferruh Yigit
Sent: Monday, March 25, 2019 9:58 PM
To: Jakub Grajciar; dev@dpdk.org
Subject: Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)

On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
> Memory interface (memif), provides high performance
> packet transfer over shared memory.
>
> Signed-off-by: Jakub Grajciar

<...>

> @@ -0,0 +1,200 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> +   Copyright(c) 2018-2019 Cisco Systems, Inc.
> +
> +======================
> +Memif Poll Mode Driver
> +======================
> +Shared memory packet interface (memif) PMD allows for DPDK and any other client
> +using memif (DPDK, VPP, libmemif) to communicate using shared memory. Memif is
> +Linux only.
> +
> +The created device transmits packets in a raw format. It can be used with
> +Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
> +supported in DPDK memif implementation.
> +
> +Memif works in two roles: master and slave. Slave connects to master over an
> +existing socket. It is also a producer of shared memory file and initializes
> +the shared memory. Master creates the socket and listens for any slave
> +connection requests. The socket may already exist on the system. Be sure to
> +remove any such sockets, if you are creating a master interface, or you will
> +see an "Address already in use" error. Function ``rte_pmd_memif_remove()``,

Would it be possible to remove this existing socket on successful termination of
the DPDK application with the 'master' role?
Otherwise every run of a DPDK app requires deleting the socket file first.


    net_memif is the same case as net_virtio_user; I'd use the same workaround,
    testpmd.c:pmd_test_exit(). This, however, is only valid for testpmd.
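    For illustration, a minimal sketch of such a workaround, assuming the
    application knows the socket path it passed via the devargs (the path and
    helper name below are only examples, not part of this patch):

        #include <errno.h>
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        /* Remove a listener socket left behind by a previous 'master' run,
         * before the memif vdev is probed (or on clean application exit). */
        static void
        memif_cleanup_stale_socket(const char *path)
        {
                if (unlink(path) == 0)
                        printf("removed stale memif socket %s\n", path);
                else if (errno != ENOENT)
                        fprintf(stderr, "unlink(%s): %s\n",
                                path, strerror(errno));
        }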
> +which removes memif interface, will also remove a listener socket, if it is
> +not being used by any other interface.
> +
> +The method to enable one or more interfaces is to use the
> +``--vdev=net_memif0`` option on the DPDK application command line. Each
> +``--vdev=net_memif1`` option given will create an interface named net_memif0,
> +net_memif1, and so on. Memif uses unix domain socket to transmit control
> +messages. Each memif has a unique id per socket. If you are connecting multiple
> +interfaces using same socket, be sure to specify unique ids ``id=0``, ``id=1``,
> +etc. Note that if you assign a socket to a master interface it becomes a
> +listener socket. Listener socket can not be used by a slave interface on same
> +client.

When I connect two slaves, id=0 & id=1, to the master, both the master and the
second slave crash as soon as the second slave connects. Is this a known issue?
Can you please check this?

master: ./build/app/testpmd -w0:0.0 -l20,21 --vdev net_memif0,role=master -- -i
slave1: ./build/app/testpmd -w0:0.0 -l20,22 --file-prefix=slave --vdev net_memif0,role=slave,id=0 -- -i
slave2: ./build/app/testpmd -w0:0.0 -l20,23 --file-prefix=slave2 --vdev net_memif0,role=slave,id=1 -- -i

<...>


    Each interface can be connected to only one peer interface at a time; I'll
    add more details to the documentation.


> +Example: testpmd and testpmd
> +----------------------------
> +In this example we run two instances of testpmd application and transmit packets over memif.

How does this play with multi-process support? When I run a secondary app while
the memif PMD is enabled in the primary, the secondary process crashes.
Can you please check and ensure that at least nothing crashes when a secondary
app runs?


    For now I'd disable multi-process support so we can get the patch applied,
    then provide multi-process support in a separate patch.


> +
> +First create ``master`` interface::
> +
> +    #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
> +
> +Now create ``slave`` interface (master must be already running so the slave will connect)::
> +
> +    #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
> +
> +Set forwarding mode in one of the instances to 'rx only' and the other to 'tx only'::
> +
> +    testpmd> set fwd rxonly
> +    testpmd> start
> +
> +    testpmd> set fwd txonly
> +    testpmd> start

Would it be useful to add a loopback option to the PMD, for testing/debug?

Also, I am getting low performance numbers above compared to the ring PMD, for
example; is there any performance target for the PMD?
With the same forwarding core for both testpmds the result is terrible, ~40Kpps;
either something is wrong or I am doing something wrong.
With different forwarding cores for each testpmd it is still low, ~18Mpps.

<...>


    The difference between the ring PMD and the memif PMD is that while memif
    transfers packets, ring transfers whole buffers. (Also, the ring PMD can not
    be used process to process.) That means it does not have to alloc/free
    buffers.
    I did a simple test where I modified the code so that the tx function only
    frees the given buffers and rx allocates new buffers. On my machine, this can
    only handle 45-50Mpps.

    With that in mind, I believe that 23Mpps is fine performance. No performance
    target is defined; the goal is to be as fast as possible.

    The cause of the issue, where traffic drops to 40Kpps while using the same
    core for both applications, is that within the timeslice given to the testpmd
    process, the memif driver fills its queues but keeps polling, not giving the
    other process a chance to receive the packets. The same applies to the rx
    side, where an empty queue is polled over and over again. In such a
    configuration, interrupt rx-mode should be used instead of polling, or the
    application could suspend the port.
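    To illustrate the interrupt rx-mode suggestion, here is a rough sketch of
    how an application could sleep on an empty queue using the generic ethdev
    Rx interrupt API (as l3fwd-power does). Whether the memif PMD in this RFC
    already implements the rx_queue_intr_enable/disable ops is an assumption;
    the helper below is only an example:

        #include <rte_ethdev.h>
        #include <rte_interrupts.h>

        static int
        wait_for_rx(uint16_t port_id, uint16_t queue_id)
        {
                struct rte_epoll_event ev;
                int ret;

                /* Register the Rx queue in the per-thread epoll set
                 * (normally done once at setup time). */
                ret = rte_eth_dev_rx_intr_ctl_q(port_id, queue_id,
                                                RTE_EPOLL_PER_THREAD,
                                                RTE_INTR_EVENT_ADD, NULL);
                if (ret < 0)
                        return ret;

                rte_eth_dev_rx_intr_enable(port_id, queue_id);
                /* Block until the peer signals that new packets arrived. */
                rte_epoll_wait(RTE_EPOLL_PER_THREAD, &ev, 1, -1);
                rte_eth_dev_rx_intr_disable(port_id, queue_id);
                return 0;
        }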
> +/**
> + * Buffer descriptor.
> + */
> +typedef struct __rte_packed {
> +        uint16_t flags;                                /**< flags */
> +#define MEMIF_DESC_FLAG_NEXT 1                         /**< is chained buffer */

Is this define used?

<...>


    Yes, see eth_memif_tx() and eth_memif_rx().


> +        while (n_slots && n_rx_pkts < nb_pkts) {
> +                mbuf_head = rte_pktmbuf_alloc(mq->mempool);
> +                if (unlikely(mbuf_head == NULL))
> +                        goto no_free_bufs;
> +                mbuf = mbuf_head;
> +                mbuf->port = mq->in_port;
> +
> +next_slot:
> +                s0 = cur_slot & mask;
> +                d0 = &ring->desc[s0];
> +
> +                src_len = d0->length;
> +                dst_off = 0;
> +                src_off = 0;
> +
> +                do {
> +                        dst_len = mbuf_size - dst_off;
> +                        if (dst_len == 0) {
> +                                dst_off = 0;
> +                                dst_len = mbuf_size + RTE_PKTMBUF_HEADROOM;
> +
> +                                mbuf = rte_pktmbuf_alloc(mq->mempool);
> +                                if (unlikely(mbuf == NULL))
> +                                        goto no_free_bufs;
> +                                mbuf->port = mq->in_port;
> +                                rte_pktmbuf_chain(mbuf_head, mbuf);
> +                        }
> +                        cp_len = RTE_MIN(dst_len, src_len);
> +
> +                        rte_pktmbuf_pkt_len(mbuf) =
> +                            rte_pktmbuf_data_len(mbuf) += cp_len;

Can you please make this two lines to prevent confusion?

Also, shouldn't 'cp_len' be added to 'mbuf_head->pkt_len'?
'rte_pktmbuf_chain' updates 'mbuf_head', but at that stage 'mbuf->pkt_len' is
not yet set to 'cp_len'...

<...>


    Fix in next patch.


> +        if (r->addr == NULL)
> +                return;
> +        munmap(r->addr, r->region_size);
> +        if (r->fd > 0) {
> +                close(r->fd);
> +                r->fd = -1;
> +        }
> +        }
> +        rte_free(pmd->regions);
> +}
> +
> +static int
> +memif_alloc_regions(struct pmd_internals *pmd, uint8_t brn)
> +{
> +        struct memif_region *r;
> +        char shm_name[32];
> +        int i;
> +        int ret = 0;
> +
> +        r = rte_zmalloc("memif_region", sizeof(struct memif_region) * (brn + 1), 0);
> +        if (r == NULL) {
> +                MIF_LOG(ERR, "%s: Failed to allocate regions.",
> +                        rte_vdev_device_name(pmd->vdev));
> +                return -ENOMEM;
> +        }
> +
> +        pmd->regions = r;
> +        pmd->regions_num = brn + 1;
> +
> +        /*
> +         * Create shm for every region. Region 0 is reserved for descriptors.
> +         * Other regions contain buffers.
> +         */
> +        for (i = 0; i < (brn + 1); i++) {
> +                r = &pmd->regions[i];
> +
> +                r->buffer_offset = (i == 0) ? (pmd->run.num_s2m_rings +
> +                                               pmd->run.num_m2s_rings) *
> +                    (sizeof(memif_ring_t) +
> +                     sizeof(memif_desc_t) * (1 << pmd->run.log2_ring_size)) : 0;

What exactly is "buffer_offset"? For 'region 0' it calculates the size of
'region 0', otherwise 0. This is an offset starting from where? And it seems
to be used only for the size assignment below.


    It is possible to use a single region for one connection (non-zero-copy) to
    reduce the number of files opened. It will be implemented in the next patch.
    'buffer_offset' is the offset from the start of the shared memory file to
    the first packet buffer.
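    Restating the quoted math: region 0 holds all the rings together with their
    descriptor arrays, so the first packet buffer starts right after them. A
    sketch using the names from the quoted snippet (the helper itself is
    hypothetical):

        static uint32_t
        region0_buffer_offset(const struct pmd_internals *pmd)
        {
                uint16_t num_rings = pmd->run.num_s2m_rings +
                                     pmd->run.num_m2s_rings;
                uint32_t one_ring = sizeof(memif_ring_t) +
                    sizeof(memif_desc_t) * (1 << pmd->run.log2_ring_size);

                /* rings (with their descriptors) first, packet buffers follow */
                return num_rings * one_ring;
        }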
> +                (1 << pmd->run.log2_ring_size) *
> +                (pmd->run.num_s2m_rings +
> +                 pmd->run.num_m2s_rings));

There is an illusion that packet buffers can be in multiple regions (the comment
above implies it), but this logic assumes all packet buffers are in the same
region other than region 0, which gives us 'region 1', and this is already
hardcoded in a few places. Is there a benefit to assuming there will be more
regions? Would it be simpler to accept that 'region 0' is for rings and
'region 1' is for packet buffers?


    Multiple regions are required in the case of a zero-copy slave. The
    zero-copy slave will expose DPDK shared memory, where each
    memseg/memseg_list will be represented by a memif_region.


> +int
> +memif_connect(struct rte_eth_dev *dev)
> +{
> +        struct pmd_internals *pmd = dev->data->dev_private;
> +        struct memif_region *mr;
> +        struct memif_queue *mq;
> +        int i;
> +
> +        for (i = 0; i < pmd->regions_num; i++) {
> +                mr = pmd->regions + i;
> +                if (mr != NULL) {
> +                        if (mr->addr == NULL) {
> +                                if (mr->fd < 0)
> +                                        return -1;
> +                                mr->addr = mmap(NULL, mr->region_size,
> +                                                PROT_READ | PROT_WRITE,
> +                                                MAP_SHARED, mr->fd, 0);
> +                                if (mr->addr == NULL)
> +                                        return -1;
> +                        }
> +                }
> +        }

If multiple slaves are not supported, is the 'id' devarg still valid, or is it
used at all?


    'id' is used to identify the peer interface. An interface with id=0 will
    only connect to an interface with id=0, and so on...


<...>

> +static int
> +memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
> +             memif_interface_id_t id, uint32_t flags,
> +             const char *socket_filename,
> +             memif_log2_ring_size_t log2_ring_size,
> +             uint16_t buffer_size, const char *secret,
> +             struct ether_addr *eth_addr)
> +{
> +        int ret = 0;
> +        struct rte_eth_dev *eth_dev;
> +        struct rte_eth_dev_data *data;
> +        struct pmd_internals *pmd;
> +        const unsigned int numa_node = vdev->device.numa_node;
> +        const char *name = rte_vdev_device_name(vdev);
> +
> +        if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
> +                MIF_LOG(ERR, "Zero-copy not supported.");
> +                return -1;
> +        }

What is the plan for zero-copy?
Can you please put the status & plan into the documentation?

<...>


    I don't think that putting the status in the docs is needed. The zero-copy
    slave is in the testing stage and I should be able to submit a patch once
    we get this one applied.


> +
> +        uint16_t last_head;                        /**< last ring head */
> +        uint16_t last_tail;                        /**< last ring tail */

The ring already has head and tail variables; what is the difference with the
queue ones?
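    One common pattern behind such fields (whether it applies here would need
    confirmation from the patch author): ring->head/tail are shared with the
    peer process, while last_head/last_tail are this side's private cursors
    recording how far it has already processed. A hypothetical helper to
    illustrate the idea:

        static uint16_t
        pending_slots(const memif_ring_t *ring, const struct memif_queue *mq)
        {
                /* Slots the peer has produced that this side has not consumed
                 * yet; uint16_t arithmetic handles ring index wrap-around. */
                return (uint16_t)(ring->head - mq->last_head);
        }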