From mboxrd@z Thu Jan 1 00:00:00 1970
From: Honnappa Nagarahalli
To: "Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)",
 Ferruh Yigit, "dev@dpdk.org", Honnappa Nagarahalli
CC: nd, nd
Thread-Topic: [dpdk-dev] [RFC v5] /net: memory interface (memif)
Date: Fri, 3 May 2019 04:27:37 +0000
References: <20190220115254.18724-1-jgrajcia@cisco.com>
 <20190322115727.4358-1-jgrajcia@cisco.com>
 <0762da59-4a97-474f-7d67-e3bd8daf50f2@intel.com>
 <1556800556778.14516@cisco.com>
In-Reply-To: <1556800556778.14516@cisco.com>
Subject: Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
List-Id: DPDK patches and discussions
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

> On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
> > Memory interface (memif), provides high performance packet transfer
> > over shared memory.
> >
> > Signed-off-by: Jakub Grajciar
>
> <...>
>
> > @@ -0,0 +1,200 @@
> > +.. SPDX-License-Identifier: BSD-3-Clause
> > +   Copyright(c) 2018-2019 Cisco Systems, Inc.
> > +
> > +======================
> > +Memif Poll Mode Driver
> > +======================
> > +Shared memory packet interface (memif) PMD allows for DPDK and any
> > +other client using memif (DPDK, VPP, libmemif) to communicate using
> > +shared memory. Memif is Linux only.
> > +
> > +The created device transmits packets in a raw format. It can be used
> > +with Ethernet mode, IP mode, or Punt/Inject. At this moment, only
> > +Ethernet mode is supported in DPDK memif implementation.
> > +
> > +Memif works in two roles: master and slave. Slave connects to master
> > +over an existing socket. It is also a producer of shared memory file
> > +and initializes the shared memory. Master creates the socket and
> > +listens for any slave connection requests. The socket may already
> > +exist on the system. Be sure to remove any such sockets, if you are
> > +creating a master interface, or you will see an "Address already in
> > +use" error. Function ``rte_pmd_memif_remove()``,
>
> Can it be possible to remove this existing socket on successfully termination
> of the dpdk application with 'master' role?
> Otherwise each time to run a dpdk app, requires to delete the socket file first.
>
>
>    net_memif is the same case as net_virtio_user, I'd use the same
>    workaround.
>    testpmd.c:pmd_test_exit() this, however, is only valid for testpmd.
>
>
> > +which removes memif interface, will also remove a listener socket, if
> > +it is not being used by any other interface.
> > +
> > +The method to enable one or more interfaces is to use the
> > +``--vdev=net_memif0`` option on the DPDK application command line.
> > +Each ``--vdev=net_memif1`` option given will create an interface
> > +named net_memif0, net_memif1, and so on. Memif uses unix domain
> > +socket to transmit control messages. Each memif has a unique id per
> > +socket. If you are connecting multiple interfaces using same socket,
> > +be sure to specify unique ids ``id=0``, ``id=1``, etc. Note that if
> > +you assign a socket to a master interface it becomes a listener
> > +socket. Listener socket can not be used by a slave interface on same client.
>
> When I connect two slaves, id=0 & id=1 to the master, both master and
> second slave crashed as soon as second slave connected, is this a know issue?
> Can you please check this?
>
> master: ./build/app/testpmd -w0:0.0 -l20,21 --vdev net_memif0,role=master
> -- -i
> slave1: ./build/app/testpmd -w0:0.0 -l20,22 --file-prefix=slave --vdev
> net_memif0,role=slave,id=0 -- -i
> slave2: ./build/app/testpmd -w0:0.0 -l20,23 --file-prefix=slave2 --vdev
> net_memif0,role=slave,id=1 -- -i
>
> <...>
>
>
>    Each interface can be connected to one peer interface at the same time, I'll
>    add more details to the documentation.
>
>
> > +Example: testpmd and testpmd
> > +----------------------------
> > +In this example we run two instances of testpmd application and transmit
> packets over memif.
>
> How this play with multi process support? When I run a secondary app when
> memif PMD is enabled in primary, secondary process crashes.
> Can you please check and ensure at least nothing crashes when a secondary
> app run?
>
>
>    For now I'd disable multi-process support, so we can get the patch applied,
>    then
>    provide multi-process support in a separate patch.
>
>
> > +
> > +First create ``master`` interface::
> > +
> > +    #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1
> > +    --vdev=net_memif,role=master -- -i
> > +
> > +Now create ``slave`` interface (master must be already running so the
> slave will connect)::
> > +
> > +    #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2
> > +    --vdev=net_memif -- -i
> > +
> > +Set forwarding mode in one of the instances to 'rx only' and the other to
> 'tx only'::
> > +
> > +    testpmd> set fwd rxonly
> > +    testpmd> start
> > +
> > +    testpmd> set fwd txonly
> > +    testpmd> start
>
> Would it be useful to add loopback option to the PMD, for testing/debug ?
>
> Also I am getting low performance numbers above, comparing the ring pmd
> for example, is there any performance target for the pmd?
> Same forwarding core for both testpmds, this is terrible, either something is
> wrong or I am doing something wrong, it is ~40Kpps Different forwarding
> cores for each testpmd, still low, !18Mpps
>
> <...>
>
>
>    The difference between ring pmd and memif pmd is that while memif
>    transfers packets,
>    ring transfers whole buffers. (Also ring can not be used process to process)
>    That means, it does not have to alloc/free buffers.
>    I did a simple test where I modified the code so the tx function will only
>    free given buffers
>    and rx allocates new buffers. On my machine, this can only handle 45-
>    50Mpps.
>
>    With that in mind, I believe that 23Mpps is fine performance. No
>    performance target is
>    defined, the goal is to be as fast as possible.

Use of C11 atomics has proven to provide better performance on weakly ordered
architectures (at least on Arm). IMO, C11 atomics should be used to implement
the fast path functions at least. This ensures optimal performance on all
supported architectures in DPDK.
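
As a rough illustration of the suggestion above (a hypothetical sketch only,
not the memif driver's code; the ring, function and field names are made up),
the consumer side could read the producer's head with an acquire load and
publish its tail with a release store, using the GCC/Clang __atomic builtins:

    #include <stdint.h>

    struct shared_ring {
        uint16_t head;   /* written by the producer, read by the consumer */
        uint16_t tail;   /* written by the consumer, read by the producer */
        /* descriptor array follows in the shared memory region */
    };

    /* Slots available to the consumer. The acquire load pairs with the
     * producer's release store to 'head', so descriptor contents written
     * before that store are guaranteed to be visible here. */
    static inline uint16_t
    ring_avail(struct shared_ring *ring, uint16_t last_tail)
    {
        uint16_t head = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);

        return (uint16_t)(head - last_tail);
    }

    /* Hand processed slots back to the producer. The release store keeps
     * all reads of the slot data ordered before the tail update, so the
     * producer cannot recycle a slot that is still being read. */
    static inline void
    ring_done(struct shared_ring *ring, uint16_t new_tail)
    {
        __atomic_store_n(&ring->tail, new_tail, __ATOMIC_RELEASE);
    }

On Arm, acquire/release pairs like these map to cheaper instructions than full
memory barriers, while on x86 they compile down to plain loads and stores, so
strongly ordered platforms lose nothing.
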
>
>    The cause of the issue, where traffic drops to 40Kpps while using the same
>    core for both applications,
>    is that within the timeslice given to testpmd process, memif driver fills its
>    queues,
>    but keeps polling, not giving the other process a chance to receive the
>    packets.
>    Same applies to rx side, where an empty queue is polled over and over
>    again. In such configuration,
>    interrupt rx-mode should be used instead of polling. Or the application
>    could suspend
>    the port.
>
>
> > +/**
> > + * Buffer descriptor.
> > + */
> > +typedef struct __rte_packed {
> > +    uint16_t flags;                /**< flags */
> > +#define MEMIF_DESC_FLAG_NEXT 1     /**< is chained buffer */
>
> Is this define used?
>
> <...>
>
>
>    Yes, see eth_memif_tx() and eth_memif_rx()
>
>
> > +    while (n_slots && n_rx_pkts < nb_pkts) {
> > +        mbuf_head = rte_pktmbuf_alloc(mq->mempool);
> > +        if (unlikely(mbuf_head == NULL))
> > +            goto no_free_bufs;
> > +        mbuf = mbuf_head;
> > +        mbuf->port = mq->in_port;
> > +
> > +next_slot:
> > +        s0 = cur_slot & mask;
> > +        d0 = &ring->desc[s0];
> > +
> > +        src_len = d0->length;
> > +        dst_off = 0;
> > +        src_off = 0;
> > +
> > +        do {
> > +            dst_len = mbuf_size - dst_off;
> > +            if (dst_len == 0) {
> > +                dst_off = 0;
> > +                dst_len = mbuf_size +
> > +                    RTE_PKTMBUF_HEADROOM;
> > +
> > +                mbuf = rte_pktmbuf_alloc(mq->mempool);
> > +                if (unlikely(mbuf == NULL))
> > +                    goto no_free_bufs;
> > +                mbuf->port = mq->in_port;
> > +                rte_pktmbuf_chain(mbuf_head, mbuf);
> > +            }
> > +            cp_len = RTE_MIN(dst_len, src_len);
> > +
> > +            rte_pktmbuf_pkt_len(mbuf) =
> > +                rte_pktmbuf_data_len(mbuf) += cp_len;
>
> Can you please make this two lines to prevent confusion?
>
> Also shouldn't need to add 'cp_len' to 'mbuf_head->pkt_len'?
> 'rte_pktmbuf_chain' updates 'mbuf_head' but at that stage 'mbuf->pkt_len' is
> not set to 'cp_len'...
>
> <...>
>
>
>    Fix in next patch.
>
>
> > +        if (r->addr == NULL)
> > +            return;
> > +        munmap(r->addr, r->region_size);
> > +        if (r->fd > 0) {
> > +            close(r->fd);
> > +            r->fd = -1;
> > +        }
> > +    }
> > +    rte_free(pmd->regions);
> > +}
> > +
> > +static int
> > +memif_alloc_regions(struct pmd_internals *pmd, uint8_t brn) {
> > +    struct memif_region *r;
> > +    char shm_name[32];
> > +    int i;
> > +    int ret = 0;
> > +
> > +    r = rte_zmalloc("memif_region", sizeof(struct memif_region) * (brn + 1),
> 0);
> > +    if (r == NULL) {
> > +        MIF_LOG(ERR, "%s: Failed to allocate regions.",
> > +            rte_vdev_device_name(pmd->vdev));
> > +        return -ENOMEM;
> > +    }
> > +
> > +    pmd->regions = r;
> > +    pmd->regions_num = brn + 1;
> > +
> > +    /*
> > +     * Create shm for every region. Region 0 is reserved for descriptors.
> > +     * Other regions contain buffers.
> > +     */
> > +    for (i = 0; i < (brn + 1); i++) {
> > +        r = &pmd->regions[i];
> > +
> > +        r->buffer_offset = (i == 0) ? (pmd->run.num_s2m_rings +
> > +                       pmd->run.num_m2s_rings) *
> > +                       (sizeof(memif_ring_t) +
> > +                       sizeof(memif_desc_t) * (1 <<
> > +                       pmd->run.log2_ring_size)) : 0;
>
> what exactly "buffer_offset" is? For 'region 0' it calculates the size of the size
> of the 'region 0', otherwise 0. This is offset starting from where? And it seems
> only used for below size assignment.
>
>
>    It is possible to use single region for one connection (non-zero-copy) to
>    reduce
>    the number of files opened. It will be implemented in the next patch.
>    'buffer_offset' is offset from the start of shared memory file to the first
>    packet buffer.
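
To make that layout easier to follow, here is a hypothetical sketch (my own
helper functions, not the patch's code; the names and the assumption of one
fixed-size buffer per descriptor slot in every ring are mine). The first
helper mirrors the quoted 'buffer_offset' expression for region 0, i.e. the
size of the ring/descriptor area, which is also where packet buffers begin:

    #include <stddef.h>
    #include <stdint.h>

    /* Size of the ring/descriptor area: every ring header plus its
     * descriptor array ('buffer_offset' of region 0 in the quoted code). */
    static size_t
    rings_area_size(unsigned int num_rings, unsigned int log2_ring_size,
                    size_t ring_hdr_size, size_t desc_size)
    {
        return num_rings * (ring_hdr_size +
                            desc_size * ((size_t)1 << log2_ring_size));
    }

    /* Size of the packet-buffer area, assuming one buffer of 'buffer_size'
     * bytes per descriptor slot in every ring. */
    static size_t
    buffers_area_size(unsigned int num_rings, unsigned int log2_ring_size,
                      uint16_t buffer_size)
    {
        return (size_t)buffer_size * num_rings *
               ((size_t)1 << log2_ring_size);
    }
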
>
>
> > +                       (1 << pmd->run.log2_ring_size) *
> > +                       (pmd->run.num_s2m_rings +
> > +                       pmd->run.num_m2s_rings));
>
> There is an illusion of packet buffers can be in multiple regions (above
> comment implies it) but this logic assumes all packet buffer are in same
> region other than region 0, which gives us 'region 1' and this is already
> hardcoded in a few places. Is there a benefit of assuming there will be more
> regions, will it make simple to accept 'region 0' for rings and 'region 1' is for
> packet buffer?
>
>
>    Multiple regions are required in case of zero-copy slave. The zero-copy will
>    expose dpdk shared memory, where each memseg/memseg_list will be
>    represended by memif_region.
>
>
> > +int
> > +memif_connect(struct rte_eth_dev *dev) {
> > +    struct pmd_internals *pmd = dev->data->dev_private;
> > +    struct memif_region *mr;
> > +    struct memif_queue *mq;
> > +    int i;
> > +
> > +    for (i = 0; i < pmd->regions_num; i++) {
> > +        mr = pmd->regions + i;
> > +        if (mr != NULL) {
> > +            if (mr->addr == NULL) {
> > +                if (mr->fd < 0)
> > +                    return -1;
> > +                mr->addr = mmap(NULL, mr->region_size,
> > +                        PROT_READ | PROT_WRITE,
> > +                        MAP_SHARED, mr->fd, 0);
> > +                if (mr->addr == NULL)
> > +                    return -1;
> > +            }
> > +        }
> > +    }
>
> If multiple slave is not supported, is 'id' devarg still valid, or is it used at all?
>
>
>    'id' is used to identify peer interface. Interface with id=0 will only connect
>    to an
>    interface with id=0 and so on...
>
>
> <...>
>
> > +static int
> > +memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
> > +         memif_interface_id_t id, uint32_t flags,
> > +         const char *socket_filename,
> > +         memif_log2_ring_size_t log2_ring_size,
> > +         uint16_t buffer_size, const char *secret,
> > +         struct ether_addr *eth_addr) {
> > +    int ret = 0;
> > +    struct rte_eth_dev *eth_dev;
> > +    struct rte_eth_dev_data *data;
> > +    struct pmd_internals *pmd;
> > +    const unsigned int numa_node = vdev->device.numa_node;
> > +    const char *name = rte_vdev_device_name(vdev);
> > +
> > +    if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
> > +        MIF_LOG(ERR, "Zero-copy not supported.");
> > +        return -1;
> > +    }
>
> What is the plan for the zero copy?
> Can you please put the status & plan to the documentation?
>
> <...>
>
>
>    I don't think that putting the status in the docs is needed. Zero-copy slave
>    is in testing stage
>    and I should be able to submit a patch once we get this one applied.
>
>
> > +
> > +    uint16_t last_head;            /**< last ring head */
> > +    uint16_t last_tail;            /**< last ring tail */
>
> ring already has head, tail variables, what is the difference in queue ones?