From: "Xu, Rosen"
To: "Chautru, Nicolas", "dev@dpdk.org", "akhil.goyal@nxp.com"
CC: "Richardson, Bruce"
Date: Fri, 4 Sep 2020 02:01:50 +0000
Subject: Re: [dpdk-dev] [PATCH v3 04/11] baseband/acc100: add queue configuration
References: <1597796731-57841-1-git-send-email-nicolas.chautru@intel.com>
 <1597796731-57841-5-git-send-email-nicolas.chautru@intel.com>
List-Id: DPDK patches and discussions

Hi,

> -----Original Message-----
> From: Chautru, Nicolas
> Sent: Friday, September 04, 2020 6:49
> To: Xu, Rosen; dev@dpdk.org; akhil.goyal@nxp.com
> Cc: Richardson, Bruce
> Subject: RE: [dpdk-dev] [PATCH v3 04/11] baseband/acc100: add queue
> configuration
>
> > From: Xu, Rosen
> >
> > Hi,
> >
> > > -----Original Message-----
> > > From: Chautru, Nicolas
> > > Sent: Sunday, August 30, 2020 1:48
> > > To: Xu, Rosen; dev@dpdk.org; akhil.goyal@nxp.com
> > > Cc: Richardson, Bruce
> > > Subject: RE: [dpdk-dev] [PATCH v3 04/11] baseband/acc100: add queue
> > > configuration
> > >
> > > Hi,
> > >
> > > > From: Xu, Rosen
> > > >
> > > > Hi,
> > > >
> > > > > -----Original Message-----
> > > > > From: dev On Behalf Of Nicolas Chautru
> > > > > Sent: Wednesday, August 19, 2020 8:25
> > > > > To: dev@dpdk.org; akhil.goyal@nxp.com
> > > > > Cc: Richardson, Bruce; Chautru, Nicolas
> > > > > Subject: [dpdk-dev] [PATCH v3 04/11] baseband/acc100: add queue
> > > > > configuration
> > > > >
> > > > > Adding function to create and configure queues for the device.
> > > > > Still no capability.
> > > > >
> > > > > Signed-off-by: Nicolas Chautru
> > > > > ---
> > > > >  drivers/baseband/acc100/rte_acc100_pmd.c | 420 ++++++++++++++++++++++++++++++-
> > > > >  drivers/baseband/acc100/rte_acc100_pmd.h |  45 ++++
> > > > >  2 files changed, 464 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/baseband/acc100/rte_acc100_pmd.c b/drivers/baseband/acc100/rte_acc100_pmd.c
> > > > > index 7807a30..7a21c57 100644
> > > > > --- a/drivers/baseband/acc100/rte_acc100_pmd.c
> > > > > +++ b/drivers/baseband/acc100/rte_acc100_pmd.c
> > > > > @@ -26,6 +26,22 @@
> > > > >  RTE_LOG_REGISTER(acc100_logtype, pmd.bb.acc100, NOTICE);
> > > > >  #endif
> > > > >
> > > > > +/* Write to MMIO register address */
> > > > > +static inline void
> > > > > +mmio_write(void *addr, uint32_t value)
> > > > > +{
> > > > > +	*((volatile uint32_t *)(addr)) = rte_cpu_to_le_32(value);
> > > > > +}
> > > > > +
> > > > > +/* Write a register of a ACC100 device */
> > > > > +static inline void
> > > > > +acc100_reg_write(struct acc100_device *d, uint32_t offset, uint32_t payload)
> > > > > +{
> > > > > +	void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset);
> > > > > +	mmio_write(reg_addr, payload);
> > > > > +	usleep(1000);
> > > > > +}
> > > > > +
> > > > >  /* Read a register of a ACC100 device */
> > > > >  static inline uint32_t
> > > > >  acc100_reg_read(struct acc100_device *d, uint32_t offset)
> > > > > @@ -36,6 +52,22 @@
> > > > >  	return rte_le_to_cpu_32(ret);
> > > > >  }
> > > > >
> > > > > +/* Basic Implementation of Log2 for exact 2^N */
> > > > > +static inline uint32_t
> > > > > +log2_basic(uint32_t value)
> > > > > +{
> > > > > +	return (value == 0) ? 0 : __builtin_ctz(value);
> > > > > +}
> > > > > +
> > > > > +/* Calculate memory alignment offset assuming alignment is 2^N */
> > > > > +static inline uint32_t
> > > > > +calc_mem_alignment_offset(void *unaligned_virt_mem, uint32_t alignment)
> > > > > +{
> > > > > +	rte_iova_t unaligned_phy_mem = rte_malloc_virt2iova(unaligned_virt_mem);
> > > > > +	return (uint32_t)(alignment -
> > > > > +			(unaligned_phy_mem & (alignment-1)));
> > > > > +}
> > > > > +
> > > > >  /* Calculate the offset of the enqueue register */
> > > > >  static inline uint32_t
> > > > >  queue_offset(bool pf_device, uint8_t vf_id, uint8_t qgrp_id, uint16_t aq_id)
> > > > > @@ -204,10 +236,393 @@
> > > > >  			acc100_conf->q_dl_5g.aq_depth_log2);
> > > > >  }
> > > > >
> > > > > +static void
> > > > > +free_base_addresses(void **base_addrs, int size)
> > > > > +{
> > > > > +	int i;
> > > > > +	for (i = 0; i < size; i++)
> > > > > +		rte_free(base_addrs[i]);
> > > > > +}
> > > > > +
> > > > > +static inline uint32_t
> > > > > +get_desc_len(void)
> > > > > +{
> > > > > +	return sizeof(union acc100_dma_desc);
> > > > > +}
> > > > > +
> > > > > +/* Allocate the 2 * 64MB block for the sw rings */
> > > > > +static int
> > > > > +alloc_2x64mb_sw_rings_mem(struct rte_bbdev *dev, struct acc100_device *d,
> > > > > +		int socket)
> > > > > +{
> > > > > +	uint32_t sw_ring_size = ACC100_SIZE_64MBYTE;
> > > > > +	d->sw_rings_base = rte_zmalloc_socket(dev->device->driver->name,
> > > > > +			2 * sw_ring_size, RTE_CACHE_LINE_SIZE, socket);
> > > > > +	if (d->sw_rings_base == NULL) {
> > > > > +		rte_bbdev_log(ERR, "Failed to allocate memory for %s:%u",
> > > > > +				dev->device->driver->name,
> > > > > +				dev->data->dev_id);
> > > > > +		return -ENOMEM;
> > > > > +	}
> > > > > +	memset(d->sw_rings_base, 0, ACC100_SIZE_64MBYTE);
> > > > > +	uint32_t next_64mb_align_offset = calc_mem_alignment_offset(
> > > > > +			d->sw_rings_base, ACC100_SIZE_64MBYTE);
> > > > > +	d->sw_rings = RTE_PTR_ADD(d->sw_rings_base, next_64mb_align_offset);
> > > > > +	d->sw_rings_phys = rte_malloc_virt2iova(d->sw_rings_base) +
> > > > > +			next_64mb_align_offset;
> > > > > +	d->sw_ring_size = MAX_QUEUE_DEPTH * get_desc_len();
> > > > > +	d->sw_ring_max_depth = d->sw_ring_size / get_desc_len();
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > >
> > > > Why not a common alloc memory function but special function for
> > > > different memory size?
> > >
> > > This is a bit convoluted but due to the fact the first attempt
> > > method which is optimal (minimum) may not always find aligned memory.
> >
> > What's convoluted? Can you explain?
> > For packet processing, in most scenarios, aren't we aligned memory
> > when we alloc memory?
>
> Hi Rosen,
> This is related to both the alignment and the size of the contiguous amount
> of data in pinned down memory = 64MB contiguous block aligned on 64MB
> boundary of physical address (not linear).
> The first method can potentially fail hence is run incrementally while the 2nd
> version may be used as safe fall through and is more wasteful in term of
> footprint (hence not used as default).
> That is the part that I considered "convoluted" in this way to reliably allocate
> memory. It is possible to only use the 2nd version which would look cleaner
> in term of code but more wasteful in memory usage.

As you mentioned, it's not cleaner. Looking forward to the next version of your patch.
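
To make the two strategies concrete, here is a minimal standalone sketch of the
address arithmetic under discussion. It is not the driver code: plain 64-bit
integers stand in for IOVAs, align_offset() copies the formula of
calc_mem_alignment_offset() from the quoted patch, and
aligned_window_in_2x_block() and fits_before_boundary() are hypothetical helper
names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIZE_64MB (64u * 1024u * 1024u)

/* Distance from addr to the next alignment boundary (alignment must be 2^N).
 * Same formula as calc_mem_alignment_offset() in the quoted patch: an already
 * aligned address yields a full `alignment`, not 0.
 */
static inline uint32_t
align_offset(uint64_t addr, uint32_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (uint32_t)(alignment - (addr & (alignment - 1)));
}

/* Fallback strategy (hypothetical helper): given a 2 * 64MB block starting at
 * base, return the start of a 64MB-aligned 64MB window inside that block.
 */
static inline uint64_t
aligned_window_in_2x_block(uint64_t base)
{
	return base + align_offset(base, SIZE_64MB);
}

/* Minimal strategy (hypothetical helper): true when [base, base + size) ends
 * before the next 64MB boundary, i.e. the block does not cross it.
 */
static inline bool
fits_before_boundary(uint64_t base, uint64_t size)
{
	return base + size < base + align_offset(base, SIZE_64MB);
}
```

With these definitions, a block at 0x4000100 of size 0x1000 ends before the
next 64MB boundary and could be kept by the minimal approach, while a block at
0x7FFF000 of size 0x2000 crosses 0x8000000 and would be rejected. The fallback
always succeeds because the offset never exceeds 64MB, at the cost of
allocating twice the memory.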
>
>
> > >
> > > >
> > > > > +/* Attempt to allocate minimised memory space for sw rings */
> > > > > +static void
> > > > > +alloc_sw_rings_min_mem(struct rte_bbdev *dev, struct acc100_device *d,
> > > > > +		uint16_t num_queues, int socket)
> > > > > +{
> > > > > +	rte_iova_t sw_rings_base_phy, next_64mb_align_addr_phy;
> > > > > +	uint32_t next_64mb_align_offset;
> > > > > +	rte_iova_t sw_ring_phys_end_addr;
> > > > > +	void *base_addrs[SW_RING_MEM_ALLOC_ATTEMPTS];
> > > > > +	void *sw_rings_base;
> > > > > +	int i = 0;
> > > > > +	uint32_t q_sw_ring_size = MAX_QUEUE_DEPTH * get_desc_len();
> > > > > +	uint32_t dev_sw_ring_size = q_sw_ring_size * num_queues;
> > > > > +
> > > > > +	/* Find an aligned block of memory to store sw rings */
> > > > > +	while (i < SW_RING_MEM_ALLOC_ATTEMPTS) {
> > > > > +		/*
> > > > > +		 * sw_ring allocated memory is guaranteed to be aligned to
> > > > > +		 * q_sw_ring_size at the condition that the requested size is
> > > > > +		 * less than the page size
> > > > > +		 */
> > > > > +		sw_rings_base = rte_zmalloc_socket(
> > > > > +				dev->device->driver->name,
> > > > > +				dev_sw_ring_size, q_sw_ring_size, socket);
> > > > > +
> > > > > +		if (sw_rings_base == NULL) {
> > > > > +			rte_bbdev_log(ERR,
> > > > > +					"Failed to allocate memory for %s:%u",
> > > > > +					dev->device->driver->name,
> > > > > +					dev->data->dev_id);
> > > > > +			break;
> > > > > +		}
> > > > > +
> > > > > +		sw_rings_base_phy = rte_malloc_virt2iova(sw_rings_base);
> > > > > +		next_64mb_align_offset = calc_mem_alignment_offset(
> > > > > +				sw_rings_base, ACC100_SIZE_64MBYTE);
> > > > > +		next_64mb_align_addr_phy = sw_rings_base_phy +
> > > > > +				next_64mb_align_offset;
> > > > > +		sw_ring_phys_end_addr = sw_rings_base_phy + dev_sw_ring_size;
> > > > > +
> > > > > +		/* Check if the end of the sw ring memory block is before the
> > > > > +		 * start of next 64MB aligned mem address
> > > > > +		 */
> > > > > +		if (sw_ring_phys_end_addr < next_64mb_align_addr_phy) {
> > > > > +			d->sw_rings_phys = sw_rings_base_phy;
> > > > > +			d->sw_rings = sw_rings_base;
> > > > > +			d->sw_rings_base = sw_rings_base;
> > > > > +			d->sw_ring_size = q_sw_ring_size;
> > > > > +			d->sw_ring_max_depth = MAX_QUEUE_DEPTH;
> > > > > +			break;
> > > > > +		}
> > > > > +		/* Store the address of the unaligned mem block */
> > > > > +		base_addrs[i] = sw_rings_base;
> > > > > +		i++;
> > > > > +	}
> > > > > +
> > > > > +	/* Free all unaligned blocks of mem allocated in the loop */
> > > > > +	free_base_addresses(base_addrs, i);
> > > > > +}
> > > >
> > > > It's strange to firstly alloc memory and then free memory but on
> > > > operations on this memory.
> > >
> > > I may miss your point. We are freeing the exact same mem we did get
> > > from rte_zmalloc.
> > > Not that the base_addrs array refers to multiple attempts of mallocs,
> > > not multiple operations in a ring.
> >
> > You alloc memory sw_rings_base, after some translate, assign this memory to
> > cc100_device *d, and before the function return, this memory has been freed.
>
> If you follow the logic, this actually only frees the memory from attempts
> which were not successfully well aligned, not the one which ends up being in
> fact used for sw rings.
> The actually memory for sw rings is obviously used and actually gets freed
> when closing the device below => ie. rte_free(d->sw_rings_base);
> Let me know if unclear. I could add more comments if this not obvious from
> the code. Ie. /* Free all _unaligned_ blocks of mem allocated in the loop */*
>
> Thanks for your review. I can see how it can look a bit odd initially.

Please make sure your code works correctly in each branch.
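
The branches in question (a block accepted on some attempt, every attempt
rejected, allocation failure) can be exercised in isolation. Below is a minimal
sketch of the same "retry, then free only the rejected blocks" pattern, assuming
plain malloc()/free() in place of rte_zmalloc_socket()/rte_free(). The names
alloc_with_retries(), block_ok_fn, accept_nth() and MAX_ATTEMPTS are
hypothetical and exist only for this illustration; the acceptance test is
injected so each branch can be driven deterministically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_ATTEMPTS 8

/* Hypothetical predicate type: decides whether a candidate block is usable */
typedef int (*block_ok_fn)(void *block, void *ctx);

/* Try up to MAX_ATTEMPTS allocations, keep the first acceptable block and
 * free only the rejected ones. Returns the kept block, or NULL when no
 * attempt was accepted. *rejected receives the number of rejected blocks.
 */
static void *
alloc_with_retries(size_t size, block_ok_fn ok, void *ctx, int *rejected)
{
	void *failed[MAX_ATTEMPTS];
	void *kept = NULL;
	int i = 0;

	assert(ok != NULL);

	while (i < MAX_ATTEMPTS) {
		void *block = malloc(size);
		if (block == NULL)
			break;
		if (ok(block, ctx)) {
			kept = block; /* never stored in failed[] */
			break;
		}
		/* Hold the rejected block until the loop ends so that the
		 * allocator cannot hand the same address back again.
		 */
		failed[i++] = block;
	}

	if (rejected != NULL)
		*rejected = i;
	/* Free only the rejected attempts; kept stays owned by the caller */
	while (i > 0)
		free(failed[--i]);
	return kept;
}

/* Example predicate: reject the first *ctx candidates, accept the next one */
static int
accept_nth(void *block, void *ctx)
{
	int *remaining = ctx;

	(void)block;
	return (*remaining)-- <= 0;
}
```

In the quoted patch the kept block corresponds to d->sw_rings_base, which is
released later in acc100_dev_close(), and a NULL result corresponds to
d->sw_rings staying NULL so that the 2 * 64MB fallback is taken.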
>
> >
> > >
> > > > > +
> > > > > +/* Allocate 64MB memory used for all software rings */
> > > > > +static int
> > > > > +acc100_setup_queues(struct rte_bbdev *dev, uint16_t num_queues, int socket_id)
> > > > > +{
> > > > > +	uint32_t phys_low, phys_high, payload;
> > > > > +	struct acc100_device *d = dev->data->dev_private;
> > > > > +	const struct acc100_registry_addr *reg_addr;
> > > > > +
> > > > > +	if (d->pf_device && !d->acc100_conf.pf_mode_en) {
> > > > > +		rte_bbdev_log(NOTICE,
> > > > > +				"%s has PF mode disabled. This PF can't be used.",
> > > > > +				dev->data->name);
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	alloc_sw_rings_min_mem(dev, d, num_queues, socket_id);
> > > > > +
> > > > > +	/* If minimal memory space approach failed, then allocate
> > > > > +	 * the 2 * 64MB block for the sw rings
> > > > > +	 */
> > > > > +	if (d->sw_rings == NULL)
> > > > > +		alloc_2x64mb_sw_rings_mem(dev, d, socket_id);
> > > > > +
> > > > > +	/* Configure ACC100 with the base address for DMA descriptor rings
> > > > > +	 * Same descriptor rings used for UL and DL DMA Engines
> > > > > +	 * Note : Assuming only VF0 bundle is used for PF mode
> > > > > +	 */
> > > > > +	phys_high = (uint32_t)(d->sw_rings_phys >> 32);
> > > > > +	phys_low = (uint32_t)(d->sw_rings_phys & ~(ACC100_SIZE_64MBYTE-1));
> > > > > +
> > > > > +	/* Choose correct registry addresses for the device type */
> > > > > +	if (d->pf_device)
> > > > > +		reg_addr = &pf_reg_addr;
> > > > > +	else
> > > > > +		reg_addr = &vf_reg_addr;
> > > > > +
> > > > > +	/* Read the populated cfg from ACC100 registers */
> > > > > +	fetch_acc100_config(dev);
> > > > > +
> > > > > +	/* Mark as configured properly */
> > > > > +	d->configured = true;
> > > > > +
> > > > > +	/* Release AXI from PF */
> > > > > +	if (d->pf_device)
> > > > > +		acc100_reg_write(d, HWPfDmaAxiControl, 1);
> > > > > +
> > > > > +	acc100_reg_write(d, reg_addr->dma_ring_ul5g_hi, phys_high);
> > > > > +	acc100_reg_write(d, reg_addr->dma_ring_ul5g_lo, phys_low);
> > > > > +	acc100_reg_write(d, reg_addr->dma_ring_dl5g_hi, phys_high);
> > > > > +	acc100_reg_write(d, reg_addr->dma_ring_dl5g_lo, phys_low);
> > > > > +	acc100_reg_write(d, reg_addr->dma_ring_ul4g_hi, phys_high);
> > > > > +	acc100_reg_write(d, reg_addr->dma_ring_ul4g_lo, phys_low);
> > > > > +	acc100_reg_write(d, reg_addr->dma_ring_dl4g_hi, phys_high);
> > > > > +	acc100_reg_write(d, reg_addr->dma_ring_dl4g_lo, phys_low);
> > > > > +
> > > > > +	/*
> > > > > +	 * Configure Ring Size to the max queue ring size
> > > > > +	 * (used for wrapping purpose)
> > > > > +	 */
> > > > > +	payload = log2_basic(d->sw_ring_size / 64);
> > > > > +	acc100_reg_write(d, reg_addr->ring_size, payload);
> > > > > +
> > > > > +	/* Configure tail pointer for use when SDONE enabled */
> > > > > +	d->tail_ptrs = rte_zmalloc_socket(
> > > > > +			dev->device->driver->name,
> > > > > +			ACC100_NUM_QGRPS * ACC100_NUM_AQS * sizeof(uint32_t),
> > > > > +			RTE_CACHE_LINE_SIZE, socket_id);
> > > > > +	if (d->tail_ptrs == NULL) {
> > > > > +		rte_bbdev_log(ERR, "Failed to allocate tail ptr for %s:%u",
> > > > > +				dev->device->driver->name,
> > > > > +				dev->data->dev_id);
> > > > > +		rte_free(d->sw_rings);
> > > > > +		return -ENOMEM;
> > > > > +	}
> > > > > +	d->tail_ptr_phys = rte_malloc_virt2iova(d->tail_ptrs);
> > > > > +
> > > > > +	phys_high = (uint32_t)(d->tail_ptr_phys >> 32);
> > > > > +	phys_low = (uint32_t)(d->tail_ptr_phys);
> > > > > +	acc100_reg_write(d, reg_addr->tail_ptrs_ul5g_hi, phys_high);
> > > > > +	acc100_reg_write(d, reg_addr->tail_ptrs_ul5g_lo, phys_low);
> > > > > +	acc100_reg_write(d, reg_addr->tail_ptrs_dl5g_hi, phys_high);
> > > > > +	acc100_reg_write(d, reg_addr->tail_ptrs_dl5g_lo, phys_low);
> > > > > +	acc100_reg_write(d, reg_addr->tail_ptrs_ul4g_hi, phys_high);
> > > > > +	acc100_reg_write(d, reg_addr->tail_ptrs_ul4g_lo, phys_low);
> > > > > +	acc100_reg_write(d, reg_addr->tail_ptrs_dl4g_hi, phys_high);
> > > > > +	acc100_reg_write(d, reg_addr->tail_ptrs_dl4g_lo, phys_low);
> > > > > +
> > > > > +	d->harq_layout = rte_zmalloc_socket("HARQ Layout",
> > > > > +			ACC100_HARQ_LAYOUT * sizeof(*d->harq_layout),
> > > > > +			RTE_CACHE_LINE_SIZE, dev->data->socket_id);
> > > > > +
> > > > > +	rte_bbdev_log_debug(
> > > > > +			"ACC100 (%s) configured sw_rings = %p, sw_rings_phys = %#"
> > > > > +			PRIx64, dev->data->name, d->sw_rings, d->sw_rings_phys);
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > >  /* Free 64MB memory used for software rings */
> > > > >  static int
> > > > > -acc100_dev_close(struct rte_bbdev *dev __rte_unused)
> > > > > +acc100_dev_close(struct rte_bbdev *dev)
> > > > >  {
> > > > > +	struct acc100_device *d = dev->data->dev_private;
> > > > > +	if (d->sw_rings_base != NULL) {
> > > > > +		rte_free(d->tail_ptrs);
> > > > > +		rte_free(d->sw_rings_base);
> > > > > +		d->sw_rings_base = NULL;
> > > > > +	}
> > > > > +	usleep(1000);
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +
> > > > > +/**
> > > > > + * Report a ACC100 queue index which is free
> > > > > + * Return 0 to 16k for a valid queue_idx or -1 when no queue is available
> > > > > + * Note : Only supporting VF0 Bundle for PF mode
> > > > > + */
> > > > > +static int
> > > > > +acc100_find_free_queue_idx(struct rte_bbdev *dev,
> > > > > +		const struct rte_bbdev_queue_conf *conf)
> > > > > +{
> > > > > +	struct acc100_device *d = dev->data->dev_private;
> > > > > +	int op_2_acc[5] = {0, UL_4G, DL_4G, UL_5G, DL_5G};
> > > > > +	int acc = op_2_acc[conf->op_type];
> > > > > +	struct rte_q_topology_t *qtop = NULL;
> > > > > +	qtopFromAcc(&qtop, acc, &(d->acc100_conf));
> > > > > +	if (qtop == NULL)
> > > > > +		return -1;
> > > > > +	/* Identify matching QGroup Index which are sorted in priority order */
> > > > > +	uint16_t group_idx = qtop->first_qgroup_index;
> > > > > +	group_idx += conf->priority;
> > > > > +	if (group_idx >= ACC100_NUM_QGRPS ||
> > > > > +			conf->priority >= qtop->num_qgroups) {
> > > > > +		rte_bbdev_log(INFO, "Invalid Priority on %s, priority %u",
> > > > > +				dev->data->name, conf->priority);
> > > > > +		return -1;
> > > > > +	}
> > > > > +	/* Find a free AQ_idx */
> > > > > +	uint16_t aq_idx;
> > > > > +	for (aq_idx = 0; aq_idx < qtop->num_aqs_per_groups; aq_idx++) {
> > > > > +		if (((d->q_assigned_bit_map[group_idx] >> aq_idx) & 0x1) == 0) {
> > > > > +			/* Mark the Queue as assigned */
> > > > > +			d->q_assigned_bit_map[group_idx] |= (1 << aq_idx);
> > > > > +			/* Report the AQ Index */
> > > > > +			return (group_idx << GRP_ID_SHIFT) + aq_idx;
> > > > > +		}
> > > > > +	}
> > > > > +	rte_bbdev_log(INFO, "Failed to find free queue on %s, priority %u",
> > > > > +			dev->data->name, conf->priority);
> > > > > +	return -1;
> > > > > +}
> > > > > +
> > > > > +/* Setup ACC100 queue */
> > > > > +static int
> > > > > +acc100_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
> > > > > +		const struct rte_bbdev_queue_conf *conf)
> > > > > +{
> > > > > +	struct acc100_device *d = dev->data->dev_private;
> > > > > +	struct acc100_queue *q;
> > > > > +	int16_t q_idx;
> > > > > +
> > > > > +	/* Allocate the queue data structure. */
> > > > > +	q = rte_zmalloc_socket(dev->device->driver->name, sizeof(*q),
> > > > > +			RTE_CACHE_LINE_SIZE, conf->socket);
> > > > > +	if (q == NULL) {
> > > > > +		rte_bbdev_log(ERR, "Failed to allocate queue memory");
> > > > > +		return -ENOMEM;
> > > > > +	}
> > > > > +
> > > > > +	q->d = d;
> > > > > +	q->ring_addr = RTE_PTR_ADD(d->sw_rings, (d->sw_ring_size * queue_id));
> > > > > +	q->ring_addr_phys = d->sw_rings_phys + (d->sw_ring_size * queue_id);
> > > > > +
> > > > > +	/* Prepare the Ring with default descriptor format */
> > > > > +	union acc100_dma_desc *desc = NULL;
> > > > > +	unsigned int desc_idx, b_idx;
> > > > > +	int fcw_len = (conf->op_type == RTE_BBDEV_OP_LDPC_ENC ?
> > > > > +		ACC100_FCW_LE_BLEN : (conf->op_type == RTE_BBDEV_OP_TURBO_DEC ?
> > > > > +		ACC100_FCW_TD_BLEN : ACC100_FCW_LD_BLEN));
> > > > > +
> > > > > +	for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) {
> > > > > +		desc = q->ring_addr + desc_idx;
> > > > > +		desc->req.word0 = ACC100_DMA_DESC_TYPE;
> > > > > +		desc->req.word1 = 0; /**< Timestamp */
> > > > > +		desc->req.word2 = 0;
> > > > > +		desc->req.word3 = 0;
> > > > > +		uint64_t fcw_offset = (desc_idx << 8) + ACC100_DESC_FCW_OFFSET;
> > > > > +		desc->req.data_ptrs[0].address = q->ring_addr_phys + fcw_offset;
> > > > > +		desc->req.data_ptrs[0].blen = fcw_len;
> > > > > +		desc->req.data_ptrs[0].blkid = ACC100_DMA_BLKID_FCW;
> > > > > +		desc->req.data_ptrs[0].last = 0;
> > > > > +		desc->req.data_ptrs[0].dma_ext = 0;
> > > > > +		for (b_idx = 1; b_idx < ACC100_DMA_MAX_NUM_POINTERS - 1;
> > > > > +				b_idx++) {
> > > > > +			desc->req.data_ptrs[b_idx].blkid = ACC100_DMA_BLKID_IN;
> > > > > +			desc->req.data_ptrs[b_idx].last = 1;
> > > > > +			desc->req.data_ptrs[b_idx].dma_ext = 0;
> > > > > +			b_idx++;
> > > > > +			desc->req.data_ptrs[b_idx].blkid =
> > > > > +					ACC100_DMA_BLKID_OUT_ENC;
> > > > > +			desc->req.data_ptrs[b_idx].last = 1;
> > > > > +			desc->req.data_ptrs[b_idx].dma_ext = 0;
> > > > > +		}
> > > > > +		/* Preset some fields of LDPC FCW */
> > > > > +		desc->req.fcw_ld.FCWversion = ACC100_FCW_VER;
> > > > > +		desc->req.fcw_ld.gain_i = 1;
> > > > > +		desc->req.fcw_ld.gain_h = 1;
> > > > > +	}
> > > > > +
> > > > > +	q->lb_in = rte_zmalloc_socket(dev->device->driver->name,
> > > > > +			RTE_CACHE_LINE_SIZE,
> > > > > +			RTE_CACHE_LINE_SIZE, conf->socket);
> > > > > +	if (q->lb_in == NULL) {
> > > > > +		rte_bbdev_log(ERR, "Failed to allocate lb_in memory");
> > > > > +		return -ENOMEM;
> > > > > +	}
> > > > > +	q->lb_in_addr_phys = rte_malloc_virt2iova(q->lb_in);
> > > > > +	q->lb_out = rte_zmalloc_socket(dev->device->driver->name,
> > > > > +			RTE_CACHE_LINE_SIZE,
> > > > > +			RTE_CACHE_LINE_SIZE, conf->socket);
> > > > > +	if (q->lb_out == NULL) {
> > > > > +		rte_bbdev_log(ERR, "Failed to allocate lb_out memory");
> > > > > +		return -ENOMEM;
> > > > > +	}
> > > > > +	q->lb_out_addr_phys = rte_malloc_virt2iova(q->lb_out);
> > > > > +
> > > > > +	/*
> > > > > +	 * Software queue ring wraps synchronously with the HW when it reaches
> > > > > +	 * the boundary of the maximum allocated queue size, no matter what the
> > > > > +	 * sw queue size is. This wrapping is guarded by setting the wrap_mask
> > > > > +	 * to represent the maximum queue size as allocated at the time when
> > > > > +	 * the device has been setup (in configure()).
> > > > > +	 *
> > > > > +	 * The queue depth is set to the queue size value (conf->queue_size).
> > > > > +	 * This limits the occupancy of the queue at any point of time, so that
> > > > > +	 * the queue does not get swamped with enqueue requests.
> > > > > +	 */
> > > > > +	q->sw_ring_depth = conf->queue_size;
> > > > > +	q->sw_ring_wrap_mask = d->sw_ring_max_depth - 1;
> > > > > +
> > > > > +	q->op_type = conf->op_type;
> > > > > +
> > > > > +	q_idx = acc100_find_free_queue_idx(dev, conf);
> > > > > +	if (q_idx == -1) {
> > > > > +		rte_free(q);
> > > > > +		return -1;
> > > > > +	}
> > > > > +
> > > > > +	q->qgrp_id = (q_idx >> GRP_ID_SHIFT) & 0xF;
> > > > > +	q->vf_id = (q_idx >> VF_ID_SHIFT) & 0x3F;
> > > > > +	q->aq_id = q_idx & 0xF;
> > > > > +	q->aq_depth = (conf->op_type == RTE_BBDEV_OP_TURBO_DEC) ?
> > > > > +			(1 << d->acc100_conf.q_ul_4g.aq_depth_log2) :
> > > > > +			(1 << d->acc100_conf.q_dl_4g.aq_depth_log2);
> > > > > +
> > > > > +	q->mmio_reg_enqueue = RTE_PTR_ADD(d->mmio_base,
> > > > > +			queue_offset(d->pf_device,
> > > > > +					q->vf_id, q->qgrp_id, q->aq_id));
> > > > > +
> > > > > +	rte_bbdev_log_debug(
> > > > > +			"Setup dev%u q%u: qgrp_id=%u, vf_id=%u, aq_id=%u, aq_depth=%u, mmio_reg_enqueue=%p",
> > > > > +			dev->data->dev_id, queue_id, q->qgrp_id, q->vf_id,
> > > > > +			q->aq_id, q->aq_depth, q->mmio_reg_enqueue);
> > > > > +
> > > > > +	dev->data->queues[queue_id].queue_private = q;
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +/* Release ACC100 queue */
> > > > > +static int
> > > > > +acc100_queue_release(struct rte_bbdev *dev, uint16_t q_id)
> > > > > +{
> > > > > +	struct acc100_device *d = dev->data->dev_private;
> > > > > +	struct acc100_queue *q = dev->data->queues[q_id].queue_private;
> > > > > +
> > > > > +	if (q != NULL) {
> > > > > +		/* Mark the Queue as un-assigned */
> > > > > +		d->q_assigned_bit_map[q->qgrp_id] &= (0xFFFFFFFF -
> > > > > +				(1 << q->aq_id));
> > > > > +		rte_free(q->lb_in);
> > > > > +		rte_free(q->lb_out);
> > > > > +		rte_free(q);
> > > > > +		dev->data->queues[q_id].queue_private = NULL;
> > > > > +	}
> > > > > +
> > > > >  	return 0;
> > > > >  }
> > > > >
> > > > > @@ -258,8 +673,11 @@
> > > > >  }
> > > > >
> > > > >  static const struct rte_bbdev_ops acc100_bbdev_ops = {
> > > > > +	.setup_queues = acc100_setup_queues,
> > > > >  	.close = acc100_dev_close,
> > > > >  	.info_get = acc100_dev_info_get,
> > > > > +	.queue_setup = acc100_queue_setup,
> > > > > +	.queue_release = acc100_queue_release,
> > > > >  };
> > > > >
> > > > >  /* ACC100 PCI PF address map */
> > > > > diff --git a/drivers/baseband/acc100/rte_acc100_pmd.h b/drivers/baseband/acc100/rte_acc100_pmd.h
> > > > > index 662e2c8..0e2b79c 100644
> > > > > --- a/drivers/baseband/acc100/rte_acc100_pmd.h
> > > > > +++ b/drivers/baseband/acc100/rte_acc100_pmd.h
> > > > > @@ -518,11 +518,56 @@ struct acc100_registry_addr {
> > > > >  	.ddr_range = HWVfDmaDdrBaseRangeRoVf,
> > > > >  };
> > > > >
> > > > > +/* Structure associated with each queue. */
> > > > > +struct __rte_cache_aligned acc100_queue {
> > > > > +	union acc100_dma_desc *ring_addr; /* Virtual address of sw ring */
> > > > > +	rte_iova_t ring_addr_phys; /* Physical address of software ring */
> > > > > +	uint32_t sw_ring_head; /* software ring head */
> > > > > +	uint32_t sw_ring_tail; /* software ring tail */
> > > > > +	/* software ring size (descriptors, not bytes) */
> > > > > +	uint32_t sw_ring_depth;
> > > > > +	/* mask used to wrap enqueued descriptors on the sw ring */
> > > > > +	uint32_t sw_ring_wrap_mask;
> > > > > +	/* MMIO register used to enqueue descriptors */
> > > > > +	void *mmio_reg_enqueue;
> > > > > +	uint8_t vf_id; /* VF ID (max = 63) */
> > > > > +	uint8_t qgrp_id; /* Queue Group ID */
> > > > > +	uint16_t aq_id; /* Atomic Queue ID */
> > > > > +	uint16_t aq_depth; /* Depth of atomic queue */
> > > > > +	uint32_t aq_enqueued; /* Count how many "batches" have been enqueued */
> > > > > +	uint32_t aq_dequeued; /* Count how many "batches" have been dequeued */
> > > > > +	uint32_t irq_enable; /* Enable ops dequeue interrupts if set to 1 */
> > > > > +	struct rte_mempool *fcw_mempool; /* FCW mempool */
> > > > > +	enum rte_bbdev_op_type op_type; /* Type of this Queue: TE or TD */
> > > > > +	/* Internal Buffers for loopback input */
> > > > > +	uint8_t *lb_in;
> > > > > +	uint8_t *lb_out;
> > > > > +	rte_iova_t lb_in_addr_phys;
> > > > > +	rte_iova_t lb_out_addr_phys;
> > > > > +	struct acc100_device *d;
> > > > > +};
> > > > > +
> > > > >  /* Private data structure for each ACC100 device */
> > > > >  struct acc100_device {
> > > > >  	void *mmio_base; /**< Base address of MMIO registers (BAR0) */
> > > > > +	void *sw_rings_base; /* Base addr of un-aligned memory for sw rings */
> > > > > +	void *sw_rings; /* 64MBs of 64MB aligned memory for sw rings */
> > > > > +	rte_iova_t sw_rings_phys; /* Physical address of sw_rings */
> > > > > +	/* Virtual address of the info memory routed to the this function under
> > > > > +	 * operation, whether it is PF or VF.
> > > > > +	 */
> > > > > +	union acc100_harq_layout_data *harq_layout;
> > > > > +	uint32_t sw_ring_size;
> > > > >  	uint32_t ddr_size; /* Size in kB */
> > > > > +	uint32_t *tail_ptrs; /* Base address of response tail pointer buffer */
> > > > > +	rte_iova_t tail_ptr_phys; /* Physical address of tail pointers */
> > > > > +	/* Max number of entries available for each queue in device, depending
> > > > > +	 * on how many queues are enabled with configure()
> > > > > +	 */
> > > > > +	uint32_t sw_ring_max_depth;
> > > > >  	struct acc100_conf acc100_conf; /* ACC100 Initial configuration */
> > > > > +	/* Bitmap capturing which Queues have already been assigned */
> > > > > +	uint16_t q_assigned_bit_map[ACC100_NUM_QGRPS];
> > > > >  	bool pf_device; /**< True if this is a PF ACC100 device */
> > > > >  	bool configured; /**< True if this ACC100 device is configured */
> > > > >  };
> > > > > --
> > > > > 1.8.3.1