* [dpdk-dev] [PATCH 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing
@ 2018-04-27 16:25 Shreyansh Jain
  2018-04-27 16:25 ` [dpdk-dev] [PATCH 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain
  ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 16:25 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

Once the hotplugging (6b42f7563) patchset was merged, DPAA2 in Physical
Addressing mode and DPAA observed a drastic performance drop (~95%).

This was because of an inherent assumption, made during some memory
translations, that memsegs would be physically contiguous.

This series attempts to add a workaround for that: an intermediate one,
until a complete solution is integrated.

This workaround creates a linked list of referenced buffers and attempts
to search through it during physical to virtual translations.

Shreyansh Jain (3):
  crypto/dpaa_sec: remove ctx based offset for PA-VA conversion
  bus/fslmc: optimize physical to virtual address searching
  bus/dpaa: optimize physical to virtual address searching

 drivers/bus/dpaa/rte_dpaa_bus.h          | 27 +++++++++++++++++-
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h  | 23 +++++++++++++++
 drivers/crypto/dpaa_sec/dpaa_sec.c       | 49 +++++++++++++-------------------
 drivers/mempool/dpaa/dpaa_mempool.c      | 33 ++++++++++++++++++++-
 drivers/mempool/dpaa2/dpaa2_hw_mempool.c | 43 ++++++++++++++++++++++++++++
 5 files changed, 144 insertions(+), 31 deletions(-)

--
2.14.1

^ permalink raw reply [flat|nested] 12+ messages in thread
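The workaround described in the cover letter can be sketched outside DPDK as follows. This is only an illustration under stated assumptions, not the driver code: `struct seg`, `seg_add()`, `seg_ptov()` and `slow_lookup()` are hypothetical names, `calloc()` stands in for the driver's allocator, and `slow_lookup()` stands in for the full memseg walk that the patches fall back to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the per-chunk record kept by the driver. */
struct seg {
	TAILQ_ENTRY(seg) next;
	char *vaddr;   /* virtual start of the chunk */
	uint64_t iova; /* physical (IO) start of the chunk */
	size_t len;    /* chunk length in bytes */
};

TAILQ_HEAD(seg_list, seg);
static struct seg_list segs = TAILQ_HEAD_INITIALIZER(segs);

/* Record one chunk; returns 0 on success, -1 if allocation fails. */
static int seg_add(void *vaddr, uint64_t iova, size_t len)
{
	struct seg *s;

	assert(len > 0);
	s = calloc(1, sizeof(*s));
	if (s == NULL)
		return -1;
	s->vaddr = vaddr;
	s->iova = iova;
	s->len = len;
	TAILQ_INSERT_HEAD(&segs, s, next);
	return 0;
}

/* Hypothetical slow path; a real driver would walk all memsegs here. */
static void *slow_lookup(uint64_t iova)
{
	(void)iova;
	return NULL;
}

/* Search the short local list first, then fall back to the slow path. */
static void *seg_ptov(uint64_t iova)
{
	struct seg *s;

	TAILQ_FOREACH(s, &segs, next) {
		/* Half-open range [iova, iova + len), written with a
		 * subtraction so the upper bound cannot overflow.
		 */
		if (iova >= s->iova && iova - s->iova < s->len)
			return s->vaddr + (iova - s->iova);
	}
	return slow_lookup(iova);
}
```

The lookup cost grows with the number of recorded chunks rather than with the number of memsegs in the system, which is the property the series relies on.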
* [dpdk-dev] [PATCH 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion
  2018-04-27 16:25 [dpdk-dev] [PATCH 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
@ 2018-04-27 16:25 ` Shreyansh Jain
  2018-04-27 16:25 ` [dpdk-dev] [PATCH 2/3] bus/fslmc: optimize physical to virtual address searching Shreyansh Jain
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 16:25 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

Crypto requires conversion between physical and virtual addresses for
descriptors. Prior to memory hotplugging, this was based on memseg
iteration, on the assumption that all memsegs are physically contiguous,
so that fast calculations could be done using a cached start address.

This assumption is no longer valid with memory hotplugging support.

In preparation for the hotplugging changes to memory, this patch removes
the optimized PA-VA conversion that was based on the physical address
offset stored in the pool context.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
This adversely affects performance, as complete memsegs now need to be
parsed, but a rework containing the necessary optimization will be
posted on top of this.
drivers/crypto/dpaa_sec/dpaa_sec.c | 49 ++++++++++++++++---------------------- 1 file changed, 20 insertions(+), 29 deletions(-) diff --git a/drivers/crypto/dpaa_sec/dpaa_sec.c b/drivers/crypto/dpaa_sec/dpaa_sec.c index e456fd542..06f7e4373 100644 --- a/drivers/crypto/dpaa_sec/dpaa_sec.c +++ b/drivers/crypto/dpaa_sec/dpaa_sec.c @@ -103,13 +103,6 @@ dpaa_mem_vtop(void *vaddr) return (size_t)NULL; } -/* virtual address conversin when mempool support is available for ctx */ -static inline phys_addr_t -dpaa_mem_vtop_ctx(struct dpaa_sec_op_ctx *ctx, void *vaddr) -{ - return (size_t)vaddr - ctx->vtop_offset; -} - static inline void * dpaa_mem_ptov(rte_iova_t paddr) { @@ -630,7 +623,7 @@ build_auth_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) in_sg->extension = 1; in_sg->final = 1; in_sg->length = sym->auth.data.length; - qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2])); + qm_sg_entry_set64(in_sg, dpaa_mem_vtop(&cf->sg[2])); /* 1st seg */ sg = in_sg + 1; @@ -654,7 +647,7 @@ build_auth_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) sg++; rte_memcpy(old_digest, sym->auth.digest.data, ses->digest_length); - start_addr = dpaa_mem_vtop_ctx(ctx, old_digest); + start_addr = dpaa_mem_vtop(old_digest); qm_sg_entry_set64(sg, start_addr); sg->length = ses->digest_length; in_sg->length += ses->digest_length; @@ -708,7 +701,7 @@ build_auth_only(struct rte_crypto_op *op, dpaa_sec_session *ses) if (is_decode(ses)) { /* need to extend the input to a compound frame */ sg->extension = 1; - qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2])); + qm_sg_entry_set64(sg, dpaa_mem_vtop(&cf->sg[2])); sg->length = sym->auth.data.length + ses->digest_length; sg->final = 1; cpu_to_hw_sg(sg); @@ -722,7 +715,7 @@ build_auth_only(struct rte_crypto_op *op, dpaa_sec_session *ses) cpu_to_hw_sg(sg); /* let's check digest by hw */ - start_addr = dpaa_mem_vtop_ctx(ctx, old_digest); + start_addr = dpaa_mem_vtop(old_digest); sg++; qm_sg_entry_set64(sg, start_addr); 
sg->length = ses->digest_length; @@ -775,7 +768,7 @@ build_cipher_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) out_sg = &cf->sg[0]; out_sg->extension = 1; out_sg->length = sym->cipher.data.length; - qm_sg_entry_set64(out_sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2])); + qm_sg_entry_set64(out_sg, dpaa_mem_vtop(&cf->sg[2])); cpu_to_hw_sg(out_sg); /* 1st seg */ @@ -804,7 +797,7 @@ build_cipher_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) in_sg->length = sym->cipher.data.length + ses->iv.length; sg++; - qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(in_sg, dpaa_mem_vtop(sg)); cpu_to_hw_sg(in_sg); /* IV */ @@ -871,7 +864,7 @@ build_cipher_only(struct rte_crypto_op *op, dpaa_sec_session *ses) sg->extension = 1; sg->final = 1; sg->length = sym->cipher.data.length + ses->iv.length; - qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2])); + qm_sg_entry_set64(sg, dpaa_mem_vtop(&cf->sg[2])); cpu_to_hw_sg(sg); sg = &cf->sg[2]; @@ -937,7 +930,7 @@ build_cipher_auth_gcm_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) /* output sg entries */ sg = &cf->sg[2]; - qm_sg_entry_set64(out_sg, dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(out_sg, dpaa_mem_vtop(sg)); cpu_to_hw_sg(out_sg); /* 1st seg */ @@ -981,7 +974,7 @@ build_cipher_auth_gcm_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) /* input sg entries */ sg++; - qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(in_sg, dpaa_mem_vtop(sg)); cpu_to_hw_sg(in_sg); /* 1st seg IV */ @@ -1018,7 +1011,7 @@ build_cipher_auth_gcm_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) sg++; memcpy(ctx->digest, sym->aead.digest.data, ses->digest_length); - qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest)); + qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest)); sg->length = ses->digest_length; } sg->final = 1; @@ -1056,7 +1049,7 @@ build_cipher_auth_gcm(struct rte_crypto_op *op, dpaa_sec_session *ses) /* input */ rte_prefetch0(cf->sg); sg = &cf->sg[2]; - 
qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop(sg)); if (is_encode(ses)) { qm_sg_entry_set64(sg, dpaa_mem_vtop(IV_ptr)); sg->length = ses->iv.length; @@ -1101,7 +1094,7 @@ build_cipher_auth_gcm(struct rte_crypto_op *op, dpaa_sec_session *ses) ses->digest_length); sg++; - qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest)); + qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest)); sg->length = ses->digest_length; length += sg->length; sg->final = 1; @@ -1115,7 +1108,7 @@ build_cipher_auth_gcm(struct rte_crypto_op *op, dpaa_sec_session *ses) /* output */ sg++; - qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop(sg)); qm_sg_entry_set64(sg, dst_start_addr + sym->aead.data.offset - ses->auth_only_len); sg->length = sym->aead.data.length + ses->auth_only_len; @@ -1184,7 +1177,7 @@ build_cipher_auth_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) /* output sg entries */ sg = &cf->sg[2]; - qm_sg_entry_set64(out_sg, dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(out_sg, dpaa_mem_vtop(sg)); cpu_to_hw_sg(out_sg); /* 1st seg */ @@ -1226,7 +1219,7 @@ build_cipher_auth_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) /* input sg entries */ sg++; - qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(in_sg, dpaa_mem_vtop(sg)); cpu_to_hw_sg(in_sg); /* 1st seg IV */ @@ -1256,7 +1249,7 @@ build_cipher_auth_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) sg++; memcpy(ctx->digest, sym->auth.digest.data, ses->digest_length); - qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest)); + qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest)); sg->length = ses->digest_length; } sg->final = 1; @@ -1293,7 +1286,7 @@ build_cipher_auth(struct rte_crypto_op *op, dpaa_sec_session *ses) /* input */ rte_prefetch0(cf->sg); sg = &cf->sg[2]; - qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop(sg)); if 
(is_encode(ses)) { qm_sg_entry_set64(sg, dpaa_mem_vtop(IV_ptr)); sg->length = ses->iv.length; @@ -1323,7 +1316,7 @@ build_cipher_auth(struct rte_crypto_op *op, dpaa_sec_session *ses) ses->digest_length); sg++; - qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest)); + qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest)); sg->length = ses->digest_length; length += sg->length; sg->final = 1; @@ -1337,7 +1330,7 @@ build_cipher_auth(struct rte_crypto_op *op, dpaa_sec_session *ses) /* output */ sg++; - qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop(sg)); qm_sg_entry_set64(sg, dst_start_addr + sym->cipher.data.offset); sg->length = sym->cipher.data.length; length = sg->length; @@ -1412,7 +1405,6 @@ dpaa_sec_enqueue_burst(void *qp, struct rte_crypto_op **ops, struct rte_crypto_op *op; struct dpaa_sec_job *cf; dpaa_sec_session *ses; - struct dpaa_sec_op_ctx *ctx; uint32_t auth_only_len; struct qman_fq *inq[DPAA_SEC_BURST]; @@ -1497,8 +1489,7 @@ dpaa_sec_enqueue_burst(void *qp, struct rte_crypto_op **ops, inq[loop] = ses->inq; fd->opaque_addr = 0; fd->cmd = 0; - ctx = container_of(cf, struct dpaa_sec_op_ctx, job); - qm_fd_addr_set64(fd, dpaa_mem_vtop_ctx(ctx, cf->sg)); + qm_fd_addr_set64(fd, dpaa_mem_vtop(cf->sg)); fd->_format1 = qm_fd_compound; fd->length29 = 2 * sizeof(struct qm_sg_entry); /* Auth_only_len is set as 0 in descriptor and it is -- 2.14.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
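A small standalone sketch of why the cached-offset conversion removed above stops being safe once memory is no longer physically contiguous. The names `struct page_map`, `vtop_offset()` and `vtop_walk()` are hypothetical and the addresses are invented for illustration; only the `vaddr - offset` arithmetic mirrors the removed `dpaa_mem_vtop_ctx()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 0x1000u

/* Hypothetical mapping of one virtual page to one physical page. */
struct page_map {
	uint64_t va;
	uint64_t pa;
};

/* Cached-offset conversion: correct only while the whole region is
 * physically contiguous, because it applies one offset everywhere.
 */
static uint64_t vtop_offset(uint64_t va, uint64_t cached_offset)
{
	return va - cached_offset;
}

/* Per-page walk standing in for a full lookup; returns 0 if unmapped. */
static uint64_t vtop_walk(const struct page_map *map, size_t n, uint64_t va)
{
	size_t i;

	assert(map != NULL);
	for (i = 0; i < n; i++) {
		if (va >= map[i].va && va - map[i].va < PAGE_SZ)
			return map[i].pa + (va - map[i].va);
	}
	return 0;
}
```

With two virtually adjacent pages backed by physical pages that are not adjacent, the offset cached from the first page gives the right answer inside that page and a wrong one in the second page, whereas the walk stays correct at the cost of a search.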
* [dpdk-dev] [PATCH 2/3] bus/fslmc: optimize physical to virtual address searching
  2018-04-27 16:25 [dpdk-dev] [PATCH 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
  2018-04-27 16:25 ` [dpdk-dev] [PATCH 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain
@ 2018-04-27 16:25 ` Shreyansh Jain
  2018-04-27 16:25 ` [dpdk-dev] [PATCH 3/3] bus/dpaa: " Shreyansh Jain
  2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
  3 siblings, 0 replies; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 16:25 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

With memory hotplugging support, the ordering of memsegs has changed
from physically contiguous to virtually contiguous. The FSLMC bus and
dpaa2 drivers depend on PA to VA address conversion when in Physical
Addressing mode.

This patch creates a list of the blocks requested to be pinned to the
DPAA2 mempool. A physical address being searched for is expected to
belong to this list (it comes from the hardware pool), so searching the
list is less expensive than a memseg walk.

Even so, this has a marginal impact on performance compared with the
legacy mode with physically contiguous memsegs.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
An optimized algorithm is being worked on, based on some recent patches
in hotplugging. That would improve/recover the performance. Until then,
this patch is to be treated as a stop-gap solution.
--- drivers/bus/fslmc/portal/dpaa2_hw_pvt.h | 23 +++++++++++++++++ drivers/mempool/dpaa2/dpaa2_hw_mempool.c | 43 ++++++++++++++++++++++++++++++++ 2 files changed, 66 insertions(+) diff --git a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h index c76393d45..da6e639dc 100644 --- a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h +++ b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h @@ -254,15 +254,38 @@ enum qbman_fd_format { */ #define DPAA2_EQ_RESP_ALWAYS 1 +/* Various structures representing contiguous memory maps */ +struct dpaa2_memseg { + TAILQ_ENTRY(dpaa2_memseg) next; + char *vaddr; + rte_iova_t iova; + size_t len; +}; + +TAILQ_HEAD(dpaa2_memseg_list, dpaa2_memseg); +extern struct dpaa2_memseg_list dpaa2_memsegs; + #ifdef RTE_LIBRTE_DPAA2_USE_PHYS_IOVA extern uint8_t dpaa2_virt_mode; static void *dpaa2_mem_ptov(phys_addr_t paddr) __attribute__((unused)); /* todo - this is costly, need to write a fast coversion routine */ static void *dpaa2_mem_ptov(phys_addr_t paddr) { + struct dpaa2_memseg *ms; + if (dpaa2_virt_mode) return (void *)(size_t)paddr; + /* Check if the address is already part of the memseg list internally + * maintained by the dpaa2 driver. + */ + TAILQ_FOREACH(ms, &dpaa2_memsegs, next) { + if (paddr >= ms->iova && paddr < + ms->iova + ms->len) + return RTE_PTR_ADD(ms->vaddr, (paddr - ms->iova)); + } + + /* If not, Fallback to full memseg list searching */ return rte_mem_iova2virt(paddr); } diff --git a/drivers/mempool/dpaa2/dpaa2_hw_mempool.c b/drivers/mempool/dpaa2/dpaa2_hw_mempool.c index ce7a4c577..4c44c33cc 100644 --- a/drivers/mempool/dpaa2/dpaa2_hw_mempool.c +++ b/drivers/mempool/dpaa2/dpaa2_hw_mempool.c @@ -32,6 +32,13 @@ struct dpaa2_bp_info rte_dpaa2_bpid_info[MAX_BPID]; static struct dpaa2_bp_list *h_bp_list; +/* List of all the memseg information locally maintained in dpaa2 driver. This + * is to optimize the PA_to_VA searches until a better mechanism (algo) is + * available. 
+ */ +struct dpaa2_memseg_list dpaa2_memsegs + = TAILQ_HEAD_INITIALIZER(dpaa2_memsegs); + /* Dynamic logging identified for mempool */ int dpaa2_logtype_mempool; @@ -358,6 +365,41 @@ rte_hw_mbuf_get_count(const struct rte_mempool *mp) return num_of_bufs; } +static int +dpaa2_populate(struct rte_mempool *mp, unsigned int max_objs, + void *vaddr, rte_iova_t paddr, size_t len, + rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg) +{ + struct dpaa2_memseg *ms; + + /* For each memory chunk pinned to the Mempool, a linked list of the + * represeted memsegs is created for searching when PA to VA + * conversion is required. + */ + ms = rte_zmalloc(NULL, sizeof(struct dpaa2_memseg), 0); + if (!ms) { + DPAA2_MEMPOOL_ERR("Unable to allocate internal memory."); + DPAA2_MEMPOOL_WARN("Fast Physical to Virtual Addr translation would not be available."); + /* If the element is not added, it would only lead to failure + * in searching for the element and the logic would Fallback + * to traditional DPDK memseg traversal code. So, this is not + * a blocking error - but, error would be printed on screen. + */ + return 0; + } + + ms->vaddr = vaddr; + ms->iova = paddr; + ms->len = len; + /* Head insertions are generally faster than tail insertions as the + * buffers pinned are picked from rear end. + */ + TAILQ_INSERT_HEAD(&dpaa2_memsegs, ms, next); + + return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len, + obj_cb, obj_cb_arg); +} + struct rte_mempool_ops dpaa2_mpool_ops = { .name = DPAA2_MEMPOOL_OPS_NAME, .alloc = rte_hw_mbuf_create_pool, @@ -365,6 +407,7 @@ struct rte_mempool_ops dpaa2_mpool_ops = { .enqueue = rte_hw_mbuf_free_bulk, .dequeue = rte_dpaa2_mbuf_alloc_bulk, .get_count = rte_hw_mbuf_get_count, + .populate = dpaa2_populate, }; MEMPOOL_REGISTER_OPS(dpaa2_mpool_ops); -- 2.14.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [dpdk-dev] [PATCH 3/3] bus/dpaa: optimize physical to virtual address searching
  2018-04-27 16:25 [dpdk-dev] [PATCH 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
  2018-04-27 16:25 ` [dpdk-dev] [PATCH 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain
  2018-04-27 16:25 ` [dpdk-dev] [PATCH 2/3] bus/fslmc: optimize physical to virtual address searching Shreyansh Jain
@ 2018-04-27 16:25 ` Shreyansh Jain
  2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
  3 siblings, 0 replies; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 16:25 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

With memory hotplugging support, the ordering of memsegs has changed
from physically contiguous to virtually contiguous. The DPAA bus and
drivers depend on PA to VA address conversion for I/O.

This patch creates a list of the blocks requested to be pinned to the
DPAA mempool. A physical address being searched for is expected to
belong to this list (it comes from the hardware pool), so searching the
list is less expensive than a memseg walk.

Even so, there is a marginal drop in performance compared with the
legacy mode with physically contiguous memsegs.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
An optimized algorithm is being worked on, based on some recent patches
in hotplugging. That would improve/recover the performance. Until then,
this patch is to be treated as a stop-gap solution.
--- drivers/bus/dpaa/rte_dpaa_bus.h | 27 ++++++++++++++++++++++++++- drivers/mempool/dpaa/dpaa_mempool.c | 33 ++++++++++++++++++++++++++++++++- 2 files changed, 58 insertions(+), 2 deletions(-) diff --git a/drivers/bus/dpaa/rte_dpaa_bus.h b/drivers/bus/dpaa/rte_dpaa_bus.h index 89aeac2d1..ca32b7f2f 100644 --- a/drivers/bus/dpaa/rte_dpaa_bus.h +++ b/drivers/bus/dpaa/rte_dpaa_bus.h @@ -95,9 +95,34 @@ struct dpaa_portal { uint64_t tid;/**< Parent Thread id for this portal */ }; -/* TODO - this is costly, need to write a fast coversion routine */ +/* Various structures representing contiguous memory maps */ +struct dpaa_memseg { + TAILQ_ENTRY(dpaa_memseg) next; + char *vaddr; + rte_iova_t iova; + size_t len; +}; + +TAILQ_HEAD(dpaa_memseg_list, dpaa_memseg); +extern struct dpaa_memseg_list dpaa_memsegs; + +/* Either iterate over the list of internal memseg references or fallback to + * EAL memseg based iova2virt. + */ static inline void *rte_dpaa_mem_ptov(phys_addr_t paddr) { + struct dpaa_memseg *ms; + + /* Check if the address is already part of the memseg list internally + * maintained by the dpaa driver. + */ + TAILQ_FOREACH(ms, &dpaa_memsegs, next) { + if (paddr >= ms->iova && paddr < + ms->iova + ms->len) + return RTE_PTR_ADD(ms->vaddr, (paddr - ms->iova)); + } + + /* If not, Fallback to full memseg list searching */ return rte_mem_iova2virt(paddr); } diff --git a/drivers/mempool/dpaa/dpaa_mempool.c b/drivers/mempool/dpaa/dpaa_mempool.c index 580e4640c..e5de15ec9 100644 --- a/drivers/mempool/dpaa/dpaa_mempool.c +++ b/drivers/mempool/dpaa/dpaa_mempool.c @@ -27,6 +27,13 @@ #include <dpaa_mempool.h> +/* List of all the memseg information locally maintained in dpaa driver. This + * is to optimize the PA_to_VA searches until a better mechanism (algo) is + * available. 
+ */ +struct dpaa_memseg_list dpaa_memsegs + = TAILQ_HEAD_INITIALIZER(dpaa_memsegs); + struct dpaa_bp_info rte_dpaa_bpid_info[DPAA_MAX_BPOOLS]; static int @@ -287,10 +294,34 @@ dpaa_populate(struct rte_mempool *mp, unsigned int max_objs, /* Detect pool area has sufficient space for elements in this memzone */ if (len >= total_elt_sz * mp->size) bp_info->flags |= DPAA_MPOOL_SINGLE_SEGMENT; + struct dpaa_memseg *ms; + + /* For each memory chunk pinned to the Mempool, a linked list of the + * represeted memsegs is created for searching when PA to VA + * conversion is required. + */ + ms = rte_zmalloc(NULL, sizeof(struct dpaa_memseg), 0); + if (!ms) { + DPAA_MEMPOOL_ERR("Unable to allocate internal memory."); + DPAA_MEMPOOL_WARN("Fast Physical to Virtual Addr translation would not be available."); + /* If the element is not added, it would only lead to failure + * in searching for the element and the logic would Fallback + * to traditional DPDK memseg traversal code. So, this is not + * a blocking error - but, error would be printed on screen. + */ + return 0; + } + + ms->vaddr = vaddr; + ms->iova = paddr; + ms->len = len; + /* Head insertions are generally faster than tail insertions as the + * buffers pinned are picked from rear end. + */ + TAILQ_INSERT_HEAD(&dpaa_memsegs, ms, next); return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len, obj_cb, obj_cb_arg); - } struct rte_mempool_ops dpaa_mpool_ops = { -- 2.14.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
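The comment in the patch about head insertion can be read as a statement about search order: with `TAILQ_INSERT_HEAD`, the most recently pinned chunk is the first one checked. The sketch below only illustrates that reading by counting visited records; `struct chunk`, `chunk_add()` and `chunk_steps()` are hypothetical helpers, not driver code, and whether recent chunks really are the hot ones depends on the workload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical record for one pinned chunk. */
struct chunk {
	TAILQ_ENTRY(chunk) next;
	uint64_t iova;
	size_t len;
};

TAILQ_HEAD(chunk_list, chunk);

/* Add a record at the head or the tail; returns 0 on success, -1 if
 * allocation fails.
 */
static int chunk_add(struct chunk_list *l, uint64_t iova, size_t len,
		     int at_head)
{
	struct chunk *c;

	assert(l != NULL);
	c = calloc(1, sizeof(*c));
	if (c == NULL)
		return -1;
	c->iova = iova;
	c->len = len;
	if (at_head)
		TAILQ_INSERT_HEAD(l, c, next);
	else
		TAILQ_INSERT_TAIL(l, c, next);
	return 0;
}

/* Number of records visited until iova is matched; 0 if no match. */
static unsigned int chunk_steps(struct chunk_list *l, uint64_t iova)
{
	struct chunk *c;
	unsigned int steps = 0;

	TAILQ_FOREACH(c, l, next) {
		steps++;
		if (iova >= c->iova && iova - c->iova < c->len)
			return steps;
	}
	return 0;
}
```

For four chunks added in ascending address order, an address in the last chunk is matched on the first step in the head-inserted list and on the fourth step in the tail-inserted one; an address in the first chunk shows the opposite cost.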
* [dpdk-dev] [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing
  2018-04-27 16:25 [dpdk-dev] [PATCH 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
  ` (2 preceding siblings ...)
  2018-04-27 16:25 ` [dpdk-dev] [PATCH 3/3] bus/dpaa: " Shreyansh Jain
@ 2018-04-27 17:20 ` Shreyansh Jain
  2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain
  ` (3 more replies)
  3 siblings, 4 replies; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 17:20 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

Once the hotplugging (6b42f7563) patchset was merged, DPAA2 in Physical
Addressing mode and DPAA observed a drastic performance drop (~95%).

This was because of an inherent assumption, made during some memory
translations, that memsegs would be physically contiguous.

This series attempts to add a workaround for that: an intermediate one,
until a complete solution is integrated.

This workaround creates a linked list of referenced buffers and attempts
to search through it during physical to virtual translations.

:Change history:
v2:
 - fixed spelling mistakes in patch as commit

Shreyansh Jain (3):
  crypto/dpaa_sec: remove ctx based offset for PA-VA conversion
  bus/fslmc: optimize physical to virtual address searching
  bus/dpaa: optimize physical to virtual address searching

 drivers/bus/dpaa/rte_dpaa_bus.h          | 27 +++++++++++++++++-
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h  | 23 +++++++++++++++
 drivers/crypto/dpaa_sec/dpaa_sec.c       | 49 +++++++++++++-------------------
 drivers/mempool/dpaa/dpaa_mempool.c      | 33 ++++++++++++++++++++-
 drivers/mempool/dpaa2/dpaa2_hw_mempool.c | 43 ++++++++++++++++++++++++++++
 5 files changed, 144 insertions(+), 31 deletions(-)

--
2.14.1

^ permalink raw reply [flat|nested] 12+ messages in thread
* [dpdk-dev] [PATCH v2 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion
  2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
@ 2018-04-27 17:20 ` Shreyansh Jain
  2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 2/3] bus/fslmc: optimize physical to virtual address searching Shreyansh Jain
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 17:20 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

Crypto requires conversion between physical and virtual addresses for
descriptors. Prior to memory hotplugging, this was based on memseg
iteration, on the assumption that all memsegs are physically contiguous,
so that fast calculations could be done using a cached start address.

This assumption is no longer valid with memory hotplugging support.

In preparation for the hotplugging changes to memory, this patch removes
the optimized PA-VA conversion that was based on the physical address
offset stored in the pool context.

This adversely affects performance, as complete memsegs now need to be
parsed, but a rework containing the necessary optimization will be
posted on top of this.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com> --- drivers/crypto/dpaa_sec/dpaa_sec.c | 49 ++++++++++++++++---------------------- 1 file changed, 20 insertions(+), 29 deletions(-) diff --git a/drivers/crypto/dpaa_sec/dpaa_sec.c b/drivers/crypto/dpaa_sec/dpaa_sec.c index e456fd542..06f7e4373 100644 --- a/drivers/crypto/dpaa_sec/dpaa_sec.c +++ b/drivers/crypto/dpaa_sec/dpaa_sec.c @@ -103,13 +103,6 @@ dpaa_mem_vtop(void *vaddr) return (size_t)NULL; } -/* virtual address conversin when mempool support is available for ctx */ -static inline phys_addr_t -dpaa_mem_vtop_ctx(struct dpaa_sec_op_ctx *ctx, void *vaddr) -{ - return (size_t)vaddr - ctx->vtop_offset; -} - static inline void * dpaa_mem_ptov(rte_iova_t paddr) { @@ -630,7 +623,7 @@ build_auth_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) in_sg->extension = 1; in_sg->final = 1; in_sg->length = sym->auth.data.length; - qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2])); + qm_sg_entry_set64(in_sg, dpaa_mem_vtop(&cf->sg[2])); /* 1st seg */ sg = in_sg + 1; @@ -654,7 +647,7 @@ build_auth_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) sg++; rte_memcpy(old_digest, sym->auth.digest.data, ses->digest_length); - start_addr = dpaa_mem_vtop_ctx(ctx, old_digest); + start_addr = dpaa_mem_vtop(old_digest); qm_sg_entry_set64(sg, start_addr); sg->length = ses->digest_length; in_sg->length += ses->digest_length; @@ -708,7 +701,7 @@ build_auth_only(struct rte_crypto_op *op, dpaa_sec_session *ses) if (is_decode(ses)) { /* need to extend the input to a compound frame */ sg->extension = 1; - qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2])); + qm_sg_entry_set64(sg, dpaa_mem_vtop(&cf->sg[2])); sg->length = sym->auth.data.length + ses->digest_length; sg->final = 1; cpu_to_hw_sg(sg); @@ -722,7 +715,7 @@ build_auth_only(struct rte_crypto_op *op, dpaa_sec_session *ses) cpu_to_hw_sg(sg); /* let's check digest by hw */ - start_addr = dpaa_mem_vtop_ctx(ctx, old_digest); + start_addr = 
dpaa_mem_vtop(old_digest); sg++; qm_sg_entry_set64(sg, start_addr); sg->length = ses->digest_length; @@ -775,7 +768,7 @@ build_cipher_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) out_sg = &cf->sg[0]; out_sg->extension = 1; out_sg->length = sym->cipher.data.length; - qm_sg_entry_set64(out_sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2])); + qm_sg_entry_set64(out_sg, dpaa_mem_vtop(&cf->sg[2])); cpu_to_hw_sg(out_sg); /* 1st seg */ @@ -804,7 +797,7 @@ build_cipher_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) in_sg->length = sym->cipher.data.length + ses->iv.length; sg++; - qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(in_sg, dpaa_mem_vtop(sg)); cpu_to_hw_sg(in_sg); /* IV */ @@ -871,7 +864,7 @@ build_cipher_only(struct rte_crypto_op *op, dpaa_sec_session *ses) sg->extension = 1; sg->final = 1; sg->length = sym->cipher.data.length + ses->iv.length; - qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2])); + qm_sg_entry_set64(sg, dpaa_mem_vtop(&cf->sg[2])); cpu_to_hw_sg(sg); sg = &cf->sg[2]; @@ -937,7 +930,7 @@ build_cipher_auth_gcm_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) /* output sg entries */ sg = &cf->sg[2]; - qm_sg_entry_set64(out_sg, dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(out_sg, dpaa_mem_vtop(sg)); cpu_to_hw_sg(out_sg); /* 1st seg */ @@ -981,7 +974,7 @@ build_cipher_auth_gcm_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) /* input sg entries */ sg++; - qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(in_sg, dpaa_mem_vtop(sg)); cpu_to_hw_sg(in_sg); /* 1st seg IV */ @@ -1018,7 +1011,7 @@ build_cipher_auth_gcm_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) sg++; memcpy(ctx->digest, sym->aead.digest.data, ses->digest_length); - qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest)); + qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest)); sg->length = ses->digest_length; } sg->final = 1; @@ -1056,7 +1049,7 @@ build_cipher_auth_gcm(struct rte_crypto_op *op, 
dpaa_sec_session *ses) /* input */ rte_prefetch0(cf->sg); sg = &cf->sg[2]; - qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop(sg)); if (is_encode(ses)) { qm_sg_entry_set64(sg, dpaa_mem_vtop(IV_ptr)); sg->length = ses->iv.length; @@ -1101,7 +1094,7 @@ build_cipher_auth_gcm(struct rte_crypto_op *op, dpaa_sec_session *ses) ses->digest_length); sg++; - qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest)); + qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest)); sg->length = ses->digest_length; length += sg->length; sg->final = 1; @@ -1115,7 +1108,7 @@ build_cipher_auth_gcm(struct rte_crypto_op *op, dpaa_sec_session *ses) /* output */ sg++; - qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop(sg)); qm_sg_entry_set64(sg, dst_start_addr + sym->aead.data.offset - ses->auth_only_len); sg->length = sym->aead.data.length + ses->auth_only_len; @@ -1184,7 +1177,7 @@ build_cipher_auth_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) /* output sg entries */ sg = &cf->sg[2]; - qm_sg_entry_set64(out_sg, dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(out_sg, dpaa_mem_vtop(sg)); cpu_to_hw_sg(out_sg); /* 1st seg */ @@ -1226,7 +1219,7 @@ build_cipher_auth_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) /* input sg entries */ sg++; - qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(in_sg, dpaa_mem_vtop(sg)); cpu_to_hw_sg(in_sg); /* 1st seg IV */ @@ -1256,7 +1249,7 @@ build_cipher_auth_sg(struct rte_crypto_op *op, dpaa_sec_session *ses) sg++; memcpy(ctx->digest, sym->auth.digest.data, ses->digest_length); - qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest)); + qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest)); sg->length = ses->digest_length; } sg->final = 1; @@ -1293,7 +1286,7 @@ build_cipher_auth(struct rte_crypto_op *op, dpaa_sec_session *ses) /* input */ rte_prefetch0(cf->sg); sg = &cf->sg[2]; - qm_sg_entry_set64(&cf->sg[1], 
dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop(sg)); if (is_encode(ses)) { qm_sg_entry_set64(sg, dpaa_mem_vtop(IV_ptr)); sg->length = ses->iv.length; @@ -1323,7 +1316,7 @@ build_cipher_auth(struct rte_crypto_op *op, dpaa_sec_session *ses) ses->digest_length); sg++; - qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest)); + qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest)); sg->length = ses->digest_length; length += sg->length; sg->final = 1; @@ -1337,7 +1330,7 @@ build_cipher_auth(struct rte_crypto_op *op, dpaa_sec_session *ses) /* output */ sg++; - qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop_ctx(ctx, sg)); + qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop(sg)); qm_sg_entry_set64(sg, dst_start_addr + sym->cipher.data.offset); sg->length = sym->cipher.data.length; length = sg->length; @@ -1412,7 +1405,6 @@ dpaa_sec_enqueue_burst(void *qp, struct rte_crypto_op **ops, struct rte_crypto_op *op; struct dpaa_sec_job *cf; dpaa_sec_session *ses; - struct dpaa_sec_op_ctx *ctx; uint32_t auth_only_len; struct qman_fq *inq[DPAA_SEC_BURST]; @@ -1497,8 +1489,7 @@ dpaa_sec_enqueue_burst(void *qp, struct rte_crypto_op **ops, inq[loop] = ses->inq; fd->opaque_addr = 0; fd->cmd = 0; - ctx = container_of(cf, struct dpaa_sec_op_ctx, job); - qm_fd_addr_set64(fd, dpaa_mem_vtop_ctx(ctx, cf->sg)); + qm_fd_addr_set64(fd, dpaa_mem_vtop(cf->sg)); fd->_format1 = qm_fd_compound; fd->length29 = 2 * sizeof(struct qm_sg_entry); /* Auth_only_len is set as 0 in descriptor and it is -- 2.14.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [dpdk-dev] [PATCH v2 2/3] bus/fslmc: optimize physical to virtual address searching
  2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
  2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain
@ 2018-04-27 17:20 ` Shreyansh Jain
  2018-04-27 18:49 ` Thomas Monjalon
  2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 3/3] bus/dpaa: " Shreyansh Jain
  2018-04-27 19:38 ` [dpdk-dev] [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Thomas Monjalon
  3 siblings, 1 reply; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 17:20 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

With memory hotplugging support, the ordering of memsegs has changed
from physically contiguous to virtually contiguous. The FSLMC bus and
dpaa2 drivers depend on PA to VA address conversion when in Physical
Addressing mode.

This patch creates a list of the blocks requested to be pinned to the
DPAA2 mempool. A physical address being searched for is expected to
belong to this list (it comes from the hardware pool), so searching the
list is less expensive than a memseg walk.

Even so, this has a marginal impact on performance compared with the
legacy mode with physically contiguous memsegs.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
An optimized algorithm is being worked on, based on some recent patches
in hotplugging. That would improve/recover the performance. Until then,
this patch is to be treated as a stop-gap solution.
--- drivers/bus/fslmc/portal/dpaa2_hw_pvt.h | 23 +++++++++++++++++ drivers/mempool/dpaa2/dpaa2_hw_mempool.c | 43 ++++++++++++++++++++++++++++++++ 2 files changed, 66 insertions(+) diff --git a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h index c76393d45..da6e639dc 100644 --- a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h +++ b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h @@ -254,15 +254,38 @@ enum qbman_fd_format { */ #define DPAA2_EQ_RESP_ALWAYS 1 +/* Various structures representing contiguous memory maps */ +struct dpaa2_memseg { + TAILQ_ENTRY(dpaa2_memseg) next; + char *vaddr; + rte_iova_t iova; + size_t len; +}; + +TAILQ_HEAD(dpaa2_memseg_list, dpaa2_memseg); +extern struct dpaa2_memseg_list dpaa2_memsegs; + #ifdef RTE_LIBRTE_DPAA2_USE_PHYS_IOVA extern uint8_t dpaa2_virt_mode; static void *dpaa2_mem_ptov(phys_addr_t paddr) __attribute__((unused)); /* todo - this is costly, need to write a fast coversion routine */ static void *dpaa2_mem_ptov(phys_addr_t paddr) { + struct dpaa2_memseg *ms; + if (dpaa2_virt_mode) return (void *)(size_t)paddr; + /* Check if the address is already part of the memseg list internally + * maintained by the dpaa2 driver. + */ + TAILQ_FOREACH(ms, &dpaa2_memsegs, next) { + if (paddr >= ms->iova && paddr < + ms->iova + ms->len) + return RTE_PTR_ADD(ms->vaddr, (paddr - ms->iova)); + } + + /* If not, Fallback to full memseg list searching */ return rte_mem_iova2virt(paddr); } diff --git a/drivers/mempool/dpaa2/dpaa2_hw_mempool.c b/drivers/mempool/dpaa2/dpaa2_hw_mempool.c index ce7a4c577..883d8d84a 100644 --- a/drivers/mempool/dpaa2/dpaa2_hw_mempool.c +++ b/drivers/mempool/dpaa2/dpaa2_hw_mempool.c @@ -32,6 +32,13 @@ struct dpaa2_bp_info rte_dpaa2_bpid_info[MAX_BPID]; static struct dpaa2_bp_list *h_bp_list; +/* List of all the memseg information locally maintained in dpaa2 driver. This + * is to optimize the PA_to_VA searches until a better mechanism (algo) is + * available. 
+ */ +struct dpaa2_memseg_list dpaa2_memsegs + = TAILQ_HEAD_INITIALIZER(dpaa2_memsegs); + /* Dynamic logging identified for mempool */ int dpaa2_logtype_mempool; @@ -358,6 +365,41 @@ rte_hw_mbuf_get_count(const struct rte_mempool *mp) return num_of_bufs; } +static int +dpaa2_populate(struct rte_mempool *mp, unsigned int max_objs, + void *vaddr, rte_iova_t paddr, size_t len, + rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg) +{ + struct dpaa2_memseg *ms; + + /* For each memory chunk pinned to the Mempool, a linked list of the + * contained memsegs is created for searching when PA to VA + * conversion is required. + */ + ms = rte_zmalloc(NULL, sizeof(struct dpaa2_memseg), 0); + if (!ms) { + DPAA2_MEMPOOL_ERR("Unable to allocate internal memory."); + DPAA2_MEMPOOL_WARN("Fast Physical to Virtual Addr translation would not be available."); + /* If the element is not added, it would only lead to failure + * in searching for the element and the logic would Fallback + * to traditional DPDK memseg traversal code. So, this is not + * a blocking error - but, error would be printed on screen. + */ + return 0; + } + + ms->vaddr = vaddr; + ms->iova = paddr; + ms->len = len; + /* Head insertions are generally faster than tail insertions as the + * buffers pinned are picked from rear end. + */ + TAILQ_INSERT_HEAD(&dpaa2_memsegs, ms, next); + + return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len, + obj_cb, obj_cb_arg); +} + struct rte_mempool_ops dpaa2_mpool_ops = { .name = DPAA2_MEMPOOL_OPS_NAME, .alloc = rte_hw_mbuf_create_pool, @@ -365,6 +407,7 @@ struct rte_mempool_ops dpaa2_mpool_ops = { .enqueue = rte_hw_mbuf_free_bulk, .dequeue = rte_dpaa2_mbuf_alloc_bulk, .get_count = rte_hw_mbuf_get_count, + .populate = dpaa2_populate, }; MEMPOOL_REGISTER_OPS(dpaa2_mpool_ops); -- 2.14.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
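For readers following the patch above, here is a minimal standalone sketch of the same idea: keep a list of pinned blocks and search it before falling back to the generic memseg walk. It is an illustration under stated assumptions, not the driver code. All `demo_` names are hypothetical, `calloc()` stands in for `rte_zmalloc()`, and a NULL return marks the point where the real driver would call `rte_mem_iova2virt()`. The range check is written as a subtraction so that `iova + len` cannot wrap around; that is a choice made for this sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the driver's dpaa2_memseg entry. */
struct demo_memseg {
	TAILQ_ENTRY(demo_memseg) next;
	char *vaddr;    /* virtual start of the pinned block */
	uint64_t iova;  /* physical (IO) start of the pinned block */
	size_t len;     /* block length in bytes */
};

TAILQ_HEAD(demo_memseg_list, demo_memseg);
static struct demo_memseg_list demo_memsegs =
	TAILQ_HEAD_INITIALIZER(demo_memsegs);

/* Record one pinned block. Returns 0 on success, -1 if allocation fails. */
static int
demo_memseg_add(void *vaddr, uint64_t iova, size_t len)
{
	struct demo_memseg *ms = calloc(1, sizeof(*ms));

	assert(len > 0);
	if (ms == NULL)
		return -1;
	ms->vaddr = vaddr;
	ms->iova = iova;
	ms->len = len;
	/* Newest block goes first, as in the patch. */
	TAILQ_INSERT_HEAD(&demo_memsegs, ms, next);
	return 0;
}

/* Translate a physical address to a virtual one. NULL means the address
 * is not covered by any recorded block; the real driver falls back to
 * rte_mem_iova2virt() at that point.
 */
static void *
demo_mem_ptov(uint64_t paddr)
{
	struct demo_memseg *ms;

	TAILQ_FOREACH(ms, &demo_memsegs, next) {
		if (paddr >= ms->iova && paddr - ms->iova < ms->len)
			return ms->vaddr + (uintptr_t)(paddr - ms->iova);
	}
	return NULL;
}
```

The lookup cost grows with the number of recorded blocks, which fits the description of this list as a stop-gap until a faster translation scheme is available.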
* Re: [dpdk-dev] [PATCH v2 2/3] bus/fslmc: optimize physical to virtual address searching 2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 2/3] bus/fslmc: optimize physical to virtual address searching Shreyansh Jain @ 2018-04-27 18:49 ` Thomas Monjalon 2018-04-27 19:24 ` Thomas Monjalon 0 siblings, 1 reply; 12+ messages in thread From: Thomas Monjalon @ 2018-04-27 18:49 UTC (permalink / raw) To: Shreyansh Jain; +Cc: dev, hemant.agrawal, akhil.goyal, anatoly.burakov 27/04/2018 19:20, Shreyansh Jain: > --- a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h > +++ b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h > @@ -254,15 +254,38 @@ enum qbman_fd_format { > */ > #define DPAA2_EQ_RESP_ALWAYS 1 > > +/* Various structures representing contiguous memory maps */ > +struct dpaa2_memseg { > + TAILQ_ENTRY(dpaa2_memseg) next; > + char *vaddr; > + rte_iova_t iova; > + size_t len; > +}; > + > +TAILQ_HEAD(dpaa2_memseg_list, dpaa2_memseg); > +extern struct dpaa2_memseg_list dpaa2_memsegs; Shared compilation is broken without following patch: --- a/drivers/bus/fslmc/rte_bus_fslmc_version.map +++ b/drivers/bus/fslmc/rte_bus_fslmc_version.map @@ -105,5 +105,6 @@ DPDK_18.05 { global: dpaa2_affine_qbman_ethrx_swp; + dpaa2_memsegs; } DPDK_18.02; ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [dpdk-dev] [PATCH v2 2/3] bus/fslmc: optimize physical to virtual address searching 2018-04-27 18:49 ` Thomas Monjalon @ 2018-04-27 19:24 ` Thomas Monjalon 0 siblings, 0 replies; 12+ messages in thread From: Thomas Monjalon @ 2018-04-27 19:24 UTC (permalink / raw) To: Shreyansh Jain; +Cc: dev, hemant.agrawal, akhil.goyal, anatoly.burakov 27/04/2018 20:49, Thomas Monjalon: > 27/04/2018 19:20, Shreyansh Jain: > > --- a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h > > +++ b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h > > @@ -254,15 +254,38 @@ enum qbman_fd_format { > > */ > > #define DPAA2_EQ_RESP_ALWAYS 1 > > > > +/* Various structures representing contiguous memory maps */ > > +struct dpaa2_memseg { > > + TAILQ_ENTRY(dpaa2_memseg) next; > > + char *vaddr; > > + rte_iova_t iova; > > + size_t len; > > +}; > > + > > +TAILQ_HEAD(dpaa2_memseg_list, dpaa2_memseg); > > +extern struct dpaa2_memseg_list dpaa2_memsegs; > > Shared compilation is broken without following patch: > > --- a/drivers/bus/fslmc/rte_bus_fslmc_version.map > +++ b/drivers/bus/fslmc/rte_bus_fslmc_version.map > @@ -105,5 +105,6 @@ DPDK_18.05 { > global: > > dpaa2_affine_qbman_ethrx_swp; > + dpaa2_memsegs; > > } DPDK_18.02; Right fix is: --- a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h +++ b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h @@ -263,7 +263,7 @@ struct dpaa2_memseg { }; TAILQ_HEAD(dpaa2_memseg_list, dpaa2_memseg); -extern struct dpaa2_memseg_list dpaa2_memsegs; +extern struct dpaa2_memseg_list rte_dpaa2_memsegs; #ifdef RTE_LIBRTE_DPAA2_USE_PHYS_IOVA extern uint8_t dpaa2_virt_mode; @@ -279,10 +279,10 @@ static void *dpaa2_mem_ptov(phys_addr_t paddr) /* Check if the address is already part of the memseg list internally * maintained by the dpaa2 driver. 
*/ - TAILQ_FOREACH(ms, &dpaa2_memsegs, next) { + TAILQ_FOREACH(ms, &rte_dpaa2_memsegs, next) { if (paddr >= ms->iova && paddr < ms->iova + ms->len) - return RTE_PTR_ADD(ms->vaddr, (paddr - ms->iova)); + return RTE_PTR_ADD(ms->vaddr, (uintptr_t)(paddr - ms->iova)); } /* If not, Fallback to full memseg list searching */ --- a/drivers/event/dpaa2/Makefile +++ b/drivers/event/dpaa2/Makefile @@ -18,7 +18,8 @@ CFLAGS += -I$(RTE_SDK)/drivers/bus/fslmc/portal CFLAGS += -I$(RTE_SDK)/drivers/mempool/dpaa2 CFLAGS += -I$(RTE_SDK)/drivers/event/dpaa2 CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal -LDLIBS += -lrte_eal -lrte_eventdev -lrte_bus_fslmc -lrte_pmd_dpaa2 +LDLIBS += -lrte_eal -lrte_eventdev +LDLIBS += -lrte_bus_fslmc -lrte_mempool_dpaa2 -lrte_pmd_dpaa2 LDLIBS += -lrte_bus_vdev CFLAGS += -I$(RTE_SDK)/drivers/net/dpaa2 CFLAGS += -I$(RTE_SDK)/drivers/net/dpaa2/mc --- a/drivers/mempool/dpaa2/dpaa2_hw_mempool.c +++ b/drivers/mempool/dpaa2/dpaa2_hw_mempool.c @@ -36,8 +36,8 @@ static struct dpaa2_bp_list *h_bp_list; * is to optimize the PA_to_VA searches until a better mechanism (algo) is * available. */ -struct dpaa2_memseg_list dpaa2_memsegs - = TAILQ_HEAD_INITIALIZER(dpaa2_memsegs); +struct dpaa2_memseg_list rte_dpaa2_memsegs + = TAILQ_HEAD_INITIALIZER(rte_dpaa2_memsegs); /* Dynamic logging identified for mempool */ int dpaa2_logtype_mempool; @@ -394,7 +394,7 @@ dpaa2_populate(struct rte_mempool *mp, unsigned int max_objs, /* Head insertions are generally faster than tail insertions as the * buffers pinned are picked from rear end. 
*/ - TAILQ_INSERT_HEAD(&dpaa2_memsegs, ms, next); + TAILQ_INSERT_HEAD(&rte_dpaa2_memsegs, ms, next); return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len, obj_cb, obj_cb_arg); --- a/drivers/mempool/dpaa2/rte_mempool_dpaa2_version.map +++ b/drivers/mempool/dpaa2/rte_mempool_dpaa2_version.map @@ -3,6 +3,7 @@ DPDK_17.05 { rte_dpaa2_bpid_info; rte_dpaa2_mbuf_alloc_bulk; + rte_dpaa2_memsegs; local: *; }; ^ permalink raw reply [flat|nested] 12+ messages in thread
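The `(uintptr_t)` cast added in the fix above is easier to see in isolation. The sketch below assumes that `RTE_PTR_ADD(ptr, x)` expands to `(void *)((uintptr_t)(ptr) + (x))`; `DEMO_PTR_ADD` is a local copy under that assumption, and `demo_block_ptov()` is a hypothetical helper. With a 64-bit offset and 32-bit pointers, the sum inside the macro is 64 bits wide, so converting it to `void *` changes width and the compiler warns. Narrowing the offset to `uintptr_t` first keeps the arithmetic at pointer width.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of what RTE_PTR_ADD is assumed to expand to. */
#define DEMO_PTR_ADD(ptr, x) ((void *)((uintptr_t)(ptr) + (x)))

/* Return the virtual address matching paddr inside a block that starts
 * at (vaddr, iova). The caller must already know paddr lies in the block.
 */
static void *
demo_block_ptov(char *vaddr, uint64_t iova, uint64_t paddr)
{
	assert(paddr >= iova);
	/* Without the cast the offset is uint64_t; on a 32-bit build the
	 * macro would then convert a 64-bit integer to a 32-bit pointer.
	 */
	return DEMO_PTR_ADD(vaddr, (uintptr_t)(paddr - iova));
}
```

On a 64-bit build the cast is a no-op, so both variants behave the same there.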
* [dpdk-dev] [PATCH v2 3/3] bus/dpaa: optimize physical to virtual address searching 2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain 2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain 2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 2/3] bus/fslmc: optimize physical to virtual address searching Shreyansh Jain @ 2018-04-27 17:20 ` Shreyansh Jain 2018-04-27 19:32 ` Thomas Monjalon 2018-04-27 19:38 ` [dpdk-dev] [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Thomas Monjalon 3 siblings, 1 reply; 12+ messages in thread From: Shreyansh Jain @ 2018-04-27 17:20 UTC (permalink / raw) To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain With memory hotplugging support, the ordering of memsegs has changed from physically contiguous to virtually contiguous. The DPAA bus and drivers depend on PA to VA address conversion for I/O. This patch creates a list of the blocks requested to be pinned to the DPAA mempool. A physical address being searched for is expected to belong to this list (it comes from the hardware pool), so searching the list is less expensive than walking the memsegs. There is still a marginal drop in performance compared with the legacy mode, where memsegs are physically contiguous. Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com> --- An optimized algorithm, based on some recent hotplugging patches, is being worked on. That would improve/recover the performance. Until then, this patch is to be treated as a stop-gap solution. 
--- drivers/bus/dpaa/rte_dpaa_bus.h | 27 ++++++++++++++++++++++++++- drivers/mempool/dpaa/dpaa_mempool.c | 33 ++++++++++++++++++++++++++++++++- 2 files changed, 58 insertions(+), 2 deletions(-) diff --git a/drivers/bus/dpaa/rte_dpaa_bus.h b/drivers/bus/dpaa/rte_dpaa_bus.h index 89aeac2d1..ca32b7f2f 100644 --- a/drivers/bus/dpaa/rte_dpaa_bus.h +++ b/drivers/bus/dpaa/rte_dpaa_bus.h @@ -95,9 +95,34 @@ struct dpaa_portal { uint64_t tid;/**< Parent Thread id for this portal */ }; -/* TODO - this is costly, need to write a fast coversion routine */ +/* Various structures representing contiguous memory maps */ +struct dpaa_memseg { + TAILQ_ENTRY(dpaa_memseg) next; + char *vaddr; + rte_iova_t iova; + size_t len; +}; + +TAILQ_HEAD(dpaa_memseg_list, dpaa_memseg); +extern struct dpaa_memseg_list dpaa_memsegs; + +/* Either iterate over the list of internal memseg references or fallback to + * EAL memseg based iova2virt. + */ static inline void *rte_dpaa_mem_ptov(phys_addr_t paddr) { + struct dpaa_memseg *ms; + + /* Check if the address is already part of the memseg list internally + * maintained by the dpaa driver. + */ + TAILQ_FOREACH(ms, &dpaa_memsegs, next) { + if (paddr >= ms->iova && paddr < + ms->iova + ms->len) + return RTE_PTR_ADD(ms->vaddr, (paddr - ms->iova)); + } + + /* If not, Fallback to full memseg list searching */ return rte_mem_iova2virt(paddr); } diff --git a/drivers/mempool/dpaa/dpaa_mempool.c b/drivers/mempool/dpaa/dpaa_mempool.c index 580e4640c..9d6277f82 100644 --- a/drivers/mempool/dpaa/dpaa_mempool.c +++ b/drivers/mempool/dpaa/dpaa_mempool.c @@ -27,6 +27,13 @@ #include <dpaa_mempool.h> +/* List of all the memseg information locally maintained in dpaa driver. This + * is to optimize the PA_to_VA searches until a better mechanism (algo) is + * available. 
+ */ +struct dpaa_memseg_list dpaa_memsegs + = TAILQ_HEAD_INITIALIZER(dpaa_memsegs); + struct dpaa_bp_info rte_dpaa_bpid_info[DPAA_MAX_BPOOLS]; static int @@ -287,10 +294,34 @@ dpaa_populate(struct rte_mempool *mp, unsigned int max_objs, /* Detect pool area has sufficient space for elements in this memzone */ if (len >= total_elt_sz * mp->size) bp_info->flags |= DPAA_MPOOL_SINGLE_SEGMENT; + struct dpaa_memseg *ms; + + /* For each memory chunk pinned to the Mempool, a linked list of the + * contained memsegs is created for searching when PA to VA + * conversion is required. + */ + ms = rte_zmalloc(NULL, sizeof(struct dpaa_memseg), 0); + if (!ms) { + DPAA_MEMPOOL_ERR("Unable to allocate internal memory."); + DPAA_MEMPOOL_WARN("Fast Physical to Virtual Addr translation would not be available."); + /* If the element is not added, it would only lead to failure + * in searching for the element and the logic would Fallback + * to traditional DPDK memseg traversal code. So, this is not + * a blocking error - but, error would be printed on screen. + */ + return 0; + } + + ms->vaddr = vaddr; + ms->iova = paddr; + ms->len = len; + /* Head insertions are generally faster than tail insertions as the + * buffers pinned are picked from rear end. + */ + TAILQ_INSERT_HEAD(&dpaa_memsegs, ms, next); return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len, obj_cb, obj_cb_arg); - } struct rte_mempool_ops dpaa_mpool_ops = { -- 2.14.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [dpdk-dev] [PATCH v2 3/3] bus/dpaa: optimize physical to virtual address searching 2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 3/3] bus/dpaa: " Shreyansh Jain @ 2018-04-27 19:32 ` Thomas Monjalon 0 siblings, 0 replies; 12+ messages in thread From: Thomas Monjalon @ 2018-04-27 19:32 UTC (permalink / raw) To: Shreyansh Jain; +Cc: dev, hemant.agrawal, akhil.goyal, anatoly.burakov 27/04/2018 19:20, Shreyansh Jain: > --- a/drivers/bus/dpaa/rte_dpaa_bus.h > +++ b/drivers/bus/dpaa/rte_dpaa_bus.h > @@ -95,9 +95,34 @@ struct dpaa_portal { > uint64_t tid;/**< Parent Thread id for this portal */ > }; > > -/* TODO - this is costly, need to write a fast coversion routine */ > +/* Various structures representing contiguous memory maps */ > +struct dpaa_memseg { > + TAILQ_ENTRY(dpaa_memseg) next; > + char *vaddr; > + rte_iova_t iova; > + size_t len; > +}; > + > +TAILQ_HEAD(dpaa_memseg_list, dpaa_memseg); > +extern struct dpaa_memseg_list dpaa_memsegs; Same as for DPAA2, fixes are required: --- a/drivers/bus/dpaa/rte_dpaa_bus.h +++ b/drivers/bus/dpaa/rte_dpaa_bus.h @@ -104,7 +104,7 @@ struct dpaa_memseg { }; TAILQ_HEAD(dpaa_memseg_list, dpaa_memseg); -extern struct dpaa_memseg_list dpaa_memsegs; +extern struct dpaa_memseg_list rte_dpaa_memsegs; /* Either iterate over the list of internal memseg references or fallback to * EAL memseg based iova2virt. @@ -116,10 +116,10 @@ static inline void *rte_dpaa_mem_ptov(phys_addr_t paddr) /* Check if the address is already part of the memseg list internally * maintained by the dpaa driver. 
*/ - TAILQ_FOREACH(ms, &dpaa_memsegs, next) { + TAILQ_FOREACH(ms, &rte_dpaa_memsegs, next) { if (paddr >= ms->iova && paddr < ms->iova + ms->len) - return RTE_PTR_ADD(ms->vaddr, (paddr - ms->iova)); + return RTE_PTR_ADD(ms->vaddr, (uintptr_t)(paddr - ms->iova)); } /* If not, Fallback to full memseg list searching */ --- a/drivers/mempool/dpaa/dpaa_mempool.c +++ b/drivers/mempool/dpaa/dpaa_mempool.c @@ -31,8 +31,8 @@ * is to optimize the PA_to_VA searches until a better mechanism (algo) is * available. */ -struct dpaa_memseg_list dpaa_memsegs - = TAILQ_HEAD_INITIALIZER(dpaa_memsegs); +struct dpaa_memseg_list rte_dpaa_memsegs + = TAILQ_HEAD_INITIALIZER(rte_dpaa_memsegs); struct dpaa_bp_info rte_dpaa_bpid_info[DPAA_MAX_BPOOLS]; @@ -318,7 +318,7 @@ dpaa_populate(struct rte_mempool *mp, unsigned int max_objs, /* Head insertions are generally faster than tail insertions as the * buffers pinned are picked from rear end. */ - TAILQ_INSERT_HEAD(&dpaa_memsegs, ms, next); + TAILQ_INSERT_HEAD(&rte_dpaa_memsegs, ms, next); return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len, obj_cb, obj_cb_arg); --- a/drivers/mempool/dpaa/rte_mempool_dpaa_version.map +++ b/drivers/mempool/dpaa/rte_mempool_dpaa_version.map @@ -2,6 +2,7 @@ DPDK_17.11 { global: rte_dpaa_bpid_info; + rte_dpaa_memsegs; local: *; }; ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing 2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain ` (2 preceding siblings ...) 2018-04-27 17:20 ` [dpdk-dev] [PATCH v2 3/3] bus/dpaa: " Shreyansh Jain @ 2018-04-27 19:38 ` Thomas Monjalon 3 siblings, 0 replies; 12+ messages in thread From: Thomas Monjalon @ 2018-04-27 19:38 UTC (permalink / raw) To: Shreyansh Jain; +Cc: dev, hemant.agrawal, akhil.goyal, anatoly.burakov > Shreyansh Jain (3): > crypto/dpaa_sec: remove ctx based offset for PA-VA conversion > bus/fslmc: optimize physical to virtual address searching > bus/dpaa: optimize physical to virtual address searching Applied with fixes for: - 32-bit compilation - symbols export for shared lib compilation - rte_ prefix namespace for exported symbols - dpaa2 mempool dependency for dpaa2 eventdev ^ permalink raw reply [flat|nested] 12+ messages in thread