From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 6258D2F4F for ; Mon, 15 Oct 2018 17:10:26 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 15 Oct 2018 08:10:25 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,384,1534834800"; d="scan'208";a="82735107" Received: from irsmsx103.ger.corp.intel.com ([163.33.3.157]) by orsmga006.jf.intel.com with ESMTP; 15 Oct 2018 08:10:23 -0700 Received: from irsmsx106.ger.corp.intel.com ([169.254.8.45]) by IRSMSX103.ger.corp.intel.com ([169.254.3.248]) with mapi id 14.03.0319.002; Mon, 15 Oct 2018 16:10:22 +0100 From: "Daly, Lee" To: "Verma, Shally" CC: "Jozwiak, TomaszX" , "dev@dpdk.org" , "Trahe, Fiona" , "akhil.goyal@nxp.com" Thread-Topic: [dpdk-dev] [PATCH 2/3] app/compress-perf: add performance measurement Thread-Index: AQHUYhSnTHhLU38qnE2vtMdDDLls4qUgLMhw Date: Mon, 15 Oct 2018 15:10:22 +0000 Message-ID: References: <1538400427-20164-1-git-send-email-tomaszx.jozwiak@intel.com> <1538400427-20164-3-git-send-email-tomaszx.jozwiak@intel.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-version: 11.0.400.15 dlp-reaction: no-action x-ctpclassification: CTP_NT x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMzM0ZjJjZTktMDE5YS00NGRkLWIxODctNmZjNjcyOTAxZDlkIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiQWlmOGdEbHZkU3ZNMmZmYmFmXC9aZTFpWFhRK0tGMHJIRlNYdXBxcFwvXC9FaHV2Y3dCMWRHeHFUNXF5NlJiSmw0ZCJ9 x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH 2/3] app/compress-perf: add performance measurement X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Oct 2018 15:10:27 -0000 Thanks for your input Shally see comments below. I will be reviewing these changes while Tomasz is out this week. > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Verma, Shally > Sent: Friday, October 12, 2018 11:16 AM > To: Jozwiak, TomaszX ; dev@dpdk.org; Trahe, > Fiona ; akhil.goyal@nxp.com; De Lara Guarch, Pablo > > Cc: De@dpdk.org; Lara@dpdk.org; Guarch@dpdk.org > Subject: Re: [dpdk-dev] [PATCH 2/3] app/compress-perf: add performance > measurement >=20 > HI TomaszX >=20 > Sorry for delay in response. Comments inline. >=20 <...> > >+static int > >+comp_perf_check_capabilities(struct comp_test_data *test_data) { > >+ const struct rte_compressdev_capabilities *cap; > >+ > >+ cap =3D rte_compressdev_capability_get(test_data->cdev_id, > >+ RTE_COMP_ALGO_DEFLATE); > >+ > >+ if (cap =3D=3D NULL) { > >+ RTE_LOG(ERR, USER1, > >+ "Compress device does not support DEFLATE\n"); > >+ return -1; > >+ } > >+ > >+ uint64_t comp_flags =3D cap->comp_feature_flags; > >+ > >+ /* Huffman enconding */ > >+ if (test_data->huffman_enc =3D=3D RTE_COMP_HUFFMAN_FIXED && > >+ (comp_flags & RTE_COMP_FF_HUFFMAN_FIXED) =3D=3D = 0) { > >+ RTE_LOG(ERR, USER1, > >+ "Compress device does not supported Fixed Huffma= n\n"); > >+ return -1; > >+ } > >+ > >+ if (test_data->huffman_enc =3D=3D RTE_COMP_HUFFMAN_DYNAMIC && > >+ (comp_flags & RTE_COMP_FF_HUFFMAN_DYNAMIC) =3D= =3D 0) { > >+ RTE_LOG(ERR, USER1, > >+ "Compress device does not supported Dynamic Huff= man\n"); > >+ return -1; > >+ } > >+ > >+ /* Window size */ > >+ if (test_data->window_sz !=3D -1) { > >+ if (param_range_check(test_data->window_sz, > >+ &cap->window_size) > What if cap->window_size is 0 i.e. implementation default? What do you mean when you say cap->window_size =3D 0? Cap->window_size is the range structure here, min, max and increment, which= are filled out by the driver. Our implementation default in the perf tool will set the window size to max= the driver can support. >=20 > >+ < 0) { > >+ RTE_LOG(ERR, USER1, > >+ "Compress device does not support " > >+ "this window size\n"); > >+ return -1; > >+ } > >+ } else > >+ /* Set window size to PMD maximum if none was specified = */ > >+ test_data->window_sz =3D cap->window_size.max; > >+ <...> > >+ > >+static int > >+comp_perf_dump_input_data(struct comp_test_data *test_data) { > >+ FILE *f =3D fopen(test_data->input_file, "r"); > >+ > >+ if (f =3D=3D NULL) { > >+ RTE_LOG(ERR, USER1, "Input file could not be opened\n"); > >+ return -1; > >+ } > >+ > >+ if (fseek(f, 0, SEEK_END) !=3D 0) { > >+ RTE_LOG(ERR, USER1, "Size of input could not be calculat= ed\n"); > >+ goto err; > >+ } > >+ size_t actual_file_sz =3D ftell(f); > >+ /* If extended input data size has not been set, > >+ * input data size =3D file size > >+ */ > >+ > >+ if (test_data->input_data_sz =3D=3D 0) > >+ test_data->input_data_sz =3D actual_file_sz; > >+ > >+ if (fseek(f, 0, SEEK_SET) !=3D 0) { > >+ RTE_LOG(ERR, USER1, "Size of input could not be calculat= ed\n"); > >+ goto err; > >+ } > >+ > >+ test_data->input_data =3D rte_zmalloc_socket(NULL, > >+ test_data->input_data_sz, 0, > >+ rte_socket_id()); > >+ > >+ if (test_data->input_data =3D=3D NULL) { > >+ RTE_LOG(ERR, USER1, "Memory to hold the data from the in= put " > >+ "file could not be allocated\n"); > >+ goto err; > >+ } > >+ > >+ size_t remaining_data =3D test_data->input_data_sz; > >+ uint8_t *data =3D test_data->input_data; > >+ > >+ while (remaining_data > 0) { > >+ size_t data_to_read =3D RTE_MIN(remaining_data, > >+ actual_file_sz); > >+ > >+ if (fread(data, data_to_read, 1, f) !=3D 1) { > >+ RTE_LOG(ERR, USER1, "Input file could not be rea= d\n"); > >+ goto err; > >+ } > >+ if (fseek(f, 0, SEEK_SET) !=3D 0) { > >+ RTE_LOG(ERR, USER1, > >+ "Size of input could not be calculated\n= "); > >+ goto err; > >+ } > >+ remaining_data -=3D data_to_read; > >+ data +=3D data_to_read; > It looks like it will run 2nd time only if input file size < input data s= ize in which > case it will just keep filling input buffer with repeated data. > Is that the intention here? >>From what I can see, yes, this will only enter this while loop a second tim= e if the file is smaller than the data_size requested. Repeating the data from your input file as much as requested.=20 If we were to pad with 0's or random data it would skew the ratio a lot. Even though I do understand the ratio may be better here in this case as we= ll, due to the repetition of data. >=20 > >+ } > >+ > >+ if (test_data->input_data_sz > actual_file_sz) > >+ RTE_LOG(INFO, USER1, > >+ "%zu bytes read from file %s, extending the file %.2f = times\n", > >+ test_data->input_data_sz, test_data->input_file, > >+ (double)test_data->input_data_sz/actual_file_sz)= ; > >+ else > >+ RTE_LOG(INFO, USER1, > >+ "%zu bytes read from file %s\n", > >+ test_data->input_data_sz, > >+ test_data->input_file); > >+ > >+ fclose(f); > >+ > >+ return 0; > >+ > >+err: > >+ fclose(f); > >+ rte_free(test_data->input_data); > >+ test_data->input_data =3D NULL; > >+ > >+ return -1; > >+} > >+ > >+static int > >+comp_perf_initialize_compressdev(struct comp_test_data *test_data) { > >+ uint8_t enabled_cdev_count; > >+ uint8_t enabled_cdevs[RTE_COMPRESS_MAX_DEVS]; > >+ > >+ enabled_cdev_count =3D rte_compressdev_devices_get(test_data- > >driver_name, > >+ enabled_cdevs, RTE_COMPRESS_MAX_DEVS); > >+ if (enabled_cdev_count =3D=3D 0) { > >+ RTE_LOG(ERR, USER1, "No compress devices type %s availab= le\n", > >+ test_data->driver_name); > >+ return -EINVAL; > >+ } > >+ > >+ if (enabled_cdev_count > 1) > >+ RTE_LOG(INFO, USER1, > >+ "Only the first compress device will be > >+ used\n"); > >+ > >+ test_data->cdev_id =3D enabled_cdevs[0]; > >+ > >+ if (comp_perf_check_capabilities(test_data) < 0) > >+ return -1; > >+ > >+ /* Configure compressdev (one device, one queue pair) */ > >+ struct rte_compressdev_config config =3D { > >+ .socket_id =3D rte_socket_id(), > >+ .nb_queue_pairs =3D 1, > >+ .max_nb_priv_xforms =3D NUM_MAX_XFORMS, > >+ .max_nb_streams =3D 0 > >+ }; > >+ > >+ if (rte_compressdev_configure(test_data->cdev_id, &config) < 0) = { > >+ RTE_LOG(ERR, USER1, "Device configuration failed\n"); > >+ return -1; > >+ } > >+ > >+ if (rte_compressdev_queue_pair_setup(test_data->cdev_id, 0, > >+ NUM_MAX_INFLIGHT_OPS, rte_socket_id()) < 0) { > >+ RTE_LOG(ERR, USER1, "Queue pair setup failed\n"); > >+ return -1; > >+ } > >+ > >+ if (rte_compressdev_start(test_data->cdev_id) < 0) { > >+ RTE_LOG(ERR, USER1, "Device could not be started\n"); > >+ return -1; > >+ } > >+ > >+ return 0; > >+} > >+ > >+static int > >+prepare_bufs(struct comp_test_data *test_data) { > >+ uint32_t remaining_data =3D test_data->input_data_sz; > >+ uint8_t *input_data_ptr =3D test_data->input_data; > >+ size_t data_sz; > >+ uint8_t *data_addr; > >+ uint32_t i, j; > >+ > >+ for (i =3D 0; i < test_data->total_bufs; i++) { > >+ /* Allocate data in input mbuf and copy data from input = file */ > >+ test_data->decomp_bufs[i] =3D > >+ rte_pktmbuf_alloc(test_data->decomp_buf_pool); > >+ if (test_data->decomp_bufs[i] =3D=3D NULL) { > >+ RTE_LOG(ERR, USER1, "Could not allocate mbuf\n")= ; > >+ return -1; > >+ } > >+ > >+ data_sz =3D RTE_MIN(remaining_data, test_data->seg_sz); > >+ data_addr =3D (uint8_t *) rte_pktmbuf_append( > >+ test_data->decomp_bufs[i], data_= sz); > >+ if (data_addr =3D=3D NULL) { > >+ RTE_LOG(ERR, USER1, "Could not append data\n"); > >+ return -1; > >+ } > >+ rte_memcpy(data_addr, input_data_ptr, data_sz); > >+ > >+ input_data_ptr +=3D data_sz; > >+ remaining_data -=3D data_sz; > >+ > >+ /* Already one segment in the mbuf */ > >+ uint16_t segs_per_mbuf =3D 1; > >+ > >+ /* Chain mbufs if needed for input mbufs */ > >+ while (segs_per_mbuf < test_data->max_sgl_segs > >+ && remaining_data > 0) { > >+ struct rte_mbuf *next_seg =3D > >+ > >+ rte_pktmbuf_alloc(test_data->decomp_buf_pool); > >+ > >+ if (next_seg =3D=3D NULL) { > >+ RTE_LOG(ERR, USER1, > >+ "Could not allocate mbuf\n"); > >+ return -1; > >+ } > >+ > >+ data_sz =3D RTE_MIN(remaining_data, test_data->s= eg_sz); > >+ data_addr =3D (uint8_t *)rte_pktmbuf_append(next= _seg, > >+ data_sz); > >+ > >+ if (data_addr =3D=3D NULL) { > >+ RTE_LOG(ERR, USER1, "Could not append > >+ data\n"); > Since a new buffer per segment is allocated, so is it possible for append= to > fail? think, this check is redundant here. True. > >+ return -1; > >+ } > >+ > >+ rte_memcpy(data_addr, input_data_ptr, data_sz); > >+ input_data_ptr +=3D data_sz; > >+ remaining_data -=3D data_sz; > >+ > >+ if (rte_pktmbuf_chain(test_data->decomp_bufs[i], > >+ next_seg) < 0) { > >+ RTE_LOG(ERR, USER1, "Could not chain mbu= fs\n"); > >+ return -1; > >+ } > >+ segs_per_mbuf++; > >+ } > >+ <...> > >+ > >+ /* Create private xform */ > >+ if (rte_compressdev_private_xform_create(dev_id, &xform, > >+ &priv_xform) < 0) { > >+ RTE_LOG(ERR, USER1, "Private xform could not be created\= n"); > >+ res =3D -1; > >+ goto end; > >+ } > >+ > >+ uint64_t tsc_start, tsc_end, tsc_duration; > >+ > >+ tsc_start =3D tsc_end =3D tsc_duration =3D 0; > >+ if (benchmarking) { > >+ tsc_start =3D rte_rdtsc(); > >+ num_iter =3D test_data->num_iter; > >+ } else > >+ num_iter =3D 1; > Looks like in same code we're doing benchmarking and functional validatio= n. > It can be reorganised to keep validation test separately like done in > crypto_perf. Good point, will keep that in mind when doing these changes. >=20 > >+ > >+ for (iter =3D 0; iter < num_iter; iter++) { > >+ uint32_t total_ops =3D test_data->total_bufs; > >+ uint32_t remaining_ops =3D test_data->total_bufs; > >+ uint32_t total_deq_ops =3D 0; > >+ uint32_t total_enq_ops =3D 0; > >+ uint16_t ops_unused =3D 0; > >+ uint16_t num_enq =3D 0; > >+ uint16_t num_deq =3D 0; > >+ > >+ output_size =3D 0; > >+ > >+ while (remaining_ops > 0) { > >+ uint16_t num_ops =3D RTE_MIN(remaining_ops, > >+ test_data->burst_sz); > >+ uint16_t ops_needed =3D num_ops - ops_unused; > >+ > >+ /* > >+ * Move the unused operations from the previous > >+ * enqueue_burst call to the front, to maintain = order > >+ */ > >+ if ((ops_unused > 0) && (num_enq > 0)) { > >+ size_t nb_b_to_mov =3D > >+ ops_unused * sizeof(struct > >+ rte_comp_op *); > >+ > >+ memmove(ops, &ops[num_enq], nb_b_to_mov)= ; > >+ } > >+ > >+ /* Allocate compression operations */ > >+ if (ops_needed && !rte_comp_op_bulk_alloc( > >+ test_data->op_pool, > >+ &ops[ops_unused], > >+ ops_needed)) { > >+ RTE_LOG(ERR, USER1, > >+ "Could not allocate enough operati= ons\n"); > >+ res =3D -1; > >+ goto end; > >+ } > >+ allocated +=3D ops_needed; > >+ > >+ for (i =3D 0; i < ops_needed; i++) { > >+ /* > >+ * Calculate next buffer to attach to op= eration > >+ */ > >+ uint32_t buf_id =3D total_enq_ops + i + > >+ ops_unused; > >+ uint16_t op_id =3D ops_unused + i; > >+ /* Reset all data in output buffers */ > >+ struct rte_mbuf *m =3D > >+ output_bufs[buf_id]; > >+ > >+ m->pkt_len =3D test_data->seg_sz * > >+ m->nb_segs; > Isn't pkt_len set already when we call rte_pktmbuf_append() and chain()? >=20 > >+ while (m) { > >+ m->data_len =3D m->buf_len - > >+ m->data_off; > Same question, shouldn't rte_pktmbuf_append() adjust data_len as well per > each mbuf? Yes you are correct, >>From what I can see the *m mbuf pointer is redundant.=20 >=20 > >+ m =3D m->next; > >+ } > >+ ops[op_id]->m_src =3D input_bufs[buf_id]= ; > >+ ops[op_id]->m_dst =3D output_bufs[buf_id= ]; > >+ ops[op_id]->src.offset =3D 0; > >+ ops[op_id]->src.length =3D > >+ rte_pktmbuf_pkt_len(input_bufs[b= uf_id]); > >+ ops[op_id]->dst.offset =3D 0; > >+ ops[op_id]->flush_flag =3D RTE_COMP_FLUS= H_FINAL; > >+ ops[op_id]->input_chksum =3D buf_id; > >+ ops[op_id]->private_xform =3D priv_xform= ; > >+ } > >+ > >+ num_enq =3D rte_compressdev_enqueue_burst(dev_id= , 0, ops, > >+ num_ops)= ; > >+ ops_unused =3D num_ops - num_enq; > >+ remaining_ops -=3D num_enq; > >+ total_enq_ops +=3D num_enq; > >+ > >+ num_deq =3D rte_compressdev_dequeue_burst(dev_id= , 0, > >+ deq_ops, > >+ test_data->bu= rst_sz); > >+ total_deq_ops +=3D num_deq; > >+ if (benchmarking =3D=3D 0) { > >+ for (i =3D 0; i < num_deq; i++) { > >+ struct rte_comp_op *op =3D deq_o= ps[i]; > >+ const void *read_data_addr =3D > >+ rte_pktmbuf_read(op->m_d= st, 0, > >+ op->produced, output_dat= a_ptr); > >+ if (read_data_addr =3D=3D NULL) = { > >+ RTE_LOG(ERR, USER1, > >+ "Could not copy buffer in destinat= ion\n"); > >+ res =3D -1; > >+ goto end; > >+ } > >+ > >+ if (read_data_addr !=3D output_d= ata_ptr) > >+ rte_memcpy(output_data_p= tr, > >+ rte_pktmbuf_mtod= ( > >+ op->m_dst, uin= t8_t *), > >+ op->produced); > >+ output_data_ptr +=3D op->produce= d; > >+ output_size +=3D op->produced; > >+ > >+ } > >+ } > >+ > >+ if (iter =3D=3D num_iter - 1) { > >+ for (i =3D 0; i < num_deq; i++) { > Why is it only for last iteration, we are adjusting dst mbuf data_len.? > Shouldn't it be done for each dequeued op? > And, for benchmarking, do we even need to set data and pkt len on dst > mbuf? I assume the data_len is only getting changed on the last iteration, for th= e reason you gave, benchmarking, to save cycles. Does it need to be at all? Possibly not.=20 >=20 > >+ struct rte_comp_op *op =3D deq_o= ps[i]; > >+ struct rte_mbuf *m =3D op->m_dst= ; > >+ > >+ m->pkt_len =3D op->produced; > >+ uint32_t remaining_data =3D op->= produced; > >+ uint16_t data_to_append; > >+ > >+ while (remaining_data > 0) { > >+ data_to_append =3D > >+ RTE_MIN(remainin= g_data, > >+ test_data->= seg_sz); > >+ m->data_len =3D data_to_= append; > >+ remaining_data -=3D > >+ data_to_= append; > >+ m =3D m->next; > Should break if m->next =3D=3D NULL Yup, you are correct, should be a check here. > >+ } > >+ } > >+ } > >+ rte_mempool_put_bulk(test_data->op_pool, > >+ (void **)deq_ops, num_deq); > >+ allocated -=3D num_deq; > >+ } > >+ > >+ /* Dequeue the last operations */ > >+ while (total_deq_ops < total_ops) { > >+ num_deq =3D rte_compressdev_dequeue_burst(dev_id= , 0, > >+ deq_ops, test_data->burs= t_sz); > >+ total_deq_ops +=3D num_deq; > >+ if (benchmarking =3D=3D 0) { > >+ for (i =3D 0; i < num_deq; i++) { > >+ struct rte_comp_op *op =3D deq_o= ps[i]; > >+ const void *read_data_addr =3D > >+ rte_pktmbuf_read(op->m_d= st, 0, > >+ op->produced, output_dat= a_ptr); > >+ if (read_data_addr =3D=3D NULL) = { > >+ RTE_LOG(ERR, USER1, > >+ "Could not copy buffer in destinat= ion\n"); > >+ res =3D -1; > >+ goto end; > >+ } > >+ > >+ if (read_data_addr !=3D output_d= ata_ptr) > >+ rte_memcpy(output_data_p= tr, > >+ rte_pktmbuf_mtod= ( > >+ op->m_dst, uint8= _t *), > >+ op->produced); > >+ output_data_ptr +=3D op->produce= d; > >+ output_size +=3D op->produced; > >+ > >+ } > >+ } > >+ > >+ if (iter =3D=3D num_iter - 1) { > >+ for (i =3D 0; i < num_deq; i++) { > >+ struct rte_comp_op *op =3D deq_o= ps[i]; > >+ struct rte_mbuf *m =3D op->m_dst= ; > >+ > >+ m->pkt_len =3D op->produced; > >+ uint32_t remaining_data =3D op->= produced; > >+ uint16_t data_to_append; > >+ > >+ while (remaining_data > 0) { > >+ data_to_append =3D > >+ RTE_MIN(remaining_data, > >+ test_data->seg_s= z); > >+ m->data_len =3D data_to_= append; > >+ remaining_data -=3D > >+ data_to_= append; > >+ m =3D m->next; > >+ } > >+ } > >+ } > >+ rte_mempool_put_bulk(test_data->op_pool, > >+ (void **)deq_ops, num_deq); > >+ allocated -=3D num_deq; > >+ } > >+ } > >+ > >+ if (benchmarking) { > >+ tsc_end =3D rte_rdtsc(); > >+ tsc_duration =3D tsc_end - tsc_start; > >+ > >+ if (type =3D=3D RTE_COMP_COMPRESS) > test looks for stateless operations only, so can we add perf test type li= ke: test > type perf, op type:STATELESS/STATEFUL=20 Are you asking for the tool to support stateful ops? Since no drivers suppo= rt stateful yet=20 We just wanted to ensure current driver functionality was covered with this= first version. >Also, why do we need --max-num- > sgl-segs as an input option from user? Shouldn't input_sz and seg_sz > internally decide on num-segs? > Or is it added to serve some other different purpose? Will have to get back to you on this one, seems illogical to get this input= from user, But I will have to do further investigation to find if there was a differen= t purpose.=20 >=20 > Thanks > Shally >=20 Thanks for the feedback,=20 We hope to get V2 sent asap.