* [PATCH 0/4] pcapng fixes
From: Stephen Hemminger @ 2023-09-21 4:23 UTC
To: dev; +Cc: Stephen Hemminger

There were a couple of reported bugs in dumpcap around timestamps
and multiple invocations. This patchset does some refactoring
to fix them in a simpler way.

Stephen Hemminger (4):
  pdump: fix setting rte_errno on mp error
  dumpcap: allow multiple invocations
  pcapng: change timestamp argument to write_stats
  pcapng: move timestamp calculation into pdump

 app/dumpcap/main.c      | 31 ++++++++--------
 app/test/test_pcapng.c  |  4 +--
 lib/pcapng/rte_pcapng.c | 79 ++++-------------------------------------
 lib/pcapng/rte_pcapng.h |  7 ++--
 lib/pdump/rte_pdump.c   | 61 +++++++++++++++++++++++++++----
 5 files changed, 83 insertions(+), 99 deletions(-)

--
2.39.2
* [PATCH 1/4] pdump: fix setting rte_errno on mp error
From: Stephen Hemminger @ 2023-09-21 4:23 UTC
To: dev; +Cc: Stephen Hemminger, Reshma Pattan, Jianfeng Tan

The response from MP server sets err_value to negative
on error. The convention for rte_errno is to use a positive
value on error. This makes errors like duplicate registration
show up with the correct error value.

Fixes: 660098d61f57 ("pdump: use generic multi-process channel")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/pdump/rte_pdump.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 53cca1034d41..a70085bd0211 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -564,9 +564,10 @@ pdump_prepare_client_request(const char *device, uint16_t queue,
 	if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) {
 		mp_rep = &mp_reply.msgs[0];
 		resp = (struct pdump_response *)mp_rep->param;
-		rte_errno = resp->err_value;
-		if (!resp->err_value)
+		if (resp->err_value == 0)
 			ret = 0;
+		else
+			rte_errno = -resp->err_value;
 		free(mp_reply.msgs);
 	}

--
2.39.2
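
The sign convention described in this patch can be illustrated with a small standalone sketch. It is not DPDK code: `sketch_errno` stands in for `rte_errno`, `struct sketch_response` for `struct pdump_response`, and `sketch_handle_response()` is a hypothetical helper. It only assumes what the commit message states, namely that the server replies with a negative `err_value` on failure and that callers expect a positive error number.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for rte_errno (assumption: a plain global here). */
static int sketch_errno;

/* Hypothetical stand-in for the MP reply; only the field used below. */
struct sketch_response {
	int err_value;	/* 0 on success, negative errno value on failure */
};

/* Return 0 on success; on failure return -1 and store a positive errno. */
static int sketch_handle_response(const struct sketch_response *resp)
{
	assert(resp != NULL);

	if (resp->err_value == 0)
		return 0;

	/* The server reports -EEXIST and friends; flip the sign for the caller. */
	sketch_errno = -resp->err_value;
	return -1;
}
```

With this shape, a caller can pass the stored value straight to a strerror-style function without negating it again.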
* [PATCH 2/4] dumpcap: allow multiple invocations
From: Stephen Hemminger @ 2023-09-21 4:23 UTC
To: dev; +Cc: Stephen Hemminger, Isaac Boukris, Reshma Pattan

If dumpcap is run twice with each instance pointing at a different
interface, it would fail because of overlap in ring and pool names.
Fix by putting process id in the name.

Fixes: cbb44143be74 ("app/dumpcap: add new packet capture application")
Reported-by: Isaac Boukris <iboukris@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/dumpcap/main.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
index 64294bbfb3e6..37754fd06f4f 100644
--- a/app/dumpcap/main.c
+++ b/app/dumpcap/main.c
@@ -44,7 +44,6 @@
 #include <pcap/pcap.h>
 #include <pcap/bpf.h>
 
-#define RING_NAME "capture-ring"
 #define MONITOR_INTERVAL (500 * 1000)
 #define MBUF_POOL_CACHE_SIZE 32
 #define BURST_SIZE 32
@@ -647,6 +646,7 @@ static void dpdk_init(void)
 static struct rte_ring *create_ring(void)
 {
 	struct rte_ring *ring;
+	char ring_name[RTE_RING_NAMESIZE];
 	size_t size, log2;
 
 	/* Find next power of 2 >= size. */
@@ -660,28 +660,28 @@ static struct rte_ring *create_ring(void)
 		ring_size = size;
 	}
 
-	ring = rte_ring_lookup(RING_NAME);
-	if (ring == NULL) {
-		ring = rte_ring_create(RING_NAME, ring_size,
-					rte_socket_id(), 0);
-		if (ring == NULL)
-			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
-				 rte_strerror(rte_errno));
-	}
+	/* Want one ring per invocation of program */
+	snprintf(ring_name, sizeof(ring_name),
+		 "dumpcap-%u", getpid());
+
+	ring = rte_ring_create(ring_name, ring_size,
+			       rte_socket_id(), 0);
+	if (ring == NULL)
+		rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+			 rte_strerror(rte_errno));
+
 	return ring;
 }
 
 static struct rte_mempool *create_mempool(void)
 {
 	const struct interface *intf;
-	static const char pool_name[] = "capture_mbufs";
+	char pool_name[RTE_MEMPOOL_NAMESIZE];
 	size_t num_mbufs = 2 * ring_size;
 	struct rte_mempool *mp;
 	uint32_t data_size = 128;
 
-	mp = rte_mempool_lookup(pool_name);
-	if (mp)
-		return mp;
+	snprintf(pool_name, sizeof(pool_name), "capture_%u", getpid());
 
 	/* Common pool so size mbuf for biggest snap length */
 	TAILQ_FOREACH(intf, &interfaces, next) {
@@ -826,7 +826,7 @@ static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
 			rte_exit(EXIT_FAILURE,
 				"Packet dump enable on %u:%s failed %s\n",
 				intf->port, intf->name,
-				rte_strerror(-ret));
+				rte_strerror(rte_errno));
 		}
 
 		if (intf->opts.promisc_mode) {
--
2.39.2
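
The naming scheme in this patch (a fixed prefix plus the process id) can be sketched without any DPDK dependency. Everything named `sketch_*` or `SKETCH_*` below is hypothetical; in particular `SKETCH_NAMESIZE` is an assumed buffer size chosen for the illustration and is not claimed to equal `RTE_RING_NAMESIZE` or `RTE_MEMPOOL_NAMESIZE`. Unlike the patch, the sketch also checks the `snprintf()` result for truncation, which is an addition of the sketch rather than something the patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed name buffer size for this sketch only. */
#define SKETCH_NAMESIZE 32

/*
 * Build "<prefix>-<pid>" in buf so that two processes never pick the
 * same object name. Returns 0 on success, -1 if the name did not fit.
 */
static int sketch_make_name(char *buf, size_t len, const char *prefix)
{
	int n;

	assert(buf != NULL && prefix != NULL);

	n = snprintf(buf, len, "%s-%u", prefix, (unsigned int)getpid());
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output error or truncated name */

	return 0;
}
```

Within one process the name is stable across calls, so the ring and the pool can each be created once; a second dumpcap instance gets a different pid and therefore different names.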
* RE: [PATCH 2/4] dumpcap: allow multiple invocations
From: Morten Brørup @ 2023-09-21 6:22 UTC
To: Stephen Hemminger, dev; +Cc: Isaac Boukris, Reshma Pattan

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, 21 September 2023 06.24
>
> If dumpcap is run twice with each instance pointing at a different
> interface, it would fail because of overlap in ring and pool names.
> Fix by putting process id in the name.
>
> Fixes: cbb44143be74 ("app/dumpcap: add new packet capture application")
> Reported-by: Isaac Boukris <iboukris@gmail.com>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/dumpcap/main.c | 28 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
> index 64294bbfb3e6..37754fd06f4f 100644
> --- a/app/dumpcap/main.c
> +++ b/app/dumpcap/main.c
> @@ -44,7 +44,6 @@
>  #include <pcap/pcap.h>
>  #include <pcap/bpf.h>
>
> -#define RING_NAME "capture-ring"
>  #define MONITOR_INTERVAL (500 * 1000)
>  #define MBUF_POOL_CACHE_SIZE 32
>  #define BURST_SIZE 32
> @@ -647,6 +646,7 @@ static void dpdk_init(void)
>  static struct rte_ring *create_ring(void)
>  {
>  	struct rte_ring *ring;
> +	char ring_name[RTE_RING_NAMESIZE];
>  	size_t size, log2;
>
>  	/* Find next power of 2 >= size. */
> @@ -660,28 +660,28 @@ static struct rte_ring *create_ring(void)
>  		ring_size = size;
>  	}
>
> -	ring = rte_ring_lookup(RING_NAME);
> -	if (ring == NULL) {
> -		ring = rte_ring_create(RING_NAME, ring_size,
> -					rte_socket_id(), 0);
> -		if (ring == NULL)
> -			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
> -				 rte_strerror(rte_errno));
> -	}
> +	/* Want one ring per invocation of program */
> +	snprintf(ring_name, sizeof(ring_name),
> +		 "dumpcap-%u", getpid());

I'm not sure getpid() is available on Windows. How about:

#ifdef _WIN32
#include <processthreadsapi.h> // With the headers, not here.
		 "dumpcap-%lu", GetCurrentProcessId());
#else
		 "dumpcap-%u", getpid());
#endif

> +
> +	ring = rte_ring_create(ring_name, ring_size,
> +			       rte_socket_id(), 0);
> +	if (ring == NULL)
> +		rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
> +			 rte_strerror(rte_errno));
> +
>  	return ring;
>  }
>
>  static struct rte_mempool *create_mempool(void)
>  {
>  	const struct interface *intf;
> -	static const char pool_name[] = "capture_mbufs";
> +	char pool_name[RTE_MEMPOOL_NAMESIZE];
>  	size_t num_mbufs = 2 * ring_size;
>  	struct rte_mempool *mp;
>  	uint32_t data_size = 128;
>
> -	mp = rte_mempool_lookup(pool_name);
> -	if (mp)
> -		return mp;
> +	snprintf(pool_name, sizeof(pool_name), "capture_%u", getpid());

Same regarding getpid().

>
>  	/* Common pool so size mbuf for biggest snap length */
>  	TAILQ_FOREACH(intf, &interfaces, next) {
> @@ -826,7 +826,7 @@ static void enable_pdump(struct rte_ring *r, struct
> rte_mempool *mp)
>  			rte_exit(EXIT_FAILURE,
>  				"Packet dump enable on %u:%s failed %s\n",
>  				intf->port, intf->name,
> -				rte_strerror(-ret));
> +				rte_strerror(rte_errno));
>  		}
>
>  		if (intf->opts.promisc_mode) {
> --
> 2.39.2
* Re: [PATCH 2/4] dumpcap: allow multiple invocations
From: Isaac Boukris @ 2023-09-21 7:10 UTC
To: Morten Brørup; +Cc: Stephen Hemminger, dev, Reshma Pattan

On Thu, Sep 21, 2023 at 9:22 AM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Thursday, 21 September 2023 06.24
> >
> > If dumpcap is run twice with each instance pointing at a different
> > interface, it would fail because of overlap in ring and pool names.
> > Fix by putting process id in the name.
> >
> > Fixes: cbb44143be74 ("app/dumpcap: add new packet capture application")
> > Reported-by: Isaac Boukris <iboukris@gmail.com>
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  app/dumpcap/main.c | 28 ++++++++++++++--------------
> >  1 file changed, 14 insertions(+), 14 deletions(-)
> >
> > diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
> > index 64294bbfb3e6..37754fd06f4f 100644
> > --- a/app/dumpcap/main.c
> > +++ b/app/dumpcap/main.c
> > @@ -44,7 +44,6 @@
> >  #include <pcap/pcap.h>
> >  #include <pcap/bpf.h>
> >
> > -#define RING_NAME "capture-ring"
> >  #define MONITOR_INTERVAL (500 * 1000)
> >  #define MBUF_POOL_CACHE_SIZE 32
> >  #define BURST_SIZE 32
> > @@ -647,6 +646,7 @@ static void dpdk_init(void)
> >  static struct rte_ring *create_ring(void)
> >  {
> >  	struct rte_ring *ring;
> > +	char ring_name[RTE_RING_NAMESIZE];
> >  	size_t size, log2;
> >
> >  	/* Find next power of 2 >= size. */
> > @@ -660,28 +660,28 @@ static struct rte_ring *create_ring(void)
> >  		ring_size = size;
> >  	}
> >
> > -	ring = rte_ring_lookup(RING_NAME);
> > -	if (ring == NULL) {
> > -		ring = rte_ring_create(RING_NAME, ring_size,
> > -					rte_socket_id(), 0);
> > -		if (ring == NULL)
> > -			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
> > -				 rte_strerror(rte_errno));
> > -	}
> > +	/* Want one ring per invocation of program */
> > +	snprintf(ring_name, sizeof(ring_name),
> > +		 "dumpcap-%u", getpid());
>
> I'm not sure getpid() is available on Windows. How about:

I think the 'app/dumpcap/meson.build' file indicates no support for Windows.

Regards

> #ifdef _WIN32
> #include <processthreadsapi.h> // With the headers, not here.
> 		 "dumpcap-%lu", GetCurrentProcessId());
> #else
> 		 "dumpcap-%u", getpid());
> #endif
>
> > +
> > +	ring = rte_ring_create(ring_name, ring_size,
> > +			       rte_socket_id(), 0);
> > +	if (ring == NULL)
> > +		rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
> > +			 rte_strerror(rte_errno));
> > +
> >  	return ring;
> >  }
> >
> >  static struct rte_mempool *create_mempool(void)
> >  {
> >  	const struct interface *intf;
> > -	static const char pool_name[] = "capture_mbufs";
> > +	char pool_name[RTE_MEMPOOL_NAMESIZE];
> >  	size_t num_mbufs = 2 * ring_size;
> >  	struct rte_mempool *mp;
> >  	uint32_t data_size = 128;
> >
> > -	mp = rte_mempool_lookup(pool_name);
> > -	if (mp)
> > -		return mp;
> > +	snprintf(pool_name, sizeof(pool_name), "capture_%u", getpid());
>
> Same regarding getpid().
>
> >
> >  	/* Common pool so size mbuf for biggest snap length */
> >  	TAILQ_FOREACH(intf, &interfaces, next) {
> > @@ -826,7 +826,7 @@ static void enable_pdump(struct rte_ring *r, struct
> > rte_mempool *mp)
> >  			rte_exit(EXIT_FAILURE,
> >  				"Packet dump enable on %u:%s failed %s\n",
> >  				intf->port, intf->name,
> > -				rte_strerror(-ret));
> > +				rte_strerror(rte_errno));
> >  		}
> >
> >  		if (intf->opts.promisc_mode) {
> > --
> > 2.39.2
>
* Re: [PATCH 2/4] dumpcap: allow multiple invocations
From: Stephen Hemminger @ 2023-11-07 2:34 UTC
To: Morten Brørup; +Cc: dev, Isaac Boukris, Reshma Pattan

On Thu, 21 Sep 2023 08:22:12 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> I'm not sure getpid() is available on Windows. How about:
>
> #ifdef _WIN32
> #include <processthreadsapi.h> // With the headers, not here.
> 		 "dumpcap-%lu", GetCurrentProcessId());
> #else
> 		 "dumpcap-%u", getpid());
> #endif

Dumpcap doesn't support Windows because many parts of the pdump
library will not work there.
* [PATCH 3/4] pcapng: change timestamp argument to write_stats 2023-09-21 4:23 [PATCH 0/4] pcapng fixes Stephen Hemminger 2023-09-21 4:23 ` [PATCH 1/4] pdump: fix setting rte_errno on mp error Stephen Hemminger 2023-09-21 4:23 ` [PATCH 2/4] dumpcap: allow multiple invocations Stephen Hemminger @ 2023-09-21 4:23 ` Stephen Hemminger 2023-09-21 4:23 ` [PATCH 4/4] pcapng: move timestamp calculation into pdump Stephen Hemminger ` (6 subsequent siblings) 9 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-09-21 4:23 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Reshma Pattan In order to cleanup the management of time base calculation, later patch will move the calculation from pcapng to the pdump library. One of the changes necessary is to move the timestamp calculation in the write_stats call from the pcapng library into the caller. Since dumpcap already does this for other timestamps the change is rather small. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/dumpcap/main.c | 3 ++- app/test/test_pcapng.c | 4 ++-- lib/pcapng/rte_pcapng.c | 8 +++----- lib/pcapng/rte_pcapng.h | 5 ++++- 4 files changed, 11 insertions(+), 9 deletions(-) diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c index 37754fd06f4f..8f6ab3396cef 100644 --- a/app/dumpcap/main.c +++ b/app/dumpcap/main.c @@ -577,6 +577,7 @@ report_packet_stats(dumpcap_out_t out) struct rte_pdump_stats pdump_stats; struct interface *intf; uint64_t ifrecv, ifdrop; + uint64_t timestamp = create_timestamp(); double percent; fputc('\n', stderr); @@ -590,7 +591,7 @@ report_packet_stats(dumpcap_out_t out) if (use_pcapng) rte_pcapng_write_stats(out.pcapng, intf->port, NULL, - start_time, end_time, + timestamp, start_time, end_time, ifrecv, ifdrop); if (ifrecv == 0) diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index b8429a02f160..55aa2cf93666 100644 --- a/app/test/test_pcapng.c +++ b/app/test/test_pcapng.c @@ -173,8 +173,8 @@ test_write_stats(void) ssize_t 
len; /* write a statistics block */ - len = rte_pcapng_write_stats(pcapng, port_id, - NULL, 0, 0, + len = rte_pcapng_write_stats(pcapng, port_id, NULL, + 0, 0, 0, NUM_PACKETS, 0); if (len <= 0) { fprintf(stderr, "Write of statistics failed\n"); diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index 3c91fc77644a..ddce7bc87141 100644 --- a/lib/pcapng/rte_pcapng.c +++ b/lib/pcapng/rte_pcapng.c @@ -368,7 +368,7 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, */ ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, - const char *comment, + const char *comment, uint64_t sample_time, uint64_t start_time, uint64_t end_time, uint64_t ifrecv, uint64_t ifdrop) { @@ -376,7 +376,6 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, struct pcapng_option *opt; uint32_t optlen, len; uint8_t *buf; - uint64_t ns; RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); @@ -425,9 +424,8 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, hdr->block_length = len; hdr->interface_id = self->port_index[port_id]; - ns = pcapng_tsc_to_ns(rte_get_tsc_cycles()); - hdr->timestamp_hi = ns >> 32; - hdr->timestamp_lo = (uint32_t)ns; + hdr->timestamp_hi = sample_time >> 32; + hdr->timestamp_lo = (uint32_t)sample_time; /* clone block_length after option */ memcpy(opt, &len, sizeof(uint32_t)); diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h index d93cc9f73ad5..1225ed5536ff 100644 --- a/lib/pcapng/rte_pcapng.h +++ b/lib/pcapng/rte_pcapng.h @@ -189,7 +189,9 @@ rte_pcapng_write_packets(rte_pcapng_t *self, * @param port * The Ethernet port to report stats on. * @param comment - * Optional comment to add to statistics. + * Optional comment to add to statistics. + * @param timestamp + * Time this statistic sample refers to in nanoseconds. * @param start_time * The time when packet capture was started in nanoseconds. * Optional: can be zero if not known. 
@@ -209,6 +211,7 @@ __rte_experimental ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port, const char *comment, + uint64_t timestamp, uint64_t start_time, uint64_t end_time, uint64_t ifrecv, uint64_t ifdrop); -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 4/4] pcapng: move timestamp calculation into pdump 2023-09-21 4:23 [PATCH 0/4] pcapng fixes Stephen Hemminger ` (2 preceding siblings ...) 2023-09-21 4:23 ` [PATCH 3/4] pcapng: change timestamp argument to write_stats Stephen Hemminger @ 2023-09-21 4:23 ` Stephen Hemminger 2023-10-02 8:15 ` David Marchand 2023-10-05 23:06 ` [PATCH v2 0/4] dumpcap and pcapng fixes Stephen Hemminger ` (5 subsequent siblings) 9 siblings, 1 reply; 61+ messages in thread From: Stephen Hemminger @ 2023-09-21 4:23 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Reshma Pattan, Quentin Armitage The computation of timestamp is more easily done in pdump than pcapng. The initialization is easier and makes the pcapng library have no global state. It also makes it easier to add HW timestamp support later. Simplify the computation of nanoseconds from TSC to a two step process which avoids numeric overflow issues. The previous code was not thread safe as well. Fixes: c882eb544842 ("pcapng: fix timestamp wrapping in output files") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- lib/pcapng/rte_pcapng.c | 71 ++--------------------------------------- lib/pcapng/rte_pcapng.h | 2 +- lib/pdump/rte_pdump.c | 56 +++++++++++++++++++++++++++++--- 3 files changed, 55 insertions(+), 74 deletions(-) diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index ddce7bc87141..f6b3bd0ca718 100644 --- a/lib/pcapng/rte_pcapng.c +++ b/lib/pcapng/rte_pcapng.c @@ -25,7 +25,6 @@ #include <rte_mbuf.h> #include <rte_os_shim.h> #include <rte_pcapng.h> -#include <rte_reciprocal.h> #include <rte_time.h> #include "pcapng_proto.h" @@ -43,15 +42,6 @@ struct rte_pcapng { uint32_t port_index[RTE_MAX_ETHPORTS]; }; -/* For converting TSC cycles to PCAPNG ns format */ -static struct pcapng_time { - uint64_t ns; - uint64_t cycles; - uint64_t tsc_hz; - struct rte_reciprocal_u64 tsc_hz_inverse; -} pcapng_time; - - #ifdef RTE_EXEC_ENV_WINDOWS /* * Windows does not have writev() call. 
@@ -102,58 +92,6 @@ static ssize_t writev(int fd, const struct iovec *iov, int iovcnt) #define if_indextoname(ifindex, ifname) NULL #endif -static inline void -pcapng_init(void) -{ - struct timespec ts; - - pcapng_time.cycles = rte_get_tsc_cycles(); - clock_gettime(CLOCK_REALTIME, &ts); - pcapng_time.cycles = (pcapng_time.cycles + rte_get_tsc_cycles()) / 2; - pcapng_time.ns = rte_timespec_to_ns(&ts); - - pcapng_time.tsc_hz = rte_get_tsc_hz(); - pcapng_time.tsc_hz_inverse = rte_reciprocal_value_u64(pcapng_time.tsc_hz); -} - -/* PCAPNG timestamps are in nanoseconds */ -static uint64_t pcapng_tsc_to_ns(uint64_t cycles) -{ - uint64_t delta, secs; - - if (!pcapng_time.tsc_hz) - pcapng_init(); - - /* In essence the calculation is: - * delta = (cycles - pcapng_time.cycles) * NSEC_PRE_SEC / rte_get_tsc_hz() - * but this overflows within 4 to 8 seconds depending on TSC frequency. - * Instead, if delta >= pcapng_time.tsc_hz: - * Increase pcapng_time.ns and pcapng_time.cycles by the number of - * whole seconds in delta and reduce delta accordingly. - * delta will therefore always lie in the interval [0, pcapng_time.tsc_hz), - * which will not overflow when multiplied by NSEC_PER_SEC provided the - * TSC frequency < approx 18.4GHz. - * - * Currently all TSCs operate below 5GHz. 
- */ - delta = cycles - pcapng_time.cycles; - if (unlikely(delta >= pcapng_time.tsc_hz)) { - if (likely(delta < pcapng_time.tsc_hz * 2)) { - delta -= pcapng_time.tsc_hz; - pcapng_time.cycles += pcapng_time.tsc_hz; - pcapng_time.ns += NSEC_PER_SEC; - } else { - secs = rte_reciprocal_divide_u64(delta, &pcapng_time.tsc_hz_inverse); - delta -= secs * pcapng_time.tsc_hz; - pcapng_time.cycles += secs * pcapng_time.tsc_hz; - pcapng_time.ns += secs * NSEC_PER_SEC; - } - } - - return pcapng_time.ns + rte_reciprocal_divide_u64(delta * NSEC_PER_SEC, - &pcapng_time.tsc_hz_inverse); -} - /* length of option including padding */ static uint16_t pcapng_optlen(uint16_t len) { @@ -518,7 +456,7 @@ struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue, const struct rte_mbuf *md, struct rte_mempool *mp, - uint32_t length, uint64_t cycles, + uint32_t length, uint64_t timestamp, enum rte_pcapng_direction direction, const char *comment) { @@ -527,14 +465,11 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue, struct pcapng_option *opt; uint16_t optlen; struct rte_mbuf *mc; - uint64_t ns; bool rss_hash; #ifdef RTE_LIBRTE_ETHDEV_DEBUG RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL); #endif - ns = pcapng_tsc_to_ns(cycles); - orig_len = rte_pktmbuf_pkt_len(md); /* Take snapshot of the data */ @@ -639,8 +574,8 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue, /* Interface index is filled in later during write */ mc->port = port_id; - epb->timestamp_hi = ns >> 32; - epb->timestamp_lo = (uint32_t)ns; + epb->timestamp_hi = timestamp >> 32; + epb->timestamp_lo = (uint32_t)timestamp; epb->capture_length = data_len; epb->original_length = orig_len; diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h index 1225ed5536ff..b9a9ee23ad1d 100644 --- a/lib/pcapng/rte_pcapng.h +++ b/lib/pcapng/rte_pcapng.h @@ -122,7 +122,7 @@ enum rte_pcapng_direction { * The upper limit on bytes to copy. Passing UINT32_MAX * means all data (after offset). 
* @param timestamp - * The timestamp in TSC cycles. + * The timestamp in nanoseconds since 1/1/1970. * @param direction * The direction of the packer: receive, transmit or unknown. * @param comment diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index a70085bd0211..384abf5e27ad 100644 --- a/lib/pdump/rte_pdump.c +++ b/lib/pdump/rte_pdump.c @@ -10,7 +10,9 @@ #include <rte_log.h> #include <rte_memzone.h> #include <rte_errno.h> +#include <rte_reciprocal.h> #include <rte_string_fns.h> +#include <rte_time.h> #include <rte_pcapng.h> #include "rte_pdump.h" @@ -78,6 +80,33 @@ static struct { const struct rte_memzone *mz; } *pdump_stats; +/* Time conversion values */ +static struct { + uint64_t offset_ns; /* ns since 1/1/1970 when initialized */ + uint64_t tsc_base; /* TSC when initialized */ + uint64_t tsc_hz; /* copy of rte_tsc_hz() */ + struct rte_reciprocal_u64 tsc_hz_inverse; /* inverse of tsc_hz */ +} pdump_time; + +/* Convert from TSC (CPU cycles) to nanoseconds */ +static uint64_t pdump_timestamp(void) +{ + uint64_t delta, secs, ns; + + delta = rte_get_tsc_cycles() - pdump_time.tsc_base; + + /* Avoid numeric wraparound by computing seconds first */ + secs = rte_reciprocal_divide_u64(delta, &pdump_time.tsc_hz_inverse); + + /* Remove the seconds portion */ + delta -= secs * pdump_time.tsc_hz; + ns = rte_reciprocal_divide_u64(delta * NS_PER_S, + &pdump_time.tsc_hz_inverse); + + return secs * NS_PER_S + ns + pdump_time.offset_ns; +} + + /* Create a clone of mbuf to be placed into ring. 
*/ static void pdump_copy(uint16_t port_id, uint16_t queue, @@ -90,7 +119,7 @@ pdump_copy(uint16_t port_id, uint16_t queue, int ring_enq; uint16_t d_pkts = 0; struct rte_mbuf *dup_bufs[nb_pkts]; - uint64_t ts; + uint64_t timestamp = 0; struct rte_ring *ring; struct rte_mempool *mp; struct rte_mbuf *p; @@ -99,7 +128,6 @@ pdump_copy(uint16_t port_id, uint16_t queue, if (cbs->filter) rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts); - ts = rte_get_tsc_cycles(); ring = cbs->ring; mp = cbs->mp; for (i = 0; i < nb_pkts; i++) { @@ -119,12 +147,17 @@ pdump_copy(uint16_t port_id, uint16_t queue, * If using pcapng then want to wrap packets * otherwise a simple copy. */ - if (cbs->ver == V2) + if (cbs->ver == V2) { + /* calculate timestamp on first packet */ + if (timestamp == 0) + timestamp = pdump_timestamp(); + p = rte_pcapng_copy(port_id, queue, pkts[i], mp, cbs->snaplen, - ts, direction, NULL); - else + timestamp, direction, NULL); + } else { p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen); + } if (unlikely(p == NULL)) __atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED); @@ -421,8 +454,21 @@ int rte_pdump_init(void) { const struct rte_memzone *mz; + struct timespec ts; + uint64_t cycles; int ret; + /* Compute time base offsets */ + cycles = rte_get_tsc_cycles(); + clock_gettime(CLOCK_REALTIME, &ts); + + /* put initial TSC value in middle of clock_gettime() call */ + pdump_time.tsc_base = (cycles + rte_get_tsc_cycles()) / 2; + pdump_time.offset_ns = rte_timespec_to_ns(&ts); + + pdump_time.tsc_hz = rte_get_tsc_hz(); + pdump_time.tsc_hz_inverse = rte_reciprocal_value_u64(pdump_time.tsc_hz); + mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats), rte_socket_id(), 0); if (mz == NULL) { -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
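
The two-step conversion in `pdump_timestamp()` can be sketched in plain C. This is an illustration under stated assumptions, not the patch's code: it uses ordinary 64-bit division where the patch uses `rte_reciprocal_divide_u64()`, and `sketch_cycles_to_ns()` and `SKETCH_NS_PER_S` are hypothetical names. The idea is that the remainder after removing whole seconds is always smaller than the clock frequency, so multiplying it by 10^9 stays within 64 bits as long as the frequency is below roughly 18.4 GHz.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NS_PER_S UINT64_C(1000000000)

/*
 * Convert an elapsed cycle count to nanoseconds in two steps:
 * whole seconds first, then the sub-second remainder.
 */
static uint64_t sketch_cycles_to_ns(uint64_t delta, uint64_t hz)
{
	uint64_t secs, rem;

	assert(hz != 0);

	/* Step 1: whole seconds; secs * 1e9 does not overflow for centuries. */
	secs = delta / hz;

	/* Step 2: leftover cycles, always in [0, hz). */
	rem = delta - secs * hz;

	return secs * SKETCH_NS_PER_S + rem * SKETCH_NS_PER_S / hz;
}
```

A single-step `delta * 1e9 / hz` wraps once `delta` exceeds about 1.8e10 cycles, which is only a few seconds of uptime at typical TSC rates; the two-step form avoids that without keeping mutable global state.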
* Re: [PATCH 4/4] pcapng: move timestamp calculation into pdump
From: David Marchand @ 2023-10-02 8:15 UTC
To: Stephen Hemminger; +Cc: dev, Reshma Pattan, Quentin Armitage

Hello Stephen,

On Thu, Sep 21, 2023 at 6:24 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> The computation of timestamp is more easily done in pdump
> than pcapng. The initialization is easier and makes the pcapng
> library have no global state.
>
> It also makes it easier to add HW timestamp support later.
>
> Simplify the computation of nanoseconds from TSC to a two
> step process which avoids numeric overflow issues. The previous
> code was not thread safe as well.
>

Bugzilla ID: 1291 ?

This patch (and patch 3) updates some pcapng API, is it worth a RN update?

> Fixes: c882eb544842 ("pcapng: fix timestamp wrapping in output files")

Is it worth backporting?
I would say no, as some API update was needed to fix the issue.
But on the other hand, this is an experimental API, so I prefer to ask.

> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

--
David Marchand
* Re: [PATCH 4/4] pcapng: move timestamp calculation into pdump
From: Stephen Hemminger @ 2023-10-04 17:13 UTC
To: David Marchand; +Cc: dev, Reshma Pattan, Quentin Armitage

On Mon, 2 Oct 2023 10:15:25 +0200
David Marchand <david.marchand@redhat.com> wrote:

> >
>
> Bugzilla ID: 1291 ?
>
> This patch (and patch 3) updates some pcapng API, is it worth a RN update?
>
> > Fixes: c882eb544842 ("pcapng: fix timestamp wrapping in output files")
>
> Is it worth backporting?
> I would say no, as some API update was needed to fix the issue.
> But on the other hand, this is an experimental API, so I prefer to ask.
>
>
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Good question.
Is experimental API allowed to change in a stable release?
* Re: [PATCH 4/4] pcapng: move timestamp calculation into pdump
From: David Marchand @ 2023-10-06 9:10 UTC
To: Stephen Hemminger; +Cc: dev, Reshma Pattan, Quentin Armitage, Thomas Monjalon, Kevin Traynor

On Wed, Oct 4, 2023 at 7:13 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 2 Oct 2023 10:15:25 +0200
> David Marchand <david.marchand@redhat.com> wrote:
>
> > >
> >
> > Bugzilla ID: 1291 ?
> >
> > This patch (and patch 3) updates some pcapng API, is it worth a RN update?
> >
> > > Fixes: c882eb544842 ("pcapng: fix timestamp wrapping in output files")
> >
> > Is it worth backporting?
> > I would say no, as some API update was needed to fix the issue.
> > But on the other hand, this is an experimental API, so I prefer to ask.
> >
> >
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
> Good question.
> Is experimental API allowed to change in a stable release?

I don't think this is clearly described in our ABI policy.
An experimental API may be changed at any time, but nothing is said
wrt backports.

Breaking an API is always a pain, and for an LTS release it would
probably be badly accepted by users.

Cc: Kevin for his opinion.

We may need a clarification on this topic in the doc.

--
David Marchand
* Re: [PATCH 4/4] pcapng: move timestamp calculation into pdump
From: Kevin Traynor @ 2023-10-06 14:59 UTC
To: David Marchand, Stephen Hemminger; +Cc: dev, Reshma Pattan, Quentin Armitage, Thomas Monjalon, Luca Boccassi, Xueming(Steven) Li, Christian Ehrhardt

On 06/10/2023 10:10, David Marchand wrote:
> On Wed, Oct 4, 2023 at 7:13 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>>
>> On Mon, 2 Oct 2023 10:15:25 +0200
>> David Marchand <david.marchand@redhat.com> wrote:
>>
>>>>
>>>
>>> Bugzilla ID: 1291 ?
>>>
>>> This patch (and patch 3) updates some pcapng API, is it worth a RN update?
>>>
>>>> Fixes: c882eb544842 ("pcapng: fix timestamp wrapping in output files")
>>>
>>> Is it worth backporting?
>>> I would say no, as some API update was needed to fix the issue.
>>> But on the other hand, this is an experimental API, so I prefer to ask.
>>>
>>>
>>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>
>> Good question.
>> Is experimental API allowed to change in a stable release?
>
> I don't think this is clearly described in our ABI policy.
> An experimental API may be changed at any time, but nothing is said
> wrt backports.
>
> Breaking an API is always a pain, and for an LTS release it would
> probably be badly accepted by users.
>

Yes, I agree. IIRC, this arose sometime in the past with a branch that
Luca was maintaining, and I think the consensus among LTS maintainers
was not to change experimental API on stable branches.

> Cc: Kevin for his opinion.
>
> We may need a clarification on this topic in the doc.
>
>

Perhaps it's not a "rule", since experimental API comes with no
guarantee, but I can add something to the docs saying that it is a
guideline not to break experimental API on stable branches.
* [PATCH v2 0/4] dumpcap and pcapng fixes
From: Stephen Hemminger @ 2023-10-05 23:06 UTC
To: dev; +Cc: Stephen Hemminger

This version slightly modifies the pcapng API to fix issues
related to timestamping. The design choice is to maximize
performance in the primary process and to do all the time
adjustment in the secondary process (dumpcap), since dumpcap
needs to make system calls anyway to write the result.

This patch set moves the adjustment calculation into the part
of pcapng that opens the output file. All details of the
timestamp format are contained inside pcapng (data hiding).

Stephen Hemminger (4):
  pdump: fix setting rte_errno on mp error
  dumpcap: allow multiple invocations
  pcapng: modify timestamp calculation
  test: cleanups to pcapng test

 app/dumpcap/main.c      |  53 +++---
 app/test/meson.build    |   2 +-
 app/test/test_pcapng.c  | 378 +++++++++++++++++++++++++---------------
 lib/graph/graph_pcap.c  |   2 +-
 lib/pcapng/rte_pcapng.c | 119 +++++--------
 lib/pcapng/rte_pcapng.h |  19 +-
 lib/pdump/rte_pdump.c   |   9 +-
 7 files changed, 318 insertions(+), 264 deletions(-)

--
2.39.2
* [PATCH v2 1/4] pdump: fix setting rte_errno on mp error 2023-10-05 23:06 ` [PATCH v2 0/4] dumpcap and pcapng fixes Stephen Hemminger @ 2023-10-05 23:06 ` Stephen Hemminger 2023-10-05 23:06 ` [PATCH v2 2/4] dumpcap: allow multiple invocations Stephen Hemminger ` (2 subsequent siblings) 3 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-10-05 23:06 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Reshma Pattan, Jianfeng Tan The response from MP server sets err_value to negative on error. The convention for rte_errno is to use a positive value on error. This makes errors like duplicate registration show up with the correct error value. Fixes: 660098d61f57 ("pdump: use generic multi-process channel") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- lib/pdump/rte_pdump.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index 53cca1034d41..a70085bd0211 100644 --- a/lib/pdump/rte_pdump.c +++ b/lib/pdump/rte_pdump.c @@ -564,9 +564,10 @@ pdump_prepare_client_request(const char *device, uint16_t queue, if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) { mp_rep = &mp_reply.msgs[0]; resp = (struct pdump_response *)mp_rep->param; - rte_errno = resp->err_value; - if (!resp->err_value) + if (resp->err_value == 0) ret = 0; + else + rte_errno = -resp->err_value; free(mp_reply.msgs); } -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
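The sign convention described in the commit message above can be sketched outside DPDK as follows. This is only an illustration under stated assumptions: `struct response`, `my_errno` and `handle_response` are hypothetical stand-ins for `struct pdump_response`, `rte_errno` and the client request path.

```c
#include <errno.h>

/* Hypothetical stand-in for the pdump response; only err_value matters here. */
struct response {
	int err_value;	/* 0 on success, negative errno value on failure */
};

/* Hypothetical stand-in for rte_errno (which is per-thread in DPDK). */
static int my_errno;

/*
 * Returns 0 on success, -1 on failure.
 * On failure my_errno holds a positive error code; on success it is untouched.
 */
static int
handle_response(const struct response *resp)
{
	if (resp->err_value == 0)
		return 0;

	my_errno = -resp->err_value;	/* flip the sign: -EEXIST becomes EEXIST */
	return -1;
}
```

With the sign flipped, a caller can pass the stored value straight to a `strerror()`-style function, which expects positive codes.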
* [PATCH v2 2/4] dumpcap: allow multiple invocations 2023-10-05 23:06 ` [PATCH v2 0/4] dumpcap and pcapng fixes Stephen Hemminger 2023-10-05 23:06 ` [PATCH v2 1/4] pdump: fix setting rte_errno on mp error Stephen Hemminger @ 2023-10-05 23:06 ` Stephen Hemminger 2023-10-05 23:06 ` [PATCH v2 3/4] pcapng: modify timestamp calculation Stephen Hemminger 2023-10-05 23:06 ` [PATCH v2 4/4] test: cleanups to pcapng test Stephen Hemminger 3 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-10-05 23:06 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Isaac Boukris, Reshma Pattan If dumpcap is run twice with each instance pointing at a different interface, it would fail because of overlap in ring and pool names. Fix by putting the process id in the name. Fixes: cbb44143be74 ("app/dumpcap: add new packet capture application") Reported-by: Isaac Boukris <iboukris@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/dumpcap/main.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c index 64294bbfb3e6..37754fd06f4f 100644 --- a/app/dumpcap/main.c +++ b/app/dumpcap/main.c @@ -44,7 +44,6 @@ #include <pcap/pcap.h> #include <pcap/bpf.h> -#define RING_NAME "capture-ring" #define MONITOR_INTERVAL (500 * 1000) #define MBUF_POOL_CACHE_SIZE 32 #define BURST_SIZE 32 @@ -647,6 +646,7 @@ static void dpdk_init(void) static struct rte_ring *create_ring(void) { struct rte_ring *ring; + char ring_name[RTE_RING_NAMESIZE]; size_t size, log2; /* Find next power of 2 >= size.
*/ @@ -660,28 +660,28 @@ static struct rte_ring *create_ring(void) ring_size = size; } - ring = rte_ring_lookup(RING_NAME); - if (ring == NULL) { - ring = rte_ring_create(RING_NAME, ring_size, - rte_socket_id(), 0); - if (ring == NULL) - rte_exit(EXIT_FAILURE, "Could not create ring :%s\n", - rte_strerror(rte_errno)); - } + /* Want one ring per invocation of program */ + snprintf(ring_name, sizeof(ring_name), + "dumpcap-%u", getpid()); + + ring = rte_ring_create(ring_name, ring_size, + rte_socket_id(), 0); + if (ring == NULL) + rte_exit(EXIT_FAILURE, "Could not create ring :%s\n", + rte_strerror(rte_errno)); + return ring; } static struct rte_mempool *create_mempool(void) { const struct interface *intf; - static const char pool_name[] = "capture_mbufs"; + char pool_name[RTE_MEMPOOL_NAMESIZE]; size_t num_mbufs = 2 * ring_size; struct rte_mempool *mp; uint32_t data_size = 128; - mp = rte_mempool_lookup(pool_name); - if (mp) - return mp; + snprintf(pool_name, sizeof(pool_name), "capture_%u", getpid()); /* Common pool so size mbuf for biggest snap length */ TAILQ_FOREACH(intf, &interfaces, next) { @@ -826,7 +826,7 @@ static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp) rte_exit(EXIT_FAILURE, "Packet dump enable on %u:%s failed %s\n", intf->port, intf->name, - rte_strerror(-ret)); + rte_strerror(rte_errno)); } if (intf->opts.promisc_mode) { -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
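The naming scheme used by this patch can be sketched without DPDK. The helper below is a hypothetical illustration, not code from dumpcap: `make_instance_name` and `NAME_SIZE` are assumed names, with `NAME_SIZE` standing in for limits such as `RTE_RING_NAMESIZE`. Unlike the patch, it also reports truncation.

```c
#include <stdio.h>
#include <sys/types.h>

#define NAME_SIZE 32	/* assumed limit, standing in for RTE_RING_NAMESIZE */

/*
 * Build "<prefix>-<pid>" in buf so that each process gets its own name.
 * Returns 0 on success, -1 if the name would not fit in the buffer.
 */
static int
make_instance_name(char *buf, size_t size, const char *prefix, pid_t pid)
{
	int n = snprintf(buf, size, "%s-%ld", prefix, (long)pid);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A caller would typically pass `getpid()` as the last argument; two concurrent processes then never produce the same name.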
* [PATCH v2 3/4] pcapng: modify timestamp calculation 2023-10-05 23:06 ` [PATCH v2 0/4] dumpcap and pcapng fixes Stephen Hemminger 2023-10-05 23:06 ` [PATCH v2 1/4] pdump: fix setting rte_errno on mp error Stephen Hemminger 2023-10-05 23:06 ` [PATCH v2 2/4] dumpcap: allow multiple invocations Stephen Hemminger @ 2023-10-05 23:06 ` Stephen Hemminger 2023-10-05 23:06 ` [PATCH v2 4/4] test: cleanups to pcapng test Stephen Hemminger 3 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-10-05 23:06 UTC (permalink / raw) To: dev Cc: Stephen Hemminger, Reshma Pattan, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan, Quentin Armitage The computation of the timestamp is best done in the part of the pcapng library that runs in the secondary process. The secondary process is already doing a bunch of system calls, which makes it not performance sensitive. Simplify the computation of nanoseconds from TSC to a two-step process which avoids numeric overflow issues. The previous code was also not thread safe.
Fixes: c882eb544842 ("pcapng: fix timestamp wrapping in output files") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/dumpcap/main.c | 25 +++------ app/test/test_pcapng.c | 4 +- lib/graph/graph_pcap.c | 2 +- lib/pcapng/rte_pcapng.c | 119 +++++++++++++++------------------------- lib/pcapng/rte_pcapng.h | 19 ++----- lib/pdump/rte_pdump.c | 4 +- 6 files changed, 61 insertions(+), 112 deletions(-) diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c index 37754fd06f4f..764dac6c37c0 100644 --- a/app/dumpcap/main.c +++ b/app/dumpcap/main.c @@ -66,13 +66,13 @@ static bool print_stats; /* capture limit options */ static struct { - uint64_t duration; /* nanoseconds */ + time_t duration; /* seconds */ unsigned long packets; /* number of packets in file */ size_t size; /* file size (bytes) */ } stop; /* Running state */ -static uint64_t start_time, end_time; +static time_t start_time; static uint64_t packets_received; static size_t file_size; @@ -197,7 +197,7 @@ static void auto_stop(char *opt) if (*value == '\0' || *endp != '\0' || interval <= 0) rte_exit(EXIT_FAILURE, "Invalid duration \"%s\"\n", value); - stop.duration = NSEC_PER_SEC * interval; + stop.duration = interval; } else if (strcmp(opt, "filesize") == 0) { stop.size = get_uint(value, "filesize", 0) * 1024; } else if (strcmp(opt, "packets") == 0) { @@ -511,15 +511,6 @@ static void statistics_loop(void) } } -/* Return the time since 1/1/1970 in nanoseconds */ -static uint64_t create_timestamp(void) -{ - struct timespec now; - - clock_gettime(CLOCK_MONOTONIC, &now); - return rte_timespec_to_ns(&now); -} - static void cleanup_pdump_resources(void) { @@ -589,9 +580,8 @@ report_packet_stats(dumpcap_out_t out) ifdrop = pdump_stats.nombuf + pdump_stats.ringfull; if (use_pcapng) - rte_pcapng_write_stats(out.pcapng, intf->port, NULL, - start_time, end_time, - ifrecv, ifdrop); + rte_pcapng_write_stats(out.pcapng, intf->port, + ifrecv, ifdrop, NULL); if (ifrecv == 0) percent = 0; @@ -983,7 +973,7 @@ int 
main(int argc, char **argv) mp = create_mempool(); out = create_output(); - start_time = create_timestamp(); + start_time = time(NULL); enable_pdump(r, mp); if (!quiet) { @@ -1005,11 +995,10 @@ int main(int argc, char **argv) break; if (stop.duration != 0 && - create_timestamp() - start_time > stop.duration) + time(NULL) - start_time > stop.duration) break; } - end_time = create_timestamp(); disable_primary_monitor(); if (rte_eal_primary_proc_alive(NULL)) diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index b8429a02f160..55aa2cf93666 100644 --- a/app/test/test_pcapng.c +++ b/app/test/test_pcapng.c @@ -173,8 +173,8 @@ test_write_stats(void) ssize_t len; /* write a statistics block */ - len = rte_pcapng_write_stats(pcapng, port_id, - NULL, 0, 0, + len = rte_pcapng_write_stats(pcapng, port_id, NULL, + 0, 0, 0, NUM_PACKETS, 0); if (len <= 0) { fprintf(stderr, "Write of statistics failed\n"); diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c index db722c375fa7..89525f1220ca 100644 --- a/lib/graph/graph_pcap.c +++ b/lib/graph/graph_pcap.c @@ -214,7 +214,7 @@ graph_pcap_dispatch(struct rte_graph *graph, mbuf = (struct rte_mbuf *)objs[i]; mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len, - rte_get_tsc_cycles(), 0, buffer); + 0, buffer); if (mc == NULL) break; diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index 3c91fc77644a..13fd2b97fb80 100644 --- a/lib/pcapng/rte_pcapng.c +++ b/lib/pcapng/rte_pcapng.c @@ -36,22 +36,14 @@ /* Format of the capture file handle */ struct rte_pcapng { int outfd; /* output file */ - unsigned int ports; /* number of interfaces added */ + uint64_t offset_ns; /* ns since 1/1/1970 when initialized */ + uint64_t tsc_base; /* TSC when started */ /* DPDK port id to interface index in file */ uint32_t port_index[RTE_MAX_ETHPORTS]; }; -/* For converting TSC cycles to PCAPNG ns format */ -static struct pcapng_time { - uint64_t ns; - uint64_t cycles; - uint64_t tsc_hz; - struct rte_reciprocal_u64 
tsc_hz_inverse; -} pcapng_time; - - #ifdef RTE_EXEC_ENV_WINDOWS /* * Windows does not have writev() call. @@ -102,56 +94,21 @@ static ssize_t writev(int fd, const struct iovec *iov, int iovcnt) #define if_indextoname(ifindex, ifname) NULL #endif -static inline void -pcapng_init(void) +/* Convert from TSC (CPU cycles) to nanoseconds */ +static uint64_t +pcapng_timestamp(const rte_pcapng_t *self, uint64_t cycles) { - struct timespec ts; + uint64_t delta, rem, secs, ns; + const uint64_t hz = rte_get_tsc_hz(); - pcapng_time.cycles = rte_get_tsc_cycles(); - clock_gettime(CLOCK_REALTIME, &ts); - pcapng_time.cycles = (pcapng_time.cycles + rte_get_tsc_cycles()) / 2; - pcapng_time.ns = rte_timespec_to_ns(&ts); - - pcapng_time.tsc_hz = rte_get_tsc_hz(); - pcapng_time.tsc_hz_inverse = rte_reciprocal_value_u64(pcapng_time.tsc_hz); -} + delta = cycles - self->tsc_base; -/* PCAPNG timestamps are in nanoseconds */ -static uint64_t pcapng_tsc_to_ns(uint64_t cycles) -{ - uint64_t delta, secs; - - if (!pcapng_time.tsc_hz) - pcapng_init(); - - /* In essence the calculation is: - * delta = (cycles - pcapng_time.cycles) * NSEC_PRE_SEC / rte_get_tsc_hz() - * but this overflows within 4 to 8 seconds depending on TSC frequency. - * Instead, if delta >= pcapng_time.tsc_hz: - * Increase pcapng_time.ns and pcapng_time.cycles by the number of - * whole seconds in delta and reduce delta accordingly. - * delta will therefore always lie in the interval [0, pcapng_time.tsc_hz), - * which will not overflow when multiplied by NSEC_PER_SEC provided the - * TSC frequency < approx 18.4GHz. - * - * Currently all TSCs operate below 5GHz. 
- */ - delta = cycles - pcapng_time.cycles; - if (unlikely(delta >= pcapng_time.tsc_hz)) { - if (likely(delta < pcapng_time.tsc_hz * 2)) { - delta -= pcapng_time.tsc_hz; - pcapng_time.cycles += pcapng_time.tsc_hz; - pcapng_time.ns += NSEC_PER_SEC; - } else { - secs = rte_reciprocal_divide_u64(delta, &pcapng_time.tsc_hz_inverse); - delta -= secs * pcapng_time.tsc_hz; - pcapng_time.cycles += secs * pcapng_time.tsc_hz; - pcapng_time.ns += secs * NSEC_PER_SEC; - } - } + /* Avoid numeric wraparound by computing seconds first */ + secs = delta / hz; + rem = delta % hz; + ns = (rem * NS_PER_S) / hz; - return pcapng_time.ns + rte_reciprocal_divide_u64(delta * NSEC_PER_SEC, - &pcapng_time.tsc_hz_inverse); + return secs * NS_PER_S + ns + self->offset_ns; } /* length of option including padding */ @@ -368,15 +325,15 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, */ ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, - const char *comment, - uint64_t start_time, uint64_t end_time, - uint64_t ifrecv, uint64_t ifdrop) + uint64_t ifrecv, uint64_t ifdrop, + const char *comment) { struct pcapng_statistics *hdr; struct pcapng_option *opt; + uint64_t start_time = self->offset_ns; + uint64_t sample_time; uint32_t optlen, len; uint8_t *buf; - uint64_t ns; RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); @@ -386,10 +343,10 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, optlen += pcapng_optlen(sizeof(ifrecv)); if (ifdrop != UINT64_MAX) optlen += pcapng_optlen(sizeof(ifdrop)); + if (start_time != 0) optlen += pcapng_optlen(sizeof(start_time)); - if (end_time != 0) - optlen += pcapng_optlen(sizeof(end_time)); + if (comment) optlen += pcapng_optlen(strlen(comment)); if (optlen != 0) @@ -409,9 +366,6 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, if (start_time != 0) opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME, &start_time, sizeof(start_time)); - if (end_time != 0) - opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME, - 
&end_time, sizeof(end_time)); if (ifrecv != UINT64_MAX) opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV, &ifrecv, sizeof(ifrecv)); @@ -425,9 +379,9 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, hdr->block_length = len; hdr->interface_id = self->port_index[port_id]; - ns = pcapng_tsc_to_ns(rte_get_tsc_cycles()); - hdr->timestamp_hi = ns >> 32; - hdr->timestamp_lo = (uint32_t)ns; + sample_time = pcapng_timestamp(self, rte_get_tsc_cycles()); + hdr->timestamp_hi = sample_time >> 32; + hdr->timestamp_lo = (uint32_t)sample_time; /* clone block_length after option */ memcpy(opt, &len, sizeof(uint32_t)); @@ -520,23 +474,21 @@ struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue, const struct rte_mbuf *md, struct rte_mempool *mp, - uint32_t length, uint64_t cycles, + uint32_t length, enum rte_pcapng_direction direction, const char *comment) { struct pcapng_enhance_packet_block *epb; uint32_t orig_len, data_len, padding, flags; struct pcapng_option *opt; + uint64_t timestamp; uint16_t optlen; struct rte_mbuf *mc; - uint64_t ns; bool rss_hash; #ifdef RTE_LIBRTE_ETHDEV_DEBUG RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL); #endif - ns = pcapng_tsc_to_ns(cycles); - orig_len = rte_pktmbuf_pkt_len(md); /* Take snapshot of the data */ @@ -641,8 +593,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue, /* Interface index is filled in later during write */ mc->port = port_id; - epb->timestamp_hi = ns >> 32; - epb->timestamp_lo = (uint32_t)ns; + /* Put timestamp in cycles here - adjust in packet write */ + timestamp = rte_get_tsc_cycles(); + epb->timestamp_hi = timestamp >> 32; + epb->timestamp_lo = (uint32_t)timestamp; epb->capture_length = data_len; epb->original_length = orig_len; @@ -668,6 +622,7 @@ rte_pcapng_write_packets(rte_pcapng_t *self, for (i = 0; i < nb_pkts; i++) { struct rte_mbuf *m = pkts[i]; struct pcapng_enhance_packet_block *epb; + uint64_t cycles, timestamp; /* sanity check that is really a pcapng mbuf */ epb = rte_pktmbuf_mtod(m, 
struct pcapng_enhance_packet_block *); @@ -684,6 +639,13 @@ rte_pcapng_write_packets(rte_pcapng_t *self, return -1; } + /* adjust timestamp recorded in packet */ + cycles = (uint64_t)epb->timestamp_hi << 32; + cycles += epb->timestamp_lo; + timestamp = pcapng_timestamp(self, cycles); + epb->timestamp_hi = timestamp >> 32; + epb->timestamp_lo = (uint32_t)timestamp; + /* * Handle case of highly fragmented and large burst size * Note: this assumes that max segments per mbuf < IOV_MAX @@ -725,6 +687,8 @@ rte_pcapng_fdopen(int fd, { unsigned int i; rte_pcapng_t *self; + struct timespec ts; + uint64_t cycles; self = malloc(sizeof(*self)); if (!self) { @@ -734,6 +698,13 @@ rte_pcapng_fdopen(int fd, self->outfd = fd; self->ports = 0; + + /* record start time in ns since 1/1/1970 */ + cycles = rte_get_tsc_cycles(); + clock_gettime(CLOCK_REALTIME, &ts); + self->tsc_base = (cycles + rte_get_tsc_cycles()) / 2; + self->offset_ns = rte_timespec_to_ns(&ts); + for (i = 0; i < RTE_MAX_ETHPORTS; i++) self->port_index[i] = UINT32_MAX; diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h index d93cc9f73ad5..c40795c721de 100644 --- a/lib/pcapng/rte_pcapng.h +++ b/lib/pcapng/rte_pcapng.h @@ -121,8 +121,6 @@ enum rte_pcapng_direction { * @param length * The upper limit on bytes to copy. Passing UINT32_MAX * means all data (after offset). - * @param timestamp - * The timestamp in TSC cycles. * @param direction * The direction of the packer: receive, transmit or unknown. * @param comment @@ -136,7 +134,7 @@ __rte_experimental struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue, const struct rte_mbuf *m, struct rte_mempool *mp, - uint32_t length, uint64_t timestamp, + uint32_t length, enum rte_pcapng_direction direction, const char *comment); @@ -188,29 +186,22 @@ rte_pcapng_write_packets(rte_pcapng_t *self, * The handle to the packet capture file * @param port * The Ethernet port to report stats on. - * @param comment - * Optional comment to add to statistics. 
- * @param start_time - * The time when packet capture was started in nanoseconds. - * Optional: can be zero if not known. - * @param end_time - * The time when packet capture was stopped in nanoseconds. - * Optional: can be zero if not finished; * @param ifrecv * The number of packets received by capture. * Optional: use UINT64_MAX if not known. * @param ifdrop * The number of packets missed by the capture process. * Optional: use UINT64_MAX if not known. + * @param comment + * Optional comment to add to statistics. * @return * number of bytes written to file, -1 on failure to write file */ __rte_experimental ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port, - const char *comment, - uint64_t start_time, uint64_t end_time, - uint64_t ifrecv, uint64_t ifdrop); + uint64_t ifrecv, uint64_t ifdrop, + const char *comment); #ifdef __cplusplus } diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index a70085bd0211..903f92839b8e 100644 --- a/lib/pdump/rte_pdump.c +++ b/lib/pdump/rte_pdump.c @@ -90,7 +90,6 @@ pdump_copy(uint16_t port_id, uint16_t queue, int ring_enq; uint16_t d_pkts = 0; struct rte_mbuf *dup_bufs[nb_pkts]; - uint64_t ts; struct rte_ring *ring; struct rte_mempool *mp; struct rte_mbuf *p; @@ -99,7 +98,6 @@ pdump_copy(uint16_t port_id, uint16_t queue, if (cbs->filter) rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts); - ts = rte_get_tsc_cycles(); ring = cbs->ring; mp = cbs->mp; for (i = 0; i < nb_pkts; i++) { @@ -122,7 +120,7 @@ pdump_copy(uint16_t port_id, uint16_t queue, if (cbs->ver == V2) p = rte_pcapng_copy(port_id, queue, pkts[i], mp, cbs->snaplen, - ts, direction, NULL); + direction, NULL); else p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen); -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
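The two-step conversion that this patch introduces in pcapng_timestamp() can be tried in isolation. The following is a minimal sketch, assuming a fixed frequency passed in by the caller rather than read from rte_get_tsc_hz(); `cycles_to_ns` is a hypothetical name, and the base/offset handling of the real function is left out.

```c
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/*
 * Convert a cycle count to nanoseconds without overflowing 64 bits.
 * The naive form, delta * NS_PER_S / hz, wraps once delta exceeds
 * UINT64_MAX / NS_PER_S (about 1.8e10 cycles, a few seconds of TSC).
 */
static uint64_t
cycles_to_ns(uint64_t delta, uint64_t hz)
{
	uint64_t secs = delta / hz;	/* whole seconds */
	uint64_t rem = delta % hz;	/* leftover cycles, always < hz */

	/* rem * NS_PER_S fits in 64 bits as long as hz is below ~18.4 GHz */
	return secs * NS_PER_S + (rem * NS_PER_S) / hz;
}
```

Because the remainder is always smaller than `hz`, the only multiplication that could overflow is bounded by `hz * NS_PER_S`, which is why no running state needs to be updated and the function can be called from any thread.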
* [PATCH v2 4/4] test: cleanups to pcapng test 2023-10-05 23:06 ` [PATCH v2 0/4] dumpcap and pcapng fixes Stephen Hemminger ` (2 preceding siblings ...) 2023-10-05 23:06 ` [PATCH v2 3/4] pcapng: modify timestamp calculation Stephen Hemminger @ 2023-10-05 23:06 ` Stephen Hemminger 3 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-10-05 23:06 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Reshma Pattan Overhaul of the pcapng test: - promote it to be a fast test so it gets regularly run. - create null device and use it. - use valid UDP discard packets so that the resulting pcapng file can be inspected with wireshark when debugging. - do basic checks on the resulting pcap file that lengths and timestamps are in range. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/test/meson.build | 2 +- app/test/test_pcapng.c | 378 ++++++++++++++++++++++++++--------------- 2 files changed, 242 insertions(+), 138 deletions(-) diff --git a/app/test/meson.build b/app/test/meson.build index bf9fc906128f..81d7c41a07cb 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -124,7 +124,7 @@ source_file_deps = { 'test_meter.c': ['meter'], 'test_metrics.c': ['metrics'], 'test_mp_secondary.c': ['hash', 'lpm'], - 'test_pcapng.c': ['ethdev', 'net', 'pcapng'], + 'test_pcapng.c': ['ethdev', 'net', 'pcapng', 'bus_vdev'], 'test_pdcp.c': ['eventdev', 'pdcp', 'net', 'timer', 'security'], 'test_pdump.c': ['pdump'] + sample_packet_forward_deps, 'test_per_lcore.c': [], diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index 55aa2cf93666..45223ef38240 100644 --- a/app/test/test_pcapng.c +++ b/app/test/test_pcapng.c @@ -6,25 +6,34 @@ #include <stdlib.h> #include <unistd.h> +#include <rte_bus_vdev.h> #include <rte_ethdev.h> #include <rte_ether.h> +#include <rte_ip.h> #include <rte_mbuf.h> #include <rte_mempool.h> #include <rte_net.h> #include <rte_pcapng.h> +#include <rte_random.h> +#include <rte_reciprocal.h> +#include
<rte_time.h> +#include <rte_udp.h> #include <pcap/pcap.h> #include "test.h" -#define NUM_PACKETS 10 -#define DUMMY_MBUF_NUM 3 +#define PCAPNG_TEST_DEBUG 0 + +#define TOTAL_PACKETS 4096 +#define MAX_BURST 64 +#define MAX_GAP_US 100000 +#define DUMMY_MBUF_NUM 3 -static rte_pcapng_t *pcapng; static struct rte_mempool *mp; static const uint32_t pkt_len = 200; static uint16_t port_id; -static char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; +static const char null_dev[] = "net_null0"; /* first mbuf in the packet, should always be at offset 0 */ struct dummy_mbuf { @@ -61,6 +70,7 @@ mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen) struct { struct rte_ether_hdr eth; struct rte_ipv4_hdr ip; + struct rte_udp_hdr udp; } pkt = { .eth = { .dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff", @@ -68,149 +78,226 @@ mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen) }, .ip = { .version_ihl = RTE_IPV4_VHL_DEF, - .total_length = rte_cpu_to_be_16(plen), - .time_to_live = IPDEFTTL, - .next_proto_id = IPPROTO_RAW, + .time_to_live = 1, + .next_proto_id = IPPROTO_UDP, .src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK), .dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST), - } + }, + .udp = { + .dst_port = rte_cpu_to_be_16(9), /* Discard port */ + }, }; memset(dm, 0, sizeof(*dm)); dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen); rte_eth_random_addr(pkt.eth.src_addr.addr_bytes); - memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, RTE_MIN(sizeof(pkt), plen)); + plen -= sizeof(struct rte_ether_hdr); + + pkt.ip.total_length = rte_cpu_to_be_16(plen); + pkt.ip.hdr_checksum = rte_ipv4_cksum(&pkt.ip); + + plen -= sizeof(struct rte_ipv4_hdr); + pkt.udp.src_port = rte_rand(); + pkt.udp.dgram_len = rte_cpu_to_be_16(plen); + + memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, sizeof(pkt)); } -static int -test_setup(void) +/* + * Make a timestamp value as used by PCAPNG file format + * The library uses nanosecond time resolution so this is + * time elapsed since 1970-01-01 00:00:00 UTC. 
+ * + * Use the same way of calculating as in pdump library. + */ +static struct { + uint64_t offset_ns; /* ns since 1/1/1970 when initialized */ + uint64_t tsc_base; /* TSC when initialized */ + uint64_t tsc_hz; /* copy of rte_tsc_hz() */ + struct rte_reciprocal_u64 tsc_hz_inverse; /* inverse of tsc_hz */ +} time_base; + +static void timestamp_init(void) { - int tmp_fd; + struct timespec ts; + uint64_t cycles; - port_id = rte_eth_find_next(0); - if (port_id >= RTE_MAX_ETHPORTS) { - fprintf(stderr, "No valid Ether port\n"); - return -1; - } + /* Compute time base offsets */ + cycles = rte_get_tsc_cycles(); + clock_gettime(CLOCK_REALTIME, &ts); - tmp_fd = mkstemps(file_name, strlen(".pcapng")); - if (tmp_fd == -1) { - perror("mkstemps() failure"); - return -1; - } - printf("pcapng: output file %s\n", file_name); + /* put initial TSC value in middle of clock_gettime() call */ + time_base.tsc_base = (cycles + rte_get_tsc_cycles()) / 2; + time_base.offset_ns = rte_timespec_to_ns(&ts); - /* open a test capture file */ - pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL); - if (pcapng == NULL) { - fprintf(stderr, "rte_pcapng_fdopen failed\n"); - close(tmp_fd); - return -1; - } + time_base.tsc_hz = rte_get_tsc_hz(); + time_base.tsc_hz_inverse = rte_reciprocal_value_u64(time_base.tsc_hz); +} - /* Add interface to the file */ - if (rte_pcapng_add_interface(pcapng, port_id, - NULL, NULL, NULL) != 0) { - fprintf(stderr, "can not add port %u\n", port_id); - return -1; +static int +test_setup(void) +{ + port_id = rte_eth_dev_count_avail(); + + /* Make a dummy null device to snoop on */ + if (rte_vdev_init(null_dev, NULL) != 0) { + fprintf(stderr, "Failed to create vdev '%s'\n", null_dev); + goto fail; } /* Make a pool for cloned packets */ - mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", IOV_MAX + NUM_PACKETS, - 0, 0, - rte_pcapng_mbuf_size(pkt_len), + mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", + MAX_BURST, 0, 0, + 
rte_pcapng_mbuf_size(pkt_len) + 128, SOCKET_ID_ANY, "ring_mp_sc"); if (mp == NULL) { fprintf(stderr, "Cannot create mempool\n"); - return -1; + goto fail; } + + timestamp_init(); return 0; + +fail: + rte_vdev_uninit(null_dev); + rte_mempool_free(mp); + return -1; } static int -test_write_packets(void) +fill_pcapng_file(rte_pcapng_t *pcapng, unsigned int num_packets) { - struct rte_mbuf *orig; - struct rte_mbuf *clones[NUM_PACKETS] = { }; struct dummy_mbuf mbfs; - unsigned int i; + struct rte_mbuf *orig; + unsigned int burst_size; + unsigned int count; ssize_t len; /* make a dummy packet */ mbuf1_prepare(&mbfs, pkt_len); - - /* clone them */ orig = &mbfs.mb[0]; - for (i = 0; i < NUM_PACKETS; i++) { - struct rte_mbuf *mc; - mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - rte_get_tsc_cycles(), 0, NULL); - if (mc == NULL) { - fprintf(stderr, "Cannot copy packet\n"); + for (count = 0; count < num_packets; count += burst_size) { + struct rte_mbuf *clones[MAX_BURST]; + unsigned int i; + + /* put 1 .. 
MAX_BURST packets in one write call */ + burst_size = rte_rand_max(MAX_BURST) + 1; + for (i = 0; i < burst_size; i++) { + struct rte_mbuf *mc; + + mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, + RTE_PCAPNG_DIRECTION_IN, + NULL); + if (mc == NULL) { + fprintf(stderr, "Cannot copy packet\n"); + return -1; + } + clones[i] = mc; + } + + /* write it to capture file */ + len = rte_pcapng_write_packets(pcapng, clones, burst_size); + rte_pktmbuf_free_bulk(clones, burst_size); + + if (len <= 0) { + fprintf(stderr, "Write of packets failed: %s\n", + rte_strerror(rte_errno)); return -1; } - clones[i] = mc; + + /* Leave a small gap between packets to test for time wrap */ + usleep(rte_rand_max(MAX_GAP_US)); } - /* write it to capture file */ - len = rte_pcapng_write_packets(pcapng, clones, NUM_PACKETS); + return count; +} - rte_pktmbuf_free_bulk(clones, NUM_PACKETS); +static char * +fmt_time(char *buf, size_t size, uint64_t ts_ns) +{ + time_t sec; + size_t len; - if (len <= 0) { - fprintf(stderr, "Write of packets failed\n"); - return -1; - } + sec = ts_ns / NS_PER_S; + len = strftime(buf, size, "%X", localtime(&sec)); + snprintf(buf + len, size - len, ".%09lu", + (unsigned long)(ts_ns % NS_PER_S)); - return 0; + return buf; } -static int -test_write_stats(void) +/* Context for the pcap_loop callback */ +struct pkt_print_ctx { + pcap_t *pcap; + unsigned int count; +}; + +static void +print_packet(uint64_t ts_ns, const struct rte_ether_hdr *eh, size_t len) { - ssize_t len; + char tbuf[128], src[64], dst[64]; - /* write a statistics block */ - len = rte_pcapng_write_stats(pcapng, port_id, NULL, - 0, 0, 0, - NUM_PACKETS, 0); - if (len <= 0) { - fprintf(stderr, "Write of statistics failed\n"); - return -1; - } - return 0; + fmt_time(tbuf, sizeof(tbuf), ts_ns); + rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr); + rte_ether_format_addr(src, sizeof(src), &eh->src_addr); + printf("%s: %s -> %s type %x length %zu\n", + tbuf, src, dst, rte_be_to_cpu_16(eh->ether_type), 
len); } +/* Callback from pcap_loop used to validate packets in the file */ static void -pkt_print(u_char *user, const struct pcap_pkthdr *h, - const u_char *bytes) +parse_pcap_packet(u_char *user, const struct pcap_pkthdr *h, + const u_char *bytes) { - unsigned int *countp = (unsigned int *)user; + struct pkt_print_ctx *ctx = (struct pkt_print_ctx *)user; const struct rte_ether_hdr *eh; - struct tm *tm; - char tbuf[128], src[64], dst[64]; + const struct rte_ipv4_hdr *ip; + struct timespec ts; + uint64_t ns, now; - tm = localtime(&h->ts.tv_sec); - if (tm == NULL) { - perror("localtime"); - return; + eh = (const struct rte_ether_hdr *)bytes; + ip = (const struct rte_ipv4_hdr *)(eh + 1); + + ctx->count += 1; + + clock_gettime(CLOCK_REALTIME, &ts); + now = rte_timespec_to_ns(&ts); + + /* The pcap library is misleading in reporting timestamp. + * packet header struct gives timestamp as a timeval (ie. usec); + * but the file is open in nanonsecond mode therefore + * the timestamp is really in timespec (ie. nanoseconds). + */ + ns = h->ts.tv_sec * NS_PER_S + h->ts.tv_usec; + if (ns < time_base.offset_ns || ns > now) { + char tstart[128], tend[128]; + + fmt_time(tstart, sizeof(tstart), time_base.offset_ns); + fmt_time(tend, sizeof(tend), now); + fprintf(stderr, "Timestamp out of range [%s .. 
%s]\n", + tstart, tend); + goto error; } - if (strftime(tbuf, sizeof(tbuf), "%X", tm) == 0) { - fprintf(stderr, "strftime returned 0!\n"); - return; + if (!rte_is_broadcast_ether_addr(&eh->dst_addr)) { + fprintf(stderr, "Destination is not broadcast\n"); + goto error; } - eh = (const struct rte_ether_hdr *)bytes; - rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr); - rte_ether_format_addr(src, sizeof(src), &eh->src_addr); - printf("%s.%06lu: %s -> %s type %x length %u\n", - tbuf, (unsigned long)h->ts.tv_usec, - src, dst, rte_be_to_cpu_16(eh->ether_type), h->len); + if (rte_ipv4_cksum(ip) != 0) { + fprintf(stderr, "Bad IPv4 checksum\n"); + goto error; + } + + return; /* packet is normal */ - *countp += 1; +error: + print_packet(ns, eh, h->len); + + /* Stop parsing at first error */ + pcap_breakloop(ctx->pcap); } /* @@ -219,78 +306,98 @@ pkt_print(u_char *user, const struct pcap_pkthdr *h, * but that creates an unwanted dependency. */ static int -test_validate(void) +valid_pcapng_file(const char *file_name, unsigned int expected) { char errbuf[PCAP_ERRBUF_SIZE]; - unsigned int count = 0; - pcap_t *pcap; + struct pkt_print_ctx ctx; int ret; - pcap = pcap_open_offline(file_name, errbuf); - if (pcap == NULL) { + ctx.count = 0; + ctx.pcap = pcap_open_offline_with_tstamp_precision(file_name, + PCAP_TSTAMP_PRECISION_NANO, + errbuf); + if (ctx.pcap == NULL) { fprintf(stderr, "pcap_open_offline('%s') failed: %s\n", file_name, errbuf); return -1; } - ret = pcap_loop(pcap, 0, pkt_print, (u_char *)&count); - if (ret == 0) - printf("Saw %u packets\n", count); - else + ret = pcap_loop(ctx.pcap, 0, parse_pcap_packet, (u_char *)&ctx); + if (ret != 0) { fprintf(stderr, "pcap_dispatch: failed: %s\n", - pcap_geterr(pcap)); - pcap_close(pcap); + pcap_geterr(ctx.pcap)); + } else if (ctx.count != expected) { + printf("Only %u packets, expected %u\n", + ctx.count, expected); + ret = -1; + } + + pcap_close(ctx.pcap); return ret; } static int -test_write_over_limit_iov_max(void) 
+test_write_packets(void) { - struct rte_mbuf *orig; - struct rte_mbuf *clones[IOV_MAX + NUM_PACKETS] = { }; - struct dummy_mbuf mbfs; - unsigned int i; - ssize_t len; - - /* make a dummy packet */ - mbuf1_prepare(&mbfs, pkt_len); + char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; + static rte_pcapng_t *pcapng; + int ret, tmp_fd, count; - /* clone them */ - orig = &mbfs.mb[0]; - for (i = 0; i < IOV_MAX + NUM_PACKETS; i++) { - struct rte_mbuf *mc; + tmp_fd = mkstemps(file_name, strlen(".pcapng")); + if (tmp_fd == -1) { + perror("mkstemps() failure"); + goto fail; + } + printf("pcapng: output file %s\n", file_name); - mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - rte_get_tsc_cycles(), 0, NULL); - if (mc == NULL) { - fprintf(stderr, "Cannot copy packet\n"); - return -1; - } - clones[i] = mc; + /* open a test capture file */ + pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL); + if (pcapng == NULL) { + fprintf(stderr, "rte_pcapng_fdopen failed\n"); + close(tmp_fd); + goto fail; } - /* write it to capture file */ - len = rte_pcapng_write_packets(pcapng, clones, IOV_MAX + NUM_PACKETS); + /* Add interface to the file */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, NULL); + if (ret < 0) { + fprintf(stderr, "can not add port %u\n", port_id); + goto fail; + } - rte_pktmbuf_free_bulk(clones, IOV_MAX + NUM_PACKETS); + count = fill_pcapng_file(pcapng, TOTAL_PACKETS); + if (count < 0) + goto fail; - if (len <= 0) { - fprintf(stderr, "Write of packets failed\n"); - return -1; + /* write a statistics block */ + ret = rte_pcapng_write_stats(pcapng, port_id, + count, 0, "end of test"); + if (ret <= 0) { + fprintf(stderr, "Write of statistics failed\n"); + goto fail; } - return 0; + rte_pcapng_close(pcapng); + + ret = valid_pcapng_file(file_name, count); + /* if test fails want to investigate the file */ + if (ret == 0) + unlink(file_name); + + return ret; + +fail: + rte_pcapng_close(pcapng); + return -1; } static void 
test_cleanup(void) { rte_mempool_free(mp); - - if (pcapng) - rte_pcapng_close(pcapng); - + rte_vdev_uninit(null_dev); } static struct @@ -300,9 +407,6 @@ unit_test_suite test_pcapng_suite = { .suite_name = "Test Pcapng Unit Test Suite", .unit_test_cases = { TEST_CASE(test_write_packets), - TEST_CASE(test_write_stats), - TEST_CASE(test_validate), - TEST_CASE(test_write_over_limit_iov_max), TEST_CASES_END() } }; @@ -313,4 +417,4 @@ test_pcapng(void) return unit_test_suite_runner(&test_pcapng_suite); } -REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng); +REGISTER_FAST_TEST(pcapng_autotest, true, true, test_pcapng); -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v3 0/5] dumpcap and pcapng fixes 2023-09-21 4:23 [PATCH 0/4] pcapng fixes Stephen Hemminger ` (4 preceding siblings ...) 2023-10-05 23:06 ` [PATCH v2 0/4] dumpcap and pcapng fixes Stephen Hemminger @ 2023-11-08 18:35 ` Stephen Hemminger 2023-11-08 18:35 ` [PATCH v3 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger ` (4 more replies) 2023-11-09 17:34 ` [PATCH v4 0/5] dumpcap and pcapng fixes Stephen Hemminger ` (3 subsequent siblings) 9 siblings, 5 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-08 18:35 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This has bugfixes and tests for dumpcap and pcapng. It should be in 23.11 but seems to have been ignored. It fixes issues related to timestamping. The design choices are to maximize performance in the primary process, and to do all the time adjustment in the secondary (dumpcap), since dumpcap needs to make system calls anyway to write the result. This patch set moves the adjustment calculation into the pcapng portion that opens the output file. All details of the format of the timestamp are contained inside pcapng (data hiding). v3 - don't use alloca() since it can have VLA-type issues Stephen Hemminger (5): pdump: fix setting rte_errno on mp error dumpcap: allow multiple invocations pcapng: modify timestamp calculation pcapng: avoid using alloca() test: cleanups to pcapng test app/dumpcap/main.c | 53 ++--- app/test/meson.build | 2 +- app/test/test_pcapng.c | 418 +++++++++++++++++++++++++++------------- lib/graph/graph_pcap.c | 2 +- lib/pcapng/rte_pcapng.c | 153 ++++++--------- lib/pcapng/rte_pcapng.h | 19 +- lib/pdump/rte_pdump.c | 9 +- 7 files changed, 371 insertions(+), 285 deletions(-) -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v3 1/5] pdump: fix setting rte_errno on mp error 2023-11-08 18:35 ` [PATCH v3 0/5] dumpcap and pcapng fixes Stephen Hemminger @ 2023-11-08 18:35 ` Stephen Hemminger 2023-11-09 7:34 ` Morten Brørup 2023-11-08 18:35 ` [PATCH v3 2/5] dumpcap: allow multiple invocations Stephen Hemminger ` (3 subsequent siblings) 4 siblings, 1 reply; 61+ messages in thread From: Stephen Hemminger @ 2023-11-08 18:35 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger The response from MP server sets err_value to negative on error. The convention for rte_errno is to use a positive value on error. This makes errors like duplicate registration show up with the correct error value. Fixes: 660098d61f57 ("pdump: use generic multi-process channel") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- lib/pdump/rte_pdump.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index 80b90c6f7d03..e94f49e21250 100644 --- a/lib/pdump/rte_pdump.c +++ b/lib/pdump/rte_pdump.c @@ -564,9 +564,10 @@ pdump_prepare_client_request(const char *device, uint16_t queue, if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) { mp_rep = &mp_reply.msgs[0]; resp = (struct pdump_response *)mp_rep->param; - rte_errno = resp->err_value; - if (!resp->err_value) + if (resp->err_value == 0) ret = 0; + else + rte_errno = -resp->err_value; free(mp_reply.msgs); } -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [PATCH v3 1/5] pdump: fix setting rte_errno on mp error 2023-11-08 18:35 ` [PATCH v3 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger @ 2023-11-09 7:34 ` Morten Brørup 0 siblings, 0 replies; 61+ messages in thread From: Morten Brørup @ 2023-11-09 7:34 UTC (permalink / raw) To: Stephen Hemminger, dev > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > Sent: Wednesday, 8 November 2023 19.36 > > The response from MP server sets err_value to negative > on error. The convention for rte_errno is to use a positive > value on error. This makes errors like duplicate registration > show up with the correct error value. > > Fixes: 660098d61f57 ("pdump: use generic multi-process channel") > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > --- Acked-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 61+ messages in thread
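The sign convention described in this patch can be sketched in plain C. This is only an illustration under stated assumptions: `fake_response`, `fake_rte_errno` and `handle_response()` are hypothetical stand-ins for `struct pdump_response`, `rte_errno` and the reply handling in `pdump_prepare_client_request()`, not DPDK code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the MP server reply:
 * err_value is 0 on success or a negative errno value on failure. */
struct fake_response {
	int err_value;
};

/* Hypothetical stand-in for rte_errno (per-lcore in DPDK). */
static int fake_rte_errno;

/* Returns 0 on success. On failure returns -1 and stores a positive
 * errno value; the variable is left untouched on success. */
static int
handle_response(const struct fake_response *resp)
{
	assert(resp != NULL);

	if (resp->err_value == 0)
		return 0;

	/* Flip the sign: the server reports -EEXIST, callers expect EEXIST */
	fake_rte_errno = -resp->err_value;
	return -1;
}
```

With this shape a caller can pass the stored value straight to a strerror-style function without negating it again.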
* [PATCH v3 2/5] dumpcap: allow multiple invocations 2023-11-08 18:35 ` [PATCH v3 0/5] dumpcap and pcapng fixes Stephen Hemminger 2023-11-08 18:35 ` [PATCH v3 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger @ 2023-11-08 18:35 ` Stephen Hemminger 2023-11-09 7:50 ` Morten Brørup 2023-11-08 18:35 ` [PATCH v3 3/5] pcapng: modify timestamp calculation Stephen Hemminger ` (2 subsequent siblings) 4 siblings, 1 reply; 61+ messages in thread From: Stephen Hemminger @ 2023-11-08 18:35 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Isaac Boukris If dumpcap is run twice with each instance pointing at a different interface, it would fail because of overlap in ring and pool names. Fix by putting the process id in the name. Fixes: cbb44143be74 ("app/dumpcap: add new packet capture application") Reported-by: Isaac Boukris <iboukris@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/dumpcap/main.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c index 64294bbfb3e6..37754fd06f4f 100644 --- a/app/dumpcap/main.c +++ b/app/dumpcap/main.c @@ -44,7 +44,6 @@ #include <pcap/pcap.h> #include <pcap/bpf.h> -#define RING_NAME "capture-ring" #define MONITOR_INTERVAL (500 * 1000) #define MBUF_POOL_CACHE_SIZE 32 #define BURST_SIZE 32 @@ -647,6 +646,7 @@ static void dpdk_init(void) static struct rte_ring *create_ring(void) { struct rte_ring *ring; + char ring_name[RTE_RING_NAMESIZE]; size_t size, log2; /* Find next power of 2 >= size. 
*/ @@ -660,28 +660,28 @@ static struct rte_ring *create_ring(void) ring_size = size; } - ring = rte_ring_lookup(RING_NAME); - if (ring == NULL) { - ring = rte_ring_create(RING_NAME, ring_size, - rte_socket_id(), 0); - if (ring == NULL) - rte_exit(EXIT_FAILURE, "Could not create ring :%s\n", - rte_strerror(rte_errno)); - } + /* Want one ring per invocation of program */ + snprintf(ring_name, sizeof(ring_name), + "dumpcap-%u", getpid()); + + ring = rte_ring_create(ring_name, ring_size, + rte_socket_id(), 0); + if (ring == NULL) + rte_exit(EXIT_FAILURE, "Could not create ring :%s\n", + rte_strerror(rte_errno)); + return ring; } static struct rte_mempool *create_mempool(void) { const struct interface *intf; - static const char pool_name[] = "capture_mbufs"; + char pool_name[RTE_MEMPOOL_NAMESIZE]; size_t num_mbufs = 2 * ring_size; struct rte_mempool *mp; uint32_t data_size = 128; - mp = rte_mempool_lookup(pool_name); - if (mp) - return mp; + snprintf(pool_name, sizeof(pool_name), "capture_%u", getpid()); /* Common pool so size mbuf for biggest snap length */ TAILQ_FOREACH(intf, &interfaces, next) { @@ -826,7 +826,7 @@ static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp) rte_exit(EXIT_FAILURE, "Packet dump enable on %u:%s failed %s\n", intf->port, intf->name, - rte_strerror(-ret)); + rte_strerror(rte_errno)); } if (intf->opts.promisc_mode) { -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [PATCH v3 2/5] dumpcap: allow multiple invocations 2023-11-08 18:35 ` [PATCH v3 2/5] dumpcap: allow multiple invocations Stephen Hemminger @ 2023-11-09 7:50 ` Morten Brørup 2023-11-09 15:40 ` Stephen Hemminger 2023-11-09 17:16 ` Stephen Hemminger 0 siblings, 2 replies; 61+ messages in thread From: Morten Brørup @ 2023-11-09 7:50 UTC (permalink / raw) To: Stephen Hemminger, dev; +Cc: Isaac Boukris > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > Sent: Wednesday, 8 November 2023 19.36 > > If dumpcap is run twice with each instance pointing at a different > interface, it would fail because of overlap in ring and pool names. > Fix by putting the process id in the name. > > Fixes: cbb44143be74 ("app/dumpcap: add new packet capture application") > Reported-by: Isaac Boukris <iboukris@gmail.com> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > --- Minor detail: getpid() returns int, so prefer %d over %u. [...] > rte_exit(EXIT_FAILURE, > "Packet dump enable on %u:%s failed %s\n", > intf->port, intf->name, > - rte_strerror(-ret)); > + rte_strerror(rte_errno)); This bugfix (the line above, not the patch itself) supports Tyler's proposal to standardize on returning -1 with rte_errno set on failure, instead of some functions returning -errno. Our dual convention for function return values will cause many bugs like this. With %u or %d, Reviewed-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 2/5] dumpcap: allow multiple invocations 2023-11-09 7:50 ` Morten Brørup @ 2023-11-09 15:40 ` Stephen Hemminger 2023-11-09 16:00 ` Morten Brørup 2023-11-09 17:16 ` Stephen Hemminger 1 sibling, 1 reply; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 15:40 UTC (permalink / raw) To: Morten Brørup; +Cc: dev, Isaac Boukris On Thu, 9 Nov 2023 08:50:10 +0100 Morten Brørup <mb@smartsharesystems.com> wrote: > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > > --- > > Minor detail: getpid() returns int, so prefer %d over %u. Let me check, per man page. getpid() returns pid_t. The typedef chain leads to: pid_t -> __pid_t -> __PID_T_TYPE -> __S32_TYPE -> int32 -> int ^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [PATCH v3 2/5] dumpcap: allow multiple invocations 2023-11-09 15:40 ` Stephen Hemminger @ 2023-11-09 16:00 ` Morten Brørup 0 siblings, 0 replies; 61+ messages in thread From: Morten Brørup @ 2023-11-09 16:00 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dev, Isaac Boukris > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > Sent: Thursday, 9 November 2023 16.40 > > On Thu, 9 Nov 2023 08:50:10 +0100 > Morten Brørup <mb@smartsharesystems.com> wrote: > > > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > > > --- > > > > Minor detail: getpid() returns int, so prefer %d over %u. > > Let me check, per man page. getpid() returns pid_t. > The typedef chain leads to: > pid_t -> __pid_t -> __PID_T_TYPE -> __S32_TYPE -> int32 -> int Thank you for confirming. So %d is preferred over %u for getpid(). :-) ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 2/5] dumpcap: allow multiple invocations 2023-11-09 7:50 ` Morten Brørup 2023-11-09 15:40 ` Stephen Hemminger @ 2023-11-09 17:16 ` Stephen Hemminger 2023-11-09 18:22 ` Morten Brørup 1 sibling, 1 reply; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 17:16 UTC (permalink / raw) To: Morten Brørup; +Cc: dev, Isaac Boukris On Thu, 9 Nov 2023 08:50:10 +0100 Morten Brørup <mb@smartsharesystems.com> wrote: > > rte_exit(EXIT_FAILURE, > > "Packet dump enable on %u:%s failed %s\n", > > intf->port, intf->name, > > - rte_strerror(-ret)); > > + rte_strerror(rte_errno)); > > This bugfix (the line above, not the patch itself) supports Tyler's proposal to standardize on returning -1 with rte_errno set on failure, instead of some functions returning -errno. Our dual convention for function return values will cause many bugs like this. The error case here is when rte_pdump_enable_bpf() fails. This is the return from pdump_enable in the pdump library. The library does follow the rte_errno convention correctly. But the error message wasn't reporting correctly, which would lead to a confusing error in the case where multiple invocations failed. It is not possible to do multiple captures on the same interface, and it is not worth modifying the library (it would require multiple copies and ref counts) to handle this case. ^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [PATCH v3 2/5] dumpcap: allow multiple invocations 2023-11-09 17:16 ` Stephen Hemminger @ 2023-11-09 18:22 ` Morten Brørup 0 siblings, 0 replies; 61+ messages in thread From: Morten Brørup @ 2023-11-09 18:22 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dev, Isaac Boukris > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > Sent: Thursday, 9 November 2023 18.16 > > On Thu, 9 Nov 2023 08:50:10 +0100 > Morten Brørup <mb@smartsharesystems.com> wrote: > > > > rte_exit(EXIT_FAILURE, > > > "Packet dump enable on %u:%s failed %s\n", > > > intf->port, intf->name, > > > - rte_strerror(-ret)); > > > + rte_strerror(rte_errno)); > > > > This bugfix (the line above, not the patch itself) supports Tyler's > proposal to standardize on returning -1 with rte_errno set on failure, > instead of some functions returning -errno. Our dual convention for > function return values will cause many bugs like this. > > The error case here is when rte_pdump_enable_bpf() fails. > This is return from pdump_enable in pdump library. > The library does follow the rte_errno convention correctly. I'm sorry about being unclear in my comment about rte_errno conventions; it was not targeted at this library. My comment was meant as general support for Tyler's suggestion, using this as an example of a bug that would not have been there if the return convention was always -1 with rte_errno. With the dual return convention, it's amazing that you caught this bug. > But the error message wasn't reporting correctly which would lead to > confusing error in case where > multiple invocations failed. > > It is not possible to do multiple captures on same interface. And not > worth modifying the > library (would require multiple copies and ref counts) to handle this > case. ^ permalink raw reply [flat|nested] 61+ messages in thread
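The per-process naming discussed in this subthread, together with the %d versus %u point, can be sketched as follows. The helper `make_instance_name()` and the `EXAMPLE_NAMESIZE` limit are hypothetical; the real bounds are `RTE_RING_NAMESIZE` and `RTE_MEMPOOL_NAMESIZE`, and the truncation check is an addition that the patch itself does not make.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical limit for illustration only */
#define EXAMPLE_NAMESIZE 32

/* Build "<prefix>-<pid>" into buf.
 * Returns 0 on success, -1 if the name would be truncated. */
static int
make_instance_name(char *buf, size_t len, const char *prefix, pid_t pid)
{
	int n;

	assert(buf != NULL && prefix != NULL);

	/* pid_t is a signed integer type, so cast to int and print with %d */
	n = snprintf(buf, len, "%s-%d", prefix, (int)pid);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

A caller would pass `getpid()` as the last argument, so two dumpcap instances running at the same time end up with distinct ring and pool names.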
* [PATCH v3 3/5] pcapng: modify timestamp calculation 2023-11-08 18:35 ` [PATCH v3 0/5] dumpcap and pcapng fixes Stephen Hemminger 2023-11-08 18:35 ` [PATCH v3 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger 2023-11-08 18:35 ` [PATCH v3 2/5] dumpcap: allow multiple invocations Stephen Hemminger @ 2023-11-08 18:35 ` Stephen Hemminger 2023-11-09 7:57 ` Morten Brørup 2023-11-08 18:35 ` [PATCH v3 4/5] pcapng: avoid using alloca() Stephen Hemminger 2023-11-08 18:35 ` [PATCH v3 5/5] test: cleanups to pcapng test Stephen Hemminger 4 siblings, 1 reply; 61+ messages in thread From: Stephen Hemminger @ 2023-11-08 18:35 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger The computation of the timestamp is best done in the part of the pcapng library that runs in the secondary process. The secondary process is already doing a bunch of system calls, which makes it not performance sensitive. Simplify the computation of nanoseconds from TSC to a two-step process which avoids numeric overflow issues. The previous code was also not thread safe. 
Fixes: c882eb544842 ("pcapng: fix timestamp wrapping in output files") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/dumpcap/main.c | 25 +++------ app/test/test_pcapng.c | 4 +- lib/graph/graph_pcap.c | 2 +- lib/pcapng/rte_pcapng.c | 119 +++++++++++++++------------------------- lib/pcapng/rte_pcapng.h | 19 ++----- lib/pdump/rte_pdump.c | 4 +- 6 files changed, 61 insertions(+), 112 deletions(-) diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c index 37754fd06f4f..764dac6c37c0 100644 --- a/app/dumpcap/main.c +++ b/app/dumpcap/main.c @@ -66,13 +66,13 @@ static bool print_stats; /* capture limit options */ static struct { - uint64_t duration; /* nanoseconds */ + time_t duration; /* seconds */ unsigned long packets; /* number of packets in file */ size_t size; /* file size (bytes) */ } stop; /* Running state */ -static uint64_t start_time, end_time; +static time_t start_time; static uint64_t packets_received; static size_t file_size; @@ -197,7 +197,7 @@ static void auto_stop(char *opt) if (*value == '\0' || *endp != '\0' || interval <= 0) rte_exit(EXIT_FAILURE, "Invalid duration \"%s\"\n", value); - stop.duration = NSEC_PER_SEC * interval; + stop.duration = interval; } else if (strcmp(opt, "filesize") == 0) { stop.size = get_uint(value, "filesize", 0) * 1024; } else if (strcmp(opt, "packets") == 0) { @@ -511,15 +511,6 @@ static void statistics_loop(void) } } -/* Return the time since 1/1/1970 in nanoseconds */ -static uint64_t create_timestamp(void) -{ - struct timespec now; - - clock_gettime(CLOCK_MONOTONIC, &now); - return rte_timespec_to_ns(&now); -} - static void cleanup_pdump_resources(void) { @@ -589,9 +580,8 @@ report_packet_stats(dumpcap_out_t out) ifdrop = pdump_stats.nombuf + pdump_stats.ringfull; if (use_pcapng) - rte_pcapng_write_stats(out.pcapng, intf->port, NULL, - start_time, end_time, - ifrecv, ifdrop); + rte_pcapng_write_stats(out.pcapng, intf->port, + ifrecv, ifdrop, NULL); if (ifrecv == 0) percent = 0; @@ -983,7 +973,7 @@ int 
main(int argc, char **argv) mp = create_mempool(); out = create_output(); - start_time = create_timestamp(); + start_time = time(NULL); enable_pdump(r, mp); if (!quiet) { @@ -1005,11 +995,10 @@ int main(int argc, char **argv) break; if (stop.duration != 0 && - create_timestamp() - start_time > stop.duration) + time(NULL) - start_time > stop.duration) break; } - end_time = create_timestamp(); disable_primary_monitor(); if (rte_eal_primary_proc_alive(NULL)) diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index b8429a02f160..55aa2cf93666 100644 --- a/app/test/test_pcapng.c +++ b/app/test/test_pcapng.c @@ -173,8 +173,8 @@ test_write_stats(void) ssize_t len; /* write a statistics block */ - len = rte_pcapng_write_stats(pcapng, port_id, - NULL, 0, 0, + len = rte_pcapng_write_stats(pcapng, port_id, NULL, + 0, 0, 0, NUM_PACKETS, 0); if (len <= 0) { fprintf(stderr, "Write of statistics failed\n"); diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c index db722c375fa7..89525f1220ca 100644 --- a/lib/graph/graph_pcap.c +++ b/lib/graph/graph_pcap.c @@ -214,7 +214,7 @@ graph_pcap_dispatch(struct rte_graph *graph, mbuf = (struct rte_mbuf *)objs[i]; mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len, - rte_get_tsc_cycles(), 0, buffer); + 0, buffer); if (mc == NULL) break; diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index 3c91fc77644a..13fd2b97fb80 100644 --- a/lib/pcapng/rte_pcapng.c +++ b/lib/pcapng/rte_pcapng.c @@ -36,22 +36,14 @@ /* Format of the capture file handle */ struct rte_pcapng { int outfd; /* output file */ - unsigned int ports; /* number of interfaces added */ + uint64_t offset_ns; /* ns since 1/1/1970 when initialized */ + uint64_t tsc_base; /* TSC when started */ /* DPDK port id to interface index in file */ uint32_t port_index[RTE_MAX_ETHPORTS]; }; -/* For converting TSC cycles to PCAPNG ns format */ -static struct pcapng_time { - uint64_t ns; - uint64_t cycles; - uint64_t tsc_hz; - struct rte_reciprocal_u64 
tsc_hz_inverse; -} pcapng_time; - - #ifdef RTE_EXEC_ENV_WINDOWS /* * Windows does not have writev() call. @@ -102,56 +94,21 @@ static ssize_t writev(int fd, const struct iovec *iov, int iovcnt) #define if_indextoname(ifindex, ifname) NULL #endif -static inline void -pcapng_init(void) +/* Convert from TSC (CPU cycles) to nanoseconds */ +static uint64_t +pcapng_timestamp(const rte_pcapng_t *self, uint64_t cycles) { - struct timespec ts; + uint64_t delta, rem, secs, ns; + const uint64_t hz = rte_get_tsc_hz(); - pcapng_time.cycles = rte_get_tsc_cycles(); - clock_gettime(CLOCK_REALTIME, &ts); - pcapng_time.cycles = (pcapng_time.cycles + rte_get_tsc_cycles()) / 2; - pcapng_time.ns = rte_timespec_to_ns(&ts); - - pcapng_time.tsc_hz = rte_get_tsc_hz(); - pcapng_time.tsc_hz_inverse = rte_reciprocal_value_u64(pcapng_time.tsc_hz); -} + delta = cycles - self->tsc_base; -/* PCAPNG timestamps are in nanoseconds */ -static uint64_t pcapng_tsc_to_ns(uint64_t cycles) -{ - uint64_t delta, secs; - - if (!pcapng_time.tsc_hz) - pcapng_init(); - - /* In essence the calculation is: - * delta = (cycles - pcapng_time.cycles) * NSEC_PRE_SEC / rte_get_tsc_hz() - * but this overflows within 4 to 8 seconds depending on TSC frequency. - * Instead, if delta >= pcapng_time.tsc_hz: - * Increase pcapng_time.ns and pcapng_time.cycles by the number of - * whole seconds in delta and reduce delta accordingly. - * delta will therefore always lie in the interval [0, pcapng_time.tsc_hz), - * which will not overflow when multiplied by NSEC_PER_SEC provided the - * TSC frequency < approx 18.4GHz. - * - * Currently all TSCs operate below 5GHz. 
- */ - delta = cycles - pcapng_time.cycles; - if (unlikely(delta >= pcapng_time.tsc_hz)) { - if (likely(delta < pcapng_time.tsc_hz * 2)) { - delta -= pcapng_time.tsc_hz; - pcapng_time.cycles += pcapng_time.tsc_hz; - pcapng_time.ns += NSEC_PER_SEC; - } else { - secs = rte_reciprocal_divide_u64(delta, &pcapng_time.tsc_hz_inverse); - delta -= secs * pcapng_time.tsc_hz; - pcapng_time.cycles += secs * pcapng_time.tsc_hz; - pcapng_time.ns += secs * NSEC_PER_SEC; - } - } + /* Avoid numeric wraparound by computing seconds first */ + secs = delta / hz; + rem = delta % hz; + ns = (rem * NS_PER_S) / hz; - return pcapng_time.ns + rte_reciprocal_divide_u64(delta * NSEC_PER_SEC, - &pcapng_time.tsc_hz_inverse); + return secs * NS_PER_S + ns + self->offset_ns; } /* length of option including padding */ @@ -368,15 +325,15 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, */ ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, - const char *comment, - uint64_t start_time, uint64_t end_time, - uint64_t ifrecv, uint64_t ifdrop) + uint64_t ifrecv, uint64_t ifdrop, + const char *comment) { struct pcapng_statistics *hdr; struct pcapng_option *opt; + uint64_t start_time = self->offset_ns; + uint64_t sample_time; uint32_t optlen, len; uint8_t *buf; - uint64_t ns; RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); @@ -386,10 +343,10 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, optlen += pcapng_optlen(sizeof(ifrecv)); if (ifdrop != UINT64_MAX) optlen += pcapng_optlen(sizeof(ifdrop)); + if (start_time != 0) optlen += pcapng_optlen(sizeof(start_time)); - if (end_time != 0) - optlen += pcapng_optlen(sizeof(end_time)); + if (comment) optlen += pcapng_optlen(strlen(comment)); if (optlen != 0) @@ -409,9 +366,6 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, if (start_time != 0) opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME, &start_time, sizeof(start_time)); - if (end_time != 0) - opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME, - 
&end_time, sizeof(end_time)); if (ifrecv != UINT64_MAX) opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV, &ifrecv, sizeof(ifrecv)); @@ -425,9 +379,9 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, hdr->block_length = len; hdr->interface_id = self->port_index[port_id]; - ns = pcapng_tsc_to_ns(rte_get_tsc_cycles()); - hdr->timestamp_hi = ns >> 32; - hdr->timestamp_lo = (uint32_t)ns; + sample_time = pcapng_timestamp(self, rte_get_tsc_cycles()); + hdr->timestamp_hi = sample_time >> 32; + hdr->timestamp_lo = (uint32_t)sample_time; /* clone block_length after option */ memcpy(opt, &len, sizeof(uint32_t)); @@ -520,23 +474,21 @@ struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue, const struct rte_mbuf *md, struct rte_mempool *mp, - uint32_t length, uint64_t cycles, + uint32_t length, enum rte_pcapng_direction direction, const char *comment) { struct pcapng_enhance_packet_block *epb; uint32_t orig_len, data_len, padding, flags; struct pcapng_option *opt; + uint64_t timestamp; uint16_t optlen; struct rte_mbuf *mc; - uint64_t ns; bool rss_hash; #ifdef RTE_LIBRTE_ETHDEV_DEBUG RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL); #endif - ns = pcapng_tsc_to_ns(cycles); - orig_len = rte_pktmbuf_pkt_len(md); /* Take snapshot of the data */ @@ -641,8 +593,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue, /* Interface index is filled in later during write */ mc->port = port_id; - epb->timestamp_hi = ns >> 32; - epb->timestamp_lo = (uint32_t)ns; + /* Put timestamp in cycles here - adjust in packet write */ + timestamp = rte_get_tsc_cycles(); + epb->timestamp_hi = timestamp >> 32; + epb->timestamp_lo = (uint32_t)timestamp; epb->capture_length = data_len; epb->original_length = orig_len; @@ -668,6 +622,7 @@ rte_pcapng_write_packets(rte_pcapng_t *self, for (i = 0; i < nb_pkts; i++) { struct rte_mbuf *m = pkts[i]; struct pcapng_enhance_packet_block *epb; + uint64_t cycles, timestamp; /* sanity check that is really a pcapng mbuf */ epb = rte_pktmbuf_mtod(m, 
struct pcapng_enhance_packet_block *); @@ -684,6 +639,13 @@ rte_pcapng_write_packets(rte_pcapng_t *self, return -1; } + /* adjust timestamp recorded in packet */ + cycles = (uint64_t)epb->timestamp_hi << 32; + cycles += epb->timestamp_lo; + timestamp = pcapng_timestamp(self, cycles); + epb->timestamp_hi = timestamp >> 32; + epb->timestamp_lo = (uint32_t)timestamp; + /* * Handle case of highly fragmented and large burst size * Note: this assumes that max segments per mbuf < IOV_MAX @@ -725,6 +687,8 @@ rte_pcapng_fdopen(int fd, { unsigned int i; rte_pcapng_t *self; + struct timespec ts; + uint64_t cycles; self = malloc(sizeof(*self)); if (!self) { @@ -734,6 +698,13 @@ rte_pcapng_fdopen(int fd, self->outfd = fd; self->ports = 0; + + /* record start time in ns since 1/1/1970 */ + cycles = rte_get_tsc_cycles(); + clock_gettime(CLOCK_REALTIME, &ts); + self->tsc_base = (cycles + rte_get_tsc_cycles()) / 2; + self->offset_ns = rte_timespec_to_ns(&ts); + for (i = 0; i < RTE_MAX_ETHPORTS; i++) self->port_index[i] = UINT32_MAX; diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h index d93cc9f73ad5..c40795c721de 100644 --- a/lib/pcapng/rte_pcapng.h +++ b/lib/pcapng/rte_pcapng.h @@ -121,8 +121,6 @@ enum rte_pcapng_direction { * @param length * The upper limit on bytes to copy. Passing UINT32_MAX * means all data (after offset). - * @param timestamp - * The timestamp in TSC cycles. * @param direction * The direction of the packer: receive, transmit or unknown. * @param comment @@ -136,7 +134,7 @@ __rte_experimental struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue, const struct rte_mbuf *m, struct rte_mempool *mp, - uint32_t length, uint64_t timestamp, + uint32_t length, enum rte_pcapng_direction direction, const char *comment); @@ -188,29 +186,22 @@ rte_pcapng_write_packets(rte_pcapng_t *self, * The handle to the packet capture file * @param port * The Ethernet port to report stats on. - * @param comment - * Optional comment to add to statistics. 
- * @param start_time - * The time when packet capture was started in nanoseconds. - * Optional: can be zero if not known. - * @param end_time - * The time when packet capture was stopped in nanoseconds. - * Optional: can be zero if not finished; * @param ifrecv * The number of packets received by capture. * Optional: use UINT64_MAX if not known. * @param ifdrop * The number of packets missed by the capture process. * Optional: use UINT64_MAX if not known. + * @param comment + * Optional comment to add to statistics. * @return * number of bytes written to file, -1 on failure to write file */ __rte_experimental ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port, - const char *comment, - uint64_t start_time, uint64_t end_time, - uint64_t ifrecv, uint64_t ifdrop); + uint64_t ifrecv, uint64_t ifdrop, + const char *comment); #ifdef __cplusplus } diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index e94f49e21250..5a1ec14d7a18 100644 --- a/lib/pdump/rte_pdump.c +++ b/lib/pdump/rte_pdump.c @@ -90,7 +90,6 @@ pdump_copy(uint16_t port_id, uint16_t queue, int ring_enq; uint16_t d_pkts = 0; struct rte_mbuf *dup_bufs[nb_pkts]; - uint64_t ts; struct rte_ring *ring; struct rte_mempool *mp; struct rte_mbuf *p; @@ -99,7 +98,6 @@ pdump_copy(uint16_t port_id, uint16_t queue, if (cbs->filter) rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts); - ts = rte_get_tsc_cycles(); ring = cbs->ring; mp = cbs->mp; for (i = 0; i < nb_pkts; i++) { @@ -122,7 +120,7 @@ pdump_copy(uint16_t port_id, uint16_t queue, if (cbs->ver == V2) p = rte_pcapng_copy(port_id, queue, pkts[i], mp, cbs->snaplen, - ts, direction, NULL); + direction, NULL); else p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen); -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [PATCH v3 3/5] pcapng: modify timestamp calculation 2023-11-08 18:35 ` [PATCH v3 3/5] pcapng: modify timestamp calculation Stephen Hemminger @ 2023-11-09 7:57 ` Morten Brørup 0 siblings, 0 replies; 61+ messages in thread From: Morten Brørup @ 2023-11-09 7:57 UTC (permalink / raw) To: Stephen Hemminger, dev > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > Sent: Wednesday, 8 November 2023 19.36 > > The computation of the timestamp is best done in the part of the > pcapng library that runs in the secondary process. > The secondary process is already doing a bunch of system > calls, which makes it not performance sensitive. > > Simplify the computation of nanoseconds from TSC to a two-step > process which avoids numeric overflow issues. The previous > code was also not thread safe. > > Fixes: c882eb544842 ("pcapng: fix timestamp wrapping in output files") > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > --- This changes the rte_pcapng lib API, but it is marked experimental, so should be allowed. Acked-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v3 4/5] pcapng: avoid using alloca() 2023-11-08 18:35 ` [PATCH v3 0/5] dumpcap and pcapng fixes Stephen Hemminger ` (2 preceding siblings ...) 2023-11-08 18:35 ` [PATCH v3 3/5] pcapng: modify timestamp calculation Stephen Hemminger @ 2023-11-08 18:35 ` Stephen Hemminger 2023-11-09 8:21 ` Morten Brørup 2023-11-08 18:35 ` [PATCH v3 5/5] test: cleanups to pcapng test Stephen Hemminger 4 siblings, 1 reply; 61+ messages in thread From: Stephen Hemminger @ 2023-11-08 18:35 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger The function alloca(), like VLAs, has problems if the caller passes a large value. Instead, use a fixed-size buffer (4K), which will be more than sufficient for the info-related blocks in the file. Add bounds checks as well. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- lib/pcapng/rte_pcapng.c | 34 +++++++++++++--------------------- 1 file changed, 13 insertions(+), 21 deletions(-) diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index 13fd2b97fb80..67f74d31aa32 100644 --- a/lib/pcapng/rte_pcapng.c +++ b/lib/pcapng/rte_pcapng.c @@ -140,9 +140,8 @@ pcapng_section_block(rte_pcapng_t *self, { struct pcapng_section_header *hdr; struct pcapng_option *opt; - void *buf; + uint8_t buf[BUFSIZ]; uint32_t len; - ssize_t cc; len = sizeof(*hdr); if (hw) @@ -158,8 +157,7 @@ pcapng_section_block(rte_pcapng_t *self, len += pcapng_optlen(0); len += sizeof(uint32_t); - buf = calloc(1, len); - if (!buf) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_section_header *)buf; @@ -193,10 +191,7 @@ pcapng_section_block(rte_pcapng_t *self, /* clone block_length after option */ memcpy(opt, &hdr->block_length, sizeof(uint32_t)); - cc = write(self->outfd, buf, len); - free(buf); - - return cc; + return write(self->outfd, buf, len); } /* Write an interface block for a DPDK port */ @@ -213,7 +208,7 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, struct pcapng_option *opt; const uint8_t tsresol = 9; /* nanosecond 
resolution */ uint32_t len; - void *buf; + uint8_t buf[BUFSIZ]; char ifname_buf[IF_NAMESIZE]; char ifhw[256]; uint64_t speed = 0; @@ -267,8 +262,7 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, len += pcapng_optlen(0); len += sizeof(uint32_t); - buf = alloca(len); - if (!buf) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_interface_block *)buf; @@ -296,17 +290,16 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE, ifhw, strlen(ifhw)); if (filter) { - /* Encoding is that the first octet indicates string vs BPF */ size_t len; - char *buf; len = strlen(filter) + 1; - buf = alloca(len); - *buf = '\0'; - memcpy(buf + 1, filter, len); + opt->code = PCAPNG_IFB_FILTER; + opt->length = len; + /* Encoding is that the first octet indicates string vs BPF */ + opt->data[0] = 0; + memcpy(opt->data + 1, filter, strlen(filter)); - opt = pcapng_add_option(opt, PCAPNG_IFB_FILTER, - buf, len); + opt = (struct pcapng_option *)((uint8_t *)opt + pcapng_optlen(len)); } opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0); @@ -333,7 +326,7 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, uint64_t start_time = self->offset_ns; uint64_t sample_time; uint32_t optlen, len; - uint8_t *buf; + uint8_t buf[BUFSIZ]; RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); @@ -353,8 +346,7 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, optlen += pcapng_optlen(0); len = sizeof(*hdr) + optlen + sizeof(uint32_t); - buf = alloca(len); - if (buf == NULL) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_statistics *)buf; -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [PATCH v3 4/5] pcapng: avoid using alloca() 2023-11-08 18:35 ` [PATCH v3 4/5] pcapng: avoid using alloca() Stephen Hemminger @ 2023-11-09 8:21 ` Morten Brørup 2023-11-09 15:44 ` Stephen Hemminger 0 siblings, 1 reply; 61+ messages in thread From: Morten Brørup @ 2023-11-09 8:21 UTC (permalink / raw) To: Stephen Hemminger, dev > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > Sent: Wednesday, 8 November 2023 19.36 > > The function alloca() like VLA's has problems if the caller > passes a large value. Instead use a fixed size buffer (4K) > which will be more than sufficient for the info related blocks > in the file. Add bounds checks as well. > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > --- I can't find the definition of BUFSIZ. Please make sure to add a comment to the definition of BUFSIZ mentioning - like in your patch description - that it will be more than sufficient for the info related blocks in the file. More comments inline below, regarding existing bugs found while reviewing. Assuming BUFSIZ has a comment describing the reason for its value, Acked-by: Morten Brørup <mb@smartsharesystems.com> > lib/pcapng/rte_pcapng.c | 34 +++++++++++++--------------------- > 1 file changed, 13 insertions(+), 21 deletions(-) > > diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c > index 13fd2b97fb80..67f74d31aa32 100644 > --- a/lib/pcapng/rte_pcapng.c > +++ b/lib/pcapng/rte_pcapng.c > @@ -140,9 +140,8 @@ pcapng_section_block(rte_pcapng_t *self, > { > struct pcapng_section_header *hdr; > struct pcapng_option *opt; > - void *buf; > + uint8_t buf[BUFSIZ]; > uint32_t len; > - ssize_t cc; > > len = sizeof(*hdr); > if (hw) > @@ -158,8 +157,7 @@ pcapng_section_block(rte_pcapng_t *self, > len += pcapng_optlen(0); > len += sizeof(uint32_t); > > - buf = calloc(1, len); > - if (!buf) > + if (len > sizeof(buf)) > return -1; Existing bug: rte_errno must be set before returning -1. 
This bug occurs multiple times in rte_pcapng.c, probably also in code you're not updating in this patch. > > hdr = (struct pcapng_section_header *)buf; > @@ -193,10 +191,7 @@ pcapng_section_block(rte_pcapng_t *self, > /* clone block_length after option */ > memcpy(opt, &hdr->block_length, sizeof(uint32_t)); > > - cc = write(self->outfd, buf, len); > - free(buf); > - > - return cc; > + return write(self->outfd, buf, len); Existing bug: if write() returns -1, errno must be stored in rte_errno before returning -1. This bug might also occur multiple times in rte_pcapng.c. ^ permalink raw reply [flat|nested] 61+ messages in thread
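The two review remarks above (set an error code before returning -1 on an oversized block, and copy errno when write() fails) can be sketched roughly as follows. This is only an illustration: `sketch_errno` stands in for DPDK's rte_errno, `sketch_write_block` is a hypothetical helper, and ENOBUFS is an assumed choice of error code, not something taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for DPDK's rte_errno; hypothetical, for illustration only. */
static int sketch_errno;

/*
 * Write an already built block of len bytes held in a buffer of
 * bufsize bytes. On any failure, record the reason before returning -1.
 */
static ssize_t
sketch_write_block(int fd, const void *buf, size_t len, size_t bufsize)
{
	ssize_t cc;

	if (len > bufsize) {
		sketch_errno = ENOBUFS;	/* assumed error code, not from the patch */
		return -1;
	}

	cc = write(fd, buf, len);
	if (cc < 0)
		sketch_errno = errno;	/* propagate the system error to the caller */

	return cc;
}
```

A caller can then report the failure with strerror(sketch_errno) instead of guessing why -1 came back.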
* Re: [PATCH v3 4/5] pcapng: avoid using alloca() 2023-11-09 8:21 ` Morten Brørup @ 2023-11-09 15:44 ` Stephen Hemminger 2023-11-09 16:25 ` Morten Brørup 0 siblings, 1 reply; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 15:44 UTC (permalink / raw) To: Morten Brørup; +Cc: dev On Thu, 9 Nov 2023 09:21:22 +0100 Morten Brørup <mb@smartsharesystems.com> wrote: > I can't find the definition of BUFSIZ. Please make sure to add a comment to the definition of BUFSIZ mentioning - like in your patch description - that it will be more than sufficient for the info related blocks in the file. > > More comments inline below, regarding existing bugs found while reviewing. > > > Assuming BUFSIZ has a comment describing the reason for its value, > > Acked-by: Morten Brørup <mb@smartsharesystems.com> The constant BUFSIZ comes from stdio.h and is used in lots of places in libraries. It is 8192 in current glibc and unlikely to be a problem. I chose it because this is an on-stack buffer used before writing to a file, which is similar to what stdio does. The library does not use stdio because most of the I/O is writing packets, which needs to be fast, and the overhead of an extra stdio buffer is harmful. I am looking into using io_uring in a future version. ^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [PATCH v3 4/5] pcapng: avoid using alloca() 2023-11-09 15:44 ` Stephen Hemminger @ 2023-11-09 16:25 ` Morten Brørup 0 siblings, 0 replies; 61+ messages in thread From: Morten Brørup @ 2023-11-09 16:25 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dev > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > Sent: Thursday, 9 November 2023 16.45 > > On Thu, 9 Nov 2023 09:21:22 +0100 > Morten Brørup <mb@smartsharesystems.com> wrote: > > > I can't find the definition of BUFSIZ. Please make sure to add a > comment to the definition of BUFSIZ mentioning - like in your patch > description - that it will be more than sufficient for the info related > blocks in the file. > > > > More comments inline below, regarding existing bugs found while > reviewing. > > > > > > Assuming BUFSIZ has a comment describing the reason for its value, > > > > Acked-by: Morten Brørup <mb@smartsharesystems.com> > > The constant BUFSIZ comes from stdio.h and used lots of places in > libraries. > It is 8192 in current glibc and unlikely to be a problem. OK, didn't know that. So I looked it up, trying to learn more about it. I found two sources [1], [2] mentioning that BUFSIZ is guaranteed to be at least 256. [1]: https://www.gnu.org/software/libc/manual/html_node/Controlling-Buffering.html#BUFSIZ [2]: Page 234 in "The C Standard Library" by P.J. Plauger, ISBN: 0-13-131509-9, from 1992 If 256 suffices, then I am OK with using BUFSIZ. I hope the authors of the other libraries using BUFSIZ don't assume more than the C standard promises about it. > Chose it because this a on stack buffer used before writing to a file > which > is similar to what stdio does. > > The library does not use stdio because most of the I/O is writing > packets > which needs to be fast and overhead of extra stdio buffer is harmful. > Looking into using io_uring in a future version. ^ permalink raw reply [flat|nested] 61+ messages in thread
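Since the C standard only promises that BUFSIZ is at least 256, one hedged way to make that assumption explicit is a compile-time check next to the run-time bounds check. The sketch below is not from the patch: `SKETCH_MAX_BLOCK` is a hypothetical worst-case block size and `sketch_stage_block` is an illustrative helper.

```c
#include <stdint.h>
#include <stdio.h>	/* BUFSIZ */
#include <string.h>

/* Hypothetical worst-case size of an info block; not a value from the patch. */
#define SKETCH_MAX_BLOCK 256

/* Fail the build if the platform's BUFSIZ is smaller than what is relied on. */
_Static_assert(BUFSIZ >= SKETCH_MAX_BLOCK, "BUFSIZ too small for info blocks");

/*
 * Stage a payload in a fixed-size stack buffer, rejecting anything
 * that does not fit. Returns the number of bytes staged, or -1.
 */
static int
sketch_stage_block(const void *payload, size_t len)
{
	uint8_t buf[BUFSIZ];

	if (len > sizeof(buf))
		return -1;	/* bounds check replaces the old alloca(len) */

	memcpy(buf, payload, len);
	return (int)len;
}
```

The static assertion documents the minimum the code depends on, while the run-time check still guards against variable-length options such as long descriptions.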
* [PATCH v3 5/5] test: cleanups to pcapng test 2023-11-08 18:35 ` [PATCH v3 0/5] dumpcap and pcapng fixes Stephen Hemminger ` (3 preceding siblings ...) 2023-11-08 18:35 ` [PATCH v3 4/5] pcapng: avoid using alloca() Stephen Hemminger @ 2023-11-08 18:35 ` Stephen Hemminger 4 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-08 18:35 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Overhaul of the pcapng test: - promote it to be a fast test so it gets regularly run. - create null device and use it. - use UDP discard packets that are valid so that for debugging the resulting pcapng file can be looked at with wireshark. - do basic checks on resulting pcap file that lengths and timestamps are in range. - add test for interface options Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/test/meson.build | 2 +- app/test/test_pcapng.c | 418 +++++++++++++++++++++++++++-------------- 2 files changed, 282 insertions(+), 138 deletions(-) diff --git a/app/test/meson.build b/app/test/meson.build index 4183d66b0e9c..dcc93f4a43b4 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -128,7 +128,7 @@ source_file_deps = { 'test_metrics.c': ['metrics'], 'test_mp_secondary.c': ['hash', 'lpm'], 'test_net_ether.c': ['net'], - 'test_pcapng.c': ['ethdev', 'net', 'pcapng'], + 'test_pcapng.c': ['ethdev', 'net', 'pcapng', 'bus_vdev'], 'test_pdcp.c': ['eventdev', 'pdcp', 'net', 'timer', 'security'], 'test_pdump.c': ['pdump'] + sample_packet_forward_deps, 'test_per_lcore.c': [], diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index 55aa2cf93666..c973aa47d1f8 100644 --- a/app/test/test_pcapng.c +++ b/app/test/test_pcapng.c @@ -6,25 +6,34 @@ #include <stdlib.h> #include <unistd.h> +#include <rte_bus_vdev.h> #include <rte_ethdev.h> #include <rte_ether.h> +#include <rte_ip.h> #include <rte_mbuf.h> #include <rte_mempool.h> #include <rte_net.h> #include <rte_pcapng.h> +#include <rte_random.h> +#include <rte_reciprocal.h>
+#include <rte_time.h> +#include <rte_udp.h> #include <pcap/pcap.h> #include "test.h" -#define NUM_PACKETS 10 -#define DUMMY_MBUF_NUM 3 +#define PCAPNG_TEST_DEBUG 0 + +#define TOTAL_PACKETS 4096 +#define MAX_BURST 64 +#define MAX_GAP_US 100000 +#define DUMMY_MBUF_NUM 3 -static rte_pcapng_t *pcapng; static struct rte_mempool *mp; static const uint32_t pkt_len = 200; static uint16_t port_id; -static char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; +static const char null_dev[] = "net_null0"; /* first mbuf in the packet, should always be at offset 0 */ struct dummy_mbuf { @@ -61,6 +70,7 @@ mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen) struct { struct rte_ether_hdr eth; struct rte_ipv4_hdr ip; + struct rte_udp_hdr udp; } pkt = { .eth = { .dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff", @@ -68,149 +78,201 @@ mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen) }, .ip = { .version_ihl = RTE_IPV4_VHL_DEF, - .total_length = rte_cpu_to_be_16(plen), - .time_to_live = IPDEFTTL, - .next_proto_id = IPPROTO_RAW, + .time_to_live = 1, + .next_proto_id = IPPROTO_UDP, .src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK), .dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST), - } + }, + .udp = { + .dst_port = rte_cpu_to_be_16(9), /* Discard port */ + }, }; memset(dm, 0, sizeof(*dm)); dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen); rte_eth_random_addr(pkt.eth.src_addr.addr_bytes); - memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, RTE_MIN(sizeof(pkt), plen)); + plen -= sizeof(struct rte_ether_hdr); + + pkt.ip.total_length = rte_cpu_to_be_16(plen); + pkt.ip.hdr_checksum = rte_ipv4_cksum(&pkt.ip); + + plen -= sizeof(struct rte_ipv4_hdr); + pkt.udp.src_port = rte_rand(); + pkt.udp.dgram_len = rte_cpu_to_be_16(plen); + + memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, sizeof(pkt)); } static int test_setup(void) { - int tmp_fd; - - port_id = rte_eth_find_next(0); - if (port_id >= RTE_MAX_ETHPORTS) { - fprintf(stderr, "No valid Ether port\n"); - return -1; - } + port_id = 
rte_eth_dev_count_avail(); - tmp_fd = mkstemps(file_name, strlen(".pcapng")); - if (tmp_fd == -1) { - perror("mkstemps() failure"); - return -1; - } - printf("pcapng: output file %s\n", file_name); - - /* open a test capture file */ - pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL); - if (pcapng == NULL) { - fprintf(stderr, "rte_pcapng_fdopen failed\n"); - close(tmp_fd); - return -1; - } - - /* Add interface to the file */ - if (rte_pcapng_add_interface(pcapng, port_id, - NULL, NULL, NULL) != 0) { - fprintf(stderr, "can not add port %u\n", port_id); - return -1; + /* Make a dummy null device to snoop on */ + if (rte_vdev_init(null_dev, NULL) != 0) { + fprintf(stderr, "Failed to create vdev '%s'\n", null_dev); + goto fail; } /* Make a pool for cloned packets */ - mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", IOV_MAX + NUM_PACKETS, - 0, 0, - rte_pcapng_mbuf_size(pkt_len), + mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", + MAX_BURST, 0, 0, + rte_pcapng_mbuf_size(pkt_len) + 128, SOCKET_ID_ANY, "ring_mp_sc"); if (mp == NULL) { fprintf(stderr, "Cannot create mempool\n"); - return -1; + goto fail; } + return 0; + +fail: + rte_vdev_uninit(null_dev); + rte_mempool_free(mp); + return -1; } static int -test_write_packets(void) +fill_pcapng_file(rte_pcapng_t *pcapng, unsigned int num_packets) { - struct rte_mbuf *orig; - struct rte_mbuf *clones[NUM_PACKETS] = { }; struct dummy_mbuf mbfs; - unsigned int i; + struct rte_mbuf *orig; + unsigned int burst_size; + unsigned int count; ssize_t len; /* make a dummy packet */ mbuf1_prepare(&mbfs, pkt_len); - - /* clone them */ orig = &mbfs.mb[0]; - for (i = 0; i < NUM_PACKETS; i++) { - struct rte_mbuf *mc; - mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - rte_get_tsc_cycles(), 0, NULL); - if (mc == NULL) { - fprintf(stderr, "Cannot copy packet\n"); + for (count = 0; count < num_packets; count += burst_size) { + struct rte_mbuf *clones[MAX_BURST]; + unsigned int i; + + /* put 1 .. 
MAX_BURST packets in one write call */ + burst_size = rte_rand_max(MAX_BURST) + 1; + for (i = 0; i < burst_size; i++) { + struct rte_mbuf *mc; + + mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, + RTE_PCAPNG_DIRECTION_IN, + NULL); + if (mc == NULL) { + fprintf(stderr, "Cannot copy packet\n"); + return -1; + } + clones[i] = mc; + } + + /* write it to capture file */ + len = rte_pcapng_write_packets(pcapng, clones, burst_size); + rte_pktmbuf_free_bulk(clones, burst_size); + + if (len <= 0) { + fprintf(stderr, "Write of packets failed: %s\n", + rte_strerror(rte_errno)); return -1; } - clones[i] = mc; + + /* Leave a small gap between packets to test for time wrap */ + usleep(rte_rand_max(MAX_GAP_US)); } - /* write it to capture file */ - len = rte_pcapng_write_packets(pcapng, clones, NUM_PACKETS); + return count; +} - rte_pktmbuf_free_bulk(clones, NUM_PACKETS); +static char * +fmt_time(char *buf, size_t size, uint64_t ts_ns) +{ + time_t sec; + size_t len; - if (len <= 0) { - fprintf(stderr, "Write of packets failed\n"); - return -1; - } + sec = ts_ns / NS_PER_S; + len = strftime(buf, size, "%X", localtime(&sec)); + snprintf(buf + len, size - len, ".%09lu", + (unsigned long)(ts_ns % NS_PER_S)); - return 0; + return buf; } -static int -test_write_stats(void) +/* Context for the pcap_loop callback */ +struct pkt_print_ctx { + pcap_t *pcap; + unsigned int count; + uint64_t start_ns; + uint64_t end_ns; +}; + +static void +print_packet(uint64_t ts_ns, const struct rte_ether_hdr *eh, size_t len) { - ssize_t len; + char tbuf[128], src[64], dst[64]; - /* write a statistics block */ - len = rte_pcapng_write_stats(pcapng, port_id, NULL, - 0, 0, 0, - NUM_PACKETS, 0); - if (len <= 0) { - fprintf(stderr, "Write of statistics failed\n"); - return -1; - } - return 0; + fmt_time(tbuf, sizeof(tbuf), ts_ns); + rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr); + rte_ether_format_addr(src, sizeof(src), &eh->src_addr); + printf("%s: %s -> %s type %x length %zu\n", + tbuf, src, 
dst, rte_be_to_cpu_16(eh->ether_type), len); } +/* Callback from pcap_loop used to validate packets in the file */ static void -pkt_print(u_char *user, const struct pcap_pkthdr *h, - const u_char *bytes) +parse_pcap_packet(u_char *user, const struct pcap_pkthdr *h, + const u_char *bytes) { - unsigned int *countp = (unsigned int *)user; + struct pkt_print_ctx *ctx = (struct pkt_print_ctx *)user; const struct rte_ether_hdr *eh; - struct tm *tm; - char tbuf[128], src[64], dst[64]; + const struct rte_ipv4_hdr *ip; + uint64_t ns; - tm = localtime(&h->ts.tv_sec); - if (tm == NULL) { - perror("localtime"); - return; + eh = (const struct rte_ether_hdr *)bytes; + ip = (const struct rte_ipv4_hdr *)(eh + 1); + + ctx->count += 1; + + /* The pcap library is misleading in reporting timestamp. + * packet header struct gives timestamp as a timeval (ie. usec); + * but the file is open in nanonsecond mode therefore + * the timestamp is really in timespec (ie. nanoseconds). + */ + ns = h->ts.tv_sec * NS_PER_S + h->ts.tv_usec; + if (ns < ctx->start_ns || ns > ctx->end_ns) { + char tstart[128], tend[128]; + + fmt_time(tstart, sizeof(tstart), ctx->start_ns); + fmt_time(tend, sizeof(tend), ctx->end_ns); + fprintf(stderr, "Timestamp out of range [%s .. 
%s]\n", + tstart, tend); + goto error; } - if (strftime(tbuf, sizeof(tbuf), "%X", tm) == 0) { - fprintf(stderr, "strftime returned 0!\n"); - return; + if (!rte_is_broadcast_ether_addr(&eh->dst_addr)) { + fprintf(stderr, "Destination is not broadcast\n"); + goto error; } - eh = (const struct rte_ether_hdr *)bytes; - rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr); - rte_ether_format_addr(src, sizeof(src), &eh->src_addr); - printf("%s.%06lu: %s -> %s type %x length %u\n", - tbuf, (unsigned long)h->ts.tv_usec, - src, dst, rte_be_to_cpu_16(eh->ether_type), h->len); + if (rte_ipv4_cksum(ip) != 0) { + fprintf(stderr, "Bad IPv4 checksum\n"); + goto error; + } + + return; /* packet is normal */ + +error: + print_packet(ns, eh, h->len); + + /* Stop parsing at first error */ + pcap_breakloop(ctx->pcap); +} - *countp += 1; +static uint64_t +current_timestamp(void) +{ + struct timespec ts; + + clock_gettime(CLOCK_REALTIME, &ts); + return rte_timespec_to_ns(&ts); } /* @@ -219,78 +281,162 @@ pkt_print(u_char *user, const struct pcap_pkthdr *h, * but that creates an unwanted dependency. 
*/ static int -test_validate(void) +valid_pcapng_file(const char *file_name, uint64_t started, unsigned int expected) { char errbuf[PCAP_ERRBUF_SIZE]; - unsigned int count = 0; - pcap_t *pcap; + struct pkt_print_ctx ctx = { }; int ret; - pcap = pcap_open_offline(file_name, errbuf); - if (pcap == NULL) { + ctx.start_ns = started; + ctx.end_ns = current_timestamp(); + + ctx.pcap = pcap_open_offline_with_tstamp_precision(file_name, + PCAP_TSTAMP_PRECISION_NANO, + errbuf); + if (ctx.pcap == NULL) { fprintf(stderr, "pcap_open_offline('%s') failed: %s\n", file_name, errbuf); return -1; } - ret = pcap_loop(pcap, 0, pkt_print, (u_char *)&count); - if (ret == 0) - printf("Saw %u packets\n", count); - else + ret = pcap_loop(ctx.pcap, 0, parse_pcap_packet, (u_char *)&ctx); + if (ret != 0) { fprintf(stderr, "pcap_dispatch: failed: %s\n", - pcap_geterr(pcap)); - pcap_close(pcap); + pcap_geterr(ctx.pcap)); + } else if (ctx.count != expected) { + printf("Only %u packets, expected %u\n", + ctx.count, expected); + ret = -1; + } + + pcap_close(ctx.pcap); return ret; } static int -test_write_over_limit_iov_max(void) +test_add_interface(void) { - struct rte_mbuf *orig; - struct rte_mbuf *clones[IOV_MAX + NUM_PACKETS] = { }; - struct dummy_mbuf mbfs; - unsigned int i; - ssize_t len; + char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; + static rte_pcapng_t *pcapng; + int ret, tmp_fd; + uint64_t now = current_timestamp(); - /* make a dummy packet */ - mbuf1_prepare(&mbfs, pkt_len); + tmp_fd = mkstemps(file_name, strlen(".pcapng")); + if (tmp_fd == -1) { + perror("mkstemps() failure"); + goto fail; + } + printf("pcapng: output file %s\n", file_name); - /* clone them */ - orig = &mbfs.mb[0]; - for (i = 0; i < IOV_MAX + NUM_PACKETS; i++) { - struct rte_mbuf *mc; + /* open a test capture file */ + pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_addif", NULL); + if (pcapng == NULL) { + fprintf(stderr, "rte_pcapng_fdopen failed\n"); + close(tmp_fd); + goto fail; + } - mc = 
rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - rte_get_tsc_cycles(), 0, NULL); - if (mc == NULL) { - fprintf(stderr, "Cannot copy packet\n"); - return -1; - } - clones[i] = mc; + /* Add interface to the file */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, NULL); + if (ret < 0) { + fprintf(stderr, "can not add port %u\n", port_id); + goto fail; } - /* write it to capture file */ - len = rte_pcapng_write_packets(pcapng, clones, IOV_MAX + NUM_PACKETS); + /* Add interface with ifname and ifdescr */ + ret = rte_pcapng_add_interface(pcapng, port_id, + "myeth", "Some long description", NULL); + if (ret < 0) { + fprintf(stderr, "can not add port %u with ifname\n", port_id); + goto fail; + } + + /* Add interface with filter */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, "tcp port 8080"); + if (ret < 0) { + fprintf(stderr, "can not add port %u with filter\n", port_id); + goto fail; + } - rte_pktmbuf_free_bulk(clones, IOV_MAX + NUM_PACKETS); + rte_pcapng_close(pcapng); - if (len <= 0) { - fprintf(stderr, "Write of packets failed\n"); - return -1; + ret = valid_pcapng_file(file_name, now, 0); + /* if test fails want to investigate the file */ + if (ret == 0) + unlink(file_name); + + return ret; + +fail: + rte_pcapng_close(pcapng); + return -1; +} + +static int +test_write_packets(void) +{ + char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; + static rte_pcapng_t *pcapng; + int ret, tmp_fd, count; + uint64_t now = current_timestamp(); + + tmp_fd = mkstemps(file_name, strlen(".pcapng")); + if (tmp_fd == -1) { + perror("mkstemps() failure"); + goto fail; } + printf("pcapng: output file %s\n", file_name); - return 0; + /* open a test capture file */ + pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL); + if (pcapng == NULL) { + fprintf(stderr, "rte_pcapng_fdopen failed\n"); + close(tmp_fd); + goto fail; + } + + /* Add interface to the file */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, NULL); + if (ret 
< 0) { + fprintf(stderr, "can not add port %u\n", port_id); + goto fail; + } + + count = fill_pcapng_file(pcapng, TOTAL_PACKETS); + if (count < 0) + goto fail; + + /* write a statistics block */ + ret = rte_pcapng_write_stats(pcapng, port_id, + count, 0, "end of test"); + if (ret <= 0) { + fprintf(stderr, "Write of statistics failed\n"); + goto fail; + } + + rte_pcapng_close(pcapng); + + ret = valid_pcapng_file(file_name, now, count); + /* if test fails want to investigate the file */ + if (ret == 0) + unlink(file_name); + + return ret; + +fail: + rte_pcapng_close(pcapng); + return -1; } static void test_cleanup(void) { rte_mempool_free(mp); - - if (pcapng) - rte_pcapng_close(pcapng); - + rte_vdev_uninit(null_dev); } static struct @@ -299,10 +445,8 @@ unit_test_suite test_pcapng_suite = { .teardown = test_cleanup, .suite_name = "Test Pcapng Unit Test Suite", .unit_test_cases = { + TEST_CASE(test_add_interface), TEST_CASE(test_write_packets), - TEST_CASE(test_write_stats), - TEST_CASE(test_validate), - TEST_CASE(test_write_over_limit_iov_max), TEST_CASES_END() } }; @@ -313,4 +457,4 @@ test_pcapng(void) return unit_test_suite_runner(&test_pcapng_suite); } -REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng); +REGISTER_FAST_TEST(pcapng_autotest, true, true, test_pcapng); -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
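The timestamp check in the test above relies on libpcap's behaviour when a file is opened with nanosecond precision: the tv_usec field of the packet header then carries nanoseconds. A minimal sketch of that conversion and of the range check follows. The helper names and `SKETCH_NS_PER_S` are hypothetical, and a plain struct timeval stands in for the ts member of struct pcap_pkthdr.

```c
#include <stdint.h>
#include <sys/time.h>

#define SKETCH_NS_PER_S 1000000000ULL

/* With nanosecond precision, tv_usec actually holds nanoseconds. */
static uint64_t
sketch_pcap_ts_to_ns(const struct timeval *ts)
{
	return (uint64_t)ts->tv_sec * SKETCH_NS_PER_S + (uint64_t)ts->tv_usec;
}

/* A packet is plausible if it was stamped between test start and test end. */
static int
sketch_ts_in_range(uint64_t ns, uint64_t start_ns, uint64_t end_ns)
{
	return ns >= start_ns && ns <= end_ns;
}
```

If the file were opened in the default microsecond mode, the same field would need to be multiplied by 1000 first, so the check only makes sense together with the nanosecond open call.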
* [PATCH v4 0/5] dumpcap and pcapng fixes 2023-09-21 4:23 [PATCH 0/4] pcapng fixes Stephen Hemminger ` (5 preceding siblings ...) 2023-11-08 18:35 ` [PATCH v3 0/5] dumpcap and pcapng fixes Stephen Hemminger @ 2023-11-09 17:34 ` Stephen Hemminger 2023-11-09 17:34 ` [PATCH v4 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger ` (4 more replies) 2023-11-09 19:45 ` [PATCH v5 0/5] dumpcap and pcapng fixes Stephen Hemminger ` (2 subsequent siblings) 9 siblings, 5 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 17:34 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This series has bugfixes and tests for dumpcap and pcapng. It should be in 23.11! It fixes issues related to timestamping. The design choices are to maximize performance in the primary process; and do all the time adjustment in the secondary (dumpcap) since dumpcap needs to make system calls anyway to write the result. This patch set moves the adjustment calculation into the pcapng portion that opens the output file. All details of the format of timestamp are contained inside pcapng (data hiding). v4 - incorporate review feedback v3 - don't use alloca() since it can have VLA-type issues Stephen Hemminger (5): pdump: fix setting rte_errno on mp error dumpcap: allow multiple invocations pcapng: modify timestamp calculation pcapng: avoid using alloca() test: cleanups to pcapng test app/dumpcap/main.c | 53 ++--- app/test/meson.build | 2 +- app/test/test_pcapng.c | 418 +++++++++++++++++++++++++++------------- lib/graph/graph_pcap.c | 2 +- lib/pcapng/rte_pcapng.c | 156 ++++++--------- lib/pcapng/rte_pcapng.h | 19 +- lib/pdump/rte_pdump.c | 9 +- 7 files changed, 374 insertions(+), 285 deletions(-) -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
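The time adjustment this series moves into pcapng converts TSC cycles to nanoseconds. One way to do that without 64-bit overflow is to split the elapsed cycles into whole seconds and a remainder, as sketched below. This is an illustration under stated assumptions, not the code from the series: `sketch_cycles_to_ns` and `SKETCH_NS_PER_S` are hypothetical names, and the TSC frequency is passed in rather than read from the EAL.

```c
#include <stdint.h>

#define SKETCH_NS_PER_S 1000000000ULL

/*
 * Convert cycles elapsed since a base TSC reading to nanoseconds.
 * Multiplying delta by 1e9 directly overflows after a few seconds, so
 * only the sub-second remainder (always < hz) is scaled. That product
 * stays below 2^64 as long as hz is under roughly 18.4 GHz.
 */
static uint64_t
sketch_cycles_to_ns(uint64_t delta, uint64_t hz)
{
	uint64_t secs = delta / hz;	/* whole seconds */
	uint64_t rem = delta % hz;	/* leftover cycles */

	return secs * SKETCH_NS_PER_S + (rem * SKETCH_NS_PER_S) / hz;
}
```

For a 3 GHz TSC, 31.5 billion elapsed cycles come out as 10.5 seconds; adding the wall-clock offset captured when the file was opened would then give the absolute timestamp.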
* [PATCH v4 1/5] pdump: fix setting rte_errno on mp error 2023-11-09 17:34 ` [PATCH v4 0/5] dumpcap and pcapng fixes Stephen Hemminger @ 2023-11-09 17:34 ` Stephen Hemminger 2023-11-09 17:34 ` [PATCH v4 2/5] dumpcap: allow multiple invocations Stephen Hemminger ` (3 subsequent siblings) 4 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 17:34 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Morten Brørup, Reshma Pattan, Jianfeng Tan The response from MP server sets err_value to negative on error. The convention for rte_errno is to use a positive value on error. This makes errors like duplicate registration show up with the correct error value. Fixes: 660098d61f57 ("pdump: use generic multi-process channel") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Morten Brørup <mb@smartsharesystems.com> --- lib/pdump/rte_pdump.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index 80b90c6f7d03..e94f49e21250 100644 --- a/lib/pdump/rte_pdump.c +++ b/lib/pdump/rte_pdump.c @@ -564,9 +564,10 @@ pdump_prepare_client_request(const char *device, uint16_t queue, if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) { mp_rep = &mp_reply.msgs[0]; resp = (struct pdump_response *)mp_rep->param; - rte_errno = resp->err_value; - if (!resp->err_value) + if (resp->err_value == 0) ret = 0; + else + rte_errno = -resp->err_value; free(mp_reply.msgs); } -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
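The sign convention fixed by this patch can be shown in isolation. In the sketch below, `struct sketch_response` and `sketch_errno` are hypothetical stand-ins for the pdump response and rte_errno; only the err_value field name mirrors the patch.

```c
#include <errno.h>

/* Hypothetical reply layout; only err_value mirrors the field in the patch. */
struct sketch_response {
	int err_value;	/* 0 on success, negative errno on failure */
};

/* Stand-in for rte_errno, which by convention holds a positive value. */
static int sketch_errno;

static int
sketch_handle_response(const struct sketch_response *resp)
{
	if (resp->err_value == 0)
		return 0;

	sketch_errno = -resp->err_value;	/* flip the sign for the caller */
	return -1;
}
```

With the sign flipped, a duplicate registration that the server reports as -EEXIST is seen by the caller as EEXIST, which strerror-style helpers print as "File exists".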
* [PATCH v4 2/5] dumpcap: allow multiple invocations 2023-11-09 17:34 ` [PATCH v4 0/5] dumpcap and pcapng fixes Stephen Hemminger 2023-11-09 17:34 ` [PATCH v4 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger @ 2023-11-09 17:34 ` Stephen Hemminger 2023-11-09 18:30 ` Morten Brørup 2023-11-09 17:34 ` [PATCH v4 3/5] pcapng: modify timestamp calculation Stephen Hemminger ` (2 subsequent siblings) 4 siblings, 1 reply; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 17:34 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Isaac Boukris, Reshma Pattan If dumpcap is run twice with each instance pointing at a different interface, it would fail because of overlap in ring and pool names. Fix by putting the process id in the name. It is still not allowed to do multiple invocations on the same interface because only one callback is allowed and only one copy of mbuf is done. Dumpcap will fail with error in this case: pdump_prepare_client_request(): client request for pdump enable/disable failed EAL: Error - exiting with code: 1 Cause: Packet dump enable on 0:net_null0 failed File exists Fixes: cbb44143be74 ("app/dumpcap: add new packet capture application") Reported-by: Isaac Boukris <iboukris@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/dumpcap/main.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c index 64294bbfb3e6..74c754e272c5 100644 --- a/app/dumpcap/main.c +++ b/app/dumpcap/main.c @@ -44,7 +44,6 @@ #include <pcap/pcap.h> #include <pcap/bpf.h> -#define RING_NAME "capture-ring" #define MONITOR_INTERVAL (500 * 1000) #define MBUF_POOL_CACHE_SIZE 32 #define BURST_SIZE 32 @@ -647,6 +646,7 @@ static void dpdk_init(void) static struct rte_ring *create_ring(void) { struct rte_ring *ring; + char ring_name[RTE_RING_NAMESIZE]; size_t size, log2; /* Find next power of 2 >= size.
*/ @@ -660,28 +660,28 @@ static struct rte_ring *create_ring(void) ring_size = size; } - ring = rte_ring_lookup(RING_NAME); - if (ring == NULL) { - ring = rte_ring_create(RING_NAME, ring_size, - rte_socket_id(), 0); - if (ring == NULL) - rte_exit(EXIT_FAILURE, "Could not create ring :%s\n", - rte_strerror(rte_errno)); - } + /* Want one ring per invocation of program */ + snprintf(ring_name, sizeof(ring_name), + "dumpcap-%d", getpid()); + + ring = rte_ring_create(ring_name, ring_size, + rte_socket_id(), 0); + if (ring == NULL) + rte_exit(EXIT_FAILURE, "Could not create ring :%s\n", + rte_strerror(rte_errno)); + return ring; } static struct rte_mempool *create_mempool(void) { const struct interface *intf; - static const char pool_name[] = "capture_mbufs"; + char pool_name[RTE_MEMPOOL_NAMESIZE]; size_t num_mbufs = 2 * ring_size; struct rte_mempool *mp; uint32_t data_size = 128; - mp = rte_mempool_lookup(pool_name); - if (mp) - return mp; + snprintf(pool_name, sizeof(pool_name), "capture_%u", getpid()); /* Common pool so size mbuf for biggest snap length */ TAILQ_FOREACH(intf, &interfaces, next) { @@ -826,7 +826,7 @@ static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp) rte_exit(EXIT_FAILURE, "Packet dump enable on %u:%s failed %s\n", intf->port, intf->name, - rte_strerror(-ret)); + rte_strerror(rte_errno)); } if (intf->opts.promisc_mode) { -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [PATCH v4 2/5] dumpcap: allow multiple invocations 2023-11-09 17:34 ` [PATCH v4 2/5] dumpcap: allow multiple invocations Stephen Hemminger @ 2023-11-09 18:30 ` Morten Brørup 0 siblings, 0 replies; 61+ messages in thread From: Morten Brørup @ 2023-11-09 18:30 UTC (permalink / raw) To: Stephen Hemminger, dev; +Cc: Isaac Boukris, Reshma Pattan > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > Sent: Thursday, 9 November 2023 18.34 > > If dumpcap is run twice with each instance pointing a different > interface, it would fail because of overlap in ring a pool names. > Fix by putting process id in the name. > > It is still not allowed to do multiple invocations on the same > interface because only one callback is allowed and only one copy > of mbuf is done. Dumpcap will fail with error in this case: > > pdump_prepare_client_request(): client request for pdump > enable/disable failed > EAL: Error - exiting with code: 1 > Cause: Packet dump enable on 0:net_null0 failed File exists > > Fixes: cbb44143be74 ("app/dumpcap: add new packet capture application") > Reported-by: Isaac Boukris <iboukris@gmail.com> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > --- [...] > + snprintf(ring_name, sizeof(ring_name), > + "dumpcap-%d", getpid()); Fixed - thank you. [...] > + snprintf(pool_name, sizeof(pool_name), "capture_%u", getpid()); Should change from %u to %d here too. ;-) Either way, Reviewed-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 61+ messages in thread
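The %u versus %d remark comes down to pid_t being a signed type. A hedged sketch of building a per-process name with an explicit cast and a truncation check is shown here; `sketch_proc_name` and `SKETCH_NAMESIZE` are hypothetical, with the latter standing in for limits such as RTE_RING_NAMESIZE.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed size for illustration; stands in for the real DPDK name limits. */
#define SKETCH_NAMESIZE 32

/*
 * Build "<prefix><pid>" so each running instance gets its own object name.
 * Returns 0 on success, or -1 if the name did not fit in the buffer.
 */
static int
sketch_proc_name(char *name, size_t size, const char *prefix)
{
	/* pid_t is signed, so cast to int and format with %d */
	int n = snprintf(name, size, "%s%d", prefix, (int)getpid());

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Checking the snprintf return value matters here because a silently truncated name could collide with the name chosen by another instance.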
* [PATCH v4 3/5] pcapng: modify timestamp calculation 2023-11-09 17:34 ` [PATCH v4 0/5] dumpcap and pcapng fixes Stephen Hemminger 2023-11-09 17:34 ` [PATCH v4 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger 2023-11-09 17:34 ` [PATCH v4 2/5] dumpcap: allow multiple invocations Stephen Hemminger @ 2023-11-09 17:34 ` Stephen Hemminger 2023-11-09 17:34 ` [PATCH v4 4/5] pcapng: avoid using alloca() Stephen Hemminger 2023-11-09 17:34 ` [PATCH v4 5/5] test: cleanups to pcapng test Stephen Hemminger 4 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 17:34 UTC (permalink / raw) To: dev Cc: Stephen Hemminger, Morten Brørup, Reshma Pattan, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan, Quentin Armitage The computation of timestamp is best done in the part of pcapng library that is in secondary process. The secondary process is already doing a bunch of system calls which makes it not performance sensitive. This does change the rte_pcapng_copy() and rte_pcapng_write_stats() experimental API's. Simplify the computation of nanoseconds from TSC to a two step process which avoids numeric overflow issues. The previous code was not thread safe as well. 
Fixes: c882eb544842 ("pcapng: fix timestamp wrapping in output files") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Morten Brørup <mb@smartsharesystems.com> --- app/dumpcap/main.c | 25 +++------ app/test/test_pcapng.c | 4 +- lib/graph/graph_pcap.c | 2 +- lib/pcapng/rte_pcapng.c | 119 +++++++++++++++------------------------- lib/pcapng/rte_pcapng.h | 19 ++----- lib/pdump/rte_pdump.c | 4 +- 6 files changed, 61 insertions(+), 112 deletions(-) diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c index 74c754e272c5..b5770875fab4 100644 --- a/app/dumpcap/main.c +++ b/app/dumpcap/main.c @@ -66,13 +66,13 @@ static bool print_stats; /* capture limit options */ static struct { - uint64_t duration; /* nanoseconds */ + time_t duration; /* seconds */ unsigned long packets; /* number of packets in file */ size_t size; /* file size (bytes) */ } stop; /* Running state */ -static uint64_t start_time, end_time; +static time_t start_time; static uint64_t packets_received; static size_t file_size; @@ -197,7 +197,7 @@ static void auto_stop(char *opt) if (*value == '\0' || *endp != '\0' || interval <= 0) rte_exit(EXIT_FAILURE, "Invalid duration \"%s\"\n", value); - stop.duration = NSEC_PER_SEC * interval; + stop.duration = interval; } else if (strcmp(opt, "filesize") == 0) { stop.size = get_uint(value, "filesize", 0) * 1024; } else if (strcmp(opt, "packets") == 0) { @@ -511,15 +511,6 @@ static void statistics_loop(void) } } -/* Return the time since 1/1/1970 in nanoseconds */ -static uint64_t create_timestamp(void) -{ - struct timespec now; - - clock_gettime(CLOCK_MONOTONIC, &now); - return rte_timespec_to_ns(&now); -} - static void cleanup_pdump_resources(void) { @@ -589,9 +580,8 @@ report_packet_stats(dumpcap_out_t out) ifdrop = pdump_stats.nombuf + pdump_stats.ringfull; if (use_pcapng) - rte_pcapng_write_stats(out.pcapng, intf->port, NULL, - start_time, end_time, - ifrecv, ifdrop); + rte_pcapng_write_stats(out.pcapng, intf->port, + ifrecv, ifdrop, NULL); if 
(ifrecv == 0) percent = 0; @@ -983,7 +973,7 @@ int main(int argc, char **argv) mp = create_mempool(); out = create_output(); - start_time = create_timestamp(); + start_time = time(NULL); enable_pdump(r, mp); if (!quiet) { @@ -1005,11 +995,10 @@ int main(int argc, char **argv) break; if (stop.duration != 0 && - create_timestamp() - start_time > stop.duration) + time(NULL) - start_time > stop.duration) break; } - end_time = create_timestamp(); disable_primary_monitor(); if (rte_eal_primary_proc_alive(NULL)) diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index b8429a02f160..55aa2cf93666 100644 --- a/app/test/test_pcapng.c +++ b/app/test/test_pcapng.c @@ -173,8 +173,8 @@ test_write_stats(void) ssize_t len; /* write a statistics block */ - len = rte_pcapng_write_stats(pcapng, port_id, - NULL, 0, 0, + len = rte_pcapng_write_stats(pcapng, port_id, NULL, + 0, 0, 0, NUM_PACKETS, 0); if (len <= 0) { fprintf(stderr, "Write of statistics failed\n"); diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c index db722c375fa7..89525f1220ca 100644 --- a/lib/graph/graph_pcap.c +++ b/lib/graph/graph_pcap.c @@ -214,7 +214,7 @@ graph_pcap_dispatch(struct rte_graph *graph, mbuf = (struct rte_mbuf *)objs[i]; mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len, - rte_get_tsc_cycles(), 0, buffer); + 0, buffer); if (mc == NULL) break; diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index 3c91fc77644a..13fd2b97fb80 100644 --- a/lib/pcapng/rte_pcapng.c +++ b/lib/pcapng/rte_pcapng.c @@ -36,22 +36,14 @@ /* Format of the capture file handle */ struct rte_pcapng { int outfd; /* output file */ - unsigned int ports; /* number of interfaces added */ + uint64_t offset_ns; /* ns since 1/1/1970 when initialized */ + uint64_t tsc_base; /* TSC when started */ /* DPDK port id to interface index in file */ uint32_t port_index[RTE_MAX_ETHPORTS]; }; -/* For converting TSC cycles to PCAPNG ns format */ -static struct pcapng_time { - uint64_t ns; - uint64_t 
cycles; - uint64_t tsc_hz; - struct rte_reciprocal_u64 tsc_hz_inverse; -} pcapng_time; - - #ifdef RTE_EXEC_ENV_WINDOWS /* * Windows does not have writev() call. @@ -102,56 +94,21 @@ static ssize_t writev(int fd, const struct iovec *iov, int iovcnt) #define if_indextoname(ifindex, ifname) NULL #endif -static inline void -pcapng_init(void) +/* Convert from TSC (CPU cycles) to nanoseconds */ +static uint64_t +pcapng_timestamp(const rte_pcapng_t *self, uint64_t cycles) { - struct timespec ts; + uint64_t delta, rem, secs, ns; + const uint64_t hz = rte_get_tsc_hz(); - pcapng_time.cycles = rte_get_tsc_cycles(); - clock_gettime(CLOCK_REALTIME, &ts); - pcapng_time.cycles = (pcapng_time.cycles + rte_get_tsc_cycles()) / 2; - pcapng_time.ns = rte_timespec_to_ns(&ts); - - pcapng_time.tsc_hz = rte_get_tsc_hz(); - pcapng_time.tsc_hz_inverse = rte_reciprocal_value_u64(pcapng_time.tsc_hz); -} + delta = cycles - self->tsc_base; -/* PCAPNG timestamps are in nanoseconds */ -static uint64_t pcapng_tsc_to_ns(uint64_t cycles) -{ - uint64_t delta, secs; - - if (!pcapng_time.tsc_hz) - pcapng_init(); - - /* In essence the calculation is: - * delta = (cycles - pcapng_time.cycles) * NSEC_PRE_SEC / rte_get_tsc_hz() - * but this overflows within 4 to 8 seconds depending on TSC frequency. - * Instead, if delta >= pcapng_time.tsc_hz: - * Increase pcapng_time.ns and pcapng_time.cycles by the number of - * whole seconds in delta and reduce delta accordingly. - * delta will therefore always lie in the interval [0, pcapng_time.tsc_hz), - * which will not overflow when multiplied by NSEC_PER_SEC provided the - * TSC frequency < approx 18.4GHz. - * - * Currently all TSCs operate below 5GHz. 
- */ - delta = cycles - pcapng_time.cycles; - if (unlikely(delta >= pcapng_time.tsc_hz)) { - if (likely(delta < pcapng_time.tsc_hz * 2)) { - delta -= pcapng_time.tsc_hz; - pcapng_time.cycles += pcapng_time.tsc_hz; - pcapng_time.ns += NSEC_PER_SEC; - } else { - secs = rte_reciprocal_divide_u64(delta, &pcapng_time.tsc_hz_inverse); - delta -= secs * pcapng_time.tsc_hz; - pcapng_time.cycles += secs * pcapng_time.tsc_hz; - pcapng_time.ns += secs * NSEC_PER_SEC; - } - } + /* Avoid numeric wraparound by computing seconds first */ + secs = delta / hz; + rem = delta % hz; + ns = (rem * NS_PER_S) / hz; - return pcapng_time.ns + rte_reciprocal_divide_u64(delta * NSEC_PER_SEC, - &pcapng_time.tsc_hz_inverse); + return secs * NS_PER_S + ns + self->offset_ns; } /* length of option including padding */ @@ -368,15 +325,15 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, */ ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, - const char *comment, - uint64_t start_time, uint64_t end_time, - uint64_t ifrecv, uint64_t ifdrop) + uint64_t ifrecv, uint64_t ifdrop, + const char *comment) { struct pcapng_statistics *hdr; struct pcapng_option *opt; + uint64_t start_time = self->offset_ns; + uint64_t sample_time; uint32_t optlen, len; uint8_t *buf; - uint64_t ns; RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); @@ -386,10 +343,10 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, optlen += pcapng_optlen(sizeof(ifrecv)); if (ifdrop != UINT64_MAX) optlen += pcapng_optlen(sizeof(ifdrop)); + if (start_time != 0) optlen += pcapng_optlen(sizeof(start_time)); - if (end_time != 0) - optlen += pcapng_optlen(sizeof(end_time)); + if (comment) optlen += pcapng_optlen(strlen(comment)); if (optlen != 0) @@ -409,9 +366,6 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, if (start_time != 0) opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME, &start_time, sizeof(start_time)); - if (end_time != 0) - opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME, - 
&end_time, sizeof(end_time)); if (ifrecv != UINT64_MAX) opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV, &ifrecv, sizeof(ifrecv)); @@ -425,9 +379,9 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, hdr->block_length = len; hdr->interface_id = self->port_index[port_id]; - ns = pcapng_tsc_to_ns(rte_get_tsc_cycles()); - hdr->timestamp_hi = ns >> 32; - hdr->timestamp_lo = (uint32_t)ns; + sample_time = pcapng_timestamp(self, rte_get_tsc_cycles()); + hdr->timestamp_hi = sample_time >> 32; + hdr->timestamp_lo = (uint32_t)sample_time; /* clone block_length after option */ memcpy(opt, &len, sizeof(uint32_t)); @@ -520,23 +474,21 @@ struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue, const struct rte_mbuf *md, struct rte_mempool *mp, - uint32_t length, uint64_t cycles, + uint32_t length, enum rte_pcapng_direction direction, const char *comment) { struct pcapng_enhance_packet_block *epb; uint32_t orig_len, data_len, padding, flags; struct pcapng_option *opt; + uint64_t timestamp; uint16_t optlen; struct rte_mbuf *mc; - uint64_t ns; bool rss_hash; #ifdef RTE_LIBRTE_ETHDEV_DEBUG RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL); #endif - ns = pcapng_tsc_to_ns(cycles); - orig_len = rte_pktmbuf_pkt_len(md); /* Take snapshot of the data */ @@ -641,8 +593,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue, /* Interface index is filled in later during write */ mc->port = port_id; - epb->timestamp_hi = ns >> 32; - epb->timestamp_lo = (uint32_t)ns; + /* Put timestamp in cycles here - adjust in packet write */ + timestamp = rte_get_tsc_cycles(); + epb->timestamp_hi = timestamp >> 32; + epb->timestamp_lo = (uint32_t)timestamp; epb->capture_length = data_len; epb->original_length = orig_len; @@ -668,6 +622,7 @@ rte_pcapng_write_packets(rte_pcapng_t *self, for (i = 0; i < nb_pkts; i++) { struct rte_mbuf *m = pkts[i]; struct pcapng_enhance_packet_block *epb; + uint64_t cycles, timestamp; /* sanity check that is really a pcapng mbuf */ epb = rte_pktmbuf_mtod(m, 
struct pcapng_enhance_packet_block *); @@ -684,6 +639,13 @@ rte_pcapng_write_packets(rte_pcapng_t *self, return -1; } + /* adjust timestamp recorded in packet */ + cycles = (uint64_t)epb->timestamp_hi << 32; + cycles += epb->timestamp_lo; + timestamp = pcapng_timestamp(self, cycles); + epb->timestamp_hi = timestamp >> 32; + epb->timestamp_lo = (uint32_t)timestamp; + /* * Handle case of highly fragmented and large burst size * Note: this assumes that max segments per mbuf < IOV_MAX @@ -725,6 +687,8 @@ rte_pcapng_fdopen(int fd, { unsigned int i; rte_pcapng_t *self; + struct timespec ts; + uint64_t cycles; self = malloc(sizeof(*self)); if (!self) { @@ -734,6 +698,13 @@ rte_pcapng_fdopen(int fd, self->outfd = fd; self->ports = 0; + + /* record start time in ns since 1/1/1970 */ + cycles = rte_get_tsc_cycles(); + clock_gettime(CLOCK_REALTIME, &ts); + self->tsc_base = (cycles + rte_get_tsc_cycles()) / 2; + self->offset_ns = rte_timespec_to_ns(&ts); + for (i = 0; i < RTE_MAX_ETHPORTS; i++) self->port_index[i] = UINT32_MAX; diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h index d93cc9f73ad5..c40795c721de 100644 --- a/lib/pcapng/rte_pcapng.h +++ b/lib/pcapng/rte_pcapng.h @@ -121,8 +121,6 @@ enum rte_pcapng_direction { * @param length * The upper limit on bytes to copy. Passing UINT32_MAX * means all data (after offset). - * @param timestamp - * The timestamp in TSC cycles. * @param direction * The direction of the packer: receive, transmit or unknown. * @param comment @@ -136,7 +134,7 @@ __rte_experimental struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue, const struct rte_mbuf *m, struct rte_mempool *mp, - uint32_t length, uint64_t timestamp, + uint32_t length, enum rte_pcapng_direction direction, const char *comment); @@ -188,29 +186,22 @@ rte_pcapng_write_packets(rte_pcapng_t *self, * The handle to the packet capture file * @param port * The Ethernet port to report stats on. - * @param comment - * Optional comment to add to statistics. 
- * @param start_time - * The time when packet capture was started in nanoseconds. - * Optional: can be zero if not known. - * @param end_time - * The time when packet capture was stopped in nanoseconds. - * Optional: can be zero if not finished; * @param ifrecv * The number of packets received by capture. * Optional: use UINT64_MAX if not known. * @param ifdrop * The number of packets missed by the capture process. * Optional: use UINT64_MAX if not known. + * @param comment + * Optional comment to add to statistics. * @return * number of bytes written to file, -1 on failure to write file */ __rte_experimental ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port, - const char *comment, - uint64_t start_time, uint64_t end_time, - uint64_t ifrecv, uint64_t ifdrop); + uint64_t ifrecv, uint64_t ifdrop, + const char *comment); #ifdef __cplusplus } diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index e94f49e21250..5a1ec14d7a18 100644 --- a/lib/pdump/rte_pdump.c +++ b/lib/pdump/rte_pdump.c @@ -90,7 +90,6 @@ pdump_copy(uint16_t port_id, uint16_t queue, int ring_enq; uint16_t d_pkts = 0; struct rte_mbuf *dup_bufs[nb_pkts]; - uint64_t ts; struct rte_ring *ring; struct rte_mempool *mp; struct rte_mbuf *p; @@ -99,7 +98,6 @@ pdump_copy(uint16_t port_id, uint16_t queue, if (cbs->filter) rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts); - ts = rte_get_tsc_cycles(); ring = cbs->ring; mp = cbs->mp; for (i = 0; i < nb_pkts; i++) { @@ -122,7 +120,7 @@ pdump_copy(uint16_t port_id, uint16_t queue, if (cbs->ver == V2) p = rte_pcapng_copy(port_id, queue, pkts[i], mp, cbs->snaplen, - ts, direction, NULL); + direction, NULL); else p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen); -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
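The conversion this patch introduces in pcapng_timestamp() can be illustrated in isolation. The following is a minimal sketch in plain C, not the library code: `struct time_base` and `cycles_to_ns` are hypothetical names, and the counter frequency is carried in the struct as an assumption instead of being read from rte_get_tsc_hz(). It shows why splitting the delta into whole seconds and a remainder keeps the multiplication by NS_PER_S from wrapping.

```c
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/* Hypothetical stand-in for the time base kept in the capture handle. */
struct time_base {
	uint64_t tsc_base;   /* cycle counter value when capture started */
	uint64_t offset_ns;  /* wall-clock ns since 1/1/1970 at that moment */
	uint64_t hz;         /* cycle counter frequency, assumed non-zero */
};

/* Convert a raw cycle count to nanoseconds since the epoch. */
static uint64_t
cycles_to_ns(const struct time_base *tb, uint64_t cycles)
{
	uint64_t delta = cycles - tb->tsc_base;
	uint64_t secs = delta / tb->hz;   /* whole seconds first */
	uint64_t rem = delta % tb->hz;    /* always less than hz */

	/* rem * NS_PER_S fits in 64 bits as long as hz is below ~18.4 GHz */
	return secs * NS_PER_S + (rem * NS_PER_S) / tb->hz + tb->offset_ns;
}
```

With a 2 GHz counter, a delta of 100 seconds multiplied directly by NS_PER_S would exceed 64 bits, while the split form above stays in range.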
* [PATCH v4 4/5] pcapng: avoid using alloca() 2023-11-09 17:34 ` [PATCH v4 0/5] dumpcap and pcapng fixes Stephen Hemminger ` (2 preceding siblings ...) 2023-11-09 17:34 ` [PATCH v4 3/5] pcapng: modify timestamp calculation Stephen Hemminger @ 2023-11-09 17:34 ` Stephen Hemminger 2023-11-09 17:34 ` [PATCH v4 5/5] test: cleanups to pcapng test Stephen Hemminger 4 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 17:34 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Morten Brørup, Reshma Pattan The function alloca(), like VLAs, has problems if the caller passes a large value. Instead use a fixed-size buffer (2K), which will be more than sufficient for the info-related blocks in the file. Add bounds checks as well. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Morten Brørup <mb@smartsharesystems.com> --- lib/pcapng/rte_pcapng.c | 37 ++++++++++++++++--------------------- 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index 13fd2b97fb80..f74ec939a9f8 100644 --- a/lib/pcapng/rte_pcapng.c +++ b/lib/pcapng/rte_pcapng.c @@ -33,6 +33,9 @@ /* conversion from DPDK speed to PCAPNG */ #define PCAPNG_MBPS_SPEED 1000000ull +/* upper bound for section, stats and interface blocks */ +#define PCAPNG_BLKSIZ 2048 + /* Format of the capture file handle */ struct rte_pcapng { int outfd; /* output file */ @@ -140,9 +143,8 @@ pcapng_section_block(rte_pcapng_t *self, { struct pcapng_section_header *hdr; struct pcapng_option *opt; - void *buf; + uint8_t buf[PCAPNG_BLKSIZ]; uint32_t len; - ssize_t cc; len = sizeof(*hdr); if (hw) @@ -158,8 +160,7 @@ pcapng_section_block(rte_pcapng_t *self, len += pcapng_optlen(0); len += sizeof(uint32_t); - buf = calloc(1, len); - if (!buf) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_section_header *)buf; @@ -193,10 +194,7 @@ pcapng_section_block(rte_pcapng_t *self, /* clone block_length after option */ memcpy(opt, 
&hdr->block_length, sizeof(uint32_t)); - cc = write(self->outfd, buf, len); - free(buf); - - return cc; + return write(self->outfd, buf, len); } /* Write an interface block for a DPDK port */ @@ -213,7 +211,7 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, struct pcapng_option *opt; const uint8_t tsresol = 9; /* nanosecond resolution */ uint32_t len; - void *buf; + uint8_t buf[PCAPNG_BLKSIZ]; char ifname_buf[IF_NAMESIZE]; char ifhw[256]; uint64_t speed = 0; @@ -267,8 +265,7 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, len += pcapng_optlen(0); len += sizeof(uint32_t); - buf = alloca(len); - if (!buf) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_interface_block *)buf; @@ -296,17 +293,16 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE, ifhw, strlen(ifhw)); if (filter) { - /* Encoding is that the first octet indicates string vs BPF */ size_t len; - char *buf; len = strlen(filter) + 1; - buf = alloca(len); - *buf = '\0'; - memcpy(buf + 1, filter, len); + opt->code = PCAPNG_IFB_FILTER; + opt->length = len; + /* Encoding is that the first octet indicates string vs BPF */ + opt->data[0] = 0; + memcpy(opt->data + 1, filter, strlen(filter)); - opt = pcapng_add_option(opt, PCAPNG_IFB_FILTER, - buf, len); + opt = (struct pcapng_option *)((uint8_t *)opt + pcapng_optlen(len)); } opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0); @@ -333,7 +329,7 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, uint64_t start_time = self->offset_ns; uint64_t sample_time; uint32_t optlen, len; - uint8_t *buf; + uint8_t buf[PCAPNG_BLKSIZ]; RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); @@ -353,8 +349,7 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, optlen += pcapng_optlen(0); len = sizeof(*hdr) + optlen + sizeof(uint32_t); - buf = alloca(len); - if (buf == NULL) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_statistics *)buf; -- 2.39.2 ^ 
permalink raw reply [flat|nested] 61+ messages in thread
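The pattern used in this patch (compute the required length, reject it if it exceeds a fixed on-stack buffer, then fill the buffer) can be sketched on its own. This is an illustration under stated assumptions, not the pcapng code: `build_block` and its toy layout of a 4-byte length followed by a comment are hypothetical, and only the 2048-byte bound is taken from the patch.

```c
#include <stdint.h>
#include <string.h>

#define BLKSIZ 2048  /* same upper bound the patch picks for info blocks */

/*
 * Hypothetical helper: write a 4-byte length header followed by an
 * optional comment into a caller-supplied fixed-size block.
 * Returns the number of bytes used, or -1 if it would not fit.
 */
static int
build_block(uint8_t out[BLKSIZ], const char *comment)
{
	size_t clen = comment ? strlen(comment) : 0;
	size_t len = sizeof(uint32_t) + clen;
	uint32_t hdr;

	/* bounds check replaces the unchecked alloca(len) */
	if (len > BLKSIZ)
		return -1;

	hdr = (uint32_t)len;
	memcpy(out, &hdr, sizeof(hdr));
	if (clen != 0)
		memcpy(out + sizeof(hdr), comment, clen);

	return (int)len;
}
```

An oversized caller-supplied string then yields an error return instead of an arbitrarily large stack allocation.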
* [PATCH v4 5/5] test: cleanups to pcapng test 2023-11-09 17:34 ` [PATCH v4 0/5] dumpcap and pcapng fixes Stephen Hemminger ` (3 preceding siblings ...) 2023-11-09 17:34 ` [PATCH v4 4/5] pcapng: avoid using alloca() Stephen Hemminger @ 2023-11-09 17:34 ` Stephen Hemminger 4 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 17:34 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Reshma Pattan Overhaul of the pcapng test: - promote it to be a fast test so it gets regularly run. - create null device and use it. - use UDP discard packets that are valid so that for debugging the resulting pcapng file can be looked at with wireshark. - do basic checks on resulting pcap file that lengths and timestamps are in range. - add test for interface options Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/test/meson.build | 2 +- app/test/test_pcapng.c | 418 +++++++++++++++++++++++++++-------------- 2 files changed, 282 insertions(+), 138 deletions(-) diff --git a/app/test/meson.build b/app/test/meson.build index 4183d66b0e9c..dcc93f4a43b4 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -128,7 +128,7 @@ source_file_deps = { 'test_metrics.c': ['metrics'], 'test_mp_secondary.c': ['hash', 'lpm'], 'test_net_ether.c': ['net'], - 'test_pcapng.c': ['ethdev', 'net', 'pcapng'], + 'test_pcapng.c': ['ethdev', 'net', 'pcapng', 'bus_vdev'], 'test_pdcp.c': ['eventdev', 'pdcp', 'net', 'timer', 'security'], 'test_pdump.c': ['pdump'] + sample_packet_forward_deps, 'test_per_lcore.c': [], diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index 55aa2cf93666..c973aa47d1f8 100644 --- a/app/test/test_pcapng.c +++ b/app/test/test_pcapng.c @@ -6,25 +6,34 @@ #include <stdlib.h> #include <unistd.h> +#include <rte_bus_vdev.h> #include <rte_ethdev.h> #include <rte_ether.h> +#include <rte_ip.h> #include <rte_mbuf.h> #include <rte_mempool.h> #include <rte_net.h> #include <rte_pcapng.h> +#include <rte_random.h> +#include 
<rte_reciprocal.h> +#include <rte_time.h> +#include <rte_udp.h> #include <pcap/pcap.h> #include "test.h" -#define NUM_PACKETS 10 -#define DUMMY_MBUF_NUM 3 +#define PCAPNG_TEST_DEBUG 0 + +#define TOTAL_PACKETS 4096 +#define MAX_BURST 64 +#define MAX_GAP_US 100000 +#define DUMMY_MBUF_NUM 3 -static rte_pcapng_t *pcapng; static struct rte_mempool *mp; static const uint32_t pkt_len = 200; static uint16_t port_id; -static char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; +static const char null_dev[] = "net_null0"; /* first mbuf in the packet, should always be at offset 0 */ struct dummy_mbuf { @@ -61,6 +70,7 @@ mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen) struct { struct rte_ether_hdr eth; struct rte_ipv4_hdr ip; + struct rte_udp_hdr udp; } pkt = { .eth = { .dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff", @@ -68,149 +78,201 @@ mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen) }, .ip = { .version_ihl = RTE_IPV4_VHL_DEF, - .total_length = rte_cpu_to_be_16(plen), - .time_to_live = IPDEFTTL, - .next_proto_id = IPPROTO_RAW, + .time_to_live = 1, + .next_proto_id = IPPROTO_UDP, .src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK), .dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST), - } + }, + .udp = { + .dst_port = rte_cpu_to_be_16(9), /* Discard port */ + }, }; memset(dm, 0, sizeof(*dm)); dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen); rte_eth_random_addr(pkt.eth.src_addr.addr_bytes); - memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, RTE_MIN(sizeof(pkt), plen)); + plen -= sizeof(struct rte_ether_hdr); + + pkt.ip.total_length = rte_cpu_to_be_16(plen); + pkt.ip.hdr_checksum = rte_ipv4_cksum(&pkt.ip); + + plen -= sizeof(struct rte_ipv4_hdr); + pkt.udp.src_port = rte_rand(); + pkt.udp.dgram_len = rte_cpu_to_be_16(plen); + + memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, sizeof(pkt)); } static int test_setup(void) { - int tmp_fd; - - port_id = rte_eth_find_next(0); - if (port_id >= RTE_MAX_ETHPORTS) { - fprintf(stderr, "No valid Ether port\n"); - return 
-1; - } + port_id = rte_eth_dev_count_avail(); - tmp_fd = mkstemps(file_name, strlen(".pcapng")); - if (tmp_fd == -1) { - perror("mkstemps() failure"); - return -1; - } - printf("pcapng: output file %s\n", file_name); - - /* open a test capture file */ - pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL); - if (pcapng == NULL) { - fprintf(stderr, "rte_pcapng_fdopen failed\n"); - close(tmp_fd); - return -1; - } - - /* Add interface to the file */ - if (rte_pcapng_add_interface(pcapng, port_id, - NULL, NULL, NULL) != 0) { - fprintf(stderr, "can not add port %u\n", port_id); - return -1; + /* Make a dummy null device to snoop on */ + if (rte_vdev_init(null_dev, NULL) != 0) { + fprintf(stderr, "Failed to create vdev '%s'\n", null_dev); + goto fail; } /* Make a pool for cloned packets */ - mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", IOV_MAX + NUM_PACKETS, - 0, 0, - rte_pcapng_mbuf_size(pkt_len), + mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", + MAX_BURST, 0, 0, + rte_pcapng_mbuf_size(pkt_len) + 128, SOCKET_ID_ANY, "ring_mp_sc"); if (mp == NULL) { fprintf(stderr, "Cannot create mempool\n"); - return -1; + goto fail; } + return 0; + +fail: + rte_vdev_uninit(null_dev); + rte_mempool_free(mp); + return -1; } static int -test_write_packets(void) +fill_pcapng_file(rte_pcapng_t *pcapng, unsigned int num_packets) { - struct rte_mbuf *orig; - struct rte_mbuf *clones[NUM_PACKETS] = { }; struct dummy_mbuf mbfs; - unsigned int i; + struct rte_mbuf *orig; + unsigned int burst_size; + unsigned int count; ssize_t len; /* make a dummy packet */ mbuf1_prepare(&mbfs, pkt_len); - - /* clone them */ orig = &mbfs.mb[0]; - for (i = 0; i < NUM_PACKETS; i++) { - struct rte_mbuf *mc; - mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - rte_get_tsc_cycles(), 0, NULL); - if (mc == NULL) { - fprintf(stderr, "Cannot copy packet\n"); + for (count = 0; count < num_packets; count += burst_size) { + struct rte_mbuf *clones[MAX_BURST]; + unsigned int i; + + /* 
put 1 .. MAX_BURST packets in one write call */ + burst_size = rte_rand_max(MAX_BURST) + 1; + for (i = 0; i < burst_size; i++) { + struct rte_mbuf *mc; + + mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, + RTE_PCAPNG_DIRECTION_IN, + NULL); + if (mc == NULL) { + fprintf(stderr, "Cannot copy packet\n"); + return -1; + } + clones[i] = mc; + } + + /* write it to capture file */ + len = rte_pcapng_write_packets(pcapng, clones, burst_size); + rte_pktmbuf_free_bulk(clones, burst_size); + + if (len <= 0) { + fprintf(stderr, "Write of packets failed: %s\n", + rte_strerror(rte_errno)); return -1; } - clones[i] = mc; + + /* Leave a small gap between packets to test for time wrap */ + usleep(rte_rand_max(MAX_GAP_US)); } - /* write it to capture file */ - len = rte_pcapng_write_packets(pcapng, clones, NUM_PACKETS); + return count; +} - rte_pktmbuf_free_bulk(clones, NUM_PACKETS); +static char * +fmt_time(char *buf, size_t size, uint64_t ts_ns) +{ + time_t sec; + size_t len; - if (len <= 0) { - fprintf(stderr, "Write of packets failed\n"); - return -1; - } + sec = ts_ns / NS_PER_S; + len = strftime(buf, size, "%X", localtime(&sec)); + snprintf(buf + len, size - len, ".%09lu", + (unsigned long)(ts_ns % NS_PER_S)); - return 0; + return buf; } -static int -test_write_stats(void) +/* Context for the pcap_loop callback */ +struct pkt_print_ctx { + pcap_t *pcap; + unsigned int count; + uint64_t start_ns; + uint64_t end_ns; +}; + +static void +print_packet(uint64_t ts_ns, const struct rte_ether_hdr *eh, size_t len) { - ssize_t len; + char tbuf[128], src[64], dst[64]; - /* write a statistics block */ - len = rte_pcapng_write_stats(pcapng, port_id, NULL, - 0, 0, 0, - NUM_PACKETS, 0); - if (len <= 0) { - fprintf(stderr, "Write of statistics failed\n"); - return -1; - } - return 0; + fmt_time(tbuf, sizeof(tbuf), ts_ns); + rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr); + rte_ether_format_addr(src, sizeof(src), &eh->src_addr); + printf("%s: %s -> %s type %x length %zu\n", + tbuf, 
src, dst, rte_be_to_cpu_16(eh->ether_type), len); } +/* Callback from pcap_loop used to validate packets in the file */ static void -pkt_print(u_char *user, const struct pcap_pkthdr *h, - const u_char *bytes) +parse_pcap_packet(u_char *user, const struct pcap_pkthdr *h, + const u_char *bytes) { - unsigned int *countp = (unsigned int *)user; + struct pkt_print_ctx *ctx = (struct pkt_print_ctx *)user; const struct rte_ether_hdr *eh; - struct tm *tm; - char tbuf[128], src[64], dst[64]; + const struct rte_ipv4_hdr *ip; + uint64_t ns; - tm = localtime(&h->ts.tv_sec); - if (tm == NULL) { - perror("localtime"); - return; + eh = (const struct rte_ether_hdr *)bytes; + ip = (const struct rte_ipv4_hdr *)(eh + 1); + + ctx->count += 1; + + /* The pcap library is misleading in reporting timestamp. + * packet header struct gives timestamp as a timeval (ie. usec); + * but the file is open in nanonsecond mode therefore + * the timestamp is really in timespec (ie. nanoseconds). + */ + ns = h->ts.tv_sec * NS_PER_S + h->ts.tv_usec; + if (ns < ctx->start_ns || ns > ctx->end_ns) { + char tstart[128], tend[128]; + + fmt_time(tstart, sizeof(tstart), ctx->start_ns); + fmt_time(tend, sizeof(tend), ctx->end_ns); + fprintf(stderr, "Timestamp out of range [%s .. 
%s]\n", + tstart, tend); + goto error; } - if (strftime(tbuf, sizeof(tbuf), "%X", tm) == 0) { - fprintf(stderr, "strftime returned 0!\n"); - return; + if (!rte_is_broadcast_ether_addr(&eh->dst_addr)) { + fprintf(stderr, "Destination is not broadcast\n"); + goto error; } - eh = (const struct rte_ether_hdr *)bytes; - rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr); - rte_ether_format_addr(src, sizeof(src), &eh->src_addr); - printf("%s.%06lu: %s -> %s type %x length %u\n", - tbuf, (unsigned long)h->ts.tv_usec, - src, dst, rte_be_to_cpu_16(eh->ether_type), h->len); + if (rte_ipv4_cksum(ip) != 0) { + fprintf(stderr, "Bad IPv4 checksum\n"); + goto error; + } + + return; /* packet is normal */ + +error: + print_packet(ns, eh, h->len); + + /* Stop parsing at first error */ + pcap_breakloop(ctx->pcap); +} - *countp += 1; +static uint64_t +current_timestamp(void) +{ + struct timespec ts; + + clock_gettime(CLOCK_REALTIME, &ts); + return rte_timespec_to_ns(&ts); } /* @@ -219,78 +281,162 @@ pkt_print(u_char *user, const struct pcap_pkthdr *h, * but that creates an unwanted dependency. 
*/ static int -test_validate(void) +valid_pcapng_file(const char *file_name, uint64_t started, unsigned int expected) { char errbuf[PCAP_ERRBUF_SIZE]; - unsigned int count = 0; - pcap_t *pcap; + struct pkt_print_ctx ctx = { }; int ret; - pcap = pcap_open_offline(file_name, errbuf); - if (pcap == NULL) { + ctx.start_ns = started; + ctx.end_ns = current_timestamp(); + + ctx.pcap = pcap_open_offline_with_tstamp_precision(file_name, + PCAP_TSTAMP_PRECISION_NANO, + errbuf); + if (ctx.pcap == NULL) { fprintf(stderr, "pcap_open_offline('%s') failed: %s\n", file_name, errbuf); return -1; } - ret = pcap_loop(pcap, 0, pkt_print, (u_char *)&count); - if (ret == 0) - printf("Saw %u packets\n", count); - else + ret = pcap_loop(ctx.pcap, 0, parse_pcap_packet, (u_char *)&ctx); + if (ret != 0) { fprintf(stderr, "pcap_dispatch: failed: %s\n", - pcap_geterr(pcap)); - pcap_close(pcap); + pcap_geterr(ctx.pcap)); + } else if (ctx.count != expected) { + printf("Only %u packets, expected %u\n", + ctx.count, expected); + ret = -1; + } + + pcap_close(ctx.pcap); return ret; } static int -test_write_over_limit_iov_max(void) +test_add_interface(void) { - struct rte_mbuf *orig; - struct rte_mbuf *clones[IOV_MAX + NUM_PACKETS] = { }; - struct dummy_mbuf mbfs; - unsigned int i; - ssize_t len; + char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; + static rte_pcapng_t *pcapng; + int ret, tmp_fd; + uint64_t now = current_timestamp(); - /* make a dummy packet */ - mbuf1_prepare(&mbfs, pkt_len); + tmp_fd = mkstemps(file_name, strlen(".pcapng")); + if (tmp_fd == -1) { + perror("mkstemps() failure"); + goto fail; + } + printf("pcapng: output file %s\n", file_name); - /* clone them */ - orig = &mbfs.mb[0]; - for (i = 0; i < IOV_MAX + NUM_PACKETS; i++) { - struct rte_mbuf *mc; + /* open a test capture file */ + pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_addif", NULL); + if (pcapng == NULL) { + fprintf(stderr, "rte_pcapng_fdopen failed\n"); + close(tmp_fd); + goto fail; + } - mc = 
rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - rte_get_tsc_cycles(), 0, NULL); - if (mc == NULL) { - fprintf(stderr, "Cannot copy packet\n"); - return -1; - } - clones[i] = mc; + /* Add interface to the file */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, NULL); + if (ret < 0) { + fprintf(stderr, "can not add port %u\n", port_id); + goto fail; } - /* write it to capture file */ - len = rte_pcapng_write_packets(pcapng, clones, IOV_MAX + NUM_PACKETS); + /* Add interface with ifname and ifdescr */ + ret = rte_pcapng_add_interface(pcapng, port_id, + "myeth", "Some long description", NULL); + if (ret < 0) { + fprintf(stderr, "can not add port %u with ifname\n", port_id); + goto fail; + } + + /* Add interface with filter */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, "tcp port 8080"); + if (ret < 0) { + fprintf(stderr, "can not add port %u with filter\n", port_id); + goto fail; + } - rte_pktmbuf_free_bulk(clones, IOV_MAX + NUM_PACKETS); + rte_pcapng_close(pcapng); - if (len <= 0) { - fprintf(stderr, "Write of packets failed\n"); - return -1; + ret = valid_pcapng_file(file_name, now, 0); + /* if test fails want to investigate the file */ + if (ret == 0) + unlink(file_name); + + return ret; + +fail: + rte_pcapng_close(pcapng); + return -1; +} + +static int +test_write_packets(void) +{ + char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; + static rte_pcapng_t *pcapng; + int ret, tmp_fd, count; + uint64_t now = current_timestamp(); + + tmp_fd = mkstemps(file_name, strlen(".pcapng")); + if (tmp_fd == -1) { + perror("mkstemps() failure"); + goto fail; } + printf("pcapng: output file %s\n", file_name); - return 0; + /* open a test capture file */ + pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL); + if (pcapng == NULL) { + fprintf(stderr, "rte_pcapng_fdopen failed\n"); + close(tmp_fd); + goto fail; + } + + /* Add interface to the file */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, NULL); + if (ret 
< 0) { + fprintf(stderr, "can not add port %u\n", port_id); + goto fail; + } + + count = fill_pcapng_file(pcapng, TOTAL_PACKETS); + if (count < 0) + goto fail; + + /* write a statistics block */ + ret = rte_pcapng_write_stats(pcapng, port_id, + count, 0, "end of test"); + if (ret <= 0) { + fprintf(stderr, "Write of statistics failed\n"); + goto fail; + } + + rte_pcapng_close(pcapng); + + ret = valid_pcapng_file(file_name, now, count); + /* if test fails want to investigate the file */ + if (ret == 0) + unlink(file_name); + + return ret; + +fail: + rte_pcapng_close(pcapng); + return -1; } static void test_cleanup(void) { rte_mempool_free(mp); - - if (pcapng) - rte_pcapng_close(pcapng); - + rte_vdev_uninit(null_dev); } static struct @@ -299,10 +445,8 @@ unit_test_suite test_pcapng_suite = { .teardown = test_cleanup, .suite_name = "Test Pcapng Unit Test Suite", .unit_test_cases = { + TEST_CASE(test_add_interface), TEST_CASE(test_write_packets), - TEST_CASE(test_write_stats), - TEST_CASE(test_validate), - TEST_CASE(test_write_over_limit_iov_max), TEST_CASES_END() } }; @@ -313,4 +457,4 @@ test_pcapng(void) return unit_test_suite_runner(&test_pcapng_suite); } -REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng); +REGISTER_FAST_TEST(pcapng_autotest, true, true, test_pcapng); -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
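The timestamp range check in the reworked test relies on the detail noted in its comment: when a capture file is opened with nanosecond precision, the tv_usec field of the packet header holds nanoseconds. A minimal sketch of that check in plain C follows. The helper names `pcap_ts_to_ns` and `ts_in_range` are hypothetical and the sketch uses a bare struct timeval instead of a real pcap_pkthdr, so it does not depend on libpcap.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>

#define NS_PER_S 1000000000ULL

/*
 * Assumes the file was opened with nanosecond timestamp precision,
 * in which case tv_usec carries nanoseconds, not microseconds.
 */
static uint64_t
pcap_ts_to_ns(const struct timeval *ts)
{
	return (uint64_t)ts->tv_sec * NS_PER_S + (uint64_t)ts->tv_usec;
}

/* True if the packet time lies within the window the test ran in. */
static bool
ts_in_range(uint64_t ns, uint64_t start_ns, uint64_t end_ns)
{
	return ns >= start_ns && ns <= end_ns;
}
```

In the test itself the window is bounded by a CLOCK_REALTIME reading taken before the file is created and another taken when validation starts.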
* [PATCH v5 0/5] dumpcap and pcapng fixes 2023-09-21 4:23 [PATCH 0/4] pcapng fixes Stephen Hemminger ` (6 preceding siblings ...) 2023-11-09 17:34 ` [PATCH v4 0/5] dumpcap and pcapng fixes Stephen Hemminger @ 2023-11-09 19:45 ` Stephen Hemminger 2023-11-09 19:45 ` [PATCH v5 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger ` (4 more replies) 2023-11-13 16:15 ` [PATCH v6 0/5] dumpcap and pcapng fixes Stephen Hemminger 2023-11-17 16:35 ` [PATCH v7 0/5] dumpcap and pcapng fixes Stephen Hemminger 9 siblings, 5 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 19:45 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This series has bugfixes and tests for dumpcap and pcapng. It should be in 23.11! It fixes issues related to timestamping. The design choice is to maximize performance in the primary process and do all the time adjustment in the secondary (dumpcap), since dumpcap needs to make system calls anyway to write the result. This patch set moves the adjustment calculation into the pcapng portion that opens the output file. All details of the timestamp format are contained inside pcapng (data hiding). v5 - fix format of getpid in capture name v4 - incorporate review feedback v3 - don't use alloca() since it can have VLA-type issues Stephen Hemminger (5): pdump: fix setting rte_errno on mp error dumpcap: allow multiple invocations pcapng: modify timestamp calculation pcapng: avoid using alloca() test: cleanups to pcapng test app/dumpcap/main.c | 53 ++--- app/test/meson.build | 2 +- app/test/test_pcapng.c | 418 +++++++++++++++++++++++++++------------- lib/graph/graph_pcap.c | 2 +- lib/pcapng/rte_pcapng.c | 156 ++++++--------- lib/pcapng/rte_pcapng.h | 19 +- lib/pdump/rte_pdump.c | 9 +- 7 files changed, 374 insertions(+), 285 deletions(-) -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
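The split described in the cover letter (cheap stamping in the primary, adjustment later in the writer) comes down to storing the raw cycle counter in the packet header's hi/lo fields and reading it back before conversion. The following is a minimal sketch of that round trip, assuming a hypothetical `struct pkt_hdr` with the same two 32-bit timestamp fields; `stamp_cycles` and `load_stamp` are illustrative names, not functions from the series.

```c
#include <stdint.h>

/* Hypothetical packet header with a 64-bit timestamp split in two. */
struct pkt_hdr {
	uint32_t timestamp_hi;
	uint32_t timestamp_lo;
};

/* Fast path (primary): store the raw cycle count, no arithmetic. */
static void
stamp_cycles(struct pkt_hdr *h, uint64_t cycles)
{
	h->timestamp_hi = (uint32_t)(cycles >> 32);
	h->timestamp_lo = (uint32_t)cycles;
}

/* Slow path (writer): recover the cycle count before converting it. */
static uint64_t
load_stamp(const struct pkt_hdr *h)
{
	return ((uint64_t)h->timestamp_hi << 32) + h->timestamp_lo;
}
```

The writer would then convert the recovered value to nanoseconds and overwrite the same two fields, so the file only ever contains adjusted timestamps.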
* [PATCH v5 1/5] pdump: fix setting rte_errno on mp error 2023-11-09 19:45 ` [PATCH v5 0/5] dumpcap and pcapng fixes Stephen Hemminger @ 2023-11-09 19:45 ` Stephen Hemminger 2023-11-09 19:45 ` [PATCH v5 2/5] dumpcap: allow multiple invocations Stephen Hemminger ` (3 subsequent siblings) 4 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 19:45 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Morten Brørup, Reshma Pattan, Jianfeng Tan The response from MP server sets err_value to negative on error. The convention for rte_errno is to use a positive value on error. This makes errors like duplicate registration show up with the correct error value. Fixes: 660098d61f57 ("pdump: use generic multi-process channel") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Morten Brørup <mb@smartsharesystems.com> --- lib/pdump/rte_pdump.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index 80b90c6f7d03..e94f49e21250 100644 --- a/lib/pdump/rte_pdump.c +++ b/lib/pdump/rte_pdump.c @@ -564,9 +564,10 @@ pdump_prepare_client_request(const char *device, uint16_t queue, if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) { mp_rep = &mp_reply.msgs[0]; resp = (struct pdump_response *)mp_rep->param; - rte_errno = resp->err_value; - if (!resp->err_value) + if (resp->err_value == 0) ret = 0; + else + rte_errno = -resp->err_value; free(mp_reply.msgs); } -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
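Editorial sketch (not part of the patch): the sign convention described in this commit message could be reduced to the following. The type demo_response, the variable demo_errno and the function demo_handle_response are hypothetical stand-ins so the example builds without DPDK; in the patch the corresponding pieces are struct pdump_response and rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the pdump response; only err_value matters. */
struct demo_response {
	int err_value;	/* 0 on success, negative errno value on failure */
};

/* Stand-in for rte_errno so the sketch does not need DPDK headers. */
static int demo_errno;

/* Returns 0 on success, -1 on failure with demo_errno set to a positive code. */
static int
demo_handle_response(const struct demo_response *resp)
{
	assert(resp != NULL);

	if (resp->err_value == 0)
		return 0;

	/* The server side reports e.g. -EEXIST; errno-style values are positive. */
	demo_errno = -resp->err_value;
	return -1;
}
```

With the sign flipped, a caller can pass the stored value straight to a strerror-style function, which is what lets a duplicate registration print as "File exists".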
* [PATCH v5 2/5] dumpcap: allow multiple invocations 2023-11-09 19:45 ` [PATCH v5 0/5] dumpcap and pcapng fixes Stephen Hemminger 2023-11-09 19:45 ` [PATCH v5 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger @ 2023-11-09 19:45 ` Stephen Hemminger 2023-11-09 20:09 ` Morten Brørup 2023-11-09 19:45 ` [PATCH v5 3/5] pcapng: modify timestamp calculation Stephen Hemminger ` (2 subsequent siblings) 4 siblings, 1 reply; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 19:45 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Isaac Boukris, Reshma Pattan If dumpcap is run twice with each instance pointing at a different interface, it would fail because of overlap in ring and pool names. Fix by putting process id in the name. It is still not allowed to do multiple invocations on the same interface because only one callback is allowed and only one copy of mbuf is done. Dumpcap will fail with error in this case: pdump_prepare_client_request(): client request for pdump enable/disable failed EAL: Error - exiting with code: 1 Cause: Packet dump enable on 0:net_null0 failed File exists Fixes: cbb44143be74 ("app/dumpcap: add new packet capture application") Reported-by: Isaac Boukris <iboukris@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/dumpcap/main.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c index 64294bbfb3e6..efc60372d718 100644 --- a/app/dumpcap/main.c +++ b/app/dumpcap/main.c @@ -44,7 +44,6 @@ #include <pcap/pcap.h> #include <pcap/bpf.h> -#define RING_NAME "capture-ring" #define MONITOR_INTERVAL (500 * 1000) #define MBUF_POOL_CACHE_SIZE 32 #define BURST_SIZE 32 @@ -647,6 +646,7 @@ static void dpdk_init(void) static struct rte_ring *create_ring(void) { struct rte_ring *ring; + char ring_name[RTE_RING_NAMESIZE]; size_t size, log2; /* Find next power of 2 >= size. 
*/ @@ -660,28 +660,28 @@ static struct rte_ring *create_ring(void) ring_size = size; } - ring = rte_ring_lookup(RING_NAME); - if (ring == NULL) { - ring = rte_ring_create(RING_NAME, ring_size, - rte_socket_id(), 0); - if (ring == NULL) - rte_exit(EXIT_FAILURE, "Could not create ring :%s\n", - rte_strerror(rte_errno)); - } + /* Want one ring per invocation of program */ + snprintf(ring_name, sizeof(ring_name), + "dumpcap-%d", getpid()); + + ring = rte_ring_create(ring_name, ring_size, + rte_socket_id(), 0); + if (ring == NULL) + rte_exit(EXIT_FAILURE, "Could not create ring :%s\n", + rte_strerror(rte_errno)); + return ring; } static struct rte_mempool *create_mempool(void) { const struct interface *intf; - static const char pool_name[] = "capture_mbufs"; + char pool_name[RTE_MEMPOOL_NAMESIZE]; size_t num_mbufs = 2 * ring_size; struct rte_mempool *mp; uint32_t data_size = 128; - mp = rte_mempool_lookup(pool_name); - if (mp) - return mp; + snprintf(pool_name, sizeof(pool_name), "capture_%d", getpid()); /* Common pool so size mbuf for biggest snap length */ TAILQ_FOREACH(intf, &interfaces, next) { @@ -826,7 +826,7 @@ static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp) rte_exit(EXIT_FAILURE, "Packet dump enable on %u:%s failed %s\n", intf->port, intf->name, - rte_strerror(-ret)); + rte_strerror(rte_errno)); } if (intf->opts.promisc_mode) { -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
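Editorial sketch (not part of the patch): the per-invocation naming idea, one ring and one pool name per process id, might look like this in isolation. DEMO_NAMESIZE and demo_make_name are hypothetical; DPDK has its own RTE_RING_NAMESIZE and RTE_MEMPOOL_NAMESIZE limits, and the truncation check here is an addition for illustration, not something the patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Assumed limit for illustration only; DPDK defines its own name sizes. */
#define DEMO_NAMESIZE 32

/* Format "<prefix>-<pid>" into buf; returns 0 on success, -1 if truncated. */
static int
demo_make_name(char *buf, size_t size, const char *prefix, pid_t pid)
{
	int n;

	assert(buf != NULL && prefix != NULL);

	/* pid_t has no portable printf format, so widen it to long */
	n = snprintf(buf, size, "%s-%ld", prefix, (long)pid);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

A caller would pass getpid() as the last argument. Two processes alive at the same time have different pids, so their names cannot collide, which is the property the commit message relies on.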
* RE: [PATCH v5 2/5] dumpcap: allow multiple invocations 2023-11-09 19:45 ` [PATCH v5 2/5] dumpcap: allow multiple invocations Stephen Hemminger @ 2023-11-09 20:09 ` Morten Brørup 0 siblings, 0 replies; 61+ messages in thread From: Morten Brørup @ 2023-11-09 20:09 UTC (permalink / raw) To: Stephen Hemminger, dev; +Cc: Isaac Boukris, Reshma Pattan > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > Sent: Thursday, 9 November 2023 20.46 > > If dumpcap is run twice with each instance pointing at a different > interface, it would fail because of overlap in ring and pool names. > Fix by putting process id in the name. > > It is still not allowed to do multiple invocations on the same > interface because only one callback is allowed and only one copy > of mbuf is done. Dumpcap will fail with error in this case: > > pdump_prepare_client_request(): client request for pdump > enable/disable failed > EAL: Error - exiting with code: 1 > Cause: Packet dump enable on 0:net_null0 failed File exists > > Fixes: cbb44143be74 ("app/dumpcap: add new packet capture application") > Reported-by: Isaac Boukris <iboukris@gmail.com> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > --- Reviewed-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v5 3/5] pcapng: modify timestamp calculation 2023-11-09 19:45 ` [PATCH v5 0/5] dumpcap and pcapng fixes Stephen Hemminger 2023-11-09 19:45 ` [PATCH v5 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger 2023-11-09 19:45 ` [PATCH v5 2/5] dumpcap: allow multiple invocations Stephen Hemminger @ 2023-11-09 19:45 ` Stephen Hemminger 2023-11-12 14:22 ` Thomas Monjalon 2023-11-09 19:45 ` [PATCH v5 4/5] pcapng: avoid using alloca() Stephen Hemminger 2023-11-09 19:45 ` [PATCH v5 5/5] test: cleanups to pcapng test Stephen Hemminger 4 siblings, 1 reply; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 19:45 UTC (permalink / raw) To: dev Cc: Stephen Hemminger, Morten Brørup, Reshma Pattan, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan, Quentin Armitage The computation of timestamp is best done in the part of pcapng library that is in secondary process. The secondary process is already doing a bunch of system calls which makes it not performance sensitive. This does change the rte_pcapng_copy() and rte_pcapng_write_stats() experimental API's. Simplify the computation of nanoseconds from TSC to a two step process which avoids numeric overflow issues. The previous code was not thread safe as well. 
Fixes: c882eb544842 ("pcapng: fix timestamp wrapping in output files") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Morten Brørup <mb@smartsharesystems.com> --- app/dumpcap/main.c | 25 +++------ app/test/test_pcapng.c | 4 +- lib/graph/graph_pcap.c | 2 +- lib/pcapng/rte_pcapng.c | 119 +++++++++++++++------------------------- lib/pcapng/rte_pcapng.h | 19 ++----- lib/pdump/rte_pdump.c | 4 +- 6 files changed, 61 insertions(+), 112 deletions(-) diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c index efc60372d718..583bce80166c 100644 --- a/app/dumpcap/main.c +++ b/app/dumpcap/main.c @@ -66,13 +66,13 @@ static bool print_stats; /* capture limit options */ static struct { - uint64_t duration; /* nanoseconds */ + time_t duration; /* seconds */ unsigned long packets; /* number of packets in file */ size_t size; /* file size (bytes) */ } stop; /* Running state */ -static uint64_t start_time, end_time; +static time_t start_time; static uint64_t packets_received; static size_t file_size; @@ -197,7 +197,7 @@ static void auto_stop(char *opt) if (*value == '\0' || *endp != '\0' || interval <= 0) rte_exit(EXIT_FAILURE, "Invalid duration \"%s\"\n", value); - stop.duration = NSEC_PER_SEC * interval; + stop.duration = interval; } else if (strcmp(opt, "filesize") == 0) { stop.size = get_uint(value, "filesize", 0) * 1024; } else if (strcmp(opt, "packets") == 0) { @@ -511,15 +511,6 @@ static void statistics_loop(void) } } -/* Return the time since 1/1/1970 in nanoseconds */ -static uint64_t create_timestamp(void) -{ - struct timespec now; - - clock_gettime(CLOCK_MONOTONIC, &now); - return rte_timespec_to_ns(&now); -} - static void cleanup_pdump_resources(void) { @@ -589,9 +580,8 @@ report_packet_stats(dumpcap_out_t out) ifdrop = pdump_stats.nombuf + pdump_stats.ringfull; if (use_pcapng) - rte_pcapng_write_stats(out.pcapng, intf->port, NULL, - start_time, end_time, - ifrecv, ifdrop); + rte_pcapng_write_stats(out.pcapng, intf->port, + ifrecv, ifdrop, NULL); if 
(ifrecv == 0) percent = 0; @@ -983,7 +973,7 @@ int main(int argc, char **argv) mp = create_mempool(); out = create_output(); - start_time = create_timestamp(); + start_time = time(NULL); enable_pdump(r, mp); if (!quiet) { @@ -1005,11 +995,10 @@ int main(int argc, char **argv) break; if (stop.duration != 0 && - create_timestamp() - start_time > stop.duration) + time(NULL) - start_time > stop.duration) break; } - end_time = create_timestamp(); disable_primary_monitor(); if (rte_eal_primary_proc_alive(NULL)) diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index b8429a02f160..55aa2cf93666 100644 --- a/app/test/test_pcapng.c +++ b/app/test/test_pcapng.c @@ -173,8 +173,8 @@ test_write_stats(void) ssize_t len; /* write a statistics block */ - len = rte_pcapng_write_stats(pcapng, port_id, - NULL, 0, 0, + len = rte_pcapng_write_stats(pcapng, port_id, NULL, + 0, 0, 0, NUM_PACKETS, 0); if (len <= 0) { fprintf(stderr, "Write of statistics failed\n"); diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c index db722c375fa7..89525f1220ca 100644 --- a/lib/graph/graph_pcap.c +++ b/lib/graph/graph_pcap.c @@ -214,7 +214,7 @@ graph_pcap_dispatch(struct rte_graph *graph, mbuf = (struct rte_mbuf *)objs[i]; mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len, - rte_get_tsc_cycles(), 0, buffer); + 0, buffer); if (mc == NULL) break; diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index 3c91fc77644a..13fd2b97fb80 100644 --- a/lib/pcapng/rte_pcapng.c +++ b/lib/pcapng/rte_pcapng.c @@ -36,22 +36,14 @@ /* Format of the capture file handle */ struct rte_pcapng { int outfd; /* output file */ - unsigned int ports; /* number of interfaces added */ + uint64_t offset_ns; /* ns since 1/1/1970 when initialized */ + uint64_t tsc_base; /* TSC when started */ /* DPDK port id to interface index in file */ uint32_t port_index[RTE_MAX_ETHPORTS]; }; -/* For converting TSC cycles to PCAPNG ns format */ -static struct pcapng_time { - uint64_t ns; - uint64_t 
cycles; - uint64_t tsc_hz; - struct rte_reciprocal_u64 tsc_hz_inverse; -} pcapng_time; - - #ifdef RTE_EXEC_ENV_WINDOWS /* * Windows does not have writev() call. @@ -102,56 +94,21 @@ static ssize_t writev(int fd, const struct iovec *iov, int iovcnt) #define if_indextoname(ifindex, ifname) NULL #endif -static inline void -pcapng_init(void) +/* Convert from TSC (CPU cycles) to nanoseconds */ +static uint64_t +pcapng_timestamp(const rte_pcapng_t *self, uint64_t cycles) { - struct timespec ts; + uint64_t delta, rem, secs, ns; + const uint64_t hz = rte_get_tsc_hz(); - pcapng_time.cycles = rte_get_tsc_cycles(); - clock_gettime(CLOCK_REALTIME, &ts); - pcapng_time.cycles = (pcapng_time.cycles + rte_get_tsc_cycles()) / 2; - pcapng_time.ns = rte_timespec_to_ns(&ts); - - pcapng_time.tsc_hz = rte_get_tsc_hz(); - pcapng_time.tsc_hz_inverse = rte_reciprocal_value_u64(pcapng_time.tsc_hz); -} + delta = cycles - self->tsc_base; -/* PCAPNG timestamps are in nanoseconds */ -static uint64_t pcapng_tsc_to_ns(uint64_t cycles) -{ - uint64_t delta, secs; - - if (!pcapng_time.tsc_hz) - pcapng_init(); - - /* In essence the calculation is: - * delta = (cycles - pcapng_time.cycles) * NSEC_PRE_SEC / rte_get_tsc_hz() - * but this overflows within 4 to 8 seconds depending on TSC frequency. - * Instead, if delta >= pcapng_time.tsc_hz: - * Increase pcapng_time.ns and pcapng_time.cycles by the number of - * whole seconds in delta and reduce delta accordingly. - * delta will therefore always lie in the interval [0, pcapng_time.tsc_hz), - * which will not overflow when multiplied by NSEC_PER_SEC provided the - * TSC frequency < approx 18.4GHz. - * - * Currently all TSCs operate below 5GHz. 
- */ - delta = cycles - pcapng_time.cycles; - if (unlikely(delta >= pcapng_time.tsc_hz)) { - if (likely(delta < pcapng_time.tsc_hz * 2)) { - delta -= pcapng_time.tsc_hz; - pcapng_time.cycles += pcapng_time.tsc_hz; - pcapng_time.ns += NSEC_PER_SEC; - } else { - secs = rte_reciprocal_divide_u64(delta, &pcapng_time.tsc_hz_inverse); - delta -= secs * pcapng_time.tsc_hz; - pcapng_time.cycles += secs * pcapng_time.tsc_hz; - pcapng_time.ns += secs * NSEC_PER_SEC; - } - } + /* Avoid numeric wraparound by computing seconds first */ + secs = delta / hz; + rem = delta % hz; + ns = (rem * NS_PER_S) / hz; - return pcapng_time.ns + rte_reciprocal_divide_u64(delta * NSEC_PER_SEC, - &pcapng_time.tsc_hz_inverse); + return secs * NS_PER_S + ns + self->offset_ns; } /* length of option including padding */ @@ -368,15 +325,15 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, */ ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, - const char *comment, - uint64_t start_time, uint64_t end_time, - uint64_t ifrecv, uint64_t ifdrop) + uint64_t ifrecv, uint64_t ifdrop, + const char *comment) { struct pcapng_statistics *hdr; struct pcapng_option *opt; + uint64_t start_time = self->offset_ns; + uint64_t sample_time; uint32_t optlen, len; uint8_t *buf; - uint64_t ns; RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); @@ -386,10 +343,10 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, optlen += pcapng_optlen(sizeof(ifrecv)); if (ifdrop != UINT64_MAX) optlen += pcapng_optlen(sizeof(ifdrop)); + if (start_time != 0) optlen += pcapng_optlen(sizeof(start_time)); - if (end_time != 0) - optlen += pcapng_optlen(sizeof(end_time)); + if (comment) optlen += pcapng_optlen(strlen(comment)); if (optlen != 0) @@ -409,9 +366,6 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, if (start_time != 0) opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME, &start_time, sizeof(start_time)); - if (end_time != 0) - opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME, - 
&end_time, sizeof(end_time)); if (ifrecv != UINT64_MAX) opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV, &ifrecv, sizeof(ifrecv)); @@ -425,9 +379,9 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, hdr->block_length = len; hdr->interface_id = self->port_index[port_id]; - ns = pcapng_tsc_to_ns(rte_get_tsc_cycles()); - hdr->timestamp_hi = ns >> 32; - hdr->timestamp_lo = (uint32_t)ns; + sample_time = pcapng_timestamp(self, rte_get_tsc_cycles()); + hdr->timestamp_hi = sample_time >> 32; + hdr->timestamp_lo = (uint32_t)sample_time; /* clone block_length after option */ memcpy(opt, &len, sizeof(uint32_t)); @@ -520,23 +474,21 @@ struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue, const struct rte_mbuf *md, struct rte_mempool *mp, - uint32_t length, uint64_t cycles, + uint32_t length, enum rte_pcapng_direction direction, const char *comment) { struct pcapng_enhance_packet_block *epb; uint32_t orig_len, data_len, padding, flags; struct pcapng_option *opt; + uint64_t timestamp; uint16_t optlen; struct rte_mbuf *mc; - uint64_t ns; bool rss_hash; #ifdef RTE_LIBRTE_ETHDEV_DEBUG RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL); #endif - ns = pcapng_tsc_to_ns(cycles); - orig_len = rte_pktmbuf_pkt_len(md); /* Take snapshot of the data */ @@ -641,8 +593,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue, /* Interface index is filled in later during write */ mc->port = port_id; - epb->timestamp_hi = ns >> 32; - epb->timestamp_lo = (uint32_t)ns; + /* Put timestamp in cycles here - adjust in packet write */ + timestamp = rte_get_tsc_cycles(); + epb->timestamp_hi = timestamp >> 32; + epb->timestamp_lo = (uint32_t)timestamp; epb->capture_length = data_len; epb->original_length = orig_len; @@ -668,6 +622,7 @@ rte_pcapng_write_packets(rte_pcapng_t *self, for (i = 0; i < nb_pkts; i++) { struct rte_mbuf *m = pkts[i]; struct pcapng_enhance_packet_block *epb; + uint64_t cycles, timestamp; /* sanity check that is really a pcapng mbuf */ epb = rte_pktmbuf_mtod(m, 
struct pcapng_enhance_packet_block *); @@ -684,6 +639,13 @@ rte_pcapng_write_packets(rte_pcapng_t *self, return -1; } + /* adjust timestamp recorded in packet */ + cycles = (uint64_t)epb->timestamp_hi << 32; + cycles += epb->timestamp_lo; + timestamp = pcapng_timestamp(self, cycles); + epb->timestamp_hi = timestamp >> 32; + epb->timestamp_lo = (uint32_t)timestamp; + /* * Handle case of highly fragmented and large burst size * Note: this assumes that max segments per mbuf < IOV_MAX @@ -725,6 +687,8 @@ rte_pcapng_fdopen(int fd, { unsigned int i; rte_pcapng_t *self; + struct timespec ts; + uint64_t cycles; self = malloc(sizeof(*self)); if (!self) { @@ -734,6 +698,13 @@ rte_pcapng_fdopen(int fd, self->outfd = fd; self->ports = 0; + + /* record start time in ns since 1/1/1970 */ + cycles = rte_get_tsc_cycles(); + clock_gettime(CLOCK_REALTIME, &ts); + self->tsc_base = (cycles + rte_get_tsc_cycles()) / 2; + self->offset_ns = rte_timespec_to_ns(&ts); + for (i = 0; i < RTE_MAX_ETHPORTS; i++) self->port_index[i] = UINT32_MAX; diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h index d93cc9f73ad5..c40795c721de 100644 --- a/lib/pcapng/rte_pcapng.h +++ b/lib/pcapng/rte_pcapng.h @@ -121,8 +121,6 @@ enum rte_pcapng_direction { * @param length * The upper limit on bytes to copy. Passing UINT32_MAX * means all data (after offset). - * @param timestamp - * The timestamp in TSC cycles. * @param direction * The direction of the packer: receive, transmit or unknown. * @param comment @@ -136,7 +134,7 @@ __rte_experimental struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue, const struct rte_mbuf *m, struct rte_mempool *mp, - uint32_t length, uint64_t timestamp, + uint32_t length, enum rte_pcapng_direction direction, const char *comment); @@ -188,29 +186,22 @@ rte_pcapng_write_packets(rte_pcapng_t *self, * The handle to the packet capture file * @param port * The Ethernet port to report stats on. - * @param comment - * Optional comment to add to statistics. 
- * @param start_time - * The time when packet capture was started in nanoseconds. - * Optional: can be zero if not known. - * @param end_time - * The time when packet capture was stopped in nanoseconds. - * Optional: can be zero if not finished; * @param ifrecv * The number of packets received by capture. * Optional: use UINT64_MAX if not known. * @param ifdrop * The number of packets missed by the capture process. * Optional: use UINT64_MAX if not known. + * @param comment + * Optional comment to add to statistics. * @return * number of bytes written to file, -1 on failure to write file */ __rte_experimental ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port, - const char *comment, - uint64_t start_time, uint64_t end_time, - uint64_t ifrecv, uint64_t ifdrop); + uint64_t ifrecv, uint64_t ifdrop, + const char *comment); #ifdef __cplusplus } diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index e94f49e21250..5a1ec14d7a18 100644 --- a/lib/pdump/rte_pdump.c +++ b/lib/pdump/rte_pdump.c @@ -90,7 +90,6 @@ pdump_copy(uint16_t port_id, uint16_t queue, int ring_enq; uint16_t d_pkts = 0; struct rte_mbuf *dup_bufs[nb_pkts]; - uint64_t ts; struct rte_ring *ring; struct rte_mempool *mp; struct rte_mbuf *p; @@ -99,7 +98,6 @@ pdump_copy(uint16_t port_id, uint16_t queue, if (cbs->filter) rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts); - ts = rte_get_tsc_cycles(); ring = cbs->ring; mp = cbs->mp; for (i = 0; i < nb_pkts; i++) { @@ -122,7 +120,7 @@ pdump_copy(uint16_t port_id, uint16_t queue, if (cbs->ver == V2) p = rte_pcapng_copy(port_id, queue, pkts[i], mp, cbs->snaplen, - ts, direction, NULL); + direction, NULL); else p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen); -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
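Editorial sketch (not part of the patch): the two-step cycles-to-nanoseconds conversion that this patch introduces in pcapng_timestamp() can be tried outside DPDK roughly as below. demo_cycles_to_ns and DEMO_NS_PER_S are hypothetical names, and the frequency is passed in as a parameter here, whereas the patch reads it from rte_get_tsc_hz().

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NS_PER_S 1000000000ULL

/*
 * Convert a cycle delta to nanoseconds without overflowing 64 bits.
 * hz is the counter frequency in cycles per second.
 */
static uint64_t
demo_cycles_to_ns(uint64_t delta, uint64_t hz)
{
	uint64_t secs, rem;

	assert(hz != 0);

	secs = delta / hz;	/* whole seconds */
	rem = delta % hz;	/* leftover cycles, always < hz */

	/* rem * 1e9 fits in 64 bits as long as hz is below about 18.4 GHz */
	return secs * DEMO_NS_PER_S + (rem * DEMO_NS_PER_S) / hz;
}
```

For comparison, the one-step form (delta * 1e9) / hz wraps once delta exceeds roughly 1.8e10 cycles, which is only a few seconds at typical TSC rates. Splitting out whole seconds first keeps every intermediate product small, and because no state is updated along the way there is nothing shared between threads to protect.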
* Re: [PATCH v5 3/5] pcapng: modify timestamp calculation 2023-11-09 19:45 ` [PATCH v5 3/5] pcapng: modify timestamp calculation Stephen Hemminger @ 2023-11-12 14:22 ` Thomas Monjalon 0 siblings, 0 replies; 61+ messages in thread From: Thomas Monjalon @ 2023-11-12 14:22 UTC (permalink / raw) To: Stephen Hemminger Cc: dev, Morten Brørup, Reshma Pattan, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan, Quentin Armitage, Stephen Hemminger 09/11/2023 20:45, Stephen Hemminger: > The computation of timestamp is best done in the part of > pcapng library that is in secondary process. > The secondary process is already doing a bunch of system > calls which makes it not performance sensitive. > This does change the rte_pcapng_copy() > and rte_pcapng_write_stats() experimental API's. > > Simplify the computation of nanoseconds from TSC to a two > step process which avoids numeric overflow issues. The previous > code was not thread safe as well. > > Fixes: c882eb544842 ("pcapng: fix timestamp wrapping in output files") > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > Acked-by: Morten Brørup <mb@smartsharesystems.com> It does not compile: app/test/test_pcapng.c:148:22: error: too many arguments to function 'rte_pcapng_copy' Please make sure it compiles after each patch. Thank you ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v5 4/5] pcapng: avoid using alloca() 2023-11-09 19:45 ` [PATCH v5 0/5] dumpcap and pcapng fixes Stephen Hemminger ` (2 preceding siblings ...) 2023-11-09 19:45 ` [PATCH v5 3/5] pcapng: modify timestamp calculation Stephen Hemminger @ 2023-11-09 19:45 ` Stephen Hemminger 2023-11-09 19:45 ` [PATCH v5 5/5] test: cleanups to pcapng test Stephen Hemminger 4 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 19:45 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Morten Brørup, Reshma Pattan The function alloca() like VLA's has problems if the caller passes a large value. Instead use a fixed size buffer (2K) which will be more than sufficient for the info related blocks in the file. Add bounds checks as well. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Morten Brørup <mb@smartsharesystems.com> --- lib/pcapng/rte_pcapng.c | 37 ++++++++++++++++--------------------- 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index 13fd2b97fb80..f74ec939a9f8 100644 --- a/lib/pcapng/rte_pcapng.c +++ b/lib/pcapng/rte_pcapng.c @@ -33,6 +33,9 @@ /* conversion from DPDK speed to PCAPNG */ #define PCAPNG_MBPS_SPEED 1000000ull +/* upper bound for section, stats and interface blocks */ +#define PCAPNG_BLKSIZ 2048 + /* Format of the capture file handle */ struct rte_pcapng { int outfd; /* output file */ @@ -140,9 +143,8 @@ pcapng_section_block(rte_pcapng_t *self, { struct pcapng_section_header *hdr; struct pcapng_option *opt; - void *buf; + uint8_t buf[PCAPNG_BLKSIZ]; uint32_t len; - ssize_t cc; len = sizeof(*hdr); if (hw) @@ -158,8 +160,7 @@ pcapng_section_block(rte_pcapng_t *self, len += pcapng_optlen(0); len += sizeof(uint32_t); - buf = calloc(1, len); - if (!buf) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_section_header *)buf; @@ -193,10 +194,7 @@ pcapng_section_block(rte_pcapng_t *self, /* clone block_length after option */ memcpy(opt, 
&hdr->block_length, sizeof(uint32_t)); - cc = write(self->outfd, buf, len); - free(buf); - - return cc; + return write(self->outfd, buf, len); } /* Write an interface block for a DPDK port */ @@ -213,7 +211,7 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, struct pcapng_option *opt; const uint8_t tsresol = 9; /* nanosecond resolution */ uint32_t len; - void *buf; + uint8_t buf[PCAPNG_BLKSIZ]; char ifname_buf[IF_NAMESIZE]; char ifhw[256]; uint64_t speed = 0; @@ -267,8 +265,7 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, len += pcapng_optlen(0); len += sizeof(uint32_t); - buf = alloca(len); - if (!buf) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_interface_block *)buf; @@ -296,17 +293,16 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE, ifhw, strlen(ifhw)); if (filter) { - /* Encoding is that the first octet indicates string vs BPF */ size_t len; - char *buf; len = strlen(filter) + 1; - buf = alloca(len); - *buf = '\0'; - memcpy(buf + 1, filter, len); + opt->code = PCAPNG_IFB_FILTER; + opt->length = len; + /* Encoding is that the first octet indicates string vs BPF */ + opt->data[0] = 0; + memcpy(opt->data + 1, filter, strlen(filter)); - opt = pcapng_add_option(opt, PCAPNG_IFB_FILTER, - buf, len); + opt = (struct pcapng_option *)((uint8_t *)opt + pcapng_optlen(len)); } opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0); @@ -333,7 +329,7 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, uint64_t start_time = self->offset_ns; uint64_t sample_time; uint32_t optlen, len; - uint8_t *buf; + uint8_t buf[PCAPNG_BLKSIZ]; RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); @@ -353,8 +349,7 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, optlen += pcapng_optlen(0); len = sizeof(*hdr) + optlen + sizeof(uint32_t); - buf = alloca(len); - if (buf == NULL) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_statistics *)buf; -- 2.39.2 ^ 
permalink raw reply [flat|nested] 61+ messages in thread
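Editorial sketch (not part of the patch): the pattern of replacing alloca() with a fixed-size stack buffer plus a length check could be exercised on a toy block like this. demo_build_block, demo_pad4 and DEMO_BLKSIZ are hypothetical, and the layout (length word, padded comment, repeated length word) is a simplification for illustration, not the real pcapng block format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLKSIZ 2048	/* same figure the patch uses for PCAPNG_BLKSIZ */

/* Round up to a multiple of 4; pcapng pads option data to 32 bits. */
static uint32_t
demo_pad4(uint32_t len)
{
	return (len + 3u) & ~3u;
}

/*
 * Build a toy block: 32-bit length, padded comment, trailing 32-bit length.
 * Copies the result into out, which must hold DEMO_BLKSIZ bytes.
 * Returns the block length, or -1 if the block would not fit.
 */
static int
demo_build_block(uint8_t *out, const char *comment)
{
	uint8_t buf[DEMO_BLKSIZ] = { 0 };
	size_t clen;
	uint32_t len;

	assert(out != NULL && comment != NULL);

	clen = strlen(comment);
	/* Reject oversized input before any arithmetic that could wrap */
	if (clen > DEMO_BLKSIZ - 2 * sizeof(uint32_t))
		return -1;

	len = sizeof(uint32_t) + demo_pad4((uint32_t)clen) + sizeof(uint32_t);
	if (len > sizeof(buf))
		return -1;

	memcpy(buf, &len, sizeof(len));
	memcpy(buf + sizeof(uint32_t), comment, clen);
	/* repeat the block length at the end, as pcapng blocks do */
	memcpy(buf + len - sizeof(uint32_t), &len, sizeof(len));

	memcpy(out, buf, len);
	return (int)len;
}
```

Unlike alloca(), an oversized request here produces an error return instead of an unbounded stack allocation, at the cost of capping how long a comment or filter string can be.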
* [PATCH v5 5/5] test: cleanups to pcapng test 2023-11-09 19:45 ` [PATCH v5 0/5] dumpcap and pcapng fixes Stephen Hemminger ` (3 preceding siblings ...) 2023-11-09 19:45 ` [PATCH v5 4/5] pcapng: avoid using alloca() Stephen Hemminger @ 2023-11-09 19:45 ` Stephen Hemminger 4 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-09 19:45 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Reshma Pattan Overhaul of the pcapng test: - promote it to be a fast test so it gets regularly run. - create null device and use it. - use UDP discard packets that are valid so that for debugging the resulting pcapng file can be looked at with wireshark. - do basic checks on resulting pcap file that lengths and timestamps are in range. - add test for interface options Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/test/meson.build | 2 +- app/test/test_pcapng.c | 418 +++++++++++++++++++++++++++-------------- 2 files changed, 282 insertions(+), 138 deletions(-) diff --git a/app/test/meson.build b/app/test/meson.build index 4183d66b0e9c..dcc93f4a43b4 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -128,7 +128,7 @@ source_file_deps = { 'test_metrics.c': ['metrics'], 'test_mp_secondary.c': ['hash', 'lpm'], 'test_net_ether.c': ['net'], - 'test_pcapng.c': ['ethdev', 'net', 'pcapng'], + 'test_pcapng.c': ['ethdev', 'net', 'pcapng', 'bus_vdev'], 'test_pdcp.c': ['eventdev', 'pdcp', 'net', 'timer', 'security'], 'test_pdump.c': ['pdump'] + sample_packet_forward_deps, 'test_per_lcore.c': [], diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index 55aa2cf93666..c973aa47d1f8 100644 --- a/app/test/test_pcapng.c +++ b/app/test/test_pcapng.c @@ -6,25 +6,34 @@ #include <stdlib.h> #include <unistd.h> +#include <rte_bus_vdev.h> #include <rte_ethdev.h> #include <rte_ether.h> +#include <rte_ip.h> #include <rte_mbuf.h> #include <rte_mempool.h> #include <rte_net.h> #include <rte_pcapng.h> +#include <rte_random.h> +#include 
<rte_reciprocal.h> +#include <rte_time.h> +#include <rte_udp.h> #include <pcap/pcap.h> #include "test.h" -#define NUM_PACKETS 10 -#define DUMMY_MBUF_NUM 3 +#define PCAPNG_TEST_DEBUG 0 + +#define TOTAL_PACKETS 4096 +#define MAX_BURST 64 +#define MAX_GAP_US 100000 +#define DUMMY_MBUF_NUM 3 -static rte_pcapng_t *pcapng; static struct rte_mempool *mp; static const uint32_t pkt_len = 200; static uint16_t port_id; -static char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; +static const char null_dev[] = "net_null0"; /* first mbuf in the packet, should always be at offset 0 */ struct dummy_mbuf { @@ -61,6 +70,7 @@ mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen) struct { struct rte_ether_hdr eth; struct rte_ipv4_hdr ip; + struct rte_udp_hdr udp; } pkt = { .eth = { .dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff", @@ -68,149 +78,201 @@ mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen) }, .ip = { .version_ihl = RTE_IPV4_VHL_DEF, - .total_length = rte_cpu_to_be_16(plen), - .time_to_live = IPDEFTTL, - .next_proto_id = IPPROTO_RAW, + .time_to_live = 1, + .next_proto_id = IPPROTO_UDP, .src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK), .dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST), - } + }, + .udp = { + .dst_port = rte_cpu_to_be_16(9), /* Discard port */ + }, }; memset(dm, 0, sizeof(*dm)); dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen); rte_eth_random_addr(pkt.eth.src_addr.addr_bytes); - memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, RTE_MIN(sizeof(pkt), plen)); + plen -= sizeof(struct rte_ether_hdr); + + pkt.ip.total_length = rte_cpu_to_be_16(plen); + pkt.ip.hdr_checksum = rte_ipv4_cksum(&pkt.ip); + + plen -= sizeof(struct rte_ipv4_hdr); + pkt.udp.src_port = rte_rand(); + pkt.udp.dgram_len = rte_cpu_to_be_16(plen); + + memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, sizeof(pkt)); } static int test_setup(void) { - int tmp_fd; - - port_id = rte_eth_find_next(0); - if (port_id >= RTE_MAX_ETHPORTS) { - fprintf(stderr, "No valid Ether port\n"); - return 
-1; - } + port_id = rte_eth_dev_count_avail(); - tmp_fd = mkstemps(file_name, strlen(".pcapng")); - if (tmp_fd == -1) { - perror("mkstemps() failure"); - return -1; - } - printf("pcapng: output file %s\n", file_name); - - /* open a test capture file */ - pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL); - if (pcapng == NULL) { - fprintf(stderr, "rte_pcapng_fdopen failed\n"); - close(tmp_fd); - return -1; - } - - /* Add interface to the file */ - if (rte_pcapng_add_interface(pcapng, port_id, - NULL, NULL, NULL) != 0) { - fprintf(stderr, "can not add port %u\n", port_id); - return -1; + /* Make a dummy null device to snoop on */ + if (rte_vdev_init(null_dev, NULL) != 0) { + fprintf(stderr, "Failed to create vdev '%s'\n", null_dev); + goto fail; } /* Make a pool for cloned packets */ - mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", IOV_MAX + NUM_PACKETS, - 0, 0, - rte_pcapng_mbuf_size(pkt_len), + mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", + MAX_BURST, 0, 0, + rte_pcapng_mbuf_size(pkt_len) + 128, SOCKET_ID_ANY, "ring_mp_sc"); if (mp == NULL) { fprintf(stderr, "Cannot create mempool\n"); - return -1; + goto fail; } + return 0; + +fail: + rte_vdev_uninit(null_dev); + rte_mempool_free(mp); + return -1; } static int -test_write_packets(void) +fill_pcapng_file(rte_pcapng_t *pcapng, unsigned int num_packets) { - struct rte_mbuf *orig; - struct rte_mbuf *clones[NUM_PACKETS] = { }; struct dummy_mbuf mbfs; - unsigned int i; + struct rte_mbuf *orig; + unsigned int burst_size; + unsigned int count; ssize_t len; /* make a dummy packet */ mbuf1_prepare(&mbfs, pkt_len); - - /* clone them */ orig = &mbfs.mb[0]; - for (i = 0; i < NUM_PACKETS; i++) { - struct rte_mbuf *mc; - mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - rte_get_tsc_cycles(), 0, NULL); - if (mc == NULL) { - fprintf(stderr, "Cannot copy packet\n"); + for (count = 0; count < num_packets; count += burst_size) { + struct rte_mbuf *clones[MAX_BURST]; + unsigned int i; + + /* 
put 1 .. MAX_BURST packets in one write call */ + burst_size = rte_rand_max(MAX_BURST) + 1; + for (i = 0; i < burst_size; i++) { + struct rte_mbuf *mc; + + mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, + RTE_PCAPNG_DIRECTION_IN, + NULL); + if (mc == NULL) { + fprintf(stderr, "Cannot copy packet\n"); + return -1; + } + clones[i] = mc; + } + + /* write it to capture file */ + len = rte_pcapng_write_packets(pcapng, clones, burst_size); + rte_pktmbuf_free_bulk(clones, burst_size); + + if (len <= 0) { + fprintf(stderr, "Write of packets failed: %s\n", + rte_strerror(rte_errno)); return -1; } - clones[i] = mc; + + /* Leave a small gap between packets to test for time wrap */ + usleep(rte_rand_max(MAX_GAP_US)); } - /* write it to capture file */ - len = rte_pcapng_write_packets(pcapng, clones, NUM_PACKETS); + return count; +} - rte_pktmbuf_free_bulk(clones, NUM_PACKETS); +static char * +fmt_time(char *buf, size_t size, uint64_t ts_ns) +{ + time_t sec; + size_t len; - if (len <= 0) { - fprintf(stderr, "Write of packets failed\n"); - return -1; - } + sec = ts_ns / NS_PER_S; + len = strftime(buf, size, "%X", localtime(&sec)); + snprintf(buf + len, size - len, ".%09lu", + (unsigned long)(ts_ns % NS_PER_S)); - return 0; + return buf; } -static int -test_write_stats(void) +/* Context for the pcap_loop callback */ +struct pkt_print_ctx { + pcap_t *pcap; + unsigned int count; + uint64_t start_ns; + uint64_t end_ns; +}; + +static void +print_packet(uint64_t ts_ns, const struct rte_ether_hdr *eh, size_t len) { - ssize_t len; + char tbuf[128], src[64], dst[64]; - /* write a statistics block */ - len = rte_pcapng_write_stats(pcapng, port_id, NULL, - 0, 0, 0, - NUM_PACKETS, 0); - if (len <= 0) { - fprintf(stderr, "Write of statistics failed\n"); - return -1; - } - return 0; + fmt_time(tbuf, sizeof(tbuf), ts_ns); + rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr); + rte_ether_format_addr(src, sizeof(src), &eh->src_addr); + printf("%s: %s -> %s type %x length %zu\n", + tbuf, 
src, dst, rte_be_to_cpu_16(eh->ether_type), len); } +/* Callback from pcap_loop used to validate packets in the file */ static void -pkt_print(u_char *user, const struct pcap_pkthdr *h, - const u_char *bytes) +parse_pcap_packet(u_char *user, const struct pcap_pkthdr *h, + const u_char *bytes) { - unsigned int *countp = (unsigned int *)user; + struct pkt_print_ctx *ctx = (struct pkt_print_ctx *)user; const struct rte_ether_hdr *eh; - struct tm *tm; - char tbuf[128], src[64], dst[64]; + const struct rte_ipv4_hdr *ip; + uint64_t ns; - tm = localtime(&h->ts.tv_sec); - if (tm == NULL) { - perror("localtime"); - return; + eh = (const struct rte_ether_hdr *)bytes; + ip = (const struct rte_ipv4_hdr *)(eh + 1); + + ctx->count += 1; + + /* The pcap library is misleading in reporting timestamp. + * packet header struct gives timestamp as a timeval (ie. usec); + * but the file is open in nanonsecond mode therefore + * the timestamp is really in timespec (ie. nanoseconds). + */ + ns = h->ts.tv_sec * NS_PER_S + h->ts.tv_usec; + if (ns < ctx->start_ns || ns > ctx->end_ns) { + char tstart[128], tend[128]; + + fmt_time(tstart, sizeof(tstart), ctx->start_ns); + fmt_time(tend, sizeof(tend), ctx->end_ns); + fprintf(stderr, "Timestamp out of range [%s .. 
%s]\n", + tstart, tend); + goto error; } - if (strftime(tbuf, sizeof(tbuf), "%X", tm) == 0) { - fprintf(stderr, "strftime returned 0!\n"); - return; + if (!rte_is_broadcast_ether_addr(&eh->dst_addr)) { + fprintf(stderr, "Destination is not broadcast\n"); + goto error; } - eh = (const struct rte_ether_hdr *)bytes; - rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr); - rte_ether_format_addr(src, sizeof(src), &eh->src_addr); - printf("%s.%06lu: %s -> %s type %x length %u\n", - tbuf, (unsigned long)h->ts.tv_usec, - src, dst, rte_be_to_cpu_16(eh->ether_type), h->len); + if (rte_ipv4_cksum(ip) != 0) { + fprintf(stderr, "Bad IPv4 checksum\n"); + goto error; + } + + return; /* packet is normal */ + +error: + print_packet(ns, eh, h->len); + + /* Stop parsing at first error */ + pcap_breakloop(ctx->pcap); +} - *countp += 1; +static uint64_t +current_timestamp(void) +{ + struct timespec ts; + + clock_gettime(CLOCK_REALTIME, &ts); + return rte_timespec_to_ns(&ts); } /* @@ -219,78 +281,162 @@ pkt_print(u_char *user, const struct pcap_pkthdr *h, * but that creates an unwanted dependency. 
*/ static int -test_validate(void) +valid_pcapng_file(const char *file_name, uint64_t started, unsigned int expected) { char errbuf[PCAP_ERRBUF_SIZE]; - unsigned int count = 0; - pcap_t *pcap; + struct pkt_print_ctx ctx = { }; int ret; - pcap = pcap_open_offline(file_name, errbuf); - if (pcap == NULL) { + ctx.start_ns = started; + ctx.end_ns = current_timestamp(); + + ctx.pcap = pcap_open_offline_with_tstamp_precision(file_name, + PCAP_TSTAMP_PRECISION_NANO, + errbuf); + if (ctx.pcap == NULL) { fprintf(stderr, "pcap_open_offline('%s') failed: %s\n", file_name, errbuf); return -1; } - ret = pcap_loop(pcap, 0, pkt_print, (u_char *)&count); - if (ret == 0) - printf("Saw %u packets\n", count); - else + ret = pcap_loop(ctx.pcap, 0, parse_pcap_packet, (u_char *)&ctx); + if (ret != 0) { fprintf(stderr, "pcap_dispatch: failed: %s\n", - pcap_geterr(pcap)); - pcap_close(pcap); + pcap_geterr(ctx.pcap)); + } else if (ctx.count != expected) { + printf("Only %u packets, expected %u\n", + ctx.count, expected); + ret = -1; + } + + pcap_close(ctx.pcap); return ret; } static int -test_write_over_limit_iov_max(void) +test_add_interface(void) { - struct rte_mbuf *orig; - struct rte_mbuf *clones[IOV_MAX + NUM_PACKETS] = { }; - struct dummy_mbuf mbfs; - unsigned int i; - ssize_t len; + char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; + static rte_pcapng_t *pcapng; + int ret, tmp_fd; + uint64_t now = current_timestamp(); - /* make a dummy packet */ - mbuf1_prepare(&mbfs, pkt_len); + tmp_fd = mkstemps(file_name, strlen(".pcapng")); + if (tmp_fd == -1) { + perror("mkstemps() failure"); + goto fail; + } + printf("pcapng: output file %s\n", file_name); - /* clone them */ - orig = &mbfs.mb[0]; - for (i = 0; i < IOV_MAX + NUM_PACKETS; i++) { - struct rte_mbuf *mc; + /* open a test capture file */ + pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_addif", NULL); + if (pcapng == NULL) { + fprintf(stderr, "rte_pcapng_fdopen failed\n"); + close(tmp_fd); + goto fail; + } - mc = 
rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - rte_get_tsc_cycles(), 0, NULL); - if (mc == NULL) { - fprintf(stderr, "Cannot copy packet\n"); - return -1; - } - clones[i] = mc; + /* Add interface to the file */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, NULL); + if (ret < 0) { + fprintf(stderr, "can not add port %u\n", port_id); + goto fail; } - /* write it to capture file */ - len = rte_pcapng_write_packets(pcapng, clones, IOV_MAX + NUM_PACKETS); + /* Add interface with ifname and ifdescr */ + ret = rte_pcapng_add_interface(pcapng, port_id, + "myeth", "Some long description", NULL); + if (ret < 0) { + fprintf(stderr, "can not add port %u with ifname\n", port_id); + goto fail; + } + + /* Add interface with filter */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, "tcp port 8080"); + if (ret < 0) { + fprintf(stderr, "can not add port %u with filter\n", port_id); + goto fail; + } - rte_pktmbuf_free_bulk(clones, IOV_MAX + NUM_PACKETS); + rte_pcapng_close(pcapng); - if (len <= 0) { - fprintf(stderr, "Write of packets failed\n"); - return -1; + ret = valid_pcapng_file(file_name, now, 0); + /* if test fails want to investigate the file */ + if (ret == 0) + unlink(file_name); + + return ret; + +fail: + rte_pcapng_close(pcapng); + return -1; +} + +static int +test_write_packets(void) +{ + char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; + static rte_pcapng_t *pcapng; + int ret, tmp_fd, count; + uint64_t now = current_timestamp(); + + tmp_fd = mkstemps(file_name, strlen(".pcapng")); + if (tmp_fd == -1) { + perror("mkstemps() failure"); + goto fail; } + printf("pcapng: output file %s\n", file_name); - return 0; + /* open a test capture file */ + pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL); + if (pcapng == NULL) { + fprintf(stderr, "rte_pcapng_fdopen failed\n"); + close(tmp_fd); + goto fail; + } + + /* Add interface to the file */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, NULL); + if (ret 
< 0) { + fprintf(stderr, "can not add port %u\n", port_id); + goto fail; + } + + count = fill_pcapng_file(pcapng, TOTAL_PACKETS); + if (count < 0) + goto fail; + + /* write a statistics block */ + ret = rte_pcapng_write_stats(pcapng, port_id, + count, 0, "end of test"); + if (ret <= 0) { + fprintf(stderr, "Write of statistics failed\n"); + goto fail; + } + + rte_pcapng_close(pcapng); + + ret = valid_pcapng_file(file_name, now, count); + /* if test fails want to investigate the file */ + if (ret == 0) + unlink(file_name); + + return ret; + +fail: + rte_pcapng_close(pcapng); + return -1; } static void test_cleanup(void) { rte_mempool_free(mp); - - if (pcapng) - rte_pcapng_close(pcapng); - + rte_vdev_uninit(null_dev); } static struct @@ -299,10 +445,8 @@ unit_test_suite test_pcapng_suite = { .teardown = test_cleanup, .suite_name = "Test Pcapng Unit Test Suite", .unit_test_cases = { + TEST_CASE(test_add_interface), TEST_CASE(test_write_packets), - TEST_CASE(test_write_stats), - TEST_CASE(test_validate), - TEST_CASE(test_write_over_limit_iov_max), TEST_CASES_END() } }; @@ -313,4 +457,4 @@ test_pcapng(void) return unit_test_suite_runner(&test_pcapng_suite); } -REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng); +REGISTER_FAST_TEST(pcapng_autotest, true, true, test_pcapng); -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v6 0/5] dumpcap and pcapng fixes 2023-09-21 4:23 [PATCH 0/4] pcapng fixes Stephen Hemminger ` (7 preceding siblings ...) 2023-11-09 19:45 ` [PATCH v5 0/5] dumpcap and pcapng fixes Stephen Hemminger @ 2023-11-13 16:15 ` Stephen Hemminger 2023-11-13 16:15 ` [PATCH v6 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger ` (4 more replies) 2023-11-17 16:35 ` [PATCH v7 0/5] dumpcap and pcapng fixes Stephen Hemminger 9 siblings, 5 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-13 16:15 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This patch set fixes issues related to timestamping. The design choice is to maximize performance in the primary process and do all the time adjustment in the secondary (dumpcap), since dumpcap needs to make system calls anyway to write the result. This patch set moves the adjustment calculation into the part of pcapng that opens the output file. All details of the timestamp format are contained inside pcapng (data hiding). v6 - make sure all steps compile v5 - fix format of getpid in capture name v4 - incorporate review feedback v3 - don't use alloca() since it can have VLA type issues Stephen Hemminger (5): pdump: fix setting rte_errno on mp error dumpcap: allow multiple invocations pcapng: modify timestamp calculation pcapng: avoid using alloca() test: cleanups to pcapng test app/dumpcap/main.c | 53 ++--- app/test/meson.build | 2 +- app/test/test_pcapng.c | 417 +++++++++++++++++++++++++++------------- lib/graph/graph_pcap.c | 2 +- lib/pcapng/rte_pcapng.c | 156 ++++++--------- lib/pcapng/rte_pcapng.h | 19 +- lib/pdump/rte_pdump.c | 9 +- 7 files changed, 373 insertions(+), 285 deletions(-) -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v6 1/5] pdump: fix setting rte_errno on mp error 2023-11-13 16:15 ` [PATCH v6 0/5] dumpcap and pcapng fixes Stephen Hemminger @ 2023-11-13 16:15 ` Stephen Hemminger 2023-11-13 16:15 ` [PATCH v6 2/5] dumpcap: allow multiple invocations Stephen Hemminger ` (3 subsequent siblings) 4 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-13 16:15 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Morten Brørup, Reshma Pattan, Jianfeng Tan The response from MP server sets err_value to negative on error. The convention for rte_errno is to use a positive value on error. This makes errors like duplicate registration show up with the correct error value. Fixes: 660098d61f57 ("pdump: use generic multi-process channel") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Morten Brørup <mb@smartsharesystems.com> --- lib/pdump/rte_pdump.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index 80b90c6f7d03..e94f49e21250 100644 --- a/lib/pdump/rte_pdump.c +++ b/lib/pdump/rte_pdump.c @@ -564,9 +564,10 @@ pdump_prepare_client_request(const char *device, uint16_t queue, if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) { mp_rep = &mp_reply.msgs[0]; resp = (struct pdump_response *)mp_rep->param; - rte_errno = resp->err_value; - if (!resp->err_value) + if (resp->err_value == 0) ret = 0; + else + rte_errno = -resp->err_value; free(mp_reply.msgs); } -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
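The sign convention described in this patch can be sketched without DPDK. In the sketch below, `struct reply`, `last_error` and `handle_reply` are hypothetical stand-ins for `struct pdump_response`, `rte_errno` and the client request path; only the negation on the error branch mirrors the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the response carried over the MP channel. */
struct reply {
	int err_value;	/* 0 on success, negative errno value on failure */
};

/* Hypothetical stand-in for rte_errno, which is positive on error. */
static int last_error;

/*
 * Translate the server's negative error code into the positive
 * convention. On success the error variable is left untouched,
 * as in the patched code.
 */
static int
handle_reply(const struct reply *resp)
{
	assert(resp != NULL);

	if (resp->err_value == 0)
		return 0;

	last_error = -resp->err_value;
	return -1;
}
```

With this convention a caller can pass the stored value straight to a strerror-style function, so a duplicate registration reports "File exists" instead of an unknown error.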
* [PATCH v6 2/5] dumpcap: allow multiple invocations 2023-11-13 16:15 ` [PATCH v6 0/5] dumpcap and pcapng fixes Stephen Hemminger 2023-11-13 16:15 ` [PATCH v6 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger @ 2023-11-13 16:15 ` Stephen Hemminger 2023-11-13 16:15 ` [PATCH v6 3/5] pcapng: modify timestamp calculation Stephen Hemminger ` (2 subsequent siblings) 4 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-13 16:15 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Isaac Boukris, Reshma Pattan If dumpcap is run twice with each instance pointing at a different interface, it would fail because of overlap in ring and pool names. Fix by putting the process id in the name. It is still not allowed to do multiple invocations on the same interface because only one callback is allowed and only one copy of mbuf is done. Dumpcap will fail with this error in that case: pdump_prepare_client_request(): client request for pdump enable/disable failed EAL: Error - exiting with code: 1 Cause: Packet dump enable on 0:net_null0 failed File exists Fixes: cbb44143be74 ("app/dumpcap: add new packet capture application") Reported-by: Isaac Boukris <iboukris@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/dumpcap/main.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c index 4f581bd341d8..d05dddac0071 100644 --- a/app/dumpcap/main.c +++ b/app/dumpcap/main.c @@ -44,7 +44,6 @@ #include <pcap/pcap.h> #include <pcap/bpf.h> -#define RING_NAME "capture-ring" #define MONITOR_INTERVAL (500 * 1000) #define MBUF_POOL_CACHE_SIZE 32 #define BURST_SIZE 32 @@ -647,6 +646,7 @@ static void dpdk_init(void) static struct rte_ring *create_ring(void) { struct rte_ring *ring; + char ring_name[RTE_RING_NAMESIZE]; size_t size, log2; /* Find next power of 2 >= size. 
*/ @@ -660,28 +660,28 @@ static struct rte_ring *create_ring(void) ring_size = size; } - ring = rte_ring_lookup(RING_NAME); - if (ring == NULL) { - ring = rte_ring_create(RING_NAME, ring_size, - rte_socket_id(), 0); - if (ring == NULL) - rte_exit(EXIT_FAILURE, "Could not create ring :%s\n", - rte_strerror(rte_errno)); - } + /* Want one ring per invocation of program */ + snprintf(ring_name, sizeof(ring_name), + "dumpcap-%d", getpid()); + + ring = rte_ring_create(ring_name, ring_size, + rte_socket_id(), 0); + if (ring == NULL) + rte_exit(EXIT_FAILURE, "Could not create ring :%s\n", + rte_strerror(rte_errno)); + return ring; } static struct rte_mempool *create_mempool(void) { const struct interface *intf; - static const char pool_name[] = "capture_mbufs"; + char pool_name[RTE_MEMPOOL_NAMESIZE]; size_t num_mbufs = 2 * ring_size; struct rte_mempool *mp; uint32_t data_size = 128; - mp = rte_mempool_lookup(pool_name); - if (mp) - return mp; + snprintf(pool_name, sizeof(pool_name), "capture_%d", getpid()); /* Common pool so size mbuf for biggest snap length */ TAILQ_FOREACH(intf, &interfaces, next) { @@ -826,7 +826,7 @@ static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp) rte_exit(EXIT_FAILURE, "Packet dump enable on %u:%s failed %s\n", intf->port, intf->name, - rte_strerror(-ret)); + rte_strerror(rte_errno)); } if (intf->opts.promisc_mode) { -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
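The naming scheme in this patch can be illustrated in isolation. The sketch below is not the dumpcap code: `make_instance_name` and `names_collide` are hypothetical helpers, and the explicit truncation check is an addition that the patch itself does not perform.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: build a per-process object name such as
 * "dumpcap-1234". Returns 0 on success, -1 if the name was truncated
 * or could not be formatted.
 */
static int
make_instance_name(char *buf, size_t size, const char *prefix, pid_t pid)
{
	int n;

	assert(buf != NULL && prefix != NULL);

	/* pid_t has no printf conversion of its own, so cast it */
	n = snprintf(buf, size, "%s-%ld", prefix, (long)pid);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}

/* Two invocations clash only if they end up with the same name. */
static int
names_collide(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

Because live processes on one host have distinct process ids, two concurrent invocations get distinct ring and pool names. Capturing the same interface twice is still rejected, for the reason given in the commit message.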
* [PATCH v6 3/5] pcapng: modify timestamp calculation 2023-11-13 16:15 ` [PATCH v6 0/5] dumpcap and pcapng fixes Stephen Hemminger 2023-11-13 16:15 ` [PATCH v6 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger 2023-11-13 16:15 ` [PATCH v6 2/5] dumpcap: allow multiple invocations Stephen Hemminger @ 2023-11-13 16:15 ` Stephen Hemminger 2023-11-13 16:15 ` [PATCH v6 4/5] pcapng: avoid using alloca() Stephen Hemminger 2023-11-13 16:15 ` [PATCH v6 5/5] test: cleanups to pcapng test Stephen Hemminger 4 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-13 16:15 UTC (permalink / raw) To: dev Cc: Stephen Hemminger, Morten Brørup, Reshma Pattan, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan, Quentin Armitage The computation of the timestamp is best done in the part of the pcapng library that runs in the secondary process. The secondary process is already doing a bunch of system calls, which makes it not performance sensitive. This does change the rte_pcapng_copy() and rte_pcapng_write_stats() experimental APIs. Simplify the computation of nanoseconds from TSC to a two-step process which avoids numeric overflow issues. The previous code was also not thread safe. 
Fixes: c882eb544842 ("pcapng: fix timestamp wrapping in output files") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Morten Brørup <mb@smartsharesystems.com> --- app/dumpcap/main.c | 25 +++------ app/test/test_pcapng.c | 7 +-- lib/graph/graph_pcap.c | 2 +- lib/pcapng/rte_pcapng.c | 119 +++++++++++++++------------------------- lib/pcapng/rte_pcapng.h | 19 ++----- lib/pdump/rte_pdump.c | 4 +- 6 files changed, 62 insertions(+), 114 deletions(-) diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c index d05dddac0071..fc28e2d7027a 100644 --- a/app/dumpcap/main.c +++ b/app/dumpcap/main.c @@ -66,13 +66,13 @@ static bool print_stats; /* capture limit options */ static struct { - uint64_t duration; /* nanoseconds */ + time_t duration; /* seconds */ unsigned long packets; /* number of packets in file */ size_t size; /* file size (bytes) */ } stop; /* Running state */ -static uint64_t start_time, end_time; +static time_t start_time; static uint64_t packets_received; static size_t file_size; @@ -197,7 +197,7 @@ static void auto_stop(char *opt) if (*value == '\0' || *endp != '\0' || interval <= 0) rte_exit(EXIT_FAILURE, "Invalid duration \"%s\"\n", value); - stop.duration = NSEC_PER_SEC * interval; + stop.duration = interval; } else if (strcmp(opt, "filesize") == 0) { stop.size = get_uint(value, "filesize", 0) * 1024; } else if (strcmp(opt, "packets") == 0) { @@ -511,15 +511,6 @@ static void statistics_loop(void) } } -/* Return the time since 1/1/1970 in nanoseconds */ -static uint64_t create_timestamp(void) -{ - struct timespec now; - - clock_gettime(CLOCK_MONOTONIC, &now); - return rte_timespec_to_ns(&now); -} - static void cleanup_pdump_resources(void) { @@ -589,9 +580,8 @@ report_packet_stats(dumpcap_out_t out) ifdrop = pdump_stats.nombuf + pdump_stats.ringfull; if (use_pcapng) - rte_pcapng_write_stats(out.pcapng, intf->port, NULL, - start_time, end_time, - ifrecv, ifdrop); + rte_pcapng_write_stats(out.pcapng, intf->port, + ifrecv, ifdrop, NULL); 
if (ifrecv == 0) percent = 0; @@ -983,7 +973,7 @@ int main(int argc, char **argv) mp = create_mempool(); out = create_output(); - start_time = create_timestamp(); + start_time = time(NULL); enable_pdump(r, mp); if (!quiet) { @@ -1005,11 +995,10 @@ int main(int argc, char **argv) break; if (stop.duration != 0 && - create_timestamp() - start_time > stop.duration) + time(NULL) - start_time > stop.duration) break; } - end_time = create_timestamp(); disable_primary_monitor(); if (rte_eal_primary_proc_alive(NULL)) diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index b8429a02f160..21131dfa0c5e 100644 --- a/app/test/test_pcapng.c +++ b/app/test/test_pcapng.c @@ -146,7 +146,7 @@ test_write_packets(void) struct rte_mbuf *mc; mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - rte_get_tsc_cycles(), 0, NULL); + RTE_PCAPNG_DIRECTION_UNKNOWN, NULL); if (mc == NULL) { fprintf(stderr, "Cannot copy packet\n"); return -1; @@ -174,8 +174,7 @@ test_write_stats(void) /* write a statistics block */ len = rte_pcapng_write_stats(pcapng, port_id, - NULL, 0, 0, - NUM_PACKETS, 0); + UINT64_MAX, UINT64_MAX, NULL); if (len <= 0) { fprintf(stderr, "Write of statistics failed\n"); return -1; @@ -262,7 +261,7 @@ test_write_over_limit_iov_max(void) struct rte_mbuf *mc; mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - rte_get_tsc_cycles(), 0, NULL); + RTE_PCAPNG_DIRECTION_UNKNOWN, NULL); if (mc == NULL) { fprintf(stderr, "Cannot copy packet\n"); return -1; diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c index db722c375fa7..89525f1220ca 100644 --- a/lib/graph/graph_pcap.c +++ b/lib/graph/graph_pcap.c @@ -214,7 +214,7 @@ graph_pcap_dispatch(struct rte_graph *graph, mbuf = (struct rte_mbuf *)objs[i]; mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len, - rte_get_tsc_cycles(), 0, buffer); + 0, buffer); if (mc == NULL) break; diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index 3c91fc77644a..13fd2b97fb80 100644 --- a/lib/pcapng/rte_pcapng.c 
+++ b/lib/pcapng/rte_pcapng.c @@ -36,22 +36,14 @@ /* Format of the capture file handle */ struct rte_pcapng { int outfd; /* output file */ - unsigned int ports; /* number of interfaces added */ + uint64_t offset_ns; /* ns since 1/1/1970 when initialized */ + uint64_t tsc_base; /* TSC when started */ /* DPDK port id to interface index in file */ uint32_t port_index[RTE_MAX_ETHPORTS]; }; -/* For converting TSC cycles to PCAPNG ns format */ -static struct pcapng_time { - uint64_t ns; - uint64_t cycles; - uint64_t tsc_hz; - struct rte_reciprocal_u64 tsc_hz_inverse; -} pcapng_time; - - #ifdef RTE_EXEC_ENV_WINDOWS /* * Windows does not have writev() call. @@ -102,56 +94,21 @@ static ssize_t writev(int fd, const struct iovec *iov, int iovcnt) #define if_indextoname(ifindex, ifname) NULL #endif -static inline void -pcapng_init(void) +/* Convert from TSC (CPU cycles) to nanoseconds */ +static uint64_t +pcapng_timestamp(const rte_pcapng_t *self, uint64_t cycles) { - struct timespec ts; + uint64_t delta, rem, secs, ns; + const uint64_t hz = rte_get_tsc_hz(); - pcapng_time.cycles = rte_get_tsc_cycles(); - clock_gettime(CLOCK_REALTIME, &ts); - pcapng_time.cycles = (pcapng_time.cycles + rte_get_tsc_cycles()) / 2; - pcapng_time.ns = rte_timespec_to_ns(&ts); - - pcapng_time.tsc_hz = rte_get_tsc_hz(); - pcapng_time.tsc_hz_inverse = rte_reciprocal_value_u64(pcapng_time.tsc_hz); -} + delta = cycles - self->tsc_base; -/* PCAPNG timestamps are in nanoseconds */ -static uint64_t pcapng_tsc_to_ns(uint64_t cycles) -{ - uint64_t delta, secs; - - if (!pcapng_time.tsc_hz) - pcapng_init(); - - /* In essence the calculation is: - * delta = (cycles - pcapng_time.cycles) * NSEC_PRE_SEC / rte_get_tsc_hz() - * but this overflows within 4 to 8 seconds depending on TSC frequency. - * Instead, if delta >= pcapng_time.tsc_hz: - * Increase pcapng_time.ns and pcapng_time.cycles by the number of - * whole seconds in delta and reduce delta accordingly. 
- * delta will therefore always lie in the interval [0, pcapng_time.tsc_hz), - * which will not overflow when multiplied by NSEC_PER_SEC provided the - * TSC frequency < approx 18.4GHz. - * - * Currently all TSCs operate below 5GHz. - */ - delta = cycles - pcapng_time.cycles; - if (unlikely(delta >= pcapng_time.tsc_hz)) { - if (likely(delta < pcapng_time.tsc_hz * 2)) { - delta -= pcapng_time.tsc_hz; - pcapng_time.cycles += pcapng_time.tsc_hz; - pcapng_time.ns += NSEC_PER_SEC; - } else { - secs = rte_reciprocal_divide_u64(delta, &pcapng_time.tsc_hz_inverse); - delta -= secs * pcapng_time.tsc_hz; - pcapng_time.cycles += secs * pcapng_time.tsc_hz; - pcapng_time.ns += secs * NSEC_PER_SEC; - } - } + /* Avoid numeric wraparound by computing seconds first */ + secs = delta / hz; + rem = delta % hz; + ns = (rem * NS_PER_S) / hz; - return pcapng_time.ns + rte_reciprocal_divide_u64(delta * NSEC_PER_SEC, - &pcapng_time.tsc_hz_inverse); + return secs * NS_PER_S + ns + self->offset_ns; } /* length of option including padding */ @@ -368,15 +325,15 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, */ ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, - const char *comment, - uint64_t start_time, uint64_t end_time, - uint64_t ifrecv, uint64_t ifdrop) + uint64_t ifrecv, uint64_t ifdrop, + const char *comment) { struct pcapng_statistics *hdr; struct pcapng_option *opt; + uint64_t start_time = self->offset_ns; + uint64_t sample_time; uint32_t optlen, len; uint8_t *buf; - uint64_t ns; RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); @@ -386,10 +343,10 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, optlen += pcapng_optlen(sizeof(ifrecv)); if (ifdrop != UINT64_MAX) optlen += pcapng_optlen(sizeof(ifdrop)); + if (start_time != 0) optlen += pcapng_optlen(sizeof(start_time)); - if (end_time != 0) - optlen += pcapng_optlen(sizeof(end_time)); + if (comment) optlen += pcapng_optlen(strlen(comment)); if (optlen != 0) @@ -409,9 +366,6 @@ 
rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, if (start_time != 0) opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME, &start_time, sizeof(start_time)); - if (end_time != 0) - opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME, - &end_time, sizeof(end_time)); if (ifrecv != UINT64_MAX) opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV, &ifrecv, sizeof(ifrecv)); @@ -425,9 +379,9 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, hdr->block_length = len; hdr->interface_id = self->port_index[port_id]; - ns = pcapng_tsc_to_ns(rte_get_tsc_cycles()); - hdr->timestamp_hi = ns >> 32; - hdr->timestamp_lo = (uint32_t)ns; + sample_time = pcapng_timestamp(self, rte_get_tsc_cycles()); + hdr->timestamp_hi = sample_time >> 32; + hdr->timestamp_lo = (uint32_t)sample_time; /* clone block_length after option */ memcpy(opt, &len, sizeof(uint32_t)); @@ -520,23 +474,21 @@ struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue, const struct rte_mbuf *md, struct rte_mempool *mp, - uint32_t length, uint64_t cycles, + uint32_t length, enum rte_pcapng_direction direction, const char *comment) { struct pcapng_enhance_packet_block *epb; uint32_t orig_len, data_len, padding, flags; struct pcapng_option *opt; + uint64_t timestamp; uint16_t optlen; struct rte_mbuf *mc; - uint64_t ns; bool rss_hash; #ifdef RTE_LIBRTE_ETHDEV_DEBUG RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL); #endif - ns = pcapng_tsc_to_ns(cycles); - orig_len = rte_pktmbuf_pkt_len(md); /* Take snapshot of the data */ @@ -641,8 +593,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue, /* Interface index is filled in later during write */ mc->port = port_id; - epb->timestamp_hi = ns >> 32; - epb->timestamp_lo = (uint32_t)ns; + /* Put timestamp in cycles here - adjust in packet write */ + timestamp = rte_get_tsc_cycles(); + epb->timestamp_hi = timestamp >> 32; + epb->timestamp_lo = (uint32_t)timestamp; epb->capture_length = data_len; epb->original_length = orig_len; @@ -668,6 +622,7 @@ 
rte_pcapng_write_packets(rte_pcapng_t *self, for (i = 0; i < nb_pkts; i++) { struct rte_mbuf *m = pkts[i]; struct pcapng_enhance_packet_block *epb; + uint64_t cycles, timestamp; /* sanity check that is really a pcapng mbuf */ epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *); @@ -684,6 +639,13 @@ rte_pcapng_write_packets(rte_pcapng_t *self, return -1; } + /* adjust timestamp recorded in packet */ + cycles = (uint64_t)epb->timestamp_hi << 32; + cycles += epb->timestamp_lo; + timestamp = pcapng_timestamp(self, cycles); + epb->timestamp_hi = timestamp >> 32; + epb->timestamp_lo = (uint32_t)timestamp; + /* * Handle case of highly fragmented and large burst size * Note: this assumes that max segments per mbuf < IOV_MAX @@ -725,6 +687,8 @@ rte_pcapng_fdopen(int fd, { unsigned int i; rte_pcapng_t *self; + struct timespec ts; + uint64_t cycles; self = malloc(sizeof(*self)); if (!self) { @@ -734,6 +698,13 @@ rte_pcapng_fdopen(int fd, self->outfd = fd; self->ports = 0; + + /* record start time in ns since 1/1/1970 */ + cycles = rte_get_tsc_cycles(); + clock_gettime(CLOCK_REALTIME, &ts); + self->tsc_base = (cycles + rte_get_tsc_cycles()) / 2; + self->offset_ns = rte_timespec_to_ns(&ts); + for (i = 0; i < RTE_MAX_ETHPORTS; i++) self->port_index[i] = UINT32_MAX; diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h index d93cc9f73ad5..c40795c721de 100644 --- a/lib/pcapng/rte_pcapng.h +++ b/lib/pcapng/rte_pcapng.h @@ -121,8 +121,6 @@ enum rte_pcapng_direction { * @param length * The upper limit on bytes to copy. Passing UINT32_MAX * means all data (after offset). - * @param timestamp - * The timestamp in TSC cycles. * @param direction * The direction of the packer: receive, transmit or unknown. 
* @param comment @@ -136,7 +134,7 @@ __rte_experimental struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue, const struct rte_mbuf *m, struct rte_mempool *mp, - uint32_t length, uint64_t timestamp, + uint32_t length, enum rte_pcapng_direction direction, const char *comment); @@ -188,29 +186,22 @@ rte_pcapng_write_packets(rte_pcapng_t *self, * The handle to the packet capture file * @param port * The Ethernet port to report stats on. - * @param comment - * Optional comment to add to statistics. - * @param start_time - * The time when packet capture was started in nanoseconds. - * Optional: can be zero if not known. - * @param end_time - * The time when packet capture was stopped in nanoseconds. - * Optional: can be zero if not finished; * @param ifrecv * The number of packets received by capture. * Optional: use UINT64_MAX if not known. * @param ifdrop * The number of packets missed by the capture process. * Optional: use UINT64_MAX if not known. + * @param comment + * Optional comment to add to statistics. 
* @return * number of bytes written to file, -1 on failure to write file */ __rte_experimental ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port, - const char *comment, - uint64_t start_time, uint64_t end_time, - uint64_t ifrecv, uint64_t ifdrop); + uint64_t ifrecv, uint64_t ifdrop, + const char *comment); #ifdef __cplusplus } diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index e94f49e21250..5a1ec14d7a18 100644 --- a/lib/pdump/rte_pdump.c +++ b/lib/pdump/rte_pdump.c @@ -90,7 +90,6 @@ pdump_copy(uint16_t port_id, uint16_t queue, int ring_enq; uint16_t d_pkts = 0; struct rte_mbuf *dup_bufs[nb_pkts]; - uint64_t ts; struct rte_ring *ring; struct rte_mempool *mp; struct rte_mbuf *p; @@ -99,7 +98,6 @@ pdump_copy(uint16_t port_id, uint16_t queue, if (cbs->filter) rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts); - ts = rte_get_tsc_cycles(); ring = cbs->ring; mp = cbs->mp; for (i = 0; i < nb_pkts; i++) { @@ -122,7 +120,7 @@ pdump_copy(uint16_t port_id, uint16_t queue, if (cbs->ver == V2) p = rte_pcapng_copy(port_id, queue, pkts[i], mp, cbs->snaplen, - ts, direction, NULL); + direction, NULL); else p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen); -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
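The two-step conversion used by pcapng_timestamp() in this patch can be tried standalone. The sketch below follows the same arithmetic but takes the TSC base, frequency and wall-clock offset as plain parameters; the name `cycles_to_ns` is an assumption for illustration, and the TSC frequency is assumed to be below roughly 18.4 GHz so that the remainder multiplication stays within 64 bits.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/*
 * Hypothetical standalone version of the conversion: split the cycle
 * delta into whole seconds and a remainder smaller than hz, so that
 * only the remainder is multiplied by NS_PER_S.
 */
static uint64_t
cycles_to_ns(uint64_t cycles, uint64_t tsc_base, uint64_t hz,
	     uint64_t offset_ns)
{
	uint64_t delta, secs, rem;

	assert(hz != 0);

	delta = cycles - tsc_base;
	secs = delta / hz;	/* whole seconds since the base */
	rem = delta % hz;	/* leftover cycles, always < hz */

	return secs * NS_PER_S + (rem * NS_PER_S) / hz + offset_ns;
}
```

For comparison, the single-step form `delta * NS_PER_S / hz` overflows once delta exceeds about 1.8e10 cycles, which is only a few seconds at typical TSC rates. Since the function reads only its arguments and keeps no shared state, it can be called from several threads at once.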
* [PATCH v6 4/5] pcapng: avoid using alloca() 2023-11-13 16:15 ` [PATCH v6 0/5] dumpcap and pcapng fixes Stephen Hemminger ` (2 preceding siblings ...) 2023-11-13 16:15 ` [PATCH v6 3/5] pcapng: modify timestamp calculation Stephen Hemminger @ 2023-11-13 16:15 ` Stephen Hemminger 2023-11-13 16:15 ` [PATCH v6 5/5] test: cleanups to pcapng test Stephen Hemminger 4 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-13 16:15 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Morten Brørup, Reshma Pattan The function alloca(), like VLAs, has problems if the caller passes a large value. Instead, use a fixed-size buffer (2K), which is more than sufficient for the info-related blocks in the file. Add bounds checks as well. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Morten Brørup <mb@smartsharesystems.com> --- lib/pcapng/rte_pcapng.c | 37 ++++++++++++++++--------------------- 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index 13fd2b97fb80..f74ec939a9f8 100644 --- a/lib/pcapng/rte_pcapng.c +++ b/lib/pcapng/rte_pcapng.c @@ -33,6 +33,9 @@ /* conversion from DPDK speed to PCAPNG */ #define PCAPNG_MBPS_SPEED 1000000ull +/* upper bound for section, stats and interface blocks */ +#define PCAPNG_BLKSIZ 2048 + /* Format of the capture file handle */ struct rte_pcapng { int outfd; /* output file */ @@ -140,9 +143,8 @@ pcapng_section_block(rte_pcapng_t *self, { struct pcapng_section_header *hdr; struct pcapng_option *opt; - void *buf; + uint8_t buf[PCAPNG_BLKSIZ]; uint32_t len; - ssize_t cc; len = sizeof(*hdr); if (hw) @@ -158,8 +160,7 @@ pcapng_section_block(rte_pcapng_t *self, len += pcapng_optlen(0); len += sizeof(uint32_t); - buf = calloc(1, len); - if (!buf) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_section_header *)buf; @@ -193,10 +194,7 @@ pcapng_section_block(rte_pcapng_t *self, /* clone block_length after option */ memcpy(opt, 
&hdr->block_length, sizeof(uint32_t)); - cc = write(self->outfd, buf, len); - free(buf); - - return cc; + return write(self->outfd, buf, len); } /* Write an interface block for a DPDK port */ @@ -213,7 +211,7 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, struct pcapng_option *opt; const uint8_t tsresol = 9; /* nanosecond resolution */ uint32_t len; - void *buf; + uint8_t buf[PCAPNG_BLKSIZ]; char ifname_buf[IF_NAMESIZE]; char ifhw[256]; uint64_t speed = 0; @@ -267,8 +265,7 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, len += pcapng_optlen(0); len += sizeof(uint32_t); - buf = alloca(len); - if (!buf) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_interface_block *)buf; @@ -296,17 +293,16 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE, ifhw, strlen(ifhw)); if (filter) { - /* Encoding is that the first octet indicates string vs BPF */ size_t len; - char *buf; len = strlen(filter) + 1; - buf = alloca(len); - *buf = '\0'; - memcpy(buf + 1, filter, len); + opt->code = PCAPNG_IFB_FILTER; + opt->length = len; + /* Encoding is that the first octet indicates string vs BPF */ + opt->data[0] = 0; + memcpy(opt->data + 1, filter, strlen(filter)); - opt = pcapng_add_option(opt, PCAPNG_IFB_FILTER, - buf, len); + opt = (struct pcapng_option *)((uint8_t *)opt + pcapng_optlen(len)); } opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0); @@ -333,7 +329,7 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, uint64_t start_time = self->offset_ns; uint64_t sample_time; uint32_t optlen, len; - uint8_t *buf; + uint8_t buf[PCAPNG_BLKSIZ]; RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); @@ -353,8 +349,7 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, optlen += pcapng_optlen(0); len = sizeof(*hdr) + optlen + sizeof(uint32_t); - buf = alloca(len); - if (buf == NULL) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_statistics *)buf; -- 2.39.2 ^ 
permalink raw reply [flat|nested] 61+ messages in thread
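As an illustration of the approach in the patch above, here is a minimal standalone sketch (not DPDK code) of building a block in a fixed-size buffer with a length check before anything is written into it. The names BLKSIZ, struct block and build_block() are hypothetical; only the 2K bound and the check-before-fill pattern are taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLKSIZ 2048	/* fixed upper bound, same size as the patch's 2K buffer */

/* Hypothetical container, not a DPDK type: storage size is fixed at compile time */
struct block {
	uint8_t buf[BLKSIZ];
	uint32_t len;		/* bytes of buf actually used */
};

/*
 * Hypothetical helper: frame payload as
 *   [total length][payload padded to 4 bytes][total length]
 * Returns 0 on success, -1 if the block would not fit in the buffer.
 */
static int
build_block(struct block *blk, const void *payload, size_t payload_len)
{
	size_t padded, len;
	uint32_t len32;

	/* reject oversized input first so the arithmetic below cannot wrap */
	if (payload_len > BLKSIZ)
		return -1;

	padded = (payload_len + 3) & ~(size_t)3;
	len = sizeof(len32) + padded + sizeof(len32);
	if (len > sizeof(blk->buf))
		return -1;

	memset(blk->buf, 0, len);
	len32 = (uint32_t)len;
	memcpy(blk->buf, &len32, sizeof(len32));
	memcpy(blk->buf + sizeof(len32), payload, payload_len);
	/* the total length is repeated after the payload */
	memcpy(blk->buf + sizeof(len32) + padded, &len32, sizeof(len32));
	blk->len = len32;
	return 0;
}
```

Unlike alloca(len), the stack usage here does not depend on caller input, and an oversized request fails cleanly with -1 instead of risking a stack overflow.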
* [PATCH v6 5/5] test: cleanups to pcapng test 2023-11-13 16:15 ` [PATCH v6 0/5] dumpcap and pcapng fixes Stephen Hemminger ` (3 preceding siblings ...) 2023-11-13 16:15 ` [PATCH v6 4/5] pcapng: avoid using alloca() Stephen Hemminger @ 2023-11-13 16:15 ` Stephen Hemminger 4 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-13 16:15 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Reshma Pattan Overhaul of the pcapng test: - promote it to be a fast test so it gets run regularly. - create a null device and use it. - use valid UDP discard packets so that the resulting pcapng file can be inspected with Wireshark when debugging. - do basic checks on the resulting pcap file that lengths and timestamps are in range. - add a test for interface options. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/test/meson.build | 2 +- app/test/test_pcapng.c | 416 +++++++++++++++++++++++++++-------------- 2 files changed, 281 insertions(+), 137 deletions(-) diff --git a/app/test/meson.build b/app/test/meson.build index 4183d66b0e9c..dcc93f4a43b4 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -128,7 +128,7 @@ source_file_deps = { 'test_metrics.c': ['metrics'], 'test_mp_secondary.c': ['hash', 'lpm'], 'test_net_ether.c': ['net'], - 'test_pcapng.c': ['ethdev', 'net', 'pcapng'], + 'test_pcapng.c': ['ethdev', 'net', 'pcapng', 'bus_vdev'], 'test_pdcp.c': ['eventdev', 'pdcp', 'net', 'timer', 'security'], 'test_pdump.c': ['pdump'] + sample_packet_forward_deps, 'test_per_lcore.c': [], diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index 21131dfa0c5e..89535efad096 100644 --- a/app/test/test_pcapng.c +++ b/app/test/test_pcapng.c @@ -6,25 +6,34 @@ #include <stdlib.h> #include <unistd.h> +#include <rte_bus_vdev.h> #include <rte_ethdev.h> #include <rte_ether.h> +#include <rte_ip.h> #include <rte_mbuf.h> #include <rte_mempool.h> #include <rte_net.h> #include <rte_pcapng.h> +#include <rte_random.h> +#include 
<rte_reciprocal.h> +#include <rte_time.h> +#include <rte_udp.h> #include <pcap/pcap.h> #include "test.h" -#define NUM_PACKETS 10 -#define DUMMY_MBUF_NUM 3 +#define PCAPNG_TEST_DEBUG 0 + +#define TOTAL_PACKETS 4096 +#define MAX_BURST 64 +#define MAX_GAP_US 100000 +#define DUMMY_MBUF_NUM 3 -static rte_pcapng_t *pcapng; static struct rte_mempool *mp; static const uint32_t pkt_len = 200; static uint16_t port_id; -static char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; +static const char null_dev[] = "net_null0"; /* first mbuf in the packet, should always be at offset 0 */ struct dummy_mbuf { @@ -61,6 +70,7 @@ mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen) struct { struct rte_ether_hdr eth; struct rte_ipv4_hdr ip; + struct rte_udp_hdr udp; } pkt = { .eth = { .dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff", @@ -68,148 +78,200 @@ mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen) }, .ip = { .version_ihl = RTE_IPV4_VHL_DEF, - .total_length = rte_cpu_to_be_16(plen), - .time_to_live = IPDEFTTL, - .next_proto_id = IPPROTO_RAW, + .time_to_live = 1, + .next_proto_id = IPPROTO_UDP, .src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK), .dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST), - } + }, + .udp = { + .dst_port = rte_cpu_to_be_16(9), /* Discard port */ + }, }; memset(dm, 0, sizeof(*dm)); dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen); rte_eth_random_addr(pkt.eth.src_addr.addr_bytes); - memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, RTE_MIN(sizeof(pkt), plen)); + plen -= sizeof(struct rte_ether_hdr); + + pkt.ip.total_length = rte_cpu_to_be_16(plen); + pkt.ip.hdr_checksum = rte_ipv4_cksum(&pkt.ip); + + plen -= sizeof(struct rte_ipv4_hdr); + pkt.udp.src_port = rte_rand(); + pkt.udp.dgram_len = rte_cpu_to_be_16(plen); + + memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, sizeof(pkt)); } static int test_setup(void) { - int tmp_fd; - - port_id = rte_eth_find_next(0); - if (port_id >= RTE_MAX_ETHPORTS) { - fprintf(stderr, "No valid Ether port\n"); - return 
-1; - } + port_id = rte_eth_dev_count_avail(); - tmp_fd = mkstemps(file_name, strlen(".pcapng")); - if (tmp_fd == -1) { - perror("mkstemps() failure"); - return -1; - } - printf("pcapng: output file %s\n", file_name); - - /* open a test capture file */ - pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL); - if (pcapng == NULL) { - fprintf(stderr, "rte_pcapng_fdopen failed\n"); - close(tmp_fd); - return -1; - } - - /* Add interface to the file */ - if (rte_pcapng_add_interface(pcapng, port_id, - NULL, NULL, NULL) != 0) { - fprintf(stderr, "can not add port %u\n", port_id); - return -1; + /* Make a dummy null device to snoop on */ + if (rte_vdev_init(null_dev, NULL) != 0) { + fprintf(stderr, "Failed to create vdev '%s'\n", null_dev); + goto fail; } /* Make a pool for cloned packets */ - mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", IOV_MAX + NUM_PACKETS, - 0, 0, - rte_pcapng_mbuf_size(pkt_len), + mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", + MAX_BURST, 0, 0, + rte_pcapng_mbuf_size(pkt_len) + 128, SOCKET_ID_ANY, "ring_mp_sc"); if (mp == NULL) { fprintf(stderr, "Cannot create mempool\n"); - return -1; + goto fail; } + return 0; + +fail: + rte_vdev_uninit(null_dev); + rte_mempool_free(mp); + return -1; } static int -test_write_packets(void) +fill_pcapng_file(rte_pcapng_t *pcapng, unsigned int num_packets) { - struct rte_mbuf *orig; - struct rte_mbuf *clones[NUM_PACKETS] = { }; struct dummy_mbuf mbfs; - unsigned int i; + struct rte_mbuf *orig; + unsigned int burst_size; + unsigned int count; ssize_t len; /* make a dummy packet */ mbuf1_prepare(&mbfs, pkt_len); - - /* clone them */ orig = &mbfs.mb[0]; - for (i = 0; i < NUM_PACKETS; i++) { - struct rte_mbuf *mc; - mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - RTE_PCAPNG_DIRECTION_UNKNOWN, NULL); - if (mc == NULL) { - fprintf(stderr, "Cannot copy packet\n"); + for (count = 0; count < num_packets; count += burst_size) { + struct rte_mbuf *clones[MAX_BURST]; + unsigned int i; + + 
/* put 1 .. MAX_BURST packets in one write call */ + burst_size = rte_rand_max(MAX_BURST) + 1; + for (i = 0; i < burst_size; i++) { + struct rte_mbuf *mc; + + mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, + RTE_PCAPNG_DIRECTION_IN, NULL); + if (mc == NULL) { + fprintf(stderr, "Cannot copy packet\n"); + return -1; + } + clones[i] = mc; + } + + /* write it to capture file */ + len = rte_pcapng_write_packets(pcapng, clones, burst_size); + rte_pktmbuf_free_bulk(clones, burst_size); + + if (len <= 0) { + fprintf(stderr, "Write of packets failed: %s\n", + rte_strerror(rte_errno)); return -1; } - clones[i] = mc; + + /* Leave a small gap between packets to test for time wrap */ + usleep(rte_rand_max(MAX_GAP_US)); } - /* write it to capture file */ - len = rte_pcapng_write_packets(pcapng, clones, NUM_PACKETS); + return count; +} - rte_pktmbuf_free_bulk(clones, NUM_PACKETS); +static char * +fmt_time(char *buf, size_t size, uint64_t ts_ns) +{ + time_t sec; + size_t len; - if (len <= 0) { - fprintf(stderr, "Write of packets failed\n"); - return -1; - } + sec = ts_ns / NS_PER_S; + len = strftime(buf, size, "%X", localtime(&sec)); + snprintf(buf + len, size - len, ".%09lu", + (unsigned long)(ts_ns % NS_PER_S)); - return 0; + return buf; } -static int -test_write_stats(void) +/* Context for the pcap_loop callback */ +struct pkt_print_ctx { + pcap_t *pcap; + unsigned int count; + uint64_t start_ns; + uint64_t end_ns; +}; + +static void +print_packet(uint64_t ts_ns, const struct rte_ether_hdr *eh, size_t len) { - ssize_t len; + char tbuf[128], src[64], dst[64]; - /* write a statistics block */ - len = rte_pcapng_write_stats(pcapng, port_id, - UINT64_MAX, UINT64_MAX, NULL); - if (len <= 0) { - fprintf(stderr, "Write of statistics failed\n"); - return -1; - } - return 0; + fmt_time(tbuf, sizeof(tbuf), ts_ns); + rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr); + rte_ether_format_addr(src, sizeof(src), &eh->src_addr); + printf("%s: %s -> %s type %x length %zu\n", + tbuf, 
src, dst, rte_be_to_cpu_16(eh->ether_type), len); } +/* Callback from pcap_loop used to validate packets in the file */ static void -pkt_print(u_char *user, const struct pcap_pkthdr *h, - const u_char *bytes) +parse_pcap_packet(u_char *user, const struct pcap_pkthdr *h, + const u_char *bytes) { - unsigned int *countp = (unsigned int *)user; + struct pkt_print_ctx *ctx = (struct pkt_print_ctx *)user; const struct rte_ether_hdr *eh; - struct tm *tm; - char tbuf[128], src[64], dst[64]; + const struct rte_ipv4_hdr *ip; + uint64_t ns; - tm = localtime(&h->ts.tv_sec); - if (tm == NULL) { - perror("localtime"); - return; + eh = (const struct rte_ether_hdr *)bytes; + ip = (const struct rte_ipv4_hdr *)(eh + 1); + + ctx->count += 1; + + /* The pcap library is misleading in reporting timestamp. + * packet header struct gives timestamp as a timeval (ie. usec); + * but the file is open in nanonsecond mode therefore + * the timestamp is really in timespec (ie. nanoseconds). + */ + ns = h->ts.tv_sec * NS_PER_S + h->ts.tv_usec; + if (ns < ctx->start_ns || ns > ctx->end_ns) { + char tstart[128], tend[128]; + + fmt_time(tstart, sizeof(tstart), ctx->start_ns); + fmt_time(tend, sizeof(tend), ctx->end_ns); + fprintf(stderr, "Timestamp out of range [%s .. 
%s]\n", + tstart, tend); + goto error; } - if (strftime(tbuf, sizeof(tbuf), "%X", tm) == 0) { - fprintf(stderr, "strftime returned 0!\n"); - return; + if (!rte_is_broadcast_ether_addr(&eh->dst_addr)) { + fprintf(stderr, "Destination is not broadcast\n"); + goto error; } - eh = (const struct rte_ether_hdr *)bytes; - rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr); - rte_ether_format_addr(src, sizeof(src), &eh->src_addr); - printf("%s.%06lu: %s -> %s type %x length %u\n", - tbuf, (unsigned long)h->ts.tv_usec, - src, dst, rte_be_to_cpu_16(eh->ether_type), h->len); + if (rte_ipv4_cksum(ip) != 0) { + fprintf(stderr, "Bad IPv4 checksum\n"); + goto error; + } + + return; /* packet is normal */ + +error: + print_packet(ns, eh, h->len); + + /* Stop parsing at first error */ + pcap_breakloop(ctx->pcap); +} - *countp += 1; +static uint64_t +current_timestamp(void) +{ + struct timespec ts; + + clock_gettime(CLOCK_REALTIME, &ts); + return rte_timespec_to_ns(&ts); } /* @@ -218,78 +280,162 @@ pkt_print(u_char *user, const struct pcap_pkthdr *h, * but that creates an unwanted dependency. 
*/ static int -test_validate(void) +valid_pcapng_file(const char *file_name, uint64_t started, unsigned int expected) { char errbuf[PCAP_ERRBUF_SIZE]; - unsigned int count = 0; - pcap_t *pcap; + struct pkt_print_ctx ctx = { }; int ret; - pcap = pcap_open_offline(file_name, errbuf); - if (pcap == NULL) { + ctx.start_ns = started; + ctx.end_ns = current_timestamp(); + + ctx.pcap = pcap_open_offline_with_tstamp_precision(file_name, + PCAP_TSTAMP_PRECISION_NANO, + errbuf); + if (ctx.pcap == NULL) { fprintf(stderr, "pcap_open_offline('%s') failed: %s\n", file_name, errbuf); return -1; } - ret = pcap_loop(pcap, 0, pkt_print, (u_char *)&count); - if (ret == 0) - printf("Saw %u packets\n", count); - else + ret = pcap_loop(ctx.pcap, 0, parse_pcap_packet, (u_char *)&ctx); + if (ret != 0) { fprintf(stderr, "pcap_dispatch: failed: %s\n", - pcap_geterr(pcap)); - pcap_close(pcap); + pcap_geterr(ctx.pcap)); + } else if (ctx.count != expected) { + printf("Only %u packets, expected %u\n", + ctx.count, expected); + ret = -1; + } + + pcap_close(ctx.pcap); return ret; } static int -test_write_over_limit_iov_max(void) +test_add_interface(void) { - struct rte_mbuf *orig; - struct rte_mbuf *clones[IOV_MAX + NUM_PACKETS] = { }; - struct dummy_mbuf mbfs; - unsigned int i; - ssize_t len; + char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; + static rte_pcapng_t *pcapng; + int ret, tmp_fd; + uint64_t now = current_timestamp(); - /* make a dummy packet */ - mbuf1_prepare(&mbfs, pkt_len); + tmp_fd = mkstemps(file_name, strlen(".pcapng")); + if (tmp_fd == -1) { + perror("mkstemps() failure"); + goto fail; + } + printf("pcapng: output file %s\n", file_name); - /* clone them */ - orig = &mbfs.mb[0]; - for (i = 0; i < IOV_MAX + NUM_PACKETS; i++) { - struct rte_mbuf *mc; + /* open a test capture file */ + pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_addif", NULL); + if (pcapng == NULL) { + fprintf(stderr, "rte_pcapng_fdopen failed\n"); + close(tmp_fd); + goto fail; + } - mc = 
rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - RTE_PCAPNG_DIRECTION_UNKNOWN, NULL); - if (mc == NULL) { - fprintf(stderr, "Cannot copy packet\n"); - return -1; - } - clones[i] = mc; + /* Add interface to the file */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, NULL); + if (ret < 0) { + fprintf(stderr, "can not add port %u\n", port_id); + goto fail; } - /* write it to capture file */ - len = rte_pcapng_write_packets(pcapng, clones, IOV_MAX + NUM_PACKETS); + /* Add interface with ifname and ifdescr */ + ret = rte_pcapng_add_interface(pcapng, port_id, + "myeth", "Some long description", NULL); + if (ret < 0) { + fprintf(stderr, "can not add port %u with ifname\n", port_id); + goto fail; + } + + /* Add interface with filter */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, "tcp port 8080"); + if (ret < 0) { + fprintf(stderr, "can not add port %u with filter\n", port_id); + goto fail; + } - rte_pktmbuf_free_bulk(clones, IOV_MAX + NUM_PACKETS); + rte_pcapng_close(pcapng); - if (len <= 0) { - fprintf(stderr, "Write of packets failed\n"); - return -1; + ret = valid_pcapng_file(file_name, now, 0); + /* if test fails want to investigate the file */ + if (ret == 0) + unlink(file_name); + + return ret; + +fail: + rte_pcapng_close(pcapng); + return -1; +} + +static int +test_write_packets(void) +{ + char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; + static rte_pcapng_t *pcapng; + int ret, tmp_fd, count; + uint64_t now = current_timestamp(); + + tmp_fd = mkstemps(file_name, strlen(".pcapng")); + if (tmp_fd == -1) { + perror("mkstemps() failure"); + goto fail; } + printf("pcapng: output file %s\n", file_name); - return 0; + /* open a test capture file */ + pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL); + if (pcapng == NULL) { + fprintf(stderr, "rte_pcapng_fdopen failed\n"); + close(tmp_fd); + goto fail; + } + + /* Add interface to the file */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, NULL); + if 
(ret < 0) { + fprintf(stderr, "can not add port %u\n", port_id); + goto fail; + } + + count = fill_pcapng_file(pcapng, TOTAL_PACKETS); + if (count < 0) + goto fail; + + /* write a statistics block */ + ret = rte_pcapng_write_stats(pcapng, port_id, + count, 0, "end of test"); + if (ret <= 0) { + fprintf(stderr, "Write of statistics failed\n"); + goto fail; + } + + rte_pcapng_close(pcapng); + + ret = valid_pcapng_file(file_name, now, count); + /* if test fails want to investigate the file */ + if (ret == 0) + unlink(file_name); + + return ret; + +fail: + rte_pcapng_close(pcapng); + return -1; } static void test_cleanup(void) { rte_mempool_free(mp); - - if (pcapng) - rte_pcapng_close(pcapng); - + rte_vdev_uninit(null_dev); } static struct @@ -298,10 +444,8 @@ unit_test_suite test_pcapng_suite = { .teardown = test_cleanup, .suite_name = "Test Pcapng Unit Test Suite", .unit_test_cases = { + TEST_CASE(test_add_interface), TEST_CASE(test_write_packets), - TEST_CASE(test_write_stats), - TEST_CASE(test_validate), - TEST_CASE(test_write_over_limit_iov_max), TEST_CASES_END() } }; @@ -312,4 +456,4 @@ test_pcapng(void) return unit_test_suite_runner(&test_pcapng_suite); } -REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng); +REGISTER_FAST_TEST(pcapng_autotest, true, true, test_pcapng); -- 2.39.2 ^ permalink raw reply [flat|nested] 61+ messages in thread
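The timestamp check in the test patch above relies on a libpcap detail that is easy to miss: when a file is opened with PCAP_TSTAMP_PRECISION_NANO, the tv_usec field of the packet header carries nanoseconds, not microseconds. A minimal standalone sketch of that conversion and of the range check follows; the helper names ts_to_ns() and ts_in_range() are hypothetical and do not appear in the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>

#define NS_PER_S 1000000000ULL

/*
 * Hypothetical helper: convert a pcap packet timestamp to nanoseconds.
 * Assumes the file was opened in nanosecond precision, so tv_usec
 * really holds nanoseconds.
 */
static uint64_t
ts_to_ns(const struct timeval *ts)
{
	return (uint64_t)ts->tv_sec * NS_PER_S + (uint64_t)ts->tv_usec;
}

/* Hypothetical helper: true if ns lies in the closed interval [start_ns, end_ns] */
static bool
ts_in_range(uint64_t ns, uint64_t start_ns, uint64_t end_ns)
{
	return ns >= start_ns && ns <= end_ns;
}
```

In the test, the interval runs from the wall-clock time taken before the capture file is created to the time taken when validation starts, so any packet stamped outside it indicates a conversion error.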
* [PATCH v7 0/5] dumpcap and pcapng fixes 2023-09-21 4:23 [PATCH 0/4] pcapng fixes Stephen Hemminger ` (8 preceding siblings ...) 2023-11-13 16:15 ` [PATCH v6 0/5] dumpcap and pcapng fixes Stephen Hemminger @ 2023-11-17 16:35 ` Stephen Hemminger 2023-11-17 16:35 ` [PATCH v7 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger ` (5 more replies) 9 siblings, 6 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-17 16:35 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger

This series fixes issues related to timestamping. The design choice is to maximize performance in the primary process and do all the time adjustment in the secondary (dumpcap), since dumpcap needs to make system calls anyway to write the result.

This patch set moves the calculation of the adjustment into the pcapng portion that opens the output file. All details of the timestamp format are contained inside pcapng (data hiding).

v7 - no change; rebased because CI reported apply failures
v6 - make sure all steps compile
v5 - fix format of getpid in capture name
v4 - incorporate review feedback
v3 - don't use alloca() since it can have VLA-type issues

Stephen Hemminger (5):
  pdump: fix setting rte_errno on mp error
  dumpcap: allow multiple invocations
  pcapng: modify timestamp calculation
  pcapng: avoid using alloca()
  test: cleanups to pcapng test

 app/dumpcap/main.c      |  53 ++---
 app/test/meson.build    |   2 +-
 app/test/test_pcapng.c  | 417 +++++++++++++++++++++++++++-------------
 lib/graph/graph_pcap.c  |   2 +-
 lib/pcapng/rte_pcapng.c | 156 ++++++---------
 lib/pcapng/rte_pcapng.h |  19 +-
 lib/pdump/rte_pdump.c   |   9 +-
 7 files changed, 373 insertions(+), 285 deletions(-)

-- 2.42.0 ^ permalink raw reply [flat|nested] 61+ messages in thread
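For readers following the timestamp design described in the cover letter, the conversion done on the secondary side in patch 3/5 can be sketched in plain C without DPDK. The struct and function names below are hypothetical, and the frequency is passed in as a field here, whereas the patch obtains it from rte_get_tsc_hz(). The field names tsc_base and offset_ns mirror the ones the patch adds to the capture handle.

```c
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/* Hypothetical stand-in for the time base kept in the capture handle */
struct time_base {
	uint64_t tsc_base;	/* cycle counter sampled when the file was opened */
	uint64_t offset_ns;	/* wall-clock ns since 1/1/1970 at that moment */
	uint64_t hz;		/* cycle counter frequency (assumed field) */
};

/* Convert a raw cycle count to nanoseconds since the epoch */
static uint64_t
cycles_to_ns(const struct time_base *tb, uint64_t cycles)
{
	uint64_t delta = cycles - tb->tsc_base;

	/* step 1: whole seconds; step 2: scale only the sub-second remainder */
	uint64_t secs = delta / tb->hz;
	uint64_t rem = delta % tb->hz;

	/* rem < hz, so rem * NS_PER_S fits in 64 bits for hz below ~18.4 GHz */
	return secs * NS_PER_S + (rem * NS_PER_S) / tb->hz + tb->offset_ns;
}
```

The naive form delta * NS_PER_S / hz wraps once delta exceeds roughly 1.8e10 cycles, which is only a few seconds of capture at typical TSC frequencies. Splitting off the whole seconds first avoids that, and because the function only reads the time base it keeps no mutable shared state.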
* [PATCH v7 1/5] pdump: fix setting rte_errno on mp error 2023-11-17 16:35 ` [PATCH v7 0/5] dumpcap and pcapng fixes Stephen Hemminger @ 2023-11-17 16:35 ` Stephen Hemminger 2023-11-17 16:35 ` [PATCH v7 2/5] dumpcap: allow multiple invocations Stephen Hemminger ` (4 subsequent siblings) 5 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-17 16:35 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Morten Brørup, Reshma Pattan, Jianfeng Tan The response from MP server sets err_value to negative on error. The convention for rte_errno is to use a positive value on error. This makes errors like duplicate registration show up with the correct error value. Fixes: 660098d61f57 ("pdump: use generic multi-process channel") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Morten Brørup <mb@smartsharesystems.com> --- lib/pdump/rte_pdump.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index 80b90c6f7d03..e94f49e21250 100644 --- a/lib/pdump/rte_pdump.c +++ b/lib/pdump/rte_pdump.c @@ -564,9 +564,10 @@ pdump_prepare_client_request(const char *device, uint16_t queue, if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) { mp_rep = &mp_reply.msgs[0]; resp = (struct pdump_response *)mp_rep->param; - rte_errno = resp->err_value; - if (!resp->err_value) + if (resp->err_value == 0) ret = 0; + else + rte_errno = -resp->err_value; free(mp_reply.msgs); } -- 2.42.0 ^ permalink raw reply [flat|nested] 61+ messages in thread
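The sign convention fixed by the patch above can be shown with a small standalone sketch. This is not the pdump code: last_errno stands in for rte_errno, and struct reply and check_reply() are hypothetical names. It assumes, as the commit message states, that the server reports failure as a negative errno value.

```c
#include <errno.h>

/* Stand-in for rte_errno, which in DPDK is a per-lcore variable */
static int last_errno;

/* Hypothetical reply: err_value is 0 on success, negative errno on failure */
struct reply {
	int err_value;
};

/* Returns 0 on success; on failure returns -1 and sets last_errno */
static int
check_reply(const struct reply *resp)
{
	if (resp->err_value == 0)
		return 0;

	/* flip the sign so callers see a positive errno value */
	last_errno = -resp->err_value;
	return -1;
}
```

With the positive value stored, a caller can pass it straight to strerror() or rte_strerror() and get the expected text for a duplicate registration (EEXIST). Note that the sketch, like the patch, leaves the error variable untouched on success.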
* [PATCH v7 2/5] dumpcap: allow multiple invocations 2023-11-17 16:35 ` [PATCH v7 0/5] dumpcap and pcapng fixes Stephen Hemminger 2023-11-17 16:35 ` [PATCH v7 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger @ 2023-11-17 16:35 ` Stephen Hemminger 2023-11-17 16:35 ` [PATCH v7 3/5] pcapng: modify timestamp calculation Stephen Hemminger ` (3 subsequent siblings) 5 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-17 16:35 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Isaac Boukris, Reshma Pattan If dumpcap is run twice, with each instance pointing at a different interface, it fails because the ring and pool names overlap. Fix by putting the process id in the name. Multiple invocations on the same interface are still not allowed, because only one callback is allowed and only one copy of the mbuf is made. Dumpcap will fail with an error in this case: pdump_prepare_client_request(): client request for pdump enable/disable failed EAL: Error - exiting with code: 1 Cause: Packet dump enable on 0:net_null0 failed File exists Fixes: cbb44143be74 ("app/dumpcap: add new packet capture application") Reported-by: Isaac Boukris <iboukris@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/dumpcap/main.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c index 4f581bd341d8..d05dddac0071 100644 --- a/app/dumpcap/main.c +++ b/app/dumpcap/main.c @@ -44,7 +44,6 @@ #include <pcap/pcap.h> #include <pcap/bpf.h> -#define RING_NAME "capture-ring" #define MONITOR_INTERVAL (500 * 1000) #define MBUF_POOL_CACHE_SIZE 32 #define BURST_SIZE 32 @@ -647,6 +646,7 @@ static void dpdk_init(void) static struct rte_ring *create_ring(void) { struct rte_ring *ring; + char ring_name[RTE_RING_NAMESIZE]; size_t size, log2; /* Find next power of 2 >= size. 
*/ @@ -660,28 +660,28 @@ static struct rte_ring *create_ring(void) ring_size = size; } - ring = rte_ring_lookup(RING_NAME); - if (ring == NULL) { - ring = rte_ring_create(RING_NAME, ring_size, - rte_socket_id(), 0); - if (ring == NULL) - rte_exit(EXIT_FAILURE, "Could not create ring :%s\n", - rte_strerror(rte_errno)); - } + /* Want one ring per invocation of program */ + snprintf(ring_name, sizeof(ring_name), + "dumpcap-%d", getpid()); + + ring = rte_ring_create(ring_name, ring_size, + rte_socket_id(), 0); + if (ring == NULL) + rte_exit(EXIT_FAILURE, "Could not create ring :%s\n", + rte_strerror(rte_errno)); + return ring; } static struct rte_mempool *create_mempool(void) { const struct interface *intf; - static const char pool_name[] = "capture_mbufs"; + char pool_name[RTE_MEMPOOL_NAMESIZE]; size_t num_mbufs = 2 * ring_size; struct rte_mempool *mp; uint32_t data_size = 128; - mp = rte_mempool_lookup(pool_name); - if (mp) - return mp; + snprintf(pool_name, sizeof(pool_name), "capture_%d", getpid()); /* Common pool so size mbuf for biggest snap length */ TAILQ_FOREACH(intf, &interfaces, next) { @@ -826,7 +826,7 @@ static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp) rte_exit(EXIT_FAILURE, "Packet dump enable on %u:%s failed %s\n", intf->port, intf->name, - rte_strerror(-ret)); + rte_strerror(rte_errno)); } if (intf->opts.promisc_mode) { -- 2.42.0 ^ permalink raw reply [flat|nested] 61+ messages in thread
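The naming scheme in the patch above, one ring and one pool per process, comes down to formatting the process id into a bounded name buffer. A minimal standalone sketch follows; make_ring_name() is a hypothetical helper, and the truncation check is an addition for illustration that the patch itself does not perform.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: build a per-process object name such as
 * "dumpcap-1234". Returns 0 on success, -1 if the name was truncated
 * or could not be formatted.
 */
static int
make_ring_name(char *buf, size_t size, pid_t pid)
{
	int n = snprintf(buf, size, "dumpcap-%d", (int)pid);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

A caller would pass getpid() as the last argument. Because each running process has a distinct pid, two dumpcap instances no longer collide on the shared object name, which is what previously made the second instance reuse or fail on the first instance's ring and pool.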
* [PATCH v7 3/5] pcapng: modify timestamp calculation 2023-11-17 16:35 ` [PATCH v7 0/5] dumpcap and pcapng fixes Stephen Hemminger 2023-11-17 16:35 ` [PATCH v7 1/5] pdump: fix setting rte_errno on mp error Stephen Hemminger 2023-11-17 16:35 ` [PATCH v7 2/5] dumpcap: allow multiple invocations Stephen Hemminger @ 2023-11-17 16:35 ` Stephen Hemminger 2023-11-17 16:35 ` [PATCH v7 4/5] pcapng: avoid using alloca() Stephen Hemminger ` (2 subsequent siblings) 5 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-17 16:35 UTC (permalink / raw) To: dev Cc: Stephen Hemminger, Morten Brørup, Reshma Pattan, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan, Quentin Armitage The computation of the timestamp is best done in the part of the pcapng library that runs in the secondary process. The secondary process already makes a number of system calls, so it is not performance sensitive. This changes the experimental rte_pcapng_copy() and rte_pcapng_write_stats() APIs. Simplify the computation of nanoseconds from TSC to a two-step process that avoids numeric overflow. The previous code was also not thread safe. 
Fixes: c882eb544842 ("pcapng: fix timestamp wrapping in output files") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Morten Brørup <mb@smartsharesystems.com> --- app/dumpcap/main.c | 25 +++------ app/test/test_pcapng.c | 7 +-- lib/graph/graph_pcap.c | 2 +- lib/pcapng/rte_pcapng.c | 119 +++++++++++++++------------------------- lib/pcapng/rte_pcapng.h | 19 ++----- lib/pdump/rte_pdump.c | 4 +- 6 files changed, 62 insertions(+), 114 deletions(-) diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c index d05dddac0071..fc28e2d7027a 100644 --- a/app/dumpcap/main.c +++ b/app/dumpcap/main.c @@ -66,13 +66,13 @@ static bool print_stats; /* capture limit options */ static struct { - uint64_t duration; /* nanoseconds */ + time_t duration; /* seconds */ unsigned long packets; /* number of packets in file */ size_t size; /* file size (bytes) */ } stop; /* Running state */ -static uint64_t start_time, end_time; +static time_t start_time; static uint64_t packets_received; static size_t file_size; @@ -197,7 +197,7 @@ static void auto_stop(char *opt) if (*value == '\0' || *endp != '\0' || interval <= 0) rte_exit(EXIT_FAILURE, "Invalid duration \"%s\"\n", value); - stop.duration = NSEC_PER_SEC * interval; + stop.duration = interval; } else if (strcmp(opt, "filesize") == 0) { stop.size = get_uint(value, "filesize", 0) * 1024; } else if (strcmp(opt, "packets") == 0) { @@ -511,15 +511,6 @@ static void statistics_loop(void) } } -/* Return the time since 1/1/1970 in nanoseconds */ -static uint64_t create_timestamp(void) -{ - struct timespec now; - - clock_gettime(CLOCK_MONOTONIC, &now); - return rte_timespec_to_ns(&now); -} - static void cleanup_pdump_resources(void) { @@ -589,9 +580,8 @@ report_packet_stats(dumpcap_out_t out) ifdrop = pdump_stats.nombuf + pdump_stats.ringfull; if (use_pcapng) - rte_pcapng_write_stats(out.pcapng, intf->port, NULL, - start_time, end_time, - ifrecv, ifdrop); + rte_pcapng_write_stats(out.pcapng, intf->port, + ifrecv, ifdrop, NULL); 
if (ifrecv == 0) percent = 0; @@ -983,7 +973,7 @@ int main(int argc, char **argv) mp = create_mempool(); out = create_output(); - start_time = create_timestamp(); + start_time = time(NULL); enable_pdump(r, mp); if (!quiet) { @@ -1005,11 +995,10 @@ int main(int argc, char **argv) break; if (stop.duration != 0 && - create_timestamp() - start_time > stop.duration) + time(NULL) - start_time > stop.duration) break; } - end_time = create_timestamp(); disable_primary_monitor(); if (rte_eal_primary_proc_alive(NULL)) diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index b8429a02f160..21131dfa0c5e 100644 --- a/app/test/test_pcapng.c +++ b/app/test/test_pcapng.c @@ -146,7 +146,7 @@ test_write_packets(void) struct rte_mbuf *mc; mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - rte_get_tsc_cycles(), 0, NULL); + RTE_PCAPNG_DIRECTION_UNKNOWN, NULL); if (mc == NULL) { fprintf(stderr, "Cannot copy packet\n"); return -1; @@ -174,8 +174,7 @@ test_write_stats(void) /* write a statistics block */ len = rte_pcapng_write_stats(pcapng, port_id, - NULL, 0, 0, - NUM_PACKETS, 0); + UINT64_MAX, UINT64_MAX, NULL); if (len <= 0) { fprintf(stderr, "Write of statistics failed\n"); return -1; @@ -262,7 +261,7 @@ test_write_over_limit_iov_max(void) struct rte_mbuf *mc; mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - rte_get_tsc_cycles(), 0, NULL); + RTE_PCAPNG_DIRECTION_UNKNOWN, NULL); if (mc == NULL) { fprintf(stderr, "Cannot copy packet\n"); return -1; diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c index db722c375fa7..89525f1220ca 100644 --- a/lib/graph/graph_pcap.c +++ b/lib/graph/graph_pcap.c @@ -214,7 +214,7 @@ graph_pcap_dispatch(struct rte_graph *graph, mbuf = (struct rte_mbuf *)objs[i]; mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len, - rte_get_tsc_cycles(), 0, buffer); + 0, buffer); if (mc == NULL) break; diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index 3c91fc77644a..13fd2b97fb80 100644 --- a/lib/pcapng/rte_pcapng.c 
+++ b/lib/pcapng/rte_pcapng.c @@ -36,22 +36,14 @@ /* Format of the capture file handle */ struct rte_pcapng { int outfd; /* output file */ - unsigned int ports; /* number of interfaces added */ + uint64_t offset_ns; /* ns since 1/1/1970 when initialized */ + uint64_t tsc_base; /* TSC when started */ /* DPDK port id to interface index in file */ uint32_t port_index[RTE_MAX_ETHPORTS]; }; -/* For converting TSC cycles to PCAPNG ns format */ -static struct pcapng_time { - uint64_t ns; - uint64_t cycles; - uint64_t tsc_hz; - struct rte_reciprocal_u64 tsc_hz_inverse; -} pcapng_time; - - #ifdef RTE_EXEC_ENV_WINDOWS /* * Windows does not have writev() call. @@ -102,56 +94,21 @@ static ssize_t writev(int fd, const struct iovec *iov, int iovcnt) #define if_indextoname(ifindex, ifname) NULL #endif -static inline void -pcapng_init(void) +/* Convert from TSC (CPU cycles) to nanoseconds */ +static uint64_t +pcapng_timestamp(const rte_pcapng_t *self, uint64_t cycles) { - struct timespec ts; + uint64_t delta, rem, secs, ns; + const uint64_t hz = rte_get_tsc_hz(); - pcapng_time.cycles = rte_get_tsc_cycles(); - clock_gettime(CLOCK_REALTIME, &ts); - pcapng_time.cycles = (pcapng_time.cycles + rte_get_tsc_cycles()) / 2; - pcapng_time.ns = rte_timespec_to_ns(&ts); - - pcapng_time.tsc_hz = rte_get_tsc_hz(); - pcapng_time.tsc_hz_inverse = rte_reciprocal_value_u64(pcapng_time.tsc_hz); -} + delta = cycles - self->tsc_base; -/* PCAPNG timestamps are in nanoseconds */ -static uint64_t pcapng_tsc_to_ns(uint64_t cycles) -{ - uint64_t delta, secs; - - if (!pcapng_time.tsc_hz) - pcapng_init(); - - /* In essence the calculation is: - * delta = (cycles - pcapng_time.cycles) * NSEC_PRE_SEC / rte_get_tsc_hz() - * but this overflows within 4 to 8 seconds depending on TSC frequency. - * Instead, if delta >= pcapng_time.tsc_hz: - * Increase pcapng_time.ns and pcapng_time.cycles by the number of - * whole seconds in delta and reduce delta accordingly. 
- * delta will therefore always lie in the interval [0, pcapng_time.tsc_hz), - * which will not overflow when multiplied by NSEC_PER_SEC provided the - * TSC frequency < approx 18.4GHz. - * - * Currently all TSCs operate below 5GHz. - */ - delta = cycles - pcapng_time.cycles; - if (unlikely(delta >= pcapng_time.tsc_hz)) { - if (likely(delta < pcapng_time.tsc_hz * 2)) { - delta -= pcapng_time.tsc_hz; - pcapng_time.cycles += pcapng_time.tsc_hz; - pcapng_time.ns += NSEC_PER_SEC; - } else { - secs = rte_reciprocal_divide_u64(delta, &pcapng_time.tsc_hz_inverse); - delta -= secs * pcapng_time.tsc_hz; - pcapng_time.cycles += secs * pcapng_time.tsc_hz; - pcapng_time.ns += secs * NSEC_PER_SEC; - } - } + /* Avoid numeric wraparound by computing seconds first */ + secs = delta / hz; + rem = delta % hz; + ns = (rem * NS_PER_S) / hz; - return pcapng_time.ns + rte_reciprocal_divide_u64(delta * NSEC_PER_SEC, - &pcapng_time.tsc_hz_inverse); + return secs * NS_PER_S + ns + self->offset_ns; } /* length of option including padding */ @@ -368,15 +325,15 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, */ ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, - const char *comment, - uint64_t start_time, uint64_t end_time, - uint64_t ifrecv, uint64_t ifdrop) + uint64_t ifrecv, uint64_t ifdrop, + const char *comment) { struct pcapng_statistics *hdr; struct pcapng_option *opt; + uint64_t start_time = self->offset_ns; + uint64_t sample_time; uint32_t optlen, len; uint8_t *buf; - uint64_t ns; RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); @@ -386,10 +343,10 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, optlen += pcapng_optlen(sizeof(ifrecv)); if (ifdrop != UINT64_MAX) optlen += pcapng_optlen(sizeof(ifdrop)); + if (start_time != 0) optlen += pcapng_optlen(sizeof(start_time)); - if (end_time != 0) - optlen += pcapng_optlen(sizeof(end_time)); + if (comment) optlen += pcapng_optlen(strlen(comment)); if (optlen != 0) @@ -409,9 +366,6 @@ 
rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, if (start_time != 0) opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME, &start_time, sizeof(start_time)); - if (end_time != 0) - opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME, - &end_time, sizeof(end_time)); if (ifrecv != UINT64_MAX) opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV, &ifrecv, sizeof(ifrecv)); @@ -425,9 +379,9 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, hdr->block_length = len; hdr->interface_id = self->port_index[port_id]; - ns = pcapng_tsc_to_ns(rte_get_tsc_cycles()); - hdr->timestamp_hi = ns >> 32; - hdr->timestamp_lo = (uint32_t)ns; + sample_time = pcapng_timestamp(self, rte_get_tsc_cycles()); + hdr->timestamp_hi = sample_time >> 32; + hdr->timestamp_lo = (uint32_t)sample_time; /* clone block_length after option */ memcpy(opt, &len, sizeof(uint32_t)); @@ -520,23 +474,21 @@ struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue, const struct rte_mbuf *md, struct rte_mempool *mp, - uint32_t length, uint64_t cycles, + uint32_t length, enum rte_pcapng_direction direction, const char *comment) { struct pcapng_enhance_packet_block *epb; uint32_t orig_len, data_len, padding, flags; struct pcapng_option *opt; + uint64_t timestamp; uint16_t optlen; struct rte_mbuf *mc; - uint64_t ns; bool rss_hash; #ifdef RTE_LIBRTE_ETHDEV_DEBUG RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL); #endif - ns = pcapng_tsc_to_ns(cycles); - orig_len = rte_pktmbuf_pkt_len(md); /* Take snapshot of the data */ @@ -641,8 +593,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue, /* Interface index is filled in later during write */ mc->port = port_id; - epb->timestamp_hi = ns >> 32; - epb->timestamp_lo = (uint32_t)ns; + /* Put timestamp in cycles here - adjust in packet write */ + timestamp = rte_get_tsc_cycles(); + epb->timestamp_hi = timestamp >> 32; + epb->timestamp_lo = (uint32_t)timestamp; epb->capture_length = data_len; epb->original_length = orig_len; @@ -668,6 +622,7 @@ 
rte_pcapng_write_packets(rte_pcapng_t *self, for (i = 0; i < nb_pkts; i++) { struct rte_mbuf *m = pkts[i]; struct pcapng_enhance_packet_block *epb; + uint64_t cycles, timestamp; /* sanity check that is really a pcapng mbuf */ epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *); @@ -684,6 +639,13 @@ rte_pcapng_write_packets(rte_pcapng_t *self, return -1; } + /* adjust timestamp recorded in packet */ + cycles = (uint64_t)epb->timestamp_hi << 32; + cycles += epb->timestamp_lo; + timestamp = pcapng_timestamp(self, cycles); + epb->timestamp_hi = timestamp >> 32; + epb->timestamp_lo = (uint32_t)timestamp; + /* * Handle case of highly fragmented and large burst size * Note: this assumes that max segments per mbuf < IOV_MAX @@ -725,6 +687,8 @@ rte_pcapng_fdopen(int fd, { unsigned int i; rte_pcapng_t *self; + struct timespec ts; + uint64_t cycles; self = malloc(sizeof(*self)); if (!self) { @@ -734,6 +698,13 @@ rte_pcapng_fdopen(int fd, self->outfd = fd; self->ports = 0; + + /* record start time in ns since 1/1/1970 */ + cycles = rte_get_tsc_cycles(); + clock_gettime(CLOCK_REALTIME, &ts); + self->tsc_base = (cycles + rte_get_tsc_cycles()) / 2; + self->offset_ns = rte_timespec_to_ns(&ts); + for (i = 0; i < RTE_MAX_ETHPORTS; i++) self->port_index[i] = UINT32_MAX; diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h index 03d658aab209..48f2b5756430 100644 --- a/lib/pcapng/rte_pcapng.h +++ b/lib/pcapng/rte_pcapng.h @@ -114,8 +114,6 @@ enum rte_pcapng_direction { * @param length * The upper limit on bytes to copy. Passing UINT32_MAX * means all data (after offset). - * @param timestamp - * The timestamp in TSC cycles. * @param direction * The direction of the packer: receive, transmit or unknown. 
* @param comment @@ -128,7 +126,7 @@ enum rte_pcapng_direction { struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue, const struct rte_mbuf *m, struct rte_mempool *mp, - uint32_t length, uint64_t timestamp, + uint32_t length, enum rte_pcapng_direction direction, const char *comment); @@ -178,28 +176,21 @@ rte_pcapng_write_packets(rte_pcapng_t *self, * The handle to the packet capture file * @param port * The Ethernet port to report stats on. - * @param comment - * Optional comment to add to statistics. - * @param start_time - * The time when packet capture was started in nanoseconds. - * Optional: can be zero if not known. - * @param end_time - * The time when packet capture was stopped in nanoseconds. - * Optional: can be zero if not finished; * @param ifrecv * The number of packets received by capture. * Optional: use UINT64_MAX if not known. * @param ifdrop * The number of packets missed by the capture process. * Optional: use UINT64_MAX if not known. + * @param comment + * Optional comment to add to statistics. 
* @return * number of bytes written to file, -1 on failure to write file */ ssize_t rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port, - const char *comment, - uint64_t start_time, uint64_t end_time, - uint64_t ifrecv, uint64_t ifdrop); + uint64_t ifrecv, uint64_t ifdrop, + const char *comment); #ifdef __cplusplus } diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index e94f49e21250..5a1ec14d7a18 100644 --- a/lib/pdump/rte_pdump.c +++ b/lib/pdump/rte_pdump.c @@ -90,7 +90,6 @@ pdump_copy(uint16_t port_id, uint16_t queue, int ring_enq; uint16_t d_pkts = 0; struct rte_mbuf *dup_bufs[nb_pkts]; - uint64_t ts; struct rte_ring *ring; struct rte_mempool *mp; struct rte_mbuf *p; @@ -99,7 +98,6 @@ pdump_copy(uint16_t port_id, uint16_t queue, if (cbs->filter) rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts); - ts = rte_get_tsc_cycles(); ring = cbs->ring; mp = cbs->mp; for (i = 0; i < nb_pkts; i++) { @@ -122,7 +120,7 @@ pdump_copy(uint16_t port_id, uint16_t queue, if (cbs->ver == V2) p = rte_pcapng_copy(port_id, queue, pkts[i], mp, cbs->snaplen, - ts, direction, NULL); + direction, NULL); else p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen); -- 2.42.0 ^ permalink raw reply [flat|nested] 61+ messages in thread
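A note on the arithmetic in pcapng_timestamp() above: multiplying the raw cycle delta by NS_PER_S overflows 64 bits after a few seconds of capture, so the patch divides out whole seconds first and scales only the sub-second remainder. A minimal standalone sketch of that idea follows. The struct clock_base type and the cycles_to_ns() name are hypothetical stand-ins for the rte_pcapng handle, and the frequency is passed in rather than read from rte_get_tsc_hz().

```c
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/* Hypothetical stand-in for the clock state kept in the capture handle. */
struct clock_base {
	uint64_t tsc_base;  /* cycle counter sampled when the capture opened */
	uint64_t offset_ns; /* wall-clock ns since the epoch at that moment */
	uint64_t hz;        /* cycle counter frequency in cycles per second */
};

/* Convert an absolute cycle count to nanoseconds since the epoch. */
static uint64_t
cycles_to_ns(const struct clock_base *cb, uint64_t cycles)
{
	uint64_t delta = cycles - cb->tsc_base;
	uint64_t secs = delta / cb->hz; /* whole seconds elapsed */
	uint64_t rem = delta % cb->hz;  /* leftover cycles, always < hz */

	/* rem * NS_PER_S stays below 2^64 while hz is under about 18.4 GHz */
	return secs * NS_PER_S + (rem * NS_PER_S) / cb->hz + cb->offset_ns;
}
```

With a 3 GHz counter, a delta of 300 billion cycles (100 seconds) would overflow if multiplied by NS_PER_S directly. In the split form the remainder is zero and the result is exact.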
* [PATCH v7 4/5] pcapng: avoid using alloca() 2023-11-17 16:35 ` [PATCH v7 0/5] dumpcap and pcapng fixes Stephen Hemminger ` (2 preceding siblings ...) 2023-11-17 16:35 ` [PATCH v7 3/5] pcapng: modify timestamp calculation Stephen Hemminger @ 2023-11-17 16:35 ` Stephen Hemminger 2023-11-17 16:35 ` [PATCH v7 5/5] test: cleanups to pcapng test Stephen Hemminger 2023-11-22 22:42 ` [PATCH v7 0/5] dumpcap and pcapng fixes Thomas Monjalon 5 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-17 16:35 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Morten Brørup, Reshma Pattan The function alloca() like VLA's has problems if the caller passes a large value. Instead use a fixed size buffer (2K) which will be more than sufficient for the info related blocks in the file. Add bounds checks as well. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Morten Brørup <mb@smartsharesystems.com> --- lib/pcapng/rte_pcapng.c | 37 ++++++++++++++++--------------------- 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index 13fd2b97fb80..f74ec939a9f8 100644 --- a/lib/pcapng/rte_pcapng.c +++ b/lib/pcapng/rte_pcapng.c @@ -33,6 +33,9 @@ /* conversion from DPDK speed to PCAPNG */ #define PCAPNG_MBPS_SPEED 1000000ull +/* upper bound for section, stats and interface blocks */ +#define PCAPNG_BLKSIZ 2048 + /* Format of the capture file handle */ struct rte_pcapng { int outfd; /* output file */ @@ -140,9 +143,8 @@ pcapng_section_block(rte_pcapng_t *self, { struct pcapng_section_header *hdr; struct pcapng_option *opt; - void *buf; + uint8_t buf[PCAPNG_BLKSIZ]; uint32_t len; - ssize_t cc; len = sizeof(*hdr); if (hw) @@ -158,8 +160,7 @@ pcapng_section_block(rte_pcapng_t *self, len += pcapng_optlen(0); len += sizeof(uint32_t); - buf = calloc(1, len); - if (!buf) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_section_header *)buf; @@ -193,10 +194,7 @@ 
pcapng_section_block(rte_pcapng_t *self, /* clone block_length after option */ memcpy(opt, &hdr->block_length, sizeof(uint32_t)); - cc = write(self->outfd, buf, len); - free(buf); - - return cc; + return write(self->outfd, buf, len); } /* Write an interface block for a DPDK port */ @@ -213,7 +211,7 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, struct pcapng_option *opt; const uint8_t tsresol = 9; /* nanosecond resolution */ uint32_t len; - void *buf; + uint8_t buf[PCAPNG_BLKSIZ]; char ifname_buf[IF_NAMESIZE]; char ifhw[256]; uint64_t speed = 0; @@ -267,8 +265,7 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, len += pcapng_optlen(0); len += sizeof(uint32_t); - buf = alloca(len); - if (!buf) + if (len > sizeof(buf)) return -1; hdr = (struct pcapng_interface_block *)buf; @@ -296,17 +293,16 @@ rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port, opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE, ifhw, strlen(ifhw)); if (filter) { - /* Encoding is that the first octet indicates string vs BPF */ size_t len; - char *buf; len = strlen(filter) + 1; - buf = alloca(len); - *buf = '\0'; - memcpy(buf + 1, filter, len); + opt->code = PCAPNG_IFB_FILTER; + opt->length = len; + /* Encoding is that the first octet indicates string vs BPF */ + opt->data[0] = 0; + memcpy(opt->data + 1, filter, strlen(filter)); - opt = pcapng_add_option(opt, PCAPNG_IFB_FILTER, - buf, len); + opt = (struct pcapng_option *)((uint8_t *)opt + pcapng_optlen(len)); } opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0); @@ -333,7 +329,7 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, uint64_t start_time = self->offset_ns; uint64_t sample_time; uint32_t optlen, len; - uint8_t *buf; + uint8_t buf[PCAPNG_BLKSIZ]; RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); @@ -353,8 +349,7 @@ rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id, optlen += pcapng_optlen(0); len = sizeof(*hdr) + optlen + sizeof(uint32_t); - buf = alloca(len); - if (buf == NULL) 
+ if (len > sizeof(buf)) return -1; hdr = (struct pcapng_statistics *)buf; -- 2.42.0 ^ permalink raw reply [flat|nested] 61+ messages in thread
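The pattern this patch applies is: compute the full encoded length first, reject it if it exceeds a fixed buffer, and only then write. A minimal sketch of that pattern for a single padded option is shown below. The struct opt_hdr layout and the padded_optlen() and encode_option() helpers are hypothetical illustrations, not the library's internal pcapng_add_option().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLKSIZ 2048 /* same role as PCAPNG_BLKSIZ in the patch */

/* Hypothetical option header using a code/length/value layout. */
struct opt_hdr {
	uint16_t code;
	uint16_t length;
};

/* Size of one option: header plus data padded to a 32-bit boundary. */
static size_t
padded_optlen(size_t data_len)
{
	return sizeof(struct opt_hdr) + ((data_len + 3) & ~(size_t)3);
}

/* Encode one string option; returns bytes used, or -1 if it does not fit. */
static long
encode_option(uint8_t *buf, size_t bufsize, uint16_t code, const char *text)
{
	size_t data_len = strlen(text);
	size_t need = padded_optlen(data_len);
	struct opt_hdr hdr;

	/* Bounds check happens before any byte is written. */
	if (data_len > UINT16_MAX || need > bufsize)
		return -1;

	memset(buf, 0, need); /* also zeroes the padding bytes */
	hdr.code = code;
	hdr.length = (uint16_t)data_len;
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), text, data_len);
	return (long)need;
}
```

A caller would declare `uint8_t buf[BLKSIZ];` on the stack and treat a -1 return as an oversized option, which avoids both alloca() and a heap allocation on this path.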
* [PATCH v7 5/5] test: cleanups to pcapng test 2023-11-17 16:35 ` [PATCH v7 0/5] dumpcap and pcapng fixes Stephen Hemminger ` (3 preceding siblings ...) 2023-11-17 16:35 ` [PATCH v7 4/5] pcapng: avoid using alloca() Stephen Hemminger @ 2023-11-17 16:35 ` Stephen Hemminger 2023-11-22 22:42 ` [PATCH v7 0/5] dumpcap and pcapng fixes Thomas Monjalon 5 siblings, 0 replies; 61+ messages in thread From: Stephen Hemminger @ 2023-11-17 16:35 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Reshma Pattan Overhaul of the pcapng test: - promote it to be a fast test so it gets regularly run. - create null device and use it. - use UDP discard packets that are valid so that for debugging the resulting pcapng file can be looked at with Wireshark. - do basic checks on resulting pcap file that lengths and timestamps are in range. - add test for interface options Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- app/test/meson.build | 2 +- app/test/test_pcapng.c | 416 +++++++++++++++++++++++++++-------------- 2 files changed, 281 insertions(+), 137 deletions(-) diff --git a/app/test/meson.build b/app/test/meson.build index 4183d66b0e9c..dcc93f4a43b4 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -128,7 +128,7 @@ source_file_deps = { 'test_metrics.c': ['metrics'], 'test_mp_secondary.c': ['hash', 'lpm'], 'test_net_ether.c': ['net'], - 'test_pcapng.c': ['ethdev', 'net', 'pcapng'], + 'test_pcapng.c': ['ethdev', 'net', 'pcapng', 'bus_vdev'], 'test_pdcp.c': ['eventdev', 'pdcp', 'net', 'timer', 'security'], 'test_pdump.c': ['pdump'] + sample_packet_forward_deps, 'test_per_lcore.c': [], diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index 21131dfa0c5e..89535efad096 100644 --- a/app/test/test_pcapng.c +++ b/app/test/test_pcapng.c @@ -6,25 +6,34 @@ #include <stdlib.h> #include <unistd.h> +#include <rte_bus_vdev.h> #include <rte_ethdev.h> #include <rte_ether.h> +#include <rte_ip.h> #include <rte_mbuf.h> #include <rte_mempool.h> #include
<rte_net.h> #include <rte_pcapng.h> +#include <rte_random.h> +#include <rte_reciprocal.h> +#include <rte_time.h> +#include <rte_udp.h> #include <pcap/pcap.h> #include "test.h" -#define NUM_PACKETS 10 -#define DUMMY_MBUF_NUM 3 +#define PCAPNG_TEST_DEBUG 0 + +#define TOTAL_PACKETS 4096 +#define MAX_BURST 64 +#define MAX_GAP_US 100000 +#define DUMMY_MBUF_NUM 3 -static rte_pcapng_t *pcapng; static struct rte_mempool *mp; static const uint32_t pkt_len = 200; static uint16_t port_id; -static char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; +static const char null_dev[] = "net_null0"; /* first mbuf in the packet, should always be at offset 0 */ struct dummy_mbuf { @@ -61,6 +70,7 @@ mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen) struct { struct rte_ether_hdr eth; struct rte_ipv4_hdr ip; + struct rte_udp_hdr udp; } pkt = { .eth = { .dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff", @@ -68,148 +78,200 @@ mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen) }, .ip = { .version_ihl = RTE_IPV4_VHL_DEF, - .total_length = rte_cpu_to_be_16(plen), - .time_to_live = IPDEFTTL, - .next_proto_id = IPPROTO_RAW, + .time_to_live = 1, + .next_proto_id = IPPROTO_UDP, .src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK), .dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST), - } + }, + .udp = { + .dst_port = rte_cpu_to_be_16(9), /* Discard port */ + }, }; memset(dm, 0, sizeof(*dm)); dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen); rte_eth_random_addr(pkt.eth.src_addr.addr_bytes); - memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, RTE_MIN(sizeof(pkt), plen)); + plen -= sizeof(struct rte_ether_hdr); + + pkt.ip.total_length = rte_cpu_to_be_16(plen); + pkt.ip.hdr_checksum = rte_ipv4_cksum(&pkt.ip); + + plen -= sizeof(struct rte_ipv4_hdr); + pkt.udp.src_port = rte_rand(); + pkt.udp.dgram_len = rte_cpu_to_be_16(plen); + + memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, sizeof(pkt)); } static int test_setup(void) { - int tmp_fd; - - port_id = rte_eth_find_next(0); - if (port_id >= 
RTE_MAX_ETHPORTS) { - fprintf(stderr, "No valid Ether port\n"); - return -1; - } + port_id = rte_eth_dev_count_avail(); - tmp_fd = mkstemps(file_name, strlen(".pcapng")); - if (tmp_fd == -1) { - perror("mkstemps() failure"); - return -1; - } - printf("pcapng: output file %s\n", file_name); - - /* open a test capture file */ - pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL); - if (pcapng == NULL) { - fprintf(stderr, "rte_pcapng_fdopen failed\n"); - close(tmp_fd); - return -1; - } - - /* Add interface to the file */ - if (rte_pcapng_add_interface(pcapng, port_id, - NULL, NULL, NULL) != 0) { - fprintf(stderr, "can not add port %u\n", port_id); - return -1; + /* Make a dummy null device to snoop on */ + if (rte_vdev_init(null_dev, NULL) != 0) { + fprintf(stderr, "Failed to create vdev '%s'\n", null_dev); + goto fail; } /* Make a pool for cloned packets */ - mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", IOV_MAX + NUM_PACKETS, - 0, 0, - rte_pcapng_mbuf_size(pkt_len), + mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", + MAX_BURST, 0, 0, + rte_pcapng_mbuf_size(pkt_len) + 128, SOCKET_ID_ANY, "ring_mp_sc"); if (mp == NULL) { fprintf(stderr, "Cannot create mempool\n"); - return -1; + goto fail; } + return 0; + +fail: + rte_vdev_uninit(null_dev); + rte_mempool_free(mp); + return -1; } static int -test_write_packets(void) +fill_pcapng_file(rte_pcapng_t *pcapng, unsigned int num_packets) { - struct rte_mbuf *orig; - struct rte_mbuf *clones[NUM_PACKETS] = { }; struct dummy_mbuf mbfs; - unsigned int i; + struct rte_mbuf *orig; + unsigned int burst_size; + unsigned int count; ssize_t len; /* make a dummy packet */ mbuf1_prepare(&mbfs, pkt_len); - - /* clone them */ orig = &mbfs.mb[0]; - for (i = 0; i < NUM_PACKETS; i++) { - struct rte_mbuf *mc; - mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - RTE_PCAPNG_DIRECTION_UNKNOWN, NULL); - if (mc == NULL) { - fprintf(stderr, "Cannot copy packet\n"); + for (count = 0; count < num_packets; count += 
burst_size) { + struct rte_mbuf *clones[MAX_BURST]; + unsigned int i; + + /* put 1 .. MAX_BURST packets in one write call */ + burst_size = rte_rand_max(MAX_BURST) + 1; + for (i = 0; i < burst_size; i++) { + struct rte_mbuf *mc; + + mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, + RTE_PCAPNG_DIRECTION_IN, NULL); + if (mc == NULL) { + fprintf(stderr, "Cannot copy packet\n"); + return -1; + } + clones[i] = mc; + } + + /* write it to capture file */ + len = rte_pcapng_write_packets(pcapng, clones, burst_size); + rte_pktmbuf_free_bulk(clones, burst_size); + + if (len <= 0) { + fprintf(stderr, "Write of packets failed: %s\n", + rte_strerror(rte_errno)); return -1; } - clones[i] = mc; + + /* Leave a small gap between packets to test for time wrap */ + usleep(rte_rand_max(MAX_GAP_US)); } - /* write it to capture file */ - len = rte_pcapng_write_packets(pcapng, clones, NUM_PACKETS); + return count; +} - rte_pktmbuf_free_bulk(clones, NUM_PACKETS); +static char * +fmt_time(char *buf, size_t size, uint64_t ts_ns) +{ + time_t sec; + size_t len; - if (len <= 0) { - fprintf(stderr, "Write of packets failed\n"); - return -1; - } + sec = ts_ns / NS_PER_S; + len = strftime(buf, size, "%X", localtime(&sec)); + snprintf(buf + len, size - len, ".%09lu", + (unsigned long)(ts_ns % NS_PER_S)); - return 0; + return buf; } -static int -test_write_stats(void) +/* Context for the pcap_loop callback */ +struct pkt_print_ctx { + pcap_t *pcap; + unsigned int count; + uint64_t start_ns; + uint64_t end_ns; +}; + +static void +print_packet(uint64_t ts_ns, const struct rte_ether_hdr *eh, size_t len) { - ssize_t len; + char tbuf[128], src[64], dst[64]; - /* write a statistics block */ - len = rte_pcapng_write_stats(pcapng, port_id, - UINT64_MAX, UINT64_MAX, NULL); - if (len <= 0) { - fprintf(stderr, "Write of statistics failed\n"); - return -1; - } - return 0; + fmt_time(tbuf, sizeof(tbuf), ts_ns); + rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr); + rte_ether_format_addr(src, 
sizeof(src), &eh->src_addr); + printf("%s: %s -> %s type %x length %zu\n", + tbuf, src, dst, rte_be_to_cpu_16(eh->ether_type), len); } +/* Callback from pcap_loop used to validate packets in the file */ static void -pkt_print(u_char *user, const struct pcap_pkthdr *h, - const u_char *bytes) +parse_pcap_packet(u_char *user, const struct pcap_pkthdr *h, + const u_char *bytes) { - unsigned int *countp = (unsigned int *)user; + struct pkt_print_ctx *ctx = (struct pkt_print_ctx *)user; const struct rte_ether_hdr *eh; - struct tm *tm; - char tbuf[128], src[64], dst[64]; + const struct rte_ipv4_hdr *ip; + uint64_t ns; - tm = localtime(&h->ts.tv_sec); - if (tm == NULL) { - perror("localtime"); - return; + eh = (const struct rte_ether_hdr *)bytes; + ip = (const struct rte_ipv4_hdr *)(eh + 1); + + ctx->count += 1; + + /* The pcap library is misleading in reporting timestamp. + * packet header struct gives timestamp as a timeval (ie. usec); + * but the file is open in nanonsecond mode therefore + * the timestamp is really in timespec (ie. nanoseconds). + */ + ns = h->ts.tv_sec * NS_PER_S + h->ts.tv_usec; + if (ns < ctx->start_ns || ns > ctx->end_ns) { + char tstart[128], tend[128]; + + fmt_time(tstart, sizeof(tstart), ctx->start_ns); + fmt_time(tend, sizeof(tend), ctx->end_ns); + fprintf(stderr, "Timestamp out of range [%s .. 
%s]\n", + tstart, tend); + goto error; } - if (strftime(tbuf, sizeof(tbuf), "%X", tm) == 0) { - fprintf(stderr, "strftime returned 0!\n"); - return; + if (!rte_is_broadcast_ether_addr(&eh->dst_addr)) { + fprintf(stderr, "Destination is not broadcast\n"); + goto error; } - eh = (const struct rte_ether_hdr *)bytes; - rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr); - rte_ether_format_addr(src, sizeof(src), &eh->src_addr); - printf("%s.%06lu: %s -> %s type %x length %u\n", - tbuf, (unsigned long)h->ts.tv_usec, - src, dst, rte_be_to_cpu_16(eh->ether_type), h->len); + if (rte_ipv4_cksum(ip) != 0) { + fprintf(stderr, "Bad IPv4 checksum\n"); + goto error; + } + + return; /* packet is normal */ + +error: + print_packet(ns, eh, h->len); + + /* Stop parsing at first error */ + pcap_breakloop(ctx->pcap); +} - *countp += 1; +static uint64_t +current_timestamp(void) +{ + struct timespec ts; + + clock_gettime(CLOCK_REALTIME, &ts); + return rte_timespec_to_ns(&ts); } /* @@ -218,78 +280,162 @@ pkt_print(u_char *user, const struct pcap_pkthdr *h, * but that creates an unwanted dependency. 
*/ static int -test_validate(void) +valid_pcapng_file(const char *file_name, uint64_t started, unsigned int expected) { char errbuf[PCAP_ERRBUF_SIZE]; - unsigned int count = 0; - pcap_t *pcap; + struct pkt_print_ctx ctx = { }; int ret; - pcap = pcap_open_offline(file_name, errbuf); - if (pcap == NULL) { + ctx.start_ns = started; + ctx.end_ns = current_timestamp(); + + ctx.pcap = pcap_open_offline_with_tstamp_precision(file_name, + PCAP_TSTAMP_PRECISION_NANO, + errbuf); + if (ctx.pcap == NULL) { fprintf(stderr, "pcap_open_offline('%s') failed: %s\n", file_name, errbuf); return -1; } - ret = pcap_loop(pcap, 0, pkt_print, (u_char *)&count); - if (ret == 0) - printf("Saw %u packets\n", count); - else + ret = pcap_loop(ctx.pcap, 0, parse_pcap_packet, (u_char *)&ctx); + if (ret != 0) { fprintf(stderr, "pcap_dispatch: failed: %s\n", - pcap_geterr(pcap)); - pcap_close(pcap); + pcap_geterr(ctx.pcap)); + } else if (ctx.count != expected) { + printf("Only %u packets, expected %u\n", + ctx.count, expected); + ret = -1; + } + + pcap_close(ctx.pcap); return ret; } static int -test_write_over_limit_iov_max(void) +test_add_interface(void) { - struct rte_mbuf *orig; - struct rte_mbuf *clones[IOV_MAX + NUM_PACKETS] = { }; - struct dummy_mbuf mbfs; - unsigned int i; - ssize_t len; + char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; + static rte_pcapng_t *pcapng; + int ret, tmp_fd; + uint64_t now = current_timestamp(); - /* make a dummy packet */ - mbuf1_prepare(&mbfs, pkt_len); + tmp_fd = mkstemps(file_name, strlen(".pcapng")); + if (tmp_fd == -1) { + perror("mkstemps() failure"); + goto fail; + } + printf("pcapng: output file %s\n", file_name); - /* clone them */ - orig = &mbfs.mb[0]; - for (i = 0; i < IOV_MAX + NUM_PACKETS; i++) { - struct rte_mbuf *mc; + /* open a test capture file */ + pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_addif", NULL); + if (pcapng == NULL) { + fprintf(stderr, "rte_pcapng_fdopen failed\n"); + close(tmp_fd); + goto fail; + } - mc = 
rte_pcapng_copy(port_id, 0, orig, mp, pkt_len, - RTE_PCAPNG_DIRECTION_UNKNOWN, NULL); - if (mc == NULL) { - fprintf(stderr, "Cannot copy packet\n"); - return -1; - } - clones[i] = mc; + /* Add interface to the file */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, NULL); + if (ret < 0) { + fprintf(stderr, "can not add port %u\n", port_id); + goto fail; } - /* write it to capture file */ - len = rte_pcapng_write_packets(pcapng, clones, IOV_MAX + NUM_PACKETS); + /* Add interface with ifname and ifdescr */ + ret = rte_pcapng_add_interface(pcapng, port_id, + "myeth", "Some long description", NULL); + if (ret < 0) { + fprintf(stderr, "can not add port %u with ifname\n", port_id); + goto fail; + } + + /* Add interface with filter */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, "tcp port 8080"); + if (ret < 0) { + fprintf(stderr, "can not add port %u with filter\n", port_id); + goto fail; + } - rte_pktmbuf_free_bulk(clones, IOV_MAX + NUM_PACKETS); + rte_pcapng_close(pcapng); - if (len <= 0) { - fprintf(stderr, "Write of packets failed\n"); - return -1; + ret = valid_pcapng_file(file_name, now, 0); + /* if test fails want to investigate the file */ + if (ret == 0) + unlink(file_name); + + return ret; + +fail: + rte_pcapng_close(pcapng); + return -1; +} + +static int +test_write_packets(void) +{ + char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng"; + static rte_pcapng_t *pcapng; + int ret, tmp_fd, count; + uint64_t now = current_timestamp(); + + tmp_fd = mkstemps(file_name, strlen(".pcapng")); + if (tmp_fd == -1) { + perror("mkstemps() failure"); + goto fail; } + printf("pcapng: output file %s\n", file_name); - return 0; + /* open a test capture file */ + pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL); + if (pcapng == NULL) { + fprintf(stderr, "rte_pcapng_fdopen failed\n"); + close(tmp_fd); + goto fail; + } + + /* Add interface to the file */ + ret = rte_pcapng_add_interface(pcapng, port_id, + NULL, NULL, NULL); + if 
(ret < 0) { + fprintf(stderr, "can not add port %u\n", port_id); + goto fail; + } + + count = fill_pcapng_file(pcapng, TOTAL_PACKETS); + if (count < 0) + goto fail; + + /* write a statistics block */ + ret = rte_pcapng_write_stats(pcapng, port_id, + count, 0, "end of test"); + if (ret <= 0) { + fprintf(stderr, "Write of statistics failed\n"); + goto fail; + } + + rte_pcapng_close(pcapng); + + ret = valid_pcapng_file(file_name, now, count); + /* if test fails want to investigate the file */ + if (ret == 0) + unlink(file_name); + + return ret; + +fail: + rte_pcapng_close(pcapng); + return -1; } static void test_cleanup(void) { rte_mempool_free(mp); - - if (pcapng) - rte_pcapng_close(pcapng); - + rte_vdev_uninit(null_dev); } static struct @@ -298,10 +444,8 @@ unit_test_suite test_pcapng_suite = { .teardown = test_cleanup, .suite_name = "Test Pcapng Unit Test Suite", .unit_test_cases = { + TEST_CASE(test_add_interface), TEST_CASE(test_write_packets), - TEST_CASE(test_write_stats), - TEST_CASE(test_validate), - TEST_CASE(test_write_over_limit_iov_max), TEST_CASES_END() } }; @@ -312,4 +456,4 @@ test_pcapng(void) return unit_test_suite_runner(&test_pcapng_suite); } -REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng); +REGISTER_FAST_TEST(pcapng_autotest, true, true, test_pcapng); -- 2.42.0 ^ permalink raw reply [flat|nested] 61+ messages in thread
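The timestamp check in parse_pcap_packet() above relies on a libpcap detail: when a file is opened with PCAP_TSTAMP_PRECISION_NANO, the tv_usec field of the packet header carries nanoseconds, not microseconds. A minimal sketch of the conversion and window test, without a libpcap dependency, might look like the following. The timeval_nano_to_ns() and ts_in_window() helpers are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>

#define NS_PER_S 1000000000ULL

/*
 * Convert a packet timestamp to ns since the epoch, assuming the reader
 * asked for nanosecond precision so that tv_usec holds nanoseconds.
 */
static uint64_t
timeval_nano_to_ns(const struct timeval *tv)
{
	return (uint64_t)tv->tv_sec * NS_PER_S + (uint64_t)tv->tv_usec;
}

/* True when ns falls inside the closed interval [start_ns, end_ns]. */
static bool
ts_in_window(uint64_t ns, uint64_t start_ns, uint64_t end_ns)
{
	return ns >= start_ns && ns <= end_ns;
}
```

The test records the wall clock before opening the capture and again before validating, so every packet timestamp should fall inside that window if the cycle-to-nanosecond conversion is correct.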
* Re: [PATCH v7 0/5] dumpcap and pcapng fixes 2023-11-17 16:35 ` [PATCH v7 0/5] dumpcap and pcapng fixes Stephen Hemminger ` (4 preceding siblings ...) 2023-11-17 16:35 ` [PATCH v7 5/5] test: cleanups to pcapng test Stephen Hemminger @ 2023-11-22 22:42 ` Thomas Monjalon 5 siblings, 0 replies; 61+ messages in thread From: Thomas Monjalon @ 2023-11-22 22:42 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dev > Stephen Hemminger (5): > pdump: fix setting rte_errno on mp error > dumpcap: allow multiple invocations > pcapng: modify timestamp calculation > pcapng: avoid using alloca() > test: cleanups to pcapng test Applied with a note about the API change in release notes. ^ permalink raw reply [flat|nested] 61+ messages in thread