* [PATCH] pcapng: fix write more packets than IOV_MAX limit
@ 2022-07-25 15:28 Mário Kuka
2022-07-25 15:57 ` Stephen Hemminger
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Mário Kuka @ 2022-07-25 15:28 UTC (permalink / raw)
To: Reshma Pattan, Stephen Hemminger, Ray Kinsella; +Cc: dev, Mário Kuka
The rte_pcapng_write_packets() function fails when we try to write more
packets than the IOV_MAX limit. The error is caused by the writev()
system call, which is limited by the IOV_MAX limit. The iovcnt argument
is valid if it is greater than 0 and less than or equal to IOV_MAX as
defined in <limits.h>.
To avoid this problem, we can split the iovec buffer into smaller
chunks with a maximum size of IOV_MAX and write them sequentially by
calling the writev() repeatedly.
Fixes: 8d23ce8f5ee9 ("pcapng: add new library for writing pcapng files")
Cc: stephen@networkplumber.org
Signed-off-by: Mário Kuka <kuka@cesnet.cz>
---
app/test/test_pcapng.c | 42 ++++++++++++++++++++++++++++-
lib/pcapng/rte_pcapng.c | 58 ++++++++++++++++++++++++++++++++++++++++-
2 files changed, 98 insertions(+), 2 deletions(-)
diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
index 320dacea34..7f51946fff 100644
--- a/app/test/test_pcapng.c
+++ b/app/test/test_pcapng.c
@@ -110,7 +110,7 @@ test_setup(void)
}
/* Make a pool for cloned packets */
- mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", NUM_PACKETS,
+ mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", IOV_MAX + NUM_PACKETS,
0, 0,
rte_pcapng_mbuf_size(pkt_len),
SOCKET_ID_ANY, "ring_mp_sc");
@@ -237,6 +237,45 @@ test_validate(void)
return ret;
}
+static int
+test_write_over_limit_iov_max(void)
+{
+ struct rte_mbuf *orig;
+ struct rte_mbuf *clones[IOV_MAX + NUM_PACKETS] = { };
+ struct dummy_mbuf mbfs;
+ unsigned int i;
+ ssize_t len;
+
+ /* make a dummy packet */
+ mbuf1_prepare(&mbfs, pkt_len);
+
+ /* clone them */
+ orig = &mbfs.mb[0];
+ for (i = 0; i < IOV_MAX + NUM_PACKETS; i++) {
+ struct rte_mbuf *mc;
+
+ mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
+ rte_get_tsc_cycles(), 0);
+ if (mc == NULL) {
+ fprintf(stderr, "Cannot copy packet\n");
+ return -1;
+ }
+ clones[i] = mc;
+ }
+
+ /* write it to capture file */
+ len = rte_pcapng_write_packets(pcapng, clones, IOV_MAX + NUM_PACKETS);
+
+ rte_pktmbuf_free_bulk(clones, IOV_MAX + NUM_PACKETS);
+
+ if (len <= 0) {
+ fprintf(stderr, "Write of packets failed\n");
+ return -1;
+ }
+
+ return 0;
+}
+
static void
test_cleanup(void)
{
@@ -256,6 +295,7 @@ unit_test_suite test_pcapng_suite = {
TEST_CASE(test_write_packets),
TEST_CASE(test_write_stats),
TEST_CASE(test_validate),
+ TEST_CASE(test_write_over_limit_iov_max),
TEST_CASES_END()
}
};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index 06ad712bd1..5762f89cb9 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -567,6 +567,62 @@ mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
return iovcnt;
}
+/*
+ * Update iov after writev() has returned written. We must find how many iov
+ * buffers (from beginning) have been written. The first buffer that was not
+ * written fully is to be updated accordingly.
+ *
+ * Returns offset of buffer that was not written fully.
+ */
+static int
+pcapng_update_iov(struct iovec *iov, const int count, size_t written)
+{
+ for (int i = 0; written > 0 && i < count; ++i) {
+ if (written < iov[i].iov_len) {
+ /* found buffer that was not written fully */
+ iov[i].iov_base = RTE_PTR_ADD(iov[i].iov_base, written);
+ iov[i].iov_len -= written;
+
+ return i;
+ }
+
+ /* buffer fully written, zero it and skip */
+ written -= iov[i].iov_len;
+
+ iov[i].iov_base = NULL;
+ iov[i].iov_len = 0;
+ }
+
+ return count;
+}
+
+/*
+ * Writes all iovcnt buffers of data described by iov to the file associated with
+ * the file descriptor fd.
+ *
+ * Note: POSIX.1-2001 allows an implementation to place a limit on the number
+ * of items that can be passed in iov. An implementation can advertise
+ * its limit by defining IOV_MAX in <limits.h>.
+ */
+static ssize_t
+pcapng_writev(int fd, struct iovec *iov, const int count)
+{
+ size_t total = 0;
+ int at = 0;
+
+ while (at < count) {
+ const int iov_cnt = RTE_MIN(count - at, IOV_MAX);
+ ssize_t wlen = writev(fd, &iov[at], iov_cnt);
+ if (unlikely(wlen < 0))
+ return wlen;
+
+ total += wlen;
+ at += pcapng_update_iov(&iov[at], iov_cnt, wlen);
+ }
+
+ return total;
+}
+
/* Write pre-formatted packets to file. */
ssize_t
rte_pcapng_write_packets(rte_pcapng_t *self,
@@ -601,7 +657,7 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
} while ((m = m->next));
}
- ret = writev(self->outfd, iov, iovcnt);
+ ret = pcapng_writev(self->outfd, iov, iovcnt);
if (unlikely(ret < 0))
rte_errno = errno;
return ret;
--
2.31.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pcapng: fix write more packets than IOV_MAX limit
2022-07-25 15:28 [PATCH] pcapng: fix write more packets than IOV_MAX limit Mário Kuka
@ 2022-07-25 15:57 ` Stephen Hemminger
2022-07-25 16:10 ` Stephen Hemminger
2022-07-29 7:18 ` [PATCH v2 0/2] pcapng: fix some issues with writing packets Mário Kuka
2 siblings, 0 replies; 15+ messages in thread
From: Stephen Hemminger @ 2022-07-25 15:57 UTC (permalink / raw)
To: Mário Kuka; +Cc: Reshma Pattan, Ray Kinsella, dev
On Mon, 25 Jul 2022 17:28:11 +0200
Mário Kuka <kuka@cesnet.cz> wrote:
> The rte_pcapng_write_packets() function fails when we try to write more
> packets than the IOV_MAX limit. The error is caused by the writev()
> system call, which is limited by the IOV_MAX limit. The iovcnt argument
> is valid if it is greater than 0 and less than or equal to IOV_MAX as
> defined in <limits.h>.
>
> To avoid this problem, we can split the iovec buffer into smaller
> chunks with a maximum size of IOV_MAX and write them sequentially by
> calling the writev() repeatedly.
That risks writing partial packets.
Better to move the break down inside the logic of rte_pcapng_write_packets()
and at the same time, remove the VLA of iov[iovcnt]
>
> Fixes: 8d23ce8f5ee9 ("pcapng: add new library for writing pcapng files")
> Cc: stephen@networkplumber.org
>
> Signed-off-by: Mário Kuka <kuka@cesnet.cz>
What is the burst size here?
IOV_MAX is 1024, and typical max mbuf per packet would be 5
that means as is the code works for up to 204 burst size.
Sure you can use DPDK in odd ways with small mbuf sizes and huge
burst sizes, but this needs to only in some slow path.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pcapng: fix write more packets than IOV_MAX limit
2022-07-25 15:28 [PATCH] pcapng: fix write more packets than IOV_MAX limit Mário Kuka
2022-07-25 15:57 ` Stephen Hemminger
@ 2022-07-25 16:10 ` Stephen Hemminger
2022-07-29 7:18 ` [PATCH v2 0/2] pcapng: fix some issues with writing packets Mário Kuka
2 siblings, 0 replies; 15+ messages in thread
From: Stephen Hemminger @ 2022-07-25 16:10 UTC (permalink / raw)
To: Mário Kuka; +Cc: Reshma Pattan, Ray Kinsella, dev
On Mon, 25 Jul 2022 17:28:11 +0200
Mário Kuka <kuka@cesnet.cz> wrote:
> The rte_pcapng_write_packets() function fails when we try to write more
> packets than the IOV_MAX limit. The error is caused by the writev()
> system call, which is limited by the IOV_MAX limit. The iovcnt argument
> is valid if it is greater than 0 and less than or equal to IOV_MAX as
> defined in <limits.h>.
>
> To avoid this problem, we can split the iovec buffer into smaller
> chunks with a maximum size of IOV_MAX and write them sequentially by
> calling the writev() repeatedly.
>
> Fixes: 8d23ce8f5ee9 ("pcapng: add new library for writing pcapng files")
> Cc: stephen@networkplumber.org
>
> Signed-off-by: Mário Kuka <kuka@cesnet.cz>
> ---
Something like this (compile tested only).
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index 06ad712bd1eb..e41cf909e120 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -551,33 +551,16 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
return NULL;
}
-/* Count how many segments are in this array of mbufs */
-static unsigned int
-mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
-{
- unsigned int i, iovcnt;
-
- for (iovcnt = 0, i = 0; i < n; i++) {
- const struct rte_mbuf *m = pkts[i];
-
- __rte_mbuf_sanity_check(m, 1);
-
- iovcnt += m->nb_segs;
- }
- return iovcnt;
-}
-
/* Write pre-formatted packets to file. */
ssize_t
rte_pcapng_write_packets(rte_pcapng_t *self,
struct rte_mbuf *pkts[], uint16_t nb_pkts)
{
- int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
- struct iovec iov[iovcnt];
- unsigned int i, cnt;
- ssize_t ret;
+ struct iovec iov[IOV_MAX];
+ unsigned int i, cnt = 0;
+ ssize_t ret, total = 0;
- for (i = cnt = 0; i < nb_pkts; i++) {
+ for (i = 0; i < nb_pkts; i++) {
struct rte_mbuf *m = pkts[i];
struct pcapng_enhance_packet_block *epb;
@@ -589,6 +572,20 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
return -1;
}
+ /*
+ * Handle case of highly fragmented and large burst size
+ * Note: this assumes that max segments per mbuf < IOV_MAX
+ */
+ if (unlikely(cnt + m->nb_segs >= IOV_MAX)) {
+ ret = writev(self->outfd, iov, cnt);
+ if (unlikely(ret < 0)) {
+ rte_errno = errno;
+ return -1;
+ }
+ total += ret;
+ cnt = 0;
+ }
+
/*
* The DPDK port is recorded during pcapng_copy.
* Map that to PCAPNG interface in file.
@@ -601,10 +598,12 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
} while ((m = m->next));
}
- ret = writev(self->outfd, iov, iovcnt);
- if (unlikely(ret < 0))
+ ret = writev(self->outfd, iov, cnt);
+ if (unlikely(ret < 0)) {
rte_errno = errno;
- return ret;
+ return -1;
+ }
+ return total + ret;
}
/* Create new pcapng writer handle */
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 0/2] pcapng: fix some issues with writing packets.
2022-07-25 15:28 [PATCH] pcapng: fix write more packets than IOV_MAX limit Mário Kuka
2022-07-25 15:57 ` Stephen Hemminger
2022-07-25 16:10 ` Stephen Hemminger
@ 2022-07-29 7:18 ` Mário Kuka
2022-07-29 7:18 ` [PATCH v2 1/2] pcapng: fix write more packets than IOV_MAX limit Mário Kuka
` (3 more replies)
2 siblings, 4 replies; 15+ messages in thread
From: Mário Kuka @ 2022-07-29 7:18 UTC (permalink / raw)
To: kuka; +Cc: dev, mdr, reshma.pattan, stephen
This patchset contains fixes for some issues that occur when writing a
large burst of packets at once, such as writing more packets than the
IOV_MAX limit and the problem of partial writing of a packet to a file
if the writev() system call performs a partial write.
The typical use of pcapng in our cases is to copy the packets into a
separate buffer and the process of writing to the file is done in some
slow path, for example by writing in a separate thread or at the end of
the application, where we don't mind the limitation of the typically
slow speed of the storage medium.
Mário Kuka (2):
pcapng: fix write more packets than IOV_MAX limit
pcapng: check if writev() returns a partial write
app/test/test_pcapng.c | 42 +++++++++++++++++-
lib/pcapng/rte_pcapng.c | 96 +++++++++++++++++++++++++++++++++--------
2 files changed, 120 insertions(+), 18 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 1/2] pcapng: fix write more packets than IOV_MAX limit
2022-07-29 7:18 ` [PATCH v2 0/2] pcapng: fix some issues with writing packets Mário Kuka
@ 2022-07-29 7:18 ` Mário Kuka
2022-07-29 7:18 ` [PATCH v2 2/2] pcapng: check if writev() returns a partial write Mário Kuka
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Mário Kuka @ 2022-07-29 7:18 UTC (permalink / raw)
To: kuka; +Cc: dev, mdr, reshma.pattan, stephen
The rte_pcapng_write_packets() function fails when we try to write more
packets than the IOV_MAX limit. writev() system call is limited by the
IOV_MAX limit. The iovcnt argument is valid if it is greater than 0 and
less than or equal to IOV_MAX as defined in <limits.h>.
To avoid this problem, we can check that all segments of the next
packet will fit into the iovec buffer, whose capacity will be limited
by the IOV_MAX limit. If not, we flush the current iovec buffer to the
file by calling writev() and, if successful, fit the current packet at
the beginning of the flushed iovec buffer.
Fixes: 8d23ce8f5ee9 ("pcapng: add new library for writing pcapng files")
Cc: stephen@networkplumber.org
Signed-off-by: Mário Kuka <kuka@cesnet.cz>
---
app/test/test_pcapng.c | 42 +++++++++++++++++++++++++++++++++++-
lib/pcapng/rte_pcapng.c | 47 ++++++++++++++++++++---------------------
2 files changed, 64 insertions(+), 25 deletions(-)
diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
index 320dacea34..7f51946fff 100644
--- a/app/test/test_pcapng.c
+++ b/app/test/test_pcapng.c
@@ -110,7 +110,7 @@ test_setup(void)
}
/* Make a pool for cloned packets */
- mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", NUM_PACKETS,
+ mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", IOV_MAX + NUM_PACKETS,
0, 0,
rte_pcapng_mbuf_size(pkt_len),
SOCKET_ID_ANY, "ring_mp_sc");
@@ -237,6 +237,45 @@ test_validate(void)
return ret;
}
+static int
+test_write_over_limit_iov_max(void)
+{
+ struct rte_mbuf *orig;
+ struct rte_mbuf *clones[IOV_MAX + NUM_PACKETS] = { };
+ struct dummy_mbuf mbfs;
+ unsigned int i;
+ ssize_t len;
+
+ /* make a dummy packet */
+ mbuf1_prepare(&mbfs, pkt_len);
+
+ /* clone them */
+ orig = &mbfs.mb[0];
+ for (i = 0; i < IOV_MAX + NUM_PACKETS; i++) {
+ struct rte_mbuf *mc;
+
+ mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
+ rte_get_tsc_cycles(), 0);
+ if (mc == NULL) {
+ fprintf(stderr, "Cannot copy packet\n");
+ return -1;
+ }
+ clones[i] = mc;
+ }
+
+ /* write it to capture file */
+ len = rte_pcapng_write_packets(pcapng, clones, IOV_MAX + NUM_PACKETS);
+
+ rte_pktmbuf_free_bulk(clones, IOV_MAX + NUM_PACKETS);
+
+ if (len <= 0) {
+ fprintf(stderr, "Write of packets failed\n");
+ return -1;
+ }
+
+ return 0;
+}
+
static void
test_cleanup(void)
{
@@ -256,6 +295,7 @@ unit_test_suite test_pcapng_suite = {
TEST_CASE(test_write_packets),
TEST_CASE(test_write_stats),
TEST_CASE(test_validate),
+ TEST_CASE(test_write_over_limit_iov_max),
TEST_CASES_END()
}
};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index 06ad712bd1..e41cf909e1 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -551,33 +551,16 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
return NULL;
}
-/* Count how many segments are in this array of mbufs */
-static unsigned int
-mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
-{
- unsigned int i, iovcnt;
-
- for (iovcnt = 0, i = 0; i < n; i++) {
- const struct rte_mbuf *m = pkts[i];
-
- __rte_mbuf_sanity_check(m, 1);
-
- iovcnt += m->nb_segs;
- }
- return iovcnt;
-}
-
/* Write pre-formatted packets to file. */
ssize_t
rte_pcapng_write_packets(rte_pcapng_t *self,
struct rte_mbuf *pkts[], uint16_t nb_pkts)
{
- int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
- struct iovec iov[iovcnt];
- unsigned int i, cnt;
- ssize_t ret;
+ struct iovec iov[IOV_MAX];
+ unsigned int i, cnt = 0;
+ ssize_t ret, total = 0;
- for (i = cnt = 0; i < nb_pkts; i++) {
+ for (i = 0; i < nb_pkts; i++) {
struct rte_mbuf *m = pkts[i];
struct pcapng_enhance_packet_block *epb;
@@ -589,6 +572,20 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
return -1;
}
+ /*
+ * Handle case of highly fragmented and large burst size
+ * Note: this assumes that max segments per mbuf < IOV_MAX
+ */
+ if (unlikely(cnt + m->nb_segs >= IOV_MAX)) {
+ ret = writev(self->outfd, iov, cnt);
+ if (unlikely(ret < 0)) {
+ rte_errno = errno;
+ return -1;
+ }
+ total += ret;
+ cnt = 0;
+ }
+
/*
* The DPDK port is recorded during pcapng_copy.
* Map that to PCAPNG interface in file.
@@ -601,10 +598,12 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
} while ((m = m->next));
}
- ret = writev(self->outfd, iov, iovcnt);
- if (unlikely(ret < 0))
+ ret = writev(self->outfd, iov, cnt);
+ if (unlikely(ret < 0)) {
rte_errno = errno;
- return ret;
+ return -1;
+ }
+ return total + ret;
}
/* Create new pcapng writer handle */
--
2.31.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 2/2] pcapng: check if writev() returns a partial write
2022-07-29 7:18 ` [PATCH v2 0/2] pcapng: fix some issues with writing packets Mário Kuka
2022-07-29 7:18 ` [PATCH v2 1/2] pcapng: fix write more packets than IOV_MAX limit Mário Kuka
@ 2022-07-29 7:18 ` Mário Kuka
2022-07-29 16:00 ` Stephen Hemminger
2022-07-29 15:58 ` [PATCH v2 0/2] pcapng: fix some issues with writing packets Stephen Hemminger
2022-08-01 8:40 ` [PATCH v3] pcapng: fix write more packets than IOV_MAX limit Mário Kuka
3 siblings, 1 reply; 15+ messages in thread
From: Mário Kuka @ 2022-07-29 7:18 UTC (permalink / raw)
To: kuka; +Cc: dev, mdr, reshma.pattan, stephen
The result from wrtiev() is not checked. When writev() returns
a partial write, the output file will not contain all packets from the
pkts buffer and some packets may be partially written, which is
undesirable behavior.
To avoid this problem, we have to check the number of bytes returned
from the writev(), and if we get a partial write, we need to call the
writev() function again on any ivo buffers that were not written or
were written partially.
Fixes: 8d23ce8f5ee9 ("pcapng: add new library for writing pcapng files")
Cc: stephen@networkplumber.org
Signed-off-by: Mário Kuka <kuka@cesnet.cz>
---
lib/pcapng/rte_pcapng.c | 67 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 65 insertions(+), 2 deletions(-)
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index e41cf909e1..7c1136337c 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -551,6 +551,69 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
return NULL;
}
+/*
+ * Update iov after writev() has returned written. We must find how many iov
+ * buffers (from beginning) have been written. The first buffer that was not
+ * written fully is to be updated accordingly.
+ *
+ * Returns offset of buffer that was not written fully.
+ */
+static int
+pcapng_update_iov(struct iovec *iov, const int count, size_t written)
+{
+ int i;
+
+ for (i = 0; written > 0 && i < count; ++i) {
+ if (written < iov[i].iov_len) {
+ /* found buffer that was not written fully */
+ iov[i].iov_base = RTE_PTR_ADD(iov[i].iov_base, written);
+ iov[i].iov_len -= written;
+
+ return i;
+ }
+
+ written -= iov[i].iov_len;
+ }
+
+ return count;
+}
+
+/*
+ * Writes all iovcnt buffers of data described by iov to the file associated with
+ * the file descriptor fd.
+ */
+static ssize_t
+pcapng_writev(int fd, struct iovec *iov, const int count)
+{
+ size_t total = 0;
+ int at = 0;
+
+ while (at < count) {
+ /*
+ * Note: writev() can return the following on a write request:
+ * Complete:
+ * written = [sum of all iov.iov_len]
+ * Partial:
+ * written < [sum of all iov.iov_len]
+ * Deferred:
+ * written = -1, errno = [EAGAIN]
+ *
+ * Partial and deferred writes are only possible with O_NONBLOCK set.
+ *
+ * If we get a partial result, we have to call the writev() again on any ivo buffers
+ * that have not been fully written.
+ */
+ ssize_t written = writev(fd, &iov[at], count - at);
+ if (unlikely(written < 0))
+ return written;
+
+ total += written;
+ at += pcapng_update_iov(&iov[at], count - at, written);
+ }
+
+ return total;
+}
+
/* Write pre-formatted packets to file. */
ssize_t
rte_pcapng_write_packets(rte_pcapng_t *self,
@@ -577,7 +640,7 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
* Note: this assumes that max segments per mbuf < IOV_MAX
*/
if (unlikely(cnt + m->nb_segs >= IOV_MAX)) {
- ret = writev(self->outfd, iov, cnt);
+ ret = pcapng_writev(self->outfd, iov, cnt);
if (unlikely(ret < 0)) {
rte_errno = errno;
return -1;
@@ -598,7 +661,7 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
} while ((m = m->next));
}
- ret = writev(self->outfd, iov, cnt);
+ ret = pcapng_writev(self->outfd, iov, cnt);
if (unlikely(ret < 0)) {
rte_errno = errno;
return -1;
--
2.31.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 0/2] pcapng: fix some issues with writing packets.
2022-07-29 7:18 ` [PATCH v2 0/2] pcapng: fix some issues with writing packets Mário Kuka
2022-07-29 7:18 ` [PATCH v2 1/2] pcapng: fix write more packets than IOV_MAX limit Mário Kuka
2022-07-29 7:18 ` [PATCH v2 2/2] pcapng: check if writev() returns a partial write Mário Kuka
@ 2022-07-29 15:58 ` Stephen Hemminger
2022-07-29 17:33 ` Mário Kuka
2022-08-01 8:40 ` [PATCH v3] pcapng: fix write more packets than IOV_MAX limit Mário Kuka
3 siblings, 1 reply; 15+ messages in thread
From: Stephen Hemminger @ 2022-07-29 15:58 UTC (permalink / raw)
To: Mário Kuka; +Cc: dev, mdr, reshma.pattan
On Fri, 29 Jul 2022 09:18:39 +0200
Mário Kuka <kuka@cesnet.cz> wrote:
> This patchset contains fixes for some issues that occur when writing a
> large burst of packets at once, such as writing more packets than the
> IOV_MAX limit and the problem of partial writing of a packet to a file
> if the writev() system call performs a partial write.
>
> The typical use of pcapng in our cases is to copy the packets into a
> separate buffer and the process of writing to the file is done in some
> slow path, for example by writing in a separate thread or at the end of
> the application, where we don't mind the limitation of the typically
> slow speed of the storage medium.
>
> Mário Kuka (2):
> pcapng: fix write more packets than IOV_MAX limit
> pcapng: check if writev() returns a partial write
>
> app/test/test_pcapng.c | 42 +++++++++++++++++-
> lib/pcapng/rte_pcapng.c | 96 +++++++++++++++++++++++++++++++++--------
> 2 files changed, 120 insertions(+), 18 deletions(-)
>
You ignored my feedback from earlier patch.
It seems you are adding a lot more here that is necessary.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/2] pcapng: check if writev() returns a partial write
2022-07-29 7:18 ` [PATCH v2 2/2] pcapng: check if writev() returns a partial write Mário Kuka
@ 2022-07-29 16:00 ` Stephen Hemminger
2022-07-29 17:08 ` Mário Kuka
0 siblings, 1 reply; 15+ messages in thread
From: Stephen Hemminger @ 2022-07-29 16:00 UTC (permalink / raw)
To: Mário Kuka; +Cc: dev, mdr, reshma.pattan
On Fri, 29 Jul 2022 09:18:41 +0200
Mário Kuka <kuka@cesnet.cz> wrote:
> +pcapng_writev(int fd, struct iovec *iov, const int count)
> +{
> + size_t total = 0;
> + int at = 0;
> +
> + while (at < count) {
> + /*
> + * Note: writev() can return the following on a write request:
> + * Complete:
> + * written = [sum of all iov.iov_len]
> + * Partial:
> + * written < [sum of all iov.iov_len]
> + * Deferred:
> + * written = -1, errno = [EAGAIN]
> + *
> + * Partial and deferred writes are only possible with O_NONBLOCK set.
> + *
> + * If we get a partial result, we have to call the writev() again on any ivo buffers
> + * that have not been fully written.
> + */
> + ssize_t written = writev(fd, &iov[at], count - at);
> + if (unlikely(written < 0))
> + return written;
> +
> + total += written;
> + at += pcapng_update_iov(&iov[at], count - at, written);
> + }
> +
> + return total;
Since this is being written to a file, handling partial writes makes little
sense. The only case where partial write would happen would be if filesystem
was full. Retrying just adds unnecessary complexity.
If you really want to track this, then add a dropped counter.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/2] pcapng: check if writev() returns a partial write
2022-07-29 16:00 ` Stephen Hemminger
@ 2022-07-29 17:08 ` Mário Kuka
2022-07-29 18:14 ` Stephen Hemminger
0 siblings, 1 reply; 15+ messages in thread
From: Mário Kuka @ 2022-07-29 17:08 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, mdr, reshma.pattan
[-- Attachment #1: Type: text/plain, Size: 2056 bytes --]
> Since this is being written to a file, handling partial writes makes little
> sense. The only case where partial write would happen would be if filesystem
> was full. Retrying just adds unnecessary complexity.
>
> If you really want to track this, then add a dropped counter.
But the file descriptor doesn't have to refer to just a regular file, what
if it's a socket or a pipe or some device? The pcapng documentation doesn't
say anything about any restrictions, so the implementation should be fully
generic. What's the point of a function to write packets to a file
descriptor
where there's a risk that it won't write all the packets or that the
file will
by corrupted due to a partial write and still not even let me know about
it?
On 29/07/2022 18:00, Stephen Hemminger wrote:
> On Fri, 29 Jul 2022 09:18:41 +0200
> Mário Kuka <kuka@cesnet.cz> wrote:
>
>> +pcapng_writev(int fd, struct iovec *iov, const int count)
>> +{
>> + size_t total = 0;
>> + int at = 0;
>> +
>> + while (at < count) {
>> + /*
>> + * Note: writev() can return the following on a write request:
>> + * Complete:
>> + * written = [sum of all iov.iov_len]
>> + * Partial:
>> + * written < [sum of all iov.iov_len]
>> + * Deferred:
>> + * written = -1, errno = [EAGAIN]
>> + *
>> + * Partial and deferred writes are only possible with O_NONBLOCK set.
>> + *
>> + * If we get a partial result, we have to call the writev() again on any ivo buffers
>> + * that have not been fully written.
>> + */
>> + ssize_t written = writev(fd, &iov[at], count - at);
>> + if (unlikely(written < 0))
>> + return written;
>> +
>> + total += written;
>> + at += pcapng_update_iov(&iov[at], count - at, written);
>> + }
>> +
>> + return total;
> Since this is being written to a file, handling partial writes makes little
> sense. The only case where partial write would happen would be if filesystem
> was full. Retrying just adds unnecessary complexity.
>
> If you really want to track this, then add a dropped counter.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4299 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 0/2] pcapng: fix some issues with writing packets.
2022-07-29 15:58 ` [PATCH v2 0/2] pcapng: fix some issues with writing packets Stephen Hemminger
@ 2022-07-29 17:33 ` Mário Kuka
0 siblings, 0 replies; 15+ messages in thread
From: Mário Kuka @ 2022-07-29 17:33 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, mdr, reshma.pattan
[-- Attachment #1: Type: text/plain, Size: 1590 bytes --]
> You ignored my feedback from earlier patch.
> It seems you are adding a lot more here that is necessary.
I split the changes into two separate patches, where [PATCH v2 1/2]
addresses
the IOV_MAX limit issue and where I've incorporated your feedback +
added a unit
test that tests this situation. if I did something wrong, let me know.
The problem of partial writig is addressed in the second patch [PATCH v2
2/2]
On 29/07/2022 17:58, Stephen Hemminger wrote:
> On Fri, 29 Jul 2022 09:18:39 +0200
> Mário Kuka <kuka@cesnet.cz> wrote:
>
>> This patchset contains fixes for some issues that occur when writing a
>> large burst of packets at once, such as writing more packets than the
>> IOV_MAX limit and the problem of partial writing of a packet to a file
>> if the writev() system call performs a partial write.
>>
>> The typical use of pcapng in our cases is to copy the packets into a
>> separate buffer and the process of writing to the file is done in some
>> slow path, for example by writing in a separate thread or at the end of
>> the application, where we don't mind the limitation of the typically
>> slow speed of the storage medium.
>>
>> Mário Kuka (2):
>> pcapng: fix write more packets than IOV_MAX limit
>> pcapng: check if writev() returns a partial write
>>
>> app/test/test_pcapng.c | 42 +++++++++++++++++-
>> lib/pcapng/rte_pcapng.c | 96 +++++++++++++++++++++++++++++++++--------
>> 2 files changed, 120 insertions(+), 18 deletions(-)
>>
> You ignored my feedback from earlier patch.
> It seems you are adding a lot more here that is necessary.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4299 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/2] pcapng: check if writev() returns a partial write
2022-07-29 17:08 ` Mário Kuka
@ 2022-07-29 18:14 ` Stephen Hemminger
2022-08-01 8:42 ` Mário Kuka
0 siblings, 1 reply; 15+ messages in thread
From: Stephen Hemminger @ 2022-07-29 18:14 UTC (permalink / raw)
To: Mário Kuka; +Cc: dev, mdr, reshma.pattan
On Fri, 29 Jul 2022 19:08:41 +0200
Mário Kuka <kuka@cesnet.cz> wrote:
> > Since this is being written to a file, handling partial writes makes little
> > sense. The only case where partial write would happen would be if filesystem
> > was full. Retrying just adds unnecessary complexity.
> >
> > If you really want to track this, then add a dropped counter.
>
> But the file descriptor doesn't have to refer to just a regular file, what
> if it's a socket or a pipe or some device? The pcapng documentation doesn't
> say anything about any restrictions, so the implementation should be fully
> generic. What's the point of a function to write packets to a file
> descriptor
> where there's a risk that it won't write all the packets or that the
> file will
> by corrupted due to a partial write and still not even let me know about
> it?
As pcapng is used in the dpdk application it writes to a file.
You could repurpose it to something else, but even a pipe will not
give partial writes unless you configure the pipe as non-blocking.
Writing to a non-blocking pipe is going to have a load of other problems.
This seems like a purely hypothetical case, can't see why it needs to be addressed.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3] pcapng: fix write more packets than IOV_MAX limit
2022-07-29 7:18 ` [PATCH v2 0/2] pcapng: fix some issues with writing packets Mário Kuka
` (2 preceding siblings ...)
2022-07-29 15:58 ` [PATCH v2 0/2] pcapng: fix some issues with writing packets Stephen Hemminger
@ 2022-08-01 8:40 ` Mário Kuka
2022-08-01 15:33 ` Stephen Hemminger
3 siblings, 1 reply; 15+ messages in thread
From: Mário Kuka @ 2022-08-01 8:40 UTC (permalink / raw)
To: kuka; +Cc: dev, mdr, reshma.pattan, stephen
The rte_pcapng_write_packets() function fails when we try to write more
packets than the IOV_MAX limit. writev() system call is limited by the
IOV_MAX limit. The iovcnt argument is valid if it is greater than 0 and
less than or equal to IOV_MAX as defined in <limits.h>.
To avoid this problem, we can check that all segments of the next
packet will fit into the iovec buffer, whose capacity will be limited
by the IOV_MAX limit. If not, we flush the current iovec buffer to the
file by calling writev() and, if successful, fit the current packet at
the beginning of the flushed iovec buffer.
Fixes: 8d23ce8f5ee9 ("pcapng: add new library for writing pcapng files")
Cc: stephen@networkplumber.org
Signed-off-by: Mário Kuka <kuka@cesnet.cz>
---
v3:
* Remove unwanted changes to check for partial writes from the writev().
app/test/test_pcapng.c | 42 +++++++++++++++++++++++++++++++++++-
lib/pcapng/rte_pcapng.c | 47 ++++++++++++++++++++---------------------
2 files changed, 64 insertions(+), 25 deletions(-)
diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
index 320dacea34..7f51946fff 100644
--- a/app/test/test_pcapng.c
+++ b/app/test/test_pcapng.c
@@ -110,7 +110,7 @@ test_setup(void)
}
/* Make a pool for cloned packets */
- mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", NUM_PACKETS,
+ mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", IOV_MAX + NUM_PACKETS,
0, 0,
rte_pcapng_mbuf_size(pkt_len),
SOCKET_ID_ANY, "ring_mp_sc");
@@ -237,6 +237,45 @@ test_validate(void)
return ret;
}
+static int
+test_write_over_limit_iov_max(void)
+{
+ struct rte_mbuf *orig;
+ struct rte_mbuf *clones[IOV_MAX + NUM_PACKETS] = { };
+ struct dummy_mbuf mbfs;
+ unsigned int i;
+ ssize_t len;
+
+ /* make a dummy packet */
+ mbuf1_prepare(&mbfs, pkt_len);
+
+ /* clone them */
+ orig = &mbfs.mb[0];
+ for (i = 0; i < IOV_MAX + NUM_PACKETS; i++) {
+ struct rte_mbuf *mc;
+
+ mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
+ rte_get_tsc_cycles(), 0);
+ if (mc == NULL) {
+ fprintf(stderr, "Cannot copy packet\n");
+ return -1;
+ }
+ clones[i] = mc;
+ }
+
+ /* write it to capture file */
+ len = rte_pcapng_write_packets(pcapng, clones, IOV_MAX + NUM_PACKETS);
+
+ rte_pktmbuf_free_bulk(clones, IOV_MAX + NUM_PACKETS);
+
+ if (len <= 0) {
+ fprintf(stderr, "Write of packets failed\n");
+ return -1;
+ }
+
+ return 0;
+}
+
static void
test_cleanup(void)
{
@@ -256,6 +295,7 @@ unit_test_suite test_pcapng_suite = {
TEST_CASE(test_write_packets),
TEST_CASE(test_write_stats),
TEST_CASE(test_validate),
+ TEST_CASE(test_write_over_limit_iov_max),
TEST_CASES_END()
}
};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index 06ad712bd1..e41cf909e1 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -551,33 +551,16 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
return NULL;
}
-/* Count how many segments are in this array of mbufs */
-static unsigned int
-mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
-{
- unsigned int i, iovcnt;
-
- for (iovcnt = 0, i = 0; i < n; i++) {
- const struct rte_mbuf *m = pkts[i];
-
- __rte_mbuf_sanity_check(m, 1);
-
- iovcnt += m->nb_segs;
- }
- return iovcnt;
-}
-
/* Write pre-formatted packets to file. */
ssize_t
rte_pcapng_write_packets(rte_pcapng_t *self,
struct rte_mbuf *pkts[], uint16_t nb_pkts)
{
- int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
- struct iovec iov[iovcnt];
- unsigned int i, cnt;
- ssize_t ret;
+ struct iovec iov[IOV_MAX];
+ unsigned int i, cnt = 0;
+ ssize_t ret, total = 0;
- for (i = cnt = 0; i < nb_pkts; i++) {
+ for (i = 0; i < nb_pkts; i++) {
struct rte_mbuf *m = pkts[i];
struct pcapng_enhance_packet_block *epb;
@@ -589,6 +572,20 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
return -1;
}
+ /*
+ * Handle case of highly fragmented and large burst size
+ * Note: this assumes that max segments per mbuf < IOV_MAX
+ */
+ if (unlikely(cnt + m->nb_segs >= IOV_MAX)) {
+ ret = writev(self->outfd, iov, cnt);
+ if (unlikely(ret < 0)) {
+ rte_errno = errno;
+ return -1;
+ }
+ total += ret;
+ cnt = 0;
+ }
+
/*
* The DPDK port is recorded during pcapng_copy.
* Map that to PCAPNG interface in file.
@@ -601,10 +598,12 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
} while ((m = m->next));
}
- ret = writev(self->outfd, iov, iovcnt);
- if (unlikely(ret < 0))
+ ret = writev(self->outfd, iov, cnt);
+ if (unlikely(ret < 0)) {
rte_errno = errno;
- return ret;
+ return -1;
+ }
+ return total + ret;
}
/* Create new pcapng writer handle */
--
2.31.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/2] pcapng: check if writev() returns a partial write
2022-07-29 18:14 ` Stephen Hemminger
@ 2022-08-01 8:42 ` Mário Kuka
0 siblings, 0 replies; 15+ messages in thread
From: Mário Kuka @ 2022-08-01 8:42 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, mdr, reshma.pattan
[-- Attachment #1: Type: text/plain, Size: 1767 bytes --]
> As pcapng is used in the dpdk application it writes to a file.
> You could repurpose it to something else, but even a pipe will not
> give partial writes unless you configure the pipe as non-blocking.
>
> Writing to a non-blocking pipe is going to have a load of other problems.
>
> This seems like a purely hypothetical case, can't see why it needs to be addressed.
OK, I'm sending patch v3 which only fixes the IVO_MAX limit issue.
I've removed the changes related to the partial write check.
On 29/07/2022 20:14, Stephen Hemminger wrote:
> On Fri, 29 Jul 2022 19:08:41 +0200
> Mário Kuka <kuka@cesnet.cz> wrote:
>
>>> Since this is being written to a file, handling partial writes makes little
>>> sense. The only case where partial write would happen would be if filesystem
>>> was full. Retrying just adds unnecessary complexity.
>>>
>>> If you really want to track this, then add a dropped counter.
>> But the file descriptor doesn't have to refer to just a regular file, what
>> if it's a socket or a pipe or some device? The pcapng documentation doesn't
>> say anything about any restrictions, so the implementation should be fully
>> generic. What's the point of a function to write packets to a file
>> descriptor
>> where there's a risk that it won't write all the packets or that the
>> file will
>> by corrupted due to a partial write and still not even let me know about
>> it?
> As pcapng is used in the dpdk application it writes to a file.
> You could repurpose it to something else, but even a pipe will not
> give partial writes unless you configure the pipe as non-blocking.
>
> Writing to a non-blocking pipe is going to have a load of other problems.
>
> This seems like a purely hypothetical case, can't see why it needs to be addressed.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4299 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] pcapng: fix write more packets than IOV_MAX limit
2022-08-01 8:40 ` [PATCH v3] pcapng: fix write more packets than IOV_MAX limit Mário Kuka
@ 2022-08-01 15:33 ` Stephen Hemminger
2022-10-10 0:40 ` Thomas Monjalon
0 siblings, 1 reply; 15+ messages in thread
From: Stephen Hemminger @ 2022-08-01 15:33 UTC (permalink / raw)
To: Mário Kuka; +Cc: dev, mdr, reshma.pattan
On Mon, 1 Aug 2022 10:40:56 +0200
Mário Kuka <kuka@cesnet.cz> wrote:
> The rte_pcapng_write_packets() function fails when we try to write more
> packets than the IOV_MAX limit. writev() system call is limited by the
> IOV_MAX limit. The iovcnt argument is valid if it is greater than 0 and
> less than or equal to IOV_MAX as defined in <limits.h>.
>
> To avoid this problem, we can check that all segments of the next
> packet will fit into the iovec buffer, whose capacity will be limited
> by the IOV_MAX limit. If not, we flush the current iovec buffer to the
> file by calling writev() and, if successful, fit the current packet at
> the beginning of the flushed iovec buffer.
>
> Fixes: 8d23ce8f5ee9 ("pcapng: add new library for writing pcapng files")
> Cc: stephen@networkplumber.org
>
> Signed-off-by: Mário Kuka <kuka@cesnet.cz>
Thanks for fixing this.
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] pcapng: fix write more packets than IOV_MAX limit
2022-08-01 15:33 ` Stephen Hemminger
@ 2022-10-10 0:40 ` Thomas Monjalon
0 siblings, 0 replies; 15+ messages in thread
From: Thomas Monjalon @ 2022-10-10 0:40 UTC (permalink / raw)
To: Mário Kuka; +Cc: dev, mdr, reshma.pattan, Stephen Hemminger, stable
01/08/2022 17:33, Stephen Hemminger:
> On Mon, 1 Aug 2022 10:40:56 +0200
> Mário Kuka <kuka@cesnet.cz> wrote:
>
> > The rte_pcapng_write_packets() function fails when we try to write more
> > packets than the IOV_MAX limit. writev() system call is limited by the
> > IOV_MAX limit. The iovcnt argument is valid if it is greater than 0 and
> > less than or equal to IOV_MAX as defined in <limits.h>.
> >
> > To avoid this problem, we can check that all segments of the next
> > packet will fit into the iovec buffer, whose capacity will be limited
> > by the IOV_MAX limit. If not, we flush the current iovec buffer to the
> > file by calling writev() and, if successful, fit the current packet at
> > the beginning of the flushed iovec buffer.
> >
> > Fixes: 8d23ce8f5ee9 ("pcapng: add new library for writing pcapng files")
> > Cc: stephen@networkplumber.org
+ Cc: stable@dpdk.org
> >
> > Signed-off-by: Mário Kuka <kuka@cesnet.cz>
>
> Thanks for fixing this.
>
> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Applied, thanks.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2022-10-10 0:40 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-25 15:28 [PATCH] pcapng: fix write more packets than IOV_MAX limit Mário Kuka
2022-07-25 15:57 ` Stephen Hemminger
2022-07-25 16:10 ` Stephen Hemminger
2022-07-29 7:18 ` [PATCH v2 0/2] pcapng: fix some issues with writing packets Mário Kuka
2022-07-29 7:18 ` [PATCH v2 1/2] pcapng: fix write more packets than IOV_MAX limit Mário Kuka
2022-07-29 7:18 ` [PATCH v2 2/2] pcapng: check if writev() returns a partial write Mário Kuka
2022-07-29 16:00 ` Stephen Hemminger
2022-07-29 17:08 ` Mário Kuka
2022-07-29 18:14 ` Stephen Hemminger
2022-08-01 8:42 ` Mário Kuka
2022-07-29 15:58 ` [PATCH v2 0/2] pcapng: fix some issues with writing packets Stephen Hemminger
2022-07-29 17:33 ` Mário Kuka
2022-08-01 8:40 ` [PATCH v3] pcapng: fix write more packets than IOV_MAX limit Mário Kuka
2022-08-01 15:33 ` Stephen Hemminger
2022-10-10 0:40 ` Thomas Monjalon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).