DPDK patches and discussions
* [dpdk-dev] Enhance KNI DPDK-app-side to be Multi-Producer/Consumer
@ 2014-11-14 21:05 Sanford, Robert
  2014-11-14 23:44 ` Marc Sune
  2014-11-15  0:04 ` Zhou, Danny
  0 siblings, 2 replies; 5+ messages in thread
From: Sanford, Robert @ 2014-11-14 21:05 UTC (permalink / raw)
  To: Thomas Monjalon, dev

Hello Thomas,

I want to discuss a small enhancement to KNI that we developed. We wanted
to send/receive mbufs between one KNI device and multiple cores, but we
wanted to keep the changes simple. So, here were our requirements:

1. Don't use heavy synchronization when reading/writing the FIFOs in
shared memory.
2. Don't make any one core (data or control plane) perform too much work
shuffling mbufs to/from the FIFOs.
3. Don't use an intermediate RTE ring to drop off mbufs when another core
is already accessing the same KNI FIFO.
4. Since (for our private use case) we don't need MP/MC on the kernel
side, don't change the kernel KNI code at all.
5. Don't change the request/reply logic. It stays single-threaded on both
sides.

Here is what we came up with:

1. Borrow heavily from the librte_ring implementation.
2. In librte_kni structure rte_kni, supplement each rte_kni_fifo (tx, rx,
alloc, and free q) with another private, corresponding structure that
contains a head, tail, mask, and size field.
3. Create kni_fifo_put_mp function with merged logic of kni_fifo_put and
__rte_ring_mp_do_enqueue. After we update the tail index (which is private
to the DPDK-app), we update the FIFO write index (shared between app and
kernel).
4. Create kni_fifo_get_mc function with merged logic of kni_fifo_get and
__rte_ring_mc_do_dequeue. After we update the tail index, update the FIFO
read index.
5. In rte_kni_tx_burst and kni_alloc_mbufs, call kni_fifo_put_mp instead
of kni_fifo_put.
6. In rte_kni_rx_burst and kni_free_bufs, call kni_fifo_get_mc instead of
kni_fifo_get.
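
In outline, the supplemental structure and the publish order look
roughly like this (a sketch only; the shared-FIFO view below is
simplified):

/* Simplified view of the FIFO shared between the app and rte_kni.ko. */
struct rte_kni_fifo {
    volatile uint32_t write;    /* producer index, read by the kernel */
    volatile uint32_t read;     /* consumer index, read by the kernel */
    /* ... element buffer follows ... */
};

/* App-private supplement, one per shared FIFO (tx, rx, alloc, free q).
 * head is claimed by compare-and-set among the app cores; tail is
 * advanced in strict enqueue (or dequeue) order. */
struct rte_kni_fifo_multi {
    volatile uint32_t head;
    volatile uint32_t tail;
    uint32_t mask;              /* size - 1; size is a power of two */
    uint32_t size;
};

/* Enqueue publish order: advance the private tail first, then expose
 * the same value through the shared write index:
 *
 *     prod->tail  = prod_next;    (private to the DPDK app)
 *     fifo->write = prod_next;    (now visible to the kernel)
 */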

We believe this is a common use case, and thus would like to contribute it
to dpdk.org.
Questions/comments:
1. Are you interested in us sending the changes as an RFC?
2. Do you agree with this approach, or would it be better, say, to rewrite
both sides of the interface to be more like librte_ring?
3. Perhaps we could improve upon kni_allocate_mbufs, as it blindly
attempts to allocate and enqueue a constant number of mbufs. We have not
really focused on the kernel ==> app data path, because we were only
interested in app ==> kernel.

--
Regards,
Robert Sanford


* Re: [dpdk-dev] Enhance KNI DPDK-app-side to be Multi-Producer/Consumer
  2014-11-14 21:05 [dpdk-dev] Enhance KNI DPDK-app-side to be Multi-Producer/Consumer Sanford, Robert
@ 2014-11-14 23:44 ` Marc Sune
  2014-11-15  0:04 ` Zhou, Danny
  1 sibling, 0 replies; 5+ messages in thread
From: Marc Sune @ 2014-11-14 23:44 UTC (permalink / raw)
  To: dev

On 14/11/14 22:05, Sanford, Robert wrote:
> Hello Thomas,
>
> I want to discuss a small enhancement to KNI that we developed. We wanted
> to send/receive mbufs between one KNI device and multiple cores, but we
> wanted to keep the changes simple. So, here were our requirements:
>
> 1. Don't use heavy synchronization when reading/writing the FIFOs in
> shared memory.
> 2. Don't make any one core (data or control plane) perform too much work
> shuffling mbufs to/from the FIFOs.
> 3. Don't use an intermediate RTE ring to drop off mbufs when another core
> is already accessing the same KNI FIFO.
> 4. Since (for our private use case) we don't need MP/MC on the kernel
> side, don't change the kernel KNI code at all.
> 5. Don't change the request/reply logic. It stays single-threaded on both
> sides.

I've also taken a quick look during the last few days, but I still need 
to allocate some time to examine it carefully. I haven't done anything 
yet, though, so I would be very much interested to see your patch.

With the current KNI implementation, and even with the very small patch 
([PATCH] Adding RTE_KNI_PREEMPT configuration option) that I sent to the 
list (btw, any volunteer to review it?), we are seeing a very high delay 
with a single-core KNI when comparing the direct kernel path (std. 
driver) against IGB/IXGB->KNI->KERNEL->KNI->IGB/IXGB (I was expecting a 
slightly increased delay, but not by a factor of x100 or more, as I am 
getting). In addition, the jitter is really horrendous (buffering 
between rte_kni and the kernel module?). Btw, we are using a single 
kthread (kni_single).

We cannot yet confirm what I am about to say, but with a sequence of 
KNIs (PHY->KNI1->KNI2->...->PHY), we have observed a huge decrease in 
performance (even with the NO_PREEMPT patch) and, what worries me more, 
I've seen reordering of packets.

Of course, we first have to make sure that the issues mentioned above 
are not a problem of our DPDK-based switch, but the code is similar to 
l3fwd (it is partially based on it). Once I find some time, I will try 
to modify the l2fwd example to try this out and isolate the problem.

One last thing we have observed in our application is the huge number 
of mbufs that KNI interfaces require. But this could be a product of 
buffering in the rte_kni/kernel module, in the kernel itself while 
processing the incoming skbs, or a combination of the two. I also need 
to investigate this further.

> Here is what we came up with:
>
> 1. Borrow heavily from the librte_ring implementation.
> 2. In librte_kni structure rte_kni, supplement each rte_kni_fifo (tx, rx,
> alloc, and free q) with another private, corresponding structure that
> contains a head, tail, mask, and size field.
> 3. Create kni_fifo_put_mp function with merged logic of kni_fifo_put and
> __rte_ring_mp_do_enqueue. After we update the tail index (which is private
> to the DPDK-app), we update the FIFO write index (shared between app and
> kernel).
> 4. Create kni_fifo_get_mc function with merged logic of kni_fifo_get and
> __rte_ring_mc_do_dequeue. After we update the tail index, update the FIFO
> read index.
> 5. In rte_kni_tx_burst and kni_alloc_mbufs, call kni_fifo_put_mp instead
> of kni_fifo_put.
> 6. In rte_kni_rx_burst and kni_free_bufs, call kni_fifo_get_mc instead of
> kni_fifo_get.
>
> We believe this is a common use case, and thus would like to contribute it
> to dpdk.org.
> Questions/comments:
> 1. Are you interested in us sending the changes as an RFC?
I am, personally.
> 2. Do you agree with this approach, or would it be better, say, to rewrite
> both sides of the interface to be more like librte_ring?
I was thinking about something like this.

But one of the original authors can perhaps shed some light here.
> 3. Perhaps we could improve upon kni_allocate_mbufs, as it blindly
> attempts to allocate and enqueue a constant number of mbufs. We have not
> really focused on the kernel ==> app data path, because we were only
> interested in app ==> kernel.
I've seen that and agree. It looks like a waste of cycles, at first 
glance. There is a small patch:

[PATCH] kni: optimizing the rte_kni_rx_burst

that seems to partially address this issue, but as far as I have seen, 
it only addresses the case where no pkts are being sent, not the other 
cases.

@Thomas: I rebased this patch onto master HEAD and (quickly) tried it, 
but I don't have results yet. AFAIU from the patch, it only optimizes 
the case where no buffers are being sent on that KNI, so the 
benchmarking setup seems to be slightly more complicated. In any case, 
perhaps waiting for Robert's contribution and merging both would make 
sense.

Marc

>
> --
> Regards,
> Robert Sanford
>


* Re: [dpdk-dev] Enhance KNI DPDK-app-side to be Multi-Producer/Consumer
  2014-11-14 21:05 [dpdk-dev] Enhance KNI DPDK-app-side to be Multi-Producer/Consumer Sanford, Robert
  2014-11-14 23:44 ` Marc Sune
@ 2014-11-15  0:04 ` Zhou, Danny
  2014-11-19 20:48   ` Robert Sanford
  1 sibling, 1 reply; 5+ messages in thread
From: Zhou, Danny @ 2014-11-15  0:04 UTC (permalink / raw)
  To: Sanford, Robert, Thomas Monjalon, dev

It would always be good if you could submit the RFC patch for the KNI optimization.

On the other hand, do you have any performance data to prove that your patchset improves
KNI performance, which is the concern that most customers care about? We introduced
multi-threaded KNI kernel support last year; if I remember correctly, the key performance
bottleneck we found was the skb alloc/free and the memcpy between skb and mbuf. I would be
very happy if your patchset proves me wrong.

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Sanford, Robert
> Sent: Saturday, November 15, 2014 5:06 AM
> To: Thomas Monjalon; dev@dpdk.org
> Subject: [dpdk-dev] Enhance KNI DPDK-app-side to be Multi-Producer/Consumer
> 
> Hello Thomas,
> 
> I want to discuss a small enhancement to KNI that we developed. We wanted
> to send/receive mbufs between one KNI device and multiple cores, but we
> wanted to keep the changes simple. So, here were our requirements:
> 
> 1. Don't use heavy synchronization when reading/writing the FIFOs in
> shared memory.
> 2. Don't make any one core (data or control plane) perform too much work
> shuffling mbufs to/from the FIFOs.
> 3. Don't use an intermediate RTE ring to drop off mbufs when another core
> is already accessing the same KNI FIFO.
> 4. Since (for our private use case) we don't need MP/MC on the kernel
> side, don't change the kernel KNI code at all.
> 5. Don't change the request/reply logic. It stays single-threaded on both
> sides.
> 
> Here is what we came up with:
> 
> 1. Borrow heavily from the librte_ring implementation.
> 2. In librte_kni structure rte_kni, supplement each rte_kni_fifo (tx, rx,
> alloc, and free q) with another private, corresponding structure that
> contains a head, tail, mask, and size field.
> 3. Create kni_fifo_put_mp function with merged logic of kni_fifo_put and
> __rte_ring_mp_do_enqueue. After we update the tail index (which is private
> to the DPDK-app), we update the FIFO write index (shared between app and
> kernel).
> 4. Create kni_fifo_get_mc function with merged logic of kni_fifo_get and
> __rte_ring_mc_do_dequeue. After we update the tail index, update the FIFO
> read index.
> 5. In rte_kni_tx_burst and kni_alloc_mbufs, call kni_fifo_put_mp instead
> of kni_fifo_put.
> 6. In rte_kni_rx_burst and kni_free_bufs, call kni_fifo_get_mc instead of
> kni_fifo_get.
> 
> We believe this is a common use case, and thus would like to contribute it
> to dpdk.org.
> Questions/comments:
> 1. Are you interested in us sending the changes as an RFC?
> 2. Do you agree with this approach, or would it be better, say, to rewrite
> both sides of the interface to be more like librte_ring?
> 3. Perhaps we could improve upon kni_allocate_mbufs, as it blindly
> attempts to allocate and enqueue a constant number of mbufs. We have not
> really focused on the kernel ==> app data path, because we were only
> interested in app ==> kernel.
> 
> --
> Regards,
> Robert Sanford


* Re: [dpdk-dev] Enhance KNI DPDK-app-side to be Multi-Producer/Consumer
  2014-11-15  0:04 ` Zhou, Danny
@ 2014-11-19 20:48   ` Robert Sanford
  2014-11-20  4:00     ` Zhou, Danny
  0 siblings, 1 reply; 5+ messages in thread
From: Robert Sanford @ 2014-11-19 20:48 UTC (permalink / raw)
  To: Zhou, Danny, dev

Hi Danny,

On Fri, Nov 14, 2014 at 7:04 PM, Zhou, Danny <danny.zhou@intel.com> wrote:

> It would always be good if you could submit the RFC patch for the KNI
> optimization.
>
> On the other hand, do you have any performance data to prove that your
> patchset improves KNI performance, which is the concern that most
> customers care about? We introduced multi-threaded KNI kernel support
> last year; if I remember correctly, the key performance bottleneck we
> found was the skb alloc/free and the memcpy between skb and mbuf. I
> would be very happy if your patchset proves me wrong.



This is not an attempt to improve raw performance. Our modest goal is to
make librte_kni's RX/TX burst APIs multithreaded, without changing
rte_kni.ko. In this RFC patch, we make it possible for multiple cores to
concurrently invoke rte_kni_tx_burst (or rte_kni_rx_burst) for the same KNI
device.

At the moment, multiple cores invoking rte_kni_tx_burst for the same device
cannot function correctly, because the rte_kni_fifo structures (memory
shared between app and kernel driver) are single-producer, single-consumer.
The following patch supplements the rte_kni_fifo structure with an
additional structure that is private to the application, and we borrow
librte_ring's MP/MC enqueue/dequeue logic.

Here is a patch for 1.8. We have only tested a 1.7.1 version. Please have a
look and let us know whether you think something like this would be useful.

--
Thanks,
Robert


Signed-off-by: Robert Sanford <rsanford@akamai.com>

---
 lib/librte_kni/rte_kni.c      |   21 +++++-
 lib/librte_kni/rte_kni_fifo.h |  131 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 148 insertions(+), 4 deletions(-)

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index fdb7509..8009173 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -76,6 +76,11 @@ struct rte_kni {
  struct rte_kni_fifo *alloc_q;       /**< Allocated mbufs queue */
  struct rte_kni_fifo *free_q;        /**< To be freed mbufs queue */

+ struct rte_kni_fifo_multi tx_q_mc;  /**< Make tx_q multi-consumer */
+ struct rte_kni_fifo_multi alloc_q_mp;/**< Make alloc_q multi-producer */
+ struct rte_kni_fifo_multi rx_q_mp;  /**< Make rx_q multi-producer */
+ struct rte_kni_fifo_multi free_q_mc;/**< Make free_q multi-consumer */
+
  /* For request & response */
  struct rte_kni_fifo *req_q;         /**< Request queue */
  struct rte_kni_fifo *resp_q;        /**< Response queue */
@@ -414,6 +419,11 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool,
  kni_fifo_init(ctx->free_q, KNI_FIFO_COUNT_MAX);
  dev_info.free_phys = mz->phys_addr;

+ kni_fifo_multi_init(&ctx->tx_q_mc, KNI_FIFO_COUNT_MAX);
+ kni_fifo_multi_init(&ctx->alloc_q_mp, KNI_FIFO_COUNT_MAX);
+ kni_fifo_multi_init(&ctx->rx_q_mp, KNI_FIFO_COUNT_MAX);
+ kni_fifo_multi_init(&ctx->free_q_mc, KNI_FIFO_COUNT_MAX);
+
  /* Request RING */
  mz = slot->m_req_q;
  ctx->req_q = mz->addr;
@@ -557,7 +567,8 @@ rte_kni_handle_request(struct rte_kni *kni)
 unsigned
 rte_kni_tx_burst(struct rte_kni *kni, struct rte_mbuf **mbufs, unsigned num)
 {
- unsigned ret = kni_fifo_put(kni->rx_q, (void **)mbufs, num);
+ unsigned ret = kni_fifo_put_mp(kni->rx_q, &kni->rx_q_mp, (void **)mbufs,
+ num);

  /* Get mbufs from free_q and then free them */
  kni_free_mbufs(kni);
@@ -568,7 +579,8 @@ rte_kni_tx_burst(struct rte_kni *kni, struct rte_mbuf **mbufs, unsigned num)
 unsigned
 rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf **mbufs, unsigned num)
 {
- unsigned ret = kni_fifo_get(kni->tx_q, (void **)mbufs, num);
+ unsigned ret = kni_fifo_get_mc(kni->tx_q, &kni->tx_q_mc,
+ (void **)mbufs, num);

  /* Allocate mbufs and then put them into alloc_q */
  kni_allocate_mbufs(kni);
@@ -582,7 +594,8 @@ kni_free_mbufs(struct rte_kni *kni)
  int i, ret;
  struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];

- ret = kni_fifo_get(kni->free_q, (void **)pkts, MAX_MBUF_BURST_NUM);
+ ret = kni_fifo_get_mc(kni->free_q, &kni->free_q_mc, (void **)pkts,
+ MAX_MBUF_BURST_NUM);
  if (likely(ret > 0)) {
  for (i = 0; i < ret; i++)
  rte_pktmbuf_free(pkts[i]);
@@ -629,7 +642,7 @@ kni_allocate_mbufs(struct rte_kni *kni)
  if (i <= 0)
  return;

- ret = kni_fifo_put(kni->alloc_q, (void **)pkts, i);
+ ret = kni_fifo_put_mp(kni->alloc_q, &kni->alloc_q_mp, (void **)pkts, i);

  /* Check if any mbufs not put into alloc_q, and then free them */
  if (ret >= 0 && ret < i && ret < MAX_MBUF_BURST_NUM) {
diff --git a/lib/librte_kni/rte_kni_fifo.h b/lib/librte_kni/rte_kni_fifo.h
index 8cb8587..7dccba2 100644
--- a/lib/librte_kni/rte_kni_fifo.h
+++ b/lib/librte_kni/rte_kni_fifo.h
@@ -91,3 +91,134 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void **data, unsigned num)
  fifo->read = new_read;
  return i;
 }
+
+
+/**
+ * Supplemental members to facilitate MP/MC access to KNI FIFOs.
+ */
+struct rte_kni_fifo_multi {
+ volatile uint32_t head;
+ volatile uint32_t tail;
+ uint32_t mask;
+ uint32_t size;
+};
+
+/**
+ * Initialize a kni fifo MP/MC struct.
+ */
+static void
+kni_fifo_multi_init(struct rte_kni_fifo_multi *multi, unsigned size)
+{
+ multi->head = 0;
+ multi->tail = 0;
+ multi->mask = (typeof(multi->mask))(size - 1);
+ multi->size = (typeof(multi->size))size;
+}
+
+/**
+ * Add up to num elements to the fifo. Return the number actually written.
+ *
+ * Multiple-producer version, modeled after __rte_ring_mp_do_enqueue().
+ */
+static inline unsigned
+kni_fifo_put_mp(struct rte_kni_fifo *fifo, struct rte_kni_fifo_multi *prod,
+ void **data, unsigned n)
+{
+ uint32_t prod_head, prod_next;
+ uint32_t cons_tail, free_entries;
+ const unsigned max = n;
+ int success;
+ unsigned i;
+ const uint32_t mask = prod->mask;
+
+ /* Move prod->head atomically. */
+ do {
+ /* Reset n to the initial burst count. */
+ n = max;
+
+ prod_head = prod->head;
+ cons_tail = fifo->read;
+
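+ /* Both indices already live in [0, size); one slot is always kept
+ * empty, so free space is (cons_tail - prod_head - 1) mod size. */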
+ free_entries = (mask + cons_tail - prod_head) & mask;
+
+ /* Check that we have enough room in ring. */
+ if (unlikely(n > free_entries)) {
+ if (unlikely(free_entries == 0))
+ return 0;
+ n = free_entries;
+ }
+
+ prod_next = (prod_head + n) & mask;
+ success = rte_atomic32_cmpset(&prod->head, prod_head, prod_next);
+ } while (unlikely(success == 0));
+
+ /* Write entries in ring. */
+ for (i = 0; i < n; i++) {
+ fifo->buffer[(prod_head + i) & mask] = data[i];
+ }
+ rte_compiler_barrier();
+
+ /* If there are other enqueues in progress that preceded us,
+ * we need to wait for them to complete.
+ */
+ while (unlikely(prod->tail != prod_head))
+ rte_pause();
+
+ prod->tail = prod_next;
+ fifo->write = prod_next;
+ return n;
+}
+
+/**
+ * Get up to num elements from the fifo. Return the number actually read.
+ *
+ * Multiple-consumer version, modeled after __rte_ring_mc_do_dequeue().
+ */
+static inline unsigned
+kni_fifo_get_mc(struct rte_kni_fifo *fifo, struct rte_kni_fifo_multi *cons,
+ void **data, unsigned n)
+{
+ uint32_t cons_head, prod_tail;
+ uint32_t cons_next, entries;
+ const unsigned max = n;
+ int success;
+ unsigned i;
+ const uint32_t mask = cons->mask;
+
+ /* Move cons->head atomically. */
+ do {
+ /* Restore n as it may change every loop. */
+ n = max;
+
+ cons_head = cons->head;
+ prod_tail = fifo->write;
+
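+ /* Equivalent to (prod_tail - cons_head) mod size, as mask + 1 == size. */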
+ entries = (prod_tail - cons_head + mask + 1) & mask;
+
+ /* Set the actual entries for dequeue. */
+ if (n > entries) {
+ if (unlikely(entries == 0))
+ return 0;
+ n = entries;
+ }
+
+ cons_next = (cons_head + n) & mask;
+ success = rte_atomic32_cmpset(&cons->head, cons_head, cons_next);
+ } while (unlikely(success == 0));
+
+ /* Copy entries from ring. */
+ for (i = 0; i < n; i++) {
+ data[i] = fifo->buffer[(cons_head + i) & mask];
+ }
+ rte_compiler_barrier();
+
+ /* If there are other dequeues in progress that preceded us,
+ * we need to wait for them to complete.
+ */
+ while (unlikely(cons->tail != cons_head))
+ rte_pause();
+
+ cons->tail = cons_next;
+ fifo->read = cons_next;
+ return n;
+}
-- 
1.7.1
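
To illustrate the intended use (not part of the patch; get_pkts() below
is just a stand-in for however the application sources its mbufs): with
this change, several lcores can safely share one KNI device's TX path.

#include <rte_kni.h>
#include <rte_mbuf.h>

#define BURST 32

/* App-specific packet source; a stand-in for this sketch. */
extern unsigned get_pkts(struct rte_mbuf **pkts, unsigned n);

/* Run on each worker lcore. Concurrent rte_kni_tx_burst() calls on the
 * same kni are safe with the MP put; with the stock SP/SC FIFO they
 * would race on the shared write index. */
static int
kni_tx_loop(void *arg)
{
    struct rte_kni *kni = arg;
    struct rte_mbuf *pkts[BURST];

    for (;;) {
        unsigned n = get_pkts(pkts, BURST);
        unsigned sent = rte_kni_tx_burst(kni, pkts, n);
        while (sent < n)                       /* drop what didn't fit */
            rte_pktmbuf_free(pkts[sent++]);
    }
    return 0;
}

/* In main(), after rte_kni_alloc():
 *     rte_eal_mp_remote_launch(kni_tx_loop, kni, SKIP_MASTER);
 */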


* Re: [dpdk-dev] Enhance KNI DPDK-app-side to be Multi-Producer/Consumer
  2014-11-19 20:48   ` Robert Sanford
@ 2014-11-20  4:00     ` Zhou, Danny
  0 siblings, 0 replies; 5+ messages in thread
From: Zhou, Danny @ 2014-11-20  4:00 UTC (permalink / raw)
  To: Robert Sanford; +Cc: dev

Robert, I roughly reviewed the code below for the lockless KNI FIFO with MP/MC support, and it looks good to me. Compared to the SP/SC implementation, I think it should introduce a little performance degradation, but it will make your multi-threaded DPDK application easier to program. Do you have a plan to support MP/MC in the kernel part of KNI as well?

From: Robert Sanford [mailto:rsanford2@gmail.com]
Sent: Thursday, November 20, 2014 4:49 AM
To: Zhou, Danny; dev@dpdk.org
Subject: Re: [dpdk-dev] Enhance KNI DPDK-app-side to be Multi-Producer/Consumer

Hi Danny,

On Fri, Nov 14, 2014 at 7:04 PM, Zhou, Danny <danny.zhou@intel.com> wrote:
It would always be good if you could submit the RFC patch for the KNI optimization.

On the other hand, do you have any performance data to prove that your patchset improves
KNI performance, which is the concern that most customers care about? We introduced
multi-threaded KNI kernel support last year; if I remember correctly, the key performance
bottleneck we found was the skb alloc/free and the memcpy between skb and mbuf. I would be
very happy if your patchset proves me wrong.


This is not an attempt to improve raw performance. Our modest goal is to make librte_kni's RX/TX burst APIs multithreaded, without changing rte_kni.ko. In this RFC patch, we make it possible for multiple cores to concurrently invoke rte_kni_tx_burst (or rte_kni_rx_burst) for the same KNI device.

At the moment, multiple cores invoking rte_kni_tx_burst for the same device cannot function correctly, because the rte_kni_fifo structures (memory shared between app and kernel driver) are single-producer, single-consumer. The following patch supplements the rte_kni_fifo structure with an additional structure that is private to the application, and we borrow librte_ring's MP/MC enqueue/dequeue logic.


Here is a patch for 1.8. We have only tested a 1.7.1 version. Please have a look and let us know whether you think something like this would be useful.

--
Thanks,
Robert


Signed-off-by: Robert Sanford <rsanford@akamai.com>

---
 lib/librte_kni/rte_kni.c      |   21 +++++-
 lib/librte_kni/rte_kni_fifo.h |  131 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 148 insertions(+), 4 deletions(-)

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index fdb7509..8009173 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -76,6 +76,11 @@ struct rte_kni {
  struct rte_kni_fifo *alloc_q;       /**< Allocated mbufs queue */
  struct rte_kni_fifo *free_q;        /**< To be freed mbufs queue */

+  struct rte_kni_fifo_multi tx_q_mc;  /**< Make tx_q multi-consumer */
+  struct rte_kni_fifo_multi alloc_q_mp;/**< Make alloc_q multi-producer */
+  struct rte_kni_fifo_multi rx_q_mp;  /**< Make rx_q multi-producer */
+  struct rte_kni_fifo_multi free_q_mc;/**< Make free_q multi-consumer */
+
  /* For request & response */
  struct rte_kni_fifo *req_q;         /**< Request queue */
  struct rte_kni_fifo *resp_q;        /**< Response queue */
@@ -414,6 +419,11 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool,
  kni_fifo_init(ctx->free_q, KNI_FIFO_COUNT_MAX);
  dev_info.free_phys = mz->phys_addr;

+  kni_fifo_multi_init(&ctx->tx_q_mc, KNI_FIFO_COUNT_MAX);
+  kni_fifo_multi_init(&ctx->alloc_q_mp, KNI_FIFO_COUNT_MAX);
+  kni_fifo_multi_init(&ctx->rx_q_mp, KNI_FIFO_COUNT_MAX);
+  kni_fifo_multi_init(&ctx->free_q_mc, KNI_FIFO_COUNT_MAX);
+
  /* Request RING */
  mz = slot->m_req_q;
  ctx->req_q = mz->addr;
@@ -557,7 +567,8 @@ rte_kni_handle_request(struct rte_kni *kni)
 unsigned
 rte_kni_tx_burst(struct rte_kni *kni, struct rte_mbuf **mbufs, unsigned num)
 {
-  unsigned ret = kni_fifo_put(kni->rx_q, (void **)mbufs, num);
+  unsigned ret = kni_fifo_put_mp(kni->rx_q, &kni->rx_q_mp, (void **)mbufs,
+         num);

  /* Get mbufs from free_q and then free them */
  kni_free_mbufs(kni);
@@ -568,7 +579,8 @@ rte_kni_tx_burst(struct rte_kni *kni, struct rte_mbuf **mbufs, unsigned num)
 unsigned
 rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf **mbufs, unsigned num)
 {
-  unsigned ret = kni_fifo_get(kni->tx_q, (void **)mbufs, num);
+  unsigned ret = kni_fifo_get_mc(kni->tx_q, &kni->tx_q_mc,
+         (void **)mbufs, num);

  /* Allocate mbufs and then put them into alloc_q */
  kni_allocate_mbufs(kni);
@@ -582,7 +594,8 @@ kni_free_mbufs(struct rte_kni *kni)
  int i, ret;
  struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];

-  ret = kni_fifo_get(kni->free_q, (void **)pkts, MAX_MBUF_BURST_NUM);
+  ret = kni_fifo_get_mc(kni->free_q, &kni->free_q_mc, (void **)pkts,
+         MAX_MBUF_BURST_NUM);
  if (likely(ret > 0)) {
     for (i = 0; i < ret; i++)
         rte_pktmbuf_free(pkts[i]);
@@ -629,7 +642,7 @@ kni_allocate_mbufs(struct rte_kni *kni)
  if (i <= 0)
     return;

-  ret = kni_fifo_put(kni->alloc_q, (void **)pkts, i);
+  ret = kni_fifo_put_mp(kni->alloc_q, &kni->alloc_q_mp, (void **)pkts, i);

  /* Check if any mbufs not put into alloc_q, and then free them */
  if (ret >= 0 && ret < i && ret < MAX_MBUF_BURST_NUM) {
diff --git a/lib/librte_kni/rte_kni_fifo.h b/lib/librte_kni/rte_kni_fifo.h
index 8cb8587..7dccba2 100644
--- a/lib/librte_kni/rte_kni_fifo.h
+++ b/lib/librte_kni/rte_kni_fifo.h
@@ -91,3 +91,134 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void **data, unsigned num)
  fifo->read = new_read;
  return i;
 }
+
+
+/**
+ * Supplemental members to facilitate MP/MC access to KNI FIFOs.
+ */
+struct rte_kni_fifo_multi {
+  volatile uint32_t head;
+  volatile uint32_t tail;
+  uint32_t mask;
+  uint32_t size;
+};
+
+/**
+ * Initialize a kni fifo MP/MC struct.
+ */
+static void
+kni_fifo_multi_init(struct rte_kni_fifo_multi *multi, unsigned size)
+{
+  multi->head = 0;
+  multi->tail = 0;
+  multi->mask = (typeof(multi->mask))(size - 1);
+  multi->size = (typeof(multi->size))size;
+}
+
+/**
+ * Add up to num elements to the fifo. Return the number actually written.
+ *
+ * Multiple-producer version, modeled after __rte_ring_mp_do_enqueue().
+ */
+static inline unsigned
+kni_fifo_put_mp(struct rte_kni_fifo *fifo, struct rte_kni_fifo_multi *prod,
+     void **data, unsigned n)
+{
+  uint32_t prod_head, prod_next;
+  uint32_t cons_tail, free_entries;
+  const unsigned max = n;
+  int success;
+  unsigned i;
+  const uint32_t mask = prod->mask;
+
+  /* Move prod->head atomically. */
+  do {
+     /* Reset n to the initial burst count. */
+     n = max;
+
+     prod_head = prod->head;
+     cons_tail = fifo->read;
+
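+     /* Both indices already live in [0, size); one slot is always kept
+     * empty, so free space is (cons_tail - prod_head - 1) mod size. */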
+     free_entries = (mask + cons_tail - prod_head) & mask;
+
+     /* Check that we have enough room in ring. */
+     if (unlikely(n > free_entries)) {
+         if (unlikely(free_entries == 0))
+            return 0;
+         n = free_entries;
+     }
+
+     prod_next = (prod_head + n) & mask;
+     success = rte_atomic32_cmpset(&prod->head, prod_head, prod_next);
+  } while (unlikely(success == 0));
+
+  /* Write entries in ring. */
+  for (i = 0; i < n; i++) {
+     fifo->buffer[(prod_head + i) & mask] = data[i];
+  }
+  rte_compiler_barrier();
+
+  /* If there are other enqueues in progress that preceded us,
+  * we need to wait for them to complete.
+  */
+  while (unlikely(prod->tail != prod_head))
+     rte_pause();
+
+  prod->tail = prod_next;
+  fifo->write = prod_next;
+  return n;
+}
+
+/**
+ * Get up to num elements from the fifo. Return the number actually read.
+ *
+ * Multiple-consumer version, modeled after __rte_ring_mc_do_dequeue().
+ */
+static inline unsigned
+kni_fifo_get_mc(struct rte_kni_fifo *fifo, struct rte_kni_fifo_multi *cons,
+     void **data, unsigned n)
+{
+  uint32_t cons_head, prod_tail;
+  uint32_t cons_next, entries;
+  const unsigned max = n;
+  int success;
+  unsigned i;
+  const uint32_t mask = cons->mask;
+
+  /* Move cons->head atomically. */
+  do {
+     /* Restore n as it may change every loop. */
+     n = max;
+
+     cons_head = cons->head;
+     prod_tail = fifo->write;
+
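+     /* Equivalent to (prod_tail - cons_head) mod size, as mask + 1 == size. */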
+     entries = (prod_tail - cons_head + mask + 1) & mask;
+
+     /* Set the actual entries for dequeue. */
+     if (n > entries) {
+         if (unlikely(entries == 0))
+            return 0;
+         n = entries;
+     }
+
+     cons_next = (cons_head + n) & mask;
+     success = rte_atomic32_cmpset(&cons->head, cons_head, cons_next);
+  } while (unlikely(success == 0));
+
+  /* Copy entries from ring. */
+  for (i = 0; i < n; i++) {
+     data[i] = fifo->buffer[(cons_head + i) & mask];
+  }
+  rte_compiler_barrier();
+
+  /* If there are other dequeues in progress that preceded us,
+  * we need to wait for them to complete.
+  */
+  while (unlikely(cons->tail != cons_head))
+     rte_pause();
+
+  cons->tail = cons_next;
+  fifo->read = cons_next;
+  return n;
+}
--
1.7.1



end of thread

Thread overview: 5+ messages
2014-11-14 21:05 [dpdk-dev] Enhance KNI DPDK-app-side to be Multi-Producer/Consumer Sanford, Robert
2014-11-14 23:44 ` Marc Sune
2014-11-15  0:04 ` Zhou, Danny
2014-11-19 20:48   ` Robert Sanford
2014-11-20  4:00     ` Zhou, Danny
