DPDK patches and discussions
From: "Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)" <jgrajcia@cisco.com>
To: Ferruh Yigit <ferruh.yigit@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
Date: Thu, 2 May 2019 12:35:56 +0000
Message-ID: <1556800556778.14516@cisco.com>
In-Reply-To: <0762da59-4a97-474f-7d67-e3bd8daf50f2@intel.com>


________________________________________
From: Ferruh Yigit <ferruh.yigit@intel.com>
Sent: Monday, March 25, 2019 9:58 PM
To: Jakub Grajciar; dev@dpdk.org
Subject: Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)

On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
> Memory interface (memif), provides high performance
> packet transfer over shared memory.
>
> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>

<...>

> @@ -0,0 +1,200 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2018-2019 Cisco Systems, Inc.
> +
> +======================
> +Memif Poll Mode Driver
> +======================
> +The shared memory packet interface (memif) PMD allows DPDK and any other client
> +using memif (DPDK, VPP, libmemif) to communicate over shared memory. Memif is
> +Linux only.
> +
> +The created device transmits packets in a raw format. It can be used with
> +Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
> +supported in the DPDK memif implementation.
> +
> +Memif works in two roles: master and slave. The slave connects to the master over an
> +existing socket. It is also the producer of the shared memory file and initializes
> +the shared memory. The master creates the socket and listens for any slave
> +connection requests. The socket may already exist on the system. Be sure to
> +remove any such sockets if you are creating a master interface, or you will
> +see an "Address already in use" error. The function ``rte_pmd_memif_remove()``,

Would it be possible to remove this existing socket on successful termination of
the dpdk application with the 'master' role?
Otherwise, every run of a dpdk app requires deleting the socket file first.


    net_memif is the same case as net_virtio_user, so I'd use the same workaround:
    testpmd.c:pmd_test_exit(). This, however, is only valid for testpmd.
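
    For illustration only, the workaround amounts to hot-unplugging the vdev on
    shutdown so the PMD's remove path (rte_pmd_memif_remove()) can clean up the
    listener socket. A minimal sketch, assuming the device was created as
    "net_memif0":

        #include <stdio.h>
        #include <rte_dev.h>

        /* Sketch: detach the memif vdev during application shutdown so the
         * listener socket file is removed together with the device. */
        static void
        app_remove_memif(void)
        {
                if (rte_eal_hotplug_remove("vdev", "net_memif0") != 0)
                        printf("failed to remove net_memif0\n");
        }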


> +which removes the memif interface, will also remove the listener socket if it is
> +not being used by any other interface.
> +
> +The method to enable one or more interfaces is to use the
> +``--vdev=net_memif0`` option on the DPDK application command line. Each
> +additional ``--vdev=net_memif1``, ``--vdev=net_memif2``, etc. option creates
> +another interface named net_memif1, net_memif2, and so on. Memif uses a unix
> +domain socket to transmit control messages. Each memif has a unique id per
> +socket. If you are connecting multiple interfaces using the same socket, be
> +sure to specify unique ids ``id=0``, ``id=1``, etc. Note that if you assign a
> +socket to a master interface, it becomes a listener socket. A listener socket
> +can not be used by a slave interface on the same client.

When I connect two slaves, id=0 & id=1, to the master, both the master and the
second slave crash as soon as the second slave connects. Is this a known issue?
Can you please check this?

master: ./build/app/testpmd -w0:0.0 -l20,21 --vdev net_memif0,role=master -- -i
slave1: ./build/app/testpmd -w0:0.0 -l20,22 --file-prefix=slave --vdev
net_memif0,role=slave,id=0 -- -i
slave2: ./build/app/testpmd -w0:0.0 -l20,23 --file-prefix=slave2 --vdev
net_memif0,role=slave,id=1 -- -i

<...>


    Each interface can be connected to only one peer interface at a time; I'll
    add more details to the documentation.


> +Example: testpmd and testpmd
> +----------------------------
> +In this example we run two instances of the testpmd application and transmit packets over memif.

How does this play with multi-process support? When I run a secondary app while the
memif PMD is enabled in the primary, the secondary process crashes.
Can you please check and ensure at least nothing crashes when a secondary app runs?


    For now I'd disable multi-process support, so we can get the patch applied, then
    provide multi-process support in a separate patch.
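
    For illustration, disabling it could be as simple as rejecting secondary
    processes in the vdev probe path -- a sketch only, assuming the probe entry
    point is named rte_pmd_memif_probe():

        #include <errno.h>
        #include <rte_eal.h>
        #include <rte_bus_vdev.h>

        static int
        rte_pmd_memif_probe(struct rte_vdev_device *vdev)
        {
                /* Sketch: refuse to probe in a secondary process until
                 * multi-process support is implemented. */
                if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
                        MIF_LOG(ERR, "%s: Multi-process not supported.",
                                rte_vdev_device_name(vdev));
                        return -ENOTSUP;
                }

                /* ... normal probe path continues here ... */
                return 0;
        }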


> +
> +First create ``master`` interface::
> +
> +    #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
> +
> +Now create ``slave`` interface (master must be already running so the slave will connect)::
> +
> +    #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
> +
> +Set forwarding mode in one of the instances to 'rx only' and the other to 'tx only'::
> +
> +    testpmd> set fwd rxonly
> +    testpmd> start
> +
> +    testpmd> set fwd txonly
> +    testpmd> start

Would it be useful to add a loopback option to the PMD, for testing/debugging?

Also I am getting low performance numbers above compared to the ring pmd, for
example; is there any performance target for the pmd?
With the same forwarding core for both testpmds the result is terrible, ~40Kpps;
either something is wrong or I am doing something wrong.
With different forwarding cores for each testpmd it is still low, ~18Mpps.

<...>


    The difference between the ring pmd and the memif pmd is that while memif transfers packets,
    ring transfers whole buffers, so it does not have to alloc/free buffers. (Also, ring can not
    be used process to process.)
    I did a simple test where I modified the code so that the tx function only frees the given
    buffers and rx allocates new buffers. On my machine, this can only handle 45-50Mpps.

    With that in mind, I believe that 23Mpps is fine performance. No performance target is
    defined; the goal is to be as fast as possible.

    The cause of the issue, where traffic drops to 40Kpps while using the same core for both applications,
    is that within the timeslice given to the testpmd process, the memif driver fills its queues,
    but keeps polling, not giving the other process a chance to receive the packets.
    The same applies to the rx side, where an empty queue is polled over and over again. In such a
    configuration, interrupt rx-mode should be used instead of polling, or the application could
    suspend the port.
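
    On the application side, the interrupt path would look roughly like this --
    a sketch only, assuming the port supports per-queue rx interrupts:

        #include <string.h>
        #include <rte_ethdev.h>

        /* Sketch: request rx interrupts so a core shared between both
         * processes can sleep on an empty queue instead of spinning. */
        static int
        setup_rx_intr(uint16_t port_id, uint16_t queue_id)
        {
                struct rte_eth_conf conf;
                int ret;

                memset(&conf, 0, sizeof(conf));
                conf.intr_conf.rxq = 1; /* per-queue rx interrupts */

                ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
                if (ret < 0)
                        return ret;

                /* rx/tx queue setup and rte_eth_dev_start() elided */

                return rte_eth_dev_rx_intr_enable(port_id, queue_id);
        }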


> +/**
> + * Buffer descriptor.
> + */
> +typedef struct __rte_packed {
> +     uint16_t flags;                         /**< flags */
> +#define MEMIF_DESC_FLAG_NEXT 1                       /**< is chained buffer */

Is this define used?

<...>


     Yes, see eth_memif_tx() and eth_memif_rx()
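
    To illustrate what the flag drives (a sketch of the rx-side loop, not the
    actual patch code): the receiver keeps consuming descriptors into one mbuf
    chain for as long as the flag is set.

        /* Sketch: follow a chained packet across descriptors. */
        uint16_t slot = cur_slot & mask;
        memif_desc_t *d = &ring->desc[slot];

        for (;;) {
                /* copy d->length bytes from this descriptor into mbuf(s) */
                if ((d->flags & MEMIF_DESC_FLAG_NEXT) == 0)
                        break;          /* last descriptor of this packet */
                cur_slot++;             /* packet continues in the next slot */
                slot = cur_slot & mask;
                d = &ring->desc[slot];
        }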


> +     while (n_slots && n_rx_pkts < nb_pkts) {
> +             mbuf_head = rte_pktmbuf_alloc(mq->mempool);
> +             if (unlikely(mbuf_head == NULL))
> +                     goto no_free_bufs;
> +             mbuf = mbuf_head;
> +             mbuf->port = mq->in_port;
> +
> + next_slot:
> +             s0 = cur_slot & mask;
> +             d0 = &ring->desc[s0];
> +
> +             src_len = d0->length;
> +             dst_off = 0;
> +             src_off = 0;
> +
> +             do {
> +                     dst_len = mbuf_size - dst_off;
> +                     if (dst_len == 0) {
> +                             dst_off = 0;
> +                             dst_len = mbuf_size + RTE_PKTMBUF_HEADROOM;
> +
> +                             mbuf = rte_pktmbuf_alloc(mq->mempool);
> +                             if (unlikely(mbuf == NULL))
> +                                     goto no_free_bufs;
> +                             mbuf->port = mq->in_port;
> +                             rte_pktmbuf_chain(mbuf_head, mbuf);
> +                     }
> +                     cp_len = RTE_MIN(dst_len, src_len);
> +
> +                     rte_pktmbuf_pkt_len(mbuf) =
> +                         rte_pktmbuf_data_len(mbuf) += cp_len;

Can you please make this two separate lines to prevent confusion?

Also, shouldn't 'cp_len' be added to 'mbuf_head->pkt_len'?
'rte_pktmbuf_chain' updates 'mbuf_head', but at that stage 'mbuf->pkt_len' is not
yet set to 'cp_len'...

<...>


    Fix in next patch.
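
    For reference, the corrected accounting could look like this (sketch only,
    not the final patch code): data_len is tracked per segment and the total
    packet length is accumulated on the head mbuf.

        /* Sketch: per-segment data_len, total pkt_len on the head mbuf. */
        rte_pktmbuf_data_len(mbuf) += cp_len;
        rte_pktmbuf_pkt_len(mbuf_head) += cp_len;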


> +             if (r->addr == NULL)
> +                     return;
> +             munmap(r->addr, r->region_size);
> +             if (r->fd > 0) {
> +                     close(r->fd);
> +                     r->fd = -1;
> +             }
> +     }
> +     rte_free(pmd->regions);
> +}
> +
> +static int
> +memif_alloc_regions(struct pmd_internals *pmd, uint8_t brn)
> +{
> +     struct memif_region *r;
> +     char shm_name[32];
> +     int i;
> +     int ret = 0;
> +
> +     r = rte_zmalloc("memif_region", sizeof(struct memif_region) * (brn + 1), 0);
> +     if (r == NULL) {
> +             MIF_LOG(ERR, "%s: Failed to allocate regions.",
> +                     rte_vdev_device_name(pmd->vdev));
> +             return -ENOMEM;
> +     }
> +
> +     pmd->regions = r;
> +     pmd->regions_num = brn + 1;
> +
> +     /*
> +      * Create shm for every region. Region 0 is reserved for descriptors.
> +      * Other regions contain buffers.
> +      */
> +     for (i = 0; i < (brn + 1); i++) {
> +             r = &pmd->regions[i];
> +
> +             r->buffer_offset = (i == 0) ? (pmd->run.num_s2m_rings +
> +                                            pmd->run.num_m2s_rings) *
> +                 (sizeof(memif_ring_t) +
> +                  sizeof(memif_desc_t) * (1 << pmd->run.log2_ring_size)) : 0;

What exactly is "buffer_offset"? For 'region 0' it calculates the size of
'region 0', otherwise 0. This is an offset starting from where? And it seems
to be used only for the size assignment below.


    It is possible to use a single region for one connection (non-zero-copy) to reduce
    the number of files opened; this will be implemented in the next patch.
    'buffer_offset' is the offset from the start of the shared memory file to the first packet buffer.
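
    To put it visually (sketch of the intended single-file region 0 layout):

        /*
         * Region 0 layout (non-zero-copy, sketch):
         *
         *   offset 0        +--------------------------------------+
         *                   | rings (s2m + m2s), each followed by  |
         *                   | its descriptor table                 |
         *   buffer_offset   +--------------------------------------+
         *                   | packet buffers                       |
         *                   +--------------------------------------+
         *
         * i.e. buffer_offset = (num_s2m_rings + num_m2s_rings) *
         *        (sizeof(memif_ring_t) +
         *         sizeof(memif_desc_t) * (1 << log2_ring_size));
         */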


> +                             (1 << pmd->run.log2_ring_size) *
> +                             (pmd->run.num_s2m_rings +
> +                              pmd->run.num_m2s_rings));

There is an illusion that packet buffers can be in multiple regions (the comment
above implies it), but this logic assumes all packet buffers are in the same region
other than region 0, which gives us 'region 1', and this is already hardcoded in a
few places. Is there a benefit to assuming there will be more regions? Wouldn't it
be simpler to state that 'region 0' is for rings and 'region 1' is for packet buffers?


    Multiple regions are required in case of a zero-copy slave. The zero-copy will
    expose dpdk shared memory, where each memseg/memseg_list will be
    represented by a memif_region.


> +int
> +memif_connect(struct rte_eth_dev *dev)
> +{
> +     struct pmd_internals *pmd = dev->data->dev_private;
> +     struct memif_region *mr;
> +     struct memif_queue *mq;
> +     int i;
> +
> +     for (i = 0; i < pmd->regions_num; i++) {
> +             mr = pmd->regions + i;
> +             if (mr != NULL) {
> +                     if (mr->addr == NULL) {
> +                             if (mr->fd < 0)
> +                                     return -1;
> +                             mr->addr = mmap(NULL, mr->region_size,
> +                                             PROT_READ | PROT_WRITE,
> +                                             MAP_SHARED, mr->fd, 0);
> +                             if (mr->addr == NULL)
> +                                     return -1;
> +                     }
> +             }
> +     }

If multiple slaves are not supported, is the 'id' devarg still valid, or is it used at
all?


    'id' is used to identify the peer interface. An interface with id=0 will only connect to an
    interface with id=0, and so on...
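
    For example (hypothetical command lines, both master interfaces listening on the
    default socket), two master interfaces with distinct ids would pair up with the
    two slaves like this:

    master: ./build/app/testpmd -l20,21 --vdev net_memif0,role=master,id=0 --vdev net_memif1,role=master,id=1 -- -i
    slave1: ./build/app/testpmd -l20,22 --file-prefix=slave --vdev net_memif0,role=slave,id=0 -- -i
    slave2: ./build/app/testpmd -l20,23 --file-prefix=slave2 --vdev net_memif0,role=slave,id=1 -- -i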


<...>

> +static int
> +memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
> +          memif_interface_id_t id, uint32_t flags,
> +          const char *socket_filename,
> +          memif_log2_ring_size_t log2_ring_size,
> +          uint16_t buffer_size, const char *secret,
> +          struct ether_addr *eth_addr)
> +{
> +     int ret = 0;
> +     struct rte_eth_dev *eth_dev;
> +     struct rte_eth_dev_data *data;
> +     struct pmd_internals *pmd;
> +     const unsigned int numa_node = vdev->device.numa_node;
> +     const char *name = rte_vdev_device_name(vdev);
> +
> +     if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
> +             MIF_LOG(ERR, "Zero-copy not supported.");
> +             return -1;
> +     }

What is the plan for zero-copy?
Can you please put the status & plan in the documentation?

<...>


    I don't think putting the status in the docs is needed. The zero-copy slave is in the testing
    stage, and I should be able to submit a patch once we get this one applied.


> +
> +     uint16_t last_head;                     /**< last ring head */
> +     uint16_t last_tail;                     /**< last ring tail */

The ring already has head and tail variables; what is the difference with the queue ones?
