On Thu, 1 Sep 2022 09:33:54 +0200
Anna Tauzzi <admin@argonnetech.net> wrote:
> I'm using the Mellanox ConnectX-5:
>
> pci@0000:3b:00.0 enp59s0f0np0 network MT27800 Family [ConnectX-5]
> pci@0000:3b:00.1 enp59s0f1np1 network MT27800 Family [ConnectX-5]
> pci@0000:3b:00.2 enp59s0f0v0 network MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:00.3 enp59s0f0v1 network MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:00.4 enp59s0f0v2 network MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:00.5 enp59s0f0v3 network MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:04.2 enp59s0f1v0 network MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:04.3 enp59s0f1v1 network MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:04.4 enp59s0f1v2 network MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:04.5 enp59s0f1v3 network MT27800 Family [ConnectX-5 Virtual Function]
>
> This is the message:
> lcore 6 called tx_pkt_burst for not ready port 0
> 8: [/lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7ffff7c77a00]]
> 7: [/lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7ffff7be5b43]]
> 6: [/usr/local/lib/librte_eal.so.22(+0x1559a) [0x7ffff7d8e59a]]
> 5: [build/simple_eth_tx_mp(+0x1a0c7) [0x55555556e0c7]]
> 4: [build/simple_eth_tx_mp(+0x19f89) [0x55555556df89]]
> 3: [build/simple_eth_tx_mp(+0x423c) [0x55555555823c]]
> 2: [/usr/local/lib/librte_ethdev.so.22(+0x7cbc) [0x7ffff7eb3cbc]]
> 1: [/usr/local/lib/librte_eal.so.22(rte_dump_stack+0x32) [0x7ffff7daf152]]
>
> I'm having all sorts of problems with this Mellanox stuff; Intel cards are
> much more user-friendly.
>
> Just to recap:
> * configure on primary and transmit on primary ---> GOOD
>
> * configure on secondary and transmit on secondary ---> SIGSEGV
> Thread 4 "lcore-worker-6" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff4346640 (LWP 7208)]
> rte_eth_tx_burst (port_id=0, queue_id=0, tx_pkts=0x7ffff4344ac0, nb_pkts=1)
> at /usr/local/include/rte_ethdev.h:5650
> 5650 qd = p->txq.data[queue_id];
> (gdb) print p->txq
> $2 = {data = 0x0, clbk = 0x7ffff7f21528 <rte_eth_devices+8296>} (data is NULL)
>
>
> * configure on primary and transmit on secondary ---> PORT NOT READY
>
> Do you know who should be notified of this problem? Should I open a bug on
> DPDK Bugzilla or file it with NVIDIA?
>
> Thx.
>
>
>
> On Thu, 1 Sep 2022 at 03:25 Stephen Hemminger <stephen@networkplumber.org>
> wrote:
>
> > On Wed, 31 Aug 2022 22:59:56 +0200
> > Anna Tauzzi <admin@argonnetech.net> wrote:
> >
> > > I initialize a port with the following methods on a primary process:
> > >
> > > rte_dev_probe(vf)
> > >
> > > rte_eth_dev_configure(port_id, ... );
> > >
> > > rte_eth_dev_adjust_nb_rx_tx_desc(port_id, ... );
> > >
> > > rte_eth_rx_queue_setup(port_id, .... );
> > >
> > > rte_eth_tx_queue_setup(port_id, ... );
> > >
> > > rte_eth_dev_start(port_id ... );
> > >
> > >
> > >
> > > Then I use rte_eth_tx_burst(port_id) in the secondary process, but I get
> > > this message:
> > >
> > > called tx_pkt_burst for not ready port 0
> > >
> > > Is this expected?
> >
> > No, looks like a device driver bug. Which PMD?
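The recap above describes two different failures when the secondary transmits: a logged "not ready port" message followed by a stack dump, and a SIGSEGV because `p->txq.data` is NULL. A toy model can make that distinction concrete. It assumes, for illustration only, that the transmit call dispatches through a per-process table holding a burst function pointer and a queue array for each port. An entry that was reset but never set up falls through to a placeholder that logs and drops, while an entry that was never initialised has no queue array at all, which is the same kind of NULL dereference gdb stopped on. Every name below (`fp_entry`, `fp_reset()`, `fp_setup()`, `model_tx_burst()`) is hypothetical; this is a sketch, not DPDK's or mlx5's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PORTS  4   /* hypothetical table size */
#define MAX_QUEUES 1   /* one TX queue per port in this toy */

/* Hypothetical burst signature: try to send n packets on queue q. */
typedef uint16_t (*tx_burst_fn)(void *q, void **pkts, uint16_t n);

/* Hypothetical per-process, per-port dispatch entry (NOT a DPDK struct). */
struct fp_entry {
    tx_burst_fn tx_pkt_burst;
    void **txq_data;            /* queue pointers, indexed by queue id */
};

/* Static storage: every entry starts zeroed, i.e. no function, no queues. */
static struct fp_entry fp_table[MAX_PORTS];

static void *placeholder_queues[MAX_QUEUES];
static int driver_queue;                        /* stand-in queue object */
static void *driver_queues[MAX_QUEUES] = { &driver_queue };

/* Placeholder for ports known to this process but not set up in it. */
static uint16_t not_ready_tx(void *q, void **pkts, uint16_t n)
{
    (void)q; (void)pkts; (void)n;
    fprintf(stderr, "called tx_pkt_burst for not ready port\n");
    return 0;                                   /* nothing sent */
}

/* Stand-in for a driver burst function: pretends all n packets went out. */
static uint16_t driver_tx(void *q, void **pkts, uint16_t n)
{
    (void)q; (void)pkts;
    return n;
}

/* Entry exists, but the port is not ready in this process. */
static void fp_reset(uint16_t port)
{
    assert(port < MAX_PORTS);
    fp_table[port].tx_pkt_burst = not_ready_tx;
    fp_table[port].txq_data = placeholder_queues;
}

/* Port fully set up in this process. */
static void fp_setup(uint16_t port)
{
    assert(port < MAX_PORTS);
    fp_table[port].tx_pkt_burst = driver_tx;
    fp_table[port].txq_data = driver_queues;
}

/* Dispatch through the table. Without the NULL check, a zeroed entry
 * would index txq_data == NULL, the kind of fault the gdb session shows. */
static uint16_t model_tx_burst(uint16_t port, uint16_t queue,
                               void **pkts, uint16_t n)
{
    assert(port < MAX_PORTS && queue < MAX_QUEUES);
    const struct fp_entry *p = &fp_table[port];
    if (p->txq_data == NULL || p->tx_pkt_burst == NULL) {
        fprintf(stderr, "port %u was never initialised in this process\n",
                (unsigned)port);
        return 0;
    }
    return p->tx_pkt_burst(p->txq_data[queue], pkts, n);
}
```

In the toy, a port that was never touched returns 0 through the guard, a port passed through `fp_reset()` returns 0 with the "not ready" message, and only a port passed through `fp_setup()` actually sends. A dispatch wrapper without that guard would fault on the untouched port instead.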
What versions of rdma-core and the kernel are you using?
There were some bugs around secondary process support in earlier versions.
They have been fixed; some users are running failsafe and mlx5 on Azure with
secondary processes.
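For the version question, the running kernel release can be read with POSIX `uname(2)`. The sketch below uses two hypothetical helpers, `parse_kernel_release()` and `report_kernel()`, to print the release and split out its major and minor numbers for a bug report. It does not cover rdma-core or DPDK itself; the rdma-core version is usually easiest to read from the distribution's package manager.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: parse MAJOR.MINOR from a release string such as
 * "5.15.0-46-generic". Returns 0 on success, -1 if it does not start "N.N". */
static int parse_kernel_release(const char *release, int *major, int *minor)
{
    assert(release != NULL && major != NULL && minor != NULL);
    return sscanf(release, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* Hypothetical helper: print the running kernel release. Returns 0 on success. */
static int report_kernel(void)
{
    struct utsname u;
    int major, minor;

    if (uname(&u) != 0) {
        perror("uname");
        return -1;
    }
    if (parse_kernel_release(u.release, &major, &minor) != 0) {
        fprintf(stderr, "unexpected release string: %s\n", u.release);
        return -1;
    }
    printf("kernel %s (major %d, minor %d, %s)\n",
           u.release, major, minor, u.machine);
    return 0;
}
```

Calling `report_kernel()` from the application, or from a tiny standalone program, prints one line that can be pasted straight into a reply or a Bugzilla entry.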