Thank you for your interest in the problem. It seems the error message was caused by the option --allow 0000:00.0 being passed to the secondary process as well, by mistake.
The primary correctly completed all the initialization steps:

rte_dev_probe(vf);
rte_eth_dev_configure(port_id, ... );
rte_eth_dev_adjust_nb_rx_tx_desc(port_id, ... );
rte_eth_rx_queue_setup(port_id, ... );
rte_eth_tx_queue_setup(port_id, ... );
rte_eth_dev_start(port_id);

The secondary did nothing apart from calling rte_eth_tx_burst(), but it could not see the port at all because of the wrong --allow option.
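
One way to catch this kind of mismatch early is to check, before EAL initialization, that the secondary's command line actually names the expected device. The following is only a minimal sketch in plain C with no DPDK dependency: the helper name allow_list_has() is hypothetical, and it assumes the allow list is given as "-a ADDR", "--allow ADDR" or "--allow=ADDR". It does not handle the attached short form "-aADDR", and when no allow option is passed at all it simply returns false.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of DPDK): return true if pci_addr,
 * e.g. "0000:3b:00.2", is named by an -a/--allow option in argv.
 * Device arguments after a comma ("ADDR,key=val") are ignored.
 */
bool allow_list_has(int argc, char **argv, const char *pci_addr)
{
    size_t want = strlen(pci_addr);

    for (int i = 1; i < argc; i++) {
        const char *val = NULL;

        if (strcmp(argv[i], "--") == 0)
            break; /* end of EAL options, application options follow */

        if (strncmp(argv[i], "--allow=", 8) == 0)
            val = argv[i] + 8;
        else if ((strcmp(argv[i], "-a") == 0 ||
                  strcmp(argv[i], "--allow") == 0) && i + 1 < argc)
            val = argv[++i]; /* value is the next argument */

        /* Match the address exactly, optionally followed by devargs. */
        if (val != NULL && strncmp(val, pci_addr, want) == 0 &&
            (val[want] == '\0' || val[want] == ','))
            return true;
    }
    return false;
}
```

Called at the top of main() in the secondary with the VF address the primary probed, a false result would have flagged the stray --allow 0000:00.0 before the first rte_eth_tx_burst() call.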

BR,
Anna.


On Thu, 1 Sep 2022 at 17:22, Stephen Hemminger <stephen@networkplumber.org> wrote:
On Thu, 1 Sep 2022 09:33:54 +0200
Anna Tauzzi <admin@argonnetech.net> wrote:

> I'm using the Mellanox ConnectX-5:
>
> pci@0000:3b:00.0  enp59s0f0np0   network        MT27800 Family [ConnectX-5]
> pci@0000:3b:00.1  enp59s0f1np1   network        MT27800 Family [ConnectX-5]
> pci@0000:3b:00.2  enp59s0f0v0    network        MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:00.3  enp59s0f0v1    network        MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:00.4  enp59s0f0v2    network        MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:00.5  enp59s0f0v3    network        MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:04.2  enp59s0f1v0    network        MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:04.3  enp59s0f1v1    network        MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:04.4  enp59s0f1v2    network        MT27800 Family [ConnectX-5 Virtual Function]
> pci@0000:3b:04.5  enp59s0f1v3    network        MT27800 Family [ConnectX-5 Virtual Function]
>
> This is the message:
> lcore 6 called tx_pkt_burst for not ready port 0
> 8: [/lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7ffff7c77a00]]
> 7: [/lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7ffff7be5b43]]
> 6: [/usr/local/lib/librte_eal.so.22(+0x1559a) [0x7ffff7d8e59a]]
> 5: [build/simple_eth_tx_mp(+0x1a0c7) [0x55555556e0c7]]
> 4: [build/simple_eth_tx_mp(+0x19f89) [0x55555556df89]]
> 3: [build/simple_eth_tx_mp(+0x423c) [0x55555555823c]]
> 2: [/usr/local/lib/librte_ethdev.so.22(+0x7cbc) [0x7ffff7eb3cbc]]
> 1: [/usr/local/lib/librte_eal.so.22(rte_dump_stack+0x32) [0x7ffff7daf152]]
>
> I'm having all sorts of problems with this Mellanox stuff; Intel cards are
> much more user-friendly.
>
> Just to recap:
> * configure on primary and transmit on primary           ---> GOOD
>
> * configure on secondary and transmit on secondary  ---> SIGSEGV
> Thread 4 "lcore-worker-6" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff4346640 (LWP 7208)]
> rte_eth_tx_burst (port_id=0, queue_id=0, tx_pkts=0x7ffff4344ac0, nb_pkts=1)
> at /usr/local/include/rte_ethdev.h:5650
> 5650            qd = p->txq.data[queue_id];
> (gdb) print p->txq
> $2 = {data = 0x0, clbk = 0x7ffff7f21528 <rte_eth_devices+8296>} (data is NULL)
>
>
> * configure on primary and transmit on secondary       ---> PORT NOT READY
>
> Do you know who should be notified of this problem? Should I open a bug on
> DPDK bugzilla or file it to NVIDIA?
>
> Thx.
>
>
>
> On Thu, 1 Sep 2022 at 03:25, Stephen Hemminger <stephen@networkplumber.org> wrote:
>
> > On Wed, 31 Aug 2022 22:59:56 +0200
> > Anna Tauzzi <admin@argonnetech.net> wrote:
> > 
> > > I initialize a port with the following methods on a primary process:
> > >
> > > rte_dev_probe(vf)
> > >
> > > rte_eth_dev_configure(port_id, ... );
> > >
> > > rte_eth_dev_adjust_nb_rx_tx_desc(port_id, ... );
> > >
> > > rte_eth_rx_queue_setup(port_id, .... );
> > >
> > > rte_eth_tx_queue_setup(port_id, ... );
> > >
> > > rte_eth_dev_start(port_id ... );
> > >
> > >
> > >
> > > Then I use the rte_eth_tx_burst(port_id) in the secondary process but I get
> > > this message:
> > >
> > > called tx_pkt_burst for not ready port 0
> > >
> > > Is this expected? 
> >
> > No, it looks like a device driver bug. Which PMD?

What versions of rdma-core and the kernel are you using?
There were some bugs in earlier versions around secondary process support.
They have since been fixed; some users are running failsafe and mlx5 on Azure
with secondary processes.