The vhost_user_read_cb() and rte_vhost_driver_unregister() can be called at the same time by 2 threads. Eg thread1 calls vhost_user_read_cb() and removes the vsocket from conn_list, then thread2 calls rte_vhost_driver_unregister() and frees the vsocket since it is NOT in the conn_list. So thread1 will access invalid memory when trying to reconnect. The fix is to move the "removing of vsocket from conn_list" to end of the vhost_user_read_cb(), then avoid the race condition. The core trace is Program terminated with signal 11, Segmentation fault. Signed-off-by: Zhike Wang <wangzhike@jd.com> --- lib/librte_vhost/socket.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index ebb2ff6..01a9dec 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -318,16 +318,16 @@ struct vhost_user { vhost_destroy_device(conn->vid); + if (vsocket->reconnect) { + create_unix_socket(vsocket); + vhost_user_start_client(vsocket); + } + pthread_mutex_lock(&vsocket->conn_mutex); TAILQ_REMOVE(&vsocket->conn_list, conn, next); pthread_mutex_unlock(&vsocket->conn_mutex); free(conn); - - if (vsocket->reconnect) { - create_unix_socket(vsocket); - vhost_user_start_client(vsocket); - } } } -- 1.8.3.1
On 1/16/20 3:07 AM, Zhike Wang wrote: > The vhost_user_read_cb() and rte_vhost_driver_unregister() can be > called at the same time by 2 threads. Eg thread1 calls vhost_user_read_cb() > and removes the vsocket from conn_list, then thread2 calls > rte_vhost_driver_unregister() and frees the vsocket since it is NOT in the > conn_list. So thread1 will access invalid memory when trying to reconnect. > > The fix is to move the "removing of vsocket from conn_list" to end of the > vhost_user_read_cb(), then avoid the race condition. > > The core trace is > Program terminated with signal 11, Segmentation fault. Need to add the Fixes tag; Fixes: af1475918124 ("vhost: introduce API to start a specific driver") Cc: stable@dpdk.org > Signed-off-by: Zhike Wang <wangzhike@jd.com> > --- > lib/librte_vhost/socket.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c > index ebb2ff6..01a9dec 100644 > --- a/lib/librte_vhost/socket.c > +++ b/lib/librte_vhost/socket.c > @@ -318,16 +318,16 @@ struct vhost_user { > > vhost_destroy_device(conn->vid); > > + if (vsocket->reconnect) { > + create_unix_socket(vsocket); > + vhost_user_start_client(vsocket); > + } > + > pthread_mutex_lock(&vsocket->conn_mutex); > TAILQ_REMOVE(&vsocket->conn_list, conn, next); > pthread_mutex_unlock(&vsocket->conn_mutex); > > free(conn); > - > - if (vsocket->reconnect) { > - create_unix_socket(vsocket); > - vhost_user_start_client(vsocket); > - } > } > } > > No need to send v2, I'll add it when applying to my tree. Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Thanks, Maxime
On 1/16/20 3:07 AM, Zhike Wang wrote:
> The vhost_user_read_cb() and rte_vhost_driver_unregister() can be
> called at the same time by 2 threads. Eg thread1 calls vhost_user_read_cb()
> and removes the vsocket from conn_list, then thread2 calls
> rte_vhost_driver_unregister() and frees the vsocket since it is NOT in the
> conn_list. So thread1 will access invalid memory when trying to reconnect.
>
> The fix is to move the "removing of vsocket from conn_list" to end of the
> vhost_user_read_cb(), then avoid the race condition.
>
> The core trace is
> Program terminated with signal 11, Segmentation fault.
>
> Signed-off-by: Zhike Wang <wangzhike@jd.com>
> ---
> lib/librte_vhost/socket.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
Applied to dpdk-next-virtio tree.
Thanks,
Maxime