* [PATCH v2 0/5] Add support for live migration and cleanup MCDI headers @ 2022-07-14 8:44 abhimanyu.saini 2022-07-14 8:44 ` [PATCH v2 1/5] common/sfc_efx/base: remove VQ index check during VQ start abhimanyu.saini ` (5 more replies) 0 siblings, 6 replies; 17+ messages in thread From: abhimanyu.saini @ 2022-07-14 8:44 UTC (permalink / raw) To: dev; +Cc: chenbo.xia, maxime.coquelin, andrew.rybchenko, Abhimanyu Saini From: Abhimanyu Saini <absaini@amd.com> In SW assisted live migration, the vDPA driver will stop all virtqueues and set up SW vrings to relay the communication between the virtio driver and the vDPA device using an event-driven relay thread. This will allow the vDPA driver to assist with guest dirty page logging for live migration. Abhimanyu Saini (5): common/sfc_efx/base: remove VQ index check during VQ start common/sfc_efx/base: update MCDI headers common/sfc_efx/base: use the updated definitions of cidx/pidx vdpa/sfc: enable support for multi-queue vdpa/sfc: Add support for SW assisted live migration drivers/common/sfc_efx/base/efx.h | 12 +- drivers/common/sfc_efx/base/efx_regs_mcdi.h | 36 +- drivers/common/sfc_efx/base/rhead_virtio.c | 28 +- drivers/vdpa/sfc/sfc_vdpa.h | 1 + drivers/vdpa/sfc/sfc_vdpa_hw.c | 2 + drivers/vdpa/sfc/sfc_vdpa_ops.c | 345 ++++++++++++++++++-- drivers/vdpa/sfc/sfc_vdpa_ops.h | 17 +- 7 files changed, 378 insertions(+), 63 deletions(-) -- 2.18.2 ^ permalink raw reply [flat|nested] 17+ messages in thread
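The relay mechanism the series builds is easiest to see in isolation. Below is a minimal, hedged C sketch of such an event-driven relay loop; every name in it is invented for illustration (relay_used_ring() stands in for rte_vdpa_relay_vring_used() plus the vhost dirty-page logging), so treat it as a conceptual aid, not driver code.

#include <stdint.h>
#include <unistd.h>
#include <sys/epoll.h>

/* Stand-in for rte_vdpa_relay_vring_used() + dirty page logging. */
static void relay_used_ring(uint32_t qid) { (void)qid; }

static int relay_loop(const int *kickfd, uint32_t nq)
{
	struct epoll_event ev, out[16];
	int epfd = epoll_create1(0);
	uint32_t q;

	if (epfd < 0)
		return -1;
	for (q = 0; q < nq; q++) {
		ev.events = EPOLLIN;
		ev.data.u32 = q;	/* queue id rides in the payload */
		if (epoll_ctl(epfd, EPOLL_CTL_ADD, kickfd[q], &ev) < 0)
			return -1;
	}
	for (;;) {
		int i, n = epoll_wait(epfd, out, 16, -1);

		for (i = 0; i < n; i++) {
			uint64_t cnt;

			/* Drain the guest's kick... */
			if (read(kickfd[out[i].data.u32], &cnt, 8) < 0)
				continue;
			/* ...then relay completions and log dirty pages. */
			relay_used_ring(out[i].data.u32);
		}
	}
}

Each guest kick wakes the thread, which drains the eventfd, copies newly used descriptors back into the guest-visible ring, and marks the touched pages dirty so QEMU can track them during migration.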
* [PATCH v2 1/5] common/sfc_efx/base: remove VQ index check during VQ start 2022-07-14 8:44 [PATCH v2 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini @ 2022-07-14 8:44 ` abhimanyu.saini 2022-07-14 8:44 ` [PATCH v2 2/5] common/sfc_efx/base: update MCDI headers abhimanyu.saini ` (4 subsequent siblings) 5 siblings, 0 replies; 17+ messages in thread From: abhimanyu.saini @ 2022-07-14 8:44 UTC (permalink / raw) To: dev Cc: chenbo.xia, maxime.coquelin, andrew.rybchenko, Abhimanyu Saini, stable From: Abhimanyu Saini <absaini@amd.com> The used/avail queue indexes are not bound by the queue size, because the HW calculates the descriptor entry index by performing a simple modulo between the queue index and the queue size. So, do not check the initial used and avail queue indexes against the queue size, because it is possible for these indexes to be greater than the queue size in the following cases: 1) The queue is created to be migrated into, or 2) The client issues a qstop/qstart after running the datapath Fixes: 4dda72dbdeab3 ("common/sfc_efx/base: add base virtio support for vDPA") Cc: stable@dpdk.org Signed-off-by: Abhimanyu Saini <absaini@amd.com> --- v2: * Fix checkpatch warnings * Add a cover letter drivers/common/sfc_efx/base/rhead_virtio.c | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/drivers/common/sfc_efx/base/rhead_virtio.c b/drivers/common/sfc_efx/base/rhead_virtio.c index 335cb747d1..7f087170fe 100644 --- a/drivers/common/sfc_efx/base/rhead_virtio.c +++ b/drivers/common/sfc_efx/base/rhead_virtio.c @@ -47,14 +47,6 @@ rhead_virtio_qstart( goto fail2; } - if (evvdp != NULL) { - if ((evvdp->evvd_vq_cidx > evvcp->evvc_vq_size) || - (evvdp->evvd_vq_pidx > evvcp->evvc_vq_size)) { - rc = EINVAL; - goto fail3; - } - } - req.emr_cmd = MC_CMD_VIRTIO_INIT_QUEUE; req.emr_in_buf = payload; req.emr_in_length = MC_CMD_VIRTIO_INIT_QUEUE_REQ_LEN; @@ -116,15 +108,13 @@ rhead_virtio_qstart( if (req.emr_rc != 0) { rc = req.emr_rc; - goto fail4; + goto fail3; } evvp->evv_vi_index = vi_index; return (0); -fail4: - EFSYS_PROBE(fail4); fail3: EFSYS_PROBE(fail3); fail2: -- 2.18.2 ^ permalink raw reply [flat|nested] 17+ messages in thread
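A tiny illustration (not taken from the driver; names invented) of the modulo behaviour the commit message describes:

/* The HW derives the ring slot from a free-running index by a simple
 * modulo, so any initial index value maps to a valid slot. */
static inline uint16_t ring_entry(uint32_t idx, uint16_t queue_size)
{
	return (uint16_t)(idx % queue_size);
}
/* e.g. queue_size = 256: ring_entry(70000, 256) == 112, even though
 * 70000 is far larger than the queue size. */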
* [PATCH v2 2/5] common/sfc_efx/base: update MCDI headers 2022-07-14 8:44 [PATCH v2 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini 2022-07-14 8:44 ` [PATCH v2 1/5] common/sfc_efx/base: remove VQ index check during VQ start abhimanyu.saini @ 2022-07-14 8:44 ` abhimanyu.saini 2022-07-14 8:44 ` [PATCH v2 3/5] common/sfc_efx/base: use the updated definitions of cidx/pidx abhimanyu.saini ` (3 subsequent siblings) 5 siblings, 0 replies; 17+ messages in thread From: abhimanyu.saini @ 2022-07-14 8:44 UTC (permalink / raw) To: dev; +Cc: chenbo.xia, maxime.coquelin, andrew.rybchenko, Abhimanyu Saini From: Abhimanyu Saini <absaini@amd.com> Regenerate MCDI headers from smartnic_registry:72940ad Signed-off-by: Abhimanyu Saini <absaini@amd.com> --- v2: * Fix checkpatch warnings * Add a cover letter drivers/common/sfc_efx/base/efx_regs_mcdi.h | 36 ++++++++++++++------- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/drivers/common/sfc_efx/base/efx_regs_mcdi.h b/drivers/common/sfc_efx/base/efx_regs_mcdi.h index 2daf825a36..d1d8093601 100644 --- a/drivers/common/sfc_efx/base/efx_regs_mcdi.h +++ b/drivers/common/sfc_efx/base/efx_regs_mcdi.h @@ -28071,18 +28071,26 @@ #define MC_CMD_VIRTIO_INIT_QUEUE_REQ_FEATURES_HI_WIDTH 32 /* Enum values, see field(s): */ /* MC_CMD_VIRTIO_GET_FEATURES/MC_CMD_VIRTIO_GET_FEATURES_OUT/FEATURES */ -/* The initial producer index for this queue's used ring. If this queue is - * being created to be migrated into, this should be the FINAL_PIDX value - * returned by MC_CMD_VIRTIO_FINI_QUEUE of the queue being migrated from. - * Otherwise, it should be zero. +/* The initial available index for this virtqueue. If this queue is being + * created to be migrated into, this should be the FINAL_AVAIL_IDX value + * returned by MC_CMD_VIRTIO_FINI_QUEUE of the queue being migrated from (or + * equivalent if the original queue was on a thirdparty device). Otherwise, it + * should be zero. */ +#define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_AVAIL_IDX_OFST 56 +#define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_AVAIL_IDX_LEN 4 +/* Alias of INITIAL_AVAIL_IDX, kept for compatibility. */ #define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_PIDX_OFST 56 #define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_PIDX_LEN 4 -/* The initial consumer index for this queue's available ring. If this queue is - * being created to be migrated into, this should be the FINAL_CIDX value - * returned by MC_CMD_VIRTIO_FINI_QUEUE of the queue being migrated from. - * Otherwise, it should be zero. - */ +/* The initial used index for this virtqueue. If this queue is being created to + * be migrated into, this should be the FINAL_USED_IDX value returned by + * MC_CMD_VIRTIO_FINI_QUEUE of the queue being migrated from (or equivalent if + * the original queue was on a thirdparty device). Otherwise, it should be + * zero. + */ +#define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_USED_IDX_OFST 60 +#define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_USED_IDX_LEN 4 +/* Alias of INITIAL_USED_IDX, kept for compatibility. */ #define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_CIDX_OFST 60 #define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_CIDX_LEN 4 /* A MAE_MPORT_SELECTOR defining which mport this queue should be associated @@ -28128,10 +28136,16 @@ /* MC_CMD_VIRTIO_FINI_QUEUE_RESP msgresponse */ #define MC_CMD_VIRTIO_FINI_QUEUE_RESP_LEN 8 -/* The producer index of the used ring when the queue was stopped.
*/ +#define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_AVAIL_IDX_OFST 0 +#define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_AVAIL_IDX_LEN 4 +/* Alias of FINAL_AVAIL_IDX, kept for compatibility. */ #define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_PIDX_OFST 0 #define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_PIDX_LEN 4 -/* The consumer index of the available ring when the queue was stopped. */ +/* The used index of the virtqueue when the queue was stopped. */ +#define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_USED_IDX_OFST 4 +#define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_USED_IDX_LEN 4 +/* Alias of FINAL_USED_IDX, kept for compatibility. */ #define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_CIDX_OFST 4 #define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_CIDX_LEN 4 -- 2.18.2 ^ permalink raw reply [flat|nested] 17+ messages in thread
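Because the patch documents the legacy PIDX/CIDX names as pure aliases of the new AVAIL_IDX/USED_IDX names, the header invites a compile-time cross-check. The following sanity sketch is not part of the patch; it only restates, via C11 static_assert, what the alias comments above already promise (offsets 56/56, 60/60, 0/0 and 4/4):

#include <assert.h>

static_assert(MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_PIDX_OFST ==
	      MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_AVAIL_IDX_OFST,
	      "INITIAL_PIDX must alias INITIAL_AVAIL_IDX");
static_assert(MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_CIDX_OFST ==
	      MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_USED_IDX_OFST,
	      "INITIAL_CIDX must alias INITIAL_USED_IDX");
static_assert(MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_CIDX_OFST ==
	      MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_USED_IDX_OFST,
	      "FINAL_CIDX must alias FINAL_USED_IDX");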
* [PATCH v2 3/5] common/sfc_efx/base: use the updated definitions of cidx/pidx 2022-07-14 8:44 [PATCH v2 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini 2022-07-14 8:44 ` [PATCH v2 1/5] common/sfc_efx/base: remove VQ index check during VQ start abhimanyu.saini 2022-07-14 8:44 ` [PATCH v2 2/5] common/sfc_efx/base: update MCDI headers abhimanyu.saini @ 2022-07-14 8:44 ` abhimanyu.saini 2022-07-14 8:44 ` [PATCH v2 4/5] vdpa/sfc: enable support for multi-queue abhimanyu.saini ` (2 subsequent siblings) 5 siblings, 0 replies; 17+ messages in thread From: abhimanyu.saini @ 2022-07-14 8:44 UTC (permalink / raw) To: dev; +Cc: chenbo.xia, maxime.coquelin, andrew.rybchenko, Abhimanyu Saini From: Abhimanyu Saini <absaini@amd.com> Change the cidx and pidx definitions to mean the used index and the avail index of the virtqueue, respectively. Signed-off-by: Abhimanyu Saini <absaini@amd.com> --- v2: * Fix checkpatch warnings * Add a cover letter drivers/common/sfc_efx/base/efx.h | 12 ++++++------ drivers/common/sfc_efx/base/rhead_virtio.c | 16 ++++++++-------- drivers/vdpa/sfc/sfc_vdpa_ops.c | 4 ++-- 3 files changed, 16 insertions(+), 16 deletions(-) diff --git a/drivers/common/sfc_efx/base/efx.h b/drivers/common/sfc_efx/base/efx.h index 95f5fb6bc0..c19205c638 100644 --- a/drivers/common/sfc_efx/base/efx.h +++ b/drivers/common/sfc_efx/base/efx.h @@ -4886,17 +4886,17 @@ typedef enum efx_virtio_vq_type_e { typedef struct efx_virtio_vq_dyncfg_s { /* - * If queue is being created to be migrated then this - * should be the FINAL_PIDX value returned by MC_CMD_VIRTIO_FINI_QUEUE + * If queue is being created to be migrated then this should be + * the FINAL_AVAIL_IDX value returned by MC_CMD_VIRTIO_FINI_QUEUE * of the queue being migrated from. Otherwise, it should be zero. */ - uint32_t evvd_vq_pidx; + uint32_t evvd_vq_avail_idx; /* - * If this queue is being created to be migrated then this - * should be the FINAL_CIDX value returned by MC_CMD_VIRTIO_FINI_QUEUE + * If queue is being created to be migrated then this should be + * the FINAL_USED_IDX value returned by MC_CMD_VIRTIO_FINI_QUEUE * of the queue being migrated from. Otherwise, it should be zero.
*/ - uint32_t evvd_vq_cidx; + uint32_t evvd_vq_used_idx; } efx_virtio_vq_dyncfg_t; /* diff --git a/drivers/common/sfc_efx/base/rhead_virtio.c b/drivers/common/sfc_efx/base/rhead_virtio.c index 7f087170fe..5a2ebe8822 100644 --- a/drivers/common/sfc_efx/base/rhead_virtio.c +++ b/drivers/common/sfc_efx/base/rhead_virtio.c @@ -95,10 +95,10 @@ rhead_virtio_qstart( evvcp->evcc_features >> 32); if (evvdp != NULL) { - MCDI_IN_SET_DWORD(req, VIRTIO_INIT_QUEUE_REQ_INITIAL_PIDX, - evvdp->evvd_vq_pidx); - MCDI_IN_SET_DWORD(req, VIRTIO_INIT_QUEUE_REQ_INITIAL_CIDX, - evvdp->evvd_vq_cidx); + MCDI_IN_SET_DWORD(req, VIRTIO_INIT_QUEUE_REQ_INITIAL_AVAIL_IDX, + evvdp->evvd_vq_avail_idx); + MCDI_IN_SET_DWORD(req, VIRTIO_INIT_QUEUE_REQ_INITIAL_USED_IDX, + evvdp->evvd_vq_used_idx); } MCDI_IN_SET_DWORD(req, VIRTIO_INIT_QUEUE_REQ_MPORT_SELECTOR, @@ -161,10 +161,10 @@ rhead_virtio_qstop( } if (evvdp != NULL) { - evvdp->evvd_vq_pidx = - MCDI_OUT_DWORD(req, VIRTIO_FINI_QUEUE_RESP_FINAL_PIDX); - evvdp->evvd_vq_cidx = - MCDI_OUT_DWORD(req, VIRTIO_FINI_QUEUE_RESP_FINAL_CIDX); + evvdp->evvd_vq_avail_idx = + MCDI_OUT_DWORD(req, VIRTIO_FINI_QUEUE_RESP_FINAL_AVAIL_IDX); + evvdp->evvd_vq_used_idx = + MCDI_OUT_DWORD(req, VIRTIO_FINI_QUEUE_RESP_FINAL_USED_IDX); } return (0); diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c index b84699d234..f4c4f82605 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_ops.c +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c @@ -258,8 +258,8 @@ sfc_vdpa_virtq_start(struct sfc_vdpa_ops_data *ops_data, int vq_num) vq_cfg.evvc_used_ring_addr = vring.used; vq_cfg.evvc_vq_size = vring.size; - vq_dyncfg.evvd_vq_pidx = vring.last_used_idx; - vq_dyncfg.evvd_vq_cidx = vring.last_avail_idx; + vq_dyncfg.evvd_vq_used_idx = vring.last_used_idx; + vq_dyncfg.evvd_vq_avail_idx = vring.last_avail_idx; /* MSI-X vector is function-relative */ vq_cfg.evvc_msix_vector = RTE_INTR_VEC_RXTX_OFFSET + vq_num; -- 2.18.2 ^ permalink raw reply [flat|nested] 17+ messages in thread
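A hedged sketch of the stop/start round trip these fields support. efx_virtio_qstop() is invoked exactly as in the hunk above; the efx_virtio_qstart() parameter list is assumed from its use elsewhere in sfc_vdpa_ops.c and abridged here:

/* Sketch (assumed context): carry virtqueue indexes across a
 * stop/start cycle, e.g. when restoring a migrated queue. */
static int
restart_vq(efx_virtio_vq_t *vq, efx_virtio_vq_cfg_t *vq_cfg)
{
	efx_virtio_vq_dyncfg_t saved;
	int rc;

	/* Fills evvd_vq_avail_idx/evvd_vq_used_idx from FINAL_*. */
	rc = efx_virtio_qstop(vq, &saved);
	if (rc != 0)
		return rc;

	/* Feeds them back as INITIAL_AVAIL_IDX/INITIAL_USED_IDX. */
	return efx_virtio_qstart(vq, vq_cfg, &saved);
}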
* [PATCH v2 4/5] vdpa/sfc: enable support for multi-queue 2022-07-14 8:44 [PATCH v2 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini ` (2 preceding siblings ...) 2022-07-14 8:44 ` [PATCH v2 3/5] common/sfc_efx/base: use the updated definitions of cidx/pidx abhimanyu.saini @ 2022-07-14 8:44 ` abhimanyu.saini 2022-07-14 8:44 ` [PATCH v2 5/5] vdpa/sfc: Add support for SW assisted live migration abhimanyu.saini 2022-07-14 13:47 ` [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini 5 siblings, 0 replies; 17+ messages in thread From: abhimanyu.saini @ 2022-07-14 8:44 UTC (permalink / raw) To: dev; +Cc: chenbo.xia, maxime.coquelin, andrew.rybchenko, Abhimanyu Saini From: Abhimanyu Saini <absaini@amd.com> Increase the default number of RX/TX queue pairs to 8, and add the MQ feature flag to the vDPA protocol features. Signed-off-by: Abhimanyu Saini <absaini@amd.com> --- v2: * Fix checkpatch warnings * Add a cover letter drivers/vdpa/sfc/sfc_vdpa_hw.c | 2 ++ drivers/vdpa/sfc/sfc_vdpa_ops.c | 10 ++++++---- drivers/vdpa/sfc/sfc_vdpa_ops.h | 2 +- 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/vdpa/sfc/sfc_vdpa_hw.c b/drivers/vdpa/sfc/sfc_vdpa_hw.c index a7018b1ffe..edb7e35c2c 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_hw.c +++ b/drivers/vdpa/sfc/sfc_vdpa_hw.c @@ -286,6 +286,8 @@ SFC_VDPA_ASSERT(max_queue_cnt > 0); sva->max_queue_count = max_queue_cnt; + sfc_vdpa_log_init(sva, "NIC init done with %u pair(s) of queues", + max_queue_cnt); return 0; diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c index f4c4f82605..6401d4e16f 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_ops.c +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c @@ -24,14 +24,16 @@ (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \ (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \ (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \ - (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD)) + (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) | \ + (1ULL << VHOST_USER_PROTOCOL_F_MQ)) /* * Set of features which are enabled by default. * Protocol feature bit is needed to enable notification notifier ctrl. */ #define SFC_VDPA_DEFAULT_FEATURES \ - (1ULL << VHOST_USER_F_PROTOCOL_FEATURES) + ((1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \ + (1ULL << VIRTIO_NET_F_MQ)) #define SFC_VDPA_MSIX_IRQ_SET_BUF_LEN \ (sizeof(struct vfio_irq_set) + \ @@ -321,8 +323,8 @@ /* stop the vq */ rc = efx_virtio_qstop(vq, &vq_idx); if (rc == 0) { - ops_data->vq_cxt[vq_num].cidx = vq_idx.evvd_vq_cidx; - ops_data->vq_cxt[vq_num].pidx = vq_idx.evvd_vq_pidx; + ops_data->vq_cxt[vq_num].cidx = vq_idx.evvd_vq_used_idx; + ops_data->vq_cxt[vq_num].pidx = vq_idx.evvd_vq_avail_idx; } ops_data->vq_cxt[vq_num].enable = B_FALSE; diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.h b/drivers/vdpa/sfc/sfc_vdpa_ops.h index 9dbd5b84dd..5c8e352de3 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_ops.h +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.h @@ -7,7 +7,7 @@ #include <rte_vdpa.h> -#define SFC_VDPA_MAX_QUEUE_PAIRS 1 +#define SFC_VDPA_MAX_QUEUE_PAIRS 8 enum sfc_vdpa_context { SFC_VDPA_AS_VF -- 2.18.2 ^ permalink raw reply [flat|nested] 17+ messages in thread
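A hedged sketch of how a backend might act on the negotiated MQ bit. active_queue_pairs() and requested_pairs are illustrative names; rte_vhost_get_negotiated_features(), VIRTIO_NET_F_MQ and SFC_VDPA_MAX_QUEUE_PAIRS are the symbols this series actually uses:

static uint16_t
active_queue_pairs(int vid, uint16_t requested_pairs)
{
	uint64_t features = 0;

	if (rte_vhost_get_negotiated_features(vid, &features) != 0 ||
	    !(features & (1ULL << VIRTIO_NET_F_MQ)))
		return 1;	/* MQ not negotiated: single pair */
	return RTE_MIN(requested_pairs, (uint16_t)SFC_VDPA_MAX_QUEUE_PAIRS);
}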
* [PATCH v2 5/5] vdpa/sfc: Add support for SW assisted live migration 2022-07-14 8:44 [PATCH v2 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini ` (3 preceding siblings ...) 2022-07-14 8:44 ` [PATCH v2 4/5] vdpa/sfc: enable support for multi-queue abhimanyu.saini @ 2022-07-14 8:44 ` abhimanyu.saini 2022-07-14 13:47 ` [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini 5 siblings, 0 replies; 17+ messages in thread From: abhimanyu.saini @ 2022-07-14 8:44 UTC (permalink / raw) To: dev; +Cc: chenbo.xia, maxime.coquelin, andrew.rybchenko, Abhimanyu Saini From: Abhimanyu Saini <absaini@amd.com> In SW assisted live migration, the vDPA driver will stop all virtqueues and set up SW vrings to relay the communication between the virtio driver and the vDPA device using an event-driven relay thread. This will allow the vDPA driver to assist with guest dirty page logging for live migration. Signed-off-by: Abhimanyu Saini <absaini@amd.com> --- v2: * Fix checkpatch warnings * Add a cover letter drivers/vdpa/sfc/sfc_vdpa.h | 1 + drivers/vdpa/sfc/sfc_vdpa_ops.c | 337 ++++++++++++++++++++++++++++++-- drivers/vdpa/sfc/sfc_vdpa_ops.h | 15 +- 3 files changed, 330 insertions(+), 23 deletions(-) diff --git a/drivers/vdpa/sfc/sfc_vdpa.h b/drivers/vdpa/sfc/sfc_vdpa.h index daeb27d4cd..ae522caebe 100644 --- a/drivers/vdpa/sfc/sfc_vdpa.h +++ b/drivers/vdpa/sfc/sfc_vdpa.h @@ -18,6 +18,7 @@ #define SFC_VDPA_MAC_ADDR "mac" #define SFC_VDPA_DEFAULT_MCDI_IOVA 0x200000000000 +#define SFC_SW_VRING_IOVA 0x300000000000 /* Broadcast & Unicast MAC filters are supported */ #define SFC_MAX_SUPPORTED_FILTERS 3 diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c index 6401d4e16f..1d29ee7187 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_ops.c +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c @@ -4,10 +4,13 @@ #include <pthread.h> #include <unistd.h> +#include <sys/epoll.h> #include <sys/ioctl.h> +#include <rte_eal_paging.h> #include <rte_errno.h> #include <rte_malloc.h> +#include <rte_memory.h> #include <rte_vdpa.h> #include <rte_vfio.h> #include <rte_vhost.h> @@ -33,7 +36,9 @@ */ #define SFC_VDPA_DEFAULT_FEATURES \ ((1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \ - (1ULL << VIRTIO_NET_F_MQ)) + (1ULL << VIRTIO_NET_F_MQ) | \ + (1ULL << VHOST_F_LOG_ALL) | \ + (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE)) #define SFC_VDPA_MSIX_IRQ_SET_BUF_LEN \ (sizeof(struct vfio_irq_set) + \ @@ -42,6 +47,142 @@ /* It will be used for target VF when calling function is not PF */ #define SFC_VDPA_VF_NULL 0xFFFF +#define SFC_VDPA_DECODE_FD(data) (data.u64 >> 32) +#define SFC_VDPA_DECODE_QID(data) (data.u32 >> 1) +#define SFC_VDPA_DECODE_EV_TYPE(data) (data.u32 & 1) + +/* + * Create q_num number of epoll events for kickfd interrupts + * and q_num/2 events for callfd interrupts. Round up the + * total to (q_num * 2) number of events.
+ */ +#define SFC_VDPA_SW_RELAY_EVENT_NUM(q_num) (q_num * 2) + +static inline uint64_t +sfc_vdpa_encode_ev_data(int type, uint32_t qid, int fd) +{ + SFC_VDPA_ASSERT(fd > UINT32_MAX || qid > UINT32_MAX / 2); + return type | (qid << 1) | (uint64_t)fd << 32; +} + +static inline void +sfc_vdpa_queue_relay(struct sfc_vdpa_ops_data *ops_data, uint32_t qid) +{ + rte_vdpa_relay_vring_used(ops_data->vid, qid, &ops_data->sw_vq[qid]); + rte_vhost_vring_call(ops_data->vid, qid); +} + +static void* +sfc_vdpa_sw_relay(void *data) +{ + uint64_t buf; + uint32_t qid, q_num; + struct epoll_event ev; + struct rte_vhost_vring vring; + int nbytes, i, ret, fd, epfd, nfds = 0; + struct epoll_event events[SFC_VDPA_MAX_QUEUE_PAIRS * 2]; + struct sfc_vdpa_ops_data *ops_data = (struct sfc_vdpa_ops_data *)data; + + q_num = rte_vhost_get_vring_num(ops_data->vid); + epfd = epoll_create(SFC_VDPA_SW_RELAY_EVENT_NUM(q_num)); + if (epfd < 0) { + sfc_vdpa_log_init(ops_data->dev_handle, + "failed to create epoll instance"); + goto fail_epoll; + } + ops_data->epfd = epfd; + + vring.kickfd = -1; + for (qid = 0; qid < q_num; qid++) { + ev.events = EPOLLIN | EPOLLPRI; + ret = rte_vhost_get_vhost_vring(ops_data->vid, qid, &vring); + if (ret != 0) { + sfc_vdpa_log_init(ops_data->dev_handle, + "rte_vhost_get_vhost_vring error %s", + strerror(errno)); + goto fail_vring; + } + + ev.data.u64 = sfc_vdpa_encode_ev_data(0, qid, vring.kickfd); + if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) { + sfc_vdpa_log_init(ops_data->dev_handle, + "epoll add error: %s", + strerror(errno)); + goto fail_epoll_add; + } + } + + /* + * Register intr_fd created by vDPA driver in lieu of qemu's callfd + * to intercept rx queue notifications, so that we can monitor rx + * notifications and issue rte_vdpa_relay_vring_used() + */ + for (qid = 0; qid < q_num; qid += 2) { + fd = ops_data->intr_fd[qid]; + ev.events = EPOLLIN | EPOLLPRI; + ev.data.u64 = sfc_vdpa_encode_ev_data(1, qid, fd); + if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) { + sfc_vdpa_log_init(ops_data->dev_handle, + "epoll add error: %s", + strerror(errno)); + goto fail_epoll_add; + } + sfc_vdpa_queue_relay(ops_data, qid); + } + + /* + * virtio driver in VM was continuously sending queue notifications + * while we were setting up software vrings and hence the HW misses + * these doorbell notifications. Since it is safe to send a duplicate + * doorbell, send another doorbell from the vDPA driver.
+ */ + for (qid = 0; qid < q_num; qid++) + rte_write16(qid, ops_data->vq_cxt[qid].doorbell); + + for (;;) { + nfds = epoll_wait(epfd, events, + SFC_VDPA_SW_RELAY_EVENT_NUM(q_num), -1); + if (nfds < 0) { + if (errno == EINTR) + continue; + sfc_vdpa_log_init(ops_data->dev_handle, + "epoll_wait return fail\n"); + goto fail_epoll_wait; + } + + for (i = 0; i < nfds; i++) { + fd = SFC_VDPA_DECODE_FD(events[i].data); + /* Ensure kickfd is not busy before proceeding */ + for (;;) { + nbytes = read(fd, &buf, 8); + if (nbytes < 0) { + if (errno == EINTR || + errno == EWOULDBLOCK || + errno == EAGAIN) + continue; + } + break; + } + + qid = SFC_VDPA_DECODE_QID(events[i].data); + if (SFC_VDPA_DECODE_EV_TYPE(events[i].data)) + sfc_vdpa_queue_relay(ops_data, qid); + else + rte_write16(qid, ops_data->vq_cxt[qid].doorbell); + } + } + + return NULL; + +fail_epoll: +fail_vring: +fail_epoll_add: +fail_epoll_wait: + close(epfd); + ops_data->epfd = -1; + return NULL; +} + static int sfc_vdpa_get_device_features(struct sfc_vdpa_ops_data *ops_data) { @@ -99,7 +240,7 @@ static int sfc_vdpa_enable_vfio_intr(struct sfc_vdpa_ops_data *ops_data) { - int rc; + int rc, fd; int *irq_fd_ptr; int vfio_dev_fd; uint32_t i, num_vring; @@ -131,6 +272,17 @@ return -1; irq_fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = vring.callfd; + if (ops_data->sw_fallback_mode && !(i & 1)) { + fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); + if (fd < 0) { + sfc_vdpa_err(ops_data->dev_handle, + "failed to create eventfd"); + goto fail_eventfd; + } + ops_data->intr_fd[i] = fd; + irq_fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = fd; + } else + ops_data->intr_fd[i] = -1; } rc = ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); @@ -138,16 +290,26 @@ sfc_vdpa_err(ops_data->dev_handle, "error enabling MSI-X interrupts: %s", strerror(errno)); - return -1; + goto fail_ioctl; } return 0; + +fail_ioctl: +fail_eventfd: + for (i = 0; i < num_vring; i++) { + if (ops_data->intr_fd[i] != -1) { + close(ops_data->intr_fd[i]); + ops_data->intr_fd[i] = -1; + } + } + return -1; } static int sfc_vdpa_disable_vfio_intr(struct sfc_vdpa_ops_data *ops_data) { - int rc; + int rc, i; int vfio_dev_fd; struct vfio_irq_set irq_set; void *dev; @@ -161,6 +323,12 @@ irq_set.index = VFIO_PCI_MSIX_IRQ_INDEX; irq_set.start = 0; + for (i = 0; i < ops_data->vq_count; i++) { + if (ops_data->intr_fd[i] >= 0) + close(ops_data->intr_fd[i]); + ops_data->intr_fd[i] = -1; + } + rc = ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, &irq_set); if (rc) { sfc_vdpa_err(ops_data->dev_handle, @@ -223,12 +391,15 @@ static int sfc_vdpa_virtq_start(struct sfc_vdpa_ops_data *ops_data, int vq_num) { - int rc; + int rc, fd; + uint64_t size; uint32_t doorbell; efx_virtio_vq_t *vq; + void *vring_buf, *dev; struct sfc_vdpa_vring_info vring; efx_virtio_vq_cfg_t vq_cfg; efx_virtio_vq_dyncfg_t vq_dyncfg; + uint64_t sw_vq_iova = ops_data->sw_vq_iova; vq = ops_data->vq_cxt[vq_num].vq; if (vq == NULL) @@ -241,6 +412,33 @@ goto fail_vring_info; } + if (ops_data->sw_fallback_mode) { + size = vring_size(vring.size, rte_mem_page_size()); + size = RTE_ALIGN_CEIL(size, rte_mem_page_size()); + vring_buf = rte_zmalloc("vdpa", size, rte_mem_page_size()); + vring_init(&ops_data->sw_vq[vq_num],
vring.size, vring_buf, + rte_mem_page_size()); + + dev = ops_data->dev_handle; + fd = sfc_vdpa_adapter_by_dev_handle(dev)->vfio_container_fd; + rc = rte_vfio_container_dma_map(fd, + (uint64_t)(uintptr_t)vring_buf, + sw_vq_iova, size); + + /* Direct I/O for Tx queue, relay for Rx queue */ + if (!(vq_num & 1)) + vring.used = sw_vq_iova + + (char *)ops_data->sw_vq[vq_num].used - + (char *)ops_data->sw_vq[vq_num].desc; + + ops_data->sw_vq[vq_num].used->idx = vring.last_used_idx; + ops_data->sw_vq[vq_num].avail->idx = vring.last_avail_idx; + + ops_data->vq_cxt[vq_num].sw_vq_iova = sw_vq_iova; + ops_data->vq_cxt[vq_num].sw_vq_size = size; + ops_data->sw_vq_iova += size; + } + vq_cfg.evvc_target_vf = SFC_VDPA_VF_NULL; /* even virtqueue for RX and odd for TX */ @@ -309,9 +507,12 @@ static int sfc_vdpa_virtq_stop(struct sfc_vdpa_ops_data *ops_data, int vq_num) { - int rc; + int rc, fd; + void *dev, *buf; + uint64_t size, len, iova; efx_virtio_vq_dyncfg_t vq_idx; efx_virtio_vq_t *vq; + struct rte_vhost_vring vring; if (ops_data->vq_cxt[vq_num].enable != B_TRUE) return -1; @@ -320,12 +521,34 @@ if (vq == NULL) return -1; + if (ops_data->sw_fallback_mode) { + dev = ops_data->dev_handle; + fd = sfc_vdpa_adapter_by_dev_handle(dev)->vfio_container_fd; + /* synchronize remaining new used entries if any */ + if (!(vq_num & 1)) + sfc_vdpa_queue_relay(ops_data, vq_num); + + rte_vhost_get_vhost_vring(ops_data->vid, vq_num, &vring); + len = SFC_VDPA_USED_RING_LEN(vring.size); + rte_vhost_log_used_vring(ops_data->vid, vq_num, 0, len); + + buf = ops_data->sw_vq[vq_num].desc; + size = ops_data->vq_cxt[vq_num].sw_vq_size; + iova = ops_data->vq_cxt[vq_num].sw_vq_iova; + rte_vfio_container_dma_unmap(fd, (uint64_t)(uintptr_t)buf, + iova, size); + } + /* stop the vq */ rc = efx_virtio_qstop(vq, &vq_idx); if (rc == 0) { - ops_data->vq_cxt[vq_num].cidx = vq_idx.evvd_vq_used_idx; - ops_data->vq_cxt[vq_num].pidx = vq_idx.evvd_vq_avail_idx; + if (ops_data->sw_fallback_mode) + vq_idx.evvd_vq_avail_idx = vq_idx.evvd_vq_used_idx; + rte_vhost_set_vring_base(ops_data->vid, vq_num, + vq_idx.evvd_vq_avail_idx, + vq_idx.evvd_vq_used_idx); } + ops_data->vq_cxt[vq_num].enable = B_FALSE; return rc; @@ -450,7 +673,11 @@ SFC_EFX_ASSERT(ops_data->state == SFC_VDPA_STATE_CONFIGURED); - sfc_vdpa_log_init(ops_data->dev_handle, "entry"); + if (ops_data->sw_fallback_mode) { + sfc_vdpa_log_init(ops_data->dev_handle, + "Trying to start VDPA with SW I/O relay"); + ops_data->sw_vq_iova = SFC_SW_VRING_IOVA; + } ops_data->state = SFC_VDPA_STATE_STARTING; @@ -675,6 +902,7 @@ sfc_vdpa_dev_close(int vid) { int ret; + void *status; struct rte_vdpa_device *vdpa_dev; struct sfc_vdpa_ops_data *ops_data; @@ -707,7 +935,23 @@ } ops_data->is_notify_thread_started = false; + if (ops_data->sw_fallback_mode) { + ret = pthread_cancel(ops_data->sw_relay_thread_id); + if (ret != 0) + sfc_vdpa_err(ops_data->dev_handle, + "failed to cancel LM relay thread: %s", + rte_strerror(ret)); + + ret = pthread_join(ops_data->sw_relay_thread_id, &status); + if (ret != 0) + sfc_vdpa_err(ops_data->dev_handle, + "failed to join LM relay thread: %s", + rte_strerror(ret)); + } + sfc_vdpa_stop(ops_data); + ops_data->sw_fallback_mode = false; + sfc_vdpa_close(ops_data); sfc_vdpa_adapter_unlock(ops_data->dev_handle); @@ -774,9 +1018,49 @@
sfc_vdpa_set_vring_state(int vid, int vring, int state) static int sfc_vdpa_set_features(int vid) { - RTE_SET_USED(vid); + int ret; + uint64_t features = 0; + struct rte_vdpa_device *vdpa_dev; + struct sfc_vdpa_ops_data *ops_data; - return -1; + vdpa_dev = rte_vhost_get_vdpa_device(vid); + ops_data = sfc_vdpa_get_data_by_dev(vdpa_dev); + if (ops_data == NULL) + return -1; + + rte_vhost_get_negotiated_features(vid, &features); + + if (!RTE_VHOST_NEED_LOG(features)) + return -1; + + sfc_vdpa_info(ops_data->dev_handle, "live-migration triggered"); + + sfc_vdpa_adapter_lock(ops_data->dev_handle); + + /* Stop HW Offload and unset host notifier */ + sfc_vdpa_stop(ops_data); + if (rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, false) != 0) + sfc_vdpa_info(ops_data->dev_handle, + "vDPA (%s): Failed to clear host notifier", + ops_data->vdpa_dev->device->name); + + /* Restart vDPA with SW relay on RX queue */ + ops_data->sw_fallback_mode = true; + sfc_vdpa_start(ops_data); + ret = pthread_create(&ops_data->sw_relay_thread_id, NULL, + sfc_vdpa_sw_relay, (void *)ops_data); + if (ret != 0) + sfc_vdpa_err(ops_data->dev_handle, + "failed to create rx_relay thread: %s", + rte_strerror(ret)); + + if (rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, true) != 0) + sfc_vdpa_info(ops_data->dev_handle, "notifier setup failed!"); + + sfc_vdpa_adapter_unlock(ops_data->dev_handle); + sfc_vdpa_info(ops_data->dev_handle, "SW fallback setup done!"); + + return 0; } static int @@ -860,17 +1144,28 @@ sfc_vdpa_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size) sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64, *offset); - pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev; - doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr + *offset; + if (!ops_data->sw_fallback_mode) { + pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev; + doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr + + *offset; + /* + * virtio-net driver in VM sends queue notifications before + * vDPA has a chance to set up the queues and notification area, + * and hence the HW misses these doorbell notifications. + * Since it is safe to send a duplicate doorbell, send another + * doorbell from vDPA driver as a workaround for this timing issue + */ + rte_write16(qid, doorbell); + + /* + * Update doorbell address, it will come in handy during + * live-migration. + */ + ops_data->vq_cxt[qid].doorbell = doorbell; + } - /* - * virtio-net driver in VM sends queue notifications before - * vDPA has a chance to setup the queues and notification area, - * and hence the HW misses these doorbell notifications. - * Since, it is safe to send duplicate doorbell, send another - * doorbell from vDPA driver as workaround for this timing issue.
- */ - rte_write16(qid, doorbell); + sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64, + *offset); return 0; } diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.h b/drivers/vdpa/sfc/sfc_vdpa_ops.h index 5c8e352de3..dd301bae86 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_ops.h +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.h @@ -6,8 +6,11 @@ #define _SFC_VDPA_OPS_H #include <rte_vdpa.h> +#include <vdpa_driver.h> #define SFC_VDPA_MAX_QUEUE_PAIRS 8 +#define SFC_VDPA_USED_RING_LEN(size) \ + ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3) enum sfc_vdpa_context { SFC_VDPA_AS_VF @@ -37,9 +40,10 @@ struct sfc_vdpa_vring_info { typedef struct sfc_vdpa_vq_context_s { volatile void *doorbell; uint8_t enable; - uint32_t pidx; - uint32_t cidx; efx_virtio_vq_t *vq; + + uint64_t sw_vq_iova; + uint64_t sw_vq_size; } sfc_vdpa_vq_context_t; struct sfc_vdpa_ops_data { @@ -57,6 +61,13 @@ struct sfc_vdpa_ops_data { uint16_t vq_count; struct sfc_vdpa_vq_context_s vq_cxt[SFC_VDPA_MAX_QUEUE_PAIRS * 2]; + + int epfd; + uint64_t sw_vq_iova; + bool sw_fallback_mode; + pthread_t sw_relay_thread_id; + struct vring sw_vq[SFC_VDPA_MAX_QUEUE_PAIRS * 2]; + int intr_fd[SFC_VDPA_MAX_QUEUE_PAIRS * 2]; }; struct sfc_vdpa_ops_data * -- 2.18.2 ^ permalink raw reply [flat|nested] 17+ messages in thread
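The SFC_VDPA_DECODE_* macros above unpack the 64-bit epoll payload built by sfc_vdpa_encode_ev_data(): bit 0 carries the event type, bits 1..31 the queue id, and bits 32..63 the fd. A standalone round-trip check, using a plain uint64_t where the driver uses the epoll_data union, makes the layout explicit:

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint32_t qid = 5;
	int fd = 42, type = 1;
	uint64_t data = (uint64_t)type | ((uint64_t)qid << 1) |
			((uint64_t)fd << 32);

	assert((int)(data >> 32) == fd);	/* SFC_VDPA_DECODE_FD */
	assert((((uint32_t)data) >> 1) == qid);	/* SFC_VDPA_DECODE_QID */
	assert((data & 1) == (uint64_t)type);	/* SFC_VDPA_DECODE_EV_TYPE */
	return 0;
}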
* [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers 2022-07-14 8:44 [PATCH v2 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini ` (4 preceding siblings ...) 2022-07-14 8:44 ` [PATCH v2 5/5] vdpa/sfc: Add support for SW assisted live migration abhimanyu.saini @ 2022-07-14 13:47 ` abhimanyu.saini 2022-07-14 13:48 ` [PATCH v3 1/5] common/sfc_efx/base: remove VQ index check during VQ start abhimanyu.saini ` (5 more replies) 5 siblings, 6 replies; 17+ messages in thread From: abhimanyu.saini @ 2022-07-14 13:47 UTC (permalink / raw) To: dev; +Cc: chenbo.xia, maxime.coquelin, andrew.rybchenko, Abhimanyu Saini From: Abhimanyu Saini <absaini@amd.com> In SW assisted live migration, the vDPA driver will stop all virtqueues and set up SW vrings to relay the communication between the virtio driver and the vDPA device using an event-driven relay thread. This will allow the vDPA driver to assist with guest dirty page logging for live migration. Abhimanyu Saini (5): common/sfc_efx/base: remove VQ index check during VQ start common/sfc_efx/base: update MCDI headers common/sfc_efx/base: use the updated definitions of cidx/pidx vdpa/sfc: enable support for multi-queue vdpa/sfc: Add support for SW assisted live migration drivers/common/sfc_efx/base/efx.h | 12 +- drivers/common/sfc_efx/base/efx_regs_mcdi.h | 36 +- drivers/common/sfc_efx/base/rhead_virtio.c | 28 +- drivers/vdpa/sfc/sfc_vdpa.h | 1 + drivers/vdpa/sfc/sfc_vdpa_hw.c | 2 + drivers/vdpa/sfc/sfc_vdpa_ops.c | 345 ++++++++++++++++++-- drivers/vdpa/sfc/sfc_vdpa_ops.h | 17 +- 7 files changed, 378 insertions(+), 63 deletions(-) -- 2.18.2 ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 1/5] common/sfc_efx/base: remove VQ index check during VQ start 2022-07-14 13:47 ` [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini @ 2022-07-14 13:48 ` abhimanyu.saini 2022-07-14 13:48 ` [PATCH v3 2/5] common/sfc_efx/base: update MCDI headers abhimanyu.saini ` (4 subsequent siblings) 5 siblings, 0 replies; 17+ messages in thread From: abhimanyu.saini @ 2022-07-14 13:48 UTC (permalink / raw) To: dev; +Cc: chenbo.xia, maxime.coquelin, andrew.rybchenko, Abhimanyu Saini From: Abhimanyu Saini <absaini@amd.com> The used/avail queue indexes are not bound by the queue size, because the HW calculates the descriptor entry index by performing a simple modulo between the queue index and the queue size. So, do not check the initial used and avail queue indexes against the queue size, because it is possible for these indexes to be greater than the queue size in the following cases: 1) The queue is created to be migrated into, or 2) The client issues a qstop/qstart after running the datapath Fixes: 4dda72dbdeab3 ("common/sfc_efx/base: add base virtio support for vDPA") Signed-off-by: Abhimanyu Saini <absaini@amd.com> --- v2: * Fix checkpatch warnings * Add a cover letter v3: * Restructure patchset drivers/common/sfc_efx/base/rhead_virtio.c | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/drivers/common/sfc_efx/base/rhead_virtio.c b/drivers/common/sfc_efx/base/rhead_virtio.c index 335cb74..7f08717 100644 --- a/drivers/common/sfc_efx/base/rhead_virtio.c +++ b/drivers/common/sfc_efx/base/rhead_virtio.c @@ -47,14 +47,6 @@ goto fail2; } - if (evvdp != NULL) { - if ((evvdp->evvd_vq_cidx > evvcp->evvc_vq_size) || - (evvdp->evvd_vq_pidx > evvcp->evvc_vq_size)) { - rc = EINVAL; - goto fail3; - } - } - req.emr_cmd = MC_CMD_VIRTIO_INIT_QUEUE; req.emr_in_buf = payload; req.emr_in_length = MC_CMD_VIRTIO_INIT_QUEUE_REQ_LEN; @@ -116,15 +108,13 @@ if (req.emr_rc != 0) { rc = req.emr_rc; - goto fail4; + goto fail3; } evvp->evv_vi_index = vi_index; return (0); -fail4: - EFSYS_PROBE(fail4); fail3: EFSYS_PROBE(fail3); fail2: -- 1.8.3.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
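A worked example of case 1 from the commit message, with illustrative values only:

/* A queue restored after migration may legally start far beyond the
 * ring size; the HW still lands on a valid slot. */
uint32_t queue_size = 256;
uint32_t initial_avail_idx = 4352;	/* FINAL_AVAIL_IDX from the source */
uint32_t slot = initial_avail_idx % queue_size;	/* == 0: a valid slot */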
* [PATCH v3 2/5] common/sfc_efx/base: update MCDI headers 2022-07-14 13:47 ` [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini 2022-07-14 13:48 ` [PATCH v3 1/5] common/sfc_efx/base: remove VQ index check during VQ start abhimanyu.saini @ 2022-07-14 13:48 ` abhimanyu.saini 2022-07-28 11:32 ` Andrew Rybchenko 2022-07-14 13:48 ` [PATCH v3 3/5] common/sfc_efx/base: use the updated definitions of cidx/pidx abhimanyu.saini ` (3 subsequent siblings) 5 siblings, 1 reply; 17+ messages in thread From: abhimanyu.saini @ 2022-07-14 13:48 UTC (permalink / raw) To: dev; +Cc: chenbo.xia, maxime.coquelin, andrew.rybchenko, Abhimanyu Saini From: Abhimanyu Saini <absaini@amd.com> Regenerate MCDI headers from smartnic_registry:72940ad Signed-off-by: Abhimanyu Saini <absaini@amd.com> --- v2: * Fix checkpatch warnings * Add a cover letter v3: * Restructure patchset drivers/common/sfc_efx/base/efx_regs_mcdi.h | 36 ++++++++++++++++++++--------- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/drivers/common/sfc_efx/base/efx_regs_mcdi.h b/drivers/common/sfc_efx/base/efx_regs_mcdi.h index 2daf825..d1d8093 100644 --- a/drivers/common/sfc_efx/base/efx_regs_mcdi.h +++ b/drivers/common/sfc_efx/base/efx_regs_mcdi.h @@ -28071,18 +28071,26 @@ #define MC_CMD_VIRTIO_INIT_QUEUE_REQ_FEATURES_HI_WIDTH 32 /* Enum values, see field(s): */ /* MC_CMD_VIRTIO_GET_FEATURES/MC_CMD_VIRTIO_GET_FEATURES_OUT/FEATURES */ -/* The initial producer index for this queue's used ring. If this queue is - * being created to be migrated into, this should be the FINAL_PIDX value - * returned by MC_CMD_VIRTIO_FINI_QUEUE of the queue being migrated from. - * Otherwise, it should be zero. +/* The initial available index for this virtqueue. If this queue is being + * created to be migrated into, this should be the FINAL_AVAIL_IDX value + * returned by MC_CMD_VIRTIO_FINI_QUEUE of the queue being migrated from (or + * equivalent if the original queue was on a thirdparty device). Otherwise, it + * should be zero. */ +#define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_AVAIL_IDX_OFST 56 +#define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_AVAIL_IDX_LEN 4 +/* Alias of INITIAL_AVAIL_IDX, kept for compatibility. */ #define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_PIDX_OFST 56 #define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_PIDX_LEN 4 -/* The initial consumer index for this queue's available ring. If this queue is - * being created to be migrated into, this should be the FINAL_CIDX value - * returned by MC_CMD_VIRTIO_FINI_QUEUE of the queue being migrated from. - * Otherwise, it should be zero. - */ +/* The initial used index for this virtqueue. If this queue is being created to + * be migrated into, this should be the FINAL_USED_IDX value returned by + * MC_CMD_VIRTIO_FINI_QUEUE of the queue being migrated from (or equivalent if + * the original queue was on a thirdparty device). Otherwise, it should be + * zero. + */ +#define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_USED_IDX_OFST 60 +#define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_USED_IDX_LEN 4 +/* Alias of INITIAL_USED_IDX, kept for compatibility. */ #define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_CIDX_OFST 60 #define MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_CIDX_LEN 4 /* A MAE_MPORT_SELECTOR defining which mport this queue should be associated @@ -28128,10 +28136,16 @@ /* MC_CMD_VIRTIO_FINI_QUEUE_RESP msgresponse */ #define MC_CMD_VIRTIO_FINI_QUEUE_RESP_LEN 8 -/* The producer index of the used ring when the queue was stopped.
*/ +/* The available index of the virtqueue when the queue was stopped. */ +#define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_AVAIL_IDX_OFST 0 +#define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_AVAIL_IDX_LEN 4 +/* Alias of FINAL_AVAIL_IDX, kept for compatibility. */ #define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_PIDX_OFST 0 #define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_PIDX_LEN 4 -/* The consumer index of the available ring when the queue was stopped. */ +/* The used index of the virtqueue when the queue was stopped. */ +#define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_USED_IDX_OFST 4 +#define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_USED_IDX_LEN 4 +/* Alias of FINAL_USED_IDX, kept for compatibility. */ #define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_CIDX_OFST 4 #define MC_CMD_VIRTIO_FINI_QUEUE_RESP_FINAL_CIDX_LEN 4 -- 1.8.3.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
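The OFST/LEN macro pairs name raw byte offsets and lengths inside the MCDI payload. A hedged sketch of the convention follows; the driver itself goes through MCDI_IN_SET_DWORD()/MCDI_OUT_DWORD() (see patch 3/5), which also handle endianness, so the raw memcpy() below is only to make the layout concrete and assumes a little-endian host:

#include <stdint.h>
#include <string.h>

static void
set_initial_avail_idx(uint8_t *payload, uint32_t avail_idx)
{
	/* Field lives at byte offset 56 and is 4 bytes long. */
	memcpy(payload + MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_AVAIL_IDX_OFST,
	       &avail_idx,
	       MC_CMD_VIRTIO_INIT_QUEUE_REQ_INITIAL_AVAIL_IDX_LEN);
}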
* Re: [PATCH v3 2/5] common/sfc_efx/base: update MCDI headers 2022-07-14 13:48 ` [PATCH v3 2/5] common/sfc_efx/base: update MCDI headers abhimanyu.saini @ 2022-07-28 11:32 ` Andrew Rybchenko 0 siblings, 0 replies; 17+ messages in thread From: Andrew Rybchenko @ 2022-07-28 11:32 UTC (permalink / raw) To: abhimanyu.saini, dev; +Cc: chenbo.xia, maxime.coquelin, Abhimanyu Saini On 7/14/22 16:48, abhimanyu.saini@xilinx.com wrote: > From: Abhimanyu Saini <absaini@amd.com> > > Regenerate MCDI headers from smartnic_registry:72940ad Since smartnic_registry is not publicly available, it does not make sense to refer to its changeset in the patch description. I think you should state the update goal here. Which MCDI fields do you want to pick up? > > Signed-off-by: Abhimanyu Saini <absaini@amd.com> Other than that, Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 3/5] common/sfc_efx/base: use the updated definitions of cidx/pidx 2022-07-14 13:47 ` [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini 2022-07-14 13:48 ` [PATCH v3 1/5] common/sfc_efx/base: remove VQ index check during VQ start abhimanyu.saini 2022-07-14 13:48 ` [PATCH v3 2/5] common/sfc_efx/base: update MCDI headers abhimanyu.saini @ 2022-07-14 13:48 ` abhimanyu.saini 2022-07-28 11:34 ` Andrew Rybchenko 2022-07-14 13:48 ` [PATCH v3 4/5] vdpa/sfc: enable support for multi-queue abhimanyu.saini ` (2 subsequent siblings) 5 siblings, 1 reply; 17+ messages in thread From: abhimanyu.saini @ 2022-07-14 13:48 UTC (permalink / raw) To: dev; +Cc: chenbo.xia, maxime.coquelin, andrew.rybchenko, Abhimanyu Saini From: Abhimanyu Saini <absaini@amd.com> Change the cidx and pidx definitions to mean the used index and the avail index of the virtqueue, respectively. Signed-off-by: Abhimanyu Saini <absaini@amd.com> --- v2: * Fix checkpatch warnings * Add a cover letter v3: * Restructure patchset drivers/common/sfc_efx/base/efx.h | 12 ++++++------ drivers/common/sfc_efx/base/rhead_virtio.c | 16 ++++++++-------- drivers/vdpa/sfc/sfc_vdpa_ops.c | 8 ++++---- 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/drivers/common/sfc_efx/base/efx.h b/drivers/common/sfc_efx/base/efx.h index 95f5fb6..c19205c 100644 --- a/drivers/common/sfc_efx/base/efx.h +++ b/drivers/common/sfc_efx/base/efx.h @@ -4886,17 +4886,17 @@ extern __checkReturn __success(return != B_FALSE) boolean_t typedef struct efx_virtio_vq_dyncfg_s { /* - * If queue is being created to be migrated then this - * should be the FINAL_PIDX value returned by MC_CMD_VIRTIO_FINI_QUEUE + * If queue is being created to be migrated then this should be + * the FINAL_AVAIL_IDX value returned by MC_CMD_VIRTIO_FINI_QUEUE * of the queue being migrated from. Otherwise, it should be zero. */ - uint32_t evvd_vq_pidx; + uint32_t evvd_vq_avail_idx; /* - * If this queue is being created to be migrated then this - * should be the FINAL_CIDX value returned by MC_CMD_VIRTIO_FINI_QUEUE + * If queue is being created to be migrated then this should be + * the FINAL_USED_IDX value returned by MC_CMD_VIRTIO_FINI_QUEUE * of the queue being migrated from. Otherwise, it should be zero.
*/ - uint32_t evvd_vq_cidx; + uint32_t evvd_vq_used_idx; } efx_virtio_vq_dyncfg_t; /* diff --git a/drivers/common/sfc_efx/base/rhead_virtio.c b/drivers/common/sfc_efx/base/rhead_virtio.c index 7f08717..5a2ebe8 100644 --- a/drivers/common/sfc_efx/base/rhead_virtio.c +++ b/drivers/common/sfc_efx/base/rhead_virtio.c @@ -95,10 +95,10 @@ evvcp->evcc_features >> 32); if (evvdp != NULL) { - MCDI_IN_SET_DWORD(req, VIRTIO_INIT_QUEUE_REQ_INITIAL_PIDX, - evvdp->evvd_vq_pidx); - MCDI_IN_SET_DWORD(req, VIRTIO_INIT_QUEUE_REQ_INITIAL_CIDX, - evvdp->evvd_vq_cidx); + MCDI_IN_SET_DWORD(req, VIRTIO_INIT_QUEUE_REQ_INITIAL_AVAIL_IDX, + evvdp->evvd_vq_avail_idx); + MCDI_IN_SET_DWORD(req, VIRTIO_INIT_QUEUE_REQ_INITIAL_USED_IDX, + evvdp->evvd_vq_used_idx); } MCDI_IN_SET_DWORD(req, VIRTIO_INIT_QUEUE_REQ_MPORT_SELECTOR, @@ -161,10 +161,10 @@ } if (evvdp != NULL) { - evvdp->evvd_vq_pidx = - MCDI_OUT_DWORD(req, VIRTIO_FINI_QUEUE_RESP_FINAL_PIDX); - evvdp->evvd_vq_cidx = - MCDI_OUT_DWORD(req, VIRTIO_FINI_QUEUE_RESP_FINAL_CIDX); + evvdp->evvd_vq_avail_idx = + MCDI_OUT_DWORD(req, VIRTIO_FINI_QUEUE_RESP_FINAL_AVAIL_IDX); + evvdp->evvd_vq_used_idx = + MCDI_OUT_DWORD(req, VIRTIO_FINI_QUEUE_RESP_FINAL_USED_IDX); } return (0); diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c index b84699d..e2f119b 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_ops.c +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c @@ -258,8 +258,8 @@ vq_cfg.evvc_used_ring_addr = vring.used; vq_cfg.evvc_vq_size = vring.size; - vq_dyncfg.evvd_vq_pidx = vring.last_used_idx; - vq_dyncfg.evvd_vq_cidx = vring.last_avail_idx; + vq_dyncfg.evvd_vq_used_idx = vring.last_used_idx; + vq_dyncfg.evvd_vq_avail_idx = vring.last_avail_idx; /* MSI-X vector is function-relative */ vq_cfg.evvc_msix_vector = RTE_INTR_VEC_RXTX_OFFSET + vq_num; @@ -321,8 +321,8 @@ /* stop the vq */ rc = efx_virtio_qstop(vq, &vq_idx); if (rc == 0) { - ops_data->vq_cxt[vq_num].cidx = vq_idx.evvd_vq_cidx; - ops_data->vq_cxt[vq_num].pidx = vq_idx.evvd_vq_pidx; + ops_data->vq_cxt[vq_num].cidx = vq_idx.evvd_vq_used_idx; + ops_data->vq_cxt[vq_num].pidx = vq_idx.evvd_vq_avail_idx; } ops_data->vq_cxt[vq_num].enable = B_FALSE; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 3/5] common/sfc_efx/base: use the updated definitions of cidx/pidx 2022-07-14 13:48 ` [PATCH v3 3/5] common/sfc_efx/base: use the updated definitions of cidx/pidx abhimanyu.saini @ 2022-07-28 11:34 ` Andrew Rybchenko 0 siblings, 0 replies; 17+ messages in thread From: Andrew Rybchenko @ 2022-07-28 11:34 UTC (permalink / raw) To: abhimanyu.saini, dev; +Cc: chenbo.xia, maxime.coquelin, Abhimanyu Saini On 7/14/22 16:48, abhimanyu.saini@xilinx.com wrote: > From: Abhimanyu Saini <absaini@amd.com> Please, try to stick to one E-mail address. > > Change cidx and pidx definition to mean used queue and avail > queue index respectively. > > Signed-off-by: Abhimanyu Saini <absaini@amd.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 4/5] vdpa/sfc: enable support for multi-queue 2022-07-14 13:47 ` [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini ` (2 preceding siblings ...) 2022-07-14 13:48 ` [PATCH v3 3/5] common/sfc_efx/base: use the updated definitions of cidx/pidx abhimanyu.saini @ 2022-07-14 13:48 ` abhimanyu.saini 2022-07-28 11:29 ` Andrew Rybchenko 2022-07-14 13:48 ` [PATCH v3 5/5] vdpa/sfc: Add support for SW assisted live migration abhimanyu.saini 2022-10-04 15:31 ` [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers Andrew Rybchenko 5 siblings, 1 reply; 17+ messages in thread From: abhimanyu.saini @ 2022-07-14 13:48 UTC (permalink / raw) To: dev; +Cc: chenbo.xia, maxime.coquelin, andrew.rybchenko, Abhimanyu Saini From: Abhimanyu Saini <absaini@amd.com> Increase the default number of RX/TX queue pairs to 8, and add the MQ feature flag to the vDPA protocol features. Signed-off-by: Abhimanyu Saini <absaini@amd.com> --- v2: * Fix checkpatch warnings * Add a cover letter v3: * Restructure patchset drivers/vdpa/sfc/sfc_vdpa_hw.c | 2 ++ drivers/vdpa/sfc/sfc_vdpa_ops.c | 6 ++++-- drivers/vdpa/sfc/sfc_vdpa_ops.h | 2 +- 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/vdpa/sfc/sfc_vdpa_hw.c b/drivers/vdpa/sfc/sfc_vdpa_hw.c index a7018b1..edb7e35 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_hw.c +++ b/drivers/vdpa/sfc/sfc_vdpa_hw.c @@ -286,6 +286,8 @@ SFC_VDPA_ASSERT(max_queue_cnt > 0); sva->max_queue_count = max_queue_cnt; + sfc_vdpa_log_init(sva, "NIC init done with %u pair(s) of queues", + max_queue_cnt); return 0; diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c index e2f119b..6401d4e 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_ops.c +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c @@ -24,14 +24,16 @@ (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \ (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \ (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \ - (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD)) + (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) | \ + (1ULL << VHOST_USER_PROTOCOL_F_MQ)) /* * Set of features which are enabled by default. * Protocol feature bit is needed to enable notification notifier ctrl. */ #define SFC_VDPA_DEFAULT_FEATURES \ - (1ULL << VHOST_USER_F_PROTOCOL_FEATURES) + ((1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \ + (1ULL << VIRTIO_NET_F_MQ)) #define SFC_VDPA_MSIX_IRQ_SET_BUF_LEN \ (sizeof(struct vfio_irq_set) + \ diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.h b/drivers/vdpa/sfc/sfc_vdpa_ops.h index 9dbd5b8..5c8e352 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_ops.h +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.h @@ -7,7 +7,7 @@ #include <rte_vdpa.h> -#define SFC_VDPA_MAX_QUEUE_PAIRS 1 +#define SFC_VDPA_MAX_QUEUE_PAIRS 8 enum sfc_vdpa_context { SFC_VDPA_AS_VF -- 1.8.3.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 4/5] vdpa/sfc: enable support for multi-queue 2022-07-14 13:48 ` [PATCH v3 4/5] vdpa/sfc: enable support for multi-queue abhimanyu.saini @ 2022-07-28 11:29 ` Andrew Rybchenko 0 siblings, 0 replies; 17+ messages in thread From: Andrew Rybchenko @ 2022-07-28 11:29 UTC (permalink / raw) To: abhimanyu.saini, dev; +Cc: chenbo.xia, maxime.coquelin, Abhimanyu Saini On 7/14/22 16:48, abhimanyu.saini@xilinx.com wrote: > From: Abhimanyu Saini <absaini@amd.com> > > Increase the number to default RX/TX queue pairs to 8, > and add MQ feature flag to vDPA protocol features. > > Signed-off-by: Abhimanyu Saini <absaini@amd.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 5/5] vdpa/sfc: Add support for SW assisted live migration 2022-07-14 13:47 ` [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini ` (3 preceding siblings ...) 2022-07-14 13:48 ` [PATCH v3 4/5] vdpa/sfc: enable support for multi-queue abhimanyu.saini @ 2022-07-14 13:48 ` abhimanyu.saini 2022-07-28 13:42 ` Andrew Rybchenko 2022-10-04 15:31 ` [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini 5 siblings, 6 replies; 17+ messages in thread From: abhimanyu.saini @ 2022-07-14 13:48 UTC (permalink / raw) To: dev; +Cc: chenbo.xia, maxime.coquelin, andrew.rybchenko, Abhimanyu Saini From: Abhimanyu Saini <absaini@amd.com> In SW assisted live migration, the vDPA driver will stop all virtqueues and set up SW vrings to relay the communication between the virtio driver and the vDPA device using an event-driven relay thread. This will allow the vDPA driver to assist with guest dirty page logging for live migration. Signed-off-by: Abhimanyu Saini <absaini@amd.com> --- v2: * Fix checkpatch warnings * Add a cover letter v3: * Restructure patchset drivers/vdpa/sfc/sfc_vdpa.h | 1 + drivers/vdpa/sfc/sfc_vdpa_ops.c | 337 +++++++++++++++++++++++++++++++++++++--- drivers/vdpa/sfc/sfc_vdpa_ops.h | 15 +- 3 files changed, 330 insertions(+), 23 deletions(-) diff --git a/drivers/vdpa/sfc/sfc_vdpa.h b/drivers/vdpa/sfc/sfc_vdpa.h index daeb27d..ae522ca 100644 --- a/drivers/vdpa/sfc/sfc_vdpa.h +++ b/drivers/vdpa/sfc/sfc_vdpa.h @@ -18,6 +18,7 @@ #define SFC_VDPA_MAC_ADDR "mac" #define SFC_VDPA_DEFAULT_MCDI_IOVA 0x200000000000 +#define SFC_SW_VRING_IOVA 0x300000000000 /* Broadcast & Unicast MAC filters are supported */ #define SFC_MAX_SUPPORTED_FILTERS 3 diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c index 6401d4e..1d29ee7 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_ops.c +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c @@ -4,10 +4,13 @@ #include <pthread.h> #include <unistd.h> +#include <sys/epoll.h> #include <sys/ioctl.h> +#include <rte_eal_paging.h> #include <rte_errno.h> #include <rte_malloc.h> +#include <rte_memory.h> #include <rte_vdpa.h> #include <rte_vfio.h> #include <rte_vhost.h> @@ -33,7 +36,9 @@ */ #define SFC_VDPA_DEFAULT_FEATURES \ ((1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \ - (1ULL << VIRTIO_NET_F_MQ)) + (1ULL << VIRTIO_NET_F_MQ) | \ + (1ULL << VHOST_F_LOG_ALL) | \ + (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE)) #define SFC_VDPA_MSIX_IRQ_SET_BUF_LEN \ (sizeof(struct vfio_irq_set) + \ @@ -42,6 +47,142 @@ /* It will be used for target VF when calling function is not PF */ #define SFC_VDPA_VF_NULL 0xFFFF +#define SFC_VDPA_DECODE_FD(data) (data.u64 >> 32) +#define SFC_VDPA_DECODE_QID(data) (data.u32 >> 1) +#define SFC_VDPA_DECODE_EV_TYPE(data) (data.u32 & 1) + +/* + * Create q_num number of epoll events for kickfd interrupts + * and q_num/2 events for callfd interrupts. Round up the + * total to (q_num * 2) number of events.
+ */ +#define SFC_VDPA_SW_RELAY_EVENT_NUM(q_num) (q_num * 2) + +static inline uint64_t +sfc_vdpa_encode_ev_data(int type, uint32_t qid, int fd) +{ + SFC_VDPA_ASSERT(fd > UINT32_MAX || qid > UINT32_MAX / 2); + return type | (qid << 1) | (uint64_t)fd << 32; +} + +static inline void +sfc_vdpa_queue_relay(struct sfc_vdpa_ops_data *ops_data, uint32_t qid) +{ + rte_vdpa_relay_vring_used(ops_data->vid, qid, &ops_data->sw_vq[qid]); + rte_vhost_vring_call(ops_data->vid, qid); +} + +static void* +sfc_vdpa_sw_relay(void *data) +{ + uint64_t buf; + uint32_t qid, q_num; + struct epoll_event ev; + struct rte_vhost_vring vring; + int nbytes, i, ret, fd, epfd, nfds = 0; + struct epoll_event events[SFC_VDPA_MAX_QUEUE_PAIRS * 2]; + struct sfc_vdpa_ops_data *ops_data = (struct sfc_vdpa_ops_data *)data; + + q_num = rte_vhost_get_vring_num(ops_data->vid); + epfd = epoll_create(SFC_VDPA_SW_RELAY_EVENT_NUM(q_num)); + if (epfd < 0) { + sfc_vdpa_log_init(ops_data->dev_handle, + "failed to create epoll instance"); + goto fail_epoll; + } + ops_data->epfd = epfd; + + vring.kickfd = -1; + for (qid = 0; qid < q_num; qid++) { + ev.events = EPOLLIN | EPOLLPRI; + ret = rte_vhost_get_vhost_vring(ops_data->vid, qid, &vring); + if (ret != 0) { + sfc_vdpa_log_init(ops_data->dev_handle, + "rte_vhost_get_vhost_vring error %s", + strerror(errno)); + goto fail_vring; + } + + ev.data.u64 = sfc_vdpa_encode_ev_data(0, qid, vring.kickfd); + if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) { + sfc_vdpa_log_init(ops_data->dev_handle, + "epoll add error: %s", + strerror(errno)); + goto fail_epoll_add; + } + } + + /* + * Register intr_fd created by vDPA driver in lieu of qemu's callfd + * to intercept rx queue notifications, so that we can monitor rx + * notifications and issue rte_vdpa_relay_vring_used() + */ + for (qid = 0; qid < q_num; qid += 2) { + fd = ops_data->intr_fd[qid]; + ev.events = EPOLLIN | EPOLLPRI; + ev.data.u64 = sfc_vdpa_encode_ev_data(1, qid, fd); + if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) { + sfc_vdpa_log_init(ops_data->dev_handle, + "epoll add error: %s", + strerror(errno)); + goto fail_epoll_add; + } + sfc_vdpa_queue_relay(ops_data, qid); + } + + /* + * virtio driver in VM was continuously sending queue notifications + * while we were setting up software vrings and hence the HW misses + * these doorbell notifications. Since it is safe to send a duplicate + * doorbell, send another doorbell from the vDPA driver.
+ */ + for (qid = 0; qid < q_num; qid++) + rte_write16(qid, ops_data->vq_cxt[qid].doorbell); + + for (;;) { + nfds = epoll_wait(epfd, events, + SFC_VDPA_SW_RELAY_EVENT_NUM(q_num), -1); + if (nfds < 0) { + if (errno == EINTR) + continue; + sfc_vdpa_log_init(ops_data->dev_handle, + "epoll_wait return fail\n"); + goto fail_epoll_wait; + } + + for (i = 0; i < nfds; i++) { + fd = SFC_VDPA_DECODE_FD(events[i].data); + /* Ensure kickfd is not busy before proceeding */ + for (;;) { + nbytes = read(fd, &buf, 8); + if (nbytes < 0) { + if (errno == EINTR || + errno == EWOULDBLOCK || + errno == EAGAIN) + continue; + } + break; + } + + qid = SFC_VDPA_DECODE_QID(events[i].data); + if (SFC_VDPA_DECODE_EV_TYPE(events[i].data)) + sfc_vdpa_queue_relay(ops_data, qid); + else + rte_write16(qid, ops_data->vq_cxt[qid].doorbell); + } + } + + return NULL; + +fail_epoll: +fail_vring: +fail_epoll_add: +fail_epoll_wait: + close(epfd); + ops_data->epfd = -1; + return NULL; +} + static int sfc_vdpa_get_device_features(struct sfc_vdpa_ops_data *ops_data) { @@ -99,7 +240,7 @@ static int sfc_vdpa_enable_vfio_intr(struct sfc_vdpa_ops_data *ops_data) { - int rc; + int rc, fd; int *irq_fd_ptr; int vfio_dev_fd; uint32_t i, num_vring; @@ -131,6 +272,17 @@ return -1; irq_fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = vring.callfd; + if (ops_data->sw_fallback_mode && !(i & 1)) { + fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); + if (fd < 0) { + sfc_vdpa_err(ops_data->dev_handle, + "failed to create eventfd"); + goto fail_eventfd; + } + ops_data->intr_fd[i] = fd; + irq_fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = fd; + } else + ops_data->intr_fd[i] = -1; } rc = ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); @@ -138,16 +290,26 @@ sfc_vdpa_err(ops_data->dev_handle, "error enabling MSI-X interrupts: %s", strerror(errno)); - return -1; + goto fail_ioctl; } return 0; + +fail_ioctl: +fail_eventfd: + for (i = 0; i < num_vring; i++) { + if (ops_data->intr_fd[i] != -1) { + close(ops_data->intr_fd[i]); + ops_data->intr_fd[i] = -1; + } + } + return -1; } static int sfc_vdpa_disable_vfio_intr(struct sfc_vdpa_ops_data *ops_data) { - int rc; + int rc, i; int vfio_dev_fd; struct vfio_irq_set irq_set; void *dev; @@ -161,6 +323,12 @@ irq_set.index = VFIO_PCI_MSIX_IRQ_INDEX; irq_set.start = 0; + for (i = 0; i < ops_data->vq_count; i++) { + if (ops_data->intr_fd[i] >= 0) + close(ops_data->intr_fd[i]); + ops_data->intr_fd[i] = -1; + } + rc = ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, &irq_set); if (rc) { sfc_vdpa_err(ops_data->dev_handle, @@ -223,12 +391,15 @@ static int sfc_vdpa_virtq_start(struct sfc_vdpa_ops_data *ops_data, int vq_num) { - int rc; + int rc, fd; + uint64_t size; uint32_t doorbell; efx_virtio_vq_t *vq; + void *vring_buf, *dev; struct sfc_vdpa_vring_info vring; efx_virtio_vq_cfg_t vq_cfg; efx_virtio_vq_dyncfg_t vq_dyncfg; + uint64_t sw_vq_iova = ops_data->sw_vq_iova; vq = ops_data->vq_cxt[vq_num].vq; if (vq == NULL) @@ -241,6 +412,33 @@ goto fail_vring_info; } + if (ops_data->sw_fallback_mode) { + size = vring_size(vring.size, rte_mem_page_size()); + size = RTE_ALIGN_CEIL(size, rte_mem_page_size()); + vring_buf = rte_zmalloc("vdpa", size, rte_mem_page_size()); + vring_init(&ops_data->sw_vq[vq_num], vring.size, vring_buf, + rte_mem_page_size()); + + dev = ops_data->dev_handle; + fd = sfc_vdpa_adapter_by_dev_handle(dev)->vfio_container_fd; + rc = rte_vfio_container_dma_map(fd, + (uint64_t)(uintptr_t)vring_buf, + sw_vq_iova, size); + + /* Direct I/O for Tx queue, relay for Rx queue */ + if (!(vq_num & 1)) + vring.used = sw_vq_iova + + (char
*)ops_data->sw_vq[vq_num].used - + (char *)ops_data->sw_vq[vq_num].desc; + + ops_data->sw_vq[vq_num].used->idx = vring.last_used_idx; + ops_data->sw_vq[vq_num].avail->idx = vring.last_avail_idx; + + ops_data->vq_cxt[vq_num].sw_vq_iova = sw_vq_iova; + ops_data->vq_cxt[vq_num].sw_vq_size = size; + ops_data->sw_vq_iova += size; + } + vq_cfg.evvc_target_vf = SFC_VDPA_VF_NULL; /* even virtqueue for RX and odd for TX */ @@ -309,9 +507,12 @@ static int sfc_vdpa_virtq_stop(struct sfc_vdpa_ops_data *ops_data, int vq_num) { - int rc; + int rc, fd; + void *dev, *buf; + uint64_t size, len, iova; efx_virtio_vq_dyncfg_t vq_idx; efx_virtio_vq_t *vq; + struct rte_vhost_vring vring; if (ops_data->vq_cxt[vq_num].enable != B_TRUE) return -1; @@ -320,12 +521,34 @@ if (vq == NULL) return -1; + if (ops_data->sw_fallback_mode) { + dev = ops_data->dev_handle; + fd = sfc_vdpa_adapter_by_dev_handle(dev)->vfio_container_fd; + /* synchronize remaining new used entries if any */ + if (!(vq_num & 1)) + sfc_vdpa_queue_relay(ops_data, vq_num); + + rte_vhost_get_vhost_vring(ops_data->vid, vq_num, &vring); + len = SFC_VDPA_USED_RING_LEN(vring.size); + rte_vhost_log_used_vring(ops_data->vid, vq_num, 0, len); + + buf = ops_data->sw_vq[vq_num].desc; + size = ops_data->vq_cxt[vq_num].sw_vq_size; + iova = ops_data->vq_cxt[vq_num].sw_vq_iova; + rte_vfio_container_dma_unmap(fd, (uint64_t)(uintptr_t)buf, + iova, size); + } + /* stop the vq */ rc = efx_virtio_qstop(vq, &vq_idx); if (rc == 0) { - ops_data->vq_cxt[vq_num].cidx = vq_idx.evvd_vq_used_idx; - ops_data->vq_cxt[vq_num].pidx = vq_idx.evvd_vq_avail_idx; + if (ops_data->sw_fallback_mode) + vq_idx.evvd_vq_avail_idx = vq_idx.evvd_vq_used_idx; + rte_vhost_set_vring_base(ops_data->vid, vq_num, + vq_idx.evvd_vq_avail_idx, + vq_idx.evvd_vq_used_idx); } + ops_data->vq_cxt[vq_num].enable = B_FALSE; return rc; @@ -450,7 +673,11 @@ SFC_EFX_ASSERT(ops_data->state == SFC_VDPA_STATE_CONFIGURED); - sfc_vdpa_log_init(ops_data->dev_handle, "entry"); + if (ops_data->sw_fallback_mode) { + sfc_vdpa_log_init(ops_data->dev_handle, + "Trying to start VDPA with SW I/O relay"); + ops_data->sw_vq_iova = SFC_SW_VRING_IOVA; + } ops_data->state = SFC_VDPA_STATE_STARTING; @@ -675,6 +902,7 @@ sfc_vdpa_dev_close(int vid) { int ret; + void *status; struct rte_vdpa_device *vdpa_dev; struct sfc_vdpa_ops_data *ops_data; @@ -707,7 +935,23 @@ } ops_data->is_notify_thread_started = false; + if (ops_data->sw_fallback_mode) { + ret = pthread_cancel(ops_data->sw_relay_thread_id); + if (ret != 0) + sfc_vdpa_err(ops_data->dev_handle, + "failed to cancel LM relay thread: %s", + rte_strerror(ret)); + + ret = pthread_join(ops_data->sw_relay_thread_id, &status); + if (ret != 0) + sfc_vdpa_err(ops_data->dev_handle, + "failed to join LM relay thread: %s", + rte_strerror(ret)); + } + sfc_vdpa_stop(ops_data); + ops_data->sw_fallback_mode = false; + sfc_vdpa_close(ops_data); sfc_vdpa_adapter_unlock(ops_data->dev_handle); @@ -774,9 +1018,49 @@ static int sfc_vdpa_set_features(int vid) { - RTE_SET_USED(vid); + int ret; + uint64_t features = 0; + struct rte_vdpa_device *vdpa_dev; + struct sfc_vdpa_ops_data *ops_data; - return -1; + vdpa_dev = rte_vhost_get_vdpa_device(vid); + ops_data = sfc_vdpa_get_data_by_dev(vdpa_dev); + if (ops_data == NULL) + return -1; + + rte_vhost_get_negotiated_features(vid, &features); + + if (!RTE_VHOST_NEED_LOG(features)) + return -1; + + sfc_vdpa_info(ops_data->dev_handle, "live-migration triggered"); + + sfc_vdpa_adapter_lock(ops_data->dev_handle); + + /* Stop HW Offload and unset host notifier */ + 
sfc_vdpa_stop(ops_data); + if (rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, false) != 0) + sfc_vdpa_info(ops_data->dev_handle, + "vDPA (%s): Failed to clear host notifier", + ops_data->vdpa_dev->device->name); + + /* Restart vDPA with SW relay on RX queue */ + ops_data->sw_fallback_mode = true; + sfc_vdpa_start(ops_data); + ret = pthread_create(&ops_data->sw_relay_thread_id, NULL, + sfc_vdpa_sw_relay, (void *)ops_data); + if (ret != 0) + sfc_vdpa_err(ops_data->dev_handle, + "failed to create rx_relay thread: %s", + rte_strerror(ret)); + + if (rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, true) != 0) + sfc_vdpa_info(ops_data->dev_handle, "notifier setup failed!"); + + sfc_vdpa_adapter_unlock(ops_data->dev_handle); + sfc_vdpa_info(ops_data->dev_handle, "SW fallback setup done!"); + + return 0; } static int @@ -860,17 +1144,28 @@ sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64, *offset); - pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev; - doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr + *offset; + if (!ops_data->sw_fallback_mode) { + pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev; + doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr + + *offset; + /* + * virtio-net driver in VM sends queue notifications before + * vDPA has a chance to setup the queues and notification area, + * and hence the HW misses these doorbell notifications. + * Since, it is safe to send duplicate doorbell, send another + * doorbell from vDPA driver as workaround for this timing issue + */ + rte_write16(qid, doorbell); + + /* + * Update doorbell address, it will come in handy during + * live-migration. + */ + ops_data->vq_cxt[qid].doorbell = doorbell; + } - /* - * virtio-net driver in VM sends queue notifications before - * vDPA has a chance to setup the queues and notification area, - * and hence the HW misses these doorbell notifications. - * Since, it is safe to send duplicate doorbell, send another - * doorbell from vDPA driver as workaround for this timing issue. - */ - rte_write16(qid, doorbell); + sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64, + *offset); return 0; } diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.h b/drivers/vdpa/sfc/sfc_vdpa_ops.h index 5c8e352..dd301ba 100644 --- a/drivers/vdpa/sfc/sfc_vdpa_ops.h +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.h @@ -6,8 +6,11 @@ #define _SFC_VDPA_OPS_H #include <rte_vdpa.h> +#include <vdpa_driver.h> #define SFC_VDPA_MAX_QUEUE_PAIRS 8 +#define SFC_VDPA_USED_RING_LEN(size) \ + ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3) enum sfc_vdpa_context { SFC_VDPA_AS_VF @@ -37,9 +40,10 @@ struct sfc_vdpa_vring_info { typedef struct sfc_vdpa_vq_context_s { volatile void *doorbell; uint8_t enable; - uint32_t pidx; - uint32_t cidx; efx_virtio_vq_t *vq; + + uint64_t sw_vq_iova; + uint64_t sw_vq_size; } sfc_vdpa_vq_context_t; struct sfc_vdpa_ops_data { @@ -57,6 +61,13 @@ struct sfc_vdpa_ops_data { uint16_t vq_count; struct sfc_vdpa_vq_context_s vq_cxt[SFC_VDPA_MAX_QUEUE_PAIRS * 2]; + + int epfd; + uint64_t sw_vq_iova; + bool sw_fallback_mode; + pthread_t sw_relay_thread_id; + struct vring sw_vq[SFC_VDPA_MAX_QUEUE_PAIRS * 2]; + int intr_fd[SFC_VDPA_MAX_QUEUE_PAIRS * 2]; }; struct sfc_vdpa_ops_data * -- 1.8.3.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
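To make the relay thread's epoll bookkeeping above easier to follow: each epoll event carries a single u64 that packs the event type (bit 0), the queue id (bits 1..31) and the file descriptor (bits 32..63), matching sfc_vdpa_encode_ev_data() and the SFC_VDPA_DECODE_* macros in the patch. The following standalone sketch exercises that packing; the helper names and the explicit range asserts are illustrative and not part of the driver:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit 0: event type (0 = kickfd, 1 = callfd relay);
 * bits 1..31: queue id; bits 32..63: file descriptor. */
static uint64_t encode_ev_data(int type, uint32_t qid, int fd)
{
	assert(fd >= 0);               /* a negative fd must not be packed */
	assert(qid <= UINT32_MAX / 2); /* qid must survive the << 1 shift */
	return (uint64_t)type | ((uint64_t)qid << 1) | ((uint64_t)fd << 32);
}

static int decode_fd(uint64_t d)       { return (int)(d >> 32); }
static uint32_t decode_qid(uint64_t d) { return ((uint32_t)d) >> 1; }
static int decode_ev_type(uint64_t d)  { return (int)(d & 1); }

int main(void)
{
	uint64_t d = encode_ev_data(1, 5, 42);

	assert(decode_ev_type(d) == 1);
	assert(decode_qid(d) == 5);
	assert(decode_fd(d) == 42);
	printf("type=%d qid=%u fd=%d\n",
	       decode_ev_type(d), decode_qid(d), decode_fd(d));
	return 0;
}

The round trip recovers type, queue id and fd from a single epoll_data_t-sized value, which is why the relay loop can dispatch on SFC_VDPA_DECODE_EV_TYPE() alone.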
* Re: [PATCH v3 5/5] vdpa/sfc: Add support for SW assisted live migration
  2022-07-14 13:48 ` [PATCH v3 5/5] vdpa/sfc: Add support for SW assisted live migration abhimanyu.saini
@ 2022-07-28 13:42 ` Andrew Rybchenko
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Rybchenko @ 2022-07-28 13:42 UTC (permalink / raw)
  To: abhimanyu.saini, dev; +Cc: chenbo.xia, maxime.coquelin, Abhimanyu Saini

On 7/14/22 16:48, abhimanyu.saini@xilinx.com wrote:
> From: Abhimanyu Saini <absaini@amd.com>
>
> In SW assisted live migration, vDPA driver will stop all virtqueues
> and setup up SW vrings to relay the communication between the
> virtio driver and the vDPA device using an event driven relay thread
>
> This will allow vDPA driver to help on guest dirty page logging for
> live migration.
>
> Signed-off-by: Abhimanyu Saini <absaini@amd.com>
> ---
> v2:
> * Fix checkpatch warnings
> * Add a cover letter
> v3:
> * Restructure patchset
>
> drivers/vdpa/sfc/sfc_vdpa.h | 1 +
> drivers/vdpa/sfc/sfc_vdpa_ops.c | 337 +++++++++++++++++++++++++++++++++++++---
> drivers/vdpa/sfc/sfc_vdpa_ops.h | 15 +-
> 3 files changed, 330 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/vdpa/sfc/sfc_vdpa.h b/drivers/vdpa/sfc/sfc_vdpa.h
> index daeb27d..ae522ca 100644
> --- a/drivers/vdpa/sfc/sfc_vdpa.h
> +++ b/drivers/vdpa/sfc/sfc_vdpa.h
> @@ -18,6 +18,7 @@
>
> #define SFC_VDPA_MAC_ADDR "mac"
> #define SFC_VDPA_DEFAULT_MCDI_IOVA 0x200000000000
> +#define SFC_SW_VRING_IOVA 0x300000000000
>
> /* Broadcast & Unicast MAC filters are supported */
> #define SFC_MAX_SUPPORTED_FILTERS 3
> diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c
> index 6401d4e..1d29ee7 100644
> --- a/drivers/vdpa/sfc/sfc_vdpa_ops.c
> +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c
> @@ -4,10 +4,13 @@
>
> #include <pthread.h>
> #include <unistd.h>
> +#include <sys/epoll.h>
> #include <sys/ioctl.h>
>
> +#include <rte_eal_paging.h>
> #include <rte_errno.h>
> #include <rte_malloc.h>
> +#include <rte_memory.h>
> #include <rte_vdpa.h>
> #include <rte_vfio.h>
> #include <rte_vhost.h>
> @@ -33,7 +36,9 @@
> */
> #define SFC_VDPA_DEFAULT_FEATURES \
> ((1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \
> - (1ULL << VIRTIO_NET_F_MQ))
> + (1ULL << VIRTIO_NET_F_MQ) | \
> + (1ULL << VHOST_F_LOG_ALL) | \
> + (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE))
>
> #define SFC_VDPA_MSIX_IRQ_SET_BUF_LEN \
> (sizeof(struct vfio_irq_set) + \
> @@ -42,6 +47,142 @@
> /* It will be used for target VF when calling function is not PF */
> #define SFC_VDPA_VF_NULL 0xFFFF
>
> +#define SFC_VDPA_DECODE_FD(data) (data.u64 >> 32)

Macro arguments should be enclosed in parentheses to be on the safe
side, taking the priorities of the various operators into account:
((data).u64 >> 32). Same below.

> +#define SFC_VDPA_DECODE_QID(data) (data.u32 >> 1)
> +#define SFC_VDPA_DECODE_EV_TYPE(data) (data.u32 & 1)
> +
> +/*
> + * Create q_num number of epoll events for kickfd interrupts
> + * and q_num/2 events for callfd interrupts. Round up the
> + * total to (q_num * 2) number of events.
> + */
> +#define SFC_VDPA_SW_RELAY_EVENT_NUM(q_num) (q_num * 2)

Enclose the macro argument in parentheses.

> +
> +static inline uint64_t
> +sfc_vdpa_encode_ev_data(int type, uint32_t qid, int fd)
> +{
> + SFC_VDPA_ASSERT(fd > UINT32_MAX || qid > UINT32_MAX / 2);

fd is a signed integer, but here it is compared against an unsigned
constant. Don't we need to ensure that fd is really non-negative?
> + return type | (qid << 1) | (uint64_t)fd << 32;
> +}
> +
> +static inline void
> +sfc_vdpa_queue_relay(struct sfc_vdpa_ops_data *ops_data, uint32_t qid)
> +{
> + rte_vdpa_relay_vring_used(ops_data->vid, qid, &ops_data->sw_vq[qid]);
> + rte_vhost_vring_call(ops_data->vid, qid);
> +}
> +
> +static void*

Missing space between void and *.

> +sfc_vdpa_sw_relay(void *data)
> +{
> + uint64_t buf;
> + uint32_t qid, q_num;
> + struct epoll_event ev;
> + struct rte_vhost_vring vring;
> + int nbytes, i, ret, fd, epfd, nfds = 0;

Such a long list of variables on one line is not readable. It is better
to move some of the variables into the loop bodies below to make them
block-local.

> + struct epoll_event events[SFC_VDPA_MAX_QUEUE_PAIRS * 2];
> + struct sfc_vdpa_ops_data *ops_data = (struct sfc_vdpa_ops_data *)data;
> +
> + q_num = rte_vhost_get_vring_num(ops_data->vid);
> + epfd = epoll_create(SFC_VDPA_SW_RELAY_EVENT_NUM(q_num));
> + if (epfd < 0) {
> + sfc_vdpa_log_init(ops_data->dev_handle,
> + "failed to create epoll instance");
> + goto fail_epoll;
> + }
> + ops_data->epfd = epfd;
> +
> + vring.kickfd = -1;
> + for (qid = 0; qid < q_num; qid++) {
> + ev.events = EPOLLIN | EPOLLPRI;
> + ret = rte_vhost_get_vhost_vring(ops_data->vid, qid, &vring);
> + if (ret != 0) {
> + sfc_vdpa_log_init(ops_data->dev_handle,
> + "rte_vhost_get_vhost_vring error %s",
> + strerror(errno));
> + goto fail_vring;
> + }
> +
> + ev.data.u64 = sfc_vdpa_encode_ev_data(0, qid, vring.kickfd);
> + if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
> + sfc_vdpa_log_init(ops_data->dev_handle,
> + "epoll add error: %s",
> + strerror(errno));
> + goto fail_epoll_add;
> + }
> + }
> +
> + /*
> + * Register intr_fd created by vDPA driver in lieu of qemu's callfd
> + * to intercept rx queue notification. So that we can monitor rx
> + * notifications and issue rte_vdpa_relay_vring_used()
> + */
> + for (qid = 0; qid < q_num; qid += 2) {
> + fd = ops_data->intr_fd[qid];
> + ev.events = EPOLLIN | EPOLLPRI;
> + ev.data.u64 = sfc_vdpa_encode_ev_data(1, qid, fd);
> + if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
> + sfc_vdpa_log_init(ops_data->dev_handle,
> + "epoll add error: %s",
> + strerror(errno));
> + goto fail_epoll_add;
> + }
> + sfc_vdpa_queue_relay(ops_data, qid);
> + }
> +
> + /*
> + * virtio driver in VM was continuously sending queue notifications
> + * while were setting up software vrings and hence the HW misses
> + * these doorbell notifications. Since, it is safe to send duplicate
> + * doorbell, send another doorbell from vDPA driver.
> + */
> + for (qid = 0; qid < q_num; qid++)
> + rte_write16(qid, ops_data->vq_cxt[qid].doorbell);
> +
> + for (;;) {

Is this a forever loop? Why? If it is intended, it should be
highlighted in a comment.

> + nfds = epoll_wait(epfd, events,
> + SFC_VDPA_SW_RELAY_EVENT_NUM(q_num), -1);
> + if (nfds < 0) {
> + if (errno == EINTR)
> + continue;
> + sfc_vdpa_log_init(ops_data->dev_handle,
> + "epoll_wait return fail\n");
> + goto fail_epoll_wait;
> + }
> +
> + for (i = 0; i < nfds; i++) {
> + fd = SFC_VDPA_DECODE_FD(events[i].data);
> + /* Ensure kickfd is not busy before proceeding */
> + for (;;) {
> + nbytes = read(fd, &buf, 8);
> + if (nbytes < 0) {
> + if (errno == EINTR ||
> + errno == EWOULDBLOCK ||
> + errno == EAGAIN)
> + continue;
> + }
> + break;
> + }

I think do { } while is a better construct above. It would be easier to
read and understand.
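For illustration, the suggested do { } while form could look roughly
like the sketch below. It is untested, mirrors the retry logic of the
quoted loop (retry only on transient errors), and the helper name is
made up for this example:

#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Drain an eventfd, retrying only on transient errors. */
static ssize_t drain_kickfd(int fd)
{
	uint64_t buf;
	ssize_t nbytes;

	do {
		nbytes = read(fd, &buf, sizeof(buf));
	} while (nbytes < 0 && (errno == EINTR ||
		 errno == EWOULDBLOCK || errno == EAGAIN));

	return nbytes;
}

The loop condition states the retry criteria in one place instead of
splitting them between a nested if and an unconditional break.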
> +
> + qid = SFC_VDPA_DECODE_QID(events[i].data);
> + if (SFC_VDPA_DECODE_EV_TYPE(events[i].data))
> + sfc_vdpa_queue_relay(ops_data, qid);
> + else
> + rte_write16(qid, ops_data->vq_cxt[qid].doorbell);
> + }
> + }
> +
> + return NULL;
> +
> +fail_epoll:
> +fail_vring:
> +fail_epoll_add:
> +fail_epoll_wait:
> + close(epfd);

Maybe it is better to avoid calling close() with a negative parameter?

> + ops_data->epfd = -1;
> + return NULL;
> +}
> +
> static int
> sfc_vdpa_get_device_features(struct sfc_vdpa_ops_data *ops_data)
> {
> @@ -99,7 +240,7 @@
> static int
> sfc_vdpa_enable_vfio_intr(struct sfc_vdpa_ops_data *ops_data)
> {
> - int rc;
> + int rc, fd;
> int *irq_fd_ptr;
> int vfio_dev_fd;
> uint32_t i, num_vring;
> @@ -131,6 +272,17 @@
> return -1;
>
> irq_fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = vring.callfd;
> + if (ops_data->sw_fallback_mode && !(i & 1)) {

The (i & 1) condition should be wrapped in a tiny function to make the
meaning of the check clear.

> + fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> + if (fd < 0) {
> + sfc_vdpa_err(ops_data->dev_handle,
> + "failed to create eventfd");
> + goto fail_eventfd;
> + }
> + ops_data->intr_fd[i] = fd;
> + irq_fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = fd;
> + } else
> + ops_data->intr_fd[i] = -1;

Since the if body uses curly brackets, please use curly brackets in the
else body as well, regardless of the number of lines/statements.

> }
>
> rc = ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> @@ -138,16 +290,26 @@
> sfc_vdpa_err(ops_data->dev_handle,
> "error enabling MSI-X interrupts: %s",
> strerror(errno));
> - return -1;
> + goto fail_ioctl;
> }
>
> return 0;
> +
> +fail_ioctl:
> +fail_eventfd:
> + for (i = 0; i < num_vring; i++) {
> + if (ops_data->intr_fd[i] != -1) {

Why do you use a different condition for a bad FD? Exactly -1 here, but
just negative below. It is better to be consistent.
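A tiny helper along the suggested lines might look like this; the name
is hypothetical, and it just mirrors the driver's "even virtqueue for
RX and odd for TX" convention:

#include <stdbool.h>
#include <stdint.h>

/* Even virtqueues carry RX, odd ones TX (hypothetical helper). */
static inline bool
sfc_vdpa_vq_is_rx(uint32_t vq_num)
{
	return (vq_num & 1) == 0;
}

The quoted condition would then read
"if (ops_data->sw_fallback_mode && sfc_vdpa_vq_is_rx(i))", which makes
the intent of the parity test self-documenting.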
> + close(ops_data->intr_fd[i]); > + ops_data->intr_fd[i] = -1; > + } > + } > + return -1; > } > > static int > sfc_vdpa_disable_vfio_intr(struct sfc_vdpa_ops_data *ops_data) > { > - int rc; > + int rc, i; > int vfio_dev_fd; > struct vfio_irq_set irq_set; > void *dev; > @@ -161,6 +323,12 @@ > irq_set.index = VFIO_PCI_MSIX_IRQ_INDEX; > irq_set.start = 0; > > + for (i = 0; i < ops_data->vq_count; i++) { > + if (ops_data->intr_fd[i] >= 0) here (see above) > + close(ops_data->intr_fd[i]); > + ops_data->intr_fd[i] = -1; > + } > + > rc = ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, &irq_set); > if (rc) { > sfc_vdpa_err(ops_data->dev_handle, > @@ -223,12 +391,15 @@ > static int > sfc_vdpa_virtq_start(struct sfc_vdpa_ops_data *ops_data, int vq_num) > { > - int rc; > + int rc, fd; > + uint64_t size; > uint32_t doorbell; > efx_virtio_vq_t *vq; > + void *vring_buf, *dev; > struct sfc_vdpa_vring_info vring; > efx_virtio_vq_cfg_t vq_cfg; > efx_virtio_vq_dyncfg_t vq_dyncfg; > + uint64_t sw_vq_iova = ops_data->sw_vq_iova; > > vq = ops_data->vq_cxt[vq_num].vq; > if (vq == NULL) > @@ -241,6 +412,33 @@ > goto fail_vring_info; > } > > + if (ops_data->sw_fallback_mode) { > + size = vring_size(vring.size, rte_mem_page_size()); > + size = RTE_ALIGN_CEIL(size, rte_mem_page_size()); > + vring_buf = rte_zmalloc("vdpa", size, rte_mem_page_size()); > + vring_init(&ops_data->sw_vq[vq_num], vring.size, vring_buf, > + rte_mem_page_size()); > + > + dev = ops_data->dev_handle; > + fd = sfc_vdpa_adapter_by_dev_handle(dev)->vfio_container_fd; > + rc = rte_vfio_container_dma_map(fd, > + (uint64_t)(uintptr_t)vring_buf, > + sw_vq_iova, size); > + > + /* Direct I/O for Tx queue, relay for Rx queue */ > + if (!(vq_num & 1)) Use tiny function (see above). > + vring.used = sw_vq_iova + > + (char *)ops_data->sw_vq[vq_num].used - > + (char *)ops_data->sw_vq[vq_num].desc; > + > + ops_data->sw_vq[vq_num].used->idx = vring.last_used_idx; > + ops_data->sw_vq[vq_num].avail->idx = vring.last_avail_idx; > + > + ops_data->vq_cxt[vq_num].sw_vq_iova = sw_vq_iova; > + ops_data->vq_cxt[vq_num].sw_vq_size = size; > + ops_data->sw_vq_iova += size; > + } > + > vq_cfg.evvc_target_vf = SFC_VDPA_VF_NULL; > > /* even virtqueue for RX and odd for TX */ > @@ -309,9 +507,12 @@ > static int > sfc_vdpa_virtq_stop(struct sfc_vdpa_ops_data *ops_data, int vq_num) > { > - int rc; > + int rc, fd; > + void *dev, *buf; > + uint64_t size, len, iova; > efx_virtio_vq_dyncfg_t vq_idx; > efx_virtio_vq_t *vq; > + struct rte_vhost_vring vring; > > if (ops_data->vq_cxt[vq_num].enable != B_TRUE) > return -1; > @@ -320,12 +521,34 @@ > if (vq == NULL) > return -1; > > + if (ops_data->sw_fallback_mode) { > + dev = ops_data->dev_handle; > + fd = sfc_vdpa_adapter_by_dev_handle(dev)->vfio_container_fd; > + /* synchronize remaining new used entries if any */ > + if (!(vq_num & 1)) same here > + sfc_vdpa_queue_relay(ops_data, vq_num); > + > + rte_vhost_get_vhost_vring(ops_data->vid, vq_num, &vring); > + len = SFC_VDPA_USED_RING_LEN(vring.size); > + rte_vhost_log_used_vring(ops_data->vid, vq_num, 0, len); > + > + buf = ops_data->sw_vq[vq_num].desc; > + size = ops_data->vq_cxt[vq_num].sw_vq_size; > + iova = ops_data->vq_cxt[vq_num].sw_vq_iova; > + rte_vfio_container_dma_unmap(fd, (uint64_t)(uintptr_t)buf, > + iova, size); > + } > + > /* stop the vq */ > rc = efx_virtio_qstop(vq, &vq_idx); > if (rc == 0) { > - ops_data->vq_cxt[vq_num].cidx = vq_idx.evvd_vq_used_idx; > - ops_data->vq_cxt[vq_num].pidx = vq_idx.evvd_vq_avail_idx; > + if (ops_data->sw_fallback_mode) > + 
vq_idx.evvd_vq_avail_idx = vq_idx.evvd_vq_used_idx; > + rte_vhost_set_vring_base(ops_data->vid, vq_num, > + vq_idx.evvd_vq_avail_idx, > + vq_idx.evvd_vq_used_idx); > } > + > ops_data->vq_cxt[vq_num].enable = B_FALSE; > > return rc; > @@ -450,7 +673,11 @@ > > SFC_EFX_ASSERT(ops_data->state == SFC_VDPA_STATE_CONFIGURED); > > - sfc_vdpa_log_init(ops_data->dev_handle, "entry"); > + if (ops_data->sw_fallback_mode) { > + sfc_vdpa_log_init(ops_data->dev_handle, > + "Trying to start VDPA with SW I/O relay"); > + ops_data->sw_vq_iova = SFC_SW_VRING_IOVA; > + } > > ops_data->state = SFC_VDPA_STATE_STARTING; > > @@ -675,6 +902,7 @@ > sfc_vdpa_dev_close(int vid) > { > int ret; > + void *status; > struct rte_vdpa_device *vdpa_dev; > struct sfc_vdpa_ops_data *ops_data; > > @@ -707,7 +935,23 @@ > } > ops_data->is_notify_thread_started = false; > > + if (ops_data->sw_fallback_mode) { > + ret = pthread_cancel(ops_data->sw_relay_thread_id); > + if (ret != 0) > + sfc_vdpa_err(ops_data->dev_handle, > + "failed to cancel LM relay thread: %s", > + rte_strerror(ret)); > + > + ret = pthread_join(ops_data->sw_relay_thread_id, &status); > + if (ret != 0) > + sfc_vdpa_err(ops_data->dev_handle, > + "failed to join LM relay thread: %s", > + rte_strerror(ret)); > + } > + > sfc_vdpa_stop(ops_data); > + ops_data->sw_fallback_mode = false; > + > sfc_vdpa_close(ops_data); > > sfc_vdpa_adapter_unlock(ops_data->dev_handle); > @@ -774,9 +1018,49 @@ > static int > sfc_vdpa_set_features(int vid) > { > - RTE_SET_USED(vid); > + int ret; > + uint64_t features = 0; > + struct rte_vdpa_device *vdpa_dev; > + struct sfc_vdpa_ops_data *ops_data; > > - return -1; > + vdpa_dev = rte_vhost_get_vdpa_device(vid); > + ops_data = sfc_vdpa_get_data_by_dev(vdpa_dev); > + if (ops_data == NULL) > + return -1; > + > + rte_vhost_get_negotiated_features(vid, &features); > + > + if (!RTE_VHOST_NEED_LOG(features)) > + return -1; > + > + sfc_vdpa_info(ops_data->dev_handle, "live-migration triggered"); > + > + sfc_vdpa_adapter_lock(ops_data->dev_handle); > + > + /* Stop HW Offload and unset host notifier */ > + sfc_vdpa_stop(ops_data); > + if (rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, false) != 0) > + sfc_vdpa_info(ops_data->dev_handle, > + "vDPA (%s): Failed to clear host notifier", > + ops_data->vdpa_dev->device->name); > + > + /* Restart vDPA with SW relay on RX queue */ > + ops_data->sw_fallback_mode = true; > + sfc_vdpa_start(ops_data); > + ret = pthread_create(&ops_data->sw_relay_thread_id, NULL, > + sfc_vdpa_sw_relay, (void *)ops_data); > + if (ret != 0) > + sfc_vdpa_err(ops_data->dev_handle, > + "failed to create rx_relay thread: %s", > + rte_strerror(ret)); > + > + if (rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, true) != 0) > + sfc_vdpa_info(ops_data->dev_handle, "notifier setup failed!"); > + > + sfc_vdpa_adapter_unlock(ops_data->dev_handle); > + sfc_vdpa_info(ops_data->dev_handle, "SW fallback setup done!"); > + > + return 0; > } > > static int > @@ -860,17 +1144,28 @@ > sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64, > *offset); > > - pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev; > - doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr + *offset; > + if (!ops_data->sw_fallback_mode) { > + pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev; > + doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr + > + *offset; > + /* > + * virtio-net driver in VM sends queue notifications before > + * vDPA has a chance to setup the queues and notification area, > + * and hence the HW misses 
these doorbell notifications. > + * Since, it is safe to send duplicate doorbell, send another > + * doorbell from vDPA driver as workaround for this timing issue > + */ > + rte_write16(qid, doorbell); > + > + /* > + * Update doorbell address, it will come in handy during > + * live-migration. > + */ > + ops_data->vq_cxt[qid].doorbell = doorbell; > + } > > - /* > - * virtio-net driver in VM sends queue notifications before > - * vDPA has a chance to setup the queues and notification area, > - * and hence the HW misses these doorbell notifications. > - * Since, it is safe to send duplicate doorbell, send another > - * doorbell from vDPA driver as workaround for this timing issue. > - */ > - rte_write16(qid, doorbell); > + sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64, > + *offset); > > return 0; > } > diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.h b/drivers/vdpa/sfc/sfc_vdpa_ops.h > index 5c8e352..dd301ba 100644 > --- a/drivers/vdpa/sfc/sfc_vdpa_ops.h > +++ b/drivers/vdpa/sfc/sfc_vdpa_ops.h > @@ -6,8 +6,11 @@ > #define _SFC_VDPA_OPS_H > > #include <rte_vdpa.h> > +#include <vdpa_driver.h> > > #define SFC_VDPA_MAX_QUEUE_PAIRS 8 > +#define SFC_VDPA_USED_RING_LEN(size) \ > + ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3) > > enum sfc_vdpa_context { > SFC_VDPA_AS_VF > @@ -37,9 +40,10 @@ struct sfc_vdpa_vring_info { > typedef struct sfc_vdpa_vq_context_s { > volatile void *doorbell; > uint8_t enable; > - uint32_t pidx; > - uint32_t cidx; > efx_virtio_vq_t *vq; > + > + uint64_t sw_vq_iova; > + uint64_t sw_vq_size; > } sfc_vdpa_vq_context_t; > > struct sfc_vdpa_ops_data { > @@ -57,6 +61,13 @@ struct sfc_vdpa_ops_data { > > uint16_t vq_count; > struct sfc_vdpa_vq_context_s vq_cxt[SFC_VDPA_MAX_QUEUE_PAIRS * 2]; > + > + int epfd; > + uint64_t sw_vq_iova; > + bool sw_fallback_mode; > + pthread_t sw_relay_thread_id; > + struct vring sw_vq[SFC_VDPA_MAX_QUEUE_PAIRS * 2]; > + int intr_fd[SFC_VDPA_MAX_QUEUE_PAIRS * 2]; > }; > > struct sfc_vdpa_ops_data * ^ permalink raw reply [flat|nested] 17+ messages in thread
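As background for the shadow-vring sizing in sfc_vdpa_virtq_start() and
the SFC_VDPA_USED_RING_LEN() macro quoted above, the arithmetic for a
split virtqueue can be sketched as below. The layout constants assume
the standard split-ring format (16-byte descriptors, 8-byte used-ring
elements, three 16-bit fields framing the avail and used rings) and a
power-of-two alignment; the program is illustrative only, not the
driver's code:

#include <stdint.h>
#include <stdio.h>

/* Round v up to a multiple of a (a must be a power of two). */
static uint64_t align_ceil(uint64_t v, uint64_t a)
{
	return (v + a - 1) & ~(a - 1);
}

static uint64_t split_vring_bytes(uint64_t num, uint64_t align)
{
	uint64_t desc_avail = 16 * num + 2 * (3 + num); /* desc + avail */
	uint64_t used = 2 * 3 + 8 * num;                /* used ring */

	return align_ceil(desc_avail, align) + used;
}

int main(void)
{
	uint64_t page = 4096, num = 256;
	uint64_t sz = align_ceil(split_vring_bytes(num, page), page);

	/* SFC_VDPA_USED_RING_LEN(256) = 256 * 8 + 2 * 3 = 2054 bytes,
	 * the used-ring span logged as dirty at queue stop. */
	printf("shadow vring, %u entries: %u bytes (%u pages)\n",
	       (unsigned)num, (unsigned)sz, (unsigned)(sz / page));
	return 0;
}

For a 256-entry queue this prints 12288 bytes (3 pages), which is the
page-aligned allocation the driver would DMA-map at the next free
SFC_SW_VRING_IOVA offset.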
* Re: [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers
  2022-07-14 13:47 ` [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini
  ` (4 preceding siblings ...)
  2022-07-14 13:48 ` [PATCH v3 5/5] vdpa/sfc: Add support for SW assisted live migration abhimanyu.saini
@ 2022-10-04 15:31 ` Andrew Rybchenko
  5 siblings, 0 replies; 17+ messages in thread
From: Andrew Rybchenko @ 2022-10-04 15:31 UTC (permalink / raw)
  To: abhimanyu.saini, dev; +Cc: chenbo.xia, maxime.coquelin, Abhimanyu Saini

On 7/14/22 16:47, abhimanyu.saini@xilinx.com wrote:
> From: Abhimanyu Saini <absaini@amd.com>
>
> In SW assisted live migration, vDPA driver will stop all virtqueues
> and setup up SW vrings to relay the communication between the
> virtio driver and the vDPA device using an event driven relay thread
> This will allow vDPA driver to help on guest dirty page logging for
> live migration.
>
> Abhimanyu Saini (5):
> common/sfc_efx/base: remove VQ index check during VQ start
> common/sfc_efx/base: update MCDI headers
> common/sfc_efx/base: use the updated definitions of cidx/pidx
> vdpa/sfc: enable support for multi-queue
> vdpa/sfc: Add support for SW assisted live migration
>
> drivers/common/sfc_efx/base/efx.h | 12 +-
> drivers/common/sfc_efx/base/efx_regs_mcdi.h | 36 +-
> drivers/common/sfc_efx/base/rhead_virtio.c | 28 +-
> drivers/vdpa/sfc/sfc_vdpa.h | 1 +
> drivers/vdpa/sfc/sfc_vdpa_hw.c | 2 +
> drivers/vdpa/sfc/sfc_vdpa_ops.c | 345 ++++++++++++++++++--
> drivers/vdpa/sfc/sfc_vdpa_ops.h | 17 +-
> 7 files changed, 378 insertions(+), 63 deletions(-)
>

Patch 4/5 still requires the review notes to be addressed. Applied
without the 4/5 patch to dpdk-next-net/main, thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread
Thread overview: 17+ messages

2022-07-14  8:44 [PATCH v2 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini
2022-07-14  8:44 ` [PATCH v2 1/5] common/sfc_efx/base: remove VQ index check during VQ start abhimanyu.saini
2022-07-14  8:44 ` [PATCH v2 2/5] common/sfc_efx/base: update MCDI headers abhimanyu.saini
2022-07-14  8:44 ` [PATCH v2 3/5] common/sfc_efx/base: use the updated definitions of cidx/pidx abhimanyu.saini
2022-07-14  8:44 ` [PATCH v2 4/5] vdpa/sfc: enable support for multi-queue abhimanyu.saini
2022-07-14  8:44 ` [PATCH v2 5/5] vdpa/sfc: Add support for SW assisted live migration abhimanyu.saini
2022-07-14 13:47 ` [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers abhimanyu.saini
2022-07-14 13:48   ` [PATCH v3 1/5] common/sfc_efx/base: remove VQ index check during VQ start abhimanyu.saini
2022-07-14 13:48   ` [PATCH v3 2/5] common/sfc_efx/base: update MCDI headers abhimanyu.saini
2022-07-28 11:32     ` Andrew Rybchenko
2022-07-14 13:48   ` [PATCH v3 3/5] common/sfc_efx/base: use the updated definitions of cidx/pidx abhimanyu.saini
2022-07-28 11:34     ` Andrew Rybchenko
2022-07-14 13:48   ` [PATCH v3 4/5] vdpa/sfc: enable support for multi-queue abhimanyu.saini
2022-07-28 11:29     ` Andrew Rybchenko
2022-07-14 13:48   ` [PATCH v3 5/5] vdpa/sfc: Add support for SW assisted live migration abhimanyu.saini
2022-07-28 13:42     ` Andrew Rybchenko
2022-10-04 15:31   ` [PATCH v3 0/5] Add support for live migration and cleanup MCDI headers Andrew Rybchenko