crypto_aesni_mb device data contaminated and causing crash when supporting vdev_scan/vdev_action

DPDK patches and discussions

* crypto_aesni_mb device data contaminated and causing crash when supporting vdev_scan/vdev_action
@ 2022-02-02 16:08 Changchun Zhang
  2022-02-03  4:41 ` Changchun Zhang
  0 siblings, 1 reply; 2+ messages in thread
From: Changchun Zhang @ 2022-02-02 16:08 UTC
  To: users, dev


Hi,

Has anyone noticed that the crypto_aesni_mb virtual crypto device can crash because of memory corruption caused by vdev scanning and probing on a secondary process? Can anyone shed some light on it?
Here is what I encountered:
On the primary process, the crypto_aesni_mb device is probed and created successfully, and mb_mgr is set in the device private data. But during packet processing, the application crashes when accessing mb_mgr. Debugging shows that the mb_mgr address has been changed to an invalid (non-NULL) address. Further digging shows that this memory corruption occurs after vdev_action replies to the scan request.
In the code below, the crash goes away if I either disable sending messages on VDEV_SCAN_REQ or skip processing VDEV_SCAN_ONE. It seems that insert_vdev() on the secondary process triggers another probe and breaks the existing device data?
I also noticed a similar issue that was fixed by this patch https://review.spdk.io/gerrit/c/spdk/dpdk/+/1056, but the patch was cancelled. That patch described a similar memory issue found during scanning and probing on the secondary process.

static int
vdev_action(const struct rte_mp_msg *mp_msg, const void *peer)
{
     struct rte_vdev_device *dev;
     struct rte_mp_msg mp_resp;
     struct vdev_param *ou = (struct vdev_param *)&mp_resp.param;
     const struct vdev_param *in = (const struct vdev_param *)mp_msg->param;
     const char *devname;
     int num;
     int ret;

     strlcpy(mp_resp.name, VDEV_MP_KEY, sizeof(mp_resp.name));
     mp_resp.len_param = sizeof(*ou);
     mp_resp.num_fds = 0;

     switch (in->type) {
     case VDEV_SCAN_REQ:
          /* devname is not assigned until the loop below, so it is not logged here */
          VDEV_LOG(INFO, "changczh received vdev scan request");
          ou->type = VDEV_SCAN_ONE;
          ou->num = 1;
          num = 0;

          rte_spinlock_recursive_lock(&vdev_device_list_lock);
          TAILQ_FOREACH(dev, &vdev_device_list, next) {
              devname = rte_vdev_device_name(dev);
              if (strlen(devname) == 0) {
                   VDEV_LOG(INFO, "vdev with no name is not sent");
                   continue;
              }
              VDEV_LOG(INFO, "send vdev, %s", devname);
              strlcpy(ou->name, devname, RTE_DEV_NAME_MAX_LEN);
              if (rte_mp_sendmsg(&mp_resp) < 0)
                   VDEV_LOG(ERR, "send vdev, %s, failed, %s",
                        devname, strerror(rte_errno));
              num++;
          }
          rte_spinlock_recursive_unlock(&vdev_device_list_lock);
          ou->type = VDEV_SCAN_REP;
          ou->num = num;
          if (rte_mp_reply(&mp_resp, peer) < 0)
              VDEV_LOG(ERR, "Failed to reply a scan request");
          break;
     case VDEV_SCAN_ONE:
          VDEV_LOG(INFO, "receive vdev, %s", in->name);
          ret = insert_vdev(in->name, NULL, NULL, false);
          if (ret == -EEXIST)
              VDEV_LOG(DEBUG, "device already exist, %s", in->name);
          else if (ret < 0)
              VDEV_LOG(ERR, "failed to add vdev, %s", in->name);
          break;
     default:
          VDEV_LOG(ERR, "vdev cannot recognize this message");
     }

     return 0;
}

Thanks,
Alex


* RE: crypto_aesni_mb device data contaminated and causing crash when supporting vdev_scan/vdev_action
  2022-02-02 16:08 crypto_aesni_mb device data contaminated and causing crash when supporting vdev_scan/vdev_action Changchun Zhang
@ 2022-02-03  4:41 ` Changchun Zhang
  0 siblings, 0 replies; 2+ messages in thread
From: Changchun Zhang @ 2022-02-03  4:41 UTC
  To: users, dev


The issue can be resolved by allocating the mb_mgr only in the primary process, in the PMD probe/create function of aesni_mb_pmd.
Disabling the scanning/probing is just a workaround. I guess that is why the fix in https://review.spdk.io/gerrit/c/spdk/dpdk/+/1056 was cancelled, since the latest DPDK release, 21.11, redesigned this PMD: the mb_mgr has been removed from the device data and moved to the creation of each crypto session (I wonder whether that degrades performance).
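
To illustrate the idea, here is a minimal stand-alone sketch; it is not the actual aesni_mb_pmd code. All names in it (proc_type, sim_dev_private, sim_probe_unguarded, sim_probe_guarded) are hypothetical, and the "processes" are simulated by two calls on the same structure. In a real PMD the check would presumably use rte_eal_process_type() == RTE_PROC_PRIMARY instead of the stand-in enum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the DPDK process type (assumption for this sketch only). */
enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

/* Hypothetical model of PMD private data that is shared between processes. */
struct sim_dev_private {
	void *mb_mgr; /* only meaningful in the process that allocated it */
};

/*
 * Unguarded probe: every process that probes the device overwrites the
 * shared mb_mgr pointer, so the primary later sees a pointer it never
 * allocated. (The previous allocation is deliberately left alone here to
 * mirror the overwrite; a real driver would also have to manage it.)
 */
static int
sim_probe_unguarded(struct sim_dev_private *priv, enum proc_type type)
{
	(void)type;
	assert(priv != NULL);
	priv->mb_mgr = malloc(64);
	return priv->mb_mgr != NULL ? 0 : -1;
}

/*
 * Guarded probe: only the primary process allocates and stores mb_mgr.
 * A secondary probe leaves the shared device data untouched.
 */
static int
sim_probe_guarded(struct sim_dev_private *priv, enum proc_type type)
{
	assert(priv != NULL);
	if (type != PROC_PRIMARY)
		return 0;
	priv->mb_mgr = malloc(64);
	return priv->mb_mgr != NULL ? 0 : -1;
}
```

Calling the unguarded version first as primary and then as secondary changes the stored pointer, while the guarded version keeps the pointer set by the primary. The sketch only models the overwrite; it does not reproduce the multi-process shared memory setup.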

Thanks,
Changchun (Alex)

