From: Jianfeng Tan
To: dev@dpdk.org
Cc: anatoly.burakov@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net, Jianfeng Tan
Date: Thu, 25 Jan 2018 04:16:21 +0000
Message-Id: <1516853783-108023-2-git-send-email-jianfeng.tan@intel.com>
In-Reply-To: <1516853783-108023-1-git-send-email-jianfeng.tan@intel.com>
References: <1512067450-59203-1-git-send-email-jianfeng.tan@intel.com> <1516853783-108023-1-git-send-email-jianfeng.tan@intel.com>
Subject: [dpdk-dev] [PATCH v3 1/3] eal: add channel for multi-process communication

Previously, there were three channels for multi-process
(i.e., primary/secondary) communication.
  1. Config-file based channel, in which the primary process writes
     info into a pre-defined config file, and the secondary process
     reads the info out.
  2. The vfio submodule has its own channel, based on a unix socket,
     for the secondary process to get the container fd and group fd
     from the primary process.
  3. The pdump submodule also has its own channel, based on a unix
     socket, for packet dump.
It would be good to have a generic communication channel for
multi-process communication that accommodates the following
requirements:
  a. A secondary process wants to send info to the primary; for
     example, a secondary would like to send a request (about some
     specific vdev) to the primary.
  b. Sending info at any time, instead of just at initialization time.
  c. Sharing FDs with the other side; for a vdev like vhost, the
     related FDs (memory region, kick) should be shared.
  d. A sent message requires the other side to respond immediately.

This patch proposes to create a communication channel, based on a
datagram unix socket, for the above requirements. Each process will
block on a unix socket waiting for messages from its peers.

Three new APIs are added:

  1. rte_eal_mp_action_register() is used to register an action,
     indexed by a string, when a component on the receiver side would
     like to respond to messages from the peer process.
  2. rte_eal_mp_action_unregister() is used to unregister the action
     if the calling component no longer wants to respond to the
     messages.
  3. rte_eal_mp_sendmsg() is used to send a message, and returns
     immediately. If there are n secondary processes, the primary
     process will send n messages.
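To illustrate the register/unregister/dispatch idea described above, here
is a minimal standalone sketch. It is not the code in this patch: the
names action_register(), action_unregister(), dispatch(), struct msg and
NAME_LEN are hypothetical stand-ins, and the socket is left out entirely
so that only the string-indexed action table is shown.

```c
#define _POSIX_C_SOURCE 200809L /* for strnlen() */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

#define NAME_LEN 64 /* hypothetical; plays the role of RTE_MP_MAX_NAME_LEN */

/* Hypothetical, simplified message: a name plus one integer parameter. */
struct msg {
	char name[NAME_LEN];
	int value;
};
typedef int (*action_t)(const struct msg *m);

struct action_entry {
	TAILQ_ENTRY(action_entry) next;
	char name[NAME_LEN];
	action_t action;
};

static TAILQ_HEAD(, action_entry) actions = TAILQ_HEAD_INITIALIZER(actions);
static pthread_mutex_t actions_lock = PTHREAD_MUTEX_INITIALIZER;

/* Look up an entry by name; the caller must hold actions_lock. */
static struct action_entry *
find_action(const char *name)
{
	struct action_entry *e;

	TAILQ_FOREACH(e, &actions, next) {
		if (strncmp(e->name, name, NAME_LEN) == 0)
			return e;
	}
	return NULL;
}

/* Returns 0 on success, -1 on a bad name, a duplicate, or malloc failure. */
int
action_register(const char *name, action_t action)
{
	struct action_entry *e;
	size_t len;

	if (name == NULL || action == NULL)
		return -1;
	len = strnlen(name, NAME_LEN);
	if (len == 0 || len == NAME_LEN)
		return -1;
	e = malloc(sizeof(*e));
	if (e == NULL)
		return -1;
	memcpy(e->name, name, len + 1); /* len < NAME_LEN, so the NUL fits */
	e->action = action;

	pthread_mutex_lock(&actions_lock);
	if (find_action(name) != NULL) {
		pthread_mutex_unlock(&actions_lock);
		free(e);
		return -1;
	}
	TAILQ_INSERT_TAIL(&actions, e, next);
	pthread_mutex_unlock(&actions_lock);
	return 0;
}

void
action_unregister(const char *name)
{
	struct action_entry *e;

	if (name == NULL)
		return;
	pthread_mutex_lock(&actions_lock);
	e = find_action(name);
	if (e != NULL)
		TAILQ_REMOVE(&actions, e, next);
	pthread_mutex_unlock(&actions_lock);
	free(e); /* free(NULL) is a no-op */
}

/* Run the action registered under m->name; -1 if there is none. */
int
dispatch(const struct msg *m)
{
	struct action_entry *e;
	action_t action = NULL;

	assert(m != NULL);
	pthread_mutex_lock(&actions_lock);
	e = find_action(m->name);
	if (e != NULL)
		action = e->action;
	pthread_mutex_unlock(&actions_lock);
	return action != NULL ? action(m) : -1;
}

/* Example action for the usage note: doubles the parameter. */
int
double_value(const struct msg *m)
{
	return m->value * 2;
}
```

A receiver would call action_register("double", double_value) once, after
which dispatch() on a message named "double" runs the callback outside
the lock. In this sketch a missing action and an action that itself
returns -1 are not distinguished; a real implementation may want to
report them separately.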
Suggested-by: Konstantin Ananyev
Signed-off-by: Jianfeng Tan
---
 lib/librte_eal/common/eal_common_proc.c | 390 +++++++++++++++++++++++++++++++-
 lib/librte_eal/common/eal_filesystem.h  |  17 ++
 lib/librte_eal/common/eal_private.h     |  10 +
 lib/librte_eal/common/include/rte_eal.h |  75 ++++++
 lib/librte_eal/linuxapp/eal/eal.c       |   8 +
 lib/librte_eal/rte_eal_version.map      |   3 +
 6 files changed, 502 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
index 40fa982..baeb7d1 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -2,14 +2,48 @@
  * Copyright(c) 2016 Intel Corporation
  */
 
-#include
+#include
+#include
 #include
+#include
+#include
+#include
+#include
+#include
 #include
+#include
+#include
+#include
+#include
+#include
+
+#include
 #include
+#include
+#include
+#include
 
+#include "eal_private.h"
 #include "eal_filesystem.h"
 #include "eal_internal_cfg.h"
 
+static int mp_fd = -1;
+static char mp_filter[PATH_MAX];   /* Filter for secondary process sockets */
+static char mp_dir_path[PATH_MAX]; /* The directory path for all mp sockets */
+static pthread_mutex_t mp_mutex_action = PTHREAD_MUTEX_INITIALIZER;
+
+struct action_entry {
+	TAILQ_ENTRY(action_entry) next;
+	char action_name[RTE_MP_MAX_NAME_LEN];
+	rte_eal_mp_t action;
+};
+
+/** Double linked list of actions.
+ */
+TAILQ_HEAD(action_entry_list, action_entry);
+
+static struct action_entry_list action_entry_list =
+	TAILQ_HEAD_INITIALIZER(action_entry_list);
+
 int
 rte_eal_primary_proc_alive(const char *config_file_path)
 {
@@ -31,3 +65,357 @@ rte_eal_primary_proc_alive(const char *config_file_path)
 
 	return !!ret;
 }
+
+static struct action_entry *
+find_action_entry_by_name(const char *name)
+{
+	struct action_entry *entry;
+
+	TAILQ_FOREACH(entry, &action_entry_list, next) {
+		if (strncmp(entry->action_name, name, RTE_MP_MAX_NAME_LEN) == 0)
+			break;
+	}
+
+	return entry;
+}
+
+static bool
+validate_action_name(const char *name)
+{
+	if (name == NULL) {
+		RTE_LOG(ERR, EAL, "Action name cannot be NULL\n");
+		rte_errno = -EINVAL;
+		return false;
+	}
+	if (strnlen(name, RTE_MP_MAX_NAME_LEN) == 0) {
+		RTE_LOG(ERR, EAL, "Length of action name is zero\n");
+		rte_errno = -EINVAL;
+		return false;
+	}
+	if (strnlen(name, RTE_MP_MAX_NAME_LEN) == RTE_MP_MAX_NAME_LEN) {
+		rte_errno = -E2BIG;
+		return false;
+	}
+	return true;
+}
+
+int
+rte_eal_mp_action_register(const char *name, rte_eal_mp_t action)
+{
+	struct action_entry *entry;
+
+	if(!validate_action_name(name))
+		return -1;
+
+	entry = malloc(sizeof(struct action_entry));
+	if (entry == NULL) {
+		rte_errno = -ENOMEM;
+		return -1;
+	}
+	strcpy(entry->action_name, name);
+	entry->action = action;
+
+	pthread_mutex_lock(&mp_mutex_action);
+	if (find_action_entry_by_name(name) != NULL) {
+		pthread_mutex_unlock(&mp_mutex_action);
+		rte_errno = -EEXIST;
+		free(entry);
+		return -1;
+	}
+	TAILQ_INSERT_TAIL(&action_entry_list, entry, next);
+	pthread_mutex_unlock(&mp_mutex_action);
+	return 0;
+}
+
+void
+rte_eal_mp_action_unregister(const char *name)
+{
+	struct action_entry *entry;
+
+	if(!validate_action_name(name))
+		return;
+
+	pthread_mutex_lock(&mp_mutex_action);
+	entry = find_action_entry_by_name(name);
+	if (entry == NULL) {
+		pthread_mutex_unlock(&mp_mutex_action);
+		return;
+	}
+	TAILQ_REMOVE(&action_entry_list, entry, next);
+	pthread_mutex_unlock(&mp_mutex_action);
+	free(entry);
+}
+
+static int
+read_msg(struct rte_mp_msg *msg)
+{
+	int msglen;
+	struct iovec iov;
+	struct msghdr msgh;
+	char control[CMSG_SPACE(sizeof(msg->fds))];
+	struct cmsghdr *cmsg;
+	int buflen = sizeof(*msg) - sizeof(msg->fds);
+
+	memset(&msgh, 0, sizeof(msgh));
+	iov.iov_base = msg;
+	iov.iov_len = buflen;
+
+	msgh.msg_iov = &iov;
+	msgh.msg_iovlen = 1;
+	msgh.msg_control = control;
+	msgh.msg_controllen = sizeof(control);
+
+	msglen = recvmsg(mp_fd, &msgh, 0);
+	if (msglen < 0) {
+		RTE_LOG(ERR, EAL, "recvmsg failed, %s\n", strerror(errno));
+		return -1;
+	}
+
+	if (msglen != buflen || (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC))) {
+		RTE_LOG(ERR, EAL, "truncated msg\n");
+		return -1;
+	}
+
+	/* read auxiliary FDs if any */
+	for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
+		cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
+		if ((cmsg->cmsg_level == SOL_SOCKET) &&
+			(cmsg->cmsg_type == SCM_RIGHTS)) {
+			memcpy(msg->fds, CMSG_DATA(cmsg), sizeof(msg->fds));
+			break;
+		}
+	}
+
+	return 0;
+}
+
+static void
+process_msg(struct rte_mp_msg *msg)
+{
+	struct action_entry *entry;
+	rte_eal_mp_t action = NULL;
+
+	RTE_LOG(DEBUG, EAL, "msg: %s\n", msg->name);
+	pthread_mutex_lock(&mp_mutex_action);
+	entry = find_action_entry_by_name(msg->name);
+	if (entry != NULL)
+		action = entry->action;
+	pthread_mutex_unlock(&mp_mutex_action);
+
+	if (!action)
+		RTE_LOG(ERR, EAL, "Cannot find action: %s\n", msg->name);
+	else if (action(msg) < 0)
+		RTE_LOG(ERR, EAL, "Fail to handle message: %s\n", msg->name);
+}
+
+static void *
+mp_handle(void *arg __rte_unused)
+{
+	struct rte_mp_msg msg;
+
+	while (1) {
+		if (read_msg(&msg) == 0)
+			process_msg(&msg);
+	}
+
+	return NULL;
+}
+
+static int
+open_socket_fd(void)
+{
+	struct sockaddr_un un;
+	const char *prefix = eal_mp_socket_path();
+
+	mp_fd = socket(AF_UNIX, SOCK_DGRAM, 0);
+	if (mp_fd < 0) {
+		RTE_LOG(ERR, EAL, "failed to create unix socket\n");
+		return -1;
+	}
+
+	memset(&un, 0,
+	       sizeof(un));
+	un.sun_family = AF_UNIX;
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		snprintf(un.sun_path, sizeof(un.sun_path), "%s", prefix);
+	else
+		snprintf(un.sun_path, sizeof(un.sun_path), "%s_%d",
+			 prefix, getpid());
+	unlink(un.sun_path); /* May still exist since last run */
+	if (bind(mp_fd, (struct sockaddr *)&un, sizeof(un)) < 0) {
+		RTE_LOG(ERR, EAL, "failed to bind %s: %s\n",
+			un.sun_path, strerror(errno));
+		close(mp_fd);
+		return -1;
+	}
+
+	RTE_LOG(INFO, EAL, "Multi-process socket %s\n", un.sun_path);
+	return mp_fd;
+}
+
+static void
+unlink_sockets(void)
+{
+	int dir_fd;
+	DIR *mp_dir;
+	struct dirent *ent;
+
+	mp_dir = opendir(mp_dir_path);
+	if (!mp_dir) {
+		RTE_LOG(ERR, EAL, "Unable to open directory %s\n", mp_dir_path);
+		return;
+	}
+	dir_fd = dirfd(mp_dir);
+
+	while ((ent = readdir(mp_dir))) {
+		if (fnmatch(mp_filter, ent->d_name, 0) == 0)
+			unlinkat(dir_fd, ent->d_name, 0);
+	}
+
+	closedir(mp_dir);
+}
+
+int
+rte_eal_mp_channel_init(void)
+{
+	char thread_name[RTE_MAX_THREAD_NAME_LEN];
+	char *path;
+	pthread_t tid;
+
+	snprintf(mp_filter, PATH_MAX, ".%s_unix_*",
+		 internal_config.hugefile_prefix);
+
+	path = strdup(eal_mp_socket_path());
+	snprintf(mp_dir_path, PATH_MAX, "%s", dirname(path));
+	free(path);
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		unlink_sockets();
+
+	if (open_socket_fd() < 0)
+		return -1;
+
+	snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN, "rte_mp_handle");
+
+	if (pthread_create(&tid, NULL, mp_handle, NULL) == 0) {
+		/* try best to set thread name */
+		rte_thread_setname(tid, thread_name);
+		return 0;
+	}
+
+	RTE_LOG(ERR, EAL, "failed to create mp thread: %s\n", strerror(errno));
+	close(mp_fd);
+	mp_fd = -1;
+	return -1;
+}
+
+static int
+send_msg(const char *dst_path, struct rte_mp_msg *msg)
+{
+	int snd;
+	struct iovec iov;
+	struct msghdr msgh;
+	struct cmsghdr *cmsg;
+	struct sockaddr_un dst;
+	int fd_size = msg->num_fds * sizeof(int);
+	char control[CMSG_SPACE(fd_size)];
+
+	memset(&dst, 0,
+	       sizeof(dst));
+	dst.sun_family = AF_UNIX;
+	snprintf(dst.sun_path, sizeof(dst.sun_path), "%s", dst_path);
+
+	memset(&msgh, 0, sizeof(msgh));
+	memset(control, 0, sizeof(control));
+
+	iov.iov_base = msg;
+	iov.iov_len = sizeof(*msg) - sizeof(msg->fds);
+
+	msgh.msg_name = &dst;
+	msgh.msg_namelen = sizeof(dst);
+	msgh.msg_iov = &iov;
+	msgh.msg_iovlen = 1;
+	msgh.msg_control = control;
+	msgh.msg_controllen = sizeof(control);
+
+	cmsg = CMSG_FIRSTHDR(&msgh);
+	cmsg->cmsg_len = CMSG_LEN(fd_size);
+	cmsg->cmsg_level = SOL_SOCKET;
+	cmsg->cmsg_type = SCM_RIGHTS;
+	memcpy(CMSG_DATA(cmsg), msg->fds, fd_size);
+
+	do {
+		snd = sendmsg(mp_fd, &msgh, 0);
+	} while (snd < 0 && errno == EINTR);
+
+	if (snd > 0)
+		return 1;
+
+	RTE_LOG(ERR, EAL, "failed to send to (%s) due to %s\n",
+		dst_path, strerror(errno));
+	return 0;
+}
+
+static int
+mp_send(struct rte_mp_msg *msg)
+{
+	int n = 0;
+	DIR *mp_dir;
+	struct dirent *ent;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* broadcast to all secondary processes */
+		mp_dir = opendir(mp_dir_path);
+		if (!mp_dir) {
+			RTE_LOG(ERR, EAL, "Unable to open directory %s\n",
+				mp_dir_path);
+			return 0;
+		}
+		while ((ent = readdir(mp_dir))) {
+			if (fnmatch(mp_filter, ent->d_name, 0) != 0)
+				continue;
+
+			n += send_msg(ent->d_name, msg);
+		}
+		closedir(mp_dir);
+	} else
+		n += send_msg(eal_mp_socket_path(), msg);
+
+	return n;
+}
+
+static bool
+check_input(const struct rte_mp_msg *msg)
+{
+	if (msg == NULL) {
+		RTE_LOG(ERR, EAL, "Msg cannot be NULL\n");
+		rte_errno = -EINVAL;
+		return false;
+	}
+
+	if (!validate_action_name(msg->name))
+		return false;
+
+	if (msg->len_param > RTE_MP_MAX_PARAM_LEN) {
+		RTE_LOG(ERR, EAL, "Message data is too long\n");
+		rte_errno = -E2BIG;
+		return false;
+	}
+
+	if (msg->num_fds > RTE_MP_MAX_FD_NUM) {
+		RTE_LOG(ERR, EAL, "Cannot send more than %d FDs\n",
+			RTE_MP_MAX_FD_NUM);
+		rte_errno = -E2BIG;
+		return false;
+	}
+
+	return true;
+}
+
+int
+rte_eal_mp_sendmsg(struct rte_mp_msg *msg)
+{
+	if
+	    (!check_input(msg))
+		return -1;
+
+	RTE_LOG(DEBUG, EAL, "sendmsg: %s\n", msg->name);
+	return mp_send(msg);
+}
diff --git a/lib/librte_eal/common/eal_filesystem.h b/lib/librte_eal/common/eal_filesystem.h
index e8959eb..3b2929d 100644
--- a/lib/librte_eal/common/eal_filesystem.h
+++ b/lib/librte_eal/common/eal_filesystem.h
@@ -38,6 +38,23 @@ eal_runtime_config_path(void)
 	return buffer;
 }
 
+/** Path of primary/secondary communication unix socket file. */
+#define MP_SOCKET_PATH_FMT "%s/.%s_unix"
+static inline const char *
+eal_mp_socket_path(void)
+{
+	static char buffer[PATH_MAX]; /* static so auto-zeroed */
+	const char *directory = default_config_dir;
+	const char *home_dir = getenv("HOME");
+
+	if (getuid() != 0 && home_dir != NULL)
+		directory = home_dir;
+	snprintf(buffer, sizeof(buffer) - 1, MP_SOCKET_PATH_FMT,
+		 directory, internal_config.hugefile_prefix);
+
+	return buffer;
+}
+
 /** Path of hugepage info file. */
 #define HUGEPAGE_INFO_FMT "%s/.%s_hugepage_info"
 
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index c46dd8f..e36e3b5 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -195,4 +195,14 @@ int rte_eal_hugepage_attach(void);
  */
 struct rte_bus *rte_bus_find_by_device_name(const char *str);
 
+/**
+ * Create the unix channel for primary/secondary communication.
+ *
+ * @return
+ *   0 on success;
+ *   (<0) on failure.
+ */
+
+int rte_eal_mp_channel_init(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 2aba2c8..9a1aac2 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -186,6 +186,81 @@ int rte_eal_init(int argc, char **argv);
  */
 int rte_eal_primary_proc_alive(const char *config_file_path);
 
+#define RTE_MP_MAX_FD_NUM	8    /* The max amount of fds */
+#define RTE_MP_MAX_NAME_LEN	64   /* The max length of action name */
+#define RTE_MP_MAX_PARAM_LEN	256  /* The max length of param */
+struct rte_mp_msg {
+	char name[RTE_MP_MAX_NAME_LEN];
+	int len_param;
+	int num_fds;
+	uint8_t param[RTE_MP_MAX_PARAM_LEN];
+	int fds[RTE_MP_MAX_FD_NUM];
+};
+
+/**
+ * Action function typedef used by other components.
+ *
+ * As we create socket channel for primary/secondary communication, use
+ * this function typedef to register action for coming messages.
+ */
+typedef int (*rte_eal_mp_t)(const struct rte_mp_msg *msg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register an action function for primary/secondary communication.
+ *
+ * Call this function to register an action, if the calling component wants
+ * to respond to the messages from the corresponding component in its primary
+ * process or secondary processes.
+ *
+ * @param name
+ *   The name argument serves as the unique key to find the action.
+ *
+ * @param action
+ *   The action argument is the function pointer to the action function.
+ *
+ * @return
+ *  - 0 on success.
+ *  - (<0) on failure.
+ */
+int rte_eal_mp_action_register(const char *name, rte_eal_mp_t action);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister an action function for primary/secondary communication.
+ *
+ * Call this function to unregister an action if the calling component does
+ * not want to respond to the messages from the corresponding component in its
+ * primary process or secondary processes.
+ *
+ * @param name
+ *   The name argument serves as the unique key to find the action.
+ *
+ */
+void rte_eal_mp_action_unregister(const char *name);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Send a message to the peer process.
+ *
+ * This function will send a message which will be handled by the action
+ * identified by name in the peer process.
+ *
+ * @param msg
+ *   The msg argument contains the customized message.
+ *
+ * @return
+ *  - (<0) on invalid parameters;
+ *  - (>=0) as the number of messages being sent successfully.
+ */
+int rte_eal_mp_sendmsg(struct rte_mp_msg *msg);
+
 /**
  * Usage function typedef used by the application usage function.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 229eec9..ad44ab5 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -852,6 +852,14 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (rte_eal_mp_channel_init() < 0) {
+		rte_eal_init_alert("failed to init mp channel\n");
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			rte_errno = EFAULT;
+			return -1;
+		}
+	}
+
 #ifdef VFIO_PRESENT
 	if (rte_eal_vfio_setup() < 0) {
 		rte_eal_init_alert("Cannot init VFIO\n");
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 7088b72..adeadfb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -217,6 +217,9 @@ EXPERIMENTAL {
 	rte_eal_devargs_remove;
 	rte_eal_hotplug_add;
 	rte_eal_hotplug_remove;
+	rte_eal_mp_action_register;
+	rte_eal_mp_action_unregister;
+	rte_eal_mp_sendmsg;
 	rte_service_attr_get;
 	rte_service_attr_reset_all;
 	rte_service_component_register;
-- 
2.7.4
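
For readers less familiar with the FD-sharing part of the channel, the
following standalone sketch shows one descriptor travelling over a
datagram unix socket as SCM_RIGHTS ancillary data. It is only an
illustration under simplifying assumptions, not the patch's code:
send_fd(), recv_fd(), fd_roundtrip() and union fd_control are
hypothetical names, a socketpair() stands in for the named sockets, and
exactly one FD is carried per message.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one FD, aligned for struct cmsghdr. */
union fd_control {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send a one-byte datagram carrying fd as SCM_RIGHTS data; 0 on success. */
int
send_fd(int sock, int fd)
{
	char payload = 'F';
	struct iovec iov = { .iov_base = &payload, .iov_len = sizeof(payload) };
	union fd_control ctl;
	struct msghdr msgh;
	struct cmsghdr *cmsg;

	memset(&msgh, 0, sizeof(msgh));
	memset(&ctl, 0, sizeof(ctl));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	msgh.msg_control = ctl.buf;
	msgh.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msgh);
	assert(cmsg != NULL); /* the buffer holds at least one header */
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msgh, 0) == (ssize_t)sizeof(payload) ? 0 : -1;
}

/* Receive one datagram; return the FD it carried, or -1 if none. */
int
recv_fd(int sock)
{
	char payload;
	struct iovec iov = { .iov_base = &payload, .iov_len = sizeof(payload) };
	union fd_control ctl;
	struct msghdr msgh;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&msgh, 0, sizeof(msgh));
	memset(&ctl, 0, sizeof(ctl));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	msgh.msg_control = ctl.buf;
	msgh.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msgh, 0) < 0 ||
	    (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC)))
		return -1;

	for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS) {
			memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
			break;
		}
	}
	return fd;
}

/*
 * Pass the read end of a pipe across a datagram socketpair, then read
 * through the received descriptor. Returns the byte read, or -1.
 */
int
fd_roundtrip(char byte)
{
	int sv[2], p[2] = { -1, -1 }, fd = -1, ret = -1;
	char c;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	if (pipe(p) == 0 && write(p[1], &byte, 1) == 1 &&
	    send_fd(sv[0], p[0]) == 0 && (fd = recv_fd(sv[1])) >= 0 &&
	    read(fd, &c, 1) == 1)
		ret = (unsigned char)c;

	if (fd >= 0)
		close(fd);
	if (p[0] >= 0) {
		close(p[0]);
		close(p[1]);
	}
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

The received descriptor is a new number in the receiving process that
refers to the same open file, which is why the byte written into the pipe
can be read back through it. The sketch sizes the control buffer for a
single FD; carrying several would need CMSG_SPACE() and CMSG_LEN() scaled
by the FD count.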