From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Tan, Jianfeng" <jianfeng.tan@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: "Richardson, Bruce" <bruce.richardson@intel.com>,
"thomas@monjalon.net" <thomas@monjalon.net>
Subject: Re: [dpdk-dev] [PATCH v2 3/4] eal: add synchronous multi-process communication
Date: Wed, 17 Jan 2018 17:20:38 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772588627F12B@irsmsx105.ger.corp.intel.com>
In-Reply-To: <74ccd840-86af-4dba-e5ba-494017052841@intel.com>
>
>
> On 1/17/2018 6:50 PM, Ananyev, Konstantin wrote:
> >
> >>> Hi Jianfeng,
> >>>
> >>>> -----Original Message-----
> >>>> From: Tan, Jianfeng
> >>>> Sent: Tuesday, January 16, 2018 8:11 AM
> >>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org; Burakov, Anatoly <anatoly.burakov@intel.com>
> >>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; thomas@monjalon.net
> >>>> Subject: Re: [PATCH v2 3/4] eal: add synchronous multi-process communication
> >>>>
> >>>> Thank you, Konstantin and Anatoly firstly. Other comments are well
> >>>> received and I'll send out a new version.
> >>>>
> >>>>
> >>>> On 1/16/2018 8:00 AM, Ananyev, Konstantin wrote:
> >>>>>> We need the synchronous way for multi-process communication,
> >>>>>> i.e., blockingly waiting for reply message when we send a request
> >>>>>> to the peer process.
> >>>>>>
> >>>>>> We add two APIs rte_eal_mp_request() and rte_eal_mp_reply() for
> >>>>>> such use case. By invoking rte_eal_mp_request(), a request message
> >>>>>> is sent out, and then it waits there for a reply message. The
> >>>>>> timeout is hard-coded 5 Sec. And the replied message will be copied
> >>>>>> in the parameters of this API so that the caller can decide how
> >>>>>> to translate those information (including params and fds). Note
> >>>>>> if a primary process owns multiple secondary processes, this API
> >>>>>> will fail.
> >>>>>>
> >>>>>> The API rte_eal_mp_reply() is always called by an mp action handler.
> >>>>>> Here we add another parameter for rte_eal_mp_t so that the action
> >>>>>> handler knows which peer address to reply.
> >>>>>>
> >>>>>> We use mutex in rte_eal_mp_request() to guarantee that only one
> >>>>>> request is on the fly for one pair of processes.
> >>>>> You don't need to do things in such strange and restrictive way.
> >>>>> Instead you can do something like that:
> >>>>> 1) Introduce new struct, list for it and mutex
> >>>>> struct sync_request {
> >>>>> int reply_received;
> >>>>> char dst[PATH_MAX];
> >>>>> char reply[...];
> >>>>> LIST_ENTRY(sync_request) next;
> >>>>> };
> >>>>>
> >>>>> static struct {
> >>>>> LIST_HEAD(list, sync_request) list;
> >>>>> pthread_mutex_t lock;
> >>>>> pthread_cond_t cond;
> >>>>> } sync_requests;
> >>>>>
> >>>>> 2) then at request() call:
> >>>>> Grab sync_requests.lock
> >>>>> Check whether we already have a pending request for that destination,
> >>>>> If yes - then release the lock and return with an error.
> >>>>> - allocate and init new sync_request struct, set reply_received=0
> >>>>> - do send_msg()
> >>>>> -then in a cycle:
> >>>>> pthread_cond_timedwait(&sync_requests.cond, &sync_requests.lock, &timespec);
> >>>>> - at return from it check if sync_request.reply_received == 1, if not
> >>>>> check if timeout expired and either return a failure or go to the start of the cycle.
> >>>>>
> >>>>> 3) at mp_handler() if REPLY received - grab sync_requests.lock,
> >>>>> search through sync_requests.list for dst[],
> >>>>> if found, then set its reply_received=1, copy the received message into reply
> >>>>> and call pthread_cond_broadcast(&sync_requests.cond);
> >>>> The only benefit I can see is that now the sender can request to
> >>>> multiple receivers at the same time. And it makes things more
> >>>> complicated. Do we really need this?
> >>> The benefit is that one thread is blocked waiting for response,
> >>> your mp_handler can still receive and handle other messages.
> >> This can already be done in the original implementation. mp_handler
> >> listens for msg, request from the other peer(s), and replies the
> >> requests, which is not affected.
> >>
> >>> Plus as you said - other threads can keep sending messages.
> >> For this one, in the original implementation, other threads can still
> >> send msg, but not request. I suppose the request is not in a fast path,
> >> why we care to make it fast?
> >>
> > +int
> > +rte_eal_mp_request(const char *action_name,
> > + void *params,
> > + int len_p,
> > + int fds[],
> > + int fds_in,
> > + int fds_out)
> > +{
> > + int i, j;
> > + int sockfd;
> > + int nprocs;
> > + int ret = 0;
> > + struct mp_msghdr *req;
> > + struct timeval tv;
> > + char buf[MAX_MSG_LENGTH];
> > + struct mp_msghdr *hdr;
> > +
> > + RTE_LOG(DEBUG, EAL, "request: %s\n", action_name);
> > +
> > + if (fds_in > SCM_MAX_FD || fds_out > SCM_MAX_FD) {
> > + RTE_LOG(ERR, EAL, "Cannot send more than %d FDs\n", SCM_MAX_FD);
> > + rte_errno = -E2BIG;
> > + return 0;
> > + }
> > +
> > + req = format_msg(action_name, params, len_p, fds_in, MP_REQ);
> > + if (req == NULL)
> > + return 0;
> > +
> > + if ((sockfd = open_unix_fd(0)) < 0) {
> > + free(req);
> > + return 0;
> > + }
> > +
> > + tv.tv_sec = 5; /* 5 Secs Timeout */
> > + tv.tv_usec = 0;
> > + if (setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO,
> > + (const void *)&tv, sizeof(struct timeval)) < 0)
> > + RTE_LOG(INFO, EAL, "Failed to set recv timeout\n");
> >
> > If you set it just for one call, why do you not restore it?
>
> Yes, original code is buggy, I should have put it into the critical section.
>
> Do you mean we just create once and use for ever? if yes, we could put
> the open and setting into mp_init().
>
> > Also I don't think it is a good idea to change it here -
> > if you'll make timeout a parameter value - then it could be overwritten
> > by different threads.
>
> For simplicity, I'm not inclined to expose the timeout as a parameter
> to the caller. So if you agree, I'll put it into the mp_init() with
> open.
My preference would be to have timeout value on a per call basis.
For one request user would like to wait no more than 5sec,
for another one user would probably be ok to wait forever.
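
A minimal sketch of how a per-call timeout could feed a condition-variable
wait, assuming the caller passes a relative timeout and NULL means "wait
forever". The helper name mp_deadline() and its signature are hypothetical,
not part of the patch:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper: turn a relative per-call timeout into the absolute
 * deadline that pthread_cond_timedwait() expects.
 * Returns 1 and fills *deadline when a timeout was given,
 * returns 0 when timeout is NULL (the caller should wait without a limit).
 * Assumes both tv_nsec fields are already in [0, NSEC_PER_SEC).
 */
static int
mp_deadline(const struct timespec *now, const struct timespec *timeout,
	    struct timespec *deadline)
{
	assert(now != NULL && deadline != NULL);

	if (timeout == NULL)
		return 0;

	deadline->tv_sec = now->tv_sec + timeout->tv_sec;
	deadline->tv_nsec = now->tv_nsec + timeout->tv_nsec;
	if (deadline->tv_nsec >= NSEC_PER_SEC) {
		/* carry the nanosecond overflow into the seconds field */
		deadline->tv_sec++;
		deadline->tv_nsec -= NSEC_PER_SEC;
	}
	return 1;
}
```

Because the deadline lives on the caller's stack, two threads issuing
requests with different timeouts would not overwrite each other's value.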
>
> >
> > +
> > + /* Only allow one req at a time */
> > + pthread_mutex_lock(&mp_mutex_request);
> > +
> > + if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > + nprocs = 0;
> > + for (i = 0; i < MAX_SECONDARY_PROCS; ++i)
> > + if (!mp_sec_sockets[i]) {
> > + j = i;
> > + nprocs++;
> > + }
> > +
> > + if (nprocs > 1) {
> > + RTE_LOG(ERR, EAL,
> > + "multi secondary processes not supported\n");
> > + goto free_and_ret;
> > + }
> > +
> > + ret = send_msg(sockfd, mp_sec_sockets[j], req, fds);
> >
> > As I remember - sendmsg() is also blocking call, so under some conditions you can stall
> > there forever.
>
> From Linux's unix_dgram_sendmsg(), we see:
> timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
Ok, but it would have effect only if (msg->msg_flags & MSG_DONTWAIT) != 0.
And for that, as I remember you need your socket in non-blocking mode, no?
>
> I assume it will not block for datagram unix socket in Linux. But I'm
> not sure how it behaves in FreeBSD.
>
> Anyway, better to add an explicit setsockopt() to make it not blocking.
You can't do that - at the same moment another thread might call your sendmsg()
and it might expect it to be blocking call.
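
For reference, on Linux the MSG_DONTWAIT flag can also be passed in the flags
argument of a single sendmsg() call, which leaves the socket's own blocking
mode untouched for other threads. The sketch below illustrates that under
this assumption; mp_send_nowait() is a hypothetical name, and behaviour on
FreeBSD is not covered here:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: send one datagram without blocking the caller.
 * MSG_DONTWAIT applies to this call only, so the O_NONBLOCK state of
 * sockfd is not changed and other threads still see a blocking socket.
 * Returns 1 on success, 0 if the call would have blocked,
 * -1 on any other error (errno is left as set by sendmsg()).
 */
static int
mp_send_nowait(int sockfd, const void *buf, size_t len)
{
	struct iovec iov;
	struct msghdr msgh;

	assert(buf != NULL);

	memset(&msgh, 0, sizeof(msgh));
	iov.iov_base = (void *)buf;
	iov.iov_len = len;
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;

	if (sendmsg(sockfd, &msgh, MSG_DONTWAIT) >= 0)
		return 1;
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return 0;
	return -1;
}
```

A caller holding a request mutex could treat a 0 return as "peer is not
draining its queue" and fail the request instead of stalling with the
mutex held.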
>
> > As mp_mutex_request is still held - next rte_eal_mp_request() will also block forever here.
> >
> > + } else
> > + ret = send_msg(sockfd, eal_mp_unix_path(), req, fds);
> > +
> > + if (ret == 0) {
> > + RTE_LOG(ERR, EAL, "failed to send request: %s\n", action_name);
> > + ret = -1;
> > + goto free_and_ret;
> > + }
> > +
> > + ret = read_msg(sockfd, buf, MAX_MSG_LENGTH, fds, fds_out, NULL);
> >
> > if the message you receive is not a reply you are expecting -
> > it will be simply dropped - mp_handler() would never process it.
>
> We cannot detect if it's the right reply absolutely correctly, but just
> check the action_name, which means, it still possibly gets a wrong reply
> if an action_name contains multiple requests.
>
> Is just comparing the action_name acceptable?
As I can see the main issue here is that you can call recvmsg() from 2 different
points and they are not synchronised:
1. your mp_handler() isn't aware of the reply you are waiting for and doesn't
have any handler associated with it.
So if mp_handler() receives a reply it will just drop it.
2. your reply() is not aware of any other messages and associated actions -
so again it can't handle them properly (and probably shouldn't).
The simplest (and most common) way - always call recvmsg() from one place -
mp_handler() and have a special action for reply msg.
As I wrote before, that action will just find the appropriate buffer provided
by reply() - copy the message into it and signal the thread waiting in reply() that
it can proceed.
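
The scheme above could be sketched roughly as follows. This is an
illustration only: deliver_reply(), request_sync(), fake_handler() and the
buffer sizes are hypothetical names and values, the actual send_msg() and
recvmsg() calls are left out, and fake_handler() merely stands in for the
single receiving thread:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/queue.h>
#include <time.h>

#define DST_LEN 64
#define REPLY_LEN 64

struct sync_request {
	int reply_received;
	char dst[DST_LEN];
	char reply[REPLY_LEN];
	LIST_ENTRY(sync_request) next;
};

static struct {
	LIST_HEAD(req_list, sync_request) list;
	pthread_mutex_t lock;
	pthread_cond_t cond;
} sync_requests = {
	.list = LIST_HEAD_INITIALIZER(sync_requests.list),
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.cond = PTHREAD_COND_INITIALIZER,
};

/* Reply action, run only by the receiving thread: find the waiter for
 * src, copy the message into its buffer and wake the waiters up.
 * Returns 1 if a pending request was found, 0 otherwise. */
static int
deliver_reply(const char *src, const char *msg)
{
	struct sync_request *r;
	int found = 0;

	pthread_mutex_lock(&sync_requests.lock);
	LIST_FOREACH(r, &sync_requests.list, next) {
		if (strcmp(r->dst, src) == 0) {
			snprintf(r->reply, sizeof(r->reply), "%s", msg);
			r->reply_received = 1;
			found = 1;
			pthread_cond_broadcast(&sync_requests.cond);
			break;
		}
	}
	pthread_mutex_unlock(&sync_requests.lock);
	return found;
}

/* Requester side: register a pending request, then wait for the reply
 * action or for timeout_sec to pass. Returns 0 on success, -1 on timeout
 * or when dst already has a pending request. */
static int
request_sync(const char *dst, int timeout_sec, char *reply, size_t len)
{
	struct sync_request req, *r;
	struct timespec end;
	int ret = 0;

	assert(reply != NULL && len > 0);
	memset(&req, 0, sizeof(req));
	snprintf(req.dst, sizeof(req.dst), "%s", dst);
	clock_gettime(CLOCK_REALTIME, &end);
	end.tv_sec += timeout_sec;

	pthread_mutex_lock(&sync_requests.lock);
	LIST_FOREACH(r, &sync_requests.list, next) {
		if (strcmp(r->dst, dst) == 0) {
			pthread_mutex_unlock(&sync_requests.lock);
			return -1;
		}
	}
	LIST_INSERT_HEAD(&sync_requests.list, &req, next);
	/* the request itself would be sent here */
	while (!req.reply_received && ret == 0)
		ret = pthread_cond_timedwait(&sync_requests.cond,
					     &sync_requests.lock, &end);
	LIST_REMOVE(&req, next);
	pthread_mutex_unlock(&sync_requests.lock);

	if (!req.reply_received)
		return -1;
	snprintf(reply, len, "%s", req.reply);
	return 0;
}

/* Stand-in for the receive loop: retries until the waiter is registered. */
static void *
fake_handler(void *arg)
{
	const struct timespec nap = { 0, 1000000 }; /* 1 ms */

	while (!deliver_reply((const char *)arg, "ack"))
		nanosleep(&nap, NULL);
	return NULL;
}
```

The waiter re-checks reply_received after every wakeup, so a broadcast
meant for another destination or a spurious wakeup just sends it back to
waiting until its own deadline.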
Konstantin
>
> >
> > + if (ret > 0) {
> > + hdr = (struct mp_msghdr *)buf;
> > + if (hdr->len_params == len_p)
> > + memcpy(params, hdr->params, len_p);
> > + else {
> > + RTE_LOG(ERR, EAL, "invalid reply\n");
> > + ret = 0;
> > + }
> > + }
> > +
> > +free_and_ret:
> > + free(req);
> > + close(sockfd);
> > + pthread_mutex_unlock(&mp_mutex_request);
> > + return ret;
> > +}
> >
> > All of the above makes me think that current implementation is erroneous
> > and needs to be reworked.
>
> Thank you for your review. I'll work on a new version.
>
> Thanks,
> Jianfeng
>
> > Konstantin
> >
> >