From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id 2534D23B; Fri, 25 May 2018 18:56:27 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 25 May 2018 09:56:26 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.49,440,1520924400"; d="scan'208";a="61696403"
Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga002.jf.intel.com with ESMTP; 25 May 2018 09:56:22 -0700
Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w4PGuLOn026605; Fri, 25 May 2018 17:56:21 +0100
Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w4PGuLp0006093; Fri, 25 May 2018 17:56:21 +0100
Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w4PGuL3p006088; Fri, 25 May 2018 17:56:21 +0100
From: Anatoly Burakov
To: dev@dpdk.org
Cc: John McNamara, Marko Kovacevic, thomas@monjalon.net
Date: Fri, 25 May 2018 17:56:21 +0100
Message-Id: <0a167de6079798b6b21f59f2f8f9b97ff76ab541.1527267364.git.anatoly.burakov@intel.com>
X-Mailer: git-send-email 1.7.0.7
Subject: [dpdk-dev] [PATCH] doc: add documentation for IPC
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
X-List-Received-Date: Fri, 25 May 2018 16:56:29 -0000

Describe all the capabilities of DPDK IPC, and provide some insight
into how to best make use of it.

Signed-off-by: Anatoly Burakov
---
 doc/guides/prog_guide/multi_proc_support.rst | 137 +++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)

diff --git a/doc/guides/prog_guide/multi_proc_support.rst b/doc/guides/prog_guide/multi_proc_support.rst
index e9ebeeb..371d028 100644
--- a/doc/guides/prog_guide/multi_proc_support.rst
+++ b/doc/guides/prog_guide/multi_proc_support.rst
@@ -178,3 +178,140 @@ instead of the functions which do the hashing internally, such as rte_hash_add()
 which means that only the first, primary DPDK process instance can open and mmap /dev/hpet.
 If the number of required DPDK processes exceeds that of the number of available HPET comparators,
 the TSC (which is the default timer in this release) must be used as a time source across all processes instead of the HPET.
+
+Communication between multiple processes
+----------------------------------------
+
+While there are multiple ways one can approach inter-process communication in
+DPDK, there is also a native DPDK IPC API available. It is not intended to be
+performance-critical, but rather is intended to be a convenient, general
+purpose API to exchange short messages between primary and secondary processes.
+
+The DPDK IPC API supports the following communication modes:
+
+* Unicast message from secondary to primary
+* Broadcast message from primary to all secondaries
+
+In other words, any IPC message sent in a primary process will be delivered to
+all secondaries, while any IPC message sent in a secondary process will only be
+delivered to the primary process. Unicast from primary to secondary or from
+secondary to secondary is not supported.
+
+Three types of communication are available within the DPDK IPC API:
+
+* Message
+* Synchronous request
+* Asynchronous request
+
+A "message" type does not expect a response and is meant to be a best-effort
+notification mechanism, while the two types of "requests" are meant to be a
+two-way communication mechanism, with the requester expecting a response from
+the other side.
+
+Both messages and requests will trigger a named callback on the receiver side.
+These callbacks will be called from within a dedicated IPC thread that is not
+part of EAL lcore threads.
+
+Registering for incoming messages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Before any messages can be received, a callback will need to be registered.
+This is accomplished by calling the ``rte_mp_action_register()`` function. This
+function accepts a unique callback name, and a function pointer to a callback
+that will be called when a message or a request matching this callback name
+arrives.
+
+If the application is no longer willing to receive messages intended for a
+specific callback function, the ``rte_mp_action_unregister()`` function can be
+called to ensure that the callback will not be triggered again.
+
+Sending messages
+~~~~~~~~~~~~~~~~
+
+To send a message, a ``rte_mp_msg`` descriptor must be populated first. The
+fields to be populated are as follows:
+
+* ``name`` - message name. This name must match the receiver's callback name.
+* ``param`` - message data (up to 256 bytes).
+* ``len_param`` - length of message data.
+* ``fds`` - file descriptors to pass along with the data (up to 8 fds).
+* ``num_fds`` - number of file descriptors to send.
+
+Once the structure is populated, calling ``rte_mp_sendmsg()`` will send the
+descriptor either to all secondary processes (if sent from primary process), or
+to the primary process (if sent from a secondary process). The function will
+return a value indicating whether sending the message succeeded or not.
+
+Sending requests
+~~~~~~~~~~~~~~~~
+
+Sending requests involves waiting for the other side to reply, so they can block
+for a relatively long time.
+
+To send a request, a message descriptor ``rte_mp_msg`` must be populated.
+Additionally, a ``timespec`` value must be specified as a timeout, after which
+IPC will stop waiting and return.
+
+For synchronous requests, the ``rte_mp_reply`` descriptor must also be
+created. This is where the responses will be stored. The fields that will be
+populated by IPC are as follows:
+
+* ``nb_sent`` - number indicating how many requests were sent (i.e. how many
+  peer processes were active at the time of the request).
+* ``nb_received`` - number indicating how many responses were received (i.e. of
+  those peer processes that were active at the time of request, how many have
+  replied).
+* ``msgs`` - pointer to where all of the responses are stored. The order in
+  which responses appear is undefined. When doing synchronous requests, this
+  memory must be freed by the requester after the request completes.
+
+For asynchronous requests, a function pointer to the callback function must be
+provided instead. This callback will be called when the request has either
+timed out or received a response to all the messages that were sent.
+
+When the callback is called, the original request descriptor will be provided
+(so that it is possible to determine which sent message this callback is
+for), along with a response descriptor like the one described above.
+When doing asynchronous requests, there is no need to free the resulting
+``rte_mp_reply`` descriptor.
+
+Receiving and responding to messages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To receive a message, a named callback must be registered using the
+``rte_mp_action_register()`` function.
+The name of the callback must match the
+``name`` field in sender's ``rte_mp_msg`` message descriptor in order for this
+message to be delivered and for the callback to be triggered.
+
+The callback's type is ``rte_mp_t``; it takes the incoming message pointer
+``msg`` and an opaque pointer ``peer``. Contents of ``msg`` will be
+identical to the ones sent by the sender.
+
+If a response is required, a new ``rte_mp_msg`` message descriptor must be
+constructed and sent via the ``rte_mp_reply()`` function, along with the ``peer``
+pointer. The resulting response will then be delivered to the correct requester.
+
+Misc considerations
+~~~~~~~~~~~~~~~~~~~
+
+Due to the underlying IPC implementation being single-threaded, recursive
+requests (i.e. sending a request while responding to another request) are not
+supported. However, since sending messages (not requests) does not involve an
+IPC thread, sending messages while processing another message or request is
+supported.
+
+If callbacks spend a long time processing the incoming requests, the requester
+might time out, so setting the right timeout value on the requester side is
+imperative.
+
+If some of the messages timed out, ``nb_sent`` and ``nb_received`` fields in the
+``rte_mp_reply`` descriptor will not have matching values. This is not treated
+as an error by the IPC API, and it is expected that the user will be responsible
+for deciding how to handle such cases.
+
+If a callback has been registered, IPC will assume that it is safe to call it.
+This is important when registering callbacks during DPDK initialization.
+During initialization, IPC will consider the receiving side as non-existent if
+the callback has not been registered yet. However, once the callback has been
+registered, it is expected that it is safe for IPC to trigger it, even if the
+rest of the DPDK initialization hasn't finished yet.
-- 
2.7.4
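
For anyone trying out the "Sending messages" part of this patch, here is a rough, self-contained sketch of filling in a descriptor with the fields and limits the text lists (``name``, ``param`` up to 256 bytes, ``len_param``, ``fds`` up to 8, ``num_fds``). Everything prefixed ``sketch_`` or ``SKETCH_`` is a hypothetical stand-in so that the example builds without DPDK; it is not DPDK's actual definition, and the 64-byte name limit in particular is an assumption that the patch text does not state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins mirroring the fields and limits described in the
 * patch text. A real application would use DPDK's own struct rte_mp_msg.
 * The name length limit below is an assumption made for this sketch. */
#define SKETCH_MAX_NAME_LEN  64
#define SKETCH_MAX_PARAM_LEN 256
#define SKETCH_MAX_FD_NUM    8

struct sketch_mp_msg {
	char name[SKETCH_MAX_NAME_LEN];      /* must match receiver's callback name */
	int len_param;                       /* number of valid bytes in param */
	int num_fds;                         /* number of valid entries in fds */
	uint8_t param[SKETCH_MAX_PARAM_LEN]; /* message data */
	int fds[SKETCH_MAX_FD_NUM];          /* file descriptors to pass along */
};

/* Fill a descriptor, rejecting input that exceeds the documented limits.
 * Returns 0 on success, -1 if any argument is out of range. */
static int
sketch_mp_msg_fill(struct sketch_mp_msg *msg, const char *name,
		const void *param, size_t len_param,
		const int *fds, size_t num_fds)
{
	if (msg == NULL || name == NULL)
		return -1;
	if (strlen(name) >= SKETCH_MAX_NAME_LEN)
		return -1;
	if (len_param > SKETCH_MAX_PARAM_LEN || num_fds > SKETCH_MAX_FD_NUM)
		return -1;
	if ((len_param > 0 && param == NULL) || (num_fds > 0 && fds == NULL))
		return -1;

	/* Zero everything first so unused bytes are never sent uninitialized. */
	memset(msg, 0, sizeof(*msg));
	strcpy(msg->name, name); /* length was checked above */
	if (len_param > 0)
		memcpy(msg->param, param, len_param);
	msg->len_param = (int)len_param;
	if (num_fds > 0)
		memcpy(msg->fds, fds, num_fds * sizeof(fds[0]));
	msg->num_fds = (int)num_fds;
	return 0;
}
```

With the real API, the populated descriptor would then be handed to ``rte_mp_sendmsg()``, whose return value indicates whether the send succeeded, as the patch describes. Checking the limits before sending is a design choice of this sketch rather than something the patch requires.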