DPDK patches and discussions
From: Dariusz Sosnowski <dsosnowski@nvidia.com>
To: Yang Ming <ming.1.yang@nokia-sbell.com>,
	Bruce Richardson <bruce.richardson@intel.com>,
	Stephen Hemminger <stephen@networkplumber.org>
Cc: Slava Ovsiienko <viacheslavo@nvidia.com>,
	Bing Zhao <bingz@nvidia.com>, Ori Kam <orika@nvidia.com>,
	Suanming Mou <suanmingm@nvidia.com>,
	Matan Azrad <matan@nvidia.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: RE: [PATCH 1/2] net/mlx5: improve socket file path
Date: Fri, 14 Mar 2025 11:48:51 +0000
Message-ID: <CH3PR12MB8460C2AFAD3227CD704A617FA4D22@CH3PR12MB8460.namprd12.prod.outlook.com>
In-Reply-To: <47f4086f-b1eb-4d18-a433-e53593afceb4@nokia-sbell.com>

Hi,

> From: Yang Ming <ming.1.yang@nokia-sbell.com> 
> Sent: Wednesday, March 12, 2025 3:56 AM
> To: Bruce Richardson <bruce.richardson@intel.com>; Stephen Hemminger <stephen@networkplumber.org>
> Cc: Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad <matan@nvidia.com>; dev@dpdk.org
> Subject: Re: [PATCH 1/2] net/mlx5: improve socket file path
> 
> On 2025/1/3 10:51, Ming 1. Yang (NSB) wrote:
> 
> On 2024/12/14 01:16, Bruce Richardson wrote: 
> On Fri, Dec 13, 2024 at 09:12:39AM -0800, Stephen Hemminger wrote:
> On Fri, 13 Dec 2024 17:24:42 +0800
> Yang Ming <ming.1.yang@nokia-sbell.com> wrote:
> 
> 1. /var/tmp is hard-coded, which is not good style.
> 2. /var/tmp may not be writable when the container runs in
> read-only mode.
> 
> Signed-off-by: Yang Ming <ming.1.yang@nokia-sbell.com>
> 
> Since this is a unix domain socket, why not use abstract socket
> that doesn't have to be associated with filesystem?
> 
> In general, I think we should avoid abstract sockets in DPDK. Primary
> reason is that they are linux-specific. Last time I checked other unixes,
> like BSD, don't support them. A secondary concern is that having a
> filesystem path allows permission checks, so for e.g. telemetry sockets,
> only users with appropriate permissions can connect. With an abstract socket
> we'd have to open up the area of user authentication.
> 
> /Bruce
> 
> Hi Stephen & Bruce,
> I'm not sure whether an abstract socket is a good idea. Maybe it can be improved further or step by step. But we don't need to discuss it for this commit.
> We made this improvement because "/var/tmp" and "/var/log" can't be written to in a container's read-only mode unless we add /var/ specifically for the DPDK application in the container's settings. But nearly all DPDK modules already use the common runtime path returned from `rte_eal_get_runtime_dir()`. Why not apply this common path for the Mellanox NIC as well?
> 
> 
> 
> Hi Stephen,
> 
> I'm not entirely sure whether using an abstract socket is the best approach. It might be possible to improve it further or incrementally. However, we don't need to discuss this for the current commit.
> We made this improvement because the directories "/var/tmp" and "/var/log" cannot be written to in a container with read-only mode, unless we specifically configure the /var/ directory for the DPDK application in the container's settings. Nearly all DPDK modules already use the common runtime path returned by rte_eal_get_runtime_dir(). Therefore, it makes sense to apply this common path for the Mellanox NIC as well.
> Actually, the objective of this patch series is to prevent the DPDK Mellanox driver from crashing when attempting to access the read-only directories "/var/" in a container.
> 
> Brs,
> Yang Ming

Let me provide the context for the functionality in question here.

The mlx5 PMD can dump HW flow rules using the mlx_steering_dump tool
(documented in https://doc.dpdk.org/guides/nics/mlx5.html#how-to-dump-flows and in https://github.com/Mellanox/mlx_steering_dump/tree/master/hws).
Dumping itself is supported only on Linux.
There are two ways to use the tool:

1. The application calls rte_flow_dev_dump(), providing a FILE* handle.
  This saves the flow rules metadata (e.g., IDs of HW objects) to the file.
  The mlx_steering_dump tool is then used to parse it and dump the HW rules.
2. mlx_steering_dump communicates with the mlx5 PMD through a Unix socket and shares the file descriptor of the output metadata file with the DPDK process through the SCM_RIGHTS mechanism.
  The mlx5 PMD internally calls rte_flow_dev_dump(), passing the provided file.
  After dumping is done, the tool parses the metadata file and extracts the rules from the HW.
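
To illustrate the descriptor passing in the second option, here is a minimal sketch in Python using a socket pair inside one process. It is not the actual mlx_steering_dump or mlx5 PMD code: the function names, the b"dump" request bytes, the file name and the payload are all hypothetical, and the "PMD side" is only a stand-in that writes a fixed string where the real driver would call rte_flow_dev_dump().

```python
import array
import os
import socket
import tempfile


def send_dump_fd(sock, path):
    """Tool side: open the output file and pass its descriptor to the peer."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # The descriptor travels as SCM_RIGHTS ancillary data.
        # b"dump" is a placeholder request, not the real wire format.
        ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
        sock.sendmsg([b"dump"], ancillary)
    finally:
        os.close(fd)  # the receiver gets its own duplicate of the descriptor


def recv_dump_fd(sock):
    """PMD side (stand-in): receive one descriptor from the peer."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(16, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    return fds[0]


def demo():
    """Pass a descriptor across a socket pair and write through it."""
    tool, pmd = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tool, pmd, tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "flows.dump")  # hypothetical file name
        send_dump_fd(tool, path)
        # The receiver never sees the path, only the open descriptor.
        with os.fdopen(recv_dump_fd(pmd), "wb") as out:
            out.write(b"hypothetical flow metadata\n")
        with open(path, "rb") as dumped:
            return dumped.read()


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the side opening the file decides where it lives and who owns it, while the receiving process only needs the socket.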

In practice, the second option is used more often because it is convenient: it does not require modifying application code.
It also has the benefit that the tool itself owns the output dump file. For example, the tool can be called from outside the container, and the debug dump will be generated in the host's context rather than the container's (assuming the socket path is mounted on the host).
This is the case, for example, when containers use VFs of mlx5 NICs.

Changing the filesystem path of the Unix socket would require updating all of the tooling: the runtime directory discovery logic would have to be duplicated there (as is done in dpdk-telemetry.py).
Until that is done, this change would break existing users.
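
For reference, the discovery logic each tool would need to carry looks roughly like the sketch below. It is written from my recollection of how dpdk-telemetry.py derives the runtime directory, so the environment variables, the defaults and the "rte" file prefix are assumptions to verify against the script and EAL. The function name and the env/uid parameters are hypothetical and exist only to make the sketch testable.

```python
import os


def dpdk_runtime_dir(file_prefix="rte", env=None, uid=None):
    """Guess the EAL runtime directory for a given file prefix.

    Assumed lookup order: RUNTIME_DIRECTORY, then /var/run for root,
    then XDG_RUNTIME_DIR, then /tmp.
    """
    env = os.environ if env is None else env
    uid = os.getuid() if uid is None else uid
    base = env.get("RUNTIME_DIRECTORY")
    if not base:
        if uid == 0:
            base = "/var/run"
        else:
            base = env.get("XDG_RUNTIME_DIR", "/tmp")
    return os.path.join(base, "dpdk", file_prefix)


if __name__ == "__main__":
    print(dpdk_runtime_dir())
```

Note that the result depends on the user and the environment of the DPDK process, so a tool running on the host would need the same inputs as the process inside the container to arrive at the same path.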

What do you think about a fallback mechanism here? If "/var/tmp" is read-only or creating the socket at this path fails, the socket would be created in the EAL runtime directory instead.
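
The idea can be sketched as follows. This is only an illustration of the control flow in Python, not a proposed driver patch (the PMD itself is C); the function name, the socket file name and the directories used in the demo are hypothetical.

```python
import os
import socket
import tempfile


def bind_with_fallback(name, directories):
    """Bind a Unix socket in the first directory where bind() succeeds."""
    last_error = None
    for directory in directories:
        path = os.path.join(directory, name)
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.bind(path)
            return sock, path
        except OSError as err:
            # Read-only filesystem, missing directory, permissions, ...
            sock.close()
            last_error = err
    raise last_error or OSError("no candidate directories given")


def demo():
    """Return True if the socket lands in the fallback directory."""
    with tempfile.TemporaryDirectory() as fallback:
        # A directory that is never created stands in for an unwritable /var/tmp.
        unusable = os.path.join(fallback, "unwritable-stand-in")
        sock, path = bind_with_fallback("dpdk_net_mlx5_demo", [unusable, fallback])
        sock.close()
        return os.path.dirname(path) == fallback


if __name__ == "__main__":
    print(demo())
```

With this ordering, existing setups where "/var/tmp" is writable keep the current path, and only the read-only case moves to the second location.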

Best regards,
Dariusz Sosnowski



Thread overview: 11+ messages
2024-12-13  9:24 Yang Ming
2024-12-13  9:24 ` [PATCH 2/2] net/mlx5: improve log " Yang Ming
2025-03-04  6:23   ` Bing Zhao
2025-03-05  3:20     ` Yang Ming
2025-03-10 14:59     ` Stephen Hemminger
2025-03-12  2:32       ` [External] " Yang Ming
2024-12-13 17:12 ` [PATCH 1/2] net/mlx5: improve socket " Stephen Hemminger
2024-12-13 17:16   ` Bruce Richardson
2025-01-03  2:51     ` Yang Ming
2025-03-12  2:55       ` Yang Ming
2025-03-14 11:48         ` Dariusz Sosnowski [this message]
