DPDK patches and discussions
From: Thomas Monjalon <thomas@monjalon.net>
To: David Christensen <drc@linux.vnet.ibm.com>
Cc: david.marchand@redhat.com, dev@dpdk.org, "Burakov, Anatoly" <anatoly.burakov@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 1/1] vfio: modify spapr iommu support to use static window sizing
Date: Wed, 07 Oct 2020 14:49:30 +0200
Message-ID: <1611210.Cl7YQ8O76l@thomas>
In-Reply-To: <2c830988-c4db-7bdc-50f3-3fa445a81673@intel.com>

Hi David,
Do you plan to send a v4?

17/09/2020 13:13, Burakov, Anatoly:
> On 10-Aug-20 10:07 PM, David Christensen wrote:
> > The SPAPR IOMMU requires that a DMA window size be defined before memory
> > can be mapped for DMA. Current code dynamically modifies the DMA window
> > size in response to every new memory allocation, which is potentially
> > dangerous because all existing mappings need to be unmapped/remapped in
> > order to resize the DMA window, leaving hardware holding IOVA addresses
> > that are temporarily unmapped.  The new SPAPR code statically assigns
> > the DMA window size on first use, using the largest physical memory
> > address when IOVA=PA and the highest existing memseg virtual
> > address when IOVA=VA.
> > 
> > Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
> > ---
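
To make the sizing rule above concrete, here is a minimal sketch of the arithmetic the patch performs further down: round the highest address up to a power of two, then derive the DMA mask width from that length. The helper names `align64pow2`, `dma_window_len` and `dma_window_bits` are hypothetical stand-ins so the example compiles without DPDK; the patch itself calls `rte_align64pow2()` and `__builtin_ctzll()` (a GCC/Clang builtin).

```c
#include <stdint.h>

/* Hypothetical stand-in for rte_align64pow2(): round v up to a power of two. */
static uint64_t
align64pow2(uint64_t v)
{
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	v |= v >> 32;
	return v + 1;
}

/* Window length that covers every address up to and including max_addr. */
static uint64_t
dma_window_len(uint64_t max_addr)
{
	return align64pow2(max_addr + 1);
}

/* Number of address bits spanned by a power-of-two window length. */
static unsigned int
dma_window_bits(uint64_t win_len)
{
	return (unsigned int)__builtin_ctzll(win_len);
}
```

With the /proc/iomem example quoted later in the patch, a highest "System RAM" address of 0x201fffffffff yields a window length of 0x400000000000, which corresponds to a 46-bit DMA mask. In the IOVA=VA branch the patch passes the memseg end address (base plus length) directly, so no `+ 1` is applied there.
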
> 
> <snip>
> 
> > +struct spapr_size_walk_param {
> > +	uint64_t max_va;
> > +	uint64_t page_sz;
> > +	int external;
> > +};
> > +
> > +/*
> > + * In order to set the DMA window size required for the SPAPR IOMMU
> > + * we need to walk the existing virtual memory allocations as well as
> > + * find the hugepage size used.
> > + */
> >   static int
> > -vfio_spapr_unmap_walk(const struct rte_memseg_list *msl,
> > -		const struct rte_memseg *ms, void *arg)
> > +vfio_spapr_size_walk(const struct rte_memseg_list *msl, void *arg)
> >   {
> > -	int *vfio_container_fd = arg;
> > +	struct spapr_size_walk_param *param = arg;
> > +	uint64_t max = (uint64_t) msl->base_va + (uint64_t) msl->len;
> >   
> > -	/* skip external memory that isn't a heap */
> > -	if (msl->external && !msl->heap)
> > -		return 0;
> > +	if (msl->external) {
> > +		param->external++;
> > +		if (!msl->heap)
> > +			return 0;
> > +	}
> 
> It would be nice to have some comments in the code explaining what we're
> skipping and why.
>
> Also, it seems that you're using param->external as a bool? This is a
> non-public API, so using stdbool is not an issue here; perhaps replace it
> with bool param->has_external?
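
A minimal sketch of how the walk callback might look with that suggestion applied, assuming a trimmed, hypothetical `struct msl_view` in place of `struct rte_memseg_list` so the example builds without DPDK. The function name `size_walk` and the comment wording are illustrative, not taken from any later revision of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-in for struct rte_memseg_list. */
struct msl_view {
	void *base_va;
	uint64_t len;
	uint64_t page_sz;
	unsigned int external;
	unsigned int heap;
};

struct spapr_size_walk_param {
	uint64_t max_va;
	uint64_t page_sz;
	bool has_external;	/* was: int external, used as a counter */
};

static int
size_walk(const struct msl_view *msl, void *arg)
{
	struct spapr_size_walk_param *param = arg;
	uint64_t max = (uint64_t)(uintptr_t)msl->base_va + msl->len;

	if (msl->external) {
		/* remember that external memory exists so the caller can warn */
		param->has_external = true;
		/* external memory that is not a heap is skipped, as in the patch */
		if (!msl->heap)
			return 0;
	}

	/* track the highest end address and the page size of that list */
	if (max > param->max_va) {
		param->page_sz = msl->page_sz;
		param->max_va = max;
	}

	return 0;
}
```

The caller would then test `param.has_external` directly instead of comparing a counter against zero.
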
> 
> >   
> > -	/* skip any segments with invalid IOVA addresses */
> > -	if (ms->iova == RTE_BAD_IOVA)
> > -		return 0;
> > +	if (max > param->max_va) {
> > +		param->page_sz = msl->page_sz;
> > +		param->max_va = max;
> > +	}
> >   
> > -	return vfio_spapr_dma_do_map(*vfio_container_fd, ms->addr_64, ms->iova,
> > -			ms->len, 0);
> > +	return 0;
> >   }
> >   
> > -struct spapr_walk_param {
> > -	uint64_t window_size;
> > -	uint64_t hugepage_sz;
> > -};
> > -
> > +/*
> > + * The SPAPRv2 IOMMU supports 2 DMA windows with starting
> > + * address at 0 or 1<<59.  By default, a DMA window is set
> > + * at address 0, 2GB long, with a 4KB page.  For DPDK we
> > + * must remove the default window and setup a new DMA window
> > + * based on the hugepage size and memory requirements of
> > + * the application before we can map memory for DMA.
> > + */
> >   static int
> > -vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
> > -		const struct rte_memseg *ms, void *arg)
> > +spapr_dma_win_size(void)
> >   {
> > -	struct spapr_walk_param *param = arg;
> > -	uint64_t max = ms->iova + ms->len;
> > +	struct spapr_size_walk_param param;
> >   
> > -	/* skip external memory that isn't a heap */
> > -	if (msl->external && !msl->heap)
> > +	/* only create DMA window once */
> > +	if (spapr_dma_win_len > 0)
> >   		return 0;
> >   
> > -	/* skip any segments with invalid IOVA addresses */
> > -	if (ms->iova == RTE_BAD_IOVA)
> > -		return 0;
> > +	/* walk the memseg list to find the page size/max VA address */
> > +	memset(&param, 0, sizeof(param));
> > +	if (rte_memseg_list_walk(vfio_spapr_size_walk, &param) < 0) {
> > +		RTE_LOG(ERR, EAL, "Failed to walk memseg list for DMA "
> > +			"window size\n");
> > +		return -1;
> > +	}
> > +
> > +	/* We can't be sure if DMA window covers external memory */
> > +	if (param.external > 0)
> > +		RTE_LOG(WARNING, EAL, "Detected external memory which may "
> > +			"not be managed by the IOMMU\n");
> > +
> > +	/* find the maximum IOVA address for setting the DMA window size */
> > +	if (rte_eal_iova_mode() == RTE_IOVA_PA) {
> > +		static const char proc_iomem[] = "/proc/iomem";
> > +		static const char str_sysram[] = "System RAM";
> > +		uint64_t start, end, max = 0;
> > +		char *line = NULL;
> > +		char *dash, *space;
> > +		size_t line_len;
> > +
> > +		/*
> > +		 * Example "System RAM" in /proc/iomem:
> > +		 * 00000000-1fffffffff : System RAM
> > +		 * 200000000000-201fffffffff : System RAM
> > +		 */
> > +		FILE *fd = fopen(proc_iomem, "r");
> > +		if (fd == NULL) {
> > +			RTE_LOG(ERR, EAL, "Cannot open %s\n", proc_iomem);
> > +			return -1;
> > +		}
> > +		/* Scan /proc/iomem for the highest PA in the system */
> > +		while (getline(&line, &line_len, fd) != -1) {
> > +			if (strstr(line, str_sysram) == NULL)
> > +				continue;
> > +
> > +			space = strstr(line, " ");
> > +			dash = strstr(line, "-");
> > +
> > +			/* Validate the format of the memory string */
> > +			if (space == NULL || dash == NULL || space < dash) {
> > +				RTE_LOG(ERR, EAL, "Can't parse line \"%s\" in "
> > +					"file %s\n", line, proc_iomem);
> > +				continue;
> > +			}
> > +
> > +			start = strtoull(line, NULL, 16);
> > +			end   = strtoull(dash + 1, NULL, 16);
> > +			RTE_LOG(DEBUG, EAL, "Found system RAM from 0x%"
> > +				PRIx64 " to 0x%" PRIx64 "\n", start, end);
> > +			if (end > max)
> > +				max = end;
> > +		}
> > +		free(line);
> > +		fclose(fd);
> 
> I would've put all of this file reading business into a separate 
> function, as otherwise it's a bit hard to follow the mix of file ops and 
> using the results. Something like
> 
> value = get_value_from_iomem();
> if (value > ...)
> ...
> 
> is much easier on the eyes :)
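
One way the file handling could be pulled out, sketched under a few assumptions: the helper name `iomem_max_sysram_end` and its `FILE *` parameter are hypothetical, and `fgets()` with a fixed buffer replaces the patch's `getline()` purely to keep the example portable.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (name and signature are not from the patch):
 * return the highest end address of any "System RAM" range read from f,
 * or 0 if no such range is found.
 */
static uint64_t
iomem_max_sysram_end(FILE *f)
{
	char line[256];
	uint64_t max = 0;

	while (fgets(line, sizeof(line), f) != NULL) {
		const char *dash;
		char *endp;
		uint64_t end;

		if (strstr(line, "System RAM") == NULL)
			continue;

		/* expected format: "<start>-<end> : System RAM" */
		dash = strchr(line, '-');
		if (dash == NULL)
			continue;

		end = strtoull(dash + 1, &endp, 16);
		if (endp == dash + 1)
			continue;	/* no hex digits after the dash */

		if (end > max)
			max = end;
	}

	return max;
}
```

With a helper like this, the IOVA=PA branch would reduce to opening /proc/iomem, calling the helper, treating a return value of 0 as the "no valid System RAM entry" error, and otherwise passing `max + 1` to `rte_align64pow2()`.
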
> 
> >   
> > -	if (max > param->window_size) {
> > -		param->hugepage_sz = ms->hugepage_sz;
> > -		param->window_size = max;
> > +		if (max == 0) {
> > +			RTE_LOG(ERR, EAL, "Failed to find valid \"System RAM\" "
> > +				"entry in file %s\n", proc_iomem);
> > +			return -1;
> > +		}
> > +
> > +		spapr_dma_win_len = rte_align64pow2(max + 1);
> > +		RTE_LOG(DEBUG, EAL, "Setting DMA window size to 0x%"
> > +			PRIx64 "\n", spapr_dma_win_len);
> > +	} else if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> > +		RTE_LOG(DEBUG, EAL, "Highest VA address in memseg list is 0x%"
> > +			PRIx64 "\n", param.max_va);
> > +		spapr_dma_win_len = rte_align64pow2(param.max_va);
> > +		RTE_LOG(DEBUG, EAL, "Setting DMA window size to 0x%"
> > +			PRIx64 "\n", spapr_dma_win_len);
> > +	} else {
> > +		RTE_LOG(ERR, EAL, "Unsupported IOVA mode\n");
> > +		return -1;
> >   	}
> >   
> > +	spapr_dma_win_page_sz = param.page_sz;
> > +	rte_mem_set_dma_mask(__builtin_ctzll(spapr_dma_win_len));
> >   	return 0;
> >   }
> >   
> >   static int
> > -vfio_spapr_create_new_dma_window(int vfio_container_fd,
> > -		struct vfio_iommu_spapr_tce_create *create) {
> > +vfio_spapr_create_dma_window(int vfio_container_fd)
> > +{
> > +	struct vfio_iommu_spapr_tce_create create = {
> > +		.argsz = sizeof(create), };
> >   	struct vfio_iommu_spapr_tce_remove remove = {
> > -		.argsz = sizeof(remove),
> > -	};
> > +		.argsz = sizeof(remove), };
> >   	struct vfio_iommu_spapr_tce_info info = {
> > -		.argsz = sizeof(info),
> > -	};
> > +		.argsz = sizeof(info), };
> >   	int ret;
> >   
> > -	/* query spapr iommu info */
> > +	ret = spapr_dma_win_size();
> > +	if (ret < 0)
> > +		return ret;
> > +
> >   	ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> >   	if (ret) {
> > -		RTE_LOG(ERR, EAL, "  cannot get iommu info, "
> > -				"error %i (%s)\n", errno, strerror(errno));
> 
> Here and in other similar places, no need to split strings into multiline.
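
For illustration, a small sketch of the same message with its format string kept on a single line. It uses plain `snprintf()` in a hypothetical helper, `format_iommu_info_error`, rather than `RTE_LOG`, so that it compiles without DPDK.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the error message into buf and return
 * snprintf()'s result. The format string stays on one line even though
 * the source line becomes long.
 */
static int
format_iommu_info_error(char *buf, size_t len, int err)
{
	return snprintf(buf, len, "  cannot get iommu info, error %i (%s)\n", err, strerror(err));
}
```
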
> 
> Overall, since these changes are confined to PPC64 I can't really test
> them, but with the above changes:
> 
> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> 





