DPDK patches and discussions
* [dpdk-dev] Mbuf memory alignment constraints for (micro)architectures
@ 2019-10-30 18:02 Jerin Jacob Kollanukkaran
  2019-11-11 14:01 ` Gavin Hu (Arm Technology China)
  2019-11-13 23:08 ` David Christensen
  0 siblings, 2 replies; 4+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-10-30 18:02 UTC (permalink / raw)
  To: dev
  Cc: Olivier Matz, Andrew Rybchenko, David Christensen,
	bruce.richardson, konstantin.ananyev, hemant.agrawal,
	Shahaf Shuler, Honnappa Nagarahalli, Gavin Hu, viktorin,
	anatoly.burakov

CC:  Arch and platform maintainers

While reviewing the mempool object allocation requirements in the code,

A) I found that, in the default case, mempool objects are padded
in the object trailer so that the start addresses of objects are spread across the different channels.
This loads the DRAM channels equally and gives better performance.

# More documentation is here
https://doc.dpdk.org/guides/prog_guide/mempool_lib.html
in section 8.3. Memory Alignment Constraints

B) optimize_object_size() implements the channel distribution requirement
with the following formula:

        new_obj_size = (obj_size + RTE_MEMPOOL_ALIGN_MASK) / RTE_MEMPOOL_ALIGN;
        while (get_gcd(new_obj_size, nrank * nchan) != 1)
               new_obj_size++;
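
As a self-contained sketch of the formula above: the 64-byte alignment constant, the Euclid-based get_gcd() and the helper name padded_obj_size() are stand-ins assumed for illustration, not the DPDK definitions. The unit count is converted back to bytes by multiplying by the alignment.

```c
#include <stdio.h>

/* Assumed for illustration: one alignment unit is a 64-byte cache line. */
#define SKETCH_ALIGN      64u
#define SKETCH_ALIGN_MASK (SKETCH_ALIGN - 1u)

/* Stand-in greatest common divisor (Euclid's algorithm). */
static unsigned get_gcd(unsigned a, unsigned b)
{
	while (b != 0) {
		unsigned t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/*
 * Hypothetical helper: round obj_size up to whole alignment units, then
 * grow it until the unit count is coprime with nrank * nchan, so that
 * consecutive objects start on different channels/ranks.
 */
static unsigned padded_obj_size(unsigned obj_size, unsigned nchan,
				unsigned nrank)
{
	unsigned units = (obj_size + SKETCH_ALIGN_MASK) / SKETCH_ALIGN;

	while (get_gcd(units, nrank * nchan) != 1)
		units++;
	return units * SKETCH_ALIGN;
}
```

For example, with 4 channels and 1 rank, a 2048-byte object occupies 32 units; gcd(32, 4) is 4, so the size is bumped to 33 units (2112 bytes). With a single channel and a single rank, no padding is added.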


C) The formula in (B) is NOT generic. At least on the octeontx2 SoC,
the memory/DDR controller works differently:
# It XORs some of the physical address lines (not the user-space VA)
to compute a hash, and that hash selects the actual channel.

The XOR (CRC-like) scheme is useful because addresses are naturally distributed
across the channels, i.e. there is no need to waste memory on padding.
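
To illustrate the idea only: the sketch below selects a channel from the parity of masked physical address bits. The masks, the function names and the two-bit channel width are hypothetical and do not describe the actual octeontx2 hash.

```c
#include <stdint.h>

/* Parity (XOR of all bits) of a 64-bit value: 1 if an odd number are set. */
static unsigned parity64(uint64_t x)
{
	x ^= x >> 32;
	x ^= x >> 16;
	x ^= x >> 8;
	x ^= x >> 4;
	x ^= x >> 2;
	x ^= x >> 1;
	return (unsigned)(x & 1u);
}

/*
 * Hypothetical XOR hash: bit i of the channel number is the XOR of the
 * physical address bits selected by masks[i].
 */
static unsigned xor_hash_channel(uint64_t pa, const uint64_t *masks,
				 unsigned nbits)
{
	unsigned ch = 0;
	unsigned i;

	for (i = 0; i < nbits; i++)
		ch |= parity64(pa & masks[i]) << i;
	return ch;
}
```

With the made-up masks { 0x1040, 0x2080 }, the four consecutive 64-byte lines at 0x0, 0x40, 0x80 and 0xC0 map to channels 0, 1, 2 and 3, without any padding between objects.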

So, in short, the padding scheme is not needed on some SoCs. I am preparing a patch
to fix it. So the question is:

# Do PPC and other ARM SoCs use formula (B) to compute DRAM channel distribution, or
is it specific to x86? The answer will define where the hooks need to be added for a proper fix.


* Re: [dpdk-dev] Mbuf memory alignment constraints for (micro)architectures
  2019-10-30 18:02 [dpdk-dev] Mbuf memory alignment constraints for (micro)architectures Jerin Jacob Kollanukkaran
@ 2019-11-11 14:01 ` Gavin Hu (Arm Technology China)
  2019-11-12  2:36   ` Gavin Hu (Arm Technology China)
  2019-11-13 23:08 ` David Christensen
  1 sibling, 1 reply; 4+ messages in thread
From: Gavin Hu (Arm Technology China) @ 2019-11-11 14:01 UTC (permalink / raw)
  To: jerinj, dev
  Cc: Olivier Matz, Andrew Rybchenko, David Christensen,
	bruce.richardson, konstantin.ananyev, hemant.agrawal,
	Shahaf Shuler, Honnappa Nagarahalli, viktorin, anatoly.burakov,
	Steve Capper, Ola Liljedahl, nd

Hi Jerin,

> # Do PPC and other ARM SoCs use formula (B) to compute DRAM channel
> distribution, or is it specific to x86? The answer will define where the
> hooks need to be added for a proper fix.
After reading through some documents for both x86 and Arm, and some internal discussion,
it looks like this is specific to x86. x86 spreads adjacent virtual addresses within a page across multiple memory devices,
with interleaving done per one or two cache lines. https://software.intel.com/en-us/articles/how-memory-is-accessed

Arm leaves this flexibility to implementations; there is no fixed interleaving pattern, so it can hardly be generalized.
/Gavin

* Re: [dpdk-dev] Mbuf memory alignment constraints for (micro)architectures
  2019-11-11 14:01 ` Gavin Hu (Arm Technology China)
@ 2019-11-12  2:36   ` Gavin Hu (Arm Technology China)
  0 siblings, 0 replies; 4+ messages in thread
From: Gavin Hu (Arm Technology China) @ 2019-11-12  2:36 UTC (permalink / raw)
  To: Gavin Hu (Arm Technology China), jerinj, dev
  Cc: Olivier Matz, Andrew Rybchenko, David Christensen,
	bruce.richardson, konstantin.ananyev, hemant.agrawal,
	Shahaf Shuler, Honnappa Nagarahalli, viktorin, anatoly.burakov,
	Steve Capper, Ola Liljedahl, nd, nd

Hi Jerin,

> After reading through some documents for both x86 and Arm, and some internal
> discussion, it looks like this is specific to x86. x86 spreads adjacent virtual
> addresses within a page across multiple memory devices, with interleaving done
> per one or two cache lines.
> https://software.intel.com/en-us/articles/how-memory-is-accessed
>
> Arm leaves this flexibility to implementations; there is no fixed interleaving
> pattern, so it can hardly be generalized.
Same conclusion, but with more detail on this topic (from Arm internally):
"Interleaving (or striping) happens at the interconnect/memory controller level, so on Arm-based systems it's going to be highly dependent on the given SoC's integration and probably the system configuration too. Arm's own interconnect and DMC IPs generally offer various options to support striping, but even then it's the integrator's choice how to use them, and obviously there are multitudes of alternative third-party IPs too.
In summary, this really depends on the system's interconnect and memory controller capabilities and how it has been configured."
/Gavin

* Re: [dpdk-dev] Mbuf memory alignment constraints for (micro)architectures
  2019-10-30 18:02 [dpdk-dev] Mbuf memory alignment constraints for (micro)architectures Jerin Jacob Kollanukkaran
  2019-11-11 14:01 ` Gavin Hu (Arm Technology China)
@ 2019-11-13 23:08 ` David Christensen
  1 sibling, 0 replies; 4+ messages in thread
From: David Christensen @ 2019-11-13 23:08 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran, dev
  Cc: Olivier Matz, Andrew Rybchenko, bruce.richardson,
	konstantin.ananyev, hemant.agrawal, Shahaf Shuler,
	Honnappa Nagarahalli, Gavin Hu, viktorin, anatoly.burakov

> # Do PPC and other ARM SoCs use formula (B) to compute DRAM channel distribution, or
> is it specific to x86? The answer will define where the hooks need to be added for a proper fix.
The POWER9 chip has eight memory channels, each with a dedicated memory 
controller unit (MCU).  The MCUs can be configured into one or more 
address interleave groups (with 1, 2, 3, 4, 6, or 8 MCUs), with a 
programmable interleave granularity of 128B to 32KB.  I am trying to find 
more information on how to access this configuration data and expose it to 
DPDK.
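
As a rough model of what such a configuration implies, assuming a plain modulo interleave (an assumption for illustration; the real address-to-MCU mapping may differ), the MCU serving a physical address could be computed as below. The function name and parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical modulo interleave: consecutive blocks of `granularity`
 * bytes rotate across the `n_mcus` controllers of one interleave group.
 */
static unsigned interleave_channel(uint64_t pa, uint64_t granularity,
				   unsigned n_mcus)
{
	return (unsigned)((pa / granularity) % n_mcus);
}
```

Under this model, with 8 MCUs and a 128B granularity, addresses 0 and 128 land on MCUs 0 and 1, and address 1024 wraps back to MCU 0.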

Dave
