dpdk-setup has been used for a long time to compile and configure DPDK and to run some basic applications. dpdk-setup uses the make build system to compile, which is now deprecated. In addition, it has been discussed on the mailing list a few times that the dpdk-setup UI is quite old and needs improvement, along with the addition of some other facilities. This has created a need for a Python curses based script that would provide functionality similar to dpdk-setup, but with more options and a better UI. The idea is similar to the kernel's make menuconfig. Python curses was selected because it comes as a standard library with Python. The script will use the meson build system for compilation.

Here is a link containing the suggested UI:
https://drive.google.com/file/d/18ngGpO_e-8FYNKjkKqS1IKQSrDDcXSO6/view?usp=sharing

The following options will be present in the Menu and Sub-Menu:

Compile
    Compile with gcc
    Compile with icc
    Compile with clang
    Compile examples
Cross compile
    arm64_armada_linux_gcc
    arm64_armv8_linux_gcc
    arm64_bluefield_linux_gcc
    arm64_dpaa_linux_gcc
    arm64_emag_linux_gcc
    arm64_n1sdp_linux_gcc
    arm64_octeontx2_linux_gcc
    arm64_stingray_linux_gcc
    arm64_thunderx2_linux_gcc
    arm64_thunderx_linux_gcc
    ppc64le-power8-linux-gcc
    cross-mingw
Hugepages
    Setup hugepages for non-NUMA
    Setup hugepages for NUMA
    Remove hugepage mappings
Insert module
    Setup VFIO permissions
    VFIO
    KNI
    IGB UIO
Remove module
    VFIO
    KNI
    IGB UIO
Bind and Unbind devices
    Bind device to IGB UIO
    Bind device to VFIO
    Unbind devices from IGB UIO or VFIO driver
Display
    Hugepages info
    Current device settings
Run Applications
    Test application → prompt user to enter flags and possibly build directory name. Also give default options for flags
    Testpmd application → prompt user to enter flags and possibly build directory name. Also give default options for flags

In addition to this, the user will be able to provide any additional flags for compilation if he/she wishes to.
On Tue, 18 Aug 2020 17:39:19 +0500
Sarosh Arif <sarosh.arif@emumba.com> wrote:
> dpdk-setup has been used for a long time in order to compile and
> configure dpdk along with running some basic applications. dpdk-setup
> uses the make build system to compile which is now deprecated. In addition
> to this it has been discussed on the mailing list a few times that
> dpdk-setup UI is quite old and it needs improvement along with
> addition of some other facilities. This had created a need for python
> curses based script that would provide similar functionality as
> dpdk-setup but with more options and better UI. The idea is almost similar
> to kernel's make menuconfig. The reason to select python curses is that it
> comes as a standard library with python. The script will use the meson build
> system for compilation.
>
> Here is a link containing suggested UI:
> https://drive.google.com/file/d/18ngGpO_e-8FYNKjkKqS1IKQSrDDcXSO6/view?usp=sharing
>
> The following options will be present in the Menu and Sub-Menu:
>
> Compile
> Compile with gcc
> Compile with icc
> Compile with clang
> Compile examples
> Cross compile
> arm64_armada_linux_gcc
> arm64_armv8_linux_gcc
> arm64_bluefield_linux_gcc
> arm64_dpaa_linux_gcc
> arm64_emag_linux_gcc
> arm64_n1sdp_linux_gcc
> arm64_octeontx2_linux_gcc
> arm64_stingray_linux_gcc
> arm64_thunderx2_linux_gcc
> Arm64_thunderx_linux_gcc
> Ppc64le-power8-linux-gcc
> cross-mingw
> Hugepages
> Setup hugepage for non-NUMA
> Setup hugepages for NUMA
> Remove hugepage mappings
> Insert module
> Setup VFIO permissions
> VFIO
> KNI
> IBG UIO
> Remove module
> VFIO
> KNI
> IBG UIO
> Bind and Unbind devices
> Bind device to IGB UIO
> Bind device to VFIO
> Unbind devices from IGB UIO or VFIO driver
> Display
> Hugepages info
> Current device settings
> Run Applications
> Test application → prompt user to enter flags and possibly build
> directory name. Also give a default options for flags
> Testpmd application → prompt user to enter flags and possibly
> build directory name. Also give a default options for flags
>
> In addition to this, the user will have the facility to provide any
> additional flags for compilation if he/she wishes to.
I would prefer a set of scripts that each do one thing.
Having a GUI is a lot of overhead to support.
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Sarosh Arif
> Sent: Tuesday, August 18, 2020 2:39 PM
>
> dpdk-setup has been used for a long time in order to compile and
> configure dpdk along with running some basic applications. dpdk-setup
> uses the make build system to compile which is now deprecated. In addition
> to this it has been discussed on the mailing list a few times that
> dpdk-setup UI is quite old and it needs improvement along with
> addition of some other facilities. This had created a need for python
> curses based script that would provide similar functionality as
> dpdk-setup but with more options and better UI. The idea is almost similar
> to kernel's make menuconfig. The reason to select python curses is that it
> comes as a standard library with python. The script will use the meson build
> system for compilation.
>
Looks good.
I often use the kernel's make menuconfig to explore configuration options and review how they affect the resulting configuration file. Something similar in DPDK would be nice.
But we must have a command line based build system, so our entire project can be built from scratch without any human intervention after checking out the project from our repository.
And there must be a 1:1 relationship between the options offered by the GUI and the command line versions of the configuration tool.
BTW: We cross compile, even though our target is also x86_64. This ensures that the distributed binary firmware is built by the cross compiler kept in the project repository together with the source code of the firmware. This way, the resulting firmware is the same, regardless of which host we built it on.
Med venlig hilsen / kind regards
- Morten Brørup
18/08/2020 19:09, Stephen Hemminger:
> On Tue, 18 Aug 2020 17:39:19 +0500
> Sarosh Arif <sarosh.arif@emumba.com> wrote:
>
> > dpdk-setup has been used for a long time in order to compile and
> > configure dpdk along with running some basic applications. dpdk-setup
> > uses the make build system to compile which is now deprecated. In addition
> > to this it has been discussed on the mailing list a few times that
> > dpdk-setup UI is quite old and it needs improvement along with
> > addition of some other facilities. This had created a need for python
> > curses based script that would provide similar functionality as
> > dpdk-setup but with more options and better UI. The idea is almost similar
> > to kernel's make menuconfig. The reason to select python curses is that it
> > comes as a standard library with python. The script will use the meson build
> > system for compilation.
> >
> > Here is a link containing suggested UI:
> > https://drive.google.com/file/d/18ngGpO_e-8FYNKjkKqS1IKQSrDDcXSO6/view?usp=sharing
> >
> > The following options will be present in the Menu and Sub-Menu:
> >
> > Compile
> > Compile with gcc
> > Compile with icc
> > Compile with clang
> > Compile examples
> > Cross compile
> > arm64_armada_linux_gcc
> > arm64_armv8_linux_gcc
> > arm64_bluefield_linux_gcc
> > arm64_dpaa_linux_gcc
> > arm64_emag_linux_gcc
> > arm64_n1sdp_linux_gcc
> > arm64_octeontx2_linux_gcc
> > arm64_stingray_linux_gcc
> > arm64_thunderx2_linux_gcc
> > Arm64_thunderx_linux_gcc
> > Ppc64le-power8-linux-gcc
> > cross-mingw
> > Hugepages
> > Setup hugepage for non-NUMA
> > Setup hugepages for NUMA
> > Remove hugepage mappings
> > Insert module
> > Setup VFIO permissions
> > VFIO
> > KNI
> > IBG UIO
> > Remove module
> > VFIO
> > KNI
> > IBG UIO
> > Bind and Unbind devices
> > Bind device to IGB UIO
> > Bind device to VFIO
> > Unbind devices from IGB UIO or VFIO driver
> > Display
> > Hugepages info
> > Current device settings
> > Run Applications
> > Test application → prompt user to enter flags and possibly build
> > directory name. Also give a default options for flags
> > Testpmd application → prompt user to enter flags and possibly
> > build directory name. Also give a default options for flags
> >
> > In addition to this, the user will have the facility to provide any
> > additional flags for compilation if he/she wishes to.
>
> I would prefer a set of scripts that each do one thing.
> Having a GUI is a lot of overhead to support.
Me too, I prefer simple scripts.
And I prefer documenting simple tasks even more.
We can extract the lines for hugepages settings in a standalone script.
Perhaps doing the same for VFIO setup.
Not sure about the rest.
This is an improved version of the setup of huge pages
bases on earlier DPDK setup. Differences are:
 * it autodetects NUMA vs non NUMA
 * it allows setting different page sizes
   recent kernels support multiple sizes.
 * it accepts a parameter in bytes (not pages).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
This is lightly tested, it still needs testing on multiple architectures
etc.

 usertools/hugepage-setup.sh | 169 ++++++++++++++++++++++++++++++++++++
 1 file changed, 169 insertions(+)
 create mode 100755 usertools/hugepage-setup.sh

diff --git a/usertools/hugepage-setup.sh b/usertools/hugepage-setup.sh
new file mode 100755
index 000000000000..df132e2f8d64
--- /dev/null
+++ b/usertools/hugepage-setup.sh
@@ -0,0 +1,169 @@
+#! /bin/bash
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2010-2014 Intel Corporation
+#
+
+usage()
+{
+	echo "Usage: $0 size [pagesize]"
+	echo "  size is in bytes with optional M or G suffix"
+	echo "  pagesize is the pagesize to use"
+	exit 1
+}
+
+get_pagesize()
+{
+	SIZE="$1"
+
+	if [[ "$SIZE" =~ ^[0-9]+G$ ]]; then
+		echo $((${SIZE%%G} * 1024 * 1024))
+	elif [[ "$SIZE" =~ ^[0-9]+M$ ]]; then
+		echo $((${SIZE%%M} * 1024))
+	elif [[ "$SIZE" =~ ^[0-9]+K$ ]]; then
+		echo ${SIZE%%K}
+	elif [[ "$SIZE" =~ ^[0-9]+$ ]]; then
+		if [ $((SIZE % 1024)) -ne 0 ]; then
+			exit 1
+		else
+			echo $((SIZE / 1024))
+		fi
+	else
+		exit 1
+	fi
+}
+
+#
+# Creates hugepage filesystem.
+#
+create_mnt_huge()
+{
+	echo "Creating /mnt/huge and mounting as hugetlbfs"
+	mkdir -p /mnt/huge
+
+	grep -s '/mnt/huge' /proc/mounts > /dev/null
+	if [ $? -ne 0 ] ; then
+		mount -t hugetlbfs -o pagesize=${PAGESIZE} nodev /mnt/huge
+	fi
+}
+
+#
+# Removes hugepage filesystem.
+#
+remove_mnt_huge()
+{
+	echo "Unmounting /mnt/huge and removing directory"
+	grep -s '/mnt/huge' /proc/mounts > /dev/null
+	if [ $? -eq 0 ] ; then
+		umount /mnt/huge
+	fi
+
+	if [ -d /mnt/huge ] ; then
+		rm -R /mnt/huge
+	fi
+}
+#
+# Removes all reserved hugepages.
+#
+clear_huge_pages()
+{
+	echo > .echo_tmp
+	for d in /sys/devices/system/node/node? ; do
+		for sz in $d/hugepages/hugepages-* ; do
+			echo "echo 0 > ${sz}/nr_hugepages" >> .echo_tmp
+		done
+	done
+	echo "Removing currently reserved hugepages"
+	sh .echo_tmp
+	rm -f .echo_tmp
+
+	remove_mnt_huge
+}
+
+#
+# Creates hugepages.
+#
+set_non_numa_pages()
+{
+	path=/sys/kernel/mm/hugepages/hugepages-${HUGEPGSZ}kB
+	if [ ! -d $path ]; then
+		>&2 echo "${HUGEPGSZ}K is not a valid huge page size"
+		exit 1
+	fi
+	for sz in /sys/kernel/mm/hugepages/hugepages-* ; do
+		echo "echo 0 > ${sz}/nr_hugepages" >> .echo_tmp
+	done
+
+	echo "Reserving $PAGES hugepages of size $HUGEPGSZ kB"
+	echo $PAGES > $path/nr_hugepages
+
+	create_mnt_huge
+}
+
+#
+# Creates hugepages on specific NUMA nodes.
+#
+set_numa_pages()
+{
+	clear_huge_pages
+
+	echo > .echo_tmp
+	for d in /sys/devices/system/node/node? ; do
+		node=$(basename $d)
+		path="$d/hugepages/hugepages-${HUGEPGSZ}kB"
+		if [ ! -d $path ]; then
+			>&2 echo "${HUGEPGSZ}K is not a valid huge page size"
+			exit 1
+		fi
+
+		echo "echo $Pages > $path" >> .echo_tmp
+	done
+	echo "Reserving $PAGES hugepages of size $HUGEPGSZ kB (numa)"
+	sh .echo_tmp
+	rm -f .echo_tmp
+
+	create_mnt_huge
+}
+
+#
+# Need size argument
+#
+[ $# -ge 1 ] || usage
+
+#
+# Convert from size to pages
+#
+KSIZE=$(get_pagesize $1)
+if [ $? -ne 0 ]; then
+	>&2 echo "Invalid huge area size: $1"
+	exit 1
+fi
+
+#
+# Optional second argument is pagesize
+#
+if [ $# -gt 1 ]; then
+	HUGEPGSZ=$(get_pagesize $2)
+	if [ $? -ne 0 ]; then
+		>&2 echo "Invalid huge page size: $2"
+		exit 1
+	fi
+else
+	HUGEPGSZ=$(awk '/^Hugepagesize/ { print $2 }' /proc/meminfo )
+fi
+
+if [ $((KSIZE % HUGEPGSZ)) -ne 0 ] ; then
+	echo "Invalid number of huge pages $KSIZE K, should be multiple of $HUGEPGSZ K"
+	exit 1
+fi
+
+PAGES=$((KSIZE / HUGEPGSZ))
+PAGESIZE=$((HUGEPGSZ * 1024))
+
+#
+# Do NUMA if necessary
+#
+if [ -e /sys/devices/numa/node ]; then
+	set_numa_pages
+else
+	set_non_numa_pages
+fi
--
2.27.0
On 9/1/2020 5:56 PM, Stephen Hemminger wrote:
> This is an improved version of the setup of huge pages
> bases on earlier DPDK setup. Differences are:
> * it autodetects NUMA vs non NUMA
> * it allows setting different page sizes
> recent kernels support multiple sizes.
> * it accepts a parameter in bytes (not pages).
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> This is lightly tested, it still needs testing on multiple architectures
> etc.
>
Thanks.
It can be useful to have options to display current hugepage settings and remove
the allocation.
On Tue, Sep 01, 2020 at 09:56:43AM -0700, Stephen Hemminger wrote:
> This is an improved version of the setup of huge pages
> bases on earlier DPDK setup. Differences are:
> * it autodetects NUMA vs non NUMA
> * it allows setting different page sizes
> recent kernels support multiple sizes.
> * it accepts a parameter in bytes (not pages).
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> This is lightly tested, it still needs testing on multiple architectures
> etc.
>
> usertools/hugepage-setup.sh | 169 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 169 insertions(+)
> create mode 100755 usertools/hugepage-setup.sh
>
> diff --git a/usertools/hugepage-setup.sh b/usertools/hugepage-setup.sh
> new file mode 100755
> index 000000000000..df132e2f8d64
> --- /dev/null
> +++ b/usertools/hugepage-setup.sh
> @@ -0,0 +1,169 @@
> +#! /bin/bash
Is there a good reason to limit this to bash rather than general "sh"?
Also, if we ever see this script being expanded to cover more, would it be
more extensible in python rather than shell?
On Wed, 2 Sep 2020 10:55:07 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:
> On Tue, Sep 01, 2020 at 09:56:43AM -0700, Stephen Hemminger wrote:
> > This is an improved version of the setup of huge pages
> > bases on earlier DPDK setup. Differences are:
> > * it autodetects NUMA vs non NUMA
> > * it allows setting different page sizes
> > recent kernels support multiple sizes.
> > * it accepts a parameter in bytes (not pages).
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> > This is lightly tested, it still needs testing on multiple architectures
> > etc.
> >
> > usertools/hugepage-setup.sh | 169 ++++++++++++++++++++++++++++++++++++
> > 1 file changed, 169 insertions(+)
> > create mode 100755 usertools/hugepage-setup.sh
> >
> > diff --git a/usertools/hugepage-setup.sh b/usertools/hugepage-setup.sh
> > new file mode 100755
> > index 000000000000..df132e2f8d64
> > --- /dev/null
> > +++ b/usertools/hugepage-setup.sh
> > @@ -0,0 +1,169 @@
> > +#! /bin/bash
>
> Is there a good reason to limit this to bash rather than general "sh"?
>
> Also, if we ever see this script being expanded to cover more, would it be
> more extensible in python rather than shell?
Mainly because bash has arithmetic operations, and doing it with normal shell
requires using expr.
This is an improved version of the setup of huge pages
bases on earlier DPDK setup. Differences are:
 * it autodetects NUMA vs non NUMA
 * it allows setting different page sizes
   recent kernels support multiple sizes.
 * it accepts a parameter in bytes (not pages).

If necessary the steps of clearing old settings and mounting/umounting
can be done individually.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v2 -- rewrite in python
      The script is python3 only because supporting older versions
      no longer makes any sense.

 usertools/hugepage-setup.py | 317 ++++++++++++++++++++++++++++++++++++
 1 file changed, 317 insertions(+)
 create mode 100644 usertools/hugepage-setup.py

diff --git a/usertools/hugepage-setup.py b/usertools/hugepage-setup.py
new file mode 100644
index 000000000000..8e7642428d9e
--- /dev/null
+++ b/usertools/hugepage-setup.py
@@ -0,0 +1,317 @@
+# Copyright (c) 2020 Microsoft Corporation
+#
+# Script to query and setup huge pages for DPDK applications.
+
+import sys
+import os
+import re
+import getopt
+import glob
+from os.path import exists, basename
+
+# convention for where to mount huge pages
+hugedir = '/dev/hugepages'
+
+# command-line flags
+show_flag = None
+reserve_kb = None
+clear_flag = None
+hugepagesize_kb = None
+mount_flag = None
+unmount_flag = None
+
+
+def usage():
+    '''Print usage information for the program'''
+    global hugedir
+    mnt = hugedir
+    argv0 = basename(sys.argv[0])
+    print("""
+Usage:
+------
+    %(argv0)s [options]
+
+Options:
+    --help, --usage:
+        Display usage information and quit
+
+    -s, --show:
+        Print the current huge page configuration.
+
+    --setup:
+        Simplified version of clear, umount, reserve, mount operations
+
+    -c, --clear:
+        Remove all huge pages
+
+    -r, --reserve:
+        Reserve huge pages. The size specified is in bytes, with
+        optional K, M or G suffix. The size must be a multiple
+        of the page size.
+
+    -p, --pagesize
+        Choose page size to use. If not specified, the default
+        system page size will be used.
+
+    -m, --mount
+        Mount the system huge page directory %(mnt)s
+
+    -u, --umount
+        Unmount the system huge page directory %(mnt)s
+
+
+Examples:
+---------
+
+To display current huge page settings:
+    %(argv0)s -s
+
+To a complete setup of with 2 Gigabyte of 1G huge pages:
+    %(argv0)s -p 1G --setup 2G
+
+Equivalent to:
+    %(argv0)s -p 1G -c -u -r 2G -m
+
+To clear existing huge page settings and umount %(mnt)s
+    %(argv0)s -c -u
+
+    """ % locals())
+
+
+def fmt_memsize(sz):
+    '''Format memory size in conventional format'''
+    sz_kb = int(sz)
+    if sz_kb >= 1024 * 1024:
+        return '{}Gb'.format(sz_kb / (1024 * 1024))
+    elif sz_kb >= 1024:
+        return '{}Mb'.format(sz_kb / 1024)
+    else:
+        return '{}Kb'.format(sz_kb)
+
+
+def get_memsize(arg):
+    '''Convert memory size with suffix to kB'''
+    m = re.match('(\d+)([GMKgmk]?)$', arg)
+    if m is None:
+        sys.exit('{} is not a valid page size'.format(arg))
+
+    num = float(m.group(1))
+    suf = m.group(2)
+    if suf == "G" or suf == "g":
+        return int(num * 1024 * 1024)
+    elif suf == "M" or suf == "m":
+        return int(num * 1024)
+    elif suf == "K" or suf == "k":
+        return int(num)
+    else:
+        return int(num / 1024.)
+
+
+def is_numa():
+    '''Test if NUMA is necessary on this system'''
+    return exists('/sys/devices/numa/node')
+
+
+def get_hugepages(path):
+    '''Read number of reserved pages'''
+    with open(path + '/nr_hugepages') as f:
+        return int(f.read())
+    return 0
+
+
+def show_numa_pages():
+    print('Node Pages Size')
+    for n in glob.glob('/sys/devices/system/node/node*'):
+        path = n + '/hugepages'
+        node = n[29:]  # slice after /sys/devices/system/node/node
+        for d in os.listdir(path):
+            sz = d[10:-2]  # slice out of hugepages-NNNkB
+            nr_pages = get_hugepages(path + '/' + d)
+            if nr_pages > 0:
+                pg_sz = fmt_memsize(sz)
+                print('{:<4} {:<5} {}'.format(node, nr_pages, pg_sz))
+
+
+def show_non_numa_pages():
+    print('Pages Size')
+    path = '/sys/kernel/mm/hugepages'
+    for d in os.listdir(path):
+        sz = d[10:-2]
+        nr_pages = get_hugepages(path + '/' + d)
+        if nr_pages > 0:
+            pg_sz = fmt_memsize(sz)
+            print('{:<5} {}'.format(nr_pages, pg_sz))
+
+
+def show_pages():
+    '''Show existing huge page settings'''
+    if is_numa():
+        show_numa_pages()
+    else:
+        show_non_numa_pages()
+
+
+def clear_numa_pages():
+    for path in glob.glob(
+            '/sys/devices/system/node/node*/hugepages/hugepages-*'):
+        with open(path + '/nr_hugepages', 'w') as f:
+            f.write('\n0')
+
+
+def clear_non_numa_pages():
+    for path in glob.glob('/sys/kernel/mm/hugepages/hugepages-*'):
+        with open(path + '/nr_hugepages', 'w') as f:
+            f.write('0\n')
+
+
+def clear_pages():
+    '''Clear all existing huge page mappings'''
+    if is_numa():
+        clear_numa_pages()
+    else:
+        clear_non_numa_pages()
+
+
+def default_size():
+    '''Get default huge page size from /proc/meminfo'''
+    with open('/proc/meminfo') as f:
+        for line in f:
+            if line.startswith('Hugepagesize:'):
+                return int(line.split()[1])
+    return None
+
+
+def set_numa_pages(nr_pages, hugepgsz):
+    for n in glob.glob('/sys/devices/system/node/node*/hugepages'):
+        path = '{}/hugepages-{}kB'.format(n, hugepgsz)
+        if not exists(path):
+            sys.exit(
+                '{}Kb is not a valid system huge page size'.format(hugepgsz))
+
+        with open(path + '/nr_hugepages', 'w') as f:
+            f.write('{}\n'.format(nr_pages))
+
+
+def set_non_numa_pages(nr_pages, hugepgsz):
+    path = '/sys/kernel/mm/hugepages/hugepages-{}kB'.format(hugepgsz)
+    if not exists(path):
+        sys.exit('{}Kb is not a valid system huge page size'.format(hugepgsz))
+
+    with open(path + '/nr_hugepages', 'w') as f:
+        f.write('{}\n'.format(nr_pages))
+
+
+def set_pages(pages, hugepgsz):
+    '''Sets the numberof huge pages to be reserved'''
+    if is_numa():
+        set_numa_pages(pages, hugepgsz)
+    else:
+        set_non_numa_pages(pages, hugepgsz)
+
+
+def mount_huge(pagesize):
+    global hugedir
+    cmd = "mount -t hugetlbfs" + hugedir
+    if pagesize:
+        cmd += ' -o pagesize={}'.format(pagesize)
+    cmd += ' nodev {}'.format(hugedir)
+    os.system(cmd)
+
+
+def show_mount():
+    mounted = None
+    with open('/proc/mounts') as f:
+        for line in f:
+            fields = line.split()
+            if fields[2] != 'hugetlbfs':
+                continue
+            if not mounted:
+                print("Hugepages mounted on:", end=" ")
+                mounted = True
+            print(fields[1], end=" ")
+    if mounted:
+        print()
+    else:
+        print("Hugepages not mounted")
+
+
+def parse_args():
+    '''Parses the command-line arguments given by the user and takes the
+    appropriate action for each'''
+    global clear_flag
+    global show_flag
+    global reserve_kb
+    global hugepagesize_kb
+    global args
+
+    if len(sys.argv) <= 1:
+        usage()
+        sys.exit(0)
+
+    try:
+        opts, args = getopt.getopt(sys.argv[1:], "r:p:csmu", [
+            "help", "usage", "show", "clear", "setup=", "eserve=", "pagesize=",
+            "mount", "unmount"
+        ])
+    except getopt.GetoptError as error:
+        print(str(error))
+        print("Run '%s --usage' for further information" % sys.argv[0])
+        sys.exit(1)
+
+    for opt, arg in opts:
+        if opt == "--help" or opt == "--usage":
+            usage()
+            sys.exit(0)
+        if opt == "--setup":
+            clear_flag = True
+            unmount_flag = True
+            reserve_kb = get_memsize(arg)
+            mount_flag = True
+        if opt == "--show" or opt == "-s":
+            show_flag = True
+        if opt == "--clear" or opt == "-c":
+            clear_flag = True
+        if opt == "--reserve" or opt == "-r":
+            reserve_kb = get_memsize(arg)
+        if opt == "--pagesize" or opt == "-p":
+            hugepagesize_kb = get_memsize(arg)
+        if opt == "--unmount" or opt == "-u":
+            unmount_flag = True
+        if opt == "--mount" or opt == "-m":
+            mount_flag = True
+
+
+def do_arg_actions():
+    '''do the actual action requested by the user'''
+    global clear_flag
+    global show_flag
+    global hugepagesize_kb
+    global reserve_kb
+
+    if clear_flag:
+        clear_pages()
+    if unmount_flag:
+        os.system("umount " + hugedir)
+    if reserve_kb:
+        if hugepagesize_kb is None:
+            hugepagesize_kb = default_size()
+        if reserve_kb % hugepagesize_kb != 0:
+            sys.exit('{} is not a multiple of page size {}'.format(
+                reserve_kb, hugepagesize_kb))
+        nr_pages = int(reserve_kb / hugepagesize_kb)
+        set_pages(nr_pages, hugepagesize_kb)
+    if mount_flag:
+        mount_huge(hugepagesize_kb * 1024)
+    if show_flag:
+        show_pages()
+        print()
+        show_mount()
+
+
+def main():
+    parse_args()
+    do_arg_actions()
+
+
+if __name__ == "__main__":
+    main()
--
2.27.0
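For the mount step, one possible way to assemble the command is as an argument list rather than a concatenated string. This is only a sketch: it assumes the same `mount -t hugetlbfs -o pagesize=N nodev DIR` invocation used by the v1 shell script, and the helper name `build_mount_cmd` is hypothetical, not part of the patch.

```python
import subprocess


def build_mount_cmd(hugedir, pagesize=None):
    """Return the argv list for mounting hugetlbfs on hugedir.

    pagesize is in bytes; when omitted, no pagesize option is passed
    and the kernel default huge page size applies.
    """
    cmd = ['mount', '-t', 'hugetlbfs']
    if pagesize:
        cmd += ['-o', 'pagesize={}'.format(pagesize)]
    cmd += ['nodev', hugedir]
    return cmd


def mount_huge(hugedir, pagesize=None):
    """Run the mount command (needs root); raises on a non-zero exit."""
    subprocess.run(build_mount_cmd(hugedir, pagesize), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues, and keeping the list construction in its own function makes it possible to check the command without root privileges.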
On Thu, Sep 03, 2020 at 03:48:31PM -0700, Stephen Hemminger wrote:
> This is an improved version of the setup of huge pages
> bases on earlier DPDK setup. Differences are:
>  * it autodetects NUMA vs non NUMA
>  * it allows setting different page sizes
>    recent kernels support multiple sizes.
>  * it accepts a parameter in bytes (not pages).
>
> If necessary the steps of clearing old settings and mounting/umounting
> can be done individually.
>
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---

Overall looks really good and readable! Thanks for this.
Couple of comments inline below.

/Bruce

> v2 -- rewrite in python
>       The script is python3 only because supporting older versions
>       no longer makes any sense.
>
>  usertools/hugepage-setup.py | 317 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 317 insertions(+)

<snip>

> +def set_numa_pages(nr_pages, hugepgsz):
> +    for n in glob.glob('/sys/devices/system/node/node*/hugepages'):
> +        path = '{}/hugepages-{}kB'.format(n, hugepgsz)
> +        if not exists(path):
> +            sys.exit(
> +                '{}Kb is not a valid system huge page size'.format(hugepgsz))
> +
> +        with open(path + '/nr_hugepages', 'w') as f:
> +            f.write('{}\n'.format(nr_pages))
> +
> +
> +def set_non_numa_pages(nr_pages, hugepgsz):
> +    path = '/sys/kernel/mm/hugepages/hugepages-{}kB'.format(hugepgsz)
> +    if not exists(path):
> +        sys.exit('{}Kb is not a valid system huge page size'.format(hugepgsz))
> +
> +    with open(path + '/nr_hugepages', 'w') as f:
> +        f.write('{}\n'.format(nr_pages))
> +
> +
> +def set_pages(pages, hugepgsz):
> +    '''Sets the numberof huge pages to be reserved'''
> +    if is_numa():
> +        set_numa_pages(pages, hugepgsz)
> +    else:
> +        set_non_numa_pages(pages, hugepgsz)
> +

I'm not sure I agree with this behaviour for numa nodes. When a size is
specified on a numa system we probably don't want to reserve that size on
all nodes. I think one of two other options actually makes more sense:
1. Divide up the allocation equally between all nodes
2. Require the user to specify a numa node for the allocation.

Option #2 is best, I think.

> +
> +def mount_huge(pagesize):
> +    global hugedir
> +    cmd = "mount -t hugetlbfs" + hugedir
> +    if pagesize:
> +        cmd += ' -o pagesize={}'.format(pagesize)
> +    cmd += ' nodev {}'.format(hugedir)
> +    os.system(cmd)
> +
> +
> +def show_mount():
> +    mounted = None
> +    with open('/proc/mounts') as f:
> +        for line in f:
> +            fields = line.split()
> +            if fields[2] != 'hugetlbfs':
> +                continue
> +            if not mounted:
> +                print("Hugepages mounted on:", end=" ")
> +                mounted = True
> +            print(fields[1], end=" ")
> +    if mounted:
> +        print()
> +    else:
> +        print("Hugepages not mounted")
> +
> +
> +def parse_args():
> +    '''Parses the command-line arguments given by the user and takes the
> +    appropriate action for each'''
> +    global clear_flag
> +    global show_flag
> +    global reserve_kb
> +    global hugepagesize_kb
> +    global args
> +
> +    if len(sys.argv) <= 1:
> +        usage()
> +        sys.exit(0)
> +
> +    try:
> +        opts, args = getopt.getopt(sys.argv[1:], "r:p:csmu", [
> +            "help", "usage", "show", "clear", "setup=", "eserve=", "pagesize=",

Typo -> "eserve"

> +            "mount", "unmount"
> +        ])
> +    except getopt.GetoptError as error:
> +        print(str(error))
> +        print("Run '%s --usage' for further information" % sys.argv[0])
> +        sys.exit(1)
> +
> +    for opt, arg in opts:
> +        if opt == "--help" or opt == "--usage":
> +            usage()
> +            sys.exit(0)
> +        if opt == "--setup":
> +            clear_flag = True
> +            unmount_flag = True
> +            reserve_kb = get_memsize(arg)
> +            mount_flag = True
> +        if opt == "--show" or opt == "-s":
> +            show_flag = True
> +        if opt == "--clear" or opt == "-c":
> +            clear_flag = True
> +        if opt == "--reserve" or opt == "-r":
> +            reserve_kb = get_memsize(arg)
> +        if opt == "--pagesize" or opt == "-p":
> +            hugepagesize_kb = get_memsize(arg)
> +        if opt == "--unmount" or opt == "-u":
> +            unmount_flag = True
> +        if opt == "--mount" or opt == "-m":
> +            mount_flag = True
> +

I think the trend in python is to use argparse rather than getopt, though
personally I don't have strong feelings about the issue.

> +
> +def do_arg_actions():
> +    '''do the actual action requested by the user'''
> +    global clear_flag
> +    global show_flag
> +    global hugepagesize_kb
> +    global reserve_kb
> +
> +    if clear_flag:
> +        clear_pages()
> +    if unmount_flag:
> +        os.system("umount " + hugedir)
> +    if reserve_kb:
> +        if hugepagesize_kb is None:
> +            hugepagesize_kb = default_size()
> +        if reserve_kb % hugepagesize_kb != 0:
> +            sys.exit('{} is not a multiple of page size {}'.format(
> +                reserve_kb, hugepagesize_kb))
> +        nr_pages = int(reserve_kb / hugepagesize_kb)
> +        set_pages(nr_pages, hugepagesize_kb)
> +    if mount_flag:
> +        mount_huge(hugepagesize_kb * 1024)
> +    if show_flag:
> +        show_pages()
> +        print()
> +        show_mount()
> +
> +
> +def main():
> +    parse_args()
> +    do_arg_actions()
> +
> +
> +if __name__ == "__main__":
> +    main()
> --
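
The two suggestions in this review (argparse instead of getopt, and requiring a NUMA node for the allocation) could be combined roughly as follows. This is a sketch under assumptions: the `-n/--node` option and the helper names `build_parser` and `nr_hugepages_path` are hypothetical and do not appear in the patch; the sysfs paths are the ones the patch already uses.

```python
import argparse


def build_parser():
    """Build an argparse parser mirroring the options of the v2 script."""
    parser = argparse.ArgumentParser(
        description='Query and set up huge pages for DPDK applications')
    parser.add_argument('-s', '--show', action='store_true',
                        help='print the current huge page configuration')
    parser.add_argument('-c', '--clear', action='store_true',
                        help='remove all huge pages')
    parser.add_argument('-m', '--mount', action='store_true',
                        help='mount the huge page directory')
    parser.add_argument('-u', '--umount', action='store_true',
                        help='unmount the huge page directory')
    parser.add_argument('-r', '--reserve', metavar='SIZE',
                        help='size to reserve, with optional K, M or G suffix')
    parser.add_argument('-p', '--pagesize', metavar='SIZE',
                        help='huge page size to use')
    # Hypothetical option implementing "require the user to specify a node".
    parser.add_argument('-n', '--node', type=int,
                        help='NUMA node to reserve the pages on')
    return parser


def nr_hugepages_path(pagesize_kb, node=None):
    """Return the sysfs file that controls the reservation.

    With node=None the system-wide file is used; otherwise the
    per-node file for that NUMA node.
    """
    if node is None:
        base = '/sys/kernel/mm/hugepages'
    else:
        base = '/sys/devices/system/node/node{}/hugepages'.format(node)
    return '{}/hugepages-{}kB/nr_hugepages'.format(base, pagesize_kb)
```

With this shape, `build_parser().parse_args(['-p', '1G', '-r', '2G', '--node', '1'])` yields a namespace whose `node` attribute selects the per-node sysfs file, and argparse generates the `--help` text that `usage()` currently maintains by hand.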
On 03-Sep-20 11:48 PM, Stephen Hemminger wrote:
> This is an improved version of the setup of huge pages
> bases on earlier DPDK setup. Differences are:
>  * it autodetects NUMA vs non NUMA
>  * it allows setting different page sizes
>    recent kernels support multiple sizes.
>  * it accepts a parameter in bytes (not pages).
>
> If necessary the steps of clearing old settings and mounting/umounting
> can be done individually.
>
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> v2 -- rewrite in python
>       The script is python3 only because supporting older versions
>       no longer makes any sense.
>
>  usertools/hugepage-setup.py | 317 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 317 insertions(+)
>  create mode 100644 usertools/hugepage-setup.py
>
> diff --git a/usertools/hugepage-setup.py b/usertools/hugepage-setup.py
> new file mode 100644
> index 000000000000..8e7642428d9e
> --- /dev/null
> +++ b/usertools/hugepage-setup.py
> @@ -0,0 +1,317 @@
> +# Copyright (c) 2020 Microsoft Corporation
> +#
> +# Script to query and setup huge pages for DPDK applications.
> +
> +import sys
> +import os
> +import re
> +import getopt
> +import glob
> +from os.path import exists, basename
> +
> +# convention for where to mount huge pages
> +hugedir = '/dev/hugepages'

This isn't a "convention", this is a default systemd mountpoint.

> +
> +# command-line flags
> +show_flag = None
> +reserve_kb = None
> +clear_flag = None
> +hugepagesize_kb = None
> +mount_flag = None
> +unmount_flag = None
> +
> +
> +def usage():
> +    '''Print usage information for the program'''
> +    global hugedir
> +    mnt = hugedir
> +    argv0 = basename(sys.argv[0])
> +    print("""
> +Usage:
> +------
> +    %(argv0)s [options]
> +
> +Options:
> +    --help, --usage:
> +        Display usage information and quit
> +
> +    -s, --show:
> +        Print the current huge page configuration.
> +
> +    --setup:
> +        Simplified version of clear, umount, reserve, mount operations
> +
> +    -c, --clear:
> +        Remove all huge pages
> +
> +    -r, --reserve:
> +        Reserve huge pages. The size specified is in bytes, with
> +        optional K, M or G suffix. The size must be a multiple
> +        of the page size.
> +
> +    -p, --pagesize
> +        Choose page size to use. If not specified, the default
> +        system page size will be used.
> +
> +    -m, --mount
> +        Mount the system huge page directory %(mnt)s
> +
> +    -u, --umount
> +        Unmount the system huge page directory %(mnt)s
> +
> +
> +Examples:
> +---------
> +
> +To display current huge page settings:
> +    %(argv0)s -s
> +
> +To a complete setup of with 2 Gigabyte of 1G huge pages:
> +    %(argv0)s -p 1G --setup 2G
> +
> +Equivalent to:
> +    %(argv0)s -p 1G -c -u -r 2G -m
> +
> +To clear existing huge page settings and umount %(mnt)s
> +    %(argv0)s -c -u
> +
> +    """ % locals())
> +
> +
> +def fmt_memsize(sz):
> +    '''Format memory size in conventional format'''
> +    sz_kb = int(sz)
> +    if sz_kb >= 1024 * 1024:
> +        return '{}Gb'.format(sz_kb / (1024 * 1024))
> +    elif sz_kb >= 1024:
> +        return '{}Mb'.format(sz_kb / 1024)
> +    else:
> +        return '{}Kb'.format(sz_kb)

I've lost count how many times i've had to reimplement this code, but
there is an easier way :) Off the top of my head,

idx = log2(sz)
# every 10th power of 2
return '{}{}b'.format(sz, ' kMG'[int(idx) / 10])

or something close to that.

> +
> +
> +def get_memsize(arg):
> +    '''Convert memory size with suffix to kB'''
> +    m = re.match('(\d+)([GMKgmk]?)$', arg)
> +    if m is None:
> +        sys.exit('{} is not a valid page size'.format(arg))
> +
> +    num = float(m.group(1))
> +    suf = m.group(2)
> +    if suf == "G" or suf == "g":
> +        return int(num * 1024 * 1024)
> +    elif suf == "M" or suf == "m":
> +        return int(num * 1024)
> +    elif suf == "K" or suf == "k":
> +        return int(num)
> +    else:
> +        return int(num / 1024.)

Same here, can simply index an array and do powers of 2.
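
A runnable take on the "index an array and do powers of 2" idea might look like the sketch below. It deliberately swaps the `log2` call for an integer division loop, so that sizes which are not an exact multiple of the next unit are left in the smaller unit instead of being rounded. The `_UNITS` table is an assumption of this sketch; the function names simply reuse the ones from the patch.

```python
import re

# Unit letters indexed by power of 1024, relative to kB.
_UNITS = 'KMG'


def fmt_memsize(kb):
    """Format a size given in kB, e.g. 2048 -> '2Mb'."""
    kb = int(kb)
    idx = 0
    # Step up one unit at a time while the value divides evenly.
    while kb >= 1024 and kb % 1024 == 0 and idx < len(_UNITS) - 1:
        kb //= 1024
        idx += 1
    return '{}{}b'.format(kb, _UNITS[idx])


def get_memsize(arg):
    """Convert '2G', '512M', '64K' or a plain byte count to kB."""
    m = re.fullmatch(r'(\d+)([KMG]?)', arg.upper())
    if m is None:
        raise ValueError('{} is not a valid size'.format(arg))
    num = int(m.group(1))
    if not m.group(2):
        return num // 1024  # no suffix: the value is in bytes
    return num * 1024 ** _UNITS.index(m.group(2))
```

Integer arithmetic is used throughout, which also keeps the formatted output free of the trailing `.0` that true division produces in Python 3.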
> + > + > +def is_numa(): > + '''Test if NUMA is necessary on this system''' > + return exists('/sys/devices/numa/node') > + > + > +def get_hugepages(path): > + '''Read number of reserved pages''' > + with open(path + '/nr_hugepages') as f: Here and in other places... os.path.join()? > + return int(f.read()) > + return 0 > + > + > +def show_numa_pages(): > + print('Node Pages Size') > + for n in glob.glob('/sys/devices/system/node/node*'): > + path = n + '/hugepages' > + node = n[29:] # slice after /sys/devices/system/node/node I mean, it works but it's not terribly Pythonic and looks more like C-style string manipulation. Soooo, os.path.join(), os.path.basename(), regex match? I would gladly trade readability and idiomatic-ness of this code for any misguided pursuit of performance here. It'd also make it easier to understand what's going on if you didn't mix logic with presentation, and just returned an array or a dict of values and print everything out in the caller, as opposed to printing everything inline. 
> + for d in os.listdir(path): > + sz = d[10:-2] # slice out of hugepages-NNNkB > + nr_pages = get_hugepages(path + '/' + d) > + if nr_pages > 0: > + pg_sz = fmt_memsize(sz) > + print('{:<4} {:<5} {}'.format(node, nr_pages, pg_sz)) > + > + > +def show_non_numa_pages(): > + print('Pages Size') > + path = '/sys/kernel/mm/hugepages' > + for d in os.listdir(path): > + sz = d[10:-2] > + nr_pages = get_hugepages(path + '/' + d) > + if nr_pages > 0: > + pg_sz = fmt_memsize(sz) > + print('{:<5} {}'.format(nr_pages, pg_sz)) > + > + > +def show_pages(): > + '''Show existing huge page settings''' > + if is_numa(): > + show_numa_pages() > + else: > + show_non_numa_pages() > + > + > +def clear_numa_pages(): > + for path in glob.glob( > + '/sys/devices/system/node/node*/hugepages/hugepages-*'): > + with open(path + '/nr_hugepages', 'w') as f: > + f.write('\n0') > + > + > +def clear_non_numa_pages(): > + for path in glob.glob('/sys/kernel/mm/hugepages/hugepages-*'): > + with open(path + '/nr_hugepages', 'w') as f: > + f.write('0\n') > + > + > +def clear_pages(): > + '''Clear all existing huge page mappings''' > + if is_numa(): > + clear_numa_pages() > + else: > + clear_non_numa_pages() > + > + > +def default_size(): > + '''Get default huge page size from /proc/meminfo''' > + with open('/proc/meminfo') as f: > + for line in f: > + if line.startswith('Hugepagesize:'): > + return int(line.split()[1]) > + return None > + > + > +def set_numa_pages(nr_pages, hugepgsz): > + for n in glob.glob('/sys/devices/system/node/node*/hugepages'): > + path = '{}/hugepages-{}kB'.format(n, hugepgsz) > + if not exists(path): > + sys.exit( > + '{}Kb is not a valid system huge page size'.format(hugepgsz)) > + > + with open(path + '/nr_hugepages', 'w') as f: > + f.write('{}\n'.format(nr_pages)) > + > + > +def set_non_numa_pages(nr_pages, hugepgsz): > + path = '/sys/kernel/mm/hugepages/hugepages-{}kB'.format(hugepgsz) > + if not exists(path): > + sys.exit('{}Kb is not a valid system huge page 
size'.format(hugepgsz)) > + > + with open(path + '/nr_hugepages', 'w') as f: > + f.write('{}\n'.format(nr_pages)) > + > + > +def set_pages(pages, hugepgsz): > + '''Sets the numberof huge pages to be reserved''' > + if is_numa(): > + set_numa_pages(pages, hugepgsz) > + else: > + set_non_numa_pages(pages, hugepgsz) > + > + > +def mount_huge(pagesize): > + global hugedir > + cmd = "mount -t hugetlbfs" + hugedir > + if pagesize: > + cmd += ' -o pagesize={}'.format(pagesize) > + cmd += ' nodev {}'.format(hugedir) > + os.system(cmd) > + > + > +def show_mount(): > + mounted = None > + with open('/proc/mounts') as f: > + for line in f: > + fields = line.split() > + if fields[2] != 'hugetlbfs': > + continue > + if not mounted: > + print("Hugepages mounted on:", end=" ") > + mounted = True > + print(fields[1], end=" ") > + if mounted: > + print() > + else: > + print("Hugepages not mounted") > + > + > +def parse_args(): > + '''Parses the command-line arguments given by the user and takes the > + appropriate action for each''' > + global clear_flag > + global show_flag > + global reserve_kb > + global hugepagesize_kb > + global args > + > + if len(sys.argv) <= 1: > + usage() > + sys.exit(0) > + > + try: > + opts, args = getopt.getopt(sys.argv[1:], "r:p:csmu", [ > + "help", "usage", "show", "clear", "setup=", "eserve=", "pagesize=", > + "mount", "unmount" > + ]) > + except getopt.GetoptError as error: > + print(str(error)) > + print("Run '%s --usage' for further information" % sys.argv[0]) > + sys.exit(1) > + > + for opt, arg in opts: > + if opt == "--help" or opt == "--usage": > + usage() > + sys.exit(0) > + if opt == "--setup": > + clear_flag = True > + unmount_flag = True > + reserve_kb = get_memsize(arg) > + mount_flag = True > + if opt == "--show" or opt == "-s": > + show_flag = True > + if opt == "--clear" or opt == "-c": > + clear_flag = True > + if opt == "--reserve" or opt == "-r": > + reserve_kb = get_memsize(arg) > + if opt == "--pagesize" or opt == "-p": > + 
hugepagesize_kb = get_memsize(arg) > + if opt == "--unmount" or opt == "-u": > + unmount_flag = True > + if opt == "--mount" or opt == "-m": > + mount_flag = True > + > + > +def do_arg_actions(): > + '''do the actual action requested by the user''' > + global clear_flag > + global show_flag > + global hugepagesize_kb > + global reserve_kb > + > + if clear_flag: > + clear_pages() > + if unmount_flag: > + os.system("umount " + hugedir) > + if reserve_kb: > + if hugepagesize_kb is None: > + hugepagesize_kb = default_size() > + if reserve_kb % hugepagesize_kb != 0: > + sys.exit('{} is not a multiple of page size {}'.format( > + reserve_kb, hugepagesize_kb)) > + nr_pages = int(reserve_kb / hugepagesize_kb) > + set_pages(nr_pages, hugepagesize_kb) > + if mount_flag: > + mount_huge(hugepagesize_kb * 1024) > + if show_flag: > + show_pages() > + print() > + show_mount() > + > + > +def main(): > + parse_args() > + do_arg_actions() > + > + > +if __name__ == "__main__": This is a sysadmin script and you're not attempting to catch exceptions anywhere - perhaps check uid before proceeding? > + main() > -- Thanks, Anatoly
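The review points above (os.path.join() and regex matching instead of fixed slices, returning data instead of printing inline, and a uid check) could be sketched roughly as follows. This is only an illustration, not code from the patch: the helper names `parse_node`, `parse_size_kb`, `collect_numa_pages` and `require_root` are hypothetical, and the sysfs layout is assumed to be the one the patch already relies on.

```python
import glob
import os
import re

# Assumed sysfs layout, taken from the paths used in the patch.
NODE_GLOB = '/sys/devices/system/node/node*'
NODE_RE = re.compile(r'node(\d+)$')
SIZE_RE = re.compile(r'hugepages-(\d+)kB$')


def parse_node(path):
    '''Return the NUMA node number from a sysfs node path, or None.'''
    match = NODE_RE.search(os.path.basename(path))
    return int(match.group(1)) if match else None


def parse_size_kb(dirname):
    '''Return the page size in kB from a hugepages-NNNkB name, or None.'''
    match = SIZE_RE.match(dirname)
    return int(match.group(1)) if match else None


def collect_numa_pages(node_glob=NODE_GLOB):
    '''Gather (node, size_kb, nr_pages) tuples without printing anything.'''
    rows = []
    for node_path in sorted(glob.glob(node_glob)):
        node = parse_node(node_path)
        huge_dir = os.path.join(node_path, 'hugepages')
        if node is None or not os.path.isdir(huge_dir):
            continue
        for entry in sorted(os.listdir(huge_dir)):
            size_kb = parse_size_kb(entry)
            if size_kb is None:
                continue
            with open(os.path.join(huge_dir, entry, 'nr_hugepages')) as f:
                rows.append((node, size_kb, int(f.read())))
    return rows


def require_root():
    '''Exit early with a readable message when not running as root.'''
    if os.geteuid() != 0:
        raise SystemExit('This operation requires root privileges')
```

The caller can then decide how to print the rows, which keeps the sysfs parsing separate from the output formatting.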
On Fri, Sep 04, 2020 at 03:58:03PM +0100, Burakov, Anatoly wrote:
> On 03-Sep-20 11:48 PM, Stephen Hemminger wrote:
> > This is an improved version of the setup of huge pages
> > bases on earlier DPDK setup. Differences are:
> > * it autodetects NUMA vs non NUMA
> > * it allows setting different page sizes
> > recent kernels support multiple sizes.
> > * it accepts a parameter in bytes (not pages).
> >
> > If necessary the steps of clearing old settings and mounting/umounting
> > can be done individually.
> >
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> > v2 -- rewrite in python
> > The script is python3 only because supporting older versions
> > no longer makes any sense.
> >
> > usertools/hugepage-setup.py | 317 ++++++++++++++++++++++++++++++++++++
> > 1 file changed, 317 insertions(+)
> > create mode 100644 usertools/hugepage-setup.py
> >
> > diff --git a/usertools/hugepage-setup.py b/usertools/hugepage-setup.py
> > new file mode 100644
> > index 000000000000..8e7642428d9e
> > --- /dev/null
> > +++ b/usertools/hugepage-setup.py
> > @@ -0,0 +1,317 @@
> > +# Copyright (c) 2020 Microsoft Corporation
> > +#
> > +# Script to query and setup huge pages for DPDK applications.
> > +
> > +import sys
> > +import os
> > +import re
> > +import getopt
> > +import glob
> > +from os.path import exists, basename
> > +
> > +# convention for where to mount huge pages
> > +hugedir = '/dev/hugepages'
>
> This isn't a "convention", this is a default systemd mountpoint.
>
> > +
> > +# command-line flags
> > +show_flag = None
> > +reserve_kb = None
> > +clear_flag = None
> > +hugepagesize_kb = None
> > +mount_flag = None
> > +unmount_flag = None
> > +
> > +
> > +def usage():
> > + '''Print usage information for the program'''
> > + global hugedir
> > + mnt = hugedir
> > + argv0 = basename(sys.argv[0])
> > + print("""
> > +Usage:
> > +------
> > + %(argv0)s [options]
> > +
> > +Options:
> > + --help, --usage:
> > + Display usage information and quit
> > +
> > + -s, --show:
> > + Print the current huge page configuration.
> > +
> > + --setup:
> > + Simplified version of clear, umount, reserve, mount operations
> > +
> > + -c, --clear:
> > + Remove all huge pages
> > +
> > + -r, --reserve:
> > + Reserve huge pages. The size specified is in bytes, with
> > + optional K, M or G suffix. The size must be a multiple
> > + of the page size.
> > +
> > + -p, --pagesize
> > + Choose page size to use. If not specified, the default
> > + system page size will be used.
> > +
> > + -m, --mount
> > + Mount the system huge page directory %(mnt)s
> > +
> > + -u, --umount
> > + Unmount the system huge page directory %(mnt)s
> > +
> > +
> > +Examples:
> > +---------
> > +
> > +To display current huge page settings:
> > + %(argv0)s -s
> > +
> > +To a complete setup of with 2 Gigabyte of 1G huge pages:
> > + %(argv0)s -p 1G --setup 2G
> > +
> > +Equivalent to:
> > + %(argv0)s -p 1G -c -u -r 2G -m
> > +
> > +To clear existing huge page settings and umount %(mnt)s
> > + %(argv0)s -c -u
> > +
> > + """ % locals())
> > +
> > +
> > +def fmt_memsize(sz):
> > + '''Format memory size in conventional format'''
> > + sz_kb = int(sz)
> > + if sz_kb >= 1024 * 1024:
> > + return '{}Gb'.format(sz_kb / (1024 * 1024))
> > + elif sz_kb >= 1024:
> > + return '{}Mb'.format(sz_kb / 1024)
> > + else:
> > + return '{}Kb'.format(sz_kb)
>
> I've lost count how many times i've had to reimplement this code, but there
> is an easier way :) Off the top of my head,
>
> idx = log2(sz)
> # every 10th power of 2
> return '{}{}b'.format(sz, ' kMG'[int(idx) / 10])
>
> or something close to that.
>
Another minor nit: since these are memory sizes, not bandwidth rates, the
unit is bytes, not bits, so the "b" should be "B" in all the prints,
whichever way it's calculated.
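Putting the two suggestions above together (index a prefix string by every tenth power of two, and print "B" for bytes), a formatter might look like the sketch below. It is an assumption-laden illustration, not the code that was posted: it uses `int.bit_length()` as an integer stand-in for `floor(log2())`, truncates rather than rounds, and simply caps at the last prefix.

```python
# Prefixes for sizes expressed in kB: index 0 is kB, 1 is MB, 2 is GB.
BINARY_PREFIX = 'KMG'


def fmt_memsize(kb):
    '''Format a size given in kB with a binary prefix and a byte suffix.'''
    kb = int(kb)  # accepts either an int or a digit string from sysfs
    if kb < 1024:
        return '{}KB'.format(kb)
    # bit_length() - 1 equals floor(log2(kb)) for positive integers
    idx = min((kb.bit_length() - 1) // 10, len(BINARY_PREFIX) - 1)
    return '{}{}B'.format(kb >> (idx * 10), BINARY_PREFIX[idx])
```

Because huge page sizes are powers of two, the truncating shift loses nothing for the values this script prints; other inputs would be rounded down.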
On Fri, 4 Sep 2020 10:22:28 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:

> > +def set_pages(pages, hugepgsz):
> > +    '''Sets the number of huge pages to be reserved'''
> > +    if is_numa():
> > +        set_numa_pages(pages, hugepgsz)
> > +    else:
> > +        set_non_numa_pages(pages, hugepgsz)
> > +
>
> I'm not sure I agree with this behaviour for numa nodes. When a size is
> specified on a numa system we probably don't want to reserve that size on
> all nodes. I think one of two other options actually makes more sense:
> 1. Divide up the allocation equally between all nodes
> 2. Require the user to specify a numa node for the allocation.
>
> Option #2 is best, I think.

I was just reproducing what the old script does for now.
How about a --node option to do a single node and by default
divide by number of nodes?

> I think the trend in python is to use argparse rather than getopt, though
> personally I don't have strong feelings about the issue.

Used getopt because that is what devbind was using.
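The behaviour proposed in this reply (a single node when one is given, otherwise the allocation divided across nodes) can be sketched as a small pure function. The name `split_pages` and the handling of a remainder are assumptions made for illustration; the thread does not say what should happen when the page count does not divide evenly.

```python
def split_pages(nr_pages, nodes, node=None):
    '''Return {node: pages}: everything on one node, or an even split.'''
    if not nodes:
        raise ValueError('no NUMA nodes given')
    if node is not None:
        if node not in nodes:
            raise ValueError('unknown NUMA node {}'.format(node))
        return {node: nr_pages}
    share, extra = divmod(nr_pages, len(nodes))
    # Assumption: the first `extra` nodes take one additional page each.
    return {n: share + (1 if i < extra else 0) for i, n in enumerate(nodes)}
```

For example, `split_pages(8, [0, 1])` puts four pages on each node, while `split_pages(4, [0, 1], node=1)` puts all four on node 1.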
This is an improved version of the setup of huge pages
bases on earlier DPDK setup. Differences are:
* it autodetects NUMA vs non NUMA
* it allows setting different page sizes
recent kernels support multiple sizes.
* it accepts a parameter in bytes (not pages).

If necessary the steps of clearing old settings and mounting/umounting
can be done individually.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v3 -- incorporate review feedback
add missing SPDX and env header
overengineer the memory prefix string code
add numa node argument
fix some pylint warnings

v2 -- convert to python3

usertools/hugepage-setup.py | 326 ++++++++++++++++++++++++++++++++++++
1 file changed, 326 insertions(+)
create mode 100755 usertools/hugepage-setup.py

diff --git a/usertools/hugepage-setup.py b/usertools/hugepage-setup.py
new file mode 100755
index 000000000000..9fe1422c5a68
--- /dev/null
+++ b/usertools/hugepage-setup.py
@@ -0,0 +1,326 @@
+#! /usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2020 Microsoft Corporation
+#
+# Script to query and setup huge pages for DPDK applications.
+
+import sys
+import os
+import re
+import getopt
+import glob
+from os.path import exists, basename
+from math import log2
+
+# systemd mount point for huge pages
+HUGEDIR = '/dev/hugepages'
+
+# Standard binary prefix
+BINARY_PREFIX = "KMG"
+
+# command-line flags
+show_flag = None
+reserve_kb = None
+clear_flag = None
+hugepagesize_kb = None
+mount_flag = None
+unmount_flag = None
+numa_node = None
+
+
+def usage():
+    '''Print usage information for the program'''
+    mnt = HUGEDIR
+    argv0 = basename(sys.argv[0])
+    print("""
+Usage:
+------
+    %(argv0)s [options]
+
+Options:
+    --help, --usage:
+        Display usage information and quit
+
+    -s, --show:
+        Print the current huge page configuration.
+
+    --setup:
+        Simplified version of clear, umount, reserve, mount operations
+
+    -c, --clear:
+        Remove all huge pages
+
+    -r, --reserve:
+        Reserve huge pages. The size specified is in bytes, with
+        optional K, M or G suffix. The size must be a multiple
+        of the page size.
+
+    -p, --pagesize
+        Choose page size to use. If not specified, the default
+        system page size will be used.
+
+    -n, --node
+        Select numa node to reserve pages on.
+        If not specified, pages will be reserved on all nodes.
+
+    -m, --mount
+        Mount the system huge page directory %(mnt)s
+
+    -u, --umount
+        Unmount the system huge page directory %(mnt)s
+
+
+Examples:
+---------
+
+To display current huge page settings:
+    %(argv0)s -s
+
+To a complete setup of with 2 Gigabyte of 1G huge pages:
+    %(argv0)s -p 1G --setup 2G
+
+Equivalent to:
+    %(argv0)s -p 1G -c -u -r 2G -m
+
+To clear existing huge page settings and umount %(mnt)s
+    %(argv0)s -c -u
+
+    """ % locals())
+
+
+def fmt_memsize(sz_k):
+    '''Format memory size in kB into conventional format'''
+    if sz_k < 1024:
+        return sz_k
+    l = int(log2(sz_k) / 10)
+    return '{}{}b'.format(int(sz_k / (2**(l * 10))), BINARY_PREFIX[l])
+
+
+def get_memsize(arg):
+    '''Convert memory size with suffix to kB'''
+    m = re.match(r'(\d+)([' + BINARY_PREFIX + r']?)$', arg.upper())
+    if m is None:
+        sys.exit('{} is not a valid page size'.format(arg))
+    num = float(m.group(1))
+    suffix = m.group(2)
+    if suffix == "":
+        return int(num / 1024)
+    idx = BINARY_PREFIX.find(suffix)
+    return int(num * (2**(idx * 10)))
+
+
+def is_numa():
+    '''Test if NUMA is necessary on this system'''
+    return exists('/sys/devices/numa/node')
+
+
+def get_hugepages(path):
+    '''Read number of reserved pages'''
+    with open(path + '/nr_hugepages') as f:
+        return int(f.read())
+    return 0
+
+
+def show_numa_pages():
+    print('Node Pages Size')
+    for n in glob.glob('/sys/devices/system/node/node*'):
+        path = n + '/hugepages'
+        node = n[29:]  # slice after /sys/devices/system/node/node
+        for d in os.listdir(path):
+            sz = int(d[10:-2])  # slice out of hugepages-NNNkB
+            nr_pages = get_hugepages(path + '/' + d)
+            if nr_pages > 0:
+                pg_sz = fmt_memsize(sz)
+                print('{:<4} {:<5} {}'.format(node, nr_pages, pg_sz))
+
+
+def show_non_numa_pages():
+    print('Pages Size')
+    path = '/sys/kernel/mm/hugepages'
+    for d in os.listdir(path):
+        sz = int(d[10:-2])
+        nr_pages = get_hugepages(path + '/' + d)
+        if nr_pages > 0:
+            pg_sz = fmt_memsize(sz)
+            print('{:<5} {}'.format(nr_pages, pg_sz))
+
+
+def show_pages():
+    '''Show existing huge page settings'''
+    if is_numa():
+        show_numa_pages()
+    else:
+        show_non_numa_pages()
+
+
+def clear_numa_pages():
+    for path in glob.glob(
+            '/sys/devices/system/node/node*/hugepages/hugepages-*'):
+        with open(path + '/nr_hugepages', 'w') as f:
+            f.write('\n0')
+
+
+def clear_non_numa_pages():
+    for path in glob.glob('/sys/kernel/mm/hugepages/hugepages-*'):
+        with open(path + '/nr_hugepages', 'w') as f:
+            f.write('0\n')
+
+
+def clear_pages():
+    '''Clear all existing huge page mappings'''
+    if is_numa():
+        clear_numa_pages()
+    else:
+        clear_non_numa_pages()
+
+
+def default_size():
+    '''Get default huge page size from /proc/meminfo'''
+    with open('/proc/meminfo') as f:
+        for line in f:
+            if line.startswith('Hugepagesize:'):
+                return int(line.split()[1])
+    return None
+
+
+def set_numa_pages(nr_pages, hugepgsz):
+    if numa_node:
+        nodes = ['/sys/devices/system/node/node{}/hugepages'.format(numa_node)]
+    else:
+        nodes = glob.glob('/sys/devices/system/node/node*/hugepages')
+
+    for n in nodes:
+        path = '{}/hugepages-{}kB/nr_hugepages'.format(n, hugepgsz)
+        if not exists(path):
+            sys.exit(
+                '{}Kb is not a valid system huge page size'.format(hugepgsz))
+        with open(path, 'w') as f:
+            f.write('{}\n'.format(nr_pages))
+
+
+def set_non_numa_pages(nr_pages, hugepgsz):
+    path = '/sys/kernel/mm/hugepages/hugepages-{}kB/nr_hugepages'.format(
+        hugepgsz)
+    if not exists(path):
+        sys.exit('{}Kb is not a valid system huge page size'.format(hugepgsz))
+
+    with open(path, 'w') as f:
+        f.write('{}\n'.format(nr_pages))
+
+
+def set_pages(pages, hugepgsz):
+    '''Sets the number of huge pages to be reserved'''
+    if is_numa():
+        set_numa_pages(pages, hugepgsz)
+    else:
+        set_non_numa_pages(pages, hugepgsz)
+
+
+def mount_huge(pagesize):
+    cmd = "mount -t hugetlbfs"
+    if pagesize:
+        cmd += ' -o pagesize={}'.format(pagesize)
+    cmd += ' nodev {}'.format(HUGEDIR)
+    os.system(cmd)
+
+
+def show_mount():
+    mounted = None
+    with open('/proc/mounts') as f:
+        for line in f:
+            fields = line.split()
+            if fields[2] != 'hugetlbfs':
+                continue
+            if not mounted:
+                print("Hugepages mounted on:", end=" ")
+                mounted = True
+            print(fields[1], end=" ")
+    if mounted:
+        print()
+    else:
+        print("Hugepages not mounted")
+
+
+def parse_args():
+    '''Parses the command-line arguments given by the user and takes the
+    appropriate action for each'''
+    global clear_flag
+    global hugepagesize_kb
+    global mount_flag
+    global numa_node
+    global reserve_kb
+    global show_flag
+    global unmount_flag
+
+    if len(sys.argv) <= 1:
+        usage()
+        sys.exit(0)
+
+    try:
+        opts, args = getopt.getopt(sys.argv[1:], "r:p:csmun:", [
+            "help", "usage", "show", "clear", "setup=", "reserve=",
+            "pagesize=", "node=", "mount", "unmount"
+        ])
+    except getopt.GetoptError as error:
+        print(str(error))
+        print("Run '%s --usage' for further information" % sys.argv[0])
+        sys.exit(1)
+
+    for opt, arg in opts:
+        if opt in ('--help', '--usage'):
+            usage()
+            sys.exit(0)
+        elif opt == '--setup':
+            clear_flag = True
+            unmount_flag = True
+            reserve_kb = get_memsize(arg)
+            mount_flag = True
+        elif opt in ('--show', '-s'):
+            show_flag = True
+        elif opt in ('--clear', '-c'):
+            clear_flag = True
+        elif opt in ('--reserve', '-r'):
+            reserve_kb = get_memsize(arg)
+        elif opt in ('--pagesize', '-p'):
+            hugepagesize_kb = get_memsize(arg)
+        elif opt in ('--unmount', '-u'):
+            unmount_flag = True
+        elif opt in ('--mount', '-m'):
+            mount_flag = True
+        elif opt in ('--node', '-n'):
+            if not arg.isdigit():
+                sys.exit('Numeric value for numa node expected')
+            numa_node = arg
+
+
+def do_arg_actions():
+    '''do the actual action requested by the user'''
+    global hugepagesize_kb
+
+    if clear_flag:
+        clear_pages()
+    if unmount_flag:
+        os.system("umount " + HUGEDIR)
+    if reserve_kb:
+        if hugepagesize_kb is None:
+            hugepagesize_kb = default_size()
+        if reserve_kb % hugepagesize_kb != 0:
+            sys.exit('{} is not a multiple of page size {}'.format(
+                reserve_kb, hugepagesize_kb))
+        nr_pages = int(reserve_kb / hugepagesize_kb)
+        set_pages(nr_pages, hugepagesize_kb)
+    if mount_flag:
+        mount_huge(hugepagesize_kb * 1024)
+    if show_flag:
+        show_pages()
+        print()
+        show_mount()
+
+
+def main():
+    parse_args()
+    do_arg_actions()
+
+
+if __name__ == "__main__":
+    main()
--
2.27.0
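One detail in the v3 listing above: `do_arg_actions` fills in the default page size only inside the `reserve_kb` branch, yet the mount step computes `hugepagesize_kb * 1024`, so running with `-m` alone would try to multiply `None`. A possible way to avoid that, sketched here with hypothetical helper names (`parse_default_hugepage_kb`, `resolve_pagesize_kb`), is to resolve the page size once before any action runs.

```python
def parse_default_hugepage_kb(lines):
    '''Return the Hugepagesize value in kB from /proc/meminfo-style lines.'''
    for line in lines:
        if line.startswith('Hugepagesize:'):
            return int(line.split()[1])
    return None


def resolve_pagesize_kb(requested_kb, meminfo_path='/proc/meminfo'):
    '''Use the requested size if given, else fall back to the system default.'''
    if requested_kb is not None:
        return requested_kb
    with open(meminfo_path) as meminfo:
        return parse_default_hugepage_kb(meminfo)
```

Splitting the parsing from the file access keeps the first helper easy to exercise with a plain list of strings.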
On 9/4/2020 7:35 PM, Stephen Hemminger wrote:
> This is an improved version of the setup of huge pages
> bases on earlier DPDK setup. Differences are:
> * it autodetects NUMA vs non NUMA
> * it allows setting different page sizes
> recent kernels support multiple sizes.
> * it accepts a parameter in bytes (not pages).
>
> If necessary the steps of clearing old settings and mounting/umounting
> can be done individually.

Very handy, thanks.

>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> v3 -- incorporate review feedback
> add missing SPDX and env header
> overengineer the memory prefix string code
> add numa node argument
> fix some pylint warnings
>
> v2 -- convert to python3
>

<...>

> +def is_numa():
> +    '''Test if NUMA is necessary on this system'''
> +    return exists('/sys/devices/numa/node')

-    return exists('/sys/devices/numa/node')
+    return exists('/sys/devices/system/node')

<...>

> +def clear_numa_pages():
> +    for path in glob.glob(
> +            '/sys/devices/system/node/node*/hugepages/hugepages-*'):
> +        with open(path + '/nr_hugepages', 'w') as f:
> +            f.write('\n0')

-            f.write('\n0')
+            f.write('0\n')

<...>

> +def mount_huge(pagesize):
> +    cmd = "mount -t hugetlbfs"
> +    if pagesize:
> +        cmd += ' -o pagesize={}'.format(pagesize)
> +    cmd += ' nodev {}'.format(HUGEDIR)
> +    os.system(cmd)

What do you think about checking if the mount point exists before 'cmd'?

+    if not exists(HUGEDIR):
+        os.system('mkdir -p ' + HUGEDIR)

<...>

> +def do_arg_actions():
> +    '''do the actual action requested by the user'''
> +    global hugepagesize_kb
> +
> +    if clear_flag:
> +        clear_pages()
> +    if unmount_flag:
> +        os.system("umount " + HUGEDIR)

What do you think about umounting only if it is mounted (to get rid of the
warning for --setup after -u), something like:

+        if not os.system("mount | grep -q " + HUGEDIR):
+            os.system("umount " + HUGEDIR)
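The "only umount if mounted" check suggested in this review can also be done without spawning `mount | grep`, by reading `/proc/mounts` directly. The sketch below is one possible shape, under the assumption that the third field of each line is the filesystem type; `parse_hugetlbfs_mounts`, `is_mounted` and `ensure_mountpoint` are hypothetical names, not functions from the patch.

```python
import os


def parse_hugetlbfs_mounts(lines):
    '''Return mount points whose filesystem type is hugetlbfs.'''
    mounted = []
    for line in lines:
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] == 'hugetlbfs':
            mounted.append(fields[1])
    return mounted


def is_mounted(mountpoint, mounts_file='/proc/mounts'):
    '''True when mountpoint currently has a hugetlbfs mounted on it.'''
    with open(mounts_file) as mounts:
        return mountpoint in parse_hugetlbfs_mounts(mounts)


def ensure_mountpoint(path):
    '''Create the mount directory if missing (equivalent to mkdir -p).'''
    os.makedirs(path, exist_ok=True)
```

A caller would then run `umount` only when `is_mounted(HUGEDIR)` is true, which avoids the warning on an already unmounted directory.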
On Sat, 5 Sep 2020 00:13:06 +0100
Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> What do you think about checking if the mount point exists before 'cmd'?
>
> + if not exists(HUGEDIR):
> + os.system('mkdir -p ' + HUGEDIR)
Not necessary with standard systemd
This is an improved version of the setup of huge pages
bases on earlier DPDK setup. Differences are:
* it autodetects NUMA vs non NUMA
* it allows setting different page sizes
recent kernels support multiple sizes.
* it accepts a parameter in bytes (not pages).

If necessary the steps of clearing old settings and mounting/umounting
can be done individually.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v4 -- more review feedback
use argparser rather than getopt (thanks Bruce)
silently handle already mounted type errors
handle exceptions for permission and file not found
fix numa bugs
code now passes pylint

v3 -- incorporate review feedback
add missing SPDX and env header
overengineer the memory prefix string code
add numa node argument
fix some pylint warnings

v2 -- convert to python3

usertools/hugepage_setup.py | 272 ++++++++++++++++++++++++++++++++++++
1 file changed, 272 insertions(+)
create mode 100755 usertools/hugepage_setup.py

diff --git a/usertools/hugepage_setup.py b/usertools/hugepage_setup.py
new file mode 100755
index 000000000000..3091dbe5d4c2
--- /dev/null
+++ b/usertools/hugepage_setup.py
@@ -0,0 +1,272 @@
+#! /usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2020 Microsoft Corporation
+"""Script to query and setup huge pages for DPDK applications."""
+
+import argparse
+import glob
+import os
+import re
+import sys
+from math import log2
+
+# Standard binary prefix
+BINARY_PREFIX = "KMG"
+
+# systemd mount point for huge pages
+HUGE_MOUNT = "/dev/hugepages"
+
+
+def fmt_memsize(sz_k):
+    '''Format memory size in kB into conventional format'''
+    if sz_k.isdigit():
+        return int(sz_k) / 1024
+    logk = int(log2(sz_k) / 10)
+    return '{}{}b'.format(int(sz_k / (2**(logk * 10))), BINARY_PREFIX[logk])
+
+
+def get_memsize(arg):
+    '''Convert memory size with suffix to kB'''
+    match = re.match(r'(\d+)([' + BINARY_PREFIX + r']?)$', arg.upper())
+    if match is None:
+        sys.exit('{} is not a valid page size'.format(arg))
+    num = float(match.group(1))
+    suffix = match.group(2)
+    if suffix == "":
+        return int(num / 1024)
+    idx = BINARY_PREFIX.find(suffix)
+    return int(num * (2**(idx * 10)))
+
+
+def is_numa():
+    '''Test if NUMA is necessary on this system'''
+    return os.path.exists('/sys/devices/system/node')
+
+
+def get_hugepages(path):
+    '''Read number of reserved pages'''
+    with open(path + '/nr_hugepages') as nr_hugpages:
+        return int(nr_hugpages.read())
+    return 0
+
+
+def set_hugepages(path, pages):
+    '''Write the number of reserved huge pages'''
+    filename = path + '/nr_hugepages'
+    try:
+        with open(filename, 'w') as nr_hugpages:
+            nr_hugpages.write('{}\n'.format(pages))
+    except PermissionError:
+        sys.exit('Permission denied: need to be root!')
+    except FileNotFoundError:
+        filename = os.path.basename(path)
+        size = filename[10:]
+        sys.exit('{} is not a valid system huge page size'.format(size))
+
+
+def show_numa_pages():
+    '''Show huge page reservations on Numa system'''
+    print('Node Pages Size')
+    for numa_path in glob.glob('/sys/devices/system/node/node*'):
+        node = numa_path[29:]  # slice after /sys/devices/system/node/node
+        path = numa_path + '/hugepages'
+        for hdir in os.listdir(path):
+            pages = get_hugepages(path + '/' + hdir)
+            if pages > 0:
+                pg_sz = fmt_memsize(
+                    hdir[10:-2])  # slice out of hugepages-NNNkB
+                print('{:<4} {:<5} {}'.format(node, pages, pg_sz))
+
+
+def show_non_numa_pages():
+    '''Show huge page reservations on non Numa system'''
+    print('Pages Size')
+    path = '/sys/kernel/mm/hugepages'
+    for hdir in os.listdir(path):
+        pages = get_hugepages(path + '/' + hdir)
+        if pages > 0:
+            pg_sz = fmt_memsize(int(hdir[10:-2]))
+            print('{:<5} {}'.format(pages, pg_sz))
+
+
+def show_pages():
+    '''Show existing huge page settings'''
+    if is_numa():
+        show_numa_pages()
+    else:
+        show_non_numa_pages()
+
+
+def clear_pages():
+    '''Clear all existing huge page mappings'''
+    if is_numa():
+        dirs = glob.glob(
+            '/sys/devices/system/node/node*/hugepages/hugepages-*')
+    else:
+        dirs = glob.glob('/sys/kernel/mm/hugepages/hugepages-*')
+
+    for path in dirs:
+        set_hugepages(path, 0)
+
+
+def default_pagesize():
+    '''Get default huge page size from /proc/meminfo'''
+    with open('/proc/meminfo') as meminfo:
+        for line in meminfo:
+            if line.startswith('Hugepagesize:'):
+                return int(line.split()[1])
+    return None
+
+
+def set_numa_pages(pages, hugepgsz, node=None):
+    '''Set huge page reservation on Numa system'''
+    if node:
+        nodes = ['/sys/devices/system/node/node{}/hugepages'.format(node)]
+    else:
+        nodes = glob.glob('/sys/devices/system/node/node*/hugepages')
+
+    for node_path in nodes:
+        huge_path = '{}/hugepages-{}kB'.format(node_path, hugepgsz)
+        set_hugepages(huge_path, pages)
+
+
+def set_non_numa_pages(pages, hugepgsz):
+    '''Set huge page reservation on non Numa system'''
+    path = '/sys/kernel/mm/hugepages/hugepages-{}kB'.format(hugepgsz)
+    set_hugepages(path, pages)
+
+
+def reserve_pages(pages, hugepgsz, node=None):
+    '''Sets the number of huge pages to be reserved'''
+    if node or is_numa():
+        set_numa_pages(pages, hugepgsz, node=node)
+    else:
+        set_non_numa_pages(pages, hugepgsz)
+
+
+def get_mountpoints():
+    '''get list of of where hugepage filesystem is mounted'''
+    mounted = []
+    with open('/proc/mounts') as mounts:
+        for line in mounts:
+            fields = line.split()
+            if fields[2] != 'hugetlbfs':
+                continue
+            mounted.append(fields[1])
+    return mounted
+
+
+def mount_huge(pagesize, mountpoint):
+    '''mount the huge tlb file system'''
+    if mountpoint in get_mountpoints():
+        print(mountpoint, "already mounted")
+        return
+    cmd = "mount -t hugetlbfs"
+    if pagesize:
+        cmd += ' -o pagesize={}'.format(pagesize * 1024)
+    cmd += ' nodev ' + mountpoint
+    os.system(cmd)
+
+
+def umount_huge(mountpoint):
+    '''unmount the huge tlb file system (if mounted)'''
+    if mountpoint in get_mountpoints():
+        os.system("umount " + mountpoint)
+
+
+def show_mount():
+    '''Show where huge page filesystem is mounted'''
+    mounted = get_mountpoints()
+    if mounted:
+        print("Hugepages mounted on", *mounted)
+    else:
+        print("Hugepages not mounted")
+
+
+def main():
+    '''Process the command line arguments and setup huge pages'''
+    argv0 = os.path.basename(sys.argv[0])
+    parser = argparse.ArgumentParser(
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        description="Setup huge pages",
+        epilog="""
+Examples:
+
+To display current huge page settings:
+    {argv0} -s
+
+To a complete setup of with 2 Gigabyte of 1G huge pages:
+    {argv0} -p 1G --setup 2G
+""".format(argv0=argv0))
+    parser.add_argument(
+        '--show',
+        '-s',
+        action='store_true',
+        help="print the current huge page configuration")
+    parser.add_argument(
+        '--clear', '-c', action='store_true', help="clear existing huge pages")
+    parser.add_argument(
+        '--mount',
+        '-m',
+        action='store_true',
+        help='mount the huge page filesystem')
+    parser.add_argument(
+        '--unmount',
+        '-u',
+        action='store_true',
+        help='unmount the system huge page directory')
+    parser.add_argument(
+        '--node',
+        '-n',
+        action='store',
+        help='select numa node to reserve pages on')
+    parser.add_argument(
+        '--pagesize',
+        '-p',
+        action='store',
+        help='choose huge page size to use')
+    parser.add_argument(
+        '--reserve',
+        '-r',
+        action='store',
+        help='reserve huge pages. Size is in bytes with K, M, or G suffix')
+    parser.add_argument(
+        '--setup',
+        action='store',
+        help='setup huge pages by doing clear, unmount, reserve and mount')
+    args = parser.parse_args()
+
+    if args.setup:
+        args.clear = True
+        args.unmount = True
+        args.reserve = args.setup
+        args.mount = True
+
+    if args.pagesize:
+        pagesize_kb = get_memsize(args.pagesize)
+    else:
+        pagesize_kb = default_pagesize()
+
+    if args.clear:
+        clear_pages()
+    if args.unmount:
+        umount_huge(HUGE_MOUNT)
+
+    if args.reserve:
+        reserve_kb = get_memsize(args.reserve)
+        if reserve_kb % pagesize_kb != 0:
+            sys.exit(
+                'Huge reservation {}kB is not a multiple of page size {}kB'.
+                format(reserve_kb, pagesize_kb))
+        reserve_pages(
+            int(reserve_kb / pagesize_kb), pagesize_kb, node=args.node)
+    if args.mount:
+        mount_huge(pagesize_kb, HUGE_MOUNT)
+    if args.show:
+        show_pages()
+        print()
+        show_mount()
+
+
+if __name__ == "__main__":
+    main()
--
2.27.0
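The v4 listing above keeps `--pagesize` and `--reserve` as strings and converts them afterwards with `get_memsize`. One possible refinement, not part of the patch, is to let argparse do the conversion through its `type=` hook so that bad input is reported as a normal usage error. In this sketch `size_kb` and `build_parser` are hypothetical names, and the suffix rules (K, M, G, or plain bytes) are assumed to match the patch's `get_memsize`.

```python
import argparse
import re

BINARY_PREFIX = 'KMG'


def size_kb(text):
    '''argparse type: parse "2G", "512M", "2048K" or plain bytes into kB.'''
    match = re.match(r'(\d+)([{}]?)$'.format(BINARY_PREFIX), text.upper())
    if match is None:
        raise argparse.ArgumentTypeError(
            '{!r} is not a valid size'.format(text))
    num = int(match.group(1))
    suffix = match.group(2)
    if suffix == '':
        return num // 1024  # no suffix: the value is in bytes
    # K -> shift by 0, M -> shift by 10, G -> shift by 20 (result in kB)
    return num << (BINARY_PREFIX.index(suffix) * 10)


def build_parser():
    '''Build a reduced parser covering only the size and node options.'''
    parser = argparse.ArgumentParser(description='Setup huge pages')
    parser.add_argument('--pagesize', '-p', type=size_kb,
                        help='huge page size to use')
    parser.add_argument('--reserve', '-r', type=size_kb,
                        help='amount of memory to reserve')
    parser.add_argument('--node', '-n', type=int,
                        help='numa node to reserve pages on')
    return parser
```

One caveat with `type=int` for `--node`: node 0 is falsy, so code consuming it would need to test `node is not None` rather than `if node:`.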
This is an improved version of the setup of huge pages
bases on earlier DPDK setup. Differences are:
* it autodetects NUMA vs non NUMA
* it allows setting different page sizes
recent kernels support multiple sizes.
* it accepts a parameter in bytes (not pages).

If necessary the steps of clearing old settings and mounting/umounting
can be done individually.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v5 -- cleanup help messages
add documentation and install script

v4 -- more review feedback
use argparser rather than getopt (thanks Bruce)
silently handle already mounted type errors
handle exceptions for permission and file not found
fix numa bugs
code now passes pylint

v3 -- incorporate review feedback
add missing SPDX and env header
overengineer the memory prefix string code
add numa node argument
fix some pylint warnings

doc/guides/tools/hugepagesetup.rst | 79 +++++++++
doc/guides/tools/index.rst | 1 +
usertools/hugepage_setup.py | 271 +++++++++++++++++++++++++++++
usertools/meson.build | 7 +-
4 files changed, 357 insertions(+), 1 deletion(-)
create mode 100644 doc/guides/tools/hugepagesetup.rst
create mode 100755 usertools/hugepage_setup.py

diff --git a/doc/guides/tools/hugepagesetup.rst b/doc/guides/tools/hugepagesetup.rst
new file mode 100644
index 000000000000..b1a5a8993a25
--- /dev/null
+++ b/doc/guides/tools/hugepagesetup.rst
@@ -0,0 +1,79 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2020 Microsoft Corporation
+
+hugepage_setup Application
+==========================
+
+The ``hugepage_setup`` tool is a Data Plane Development Kit (DPDK) utility
+that helps in reserving hugepages.
+As well as checking for current settings.
+
+
+Running the Application
+-----------------------
+
+The tool has a number of command line options:
+
+.. code-block:: console
+
+
+   hugepage_setup [options]
+
+
+OPTIONS
+-------
+
+* ``-h, --help``
+
+    Display usage information and quit
+
+* ``-s, --show``
+
+    Print the current huge page configuration
+
+* ``-c driver, --clear``
+
+    Clear existing huge page reservation
+
+* ``-m, --mount``
+
+    Mount the huge page filesystem
+
+* ``-u, --unmount``
+
+    Unmount the huge page filesystem
+
+* ``-n NODE, --node=NODE``
+
+    Set NUMA node to reserve pages on
+
+* ``-p SIZE, --pagesize=SIZE``
+
+    Select hugepage size to use.
+    If not specified the default system huge page size is used.
+
+* ``-r SIZE, --reserve=SIZE``
+
+    Reserve huge pages.
+    Size is in bytes with K, M or G suffix.
+
+* ``--setup SIZE``
+
+    Short cut to clear, unmount, reserve and mount.
+
+.. warning::
+
+    While any user can run the ``hugepage_setup`` script to view the
+    status of huge pages, modifying the setup requires root privileges.
+
+
+Examples
+--------
+
+To display current huge page settings::
+
+   hugepage_setup -s
+
+To a complete setup of with 2 Gigabyte of 1G huge pages::
+
+   hugepage_setup -p 1G --setup 2G
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index c721943606f9..25c953ab01d4 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -11,6 +11,7 @@ DPDK Tools User Guides
     proc_info
     pdump
     pmdinfo
+    hugepagesetup
     devbind
     flow-perf
     testbbdev
diff --git a/usertools/hugepage_setup.py b/usertools/hugepage_setup.py
new file mode 100755
index 000000000000..bec010883e63
--- /dev/null
+++ b/usertools/hugepage_setup.py
@@ -0,0 +1,271 @@
+#! /usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2020 Microsoft Corporation
+"""Script to query and setup huge pages for DPDK applications."""
+
+import argparse
+import glob
+import os
+import re
+import sys
+from math import log2
+
+# Standard binary prefix
+BINARY_PREFIX = "KMG"
+
+# systemd mount point for huge pages
+HUGE_MOUNT = "/dev/hugepages"
+
+
+def fmt_memsize(sz_k):
+    '''Format memory size in kB into conventional format'''
+    if sz_k.isdigit():
+        return int(sz_k) / 1024
+    logk = int(log2(sz_k) / 10)
+    return '{}{}b'.format(int(sz_k / (2**(logk * 10))), BINARY_PREFIX[logk])
+
+
+def get_memsize(arg):
+    '''Convert memory size with suffix to kB'''
+    match = re.match(r'(\d+)([' + BINARY_PREFIX + r']?)$', arg.upper())
+    if match is None:
+        sys.exit('{} is not a valid page size'.format(arg))
+    num = float(match.group(1))
+    suffix = match.group(2)
+    if suffix == "":
+        return int(num / 1024)
+    idx = BINARY_PREFIX.find(suffix)
+    return int(num * (2**(idx * 10)))
+
+
+def is_numa():
+    '''Test if NUMA is necessary on this system'''
+    return os.path.exists('/sys/devices/system/node')
+
+
+def get_hugepages(path):
+    '''Read number of reserved pages'''
+    with open(path + '/nr_hugepages') as nr_hugpages:
+        return int(nr_hugpages.read())
+    return 0
+
+
+def set_hugepages(path, pages):
+    '''Write the number of reserved huge pages'''
+    filename = path + '/nr_hugepages'
+    try:
+        with open(filename, 'w') as nr_hugpages:
+            nr_hugpages.write('{}\n'.format(pages))
+    except PermissionError:
+        sys.exit('Permission denied: need to be root!')
+    except FileNotFoundError:
+        filename = os.path.basename(path)
+        size = filename[10:]
+        sys.exit('{} is not a valid system huge page size'.format(size))
+
+
+def show_numa_pages():
+    '''Show huge page reservations on Numa system'''
+    print('Node Pages Size')
+    for numa_path in glob.glob('/sys/devices/system/node/node*'):
+        node = numa_path[29:]  # slice after /sys/devices/system/node/node
+        path = numa_path + '/hugepages'
+        for hdir in os.listdir(path):
+            pages = get_hugepages(path + '/' + hdir)
+            if pages > 0:
+                pg_sz = fmt_memsize(
+                    hdir[10:-2])  # slice out of hugepages-NNNkB
+                print('{:<4} {:<5} {}'.format(node, pages, pg_sz))
+
+
+def show_non_numa_pages():
+    '''Show huge page reservations on non Numa system'''
+    print('Pages Size')
+    path = '/sys/kernel/mm/hugepages'
+    for hdir in os.listdir(path):
+        pages = get_hugepages(path + '/' + hdir)
+        if pages > 0:
+            pg_sz = fmt_memsize(int(hdir[10:-2]))
+            print('{:<5} {}'.format(pages, pg_sz))
+
+
+def show_pages():
+    '''Show existing huge page settings'''
+    if is_numa():
+        show_numa_pages()
+    else:
+        show_non_numa_pages()
+
+
+def clear_pages():
+    '''Clear all existing huge page mappings'''
+    if is_numa():
+        dirs = glob.glob(
+            '/sys/devices/system/node/node*/hugepages/hugepages-*')
+    else:
+        dirs = glob.glob('/sys/kernel/mm/hugepages/hugepages-*')
+
+    for path in dirs:
+        set_hugepages(path, 0)
+
+
+def default_pagesize():
+    '''Get default huge page size from /proc/meminfo'''
+    with open('/proc/meminfo') as meminfo:
+        for line in meminfo:
+            if line.startswith('Hugepagesize:'):
+                return int(line.split()[1])
+    return None
+
+
+def set_numa_pages(pages, hugepgsz, node=None):
+    '''Set huge page reservation on Numa system'''
+    if node:
+        nodes = ['/sys/devices/system/node/node{}/hugepages'.format(node)]
+    else:
+        nodes = glob.glob('/sys/devices/system/node/node*/hugepages')
+
+    for node_path in nodes:
+        huge_path = '{}/hugepages-{}kB'.format(node_path, hugepgsz)
+        set_hugepages(huge_path, pages)
+
+
+def set_non_numa_pages(pages, hugepgsz):
+    '''Set huge page reservation on non Numa system'''
+    path = '/sys/kernel/mm/hugepages/hugepages-{}kB'.format(hugepgsz)
+    set_hugepages(path, pages)
+
+
+def reserve_pages(pages, hugepgsz, node=None):
+    '''Sets the number of huge pages to be reserved'''
+    if node or is_numa():
+        set_numa_pages(pages, hugepgsz, node=node)
+    else:
+        set_non_numa_pages(pages, hugepgsz)
+
+
+def
get_mountpoints(): + '''get list of of where hugepage filesystem is mounted''' + mounted = [] + with open('/proc/mounts') as mounts: + for line in mounts: + fields = line.split() + if fields[2] != 'hugetlbfs': + continue + mounted.append(fields[1]) + return mounted + + +def mount_huge(pagesize, mountpoint): + '''mount the huge tlb file system''' + if mountpoint in get_mountpoints(): + print(mountpoint, "already mounted") + return + cmd = "mount -t hugetlbfs" + if pagesize: + cmd += ' -o pagesize={}'.format(pagesize * 1024) + cmd += ' nodev ' + mountpoint + os.system(cmd) + + +def umount_huge(mountpoint): + '''unmount the huge tlb file system (if mounted)''' + if mountpoint in get_mountpoints(): + os.system("umount " + mountpoint) + + +def show_mount(): + '''Show where huge page filesystem is mounted''' + mounted = get_mountpoints() + if mounted: + print("Hugepages mounted on", *mounted) + else: + print("Hugepages not mounted") + + +def main(): + '''Process the command line arguments and setup huge pages''' + argv0 = os.path.basename(sys.argv[0]) + parser = argparse.ArgumentParser( + formatter_class=argparse.RawDescriptionHelpFormatter, + description="Setup huge pages", + epilog=""" +Examples: + +To display current huge page settings: + %(prog)s -s + +To a complete setup of with 2 Gigabyte of 1G huge pages: + %(prog)s -p 1G --setup 2G +""") + parser.add_argument( + '--show', + '-s', + action='store_true', + help="print the current huge page configuration") + parser.add_argument( + '--clear', '-c', action='store_true', help="clear existing huge pages") + parser.add_argument( + '--mount', + '-m', + action='store_true', + help='mount the huge page filesystem') + parser.add_argument( + '--unmount', + '-u', + action='store_true', + help='unmount the system huge page directory') + parser.add_argument( + '--node', + '-n', + help='select numa node to reserve pages on') + parser.add_argument( + '--pagesize', + '-p', + metavar='SIZE', + help='choose huge page size to use') + 
parser.add_argument( + '--reserve', + '-r', + metavar='SIZE', + help='reserve huge pages. Size is in bytes with K, M, or G suffix') + parser.add_argument( + '--setup', + metavar='SIZE', + help='setup huge pages by doing clear, unmount, reserve and mount') + args = parser.parse_args() + + if args.setup: + args.clear = True + args.unmount = True + args.reserve = args.setup + args.mount = True + + if args.pagesize: + pagesize_kb = get_memsize(args.pagesize) + else: + pagesize_kb = default_pagesize() + + if args.clear: + clear_pages() + if args.unmount: + umount_huge(HUGE_MOUNT) + + if args.reserve: + reserve_kb = get_memsize(args.reserve) + if reserve_kb % pagesize_kb != 0: + sys.exit( + 'Huge reservation {}kB is not a multiple of page size {}kB'. + format(reserve_kb, pagesize_kb)) + reserve_pages( + int(reserve_kb / pagesize_kb), pagesize_kb, node=args.node) + if args.mount: + mount_huge(pagesize_kb, HUGE_MOUNT) + if args.show: + show_pages() + print() + show_mount() + + +if __name__ == "__main__": + main() diff --git a/usertools/meson.build b/usertools/meson.build index 64e27238f45b..2bf2aeace5b5 100644 --- a/usertools/meson.build +++ b/usertools/meson.build @@ -1,4 +1,9 @@ # SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2017 Intel Corporation -install_data(['dpdk-devbind.py', 'dpdk-pmdinfo.py', 'dpdk-telemetry.py'], install_dir: 'bin') +install_data([ + 'dpdk-devbind.py', + 'dpdk-pmdinfo.py', + 'dpdk-telemetry.py', + 'hugepage_setup.py' +],install_dir: 'bin') -- 2.27.0
On 9/6/2020 4:42 AM, Stephen Hemminger wrote:
> This is an improved version of the setup of huge pages
> bases on earlier DPDK setup. Differences are:
> * it autodetects NUMA vs non NUMA
> * it allows setting different page sizes
> recent kernels support multiple sizes.
> * it accepts a parameter in bytes (not pages).
>
> If necessary the steps of clearing old settings and mounting/umounting
> can be done individually.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

<...>

> @@ -1,4 +1,9 @@
> # SPDX-License-Identifier: BSD-3-Clause
> # Copyright(c) 2017 Intel Corporation
>
> -install_data(['dpdk-devbind.py', 'dpdk-pmdinfo.py', 'dpdk-telemetry.py'], install_dir: 'bin')
> +install_data([
> + 'dpdk-devbind.py',
> + 'dpdk-pmdinfo.py',
> + 'dpdk-telemetry.py',
> + 'hugepage_setup.py'
> +],install_dir: 'bin')
>

Should the script name have a 'dpdk-' prefix, as the others do?
On Mon, Sep 07, 2020 at 09:54:29AM +0100, Ferruh Yigit wrote:
> On 9/6/2020 4:42 AM, Stephen Hemminger wrote:
> > This is an improved version of the setup of huge pages
> > bases on earlier DPDK setup. Differences are:
> > * it autodetects NUMA vs non NUMA
> > * it allows setting different page sizes
> > recent kernels support multiple sizes.
> > * it accepts a parameter in bytes (not pages).
> >
> > If necessary the steps of clearing old settings and mounting/umounting
> > can be done individually.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
> <...>
>
> > @@ -1,4 +1,9 @@
> > # SPDX-License-Identifier: BSD-3-Clause
> > # Copyright(c) 2017 Intel Corporation
> >
> > -install_data(['dpdk-devbind.py', 'dpdk-pmdinfo.py', 'dpdk-telemetry.py'], install_dir: 'bin')
> > +install_data([
> > + 'dpdk-devbind.py',
> > + 'dpdk-pmdinfo.py',
> > + 'dpdk-telemetry.py',
> > + 'hugepage_setup.py'
> > +],install_dir: 'bin')
> >
>
> Should the script name have a 'dpdk-' prefix, as the others do?
+1 to that.
On Mon, 7 Sep 2020 09:58:27 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:
> On Mon, Sep 07, 2020 at 09:54:29AM +0100, Ferruh Yigit wrote:
> > On 9/6/2020 4:42 AM, Stephen Hemminger wrote:
> > > This is an improved version of the setup of huge pages
> > > bases on earlier DPDK setup. Differences are:
> > > * it autodetects NUMA vs non NUMA
> > > * it allows setting different page sizes
> > > recent kernels support multiple sizes.
> > > * it accepts a parameter in bytes (not pages).
> > >
> > > If necessary the steps of clearing old settings and mounting/umounting
> > > can be done individually.
> > >
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >
> > <...>
> >
> > > @@ -1,4 +1,9 @@
> > > # SPDX-License-Identifier: BSD-3-Clause
> > > # Copyright(c) 2017 Intel Corporation
> > >
> > > -install_data(['dpdk-devbind.py', 'dpdk-pmdinfo.py', 'dpdk-telemetry.py'], install_dir: 'bin')
> > > +install_data([
> > > + 'dpdk-devbind.py',
> > > + 'dpdk-pmdinfo.py',
> > > + 'dpdk-telemetry.py',
> > > + 'hugepage_setup.py'
> > > +],install_dir: 'bin')
> > >
> >
> > Should the script name have a 'dpdk-' prefix, as the others do?
>
> +1 to that.
OK, but a '-' in the name violates the Python lint naming convention for modules.
The standard is underscore.
On Mon, Sep 07, 2020 at 10:20:13AM -0700, Stephen Hemminger wrote:
> On Mon, 7 Sep 2020 09:58:27 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
>
> > On Mon, Sep 07, 2020 at 09:54:29AM +0100, Ferruh Yigit wrote:
> > > On 9/6/2020 4:42 AM, Stephen Hemminger wrote:
> > > > This is an improved version of the setup of huge pages
> > > > bases on earlier DPDK setup. Differences are:
> > > > * it autodetects NUMA vs non NUMA
> > > > * it allows setting different page sizes
> > > > recent kernels support multiple sizes.
> > > > * it accepts a parameter in bytes (not pages).
> > > >
> > > > If necessary the steps of clearing old settings and mounting/umounting
> > > > can be done individually.
> > > >
> > > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > >
> > > <...>
> > >
> > > > @@ -1,4 +1,9 @@
> > > > # SPDX-License-Identifier: BSD-3-Clause
> > > > # Copyright(c) 2017 Intel Corporation
> > > >
> > > > -install_data(['dpdk-devbind.py', 'dpdk-pmdinfo.py', 'dpdk-telemetry.py'], install_dir: 'bin')
> > > > +install_data([
> > > > + 'dpdk-devbind.py',
> > > > + 'dpdk-pmdinfo.py',
> > > > + 'dpdk-telemetry.py',
> > > > + 'hugepage_setup.py'
> > > > +],install_dir: 'bin')
> > > >
> > >
> > > Should the script name have a 'dpdk-' prefix, as the others do?
> >
> > +1 to that.
>
> OK, but a '-' in the name violates the Python lint naming convention for modules.
> The standard is underscore.
We don't really need 100% lint cleanliness for all our scripts, 95% is
surely enough. However, if you feel strongly, then I suggest we prefix all
our python scripts with "dpdk_", rather than "dpdk-".
On Tue, 8 Sep 2020 09:18:08 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:
> On Mon, Sep 07, 2020 at 10:20:13AM -0700, Stephen Hemminger wrote:
> > On Mon, 7 Sep 2020 09:58:27 +0100
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >
> > > On Mon, Sep 07, 2020 at 09:54:29AM +0100, Ferruh Yigit wrote:
> > > > On 9/6/2020 4:42 AM, Stephen Hemminger wrote:
> > > > > This is an improved version of the setup of huge pages
> > > > > bases on earlier DPDK setup. Differences are:
> > > > > * it autodetects NUMA vs non NUMA
> > > > > * it allows setting different page sizes
> > > > > recent kernels support multiple sizes.
> > > > > * it accepts a parameter in bytes (not pages).
> > > > >
> > > > > If necessary the steps of clearing old settings and mounting/umounting
> > > > > can be done individually.
> > > > >
> > > > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > >
> > > > <...>
> > > >
> > > > > @@ -1,4 +1,9 @@
> > > > > # SPDX-License-Identifier: BSD-3-Clause
> > > > > # Copyright(c) 2017 Intel Corporation
> > > > >
> > > > > -install_data(['dpdk-devbind.py', 'dpdk-pmdinfo.py', 'dpdk-telemetry.py'], install_dir: 'bin')
> > > > > +install_data([
> > > > > + 'dpdk-devbind.py',
> > > > > + 'dpdk-pmdinfo.py',
> > > > > + 'dpdk-telemetry.py',
> > > > > + 'hugepage_setup.py'
> > > > > +],install_dir: 'bin')
> > > > >
> > > >
> > > > Should the script name have a 'dpdk-' prefix, as the others do?
> > >
> > > +1 to that.
> >
> > OK, but a '-' in the name violates the Python lint naming convention for modules.
> > The standard is underscore.
>
> We don't really need 100% lint cleanliness for all our scripts, 95% is
> surely enough. However, if you feel strongly, then I suggest we prefix all
> our python scripts with "dpdk_", rather than "dpdk-"
Agree.
Just wanted to raise the observation. Maybe add a .pylintrc to usertools to suppress this warning.
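For anyone following the naming point, here is a minimal sketch of why the hyphen trips the linter. It is only an illustration under stated assumptions: `SNAKE_CASE` is a simplified pattern written for this example (pylint's real default may well differ), and `check_script_name` is a hypothetical helper, not part of any DPDK script or of pylint.

```python
import re

# Assumed, simplified snake_case pattern for module names.
# This is NOT pylint's actual regular expression.
SNAKE_CASE = re.compile(r'[a-z_][a-z0-9_]*$')


def check_script_name(filename):
    """Report whether a script's stem could be used as a module name."""
    stem = filename[:-3] if filename.endswith('.py') else filename
    return {
        # A hyphen is not legal in a Python identifier, so a plain
        # "import dpdk-hugepages" statement cannot be written.
        'valid_identifier': stem.isidentifier(),
        'snake_case': SNAKE_CASE.match(stem) is not None,
    }


if __name__ == '__main__':
    for name in ('hugepage_setup.py', 'dpdk-hugepages.py',
                 'dpdk_hugepages.py'):
        print(name, check_script_name(name))
```

Since these scripts are installed as commands rather than imported, the hyphenated name costs nothing at run time; the sketch only shows what the lint rule is objecting to.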
This is an improved version of the setup of huge pages
bases on earlier DPDK setup.

Features:
 * can display current hugepage settings.
 * autodetects NUMA vs non NUMA
 * allows setting different page sizes
   recent kernels support multiple sizes.
 * accepts a parameter in bytes (not pages).

Most users will just use --setup argument but if necessary
the steps of clearing old settings and mounting/umounting
can be done individually.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v6 -- rename to dpdk-hugepages

 doc/guides/tools/hugepages.rst |  79 ++++++++++
 doc/guides/tools/index.rst     |   1 +
 usertools/dpdk-hugepages.py    | 270 +++++++++++++++++++++++++++++++++
 usertools/meson.build          |   7 +-
 4 files changed, 356 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/tools/hugepages.rst
 create mode 100755 usertools/dpdk-hugepages.py

diff --git a/doc/guides/tools/hugepages.rst b/doc/guides/tools/hugepages.rst
new file mode 100644
index 000000000000..a82b71620011
--- /dev/null
+++ b/doc/guides/tools/hugepages.rst
@@ -0,0 +1,79 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2020 Microsoft Corporation
+
+dpdk-hugpages Application
+==========================
+
+The ``dpdk-hugpages`` tool is a Data Plane Development Kit (DPDK) utility
+that helps in reserving hugepages.
+As well as checking for current settings.
+
+
+Running the Application
+-----------------------
+
+The tool has a number of command line options:
+
+.. code-block:: console
+
+
+   dpdk-hugpages [options]
+
+
+OPTIONS
+-------
+
+* ``-h, --help``
+
+    Display usage information and quit
+
+* ``-s, --show``
+
+    Print the current huge page configuration
+
+* ``-c driver, --clear``
+
+    Clear existing huge page reservation
+
+* ``-m, --mount``
+
+    Mount the huge page filesystem
+
+* ``-u, --unmount``
+
+    Unmount the huge page filesystem
+
+* ``-n NODE, --node=NODE``
+
+    Set NUMA node to reserve pages on
+
+* ``-p SIZE, --pagesize=SIZE``
+
+    Select hugepage size to use.
+    If not specified the default system huge page size is used.
+
+* ``-r SIZE, --reserve=SIZE``
+
+    Reserve huge pages.
+    Size is in bytes with K, M or G suffix.
+
+* ``--setup SIZE``
+
+    Short cut to clear, unmount, reserve and mount.
+
+.. warning::
+
+    While any user can run the ``dpdk-hugpages.py`` script to view the
+    status of huge pages, modifying the setup requires root privileges.
+
+
+Examples
+--------
+
+To display current huge page settings::
+
+    dpdk-hugpages.py -s
+
+To a complete setup of with 2 Gigabyte of 1G huge pages::
+
+    dpdk-hugpages.py -p 1G --setup 2G
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index c721943606f9..93dde4148e90 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -11,6 +11,7 @@ DPDK Tools User Guides
     proc_info
     pdump
     pmdinfo
+    hugepages
     devbind
     flow-perf
     testbbdev
diff --git a/usertools/dpdk-hugepages.py b/usertools/dpdk-hugepages.py
new file mode 100755
index 000000000000..b3ce2635d27b
--- /dev/null
+++ b/usertools/dpdk-hugepages.py
@@ -0,0 +1,270 @@
+#! /usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2020 Microsoft Corporation
+"""Script to query and setup huge pages for DPDK applications."""
+
+import argparse
+import glob
+import os
+import re
+import sys
+from math import log2
+
+# Standard binary prefix
+BINARY_PREFIX = "KMG"
+
+# systemd mount point for huge pages
+HUGE_MOUNT = "/dev/hugepages"
+
+
+def fmt_memsize(sz_k):
+    '''Format memory size in kB into conventional format'''
+    if sz_k.isdigit():
+        return int(sz_k) / 1024
+    logk = int(log2(sz_k) / 10)
+    return '{}{}b'.format(int(sz_k / (2**(logk * 10))), BINARY_PREFIX[logk])
+
+
+def get_memsize(arg):
+    '''Convert memory size with suffix to kB'''
+    match = re.match(r'(\d+)([' + BINARY_PREFIX + r']?)$', arg.upper())
+    if match is None:
+        sys.exit('{} is not a valid page size'.format(arg))
+    num = float(match.group(1))
+    suffix = match.group(2)
+    if suffix == "":
+        return int(num / 1024)
+    idx = BINARY_PREFIX.find(suffix)
+    return int(num * (2**(idx * 10)))
+
+
+def is_numa():
+    '''Test if NUMA is necessary on this system'''
+    return os.path.exists('/sys/devices/system/node')
+
+
+def get_hugepages(path):
+    '''Read number of reserved pages'''
+    with open(path + '/nr_hugepages') as nr_hugpages:
+        return int(nr_hugpages.read())
+    return 0
+
+
+def set_hugepages(path, pages):
+    '''Write the number of reserved huge pages'''
+    filename = path + '/nr_hugepages'
+    try:
+        with open(filename, 'w') as nr_hugpages:
+            nr_hugpages.write('{}\n'.format(pages))
+    except PermissionError:
+        sys.exit('Permission denied: need to be root!')
+    except FileNotFoundError:
+        filename = os.path.basename(path)
+        size = filename[10:]
+        sys.exit('{} is not a valid system huge page size'.format(size))
+
+
+def show_numa_pages():
+    '''Show huge page reservations on Numa system'''
+    print('Node Pages Size')
+    for numa_path in glob.glob('/sys/devices/system/node/node*'):
+        node = numa_path[29:]  # slice after /sys/devices/system/node/node
+        path = numa_path + '/hugepages'
+        for hdir in os.listdir(path):
+            pages = get_hugepages(path + '/' + hdir)
+            if pages > 0:
+                pg_sz = fmt_memsize(
+                    hdir[10:-2])  # slice out of hugepages-NNNkB
+                print('{:<4} {:<5} {}'.format(node, pages, pg_sz))
+
+
+def show_non_numa_pages():
+    '''Show huge page reservations on non Numa system'''
+    print('Pages Size')
+    path = '/sys/kernel/mm/hugepages'
+    for hdir in os.listdir(path):
+        pages = get_hugepages(path + '/' + hdir)
+        if pages > 0:
+            pg_sz = fmt_memsize(int(hdir[10:-2]))
+            print('{:<5} {}'.format(pages, pg_sz))
+
+
+def show_pages():
+    '''Show existing huge page settings'''
+    if is_numa():
+        show_numa_pages()
+    else:
+        show_non_numa_pages()
+
+
+def clear_pages():
+    '''Clear all existing huge page mappings'''
+    if is_numa():
+        dirs = glob.glob(
+            '/sys/devices/system/node/node*/hugepages/hugepages-*')
+    else:
+        dirs = glob.glob('/sys/kernel/mm/hugepages/hugepages-*')
+
+    for path in dirs:
+        set_hugepages(path, 0)
+
+
+def default_pagesize():
+    '''Get default huge page size from /proc/meminfo'''
+    with open('/proc/meminfo') as meminfo:
+        for line in meminfo:
+            if line.startswith('Hugepagesize:'):
+                return int(line.split()[1])
+    return None
+
+
+def set_numa_pages(pages, hugepgsz, node=None):
+    '''Set huge page reservation on Numa system'''
+    if node:
+        nodes = ['/sys/devices/system/node/node{}/hugepages'.format(node)]
+    else:
+        nodes = glob.glob('/sys/devices/system/node/node*/hugepages')
+
+    for node_path in nodes:
+        huge_path = '{}/hugepages-{}kB'.format(node_path, hugepgsz)
+        set_hugepages(huge_path, pages)
+
+
+def set_non_numa_pages(pages, hugepgsz):
+    '''Set huge page reservation on non Numa system'''
+    path = '/sys/kernel/mm/hugepages/hugepages-{}kB'.format(hugepgsz)
+    set_hugepages(path, pages)
+
+
+def reserve_pages(pages, hugepgsz, node=None):
+    '''Sets the number of huge pages to be reserved'''
+    if node or is_numa():
+        set_numa_pages(pages, hugepgsz, node=node)
+    else:
+        set_non_numa_pages(pages, hugepgsz)
+
+
+def get_mountpoints():
+    '''get list of of where hugepage filesystem is mounted'''
+    mounted = []
+    with open('/proc/mounts') as mounts:
+        for line in mounts:
+            fields = line.split()
+            if fields[2] != 'hugetlbfs':
+                continue
+            mounted.append(fields[1])
+    return mounted
+
+
+def mount_huge(pagesize, mountpoint):
+    '''mount the huge tlb file system'''
+    if mountpoint in get_mountpoints():
+        print(mountpoint, "already mounted")
+        return
+    cmd = "mount -t hugetlbfs"
+    if pagesize:
+        cmd += ' -o pagesize={}'.format(pagesize * 1024)
+    cmd += ' nodev ' + mountpoint
+    os.system(cmd)
+
+
+def umount_huge(mountpoint):
+    '''unmount the huge tlb file system (if mounted)'''
+    if mountpoint in get_mountpoints():
+        os.system("umount " + mountpoint)
+
+
+def show_mount():
+    '''Show where huge page filesystem is mounted'''
+    mounted = get_mountpoints()
+    if mounted:
+        print("Hugepages mounted on", *mounted)
+    else:
+        print("Hugepages not mounted")
+
+
+def main():
+    '''Process the command line arguments and setup huge pages'''
+    parser = argparse.ArgumentParser(
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        description="Setup huge pages",
+        epilog="""
+Examples:
+
+To display current huge page settings:
+    %(prog)s -s
+
+To a complete setup of with 2 Gigabyte of 1G huge pages:
+    %(prog)s -p 1G --setup 2G
+""")
+    parser.add_argument(
+        '--show',
+        '-s',
+        action='store_true',
+        help="print the current huge page configuration")
+    parser.add_argument(
+        '--clear', '-c', action='store_true', help="clear existing huge pages")
+    parser.add_argument(
+        '--mount',
+        '-m',
+        action='store_true',
+        help='mount the huge page filesystem')
+    parser.add_argument(
+        '--unmount',
+        '-u',
+        action='store_true',
+        help='unmount the system huge page directory')
+    parser.add_argument(
+        '--node',
+        '-n',
+        help='select numa node to reserve pages on')
+    parser.add_argument(
+        '--pagesize',
+        '-p',
+        metavar='SIZE',
+        help='choose huge page size to use')
+    parser.add_argument(
+        '--reserve',
+        '-r',
+        metavar='SIZE',
+        help='reserve huge pages. Size is in bytes with K, M, or G suffix')
+    parser.add_argument(
+        '--setup',
+        metavar='SIZE',
+        help='setup huge pages by doing clear, unmount, reserve and mount')
+    args = parser.parse_args()
+
+    if args.setup:
+        args.clear = True
+        args.unmount = True
+        args.reserve = args.setup
+        args.mount = True
+
+    if args.pagesize:
+        pagesize_kb = get_memsize(args.pagesize)
+    else:
+        pagesize_kb = default_pagesize()
+
+    if args.clear:
+        clear_pages()
+    if args.unmount:
+        umount_huge(HUGE_MOUNT)
+
+    if args.reserve:
+        reserve_kb = get_memsize(args.reserve)
+        if reserve_kb % pagesize_kb != 0:
+            sys.exit(
+                'Huge reservation {}kB is not a multiple of page size {}kB'.
+                format(reserve_kb, pagesize_kb))
+        reserve_pages(
+            int(reserve_kb / pagesize_kb), pagesize_kb, node=args.node)
+    if args.mount:
+        mount_huge(pagesize_kb, HUGE_MOUNT)
+    if args.show:
+        show_pages()
+        print()
+        show_mount()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/usertools/meson.build b/usertools/meson.build
index 64e27238f45b..596eaefb0e23 100644
--- a/usertools/meson.build
+++ b/usertools/meson.build
@@ -1,4 +1,9 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation

-install_data(['dpdk-devbind.py', 'dpdk-pmdinfo.py', 'dpdk-telemetry.py'], install_dir: 'bin')
+install_data([
+    'dpdk-devbind.py',
+    'dpdk-pmdinfo.py',
+    'dpdk-telemetry.py',
+    'dpdk-hugepages.py'
+],install_dir: 'bin')
--
2.27.0
07/09/2020 10:58, Bruce Richardson:
> On Mon, Sep 07, 2020 at 09:54:29AM +0100, Ferruh Yigit wrote:
> > On 9/6/2020 4:42 AM, Stephen Hemminger wrote:
> > > This is an improved version of the setup of huge pages
> > > bases on earlier DPDK setup. Differences are:
> > > * it autodetects NUMA vs non NUMA
> > > * it allows setting different page sizes
> > > recent kernels support multiple sizes.
> > > * it accepts a parameter in bytes (not pages).
> > >
> > > If necessary the steps of clearing old settings and mounting/umounting
> > > can be done individually.
> > >
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >
> > <...>
> >
> > > +install_data([
> > > + 'dpdk-devbind.py',
> > > + 'dpdk-pmdinfo.py',
> > > + 'dpdk-telemetry.py',
> > > + 'hugepage_setup.py'
> > > +],install_dir: 'bin')
> > >
> >
> > Should the script name have a 'dpdk-' prefix, as the others do?
>
> +1 to that.
If the script is going to be installed system-wide,
it should start with dpdk- as namespace protection to avoid conflicts.
On 9/8/2020 4:17 PM, Stephen Hemminger wrote:
> This is an improved version of the setup of huge pages
> bases on earlier DPDK setup.
>
> Features:
> * can display current hugepage settings.
> * autodetects NUMA vs non NUMA
> * allows setting different page sizes
> recent kernels support multiple sizes.
> * accepts a parameter in bytes (not pages).
>
> Most users will just use --setup argument but if necessary
> the steps of clearing old settings and mounting/umounting
> can be done individually.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
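
As a rough illustration of the "accepts a parameter in bytes (not pages)" feature, the sketch below converts size arguments to kB and derives the page count, following the approach of get_memsize() and the later fmt_memsize() in the patch. The helper names `size_to_kb`, `pages_needed` and `kb_to_text` are made up for this example. Two things differ from the posted code and are assumptions of this sketch: errors are raised as ValueError instead of calling sys.exit(), and `kb_to_text` clamps the suffix index so that very large sizes do not index past the "KMG" string.

```python
import re
from math import log2

# Suffix order as in the patch: K, M, G.
BINARY_PREFIX = "KMG"


def size_to_kb(arg):
    """Convert '2G', '512M', '64K' or a plain byte count to kB."""
    match = re.match(r'(\d+)([' + BINARY_PREFIX + r']?)$', arg.upper())
    if match is None:
        raise ValueError('{} is not a valid size'.format(arg))
    num = int(match.group(1))
    suffix = match.group(2)
    if suffix == "":
        return num // 1024          # no suffix: value is in bytes
    # K -> 1024**0 kB, M -> 1024**1 kB, G -> 1024**2 kB
    return num * 1024 ** BINARY_PREFIX.find(suffix)


def pages_needed(reserve, pagesize):
    """Number of huge pages of size `pagesize` that make up `reserve`."""
    reserve_kb = size_to_kb(reserve)
    pagesize_kb = size_to_kb(pagesize)
    if pagesize_kb == 0 or reserve_kb % pagesize_kb != 0:
        raise ValueError(
            'reservation {}kB is not a multiple of page size {}kB'.format(
                reserve_kb, pagesize_kb))
    return reserve_kb // pagesize_kb


def kb_to_text(kb):
    """Format a positive kB count for display, e.g. 2048 -> '2Mb'."""
    # Clamp added for this sketch; the posted code has no such guard.
    logk = min(int(log2(kb) / 10), len(BINARY_PREFIX) - 1)
    return '{}{}b'.format(kb // 2 ** (logk * 10), BINARY_PREFIX[logk])


if __name__ == '__main__':
    pages = pages_needed('2G', '1G')
    print(pages, 'pages, total', kb_to_text(pages * size_to_kb('1G')))
```

With the arguments from the documented example (-p 1G --setup 2G) this works out to two 1G pages; the same 2G request against 2M pages would need 1024 of them.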
This is an improved version of the setup of huge pages
bases on earlier DPDK setup. Differences are:
 * autodetects NUMA vs non NUMA
 * allows setting different page sizes
   recent kernels support multiple sizes.
 * accepts a parameter in bytes (not pages).
 * can display current hugepage settings.

Most users will just use --setup argument but if necessary
the steps of clearing old settings and mounting/umounting
can be done individually.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v7 - fix issues with show and add Total column
     cleanup whitespace in hugepages.rst

 doc/guides/tools/hugepages.rst |  79 ++++++++++
 doc/guides/tools/index.rst     |   1 +
 usertools/dpdk-hugepages.py    | 270 +++++++++++++++++++++++++++++++++
 usertools/meson.build          |   7 +-
 4 files changed, 356 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/tools/hugepages.rst
 create mode 100755 usertools/dpdk-hugepages.py

diff --git a/doc/guides/tools/hugepages.rst b/doc/guides/tools/hugepages.rst
new file mode 100644
index 000000000000..40e5387a682c
--- /dev/null
+++ b/doc/guides/tools/hugepages.rst
@@ -0,0 +1,79 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2020 Microsoft Corporation
+
+dpdk-hugpages Application
+==========================
+
+The ``dpdk-hugpages`` tool is a Data Plane Development Kit (DPDK) utility
+that helps in reserving hugepages.
+As well as checking for current settings.
+
+
+Running the Application
+-----------------------
+
+The tool has a number of command line options:
+
+.. code-block:: console
+
+
+   dpdk-hugpages [options]
+
+
+OPTIONS
+-------
+
+* ``-h, --help``
+
+    Display usage information and quit
+
+* ``-s, --show``
+
+    Print the current huge page configuration
+
+* ``-c driver, --clear``
+
+    Clear existing huge page reservation
+
+* ``-m, --mount``
+
+    Mount the huge page filesystem
+
+* ``-u, --unmount``
+
+    Unmount the huge page filesystem
+
+* ``-n NODE, --node=NODE``
+
+    Set NUMA node to reserve pages on
+
+* ``-p SIZE, --pagesize=SIZE``
+
+    Select hugepage size to use.
+    If not specified the default system huge page size is used.
+
+* ``-r SIZE, --reserve=SIZE``
+
+    Reserve huge pages.
+    Size is in bytes with K, M or G suffix.
+
+* ``--setup SIZE``
+
+    Short cut to clear, unmount, reserve and mount.
+
+.. warning::
+
+    While any user can run the ``dpdk-hugpages.py`` script to view the
+    status of huge pages, modifying the setup requires root privileges.
+
+
+Examples
+--------
+
+To display current huge page settings::
+
+    dpdk-hugpages.py -s
+
+To a complete setup of with 2 Gigabyte of 1G huge pages::
+
+    dpdk-hugpages.py -p 1G --setup 2G
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index c721943606f9..93dde4148e90 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -11,6 +11,7 @@ DPDK Tools User Guides
     proc_info
     pdump
     pmdinfo
+    hugepages
     devbind
     flow-perf
     testbbdev
diff --git a/usertools/dpdk-hugepages.py b/usertools/dpdk-hugepages.py
new file mode 100755
index 000000000000..a78e9b94567a
--- /dev/null
+++ b/usertools/dpdk-hugepages.py
@@ -0,0 +1,270 @@
+#! /usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2020 Microsoft Corporation
+"""Script to query and setup huge pages for DPDK applications."""
+
+import argparse
+import glob
+import os
+import re
+import sys
+from math import log2
+
+# Standard binary prefix
+BINARY_PREFIX = "KMG"
+
+# systemd mount point for huge pages
+HUGE_MOUNT = "/dev/hugepages"
+
+
+def fmt_memsize(kb):
+    '''Format memory size in kB into conventional format'''
+    logk = int(log2(kb) / 10)
+    suffix = BINARY_PREFIX[logk]
+    unit = 2**(logk * 10)
+    return '{}{}b'.format(int(kb / unit), suffix)
+
+
+def get_memsize(arg):
+    '''Convert memory size with suffix to kB'''
+    match = re.match(r'(\d+)([' + BINARY_PREFIX + r']?)$', arg.upper())
+    if match is None:
+        sys.exit('{} is not a valid page size'.format(arg))
+    num = float(match.group(1))
+    suffix = match.group(2)
+    if suffix == "":
+        return int(num / 1024)
+    idx = BINARY_PREFIX.find(suffix)
+    return int(num * (2**(idx * 10)))
+
+
+def is_numa():
+    '''Test if NUMA is necessary on this system'''
+    return os.path.exists('/sys/devices/system/node')
+
+
+def get_hugepages(path):
+    '''Read number of reserved pages'''
+    with open(path + '/nr_hugepages') as nr_hugpages:
+        return int(nr_hugpages.read())
+    return 0
+
+
+def set_hugepages(path, pages):
+    '''Write the number of reserved huge pages'''
+    filename = path + '/nr_hugepages'
+    try:
+        with open(filename, 'w') as nr_hugpages:
+            nr_hugpages.write('{}\n'.format(pages))
+    except PermissionError:
+        sys.exit('Permission denied: need to be root!')
+    except FileNotFoundError:
+        filename = os.path.basename(path)
+        size = filename[10:]
+        sys.exit('{} is not a valid system huge page size'.format(size))
+
+
+def show_numa_pages():
+    '''Show huge page reservations on Numa system'''
+    print('Node Pages Size Total')
+    for numa_path in glob.glob('/sys/devices/system/node/node*'):
+        node = numa_path[29:]  # slice after /sys/devices/system/node/node
+        path = numa_path + '/hugepages'
+        for hdir in os.listdir(path):
+            pages = get_hugepages(path + '/' + hdir)
+            if pages > 0:
+                kb = int(hdir[10:-2])  # slice out of hugepages-NNNkB
+                print('{:<4} {:<5} {:<6} {}'.format(node, pages,
+                                                    fmt_memsize(kb),
+                                                    fmt_memsize(pages * kb)))
+
+
+def show_non_numa_pages():
+    '''Show huge page reservations on non Numa system'''
+    print('Pages Size Total')
+    path = '/sys/kernel/mm/hugepages'
+    for hdir in os.listdir(path):
+        pages = get_hugepages(path + '/' + hdir)
+        if pages > 0:
+            kb = int(hdir[10:-2])
+            print('{:<5} {:<6} {}'.format(pages, fmt_memsize(kb),
+                                          fmt_memsize(pages * kb)))
+
+
+def show_pages():
+    '''Show existing huge page settings'''
+    if is_numa():
+        show_numa_pages()
+    else:
+        show_non_numa_pages()
+
+
+def clear_pages():
+    '''Clear all existing huge page mappings'''
+    if is_numa():
+        dirs = glob.glob(
+            '/sys/devices/system/node/node*/hugepages/hugepages-*')
+    else:
+        dirs = glob.glob('/sys/kernel/mm/hugepages/hugepages-*')
+
+    for path in dirs:
+        set_hugepages(path, 0)
+
+
+def default_pagesize():
+    '''Get default huge page size from /proc/meminfo'''
+    with open('/proc/meminfo') as meminfo:
+        for line in meminfo:
+            if line.startswith('Hugepagesize:'):
+                return int(line.split()[1])
+    return None
+
+
+def set_numa_pages(pages, hugepgsz, node=None):
+    '''Set huge page reservation on Numa system'''
+    if node:
+        nodes = ['/sys/devices/system/node/node{}/hugepages'.format(node)]
+    else:
+        nodes = glob.glob('/sys/devices/system/node/node*/hugepages')
+
+    for node_path in nodes:
+        huge_path = '{}/hugepages-{}kB'.format(node_path, hugepgsz)
+        set_hugepages(huge_path, pages)
+
+
+def set_non_numa_pages(pages, hugepgsz):
+    '''Set huge page reservation on non Numa system'''
+    path = '/sys/kernel/mm/hugepages/hugepages-{}kB'.format(hugepgsz)
+    set_hugepages(path, pages)
+
+
+def reserve_pages(pages, hugepgsz, node=None):
+    '''Sets the number of huge pages to be reserved'''
+    if node or is_numa():
+        set_numa_pages(pages, hugepgsz, node=node)
+    else:
+        set_non_numa_pages(pages, hugepgsz)
+
+
+def get_mountpoints():
+    '''get list of of where hugepage filesystem is mounted'''
+    mounted = []
+    with open('/proc/mounts') as mounts:
+        for line in mounts:
+            fields = line.split()
+            if fields[2] != 'hugetlbfs':
+                continue
+            mounted.append(fields[1])
+    return mounted
+
+
+def mount_huge(pagesize, mountpoint):
+    '''mount the huge tlb file system'''
+    if mountpoint in get_mountpoints():
+        print(mountpoint, "already mounted")
+        return
+    cmd = "mount -t hugetlbfs"
+    if pagesize:
+        cmd += ' -o pagesize={}'.format(pagesize * 1024)
+    cmd += ' nodev ' + mountpoint
+    os.system(cmd)
+
+
+def umount_huge(mountpoint):
+    '''unmount the huge tlb file system (if mounted)'''
+    if mountpoint in get_mountpoints():
+        os.system("umount " + mountpoint)
+
+
+def show_mount():
+    '''Show where huge page filesystem is mounted'''
+    mounted = get_mountpoints()
+    if mounted:
+        print("Hugepages mounted on", *mounted)
+    else:
+        print("Hugepages not mounted")
+
+
+def main():
+    '''Process the command line arguments and setup huge pages'''
+    parser = argparse.ArgumentParser(
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        description="Setup huge pages",
+        epilog="""
+Examples:
+
+To display current huge page settings:
+    %(prog)s -s
+
+To a complete setup of with 2 Gigabyte of 1G huge pages:
+    %(prog)s -p 1G --setup 2G
+""")
+    parser.add_argument(
+        '--show',
+        '-s',
+        action='store_true',
+        help="print the current huge page configuration")
+    parser.add_argument(
+        '--clear', '-c', action='store_true', help="clear existing huge pages")
+    parser.add_argument(
+        '--mount',
+        '-m',
+        action='store_true',
+        help='mount the huge page filesystem')
+    parser.add_argument(
+        '--unmount',
+        '-u',
+        action='store_true',
+        help='unmount the system huge page directory')
+    parser.add_argument(
+        '--node', '-n', help='select numa node to reserve pages on')
+    parser.add_argument(
+        '--pagesize',
+        '-p',
+        metavar='SIZE',
+        help='choose huge page size to use')
+
parser.add_argument( + '--reserve', + '-r', + metavar='SIZE', + help='reserve huge pages. Size is in bytes with K, M, or G suffix') + parser.add_argument( + '--setup', + metavar='SIZE', + help='setup huge pages by doing clear, unmount, reserve and mount') + args = parser.parse_args() + + if args.setup: + args.clear = True + args.unmount = True + args.reserve = args.setup + args.mount = True + + if args.pagesize: + pagesize_kb = get_memsize(args.pagesize) + else: + pagesize_kb = default_pagesize() + + if args.clear: + clear_pages() + if args.unmount: + umount_huge(HUGE_MOUNT) + + if args.reserve: + reserve_kb = get_memsize(args.reserve) + if reserve_kb % pagesize_kb != 0: + sys.exit( + 'Huge reservation {}kB is not a multiple of page size {}kB'. + format(reserve_kb, pagesize_kb)) + reserve_pages( + int(reserve_kb / pagesize_kb), pagesize_kb, node=args.node) + if args.mount: + mount_huge(pagesize_kb, HUGE_MOUNT) + if args.show: + show_pages() + print() + show_mount() + + +if __name__ == "__main__": + main() diff --git a/usertools/meson.build b/usertools/meson.build index 64e27238f45b..596eaefb0e23 100644 --- a/usertools/meson.build +++ b/usertools/meson.build @@ -1,4 +1,9 @@ # SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2017 Intel Corporation -install_data(['dpdk-devbind.py', 'dpdk-pmdinfo.py', 'dpdk-telemetry.py'], install_dir: 'bin') +install_data([ + 'dpdk-devbind.py', + 'dpdk-pmdinfo.py', + 'dpdk-telemetry.py', + 'dpdk-hugepages.py' +],install_dir: 'bin') -- 2.27.0
::::snip::::

>
> diff --git a/doc/guides/tools/hugepages.rst b/doc/guides/tools/hugepages.rst
> new file mode 100644
> index 000000000000..a82b71620011
> --- /dev/null
> +++ b/doc/guides/tools/hugepages.rst
> @@ -0,0 +1,79 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright (c) 2020 Microsoft Corporation
> +
> +dpdk-hugpages Application

Should this be dpdk-hugepages ?

> +==========================
> +
> +The ``dpdk-hugpages`` tool is a Data Plane Development Kit (DPDK) utility
> +that helps in reserving hugepages.
> +As well as checking for current settings.
> +
> +
> +Running the Application
> +-----------------------
> +
> +The tool has a number of command line options:
> +
> +.. code-block:: console
> +
> +
> +   dpdk-hugpages [options]

s/hugpages/hugepages ?

> +
> +
> +OPTIONS
> +-------
> +
> +* ``-h, --help``
> +
> +    Display usage information and quit
> +
> +* ``-s, --show``
> +
> +    Print the current huge page configuration
> +
> +* ``-c driver, --clear``
> +
> +    Clear existing huge page reservation
> +
> +* ``-m, --mount``
> +
> +    Mount the huge page filesystem
> +
> +* ``-u, --unmount``
> +
> +    Unmount the huge page filesystem
> +
> +* ``-n NODE, --node=NODE``
> +
> +    Set NUMA node to reserve pages on
> +
> +* ``-p SIZE, --pagesize=SIZE``
> +
> +    Select hugepage size to use.
> +    If not specified the default system huge page size is used.
> +
> +* ``-r SIZE, --reserve=SIZE``
> +
> +    Reserve huge pages.
> +    Size is in bytes with K, M or G suffix.
> +
> +* ``--setup SIZE``
> +
> +    Short cut to clear, unmount, reserve and mount.
> +
> +.. warning::
> +
> +    While any user can run the ``dpdk-hugpages.py`` script to view the
> +    status of huge pages, modifying the setup requires root privileges.
> +
> +
> +Examples
> +--------
> +
> +To display current huge page settings::
> +
> +    dpdk-hugpages.py -s
> +
> +To a complete setup of with 2 Gigabyte of 1G huge pages::
> +
> +    dpdk-hugpages.py -p 1G --setup 2G
>
>

::::snip::::
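
For readers following the ``-p 1G --setup 2G`` example in the quoted documentation, here is a minimal sketch of the size arithmetic behind it. It is modelled on ``get_memsize`` from the patch, but the helper names ``size_to_kb`` and ``pages_needed`` are hypothetical and it raises exceptions rather than exiting, so treat it as an illustration and not as the script's actual code.

```python
import re

BINARY_PREFIX = "KMG"  # same prefix table the patch uses


def size_to_kb(arg):
    """Convert '2048', '512K', '2M' or '1G' to kB (a bare number is bytes)."""
    match = re.match(r'(\d+)([' + BINARY_PREFIX + r']?)$', arg.upper())
    if match is None:
        raise ValueError('{} is not a valid size'.format(arg))
    num = int(match.group(1))
    suffix = match.group(2)
    if suffix == "":
        return num // 1024
    # K -> 2**0 kB, M -> 2**10 kB, G -> 2**20 kB
    return num * 2 ** (BINARY_PREFIX.find(suffix) * 10)


def pages_needed(reserve, pagesize):
    """Number of huge pages of size `pagesize` needed to hold `reserve`."""
    reserve_kb = size_to_kb(reserve)
    pagesize_kb = size_to_kb(pagesize)
    if reserve_kb % pagesize_kb != 0:
        raise ValueError('reservation is not a multiple of the page size')
    return reserve_kb // pagesize_kb


print(pages_needed("2G", "1G"))  # 2
print(pages_needed("1G", "2M"))  # 512
```

So ``--setup 2G`` with 1G pages asks the kernel for two pages, while 1G of memory in 2M pages needs 512.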
On 09-Sep-20 7:51 PM, Stephen Hemminger wrote:
> This is an improved version of the setup of huge pages
> bases on earlier DPDK setup.
>
> Differences are:
> * autodetects NUMA vs non NUMA
> * allows setting different page sizes
> recent kernels support multiple sizes.
> * accepts a parameter in bytes (not pages).
> * can display current hugepage settings.
>
> Most users will just use --setup argument but if necessary
> the steps of clearing old settings and mounting/umounting
> can be done individually.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
--
Thanks,
Anatoly
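
The NUMA autodetection and the "display current settings" feature listed in the commit message both come down to walking sysfs. The sketch below shows one way to do that under stated assumptions: the function name ``list_reservations`` and the ``sysroot`` parameter are hypothetical (the parameter exists only so the code can be pointed at a test directory), and the NUMA check mirrors the patch's test for ``/sys/devices/system/node``.

```python
import glob
import os


def list_reservations(sysroot="/sys"):
    """Return (node, page_size_kb, pages) tuples; node is None without NUMA."""
    node_root = os.path.join(sysroot, "devices/system/node")
    numa = os.path.exists(node_root)
    if numa:
        pattern = os.path.join(node_root, "node*/hugepages/hugepages-*kB")
    else:
        pattern = os.path.join(sysroot, "kernel/mm/hugepages/hugepages-*kB")

    found = []
    for path in sorted(glob.glob(pattern)):
        # Directory name looks like hugepages-<size>kB
        size_kb = int(os.path.basename(path)[len("hugepages-"):-len("kB")])
        node = None
        if numa:
            # Path looks like .../node<N>/hugepages/hugepages-<size>kB
            node_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
            node = int(node_dir[len("node"):])
        with open(os.path.join(path, "nr_hugepages")) as nr_file:
            found.append((node, size_kb, int(nr_file.read())))
    return found
```

Calling ``list_reservations()`` with no argument reads the live system; unlike the patch, it also reports page sizes with zero pages reserved.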
On Wed, 9 Sep 2020 11:51:01 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:
> This is an improved version of the setup of huge pages
> bases on earlier DPDK setup.
>
> Differences are:
> * autodetects NUMA vs non NUMA
> * allows setting different page sizes
> recent kernels support multiple sizes.
> * accepts a parameter in bytes (not pages).
> * can display current hugepage settings.
>
> Most users will just use --setup argument but if necessary
> the steps of clearing old settings and mounting/umounting
> can be done individually.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Ping, no open issues and still not merged.
On 9/14/2020 4:31 PM, Burakov, Anatoly wrote:
> On 09-Sep-20 7:51 PM, Stephen Hemminger wrote:
>> This is an improved version of the setup of huge pages
>> bases on earlier DPDK setup.
>>
>> Differences are:
>> * autodetects NUMA vs non NUMA
>> * allows setting different page sizes
>> recent kernels support multiple sizes.
>> * accepts a parameter in bytes (not pages).
>> * can display current hugepage settings.
>>
>> Most users will just use --setup argument but if necessary
>> the steps of clearing old settings and mounting/umounting
>> can be done individually.
>>
>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>> ---
>
> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
24/09/2020 06:31, Stephen Hemminger:
> On Wed, 9 Sep 2020 11:51:01 -0700
> Stephen Hemminger <stephen@networkplumber.org> wrote:
>
> > This is an improved version of the setup of huge pages
> > bases on earlier DPDK setup.
> >
> > Differences are:
> > * autodetects NUMA vs non NUMA
> > * allows setting different page sizes
> > recent kernels support multiple sizes.
> > * accepts a parameter in bytes (not pages).
> > * can display current hugepage settings.
> >
> > Most users will just use --setup argument but if necessary
> > the steps of clearing old settings and mounting/umounting
> > can be done individually.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
> Ping, no open issues and still not merged.

There are minor issues.

From checkpatch:

WARNING:REPEATED_WORD: Possible repeated word: 'of'
#462: FILE: usertools/dpdk-hugepages.py:150:
+ '''get list of of where hugepage filesystem is mounted'''

From Ajit:

> +dpdk-hugpages Application
Should this be dpdk-hugepages ?

> + dpdk-hugpages [options]
s/hugpages/hugepages ?

I will fix them while applying.
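
The REPEATED_WORD warning above is easy to reproduce locally. The following is an illustrative sketch only: ``repeated_words`` is a hypothetical helper, and the regular expression is a simple stand-in, not the logic checkpatch itself uses.

```python
import re

# A word, some whitespace, then the same word again.
REPEATED = re.compile(r'\b(\w+)\s+\1\b', re.IGNORECASE)


def repeated_words(line):
    """Return the words that appear twice in a row in `line`."""
    return [m.group(1) for m in REPEATED.finditer(line)]


print(repeated_words(
    "'''get list of of where hugepage filesystem is mounted'''"))
# ['of']
```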
20/10/2020 20:01, Ferruh Yigit:
> On 9/14/2020 4:31 PM, Burakov, Anatoly wrote:
> > On 09-Sep-20 7:51 PM, Stephen Hemminger wrote:
> >> This is an improved version of the setup of huge pages
> >> bases on earlier DPDK setup.
> >>
> >> Differences are:
> >> * autodetects NUMA vs non NUMA
> >> * allows setting different page sizes
> >> recent kernels support multiple sizes.
> >> * accepts a parameter in bytes (not pages).
> >> * can display current hugepage settings.
> >>
> >> Most users will just use --setup argument but if necessary
> >> the steps of clearing old settings and mounting/umounting
> >> can be done individually.
> >>
> >> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >
> > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
>
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Applied with some minor fixes, thanks.
On Sun, 22 Nov 2020 22:30:00 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:
> 24/09/2020 06:31, Stephen Hemminger:
> > On Wed, 9 Sep 2020 11:51:01 -0700
> > Stephen Hemminger <stephen@networkplumber.org> wrote:
> >
> > > This is an improved version of the setup of huge pages
> > > bases on earlier DPDK setup.
> > >
> > > Differences are:
> > > * autodetects NUMA vs non NUMA
> > > * allows setting different page sizes
> > > recent kernels support multiple sizes.
> > > * accepts a parameter in bytes (not pages).
> > > * can display current hugepage settings.
> > >
> > > Most users will just use --setup argument but if necessary
> > > the steps of clearing old settings and mounting/umounting
> > > can be done individually.
> > >
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >
> > Ping, no open issues and still not merged.
>
> There are minor issues.
>
> From checkpatch:
>
> WARNING:REPEATED_WORD: Possible repeated word: 'of'
> #462: FILE: usertools/dpdk-hugepages.py:150:
> + '''get list of of where hugepage filesystem is mounted'''
>
> From Ajit:
>
> > +dpdk-hugpages Application
> Should this be dpdk-hugepages ?
>
> > + dpdk-hugpages [options]
> s/hugpages/hugepages ?
>
> I will fix them while applying.
>
>
Thanks for fixing the obvious typos.
Could we kill the old dpdk-setup.sh script?
Or add for 20.11 a notice "this script will go away, learn to use other ones..."

It is out of date, and a terrible user API.
Of course, there are lots of old pieces of documentation that still refer to it?
24/11/2020 18:45, Stephen Hemminger:
> Could we kill the old dpdk-setup.sh script?
> Or add for 20.11 a notice "this script will go away, learn to use other ones..."

Yes we should add a deprecation warning in the script,
and plan its removal for DPDK 21.11.

> It is out of date, and a terrible user API.

I agree.

> Of course, there are lots of old pieces of documentation that still refer to it?

Not so much, except bbdev doc and the quick start guide.
On 11/24/2020 9:37 PM, Thomas Monjalon wrote:
> 24/11/2020 18:45, Stephen Hemminger:
>> Could we kill the old dpdk-setup.sh script?
>> Or add for 20.11 a notice "this script will go away, learn to use other ones..."
>
> Yes we should add a deprecation warning in the script,
> and plan its removal for DPDK 21.11.
>

Ack

>
>> It is out of date, and a terrible user API.
>
> I agree.
>
>
>> Of course, there are lots of old pieces of documentation that still refer to it?
>
> Not so much, except bbdev doc and the quick start guide.
>
>