From: "Wiles, Keith"
To: Iain Barker
CC: "dev@dpdk.org", "edwin.leung@oracle.com"
Date: Tue, 5 Feb 2019 20:29:12 +0000
Subject: Re: [dpdk-dev] Question about DPDK hugepage fd change
List-Id: DPDK patches and discussions

> On Feb 5, 2019, at 12:56 PM, Iain Barker wrote:
>
> Hi everyone,
>
> We just updated our application from DPDK 17.11.4 (LTS) to
> DPDK 18.11 (LTS) and we noticed a regression.
>
> Our host platform is providing 2MB huge pages, so for an 8GB
> reservation this means 4096 pages are allocated.
>
> This worked fine in the prior LTS, but after upgrading DPDK what we
> are seeing is that select() on an fd is failing.
>
> select() works fine when the process starts up, but does not work
> after DPDK has been initialized.
>
> We did some investigation and found that in the DPDK patches linked
> below, the hugepage tracking mechanism was changed from mmap to an
> array of file descriptors, and the rlimit for fd's is raised from the
> default to allow more fd's to be open.
>
> https://mails.dpdk.org/archives/dev/2018-September/110890.html
> https://mails.dpdk.org/archives/dev/2018-September/110889.html
>
> The problem is that the GNU C library (glibc) has a limit for the
> maximum fd passed to select(), which is hard-coded in the POSIX header
> file and libc at 1024 (and probably many other OS libraries too as a
> result).
>
> Raising the rlimit for fd >1024 has undefined results, per the manpage:
>
> http://man7.org/linux/man-pages/man2/select.2.html
>     An fd_set is a fixed size buffer. Executing FD_CLR() or FD_SET()
>     with a value of fd that is negative or is equal to or larger than
>     FD_SETSIZE will result in undefined behavior. Moreover, POSIX
>     requires fd to be a valid file descriptor.
>
>     The Linux kernel allows file descriptor sets of arbitrary size,
>     determining the length of the sets to be checked from the value of
>     nfds. However, in the glibc implementation, the fd_set type is fixed
>     in size.
>
> Specifically, libc's header include/sys/select.h has an array of fd's
> which is FD_SETSIZE deep.
>     __fd_mask fds_bits[__FD_SETSIZE / __NFDBITS];
>
> and /usr/include/linux/posix_types.h is hard-coded with
>     #define __FD_SETSIZE 1024
>
> As this define and array are in libc, they are used in many libraries
> on a Linux system.
> So to use setsize >1024 means recompiling OS libraries and any other
> package that needs to use FDs, or ensuring that no library used by the
> application ever calls select() on an fd set. That seems an
> unreasonable burden...
>
> Any thoughts?

Would poll work here instead?

> thanks,
> Iain

Regards,
Keith
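
For reference, the poll() idea can be sketched roughly as below. This is only a minimal illustration, not code from DPDK or from the application discussed in this thread: the helper names wait_readable() and fd_fits_select() are hypothetical. poll() takes an array of struct pollfd rather than a fixed-size fd_set, so the descriptor value is not bounded by FD_SETSIZE.

```c
#include <errno.h>
#include <poll.h>
#include <sys/select.h>   /* only for the FD_SETSIZE constant */
#include <unistd.h>

/* Hypothetical helper: report whether fd can safely be placed in an
 * fd_set. FD_SET() on anything outside [0, FD_SETSIZE) is undefined. */
static int fd_fits_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: wait for fd to become readable using poll().
 * Returns 1 if readable (or hung up / in error, so a read will not
 * block), 0 on timeout, -1 on failure or if fd is not open. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    /* Retry if a signal interrupts the call. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc <= 0)
        return rc;                  /* 0 = timeout, -1 = error */
    if (pfd.revents & POLLNVAL)
        return -1;                  /* fd was not an open descriptor */
    return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : 0;
}
```

A caller that currently uses select() on a single descriptor could call wait_readable(fd, timeout_ms) instead, and fd_fits_select(fd) can serve as a guard in any code path that still has to build an fd_set. Note that this only helps code the application controls; third-party libraries that call select() internally would still be affected.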