* [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel [not found] <DM6PR03MB3547EDC8BAAD1DA17AE4470EB9020@DM6PR03MB3547.namprd03.prod.outlook.com> @ 2020-10-15 13:26 ` Mohakud, Amiya Ranjan 2020-10-15 13:52 ` Burakov, Anatoly 0 siblings, 1 reply; 16+ messages in thread From: Mohakud, Amiya Ranjan @ 2020-10-15 13:26 UTC (permalink / raw) To: dpdk-dev; +Cc: Mohakud, Amiya Ranjan

Hi All,

I'm facing an issue with the EAL library in DPDK 18.11.6. Please find the problem statement below.

Problem Statement:

I have a DPDK application using DPDK version 18.11.6 which works fine on the 4.19 kernel. There, rte_eal_init() works fine and eal_clean_runtime_dir() does not remove the files present in the DPDK runtime directory, /var/run/dpdk/rte/.

When I run the same application on the 5.4.35 kernel, rte_eal_init() behaves differently: eal_clean_runtime_dir() cleans up the DPDK runtime directory, and as a result the secondary processes fail to come up. Basically, the flock() system call succeeds (return value 0), so the files get deleted, whereas on the 4.19 kernel the flock() call fails.

Note: This is also the case with the 5.3 kernel.
int
eal_clean_runtime_dir(void)
{
	DIR *dir;
	struct dirent *dirent;
	int dir_fd, fd, lck_result;
	static const char * const filters[] = {
		"fbarray_*",
		"mp_socket_*"
	};

	/* open directory */
	dir = opendir(runtime_dir);
	if (!dir) {
		RTE_LOG(ERR, EAL, "Unable to open runtime directory %s\n",
				runtime_dir);
		goto error;
	}
	dir_fd = dirfd(dir);

	/* lock the directory before doing anything, to avoid races */
	if (flock(dir_fd, LOCK_EX) < 0) {
		RTE_LOG(ERR, EAL, "Unable to lock runtime directory %s\n",
				runtime_dir);
		goto error;
	}

	dirent = readdir(dir);
	if (!dirent) {
		RTE_LOG(ERR, EAL, "Unable to read runtime directory %s\n",
				runtime_dir);
		goto error;
	}

	while (dirent != NULL) {
		unsigned int f_idx;
		bool skip = true;

		/* skip files that don't match the patterns */
		for (f_idx = 0; f_idx < RTE_DIM(filters); f_idx++) {
			const char *filter = filters[f_idx];

			if (fnmatch(filter, dirent->d_name, 0) == 0) {
				skip = false;
				break;
			}
		}
		if (skip) {
			dirent = readdir(dir);
			continue;
		}

		/* try and lock the file */
		fd = openat(dir_fd, dirent->d_name, O_RDONLY);

		/* skip to next file */
		if (fd == -1) {
			dirent = readdir(dir);
			continue;
		}

		/* non-blocking lock */
		lck_result = flock(fd, LOCK_EX | LOCK_NB);

		/* if lock succeeds, remove the file */
		if (lck_result != -1)
			unlinkat(dir_fd, dirent->d_name, 0);
		close(fd);
		dirent = readdir(dir);
	}

	/* closedir closes dir_fd and drops the lock */
	closedir(dir);
	return 0;

error:
	if (dir)
		closedir(dir);

	RTE_LOG(ERR, EAL, "Error while clearing runtime dir: %s\n",
		strerror(errno));

	return -1;
}

Could anybody please let me know if there is any patch related to this, or any way to solve this issue?

Thanks in advance.

Regards
Amiya

^ permalink raw reply [flat|nested] 16+ messages in thread
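For reference, the skip logic in the function above depends on flock() treating separately opened descriptors independently: a non-blocking exclusive lock through a fresh openat() is expected to fail while another descriptor still holds a lock on the same file, even within one process. Below is a minimal sketch of just that mechanism. It is not DPDK code; it assumes the owner of a file keeps a shared lock on it (that part of EAL is not shown in this thread), and the helper name and the scratch file under /tmp are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose flock() and mkstemp() under strict -std= modes */
#endif
#include <assert.h> /* only needed for the usage check shown after the block */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: hold a shared lock through one descriptor, then
 * probe the same file through a second, independently opened descriptor
 * the way the cleanup loop does.
 * Returns 0 if the probe lock was granted, the errno of the failed probe
 * otherwise, or -1 if the scratch file could not be set up.
 */
static int
second_fd_lock_errno(void)
{
	char path[] = "/tmp/flock_probe_XXXXXX"; /* assumed scratch location */
	int holder, prober, result = -1;

	holder = mkstemp(path);
	if (holder < 0)
		return -1;

	prober = open(path, O_RDONLY);
	/* assumption: the file's owner keeps a shared lock while using it */
	if (prober >= 0 && flock(holder, LOCK_SH) == 0) {
		/* same non-blocking exclusive probe as in the cleanup loop */
		if (flock(prober, LOCK_EX | LOCK_NB) == 0)
			result = 0;
		else
			result = errno;
	}

	if (prober >= 0)
		close(prober);
	close(holder);
	unlink(path);
	return result;
}
```

On a local Linux filesystem a caller would expect `assert(second_fd_lock_errno() == EWOULDBLOCK);` to hold. If the probe returns 0 while the holder still has its descriptor open, the lock is not where the cleanup code expects it to be, which matches the symptom reported on the 5.x kernels.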
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel 2020-10-15 13:26 ` [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel Mohakud, Amiya Ranjan @ 2020-10-15 13:52 ` Burakov, Anatoly 2020-10-15 14:43 ` Mohakud, Amiya Ranjan 0 siblings, 1 reply; 16+ messages in thread From: Burakov, Anatoly @ 2020-10-15 13:52 UTC (permalink / raw) To: Mohakud, Amiya Ranjan, dpdk-dev On 15-Oct-20 2:26 PM, Mohakud, Amiya Ranjan wrote: > Hi All, > I'm facing one issue with DPDK-18.11.6 in EAL library. Please find the below problem statement. > > Problem Statement: > I have one DPDK application using DPDK version 18.11.6 which works fine in 4.19 version kernel. The rte_eal_init() works fine and eal_clean_runtime_dir() does not remove the files present in dpdk run time directory, /var/run/dpdk/rte/. > The same application when I am trying to run in 5.4.35 kernel, the rte_eal_init() behavior is different. eal_clean_runtime_dir() cleans up dpdk run time directory, as a result the secondary processes fail to come up. Basically the flock system call succeeds , return value is 0 which goes and deletes the files. And in 4.19 kernel the flcok system call fails. > > Note: This is the case with 5.3 kernel version . > I'm not quite sure what the issue is. The runtime dir is *supposed to* be cleared when you're running a primary process, and is not supposed to be cleared when you're running a secondary process. Are you expecting for a *primary* process to not clear the runtime directory? -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel 2020-10-15 13:52 ` Burakov, Anatoly @ 2020-10-15 14:43 ` Mohakud, Amiya Ranjan 2020-10-15 15:08 ` Burakov, Anatoly 0 siblings, 1 reply; 16+ messages in thread From: Mohakud, Amiya Ranjan @ 2020-10-15 14:43 UTC (permalink / raw) To: Burakov, Anatoly, dpdk-dev

The primary process does not clear the files (e.g. /var/run/dpdk/rte/fbarray_*) on the 4.19 kernel, since the flock() fails. I think this is the correct behavior, since secondary processes rely on those files for their memzone_init().

But on 5.4, the primary process clears these files, which causes secondary processes to fail.

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel 2020-10-15 14:43 ` Mohakud, Amiya Ranjan @ 2020-10-15 15:08 ` Burakov, Anatoly 2020-10-15 16:07 ` Mohakud, Amiya Ranjan 0 siblings, 1 reply; 16+ messages in thread From: Burakov, Anatoly @ 2020-10-15 15:08 UTC (permalink / raw) To: Mohakud, Amiya Ranjan, dpdk-dev On 15-Oct-20 3:43 PM, Mohakud, Amiya Ranjan wrote: > The primary process does not clear the files ( e.g. > /var/run/dpdk/rte/fbarray_*) in case of 4.19 kernel, since the flock() > fails. I think, this is correct behavior, since secondary processes rely > on those files for their memzone_init(). > > But in 5.4, the primary process clears these files, which cause > secondary processes to fail. I'm not sure i understand. Primary process is supposed to clear the files. It will then recreate them. Are you suggesting that it's clearing them *after* it has created them? -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel 2020-10-15 15:08 ` Burakov, Anatoly @ 2020-10-15 16:07 ` Mohakud, Amiya Ranjan 2020-10-15 16:14 ` Mohakud, Amiya Ranjan 0 siblings, 1 reply; 16+ messages in thread From: Mohakud, Amiya Ranjan @ 2020-10-15 16:07 UTC (permalink / raw) To: Burakov, Anatoly, dpdk-dev

Hi Anatoly - Thanks for helping on this.

I am not aware of where the primary process re-creates the files. Can you please point me to that? From my reading of the code, the fbarray_memzone file gets created in rte_eal_memzone_init()->rte_fbarray_init() and it stays there until eal_clean_runtime_dir() gets called towards the end of rte_eal_init(). It does not get deleted on the 4.19 kernel, but on 5.4 it does.

> I'm not sure i understand. Primary process is supposed to clear the
> files. It will then recreate them. Are you suggesting that it's clearing
> them *after* it has created them?

Going by my observation, the file highlighted below gets deleted by the time rte_eal_init() is over.

srwxr-xr-x 1 root root      0 Oct 15 11:24 mp_socket
-rw------- 1 root root  12432 Oct 15 11:24 hugepage_info
-rw------- 1 root root 188416 Oct 15 11:24 fbarray_memzone
-rw------- 1 root root 397312 Oct 15 11:24 fbarray_memseg-2048k-0-1
-rw------- 1 root root 397312 Oct 15 11:24 fbarray_memseg-2048k-0-0
-rw------- 1 root root 397312 Oct 15 11:24 fbarray_memseg-2048k-0-3
-rw------- 1 root root 397312 Oct 15 11:24 fbarray_memseg-2048k-0-2
-rw------- 1 root root  16529 Oct 15 11:24 config

Please reach out to me for further clarification.

Regards
Amiya

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel 2020-10-15 16:07 ` Mohakud, Amiya Ranjan @ 2020-10-15 16:14 ` Mohakud, Amiya Ranjan 2020-10-15 18:01 ` Burakov, Anatoly 0 siblings, 1 reply; 16+ messages in thread From: Mohakud, Amiya Ranjan @ 2020-10-15 16:14 UTC (permalink / raw) To: Burakov, Anatoly, dpdk-dev; +Cc: Dey, Souvik

+ Souvik

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel 2020-10-15 16:14 ` Mohakud, Amiya Ranjan @ 2020-10-15 18:01 ` Burakov, Anatoly 2020-10-15 18:34 ` Burakov, Anatoly 0 siblings, 1 reply; 16+ messages in thread From: Burakov, Anatoly @ 2020-10-15 18:01 UTC (permalink / raw) To: Mohakud, Amiya Ranjan, dpdk-dev; +Cc: Dey, Souvik

Hi,

Sorry, yes, you're right (it's been a while since i looked at the code), it removes unused stuff at the end of init. There's even a comment explaining why that's done :D

It sounds like closing the file descriptor also drops the lock. This locking business is a huge pain because we have to support old kernels which don't have the only sane file locking implementation that Linux has.

I wouldn't go as far as to say "this is a kernel regression", as most likely it's me who's at fault here, but this definitely shouldn't happen. Unfortunately, i won't be online for the next two weeks, but i'll definitely look into this after i'm back, so thanks for your report.

--
Thanks,
Anatoly

^ permalink raw reply [flat|nested] 16+ messages in thread
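On the remark that closing the file descriptor also drops the lock: per flock(2), the lock belongs to the open file description, so it survives while any duplicate of the descriptor stays open and is released when the last one is closed. The sketch below only illustrates that rule. It is not taken from DPDK and does not claim to reproduce the reported bug; the helper names and the scratch path are assumptions made up for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose flock() and mkstemp() under strict -std= modes */
#endif
#include <assert.h> /* only needed for the usage check shown after the block */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical probe: 1 if a fresh descriptor can take a non-blocking
 * exclusive lock on path, 0 if the lock is denied, -1 if open() fails.
 */
static int
probe_can_lock(const char *path)
{
	int fd = open(path, O_RDONLY);
	int granted;

	if (fd < 0)
		return -1;
	granted = flock(fd, LOCK_EX | LOCK_NB) == 0;
	close(fd); /* also releases the probe's own lock, if it got one */
	return granted;
}

/*
 * Take a shared lock, close the original descriptor while a dup() copy
 * stays open, probe, then close the copy and probe again.
 * Returns 0 on success and fills both out-parameters, -1 on setup failure.
 */
static int
lock_lifetime_demo(int *while_dup_open, int *after_last_close)
{
	char path[] = "/tmp/flock_life_XXXXXX"; /* assumed scratch location */
	int fd, copy;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	copy = dup(fd);
	if (copy < 0 || flock(fd, LOCK_SH) < 0) {
		if (copy >= 0)
			close(copy);
		close(fd);
		unlink(path);
		return -1;
	}

	close(fd); /* the copy still refers to the locked file description */
	*while_dup_open = probe_can_lock(path);

	close(copy); /* last descriptor gone: the lock is released */
	*after_last_close = probe_can_lock(path);

	unlink(path);
	return 0;
}
```

With this sketch the first probe is expected to be denied (0) and the second to be granted (1). If the code that creates a file closes its only locked descriptor before the cleanup pass runs, the cleanup probe would succeed and the file would be unlinked.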
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel 2020-10-15 18:01 ` Burakov, Anatoly @ 2020-10-15 18:34 ` Burakov, Anatoly 2020-10-16 5:34 ` Mohakud, Amiya Ranjan 0 siblings, 1 reply; 16+ messages in thread From: Burakov, Anatoly @ 2020-10-15 18:34 UTC (permalink / raw) To: Mohakud, Amiya Ranjan, dpdk-dev; +Cc: Dey, Souvik

Hi,

Just to clarify:

Removing stuff at the end of the init process is intended behavior. Every process will create its own shadow page tables, and defunct processes will not remove them afterwards. Without this cleanup, the tmpfs would slowly fill up with unused processes' fbarrays.

What is *not* intended behavior is the primary process removing *its own* filesystem entries - this shouldn't happen, and in fact wouldn't have happened if the file locking was working as intended. Normally, when seeing files that are locked (in use), EAL will skip them, so that only files that are not in use would be deleted.

It looks like you're observing exactly that - the primary process removing *its own* files for some reason.

--
Thanks,
Anatoly

^ permalink raw reply [flat|nested] 16+ messages in thread
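The intended behavior described above (skip files that are locked, delete the rest) can be tried in isolation. The following is a simplified stand-in for the cleanup pass, not the EAL code: it drops the directory lock and the logging, takes a single pattern, and runs against a throwaway directory. The function names, the file names fbarray_live and fbarray_stale, and the /tmp location are all assumptions made up for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose flock(), mkdtemp(), openat() under strict -std= */
#endif
#include <assert.h> /* only needed for the usage check shown after the block */
#include <dirent.h>
#include <fcntl.h>
#include <fnmatch.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Remove entries in dir_path that match pattern and that nobody holds a
 * flock() on. Returns the number of files removed, or -1 on error.
 */
static int
clean_unlocked(const char *dir_path, const char *pattern)
{
	DIR *dir = opendir(dir_path);
	struct dirent *ent;
	int dir_fd, removed = 0;

	if (dir == NULL)
		return -1;
	dir_fd = dirfd(dir);

	while ((ent = readdir(dir)) != NULL) {
		int fd;

		if (fnmatch(pattern, ent->d_name, 0) != 0)
			continue;
		fd = openat(dir_fd, ent->d_name, O_RDONLY);
		if (fd < 0)
			continue;
		/* a granted exclusive lock means the file is not in use */
		if (flock(fd, LOCK_EX | LOCK_NB) == 0 &&
				unlinkat(dir_fd, ent->d_name, 0) == 0)
			removed++;
		close(fd);
	}
	closedir(dir);
	return removed;
}

/*
 * Create one file that is kept locked and one that is not, run the
 * cleanup, and report whether the locked file survived.
 * Returns the number of files removed, or -1 on setup failure.
 */
static int
cleanup_demo(int *live_kept)
{
	char dir_path[] = "/tmp/rt_demo_XXXXXX"; /* assumed scratch location */
	char live[64], stale[64];
	int live_fd, stale_fd, removed = -1;

	if (mkdtemp(dir_path) == NULL)
		return -1;
	snprintf(live, sizeof(live), "%s/fbarray_live", dir_path);
	snprintf(stale, sizeof(stale), "%s/fbarray_stale", dir_path);

	live_fd = open(live, O_CREAT | O_RDWR, 0600);
	stale_fd = open(stale, O_CREAT | O_RDWR, 0600);
	if (stale_fd >= 0)
		close(stale_fd); /* nobody keeps the stale file open */

	/* the "running process" keeps its descriptor and a shared lock */
	if (live_fd >= 0 && stale_fd >= 0 && flock(live_fd, LOCK_SH) == 0) {
		removed = clean_unlocked(dir_path, "fbarray_*");
		*live_kept = access(live, F_OK) == 0;
	}

	if (live_fd >= 0)
		close(live_fd);
	unlink(live);
	unlink(stale); /* fails harmlessly if the cleanup already removed it */
	rmdir(dir_path);
	return removed;
}
```

When the locking works as intended, `cleanup_demo()` should report one removed file and the locked file still present. Running a check like this on both kernels could help tell a difference in flock() behavior apart from a difference in how long the owning descriptor stays open.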
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel 2020-10-15 18:34 ` Burakov, Anatoly @ 2020-10-16 5:34 ` Mohakud, Amiya Ranjan 2020-10-22 7:12 ` Mohakud, Amiya Ranjan 0 siblings, 1 reply; 16+ messages in thread From: Mohakud, Amiya Ranjan @ 2020-10-16 5:34 UTC (permalink / raw) To: Burakov, Anatoly, dpdk-dev

Hi Anatoly,

Thanks for your confirmation. Yes, you got it right now. The primary process is removing *its own* files because of some locking issue, which should not be the case.

Regards
Amiya

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel 2020-10-16 5:34 ` Mohakud, Amiya Ranjan @ 2020-10-22 7:12 ` Mohakud, Amiya Ranjan [not found] ` <DM6PR03MB3547C263D4E20B3DD124AD43B91D0@DM6PR03MB3547.namprd03.prod.outlook.com> 0 siblings, 1 reply; 16+ messages in thread From: Mohakud, Amiya Ranjan @ 2020-10-22 7:12 UTC (permalink / raw) To: Burakov, Anatoly, dpdk-dev

Hi Anatoly - I raised a DPDK bug, https://bugs.dpdk.org/show_bug.cgi?id=561, for tracking purposes.

Regards
Amiya

^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <DM6PR03MB3547C263D4E20B3DD124AD43B91D0@DM6PR03MB3547.namprd03.prod.outlook.com>]
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel [not found] ` <DM6PR03MB3547C263D4E20B3DD124AD43B91D0@DM6PR03MB3547.namprd03.prod.outlook.com> @ 2020-10-28 7:00 ` Mohakud, Amiya Ranjan 2020-10-29 15:51 ` Burakov, Anatoly 0 siblings, 1 reply; 16+ messages in thread From: Mohakud, Amiya Ranjan @ 2020-10-28 7:00 UTC (permalink / raw) To: Burakov, Anatoly, dpdk-dev

Hi Anatoly,

Are you back from vacation? Can you please let me know if there is any progress on this, or if anything is pending from my side?

https://bugs.dpdk.org/show_bug.cgi?id=561

Regards
Amiya

From: Mohakud, Amiya Ranjan Sent: 23 October 2020 01:07 To: 'Burakov, Anatoly' <anatoly.burakov@intel.com> Subject: RE: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel

Please let me know if anything is pending from my side.

From: Mohakud, Amiya Ranjan Sent: 22 October 2020 12:43 To: 'Burakov, Anatoly' <anatoly.burakov@intel.com>; 'dpdk-dev' <dev@dpdk.org> Subject: RE: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel

Hi Anatoly - I raised a bug https://bugs.dpdk.org/show_bug.cgi?id=561 in DPDK for tracking purpose.

Regards
Amiya

From: Mohakud, Amiya Ranjan Sent: 16 October 2020 11:04 To: Burakov, Anatoly <anatoly.burakov@intel.com>; dpdk-dev <dev@dpdk.org> Subject: RE: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel

Hi Anatoly, Thanks for your confirmation. Yes, you got it right now. The primary process removing *its own* own files for some locking issue, which should not be the case.

Just to clarify: Removing stuff at the end of the init process is intended process.
This is because every process will create their own shadow page tables, and defunct processes will not remove them afterwards. This is done because otherwise the tmpfs will slowly fill up with unused processes' fbarrays. What is *not* intended behavior is primary process removing *its own* filesystem entries - this shouldn't happen, and in fact wouldn't have happened if the file locking was working as it was intended. Normally, when seeing files that are locked (in use), EAL will skip them, so that only files that are not in use would be deleted. It looks like you're observing exactly that - primary process removing *its own* files for some reason. Regards Amiya From: Burakov, Anatoly <anatoly.burakov@intel.com<mailto:anatoly.burakov@intel.com>> Sent: 16 October 2020 00:04 To: Mohakud, Amiya Ranjan <amohakud@rbbn.com<mailto:amohakud@rbbn.com>>; dpdk-dev <dev@dpdk.org<mailto:dev@dpdk.org>> Cc: Dey, Souvik <sodey@rbbn.com<mailto:sodey@rbbn.com>> Subject: Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel ________________________________ NOTICE: This email was received from an EXTERNAL sender ________________________________ On 15-Oct-20 7:01 PM, Burakov, Anatoly wrote: > On 15-Oct-20 5:14 PM, Mohakud, Amiya Ranjan wrote: >> + Souvik >> >> *From:*Mohakud, Amiya Ranjan >> *Sent:* 15 October 2020 21:38 >> *To:* Burakov, Anatoly <anatoly.burakov@intel.com<mailto:anatoly.burakov@intel.com>>; dpdk-dev >> <dev@dpdk.org<mailto:dev@dpdk.org>> >> *Subject:* RE: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() >> function cleans the runtime directory in 5.4.35 kernel >> >> Hi Anatoly - Thanks for helping on this. >> >> I am not aware, where the primary process re-creates the files. Can >> you please point me to that? 
As per my code browsing and >> understanding, I can see, fbarray_memzone file gets created in >> rte_eal_memzone_init()->rte_fbarray_init() and it stays there till >> eal_clean_runtime_dir() gets called towards end of rte_eal_init(). >> This does not get deleted in 4.19 kernel, but in 5.4, it does. >> >> /I'm not sure i understand. Primary process is supposed to clear the >> files. It will then recreate them. Are you suggesting that it's clearing >> them *after* it has created them?/ >> >> / >> /Going by my observation, the file highlighted below gets deleted by >> the time rte_eal_init() is over. >> >> srwxr-xr-x 1 root root 0 Oct 15 11:24 mp_socket >> >> -rw------- 1 root root 12432 Oct 15 11:24 hugepage_info >> >> -rw------- 1 root root 188416 Oct 15 11:24 fbarray_memzone >> >> -rw------- 1 root root 397312 Oct 15 11:24 fbarray_memseg-2048k-0-1 >> >> -rw------- 1 root root 397312 Oct 15 11:24 fbarray_memseg-2048k-0-0 >> >> -rw------- 1 root root 397312 Oct 15 11:24 fbarray_memseg-2048k-0-3 >> >> -rw------- 1 root root 397312 Oct 15 11:24 fbarray_memseg-2048k-0-2 >> >> -rw------- 1 root root 16529 Oct 15 11:24 config >> >> Please reach out to me for further clarification. >> >> Regards >> >> Amiya > > Hi, > > Sorry, yes, you're right (it's been a while since i looked at the code), > it removes unused stuff at the end of init. There's even a comment > explaining why that's done :D > > It sounds like closing the file descriptor also drops the lock. This > locking business is a huge pain because we have to support old kernels > which don't have the only sane file locking implementation that Linux has. > > While i wouldn't go as far as to say "this is a kernel regression" as > most likely it's me who's at fault here, but this definitely shouldn't > happen. Unfortunately, i won't be online for the next two weeks, but > i'll definitely look into this after i'm back, so thanks for your report. 
> Hi, Just to clarify: Removing stuff at the end of the init process is intended process. This is because every process will create their own shadow page tables, and defunct processes will not remove them afterwards. This is done because otherwise the tmpfs will slowly fill up with unused processes' fbarrays. What is *not* intended behavior is primary process removing *its own* filesystem entries - this shouldn't happen, and in fact wouldn't have happened if the file locking was working as it was intended. Normally, when seeing files that are locked (in use), EAL will skip them, so that only files that are not in use would be deleted. It looks like you're observing exactly that - primary process removing *its own* files for some reason. -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 16+ messages in thread
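The skip-if-locked behavior discussed in this message can be illustrated outside DPDK. The following is only a minimal sketch, not EAL code: is_locked_elsewhere() and flock_demo() are hypothetical names, and the file path is whatever the caller passes in. It assumes Linux flock() semantics on a local filesystem, where a lock belongs to the open file description, so a second open() of the same file conflicts with the lock even inside one process, and the lock disappears as soon as the descriptor holding it is closed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if `path` is flock()ed through some other
 * open file description, 0 if a non-blocking exclusive lock could be taken
 * (the file looks unused), and -1 on any other error.
 */
static int is_locked_elsewhere(const char *path)
{
	int fd, ret;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX | LOCK_NB) == 0)
		ret = 0;	/* nobody holds a lock: a cleaner would unlink here */
	else
		ret = (errno == EWOULDBLOCK) ? 1 : -1;
	close(fd);		/* also releases the lock taken above, if any */
	return ret;
}

/*
 * Hypothetical demo: create a file, lock it, probe it, then close the
 * owning descriptor and probe again. Returns 0 if the file was seen as
 * locked while the owner fd was open and as unlocked after it was closed.
 */
static int flock_demo(const char *path)
{
	int owner, held, released;

	owner = open(path, O_CREAT | O_RDWR, 0600);
	if (owner < 0)
		return -1;
	if (flock(owner, LOCK_EX) < 0) {
		close(owner);
		return -1;
	}
	held = is_locked_elsewhere(path);	/* owner fd still open */
	close(owner);				/* dropping the fd drops the lock */
	released = is_locked_elsewhere(path);
	unlink(path);
	return (held == 1 && released == 0) ? 0 : -1;
}
```

Calling flock_demo() on a scratch file shows the two states a cleanup pass can observe. If whatever descriptor holds the lock on an fbarray file is closed before the cleanup runs, the second state applies and the file would look unused. Whether that is what happens in the reported setup is not established in this thread.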
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel
  2020-10-28  7:00 ` Mohakud, Amiya Ranjan
@ 2020-10-29 15:51   ` Burakov, Anatoly
  2020-10-29 17:10     ` Burakov, Anatoly
  0 siblings, 1 reply; 16+ messages in thread
From: Burakov, Anatoly @ 2020-10-29 15:51 UTC
To: Mohakud, Amiya Ranjan, dpdk-dev

On 28-Oct-20 7:00 AM, Mohakud, Amiya Ranjan wrote:
> Hi Anatoly,
>
> Are you back from vacation? Can you please let me know if there is any
> progress on this, or if anything is pending from my side?
>
> https://bugs.dpdk.org/show_bug.cgi?id=561
>
> Regards
>
> Amiya
>

Hi,

I've just checked DPDK v18.11.9 with kernel 5.8, and everything is
working fine. I'll try with the exact version you're using and try to
find a kernel 5.4 machine as well, and report back.

--
Thanks,
Anatoly
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel
  2020-10-29 15:51   ` Burakov, Anatoly
@ 2020-10-29 17:10     ` Burakov, Anatoly
  2020-10-29 17:40       ` Mohakud, Amiya Ranjan
  0 siblings, 1 reply; 16+ messages in thread
From: Burakov, Anatoly @ 2020-10-29 17:10 UTC
To: Mohakud, Amiya Ranjan, dpdk-dev

On 29-Oct-20 3:51 PM, Burakov, Anatoly wrote:
> On 28-Oct-20 7:00 AM, Mohakud, Amiya Ranjan wrote:
>> Hi Anatoly,
>>
>> Are you back from vacation? Can you please let me know if there is any
>> progress on this, or if anything is pending from my side?
>>
>> https://bugs.dpdk.org/show_bug.cgi?id=561
>>
>> Regards
>>
>> Amiya
>>
>
> Hi,
>
> I've just checked DPDK v18.11.9 with kernel 5.8, and everything is
> working fine. I'll try with the exact version you're using and try to
> find a kernel 5.4 machine as well, and report back.
>

Tried with kernel 5.4.35 and DPDK v18.11.6; cannot reproduce there either.

--
Thanks,
Anatoly
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel
  2020-10-29 17:10     ` Burakov, Anatoly
@ 2020-10-29 17:40       ` Mohakud, Amiya Ranjan
  2020-10-30 10:00         ` Burakov, Anatoly
  0 siblings, 1 reply; 16+ messages in thread
From: Mohakud, Amiya Ranjan @ 2020-10-29 17:40 UTC
To: Burakov, Anatoly, dpdk-dev

Hi Anatoly

Thanks for the reply.

Do you have sample code for primary and secondary processes which I
can try in my setup? Once I have reproduced the issue, I can let you know.

We have DPDK applications in our product with which I always see this issue.

Regards
Amiya

From: Burakov, Anatoly <anatoly.burakov@intel.com>
Sent: 29 October 2020 22:40
To: Mohakud, Amiya Ranjan <amohakud@rbbn.com>; dpdk-dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel

On 29-Oct-20 3:51 PM, Burakov, Anatoly wrote:
> On 28-Oct-20 7:00 AM, Mohakud, Amiya Ranjan wrote:
>> Hi Anatoly,
>>
>> Are you back from vacation? Can you please let me know if there is any
>> progress on this, or if anything is pending from my side?
>>
>> https://bugs.dpdk.org/show_bug.cgi?id=561
>>
>> Regards
>>
>> Amiya
>>
>
> Hi,
>
> I've just checked DPDK v18.11.9 with kernel 5.8, and everything is
> working fine. I'll try with the exact version you're using and try to
> find a kernel 5.4 machine as well, and report back.
>

Tried with kernel 5.4.35 and DPDK v18.11.6; cannot reproduce there either.

--
Thanks,
Anatoly
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel
  2020-10-29 17:40       ` Mohakud, Amiya Ranjan
@ 2020-10-30 10:00         ` Burakov, Anatoly
  2021-01-04  7:53           ` Mohakud, Amiya Ranjan
  0 siblings, 1 reply; 16+ messages in thread
From: Burakov, Anatoly @ 2020-10-30 10:00 UTC
To: Mohakud, Amiya Ranjan, dpdk-dev

On 29-Oct-20 5:40 PM, Mohakud, Amiya Ranjan wrote:
> Hi Anatoly
>
> Thanks for the reply.
>
> Do you have sample code for primary and secondary processes which I
> can try in my setup? Once I have reproduced the issue, I can let you know.
>
> We have DPDK applications in our product with which I always see this issue.
>
> Regards
>
> Amiya
>

Hi,

I was running a test app under a debugger, putting a breakpoint before
and after eal_clean_runtime_dir(), and checking whether anything was
deleted that shouldn't have been. Any old DPDK app will do for this
purpose, as all of them call EAL init.

--
Thanks,
Anatoly
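The before/after check described in this message can also be done without a debugger. The helper below is a sketch only and is not part of DPDK: count_matching() is a hypothetical name, and the directory and pattern are supplied by the caller. It lists directory entries whose names match a shell-style pattern and returns how many it found.

```c
#include <assert.h>
#include <dirent.h>
#include <fnmatch.h>
#include <stdio.h>

/*
 * Hypothetical helper: print every entry of `dir_path` whose name matches
 * the shell-style `pattern` and return the number of matches, or -1 if the
 * directory cannot be opened.
 */
static int count_matching(const char *dir_path, const char *pattern)
{
	DIR *dir;
	struct dirent *ent;
	int n = 0;

	assert(dir_path != NULL && pattern != NULL);
	dir = opendir(dir_path);
	if (dir == NULL)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		if (fnmatch(pattern, ent->d_name, 0) == 0) {
			printf("%s/%s\n", dir_path, ent->d_name);
			n++;
		}
	}
	closedir(dir);
	return n;
}
```

For example, calling count_matching("/var/run/dpdk/rte", "fbarray_*") in the primary process right after rte_eal_init() returns would show whether its fbarray files survived the cleanup. The path is the runtime directory named in the original report and may differ on other setups.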
* Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel
  2020-10-30 10:00         ` Burakov, Anatoly
@ 2021-01-04  7:53           ` Mohakud, Amiya Ranjan
  0 siblings, 0 replies; 16+ messages in thread
From: Mohakud, Amiya Ranjan @ 2021-01-04  7:53 UTC
To: Burakov, Anatoly, dpdk-dev

Hi Anatoly - Happy New Year!

I have updated the bug https://bugs.dpdk.org/show_bug.cgi?id=561 with my
latest comments. Please have a look and let me know if anything more is
required. Thanks!

Regards
Amiya

From: Burakov, Anatoly <anatoly.burakov@intel.com>
Sent: 30 October 2020 15:30
To: Mohakud, Amiya Ranjan <amohakud@rbbn.com>; dpdk-dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel

On 29-Oct-20 5:40 PM, Mohakud, Amiya Ranjan wrote:
> Hi Anatoly
>
> Thanks for the reply.
>
> Do you have sample code for primary and secondary processes which I
> can try in my setup? Once I have reproduced the issue, I can let you know.
>
> We have DPDK applications in our product with which I always see this issue.
>
> Regards
>
> Amiya
>

Hi,

I was running a test app under a debugger, putting a breakpoint before
and after eal_clean_runtime_dir(), and checking whether anything was
deleted that shouldn't have been. Any old DPDK app will do for this
purpose, as all of them call EAL init.

--
Thanks,
Anatoly
end of thread

Thread overview: 16+ messages
[not found] <DM6PR03MB3547EDC8BAAD1DA17AE4470EB9020@DM6PR03MB3547.namprd03.prod.outlook.com>
2020-10-15 13:26 ` [dpdk-dev] eal: DPDK: 18.11.6 version rte_eal_init() function cleans the runtime directory in 5.4.35 kernel Mohakud, Amiya Ranjan
2020-10-15 13:52 ` Burakov, Anatoly
2020-10-15 14:43 ` Mohakud, Amiya Ranjan
2020-10-15 15:08 ` Burakov, Anatoly
2020-10-15 16:07 ` Mohakud, Amiya Ranjan
2020-10-15 16:14 ` Mohakud, Amiya Ranjan
2020-10-15 18:01 ` Burakov, Anatoly
2020-10-15 18:34 ` Burakov, Anatoly
2020-10-16  5:34 ` Mohakud, Amiya Ranjan
2020-10-22  7:12 ` Mohakud, Amiya Ranjan
[not found] ` <DM6PR03MB3547C263D4E20B3DD124AD43B91D0@DM6PR03MB3547.namprd03.prod.outlook.com>
2020-10-28  7:00 ` Mohakud, Amiya Ranjan
2020-10-29 15:51 ` Burakov, Anatoly
2020-10-29 17:10 ` Burakov, Anatoly
2020-10-29 17:40 ` Mohakud, Amiya Ranjan
2020-10-30 10:00 ` Burakov, Anatoly
2021-01-04  7:53 ` Mohakud, Amiya Ranjan