* [dpdk-dev] Question on mlx5 PMD txq memory registration

From: Sagi Grimberg @ 2017-07-17 13:29 UTC
To: dev, Shahaf Shuler, Nélio Laranjeiro, Yongseok Koh, Roy Shterman, Alexander Solganik

Hi,

Looking at the code, it looks like mlx5 keeps an MR cache per TX queue
(each MR registers a rte_mempool).

Once a TX queue is created, mlx5 scans the existing mempools and
pre-registers an MR for each mempool it finds (using rte_mempool_walk).
For each MR registration that exceeds the TX queue cache size, it removes
the first entry from the cache and deregisters that MR (in txq_mp2mr_reg).

Now on TX burst, if the driver sees an mbuf from an unknown mempool, it
registers the mempool on the fly and again *deregisters* the first MR in
the cache.

My question is: what guarantees that no in-flight send operations are
posted on the TX queue when we deregister and remove an MR from the
cache?

AFAICT, it is the driver's responsibility to guarantee that it never
deregisters a memory region that has in-flight send operations posted,
otherwise the send operation *will* complete with a local protection
error. Is that taken care of?

Another question: why is the MR cache maintained per TX queue and not per
device? If the application starts N TX queues, then a single mempool will
be registered N times instead of just once. Having lots of MR instances
will pollute the device ICMC pretty badly. Am I missing something?

Thanks,
Sagi.
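For readers following along in the code, the per-queue lookup/eviction flow
being questioned is roughly the following. This is a simplified sketch with
hypothetical names (txq_mr_cache, txq_lookup_lkey, txq_register_mp) and an
assumed fixed cache depth; it is not the actual mlx5 implementation:

#include <stdint.h>
#include <infiniband/verbs.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define MR_CACHE_SIZE 8                 /* assumed per-queue cache depth */

struct txq_mr_cache {
        struct rte_mempool *mp;         /* mempool covered by the MR */
        uint32_t lkey;                  /* lkey placed in WQE data segments */
        struct ibv_mr *mr;              /* verbs MR handle */
};

/* Hypothetical helper: registers mp and, when the cache is full, evicts
 * the first entry and calls ibv_dereg_mr() on it -- the step questioned
 * above. Returns the lkey of the new registration. */
uint32_t txq_register_mp(struct txq_mr_cache *cache, struct rte_mempool *mp);

static inline uint32_t
txq_lookup_lkey(struct txq_mr_cache *cache, struct rte_mbuf *m)
{
        struct rte_mempool *mp = m->pool;
        unsigned int i;

        for (i = 0; i != MR_CACHE_SIZE && cache[i].mp != NULL; ++i)
                if (cache[i].mp == mp)
                        return cache[i].lkey;   /* hit: no registration */
        /* Miss on the hot path: register on the fly, possibly evicting an
         * MR that in-flight WQEs may still reference. */
        return txq_register_mp(cache, mp);
}

The point of contention is the eviction inside txq_register_mp(): nothing in
this flow waits for previously posted sends that still reference the evicted
MR.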
* Re: [dpdk-dev] Question on mlx5 PMD txq memory registration

From: Nélio Laranjeiro @ 2017-07-17 21:02 UTC
To: Sagi Grimberg
Cc: dev, Shahaf Shuler, Yongseok Koh, Roy Shterman, Alexander Solganik

Hello Sagi,

On Mon, Jul 17, 2017 at 04:29:34PM +0300, Sagi Grimberg wrote:
> Hi,
>
> Looking at the code, it looks like mlx5 keeps an MR cache per TX queue
> (each MR registers a rte_mempool).
>
> Once a TX queue is created, mlx5 scans the existing mempools and
> pre-registers an MR for each mempool it finds (using rte_mempool_walk).
> For each MR registration that exceeds the TX queue cache size, it
> removes the first entry from the cache and deregisters that MR (in
> txq_mp2mr_reg).
>
> Now on TX burst, if the driver sees an mbuf from an unknown mempool, it
> registers the mempool on the fly and again *deregisters* the first MR
> in the cache.
>
> My question is: what guarantees that no in-flight send operations are
> posted on the TX queue when we deregister and remove an MR from the
> cache?

There is none. If you send a burst of 9 packets, each one coming from a
different mempool, the first one will be dropped.

> AFAICT, it is the driver's responsibility to guarantee that it never
> deregisters a memory region that has in-flight send operations posted,
> otherwise the send operation *will* complete with a local protection
> error. Is that taken care of?

Up to now we have assumed that users know their application's needs and
can increase this cache size accordingly through the configuration item.
This way the limit and the guarantee become true.

> Another question: why is the MR cache maintained per TX queue and not
> per device? If the application starts N TX queues, then a single
> mempool will be registered N times instead of just once. Having lots
> of MR instances will pollute the device ICMC pretty badly. Am I
> missing something?

Having this cache per device would need a lock on the device structure
while threads are sending packets. Such locks cost cycles, which is why
the cache is per queue. Another point is that having several mempools
per device is common, whereas having several mempools per queue is not,
so it seems logical to have this cache per queue for those two reasons.

I am currently re-working this part of the code to improve it using
reference counters instead. The cache will remain for performance
purposes. This will fix the issues you are pointing out.

Are you facing some kind of issue? Maybe you can share it; it could help
to improve things.

Thanks,

--
Nélio Laranjeiro
6WIND
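As background to the cache-size discussion above, the setup-time
pre-registration step can be pictured as the sketch below. rte_mempool_walk()
is the real DPDK API; the callback and the txq_register_mp() helper reuse the
hypothetical names from the earlier sketch:

#include <rte_mempool.h>

/* Illustration only: walk every existing mempool at TX queue creation and
 * register an MR for each one, filling the per-queue cache up front. */
static void
txq_mp2mr_walk_cb(struct rte_mempool *mp, void *arg)
{
        struct txq_mr_cache *cache = arg;

        /* Same hypothetical helper as before: ibv_reg_mr() the mempool
         * memory and insert the lkey into the cache, evicting (and
         * deregistering) the oldest entry if the cache is already full. */
        (void)txq_register_mp(cache, mp);
}

static void
txq_fill_mr_cache(struct txq_mr_cache *cache)
{
        rte_mempool_walk(txq_mp2mr_walk_cb, cache);
}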
* Re: [dpdk-dev] Question on mlx5 PMD txq memory registration

From: Sagi Grimberg @ 2017-07-19 6:21 UTC
To: Nélio Laranjeiro
Cc: dev, Shahaf Shuler, Yongseok Koh, Roy Shterman, Alexander Solganik, Leon Romanovsky

> There is none. If you send a burst of 9 packets, each one coming from a
> different mempool, the first one will be dropped.

It's worse than just a drop: without debug enabled the error completion
is ignored, and the wqe_pi is taken from an invalid field, which leads to
bogus mbuf frees (elts_tail is not valid).

>> AFAICT, it is the driver's responsibility to guarantee that it never
>> deregisters a memory region that has in-flight send operations posted,
>> otherwise the send operation *will* complete with a local protection
>> error. Is that taken care of?
>
> Up to now we have assumed that users know their application's needs and
> can increase this cache size accordingly through the configuration
> item.
> This way the limit and the guarantee become true.

That is an undocumented assumption.

>> Another question: why is the MR cache maintained per TX queue and not
>> per device? If the application starts N TX queues, then a single
>> mempool will be registered N times instead of just once. Having lots
>> of MR instances will pollute the device ICMC pretty badly. Am I
>> missing something?
>
> Having this cache per device would need a lock on the device structure
> while threads are sending packets.

Not sure why it needs a lock at all. It *may* need RCU protection or an
rw_lock, if anything.

> Such locks cost cycles, which is why the cache is per queue. Another
> point is that having several mempools per device is common, whereas
> having several mempools per queue is not, so it seems logical to have
> this cache per queue for those two reasons.
>
> I am currently re-working this part of the code to improve it using
> reference counters instead. The cache will remain for performance
> purposes. This will fix the issues you are pointing out.

AFAICT, all this caching mechanism is just working around the fact that
mlx5 allocates resources on top of the existing verbs interface. I think
it should work like any other PMD, i.e. use the mbuf physical addresses.

The mlx5 device (like all other RDMA devices) has a global DMA lkey that
spans the entire physical address space. Just about all the kernel
drivers heavily use this lkey. IMO, the mlx5_pmd driver should be able to
query the kernel for this lkey and ask the kernel to create the QP with
the privilege level to post send/recv operations with that lkey.

And then mlx5_pmd becomes like other drivers, working with physical
addresses instead of working around the memory registration
sub-optimally.

And while we're on the subject, what is the plan for detaching mlx5_pmd
from its MLNX_OFED dependency? Mellanox has been doing a good job
upstreaming the needed features (rdma-core). CC'ing Leon (who is
co-maintaining the user-space rdma tree).
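To make the proposal above concrete: a device-wide reserved lkey would let
the TX path fill a WQE data segment straight from the mbuf, with no MR cache
at all. The sketch below is purely illustrative -- the data-segment struct is
a stand-in for the real (big-endian) mlx5 WQE layout, dma_lkey is the value
the driver would have to obtain from the kernel, and buf_physaddr is the
DPDK 17.x-era field name (later renamed buf_iova):

#include <stdint.h>
#include <rte_byteorder.h>
#include <rte_mbuf.h>

/* Illustrative data segment; mlx5 WQE fields are big-endian, hence the
 * byte swaps below. */
struct wqe_data_seg {
        uint32_t byte_count;
        uint32_t lkey;
        uint64_t addr;
};

static inline void
fill_data_seg(struct wqe_data_seg *dseg, struct rte_mbuf *m,
              uint32_t dma_lkey)        /* queried from the kernel once */
{
        dseg->byte_count = rte_cpu_to_be_32(rte_pktmbuf_data_len(m));
        dseg->lkey = rte_cpu_to_be_32(dma_lkey);
        /* Bus address of the packet data: no per-mempool lookup, no
         * cache, no deregistration hazard. */
        dseg->addr = rte_cpu_to_be_64(m->buf_physaddr + m->data_off);
}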
* Re: [dpdk-dev] Question on mlx5 PMD txq memory registration

From: Nélio Laranjeiro @ 2017-07-20 13:55 UTC
To: Sagi Grimberg, Shahaf Shuler
Cc: dev, Yongseok Koh, Roy Shterman, Alexander Solganik, Leon Romanovsky

On Wed, Jul 19, 2017 at 09:21:39AM +0300, Sagi Grimberg wrote:
>
>> There is none. If you send a burst of 9 packets, each one coming from
>> a different mempool, the first one will be dropped.
>
> It's worse than just a drop: without debug enabled the error completion
> is ignored, and the wqe_pi is taken from an invalid field, which leads
> to bogus mbuf frees (elts_tail is not valid).

Right.

>>> AFAICT, it is the driver's responsibility to guarantee that it never
>>> deregisters a memory region that has in-flight send operations
>>> posted, otherwise the send operation *will* complete with a local
>>> protection error. Is that taken care of?
>>
>> Up to now we have assumed that users know their application's needs
>> and can increase this cache size accordingly through the configuration
>> item.
>> This way the limit and the guarantee become true.
>
> That is an undocumented assumption.

I agree it should be documented; in reality you are the first one we
know of having this issue.

>>> Another question: why is the MR cache maintained per TX queue and not
>>> per device? If the application starts N TX queues, then a single
>>> mempool will be registered N times instead of just once. Having lots
>>> of MR instances will pollute the device ICMC pretty badly. Am I
>>> missing something?
>>
>> Having this cache per device would need a lock on the device structure
>> while threads are sending packets.
>
> Not sure why it needs a lock at all. It *may* need RCU protection or an
> rw_lock, if anything.

TX queues may run on several CPUs; there is a need to be sure this array
cannot be modified by two threads at the same time. Anyway, it is
costly.

>> Such locks cost cycles, which is why the cache is per queue. Another
>> point is that having several mempools per device is common, whereas
>> having several mempools per queue is not, so it seems logical to have
>> this cache per queue for those two reasons.
>>
>> I am currently re-working this part of the code to improve it using
>> reference counters instead. The cache will remain for performance
>> purposes. This will fix the issues you are pointing out.
>
> AFAICT, all this caching mechanism is just working around the fact that
> mlx5 allocates resources on top of the existing verbs interface. I
> think it should work like any other PMD, i.e. use the mbuf physical
> addresses.
>
> The mlx5 device (like all other RDMA devices) has a global DMA lkey
> that spans the entire physical address space. Just about all the kernel
> drivers heavily use this lkey. IMO, the mlx5_pmd driver should be able
> to query the kernel for this lkey and ask the kernel to create the QP
> with the privilege level to post send/recv operations with that lkey.
>
> And then mlx5_pmd becomes like other drivers, working with physical
> addresses instead of working around the memory registration
> sub-optimally.

It is one possibility discussed also with the Mellanox guys; the point
is that this breaks the security model, which is also important. If this
is added in the future it will certainly be as an option, so that both
will be possible and the application can then choose between security
and performance.

I don't know of any planning on this from the Mellanox side, maybe
Shahaf does.

> And while we're on the subject, what is the plan for detaching mlx5_pmd
> from its MLNX_OFED dependency? Mellanox has been doing a good job
> upstreaming the needed features (rdma-core). CC'ing Leon (who is
> co-maintaining the user-space rdma tree).

This is also in progress on the PMD side; it should be part of the next
DPDK release.

--
Nélio Laranjeiro
6WIND
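For illustration, the reference-counter rework mentioned in this exchange
could look roughly like the sketch below (hypothetical structure and names;
the actual patches may differ): the MR is only deregistered when the last
user drops its reference, so evicting an entry from one queue's cache can no
longer invalidate an lkey that in-flight sends still use.

#include <infiniband/verbs.h>
#include <rte_atomic.h>
#include <rte_mempool.h>

/* Illustrative reference-counted MR wrapper. */
struct mr_entry {
        struct ibv_mr *mr;
        struct rte_mempool *mp;
        rte_atomic32_t refcnt;          /* one reference per user/queue */
};

static inline void
mr_get(struct mr_entry *e)
{
        rte_atomic32_inc(&e->refcnt);
}

static inline void
mr_put(struct mr_entry *e)
{
        /* ibv_dereg_mr() runs only when the last reference is dropped,
         * never while another queue may still post sends using
         * e->mr->lkey. */
        if (rte_atomic32_dec_and_test(&e->refcnt))
                (void)ibv_dereg_mr(e->mr);
        /* freeing/recycling the entry itself is left out of the sketch */
}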
* Re: [dpdk-dev] Question on mlx5 PMD txq memory registration

From: Sagi Grimberg @ 2017-07-20 14:06 UTC
To: Nélio Laranjeiro, Shahaf Shuler
Cc: dev, Yongseok Koh, Roy Shterman, Alexander Solganik, Leon Romanovsky

>> It's worse than just a drop: without debug enabled the error
>> completion is ignored, and the wqe_pi is taken from an invalid field,
>> which leads to bogus mbuf frees (elts_tail is not valid).
>
> Right.

A simple work-around would be to fill in a correct tail so that error
completions still have it (although I'm not sure which fields are
reliable other than the status in error completions).

>> Not sure why it needs a lock at all. It *may* need RCU protection or
>> an rw_lock, if anything.
>
> TX queues may run on several CPUs; there is a need to be sure this
> array cannot be modified by two threads at the same time. Anyway, it is
> costly.

As I said, there are primitives which are designed to handle frequent
reads and rare mutations.

>> AFAICT, all this caching mechanism is just working around the fact
>> that mlx5 allocates resources on top of the existing verbs interface.
>> I think it should work like any other PMD, i.e. use the mbuf physical
>> addresses.
>>
>> The mlx5 device (like all other RDMA devices) has a global DMA lkey
>> that spans the entire physical address space. Just about all the
>> kernel drivers heavily use this lkey. IMO, the mlx5_pmd driver should
>> be able to query the kernel for this lkey and ask the kernel to create
>> the QP with the privilege level to post send/recv operations with that
>> lkey.
>>
>> And then mlx5_pmd becomes like other drivers, working with physical
>> addresses instead of working around the memory registration
>> sub-optimally.
>
> It is one possibility discussed also with the Mellanox guys; the point
> is that this breaks the security model, which is also important.

What security aspect? The entire dpdk model builds on top of physical
address awareness running under root permissions. I'm not saying to
expose it to the application nor to grant remote permissions to the
physical space.

mlx5_pmd is a network driver, and as a driver it should be allowed to
use the device DMA lkey as it sees fit. I honestly think it's pretty
much mandatory considering the work-arounds mlx5_pmd tries to do (which
we agreed are broken).

> If this is added in the future it will certainly be as an option, so
> that both will be possible and the application can then choose between
> security and performance.

Why should the application even be aware of that? Does any other driver
expose to the user how it maps pkt mbufs to the NIC? Just like the MR
handling, it's 100% internal to mlx5 and there is no reason why the user
should ever be exposed to any of these details.

> I don't know of any planning on this from the Mellanox side, maybe
> Shahaf does.

rdma-core has a very nice vendor extension mechanism (which is important
because we really don't want to pollute the verbs API just for dpdk).
It's very easy to expose the DMA lkey and create the TX queue-pairs with
reserved-lkey attributes via this mechanism. Just the kernel needs to
verify root permissions before exposing it.

>> And while we're on the subject, what is the plan for detaching
>> mlx5_pmd from its MLNX_OFED dependency? Mellanox has been doing a good
>> job upstreaming the needed features (rdma-core). CC'ing Leon (who is
>> co-maintaining the user-space rdma tree).
>
> This is also in progress on the PMD side; it should be part of the next
> DPDK release.

That is *very* good to hear! Can you guys share a branch? I'm willing to
take it for testing.
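For reference, the "frequent reads, rare mutations" pattern argued for above
could be expressed with DPDK's reader-writer lock as in the sketch below (an
RCU-style scheme would make the read side cheaper still). Names and sizes are
illustrative, and mr_entry is the hypothetical refcounted entry from the
earlier sketch:

#include <stdint.h>
#include <rte_mempool.h>
#include <rte_rwlock.h>

#define DEV_MR_CACHE_SIZE 64            /* assumed device-wide bound */

struct dev_mr_cache {
        rte_rwlock_t lock;
        unsigned int n;
        struct mr_entry *entries[DEV_MR_CACHE_SIZE];
};

static inline uint32_t
dev_lookup_lkey(struct dev_mr_cache *c, const struct rte_mempool *mp)
{
        uint32_t lkey = UINT32_MAX;     /* sentinel: not found */
        unsigned int i;

        /* Read lock on the hot path: many TX queues hold it concurrently. */
        rte_rwlock_read_lock(&c->lock);
        for (i = 0; i != c->n; ++i)
                if (c->entries[i]->mp == mp) {
                        lkey = c->entries[i]->mr->lkey;
                        break;
                }
        rte_rwlock_read_unlock(&c->lock);
        /* The write lock is taken only on the rare registration/teardown
         * path, so readers almost never contend. */
        return lkey;
}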
* Re: [dpdk-dev] Question on mlx5 PMD txq memory registration

From: Shahaf Shuler @ 2017-07-20 15:20 UTC
To: Sagi Grimberg, Nélio Laranjeiro
Cc: dev, Yongseok Koh, Roy Shterman, Alexander Solganik, Leon Romanovsky

Hi Sagi,

Thursday, July 20, 2017 5:06 PM, Sagi Grimberg:
>>> It's worse than just a drop: without debug enabled the error
>>> completion is ignored, and the wqe_pi is taken from an invalid field,
>>> which leads to bogus mbuf frees (elts_tail is not valid).
>>
>> Right.
>
> A simple work-around would be to fill in a correct tail so that error
> completions still have it (although I'm not sure which fields are
> reliable other than the status in error completions).

As Nélio said, we have patches which solve the issue from the root
cause. In the meanwhile, the workaround is to have a large enough MR
cache. I agree documentation is missing.

>>> Not sure why it needs a lock at all. It *may* need RCU protection or
>>> an rw_lock, if anything.
>>
>> TX queues may run on several CPUs; there is a need to be sure this
>> array cannot be modified by two threads at the same time. Anyway, it
>> is costly.
>
> As I said, there are primitives which are designed to handle frequent
> reads and rare mutations.

Even with such primitives, locking rarely is still more than never
locking.

>>> AFAICT, all this caching mechanism is just working around the fact
>>> that mlx5 allocates resources on top of the existing verbs interface.
>>> I think it should work like any other PMD, i.e. use the mbuf physical
>>> addresses.
>>>
>>> The mlx5 device (like all other RDMA devices) has a global DMA lkey
>>> that spans the entire physical address space. Just about all the
>>> kernel drivers heavily use this lkey. IMO, the mlx5_pmd driver should
>>> be able to query the kernel for this lkey and ask the kernel to
>>> create the QP with the privilege level to post send/recv operations
>>> with that lkey.
>>>
>>> And then mlx5_pmd becomes like other drivers, working with physical
>>> addresses instead of working around the memory registration
>>> sub-optimally.
>>
>> It is one possibility discussed also with the Mellanox guys; the point
>> is that this breaks the security model, which is also important.
>
> What security aspect? The entire dpdk model builds on top of physical
> address awareness running under root permissions. I'm not saying to
> expose it to the application nor to grant remote permissions to the
> physical space.
>
> mlx5_pmd is a network driver, and as a driver it should be allowed to
> use the device DMA lkey as it sees fit. I honestly think it's pretty
> much mandatory considering the work-arounds mlx5_pmd tries to do (which
> we agreed are broken).

True, there are many PMDs which can work only with physical memory.
However, Mellanox NICs have the option to work with virtual memory and
thus provide more security. The fact that you run under root doesn't
mean you have privileges to access every physical page on the server
(even if you try very hard to be aware).

The issue here, AFAIU, is performance. We are now looking into ways to
provide the same performance as if there were only a single lkey, while
preserving the security feature.

>> If this is added in the future it will certainly be as an option, so
>> that both will be possible and the application can then choose between
>> security and performance.
>
> Why should the application even be aware of that? Does any other driver
> expose to the user how it maps pkt mbufs to the NIC? Just like the MR
> handling, it's 100% internal to mlx5 and there is no reason why the
> user should ever be exposed to any of these details.

Another option is the reserved lkey, as you suggested, but it will lose
the security guarantees. Like every performance optimization, it should
be the application's decision. In fact, there are some discussions on
the ML about exposing the option to use VA instead of PA [1].

>> I don't know of any planning on this from the Mellanox side, maybe
>> Shahaf does.
>
> rdma-core has a very nice vendor extension mechanism (which is
> important because we really don't want to pollute the verbs API just
> for dpdk). It's very easy to expose the DMA lkey and create the TX
> queue-pairs with reserved-lkey attributes via this mechanism. Just the
> kernel needs to verify root permissions before exposing it.
>
>>> And while we're on the subject, what is the plan for detaching
>>> mlx5_pmd from its MLNX_OFED dependency? Mellanox has been doing a
>>> good job upstreaming the needed features (rdma-core). CC'ing Leon
>>> (who is co-maintaining the user-space rdma tree).
>>
>> This is also in progress on the PMD side; it should be part of the
>> next DPDK release.
>
> That is *very* good to hear! Can you guys share a branch? I'm willing
> to take it for testing.

The branch is still premature. It may be good enough for external
testing in about two weeks. Contact me directly and I will provide it to
you.

[1] http://dpdk.org/ml/archives/dev/2017-June/067156.html
* Re: [dpdk-dev] Question on mlx5 PMD txq memory registration

From: Sagi Grimberg @ 2017-07-20 16:22 UTC
To: Shahaf Shuler, Nélio Laranjeiro
Cc: dev, Yongseok Koh, Roy Shterman, Alexander Solganik, Leon Romanovsky

>> As I said, there are primitives which are designed to handle frequent
>> reads and rare mutations.
>
> Even with such primitives, locking rarely is still more than never
> locking.

You do realize that the cache mutation involves ibv_dereg_mr(), right?
Any locking scheme for mutation is negligible compared to that, and
rcu_read_lock is practically free.

>>> It is one possibility discussed also with the Mellanox guys; the
>>> point is that this breaks the security model, which is also
>>> important.
>>
>> What security aspect? The entire dpdk model builds on top of physical
>> address awareness running under root permissions. I'm not saying to
>> expose it to the application nor to grant remote permissions to the
>> physical space.
>>
>> mlx5_pmd is a network driver, and as a driver it should be allowed to
>> use the device DMA lkey as it sees fit. I honestly think it's pretty
>> much mandatory considering the work-arounds mlx5_pmd tries to do
>> (which we agreed are broken).
>
> True, there are many PMDs which can work only with physical memory.
> However, Mellanox NICs have the option to work with virtual memory and
> thus provide more security.

I don't understand the security argument. It's completely private to the
driver. Anything under librte is equivalent to an OS wrt networking, so
I fail to see what the security feature is that you're talking about.

> The fact that you run under root doesn't mean you have privileges to
> access every physical page on the server (even if you try very hard to
> be aware).

But dpdk core mbuf structs are built this way.

> The issue here, AFAIU, is performance.
> We are now looking into ways to provide the same performance as if
> there were only a single lkey, while preserving the security feature.

Hmm, what exactly do you have in mind?

I'm hoping that you are not referring to ODP. If you are, I think the
latency unpredictability would be a huge non-starter; page faults are
way too expensive for dpdk users.

Again, personally, I don't think virtual memory registrations are very
useful for a network driver. It's a driver, not an application.

>>> If this is added in the future it will certainly be as an option, so
>>> that both will be possible and the application can then choose
>>> between security and performance.
>>
>> Why should the application even be aware of that? Does any other
>> driver expose to the user how it maps pkt mbufs to the NIC? Just like
>> the MR handling, it's 100% internal to mlx5 and there is no reason why
>> the user should ever be exposed to any of these details.
>
> Another option is the reserved lkey, as you suggested, but it will lose
> the security guarantees.

OK, obviously I'm missing something. Can you articulate what you mean by
"security guarantees"?

> Like every performance optimization, it should be the application's
> decision.

I tend to disagree with you on this. The application must never be aware
of the nitty-gritty details of the underlying driver implementation. In
fact, propagating this information to the application sounds like a
layering violation.

> In fact, there are some discussions on the ML about exposing the option
> to use VA instead of PA [1].

My understanding is that the correct proposal was to hide this knowledge
from the user. Also, the use-case is not security (which I'm not even
sure what it means yet).

>>> This is also in progress on the PMD side; it should be part of the
>>> next DPDK release.
>>
>> That is *very* good to hear! Can you guys share a branch? I'm willing
>> to take it for testing.
>
> The branch is still premature. It may be good enough for external
> testing in about two weeks.
> Contact me directly and I will provide it to you.

Awesome!

Thanks.
* Re: [dpdk-dev] Question on mlx5 PMD txq memory registration

From: Shahaf Shuler @ 2017-07-23 8:17 UTC
To: Sagi Grimberg, Nélio Laranjeiro
Cc: dev, Yongseok Koh, Roy Shterman, Alexander Solganik, Leon Romanovsky

Thursday, July 20, 2017 7:22 PM, Sagi Grimberg:
>>> As I said, there are primitives which are designed to handle frequent
>>> reads and rare mutations.
>>
>> Even with such primitives, locking rarely is still more than never
>> locking.
>
> You do realize that the cache mutation involves ibv_dereg_mr(), right?
> Any locking scheme for mutation is negligible compared to that, and
> rcu_read_lock is practically free.

The fix for the issue you presented includes not deregistering MRs on
the datapath. MRs will be deregistered only on device close. MR
replacement in the cache can still happen, but it won't involve any
deregistration.

>>>> It is one possibility discussed also with the Mellanox guys; the
>>>> point is that this breaks the security model, which is also
>>>> important.
>>>
>>> What security aspect? The entire dpdk model builds on top of physical
>>> address awareness running under root permissions. I'm not saying to
>>> expose it to the application nor to grant remote permissions to the
>>> physical space.
>>>
>>> mlx5_pmd is a network driver, and as a driver it should be allowed to
>>> use the device DMA lkey as it sees fit. I honestly think it's pretty
>>> much mandatory considering the work-arounds mlx5_pmd tries to do
>>> (which we agreed are broken).
>>
>> True, there are many PMDs which can work only with physical memory.
>> However, Mellanox NICs have the option to work with virtual memory and
>> thus provide more security.
>
> I don't understand the security argument. It's completely private to
> the driver. Anything under librte is equivalent to an OS wrt
> networking, so I fail to see what the security feature is that you're
> talking about.

You are correct that as root you are able to do whatever you want on the
server. The security I refer to is protection against badly written
code.

The fact that the PMD only registers the mempools, and uses the device
engine to translate the VA, provides some protection. For example, one
DPDK process will not be able to access the memory of another DPDK
process *by mistake*.

I am not saying that using the reserved lkey is not a good suggestion,
and we plan to test its value. All I am saying is that there may be
other options to provide the same performance with the extra protection
mentioned above. One of them can be to use indirect keys: one indirect
key to represent the 64-bit memory area, with other regular keys for the
hugepages. The rest of the memory area can be filled with keys pointing
to /dev/null.

>> The fact that you run under root doesn't mean you have privileges to
>> access every physical page on the server (even if you try very hard to
>> be aware).
>
> But dpdk core mbuf structs are built this way.
>
>> The issue here, AFAIU, is performance.
>> We are now looking into ways to provide the same performance as if
>> there were only a single lkey, while preserving the security feature.
>
> Hmm, what exactly do you have in mind?
>
> I'm hoping that you are not referring to ODP. If you are, I think the
> latency unpredictability would be a huge non-starter; page faults are
> way too expensive for dpdk users.

No ODP :).
As all relevant DPDK memory is on top of hugepages, there is no reason
to avoid registration and pinning in advance.

> Again, personally, I don't think virtual memory registrations are very
> useful for a network driver. It's a driver, not an application.
>
>>>> If this is added in the future it will certainly be as an option, so
>>>> that both will be possible and the application can then choose
>>>> between security and performance.
>>>
>>> Why should the application even be aware of that? Does any other
>>> driver expose to the user how it maps pkt mbufs to the NIC? Just like
>>> the MR handling, it's 100% internal to mlx5 and there is no reason
>>> why the user should ever be exposed to any of these details.
>>
>> Another option is the reserved lkey, as you suggested, but it will
>> lose the security guarantees.
>
> OK, obviously I'm missing something. Can you articulate what you mean
> by "security guarantees"?
>
>> Like every performance optimization, it should be the application's
>> decision.
>
> I tend to disagree with you on this. The application must never be
> aware of the nitty-gritty details of the underlying driver
> implementation. In fact, propagating this information to the
> application sounds like a layering violation.
>
>> In fact, there are some discussions on the ML about exposing the
>> option to use VA instead of PA [1].
>
> My understanding is that the correct proposal was to hide this
> knowledge from the user. Also, the use-case is not security (which I'm
> not even sure what it means yet).
>
>>>> This is also in progress on the PMD side; it should be part of the
>>>> next DPDK release.
>>>
>>> That is *very* good to hear! Can you guys share a branch? I'm willing
>>> to take it for testing.
>>
>> The branch is still premature. It may be good enough for external
>> testing in about two weeks.
>> Contact me directly and I will provide it to you.
>
> Awesome!
>
> Thanks.
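As an illustration of "registration and pinning in advance", the sketch below
registers every hugepage-backed EAL memory segment once at device start. It
assumes the DPDK 17.x rte_eal_get_physmem_layout() API (later DPDK releases
replaced it with rte_memseg_walk()); error handling and cleanup are omitted:

#include <infiniband/verbs.h>
#include <rte_memory.h>

/* Register each EAL memory segment against the protection domain once,
 * so the datapath never registers or deregisters anything. */
static int
register_all_memsegs(struct ibv_pd *pd, struct ibv_mr **mrs, unsigned int max)
{
        const struct rte_memseg *ms = rte_eal_get_physmem_layout();
        unsigned int i, n = 0;

        for (i = 0; i != RTE_MAX_MEMSEG && ms[i].addr != NULL; ++i) {
                if (n == max)
                        return -1;
                mrs[n] = ibv_reg_mr(pd, ms[i].addr, ms[i].len,
                                    IBV_ACCESS_LOCAL_WRITE);
                if (mrs[n] == NULL)
                        return -1;      /* real code would unwind */
                n++;
        }
        return (int)n;
}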
* Re: [dpdk-dev] Question on mlx5 PMD txq memory registration

From: Sagi Grimberg @ 2017-07-23 9:03 UTC
To: Shahaf Shuler, Nélio Laranjeiro
Cc: dev, Yongseok Koh, Roy Shterman, Alexander Solganik, Leon Romanovsky

>> I don't understand the security argument. It's completely private to
>> the driver. Anything under librte is equivalent to an OS wrt
>> networking, so I fail to see what the security feature is that you're
>> talking about.
>
> You are correct that as root you are able to do whatever you want on
> the server. The security I refer to is protection against badly
> written code.
>
> The fact that the PMD only registers the mempools, and uses the device
> engine to translate the VA, provides some protection. For example, one
> DPDK process will not be able to access the memory of another DPDK
> process *by mistake*.

Well, this is a fair argument, but without a *complete* solution for all
of dpdk's peripherals it has very little merit (if at all). Badly
written code can just as easily crash a server by passing an mbuf to a
crypto device or another network device that co-exists with mlx5.

So, while I understand the argument, I think its value is not worth the
hassle that mlx5_pmd needs to take to achieve it. Did this come from a
real requirement (from a real implementation)?

> I am not saying that using the reserved lkey is not a good suggestion,
> and we plan to test its value. All I am saying is that there may be
> other options to provide the same performance with the extra
> protection mentioned above. One of them can be to use indirect keys:
> one indirect key to represent the 64-bit memory area, with other
> regular keys for the hugepages. The rest of the memory area can be
> filled with keys pointing to /dev/null.

If I understand what you are suggesting, this would trigger out-of-order
transfers on an indirect memory key just about always (each transfer can
originate from a different hugepage, and SGL resolution alone will
require a walk of the memory key context SGL list). I'm afraid this
would introduce very bad performance scaling, because an SGL context
(klm) would need to be fetched from the ICM for essentially every send
operation.

Having said that, it's just my 2 cents; if your solution works then I
don't really care. You are the one testing it...

>>> The fact that you run under root doesn't mean you have privileges to
>>> access every physical page on the server (even if you try very hard
>>> to be aware).
>>
>> But dpdk core mbuf structs are built this way.
>>
>>> The issue here, AFAIU, is performance.
>>> We are now looking into ways to provide the same performance as if
>>> there were only a single lkey, while preserving the security feature.
>>
>> Hmm, what exactly do you have in mind?
>>
>> I'm hoping that you are not referring to ODP. If you are, I think the
>> latency unpredictability would be a huge non-starter; page faults are
>> way too expensive for dpdk users.
>
> No ODP :).
> As all relevant DPDK memory is on top of hugepages, there is no reason
> to avoid registration and pinning in advance.

Agree.
* Re: [dpdk-dev] Question on mlx5 PMD txq memory registration

From: Bruce Richardson @ 2017-07-24 13:44 UTC
To: Sagi Grimberg
Cc: Shahaf Shuler, Nélio Laranjeiro, dev, Yongseok Koh, Roy Shterman, Alexander Solganik, Leon Romanovsky

On Sun, Jul 23, 2017 at 12:03:41PM +0300, Sagi Grimberg wrote:
>
>>> I don't understand the security argument. It's completely private to
>>> the driver. Anything under librte is equivalent to an OS wrt
>>> networking, so I fail to see what the security feature is that you're
>>> talking about.
>>
>> You are correct that as root you are able to do whatever you want on
>> the server. The security I refer to is protection against badly
>> written code.
>>
>> The fact that the PMD only registers the mempools, and uses the device
>> engine to translate the VA, provides some protection. For example, one
>> DPDK process will not be able to access the memory of another DPDK
>> process *by mistake*.
>
> Well, this is a fair argument, but without a *complete* solution for
> all of dpdk's peripherals it has very little merit (if at all). Badly
> written code can just as easily crash a server by passing an mbuf to a
> crypto device or another network device that co-exists with mlx5.
>
> So, while I understand the argument, I think its value is not worth the
> hassle that mlx5_pmd needs to take to achieve it. Did this come from a
> real requirement (from a real implementation)?
>
Would using VFIO (and the IOMMU) not allow us to provide an equivalent
level of security to what is provided by the current scheme? From what I
see on-list there are a few folks already looking into that area, and
taking advantage of the IOMMU should improve the security of all devices
in DPDK.

/Bruce
* Re: [dpdk-dev] Question on mlx5 PMD txq memory registration

From: Sagi Grimberg @ 2017-07-27 10:48 UTC
To: Bruce Richardson
Cc: Shahaf Shuler, Nélio Laranjeiro, dev, Yongseok Koh, Roy Shterman, Alexander Solganik, Leon Romanovsky

>> Well, this is a fair argument, but without a *complete* solution for
>> all of dpdk's peripherals it has very little merit (if at all). Badly
>> written code can just as easily crash a server by passing an mbuf to a
>> crypto device or another network device that co-exists with mlx5.
>>
>> So, while I understand the argument, I think its value is not worth
>> the hassle that mlx5_pmd needs to take to achieve it. Did this come
>> from a real requirement (from a real implementation)?
>
> Would using VFIO (and the IOMMU) not allow us to provide an equivalent
> level of security to what is provided by the current scheme?

mlx5 does not take over the device with vfio; it simply asks the kernel
to set up resources for it and sets a MAC steering rule to direct
traffic to its own rings. Also, I'm not aware of any way to enforce that
the iommu is enabled.

> From what I see on-list there are a few folks already looking into that
> area, and taking advantage of the IOMMU should improve the security of
> all devices in DPDK.

I agree that this can be improved in dpdk. I was simply arguing that the
mlx5 guarantees alone are not very valuable, especially considering the
work-arounds taken in mlx5 to achieve them. mlx5 can be converted to
take over the device with vfio and simply not deal with memory
registration aspects, but that is really up to the mlx5 maintainers.