Hi all,
I have a design question regarding rte_ring that I didn't find a historical rationale for in the archives.
Many modern high-performance ring buffers (e.g. in some NIC drivers and DB queue implementations) eliminate wrap-around branches by mapping two consecutive VA ranges onto the same physical backing pages for the ring's element array.
i.e. you allocate N elements, commit enough pages to cover them, then call mmap (or equivalent) again immediately following the first mapping, pointing at the same physical pages. From the CPU's point of view the element array is logically [0 .. 2N), but physically it is the same backing. Therefore a batch read/write can always be done as a single contiguous memcpy / rep movs without conditionals, even if (head + bulk) exceeds N.
Pseudo illustration:

    phys:  one buffer of N slots
    VA [0 .. N)   -> phys
    VA [N .. 2N)  -> same phys
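For concreteness, a minimal Linux sketch of such a double mapping (my own illustration, not DPDK code): reserve a 2*len VA window, then overlay both halves onto the same memfd. Assumes memfd_create() and a page-aligned len; error handling trimmed.

#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the same physical pages at [base, base+len) and [base+len, base+2len). */
static void *double_map(size_t len)            /* len must be page-aligned */
{
    int fd = memfd_create("ring", 0);
    if (fd < 0 || ftruncate(fd, len) < 0)
        return NULL;

    /* Reserve a contiguous 2*len virtual window. */
    uint8_t *base = mmap(NULL, 2 * len, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Overlay both halves, both at file offset 0 of the same memfd. */
    mmap(base, len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    mmap(base + len, len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);
    return base;                               /* base[i] aliases base[i + len] */
}

A store through base[0] is then visible at base[len], so a copy of up to len bytes starting at any offset never has to be split.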
For multi-element enqueue/dequeue it eliminates the "if wrapped, split the copy" case entirely: you can always memcpy in one contiguous op.
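To make that concrete, a hedged sketch of the two copy paths (hypothetical names and a fixed power-of-two N; not rte_ring's actual layout):

#include <stdint.h>
#include <string.h>

#define N 1024u    /* slot count, power of two (hypothetical) */

/* Classic single mapping: a wrapping batch needs a branch and two copies. */
static void enqueue_split(uint64_t *slots, uint32_t head,
                          const uint64_t *src, uint32_t n)
{
    uint32_t idx = head & (N - 1);
    if (idx + n <= N) {
        memcpy(&slots[idx], src, n * sizeof(*slots));
    } else {
        uint32_t first = N - idx;
        memcpy(&slots[idx], src, first * sizeof(*slots));
        memcpy(&slots[0], src + first, (n - first) * sizeof(*slots));
    }
}

/* Double mapping: slots[N .. 2N) aliases slots[0 .. N), so any batch of
 * n <= N elements is always one contiguous copy, no branch. */
static void enqueue_flat(uint64_t *slots, uint32_t head,
                         const uint64_t *src, uint32_t n)
{
    memcpy(&slots[head & (N - 1)], src, n * sizeof(*slots));
}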
Question:
Is there an explicit reason DPDK doesn't use this technique for rte_ring?
I just want to understand the architectural trade-off that was made.
Because on 64-bit Linux double-mapping a 1–2 MB region is pretty trivial, and bulk ops in DPDK are extremely common, it feels like an obvious microarchitectural win for the "hot ring" case.
So: is there a concrete blocker, or simply "no one pushed a patch because current perf was 'good enough'"?
Pointers to prior mailing list discussion / patches would be appreciated.
Thanks,
Rami Neiman

