Date: Tue, 17 Nov 2015 14:49:03 -0800
From: Flavio Leitner
To: "Michael S. Tsirkin"
Cc: dev@dpdk.org, marcel@redhat.com
Message-ID: <20151117224903.GB2340@x240.home>
In-Reply-To: <20151117094416-mutt-send-email-mst@redhat.com>
Subject: Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index

On Tue, Nov 17, 2015 at 10:23:38AM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 16, 2015 at 02:20:57PM -0800, Flavio Leitner wrote:
> > On Wed, Oct 28, 2015 at 11:12:25PM +0200, Michael S. Tsirkin wrote:
> > > On Wed, Oct 28, 2015 at 06:30:41PM -0200, Flavio Leitner wrote:
> > > > On Sat, Oct 24, 2015 at 08:47:10PM +0300, Michael S. Tsirkin wrote:
> > > > > On Sat, Oct 24, 2015 at 12:34:08AM -0200, Flavio Leitner wrote:
> > > > > > On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> > > > > > > > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > > > > > > > > Please note that for virtio devices, guest is supposed to
> > > > > > > > > > > control the placement of incoming packets in RX queues.
> > > > > > > > > >
> > > > > > > > > > I may not follow you.
> > > > > > > > > >
> > > > > > > > > > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > > > > > > > > > guest, how could the guest take the control here?
> > > > > > > > > >
> > > > > > > > > > --yliu
> > > > > > > > >
> > > > > > > > > vhost should do what guest told it to.
> > > > > > > > >
> > > > > > > > > See virtio spec:
> > > > > > > > > 5.1.6.5.5 Automatic receive steering in multiqueue mode
> > > > > > > >
> > > > > > > > Spec says:
> > > > > > > >
> > > > > > > > After the driver transmitted a packet of a flow on transmitqX,
> > > > > > > > the device SHOULD cause incoming packets for that flow to be
> > > > > > > > steered to receiveqX.
> > > > > > > >
> > > > > > > > Michael, I still have no idea how vhost could know the flow even
> > > > > > > > after discussion with Huawei. Could you be more specific about
> > > > > > > > this? Say, how could guest know that? And how could guest tell
> > > > > > > > vhost which RX it is going to use?
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > --yliu
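As I read 5.1.6.5.5, the device side only has to remember which
transmitq the guest last used for a flow, and steer receive traffic for
that flow to the paired receiveq.  A minimal sketch of that mechanism
(the flow table and the hash key are hypothetical, just to illustrate
the spec text, not the vhost lib code):

#include <stdint.h>

#define STEER_TABLE_SIZE 1024          /* hypothetical per-device table */

struct steer_entry {
    uint64_t flow_key;                 /* e.g. a 5-tuple hash of the packet */
    uint16_t queue;                    /* last transmitq the guest used for it */
    uint8_t  valid;
};

static struct steer_entry steer_tab[STEER_TABLE_SIZE];

/* TX path: remember which queue the guest picked for this flow. */
static void steer_note_tx(uint64_t flow_key, uint16_t txq)
{
    struct steer_entry *e = &steer_tab[flow_key % STEER_TABLE_SIZE];

    e->flow_key = flow_key;
    e->queue = txq;
    e->valid = 1;
}

/* RX path: steer the flow to the paired receiveq; flows never seen on
 * TX fall back to a default queue. */
static uint16_t steer_select_rxq(uint64_t flow_key, uint16_t default_q)
{
    const struct steer_entry *e = &steer_tab[flow_key % STEER_TABLE_SIZE];

    if (e->valid && e->flow_key == flow_key)
        return e->queue;               /* receiveqX paired with transmitqX */

    return default_q;
}

Whether that table lives in the vhost lib or in the vswitch feeding it,
and who pays for the per-packet lookup, is exactly what is being
discussed below.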
> > > > > > >
> > > > > > > I don't really understand the question.
> > > > > > >
> > > > > > > When guest transmits a packet, it makes a decision
> > > > > > > about the flow to use, and maps that to a tx/rx pair of queues.
> > > > > > >
> > > > > > > It sends packets out on the tx queue and expects device to
> > > > > > > return packets from the same flow on the rx queue.
> > > > > >
> > > > > > Why? I can understand that there should be a mapping between
> > > > > > flows and queues in a way that there is no re-ordering, but
> > > > > > I can't see the relation of receiving a flow with a TX queue.
> > > > > >
> > > > > > fbl
> > > > >
> > > > > That's the way virtio chose to program the rx steering logic.
> > > > >
> > > > > It's low overhead (no special commands), and
> > > > > works well for TCP when user is an endpoint since rx and tx
> > > > > for tcp are generally tied (because of ack handling).
> > > >
> > > > It is low overhead for the control plane, but not for the data plane.
>
> Well, there's zero data plane overhead within the guest.
> You can't go lower :)

I agree, but I am talking about vhost-user or whatever means we use to
provide packets to the virtio backend.  That will have to distribute
the packets according to the guest's mapping, which is not zero
overhead.

> > > > > We can discuss other ways, e.g. special commands for guests to
> > > > > program steering.
> > > > > We'd have to first see some data showing the current scheme
> > > > > is problematic somehow.
> >
> > The issue is that the spec assumes the packets are coming in
> > a serialized way and the distribution will be made by vhost-user,
> > but that isn't necessarily true.
>
> Making the distribution guest controlled is obviously the right
> thing to do if guest is the endpoint: we need guest scheduler to
> make the decisions, it's the only entity that knows
> how tasks are distributed across VCPUs.

Again, I agree.  My point is that it could also allow no mapping at
all, or full freedom.  I don't see that as an option now.

> It's possible that this is not the right thing for when guest
> is just doing bridging between two VNICs:
> are you saying packets should just go from RX queue N
> on eth0 to TX queue N on eth1, making host make all
> the queue selection decisions?

The idea is that the guest could TX on queue N and the host would push
packets from the same stream on RX queue Y.  So, the guest is free to
send packets on any queue and the host is free to send packets on any
queue, as long as both keep a stable mapping to avoid re-ordering.

What if the guest is not trusted and the host has the requirement to
send priority packets to queue#0?  That is not possible if the backend
is forced to follow the guest mapping.

> This sounds reasonable. Since there's a mix of local and
> bridged traffic normally, does this mean we need
> a per-packet flag that tells host to
> ignore the packet for classification purposes?

Real NICs apply a hash to each incoming packet and steer it to a
specific queue, and the CPU is then selected from there.  Neither the
NIC driver nor the OS changes that.  The same rationale works for
virtio-net.  Of course, we can use ntuple filters to force specific
streams to go to specific queues, but that isn't the default policy.
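To be concrete about what I mean by a stable, hash-based default (the
hash below is only an illustration -- real NICs use Toeplitz/RSS and
OVS has its own hashing):

#include <stdint.h>

/* Toy 5-tuple; a real vswitch takes this from the parsed headers. */
struct flow_tuple {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  proto;
};

/* Illustrative mixing only. */
static uint32_t flow_hash(const struct flow_tuple *t)
{
    uint32_t h = t->src_ip ^ (t->dst_ip * 2654435761u);

    h ^= ((uint32_t)t->src_port << 16) | t->dst_port;
    h ^= t->proto;

    return h * 2654435761u;
}

/* Stable mapping: the same stream always lands on the same queue, so
 * there is no re-ordering, and the backend does not need to know
 * anything about the guest's preferences.  nr_queues must be > 0. */
static uint16_t select_queue(const struct flow_tuple *t, uint16_t nr_queues)
{
    return (uint16_t)(flow_hash(t) % nr_queues);
}

Nothing here needs input from the guest; per-stream ordering is
preserved by construction.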
> > > > The issue is that the restriction imposes operations to be done in the
> > > > data path. For instance, Open vSwitch has N number of threads to manage
> > > > X RX queues. We distribute them in round-robin fashion. So, the thread
> > > > polling one RX queue will do all the packet processing and push it to the
> > > > TX queue of the other device (vhost-user or not) using the same 'id'.
> > > >
> > > > Doing so we can avoid locking between threads and TX queues and any other
> > > > extra computation while still keeping the packet ordering/distribution fine.
> > > >
> > > > However, if vhost-user has to send packets according to the guest mapping,
> > > > it will require locking between queues and additional operations to select
> > > > the appropriate queue. Those actions will cause performance issues.
> > >
> > > You only need to send updates if guest moves a flow to another queue.
> > > This is very rare since guest must avoid reordering.
> >
> > OK, maybe I missed something.  Could you point me to the spec talking
> > about the update?
>
> It doesn't talk about that really - it's an implementation
> detail. What I am saying is that you can have e.g.
> a per queue data structure with flows using it.
> If you find the flow there, then you know nothing changed
> and there is no need to update other queues.
>
> > > Oh and you don't have to have locking. Just update the table and make
> > > the target pick up the new value at leisure, worst case a packet ends up
> > > in the wrong queue.
> >
> > You do because packets are coming on different vswitch queues and they
> > could get mapped to the same virtio queue enforced by the guest, so some
> > sort of synchronization is needed.
>
> Right. So to optimize that, you really need a 1:1 mapping, but this
> optimization only makes sense if guest is not in the end processing
> these packets in the application on the same CPU - otherwise you
> are just causing IPIs.

The guest should move the apps to the CPUs processing the queues.
That's what Linux does by default, and that's why I am saying the
requirement in the spec should be about maintaining a stable mapping.

> With the per-packet flag to bypass the classifier as suggested above,
> you would do a lookup, find flow is not classified and just forward
> it 1:1 as you wanted to.

That is heavy; we can't afford per-packet inspection.

> > That is one thing. Another is that it will need some mapping between the
> > hash available in the vswitch (not necessarily L2~L4) and the hash/queue
> > mapping provided by the guest. That doesn't require locking, but it's a
> > costly operation. Alternatively, the vswitch could calculate a full L2-L4
> > hash, which is also a costly operation.
> >
> > Packets ending up in the wrong queue isn't that bad, but then we need to
> > enforce processing order because re-ordering is really bad.
>
> Right. So if you consider a mix of packets with guest as endpoint
> and guest as a bridge, then there's apparently no way out -
> you need to identify the flow somehow in order to know
> which is which.
>
> I guess one solution is to give up and make it a global
> decision.

My proposal is to:
1) keep the flow-to-queue mapping stable by default;
2) respect the guest's request to map a certain flow to a specific
   queue (see the sketch further below).

> But OTOH I think igb supports calculating the RX hash in hardware:
> it sets NETIF_F_RXHASH on Linux.
> If so, can't that be used for the initial lookup?

Yes, it does.  But I can't guarantee all vswitch ports or packets will
have a valid rxhash.  Even if we decide to use that, we still need to
move each packet coming from different vswitch queues to specific
virtio queues (packets crossing queues).
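Here is a sketch of the proposal above: keep the stable hash-based
default, and only let flows the guest explicitly pinned take an extra
table hit.  The pin table and its update path are hypothetical, just to
show the order of the checks:

#include <stdint.h>

#define PIN_TABLE_SIZE 256      /* hypothetical, filled from guest requests */

struct pinned_flow {
    uint32_t flow_hash;         /* hash of the flow the guest pinned */
    uint16_t rxq;               /* virtio RX queue it asked for */
    uint8_t  valid;
};

static struct pinned_flow pin_tab[PIN_TABLE_SIZE];

/* Control path: record a steering request from the guest (rare). */
static void pin_flow(uint32_t flow_hash, uint16_t rxq)
{
    struct pinned_flow *p = &pin_tab[flow_hash % PIN_TABLE_SIZE];

    p->flow_hash = flow_hash;
    p->rxq = rxq;
    p->valid = 1;
}

/* Data path:
 * 1) if the guest explicitly mapped this flow, honor it;
 * 2) otherwise use the stable default (hash % nr_queues), which keeps
 *    per-stream ordering with no extra state.  nr_queues must be > 0. */
static uint16_t map_flow_to_rxq(uint32_t flow_hash, uint16_t nr_queues)
{
    const struct pinned_flow *p = &pin_tab[flow_hash % PIN_TABLE_SIZE];

    if (p->valid && p->flow_hash == flow_hash)
        return (uint16_t)(p->rxq % nr_queues);

    return (uint16_t)(flow_hash % nr_queues);
}

The per-packet cost is a single lookup either way; the update only runs
on the control path when the guest asks to move a flow, which should be
rare.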
> > > > I see no real benefit from enforcing the guest mapping outside to
> > > > justify all the computation cost, so my suggestion is to change the
> > > > spec to suggest that behavior, but not to require it for compliance.
> > > >
> > > > Does that make sense?
> > > >
> > > > Thanks,
> > > > fbl
> > >
> > > It's not a question of what the spec says, it's a question of the
> > > quality of implementation: guest needs to be able to balance load
> > > between CPUs serving the queues, this means it needs a way to control
> > > steering.
> >
> > Indeed, a mapping could be provided by the guest to steer certain flows
> > to specific queues, and of course the implementation must follow that.
> > However, it seems the guest could also simply leave that mapping open.
>
> Right, we can add such an option in the spec. :-)
>
> > > IMO having dpdk control it makes no sense in the scenario.
> >
> > Why not?  The only requirement should be that the implementation
> > avoids re-ordering by keeping the mapping stable between streams
> > and queues.
>
> Well this depends on whether there's an application within
> guest that consumes the flow and does something with
> the data. If yes, then we need to be careful not to
> compete with that application for CPU, otherwise
> it won't be able to produce data.

When you have multiple queues, ideally irqbalance will spread their
interrupts across the CPUs.  So, when a specific queue receives a
packet, it generates an interrupt, which runs a softirq that puts the
data into the app's socket and schedules the app to run next.  In
summary, the app will by default run on the CPU that processes its
traffic.

> I guess that's not the case for pcktgen or forwarding,
> in these cases networking is all you care about.

Those use-cases will work regardless.

> > > This is different from dpdk sending packets to real NIC
> > > queues which all operate in parallel.
> >
> > The goal of multiqueue support is to have them working in parallel.
> >
> > fbl
>
> What I meant is "in parallel with the application doing the
> actual logic and producing the packets".

fbl