Date: Tue, 17 Feb 2015 11:05:00 -0500
From: Stephen Hemminger
To: "Dumitrescu, Cristian"
Message-ID: <20150217110500.41ed8a18@uryu.home.lan>
In-Reply-To: <3EB4FA525960D640B5BDFFD6A3D8912632318070@IRSMSX108.ger.corp.intel.com>
References: <1423116841-19799-4-git-send-email-stephen@networkplumber.org>
 <1423116841-19799-6-git-send-email-stephen@networkplumber.org>
 <3EB4FA525960D640B5BDFFD6A3D8912632318070@IRSMSX108.ger.corp.intel.com>
Cc: "dev@dpdk.org",
 Stephen Hemminger
Subject: Re: [dpdk-dev] [PATCH v2 6/7] rte_sched: eliminate floating point in calculating byte clock

On Mon, 16 Feb 2015 22:44:31 +0000
"Dumitrescu, Cristian" wrote:

> Hi Stephen,
>
> Sorry, NACK.
>
> 1. Overflow issue
> As you declare cycles_per_byte as uint32_t, for a CPU frequency of 2-3 GHz, the line of code below results in overflow:
> port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT) / params->rate;
> Therefore, there is most likely a significant accuracy loss, which might result in more packets being allowed to go out than should be.

The shifted TSC value is still 64 bits, and rate is a 32-bit value in bytes/sec.
I chose the scale such that if clock = 3 GHz, then min rate = 715 bytes/sec = 5722 bits/sec.

> 2. Integer division has a higher cost than floating point division
> My understanding is we are considering a performance improvement by replacing the double precision floating point division in:
> double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
> with an integer division:
> uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT) / port->cycles_per_byte;
> I don't think this is going to have the claimed benefit, as according to the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (Appendix C), the latency of the integer division instruction is significantly bigger than the latency of floating point division:
> Instruction FDIV double precision: latency = 38-40 cycles
> Instruction IDIV: latency = 56-80 cycles

I observed that performance went from 5 Gbit/sec to 10 Gbit/sec, mostly because
the floating point version engages more instruction units and does not pipeline.
Cycle count is not everything. This was on an Ivy Bridge processor.

> 3. Alternative
> I do hear your suggestion about replacing the floating point division with a more performant construction. One suggestion would be to replace it with an integer multiplication followed by a shift right, probably by using a uint64_t bytes_per_cycle_scaled_up (the inverse of cycles_per_byte). I need to prototype this code myself. Would you be OK to look into providing an alternative implementation?
>

I looked into the multiplicative integer method and will do it in the future.
But it has more scaling issues, since it would require that both values be 32 bits.
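
For readers following the arithmetic, here is a minimal sketch of the two integer forms discussed in this thread. It is not the code from the patch. TIME_SHIFT = 10 is an assumption, chosen only because 3e9 * 2^10 / 2^32 is roughly 715, which matches the minimum rate quoted above; RECIP_SHIFT and all four function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: a shift of 10 reproduces the ~715 bytes/sec minimum rate
 * quoted for a 3 GHz clock. The constant used by the patch is not shown
 * in this message. */
#define TIME_SHIFT 10

/* Hypothetical scale for the multiply-and-shift variant. */
#define RECIP_SHIFT 32

/* Scaled cycles per byte: (hz << shift) / rate, computed in 64 bits.
 * Returns 0 when the quotient does not fit in 32 bits (rate too low). */
static uint32_t scaled_cycles_per_byte(uint64_t tsc_hz, uint32_t rate)
{
    assert(rate != 0);
    uint64_t v = (tsc_hz << TIME_SHIFT) / rate;
    return v > UINT32_MAX ? 0 : (uint32_t)v;
}

/* Division form: same shape as the expression quoted in point 2. */
static uint64_t bytes_by_division(uint64_t cycles_diff, uint32_t cycles_per_byte)
{
    assert(cycles_per_byte != 0);
    return (cycles_diff << TIME_SHIFT) / cycles_per_byte;
}

/* Scaled bytes per cycle: (rate << 32) / hz, the inverse of the above. */
static uint64_t scaled_bytes_per_cycle(uint64_t tsc_hz, uint32_t rate)
{
    assert(tsc_hz != 0);
    return ((uint64_t)rate << RECIP_SHIFT) / tsc_hz;
}

/* Multiply-and-shift form from point 3. The 64-bit product wraps unless
 * both operands stay small enough (about 32 bits each), which is the
 * scaling issue mentioned in the reply. */
static uint64_t bytes_by_multiply(uint64_t cycles_diff, uint64_t bytes_per_cycle)
{
    return (cycles_diff * bytes_per_cycle) >> RECIP_SHIFT;
}
```

With a 3 GHz clock and a rate of 1250000000 bytes/sec (10 Gbit/sec), 3000 elapsed cycles are 1 microsecond, so both forms should yield about 1250 bytes. The multiply form can come out one byte lower because the scaled reciprocal is rounded down.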