From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 4 Jun 2020 14:06:42 -0700
From: Stephen Hemminger
To: "Wang, Yipeng1"
Cc: Honnappa Nagarahalli, "Gobriel, Sameh", "Richardson, Bruce",
 "dev@dpdk.org", "De Lara Guarch, Pablo", nd
Message-ID: <20200604140642.45a0b0f7@hermes.lan>
References: <20200604171731.6738-1-stephen@networkplumber.org>
 <20200604105817.1a3a2749@hermes.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread
List-Id: DPDK patches and discussions

On Thu, 4 Jun 2020 20:22:06 +0000
"Wang, Yipeng1" wrote:

> > -----Original Message-----
> > From: Honnappa Nagarahalli
> > Sent: Thursday, June 4, 2020 12:34 PM
> > To: Wang, Yipeng1; Stephen Hemminger
> > Cc: Gobriel, Sameh; Richardson, Bruce; dev@dpdk.org;
> > De Lara Guarch, Pablo; nd; Honnappa Nagarahalli; nd
> > Subject: RE: [PATCH] hash: document breakage with multi-writer thread
> >
> > > > >
> > > > > > >
> > > > > > > Subject: [PATCH] hash: document breakage with multi-writer
> > > > > > > thread
> > > > > > >
> > > > > > > The code in rte_cuckoo_hash multi-writer support is broken if
> > > > > > > write operations are called from a non-EAL thread.
> > > > > > >
> > > > > > > rte_lcore_id() will return LCORE_ID_ANY (UINT32_MAX) for a
> > > > > > > non-EAL thread and that leads to using the wrong local cache.
> > > > > > >
> > > > > > > Add error checks and document the restriction.
> > > > > > Having multiple non-EAL writer threads is a valid use case.
> > > > > > Should we fix the issue instead?
> > > > >
> > > > > Discovered this the hard way...
> > > > >
> > > > > Fixing is non-trivial. Basically, the local cache has to be taken
> > > > > out and that leads to having to do real locking or atomic operations.
> > > > Looking at the rte_hash_create function:
> > > >
> > > > 	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD) {
> > > > 		use_local_cache = 1;
> > > > 		writer_takes_lock = 1;
> > > > 	}
> > > >
> > > > The writer locks are in place already. The code to handle the case
> > > > when the local cache is taken out is also there.
> > > > What we need is another input flag that says 'multi writer + non-eal
> > > > threads', which would set 'use_local_cache = 0' and
> > > > 'writer_takes_lock = 1'.
> > > > Not sure it would be a valuable addition. But it looks like this is
> > > > what you were expecting when you enabled
> > > > 'RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD'. Many other APIs in DPDK do
> > > > not provide this kind of MT safety.
> > >
> > > [Wang, Yipeng]
> > > If possible, we can try not to add new flags, because there are
> > > already a lot of flag options.
> > > How about, in the code, we check whether the writer is a non-EAL
> > > thread by checking rte_lcore_id(), and operate on the global queue?
> > > Could this work?
> > >
> > > 	if (h->use_local_cache) {
> > > 		lcore_id = rte_lcore_id();
> > > 		if (lcore_id == LCORE_ID_ANY) { // this is a non-EAL thread
> > >
> > > 		} else {
> > >
> > > 		}
> > > 	}
> > The other thing I wanted to do was to save the memory allocated for the
> > local cache when the writers are non-eal threads.
> > Without knowing the kind of threads upfront, we might have to create the
> > local cache when a writer adds an entry for the first time.
>
> I got what you mean. If people only use non-eal threads, we could save the space of the local cache completely.
> Creating the local cache during the first write is one solution. But the current rte_hash always allocates things at
> table creation time. This guarantees that the program won't fail in the middle due to a memory allocation issue.
>
> Meanwhile, I would rather waste some space than add another option flag related to multi-threading.
> In my opinion, all those flags are already confusing enough. It would also be harder to maintain in the future.

Don't care about the exact fix. Just don't want to randomly corrupt memory.

Having it work would be better, but can we just error out for now? The
existing code is broken.