MIME-Version: 1.0
References: <20220803132839.2747858-2-jerinj@marvell.com> <2871791.H8VbNj7W2P@thomas>
In-Reply-To: <2871791.H8VbNj7W2P@thomas>
From: Jerin Jacob
Date: Sat, 28 Jan 2023 16:57:03 +0530
Subject: Re: [EXT] Re: [dpdk-dev] [PATCH v1 00/12] mldev: introduce machine learning device library
To: Thomas Monjalon
Cc: Shivah Shankar Shankar Narayan Rao, Jerin Jacob Kollanukkaran, "dev@dpdk.org",
    "ferruh.yigit@amd.com", "ajit.khaparde@broadcom.com", "aboyer@pensando.io",
    "andrew.rybchenko@oktetlabs.ru", "beilei.xing@intel.com", "bruce.richardson@intel.com",
    "chas3@att.com", "chenbo.xia@intel.com", "ciara.loftus@intel.com", Devendra Singh Rawat,
    "ed.czeck@atomicrules.com", "evgenys@amazon.com", "grive@u256.net", "g.singh@nxp.com",
    "zhouguoyang@huawei.com", "haiyue.wang@intel.com", Harman Kalra,
    "heinrich.kuhn@corigine.com", "hemant.agrawal@nxp.com", "hyonkim@cisco.com",
    "igorch@amazon.com", Igor Russkikh, "jgrajcia@cisco.com", "jasvinder.singh@intel.com",
    "jianwang@trustnetic.com", "jiawenwu@trustnetic.com", "jingjing.wu@intel.com",
    "johndale@cisco.com", "john.miller@atomicrules.com", "linville@tuxdriver.com",
    "keith.wiles@intel.com", Kiran Kumar Kokkilagadda, "oulijun@huawei.com", Liron Himi,
    "longli@microsoft.com", "mw@semihalf.com", "spinler@cesnet.cz", "matan@nvidia.com",
    "matt.peters@windriver.com", "maxime.coquelin@redhat.com", "mk@semihalf.com",
    "humin29@huawei.com", Pradeep Kumar Nalla, Nithin Kumar Dabilpuram,
    "qiming.yang@intel.com", "qi.z.zhang@intel.com", Radha Chintakuntla,
    "rahul.lakkireddy@chelsio.com", Rasesh Mody, "rosen.xu@intel.com",
    "sachin.saxena@oss.nxp.com", Satha Koteswara Rao Kottidi, Shahed
    Shaikh, "shaibran@amazon.com", "shepard.siegel@atomicrules.com", "asomalap@amd.com",
    "somnath.kotur@broadcom.com", "sthemmin@microsoft.com", "steven.webster@windriver.com",
    Sunil Kumar Kori, "mtetsuyah@gmail.com", Veerasenareddy Burru, "viacheslavo@nvidia.com",
    "xiao.w.wang@intel.com", "cloud.wangxiaoyun@huawei.com", "yisen.zhuang@huawei.com",
    "yongwang@vmware.com", "xuanziyang2@huawei.com", Prasun Kapoor, Nadav Haklai,
    Satananda Burla, Narayana Prasad Raju Athreya, Akhil Goyal, "dmitry.kozliuk@gmail.com",
    "anatoly.burakov@intel.com", "cristian.dumitrescu@intel.com",
    "honnappa.nagarahalli@arm.com", "mattias.ronnblom@ericsson.com", "ruifeng.wang@arm.com",
    "drc@linux.vnet.ibm.com", "konstantin.ananyev@intel.com", "olivier.matz@6wind.com",
    "jay.jayatheerthan@intel.com", Ashwin Sekhar T K, Pavan Nikhilesh Bhagavatula,
    "eagostini@nvidia.com", Srikanth Yalavarthi, Derek Chickles, "david.marchand@redhat.com"
Content-Type: text/plain; charset="UTF-8"
List-Id: DPDK patches and discussions

On Fri, Jan 27, 2023 at 6:28 PM Thomas Monjalon wrote:
>
> Hi,
>
> Shivah Shankar, please quote your replies
> so we can distinguish what I said from what you say.
>
> Please try to understand my questions, you tend to reply to something else.
>
>
> 27/01/2023 05:29, Jerin Jacob:
> > On Fri, Jan 27, 2023 at 8:04 AM Shivah Shankar Shankar Narayan Rao
> > wrote:
> > > 25/01/2023 20:01, Jerin Jacob:
> > > > On Wed, Jan 25, 2023 at 7:50 PM Thomas Monjalon wrote:
> > > > > 14/11/2022 13:02, jerinj@marvell.com:
> > > > > > > ML Model: An ML model is an algorithm trained over a dataset. A
> > > > > > > model consists of procedure/algorithm and data/pattern required to
> > > > > > > make predictions on live data.
> > > > > > > Once the model is created and
> > > > > > > trained outside of the DPDK scope,
> > > > > > > the model can be loaded via rte_ml_model_load() and then start it
> > > > > > > using rte_ml_model_start() API. The rte_ml_model_params_update()
> > > > > > > can be used to update the model
> > > > > > > parameters such as weight and bias without unloading the model
> > > > > > > using rte_ml_model_unload().
> > > > >
> > > > > > The fact that the model is prepared outside means the model format
> > > > > > is free and probably different per mldev driver.
> > > > > > I think it is OK but it requires a lot of documentation effort to
> > > > > > explain how to bind the model and its parameters with the DPDK API.
> > > > > > Also we may need to pass some metadata from the model builder to the
> > > > > > inference engine in order to enable optimizations prepared in the
> > > > > > model.
> > > > > > And the other way, we may need inference capabilities in order to
> > > > > > generate an optimized model which can run in the inference engine.
> > > > >
> > > > > The base API specification kept absolute minimum. Currently, weight
> > > > > and biases parameters updated through rte_ml_model_params_update(). It
> > > > > can be extended when there are drivers supports it or if you have any
> > > > > specific parameter you would like to add it in
> > > > > rte_ml_model_params_update().
> > > >
> > > > This function is
> > > > int rte_ml_model_params_update(int16_t dev_id, int16_t model_id, void
> > > > *buffer);
> > > >
> > > > How are we supposed to provide separate parameters in this void* ?
> > >
> > > Just to clarify on what "parameters" mean,
> > > they just mean weights and biases of the model,
> > > which are the parameters for a model.
> > > Also, the Proposed APIs are for running the inference
> > > on a pre-trained model.
> > > For running the inference the amount of parameters tuning
> > > needed/done is limited/none.
>
> Why is it limited?
> I think you are limiting to *your* model. See below.
>
> > > The only parameters that get may get changed are the Weights and Bias
> > > which the API rte_ml_model_params_update() caters to.
>
> We cannot imagine a model with more type of parameters?
>
> > > While running the inference on a Model there won't be any random
> > > addition or removal of operators to/from the model or there won't
> > > be any changes in the actual flow of model.
> > > Since the only parameter that can be changed is Weights and Biases
> > > the above API should take care.
>
> No, you don't reply to my question.
> I want to be able to change a single parameter.
> I am expecting a more fine-grain API than a simple "void*".
> We could give the name of the parameter and a value, why not?

The current API model is as follows:

1) The model is developed outside DPDK and the binary file is loaded via
rte_ml_model_load().

2) The model's "read only" capabilities, such as shape or quantized data
type, can be read through the rte_ml_model_info_get() API. If you wish to
advertise any other capability for optimization etc., please reply inline
around rte_ml_io_info with the parameter and its comment. We can review
and add it.

3) Now comes the parameter, which is the "update" to the model that was
loaded earlier via rte_ml_model_load(). It is also created outside DPDK.
The user has an "update" to the parameters when a new round of training
happens. Currently we assume this is a single blob, because it is model
specific and it is just a continuous stream of bytes from the model;
hence void* is given.

If you have a use case, or your model supports updating more parameters
as separate blobs, we should be able to update
rte_ml_model_params_update() as needed. Please suggest the new
rte_ml_model_params_type enum or so. We can add that to
rte_ml_model_params_update(). Also, you may have a concrete data type in
mind instead of void* for a given TYPE.
Please propose the structure for that as well. We should be able to
update struct rte_ml_dev_info for these capabilities to abstract the
differences between models or inference engines.

>
> > > > > Other metadata data like batch, shapes, formats queried using
> > > > > rte_ml_io_info().
> > > > Copying:
> > > > +/** Input and output data information structure
> > > > + *
> > > > + * Specifies the type and shape of input and output data.
> > > > + */
> > > > +struct rte_ml_io_info {
> > > > +	char name[RTE_ML_STR_MAX];
> > > > +	/**< Name of data */
> > > > +	struct rte_ml_io_shape shape;
> > > > +	/**< Shape of data */
> > > > +	enum rte_ml_io_type qtype;
> > > > +	/**< Type of quantized data */
> > > > +	enum rte_ml_io_type dtype;
> > > > +	/**< Type of de-quantized data */ };
> > > >
> > > > Is it the right place to notify the app that some model optimizations
> > > > are supported? (example: merge some operations in the graph)
> > >
> > > The inference is run on a pre-trained model, which means
> > > any merges /additions of operations to the graph are NOT done.
> > > If any such things are done then the changed model needs to go
> > > through the training and compilation once again
> > > which is out of scope of these APIs.
>
> Please try to understand what I am saying.
> I want the application to be able to know some capabilities are supported
> by the inference driver.
> So it will allow to generate the model with some optimizations.

See above. Yes, this is the place to add that. Please propose any changes
that you want to add.

>
> > > > > > [...]
> > > > > > > Typical application utilisation of the ML API will follow the
> > > > > > > following programming flow.
> > > > > > >
> > > > > > > - rte_ml_dev_configure()
> > > > > > > - rte_ml_dev_queue_pair_setup()
> > > > > > > - rte_ml_model_load()
> > > > > > > - rte_ml_model_start()
> > > > > > > - rte_ml_model_info()
> > > > > > > - rte_ml_dev_start()
> > > > > > > - rte_ml_enqueue_burst()
> > > > > > > - rte_ml_dequeue_burst()
> > > > > > > - rte_ml_model_stop()
> > > > > > > - rte_ml_model_unload()
> > > > > > > - rte_ml_dev_stop()
> > > > > > > - rte_ml_dev_close()
> > > > > >
> > > > > > Where is parameters update in this flow?
> > > > >
> > > > > Added the mandatory APIs in the top level flow doc.
> > > > > rte_ml_model_params_update() used to update the parameters.
> > > >
> > > > The question is "where" should it be done?
> > > > Before/after start?
> > >
> > > The model image comes with the Weights and Bias
> > > and will be loaded and used as a part of rte_ml_model_load
> > > and rte_ml_model_start.
> > > In rare scenarios where the user wants to update
> > > the Weights and Bias of an already loaded model,
> > > the rte_ml_model_stop can be called to stop the model
> > > and the Weights and Biases can be updated using the
> > > The parameters (Weights&Biases) can be updated
> > > when the rte_ml_model_params_update() API
> > > followed by rte_ml_model_start to start the model
> > > with the new Weights and Biases.
>
> OK please sure it is documented that parameters update
> must be done on a stopped engine.

The doc is already there in the existing patch. Please see

+/**
+ * Update the model parameters without unloading model.
+ *
+ * Update model parameters such as weights and bias without unloading the model.
+ * rte_ml_model_stop() must be called before invoking this API.
+ *
+ * @param[in] dev_id
+ * The identifier of the device.
+ * @param[in] model_id
+ * Identifier for the model created
+ * @param[in] buffer
+ * Pointer to the model weights and bias buffer.
+ * Size of the buffer is equal to wb_size returned in *rte_ml_model_info*.
+ *
+ * @return
+ * - Returns 0 on success
+ * - Returns negative value on failure
+ */
+__rte_experimental
+int
+rte_ml_model_params_update(int16_t dev_id, int16_t model_id, void *buffer);

>
> > > > > > Should we update all parameters at once or can it be done more
> > > > > > fine-grain?
> > > > >
> > > > > Currently, rte_ml_model_params_update() can be used to update weight
> > > > > and bias via buffer when device is in stop state and without unloading
> > > > > the model.
>
> Passing a raw buffer is a really dark API.
> We need to know how to fill the buffer.

See above. Currently it is model specific, and the model spits out the
parameter blob update after training or so. The DPDK inference engine API
is a means to transport these blobs from the model to the ML engine.

>
> > > > The question is "can we update a single parameter"?
> > > > And how?
> > >
> > > As mentioned above for running inference the model is already trained
> > > the only parameter that is updated is the Weights and Biases.
> > > "Parameters" is another word for Weights and Bias.
> > > No other parameters are considered.
>
> You are not replying to the question.
> How can we update a single parameter?

See above.

I see the main comments are on the parameter update and on getting the
capabilities. To enable that, please propose the changes around
rte_ml_model_params_update() and rte_ml_model_info. We should be able to
take that and send v2.

>
> > > Are there any other parameters you have on your mind?
>
> No
>
> > > > > > Question about the memory used by mldev:
> > > > > > Can we manage where the memory is allocated (host, device, mix,
> > > > > > etc)?
> > > > >
> > > > > Just passing buffer pointers now like other subsystem.
> > > > > Other EAL infra service can take care of the locality of memory as it
> > > > > is not specific to ML dev.
> > > >
> > > > I was thinking about memory allocation required by the inference engine.
> > > > How to specify where to allocate? Is it just hardcoded in the driver?
> > >
> > > Any memory within the hardware is managed by the driver.
> >
> > I think, Thomas is asking input and output memory for interference. If
> > so, the parameters for
> > struct rte_ml_buff_seg or needs to add type or so. Thomas, Please
> > propose what parameters you want here.
> > In case if it is for internal driver memory, We can pass the memory
> > type in rte_ml_dev_configure(), If so, please propose
> > the memory types you need and the parameters.
>
> I'm talking about the memory used by the driver to make the inference works.
> In some cases we may prefer the hardware using host memory,
> sometimes use the device memory.
> I think that's something we may tune in the configuration.
> I suppose we are fine with allocation hardcoded in the driver for now,
> as I don't have a clear need.

OK.

>
>
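
To make the request in this thread concrete (a parameter type passed
alongside the buffer, rather than a bare void* blob), here is a minimal,
self-contained sketch in plain C. It is an illustration only: every
example_* name, the toy model structure, the fixed parameter count and the
error codes are assumptions, and none of them are part of the posted mldev
patch or of DPDK. It only mirrors two points made above: the update is
rejected unless the model is stopped, and the caller says which parameter
set the buffer carries.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: number of weights and of biases in the toy model. */
#define EXAMPLE_N 4

/* Hypothetical parameter selector, in the spirit of the
 * "rte_ml_model_params_type enum or so" suggested in the thread. */
enum example_ml_param_type {
    EXAMPLE_ML_PARAM_ALL,    /* weights followed by biases, one blob */
    EXAMPLE_ML_PARAM_WEIGHT, /* weights only */
    EXAMPLE_ML_PARAM_BIAS,   /* biases only */
};

/* Toy stand-in for driver-side model state (not a DPDK structure). */
struct example_ml_model {
    int started;             /* non-zero while the model is running */
    float weight[EXAMPLE_N];
    float bias[EXAMPLE_N];
};

/* Copy new parameters into a stopped model.
 * Returns 0 on success, -EBUSY if the model is still started,
 * -EINVAL on a NULL argument, unknown type, or size mismatch. */
int
example_ml_model_params_update(struct example_ml_model *m,
                               enum example_ml_param_type type,
                               const void *buf, size_t size)
{
    if (m == NULL || buf == NULL)
        return -EINVAL;
    if (m->started)
        return -EBUSY; /* mirrors "must be done on a stopped model" */

    switch (type) {
    case EXAMPLE_ML_PARAM_ALL:
        if (size != sizeof(m->weight) + sizeof(m->bias))
            return -EINVAL;
        memcpy(m->weight, buf, sizeof(m->weight));
        memcpy(m->bias, (const char *)buf + sizeof(m->weight),
               sizeof(m->bias));
        return 0;
    case EXAMPLE_ML_PARAM_WEIGHT:
        if (size != sizeof(m->weight))
            return -EINVAL;
        memcpy(m->weight, buf, size);
        return 0;
    case EXAMPLE_ML_PARAM_BIAS:
        if (size != sizeof(m->bias))
            return -EINVAL;
        memcpy(m->bias, buf, size);
        return 0;
    }
    return -EINVAL;
}
```

With a selector like this the existing single-blob behaviour stays
available as one enum value, so a driver that only understands the whole
weights-and-bias blob would not need to change. Whether parameters should
instead be addressed by name, as Thomas suggests, is left open here.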