From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 9304BA0032; Fri, 24 Jun 2022 12:22:17 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 6CE9440A87; Fri, 24 Jun 2022 12:22:17 +0200 (CEST) Received: from smartserver.smartsharesystems.com (smartserver.smartsharesystems.com [77.243.40.215]) by mails.dpdk.org (Postfix) with ESMTP id 3C7044069D for ; Fri, 24 Jun 2022 12:22:16 +0200 (CEST) Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Subject: RE: [RFC PATCH 0/6] add json string escaping to telemetry X-MimeOLE: Produced By Microsoft Exchange V6.5 Date: Fri, 24 Jun 2022 12:22:15 +0200 Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D87171@smartserver.smartshare.dk> In-Reply-To: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [RFC PATCH 0/6] add json string escaping to telemetry Thread-Index: AdiHqzEz1cSt9zg+So+dYN8/vMYshAAAGJlQ References: <20220623164245.561371-1-bruce.richardson@intel.com> <98CBD80474FA8B44BF855DF32C47DC35D8716D@smartserver.smartshare.dk> <98CBD80474FA8B44BF855DF32C47DC35D8716F@smartserver.smartshare.dk> From: =?iso-8859-1?Q?Morten_Br=F8rup?= To: "Bruce Richardson" Cc: , , X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org > From: Bruce Richardson [mailto:bruce.richardson@intel.com] > Sent: Friday, 24 June 2022 11.17 >=20 > On Fri, Jun 24, 2022 at 11:12:05AM +0200, Morten Br=F8rup wrote: > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com] > > > Sent: Friday, 24 June 2022 10.14 > > > > > > On Thu, Jun 23, 2022 at 09:04:31PM +0200, Morten Br=F8rup wrote: > > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com] > > > > > Sent: Thursday, 23 June 2022 18.43 > > > > > > > > > > This RFC shows one possible approach for escaping strings for > the > > > json > > > > > output of telemetry library. For now this RFC supports = escaping > > > strings > > > > > for the cases of returning a single string, or returning an > array > > > of > > > > > strings. Not done is escaping of strings in objs/dicts [see > more > > > below > > > > > on TODO] > > > > > > > > Very good initiative. > > > > > > > > > > > > > > As well as telemetry lib changes, this patchset includes unit > tests > > > for > > > > > the above and also little bit of cleanup to the json tests. > > > > > > > > > > TODO: > > > > > Beyond what is here in this RFC: > > > > > > > > > > 1. we need to decide what to do about name/value pairs. > Personally, > > > I > > > > > think we should add the restriction to the > > > "rte_tel_data_add_obj_*" > > > > > APIs > > > > > to only allow a defined subset of characters in names: e.g. > > > > > alphanumeric > > > > > chars, underscore and dash. That means that we only need to > > > escape > > > > > the data part in the case of string returns. > > > > > > > > I agree about only allowing a subset of characters in names, so > JSON > > > (and other) encoding is not required. > > > > > > > > However, I think we should be less restrictive, and also allow > > > characters commonly used for separation, indexing and wildcard, > such as > > > '/', '[', ']', and '*', '?' or '%'. > > > > > > > > Obviously, we should disallow characters requiring escaping in > not > > > just JSON, but also other foreseeable encodings and protocols. So > > > please bring your crystal ball to the discussion. ;-) > > > > > > > Exactly why I am looking for feedback - and why I'm looking to = have > an > > > explicit allowed list of characters rather than trying to just > block > > > the > > > known-bad in json ones. > > > > > > For your suggestions: +1 to separators and indexing, i.e. '[', ']' > and > > > '/', > > > though I would probably also add ',' and maybe '.' (unless it's > likely > > > to > > > cause issues with some protocol we are likely to want to use). > > > > After having slept on it, I think we should also allow characters > that could appear in IP and MAC addresses, i.e. '.' and ':' (and '/' > for subnetting). > > > > > For the wildcarding, I find it hard to see why we would want = those? > > > > Initially, I thought a wildcard might be useful as a placeholder in > templates. > > > > But it might also be useful for partial IP or MAC addresses. E.g.: > > - The SmartShare Systems OUI could be represented by the MAC address > "00:1F:B4:??:??:??". > > - A default gateway address in a template configuration could be > "192.168.*.1". > > > > On the other hand, wildcard characters could be disallowed or = require > escaping in other (non-JSON) protocols. > > > > So I'm just being a bit creative here, throwing out ideas in our > search for the right balance in the restrictions. > > >=20 > I could see those characters certainly being needed in data values, = but > do > you foresee them being required in the names of fields? We don't use the Telemetry library, because we have our own libraries = for similar and related purposes. So I'm mostly speculating, trying to = transform our experience into how I would expect the Telemetry library = to work, while also trying to look farther into the future. Answering your question: Yes, if you consider the names as keys in a key/value store, there might = be single entries that look like a template. Although the names of such = entries might as well be "00:1F:B4:xx:xx:xx" or "192.168.z.1", using 'x' = and 'z' as the wildcard characters. Perhaps we should start with the low risk choice, and not allow the = special wild card characters, such as '*', '?', '%', since 'x' is just = as good in those cases. >=20 > > > > > > The other advantage of using an allowlist of characters is that it > > > makes it > > > possible to expand over time, compared to a blocklist which always > runs > > > the > > > risk of breaking something if you expand it. Therefore I suggest = we > > > keep > > > the list as small as we need right now, and expand it only as we > need. > > > > +1 > > >=20 > From previous on-list discussion, I take it that SNMP is a possible > target > protocol you might have in mind. Any other protocols you can think of > and > what restrictions (if any) would SNMP or those other protocols add? JSON and UTF-8 seems to have taken over the world entirely. SNMP support is usually required for legacy reasons. The SNMP lookup key = is always an OID (Object Identifier), which basically is a sequence of = numbers with a well known length of the sequence. In theory, any BLOB = could be converted to an OID. With that in mind, I don't think SNMP puts = any restrictions to the character set of the Telemetry names. The = translation between OID format (i.e. a sequence of numbers) and = Telemetry name format (i.e. a string) could be a very simple = encoder/decoder, since there are no special characters requiring special = treatment. Going back to the IP address topic above, some of the SNMP MIBs use the = IP address as the last four numbers in the OID, e.g. = "ipAdEntIfIndex.192.0.1.1" (where ipAdEntIfIndex is short for = "1.3.6.1.2.1.4.20.1.2"). My point here is: The names available for = lookup in the telemetry database could be highly dynamic.=20 As for other protocols, there could be something like InfluxDB [1], for = direct streaming of statistics and other telemetry, but I don't have = real experience with any of them. Our customers currently use scripts to = poll the JSON data from our API and push them into their InfluxDB = databases. There could also be limitations in the structured format for SYSLOG [2], = but again I don't have any experience with it. We just use classic = SYSLOG text messages. [1] = https://docs.influxdata.com/influxdb/cloud/reference/syntax/line-protocol= / [2] https://datatracker.ietf.org/doc/html/rfc5424 >=20 > /Bruce