From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 6FAAEA0032;
	Fri, 24 Jun 2022 10:03:31 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 0311540A87;
	Fri, 24 Jun 2022 10:03:31 +0200 (CEST)
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
 by mails.dpdk.org (Postfix) with ESMTP id 3A48A40A82
 for <dev@dpdk.org>; Fri, 24 Jun 2022 10:03:28 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1656057809; x=1687593809;
 h=date:from:to:cc:subject:message-id:references:
 mime-version:content-transfer-encoding:in-reply-to;
 bh=on43CkxAJLOK0kSg872+33tPk8C04d/yzDN7cbxMeuE=;
 b=WcFkZz8F75H6xDhHzVuMw/2oBRxopVav2CXHPX7+9ztBAFif+mYXZQ6j
 GTWJBI+6jSXmkXP8YuBMrvEo9/Oke9ke2r/SSdi5phd/laARFda6WNd1I
 iGmISaWNr43VKX02q57WsYREjmNK6ujvMuHlqgdLq0HmZmV32NDzpXuGj
 9buUXwj+b/QtgiMBDSAJZHAXYIFmT2YMxR351HHifPKLlAG67qDM8DzHr
 zaqUt79IrTUnyVxQXQyTjucGRu3lOHBo8b3CoP5lmJ5twnU4mecGbZ0NI
 j0giS0+SOSNz/frrEvIiiVb+ZVgz6KVQaCvZM3FYJAKurwnB6hftoRMIv A==;
X-IronPort-AV: E=McAfee;i="6400,9594,10387"; a="269679656"
X-IronPort-AV: E=Sophos;i="5.92,218,1650956400"; d="scan'208";a="269679656"
Received: from orsmga008.jf.intel.com ([10.7.209.65])
 by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 24 Jun 2022 01:03:28 -0700
X-IronPort-AV: E=Sophos;i="5.92,218,1650956400"; d="scan'208";a="615896711"
Received: from bricha3-mobl.ger.corp.intel.com ([10.252.25.171])
 by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-SHA;
 24 Jun 2022 01:03:26 -0700
Date: Fri, 24 Jun 2022 09:03:23 +0100
From: Bruce Richardson <bruce.richardson@intel.com>
To: Morten =?iso-8859-1?Q?Br=F8rup?= <mb@smartsharesystems.com>
Cc: dev@dpdk.org, ciara.power@intel.com, fengchengwen@huawei.com
Subject: Re: [RFC PATCH 2/6] telemetry: fix escaping of invalid json characters
Message-ID: <YrVvy33JoU+qAIcE@bricha3-MOBL.ger.corp.intel.com>
References: <20220623164245.561371-1-bruce.richardson@intel.com>
 <20220623164245.561371-3-bruce.richardson@intel.com>
 <98CBD80474FA8B44BF855DF32C47DC35D8716B@smartserver.smartshare.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D8716B@smartserver.smartshare.dk>
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

On Thu, Jun 23, 2022 at 08:34:07PM +0200, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Thursday, 23 June 2022 18.43
> > 
> > For string values returned from telemetry, escape any values that
> > cannot
> > normally appear in a json string. According to the json spec[1], the
> > characters than need to be handled are control chars (char value <
> > 0x20)
> > and '"' and '\' characters.
> 
> Correct. Other chars are optional to escape.
> 
> > 
> > To handle this, we replace the snprintf call with a separate string
> > copying and encapsulation routine which checks each character as it
> > copies it to the final array.
> > 
> > [1] https://www.rfc-editor.org/rfc/rfc8259.txt
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  lib/telemetry/telemetry_json.h | 48 +++++++++++++++++++++++++++++++++-
> >  1 file changed, 47 insertions(+), 1 deletion(-)
> > 
> > diff --git a/lib/telemetry/telemetry_json.h
> > b/lib/telemetry/telemetry_json.h
> > index db70690274..13df5d07e3 100644
> > --- a/lib/telemetry/telemetry_json.h
> > +++ b/lib/telemetry/telemetry_json.h
> > @@ -44,6 +44,52 @@ __json_snprintf(char *buf, const int len, const char
> > *format, ...)
> >  	return 0; /* nothing written or modified */
> >  }
> > 
> > +static const char control_chars[0x20] = {
> > +		['\n'] = 'n',
> > +		['\r'] = 'r',
> > +		['\t'] = 't',
> > +};
> > +
> > +/**
> > + * @internal
> > + * Does the same as __json_snprintf(buf, len, "\"%s\"", str)
> > + * except that it does proper escaping as necessary.
> > + * Drops any invalid characters we don't support
> > + */
> > +static inline int
> > +__json_format_str(char *buf, const int len, const char *str)
> > +{
> > +	char tmp[len];
> > +	int tmpidx = 0;
> > +
> > +	tmp[tmpidx++] = '"';
> > +	while (*str != '\0') {
> > +		if (*str < (int)RTE_DIM(control_chars)) {
> 
> I would prefer the more explicit 0x20, directly copied from the RFC. RTE_DIM(control_chars) hints that it could change.
>
Sure. Just trying to avoid magic constants, but in this case it does make
sense. Alternatively, I considered using space char as the sentinel value,
as first non-control-char allowed.
 
> > +			int idx = *str;  /* compilers don't like char type as
> > index */
> > +			if (control_chars[idx] != 0) {
> > +				tmp[tmpidx++] = '\\';
> > +				tmp[tmpidx++] = control_chars[idx];
> > +			}
> 
> Consider support for other control characters:
> +			else {
> +				tmp[tmpidx++] = '\\';
> +				tmp[tmpidx++] = 'u';
> +				tmp[tmpidx++] = '0';
> +				tmp[tmpidx++] = '0';
> +				tmp[tmpidx++] = hexchar(idx >> 4);
> +				tmp[tmpidx++] = hexchar(idx & 0xf);
> +			}
> 
> Or just drop them, as you mention in the function's description.
> 

Yeah, I'd appreciate general feedback on that. Adding support is nice, but
just not sure if we really need it or not.

> > +		} else if (*str == '"' || *str == '\\') {
> > +			tmp[tmpidx++] = '\\';
> > +			tmp[tmpidx++] = *str;
> > +		} else
> > +			tmp[tmpidx++] = *str;
> > +		/* we always need space for closing quote and null
> > character.
> > +		 * Ensuring at least two free characters also means we can
> > always take an
> > +		 * escaped character like "\n" without overflowing
> > +		 */
> > +		if (tmpidx > len - 2)
> 
> If supporting the \u00XX encoding, you need to reserve more than 2 characters here and in related code.
> 
Yep. I avoided supporting it for simplicity for now.

> > +			return 0;
> > +		str++;
> > +	}
> > +	tmp[tmpidx++] = '"';
> > +	tmp[tmpidx] = '\0';
> > +
> > +	strcpy(buf, tmp);
> > +	return tmpidx;
> > +}
> > +
> >  /* Copies an empty array into the provided buffer. */
> >  static inline int
> >  rte_tel_json_empty_array(char *buf, const int len, const int used)
> > @@ -62,7 +108,7 @@ rte_tel_json_empty_obj(char *buf, const int len,
> > const int used)
> >  static inline int
> >  rte_tel_json_str(char *buf, const int len, const int used, const char
> > *str)
> >  {
> > -	return used + __json_snprintf(buf + used, len - used, "\"%s\"",
> > str);
> > +	return used + __json_format_str(buf + used, len - used, str);
> >  }
> > 
> >  /* Appends a string into the JSON array in the provided buffer. */
> > --
> > 2.34.1
> > 
>