References: <20240122234034.3883647-1-aconole@redhat.com> <20240122234034.3883647-3-aconole@redhat.com>
In-Reply-To: <20240122234034.3883647-3-aconole@redhat.com>
From: Michael Santana
Date: Tue, 23 Jan 2024 00:49:31 -0500
Subject: Re: [PATCH v3 2/2] post_pw: Store submitted checks locally as well
To: Aaron Conole
Cc: ci@dpdk.org, Ilya Maximets, Jeremy Kerr
List-Id: DPDK CI discussions

On Mon, Jan 22, 2024 at 6:40 PM Aaron Conole wrote:
>
> Jeremy Kerr reports that our PW checks reporting submitted 43000 API calls
> in just a single day. That is alarmingly unacceptable.
> We can store the URLs we've already submitted and then just skip over any
> additional processing at least on the PW side.
>
> This patch does two things to try and mitigate this issue:
>
> 1. Store each patch ID and URL in the series DB to show that we reported
>    the check. This means we don't need to poll patchwork for check status
>
> 2. Store the last modified time of the reports mailing list. This means
>    we only poll the mailing list when a new email has surely landed.
>
> Signed-off-by: Aaron Conole

Acked-by: Michael Santana

> ---
> v2: fixed up the Last-Modified grep and storage
> v3: Simplified the logic of creating the last-access file
>
>  post_pw.sh       | 35 ++++++++++++++++++++++++++++++++++-
>  series_db_lib.sh | 25 +++++++++++++++++++++++++
>  2 files changed, 59 insertions(+), 1 deletion(-)
>
> diff --git a/post_pw.sh b/post_pw.sh
> index 9163ea1..a8111ff 100755
> --- a/post_pw.sh
> +++ b/post_pw.sh
> @@ -20,6 +20,7 @@
>  # License for the specific language governing permissions and limitations
>  # under the License.
>
> +[ -f "$(dirname $0)/series_db_lib.sh" ] && source "$(dirname $0)/series_db_lib.sh" || exit 1
>  [ -f "${HOME}/.mail_patchwork_sync.rc" ] && source "${HOME}/.mail_patchwork_sync.rc"
>
>  # Patchwork instance to update with new reports from mailing list
> @@ -75,6 +76,13 @@ send_post() {
>      if [ -z "$context" -o -z "$state" -o -z "$description" -o -z "$patch_id" ]; then
>          echo "Skpping \"$link\" due to missing context, state, description," \
>               "or patch_id" 1>&2
> +        # Just don't want to even bother seeing these "bad" patches as well.
> +        add_check_scanned_url "$patch_id" "$target_url"
> +        return 0
> +    fi
> +
> +    if check_id_exists "$patch_id" "$target_url" ; then
> +        echo "Skipping \"$link\" - already reported." 1>&2
>          return 0
>      fi
>
> @@ -84,6 +92,7 @@ send_post() {
>                   "$api_url")"
>      if [ $? -ne 0 ]; then
>          echo "Failed to get proper server response on link ${api_url}" 1>&2
> +        # Don't store these as processed in case the server has a temporary issue.
>          return 0
>      fi
>
> @@ -95,6 +104,9 @@ send_post() {
>              jq -e "[.[].target_url] | contains([\"$mail_url\"])" >/dev/null
>      then
>          echo "Report ${target_url} already pushed to patchwork. Skipping." 1>&2
> +        # Somehow this was not stored (for example, first time we apply the tracking
> +        # feature). Store it now.
> +        add_check_scanned_url "$patch_id" "$target_url"
>          return 0
>      fi
>
> @@ -114,12 +126,31 @@ send_post() {
>      if [ $? -ne 0 ]; then
>          echo -e "Failed to push retults based on report ${link} to the"\
>                  "patchwork instance ${pw_instance} using the following REST"\
> -                "API Endpoint ${api_url} with the following data:\n$data\n"
> +                "API Endpoint ${api_url} with the following data:\n$data\n" 1>&2
>          return 0
>      fi
> +
> +    add_check_scanned_url "$patch_id" "$target_url"
>  }
>
> +# Collect the date. NOTE: this needs some accomodate to catch the month change-overs
>  year_month="$(date +"%Y-%B")"
> +
> +# Get the last modified time
> +report_last_mod=$(curl --head -A "(pw-ci) pw-post" -sSf "${mail_archive}${year_month}/thread.html" | grep -i Last-Modified)
> +
> +mailing_list_save_file=$(echo ".post_pw_${mail_archive}${year_month}" | sed -e "s@/@_@g" -e "s@:@_@g" -e "s,@,_,g")
> +
> +if [ -e "${HOME}/${mailing_list_save_file}" ]; then
> +    last_read_date=$(cat "${HOME}/${mailing_list_save_file}")
> +    if [ "$last_read_date" -a "$last_read_date" == "$report_last_mod" ]; then
> +        echo "Last modified times match. Skipping list parsing."
> +        exit 0
> +    fi
> +fi
> +
> +last_read_date="$report_last_mod"
> +
>  reports="$(curl -A "(pw-ci) pw-post" -sSf "${mail_archive}${year_month}/thread.html" | \
>      grep -i 'HREF=' | sed -e 's@[0-9]*
>  if [ $? -ne 0 ]; then
> @@ -132,3 +163,5 @@ echo "$reports" | while IFS='|' read -r blank link title; do
>          send_post "${mail_archive}${year_month}/$link"
>      fi
>  done
> +
> +echo "$last_read_date" > "${HOME}/${mailing_list_save_file}"
> diff --git a/series_db_lib.sh b/series_db_lib.sh
> index c5f42e0..0635469 100644
> --- a/series_db_lib.sh
> +++ b/series_db_lib.sh
> @@ -130,6 +130,17 @@ recheck_sync INTEGER
>  EOF
>          run_db_command "INSERT INTO series_schema_version(id) values (8);"
>      fi
> +
> +    run_db_command "select * from series_schema_version;" | egrep '^9$' > /dev/null 2>&1
> +    if [ $? -eq 1 ]; then
> +        sqlite3 ${HOME}/.series-db <<EOF
> +CREATE TABLE check_id_scanned (
> +check_patch_id INTEGER,
> +check_url STRING
> +)
> +EOF
> +        run_db_command "INSERT INTO series_schema_version(id) values (9);"
> +    fi
>  }
>
>  function series_db_exists() {
> @@ -468,3 +479,17 @@ function set_recheck_request_state() {
>
>      echo "UPDATE recheck_requests set recheck_sync=$recheck_state where patchwork_instance=\"$recheck_instance\" and patchwork_project=\"$recheck_project\" and recheck_requested_by=\"$recheck_requested_by\" and recheck_series=\"$recheck_series\";" | series_db_execute
>  }
> +
> +function add_check_scanned_url() {
> +    local patch_id="$1"
> +    local url="$2"
> +
> +    echo "INSERT into check_id_scanned (check_patch_id, check_url) values (${patch_id}, \"$url\");" | series_db_execute
> +}
> +
> +function check_id_exists() {
> +    local patch_id="$1"
> +    local url="$2"
> +
> +    echo "select * from check_id_scanned where check_patch_id=$patch_id and check_url=\"$url\";" | series_db_execute | grep "$url" >/dev/null 2>&1
> +}
> --
> 2.41.0
>
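
For reference, the Last-Modified gate in the post_pw.sh hunk can be
exercised without touching the network. What follows is only a minimal
sketch of that logic, not code from the patch: state_file_name and
list_unchanged are hypothetical helper names, and the archive URL used in
the note below is a placeholder, not the real mail_archive value.

```shell
#!/bin/bash
# Sketch only: hypothetical helpers mirroring the Last-Modified gate in
# post_pw.sh. Neither function name exists in the patch.

# Build the per-archive state file name the same way the patch does:
# every '/', ':' and '@' in the archive URL becomes '_'.
state_file_name() {
    echo ".post_pw_$1" | sed -e "s@/@_@g" -e "s@:@_@g" -e "s,@,_,g"
}

# Return 0 when the saved header is non-empty and identical to the
# current one, meaning the list page can be skipped; return 1 otherwise.
list_unchanged() {
    local state_file="$1"
    local current="$2"
    local saved=""

    if [ -e "$state_file" ]; then
        saved="$(cat "$state_file")"
    fi
    [ -n "$saved" ] && [ "$saved" = "$current" ]
}
```

With the placeholder "https://example.org/archives/list/2024-January",
state_file_name prints .post_pw_https___example.org_archives_list_2024-January.
In this sketch a missing or empty state file always counts as changed, so
an empty header from a failed HEAD request does not suppress a scan.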