From: Aaron Conole
To: Michael Santana
Cc: ci@dpdk.org, Ilya Maximets, Jeremy Kerr
Subject: Re: [PATCH v2 2/2] post_pw: Store submitted checks locally as well
Date: Mon, 22 Jan 2024 18:14:51 -0500
In-Reply-To: (Michael Santana's message of "Mon, 22 Jan 2024 15:39:31 -0500")
References: <20240122193232.3734371-1-aconole@redhat.com> <20240122193232.3734371-3-aconole@redhat.com>
List-Id: DPDK CI discussions

Michael Santana writes:

> On Mon, Jan 22, 2024 at 2:32 PM Aaron Conole wrote:
>>
>> Jeremy Kerr reports that our PW checks reporting submitted 43000 API calls
>> in just a single day.  That is alarmingly unacceptable.  We can store the
>> URLs we've already submitted and then just skip over any additional
>> processing at least on the PW side.
>>
>> This patch does two things to try and mitigate this issue:
>>
>> 1. Store each patch ID and URL in the series DB to show that we reported
>>    the check.  This means we don't need to poll patchwork for check status.
>>
>> 2. Store the last modified time of the reports mailing list.  This means
>>    we only poll the mailing list when a new email has surely landed.
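The Last-Modified short-circuit described in point 2 can be sketched in isolation. This is a minimal illustration, not the patch itself: the function name and the save-file handling are mine, and the real script feeds the value from a `curl --head` against the pipermail archive.

```shell
# check_archive_changed: compare a cached Last-Modified value against a
# freshly fetched one. Succeeds (returns 0) when the archive changed or on
# the first run, and caches the new value; fails (returns 1) when nothing
# changed, so the caller can skip parsing thread.html entirely.
check_archive_changed() {
    save_file="$1"
    new_mod="$2"

    if [ -e "$save_file" ]; then
        last_mod="$(cat "$save_file")"
        if [ -n "$last_mod" ] && [ "$last_mod" = "$new_mod" ]; then
            return 1   # unchanged: skip the expensive list parse
        fi
    fi
    echo "$new_mod" > "$save_file"
    return 0
}
```

The design point is that polling cost drops to a single HEAD request in the common no-new-mail case, while a changed (or missing) timestamp falls through to the full parse.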
>>
>> Signed-off-by: Aaron Conole
>> ---
>> v2: fixed up the Last-Modified grep and storage
>>
>>  post_pw.sh       | 38 +++++++++++++++++++++++++++++++++++++-
>>  series_db_lib.sh | 25 +++++++++++++++++++++++++
>>  2 files changed, 62 insertions(+), 1 deletion(-)
>>
>> diff --git a/post_pw.sh b/post_pw.sh
>> index 9163ea1..a23bdc5 100755
>> --- a/post_pw.sh
>> +++ b/post_pw.sh
>> @@ -20,6 +20,7 @@
>>  # License for the specific language governing permissions and limitations
>>  # under the License.
>>
>> +[ -f "$(dirname $0)/series_db_lib.sh" ] && source "$(dirname $0)/series_db_lib.sh" || exit 1
>>  [ -f "${HOME}/.mail_patchwork_sync.rc" ] && source "${HOME}/.mail_patchwork_sync.rc"
>>
>>  # Patchwork instance to update with new reports from mailing list
>> @@ -75,6 +76,13 @@ send_post() {
>>      if [ -z "$context" -o -z "$state" -o -z "$description" -o -z "$patch_id" ]; then
>>          echo "Skpping \"$link\" due to missing context, state, description," \
>>               "or patch_id" 1>&2
>> +        # Just don't want to even bother seeing these "bad" patches as well.
>> +        add_check_scanned_url "$patch_id" "$target_url"
>> +        return 0
>> +    fi
>> +
>> +    if check_id_exists "$patch_id" "$target_url" ; then
>> +        echo "Skipping \"$link\" - already reported." 1>&2
>>          return 0
>>      fi
>>
>> @@ -84,6 +92,7 @@ send_post() {
>>          "$api_url")"
>>      if [ $? -ne 0 ]; then
>>          echo "Failed to get proper server response on link ${api_url}" 1>&2
>> +        # Don't store these as processed in case the server has a temporary issue.
>>          return 0
>>      fi
>>
>> @@ -95,6 +104,9 @@ send_post() {
>>        jq -e "[.[].target_url] | contains([\"$mail_url\"])" >/dev/null
>>      then
>>          echo "Report ${target_url} already pushed to patchwork. Skipping." 1>&2
>> +        # Somehow this was not stored (for example, first time we apply the tracking
>> +        # feature). Store it now.
>> +        add_check_scanned_url "$patch_id" "$target_url"
>>          return 0
>>      fi
>>
>> @@ -114,12 +126,34 @@ send_post() {
>>      if [ $? -ne 0 ]; then
>>          echo -e "Failed to push retults based on report ${link} to the" \
>>                  "patchwork instance ${pw_instance} using the following REST" \
>> -                "API Endpoint ${api_url} with the following data:\n$data\n"
>> +                "API Endpoint ${api_url} with the following data:\n$data\n" 1>&2
>>          return 0
>>      fi
>> +
>> +    add_check_scanned_url "$patch_id" "$target_url"
>>  }
>>
>> +# Collect the date.  NOTE: this needs some accommodation to catch the month change-overs
>>  year_month="$(date +"%Y-%B")"
>> +
>> +# Get the last modified time
>> +report_last_mod=$(curl --head -A "(pw-ci) pw-post" -sSf "${mail_archive}${year_month}/thread.html" | grep -i Last-Modified)
>> +
>> +mailing_list_save_file=$(echo ".post_pw_${mail_archive}${year_month}" | sed -e "s@/@_@g" -e "s@:@_@g" -e "s,@,_,g")
>> +
>> +if [ -e "${HOME}/${mailing_list_save_file}" ]; then
>> +    last_read_date=$(cat "${HOME}/${mailing_list_save_file}")
>> +    if [ "$last_read_date" -a "$last_read_date" == "$report_last_mod" ]; then
>> +        echo "Last modified times match. Skipping list parsing."
>> +        exit 0
>> +    else
>> +        last_read_date="$report_last_mod"
>> +    fi
>> +else
>> +    last_read_date="Failed curl."
>> +    touch "${HOME}/${mailing_list_save_file}"

> One last thing, sorry
>
> Instead of touch, could we populate it with $report_last_mod ?
> I'm looking at it from the POV that the first time this script is run
> and the file does not exist, it creates the timestamp. Next time it
> runs, if the timestamps are the same the script exits, which is what
> we want. If we keep it as it's written, the second time the script
> runs the variable will be populated with "Failed curl" and the script
> will run fully. This might not be ideal if the timestamps match but
> are just not yet written to the file.
>
> Or maybe another last_read_date="$report_last_mod" might do the job.
>
> Otherwise, everything looks good!

Ack.  Will send a v3.
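Michael's suggestion amounts to seeding the save file with the fresh Last-Modified value on the first run instead of touch-ing an empty file. A rough sketch of that behaviour, with names that are mine rather than the patch's:

```shell
# seed_or_read_last_mod: on the first run (no save file), record the current
# Last-Modified value and report it; on later runs, report whatever was
# recorded previously so the caller can compare against the fresh value.
seed_or_read_last_mod() {
    save_file="$1"
    report_last_mod="$2"

    if [ -e "$save_file" ]; then
        # Subsequent runs: hand back the previously recorded value.
        cat "$save_file"
    else
        # First run: seed with the fresh value so the very next unchanged
        # run can already short-circuit.
        echo "$report_last_mod" > "$save_file"
        echo "$report_last_mod"
    fi
}
```

The trade-off Michael notes still applies: if the first run seeds the timestamp but dies before processing the list, the next run would wrongly skip it, so seeding is best done only after a successful pass.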
>> +fi
>> +
>>  reports="$(curl -A "(pw-ci) pw-post" -sSf "${mail_archive}${year_month}/thread.html" | \
>>      grep -i 'HREF=' | sed -e 's@[0-9]*• @\|@')"
>>  if [ $? -ne 0 ]; then
>> @@ -132,3 +166,5 @@ echo "$reports" | while IFS='|' read -r blank link title; do
>>          send_post "${mail_archive}${year_month}/$link"
>>      fi
>>  done
>> +
>> +echo "$last_read_date" > "${HOME}/${mailing_list_save_file}"
>> diff --git a/series_db_lib.sh b/series_db_lib.sh
>> index c5f42e0..0635469 100644
>> --- a/series_db_lib.sh
>> +++ b/series_db_lib.sh
>> @@ -130,6 +130,17 @@ recheck_sync INTEGER
>>  EOF
>>          run_db_command "INSERT INTO series_schema_version(id) values (8);"
>>      fi
>> +
>> +    run_db_command "select * from series_schema_version;" | egrep '^9$' > /dev/null 2>&1
>> +    if [ $? -eq 1 ]; then
>> +        sqlite3 ${HOME}/.series-db <<EOF
>> +CREATE TABLE check_id_scanned (
>> +check_patch_id INTEGER,
>> +check_url STRING
>> +)
>> +EOF
>> +        run_db_command "INSERT INTO series_schema_version(id) values (9);"
>> +    fi
>>  }
>>
>>  function series_db_exists() {
>> @@ -468,3 +479,17 @@ function set_recheck_request_state() {
>>
>>      echo "UPDATE recheck_requests set recheck_sync=$recheck_state where patchwork_instance=\"$recheck_instance\" and patchwork_project=\"$recheck_project\" and recheck_requested_by=\"$recheck_requested_by\" and recheck_series=\"$recheck_series\";" | series_db_execute
>>  }
>> +
>> +function add_check_scanned_url() {
>> +    local patch_id="$1"
>> +    local url="$2"
>> +
>> +    echo "INSERT into check_id_scanned (check_patch_id, check_url) values (${patch_id}, \"$url\");" | series_db_execute
>> +}
>> +
>> +function check_id_exists() {
>> +    local patch_id="$1"
>> +    local url="$2"
>> +
>> +    echo "select * from check_id_scanned where check_patch_id=$patch_id and check_url=\"$url\";" | series_db_execute | grep "$url" >/dev/null 2>&1
>> +}
>> --
>> 2.41.0
>>
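For illustration, the two helpers the patch adds to series_db_lib.sh behave roughly like this file-backed stand-in. `SCANNED_DB` and the pipe-separated record format are inventions of this sketch, not the sqlite schema above; the point is only to show the record-then-skip flow that send_post relies on.

```shell
# Plain-text stand-in for the sqlite-backed dedup table, so the flow can be
# exercised without a ~/.series-db database.
SCANNED_DB="${SCANNED_DB:-$(mktemp)}"

# Record that a (patch_id, url) check has been reported.
add_check_scanned_url() {
    echo "$1|$2" >> "$SCANNED_DB"
}

# Succeed only if this exact (patch_id, url) pair was recorded before.
check_id_exists() {
    [ -e "$SCANNED_DB" ] && grep -qxF "$1|$2" "$SCANNED_DB"
}
```

With these in place, a caller follows the same pattern as send_post: `check_id_exists` first, return early on a hit, and `add_check_scanned_url` after a successful report, which is what bounds the script to one API call per check.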