DPDK CI discussions
* Polling for patchseries in DPDK - the /series/ and /events/ endpoints
@ 2025-05-05 16:08 Patrick Robb
  2025-05-06 14:12 ` Aaron Conole
  0 siblings, 1 reply; 3+ messages in thread
From: Patrick Robb @ 2025-05-05 16:08 UTC (permalink / raw)
  To: Aaron Conole
  Cc: ci, dev, Ali Alnubani, Brandes, Shai, zhoumin, Puttaswamy, Rajesh T


There was some discussion at last week's CI meeting about usage of the
Patchwork /events/ endpoint for polling for patches, and issues with that
process. Here is a relevant blurb, explaining some issues Aaron has run
into using the dpdk-ci repo "poll-pw.sh" shell script:

----------------

* Discussion of polling for series using the events API. The events
endpoint (with the series-created event) reports that a series has been
created, but returns only a limited set of data in the payload, which
necessitates a followup request to patchwork. So, this seems like it would
actually increase the number of requests made to the patchwork server.
Some related issues discussed are:
   * You cannot query the events endpoint for only events from a particular
project (this matters for patchwork instances with many projects under
them). For DPDK there are only 4 projects under DPDK patchwork, so it’s not
a huge deal, but still a small issue.
   * The datetime that the series-created event returns is the datetime of
one of the commits in the series, not the datetime at which the series was
submitted. So if you amend a commit (which does not update the commit
datetime) and resubmit a patchseries, the datetime on the series-created
record will not be “updated”. This can cause us to miss series when polling
via the events endpoint.

------------------

And for context, poll-pw.sh will check the /events/ endpoint for new series
created events like so:

--------------------

URL="${URL}/events/?category=${resource_type}-completed"

callcmd () # <patchwork id>
{
	eval $cmd
}

while true ; do
	date_now=$(date --utc '+%FT%T')
	since=$(date --utc '+%FT%T' -d $(cat $since_file | tr '\n' ' '))
	page=1
	while true ; do
		ids=$(curl -s "${URL}&page=${page}&since=${since}" |
			jq "try ( .[] | select( .project.name == \"$project\" ) )" |
			jq "try ( .payload.${resource_type}.id )")
		[ -z "$(echo $ids | tr -d '\n')" ] && break
		for id in $ids ; do
			if grep -q "^${id}$" $poll_pw_ids_file ; then
				continue
			fi
			callcmd $id
			echo $id >>$poll_pw_ids_file


-------------------

But, as was discussed at the meeting, once you have the series ids, then
you need to make a followup request to /series/{id}.
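
The filtering done by the two jq calls above, followed by the /series/{id}
followup, can be sketched in Python roughly as below. This is an
illustration only: the helper names series_ids_for_project and series_urls
are hypothetical, and the sample events are hand-written and carry just the
fields the script reads (project.name and payload.series.id), not a full
Patchwork response.

```python
API_BASE = "https://patchwork.dpdk.org/api"  # base URL taken from the thread


def series_ids_for_project(events, project):
    """Return the series ids of events that belong to `project`.

    Mirrors the jq filter: select(.project.name == project), then
    .payload.series.id. Events missing those fields are skipped.
    """
    ids = []
    for event in events:
        if (event.get("project") or {}).get("name") != project:
            continue
        series = (event.get("payload") or {}).get("series") or {}
        if "id" in series:
            ids.append(series["id"])
    return ids


def series_urls(ids):
    """Build one followup /series/{id} URL per series id."""
    return [f"{API_BASE}/series/{series_id}" for series_id in ids]


# Hand-written sample data, not real Patchwork output.
sample_events = [
    {"project": {"name": "DPDK"}, "payload": {"series": {"id": 101}}},
    {"project": {"name": "other-project"}, "payload": {"series": {"id": 102}}},
    {"project": {"name": "DPDK"}, "payload": {"series": {"id": 103}}},
]

dpdk_ids = series_ids_for_project(sample_events, "DPDK")
print(dpdk_ids)
print(series_urls(dpdk_ids))
```

Because the project filter runs on the client, events for every project on
the instance are still downloaded before most of them are discarded, which
is the cost the missing "project" query parameter imposes.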

UNH has a download_patchset.py polling script very much like poll-pw.sh
except that, because we store extra info about our processed patchseries in
a database (to facilitate lab.dpdk.org filtering functions), we use our
database to get the most recently processed patchseries, instead of the
"since_file." Our process (running every 10 minutes from Jenkins) is like
this:

1. get the "since_id" from our database
2. get the "newest_id" from
https://patchwork.dpdk.org/api/events/?category=series-completed. Get the
[0] index of the json response (the most recent patchseries) and save that
series id.
3. for seriesID in range(since_id, newest_id): get patch from
https://patchwork.dpdk.org/api/series/{id}.

So, both poll-pw.sh and our UNH script follow the process of making a
request to /events/, and then followup requests for /series/. Thus the
total number of requests being made on patchwork is (number of new
patchseries + 1).
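
A minimal sketch of the three steps and the request count, under stated
assumptions: series ids are integers that increase over time, the range
excludes since_id (already processed) and includes newest_id (the thread
does not spell out the bounds), and an id that does not resolve to a series
is simply skipped. The names ids_to_fetch and poll_once are hypothetical,
and the HTTP calls are passed in as callables so the sketch runs without a
network; this is not the actual download_patchset.py code.

```python
def ids_to_fetch(since_id, newest_id):
    """Series ids after since_id, up to and including newest_id."""
    return list(range(since_id + 1, newest_id + 1))


def poll_once(since_id, get_newest_id, fetch_series):
    """Run one polling pass and return (fetched, request_count).

    get_newest_id: callable making the single /events/ request.
    fetch_series:  callable making one /series/{id} request; returns
                   None when the id does not resolve to a series.
    """
    requests_made = 0
    newest_id = get_newest_id()
    requests_made += 1
    fetched = []
    for series_id in ids_to_fetch(since_id, newest_id):
        series = fetch_series(series_id)
        requests_made += 1
        if series is not None:
            fetched.append(series)
    return fetched, requests_made


# Stand-ins for the HTTP calls; the ids are made up.
fake_series = {1001: {"id": 1001}, 1002: {"id": 1002}, 1003: {"id": 1003}}
fetched, requests_made = poll_once(
    since_id=1000,
    get_newest_id=lambda: 1003,
    fetch_series=fake_series.get,
)
print(len(fetched), requests_made)  # 3 new series, 3 + 1 requests
```

With three new series the pass makes four requests, matching the (number of
new patchseries + 1) figure above.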

- The most consequential difference between the two implementations is
that poll-pw.sh makes its request to /events/ with the &since=${since}
parameter, passing in a since datetime, and UNH does not. As Aaron
explained at the CI meeting, because the datetime provided in the /events/
payload is not what one would expect (it gives the datetime of the commit,
not when the series was submitted), poll-pw.sh can miss series. With the
UNH lab polling script we don't have this issue because we don't use the
since parameter in our /events/ request. I think the options for poll-pw.sh
going forward would be:
1. Update patchwork so that the datetime provided in the /events/ payload
is what is "expected" i.e. the datetime that the series was submitted at.
2. Adopt the UNH process of discarding the &since=${since} parameter, and
rely solely on tracking the most recently processed patchseries id, get the
newest patchseries id from /events/, and traverse the range of (since_id,
newest_id).

- I agree it makes sense for /events/ to support a "project" param.

Thanks Aaron for raising this conversation. We can continue the
conversation over email, or also in person at DPDK Prague!


* Re: Polling for patchseries in DPDK - the /series/ and /events/ endpoints
  2025-05-05 16:08 Polling for patchseries in DPDK - the /series/ and /events/ endpoints Patrick Robb
@ 2025-05-06 14:12 ` Aaron Conole
  2025-05-06 19:08   ` Patrick Robb
  0 siblings, 1 reply; 3+ messages in thread
From: Aaron Conole @ 2025-05-06 14:12 UTC (permalink / raw)
  To: Patrick Robb
  Cc: ci, dev, Ali Alnubani, Brandes, Shai, zhoumin, Puttaswamy, Rajesh T

Patrick Robb <probb@iol.unh.edu> writes:

> There was some discussion at last week's CI meeting about usage of the Patchwork
> /events/ endpoint for polling for patches, and issues with that process. Here is a relevant
> blurb, explaining some issues Aaron has run into using the dpdk-ci repo "poll-pw.sh" shell
> script: 
>
> ----------------
>
> * Discussion pertaining to looking at polling for series using the events API. This events
> endpoint (with series created event) returns info that a series has been created, but returns
> a limited set of data in the payload, and this necessitates a followup request to patchwork.
> So, this seems like it would actually increase the amount of requests made to the patchwork
> server. Some related issues discussed are:
>    * You cannot query the events endpoint for only events from a particular project (this
> matters for patchwork instances with many projects under them). For DPDK there are only 4
> projects under DPDK patchwork, so it’s not a huge deal, but still a small issue.
>    * The datetime that the series-created event returns is the datetime of one of the
> commits in the series, not the datetime of when the series was submitted. So, this means
> that if you amend a commit (this does not update commit datetime) and resubmit a
> patchseries, the datetime on the series-created record will not be “updated”. This can cause
> us to miss series when polling via the events endpoint.

Sorry - I think there is still a misunderstanding here.

The datetime for the /series/ endpoint is what is provided in the patch
(so it might not be updated).

The datetime for the /events/ endpoint is when the event fires (that is
when the series is received).

I can reply to the meeting minutes document with this as well.
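
The distinction between the two datetimes can be illustrated with a small
sketch. The records are hand-written, and the field names series_date and
event_date are hypothetical labels for "the date carried by the patch" and
"the date the event fired"; they are not Patchwork field names.

```python
from datetime import datetime

# Hand-written records, not real Patchwork output.
records = [
    {"id": 1, "series_date": "2025-05-01T09:00:00",
     "event_date": "2025-05-01T09:05:00"},
    # Resubmitted series: the patch date is old, the event date is recent.
    {"id": 2, "series_date": "2025-04-20T12:00:00",
     "event_date": "2025-05-05T16:00:00"},
]


def newer_than(records, field, since):
    """Return ids whose `field` timestamp is strictly after `since`."""
    cutoff = datetime.fromisoformat(since)
    return [r["id"] for r in records
            if datetime.fromisoformat(r[field]) > cutoff]


since = "2025-05-05T00:00:00"
print(newer_than(records, "event_date", since))   # resubmission is found
print(newer_than(records, "series_date", since))  # resubmission is missed
```

Filtering on the event date picks up the resubmitted series, while
filtering on the date carried by the patch would skip it.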

> ------------------
>
> And for context, poll-pw.sh will check the /events/ endpoint for new series created events
> like so:
>
> --------------------
>
> URL="${URL}/events/?category=${resource_type}-completed"
>
> callcmd () # <patchwork id>
> {
> 	eval $cmd
> }
>
> while true ; do
> 	date_now=$(date --utc '+%FT%T')
> 	since=$(date --utc '+%FT%T' -d $(cat $since_file | tr '\n' ' '))
> 	page=1
> 	while true ; do
> 		ids=$(curl -s "${URL}&page=${page}&since=${since}" |
> 			jq "try ( .[] | select( .project.name == \"$project\" ) )" |
> 			jq "try ( .payload.${resource_type}.id )")
> 		[ -z "$(echo $ids | tr -d '\n')" ] && break
> 		for id in $ids ; do
> 			if grep -q "^${id}$" $poll_pw_ids_file ; then
> 				continue
> 			fi
> 			callcmd $id
> 			echo $id >>$poll_pw_ids_file
>
> -------------------
>
> But, as was discussed at the meeting, once you have the series ids, then you need to make a
> followup request to /series/{id}.
>
> UNH has a download_patchset.py polling script very much like poll-pw.sh except that,
> because we store extra info about our processed patchseries in a database (to facilitate
> lab.dpdk.org filtering functions), we use our database to get the most recently processed
> patchseries, instead of the "since_file." Our process (running every 10 minutes from Jenkins)
> is like this:
>
> 1. get the "since_id" from our database
> 2. get the "newest_id" from
> https://patchwork.dpdk.org/api/events/?category=series-completed. Get the [0] index of
> the json response (the most recent patchseries) and save that series id.
> 3. for seriesID in range(since_id, newest_id): get patch from
> https://patchwork.dpdk.org/api/series/{id}.
>
> So, both poll-pw.sh and our UNH script follow the process of making a request to /events/,
> and then followup requests for /series/. Thus the total number of requests being made on
> patchwork is (number of new patchseries + 1).
>
> -The most consequential difference in the two implementations is that poll-pw.sh makes a
> request to /events/ with the &since=${since} parameter, passing in a since datetime, and
> UNH does not. As Aaron explained at the CI meeting, because the datetime provided in the
> /events/ payload is not what one would expect (it gives the datetime of the commit, not
> when the series was submitted) this means that poll-pw.sh can miss series. With the UNH
> lab polling script we don't have this issue because we don't make use of the since
> parameter in our /events/ request. I think the options for poll-pw.sh going forward would
> be:
> 1. Update patchwork so that the datetime provided in the /events/ payload is what is
> "expected" i.e. the datetime that the series was submitted at.

That already is done.

> 2. Adopt the UNH process of discarding the &since=${since} parameter, and rely solely on
> tracking the most recently processed patchseries id, get the newest patchseries id from
> /events/, and traverse the range of (since_id, newest_id).
>
> -I agree it makes sense for /events/ to support a "project" param.
>
> Thanks Aaron for raising this conversation. We can continue the conversation over email, or
> also in person at DPDK Prague!

Let's keep discussing.



* Re: Polling for patchseries in DPDK - the /series/ and /events/ endpoints
  2025-05-06 14:12 ` Aaron Conole
@ 2025-05-06 19:08   ` Patrick Robb
  0 siblings, 0 replies; 3+ messages in thread
From: Patrick Robb @ 2025-05-06 19:08 UTC (permalink / raw)
  To: Aaron Conole
  Cc: ci, dev, Ali Alnubani, Brandes, Shai, zhoumin, Puttaswamy, Rajesh T

Thanks for the clarification regarding the datetimes. Yes, let's clear up
any remaining questions offline at Prague. :)
