DPDK patches and discussions
* Polling for patchseries in DPDK - the /series/ and /events/ endpoints
@ 2025-05-05 16:08 Patrick Robb
  2025-05-06 14:12 ` Aaron Conole
  0 siblings, 1 reply; 5+ messages in thread
From: Patrick Robb @ 2025-05-05 16:08 UTC
  To: Aaron Conole
  Cc: ci, dev, Ali Alnubani, Brandes, Shai, zhoumin, Puttaswamy, Rajesh T


There was some discussion at last week's CI meeting about usage of the
Patchwork /events/ endpoint for polling for patches, and issues with that
process. Here is a relevant blurb, explaining some issues Aaron has run
into using the dpdk-ci repo "poll-pw.sh" shell script:

----------------

* Discussion pertaining to looking at polling for series using the events
API. This events endpoint (with series created event) returns info that a
series has been created, but returns a limited set of data in the payload,
and this necessitates a followup request to patchwork. So, this seems like
it would actually increase the amount of requests made to the patchwork
server. Some related issues discussed are:
   * You cannot query the events endpoint for only events from a particular
project (this matters for patchwork instances with many projects under
them). For DPDK there are only 4 projects under DPDK patchwork, so it’s not
a huge deal, but still a small issue.
   * The datetime that the series-created event returns is the datetime of
one of the commits in the series, not the datetime of when the series was
submitted. So, this means that if you amend a commit (this does not update
commit datetime) and resubmit a patchseries, the datetime on the
series-created record will not be “updated”. This can cause us to miss
series when polling via the events endpoint.

------------------

And for context, poll-pw.sh will check the /events/ endpoint for new series
created events like so:

--------------------

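# Build the events query; resource_type is e.g. "series".
URL="${URL}/events/?category=${resource_type}-completed"

# Run the user-supplied command for one patchwork id.
callcmd () # <patchwork id>
{
	eval $cmd
}

while true ; do
	date_now=$(date --utc '+%FT%T')
	# Normalize the timestamp stored in $since_file.
	since=$(date --utc '+%FT%T' -d $(cat $since_file | tr '\n' ' '))
	page=1
	while true ; do
		# Fetch one page of events newer than $since, keep only those
		# for $project, and extract the ids from the payload.
		ids=$(curl -s "${URL}&page=${page}&since=${since}" |
			jq "try ( .[] | select( .project.name == \"$project\" ) )" |
			jq "try ( .payload.${resource_type}.id )")
		# Stop paging once a page yields no ids.
		[ -z "$(echo $ids | tr -d '\n')" ] && break
		for id in $ids ; do
			# Skip ids that were already handled.
			if grep -q "^${id}$" $poll_pw_ids_file ; then
				continue
			fi
			callcmd $id
			echo $id >>$poll_pw_ids_file
			[...]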

-------------------

But, as was discussed at the meeting, once you have the series ids, then
you need to make a followup request to /series/{id}.
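
As a rough illustration of that two-step flow, here is a minimal Python sketch. It assumes the events JSON carries the same fields the jq filters above read (.project.name and .payload.series.id); the function names, sample ids, and project names are made up for illustration and are not taken from poll-pw.sh.

```python
API_BASE = "https://patchwork.dpdk.org/api"  # base URL as used in the links in this thread


def series_ids_from_events(events, project):
    """Mirror the two jq filters: keep events whose .project.name matches,
    then pull out .payload.series.id."""
    ids = []
    for event in events:
        if (event.get("project") or {}).get("name") != project:
            continue
        series = (event.get("payload") or {}).get("series")
        if series and "id" in series:
            ids.append(series["id"])
    return ids


def followup_urls(series_ids, api_base=API_BASE):
    """Build one /series/{id}/ URL per series id (one follow-up request each)."""
    return [f"{api_base}/series/{sid}/" for sid in series_ids]


# Hand-written sample shaped like the fields the jq filters read.
sample_events = [
    {"project": {"name": "DPDK"}, "payload": {"series": {"id": 101}}},
    {"project": {"name": "CI"}, "payload": {"series": {"id": 102}}},
    {"project": {"name": "DPDK"}, "payload": {"series": {"id": 103}}},
]
ids = series_ids_from_events(sample_events, "DPDK")
print(followup_urls(ids))
```

The events response only identifies the series, so the number of follow-up requests equals the number of ids that survive the project filter.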

UNH has a download_patchset.py polling script very much like poll-pw.sh
except that, because we store extra info about our processed patchseries in
a database (to facilitate lab.dpdk.org filtering functions), we use our
database to get the most recently processed patchseries, instead of the
"since_file." Our process (running every 10 minutes from Jenkins) is like
this:

1. get the "since_id" from our database
2. get the "newest_id" from
https://patchwork.dpdk.org/api/events/?category=series-completed. Get the [0]
index of the JSON response (the most recent patchseries) and save that series id.
3. for series_id in range(since_id, newest_id): get the series from
https://patchwork.dpdk.org/api/series/{series_id}.
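
The three steps above can be sketched roughly as follows. This is not the actual download_patchset.py code; the function names are hypothetical, and the sketch assumes that since_id has already been processed and newest_id has not, so the traversal covers since_id + 1 through newest_id inclusive.

```python
def newest_series_id(events_page):
    """Step 2: take the first element of the /events/ response
    (the most recent event) and read its series id."""
    return events_page[0]["payload"]["series"]["id"]


def ids_to_fetch(since_id, newest_id):
    """Step 3: every id after the last processed one, up to and
    including the newest (assumed bounds, see the note above)."""
    return list(range(since_id + 1, newest_id + 1))


# Step 1 stand-in: in the real process this value comes from the database.
since_id = 100

# Made-up events page; only the field used here is filled in.
events_page = [
    {"payload": {"series": {"id": 104}}},
    {"payload": {"series": {"id": 103}}},
]

newest_id = newest_series_id(events_page)
to_fetch = ids_to_fetch(since_id, newest_id)
print(to_fetch)

# One /events/ request plus one /series/ request per id in the range.
total_requests = 1 + len(to_fetch)
print(total_requests)
```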

So, both poll-pw.sh and our UNH script follow the process of making a
request to /events/, and then followup requests for /series/. Thus the
total number of requests being made on patchwork is (number of new
patchseries + 1).

- The most consequential difference in the two implementations is that
poll-pw.sh makes a request to /events/ with the &since=${since} parameter,
passing in a since datetime, and UNH does not. As Aaron explained at the CI
meeting, because the datetime provided in the /events/ payload is not what
one would expect (it gives the datetime of the commit, not when the series
was submitted) this means that poll-pw.sh can miss series. With the UNH lab
polling script we don't have this issue because we don't make use of the
since parameter in our /events/ request. I think the options for poll-pw.sh
going forward would be:
1. Update patchwork so that the datetime provided in the /events/ payload
is what is "expected" i.e. the datetime that the series was submitted at.
2. Adopt the UNH process of discarding the &since=${since} parameter, and
rely solely on tracking the most recently processed patchseries id, get the
newest patchseries id from /events/, and traverse the range of (since_id,
newest_id).
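
A minimal sketch of what option 2 might look like for one polling pass. It assumes series ids are allocated per Patchwork instance rather than per project, so the range can contain ids that belong to other projects or that do not resolve at all. `fetch_series` is a hypothetical stand-in for a GET on /series/{id}/ that returns the decoded JSON or None; it is injected so the sketch runs without network access.

```python
def poll_once(since_id, newest_id, fetch_series, project):
    """Walk the ids after since_id up to newest_id, keep the series that
    belong to `project`, and return (handled ids, new since_id)."""
    handled = []
    for sid in range(since_id + 1, newest_id + 1):
        series = fetch_series(sid)
        if series is None:
            # Assumed case: the id does not resolve to a series.
            continue
        if series["project"]["name"] != project:
            # Series from another project on the same instance.
            continue
        handled.append(sid)
    return handled, newest_id


# Made-up stand-in for the server: id 203 is deliberately missing.
fake_server = {
    201: {"id": 201, "project": {"name": "DPDK"}},
    202: {"id": 202, "project": {"name": "CI"}},
    204: {"id": 204, "project": {"name": "DPDK"}},
}

handled, new_since = poll_once(200, 204, fake_server.get, "DPDK")
print(handled, new_since)
```

Because no timestamp is involved, the only state to persist between runs is the returned since_id.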

- I agree it makes sense for /events/ to support a "project" param.

Thanks Aaron for raising this conversation. We can continue the
conversation over email, or also in person at DPDK Prague!


* Re: Polling for patchseries in DPDK - the /series/ and /events/ endpoints
  2025-05-05 16:08 Polling for patchseries in DPDK - the /series/ and /events/ endpoints Patrick Robb
@ 2025-05-06 14:12 ` Aaron Conole
  2025-05-06 19:08   ` Patrick Robb
  0 siblings, 1 reply; 5+ messages in thread
From: Aaron Conole @ 2025-05-06 14:12 UTC
  To: Patrick Robb
  Cc: ci, dev, Ali Alnubani, Brandes, Shai, zhoumin, Puttaswamy, Rajesh T

Patrick Robb <probb@iol.unh.edu> writes:

> There was some discussion at last week's CI meeting about usage of the Patchwork
> /events/ endpoint for polling for patches, and issues with that process. Here is a relevant
> blurb, explaining some issues Aaron has run into using the dpdk-ci repo "poll-pw.sh" shell
> script: 
>
> ----------------
>
> * Discussion pertaining to looking at polling for series using the events API. This events
> endpoint (with series created event) returns info that a series has been created, but returns
> a limited set of data in the payload, and this necessitates a followup request to patchwork.
> So, this seems like it would actually increase the amount of requests made to the patchwork
> server. Some related issues discussed are:
>    * You cannot query the events endpoint for only events from a particular project (this
> matters for patchwork instances with many projects under them). For DPDK there are only 4
> projects under DPDK patchwork, so it’s not a huge deal, but still a small issue.
>    * The datetime that the series-created event returns is the datetime of one of the
> commits in the series, not the datetime of when the series was submitted. So, this means
> that if you amend a commit (this does not update commit datetime) and resubmit a
> patchseries, the datetime on the series-created record will not be “updated”. This can cause
> us to miss series when polling via the events endpoint.

Sorry - I think there is still a misunderstanding here.

The datetime for the /series/ endpoint is what is provided in the patch
(so it may not be updated when a series is resubmitted).

The datetime for the /events/ endpoint is when the event fires (that is,
when the series is received).
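
To make the distinction concrete, here is a small Python sketch comparing the two timestamps against a polling cutoff. The dates are made up, and the sketch assumes both endpoints report ISO 8601 strings without a timezone suffix, like the '+%FT%T' format used in the script.

```python
from datetime import datetime


def is_newer(timestamp, since):
    """True if `timestamp` falls after the polling cutoff `since`."""
    return datetime.fromisoformat(timestamp) > datetime.fromisoformat(since)


since = "2025-05-01T00:00:00"        # last successful poll (made up)
series_date = "2025-04-18T09:30:00"  # date carried by the patch itself (made up)
event_date = "2025-05-05T14:00:00"   # when the event fired on the server (made up)

# Comparing against the series date would skip this resubmitted series ...
print(is_newer(series_date, since))  # False
# ... while the event date places it after the cutoff.
print(is_newer(event_date, since))   # True
```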

I can reply to the meeting minutes document with this as well.

> ------------------
>
> And for context, poll-pw.sh will check the /events/ endpoint for new series created events
> like so:
>
> --------------------
>
> URL="${URL}/events/?category=${resource_type}-completed"
>
> callcmd () # <patchwork id>
> {
> 	eval $cmd
> }
>
> while true ; do
> 	date_now=$(date --utc '+%FT%T')
> 	since=$(date --utc '+%FT%T' -d $(cat $since_file | tr '\n' ' '))
> 	page=1
> 	while true ; do
> 		ids=$(curl -s "${URL}&page=${page}&since=${since}" |
> 			jq "try ( .[] | select( .project.name == \"$project\" ) )" |
> 			jq "try ( .payload.${resource_type}.id )")
> 		[ -z "$(echo $ids | tr -d '\n')" ] && break
> 		for id in $ids ; do
> 			if grep -q "^${id}$" $poll_pw_ids_file ; then
> 				continue
> 			fi
> 			callcmd $id
> 			echo $id >>$poll_pw_ids_file
>
> -------------------
>
> But, as was discussed at the meeting, once you have the series ids, then you need to make a
> followup request to /series/{id}.
>
> UNH has a download_patchset.py polling script very much like poll-pw.sh except that,
> because we store extra info about our processed patchseries in a database (to facilitate
> lab.dpdk.org filtering functions), we use our database to get the most recently processed
> patchseries, instead of the "since_file." Our process (running every 10 minutes from Jenkins)
> is like this:
>
> 1. get the "since_id" from our database
> 2. get the "newest_id" from
> https://patchwork.dpdk.org/api/events/?category=series-completed. Get the [0] index of
> the json response (the most recent patchseries) and save that series id.
> 3. for seriesID in range(since_id, newest_id): get patch from
> https://patchwork.dpdk.org/api/series/{id}.
>
> So, both poll-pw.sh and our UNH script follow the process of making a request to /events/,
> and then followup requests for /series/. Thus the total number of requests being made on
> patchwork is (number of new patchseries + 1).
>
> -The most consequential difference in the two implementations is that poll-pw.sh makes a
> request to /events/ with the &since=${since} parameter, passing in a since datetime, and
> UNH does not. As Aaron explained at the CI meeting, because the datetime provided in the
> /events/ payload is not what one would expect (it gives the datetime of the commit, not
> when the series was submitted) this means that poll-pw.sh can miss series. With the UNH
> lab polling script we don't have this issue because we don't make use of the since
> parameter in our /events/ request. I think the options for poll-pw.sh going forward would
> be:
> 1. Update patchwork so that the datetime provided in the /events/ payload is what is
> "expected" i.e. the datetime that the series was submitted at.

That is already done.

> 2. Adopt the UNH process of discarding the &since=${since} parameter, and rely solely on
> tracking the most recently processed patchseries id, get the newest patchseries id from
> /events/, and traverse the range of (since_id, newest_id).
>
> -I agree it makes sense for /events/ to support a "project" param.
>
> Thanks Aaron for raising this conversation. We can continue the conversation over email, or
> also in person at DPDK Prague!

Let's keep discussing.



* Re: Polling for patchseries in DPDK - the /series/ and /events/ endpoints
  2025-05-06 14:12 ` Aaron Conole
@ 2025-05-06 19:08   ` Patrick Robb
  2025-05-30 16:02     ` Patrick Robb
  0 siblings, 1 reply; 5+ messages in thread
From: Patrick Robb @ 2025-05-06 19:08 UTC
  To: Aaron Conole
  Cc: ci, dev, Ali Alnubani, Brandes, Shai, zhoumin, Puttaswamy, Rajesh T


Thanks for the clarification regarding the datetimes. Yes, let's clear up
any remaining questions offline at Prague. :)


* Re: Polling for patchseries in DPDK - the /series/ and /events/ endpoints
  2025-05-06 19:08   ` Patrick Robb
@ 2025-05-30 16:02     ` Patrick Robb
  2025-07-10 20:11       ` Adam Hassick
  0 siblings, 1 reply; 5+ messages in thread
From: Patrick Robb @ 2025-05-30 16:02 UTC
  To: Adam Hassick
  Cc: ci, dev, Ali Alnubani, Brandes, Shai, zhoumin, Puttaswamy,
	Rajesh T, Aaron Conole


I went looking through recent series on patchwork, and I think this is a
good example of the timestamp condition:
https://patchwork.dpdk.org/api/series/35145/. It looks like the original
commits were made on April 18 and the newest version was submitted on May 5,
but the series record retains the April 18 date.

I also see no project filter listed for the /events/ endpoint:
https://patchwork.dpdk.org/api/events/

Adam, would you agree the project filter for API requests is pretty
low-hanging fruit? It seems like a common-sense improvement to me.


* Re: Polling for patchseries in DPDK - the /series/ and /events/ endpoints
  2025-05-30 16:02     ` Patrick Robb
@ 2025-07-10 20:11       ` Adam Hassick
  0 siblings, 0 replies; 5+ messages in thread
From: Adam Hassick @ 2025-07-10 20:11 UTC
  To: Patrick Robb
  Cc: ci, dev, Ali Alnubani, Brandes, Shai, zhoumin, Puttaswamy,
	Rajesh T, Aaron Conole

Hi All,

It appears that there already is a filter for projects for events, but
the filter is configured to be hidden from the HTML view for some
reason. We don't need to make any code changes to enable this feature.
See here: https://github.com/getpatchwork/patchwork/blob/stable/3.2/patchwork/api/filters.py#L239

For example, the following URL filters series-created events by project,
here the CI repo's project:
https://patchwork.dpdk.org/api/events/?category=series-created&project=CI
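
For anyone scripting this, a minimal Python sketch of building such a query. The helper name `events_url` is hypothetical; the `category` and `project` parameters are the ones shown in the URL above, and `since` and `page` are the ones poll-pw.sh already passes.

```python
from urllib.parse import urlencode


def events_url(api_base, category, project=None, since=None, page=None):
    """Build an /events/ query URL, adding only the filters that are set."""
    params = {"category": category}
    if project is not None:
        params["project"] = project
    if since is not None:
        params["since"] = since
    if page is not None:
        params["page"] = page
    return f"{api_base}/events/?{urlencode(params)}"


url = events_url("https://patchwork.dpdk.org/api", "series-created", project="CI")
print(url)
# https://patchwork.dpdk.org/api/events/?category=series-created&project=CI
```

With the filter applied on the server, the client-side `select( .project.name == ... )` step in poll-pw.sh becomes a safety check rather than the only project filter.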

Adam

