DPDK CI discussions
* [dpdk-ci] [RFC] test lab database schema
@ 2017-11-09 23:53 Patrick MacArthur
  2017-11-10 12:33 ` Shepard Siegel
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Patrick MacArthur @ 2017-11-09 23:53 UTC (permalink / raw)
  To: ci; +Cc: Bob Noseworthy

Hi, all,

I have been working on a database schema for storing performance
results. I am attempting to make this generic enough to store whatever
measurements we want.

I have attached an entity-relationship diagram of the schema which
should illustrate the starting point for the schema.

As a side note, I have managed to get DTS to run some of the
functional tests on a pair of nodes locally in a test setup while I
wait for equipment to be ready to ship. I am still working on a setup
to get it to run the performance tests so I can get some output to
parse to begin working on pulling information into the database.

Some notes on the tables:

patch: I propose that for this schema patches will be stored on the
filesystem content-addressable by the sha256 hash of the patch file.
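
A minimal sketch of the content-addressed layout, assuming a single flat
directory; the function names and the directory argument are hypothetical
and not part of the proposal:

```python
import hashlib
from pathlib import Path


def store_patch(store_dir: Path, patch_bytes: bytes) -> Path:
    """Write a patch under its SHA-256 hex digest and return the path."""
    digest = hashlib.sha256(patch_bytes).hexdigest()
    path = store_dir / digest
    if not path.exists():  # identical content is stored only once
        path.write_bytes(patch_bytes)
    return path


def load_patch(store_dir: Path, digest: str) -> bytes:
    """Read a patch back and verify that it still matches its address."""
    data = (store_dir / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("stored patch does not match its digest")
    return data
```

Storing the same patch twice yields the same path, so the database row only
needs to hold the digest.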

patchset: "branch" refers to the corresponding repo (master -> dpdk,
dpdk-next-net -> dpdk/next-net, etc.) and will be NULL until the
patchset is applied. Tarballs are stored as files named after the patchset id.

patchset_result: Result entry for a patchset. A patchset passes if
there is a result for every measurement which is either PASS or N/A.
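
The pass rule above could be sketched as follows, assuming results are
keyed by measurement name and stored as strings (both are assumptions made
for illustration):

```python
from typing import Dict, Iterable

PASSING = {"PASS", "N/A"}


def patchset_passes(measurements: Iterable[str], results: Dict[str, str]) -> bool:
    """True only if every measurement has a result and it is PASS or N/A."""
    # A missing result yields None, which is not in PASSING, so it fails.
    return all(results.get(name) in PASSING for name in measurements)
```

Note that under this rule a missing result counts as a failure, not as N/A.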

environment: This should represent everything about where the test was
run and a new environment needs to be created every time this changes
(e.g., kernel or compiler update). I gathered the list of fields by
looking at the existing performance reports on the DPDK website. This
can be used for verification, to allow the test environment to be
reproducible, and to ensure that all comparisons are within an
identical setup.
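
One way to detect that "a new environment needs to be created" is to
fingerprint all of the fields together. This is only a sketch; apart from
kernel_version, the field names used with it are hypothetical:

```python
import hashlib
import json


def environment_fingerprint(fields: dict) -> str:
    """Stable digest of all environment fields; any change gives a new digest."""
    # Sorting the keys makes the digest independent of field order.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two runs would then be comparable only if their environments have the same
fingerprint.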

measurement: A single measurement which can be applied to any
patchset. We can use values like (name: “BUILD”, higherIsBetter: TRUE,
expectedValue: 1, deltaLimit: 0) to verify non-performance conditions,
such as the build succeeding for the given environment.
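
To make the BUILD example concrete, here is a sketch of how such a
measurement might be evaluated. The comparison rule (fail only when the
value is worse than expected by more than deltaLimit) is an assumption, and
the attribute names are snake_case renderings of the fields above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    name: str
    higher_is_better: bool
    expected_value: float
    delta_limit: float

    def passes(self, actual: float) -> bool:
        """Assumed rule: fail only if actual is worse by more than delta_limit."""
        if self.higher_is_better:
            return actual >= self.expected_value - self.delta_limit
        return actual <= self.expected_value + self.delta_limit


# The non-performance example from above: 1 means the build succeeded.
build = Measurement("BUILD", higher_is_better=True, expected_value=1, delta_limit=0)
```

With these values, build.passes(1) is true and build.passes(0) is false, so
a failed build is reported the same way as a performance regression.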

The primary keys for these tables are not shown; I will likely be
implementing the database using the data modeling framework for
whatever Web backend we wind up selecting, which will set up primary
keys and table join relationships automatically.

Comments/suggestions? Is there anything that this schema does not cover?

Thanks,
Patrick

-- 
Patrick MacArthur
Research and Development, High Performance Networking and Storage
UNH InterOperability Laboratory

[-- Attachment #2: Test Result ERD.pdf --]
[-- Type: application/pdf, Size: 15176 bytes --]


* Re: [dpdk-ci] [RFC] test lab database schema
  2017-11-09 23:53 [dpdk-ci] [RFC] test lab database schema Patrick MacArthur
@ 2017-11-10 12:33 ` Shepard Siegel
  2017-11-28 14:14 ` Thomas Monjalon
  2017-11-28 16:32 ` Gema Gomez
  2 siblings, 0 replies; 6+ messages in thread
From: Shepard Siegel @ 2017-11-10 12:33 UTC (permalink / raw)
  To: Patrick MacArthur; +Cc: ci, Bob Noseworthy

Patrick,

This is a great start. Thanks. Those of us supporting FPGA-based NICs have
the added challenge of almost a universe of different firmware/gateware,
some of it capable of being changed at runtime (e.g., partial
reconfiguration). I think your schema as-is covers this, but I feel it
would be better to additionally include a sha256 hash field for something
like "Firmware Source ID". That is, every time we build new FPGA NIC
firmware (dozens of times per day), the hash of the git commit used to
produce the bitstream is baked into it. This is enormously
empowering, as it allows one to determine later, at run time, exactly what
code created that image.
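
As a sketch of what recording such an ID might involve: the helper below is
hypothetical and treats the ID as an opaque hex string. It accepts 40 or 64
hex digits because git commit ids are 40 digits in SHA-1 repositories and
64 in SHA-256 repositories; which of these applies here is an assumption.

```python
import re
from typing import Optional

_HEX_ID = re.compile(r"^(?:[0-9a-f]{40}|[0-9a-f]{64})$")


def normalize_firmware_source_id(raw: Optional[str]) -> Optional[str]:
    """Return a lowercase hex id, or None when no firmware id is reported."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if not _HEX_ID.match(value):
        raise ValueError("firmware source id must be 40 or 64 hex digits")
    return value
```

Keeping the field nullable would let environments without rebuildable
firmware leave it empty.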

-Shep

Shepard Siegel, CTO
atomicrules.com





* Re: [dpdk-ci] [RFC] test lab database schema
  2017-11-09 23:53 [dpdk-ci] [RFC] test lab database schema Patrick MacArthur
  2017-11-10 12:33 ` Shepard Siegel
@ 2017-11-28 14:14 ` Thomas Monjalon
  2017-11-28 16:32 ` Gema Gomez
  2 siblings, 0 replies; 6+ messages in thread
From: Thomas Monjalon @ 2017-11-28 14:14 UTC (permalink / raw)
  To: Patrick MacArthur; +Cc: ci, Bob Noseworthy

Hi,

10/11/2017 00:53, Patrick MacArthur:
> I have been working on a database schema for storing performance
> results. I am attempting to make this generic enough to store whatever
> measurements we want.

Thanks for working on it.

> I have attached an entity-relationship diagram of the schema which
> should illustrate the starting point for the schema.

I have some comments below.

[...]
> patch: I propose that for this schema patches will be stored on the
> filesystem content-addressable by the sha256 hash of the patch file.

I don't see the need to store the full diff.
You could store the patchwork id to retrieve it.

> patchset: "branch" refers to corresponding repo (master -> dpdk,
> dpdk-next-net -> dpdk/next-net, etc.) and will be NULL until the
> patchset is applied. Tarballs stored as files named after patchset id.

You should also be able to store measurements for a given state of a git tree.
This will be used for regular (e.g., daily) tests.
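
A sketch of one possible shape for this, using SQLite only for illustration:
a test target is either a patchset or a (repo, commit) pair, never both.
The table and column names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE test_target (
    id          INTEGER PRIMARY KEY,
    patchset_id INTEGER,          -- set for per-patchset runs
    git_repo    TEXT,             -- set for runs on a tree state
    git_commit  TEXT,
    CHECK (
        (patchset_id IS NOT NULL AND git_repo IS NULL AND git_commit IS NULL)
        OR
        (patchset_id IS NULL AND git_repo IS NOT NULL AND git_commit IS NOT NULL)
    )
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# One per-patchset target and one daily-run target on a tree state.
conn.execute("INSERT INTO test_target (patchset_id) VALUES (?)", (42,))
conn.execute(
    "INSERT INTO test_target (git_repo, git_commit) VALUES (?, ?)",
    ("dpdk", "0123456789abcdef0123456789abcdef01234567"),
)
```

Results would then reference test_target instead of patchset directly.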

> patchset_result: Result entry for a patchset. A patchset passes if
> there is a result for every measurement which is either PASS or N/A.
> 
> environment: This should represent everything about where the test was
> run and a new environment needs to be created every time this changes
> (e.g., kernel or compiler update). I gathered the list of fields by
> looking at the existing performance reports on the DPDK website. This
> can be used for verification, to allow the test environment to be
> reproducible, and to ensure that all comparisons are within an
> identical setup.

Do we want a limited set of environment fields, or something flexible?
For instance, there is a field dts_config, but we could use other test tools.
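
The flexible option could look something like this sketch, which stores
environment fields as name/value rows; the table names, the test_tool field
and the sample values are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE environment (id INTEGER PRIMARY KEY);
CREATE TABLE environment_field (
    environment_id INTEGER NOT NULL REFERENCES environment(id),
    name           TEXT NOT NULL,
    value          TEXT NOT NULL,
    PRIMARY KEY (environment_id, name)
);
""")
env_id = conn.execute("INSERT INTO environment DEFAULT VALUES").lastrowid
conn.executemany(
    "INSERT INTO environment_field VALUES (?, ?, ?)",
    [(env_id, "kernel_version", "4.13.0"), (env_id, "test_tool", "dts")],
)
# Read all fields of one environment back as a plain dict.
fields = dict(conn.execute(
    "SELECT name, value FROM environment_field WHERE environment_id = ?",
    (env_id,),
))
```

The trade-off is that new fields need no schema change, but the database can
no longer enforce that a given field is present.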

> measurement: A single measurement which can be applied to any
> patchset. We can use values like (name: “BUILD”, higherIsBetter: TRUE,
> expectedValue: 1, deltaLimit: 0) to verify non-performance conditions,
> such as the build succeeding for the given environment.

I think we should store the normal values per environment and over time.
I mean, the normal value will evolve with time and we should keep a history of it.
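
A sketch of what I mean, with hypothetical table and column names and
made-up numbers: each baseline row carries the date from which it applies,
and a lookup picks the latest row in force on the day of the run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE baseline (
    environment_id INTEGER NOT NULL,
    measurement    TEXT NOT NULL,
    valid_from     TEXT NOT NULL,   -- ISO 8601 date, sorts lexicographically
    expected_value REAL NOT NULL,
    PRIMARY KEY (environment_id, measurement, valid_from)
);
""")
conn.executemany(
    "INSERT INTO baseline VALUES (?, ?, ?, ?)",
    [(1, "throughput", "2017-11-01", 10.0), (1, "throughput", "2017-12-01", 12.0)],
)


def expected_at(environment_id, measurement, date):
    """Return the expected value in force on the given date, or None."""
    row = conn.execute(
        "SELECT expected_value FROM baseline"
        " WHERE environment_id = ? AND measurement = ? AND valid_from <= ?"
        " ORDER BY valid_from DESC LIMIT 1",
        (environment_id, measurement, date),
    ).fetchone()
    return row[0] if row else None
```

Old rows are never updated, so the history of the normal value is kept.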

> The primary keys for these tables are not shown; I will likely be
> implementing the database using the data modeling framework for
> whatever Web backend we wind up selecting, which will set up primary
> keys and table join relationships automatically.
> 
> Comments/suggestions? Is there anything that this schema does not cover?

As I said above, we need to cover tests based on git state, not patchset.

Thanks :)


* Re: [dpdk-ci] [RFC] test lab database schema
  2017-11-09 23:53 [dpdk-ci] [RFC] test lab database schema Patrick MacArthur
  2017-11-10 12:33 ` Shepard Siegel
  2017-11-28 14:14 ` Thomas Monjalon
@ 2017-11-28 16:32 ` Gema Gomez
  2017-11-28 17:13   ` Thomas Monjalon
  2 siblings, 1 reply; 6+ messages in thread
From: Gema Gomez @ 2017-11-28 16:32 UTC (permalink / raw)
  To: ci

Hi Patrick,

thanks for starting this work.

As per our discussion today, here is a summary of my feedback by email:

- I would like to have some way of debugging issues when there are
performance regressions. Even if that is not a primary goal at this
stage, being able to do that in the future would be important for CI,
especially when there are regressions on HW that the community or patch
writer won't have access to. Making performance counters available would
be useful.

- Make sure each server is distinguishable from every other server in the
database. This is important when comparing performance
metrics, as different servers from the same vendor may vary in their
performance measurements.

- It would be nice to make the column names consistent (i.e., underscores
or camel case, but not both); I vote for underscores.

- There is kernel_version in the database but no OS/distro/installed
packages. Is there a default distro that everybody is going to test on?
Having a list of the packages/versions that were installed on the server
when the tests ran will be useful for traceability.

Having the UI as something to think about/implement later worries me.
Representing the data in a way that is consumable by engineering is
not an easy task, and it is important. It will also affect the database
design. I'd like to contribute an example performance graph/drawings
from a previous life that may or may not be useful:

http://ci.ubuntu.com/bootspeed/arch/amd64/

Cheers,
Gema




-- 
Gema Gomez-Solano
Tech Lead, SDI
Linaro Ltd
IRC: gema@#linaro on irc.freenode.net


* Re: [dpdk-ci] [RFC] test lab database schema
  2017-11-28 16:32 ` Gema Gomez
@ 2017-11-28 17:13   ` Thomas Monjalon
  2017-11-28 17:26     ` Gema Gomez
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Monjalon @ 2017-11-28 17:13 UTC (permalink / raw)
  To: Gema Gomez; +Cc: ci

28/11/2017 17:32, Gema Gomez:
> Having the UI as something to think about/implement later worries me.
> Representing the data in a way that it is consumable by engineering is
> not an easy task, and it is important. It will also affect the database
> design. I'd like to contribute an example performance graph/drawings
> from a previous life that may or may not be useful:
> 
> http://ci.ubuntu.com/bootspeed/arch/amd64/

Regarding the UI, I think we can use grafana which is highly customizable.


* Re: [dpdk-ci] [RFC] test lab database schema
  2017-11-28 17:13   ` Thomas Monjalon
@ 2017-11-28 17:26     ` Gema Gomez
  0 siblings, 0 replies; 6+ messages in thread
From: Gema Gomez @ 2017-11-28 17:26 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: ci

On 28/11/17 17:13, Thomas Monjalon wrote:
> 28/11/2017 17:32, Gema Gomez:
>> Having the UI as something to think about/implement later worries me.
>> Representing the data in a way that it is consumable by engineering is
>> not an easy task, and it is important. It will also affect the database
>> design. I'd like to contribute an example performance graph/drawings
>> from a previous life that may or may not be useful:
>>
>> http://ci.ubuntu.com/bootspeed/arch/amd64/
> 
> Regarding the UI, I think we can use grafana which is highly customizable.
> 

Agreed, but grafana is a tool and can be used in many different ways. I
was asking what data is going to be displayed, what is going to be tested,
at what frequency, how engineering decisions are going to be made from that
data, etc.: test-plan kind of information. I have heard that we want to
draw graphs but also make pass/fail decisions based on thresholds; all
of this will affect the data that needs to be stored in the database.

Cheers,
Gema

-- 
Gema Gomez-Solano
Tech Lead, SDI
Linaro Ltd
IRC: gema@#linaro on irc.freenode.net

