mirror of https://github.com/Qortal/Brooklyn
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
451 lines
22 KiB
451 lines
22 KiB
.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0) |
|
.. [see the bottom of this file for redistribution information] |
|
|
|
Reporting regressions |
|
+++++++++++++++++++++ |
|
|
|
"*We don't cause regressions*" is the first rule of Linux kernel development; |
|
Linux founder and lead developer Linus Torvalds established it himself and |
|
ensures it's obeyed. |
|
|
|
This document describes what the rule means for users and how the Linux kernel's |
|
development model ensures to address all reported regressions; aspects relevant |
|
for kernel developers are left to Documentation/process/handling-regressions.rst. |
|
|
|
|
|
The important bits (aka "TL;DR") |
|
================================ |
|
|
|
#. It's a regression if something running fine with one Linux kernel works worse |
|
or not at all with a newer version. Note, the newer kernel has to be compiled |
|
using a similar configuration; the detailed explanations below describes this |
|
and other fine print in more detail. |
|
|
|
#. Report your issue as outlined in Documentation/admin-guide/reporting-issues.rst, |
|
it already covers all aspects important for regressions and repeated |
|
below for convenience. Two of them are important: start your report's subject |
|
with "[REGRESSION]" and CC or forward it to `the regression mailing list |
|
<https://lore.kernel.org/regressions/>`_ ([email protected]). |
|
|
|
#. Optional, but recommended: when sending or forwarding your report, make the |
|
Linux kernel regression tracking bot "regzbot" track the issue by specifying |
|
when the regression started like this:: |
|
|
|
#regzbot introduced v5.13..v5.14-rc1 |
|
|
|
|
|
All the details on Linux kernel regressions relevant for users |
|
============================================================== |
|
|
|
|
|
The important basics |
|
-------------------- |
|
|
|
|
|
What is a "regression" and what is the "no regressions rule"? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
It's a regression if some application or practical use case running fine with |
|
one Linux kernel works worse or not at all with a newer version compiled using a |
|
similar configuration. The "no regressions rule" forbids this to take place; if |
|
it happens by accident, developers that caused it are expected to quickly fix |
|
the issue. |
|
|
|
It thus is a regression when a WiFi driver from Linux 5.13 works fine, but with |
|
5.14 doesn't work at all, works significantly slower, or misbehaves somehow. |
|
It's also a regression if a perfectly working application suddenly shows erratic |
|
behavior with a newer kernel version; such issues can be caused by changes in |
|
procfs, sysfs, or one of the many other interfaces Linux provides to userland |
|
software. But keep in mind, as mentioned earlier: 5.14 in this example needs to |
|
be built from a configuration similar to the one from 5.13. This can be achieved |
|
using ``make olddefconfig``, as explained in more detail below. |
|
|
|
Note the "practical use case" in the first sentence of this section: developers |
|
despite the "no regressions" rule are free to change any aspect of the kernel |
|
and even APIs or ABIs to userland, as long as no existing application or use |
|
case breaks. |
|
|
|
Also be aware the "no regressions" rule covers only interfaces the kernel |
|
provides to the userland. It thus does not apply to kernel-internal interfaces |
|
like the module API, which some externally developed drivers use to hook into |
|
the kernel. |
|
|
|
How do I report a regression? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Just report the issue as outlined in |
|
Documentation/admin-guide/reporting-issues.rst, it already describes the |
|
important points. The following aspects outlined there are especially relevant |
|
for regressions: |
|
|
|
* When checking for existing reports to join, also search the `archives of the |
|
Linux regressions mailing list <https://lore.kernel.org/regressions/>`_ and |
|
`regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_. |
|
|
|
* Start your report's subject with "[REGRESSION]". |
|
|
|
* In your report, clearly mention the last kernel version that worked fine and |
|
the first broken one. Ideally try to find the exact change causing the |
|
regression using a bisection, as explained below in more detail. |
|
|
|
* Remember to let the Linux regressions mailing list |
|
([email protected]) know about your report: |
|
|
|
* If you report the regression by mail, CC the regressions list. |
|
|
|
* If you report your regression to some bug tracker, forward the submitted |
|
report by mail to the regressions list while CCing the maintainer and the |
|
mailing list for the subsystem in question. |
|
|
|
If it's a regression within a stable or longterm series (e.g. |
|
v5.15.3..v5.15.5), remember to CC the `Linux stable mailing list |
|
<https://lore.kernel.org/stable/>`_ ([email protected]). |
|
|
|
In case you performed a successful bisection, add everyone to the CC the |
|
culprit's commit message mentions in lines starting with "Signed-off-by:". |
|
|
|
When CCing for forwarding your report to the list, consider directly telling the |
|
aforementioned Linux kernel regression tracking bot about your report. To do |
|
that, include a paragraph like this in your mail:: |
|
|
|
#regzbot introduced: v5.13..v5.14-rc1 |
|
|
|
Regzbot will then consider your mail a report for a regression introduced in the |
|
specified version range. In above case Linux v5.13 still worked fine and Linux |
|
v5.14-rc1 was the first version where you encountered the issue. If you |
|
performed a bisection to find the commit that caused the regression, specify the |
|
culprit's commit-id instead:: |
|
|
|
#regzbot introduced: 1f2e3d4c5d |
|
|
|
Placing such a "regzbot command" is in your interest, as it will ensure the |
|
report won't fall through the cracks unnoticed. If you omit this, the Linux |
|
kernel's regressions tracker will take care of telling regzbot about your |
|
regression, as long as you send a copy to the regressions mailing lists. But the |
|
regression tracker is just one human which sometimes has to rest or occasionally |
|
might even enjoy some time away from computers (as crazy as that might sound). |
|
Relying on this person thus will result in an unnecessary delay before the |
|
regressions becomes mentioned `on the list of tracked and unresolved Linux |
|
kernel regressions <https://linux-regtracking.leemhuis.info/regzbot/>`_ and the |
|
weekly regression reports sent by regzbot. Such delays can result in Linus |
|
Torvalds being unaware of important regressions when deciding between "continue |
|
development or call this finished and release the final?". |
|
|
|
Are really all regressions fixed? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Nearly all of them are, as long as the change causing the regression (the |
|
"culprit commit") is reliably identified. Some regressions can be fixed without |
|
this, but often it's required. |
|
|
|
Who needs to find the root cause of a regression? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Developers of the affected code area should try to locate the culprit on their |
|
own. But for them that's often impossible to do with reasonable effort, as quite |
|
a lot of issues only occur in a particular environment outside the developer's |
|
reach -- for example, a specific hardware platform, firmware, Linux distro, |
|
system's configuration, or application. That's why in the end it's often up to |
|
the reporter to locate the culprit commit; sometimes users might even need to |
|
run additional tests afterwards to pinpoint the exact root cause. Developers |
|
should offer advice and reasonably help where they can, to make this process |
|
relatively easy and achievable for typical users. |
|
|
|
How can I find the culprit? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Perform a bisection, as roughly outlined in |
|
Documentation/admin-guide/reporting-issues.rst and described in more detail by |
|
Documentation/admin-guide/bug-bisect.rst. It might sound like a lot of work, but |
|
in many cases finds the culprit relatively quickly. If it's hard or |
|
time-consuming to reliably reproduce the issue, consider teaming up with other |
|
affected users to narrow down the search range together. |
|
|
|
Who can I ask for advice when it comes to regressions? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Send a mail to the regressions mailing list ([email protected]) while |
|
CCing the Linux kernel's regression tracker ([email protected]); if the |
|
issue might better be dealt with in private, feel free to omit the list. |
|
|
|
|
|
Additional details about regressions |
|
------------------------------------ |
|
|
|
|
|
What is the goal of the "no regressions rule"? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Users should feel safe when updating kernel versions and not have to worry |
|
something might break. This is in the interest of the kernel developers to make |
|
updating attractive: they don't want users to stay on stable or longterm Linux |
|
series that are either abandoned or more than one and a half years old. That's |
|
in everybody's interest, as `those series might have known bugs, security |
|
issues, or other problematic aspects already fixed in later versions |
|
<http://www.kroah.com/log/blog/2018/08/24/what-stable-kernel-should-i-use/>`_. |
|
Additionally, the kernel developers want to make it simple and appealing for |
|
users to test the latest pre-release or regular release. That's also in |
|
everybody's interest, as it's a lot easier to track down and fix problems, if |
|
they are reported shortly after being introduced. |
|
|
|
Is the "no regressions" rule really adhered in practice? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
It's taken really seriously, as can be seen by many mailing list posts from |
|
Linux creator and lead developer Linus Torvalds, some of which are quoted in |
|
Documentation/process/handling-regressions.rst. |
|
|
|
Exceptions to this rule are extremely rare; in the past developers almost always |
|
turned out to be wrong when they assumed a particular situation was warranting |
|
an exception. |
|
|
|
Who ensures the "no regressions" is actually followed? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
The subsystem maintainers should take care of that, which are watched and |
|
supported by the tree maintainers -- e.g. Linus Torvalds for mainline and |
|
Greg Kroah-Hartman et al. for various stable/longterm series. |
|
|
|
All of them are helped by people trying to ensure no regression report falls |
|
through the cracks. One of them is Thorsten Leemhuis, who's currently acting as |
|
the Linux kernel's "regressions tracker"; to facilitate this work he relies on |
|
regzbot, the Linux kernel regression tracking bot. That's why you want to bring |
|
your report on the radar of these people by CCing or forwarding each report to |
|
the regressions mailing list, ideally with a "regzbot command" in your mail to |
|
get it tracked immediately. |
|
|
|
How quickly are regressions normally fixed? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Developers should fix any reported regression as quickly as possible, to provide |
|
affected users with a solution in a timely manner and prevent more users from |
|
running into the issue; nevertheless developers need to take enough time and |
|
care to ensure regression fixes do not cause additional damage. |
|
|
|
The answer thus depends on various factors like the impact of a regression, its |
|
age, or the Linux series in which it occurs. In the end though, most regressions |
|
should be fixed within two weeks. |
|
|
|
Is it a regression, if the issue can be avoided by updating some software? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Almost always: yes. If a developer tells you otherwise, ask the regression |
|
tracker for advice as outlined above. |
|
|
|
Is it a regression, if a newer kernel works slower or consumes more energy? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Yes, but the difference has to be significant. A five percent slow-down in a |
|
micro-benchmark thus is unlikely to qualify as regression, unless it also |
|
influences the results of a broad benchmark by more than one percent. If in |
|
doubt, ask for advice. |
|
|
|
Is it a regression, if an external kernel module breaks when updating Linux? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
No, as the "no regression" rule is about interfaces and services the Linux |
|
kernel provides to the userland. It thus does not cover building or running |
|
externally developed kernel modules, as they run in kernel-space and hook into |
|
the kernel using internal interfaces occasionally changed. |
|
|
|
How are regressions handled that are caused by security fixes? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
In extremely rare situations security issues can't be fixed without causing |
|
regressions; those fixes are given way, as they are the lesser evil in the end. |
|
Luckily this middling almost always can be avoided, as key developers for the |
|
affected area and often Linus Torvalds himself try very hard to fix security |
|
issues without causing regressions. |
|
|
|
If you nevertheless face such a case, check the mailing list archives if people |
|
tried their best to avoid the regression. If not, report it; if in doubt, ask |
|
for advice as outlined above. |
|
|
|
What happens if fixing a regression is impossible without causing another? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Sadly these things happen, but luckily not very often; if they occur, expert |
|
developers of the affected code area should look into the issue to find a fix |
|
that avoids regressions or at least their impact. If you run into such a |
|
situation, do what was outlined already for regressions caused by security |
|
fixes: check earlier discussions if people already tried their best and ask for |
|
advice if in doubt. |
|
|
|
A quick note while at it: these situations could be avoided, if people would |
|
regularly give mainline pre-releases (say v5.15-rc1 or -rc3) from each |
|
development cycle a test run. This is best explained by imagining a change |
|
integrated between Linux v5.14 and v5.15-rc1 which causes a regression, but at |
|
the same time is a hard requirement for some other improvement applied for |
|
5.15-rc1. All these changes often can simply be reverted and the regression thus |
|
solved, if someone finds and reports it before 5.15 is released. A few days or |
|
weeks later this solution can become impossible, as some software might have |
|
started to rely on aspects introduced by one of the follow-up changes: reverting |
|
all changes would then cause a regression for users of said software and thus is |
|
out of the question. |
|
|
|
Is it a regression, if some feature I relied on was removed months ago? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
It is, but often it's hard to fix such regressions due to the aspects outlined |
|
in the previous section. It hence needs to be dealt with on a case-by-case |
|
basis. This is another reason why it's in everybody's interest to regularly test |
|
mainline pre-releases. |
|
|
|
Does the "no regression" rule apply if I seem to be the only affected person? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
It does, but only for practical usage: the Linux developers want to be free to |
|
remove support for hardware only to be found in attics and museums anymore. |
|
|
|
Note, sometimes regressions can't be avoided to make progress -- and the latter |
|
is needed to prevent Linux from stagnation. Hence, if only very few users seem |
|
to be affected by a regression, it for the greater good might be in their and |
|
everyone else's interest to lettings things pass. Especially if there is an |
|
easy way to circumvent the regression somehow, for example by updating some |
|
software or using a kernel parameter created just for this purpose. |
|
|
|
Does the regression rule apply for code in the staging tree as well? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Not according to the `help text for the configuration option covering all |
|
staging code <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/Kconfig>`_, |
|
which since its early days states:: |
|
|
|
Please note that these drivers are under heavy development, may or |
|
may not work, and may contain userspace interfaces that most likely |
|
will be changed in the near future. |
|
|
|
The staging developers nevertheless often adhere to the "no regressions" rule, |
|
but sometimes bend it to make progress. That's for example why some users had to |
|
deal with (often negligible) regressions when a WiFi driver from the staging |
|
tree was replaced by a totally different one written from scratch. |
|
|
|
Why do later versions have to be "compiled with a similar configuration"? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Because the Linux kernel developers sometimes integrate changes known to cause |
|
regressions, but make them optional and disable them in the kernel's default |
|
configuration. This trick allows progress, as the "no regressions" rule |
|
otherwise would lead to stagnation. |
|
|
|
Consider for example a new security feature blocking access to some kernel |
|
interfaces often abused by malware, which at the same time are required to run a |
|
few rarely used applications. The outlined approach makes both camps happy: |
|
people using these applications can leave the new security feature off, while |
|
everyone else can enable it without running into trouble. |
|
|
|
How to create a configuration similar to the one of an older kernel? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Start your machine with a known-good kernel and configure the newer Linux |
|
version with ``make olddefconfig``. This makes the kernel's build scripts pick |
|
up the configuration file (the ".config" file) from the running kernel as base |
|
for the new one you are about to compile; afterwards they set all new |
|
configuration options to their default value, which should disable new features |
|
that might cause regressions. |
|
|
|
Can I report a regression I found with pre-compiled vanilla kernels? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
You need to ensure the newer kernel was compiled with a similar configuration |
|
file as the older one (see above), as those that built them might have enabled |
|
some known-to-be incompatible feature for the newer kernel. If in doubt, report |
|
the matter to the kernel's provider and ask for advice. |
|
|
|
|
|
More about regression tracking with "regzbot" |
|
--------------------------------------------- |
|
|
|
What is regression tracking and why should I care about it? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Rules like "no regressions" need someone to ensure they are followed, otherwise |
|
they are broken either accidentally or on purpose. History has shown this to be |
|
true for Linux kernel development as well. That's why Thorsten Leemhuis, the |
|
Linux Kernel's regression tracker, and some people try to ensure all regression |
|
are fixed by keeping an eye on them until they are resolved. Neither of them are |
|
paid for this, that's why the work is done on a best effort basis. |
|
|
|
Why and how are Linux kernel regressions tracked using a bot? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Tracking regressions completely manually has proven to be quite hard due to the |
|
distributed and loosely structured nature of Linux kernel development process. |
|
That's why the Linux kernel's regression tracker developed regzbot to facilitate |
|
the work, with the long term goal to automate regression tracking as much as |
|
possible for everyone involved. |
|
|
|
Regzbot works by watching for replies to reports of tracked regressions. |
|
Additionally, it's looking out for posted or committed patches referencing such |
|
reports with "Link:" tags; replies to such patch postings are tracked as well. |
|
Combined this data provides good insights into the current state of the fixing |
|
process. |
|
|
|
How to see which regressions regzbot tracks currently? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
Check out `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_. |
|
|
|
What kind of issues are supposed to be tracked by regzbot? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
The bot is meant to track regressions, hence please don't involve regzbot for |
|
regular issues. But it's okay for the Linux kernel's regression tracker if you |
|
involve regzbot to track severe issues, like reports about hangs, corrupted |
|
data, or internal errors (Panic, Oops, BUG(), warning, ...). |
|
|
|
How to change aspects of a tracked regression? |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
By using a 'regzbot command' in a direct or indirect reply to the mail with the |
|
report. The easiest way to do that: find the report in your "Sent" folder or the |
|
mailing list archive and reply to it using your mailer's "Reply-all" function. |
|
In that mail, use one of the following commands in a stand-alone paragraph (IOW: |
|
use blank lines to separate one or multiple of these commands from the rest of |
|
the mail's text). |
|
|
|
* Update when the regression started to happen, for example after performing a |
|
bisection:: |
|
|
|
#regzbot introduced: 1f2e3d4c5d |
|
|
|
* Set or update the title:: |
|
|
|
#regzbot title: foo |
|
|
|
* Monitor a discussion or bugzilla.kernel.org ticket where additions aspects of |
|
the issue or a fix are discussed::: |
|
|
|
#regzbot monitor: https://lore.kernel.org/r/[email protected]/ |
|
#regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=123456789 |
|
|
|
* Point to a place with further details of interest, like a mailing list post |
|
or a ticket in a bug tracker that are slightly related, but about a different |
|
topic:: |
|
|
|
#regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=123456789 |
|
|
|
* Mark a regression as invalid:: |
|
|
|
#regzbot invalid: wasn't a regression, problem has always existed |
|
|
|
Regzbot supports a few other commands primarily used by developers or people |
|
tracking regressions. They and more details about the aforementioned regzbot |
|
commands can be found in the `getting started guide |
|
<https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md>`_ and |
|
the `reference documentation <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md>`_ |
|
for regzbot. |
|
|
|
.. |
|
end-of-content |
|
.. |
|
This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top |
|
of the file. If you want to distribute this text under CC-BY-4.0 only, |
|
please use "The Linux kernel developers" for author attribution and link |
|
this as source: |
|
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-regressions.rst |
|
.. |
|
Note: Only the content of this RST file as found in the Linux kernel sources |
|
is available under CC-BY-4.0, as versions of this text that were processed |
|
(for example by the kernel's build system) might contain content taken from |
|
files which use a more restrictive license.
|
|
|