Black Screen of Death

10 messages

Black Screen of Death

john
Hi,

For the last 2 months I have been getting a problem where my machine
locks up and after 2 seconds goes to a black screen. The machine is then
unusable: ctrl-alt-delete, F1 and the SysRq rescue keys do not work, and
remote login is not possible.

I initially thought this was a problem with lxd, but then experienced the
issue when just using the Enlightenment desktop. I have switched to xfce4
but am still getting the issue.

It happens 2 to 3 times a week.

There is nothing in logs (/var/log/messages).

I upgraded the BIOS at the beginning of the year, so I don't think it's
that, but I have just upgraded again to see if that helps.

I am running an ASUS Prime X470 board with a Ryzen 2700, and have upgraded
the kernel to the latest stable when running emerge --update...

Any idea on how to debug an issue like this?

I dual boot with Windows and have not had this issue there, so it feels
like a Linux issue. I play some quite intensive games on Windows, whereas
the last lock-up on Linux happened while opening a terminal.

Any advice or guidance would be much appreciated.

John

Re: Black Screen of Death

gentoo-user

[2019-07-18 23:29] jdm <[hidden email]>
> Hi,
Hi,

> For the last 2 months I have been getting a problem where my machine
> locks and after 2 seconds goes to a black screen. The machine is then
> unusable, ctrl-alt-delete F1 or sys rescue keys do not work and you
> cannot remote login.
>
> I initially thought this was a problem with lxd but then experienced
> issue when just using Enlightenment desktop. I have switched to xfce4
> but still getting issue.
>
> It happens 2 to 3 times a week.
>
> There is nothing in logs (/var/log/messages).
>
> I upgraded BIOS a beginning of year so don't think it's that but have
> just upgraded again to see if that helps.
>
> Running ASUS Prime x470 Prime with ryzen 2700 and have upgraded kernel
> to latest stable when running emerge --update...
I'm not sure if it's still an issue with the 2nd-gen CPUs, but at least
with the first-gen ones I had lockups until I disabled the C-state
functionality in the UEFI. There was a previous thread on this list
discussing the issue, which might be interesting if you haven't read it
yet [1].

[..]

[1]: https://archives.gentoo.org/gentoo-user/message/1cfd9cd62796b4a41a2152a66cc1df0e
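
To see which idle states (C-states) the kernel currently exposes, for instance before and after changing the UEFI setting, something like the following may help. It is only a minimal sketch that assumes the standard cpuidle sysfs layout under /sys/devices/system/cpu; the function name list_cstates and its optional second argument are made up for illustration.

```shell
#!/bin/sh
# List the idle states (C-states) the kernel exposes for one CPU.
# $1: CPU directory name (default cpu0).
# $2: sysfs CPU root (hypothetical override, mainly so the function can be
#     pointed at a test directory; defaults to the real location).
list_cstates() {
    cpu=${1:-cpu0}
    root=${2:-/sys/devices/system/cpu}
    for state in "$root/$cpu"/cpuidle/state*; do
        # Skip the unexpanded glob when no cpuidle states are present.
        [ -r "$state/name" ] || continue
        printf '%s %s disabled=%s\n' \
            "$(basename "$state")" \
            "$(cat "$state/name")" \
            "$(cat "$state/disable" 2>/dev/null || echo '?')"
    done
}
```

Calling `list_cstates cpu0` prints one line per state; if nothing is printed, the kernel is not exposing any cpuidle states for that CPU.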

Re: Black Screen of Death

Adam Carter
In reply to this post by john
On Fri, Jul 19, 2019 at 8:28 AM jdm <[hidden email]> wrote:
> Hi,
>
> For the last 2 months I have been getting a problem where my machine
> locks and after 2 seconds goes to a black screen. The machine is then
> unusable, ctrl-alt-delete F1 or sys rescue keys do not work and you
> cannot remote login.
>
> I initially thought this was a problem with lxd but then experienced
> issue when just using Enlightenment desktop. I have switched to xfce4
> but still getting issue.
>
> It happens 2 to 3 times a week.
>
> There is nothing in logs (/var/log/messages).
>
> I upgraded BIOS a beginning of year so don't think it's that but have
> just upgraded again to see if that helps.
>
> Running ASUS Prime x470 Prime with ryzen 2700 and have upgraded kernel
> to latest stable when running emerge --update...
>
> Any idea on how to debug an issue like this?

A few questions:
- Do you boot straight into X? If so, have you tried disabling X to see whether the CLI alone is stable?
- Which video card are you using, and are you loading the firmware?
- Have you done an emerge @x11-module-rebuild since your last kernel update?
- Anything interesting in ~/.local/share/xorg/Xorg.0.log? If 'Build Operating System' and 'Current Operating System' are different you could try rebuilding xorg-server, but it shouldn't matter.

I've only heard of the c-state issue affecting early Ryzen 1s.
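
The Xorg log check can be done with a one-line grep. The sketch below assumes the per-user log path mentioned in this thread; the function name xorg_os_lines is invented for illustration, and the two lines still have to be compared by eye, since the 'Current' line usually carries extra detail such as the host name.

```shell
#!/bin/sh
# Print the 'Build Operating System' and 'Current Operating System' lines
# from an Xorg log so the kernel versions in them can be compared.
# $1: log file (defaults to the per-user path quoted in the thread).
xorg_os_lines() {
    log=${1:-$HOME/.local/share/xorg/Xorg.0.log}
    grep -E '(Build|Current) Operating System' "$log"
}

# After a kernel update, the X11 driver modules can be rebuilt with the
# set named above (run as root):
#   emerge @x11-module-rebuild
```

Run `xorg_os_lines` with no argument to read the default log, or pass another path if X was started system-wide and logs elsewhere.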

Re: Black Screen of Death

john
On Fri, 19 Jul 2019 10:59:23 +1000
Adam Carter <[hidden email]> wrote:

> On Fri, Jul 19, 2019 at 8:28 AM jdm <[hidden email]> wrote:
>
> > Hi,
> >
> > For the last 2 months I have been getting a problem where my machine
> > locks and after 2 seconds goes to a black screen. The machine is
> > then unusable, ctrl-alt-delete F1 or sys rescue keys do not work
> > and you cannot remote login.
> >
> > I initially thought this was a problem with lxd but then experienced
> > issue when just using Enlightenment desktop. I have switched to
> > xfce4 but still getting issue.
> >
> > It happens 2 to 3 times a week.
> >
> > There is nothing in logs (/var/log/messages).
> >
> > I upgraded BIOS a beginning of year so don't think it's that but
> > have just upgraded again to see if that helps.
> >
> > Running ASUS Prime x470 Prime with ryzen 2700 and have upgraded
> > kernel to latest stable when running emerge --update...
> >
> > Any idea on how to debug an issue like this?
> >  
>
> A few questions;
> - do you boot straight into X ? If so, have you tried disabling X to
> see if the CLI only is stable?
> - Which video card and are you using and are you loading the firmware?
> - Have you done an emerge @x11-module-rebuild since your last kernel
> update?
> - Anything interesting in ~/.local/share/xorg/Xorg.0.log? If 'Build
> Operating System' and 'Current Operating System' are different you
> could try rebuilding xorg-server, but it shouldn't matter.
>
> I've only heard of the c-state issue affecting early Ryzen 1s.

Hi,
I will try disabling X, as I always boot straight into the WM.

The video card is a Radeon RX480, and I have the firmware loaded into the
kernel, as I followed the wiki. I will check this, as I have not looked
into it for a long time.

~/.local/share/xorg/Xorg.0.log: Current and Build are consistent.

I'll see if I can find c-state in the BIOS, which another email suggests as
well.

Thanks

John

 

Re: Black Screen of Death

john
On Fri, 19 Jul 2019 06:09:47 +0100
jdm <[hidden email]> wrote:

> On Fri, 19 Jul 2019 10:59:23 +1000
> Adam Carter <[hidden email]> wrote:
>
> > On Fri, Jul 19, 2019 at 8:28 AM jdm <[hidden email]> wrote:
> >  
> > > Hi,
> > >
> > > For the last 2 months I have been getting a problem where my
> > > machine locks and after 2 seconds goes to a black screen. The
> > > machine is then unusable, ctrl-alt-delete F1 or sys rescue keys
> > > do not work and you cannot remote login.
> > >
> > > I initially thought this was a problem with lxd but then
> > > experienced issue when just using Enlightenment desktop. I have
> > > switched to xfce4 but still getting issue.
> > >
> > > It happens 2 to 3 times a week.
> > >
> > > There is nothing in logs (/var/log/messages).
> > >
> > > I upgraded BIOS a beginning of year so don't think it's that but
> > > have just upgraded again to see if that helps.
> > >
> > > Running ASUS Prime x470 Prime with ryzen 2700 and have upgraded
> > > kernel to latest stable when running emerge --update...
> > >
> > > Any idea on how to debug an issue like this?
> > >    
> >
> > A few questions;
> > - do you boot straight into X ? If so, have you tried disabling X to
> > see if the CLI only is stable?
> > - Which video card and are you using and are you loading the
> > firmware?
> > - Have you done an emerge @x11-module-rebuild since your last kernel
> > update?
> > - Anything interesting in ~/.local/share/xorg/Xorg.0.log? If 'Build
> > Operating System' and 'Current Operating System' are different you
> > could try rebuilding xorg-server, but it shouldn't matter.
> >
> > I've only heard of the c-state issue affecting early Ryzen 1s.  
>
> Hi,
> Will try disabling X as always boot straight into WM.
>
> Video card in Radeon RX480 and I have firmware loaded into kernel
> as followed wiki. I will check this I have not looked into this for a
> long time.
>
> /.local/share/xorg/Xorg.0.log Current and Build are consistent.
>
> I'll see if I can see c-state in BIOS which another email suggests as
> well.
>
> Thanks
>
> John
>
>  
>

I have updated the firmware line in the kernel, as the list now includes a
few extra entries, so I was not loading all of the available firmware.

Thanks for the advice, and I'll see how I get on.

John


Re: Black Screen of Death

Stefan Schmiedl
Re: [gentoo-user] Black Screen of Death "jdm" <[hidden email]>, 19.07.2019, 07:28:

> I have updated firmware line in kernel as this now includes a few extra
> lines so not loading all of the available firmware.

> Thanks for advice and I'll see how I get on.

Can you trigger the crash by (over-)exerting the system?

A few years ago I had a box with 4x4GB of RAM and a similar
crash only occurred during heavy load, say recompiling gcc
with -j 9. Turned out that _two_ of the four RAM modules were
faulty. And statistical evidence led me to the conclusion
that those were only used when the first 8GB were exhausted.

If you can spare the box for a day, boot into memtest and
let it run for at least 12 hours.

s.

Re: Black Screen of Death

Mick-10
In reply to this post by john
On Friday, 19 July 2019 06:28:56 BST jdm wrote:

> I have updated firmware line in kernel as this now includes a few extra
> lines so not loading all of the available firmware.

Some time ago I noticed warnings in dmesg which alerted me to the fact the
radeon graphics was not finding all the firmware it needed/wanted.  I had used
the gentoo radeon wiki page, but noticed the same graphics was now also
covered in the amdgpu wiki page.  Anyway, I responded to each dmesg complaint
by adding the respective firmware in the kernel and a reboot resolved all
problems.  In my case I was only getting a black screen trying to wake up the
PC after a suspend to RAM, so not exactly the same problem, but similar
enough.
--
Regards,

Mick
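
Pulling the missing firmware names out of dmesg can be scripted. The sketch below assumes the kernel reports failures in its usual "Direct firmware load for <file> failed" wording, which may not be the exact warning seen here; the function name missing_firmware is made up for illustration.

```shell
#!/bin/sh
# Read kernel log text on stdin and print each firmware file the kernel
# reported it could not load, one per line, without duplicates.
# Assumption: failures use the firmware loader's usual message wording.
missing_firmware() {
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed.*/\1/p' | sort -u
}

# Typical use (dmesg may need root on some systems):
#   dmesg | missing_firmware
```

Each name printed is a candidate for the firmware list in the kernel configuration; an empty result means no load failure of this form was logged since boot.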

Re: Black Screen of Death

Alan Mackenzie
In reply to this post by john
On Fri, Jul 19, 2019 at 01:30:48 +0200, [hidden email] wrote:

> [2019-07-18 23:29] jdm <[hidden email]>
> > Hi,
> Hi,

> > For the last 2 months I have been getting a problem where my machine
> > locks and after 2 seconds goes to a black screen. The machine is then
> > unusable, ctrl-alt-delete F1 or sys rescue keys do not work and you
> > cannot remote login.

> > I initially thought this was a problem with lxd but then experienced
> > issue when just using Enlightenment desktop. I have switched to xfce4
> > but still getting issue.

> > It happens 2 to 3 times a week.

> > There is nothing in logs (/var/log/messages).

> > I upgraded BIOS a beginning of year so don't think it's that but have
> > just upgraded again to see if that helps.

> > Running ASUS Prime x470 Prime with ryzen 2700 and have upgraded kernel
> > to latest stable when running emerge --update...

I've been having this problem ever since I built my "new" (April 2017)
machine, Ryzen something or other on an ASUS Prime x370pro.  My machine
has been hanging, perhaps, about once a week.

> I'm not sure if it's still an issue with the 2nd gen cpus, but at least
> with the first gen's I had lockups until I disabled the c-state
> functionality in the UEFI. There was a previous thread on this list
> discussing the issue which might be interesting if you haven't read it
> yet [1].

Thanks for the tip!  I've just disabled c-state (whatever that might be)
in my UEFI BIOS, and so far my machine hasn't hung.  The test will be
whether or not it continues not to have hung over 4 - 6 weeks.

Unfortunately, the speed of the machine is ~10% slower than with c-state
enabled.  On a CPU intensive benchmark, I get these timings:

    With c-state enabled:    18.236s
    With c-state disabled:   20.052s

> [..]

> [1]: https://archives.gentoo.org/gentoo-user/message/1cfd9cd62796b4a41a2152a66cc1df0e

--
Alan Mackenzie (Nuremberg, Germany).
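
The ~10% figure follows directly from the two timings above. As a quick check, a small awk-based helper (the name slowdown_percent is invented for illustration) computes the relative increase in run time:

```shell
#!/bin/sh
# Percentage by which run time grew from a baseline to a new measurement.
# $1: baseline time in seconds, $2: new time in seconds.
slowdown_percent() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Timings quoted above: c-state enabled vs. disabled.
slowdown_percent 18.236 20.052
```

With those two numbers it prints 10.0, i.e. (20.052 - 18.236) / 18.236 is just under 0.1.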

Re: Black Screen of Death

Rich Freeman
On Fri, Jul 19, 2019 at 7:52 AM Alan Mackenzie <[hidden email]> wrote:

>
> On Fri, Jul 19, 2019 at 01:30:48 +0200, [hidden email] wrote:
>
> > > Running ASUS Prime x470 Prime with ryzen 2700 and have upgraded kernel
> > > to latest stable when running emerge --update...
>
> I've been having this problem ever since I built my "new" (April 2017)
> machine, Ryzen something or other on an ASUS Prime x370pro.  My machine
> has been hanging, perhaps, about once a week.
>

What kernel version are you both running?  The new integrated vega
APUs and the vega GPUs use the amdgpu kernel driver, which is new.  It
was fairly unstable until recently.

If you're on the latest 4.19 that probably isn't the problem (not that
it is impossible).  4.19.59 is the current upstream longterm.  If
you're on something pre-4.18 I'd expect problems.  I don't know if all
the amdgpu fixes are backported to 4.14 - I would probably stick with
4.19 with anything Ryzen/Vega-related.

It is possible this isn't the problem, but this sounds like some kind
of KMS-related issue.  If you can't switch to a virtual console with
KMS that is probably a kernel issue.

I'm assuming you are using the in-kernel amdgpu drivers.  If you're
using some kind of proprietary driver that could be a problem as well.

--
Rich
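
A quick way to compare the running kernel against the 4.19 threshold mentioned above is sketched below. It assumes GNU coreutils for 'sort -V'; the function name kernel_at_least is made up for illustration.

```shell
#!/bin/sh
# Succeed if kernel version $2 is at least version $1.
# Relies on version-aware sorting ('sort -V' from GNU coreutils).
kernel_at_least() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

running=$(uname -r)
if kernel_at_least 4.19 "$running"; then
    echo "$running: 4.19 or newer"
else
    echo "$running: older than 4.19"
fi

# Which kernel driver is bound to the graphics card can be seen in the
# "Kernel driver in use:" line printed by 'lspci -k' (from pciutils).
```

The comparison only looks at the version string, so it says nothing about whether a distribution kernel carries backported fixes.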

Re: Black Screen of Death

William Kenworthy
In reply to this post by Stefan Schmiedl


On 7/19/19 2:48 PM, Stefan Schmiedl wrote:
> Re: [gentoo-user] Black Screen of Death "jdm" [hidden email], 19.07.2019, 07:28:
>
> > I have updated firmware line in kernel as this now includes a few extra
> > lines so not loading all of the available firmware.
>
> > Thanks for advice and I'll see how I get on.
>
> Can you trigger the crash by (over-)exerting the system?
>
> A few years ago I had a box with 4x4GB of RAM and a similar
> crash only occurred during heavy load, say recompiling gcc
> with -j 9. Turned out that _two_ of the four RAM modules were
> faulty. And statistical evidence led me to the conclusion
> that those were only used when the first 8GB were exhausted.
>
> If you can spare the box for a day, boot into memtest and
> let it run for at least 12 hours.
>
> s.


Also, check that CPU isolation is turned off in your kernel - when it's on, it causes this type of crash.


Bill K.
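
Checking a kernel option can be done against the kernel source config. The sketch below assumes the option meant here is CONFIG_CPU_ISOLATION and that the config lives at the usual Gentoo location /usr/src/linux/.config; both are assumptions, and the function name kernel_config_value is invented for illustration.

```shell
#!/bin/sh
# Print the setting of one kernel config symbol from a config file.
# Prints "<SYMBOL> is not set" when the option is absent or disabled.
# $1: symbol name, e.g. CONFIG_CPU_ISOLATION (assumed to be the option meant).
# $2: config file (defaults to the usual Gentoo kernel source symlink).
kernel_config_value() {
    symbol=$1
    config=${2:-/usr/src/linux/.config}
    grep -E "^${symbol}=" "$config" || echo "${symbol} is not set"
}

# Typical use:
#   kernel_config_value CONFIG_CPU_ISOLATION
# For the running kernel, if /proc/config.gz is available:
#   zcat /proc/config.gz | grep CONFIG_CPU_ISOLATION
```

The file under /usr/src/linux reflects the configured sources, which may differ from the kernel actually booted, so /proc/config.gz is the safer source when it exists.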