Recovering from crashes

Crashing, otherwise known as locking up or freezing, is a well-known and all-too-frequent phenomenon in the Windows world. With Linux systems, on the other hand, it is not uncommon for servers and other always-on systems to go for months without needing a single reboot. I’ve used Linux for roughly… oh, around 9 years now, and Windows since the days of 3.x in the early ’90s, and I can tell you from experience that Linux crashes are few and far between, especially compared to Windows. That said, crashes do happen from time to time, and it’s better to know how to recover from them. Just powering off while everything’s running is not good for any operating system, Windows or otherwise, and may damage your data. So it’s best to avoid that if you can, and try to get everything to shut down in an orderly manner. Here are some ways to do it.

Track it down and kill it!

If a certain program has locked up but the computer is still functional, meaning you can still move the mouse, select menu items, etc., and it’s not practical or desirable to reboot the machine, then killing the program will do the trick.  (You may feel like killing it if you just typed up a long document without saving and it has locked up, potentially losing all your work, but that’s not what I’m talking about.)  You can open a terminal window (also known as Console, shell, or command prompt) and type in:
ps -ef | less
(If you haven’t seen that vertical bar | before, it’s called a “pipe,” and it’s often on the same key as the backslash \, typed with Shift. On some keyboards it’s printed as two short vertical lines, one stacked on top of the other, but it appears as one solid line when you type it.) The pipe feeds the output of ps -ef into less, which gives you a long list of the processes the computer is currently running that you can page through. Here’s a little snippet of my listing as an example:
lisa 5861 1 14 Dec23 ? 01:41:47 /usr/lib/firefox/firefox
lisa 9522 5861 2 Dec23 ? 00:07:03 /usr/lib/nspluginwrapper/i386/li
lisa 11147 5510 0 00:27 ? 00:00:02 gnome-terminal
lisa 11149 11147 0 00:27 ? 00:00:00 gnome-pty-helper
lisa 11150 11147 0 00:27 pts/0 00:00:00 bash

You can now use the up and down arrow keys to scroll through the list and look for the name of the misbehaving program. For the sake of argument, and because it’s what I have running at the moment, let’s say my Firefox has crashed. The first column is my username, and the second is the process ID (pid for short). The third is the pid of the parent process that started it, which is why the nspluginwrapper line shows 5861: it belongs to Firefox. The last column tells me the name of the process (program). Take note of the pids of any processes that bear the name of the program giving you trouble. In this case I’m looking for anything with firefox in it. I can see Firefox is running with a pid of 5861, so I hit Q to quit out of the listing and type
kill 5861
You may be told you don’t have permission to do this, usually because the process belongs to another user or to the system. If so, you will need to do it as the root user (aka administrator). To do that in Ubuntu and other distributions based on it, such as Kubuntu and Mint, add sudo in front of all the commands, as in sudo kill 5861. It will ask for your own user password the first time you do this. For most other distributions, type su and hit Enter. It will ask you for the root password (not your user password, remember), and after that the prompt will end with a # instead of a $, and you can just enter the commands as they appear here. If this fails to close the program, and you took note of more than one pid to kill for this program, repeat the kill command for each remaining pid. If this still doesn’t do it, use the same kill command(s) but with the -9 option. In my case:
kill -9 5861
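Granted, kill -9 is not the nicest or neatest way to kill a program, which is why it’s a last resort: a plain kill asks the program to exit and gives it a chance to clean up, while kill -9 ends it on the spot with no chance to save anything. But it is the silver bullet… if nothing else does it, this almost always will. If it doesn’t, you have probably missed another pid that needs to be killed, or you aren’t properly running kill as the root user. (The rare exception is a process stuck waiting on hardware, such as a dead network share or a failing disk, which can’t be killed until that wait ends.)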
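If you find yourself doing this often, the whole escalation (plain kill, a short wait, then kill -9 only if the program is still there) fits in a small shell function. Treat it as a rough sketch rather than a standard tool: stop_pid is a name I made up, and the five-second default wait is an arbitrary choice. It leans on kill -0, which sends no signal at all and just reports whether the pid still exists.

```shell
# stop_pid PID [SECONDS]
# Hypothetical helper: ask a process to exit with SIGTERM (what a plain
# "kill PID" sends), wait up to SECONDS (default 5, an arbitrary choice),
# and only then fall back to SIGKILL, the "kill -9" last resort.
stop_pid() {
    pid=$1
    grace=${2:-5}

    # Plain kill: the program may catch SIGTERM and tidy up before exiting.
    if ! kill -TERM "$pid" 2>/dev/null; then
        echo "stop_pid: cannot signal $pid (already gone, or not yours: try as root)" >&2
        return 1
    fi

    # kill -0 sends no signal; it only checks whether the pid still exists.
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0

    # Still there after the grace period: SIGKILL cannot be caught or ignored.
    echo "stop_pid: $pid ignored SIGTERM, sending SIGKILL" >&2
    kill -KILL "$pid" 2>/dev/null
    sleep 1

    # Report failure if even SIGKILL did not remove it.
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

You’d call it as stop_pid 5861. Since it’s a shell function, sudo can’t run it directly; if the process isn’t yours, paste the function into a root shell (su, or sudo -s on Ubuntu) and call it from there.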

There’s a shortcut if you know the name of the program, that is, what you would type at a terminal to start it up: just type pkill followed by the program name. Be careful with short names, though, since pkill matches any process whose name contains what you typed. In this case, I happen to know it’s firefox. So I can type
pkill firefox
and then
ps -ef | grep firefox
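to check whether it actually stopped the program and all its processes. (The grep command itself will always show up as one line of that output, since its own command line contains the word firefox; ignore that one.) If anything else is left, the -9 option above works with pkill too.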
pkill -9 firefox
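The pkill routine can be scripted the same way. This sketch uses pgrep, pkill’s companion command, which prints the pids of matching processes instead of signalling them and, unlike ps -ef | grep, never lists itself. Again, stop_by_name and its five-second default are my own hypothetical choices, and the name is treated as a pattern, just as pkill treats it.

```shell
# stop_by_name NAME [SECONDS]
# Hypothetical helper: pkill NAME, give matching processes up to SECONDS
# (default 5) to exit, then pkill -9 whatever is left.
# NAME is a pattern, as with pkill itself: "fire" would also match firefox.
stop_by_name() {
    name=$1
    grace=${2:-5}

    # pkill with no signal option sends SIGTERM, like a plain kill.
    if ! pkill -- "$name"; then
        echo "stop_by_name: nothing matching '$name' was signalled" >&2
        return 1
    fi

    waited=0
    while [ "$waited" -lt "$grace" ]; do
        # pgrep exits 0 while at least one matching process remains.
        pgrep -- "$name" >/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done

    # Some matching process outlived the grace period: use the silver bullet.
    echo "stop_by_name: '$name' still running, sending SIGKILL" >&2
    pkill -9 -- "$name"
    sleep 1

    if pgrep -- "$name" >/dev/null; then
        return 1
    fi
    return 0
}
```

For example, stop_by_name firefox, run as the user who owns Firefox (or from a root shell if the processes belong to someone else).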
That’s great, you say, but what if my whole system seems to have locked up?

ALTernate ways to get BACK in ConTRoL

If it’s not just one program and the system seems to be locked up, look for signs of life. There is usually a light or LCD indicator on the front of your desktop computer, or somewhere near the keyboard on your laptop, which blinks erratically when there is activity on your hard drive. If all the lights remain steady, try hitting the NUMLOCK key a few times and watch the Numlock indicator on your keyboard or laptop to see if it goes on and off. If there is hard drive activity and/or the Numlock seems to be responding, your best option may be to take a coffee break and let it do its thing for a few minutes. In this case it’s not truly locked up, because it’s still responsive. Rather, it’s likely that you or some program started an intensive process and your computer is trying to catch up. If, however, the hard drive is still chugging away after several minutes with still no changes on the screen, or there is no hard drive activity or Numlock response, you may have to resort to more drastic measures.

Press the CTRL, ALT and BACKSPACE keys all at the same time, the same way you would press CTRL ALT DELETE in Windows. It’s okay to do this a few times, especially if the hard drive is chugging away busily as described above. If the computer responds, it will most likely shut down the graphical session along with the programs running in it (anything unsaved in them is lost), blank the screen and present you with a login screen. (On many newer distributions this shortcut is turned off by default, so getting no response here doesn’t necessarily mean the system is dead; just move on to the next step.) If it does work, you should still make sure the memory is truly free of whatever offending process caused the problem by shutting down completely, waiting about 10 seconds, and powering back on.

No luck so far? Press CTRL ALT F1 all at the same time (the F1 key is usually near the ESCape key). You can try this a few times too. If this works, you will get switched from the crashed graphical interface to a full-screen text console. (On some newer distributions F1 brings up a graphical login screen instead; if so, try F2 through F6.) It will prompt for your username and password. For those of you who have to use sudo, log in with your regular username and password and keep using sudo below, as in sudo shutdown -r now. For everybody else, log in with username root and the root password. Now type
shutdown -r now
You should start seeing scrolling text informing you of various processes being closed in an orderly manner.  It may take a little time but your system should soon reboot and you’ll be back in business.

SYStem REQuests – Prettypleeeeeaze shut down now

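If you’re still not getting any response from your computer, try this tip from Juliet Kemp (yay, another g33kgrrl!). The basic idea is that you can hold down ALT and the SYSREQ or SYSRQ key (it usually shares a key with Print Screen) while hitting certain letters to achieve an orderly reboot. Juliet gives a list of letters to use; the sequence I suggest, done in order and slowly, is R E I S U, then wait for the hard drive to stop chugging away and do one last letter, B. R takes the keyboard back from the frozen graphical display, E asks every program to exit, I kills any that refuse, S writes unsaved data to disk, U remounts the disks read-only, and B reboots. Leave out O if you come across it in other versions of this list: it switches the machine off on the spot, before your data is safely written. That is, hit all at the same time: ALT SYSRQ R {wait a few seconds} ALT SYSRQ E {wait} …and so on until finishing with ALT SYSRQ B. This only works if the magic SysRq feature is enabled in your kernel, and some distributions enable only part of it by default. I haven’t had occasion to use this SYSRQ tip yet, thankfully, but I found it on a number of trusted, well-known Linux help websites, so don’t be afraid to give it a whirl.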
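Whether those key combinations do anything at all depends on the kernel’s magic SysRq setting, which you can check ahead of time (while the machine still works) by reading /proc/sys/kernel/sysrq. According to the kernel’s sysrq documentation, 0 turns the keys off, 1 turns everything on, and any other value is a bitmask in which 4 covers R, 16 covers S, 32 covers U, 64 covers E and I, and 128 covers B. Here’s a sketch of a helper, sysrq_allows (a hypothetical name of my own), that decodes that value for one lowercase letter; it only knows about the R E I S U B letters.

```shell
# sysrq_allows LETTER [VALUE]
# Hypothetical helper: succeed if the magic SysRq function for LETTER
# (lowercase r, e, i, s, u or b) is enabled. VALUE defaults to the
# current /proc/sys/kernel/sysrq setting. Bit values follow the kernel's
# sysrq documentation.
sysrq_allows() {
    letter=$1
    value=${2:-$(cat /proc/sys/kernel/sysrq)}

    case $letter in
        r) bit=4 ;;     # keyboard control (unraw)
        e|i) bit=64 ;;  # signalling processes (terminate / kill)
        s) bit=16 ;;    # sync disks
        u) bit=32 ;;    # remount filesystems read-only
        b) bit=128 ;;   # reboot
        *) echo "sysrq_allows: unsupported letter '$letter'" >&2; return 2 ;;
    esac

    [ "$value" -eq 0 ] && return 1   # 0: SysRq disabled entirely
    [ "$value" -eq 1 ] && return 0   # 1: every function enabled
    [ $((value & bit)) -ne 0 ]       # otherwise: test this letter's bit
}
```

For example, sysrq_allows e && echo yes tells you whether E would work on your machine right now. Some distributions ship a restricted value (Ubuntu, for instance, has shipped with 176, which by this reckoning leaves R, E and I off); sudo sysctl -w kernel.sysrq=1 turns everything on until the next reboot.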

I’m warning you… I’ll shut you off

If you’re still not feeling the love, try tapping the power button… and by that I mean just a quick tap, like hitting a key. On most systems a quick press is passed to the operating system as a request to shut down cleanly, rather than cutting the power. I have successfully used this on a few occasions as a last-ditch effort to get the computer to shut things down nicely. The bad news is that if this doesn’t do it, probably nothing will. The good news is it almost never gets stuck that badly.

He’s dead, Jim… you get his wallet, I’ll get his tricorder

There’s no hard drive activity. The screen doesn’t change and the mouse cursor doesn’t move. There’s no response whatsoever to NUMLOCK, or any of the other key combinations, or tapping the power switch. It’s dead. The best option at this point is to hold down the power switch for a few seconds until the system turns off. Note that whenever you power down your system, you should always wait around 10 seconds before turning it back on. (If you care, this has to do with letting the capacitors inside your computer discharge somewhat before hitting them with more power, thus avoiding an electrical spike to your delicate components.) When it comes back up, Linux will probably want to scan the hard drive for errors, just as Windows wants to run Scandisk after an ugly shutdown. Let it, and watch for any error messages about corruption being found. Lastly, take comfort in knowing that had this been Windows, it would’ve happened at least a hundred times by now. 😉


