6. When Bad Things Happen To Good People
Of course we can't cover every Bad Thing that happens, but I'll outline some items of common sense.
There are two types of bad things: random and repeatable. It's very difficult to diagnose or fix random problems, because you have no control over when they happen. However, if the problem is repeatable (for example, "it happens when I press the left arrow key twice"), then you're in business.
6.1. RTFM!
Read the friendly manual. The `manual' can take a few forms. For open source games, there are the readme files that come with the game. Commercial games will have a printed manual and maybe some readme files on the CD the game came on; browse that CD for helpful tips and advice.
Don't forget the game's website. The game's author has probably seen people with your exact same problem many times over and might put information specific to that game on the website. A prime example of this is Loki Software's online FAQs located at http://faqs.lokigames.com.
6.2. Look For Updates and Patches
If you're playing an open source game that you compiled, make sure you have the newest version by checking the game's website. If your game came from a distro make sure there's not an update rpm/deb for the game.
Commercial game companies like Loki release patches for their games. Often a game will have MANY patches (Myth2) and some games are unplayable without them (Heretic2). Check the game's website for patches whether you have a problem running the game or not; there may be an update for a security problem that you may not even be aware of.
By the way, Loki now has a utility that searches your hard drive for Loki Software games and automatically updates them. Check out http://updates.lokigames.com.
6.3. Newsgroups
If you don't know what netnews (Usenet) is, then it's definitely worth 30 minutes of your time to learn about it. Install a newsreader. I prefer console tools, so I use tin, but slrn is also popular. Netscape has a nice graphical "point and click" newsreader too.
For instance, I can browse Loki Software's news server with tin -g news.lokigames.com. You can also specify which news server to use with the $NNTPSERVER environment variable or with the file /etc/nntpserver.
6.4. Google Group Search
Every post made to Usenet gets archived at Google's database at http://groups.google.com. This archive used to be at http://www.deja.com, but was bought by Google. Many people still know the archive as "deja".
It's almost certain that whatever problem you have with Linux, gaming related or not, has already been asked about and answered on Usenet. Not once, not twice, but many times over. If you don't understand the first response you see (or if it doesn't work), then try one of the other many replies. If the page is not in a language you can understand, there are many translation sites which will convert the text into whatever language you like, including http://www.freetranslation.com and http://translation.lycos.com. My web browser of choice, Opera (available at http://www.opera.com) allows you to use the right mouse button to select a portion of text and left click the selection to translate it into another language. Very useful when a Google group search yields a page in German which looks useful and my wife (who reads German well) isn't around.
The Google group search has a basic and an advanced search page. Don't bother with the basic search. The advanced search is at http://groups.google.com/advanced_group_search.
It's easy to use. For example, if my problem were that Quake III crashed every time Lucy jumped, I would enter "linux quake3 crash lucy jumps" in the "Find messages with all of the words" textbox.
There are fields for narrowing your search to particular newsgroups. Take the time to read and understand what each field means. I promise you won't be disappointed with this service. Use it, and you'll be a much happier person. Do note that they don't archive private newsgroups, like Loki Software's news server. However, so many people use Usenet that it almost doesn't matter.
6.5. Debugging: call traces and core files
This is generally not something you'll do for commercial games. For open source games, you can help the author by giving a core file or stack trace. Very quickly, a core file (aka core dump) is a file that holds the "state" of the program at the moment it crashes. It holds valuable clues for the programmer to the nature of the crash -- what caused it and what the program was doing when it happened. If you want to learn more about core files, I have a great gdb tutorial at http://www.dirac.org/linux.
At the *very* least, the author will be interested in the call stack when the game crashed. Here is how you can get the call stack at barf-time:
Sometimes distros set up their OS so that core files (which are mainly useful to programmers) aren't generated. The first step is to make your system allow core files of unlimited size:
    ulimit -c unlimited
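The limit applies only to the current shell and the programs started from it, so it's worth checking right before you reproduce the crash. Here is a minimal sketch, assuming a POSIX shell; describe_core_limit is a made-up helper name, not a standard command:

```shell
# describe_core_limit VALUE
# Hypothetical helper: explains the value printed by `ulimit -c`.
describe_core_limit() {
    case $1 in
        0)         echo "core files are disabled" ;;
        unlimited) echo "core files of any size are allowed" ;;
        *)         echo "core files are limited to $1 blocks" ;;
    esac
}

# Report the limit in effect for this shell.
describe_core_limit "$(ulimit -c)"
```

If it reports that core files are disabled, run the ulimit command above in the same shell you launch the game from.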
You will now have to recompile the program and pass the -g option to gcc (explaining this is beyond the scope of this document). Now, run the game and do whatever you did to crash the program and dump a core again. Run the debugger with the core file as the 2nd argument:
    $ gdb CoolGameExecutable core
And at the (gdb) prompt, type "backtrace". You'll see something like:
    #0  printf (format=0x80484a4 "z is %d.\n") at printf.c:30
    #1  0x8048431 in display (z=5) at try1.c:11
    #2  0x8048406 in main () at try1.c:6
It may be quite long, but use your mouse to cut and paste this information into a file. Email the author and tell him:
The game's name
Any error message that appears on the screen when the game crashes.
What causes the crash and whether it's a repeatable crash or not.
The call stack
If you have good bandwidth, ask the author if he would like the core file that his program dumped. If he says yes, then send it. Remember to ask first, because core files can get very, very big.
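If you'd rather not cut and paste with the mouse, gdb can also be run non-interactively and its output sent straight to a file. This is only a rough sketch: save_backtrace and the default file name backtrace.txt are made up for illustration, while -batch and -ex are regular gdb options.

```shell
# save_backtrace EXECUTABLE COREFILE [OUTPUT]
# Hypothetical helper: writes the call stack from COREFILE into OUTPUT
# (backtrace.txt if no third argument is given).
save_backtrace() {
    exe=$1
    core=$2
    out=${3:-backtrace.txt}

    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb is not installed" >&2
        return 1
    fi
    if [ ! -r "$core" ]; then
        echo "no readable core file at $core" >&2
        return 1
    fi

    # -batch makes gdb exit when it is done; -ex runs one gdb command.
    gdb -batch -ex backtrace "$exe" "$core" > "$out" 2>&1
}
```

With the names used earlier, that would be: save_backtrace CoolGameExecutable core. The resulting text file is small enough to attach to your email.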
6.6. Saved Games
If your game allows for saved games, then sending the author a copy of the saved game is useful because it helps the tech reproduce whatever is going wrong. For commercial games, this option is more fruitful than sending a core file or call stack since commercial games can't be recompiled to include debugging information. You should definitely ask before sending a saved game file because they tend to be large, but gaming companies usually have lots of bandwidth. Mike Phillips (formerly of Loki Software) mentioned that sending in saved games to Loki is definitely a good thing.
Needless to say, this only applies if your game crashes reproducibly at a certain point. If the game segfaults every time you run it, or is incredibly slow, a saved game file won't be of much help.
6.7. What to do when a file or library isn't being found (better living through strace)
Sometimes you'll see error messages that indicate a file wasn't found. The file could be a library:
    % ./exult
    ./exult: error while loading shared library: libSDL-1.2.so.0: cannot load shared object file: No such file or directory
or it could be some kind of data file, like a wad or map file:
    % qf-client-sdl
    IP address 192.168.0.2:27001
    UDP Initialize
    Error: W_LoadWadFile: couldn't load gfx.wad
Suppose gfx.wad is already on my system, but couldn't be found because it isn't in the right directory. Then where IS the right directory? Wouldn't it be helpful to know where these programs looked for the missing files?
This is where strace shines. strace tells you what system calls are being made, with what arguments, and what their return values are. In my `Kernel Module Programming Guide' (due to be released to LDP soon), I outline everything you may want to know about strace. But here's a brief outline using the canonical example of what strace looks like. Give the command:
    strace -o ./LS_LOG /bin/ls
The -o option sends strace's output to a file; here, LS_LOG. The last argument to strace is the program we're inspecting, here, "ls". Look at the contents of LS_LOG. Pretty impressive, eh? Here is a typical line:
    open(".", O_RDONLY|O_NONBLOCK|0x18000) = 4
We used the open() system call to open "." with various arguments, and the return value of the call is 4. What does this have to do with files not being found?
Suppose I want to watch the StateOfMind demo because I can't ever seem to get enough of it. One day I try to run it and something bad happens:
    % ./mind.i86_linux.glibc2.1
    Loading & massaging...
    Error:Can't open data file 'mind.dat'.
Let's use strace to find out where the program was looking for the data file.
    strace ./mind.i86_linux.glibc2.1 2> ./StateOfMind_LOG
Pulling out vim and searching for all occurrences of mind.dat, I find the following lines:
    open("/usr/share/mind.dat",O_RDONLY) = -1 ENOENT (No such file)
    write(2, "Error:", 6Error:) = 6
    write(2, "Can\'t open data file \'mind.dat\'."..., ) = 33
It was looking for mind.dat in only one directory. Clearly, mind.dat isn't in /usr/share. Now we can try to locate mind.dat and move it into /usr/share, or better, create a symbolic link.
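Instead of paging through the log in an editor, grep can pull out the failed lookups for you, and ln -s makes the symbolic link. The following is a sketch under stated assumptions: failed_lookups and link_where_expected are hypothetical helper names, and the log is assumed to be in the strace format shown above, where a missing file is marked with ENOENT.

```shell
# failed_lookups LOG NAME
# Hypothetical helper: prints the lines of an strace log that mention NAME
# and failed with ENOENT (file not found).
failed_lookups() {
    log=$1
    name=$2
    grep -F -- "$name" "$log" | grep -F ENOENT
}

# link_where_expected REAL_FILE EXPECTED_DIR
# Hypothetical helper: symlinks an existing file into the directory where
# the program looked for it. Give REAL_FILE as an absolute path, otherwise
# the link will dangle.
link_where_expected() {
    real_file=$1
    expected_dir=$2
    ln -s "$real_file" "$expected_dir/$(basename "$real_file")"
}
```

For the example above, you would run failed_lookups ./StateOfMind_LOG mind.dat, and then, as root, link_where_expected with the real location of mind.dat and /usr/share.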
This method works for libraries too. Suppose the library libmp3.so.2 is in /usr/local/include but your new game "Kill-Metallica" can't find it. You can use strace to determine where Kill-Metallica was looking for the library and make a symlink from /usr/local/include/libmp3.so.2 to wherever Kill-Metallica was looking for the library file.
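A program usually tries several directories when it looks for a library, so the strace log will contain one failed attempt per directory. Here is a small sketch that boils the log down to the list of directories searched; searched_dirs is a made-up helper name, and it assumes each attempt appears in the log as a quoted path ending in the library's file name.

```shell
# searched_dirs LOG NAME
# Hypothetical helper: prints each directory in which the traced program
# tried to open a file called NAME, one directory per line, without repeats.
searched_dirs() {
    log=$1
    name=$2
    # Keep lines with a quoted path ending in /NAME, then strip everything
    # except the directory part of that path.
    grep -F -- "/$name\"" "$log" |
        sed 's|^[^"]*"\([^"]*\)/[^"/]*".*$|\1|' |
        sort -u
}
```

For the example above, searched_dirs applied to the strace log of Kill-Metallica with libmp3.so.2 as the name lists the candidate directories for your symlink.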
strace is a very powerful utility. When diagnosing why things aren't being found, it's your best ally, and is even faster than looking at source code. As a last note, you can't look up information in source code of commercial games from Lokisoft or Tribsoft. But you can still use strace with them!
6.8. Hosed consoles
Sometimes a game will exit abnormally and your console will get `hosed'. There are a few definitions of a hosed console. The text characters could look like gibberish. Your normally nice black screen could look like a quasi-graphics screen. When you press ENTER, a newline doesn't get echoed to the screen. Sometimes, certain keys of the keyboard won't respond. Logging out and back in doesn't always work, but there are a few things that might:
If you don't see any characters on the screen as you type, your terminal settings may be wrong. Try "stty echo". This should let input characters echo again.
At the prompt, type "reset". This should clear up many problems, including consoles hosed by an SVGAlib or ncurses based game.
Try running the game again and quitting normally. Once I had to kill Quake III in a hurry, so I performed a Ctrl-Alt-Backspace. The console was hosed with a quasi-graphics screen. Running Quake III and quitting normally fixed the problem.
The commands deallocvt and openvt will work for most of the other problems you'll have. deallocvt N kills terminal N entirely, so that Alt-FN doesn't even work anymore. openvt -c N starts it back up.
If certain keys on your keyboard don't work, be creative. If you want to reboot but the `o' key doesn't work, try using halt. One method I've come up with is typing a command at the prompt and using characters on the screen with mouse cut/paste. For example, you can type "ps ax", and you're sure to have an `h', `a', `l' and a `t' somewhere on the screen. You can use the mouse to cut and paste the word "halt".
The most regrettable option is a reboot. If you can, an orderly shutdown is preferable; use "halt" or "shutdown". If you can't, ssh in from another machine. That sometimes works when your console is very badly hosed. In the worst case scenario, hit the reset or power switch.
Note that if you use a journalling filesystem like ext3, reiserfs or xfs, hitting the power switch isn't all that bad. You're still supposed to shut down in an orderly manner, but the filesystem integrity will be maintained. You won't normally see an fsck for the partitions that use the journalling filesystem.
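Since you may have to type blind on a hosed console, it can help to keep the first two remedies above under one short name. This is just a sketch; unhose_console is a made-up function name that you could put in your shell startup file.

```shell
# unhose_console
# Hypothetical helper: turns echo back on and reinitializes the terminal.
unhose_console() {
    if [ ! -t 0 ]; then
        echo "standard input is not a terminal; nothing to fix" >&2
        return 1
    fi
    stty echo    # let typed characters show up again
    reset        # reinitialize the terminal
}
```

Pick a name made of keys you can find with your eyes closed.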
6.9. Locked System
When a computer "locks" (also called "hangs"), the keyboard and mouse become completely unresponsive. This is a direct consequence of a bug in the Linux kernel. While Linux is known for stability, these things do happen, especially in gaming, which entails highly synchronized hardware events that occur very fast, even for a computer. When a computer locks, it can be a "hard lock", meaning the kernel has completely stopped functioning. This often indicates misbehaving or faulty hardware. There's no recovery from this kind of lock other than pressing the reset or power button. The lock can also be a "soft lock", meaning that the kernel is still functioning in some capacity. It's possible to recover from this gracefully.
The first thing you should try is to hit control-alt-backspace, which kills X. If you regain control of your system, the kernel wasn't really locked in the first place. If this doesn't work after a few seconds, you'll definitely want to reboot the system using the following instructions.
Use control-alt-delete to reboot the system. You'll know this worked if you hear the computer beep after a few seconds (this is BIOS saying "I'm OK" during a power on cycle).
Log into another system and ssh into the hung system. If you can ssh in, reboot or halt the system.
If you can't ssh into the system, you'll need to use the "magic SysRq key", which is documented in /usr/src/linux/Documentation/sysrq.txt. Here's a summary for the x86 architecture (see the documentation for other architectures). Note that if your keyboard doesn't have a SysRq key, you should use the PrintScreen key:
Hit alt-SysRq-s. This will attempt to sync your mounted filesystems so that changes to files get flushed to disk. You may hear disk activity. If you're looking at a console, the system should print the devices which were flushed.
A few seconds later, hit alt-SysRq-u. This will attempt to remount all your mounted filesystems as read-only. You should hear disk activity. If you're looking at a console, the system will print the devices which were remounted.
A few seconds later, use alt-SysRq-b to reboot the system.
You can hit alt-SysRq-h for a very terse help screen.
To use the magic SysRq key, your kernel needs to have been compiled with magic SysRq support. You'll find this option under "Kernel Hacking | Kernel Debugging | Magic SysRq key" in whatever kernel config menu you like to use. If the magic SysRq key sequence doesn't shut your system down gracefully, your kernel has crashed hard and you'll need to use the reset or power button to recover.
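Even with SysRq support compiled in, the key can be switched off at run time through /proc/sys/kernel/sysrq, so it's worth checking before you actually need it. Here is a minimal sketch; describe_sysrq is a made-up helper name, and the reading of values other than 0 and 1 as a bitmask is an assumption that holds only on kernels new enough to support it.

```shell
# describe_sysrq VALUE
# Hypothetical helper: explains the contents of /proc/sys/kernel/sysrq.
describe_sysrq() {
    case $1 in
        0) echo "magic SysRq is disabled" ;;
        1) echo "all magic SysRq functions are enabled" ;;
        *) echo "magic SysRq is restricted by bitmask $1" ;;
    esac
}

if [ -r /proc/sys/kernel/sysrq ]; then
    describe_sysrq "$(cat /proc/sys/kernel/sysrq)"
else
    echo "this kernel does not expose /proc/sys/kernel/sysrq"
fi
```

If it reports that the key is disabled, root can turn it on with: echo 1 > /proc/sys/kernel/sysrq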