10. How do I get a plain text man page without all that ^H^_ stuff?
Have a look at col(1), because col can filter out backspace sequences. Just in case you can't wait that long:
funnyprompt$ groff -t -e -mandoc -Tascii manpage.1 | col -bx > manpage.txt
The -t and -e switches tell groff to preprocess using tbl and eqn. This is overkill for man pages that don't require preprocessing but it does no harm apart from a few CPU cycles wasted. On the other hand, not using -t when it is actually required does harm: the table is terribly formatted. You can even find out (well, "guess" is a better word) what command is needed to format a certain groff document (not just man pages) by issuing
funnyprompt$ grog /usr/man/man7/signal.7 groff -t -man /usr/man/man7/signal.7 |
"Grog" stands for "GROff Guess", and it does what it says--guess. If it were perfect we wouldn't need options any more. I've seen it guess incorrectly on macro packages and on preprocessors. Here is a little perl script I wrote that can delete the page headers and footers, thereby saving you a few pages (and mother nature a tree) when printing long and elaborate man pages. Save it in a file named strip-headers & chmod 755.
#!/usr/bin/perl -wn # make it slurp the whole file at once: undef $/; # delete first header: s/^\n*.*\n+//; # delete last footer: s/\n+.*\n+$/\n/g; # delete page breaks: s/\n\n+[^ \t].*\n\n+(\S+).*\1\n\n+/\n/g; # collapse two or more blank lines into a single one: s/\n{3,}/\n\n/g; # see what's left... print; |
You have to use it as the first filter after the man command as it relies on the number of newlines being output by groff. For example:
funnyprompt$ man bash | strip-headers | col -bx > bash.txt