[Tutor] Re: Convert man file into readable text file

Derrick 'dman' Hudson dman@dman.ddts.net
Thu, 13 Jun 2002 21:13:26 -0500



On Fri, Jun 14, 2002 at 07:17:28AM +0700, Mico Siahaan wrote:
| I'm a newbie. I want to make a python script to convert a man file into a
| readable text file

$ man man

It outputs text itself.

$ man man | vim -

Ok, so it has backspace characters in it to achieve bold and
underline on terminals.

(in vim)

:%s/.^H//g

(where ^H is a literal Ctrl-H character, entered by pressing ^V^H)
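
To make the backspace trick concrete: terminal output traditionally renders
bold as a character, a backspace, and the same character again, and
underline as an underscore, a backspace, and the character.  Here is a
minimal Python sketch of what the substitution above does, using a made-up
sample string (the name strip_overstrikes is only illustrative):

```python
import re

# Hypothetical sample of overstruck text: "NAME" in bold (each letter
# printed twice over itself) and "foo" underlined (underscore, backspace,
# letter).
sample = "N\bNA\bAM\bME\bE  _\bf_\bo_\bo"

def strip_overstrikes(text):
    # Delete every "any character followed by a backspace" pair, which is
    # what :%s/.^H//g does in vim.
    return re.sub(r".\x08", "", text)

print(repr(sample))               # shows the raw \x08 characters
print(strip_overstrikes(sample))  # the plain text that is left over
```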


Better yet, put this line in your .vimrc:
    " enable the :Man command
    runtime ftplugin/man.vim

and put this in ~/.bashrc:
    function man()
    {
        # "$*" joins all arguments into one string for the :Man command
        vim -c 'set nolist' -c ":Man $*"
    }

then when you run 'man' in your shell you'll see the man page (in full
color!) in a vim buffer.  In addition, when you are in vim you can
type :Man <foo>  to get <foo>'s manpage in a buffer.  It's really
cool.

(note: this requires vim 6 or newer and the man.vim plugin distributed
with it)

| so I can edit it with a text editors.

If you want to edit a manpage you should learn troff.  Man pages are
written in troff format, and then troff (or groff) processes them to
generate the properly formatted output for your display.  Dumping out
plain ASCII and editing that won't have any lasting effect (e.g., the
maintainer of the manpage isn't going to accept a patch made from it).

| So far, I made this:
|
| import string
| fin = open("wget.man","r")
| fout = open("wget.test","w")
|
| while 1:
|         line = fin.readline()
|         if line == "":
|                 break
|         for ch in line:
|                 if ch not in string.printable:
|                         idx = line.find(ch)
|                         temp = line[:idx] + line[idx+1:]
|                         newline = temp
|                 else:
|                         newline = line
|         fout.write(newline)
|
| fin.close()
| fout.close()
|
| And it gives a wrong result.

How is it wrong?  I'm not going to guess.  Actually, I will.  It's
wrong because all the characters in the troff source are already
printable, so you won't have changed anything.
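
A quick way to check that guess, sketched in Python with a made-up fragment
of troff source (the sample text is an assumption, not the real wget page):
every character is already in string.printable, so a filter on that set
returns the input unchanged.  The backspace found in formatted output is
not in string.printable, but dropping only the backspace would leave
doubled letters behind, which is why the vim substitution above deletes the
character before it as well.

```python
import string

# Hypothetical fragment of man page source in troff format.
troff_source = ".TH WGET 1\n.SH NAME\nwget \\- retrieve files from the web\n"

def keep_printable(text):
    # Keep only the characters found in string.printable.
    return "".join(ch for ch in text if ch in string.printable)

# The troff source comes back unchanged: nothing in it is unprintable.
print(keep_printable(troff_source) == troff_source)

# Formatted output is different: the backspace is dropped, but each bold
# letter is still there twice.
print(repr(keep_printable("N\bNA\bAM\bME\bE")))
```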

| I understand it is because I don't understand the structure of man
| files.

groff does.  It is a troff processor.

| Can anyone give me a correct example, please? Thanks.

One way is to use the existing tools in the shell, which requires
less effort:
    $ man man | sed -e 's/.\x08//g' > man.text

(or the equivalent vim commands given above)

If you want to use Python instead of vim, sed, or some other existing
tool:

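import os, re

# Read the formatted man page from a pipe and strip the backspace
# overstrikes (any character followed by \x08) from each line.
pipe = os.popen("man man")
out = open("output.text", "w")
for line in pipe:  # iterating over a file requires Python 2.2 or newer
    out.write(re.sub(r'.\x08', '', line))
out.close()
pipe.close()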

(it does the same thing as all the other solutions I presented, but is
clearly a lot more work/code)
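
For newer Python versions, the same idea can be sketched with subprocess in
place of os.popen.  This is only an illustration under stated assumptions:
the helper name dump_clean and its parameters are made up here, and it
assumes the command writes backspace-style formatting to standard output.

```python
import re
import subprocess

def dump_clean(command, dest):
    # Hypothetical helper: run the command (a list such as ["man", "man"])
    # and capture its standard output as text.
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    # Remove each "character + backspace" pair, then save the plain text.
    cleaned = re.sub(r".\x08", "", result.stdout)
    with open(dest, "w") as out:
        out.write(cleaned)
    return cleaned
```

Calling dump_clean(["man", "man"], "output.text") would then mirror the
script above.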

HTH,
-D

-- 

The light of the righteous shines brightly,
but the lamp of the wicked is snuffed out.
        Proverbs 13:9

http://dman.ddts.net/~dman/

