[Tutor] pseudo code --- search for <> tags

Magnus Lycka magnus@thinkware.se
Wed, 09 Oct 2002 22:47:12 +0200


At 16:06 2002-10-09 -0400, McCarney, James Alexander wrote:
>I want to search for all files in a folder/subfolder and output to a
>textfile.

If you want to recursively work through subdirectories,
you should have a look at os.path.walk.

>Locate the folder.

os.chdir(...)

>Locate the files.

glob.glob('*.xxx')

>Open files for reading

f =3D file(filename, 'rt')
text =3D f.read()

>Report the name and location of the file and put it in front of the=
 returned
>info.

Have a look at the os.path module to get things
portable.

>Take whatever begins with '<' and ends with '/>'

I'm not quite sure what you intend here. What would
you want from "XXX<asd>12123123<asd/>XXX"?
a) <asd>12123123<asd/>
or
b) <asd/>
?

import re
re.findall(r"(<.+?/>)",text)    # case a
re.findall(r"(<[^>]+?/>)",text) # case b

Actually, to make case a handle line breaks in tags you need

pat =3D re.compile(r"(<.+?/>)",re.DOTALL)
pat.findall(text)

I'll let you figure out the rest.


--=20
Magnus Lyck=E5, Thinkware AB
=C4lvans v=E4g 99, SE-907 50 UME=C5
tel: 070-582 80 65, fax: 070-612 80 65
http://www.thinkware.se/  mailto:magnus@thinkware.se