[Tutor] Reading and Writing
Ibraheem Umaru-Mohammed
iumarumo@eidosnet.co.uk
Fri, 30 Aug 2002 11:54:12 +0100
["Erik Price"="Erik"]
Erik >>
Erik >> On Thursday, August 29, 2002, at 11:48 AM, Ibraheem Umaru-Mohammed
Erik >> wrote:
Erik >>
Erik >> >It's also probably neater to use the glob module, as glob returns full
Erik >> >pathnames, whereas with the os.listdir function, we find ourselves
Erik >> >constructing the path.
Erik >>
Erik >> Why do os.path.splitext() and fnmatch.fnmatch() use the Unix-style
Erik >> shell globbing syntax for pattern-matching? I find "glob"-style
Erik >> pattern-matching to be somewhat weak after having used regular
Erik >> expressions (especially PCREs). There must be some way to use (or some
Erik >> module that uses) regexes instead.
Erik >>
Erik >> Is that for easiness for newbies or to appease sysadmins used to bash?
Erik >> (I like bash too but Python seems like it should be more powerful than
Erik >> just globbing.)
Erik >>
Erik >> Either way, it's no match for regular expressions.
Erik >> (Hah! I kill me! ;)
Erik >>
You can do this by using the "re" module directly. For example,
<code>
#!/usr/bin/env python
import os
import re

text_directory = "/home/ibraheem/python_scripts/txt/"
html_directory = "/home/ibraheem/python_scripts/html/"

# match a filename that matches the following regular expression:
#   \d    = a single digit [0-9]
#   \w+   = at least one word character [a-zA-Z0-9_]
#   \d    = a single digit [0-9]
#   \.htm = a literal ".htm" (the dot is escaped so it matches only a dot)
pattern = r"\d\w+\d\.htm"

for filename in os.listdir(html_directory):
    # if we have a match, print the matching string.
    m = re.match(pattern, filename)
    if m:
        print(repr(m.group(0)))
<code/>
This gives the following:
<snip>
ibraheem@ignoramus:$ ls ~/python_scripts/html/
1foo1.htm 2foo2.htm 3foo3.htm 4foo4.htm 5foo5.htm 6foo6.htm
1.html 2.html 3.html 4.html 5.html 6.html
ibraheem@ignoramus:$ ./re-example.py
'1foo1.htm'
'2foo2.htm'
'3foo3.htm'
'4foo4.htm'
'5foo5.htm'
'6foo6.htm'
ibraheem@ignoramus:$
<snip/>
You can see that the [0-9].html files are NOT matched, because \w+ requires
at least one word character between the two digits.
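
Since os.listdir only gives bare names, here is a minimal sketch of how the
same regular-expression filter could return pathnames that include the
directory, as glob does. The function name find_matching is hypothetical and
not part of any module, and the \Z anchor is an added assumption so that the
pattern has to match the whole filename rather than just a prefix.

```python
import os
import re


def find_matching(directory, pattern):
    """Return paths of entries in `directory` whose names match `pattern`.

    The pattern must match the whole filename, not just a prefix.
    (Hypothetical helper, for illustration only.)
    """
    # re.match already anchors at the start; \Z anchors at the end.
    regex = re.compile("(?:%s)\\Z" % pattern)
    matches = []
    # os.listdir returns names in arbitrary order, so sort them.
    for filename in sorted(os.listdir(directory)):
        if regex.match(filename):
            # os.listdir returns bare names, so join them to the directory.
            matches.append(os.path.join(directory, filename))
    return matches
```

With the directory listing shown above, calling
find_matching(html_directory, r"\d\w+\d\.htm") should return the six
NfooN.htm files, each prefixed with html_directory.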
Kindest regards,
--ibz.
--
Ibraheem Umaru-Mohammed
"ibz"
umarumohammed (at) btinternet (dot) com