[Tutor] Reading and Writing

Ibraheem Umaru-Mohammed iumarumo@eidosnet.co.uk
Fri, 30 Aug 2002 11:54:12 +0100


["Erik Price"="Erik"]
Erik >> 
Erik >> On Thursday, August 29, 2002, at 11:48  AM, Ibraheem Umaru-Mohammed 
Erik >> wrote:
Erik >> 
Erik >> >It is also probably neater to use the glob module, as glob returns full
Erik >> >pathnames, whereas with the os.listdir function, we find ourselves
Erik >> >constructing the path.
Erik >> 
Erik >> Why do os.path.splitext() and fnmatch.fnmatch() use the Unix-style 
Erik >> shell globbing syntax for pattern-matching?  I find "glob"-style 
Erik >> pattern-matching to be somewhat weak after having used regular 
Erik >> expressions (especially PCREs).  There must be some way to use (or some 
Erik >> module that uses) regexes instead.
Erik >> 
Erik >> Is that for easiness for newbies or to appease sysadmins used to bash?  
Erik >> (I like bash too but Python seems like it should be more powerful than 
Erik >> just globbing.)
Erik >> 
Erik >> Either way, it's no match for regular expressions.
Erik >> (Hah!  I kill me!  ;)
Erik >> 

You can do this by using the "re" module directly. For example,

			<code>
#!/usr/bin/env python

import os
import re 

text_directory = "/home/ibraheem/python_scripts/txt/"
html_directory = "/home/ibraheem/python_scripts/html/"

# match a filename against the following regular expression:
# 	\d	= a single digit [0-9]
# 	\w+	= at least one word character [a-zA-Z0-9_]
#	\d	= a single digit [0-9]
#	\.htm	= the literal string ".htm" (the dot is escaped)
#	$	= end of the filename
pattern = r"\d\w+\d\.htm$"
for filename in os.listdir(html_directory):
  # re.match anchors at the start of the string; if we have a match,
  # print the matching string.
  m = re.match(pattern, filename)
  if m:
    print(repr(m.group(0)))
  
			<code/>

Gives the following,

			<snip>
ibraheem@ignoramus:$ ls ~/python_scripts/html/
1foo1.htm  2foo2.htm  3foo3.htm  4foo4.htm  5foo5.htm  6foo6.htm
1.html     2.html     3.html     4.html     5.html     6.html
ibraheem@ignoramus:$ ./re-example.py
'1foo1.htm'
'2foo2.htm'
'3foo3.htm'
'4foo4.htm'
'5foo5.htm'
'6foo6.htm'
ibraheem@ignoramus:$
			<snip/>

You can see that the [0-9].html files are NOT matched, because \w+ requires
at least one word character between the two digits.
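
If you also want the full pathnames that glob gives you (the point quoted
above), one option is to wrap the loop in a small helper. This is only a
rough sketch: the name regex_listdir is my own invention, not something in
the standard library, and it assumes the directory exists and is readable.

```python
import os
import re

def regex_listdir(directory, pattern):
    """Return full paths of the entries in directory whose names match pattern.

    regex_listdir is a hypothetical helper name, not a standard library
    function.
    """
    regex = re.compile(pattern)
    matches = []
    # os.listdir returns names in arbitrary order, so sort them for
    # predictable output.
    for filename in sorted(os.listdir(directory)):
        # match() only anchors at the start of the name; end the pattern
        # with $ if the end of the name should be anchored as well.
        if regex.match(filename):
            matches.append(os.path.join(directory, filename))
    return matches
```

Calling regex_listdir(html_directory, r"\d\w+\d\.htm$") should then give the
same six files as above, each with html_directory joined onto the front.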

Kindest regards,

				--ibz.
-- 
				Ibraheem Umaru-Mohammed 
					"ibz"
			umarumohammed (at) btinternet (dot) com