phone number regex

Mike Fletcher mfletch at tpresence.com
Sun Aug 27 23:49:38 EDT 2000


I'm not really sure what you're looking for, so here are some examples.  Note:
your code prints the input line on every iteration of the for loop, which is
what puts each line between your output lines.  If that is what you want to
eliminate, comment out that print statement.

If what you want is just to find and print the telephone numbers without the
['','',''...] stuff, the "simple" function should give you that effect.
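
To make the difference concrete, here is a minimal sketch (written for a current Python 3 interpreter, which is an assumption; the script in this post is Python 2) that runs both patterns from this thread against one sample record. The variable names split_result, phone and found are only illustrative.

```python
import re

# One sample record taken from the data in this thread.
line = 'AB Calgary  (403)781-5200'

# The pattern from the original script. It matches three-character runs
# of text that are not part of the number, so re.split() discards those
# runs and returns whatever lies between them (mostly empty strings).
acode_tmpl = re.compile(r'[^\(\d][\D+][^\)\d]')
split_result = acode_tmpl.split(line)

# Describing the number itself and asking for the matches directly
# avoids the empty strings altogether.
phone = re.compile(r'\(?\d{3}\)?[- ]?\d{3}[- ]?\d{4}')
found = phone.findall(line)

print(split_result)
print(found)
```

The split result is the list of leftovers between matches, while findall returns only the text that matched, which is why it suits the "just the numbers" case.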

If what you're asking for is to eliminate the "formatting" around the
numbers (the parentheses, dashes etc.), see the more_robust function, which
uses regular expression groups to extract only the digits, ignoring the
formatting.

Here is an example of the output...

p:\>findall
simple translation
(403)781-5200
(780)423-5600
(403)380-5325
(403)528-1900
(403)309-1100
907-868-7594
907-463-2551
256-238-9380
256-233-3081
9 numbers found in 9 records

using some python features
(403) 781-5200
(780) 423-5600
(403) 380-5325
(403) 528-1900
(403) 309-1100
(907) 868-7594
(907) 463-2551
(256) 238-9380
(256) 233-3081

Hope this helps,
Mike

8<_____________ findall.py _____________
data = '''AB Calgary  (403)781-5200
AB Edmonton  (780)423-5600
AB Lethbridge  (403)380-5325
AB Medicine Hat  (403)528-1900
AB Red Deer  (403)309-1100
AK Anchorage  907-868-7594
AK Juneau  907-463-2551
AL Anniston  256-238-9380
AL Athens  256-233-3081
a
b
c'''
					 
import re, string, sys

def simple( data ):
	numbers = re.findall( '\(?\d{3}\)?[- ]?\d{3}[- ]?\d{4}', data )
	for number in numbers:
		print number

	print '%s numbers found in %s records'%(
		len(numbers), string.count(data, '\n')+1 )

def more_robust( data ):
	lines = filter( None, string.split( data, '\n')) # easier on NT
	searcher = re.compile( '\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})')
	errors = []
	for line in lines: # use a fileinput if you prefer
		match = searcher.search( line )
		if match:
			print '(%s) %s-%s'%(
				match.group(1), match.group(2), match.group(3) )
		else:
			errors.append( line )
	if errors:
		sys.stderr.write(
			'Errors encountered, the following records had no telephone numbers:\n' )
		for line in errors:
			sys.stderr.write(  line )
			if line and line[-1] != '\n':
				sys.stderr.write(  '\n' )

if __name__ == "__main__":
	print 'simple translation'
	simple( data )
	print
	print 'using some python features'
	more_robust( data )
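
For anyone reading this on a newer interpreter, the group-based idea in more_robust can be sketched roughly as follows. This is only a sketch: it assumes Python 3, and the names PHONE, normalize and records are made up for illustration and are not part of findall.py.

```python
import re

# Same pattern as in more_robust: the three groups capture only the
# digits, so the parentheses and dashes around them are ignored.
PHONE = re.compile(r'\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})')

def normalize(lines):
    """Return (formatted numbers, lines in which no number was found)."""
    formatted, errors = [], []
    for line in lines:
        match = PHONE.search(line)
        if match:
            # groups() yields the three captured digit strings in order.
            formatted.append('(%s) %s-%s' % match.groups())
        else:
            errors.append(line)
    return formatted, errors

records = ['AB Calgary  (403)781-5200', 'AK Juneau  907-463-2551', 'a']
numbers, bad = normalize(records)
print(numbers)
print(bad)
```

Returning the two lists instead of printing inside the loop keeps the matching separate from the reporting; whether that is preferable depends on how the script is used.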


-----Original Message-----
From: Tony Johnson [mailto:gjohnson at gs.verio.net]
Sent: Sunday, August 27, 2000 10:29 PM
To: python-list at python.org
Subject: Re: phone number regex


Thank you for the reply. Someone replied to this but I lost the email.
But I have figured out my problem and I have just a small question.  I
created a regex that matched an area code, i.e. (555).  The regex is:

bash-2.04$ cat wisp-test4
#!/usr/local/bin/python
 
import sys, string, re , fileinput
acode_tmpl = re.compile('[^\(\d][\D+][^\)\d]')
file = open(sys.argv[1], 'r')
for line in fileinput.input():
   line = string.strip(line)
   print line # this line prints the "interspersed lines"
   a = re.split(acode_tmpl,line)
   print a                

And it produces output like:

bash-2.04$ wisp-test4 networkg.txt
AB Calgary  (403)781-5200 # result of the "print line"
['', '', '', '', '(403)781-5200']
AB Edmonton  (780)423-5600
['', '', '', '', ' (780)423-5600']
...


I would like my script to not include the text before and after the
pattern match.  Is there a switch I can turn on in the re.split function, or
do I have to break the match down further from here?
...



