phone number regex
Mike Fletcher
mfletch at tpresence.com
Sun Aug 27 23:49:38 EDT 2000
Not really sure what you're looking for, so here are some examples. Note:
your code prints the line on each iteration of the for loop, and that is
what puts the input lines between your output lines. If that is what you
want to eliminate, comment out that print statement.
If you just want to find and print the telephone numbers
without the ['','',''...] stuff, the "simple" function should give you that
effect.
If what you're asking for is to strip the "formatting" around the
numbers (the parentheses, dashes, etc.), see the more_robust function, which
uses regular expression groups to extract only the digits, ignoring the
formatting.
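As a quick illustration of the difference between the two approaches, here is a minimal sketch in present-day Python 3 syntax (not the original script). The variable names and the two-line sample are only for illustration. Without groups, re.findall returns each whole match; with groups, it returns a tuple of the captured pieces for each match.

```python
import re

# Two sample records in the same shape as the data in this thread.
text = "AB Calgary (403)781-5200\nAK Juneau 907-463-2551"

# No groups: findall returns the full matched text, formatting included.
whole = re.findall(r"\(?\d{3}\)?[- ]?\d{3}[- ]?\d{4}", text)

# With groups: findall returns one tuple of captured digits per match.
parts = re.findall(r"\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})", text)

print(whole)  # ['(403)781-5200', '907-463-2551']
print(parts)  # [('403', '781', '5200'), ('907', '463', '2551')]
```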
Here is an example of the output...
p:\>findall
simple translation
(403)781-5200
(780)423-5600
(403)380-5325
(403)528-1900
(403)309-1100
907-868-7594
907-463-2551
256-238-9380
256-233-3081
9 numbers found in 9 records
using some python features
(403) 781-5200
(780) 423-5600
(403) 380-5325
(403) 528-1900
(403) 309-1100
(907) 868-7594
(907) 463-2551
(256) 238-9380
(256) 233-3081
Hope this helps,
Mike
8<_____________ findall.py _____________
data = '''AB Calgary (403)781-5200
AB Edmonton (780)423-5600
AB Lethbridge (403)380-5325
AB Medicine Hat (403)528-1900
AB Red Deer (403)309-1100
AK Anchorage 907-868-7594
AK Juneau 907-463-2551
AL Anniston 256-238-9380
AL Athens 256-233-3081
a
b
c'''
import re, string, sys

def simple( data ):
    numbers = re.findall( '\(?\d{3}\)?[- ]?\d{3}[- ]?\d{4}', data )
    for number in numbers:
        print number
    print '%s numbers found in %s records'%( len(numbers),
        string.count(data, '\n')+1)

def more_robust( data ):
    lines = filter( None, string.split( data, '\n')) # easier on NT
    searcher = re.compile( '\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})')
    errors = []
    for line in lines: # use a fileinput if you prefer
        match = searcher.search( line )
        if match:
            print '(%s) %s-%s'%(
                match.group(1),match.group(2),match.group(3) )
        else:
            errors.append( line )
    if errors:
        sys.stderr.write( 'Errors encountered, the following '
            'records had no telephone numbers:\n' )
        for line in errors:
            sys.stderr.write( line )
            if line and line[-1] != '\n':
                sys.stderr.write( '\n' )

if __name__ == "__main__":
    print 'simple translation'
    simple( data )
    print
    print 'using some python features'
    more_robust( data )
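
The script above uses the print statement and the string module functions (string.split, string.count), which exist only in Python 2. For readers on Python 3, the following is a rough sketch of the same idea as more_robust, not a drop-in replacement. The names PHONE and normalize_numbers and the short sample string are hypothetical, and the function returns its results instead of printing them.

```python
import re
import sys

# Same pattern as in more_robust: three groups capture only the digits.
PHONE = re.compile(r"\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})")

def normalize_numbers(text):
    """Return (formatted numbers, lines in which no number was found)."""
    formatted, errors = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = PHONE.search(line)
        if match:
            # groups() yields the three captured digit strings in order.
            formatted.append("(%s) %s-%s" % match.groups())
        else:
            errors.append(line)
    return formatted, errors

sample = "AB Calgary (403)781-5200\nAK Juneau 907-463-2551\na"
numbers, bad = normalize_numbers(sample)
for number in numbers:
    print(number)
if bad:
    print("No telephone number found in:", bad, file=sys.stderr)
```

Returning the two lists rather than printing inside the loop keeps the matching logic separate from the reporting, which makes it easier to reuse.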
-----Original Message-----
From: Tony Johnson [mailto:gjohnson at gs.verio.net]
Sent: Sunday, August 27, 2000 10:29 PM
To: python-list at python.org
Subject: Re: phone number regex
Thank You for the reply. Someone replied to this but I lost the email.
But I have figured out my problem and I have just a small question. I
created a regex that matched an area code ie. (555). The regex is:
bash-2.04$ cat wisp-test4
#!/usr/local/bin/python
import sys, string, re , fileinput
acode_tmpl = re.compile('[^\(\d][\D+][^\)\d]')
file = open(sys.argv[1], 'r')
for line in fileinput.input():
    line = string.strip(line)
    print line # this line prints the "interspersed lines"
    a = re.split(acode_tmpl,line)
    print a
And it produces output like:
bash-2.04$ wisp-test4 networkg.txt
AB Calgary (403)781-5200 # result of the "print line"
['', '', '', '', '(403)781-5200']
AB Edmonton (780)423-5600
['', '', '', '', ' (780)423-5600']
...
I would like my script to not include the text before and after the
pattern match. Is there a switch I turn on in the re.split function or
do I have to break the match further from here?
...
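
On the question of re.split versus matching: re.split returns the text between matches of the pattern, so the surrounding text is its result by design. To get only the matched text, search for the pattern instead. A minimal sketch in Python 3 syntax, where the sample line and the whitespace split pattern are chosen only for illustration:

```python
import re

line = "AB Calgary (403)781-5200"

# re.split keeps what lies between the separators it matches.
pieces = re.split(r"\s+", line)
print(pieces)  # ['AB', 'Calgary', '(403)781-5200']

# re.search finds the pattern itself; group(0) is the matched text only.
phone = re.compile(r"\(?\d{3}\)?[- ]?\d{3}[- ]?\d{4}")
match = phone.search(line)
if match:
    print(match.group(0))  # (403)781-5200
```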