[Tutor] Checking telephone numbers: re or strings

Charlie Clark charlie@begeistert.org
Fri Jan 31 05:21:02 2003


> Hi Charlie,
>
> Out of curiosity, is it ever possible for two country codes to
> "conflict"? That is, is it ever possible that something like this might
> happen?
>
>     access = {'America':'1', ..., 'Pythonia' : '123'}
>
> Just wondering.
Apparently not, because this would cause an error when dialling:
+1-234-712-4667 may be a valid number in North America for all I know, but
it would be routed to Pythonia. So we don't have to worry about that problem.

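The "no conflict" property can be checked directly: no code may be a prefix
of another. The following is only a minimal sketch; the function name
find_prefix_conflicts and the sample table are made up for illustration, and
'Pythonia' is the hypothetical country from the quoted example.

```python
def find_prefix_conflicts(codes):
    """Return (shorter, longer) pairs where one code is a prefix of another."""
    # Sorting strings puts a prefix directly before anything that extends it.
    unique = sorted(set(codes))
    conflicts = []
    for i, shorter in enumerate(unique):
        for longer in unique[i + 1:]:
            if longer.startswith(shorter):
                conflicts.append((shorter, longer))
    return conflicts

# Hypothetical table; 'Pythonia' is not a real country.
access = {'America': '1', 'Germany': '49', 'Pythonia': '123'}
print(find_prefix_conflicts(access.values()))  # [('1', '123')]
```

An empty list means every number can be attributed to exactly one code.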
> This works.  We can speed things up slightly.  We don't actually need to
> do:
>
>     access = codes.values()
>
> because, in recent versions of Python, it's perfectly ok to check for
> things inside dictionaries by using 'in':
Mm, I've heard of this. Which version did it come in? I've kind of got used
to the explicit calls to keys(), values() and items().

> ###
> >>> digits = {'zero': 0, 'one':1, 'two':2, 'three' : 3, 'four':4,
> ...           'five': 5, 'six':6, 'seven':7, 'eight':8, 'nine':9}
> >>> 'zero' in digits
> 1
> >>> 'twenty' in digits
> 0
> ###
>
> and the lookup check for key membership is much faster than scanning for
> membership in a list.
Agreed.
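
For reference, 'in' on dictionaries arrived with Python 2.2. One thing to
watch: it tests the keys, not the values, so it is not a drop-in replacement
for a check against codes.values(). A small sketch, assuming a table that
maps country names to code strings (the entries are illustrative):

```python
# Assumed layout: country name -> country code as a string.
codes = {'America': '1', 'Germany': '49', 'Ireland': '353'}

# 'in' on a dictionary looks at the keys only.
print('Germany' in codes)
print('49' in codes)

# Checking for a code therefore still needs the values (or a reversed map).
print('49' in codes.values())
```

On a current Python the three lines print True, False and True; older
versions such as the one in the quoted session show 1 and 0 instead.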

> In your code, 'access' is a dictionary that maps countries to their
> respective country codes.  If you make a reversed map --- that is,
> country codes to countries --- you may find parts of your program easier
> to do.
Yes, I know. It's constructed from the database, where the codes are
stored as numbers, but as I have to convert them into strings for the match
anyway, I can use them as keys.

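A minimal sketch of the reversed map, assuming the database yields country
names with numeric codes (the sample rows and the name 'countries' are
illustrative, not taken from the real data):

```python
# Assumed shape of the table built from the database: country -> numeric code.
access = {'America': 1, 'Germany': 49, 'Ireland': 353}

# Reverse it, converting each numeric code to a string so it can be compared
# with the leading digits of a telephone number.
countries = {}
for country, code in access.items():
    countries[str(code)] = country

print(countries['49'])  # Germany
```

If several countries share one code (the USA and Canada both use 1), the
reversed map keeps only one of them, so it may need lists as values.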
> Yes, the loop might be a bit expensive.  However, we can improve the
> situation: what about constructing a large regular expression out of all
> those country codes?
Good idea! What about it? ;-) Lack of experience.
> ###
> >>> prefixes = ['1', '49', '353']
> >>> import re
> >>> pattern = re.compile('|'.join(prefixes))
> >>>
> >>> numbers = ['491787826226', '1721545662', '001745648324']
> >>> for n in numbers:
> ...     print n, pattern.match(n).group(0)
> ...
> 491787826226 49
> 1721545662 1
> 001745648324
> Traceback (most recent call last):
>   File "<stdin>", line 2, in ?
> AttributeError: 'NoneType' object has no attribute 'group'
> ###
>
> Ok, the last one didn't work out because the regular expression didn't
> match against '001745648324', but that's a situation we can properly
> detect, if we write a little more code.
Yes, that was the raw list. I've already written something to detect the
"00" at the beginning, as this is just an access code for international
dialling outside of North America. However, there will be other cases of
non-matches, which means I need to catch those errors and give them an
empty code.

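A sketch of that "little more code", assuming the "00" access prefix is
simply stripped before matching and that an empty string stands for "no
code found". The function name country_code and the sample number '999' are
made up for illustration.

```python
import re

prefixes = ['1', '49', '353']
pattern = re.compile('|'.join(prefixes))

def country_code(number):
    """Return the matching country code, or '' if none matches."""
    # Drop the '00' international access prefix if present (assumption:
    # nothing else needs normalising at this point).
    if number.startswith('00'):
        number = number[2:]
    match = pattern.match(number)
    if match is None:
        # pattern.match() returns None on failure, so test before .group().
        return ''
    return match.group(0)

for n in ['491787826226', '1721545662', '001745648324', '999']:
    print(n, repr(country_code(n)))
```

Testing the match object for None avoids the AttributeError from the quoted
session without needing a try/except around every lookup.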
> I think this approach --- compiling a regular expression at runtime ---
> should still be pretty darn fast, since we avoid doing loops over
> individual regular expressions.
Sounds good to me. I'll give it a try.

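One caveat when building the pattern at runtime: Python's re tries the
alternatives from left to right and takes the first that matches, not the
longest. With conflict-free codes the order makes no difference, but a
defensive sketch (the helper name build_prefix_pattern is made up, and '123'
is the hypothetical code from earlier) could sort longest-first and escape
each entry:

```python
import re

def build_prefix_pattern(prefixes):
    """Compile one alternation that prefers longer prefixes."""
    # Longest first, so a longer code wins over a shorter one it starts with.
    alternatives = sorted(set(prefixes), key=len, reverse=True)
    # re.escape guards against entries containing characters such as '+'.
    return re.compile('|'.join(re.escape(p) for p in alternatives))

naive = re.compile('|'.join(['1', '49', '353', '123']))
careful = build_prefix_pattern(['1', '49', '353', '123'])

print(naive.match('1234567').group(0))    # 1
print(careful.match('1234567').group(0))  # 123
```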
> There are other techniques we can use to make this go even faster, but=20
> let's see how far the standard regular expressions can take us.
No need to over-optimise! There are only a couple of thousand numbers. Even
my expensive loop method only takes a minute.
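
For the record, one possible faster technique (not necessarily the one meant
above) is to skip regular expressions and probe a dictionary with the first
one, two and three digits. This sketch assumes codes are at most three
digits long and reuses an illustrative reversed table; lookup_code is a
made-up name.

```python
# Assumed reversed table: country code string -> country name.
countries = {'1': 'America', '49': 'Germany', '353': 'Ireland'}

def lookup_code(number, table=countries, max_len=3):
    """Return the country code found at the start of number, or ''."""
    # Try the leading 1, 2, ... max_len digits as dictionary keys.
    for length in range(1, max_len + 1):
        prefix = number[:length]
        if prefix in table:
            return prefix
    return ''

print(lookup_code('491787826226'))  # 49
print(repr(lookup_code('999')))     # ''
```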

Thanx for helping me understand re's a bit better!

Charlie
-- 
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360