[Tutor] help with re.split use

Lloyd Kvam pythontutor at venix.com
Thu Jan 29 14:06:01 EST 2004


Belatedly answering all of the questions.

Michel Bélanger wrote:

> Hi,
> 
> I use the re.split function to parse the first and last name from a contact 
> field. I use the following commands:
> # -*- coding: cp1252 -*-
> import csv
> import re
> row = "Genevieve Camiré"
> contact = re.split(' ',row)
> print row, contact, len(contact)
> 
> This generates the following results:
> Genevieve Camiré ['Genevieve', 'Camir\xe9'] 2
> 
> 
> question1: When I first ran the code, I got an I/O warning message, 
> which led me to add the first line of code, # -*- 
> coding: cp1252 -*-. What is it for?
Each character in an ordinary (byte) string can take one of 256 possible values. Values
0-127 have a standard (ASCII) mapping between byte values and characters, so 48 represents
the digit '0'. The values 128-255 vary depending upon the "code page". cp1252 provides a
convenient mapping between byte values and the characters needed in Western European
languages, while the "standard" ASCII mapping leaves 128-255 undefined. The coding line
(PEP 263) tells Python which code page your source file is written in, so it knows how to
interpret the non-ASCII é in your string literal.
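A small sketch of both points, written for Python 3 rather than the Python 2 used in this thread. It feeds source bytes that carry a cp1252 coding line to compile(), then decodes the single byte 0xE9 under two code pages and under strict ASCII. The names source, namespace, and "<contacts>" are only illustrative.

```python
# Source-file bytes: a cp1252 coding declaration followed by a string
# literal whose accented e is stored as the single byte 0xE9.
source = b"# -*- coding: cp1252 -*-\nrow = 'Genevieve Camir\xe9'\n"

# compile() honours the coding declaration when decoding the bytes.
namespace = {}
exec(compile(source, "<contacts>", "exec"), namespace)
row = namespace["row"]                 # 'Genevieve Camiré'

# The same byte maps to different characters in different code pages.
raw = b"\xe9"
western = raw.decode("cp1252")         # 'é' (Western European)
ibm_pc = raw.decode("cp437")           # 'Θ' (original IBM PC code page)

# Strict ASCII defines only 0-127, so byte 0xE9 cannot be decoded.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(len(row), ascii_ok)              # 16 False
```

The byte itself never changes. Only the code page used to read it does.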

\xe9 is the hex value (e9) of the accented e character. The decimal value is
(14 * 16) + 9 = 233.
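Here is the same arithmetic checked in Python 3, where str is Unicode. You encode to cp1252 to get the byte back. cp1252 agrees with Latin-1 for é, and Latin-1 matches the first 256 Unicode code points, so ord() also gives 233 here. That coincidence does not hold for every cp1252 character.

```python
e_acute = "\u00e9"                      # é, written as an escape for clarity
cp1252_byte = e_acute.encode("cp1252")  # b'\xe9'

hex_digits = 0xE9                       # hex e9
by_hand = 14 * 16 + 9                   # e = 14 in hex, so 14*16 + 9

# Indexing a bytes object yields an int in Python 3.
print(cp1252_byte[0], hex_digits, by_hand, ord(e_acute))  # 233 233 233 233
```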

For a readable description of Unicode issues, see
http://www.joelonsoftware.com/articles/Unicode.html
> 
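> question2: Why did the word 'Camiré' get changed to 'Camir\xe9'?
See above. Printing a string shows its characters directly, but printing a list shows the
repr() of each element, and repr() writes non-ASCII bytes as \x escapes. That is why row
prints as Camiré while contact shows 'Camir\xe9'.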
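A Python 3 sketch of the difference. Python 3's repr() leaves printable non-ASCII characters alone, so print(contact) would show 'Camiré'. The built-in ascii() reproduces the escaped form that Python 2 printed for the list.

```python
import re

row = "Genevieve Camir\u00e9"          # "Genevieve Camiré"
contact = re.split(" ", row)

# ascii() is repr() with non-ASCII characters escaped, like Python 2 did.
escaped = ascii(contact)
print(escaped)                         # ['Genevieve', 'Camir\xe9']
```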

> 
> question3: Some of the rows from my contacts list contain only one word, 
> which results in contact being a list of length 1. Is it possible to 
> add an argument to the split function so that it generates an empty 
> string for the second item in the contact list? e.g.
> 
> row = "Belanger"
> after split function is applied to row
> contact = ['Belanger','']

row.split(' ', 1) limits the result to at most two items, but it does not pad a one-word
name. I think you are stuck with checking for a length of 1 and appending '' in those
cases in your code.
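A minimal sketch of that check in Python 3. split_name is a hypothetical helper name. It uses str.split with maxsplit=1, so extra words stay in the last name.

```python
def split_name(row):
    """Hypothetical helper: return [first, last], padding last with ''."""
    contact = row.split(" ", 1)        # maxsplit=1: at most two items
    if len(contact) == 1:              # one-word contact, e.g. "Belanger"
        contact.append("")
    return contact

print(split_name("Belanger"))          # ['Belanger', '']
print(split_name("Mary Ann Smith"))    # ['Mary', 'Ann Smith']
```

A more compact way to write the same padding is (row.split(" ", 1) + [""])[:2]. The explicit length check is easier to read.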

> 
> Thanks

-- 
Lloyd Kvam
Venix Corp.
1 Court Street, Suite 378
Lebanon, NH 03766-1358

voice:	603-653-8139
fax:	801-459-9582



