[Tutor] help with re.split use
Lloyd Kvam
pythontutor at venix.com
Thu Jan 29 14:06:01 EST 2004
Belatedly answering all of the questions.
Michel Bélanger wrote:
> Hi,
>
> I use the re.split function to parse First and Last name from a contact
> field. I use the following command:
> # -*- coding: cp1252 -*-
> import csv
> import re
> row = "Genevieve Camiré"
> contact = re.split(' ',row)
> print row, contact, len(contact)
>
> This generates the following results:
> Genevieve Camiré ['Genevieve', 'Camir\xe9'] 2
>
>
> question1: When I first ran the code, I got an I/O warning message
> which went away once I added the first line of code # -*-
> coding: cp1252 -*- What is it for?
The coding line tells Python which encoding the source file is saved in, so that it knows
how to interpret the accented character in your string literal.
Each character in an ordinary (byte) string has one of 256 possible values. The values
0 - 127 have a standard mapping between byte values and characters (ASCII), so 48
represents the digit '0'. The meaning of the values 128 - 255 depends upon the "code page".
cp1252 is the Windows code page that maps those byte values to the characters needed in
Western European languages. The "standard" ASCII mapping leaves 128 - 255 undefined.
\xe9 gives the hex value (e9) of the accented e character. The decimal value is
(14 * 16) + 9 = 233.
For a readable description of Unicode issues, see
http://www.joelonsoftware.com/articles/Unicode.html
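
A minimal sketch of the byte arithmetic, written for Python 3 (an assumption; the
original code is Python 2, where plain strings are byte strings). The variable names
are only illustrative.

```python
# Python 3 sketch: bytes vs. characters under the cp1252 code page.
raw = b"Camir\xe9"              # the name as bytes in a cp1252-encoded file

last_byte = raw[-1]             # indexing a bytes object yields an int
assert last_byte == 0xE9 == (14 * 16) + 9 == 233

text = raw.decode("cp1252")     # interpret the bytes with the cp1252 mapping
assert text == "Camir\u00e9"    # U+00E9 is the accented e

assert chr(48) == "0"           # values 0 - 127 mean the same in ASCII and cp1252

# Plain ASCII has no character for byte 233, so decoding as ASCII fails.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

Here ascii_ok ends up False, which is the "undefined" upper range in practice.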
>
> question2: Why the word 'Camiré' got changed to 'Camir\xe9'
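See above. The string itself is unchanged: when a list is printed, each item is shown
in its repr form, which writes non-ASCII bytes as hex escapes. print contact[1] would
display Camiré.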
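
The \xe9 only appears in the way the list is displayed, not in the data. A rough
Python 3 sketch of that distinction follows; it assumes the built-in ascii() as a
stand-in for the escaped list display that Python 2 produced in the output quoted above.

```python
# Python 3 sketch: escaped display of a list vs. the string it contains.
contact = ["Genevieve", "Camir\u00e9"]   # \u00e9 is the accented e

listing = ascii(contact)   # escaped form, similar to what Python 2 printed
item = contact[1]          # the actual string, still six characters long

assert listing == "['Genevieve', 'Camir\\xe9']"
assert len(item) == 6
```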
>
> question3: Some of the rows from my contacts list contain only one word,
> which results in contact being a list of length 1. Is it possible to
> add an argument to the split function so that it generates an empty
> string for the second item in the contact list: i.e.
>
> row = "Belanger"
> after split function is applied to row
> contact = ['Belanger','']
row.split(' ', 1) would limit the result to at most two items, but there is no argument
that pads a shorter list. I think you are stuck with checking for a length of 1 and
appending '' for those cases in your code.
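
A minimal sketch of that length check. The helper name split_name is hypothetical,
and the code is written so that it should run under Python 3 as well.

```python
def split_name(row):
    """Return [first, rest], using '' when the row has only one word."""
    parts = row.split(" ", 1)    # at most one split, so at most two items
    if len(parts) == 1:
        parts.append("")         # pad the single-word case
    return parts


two_words = split_name("Genevieve Camir\u00e9")   # \u00e9 is the accented e
one_word = split_name("Belanger")
assert one_word == ["Belanger", ""]
```

In later Python versions (2.5 and up), row.partition(' ') is another option: it
always returns a 3-tuple of (head, separator, tail), with empty strings when no space
is found.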
>
> Thanks
--
Lloyd Kvam
Venix Corp.
1 Court Street, Suite 378
Lebanon, NH 03766-1358
voice: 603-653-8139
fax: 801-459-9582