[Tutor] Problems processing accented characters in ISO-8859-1 encoded texts

Josep M. Fontana josep.m.fontana at gmail.com
Thu Dec 23 10:54:02 CET 2010


Sorry! Sorry! Sorry! I just found out this question had already been
answered by Steven D'Aprano in another thread! The trick was to add
'\w' to the character class alongside [a-zA-Z].

Please accept my apologies. I work on this project whenever I have
some free time. At some point I got very busy with other things and
stopped working on it. When I started again today, I had not noticed
that the question I posted a while ago already had an answer, one that
actually solved my problem. Thanks again, Steven. You can consider the
problem solved and this thread closed.

Josep M.

On Thu, Dec 23, 2010 at 10:25 AM, Josep M. Fontana
<josep.m.fontana at gmail.com> wrote:
> I am working with texts that are encoded as ISO-8859-1. I have
> included the following two lines at the beginning of my python script:
>
> #!/usr/bin/env python
> # -*- coding: iso-8859-1 -*-
>
> If I'm not mistaken, this should tell Python that accented characters
> such as 'á', 'Á', 'ö' or 'è' should be considered as alpha-numeric
> characters and therefore matched with a regular expression of the form
> [a-zA-Z]. However, when I process my texts, all of the accented
> characters are matched as non alpha-numeric symbols. What am I doing
> wrong?
>
> I'm not including the whole script because I think the rest of the
> code is irrelevant. All that's relevant (I think) is that I'm using
> the regular expression '[^a-zA-Z\t\n\r\f\v]+' to match any string that
> includes non alpha-numeric characters and that returns 'á', 'Á', 'ö'
> or 'è' as well as other real non alpha-numeric characters.
>
> Has anybody else experienced this problem when working with texts
> encoded as ISO-8859-1 or UTF-8? Is there any additional flag or
> parameter that I should add to make the processing of these characters
> as regular word characters possible?
>
> Thanks in advance for your help.
>
> Josep M.
>

