Spanish Accents

Thu Dec 22 11:54:50 EST 2011

On Thu, Dec 22, 2011 at 12:42 PM, Peter Otten <__peter__ at web.de> wrote:

> The file is probably encoded in ISO-8859-1, ISO-8859-15, or cp1252 then:
>
> >>> print "\xe1".decode("iso-8859-1")
> á
> >>> print "\xe1".decode("iso-8859-15")
> á
> >>> print "\xe1".decode("cp1252")
> á
>
> Try codecs.open() with one of these encodings.
>
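
For reference, here is a minimal sketch of what that suggestion amounts to, written to run under both Python 2.6+ and Python 3. The sample bytes and the temporary file are made up for illustration and are not the actual 2.txt. io.open stands in for codecs.open here because it accepts the same encoding argument.

```python
import io
import os
import tempfile

# Hypothetical sample line: Latin-1 style bytes with 0xE1 (á) and 0xED (í).
raw = b"r\xe1pidamente va a salir de aqu\xed.\r\n"

# The single byte 0xE1 decodes to U+00E1 in all three candidate encodings.
decoded = [b"\xe1".decode(enc) for enc in ("iso-8859-1", "iso-8859-15", "cp1252")]

# Write the raw bytes to a temporary file, then read them back as text.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(raw)

# newline="" keeps the original \r\n line ending instead of translating it.
with io.open(path, "r", encoding="cp1252", newline="") as f:
    lines = list(f)

os.remove(path)
```

After this, each element of lines is a unicode string rather than a byte string, which matches the u'...' object shown in the error further down.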

I'm baffled. I duplicated your print statements and they work, but when I run
this code (with any of the 3 encodings):

import codecs

# p is a path prefix defined elsewhere in the script
file = codecs.open(p + "2.txt", "r", "cp1252")
#file = codecs.open(p + "2.txt", "r", "utf-8")
for line in file:
  print line

I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 48: ordinal not in range(128)
      args = ('ascii', u'<i>Noticia: Este sitio web entre este portal est...r\xe1pidamente va a salir de aqu\xed.</i><br /><br />\r\n', 48, 49, 'ordinal not in range(128)')
      encoding = 'ascii'
      end = 49
      object = u'<i>Noticia: Este sitio web entre este portal est...r\xe1pidamente va a salir de aqu\xed.</i><br /><br />\r\n'
      reason = 'ordinal not in range(128)'
      start = 48

Please advise. TIA,
Stan
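
A note on reading the traceback: the object it shows is already a unicode string (u'...'), so the decoding step appears to have worked. The failure is an encode error, which would be consistent with print converting each line to ASCII on output. The sketch below reproduces that, assuming the output encoding really is ASCII. The sample string is a fragment of the line in the traceback, and UTF-8 as the explicit target encoding is only an assumption about what the terminal or page expects.

```python
# Fragment of the decoded line from the traceback, as a unicode string.
line = u"r\xe1pidamente va a salir de aqu\xed."

# Encoding to ASCII fails at the first non-ASCII character.
try:
    line.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    failed_at = exc.start  # index of the offending character in `line`

# Encoding explicitly to UTF-8 (an assumed target) succeeds and yields bytes.
encoded = line.encode("utf-8")
```

In the Python 2 loop above, that would correspond to something like print line.encode("utf-8") in place of print line, provided UTF-8 is what the output side expects.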