UTF-8 and latin1
dn
PythonList at DancesWithMice.info
Wed Aug 17 16:53:17 EDT 2022
On 18/08/2022 03.33, Stefan Ram wrote:
> Tobiah <toby at tobiah.org> writes:
>> I get data from various sources; client emails, spreadsheets, and
>> data from web applications. I find that I can do some_string.decode('latin1')
>
> Strings have no "decode" method. ("bytes" objects do.)
>
>> to get unicode that I can use with xlsxwriter,
>> or put <meta charset="latin1"> in the header of a web page to display
>> European characters correctly.
>
> |You should always use the UTF-8 character encoding. (Remember
> |that this means you also need to save your content as UTF-8.)
> World Wide Web Consortium (W3C) (2014)
>
>> am using data from the wild. It's frustrating that I have to play
>> a guessing game to figure out how to use incoming text. I'm just wondering
>
> You can let Python guess the encoding of a file.
>
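> import pathlib
>
> def encoding_of( name ):
>     path = pathlib.Path( name )
>     # Try the strictest codec first; latin_1 accepts any byte sequence.
>     for encoding in( "utf_8", "cp1252", "latin_1" ):
>         try:
>             with path.open( encoding=encoding, errors="strict" )as file:
>                 text = file.read()
>             return encoding
>         except UnicodeDecodeError:
>             pass
>     # Only reached if every candidate codec fails.
>     return None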
>
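A note on the order of candidates in the quoted function: latin_1 maps every possible byte to a character, so it can never raise UnicodeDecodeError, and with it in the list the "return None" branch is effectively unreachable. A minimal sketch of the same idea working on bytes rather than a file (guess_encoding and its candidates parameter are names made up for this illustration, not anything from the thread):

```python
def guess_encoding(data: bytes, candidates=("utf_8", "cp1252", "latin_1")):
    """Return the first candidate codec that decodes data strictly, else None.

    Hypothetical helper for illustration; it only reports which codec
    does not fail, not which one the author of the data actually used.
    """
    for encoding in candidates:
        try:
            data.decode(encoding, errors="strict")
        except UnicodeDecodeError:
            continue
        return encoding
    return None


utf8_bytes = "Montréal".encode("utf_8")      # é becomes two bytes
latin1_bytes = "Montréal".encode("latin_1")  # é becomes the single byte 0xE9

print(guess_encoding(utf8_bytes))    # utf_8
print(guess_encoding(latin1_bytes))  # cp1252 (same result as latin_1 for 0xE9)
print(guess_encoding(b"\x81"))       # latin_1 (0x81 is undefined in cp1252)
```

So a successful decode only means "no error", not "correct text": Latin-1 data is reported as cp1252 here simply because cp1252 comes first and agrees with Latin-1 on that byte.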
>> if there are any thoughts. What if we just globally decided to use utf-8?
>> Could that ever happen?
>
> That decision was made long ago.
Unfortunately, much of our data was collected long before then - and as
we've discovered, the OP is still living in Python 2 times.
What if the path "name" (above) is itself not encoded in UTF-8?
E.g. the OP's Montréal in Latin-1, as Montréal.txt or Montréal.rpt
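For what it's worth, here is a rough sketch of how such a name behaves in Python 3. On POSIX, os.fsdecode() and os.fsencode() use the "surrogateescape" error handler, so undecodable bytes in a file name survive a round trip through str. The snippet imitates that with an explicit UTF-8 codec so it does not depend on the local filesystem encoding; display_name and its fallback parameter are hypothetical names for this example only:

```python
# A file name as raw bytes, encoded in Latin-1 rather than UTF-8.
raw_name = "Montréal.txt".encode("latin_1")   # b'Montr\xe9al.txt'

# Strict UTF-8 decoding would fail on 0xE9; surrogateescape instead
# smuggles the bad byte through as the lone surrogate U+DCE9.
escaped = raw_name.decode("utf_8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly,
# which is why such a name can still be handed back to the OS.
assert escaped.encode("utf_8", errors="surrogateescape") == raw_name


def display_name(raw: bytes, fallback: str = "latin_1") -> str:
    """Decode a file name for display: UTF-8 first, then an assumed fallback.

    Hypothetical helper; the fallback codec is a guess, just as with
    the file contents.
    """
    try:
        return raw.decode("utf_8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


print(display_name(raw_name))
```

The escaped str is fine for opening the file but not for printing or writing into a spreadsheet, since lone surrogates cannot be encoded strictly. And the encoding of a file's name says nothing reliable about the encoding of its contents.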
--
Regards,
=dn