[Tutor] UnicodeDecodeError
David L Neil
PyTutor at DancesWithMice.info
Sun Mar 15 14:45:47 EDT 2020
On 16/03/20 4:41 AM, thehouse.be--- via Tutor wrote:
> I am a beginner, learning Python.
> So sorry if my question is basic.
>
> I am trying to work with .csv files in order to analyse data which comes from a Google Forms survey.
> Idea is to handle the raw data, do some statistical analysis and make a report.
>
> When trying to convert the data into a list of lists, I get a UnicodeDecodeError.
>
> This is what I do:
>
> >>> import csv
> >>> exampleFile = open('example.csv')
> >>> exampleReader = csv.reader(exampleFile)
> >>> exampleData = list(exampleReader)
>
> This last statement generates:
> ---------------------------------------------------------------------------
> UnicodeDecodeError Traceback (most recent call last)
> <ipython-input-9-3817c0931c6f> in <module>
> ----> 1 exampleData = list(exampleReader)
> /Applications/mu-editor.app/Contents/Resources/python/lib/python3.6/encodings/ascii.pyc in decode(self, input, final)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1798: ordinal not in range(128)
>
> I suppose there is a bizarre character somewhere in the file, but no idea where.
> As we use accents and umlauts in our language, could that be the problem?
> If that would be the problem, how to solve?
NB there are major differences in this area between Python2 and Python3.
I'm assuming you are using Python3.
The difficulty, as you say, is that the majority of the world's
population use languages which cannot be adequately expressed using
ASCII (*American* Standard Code...). That also makes this type of
question difficult to answer, because of the many permutations and
combinations of platform, application, and encoding.
If the spreadsheet/original .CSV file was built using MS-Excel and/or on
a non-English-speaking MS-Windows machine, then it is highly likely we
need to harmonise this Python code with that characteristic.
Are you able to ascertain such detail? If not, you can probably make an
educated guess (given your analysis to date).
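One way to make that educated guess is to look at the raw bytes. The
traceback complains about byte 0xc3; in UTF-8, accented Latin letters
such as é or ü are stored as two bytes, the first of which is 0xc3, so
the file may well be UTF-8. What follows is only a sketch: the sample
bytes stand in for the contents of example.csv, and the helper names
bytes_around and try_decodings are made up for illustration.

```python
def bytes_around(raw, position, width=10):
    """Return the slice of raw bytes surrounding the given position."""
    start = max(0, position - width)
    return raw[start:position + width]


def try_decodings(raw, candidates=("ascii", "utf-8", "cp1252")):
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical sample standing in for the real file. With the real file,
# use open("example.csv", "rb").read() to get the raw bytes instead.
sample = "naam,stad\nZoë,Liège\n".encode("utf-8")

bad_position = sample.index(b"\xc3")  # first byte that ASCII cannot decode
context = bytes_around(sample, bad_position, width=4)
decoded = try_decodings(sample)

print(context)
for name, text in decoded.items():
    print(name, "->", repr(text))
```

If ascii fails, utf-8 succeeds, and cp1252 shows pairs such as 'Ã«'
where an accented letter should be, then UTF-8 is the likely answer.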
Microsoft Windows tends to put European users into one of its own code
pages (for example cp1252 in Western Europe), which are close relatives
of the ISO 8859-x character sets. (But which one? Good news: we may not
need to be *exactly* correct in this choice!)
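Python3 strings are Unicode by default, but open() decodes a file using
a platform-dependent default encoding unless told otherwise; your
traceback shows that on your machine that default was ASCII.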
It is possible to encode and decode between "text encodings". Some
experimentation may be necessary.
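As a starting point for that experimentation, the encoding can be
passed to open() explicitly rather than left to the platform default.
The sketch below assumes the file turns out to be UTF-8 (an assumption,
not something the traceback proves). It writes a small stand-in file to
a temporary folder so that it runs on its own; load_csv is a made-up
helper name. The newline="" argument follows the csv module
documentation's advice for files handed to csv.reader.

```python
import csv
import os
import tempfile


def load_csv(path, encoding="utf-8"):
    """Read a CSV file into a list of rows using an explicit encoding."""
    with open(path, encoding=encoding, newline="") as handle:
        return list(csv.reader(handle))


with tempfile.TemporaryDirectory() as folder:
    # Stand-in for the real example.csv (hypothetical sample data).
    path = os.path.join(folder, "example.csv")
    with open(path, "wb") as handle:
        handle.write("naam,stad\nZoë,Liège\n".encode("utf-8"))

    rows = load_csv(path, encoding="utf-8")        # assumed-correct guess
    mojibake = load_csv(path, encoding="latin-1")  # deliberately wrong guess

print(rows)
print(mojibake)
```

A wrong guess such as latin-1 does not raise an error here, it simply
yields garbled accents, so it is worth checking a few rows by eye
before trusting the result.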
Please let us know the results of your investigation/experiments, and/or
if that leads to further questions...
WebRefs:
https://en.wikipedia.org/wiki/ISO/IEC_8859-1
https://docs.python.org/3/howto/unicode.html
https://docs.python.org/3/library/codecs.html
--
Regards =dn