String character encoding when converting data from one type/format to another

Jacob Kruger jacob at
Wed Jan 7 14:38:03 CET 2015


Makes more sense now, and yes, using 2.7 here.

Unfortunately, while I could pass the binary values into blob fields well 
enough using parameterised statements, generating the .sql script text files 
is a step they sometimes want to work with directly, if someone is handling 
this on site. So I first had to generate the string values, and then handle 
executing those statements against a MySQL server later on using MySQLdb.
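For what it's worth, a minimal sketch of that first step, in Python 3 notation (the thread itself is on 2.7, but the idea is the same; the helper name sql_quote and the "latin7" source encoding are my assumptions, not anything from the thread) - decode the bytes pulled out of Access first, then escape for the script:

```python
def sql_quote(raw_bytes, src_encoding="latin7"):
    """Turn a bytestring from Access into a quoted SQL string literal."""
    # bytes -> unicode first, so out-of-ASCII bytes like \xa3 survive intact
    text = raw_bytes.decode(src_encoding)
    # escape backslashes and single quotes for a MySQL string literal
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"

print(sql_quote(b"\xa3"))  # -> '£', the pound sign from the thread
```

The generated script file then just needs to be written out in an encoding the MySQL server is told about (e.g. utf-8, together with a matching SET NAMES).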

Stay well

Jacob Kruger
Blind Biker
Skype: BlindZA
"Roger Wilco wants to welcome the space janitor's closet..."

----- Original Message ----- 
From: "Peter Otten" <__peter__ at>
To: <python-list at>
Sent: Wednesday, January 07, 2015 2:11 PM
Subject: Re: String character encoding when converting data from one 
type/format to another

> Jacob Kruger wrote:
>> I'm busy using something like pyodbc to pull data out of MS access .mdb
>> files, and then generate .sql script files to execute against MySQL
>> databases using the MySQLdb module, but the issue is characters in
>> string values that don't fit inside the 0-127 range - the current one
>> seems to be something like \xa3, and if I pass it through the ord()
>> function, it comes out as character number 163.
>> Now, yes, I could just run through the hundreds of thousands of
>> characters in these resulting strings and strip out any that are not
>> within the basic 0-127 range, but that could result in corrupting the
>> data - I think so, anyway.
>> Anyway, issue is, for example, if I try something like
>> str('\xa3').encode('utf-8') or str('\xa3').encode('ascii'), or
> "\xa3" already is a str; str("\xa3") is as redundant as
> str(str(str("\xa3"))) ;)
>> str('\xa3').encode('latin7') - that last one is actually our preferred
>> encoding for the MySQL database - they all just tell me they can't work
>> with a character out of range.
> encode() goes from unicode to bytes; you want to convert bytes to unicode 
> and thus need decode().
> In this context it is important that you tell us the Python version. In
> Python 2 str.encode(encoding) is basically
> str.decode("ascii").encode(encoding)
> which is why you probably got a UnicodeDecodeError in the traceback:
>>>> "\xa3".encode("latin7")
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "/usr/lib/python2.7/encodings/", line 12, in encode
>    return codecs.charmap_encode(input,errors,encoding_table)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 0:
> ordinal not in range(128)
>>>> "\xa3".decode("latin7")
> u'\xa3'
>>>> print "\xa3".decode("latin7")
> £
> Aside: always include the traceback in your posts -- and always read it
> carefully. The fact that "latin7" is not mentioned might have given you a
> hint that the problem was not what you thought it was.
>> Any thoughts on a sort of generic method/means to handle any/all
>> characters that might be out of range when having pulled them out of
>> something like these MS access databases?
> Assuming the data in Access is not broken and that you know the encoding
> decode() will work.
>> Another side note: for fields that might store binary values, I use
>> something like the following to generate hex-based strings that work
>> alright when inserting those binary values into longblob fields, but I
>> don't think this would really help for what are most likely badly chosen
>> copy/pasted strings from documents, with strange encoding, or something:
>> #sample code line for binary encoding into string output
>> s_values += "0x" + str(l_data[J][I]).encode("hex").replace("\\", "\\\\") + ", "
> I would expect that you can feed bytestrings directly into blobs, without
> any preparatory step. Try it, and if you get failures show us the failing
> code and the corresponding traceback.
> -- 
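To illustrate Peter's two points above - incoming bytes want decode() rather than encode(), and bytestrings can go straight into blob parameters - here is a rough sketch in Python 3 notation, with the standard-library sqlite3 standing in for MySQLdb (whose paramstyle is %s rather than ?); the table and column names are made up for the example:

```python
import sqlite3

raw = b"\xa3"                 # byte pulled out of Access
text = raw.decode("latin7")   # bytes -> unicode: the pound sign
payload = b"\x00\xff\xa3"     # some arbitrary binary data

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, data BLOB)")
# Parameterised statement: the driver handles quoting and escaping,
# so the bytestring goes into the blob with no hex-encoding step.
conn.execute("INSERT INTO t (name, data) VALUES (?, ?)", (text, payload))
name, data = conn.execute("SELECT name, data FROM t").fetchone()
```

After the round trip, name comes back as the decoded text and data as the original bytes, untouched.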
