Handling Special characters in python
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Tue Jan 1 07:01:51 EST 2013
On Tue, 01 Jan 2013 03:35:56 -0800, anilkumar.dannina wrote:
> I am facing an issue in my module. I am gathering data from a SQL Server
> database. The data I get from the db contains special characters
> like "endash". Python was taking it as "\x96". I require the same
> character (endash). How can I do that? Can you please help me
> resolve this issue?
"endash" is not a character, it is six characters.
On the other hand, "\x96" is a single character (one code point):
py> c = u"\x96"
py> assert len(c) == 1
But it is not a printable Unicode character; U+0096 is a control code with no name in the Unicode database:
py> import unicodedata
py> unicodedata.name(c)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: no such name
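For comparison, here is a small Python 3 sketch (the session above is Python 2, where `u"..."` marks a Unicode string) that shows why `name()` finds nothing: U+0096 exists as a code point, but its general category is `Cc`, a control character, and control characters carry no name in the Unicode database. The helper `describe` is hypothetical, written only for illustration.

```python
import unicodedata

def describe(ch):
    """Return (code point, general category, name or None) for one character."""
    # unicodedata.name() accepts a default, so unnamed characters give None
    # instead of raising ValueError as in the session above.
    return "U+%04X" % ord(ch), unicodedata.category(ch), unicodedata.name(ch, None)

print(describe("\x96"))    # ('U+0096', 'Cc', None)
print(describe("\u2013"))  # ('U+2013', 'Pd', 'EN DASH')
```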
So the "\x96" is probably not meant as a Unicode character at all, but as a raw byte:
py> c = "\x96"
py> print c
�
To convert byte 0x96 to an en dash character, you need to identify the
encoding the data was stored in.
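To see why the encoding matters, here is a minimal Python 3 sketch that decodes the same single byte under a few candidate encodings. The helper name `interpretations` and the choice of encodings are illustrative assumptions; the point is that only some of them turn 0x96 into an en dash.

```python
def interpretations(raw, encodings=("cp1252", "latin-1", "mac_roman", "cp437")):
    """Decode the same bytes under each candidate encoding (hypothetical helper)."""
    # errors="replace" keeps one undecodable byte from aborting the comparison
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}

for enc, text in interpretations(b"\x96").items():
    print(enc, repr(text))
```

The same byte comes out as an en dash, an invisible control code, or an accented letter, depending solely on which encoding you assume.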
(Aside: and *stop* using it. It is 2013 now; anyone who is not using
UTF-8 is doing it wrong. Legacy encodings are still necessary for legacy
data, but any new data should always use UTF-8.)
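Following that advice, a minimal Python 3 sketch of migrating legacy bytes: decode with the legacy encoding (assumed here to be CP1252), then re-encode the text as UTF-8. The function name `legacy_to_utf8` is hypothetical.

```python
def legacy_to_utf8(raw, legacy_encoding="cp1252"):
    """Re-encode bytes from a legacy single-byte encoding as UTF-8 (hypothetical helper)."""
    text = raw.decode(legacy_encoding)  # bytes -> str, using the legacy code page
    return text.encode("utf-8")         # str -> bytes, using UTF-8

converted = legacy_to_utf8(b"\x96")
print(converted)                  # b'\xe2\x80\x93'
print(converted.decode("utf-8"))  # the en dash character
```

In UTF-8 the en dash takes three bytes instead of one, so byte lengths change even though the text is the same.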
CP1252 (Windows-1252) is one likely encoding, but there may be others:
py> uc = c.decode('cp1252')
py> unicodedata.name(uc)
'EN DASH'
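Because a single byte cannot pin down the encoding by itself, one rough approach is to list which candidate encodings would yield the character you expect. A minimal Python 3 sketch, assuming you know the intended character's Unicode name; the helper `encodings_giving` and the candidate list are illustrative, not exhaustive.

```python
import unicodedata

def encodings_giving(raw, wanted, candidates):
    """Return the candidate encodings that decode raw to the named character (hypothetical helper)."""
    matches = []
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot decode these bytes at all
        # Only accept a single character whose Unicode name matches exactly.
        if len(text) == 1 and unicodedata.name(text, None) == wanted:
            matches.append(enc)
    return matches

print(encodings_giving(b"\x96", "EN DASH",
                       ["cp1250", "cp1251", "cp1252", "latin-1", "mac_roman"]))
```

Several Windows code pages put the en dash at 0x96, so this check only narrows the field; knowing how the data was produced is what settles it.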
--
Steven