How to 'de-slashify' a string?

AK ak at nothere.com
Sun Aug 23 01:01:46 EDT 2009


Vlastimil Brom wrote:
> 2009/8/22 AK <ak at nothere.com>:
>> Vlastimil Brom wrote:
>>> 2009/8/22 AK <ak at nothere.com>:
>>>> Steven D'Aprano wrote:
>>>>> On Sat, 22 Aug 2009 04:20:23 -0400, AK wrote:
>>>>>
>>>>>> Hi, if I have a string '\\303\\266', how can I convert it to '\303\266'
>>>>>> in a general way?
>>>>> It's not clear what you mean.
>>>>>
>>>>> Do you mean you have a string '\\303\\266', that is:
>>>>>
>>>>> backslash backslash three zero three backslash backslash two six six
>>>>>
>>>>> If so, then the simplest way is:
>>>>>
>>>>>>>> s = r'\\303\\266'  # note the raw string
>>>>>>>> len(s)
>>>>> 10
>>>>>>>> print s
>>>>> \\303\\266
>>>>>>>> print s.replace('\\\\', '\\')
>>>>> \303\266
>>>>>
>>>>>
>>>>> Another possibility:
>>>>>
>>>>>>>> s = '\\303\\266'  # this is not a raw string
>>>>>>>> len(s)
>>>>> 8
>>>>>>>> print s
>>>>> \303\266
>>>>>
>>>>> So s is:
>>>>> backslash three zero three backslash two six six
>>>>>
>>>>> and you don't need to do any more.
>>>> Well, I need the string itself to become '\303\266', not to print
>>>> that way. In other words, when I do 'print s', it should display
>>>> unicode characters if my term is set to show them, instead of
>>>> showing \303\266.
>>>>
>>>>>> The problem I'm running into is that I'm connecting with pygresql to a
>>>>>> postgres database and when I get fields that are of 'char' type, I get
>>>>>> them in unicode, but when I get fields of 'byte' type, I get the text
>>>>>> with quoted slashes, e.g. '\303' becomes '\\303' and so on.
>>>>> Is pygresql quoting the backslash, or do you just think it is quoting
>>>>> the backslashes? How do you know? E.g. if you have '\\303', what is the
>>>>> length of that? 4 or 5?
>>>> Length is 4, and I need it to be length of 1. E.g.:
>>>>
>>>>>>> s = '\303'
>>>>>>> s
>>>> '\xc3'
>>>>>>> x = '\\303'
>>>>>>> x
>>>> '\\303'
>>>>>>> len(x)
>>>> 4
>>>>>>> len(s)
>>>> 1
>>>>
>>>>
>>>> What I get from pygresql is x, what I need is s, either by asking
>>>> pygresql to do this or by converting it afterwards. I can't do
>>>> replace('\\303', '\303') because it can be any unicode character.
>>>>
>>>> --
>>>> AK
>>>> --
>>>> http://mail.python.org/mailman/listinfo/python-list
>>>>
>>>
>>> Hi,
>>> do you mean something like
>>>
>>>>>> u"\u0303"
>>> u'\u0303'
>>>>>> print u"\u0303"
>>> ̃
>>>    ̃ (dec.: 771)  (hex.: 0x303)   ̃ COMBINING TILDE (Mark, Nonspacing)
>>> ?
>>>
>>> vbr
>> Yes, something like that, except that it starts out as '\\303\\266', and it's
>> good enough for me if it turns into '\303\266'; in fact, that's rendered as
>> one unicode char. In other words, when you do:
>>
>>>>> print "\\303\\266"
>> \303\266
>>
>> I need that result to become a python string, i.e. the slashes need to
>> be converted from literal slashes to escape slashes.
>>
>>
>>
>>
>> --
>> AK
>>
> 
> Not sure whether it is the right way of handling such text data, but maybe:
> 
>>>> decoded = '\\303\\266'.decode("string_escape")
>>>> decoded
> '\xc3\xb6'
>>>> print decoded
> ö
>>>> print '\303\266'
> ö
> 
> It might be an IDLE issue, but it still isn't one unicode glyph.
> 
> I guess you have to ensure that the input data is valid and that the
> right encoding is used.
> 
> hth
>   vbr

Actually, this works perfectly for me. It prints as one character in
gnome-terminal, and when I write it to a text file and open it as
UTF-8 in gnumeric, it also shows up properly.

Thanks to all who helped! -AK
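
For anyone reading this later on Python 3, where the "string_escape" codec
no longer exists, a rough equivalent might look like the sketch below. The
helper name deslashify is made up for illustration, and it assumes the field
arrives as bytes containing only backslash escapes like \303 (no \u escapes)
and that the unescaped bytes are UTF-8:

```python
def deslashify(escaped: bytes, encoding: str = "utf-8") -> str:
    """Turn literal backslash escapes (e.g. b'\\\\303\\\\266') into text.

    Hypothetical helper, not part of pygresql or the standard library.
    """
    # "unicode_escape" interprets the escapes; each octal escape such as
    # \303 becomes the code point with that value (here U+00C3).
    as_code_points = escaped.decode("unicode_escape")
    # Latin-1 maps code points 0-255 back to the same byte values,
    # which recovers the raw bytes the escapes stood for.
    raw = as_code_points.encode("latin-1")
    # Assumption: those raw bytes are text in the given encoding.
    return raw.decode(encoding)


x = b"\\303\\266"       # 8 bytes: backslash 3 0 3 backslash 2 6 6
s = deslashify(x)
print(len(x), len(s))   # 8 1
print(s)                # ö
```

If the data is not valid UTF-8 after unescaping, the final decode raises
UnicodeDecodeError, so the point above about knowing the right encoding
still applies.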


