How to 'de-slashify' a string?

Vlastimil Brom vlastimil.brom at gmail.com
Sat Aug 22 07:43:22 EDT 2009


2009/8/22 AK <ak at nothere.com>:
> Vlastimil Brom wrote:
>>
>> 2009/8/22 AK <ak at nothere.com>:
>>>
>>> Steven D'Aprano wrote:
>>>>
>>>> On Sat, 22 Aug 2009 04:20:23 -0400, AK wrote:
>>>>
>>>>> Hi, if I have a string '\\303\\266', how can I convert it to '\303\266'
>>>>> in a general way?
>>>>
>>>> It's not clear what you mean.
>>>>
>>>> Do you mean you have a string '\\303\\266', that is:
>>>>
>>>> backslash backslash three zero three backslash backslash two six six
>>>>
>>>> If so, then the simplest way is:
>>>>
>>>>>>> s = r'\\303\\266'  # note the raw string
>>>>>>> len(s)
>>>>
>>>> 10
>>>>>>>
>>>>>>> print s
>>>>
>>>> \\303\\266
>>>>>>>
>>>>>>> print s.replace('\\\\', '\\')
>>>>
>>>> \303\266
>>>>
>>>>
>>>> Another possibility:
>>>>
>>>>>>> s = '\\303\\266'  # this is not a raw string
>>>>>>> len(s)
>>>>
>>>> 8
>>>>>>>
>>>>>>> print s
>>>>
>>>> \303\266
>>>>
>>>> So s is:
>>>> backslash three zero three backslash two six six
>>>>
>>>> and you don't need to do any more.
>>>
>>> Well, I need the string itself to become '\303\266', not to print
>>> that way. In other words, when I do 'print s', it should display
>>> unicode characters if my term is set to show them, instead of
>>> showing \303\266.
>>>
>>>>
>>>>> The problem I'm running into is that I'm connecting with pygresql to a
>>>>> postgres database and when I get fields that are of 'char' type, I get
>>>>> them in unicode, but when I get fields of 'byte' type, I get the text
>>>>> with quoted slashes, e.g. '\303' becomes '\\303' and so on.
>>>>
>>>> Is pygresql quoting the backslash, or do you just think it is quoting
>>>> the
>>>> backslashes? How do you know? E.g. if you have '\\303', what is the
>>>> length
>>>> of that? 4 or 5?
>>>
>>> Length is 4, and I need it to be length of 1. E.g.:
>>>
>>>>>> s = '\303'
>>>>>> s
>>>
>>> '\xc3'
>>>>>>
>>>>>> x = '\\303'
>>>>>> x
>>>
>>> '\\303'
>>>>>>
>>>>>> len(x)
>>>
>>> 4
>>>>>>
>>>>>> len(s)
>>>
>>> 1
>>>
>>>
>>> What I get from pygresql is x, what I need is s. Either by asking
>>> pygresql
>>> to do this or convert it afterwards. I can't do replace('\\303', '\303')
>>> because it can be any unicode character.
>>>
>>>>
>>>
>>> --
>>> AK
>>> --
>>> http://mail.python.org/mailman/listinfo/python-list
>>>
>>
>>
>> Hi,
>> do you mean something like
>>
>>>>> u"\u0303"
>>
>> u'\u0303'
>>>>>
>>>>> print u"\u0303"
>>
>> ̃
>>    ̃ (dec.: 771)  (hex.: 0x303)   ̃ COMBINING TILDE (Mark, Nonspacing)
>> ?
>>
>> vbr
>
> Yes, something like that except that it starts out as '\\303\\266', and it's
> good enough for me if it turns into '\303\266', in fact that's rendered as
> one unicode char. In other words, when you do:
>
>>>> print "\\303\\266"
> '\303\266'
>
> I need that result to become a python string, i.e. the slashes need to
> be converted from literal slashes to escape slashes.
>
>
>
>
> --
> AK
> --
> http://mail.python.org/mailman/listinfo/python-list
>

Not sure whether this is the right way of handling such text data, but maybe:

>>> decoded = '\\303\\266'.decode("string_escape")
>>> decoded
'\xc3\xb6'
>>> print decoded
ö
>>> print '\303\266'
ö
>>>

It might be an IDLE issue, but it still isn't displayed as one Unicode glyph.

I guess you have to ensure that the input data is valid and that the
right encoding is used.
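
For what it's worth, here is a rough sketch of the same idea for Python 3,
where str.decode and the "string_escape" codec no longer exist. The helper
name unescape_octal_utf8 is made up for illustration, and the sketch assumes
that the escaped text contains only characters below U+0100 and that the
underlying bytes are UTF-8; if the database stores another encoding, the
last step has to change accordingly.

```python
def unescape_octal_utf8(text):
    """Hypothetical helper: turn literal escapes such as \\303\\266 into text."""
    # Interpret the backslash escapes; each octal escape becomes one
    # code point below 256.
    escaped = text.encode("latin-1").decode("unicode_escape")
    # Map those code points back to raw bytes one-to-one.
    raw = escaped.encode("latin-1")
    # Decode the bytes with the encoding the data was stored in
    # (assumed to be UTF-8 here).
    return raw.decode("utf-8")


s = "\\303\\266"  # 8 characters: backslash 3 0 3 backslash 2 6 6
result = unescape_octal_utf8(s)
print(len(s), len(result), result == "\u00f6")  # prints: 8 1 True
```

The final decode is what turns the two bytes into a single character;
without it you are left with a two-byte string, which is what shows up
as two glyphs above.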

hth
  vbr


