[Tutor] Puzzled again

Wed Aug 3 21:10:16 CEST 2011

Richard D. Moores wrote:

> On Wed, Aug 3, 2011 at 10:11, Peter Otten <__peter__ at web.de> wrote:
>> Richard D. Moores wrote:
>>
>>> I wrote before that I had pasted the function (convertPath()) from my
>>> initial post into mycalc.py because I had accidentally deleted it from
>>> mycalc.py. And that there was no problem importing it from mycalc.
>>> Well, I was mistaken (for a reason too tedious to go into). There WAS
>>> a problem, the same one as before.
>>
>> Dave was close, but Steven hit the nail on the head: the string
>> r"C:\Users\Dick\..." is fine on its own, but when you put it into the
>> docstring it is not a raw string within another string. It becomes just
>> a sequence of characters that is part of the outer string. As such, \U
>> marks the beginning of a special way to define a unicode codepoint:
>>
>> >>> "\U00000041"
>> 'A'
>>
>> As "sers\Dic", the eight characters following the \U in your docstring,
>> are not a valid hexadecimal number, you get an error message.
>>
>> The solution is standard procedure: escape the backslash or use a
>> raw string:
>>
>> Wrong:
>>
>> >>> """yadda r"C:\Users\Dick\..." yadda"""
>> File "<stdin>", line 1
>> SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in
>> position 10-12: truncated \UXXXXXXXX escape
>>
>> Correct:
>>
>> >>> """yadda r"C:\\Users\Dick\..." yadda"""
>> 'yadda r"C:\\Users\\Dick\\..." yadda'
>>
>> Also correct:
>>
>> >>> r"""yadda r"C:\Users\Dick\..." yadda"""
>> 'yadda r"C:\\Users\\Dick\\..." yadda'
> 
> Here's from my last post:
> 
> ====================================
> Now I edit it back to its original problem form:
> 
> def convertPath(path):
>    """
>    Given a path with backslashes, return that path with forward slashes.
> 
>    By Steven D'Aprano  07/31/2011 on Tutor list
>    >>> path = r'C:\Users\Dick\Desktop\Documents\Notes\College Notes.rtf'
>    >>> convertPath(path)
>    'C:/Users/Dick/Desktop/Documents/Notes/College Notes.rtf'
>    """
>    import os.path
>    separator = os.path.sep
>    if separator != '/':
>        path = path.replace(os.path.sep, '/')
>    return path
> 
> and get
> 
> C:\Windows\System32>python
> Python 3.2.1 (default, Jul 10 2011, 20:02:51) [MSC v.1500 64 bit
> (AMD64)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import mycalc2
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "C:\Python32\lib\site-packages\mycalc2.py", line 10
>    """
> SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes
> in position 144-146: truncated \UXXXXXXXX escape
> 
> Using HxD, I find that the bytes at offsets 144-146 are 20, 54, 75, i.e.
> the <space>, 'T', 'u' of " Tutor". A screenshot of HxD with this
> version of mycalc2.py open in it is at
> <http://www.rcblue.com/images/HxD.jpg>. You can see that I believe the
> offset integers are base-10 ints. I do hope that's correct, or I've
> done a lot of work for naught.
> ====================================
> 
> So have I not used HxD correctly (my first time using a hex editor)?
> If I have used it correctly, why do the reported problem offsets of
> 144-146 correspond to such innocuous things as <space>, 'T' and 'u',
> which come BEFORE the problems you and Steven point out?

Come on, put that r before the docstring:

def convertPath(path):
    r"""
    Given a path with backslashes, return that path with forward slashes.
    ...

and see the problem go away. The numbers reported by Python are byte offsets 
within the string literal, counted from the first character after the opening 
quotes, not offsets into the file:

>>> "\Uinvalidnumber"
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in 
position 0-2: truncated \UXXXXXXXX escape
>>> "1\Uinvalidnumber"
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in 
position 1-3: truncated \UXXXXXXXX escape
>>> "12\Uinvalidnumber"
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in 
position 2-4: truncated \UXXXXXXXX escape
>>> "123\Uinvalidnumber"
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in 
position 3-5: truncated \UXXXXXXXX escape
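
If you want to check this without editing mycalc2.py, here is a rough sketch
that compiles a small function from a string, once with a plain docstring and
once with a raw one. The names build_source() and error_for() are made up for
this illustration, and the docstring text is a shortened stand-in for the real
one, so the offset it reports will not be 144. The point is only that the
reported position should line up with where \U sits inside the docstring,
not with an offset into the file.

```python
import re


def build_source(prefix):
    """Return source for a tiny function (hypothetical, for illustration).

    prefix is placed directly before the opening triple quotes, so
    "" gives a plain docstring and "r" gives a raw docstring.
    """
    return (
        "def f():\n"
        "    " + prefix + '"""\n'
        "    Given a path with backslashes.\n"
        "    >>> path = r'C:\\Users\\Dick\\Desktop'\n"
        '    """\n'
    )


def error_for(source):
    """Compile source and return the SyntaxError message, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


plain_source = build_source("")
plain_error = error_for(plain_source)    # plain docstring: \U starts an escape
raw_error = error_for(build_source("r")) # raw docstring: backslashes are literal

# Where does \U sit, counted from the first character after the opening quotes?
docstring_body = plain_source.split('"""')[1]
offset_in_docstring = docstring_body.index("\\U")

# Pull the "position N-M" numbers out of the error message for comparison.
reported = re.search(r"position (\d+)-(\d+)", plain_error or "")

print("plain docstring:", plain_error)
print("raw docstring:  ", raw_error)
print("\\U offset inside the docstring:", offset_in_docstring)
print("position reported by Python:", reported.group(0) if reported else None)
```

The raw version compiles because inside r"""...""" a backslash is just a
backslash, which is why the one-letter fix above is enough.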