[Tutor] Puzzled again

Wed Aug 3 21:04:49 CEST 2011

On 08/03/2011 01:48 PM, Richard D. Moores wrote:
> On Wed, Aug 3, 2011 at 10:11, Peter Otten<__peter__ at web.de>  wrote:
>
>> <SNIP>
>> Dave was close, but Steven hit the nail: the string r"C:\Users\Dick\..." is
>> fine, but when you put it into the docstring it is not a raw string within
>> another string, it becomes just a sequence of characters that is part of the
>> outer string. As such \U marks the beginning of a special way to define a
>> unicode codepoint:
>> <snip>
> Here's from my last post:
>
> ====================================
> Now I edit it back to its original problem form:
>
> def convertPath(path):
>     """
>     Given a path with backslashes, return that path with forward slashes.
>
>     By Steven D'Aprano  07/31/2011 on Tutor list
>     >>> path = r'C:\Users\Dick\Desktop\Documents\Notes\College Notes.rtf'
>     >>> convertPath(path)
>     'C:/Users/Dick/Desktop/Documents/Notes/College Notes.rtf'
>     """<snip>

> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Python32\lib\site-packages\mycalc2.py", line 10
>     """
> SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes
> in position 144-146: truncated \UXXXXXXXX escape
>
> Using HxD, I find that the bytes in 144-146 are 20, 54, 75 or  the
> <space>, 'T', 'u' of  " Tutor" .  A screen shot of HxD with this
> version of mycalc2.py open in it is at
> <http://www.rcblue.com/images/HxD.jpg>. You can see that I believe the
> offset integers are base-10 ints. I do hope that's correct, or I've
> done a lot of work for naught.
> ====================================
>
> So have I not used HxD correctly (my first time to use a hex reader)?
> If I have used it correctly, why do the reported problem offsets of
> 144-146 correspond to such innocuous things as 'T', 'u' and <space>,
> and which come BEFORE the problems you and Steven point out?
>
This one is my fault, for pointing you to the hex viewer.  Peter is
correct.  The offset in the error message is relative to the beginning
of the triple-quoted string, not to the beginning of the file, so the
bytes you looked up in HxD are not the ones the error refers to.
The problem has nothing to do with the encoding of the file itself; it
comes from the backslashes inside the triple-quoted string.  Since you
have a \U in a string that is not raw, the parser expects exactly 8 hex
digits to follow it.  The thing that threw me was that this particular
symptom is specific to Python 3.x, which I don't normally use.
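
A quick way to see both points, as a rough sketch only: the sources
below are compiled from strings so the failure can be caught instead of
stopping the script.  The helper name escape_error and the three sample
sources are made up for illustration.  If the offset really is relative
to the string, the two failing sources should report the same message
even though the text in front of the opening quote differs.

```python
def escape_error(source):
    """Compile source; return the SyntaxError message, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None

# The same string literal, with different amounts of text in front of it.
short_prefix = 's = "abc \\Users"'
long_prefix = 'a_much_longer_name = "abc \\Users"'

# \U followed by exactly eight hex digits is a complete escape.
valid = 's = "\\U00000041"'

print(escape_error(short_prefix))
print(escape_error(long_prefix))
print(escape_error(valid))
```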

The following line would have the same problem:

mystring = "abc \Unexpected def"

since the eight characters after the \U, "nexpecte", are not all hex
digits.  You would instead want a raw string:

mystring = r"abc \Unexpected def"
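
The same fix should apply to the docstring: put the r in front of the
opening triple quote, so the whole docstring is raw.  What follows is a
sketch, not the original function.  The name to_forward_slashes and its
one-line body are stand-ins I made up, since the real body was snipped.

```python
def to_forward_slashes(path):
    r"""
    Given a path with backslashes, return that path with forward slashes.

    The r prefix on this docstring keeps \Users from being read as the
    start of a \U escape.

    >>> to_forward_slashes(r'C:\Users\Dick\Desktop')
    'C:/Users/Dick/Desktop'
    """
    # Hypothetical body: replace every backslash with a forward slash.
    return path.replace("\\", "/")


print(to_forward_slashes(r'C:\Users\Dick\Desktop'))
```

Doubling every backslash inside a normal docstring would also get past
the parser, but the example path in the source gets harder to read.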

-- 

DaveA