[Tutor] Puzzled again

Dave Angel d at davea.name
Wed Aug 3 21:04:49 CEST 2011

On 08/03/2011 01:48 PM, Richard D. Moores wrote:
> On Wed, Aug 3, 2011 at 10:11, Peter Otten<__peter__ at web.de>  wrote:
>> <SNIP>
>> Dave was close, but Steven hit the nail: the string r"C:\Users\Dick\..." is
>> fine, but when you put it into the docstring it is not a raw string within
>> another string, it becomes just a sequence of characters that is part of the
>> outer string. As such \U marks the beginning of a special way to define a
>> unicode codepoint:
>> <snip>
> Here's from my last post:
> ====================================
> Now I edit it back to its original problem form:
> def convertPath(path):
>     """
>     Given a path with backslashes, return that path with forward slashes.
>     By Steven D'Aprano  07/31/2011 on Tutor list
>     >>>  path = r'C:\Users\Dick\Desktop\Documents\Notes\College Notes.rtf'
>     >>>  convertPath(path)
>     'C:/Users/Dick/Desktop/Documents/Notes/College Notes.rtf'
>     """<snip>

> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Python32\lib\site-packages\mycalc2.py", line 10
>     """
> SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes
> in position 144-146: truncated \UXXXXXXXX escape
> Using HxD, I find that the bytes in 144-146 are 20, 54, 75 or  the
> <space>, 'T', 'u' of  " Tutor" .  A screen shot of HxD with this
> version of mycalc2.py open in it is at
> <http://www.rcblue.com/images/HxD.jpg>. You can see that I believe the
> offset integers are base-10 ints. I do hope that's correct, or I've
> done a lot of work for naught.
> ====================================
> So have I not used HxD correctly (my first time to use a hex reader)?
> If I have used it correctly, why do the reported problem offsets of
> 144-146 correspond to such innocuous things as 'T', 'u' and <space>,
> and which come BEFORE the problems you and Steven point out?
This one is my fault, for pointing you to the hex viewer.  Peter is 
correct.  The offset, though, is relative to the beginning of the 
triple-quoted string, not the beginning of the file, so bytes 144-146 
of the file as shown in HxD are unrelated characters.

The problem has nothing to do with the encoding of the file itself; it 
comes from the backslashes inside the triple-quoted string.  Since you 
have a \U, the parser expects exactly 8 hex digits to follow it.  What 
threw me was that this particular symptom is specific to Python 3.x, 
which I don't normally use.
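
To see that the reported position counts from the start of the string rather than from the start of the file, here is a small sketch that feeds some bytes directly to the 'unicodeescape' codec named in the error message. The sample text and the variable names are made up for illustration, and decoding by hand like this is only an approximation of what the parser does with a string literal.

```python
# Hypothetical sample: a Windows-style path embedded in ordinary text.
# In this bytes literal each "\\" stands for ONE backslash byte.
data = b"see C:\\Users\\Dick"

try:
    data.decode("unicode_escape")
except UnicodeDecodeError as exc:
    # exc.start is an index into `data`, i.e. relative to the string
    # being decoded, not to whatever file the string came from.
    error_start = exc.start
    error_reason = exc.reason
    print("error starts at index", error_start)
    print("bytes at that index:", data[error_start:error_start + 2])
    print("reason:", error_reason)
```

The index printed is where the backslash of \U sits inside the string, which is why looking up the same number as a file offset lands on something unrelated.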

The following line would have the same problem:

mystring = "abc \Unexpected def"

since the eight characters after the \U (nexpecte) are not all hex 
digits.  You would instead want

mystring = r"abc \Unexpected def"
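
A quick way to check this without editing a module is to hand the one-line sources to compile().  The sketch below is only an illustration: the helper names are invented, and the doubled-backslash variant is an extra alternative that is not mentioned above.  The same r prefix can also be put in front of a triple-quoted docstring.

```python
# Each value is the text of a one-line module.  The OUTER strings are raw
# so the backslashes reach compile() exactly as they would sit in a .py file.
broken = r'mystring = "abc \Unexpected def"'
raw_fix = r'mystring = r"abc \Unexpected def"'
doubled_fix = r'mystring = "abc \\Unexpected def"'  # alternative: escape the backslash


def compile_error(source):
    """Return the SyntaxError text if source fails to compile, else None."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError as exc:
        return str(exc)
    return None


def run(source):
    """Execute source and return the value it binds to mystring."""
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["mystring"]


broken_error = compile_error(broken)
print(broken_error)                        # the 'unicodeescape' complaint
print(compile_error(raw_fix))              # None: compiles cleanly
print(run(raw_fix) == run(doubled_fix))    # both spell the same text
```

Both working versions produce a string containing a literal backslash followed by "Unexpected"; which one to use is mostly a matter of taste.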


