[python-win32] String weirdness on python 2.4 / windows
Steve Holden
steve at holdenweb.com
Thu Oct 20 09:00:50 CEST 2005
Kinsley Turner wrote:
>
>>>Similarly I have a string with the IBM-extended-ASCII degrees symbol
>>>(ascii 0xb0) that is read in from a network-connected field device.
>>>Somehow this ends up with an extended 'A' (with a single dot over it)
>>>prepended before it.
>
>
>>This question is inappropriate for this mailing list, which is for the
>>pywin32 extensions, which you don't appear to be using. You should send
>>the question to python-list.
>
>
> Really? Ok.
> The list index (http://mail.python.org/mailman/listinfo) says
> this list is for "Python on win32".
>
Which understandably misled you, but this list is specifically for
issues concerning the use of the win32all extension modules. General
Python help is best obtained from python-list at python.org, whether on
Windows or any other platform.
>
>>What does "spit it back down on a web request for /favicon.ico" mean???
>>What "it" comes back from "where" corrupted?
>
>
> When I say 'spit', you can read this as "transmits to the requesting
> web-browser using HTTP protocol over a BSD-style socket layering
> atop TCP/IP".
>
>
>>What does corrupted mean?
>
>
> No longer in its original form.
> In this case it looks like bytes have been removed & changed.
>
>
>>It would help greatly if you showed the actual code that causes the
>>alleged problem. Even better would be to cut that out and make it
>>into a small standalone script that demonstrates the problem. Also
>>stating the expected or preferred result would be a good idea.
>
>
> Yes I agree, but not all problems can be rendered down into a simple
> example easily. You see in this case the string is rendered into
> an image and served back to a web browser. I had hoped for a simple
> answer like "Add encoding header XYZ to your script". Alas.
>
>
>>AFAIK ASCII describes only characters with ordinals in range(0x80).
>
>
> Yes that's true, original ASCII was only 7 bit.
> Obviously I was referring to ISO-8859-1, commonly referred to as
> 'Extended ASCII' and IIRC popularised by IBM in the 80's (70's?).
> It's been the dominant latin character set for 20 years or so.
> But you knew this already.
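A quick way to see which character a given byte stands for is to decode it under each candidate code page. The Python 3 sketch below (the thread itself uses Python 2.4) does this for byte 0xB0; the list of codecs is just an illustrative choice. It shows that 0xB0 is not defined in 7-bit ASCII, is the degree sign in ISO-8859-1 and cp1252, and is a shading block in the original IBM PC code page (cp437), whose degree sign sits at 0xF8 instead.

```python
import unicodedata

raw = b"\xb0"  # the single byte the field device sends

def describe(codec: str) -> str:
    """Return the Unicode name of the character byte 0xB0 decodes to under `codec`."""
    try:
        return unicodedata.name(raw.decode(codec))
    except UnicodeDecodeError:
        return "<not defined in this codec>"

for codec in ("ascii", "latin-1", "cp1252", "cp437"):
    print(f"{codec:8} -> {describe(codec)}")

# In cp437 the degree sign is encoded as 0xF8, not 0xB0.
print("degree sign in cp437:", "\N{DEGREE SIGN}".encode("cp437"))
```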
>
>
>>Perhaps you mean that you have a string which contains '\xb0'.
>
>
> No.
> The string is encoded as a single symbol within the python string,
> not a trigraph (quadgraph?).
>
In Python, of course, '\xb0' is a single-byte string literal containing
only the character whose integer value is hex B0.
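The four characters of the escape are only source notation; the resulting value is one character. A minimal Python 3 check follows. In Python 3 a byte string is written b'...' and indexing it returns an integer; in the Python 2.4 used in this thread, '\xb0'[0] is instead a one-character str.

```python
raw = b"\xb0"          # four source characters, one byte of data
print(len(raw))        # 1
print(raw[0] == 0xB0)  # True: indexing bytes yields the integer value

text = "\xb0"          # the equivalent Python 3 text string
print(len(text), hex(ord(text)))   # 1 0xb0
print(text == "\N{DEGREE SIGN}")   # True
```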
>
>>"Somehow" -- unless you have pixies at the bottom of your garden, it
>>got mangled because *YOU* did something to it. If you can't tell us
>>what you did to it, we can't help you.
>
>
> The string in question is a test-string. Basically it contains a
> few words, then "1234567890 !C" where the '!' is the degrees-symbol
> as specified in Extended ASCII / ISO-8859-*. The text is then rendered
> into an image (using a supplied GNU TTF font) via the Python Imaging
> Library (PIL). Under UNIX this comes out as expected, with the degrees symbol
> rendered appropriately. When run as a Win32 service the rendering
> comes out with an accented 'A' in front of it.
>
>
>>"Extended A with a single dot over it"?? Are you sure that's a dot? I'd
>>like to know what language uses A with dot above *and* what the pixies
>>are using to render it on your screen. Could it possibly be a circumflex
>>accent?
>
>
> I thought it was a 'Å' (pasted in an 'A' with a dot above it,
> well, ok, maybe I should have said 'circle above it') Depending
> on your font & size sometimes the dot is connected to the top of
> the A, in others it hovers. Try it: print u'\xc5'. This is from an
> ISO-8859-1 encoding, YMMV. Sorry, I don't know what language
> uses this character either. It's being rendered by the PIL.
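When it is hard to tell a ring from a dot or a circumflex at small font sizes, asking the standard unicodedata module for the character names removes the guesswork. This Python 3 snippet compares the letter suggested here (U+00C5) with the circumflex form raised elsewhere in this thread (U+00C2).

```python
import unicodedata

# Candidates for the stray letter seen in front of the degree sign.
for ch in ("\xc5", "\xc2"):
    print(hex(ord(ch)), unicodedata.name(ch))
```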
>
>
>
>>Hint (1): u'\xb0'.encode('utf-8') produces '\xc2\xb0'. If that is
>>displayed by a gadget that's expecting iso-8859-1 (or cp1252) instead of
>>utf-8, '\xc2' will show as Latin capital A with circumflex.
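Hint (1) is easy to reproduce. This Python 3 sketch encodes the test string from this thread as UTF-8 and then decodes the result as cp1252, which is effectively what a consumer expecting a single-byte code page does; an extra U+00C2 appears in front of the degree sign.

```python
test_string = "1234567890\N{DEGREE SIGN}C"

utf8_bytes = test_string.encode("utf-8")
print(utf8_bytes)        # b'1234567890\xc2\xb0C': the degree sign became two bytes

# A consumer that assumes a single-byte code page sees two characters instead of one.
mangled = utf8_bytes.decode("cp1252")
print(ascii(mangled))    # '1234567890\xc2\xb0C': A-circumflex, then the degree sign

# Decoding with the codec that actually produced the bytes restores the text.
print(utf8_bytes.decode("utf-8") == test_string)  # True
```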
>
>
> Hmmm, I wonder if PIL is doing some kind of modification to the string
> before rendering it. This question might better be posed to the PIL list.
> I can't really control what the device sends back, but I think this
> is not the only Extended ASCII / ISO-8859-* character it delivers.
>
> <lightbulb>Ahhh... I think I've got it.</lightbulb>
>
>>Hint (2): print repr(allegedly_mangled_string)
>
>
> This gives me '1234567890\xb0C' from a UNIX python (2.4.2 #1);
> Win32 (python 2.4.2 #64) gives the same. So it mustn't be something
> to do with the python / string representation.
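Since the value is identical on both platforms before rendering, the next step is to look at it at each later stage. A minimal Python 3 sketch of that idea: the checkpoint helper is hypothetical, and decoding the device bytes as ISO-8859-1 is an assumption about the device. Printing an escaped view plus the length at each stage shows exactly where an extra byte first appears.

```python
def checkpoint(label, value):
    """Hypothetical helper: print a stage label, an escaped view of the value and its length."""
    print(f"{label}: {ascii(value)} (len {len(value)})")

from_device = b"1234567890\xb0C"        # the value reported above
as_text = from_device.decode("latin-1")  # ASSUMPTION: the device sends ISO-8859-1
as_utf8 = as_text.encode("utf-8")        # what an implicit UTF-8 step would produce

checkpoint("from device", from_device)   # b'1234567890\xb0C' (len 12)
checkpoint("as text", as_text)           # '1234567890\xb0C' (len 12)
checkpoint("utf-8 bytes", as_utf8)       # b'1234567890\xc2\xb0C' (len 13)
```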
>
>
>>"prepended before it" -- as opposed to "prepended after it"?
>
>
> That would be "'appended' after it", wouldn't it?
> (or are you just trying to pick a spoonerism)
>
>
>
>>Perhaps we should avoid Westpac ATMs until you sound the all-clear :-)
>
>
> Last time I looked these ran on OS/2[1], so I think you'll be safe.
>
>
>
> thanks for the hints,
> -kt
>
>
>
> [1] At least the old ones anyway.
>
>
>
Anyway, it looks like you are on the track of what appears to be a
character set or encoding issue. Good luck.
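One common way to rule out this class of problem is to decode the device's bytes once, with an explicitly named codec, and pass only text onward. The helper below is hypothetical, and its ISO-8859-1 default is an assumption about the device, not something established in this thread; substitute the device's real code page if it differs.

```python
def decode_device_text(raw: bytes, codec: str = "latin-1") -> str:
    """Hypothetical helper: turn raw device bytes into text using an explicit codec.

    ASSUMPTION: the device sends ISO-8859-1 unless told otherwise.
    """
    return raw.decode(codec)

label = decode_device_text(b"1234567890\xb0C")
print(label == "1234567890\N{DEGREE SIGN}C")  # True

# `label` is now a text (unicode) string; hand it to the renderer as-is
# (for example PIL's ImageDraw.Draw(image).text(...)) rather than
# encoding it to UTF-8 bytes first.
```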
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/