Jython: Upper-ASCII characters '\351' from chr(233)
Maurice Bauhahn
bauhahnm at clara.net
Tue Apr 24 20:23:07 EDT 2001
Thank you very much for your persistent help.
I was able to get the 8th-bit characters to act as keys, with a somewhat
complex construction: chr(int(linesplit[0])). linesplit[0] holds each
character code as a decimal number in text form.
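# Assumed imports (not shown in the original post): split looks like
# re.split, since ';|:' is a regular expression, and strip like string.strip.
from re import split
from string import strip

linesplit = split('\t', encodingline)
if len(linesplit) > 5:
    try:
        templist = linesplit[2:4]
        templist.append(split(';|:', linesplit[4]))
        templist.append(strip(linesplit[5]))
        # The key is the one-character string for the decimal code in field 0.
        encodedict[chr(int(linesplit[0]))] = templist
        print templist
    except ValueError:
        logerror('My error', linesplit[0])
else:
    logerror('Not >5 fields long', linesplit)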
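For readers on a current interpreter, here is a minimal self-contained sketch of the same idea in Python 3, where chr() covers the role that unichr() had in Python 2. The function name parse_encoding_line, the sample line, and its field values are hypothetical and only illustrate the shape of the data described above. They are not taken from the real encoding file.

```python
import re


def parse_encoding_line(line):
    """Return (key, entry) for one tab-separated line, or None if malformed."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= 5:
        return None
    try:
        # Field 0 is a decimal code in text form, e.g. "233".
        key = chr(int(fields[0]))
    except ValueError:
        return None
    entry = fields[2:4]
    entry.append(re.split(";|:", fields[4]))  # split on ';' or ':'
    entry.append(fields[5].strip())
    return key, entry


# Hypothetical sample line; the field contents are placeholders.
sample = "233\tx\ty\tz\ta;b:c\t w \n"

table = {}
parsed = parse_encoding_line(sample)
if parsed is not None:
    key, entry = parsed
    table[key] = entry

# The high-bit character works as a dictionary key.
print(table[chr(233)])
```

Returning None for a malformed line stands in for the logerror calls above, so the sketch needs no logging helper.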
D-Man wrote:
> On Fri, Apr 20, 2001 at 09:45:33PM +0100, Maurice Bauhahn wrote:
> | Thank you for the suggestion, D-Man.
> |
> | However, I doubt that this is a problem with the display, because I
> | can see all these unusual characters when I print a line of text to
> | the screen. The problem becomes obvious when I try one of those
> | upper ASCII characters as a key of the dictionary...it does not
> | work. My hope is to compare each character from a text file...and
>
> How do you know it doesn't work? I have heard that all strings in
> Jython are Unicode because all Java strings are Unicode (or something
> like that).
>
> Say...I just tried it again, using Jython 2.0 and CPython 2.1. If I
> type
>
> print chr( 233 )
>
> I get an accented e in CPython and something else from Jython, but not
> the '\351' from before. Actually in CPython I get '\xe9' if I just
> call chr. It might be a difference between str() and repr().
>
> If you can enter the character into your file, putting a 'u' in front
> of the string specifies it as unicode. Ex :
>
> print u'é'
>
> Say, what if you use the 'unichr' function? There might be a
> difference between chr and unichr (in CPython there is).
>
> Here is a snippet, CPython first, then Jython :
>
> >>> unichr( 8218 )
> u'\u201a'
> >>> print unichr( 8218 )
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> >>> ord( 'é' )
> 8218
> >>> unichr( 233 )
> '\351'
> >>> unichr( 8218 )
> u'\u201A'
> >>> print unichr( 8218 )
> é
> >>> print chr( 8218 )
> é
>
> | use the dictionary to assist in translation of those characters to
> | Unicode (the Cambodian script...so standard Java code converters are
> | not useful).
> |
> | Maybe I will have to call a Java function to accomplish my desired
> | task, right?
>
> Maybe. I really don't have much experience with using Unicode or
> locale specific stuff.
>
> I hope my results give you some thoughts on how to solve your problem.
> -D
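A short sketch in Python 3 may help in reading the quoted session. It shows that '\351' and '\xe9' are the same character written with an octal and a hexadecimal escape. It also shows one possible reason why ord('é') could report 8218. This is only a guess: it assumes the console sent é as a code page 437 byte and that the byte was then decoded as Windows-1252. Nothing in the thread confirms which encodings were in play.

```python
# '\351' is an octal escape and '\xe9' a hexadecimal one; both mean code 233.
same = "\351" == "\xe9" == chr(233)
print(same, oct(233), hex(233))

# Assumption: the console encodes e-acute with code page 437 ...
dos_byte = chr(233).encode("cp437")
# ... and the reader decodes that byte as Windows-1252.
misread = dos_byte.decode("cp1252")
print(dos_byte, ord(misread), hex(ord(misread)))
```

Under these assumptions the misread character is U+201A, which is decimal 8218, the value reported in the session.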
--
Maurice Bauhahn
2 Meadow Way
Dorney Reach
MAIDENHEAD
SL6 0DS
United Kingdom
Home Tel: +44(0)1628 626068
Work Tel: +44(0)1932 878404
Home Email: bauhahnm at clara.net
Work Email: mbauhahn at brio.com