[Tutor] string conversion
Steven D'Aprano
steve at pearwood.info
Tue Mar 1 00:44:37 CET 2011
Robert Clement wrote:
> Hi
>
> I have a wxpython control in which users are intended to enter control
> characters used to define binary string delimiters, eg. '\xBA\xBA' or
> '\t\r\n' .
Do you mean that your users enter *actual* control characters? What do
they type to enter (say) an ASCII null character into the field?
Or do you mean they type the string representation of the control
character, e.g. for ASCII null they press \ then 0 on their keyboard,
and the field shows \0 rather than one of those funny little square
boxes you get for missing characters in fonts?
I will assume you mean the second, because I can't imagine how to enter
control characters directly into a field (other than the simple ones
like newline and tab).
> The string returned by the control is a unicode version of the string
> entered by the user, eg. u'\\xBA\\xBA' or u'\\t\\r\\n' .
The data you are dealing with is binary, that is, made up of bytes
between 0 and 255. The field is Unicode, that is, made up of characters
with code points between 0 and some upper limit which is *much* higher
than 255. If wxPython has some way to restrict the field to ASCII, that
will probably save you a lot of grief; otherwise, you'll need to decide
what you want to do if the user types something like £ or © or other
non-ASCII Unicode characters.
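
One possible policy is simply to refuse anything outside ASCII before
doing any further processing. A rough sketch, written for Python 3
(where every str is Unicode). The name require_ascii is made up for
illustration and is not part of wxPython:

```python
def require_ascii(text):
    """Return text unchanged if it is pure ASCII, otherwise raise ValueError.

    Hypothetical helper for illustration only.
    """
    try:
        # Encoding to ASCII fails on the first character above code point 127.
        text.encode('ascii')
    except UnicodeEncodeError as err:
        bad = text[err.start:err.end]
        raise ValueError('non-ASCII character(s) in field: %r' % bad)
    return text
```

Whether to reject, strip or replace such characters is a design choice
that depends on your application; rejecting is just the easiest to show.
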
In any case, it seems that you are expecting strings with the
representation of control characters, rather than actual control characters.
> I would like to be able retrieve the original string containing the
> escaped control characters or hex values so that I can assign it to a
> variable to be used to split the binary string.
You have the original string -- the user typed <backslash> <code>, and
you are provided <backslash> <code>.
Remember that backslashes in Python are special, and so they are escaped
when displaying the string. Because \t is used for the display of tab,
it can't be used for the display of backslash-t. Instead the display of
backslash is backslash-backslash. But that's just the *display*, not the
string itself. If you type \t into your field and retrieve the string,
it will look like u'\\t', but if you call len() on it you will get 2,
not 3 or 6. If you print it with the print statement, it will print as \t,
with no string delimiters u' and ' and no escaped backslash.
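
The difference between the string and its display can be checked
directly. This is a small sketch in Python 3 syntax, where the repr has
no u prefix and so is one character shorter than in Python 2:

```python
s = '\\t'          # what the user typed: a backslash followed by the letter t

print(len(s))      # 2
print(s)           # \t
print(repr(s))     # '\\t'
print(list(s))     # ['\\', 't']
```
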
So you have the original string, exactly as typed by the user. I *think*
what you want is to convert it to *actual* control characters, so that a
literal backslash-t is converted to a tab character, etc.
>>> s = u'\\t'
>>> print len(s), s, repr(s)
2 \t u'\\t'
>>> t = s.decode('string_escape')
>>> print len(t), t, repr(t)
1 '\t'
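
The 'string_escape' codec only exists in Python 2. If you are on
Python 3, something along these lines should do the same job and give
you bytes you can split binary data with. This is a sketch, not the only
way to do it: delimiter_from_text is a made-up name, and I am assuming
the field should only ever contain ASCII text with escapes for byte
values 0 to 255:

```python
def delimiter_from_text(text):
    """Convert typed escape sequences in text into the bytes they name.

    Hypothetical helper for illustration only.
    """
    # Reject non-ASCII input up front (raises UnicodeEncodeError).
    raw = text.encode('ascii')
    # Interpret the backslash escapes; the result is a str whose code
    # points are the intended byte values.
    chars = raw.decode('unicode_escape')
    # Map code points 0-255 straight to bytes. An escape for a larger
    # code point raises UnicodeEncodeError here.
    return chars.encode('latin-1')


delim = delimiter_from_text('\\xBA\\xBA')
data = b'one\xba\xbatwo\xba\xbathree'
parts = data.split(delim)
print(parts)
```
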
Hope that helps.
--
Steven