Concatenate a string as binary bytes

Benjamin Kaplan benjamin.kaplan at case.edu
Tue Dec 14 15:06:04 EST 2010


2010/12/14 Jaime Fernández <jjjaime at gmail.com>:
> Hi
> To build a binary packet (for SMPP protocol), we have to concatenate
> different types of data: integers, floats, strings.
> We are using struct.pack to generate the binary representation of each
> integer and float of the packet, and then they are concatenated with the +
> operator.
> However, for strings we directly concatenate the string with +, without
> using struct.
> Everything works with Python 2 except when string encoding is introduced.
> Whenever a non-ASCII char appears in the string, an exception is raised.
> In Python 3, it's not possible to do this trick because all the strings are
> Unicode.
> What would be the best approach to:
>  - Support non-ASCII chars (we just want to concatenate the binary
> representation of the string without any modification)
>  - Compatibility between Python 2 and Python 3.
> Thanks,
> Jaime

I don't think you quite understand how encodings and Unicode work. You
have two similar but distinct data types involved: a byte string (""
in Python 2.x, b"" in Python 3.x), which is a sequence of bytes, and a
Unicode string (u"" in Python 2.x, "" in Python 3.x), which is a
sequence of characters. Neither type of string has an encoding
associated with it; an encoding is just a function for converting
between these two data types. encode() turns characters into bytes,
and decode() turns bytes back into characters.
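To make the distinction concrete, here is a small Python 3 sketch (the
sample word is an arbitrary choice) showing that encode and decode are
just conversions between the two types, and that the bytes you get
depend on which encoding you pick:

```python
# A str is a sequence of characters; bytes is a sequence of raw bytes.
text = "caf\u00e9"                   # 4 characters: c, a, f, e-acute
data = text.encode("utf-8")          # encoding: str -> bytes

assert isinstance(data, bytes)
assert len(text) == 4                # counted in characters
assert len(data) == 5                # e-acute needs 2 bytes in UTF-8
assert data.decode("utf-8") == text  # decoding: bytes -> str

# Same characters, different encoding, different bytes.
assert text.encode("latin-1") == b"caf\xe9"
```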

You only get those non-ASCII character problems when you try
concatenating Unicode strings with byte strings. Python 2.x silently
converts one type to the other using ASCII as the encoding when you
don't specify the encoding yourself, which fails as soon as a
non-ASCII character or byte shows up; Python 3.x refuses to mix the
two types at all and raises a TypeError. To avoid those errors in both
Python 2.x and Python 3.x, use the Unicode string's encode method to
turn the characters into a sequence of bytes before you concatenate
them.
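Applied to the packet in the original question, that could look like
the sketch below. The field layout (a 4-byte unsigned integer, an
8-byte double, a 2-byte length prefix, then the text) and the
build_packet name are hypothetical, chosen only for illustration; they
are not the real SMPP PDU format. The point is simply that every piece
is bytes before it is joined with +:

```python
import struct

def build_packet(msg_id, ratio, text, encoding="utf-8"):
    """Hypothetical layout for illustration, not the SMPP wire format:
    big-endian uint32, float64, uint16 byte length, then encoded text.
    `text` should be a Unicode string (u"" in 2.x, "" in 3.x)."""
    payload = text.encode(encoding)           # characters -> bytes first
    header = struct.pack(">Id", msg_id, ratio)
    length = struct.pack(">H", len(payload))  # length in bytes, not chars
    return header + length + payload          # bytes + bytes + bytes
```

Calling build_packet(7, 0.5, u"caf\u00e9") returns 19 bytes, because
the 4-character text takes 5 bytes in UTF-8; that is why the length
prefix uses len(payload) rather than len(text). Leaving out the encode
step and writing struct.pack(">I", 7) + u"caf\u00e9" raises TypeError
on Python 3. The sketch sticks to b""/u"" literals, struct.pack and
encode, which should exist in both Python 2.6+ and 3.3+, but treat 2.x
compatibility as an untested assumption.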


