UTF-16 encoding line breaks?

Isaac To kkto at csis.hku.hk
Wed Jun 11 12:48:23 EDT 2003


>>>>> "Richard" == Richard  <richardd at hmgcc.gov.uk> writes:

    Richard> Hi, I have a script which uses the .encode('UTF-16') function
    Richard> to encode a string into UTF-16. However I am having
    Richard> difficulties in putting line breaks into that string. \n is
    Richard> what I normally use but does not appear to become valid UTF-16
    Richard> once encoded. Can anyone tell me what escape command I can use
    Richard> in my string to ensure that I get line breaks in my UTF-16
    Richard> encoded output?

Why bother encoding to UTF-16 before adding the line-break characters?
UTF-16 is a strange enough format that it is quite clumsy to work with once
encoded.  E.g., you have to detect the endianness of the string after it is
encoded, since the implementation is free to use either byte order.  It can
also contain surrogate pairs, which means two 16-bit code units can
actually represent one UCS-4 character.  So basically, if you want to
operate on the string, don't encode it yet, or decode it first.
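
A minimal sketch of that advice, assuming a modern Python 3 interpreter (the
syntax differs from what was current when this was posted) and made-up
sample strings: put the "\n" in while the data is still text, encode once at
the end, and decode again before doing any further editing.

```python
import codecs

# Add the line breaks while the data is still ordinary text.
text = "first line\nsecond line"

# Encode once, at the end. The plain "utf-16" codec prepends a byte order
# mark (BOM) and uses the platform's native byte order.
encoded = text.encode("utf-16")
starts_with_bom = encoded[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# With an explicit byte order there is no BOM, and "\n" is two bytes
# whose order depends on the endianness chosen.
newline_le = "\n".encode("utf-16-le")  # b"\n\x00"
newline_be = "\n".encode("utf-16-be")  # b"\x00\n"

# A character outside the Basic Multilingual Plane needs a surrogate
# pair, i.e. two 16-bit code units for one character.
clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
clef_units = len(clef.encode("utf-16-le")) // 2

# To operate on encoded data again, decode it first.
round_trip = encoded.decode("utf-16")
```

If the encoded output seems to have lost its line breaks, a likely cause is a
tool reading the bytes with the wrong byte order or as an 8-bit encoding; the
"\n" itself encodes like any other character.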

Regards,
Isaac.



