[I18n-sig] Autoguessing charset for Unicode strings?

Machin, John JMachin@Colonial.com.au
Wed, 20 Jun 2001 09:50:23 +1000


Maybe not so expensive, depending on (a) what's implemented in C and what's
in Python, (b) function call overhead, and (c) what proportion of the text
needs which character set ...

Loop once through your Unicode string:
	if any char has an ordinal > 255, then use UTF-8
	elif any char has an ordinal > 127, then use iso-8859-1
	else use ASCII
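
A minimal sketch of that single pass, assuming Python 3 `str` input. The
function name `guess_charset` and the returned codec names are my own choices
for illustration, not anything from an existing library. It stops early as
soon as a character above 255 turns up, since nothing later can change the
answer.

```python
def guess_charset(text):
    """Return the narrowest of ascii, iso-8859-1, utf-8 that can hold text.

    Hypothetical helper: scans the string once and tracks the largest
    code point seen so far.
    """
    highest = 0
    for ch in text:
        code = ord(ch)
        if code > 255:
            # Outside Latin-1, so only UTF-8 (of the three) will do.
            return "utf-8"
        if code > highest:
            highest = code
    if highest > 127:
        # Code points 128..255 map one-to-one onto iso-8859-1 bytes.
        return "iso-8859-1"
    return "ascii"


if __name__ == "__main__":
    for sample in ("hello", "caf\u00e9", "\u20ac 5"):
        print(repr(sample), "->", guess_charset(sample))
```

Whether this beats trial encoding depends on points (a) and (b): the loop
here runs in Python, while the codecs run in C.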

-----Original Message-----
From: Martin v. Loewis [mailto:martin@loewis.home.cs.tu-berlin.de]
Sent: Wednesday, 20 June 2001 9:27
To: barry@digicool.com
Cc: i18n-sig@python.org
Subject: Re: [I18n-sig] Autoguessing charset for Unicode strings?

[snip]

Now, many email readers will still choke these days when they see
UTF-8 (the Microsoft ones being positive exceptions here), but will
recognize Latin-1. So, another procedure might be

1. try to encode as ASCII
2. if that fails, try iso-8859-1
3. if that fails, use UTF-8

You'll see that this becomes more and more expensive.
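
The three steps above can be sketched roughly as follows, assuming Python 3
`str` input. The name `encode_with_fallback` and the default candidate list
are illustrative assumptions only. Each failed attempt scans the text up to
the first unencodable character, which is where the growing cost comes from.

```python
def encode_with_fallback(text, charsets=("ascii", "iso-8859-1", "utf-8")):
    """Try each charset in order; return (charset, encoded_bytes).

    Hypothetical helper: the candidate order mirrors the procedure
    described above (ASCII, then Latin-1, then UTF-8).
    """
    for charset in charsets:
        try:
            return charset, text.encode(charset)
        except UnicodeEncodeError:
            # This charset cannot represent the text; try the next one.
            continue
    raise ValueError("no candidate charset could encode the text")


if __name__ == "__main__":
    for sample in ("hello", "caf\u00e9", "\u20ac 5"):
        print(repr(sample), "->", encode_with_fallback(sample))
```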

[snip]

Regards,
Martin


