[I18n-sig] Encoding auto-detection

M.-A. Lemburg mal@lemburg.com
Fri, 01 Jun 2001 23:10:50 +0200


"Martin v. Loewis" wrote:
> 
> > I also think that it would be worthwhile adding a similar
> > API to codecs.py which takes the magic ('<?xml' in this case)
> > as argument and then tries to determine whether the input
> > data is an ASCII superset, UTF-8 or UTF-16/32.
> 
> I don't think so. Doing the XML autodetection is not terribly
> complicated, and rarely needs to be done - you'd normally pass the
> byte stream to an XML parser, so you would not need to care about the
> encoding.

I was talking about a general-purpose encoding sniffer; the XML
case would only be a special case. The idea is to pass a magic
string to the API and then let it fiddle around with the data to
try to deduce the encoding. The magic string might also be a regular
expression which then has the encoding parameter as group 1, etc.
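
To make the idea a bit more concrete, here is a rough sketch of what
such a sniffer could look like. Nothing like this exists in codecs.py;
the name sniff_encoding, its signature, the separate pattern argument
and the XML_DECL regular expression are all made up for illustration.
It is written for current Python (bytes vs. str) and assumes the magic
string is pure ASCII.

```python
import codecs
import re

# Byte order marks, longest first so that the UTF-32 LE mark is not
# mistaken for the UTF-16 LE mark it starts with.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Encodings tried when there is no BOM. "ascii" stands for any ASCII
# superset (UTF-8, Latin-1, ...); the magic string alone cannot tell
# those apart.
_CANDIDATES = ["utf-32-le", "utf-32-be", "utf-16-le", "utf-16-be", "ascii"]


def sniff_encoding(data, magic, pattern=None):
    """Hypothetical helper: return (family, declared) for bytes `data`.

    `family` is the encoding family deduced from a BOM or from how the
    ASCII `magic` string is laid out in the first bytes, or None.
    `declared` is group 1 of the optional regular expression `pattern`
    matched against the decoded start of the data, or None.
    """
    family = None
    for bom, name in _BOMS:
        if data.startswith(bom):
            # Trust the BOM and strip it; the magic is not re-checked.
            family, data = name, data[len(bom):]
            break
    if family is None:
        for name in _CANDIDATES:
            if data.startswith(magic.encode(name)):
                family = name
                break
    if family is None:
        return None, None

    declared = None
    if pattern is not None:
        # Latin-1 decodes every byte, so it is safe for ASCII supersets.
        codec = "latin-1" if family == "ascii" else family
        head = data[:512].decode(codec, errors="replace")
        match = re.match(pattern, head)
        if match:
            declared = match.group(1)
    return family, declared


# Example pattern for the XML special case (an assumption, not a full
# implementation of the XML declaration grammar).
XML_DECL = r"""<\?xml[^>]*encoding=["']([A-Za-z0-9._-]+)["']"""
```

For instance, sniff_encoding(b'<?xml version="1.0" encoding="iso-8859-1"?>',
"<?xml", XML_DECL) gives ("ascii", "iso-8859-1"): the byte layout only
says "some ASCII superset", and the regular expression supplies the rest.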

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/