> Tony Nelson wrote:
> > I'd like to have a fast way to validate large amounts of string data as
> > being UTF-8.
>
> define "validate".

All of the data must conform to the UTF-8 encoding format.  I can live with
it if someone has produced data that is well-formed UTF-8 but isn't really
valid Unicode.

> > I don't see a fast way to do it in Python, though:
> >
> >     unicode(s, 'utf-8').encode('utf-8')
>
> If "validate" means "make sure the byte stream doesn't use invalid
> sequences", a plain
>
>     unicode(s, "utf-8")
>
> should be sufficient.
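For readers on Python 3, where `unicode()` no longer exists, the equivalent check is `bytes.decode("utf-8")`, which raises `UnicodeDecodeError` on any invalid sequence; re-encoding the result adds nothing to the check. The following is a minimal sketch under that assumption, and the helper name `is_valid_utf8` is hypothetical, chosen only for illustration:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        # Decoding alone performs the validation; there is no need to
        # encode the result back to bytes afterwards.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("h\u00e9llo".encode("utf-8")))  # True
print(is_valid_utf8(b"\xff\xfe"))                   # False: 0xFF never occurs in UTF-8
print(is_valid_utf8(b"\xc3"))                       # False: truncated two-byte sequence
```

Python 3's strict UTF-8 codec also rejects overlong forms such as `b"\xc0\xaf"` and encoded surrogates such as `b"\xed\xa0\x80"`, so those are reported as invalid too.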

You are correct.  I misunderstood what was happening in my code.  I 
apologise for wasting bandwidth and your time (and I wasted my own time 
as well).

Indeed, unicode(s, 'utf-8') catches the problem and is fast enough for my
purposes, adding about 25% to the time it takes to load a file.
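When the data comes from large files, one option is to validate chunk by chunk rather than decoding the whole file at once. The sketch below assumes Python 3 and uses the standard `codecs.getincrementaldecoder`, which buffers a multi-byte sequence split across chunk boundaries instead of flagging it as an error. The function name `validate_utf8_stream` and the 64 KiB default chunk size are illustrative assumptions, not anything from the thread:

```python
import codecs
import io
from typing import BinaryIO


def validate_utf8_stream(stream: BinaryIO, chunk_size: int = 1 << 16) -> bool:
    """Check a binary stream for valid UTF-8 one chunk at a time (hypothetical helper)."""
    # The incremental decoder carries an incomplete multi-byte sequence
    # over to the next chunk rather than raising on it immediately.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            decoder.decode(chunk)  # decoded text is discarded; only validity matters
        # final=True turns a sequence left incomplete at EOF into an error.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True


# A two-byte character split by a tiny chunk size is still accepted,
# while the same data truncated mid-character is rejected.
data = "caf\u00e9".encode("utf-8")
print(validate_utf8_stream(io.BytesIO(data), chunk_size=4))       # True
print(validate_utf8_stream(io.BytesIO(data[:-1]), chunk_size=4))  # False
```

For a real file, pass the object returned by `open(path, "rb")`. Peak memory then depends on the chunk size rather than on the file size.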
