Validate string as UTF-8?
*firstname*nlsnews at georgea*lastname*.com
Sun Nov 6 21:47:39 CET 2005
In article <mailman.176.1131307306.18701.python-list at python.org>,
"Fredrik Lundh" <fredrik at pythonware.com> wrote:
> Tony Nelson wrote:
> > I'd like to have a fast way to validate large amounts of string data as
> > being UTF-8.
> define "validate".
By "validate" I mean checking that all the data conforms to the UTF-8
encoding format. I can live with it if someone has made data that
impersonates UTF-8 but isn't really Unicode.
> > I don't see a fast way to do it in Python, though:
> > unicode(s,'utf-8').encode('utf-8')
> if "validate" means "make sure the byte stream doesn't use invalid
> sequences", a plain
> unicode(s, "utf-8")
> should be sufficient.
You are correct. I misunderstood what was happening in my code. I
apologise for wasting bandwidth and your time (and I wasted my own time).
Indeed, unicode(s, 'utf-8') will catch the problem and is fast enough
for my purpose, adding about 25% to the time to load a file.
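
For anyone reading this on Python 3, where unicode() no longer exists,
the same check can be sketched as below. This is only a minimal
illustration, assuming the data is already a bytes object; the helper
name is_valid_utf8 is made up for this example and is not part of any
library.

```python
def is_valid_utf8(data):
    """Return True if the bytes object decodes cleanly as UTF-8."""
    try:
        # Strict error handling is the default, so any invalid
        # sequence raises UnicodeDecodeError instead of being replaced.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_utf8(b"plain ascii"))
    print(is_valid_utf8(b"\xff\xfe not utf-8"))
```

The decoded string is thrown away, so this costs one decoding pass
over the data and nothing more, which matches the approach suggested
above.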
TonyN.:' *firstname*nlsnews at georgea*lastname*.com