byte count unicode string

Wed Sep 20 04:42:41 EDT 2006

Duncan Booth <duncan.booth at invalid.invalid> writes:
> I guess you could invent something like inserting a string into a database 
> which has fixed size fields, silently truncates fields which are too long 
> and stores the strings internally in utf-8 but only accepts ucs-2 in its 
> interface. Pretty far fetched, but if it exists I suspect that an extra 
> utf-8 encoding here or there is the least of your problems.

More direct would be to add an option to the HTTP parser to return the
UTF-8 received from the browser as a byte array, still in UTF-8, instead
of decoding it only to have it re-encoded before insertion into the
database.  A lot of the time, the application doesn't need to look at
the string anyway.
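
Roughly what I have in mind, as a sketch only. parse_field and its
decode flag are made up for illustration and are not part of any
existing parser; the only real library call is
urllib.parse.unquote_to_bytes, and the sample value is invented.

```python
from urllib.parse import unquote_to_bytes


def parse_field(raw_value, decode=True):
    """Hypothetical helper: percent-decode one form-encoded value.

    With decode=False the UTF-8 bytes sent by the browser are returned
    as-is, so they can go to the database without a decode/encode
    round trip.
    """
    # In form encoding '+' stands for a space; the rest is %XX escapes.
    data = unquote_to_bytes(raw_value.replace("+", " "))
    return data.decode("utf-8") if decode else data


raw = "na%C3%AFve+caf%C3%A9"                # invented sample input
as_bytes = parse_field(raw, decode=False)   # bytes, still UTF-8
as_text = parse_field(raw)                  # decoded str

print(len(as_bytes))   # byte count, what a fixed-size field cares about
print(len(as_text))    # character count
```

The byte count comes straight from len() on the undecoded value, and
the application only pays for decoding when it actually wants to look
at the text.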