>> I personally deal with a database of hundreds of billions of 2 to 5 character ASCII strings. This has been a significant blocker to Python 3 adoption in my world.
>
> I agree -- it is a VERY common case for scientific data sets. But a one-byte-per-char encoding would handle it nicely, or UCS-4 if you want Unicode. The wasted space is not that big a deal with short strings...
Unless you have hundreds of billions of them.
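To put a rough number on that, here is a back-of-the-envelope sketch. The count of 100 billion strings and the fixed 5-character field width are assumptions picked for illustration (the low end of "hundreds of billions", the high end of "2 to 5 characters"), and `bytes_per_field` is a hypothetical helper, not a NumPy API. It only compares a one-byte-per-char encoding with a four-byte-per-char (UCS-4) one:

```python
# Rough storage comparison for fixed-width 5-character fields.
# N_STRINGS and WIDTH are illustrative assumptions, not measured values.
N_STRINGS = 100_000_000_000  # assumed: 100 billion strings
WIDTH = 5                    # assumed: every field padded to 5 characters


def bytes_per_field(encoding, width=WIDTH):
    """Bytes needed to store `width` ASCII characters in `encoding`."""
    return len(("a" * width).encode(encoding))


one_byte = bytes_per_field("latin-1")  # one byte per character
ucs4 = bytes_per_field("utf-32-le")    # four bytes per character, no BOM

extra_bytes = N_STRINGS * (ucs4 - one_byte)
print(one_byte, "vs", ucs4, "bytes per field")
print(extra_bytes / 1e12, "TB extra for UCS-4")  # 1.5 TB under these assumptions
```

Under those assumptions the per-string overhead is only 15 bytes, but it adds up to about 1.5 TB across the whole data set.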
>> BTW, for those new to the list or with a short memory, this topic has been discussed fairly extensively at least 3 times before. Hopefully the *fourth* time will be the charm!
>
> yes, let's hope so!
>
> The big difference now is that Julian seems to be committed to actually making it happen!
>
> Thanks Julian!
>
> Which brings up a good point -- if you need us to stop the damn bike-shedding so you can get it done -- say so.
>
> I have strong opinions, but would still rather see any of the ideas on the table implemented than nothing.
FWIW, I prefer nothing to just adding a special case for latin-1. Solve the HDF5 problem (i.e. fixed-length UTF-8 strings) or leave it be until someone else is willing to solve that problem. I don't think we're at the bikeshedding stage yet; we're still disagreeing about fundamental requirements.
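
For anyone wondering why fixed-length UTF-8 is the harder requirement: the field width there is counted in bytes, not characters, so how many characters fit depends on the text itself. A minimal sketch, assuming a 5-byte field (`FIELD_BYTES` and `utf8_len` are hypothetical names used only for illustration):

```python
# With UTF-8, a fixed byte budget holds a variable number of characters.
FIELD_BYTES = 5  # assumed fixed field width, in bytes


def utf8_len(s):
    """Number of bytes `s` occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


# ASCII, Latin-1 range, and CJK samples (escapes keep the source ASCII-only).
samples = ["abcde", "na\u00efve", "\u65e5\u672c\u8a9e"]
for s in samples:
    n = utf8_len(s)
    verdict = "fits" if n <= FIELD_BYTES else "too long"
    print(ascii(s), len(s), "chars,", n, "bytes:", verdict)
```

A latin-1-only dtype sidesteps this entirely because one character is always one byte, which is why it does not cover this case.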
--
Robert Kern