On Tue, Nov 30, 2010 at 1:29 PM, Antoine Pitrou email@example.com wrote: ..
I am not sure this belongs in the locale module, however. It seems to me that something like 'unicodealgo', for Unicode algorithms, would be more appropriate.
It could simply be in unicodedata if you split the implementation into a core C part and some Python bits.
Splitting unicodedata may not be a bad idea. The UCD contains many more pieces than unicodedata covers. Hardcoding them all into the unicodedata module is hard to justify, but some are quite useful. For example, PropertyValueAliases.txt helps those like myself who cannot remember what the Pd or Zl category names stand for. SpecialCasing.txt is required for proper casing, but is not currently included in Python. I would not want to change str.upper or str.title because of this, but providing the raw data to anyone who wants to implement proper case mappings may not be a bad idea. Blocks.txt is certainly useful for any language-dependent processing.
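To make the PropertyValueAliases.txt point concrete, here is a rough sketch of how the raw aliases could be combined with what unicodedata already provides. The SAMPLE text is a tiny hand-copied excerpt in the file's "property ; short ; long" layout, and parse_gc_aliases and describe_category are hypothetical helper names, not an existing or proposed API.

```python
import unicodedata

# A few General_Category lines in the layout used by PropertyValueAliases.txt
# (property ; short alias ; long alias). Excerpt for illustration only; a real
# implementation would read the full file shipped with the UCD.
SAMPLE = """\
# General_Category (gc)
gc ; Lu ; Uppercase_Letter
gc ; Pd ; Dash_Punctuation
gc ; Zl ; Line_Separator
gc ; Zs ; Space_Separator
"""


def parse_gc_aliases(text):
    """Map General_Category short names to long names (hypothetical helper)."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 3 and fields[0] == "gc":
            aliases[fields[1]] = fields[2]
    return aliases


GC_ALIASES = parse_gc_aliases(SAMPLE)


def describe_category(ch):
    """Return (short, long) category names; long is '?' if not in the table."""
    short = unicodedata.category(ch)
    return short, GC_ALIASES.get(short, "?")


print(describe_category("-"))
print(describe_category("\u2028"))
```

With something like this, the two-letter codes returned by unicodedata.category() can be shown alongside their long names without hardcoding the alias table into the C module.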
On the other hand, I think we should keep Unicode data and Unicode algorithms separate. And the latter may not even belong in the Python stdlib.