From JoyceUlysses.txt -- words occurring exactly once
dieter.maurer at online.de
Tue Jun 4 12:13:47 EDT 2024
Edward Teach wrote at 2024-6-3 10:47 +0100:
> ...
>The Gutenburg Project publishes "plain text". That's another problem,
>because "plain text" means UTF-8....and that means unicode...and that
>means running some sort of unicode-to-ascii conversion in order to get
>something like "words". A couple of hours....a couple of hundred lines
>of C....problem solved!
Unicode supports the notion of a "word" even better than ASCII does.
For example, the `\w` (word character) regular expression wildcard
works for Unicode just as it does for ASCII (of course with
correspondingly larger sets of letters, digits, punctuation, etc.)
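
A minimal sketch of what this could look like for the task in the subject
line, using only the standard `re` and `collections` modules. The function
name `words_occurring_once` and the sample string are made up for
illustration, and lower-casing before counting is an assumption about what
should count as the "same" word. Note that `\w` also matches digits and the
underscore, so a real run over the book may want extra filtering.

```python
import re
from collections import Counter

def words_occurring_once(text):
    """Return the sorted words that occur exactly once in text."""
    # For str patterns, \w is Unicode-aware by default in Python 3,
    # so accented letters stay inside their word.
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    return sorted(w for w, n in counts.items() if n == 1)

# Hypothetical sample text; \u escapes keep the source file pure ASCII.
sample = ("Sinbad the Sailor and Tinbad the Tailor; "
          "na\u00efve caf\u00e9, na\u00efve Se\u00f1or")

print(words_occurring_once(sample))

# For comparison: with re.ASCII, \w stops at the first non-ASCII letter.
print(re.findall(r"\w+", "caf\u00e9"))            # whole word
print(re.findall(r"\w+", "caf\u00e9", re.ASCII))  # truncated to 'caf'
```

To try it on the actual file, something like
`words_occurring_once(open("JoyceUlysses.txt", encoding="utf-8").read())`
should do, assuming the file is UTF-8 as stated in the quoted message.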