From JoyceUlysses.txt -- words occurring exactly once

Edward Teach hackbeard at linuxmail.org
Tue Jun 4 07:21:34 EDT 2024


On Mon, 03 Jun 2024 14:58:26 -0400 (EDT)
Grant Edwards <grant.b.edwards at gmail.com> wrote:

> On 2024-06-03, Edward Teach via Python-list <python-list at python.org>
> wrote:
> 
> > Project Gutenberg publishes "plain text".  That's another
> > problem, because "plain text" means UTF-8....and that means
> > unicode...and that means running some sort of unicode-to-ascii
> > conversion in order to get something like "words".  A couple of
> > hours....a couple of hundred lines of C....problem solved!  
> 
> I'm curious.  Why does it need to be converted from Unicode to ASCII?
> 
> When you read it into Python, it gets converted right back to
> Unicode...

Well... I'm using the file linux.words as the master list of "words",
and linux.words is strict ASCII.  So the Unicode text has to be reduced
to ASCII before its words can be matched against that list.
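For comparison, here is a rough Python sketch of the same job (words from
the text that occur exactly once, restricted to those found in an ASCII
word list).  It is not the C program described above.  It swaps in
Unicode NFKD normalization from the standard unicodedata module to fold
accented letters to plain ASCII, and it turns any other non-ASCII
character (curly quotes, dashes) into a space so it still separates
words.  The function names are hypothetical, and lowercasing both the
text and the word list is an assumption about how matching should work.

```python
import re
import unicodedata
from collections import Counter


def to_ascii(text):
    """Fold Unicode text to ASCII for word matching.

    NFKD splits an accented letter into a base letter plus combining
    marks; the marks are dropped, so "café" becomes "cafe".  Any other
    non-ASCII character (curly quotes, em dash U+2014, etc.) becomes a
    space, so it separates words instead of gluing them together.
    """
    out = []
    for ch in unicodedata.normalize("NFKD", text):
        if unicodedata.combining(ch):
            continue  # accent split off by NFKD: drop it
        out.append(ch if ch.isascii() else " ")
    return "".join(out)


def load_word_list(path):
    """Read an ASCII word list (one word per line) into a lowercase set."""
    # Per the post, linux.words is strict ASCII, so decode it as ASCII.
    with open(path, encoding="ascii") as f:
        return {line.strip().lower() for line in f if line.strip()}


def hapax_words(text, dictionary):
    """Return the dictionary words that occur exactly once in text, sorted."""
    words = re.findall(r"[a-z]+", to_ascii(text).lower())
    counts = Counter(w for w in words if w in dictionary)
    return sorted(w for w, n in counts.items() if n == 1)
```

Typical use would be to open JoyceUlysses.txt with encoding="utf-8"
(which, as Grant says, already yields a Unicode str), load the word list
with load_word_list() from wherever your system keeps it (for example
/usr/share/dict/linux.words, if your distribution ships that file), and
pass both to hapax_words().  The ASCII folding only exists so that the
text's words can be compared against the ASCII list.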


