"Newbie" questions - "unique" sorting ?
John Fitzsimons
xpm4senn001 at sneakemail.com
Wed Jun 25 08:44:50 EDT 2003
On Mon, 23 Jun 2003 20:35:59 -0700, "Cousin Stanley"
<CousinStanley at hotmail.com> wrote:
Hi Cousin Stanley,
>{ 1. Good News | 2. Bad News | 3. Good News } ....
> 1. Good News ....
> The last version of word_list.py that I uploaded
> works as expected with your input file producing
> an indexed word list with no duplicates ...
< snip >
> That's 6.56 HOURS and unacceptable performance !!!!
I agree. :-) Very clever of you to have worked out how long it would
take. I hope you didn't wait over 6 hours to find out!
> word_list.py works quickly on smaller files,
> but as coded, is an absolute dog for indexing
> larger files ....
Good. I was hoping it wasn't something that I had done wrong. :-)
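
As an aside, one common cause of this kind of slowdown is testing each new word against a growing list, which costs more and more as the file gets larger. That is only an assumption here, since word_list.py itself is not shown in this message. A minimal sketch of the usual alternative, with a hypothetical function name and written for a current Python, collects the words in a set and sorts once at the end:

```python
def unique_sorted(words):
    """Return the distinct words in sorted order.

    Hypothetical helper, not taken from word_list.py. Building a set
    removes duplicates in roughly linear time; the single sort at the
    end is then done on the unique words only.
    """
    return sorted(set(words))


if __name__ == "__main__":
    sample = ["pear", "apple", "pear", "fig", "apple"]
    for word in unique_sorted(sample):
        print(word)
```

Whether this would actually fix the 6.56 hour case depends on where word_list.py spends its time, which cannot be told from the thread.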
> 3. Good News ....
> Since I FINALLY figured out that you're mostly interested
> in just the URLs and not a general word list,
> I coded a pre-process script to extract just the URLs
> from the original input file ....
> python url_list.py JF_In.txt JF_URLs.txt
Unless I missed something, it picks up lines starting with ftp and http,
but not lines that start with www. Is that correct? Or did I give you a
file with no lines starting with www?
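
For illustration only, here is a rough sketch of a line filter that also keeps the www lines. It is not the code from url_list.py, which is not shown here; the function name and the prefix list are assumptions, and it is written for a current Python. It keeps any line whose first characters are ftp, http or www, ignoring case, and drops duplicates:

```python
def extract_urls(lines, prefixes=("ftp", "http", "www")):
    """Return sorted, de-duplicated lines that begin with one of the prefixes.

    Hypothetical helper. The prefix test is deliberately crude: it only
    looks at how the line starts, so it does not validate the URL.
    """
    wanted = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    found = set()
    for line in lines:
        candidate = line.strip()
        if candidate.lower().startswith(wanted):
            found.add(candidate)
    return sorted(found)


if __name__ == "__main__":
    sample = [
        "http://a.example/\n",
        "www.b.example\n",
        "some ordinary text\n",
        "ftp://c.example/\n",
        "www.b.example\n",
    ]
    for url in extract_urls(sample):
        print(url)
```

If the real script matches on "://" or something similar rather than on the start of the line, that alone would explain why bare www lines are skipped, but that is a guess.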
< snip >
>Let me know if this output looks closer to what you are after ....
Very, very good... and fast. If I can work out what happened to the
www lines, and fix it, then everything will be great. I then hope to
repeat this exercise using a different method to see whether the numbers
come out the same.
Thank you for such excellent programming. :-)
Regards, John.