extracting from web pages but got disordered words sometimes
Paul McGuire
ptmcg at austin.rr.com
Sat Jan 27 14:26:46 EST 2007
After looking at the pyparsing results, I think I see the problem with
your original code. You are selecting only the characters after the
rightmost "-" character, but you really want to select everything to
the right of "- -". In some of the titles, the encoded Chinese
itself includes a "-" character, so you are chopping off the part of
the title that comes before it.
Try changing your code to:
    title = full_title.split("- -")[1]
I think then your original program will work.
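
Here is a small sketch of the difference. The title string is made up
for illustration, and split("-")[-1] is only my guess at what the
original code does to get the text after the rightmost "-"; neither is
taken from your program.

```python
# Hypothetical page title: site name, the "- -" separator, then a title
# that itself contains a "-" character.
full_title = "Some Site - - abc-def"

# Assumed original approach: keep only the text after the rightmost "-".
# This drops the part of the title before the embedded "-".
after_last_dash = full_title.split("-")[-1]

# Suggested approach: split on the "- -" separator and keep the right side.
title = full_title.split("- -")[1]

print(repr(after_last_dash))  # 'def'
print(repr(title))            # ' abc-def'
```

The second result keeps the leading space that followed the separator,
so you may want to call title.strip() before using it.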
-- Paul