[Tracker-discuss] Import work begun.
Erik Forsberg
forsberg at efod.se
Tue Nov 7 20:51:17 CET 2006
Barry Warsaw <barry at python.org> writes:
> On Nov 7, 2006, at 11:51 AM, Erik Forsberg wrote:
>
>> Now working on getting the importer to work again. It's been a while
>> since I ran it, and there's been a new release of the sourceforge
>> tools from effbot due to changes on the sourceforge web site.
>
> Let me know how that goes. I've been using r428 of /F's stuff and
> ran into several problems getting a clean export of the Mailman
> trackers. I contacted Fredrik about that but I think he and I have
> both been too busy.
Well, it didn't work for me either - failed to find the description as
well as the comments. Here's a patch to fix that:
--snip--
Index: extract.py
===================================================================
--- extract.py (revision 428)
+++ extract.py (working copy)
@@ -95,13 +95,13 @@
 table = elem.find("table")
 # locate the description
-for tr in table:
+for tr in table[1:]:
     if len(tr) == 1 and tr[0].get("colspan") == "2":
         # map <br> to newlines
         for br in tr.findall(".//br"):
             br.text = chr(0) # temporarily use NULL as line terminator
-            if br.tail and br.tail.startswith("\n"):
-                br.tail = br.tail[1:] # trip extra newlines
+            if br.tail and br.tail.startswith("\r\n"):
+                br.tail = br.tail[2:] # trip extra newlines
         text = gettext(tr)
         if text.startswith("\n\n\t\t\t"):
             text = text[5:]
@@ -128,7 +128,7 @@
 elif td and td[0].tag == "h3":
     key = gettext(td[0]).strip()
     if key == "Followups:":
-        for i, e in enumerate(td.findall("table/tr/td")):
+        for i, e in enumerate(td.findall("p/table/tr/td")):
             if i:
                 data = getcomment(e)
                 result.setdefault("comments", []).append(data)
--snap--
I'm not sure this solves all possible scraping trouble, but at least
it's a start.
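If the page layout keeps changing, a more tolerant variant of the two
fixes might look roughly like the sketch below. It is only an
illustration, not part of the patch: it assumes the standard library's
xml.etree.ElementTree rather than the elementtree package, and the
helper names (strip_leading_newline, normalize_br_tails,
find_followup_cells) are made up for this example and do not exist in
extract.py.

```python
import xml.etree.ElementTree as ET

# Hypothetical helpers for illustration; none of these exist in extract.py.

# Try the newer layout (table wrapped in <p>) first, then the older one.
FOLLOWUP_PATHS = ("p/table/tr/td", "table/tr/td")


def strip_leading_newline(tail):
    """Drop one leading line break (CRLF or LF) from the text after a <br>."""
    if not tail:
        return tail
    for terminator in ("\r\n", "\n"):
        if tail.startswith(terminator):
            return tail[len(terminator):]
    return tail


def normalize_br_tails(elem):
    """Remove the line break that follows each <br> below elem, in place."""
    for br in elem.findall(".//br"):
        br.tail = strip_leading_newline(br.tail)


def find_followup_cells(td):
    """Return the followup cells for whichever known layout matches."""
    for path in FOLLOWUP_PATHS:
        cells = td.findall(path)
        if cells:
            return cells
    return []


# Build a cell by hand so the CRLF survives; an XML parser would
# normalize "\r\n" to "\n" before we ever saw it.
cell = ET.Element("td")
cell.text = "first"
ET.SubElement(cell, "br").tail = "\r\nsecond"
ET.SubElement(cell, "br").tail = "\nthird"
normalize_br_tails(cell)
tails = [br.tail for br in cell.findall("br")]

new_layout = ET.fromstring(
    "<td><h3>Followups:</h3><p><table>"
    "<tr><td>head</td></tr><tr><td>c1</td></tr>"
    "</table></p></td>"
)
old_layout = ET.fromstring(
    "<td><h3>Followups:</h3><table>"
    "<tr><td>head</td></tr><tr><td>c1</td></tr>"
    "</table></td>"
)
```

Accepting both line endings and both table paths means the same code
handles pages fetched before and after the site change, at the cost of
hiding a layout change that matches neither pattern (the followup
lookup then simply returns an empty list).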
Cc to Fredrik to let him update his repo. And please spell my surname
correctly in the commit message this time ;-).
Cheers,
\EF
--
Erik Forsberg http://efod.se
GPG/PGP Key: 1024D/0BAC89D9