[Tutor] Problem getting data using beautifulsoup4 + python 3.4.1
Juan Christian
juan0christian at gmail.com
Thu Sep 18 22:25:35 CEST 2014
My code:
import requests
import bs4

FORUM_ID = "440"

response = requests.get('http://steamcommunity.com/app/{id}/tradingforum'.format(id = FORUM_ID))
soup = bs4.BeautifulSoup(response.text)
topics = [a.attrs.get('href') for a in soup.select('a.forum_topic_overlay')]

for topic in topics:
    r = requests.get(topic)
    s = bs4.BeautifulSoup(r.text)

    username = [a.get_text() for a in s.select('div.authorline')]
    profile = [a.attrs.get('href') for a in s.select('div.authorline')]

    print(s.select('div.authorline'))
    print("\nProfile value: " + str(profile))
    print("\n==================================\n")

Now, let's talk about the problem. The print(s.select('div.authorline'))
prints what I want, that is the part of the page that I need:
[<div class="authorline">
<a class="hoverunderline forum_op_author commentthread_author_globalmoderator"
data-miniprofile="40662867" href="http://steamcommunity.com/id/FrazerJC"
onclick="return Forum_AuthorMenu( this, event, false, '810938082603415962', '-1', 40662867, 'FrazerJC' );">
FrazerJC<span class="forum_author_action_pulldown"></span></a><img height="12"
src="http://steamcommunity-a.akamaihd.net/public/images/skin_1/comment_modindicator_moderator.png"
title="Moderator" width="12"> <span class="date">14 Oct, 2013 @ 3:31pm</span></img></div>]

But print("\nProfile value: " + str(profile)) isn't printing what I
want. It gives me "Profile value: [None]". It should give me the link
to the person's profile, in this example
"http://steamcommunity.com/id/FrazerJC". I was following the bs4 docs and
used [a.attrs.get('href') for a in s.select('div.authorline')] to get the
value of the href, but it isn't working. I must have done something wrong,
but where?
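
One possible explanation, sketched under assumptions rather than confirmed against the live page: the selector 'div.authorline' matches the <div> itself, and in the output above the href sits on the <a> nested inside that <div>, so attrs.get('href') on the <div> would return None. The snippet below uses a trimmed, hand-copied version of the printed markup (the html string is an assumption standing in for r.text) and passes "html.parser" explicitly so it does not depend on an optional parser. The descendant selector 'div.authorline a' is one way to reach the tag that owns the href.

```python
import bs4

# Trimmed copy of the markup printed above. Assumption: the real page
# has the same shape; this string stands in for r.text.
html = '''<div class="authorline">
<a class="hoverunderline forum_op_author" data-miniprofile="40662867"
   href="http://steamcommunity.com/id/FrazerJC">FrazerJC<span
   class="forum_author_action_pulldown"></span></a>
<span class="date">14 Oct, 2013 @ 3:31pm</span></div>'''

s = bs4.BeautifulSoup(html, "html.parser")

# The <div> carries no href attribute, so this reproduces the [None] result.
div_hrefs = [d.attrs.get('href') for d in s.select('div.authorline')]

# Selecting the <a> inside the <div> reaches the tag that owns the href.
profile = [a.attrs.get('href') for a in s.select('div.authorline a')]
username = [a.get_text(strip=True) for a in s.select('div.authorline a')]

print("Div hrefs: " + str(div_hrefs))
print("Profile value: " + str(profile))
print("Username: " + str(username))
```

If this is indeed the cause, swapping the selector in the original loop for 'div.authorline a' would be the corresponding change there; taking the username from the same <a> also avoids picking up the date text that get_text() on the whole <div> would include.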