[Tutor] beautifulsoup - getting an error when NavigableString object is returned

Clay Wiedemann clay.wiedemann at gmail.com
Sun Mar 4 04:22:05 CET 2007

I wanted to extract the quotes from IMDB quote pages, just to start
learning Python. Quotes are not nested, so I grabbed the anchor links that
precede them. I thought I could walk down the siblings until I hit an HR
tag, meanwhile grabbing people and quotes via hits on <b> and <br>.
But once I tried to walk down from my hit on the anchor link and pull
the name, I found I kept getting a NavigableString instead of a tag, so
asking for the .name attribute gave an error.

Any idea why this might happen?

This is the relevant chunk of IMDB code:

<a name="qt0210620"></a>

<b><a href="/name/nm0629454/">Bill</a></b>:
You're supposed to wear the blue dress when I wear this.

<b><a href="/name/nm0707043/">Mary</a></b>:
I don't want to dress like twins anymore.

<b><a href="/name/nm0629454/">Bill</a></b>:
We're not twins. We're a trio.
<hr width="30%">


And this is what I wrote (and if there are other awful things about
this, I would be happy to know):

#!/usr/bin/env python

import urllib2
from BeautifulSoup import BeautifulSoup
import re

# stubs --------------------------

movietitle_stub = "Nashville"  # later: search and pull the first result (if movie?)
movieurl_stub = "http://imdb.com/title/tt0073440/"  # and get this

def soupifyPage(target):
	"""grab html from a page
	probably need real method of checking for failure, huh"""
	codeReq = urllib2.Request(target)
	response = urllib2.urlopen(codeReq)
	soupyhtml = BeautifulSoup(response)
	return soupyhtml

def pullQuote(curTag):
	# character is in bold
	quoteBlock = ""
	while True:
		print curTag.nextSibling.name
		if curTag.nextSibling.name == 'hr':
			# we are done
			return quoteBlock
		print "seeing" + curTag.nextSibling.name
		quoteBlock = quoteBlock + " - " + curTag.nextSibling.name
		curTag = curTag.nextSibling

quotepage = movieurl_stub + "quotes"
print "Getting this:" + quotepage
print "---------------"
quotebag = soupifyPage(quotepage)

# each quote is preceded by an anchor link whose name begins with qt : example <a
# they end with an HR tag
# they are not nested

quotations = quotebag.findAll(attrs = {'name' : re.compile("^qt")})

for q in quotations:
	print q.nextSibling.name  # attribute error: "'NavigableString' object has no attribute 'name'"
	print "next!"
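
One likely explanation for the error above, offered as a sketch rather than a confirmed diagnosis: the line breaks between the anchor and the following <b> are themselves a node in the parse tree, so the anchor's next sibling is text, not a tag. The example below reproduces that effect using only the standard library's xml.dom.minidom as a stand-in for BeautifulSoup. The FRAGMENT data and the next_element helper are hypothetical names introduced for illustration, the fragment is a simplified, well-formed version of the IMDB chunk, and the code is written for current Python rather than the Python 2 used in the script.

```python
from xml.dom import minidom

# Simplified, well-formed stand-in for the IMDB fragment (hypothetical data).
FRAGMENT = """<div><a name="qt0210620"></a>

<b><a href="/name/nm0629454/">Bill</a></b>:
You're supposed to wear the blue dress when I wear this.
<hr width="30%"/></div>"""


def next_element(node):
    """Return the next sibling that is an element, skipping text nodes."""
    sibling = node.nextSibling
    while sibling is not None and sibling.nodeType != sibling.ELEMENT_NODE:
        sibling = sibling.nextSibling
    return sibling


root = minidom.parseString(FRAGMENT).documentElement
anchor = root.firstChild           # the <a name="qt..."> marker
first = anchor.nextSibling         # whitespace between the tags, a text node
following = next_element(anchor)   # the <b> element after that whitespace

print(first.nodeType == first.TEXT_NODE, repr(first.data))
print(following.tagName)
```

In BeautifulSoup terms the same idea would presumably be to check whether each sibling is a tag before asking for .name, and to keep stepping past the text nodes in between.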


- - - - - - -

Clay S. Wiedemann
