Podcast catcher in Python

Falcolas garrickp at gmail.com
Fri Sep 11 11:30:31 EDT 2009


On Sep 11, 8:20 am, Chuck <galois... at gmail.com> wrote:
> Hi all,
>
> I would like to code a simple podcast catcher in Python merely as an
> exercise in internet programming.  I am a CS student and new to
> Python, but understand Java fairly well.  I understand how to connect
> to a server with urlopen, but then I don't understand how to download
> the mp3, or whatever, podcast?  Do I need to somehow parse the XML
> document?  I really don't know.  Any ideas?
>
> Thanks!
>
> Chuck

You will first have to download the RSS XML file, then parse it for the
URL of the audio file itself. Something like ElementTree (xml.etree in
the standard library) will help immensely with this part. You'll also
have to keep track of what you've already downloaded.
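
To make the parsing step concrete, here is a rough sketch rather than a
finished tool. It assumes a plain RSS 2.0 feed, where each episode is an
<item> and the audio URL sits in the url attribute of its <enclosure>
child. The function name enclosure_urls and the sample feed (with its
example.com addresses) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed for illustration; a real feed would be fetched with urlopen.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 2</title>
      <enclosure url="http://example.com/ep2.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Text-only announcement</title>
    </item>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3" type="audio/mpeg" length="456"/>
    </item>
  </channel>
</rss>
"""


def enclosure_urls(rss_text):
    """Return the enclosure URLs found in an RSS 2.0 document, in feed order."""
    root = ET.fromstring(rss_text)
    urls = []
    # Assumption: one <enclosure> per <item>, as in typical podcast feeds.
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            urls.append(enclosure.get("url"))
    return urls
```

Items without an enclosure (show notes, announcements) are simply
skipped, so the result is the list of files you would actually want to
fetch.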

I'd recommend taking a look at the RSS XML yourself, so you know what
it is you have to parse out, and where to find it. From there, it
should be fairly easy to come up with the proper query to pull it
automatically out of the XML.
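
If reading the raw XML by eye gets tedious, something along these lines
can print the shape of the document for you. This is only a sketch; the
helper name outline and its max_depth parameter are my own invention,
not part of any library.

```python
import xml.etree.ElementTree as ET


def outline(xml_text, max_depth=4):
    """Return one indented line per element: tag name plus its attribute names."""
    lines = []

    def walk(element, depth):
        if depth > max_depth:
            return
        attrs = " ".join(sorted(element.attrib))
        suffix = " [" + attrs + "]" if attrs else ""
        lines.append("  " * depth + element.tag + suffix)
        for child in element:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return lines
```

Printing "\n".join(outline(feed_text)) for a feed you have downloaded
should show where the items live and which attributes they carry.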

As a kindness to the provider, I would recommend a fairly lengthy
sleep between GETs, particularly if you want to scrape their back
catalog.
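
A possible shape for the download loop, combining the pause between
requests with the bookkeeping about what has already been fetched. Treat
it as a sketch under assumptions: it uses Python 3's urllib.request, the
names fetch_bytes and catch_up are hypothetical, the 30 second default
is an arbitrary choice, and the seen set would need to be saved to disk
between runs in a real script.

```python
import time
import urllib.request


def fetch_bytes(url):
    """Download one URL and return its body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def catch_up(urls, seen, fetch=fetch_bytes, delay=30.0, sleep=time.sleep):
    """Fetch every URL not already in `seen`, sleeping `delay` seconds between GETs.

    `seen` is a set that is updated in place. `fetch` and `sleep` are
    parameters only so they can be swapped out when experimenting.
    """
    downloaded = {}
    first = True
    for url in urls:
        if url in seen:
            continue  # already have this one, no request needed
        if not first:
            sleep(delay)  # be kind to the provider between requests
        first = False
        downloaded[url] = fetch(url)
        seen.add(url)
    return downloaded
```

The first request goes out immediately; the pause only happens between
consecutive downloads, so a run that finds nothing new does not sleep at
all.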

Unfortunately, I no longer have the script I created to do just such a
thing in the past, but the process is rather straightforward, once you
know where to look.

~G


