[Tutor] Recursion depth exceeded in python web crawler
Steven D'Aprano
steve at pearwood.info
Thu Jun 14 21:36:00 EDT 2018
On Thu, Jun 14, 2018 at 02:32:46PM -0400, Daniel Bosah wrote:
> I am trying to modify code from a web crawler to scrape for keywords from
> certain websites. However, I'm trying to run the web crawler before I
> modify it, and I'm running into issues.
>
> When I ran this code -
[snip enormous code-dump]
> The interpreter returned this error:
>
> *RuntimeError: maximum recursion depth exceeded while calling a Python
> object*
Since this is not your code, you should report it as a bug to the
maintainers of the web crawler software. They wrote it, and it sounds
like it is buggy.
Quoting the final error message on its own is typically useless, because
we have no context as to where it came from. We don't know and cannot
guess what object was called. Without that information, we're blind and
cannot do more than guess or offer the screamingly obvious advice "find
and fix the recursion error".
When an error does occur, Python provides you with a lot of useful
information about the context of the error: the traceback. As a general
rule, you should ALWAYS quote the entire traceback, starting from the
line beginning "Traceback (most recent call last):", not just the final
error message.
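If the error is swallowed somewhere before it reaches the terminal, the
standard library's traceback module can recover the full text. The
following is only a minimal sketch: f and run_and_capture are made-up
names for illustration, and the IndexError merely stands in for whatever
your real code raises.

```python
import traceback


def f(n):
    # Hypothetical stand-in for the failing code: indexing past the end
    # of the list raises IndexError.
    return [1, 2, 3][n]


def run_and_capture():
    """Run f and return the complete traceback as a string, or None on success."""
    try:
        f(10)
    except Exception:
        # format_exc() returns the same text the interpreter would print,
        # beginning with "Traceback (most recent call last):".
        return traceback.format_exc()
    return None


report = run_and_capture()
print(report)
```

The printed text is what should be pasted into a post, from the first
line to the last.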
Unfortunately, in the case of RecursionError, that information can be a
firehose of hundreds of identical lines, which is less useful than it
sounds. The most recent versions of Python condense the repeats and show
something similar to this:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in f
      [ previous line repeats 998 times ]
    RecursionError: maximum recursion depth exceeded
but in older versions you should manually cut out the enormous flood of
lines (sorry). If the lines are NOT identical, then don't delete them!
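For older versions, the manual trimming can be roughly automated. This
is a sketch under stated assumptions, not what the interpreter itself
does: collapse_traceback and its keep parameter are hypothetical names,
and it assumes each frame starts with a line whose first non-blank text
is File " and that any indented lines after it belong to that frame.
Only runs of identical consecutive frames are shortened, so frames that
differ are left alone.

```python
from itertools import groupby


def collapse_traceback(text, keep=3):
    """Shorten runs of identical consecutive frames in a traceback string."""
    chunks = []
    for line in text.splitlines():
        is_frame = line.lstrip().startswith('File "')
        is_continuation = bool(chunks) and line.startswith(" ") and not is_frame
        if is_continuation:
            # An indented source line belongs to the frame before it.
            chunks[-1] = chunks[-1] + (line,)
        else:
            chunks.append((line,))

    out = []
    for chunk, run in groupby(chunks):
        count = len(list(run))
        # Show at most `keep` copies of a repeated frame.
        for _ in range(min(count, keep)):
            out.extend(chunk)
        if count > keep:
            out.append("  [previous frame repeated %d more times]" % (count - keep))
    return "\n".join(out)


# Made-up sample resembling a Python 2 recursion traceback.
sample = "\n".join(
    ["Traceback (most recent call last):",
     '  File "<stdin>", line 1, in <module>']
    + ['  File "<stdin>", line 2, in f'] * 998
    + ["RuntimeError: maximum recursion depth exceeded"]
)
short = collapse_traceback(sample)
print(short)
```

It is still worth comparing the result against the original by eye
before posting, in case the assumption about the frame layout does not
hold for your output.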
The bottom line is, without some context, it is difficult for us to tell
where the bug is.
Another point: whatever you are using to post your messages (Gmail?) is
annoyingly adding asterisks to the start and end of each line. I see
your quoted code like this:
[direct quote]
*import threading*
*from Queue import Queue*
*from spider import Spider*
*from domain import get_domain_name*
*from general import file_to_set*
Notice the * at the start and end of each line? That makes the code
invalid Python. You should check how you are posting to the list, and if
you have "Rich Text" or some other formatting turned on, turn it off.
(My guess is that you posted the code in BOLD or perhaps some colour
other than black, and your email program "helpfully" added asterisks to
it to make it stand out.)
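For anyone who has to read code mangled this way, the asterisks can be
stripped mechanically. A minimal sketch, assuming each affected line
both starts and ends with a single asterisk; strip_asterisks is a
made-up helper name, and it will not repair text where the asterisks
wrap across two lines.

```python
def strip_asterisks(text):
    """Remove one leading and one trailing '*' from each line that has both."""
    cleaned = []
    for line in text.splitlines():
        if len(line) >= 2 and line.startswith("*") and line.endswith("*"):
            line = line[1:-1]
        cleaned.append(line)
    return "\n".join(cleaned)


mangled = "*import threading*\n*from Queue import Queue*\n*from spider import Spider*"
restored = strip_asterisks(mangled)
print(restored)
```

Fixing the mail client settings remains the better cure, since a helper
like this cannot tell a stray asterisk from one that belongs in the
code.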
Unfortunately, modern email programs, especially web-based ones like
Gmail and Outlook.com, make things *really difficult* for technical
forums like this. They are so intent on making email "pretty" (generally
pretty ugly) for regular users that they punish technically minded users
who need to focus on the text, not the presentation.
--
Steve