[New-bugs-announce] [issue44069] pathlib.Path.glob's generator is not a real generator

Elijah Rippeth report at bugs.python.org
Fri May 7 11:46:22 EDT 2021

New submission from Elijah Rippeth <elijah.rippeth at gmail.com>:

I have a directory with hundreds of thousands of text files. I wanted to explore one file, so I wrote the following code, expecting it to return almost instantly because of how generators work:

from pathlib import Path

base_dir = Path("/path/to/lotta/files/")
files = base_dir.glob("*.txt")            # returns immediately
first_file = next(files)                  # doesn't return immediately

To my surprise, this took a long time to finish, even though I expected `next` on a generator to be O(1).

A colleague pointed me to the following code: https://github.com/python/cpython/blob/adcd2205565f91c6719f4141ab4e1da6d7086126/Lib/pathlib.py#L431

I assume the call to `list` is there to "freeze" a potentially changing directory, since `scandir` relies on `os.stat`, but it causes a huge performance penalty and makes the generator return type a bit disingenuous. Whatever the reason, I think this is a bug in some sense.
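A minimal sketch of the difference described above, using plain `os.scandir` and `fnmatch` rather than pathlib's internal selector classes. `eager_glob` and `lazy_glob` are hypothetical helpers written only for illustration; they are not part of pathlib. The first copies every directory entry into a list before yielding anything (the pattern the report points to), while the second yields each match as soon as `scandir` produces it.

```python
import fnmatch
import os
from typing import Iterator


def eager_glob(directory: str, pattern: str) -> Iterator[str]:
    """Hypothetical helper: reads the whole listing before the first yield."""
    with os.scandir(directory) as it:
        entries = list(it)  # every entry is read here, on the first next()
    # The scandir handle is already closed at this point.
    for entry in entries:
        if fnmatch.fnmatch(entry.name, pattern):
            yield entry.path


def lazy_glob(directory: str, pattern: str) -> Iterator[str]:
    """Hypothetical helper: yields matches while scandir is still open."""
    with os.scandir(directory) as it:
        for entry in it:
            # Only the entries up to the next match are read per next() call.
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.path


# Usage (the directory path is a placeholder):
# first_file = next(lazy_glob("/path/to/lotta/files/", "*.txt"))
```

With the lazy version, the cost of the first `next()` is roughly proportional to the number of entries that precede the first match rather than to the size of the whole directory (the OS typically still returns entries in batches, so it is not strictly O(1)). One trade-off: the directory handle stays open for as long as the lazy generator is suspended, which can matter when many such generators are alive at once, for example during a recursive walk.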

components: IO
messages: 393190
nosy: Elijah Rippeth
priority: normal
severity: normal
status: open
title: pathlib.Path.glob's generator is not a real generator
type: performance
versions: Python 3.6

Python tracker <report at bugs.python.org>
