Searching for lots of similar strings (filenames) in sqlite3 database

Chris Angelico rosuav at gmail.com
Wed Jul 2 12:03:12 CEST 2014


On Wed, Jul 2, 2014 at 7:32 PM, Adam Funk <a24061 at ducksburg.com> wrote:
> Well, I've changed it to the following anyway.
>
>         subdir_glob = subdir + '/*'
>         cursor.execute('SELECT filename FROM files WHERE filename GLOB ?',
>                        (subdir_glob,))
>         rows = cursor.fetchall()
>         known_files = {row[0] for row in rows}
>
> I see what you mean about paths containing '%', but I don't see why
> you were concerned about underscores, though.

With GLOB, presumably ? matches a single character and * matches any
number of characters. With LIKE, _ matches a single character and %
matches any number. So, for instance, WHERE filename LIKE
'/foo/bar/spam_spam/%' will match '/foo/bar/spam2spam/1234', which may
be a little surprising. It's not going to be a serious problem in most
cases, as it'll also match '/foo/bar/spam_spam/1234', but the false
positives will make one of those "Huhhhhh????" moments if you don't
keep an eye on your magic characters.

In your specific case, you happen to be safe, but as I look over the
code, my paranoia kicks in and tells me to check :) It's just one of
those things that flags itself to the mind - anything that might help
catch bugs early is a good feature of the mind, in my opinion!

ChrisA



More information about the Python-list mailing list