Searching for lots of similar strings (filenames) in sqlite3 database
Adam Funk
a24061 at ducksburg.com
Wed Jul 2 08:30:08 EDT 2014
On 2014-07-02, Chris Angelico wrote:
> On Wed, Jul 2, 2014 at 7:32 PM, Adam Funk <a24061 at ducksburg.com> wrote:
>> Well, I've changed it to the following anyway.
>>
>> subdir_glob = subdir + '/*'
>> cursor.execute('SELECT filename FROM files WHERE filename GLOB ?',
>> (subdir_glob,))
>> rows = cursor.fetchall()
>> known_files = {row[0] for row in rows}
>>
>> I see what you mean about paths containing '%', but I don't see why
>> you were concerned about underscores, though.
>
> With GLOB, presumably ? matches a single character and * matches any
> number of characters. With LIKE, _ matches a single character and %
> matches any number. So, for instance, WHERE filename LIKE
> '/foo/bar/spam_spam/%' will match '/foo/bar/spam2spam/1234', which may
> be a little surprising. It's not going to be a serious problem in most
> cases, as it'll also match '/foo/bar/spam_spam/1234', but the false
> positives will make one of those "Huhhhhh????" moments if you don't
> keep an eye on your magic characters.
>
> In your specific case, you happen to be safe, but as I look over the
> code, my paranoia kicks in and tells me to check :) It's just one of
> those things that flags itself to the mind - anything that might help
> catch bugs early is a good feature of the mind, in my opinion!
Oh, I'd just missed the '_' in the LIKE documentation. Doh!
--
Indentation is for enemy skulls, not code!
--- Klingon Programmer's Guide
More information about the Python-list
mailing list