On Sat, Oct 04, 2014 at 11:28:52AM +0200, Andrew Barnert wrote:
> On Oct 4, 2014, at 8:21, Steven D'Aprano email@example.com wrote:
>> fuzzy_lookup(glob): Return an iterator which yields (ordinal, name) for each Unicode code point whose name matches the glob.
>> Names beginning with a substring: fuzzy_lookup("SPAM*")
>> Names ending with a substring: fuzzy_lookup("*SPAM")
>> Names containing a substring: fuzzy_lookup("SPAM")
> Surely that last one is "*SPAM*", right?
It's a fuzzy lookup, not an exact lookup, so by default it matches the substring anywhere in the string. (If you want an exact name lookup, unicodedata already supports that.) You could write "*SPAM*" of course, but the stars would be redundant.
I'm not trying to match the full range of shell globs, I'm just suggesting the minimum set of features I want. The only metacharacter I can see a practical use for is *. If you can think of uses for other metacharacters, feel free to propose them.
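To make the intended semantics concrete, here is a minimal sketch rather than a proposed implementation. It assumes that * is the only metacharacter, that a pattern containing no * matches as a substring anywhere in the name, and that matching is case-insensitive (the case folding is my assumption, not something settled in this thread). Only unicodedata.name() is an existing API; everything else is illustrative.

```python
import re
import sys
import unicodedata


def fuzzy_lookup(glob):
    """Yield (ordinal, name) for each code point whose name matches glob.

    Assumed semantics: '*' is the only metacharacter, and a pattern
    with no '*' at all is treated as a substring match ("*SPAM*").
    """
    # Unicode character names are upper case, so fold the pattern (assumption).
    pattern = glob.upper()
    if "*" not in pattern:
        pattern = "*" + pattern + "*"
    # Escape the literal pieces and let each '*' match any run of characters.
    regex = re.compile(".*".join(re.escape(part) for part in pattern.split("*")))
    for ordinal in range(sys.maxunicode + 1):
        # Unnamed code points (controls, surrogates, unassigned) yield "".
        name = unicodedata.name(chr(ordinal), "")
        if name and regex.fullmatch(name):
            yield ordinal, name


if __name__ == "__main__":
    for ordinal, name in fuzzy_lookup("SNOWMAN"):
        print(hex(ordinal), name)
```

A linear scan over every code point is slow but keeps the sketch short; a real implementation would presumably search the name table directly.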
> Otherwise this is a weird sort of glob where * doesn't match anything on this end, it instead constrains the opposite end or something.
I don't quite understand what you are trying to say here.
> At any rate, why would you expect glob here? There's really nothing else in Python that uses glob patterns except for glob/fnmatch, which are explicitly matching equivalent OS services. It doesn't seem any more natural to think of the database as a directory of files than as a file of text or a database of key values, so why not a regex, or a SQL-like pattern, or something else?
Because globs are simpler than regexes, and easier to use. They support the most common (or at least what I think will be the most common) use cases: matching something that contains, ends with, or starts with a substring. (Globbing may be best known from shells, but there is nothing about glob syntax that is limited to matching file names. It's a string-matching language, which the shell happens to use to match file names.)
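As a rough illustration of that point, the standard fnmatch module will happily match globs against arbitrary strings, not just file names. The sample names below are picked by hand for the example, and the regexes are simply the equivalents I would write for the same three cases.

```python
import fnmatch
import re

# Hand-picked sample of character names; not read from the real database.
names = [
    "GREEK SMALL LETTER OMEGA",
    "OHM SIGN",
    "LATIN SMALL LETTER A",
    "SNOWMAN",
]

# Globs: starts with, ends with, contains.
starts = [n for n in names if fnmatch.fnmatchcase(n, "GREEK*")]
ends = [n for n in names if fnmatch.fnmatchcase(n, "*SIGN")]
contains = [n for n in names if fnmatch.fnmatchcase(n, "*SMALL*")]

# The same three queries written as regexes.
starts_re = [n for n in names if re.search(r"^GREEK", n)]
ends_re = [n for n in names if re.search(r"SIGN$", n)]
contains_re = [n for n in names if re.search(r"SMALL", n)]

assert starts == starts_re
assert ends == ends_re
assert contains == contains_re
```

Note that fnmatch also treats ? and [...] as metacharacters, which is more than the single * being asked for here.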
I don't see a use for supporting the full range of regexes. As far as I am concerned, globbing is complicated enough for what I need, and full support for arbitrary regexes is YAGNI.