[Python-ideas] Extend unicodedata with a name search

Steven D'Aprano steve at pearwood.info
Sat Oct 4 12:26:10 CEST 2014


On Sat, Oct 04, 2014 at 11:28:52AM +0200, Andrew Barnert wrote:
> On Oct 4, 2014, at 8:21, Steven D'Aprano <steve at pearwood.info> wrote:
> 
> > 1) fuzzy_lookup(glob):
> >    Return iterator which yields (ordinal, name) for
> >    each unicode code point which matches the glob.
> > 
> >    Names beginning with a substring:
> >        fuzzy_lookup("SPAM*")
> > 
> >    Names ending with a substring:
> >        fuzzy_lookup("*SPAM")
> > 
> >    Names containing a substring:
> >        fuzzy_lookup("SPAM")
> 
> Surely that last one is "*SPAM*", right? 

It's a fuzzy lookup, not an exact lookup, so by default it matches the 
substring anywhere in the string. (If you want an exact name lookup, 
unicodedata.lookup() already supports that.) You could write "*SPAM*" of 
course, but the stars would be redundant.
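A minimal sketch of these semantics, assuming a pure-Python fuzzy_lookup() (hypothetical; unicodedata has no such function) where '*' is the only metacharacter and a pattern with no '*' is treated as a substring search, so "SPAM" and "*SPAM*" behave the same:

```python
import re
import sys
import unicodedata
from typing import Iterator, Tuple


def fuzzy_lookup(glob: str) -> Iterator[Tuple[int, str]]:
    """Yield (ordinal, name) for every code point whose name matches glob.

    Hypothetical sketch of the proposed function: '*' is the only
    metacharacter and matches any run of characters (including none).
    Matching is case-sensitive; Unicode names are upper case.
    """
    # A pattern without any '*' is a plain substring search,
    # so "SPAM" behaves exactly like "*SPAM*".
    if "*" not in glob:
        glob = "*" + glob + "*"
    # Treat everything except '*' literally; each '*' becomes '.*'.
    pattern = re.compile(".*".join(re.escape(part) for part in glob.split("*")))
    for ordinal in range(sys.maxunicode + 1):
        # unicodedata.name() raises ValueError for unnamed code points
        # unless a default is supplied, so pass None and skip those.
        name = unicodedata.name(chr(ordinal), None)
        if name is not None and pattern.fullmatch(name):
            yield ordinal, name
```

For example, list(fuzzy_lookup("SNOWMAN")) includes (0x2603, 'SNOWMAN') as well as (0x26C4, 'SNOWMAN WITHOUT SNOW'). This sketch scans every code point on each call; a real implementation would presumably precompute the list of names once.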

I'm not trying to match the full range of shell globs, I'm just 
suggesting the minimum set of features I want. The only metacharacter I 
can see a practical use for is *. If you can think of uses for other 
metacharacters, feel free to propose them.


> Otherwise this is a weird sort of glob where * doesn't match anything 
> on this end, it instead constrains the opposite end or something. 

I don't quite understand what you are trying to say here.


> At any rate, why would you expect glob here? There's really nothing 
> else in Python that uses glob patterns except for glob/fnmatch, which 
> are explicitly matching equivalent OS services. It doesn't seem any 
> more natural to think of the database as a directory of files than as 
> a file of text or a database of key values, so why not a regex, or a 
> SQL like pattern, or something else?

Because globs are simpler than regexes, and easier to use. They support 
the most common (or at least what I think will be the most common) 
use-cases: matching something that contains, ends with or starts with a 
substring. (Globbing may be most well-known from shells, but there is 
nothing about glob syntax that is limited to matching file names. It's a 
string matching language, which the shell happens to use to match file 
names.)
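To illustrate that glob syntax is not tied to file names, the standard library's fnmatch.fnmatchcase() already applies shell-style patterns to arbitrary strings. The snippet below runs the three proposed query shapes over a few Unicode names. Note that fnmatch also understands "?" and "[seq]", which go beyond the "*"-only syntax proposed here.

```python
import fnmatch
import unicodedata

# Names of a few code points; none of these strings is a file name.
names = [unicodedata.name(ch) for ch in "a\u00e0\u03b1\u2603"]

# fnmatchcase() compares strings only: it is case-sensitive and, unlike
# fnmatch.fnmatch(), does not apply os.path.normcase().
starts_with_latin = [n for n in names if fnmatch.fnmatchcase(n, "LATIN*")]
ends_with_grave = [n for n in names if fnmatch.fnmatchcase(n, "*GRAVE")]
contains_small = [n for n in names if fnmatch.fnmatchcase(n, "*SMALL*")]

print(starts_with_latin)  # ['LATIN SMALL LETTER A', 'LATIN SMALL LETTER A WITH GRAVE']
print(ends_with_grave)    # ['LATIN SMALL LETTER A WITH GRAVE']
print(contains_small)     # ['LATIN SMALL LETTER A', 'LATIN SMALL LETTER A WITH GRAVE', 'GREEK SMALL LETTER ALPHA']
```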

I don't see a use for supporting the full range of regexes. As far as I 
am concerned, globbing is complicated enough for what I need, and full 
support for arbitrary regexes is YAGNI.


-- 
Steven

