[Python-ideas] Change magic strings to enums
Steven D'Aprano
steve at pearwood.info
Wed Apr 25 06:21:48 EDT 2018
On Wed, Apr 25, 2018 at 10:06:56AM +0200, Jacco van Dorp wrote:
> Perhaps the string encode/decode would be a better case, tho. Is it
> latin 1 or latin-1 ? utf-8 or UTF-8 ?
py> 'abc'.encode('latin 1') == 'abc'.encode('LATIN-1')
True
py> 'abc'.encode('utf8') == 'abc'.encode('UTF 8') == 'abc'.encode('UtF_8')
True
Encoding names are normalised before being used.
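To see what a given spelling resolves to, codecs.lookup() returns a CodecInfo whose .name attribute is the canonical name. A minimal sketch (the helper name canonical_name and the lists of spellings are only for illustration):

```python
import codecs

def canonical_name(spelling: str) -> str:
    """Return the canonical codec name that Python resolves `spelling` to."""
    return codecs.lookup(spelling).name

# Illustrative spellings only; any mix of case, spaces, hyphens and
# underscores should be normalised to the same codec.
latin_spellings = ["latin 1", "LATIN-1", "latin_1", "Latin1"]
utf8_spellings = ["utf8", "UTF 8", "UtF_8", "utf-8"]

for s in latin_spellings + utf8_spellings:
    print(f"{s!r:12} -> {canonical_name(s)}")
```

Every spelling within each list should print the same canonical name.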
> They might be fast to look up if
> you know where to look (probably the top result of googling "python
> string encoding utf 8", and it's the second and first option
> respectively IIRC. But I shouldn't -have- to recall correctly), but
> it's still a lot faster if you can type "Encoding.U" and it gives you
> the option.
If you did this with Encodings.ISO you would get a couple of dozen
possibilities.
ISO-8859-1
ISO-8859-7
ISO-8859-14
ISO-8859-15
etc, just to pick a few at random. How do you know which one you want?
In general, there's not really much *practical* use-case for code
completion on encodings, aside from just exploratory mucking about in
the interactive interpreter.
There are too many codecs (multiple dozen), the names are too similar
and not self-explanatory, and they can have aliases. It would be like
doing code-completion on an object and getting a couple of dozen methods
looking like
method1245 method1246 method1247 method2390 method2395
Besides, aside from UTF-16, UTF-8 and ASCII, we shouldn't encourage
the use of most codecs except for legacy data. And when working with
legacy data, we really need to know ahead of time what the encoding
is, and declare it as a constant or an application option.
(Or, worst case, we've used chardet or another encoding guesser, and
stored the name of the encoding in a variable.)
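As a rough sketch of what I mean by declaring it up front: the encoding is named once, as a constant, and can be overridden by whatever option mechanism the application has. The names LEGACY_ENCODING and decode_legacy, and the choice of ISO-8859-7, are hypothetical.

```python
# Hypothetical: we know ahead of time that this legacy data is Greek
# text stored as ISO-8859-7, so the encoding is declared exactly once.
LEGACY_ENCODING = "iso-8859-7"

def decode_legacy(data: bytes, encoding: str = LEGACY_ENCODING) -> str:
    """Decode legacy bytes using the declared (or overridden) encoding."""
    return data.decode(encoding)

sample = "αβγ".encode(LEGACY_ENCODING)
print(decode_legacy(sample))
```

There is nothing to complete on at the call sites: they refer to the constant or the option, not to a codec name.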
I don't really see a big advantage aside from laziness for completing
on encodings. And while laziness is a virtue in programmers, that only
goes so far before it becomes silly. Having to type
import encodings
enc <tab> .Enc <tab> .u <tab> arrow arrow arrow arrow arrow arrow enter
(19 key presses, plus the import) to save from having to type
'utf8'
(six keypresses) is not what I would call efficient use of programmer
time and effort.
(Why so many arrows? Because you'll have to arrow past at least
utf16
utf16be
utf16le
utf32
utf32be
utf32le
utf7
before you get to utf8.)
But the biggest problem is that they aren't currently available for
introspection anywhere. You can register new codecs, but there's no API
for querying the list of currently registered codecs or their aliases.
I think that problem would need to be solved first, in which case code
completion will then be either easy, or irrelevant.
(I'd be perfectly satisfied with an API I could call from the
interactive interpreter.)
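In the meantime, the closest approximation I know of is to walk the modules in the stdlib encodings package and read the encodings.aliases.aliases dict. This is a workaround, not a real API: it only sees codecs shipped as modules in that package, and it knows nothing about codecs added with codecs.register(). The function names below are made up for the sketch.

```python
import codecs
import encodings
import pkgutil
from collections import defaultdict
from encodings.aliases import aliases

def stdlib_codec_names():
    """Canonical names of codecs found by walking the encodings package.

    Assumption: every stdlib codec lives in its own module there. Helper
    modules and platform-specific codecs that fail to load are skipped.
    """
    names = set()
    for info in pkgutil.iter_modules(encodings.__path__):
        try:
            names.add(codecs.lookup(info.name).name)
        except LookupError:
            pass  # not a codec module, or not available on this platform
    return sorted(names)

def aliases_by_module():
    """Group the known aliases under the codec module each one maps to."""
    grouped = defaultdict(list)
    for alias, module_name in aliases.items():
        grouped[module_name].append(alias)
    return {module: sorted(names) for module, names in grouped.items()}

print(len(stdlib_codec_names()), "codecs found")
print(aliases_by_module()["utf_8"])
```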
--
Steve