[Python-ideas] More user-friendly version for string.translate()
Steven D'Aprano
steve at pearwood.info
Mon Oct 24 22:37:05 EDT 2016
On Mon, Oct 24, 2016 at 07:39:16PM +0200, Mikhail V wrote:
> Hello all,
>
> I would be happy to see a somewhat more general and user friendly
> version of string.translate function.
> It could work this way:
> string.newtranslate(file_with_table, Drop=True, Dec=True)
That's an interesting concept for "user friendly". Apart from functions
that are actually designed to read files of a particular format, can
you think of any built-in functions that take a file as argument?
This is how you would use this "user friendly version of translate":
    path = '/tmp/table'  # hope no other program is using it...
    with open(path, 'w') as f:
        f.write('97 {65}\n')
        f.write('98 {66}\n')
        f.write('99 {67}\n')
    with open(path, 'r') as f:
        new_string = old_string.newtranslate(f, False, True)
Compared to the existing solution:
new_string = old_string.translate(str.maketrans('abc', 'ABC'))
Mikhail, I appreciate that you have many ideas and want to share them,
but try to think about how those ideas would work. The Python standard
library is full of really well-designed programming interfaces. You can
learn a lot by thinking "what existing function is this like? how does
that existing function work?".
str.translate and str.maketrans already exist. Look at how maketrans
builds a translation table: it can take two equal-length strings, and
map each character in the first to the character at the same position
in the second:
str.maketrans('abc', 'ABC')
Or it can take a mapping (usually a dict) that maps either characters or
ordinal numbers to a new string (not just a single character, but an
arbitrary string), to an ordinal number, or to None to delete them:
str.maketrans({'a': 'A', 98: 66, 0x63: 0x43})
Note the flexibility: you don't need to
specify ahead of time whether you are specifying the ordinal
value as a decimal, hex, octal or binary value. Any expression that
evaluates to a string or an int within the legal range is valid.
That's a good programming interface.
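As a rough illustration of the two calling styles described above, the
following sketch builds a table both ways and applies it. The sample
strings and variable names are made up for the example:

```python
# Two equal-length strings: position-by-position mapping.
table_from_strings = str.maketrans('abc', 'ABC')

# A mapping: keys may be characters or ordinals (decimal, hex, ...),
# values may be strings or ordinals.
table_from_mapping = str.maketrans({'a': 'A', 98: 66, 0x63: 0x43})

result_a = 'aabbcc xyz'.translate(table_from_strings)
result_b = 'aabbcc xyz'.translate(table_from_mapping)

# Values can also be arbitrary strings, or None to delete the character.
table_mixed = str.maketrans({'a': 'alpha', 'b': None})
result_c = 'abc'.translate(table_mixed)

print(result_a, result_b, result_c)
```

Both tables give the same translation; characters with no entry in the
table are left unchanged.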
Could it be better? Perhaps. I've suggested that maybe translate could
automatically call maketrans if given more than one argument. Maybe
there's an easier way to just delete unwanted characters. Perhaps there
could be a way to say "any character not in the translation table should
be dropped". These are interesting questions.
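For what it's worth, the "drop everything not in the table" behaviour
can already be approximated today, because translate looks characters
up with ordinary indexing. This is only a sketch of one possible
workaround, not a proposed API; the DropMissing name is invented here:

```python
class DropMissing(dict):
    """Translation table that deletes any character it has no entry for."""

    def __missing__(self, key):
        # translate() treats None as "delete this character".
        return None


table = DropMissing(str.maketrans('abc', 'ABC'))
result = 'a1b2c3 xyz'.translate(table)
print(result)
```

Only a, b and c survive (as A, B and C); everything else is deleted.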
> Further thoughts: for 8-bit strings this should be simple to implement
> I think.
I doubt that these new features will be added to bytes as well as
strings. For 8-bit byte strings, it is easy enough to generate your own
translation and deletion tables -- there are only 256 values to
consider.
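A minimal sketch of what that looks like with the existing bytes API,
using made-up sample data. bytes.translate accepts a 256-byte table
plus an optional set of bytes to delete:

```python
# Map a, b, c to A, B, C and delete the ASCII digits.
table = bytes.maketrans(b'abc', b'ABC')
result = b'a1b2c3 xyz'.translate(table, b'0123456789')

# "Keep only these bytes": build the deletion set from all 256 values.
keep = b'abc'
delete = bytes(b for b in range(256) if b not in keep)
kept = b'a1b2c3 xyz'.translate(None, delete)

print(result, kept)
```

Passing None as the table means "delete only, no remapping".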
> For 16-bit of course
> there is issue of memory usage for lookup tables, but the gurus could
> probably optimise it.
There are no 16-bit strings.
Unicode is a 21-bit encoding, usually encoded as either a fixed-width
sequence of 4-byte code units (UTF-32) or a variable-width sequence of
2-byte (UTF-16) or 1-byte (UTF-8) code units. But it absolutely is not a
"16-bit string".
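To make the code-unit sizes concrete, here is a small sketch that
encodes a few sample characters (chosen arbitrarily for the example)
and counts the bytes each encoding needs:

```python
import sys

# The highest code point needs 21 bits.
max_bits = sys.maxunicode.bit_length()

samples = ['$', 'π', '\U0001F600']  # ASCII, BMP, and outside the BMP
sizes = {
    ch: (
        len(ch.encode('utf-8')),      # 1-byte code units, variable width
        len(ch.encode('utf-16-le')),  # 2-byte code units, variable width
        len(ch.encode('utf-32-le')),  # 4-byte code units, fixed width
    )
    for ch in samples
}
print(max_bits, sizes)
```

The last sample takes two 2-byte code units in UTF-16, which is why
"one character is 16 bits" does not hold.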
[...]
> but as said I don't like very much the idea and would be OK for me to
> use numeric values only.
I think you are very possibly the only Python programmer in the world
who thinks that writing decimal ordinal values is more user-friendly
than writing the actual character itself. I know I would much rather
see $, π or ╔ than 36, 960 or 9556.
--
Steve