[Python-ideas] More user-friendly version for string.translate()

Steven D'Aprano steve at pearwood.info
Mon Oct 24 22:37:05 EDT 2016


On Mon, Oct 24, 2016 at 07:39:16PM +0200, Mikhail V wrote:
> Hello all,
> 
> I would be happy to see a somewhat more general and user friendly
> version of string.translate function.
> It could work this way:
> string.newtranslate(file_with_table, Drop=True, Dec=True)

That's an interesting concept for "user friendly". Apart from functions 
that are actually designed to read files of a particular format, can 
you think of any built-in functions that take a file as an argument?

This is how you would use this "user friendly version of translate":

path = '/tmp/table'  # hope no other program is using it...
with open(path, 'w') as f:
    f.write('97    {65}\n')
    f.write('98    {66}\n')
    f.write('99    {67}\n')

with open(path, 'r') as f:
    new_string = old_string.newtranslate(f, False, True)


Compared to the existing solution:

new_string = old_string.translate(str.maketrans('abc', 'ABC'))
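For comparison, if a table really does have to live in a file, the existing pieces already compose: parse the file into a dict and hand that to str.maketrans. This is a minimal sketch that assumes the hypothetical `ordinal    {ordinal}` line format from the example above (not any standard format). `table_from_lines` is an illustrative helper, not part of any library, and io.StringIO stands in for a real file so nothing gets written to /tmp.

```python
import io


def table_from_lines(lines):
    """Hypothetical helper: parse 'ORD    {ORD}' lines into a maketrans-ready dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        src, dst = line.split()
        mapping[int(src)] = int(dst.strip('{}'))  # '{65}' -> 65
    return mapping


# An in-memory "file" holding the same three lines as the example above.
f = io.StringIO('97    {65}\n98    {66}\n99    {67}\n')
table = str.maketrans(table_from_lines(f))
assert table == str.maketrans('abc', 'ABC')  # same table as the one-liner
print('a cab'.translate(table))  # A CAB
```

The file handling stays in user code, and str.translate keeps its simple contract.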


Mikhail, I appreciate that you have many ideas and want to share them, 
but try to think about how those ideas would work. The Python standard 
library is full of really well-designed programming interfaces. You can 
learn a lot by thinking "what existing function is this like? how does 
that existing function work?".

str.translate and str.maketrans already exist. Look at how maketrans 
builds a translation table: it can take two strings of equal length, 
mapping each character in the first to the character at the same 
position in the second:

    str.maketrans('abc', 'ABC')

Or it can take a mapping (usually a dict) that maps characters or 
ordinal numbers either to a new string (not just a single character, 
but an arbitrary string) or to an ordinal number:

    str.maketrans({'a': 'A', 98: 66, 0x63: 0x43})

(or to None, to delete them). Note the flexibility: you don't need to 
declare ahead of time whether you are writing the ordinal values in 
decimal, hex, octal or binary. Any expression that evaluates to a 
string or an int within the legal range is valid.

That's a good programming interface.
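To make that concrete, here is the dict form exercising every option at once. It is runnable Python 3, and the particular letters and number bases are arbitrary choices for illustration:

```python
# Keys may be 1-character strings or ints; values may be ints,
# strings of any length, or None (meaning "delete this character").
table = str.maketrans({
    'a': 'A',         # character -> character
    98: 66,           # decimal ordinal -> decimal ordinal ('b' -> 'B')
    0x63: 0x43,       # hex ordinals work too ('c' -> 'C')
    0o144: 'DEE',     # octal key for 'd' -> a multi-character string
    0b1100101: None,  # binary key for 'e' -> None, so 'e' is deleted
})
print('abcde'.translate(table))  # ABCDEE
```

maketrans normalises the character keys to ordinals, so the resulting table is a plain dict keyed by ints.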

Could it be better? Perhaps. I've suggested that maybe translate could 
automatically call maketrans if given more than one argument. Maybe 
there's an easier way to just delete unwanted characters. Perhaps there 
could be a way to say "any character not in the translation table should 
be dropped". These are interesting questions.
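Some of this is already reachable with today's API. The following is a sketch, not a proposal, and `DropMissing` and `translate_with` are hypothetical names. The first relies on str.translate looking each ordinal up via `__getitem__`, so a dict subclass's `__missing__` hook can return None (delete) for anything unmapped. The three-argument form of str.maketrans covers plain deletion. `translate_with` only mimics the "call maketrans automatically" idea in user code.

```python
class DropMissing(dict):
    """Translation table that deletes every character it has no entry for.

    str.translate looks each ordinal up with __getitem__; for a dict subclass,
    a missing key falls through to __missing__, and a None result means "delete".
    """
    def __missing__(self, key):
        return None


def translate_with(s, *args):
    """Hypothetical helper: call str.maketrans automatically when given more than one argument."""
    table = args[0] if len(args) == 1 else str.maketrans(*args)
    return s.translate(table)


text = 'a cab, 42!'
keep_only = DropMissing(str.maketrans('abc', 'ABC'))
print(text.translate(keep_only))                    # ACAB
print(text.translate(str.maketrans('', '', ',!')))  # a cab 42
print(translate_with(text, 'abc', 'ABC', ',!'))     # A CAB 42
```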


> Further thoughts: for 8-bit strings this should be simple to implement
> I think.

I doubt that these new features will be added to bytes as well as 
strings. For 8-bit byte strings, it is easy enough to generate your own 
translation and deletion tables -- there are only 256 values to 
consider.
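For instance, here is a plain Python 3 sketch of building the byte tables by hand. The sample data and the "allowed" set are made up for illustration. It checks the hand-built table against bytes.maketrans and uses the existing `delete` argument of bytes.translate:

```python
# An identity table is just the 256 byte values in order.
identity = bytes(range(256))

# Hand-built translation table: upper-case a, b and c, leave everything else alone.
table = bytearray(identity)
for lower, upper in zip(b'abc', b'ABC'):
    table[lower] = upper
table = bytes(table)
assert table == bytes.maketrans(b'abc', b'ABC')  # same result as the built-in helper

data = b'a cab, a bad cab!'
print(data.translate(table))        # b'A CAB, A BAd CAB!'
print(data.translate(None, b',!'))  # None = no mapping, only delete: b'a cab a bad cab'

# "Drop anything not explicitly allowed": the deletion set is the complement.
allowed = set(b'abc ')
drop = bytes(b for b in range(256) if b not in allowed)
print(data.translate(table, drop))  # b'A CAB A BA CAB'
```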


> For 16-bit of course
> there is issue of memory usage for lookup tables, but the gurus could
> probably optimise it.

There are no 16-bit strings.

Unicode is a 21-bit code space: code points go up to U+10FFFF. Unicode 
text is usually encoded either as a fixed-width sequence of 4-byte code 
units (UTF-32) or as a variable-width sequence of 2-byte (UTF-16) or 
1-byte (UTF-8) code units. But it absolutely is not a 
"16-bit string".

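A quick way to see this from Python 3 itself is shown below. The sample characters are arbitrary, and the "-le" codec variants are used only so that no byte-order mark is added to the counts:

```python
import sys

# The largest code point is U+10FFFF, which needs 21 bits.
print(hex(sys.maxunicode), sys.maxunicode.bit_length())  # 0x10ffff 21

# Bytes needed for one character under each encoding.
for ch in ['$', 'π', '╔', '\U0001F600']:
    sizes = [len(ch.encode(enc)) for enc in ('utf-8', 'utf-16-le', 'utf-32-le')]
    print(f'U+{ord(ch):04X}', sizes)
# U+0024 [1, 2, 4]
# U+03C0 [2, 2, 4]
# U+2554 [3, 2, 4]
# U+1F600 [4, 4, 4]   <- UTF-16 needs a surrogate pair here

# A Python str counts code points, not 16-bit units.
assert len('\U0001F600') == 1
```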

[...]
> but as said I don't like very much the idea and would be OK for me to
> use numeric values only.

I think you are very possibly the only Python programmer in the world 
who thinks that writing decimal ordinal values is more user-friendly 
than writing the actual character itself. I know I would much rather 
see $, π or ╔ than 36, 960 or 9556.
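For what it's worth, the two spellings are fully interchangeable, so nothing is lost by writing the character. Here is a short Python 3 check. The replacement strings 'pi' and '+' are arbitrary examples, and the `\N{...}` named escapes are shown as an option for characters that are hard to type:

```python
# The characters and the ordinals above are the same values.
assert [ord(c) for c in '$π╔'] == [36, 960, 9556]

# If a character is hard to type, a named escape still beats a bare number.
assert '\N{GREEK SMALL LETTER PI}' == 'π'
assert '\N{BOX DRAWINGS DOUBLE DOWN AND RIGHT}' == '╔'

# maketrans converts character keys to ordinals, so both spellings build the same table.
by_char = str.maketrans({'π': 'pi', '╔': '+'})
by_ordinal = str.maketrans({960: 'pi', 9556: '+'})
assert by_char == by_ordinal
print('2π ╔═'.translate(by_char))  # 2pi +═
```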



-- 
Steve

