Why Python don't accept 03 as a number?
avigross at verizon.net
Mon Dec 10 21:11:03 EST 2018
READERS DIGEST CONDENSED QUESTION: How expensive is eval("123 + 456 == 975") versus other ways?
The discussion started by Jach has spread away from discussing how python deals with numbers starting with leading zeroes such as "03". I note there are many ID numbers like social security that have a leading zero so if converted to an int for storage reasons, ...
The TOPIC I want to discuss is the concerns about Jach wanting to use "eval()". Besides security concerns, I want to know if it is particularly expensive.
It was suggested that you might solve a range of problems like puzzles asking you to substitute unique digits for letters. Some of these puzzles are normally solved with logic a step at a time but the techniques used may require a specific order and may not be easy to program for a more general case so can it be solved using what might be termed brute force. I mean try EVERYTHING that might work, including some that might not.
A particular example was: SEND + MORE = MONEY.
Jach's suggestion was to take every possible combination of digits and make the substitutions for the letters in that puzzle: then do something like this:
>>> eval ("9567+1085==10652")
So, although better algorithms exist, no doubt, I considered what it would take to do this without an eval. I came up with the following as an outline.
Accept the puzzle as a string with any number of items to the right or left of an "=" as long as they only have whitespace and plus signs in addition to alphabetic characters. So "AE + DF = CE + AB + AA" would be fine. This model does not include doing subtraction or other things, just a simple model like that, for now.
You can easily remove whitespace, force everything to one case, split it into a left/right on "=" and then split those into lists on "+.
You now have two lists of alphabetic strings that ultimately must become lists of integers and then the sums can be compared.
To get the number of unique letters in the puzzle, N, you throw all the list items as individual characters into a set. Clearly, for this kind of puzzle, there cannot be more than ten unique letters or there can be no unique mapping for 0-9.
I would then "generate" all possible combinations of digits 0-9 of length N. There are an amazing number of ways to do that ranging from taking a range(10**N) and converting it to a string then a list of numeral characters then tossing out any that have any duplicates, to creating a recursive function that passes along what has been used so far to the next call. Since many of these problems only need ONE solution, the generator would only iterate as many times as needed and no giant lists of numbers or strings need be in memory at any time.
For each tuple returned by the generator, you can iterate on the set of unique letters and use string functions to substitute the digits, or perhaps do this all at once. You would do this to all the items in what I call the left list as well as all the items on the right list. These would not be "numeric" so using int() on each item would work EVEN with leading zeroes. Seems safe enough.
Finally, with no eval needed, you take the sum of the numbers in a list of the left and compare to the sum on the right and if equal, you present the solution in whatever way works. If no more solutions are needed, you quit. I might write this part as a generator too, that can be called once or to completion.
Yes, this is brute force. Using range(1,000,000) (no commas in real use) would be a million iterations when there are six unique characters in the puzzle and as many as 10 billion if all ten are in use. If you use nested loops like I showed a while ago (albeit to make arbitrary ones for many sizes could require multiple functions or use of an eval on one built by hand) you can cut down the number of iterations as the nested loops count down with each one doing one less than the one before it. Same goes for the recursive function call method as it passes along what numerals have already been used. There may already be a permutation function that does this efficiently in C.
The above description includes parts that may also be used in the eval situation. The main difference may be that my method uses int() and perhaps some string functions. So, does eval just divert the attention of the interpreter as may happen with importing a module, or does it invoke a second copy of the interpreter as may happen with some multitasking methods? I would guess the former. But it may also have to first compile the text into byte code. However, a full eval is like using a sledgehammer when a thumbtack might be enough. Then again, it is already sitting there so a little extra use might be cheap.
So a main reason for my variant might still be to avoid taking any chances on rogue code. Mind you, in the above I would remove everything except [a-zA-Z=+] including parentheses and periods so damage from a carefully crafted text should be limited to over-riding a global variable name and even that can be handled by supplying a minimal environment to the eval call.
The real cost that dominates here is not memory, albeit garbage collection may be busy as it generates lots of temporary small bits of data. It is how the number of iterations grows.
I have looked at a somewhat related issue of how you generate all possible SCRABBLE words (or BOGGLE or ...) given specific starting letters. That problem allows duplicate letters but it has to handle quite large words (even weird medical terms like Pneumonoultramicroscopicsilicovolcanoconiosis but SCRABBLE words cannot possible be larger than the seven letters you have added to whatever is on the board that it connects to and in any case, the board can only fit 19 letters.) One way to make all possible combinations is along the lines above with many needed changes as there can (in theory) be as many as 26 unique letters (in English) and you can have multiple copies of some letters. If you allow other languages, the problem grows to the point where brute force is not realistic. And, ideally, you winnow down your choices by checking each word against a large dictionary. Anyone know of one that has a decent selection and can be freely imported? I mean one word per line.
Of course, in this case, there is room for improvement such as by making a tree out of the dictionary and pruning any attempts if the word you are building wanders off a leaf. Another story.
I apologize for the length. The main question was whether eval is particularly expensive.
From: Python-list <python-list-bounces+avigross=verizon.net at python.org> On Behalf Of Antoon Pardon
Sent: Monday, December 10, 2018 5:00 AM
To: python-list at python.org
Subject: Re: Why Python don't accept 03 as a number?
On 8/12/18 06:00, Cameron Simpson wrote:
> On 07Dec2018 20:24, Jach Fong <jfong at ms4.hinet.net> wrote:
>> Ian at 2018/12/8 UTC+8 AM11:28:34 wrote:
>>> What is it exactly that you're trying to accomplish with this?
>>> Perhaps there's a better way than using eval.
>> This problem comes from solving a word puzzle,
>> ab + aa + cd == ce
>> Each character will be translate to a digit and evaluate the
>> 03 + 00 + 15 == 18
> Then you should be evaluating the digits and assembling values from
> them. Not trying to shoehorn a string through something that _might_
> accept this string and do what you want. In Python 2 it will accept
> your string and not do what you want; at least in Python 3 it doesn't
> accept your string.
> My point here is that the structure of your puzzle doesn't map
> directly into a naive python statement, and you shouldn't be
> pretending it might.
How do you figure? As far as I understand he is trying to solve this kind of puzzle:
Where every letter is to be replaced by a digit in such a way that the sum checks out. Sure trying to solve this by starting with the string: "SEND + MORE == MONEY" and doing replaces until eval comes up with true is not very sofisticated but I wouldn't call it shoehorning either.
More information about the Python-list