[Tutor] unstring

Steven D'Aprano steve at pearwood.info
Wed Jun 19 04:41:07 CEST 2013


On Tue, Jun 18, 2013 at 06:41:01PM -0700, Jim Mooney wrote:
> Is there a way to unstring something? That is str(object) will give me
> a string, but what if I want the original object back, for some
> purpose, without a lot of foofaraw?

The short answer is, "no". 

The slightly longer answer is, "sometimes".

The accurate answer is, "it depends on what the object is, whether you 
insist on a human-readable string, and whether or not you like living 
dangerously".

If you know what sort of object the string is supposed to represent, 
then often (but not always) you can convert it like this:

x = 23456
s = str(x)
y = int(s)

after which, x should equal y. This will work for ints, and will 
probably work for floats[1]. On the other hand, this does not work for 
(say) list, tuple or other similar types of object:

s = str([None, 42, 'abc'])
list(s)

returns something very different from what you started with. (Try it and 
see.)
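
For anyone not sitting at an interpreter, here is a small sketch of what 
goes wrong. list() makes no attempt to parse the string; it simply 
iterates over it, one character at a time. The variable names are just 
for illustration.

```python
original = [None, 42, 'abc']
s = str(original)          # "[None, 42, 'abc']"
chars = list(s)            # iterates over the string, character by character

print(chars[:6])           # ['[', 'N', 'o', 'n', 'e', ',']
print(len(chars))          # 17, one entry per character of s
print(chars == original)   # False
```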

Many objects -- but not all -- have the property that if you call eval() 
on their repr(), you get the same value back again:

s = repr([None, 42, 'abc'])
eval(s)

ought to return the same list as you started with. But:

- this is not guaranteed for all objects;

- it's also unsafe. 

If the string you call eval on comes from an untrusted source, whoever 
supplied it can do *anything they like* on your computer. Imagine you 
are taking a string from somewhere, which you assume was generated using 
repr(), but somebody fools you into accepting this string instead:

"[None, 42, 'abc'] and __import__('os').system('echo Got You Now, sucker!')"


Try eval()'ing the above string. Now imagine something more malicious.

So, my advice is, *** don't use eval on untrusted strings ***

Another option is to use ast.literal_eval, which is much, much more 
limited and consequently is safer.

py> import ast
py> ast.literal_eval("[None, 42, 'abc']")
[None, 42, 'abc']
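
As a rough sketch of why this is safer: literal_eval only accepts 
literals (strings, numbers, tuples, lists, dicts, sets, booleans, None), 
so the booby-trapped string from earlier is rejected rather than 
executed. The names good, bad and rejected are mine, purely for 
illustration.

```python
import ast

good = "[None, 42, 'abc']"
bad = "[None, 42, 'abc'] and __import__('os').system('echo Got You Now, sucker!')"

print(ast.literal_eval(good))   # [None, 42, 'abc']

rejected = False
try:
    ast.literal_eval(bad)       # not a plain literal, so nothing gets run
except (ValueError, SyntaxError) as err:
    rejected = True
    print("rejected:", err)
```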


To summarise, some but not all objects can be round-tripped to 
and from human-readable strings, like those produced by str() and 
repr(). Some of them can even be round-tripped safely, without eval().
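
For an example of the "not all" case, consider a class that doesn't 
define its own __repr__. The sketch below uses a made-up Point class; 
the default repr looks something like <__main__.Point object at 0x...>, 
which isn't even valid Python syntax, so eval() cannot rebuild the 
object from it.

```python
class Point:
    """Hypothetical class with no custom __repr__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
s = repr(p)    # default repr, starts with '<' and includes a memory address

round_tripped = True
try:
    eval(s)    # s was built just above, so it is trusted here
except SyntaxError as err:
    round_tripped = False
    print("cannot eval the default repr:", err)
```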

As an alternative, if you give up the requirement that the string be 
human-readable, you can *serialise* the object. Not all objects can be 
serialised, but most can. You can use:

- marshal
- pickle
- json
- yaml  # not in the standard library
- and others

but they all have pros and cons. For instance, pickle can handle nearly 
anything, but it has the same vulnerability as eval(): unpickling data 
from an untrusted source can execute arbitrary code. json and yaml are 
pretty close to human-readable, even human-editable, but they can't 
handle arbitrary objects.
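
To give a feel for the json trade-off, here is a minimal sketch using 
only the standard library json module. The list from earlier round-trips 
fine, a tuple comes back as a list, and a set is refused outright.

```python
import json

x = [None, 42, 'abc']
s = json.dumps(x)                  # '[null, 42, "abc"]'
print(json.loads(s) == x)          # True

t = (1, 2, 3)
back = json.loads(json.dumps(t))
print(back)                        # [1, 2, 3], a list rather than a tuple

set_failed = False
try:
    json.dumps({1, 2, 3})          # sets have no JSON equivalent
except TypeError as err:
    set_failed = True
    print("json cannot serialise a set:", err)
```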

py> import pickle
py> x = [None, 42, 'abc']
py> pickle.dumps(x)
b'\x80\x03]q\x00(NK*X\x03\x00\x00\x00abcq\x01e.'

Not exactly human-readable, but guaranteed[2] to round-trip:

py> pickle.loads(pickle.dumps(x)) == x
True


If these are of interest, I suggest starting by reading the docs, then 
coming back with any questions.



[1] I think it will always work, but floats are just tricky enough that 
I am not willing to promise it.
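
A quick spot check, assuming Python 3.2 or later, where str() of a float 
gives the same string as repr(). The sample values are my own choice. 
The one wrinkle I know of is NaN, which converts back to a NaN but never 
compares equal to anything, itself included.

```python
import math

values = [0.1, 1/3, 1e300, 5e-324, float('inf')]
results = [float(str(x)) == x for x in values]
print(all(results))        # True

nan = float('nan')
y = float(str(nan))        # str(nan) is 'nan'
print(math.isnan(y))       # True, we did get a NaN back
print(y == nan)            # False, because NaN never compares equal
```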

[2] Guarantee void on planet Earth.


-- 
Steven

