[Python-ideas] `to_file()` method for strings

Nick Coghlan ncoghlan at gmail.com
Thu Mar 24 22:22:37 EDT 2016


On 25 March 2016 at 08:07, Chris Barker <chris.barker at noaa.gov> wrote:
> what's wrong with:
>
>     open(a_path, 'w').write(the_string)
>
> short, simple one-liner.
>
> OK, non cPython garbage collection may leave that file open and dangling,
> but we're talking the quicky scripting data analysis type user -- the script
> will terminate soon enough.

One of the few downsides of Python's popularity as both a scripting
language and an app development language is that a lot of tutorials
are written for the latter, and in app development, relying on the GC
for external resource cleanup isn't a great habit to get into. As a
result, tutorials will introduce the deterministic cleanup form, and
actively discourage use of the non-deterministic cleanup.

Independent of the "Why not just rely on the GC?" question, though,
we're also forcing the user to upgrade their mental model to achieve
their objective.

User model: "I want to save this data to disk at a particular location
and be able to read it back later"

By contrast, unpacking the steps in the one-liner:

- open the nominated location for writing (with implied text encoding
& error handling)
- write the data to that location

It's that switch from a 1-step process to a 2-step process that breaks
flow, rather than the specifics of the wording in the code (Python 3
at least improves the hidden third step in the process by having the
implied text encoding typically be UTF-8 rather than ASCII).

Formulating the question this way does suggest a somewhat radical
notion, though: what if we had a JSON-based save builtin that wrote
UTF-8 encoded files based on json.dump()?

That is, "save(a_path, the_string)" would attempt to do:

    with open(a_path, 'w', encoding='utf-8', errors='strict') as f:
        json.dump(the_string, f)

While a corresponding "load(a_path)" would attempt to do:

    with open(a_path, 'r', encoding='utf-8', errors='strict') as f:
        return json.load(f)

The format of the created files would be using a well-defined standard
rather than raw data dumps (as well as covering more than just
pre-serialised strings), and the degenerate case of saving a single
string would just have quotation marks added to the beginning and end.
If we later chose to define a "__save__" and "__load__" protocol, then
json.dump/load would also be able to benefit.

There'd also be a potential long term security benefit here, as folks
are often prone to reaching for pickle to save data structures to
disk, which creates an arbitrary code execution security risk when
loading them again later. Explicitly favouring the less dangerous JSON
as the preferred serialisation format can help nudge people towards
safer practices without forcing them to consciously think through the
security implications.

Switching from the save/load builtins to manual file management would
then be a matter of improving storage efficiency and speed of access,
just as switching to any other kind of persistent data store would be
(and a JSON based save/load would migrate very well to NoSQL style
persistent data storage services).

Cheers,
Nick.

P.S. I'm going to be mostly offline until late next week, but found
this idea intriguing enough to share before I left.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


More information about the Python-ideas mailing list