<div dir="ltr">Thanks Nick -- I really couldn't have said it better than "<span style="font-size:13px;line-height:19.5px">we're also forcing the user to upgrade their mental model to achieve</span><br style="font-size:13px;line-height:19.5px"><span style="font-size:13px;line-height:19.5px">their objective."</span><div><span style="line-height:19.5px"><br></span></div><div><span style="line-height:19.5px">Python is beautiful in part because it takes all sorts of idioms that are complicated in other languages and wraps them up in something simple and intuitive ("in" has to be my favorite builtin of all time). This feels like one of those few cases where Python still feels like C, and it's not at all clear to me why that needs to be the case. </span></div><div><span style="line-height:19.5px"><br></span></div><div><span style="line-height:19.5px"><br></span></div><div><span style="line-height:19.5px"><br></span><div class="gmail_quote"><div dir="ltr">On Thu, Mar 24, 2016 at 7:22 PM Nick Coghlan <<a href="mailto:ncoghlan@gmail.com" target="_blank">ncoghlan@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 25 March 2016 at 08:07, Chris Barker <<a href="mailto:chris.barker@noaa.gov" target="_blank">chris.barker@noaa.gov</a>> wrote:<br>
> what's wrong with:<br>
><br>
> open(a_path, 'w').write(the_string)<br>
><br>
> short, simple one-liner.<br>
><br>
> OK, non-CPython garbage collection may leave that file open and dangling,<br>
> but we're talking about the quick scripting, data analysis type of user -- the script<br>
> will terminate soon enough.<br>
<br>
One of the few downsides of Python's popularity as both a scripting<br>
language and an app development language is that a lot of tutorials<br>
are written for the latter, and in app development, relying on the GC<br>
for external resource cleanup isn't a great habit to get into. As a<br>
result, tutorials will introduce the deterministic cleanup form, and<br>
actively discourage use of the non-deterministic cleanup.<br>
<br>
Independent of the "Why not just rely on the GC?" question, though,<br>
we're also forcing the user to upgrade their mental model to achieve<br>
their objective.<br>
<br>
User model: "I want to save this data to disk at a particular location<br>
and be able to read it back later"<br>
<br>
By contrast, unpacking the steps in the one-liner:<br>
<br>
- open the nominated location for writing (with implied text encoding<br>
& error handling)<br>
- write the data to that location<br>
<br>
It's that switch from a 1-step process to a 2-step process that breaks<br>
flow, rather than the specifics of the wording in the code (Python 3<br>
at least improves the hidden third step in the process by having the<br>
implied text encoding typically be UTF-8 rather than ASCII).<br>
<br>
Formulating the question this way does suggest a somewhat radical<br>
notion, though: what if we had a JSON-based save builtin that wrote<br>
UTF-8 encoded files based on json.dump()?<br>
<br>
That is, "save(a_path, the_string)" would attempt to do:<br>
<br>
with open(a_path, 'w', encoding='utf-8', errors='strict') as f:<br>
&nbsp;&nbsp;&nbsp;&nbsp;json.dump(the_string, f)<br>
<br>
While a corresponding "load(a_path)" would attempt to do:<br>
<br>
with open(a_path, 'r', encoding='utf-8', errors='strict') as f:<br>
&nbsp;&nbsp;&nbsp;&nbsp;return json.load(f)<br>
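<br>
As a rough illustration, the two fragments above could be wrapped up as<br>
ordinary functions and round-tripped through a temporary file. This is<br>
only a sketch: "save" and "load" are the hypothetical builtins proposed<br>
here (they do not exist in Python), and "demo_path", "original" and<br>
"restored" are names made up for the example.<br>
<pre>
```python
import json
import os
import tempfile


def save(path, obj):
    """Hypothetical builtin: write obj to path as UTF-8 encoded JSON."""
    with open(path, 'w', encoding='utf-8', errors='strict') as f:
        json.dump(obj, f)


def load(path):
    """Hypothetical builtin: read UTF-8 encoded JSON back from path."""
    with open(path, 'r', encoding='utf-8', errors='strict') as f:
        return json.load(f)


# Round-trip a small data structure through a temporary file.
# demo_path, original and restored are illustrative names only.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.json')
original = {'name': 'example', 'values': [1, 2, 3]}
save(demo_path, original)
restored = load(demo_path)
print(restored == original)
```
</pre>
From the caller's side, saving and reading back are then one step each,<br>
which is the "1-step process" the user model above asks for.<br>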
<br>
The format of the created files would be using a well-defined standard<br>
rather than raw data dumps (as well as covering more than just<br>
pre-serialised strings), and the degenerate case of saving a single<br>
string would just have quotation marks added to the beginning and end.<br>
If we later chose to define a "__save__" and "__load__" protocol, then<br>
json.dump/load would also be able to benefit.<br>
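<br>
To make the single-string case concrete, the sketch below uses<br>
json.dumps() to show the text that json.dump() would write to the file.<br>
The variable names are made up for the example. One caveat worth<br>
noting: with the json module's default settings, "just quotation marks"<br>
holds for plain ASCII text, while non-ASCII characters are written as<br>
\uXXXX escapes unless ensure_ascii=False is passed.<br>
<pre>
```python
import json

# A plain ASCII string only gains surrounding double quotes.
text = 'hello world'
encoded = json.dumps(text)
print(encoded)  # "hello world"

# Decoding gives back the original string.
decoded = json.loads(encoded)
print(decoded == text)  # True

# With default settings, non-ASCII characters are escaped in the output.
accented = 'caf\u00e9'
escaped = json.dumps(accented)
print(escaped)  # "caf\u00e9"

# Passing ensure_ascii=False keeps the character itself in the output.
unescaped = json.dumps(accented, ensure_ascii=False)
print(json.loads(unescaped) == accented)  # True
```
</pre>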
<br>
There'd also be a potential long term security benefit here, as folks<br>
are often prone to reaching for pickle to save data structures to<br>
disk, which creates an arbitrary code execution security risk when<br>
loading them again later. Explicitly favouring the less dangerous JSON<br>
as the preferred serialisation format can help nudge people towards<br>
safer practices without forcing them to consciously think through the<br>
security implications.<br>
<br>
Switching from the save/load builtins to manual file management would<br>
then be a matter of improving storage efficiency and speed of access,<br>
just as switching to any other kind of persistent data store would be<br>
(and a JSON based save/load would migrate very well to NoSQL style<br>
persistent data storage services).<br>
<br>
Cheers,<br>
Nick.<br>
<br>
P.S. I'm going to be mostly offline until late next week, but found<br>
this idea intriguing enough to share before I left.<br>
<br>
--<br>
Nick Coghlan | <a href="mailto:ncoghlan@gmail.com" target="_blank">ncoghlan@gmail.com</a> | Brisbane, Australia<br>
</blockquote></div></div></div>