From greg.ewing at canterbury.ac.nz  Fri Apr  1 00:57:37 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 01 Apr 2016 17:57:37 +1300
Subject: [Python-Dev] The next major Python version will be Python 8
In-Reply-To: <ndkpll$kne$1@ger.gmane.org>
References: <CAMpsgwb2K7XREcuRbEKN5DxOvULH1WPeeNikFFEY1TPSSRezQQ@mail.gmail.com>
 <ndkpll$kne$1@ger.gmane.org>
Message-ID: <56FDFFC1.5020207@canterbury.ac.nz>

Serhiy Storchaka wrote:
> Does it combine the base of Python 2 with the power of Python 3?

No, that would be Python Backwards-Six.

-- 
Greg

From ncoghlan at gmail.com  Fri Apr  1 08:43:21 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 1 Apr 2016 22:43:21 +1000
Subject: [Python-Dev] Adding a Pip GUI to IDLE and idlelib (GSOC project)
In-Reply-To: <nd7tm8$ee0$1@ger.gmane.org>
References: <nd7tm8$ee0$1@ger.gmane.org>
Message-ID: <CADiSq7fC6_0W5mY3zP--jGCCkakCDRCj3YJLP4_P7UcLntnyog@mail.gmail.com>

On 27 March 2016 at 16:13, Terry Reedy <tjreedy at udel.edu> wrote:
> Thoughts?

+1 from me - being able to teach package installation without teaching
the command line first has been an oft-requested capability for a long
time.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From rymg19 at gmail.com  Fri Apr  1 10:19:04 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Fri, 1 Apr 2016 09:19:04 -0500
Subject: [Python-Dev] The future of Python: fixing broken error handling in
 Python 8
Message-ID: <CAO41-mOY8E6_gWbiSTDsYswMFp=m71h44V57SNVvEyH39FRKbQ@mail.gmail.com>

Python's exception handling system is currently badly brokeTypeError:
unsupported operand type(s) for +: 'NoneType' and 'NoneType'n. Therefore,
with the recent news of the joyous release of Python 8 (
https://mail.python.org/pipermail/python-dev/2016-March/143603.html), I
have decided to propose a revolutionary idea: safe mock objects.

A "safe" mock object (qualified name
`_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`;
Java-style naming was adopted for readability purposes; comments are now no
longer necessary) is a magic object that supports everything and returns
itself. Since examples speak more words than are in the Python source code,
here are some (examples, not words in the Python source code):


a = 1
b = None
c = a + b # Returns a _frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8
print(c) # Prints the empty string.
d = c+1 # All operations on `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`'s return a new one.
e = d.xyz(1, 2, 3) # `e` is now a `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`.
def f():
    assert 0 # Causes the function to return a `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`.
    raise 123 # Does the same thing.
print(L) # L is undefined, so it becomes a `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`.


Safe mock objects are obviously the Next Error Handling Revolution™. Unicode
errors now simply disappear and return more
`_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`s.
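
The behaviour described above can be roughly approximated in today's Python
with a class that absorbs every operation. This is only a minimal sketch, not
the implementation linked later in this message: the names `SafeMock` and
`safe_add` are hypothetical, and the fallback in `safe_add` stands in for what
the proposal would have the interpreter do on its own.

```python
class SafeMock:
    """Hypothetical stand-in that absorbs any operation and never raises."""

    def __getattr__(self, name):
        # Only reached for attributes that normal lookup cannot find.
        return SafeMock()

    def __call__(self, *args, **kwargs):
        # Calling a mock yields another mock.
        return SafeMock()

    def __str__(self):
        # Printing a mock shows the empty string.
        return ""

    __repr__ = __str__

    def _absorb(self, *args):
        # Shared body for operators: ignore the operands, return a new mock.
        return SafeMock()

    __add__ = __radd__ = __sub__ = __rsub__ = _absorb
    __mul__ = __rmul__ = __truediv__ = __rtruediv__ = _absorb
    __getitem__ = _absorb


def safe_add(a, b):
    """Emulate the proposed `a + b`: fall back to a SafeMock on TypeError."""
    try:
        return a + b
    except TypeError:
        return SafeMock()


c = safe_add(1, None)   # would be `1 + None` under the proposal
d = c + 1               # operations on a mock return a new mock
e = d.xyz(1, 2, 3)      # so do attribute access and calls
```

Undefined names, `assert`, and `raise` cannot be intercepted this way without
changing the interpreter, so the sketch covers only the operator, attribute,
and call cases.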

As for `try` and `catch` (protest the naming of `except`!!) statements, they
will be completely ignored. The `try`, `except`, and `finally` bodies will all
be executed in sequence, except that printing and returning values with an
`except` statement does nothing:


try:
    xyz = None.a # `xyz` becomes a `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`.
except:
    print(123) # Does nothing.
    return None # Does nothing.
finally:
    return xyz # Returns a `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`.


Aggressive error handling (as shown in PanicSort [https://xkcd.com/1185/])
that does destructive actions (such as `rm -rf /`) will always execute the
destructive code, encouraging more honest development.

In addition, due to errors simply being ignored, nothing can ever quite go
wrong. All discussions about a safe navigation operator can now be immediately
halted, since any undefined attributes will simply return a
`_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`.

Although I have not yet destroy--I mean, improved CPython to allow for this
amazing idea, I have created a primitive implementation of the
`_frozensafemockobjectimplementation` module:

https://github.com/kirbyfan64/_frozensafemockobjectimplementation

I hope you will all realize that this new idea is a drastic improvement
over current technologies and therefore support it, because we can Make
Python Great Again™.

-- 
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something's wrong.
http://kirbyfan64.github.io/

From robertomartinezp at gmail.com  Fri Apr  1 06:42:10 2016
From: robertomartinezp at gmail.com (=?UTF-8?Q?Roberto_Mart=C3=ADnez?=)
Date: Fri, 01 Apr 2016 10:42:10 +0000
Subject: [Python-Dev] Which version is better? Phyton 27 or Phyton 35?
Message-ID: <CAF0TK51emEGepcvJPr4xOpaQMD9tx4w1TsNBb7B1bc3J-pg0ug@mail.gmail.com>

Hi,

I am having a hard time trying to choose one of these two products:

Phyton 27:
http://www.amazon.com/Phyton-27-Systemic-Bactericide-Fungicide/dp/B00VKPL8FU
Phyton 35:
http://www.amazon.com/Phyton-Bactericide-fungicide-Substitute-Liter/dp/B00BGE65VM

Phyton 35 is announced as the "Substitute for Phyton 27" but I feel that
Phyton 27 is more tested and has a bigger user base.

Can you help me choose?

Best regards,
Roberto

From status at bugs.python.org  Fri Apr  1 12:08:40 2016
From: status at bugs.python.org (Python tracker)
Date: Fri,  1 Apr 2016 18:08:40 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20160401160840.165F456909@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2016-03-25 - 2016-04-01)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    5471 (+10)
  closed 32971 (+33)
  total  38442 (+43)

Open issues with patches: 2379 


Issues opened (32)
==================

#26643: regrtest: rework libregrtest.save_env submodule
http://bugs.python.org/issue26643  opened by haypo

#26646: Allow built-in module in package
http://bugs.python.org/issue26646  opened by Daniel Shaulov

#26647: ceval: use Wordcode, 16-bit bytecode
http://bugs.python.org/issue26647  opened by Demur Rumed

#26648: csv.reader Error message indicates to use deprecated
http://bugs.python.org/issue26648  opened by Philip Martin

#26650: calendar: OverflowErrors for year == 1 and firstweekday > 0
http://bugs.python.org/issue26650  opened by mjpieters

#26651: Deprecate register_adapter() and register_converter() in sqlit
http://bugs.python.org/issue26651  opened by berker.peksag

#26652: Cannot install Python 2.7.11 on Windows Server 2008 R2
http://bugs.python.org/issue26652  opened by Hung-Hsuan Chen

#26654: asyncio is not inspecting keyword arguments of functools.parti
http://bugs.python.org/issue26654  opened by iceboy

#26656: Documentation for re.compile is a bit outdated
http://bugs.python.org/issue26656  opened by Sworddragon

#26657: Directory traversal with http.server and SimpleHTTPServer on w
http://bugs.python.org/issue26657  opened by Thomas

#26658: test_os fails when run on Windows ramdisk
http://bugs.python.org/issue26658  opened by jkloth

#26659: slice() leaks memory when part of a cycle
http://bugs.python.org/issue26659  opened by Kevin Modzelewski

#26660: tempfile.TemporaryDirectory() cleanup exception on Windows if 
http://bugs.python.org/issue26660  opened by Laurent.Mazuel

#26661: python fails to locate system libffi
http://bugs.python.org/issue26661  opened by rkuska

#26662: configure/Makefile doesn't check if "python" command works, ne
http://bugs.python.org/issue26662  opened by haypo

#26663: asyncio _UnixWritePipeTransport._close abandons unflushed writ
http://bugs.python.org/issue26663  opened by Robert Smallshire

#26664: find a bug in activate.fish of venv of cpython3.6
http://bugs.python.org/issue26664  opened by ?????????

#26665: pip is not bootstrapped by default on 2.7
http://bugs.python.org/issue26665  opened by Axel

#26666: File object hook to modify select(ors) event mask
http://bugs.python.org/issue26666  opened by zwol

#26667: Update importlib to accept pathlib.Path objects
http://bugs.python.org/issue26667  opened by brett.cannon

#26668: Remove Lib/test/test_importlib/regrtest.py?
http://bugs.python.org/issue26668  opened by haypo

#26669: time.localtime(float("NaN")) does not raise a ValueError on al
http://bugs.python.org/issue26669  opened by gregory.p.smith

#26671: Clean up path_converter in posixmodule.c
http://bugs.python.org/issue26671  opened by serhiy.storchaka

#26672: regrtest missing in the module name
http://bugs.python.org/issue26672  opened by Axel

#26673: Tkinter error when opening IDLE configuration menu
http://bugs.python.org/issue26673  opened by wysaard

#26677: pyvenv: activate.fish breaks $PATH for bash scripts
http://bugs.python.org/issue26677  opened by Florian.Dold

#26678: Incorrect linking to elements in datetime package
http://bugs.python.org/issue26678  opened by andymaier

#26679: curses: Descripton of KEY_NPAGE and KEY_PPAGE inverted
http://bugs.python.org/issue26679  opened by Robert Bachmann

#26680: Incorporating float.is_integer into the numeric tower and Deci
http://bugs.python.org/issue26680  opened by Robert Smallshire2

#26682: Ttk Notebook tabs do not show with 1-2 char names
http://bugs.python.org/issue26682  opened by terry.reedy

#26683: Questionable terminology for describing what locals() does
http://bugs.python.org/issue26683  opened by rhettinger

#26685: Raise errors from socket.close()
http://bugs.python.org/issue26685  opened by martin.panter



Most recent 15 issues with no replies (15)
==========================================

#26677: pyvenv: activate.fish breaks $PATH for bash scripts
http://bugs.python.org/issue26677

#26672: regrtest missing in the module name
http://bugs.python.org/issue26672

#26669: time.localtime(float("NaN")) does not raise a ValueError on al
http://bugs.python.org/issue26669

#26667: Update importlib to accept pathlib.Path objects
http://bugs.python.org/issue26667

#26665: pip is not bootstrapped by default on 2.7
http://bugs.python.org/issue26665

#26663: asyncio _UnixWritePipeTransport._close abandons unflushed writ
http://bugs.python.org/issue26663

#26661: python fails to locate system libffi
http://bugs.python.org/issue26661

#26660: tempfile.TemporaryDirectory() cleanup exception on Windows if 
http://bugs.python.org/issue26660

#26656: Documentation for re.compile is a bit outdated
http://bugs.python.org/issue26656

#26652: Cannot install Python 2.7.11 on Windows Server 2008 R2
http://bugs.python.org/issue26652

#26626: test_dbm_gnu
http://bugs.python.org/issue26626

#26618: _overlapped extension module of asyncio uses deprecated WSAStr
http://bugs.python.org/issue26618

#26615: Missing entry in WRAPPER_ASSIGNMENTS in update_wrapper's doc
http://bugs.python.org/issue26615

#26609: Wrong request target in test_httpservers.py
http://bugs.python.org/issue26609

#26600: MagickMock __str__ sometimes returns MagickMock instead of str
http://bugs.python.org/issue26600



Most recent 15 issues waiting for review (15)
=============================================

#26685: Raise errors from socket.close()
http://bugs.python.org/issue26685

#26680: Incorporating float.is_integer into the numeric tower and Deci
http://bugs.python.org/issue26680

#26679: curses: Descripton of KEY_NPAGE and KEY_PPAGE inverted
http://bugs.python.org/issue26679

#26671: Clean up path_converter in posixmodule.c
http://bugs.python.org/issue26671

#26661: python fails to locate system libffi
http://bugs.python.org/issue26661

#26658: test_os fails when run on Windows ramdisk
http://bugs.python.org/issue26658

#26657: Directory traversal with http.server and SimpleHTTPServer on w
http://bugs.python.org/issue26657

#26651: Deprecate register_adapter() and register_converter() in sqlit
http://bugs.python.org/issue26651

#26650: calendar: OverflowErrors for year == 1 and firstweekday > 0
http://bugs.python.org/issue26650

#26648: csv.reader Error message indicates to use deprecated
http://bugs.python.org/issue26648

#26647: ceval: use Wordcode, 16-bit bytecode
http://bugs.python.org/issue26647

#26646: Allow built-in module in package
http://bugs.python.org/issue26646

#26643: regrtest: rework libregrtest.save_env submodule
http://bugs.python.org/issue26643

#26642: Replace stdout and stderr with simple standard printers at Pyt
http://bugs.python.org/issue26642

#26639: Tools/i18n/pygettext.py: replace deprecated imp module with im
http://bugs.python.org/issue26639



Top 10 most discussed issues (10)
=================================

#26488: hashlib command line interface
http://bugs.python.org/issue26488  15 msgs

#26647: ceval: use Wordcode, 16-bit bytecode
http://bugs.python.org/issue26647  15 msgs

#26624: Windows hangs in call to CRT setlocale()
http://bugs.python.org/issue26624  10 msgs

#18844: allow weights in random.choice
http://bugs.python.org/issue18844   8 msgs

#26632: __all__ decorator
http://bugs.python.org/issue26632   6 msgs

#26658: test_os fails when run on Windows ramdisk
http://bugs.python.org/issue26658   6 msgs

#26680: Incorporating float.is_integer into the numeric tower and Deci
http://bugs.python.org/issue26680   6 msgs

#23551: IDLE to provide menu link to PIP gui.
http://bugs.python.org/issue23551   5 msgs

#23735: Readline not adjusting width after resize with 6.3
http://bugs.python.org/issue23735   5 msgs

#26606: logging.baseConfig is missing the encoding parameter
http://bugs.python.org/issue26606   5 msgs



Issues closed (30)
==================

#15117: Please document top-level sqlite3 module variables
http://bugs.python.org/issue15117  closed by berker.peksag

#18691: sqlite3.Cursor.execute expects sequence as second argument.
http://bugs.python.org/issue18691  closed by berker.peksag

#19065: sqlite3 timestamp adapter chokes on timezones
http://bugs.python.org/issue19065  closed by berker.peksag

#22218: Fix more compiler warnings "comparison between signed and unsi
http://bugs.python.org/issue22218  closed by haypo

#22854: Documentation/implementation out of sync for IO
http://bugs.python.org/issue22854  closed by martin.panter

#23758: Improve documenation about num_params in sqlite3 create_functi
http://bugs.python.org/issue23758  closed by berker.peksag

#23804: SSLSocket.recv(0) receives up to 1024 bytes
http://bugs.python.org/issue23804  closed by martin.panter

#25195: mock.ANY doesn't match mock.MagicMock() object
http://bugs.python.org/issue25195  closed by berker.peksag

#25256: Add sys.debug_build public variable to check if Python was com
http://bugs.python.org/issue25256  closed by haypo

#25276: Intermittent segfaults on PPC64 AIX 3.x
http://bugs.python.org/issue25276  closed by haypo

#25289: test_strptime hangs sometimes on AMD64 Windows7 SP1 3.x buildb
http://bugs.python.org/issue25289  closed by haypo

#25940: SSL tests failed due to expired svn.python.org SSL certificate
http://bugs.python.org/issue25940  closed by martin.panter

#26130: redundant local copy of a char pointer in classify in Parser\p
http://bugs.python.org/issue26130  closed by berker.peksag

#26492: Exhausted array iterator should left exhausted
http://bugs.python.org/issue26492  closed by serhiy.storchaka

#26494: Double deallocation on iterator exhausting
http://bugs.python.org/issue26494  closed by serhiy.storchaka

#26591: datetime datetime.time to datetime.time comparison does nothin
http://bugs.python.org/issue26591  closed by belopolsky

#26616: A bug in datetime.astimezone() method
http://bugs.python.org/issue26616  closed by belopolsky

#26640: xmlrpc.server imports xmlrpc.client
http://bugs.python.org/issue26640  closed by brett.cannon

#26641: doctest doesn't support packages
http://bugs.python.org/issue26641  closed by haypo

#26644: SSLSocket.recv(-1) triggers SystemError
http://bugs.python.org/issue26644  closed by martin.panter

#26645: argparse prints help messages to stdout instead of stderr by d
http://bugs.python.org/issue26645  closed by serhiy.storchaka

#26649: Fail update installation: 'utf-8' codec can't decode
http://bugs.python.org/issue26649  closed by haypo

#26653: bisect raises a TypeError when hi is None
http://bugs.python.org/issue26653  closed by rhettinger

#26655: pathlib glob case sensitivity issue on Windows
http://bugs.python.org/issue26655  closed by SilentGhost

#26670: Add a developer mode: -X dev command line option
http://bugs.python.org/issue26670  closed by haypo

#26674: ???typo??? Japanese Documentation
http://bugs.python.org/issue26674  closed by ezio.melotti

#26675: Appending to a large list flushes old entries
http://bugs.python.org/issue26675  closed by Swaprava Nath

#26676: Add missing XMLPullParser to ElementTree.__all__
http://bugs.python.org/issue26676  closed by martin.panter

#26681: decorators for attributes
http://bugs.python.org/issue26681  closed by ethan.furman

#26684: pathlib.Path.with_name() and .with_suffix do not allow combini
http://bugs.python.org/issue26684  closed by ethan.furman

From rosuav at gmail.com  Fri Apr  1 12:21:15 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 2 Apr 2016 03:21:15 +1100
Subject: [Python-Dev] Which version is better? Phyton 27 or Phyton 35?
In-Reply-To: <CAF0TK51emEGepcvJPr4xOpaQMD9tx4w1TsNBb7B1bc3J-pg0ug@mail.gmail.com>
References: <CAF0TK51emEGepcvJPr4xOpaQMD9tx4w1TsNBb7B1bc3J-pg0ug@mail.gmail.com>
Message-ID: <CAPTjJmoiSK_GOddX-LdbHo+0mVMWNmwd817-xYptkWv83=PciA@mail.gmail.com>

On Fri, Apr 1, 2016 at 9:42 PM, Roberto Martínez
<robertomartinezp at gmail.com> wrote:
> I am having a hard time trying to choose one of these two products:
>
> Phyton 27:
> http://www.amazon.com/Phyton-27-Systemic-Bactericide-Fungicide/dp/B00VKPL8FU
> Phyton 35:
> http://www.amazon.com/Phyton-Bactericide-fungicide-Substitute-Liter/dp/B00BGE65VM
>
> Phyton 35 is announced as the "Substitute for Phyton 27" but I feel that
> Phyton 27 is more tested and has a bigger user base.
>
> Can you help me choose?

Sure! This is a fairly common question, and it comes down to what sort
of plants you're trying to use this with. Some plants prefer Phyton
27, while others prefer Phyton 35. Most plants are happy with either,
though, so unless you have a good reason to do otherwise, use Phyton
35.

Phyton 35 has some significant improvements that make it far better at
handling plants from different parts of the world. And even some
American plants have special black markings on them, or cost so much
money that they're priced in Euros, or for some similar reason need
the advanced care of Phyton 35. As such, I strongly recommend that you
develop a taste for Phyton 35, as it will serve you better in the long
run. In this era of international foods in every supermarket aisle,
you cannot simply dismiss the black marks as "funny spots" and wish
they'd just go away; you MUST have a fungicide which can adequately
handle them.

ChrisA

PS. This is an *awesome* find! Nice going.

From bussonniermatthias at gmail.com  Fri Apr  1 12:35:42 2016
From: bussonniermatthias at gmail.com (Matthias Bussonnier)
Date: Fri, 1 Apr 2016 09:35:42 -0700
Subject: [Python-Dev] Which version is better? Phyton 27 or Phyton 35?
In-Reply-To: <CAPTjJmoiSK_GOddX-LdbHo+0mVMWNmwd817-xYptkWv83=PciA@mail.gmail.com>
References: <CAF0TK51emEGepcvJPr4xOpaQMD9tx4w1TsNBb7B1bc3J-pg0ug@mail.gmail.com>
 <CAPTjJmoiSK_GOddX-LdbHo+0mVMWNmwd817-xYptkWv83=PciA@mail.gmail.com>
Message-ID: <CANJQusUwLt3hV15vpp8nfyyNg4MeCp4ZUj-0tSR8d-PE7JLY-A@mail.gmail.com>

On Fri, Apr 1, 2016 at 9:21 AM, Chris Angelico <rosuav at gmail.com> wrote:
> On Fri, Apr 1, 2016 at 9:42 PM, Roberto Martínez
> <robertomartinezp at gmail.com> wrote:
>> I am having a hard time trying to choose one of these two products:
>>
>> Phyton 27:
>> http://www.amazon.com/Phyton-27-Systemic-Bactericide-Fungicide/dp/B00VKPL8FU
>> Phyton 35:
>> http://www.amazon.com/Phyton-Bactericide-fungicide-Substitute-Liter/dp/B00BGE65VM
>>
>> Phyton 35 is announced as the "Substitute for Phyton 27" but I feel that
>> Phyton 27 is more tested and has a bigger user base.
>>
>> Can you help me choose?
>
> Sure! This is a fairly common question, and it comes down to what sort
> of plants you're trying to use this with. Some plants prefer Phyton
> 27, while others prefer Phyton 35. Most plants are happy with either,
> though, so unless you have a good reason to do otherwise, use Phyton
> 35.
>
> Phyton 35 has some significant improvements that make it far better at
> handling plants from different parts of the world. And even some
> American plants have special black markings on them, or cost so much
> money that they're priced in Euros, or for some similar reason need
> the advanced care of Phyton 35. As such, I strongly recommend that you
> develop a taste for Phyton 35, as it will serve you better in the long
> run. In this era of international foods in every supermarket aisle,
> you cannot simply dismiss the black marks as "funny spots" and wish
> they'd just go away; you MUST have a fungicide which can adequately
> handle them.
>


Also keep in mind that Phyton 35 improves on previous fungicides by allowing
asynchronous plant growing using eukaryotic microorganisms, also known as
`yeast from`.


-- 
M

From rymg19 at gmail.com  Fri Apr  1 13:08:37 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Fri, 1 Apr 2016 12:08:37 -0500
Subject: [Python-Dev] Which version is better? Phyton 27 or Phyton 35?
In-Reply-To: <CAF0TK51emEGepcvJPr4xOpaQMD9tx4w1TsNBb7B1bc3J-pg0ug@mail.gmail.com>
References: <CAF0TK51emEGepcvJPr4xOpaQMD9tx4w1TsNBb7B1bc3J-pg0ug@mail.gmail.com>
Message-ID: <CAO41-mMwaQuED_dhBTWM9u2m85J4z5gG3rr1d6MhCrabWqDMVA@mail.gmail.com>

Well, based on recent feedback, you should wait for Phyton 80, which will
also make your bean plants start growing hair.

(Side note: This is seriously weird. :O )

--
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something's wrong.
http://kirbyfan64.github.io/

From brett at snarky.ca  Fri Apr  1 14:07:18 2016
From: brett at snarky.ca (Brett Cannon)
Date: Fri, 01 Apr 2016 18:07:18 +0000
Subject: [Python-Dev] [Python-checkins] cpython: Python 8: no pep8,
 no chocolate!
In-Reply-To: <20160331214027.11092.50943.0083A2D0@psf.io>
References: <20160331214027.11092.50943.0083A2D0@psf.io>
Message-ID: <CAP1=2W5wn71aFSkRPOQXcPB-MWK3MO=yb-b-F=iXc=Ci7R-o2A@mail.gmail.com>

Are you planning on removing this after today? My worry about leaving it in
is if it's a modified copy that follows your Python 8 April Fools joke then
it will quite possibly trip people up who try and run pep8 but don't have
it installed, leading them to wonder why the heck their imports are now all
flagged as broken.
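
As a rough way to tell which copy of `pep8` an import would pick up, the
standard `importlib.util.find_spec` can report the file behind a module name
without importing it. The helper name `module_origin` below is hypothetical
and only meant as a sketch of the check.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None.

    `name` is a top-level module name such as "pep8". None means the
    import system cannot find the module at all.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For built-in or frozen modules this is a label rather than a path.
    return spec.origin


if __name__ == "__main__":
    # A path under the standard library directory would point at the
    # bundled copy; a path under site-packages at an installed one.
    print(module_origin("pep8"))
```

If the printed path sits in the standard library directory rather than in
site-packages, the bundled `Lib/pep8.py` is the one being run.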

On Thu, 31 Mar 2016 at 14:40 victor.stinner <python-checkins at python.org>
wrote:

> https://hg.python.org/cpython/rev/9aedec2dbc01
> changeset:   100818:9aedec2dbc01
> user:        Victor Stinner <victor.stinner at gmail.com>
> date:        Thu Mar 31 23:30:53 2016 +0200
> summary:
>   Python 8: no pep8, no chocolate!
>
> files:
>   Include/patchlevel.h |     6 +-
>   Lib/pep8.py          |  2151 ++++++++++++++++++++++++++++++
>   Lib/site.py          |    56 +
>   3 files changed, 2210 insertions(+), 3 deletions(-)
>
>
> diff --git a/Include/patchlevel.h b/Include/patchlevel.h
> --- a/Include/patchlevel.h
> +++ b/Include/patchlevel.h
> @@ -16,14 +16,14 @@
>
>  /* Version parsed out into numeric values */
>  /*--start constants--*/
> -#define PY_MAJOR_VERSION       3
> -#define PY_MINOR_VERSION       6
> +#define PY_MAJOR_VERSION       8
> +#define PY_MINOR_VERSION       0
>  #define PY_MICRO_VERSION       0
>  #define PY_RELEASE_LEVEL       PY_RELEASE_LEVEL_ALPHA
>  #define PY_RELEASE_SERIAL      0
>
>  /* Version as a string */
> -#define PY_VERSION             "3.6.0a0"
> +#define PY_VERSION             "8.0.0a0"
>  /*--end constants--*/
>
>  /* Version as a single 4-byte hex number, e.g. 0x010502B2 == 1.5.2b2.
> diff --git a/Lib/pep8.py b/Lib/pep8.py
> new file mode 100644
> --- /dev/null
> +++ b/Lib/pep8.py
> @@ -0,0 +1,2151 @@
> +#!/usr/bin/env python
> +# pep8.py - Check Python source code formatting, according to PEP 8
> +# Copyright (C) 2006-2009 Johann C. Rocholl <johann at rocholl.net>
> +# Copyright (C) 2009-2014 Florent Xicluna <florent.xicluna at gmail.com>
> +# Copyright (C) 2014-2016 Ian Lee <ianlee1521 at gmail.com>
> +#
> +# Permission is hereby granted, free of charge, to any person
> +# obtaining a copy of this software and associated documentation files
> +# (the "Software"), to deal in the Software without restriction,
> +# including without limitation the rights to use, copy, modify, merge,
> +# publish, distribute, sublicense, and/or sell copies of the Software,
> +# and to permit persons to whom the Software is furnished to do so,
> +# subject to the following conditions:
> +#
> +# The above copyright notice and this permission notice shall be
> +# included in all copies or substantial portions of the Software.
> +#
> +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> +# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> +# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> +# BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> +# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> +# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> +# SOFTWARE.
> +
> +r"""
> +Check Python source code formatting, according to PEP 8.
> +
> +For usage and a list of options, try this:
> +$ python pep8.py -h
> +
> +This program and its regression test suite live here:
> +https://github.com/pycqa/pep8
> +
> +Groups of errors and warnings:
> +E errors
> +W warnings
> +100 indentation
> +200 whitespace
> +300 blank lines
> +400 imports
> +500 line length
> +600 deprecation
> +700 statements
> +900 syntax error
> +"""
> +from __future__ import with_statement
> +
> +import os
> +import sys
> +import re
> +import time
> +import inspect
> +import keyword
> +import tokenize
> +from optparse import OptionParser
> +from fnmatch import fnmatch
> +try:
> +    from configparser import RawConfigParser
> +    from io import TextIOWrapper
> +except ImportError:
> +    from ConfigParser import RawConfigParser
> +
> +__version__ = '1.7.0'
> +
> +DEFAULT_EXCLUDE = '.svn,CVS,.bzr,.hg,.git,__pycache__,.tox'
> +DEFAULT_IGNORE = 'E121,E123,E126,E226,E24,E704'
> +try:
> +    if sys.platform == 'win32':
> +        USER_CONFIG = os.path.expanduser(r'~\.pep8')
> +    else:
> +        USER_CONFIG = os.path.join(
> +            os.getenv('XDG_CONFIG_HOME') or os.path.expanduser('~/.config'),
> +            'pep8'
> +        )
> +except ImportError:
> +    USER_CONFIG = None
> +
> +PROJECT_CONFIG = ('setup.cfg', 'tox.ini', '.pep8')
> +TESTSUITE_PATH = os.path.join(os.path.dirname(__file__), 'testsuite')
> +MAX_LINE_LENGTH = 79
> +REPORT_FORMAT = {
> +    'default': '%(path)s:%(row)d:%(col)d: %(code)s %(text)s',
> +    'pylint': '%(path)s:%(row)d: [%(code)s] %(text)s',
> +}
> +
> +PyCF_ONLY_AST = 1024
> +SINGLETONS = frozenset(['False', 'None', 'True'])
> +KEYWORDS = frozenset(keyword.kwlist + ['print']) - SINGLETONS
> +UNARY_OPERATORS = frozenset(['>>', '**', '*', '+', '-'])
> +ARITHMETIC_OP = frozenset(['**', '*', '/', '//', '+', '-'])
> +WS_OPTIONAL_OPERATORS = ARITHMETIC_OP.union(['^', '&', '|', '<<', '>>', '%'])
> +WS_NEEDED_OPERATORS = frozenset([
> +    '**=', '*=', '/=', '//=', '+=', '-=', '!=', '<>', '<', '>',
> +    '%=', '^=', '&=', '|=', '==', '<=', '>=', '<<=', '>>=', '='])
> +WHITESPACE = frozenset(' \t')
> +NEWLINE = frozenset([tokenize.NL, tokenize.NEWLINE])
> +SKIP_TOKENS = NEWLINE.union([tokenize.INDENT, tokenize.DEDENT])
> +# ERRORTOKEN is triggered by backticks in Python 3
> +SKIP_COMMENTS = SKIP_TOKENS.union([tokenize.COMMENT, tokenize.ERRORTOKEN])
> +BENCHMARK_KEYS = ['directories', 'files', 'logical lines', 'physical lines']
> +
> +INDENT_REGEX = re.compile(r'([ \t]*)')
> +RAISE_COMMA_REGEX = re.compile(r'raise\s+\w+\s*,')
> +RERAISE_COMMA_REGEX = re.compile(r'raise\s+\w+\s*,.*,\s*\w+\s*$')
> +ERRORCODE_REGEX = re.compile(r'\b[A-Z]\d{3}\b')
> +DOCSTRING_REGEX = re.compile(r'u?r?["\']')
> +EXTRANEOUS_WHITESPACE_REGEX = re.compile(r'[[({] | []}),;:]')
> +WHITESPACE_AFTER_COMMA_REGEX = re.compile(r'[,;:]\s*(?:  |\t)')
> +COMPARE_SINGLETON_REGEX = re.compile(r'(\bNone|\bFalse|\bTrue)?\s*([=!]=)'
> +                                     r'\s*(?(1)|(None|False|True))\b')
> +COMPARE_NEGATIVE_REGEX = re.compile(r'\b(not)\s+[^][)(}{ ]+\s+(in|is)\s')
> +COMPARE_TYPE_REGEX = re.compile(r'(?:[=!]=|is(?:\s+not)?)\s*type(?:s.\w+Type'
> +                                r'|\s*\(\s*([^)]*[^ )])\s*\))')
> +KEYWORD_REGEX = re.compile(r'(\s*)\b(?:%s)\b(\s*)' % r'|'.join(KEYWORDS))
> +OPERATOR_REGEX = re.compile(r'(?:[^,\s])(\s*)(?:[-+*/|!<=>%&^]+)(\s*)')
> +LAMBDA_REGEX = re.compile(r'\blambda\b')
> +HUNK_REGEX = re.compile(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@.*$')
> +
> +# Work around Python < 2.6 behaviour, which does not generate NL after
> +# a comment which is on a line by itself.
> +COMMENT_WITH_NL = tokenize.generate_tokens(['#\n'].pop).send(None)[1] == '#\n'
> +
> +
> +##############################################################################
> +# Plugins (check functions) for physical lines
> +##############################################################################
> +
> +
> +def tabs_or_spaces(physical_line, indent_char):
> +    r"""Never mix tabs and spaces.
> +
> +    The most popular way of indenting Python is with spaces only.  The
> +    second-most popular way is with tabs only.  Code indented with a mixture
> +    of tabs and spaces should be converted to using spaces exclusively.  When
> +    invoking the Python command line interpreter with the -t option, it issues
> +    warnings about code that illegally mixes tabs and spaces.  When using -tt
> +    these warnings become errors.  These options are highly recommended!
> +
> +    Okay: if a == 0:\n        a = 1\n        b = 1
> +    E101: if a == 0:\n        a = 1\n\tb = 1
> +    """
> +    indent = INDENT_REGEX.match(physical_line).group(1)
> +    for offset, char in enumerate(indent):
> +        if char != indent_char:
> +            return offset, "E101 indentation contains mixed spaces and tabs"
> +
> +
> +def tabs_obsolete(physical_line):
> +    r"""For new projects, spaces-only are strongly recommended over tabs.
> +
> +    Okay: if True:\n    return
> +    W191: if True:\n\treturn
> +    """
> +    indent = INDENT_REGEX.match(physical_line).group(1)
> +    if '\t' in indent:
> +        return indent.index('\t'), "W191 indentation contains tabs"
> +
> +
> +def trailing_whitespace(physical_line):
> +    r"""Trailing whitespace is superfluous.
> +
> +    The warning returned varies on whether the line itself is blank, for easier
> +    filtering for those who want to indent their blank lines.
> +
> +    Okay: spam(1)\n#
> +    W291: spam(1) \n#
> +    W293: class Foo(object):\n    \n    bang = 12
> +    """
> +    physical_line = physical_line.rstrip('\n')    # chr(10), newline
> +    physical_line = physical_line.rstrip('\r')    # chr(13), carriage return
> +    physical_line = physical_line.rstrip('\x0c')  # chr(12), form feed, ^L
> +    stripped = physical_line.rstrip(' \t\v')
> +    if physical_line != stripped:
> +        if stripped:
> +            return len(stripped), "W291 trailing whitespace"
> +        else:
> +            return 0, "W293 blank line contains whitespace"
> +
> +
> +def trailing_blank_lines(physical_line, lines, line_number, total_lines):
> +    r"""Trailing blank lines are superfluous.
> +
> +    Okay: spam(1)
> +    W391: spam(1)\n
> +
> +    However the last line should end with a new line (warning W292).
> +    """
> +    if line_number == total_lines:
> +        stripped_last_line = physical_line.rstrip()
> +        if not stripped_last_line:
> +            return 0, "W391 blank line at end of file"
> +        if stripped_last_line == physical_line:
> +            return len(physical_line), "W292 no newline at end of file"
> +
> +
> +def maximum_line_length(physical_line, max_line_length, multiline):
> +    r"""Limit all lines to a maximum of 79 characters.
> +
> +    There are still many devices around that are limited to 80 character
> +    lines; plus, limiting windows to 80 characters makes it possible to have
> +    several windows side-by-side.  The default wrapping on such devices looks
> +    ugly.  Therefore, please limit all lines to a maximum of 79 characters.
> +    For flowing long blocks of text (docstrings or comments), limiting the
> +    length to 72 characters is recommended.
> +
> +    Reports error E501.
> +    """
> +    line = physical_line.rstrip()
> +    length = len(line)
> +    if length > max_line_length and not noqa(line):
> +        # Special case for long URLs in multi-line docstrings or comments,
> +        # but still report the error when the 72 first chars are whitespaces.
> +        chunks = line.split()
> +        if ((len(chunks) == 1 and multiline) or
> +            (len(chunks) == 2 and chunks[0] == '#')) and \
> +                len(line) - len(chunks[-1]) < max_line_length - 7:
> +            return
> +        if hasattr(line, 'decode'):   # Python 2
> +            # The line could contain multi-byte characters
> +            try:
> +                length = len(line.decode('utf-8'))
> +            except UnicodeError:
> +                pass
> +        if length > max_line_length:
> +            return (max_line_length, "E501 line too long "
> +                    "(%d > %d characters)" % (length, max_line_length))
> +
> +
> +##############################################################################
> +# Plugins (check functions) for logical lines
> +##############################################################################
> +
> +
> +def blank_lines(logical_line, blank_lines, indent_level, line_number,
> +                blank_before, previous_logical, previous_indent_level):
> +    r"""Separate top-level function and class definitions with two blank lines.
> +
> +    Method definitions inside a class are separated by a single blank line.
> +
> +    Extra blank lines may be used (sparingly) to separate groups of related
> +    functions.  Blank lines may be omitted between a bunch of related
> +    one-liners (e.g. a set of dummy implementations).
> +
> +    Use blank lines in functions, sparingly, to indicate logical sections.
> +
> +    Okay: def a():\n    pass\n\n\ndef b():\n    pass
> +    Okay: def a():\n    pass\n\n\n# Foo\n# Bar\n\ndef b():\n    pass
> +
> +    E301: class Foo:\n    b = 0\n    def bar():\n        pass
> +    E302: def a():\n    pass\n\ndef b(n):\n    pass
> +    E303: def a():\n    pass\n\n\n\ndef b(n):\n    pass
> +    E303: def a():\n\n\n\n    pass
> +    E304: @decorator\n\ndef a():\n    pass
> +    """
> +    if line_number < 3 and not previous_logical:
> +        return  # Don't expect blank lines before the first line
> +    if previous_logical.startswith('@'):
> +        if blank_lines:
> +            yield 0, "E304 blank lines found after function decorator"
> +    elif blank_lines > 2 or (indent_level and blank_lines == 2):
> +        yield 0, "E303 too many blank lines (%d)" % blank_lines
> +    elif logical_line.startswith(('def ', 'class ', '@')):
> +        if indent_level:
> +            if not (blank_before or previous_indent_level < indent_level or
> +                    DOCSTRING_REGEX.match(previous_logical)):
> +                yield 0, "E301 expected 1 blank line, found 0"
> +        elif blank_before != 2:
> +            yield 0, "E302 expected 2 blank lines, found %d" % blank_before
> +
> +
> +def extraneous_whitespace(logical_line):
> +    r"""Avoid extraneous whitespace.
> +
> +    Avoid extraneous whitespace in these situations:
> +    - Immediately inside parentheses, brackets or braces.
> +    - Immediately before a comma, semicolon, or colon.
> +
> +    Okay: spam(ham[1], {eggs: 2})
> +    E201: spam( ham[1], {eggs: 2})
> +    E201: spam(ham[ 1], {eggs: 2})
> +    E201: spam(ham[1], { eggs: 2})
> +    E202: spam(ham[1], {eggs: 2} )
> +    E202: spam(ham[1 ], {eggs: 2})
> +    E202: spam(ham[1], {eggs: 2 })
> +
> +    E203: if x == 4: print x, y; x, y = y , x
> +    E203: if x == 4: print x, y ; x, y = y, x
> +    E203: if x == 4 : print x, y; x, y = y, x
> +    """
> +    line = logical_line
> +    for match in EXTRANEOUS_WHITESPACE_REGEX.finditer(line):
> +        text = match.group()
> +        char = text.strip()
> +        found = match.start()
> +        if text == char + ' ':
> +            # assert char in '([{'
> +            yield found + 1, "E201 whitespace after '%s'" % char
> +        elif line[found - 1] != ',':
> +            code = ('E202' if char in '}])' else 'E203')  # if char in ',;:'
> +            yield found, "%s whitespace before '%s'" % (code, char)
> +
> +
> +def whitespace_around_keywords(logical_line):
> +    r"""Avoid extraneous whitespace around keywords.
> +
> +    Okay: True and False
> +    E271: True and  False
> +    E272: True  and False
> +    E273: True and\tFalse
> +    E274: True\tand False
> +    """
> +    for match in KEYWORD_REGEX.finditer(logical_line):
> +        before, after = match.groups()
> +
> +        if '\t' in before:
> +            yield match.start(1), "E274 tab before keyword"
> +        elif len(before) > 1:
> +            yield match.start(1), "E272 multiple spaces before keyword"
> +
> +        if '\t' in after:
> +            yield match.start(2), "E273 tab after keyword"
> +        elif len(after) > 1:
> +            yield match.start(2), "E271 multiple spaces after keyword"
> +
> +
> +def missing_whitespace(logical_line):
> +    r"""Each comma, semicolon or colon should be followed by whitespace.
> +
> +    Okay: [a, b]
> +    Okay: (3,)
> +    Okay: a[1:4]
> +    Okay: a[:4]
> +    Okay: a[1:]
> +    Okay: a[1:4:2]
> +    E231: ['a','b']
> +    E231: foo(bar,baz)
> +    E231: [{'a':'b'}]
> +    """
> +    line = logical_line
> +    for index in range(len(line) - 1):
> +        char = line[index]
> +        if char in ',;:' and line[index + 1] not in WHITESPACE:
> +            before = line[:index]
> +            if char == ':' and before.count('[') > before.count(']') and \
> +                    before.rfind('{') < before.rfind('['):
> +                continue  # Slice syntax, no space required
> +            if char == ',' and line[index + 1] == ')':
> +                continue  # Allow tuple with only one element: (3,)
> +            yield index, "E231 missing whitespace after '%s'" % char
> +
> +
> +def indentation(logical_line, previous_logical, indent_char,
> +                indent_level, previous_indent_level):
> +    r"""Use 4 spaces per indentation level.
> +
> +    For really old code that you don't want to mess up, you can continue to
> +    use 8-space tabs.
> +
> +    Okay: a = 1
> +    Okay: if a == 0:\n    a = 1
> +    E111:   a = 1
> +    E114:   # a = 1
> +
> +    Okay: for item in items:\n    pass
> +    E112: for item in items:\npass
> +    E115: for item in items:\n# Hi\n    pass
> +
> +    Okay: a = 1\nb = 2
> +    E113: a = 1\n    b = 2
> +    E116: a = 1\n    # b = 2
> +    """
> +    c = 0 if logical_line else 3
> +    tmpl = "E11%d %s" if logical_line else "E11%d %s (comment)"
> +    if indent_level % 4:
> +        yield 0, tmpl % (1 + c, "indentation is not a multiple of four")
> +    indent_expect = previous_logical.endswith(':')
> +    if indent_expect and indent_level <= previous_indent_level:
> +        yield 0, tmpl % (2 + c, "expected an indented block")
> +    elif not indent_expect and indent_level > previous_indent_level:
> +        yield 0, tmpl % (3 + c, "unexpected indentation")
> +
> +
> +def continued_indentation(logical_line, tokens, indent_level, hang_closing,
> +                          indent_char, noqa, verbose):
> +    r"""Continuation lines indentation.
> +
> +    Continuation lines should align wrapped elements either vertically
> +    using Python's implicit line joining inside parentheses, brackets
> +    and braces, or using a hanging indent.
> +
> +    When using a hanging indent these considerations should be applied:
> +    - there should be no arguments on the first line, and
> +    - further indentation should be used to clearly distinguish itself as a
> +      continuation line.
> +
> +    Okay: a = (\n)
> +    E123: a = (\n    )
> +
> +    Okay: a = (\n    42)
> +    E121: a = (\n   42)
> +    E122: a = (\n42)
> +    E123: a = (\n    42\n    )
> +    E124: a = (24,\n     42\n)
> +    E125: if (\n    b):\n    pass
> +    E126: a = (\n        42)
> +    E127: a = (24,\n      42)
> +    E128: a = (24,\n    42)
> +    E129: if (a or\n    b):\n    pass
> +    E131: a = (\n    42\n 24)
> +    """
> +    first_row = tokens[0][2][0]
> +    nrows = 1 + tokens[-1][2][0] - first_row
> +    if noqa or nrows == 1:
> +        return
> +
> +    # indent_next tells us whether the next block is indented; assuming
> +    # that it is indented by 4 spaces, then we should not allow 4-space
> +    # indents on the final continuation line; in turn, some other
> +    # indents are allowed to have an extra 4 spaces.
> +    indent_next = logical_line.endswith(':')
> +
> +    row = depth = 0
> +    valid_hangs = (4,) if indent_char != '\t' else (4, 8)
> +    # remember how many brackets were opened on each line
> +    parens = [0] * nrows
> +    # relative indents of physical lines
> +    rel_indent = [0] * nrows
> +    # for each depth, collect a list of opening rows
> +    open_rows = [[0]]
> +    # for each depth, memorize the hanging indentation
> +    hangs = [None]
> +    # visual indents
> +    indent_chances = {}
> +    last_indent = tokens[0][2]
> +    visual_indent = None
> +    last_token_multiline = False
> +    # for each depth, memorize the visual indent column
> +    indent = [last_indent[1]]
> +    if verbose >= 3:
> +        print(">>> " + tokens[0][4].rstrip())
> +
> +    for token_type, text, start, end, line in tokens:
> +
> +        newline = row < start[0] - first_row
> +        if newline:
> +            row = start[0] - first_row
> +            newline = not last_token_multiline and token_type not in NEWLINE
> +
> +        if newline:
> +            # this is the beginning of a continuation line.
> +            last_indent = start
> +            if verbose >= 3:
> +                print("... " + line.rstrip())
> +
> +            # record the initial indent.
> +            rel_indent[row] = expand_indent(line) - indent_level
> +
> +            # identify closing bracket
> +            close_bracket = (token_type == tokenize.OP and text in ']})')
> +
> +            # is the indent relative to an opening bracket line?
> +            for open_row in reversed(open_rows[depth]):
> +                hang = rel_indent[row] - rel_indent[open_row]
> +                hanging_indent = hang in valid_hangs
> +                if hanging_indent:
> +                    break
> +            if hangs[depth]:
> +                hanging_indent = (hang == hangs[depth])
> +            # is there any chance of visual indent?
> +            visual_indent = (not close_bracket and hang > 0 and
> +                             indent_chances.get(start[1]))
> +
> +            if close_bracket and indent[depth]:
> +                # closing bracket for visual indent
> +                if start[1] != indent[depth]:
> +                    yield (start, "E124 closing bracket does not match "
> +                           "visual indentation")
> +            elif close_bracket and not hang:
> +                # closing bracket matches indentation of opening bracket's line
> +                if hang_closing:
> +                    yield start, "E133 closing bracket is missing indentation"
> +            elif indent[depth] and start[1] < indent[depth]:
> +                if visual_indent is not True:
> +                    # visual indent is broken
> +                    yield (start, "E128 continuation line "
> +                           "under-indented for visual indent")
> +            elif hanging_indent or (indent_next and rel_indent[row] == 8):
> +                # hanging indent is verified
> +                if close_bracket and not hang_closing:
> +                    yield (start, "E123 closing bracket does not match "
> +                           "indentation of opening bracket's line")
> +                hangs[depth] = hang
> +            elif visual_indent is True:
> +                # visual indent is verified
> +                indent[depth] = start[1]
> +            elif visual_indent in (text, str):
> +                # ignore token lined up with matching one from a previous line
> +                pass
> +            else:
> +                # indent is broken
> +                if hang <= 0:
> +                    error = "E122", "missing indentation or outdented"
> +                elif indent[depth]:
> +                    error = "E127", "over-indented for visual indent"
> +                elif not close_bracket and hangs[depth]:
> +                    error = "E131", "unaligned for hanging indent"
> +                else:
> +                    hangs[depth] = hang
> +                    if hang > 4:
> +                        error = "E126", "over-indented for hanging indent"
> +                    else:
> +                        error = "E121", "under-indented for hanging indent"
> +                yield start, "%s continuation line %s" % error
> +
> +        # look for visual indenting
> +        if (parens[row] and
> +                token_type not in (tokenize.NL, tokenize.COMMENT) and
> +                not indent[depth]):
> +            indent[depth] = start[1]
> +            indent_chances[start[1]] = True
> +            if verbose >= 4:
> +                print("bracket depth %s indent to %s" % (depth, start[1]))
> +        # deal with implicit string concatenation
> +        elif (token_type in (tokenize.STRING, tokenize.COMMENT) or
> +              text in ('u', 'ur', 'b', 'br')):
> +            indent_chances[start[1]] = str
> +        # special case for the "if" statement because len("if (") == 4
> +        elif not indent_chances and not row and not depth and text == 'if':
> +            indent_chances[end[1] + 1] = True
> +        elif text == ':' and line[end[1]:].isspace():
> +            open_rows[depth].append(row)
> +
> +        # keep track of bracket depth
> +        if token_type == tokenize.OP:
> +            if text in '([{':
> +                depth += 1
> +                indent.append(0)
> +                hangs.append(None)
> +                if len(open_rows) == depth:
> +                    open_rows.append([])
> +                open_rows[depth].append(row)
> +                parens[row] += 1
> +                if verbose >= 4:
> +                    print("bracket depth %s seen, col %s, visual min = %s" %
> +                          (depth, start[1], indent[depth]))
> +            elif text in ')]}' and depth > 0:
> +                # parent indents should not be more than this one
> +                prev_indent = indent.pop() or last_indent[1]
> +                hangs.pop()
> +                for d in range(depth):
> +                    if indent[d] > prev_indent:
> +                        indent[d] = 0
> +                for ind in list(indent_chances):
> +                    if ind >= prev_indent:
> +                        del indent_chances[ind]
> +                del open_rows[depth + 1:]
> +                depth -= 1
> +                if depth:
> +                    indent_chances[indent[depth]] = True
> +                for idx in range(row, -1, -1):
> +                    if parens[idx]:
> +                        parens[idx] -= 1
> +                        break
> +            assert len(indent) == depth + 1
> +            if start[1] not in indent_chances:
> +                # allow to line up tokens
> +                indent_chances[start[1]] = text
> +
> +        last_token_multiline = (start[0] != end[0])
> +        if last_token_multiline:
> +            rel_indent[end[0] - first_row] = rel_indent[row]
> +
> +    if indent_next and expand_indent(line) == indent_level + 4:
> +        pos = (start[0], indent[0] + 4)
> +        if visual_indent:
> +            code = "E129 visually indented line"
> +        else:
> +            code = "E125 continuation line"
> +        yield pos, "%s with same indent as next logical line" % code
> +
> +
> +def whitespace_before_parameters(logical_line, tokens):
> +    r"""Avoid extraneous whitespace.
> +
> +    Avoid extraneous whitespace in the following situations:
> +    - before the open parenthesis that starts the argument list of a
> +      function call.
> +    - before the open parenthesis that starts an indexing or slicing.
> +
> +    Okay: spam(1)
> +    E211: spam (1)
> +
> +    Okay: dict['key'] = list[index]
> +    E211: dict ['key'] = list[index]
> +    E211: dict['key'] = list [index]
> +    """
> +    prev_type, prev_text, __, prev_end, __ = tokens[0]
> +    for index in range(1, len(tokens)):
> +        token_type, text, start, end, __ = tokens[index]
> +        if (token_type == tokenize.OP and
> +            text in '([' and
> +            start != prev_end and
> +            (prev_type == tokenize.NAME or prev_text in '}])') and
> +            # Syntax "class A (B):" is allowed, but avoid it
> +            (index < 2 or tokens[index - 2][1] != 'class') and
> +                # Allow "return (a.foo for a in range(5))"
> +                not keyword.iskeyword(prev_text)):
> +            yield prev_end, "E211 whitespace before '%s'" % text
> +        prev_type = token_type
> +        prev_text = text
> +        prev_end = end
> +
> +
> +def whitespace_around_operator(logical_line):
> +    r"""Avoid extraneous whitespace around an operator.
> +
> +    Okay: a = 12 + 3
> +    E221: a = 4  + 5
> +    E222: a = 4 +  5
> +    E223: a = 4\t+ 5
> +    E224: a = 4 +\t5
> +    """
> +    for match in OPERATOR_REGEX.finditer(logical_line):
> +        before, after = match.groups()
> +
> +        if '\t' in before:
> +            yield match.start(1), "E223 tab before operator"
> +        elif len(before) > 1:
> +            yield match.start(1), "E221 multiple spaces before operator"
> +
> +        if '\t' in after:
> +            yield match.start(2), "E224 tab after operator"
> +        elif len(after) > 1:
> +            yield match.start(2), "E222 multiple spaces after operator"
> +
> +
> +def missing_whitespace_around_operator(logical_line, tokens):
> +    r"""Surround operators with a single space on either side.
> +
> +    - Always surround these binary operators with a single space on
> +      either side: assignment (=), augmented assignment (+=, -= etc.),
> +      comparisons (==, <, >, !=, <=, >=, in, not in, is, is not),
> +      Booleans (and, or, not).
> +
> +    - If operators with different priorities are used, consider adding
> +      whitespace around the operators with the lowest priorities.
> +
> +    Okay: i = i + 1
> +    Okay: submitted += 1
> +    Okay: x = x * 2 - 1
> +    Okay: hypot2 = x * x + y * y
> +    Okay: c = (a + b) * (a - b)
> +    Okay: foo(bar, key='word', *args, **kwargs)
> +    Okay: alpha[:-i]
> +
> +    E225: i=i+1
> +    E225: submitted +=1
> +    E225: x = x /2 - 1
> +    E225: z = x **y
> +    E226: c = (a+b) * (a-b)
> +    E226: hypot2 = x*x + y*y
> +    E227: c = a|b
> +    E228: msg = fmt%(errno, errmsg)
> +    """
> +    parens = 0
> +    need_space = False
> +    prev_type = tokenize.OP
> +    prev_text = prev_end = None
> +    for token_type, text, start, end, line in tokens:
> +        if token_type in SKIP_COMMENTS:
> +            continue
> +        if text in ('(', 'lambda'):
> +            parens += 1
> +        elif text == ')':
> +            parens -= 1
> +        if need_space:
> +            if start != prev_end:
> +                # Found a (probably) needed space
> +                if need_space is not True and not need_space[1]:
> +                    yield (need_space[0],
> +                           "E225 missing whitespace around operator")
> +                need_space = False
> +            elif text == '>' and prev_text in ('<', '-'):
> +                # Tolerate the "<>" operator, even if running Python 3
> +                # Deal with Python 3's annotated return value "->"
> +                pass
> +            else:
> +                if need_space is True or need_space[1]:
> +                    # A needed trailing space was not found
> +                    yield prev_end, "E225 missing whitespace around operator"
> +                elif prev_text != '**':
> +                    code, optype = 'E226', 'arithmetic'
> +                    if prev_text == '%':
> +                        code, optype = 'E228', 'modulo'
> +                    elif prev_text not in ARITHMETIC_OP:
> +                        code, optype = 'E227', 'bitwise or shift'
> +                    yield (need_space[0], "%s missing whitespace "
> +                           "around %s operator" % (code, optype))
> +                need_space = False
> +        elif token_type == tokenize.OP and prev_end is not None:
> +            if text == '=' and parens:
> +                # Allow keyword args or defaults: foo(bar=None).
> +                pass
> +            elif text in WS_NEEDED_OPERATORS:
> +                need_space = True
> +            elif text in UNARY_OPERATORS:
> +                # Check if the operator is being used as a binary operator
> +                # Allow unary operators: -123, -x, +1.
> +                # Allow argument unpacking: foo(*args, **kwargs).
> +                if (prev_text in '}])' if prev_type == tokenize.OP
> +                        else prev_text not in KEYWORDS):
> +                    need_space = None
> +            elif text in WS_OPTIONAL_OPERATORS:
> +                need_space = None
> +
> +            if need_space is None:
> +                # Surrounding space is optional, but ensure that
> +                # trailing space matches opening space
> +                need_space = (prev_end, start != prev_end)
> +            elif need_space and start == prev_end:
> +                # A needed opening space was not found
> +                yield prev_end, "E225 missing whitespace around operator"
> +                need_space = False
> +        prev_type = token_type
> +        prev_text = text
> +        prev_end = end
> +
> +
> +def whitespace_around_comma(logical_line):
> +    r"""Avoid extraneous whitespace after a comma or a colon.
> +
> +    Note: these checks are disabled by default
> +
> +    Okay: a = (1, 2)
> +    E241: a = (1,  2)
> +    E242: a = (1,\t2)
> +    """
> +    line = logical_line
> +    for m in WHITESPACE_AFTER_COMMA_REGEX.finditer(line):
> +        found = m.start() + 1
> +        if '\t' in m.group():
> +            yield found, "E242 tab after '%s'" % m.group()[0]
> +        else:
> +            yield found, "E241 multiple spaces after '%s'" % m.group()[0]
> +
> +
> +def whitespace_around_named_parameter_equals(logical_line, tokens):
> +    r"""Don't use spaces around the '=' sign in function arguments.
> +
> +    Don't use spaces around the '=' sign when used to indicate a
> +    keyword argument or a default parameter value.
> +
> +    Okay: def complex(real, imag=0.0):
> +    Okay: return magic(r=real, i=imag)
> +    Okay: boolean(a == b)
> +    Okay: boolean(a != b)
> +    Okay: boolean(a <= b)
> +    Okay: boolean(a >= b)
> +    Okay: def foo(arg: int = 42):
> +
> +    E251: def complex(real, imag = 0.0):
> +    E251: return magic(r = real, i = imag)
> +    """
> +    parens = 0
> +    no_space = False
> +    prev_end = None
> +    annotated_func_arg = False
> +    in_def = logical_line.startswith('def')
> +    message = "E251 unexpected spaces around keyword / parameter equals"
> +    for token_type, text, start, end, line in tokens:
> +        if token_type == tokenize.NL:
> +            continue
> +        if no_space:
> +            no_space = False
> +            if start != prev_end:
> +                yield (prev_end, message)
> +        if token_type == tokenize.OP:
> +            if text == '(':
> +                parens += 1
> +            elif text == ')':
> +                parens -= 1
> +            elif in_def and text == ':' and parens == 1:
> +                annotated_func_arg = True
> +            elif parens and text == ',' and parens == 1:
> +                annotated_func_arg = False
> +            elif parens and text == '=' and not annotated_func_arg:
> +                no_space = True
> +                if start != prev_end:
> +                    yield (prev_end, message)
> +            if not parens:
> +                annotated_func_arg = False
> +
> +        prev_end = end
> +
> +
> +def whitespace_before_comment(logical_line, tokens):
> +    r"""Separate inline comments by at least two spaces.
> +
> +    An inline comment is a comment on the same line as a statement.  Inline
> +    comments should be separated by at least two spaces from the statement.
> +    They should start with a # and a single space.
> +
> +    Each line of a block comment starts with a # and a single space
> +    (unless it is indented text inside the comment).
> +
> +    Okay: x = x + 1  # Increment x
> +    Okay: x = x + 1    # Increment x
> +    Okay: # Block comment
> +    E261: x = x + 1 # Increment x
> +    E262: x = x + 1  #Increment x
> +    E262: x = x + 1  #  Increment x
> +    E265: #Block comment
> +    E266: ### Block comment
> +    """
> +    prev_end = (0, 0)
> +    for token_type, text, start, end, line in tokens:
> +        if token_type == tokenize.COMMENT:
> +            inline_comment = line[:start[1]].strip()
> +            if inline_comment:
> +                if prev_end[0] == start[0] and start[1] < prev_end[1] + 2:
> +                    yield (prev_end,
> +                           "E261 at least two spaces before inline comment")
> +            symbol, sp, comment = text.partition(' ')
> +            bad_prefix = symbol not in '#:' and (symbol.lstrip('#')[:1] or '#')
> +            if inline_comment:
> +                if bad_prefix or comment[:1] in WHITESPACE:
> +                    yield start, "E262 inline comment should start with '# '"
> +            elif bad_prefix and (bad_prefix != '!' or start[0] > 1):
> +                if bad_prefix != '#':
> +                    yield start, "E265 block comment should start with '# '"
> +                elif comment:
> +                    yield start, "E266 too many leading '#' for block comment"
> +        elif token_type != tokenize.NL:
> +            prev_end = end
> +
> +
> +def imports_on_separate_lines(logical_line):
> +    r"""Imports should usually be on separate lines.
> +
> +    Okay: import os\nimport sys
> +    E401: import sys, os
> +
> +    Okay: from subprocess import Popen, PIPE
> +    Okay: from myclas import MyClass
> +    Okay: from foo.bar.yourclass import YourClass
> +    Okay: import myclass
> +    Okay: import foo.bar.yourclass
> +    """
> +    line = logical_line
> +    if line.startswith('import '):
> +        found = line.find(',')
> +        if -1 < found and ';' not in line[:found]:
> +            yield found, "E401 multiple imports on one line"
> +
> +
> +def module_imports_on_top_of_file(
> +        logical_line, indent_level, checker_state, noqa):
> +    r"""Imports are always put at the top of the file, just after any module
> +    comments and docstrings, and before module globals and constants.
> +
> +    Okay: import os
> +    Okay: # this is a comment\nimport os
> +    Okay: '''this is a module docstring'''\nimport os
> +    Okay: r'''this is a module docstring'''\nimport os
> +    Okay: try:\n    import x\nexcept:\n    pass\nelse:\n    pass\nimport y
> +    Okay: try:\n    import x\nexcept:\n    pass\nfinally:\n    pass\nimport y
> +    E402: a=1\nimport os
> +    E402: 'One string'\n"Two string"\nimport os
> +    E402: a=1\nfrom sys import x
> +
> +    Okay: if x:\n    import os
> +    """
> +    def is_string_literal(line):
> +        if line[0] in 'uUbB':
> +            line = line[1:]
> +        if line and line[0] in 'rR':
> +            line = line[1:]
> +        return line and (line[0] == '"' or line[0] == "'")
> +
> +    allowed_try_keywords = ('try', 'except', 'else', 'finally')
> +
> +    if indent_level:  # Allow imports in conditional statements or functions
> +        return
> +    if not logical_line:  # Allow empty lines or comments
> +        return
> +    if noqa:
> +        return
> +    line = logical_line
> +    if line.startswith('import ') or line.startswith('from '):
> +        if checker_state.get('seen_non_imports', False):
> +            yield 0, "E402 module level import not at top of file"
> +    elif any(line.startswith(kw) for kw in allowed_try_keywords):
> +        # Allow try, except, else, finally keywords intermixed with imports in
> +        # order to support conditional importing
> +        return
> +    elif is_string_literal(line):
> +        # The first literal is a docstring, allow it. Otherwise, report error.
> +        if checker_state.get('seen_docstring', False):
> +            checker_state['seen_non_imports'] = True
> +        else:
> +            checker_state['seen_docstring'] = True
> +    else:
> +        checker_state['seen_non_imports'] = True
> +
> +
> +def compound_statements(logical_line):
> +    r"""Compound statements (on the same line) are generally discouraged.
> +
> +    While sometimes it's okay to put an if/for/while with a small body
> +    on the same line, never do this for multi-clause statements.
> +    Also avoid folding such long lines!
> +
> +    Always use a def statement instead of an assignment statement that
> +    binds a lambda expression directly to a name.
> +
> +    Okay: if foo == 'blah':\n    do_blah_thing()
> +    Okay: do_one()
> +    Okay: do_two()
> +    Okay: do_three()
> +
> +    E701: if foo == 'blah': do_blah_thing()
> +    E701: for x in lst: total += x
> +    E701: while t < 10: t = delay()
> +    E701: if foo == 'blah': do_blah_thing()
> +    E701: else: do_non_blah_thing()
> +    E701: try: something()
> +    E701: finally: cleanup()
> +    E701: if foo == 'blah': one(); two(); three()
> +    E702: do_one(); do_two(); do_three()
> +    E703: do_four();  # useless semicolon
> +    E704: def f(x): return 2*x
> +    E731: f = lambda x: 2*x
> +    """
> +    line = logical_line
> +    last_char = len(line) - 1
> +    found = line.find(':')
> +    while -1 < found < last_char:
> +        before = line[:found]
> +        if ((before.count('{') <= before.count('}') and   # {'a': 1} (dict)
> +             before.count('[') <= before.count(']') and   # [1:2] (slice)
> +             before.count('(') <= before.count(')'))):    # (annotation)
> +            lambda_kw = LAMBDA_REGEX.search(before)
> +            if lambda_kw:
> +                before = line[:lambda_kw.start()].rstrip()
> +                if before[-1:] == '=' and isidentifier(before[:-1].strip()):
> +                    yield 0, ("E731 do not assign a lambda expression, use a "
> +                              "def")
> +                break
> +            if before.startswith('def '):
> +                yield 0, "E704 multiple statements on one line (def)"
> +            else:
> +                yield found, "E701 multiple statements on one line (colon)"
> +        found = line.find(':', found + 1)
> +    found = line.find(';')
> +    while -1 < found:
> +        if found < last_char:
> +            yield found, "E702 multiple statements on one line (semicolon)"
> +        else:
> +            yield found, "E703 statement ends with a semicolon"
> +        found = line.find(';', found + 1)
> +
> +
> +def explicit_line_join(logical_line, tokens):
> +    r"""Avoid explicit line join between brackets.
> +
> +    The preferred way of wrapping long lines is by using Python's implied line
> +    continuation inside parentheses, brackets and braces.  Long lines can be
> +    broken over multiple lines by wrapping expressions in parentheses.  These
> +    should be used in preference to using a backslash for line continuation.
> +
> +    E502: aaa = [123, \\n       123]
> +    E502: aaa = ("bbb " \\n       "ccc")
> +
> +    Okay: aaa = [123,\n       123]
> +    Okay: aaa = ("bbb "\n       "ccc")
> +    Okay: aaa = "bbb " \\n    "ccc"
> +    Okay: aaa = 123  # \\
> +    """
> +    prev_start = prev_end = parens = 0
> +    comment = False
> +    backslash = None
> +    for token_type, text, start, end, line in tokens:
> +        if token_type == tokenize.COMMENT:
> +            comment = True
> +        if start[0] != prev_start and parens and backslash and not comment:
> +            yield backslash, "E502 the backslash is redundant between brackets"
> +        if end[0] != prev_end:
> +            if line.rstrip('\r\n').endswith('\\'):
> +                backslash = (end[0], len(line.splitlines()[-1]) - 1)
> +            else:
> +                backslash = None
> +            prev_start = prev_end = end[0]
> +        else:
> +            prev_start = start[0]
> +        if token_type == tokenize.OP:
> +            if text in '([{':
> +                parens += 1
> +            elif text in ')]}':
> +                parens -= 1
> +
> +
> +def break_around_binary_operator(logical_line, tokens):
> +    r"""
> +    Avoid breaks before binary operators.
> +
> +    The preferred place to break around a binary operator is after the
> +    operator, not before it.
> +
> +    W503: (width == 0\n + height == 0)
> +    W503: (width == 0\n and height == 0)
> +
> +    Okay: (width == 0 +\n height == 0)
> +    Okay: foo(\n    -x)
> +    Okay: foo(x\n    [])
> +    Okay: x = '''\n''' + ''
> +    Okay: foo(x,\n    -y)
> +    Okay: foo(x,  # comment\n    -y)
> +    """
> +    def is_binary_operator(token_type, text):
> +        # The % character is strictly speaking a binary operator, but the
> +        # common usage seems to be to put it next to the format parameters,
> +        # after a line break.
> +        return ((token_type == tokenize.OP or text in ['and', 'or']) and
> +                text not in "()[]{},:.;@=%")
> +
> +    line_break = False
> +    unary_context = True
> +    for token_type, text, start, end, line in tokens:
> +        if token_type == tokenize.COMMENT:
> +            continue
> +        if ('\n' in text or '\r' in text) and token_type != tokenize.STRING:
> +            line_break = True
> +        else:
> +            if (is_binary_operator(token_type, text) and line_break and
> +                    not unary_context):
> +                yield start, "W503 line break before binary operator"
> +            unary_context = text in '([{,;'
> +            line_break = False
> +
> +
> +def comparison_to_singleton(logical_line, noqa):
> +    r"""Comparison to singletons should use "is" or "is not".
> +
> +    Comparisons to singletons like None should always be done
> +    with "is" or "is not", never the equality operators.
> +
> +    Okay: if arg is not None:
> +    E711: if arg != None:
> +    E711: if None == arg:
> +    E712: if arg == True:
> +    E712: if False == arg:
> +
> +    Also, beware of writing if x when you really mean if x is not None --
> +    e.g. when testing whether a variable or argument that defaults to None was
> +    set to some other value.  The other value might have a type (such as a
> +    container) that could be false in a boolean context!
> +    """
> +    match = not noqa and COMPARE_SINGLETON_REGEX.search(logical_line)
> +    if match:
> +        singleton = match.group(1) or match.group(3)
> +        same = (match.group(2) == '==')
> +
> +        msg = "'if cond is %s:'" % (('' if same else 'not ') + singleton)
> +        if singleton in ('None',):
> +            code = 'E711'
> +        else:
> +            code = 'E712'
> +            nonzero = ((singleton == 'True' and same) or
> +                       (singleton == 'False' and not same))
> +            msg += " or 'if %scond:'" % ('' if nonzero else 'not ')
> +        yield match.start(2), ("%s comparison to %s should be %s" %
> +                               (code, singleton, msg))
> +
> +
> +def comparison_negative(logical_line):
> +    r"""Negative comparison should be done using "not in" and "is not".
> +
> +    Okay: if x not in y:\n    pass
> +    Okay: assert (X in Y or X is Z)
> +    Okay: if not (X in Y):\n    pass
> +    Okay: zz = x is not y
> +    E713: Z = not X in Y
> +    E713: if not X.B in Y:\n    pass
> +    E714: if not X is Y:\n    pass
> +    E714: Z = not X.B is Y
> +    """
> +    match = COMPARE_NEGATIVE_REGEX.search(logical_line)
> +    if match:
> +        pos = match.start(1)
> +        if match.group(2) == 'in':
> +            yield pos, "E713 test for membership should be 'not in'"
> +        else:
> +            yield pos, "E714 test for object identity should be 'is not'"
> +
> +
> +def comparison_type(logical_line, noqa):
> +    r"""Object type comparisons should always use isinstance().
> +
> +    Do not compare types directly.
> +
> +    Okay: if isinstance(obj, int):
> +    E721: if type(obj) is type(1):
> +
> +    When checking if an object is a string, keep in mind that it might be a
> +    unicode string too! In Python 2.3, str and unicode have a common base
> +    class, basestring, so you can do:
> +
> +    Okay: if isinstance(obj, basestring):
> +    Okay: if type(a1) is type(b1):
> +    """
> +    match = COMPARE_TYPE_REGEX.search(logical_line)
> +    if match and not noqa:
> +        inst = match.group(1)
> +        if inst and isidentifier(inst) and inst not in SINGLETONS:
> +            return  # Allow comparison for types which are not obvious
> +        yield match.start(), "E721 do not compare types, use 'isinstance()'"
> +
> +
> +def python_3000_has_key(logical_line, noqa):
> +    r"""The {}.has_key() method is removed in Python 3: use the 'in' operator.
> +
> +    Okay: if "alph" in d:\n    print d["alph"]
> +    W601: assert d.has_key('alph')
> +    """
> +    pos = logical_line.find('.has_key(')
> +    if pos > -1 and not noqa:
> +        yield pos, "W601 .has_key() is deprecated, use 'in'"
> +
> +
> +def python_3000_raise_comma(logical_line):
> +    r"""When raising an exception, use "raise ValueError('message')".
> +
> +    The older form is removed in Python 3.
> +
> +    Okay: raise DummyError("Message")
> +    W602: raise DummyError, "Message"
> +    """
> +    match = RAISE_COMMA_REGEX.match(logical_line)
> +    if match and not RERAISE_COMMA_REGEX.match(logical_line):
> +        yield match.end() - 1, "W602 deprecated form of raising exception"
> +
> +
> +def python_3000_not_equal(logical_line):
> +    r"""New code should always use != instead of <>.
> +
> +    The older syntax is removed in Python 3.
> +
> +    Okay: if a != 'no':
> +    W603: if a <> 'no':
> +    """
> +    pos = logical_line.find('<>')
> +    if pos > -1:
> +        yield pos, "W603 '<>' is deprecated, use '!='"
> +
> +
> +def python_3000_backticks(logical_line):
> +    r"""Backticks are removed in Python 3: use repr() instead.
> +
> +    Okay: val = repr(1 + 2)
> +    W604: val = `1 + 2`
> +    """
> +    pos = logical_line.find('`')
> +    if pos > -1:
> +        yield pos, "W604 backticks are deprecated, use 'repr()'"
> +
> +
> +##############################################################################
> +# Helper functions
> +##############################################################################
> +
> +
> +if sys.version_info < (3,):
> +    # Python 2: implicit encoding.
> +    def readlines(filename):
> +        """Read the source code."""
> +        with open(filename, 'rU') as f:
> +            return f.readlines()
> +    isidentifier = re.compile(r'[a-zA-Z_]\w*$').match
> +    stdin_get_value = sys.stdin.read
> +else:
> +    # Python 3
> +    def readlines(filename):
> +        """Read the source code."""
> +        try:
> +            with open(filename, 'rb') as f:
> +                (coding, lines) = tokenize.detect_encoding(f.readline)
> +                f = TextIOWrapper(f, coding, line_buffering=True)
> +                return [l.decode(coding) for l in lines] + f.readlines()
> +        except (LookupError, SyntaxError, UnicodeError):
> +            # Fall back if file encoding is improperly declared
> +            with open(filename, encoding='latin-1') as f:
> +                return f.readlines()
> +    isidentifier = str.isidentifier
> +
> +    def stdin_get_value():
> +        return TextIOWrapper(sys.stdin.buffer, errors='ignore').read()
> +noqa = re.compile(r'# no(?:qa|pep8)\b', re.I).search
> +
> +
> +def expand_indent(line):
> +    r"""Return the amount of indentation.
> +
> +    Tabs are expanded to the next multiple of 8.
> +
> +    >>> expand_indent('    ')
> +    4
> +    >>> expand_indent('\t')
> +    8
> +    >>> expand_indent('       \t')
> +    8
> +    >>> expand_indent('        \t')
> +    16
> +    """
> +    if '\t' not in line:
> +        return len(line) - len(line.lstrip())
> +    result = 0
> +    for char in line:
> +        if char == '\t':
> +            result = result // 8 * 8 + 8
> +        elif char == ' ':
> +            result += 1
> +        else:
> +            break
> +    return result
> +
> +
> +def mute_string(text):
> +    """Replace contents with 'xxx' to prevent syntax matching.
> +
> +    >>> mute_string('"abc"')
> +    '"xxx"'
> +    >>> mute_string("'''abc'''")
> +    "'''xxx'''"
> +    >>> mute_string("r'abc'")
> +    "r'xxx'"
> +    """
> +    # String modifiers (e.g. u or r)
> +    start = text.index(text[-1]) + 1
> +    end = len(text) - 1
> +    # Triple quotes
> +    if text[-3:] in ('"""', "'''"):
> +        start += 2
> +        end -= 2
> +    return text[:start] + 'x' * (end - start) + text[end:]
> +
> +
> +def parse_udiff(diff, patterns=None, parent='.'):
> +    """Return a dictionary of matching lines."""
> +    # For each file of the diff, the entry key is the filename,
> +    # and the value is a set of row numbers to consider.
> +    rv = {}
> +    path = nrows = None
> +    for line in diff.splitlines():
> +        if nrows:
> +            if line[:1] != '-':
> +                nrows -= 1
> +            continue
> +        if line[:3] == '@@ ':
> +            hunk_match = HUNK_REGEX.match(line)
> +            (row, nrows) = [int(g or '1') for g in hunk_match.groups()]
> +            rv[path].update(range(row, row + nrows))
> +        elif line[:3] == '+++':
> +            path = line[4:].split('\t', 1)[0]
> +            if path[:2] == 'b/':
> +                path = path[2:]
> +            rv[path] = set()
> +    return dict([(os.path.join(parent, path), rows)
> +                 for (path, rows) in rv.items()
> +                 if rows and filename_match(path, patterns)])
> +
> +
> +def normalize_paths(value, parent=os.curdir):
> +    """Parse a comma-separated list of paths.
> +
> +    Return a list of absolute paths.
> +    """
> +    if not value:
> +        return []
> +    if isinstance(value, list):
> +        return value
> +    paths = []
> +    for path in value.split(','):
> +        path = path.strip()
> +        if '/' in path:
> +            path = os.path.abspath(os.path.join(parent, path))
> +        paths.append(path.rstrip('/'))
> +    return paths
> +
> +
> +def filename_match(filename, patterns, default=True):
> +    """Check if patterns contains a pattern that matches filename.
> +
> +    If patterns is unspecified, this always returns True.
> +    """
> +    if not patterns:
> +        return default
> +    return any(fnmatch(filename, pattern) for pattern in patterns)
> +
> +
> +def _is_eol_token(token):
> +    return token[0] in NEWLINE or token[4][token[3][1]:].lstrip() == '\\\n'
> +if COMMENT_WITH_NL:
> +    def _is_eol_token(token, _eol_token=_is_eol_token):
> +        return _eol_token(token) or (token[0] == tokenize.COMMENT and
> +                                     token[1] == token[4])
> +
> +##############################################################################
> +# Framework to run all checks
> +##############################################################################
> +
> +
> +_checks = {'physical_line': {}, 'logical_line': {}, 'tree': {}}
> +
> +
> +def _get_parameters(function):
> +    if sys.version_info >= (3, 3):
> +        return [parameter.name
> +                for parameter
> +                in inspect.signature(function).parameters.values()
> +                if parameter.kind == parameter.POSITIONAL_OR_KEYWORD]
> +    else:
> +        return inspect.getargspec(function)[0]
> +
> +
> +def register_check(check, codes=None):
> +    """Register a new check object."""
> +    def _add_check(check, kind, codes, args):
> +        if check in _checks[kind]:
> +            _checks[kind][check][0].extend(codes or [])
> +        else:
> +            _checks[kind][check] = (codes or [''], args)
> +    if inspect.isfunction(check):
> +        args = _get_parameters(check)
> +        if args and args[0] in ('physical_line', 'logical_line'):
> +            if codes is None:
> +                codes = ERRORCODE_REGEX.findall(check.__doc__ or '')
> +            _add_check(check, args[0], codes, args)
> +    elif inspect.isclass(check):
> +        if _get_parameters(check.__init__)[:2] == ['self', 'tree']:
> +            _add_check(check, 'tree', codes, None)
> +
> +
> +def init_checks_registry():
> +    """Register all globally visible functions.
> +
> +    The first argument name is either 'physical_line' or 'logical_line'.
> +    """
> +    mod = inspect.getmodule(register_check)
> +    for (name, function) in inspect.getmembers(mod, inspect.isfunction):
> +        register_check(function)
> +init_checks_registry()
> +
> +
> +class Checker(object):
> +    """Load a Python source file, tokenize it, check coding style."""
> +
> +    def __init__(self, filename=None, lines=None,
> +                 options=None, report=None, **kwargs):
> +        if options is None:
> +            options = StyleGuide(kwargs).options
> +        else:
> +            assert not kwargs
> +        self._io_error = None
> +        self._physical_checks = options.physical_checks
> +        self._logical_checks = options.logical_checks
> +        self._ast_checks = options.ast_checks
> +        self.max_line_length = options.max_line_length
> +        self.multiline = False  # in a multiline string?
> +        self.hang_closing = options.hang_closing
> +        self.verbose = options.verbose
> +        self.filename = filename
> +        # Dictionary where a checker can store its custom state.
> +        self._checker_states = {}
> +        if filename is None:
> +            self.filename = 'stdin'
> +            self.lines = lines or []
> +        elif filename == '-':
> +            self.filename = 'stdin'
> +            self.lines = stdin_get_value().splitlines(True)
> +        elif lines is None:
> +            try:
> +                self.lines = readlines(filename)
> +            except IOError:
> +                (exc_type, exc) = sys.exc_info()[:2]
> +                self._io_error = '%s: %s' % (exc_type.__name__, exc)
> +                self.lines = []
> +        else:
> +            self.lines = lines
> +        if self.lines:
> +            ord0 = ord(self.lines[0][0])
> +            if ord0 in (0xef, 0xfeff):  # Strip the UTF-8 BOM
> +                if ord0 == 0xfeff:
> +                    self.lines[0] = self.lines[0][1:]
> +                elif self.lines[0][:3] == '\xef\xbb\xbf':
> +                    self.lines[0] = self.lines[0][3:]
> +        self.report = report or options.report
> +        self.report_error = self.report.error
> +
> +    def report_invalid_syntax(self):
> +        """Check if the syntax is valid."""
> +        (exc_type, exc) = sys.exc_info()[:2]
> +        if len(exc.args) > 1:
> +            offset = exc.args[1]
> +            if len(offset) > 2:
> +                offset = offset[1:3]
> +        else:
> +            offset = (1, 0)
> +        self.report_error(offset[0], offset[1] or 0,
> +                          'E901 %s: %s' % (exc_type.__name__, exc.args[0]),
> +                          self.report_invalid_syntax)
> +
> +    def readline(self):
> +        """Get the next line from the input buffer."""
> +        if self.line_number >= self.total_lines:
> +            return ''
> +        line = self.lines[self.line_number]
> +        self.line_number += 1
> +        if self.indent_char is None and line[:1] in WHITESPACE:
> +            self.indent_char = line[0]
> +        return line
> +
> +    def run_check(self, check, argument_names):
> +        """Run a check plugin."""
> +        arguments = []
> +        for name in argument_names:
> +            arguments.append(getattr(self, name))
> +        return check(*arguments)
> +
> +    def init_checker_state(self, name, argument_names):
> +        """ Prepares a custom state for the specific checker plugin."""
> +        if 'checker_state' in argument_names:
> +            self.checker_state = self._checker_states.setdefault(name, {})
> +
> +    def check_physical(self, line):
> +        """Run all physical checks on a raw input line."""
> +        self.physical_line = line
> +        for name, check, argument_names in self._physical_checks:
> +            self.init_checker_state(name, argument_names)
> +            result = self.run_check(check, argument_names)
> +            if result is not None:
> +                (offset, text) = result
> +                self.report_error(self.line_number, offset, text, check)
> +                if text[:4] == 'E101':
> +                    self.indent_char = line[0]
> +
> +    def build_tokens_line(self):
> +        """Build a logical line from tokens."""
> +        logical = []
> +        comments = []
> +        length = 0
> +        prev_row = prev_col = mapping = None
> +        for token_type, text, start, end, line in self.tokens:
> +            if token_type in SKIP_TOKENS:
> +                continue
> +            if not mapping:
> +                mapping = [(0, start)]
> +            if token_type == tokenize.COMMENT:
> +                comments.append(text)
> +                continue
> +            if token_type == tokenize.STRING:
> +                text = mute_string(text)
> +            if prev_row:
> +                (start_row, start_col) = start
> +                if prev_row != start_row:    # different row
> +                    prev_text = self.lines[prev_row - 1][prev_col - 1]
> +                    if prev_text == ',' or (prev_text not in '{[(' and
> +                                            text not in '}])'):
> +                        text = ' ' + text
> +                elif prev_col != start_col:  # different column
> +                    text = line[prev_col:start_col] + text
> +            logical.append(text)
> +            length += len(text)
> +            mapping.append((length, end))
> +            (prev_row, prev_col) = end
> +        self.logical_line = ''.join(logical)
> +        self.noqa = comments and noqa(''.join(comments))
> +        return mapping
> +
> +    def check_logical(self):
> +        """Build a line from tokens and run all logical checks on it."""
> +        self.report.increment_logical_line()
> +        mapping = self.build_tokens_line()
> +
> +        if not mapping:
> +            return
> +
> +        (start_row, start_col) = mapping[0][1]
> +        start_line = self.lines[start_row - 1]
> +        self.indent_level = expand_indent(start_line[:start_col])
> +        if self.blank_before < self.blank_lines:
> +            self.blank_before = self.blank_lines
> +        if self.verbose >= 2:
> +            print(self.logical_line[:80].rstrip())
> +        for name, check, argument_names in self._logical_checks:
> +            if self.verbose >= 4:
> +                print('   ' + name)
> +            self.init_checker_state(name, argument_names)
> +            for offset, text in self.run_check(check, argument_names) or ():
> +                if not isinstance(offset, tuple):
> +                    for token_offset, pos in mapping:
> +                        if offset <= token_offset:
> +                            break
> +                    offset = (pos[0], pos[1] + offset - token_offset)
> +                self.report_error(offset[0], offset[1], text, check)
> +        if self.logical_line:
> +            self.previous_indent_level = self.indent_level
> +            self.previous_logical = self.logical_line
> +        self.blank_lines = 0
> +        self.tokens = []
> +
> +    def check_ast(self):
> +        """Build the file's AST and run all AST checks."""
> +        try:
> +            tree = compile(''.join(self.lines), '', 'exec', PyCF_ONLY_AST)
> +        except (ValueError, SyntaxError, TypeError):
> +            return self.report_invalid_syntax()
> +        for name, cls, __ in self._a
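
One point in the quoted patch may be easier to see with a small example: the
comparison_to_singleton docstring warns against writing "if x" when
"if x is not None" is meant. The following is only a sketch of that pitfall;
the function name describe and its items parameter are hypothetical and do
not appear in pep8.py.

```python
def describe(items=None):
    """Report whether the caller supplied `items` (hypothetical helper)."""
    # Truthiness test: an empty list is falsy, so it looks "not supplied".
    truthy = "supplied" if items else "missing"
    # Identity test: only the None default counts as "not supplied".
    identity = "supplied" if items is not None else "missing"
    return truthy, identity


print(describe())      # argument omitted
print(describe([]))    # empty list passed explicitly
print(describe([1]))   # non-empty list
```

The two tests disagree only for the explicitly passed empty list, which is
the case the docstring describes.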

From nad at python.org  Fri Apr  1 16:16:28 2016
From: nad at python.org (Ned Deily)
Date: Fri, 1 Apr 2016 16:16:28 -0400
Subject: [Python-Dev] [Python-checkins] cpython: Python 8: no pep8,
 no chocolate!
In-Reply-To: <CAP1=2W5wn71aFSkRPOQXcPB-MWK3MO=yb-b-F=iXc=Ci7R-o2A@mail.gmail.com>
References: <20160331214027.11092.50943.0083A2D0@psf.io>
 <CAP1=2W5wn71aFSkRPOQXcPB-MWK3MO=yb-b-F=iXc=Ci7R-o2A@mail.gmail.com>
Message-ID: <7B294850-BFD9-48D3-8E12-0FA3312136BB@python.org>

On Apr 1, 2016, at 14:07, Brett Cannon <brett at snarky.ca> wrote:
> Are you planning on removing this after today? My worry about leaving it in is if it's a modified copy that follows your Python 8 April Fools joke then it will quite possibly trip people up who try and run pep8 but don't have it installed, leading them to wonder why the heck their imports are now all flagged as broken.
> 
> On Thu, 31 Mar 2016 at 14:40 victor.stinner <python-checkins at python.org> wrote:
> https://hg.python.org/cpython/rev/9aedec2dbc01
> changeset:   100818:9aedec2dbc01
> user:        Victor Stinner <victor.stinner at gmail.com>
> date:        Thu Mar 31 23:30:53 2016 +0200
> summary:
>   Python 8: no pep8, no chocolate!
> 
> files:
>   Include/patchlevel.h |     6 +-
>   Lib/pep8.py          |  2151 ++++++++++++++++++++++++++++++
>   Lib/site.py          |    56 +
>   3 files changed, 2210 insertions(+), 3 deletions(-)
[...]

It has already been removed, a few hours after it was pushed, since it broke all of the 3.x buildbots, and would have confused and/or added extra work to anyone trying to build or push changes.

On behalf of my fellow release managers, may I suggest that, in the future, if anyone feels the urge to check something like this in to the live cpython repository, please resist that urge? :)  A patch would be just as amusing without the need to use the soft cushion or the comfy chair.

Inquisitorly yours,
--Ned

--
  Ned Deily
  nad at python.org -- []


From greg.ewing at canterbury.ac.nz  Fri Apr  1 20:16:06 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 02 Apr 2016 13:16:06 +1300
Subject: [Python-Dev] Which version is better? Phyton 27 or Phyton 35?
In-Reply-To: <CAPTjJmoiSK_GOddX-LdbHo+0mVMWNmwd817-xYptkWv83=PciA@mail.gmail.com>
References: <CAF0TK51emEGepcvJPr4xOpaQMD9tx4w1TsNBb7B1bc3J-pg0ug@mail.gmail.com>
 <CAPTjJmoiSK_GOddX-LdbHo+0mVMWNmwd817-xYptkWv83=PciA@mail.gmail.com>
Message-ID: <56FF0F46.2010603@canterbury.ac.nz>

Chris Angelico wrote:
> In this era of international foods in every supermarket aisle,
> you cannot simply dismiss the black marks as "funny spots" and wish
> they'd just go away; you MUST have a fungicide which can adequately
> handle them.

At least there's a standard for the spots now. It used to
be a real mess -- Japanese plants had yellow spots, Chinese
ones had red spots, all the European countries had their
own slightly different variations on the spots, and you
had to keep a dozen different fungicides in your shed for
treating them.

But now, fortunately, more and more growers are producing
plants with the standard spots, and Phyton 35 is widely
acknowledged as being one of the best fungicides for dealing
with them. (Except for one person who seems to have his
own inscrutable ideas on what should be done with spots.)

-- 
Greg

From victor.stinner at gmail.com  Sat Apr  2 03:53:50 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 2 Apr 2016 09:53:50 +0200
Subject: [Python-Dev] Not receiving bug tracker emails
In-Reply-To: <CA+eR4cFJ4+ANk6=ROKOvE-k-8iFWPOUG20bMZ1_pofcqD99kkA@mail.gmail.com>
References: <CA+eR4cFgk+UmB-3TZP_m-Ng=hx9gYrdOcY_gHPGJfnheDDaKmw@mail.gmail.com>
 <CAMpsgwa9=+KWndpioGFxQH29txsxgfaH5Taq4YZ57oTYGreu=A@mail.gmail.com>
 <ndfn1c$hu1$1@ger.gmane.org>
 <20160330133041.72218B14159@webabinitio.net>
 <CA+eR4cFJ4+ANk6=ROKOvE-k-8iFWPOUG20bMZ1_pofcqD99kkA@mail.gmail.com>
Message-ID: <CAMpsgwb33n7-4KaGgX7jNcNMqWwZKuFacbzGfccQDynpHTZXaA@mail.gmail.com>

Any progress on the issue?

Victor

On Thursday, 31 March 2016, Martin Panter <vadmium+py at gmail.com> wrote:

> On 30 March 2016 at 13:30, R. David Murray <rdmurray at bitdance.com> wrote:
> > Anyone know how to find out what changed from Google's POV?  As far as
> > we know nothing changed at the bugs end, but it is certainly possible
> > that something did change in the hosting infrastructure without our
> > knowledge.  Knowing what is setting google off would help track it down,
> > if so...or perhaps something changed at the google end, in which case we
> > *really* need to know what.
>
> My only guess is that Google decided to get stricter regarding
> something mentioned in
> <http://psf.upfronthosting.co.za/roundup/meta/issue562>, maybe
> something in its sending guidelines. Perhaps to do with IPv6 DNS
> <http://psf.upfronthosting.co.za/roundup/meta/issue568>.
>
> FYI I am now working around the problem for myself by pointing my
> bugs.python.org account at a Yahoo email address, and setting up Yahoo
> to forward all emails to my G Mail address.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com
>

From brett at python.org  Sat Apr  2 11:42:57 2016
From: brett at python.org (Brett Cannon)
Date: Sat, 02 Apr 2016 15:42:57 +0000
Subject: [Python-Dev] Not receiving bug tracker emails
In-Reply-To: <CAMpsgwb33n7-4KaGgX7jNcNMqWwZKuFacbzGfccQDynpHTZXaA@mail.gmail.com>
References: <CA+eR4cFgk+UmB-3TZP_m-Ng=hx9gYrdOcY_gHPGJfnheDDaKmw@mail.gmail.com>
 <CAMpsgwa9=+KWndpioGFxQH29txsxgfaH5Taq4YZ57oTYGreu=A@mail.gmail.com>
 <ndfn1c$hu1$1@ger.gmane.org> <20160330133041.72218B14159@webabinitio.net>
 <CA+eR4cFJ4+ANk6=ROKOvE-k-8iFWPOUG20bMZ1_pofcqD99kkA@mail.gmail.com>
 <CAMpsgwb33n7-4KaGgX7jNcNMqWwZKuFacbzGfccQDynpHTZXaA@mail.gmail.com>
Message-ID: <CAP1=2W7xM87boqeUDgT=N69G-m4cFAY2z2d766zCbHXSgwhHEw@mail.gmail.com>

This is probably the wrong place to be posting as there's an issue tracker
for the issue tracker.

Anyways this might be a solution:
http://psf.upfronthosting.co.za/roundup/meta/issue568

On Sat, Apr 2, 2016, 00:54 Victor Stinner <victor.stinner at gmail.com> wrote:

> Any progress on the issue?
>
> Victor
>
>
> On Thursday, 31 March 2016, Martin Panter <vadmium+py at gmail.com> wrote:
>
>> On 30 March 2016 at 13:30, R. David Murray <rdmurray at bitdance.com> wrote:
>> > Anyone know how to find out what changed from Google's POV?  As far as
>> > we know nothing changed at the bugs end, but it is certainly possible
>> > that something did change in the hosting infrastructure without our
>> > knowledge.  Knowing what is setting google off would help track it down,
>> > if so...or perhaps something changed at the google end, in which case we
>> > *really* need to know what.
>>
>> My only guess is that Google decided to get stricter regarding
>> something mentioned in
>> <http://psf.upfronthosting.co.za/roundup/meta/issue562>, maybe
>> something in its sending guidelines. Perhaps to do with IPv6 DNS
>> <http://psf.upfronthosting.co.za/roundup/meta/issue568>.
>>
>> FYI I am now working around the problem for myself by pointing my
>> bugs.python.org account at a Yahoo email address, and setting up Yahoo
>> to forward all emails to my G Mail address.

From storchaka at gmail.com  Sun Apr  3 03:32:19 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sun, 3 Apr 2016 10:32:19 +0300
Subject: [Python-Dev] Py_SETREF vs. Py_XSETREF
Message-ID: <ndqgu4$p9d$1@ger.gmane.org>

Originally I proposed a pair of macros for safely replacing references, to 
reflect the duality of Py_DECREF/Py_XDECREF. [1], [2]  One should 
use Py_DECREF and the other should use Py_XDECREF.

But then I got a number of voices for a single name [3], and not a single 
voice (except mine) for the pair of names. Thus in the final patches the 
single name Py_SETREF, which uses Py_XDECREF, is used. Because it adds some 
overhead compared with using Py_DECREF, this macro is not used in 
performance-critical code such as PyDict_SetItem().

Now Raymond says that we should have separate Py_SETREF/Py_XSETREF names 
to avoid any overhead. [4]  And so I'm raising this issue on Python-Dev.

Should we rename Py_SETREF to Py_XSETREF and introduce new Py_SETREF 
that uses Py_DECREF?
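
To make the distinction concrete, here is a toy Python model of the two proposed macros. It is only a sketch: the real macros are C, and the names Obj, Slot, decref, xdecref, setref and xsetref are invented here for illustration, with None standing in for NULL. The point it shows is that both variants store the new value before dropping the old reference, and that only the "X" variant pays for a NULL check.

```python
class Obj:
    """Toy stand-in for a PyObject with an explicit reference count."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1


class Slot:
    """Toy stand-in for a C variable that owns one reference (or None)."""

    def __init__(self, value=None):
        self.value = value


def decref(obj):
    # Models Py_DECREF: the caller guarantees obj is not None (NULL).
    obj.refcnt -= 1


def xdecref(obj):
    # Models Py_XDECREF: tolerates None (NULL) at the cost of one check.
    if obj is not None:
        decref(obj)


def setref(slot, new):
    # Models the proposed Py_SETREF: store the new value first, then
    # release the old one, which must not be None.
    old = slot.value
    slot.value = new
    decref(old)


def xsetref(slot, new):
    # Models the proposed Py_XSETREF: same ordering, but the old value
    # may be None.
    old = slot.value
    slot.value = new
    xdecref(old)
```

In this model, calling setref() on an empty Slot fails with an AttributeError, which plays the role of the crash that Py_DECREF on a NULL pointer would cause in C.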

[1] http://comments.gmane.org/gmane.comp.python.devel/145346
[2] http://comments.gmane.org/gmane.comp.python.devel/145974
[3] http://bugs.python.org/issue26200#msg259784
[4] http://bugs.python.org/issue26200


From python at mrabarnett.plus.com  Sun Apr  3 09:29:31 2016
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 3 Apr 2016 14:29:31 +0100
Subject: [Python-Dev] Py_SETREF vs. Py_XSETREF
In-Reply-To: <ndqgu4$p9d$1@ger.gmane.org>
References: <ndqgu4$p9d$1@ger.gmane.org>
Message-ID: <57011ABB.8070509@mrabarnett.plus.com>

On 2016-04-03 08:32, Serhiy Storchaka wrote:
> Originally I proposed a pair of macros for safe reference replacement, to
> reflect the duality of Py_DECREF/Py_XDECREF. [1], [2]  One should
> use Py_DECREF and the other should use Py_XDECREF.
>
> But then I got a number of voices for the single name [3], and not one
> voice (except mine) for the pair of names. Thus in the final patches the
> single name Py_SETREF, which uses Py_XDECREF, is used. Because this adds some
> overhead in comparison with using Py_DECREF, the macro is not used in
> performance-critical code such as PyDict_SetItem().
>
> Now Raymond says that we should have separate Py_SETREF/Py_XSETREF names
> to avoid any overhead. [4]  And so I'm raising this issue on Python-Dev.
>
> Should we rename Py_SETREF to Py_XSETREF and introduce a new Py_SETREF
> that uses Py_DECREF?
>
> [1] http://comments.gmane.org/gmane.comp.python.devel/145346
> [2] http://comments.gmane.org/gmane.comp.python.devel/145974
> [3] http://bugs.python.org/issue26200#msg259784
> [4] http://bugs.python.org/issue26200
>
Checking for NULL is convenient (and safer), but, on the other hand, it 
_would_ be consistent with the others.

From arigo at tunes.org  Sun Apr  3 10:00:39 2016
From: arigo at tunes.org (Armin Rigo)
Date: Sun, 3 Apr 2016 16:00:39 +0200
Subject: [Python-Dev] Py_SETREF vs. Py_XSETREF
In-Reply-To: <57011ABB.8070509@mrabarnett.plus.com>
References: <ndqgu4$p9d$1@ger.gmane.org> <57011ABB.8070509@mrabarnett.plus.com>
Message-ID: <CAMSv6X2Ee5kOE9gZAVGGJeBHcgadfSjDXkLUZJeE-E+q26Usaw@mail.gmail.com>

Hi,

On 3 April 2016 at 15:29, MRAB <python at mrabarnett.plus.com> wrote:
>> Should we rename Py_SETREF to Py_XSETREF and introduce a new Py_SETREF
>> that uses Py_DECREF?
>
> Checking for NULL is convenient (and safer), but, on the other hand, it
> _would_ be consistent with the others.

My 2 cents would be to call the new macro Py_XSETREF for consistency,
at least, whether you decide to go with two macros or not.  Otherwise
it's kind of obvious that if you add Py_SETREF that checks for nulls,
in 2 or 3 releases people will really want a "fast" variant anyway,
and there will be no consistent name for that.


A bientôt,

Armin.

From jeog.dev at gmail.com  Sun Apr  3 14:31:12 2016
From: jeog.dev at gmail.com (J.E. Ogden)
Date: Sun, 3 Apr 2016 14:31:12 -0400
Subject: [Python-Dev] review/proof docs about private memory heap
Message-ID: <CAK-_zEtjtEqWervev3yJMBWTe=ihRtgBZceNhGAZDu6o2N41aA@mail.gmail.com>

After digging through obmalloc.c to optimize some memory-intensive code, I
put a paper together on the entire private memory heap that may or may not
be a useful addition to the docs.

I was hoping someone could review/proof it for errors in content.

Not sure of the policy on links, but I've uploaded it to Google Drive:
https://drive.google.com/open?id=0B6IkX5KnPHVLamwxSTNYR3dJYkE

thanks,
jon

From ncoghlan at gmail.com  Mon Apr  4 05:09:56 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 4 Apr 2016 19:09:56 +1000
Subject: [Python-Dev] Py_SETREF vs. Py_XSETREF
In-Reply-To: <ndqgu4$p9d$1@ger.gmane.org>
References: <ndqgu4$p9d$1@ger.gmane.org>
Message-ID: <CADiSq7fne71NdaWeWupRVFGWPq09Jw5uhkLNuDdmVYizpWZGwA@mail.gmail.com>

On 3 April 2016 at 17:32, Serhiy Storchaka <storchaka at gmail.com> wrote:
> Originally I proposed a pair of macros for safe reference replacement, to
> reflect the duality of Py_DECREF/Py_XDECREF. [1], [2]  One should use
> Py_DECREF and the other should use Py_XDECREF.
>
> But then I got a number of voices for the single name [3], and not one voice
> (except mine) for the pair of names. Thus in the final patches the single name
> Py_SETREF, which uses Py_XDECREF, is used. Because this adds some overhead in
> comparison with using Py_DECREF, the macro is not used in
> performance-critical code such as PyDict_SetItem().

I was one of those arguing for the single macro, and I think Alexander
raises a good point in http://bugs.python.org/issue26200#msg262204
that I don't recall seeing in the original discussion: the "X" in the
macro serves as a good shorthand for indicating that the code in
question isn't closely tracking whether or not the manipulated reference
might be NULL, and hence may be a good candidate for additional
micro-optimisations that keep better track of whether or not the
pointer is NULL.

> Should we rename Py_SETREF to Py_XSETREF and introduce a new Py_SETREF that
> uses Py_DECREF?

With the single-macro design put into effect and concrete problems
arising from that, I'm now more persuaded by the consistency argument
than I was originally, so +1 from me for reverting to your original
dual-macro proposal.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From victor.stinner at gmail.com  Mon Apr  4 05:35:41 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 4 Apr 2016 11:35:41 +0200
Subject: [Python-Dev] Py_SETREF vs. Py_XSETREF
In-Reply-To: <CADiSq7fne71NdaWeWupRVFGWPq09Jw5uhkLNuDdmVYizpWZGwA@mail.gmail.com>
References: <ndqgu4$p9d$1@ger.gmane.org>
 <CADiSq7fne71NdaWeWupRVFGWPq09Jw5uhkLNuDdmVYizpWZGwA@mail.gmail.com>
Message-ID: <CAMpsgwbKnj+U0RK-VettgwBEWPMS-Zuq1bh3hXLHnnNPcbvVeA@mail.gmail.com>

If some devs don't want to use the single macro, for good or bad reasons,
it's maybe better to have two macros to generalize their usage. The macro
makes the C code shorter and easier to review.

Victor

From robertc at robertcollins.net  Mon Apr  4 06:04:58 2016
From: robertc at robertcollins.net (Robert Collins)
Date: Mon, 4 Apr 2016 22:04:58 +1200
Subject: [Python-Dev] thoughts on backporting __wrapped__ to 2.7?
Message-ID: <CAJ3HoZ38C2rs0EF+pW3OKXcJfwBwscYi2ijG+rTEqUiqui5xYQ@mail.gmail.com>

I'm working on teaching funcsigs - the backport of inspect.signature -
better handling for wrapped functions, and the key enabler to do that
is capturing the wrapped function in __wrapped__. I'm wondering what
folks' thoughts are on backporting that to 2.7 - seems cleaner than
monkeypatching functools.wraps, which would tend to be subject to
import ordering races and general ick. I'll likely prep such a
monkeypatch for folk that are stuck on older versions of 2.7 anyhow...
so it's not a huge win...
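
As a rough illustration of what "capturing the wrapped function in __wrapped__" means without touching functools.wraps itself, here is a minimal sketch. The helper name wraps_with_wrapped and the example decorator are hypothetical, not part of funcsigs or the stdlib; on Python 3.2+ the extra assignment is redundant because functools.wraps already sets the attribute.

```python
import functools


def wraps_with_wrapped(wrapped):
    """Like functools.wraps, but also records the wrapped function.

    Hypothetical helper for illustration only.
    """
    def decorator(wrapper):
        wrapper = functools.wraps(wrapped)(wrapper)
        # Assign after the metadata copy so that nothing copied from
        # wrapped.__dict__ can overwrite this attribute.
        wrapper.__wrapped__ = wrapped
        return wrapper
    return decorator


def passthrough(func):
    """Example decorator that simply forwards the call."""
    @wraps_with_wrapped(func)
    def inner(*args, **kwargs):
        return func(*args, **kwargs)
    return inner


@passthrough
def add(a, b):
    return a + b
```

A signature-inspection tool can then follow add.__wrapped__ to reach the undecorated function and report its real parameters instead of (*args, **kwargs).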

-Rob

-- 
Robert Collins <rbtcollins at hpe.com>
Distinguished Technologist
HP Converged Cloud

From ncoghlan at gmail.com  Mon Apr  4 09:24:25 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 4 Apr 2016 23:24:25 +1000
Subject: [Python-Dev] thoughts on backporting __wrapped__ to 2.7?
In-Reply-To: <CAJ3HoZ38C2rs0EF+pW3OKXcJfwBwscYi2ijG+rTEqUiqui5xYQ@mail.gmail.com>
References: <CAJ3HoZ38C2rs0EF+pW3OKXcJfwBwscYi2ijG+rTEqUiqui5xYQ@mail.gmail.com>
Message-ID: <CADiSq7dBiEoBRh=EPX82wfztmjW07PqKmDQ7bjsLeX+d4_Mn=A@mail.gmail.com>

On 4 April 2016 at 20:04, Robert Collins <robertc at robertcollins.net> wrote:
> I'm working on teaching funcsigs - the backport of inspect.signature -
> better handling for wrapped functions, and the key enabler to do that
> is capturing the wrapped function in __wrapped__. I'm wondering what
> folks' thoughts are on backporting that to 2.7 - seems cleaner than
> monkeypatching functools.wraps, which would tend to be subject to
> import ordering races and general ick. I'll likely prep such a
> monkeypatch for folk that are stuck on older versions of 2.7 anyhow...
> so it's not a huge win...

Right, the baseline there is really 2.7.5 + selected backports, and
the backport set is small for RHEL 7.x, and even smaller for Debian
stable and Ubuntu LTS. Even getting the network security enhancements
backported has proven to be challenging - other feature updates have
next to no chance.

Given that, I don't see a compelling reason to change the existing
policy - the "no new features in point releases" restriction only gets
waived in cases that have implications beyond the Python 2.7 process
itself (which pretty much restricts potential waivers to network
security enhancements).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From guido at python.org  Mon Apr  4 11:47:01 2016
From: guido at python.org (Guido van Rossum)
Date: Mon, 4 Apr 2016 08:47:01 -0700
Subject: [Python-Dev] Py_SETREF vs. Py_XSETREF
In-Reply-To: <CAMpsgwbKnj+U0RK-VettgwBEWPMS-Zuq1bh3hXLHnnNPcbvVeA@mail.gmail.com>
References: <ndqgu4$p9d$1@ger.gmane.org>
 <CADiSq7fne71NdaWeWupRVFGWPq09Jw5uhkLNuDdmVYizpWZGwA@mail.gmail.com>
 <CAMpsgwbKnj+U0RK-VettgwBEWPMS-Zuq1bh3hXLHnnNPcbvVeA@mail.gmail.com>
Message-ID: <CAP7+vJLzEKfs1n6SNCsO745ZK8iwZt6++kr_r01bkS-Q8gznUA@mail.gmail.com>

Agreed, let's go with two macros. The time discussing this further
could be spent more productively.

On Mon, Apr 4, 2016 at 2:35 AM, Victor Stinner <victor.stinner at gmail.com> wrote:
> If some devs don't want to use the single macro, for good or bad reasons, it's
> maybe better to have two macros to generalize their usage. The macro makes
> the C code shorter and easier to review.
>
> Victor
>
>



-- 
--Guido van Rossum (python.org/~guido)

From tjreedy at udel.edu  Mon Apr  4 17:05:23 2016
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 4 Apr 2016 17:05:23 -0400
Subject: [Python-Dev] Not receiving bug tracker emails
In-Reply-To: <CA+eR4cFgk+UmB-3TZP_m-Ng=hx9gYrdOcY_gHPGJfnheDDaKmw@mail.gmail.com>
References: <CA+eR4cFgk+UmB-3TZP_m-Ng=hx9gYrdOcY_gHPGJfnheDDaKmw@mail.gmail.com>
Message-ID: <ndukuq$or3$1@ger.gmane.org>

On 3/29/2016 7:30 PM, Martin Panter wrote:
> For the last ~36 hours I have stopped receiving emails for messages
> posted in the bug tracker. Is anyone else having this problem? Has
> anything changed recently?

My udel dot edu account is handled by google.  I am also not getting
anything at all, not even in spam, since at least 3/31 when I was added
to https://bugs.python.org/issue26673.  I only discovered it in the
Friday weekly New Issues report.  More emails were missing on Friday.
The problem continues.  I just added a question to
https://bugs.python.org/issue19944 and got nothing.

> I have had it set to send to my gmail.com address since the beginning.
> At the moment the last bug message email is
> <https://bugs.python.org/issue19959#msg262569> with "Date: Mon, 28 Mar
> 2016 12:19:49 +0000". I have checked spam and they are not going
> there.

Since at least last summer, Rietveld reviews have consistently gone to
Junk.  Normal tracker emails sometimes went to Inbox, sometimes to Junk.
Since normal emails (but not reviews, unfortunately) are tagged in the
subject line, I added a rule to Thunderbird to move tracker email to
Inbox when I open Junk.  This is no longer happening, as they do not even
get to Junk.

I tried changing my tracker email to verizon.net and posted a message on
an issue where I am the only nosy person.  After half an hour, nothing.
I am not surprised, as Verizon rarely delivers anything it considers
junk.  I had this confirmed by a game site that said that its emails are
deleted unless one contacts Verizon to whitelist their site.  I will see
if I can again find the page to do that.

I do get checkins and core-mentorship mail.  I have not seen anything on
core-developers since the discussion of new commit privileges a month ago.

-- 
Terry Jan Reedy



From brett at python.org  Mon Apr  4 17:13:15 2016
From: brett at python.org (Brett Cannon)
Date: Mon, 04 Apr 2016 21:13:15 +0000
Subject: [Python-Dev] Not receiving bug tracker emails
In-Reply-To: <ndukuq$or3$1@ger.gmane.org>
References: <CA+eR4cFgk+UmB-3TZP_m-Ng=hx9gYrdOcY_gHPGJfnheDDaKmw@mail.gmail.com>
 <ndukuq$or3$1@ger.gmane.org>
Message-ID: <CAP1=2W7oG7gR90gjXGy1Ri3n=q1FdGPZkt710guDEwpCPE=S2Q@mail.gmail.com>

On Mon, 4 Apr 2016 at 14:05 Terry Reedy <tjreedy at udel.edu> wrote:

> On 3/29/2016 7:30 PM, Martin Panter wrote:
> > For the last ~36 hours I have stopped receiving emails for messages
> > posted in the bug tracker. Is anyone else having this problem? Has
> > anything changed recently?
>
> My udel dot edu account is handled by google.  I am also not getting
> anything at all, not even in spam, since at least 3/31 when I was added
> to https://bugs.python.org/issue26673.  I only discovered it in the
> Friday weekly New Issues report.  More emails were missing on Friday.
> The problem continues.  I just added a question to
> https://bugs.python.org/issue19944 and got nothing.
>

I have reached out to Upfront -- our Roundup host -- to see whether the fix
proposed in http://psf.upfronthosting.co.za/roundup/meta/issue568 will
solve the issue, and to make sure this gets resolved. When I know something I
will post here.


>
> > I have had it set to send to my gmail.com address since the beginning.
> > At the moment the last bug message email is
> > <https://bugs.python.org/issue19959#msg262569> with "Date: Mon, 28 Mar
> > 2016 12:19:49 +0000". I have checked spam and they are not going
> > there.
>
> Since at least last summer, Rietveld reviews have consistently gone to
> Junk.  Normal tracker emails sometimes went to Inbox, sometimes to Junk.
> Since normal emails (but not reviews, unfortunately) are tagged in the
> subject line, I added a rule to Thunderbird to move tracker email to
> Inbox when I open Junk.  This is no longer happening, as they do not even
> get to Junk.
>
> I tried changing my tracker email to verizon.net and posted a message on
> an issue where I am the only nosy person.  After half an hour, nothing.
> I am not surprised, as Verizon rarely delivers anything it considers
> junk.  I had this confirmed by a game site that said that its emails are
> deleted unless one contacts Verizon to whitelist their site.  I will see
> if I can again find the page to do that.
>
> I do get checkins and core-mentorship mail.  I have not seen anything on
> core-developers since the discussion of new commit privileges a month ago.
>

Do you mean python-committers? I don't know of any core-developers mailing
list. If you do mean python-committers just let me know and I will see what
address you're subscribed under.

From Stefan.Richthofer at gmx.de  Mon Apr  4 23:38:51 2016
From: Stefan.Richthofer at gmx.de (Stefan Richthofer)
Date: Tue, 5 Apr 2016 05:38:51 +0200
Subject: [Python-Dev] Help/advice needed with JyNI issue #4 (Tkinter on OSX)
In-Reply-To: <CAMpsgwZXOGJ9yaKvuPM3U2A_ZBgzE-=8T4nfBsk5VQ5pQr=6rA@mail.gmail.com>
References: <CAMpsgwZXOGJ9yaKvuPM3U2A_ZBgzE-=8T4nfBsk5VQ5pQr=6rA@mail.gmail.com>
Message-ID: <trinity-a360f306-5c8c-4b02-8524-0430d0af9536-1459827531228@3capp-gmx-bs63>

Hey everybody,

I need help/advice for this JyNI-related issue: https://github.com/Stewori/JyNI/issues/4
Especially I need advice from someone familiar with TCL and TK internals, preferably also Tkinter.
The issue is rather strange in the sense that it works well on Linux, while the program hangs on OSX. Everything we found out so far was collected in the thread linked above. Briefly speaking, on OSX TCL/TK does not produce a particular event the loop is waiting for and does not display the window. However logging suggests that calls to TCL/TK API are identical between Linux and OSX runs, so we are really stuck here in finding out what is different on Linux (our current logging does not cover function argument values though).
Any advice on how I can debug interaction with TCL/TK to find the reason for the missing event would be helpful.

(Sorry if you might regard this off-topic for Python-dev; since JyNI is somewhat a crossover-project (also containing lots of CPython 2.7 code) I am asking in various locations. Starting here, because in this list I see best chances to find someone who can help within the Python ecosystem. Next I would look for a TCL/TK forum or something.)

Thanks!

Stefan

From victor.stinner at gmail.com  Tue Apr  5 04:10:45 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 5 Apr 2016 10:10:45 +0200
Subject: [Python-Dev] thoughts on backporting __wrapped__ to 2.7?
In-Reply-To: <CAJ3HoZ38C2rs0EF+pW3OKXcJfwBwscYi2ijG+rTEqUiqui5xYQ@mail.gmail.com>
References: <CAJ3HoZ38C2rs0EF+pW3OKXcJfwBwscYi2ijG+rTEqUiqui5xYQ@mail.gmail.com>
Message-ID: <CAMpsgwb-f2+xXf5KCnY5882gVODM4x3EQk=s56CYm0Ku3rKZsA@mail.gmail.com>

See https://pypi.python.org/pypi/functools32 for the functools backport for
Python 2.7.

Victor

From robertc at robertcollins.net  Tue Apr  5 04:20:54 2016
From: robertc at robertcollins.net (Robert Collins)
Date: Tue, 5 Apr 2016 20:20:54 +1200
Subject: [Python-Dev] thoughts on backporting __wrapped__ to 2.7?
In-Reply-To: <CAMpsgwb-f2+xXf5KCnY5882gVODM4x3EQk=s56CYm0Ku3rKZsA@mail.gmail.com>
References: <CAJ3HoZ38C2rs0EF+pW3OKXcJfwBwscYi2ijG+rTEqUiqui5xYQ@mail.gmail.com>
 <CAMpsgwb-f2+xXf5KCnY5882gVODM4x3EQk=s56CYm0Ku3rKZsA@mail.gmail.com>
Message-ID: <CAJ3HoZ33RdKQcjYp5EZf8QMFHe+NrVSRALzgL8TqcaVAqNLOjg@mail.gmail.com>

Sadly that has the ordering bug of assigning __wrapped__ first and appears
a little unmaintained based on the bug tracker :(
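
For context, a small sketch of what an ordering bug of this kind looks like. This is an assumption about the bug being referred to, not a copy of the functools32 source: the two helper functions below are written here purely for illustration. If __wrapped__ is assigned before the wrapped function's __dict__ is copied over, a __wrapped__ attribute already present on the wrapped function overwrites the fresh one, so a doubly decorated function ends up pointing past its immediate wrappee.

```python
def update_wrapped_first(wrapper, wrapped):
    # Illustrative buggy ordering: the attribute is set, then clobbered by
    # whatever __wrapped__ the wrapped function already carries.
    wrapper.__wrapped__ = wrapped
    wrapper.__dict__.update(wrapped.__dict__)
    return wrapper


def update_wrapped_last(wrapper, wrapped):
    # Illustrative fixed ordering: copy the metadata, then set __wrapped__.
    wrapper.__dict__.update(wrapped.__dict__)
    wrapper.__wrapped__ = wrapped
    return wrapper


def original():
    return "original"


def middle():
    return original()


def outer_buggy():
    return middle()


def outer_fixed():
    return middle()


update_wrapped_last(middle, original)
update_wrapped_first(outer_buggy, middle)
update_wrapped_last(outer_fixed, middle)

# With the buggy ordering the chain skips a level.
skips_a_level = outer_buggy.__wrapped__ is original
keeps_the_chain = outer_fixed.__wrapped__ is middle
```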
On 5 Apr 2016 8:10 PM, "Victor Stinner" <victor.stinner at gmail.com> wrote:

> See https://pypi.python.org/pypi/functools32 for the functools backport
> for Python 2.7.
>
> Victor
>

From guido at python.org  Tue Apr  5 11:46:24 2016
From: guido at python.org (Guido van Rossum)
Date: Tue, 5 Apr 2016 08:46:24 -0700
Subject: [Python-Dev] Help/advice needed with JyNI issue #4 (Tkinter on
 OSX)
In-Reply-To: <trinity-a360f306-5c8c-4b02-8524-0430d0af9536-1459827531228@3capp-gmx-bs63>
References: <CAMpsgwZXOGJ9yaKvuPM3U2A_ZBgzE-=8T4nfBsk5VQ5pQr=6rA@mail.gmail.com>
 <trinity-a360f306-5c8c-4b02-8524-0430d0af9536-1459827531228@3capp-gmx-bs63>
Message-ID: <CAP7+vJLZ1YRP5QCPYHH5V4651En3yNEkOgPC+7uKyTuLWuGwyg@mail.gmail.com>

Since this seems tcl/tk related your best bet is the tkinter mailing
list: https://mail.python.org/mailman/listinfo/tkinter-discuss

On Mon, Apr 4, 2016 at 8:38 PM, Stefan Richthofer
<Stefan.Richthofer at gmx.de> wrote:
> Hey everybody,
>
> I need help/advice for this JyNI-related issue: https://github.com/Stewori/JyNI/issues/4
> Especially I need advice from someone familiar with TCL and TK internals, preferably also Tkinter.
> The issue is rather strange in the sense that it works well on Linux, while the program hangs on OSX. Everything we found out so far was collected in the thread linked above. Briefly speaking, on OSX TCL/TK does not produce a particular event the loop is waiting for and does not display the window. However logging suggests that calls to TCL/TK API are identical between Linux and OSX runs, so we are really stuck here in finding out what is different on Linux (our current logging does not cover function argument values though).
> Any advice on how I can debug interaction with TCL/TK to find the reason for the missing event would be helpful.
>
> (Sorry if you might regard this off-topic for Python-dev; since JyNI is somewhat a crossover-project (also containing lots of CPython 2.7 code) I am asking in various locations. Starting here, because in this list I see best chances to find someone who can help within the Python ecosystem. Next I would look for a TCL/TK forum or something.)
>
> Thanks!
>
> Stefan



-- 
--Guido van Rossum (python.org/~guido)

From brett at python.org  Tue Apr  5 12:36:59 2016
From: brett at python.org (Brett Cannon)
Date: Tue, 05 Apr 2016 16:36:59 +0000
Subject: [Python-Dev] Anyone want to lead the sprints at PyCon US 2016?
In-Reply-To: <CADx+GQPKnvWWB1e_kL6Hqh7skuDiVRBMK8szs5QGS4NGt-rnJg@mail.gmail.com>
References: <CADx+GQPKnvWWB1e_kL6Hqh7skuDiVRBMK8szs5QGS4NGt-rnJg@mail.gmail.com>
Message-ID: <CAP1=2W4sDeTTZWUxkoieD+bJprXFK=5NfBkLUg=vopgNFPSMYg@mail.gmail.com>

The call has started to go out for sprint groups to list themselves online.
Anyone want to specifically lead the core sprint this year? If no one
specifically does then I will sign us up and do my usual thing of pointing
people at the devguide and encouraging people to ask questions, but not doing a
lot of hand-holding (I'm expecting to be busy either working on GitHub
migration stuff or doing other things that I have been neglecting due to my
GitHub migration work).

---------- Forwarded message ---------
From: Ewa Jodlowska <ewa at python.org>
Date: Mon, 4 Apr 2016 at 07:14
Subject: [PSF-Community] Sprinting at PyCon US 2016
To: <psf-community at python.org>


Are you coming to PyCon US? Have you thought about sprinting?

The coding Sprints are the hidden gem of PyCon, up to 4 days (June 2-5) of
coding with many Python projects and their maintainers. And if you're
coming to PyCon, taking part in the Sprints is easy!

You don't need to change your registration* to join the Sprints. There's no
additional registration fee, and you even get lunch. You do need to cover
the additional lodging and other meals, but that's it. If you've booked a
room through the PyCon registration system, you'll need to contact the
registration team at pycon2016 at cteusa.com as soon as possible to request
the extra nights. The sprinting itself (along with lunch every day) is
free, so your only expenses are your room and other meals.

If you're interested in what projects will be sprinting, just keep an eye
on the sprints page on the PyCon web site at
https://us.pycon.org/2016/community/sprints/ Be sure to check back, as
groups are being added all the time.

If you haven't sprinted before, or if you just need to brush up on
sprinting tools and techniques, there will again be an 'Intro to Sprinting'
session the evening of June 1, led by Shauna Gordon-McKeon and other
members of the Python community. To grab a free ticket for this session, just
visit
https://www.eventbrite.com/e/introduction-to-open-source-the-pycon-sprints-tickets-22435151141
.

*Please note that conference registration is sold out, but you do not need
a conference registration to come to the Sprints.


From rdmurray at bitdance.com  Tue Apr  5 15:56:30 2016
From: rdmurray at bitdance.com (R. David Murray)
Date: Tue, 05 Apr 2016 15:56:30 -0400
Subject: [Python-Dev] bugs.python.org email blockage at gmail
In-Reply-To: <CAP1=2W4sDeTTZWUxkoieD+bJprXFK=5NfBkLUg=vopgNFPSMYg@mail.gmail.com>
References: <CADx+GQPKnvWWB1e_kL6Hqh7skuDiVRBMK8szs5QGS4NGt-rnJg@mail.gmail.com>
 <CAP1=2W4sDeTTZWUxkoieD+bJprXFK=5NfBkLUg=vopgNFPSMYg@mail.gmail.com>
Message-ID: <20160405195631.B2F35B14156@webabinitio.net>

We think we have a partial (and hopefully temporary) solution to the
bugs email blockage: IPv6 has been turned off on bugs, so it is sending
only from the IPv4 address.  Google appears to be accepting the emails
again.  However, the IPv4 address has a poor reputation, and Verizon
at least appears to be blocking it.  So more work is still needed.

--David

From brett at python.org  Tue Apr  5 18:41:14 2016
From: brett at python.org (Brett Cannon)
Date: Tue, 05 Apr 2016 22:41:14 +0000
Subject: [Python-Dev] When should pathlib stop being provisional?
Message-ID: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>

After a rather extensive discussion on python-ideas about pathlib.PurePath
not inheriting from str, another point that came up was that the use of
pathlib has been rather light. Unfortunately even the stdlib doesn't really
use pathlib because it's currently marked as provisional (or at least
that's why I haven't tried to use it where possible in importlib).

Do we have a plan of what is required to remove the provisional label from
pathlib?
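
For anyone who missed the python-ideas thread, this short sketch shows what "not inheriting from str" means in practice. It uses only pathlib and posixpath from the stdlib; the path itself is made up. A PurePath is a distinct type, so code that insists on a str needs an explicit str() conversion.

```python
import posixpath
from pathlib import PurePosixPath

# Build a path with the "/" operator; no filesystem access is involved.
config = PurePosixPath("/etc") / "app" / "settings.ini"

# A PurePath is not a str instance.
is_string = isinstance(config, str)

# Converting explicitly gives a plain string that str-only code can use.
as_text = str(config)
directory = posixpath.dirname(as_text)
```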

From guido at python.org  Tue Apr  5 18:55:28 2016
From: guido at python.org (Guido van Rossum)
Date: Tue, 5 Apr 2016 15:55:28 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
Message-ID: <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>

It's been provisional since 3.4. I think if it is still there in 3.6.0
it should be considered no longer provisional. But this may indeed be
a test case for the ultimate fate of provisional modules -- should we
remove it?

I have to admit I got tired of the discussions and muted them all.
Personally I am not worried about the light use (I always expected it
would take a long time to get adoption) but I am worried about the
hostility towards the module. My last/only comment in the discussion
was about there possibly being a dichotomy between people who use
Python for scripting and those who use it to write more substantial
programs (I'm trying not to judge one group more important than
another -- I'm just observing there seem to be these two groups). But
I didn't stick around long enough to watch for responses to this idea.

Would making it inherit from str cause most hostility to disappear?
I'm sure there was a discussion about this when PEP 428 was originally
proposed, and I recall I was strongly in the camp of "it should not
inherit from str", but unfortunately the PEP has no mention of this
discussion or even the stated reason.

--Guido


On Tue, Apr 5, 2016 at 3:41 PM, Brett Cannon <brett at python.org> wrote:
> After a rather extensive discussion on python-ideas about pathlib.PurePath
> not inheriting from str, another point that came up was that the use of
> pathlib has been rather light. Unfortunately even the stdlib doesn't really
> use pathlib because it's currently marked as provisional (or at least that's
> why I haven't tried to use it where possible in importlib).
>
> Do we have a plan of what is required to remove the provisional label from
> pathlib?
>



-- 
--Guido van Rossum (python.org/~guido)

From antoine at python.org  Tue Apr  5 18:45:23 2016
From: antoine at python.org (Antoine Pitrou)
Date: Wed, 6 Apr 2016 00:45:23 +0200
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
Message-ID: <57044003.3050400@python.org>


I think the provisional status can be safely lifted now.  Even though
pathlib hasn't seen that much use, there have been enough reports and
discussion since its acceptance that I think the API has proven it's sane
for general use.

(as for importlib, pathlib might have too many dependencies for sane
bootstrapping)

Regards

Antoine.


On 06/04/2016 00:41, Brett Cannon wrote:
> After a rather extensive discussion on python-ideas about
> pathlib.PurePath not inheriting from str, another point that came up was
> that the use of pathlib has been rather light. Unfortunately even the
> stdlib doesn't really use pathlib because it's currently marked as
> provisional (or at least that's why I haven't tried to use it where
> possible in importlib).
> 
> Do we have a plan of what is required to remove the provisional label
> from pathlib?

From tritium-list at sdamon.com  Tue Apr  5 19:08:23 2016
From: tritium-list at sdamon.com (Alexander Walters)
Date: Tue, 5 Apr 2016 19:08:23 -0400
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
Message-ID: <57044567.6070308@sdamon.com>

On 4/5/2016 18:55, Guido van Rossum wrote:
>   My last/only comment in the discussion
> was about there possibly being a dichotomy between people who use
> Python for scripting and those who use it to write more substantial
> programs (I'm trying not to judge one group more important than
> another -- I'm just observing there seem to be these two groups). But
> I didn't stick around long enough to watch for responses to this idea.
This was all but ignored.

The opinions mentioned in the thread, without throwing my opinion behind
any of them, were:

* pathlib should be improved (specifically by making it inherit from str)
* the stdlib should be made to deal with pathlib without changing pathlib
* pathlib is redundant to third party modules which work better
* the continued existence of pathlib was briefly discussed

You can insert the never-ending arguments for and against each of those
points in your head - none of them were particularly convincing (in that
I don't think anyone changed their position).

The split between utility scripting and application development was not
really discussed.

From rosuav at gmail.com  Tue Apr  5 19:13:24 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 6 Apr 2016 09:13:24 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <57044567.6070308@sdamon.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
Message-ID: <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>

On Wed, Apr 6, 2016 at 9:08 AM, Alexander Walters
<tritium-list at sdamon.com> wrote:
> * pathlib should be improved (specifically by making it inherit from str)

I'd like to see this specific change settled on in the PEP, actually.
There are some arguments on both sides, and some hybrid solutions
being proposed, and it looks to be an important enough issue to people
for there to be an answer somewhere. It seems to come down to a
sloppiness vs strictness concern, I think, but I'm not sure.

ChrisA

From guido at python.org  Tue Apr  5 19:45:50 2016
From: guido at python.org (Guido van Rossum)
Date: Tue, 5 Apr 2016 16:45:50 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
Message-ID: <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>

On Tue, Apr 5, 2016 at 4:13 PM, Chris Angelico <rosuav at gmail.com> wrote:
> On Wed, Apr 6, 2016 at 9:08 AM, Alexander Walters
> <tritium-list at sdamon.com> wrote:
>> * pathlib should be improved (specifically by making it inherit from str)
>
> I'd like to see this specific change settled on in the PEP, actually.
> There are some arguments on both sides, and some hybrid solutions
> being proposed, and it looks to be an important enough issue to people
> for there to be an answer somewhere. It seems to come down to a
> sloppiness vs strictness concern, I think, but I'm not sure.

This does sound like it's the crucial issue, and it is worth writing
up clearly the pros and cons. Let's draft those lists in a thread
(this one's fine) and then add them to the PEP. We can then decide to:

- keep the status quo
- change PurePath to inherit from str
- decide it's never going to be settled and kill pathlib.py

(And yes, I'm dead serious about the latter, rather Solomonic option.)

-- 
--Guido van Rossum (python.org/~guido)

From brett at python.org  Tue Apr  5 19:47:32 2016
From: brett at python.org (Brett Cannon)
Date: Tue, 05 Apr 2016 23:47:32 +0000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
Message-ID: <CAP1=2W4HojCKjcE2cVNAiDLkuOayhORyKfGwqgiKtRh8ZE=dKA@mail.gmail.com>

On Tue, 5 Apr 2016 at 15:55 Guido van Rossum <guido at python.org> wrote:

> It's been provisional since 3.4. I think if it is still there in 3.6.0
> it should be considered no longer provisional. But this may indeed be
> a test case for the ultimate fate of provisional modules -- should we
> remove it?
>
> I have to admit I got tired of the discussions and muted them all.
>

:) I figured. I was close myself until I decided to join the "not inheriting
from str is a sane decision" camp because people weren't understanding
where the design decision probably came from, hence
http://www.snarky.ca/why-pathlib-path-doesn-t-inherit-from-str .


> Personally I am not worried about the light use (I always expected it
> would take a long time to get adoption)


Ditto. My expectation/hope is that once we stop having it be provisional
and we start using it in the stdlib then usage will pick up, especially if
libraries pick up the `getattr(path, 'path', path)` idiom as an easy
transition technique until they decide to drop support for str-based paths.
The main motivation of this email is actually to have newcomers to the
sprints at PyCon US sprint on adding support for pathlib (after we add
"path-like object" to the glossary to say something like "a `str` object or
an object that has a `path` attribute that itself is a `str`").
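
A minimal sketch of the `getattr(path, 'path', path)` idiom described
above. The class `RichPath` and the helper `as_str_path` are hypothetical
names used only for illustration; the sketch assumes nothing beyond an
object carrying a `str` in a `path` attribute.

```python
class RichPath:
    """Hypothetical stand-in for a path object exposing a str `path` attribute."""

    def __init__(self, text):
        self.path = text


def as_str_path(path):
    # Objects with a `path` attribute are unwrapped; plain strings (and
    # anything else without that attribute) pass through unchanged.
    return getattr(path, 'path', path)


print(as_str_path('spam/eggs.txt'))            # plain str passes through
print(as_str_path(RichPath('spam/eggs.txt')))  # unwrapped via .path
```

Because objects without the attribute are returned untouched, the same
one-liner can sit at the top of a function that previously accepted
only strings.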


> but I am worried about the
> hostility towards the module. My last/only comment in the discussion
> was about there possibly being a dichotomy between people who use
> Python for scripting and those who use it to write more substantial
> programs (I'm trying not to judge one group more important than
> another -- I'm just observing there seem to be these two groups). But
> I didn't stick around long enough to watch for responses to this idea.
>

Nope, no response (as Alexander pointed out).


>
> Would making it inherit from str cause most hostility to disappear?
>

Probably. Most people were upset with pathlib because they couldn't use it
immediately with all of the third-party libraries out there on top of the
stdlib because adoption has been so low. Now if we make a concerted effort
to accept pathlib in the stdlib then this may be the kick in the pants that
it takes to start getting people to accept it externally and the transition
band-aid of inheriting from str may not be needed.

To me it seems to basically be a question of whether people can be patient
during a transition and embrace pathlib over time or if they will simply
refuse to add support in libraries and refuse to use `getattr(path, 'path',
path)` or `str(path)` in the meantime. Personally, if we can wait out the
Python 3 transition I have no issue waiting on a transition like this that
has no backward-compatibility issues and has a one-liner solution for
adding shallow support (and thus is ripe for quick patches to projects).

After the whole str thing the only other major topic was coming up with
some easier way to produce pathlib.Path instances (e.g. the p-string
suggestion). Nothing really came of those discussions that seemed concrete
and reached consensus, though (I think that may have been where your
scripting/substantial programming comment came from).


> I'm sure there was a discussion about this when PEP 428 was originally
> proposed, and I recall I was strongly in the camp of "it should not
> inherit from str", but unfortunately the PEP has no mention of this
> discussion or even the stated reason.
>

https://www.python.org/dev/peps/pep-0428/#no-confusion-with-builtins is the
best you get in the PEP.

-Brett


>
> --Guido
>
>
> On Tue, Apr 5, 2016 at 3:41 PM, Brett Cannon <brett at python.org> wrote:
> > After a rather extensive discussion on python-ideas about
> pathlib.PurePath
> > not inheriting from str, another point that came up was that the use of
> > pathlib has been rather light. Unfortunately even the stdlib doesn't
> really
> > use pathlib because it's currently marked as provisional (or at least
> that's
> > why I haven't tried to use it where possible in importlib).
> >
> > Do we have a plan of what is required to remove the provisional label
> from
> > pathlib?
> >
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>

From rosuav at gmail.com  Tue Apr  5 20:02:30 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 6 Apr 2016 10:02:30 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
Message-ID: <CAPTjJmpGd=ABLp0E_YkbQiiXDeRETzpGAQraVsCNF49uOtjMHQ@mail.gmail.com>

On Wed, Apr 6, 2016 at 9:45 AM, Guido van Rossum <guido at python.org> wrote:
> On Tue, Apr 5, 2016 at 4:13 PM, Chris Angelico <rosuav at gmail.com> wrote:
>> On Wed, Apr 6, 2016 at 9:08 AM, Alexander Walters
>> <tritium-list at sdamon.com> wrote:
>>> * pathlib should be improved (specifically by making it inherit from str)
>>
>> I'd like to see this specific change settled on in the PEP, actually.
>> There are some arguments on both sides, and some hybrid solutions
>> being proposed, and it looks to be an important enough issue to people
>> for there to be an answer somewhere. It seems to come down to a
>> sloppiness vs strictness concern, I think, but I'm not sure.
>
> This does sound like it's the crucial issue, and it is worth writing
> up clearly the pros and cons. Let's draft those lists in a thread
> (this one's fine) and then add them to the PEP. We can then decide to:
>
> - keep the status quo
> - change PurePath to inherit from str
> - decide it's never going to be settled and kill pathlib.py
>
> (And yes, I'm dead serious about the latter, rather Solomonic option.)

Summarizing from memory to get things started.

Inheriting from str makes it easier for code to support pathlib
without really caring about the details.

NOT inheriting from str forces code to be aware that it's working with
a path, in the same way that text and bytes are fundamentally
different things, and the Unicode string doesn't inherit from the byte
string, nor vice versa.

If a few crucial built-in functions support Path objects (notably
open() and a handful of os.* functions), the bulk of stdlib support
will be easy (sometimes trivial) to implement.

Paths are [or are not] fundamentally different from strings. <-- argued point

Paths might be backed by Unicode text, and might be backed by bytes.
Should a Path be able to be implicitly constructed from either?

Should there be some sort of "Path literal"? <-- possibly a completely
separate question, to be resolved after this one

How should .. be handled? Can you canonicalize a Path?

Can Path handle URIs as well as file system paths?

-----

My personal view on the text/bytes debate is that a path is
fundamentally a human concept, and consists therefore of text. The
fact that some file systems store (at the low level) bytes and some
store (I think) UTF-16 code units should be immaterial; path
components exist for people. We can smuggle unrecognized bytes around,
but ultimately, those bytes came from characters at some point - we
just don't know the encoding. So a Path object has no relationship
with bytes, only with str.

Whether a Path is fundamentally "a text string that uses slashes to
separate components" or "a tuple of path components" is up for debate.
Both make a lot of sense, and I'm somewhat inclined to the latter
view; it allows for other forms of path component, such as an open
directory (for fstatat/openat etc), or a special thing representing
"current directory" or "root directory".

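The two views of a path can be compared side by side with the existing
pathlib API. This is only a sketch: `PurePosixPath` is used so the
result does not depend on the platform, and the path itself is just an
example value.

```python
from pathlib import PurePosixPath

p = PurePosixPath('/home/rosuav/cpython/python')

as_text = str(p)     # the slash-separated text form
as_tuple = p.parts   # the same path as a tuple of components

print(as_text)
print(as_tuple)
```

Passing the tuple back as `PurePosixPath(*as_tuple)` rebuilds an equal
path, so either form can serve as the underlying representation.
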
ChrisA

From tjreedy at udel.edu  Tue Apr  5 21:27:05 2016
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 5 Apr 2016 21:27:05 -0400
Subject: [Python-Dev] bugs.python.org email blockage at gmail
In-Reply-To: <20160405195631.B2F35B14156@webabinitio.net>
References: <CADx+GQPKnvWWB1e_kL6Hqh7skuDiVRBMK8szs5QGS4NGt-rnJg@mail.gmail.com>
 <CAP1=2W4sDeTTZWUxkoieD+bJprXFK=5NfBkLUg=vopgNFPSMYg@mail.gmail.com>
 <20160405195631.B2F35B14156@webabinitio.net>
Message-ID: <ne1olg$h4q$1@ger.gmane.org>

On 4/5/2016 3:56 PM, R. David Murray wrote:
> We think we have a partial (and hopefully temporary) solution to the
> bugs email blockage: ipv6 has been turned off on bugs, so it is sending
> only from the ipv4 address.  Google appears to be accepting the emails
> again.  However, the IPV4 address has a poor reputation, and Verizon
> at least appears to be blocking it.  So more work is still needed.

Switching back to Google from Verizon.

How is bugs email sent differently from list email?  What the latter 
does works fine, at least for gmail.

-- 
Terry Jan Reedy


From tjreedy at udel.edu  Tue Apr  5 21:39:15 2016
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 5 Apr 2016 21:39:15 -0400
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
Message-ID: <ne1pca$q86$1@ger.gmane.org>

On 4/5/2016 7:45 PM, Guido van Rossum wrote:

> This does sound like it's the crucial issue, and it is worth writing
> up clearly the pros and cons. Let's draft those lists in a thread
> (this one's fine) and then add them to the PEP. We can then decide to:
>
> - keep the status quo
> - change PurePath to inherit from str
> - decide it's never going to be settled and kill pathlib.py
>
> (And yes, I'm dead serious about the latter, rather Solomonic option.)

My sense of the discussion was that some people think that the 
new-in-upcoming 3.5.2 PurePath.path should serve as a substitute for 
inheriting from str.  In particular, it should make it easy for 
str-based path functions to also accept path objects.

-- 
Terry Jan Reedy


From ncoghlan at gmail.com  Tue Apr  5 22:21:04 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 6 Apr 2016 12:21:04 +1000
Subject: [Python-Dev] bugs.python.org email blockage at gmail
In-Reply-To: <ne1olg$h4q$1@ger.gmane.org>
References: <CADx+GQPKnvWWB1e_kL6Hqh7skuDiVRBMK8szs5QGS4NGt-rnJg@mail.gmail.com>
 <CAP1=2W4sDeTTZWUxkoieD+bJprXFK=5NfBkLUg=vopgNFPSMYg@mail.gmail.com>
 <20160405195631.B2F35B14156@webabinitio.net>
 <ne1olg$h4q$1@ger.gmane.org>
Message-ID: <CADiSq7cOjF8UwLYXQDfKSqLFezXuQStHgtqML3+wsBBQ8XqPNw@mail.gmail.com>

On 6 April 2016 at 11:27, Terry Reedy <tjreedy at udel.edu> wrote:
> On 4/5/2016 3:56 PM, R. David Murray wrote:
>>
>> We think we have a partial (and hopefully temporary) solution to the
>> bugs email blockage: ipv6 has been turned off on bugs, so it is sending
>> only from the ipv4 address.  Google appears to be accepting the emails
>> again.  However, the IPV4 address has a poor reputation, and Verizon
>> at least appears to be blocking it.  So more work is still needed.
>
> Switching back to Google from Verizon.
>
> How is bugs email sent differently from list email?  What the latter does
> works fine, at least for gmail.

bugs.python.org is currently sending notification emails directly to
recipients, rather than routing them via the outbound SMTP server on
mail.python.org.

Reconfiguring it to relay notifications via the main outgoing server
is the longer term fix, but an initial attempt at enabling that
resulted in errors in the bugs.python.org mail logs, so David reverted
to the direct email configuration for the time being.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From steve at pearwood.info  Tue Apr  5 22:40:13 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 6 Apr 2016 12:40:13 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP1=2W4HojCKjcE2cVNAiDLkuOayhORyKfGwqgiKtRh8ZE=dKA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <CAP1=2W4HojCKjcE2cVNAiDLkuOayhORyKfGwqgiKtRh8ZE=dKA@mail.gmail.com>
Message-ID: <20160406024012.GG12526@ando.pearwood.info>

I haven't really been following this discussion, but a couple of 
comments...

On Tue, Apr 05, 2016 at 11:47:32PM +0000, Brett Cannon wrote:

> http://www.snarky.ca/why-pathlib-path-doesn-t-inherit-from-str

Nice write-up, thanks.


[...]
> To me it seems to basically be a question of whether people can be patient
> during a transition and embrace pathlib over time or if they will simply
> refuse to add support in libraries and refuse to use `getattr(path, 'path',
> path)` or `str(path)` in the mean time.

Wait, what? Is that what the whole fuss is about? That some people 
refuse to call str(path) when passing a path object to a function that 
expects a string? Really? That's it?

The mind boggles.


-- 
Steve

From ncoghlan at gmail.com  Tue Apr  5 22:44:47 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 6 Apr 2016 12:44:47 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
Message-ID: <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>

On 6 April 2016 at 09:45, Guido van Rossum <guido at python.org> wrote:
> On Tue, Apr 5, 2016 at 4:13 PM, Chris Angelico <rosuav at gmail.com> wrote:
>> On Wed, Apr 6, 2016 at 9:08 AM, Alexander Walters
>> <tritium-list at sdamon.com> wrote:
>>> * pathlib should be improved (specifically by making it inherit from str)
>>
>> I'd like to see this specific change settled on in the PEP, actually.
>> There are some arguments on both sides, and some hybrid solutions
>> being proposed, and it looks to be an important enough issue to people
>> for there to be an answer somewhere. It seems to come down to a
>> sloppiness vs strictness concern, I think, but I'm not sure.
>
> This does sound like it's the crucial issue, and it is worth writing
> up clearly the pros and cons. Let's draft those lists in a thread
> (this one's fine) and then add them to the PEP. We can then decide to:
>
> - keep the status quo
> - change PurePath to inherit from str
> - decide it's never going to be settled and kill pathlib.py

Option 4: define a rich-object-to-text path serialisation convention,
as paths are not conceptually the same as arbitrary strings, and we
can define a new protocol accepted by builtins and standard library
modules, while third parties can't.

The most promising option for that is probably "getattr(path, 'path',
path)", since the "path" attribute is being added to pathlib, and the
given idiom can be readily adopted in Python 2/3 compatible code
(since normal strings and any other object without a "path" attribute
are passed through unchanged). Alternatively, since it's a protocol,
double-underscores on the property name may be appropriate (i.e.
"getattr(path, '__path__', path)").

The next challenge would then be to make a list of APIs to be updated
for 3.6 to implicitly accept "rich path" objects via the agreed
convention, with pathlib.PurePath used as a test class:

* open()
* codecs.open() (et al)
* io.*
* os.path.*
* other os functions
* shutil.*
* tempfile.*
* shelve.*
* csv.*

The list wouldn't necessarily need to be 100% comprehensive (similar
to the rollout of context management, "support rich path objects in
API <X>" may appear as future RFEs), but it should be comprehensive
enough for rich path objects to mostly "just work" with other APIs
that aren't specifically limiting their inputs to str objects
(although using lower level APIs may force a conversion to the lower
level plain text representation as a side-effect).
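
One way an existing str-based API might be taught the convention
without being rewritten is sketched below. The decorator
`accepts_rich_paths` and the class `RichPath` are hypothetical, and
`posixpath.join` merely stands in for any function on the list above;
this is an illustration under those assumptions, not a proposed
implementation.

```python
import functools
import posixpath


def accepts_rich_paths(func):
    """Hypothetical decorator: unwrap `path` attributes on positional args."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        plain = [getattr(arg, 'path', arg) for arg in args]
        return func(*plain, **kwargs)
    return wrapper


class RichPath:
    """Hypothetical rich path object used only as a test class here."""

    def __init__(self, text):
        self.path = text


# Wrap an existing str-based API rather than rewriting it.
join = accepts_rich_paths(posixpath.join)

print(join(RichPath('/srv/app'), 'settings.cfg'))
```

Plain strings still work with the wrapped function, which is what
lets the rollout happen one API at a time.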

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From steve at pearwood.info  Tue Apr  5 22:51:55 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 6 Apr 2016 12:51:55 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAPTjJmpGd=ABLp0E_YkbQiiXDeRETzpGAQraVsCNF49uOtjMHQ@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CAPTjJmpGd=ABLp0E_YkbQiiXDeRETzpGAQraVsCNF49uOtjMHQ@mail.gmail.com>
Message-ID: <20160406025154.GH12526@ando.pearwood.info>

On Wed, Apr 06, 2016 at 10:02:30AM +1000, Chris Angelico wrote:

> My personal view on the text/bytes debate is that a path is
> fundamentally a human concept, and consists therefore of text. The
> fact that some file systems store (at the low level) bytes and some
> store (I think) UTF-16 code units should be immaterial; path
> components exist for people. We can smuggle unrecognized bytes around,
> but ultimately, those bytes came from characters at some point - we
> just don't know the encoding. So a Path object has no relationship
> with bytes, only with str.

That might be usually true in practice, but it is incorrect in 
principle. Paths in POSIX systems like Linux are fundamentally 
byte-strings with only two restrictions: \0 and \x2f are forbidden.

The fact that paths in Linux mostly happen to look like English words 
(often heavily abbreviated) is a historical accident. The file system 
itself supported paths containing (say) \xff even back in the days when 
text was pure US-ASCII and bytes over \x7f had no textual meaning, and 
these days paths still support sequences of bytes that have no human 
meaning in any encoding.

I don't know if this makes the tiniest lick of difference for Pathlib. I 
would be perfectly content if we stuck with the design decision that 
Pathlib can only represent paths representable as Unicode strings, and 
left weird POSIX filenames to the legacy byte-string interface.


-- 
Steve

From stephen at xemacs.org  Tue Apr  5 23:03:36 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 6 Apr 2016 12:03:36 +0900
Subject: [Python-Dev]  bugs.python.org email blockage at gmail
In-Reply-To: <20160405195631.B2F35B14156@webabinitio.net>
References: <CADx+GQPKnvWWB1e_kL6Hqh7skuDiVRBMK8szs5QGS4NGt-rnJg@mail.gmail.com>
 <CAP1=2W4sDeTTZWUxkoieD+bJprXFK=5NfBkLUg=vopgNFPSMYg@mail.gmail.com>
 <20160405195631.B2F35B14156@webabinitio.net>
Message-ID: <22276.31880.854091.86500@turnbull.sk.tsukuba.ac.jp>

R. David Murray writes:

 > again.  However, the IPV4 address has a poor reputation, and Verizon
 > at least appears to be blocking it.  So more work is still needed.

Don't take Verizon's policy as meaningful.  Tell Verizon customers to
get another address.  That is the only solution that works for Verizon
subscribers for very long (based on 15 years of Mailman-Users posts);
they have never been a high-quality email provider.  Further, Verizon
(as an email provider) is in the process of dying anyway (they are
very much alive as the new owner of AOL), so improvements in their
email practices have a likelihood of zero to the resolution of a C
float.



From stephen at xemacs.org  Tue Apr  5 23:03:59 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 6 Apr 2016 12:03:59 +0900
Subject: [Python-Dev] thoughts on backporting __wrapped__ to 2.7?
In-Reply-To: <CAJ3HoZ33RdKQcjYp5EZf8QMFHe+NrVSRALzgL8TqcaVAqNLOjg@mail.gmail.com>
References: <CAJ3HoZ38C2rs0EF+pW3OKXcJfwBwscYi2ijG+rTEqUiqui5xYQ@mail.gmail.com>
 <CAMpsgwb-f2+xXf5KCnY5882gVODM4x3EQk=s56CYm0Ku3rKZsA@mail.gmail.com>
 <CAJ3HoZ33RdKQcjYp5EZf8QMFHe+NrVSRALzgL8TqcaVAqNLOjg@mail.gmail.com>
Message-ID: <22276.31903.569346.438240@turnbull.sk.tsukuba.ac.jp>

Robert Collins writes:

 > Sadly that has the ordering bug of assigning __wrapped__ first and appears
 > a little unmaintained based on the bug tracker :(

You can fix two problems with one patch, then!


From tritium-list at sdamon.com  Tue Apr  5 23:06:36 2016
From: tritium-list at sdamon.com (Alexander Walters)
Date: Tue, 5 Apr 2016 23:06:36 -0400
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
Message-ID: <57047D3C.2030700@sdamon.com>

On 4/5/2016 22:44, Nick Coghlan wrote:
> Option 4: define a rich-object-to-text path serialisation convention,
> as paths are not conceptually the same as arbitrary strings
Just as a nit to pick, it is perfectly acceptable for hypothetical path 
objects to raise when someone tries to shoehorn them into acting like 
arbitrary strings - open() will gladly halt and set fire if you try and 
pass the text of War and Peace as an argument.

I think the naysayers would be satisfied with an object that... while 
not str or bytes or a derived class of either... acted like str when it 
had to.  Is that possible without deriving from str or bytes?

From rosuav at gmail.com  Tue Apr  5 23:18:09 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 6 Apr 2016 13:18:09 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <20160406025154.GH12526@ando.pearwood.info>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CAPTjJmpGd=ABLp0E_YkbQiiXDeRETzpGAQraVsCNF49uOtjMHQ@mail.gmail.com>
 <20160406025154.GH12526@ando.pearwood.info>
Message-ID: <CAPTjJmpfcek3_kmnYCEx0fFV62GB=g1iAG_7KZ0YjHA2YYOs_g@mail.gmail.com>

On Wed, Apr 6, 2016 at 12:51 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Wed, Apr 06, 2016 at 10:02:30AM +1000, Chris Angelico wrote:
>
>> My personal view on the text/bytes debate is that a path is
>> fundamentally a human concept, and consists therefore of text. The
>> fact that some file systems store (at the low level) bytes and some
>> store (I think) UTF-16 code units should be immaterial; path
>> components exist for people. We can smuggle unrecognized bytes around,
>> but ultimately, those bytes came from characters at some point - we
>> just don't know the encoding. So a Path object has no relationship
>> with bytes, only with str.
>
> That might be usually true in practice, but it is incorrect in
> principle. Paths in POSIX systems like Linux are fundamentally
> byte-strings with only two restrictions: \0 and \x2f are forbidden.

That's the file system level. But more fundamentally than that, a path
exists so that humans can refer to files. That's why they have
*names*, not just dirent numbers. We could assign dirent number -1 to
mean "parent directory", and then represent everything with tuples of
directory entries. Follow the chain and you get an inode. Absolute
paths would start with an inode (the root directory being inode 2) and
proceed with dirents thereafter. Maybe we'd need a pseudo-inode to
mean "current directory". Should we do paths like this? No way! Much
better to have either "/home/rosuav/cpython/python" or (P.ROOT,
"home", "rosuav", "cpython", "python") to represent them, because they
exist for the human.

The POSIX file system rules aren't insignificant, but my point is that
every byte value seen in a file name was once representing a
character. Outside of deliberate tests, we don't create files on our
disks whose names are strings of random bytes; the normal use of a
file system is to store files that a human has named. Hence my
recommendation that a Path object be tied to str, but *not* to bytes.

> The fact that paths in Linux mostly happen to look like English words
> (often heavily abbreviated) is a historical accident. The file system
> itself supported paths containing (say) \xff even back in the days when
> text was pure US-ASCII and bytes over \x7f had no textual meaning, and
> these days paths still support sequences of bytes that have no human
> meaning in any encoding.
>
> I don't know if this makes the tiniest lick of difference for Pathlib. I
> would be perfectly content if we stuck with the design decision that
> Pathlib can only represent paths representable as Unicode strings, and
> left weird POSIX filenames to the legacy byte-string interface.

I'd prefer to keep the surrogateescape compatibility hack with U+DC80
to U+DCFF being used to smuggle bytes around. That means that every
path can be represented as a Unicode string, with only minor loss of
functionality (imagine a path with only a single character that can't
be decoded - chances are a human can figure out what the file is), but
it still strongly pushes to a Unicode interpretation of the path.
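
The smuggling can be seen with the `surrogateescape` error handler
directly. The file name below is a made-up example: Latin-1 bytes that
are not valid UTF-8. UTF-8 is chosen explicitly here so the sketch does
not depend on the local filesystem encoding.

```python
raw = b'caf\xe9.txt'   # Latin-1 bytes; not valid UTF-8

text = raw.decode('utf-8', 'surrogateescape')
smuggled = text[3]
print(hex(ord(smuggled)))   # lone surrogate standing in for byte 0xE9

# Encoding with the same handler restores the original bytes exactly.
assert text.encode('utf-8', 'surrogateescape') == raw
```

The readable part of the name survives as ordinary text, and only the
undecodable byte is carried as a lone surrogate.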

An *actual* byte-string interface (such as os.listdir and friends
support) would be completely outside of anything involving Pathlib. If
you give bytes, you'll get bytes. And I'd deprecate that once Path
objects are more broadly accepted.

ChrisA

From ethan at stoneleaf.us  Wed Apr  6 00:29:18 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 05 Apr 2016 21:29:18 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
Message-ID: <5704909E.8070908@stoneleaf.us>

On 04/05/2016 03:55 PM, Guido van Rossum wrote:

> It's been provisional since 3.4. I think if it is still there in 3.6.0
> it should be considered no longer provisional. But this may indeed be
> a test case for the ultimate fate of provisional modules -- should we
> remove it?

We should either remove it or make the rest of the stdlib work with it.
Currently, pathlib.*Paths are second-class citizens, and working with
them is not significantly better than working with os.path.* simply
because we have to cast to str every time we want to deal with any other
part of the stdlib.

> Would making it inherit from str cause most hostility to disappear?

I don't think that is necessary.  The hostility (of which I have some) 
is because we can't do:

     app_root = Path(...)
     config = app_root/'settings.cfg'
     with open(config) as blah:
         # whatever

It feels like instead of addressing this basic disconnect, the answer 
has instead been:  add that to pathlib!  Which works great -- until a 
user or a library gets this path object and tries to use something from 
os on it.

To come at this from a different angle:  Python now has Enum; it is 
arguable that Path is more important, or at least much more useful.  We 
have IntEnum whose sole purpose in life is to make it possible to 
(mostly) seamlessly work with the stdlib and other libraries where ints 
are being used to represent enumerations; and in pathlib we have . . . 
absolutely nothing.  We have the promise of great things and wonderful 
usability, but in reality we have just as much pain as before -- or more 
if we forget to str(path) somewhere.

I said that pathlib.Path does not need to inherit from str, and I still 
think that; however, to be a good stepping stone / transitional library 
I think the pathlib backport does need to have its Paths inherit from str.

--
~Ethan~

From ethan at stoneleaf.us  Wed Apr  6 00:35:57 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 05 Apr 2016 21:35:57 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <20160406024012.GG12526@ando.pearwood.info>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <CAP1=2W4HojCKjcE2cVNAiDLkuOayhORyKfGwqgiKtRh8ZE=dKA@mail.gmail.com>
 <20160406024012.GG12526@ando.pearwood.info>
Message-ID: <5704922D.7060905@stoneleaf.us>

On 04/05/2016 07:40 PM, Steven D'Aprano wrote:
> On Tue, Apr 05, 2016 at 11:47:32PM +0000, Brett Cannon wrote:

>> To me it seems to basically be a question of whether people can be patient
>> during a transition and embrace pathlib over time or if they will simply
>> refuse to add support in libraries and refuse to use `getattr(path, 'path',
>> path)` or `str(path)` in the mean time.
>
> Wait, what? Is that what the whole fuss is about? That some people
> refuse to call str(path) when passing a path object to a function that
> expects a string?

No, Stephen, that is not what this is about.  This is about the ugliness 
of code with str(path) this and str(path) that and let's not forget the 
Path(this_returned_string) and Path(that_returned_string), not to 
mention the frustrations of forgetting to cast a str to Path or a Path 
to str.  It's about the horror of boiler-plate infecting our otherwise 
beautiful Python code.

--
~Ethan~


From ncoghlan at gmail.com  Wed Apr  6 00:49:33 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 6 Apr 2016 14:49:33 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <57047D3C.2030700@sdamon.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <57047D3C.2030700@sdamon.com>
Message-ID: <CADiSq7cBWjLVrXGFC5H6OpdCuDHnuZz4GzikuAQpPLqd2zj=5Q@mail.gmail.com>

On 6 April 2016 at 13:06, Alexander Walters <tritium-list at sdamon.com> wrote:
> I think the naysayers would be satisfied with an object that... while not
> str or bytes or a derived class of either... acted like str when it had to.
> Is that possible without deriving from str or bytes?

Only if the consuming code explicitly casts with "str()", and that's
*too* permissive for most use cases (since __str__ and the __repr__
fallback are completely inappropriate as a "convert to a text
representation of a filesystem path" command).

A "__text__" protocol for non-lossy conversions to str would arguably
be feasible, but its scope goes way beyond what's needed for a "rich
path object" conversion protocol.

Implementing that model in the general case would require something
more akin to https://www.python.org/dev/peps/pep-0357/, which added
__index__ as a guaranteed-non-lossy conversion from other types to a
builtin integer, allowing non-builtin integers to be accepted for things
like slicing and sequence repetition, without inadvertently also
accepting non-integral types like builtin floats.
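
For anyone who hasn't run into __index__ before, here is a minimal sketch of what it buys you. The Offset class is a hypothetical stand-in for a third party integer type; it is not taken from the PEP:

```python
import operator


class Offset:
    """Hypothetical integer-like type that does not subclass int."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Promise: this conversion to a builtin int loses no information.
        return self._value


letters = "abcdef"
first_two = letters[:Offset(2)]       # accepted for slicing
repeated = "ab" * Offset(3)           # accepted for sequence repetition
as_int = operator.index(Offset(5))    # explicit form of the same protocol

# A float has no __index__, so it is still rejected for slicing.
try:
    letters[:2.0]
except TypeError:
    float_rejected = True
else:
    float_rejected = False
```

A path protocol would play the same role for filesystem paths that __index__ plays for integers.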

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From guido at python.org  Wed Apr  6 01:00:01 2016
From: guido at python.org (Guido van Rossum)
Date: Tue, 5 Apr 2016 22:00:01 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <5704909E.8070908@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <5704909E.8070908@stoneleaf.us>
Message-ID: <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>

On Tue, Apr 5, 2016 at 9:29 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> [...] we can't do:
>
>     app_root = Path(...)
>     config = app_root/'settings.cfg'
>     with open(config) as blah:
>         # whatever
>
> It feels like instead of addressing this basic disconnect, the answer has
> instead been:  add that to pathlib!  Which works great -- until a user or a
> library gets this path object and tries to use something from os on it.

I agree that asking for config.open() isn't the right answer here
(even if it happens to work). But in this example, once 3.5.2 is out,
the solution would be to use open(config.path), and that will also
work when passing it to a library. Is it still unacceptable then?

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Wed Apr  6 01:03:22 2016
From: guido at python.org (Guido van Rossum)
Date: Tue, 5 Apr 2016 22:03:22 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
Message-ID: <CAP7+vJ+qm0aOPrRw1g=XZzg4Y4GZw0Pm=rbcVwNr=4sWUkJQTw@mail.gmail.com>

On Tue, Apr 5, 2016 at 7:44 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Option 4: define a rich-object-to-text path serialisation convention,

Unfortunately that sounds like a classic "serious programming"
solution (objects, abstractions, serialization, all big important
words :-).

-- 
--Guido van Rossum (python.org/~guido)

From storchaka at gmail.com  Wed Apr  6 01:28:29 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 6 Apr 2016 08:28:29 +0300
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
Message-ID: <ne26pu$np9$1@ger.gmane.org>

On 06.04.16 01:41, Brett Cannon wrote:
> After a rather extensive discussion on python-ideas about
> pathlib.PurePath not inheriting from str, another point that came up was
> that the use of pathlib has been rather light. Unfortunately even the
> stdlib doesn't really use pathlib because it's currently marked as
> provisional (or at least that's why I haven't tried to use it where
> possible in importlib).
>
> Do we have a plan of what is required to remove the provisional label
> from pathlib?

The behavior of the Path.resolve() method should probably be changed in a 
way that breaks backward compatibility. There is an open issue about this.


From stephen at xemacs.org  Wed Apr  6 01:37:27 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 6 Apr 2016 14:37:27 +0900
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAPTjJmpfcek3_kmnYCEx0fFV62GB=g1iAG_7KZ0YjHA2YYOs_g@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CAPTjJmpGd=ABLp0E_YkbQiiXDeRETzpGAQraVsCNF49uOtjMHQ@mail.gmail.com>
 <20160406025154.GH12526@ando.pearwood.info>
 <CAPTjJmpfcek3_kmnYCEx0fFV62GB=g1iAG_7KZ0YjHA2YYOs_g@mail.gmail.com>
Message-ID: <22276.41111.455186.755173@turnbull.sk.tsukuba.ac.jp>

Chris Angelico writes:

 > Outside of deliberate tests, we don't create files on our disks
 > whose names are strings of random bytes;

Wishful thinking.  First, names made of control characters have often
been deliberately used by miscreants to conceal their warez.  Second,
in some systems it's all too easy to create paths with components in
different locales (the place I've seen it most frequently is in NFS
mounts).  I think that's much less true today, but perhaps that's only
because my employer figured out that it was much less pain if system
paths were pure ASCII so that it mostly didn't matter what encoding
users chose for their subtrees.

It remains important to be able to handle nearly arbitrary bytestrings
in file names as far as I can see.  Please note that 100 million
Japanese and 1 billion Chinese by and large still prefer their
homegrown encodings (plural!!) to Unicode, while many systems are now
defaulting filenames to UTF-8.  There's plenty of room remaining for
copying bytestrings to arguments of open and friends.
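
A small sketch of what "handling nearly arbitrary bytestrings" can look like in practice. The file name below is invented, and the explicit "surrogateescape" error handler is used here because, as far as I know, it is the one os.fsdecode() and os.fsencode() rely on on POSIX systems:

```python
# A made-up file name whose bytes are not valid UTF-8.
raw_name = b"report-\xff\xfe.txt"

# Strict decoding fails outright...
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...but surrogateescape smuggles the undecodable bytes through str
# and gives back exactly the original bytes on the way out.
as_text = raw_name.decode("utf-8", "surrogateescape")
round_tripped = as_text.encode("utf-8", "surrogateescape")
```

The str in the middle is not meaningful text, but the name survives the round trip unchanged, which is what matters when copying it to open and friends.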


From stephen at xemacs.org  Wed Apr  6 01:40:06 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 6 Apr 2016 14:40:06 +0900
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <5704922D.7060905@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <CAP1=2W4HojCKjcE2cVNAiDLkuOayhORyKfGwqgiKtRh8ZE=dKA@mail.gmail.com>
 <20160406024012.GG12526@ando.pearwood.info>
 <5704922D.7060905@stoneleaf.us>
Message-ID: <22276.41270.715557.562304@turnbull.sk.tsukuba.ac.jp>

Ethan Furman writes:

 > No, Stephen, that is not what this is about.

Wrong Steven.  Spelling matters in email too.  And he's more worth
paying attention to than I am.  But I'll have my say anyway. ;-)

 > This is about the ugliness of code with str(path) this and
 > str(path) that

-1 Not good enough.  I wouldn't do it that often that "ugly" overrides
the reasoning Brett presented, and if you do, I bet one or two
personal helpers would clean up 95% of your cases.  But see Nick's
comment that "str(var)" is too permissive.  I'll have to think about
that, but my first take is he's right, and we need to do something
about making use of Path more straightforward within the stdlib.
Whatever that is, it should preferably make life easier for 3rd party
usage too, of course.

Is error-checking within Path sufficiently robust in the light of "too
permissive"?  (I don't know exactly what I mean by that, but something
like if "str(var_purporting_to_be_Path)" is too permissive, are we
sure that "str(really_is_Path_var)" is "safe"?  Apparently we haven't
had a lot of beta testing.)

 > and let's not forget the Path(this_returned_string) and
 > Path(that_returned_string),

But we don't object to (de)serializing dicts to (from) str (as JSON or
pickle).  I think Path vs. string is similarly different to justify
saying so (especially when treating user input).  Note, too, that
based on discussion in that thread it seems that Path is likely
to be inappropriate as an internal representation of URL.RFC3986.Path.
Thus, strings that look like paths (as strings) actually will have
multiple internal representations, similarly to the way that a dict
can have multiple serializations.  If representation transformation is
not invertible, EIBTI says we need the "boilerplate".

YMMV, but that's my take.


From ncoghlan at gmail.com  Wed Apr  6 01:44:41 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 6 Apr 2016 15:44:41 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP7+vJ+qm0aOPrRw1g=XZzg4Y4GZw0Pm=rbcVwNr=4sWUkJQTw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <CAP7+vJ+qm0aOPrRw1g=XZzg4Y4GZw0Pm=rbcVwNr=4sWUkJQTw@mail.gmail.com>
Message-ID: <CADiSq7cTe+tgh2k052Zq=-X39NWPdXs7QGALd=tHp-shugYa9A@mail.gmail.com>

On 6 April 2016 at 15:03, Guido van Rossum <guido at python.org> wrote:
> On Tue, Apr 5, 2016 at 7:44 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Option 4: define a rich-object-to-text path serialisation convention,
>
> Unfortunately that sounds like a classic "serious programming"
> solution (objects, abstractions, serialization, all big important
> words :-).

Yeah, my choice of phrasing made the idea sound more complicated than
it is. The actual change would be to add the following to some Python
standard library APIs that accept a filesystem path as an argument:

    arg = getattr(arg, "path", arg)

and the C API based equivalent to some C modules.

(With the main bike-sheddable part being whether to use the generic
"path" or something more explicit like "__fspath__" for the property
name, since pathlib can readily support either/both of them, and
"__fspath__" would be in line with the "os.fsencode" and "os.fsdecode"
abbreviations)

The key goal of this approach would be to make it so that most third
party libraries would "just work" with path objects if they were
already using os.path and other standard library APIs for path
manipulation (rather than using string methods directly), while still
avoiding the type confusion that comes from inheriting directly from
str.
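
To make that concrete, here is a minimal sketch of the convention from the consuming side. RichPath and basename_of are hypothetical names invented for the example (a stand-in for any object exposing a "path" attribute, and for any library function taking a path argument):

```python
import os.path


class RichPath:
    """Hypothetical rich path object exposing its text form as .path."""

    def __init__(self, text):
        self.path = text


def basename_of(arg):
    """Hypothetical library function that accepts str or rich paths."""
    # The one-line change: unwrap objects that have a "path" attribute,
    # and pass everything else (including plain strings) through untouched.
    arg = getattr(arg, "path", arg)
    return os.path.basename(arg)


from_str = basename_of("/etc/app/settings.cfg")
from_rich = basename_of(RichPath("/etc/app/settings.cfg"))
```

Callers that already pass plain strings see no change in behaviour.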

From a testing perspective, it would arguably make sense to tackle it
as a separate "test_path_protocol" test case that checked pathlib
compatibility with the APIs of interest, simply to avoid adding a
pathlib dependency to all those module tests.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From storchaka at gmail.com  Wed Apr  6 01:50:41 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 6 Apr 2016 08:50:41 +0300
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
Message-ID: <ne283i$a9r$1@ger.gmane.org>

On 06.04.16 05:44, Nick Coghlan wrote:
> The next challenge would then be to make a list of APIs to be updated
> for 3.6 to implicitly accept "rich path" objects via the agreed
> convention, with pathlib.PurePath used as a test class:
>
> * open()
> * codecs.open() (et al)
> * io.*
> * os.path.*
> * other os functions
> * shutil.*
> * tempfile.*
> * shelve.*
> * csv.*

Not sure about os.path.*. The purpose of os.path module is manipulating 
string paths. From the perspective of pathlib it can look lower level.

Supporting pathlib.Path will complicate and slow down os.path functions 
(they are already more complex and slower than they were in Python 2). Since 
os.path functions are often called several times in a loop, their 
performance is important. On the other hand, some Path methods are more 
efficient than os.path functions, and Path-specialized code at a higher 
level may be preferable.



From greg.ewing at canterbury.ac.nz  Wed Apr  6 01:52:34 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 06 Apr 2016 17:52:34 +1200
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
Message-ID: <5704A422.8020008@canterbury.ac.nz>

Nick Coghlan wrote:
> The most promising option for that is probably "getattr(path, 'path',
> path)",

Is there something seriously wrong with str(path)?

-- 
Greg

From storchaka at gmail.com  Wed Apr  6 01:57:12 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 6 Apr 2016 08:57:12 +0300
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
Message-ID: <ne28fo$flu$1@ger.gmane.org>

On 06.04.16 05:44, Nick Coghlan wrote:
> The most promising option for that is probably "getattr(path, 'path',
> path)", since the "path" attribute is being added to pathlib, and the
> given idiom can be readily adopted in Python 2/3 compatible code
> (since normal strings and any other object without a "path" attribute
> are passed through unchanged). Alternatively, since it's a protocol,
> double-underscores on the property name may be appropriate (i.e.
> "getattr(path, '__path__', path)")

This was already discussed. Current conclusion is using the "path" 
attribute. See http://bugs.python.org/issue22570 .



From storchaka at gmail.com  Wed Apr  6 01:59:04 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 6 Apr 2016 08:59:04 +0300
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <5704A422.8020008@canterbury.ac.nz>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <5704A422.8020008@canterbury.ac.nz>
Message-ID: <ne28j8$flu$2@ger.gmane.org>

On 06.04.16 08:52, Greg Ewing wrote:
> Nick Coghlan wrote:
>> The most promising option for that is probably "getattr(path, 'path',
>> path)",
>
> Is there something seriously wrong with str(path)?

What if path is None or bytes?


From ethan at stoneleaf.us  Wed Apr  6 02:20:47 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 05 Apr 2016 23:20:47 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <22276.41270.715557.562304@turnbull.sk.tsukuba.ac.jp>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>	<CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>	<CAP1=2W4HojCKjcE2cVNAiDLkuOayhORyKfGwqgiKtRh8ZE=dKA@mail.gmail.com>	<20160406024012.GG12526@ando.pearwood.info>	<5704922D.7060905@stoneleaf.us>
 <22276.41270.715557.562304@turnbull.sk.tsukuba.ac.jp>
Message-ID: <5704AABF.9080201@stoneleaf.us>

On 04/05/2016 10:40 PM, Stephen J. Turnbull wrote:
> Ethan Furman writes:
>
>   > No, Stephen, that is not what this is about.
>
> Wrong Steven.  Spelling matters in email too.

Yes, it absolutely does.  My apologies.

> -1 Not good enough.  I wouldn't do it that often that "ugly" overrides
> the reasoning Brett presented [...]

> But we don't object to (de)serializing dicts to (from) str (as JSON or
> pickle).

Amusingly enough, I don't have to deal with serializing dicts.  :) 
However, as a comparison:  imagine you had to transform your dict to 
JSON every time some function wanted a dict as input.  And had to 
transform returned JSON strings into dicts.

> I think Path vs. string is similarly different to justify
> saying so (especially when treating user input).  [...]
> Thus, strings that look like paths (as strings) actually will have
> multiple internal representations, similarly to the way that a dict
> can have multiple serializations.

I don't follow.  When dealing with the file system one passes a string* 
representing the path of the object one wants -- pretty much the same 
string that was passed in to Path.

--
~Ethan~

* or bytes, but the same sameness, really.

From ncoghlan at gmail.com  Wed Apr  6 02:23:00 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 6 Apr 2016 16:23:00 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <ne28j8$flu$2@ger.gmane.org>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <5704A422.8020008@canterbury.ac.nz> <ne28j8$flu$2@ger.gmane.org>
Message-ID: <CADiSq7eYjtW1pBaV+4a0Y8V1y5jT0X+eZjHWPaF36T4Ehzq_fw@mail.gmail.com>

On 6 April 2016 at 15:59, Serhiy Storchaka <storchaka at gmail.com> wrote:
> On 06.04.16 08:52, Greg Ewing wrote:
>>
>> Nick Coghlan wrote:
>>>
>>> The most promising option for that is probably "getattr(path, 'path',
>>> path)",
>>
>>
>> Is there something seriously wrong with str(path)?
>
> What if path is None or bytes?

Or an int, float, list, dict, or arbitrary other object.

To be more explicit, the problem isn't what happens when the API doing
"str(path)" internally is used correctly, it's what happens when it's
used incorrectly: you end up proceeding with a nonsense string as your
path name, rather than failing early with TypeError or AttributeError.

Doing "getattr(path, 'path', path)" instead means that in the error
case (i.e. no "path" attribute), any existing argument checking is
still triggered normally.
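
A quick sketch of the difference, using two hypothetical helper functions (neither is a real stdlib API) that are both handed None by mistake:

```python
import os.path


def join_with_str(base, name):
    """Hypothetical helper that coerces its argument with str()."""
    return os.path.join(str(base), name)


def join_with_getattr(base, name):
    """Hypothetical helper that uses the getattr idiom instead."""
    base = getattr(base, "path", base)
    return os.path.join(base, name)


# str(None) is the string "None", so the bug is silently turned into a path.
silently_wrong = join_with_str(None, "settings.cfg")

# With getattr, None is passed through and os.path.join rejects it.
try:
    join_with_getattr(None, "settings.cfg")
except TypeError:
    failed_early = True
else:
    failed_early = False
```

The first helper happily builds a path under a directory called "None"; the second fails at the point where the mistake was made.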

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From rosuav at gmail.com  Wed Apr  6 02:25:05 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 6 Apr 2016 16:25:05 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <22276.41111.455186.755173@turnbull.sk.tsukuba.ac.jp>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CAPTjJmpGd=ABLp0E_YkbQiiXDeRETzpGAQraVsCNF49uOtjMHQ@mail.gmail.com>
 <20160406025154.GH12526@ando.pearwood.info>
 <CAPTjJmpfcek3_kmnYCEx0fFV62GB=g1iAG_7KZ0YjHA2YYOs_g@mail.gmail.com>
 <22276.41111.455186.755173@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CAPTjJmqJoVWwB-XZBu78ZYBrDfto=AW6V8pHSGen6qYdT+qsWA@mail.gmail.com>

On Wed, Apr 6, 2016 at 3:37 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Chris Angelico writes:
>
>  > Outside of deliberate tests, we don't create files on our disks
>  > whose names are strings of random bytes;
>
> Wishful thinking.  First, names made of control characters have often
> been deliberately used by miscreants to conceal their warez.  Second,
> in some systems it's all too easy to create paths with components in
> different locales (the place I've seen it most frequently is in NFS
> mounts).  I think that's much less true today, but perhaps that's only
> because my employer figured out that it was much less pain if system
> paths were pure ASCII so that it mostly didn't matter what encoding
> users chose for their subtrees.

Control characters are still characters, though. You can take a
bytestring consisting of byte values less than 32, decode it as UTF-8,
and have a series of codepoints to work with.

If your employer has "solved" the problem by restricting system paths
to ASCII, that's a fine solution for a single system with a single
ASCII-compatible encoding; a better solution is to mandate UTF-8 as
the file system encoding, as that's what most people are expecting
anyway.

> It remains important to be able to handle nearly arbitrary bytestrings
> in file names as far as I can see.  Please note that 100 million
> Japanese and 1 billion Chinese by and large still prefer their
> homegrown encodings (plural!!) to Unicode, while many systems are now
> defaulting filenames to UTF-8.  There's plenty of room remaining for
> copying bytestrings to arguments of open and friends.

Why exactly do they prefer these other encodings? Are they
representing characters that Unicode doesn't contain? If so, we have a
fundamental problem (no Python program is going to be able to cope
with these, without a third party library or some stupid mess of local
code); if not, you can always represent it as Unicode and encode it as
UTF-8 when it reaches the file system. Re-encoding is something that's
easy when you treat something as text, and impossible when you treat
it as bytes.
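
A minimal sketch of what I mean, assuming the legacy encoding is known (Shift_JIS is picked purely as an example, and the file name is invented):

```python
# A file name as it might arrive from a system using a legacy encoding.
legacy_bytes = "日本語.txt".encode("shift_jis")

# Treat it as text: decode once with the known encoding...
name = legacy_bytes.decode("shift_jis")

# ...and re-encoding for a UTF-8 file system is then a one-liner.
utf8_bytes = name.encode("utf-8")

# Control characters are still characters: bytes below 32 decode
# cleanly as UTF-8 into the corresponding code points.
control = bytes(range(1, 32)).decode("utf-8")
```

The hard part is knowing the source encoding in the first place, not the conversion itself.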

So far, you're still actually agreeing with me: paths are *text*, but
sometimes we don't know the encoding (and that's a problem to be
solved).

ChrisA

From ethan at stoneleaf.us  Wed Apr  6 02:25:59 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 05 Apr 2016 23:25:59 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <ne283i$a9r$1@ger.gmane.org>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne283i$a9r$1@ger.gmane.org>
Message-ID: <5704ABF7.1010905@stoneleaf.us>

On 04/05/2016 10:50 PM, Serhiy Storchaka wrote:
> On 06.04.16 05:44, Nick Coghlan wrote:
>> The next challenge would then be to make a list of APIs to be updated
>> for 3.6 to implicitly accept "rich path" objects via the agreed
>> convention, with pathlib.PurePath used as a test class:
>>
>> * open()
>> * codecs.open() (et al)
>> * io.*
>> * os.path.*
>> * other os functions
>> * shutil.*
>> * tempfile.*
>> * shelve.*
>> * csv.*
>
> Not sure about os.path.*. The purpose of os.path module is manipulating
> string paths. From the perspective of pathlib it can look lower level.

The point is that a function that receives a "path" object (whether str 
or Path) shouldn't have to care: it should be able to call os.path.split 
on the thing it received and get back a usable answer.

--
~Ethan~


From ncoghlan at gmail.com  Wed Apr  6 02:29:34 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 6 Apr 2016 16:29:34 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <ne28fo$flu$1@ger.gmane.org>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
Message-ID: <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>

On 6 April 2016 at 15:57, Serhiy Storchaka <storchaka at gmail.com> wrote:
> On 06.04.16 05:44, Nick Coghlan wrote:
>>
>> The most promising option for that is probably "getattr(path, 'path',
>> path)", since the "path" attribute is being added to pathlib, and the
>> given idiom can be readily adopted in Python 2/3 compatible code
>> (since normal strings and any other object without a "path" attribute
>> are passed through unchanged). Alternatively, since it's a protocol,
>> double-underscores on the property name may be appropriate (i.e.
>> "getattr(path, '__path__', path)")
>
> This was already discussed. Current conclusion is using the "path"
> attribute. See http://bugs.python.org/issue22570 .

I'd missed the existing precedent in DirEntry.path, so simply taking
that and running with it sounds good to me.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ethan at stoneleaf.us  Wed Apr  6 02:50:06 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 05 Apr 2016 23:50:06 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <5704909E.8070908@stoneleaf.us>
 <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>
Message-ID: <5704B19E.3000405@stoneleaf.us>

On 04/05/2016 10:00 PM, Guido van Rossum wrote:
> On Tue, Apr 5, 2016 at 9:29 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>> [...] we can't do:
>>
>>      app_root = Path(...)
>>      config = app_root/'settings.cfg'
>>      with open(config) as blah:
>>          # whatever
>>
>> It feels like instead of addressing this basic disconnect, the answer has
>> instead been:  add that to pathlib!  Which works great -- until a user or a
>> library gets this path object and tries to use something from os on it.
>
> I agree that asking for config.open() isn't the right answer here
> (even if it happens to work). But in this example, once 3.5.2 is out,
> the solution would be to use open(config.path), and that will also
> work when passing it to a library. Is it still unacceptable then?

On the one hand that is definitely more palatable.

On the other hand it doesn't address having the stdlib itself directly 
support Path.

On the gripping hand this feels reminiscent of the arguments over bytes 
vs unicode, but without any of the "This is why unicode is better!" bits.

Why is pathlib better than plain strings?

- attribute access to different parts such as the dirname, the
   filename, the extension (suffix)
- easy access to on-disk answers such as .exists(), .stat(), .chdir
- easy creation/modification of Path objects

What problem is it solving that makes the pain worth dealing with?

- no idea

This is an especially important point considering the str-derived Path 
libraries already out there that have the same advantages as pathlib, 
but none of the pain.

--
~Ethan~

From ncoghlan at gmail.com  Wed Apr  6 02:52:53 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 6 Apr 2016 16:52:53 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <5704ABF7.1010905@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne283i$a9r$1@ger.gmane.org> <5704ABF7.1010905@stoneleaf.us>
Message-ID: <CADiSq7cZ=vUBAOS1aznqEoDwq7VFJ=hOpPQWqY280y=94k_rQw@mail.gmail.com>

On 6 April 2016 at 16:25, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/05/2016 10:50 PM, Serhiy Storchaka wrote:
>> On 06.04.16 05:44, Nick Coghlan wrote:
>>> The next challenge would then be to make a list of APIs to be updated
>>> for 3.6 to implicitly accept "rich path" objects via the agreed
>>> convention, with pathlib.PurePath used as a test class:
>>>
>>> * open()
>>> * codecs.open() (et al)
>>> * io.*
>>> * os.path.*
>>> * other os functions
>>> * shutil.*
>>> * tempfile.*
>>> * shelve.*
>>> * csv.*
>>
>>
>> Not sure about os.path.*. The purpose of os.path module is manipulating
>> string paths. From the perspective of pathlib it can look lower level.
>
> The point is that a function that receives a "path" object (whether str or
> Path) shouldn't have to care: it should be able to call os.path.split on the
> thing it received and get back a usable answer.

I actually think it makes sense to pursue this question in a test
driven manner: create "test_pathlib_support" as a new test case, start
passing pathlib.PurePath instances to a relatively high level API like
shutil, and see what low level interfaces need to be updated to accept
filesystem path objects (in addition to strings) in order to make that
work. If shutil can be updated to support pathlib with changes solely
at the io and os module layer, then that bodes well for
transparently enabling support in 3rd party APIs as well.
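
A first cut at such a test might look something like the sketch below. The class and method names are only suggestions, and the test will of course only pass on an interpreter where open() and shutil already accept path objects (which is the whole point of writing it first):

```python
import pathlib
import shutil
import tempfile
import unittest


class TestPathlibSupport(unittest.TestCase):
    """Sketch: pass PurePath instances straight to a high level API."""

    def test_shutil_copyfile_accepts_pure_paths(self):
        with tempfile.TemporaryDirectory() as tmp:
            src = pathlib.PurePath(tmp) / "src.txt"
            dst = pathlib.PurePath(tmp) / "dst.txt"

            # Set-up and verification use str() so that only the call
            # under test depends on path object support.
            with open(str(src), "w") as f:
                f.write("payload")

            shutil.copyfile(src, dst)

            with open(str(dst)) as f:
                self.assertEqual(f.read(), "payload")
```

Each failure would then point at a specific low level function that needs the conversion added.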

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From njs at pobox.com  Wed Apr  6 02:53:05 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 5 Apr 2016 23:53:05 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
Message-ID: <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>

On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 6 April 2016 at 15:57, Serhiy Storchaka <storchaka at gmail.com> wrote:
>> On 06.04.16 05:44, Nick Coghlan wrote:
>>>
>>> The most promising option for that is probably "getattr(path, 'path',
>>> path)", since the "path" attribute is being added to pathlib, and the
>>> given idiom can be readily adopted in Python 2/3 compatible code
>>> (since normal strings and any other object without a "path" attribute
>>> are passed through unchanged). Alternatively, since it's a protocol,
>>> double-underscores on the property name may be appropriate (i.e.
>>> "getattr(path, '__path__', path)")
>>
>> This was already discussed. Current conclusion is using the "path"
>> attribute. See http://bugs.python.org/issue22570 .
>
> I'd missed the existing precedent in DirEntry.path, so simply taking
> that and running with it sounds good to me.

This makes me twitch slightly, because NumPy has had a whole set of
problems due to the ancient and minimally-considered decision to
assume a bunch of ad hoc non-namespaced method names fulfilled some
protocol -- like all .sum methods will have a signature that's
compatible with numpy's, and if an object has a .log method then
surely that computes the logarithm (what else in computing could "log"
possibly refer to?), etc. This experience may or may not be relevant,
I'm not sure -- sometimes these kinds of twitches are good guides to
intuition, and sometimes they are just knee-jerk responses to an old
and irrelevant problem :-). But you might want to at least think about
how common it might be to have existing objects with unrelated
attributes that happen to be called "path", and the bizarro problems
that might be caused if someone accidentally passes one of them to a
function that expects all .path attributes to be instances of this new
protocol.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From ncoghlan at gmail.com  Wed Apr  6 02:57:49 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 6 Apr 2016 16:57:49 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
Message-ID: <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>

On 6 April 2016 at 16:53, Nathaniel Smith <njs at pobox.com> wrote:
> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> I'd missed the existing precedent in DirEntry.path, so simply taking
>> that and running with it sounds good to me.
>
> This makes me twitch slightly, because NumPy has had a whole set of
> problems due to the ancient and minimally-considered decision to
> assume a bunch of ad hoc non-namespaced method names fulfilled some
> protocol -- like all .sum methods will have a signature that's
> compatible with numpy's, and if an object has a .log method then
> surely that computes the logarithm (what else in computing could "log"
> possibly refer to?), etc. This experience may or may not be relevant,
> I'm not sure -- sometimes these kinds of twitches are good guides to
> intuition, and sometimes they are just knee-jerk responses to an old
> and irrelevant problem :-)
>
> But you might want to at least think about
> how common it might be to have existing objects with unrelated
> attributes that happen to be called "path", and the bizarro problems
> that might be caused if someone accidentally passes one of them to a
> function that expects all .path attributes to be instances of this new
> protocol.

sys.path, for example.

That's why I'd actually prefer the implicit conversion protocol to be
the more explicitly named "__fspath__", with suitable "__fspath__ =
path" assignments added to DirEntry and pathlib. However, I'm also not
offering to actually *do* the work here, and the casting vote goes to
the folks pursuing the implementation effort.
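
A minimal sketch of why the namespaced name helps, assuming the
attribute form described above (an "__fspath__" alias for "path").
DirEntryLike and to_fs_path are hypothetical names used only for
illustration; this is not a statement about any final API.

```python
import sys


class DirEntryLike:
    """Hypothetical stand-in for an object that carries a file system path."""

    def __init__(self, path):
        self.path = path

    # Alias the protocol name to the existing ``path`` attribute.  A
    # property keeps the alias in sync if ``path`` is reassigned.
    @property
    def __fspath__(self):
        return self.path


def to_fs_path(obj):
    """Return obj.__fspath__ when present, otherwise obj unchanged."""
    return getattr(obj, "__fspath__", obj)


entry_path = to_fs_path(DirEntryLike("/tmp/data.txt"))  # "/tmp/data.txt"
plain_str = to_fs_path("/tmp/data.txt")                 # passed through

# The unrelated sys.path attribute is picked up by the ".path" idiom ...
clash = getattr(sys, "path", sys)      # a list of import directories
# ... but not by the namespaced protocol name.
no_clash = to_fs_path(sys)             # the sys module itself, unchanged
```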

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From wes.turner at gmail.com  Wed Apr  6 03:14:53 2016
From: wes.turner at gmail.com (Wes Turner)
Date: Wed, 6 Apr 2016 02:14:53 -0500
Subject: [Python-Dev] When should pathlib stop being provisional?
Message-ID: <CACfEFw86f6jkYGZNLZWdT-NZ+-yxwK2FxnODXjSvirXKFY4giQ@mail.gmail.com>

On Apr 6, 2016 1:26 AM, "Chris Angelico" <rosuav at gmail.com> wrote:
>
> On Wed, Apr 6, 2016 at 3:37 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> > Chris Angelico writes:
> >
> >  > Outside of deliberate tests, we don't create files on our disks
> >  > whose names are strings of random bytes;
> >
> > Wishful thinking.  First, names made of control characters have often
> > been deliberately used by miscreants to conceal their warez.  Second,
> > in some systems it's all too easy to create paths with components in
> > different locales (the place I've seen it most frequently is in NFS
> > mounts).  I think that's much less true today, but perhaps that's only
> > because my employer figured out that it was much less pain if system
> > paths were pure ASCII so that it mostly didn't matter what encoding
> > users chose for their subtrees.
>
> Control characters are still characters, though. You can take a
> bytestring consisting of byte values less than 32, decode it as UTF-8,
> and have a series of codepoints to work with.
>
> If your employer has "solved" the problem by restricting system paths
> to ASCII, that's a fine solution for a single system with a single
> ASCII-compatible encoding; a better solution is to mandate UTF-8 as
> the file system encoding, as that's what most people are expecting
> anyway.
>
> > It remains important to be able to handle nearly arbitrary bytestrings
> > in file names as far as I can see.  Please note that 100 million
> > Japanese and 1 billion Chinese by and large still prefer their
> > homegrown encodings (plural!!) to Unicode, while many systems are now
> > defaulting filenames to UTF-8.  There's plenty of room remaining for
> > copying bytestrings to arguments of open and friends.
>
> Why exactly do they prefer these other encodings? Are they
> representing characters that Unicode doesn't contain? If so, we have a
> fundamental problem (no Python program is going to be able to cope
> with these, without a third party library or some stupid mess of local
> code); if not, you can always represent it as Unicode and encode it as
> UTF-8 when it reaches the file system. Re-encoding is something that's
> easy when you treat something as text, and impossible when you treat
> it as bytes.
>
> So far, you're still actually agreeing with me: paths are *text*, but
> sometimes we don't know the encoding (and that's a problem to be
> solved).

re: bytestring, unicode, encodings after e.g. os.path.split / Path.split:

from "[Python-ideas] Type hints for text/binary data in Python 2+3 code"

https://mail.python.org/pipermail/python-ideas/2016-March/038869.html

>> would/will it be possible to
>> use Typing.Text as a base class for even-more abstract string types

https://mail.python.org/pipermail/python-ideas/2016-March/039016.html

>> * Text.encoding
>> * Text.lang (urn:ietf:rfc:3066)
... forgot to CC:
>> * https://tools.ietf.org/html/rfc5646
  "Tags for Identifying Languages"
  urn:ietf:rfc:5646

is this (Path) a narrower case of string types (#strypes), because after
transformations we want to preserve string metadata like e.g. encoding?

I'd vote for
* adding DirEntry.__path__ as a proxy to DirEntry.path
* standardizing on __path__ (over .path)
  * because this operation *is* fundamentally similar to e.g. __str__
    * operator.path pathify, pathifize

>
> ChrisA

From p.f.moore at gmail.com  Wed Apr  6 05:02:09 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 6 Apr 2016 10:02:09 +0100
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <5704909E.8070908@stoneleaf.us>
 <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>
Message-ID: <CACac1F_OyAq2Gkun9B7vRJLp2oSX4=i_iPWN3fqkGVTDf3fEdA@mail.gmail.com>

On 6 April 2016 at 06:00, Guido van Rossum <guido at python.org> wrote:
> On Tue, Apr 5, 2016 at 9:29 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>> [...] we can't do:
>>
>>     app_root = Path(...)
>>     config = app_root/'settings.cfg'
>>     with open(config) as blah:
>>         # whatever
>>
>> It feels like instead of addressing this basic disconnect, the answer has
>> instead been:  add that to pathlib!  Which works great -- until a user or a
>> library gets this path object and tries to use something from os on it.
>
> I agree that asking for config.open() isn't the right answer here
> (even if it happens to work). But in this example, once 3.5.2 is out,
> the solution would be to use open(config.path), and that will also
> work when passing it to a library. Is it still unacceptable then?

My sense is that this will remain unacceptable to those people who
have a problem here.

The issue is not so much the ugliness of the code (in spite of the
fact that this is what people focus on) but rather the disconnect
between the mental model people have and the reality of the code they
have to write.

The basic idea behind pathlib.Path objects is that they represent a
*path*. And when you call open, you should pass it a path. So (the
argument goes) why should you have to convert the path you have (a
Path object) to pass it to a function (like open) that requires a path
argument?

Making stdlib functions work with Path objects would fix a lot of the
conceptual difficulties here. And it would also mean that (thanks to
duck typing) a lot of 3rd party code would work without change,
further alleviating the issue. But ultimately, there will still be
code that needs changing to be aware of Path objects. The change is
simple enough (patharg = str(patharg), or the getattr('path')
approach) but it's a change in mental model (this time by library
authors) and the benefit of the change is not sufficiently obvious.
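
A minimal sketch of the str() form of that change, as a library author
might write it. The function name as_path_string is hypothetical. The
last call shows a cost of this workaround: str() accepts any object, so
a wrong argument is converted silently instead of being rejected.

```python
import pathlib


def as_path_string(patharg):
    """Coerce a str or a pathlib path to str (the str() workaround)."""
    return str(patharg)


from_path = as_path_string(pathlib.PurePosixPath("etc") / "settings.cfg")
from_str = as_path_string("etc/settings.cfg")
from_none = as_path_string(None)   # no error: yields the string "None"

print(from_path)   # etc/settings.cfg
print(from_str)    # etc/settings.cfg
print(from_none)   # None
```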

Inheriting from str is the commonly-proposed solution, because in
practical terms it works. But it does so by mixing layers of
abstraction in a way that is difficult to explain to someone who
thinks of a "path" as an abstract object rather than as a (text?
byte?) string. Ultimately, all that's happening is that the burden of
keeping the abstractions separate is placed on the design, rather than
being explicit in the code. But while I have no evidence that this is
a problem, it does leave me with a nagging feeling that it "seems
similar to the bytes/text issue".

My feelings:
- I'd *like* to push for the cleaner separation of abstractions that a
"pure" Path object provides.
- It does need library writers (and in particular the stdlib) to "buy
into" the model and make changes to support Path objects
- I don't have a huge problem with using str(p) or p.path as a
workaround during the transition, but that's from the POV of throwaway
scripting. I'm not sure I'd be so happy using the workaround in code
that would need to be supported for a long time.
- I'd rather compromise on principles than abandon the idea of a
stdlib Path object
- In practical terms, inheriting from str is probably fine. At least
evidence from 3rd party path libraries indicates so.

Paul

From encukou at gmail.com  Wed Apr  6 05:30:32 2016
From: encukou at gmail.com (Petr Viktorin)
Date: Wed, 6 Apr 2016 11:30:32 +0200
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
Message-ID: <5704D738.4070507@gmail.com>

On 04/06/2016 08:53 AM, Nathaniel Smith wrote:
> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On 6 April 2016 at 15:57, Serhiy Storchaka <storchaka at gmail.com> wrote:
>>> On 06.04.16 05:44, Nick Coghlan wrote:
>>>>
>>>> The most promising option for that is probably "getattr(path, 'path',
>>>> path)", since the "path" attribute is being added to pathlib, and the
>>>> given idiom can be readily adopted in Python 2/3 compatible code
>>>> (since normal strings and any other object without a "path" attribute
>>>> are passed through unchanged). Alternatively, since it's a protocol,
>>>> double-underscores on the property name may be appropriate (i.e.
>>>> "getattr(path, '__path__', path)")
>>>
>>> This was already discussed. Current conclusion is using the "path"
>>> attribute. See http://bugs.python.org/issue22570 .
>>
>> I'd missed the existing precedent in DirEntry.path, so simply taking
>> that and running with it sounds good to me.
> 
> This makes me twitch slightly, because NumPy has had a whole set of
> problems due to the ancient and minimally-considered decision to
> assume a bunch of ad hoc non-namespaced method names fulfilled some
> protocol -- like all .sum methods will have a signature that's
> compatible with numpy's, and if an object has a .log method then
> surely that computes the logarithm (what else in computing could "log"
> possibly refer to?), etc. This experience may or may not be relevant,
> I'm not sure -- sometimes these kinds of twitches are good guides to
> intuition, and sometimes they are just knee-jerk responses to an old
> and irrelevant problem :-). But you might want to at least think about
> how common it might be to have existing objects with unrelated
> attributes that happen to be called "path", and the bizarro problems
> that might be caused if someone accidentally passes one of them to a
> function that expects all .path attributes to be instances of this new
> protocol.
> 
> -n
> 

Python was in a similar situation with the .next method on iterators,
which changed to __next__ in Python 3. PEP 3114 (which explains this
change) says:

> Code that nowhere contains an explicit call to a next method can
> nonetheless be silently affected by the presence of such
> a method. Therefore, this PEP proposes that iterators should have
> a __next__ method instead of a next method (with no change in
> semantics).

How well does that apply to path/__path__?

That PEP also introduced the next() builtin. This suggests that a
protocol with __path__/__fspath__ would need a corresponding
path()/fspath() builtin.


From antoine at python.org  Wed Apr  6 05:41:18 2016
From: antoine at python.org (Antoine Pitrou)
Date: Wed, 6 Apr 2016 09:41:18 +0000 (UTC)
Subject: [Python-Dev] When should pathlib stop being provisional?
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <CAP1=2W4HojCKjcE2cVNAiDLkuOayhORyKfGwqgiKtRh8ZE=dKA@mail.gmail.com>
Message-ID: <loom.20160406T113422-123@post.gmane.org>

Brett Cannon <brett <at> python.org> writes:
> 
> :) I figured. I was close myself until I decided to be the "not inheriting
> from str is a sane decision" camp because people weren't understanding where
> the design decision probably came from,
> hence http://www.snarky.ca/why-pathlib-path-doesn-t-inherit-from-str

That's a good write-up, thank you. Paths don't have to inherit str
any more than IP addresses or any other thing that happens to be
passed as a string in traditional APIs.

On a concrete point, inheriting str would make the API a horrible,
confusing, dangerous mess mixing regular string semantics (concatenation
with +, for example, or indexing) with path-specific semantics and various
grey areas (should .split() have path semantics or str semantics? what
is the rule and how are people supposed to remember it?).

(of course, for PHP or Javascript programmers it may not sound like a
problem. Let "adding" two IP addresses return the concatenation of
their string representations...)
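
A minimal sketch of the grey areas, using a hypothetical str subclass.
StrPath is not a real or proposed class; it only shows which inherited
str operations would remain available on a str-derived path.

```python
class StrPath(str):
    """Hypothetical str-derived path type, for illustration only."""

    def __truediv__(self, other):
        # Path-style joining with "/" as the separator.
        return StrPath(self.rstrip("/") + "/" + str(other))


p = StrPath("/home/user") / "my docs"   # "/home/user/my docs"

# Inherited str behaviour, none of it path-aware:
joined = p + "notes.txt"   # concatenation without a separator
parts = p.split()          # splits on whitespace, not on "/"
first = p[0]               # a character, not a path component

print(joined)              # /home/user/my docsnotes.txt
print(parts)               # ['/home/user/my', 'docs']
print(type(joined))        # <class 'str'>
```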

Regards

Antoine.

From antoine at python.org  Wed Apr  6 05:44:30 2016
From: antoine at python.org (Antoine Pitrou)
Date: Wed, 6 Apr 2016 09:44:30 +0000 (UTC)
Subject: [Python-Dev] When should pathlib stop being provisional?
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
Message-ID: <loom.20160406T114350-916@post.gmane.org>

Nick Coghlan <ncoghlan <at> gmail.com> writes:
> 
> sys.path, for example.
> 
> That's why I'd actually prefer the implicit conversion protocol to be
> the more explicitly named "__fspath__", with suitable "__fspath__ =
> path" assignments added to DirEntry and pathlib.

That was my preference as well.

> However, I'm also not
> offering to actually *do* the work here, and the casting vote goes to
> the folks pursuing the implementation effort.

Indeed.

Regards

Antoine.



From antoine at python.org  Wed Apr  6 05:50:45 2016
From: antoine at python.org (Antoine Pitrou)
Date: Wed, 6 Apr 2016 09:50:45 +0000 (UTC)
Subject: [Python-Dev] When should pathlib stop being provisional?
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne283i$a9r$1@ger.gmane.org> <5704ABF7.1010905@stoneleaf.us>
Message-ID: <loom.20160406T114454-973@post.gmane.org>

Ethan Furman <ethan <at> stoneleaf.us> writes:
> >
> > Not sure about os.path.*. The purpose of os.path module is manipulating
> > string paths. From the perspective of pathlib it can look lower level.
> 
> The point is that a function that receives a "path" object (whether str 
> or Path) shouldn't have to care: it should be able to call os.path.split 
> on the thing it received and get back a usable answer.

pathlib should already replicate the useful parts of os.path.  That was
the design goal after all.

So this is like saying you want a Python file or socket object to be
accepted by os.read(). In the rare case where you want that, you call the 
.fileno() method explicitly. The equivalent for Path objects is to
look up the .path attribute explicitly.
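
A minimal sketch of the file object side of that analogy, using only
standard library calls. os.read() takes an integer descriptor, so a
file object has to be converted explicitly with .fileno().

```python
import os
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"hello")
    f.flush()                        # push buffered bytes to the descriptor

    fd = f.fileno()                  # explicit conversion to a descriptor
    os.lseek(fd, 0, os.SEEK_SET)     # rewind at the descriptor level
    data = os.read(fd, 5)

    try:
        os.read(f, 5)                # the file object itself is rejected
    except TypeError:
        accepted_file_object = False
    else:
        accepted_file_object = True

print(data)                          # b'hello'
```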

Regards

Antoine.



From steve at pearwood.info  Wed Apr  6 06:45:08 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 6 Apr 2016 20:45:08 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
Message-ID: <20160406104508.GI12526@ando.pearwood.info>

On Tue, Apr 05, 2016 at 11:53:05PM -0700, Nathaniel Smith wrote:

> This makes me twitch slightly, because NumPy has had a whole set of
> problems due to the ancient and minimally-considered decision to
> assume a bunch of ad hoc non-namespaced method names fulfilled some
> protocol -- like all .sum methods will have a signature that's
> compatible with numpy's, and if an object has a .log method then
> surely that computes the logarithm (what else in computing could "log"
> possibly refer to?), etc. 

It's the down-side of duck-typing. It's all well and good accepting 
anything with a quack method, but not everything is that straight-
forward:

artist.draw()
gunslinger.draw()


I think that file system paths are important enough, and tricky enough, 
to justify their own protocol. I like Nick's suggestion of a special 
dunder method for converting path-like objects into paths, without the 
problems that str(x) has, or the risk of assuming that anything with a 
.path attribute refers to a file system path.

(maze.path, garden.path, career.path perhaps?)


-- 
Steve

From p.f.moore at gmail.com  Wed Apr  6 07:03:29 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 6 Apr 2016 12:03:29 +0100
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
Message-ID: <CACac1F-aKFjmXdaO8+BSuTbNdp_XkYKbWXA3LZnc0bifNjye7w@mail.gmail.com>

On 6 April 2016 at 00:45, Guido van Rossum <guido at python.org> wrote:
> This does sound like it's the crucial issue, and it is worth writing
> up clearly the pros and cons. Let's draft those lists in a thread
> (this one's fine) and then add them to the PEP. We can then decide to:
>
> - keep the status quo
> - change PurePath to inherit from str
> - decide it's never going to be settled and kill pathlib.py
>
> (And yes, I'm dead serious about the latter, rather Solomonic option.)

By the way, even if there's no solution that satisfies everyone to the
"inherit from str" question, I'd still be unhappy if pathlib
disappeared from the stdlib. It's useful for quick admin scripts that
don't justify an external dependency. Those typically do quite a bit
of path manipulation, and as such benefit from the improved API of
pathlib over os.path.

+1 on making (and documenting) a final decision on the "inherit from
str" question
-1 on removing pathlib just because that decision might not satisfy everyone

Paul

From rdmurray at bitdance.com  Wed Apr  6 10:04:15 2016
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 06 Apr 2016 10:04:15 -0400
Subject: [Python-Dev] bugs.python.org email blockage at gmail
In-Reply-To: <CADiSq7cOjF8UwLYXQDfKSqLFezXuQStHgtqML3+wsBBQ8XqPNw@mail.gmail.com>
References: <CADx+GQPKnvWWB1e_kL6Hqh7skuDiVRBMK8szs5QGS4NGt-rnJg@mail.gmail.com>
 <CAP1=2W4sDeTTZWUxkoieD+bJprXFK=5NfBkLUg=vopgNFPSMYg@mail.gmail.com>
 <20160405195631.B2F35B14156@webabinitio.net> <ne1olg$h4q$1@ger.gmane.org>
 <CADiSq7cOjF8UwLYXQDfKSqLFezXuQStHgtqML3+wsBBQ8XqPNw@mail.gmail.com>
Message-ID: <20160406140417.57BB1B14023@webabinitio.net>

On Wed, 06 Apr 2016 12:21:04 +1000, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 6 April 2016 at 11:27, Terry Reedy <tjreedy at udel.edu> wrote:
> bugs.python.org is currently sending notification emails directly to
> recipients, rather than routing them via the outbound SMTP server on
> mail.python.org.

Correct.

> Reconfiguring it to relay notifications via the main outgoing server
> is the longer term fix, but an initial attempt at enabling that
> resulted in errors in the bugs.python.org mail logs, so David reverted
> to the direct email configuration for the time being.

Specifically, I think we should clean up the issues that are causing
reputation loss (which pretty much means dropping rietveld, although
in theory we could fix rietveld instead if someone wants to finish
Ezio's patch).  And then we need to understand the issue that caused
me to back out the change: something is sending null-Sender emails to
multiple recipients.  We may not need to fix it (mail.python.org rejected
them but they may be useless messages), but we probably should.  I suspect
they are actual bounces, but I don't have the time to investigate further
at this time.

--David

From rdmurray at bitdance.com  Wed Apr  6 10:08:39 2016
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 06 Apr 2016 10:08:39 -0400
Subject: [Python-Dev] bugs.python.org email blockage at gmail
In-Reply-To: <22276.31880.854091.86500@turnbull.sk.tsukuba.ac.jp>
References: <CADx+GQPKnvWWB1e_kL6Hqh7skuDiVRBMK8szs5QGS4NGt-rnJg@mail.gmail.com>
 <CAP1=2W4sDeTTZWUxkoieD+bJprXFK=5NfBkLUg=vopgNFPSMYg@mail.gmail.com>
 <20160405195631.B2F35B14156@webabinitio.net>
 <22276.31880.854091.86500@turnbull.sk.tsukuba.ac.jp>
Message-ID: <20160406140842.E13AEB14023@webabinitio.net>

On Wed, 06 Apr 2016 12:03:36 +0900, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
> R. David Murray writes:
> 
>  > again.  However, the IPV4 address has a poor reputation, and Verizon
>  > at least appears to be blocking it.  So more work is still needed.
> 
> Don't take Verizon's policy as meaningful.  Tell Verizon customers to
> get another address.  That is the only solution that works for Verizon
> subscribers for very long (based on 15 years of Mailman-Users posts),
> they have never been a high-quality email provider.  Further, Verizon
> (as an email provider) is in the process of dying anyway (they are
> very much alive as the new owner of AOL), so improvements in their
> email practices have a likelihood of zero to the resolution of a C
> float.

Yes, Mark reminded me that Verizon still isn't accepting mail from
mail.python.org, despite multiple contacts from the postmaster team.
So they are pretty much a lost cause and no one should use them for email,
I think.

However, the "poor reputation" comment came from the error message
returned by gmail when it bounced the spam-bounce-reports bugs was trying
to send to Ezio.

--David

From steve at pearwood.info  Wed Apr  6 10:39:12 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 7 Apr 2016 00:39:12 +1000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <5704D738.4070507@gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <5704D738.4070507@gmail.com>
Message-ID: <20160406143909.GJ12526@ando.pearwood.info>

On Wed, Apr 06, 2016 at 11:30:32AM +0200, Petr Viktorin wrote:

> Python was in a similar situation with the .next method on iterators,
> which changed to __next__ in Python 3. PEP 3114 (which explains this
> change) says:
> 
> > Code that nowhere contains an explicit call to a next method can
> > nonetheless be silently affected by the presence of such
> > a method. Therefore, this PEP proposes that iterators should have
> > a __next__ method instead of a next method (with no change in
> > semantics).
> 
> How well does that apply to path/__path__?

I think it's potentially the same. Possibly there are fewer existing 
uses of "obj.path" out there which conflict with this use, but there's 
at least one in the std lib: sys.path. 


> That PEP also introduced the next() builtin. This suggests that a
> protocol with __path__/__fspath__ would need a corresponding
> path()/fspath() builtin.

Not necessarily. Take a look at (say) dir(object()) and you'll see a few 
dunders that don't correspond to built-ins:

__reduce__  and __reduce_ex__ are used by pickle;
__sizeof__ is used by sys.getsizeof;
__subclasshook__ is used by the ABC system;

Another example is __trunc__ used by math.trunc().

So any such fspath function should stand on its own as a useful 
feature, not just because there's a dunder method __fspath__.
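
A minimal sketch of a dunder that is reached through a module function
rather than a builtin. The Reading class is a hypothetical example; the
calls to math.trunc and sys.getsizeof are standard library.

```python
import builtins
import math
import sys


class Reading:
    """Hypothetical numeric wrapper that supports math.trunc()."""

    def __init__(self, value):
        self.value = value

    def __trunc__(self):
        return int(self.value)


truncated = math.trunc(Reading(21.7))      # dispatches to Reading.__trunc__
has_builtin = hasattr(builtins, "trunc")   # there is no trunc() builtin
size = sys.getsizeof(object())             # relies on object.__sizeof__

print(truncated)     # 21
print(has_builtin)   # False
```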


-- 
Steve

From njs at pobox.com  Wed Apr  6 10:50:23 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 6 Apr 2016 07:50:23 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <20160406143909.GJ12526@ando.pearwood.info>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <5704D738.4070507@gmail.com>
 <20160406143909.GJ12526@ando.pearwood.info>
Message-ID: <CAPJVwBksd5uVRnLF+9snNqde=RQNhWs_hZU_6HNxoWHzh75YWw@mail.gmail.com>

On Apr 6, 2016 07:44, "Steven D'Aprano" <steve at pearwood.info> wrote:
>
> On Wed, Apr 06, 2016 at 11:30:32AM +0200, Petr Viktorin wrote:
>
> > Python was in a similar situation with the .next method on iterators,
> > which changed to __next__ in Python 3. PEP 3114 (which explains this
> > change) says:
> >
> > > Code that nowhere contains an explicit call to a next method can
> > > nonetheless be silently affected by the presence of such
> > > a method. Therefore, this PEP proposes that iterators should have
> > > a __next__ method instead of a next method (with no change in
> > > semantics).
> >
> > How well does that apply to path/__path__?
>
> I think it's potentially the same. Possibly there are fewer existing
> uses of "obj.path" out there which conflict with this use, but there's
> at least one in the std lib: sys.path.
>
>
> > That PEP also introduced the next() builtin. This suggests that a
> > protocol with __path__/__fspath__ would need a corresponding
> > path()/fspath() builtin.
>
> Not necessarily. Take a look at (say) dir(object()) and you'll see a few
> dunders that don't correspond to built-ins:
>
> __reduce__  and __reduce_ex__ are used by pickle;
> __sizeof__ is used by sys.getsizeof;
> __subclasshook__ is used by the ABC system;
>
> Another example is __trunc__ used by math.trunc().
>
> So any such fspath function should stand on its own as a useful
> feature, not just because there's a dunder method __fspath__.

An even more precise analogy is provided by __index__, whose semantics are
to provide safe casting to integer (the name is a historical accident), as
opposed to __int__'s tendency to cast things to integer willy-nilly,
including things that really shouldn't be silently accepted as integers.
Basically __index__ is to __int__ as __(fs)path__ would be to __str__.

There's an operator.index but no builtins.index.
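
A small sketch of that difference, assuming a made-up Slot class: int()
accepts a float and silently drops the fraction, while operator.index()
only accepts objects that define __index__.

```python
import operator


class Slot:
    """Hypothetical type that is safe to treat as an integer."""

    def __index__(self):
        return 2


lossy = int(3.9)                  # int() truncates the float silently
exact = operator.index(Slot())    # goes through __index__
item = ["a", "b", "c"][Slot()]    # sequence indexing uses __index__ as well

try:
    operator.index(3.9)           # float does not define __index__
    rejected = False
except TypeError:
    rejected = True
```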

-n

From ethan at stoneleaf.us  Wed Apr  6 11:01:30 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 08:01:30 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <loom.20160406T114454-973@post.gmane.org>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne283i$a9r$1@ger.gmane.org> <5704ABF7.1010905@stoneleaf.us>
 <loom.20160406T114454-973@post.gmane.org>
Message-ID: <570524CA.50108@stoneleaf.us>

On 04/06/2016 02:50 AM, Antoine Pitrou wrote:
> Ethan Furman <ethan <at> stoneleaf.us> writes:
>>>
>>> Not sure about os.path.*. The purpose of os.path module is manipulating
>>> string paths. From the perspective of pathlib it can look lower level.
>>
>> The point is that a function that receives a "path" object (whether str
>> or Path) shouldn't have to care: it should be able to call os.path.split
>> on the thing it received and get back a usable answer.
>
> pathlib should already replicate the useful parts of os.path.  That was
> the design goal after all.

Yes it does, and very well.

> So this is like saying you want a Python file or socket object to be
> accepted by os.read(). In the rare case where you want that, you call the
> .fileno() method explicitly. The equivalent for Path objects is to
> lookup the .path attribute explicitly.

Unfortunately for Path objects there is already a well-established
ecosystem for dealing with paths as strings, and it currently breaks
when passed a Path object.  This is a high barrier to entry.
Having the stdlib support Path objects would lower that barrier
significantly.

--
~Ethan~

From ethan at stoneleaf.us  Wed Apr  6 11:10:06 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 08:10:06 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
Message-ID: <570526CE.5080401@stoneleaf.us>

On 04/05/2016 11:57 PM, Nick Coghlan wrote:
> On 6 April 2016 at 16:53, Nathaniel Smith <njs at pobox.com> wrote:
>> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

>>> I'd missed the existing precedent in DirEntry.path, so simply taking
>>> that and running with it sounds good to me.
>>
>> This makes me twitch slightly, because NumPy has had a whole set of
>> problems due to the ancient and minimally-considered decision to
>> assume a bunch of ad hoc non-namespaced method names fulfilled some
>> protocol -- like all .sum methods will have a signature that's
>> compatible with numpy's, and if an object has a .log method then
>> surely that computes the logarithm (what else in computing could "log"
>> possibly refer to?), etc. This experience may or may not be relevant,
>> I'm not sure -- sometimes these kinds of twitches are good guides to
>> intuition, and sometimes they are just knee-jerk responses to an old
>> and irrelevant problem :-)
>>
>> But you might want to at least think about
>> how common it might be to have existing objects with unrelated
>> attributes that happen to be called "path", and the bizarro problems
>> that might be caused if someone accidentally passes one of them to a
>> function that expects all .path attributes to be instances of this new
>> protocol.
>
> sys.path, for example.
>
> That's why I'd actually prefer the implicit conversion protocol to be
> the more explicitly named "__fspath__", with suitable "__fspath__ =
> path" assignments added to DirEntry and pathlib. However, I'm also not
> offering to actually *do* the work here, and the casting vote goes to
> the folks pursuing the implementation effort.

If we decide upon __fspath__ (or __path__) I will do the work on pathlib 
and scandir to add those attributes.

--
~Ethan~

From brett at python.org  Wed Apr  6 13:26:36 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 17:26:36 +0000
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <570526CE.5080401@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
Message-ID: <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>

With Ethan volunteering to do the work to help make a path protocol a thing
-- and I'm willing to help along with propagating this through the stdlib
where I think Serhiy might be interested in helping as well -- and a
seeming consensus this is a good idea, it seems like this proposal has a
chance of actually coming to fruition.

Now we need clear details. :) Some open questions are:

   1. Name: __path__, __fspath__, or something else?
   2. Method or attribute? (changes what kind of one-liner you might use in
   libraries, but I think historically all protocols have been methods and the
   serialized string representation might be costly to build)
   3. Built-in? (name is dependent on #1 if we add one)
   4. Add the method/attribute to str? (I assume so, much like __index__()
   is on int, but I have not seen it explicitly stated so I would rather
   clarify it)
   5. Expand the C API to have something like PyObject_Path()?
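
To make question 2 concrete, here is a rough sketch of the one-liner a
library might use under each choice. Both classes and both helper functions
are hypothetical; only the name __fspath__ comes from the discussion above.

```python
class AttrStylePath:
    """Hypothetical path type exposing __fspath__ as a plain attribute."""

    def __init__(self, path):
        self.__fspath__ = path


class MethodStylePath:
    """Hypothetical path type exposing __fspath__ as a method."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        # The string is only built (or returned) when asked for.
        return self._path


def coerce_attr(p):
    # Attribute variant: fall back to the object itself (e.g. a str).
    return getattr(p, "__fspath__", p)


def coerce_method(p):
    # Method variant: the protocol has to be called, not just read.
    return p.__fspath__() if hasattr(p, "__fspath__") else p


attr_result = coerce_attr(AttrStylePath("/tmp/spam"))
method_result = coerce_method(MethodStylePath("/tmp/spam"))
plain_attr = coerce_attr("/tmp/spam")
plain_method = coerce_method("/tmp/spam")
```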


Some people have asked for the pathlib PEP to have a more fleshed out
reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't
want to do it I can try to distill my blog post into a more succinct
paragraph or two and update the PEP myself.

Is this going to require a PEP, or, if we can agree on the points here, are
we just going to do it? If we think it requires a PEP I'm willing to write it,
but I obviously have no issue if we skip that step either. :)

Oh, and we should resolve this before the next release of Python 3.4, 3.5,
or 3.6 so that pathlib can be updated in those releases.

-Brett



From desmoulinmichel at gmail.com  Wed Apr  6 13:35:25 2016
From: desmoulinmichel at gmail.com (Michel Desmoulin)
Date: Wed, 6 Apr 2016 19:35:25 +0200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
Message-ID: <570548DD.7080108@gmail.com>

Wouldn't it be better to generalize that to a "__location__" protocol,
which would allow returning any kind of location, including a path, URL,
coordinate, IP address, etc.?


From wes.turner at gmail.com  Wed Apr  6 13:37:06 2016
From: wes.turner at gmail.com (Wes Turner)
Date: Wed, 6 Apr 2016 12:37:06 -0500
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <570526CE.5080401@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
Message-ID: <CACfEFw_SjhLrrPM-8wk9M73r2Lz5J3MRoiLax51wu_pCbeMdcg@mail.gmail.com>

* +1 for __path__, __fspath__
  (though I don't know what each does)

* why not Text(basestring / bytestring) and pathlib.Path(Text)?
   * are there examples of cases where this cannot be?
      * if not, +1 for subclassing str/Text

      * where are the examples of method collisions between the str
        interface and the pathlib.Path interface?
         * str.__div__ is nonsensical
         * pathlib.Path.__div__ is super-useful



From brett at python.org  Wed Apr  6 13:41:14 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 17:41:14 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <570548DD.7080108@gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <570548DD.7080108@gmail.com>
Message-ID: <CAP1=2W7JJSqTv9gx2ZZztXwsdV4u6gLRNBQknFi4GmZgwxpJew@mail.gmail.com>

On Wed, 6 Apr 2016 at 10:36 Michel Desmoulin <desmoulinmichel at gmail.com>
wrote:

> Wouldn't it be better to generalize that to a "__location__" protocol,
> which would allow returning any kind of location, including a path, URL,
> coordinate, IP address, etc.?
>

No, because all of those things have different semantic meanings. See the
__index__ PEP (PEP 357) for reasons why you would want tightly bound
protocols instead of overloading ones like __int__ for multiple meanings.

-Brett



From brett at python.org  Wed Apr  6 13:46:51 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 17:46:51 +0000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CACfEFw_SjhLrrPM-8wk9M73r2Lz5J3MRoiLax51wu_pCbeMdcg@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CACfEFw_SjhLrrPM-8wk9M73r2Lz5J3MRoiLax51wu_pCbeMdcg@mail.gmail.com>
Message-ID: <CAP1=2W5Gt5DTu9G4i3b2_SVvPejnhQud3w5i88hC76L9zaPGtg@mail.gmail.com>

On Wed, 6 Apr 2016 at 10:41 Wes Turner <wes.turner at gmail.com> wrote:

> * +1 for __path__, __fspath__
>   (though I don't know what each does)
>

Returns a string representing a file system path.


> * why not Text(basestring / bytestring) and pathlib.Path(Text)?
>

See the points about next() vs __next__()


>    * are there examples of cases where this cannot be?
>

I don't understand what you think "cannot be".


>       * if not, +1 for subclassing str/Text
>
>       * where are the examples of method collisions between the str
> interface and the pathlib.Path interface?
>

There aren't any and that's partially why some people wanted the str
subclass to begin with.

Please consider this thread a str-subclass-free zone. This line of
discussion is to flesh out the proposal for a path protocol as a proposal
against subclassing str, not to settle the whole discussion outright. If
you want to continue to debate the subclassing-str side of this please use
the other thread.

-Brett



From ethan at stoneleaf.us  Wed Apr  6 14:05:47 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 11:05:47 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
Message-ID: <57054FFB.5070709@stoneleaf.us>

On 04/06/2016 10:26 AM, Brett Cannon wrote:

> With Ethan volunteering to do the work to help make a path protocol a
> thing -- and I'm willing to help along with propagating this through the
> stdlib where I think Serhiy might be interested in helping as well --
> and a seeming consensus this is a good idea, it seems like this proposal
> has a chance of actually coming to fruition.

Excellent!  Let's proceed along this path ;) until somebody objects.


> Now we need clear details. :) Some open questions are:
>
>  1. Name: __path__, __fspath__, or something else?

__fspath__


>  2. Method or attribute? (changes what kind of one-liner you might use
>     in libraries, but I think historically all protocols have been
>     methods and the serialized string representation might be costly to
>     build)

I would prefer an attribute, but yeah I think dunders are typically 
methods, and I don't see this being special enough to not follow that trend.


>  3. Built-in? (name is dependent on #1 if we add one)

fspath() -- and it would be handy to have a function that returns either
the __fspath__ results, or the string (if it was one), or raises an
exception if neither of the above work out.


>  4. Add the method/attribute to str? (I assume so, much like __index__()
>     is on int, but I have not seen it explicitly stated so I would
>     rather clarify it)

I don't think that's needed.  With Path() and fspath() it's trivial to 
make sure one has what one wants.


>  5. Expand the C API to have something like PyObject_Path()?

No opinion.


> Some people have asked for the pathlib PEP to have a more fleshed out
> reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't
> want to do it I can try to distill my blog post into a more succinct
> paragraph or two and update the PEP myself.

Nice.


> Is this going to require a PEP or if we can agree on the points here are
> we just going to do it? If we think it requires a PEP I'm willing to
> write it, but I obviously have no issue if we skip that step either. :)

If there are no (serious?) objections I don't think a PEP is needed.


> Oh, and we should resolve this before the next release of Python 3.4,
> 3.5, or 3.6 so that pathlib can be updated in those releases.

Agreed.

--
~Ethan~


From brett at python.org  Wed Apr  6 14:32:07 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 18:32:07 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <57054FFB.5070709@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
Message-ID: <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>

On Wed, 6 Apr 2016 at 11:06 Ethan Furman <ethan at stoneleaf.us> wrote:

> On 04/06/2016 10:26 AM, Brett Cannon wrote:
>
> > With Ethan volunteering to do the work to help make a path protocol a
> > thing -- and I'm willing to help along with propagating this through the
> > stdlib where I think Serhiy might be interested in helping as well --
> > and a seeming consensus this is a good idea, it seems like this proposal
> > has a chance of actually coming to fruition.
>
> Excellent!  Let's proceed along this path ;) until somebody objects.
>
>
> > Now we need clear details. :) Some open questions are:
> >
> >  1. Name: __path__, __fspath__, or something else?
>
> __fspath__
>

+1 for __path__, +0 for __fspath__ (I don't know how widespread the notion
that "fs" means "file system" is).


>
>
> >  2. Method or attribute? (changes what kind of one-liner you might use
> >     in libraries, but I think historically all protocols have been
> >     methods and the serialized string representation might be costly to
> >     build)
>
> I would prefer an attribute, but yeah I think dunders are typically
> methods, and I don't see this being special enough to not follow that
> trend.
>

Depends on what we want to tell 3rd-party libraries to do to support
pathlib if they are on 3.3 or if they are worried about people using Python
3.4.2 or 3.5.1. An attribute still works with `getattr(path, '__path__',
path)`. But with a method you probably want either `path.__path__() if
hasattr(path, '__path__') else path` or `getattr(path, '__path__', lambda:
path)()`.
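
A minimal sketch of the three one-liners above, for comparison. The classes
`AttrPath` and `MethodPath` and the three helper function names are
hypothetical stand-ins invented for illustration; they are not part of
pathlib, and `__path__` is only one of the names still under discussion.

```python
class AttrPath:
    """Hypothetical path type exposing the protocol as an attribute."""

    def __init__(self, raw):
        self.__path__ = raw  # dunder names are not name-mangled


class MethodPath:
    """Hypothetical path type exposing the protocol as a method."""

    def __init__(self, raw):
        self._raw = raw

    def __path__(self):
        return self._raw


def from_attribute(path):
    # Attribute protocol: fall back to the argument itself.
    return getattr(path, '__path__', path)


def from_method_hasattr(path):
    # Method protocol, spelled with hasattr().
    return path.__path__() if hasattr(path, '__path__') else path


def from_method_lambda(path):
    # Method protocol, spelled with a default callable.
    return getattr(path, '__path__', lambda: path)()


print(from_attribute(AttrPath('/tmp/a')), from_attribute('/tmp/a'))
print(from_method_hasattr(MethodPath('/tmp/b')), from_method_hasattr('/tmp/b'))
print(from_method_lambda(MethodPath('/tmp/c')), from_method_lambda('/tmp/c'))
```

All three helpers pass a plain str through unchanged, which is what lets a
library accept both strings and path objects; the attribute form is simply
the shortest to write.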


>
>
> >  3. Built-in? (name is dependent on #1 if we add one)
>
> fspath() -- and it would be handy to have a function that returns either
> the __fspath__ results, or the string (if it was one), or raises an
> exception if neither of the above work out.
>

So:

  # Attribute
  def fspath(path):
      if hasattr(path, '__path__'):
          return path.__path__
      if isinstance(path, str):
          return path
      raise NotImplementedError  # Or TypeError?

  # Method
  def fspath(path):
      try:
          return path.__path__()
      except AttributeError:
          if isinstance(path, str):
              return path
      raise TypeError  # Or NotImplementedError?

Or you can drop the isinstance() check and simply check for the
attribute/method and use it and otherwise return `path` and let the code's
duck-typing of str handle catching an unexpected type for a path. At which
point the built-in becomes whatever idiom we promote for pathlib usage that
pre-dates this protocol.
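
To make the trade-off concrete, here is a rough sketch of the method variant
with and without the isinstance() check. `DemoPath`, `fspath_strict`, and
`fspath_lenient` are hypothetical names used only for this illustration, and
the `__path__` spelling is an assumption carried over from the examples above.

```python
class DemoPath:
    """Hypothetical path type implementing the method form of the protocol."""

    def __init__(self, raw):
        self._raw = raw

    def __path__(self):
        return self._raw


def fspath_strict(path):
    """Keep the isinstance() check: anything that is not a path raises here."""
    if hasattr(path, '__path__'):
        return path.__path__()
    if isinstance(path, str):
        return path
    raise TypeError('expected str or path object, got ' + type(path).__name__)


def fspath_lenient(path):
    """Drop the isinstance() check: unknown objects are returned unchanged."""
    if hasattr(path, '__path__'):
        return path.__path__()
    return path


print(fspath_strict(DemoPath('/etc/hosts')))
print(fspath_lenient(42))  # no error here; later str handling has to catch it
```

With the lenient form the error for a bad argument surfaces wherever the
value is first used as a string, which may be far from the call that
introduced it.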



>
> >  4. Add the method/attribute to str? (I assume so, much like __index__()
> >     is on int, but I have not seen it explicitly stated so I would
> >     rather clarify it)
>
> I don't think that's needed.  With Path() and fspath() it's trivial to
> make sure one has what one wants.
>

If we add str.__fspath__ then the function becomes:

  def fspath(path):
      return path.__fspath__()

Which might be too simplistic for a built-in, but that also means adding it
on str would potentially negate the need for a built-in.
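
A small sketch of what that would look like in practice. Since str itself
cannot be patched from Python code, the subclass `FSStr` below is a
hypothetical stand-in for "a str that has `__fspath__`", and `DemoPath` is an
equally hypothetical path type.

```python
class FSStr(str):
    """Hypothetical stand-in for a str that implements the protocol."""

    def __fspath__(self):
        return str(self)


class DemoPath:
    """Hypothetical path type implementing the protocol as a method."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


def fspath(path):
    # With the method present on both types, no fallback logic is needed.
    return path.__fspath__()


print(fspath(FSStr('a/b')))
print(fspath(DemoPath('c/d')))

try:
    fspath(3)
except AttributeError as exc:
    # Objects without the method fail with AttributeError, not TypeError.
    print('rejected:', exc)
```

One consequence of this design is that a non-path argument is reported as a
missing attribute rather than as a type error, unless the function adds its
own check.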


>
>
> >  5. Expand the C API to have something like PyObject_Path()?
>
> No opinion.
>

If we add a built-in then I say we add an equivalent function in the C API.

-Brett


>
>
> > Some people have asked for the pathlib PEP to have a more fleshed out
> > reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't
> > want to do it I can try to distill my blog post into a more succinct
> > paragraph or two and update the PEP myself.
>
> Nice.
>
>
> > Is this going to require a PEP or if we can agree on the points here are
> > we just going to do it? If we think it requires a PEP I'm willing to
> > write it, but I obviously have no issue if we skip that step either. :)
>
> If there are no (serious?) objections I don't think a PEP is needed.
>
>
> > Oh, and we should resolve this before the next release of Python 3.4,
> > 3.5, or 3.6 so that pathlib can be updated in those releases.
>
> Agreed.
>
> --
> ~Ethan~
>

From ethan at stoneleaf.us  Wed Apr  6 14:54:08 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 11:54:08 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
Message-ID: <57055B50.2030209@stoneleaf.us>

On 04/06/2016 11:32 AM, Brett Cannon wrote:
> On Wed, 6 Apr 2016 at 11:06 Ethan Furman wrote:
>> On 04/06/2016 10:26 AM, Brett Cannon wrote:

>>> Now we need clear details. :) Some open questions are:
>>>
>>>  1. Name: __path__, __fspath__, or something else?
>>
>> __fspath__
>
> +1 for __path__, +0 for __fspath__ (I don't know how widespread the
> notion that "fs" means "file system" is).

Maybe __os_path__ then?  I would rather be explicit about the type of 
path we are dealing with -- who knows if we won't have __url_path__ in 
the future (besides Guido, of course ;)


>    def fspath(path):
>        try:
>            return path.__path__()
>        except AttributeError:
>            if isinstance(path, str):
>                return path
>        raise TypeError  # Or NotImplementedError?
>
> Or you can drop the isinstance() check and [...]

If the purpose of fspath() is to return a usable path-as-string then we 
should raise if unable to do it.

> If we add str.__fspath__ then the function becomes:
>
>    def fspath(path):
>        return path.__fspath__()
>
> Which might be too simplistic for a built-in, but that also means adding
> it on str would potentially negate the need for a built-in.

That is an attractive option.

--
~Ethan~

From alexander.belopolsky at gmail.com  Wed Apr  6 15:02:35 2016
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 6 Apr 2016 15:02:35 -0400
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
Message-ID: <CAP7h-xYBFObhQqhmUT+hP9Td9_SCgggmofSYUqcFvLaWmCp-oQ@mail.gmail.com>

On Wed, Apr 6, 2016 at 2:32 PM, Brett Cannon <brett at python.org> wrote:

> +1 for __path__, +0 for __fspath__ (I don't know how widespread the notion
> that "fs" means "file system" is).


Same here.  In the good old days, "fs" stood for a "Font Server."  And in
even older (and better?) days, FS was a "Field Separator."

From rosuav at gmail.com  Wed Apr  6 15:18:06 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 7 Apr 2016 05:18:06 +1000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <57055B50.2030209@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <57055B50.2030209@stoneleaf.us>
Message-ID: <CAPTjJmrY2XpyXqx2dFiOunSW9EbpUOU8j_zKYtoYH0WeO7c43w@mail.gmail.com>

On Thu, Apr 7, 2016 at 4:54 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> Maybe __os_path__ then?  I would rather be explicit about the type of path
> we are dealing with -- who knows if we won't have __url_path__ in the future
> (besides Guido, of course ;)
>

Bikeshedding furiously... I don't like os_path here as it's too
similar to os.path; unless that's deliberate?

ChrisA

From ericfahlgren at gmail.com  Wed Apr  6 15:28:02 2016
From: ericfahlgren at gmail.com (Eric Fahlgren)
Date: Wed, 6 Apr 2016 12:28:02 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <20160406143909.GJ12526@ando.pearwood.info>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <5704D738.4070507@gmail.com> <20160406143909.GJ12526@ando.pearwood.info>
Message-ID: <01d601d1903a$70a1f460$51e5dd20$@gmail.com>

On Wednesday, April 06, 2016 07:39,  Steven D'Aprano wrote:
> > How well does that apply to path/__path__?
> 
> I think it's potentially the same. Possibly there are fewer existing uses
> of "obj.path" out there which conflict with this use, but there's at least
> one in the std lib: sys.path.

Somewhat ironically, also os.

>>> import os.path
>>> getattr(os, "path")
<module 'ntpath' from 'C:\\Python35\\lib\\ntpath.py'>


From rymg19 at gmail.com  Wed Apr  6 15:29:51 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Wed, 6 Apr 2016 14:29:51 -0500
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
Message-ID: <CAO41-mOdyDH-LK-gsoh=fq2DWR9Ls7wmGPtcnBZ=hbns7jNdcA@mail.gmail.com>

--
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something's wrong.
http://kirbyfan64.github.io/
On Apr 6, 2016 12:28 PM, "Brett Cannon" <brett at python.org> wrote:
>
> With Ethan volunteering to do the work to help make a path protocol a
> thing -- and I'm willing to help along with propagating this through the
> stdlib where I think Serhiy might be interested in helping as well -- and a
> seeming consensus this is a good idea, it seems like this proposal has a
> chance of actually coming to fruition.
>
> Now we need clear details. :) Some open questions are:

My votes:

> Name: __path__, __fspath__, or something else?

__path__. Considering everything related to `pathlib` uses the word `path`,
__fspath__ seems kind of odd.

> Method or attribute? (changes what kind of one-liner you might use in
> libraries, but I think historically all protocols have been methods and the
> serialized string representation might be costly to build)

Method. Using an attribute would be needlessly inconsistent.

> Built-in? (name is dependent on #1 if we add one)
> Add the method/attribute to str? (I assume so, much like __index__() is
> on int, but I have not seen it explicitly stated so I would rather clarify
> it)

I agree; this would avoid lots of excess complexity.

> Expand the C API to have something like PyObject_Path()?

-1. PyFileObject was already removed from Python 3; it seems useless to add
another one.

>
> Some people have asked for the pathlib PEP to have a more fleshed out
> reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't
> want to do it I can try to distill my blog post into a more succinct
> paragraph or two and update the PEP myself.
>
> Is this going to require a PEP or if we can agree on the points here are
> we just going to do it? If we think it requires a PEP I'm willing to write
> it, but I obviously have no issue if we skip that step either. :)
>
> Oh, and we should resolve this before the next release of Python 3.4,
> 3.5, or 3.6 so that pathlib can be updated in those releases.
>
> -Brett
>
>
> On Wed, 6 Apr 2016 at 08:09 Ethan Furman <ethan at stoneleaf.us> wrote:
>>
>> On 04/05/2016 11:57 PM, Nick Coghlan wrote:
>> > On 6 April 2016 at 16:53, Nathaniel Smith <njs at pobox.com> wrote:
>> >> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> >>> I'd missed the existing precedent in DirEntry.path, so simply taking
>> >>> that and running with it sounds good to me.
>> >>
>> >> This makes me twitch slightly, because NumPy has had a whole set of
>> >> problems due to the ancient and minimally-considered decision to
>> >> assume a bunch of ad hoc non-namespaced method names fulfilled some
>> >> protocol -- like all .sum methods will have a signature that's
>> >> compatible with numpy's, and if an object has a .log method then
>> >> surely that computes the logarithm (what else in computing could "log"
>> >> possibly refer to?), etc. This experience may or may not be relevant,
>> >> I'm not sure -- sometimes these kinds of twitches are good guides to
>> >> intuition, and sometimes they are just knee-jerk responses to an old
>> >> and irrelevant problem :-)
>> >>
>> >> But you might want to at least think about
>> >> how common it might be to have existing objects with unrelated
>> >> attributes that happen to be called "path", and the bizarro problems
>> >> that might be caused if someone accidentally passes one of them to a
>> >> function that expects all .path attributes to be instances of this new
>> >> protocol.
>> >
>> > sys.path, for example.
>> >
>> > That's why I'd actually prefer the implicit conversion protocol to be
>> > the more explicitly named "__fspath__", with suitable "__fspath__ =
>> > path" assignments added to DirEntry and pathlib. However, I'm also not
>> > offering to actually *do* the work here, and the casting vote goes to
>> > the folks pursuing the implementation effort.
>>
>> If we decide upon __fspath__ (or __path__) I will do the work on pathlib
>> and scandir to add those attributes.
>
>

From p.f.moore at gmail.com  Wed Apr  6 15:32:39 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 6 Apr 2016 20:32:39 +0100
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
Message-ID: <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>

On 6 April 2016 at 19:32, Brett Cannon <brett at python.org> wrote:
>> > Now we need clear details. :) Some open questions are:
>> >
>> >  1. Name: __path__, __fspath__, or something else?
>>
>> __fspath__
>
> +1 for __path__, +0 for __fspath__ (I don't know how widespread the notion
> that "fs" means "file system" is).

Agreed. But if we have a builtin, it should follow the name of the
special attribute/method. And I'm not that keen on having a builtin
with a generic name like 'path'.

>> >  2. Method or attribute? (changes what kind of one-liner you might use
>> >     in libraries, but I think historically all protocols have been
>> >     methods and the serialized string representation might be costly to
>> >     build)
>>
>> I would prefer an attribute, but yeah I think dunders are typically
>> methods, and I don't see this being special enough to not follow that
>> trend.
>
> Depends on what we want to tell 3rd-party libraries to do to support pathlib
> if they are on 3.3 or if they are worried about people using Python 3.4.2 or
> 3.5.1. An attribute still works with `getattr(path, '__path__', path)`. But
> with a method you probably want either `path.__path__() if hasattr(path,
> '__path__') else path` or `getattr(path, '__path__', lambda: path)()`.

I'm a little confused by this. To support the older pathlib, they have
to do patharg = str(patharg), because *none* of the proposed
attributes (path or __path__) will exist.

The getattr trick is needed to support the *new* pathlib, when you
need a real string. Currently you need a string if you call stdlib
functions or builtins. If we fix the stdlib/builtins, the need goes
away for those cases, but remains if you need to call libraries that
*don't* support pathlib (os.path will likely be one of those) or do
direct string manipulation.

In practice, I see the getattr trick as an "easy fix" for libraries
that want to add support but in a minimally-intrusive way. On that
basis, making the trick easy to use is important, which argues for an
attribute.

>> >  3. Built-in? (name is dependent on #1 if we add one)
>>
>> fspath() -- and it would be handy to have a function that returns either
>> the __fspath__ results, or the string (if it was one), or raises an
>> exception if neither of the above work out.

fspath regardless of the name chosen in #1 - a new builtin called path
just has too much likelihood of clashing with user code.

But I'm not sure we need a builtin. I'm not at all clear how
frequently we expect user code to need to use this protocol. Users
can't use the builtin if they want to be backward compatible. But code
that doesn't need backward compatibility can probably just work with
pathlib (and the stdlib support for it) directly. For display, the
implicit conversion to str is fine. For "get me a string representing
the path", is the "path" attribute being abandoned in favour of this
special method? I'm inclined to think that if you are writing "pure
pathlib" code, pathobj.path looks more readable than fspath(pathobj) -
certainly no *less* readable.

But I'm not one of the people who disliked using .path, so I'm
probably not best placed to judge. It would be good if someone who
*does* feel strongly could explain why fspath(pathobj) is better than
pathobj.path.

> So:
>
>   # Attribute
>   def fspath(path):
>       if hasattr(path, '__path__'):
>           return path.__path__
>       if isinstance(path, str):
>           return path
>       raise NotImplementedError  # Or TypeError?
>
>   # Method
>   def fspath(path):
>       try:
>           return path.__path__()
>       except AttributeError:
>           if isinstance(path, str):
>               return path
>       raise TypeError  # Or NotImplementedError?

You could of course use try/except for the attribute case. Or hasattr
for the method case, where it would avoid masking AttributeError
exceptions raised within the dunder method call (a possibility if user
classes implement their own version of the protocol).
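
The masking problem can be shown with a short sketch. `BuggyPath` is a
hypothetical user class whose protocol method contains an unrelated bug, and
`fspath_try` / `fspath_hasattr` are illustrative names for the two spellings
of the method variant quoted above.

```python
class BuggyPath:
    """Hypothetical user class whose protocol method has an internal bug."""

    def __path__(self):
        return self.missing_attribute  # raises AttributeError inside the method


def fspath_try(path):
    # try/except spelling: also swallows AttributeErrors raised *inside* __path__().
    try:
        return path.__path__()
    except AttributeError:
        if isinstance(path, str):
            return path
    raise TypeError('not a path: %r' % (path,))


def fspath_hasattr(path):
    # hasattr spelling: only the lookup is guarded, so inner errors propagate.
    if hasattr(path, '__path__'):
        return path.__path__()
    if isinstance(path, str):
        return path
    raise TypeError('not a path: %r' % (path,))


for func in (fspath_try, fspath_hasattr):
    try:
        func(BuggyPath())
    except Exception as exc:
        print(func.__name__, 'raised', type(exc).__name__)
```

In the try/except spelling the caller sees a misleading TypeError, while the
hasattr spelling reports the real AttributeError from the buggy method.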

Paul

From phd at phdru.name  Wed Apr  6 15:26:42 2016
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 6 Apr 2016 21:26:42 +0200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <57055B50.2030209@stoneleaf.us>
References: <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <57055B50.2030209@stoneleaf.us>
Message-ID: <20160406192642.GA11074@phdru.name>

On Wed, Apr 06, 2016 at 11:54:08AM -0700, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/06/2016 11:32 AM, Brett Cannon wrote:
> >On Wed, 6 Apr 2016 at 11:06 Ethan Furman wrote:
> >>On 04/06/2016 10:26 AM, Brett Cannon wrote:
> 
> >>>Now we need clear details. :) Some open questions are:
> >>>
> >>> 1. Name: __path__, __fspath__, or something else?
> >>
> >>__fspath__
> >
> >+1 for __path__, +0 for __fspath__ (I don't know how widespread the
> >notion that "fs" means "file system" is).
> 
> Maybe __os_path__ then?  I would rather be explicit about the type of path
> we are dealing with -- who knows if we won't have __url_path__ in the future
> (besides Guido, of course ;)

   __pathstr__? __urlstr__?

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From barry at python.org  Wed Apr  6 15:33:30 2016
From: barry at python.org (Barry Warsaw)
Date: Wed, 6 Apr 2016 15:33:30 -0400
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <5704909E.8070908@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <5704909E.8070908@stoneleaf.us>
Message-ID: <20160406153330.024a66a5@subdivisions.wooz.org>

On Apr 05, 2016, at 09:29 PM, Ethan Furman wrote:

>We should either remove it or make the rest of the stdlib work with it.
>Currently, pathlib.*Paths are second-class citizens, and working with them is
>not significantly better than working with os.path.* simply because we have
>to cast to str every time we want to deal with any other part of the stdlib.

This.  I've tried to use them in a couple of projects and in many ways pathlib
objects are nice to work with.  But rarely can they be used exclusively.
There are just too many other packages and APIs that use os.path and the two
do not interoperate very well.  That makes practical use of pathlib objects
just too unwieldy for project-wide adoption.

I don't know if inheriting them from str would fix this problem.  I'm +0 on
removing the provisional status of pathlib and in trying to figure out ways
for them to work better with other libraries (both stdlib and 3rd party) that
will continue to be os.path based for the foreseeable future.

Cheers,
-Barry

From brett at python.org  Wed Apr  6 15:31:17 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 19:31:17 +0000
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CAO41-mOdyDH-LK-gsoh=fq2DWR9Ls7wmGPtcnBZ=hbns7jNdcA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <CAO41-mOdyDH-LK-gsoh=fq2DWR9Ls7wmGPtcnBZ=hbns7jNdcA@mail.gmail.com>
Message-ID: <CAP1=2W6a7UU6XfL3PRi9nEVO15zNCbAd_0TWzQKevvCeiWddCQ@mail.gmail.com>

On Wed, 6 Apr 2016 at 12:29 Ryan Gonzalez <rymg19 at gmail.com> wrote:

> --
> Ryan
> [ERROR]: Your autotools build scripts are 200 lines longer than your
> program. Something's wrong.
> http://kirbyfan64.github.io/
>
>
> On Apr 6, 2016 12:28 PM, "Brett Cannon" <brett at python.org> wrote:
> >
> > With Ethan volunteering to do the work to help make a path protocol a
> > thing -- and I'm willing to help along with propagating this through the
> > stdlib where I think Serhiy might be interested in helping as well -- and a
> > seeming consensus this is a good idea, it seems like this proposal has a
> > chance of actually coming to fruition.
> >
> > Now we need clear details. :) Some open questions are:
>
> My votes:
>
> > Name: __path__, __fspath__, or something else?
>
> __path__. Considering everything related to `pathlib` uses the word
> `path`, __fspath__ seems kind of odd.
>
> > Method or attribute? (changes what kind of one-liner you might use in
> > libraries, but I think historically all protocols have been methods and the
> > serialized string representation might be costly to build)
>
> Method. Using an attribute would be needlessly inconsistent.
>
> > Built-in? (name is dependent on #1 if we add one)
> > Add the method/attribute to str? (I assume so, much like __index__() is
> > on int, but I have not seen it explicitly stated so I would rather clarify
> > it)
>
> I agree; this would avoid lots of excess complexity.
>
> > Expand the C API to have something like PyObject_Path()?
>
> -1. PyFileObject was already removed from Python 3; it seems useless to
> add another one.
>

But that was removing a custom object, not a function that will implement
whatever idiom we come up with for getting the string representation of a
path.

-Brett


> >
> > Some people have asked for the pathlib PEP to have a more fleshed out
> reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't
> want to do it I can try to distill my blog post into a more succinct
> paragraph or two and update the PEP myself.
> >
> > Is this going to require a PEP or if we can agree on the points here are
> we just going to do it? If we think it requires a PEP I'm willing to write
> it, but I obviously have no issue if we skip that step either. :)
> >
> > Oh, and we should resolve this before the next release of Python 3.4,
> 3.5, or 3.6 so that pathlib can be updated in those releases.
> >
> > -Brett
> >
> >
> > On Wed, 6 Apr 2016 at 08:09 Ethan Furman <ethan at stoneleaf.us> wrote:
> >>
> >> On 04/05/2016 11:57 PM, Nick Coghlan wrote:
> >> > On 6 April 2016 at 16:53, Nathaniel Smith <njs at pobox.com> wrote:
> >> >> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan <ncoghlan at gmail.com>
> wrote:
> >>
> >> >>> I'd missed the existing precedent in DirEntry.path, so simply taking
> >> >>> that and running with it sounds good to me.
> >> >>
> >> >> This makes me twitch slightly, because NumPy has had a whole set of
> >> >> problems due to the ancient and minimally-considered decision to
> >> >> assume a bunch of ad hoc non-namespaced method names fulfilled some
> >> >> protocol -- like all .sum methods will have a signature that's
> >> >> compatible with numpy's, and if an object has a .log method then
> >> >> surely that computes the logarithm (what else in computing could
> "log"
> >> >> possibly refer to?), etc. This experience may or may not be relevant,
> >> >> I'm not sure -- sometimes these kinds of twitches are good guides to
> >> >> intuition, and sometimes they are just knee-jerk responses to an old
> >> >> and irrelevant problem :-)
> >> >>
> >> >> But you might want to at least think about
> >> >> how common it might be to have existing objects with unrelated
> >> >> attributes that happen to be called "path", and the bizarro problems
> >> >> that might be caused if someone accidentally passes one of them to a
> >> >> function that expects all .path attributes to be instances of this
> new
> >> >> protocol.
> >> >
> >> > sys.path, for example.
> >> >
> >> > That's why I'd actually prefer the implicit conversion protocol to be
> >> > the more explicitly named "__fspath__", with suitable "__fspath__ =
> >> > path" assignments added to DirEntry and pathlib. However, I'm also not
> >> > offering to actually *do* the work here, and the casting vote goes to
> >> > the folks pursuing the implementation effort.
> >>
> >> If we decide upon __fspath__ (or __path__) I will do the work on pathlib
> >> and scandir to add those attributes.
> >
> >
>
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev at python.org
> > https://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com
> >
>

From brett at python.org  Wed Apr  6 15:39:12 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 19:39:12 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
Message-ID: <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>

On Wed, 6 Apr 2016 at 12:32 Paul Moore <p.f.moore at gmail.com> wrote:

> On 6 April 2016 at 19:32, Brett Cannon <brett at python.org> wrote:
> >> > Now we need clear details. :) Some open questions are:
> >> >
> >> >  1. Name: __path__, __fspath__, or something else?
> >>
> >> __fspath__
> >
> > +1 for __path__, +0 for __fspath__ (I don't know how widespread the
> notion
> > that "fs" means "file system" is).
>
> Agreed. But if we have a builtin, it should follow the name of the
> special attribute/method. And I'm not that keen on having a builtin
> with a generic name like 'path'.
>
> >> >  2. Method or attribute? (changes what kind of one-liner you might use
> >> >     in libraries, but I think historically all protocols have been
> >> >     methods and the serialized string representation might be costly
> to
> >> >     build)
> >>
> >> I would prefer an attribute, but yeah I think dunders are typically
> >> methods, and I don't see this being special enough to not follow that
> >> trend.
> >
> > Depends on what we want to tell 3rd-party libraries to do to support
> pathlib
> > if they are on 3.3 or if they are worried about people using Python
> 3.4.2 or
> > 3.5.1. An attribute still works with `getattr(path, '__path__', path)`.
> But
> > with a method you probably want either `path.__path__() if hasattr(path,
> > '__path__') else path` or `getattr(path, '__path__', lambda: path)()`.
>
> I'm a little confused by this. To support the older pathlib, they have
> to do patharg = str(patharg), because *none* of the proposed
> attributes (path or __path__) will exist.
>
> The getattr trick is needed to support the *new* pathlib, when you
> need a real string. Currently you need a string if you call stdlib
> functions or builtins. If we fix the stdlib/builtins, the need goes
> away for those cases, but remains if you need to call libraries that
> *don't* support pathlib (os.path will likely be one of those) or do
> direct string manipulation.
>
> In practice, I see the getattr trick as an "easy fix" for libraries
> that want to add support but in a minimally-intrusive way. On that
> basis, making the trick easy to use is important, which argues for an
> attribute.
>

So then where's the confusion? :) You seem to get the points. I personally
find `path.__path__() if hasattr(path, '__path__') else path` also readable
(if obviously a bit longer).
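
For illustration, a minimal sketch of the one-liners being compared. The
classes AttrPath and MethodPath are hypothetical stand-ins (nothing in
pathlib looks like this today), and `__path__` is only one of the candidate
names still under discussion:

```python
# Hypothetical stand-ins for a path object under each variant of the protocol.
class AttrPath:
    def __init__(self, raw):
        self.__path__ = raw          # protocol exposed as an attribute


class MethodPath:
    def __init__(self, raw):
        self._raw = raw

    def __path__(self):              # protocol exposed as a method
        return self._raw


def from_attribute(path):
    # Attribute variant: one getattr with the object itself as the default.
    return getattr(path, '__path__', path)


def from_method_hasattr(path):
    # Method variant, spelled as a conditional expression.
    return path.__path__() if hasattr(path, '__path__') else path


def from_method_getattr(path):
    # Method variant, spelled as getattr with a lambda default.
    return getattr(path, '__path__', lambda: path)()
```

In all three spellings a plain str falls through unchanged, because str has
no `__path__` attribute.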

-Brett


>
> >> >  3. Built-in? (name is dependent on #1 if we add one)
> >>
> >> fspath() -- and it would be handy to have a function that returns either
> >> the __fspath__ results, or the string (if it was one), or raise an
> >> exception if neither of the above work out.
>
> fspath regardless of the name chosen in #1 - a new builtin called path
> just has too much likelihood of clashing with user code.
>
> But I'm not sure we need a builtin. I'm not at all clear how
> frequently we expect user code to need to use this protocol. Users
> can't use the builtin if they want to be backward compatible. But code
> that doesn't need backward compatibility can probably just work with
> pathlib (and the stdlib support for it) directly. For display, the
> implicit conversion to str is fine. For "get me a string representing
> the path", is the "path" attribute being abandoned in favour of this
> special method?


Yes.


> I'm inclined to think that if you are writing "pure
> pathlib" code, pathobj.path looks more readable than fspath(pathobj) -
> certainly no *less* readable.
>

I don't know what you mean by "pure pathlib". You mean code that only
works with pathlib objects? Or do you mean code that accepts pathlib
objects but uses strings internally?

-Brett


>
> But I'm not one of the people who disliked using .path, so I'm
> probably not best placed to judge. It would be good if someone who
> *does* feel strongly could explain why fspath(pathobj) is better than
> pathobj.path.
>


>
> > So:
> >
> >   # Attribute
> >   def fspath(path):
> >       if hasattr(path, '__path__'):
> >           return path.__path__
> >       if isinstance(path, str):
> >           return path
> >       raise NotImplementedError  # Or TypeError?
> >
> >   # Method
> >   def fspath(path):
> >       try:
> >           return path.__path__()
> >       except AttributeError:
> >           if isinstance(path, str):
> >               return path
> >       raise TypeError  # Or NotImplementedError?
>
> You could of course use try/except for the attribute case. Or hasattr
> for the method case (where it would avoid masking AttributeError
> exceptions raised within the dunder method call (a possibility if user
> classes implement their own version of the protocol)).
>
> Paul
>
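
To make the masking concern concrete, here is a rough sketch under stated
assumptions: BuggyPath is a hypothetical user class, `__path__` is just one
of the candidate names, and the two helpers are illustrative spellings of
the method variant rather than a settled implementation:

```python
class BuggyPath:
    """Hypothetical user class whose protocol method contains a bug."""

    def __path__(self):
        # Accessing an attribute that does not exist raises AttributeError
        # from *inside* the dunder method.
        return self.missing_attribute


def fspath_try(path):
    # try/except spelling: cannot tell "object has no __path__" apart from
    # "AttributeError raised while __path__() was running".
    try:
        return path.__path__()
    except AttributeError:
        if isinstance(path, str):
            return path
    raise TypeError('expected str or path object, got %s'
                    % type(path).__name__)


def fspath_hasattr(path):
    # hasattr spelling: errors raised inside __path__() propagate unchanged.
    if hasattr(path, '__path__'):
        return path.__path__()
    if isinstance(path, str):
        return path
    raise TypeError('expected str or path object, got %s'
                    % type(path).__name__)
```

With BuggyPath, fspath_try reports a misleading TypeError, while
fspath_hasattr lets the original AttributeError surface.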

From brett at python.org  Wed Apr  6 15:40:16 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 19:40:16 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <20160406192642.GA11074@phdru.name>
References: <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <57055B50.2030209@stoneleaf.us> <20160406192642.GA11074@phdru.name>
Message-ID: <CAP1=2W7=shXrHvrckBPP3rVR+bJCaMK-6My7WnegR8bZjww86g@mail.gmail.com>

On Wed, 6 Apr 2016 at 12:38 Oleg Broytman <phd at phdru.name> wrote:

> On Wed, Apr 06, 2016 at 11:54:08AM -0700, Ethan Furman <ethan at stoneleaf.us>
> wrote:
> > On 04/06/2016 11:32 AM, Brett Cannon wrote:
> > >On Wed, 6 Apr 2016 at 11:06 Ethan Furman wrote:
> > >>On 04/06/2016 10:26 AM, Brett Cannon wrote:
> >
> > >>>Now we need clear details. :) Some open questions are:
> > >>>
> > >>> 1. Name: __path__, __fspath__, or something else?
> > >>
> > >>__fspath__
> > >
> > >+1 for __path__, +0 for __fspath__ (I don't know how widespread the
> > >notion that "fs" means "file system" is).
> >
> > Maybe __os_path__ then?  I would rather be explicit about the type of
> path
> > we are dealing with -- who knows if we won't have __url_path__ in the
> future
> > (besides Guido, of course ;)
>
>    __pathstr__? __urlstr__?
>

But we didn't call it __indexint__ either. No need to embed the type in the
name.
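
As a reminder of how the existing `__index__` protocol behaves, a small
sketch (the class Slot is made up for illustration):

```python
import operator


class Slot:
    """Hypothetical class usable wherever an integer index is required."""

    def __init__(self, n):
        self._n = n

    def __index__(self):
        # The method name says what the value is for, not what type it has.
        return self._n


letters = ['a', 'b', 'c']
second = letters[Slot(1)]          # list indexing consults __index__
as_int = operator.index(Slot(2))   # explicit use of the same protocol
```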

-Brett


>
> Oleg.
> --
>      Oleg Broytman            http://phdru.name/            phd at phdru.name
>            Programmers don't die, they just GOSUB without RETURN.

From barry at python.org  Wed Apr  6 15:43:34 2016
From: barry at python.org (Barry Warsaw)
Date: Wed, 6 Apr 2016 15:43:34 -0400
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
Message-ID: <20160406154334.058182b6@subdivisions.wooz.org>

On Apr 06, 2016, at 12:44 PM, Nick Coghlan wrote:

>The next challenge would then be to make a list of APIs to be updated
>for 3.6 to implicitly accept "rich path" objects via the agreed
>convention, with pathlib.PurePath used as a test class:
>
>* open()
>* codecs.open() (et al)
>* io.*
>* os.path.*
>* other os functions
>* shutil.*
>* tempfile.*
>* shelve.*
>* csv.*

Aside from the name of the attribute (though I'm partial to __path__), I think
this would go a long way toward making path objects nicer to work with.  And
right, it doesn't have to be 100% but this would be a big improvement.

Cheers,
-Barry

From ethan at stoneleaf.us  Wed Apr  6 16:07:54 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 13:07:54 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>	<CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>	<57044567.6070308@sdamon.com>	<CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>	<CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>	<CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>	<ne28fo$flu$1@ger.gmane.org>	<CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>	<CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>	<CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>	<570526CE.5080401@stoneleaf.us>	<CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>	<57054FFB.5070709@stoneleaf.us>	<CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
Message-ID: <57056C9A.5090105@stoneleaf.us>

On 04/06/2016 12:32 PM, Paul Moore wrote:

> But I'm not one of the people who disliked using .path, so I'm
> probably not best placed to judge. It would be good if someone who
> *does* feel strongly could explain why fspath(pathobj) is better than
> pathobj.path.

fspath() would be useful because you can pass it a str or a Path and get 
a str back (or an exception if you pass the wrong thing in).

Just like with Path you can pass a str or a Path and get a Path back (or an
exception if ...).
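
A minimal sketch of that symmetry. The fspath() function here is a
hypothetical helper, not an existing builtin, and since the dunder name is
still undecided it simply checks for pathlib.PurePath instead of a protocol
method:

```python
import pathlib


def fspath(path):
    # Hypothetical helper: str in or path object in, str out.
    if isinstance(path, str):
        return path
    if isinstance(path, pathlib.PurePath):
        return str(path)
    raise TypeError('expected str or path object, got %s'
                    % type(path).__name__)


# The Path constructors already behave this way in the other direction.
from_str = pathlib.PurePosixPath('home/ethan')
from_path = pathlib.PurePosixPath(from_str)
```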

--
~Ethan~

From ethan at stoneleaf.us  Wed Apr  6 16:09:19 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 13:09:19 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAPTjJmrY2XpyXqx2dFiOunSW9EbpUOU8j_zKYtoYH0WeO7c43w@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <57055B50.2030209@stoneleaf.us>
 <CAPTjJmrY2XpyXqx2dFiOunSW9EbpUOU8j_zKYtoYH0WeO7c43w@mail.gmail.com>
Message-ID: <57056CEF.8010404@stoneleaf.us>

On 04/06/2016 12:18 PM, Chris Angelico wrote:
> On Thu, Apr 7, 2016 at 4:54 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
>> Maybe __os_path__ then?  I would rather be explicit about the type of path
>> we are dealing with -- who knows if we won't have __url_path__ in the future
>> (besides Guido, of course ;)
>>
>
> Bikeshedding furiously... I don't like os_path here as it's too
> similar to os.path; unless that's deliberate?

Well, it is an Operating System Path.  ;)

--
~Ethan~


From srkunze at mail.de  Wed Apr  6 16:13:09 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Wed, 6 Apr 2016 22:13:09 +0200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP7h-xYBFObhQqhmUT+hP9Td9_SCgggmofSYUqcFvLaWmCp-oQ@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CAP7h-xYBFObhQqhmUT+hP9Td9_SCgggmofSYUqcFvLaWmCp-oQ@mail.gmail.com>
Message-ID: <57056DD5.5050503@mail.de>

On 06.04.2016 21:02, Alexander Belopolsky wrote:
>
> On Wed, Apr 6, 2016 at 2:32 PM, Brett Cannon <brett at python.org> wrote:
>
>     +1 for __path__, +0 for __fspath__ (I don't know how widespread
>     the notion that "fs" means "file system" is).
>
>
> Same here.  In the good old days, "fs" stood for a "Font Server."
> And in even older (and better?) days, FS was a "Field Separator."

The future is not the past. ;)


What about

__file_path__

?


Best,
Sven

From ethan at stoneleaf.us  Wed Apr  6 16:20:59 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 13:20:59 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <loom.20160406T113422-123@post.gmane.org>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <CAP1=2W4HojCKjcE2cVNAiDLkuOayhORyKfGwqgiKtRh8ZE=dKA@mail.gmail.com>
 <loom.20160406T113422-123@post.gmane.org>
Message-ID: <57056FAB.2010703@stoneleaf.us>

On 04/06/2016 02:41 AM, Antoine Pitrou wrote:

> On a concrete point, inheriting str would make the API a horrible,
> confusing, dangerous mess mixing regular string semantics (concatenation
> with +, for example, or indexing) with path-specific semantics and various
> grey areas (should .split() have path semantics or str semantics? what
> is the rule and how are people supposed to remember it?).

While I agree in principle..

> (of course, for PHP or Javascript programmers it may not sound like a
> problem. Let "adding" two IP addresses return the concatenation of
> their string representations...)

Like if I had a subnet of '192.168' and a host of '.11.16' and adding
them together gave you '192.168.11.16'? (yeah, a bit weak)

Or, more appropriately:  a path of

   '/home/ethan/mystuff' + '_bak'

so I can make a copy?  Actually, that would be

   stuff = pathlib.Path('/home/ethan/mystuff')  # no issue here
   backup_stuff = stuff.with_name(stuff.name + '_bak')  # eww

Sure, you can make the argument that `with_suffix('.bak')` is cleaner, 
but it is not up to the stdlib to micromanage my code.
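
For comparison, a short sketch of the spellings involved. PurePosixPath is
used only so the result does not depend on the platform:

```python
import pathlib

stuff = pathlib.PurePosixPath('/home/ethan/mystuff')

# The with_name() spelling from above.
by_name = stuff.with_name(stuff.name + '_bak')

# with_suffix() yields a different name: a dot-separated suffix.
by_suffix = stuff.with_suffix('.bak')

# Going through str and back, which is what a str subclass would allow
# directly with +.
by_str = pathlib.PurePosixPath(str(stuff) + '_bak')
```

Here by_name and by_str both name mystuff_bak, while by_suffix names
mystuff.bak, so the two methods are not interchangeable.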

Oh, and I do not consort with PHP, and only do so with Javascript when 
forced.

--
~Ethan~

From ethan at stoneleaf.us  Wed Apr  6 16:22:04 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 13:22:04 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
Message-ID: <57056FEC.6010201@stoneleaf.us>

On 04/05/2016 11:53 PM, Nathaniel Smith wrote:
> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan wrote:

>> I'd missed the existing precedent in DirEntry.path, so simply taking
>> that and running with it sounds good to me.
>
> This makes me twitch slightly, because NumPy has had a whole set of
> problems due to the ancient and minimally-considered decision to
> assume a bunch of ad hoc non-namespaced method names fulfilled some
> protocol -- like all .sum methods will have a signature that's
> compatible with numpy's, and if an object has a .log method then
> surely that computes the logarithm (what else in computing could "log"
> possibly refer to?), etc. This experience may or may not be relevant,
> I'm not sure -- sometimes these kinds of twitches are good guides to
> intuition, and sometimes they are just knee-jerk responses to an old
> and irrelevant problem :-). But you might want to at least think about
> how common it might be to have existing objects with unrelated
> attributes that happen to be called "path", and the bizarro problems
> that might be caused if someone accidentally passes one of them to a
> function that expects all .path attributes to be instances of this new
> protocol.

A very good point, thank you.
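
A small sketch of the kind of accident being described, using the sys
module as the unrelated object. naive_path_string is a hypothetical helper
that trusts any attribute called "path":

```python
import sys


def naive_path_string(obj):
    # Hypothetical duck-typed helper: treats any .path attribute as if it
    # were the path protocol.
    return getattr(obj, 'path', obj)


# sys happens to have a .path attribute, but it is the module search list,
# not a string naming a file system location.
accidental = naive_path_string(sys)

# A dunder name is far less likely to collide with an existing attribute.
collision = getattr(sys, '__fspath__', None)
```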

--
~Ethan~


From brett at python.org  Wed Apr  6 16:28:02 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 20:28:02 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <57056DD5.5050503@mail.de>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CAP7h-xYBFObhQqhmUT+hP9Td9_SCgggmofSYUqcFvLaWmCp-oQ@mail.gmail.com>
 <57056DD5.5050503@mail.de>
Message-ID: <CAP1=2W5ws+uQiU3SjHyyTNX==VOh4U+1v+8=Kbm0=9-jTsY98Q@mail.gmail.com>

On Wed, 6 Apr 2016 at 13:20 Sven R. Kunze <srkunze at mail.de> wrote:

> On 06.04.2016 21:02, Alexander Belopolsky wrote:
>
> On Wed, Apr 6, 2016 at 2:32 PM, Brett Cannon <brett at python.org> wrote:
>
> +1 for __path__, +0 for __fspath__ (I don't know how widespread the
> notion that "fs" means "file system" is).
>
>
> Same here.  In the good old days, "fs" stood for a "Font Server." And
> in even older (and better?) days, FS was a "Field Separator."
>
>
> The future is not the past. ;)
>
>
> What about
>
> __file_path__
>

Can be a directory as well (and you could argue semantics of file system
inodes, beginners won't know the subtlety and/or wonder where __dir_path__
is).

From srkunze at mail.de  Wed Apr  6 16:47:05 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Wed, 6 Apr 2016 22:47:05 +0200
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <5704909E.8070908@stoneleaf.us>
 <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>
Message-ID: <570575C9.7060208@mail.de>

On 06.04.2016 07:00, Guido van Rossum wrote:
> On Tue, Apr 5, 2016 at 9:29 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>> [...] we can't do:
>>
>>      app_root = Path(...)
>>      config = app_root/'settings.cfg'
>>      with open(config) as blah:
>>          # whatever
>>
>> It feels like instead of addressing this basic disconnect, the answer has
>> instead been:  add that to pathlib!  Which works great -- until a user or a
>> library gets this path object and tries to use something from os on it.
> I agree that asking for config.open() isn't the right answer here
> (even if it happens to work).

How come?

> But in this example, once 3.5.2 is out,
> the solution would be to use open(config.path), and that will also
> work when passing it to a library. Is it still unacceptable then?

I think so. Although in this example I would prefer the shorter 
config.open alternative as I am lazy.
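
To put the alternatives side by side, a rough sketch that works regardless
of whether the builtin open() accepts path objects. The temporary directory
and file contents are made up for the example:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    app_root = pathlib.Path(tmp)
    config = app_root / 'settings.cfg'
    config.write_text('debug = true\n')   # Path.write_text needs Python 3.5+

    # Explicit conversion to str before calling the builtin.
    with open(str(config)) as handle:
        via_str = handle.read()

    # The pathlib method discussed above.
    with config.open() as handle:
        via_method = handle.read()
```

Both spellings read the same file; the difference only matters once the
path object is handed to code that expects a plain string.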


I still cannot remember what the concrete issue was why we dropped 
pathlib the same day we gave it a try. It was something really stupid 
and although I hoped to reduce the size of the code, it was less 
readable. But it was not the path->str issue but something more mundane. 
It was something that forced us to use os[.path] as Path didn't provide 
something equivalent. Cannot remember.....


Best,
Sven

From srkunze at mail.de  Wed Apr  6 16:54:13 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Wed, 6 Apr 2016 22:54:13 +0200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W5ws+uQiU3SjHyyTNX==VOh4U+1v+8=Kbm0=9-jTsY98Q@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CAP7h-xYBFObhQqhmUT+hP9Td9_SCgggmofSYUqcFvLaWmCp-oQ@mail.gmail.com>
 <57056DD5.5050503@mail.de>
 <CAP1=2W5ws+uQiU3SjHyyTNX==VOh4U+1v+8=Kbm0=9-jTsY98Q@mail.gmail.com>
Message-ID: <57057775.3040307@mail.de>

On 06.04.2016 22:28, Brett Cannon wrote:
> On Wed, 6 Apr 2016 at 13:20 Sven R. Kunze <srkunze at mail.de> wrote:
>
>
>     What about
>
>     __file_path__
>
>
> Can be a directory as well (and you could argue semantics of file 
> system inodes, beginners won't know the subtlety and/or wonder where 
> __dir_path__ is).

Good point.

Well, then __fspath__ for me.


I knew instantly what it means especially considering btrfs, ntfs, xfs, 
zfs, etc.

Furthermore, we MIGHT later want some URI support, so I don't know off 
the top of my head if there's a difference between __fspath__ and 
__urlpath__ but better separate it now. Later we can re-merge them if 
necessary.


Best,
Sven

From brett at python.org  Wed Apr  6 16:55:41 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 20:55:41 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <57057775.3040307@mail.de>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CAP7h-xYBFObhQqhmUT+hP9Td9_SCgggmofSYUqcFvLaWmCp-oQ@mail.gmail.com>
 <57056DD5.5050503@mail.de>
 <CAP1=2W5ws+uQiU3SjHyyTNX==VOh4U+1v+8=Kbm0=9-jTsY98Q@mail.gmail.com>
 <57057775.3040307@mail.de>
Message-ID: <CAP1=2W7tfLJrsho5w52XY_t0O--mRydKac7Cr-oA4VWaQnqy8A@mail.gmail.com>

On Wed, 6 Apr 2016 at 13:54 Sven R. Kunze <srkunze at mail.de> wrote:

> On 06.04.2016 22:28, Brett Cannon wrote:
>
> On Wed, 6 Apr 2016 at 13:20 Sven R. Kunze <srkunze at mail.de> wrote:
>
>
>> What about
>>
>> __file_path__
>>
>
> Can be a directory as well (and you could argue semantics of file system
> inodes, beginners won't know the subtlety and/or wonder where __dir_path__
> is).
>
>
> Good point.
>
> Well, then __fspath__ for me.
>
>
> I knew instantly what it means especially considering btrfs, ntfs, xfs,
> zfs, etc.
>
> Furthermore, we MIGHT later want some URI support, so I don't know off the
> top of my head if there's a difference between __fspath__ and __urlpath__
> but better separate it now. Later we can re-merge them if necessary.
>

There's a difference as a URL represents something different than a file
system path (URI doesn't necessarily). Plus the serialized format would be
different, etc.

From wes.turner at gmail.com  Wed Apr  6 17:03:05 2016
From: wes.turner at gmail.com (Wes Turner)
Date: Wed, 6 Apr 2016 16:03:05 -0500
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP1=2W5Gt5DTu9G4i3b2_SVvPejnhQud3w5i88hC76L9zaPGtg@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CACfEFw_SjhLrrPM-8wk9M73r2Lz5J3MRoiLax51wu_pCbeMdcg@mail.gmail.com>
 <CAP1=2W5Gt5DTu9G4i3b2_SVvPejnhQud3w5i88hC76L9zaPGtg@mail.gmail.com>
Message-ID: <CACfEFw_9Bwc8uoXzZnZ5gOXBGOpEF5ZJxajx8ZjPbwfAxGcg6w@mail.gmail.com>

On Apr 6, 2016 12:47 PM, "Brett Cannon" <brett at python.org> wrote:
>
>
>
> On Wed, 6 Apr 2016 at 10:41 Wes Turner <wes.turner at gmail.com> wrote:
>>
>> * +1 for __path__, __fspath__
>>   (though I don't know what each does)
>
>
> Returns a string representing a file system path.

Why two methods? __uripath__?

(scheme, host (port), path, query, fragment) so, not __uripath__

what would be the difference between __path__ and __fspath__?

>
>>
>> * why not Text(basestring / bytestring) and pathlib.Path(Text)?
>
>
> See the points about next() vs __next__()

Path(b'123') / u'456'

similarly,
Path(b'123') / UTF8 / UTF16

>
>>
>>    * are there examples of cases where this cannot be?
>
>
> I don't understand what you think "cannot be".

What one recommends (path.py(str) / str(pathlib.Path()) + getattr) is
distinct from what any given programmer chooses to do with their code.

>
>>
>>       * if not, +1 for subclassing str/Text
>>
>>       * where are the examples of method collisions between the str
interface and the pathlib.Path interface?
>
>
> There aren't any and that's partially why some people wanted the str
subclass to begin with.
>
> Please consider this thread a str-subclass-free zone. This line of
discussion is to flesh out the proposal for a path protocol as a proposal
against subclassing str, not to settle the whole discussion outright. If
you want to continue to debate the subclassing-str side of this please use
the other thread.

this seems to be a sudden, arbitrary distinction.

are these proposals necessarily disjoint?

so,
adding getattr(path, '__path__', path) to stdlib and other code is going to
prevent which edge cases (before  os.path.normpath()* anyway) for which
benefit?

when do I do getattr(path, '__fspath__', path)?

>
> -Brett
>
>>
>>          * str.__div__ is nonsensical
>>          * pathlib.Path.__div__ is super-useful

ah, not .__add__() but .append()

I suppose the request here is for the cases which would be prevented (that
we need to learn to look for)

>>
>>
>>
>> On Apr 6, 2016 10:10 AM, "Ethan Furman" <ethan at stoneleaf.us> wrote:
>>>
>>> On 04/05/2016 11:57 PM, Nick Coghlan wrote:
>>>>
>>>> On 6 April 2016 at 16:53, Nathaniel Smith <njs at pobox.com> wrote:
>>>>>
>>>>> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan <ncoghlan at gmail.com>
wrote:
>>>
>>>
>>>>>> I'd missed the existing precedent in DirEntry.path, so simply taking
>>>>>> that and running with it sounds good to me.
>>>>>
>>>>>
>>>>> This makes me twitch slightly, because NumPy has had a whole set of
>>>>> problems due to the ancient and minimally-considered decision to
>>>>> assume a bunch of ad hoc non-namespaced method names fulfilled some
>>>>> protocol -- like all .sum methods will have a signature that's
>>>>> compatible with numpy's, and if an object has a .log method then
>>>>> surely that computes the logarithm (what else in computing could "log"
>>>>> possibly refer to?), etc. This experience may or may not be relevant,
>>>>> I'm not sure -- sometimes these kinds of twitches are good guides to
>>>>> intuition, and sometimes they are just knee-jerk responses to an old
>>>>> and irrelevant problem :-)
>>>>>
>>>>> But you might want to at least think about
>>>>> how common it might be to have existing objects with unrelated
>>>>> attributes that happen to be called "path", and the bizarro problems
>>>>> that might be caused if someone accidentally passes one of them to a
>>>>> function that expects all .path attributes to be instances of this new
>>>>> protocol.
>>>>
>>>>
>>>> sys.path, for example.
>>>>
>>>> That's why I'd actually prefer the implicit conversion protocol to be
>>>> the more explicitly named "__fspath__", with suitable "__fspath__ =
>>>> path" assignments added to DirEntry and pathlib. However, I'm also not
>>>> offering to actually *do* the work here, and the casting vote goes to
>>>> the folks pursuing the implementation effort.
>>>
>>>
>>> If we decide upon __fspath__ (or __path__) I will do the work on
pathlib and scandir to add those attributes.
>>>
>>> --
>>> ~Ethan~
>>> _______________________________________________
>>> Python-Dev mailing list
>>> Python-Dev at python.org
>>> https://mail.python.org/mailman/listinfo/python-dev
>>>
>>> Unsubscribe:
https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com
>>
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
https://mail.python.org/mailman/options/python-dev/brett%40python.org

From ethan at stoneleaf.us  Wed Apr  6 17:03:55 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 14:03:55 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <570575C9.7060208@mail.de>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <5704909E.8070908@stoneleaf.us>
 <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>
 <570575C9.7060208@mail.de>
Message-ID: <570579BB.5070602@stoneleaf.us>

On 04/06/2016 01:47 PM, Sven R. Kunze wrote:

> I still cannot remember what the concrete issue was why we dropped
> pathlib the same day we gave it a try. It was something really stupid
> and although I hoped to reduce the size of the code, it was less
> readable. But it was not the path->str issue but something more mundane.
> It was something that forced us to use os[.path] as Path didn't provide
> something equivalent. Cannot remember.....

I'm willing to guess that if you had been able to just call

   os.whatever(your_path_obj)

it would have been at most a minor annoyance.

--
~Ethan~

From brett at python.org  Wed Apr  6 17:07:59 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 21:07:59 +0000
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CACfEFw_9Bwc8uoXzZnZ5gOXBGOpEF5ZJxajx8ZjPbwfAxGcg6w@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CACfEFw_SjhLrrPM-8wk9M73r2Lz5J3MRoiLax51wu_pCbeMdcg@mail.gmail.com>
 <CAP1=2W5Gt5DTu9G4i3b2_SVvPejnhQud3w5i88hC76L9zaPGtg@mail.gmail.com>
 <CACfEFw_9Bwc8uoXzZnZ5gOXBGOpEF5ZJxajx8ZjPbwfAxGcg6w@mail.gmail.com>
Message-ID: <CAP1=2W4YkBw-tzT41iEBV3+V_RZ-9-Cju2d43Y9XwmYb4Dsu6w@mail.gmail.com>

On Wed, 6 Apr 2016 at 14:03 Wes Turner <wes.turner at gmail.com> wrote:

>
> On Apr 6, 2016 12:47 PM, "Brett Cannon" <brett at python.org> wrote:
> >
> >
> >
> > On Wed, 6 Apr 2016 at 10:41 Wes Turner <wes.turner at gmail.com> wrote:
> >>
> >> * +1 for __path__, __fspath__
> >>   (though I don't know what each does)
> >
> >
> > Returns a string representing a file system path.
>
> Why two methods? __uripath__?
>
> (scheme, host (port), path, query, fragment) so, not __uripath__
>
> what would be the difference between __path__ and __fspath__?
>

There is no difference; we're trying to choose a name.


> >
> >>
> >> * why not Text(basestring / bytestring) and pathlib.Path(Text)?
> >
> >
> > See the points about next() vs __next__()
>
> Path(b'123') / u'456'
>
> similarly,
> Path(b'123') / UTF8 / UTF16
>

As other people pointed out on the other thread, while bytes paths do
exist, we don't want to promote them as they are a mess to work with.
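A small sketch of the mess referred to above, using only os.path.join and
pathlib.PurePosixPath. The component values are placeholders, the helper
name try_join is hypothetical, and the behaviour shown is what Python 3
does when str and bytes components are mixed:

```python
import os.path
import pathlib


def try_join(*parts):
    """Join path components, returning the error text instead of raising."""
    try:
        return os.path.join(*parts)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


# All-str and all-bytes joins both work, but yield different result types.
str_result = try_join("123", "456")
bytes_result = try_join(b"123", b"456")

# Mixing the two kinds of component is rejected by os.path.join.
mixed_result = try_join(b"123", "456")

# pathlib does not accept bytes components at all.
try:
    pathlib.PurePosixPath(b"123")
    pathlib_accepts_bytes = True
except TypeError:
    pathlib_accepts_bytes = False
```

Every function that touches such a path has to know which of the two
types it was handed, which is the burden a str-only protocol avoids.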

-Brett


> >
> >>
> >>    * are there examples of cases where this cannot be?
> >
> >
> > I don't understand what you think "cannot be".
>
> What one recommends (path.py(str) / str(pathlib.Path()) + getattr) is
> distinct from what any given programmer chooses to do with their code.
>
> >
> >>
> >>       * if not, +1 for subclassing str/Text
> >>
> >>       * where are the examples of method collisions between the str
> interface and the pathlib.Path interface?
> >
> >
> > There aren't any and that's partially why some people wanted the str
> subclass to begin with.
> >
> > Please consider this thread a str-subclass-free zone. This line of
> discussion is to flesh out the proposal for a path protocol as a proposal
> against subclassing str, not to settle the whole discussion outright. If
> you want to continue to debate the subclassing-str side of this please use
> the other thread.
>
> this seems to be a sudden, arbitrary distinction.
>
> are these proposals necessarily disjoint?
>
> so,
> adding getattr(path, '__path__', path) to stdlib and other code is going
> to prevent which edge cases (before  os.path.normpath()* anyway) for which
> benefit?
>
> when do I do getattr(path, '__fspath__', path)?
>
> >
> > -Brett
> >
> >>
> >>          * str.__div__ is nonsensical
> >>          * pathlib.Path.__div__ is super-useful
>
> ah, not .__add__() but .append()
>
> I suppose the request here is for the cases which would be prevented (that
> we need to learn to look for)
>
> >>
> >>
> >>
> >> On Apr 6, 2016 10:10 AM, "Ethan Furman" <ethan at stoneleaf.us> wrote:
> >>>
> >>> On 04/05/2016 11:57 PM, Nick Coghlan wrote:
> >>>>
> >>>> On 6 April 2016 at 16:53, Nathaniel Smith <njs at pobox.com> wrote:
> >>>>>
> >>>>> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan <ncoghlan at gmail.com>
> wrote:
> >>>
> >>>
> >>>>>> I'd missed the existing precedent in DirEntry.path, so simply taking
> >>>>>> that and running with it sounds good to me.
> >>>>>
> >>>>>
> >>>>> This makes me twitch slightly, because NumPy has had a whole set of
> >>>>> problems due to the ancient and minimally-considered decision to
> >>>>> assume a bunch of ad hoc non-namespaced method names fulfilled some
> >>>>> protocol -- like all .sum methods will have a signature that's
> >>>>> compatible with numpy's, and if an object has a .log method then
> >>>>> surely that computes the logarithm (what else in computing could
> "log"
> >>>>> possibly refer to?), etc. This experience may or may not be relevant,
> >>>>> I'm not sure -- sometimes these kinds of twitches are good guides to
> >>>>> intuition, and sometimes they are just knee-jerk responses to an old
> >>>>> and irrelevant problem :-)
> >>>>>
> >>>>> But you might want to at least think about
> >>>>> how common it might be to have existing objects with unrelated
> >>>>> attributes that happen to be called "path", and the bizarro problems
> >>>>> that might be caused if someone accidentally passes one of them to a
> >>>>> function that expects all .path attributes to be instances of this
> new
> >>>>> protocol.
> >>>>
> >>>>
> >>>> sys.path, for example.
> >>>>
> >>>> That's why I'd actually prefer the implicit conversion protocol to be
> >>>> the more explicitly named "__fspath__", with suitable "__fspath__ =
> >>>> path" assignments added to DirEntry and pathlib. However, I'm also not
> >>>> offering to actually *do* the work here, and the casting vote goes to
> >>>> the folks pursuing the implementation effort.
> >>>
> >>>
> >>> If we decide upon __fspath__ (or __path__) I will do the work on
> pathlib and scandir to add those attributes.
> >>>
> >>> --
> >>> ~Ethan~
>

From srkunze at mail.de  Wed Apr  6 17:15:17 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Wed, 6 Apr 2016 23:15:17 +0200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W7tfLJrsho5w52XY_t0O--mRydKac7Cr-oA4VWaQnqy8A@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CAP7h-xYBFObhQqhmUT+hP9Td9_SCgggmofSYUqcFvLaWmCp-oQ@mail.gmail.com>
 <57056DD5.5050503@mail.de>
 <CAP1=2W5ws+uQiU3SjHyyTNX==VOh4U+1v+8=Kbm0=9-jTsY98Q@mail.gmail.com>
 <57057775.3040307@mail.de>
 <CAP1=2W7tfLJrsho5w52XY_t0O--mRydKac7Cr-oA4VWaQnqy8A@mail.gmail.com>
Message-ID: <57057C65.3070802@mail.de>

On 06.04.2016 22:55, Brett Cannon wrote:
> On Wed, 6 Apr 2016 at 13:54 Sven R. Kunze <srkunze at mail.de> wrote:
>
>     Furthermore, we MIGHT later want some URI support, so I don't know
>     off the top of my head if there's a difference between __fspath__
>     and __urlpath__ but better separate it now. Later we can re-merge
>     then if necessary.
>
>
> There's a difference as a URL represents something different than a 
> file system path (URI doesn't necessarily). Plus the serialized format 
> would be different, etc.

Sure. URLs and URIs are more than just paths. I would expect __urlpath__
to be different from __url__ itself, but that is a different discussion.

So, __fspath__ for me. :)

Best,
Sven

From srkunze at mail.de  Wed Apr  6 17:27:07 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Wed, 6 Apr 2016 23:27:07 +0200
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <570579BB.5070602@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <5704909E.8070908@stoneleaf.us>
 <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>
 <570575C9.7060208@mail.de> <570579BB.5070602@stoneleaf.us>
Message-ID: <57057F2B.1020205@mail.de>

Yeah, sure. But it was more like this on a single line:

    os.missing1(str(our_path.something1)) ***
    os.missing2(str(our_path.something1)) ***
    os.missing1(str(our_path.something1))

And then it started to get messy because you need to work on a single 
long line or you need to open more than one line.
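A runnable sketch of that pattern, under stated assumptions: os.missing1
and os.missing2 above are placeholders, so os.path.getsize and
os.path.exists stand in for them here, and the file name is made up. It
only illustrates the str() wrapping needed when a function wants a plain
string:

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    our_path = pathlib.Path(tmp) / "example.txt"
    our_path.write_text("hello")  # Path.write_text needs Python 3.5+

    # Each os-level call gets an explicit str() conversion.
    size_via_os = os.path.getsize(str(our_path))
    exists_via_os = os.path.exists(str(our_path.parent))

    # The pathlib-only spelling of the first call, for comparison.
    size_via_pathlib = our_path.stat().st_size
```

With several such calls combined in one expression, the repeated str()
wrappers are what push the line past a comfortable length.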

It was a simple thing actually. Like repeating the same calls to pathlib 
just because we need to switch to os.path.... I will ask my colleague if 
he remembers or if we can recover the code tomorrow...


Best,
Sven


NOTE to myself: getting old, need to write down everything


On 06.04.2016 23:03, Ethan Furman wrote:
> On 04/06/2016 01:47 PM, Sven R. Kunze wrote:
>
>> I still cannot remember what the concrete issue was why we dropped
>> pathlib the same day we gave it a try. It was something really stupid
>> and although I hoped to reduce the size of the code, it was less
>> readable. But it was not the path->str issue but something more mundane.
>> It was something that forced us to use os[.path] as Path didn't provide
>> something equivalent. Cannot remember.....
>
> I'm willing to guess that if you had been able to just call
>
>   os.whatever(your_path_obj)
>
> it would have been at most a minor annoyance.
>
> -- 
> ~Ethan~


From p.f.moore at gmail.com  Wed Apr  6 18:22:50 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 6 Apr 2016 23:22:50 +0100
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
Message-ID: <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>

On 6 April 2016 at 20:39, Brett Cannon <brett at python.org> wrote:
>> I'm a little confused by this. To support the older pathlib, they have
>> to do patharg = str(patharg), because *none* of the proposed
>> attributes (path or __path__) will exist.
>>
>> The getattr trick is needed to support the *new* pathlib, when you
>> need a real string. Currently you need a string if you call stdlib
>> functions or builtins. If we fix the stdlib/builtins, the need goes
>> away for those cases, but remains if you need to call libraries that
>> *don't* support pathlib (os.path will likely be one of those) or do
>> direct string manipulation.
>>
>> In practice, I see the getattr trick as an "easy fix" for libraries
>> that want to add support but in a minimally-intrusive way. On that
>> basis, making the trick easy to use is important, which argues for an
>> attribute.
>
> So then where's the confusion? :) You seem to get the points. I personally
> find `path.__path__() if hasattr(path, '__path__') else path` also readable
> (if obviously a bit longer).

The confusion is that you seem to be saying that people can use
getattr(path, '__path__', path) to support older versions of Python.
But the older versions are precisely the ones that don't have __path__
so you won't be supporting them.

>> >> >  3. Built-in? (name is dependent on #1 if we add one)
>> >>
>> >> fspath() -- and it would be handy to have a function that returns either
>> >> the __fspath__ results, or the string (if it was one), or raise an
>> >> exception if neither of the above work out.
>>
>> fspath regardless of the name chosen in #1 - a new builtin called path
>> just has too much likelihood of clashing with user code.
>>
>> But I'm not sure we need a builtin. I'm not at all clear how
>> frequently we expect user code to need to use this protocol. Users
>> can't use the builtin if they want to be backward compatible. But code
>> that doesn't need backward compatibility can probably just work with
>> pathlib (and the stdlib support for it) directly. For display, the
>> implicit conversion to str is fine. For "get me a string representing
>> the path", is the "path" attribute being abandoned in favour of this
>> special method?
>
> Yes.

OK. So the idiom to get a string from a known Path object would be any of:

1. str(path)
2. fspath(path)
3. path.__path__()

(1) is safe if you know you have a Path object, but could incorrectly
convert non-Path objects. (2) is safe in all cases. (3) is ugly. Did I
miss any options?
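A short sketch of why option (1) is only safe for known Path objects.
The candidate values are arbitrary examples:

```python
import pathlib

# A real path object, a plain string, and two things that are not paths.
candidates = [pathlib.PurePosixPath("/tmp/data.txt"), "/tmp/data.txt", None, 42]

# str() never refuses: every object comes back as some string.
converted = [str(c) for c in candidates]
```

Here None and 42 turn into 'None' and '42', which look like relative
file names to any function that receives them afterwards.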

So I think we need a builtin.

Code that needs to be backward compatible will still have to use
str(path), because neither the builtin nor the __path__ protocol will
exist in older versions of Python. Maybe a compatibility library could
add

try:
    fspath
except NameError:
    try:
        import pathlib
        def fspath(p):
            if isinstance(p, pathlib.Path):
                return str(p)
            return p
    except ImportError:
        def fspath(p):
            return p

It's messy, like all compatibility code, but it allows code to use
fspath(p) in older versions.

>> I'm inclined to think that if you are writing "pure
>> pathlib" code, pathobj.path looks more readable than fspath(pathobj) -
>> certainly no *less* readable.
>
> I don't know what you mean by "pure pathlib". You mean code that only works
> with pathlib objects? Or do you mean code that accepts pathlib objects but
> uses strings internally?

I mean code that knows it has a Path object to work with (and not a
string or anything else). But the point is moot if the path attribute
is going away.

Other than to say that I do prefer the name "path", I just don't think
it's a reasonable name for a builtin. Even if it's OK for user
variables to have the same name as builtins, IDEs tend to colour
builtins differently, which is distracting. (Temporary variables named
"file" or "dir" are the ones I hit frequently...)

If all we're debating is the name, though, I think we're pretty much there :-)

Paul

From brett at python.org  Wed Apr  6 18:46:24 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 22:46:24 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
Message-ID: <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>

On Wed, 6 Apr 2016 at 15:22 Paul Moore <p.f.moore at gmail.com> wrote:

> On 6 April 2016 at 20:39, Brett Cannon <brett at python.org> wrote:
> >> I'm a little confused by this. To support the older pathlib, they have
> >> to do patharg = str(patharg), because *none* of the proposed
> >> attributes (path or __path__) will exist.
> >>
> >> The getattr trick is needed to support the *new* pathlib, when you
> >> need a real string. Currently you need a string if you call stdlib
> >> functions or builtins. If we fix the stdlib/builtins, the need goes
> >> away for those cases, but remains if you need to call libraries that
> >> *don't* support pathlib (os.path will likely be one of those) or do
> >> direct string manipulation.
> >>
> >> In practice, I see the getattr trick as an "easy fix" for libraries
> >> that want to add support but in a minimally-intrusive way. On that
> >> basis, making the trick easy to use is important, which argues for an
> >> attribute.
> >
> > So then where's the confusion? :) You seem to get the points. I
> personally
> > find `path.__path__() if hasattr(path, '__path__') else path` also
> readable
> > (if obviously a bit longer).
>
> The confusion is that you seem to be saying that people can use
> getattr(path, '__path__', path) to support older versions of Python.
> But the older versions are precisely the ones that don't have __path__
> so you won't be supporting them.
>

Because pathlib is provisional the change will go into the next releases of
Python 3.4, 3.5, and in 3.6 so new-old will have whatever we do. :) I think
the key point is that this sort of thing will occur before you have access
to some new built-in or something.


>
> >> >> >  3. Built-in? (name is dependent on #1 if we add one)
> >> >>
> >> >> fspath() -- and it would be handy to have a function that returns
> either
> >> >> the __fspath__ results, or the string (if it was one), or raise an
> >> >> exception if neither of the above work out.
> >>
> >> fspath regardless of the name chosen in #1 - a new builtin called path
> >> just has too much likelihood of clashing with user code.
> >>
> >> But I'm not sure we need a builtin. I'm not at all clear how
> >> frequently we expect user code to need to use this protocol. Users
> >> can't use the builtin if they want to be backward compatible. But code
> >> that doesn't need backward compatibility can probably just work with
> >> pathlib (and the stdlib support for it) directly. For display, the
> >> implicit conversion to str is fine. For "get me a string representing
> >> the path", is the "path" attribute being abandoned in favour of this
> >> special method?
> >
> > Yes.
>
> OK. So the idiom to get a string from a known Path object would be any of:
>
> 1. str(path)
> 2. fspath(path)
> 3. path.__path__()
>
> (1) is safe if you know you have a Path object, but could incorrectly
> convert non-Path objects. (2) is safe in all cases. (3) is ugly. Did I
> miss any options?
>

Other than path.__path__ being an attribute, nope.


>
> So I think we need a builtin.
>

Well, the ugliness shouldn't survive forever if the community shifts over
to using pathlib while the built-in will. We also don't have a built-in for
__index__() so it depends on whether we expect this sort of thing to be the
purview of library authors or if normal people will be interacting with it
(it's probably both during the transition, but I don't know afterwards).


>
> Code that needs to be backward compatible will still have to use
> str(path), because neither the builtin nor the __path__ protocol will
> exist in older versions of Python.


str(path) will definitely work, path.__path__ will work if you're running
the next set of bugfix releases. fspath(path) will only work in Python 3.6
and newer.


> Maybe a compatibility library could
> add
>
> try:
>     fspath
> except NameError:
>     try:
>         import pathlib
>         def fspath(p):
>             if isinstance(p, pathlib.Path):
>                 return str(p)
>             return p
>     except ImportError:
>         def fspath(p):
>             return p
>
> It's messy, like all compatibility code, but it allows code to use
> fspath(p) in older versions.
>

I would tweak it to check for __fspath__ before it resorted to calling
str(), but yes, that could be something people use.
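One way that tweak might look, as a sketch only. It assumes __fspath__
ends up being a method that returns a string (still undecided in this
thread), and the helper name fspath is the proposed one, not an existing
API at this point:

```python
try:
    import pathlib
    _PATH_TYPES = (pathlib.PurePath,)
except ImportError:
    # No pathlib available: isinstance() against an empty tuple is False.
    _PATH_TYPES = ()


def fspath(p):
    """Hypothetical compatibility helper for the proposed protocol."""
    # 1. Prefer the protocol method when the object's type defines one.
    method = getattr(type(p), "__fspath__", None)
    if method is not None:
        return method(p)
    # 2. Fall back to str() for pathlib objects that predate the protocol.
    if isinstance(p, _PATH_TYPES):
        return str(p)
    # 3. Anything else is passed through untouched.
    return p
```

The lookup goes through type(p), as is usual for special methods, so an
instance attribute that merely happens to be named __fspath__ is ignored.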


>
> >> I'm inclined to think that if you are writing "pure
> >> pathlib" code, pathobj.path looks more readable than fspath(pathobj) -
> >> certainly no *less* readable.
> >
> > I don't know what you mean by "pure pathlib". You mean code that only
> works
> > with pathlib objects? Or do you mean code that accepts pathlib objects
> but
> > uses strings internally?
>
> I mean code that knows it has a Path object to work with (and not a
> string or anything else). But the point is moot if the path attribute
> is going away.
>
> Other than to say that I do prefer the name "path", I just don't think
> it's a reasonable name for a builtin. Even if it's OK for user
> variables to have the same name as builtins, IDEs tend to colour
> builtins differently, which is distracting. (Temporary variables named
> "file" or "dir" are the ones I hit frequently...)
>
> If all we're debating is the name, though, I think we're pretty much there
> :-)
>

It seems like __fspath__ may be leading as a name, but not that many people
have spoken up. But that is not the only thing still up for debate. :)

We have not settled on whether a built-in is necessary. Maybe whatever
function we come up with should live in pathlib itself and not be a
built-in?

We have also not settled on whether __fspath__ should be a method or
attribute as that changes the boilerplate one-liner people may use if a
built-in isn't available. This is the first half of the protocol.
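To make the difference concrete, a sketch of both one-liners. The
classes MethodStyle and AttributeStyle and the two helper functions are
hypothetical names used only for illustration:

```python
class MethodStyle:
    """Hypothetical path-like object exposing __fspath__ as a method."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


class AttributeStyle:
    """Hypothetical path-like object exposing __fspath__ as a plain attribute."""

    def __init__(self, raw):
        self.__fspath__ = raw


def via_attribute(path):
    # One-liner if the protocol is an attribute.
    return getattr(path, "__fspath__", path)


def via_method(path):
    # One-liner if the protocol is a method.
    return path.__fspath__() if hasattr(path, "__fspath__") else path


attr_result = via_attribute(AttributeStyle("/tmp/a"))
method_result = via_method(MethodStyle("/tmp/b"))

# Applying the attribute one-liner to a method-style object yields the
# bound method itself rather than a string.
mismatch = via_attribute(MethodStyle("/tmp/c"))
```

Plain strings pass through both helpers unchanged, so the choice only
matters for objects that actually implement the protocol.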

What exactly should this helper function do? E.g. does it simply return its
argument if __fspath__ isn't defined, or does it check for __fspath__, then
if it's an instance of str, then TypeError? This is the second half of the
protocol and will end up defining what a "path-like object" represents.
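The two candidate behaviours can be sketched side by side. Both function
names are hypothetical, and both assume the method form of __fspath__:

```python
def fspath_lenient(obj):
    """Return obj.__fspath__() if defined, otherwise obj unchanged."""
    if hasattr(obj, "__fspath__"):
        return obj.__fspath__()
    return obj


def fspath_strict(obj):
    """Return obj.__fspath__() or a str as-is; reject everything else."""
    if hasattr(obj, "__fspath__"):
        return obj.__fspath__()
    if isinstance(obj, str):
        return obj
    raise TypeError(
        "expected str or an object with __fspath__, got {}".format(
            type(obj).__name__
        )
    )


# The lenient variant lets a non-path value through for a later call to
# trip over; the strict variant fails at the boundary.
lenient_result = fspath_lenient(42)
try:
    fspath_strict(42)
    strict_raised = False
except TypeError:
    strict_raised = True
```

Under the strict variant, "path-like object" means exactly: a str, or
anything whose __fspath__ produces one.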

From greg at krypto.org  Wed Apr  6 18:54:42 2016
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 06 Apr 2016 22:54:42 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
Message-ID: <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>

Note: While I do not object to the bike shed colors being proposed, if you
call the attribute .__path__ that is somewhat confusing when thinking about
the import system which declares that *"any module that contains a __path__
attribute is considered a package"*.

So would module.__path__ become a Path instance in a potential future
making module.__path__.__path__ meaningfully confusing? ;)

I'm not worried about people who shove pathlib.Path instances in as values
into sys.modules and expect anything but pain. :P
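
A minimal sketch of the clash, assuming only the import-system rule quoted
above. The helper name is_package is made up for illustration, and json and
string are just convenient stdlib examples of a package and a plain module:

```python
import json    # a package (a directory with an __init__.py)
import string  # a plain module (a single .py file)


def is_package(module):
    # Hypothetical helper: applies the import-system rule that any
    # module with a __path__ attribute is considered a package.
    return hasattr(module, "__path__")


print(is_package(json))
print(is_package(string))
```

Any object that grew a __path__ attribute for path conversion would pass the
same hasattr() check, even though it has nothing to do with packages.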

__gps__



On Wed, Apr 6, 2016 at 3:46 PM Brett Cannon <brett at python.org> wrote:

> On Wed, 6 Apr 2016 at 15:22 Paul Moore <p.f.moore at gmail.com> wrote:
>
>> On 6 April 2016 at 20:39, Brett Cannon <brett at python.org> wrote:
>> >> I'm a little confused by this. To support the older pathlib, they have
>> >> to do patharg = str(patharg), because *none* of the proposed
>> >> attributes (path or __path__) will exist.
>> >>
>> >> The getattr trick is needed to support the *new* pathlib, when you
>> >> need a real string. Currently you need a string if you call stdlib
>> >> functions or builtins. If we fix the stdlib/builtins, the need goes
>> >> away for those cases, but remains if you need to call libraries that
>> >> *don't* support pathlib (os.path will likely be one of those) or do
>> >> direct string manipulation.
>> >>
>> >> In practice, I see the getattr trick as an "easy fix" for libraries
>> >> that want to add support but in a minimally-intrusive way. On that
>> >> basis, making the trick easy to use is important, which argues for an
>> >> attribute.
>> >
>> > So then where's the confusion? :) You seem to get the points. I
>> personally
>> > find `path.__path__() if hasattr(path, '__path__') else path` also
>> readable
>> > (if obviously a bit longer).
>>
>> The confusion is that you seem to be saying that people can use
>> getattr(path, '__path__', path) to support older versions of Python.
>> But the older versions are precisely the ones that don't have __path__
>> so you won't be supporting them.
>>
>
> Because pathlib is provisional the change will go into the next releases
> of Python 3.4, 3.5, and in 3.6 so new-old will have whatever we do. :) I
> think the key point is that this sort of thing will occur before you have
> access to some new built-in or something.
>
>
>>
>> >> >> >  3. Built-in? (name is dependent on #1 if we add one)
>> >> >>
>> >> >> fspath() -- and it would be handy to have a function that returns
>> either
>> >> >> the __fspath__ results, or the string (if it was one), or raise an
>> >> >> exception if neither of the above work out.
>> >>
>> >> fspath regardless of the name chosen in #1 - a new builtin called path
>> >> just has too much likelihood of clashing with user code.
>> >>
>> >> But I'm not sure we need a builtin. I'm not at all clear how
>> >> frequently we expect user code to need to use this protocol. Users
>> >> can't use the builtin if they want to be backward compatible, but code
>> >> that doesn't need backward compatibility can probably just work with
>> >> pathlib (and the stdlib support for it) directly. For display, the
>> >> implicit conversion to str is fine. For "get me a string representing
>> >> the path", is the "path" attribute being abandoned in favour of this
>> >> special method?
>> >
>> > Yes.
>>
>> OK. So the idiom to get a string from a known Path object would be any of:
>>
>> 1. str(path)
>> 2. fspath(path)
>> 3. path.__path__()
>>
>> (1) is safe if you know you have a Path object, but could incorrectly
>> convert non-Path objects. (2) is safe in all cases. (3) is ugly. Did I
>> miss any options?
>>
>
> Other than path.__path__ being an attribute, nope.
>
>
>>
>> So I think we need a builtin.
>>
>
> Well, the ugliness shouldn't survive forever if the community shifts over
> to using pathlib while the built-in will. We also don't have a built-in for
> __index__() so it depends on whether we expect this sort of thing to be the
> purview of library authors or if normal people will be interacting with it
> (it's probably both during the transition, but I don't know afterwards).
>
>
>>
>> Code that needs to be backward compatible will still have to use
>> str(path), because neither the builtin nor the __path__ protocol will
>> exist in older versions of Python.
>
>
> str(path) will definitely work, path.__path__ will work if you're running
> the next set of bugfix releases. fspath(path) will only work in Python 3.6
> and newer.
>
>
>> Maybe a compatibility library could
>> add
>>
>> try:
>>     fspath
>> except NameError:
>>     try:
>>         import pathlib
>>         def fspath(p):
>>             if isinstance(p, pathlib.Path):
>>                 return str(p)
>>             return p
>>     except ImportError:
>>         def fspath(p):
>>             return p
>>
>> It's messy, like all compatibility code, but it allows code to use
>> fspath(p) in older versions.
>>
>
> I would tweak it to check for __fspath__ before it resorted to calling
> str(), but yes, that could be something people use.
>
>
>>
>> >> I'm inclined to think that if you are writing "pure
>> >> pathlib" code, pathobj.path looks more readable than fspath(pathobj) -
>> >> certainly no *less* readable.
>> >
>> > I don't know what you mean by "pure pathlib". You mean code that only
>> works
>> > with pathlib objects? Or do you mean code that accepts pathlib objects
>> but
>> > uses strings internally?
>>
>> I mean code that knows it has a Path object to work with (and not a
>> string or anything else). But the point is moot if the path attribute
>> is going away.
>>
>> Other than to say that I do prefer the name "path", I just don't think
>> it's a reasonable name for a builtin. Even if it's OK for user
>> variables to have the same name as builtins, IDEs tend to colour
>> builtins differently, which is distracting. (Temporary variables named
>> "file" or "dir" are the ones I hit frequently...)
>>
>> If all we're debating is the name, though, I think we're pretty much
>> there :-)
>>
>
> It seems like __fspath__ may be leading as a name, but not that many
> people have spoken up. But that is not the only thing still up for debate.
> :)
>
> We have not settled on whether a built-in is necessary.  Maybe whatever
> function we come up with should live in pathlib itself and not have it be a
> built-in?
>
> We have also not settled on whether __fspath__ should be a method or
> attribute as that changes the boilerplate one-liner people may use if a
> built-in isn't available. This is the first half of the protocol.
>
> What exactly should this helper function do? E.g. does it simply return
> its argument if __fspath__ isn't defined, or does it check for __fspath__,
> then if it's an instance of str, then TypeError? This is the second half of
> the protocol and will end up defining what a "path-like object" represents.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org
>

From greg.ewing at canterbury.ac.nz  Wed Apr  6 18:59:25 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 07 Apr 2016 10:59:25 +1200
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
Message-ID: <570594CD.5010805@canterbury.ac.nz>

Nick Coghlan wrote:
> I'd missed the existing precedent in DirEntry.path, so simply taking
> that and running with it sounds good to me.

It's not quite the same thing, though. DirEntry.path takes
something that is not a path (a DirEntry instance) and
gives you a path representing it, so the name makes sense.

But a Path instance is already "a path", so Path.path
is weird. Path.str would make more sense.
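
To illustrate the distinction, a short sketch using os.scandir(). The
temporary directory and the file name example.txt are only assumptions
for the demo:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so that scandir() has something to yield.
    open(os.path.join(tmp, "example.txt"), "w").close()

    entries = list(os.scandir(tmp))
    entry = entries[0]

    # The DirEntry itself is not a path string...
    entry_is_str = isinstance(entry, str)
    # ...but its .path attribute is the full path as a str.
    path_is_str = isinstance(entry.path, str)
    same_path = entry.path == os.path.join(tmp, "example.txt")

print(entry_is_str, path_is_str, same_path)
```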

-- 
Greg

From njs at pobox.com  Wed Apr  6 19:25:15 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 6 Apr 2016 16:25:15 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
Message-ID: <CAPJVwBk54higD5HRUChhJV7wd2rP4=g2zg99FjsDHPPgkHe8ZA@mail.gmail.com>

On Wed, Apr 6, 2016 at 3:46 PM, Brett Cannon <brett at python.org> wrote:
>
>
> On Wed, 6 Apr 2016 at 15:22 Paul Moore <p.f.moore at gmail.com> wrote:
>>
>> So I think we need a builtin.
>
>
> Well, the ugliness shouldn't survive forever if the community shifts over to
> using pathlib while the built-in will. We also don't have a built-in for
> __index__() so it depends on whether we expect this sort of thing to be the
> purview of library authors or if normal people will be interacting with it
> (it's probably both during the transition, but I don't know afterwards).

For __index__ the "built-in" is:

from operator import index
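
A small sketch of what that function does, for comparison with the proposed
fspath(). The Offset class is invented for the example:

```python
from operator import index


class Offset:
    """Toy integer-like type that opts in via __index__."""

    def __init__(self, n):
        self._n = n

    def __index__(self):
        return self._n


print(index(Offset(3)))        # calls Offset.__index__
print("abcdef"[Offset(3)])     # sequence indexing uses the same protocol

try:
    index(2.5)                 # float does not define __index__
except TypeError as exc:
    print("rejected:", exc)
```

The helper lives in a module rather than in builtins, and it refuses objects
that have not opted in to the protocol.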

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From njs at pobox.com  Wed Apr  6 19:27:13 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 6 Apr 2016 16:27:13 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
Message-ID: <CAPJVwB=iLkA6DDxY4wKsLo3RHBFLMuzSN25r=LurLxSD92EJ_g@mail.gmail.com>

On Wed, Apr 6, 2016 at 3:54 PM, Gregory P. Smith <greg at krypto.org> wrote:
> Note: While I do not object to the bike shed colors being proposed, if you
> call the attribute .__path__ that is somewhat confusing when thinking about
> the import system which declares that "any module that contains a __path__
> attribute is considered a package".

To me this observation seems to rule out __path__ as an option: even
if they wouldn't clash in practice, then right now googling __path__
sends you straight to the import system documentation. If we overload
the meaning of the string then it'll make a mess of the
trying-to-figure-out-what-this-__thing__-is experience.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From brett at python.org  Wed Apr  6 19:26:58 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 23:26:58 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAPJVwBk54higD5HRUChhJV7wd2rP4=g2zg99FjsDHPPgkHe8ZA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAPJVwBk54higD5HRUChhJV7wd2rP4=g2zg99FjsDHPPgkHe8ZA@mail.gmail.com>
Message-ID: <CAP1=2W5VkBvVwipWOy=Gdo-=OTGXqwSnV0JcAoFmao+P0QAsCQ@mail.gmail.com>

On Wed, 6 Apr 2016 at 16:25 Nathaniel Smith <njs at pobox.com> wrote:

> On Wed, Apr 6, 2016 at 3:46 PM, Brett Cannon <brett at python.org> wrote:
> >
> >
> > On Wed, 6 Apr 2016 at 15:22 Paul Moore <p.f.moore at gmail.com> wrote:
> >>
> >> So I think we need a builtin.
> >
> >
> > Well, the ugliness shouldn't survive forever if the community shifts
> over to
> > using pathlib while the built-in will. We also don't have a built-in for
> > __index__() so it depends on whether we expect this sort of thing to be
> the
> > purview of library authors or if normal people will be interacting with
> it
> > (it's probably both during the transition, but I don't know afterwards).
>
> For __index__ the "built-in" is:
>
> from operator import index
>

Which suggests perhaps we should have pathlib.fspath() instead of a
built-in.

From brett at python.org  Wed Apr  6 19:27:27 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 23:27:27 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
Message-ID: <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>

On Wed, 6 Apr 2016 at 15:54 Gregory P. Smith <greg at krypto.org> wrote:

> Note: While I do not object to the bike shed colors being proposed, if you
> call the attribute .__path__ that is somewhat confusing when thinking about
> the import system which declares that *"any module that contains a
> __path__ attribute is considered a package"*.
>
> So would module.__path__ become a Path instance in a potential future
> making module.__path__.__path__ meaningfully confusing? ;)
>
> I'm not worried about people who shove pathlib.Path instances in as values
> into sys.modules and expect anything but pain. :P
>

Ah, good point. I think that kills __path__ then as an option.

-Brett


>
> __gps__
>
>
>
> On Wed, Apr 6, 2016 at 3:46 PM Brett Cannon <brett at python.org> wrote:
>
>> On Wed, 6 Apr 2016 at 15:22 Paul Moore <p.f.moore at gmail.com> wrote:
>>
>>> On 6 April 2016 at 20:39, Brett Cannon <brett at python.org> wrote:
>>> >> I'm a little confused by this. To support the older pathlib, they have
>>> >> to do patharg = str(patharg), because *none* of the proposed
>>> >> attributes (path or __path__) will exist.
>>> >>
>>> >> The getattr trick is needed to support the *new* pathlib, when you
>>> >> need a real string. Currently you need a string if you call stdlib
>>> >> functions or builtins. If we fix the stdlib/builtins, the need goes
>>> >> away for those cases, but remains if you need to call libraries that
>>> >> *don't* support pathlib (os.path will likely be one of those) or do
>>> >> direct string manipulation.
>>> >>
>>> >> In practice, I see the getattr trick as an "easy fix" for libraries
>>> >> that want to add support but in a minimally-intrusive way. On that
>>> >> basis, making the trick easy to use is important, which argues for an
>>> >> attribute.
>>> >
>>> > So then where's the confusion? :) You seem to get the points. I
>>> personally
>>> > find `path.__path__() if hasattr(path, '__path__') else path` also
>>> readable
>>> > (if obviously a bit longer).
>>>
>>> The confusion is that you seem to be saying that people can use
>>> getattr(path, '__path__', path) to support older versions of Python.
>>> But the older versions are precisely the ones that don't have __path__
>>> so you won't be supporting them.
>>>
>>
>> Because pathlib is provisional the change will go into the next releases
>> of Python 3.4, 3.5, and in 3.6 so new-old will have whatever we do. :) I
>> think the key point is that this sort of thing will occur before you have
>> access to some new built-in or something.
>>
>>
>>>
>>> >> >> >  3. Built-in? (name is dependent on #1 if we add one)
>>> >> >>
>>> >> >> fspath() -- and it would be handy to have a function that returns
>>> either
>>> >> >> the __fspath__ results, or the string (if it was one), or raise an
>>> >> >> exception if neither of the above work out.
>>> >>
>>> >> fspath regardless of the name chosen in #1 - a new builtin called path
>>> >> just has too much likelihood of clashing with user code.
>>> >>
>>> >> But I'm not sure we need a builtin. I'm not at all clear how
>>> >> frequently we expect user code to need to use this protocol. Users
>>> >> can't use the builtin if they want to be backward compatible, But code
>>> >> that doesn't need backward compatibility can probably just work with
>>> >> pathlib (and the stdlib support for it) directly. For display, the
>>> >> implicit conversion to str is fine. For "get me a string representing
>>> >> the path", is the "path" attribute being abandoned in favour of this
>>> >> special method?
>>> >
>>> > Yes.
>>>
>>> OK. So the idiom to get a string from a known Path object would be any
>>> of:
>>>
>>> 1. str(path)
>>> 2. fspath(path)
>>> 3. path.__path__()
>>>
>>> (1) is safe if you know you have a Path object, but could incorrectly
>>> convert non-Path objects. (2) is safe in all cases. (3) is ugly. Did I
>>> miss any options?
>>>
>>
>> Other than path.__path__ being an attribute, nope.
>>
>>
>>>
>>> So I think we need a builtin.
>>>
>>
>> Well, the ugliness shouldn't survive forever if the community shifts over
>> to using pathlib while the built-in will. We also don't have a built-in for
>> __index__() so it depends on whether we expect this sort of thing to be the
>> purview of library authors or if normal people will be interacting with it
>> (it's probably both during the transition, but I don't know afterwards).
>>
>>
>>>
>>> Code that needs to be backward compatible will still have to use
>>> str(path), because neither the builtin nor the __path__ protocol will
>>> exist in older versions of Python.
>>
>>
>> str(path) will definitely work, path.__path__ will work if you're running
>> the next set of bugfix releases. fspath(path) will only work in Python 3.6
>> and newer.
>>
>>
>>> Maybe a compatibility library could
>>> add
>>>
>>> try:
>>>     fspath
>>> except NameError:
>>>     try:
>>>         import pathlib
>>>         def fspath(p):
>>>             if isinstance(p, pathlib.Path):
>>>                 return str(p)
>>>             return p
>>>     except ImportError:
>>>         def fspath(p):
>>>             return p
>>>
>>> It's messy, like all compatibility code, but it allows code to use
>>> fspath(p) in older versions.
>>>
>>
>> I would tweak it to check for __fspath__ before it resorted to calling
>> str(), but yes, that could be something people use.
>>
>>
>>>
>>> >> I'm inclined to think that if you are writing "pure
>>> >> pathlib" code, pathobj.path looks more readable than fspath(pathobj) -
>>> >> certainly no *less* readable.
>>> >
>>> > I don't know what you mean by "pure pathlib". You mean code that only
>>> works
>>> > with pathlib objects? Or do you mean code that accepts pathlib objects
>>> but
>>> > uses strings internally?
>>>
>>> I mean code that knows it has a Path object to work with (and not a
>>> string or anything else). But the point is moot if the path attribute
>>> is going away.
>>>
>>> Other than to say that I do prefer the name "path", I just don't think
>>> it's a reasonable name for a builtin. Even if it's OK for user
>>> variables to have the same name as builtins, IDEs tend to colour
>>> builtins differently, which is distracting. (Temporary variables named
>>> "file" or "dir" are the ones I hit frequently...)
>>>
>>> If all we're debating is the name, though, I think we're pretty much
>>> there :-)
>>>
>>
>> It seems like __fspath__ may be leading as a name, but not that many
>> people have spoken up. But that is not the only thing still up for debate.
>> :)
>>
>> We have not settled on whether a built-in is necessary.  Maybe whatever
>> function we come up with should live in pathlib itself and not have it be a
>> built-in?
>>
>> We have also not settled on whether __fspath__ should be a method or
>> attribute as that changes the boilerplate one-liner people may use if a
>> built-in isn't available. This is the first half of the protocol.
>>
>> What exactly should this helper function do? E.g. does it simply return
>> its argument if __fspath__ isn't defined, or does it check for __fspath__,
>> then if it's an instance of str, then TypeError? This is the second half of
>> the protocol and will end up defining what a "path-like object" represents.
>>

From ethan at stoneleaf.us  Wed Apr  6 19:37:11 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 16:37:11 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W5VkBvVwipWOy=Gdo-=OTGXqwSnV0JcAoFmao+P0QAsCQ@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAPJVwBk54higD5HRUChhJV7wd2rP4=g2zg99FjsDHPPgkHe8ZA@mail.gmail.com>
 <CAP1=2W5VkBvVwipWOy=Gdo-=OTGXqwSnV0JcAoFmao+P0QAsCQ@mail.gmail.com>
Message-ID: <57059DA7.2090504@stoneleaf.us>

On 04/06/2016 04:26 PM, Brett Cannon wrote:
> On Wed, 6 Apr 2016 at 16:25 Nathaniel Smith wrote:

>> For __index__ the "built-in" is:
>>
>> from operator import index
>
> Which suggests perhaps we should have pathlib.fspath() instead of a
> built-in.

+1

--
~Ethan~

From ethan at stoneleaf.us  Wed Apr  6 19:44:59 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 16:44:59 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
Message-ID: <57059F7B.3090901@stoneleaf.us>

On 04/06/2016 04:27 PM, Brett Cannon wrote:
> On Wed, 6 Apr 2016 at 15:54 Gregory P. Smith wrote:
>>
>> So would module.__path__ become a Path instance in a potential
>> future making module.__path__.__path__ meaningfully confusing? ;)
>>
>> I'm not worried about people who shove pathlib.Path instances in as
>> values into sys.modules and expect anything but pain. :P
>
> Ah, good point. I think that kills __path__ then as an option.

Excellent!  Narrowing the field then to:

__fspath__

__os_path__


Step right up!  Cast yer votes!

--
~Ethan~


From v+python at g.nevcal.com  Wed Apr  6 20:21:03 2016
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 6 Apr 2016 17:21:03 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <57059F7B.3090901@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us>
Message-ID: <5705A7EF.1070401@g.nevcal.com>

On 4/6/2016 4:44 PM, Ethan Furman wrote:
> On 04/06/2016 04:27 PM, Brett Cannon wrote:
>> On Wed, 6 Apr 2016 at 15:54 Gregory P. Smith wrote:
>>>
>>> So would module.__path__ become a Path instance in a potential
>>> future making module.__path__.__path__ meaningfully confusing? ;)
>>>
>>> I'm not worried about people who shove pathlib.Path instances in as
>>> values into sys.modules and expect anything but pain. :P
>>
>> Ah, good point. I think that kills __path__ then as an option.
>
> Excellent!  Narrowing the field then to:
>
> __fspath__

-1: not all os names that look like files actually refer to the file 
system: pipes, devices, etc.
>
> __os_path__

+1: the special names are os dependent, so os seems like an appropriate 
prefix.

>
>
> Step right up!  Cast yer votes!
>
> -- 
> ~Ethan~
>


From chris.barker at noaa.gov  Wed Apr  6 20:43:42 2016
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Wed, 6 Apr 2016 17:43:42 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <57059F7B.3090901@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us>
Message-ID: <-5625672377616017435@unknownmsgid>

>> Ah, good point. I think that kills __path__ then as an option.

Darn. I really preferred that. Oh well.

> __fspath__

+0.1

But not a big deal. I think this is pretty much for occasional use by
library authors, so not a big deal what it is named.

Which also means that I don't think we need a built-in function that
calls it, either. How often do people need a stringified-path version
of an arbitrary object?

Which makes me think: str() calls __str__ on an arbitrary object, and
creates a new string object.

But fspath(), if it exists, would call __fspath__ on an arbitrary
object, and create a new string -- not a new Path. That may be
confusing...

If we were starting from scratch, I suppose __path__ would return a
Path object -- it would be a protocol one could use to duck-type a
path.

But since we have history, we are creating a protocol that conforms to
the existing string-as-path protocol.

So are we imagining that future libs will be written that only take
objects with a __fspath__ method? In which case, do we need to add it
to str? In which case, this is all kind of pointless.

Or maybe all future libs will continue to accept either an str or an
object with __fspath__.  In which case, this is pretty pointless, too.

I guess what I'm wondering is if we are stuck with str-paths as the
lingua-Franca for paths forever. In which case, we should embrace that
and just call str() on anything passed in as a path argument.

Sure, then open(3.5) will give you a file not found error, or maybe
create a file with a weird name, but really? Who's going to make that
mistake and not figure it out really quickly?

-CHB

From ethan at stoneleaf.us  Wed Apr  6 20:57:21 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 17:57:21 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <-5625672377616017435@unknownmsgid>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid>
Message-ID: <5705B071.1010207@stoneleaf.us>

On 04/06/2016 05:43 PM, Chris Barker - NOAA Federal wrote:

>> __fspath__
>
> +0.1
>
> But not a big deal. I think this is pretty much for occasional use by
> library authors, so not a big deal what it is named.

It's mostly for the stdlib itself.  I imagine that most libraries would 
just take what they are given and pass it along to open or os.stat or 
whatever.

> Which also means that I don't think we need a built-in function that
> calls it, either. How often do people need a stringified-path version
> of an arbitrary object?

Not often.

> Which makes me think: str() calls __str__ on an arbitrary object, and
> creates a new string object.
>
> But fspath(), if it exists, would call __fspath__ on an arbitrary
> object, and create a new string -- not a new Path. That may be
> confusing...

It would be more along the lines of pickle -- give me the standard 
serialized form of this Path, please.

> If we were starting from scratch, I suppose __path__ would return a
> Path object -- it would be a protocol one could use to duck-type a
> path.

Sure.

> But since we have history, we are creating a protocol that conforms to
> the existing string-as-path protocol.

Yup.

> So are we imagining that future libs will be written that only take
> objects with a __fspath__ method? In which case, do we need to add it
> to str? In which case, this is all kind of pointless.

We are imagining that future libraries that have to muck about with 
paths will work with Path objects, either by accepting them or 
converting to them as the (possibly) stringified paths are passed in -- 
and when necessary those libs can pass either the Path obj or the 
stringified path to the stdlib.
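
As a rough sketch of that workflow (the function name `count_lines` is 
made up for illustration, and it assumes the library converts to 
pathlib.Path at its boundary):

```python
import pathlib
import tempfile


def count_lines(path):
    """Hypothetical library function: accepts a str or a pathlib.Path."""
    path = pathlib.Path(path)      # convert once, at the boundary
    with path.open() as f:         # work with the Path object internally
        return sum(1 for _ in f)


with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "demo.txt"
    target.write_text("one\ntwo\n")
    print(count_lines(target))       # called with a Path object
    print(count_lines(str(target)))  # called with a plain string
```

Callers can hand over whichever form they already have; the library 
only deals with one type internally.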

> Or maybe all future libs will continue to accept either an str or an
> object with __fspath__.  In which case, this is pretty pointless, too.

The point is to allow future programs to work with Path and be able to 
work with the stdlib as seamlessly and painlessly as possible.

> I guess what I'm wondering is if we are stuck with str-paths as the
> lingua-Franca for paths forever. In which case, we should embrace that
> and just call str() on anything passed in as a path argument.

Nah.  That's inviting trouble and pain, and we're trying to get away 
from that.

> Sure, then open(3.5) will give you a file not found error, or maybe
> create a file with a weird name, but really? Who's going to make that
> mistake and not figure it out really quickly?

Well, since the 3.5 was actually in my_var, and could have been written 
before it was read, it could easily be days, weeks, or even months -- 
probably after the last guy quit, you took the job, the server died, and 
you had to restore from backup -- at which point you'll see all the 
really, really strange file names and wonder what they are.  And of 
course, whatever logic was determining those weird names is now out of 
sync because of the server swap.

And, yeah, I've seen weirder things happen.
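
A small sketch of that failure mode, assuming a hypothetical stdlib that 
calls str() on every path argument (here the str() call is written out 
by hand, and `my_var` stands in for the buggy value):

```python
import os
import tempfile

my_var = 3.5  # bug: a number ended up where a file name was expected

with tempfile.TemporaryDirectory() as tmp:
    # With blanket str() coercion this is the effective behaviour:
    # the write succeeds and nobody is told anything went wrong.
    target = os.path.join(tmp, str(my_var))
    with open(target, "w") as f:
        f.write("data\n")
    created = os.listdir(tmp)

print(created)  # a file literally named "3.5"
```

Nothing fails at write time, which is exactly why the mistake can go 
unnoticed until someone looks at the directory.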

--
~Ethan~

From wes.turner at gmail.com  Wed Apr  6 22:24:19 2016
From: wes.turner at gmail.com (Wes Turner)
Date: Wed, 6 Apr 2016 21:24:19 -0500
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W5VkBvVwipWOy=Gdo-=OTGXqwSnV0JcAoFmao+P0QAsCQ@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAPJVwBk54higD5HRUChhJV7wd2rP4=g2zg99FjsDHPPgkHe8ZA@mail.gmail.com>
 <CAP1=2W5VkBvVwipWOy=Gdo-=OTGXqwSnV0JcAoFmao+P0QAsCQ@mail.gmail.com>
Message-ID: <CACfEFw8J43JgfUYFf_=t3apV+dLv3cXFKQwjSnLezoz7q7ramg@mail.gmail.com>

On Apr 6, 2016 6:31 PM, "Brett Cannon" <brett at python.org> wrote:
>
>
>
> On Wed, 6 Apr 2016 at 16:25 Nathaniel Smith <njs at pobox.com> wrote:
>>
>> On Wed, Apr 6, 2016 at 3:46 PM, Brett Cannon <brett at python.org> wrote:
>> >
>> >
>> > On Wed, 6 Apr 2016 at 15:22 Paul Moore <p.f.moore at gmail.com> wrote:
>> >>
>> >> So I think we need a builtin.
>> >
>> >
>> > Well, the ugliness shouldn't survive forever if the community shifts over to
>> > using pathlib while the built-in will. We also don't have a built-in for
>> > __index__() so it depends on whether we expect this sort of thing to be the
>> > purview of library authors or if normal people will be interacting with it
>> > (it's probably both during the transition, but I don't know afterwards).
>>
>> For __index__ the "built-in" is:
>>
>> from operator import index
>
>
> Which suggests perhaps we should have pathlib.fspath() instead of a
> built-in.

Would it make sense to instead have pathlib.Path.__init__?


From ethan at stoneleaf.us  Wed Apr  6 22:40:55 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 19:40:55 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CACfEFw8J43JgfUYFf_=t3apV+dLv3cXFKQwjSnLezoz7q7ramg@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAPJVwBk54higD5HRUChhJV7wd2rP4=g2zg99FjsDHPPgkHe8ZA@mail.gmail.com>
 <CAP1=2W5VkBvVwipWOy=Gdo-=OTGXqwSnV0JcAoFmao+P0QAsCQ@mail.gmail.com>
 <CACfEFw8J43JgfUYFf_=t3apV+dLv3cXFKQwjSnLezoz7q7ramg@mail.gmail.com>
Message-ID: <5705C8B7.6000802@stoneleaf.us>

On 04/06/2016 07:24 PM, Wes Turner wrote:
> On Apr 6, 2016 6:31 PM, "Brett Cannon" wrote:

>> Which suggests perhaps we should have pathlib.fspath() instead of a
>> built-in.
>
> Would it make sense to instead have pathlib.Path.__init__?

We already have that -- it's what makes a Path.

What we are looking for is a function that accepts a Path or a str and 
returns the Path as a str, or the str passed in.
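
Roughly this, as a sketch only: the function name `fspath` and the 
method name `__fspath__` are both still under discussion, and `DemoPath` 
is a made-up stand-in for a path class.

```python
class DemoPath:
    """Made-up path class that opts in to the proposed protocol."""

    def __init__(self, *parts):
        self._parts = parts

    def __fspath__(self):
        return "/".join(self._parts)


def fspath(path):
    """Return a str unchanged, or the str form of a path object."""
    if isinstance(path, str):
        return path
    try:
        # Look the method up on the type, as other protocols do.
        convert = type(path).__fspath__
    except AttributeError:
        raise TypeError("expected str or path object, not "
                        + type(path).__name__) from None
    return convert(path)


print(fspath("spam/eggs.txt"))                # a str is passed through
print(fspath(DemoPath("spam", "eggs.txt")))   # a path object is converted
```

Anything that is neither a str nor an object with the method is 
rejected with TypeError instead of being coerced.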

--
~Ethan~


From wes.turner at gmail.com  Wed Apr  6 23:12:47 2016
From: wes.turner at gmail.com (Wes Turner)
Date: Wed, 6 Apr 2016 22:12:47 -0500
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <5705C8B7.6000802@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAPJVwBk54higD5HRUChhJV7wd2rP4=g2zg99FjsDHPPgkHe8ZA@mail.gmail.com>
 <CAP1=2W5VkBvVwipWOy=Gdo-=OTGXqwSnV0JcAoFmao+P0QAsCQ@mail.gmail.com>
 <CACfEFw8J43JgfUYFf_=t3apV+dLv3cXFKQwjSnLezoz7q7ramg@mail.gmail.com>
 <5705C8B7.6000802@stoneleaf.us>
Message-ID: <CACfEFw-90VVNHxhAZo_WTa92SDwJEm6Wi50t7jQgERWC9CkUyA@mail.gmail.com>

My mistake.

On Wed, Apr 6, 2016 at 9:40 PM, Ethan Furman <ethan at stoneleaf.us> wrote:

> On 04/06/2016 07:24 PM, Wes Turner wrote:
>
>> On Apr 6, 2016 6:31 PM, "Brett Cannon" wrote:
>>
>>> Which suggests perhaps we should have pathlib.fspath() instead of a
>>> built-in.
>>
>> Would it make sense to instead have pathlib.Path.__init__?
>>
>
> We already have that -- it's what makes a Path.
>
> What we are looking for is a function that accepts a Path or a str and
> returns the Path as a str, or the str passed in.
>
> --
> ~Ethan~
>
>

From chris.barker at noaa.gov  Wed Apr  6 23:50:23 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 6 Apr 2016 20:50:23 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <5705B071.1010207@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid>
 <5705B071.1010207@stoneleaf.us>
Message-ID: <CALGmxELMVsa5Lt5bHmB+JJnNF_OyXBNFtd9GYMbJGZtYft20Pg@mail.gmail.com>

On Wed, Apr 6, 2016 at 5:57 PM, Ethan Furman <ethan at stoneleaf.us> wrote:

>> But not a big deal. I think this is pretty much for occasional use by
>> library authors, so not a big deal what it is named.
>
> It's mostly for the stdlib itself.  I imagine that most libraries would
> just take what they are given and pass it along to open or os.stat or
> whatever.
>

Exactly -- so we really don't need a builtin shortcut.


>> Which makes me think: str() calls __str__ on an arbitrary object, and
>> creates a new string object.
>>
>> But fspath(), if it exists, would call __fspath__ on an arbitrary
>> object, and create a new string -- not a new Path. That may be
>> confusing...
>>
>
> It would be more along the lines of pickle -- give me the standard
> serialized form of this Path, please.
>

well, give me the standard serialized-path of this arbitrary object, yes?


>> So are we imagining that future libs will be written that only take
>> objects with a __fspath__ method? In which case, do we need to add it
>> to str? In which case, this is all kind of pointless.
>>
>
> We are imagining that future libraries that have to muck about with paths
> will work with Path objects, either by accepting them or converting to them
> as the (possibly) stringified paths are passed in -- and when necessary
> those libs can pass either the Path obj or the stringified path to the
> stdlib.


if that's the case, we don't need the __fspath__ protocol -- the reason for
the protocol is that we imagine there may be any number of third-party
objects to represent/work-with paths, that aren't strings or stdlib Path
objects.

>> Or maybe all future libs will continue to accept either an str or an
>> object with __fspath__.  In which case, this is pretty pointless, too.
>>
>
> The point is to allow future programs to work with Path and be able to
> work with the stdlib as seamlessly and painlessly as possible.
>

again, we don't need a new protocol for that -- we only need the protocol
if we want arbitrary future programs to work with arbitrary path
implementations.

which I suppose we do -- there are already other path implementations out
there (though at least some are strings :-) )


>> I guess what I'm wondering is if we are stuck with str-paths as the
>> lingua-Franca for paths forever. In which case, we should embrace that
>> and just call str() on anything passed in as a path argument.
>>
>
> Nah.  That's inviting trouble and pain, and we're trying to get away from
> that.
>
>> Sure, then open(3.5) will give you a file not found error, or maybe
>> create a file with a weird name, but really? Who's going to make that
>> mistake and not figure it out really quickly?
>>
>
> Well, since the 3.5 was actually in my_var, and could have been written
> before it was read, it could easily be days, weeks, or even months --
> probably after the last guy quit, you took the job, the server died, and
> you had to restore from backup -- at which point you'll see all the really,
> really strange file names and wonder what they are.  And of course,
> whatever logic was determining those weird names is now out of sync because
> of the server swap.
>
> And, yeah, I've seen weirder things happen.
>

People can totally screw up path variables as strings or Path objects too
-- I'm having trouble seeing that this is all that more likely -- after
all, python is a dynamic language -- if we wanted full type safety, we
wouldn't be using python...

Speaking of which, how is this going to work with the new type system? Do
we need an ABC, rather than just a protocol?

But as long as we get to the stdlib taking Path objects, I'm happy :-)

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From ethan at stoneleaf.us  Thu Apr  7 00:15:19 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 21:15:19 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CALGmxELMVsa5Lt5bHmB+JJnNF_OyXBNFtd9GYMbJGZtYft20Pg@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid>
 <5705B071.1010207@stoneleaf.us>
 <CALGmxELMVsa5Lt5bHmB+JJnNF_OyXBNFtd9GYMbJGZtYft20Pg@mail.gmail.com>
Message-ID: <5705DED7.2070303@stoneleaf.us>

On 04/06/2016 08:50 PM, Chris Barker wrote:
 > On Wed, Apr 6, 2016 at 5:57 PM, Ethan Furman wrote:

 >> It's mostly for the stdlib itself.  I imagine that most libraries
 >> would just take what they are given and pass it along to open or
 >> os.stat or whatever.
 >
 > Exactly -- so we really don't need a builtin shortcut.

Hey, we have to program the stdlib too!  No need to make it harder for 
ourselves.


 >> It would be more along the lines of pickle -- give me the standard
 >> serialized form of this Path, please.
 >
 > well, give me the standard serialized-path of this arbitrary object,
 > yes?

Yes.  :)


 >> We are imagining that future libraries that have to muck about with
 >> paths will work with Path objects, either by accepting them or
 >> converting to them as the (possibly) stringified paths are passed in
 >> -- and when necessary those libs can pass either the Path obj or the
 >> stringified path to the stdlib.
 >
 > if that's the case, we don't need the __fspath__ protocol -- the
 > reason for the protocol is that we imagine there may be any number of
 > third-party objects to represent/work-with paths, that aren't strings
 > or stdlib Path objects.

The purpose of the __os_path__ method is two-fold:

- its presence declares that the object is a path (or convertible
   to one)
- it does the conversion

Since we need it for ourselves there's no reason to prevent others
from taking advantage of it.
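
A minimal sketch of both purposes, using the `__os_path__` spelling from 
above (the final name is not settled); `RemoteMount` and `is_path` are 
made up for illustration:

```python
class RemoteMount:
    """Made-up third-party class that represents a location on disk."""

    def __init__(self, root, name):
        self.root = root
        self.name = name

    def __os_path__(self):
        # Purpose 2: perform the conversion to the string form.
        return self.root + "/" + self.name


def is_path(obj):
    # Purpose 1: the mere presence of the method marks obj as a path.
    return hasattr(type(obj), "__os_path__")


mount = RemoteMount("/mnt/data", "report.csv")
print(is_path(mount), is_path(42))
print(mount.__os_path__())
```

The third-party class needs no shared base class with pathlib; defining 
the one method is the whole opt-in.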


 >> The point is to allow future programs to work with Path and be able
 >> to work with the stdlib as seamlessly and painlessly as possible.
 >
 > again, we don't need a new protocol for that -- we only need the
 > protocol if we want arbitrary future programs to work with arbitrary
 > path implementations.

I am certainly not opposed to that.

 > which I suppose we do -- there are already other path implementations
 > out there (though at least some are strings :-) )

Right.  And I'm already making changes to mine to work with this
new stuff.


 > People can totally screw up path variables as strings or Path objects
 > too -- I'm having trouble seeing that this is all that more likely --
 > after all, python is a dynamic language -- if we wanted full type
 > safety, we wouldn't be using python...

Very True.  ;)

 > Speaking of which, how is this going to work with the new type
 > system?  Do we need an ABC, rather than just a protocol?

I do not know, good question.

 > But as long as we get to the stdlib taking Path objects, I'm happy :-)

Excellent!

--
~Ethan~


From stephen at xemacs.org  Thu Apr  7 00:37:48 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 7 Apr 2016 13:37:48 +0900
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CALGmxELMVsa5Lt5bHmB+JJnNF_OyXBNFtd9GYMbJGZtYft20Pg@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid>
 <5705B071.1010207@stoneleaf.us>
 <CALGmxELMVsa5Lt5bHmB+JJnNF_OyXBNFtd9GYMbJGZtYft20Pg@mail.gmail.com>
Message-ID: <22277.58396.936738.919834@turnbull.sk.tsukuba.ac.jp>

Chris Barker writes:

 > which I suppose we do -- there are already other path implementations out
 > there (though at least some are strings :-) )

Even so, their __fspath__ implementation might return syntactically
canonicalized or realpath paths, rather than whatever is input.  If
cached and the path frequently accessed, the realpath implementation
might be a significant win in some applications.
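
A sketch of what such a caching implementation might look like, assuming
the method is spelled __fspath__; the class name `RealPath` is made up,
and os.path.realpath does the actual resolution:

```python
import os


class RealPath:
    """Made-up path class whose protocol method returns a resolved path."""

    def __init__(self, raw):
        self.raw = raw
        self._resolved = None  # filled in on first use

    def __fspath__(self):
        if self._resolved is None:
            # Resolve symlinks and relative parts once, then reuse.
            self._resolved = os.path.realpath(self.raw)
        return self._resolved


p = RealPath(os.curdir)
first = p.__fspath__()
second = p.__fspath__()
print(first)
print(first is second)  # the second call reuses the cached string
```

The trade-off is that a cached result goes stale if a symlink along the
path changes after the first call.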


From raymond.hettinger at gmail.com  Thu Apr  7 01:08:53 2016
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Wed, 6 Apr 2016 22:08:53 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
Message-ID: <7324A271-1736-4385-8D35-CD48EA74F4C8@gmail.com>


> On Apr 5, 2016, at 3:55 PM, Guido van Rossum <guido at python.org> wrote:
> 
> It's been provisional since 3.4. I think if it is still there in 3.6.0
> it should be considered no longer provisional. But this may indeed be
> a test case for the ultimate fate of provisional modules -- should we
> remove it?

I lean slightly towards removal.

Having worked through the API when it was first released, I find it to be highly forgettable (i.e. I have to re-read the docs each time I've revisited it).

While I haven't seen any uptake in real code, there are occasional questions about it on StackOverflow, so we do know that there is at least some interest.  I'm not sure that it needs to live in the standard library though.


Raymond

From ericsnowcurrently at gmail.com  Thu Apr  7 01:45:56 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 6 Apr 2016 23:45:56 -0600
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CALFfu7ATvpQ_PKEOMof7hJ21iWJ3qF9t4uxrUWYQ6C1v4RbkYw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <20160406154334.058182b6@subdivisions.wooz.org>
 <CALFfu7DLO70k5Deew-zuE9skVd4e=+ZVAYsbR3DsiuDWv6+Q5Q@mail.gmail.com>
 <CALFfu7ATvpQ_PKEOMof7hJ21iWJ3qF9t4uxrUWYQ6C1v4RbkYw@mail.gmail.com>
Message-ID: <CALFfu7BxkuOhq9DDNMezVKnWH_GhKzS6rQi6TB6YT2mwX0ypxQ@mail.gmail.com>

On Apr 6, 2016 14:00, "Barry Warsaw" <barry at python.org> wrote:
> Aside from the name of the attribute (though I'm partial to __path__),

Ahem, pkg.__path__.

-eric

From greg.ewing at canterbury.ac.nz  Thu Apr  7 02:15:40 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 07 Apr 2016 18:15:40 +1200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <-5625672377616017435@unknownmsgid>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid>
Message-ID: <5705FB0C.2090705@canterbury.ac.nz>

Chris Barker - NOAA Federal wrote:
> But fspath(), if it exists, would call __fspath__ on an arbitrary
> object, and create a new string -- not a new Path. That may be
> confusing...

Maybe something like fspathstr/__fspathstr__ would be better?

-- 
Greg

From ethan at stoneleaf.us  Thu Apr  7 02:31:27 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 23:31:27 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <5705FB0C.2090705@canterbury.ac.nz>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid>
 <5705FB0C.2090705@canterbury.ac.nz>
Message-ID: <5705FEBF.3070301@stoneleaf.us>

On 04/06/2016 11:15 PM, Greg Ewing wrote:
> Chris Barker - NOAA Federal wrote:
>> But fspath(), if it exists, would call __fspath__ on an arbitrary
>> object, and create a new string -- not a new Path. That may be
>> confusing...
>
> Maybe something like fspathstr/__fspathstr__ would be better?

As someone already said, we don't need to embed the type in the name.

The point of the __os_path__ protocol is to return the serialized 
version of the Path the object represents.  This would be somewhat 
similar to the various __reduce*__ protocols (which I thought had 
something to do with adding until I learned what they were for).

--
~Ethan~


From ethan at stoneleaf.us  Thu Apr  7 02:34:27 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 23:34:27 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
Message-ID: <5705FF73.2050806@stoneleaf.us>

On 04/06/2016 10:26 AM, Brett Cannon wrote:

>  2. Method or attribute? (changes what kind of one-liner you might use
>     in libraries, but I think historically all protocols have been
>     methods and the serialized string representation might be costly to
>     build)

Having thought about this some more, it seems we have enough __dunder__ 
attributes that are plain strings that having this one also be a plain 
string should not be a problem:

- __name__
- __module__
- __file__

Since Paths are immutable the __os_path__ attribute isn't going to 
change and doesn't need to be a method.
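
A sketch of the attribute flavour, with made-up names (`AttrPath`, 
`os_path`) and the still-unsettled `__os_path__` spelling:

```python
class AttrPath:
    """Made-up path class exposing its string form as a plain attribute."""

    def __init__(self, *parts):
        # Computed once; safe because the object is treated as immutable.
        self.__os_path__ = "/".join(parts)


def os_path(obj):
    """Hypothetical helper: pass a str through, else read the attribute."""
    if isinstance(obj, str):
        return obj
    return obj.__os_path__   # plain attribute access, nothing is called


p = AttrPath("etc", "hosts")
print(p.__os_path__)
print(os_path(p) == os_path("etc/hosts"))
print(AttrPath.__name__)     # same flavour as the existing string dunders
```

The cost is that the string has to be built up front, even for objects 
whose path form is never requested.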

--
~Ethan~


From songofacandy at gmail.com  Thu Apr  7 03:00:49 2016
From: songofacandy at gmail.com (INADA Naoki)
Date: Thu, 7 Apr 2016 16:00:49 +0900
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W7JJSqTv9gx2ZZztXwsdV4u6gLRNBQknFi4GmZgwxpJew@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <570548DD.7080108@gmail.com>
 <CAP1=2W7JJSqTv9gx2ZZztXwsdV4u6gLRNBQknFi4GmZgwxpJew@mail.gmail.com>
Message-ID: <CAEfz+TxeO7NxcA53sY9P9v_TEDakqr7T2oWgpdeDRkuVXzQ5vw@mail.gmail.com>

On Thu, Apr 7, 2016 at 2:41 AM, Brett Cannon <brett at python.org> wrote:

>
>
> On Wed, 6 Apr 2016 at 10:36 Michel Desmoulin <desmoulinmichel at gmail.com>
> wrote:
>
>> Wouldn't it be better to generalize that to a "__location__" protocol,
>> which would allow returning any kind of location, including a path, URL,
>> coordinate, ip_address, etc.?
>>
>
> No because all of those things have different semantic meaning. See the
> __index__ PEP for reasons why you would tightly bound protocols instead of
> overloading ones like __int__ for multiple meanings.
>
> -Brett
>

https://www.python.org/dev/peps/pep-0357/

> It is not possible to use the nb_int (and __int__ special method)
> for this purpose because that method is used to *coerce* objects
> to integers.

I feel that adding a protocol only for paths is a bit over-engineered, so
I'm -0.5 on adding __fspath__.

I'm +1 on adding a general protocol for *coercing to string*, like __index__.
+0.5 on inheriting from str (and dropping bytes path support).
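
A minimal sketch of the __int__ / __index__ distinction that PEP 357 draws,
which is the model for a narrowly scoped protocol here. The class names
Celsius and Slot are hypothetical and exist only for illustration.

```python
import operator


class Celsius:
    """Hypothetical type: convertible to int on request, but not an integer."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __int__(self):
        return int(self.degrees)


class Slot:
    """Hypothetical type: really is an integer, so it opts in via __index__."""

    def __init__(self, n):
        self.n = n

    def __index__(self):
        return self.n


items = ["a", "b", "c"]
picked = items[Slot(1)]        # list indexing consults __index__
coerced = int(Celsius(21.7))   # explicit coercion consults __int__

try:
    items[Celsius(1)]          # no __index__, so indexing refuses it
    rejected = False
except TypeError:
    rejected = True
```

The point of the split is that only objects that explicitly declare "I am
an integer" are accepted implicitly; a path protocol would presumably play
the same role for "I am a file system path".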

-- 
INADA Naoki  <songofacandy at gmail.com>

From songofacandy at gmail.com  Thu Apr  7 03:04:28 2016
From: songofacandy at gmail.com (INADA Naoki)
Date: Thu, 7 Apr 2016 16:04:28 +0900
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAEfz+TxeO7NxcA53sY9P9v_TEDakqr7T2oWgpdeDRkuVXzQ5vw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <570548DD.7080108@gmail.com>
 <CAP1=2W7JJSqTv9gx2ZZztXwsdV4u6gLRNBQknFi4GmZgwxpJew@mail.gmail.com>
 <CAEfz+TxeO7NxcA53sY9P9v_TEDakqr7T2oWgpdeDRkuVXzQ5vw@mail.gmail.com>
Message-ID: <CAEfz+TxD2+H0w-pkaDZ413MGwpE-Bj2VUHdcD4qxoxUm6=2kbg@mail.gmail.com>

FYI, Ruby's Pathname class doesn't inherit from String.

http://ruby-doc.org/stdlib-2.1.0/libdoc/pathname/rdoc/Pathname.html

Ruby has two "convert to string" methods.
`.to_s` is like `__str__`.
`.to_str` is like `__index__`, but for str.  It is used for implicit
conversion.

File.open accepts any object that implements `.to_str`.

From g.brandl at gmx.net  Thu Apr  7 03:19:17 2016
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 7 Apr 2016 09:19:17 +0200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
Message-ID: <ne51lr$go0$1@ger.gmane.org>

On 04/06/2016 07:26 PM, Brett Cannon wrote:
> With Ethan volunteering to do the work to help make a path protocol a thing --
> and I'm willing to help along with propagating this through the stdlib where I
> think Serhiy might be interested in helping as well -- and a seeming consensus
> this is a good idea, it seems like this proposal has a chance of actually coming
> to fruition.
> 
> Now we need clear details. :) Some open questions are:

Throwing in my 2 bikesheds here, not having read all subthreads:

>  1. Name: __path__, __fspath__, or something else?

__path__ is already taken as a module attribute, so I would avoid it.
__fspath__ is fine with me, although the more explicit variants are also
ok.  It's not like you need to read/write it constantly (that's the goal).

>  2. Method or attribute? (changes what kind of one-liner you might use in
>     libraries, but I think historically all protocols have been methods and the
>     serialized string representation might be costly to build)

An attribute would be somewhat inconsistent with the special-method lookup rules
(looked up on the type, not the instance), so a method is probably a better
choice.
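
A small sketch of the lookup rule referred to above. The classes AttrStyle
and MethodStyle and the helper lookup_on_type are hypothetical; the helper
only mimics how implicit special-method lookup consults the type and skips
the instance.

```python
class AttrStyle:
    """Hypothetical: exposes the protocol as a plain instance attribute."""

    def __init__(self, text):
        self.__fspath__ = text


class MethodStyle:
    """Hypothetical: exposes the protocol as a method defined on the class."""

    def __init__(self, text):
        self._text = text

    def __fspath__(self):
        return self._text


def lookup_on_type(obj):
    """Mimic special-method lookup: ask type(obj), ignore the instance dict."""
    return getattr(type(obj), "__fspath__", None)


class Sized:
    def __len__(self):
        return 3


s = Sized()
s.__len__ = lambda: 99   # instance-level override; len() does not see it
```

With type-based lookup the attribute form is simply invisible, while the
method form is found; len(s) likewise keeps using the method on the class.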

>  3. Built-in? (name is dependent on #1 if we add one)

I don't think it warrants a builtin.  I'd place it as a function in pathlib.

>  4. Add the method/attribute to str? (I assume so, much like __index__() is on
>     int, but I have not seen it explicitly stated so I would rather clarify it)

+1.

>  5. Expand the C API to have something like PyObject_Path()?

+1 (with _Py_ at first) since you're going to need it in a lot of C functions.

Georg



From p.f.moore at gmail.com  Thu Apr  7 03:59:14 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 7 Apr 2016 08:59:14 +0100
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
Message-ID: <CACac1F8uBfs+HOqGHAGiAx0J5UDCuSPZ+juMNXEYiXN+-Ccjug@mail.gmail.com>

On 6 April 2016 at 23:46, Brett Cannon <brett at python.org> wrote:
> str(path) will definitely work, path.__path__ will work if you're running
> the next set of bugfix releases. fspath(path) will only work in Python 3.6
> and newer.

Ah, that was something I hadn't appreciated, that the builtin would be
3.6+ whereas the protocol would be added to current bugfix releases.

>> Maybe a compatibility library could
>> add
>>
>> try:
>>     fspath
>> except NameError:
>>     try:
>>         import pathlib
>>         def fspath(p):
>>             if isinstance(p, pathlib.Path):
>>                 return str(p)
>>             return p
>>     except ImportError:
>>         def fspath(p):
>>             return p
>>
>> It's messy, like all compatibility code, but it allows code to use
>> fspath(p) in older versions.
>
>
> I would tweak it to check for __fspath__ before it resorted to calling
> str(), but yes, that could be something people use.

Yeah, the above code assumes that if the builtin isn't available, nor
will the protocol be (see my misunderstanding above).

Paul

From Nikolaus at rath.org  Thu Apr  7 06:48:28 2016
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Thu, 07 Apr 2016 12:48:28 +0200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <5705FEBF.3070301@stoneleaf.us> (Ethan Furman's message of "Wed, 
 06 Apr 2016 23:31:27 -0700")
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid>
 <5705FB0C.2090705@canterbury.ac.nz> <5705FEBF.3070301@stoneleaf.us>
Message-ID: <87oa9l3dab.fsf@thinkpad.rath.org>

On Apr 06 2016, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/06/2016 11:15 PM, Greg Ewing wrote:
>> Chris Barker - NOAA Federal wrote:
>>> But fspath(), if it exists, would call __fspath__ on an arbitrary
>>> object, and create a new string -- not a new Path. That may be
>>> confusing...
>>
>> Maybe something like fspathstr/__fspathstr__ would be better?
>
> As someone already said, we don't need to embed the type in the name.
>
> The point of the __os_path__ protocol is to return the serialized
> version of the Path the object represents.  This would be somewhat
> similar to the various __reduce*__ protocols (which I thought had
> something to do with adding until I learned what they were for).

Does anyone anticipate any classes other than those from pathlib to come
with such a method?

It seems odd to me to introduce a special method (and potentially a
builtin too) if it's only going to be used by a single module.

Why is:

path = getattr(obj, '__fspath__') if hasattr(obj, '__fspath__') else obj

better than

path = str(obj) if isinstance(obj, pathlib.Path) else obj

?

Yes, I know there are other pathlib-like modules out there. But isn't
pathlib meant to replace them?

Best,
Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

From donald at stufft.io  Thu Apr  7 07:03:56 2016
From: donald at stufft.io (Donald Stufft)
Date: Thu, 7 Apr 2016 07:03:56 -0400
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <87oa9l3dab.fsf@thinkpad.rath.org>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid>
 <5705FB0C.2090705@canterbury.ac.nz> <5705FEBF.3070301@stoneleaf.us>
 <87oa9l3dab.fsf@thinkpad.rath.org>
Message-ID: <12E898F1-0A5D-4504-9E7F-08A509BCAEEB@stufft.io>


> On Apr 7, 2016, at 6:48 AM, Nikolaus Rath <Nikolaus at rath.org> wrote:
> 
> Does anyone anticipate any classes other than those from pathlib to come
> with such a method?


It seems like it would be reasonable for pathlib.Path to call fspath on the
path passed to pathlib.Path.__init__, which would mean that if other libraries
implemented __fspath__ then you could pass their path objects to pathlib and
it would just work (and similarly, if they also called fspath it would enable
interoperation between all of the various path libraries).
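
A sketch of that interoperation, written against the form the protocol
eventually took (os.fspath and os.PathLike, so it assumes Python 3.6 or
newer). BucketPath is a hypothetical third-party class, not taken from any
real library.

```python
import os
import pathlib


class BucketPath:
    """Hypothetical third-party path type that only implements __fspath__."""

    def __init__(self, *parts):
        self._parts = parts

    def __fspath__(self):
        return "/".join(self._parts)


foreign = BucketPath("data", "raw", "log.txt")

# pathlib accepts the foreign object because it calls the protocol itself.
native = pathlib.PurePosixPath(foreign)

# Any other consumer can do the same conversion explicitly.
as_text = os.fspath(foreign)
```

Neither library needs to import or know about the other; the shared
protocol is the only point of contact.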

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From p.f.moore at gmail.com  Thu Apr  7 07:05:43 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 7 Apr 2016 12:05:43 +0100
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <87oa9l3dab.fsf@thinkpad.rath.org>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid>
 <5705FB0C.2090705@canterbury.ac.nz> <5705FEBF.3070301@stoneleaf.us>
 <87oa9l3dab.fsf@thinkpad.rath.org>
Message-ID: <CACac1F-orLN1Ehsfm5EtMQcV14Mx2j=U3Zz9kgG2fqPhXwzV3A@mail.gmail.com>

On 7 April 2016 at 11:48, Nikolaus Rath <Nikolaus at rath.org> wrote:
> Why is:
>
> path = getattr(obj, '__fspath__') if hasattr(obj, '__fspath__') else obj
>
> better than
>
> path = str(obj) if isinstance(obj, pathlib.Path) else obj

One reason is that the former doesn't need you to import pathlib,
which is good if you need to work with older versions of Python that
don't have pathlib at all (yes, it's just some standard conditional
import boilerplate, but it's additional messiness).

Paul

From rosuav at gmail.com  Thu Apr  7 08:11:34 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 7 Apr 2016 22:11:34 +1000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <57059F7B.3090901@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us>
Message-ID: <CAPTjJmoU=94BG=Ne0VcDogHqsk7km2sHP_REzf72ivQjBtT_1Q@mail.gmail.com>

On Thu, Apr 7, 2016 at 9:44 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> Excellent!  Narrowing the field then to:
>
> __fspath__
>
> __os_path__
>
>
> Step right up!  Cast yer votes!

+0.9 for __fspath__; I'd prefer a one-word name, but with __path__ out
of the running (which I agree with), there's no other obvious word.
__fspath__ is a close second.

-1 for __os_path__, unless it's reasonable to justify it as "most of
the standard library uses Path objects, but os.path uses strings, so
before you pass a Path to anything in os.path, you call path.ospath()
on it, which calls __os_path__()". And that seems a bit hairy and
roundabout; what it's _really_ doing is giving you back a string, and
that has little to do with os.path.

ChrisA

From ericsnowcurrently at gmail.com  Thu Apr  7 10:21:34 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 7 Apr 2016 08:21:34 -0600
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <ne51lr$go0$1@ger.gmane.org>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <ne51lr$go0$1@ger.gmane.org>
Message-ID: <CALFfu7C4CvNnKw7wBEfp2A6atD8YeQgriY1VjwV7k_0+sgQHvg@mail.gmail.com>

On Apr 7, 2016 1:22 AM, "Georg Brandl" <g.brandl at gmx.net> wrote:
>
> On 04/06/2016 07:26 PM, Brett Cannon wrote:
> >  1. Name: __path__, __fspath__, or something else?
>
> __path__ is already taken as a module attribute, so I would avoid it.
> __fspath__ is fine with me, although the more explicit variants are also
> ok.  It's not like you need to read/write it constantly (that's the goal).

+1

I also think that __ospath__ may be more correct since it is an
OS-dependent representation, e.g. slash vs. backslash.

>
> >  2. Method or attribute? (changes what kind of one-liner you might use in
> >     libraries, but I think historically all protocols have been methods and the
> >     serialized string representation might be costly to build)
>
> An attribute would be somewhat inconsistent with the special-method lookup rules
> (looked up on the type, not the instance), so a method is probably a better
> choice.

I was just about to point this out.  The deviation by pickle (lookup on
instance rather than type) has been a source of pain.

>
> >  3. Built-in? (name is dependent on #1 if we add one)
>
> I don't think it warrants a builtin.  I'd place it as a function in pathlib.

+1

>
> >  4. Add the method/attribute to str? (I assume so, much like __index__() is on
> >     int, but I have not seen it explicitly stated so I would rather clarify it)
>
> +1.

+1

>
> >  5. Expand the C API to have something like PyObject_Path()?
>
> +1 (with _Py_ at first) since you're going to need it in a lot of C functions.

+1

-eric

From jimjjewett at gmail.com  Thu Apr  7 10:25:43 2016
From: jimjjewett at gmail.com (Jim J. Jewett)
Date: Thu, 7 Apr 2016 10:25:43 -0400
Subject: [Python-Dev] pathlib (was: Defining a path protocol)
Message-ID: <CA+OGgf659yEKH8_hfgZ2DiKcr8D7SGySzvvDfjo=ZwEk7t396g@mail.gmail.com>

(1)  I think the "built-in" should instead be a module-level function
in pathlib.  If you aren't already expecting pathlib paths, then
you're just expecting strings to work anyhow, and a builtin isn't
likely to be helpful.

(2)  I prefer that the function be explicit about the fact that it is
downcasting the representation to a string.  e.g.,
pathlib.path_as_string(my_path)

But if the final result is ospath or fspath or ... I won't fight too
hard, particularly since the output may be a bytestring rather than a
str.

-jJ

From ericsnowcurrently at gmail.com  Thu Apr  7 10:40:37 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 7 Apr 2016 08:40:37 -0600
Subject: [Python-Dev] Other pathlib improvements? was: When should pathlib
 stop being provisional?
In-Reply-To: <CALFfu7C+OcKuRXzKMwCSyt-dCYWcBEMfpM4eHv3X46ybXCoDsA@mail.gmail.com>
References: <CALFfu7Dh5kYK_LZO8AQ=fgFzh_rP_EOEOEak-19J=B4FkuJmHw@mail.gmail.com>
 <CALFfu7Bw9mMhXJB5x5kLj2CW8yGxO9aZ2Lw=aNYyHhOqKkPFcA@mail.gmail.com>
 <CALFfu7BLVu0tWFojpdw8kR5QmK7z+0SOqHU9_Lg2Ka-Wmz0B2A@mail.gmail.com>
 <CALFfu7C8TUM3qrzUt+RCqRzPDVQ+Amt6fDZMdXwzfSNT59g=xw@mail.gmail.com>
 <CALFfu7DBtbmUwGkQJJY82_v6J6D-VUWHm_RQcmO=m6FJNRCVzw@mail.gmail.com>
 <CALFfu7A3xtcMD7=FjQT157kfkXb-YE8xzPP6irDHLGsM0VpP7A@mail.gmail.com>
 <CALFfu7CzAJ-YVDaZgtANYdOhJkSQvA8VgUKY_OdfDJ3kfcttRg@mail.gmail.com>
 <CALFfu7AUatBvQCQ8XZZ6AA7CHBqY58pFtV3shHCSYbzjC7VkKg@mail.gmail.com>
 <CALFfu7CjfFVW_xR9bB3Q=2fVb8pQwvWeuEjftEFERZWOXrS-gA@mail.gmail.com>
 <CALFfu7DskArGjmgpfskYqnk6c3dW-71HP1yq7Q=EbESoCyz_bg@mail.gmail.com>
 <CALFfu7CEhj4ZO2fctVqysVu=wCqGRCkLV7-qxeVVqpm+BmrVoA@mail.gmail.com>
 <CALFfu7DfSYME89gcMyd35HMoEjOrVU2+=686C+6ixwsggHwZkA@mail.gmail.com>
 <CALFfu7C+OcKuRXzKMwCSyt-dCYWcBEMfpM4eHv3X46ybXCoDsA@mail.gmail.com>
Message-ID: <CALFfu7AnqCcQhpAhgHgHzEZOgSKMSqU0zP5Bz2TkV_GooQTpHA@mail.gmail.com>

On Apr 6, 2016 11:11 PM, "Raymond Hettinger" <raymond.hettinger at gmail.com> wrote:
> Having worked through the API when it is first released, I find it to be
> highly forgettable (i.e. I have to re-read the docs each time I've
> revisited it).

Agreed, though it's arguably better than argparse, logging, unittest, or
several other stdlib modules.  To some extent the challenge with those is
the complexity of the problem space.  Furthermore, the key for any
sufficiently complex module is that the common-case usage is intuitive and
simple enough.  Some stdlib modules do a better job of that than others.
:/  How much would you say that any of that applies to pathlib?  What about
relative to other similar packages on the cheeseshop?

Regardless, are there any specific improvements you'd recommend while the
module is still provisional?  Are your concerns a matter of structure vs.
naming?  Usability vs. (intuitive) discoverability?

-eric

From p.f.moore at gmail.com  Thu Apr  7 11:18:55 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 7 Apr 2016 16:18:55 +0100
Subject: [Python-Dev] Other pathlib improvements? was: When should
 pathlib stop being provisional?
In-Reply-To: <CALFfu7AnqCcQhpAhgHgHzEZOgSKMSqU0zP5Bz2TkV_GooQTpHA@mail.gmail.com>
References: <CALFfu7Dh5kYK_LZO8AQ=fgFzh_rP_EOEOEak-19J=B4FkuJmHw@mail.gmail.com>
 <CALFfu7Bw9mMhXJB5x5kLj2CW8yGxO9aZ2Lw=aNYyHhOqKkPFcA@mail.gmail.com>
 <CALFfu7BLVu0tWFojpdw8kR5QmK7z+0SOqHU9_Lg2Ka-Wmz0B2A@mail.gmail.com>
 <CALFfu7C8TUM3qrzUt+RCqRzPDVQ+Amt6fDZMdXwzfSNT59g=xw@mail.gmail.com>
 <CALFfu7DBtbmUwGkQJJY82_v6J6D-VUWHm_RQcmO=m6FJNRCVzw@mail.gmail.com>
 <CALFfu7A3xtcMD7=FjQT157kfkXb-YE8xzPP6irDHLGsM0VpP7A@mail.gmail.com>
 <CALFfu7CzAJ-YVDaZgtANYdOhJkSQvA8VgUKY_OdfDJ3kfcttRg@mail.gmail.com>
 <CALFfu7AUatBvQCQ8XZZ6AA7CHBqY58pFtV3shHCSYbzjC7VkKg@mail.gmail.com>
 <CALFfu7CjfFVW_xR9bB3Q=2fVb8pQwvWeuEjftEFERZWOXrS-gA@mail.gmail.com>
 <CALFfu7DskArGjmgpfskYqnk6c3dW-71HP1yq7Q=EbESoCyz_bg@mail.gmail.com>
 <CALFfu7CEhj4ZO2fctVqysVu=wCqGRCkLV7-qxeVVqpm+BmrVoA@mail.gmail.com>
 <CALFfu7DfSYME89gcMyd35HMoEjOrVU2+=686C+6ixwsggHwZkA@mail.gmail.com>
 <CALFfu7C+OcKuRXzKMwCSyt-dCYWcBEMfpM4eHv3X46ybXCoDsA@mail.gmail.com>
 <CALFfu7AnqCcQhpAhgHgHzEZOgSKMSqU0zP5Bz2TkV_GooQTpHA@mail.gmail.com>
Message-ID: <CACac1F8+pemBk2nO2m2qPZHSZm1P89_jYA2-=qK3=WGnVPB_=g@mail.gmail.com>

On 7 April 2016 at 15:40, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> On Apr 6, 2016 11:11 PM, "Raymond Hettinger" <raymond.hettinger at gmail.com>
> wrote:
>> Having worked through the API when it is first released, I find it to be
>> highly forgettable (i.e. I have to re-read the docs each time I've revisited
>> it).
>
> Agreed, though it's arguably better than argparse, logging, unittest, or
> several other stdlib modules.  To some extent the challenge with those is
> the complexity of the problem space.  Furthermore, the key for any
> sufficiently complex module is that the common-case usage is intuitive and
> simple enough.  Some stdlib modules do a better job of that than others. :/
> How much would you say that any of that applies to pathlib?  What about
> relative to other similar packages on the cheeseshop?

Personally, the main issue I have with remembering pathlib method
names is the inconsistency with the existing modules. I always have
to check that it's path.is_dir() compared to os.path.isdir(pathstr).
And it's os.path.dirname(pathstr) vs path.parent. On the other hand,
the consistency between path.parent (for the immediate parent) and
path.parents (for the sequence of parents) is useful, so it's not
clear cut.
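
For reference, the pairs mentioned above side by side. This is only an
illustrative sketch; the path /srv/app/pkg/mod.py is made up, and the pure
POSIX flavour is used so the results do not depend on the platform.

```python
import os
import pathlib
import posixpath

pathstr = "/srv/app/pkg/mod.py"              # made-up example path
path = pathlib.PurePosixPath(pathstr)

# os.path.dirname(pathstr)  vs  path.parent
parent_old = posixpath.dirname(pathstr)
parent_new = str(path.parent)

# path.parents is the whole chain of ancestors, nearest first
ancestors = [str(p) for p in path.parents]

# os.path.isdir(pathstr)  vs  path.is_dir()
here = os.getcwd()
isdir_old = os.path.isdir(here)
isdir_new = pathlib.Path(here).is_dir()
```

The behaviour matches in each pair; only the spelling differs, which is
the inconsistency being described.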

There's nothing fundamentally *wrong* with the pathlib method names,
but there's no obvious reason why they needed to change.

I'll get used to them. It's just one more stumbling block that makes
me feel like it's a bit too hard to bother, and I go back to os.path.

Would I change the names? I honestly don't know. If os.path was going
to disappear, then no - the inconsistency is a short term problem. But
even if there's a major switch to pathlib, I expect os.path to remain
indefinitely, and that inconsistency will be a wart that we'll have to
live with for a long time.

Paul

From ethan at stoneleaf.us  Thu Apr  7 11:33:13 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 07 Apr 2016 08:33:13 -0700
Subject: [Python-Dev] Other pathlib improvements? was: When should
 pathlib stop being provisional?
In-Reply-To: <CACac1F8+pemBk2nO2m2qPZHSZm1P89_jYA2-=qK3=WGnVPB_=g@mail.gmail.com>
References: <CALFfu7Dh5kYK_LZO8AQ=fgFzh_rP_EOEOEak-19J=B4FkuJmHw@mail.gmail.com>
 <CALFfu7BLVu0tWFojpdw8kR5QmK7z+0SOqHU9_Lg2Ka-Wmz0B2A@mail.gmail.com>
 <CALFfu7C8TUM3qrzUt+RCqRzPDVQ+Amt6fDZMdXwzfSNT59g=xw@mail.gmail.com>
 <CALFfu7DBtbmUwGkQJJY82_v6J6D-VUWHm_RQcmO=m6FJNRCVzw@mail.gmail.com>
 <CALFfu7A3xtcMD7=FjQT157kfkXb-YE8xzPP6irDHLGsM0VpP7A@mail.gmail.com>
 <CALFfu7CzAJ-YVDaZgtANYdOhJkSQvA8VgUKY_OdfDJ3kfcttRg@mail.gmail.com>
 <CALFfu7AUatBvQCQ8XZZ6AA7CHBqY58pFtV3shHCSYbzjC7VkKg@mail.gmail.com>
 <CALFfu7CjfFVW_xR9bB3Q=2fVb8pQwvWeuEjftEFERZWOXrS-gA@mail.gmail.com>
 <CALFfu7DskArGjmgpfskYqnk6c3dW-71HP1yq7Q=EbESoCyz_bg@mail.gmail.com>
 <CALFfu7CEhj4ZO2fctVqysVu=wCqGRCkLV7-qxeVVqpm+BmrVoA@mail.gmail.com>
 <CALFfu7DfSYME89gcMyd35HMoEjOrVU2+=686C+6ixwsggHwZkA@mail.gmail.com>
 <CALFfu7C+OcKuRXzKMwCSyt-dCYWcBEMfpM4eHv3X46ybXCoDsA@mail.gmail.com>
 <CALFfu7AnqCcQhpAhgHgHzEZOgSKMSqU0zP5Bz2TkV_GooQTpHA@mail.gmail.com>
 <CACac1F8+pemBk2nO2m2qPZHSZm1P89_jYA2-=qK3=WGnVPB_=g@mail.gmail.com>
Message-ID: <57067DB9.7060804@stoneleaf.us>

On 04/07/2016 08:18 AM, Paul Moore wrote:
> On 7 April 2016 at 15:40, Eric Snow  wrote:
>> On Apr 6, 2016 11:11 PM, "Raymond Hettinger" wrote:

>>> Having worked through the API when it is first released, I find it to be
>>> highly forgettable (i.e. I have to re-read the docs each time I've revisited
>>> it).
>>
>> Agreed, though it's arguably better than argparse, logging, unittest, or
>> several other stdlib modules.

> Personally, the main issue I have with remembering pathlib method
> names, is the inconsistency with the existing modules.

That is one of the things I really dislike.  If the behaviour is the
same as the os version, it should have the same name.  I also have no
problem with new names that make more sense so long as an alias exists
for the os version (can even be deprecated without removal).

> Would I change the names? I honestly don't know. If os.path was going
> to disappear, then no - the inconsistency is a short term problem. But
> even if there's a major switch to pathlib, I expect os.path to remain
> indefinitely, and that inconsistency will be a wart that we'll have to
> live with for a long time.

os.path isn't going anywhere.

--
~Ethan~


From desmoulinmichel at gmail.com  Thu Apr  7 06:50:42 2016
From: desmoulinmichel at gmail.com (Michel Desmoulin)
Date: Thu, 7 Apr 2016 12:50:42 +0200
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <570575C9.7060208@mail.de>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <5704909E.8070908@stoneleaf.us>
 <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>
 <570575C9.7060208@mail.de>
Message-ID: <57063B82.3090307@gmail.com>



On 06/04/2016 at 22:47, Sven R. Kunze wrote:
> On 06.04.2016 07:00, Guido van Rossum wrote:
>> On Tue, Apr 5, 2016 at 9:29 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>>> [...] we can't do:
>>>
>>>      app_root = Path(...)
>>>      config = app_root/'settings.cfg'
>>>      with open(config) as blah:
>>>          # whatever
>>>
>>> It feels like instead of addressing this basic disconnect, the answer
>>> has
>>> instead been:  add that to pathlib!  Which works great -- until a
>>> user or a
>>> library gets this path object and tries to use something from os on it.
>> I agree that asking for config.open() isn't the right answer here
>> (even if it happens to work).
> 
> How come?
> 
>> But in this example, once 3.5.2 is out,
>> the solution would be to use open(config.path), and that will also
>> work when passing it to a library. Is it still unacceptable then?
> 
> I think so. Although in this example I would prefer the shorter
> config.open alternative as I am lazy.
> 
> 
> I still cannot remember what the concrete issue was why we dropped
> pathlib the same day we gave it a try. It was something really stupid
> and although I hoped to reduce the size of the code, it was less
> readable. But it was not the path->str issue but something more mundane.
> It was something that forced us to use os[.path] as Path didn't provide
> something equivalent. Cannot remember.....

Path objects don't have splitext() and don't allow "string" / path.
Those are the ones bugging me the most.
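
A short sketch of the closest pathlib spellings, for comparison. There is
indeed no splitext() method, but suffix, stem and with_suffix() cover the
same ground. Reflected division with a plain string on the left does seem
to be supported, via __rtruediv__ on PurePath, so the second complaint may
not hold. The file name docs/report.tar.gz is made up.

```python
import pathlib
import posixpath

p = pathlib.PurePosixPath("docs/report.tar.gz")   # made-up example path

# os.path-style split into (root, extension)
root, ext = posixpath.splitext(str(p))

# pathlib equivalents: the last suffix, the name without it, and a stripped path
suffix = p.suffix
stem = p.stem
stripped = p.with_suffix("")

# A plain string on the left-hand side of "/" is handled by __rtruediv__.
joined = "base" / pathlib.PurePosixPath("sub")
```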

> 
> 
> Best,
> Sven

From projetmbc at gmail.com  Thu Apr  7 01:24:42 2016
From: projetmbc at gmail.com (Christophe Bal)
Date: Thu, 7 Apr 2016 07:24:42 +0200
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAAb4jGmT4Te2-wvM4osAjeS6ut1xZ-CrevRPf-=Bung5MYekVw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CACac1F-aKFjmXdaO8+BSuTbNdp_XkYKbWXA3LZnc0bifNjye7w@mail.gmail.com>
 <CAAb4jG=K1SsEe=AhJVPycjZDZW4fNt12Hvt7eAEou68e3awOxg@mail.gmail.com>
 <CAAb4jGmT4Te2-wvM4osAjeS6ut1xZ-CrevRPf-=Bung5MYekVw@mail.gmail.com>
Message-ID: <CAAb4jGngN1ey7TJ-JiDt1A7rm_L_rzOmEjRn5ZVhUAX0=qozYA@mail.gmail.com>

As a simple user, I find that pathlib simplifies working with paths. A lot of
things are easy to do. For example, path / "subfile" is so useful.

I also have a subclass of pathlib.Path on GitHub that makes searching for
files and directories easy.

So keep pathlib alive!
On 6 Apr 2016 13:06, "Paul Moore" <p.f.moore at gmail.com> wrote:

On 6 April 2016 at 00:45, Guido van Rossum <guido at python.org> wrote:
> This does sound like it's the crucial issue, and it is worth writing
> up clearly the pros and cons. Let's draft those lists in a thread
> (this one's fine) and then add them to the PEP. We can then decide to:
>
> - keep the status quo
> - change PurePath to inherit from str
> - decide it's never going to be settled and kill pathlib.py
>
> (And yes, I'm dead serious about the latter, rather Solomonic option.)

By the way, even if there's no solution that satisfies everyone to the
"inherit from str" question, I'd still be unhappy if pathlib
disappeared from the stdlib. It's useful for quick admin scripts that
don't justify an external dependency. Those typically do quite a bit
of path manipulation, and as such benefit from the improved API of
pathlib over os.path.

+1 on making (and documenting) a final decision on the "inherit from
str" question
-1 on removing pathlib just because that decision might not satisfy everyone

Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20160407/120627cd/attachment.html>

From chris.barker at noaa.gov  Thu Apr  7 11:44:12 2016
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Thu, 7 Apr 2016 08:44:12 -0700
Subject: [Python-Dev] Other pathlib improvements? was: When should
 pathlib stop being provisional?
In-Reply-To: <57067DB9.7060804@stoneleaf.us>
References: <CALFfu7Dh5kYK_LZO8AQ=fgFzh_rP_EOEOEak-19J=B4FkuJmHw@mail.gmail.com>
 <CALFfu7BLVu0tWFojpdw8kR5QmK7z+0SOqHU9_Lg2Ka-Wmz0B2A@mail.gmail.com>
 <CALFfu7C8TUM3qrzUt+RCqRzPDVQ+Amt6fDZMdXwzfSNT59g=xw@mail.gmail.com>
 <CALFfu7DBtbmUwGkQJJY82_v6J6D-VUWHm_RQcmO=m6FJNRCVzw@mail.gmail.com>
 <CALFfu7A3xtcMD7=FjQT157kfkXb-YE8xzPP6irDHLGsM0VpP7A@mail.gmail.com>
 <CALFfu7CzAJ-YVDaZgtANYdOhJkSQvA8VgUKY_OdfDJ3kfcttRg@mail.gmail.com>
 <CALFfu7AUatBvQCQ8XZZ6AA7CHBqY58pFtV3shHCSYbzjC7VkKg@mail.gmail.com>
 <CALFfu7CjfFVW_xR9bB3Q=2fVb8pQwvWeuEjftEFERZWOXrS-gA@mail.gmail.com>
 <CALFfu7DskArGjmgpfskYqnk6c3dW-71HP1yq7Q=EbESoCyz_bg@mail.gmail.com>
 <CALFfu7CEhj4ZO2fctVqysVu=wCqGRCkLV7-qxeVVqpm+BmrVoA@mail.gmail.com>
 <CALFfu7DfSYME89gcMyd35HMoEjOrVU2+=686C+6ixwsggHwZkA@mail.gmail.com>
 <CALFfu7C+OcKuRXzKMwCSyt-dCYWcBEMfpM4eHv3X46ybXCoDsA@mail.gmail.com>
 <CALFfu7AnqCcQhpAhgHgHzEZOgSKMSqU0zP5Bz2TkV_GooQTpHA@mail.gmail.com>
 <CACac1F8+pemBk2nO2m2qPZHSZm1P89_jYA2-=qK3=WGnVPB_=g@mail.gmail.com>
 <57067DB9.7060804@stoneleaf.us>
Message-ID: <-4168706305295605305@unknownmsgid>

>> Personally, the main issue I have with remembering pathlib method
>> names, is the inconsistency with the existing modules.

Was this *really*  not brought up when this was introduced? Oh well.

We could add aliases, but I think it's not such a big deal. I'm
convinced that the largest barrier to adoption has been that it can't
be used with the stdlib. And I think the discussion on Python-ideas
supports that.

That, and py2 compatibility. There is a backport on PyPI, but it
can't be used with the stdlib, either. Not sure what to do about
that--maybe it should inherit from unicode?

-CHB


> That is one of the things I really dislike.  If the behaviour is the same as the os version, it should have the same name.  I also have no problem with new names that makes more sense so long as an alias exists for the os version (can even be deprecated without removal).
>
>> Would I change the names? I honestly don't know. If os.path was going
>> to disappear, then no - the inconsistency is a short term problem. But
>> even if there's a major switch to pathlib, I expect os.path to remain
>> indefinitely, and that inconsistency will be a wart that we'll have to
>> live with for a long time.
>
> os.path isn't going anywhere.
>
> --
> ~Ethan~
>

From ethan at stoneleaf.us  Thu Apr  7 11:47:56 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 07 Apr 2016 08:47:56 -0700
Subject: [Python-Dev] Other pathlib improvements? was: When should
 pathlib stop being provisional?
In-Reply-To: <-4168706305295605305@unknownmsgid>
References: <CALFfu7Dh5kYK_LZO8AQ=fgFzh_rP_EOEOEak-19J=B4FkuJmHw@mail.gmail.com>
 <CALFfu7C8TUM3qrzUt+RCqRzPDVQ+Amt6fDZMdXwzfSNT59g=xw@mail.gmail.com>
 <CALFfu7DBtbmUwGkQJJY82_v6J6D-VUWHm_RQcmO=m6FJNRCVzw@mail.gmail.com>
 <CALFfu7A3xtcMD7=FjQT157kfkXb-YE8xzPP6irDHLGsM0VpP7A@mail.gmail.com>
 <CALFfu7CzAJ-YVDaZgtANYdOhJkSQvA8VgUKY_OdfDJ3kfcttRg@mail.gmail.com>
 <CALFfu7AUatBvQCQ8XZZ6AA7CHBqY58pFtV3shHCSYbzjC7VkKg@mail.gmail.com>
 <CALFfu7CjfFVW_xR9bB3Q=2fVb8pQwvWeuEjftEFERZWOXrS-gA@mail.gmail.com>
 <CALFfu7DskArGjmgpfskYqnk6c3dW-71HP1yq7Q=EbESoCyz_bg@mail.gmail.com>
 <CALFfu7CEhj4ZO2fctVqysVu=wCqGRCkLV7-qxeVVqpm+BmrVoA@mail.gmail.com>
 <CALFfu7DfSYME89gcMyd35HMoEjOrVU2+=686C+6ixwsggHwZkA@mail.gmail.com>
 <CALFfu7C+OcKuRXzKMwCSyt-dCYWcBEMfpM4eHv3X46ybXCoDsA@mail.gmail.com>
 <CALFfu7AnqCcQhpAhgHgHzEZOgSKMSqU0zP5Bz2TkV_GooQTpHA@mail.gmail.com>
 <CACac1F8+pemBk2nO2m2qPZHSZm1P89_jYA2-=qK3=WGnVPB_=g@mail.gmail.com>
 <57067DB9.7060804@stoneleaf.us> <-4168706305295605305@unknownmsgid>
Message-ID: <5706812C.6060507@stoneleaf.us>

On 04/07/2016 08:44 AM, Chris Barker - NOAA Federal wrote:

> We could add aliases, but I think it's not such a big deal. I'm
> convinced that the largest barrier to adoption has been that it can't
> be used with the stdlib. And I think the discussion on Python-ideas
> supports that.

Lack of interoperability is a huge issue; using different but similar 
names is still an issue.

> That, and py2 compatibility. There is a backport on PyPI, but it
> can't be used with the stdlib, either. Not sure what to do about
> that--maybe it should inherit from unicode?

Also huge, and agree it (the backport) should inherit from unicode.

--
~Ethan~

From ethan at stoneleaf.us  Thu Apr  7 11:52:11 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 07 Apr 2016 08:52:11 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <57063B82.3090307@gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <5704909E.8070908@stoneleaf.us>
 <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>
 <570575C9.7060208@mail.de> <57063B82.3090307@gmail.com>
Message-ID: <5706822B.1010801@stoneleaf.us>

On 04/07/2016 03:50 AM, Michel Desmoulin wrote:

> Path objects don't have splitext() and don't allow "string" / path.
> Those are the ones bugging me the most.

--> Path('README.md')

--> p = Path('README.md')   # PosixPath('README.md')

--> '/home/ethan' / p  # PosixPath('/home/ethan/README.md')

--> p.splitext()
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
AttributeError: 'PosixPath' object has no attribute 'splitext'

So, yeah, no .splitext()
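
That said, the existing attributes appear to cover the same ground. A
minimal sketch, assuming a hypothetical helper named splitext_path (it is
not part of pathlib) and using PurePosixPath so the result does not depend
on the platform:

```python
from pathlib import PurePosixPath


def splitext_path(path):
    """Hypothetical helper: split a path into (root, extension).

    Intended to mirror os.path.splitext() for ordinary file names, but
    the root comes back as a path object rather than a string.
    """
    p = PurePosixPath(path)
    # with_suffix('') drops only the last suffix; .suffix is that suffix.
    return p.with_suffix(''), p.suffix


root, ext = splitext_path('/home/ethan/README.md')
```

Here root should be PurePosixPath('/home/ethan/README') and ext should be
'.md', which is what os.path.splitext() returns for the same string.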

--
~Ethan~

From zachary.ware+pydev at gmail.com  Thu Apr  7 12:13:22 2016
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Thu, 7 Apr 2016 11:13:22 -0500
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <57063B82.3090307@gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <5704909E.8070908@stoneleaf.us>
 <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>
 <570575C9.7060208@mail.de> <57063B82.3090307@gmail.com>
Message-ID: <CAKJDb-OMDWggv9A=9NhD-j+pQpRcM7uo_Vm_ij+MgLDtczNhbQ@mail.gmail.com>

On Thu, Apr 7, 2016 at 5:50 AM, Michel Desmoulin
<desmoulinmichel at gmail.com> wrote:
> Path objects don't have splitext() and don't allow "string" / path.
> Those are the ones bugging me the most.

>>> import pathlib
>>> p = '/some/test' / pathlib.Path('path') / 'file_with.ext'
>>> p
PosixPath('/some/test/path/file_with.ext')
>>> p.parent, p.stem, p.suffix
(PosixPath('/some/test/path'), 'file_with', '.ext')


-- 
Zach

From chris.barker at noaa.gov  Thu Apr  7 12:50:49 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 7 Apr 2016 09:50:49 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <57063B82.3090307@gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <5704909E.8070908@stoneleaf.us>
 <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>
 <570575C9.7060208@mail.de> <57063B82.3090307@gmail.com>
Message-ID: <CALGmxEKm3eac3DFhDa2AROrw_Ga-CW5S+K0yPgmfpmQ7N98pmg@mail.gmail.com>

On Thu, Apr 7, 2016 at 3:50 AM, Michel Desmoulin <desmoulinmichel at gmail.com>
wrote:
>
> Path objects don't have splitext()


that is useful -- let's add it. (and others if need be)

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From chris.barker at noaa.gov  Thu Apr  7 12:56:21 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 7 Apr 2016 09:56:21 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAEfz+TxeO7NxcA53sY9P9v_TEDakqr7T2oWgpdeDRkuVXzQ5vw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <570548DD.7080108@gmail.com>
 <CAP1=2W7JJSqTv9gx2ZZztXwsdV4u6gLRNBQknFi4GmZgwxpJew@mail.gmail.com>
 <CAEfz+TxeO7NxcA53sY9P9v_TEDakqr7T2oWgpdeDRkuVXzQ5vw@mail.gmail.com>
Message-ID: <CALGmxELxqWbenwmLvuCrLC7YD8K0TW6pR-0oBYmyODdRB+8ovg@mail.gmail.com>

On Thu, Apr 7, 2016 at 12:00 AM, INADA Naoki <songofacandy at gmail.com> wrote:

>
> I feel adding a protocol only for paths is a bit of over-engineering. So I'm
> -0.5 on adding __fspath__.
>
> I'm +1 on adding a general protocol for *coerce to string* like __index__.
>

isn't __str__ the protocol for "coerce to string" ?

__index__ is a protocol for "coerce to an integer that can be used as an
index", which is like __fspath__ would be "coerce to a string that can be
used as a path"

the whole point is that __str__ will "work" with virtually anything --
whether it can reasonably be used as a path or not. I'm not sure that's a
problem, but if it is, then that's what this new protocol is trying to
solve, just like __index__ enforces that only things that are intended to
be used as indexes will work.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From chris.barker at noaa.gov  Thu Apr  7 12:59:22 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 7 Apr 2016 09:59:22 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <12E898F1-0A5D-4504-9E7F-08A509BCAEEB@stufft.io>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid>
 <5705FB0C.2090705@canterbury.ac.nz> <5705FEBF.3070301@stoneleaf.us>
 <87oa9l3dab.fsf@thinkpad.rath.org>
 <12E898F1-0A5D-4504-9E7F-08A509BCAEEB@stufft.io>
Message-ID: <CALGmxEKBSmvGWEJ2udCBrge252jS0K6p2_pkMssXo0cN2t99yw@mail.gmail.com>

On Thu, Apr 7, 2016 at 4:03 AM, Donald Stufft <donald at stufft.io> wrote:


> It seems like it would be reasonable for pathlib.Path to call fspath on the
> path passed to pathlib.Path.__init__, which would mean that if other
> libraries
> implemented __fspath__ then you could pass their path objects to pathlib
> and
> it would just work


and then any lib that needed a path, could simply wrap Path() around
whatever was passed in.

This is much like using np.array() if you want numpy arrays -- it works
great.

numpy is trickier because they are mutable and can be big, so you don't
want to make a copy if you don't need to -- hence the np.asarray() function
-- but Paths are immutable and far more lightweight.
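
A sketch of that pattern, assuming a hypothetical library function called
load_name (the name and behaviour are illustrative only). It wraps whatever
it receives in a path class and works with path objects from then on.
PurePosixPath is used here only to keep the example platform-independent;
real code would more likely use pathlib.Path:

```python
from pathlib import PurePosixPath


def load_name(path):
    """Hypothetical library function that accepts a str or a path object."""
    # Wrapping is cheap, and harmless if the caller already passed a path.
    path = PurePosixPath(path)
    # From here on the function can rely on the path API.
    return path.stem


from_str = load_name('/etc/app/config.ini')
from_path = load_name(PurePosixPath('/etc/app/config.ini'))
```

Both calls go through the same code path, so the caller does not need to
care which type it holds.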

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From antoine at python.org  Thu Apr  7 13:02:11 2016
From: antoine at python.org (Antoine Pitrou)
Date: Thu, 7 Apr 2016 19:02:11 +0200
Subject: [Python-Dev] Other pathlib improvements? was: When should
 pathlib stop being provisional?
In-Reply-To: <CALFfu7AnqCcQhpAhgHgHzEZOgSKMSqU0zP5Bz2TkV_GooQTpHA@mail.gmail.com>
References: <CALFfu7Dh5kYK_LZO8AQ=fgFzh_rP_EOEOEak-19J=B4FkuJmHw@mail.gmail.com>
 <CALFfu7Bw9mMhXJB5x5kLj2CW8yGxO9aZ2Lw=aNYyHhOqKkPFcA@mail.gmail.com>
 <CALFfu7BLVu0tWFojpdw8kR5QmK7z+0SOqHU9_Lg2Ka-Wmz0B2A@mail.gmail.com>
 <CALFfu7C8TUM3qrzUt+RCqRzPDVQ+Amt6fDZMdXwzfSNT59g=xw@mail.gmail.com>
 <CALFfu7DBtbmUwGkQJJY82_v6J6D-VUWHm_RQcmO=m6FJNRCVzw@mail.gmail.com>
 <CALFfu7A3xtcMD7=FjQT157kfkXb-YE8xzPP6irDHLGsM0VpP7A@mail.gmail.com>
 <CALFfu7CzAJ-YVDaZgtANYdOhJkSQvA8VgUKY_OdfDJ3kfcttRg@mail.gmail.com>
 <CALFfu7AUatBvQCQ8XZZ6AA7CHBqY58pFtV3shHCSYbzjC7VkKg@mail.gmail.com>
 <CALFfu7CjfFVW_xR9bB3Q=2fVb8pQwvWeuEjftEFERZWOXrS-gA@mail.gmail.com>
 <CALFfu7DskArGjmgpfskYqnk6c3dW-71HP1yq7Q=EbESoCyz_bg@mail.gmail.com>
 <CALFfu7CEhj4ZO2fctVqysVu=wCqGRCkLV7-qxeVVqpm+BmrVoA@mail.gmail.com>
 <CALFfu7DfSYME89gcMyd35HMoEjOrVU2+=686C+6ixwsggHwZkA@mail.gmail.com>
 <CALFfu7C+OcKuRXzKMwCSyt-dCYWcBEMfpM4eHv3X46ybXCoDsA@mail.gmail.com>
 <CALFfu7AnqCcQhpAhgHgHzEZOgSKMSqU0zP5Bz2TkV_GooQTpHA@mail.gmail.com>
Message-ID: <57069293.9040800@python.org>


On 07/04/2016 16:40, Eric Snow wrote:
>
> On Apr 6, 2016 11:11 PM, "Raymond Hettinger"
> <raymond.hettinger at gmail.com> wrote:
>> Having worked through the API when it was first released, I find it to
>> be highly forgettable (i.e. I have to re-read the docs each time I've
>> revisited it).
> 
> Agreed, though it's arguably better than argparse, logging, unittest, or
> several other stdlib modules.  To some extent the challenge with those
> is the complexity of the problem space.  Furthermore, the key for any
> sufficiently complex module is that the common-case usage is intuitive
> and simple enough.

This is terribly unspecific as far as criticism goes. "Highly
forgettable" depends on who you ask.  I tend to find unittest and
logging quite useful myself, even if I have to look at the docs from
time to time (and I'm certainly not the only one).

I don't think you'll find an API that doesn't need any learning or
getting used to, unless it's simply copying another API.
os.path is extremely forgettable as well, but after years of getting
used to it people may feel it's "natural".  Put Python in the hands of a
non-Python programmer, they will find many things bizarre and
uncomfortable compared to their language of choice...

Regards

Antoine.

From desmoulinmichel at gmail.com  Thu Apr  7 14:19:04 2016
From: desmoulinmichel at gmail.com (Michel Desmoulin)
Date: Thu, 7 Apr 2016 20:19:04 +0200
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <CAKJDb-OMDWggv9A=9NhD-j+pQpRcM7uo_Vm_ij+MgLDtczNhbQ@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <5704909E.8070908@stoneleaf.us>
 <CAP7+vJLZu2vdyU2CPQ2Kq+koY=LDgwdkDyc1pe5+bBhUHmn0UA@mail.gmail.com>
 <570575C9.7060208@mail.de> <57063B82.3090307@gmail.com>
 <CAKJDb-OMDWggv9A=9NhD-j+pQpRcM7uo_Vm_ij+MgLDtczNhbQ@mail.gmail.com>
Message-ID: <5706A498.90507@gmail.com>

Fair enough, I stand corrected for both points.

On 07/04/2016 18:13, Zachary Ware wrote:
> On Thu, Apr 7, 2016 at 5:50 AM, Michel Desmoulin
> <desmoulinmichel at gmail.com> wrote:
>> Path objects don't have splitext() and don't allow "string" / path.
>> Those are the ones bugging me the most.
> 
>>>> import pathlib
>>>> p = '/some/test' / pathlib.Path('path') / 'file_with.ext'
>>>> p
> PosixPath('/some/test/path/file_with.ext')
>>>> p.parent, p.stem, p.suffix
> (PosixPath('/some/test/path'), 'file_with', '.ext')
> 
> 

From njs at pobox.com  Thu Apr  7 14:44:30 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 7 Apr 2016 11:44:30 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CALGmxELxqWbenwmLvuCrLC7YD8K0TW6pR-0oBYmyODdRB+8ovg@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <570548DD.7080108@gmail.com>
 <CAP1=2W7JJSqTv9gx2ZZztXwsdV4u6gLRNBQknFi4GmZgwxpJew@mail.gmail.com>
 <CAEfz+TxeO7NxcA53sY9P9v_TEDakqr7T2oWgpdeDRkuVXzQ5vw@mail.gmail.com>
 <CALGmxELxqWbenwmLvuCrLC7YD8K0TW6pR-0oBYmyODdRB+8ovg@mail.gmail.com>
Message-ID: <CAPJVwBkZvDd5EALAkozx7y+8q_wdgmyjKgnuf8was-=bQcRGcg@mail.gmail.com>

On Apr 7, 2016 10:00 AM, "Chris Barker" <chris.barker at noaa.gov> wrote:
>
> On Thu, Apr 7, 2016 at 12:00 AM, INADA Naoki <songofacandy at gmail.com> wrote:
>>
>> I feel adding a protocol only for paths is a bit of over-engineering. So I'm
>> -0.5 on adding __fspath__.
>>
>> I'm +1 on adding a general protocol for *coerce to string* like __index__.
>
> isn't __str__ the protocol for "coerce to string" ?
>
> __index__ is a protocol for "coerce to an integer that can be used as an
> index", which is like __fspath__ would be "coerce to a string that can be
> used as a path"

No, __index__ is the protocol for "do a safe coerce to integer". The name
is misleading, but its use in non-indexing contexts is well established.
E.g.

" ab" * obj

will return a string with obj.__index__() repetitions.
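
A small sketch of that behaviour, assuming a hypothetical class named Three
that defines nothing but __index__. The interpreter accepts it for sequence
repetition as well as for indexing and slicing:

```python
class Three:
    """Hypothetical integer-like object that only defines __index__."""

    def __index__(self):
        return 3


n = Three()
repeated = " ab" * n      # sequence repetition calls n.__index__()
item = "abcdef"[n]        # plain indexing does too
head = "abcdef"[:n]       # and so do slice bounds
```

Note that Three defines neither __int__ nor __mul__, so the repetition
works purely through the __index__ hook.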

-n

From chris.barker at noaa.gov  Thu Apr  7 15:03:31 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 7 Apr 2016 12:03:31 -0700
Subject: [Python-Dev] Other pathlib improvements? was: When should
 pathlib stop being provisional?
In-Reply-To: <57069293.9040800@python.org>
References: <CALFfu7Dh5kYK_LZO8AQ=fgFzh_rP_EOEOEak-19J=B4FkuJmHw@mail.gmail.com>
 <CALFfu7Bw9mMhXJB5x5kLj2CW8yGxO9aZ2Lw=aNYyHhOqKkPFcA@mail.gmail.com>
 <CALFfu7BLVu0tWFojpdw8kR5QmK7z+0SOqHU9_Lg2Ka-Wmz0B2A@mail.gmail.com>
 <CALFfu7C8TUM3qrzUt+RCqRzPDVQ+Amt6fDZMdXwzfSNT59g=xw@mail.gmail.com>
 <CALFfu7DBtbmUwGkQJJY82_v6J6D-VUWHm_RQcmO=m6FJNRCVzw@mail.gmail.com>
 <CALFfu7A3xtcMD7=FjQT157kfkXb-YE8xzPP6irDHLGsM0VpP7A@mail.gmail.com>
 <CALFfu7CzAJ-YVDaZgtANYdOhJkSQvA8VgUKY_OdfDJ3kfcttRg@mail.gmail.com>
 <CALFfu7AUatBvQCQ8XZZ6AA7CHBqY58pFtV3shHCSYbzjC7VkKg@mail.gmail.com>
 <CALFfu7CjfFVW_xR9bB3Q=2fVb8pQwvWeuEjftEFERZWOXrS-gA@mail.gmail.com>
 <CALFfu7DskArGjmgpfskYqnk6c3dW-71HP1yq7Q=EbESoCyz_bg@mail.gmail.com>
 <CALFfu7CEhj4ZO2fctVqysVu=wCqGRCkLV7-qxeVVqpm+BmrVoA@mail.gmail.com>
 <CALFfu7DfSYME89gcMyd35HMoEjOrVU2+=686C+6ixwsggHwZkA@mail.gmail.com>
 <CALFfu7C+OcKuRXzKMwCSyt-dCYWcBEMfpM4eHv3X46ybXCoDsA@mail.gmail.com>
 <CALFfu7AnqCcQhpAhgHgHzEZOgSKMSqU0zP5Bz2TkV_GooQTpHA@mail.gmail.com>
 <57069293.9040800@python.org>
Message-ID: <CALGmxELgRof6vTzr8Uoa8bp4QO-pvx3+_d+7RP56_wVmdP-nKQ@mail.gmail.com>

On Thu, Apr 7, 2016 at 10:02 AM, Antoine Pitrou <antoine at python.org> wrote:

> >> Having worked through the API when it was first released, I find it to
> >> be highly forgettable
>


> This is terribly unspecific as far as criticism goes. "Highly
> forgettable" depends on who you ask.


Exactly -- for my part, I need to look up most of os.path every time I use
it....

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From chris.barker at noaa.gov  Thu Apr  7 15:06:19 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 7 Apr 2016 12:06:19 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAPJVwBkZvDd5EALAkozx7y+8q_wdgmyjKgnuf8was-=bQcRGcg@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <570548DD.7080108@gmail.com>
 <CAP1=2W7JJSqTv9gx2ZZztXwsdV4u6gLRNBQknFi4GmZgwxpJew@mail.gmail.com>
 <CAEfz+TxeO7NxcA53sY9P9v_TEDakqr7T2oWgpdeDRkuVXzQ5vw@mail.gmail.com>
 <CALGmxELxqWbenwmLvuCrLC7YD8K0TW6pR-0oBYmyODdRB+8ovg@mail.gmail.com>
 <CAPJVwBkZvDd5EALAkozx7y+8q_wdgmyjKgnuf8was-=bQcRGcg@mail.gmail.com>
Message-ID: <CALGmxEKWMuu6O7o67agbRvmd9yfC=xbHOJn0M_L0F9T+wgMC1w@mail.gmail.com>

On Thu, Apr 7, 2016 at 11:44 AM, Nathaniel Smith <njs at pobox.com> wrote:

> No, __index__ is the protocol for "do a safe coerce to integer". The name
> is misleading, but its use in non-indexing contexts is well established.
> E.g.
>
> " ab" * obj
>
> will return a string with obj.__index__() repetitions.
>
A good argument for Chris A's proposal over on python-ideas to have a
dunder method for "coerce to a lossless string", that could be used for
Path, but also for who knows what else?

As I see it, exactly the same as the __fspath__ idea, except that we'd use
a name that made it clear you might want to use it for other things (and
str would grow that method...)

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From greg.ewing at canterbury.ac.nz  Fri Apr  8 01:59:43 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 08 Apr 2016 17:59:43 +1200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAPTjJmoU=94BG=Ne0VcDogHqsk7km2sHP_REzf72ivQjBtT_1Q@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us>
 <CAPTjJmoU=94BG=Ne0VcDogHqsk7km2sHP_REzf72ivQjBtT_1Q@mail.gmail.com>
Message-ID: <570748CF.3090503@canterbury.ac.nz>

Chris Angelico wrote:
> -1 for __os_path__, unless it's reasonable to justify it as "most of
> the standard library uses Path objects, but os.path uses strings, so
> before you pass a Path to anything in os.path, you call path.ospath()
> on it, which calls __os_path__()".

A less roundabout interpretation would be that it returns
the path in a form that is directly acceptable to the OS.

BTW, if __fspath__ is acceptable, __ospath__ (without the
embedded _) should be as well.

-- 
Greg

From ethan at stoneleaf.us  Fri Apr  8 02:27:28 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 07 Apr 2016 23:27:28 -0700
Subject: [Python-Dev] summary: a Path protocol
Message-ID: <57074F50.7080407@stoneleaf.us>

The discussion has ranged all over, so let me try to sum up:


Name:

   __ospath__


Method or attribute?

   Method (implementations are of course free to pre-build and/or
   cache the value)


Built-in?

   no, rather a function in pathlib - ospath()


Add the method/attribute to str?

   Not necessary -- but if somebody else wants to do that part I
   am not opposed


Expand the C API to have something like PyObject_Path()?

   Yes - and if I understood correctly this function will do the
   same as pathlib.ospath(), just at the C level?  And what will
   its name be, exactly?

--
~Ethan~

From victor.stinner at gmail.com  Fri Apr  8 02:35:54 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 8 Apr 2016 08:35:54 +0200
Subject: [Python-Dev] summary: a Path protocol
In-Reply-To: <57074F50.7080407@stoneleaf.us>
References: <57074F50.7080407@stoneleaf.us>
Message-ID: <CAMpsgwa3UAbb42xow8xd96gXL8bgtXROXuhUACNwsvD-cp5Ofw@mail.gmail.com>

Sorry, I don't have time to read the whole discussion. What is the problem
with adding a __str__ to pathlib?

Victor

From rosuav at gmail.com  Fri Apr  8 02:57:55 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 8 Apr 2016 16:57:55 +1000
Subject: [Python-Dev] summary: a Path protocol
In-Reply-To: <CAMpsgwa3UAbb42xow8xd96gXL8bgtXROXuhUACNwsvD-cp5Ofw@mail.gmail.com>
References: <57074F50.7080407@stoneleaf.us>
 <CAMpsgwa3UAbb42xow8xd96gXL8bgtXROXuhUACNwsvD-cp5Ofw@mail.gmail.com>
Message-ID: <CAPTjJmoxcQ6crs6LfUmXUJ=j2-3_p4ikXzVGEK1x8iEcL4iCgA@mail.gmail.com>

On Fri, Apr 8, 2016 at 4:35 PM, Victor Stinner <victor.stinner at gmail.com> wrote:
> Sorry, I don't have time to read the whole discussion. What is the problem
> with adding a __str__ to pathlib?
>
> Victor

Everything else has __str__ too, so you run the risk of open(["Hello",
"World"], "w") working and doing something weird. Or of passing an
open file object to something that was expecting a file name, and
having *that* work too. Calling str(p) on something that ought to be
either a Path or a string should raise an exception if given something
else.
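
To make the difference concrete, here is a rough sketch of a strict
conversion helper. Everything in it is an assumption for illustration: the
function name fspath_sketch and the class DemoPath are made up, and
__fspath__ is only one of the method names floated in this thread.

```python
def fspath_sketch(obj):
    """Hypothetical helper: return a str path or raise TypeError."""
    if isinstance(obj, str):
        return obj
    # Look the method up on the type, as is usual for special methods.
    method = getattr(type(obj), '__fspath__', None)
    if method is None:
        raise TypeError('expected str or path-like object, not '
                        + type(obj).__name__)
    result = method(obj)
    if not isinstance(result, str):
        raise TypeError('__fspath__ must return str')
    return result


class DemoPath:
    """Hypothetical path-like class that opts in to the protocol."""

    def __init__(self, text):
        self._text = text

    def __fspath__(self):
        return self._text


# str() happily converts a list, which is exactly the problem:
silently_wrong = str(["Hello", "World"])
# The strict helper accepts only strings and objects that opt in:
ok = fspath_sketch(DemoPath("some/file.txt"))
```

Passing the same list to fspath_sketch() raises TypeError instead of
producing a nonsense file name.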

ChrisA

From ncoghlan at gmail.com  Fri Apr  8 05:50:04 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 8 Apr 2016 19:50:04 +1000
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
Message-ID: <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>

On 7 April 2016 at 03:26, Brett Cannon <brett at python.org> wrote:
> With Ethan volunteering to do the work to help make a path protocol a thing
> -- and I'm willing to help along with propagating this through the stdlib
> where I think Serhiy might be interested in helping as well -- and a seeming
> consensus this is a good idea, it seems like this proposal has a chance of
> actually coming to fruition.
>
> Now we need clear details. :) Some open questions are:
>
> Name: __path__, __fspath__, or something else?

__fspath__

> Method or attribute? (changes what kind of one-liner you might use in
> libraries, but I think historically all protocols have been methods and the
> serialized string representation might be costly to build)

Method, as long as there's a helper function somewhere

> Built-in? (name is dependent on #1 if we add one)

os.fspath (alongside os.fsencode and os.fsdecode)

(Putting this in a module low in the dependency stack makes it easy
for other modules to access without pulling in all of pathlib's
dependencies)

> Add the method/attribute to str? (I assume so, much like __index__() is on
> int, but I have not seen it explicitly stated so I would rather clarify it)

Makes sense

> Expand the C API to have something like PyObject_Path()?

PyUnicode_FromFSPath, perhaps? The return type is well-defined here,
so it can be done as an alternate constructor, and the C API
counterparts of os.fsdecode and os.fsencode are PyUnicode functions
(specifically PyUnicode_DecodeFSDefault and PyUnicode_EncodeFSDefault)

> Some people have asked for the pathlib PEP to have a more fleshed out
> reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't
> want to do it I can try to distill my blog post into a more succinct
> paragraph or two and update the PEP myself.
>
> Is this going to require a PEP or if we can agree on the points here are we
> just going to do it? If we think it requires a PEP I'm willing to write it,
> but I obviously have no issue if we skip that step either. :)

It's worth summarising in a PEP at least for communications purposes -
very easy for folks that don't follow python-dev to miss otherwise.
Plus my specific API suggestions are pretty different from Ethan's :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From victor.stinner at gmail.com  Fri Apr  8 09:31:49 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 8 Apr 2016 15:31:49 +0200
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
Message-ID: <CAMpsgwZ79SzOcZ=wLfspdVn8JHPp4YkckJgp1G8Qq-D_1z4ztQ@mail.gmail.com>

Please write a new PEP.

The topic seems to have been discussed for many months by many different
people on different mailing lists. A PEP is a good standard way to reach a
decision, and it has become clear that a decision must be made for pathlib.

Victor

From victor.stinner at gmail.com  Fri Apr  8 09:45:36 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 8 Apr 2016 15:45:36 +0200
Subject: [Python-Dev] Other pathlib improvements? was: When should
 pathlib stop being provisional?
In-Reply-To: <57069293.9040800@python.org>
References: <CALFfu7Dh5kYK_LZO8AQ=fgFzh_rP_EOEOEak-19J=B4FkuJmHw@mail.gmail.com>
 <CALFfu7Bw9mMhXJB5x5kLj2CW8yGxO9aZ2Lw=aNYyHhOqKkPFcA@mail.gmail.com>
 <CALFfu7BLVu0tWFojpdw8kR5QmK7z+0SOqHU9_Lg2Ka-Wmz0B2A@mail.gmail.com>
 <CALFfu7C8TUM3qrzUt+RCqRzPDVQ+Amt6fDZMdXwzfSNT59g=xw@mail.gmail.com>
 <CALFfu7DBtbmUwGkQJJY82_v6J6D-VUWHm_RQcmO=m6FJNRCVzw@mail.gmail.com>
 <CALFfu7A3xtcMD7=FjQT157kfkXb-YE8xzPP6irDHLGsM0VpP7A@mail.gmail.com>
 <CALFfu7CzAJ-YVDaZgtANYdOhJkSQvA8VgUKY_OdfDJ3kfcttRg@mail.gmail.com>
 <CALFfu7AUatBvQCQ8XZZ6AA7CHBqY58pFtV3shHCSYbzjC7VkKg@mail.gmail.com>
 <CALFfu7CjfFVW_xR9bB3Q=2fVb8pQwvWeuEjftEFERZWOXrS-gA@mail.gmail.com>
 <CALFfu7DskArGjmgpfskYqnk6c3dW-71HP1yq7Q=EbESoCyz_bg@mail.gmail.com>
 <CALFfu7CEhj4ZO2fctVqysVu=wCqGRCkLV7-qxeVVqpm+BmrVoA@mail.gmail.com>
 <CALFfu7DfSYME89gcMyd35HMoEjOrVU2+=686C+6ixwsggHwZkA@mail.gmail.com>
 <CALFfu7C+OcKuRXzKMwCSyt-dCYWcBEMfpM4eHv3X46ybXCoDsA@mail.gmail.com>
 <CALFfu7AnqCcQhpAhgHgHzEZOgSKMSqU0zP5Bz2TkV_GooQTpHA@mail.gmail.com>
 <57069293.9040800@python.org>
Message-ID: <CAMpsgwYnDGKxJ2dYpX2hoN9Mf+xL2iReOAgHMaEegGQ9jijYtg@mail.gmail.com>

FYI, the documentation page for the built-in functions is #1 in the
docs.python.org statistics.

I also read this page every week, even though I consider that I know Python
well. IMHO it's not an issue to read the docs regularly.

Victor

From victor.stinner at gmail.com  Fri Apr  8 09:56:04 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 8 Apr 2016 15:56:04 +0200
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
Message-ID: <CAMpsgwYNf0J9vgCD6ip9ffCh9npcJddEPKJBw=kSK0u-93D00A@mail.gmail.com>

I like __fspath__ because it looks like os.fsencode() and os.fsdecode().

Please no builtin function, we have enough of them, but make sure that the
__fspath__ is accepted in all functions expecting a filename.

If you consider that a function would make your change simpler, I suggest
to add os.fspath():

if isinstance(obj, str): return obj
try: return obj.__fspath__
except AttributeError: raise TypeError(...)
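
A slightly fuller sketch of the helper outlined above, for illustration
only. It assumes __fspath__ ends up being a method (the question was still
open in this thread), and the names fspath_sketch and DemoPath are
hypothetical, not part of any agreed API:

```python
def fspath_sketch(obj):
    """Return a str path for obj, or raise TypeError (hypothetical helper)."""
    if isinstance(obj, str):
        # Plain strings pass through unchanged.
        return obj
    try:
        # Look the method up on the type, as is usual for protocol methods.
        method = type(obj).__fspath__
    except AttributeError:
        raise TypeError("expected str or object with __fspath__, not "
                        + type(obj).__name__) from None
    return method(obj)


class DemoPath:
    """Hypothetical path-like class, used only to exercise the helper."""

    def __init__(self, text):
        self._text = text

    def __fspath__(self):
        return self._text
```

With this shape, a function that expects a filename would call
fspath_sketch(arg) once and then work with a plain str; an open file object
or a list would be rejected with TypeError instead of being silently
stringified.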

Victor

From jon+python-dev at unequivocal.co.uk  Fri Apr  8 10:18:47 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Fri, 8 Apr 2016 15:18:47 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode
 revisited)
Message-ID: <20160408141847.GQ4951@unequivocal.co.uk>

I've made another attempt at Python sandboxing, which does something
which I've not seen tried before - using the 'ast' module to do static
analysis of the untrusted code before it's executed, to prevent most
of the sneaky tricks that have been used to break out of past attempts
at sandboxes.

In short, I'm turning Python's usual "gentleman's agreement" that you
should not access names and attributes that are indicated as private
by starting with an underscore into a rigidly enforced rule: try and
access anything starting with an underscore and your code will not be
run.

Anyway the code is at https://github.com/jribbens/unsafe
It requires Python 3.4 or later (it could probably be made to work on
Python 2.7 as well, but it would need some changes).

I would be very interested to see if anyone can manage to break it.
Bugs which are trivially fixable are of course welcomed, but the real
question is: is this approach basically sound, or is it fundamentally
unworkable?

From p.f.moore at gmail.com  Fri Apr  8 10:37:45 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 8 Apr 2016 15:37:45 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
Message-ID: <CACac1F-G1aK8nb4PBhz0z-+beGoLQL34ceQQRXr=54dkWGVkBA@mail.gmail.com>

On 8 April 2016 at 15:18, Jon Ribbens <jon+python-dev at unequivocal.co.uk> wrote:
> I would be very interested to see if anyone can manage to break it.
> Bugs which are trivially fixable are of course welcomed, but the real
> question is: is this approach basically sound, or is it fundamentally
> unworkable?

What are the limitations? It seems to even block "import" which seems
over-zealous (no import math?)
Paul

From jon+python-dev at unequivocal.co.uk  Fri Apr  8 10:55:36 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Fri, 8 Apr 2016 15:55:36 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CACac1F-G1aK8nb4PBhz0z-+beGoLQL34ceQQRXr=54dkWGVkBA@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CACac1F-G1aK8nb4PBhz0z-+beGoLQL34ceQQRXr=54dkWGVkBA@mail.gmail.com>
Message-ID: <20160408145536.GA17895@unequivocal.co.uk>

On Fri, Apr 08, 2016 at 03:37:45PM +0100, Paul Moore wrote:
> On 8 April 2016 at 15:18, Jon Ribbens <jon+python-dev at unequivocal.co.uk> wrote:
> > I would be very interested to see if anyone can manage to break it.
> > Bugs which are trivially fixable are of course welcomed, but the real
> > question is: is this approach basically sound, or is it fundamentally
> > unworkable?
> 
> What are the limitations? It seems to even block "import" which seems
> over-zealous (no import math?)

The restrictions are:

  Of the builtins, __import__, compile, globals, input, locals,
  memoryview, open, print, type and vars are unavailable (and
  some of the exceptions, but mostly because they're irrelevant).

  You cannot access any name or attribute which starts with "_",
  or is called "gi_frame" or "gi_code".

  You cannot use the "with" statement (although it's possible it might
  be safe for me to add that back in if I also disallow access to
  attributes called "tb_frame").
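
To make the name and attribute rule above concrete, here is a minimal
sketch of that kind of static check using the 'ast' module. It is not the
code from the repository; find_violations and FORBIDDEN_NAMES are
hypothetical names, and the check covers only plain names and attribute
accesses, nothing else:

```python
import ast

# Extra attribute names rejected in addition to the underscore rule.
FORBIDDEN_NAMES = {"gi_frame", "gi_code"}


def find_violations(source):
    """Return (line, name) pairs for names or attributes that break the rule."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            name = node.id
        elif isinstance(node, ast.Attribute):
            name = node.attr
        else:
            continue
        if name.startswith("_") or name in FORBIDDEN_NAMES:
            violations.append((node.lineno, name))
    return violations
```

A caller would refuse to run the code whenever the returned list is
non-empty. A real checker would need to look at much more than this (for
example function arguments, imports and keyword names).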

Importing modules is fundamentally unsafe because the untrusted code
might alter the module, and the altered version would then be used by
the containing application. My code has a "_copy_module" function
which copies (some of) the contents of modules, so some sort of
import functionality of a white-list of modules could be added using
this, but there's no point in me going through working out which
modules are safe to white-list until I'm vaguely confident that my
approach isn't fundamentally broken in the first place.

From arthur at darcet.fr  Fri Apr  8 11:21:38 2016
From: arthur at darcet.fr (Arthur Darcet)
Date: Fri, 8 Apr 2016 17:21:38 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
Message-ID: <CAOwXji7Ly=O6CJo6kygfJWCJ8Phamnn_-pJMqFHOnNPE7J0f+Q@mail.gmail.com>

On 8 April 2016 at 16:18, Jon Ribbens <jon+python-dev at unequivocal.co.uk>
wrote:

> I've made another attempt at Python sandboxing, which does something
> which I've not seen tried before - using the 'ast' module to do static
> analysis of the untrusted code before it's executed, to prevent most
> of the sneaky tricks that have been used to break out of past attempts
> at sandboxes.
>
> In short, I'm turning Python's usual "gentleman's agreement" that you
> should not access names and attributes that are indicated as private
> by starting with an underscore into a rigidly enforced rule: try and
> access anything starting with an underscore and your code will not be
> run.
>
> Anyway the code is at https://github.com/jribbens/unsafe
> It requires Python 3.4 or later (it could probably be made to work on
> Python 2.7 as well, but it would need some changes).
>
> I would be very interested to see if anyone can manage to break it.
> Bugs which are trivially fixable are of course welcomed, but the real
> question is: is this approach basically sound, or is it fundamentally
> unworkable?
>

If I'm not mistaken, this breaks out:

> exec('open("out", "w").write("a")', {})

because if the second argument of exec does not contain a __builtins__ key,
then a copy of the original builtins module is inserted:
https://docs.python.org/3/library/functions.html#exec
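
A small standalone illustration of the documented behaviour described
above, run outside any sandbox. The variable names are only for the
example:

```python
# An empty globals dict gets a "__builtins__" entry added by exec(),
# so the executed code can still reach every built-in function.
namespace = {}
exec("result = len('abc')", namespace)
builtins_inserted = "__builtins__" in namespace

# If the caller supplies its own "__builtins__", that one is used instead.
restricted = {"__builtins__": {}}
try:
    exec("result = len('abc')", restricted)
    lookup_failed = False
except NameError:
    lookup_failed = True
```

Supplying an empty "__builtins__" mapping only shows which namespace the
lookup goes through; it should not be read as a sufficient sandbox on its
own.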

From ethan at stoneleaf.us  Fri Apr  8 11:33:39 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 08 Apr 2016 08:33:39 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>	<CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>	<57044567.6070308@sdamon.com>	<CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>	<CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>	<CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>	<ne28fo$flu$1@ger.gmane.org>	<CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>	<CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>	<CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>	<570526CE.5080401@stoneleaf.us>	<CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>
Message-ID: <5707CF53.5060502@stoneleaf.us>

On 04/08/2016 02:50 AM, Nick Coghlan wrote:

>> Built-in? (name is dependent on #1 if we add one)
>
> os.fspath (alongside os.fsencode and os.fsdecode)

I like this better.


>> Add the method/attribute to str? (I assume so, much like __index__() is on
>> int, but I have not seen it explicitly stated so I would rather clarify it)
>
> Makes sense

What will this do?  Return a Path or a str?  I don't think we need either.


>> Expand the C API to have something like PyObject_Path()?
>
> PyUnicode_FromFSPath, perhaps? The return type is well-defined here,
> so it can be done as an alternate constructor, and the C API
> counterparts of os.fsdecode and os.fsencode are PyUnicode functions
> (specifically PyUnicode_DecodeFSDefault and PyUnicode_EncodeFSDefault)

So this will do the same thing as os.fspath() at the C level, yes?


> It's worth summarising in a PEP at least for communications purposes -
> very easy for folks that don't follow python-dev to miss otherwise.
> Plus my specific API suggestions are pretty different from Ethan's :)

*sigh*  Okay.

--
~Ethan~


From brett at python.org  Fri Apr  8 11:41:30 2016
From: brett at python.org (Brett Cannon)
Date: Fri, 08 Apr 2016 15:41:30 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <5707CF53.5060502@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>
 <5707CF53.5060502@stoneleaf.us>
Message-ID: <CAP1=2W4kkCVJHrHZ6XhKukGAWVQencE=Dg-pbmNUPgTSgFaAkA@mail.gmail.com>

On Fri, 8 Apr 2016 at 08:33 Ethan Furman <ethan at stoneleaf.us> wrote:

> On 04/08/2016 02:50 AM, Nick Coghlan wrote:
>
> >> Built-in? (name is dependent on #1 if we add one)
> >
> > os.fspath (alongside os.fsencode and os.fsdecode)
>
> I like this better.
>
>
> >> Add the method/attribute to str? (I assume so, much like __index__() is on
> >> int, but I have not seen it explicitly stated so I would rather clarify it)
> >
> > Makes sense
>
> What will this do?  Return a Path or a str?  I don't think we need either.
>

When I brought this up it was to return self.


>
>
> >> Expand the C API to have something like PyObject_Path()?
> >
> > PyUnicode_FromFSPath, perhaps? The return type is well-defined here,
> > so it can be done as an alternate constructor, and the C API
> > counterparts of os.fsdecode and os.fsencode are PyUnicode functions
> > (specifically PyUnicode_DecodeFSDefault and PyUnicode_EncodeFSDefault)
>
> So this will do the same thing as os.fspath() at the C level, yes?
>

Yes.


>
>
> > It's worth summarising in a PEP at least for communications purposes -
> > very easy for folks that don't follow python-dev to miss otherwise.
> > Plus my specific API suggestions are pretty different from Ethan's :)
>
> *sigh*  Okay
>

Chris Angelico and I have been asked by Guido to work together to come up
with a proposal after all the discussions are finished and it will most
likely be a patch to the pathlib PEP.

From jon+python-dev at unequivocal.co.uk  Fri Apr  8 11:44:15 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Fri, 8 Apr 2016 16:44:15 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAOwXji7Ly=O6CJo6kygfJWCJ8Phamnn_-pJMqFHOnNPE7J0f+Q@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAOwXji7Ly=O6CJo6kygfJWCJ8Phamnn_-pJMqFHOnNPE7J0f+Q@mail.gmail.com>
Message-ID: <20160408154415.GB17895@unequivocal.co.uk>

On Fri, Apr 08, 2016 at 05:21:38PM +0200, Arthur Darcet wrote:
>    If I'm not mistaken, this breaks out:
>    > exec('open("out", "w").write("a")', {})
>    because if the second argument of exec does not contain a __builtins__
>    key, then a copy of the original builtins module is inserted:
>    https://docs.python.org/3/library/functions.html#exec

Ah, that's a good point. I did think allowing eval/exec was a bit
ambitious. I've updated it to disallow passing namespace arguments to
them.

From koriakin at 0x04.net  Fri Apr  8 11:49:12 2016
From: koriakin at 0x04.net (Marcin Kościelnicki)
Date: Fri, 8 Apr 2016 17:49:12 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
Message-ID: <5707D2F8.9080901@0x04.net>

On 08/04/16 16:18, Jon Ribbens wrote:
> I've made another attempt at Python sandboxing, which does something
> which I've not seen tried before - using the 'ast' module to do static
> analysis of the untrusted code before it's executed, to prevent most
> of the sneaky tricks that have been used to break out of past attempts
> at sandboxes.
>
> In short, I'm turning Python's usual "gentleman's agreement" that you
> should not access names and attributes that are indicated as private
> by starting with an underscore into a rigidly enforced rule: try and
> access anything starting with an underscore and your code will not be
> run.
>
> Anyway the code is at https://github.com/jribbens/unsafe
> It requires Python 3.4 or later (it could probably be made to work on
> Python 2.7 as well, but it would need some changes).
>
> I would be very interested to see if anyone can manage to break it.
> Bugs which are trivially fixable are of course welcomed, but the real
> question is: is this approach basically sound, or is it fundamentally
> unworkable?

That one is trivially fixable, but here goes:

async def a():
     global c
     c = b.cr_frame.f_back.f_back.f_back

b = a()
b.send(None)
c.f_builtins['print']('broken')

Also, if the point of giving me a subclass of datetime is to prevent 
access to the actual class, that can be circumvented:

 >>> real_datetime = datetime.datetime.mro()[1]
 >>> real_datetime
<class 'datetime.datetime'>

But I'm not sure what good that is.
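
The mro() point can be reproduced without the sandbox. In this sketch
WrappedDatetime is a hypothetical stand-in for the subclass the sandbox
hands out:

```python
import datetime


class WrappedDatetime(datetime.datetime):
    """Hypothetical stand-in for a subclass exposed to untrusted code."""


# mro() has no leading underscore, so a name-based filter does not block it,
# and the second entry of the result is the original base class.
recovered = WrappedDatetime.mro()[1]
```

Here recovered is the very same object as datetime.datetime, so subclassing
alone does not hide the original class.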

From chris.barker at noaa.gov  Fri Apr  8 12:04:23 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 8 Apr 2016 09:04:23 -0700
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>
Message-ID: <CALGmxEK63=dvaSSmEre_TM=uE_Nd=uoOYNodCzLWv5PxQTWE8Q@mail.gmail.com>

On Fri, Apr 8, 2016 at 2:50 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 7 April 2016 at 03:26, Brett Cannon <brett at python.org> wrote:
>


> > Method or attribute? (changes what kind of one-liner you might use in
> > libraries, but I think historically all protocols have been methods and the
> > serialized string representation might be costly to build)
>

couldn't it be a property?


> Method, as long as there's a helper function somewhere


what has the helper function got to do with whether it's a method or
attribute (would we call a property an attribute here?)

> Built-in? (name is dependent on #1 if we add one)
>
> os.fspath (alongside os.fsencode and os.fsdecode)
>
> (Putting this in a module low in the dependency stack makes it easy
> for other modules to access without pulling in all of pathlib's
> dependencies)


I like that -- though still =0.5 on having one at all -- this is only going
to be used by the stdlib and other path-using libraries, not user code --
is it that hard to call obj.__fspath__() ?

> > Add the method/attribute to str? (I assume so, much like __index__() is on
> > int, but I have not seen it explicitly stated so I would rather clarify it)
>

I thought the whole point of all this is that not any old string can be a
path! (whereas any int can be an index). Unless we go with Chris A's
suggestion that this be a more generic lossless string protocol, rather
than just for paths.


> It's worth summarising in a PEP at least for communications purposes -
> very easy for folks that don't follow python-dev to miss otherwise.
>

I'd say add it to the existing pathlib PEP -- along with the extra
discussion of why Path does not inherit from str.

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From status at bugs.python.org  Fri Apr  8 12:08:40 2016
From: status at bugs.python.org (Python tracker)
Date: Fri,  8 Apr 2016 18:08:40 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20160408160840.E1D44568A7@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2016-04-01 - 2016-04-08)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    5477 ( +6)
  closed 32993 (+22)
  total  38470 (+28)

Open issues with patches: 2381 


Issues opened (23)
==================

#26686: email.parser stops parsing headers too soon.
http://bugs.python.org/issue26686  opened by msapiro

#26687: Use Py_RETURN_NONE in sqlite3 module
http://bugs.python.org/issue26687  opened by berker.peksag

#26689: Add `has_flag` method to `distutils.CCompiler`
http://bugs.python.org/issue26689  opened by sylvain.corlay

#26692: cgroups support in multiprocessing
http://bugs.python.org/issue26692  opened by Satrajit Ghosh

#26693: Exception ignored in: <module 'threading' from ...threading.py
http://bugs.python.org/issue26693  opened by skydoom

#26694: Disasembler fall with Key Error while disassemble obfuscated c
http://bugs.python.org/issue26694  opened by pulina

#26695: pickle and _pickle accelerator have different behavior when un
http://bugs.python.org/issue26695  opened by josh.r

#26696: Document collections.abc.ByteString
http://bugs.python.org/issue26696  opened by brett.cannon

#26697: tkFileDialog crash on askopenfilename Python 2.7 64-bit Win7
http://bugs.python.org/issue26697  opened by Eric Johnson

#26698: Tk DPI awareness
http://bugs.python.org/issue26698  opened by westley.martinez

#26699: locale.str docstring is incorrect: "Convert float to integer"
http://bugs.python.org/issue26699  opened by mark.dickinson

#26700: Make digest_size a class variable
http://bugs.python.org/issue26700  opened by rhettinger

#26701: Documentation for int constructor mentions __int__ but not __t
http://bugs.python.org/issue26701  opened by Robert Smallshire2

#26702: A better assert statement
http://bugs.python.org/issue26702  opened by barry

#26703: Socket state corrupts when original socket object goes out of 
http://bugs.python.org/issue26703  opened by JoshN

#26704: unittest.mock.patch: Double patching instance method: Attribut
http://bugs.python.org/issue26704  opened by asottile

#26705: logging.Handler.handleError should be called from logging.Hand
http://bugs.python.org/issue26705  opened by palaviv

#26706: Update OpenSSL version in readme
http://bugs.python.org/issue26706  opened by scw

#26707: plistlib fails to parse bplist with 0x80 UID values
http://bugs.python.org/issue26707  opened by slo.sleuth

#26708: Constify C string pointers in the posix module
http://bugs.python.org/issue26708  opened by serhiy.storchaka

#26710: ConfigParser: Values in DEFAULT section override defaults pass
http://bugs.python.org/issue26710  opened by Marc.Abramowitz

#26711: Fix comparison of plistlib.Data
http://bugs.python.org/issue26711  opened by serhiy.storchaka

#26712: Unify (r)split(), (l/r)strip() method tests
http://bugs.python.org/issue26712  opened by martin.panter



Most recent 15 issues with no replies (15)
==========================================

#26706: Update OpenSSL version in readme
http://bugs.python.org/issue26706

#26700: Make digest_size a class variable
http://bugs.python.org/issue26700

#26699: locale.str docstring is incorrect: "Convert float to integer"
http://bugs.python.org/issue26699

#26697: tkFileDialog crash on askopenfilename Python 2.7 64-bit Win7
http://bugs.python.org/issue26697

#26696: Document collections.abc.ByteString
http://bugs.python.org/issue26696

#26695: pickle and _pickle accelerator have different behavior when un
http://bugs.python.org/issue26695

#26694: Disasembler fall with Key Error while disassemble obfuscated c
http://bugs.python.org/issue26694

#26692: cgroups support in multiprocessing
http://bugs.python.org/issue26692

#26672: regrtest missing in the module name
http://bugs.python.org/issue26672

#26669: time.localtime(float("NaN")) does not raise a ValueError on al
http://bugs.python.org/issue26669

#26667: Update importlib to accept pathlib.Path objects
http://bugs.python.org/issue26667

#26665: pip is not bootstrapped by default on 2.7
http://bugs.python.org/issue26665

#26663: asyncio _UnixWritePipeTransport._close abandons unflushed writ
http://bugs.python.org/issue26663

#26661: python fails to locate system libffi
http://bugs.python.org/issue26661

#26660: tempfile.TemporaryDirectory() cleanup exception on Windows if 
http://bugs.python.org/issue26660



Most recent 15 issues waiting for review (15)
=============================================

#26712: Unify (r)split(), (l/r)strip() method tests
http://bugs.python.org/issue26712

#26711: Fix comparison of plistlib.Data
http://bugs.python.org/issue26711

#26708: Constify C string pointers in the posix module
http://bugs.python.org/issue26708

#26707: plistlib fails to parse bplist with 0x80 UID values
http://bugs.python.org/issue26707

#26706: Update OpenSSL version in readme
http://bugs.python.org/issue26706

#26705: logging.Handler.handleError should be called from logging.Hand
http://bugs.python.org/issue26705

#26704: unittest.mock.patch: Double patching instance method: Attribut
http://bugs.python.org/issue26704

#26689: Add `has_flag` method to `distutils.CCompiler`
http://bugs.python.org/issue26689

#26687: Use Py_RETURN_NONE in sqlite3 module
http://bugs.python.org/issue26687

#26685: Raise errors from socket.close()
http://bugs.python.org/issue26685

#26680: Incorporating float.is_integer into the numeric tower and Deci
http://bugs.python.org/issue26680

#26661: python fails to locate system libffi
http://bugs.python.org/issue26661

#26658: test_os fails when run on Windows ramdisk
http://bugs.python.org/issue26658

#26657: Directory traversal with http.server and SimpleHTTPServer on w
http://bugs.python.org/issue26657

#26651: Deprecate register_adapter() and register_converter() in sqlit
http://bugs.python.org/issue26651



Top 10 most discussed issues (10)
=================================

#26680: Incorporating float.is_integer into the numeric tower and Deci
http://bugs.python.org/issue26680  11 msgs

#26693: Exception ignored in: <module 'threading' from ...threading.py
http://bugs.python.org/issue26693  10 msgs

#23551: IDLE to provide menu link to PIP gui.
http://bugs.python.org/issue23551   9 msgs

#26689: Add `has_flag` method to `distutils.CCompiler`
http://bugs.python.org/issue26689   9 msgs

#26703: Socket state corrupts when original socket object goes out of 
http://bugs.python.org/issue26703   9 msgs

#18844: allow weights in random.choice
http://bugs.python.org/issue18844   8 msgs

#24291: wsgiref.handlers.SimpleHandler truncates large output blobs
http://bugs.python.org/issue24291   7 msgs

#25609: Add a ContextManager ABC and type
http://bugs.python.org/issue25609   7 msgs

#26257: Eliminate buffer_tests.py
http://bugs.python.org/issue26257   7 msgs

#26707: plistlib fails to parse bplist with 0x80 UID values
http://bugs.python.org/issue26707   7 msgs



Issues closed (21)
==================

#6953: readline documentation needs work
http://bugs.python.org/issue6953  closed by martin.panter

#10796: Improve doc for readline.set_completer_delims()
http://bugs.python.org/issue10796  closed by martin.panter

#23371: mimetypes initialization fails on Windows because of TypeError
http://bugs.python.org/issue23371  closed by berker.peksag

#23735: Readline not adjusting width after resize with 6.3
http://bugs.python.org/issue23735  closed by martin.panter

#25951: SSLSocket.sendall() does not return None on success like socke
http://bugs.python.org/issue25951  closed by martin.panter

#25987: collections.abc.Reversible
http://bugs.python.org/issue25987  closed by gvanrossum

#26234: The typing module includes 're' and 'io' in __all__
http://bugs.python.org/issue26234  closed by gvanrossum

#26295: Random failures when running test suite in parallel (-m test -
http://bugs.python.org/issue26295  closed by haypo

#26391: typing: Specialized sub-classes of Generic never call __init__
http://bugs.python.org/issue26391  closed by gvanrossum

#26479: Init documentation typo "may be return" > "may NOT be returned
http://bugs.python.org/issue26479  closed by Samuel Colvin

#26509: asyncio: spurious ConnectionAbortedError logged on Windows
http://bugs.python.org/issue26509  closed by haypo

#26586: Simple enhancement to BaseHTTPRequestHandler
http://bugs.python.org/issue26586  closed by martin.panter

#26671: Clean up path_converter in posixmodule.c
http://bugs.python.org/issue26671  closed by serhiy.storchaka

#26673: Tkinter error when opening IDLE configuration menu
http://bugs.python.org/issue26673  closed by terry.reedy

#26678: Incorrect linking to elements in datetime package
http://bugs.python.org/issue26678  closed by martin.panter

#26679: curses: Descripton of KEY_NPAGE and KEY_PPAGE inverted
http://bugs.python.org/issue26679  closed by berker.peksag

#26688: unittest2 referenced in unittest.mock documentation
http://bugs.python.org/issue26688  closed by berker.peksag

#26690: PyUnicode_Decode breaks when Python / sqlite3 is built with sq
http://bugs.python.org/issue26690  closed by zzzeek

#26691: Update the typing module to match what's in github.com/python/
http://bugs.python.org/issue26691  closed by gvanrossum

#26709: Year 2038 problem in plistlib
http://bugs.python.org/issue26709  closed by serhiy.storchaka

#26713: Change f-literal grammar so that escaping isn't possible or 
http://bugs.python.org/issue26713  closed by r.david.murray

From k7hoven at gmail.com  Fri Apr  8 12:02:04 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Fri, 8 Apr 2016 19:02:04 +0300
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
Message-ID: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>

Nick Coghlan wrote:
> On 7 April 2016 at 03:26, Brett Cannon <br... at python.org> wrote:
>>
>> Name: __path__, __fspath__, or something else?
>
> __fspath__
>

I think I might like this dunder name because it does not clutter the
list of regular methods and attributes, and is perhaps more pythonic.

>> Method or attribute? (changes what kind of one-liner you might use in
>> libraries, but I think historically all protocols have been methods and the
>> serialized string representation might be costly to build)
>
> Method, as long as there's a helper function somewhere

As a further minor benefit of it being a method, it may be easier to
distinguish it from `__path__`, which is an iterable attribute.

>> Built-in? (name is dependent on #1 if we add one)
>
> os.fspath (alongside os.fsencode and os.fsdecode)
>
> (Putting this in a module low in the dependency stack makes it easy
> for other modules to access without pulling in all of pathlib's
> dependencies)

Strong +1 on putting it in os. This should also be implemented in
DirEntry, instances of which are "yielded" by os.scandir.

Also, you have a strong case regarding naming with the 'fs' prefix. It
is also easier to read fspath as f-s-path than it is to read ospath as
o-s-path, because ospath could also be pronounced as a single
(meaningless?) word.

I'm still thinking a little bit about 'pathname', which to me sounds
more like a string than fspath does [1]. It would be nice to have the
string/path distinction especially when pathlib adoption grows larger.
But who knows, maybe somewhere in the far future, no-one will care
much about fspath, fsencode, fsdecode or os.path.

>> Add the method/attribute to str? (I assume so, much like __index__() is on
>> int, but I have not seen it explicitly stated so I would rather clarify it)
>
> Makes sense

If added to str, it should also be added to bytes. But will that then
return str or bytes? See also the next point.

> Expand the C API to have something like PyObject_Path()?
>
> PyUnicode_FromFSPath, perhaps? The return type is well-defined here,
> so it can be done as an alternate constructor, and the C API
> counterparts of os.fsdecode and os.fsencode are PyUnicode functions
> (specifically PyUnicode_DecodeFSDefault and PyUnicode_EncodeFSDefault)

What about DirEntry, which may have a bytes representation? I would
expect the function return type of os.fspath to be Union[str, bytes],
unless bytes pathnames are decoded with surrogate escapes.
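
To make the two options concrete, here is a minimal sketch. The
function names are invented for illustration and are not a proposed
API; os.fsdecode is the existing stdlib function.

```python
import os


def fspath_either(raw):
    """Option 1 (hypothetical): pass str or bytes through unchanged,
    so the return type is Union[str, bytes]."""
    if isinstance(raw, (str, bytes)):
        return raw
    raise TypeError('expected str or bytes, got ' + type(raw).__name__)


def fspath_always_str(raw):
    """Option 2 (hypothetical): always hand back str."""
    # os.fsdecode() decodes bytes with the filesystem encoding; on POSIX
    # it uses the surrogateescape error handler, so undecodable bytes
    # can be recovered later with os.fsencode(). str passes through.
    return os.fsdecode(fspath_either(raw))
```

With option 1 the caller has to cope with both types; with option 2
the caller always sees str, at the cost of a decode step.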

[1] https://mail.python.org/pipermail/python-ideas/2016-April/039595.html


PS. I have been reading this list occasionally on the google groups
mirror, and I now subscribed to it just to send this. (BTW, I probably
broke the thread, as I did not have Nick's email in my inbox to reply
to. Sorry about that.) I'll have to mention that I was surprised, to
say the least, to find that the pathlib discussion had moved here from
python-ideas, where I had mentioned I was working on a proposal. Then,
I also found that the solution discussed here was seemingly an
improved version of what I had proposed on python-ideas somewhat
earlier [1], but did not get any reactions to. While I can only make
guesses about what happened, these kinds of things easily make you go
from "Hey, maybe I'll be able to do something to improve Python!" to
"These people don't seem to want me here or appreciate my efforts.".
Not to accuse anyone in particular; just to let people know. Anyway, I
somehow got sucked into thinking deeply about pathlib etc. (which I do
use). Not that I really have much at stake here, except spending
ridiculous amounts of time thinking about paths, mainly during my
Easter holidays and after that. I really had a hard time explaining to
friends and family what the heck I was doing ;).

From rosuav at gmail.com  Fri Apr  8 12:20:49 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 9 Apr 2016 02:20:49 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
Message-ID: <CAPTjJmpXTbWWY22kNhQqXu_hO-_zoWJ+MEKu4i+KWGY2D-AfUw@mail.gmail.com>

On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens
<jon+python-dev at unequivocal.co.uk> wrote:
> Anyway the code is at https://github.com/jribbens/unsafe
> It requires Python 3.4 or later (it could probably be made to work on
> Python 2.7 as well, but it would need some changes).

Not being a security expert, I'm not the best one to try to break it
maliciously; but I can break things accidentally. Pull request sent
through. :)

ChrisA

From ethan at stoneleaf.us  Fri Apr  8 12:30:38 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 08 Apr 2016 09:30:38 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W4kkCVJHrHZ6XhKukGAWVQencE=Dg-pbmNUPgTSgFaAkA@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>
 <5707CF53.5060502@stoneleaf.us>
 <CAP1=2W4kkCVJHrHZ6XhKukGAWVQencE=Dg-pbmNUPgTSgFaAkA@mail.gmail.com>
Message-ID: <5707DCAE.9010407@stoneleaf.us>

On 04/08/2016 08:41 AM, Brett Cannon wrote:
> On Fri, 8 Apr 2016 at 08:33 Ethan Furman wrote:
 >> Brett previously queried:

>>> Add the method/attribute to str? (I assume so, much like
>>> __index__() is on int, but I have not seen it explicitly
 >>> stated so I would rather clarify it)
>
>> What will this do?  Return a Path or a str?  I don't think
 >> we need either.
>
> When I brought this up it was to return self.

Okay, thanks.

> Chris Angelico and I have been asked by Guido to work together to come
> up with a proposal after all the discussions are finished and it will
> most likely be a patch to the pathlib PEP.

Cool.  I wasn't looking forward to that part.

--
~Ethan~

From ethan at stoneleaf.us  Fri Apr  8 12:36:03 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 08 Apr 2016 09:36:03 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CALGmxEK63=dvaSSmEre_TM=uE_Nd=uoOYNodCzLWv5PxQTWE8Q@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>
 <CALGmxEK63=dvaSSmEre_TM=uE_Nd=uoOYNodCzLWv5PxQTWE8Q@mail.gmail.com>
Message-ID: <5707DDF3.5030106@stoneleaf.us>

On 04/08/2016 09:04 AM, Chris Barker wrote:
> On Fri, Apr 8, 2016 at 2:50 AM, Nick Coghlan wrote:

>> Method, as long as there's a helper function somewhere
>
> what has the helper function got to do with whether it's a method or
> attribute (would we call a property an attribute here?)
>
>> Built-in? (name is dependent on #1 if we add one)
>
> os.fspath (alongside os.fsencode and os.fsdecode)
>
> [...] this is only going to be used by the stdlib and other
 > path-using libraries, not user code -- is that that hard to
 > call obj.__fspath__() ?

1) user code may call it
2) folks who write libraries are still users ;)
3) using __dunder__s directly is usually poor form.

> I thought the whole point of all this is that not any old string can be
> a path! (whereas any int can be an index). Unless we go with Chris A's
> suggestion that this be a more generic lossless string protocol, rather
> than just for paths.

That does seem to be a valid point against str.__fspath__.

--
~Ethan~


From chris.barker at noaa.gov  Fri Apr  8 12:42:49 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 8 Apr 2016 09:42:49 -0700
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>
References: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>
Message-ID: <CALGmxEKPrfjZvw7wB+bst_nczNks59h3p=Gs1v-J+Q_aH6Qjfg@mail.gmail.com>

On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven <k7hoven at gmail.com> wrote:

> I'm still thinking a little bit about 'pathname', which to me sounds
> more like a string than fspath does [1].


I like that a lot - or even "__pathstr__" or "__pathstring__"

after all, we're making a big deal out of the fact that a path is *not a
string*, but rather a string is a *representation* (or serialization) of a
path.


> If added to str, it should also be added to bytes.


ouch! not sure I want to go there, though...


>  I'll have to mention that I was surprised, to
> say the least, to find that the pathlib discussion had moved here from
> python-ideas, where I had mentioned I was working on a proposal.

...

>  While I can only make
> guesses about what happened, these kinds of things easily make you go
> from "Hey, maybe I'll be able to do something to improve Python!" to
> "These people don't seem to want me here or appreciate my efforts.".
>

For the record, this is pretty rare -- and it was announced on -ideas that
the discussion had started up here -- maybe you missed that post?

I think in this case, there were ideas over on -ideas, but then it was
decided (by whom, who knows?) that the goal of supporting Path in the
stdlib was decided upon, so it was time to talk implementation, rather than
ideas -- thus python-dev. In fact, the implementation turned out to be less
straightforward than originally thought, so maybe it should have stayed on
-ideas, but there you go.


> Not to accuse anyone in particular; just to let people know. Anyway, I
> somehow got sucked into thinking deeply about pathlib etc. (which I do
> use). Not that I really have much at stake here, except spending
> ridiculous amounts of time thinking about paths, mainly during my
> Easter holidays and after that. I really had a hard time explaining to
> friends and family what the heck I was doing ;).


speaking only for me - thanks for your contribution -- I'm glad you found
the discussion here.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From jon+python-dev at unequivocal.co.uk  Fri Apr  8 12:47:16 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Fri, 8 Apr 2016 17:47:16 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <5707D2F8.9080901@0x04.net>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <5707D2F8.9080901@0x04.net>
Message-ID: <20160408164715.GC17895@unequivocal.co.uk>

On Fri, Apr 08, 2016 at 05:49:12PM +0200, Marcin Kościelnicki wrote:
> On 08/04/16 16:18, Jon Ribbens wrote:
> That one is trivially fixable, but here goes:
> 
> async def a():
>     global c
>     c = b.cr_frame.f_back.f_back.f_back
> 
> b = a()
> b.send(None)
> c.f_builtins['print']('broken')

Ah, I've not used Python 3.5, and I can't find any documentation on
this cr_frame business, but I've added cr_frame and f_back to the
disallowed attributes list.

> Also, if the point of giving me a subclass of datetime is to prevent access
> to the actual class, that can be circumvented:
> 
> >>> real_datetime = datetime.datetime.mro()[1]
> >>> real_datetime
> <class 'datetime.datetime'>
> 
> But I'm not sure what good that is.

It means you can alter the datetime class that is used by the
containing application, which is bad - you could lie to it about
what day it is for example ;-)

I've made it so instead of a direct subclass it now makes an
intermediate subclass which makes mro() return an empty list.

From brett at python.org  Fri Apr  8 13:26:36 2016
From: brett at python.org (Brett Cannon)
Date: Fri, 08 Apr 2016 17:26:36 +0000
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CALGmxEK63=dvaSSmEre_TM=uE_Nd=uoOYNodCzLWv5PxQTWE8Q@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>
 <CALGmxEK63=dvaSSmEre_TM=uE_Nd=uoOYNodCzLWv5PxQTWE8Q@mail.gmail.com>
Message-ID: <CAP1=2W6U6W+Q0i-aAS-dOd6YVxfHSaw2+0AHUPukQORen7Jtww@mail.gmail.com>

On Fri, 8 Apr 2016 at 09:05 Chris Barker <chris.barker at noaa.gov> wrote:

> On Fri, Apr 8, 2016 at 2:50 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
>> On 7 April 2016 at 03:26, Brett Cannon <brett at python.org> wrote:
>>
>
>
>> > Method or attribute? (changes what kind of one-liner you might use in
>> > libraries, but I think historically all protocols have been methods and
>> the
>> > serialized string representation might be costly to build)
>>
>
> couldn't it be a property?
>

A property is a method pretending to be an attribute, so yes. :)


>
>
>> Method, as long as there's a helper function somewhere
>
>
> what has the helper function got to do with whether it's a method or
> attribute (would we call a property an attribute here?)
>

Yes, a property is an attribute in this instance. And it somewhat tweaks
how simple of a one-liner is needed which in turn makes the function either
nearly redundant or helpful. With an attribute:

  getattr(path, '__ospath__', path)

With a method:

  path.__ospath__() if hasattr(path, '__ospath__') else path
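
As a rough sketch of how the two one-liners behave (the class names
are made up for illustration, and `__ospath__` is only the placeholder
name used above, not a settled API):

```python
class AttrPath:
    """Hypothetical path object exposing the protocol as an attribute."""

    def __init__(self, raw):
        self.__ospath__ = raw


class MethodPath:
    """Hypothetical path object exposing the protocol as a method."""

    def __init__(self, raw):
        self._raw = raw

    def __ospath__(self):
        return self._raw


def from_attribute(path):
    # Attribute flavour: a single getattr() call, falling back to the
    # object itself (e.g. a plain str) when the attribute is missing.
    return getattr(path, '__ospath__', path)


def from_method(path):
    # Method flavour: an explicit hasattr() check is needed before
    # the call, otherwise plain strings would raise AttributeError.
    return path.__ospath__() if hasattr(path, '__ospath__') else path
```

Either way a plain string passes straight through; the only
difference is how much the caller has to type.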


>
> > Built-in? (name is dependent on #1 if we add one)
>>
>> os.fspath (alongside os.fsencode and os.fsdecode)
>>
>> (Putting this in a module low in the dependency stack makes it easy
>> for other modules to access without pulling in all of pathlib's
>> dependencies)
>
>
> I like that -- though still =0.5 on having one at all -- this is only going
> to be used by the stdlib and other path-using libraries, not user code --
> is that that hard to call obj.__fspath__() ?
>

With a function we can add some type checking so that you know you are
getting back a string and not something else like a file descriptor int or
something.
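
A minimal sketch of what such a checking function could look like,
assuming the method ends up being called `__fspath__` and is required
to return str. The names fspath_sketch and DemoPath are hypothetical,
and none of this is a finished design:

```python
def fspath_sketch(path):
    """Hypothetical helper: return the str form of a path-like object."""
    if isinstance(path, str):
        # Plain strings are accepted as-is.
        return path
    # Look the method up on the type rather than on the instance.
    method = getattr(type(path), '__fspath__', None)
    if method is None:
        raise TypeError('expected str or an object with __fspath__, got '
                        + type(path).__name__)
    result = method(path)
    if not isinstance(result, str):
        # Reject e.g. a file descriptor int coming back from the method.
        raise TypeError('__fspath__() returned ' + type(result).__name__
                        + ', expected str')
    return result


class DemoPath:
    """Hypothetical path-like object used only to exercise the helper."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw
```

Passing None or an int raises TypeError here, whereas str() would
have happily produced a string from either.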


>
> > Add the method/attribute to str? (I assume so, much like __index__() is
>> on
>> > int, but I have not seen it explicitly stated so I would rather clarify
>> it)
>>
>
> I thought the whole point of all this is that not any old string can be a
> path! (whereas any int can be an index). Unless we go with Chris A's
> suggestion that this be a more generic lossless string protocol, rather
> than just for paths.
>

The whole point is to not treat a path object like any old string. We still
have to support a string someone created that is a valid path. Remember,
what we're trying to avoid is people simply doing `str(path)` everywhere
since that works with e.g. None.


>
>
>> It's worth summarising in a PEP at least for communications purposes -
>> very easy for folks that don't follow python-dev to miss otherwise.
>>
>
> I'd say add it to the existing pathlib PEP -- along with the extra
> discussion of why Path does not inherit from str.
>

That's the plan.

-Brett


>
> -CHB
>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
>

From brett at python.org  Fri Apr  8 13:32:23 2016
From: brett at python.org (Brett Cannon)
Date: Fri, 08 Apr 2016 17:32:23 +0000
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>
References: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>
Message-ID: <CAP1=2W6nKh9BruP43nZSWCM+B106GML4wZtzRojuDEJFD7gptQ@mail.gmail.com>

On Fri, 8 Apr 2016 at 09:13 Koos Zevenhoven <k7hoven at gmail.com> wrote:

> Nick Coghlan wrote:
> > On 7 April 2016 at 03:26, Brett Cannon <br... at python.org> wrote:
> >>
> >> Name: __path__, __fspath__, or something else?
> >
> > __fspath__
> >
>
> I think I might like this dunder name because it does not clutter the
> list of regular methods and attributes, and is perhaps more pythonic.
>
> >> Method or attribute? (changes what kind of one-liner you might use in
> >> libraries, but I think historically all protocols have been methods and
> the
> >> serialized string representation might be costly to build)
> >
> > Method, as long as there's a helper function somewhere
>
> As a further minor benefit of it being a method, it may be easier to
> distinguish it from `__path__`, which is an iterable attribute.
>
> >> Built-in? (name is dependent on #1 if we add one)
> >
> > os.fspath (alongside os.fsencode and os.fsdecode)
> >
> > (Putting this in a module low in the dependency stack makes it easy
> > for other modules to access without pulling in all of pathlib's
> > dependencies)
>
> Strong +1 on putting it in os. This should also be implemented in
> DirEntry, instances of which are "yielded" by os.scandir.
>
> Also, you have a strong case regarding naming with the 'fs' prefix. It
> is also easier to read fspath as f-s-path than it is to read ospath as
> o-s-path, because ospath could also be pronounced as a single
> (meaningless?) word.
>
> I'm still thinking a little bit about 'pathname', which to me sounds
> more like a string than fspath does [1]. It would be nice to have the
> string/path distinction especially when pathlib adoption grows larger.
> But who knows, maybe somewhere in the far future, no-one will care
> much about fspath, fsencode, fsdecode or os.path.
>
> >> Add the method/attribute to str? (I assume so, much like __index__() is
> on
> >> int, but I have not seen it explicitly stated so I would rather clarify
> it)
> >
> > Makes sense
>
> If added to str, it should also be added to bytes. But will that then
> return str or bytes? See also the next point.
>
> > Expand the C API to have something like PyObject_Path()?
> >
> > PyUnicode_FromFSPath, perhaps? The return type is well-defined here,
> > so it can be done as an alternate constructor, and the C API
> > counterparts of os.fsdecode and os.fsencode are PyUnicode functions
> > (specifically PyUnicode_DecodeFSDefault and PyUnicode_EncodeFSDefault)
>
> What about DirEntry, which may have a bytes representation? I would
> expect the function return type of os.fspath to be Union[str, bytes],
> unless bytes pathnames are decoded with surrogate escapes.
>
> [1] https://mail.python.org/pipermail/python-ideas/2016-April/039595.html
>
>
> PS. I have been reading this list occasionally on the google groups
> mirror, and I now subscribed to it just to send this. (BTW, I probably
> broke the thread, as I did not have Nick's email in my inbox to reply
> to. Sorry about that.) I'll have to mention that I was surprised, to
> say the least, to find that the pathlib discussion had moved here from
> python-ideas, where I had mentioned I was working on a proposal. Then,
> I also found that the solution discussed here was seemingly an
> improved version of what I had proposed on python-ideas somewhat
> earlier [1], but did not get any reactions to. While I can only make
> guesses about what happened, these kinds of things easily make you go
> from "Hey, maybe I'll be able to do something to improve Python!" to
> "These people don't seem to want me here or appreciate my efforts.".
> Not to accuse anyone in particular; just to let people know. Anyway, I
> somehow got sucked into thinking deeply about pathlib etc. (which I do
> use). Not that I really have much at stake here, except spending
> ridiculous amounts of time thinking about paths, mainly during my
> Easter holidays and after that. I really had a hard time explaining to
> friends and family what the heck I was doing ;).
>

Since I kicked up the discussion here on python-dev, I can explain what
happened.

After the python-ideas threads kicked up I realized I was not using pathlib
in importlib and there were a handful of places it could be supported. But
since pathlib is provisional I didn't want to have to start making the
stdlib support it if we removed the whole module itself. So I simply asked
over here on python-dev what it would take to remove the provisional label
from pathlib. People then pulled over the python-ideas discussion of what
people were upset about in regards to pathlib to help decide what it would
require to remove the provisional label and the conversation forked (I also
assumed Guido and others had muted the discussion over on python-ideas so
it would have been a new thread somewhere regardless). And then when I
realized what had happened I was going to reply to one of your emails on
python-ideas to point out the bifurcation but someone beat me to it.

So the whole thing just became a tangled mess of discussion. :) I viewed
the threads on improving pathlib as separate from a discussion of what the
requirements were to remove the provisional label and very specific to
python-dev since this isn't an idea but a concrete development/maintenance
question, but people tied the two together and that's how we ended up here.

From jon+python-dev at unequivocal.co.uk  Fri Apr  8 13:34:49 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Fri, 8 Apr 2016 18:34:49 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAPTjJmpXTbWWY22kNhQqXu_hO-_zoWJ+MEKu4i+KWGY2D-AfUw@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAPTjJmpXTbWWY22kNhQqXu_hO-_zoWJ+MEKu4i+KWGY2D-AfUw@mail.gmail.com>
Message-ID: <20160408173449.GD17895@unequivocal.co.uk>

On Sat, Apr 09, 2016 at 02:20:49AM +1000, Chris Angelico wrote:
> On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens
> <jon+python-dev at unequivocal.co.uk> wrote:
> > Anyway the code is at https://github.com/jribbens/unsafe
> > It requires Python 3.4 or later (it could probably be made to work on
> > Python 2.7 as well, but it would need some changes).
> 
> Not being a security expert, I'm not the best one to try to break it
> maliciously; but I can break things accidentally. Pull request sent
> through. :)

Thanks, I've merged that in.

From brett at python.org  Fri Apr  8 13:34:57 2016
From: brett at python.org (Brett Cannon)
Date: Fri, 08 Apr 2016 17:34:57 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <5707DDF3.5030106@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>
 <CALGmxEK63=dvaSSmEre_TM=uE_Nd=uoOYNodCzLWv5PxQTWE8Q@mail.gmail.com>
 <5707DDF3.5030106@stoneleaf.us>
Message-ID: <CAP1=2W5xRrUJfEqAa32aOzgNNTXhTY-pAKt+3tEK3iEstG7i2Q@mail.gmail.com>

On Fri, 8 Apr 2016 at 09:39 Ethan Furman <ethan at stoneleaf.us> wrote:

> On 04/08/2016 09:04 AM, Chris Barker wrote:
> > On Fri, Apr 8, 2016 at 2:50 AM, Nick Coghlan wrote:
>
> >> Method, as long as there's a helper function somewhere
> >
> > what has the helper function got to do with whether it's a method or
> > attribute (would we call a property an attribute here?)
> >
> >> Built-in? (name is dependent on #1 if we add one)
> >
> > os.fspath (alongside os.fsencode and os.fsdecode)
> >
> > [...] this is only going to be used by the stdlib and other
>  > path-using libraries, not user code -- is that that hard to
>  > call obj.__fspath__() ?
>
> 1) user code may call it
> 2) folks who write libraries are still users ;)
> 3) using __dunder__s directly is usually poor form.
>
> > I thought the whole point of all this is that not any old string can be
> > a path! (whereas any int can be an index). Unless we go with Chris A's
> > suggestion that this be a more generic lossless string protocol, rather
> > than just for paths.
>
> That does seem to be a valid point against str.__fspath__.
>

Yep, and I'm expecting we won't want that at this point. The fact that
paths need strings for low-level OS stuff is a historical and technical
detail, so no need to drag the entire str type into it if we can provide a
reasonable helper function (for either the ABC or magic method solution).

From ethan at stoneleaf.us  Fri Apr  8 13:36:13 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 08 Apr 2016 10:36:13 -0700
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CALGmxEKPrfjZvw7wB+bst_nczNks59h3p=Gs1v-J+Q_aH6Qjfg@mail.gmail.com>
References: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>
 <CALGmxEKPrfjZvw7wB+bst_nczNks59h3p=Gs1v-J+Q_aH6Qjfg@mail.gmail.com>
Message-ID: <5707EC0D.907@stoneleaf.us>

On 04/08/2016 09:42 AM, Chris Barker wrote:
> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote:

>> While I can only make guesses about what happened, these kinds of
 >> things easily make you go from "Hey, maybe I'll be able to do something
 >> to improve Python!" to "These people don't seem to want me here or
 >> appreciate my efforts.".

Ouch, sorry about that.  Glad to have you on -Dev, too.

--
~Ethan~

From k7hoven at gmail.com  Fri Apr  8 13:46:14 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Fri, 8 Apr 2016 20:46:14 +0300
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CALGmxEKPrfjZvw7wB+bst_nczNks59h3p=Gs1v-J+Q_aH6Qjfg@mail.gmail.com>
References: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>
 <CALGmxEKPrfjZvw7wB+bst_nczNks59h3p=Gs1v-J+Q_aH6Qjfg@mail.gmail.com>
Message-ID: <CAMiohoidyKzdVHD4d5gcXx6y7hje5TBCc5FydMifJ2aUN=U1GQ@mail.gmail.com>

On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker <chris.barker at noaa.gov> wrote:
> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven <k7hoven at gmail.com> wrote:
>>
>> I'm still thinking a little bit about 'pathname', which to me sounds
>> more like a string than fspath does [1].
>
>
> I like that a lot - or even "__pathstr__" or "__pathstring__"
>
> after all, we're making a big deal out of the fact that a path is *not a
> string*, but rather a string is a *representation* (or serialization) of a
> path.

For me, the point here is the reverse: that any str is not a path, and
that it is misleading to call it *path* when the whole point is to make it
*not* a specialized path object but a plain string. I think it's ok to
think of a path as a special kind of string. For instance, a URI is
explicitly defined as a *sequence of characters*, and URIs can be
thought of as a more recent, improved and broadened concept than
paths. This is the point of view I took in my recent proposal, but I
don't think it's the only valid way to think about paths "in theory".
I like the "serialization" interpretation as well, but I tend to think
that that string serialization is what is called a path.

Anyway, I don't think these philosophical considerations should
dictate how Python is implemented. But it is always good to also have
a valid theoretical point of view to back up a design decision.

> For the record, this is pretty rare -- and it was announced on -ideas that
> the discussion had started up here -- maybe you missed that post?

If you mean in Ethan's response to my proposal, I noticed that, but
the discussions here had already gone quite far by that time. Even
more so by the time I had time to see what was going on.

I do have to say this is not the first time I felt there was some sort
of hostility towards newcomers on python-ideas. Sure, it might be
partly because those people don't know the culture on the list, but
I'm not sure if that should be used as an excuse.

-Koos

From ethan at stoneleaf.us  Fri Apr  8 14:13:47 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 08 Apr 2016 11:13:47 -0700
Subject: [Python-Dev] Pathlib enhancments - method name only
Message-ID: <5707F4DB.7000501@stoneleaf.us>

On 04/08/2016 10:46 AM, Koos Zevenhoven wrote:
 > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker  wrote:
 >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote:

 >>> I'm still thinking a little bit about 'pathname', which to me sounds
 >>> more like a string than fspath does.
 >>
 >>
 >> I like that a lot - or even "__pathstr__" or "__pathstring__"
 >> after all, we're making a big deal out of the fact that a path is
 >> *not a string*, but rather a string is a *representation* (or
 >> serialization) of a path.

That's a decent point.

So the plausible choices are, I think:

- __fspath__  # File System Path -- possible confusion with Path

- __fsstr__   # File System String

- __fspathstr__ # File System Path String -- zero ambiguity, but
                 # what a mouthful

--
~Ethan~

From chris.barker at noaa.gov  Fri Apr  8 14:20:21 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 8 Apr 2016 11:20:21 -0700
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <5707F4DB.7000501@stoneleaf.us>
References: <5707F4DB.7000501@stoneleaf.us>
Message-ID: <CALGmxEKbJy1SBWszTnAi_2R0jc2sbGqtArBPFwyLprTHh2vMcw@mail.gmail.com>

On Fri, Apr 8, 2016 at 11:13 AM, Ethan Furman <ethan at stoneleaf.us> wrote:

> So the plausible choices are, I think:
>
> - __fspath__  # File System Path -- possible confusion with Path
>
> - __fsstr__   # File System String


I think we really need "path" in there somewhere....


>
> - __fspathstr__ # File System Path String -- zero ambiguity, but
>                 # what a mouthful
>

we rejected plain old __path__ because this is already used in another
context, but if we add "str" on the end, that's no longer an issue, so do
we need the "fs"?

__pathstr__ # pathstring

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From brett at python.org  Fri Apr  8 14:25:25 2016
From: brett at python.org (Brett Cannon)
Date: Fri, 08 Apr 2016 18:25:25 +0000
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <5707F4DB.7000501@stoneleaf.us>
References: <5707F4DB.7000501@stoneleaf.us>
Message-ID: <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>

On Fri, 8 Apr 2016 at 11:13 Ethan Furman <ethan at stoneleaf.us> wrote:

> On 04/08/2016 10:46 AM, Koos Zevenhoven wrote:
>  > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker  wrote:
>  >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote:
>
>  >>> I'm still thinking a little bit about 'pathname', which to me sounds
>  >>> more like a string than fspath does.
>  >>
>  >>
>  >> I like that a lot - or even "__pathstr__" or "__pathstring__"
>  >> after all, we're making a big deal out of the fact that a path is
>  >> *not a string*, but rather a string is a *representation* (or
>  >> serialization) of a path.
>
> That's a decent point.
>
> So the plausible choices are, I think:
>
> - __fspath__  # File System Path -- possible confusion with Path
>

+1


>
> - __fsstr__   # File System String
>

-1 Looks like a cat walked across my keyboard or someone trying to come up
with a trendy startup name.


>
> - __fspathstr__ # File System Path String -- zero ambiguity, but
>                  # what a mouthful
>

-1 See above.

I personally still like __ospath__ as well.

From k7hoven at gmail.com  Fri Apr  8 14:34:00 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Fri, 8 Apr 2016 21:34:00 +0300
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CALGmxEKbJy1SBWszTnAi_2R0jc2sbGqtArBPFwyLprTHh2vMcw@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CALGmxEKbJy1SBWszTnAi_2R0jc2sbGqtArBPFwyLprTHh2vMcw@mail.gmail.com>
Message-ID: <CAMiohohcZnSNzvx+zuQ-pLvODo63C7nD7hrqOtKZWfaNn_pdpw@mail.gmail.com>

On Fri, Apr 8, 2016 at 9:20 PM, Chris Barker <chris.barker at noaa.gov> wrote:
>
> we rejected plain old __path__ because this is already used in another
> context, but if we add "str" on the end, that's no longer an issue, so do
> we need the "fs"?
>
> __pathstr__ # pathstring
>

Or perhaps __pathstring__ in case it may be or return byte strings.

-Koos

From chris.barker at noaa.gov  Fri Apr  8 15:03:48 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 8 Apr 2016 12:03:48 -0700
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CAMiohohcZnSNzvx+zuQ-pLvODo63C7nD7hrqOtKZWfaNn_pdpw@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CALGmxEKbJy1SBWszTnAi_2R0jc2sbGqtArBPFwyLprTHh2vMcw@mail.gmail.com>
 <CAMiohohcZnSNzvx+zuQ-pLvODo63C7nD7hrqOtKZWfaNn_pdpw@mail.gmail.com>
Message-ID: <CALGmxEK0ykFzbxSqSagP2r8ozFUMkEBC0WFui+Cq_zkYHNcnow@mail.gmail.com>

On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven <k7hoven at gmail.com> wrote:

> >
> > __pathstr__ # pathstring
> >
>
> Or perhaps __pathstring__ in case it may be or return byte strings.
>

I'm fine with __pathstring__ , but I thought it was already decided that it
would NOT return a bytestring!

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From rosuav at gmail.com  Fri Apr  8 15:09:03 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 9 Apr 2016 05:09:03 +1000
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CALGmxEK0ykFzbxSqSagP2r8ozFUMkEBC0WFui+Cq_zkYHNcnow@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CALGmxEKbJy1SBWszTnAi_2R0jc2sbGqtArBPFwyLprTHh2vMcw@mail.gmail.com>
 <CAMiohohcZnSNzvx+zuQ-pLvODo63C7nD7hrqOtKZWfaNn_pdpw@mail.gmail.com>
 <CALGmxEK0ykFzbxSqSagP2r8ozFUMkEBC0WFui+Cq_zkYHNcnow@mail.gmail.com>
Message-ID: <CAPTjJmrZHw0C=3+6B5W8o5iCU5SsQhQUabJhTu2EpObs_z3TmQ@mail.gmail.com>

On Sat, Apr 9, 2016 at 5:03 AM, Chris Barker <chris.barker at noaa.gov> wrote:
> On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven <k7hoven at gmail.com> wrote:
>>
>> >
>> > __pathstr__ # pathstring
>> >
>>
>> Or perhaps __pathstring__ in case it may be or return byte strings.
>
>
> I'm fine with __pathstring__ , but I thought it was already decided that it
> would NOT return a bytestring!

I sincerely hope that's been settled on. There's no reason to have
this ever return anything other than a str. (Famous last words, I
know.)

ChrisA

From brett at python.org  Fri Apr  8 15:24:44 2016
From: brett at python.org (Brett Cannon)
Date: Fri, 08 Apr 2016 19:24:44 +0000
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CAPTjJmrZHw0C=3+6B5W8o5iCU5SsQhQUabJhTu2EpObs_z3TmQ@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CALGmxEKbJy1SBWszTnAi_2R0jc2sbGqtArBPFwyLprTHh2vMcw@mail.gmail.com>
 <CAMiohohcZnSNzvx+zuQ-pLvODo63C7nD7hrqOtKZWfaNn_pdpw@mail.gmail.com>
 <CALGmxEK0ykFzbxSqSagP2r8ozFUMkEBC0WFui+Cq_zkYHNcnow@mail.gmail.com>
 <CAPTjJmrZHw0C=3+6B5W8o5iCU5SsQhQUabJhTu2EpObs_z3TmQ@mail.gmail.com>
Message-ID: <CAP1=2W6agqzXE76W-zWXsa-UjbxXcK5eKsXxyOAQanVV68T8aQ@mail.gmail.com>

On Fri, 8 Apr 2016 at 12:10 Chris Angelico <rosuav at gmail.com> wrote:

> On Sat, Apr 9, 2016 at 5:03 AM, Chris Barker <chris.barker at noaa.gov> wrote:
> > On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven <k7hoven at gmail.com> wrote:
> >>
> >> >
> >> > __pathstr__ # pathstring
> >> >
> >>
> >> Or perhaps __pathstring__ in case it may be or return byte strings.
> >
> >
> > I'm fine with __pathstring__ , but I thought it was already decided that it
> > would NOT return a bytestring!
>
> I sincerely hope that's been settled on. There's no reason to have
> this ever return anything other than a str. (Famous last words, I
> know.)
>

It has been settled: pathlib.Path itself won't accept bytes anyway so
there's no reason to expect this to ever return anything but str.

From rdmurray at bitdance.com  Fri Apr  8 16:39:31 2016
From: rdmurray at bitdance.com (R. David Murray)
Date: Fri, 08 Apr 2016 16:39:31 -0400
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CAP1=2W6agqzXE76W-zWXsa-UjbxXcK5eKsXxyOAQanVV68T8aQ@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CALGmxEKbJy1SBWszTnAi_2R0jc2sbGqtArBPFwyLprTHh2vMcw@mail.gmail.com>
 <CAMiohohcZnSNzvx+zuQ-pLvODo63C7nD7hrqOtKZWfaNn_pdpw@mail.gmail.com>
 <CALGmxEK0ykFzbxSqSagP2r8ozFUMkEBC0WFui+Cq_zkYHNcnow@mail.gmail.com>
 <CAPTjJmrZHw0C=3+6B5W8o5iCU5SsQhQUabJhTu2EpObs_z3TmQ@mail.gmail.com>
 <CAP1=2W6agqzXE76W-zWXsa-UjbxXcK5eKsXxyOAQanVV68T8aQ@mail.gmail.com>
Message-ID: <20160408203932.740D7B14158@webabinitio.net>

On Fri, 08 Apr 2016 19:24:44 -0000, Brett Cannon <brett at python.org> wrote:
> On Fri, 8 Apr 2016 at 12:10 Chris Angelico <rosuav at gmail.com> wrote:
> 
> > On Sat, Apr 9, 2016 at 5:03 AM, Chris Barker <chris.barker at noaa.gov> wrote:
> > > On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven <k7hoven at gmail.com> wrote:
> > >>
> > >> >
> > >> > __pathstr__ # pathstring
> > >> >
> > >>
> > >> Or perhaps __pathstring__ in case it may be or return byte strings.

But there are other paths than OS file system paths.  I prefer
__fspath__ or __os_path__ myself.  I think the fact that it is a string
is implied by the fact that it is getting us the thing we can pass
to the os (since Python3 deals with os paths as strings unless you
specify otherwise, only converting them back to bytes, on unix, at the last
moment).

Heh, although I suppose one could make the argument that it should
return whatever the native OS wants, and save the low level code
from having to do that?  Pass the path object all the way down
to that "final step" in the C layer?  (Just ignore me, I'm sure
I'm only making trouble :)

--David

From k7hoven at gmail.com  Fri Apr  8 16:51:00 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Fri, 8 Apr 2016 23:51:00 +0300
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <20160408203932.740D7B14158@webabinitio.net>
References: <5707F4DB.7000501@stoneleaf.us>
 <CALGmxEKbJy1SBWszTnAi_2R0jc2sbGqtArBPFwyLprTHh2vMcw@mail.gmail.com>
 <CAMiohohcZnSNzvx+zuQ-pLvODo63C7nD7hrqOtKZWfaNn_pdpw@mail.gmail.com>
 <CALGmxEK0ykFzbxSqSagP2r8ozFUMkEBC0WFui+Cq_zkYHNcnow@mail.gmail.com>
 <CAPTjJmrZHw0C=3+6B5W8o5iCU5SsQhQUabJhTu2EpObs_z3TmQ@mail.gmail.com>
 <CAP1=2W6agqzXE76W-zWXsa-UjbxXcK5eKsXxyOAQanVV68T8aQ@mail.gmail.com>
 <20160408203932.740D7B14158@webabinitio.net>
Message-ID: <CAMiohojJ88Z4rVYs62q_MeKU_PoCdvbFYpk8vVArbN=U==9dQw@mail.gmail.com>

On Fri, Apr 8, 2016 at 11:39 PM, R. David Murray <rdmurray at bitdance.com> wrote:
> On Fri, 08 Apr 2016 19:24:44 -0000, Brett Cannon <brett at python.org> wrote:
>> On Fri, 8 Apr 2016 at 12:10 Chris Angelico <rosuav at gmail.com> wrote:
>>
>> > On Sat, Apr 9, 2016 at 5:03 AM, Chris Barker <chris.barker at noaa.gov> wrote:
>> > > On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven <k7hoven at gmail.com> wrote:
>> > >>
>> > >> >
>> > >> > __pathstr__ # pathstring
>> > >> >
>> > >>
>> > >> Or perhaps __pathstring__ in case it may be or return byte strings.
>
> But there are other paths than OS file system paths.  I prefer
> __fspath__ or __os_path__ myself.  I think the fact that it is a string
> is implied by the fact that it is getting us the thing we can pass
> to the os (since Python3 deals with os paths as strings unless you
> specify otherwise, only converting them back to bytes, on unix, at the last
> moment).
>
> Heh, although I suppose one could make the argument that it should
> return whatever the native OS wants, and save the low level code
> from having to do that?  Pass the path object all the way down
> to that "final step" in the C layer?  (Just ignore me, I'm sure
> I'm only making trouble :)

My favorites are fspath and pathname, and since this is a dunder
method, it is not as crucial what it is called. I have the feeling
the consensus is converging towards fspath?

I'll comment on the bytes issue in the other thread. Boy these threads
are all over the place!

-Koos

From sunnycemetery at gmail.com  Fri Apr  8 17:00:06 2016
From: sunnycemetery at gmail.com (Grady Martin)
Date: Fri, 8 Apr 2016 17:00:06 -0400
Subject: [Python-Dev] Incomplete Internationalization in Argparse Module
Message-ID: <20160408210006.GB1484@slim>

Hello, all.  I was wondering if the following string was left untouched by gettext for a purpose (from line 720 of argparse.py, in class ArgumentError):

'argument %(argument_name)s: %(message)s'

There may be other untranslatable strings in the argparse module, but I have yet to encounter them in the wild.

Thank you.

From brett at python.org  Fri Apr  8 17:07:48 2016
From: brett at python.org (Brett Cannon)
Date: Fri, 08 Apr 2016 21:07:48 +0000
Subject: [Python-Dev] Incomplete Internationalization in Argparse Module
In-Reply-To: <20160408210006.GB1484@slim>
References: <20160408210006.GB1484@slim>
Message-ID: <CAP1=2W6mujcL-fXmfPsRujebnA-aSpZAcFQGP8UfcmT7OpWfEw@mail.gmail.com>

On Fri, 8 Apr 2016 at 14:05 Grady Martin <sunnycemetery at gmail.com> wrote:

> Hello, all.  I was wondering if the following string was left untouched by
> gettext for a purpose (from line 720 of argparse.py, in class
> ArgumentError):
>
> 'argument %(argument_name)s: %(message)s'
>
> There may be other untranslatable strings in the argparse module, but I
> have yet to encounter them in the wild.
>

Probably so that anyone introspecting on the error message can count on
somewhat of a consistent format (comes into play with doctest typically).

From k7hoven at gmail.com  Fri Apr  8 17:23:50 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Sat, 9 Apr 2016 00:23:50 +0300
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W5xRrUJfEqAa32aOzgNNTXhTY-pAKt+3tEK3iEstG7i2Q@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>
 <CALGmxEK63=dvaSSmEre_TM=uE_Nd=uoOYNodCzLWv5PxQTWE8Q@mail.gmail.com>
 <5707DDF3.5030106@stoneleaf.us>
 <CAP1=2W5xRrUJfEqAa32aOzgNNTXhTY-pAKt+3tEK3iEstG7i2Q@mail.gmail.com>
Message-ID: <CAMiohoijSR5hqLHXy-cUKnC3ot3yY8OzWVwjEGmg=9EaMxMe8w@mail.gmail.com>

On Fri, Apr 8, 2016 at 8:34 PM, Brett Cannon <brett at python.org> wrote:
> On Fri, 8 Apr 2016 at 09:39 Ethan Furman <ethan at stoneleaf.us> wrote:
>> > I thought the whole point of all this is that not any old string can be
>> > a path! (whereas any int can be an index). Unless we go with Chris A's
>> > suggestion that this be a more generic lossless string protocol, rather
>> > than just for paths.
>>
>> That does seem to be a valid point against str.__fspath__.
>
> Yep, and I'm expecting we won't want that at this point. The fact that paths
> need strings for low-level OS stuff is a historical and technical detail, so
> no need to drag the entire str type into it if we can provide a reasonable
> helper function (for either the ABC or magic method solution).

I'm not sure I understand what these points are about. Anyway,
disallowing str or bytes as pathnames will break backwards
compatibility if done at some point in the future. There's no way
around that.

But all this talk of mine about bytes is because it has not
been completely clear to me if something can break when converting a
bytes path to str. I did originally propose guaranteeing a str, but I
am so far only 85% convinced that that does not cause any problems. I
understand that fsencode(fsdecode(bytes_path)) should always be equal
to bytes_path. But can some other path operations fail when there are
surrogates in the strings? And again, not to forget DirEntry, which
may have a byte string path.
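
To make the round-trip part concrete, here is a minimal sketch. It
assumes a UTF-8 file system encoding with the surrogateescape error
handler, which is what os.fsdecode/os.fsencode typically use on posix;
the codec is spelled out explicitly so the result does not depend on
the locale, and the file name is made up for illustration.

```python
# Illustration only: a made-up file name, with the codec and error
# handler given explicitly instead of calling os.fsdecode()/os.fsencode().
raw = b"caf\xe9.txt"  # Latin-1 encoded name, not valid UTF-8

# The undecodable byte becomes a lone surrogate (0xE9 -> U+DCE9).
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same error handler restores the original bytes.
round_tripped = text.encode("utf-8", "surrogateescape")

# A strict encode, as happens when writing the name to a UTF-8 stream,
# rejects the lone surrogate.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

So the round trip itself holds, but the str in between is not something
every consumer can encode, which is the kind of failure I am asking about.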

Either way, I suppose os.fspath should accept anything that has
__fspath__ or is a str or bytes (whether these have the dunder method
or not). Then the options are either to return Union[str, bytes] or to
always return str. And if the latter does not cause any problems, I
like it way better, and it seems others would do too. And in that case
it would probably be time to deprecate bytes paths on posix too (on
Windows, this is already the case).
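
A rough sketch of the two options, assuming a helper along the lines of
the proposed os.fspath. The names fspath_sketch, allow_bytes and
FakePath are hypothetical, made up for illustration; nothing here is a
decided API.

```python
def fspath_sketch(path, allow_bytes=False):
    """Return a str (or, if allow_bytes is true, bytes) file system path."""
    # Plain strings are always accepted as-is.
    if isinstance(path, str):
        return path
    # bytes are either passed through or rejected, depending on the option.
    if isinstance(path, bytes):
        if allow_bytes:
            return path
        raise TypeError("bytes paths are not accepted")
    # Look the dunder method up on the type, as is usual for protocols.
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError("expected str, bytes or an object with __fspath__, "
                        "not " + type(path).__name__)
    result = method(path)
    if isinstance(result, str) or (allow_bytes and isinstance(result, bytes)):
        return result
    raise TypeError("__fspath__ returned " + type(result).__name__)


class FakePath:
    """Hypothetical path-like object used only for this example."""

    def __init__(self, text):
        self._text = text

    def __fspath__(self):
        return self._text
```

With allow_bytes=False the return type is always str; with
allow_bytes=True it is Union[str, bytes].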

But do we know that converting all paths to str does not cause any problems?

-Koos

From brett at python.org  Fri Apr  8 17:53:18 2016
From: brett at python.org (Brett Cannon)
Date: Fri, 08 Apr 2016 21:53:18 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAMiohoijSR5hqLHXy-cUKnC3ot3yY8OzWVwjEGmg=9EaMxMe8w@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>
 <CALGmxEK63=dvaSSmEre_TM=uE_Nd=uoOYNodCzLWv5PxQTWE8Q@mail.gmail.com>
 <5707DDF3.5030106@stoneleaf.us>
 <CAP1=2W5xRrUJfEqAa32aOzgNNTXhTY-pAKt+3tEK3iEstG7i2Q@mail.gmail.com>
 <CAMiohoijSR5hqLHXy-cUKnC3ot3yY8OzWVwjEGmg=9EaMxMe8w@mail.gmail.com>
Message-ID: <CAP1=2W6rpfXA30Fjgp6eM1kR=Ra2VH+xYoehjoXZurtG6pYttw@mail.gmail.com>

On Fri, 8 Apr 2016 at 14:23 Koos Zevenhoven <k7hoven at gmail.com> wrote:

> On Fri, Apr 8, 2016 at 8:34 PM, Brett Cannon <brett at python.org> wrote:
> > On Fri, 8 Apr 2016 at 09:39 Ethan Furman <ethan at stoneleaf.us> wrote:
> >> > I thought the whole point of all this is that not any old string can be
> >> > a path! (whereas any int can be an index). Unless we go with Chris A's
> >> > suggestion that this be a more generic lossless string protocol, rather
> >> > than just for paths.
> >>
> >> That does seem to be a valid point against str.__fspath__.
> >
> > Yep, and I'm expecting we won't want that at this point. The fact that paths
> > need strings for low-level OS stuff is a historical and technical detail, so
> > no need to drag the entire str type into it if we can provide a reasonable
> > helper function (for either the ABC or magic method solution).
>
> I'm not sure I understand what these points are about.


It means we most likely won't add a new method to str in regards to this
proposal.


> Anyway,
> disallowing str or bytes as pathnames will break backwards
> compatibility if done at some point in the future. There's no way
> around that.
>

No one is proposing disallowing str or bytes for a pre-existing API that
supports either. The whole point of this is to make APIs work with strings
and pathlib.


>
> But all this talk of mine about bytes is because it has not
> been completely clear to me if something can break when converting a
> bytes path to str. I did originally propose guaranteeing a str, but I
> am so far only 85% convinced that that does not cause any problems.


Depends on your definition of "problem". If you somehow blindly converted a
bytes object representing a path to a str without knowing its encoding you
will definitely break someone silently (and even os.fsdecode() isn't
fool-proof thanks to multiple encodings on a single file system).


> I
> understand that fsencode(fsdecode(bytes_path)) should always be equal
> to bytes_path. But can some other path operations fail when there are
> surrogates in the strings? And again, not to forget DirEntry, which
> may have a byte string path.
>

At this point no one wants to touch bytes paths. If you need that level of
control because of multiple encodings within a single file system then you
will probably have to stick with managing bytes paths on your own to get
the encoding right.

And just because DirEntry supports bytes doesn't mean that any magic method
it gains has to carry that forward (it can always raise a TypeError if
necessary).


>
> Either way, I suppose os.fspath should accept anything that has
> __fspath__ or is a str or bytes (whether these have the dunder method
> or not).


Maybe. I'm not sure if we will want to go down that route of both bytes and
str being supported out of the same function as that gets messy quickly.
The main reason os.scandir() supports it is so it can be a drop-in
replacement for os.listdir(). It really depends on how we choose to
structure the function in terms of just doing the right thing for objects
that follow the protocol or if we want to introduce some required structure
for the resulting path and implement some type guarantees so you have a
better idea of what you will be working with after calling the function.


> Then the options are either to return Union[str, bytes] or to
> always return str. And if the latter does not cause any problems, I
> like it way better, and it seems others would do too.


You don't have to convert byte paths to str, you can simply raise an
exception in the face of them.


> And in that case
> it would probably be time to deprecate bytes paths on posix too (on
> Windows, this is already the case).
>

Can't do that as Stephen Turnbull will tell you. :) At best we can
marginalize the support of bytes-based paths to only low-level APIs exposed
through the os package.

-Brett

From ericsnowcurrently at gmail.com  Fri Apr  8 17:57:52 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Fri, 8 Apr 2016 15:57:52 -0600
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
Message-ID: <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>

On Fri, Apr 8, 2016 at 12:25 PM, Brett Cannon <brett at python.org> wrote:
> I personally still like __ospath__ as well.

Same here.  The strings are essentially an OS-dependent serialization,
rather than related to a particular file system.

-eric

From chris.barker at noaa.gov  Fri Apr  8 18:21:12 2016
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Fri, 8 Apr 2016 15:21:12 -0700
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
Message-ID: <-8088150910827119255@unknownmsgid>

> On Apr 8, 2016, at 3:00 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>
>> On Fri, Apr 8, 2016 at 12:25 PM, Brett Cannon <brett at python.org> wrote:
>> I personally still like __ospath__ as well.
>
> Same here.  The strings are essentially an OS-dependent serialization,
> rather than related to a particular file system.

Huh? I thought the strings were an OS-independent, human readable
serialization and interchange format.

Bytes would be the OS-dependent serialization.

But yes, I suppose the file-system-level version would be inodes or something.

But this is a string that represents a path, thus __pathstr__. And the
term "path" is used all over the place (including os.path and pathlib)
for this particular type of path, so I don't see why we need the "fs"
or "os", other than the fact that __path__ is already taken.

But I'm looking forward to using this bike shed regardless of its
color, so that's the last I'll comment on that.

-CHB

From brett at python.org  Fri Apr  8 18:23:41 2016
From: brett at python.org (Brett Cannon)
Date: Fri, 08 Apr 2016 22:23:41 +0000
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <-8088150910827119255@unknownmsgid>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
 <-8088150910827119255@unknownmsgid>
Message-ID: <CAP1=2W4N9FdEYbBs9A1a0zK-dDnU4_bz+d_up85xTmu8n_J58A@mail.gmail.com>

On Fri, 8 Apr 2016 at 15:21 Chris Barker - NOAA Federal <
chris.barker at noaa.gov> wrote:

> > On Apr 8, 2016, at 3:00 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> >
> >> On Fri, Apr 8, 2016 at 12:25 PM, Brett Cannon <brett at python.org> wrote:
> >> I personally still like __ospath__ as well.
> >
> > Same here.  The strings are essentially an OS-dependent serialization,
> > rather than related to a particular file system.
>
> Huh? I thought the strings were an OS-independent, human readable
> serialization and interchange format.
>

Depends if you use `/` or `\` as your path separator if they are truly
OS-independent. :)

-Brett


>
> Bytes would be the OS-dependent serialization.
>
> But yes, I suppose the file-system-level version would be inodes or
> something.
>
> But this is a string that represents a path, thus __pathstr__. And the
> term "path" is used all over the place (including os.path and pathlib)
> for this particular type of path, so I don't see why we need the "fs"
> or "os", other than the fact that __path__ is already taken.
>
> But I'm looking forward to using this bike shed regardless of its
> color, so that's the last I'll comment on that.
>
> -CHB
>

From ericsnowcurrently at gmail.com  Fri Apr  8 18:28:03 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Fri, 8 Apr 2016 16:28:03 -0600
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
Message-ID: <CALFfu7A9T2mjC3D+qHsLY0canj8FxO95x1EckODMq85HZMx_QQ@mail.gmail.com>

On Fri, Apr 8, 2016 at 3:57 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> On Fri, Apr 8, 2016 at 12:25 PM, Brett Cannon <brett at python.org> wrote:
>> I personally still like __ospath__ as well.
>
> Same here.  The strings are essentially an OS-dependent serialization,
> rather than related to a particular file system.

Hmm.  It's important to note the distinction between a standardized
representation of a path and the OS-dependent representation.  That is
essentially the same distinction as provided by Go's "path" vs.
"path/filepath" packages.  pathlib provides an abstraction of FS
paths, but does it provide a standardized representation?  From what I
can tell you only ever get some OS-dependent representation.

All this matters because it impacts the value returned from
__ospath__().  Should it return the string representation of the path
for the current OS or some standardized representation?  I'd expect
the former.  However, if that is the expectation then something like
pathlib.PureWindowsPath will give you the wrong thing if your current
OS is Linux.  pathlib.PureWindowsPath.__ospath__() would have to fail
or first internally convert to pathlib.PurePosixPath?

On the other hand, it seems like the caller should be in charge of
deciding the required meaning.  That implies that returning a
standardized representation or even something like a
pathlib.PureGenericPath would be more appropriate.
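
A quick illustration of the concern, using only what pathlib already
offers.  The pure path classes can be instantiated on any OS, and str()
gives the representation of the path's own flavour rather than that of
the current OS.  The example paths are made up.

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath("C:/Users/example/data.txt")
posix = PurePosixPath("/home/example/data.txt")

# str() uses the separator of the path's own flavour, regardless of
# which OS this code runs on.
win_text = str(win)        # backslash-separated, even on Linux
posix_text = str(posix)

# as_posix() is the closest thing to a separator-normalized form:
# it only swaps backslashes for forward slashes.
win_forward = win.as_posix()
```

Neither result is "the path for the current OS" when the flavour and
the OS differ, so whatever __ospath__() returns for a PureWindowsPath
on Linux would have to be a deliberate choice.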

-eric

From k7hoven at gmail.com  Fri Apr  8 19:05:41 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Sat, 9 Apr 2016 02:05:41 +0300
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAP1=2W6rpfXA30Fjgp6eM1kR=Ra2VH+xYoehjoXZurtG6pYttw@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAP7+vJ+O6e5eRi1WVPy0MHU=QP1Ntv3ykAkV+OfaEhLB2cekKw@mail.gmail.com>
 <57044567.6070308@sdamon.com>
 <CAPTjJmpeK9F4JYZmpMPQjZzH5YNRuZ18Lp3A1v0K=Ks+vv9fNQ@mail.gmail.com>
 <CAP7+vJK_wuzW3e6VM7na8-USmMtPDF0UJicTGvcMYAfWbJWjoA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>
 <CALGmxEK63=dvaSSmEre_TM=uE_Nd=uoOYNodCzLWv5PxQTWE8Q@mail.gmail.com>
 <5707DDF3.5030106@stoneleaf.us>
 <CAP1=2W5xRrUJfEqAa32aOzgNNTXhTY-pAKt+3tEK3iEstG7i2Q@mail.gmail.com>
 <CAMiohoijSR5hqLHXy-cUKnC3ot3yY8OzWVwjEGmg=9EaMxMe8w@mail.gmail.com>
 <CAP1=2W6rpfXA30Fjgp6eM1kR=Ra2VH+xYoehjoXZurtG6pYttw@mail.gmail.com>
Message-ID: <CAMiohoimFSJ_fzs7CNGaWnhV5z-kopQ5EaAJcNWJu6bK8X3fWg@mail.gmail.com>

On Sat, Apr 9, 2016 at 12:53 AM, Brett Cannon <brett at python.org> wrote:
> On Fri, 8 Apr 2016 at 14:23 Koos Zevenhoven <k7hoven at gmail.com> wrote:
>
> At this point no one wants to touch bytes paths. If you need that level of
> control because of multiple encodings within a single file system then you
> will probably have to stick with managing bytes paths on your own to get the
> encoding right.

What does this mean? I assume you don't mean os.path.* would stop
dealing with bytes? And if not, then you seem to mean that os.fspath
would do nothing except call .__fspath__(). In that case, I think we
should go back to it being an attribute (or property) and a variation
of the now very famous idiom getattr(path, '__fspath__', path) and
perhaps have os.fspath do exactly that.
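
A minimal sketch of that idiom, assuming __fspath__ is a plain
attribute (FakePath is a hypothetical class used only for
illustration, and fspath here is a local function, not the real
os.fspath):

```python
class FakePath:
    """Hypothetical path-like object exposing its text as an attribute."""

    def __init__(self, text):
        self.__fspath__ = text


def fspath(path):
    # Use the attribute when present, otherwise hand back the argument.
    return getattr(path, '__fspath__', path)


from_object = fspath(FakePath('/tmp/spam'))
from_str = fspath('/tmp/spam')
from_bytes = fspath(b'/tmp/spam')
from_int = fspath(3)  # passes through unchecked

print(from_object, from_str, from_bytes, from_int)
```

Note that the bare idiom performs no type checking at all: anything
without the attribute, bytes and ints included, comes back unchanged.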

> And just because DirEntry supports bytes doesn't mean that any magic method
> it gains has to carry that forward (it can always raise a TypeError if
> necessary).

No, but what if some code gets pathnames from whatever other places
and passes them on to os.scandir. Whenever it happens to get a bytes
path, a TypeError gets raised, but only when it picks one of the
DirEntry objects and for instance tries to open(...) it. Of course,
I'm not sure how common this is.

> It really depends on how we choose to structure the
> function in terms of just doing the right thing for objects that follow the
> protocol or if we want to introduce some required structure for the
> resulting path and implement some type guarantees so you have a better idea
> of what you will be working with after calling the function.

Do you have an example of potential 'required structure'?

>> Then the options are either to return Union[str, bytes] or to
>> always return str. And if the latter does not cause any problems, I
>> like it way better, and it seems others would do too.
>
> You don't have to convert byte paths to str, you can simply raise an
> exception in the face of them.
>

I thought the point was for existing APIs to start supporting path
objects, wouldn't raising an exception break the API?

-Koos

From v+python at g.nevcal.com  Fri Apr  8 19:09:13 2016
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Fri, 8 Apr 2016 16:09:13 -0700
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CALFfu7A9T2mjC3D+qHsLY0canj8FxO95x1EckODMq85HZMx_QQ@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
 <CALFfu7A9T2mjC3D+qHsLY0canj8FxO95x1EckODMq85HZMx_QQ@mail.gmail.com>
Message-ID: <57083A19.3070808@g.nevcal.com>

On 4/8/2016 3:28 PM, Eric Snow wrote:
> All this matters because it impacts the value returned from
> __ospath__().  Should it return the string representation of the path
> for the current OS or some standardized representation?  I'd expect
> the former.  However, if that is the expectation then something like
> pathlib.PureWindowsPath will give you the wrong thing if your current
> OS is linux.  pathlib.PureWindowsPath.__ospath__() would have to fail
> or first internally convert to pathlib.PurePosixPath?
Now that Windows 10++ will run Ubuntu apps, will Python be able to tell 
the difference for when it should return Windows-format paths and 
Posix-format paths?

(I'm sure the answer is yes, the Python-for-Ubuntu running on Windows 
would do the latter, and the Python-for-Windows would do the former. 
Although, it is not clear what sys.platform will return, yet...)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20160408/2048d47d/attachment-0001.html>

From v+python at g.nevcal.com  Fri Apr  8 19:05:01 2016
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Fri, 8 Apr 2016 16:05:01 -0700
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
Message-ID: <5708391D.5030503@g.nevcal.com>

On 4/8/2016 11:25 AM, Brett Cannon wrote:
> I personally still like __ospath__ as well.
+1. Because they aren't always files... but what else they might be is 
os dependent.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20160408/17e6f218/attachment.html>

From ethan at stoneleaf.us  Fri Apr  8 19:33:31 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 08 Apr 2016 16:33:31 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <CAMiohoimFSJ_fzs7CNGaWnhV5z-kopQ5EaAJcNWJu6bK8X3fWg@mail.gmail.com>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>	<CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>	<ne28fo$flu$1@ger.gmane.org>	<CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>	<CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>	<CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>	<570526CE.5080401@stoneleaf.us>	<CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>	<CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>	<CALGmxEK63=dvaSSmEre_TM=uE_Nd=uoOYNodCzLWv5PxQTWE8Q@mail.gmail.com>	<5707DDF3.5030106@stoneleaf.us>	<CAP1=2W5xRrUJfEqAa32aOzgNNTXhTY-pAKt+3tEK3iEstG7i2Q@mail.gmail.com>	<CAMiohoijSR5hqLHXy-cUKnC3ot3yY8OzWVwjEGmg=9EaMxMe8w@mail.gmail.com>	<CAP1=2W6rpfXA30Fjgp6eM1kR=Ra2VH+xYoehjoXZurtG6pYttw@mail.gmail.com>
 <CAMiohoimFSJ_fzs7CNGaWnhV5z-kopQ5EaAJcNWJu6bK8X3fWg@mail.gmail.com>
Message-ID: <57083FCB.1000808@stoneleaf.us>

On 04/08/2016 04:05 PM, Koos Zevenhoven wrote:
> On Sat, Apr 9, 2016 at 12:53 AM, Brett Cannon wrote:
>> On Fri, 8 Apr 2016 at 14:23 Koos Zevenhoven wrote:
>>
>> At this point no one wants to touch bytes paths. If you need that level of
>> control because of multiple encodings within a single file system then you
>> will probably have to stick with managing bytes paths on your own to get the
>> encoding right.
>
> What does this mean? I assume you don't mean os.path.* would stop
> dealing with bytes?

No, it does not mean that.  It means the stuff in place won't change, 
but the stuff we're adding now to integrate with Path will only support 
str (which is one reason why os.path isn't going to die).

> And if not, then you seem to mean that os.fspath
> would do nothing except call .__fspath__().

Fair point.  So it should be something like this:

def fspath(thing):
     # look for path attribute
     string = getattr(thing, '__fspath__', None)
     if string is not None:
         return string
     # not found, do we have a str or bytes object?
     if isinstance(thing, (str, bytes)):
         return thing
     raise TypeError('`thing` must implement the __fspath__ protocol or '
                     'be an instance of str or bytes')
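
A self-contained usage sketch of the function above, still treating
__fspath__ as a plain attribute.  The Entry class is hypothetical and
merely stands in for something like DirEntry carrying a bytes path:

```python
def fspath(thing):
    # look for path attribute
    string = getattr(thing, '__fspath__', None)
    if string is not None:
        return string
    # not found, do we have a str or bytes object?
    if isinstance(thing, (str, bytes)):
        return thing
    raise TypeError('`thing` must implement the __fspath__ protocol or '
                    'be an instance of str or bytes')


class Entry:
    """Hypothetical stand-in for an object carrying a bytes path."""
    __fspath__ = b'./js'


accepted = [fspath('a.txt'), fspath(b'a.txt'), fspath(Entry())]

try:
    fspath(42)
except TypeError:
    rejected = True
else:
    rejected = False

print(accepted, rejected)
```

Unlike the bare getattr idiom, this version rejects objects that are
neither str, bytes, nor protocol implementers.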


>> And just because DirEntry supports bytes doesn't mean that any magic method
>> it gains has to carry that forward (it can always raise a TypeError if
>> necessary).
>
> No, but what if some code gets pathnames from whatever other places
> and passes them on to os.scandir. Whenever it happens to get a bytes
> path, a TypeError gets raised, but only when it picks one of the
> DirEntry objects and for instance tries to open(...) it. Of course,
> I'm not sure how common this is.

Yeah, I don't think this is a good idea.  Given that fspath() should be 
able to return bytes if bytes are passed in,  DirEntry's __fspath__ 
could return bytes to no ill effect.

I realize this may not be ideal, but throwing bytes to the wind is going 
to bite us in the end.

After all, the idea is to make these things work with the stdlib, and 
the stdlib accepts bytes for path strings.

--
~Ethan~

From larry at hastings.org  Fri Apr  8 20:56:10 2016
From: larry at hastings.org (Larry Hastings)
Date: Fri, 8 Apr 2016 17:56:10 -0700
Subject: [Python-Dev] Question about the current implementation of str
Message-ID: <5708532A.3040207@hastings.org>



I have a straightforward question about the str object, specifically the 
PyUnicodeObject.  I've tried reading the source to answer the question 
myself but it's nearly impenetrable.  So I was hoping someone here who 
understands the current implementation could answer it for me.

Although the str object is immutable from Python's perspective, the C 
object itself is mutable.  For example, for dynamically-created strings 
the hash field may be lazy-computed and cached inside the object.  I was 
wondering if there were other fields like this.  For example, are there 
similar lazy-computed cached objects for the different encoded versions 
(utf8 utf16) of the str?  What would really help is an exhaustive list of 
the fields of a str object that may ever change after the object's 
initial creation.  Thanks!


We now return you to the debate about the pathlib module,


//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20160408/b0c2638c/attachment.html>

From guido at python.org  Fri Apr  8 21:12:54 2016
From: guido at python.org (Guido van Rossum)
Date: Fri, 8 Apr 2016 18:12:54 -0700
Subject: [Python-Dev] Incomplete Internationalization in Argparse Module
In-Reply-To: <CAP1=2W6mujcL-fXmfPsRujebnA-aSpZAcFQGP8UfcmT7OpWfEw@mail.gmail.com>
References: <20160408210006.GB1484@slim>
 <CAP1=2W6mujcL-fXmfPsRujebnA-aSpZAcFQGP8UfcmT7OpWfEw@mail.gmail.com>
Message-ID: <CAP7+vJ+p4XY_kV6WVG=1N-S6vsgF2BnanYVNbnK31HYWuOqS0g@mail.gmail.com>

That string looks like it is aimed at the developer, not the user of
the program, so it makes sense not to translate it.

On Fri, Apr 8, 2016 at 2:07 PM, Brett Cannon <brett at python.org> wrote:
>
>
> On Fri, 8 Apr 2016 at 14:05 Grady Martin <sunnycemetery at gmail.com> wrote:
>>
>> Hello, all.  I was wondering if the following string was left untouched by
>> gettext for a purpose (from line 720 of argparse.py, in class
>> ArgumentError):
>>
>> 'argument %(argument_name)s: %(message)s'
>>
>> There may be other untranslatable strings in the argparse module, but I
>> have yet to encounter them in the wild.
>
>
> Probably so that anyone introspecting on the error message can count on
> somewhat of a consistent format (comes into play with doctest typically).
>



-- 
--Guido van Rossum (python.org/~guido)

From brett at python.org  Fri Apr  8 21:41:19 2016
From: brett at python.org (Brett Cannon)
Date: Sat, 09 Apr 2016 01:41:19 +0000
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <57083A19.3070808@g.nevcal.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
 <CALFfu7A9T2mjC3D+qHsLY0canj8FxO95x1EckODMq85HZMx_QQ@mail.gmail.com>
 <57083A19.3070808@g.nevcal.com>
Message-ID: <CAP1=2W5bAqkpmi7teRHNowg4nJDLhX-Nj48XPWPZTS6a5BtYcQ@mail.gmail.com>

On Fri, Apr 8, 2016, 16:13 Glenn Linderman <v+python at g.nevcal.com> wrote:

> On 4/8/2016 3:28 PM, Eric Snow wrote:
>
> All this matters because it impacts the value returned from
> __ospath__().  Should it return the string representation of the path
> for the current OS or some standardized representation?  I'd expect
> the former.  However, if that is the expectation then something like
> pathlib.PureWindowsPath will give you the wrong thing if your current
> OS is linux.  pathlib.PureWindowsPath.__ospath__() would have to fail
> or first internally convert to pathlib.PurePosixPath?
>
> Now that Windows 10++ will run Ubuntu apps, will Python be able to tell
> the difference for when it should return Windows-format paths and
> Posix-format paths?
>

All the bits of code in Python accept / as a separator on Windows so it
doesn't matter (but Ubuntu on Windows is Linux, so it will be / just like
any other Linux install).


> (I'm sure the answer is yes, the Python-for-Ubuntu running on Windows
> would do the latter, and the Python-for-Windows would do the former.
> Although, it is not clear what sys.platform will return, yet...)
>

It should return Linux.

-Brett

_______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/brett%40python.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20160409/d79cdeff/attachment.html>

From ericsnowcurrently at gmail.com  Fri Apr  8 22:45:32 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Fri, 8 Apr 2016 20:45:32 -0600
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CAP1=2W5bAqkpmi7teRHNowg4nJDLhX-Nj48XPWPZTS6a5BtYcQ@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
 <CALFfu7A9T2mjC3D+qHsLY0canj8FxO95x1EckODMq85HZMx_QQ@mail.gmail.com>
 <57083A19.3070808@g.nevcal.com>
 <CAP1=2W5bAqkpmi7teRHNowg4nJDLhX-Nj48XPWPZTS6a5BtYcQ@mail.gmail.com>
Message-ID: <CALFfu7CcJGAKyO2h_JNpHquoagYAOZw593_PEOBepE5drmzHZw@mail.gmail.com>

On Fri, Apr 8, 2016 at 7:41 PM, Brett Cannon <brett at python.org> wrote:
>
>
> On Fri, Apr 8, 2016, 16:13 Glenn Linderman <v+python at g.nevcal.com> wrote:
>>
>> On 4/8/2016 3:28 PM, Eric Snow wrote:
>>
>> All this matters because it impacts the value returned from
>> __ospath__().  Should it return the string representation of the path
>> for the current OS or some standardized representation?  I'd expect
>> the former.  However, if that is the expectation then something like
>> pathlib.PureWindowsPath will give you the wrong thing if your current
>> OS is linux.  pathlib.PureWindowsPath.__ospath__() would have to fail
>> or first internally convert to pathlib.PurePosixPath?
>>
>> Now that Windows 10++ will run Ubuntu apps, will Python be able to tell
>> the difference for when it should return Windows-format paths and
>> Posix-format paths?
>
>
> All the bits of code in Python accept / as a separator on Windows so it
> doesn't matter (but Ubuntu on Windows is Linux, so it will be / just like
> any other Linux install).

Technically it isn't linux. :)  It's the Ubuntu user-space using the
linux syscalls (like normal), and those syscalls are implemented as
light wrappers around the Windows kernel.  They even implemented fork.
On Windows.  There's no linux kernel involved.

>
>>
>> (I'm sure the answer is yes, the Python-for-Ubuntu running on Windows
>> would do the latter, and the Python-for-Windows would do the former.
>> Although, it is not clear what sys.platform will return, yet...)
>
>
> It should return Linux.

From screenshots it looks like lsb_release -a returns the normal
Ubuntu info [1] and uname -a says linux (don't know if that will
change). [2]  So yeah, sys.platform should return Linux.

-eric

[1] https://blogs.windows.com/buildingapps/2016/03/30/run-bash-on-ubuntu-on-windows/
[2] https://insights.ubuntu.com/2016/03/30/ubuntu-on-windows-the-ubuntu-userspace-for-windows-developers/

From ncoghlan at gmail.com  Sat Apr  9 02:48:45 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 9 Apr 2016 16:48:45 +1000
Subject: [Python-Dev] pathlib (was: Defining a path protocol)
In-Reply-To: <CA+OGgf659yEKH8_hfgZ2DiKcr8D7SGySzvvDfjo=ZwEk7t396g@mail.gmail.com>
References: <CA+OGgf659yEKH8_hfgZ2DiKcr8D7SGySzvvDfjo=ZwEk7t396g@mail.gmail.com>
Message-ID: <CADiSq7epja0XdVBd8ZPm2HjHzQp7hK3Zs86uXNv8q+ubx=o7iQ@mail.gmail.com>

On 8 April 2016 at 00:25, Jim J. Jewett <jimjjewett at gmail.com> wrote:
> (1)  I think the "built-in" should instead be a module-level function
> in the pathlib.  If you aren't already expecting pathlib paths, then
> you're just expecting strings to work anyhow, and a builtin isn't
> likely to be helpful.

Concrete data in relation to "Why not put the helper function in pathlib?":

    >>> import sys
    >>> orig_modules = set(sys.modules)
    >>> "os" in orig_modules
    True
    >>> import pathlib
    >>> extra_dependencies = set(sys.modules) - orig_modules
    >>> print(sorted(extra_dependencies))
    ['_collections', '_functools', '_heapq', '_operator', '_sre',
'collections', 'contextlib', 'copyreg', 'fnmatch', 'functools',
'heapq', 'itertools', 'keyword', 'ntpath', 'operator', 'pathlib',
're', 'reprlib', 'sre_compile', 'sre_constants', 'sre_parse',
'urllib', 'urllib.parse', 'weakref']

We want to be able to readily use the protocol helper in builtin
modules like os and low level Python modules like os.path, which means
we want it to be much lower down in the import hierarchy than pathlib.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sat Apr  9 02:58:54 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 9 Apr 2016 16:58:54 +1000
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>
References: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>
Message-ID: <CADiSq7eZESHNQztnXLHztEbYA_-iyLFSOW-1-s1jNfju0VpAMw@mail.gmail.com>

On 9 April 2016 at 02:02, Koos Zevenhoven <k7hoven at gmail.com> wrote:
> I'm still thinking a little bit about 'pathname', which to me sounds
> more like a string than fspath does [1]. It would be nice to have the
> string/path distinction especially when pathlib adoption grows larger.
> But who knows, maybe somewhere in the far future, no-one will care
> much about fspath, fsencode, fsdecode or os.path.

Ah, I like it - adding the "name" suffix nicely distinguishes the
protocol from the rich path objects in pathlib.

I'll catch up on Ethan's dedicated naming thread before commenting
further, though :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From victor.stinner at gmail.com  Sat Apr  9 03:07:01 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 9 Apr 2016 09:07:01 +0200
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>
References: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>
Message-ID: <CAMpsgwaVQbrDXKMQDXOOikhene98Ev_20vHap1-tSdG47wzu+Q@mail.gmail.com>

os.DirEntry doesn't support bytes: os.scandir() only accepts str. It's a
deliberate choice.

I strongly suggest supporting only Unicode for filenames in Python 3. So
__fspath__ must only return str, or a TypeError must be raised.

Victor
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20160409/307a16e3/attachment.html>

From ethan at stoneleaf.us  Sat Apr  9 03:16:29 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sat, 09 Apr 2016 00:16:29 -0700
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CAMpsgwaVQbrDXKMQDXOOikhene98Ev_20vHap1-tSdG47wzu+Q@mail.gmail.com>
References: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>
 <CAMpsgwaVQbrDXKMQDXOOikhene98Ev_20vHap1-tSdG47wzu+Q@mail.gmail.com>
Message-ID: <5708AC4D.3080809@stoneleaf.us>

On 04/09/2016 12:07 AM, Victor Stinner wrote:
> os.DirEntry doesn't support bytes: os.scandir() only accept str. It's a
> deliberate choice.

3.5.0 scandir supports bytes:

--> huh = list(scandir(b'.'))
--> huh
[<DirEntry b'minicourse-ajax-project'>, <DirEntry b'js'>, <DirEntry 
b'__MACOSX'>, <DirEntry b'index.xaml'>, <DirEntry b'css'>, <DirEntry 
b'index.html'>]

--> huh[0].path
b'./minicourse-ajax-project'
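
The same check as a self-contained script, for anyone who wants to
reproduce it (a sketch assuming Python 3.5 or later; the temporary
directory and the file name example.txt are made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so each listing has exactly one entry.
    open(os.path.join(tmp, 'example.txt'), 'w').close()

    str_entries = list(os.scandir(tmp))
    bytes_entries = list(os.scandir(os.fsencode(tmp)))

    str_name = str_entries[0].name
    bytes_name = bytes_entries[0].name
    bytes_path = bytes_entries[0].path

# The type of the entries follows the type of the argument.
print(repr(str_name), repr(bytes_name), type(bytes_path).__name__)
```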

--
~Ethan~

From ncoghlan at gmail.com  Sat Apr  9 03:18:10 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 9 Apr 2016 17:18:10 +1000
Subject: [Python-Dev] Question about the current implementation of str
In-Reply-To: <5708532A.3040207@hastings.org>
References: <5708532A.3040207@hastings.org>
Message-ID: <CADiSq7fPBrcZ-zs=xRfePVA-FTdqVLmD_4omBP6Cwd5Pf=swGA@mail.gmail.com>

On 9 April 2016 at 10:56, Larry Hastings <larry at hastings.org> wrote:
>
>
> I have a straightforward question about the str object, specifically the
> PyUnicodeObject.  I've tried reading the source to answer the question
> myself but it's nearly impenetrable.  So I was hoping someone here who
> understands the current implementation could answer it for me.
>
> Although the str object is immutable from Python's perspective, the C object
> itself is mutable.  For example, for dynamically-created strings the hash
> field may be lazy-computed and cached inside the object.  I was wondering if
> there were other fields like this.  For example, are there similar
> lazy-computed cached objects for the different encoded versions (utf8 utf16)
> of the str?  What would really help an exhaustive list of the fields of a
> str object that may ever change after the object's initial creation.

https://www.python.org/dev/peps/pep-0393/#specification should have
most of the relevant details.

Aside from the hash and the interned-or-not flag in the state, most
things should be locked once the string is ready, except that
generating the utf-8 and wchar_t forms is deferred until they're
needed if they're not the same as the canonical form.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sat Apr  9 03:48:38 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 9 Apr 2016 17:48:38 +1000
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
Message-ID: <CADiSq7ci_fK0S1u_PADFPNMR-4GRsOjcd8sJSXnSrFwo+Wc8cA@mail.gmail.com>

On 9 April 2016 at 04:25, Brett Cannon <brett at python.org> wrote:
> On Fri, 8 Apr 2016 at 11:13 Ethan Furman <ethan at stoneleaf.us> wrote:
>> On 04/08/2016 10:46 AM, Koos Zevenhoven wrote:
>>  > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker  wrote:
>>  >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote:
>>
>>  >>> I'm still thinking a little bit about 'pathname', which to me sounds
>>  >>> more like a string than fspath does.
>>  >>
>>  >>
>>  >> I like that a lot - or even "__pathstr__" or "__pathstring__"
>>  >> after all, we're making a big deal out of the fact that a path is
>>  >> *not a string*, but rather a string is a *representation* (or
>>  >> serialization) of a path.
>>
>> That's a decent point.
>>
>> So the plausible choices are, I think:
>>
>> - __fspath__  # File System Path -- possible confusion with Path
>
> +1

I like __fspath__, but I'm also sympathetic to Koos' point that we're
really dealing with path *names* being produced via this protocol,
rather than the paths themselves.

That would bring the completely explicit "__fspathname__" into the
mix, which would be comparable in length to "__getattribute__" as a
magic method name (both in terms of number of syllables and number of
characters).

Considering the helper function usage, here's some examples in
combination with os.fsencode and os.fsdecode:

    # Status quo for binary/text path conversions
    text_path = os.fsdecode(bytes_path)
    bytes_path = os.fsencode(text_path)

    # Getting a text path from an arbitrary object
    text_path = os.fspath(obj) # This doesn't scream "returns text!" to me
    text_path = os.fspathname(obj) # This does

    # Getting a binary path from an arbitrary object
    bytes_path = os.fsencode(os.fspath(obj))
    bytes_path = os.fsencode(os.fspathname(obj))

I'm starting to think the semantic nudge from the "name" suffix when
reading the code is worth the extra four characters when writing it
(keeping in mind that the whole point of this exercise is that most
folks *won't* be writing explicit conversions - the stdlib will handle
it on their behalf).
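
For reference, the two existing helpers mentioned above can be
exercised on their own.  This sketch uses only os.fsencode and
os.fsdecode, which exist today; the path is made up:

```python
import os

text_path = os.path.join('project', 'notes.txt')  # made-up path

bytes_path = os.fsencode(text_path)   # str -> bytes
round_trip = os.fsdecode(bytes_path)  # bytes -> str

# Each helper passes through values that already have the target type.
same_bytes = os.fsencode(bytes_path)
same_text = os.fsdecode(text_path)

print(repr(bytes_path), repr(round_trip))
```

A helper with either proposed name would slot in front of these calls,
as in the examples above.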

I also think the more explicit name helps answer some of the type
signature questions that have arisen:

1. Does os.fspathname return rich Path objects? No, it returns names
as str objects
2. Will file descriptors pass through os.fspathname? No, as they're
not names, they're numeric descriptors.
3. Will bytes-like objects pass through os.fspathname? No, as they're
not names, they're encodings of names

When the name is instead "os.fspath", the appropriate answers to those
three questions are far more debatable.

> I personally still like __ospath__ as well.

That one fails the "Is it ambiguous when spoken aloud?" test for me:
if someone mentions "oh-ess-path", are they talking about os.path or
__ospath__? With "eff-ess-path" or "eff-ess-path-name", that problem
doesn't arise.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From victor.stinner at gmail.com  Sat Apr  9 03:52:24 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 9 Apr 2016 09:52:24 +0200
Subject: [Python-Dev] Question about the current implementation of str
In-Reply-To: <5708532A.3040207@hastings.org>
References: <5708532A.3040207@hastings.org>
Message-ID: <CAMpsgwYHVuhMROmnw4DYB=+y-7MjAU7FYc3SV+52DRmu=cv0WA@mail.gmail.com>

On 9 Apr 2016 03:04, "Larry Hastings" <larry at hastings.org> wrote:
> Although the str object is immutable from Python's perspective, the C
> object itself is mutable.  For example, for dynamically-created strings the
> hash field may be lazy-computed and cached inside the object.

Yes, the hash is computed once on demand. It doesn't matter how you build
the string.

> I was wondering if there were other fields like this.  For example, are
> there similar lazy-computed cached objects for the different encoded
> versions (utf8 utf16) of the str?

Cached utf8 is only cached when you call the C functions filling this
cache. The Python str.encode('utf8') doesn't fill the cache, but it uses it.

On Windows, there is a cache for wchar_t* which is utf16. This format is
used by all C functions of the Windows API (Python should only use the
Unicode flavor of the Windows API).

I don't recall other caches.

> What would really help an exhaustive list of the fields of a str object
> that may ever change after the object's initial creation.

I don't recall exactly what happens if a cache is created and then the
string is modified. If I recall correctly, the cache is invalidated.

But the hash is used as a heuristic to decide if a string is "immutable"
or not, the refcount is also used by the heuristic. If the string is
immutable, an operation like resize must create a new string.

You can document the PEP 393 in Include/unicodeobject.h.

Victor

From sunnycemetery at gmail.com  Sat Apr  9 04:25:55 2016
From: sunnycemetery at gmail.com (Grady Martin)
Date: Sat, 9 Apr 2016 04:25:55 -0400
Subject: [Python-Dev] Incomplete Internationalization in Argparse Module
In-Reply-To: <CAP7+vJ+p4XY_kV6WVG=1N-S6vsgF2BnanYVNbnK31HYWuOqS0g@mail.gmail.com>
References: <20160408210006.GB1484@slim>
 <CAP1=2W6mujcL-fXmfPsRujebnA-aSpZAcFQGP8UfcmT7OpWfEw@mail.gmail.com>
 <CAP7+vJ+p4XY_kV6WVG=1N-S6vsgF2BnanYVNbnK31HYWuOqS0g@mail.gmail.com>
Message-ID: <20160409082555.GF1484@slim>

I agree.  However, an incorrect choice for an argument with a choices parameter results in this string.

On 2016-04-08 18:12, Guido van Rossum wrote:
>
>That string looks like it is aimed at the developer, not the user of
>the program, so it makes sense not to translate it.
>
>On Fri, Apr 8, 2016 at 2:07 PM, Brett Cannon <brett at python.org> wrote:
>>
>>
>> On Fri, 8 Apr 2016 at 14:05 Grady Martin <sunnycemetery at gmail.com> wrote:
>>>
>>> Hello, all.  I was wondering if the following string was left untouched by
>>> gettext for a purpose (from line 720 of argparse.py, in class
>>> ArgumentError):
>>>
>>> 'argument %(argument_name)s: %(message)s'
>>>
>>> There may be other untranslatable strings in the argparse module, but I
>>> have yet to encounter them in the wild.
>>
>>
>> Probably so that anyone introspecting on the error message can count on
>> somewhat of a consistent format (comes into play with doctest typically).
>>
>
>
>
>--
>--Guido van Rossum (python.org/~guido)

From storchaka at gmail.com  Sat Apr  9 05:00:30 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 9 Apr 2016 12:00:30 +0300
Subject: [Python-Dev] Question about the current implementation of str
In-Reply-To: <CAMpsgwYHVuhMROmnw4DYB=+y-7MjAU7FYc3SV+52DRmu=cv0WA@mail.gmail.com>
References: <5708532A.3040207@hastings.org>
 <CAMpsgwYHVuhMROmnw4DYB=+y-7MjAU7FYc3SV+52DRmu=cv0WA@mail.gmail.com>
Message-ID: <neagbf$8al$1@ger.gmane.org>

On 09.04.16 10:52, Victor Stinner wrote:
> On 9 Apr 2016 03:04, "Larry Hastings" <larry at hastings.org> wrote:
>  > Although the str object is immutable from Python's perspective, the C
> object itself is mutable.  For example, for dynamically-created strings
> the hash field may be lazy-computed and cached inside the object.
>
> Yes, the hash is computed once on demand. It doesn't matter how you
> build the string.
>
>  > I was wondering if there were other fields like this.  For example,
> are there similar lazy-computed cached objects for the different encoded
> versions (utf8 utf16) of the str?
>
> Cached utf8 is only cached when you call the C functions filling this
> cache. The Python str.encode('utf8') doesn't fill the cache, but it uses it.
>
> On Windows, there is a cache for wchar_t* which is utf16. This format is
> used by all C functions of the Windows API (Python should only use the
> Unicode flavor of the Windows API).
>
> I don't recall other caches.
>
>  > What would really help an exhaustive list of the fields of a str
> object that may ever change after the object's initial creation.
>
> I don't recall exactly what happens if a cache is created and then the
> string is modified. If I recall correctly, the cache is invalidated.

You must remember, some bugs with desynchronized utf8 and wchar_t* 
caches were fixed just a few months ago.

> But the hash is used as an heuristic to decide if a string is
> "immutable" or not, the refcount is also used by the heuristic. If the
> string is immutable, an operation like resize must create a new string.
>
> You can document the PEP 393 in Include/unicodeobject.h.

In the normal case the string object can be mutated only at creation time. 
But CPython uses some tricks that modify already-created strings if 
they have no external references and are not interned. For example, "a += 
b" or "a = a + b" can resize the "a" string.
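
Because the trick is applied only when nothing else refers to the
string, it is invisible from Python code.  A small sketch of the
guarantee that must keep holding (the variable names are made up):

```python
original = 'spam'
alias = original         # second reference to the same str object

original += ' and eggs'  # rebinds 'original'; 'alias' must not change

print(alias)
print(original)
```

Here the extra reference held by alias means the in-place resize
cannot be used, so a new string has to be created.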



From victor.stinner at gmail.com  Sat Apr  9 05:09:37 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 9 Apr 2016 11:09:37 +0200
Subject: [Python-Dev] Question about the current implementation of str
In-Reply-To: <CAMpsgwYHVuhMROmnw4DYB=+y-7MjAU7FYc3SV+52DRmu=cv0WA@mail.gmail.com>
References: <5708532A.3040207@hastings.org>
 <CAMpsgwYHVuhMROmnw4DYB=+y-7MjAU7FYc3SV+52DRmu=cv0WA@mail.gmail.com>
Message-ID: <CAMpsgwZWq48_THCJ1rvYUrTw5Ka=NjUMG_01F_4yNwstGe863w@mail.gmail.com>

2016-04-09 9:52 GMT+02:00 Victor Stinner <victor.stinner at gmail.com>:
> But the hash is used as a heuristic to decide if a string is "immutable" or
> not, the refcount is also used by the heuristic. If the string is immutable,
> an operation like resize must create a new string.

I'm talking about this private function:

static int
unicode_modifiable(PyObject *unicode)
{
    assert(_PyUnicode_CHECK(unicode));
    if (Py_REFCNT(unicode) != 1)
        return 0;
    if (_PyUnicode_HASH(unicode) != -1)
        return 0;
    if (PyUnicode_CHECK_INTERNED(unicode))
        return 0;
    if (!PyUnicode_CheckExact(unicode))
        return 0;
#ifdef Py_DEBUG
    /* singleton refcount is greater than 1 */
    assert(!unicode_is_singleton(unicode));
#endif
    return 1;
}

Victor

From g.rodola at gmail.com  Sat Apr  9 06:37:23 2016
From: g.rodola at gmail.com (Giampaolo Rodola')
Date: Sat, 9 Apr 2016 12:37:23 +0200
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CAPTjJmrZHw0C=3+6B5W8o5iCU5SsQhQUabJhTu2EpObs_z3TmQ@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CALGmxEKbJy1SBWszTnAi_2R0jc2sbGqtArBPFwyLprTHh2vMcw@mail.gmail.com>
 <CAMiohohcZnSNzvx+zuQ-pLvODo63C7nD7hrqOtKZWfaNn_pdpw@mail.gmail.com>
 <CALGmxEK0ykFzbxSqSagP2r8ozFUMkEBC0WFui+Cq_zkYHNcnow@mail.gmail.com>
 <CAPTjJmrZHw0C=3+6B5W8o5iCU5SsQhQUabJhTu2EpObs_z3TmQ@mail.gmail.com>
Message-ID: <CAFYqXL-W4_EzKX3=bMPY_iACdHkbjpZ6tDUpc_tDc7Mff0LAaw@mail.gmail.com>

On Fri, Apr 8, 2016 at 9:09 PM, Chris Angelico <rosuav at gmail.com> wrote:

> On Sat, Apr 9, 2016 at 5:03 AM, Chris Barker <chris.barker at noaa.gov>
> wrote:
> > On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven <k7hoven at gmail.com>
> wrote:
> >>
> >> >
> >> > __pathstr__ # pathstring
> >> >
> >>
> >> Or perhaps __pathstring__ in case it may be or return byte strings.
> >
> >
> > I'm fine with __pathstring__ , but I thought it was already decided that
> it
> > would NOT return a bytestring!
>
> I sincerely hope that's been settled on. There's no reason to have
> this ever return anything other than a str. (Famous last words, I
> know.)
>
> ChrisA
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/g.rodola%40gmail.com
>

I'm kind of scared about this: scared to state and be 100% sure that bytes
will *never ever* be returned.
As such I would call this __fspath__ or something, but I would definitely
avoid using "str".

-- 
Giampaolo - http://grodola.blogspot.com

From k7hoven at gmail.com  Sat Apr  9 06:51:23 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Sat, 9 Apr 2016 13:51:23 +0300
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <5708AC4D.3080809@stoneleaf.us>
References: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>
 <CAMpsgwaVQbrDXKMQDXOOikhene98Ev_20vHap1-tSdG47wzu+Q@mail.gmail.com>
 <5708AC4D.3080809@stoneleaf.us>
Message-ID: <CAMiohoiEt3FPaD7F82kSXhNZtzhGOXQbUkfWUW9HBx1ebxYC4w@mail.gmail.com>

On Sat, Apr 9, 2016 at 10:16 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/09/2016 12:07 AM, Victor Stinner wrote:
>>
>> os.DirEntry doesn't support bytes: os.scandir() only accepts str. It's a
>> deliberate choice.
>
>
> 3.5.0 scandir supports bytes:
>
> --> huh = list(scandir(b'.'))
> --> huh
> [<DirEntry b'minicourse-ajax-project'>, <DirEntry b'js'>, <DirEntry
> b'__MACOSX'>, <DirEntry b'index.xaml'>, <DirEntry b'css'>, <DirEntry
> b'index.html'>]
>
> --> huh[0].path
> b'./minicourse-ajax-project'
>
>

Maybe it's the bytes support in scandir that should be deprecated?
(And not bytes support in general, which cannot be done on posix, as I
hear Stephen T. will tell me).

-Koos

From victor.stinner at gmail.com  Sat Apr  9 08:43:19 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 9 Apr 2016 14:43:19 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
Message-ID: <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>

Please don't lose time trying yet another sandbox inside CPython. It's
just a waste of time. It's broken by design.

Please read my email about my attempt (pysandbox):
https://lwn.net/Articles/574323/

And the LWN article:
https://lwn.net/Articles/574215/

There are a lot of safe ways to run CPython inside a sandbox (and not the
opposite).

I started as you did, adding more and more things to a blacklist, but it
doesn't work.

See the pysandbox test suite for a lot of ways to escape a sandbox. CPython
has a list of code known to crash CPython (I don't recall the directory in
the sources), even with the latest version of CPython.

Victor

From fijall at gmail.com  Sat Apr  9 08:47:55 2016
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sat, 9 Apr 2016 15:47:55 +0300
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
Message-ID: <CAK5idxRaFPmyt7K1Eh0Vh_cQ+Bz1BdgWL6Vp7SqYHLUUunc2uA@mail.gmail.com>

I'm with Victor here. In fact, I tried (and failed) to convince Victor
that the approach is entirely unworkable when he was starting. Don't
be the next one :-)

On Sat, Apr 9, 2016 at 3:43 PM, Victor Stinner <victor.stinner at gmail.com> wrote:
> Please don't lose time trying yet another sandbox inside CPython. It's just
> a waste of time. It's broken by design.
>
> Please read my email about my attempt (pysandbox):
> https://lwn.net/Articles/574323/
>
> And the LWN article:
> https://lwn.net/Articles/574215/
>
> There are a lot of safe ways to run CPython inside a sandbox (and not the
> opposite).
>
> I started as you did, adding more and more things to a blacklist, but it
> doesn't work.
>
> See the pysandbox test suite for a lot of ways to escape a sandbox. CPython
> has a list of code known to crash CPython (I don't recall the directory in
> the sources), even with the latest version of CPython.
>
> Victor

From rdmurray at bitdance.com  Sat Apr  9 09:02:04 2016
From: rdmurray at bitdance.com (R. David Murray)
Date: Sat, 09 Apr 2016 09:02:04 -0400
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CADiSq7ci_fK0S1u_PADFPNMR-4GRsOjcd8sJSXnSrFwo+Wc8cA@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CADiSq7ci_fK0S1u_PADFPNMR-4GRsOjcd8sJSXnSrFwo+Wc8cA@mail.gmail.com>
Message-ID: <20160409130206.6E4B1B14158@webabinitio.net>

On Sat, 09 Apr 2016 17:48:38 +1000, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 9 April 2016 at 04:25, Brett Cannon <brett at python.org> wrote:
> > On Fri, 8 Apr 2016 at 11:13 Ethan Furman <ethan at stoneleaf.us> wrote:
> >> On 04/08/2016 10:46 AM, Koos Zevenhoven wrote:
> >>  > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker  wrote:
> >>  >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote:
> >>
> >>  >>> I'm still thinking a little bit about 'pathname', which to me sounds
> >>  >>> more like a string than fspath does.
> >>  >>
> >>  >>
> >>  >> I like that a lot - or even "__pathstr__" or "__pathstring__"
> >>  >> after all, we're making a big deal out of the fact that a path is
> >>  >> *not a string*, but rather a string is a *representation* (or
> >>  >> serialization) of a path.
> >>
> >> That's a decent point.
> >>
> >> So the plausible choices are, I think:
> >>
> >> - __fspath__  # File System Path -- possible confusion with Path
> >
> > +1
> 
> I like __fspath__, but I'm also sympathetic to Koos' point that we're
> really dealing with path *names* being produced via this protocol,
> rather than the paths themselves.
> 
> That would bring the completely explicit "__fspathname__" into the
> mix, which would be comparable in length to "__getattribute__" as a
> magic method name (both in terms of number of syllables and number of
> characters).

I'm not going to vote -1, but for the record I have no real intuition
as to what a "path name" would be.  An arbitrary identifier that we're
using to refer to an os path?

That is, a 'filename' is the identifier we've assigned to this thing
pointed to by an inode in linux, but an os path is a text representation
of the path from the root filename to a specified filename.  That is,
the path *is* the name, so to say "path name" sounds redundant and
confusing to me.

--David

From Nikolaus at rath.org  Sat Apr  9 10:32:28 2016
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Sat, 09 Apr 2016 07:32:28 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <12E898F1-0A5D-4504-9E7F-08A509BCAEEB@stufft.io> (Donald Stufft's
 message of "Thu, 7 Apr 2016 07:03:56 -0400")
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid>
 <5705FB0C.2090705@canterbury.ac.nz> <5705FEBF.3070301@stoneleaf.us>
 <87oa9l3dab.fsf@thinkpad.rath.org>
 <12E898F1-0A5D-4504-9E7F-08A509BCAEEB@stufft.io>
Message-ID: <878u0mzwcj.fsf@vostro.rath.org>

On Apr 07 2016, Donald Stufft <donald at stufft.io> wrote:
>> On Apr 7, 2016, at 6:48 AM, Nikolaus Rath <Nikolaus at rath.org> wrote:
>> 
>> Does anyone anticipate any classes other than those from pathlib to come
>> with such a method?
>
>
> It seems like it would be reasonable for pathlib.Path to call fspath on the
> path passed to pathlib.Path.__init__, which would mean that if other libraries
> implemented __fspath__ then you could pass their path objects to pathlib and
> it would just work (and similarly, if they also called fspath it would enable
> interoperation between all of the various path libraries).

Indeed, but my question is: is this actually going to happen? Are there
going to be other libraries that will implement __fspath__, and will
there be demand for pathlib to support them?


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             "Time flies like an arrow, fruit flies like a Banana."

From guido at python.org  Sat Apr  9 11:16:41 2016
From: guido at python.org (Guido van Rossum)
Date: Sat, 9 Apr 2016 08:16:41 -0700
Subject: [Python-Dev] Incomplete Internationalization in Argparse Module
In-Reply-To: <20160409082555.GF1484@slim>
References: <20160408210006.GB1484@slim>
 <CAP1=2W6mujcL-fXmfPsRujebnA-aSpZAcFQGP8UfcmT7OpWfEw@mail.gmail.com>
 <CAP7+vJ+p4XY_kV6WVG=1N-S6vsgF2BnanYVNbnK31HYWuOqS0g@mail.gmail.com>
 <20160409082555.GF1484@slim>
Message-ID: <CAP7+vJ+-wRyqxfDFAF8RsP7X-wKiKdhm85wKQN7Ymdq=7=ZS4A@mail.gmail.com>

OK, so this should be taken to the bug tracker.

On Saturday, April 9, 2016, Grady Martin <sunnycemetery at gmail.com> wrote:

> I agree.  However, an incorrect choice for an argument with a choices
> parameter results in this string.
>
> On 2016-04-08 18:12, Guido van Rossum wrote:
>
>>
>> That string looks like it is aimed at the developer, not the user of
>> the program, so it makes sense not to translate it.
>>
>> On Fri, Apr 8, 2016 at 2:07 PM, Brett Cannon <brett at python.org> wrote:
>>
>>>
>>>
>>> On Fri, 8 Apr 2016 at 14:05 Grady Martin <sunnycemetery at gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hello, all.  I was wondering if the following string was left untouched
>>>> by
>>>> gettext for a purpose (from line 720 of argparse.py, in class
>>>> ArgumentError):
>>>>
>>>> 'argument %(argument_name)s: %(message)s'
>>>>
>>>> There may be other untranslatable strings in the argparse module, but I
>>>> have yet to encounter them in the wild.
>>>>
>>>
>>>
>>> Probably so that anyone introspecting on the error message can count on
>>> somewhat of a consistent format (comes into play with doctest typically).
>>
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>>
>

-- 
--Guido (mobile)

From ethan at stoneleaf.us  Sat Apr  9 11:30:05 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sat, 09 Apr 2016 08:30:05 -0700
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib
 stop being provisional?)
In-Reply-To: <CAMiohoiEt3FPaD7F82kSXhNZtzhGOXQbUkfWUW9HBx1ebxYC4w@mail.gmail.com>
References: <CAMiohoivVc1K2TZcqnBUqJDCDu4Zaji5UP8x1+XF4Efzttts4A@mail.gmail.com>	<CAMpsgwaVQbrDXKMQDXOOikhene98Ev_20vHap1-tSdG47wzu+Q@mail.gmail.com>	<5708AC4D.3080809@stoneleaf.us>
 <CAMiohoiEt3FPaD7F82kSXhNZtzhGOXQbUkfWUW9HBx1ebxYC4w@mail.gmail.com>
Message-ID: <57091FFD.3070909@stoneleaf.us>

On 04/09/2016 03:51 AM, Koos Zevenhoven wrote:
> On Sat, Apr 9, 2016 at 10:16 AM, Ethan Furman <ethan at stoneleaf.us> wrote:

>> 3.5.0 scandir supports bytes:
>
> Maybe it's the bytes support in scandir that should be deprecated?
> (And not bytes support in general, which cannot be done on posix, as I
> hear Stephen T. will tell me).

No, scandir is a low-level function -- it needs to support bytes.
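
A minimal sketch of that behaviour: os.scandir() hands back names of the
same type as the directory argument. The helper list_names and the file
name example.txt are made up for illustration, and the "with" form of
scandir assumes Python 3.6 or later:

```python
import os
import tempfile


def list_names(directory):
    """Return the sorted entry names; their type follows the argument type."""
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries)


with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so the directory has something to list.
    open(os.path.join(tmp, "example.txt"), "w").close()
    print(list_names(tmp))               # names come back as str
    print(list_names(os.fsencode(tmp)))  # names come back as bytes
```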

--
~Ethan~


From ethan at stoneleaf.us  Sat Apr  9 11:39:30 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sat, 09 Apr 2016 08:39:30 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <878u0mzwcj.fsf@vostro.rath.org>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <57054FFB.5070709@stoneleaf.us>
 <CAP1=2W6Vm9xcWqsgyayroc2rTUQVXMCY4CMfKt3Nm0EnqTGeUA@mail.gmail.com>
 <CACac1F-aRv2xgp8HaLeiwHP5D8U0=HqusjpkpzdhRV_pRkEqCg@mail.gmail.com>
 <CAP1=2W77K0t-+OTWnHq+vz0=ATA5ZQ+Xq1aPm=-d3bQSiSUY_A@mail.gmail.com>
 <CACac1F8Hn4wAYR0OvNcF0eSJNNK4+Z3wRRfXEuC5uBWTmMXB=A@mail.gmail.com>
 <CAP1=2W5fpwwkSP5Uj8MT6s3+p_e4ti3scG=n2S8S=bu2fK1NVw@mail.gmail.com>
 <CAGE7PNJrn4UkmVRoMgsK0nUiVac66UrZ4-uofkt3wJj7Bj80+A@mail.gmail.com>
 <CAP1=2W6V-pkPfPvn+pZKbVwtN-2NQtto7oJAhmM488B-EQCZMA@mail.gmail.com>
 <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid>
 <5705FB0C.2090705@canterbury.ac.nz> <5705FEBF.3070301@stoneleaf.us>
 <87oa9l3dab.fsf@thinkpad.rath.org>
 <12E898F1-0A5D-4504-9E7F-08A509BCAEEB@stufft.io>
 <878u0mzwcj.fsf@vostro.rath.org>
Message-ID: <57092232.6010002@stoneleaf.us>

On 04/09/2016 07:32 AM, Nikolaus Rath wrote:
> On Apr 07 2016, Donald Stufft <donald at stufft.io> wrote:
>>> On Apr 7, 2016, at 6:48 AM, Nikolaus Rath <Nikolaus at rath.org> wrote:
>>>
>>> Does anyone anticipate any classes other than those from pathlib to come
>>> with such a method?
>>
>>
>> It seems like it would be reasonable for pathlib.Path to call fspath on the
>> path passed to pathlib.Path.__init__, which would mean that if other libraries
>> implemented __fspath__ then you could pass their path objects to pathlib and
>> it would just work (and similarly, if they also called fspath it would enable
>> interoperation between all of the various path libraries).
>
> Indeed, but my question is: is this actually going to happen? Are there
> going to be other libraries that will implement __fspath__, and will
> there be demand for pathlib to support them?

There will be at least one.  :)

--
~Ethan~

From ethan at stoneleaf.us  Sat Apr  9 12:41:01 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sat, 09 Apr 2016 09:41:01 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs
 for __fspath__ and os.fspath()
Message-ID: <5709309D.8030007@stoneleaf.us>

On 04/09/2016 12:48 AM, Nick Coghlan wrote:

 > Considering the helper function usage, here's some examples in
 > combination with os.fsencode and os.fsdecode:
 >
 >   # Status quo for binary/text path conversions
 >   text_path = os.fsdecode(bytes_path)
 >   bytes_path = os.fsencode(text_path)
 >
 >   # Getting a text path from an arbitrary object
 >   text_path = os.fspath(obj) # This doesn't scream "returns text!"
 >   text_path = os.fspathname(obj) # This does
 >
 >   # Getting a binary path from an arbitrary object
 >   bytes_path = os.fsencode(os.fspath(obj))
 >   bytes_path = os.fsencode(os.fspathname(obj))
 >
 > I'm starting to think the semantic nudge from the "name" suffix when
 > reading the code is worth the extra four characters when writing it
 > (keeping in mind that the whole point of this exercise is that most
 > folks *won't* be writing explicit conversions - the stdlib will handle
 > it on their behalf).
 >
 > I also think the more explicit name helps answer some of the type
 > signature questions that have arisen:
 >
 > 1. Does os.fspathname return rich Path objects? No, it returns names
 > as str objects
 > 2. Will file descriptors pass through os.fspathname? No, as they're
 > not names, they're numeric descriptors.
 > 3. Will bytes-like objects pass through os.fspathname? No, as they're
 > not names, they're encodings of names

This worries me.

I know the primary purpose of this change is to enable pathlib and os 
and the rest of the stdlib to work together, but consider . . .

If adding a new attribute/method was as far as we went, new code (stdlib 
or otherwise) would look like:

   if isinstance(a_path_thingy, bytes):
       # because os can accept bytes
       pass
   elif isinstance(a_path_thingy, str):
       # but it's usually text
       pass
   elif hasattr(a_path_thingy, '__fspath__'):
       a_path_thingy = a_path_thingy.__fspath__()
   else:
       raise TypeError('not a valid path')
   # do something with the path

If we add os.fspath(), but don't allow bytes to be returned from it, our 
above example looks more like:

   if isinstance(a_path_thingy, bytes):
       # because os can accept bytes
       pass
   else:
       a_path_thingy = os.fspath(a_path_thingy)
   # do something with the path

Yes, it's better -- but it still requires a pre-check before calling 
os.fspath().

It is my contention that this is better:

   a_path_thingy = os.fspath(a_path_thingy)

This raises two issues:

1) Part of the stdlib is the new os.scandir() function, which can work
    with, and return, both bytes and text -- if __fspath__ can only
    hold text, DirEntry will not get the __fspath__ method added,
    and the pre-check, boiler-plate code will flourish;

2) pathlib.Path accepts bytes -- so what happens when a byte-derived
    Path is passed to os.fspath()?  Is a TypeError raised?  Do we
    guess and auto-convert with fsdecode()?

I think the best answer is to

- let __fspath__ hold bytes as well as text
- let fspath() return bytes as well as text
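
A rough sketch of what such a permissive helper could look like. This is
only an illustration of the proposal above, not an actual os.fspath()
implementation; the names fspath_sketch and BytesEntry are hypothetical:

```python
def fspath_sketch(path):
    """Return str or bytes unchanged; otherwise ask the object's __fspath__."""
    if isinstance(path, (str, bytes)):
        return path
    # Look the method up on the type, as is usual for special methods.
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError("not a valid path: %s" % type(path).__name__)
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ must return str or bytes, not %s"
                        % type(result).__name__)
    return result


class BytesEntry:
    """Hypothetical DirEntry-like object that holds a bytes path."""

    def __init__(self, raw_path):
        self.raw_path = raw_path

    def __fspath__(self):
        return self.raw_path


print(fspath_sketch("spam.txt"))
print(fspath_sketch(b"spam.txt"))
print(fspath_sketch(BytesEntry(b"./js")))
```

With a helper like this, callers need no isinstance() pre-check; anything
that is neither str, bytes, nor a provider of __fspath__ gets a TypeError.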

--
~Ethan~

From sunnycemetery at gmail.com  Sat Apr  9 18:55:41 2016
From: sunnycemetery at gmail.com (Grady Martin)
Date: Sat, 9 Apr 2016 18:55:41 -0400
Subject: [Python-Dev] Incomplete Internationalization in Argparse Module
In-Reply-To: <CAP7+vJ+-wRyqxfDFAF8RsP7X-wKiKdhm85wKQN7Ymdq=7=ZS4A@mail.gmail.com>
References: <20160408210006.GB1484@slim>
 <CAP1=2W6mujcL-fXmfPsRujebnA-aSpZAcFQGP8UfcmT7OpWfEw@mail.gmail.com>
 <CAP7+vJ+p4XY_kV6WVG=1N-S6vsgF2BnanYVNbnK31HYWuOqS0g@mail.gmail.com>
 <20160409082555.GF1484@slim>
 <CAP7+vJ+-wRyqxfDFAF8RsP7X-wKiKdhm85wKQN7Ymdq=7=ZS4A@mail.gmail.com>
Message-ID: <20160409225541.GI1484@slim>

Excellent.  Issue/patch here:

http://bugs.python.org/issue26726
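
For context, a small sketch of how that string reaches the end user when a
choices check fails. CapturingParser and error_text are hypothetical names
used only for this illustration; overriding error() merely keeps the
example from printing usage and exiting:

```python
import argparse


class CapturingParser(argparse.ArgumentParser):
    """Hypothetical subclass: raise instead of printing usage and exiting."""

    def error(self, message):
        raise ValueError(message)


parser = CapturingParser(prog="demo")
parser.add_argument("--color", choices=["red", "green"])


def error_text(argv):
    """Return the message argparse would show the user, or None on success."""
    try:
        parser.parse_args(argv)
    except ValueError as exc:
        return str(exc)
    return None


print(error_text(["--color", "purple"]))
```

The printed message starts with "argument --color: ", which is the
'argument %(argument_name)s: %(message)s' string in question.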

On 2016-04-09 08:16, Guido van Rossum wrote:
>
>OK, so this should be taken to the bug tracker.
>
>On Saturday, April 9, 2016, Grady Martin <sunnycemetery at gmail.com> wrote:
>
>> I agree.  However, an incorrect choice for an argument with a choices
>> parameter results in this string.
>>
>> On 2016-04-08 18:12, Guido van Rossum wrote:
>>
>>>
>>> That string looks like it is aimed at the developer, not the user of
>>> the program, so it makes sense not to translate it.
>>>
>>> On Fri, Apr 8, 2016 at 2:07 PM, Brett Cannon <brett at python.org> wrote:
>>>
>>>>
>>>>
>>>> On Fri, 8 Apr 2016 at 14:05 Grady Martin <sunnycemetery at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hello, all.  I was wondering if the following string was left untouched
>>>>> by
>>>>> gettext for a purpose (from line 720 of argparse.py, in class
>>>>> ArgumentError):
>>>>>
>>>>> 'argument %(argument_name)s: %(message)s'
>>>>>
>>>>> There may be other untranslatable strings in the argparse module, but I
>>>>> have yet to encounter them in the wild.
>>>>>
>>>>
>>>>
>>>> Probably so that anyone introspecting on the error message can count on
>>>> somewhat of a consistent format (comes into play with doctest typically).
>>>
>>>
>>> --
>>> --Guido van Rossum (python.org/~guido)
>>>
>>
>
>--
>--Guido (mobile)

From ncoghlan at gmail.com  Sun Apr 10 00:51:23 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 10 Apr 2016 14:51:23 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
Message-ID: <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>

On 9 April 2016 at 22:43, Victor Stinner <victor.stinner at gmail.com> wrote:
> Please don't lose time trying yet another sandbox inside CPython. It's just
> a waste of time. It's broken by design.
>
> Please read my email about my attempt (pysandbox):
> https://lwn.net/Articles/574323/
>
> And the LWN article:
> https://lwn.net/Articles/574215/
>
> There are a lot of safe ways to run CPython inside a sandbox (and not the
> opposite).
>
> I started as you did, adding more and more things to a blacklist, but it
> doesn't work.
>
> See the pysandbox test suite for a lot of ways to escape a sandbox. CPython
> has a list of code known to crash CPython (I don't recall the directory in
> the sources), even with the latest version of CPython.

They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers

There's also https://hg.python.org/cpython/file/tip/Lib/test/test_crashers.py
which was designed to run them regularly to catch when they were
resolved, but it was too fragile and tended to hang the buildbots.

Even without those considerations though, there are system level
denial of service attacks that untrusted code can perform without even
trying to break out of the sandbox - the most naive is "while 1:
pass", but there are more interesting ones like "from itertools import
count; sum(count())", or even "sum(iter(int, 1))" and "list(iter(int,
1))".
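
To see why those expressions never finish, here is a small sketch that
peeks at the first few items of each iterator instead of exhausting it.
The helper name first_items is made up for illustration:

```python
import itertools


def first_items(iterable, n):
    """Take only the first n items, so an endless iterator is safe to inspect."""
    return list(itertools.islice(iterable, n))


# count() yields 0, 1, 2, ... without end.
print(first_items(itertools.count(), 5))  # [0, 1, 2, 3, 4]

# iter(int, 1) calls int() until it returns 1; int() always returns 0.
print(first_items(iter(int, 1), 5))       # [0, 0, 0, 0, 0]
```

Feeding either iterator to sum() or list() directly keeps the interpreter
busy (and, for list(), allocating memory) indefinitely.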

Operating system level security sandboxes still aren't particularly
easy to use correctly, but they're a lot more reliable than language
runtime level sandboxes, can be used to defend against many more
attack vectors, and even offer increased flexibility (e.g. "can write
to these directories, but no others", "can read these files, but no
others", "can contact these IP addresses, but no others").

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sun Apr 10 01:04:47 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 10 Apr 2016 15:04:47 +1000
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <20160409130206.6E4B1B14158@webabinitio.net>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CADiSq7ci_fK0S1u_PADFPNMR-4GRsOjcd8sJSXnSrFwo+Wc8cA@mail.gmail.com>
 <20160409130206.6E4B1B14158@webabinitio.net>
Message-ID: <CADiSq7czs0D0GRRVveb8XTS2KRaH-iAh9WLw60ROu3tQT6JpsQ@mail.gmail.com>

On 9 April 2016 at 23:02, R. David Murray <rdmurray at bitdance.com> wrote:
> That is, a 'filename' is the identifier we've assigned to this thing
> pointed to by an inode in linux, but an os path is a text representation
> of the path from the root filename to a specified filename.  That is,
> the path *is* the name, so to say "path name" sounds redundant and
> confusing to me.

"The path is the name" is a true statement in the context of:

1. The way *nix APIs work
2. Existing filesystem interfaces in the standard library
3. Path abstractions that inherit from str/unicode

It's no longer true in the context of pathlib - there, the path name
is a serialised representation of a rich path object.

It's also not really true in the context of Python 3 in general -
bytes-like objects are an encoding of the path name, rather than the
name itself.

This means that "path" has become ambiguous due to the changing
context - do we mean the path name representation, the binary encoding
of that name, or a higher level rich path object?

We're never going to be able to eliminate that ambiguity (Python's
*nix & C roots run too deep for that), but we *can* potentially
standardise the terms used when disambiguation is needed: path name
(str), encoded path name (bytes-like object), rich path object (object
implementing the new protocol)
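
A short sketch of the three terms side by side, using only existing APIs.
The path /tmp/example.txt is just an example value:

```python
import os
import pathlib

# Rich path object: structured, with methods and attributes.
rich_path = pathlib.PurePosixPath("/tmp") / "example.txt"

# Path name: the str serialisation of the rich object.
path_name = str(rich_path)

# Encoded path name: the bytes encoding of that name.
encoded_name = os.fsencode(path_name)

for value in (rich_path, path_name, encoded_name):
    print(type(value).__name__, repr(value))
```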

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From steve at pearwood.info  Sun Apr 10 01:08:45 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 10 Apr 2016 15:08:45 +1000
Subject: [Python-Dev] PEP 506 secrets module
In-Reply-To: <CAP7+vJ+7cR3do5LOApqFjR+PvavKdOpA2Q0rLC0pYAAS4LGwKQ@mail.gmail.com>
References: <20151016005711.GC11980@ando.pearwood.info>
 <CAP1=2W6p3CK-_ksvf7Jxk9UC2ugniraqkRGvv2FGrVxnvuhv=Q@mail.gmail.com>
 <CAP7+vJ+7cR3do5LOApqFjR+PvavKdOpA2Q0rLC0pYAAS4LGwKQ@mail.gmail.com>
Message-ID: <20160410050845.GA12526@ando.pearwood.info>

I've just spotted this email from Guido, sorry about the delay in 
responding.

Further comments below.


On Thu, Jan 14, 2016 at 10:47:09AM -0800, Guido van Rossum wrote:

> I think the discussion petered out and nobody asked me to approve it yet
> (or I lost track of it). I'm almost happy to approve it in the current
> state. My only quibble is with some naming -- I'm not sure that a
> super-generic name like 'equal' is better than the original
> ('compare_digest'), 

Changed.


> and I would have picked a different name for token_url
> -- probably token_urlsafe. But maybe Steven can convince me that the names
> currently in the PEP are better.

Changed.


> (I also don't like the wishy-washy
> position of the PEP on the actual specs of the proposed functions. But I'm
> fine with the actual implementation shown as the spec.)

I'm not really sure what you want me to do to improve that. Can you be 
more concrete about what you would like the PEP to say?


I haven't updated the PEP yet, but the newest version of the secrets 
module with the changes requested is here:

https://bitbucket.org/sdaprano/secrets



-- 
Steve

From ncoghlan at gmail.com  Sun Apr 10 01:31:30 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 10 Apr 2016 15:31:30 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <5709309D.8030007@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
Message-ID: <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>

On 10 April 2016 at 02:41, Ethan Furman <ethan at stoneleaf.us> wrote:
> If we add os.fspath(), but don't allow bytes to be returned from it, our
> above example looks more like:
>
>   if isinstance(a_path_thingy, bytes):
>       # because os can accept bytes
>       pass
>   else:
>       a_path_thingy = os.fspath(a_path_thingy)
>   # do something with the path
>
> Yes, it's better -- but it still requires a pre-check before calling
> os.fspath().
>
> It is my contention that this is better:
>
>   a_path_thingy = os.fspath(a_path_thingy)

That approach often doesn't work, though - by design, there are
situations where you can't transparently handle bytes and str with the
same code path in Python 3 the way you could in Python 2.

When somebody hands you bytes rather than text you need to worry about
the encoding, and you need to worry about returning bytes rather than
text yourself. https://hg.python.org/cpython/rev/e44410e5928e#l4.1
provides an illustration of how fiddly that can get, and that's in the
URL context - cross-platform filesystem path handling is worse, since
you need to worry about the significant differences between the way
Windows and *nix handle binary paths, and you can't use os.sep
directly any more (since that's always text).

> This raises two issues:
>
> 1) Part of the stdlib is the new os.scandir() function, which can work
>    with, and return, both bytes and text -- if __fspath__ can only
>    hold text, DirEntry will not get the __fspath__ method added,
>    and the pre-check, boiler-plate code will flourish;

DirEntry can still get the check, it can just throw TypeError when it
represents a binary path (that's one of the advantages of using a
method-based protocol - exceptions on method calls are more acceptable
than exceptions on property access).
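
As a sketch of that idea only (TextOnlyEntry is a hypothetical stand-in,
not the real os.DirEntry), a str-only protocol method could refuse binary
paths like this:

```python
class TextOnlyEntry:
    """Hypothetical DirEntry-like object for a str-only __fspath__ protocol."""

    def __init__(self, path):
        self.path = path  # may be str or bytes, as with scandir results

    def __fspath__(self):
        if isinstance(self.path, bytes):
            # A method call may raise; a plain attribute could not do this.
            raise TypeError("entry represents a binary path")
        return self.path


print(TextOnlyEntry("./js").__fspath__())

try:
    TextOnlyEntry(b"./js").__fspath__()
except TypeError as exc:
    print("TypeError:", exc)
```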

> 2) pathlib.Path accepts bytes -- so what happens when a byte-derived
>    Path is passed to os.fspath()?  Is a TypeError raised?  Do we
>    guess and auto-convert with fsdecode()?

pathlib is str-only (which makes sense, since it's a cross-platform
API and binary paths basically don't work on Windows):

>>> pathlib.Path(b".")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.4/pathlib.py", line 907, in __new__
    self = cls._from_parts(args, init=False)
  File "/usr/lib64/python3.4/pathlib.py", line 589, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/lib64/python3.4/pathlib.py", line 581, in _parse_args
    % type(a))
TypeError: argument should be a path or str object, not <class 'bytes'>

The only specific mention of binary support in the pathlib docs is to
state that "bytes(p)" uses os.fsencode() to convert to the binary
representation.
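
As a small illustration of that documented conversion (the file name is
made up for the example), the binary form of a path object should match
what os.fsencode() gives for its text form:

```python
import os
import pathlib

# A pure path is enough here; nothing touches the filesystem.
p = pathlib.PurePosixPath("docs") / "readme.txt"

text_form = str(p)
binary_form = bytes(p)

# bytes(p) is documented as going through os.fsencode().
matches_fsencode = binary_form == os.fsencode(text_form)
```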

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From greg.ewing at canterbury.ac.nz  Sun Apr 10 01:58:08 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 10 Apr 2016 17:58:08 +1200
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CAP1=2W4N9FdEYbBs9A1a0zK-dDnU4_bz+d_up85xTmu8n_J58A@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
 <-8088150910827119255@unknownmsgid>
 <CAP1=2W4N9FdEYbBs9A1a0zK-dDnU4_bz+d_up85xTmu8n_J58A@mail.gmail.com>
Message-ID: <5709EB70.7030308@canterbury.ac.nz>

Brett Cannon wrote:

> Depends if you use `/` or `\` as your path separator

Or whether your pathnames look entirely different, e.g. VMS:

   device:[topdir.subdir.subsubdir]filename.ext;version

Pathnames are very much OS-dependent in both syntax *and* semantics.

Even the main two in use today (unix and windows) can't be
mapped directly onto each other, because windows has drive
letters and unix doesn't.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Sun Apr 10 02:24:23 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 10 Apr 2016 18:24:23 +1200
Subject: [Python-Dev] pathlib (was: Defining a path protocol)
In-Reply-To: <CADiSq7epja0XdVBd8ZPm2HjHzQp7hK3Zs86uXNv8q+ubx=o7iQ@mail.gmail.com>
References: <CA+OGgf659yEKH8_hfgZ2DiKcr8D7SGySzvvDfjo=ZwEk7t396g@mail.gmail.com>
 <CADiSq7epja0XdVBd8ZPm2HjHzQp7hK3Zs86uXNv8q+ubx=o7iQ@mail.gmail.com>
Message-ID: <5709F197.7020802@canterbury.ac.nz>

Nick Coghlan wrote:
> We want to be able to readily use the protocol helper in builtin
> modules like os and low level Python modules like os.path, which means
> we want it to be much lower down in the import hierarchy than pathlib.

Also, it's more general than that. It works on any
object that wants to behave as a path, not just
pathlib ones, so it should be in a neutral place.
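
A sketch of what "any object that wants to behave as a path" can look
like, using the os.fspath() helper as it eventually shipped in Python
3.6. The ConfigFile class and its attributes are hypothetical names
chosen for the example.

```python
import os


class ConfigFile:
    """Hypothetical non-pathlib object that opts in to the protocol."""

    def __init__(self, directory, name):
        self.directory = directory
        self.name = name

    def __fspath__(self):
        # Return the text form that os-level functions should use.
        return self.directory + "/" + self.name


cfg = ConfigFile("etc", "app.conf")

rendered = os.fspath(cfg)           # explicit conversion
base_name = os.path.basename(cfg)   # os.path helpers accept it too
```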

-- 
Greg

From greg.ewing at canterbury.ac.nz  Sun Apr 10 02:38:35 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 10 Apr 2016 18:38:35 +1200
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CALFfu7A9T2mjC3D+qHsLY0canj8FxO95x1EckODMq85HZMx_QQ@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
 <CALFfu7A9T2mjC3D+qHsLY0canj8FxO95x1EckODMq85HZMx_QQ@mail.gmail.com>
Message-ID: <5709F4EB.6060200@canterbury.ac.nz>

Eric Snow wrote:
> All this matters because it impacts the value returned from
> __ospath__().  Should it return the string representation of the path
> for the current OS or some standardized representation?

What standardized representation? I'm not aware of such
a thing.

> I'd expect
> the former.  However, if that is the expectation then something like
> pathlib.PureWindowsPath will give you the wrong thing if your current
> OS is linux.

No, you should get the representation corresponding to
the kind of path object you started with. If you're
working with Windows path objects on a Unix system,
they must be representing something on some Windows
system somewhere, not the one you're running the code
on. The only reason to ask for a string representation
of such a path is for use by that other system.

I don't think it even makes sense to ask for a Unix
representation of a Windows path or vice versa, because
the semantics are different. How do you translate a
Windows drive letter into Unix? What drive letter do
you use for an absolute Unix path?
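
For illustration, the pure path classes already behave this way when
converted with str(): each renders in its own flavour regardless of the
system running the code. The paths below are made up for the example.

```python
import pathlib

win = pathlib.PureWindowsPath("C:/Users/greg/notes.txt")
posix = pathlib.PurePosixPath("/home/greg/notes.txt")

# Each object renders in its own syntax, whatever OS this runs on.
win_text = str(win)
posix_text = str(posix)

# The drive letter has no counterpart on the POSIX side.
win_drive = win.drive
posix_drive = posix.drive
```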

-- 
Greg

From ncoghlan at gmail.com  Sun Apr 10 02:43:16 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 10 Apr 2016 16:43:16 +1000
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <5709EB70.7030308@canterbury.ac.nz>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
 <-8088150910827119255@unknownmsgid>
 <CAP1=2W4N9FdEYbBs9A1a0zK-dDnU4_bz+d_up85xTmu8n_J58A@mail.gmail.com>
 <5709EB70.7030308@canterbury.ac.nz>
Message-ID: <CADiSq7fASLApV6_begjq9A7HLJaY32kyduQV0RJWmqDeZAKTxw@mail.gmail.com>

On 10 April 2016 at 15:58, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Brett Cannon wrote:
>
>> Depends if you use `/` or `\` as your path separator
>
>
> Or whether your pathnames look entirely different, e.g. VMS:
>
>   device:[topdir.subdir.subsubdir]filename.ext;version
>
> Pathnames are very much OS-dependent in both syntax *and* semantics.
>
> Even the main two in use today (unix and windows) can't be
> mapped directly onto each other, because windows has drive
> letters and unix doesn't.

This does raise a concrete API design question: how should
PurePath.__fspath__ behave when called on a mismatched OS?

For PurePath vs Path, the latter raises NotImplementedError if you try
to create a concrete path that doesn't match the running system:

   >>> pathlib.PureWindowsPath(".")
   PureWindowsPath('.')
   >>> pathlib.WindowsPath(".")
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/lib64/python3.4/pathlib.py", line 910, in __new__
       % (cls.__name__,))
   NotImplementedError: cannot instantiate 'WindowsPath' on your system

The question we need to address is what happens if you do:

   >>> os.fspath(pathlib.PureWindowsPath("."))

on a *nix system?

Similar to my proposal for dealing with DirEntry.path being a
bytes-like object, I'd like to suggest raising TypeError in __fspath__
if the request is nonsensical for the currently running system - *nix
systems can *manipulate* Windows paths (and vice-versa), but actually
trying to *use* them with the local filesystem isn't going to work
properly, since the syntax and semantics are different.

   >>> os.fspath(pathlib.PureWindowsPath("."))
   Traceback (most recent call last):
       ...
   TypeError: cannot render 'PureWindowsPath' as filesystem path on 'posix' system

(I'm also suggesting replacing "your" with the value of os.name)
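
A rough sketch of the check being proposed, written as a standalone
function so it can be tried out. strict_fspath and its running_os
parameter are hypothetical; this is not how os.fspath() itself behaves.

```python
import os
import pathlib


def strict_fspath(path, running_os=os.name):
    """Hypothetical helper: refuse to render a foreign pure path."""
    foreign = (
        (isinstance(path, pathlib.PureWindowsPath) and running_os != "nt")
        or (isinstance(path, pathlib.PurePosixPath) and running_os == "nt")
    )
    if foreign:
        raise TypeError(
            "cannot render %r as filesystem path on %r system"
            % (type(path).__name__, running_os)
        )
    return str(path)


# Manipulating a foreign path stays possible; only "using" it fails.
native = strict_fspath(pathlib.PurePosixPath("a/b"), running_os="posix")
try:
    strict_fspath(pathlib.PureWindowsPath("."), running_os="posix")
    rejected = False
except TypeError:
    rejected = True
```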

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From greg.ewing at canterbury.ac.nz  Sun Apr 10 02:51:23 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 10 Apr 2016 18:51:23 +1200
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CADiSq7czs0D0GRRVveb8XTS2KRaH-iAh9WLw60ROu3tQT6JpsQ@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CADiSq7ci_fK0S1u_PADFPNMR-4GRsOjcd8sJSXnSrFwo+Wc8cA@mail.gmail.com>
 <20160409130206.6E4B1B14158@webabinitio.net>
 <CADiSq7czs0D0GRRVveb8XTS2KRaH-iAh9WLw60ROu3tQT6JpsQ@mail.gmail.com>
Message-ID: <5709F7EB.9010603@canterbury.ac.nz>

> On 9 April 2016 at 23:02, R. David Murray <rdmurray at bitdance.com> wrote:
> 
>>That is, a 'filename' is the identifier we've assigned to this thing
>>pointed to by an inode in linux, but an os path is a text representation
>>of the path from the root filename to a specified filename.  That is,
>>the path *is* the name, so to say "path name" sounds redundant and
>>confusing to me.

The term "pathname" is what is conventionally used to refer
to a textual string passed to the OS to identify an object
in the file system.

It's often abbreviated to just "path", but that's ambiguous
for our purposes, because "path" can also refer to one of
our higher-level objects.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Sun Apr 10 03:12:18 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 10 Apr 2016 19:12:18 +1200
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CADiSq7fASLApV6_begjq9A7HLJaY32kyduQV0RJWmqDeZAKTxw@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
 <-8088150910827119255@unknownmsgid>
 <CAP1=2W4N9FdEYbBs9A1a0zK-dDnU4_bz+d_up85xTmu8n_J58A@mail.gmail.com>
 <5709EB70.7030308@canterbury.ac.nz>
 <CADiSq7fASLApV6_begjq9A7HLJaY32kyduQV0RJWmqDeZAKTxw@mail.gmail.com>
Message-ID: <5709FCD2.3070808@canterbury.ac.nz>

Nick Coghlan wrote:
> Similar to my proposal for dealing with DirEntry.path being a
> bytes-like object, I'd like to suggest raising TypeError in __fspath__
> if the request is nonsensical for the currently running system - *nix
> systems can *manipulate* Windows paths (and vice-versa), but actually
> trying to *use* them with the local filesystem isn't going to work
> properly, since the syntax and semantics are different.

That sounds reasonable, since it would be preferable to
fail early if you mistakenly pass a PureWindowsPath to
e.g. open().

But there needs to be some way to ask a path object for
its native string representation, otherwise there would
be no point in using foreign path objects at all.

-- 
Greg

From ncoghlan at gmail.com  Sun Apr 10 03:36:36 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 10 Apr 2016 17:36:36 +1000
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <5709FCD2.3070808@canterbury.ac.nz>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
 <-8088150910827119255@unknownmsgid>
 <CAP1=2W4N9FdEYbBs9A1a0zK-dDnU4_bz+d_up85xTmu8n_J58A@mail.gmail.com>
 <5709EB70.7030308@canterbury.ac.nz>
 <CADiSq7fASLApV6_begjq9A7HLJaY32kyduQV0RJWmqDeZAKTxw@mail.gmail.com>
 <5709FCD2.3070808@canterbury.ac.nz>
Message-ID: <CADiSq7foLv27nsP8mkObZYt2zgi7RXOdstCqLyO3z1NAyK+M7w@mail.gmail.com>

On 10 April 2016 at 17:12, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Nick Coghlan wrote:
>>
>> Similar to my proposal for dealing with DirEntry.path being a
>> bytes-like object, I'd like to suggest raising TypeError in __fspath__
>> if the request is nonsensical for the currently running system - *nix
>> systems can *manipulate* Windows paths (and vice-versa), but actually
>> trying to *use* them with the local filesystem isn't going to work
>> properly, since the syntax and semantics are different.
>
>
> That sounds reasonable, since it would be preferable to
> fail early if you mistakenly pass a PureWindowsPath to
> e.g. open().
>
> But there needs to be some way to ask a path object for
> its native string representation, otherwise there would
> be no point in using foreign path objects at all.

In addition to the existing "str(pathobj)", a "path" property was
recently added for that purpose:

   >>> import pathlib
   >>> pathlib.PureWindowsPath(".")
   PureWindowsPath('.')
   >>> pathlib.PureWindowsPath(".").path
   '.'

(The specific property name was chosen to match os.scandir's DirEntry.path)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From p.f.moore at gmail.com  Sun Apr 10 03:58:06 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 10 Apr 2016 08:58:06 +0100
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CADiSq7foLv27nsP8mkObZYt2zgi7RXOdstCqLyO3z1NAyK+M7w@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
 <-8088150910827119255@unknownmsgid>
 <CAP1=2W4N9FdEYbBs9A1a0zK-dDnU4_bz+d_up85xTmu8n_J58A@mail.gmail.com>
 <5709EB70.7030308@canterbury.ac.nz>
 <CADiSq7fASLApV6_begjq9A7HLJaY32kyduQV0RJWmqDeZAKTxw@mail.gmail.com>
 <5709FCD2.3070808@canterbury.ac.nz>
 <CADiSq7foLv27nsP8mkObZYt2zgi7RXOdstCqLyO3z1NAyK+M7w@mail.gmail.com>
Message-ID: <CACac1F-XCA3x9Lh7TemJoLFiRvmsSXcuoO8f7-3gU3hOGtSeEQ@mail.gmail.com>

On 10 April 2016 at 08:36, Nick Coghlan <ncoghlan at gmail.com> wrote:
> In addition to the existing "str(pathobj)", a "path" property was
> recently added for that purpose:
>
>    >>> import pathlib
>    >>> pathlib.PureWindowsPath(".")
>    PureWindowsPath('.')
>    >>> pathlib.PureWindowsPath(".").path
>    '.'
>
> (The specific property name was chosen to match os.scandir's DirEntry.path)

I believe that under the current proposal, the ".path" property will
be removed again in favour of the new protocol, so the only actual
option would be str(pathobj).

Paul

From srkunze at mail.de  Sun Apr 10 10:07:50 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Sun, 10 Apr 2016 16:07:50 +0200
Subject: [Python-Dev] pathlib+os/shutil feedback
Message-ID: <570A5E36.2070606@mail.de>

I talked to my colleague. He didn't remember the concrete use-case, 
though, he instantly mentioned three possible things (no order of 
preference):

1) pathlib + mtime
2) os.walk and pathlib
3) creation/removal of paths

He wasn't too sure but I checked with the docs and his memories seemed 
to be correct:


-----

1) https://docs.python.org/3/library/pathlib.html#pathlib.Path.stat

High-level path objects should return high-level [insert type here] 
objects. Put differently, an API for retrieving time-stats as real 
date/time objects would be nice. I think that can be expanded to other 
pathlib methods as well, to make them less "os-wrapper"-like and provide 
added value.
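
As a sketch of what such an API could return, here is a hypothetical
helper (modified_at is not a pathlib method) that wraps Path.stat() and
hands back a timezone-aware datetime instead of a raw float:

```python
import datetime
import pathlib
import tempfile


def modified_at(path):
    """Hypothetical helper: the path's mtime as an aware datetime (UTC)."""
    timestamp = pathlib.Path(path).stat().st_mtime
    return datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)


# Try it on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "example.txt"
    target.touch()
    stamp = modified_at(target)
```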


-----

2) I remember a discussion on python-ideas about using "glob" or 
"rglob". However, when searching the docs for "walk" like in "os.walk" 
or for "iter", I don't find "glob"/"rglob". I can imagine ourselves 
(pathlib newbies back then), we didn't discover them.

It would be great if the docs could be improved like the following:

"""
Path.rglob(pattern)
Walk down a given path; a wrapper for "os.scandir"/"os.listdir". This is 
like calling glob() with "**" added in front of the given pattern:
"""

I think it would make "glob" and "rglob" more discoverable to new users.

NOTE: """ Using the "**" pattern in large directory trees may consume an
inordinate amount of time.""" does not sound very encouraging. This is
especially true for "rglob" as it is defined as "like calling glob()
with '**'".

That makes one wonder whether "rglob" performs slow globbing rather
than an "os.scandir"/"os.listdir" traversal.

https://docs.python.org/3/library/pathlib.html#basic-use even promotes 
"glob" with "**" in the beginning which seems rather discouraging to use 
"rglob" as a fast alternative to "os.walk/scandir/listdir". Renaming 
"rglob"/adding a "scan" method would definitely help here.


-----

3) Again searching the docs for "create", "delete" (nothing found) and 
"remove", I found "Path.touch", "Path.rmdir" and "Path.unlink".

It would be great if we had an easy way to remove a complete subtree as 
with "shutil.rmtree". We mostly don't care if a directory is empty. We 
need the system to be in a state of "this path does not exist anymore".
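
Until pathlib grows something like that, a small wrapper gets the "this
path does not exist anymore" behaviour. A sketch, assuming a
hypothetical helper name (remove_tree) and passing str() to
shutil.rmtree to stay on the safe side:

```python
import pathlib
import shutil
import tempfile


def remove_tree(path):
    """Hypothetical helper: make sure `path` no longer exists."""
    path = pathlib.Path(path)
    if path.is_dir() and not path.is_symlink():
        # A real directory: remove it together with its contents.
        shutil.rmtree(str(path))
    elif path.exists() or path.is_symlink():
        # A file or a symlink (even a dangling one): just unlink it.
        path.unlink()
    # Anything else is already gone, which is the desired state.


base = pathlib.Path(tempfile.mkdtemp())
(base / "sub" / "deeper").mkdir(parents=True)
(base / "sub" / "deeper" / "file.txt").touch()

remove_tree(base / "sub")
sub_gone = not (base / "sub").exists()
remove_tree(base)
```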

Moreover, touching a file is good enough to "create" it if you don't 
care about changing its mtime. If you care about its mtime, it's a 
problem to "touch".

------


That's it for our issues with pathlib from the past. Additionally, I have 
two further observations:

A) pathlib tries to mimic/publish some low-level APIs to its users. 
"unlink" is not something people would expect to use when they want to 
"delete" or to "remove" a file or a directory. I know where the term 
stems from but it's the wrong layer of abstraction IMHO. Same for 
"touch" or "chmod".

B) "rename" vs "replace". The difference is not really clear from the 
docs. You need to read "Path.replace" in order to understand 
"Path.rename" completely (one raises an exception, the other doesn't, 
if I read it correctly).


If there's some agreement to change things with respect to those 5 
points, I am willing to put some time into it.


Best,
Sven

From p.f.moore at gmail.com  Sun Apr 10 10:51:05 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 10 Apr 2016 15:51:05 +0100
Subject: [Python-Dev] pathlib+os/shutil feedback
In-Reply-To: <570A5E36.2070606@mail.de>
References: <570A5E36.2070606@mail.de>
Message-ID: <CACac1F-H23S0okraU_QqrmL-S7Q7gHNKqFba_4R04d95kR-+3w@mail.gmail.com>

On 10 April 2016 at 15:07, Sven R. Kunze <srkunze at mail.de> wrote:
> If there's some agreement to change things with respect to those 5 points, I
> am willing to put some time into it.

In broad terms I agree with these points. Thanks for doing the
research. It would certainly be good to try to improve pathlib based
on this sort of feedback while it is still provisional.

One specific point - you say:

"""
Path.rglob(pattern)
Walk down a given path; a wrapper for "os.scandir"/"os.listdir".
"""

However, at least in 3.5, Path.rglob does *not* wrap scandir. There's
a difference in principle, in that scandir (DirEntry) objects cache
stat data, where pathlib does not. Whether that makes using scandir in
Path.rglob impossible, I don't know. Ideally I'd like to see pathlib
modified to use scandir (because otherwise there will always be people
saying "use os.walk rather than scandir, as it's faster") - or if it's
not possible to do so because of the difference in principle, then I'd
like to see a clear discussion of the issue in the docs, including the
recommended approach for people who want scandir performance *without*
having to abandon pathlib for lower level functions.
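
One possible shape for such a recommendation, as a sketch only:
scan_tree is a hypothetical helper that does the traversal with
os.scandir() and converts each result to a Path at the end. It assumes
Python 3.6 or later, where scandir() can be used as a context manager.

```python
import os
import pathlib
import tempfile


def scan_tree(top):
    """Hypothetical helper: yield a Path for every non-directory below `top`."""
    with os.scandir(str(top)) as entries:
        for entry in entries:
            # DirEntry.is_dir() can usually answer without an extra stat call.
            if entry.is_dir(follow_symlinks=False):
                yield from scan_tree(entry.path)
            else:
                yield pathlib.Path(entry.path)


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "a").mkdir()
    (root / "a" / "b.txt").touch()
    (root / "c.txt").touch()
    found = sorted(p.name for p in scan_tree(root))
```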

Paul

From rdmurray at bitdance.com  Sun Apr 10 11:03:27 2016
From: rdmurray at bitdance.com (R. David Murray)
Date: Sun, 10 Apr 2016 11:03:27 -0400
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <5709F7EB.9010603@canterbury.ac.nz>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CADiSq7ci_fK0S1u_PADFPNMR-4GRsOjcd8sJSXnSrFwo+Wc8cA@mail.gmail.com>
 <20160409130206.6E4B1B14158@webabinitio.net>
 <CADiSq7czs0D0GRRVveb8XTS2KRaH-iAh9WLw60ROu3tQT6JpsQ@mail.gmail.com>
 <5709F7EB.9010603@canterbury.ac.nz>
Message-ID: <20160410150329.A58AEB14158@webabinitio.net>

On Sun, 10 Apr 2016 18:51:23 +1200, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> > On 9 April 2016 at 23:02, R. David Murray <rdmurray at bitdance.com> wrote:
> > 
> >>That is, a 'filename' is the identifier we've assigned to this thing
> >>pointed to by an inode in linux, but an os path is a text representation
> >>of the path from the root filename to a specified filename.  That is,
> >>the path *is* the name, so to say "path name" sounds redundant and
> >>confusing to me.
> 
> The term "pathname" is what is conventionally used to refer
> to a textual string passed to the OS to identify an object
> in the file system.
> 
> It's often abbreviated to just "path", but that's ambiguous
> for our purposes, because "path" can also refer to one of
> our higher-level objects.

I find it interesting that in all my years of unix computing I've never
run into this (at least so that I became conscious of it).  I see now
that in fact the Posix spec uses 'pathname'.

Objection, such as it was, completely withdrawn :)

(Nick's point about Path object vs path is also a good one.)

--David

From ethan at stoneleaf.us  Sun Apr 10 11:26:31 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 10 Apr 2016 08:26:31 -0700
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CADiSq7foLv27nsP8mkObZYt2zgi7RXOdstCqLyO3z1NAyK+M7w@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
 <-8088150910827119255@unknownmsgid>
 <CAP1=2W4N9FdEYbBs9A1a0zK-dDnU4_bz+d_up85xTmu8n_J58A@mail.gmail.com>
 <5709EB70.7030308@canterbury.ac.nz>
 <CADiSq7fASLApV6_begjq9A7HLJaY32kyduQV0RJWmqDeZAKTxw@mail.gmail.com>
 <5709FCD2.3070808@canterbury.ac.nz>
 <CADiSq7foLv27nsP8mkObZYt2zgi7RXOdstCqLyO3z1NAyK+M7w@mail.gmail.com>
Message-ID: <570A70A7.3000609@stoneleaf.us>

On 04/10/2016 12:36 AM, Nick Coghlan wrote:
> On 10 April 2016 at 17:12, Greg Ewing wrote:

>> But there needs to be some way to ask a path object for
>> its native string representation, otherwise there would
>> be no point in using foreign path objects at all.
>
> In addition to the existing "str(pathobj)", a "path" property was
> recently added for that purpose:
>
>     >>> import pathlib
>     >>> pathlib.PureWindowsPath(".")
>     PureWindowsPath('.')
>     >>> pathlib.PureWindowsPath(".").path
>     '.'
>
> (The specific property name was chosen to match os.scandir's DirEntry.path)

But with the new __fspath__ enhancements wouldn't the .path attribute go 
away?

--
~Ethan~


From donald at stufft.io  Sun Apr 10 11:50:24 2016
From: donald at stufft.io (Donald Stufft)
Date: Sun, 10 Apr 2016 11:50:24 -0400
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CADiSq7fASLApV6_begjq9A7HLJaY32kyduQV0RJWmqDeZAKTxw@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
 <-8088150910827119255@unknownmsgid>
 <CAP1=2W4N9FdEYbBs9A1a0zK-dDnU4_bz+d_up85xTmu8n_J58A@mail.gmail.com>
 <5709EB70.7030308@canterbury.ac.nz>
 <CADiSq7fASLApV6_begjq9A7HLJaY32kyduQV0RJWmqDeZAKTxw@mail.gmail.com>
Message-ID: <F97151F6-0808-4E80-A9F8-E73D8397AABF@stufft.io>


> On Apr 10, 2016, at 2:43 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
> This does raise a concrete API design question: how should
> PurePath.__fspath__ behave when called on a mismatched OS?

I think that PurePath.__fspath__ should return a string. There's no
reason why we can't in my opinion and doing so just limits the usefulness
of the method. For instance, it'd prevent it from being possible to
serialize a pure windows path and send it over the wire to a process running
on a Windows machine, like say if you have a build master running on Linux
and a build slave running on Windows.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From ethan at stoneleaf.us  Sun Apr 10 12:16:39 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 10 Apr 2016 09:16:39 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
Message-ID: <570A7C67.3010304@stoneleaf.us>

On 04/09/2016 10:31 PM, Nick Coghlan wrote:
> On 10 April 2016 at 02:41, Ethan Furman wrote:

> When somebody hands you bytes rather than text you need to worry about
> the encoding, and you need to worry about returning bytes rather than
> text yourself. https://hg.python.org/cpython/rev/e44410e5928e#l4.1
> provides an illustration of how fiddly that can get, and that's in the
> URL context - cross-platform filesystem path handling is worse, since
> you need to worry about the significant differences between the way
> Windows and *nix handle binary paths, and you can't use os.sep
> directly any more (since that's always text).

Okay, that makes sense.

> DirEntry can still get the check, it can just throw TypeError when it
> represents a binary path (that's one of the advantages of using a
> method-based protocol - exceptions on method calls are more acceptable
> than exceptions on property access).

I guess I don't see the point of this.  Either DirEntry's [1] only get 
partial support (which is only marginally better than the no support 
pathlib currently has), or stdlib code will need to catch those errors 
and then do an isinstance check to see if it knows what the type is and 
how to deal with it [2].

On the other hand, if __fspath__ is allowed to hold bytes then the 
algorithm gets easier:

- get the serialized form
- check for bytes or str and act accordingly

As a practicality argument that seems a lot easier for everybody.
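
The two steps above can be sketched roughly as follows. path_kind and
BinaryEntry are hypothetical names, and the sketch assumes the variant
of the protocol in which __fspath__ may return bytes as well as str.

```python
def path_kind(obj):
    """Hypothetical consumer of a bytes-or-str __fspath__."""
    # Step 1: get the serialized form.
    raw = obj.__fspath__() if hasattr(obj, "__fspath__") else obj
    # Step 2: check for bytes or str and act accordingly.
    if isinstance(raw, str):
        return "text", raw
    if isinstance(raw, bytes):
        return "binary", raw
    raise TypeError(
        "expected str, bytes or a path object, not %s" % type(raw).__name__
    )


class BinaryEntry:
    """Hypothetical DirEntry-like object holding a binary path."""

    def __init__(self, path):
        self.path = path

    def __fspath__(self):
        return self.path


from_entry = path_kind(BinaryEntry(b"data.bin"))
from_plain = path_kind("notes.txt")
```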

--
~Ethan~

[1] Being a low-level function I think working with either bytes or
     str is entirely appropriate for DirEntry.

[2] DirEntry?  Oh yeah, grab the .path attribute.  Something else?
     Bah, let the exception propagate.



From stephen at xemacs.org  Sun Apr 10 12:29:00 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 11 Apr 2016 01:29:00 +0900
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <57083FCB.1000808@stoneleaf.us>
References: <CAP1=2W4WDz1uy+bp02Kp_KaoRcuOHr6TVLezmuRhq26x4kw6mA@mail.gmail.com>
 <CADiSq7ePS5DWktKFHobunQAFka5mZZD0Xki7tQ8x-QNXJ=MN5g@mail.gmail.com>
 <ne28fo$flu$1@ger.gmane.org>
 <CADiSq7ekDoK5Pp12yKs8hzyq4msvodP9n8-ZtGmb2H27gG8kfg@mail.gmail.com>
 <CAPJVwB=u+Wq5-gFeiGa4TkJnf0mH4ZiCUO0LErQd9DS_mfQrKA@mail.gmail.com>
 <CADiSq7eNTJW93pxVsFiNypi7TDO9AV8AqrEyVW=F4aEziTNQqw@mail.gmail.com>
 <570526CE.5080401@stoneleaf.us>
 <CAP1=2W5-5EUAHP4vdX182p+EnoEWVCFK5Uib48VHCFnSSGVZ3w@mail.gmail.com>
 <CADiSq7di3DmjFDrkwTH-SEXdAaYt1dijLa5rJZHee4YCycV3FA@mail.gmail.com>
 <CALGmxEK63=dvaSSmEre_TM=uE_Nd=uoOYNodCzLWv5PxQTWE8Q@mail.gmail.com>
 <5707DDF3.5030106@stoneleaf.us>
 <CAP1=2W5xRrUJfEqAa32aOzgNNTXhTY-pAKt+3tEK3iEstG7i2Q@mail.gmail.com>
 <CAMiohoijSR5hqLHXy-cUKnC3ot3yY8OzWVwjEGmg=9EaMxMe8w@mail.gmail.com>
 <CAP1=2W6rpfXA30Fjgp6eM1kR=Ra2VH+xYoehjoXZurtG6pYttw@mail.gmail.com>
 <CAMiohoimFSJ_fzs7CNGaWnhV5z-kopQ5EaAJcNWJu6bK8X3fWg@mail.gmail.com>
 <57083FCB.1000808@stoneleaf.us>
Message-ID: <22282.32588.172670.633359@turnbull.sk.tsukuba.ac.jp>

Ethan Furman writes:

 > It means the stuff in place won't change, but the stuff we're
 > adding now to integrate with Path will only support str (which is
 > one reason why os.path isn't going to die).

I don't think this is a reason for keeping os.path.  (Backward
compatibility with existing code is sufficient, of course.)  Support
of str for all file names is provided by PEP 383.  ISTM there's no big
loss to using PEP 383's 'surrogateescape' handler to allow un-decode-
able filenames in pathlib.Path: they're very rare.  AFAIK pathlib
doesn't care about surrogates -- after all, they're entirely
"consenting adults" stuff.  Of course that detracts a bit from the
attractiveness of pathlib.Path vs. os.path or bytes methods, but only
for a use case most people won't encounter in practice.
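
To make the PEP 383 point concrete, here is a small sketch with a
made-up file name. It calls the codec directly with the
'surrogateescape' handler, so the result does not depend on the
platform's filesystem encoding.

```python
import pathlib

# Latin-1 encoded name: not valid UTF-8, so a strict decode would fail.
raw_name = b"caf\xe9.txt"

# The undecodable byte is smuggled through as a lone surrogate.
as_text = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = as_text.encode("utf-8", "surrogateescape")

# Pure path manipulation works on the escaped string like any other str.
suffix = pathlib.PurePosixPath(as_text).suffix
```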

We continue to support bytes at the os/io/open level for the same
reasons you added formatting back to bytes: there are times when it's
at least as natural to work with bytes as str (eg, when the path is
passed around without manipulation) and more convenient (eg, you don't
have to deal with encodings and UnicodeError handling).

 > After all, the idea is to make these things work with the stdlib, and 
 > the stdlib accepts bytes for path strings.

I don't see a problem.  In dealing with legacy data (archives that
include paths, such as .zips and .isos) we may find un-decode-able
paths, or paths that are decode-able but by undetermined encoding, for
a while to come (decades).  For those, the bytes interfaces are
preferable to unlovely expedients like decoding as 'iso8859-1'.  But
those are specialized use cases.

Sane people dealing with current file systems won't need bytes in
pathlib, and most "out of bounds" uses for pathlib I can think of in
my own experience will be able to use surrogateescape.


From jon+python-dev at unequivocal.co.uk  Sun Apr 10 12:43:08 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Sun, 10 Apr 2016 17:43:08 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
Message-ID: <20160410164308.GE17895@unequivocal.co.uk>

On Sat, Apr 09, 2016 at 02:43:19PM +0200, Victor Stinner wrote:
>    Please don't lose time trying yet another sandbox inside CPython. It's
>    just a waste of time. It's broken by design.
> 
>    Please read my email about my attempt (pysandbox):
>    https://lwn.net/Articles/574323/
> 
>    And the LWN article:
>    https://lwn.net/Articles/574215/
> 
>    There are a lot of safe ways to run CPython inside a sandbox (and not the
>    opposite).
> 
>    I started as you, add more and more things to a blacklist, but it doesn't
>    work.

That's the opposite of my approach though - I'm starting small and
adding things, not starting with everything and removing stuff. Even
if what we end up with is an extremely restricted subset of Python,
there are still cases where that could be a useful tool to have.

I've read your links above, and indeed everything I can find written
by anyone about historical attempts to sandbox Python. I'm aware that
others have tried and failed at this in the past, so it's certainly
true that there is room for suspicion that this simply cannot be done.

However on the other hand, nobody has tried before to do what I am
doing (static code analysis), so it's not necessarily a safe
assumption that the idea is doomed. For example, as far as I can see,
none of the methods used to break out of your pysandbox would work to
break out of my experiment.
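
To illustrate the general idea of "start small and add things" via
static analysis, here is a toy whitelist check over the AST. It is only
a sketch of the technique and not the code from the experiment; the
ALLOWED_NODES set and the is_allowed name are assumptions made for the
example, and it relies on ast.Constant (Python 3.8 or later).

```python
import ast

# Start from nothing and list only what is explicitly permitted:
# here, plain arithmetic on literals.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
)


def is_allowed(source):
    """Return True if every node in the expression is on the whitelist."""
    try:
        tree = ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))


arithmetic_ok = is_allowed("1 + 2 * -3")
call_ok = is_allowed("__import__('os')")
attribute_ok = is_allowed("(1).__class__")
```

Names, calls and attribute access are rejected simply because nobody
added them to the list, which is the opposite of a blacklist.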

From jon+python-dev at unequivocal.co.uk  Sun Apr 10 12:51:13 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Sun, 10 Apr 2016 17:51:13 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>
Message-ID: <20160410165113.GF17895@unequivocal.co.uk>

On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote:
> On 9 April 2016 at 22:43, Victor Stinner <victor.stinner at gmail.com> wrote:
> > See pysandbox test suite for a lot of ways to escape a sandbox. CPython has
> > a list of known code to crash CPython (I don't recall the directory in
> > sources), even with the latest version of CPython.
> 
> They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers

Thanks. I take your point that sandboxing Python requires CPython to
be free of code execution bugs. However I will note that none of the
crashers in that directory will work inside my experiment (except
"infinite_loop_re.py", which isn't a crasher just a long loop).

> Even without those considerations though, there are system level
> denial of service attacks that untrusted code can perform without even
> trying to break out of the sandbox - the most naive is "while 1:
> pass", but there are more interesting ones like "from itertools import
> count; sum(count())", or even "sum(iter(int, 1))" and "list(iter(int,
> 1))".

Yes, of course. I have already explicitly noted that infinite loops
and memory exhaustion are not preventable.

> Operating system level security sandboxes still aren't particularly
> easy to use correctly, but they're a lot more reliable than language
> runtime level sandboxes, can be used to defend against many more
> attack vectors, and even offer increased flexibility (e.g. "can write
> to these directories, but no others", "can read these files, but no
> others", "can contact these IP addresses, but no others").

I don't entirely trust operating system sandboxes either - I generally
assume that if someone can execute arbitrary code on my machine, then
they can do anything they want to that machine.

What I *might* trust, though, would be a "sandbox Python" that is
itself running inside an operating system sandbox...

From guido at python.org  Sun Apr 10 14:43:08 2016
From: guido at python.org (Guido van Rossum)
Date: Sun, 10 Apr 2016 11:43:08 -0700
Subject: [Python-Dev] PEP 506 secrets module
In-Reply-To: <20160410050845.GA12526@ando.pearwood.info>
References: <20151016005711.GC11980@ando.pearwood.info>
 <CAP1=2W6p3CK-_ksvf7Jxk9UC2ugniraqkRGvv2FGrVxnvuhv=Q@mail.gmail.com>
 <CAP7+vJ+7cR3do5LOApqFjR+PvavKdOpA2Q0rLC0pYAAS4LGwKQ@mail.gmail.com>
 <20160410050845.GA12526@ando.pearwood.info>
Message-ID: <CAP7+vJLph9pxOuL_19pTP_BakggnrJdos2f8OOjwZmAp1mJCnA@mail.gmail.com>

Hi Steven,

No problem with the delay -- it's still before 3.6.0. I do think it's
just about a record gap in the PEP review process. :-)

I will approve the PEP as soon as you've updated the two function
names in the PEP. (If you don't have write access to the peps repo,
send the new version to peps at python.org -- or send a link to the new
draft somewhere online, e.g. github if you're using that. If you do
have peps repo write access, just reply here when it's done.)

Regarding the alluded vagueness of the PEP on the specs, I think it was
mostly about the phrase "At the time of writing, the following
functions have been suggested" which doesn't seem to commit very
strongly to a specific API. The later phrase "The following
pseudo-code can be taken as a possible starting point for the real
implementation" doesn't really do much to take away the feeling that
the PEP is non-committal on the actual API it proposes. But I don't
want to approve the *idea* of a secrets module -- I want to approve a
specific API.

Maybe you can just change the words a bit to say something like "this
PEP proposes the following API; the implementations given here are not
final".

None of this will prevent adding more functions to secrets.py before
3.6.0 is released (or, of course, in 3.7, 3.8 etc.), but it should
send a clear message that we've agreed on these specific names and
signatures, and that those are what I'm approving. If we change our
minds about the API of the module before releasing 3.6.0, we should
treat it as an amendment to the PEP and take it pretty seriously (but
it's happened before so it's not impossible).

Hopefully this message isn't drowned in the infinity of pathlib and
~bool threads, and we can proceed to add secrets.py to the 3.6 stdlib.
You should be proud of that accomplishment!

--Guido

On Sat, Apr 9, 2016 at 10:08 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> I've just spotted this email from Guido, sorry about the delay in
> responding.
>
> Further comments below.
>
>
> On Thu, Jan 14, 2016 at 10:47:09AM -0800, Guido van Rossum wrote:
>
>> I think the discussion petered out and nobody asked me to approve it yet
>> (or I lost track of it). I'm almost happy to approve it in the current
>> state. My only quibble is with some naming -- I'm not sure that a
>> super-generic name like 'equal' is better than the original
>> ('compare_digest'),
>
> Changed.
>
>
>> and I would have picked a different name for token_url
>> -- probably token_urlsafe. But maybe Steven can convince me that the names
>> currently in the PEP are better.
>
> Changed.
>
>
>> (I also don't like the wishy-washy
>> position of the PEP on the actual specs of the proposed functions. But I'm
>> fine with the actual implementation shown as the spec.)
>
> I'm not really sure what you want me to do to improve that. Can you be
> more concrete about what you would like the PEP to say?
>
>
> I haven't updated the PEP yet, but the newest version of the secrets
> module with the changes requested is here:
>
> https://bitbucket.org/sdaprano/secrets
>
>
>
> --
> Steve
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org



-- 
--Guido van Rossum (python.org/~guido)

From wes.turner at gmail.com  Sun Apr 10 17:05:59 2016
From: wes.turner at gmail.com (Wes Turner)
Date: Sun, 10 Apr 2016 16:05:59 -0500
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CACfEFw_gMtrPYAF6igEXi2ESq30qepdCCMTPp6WKk13S1b6Gmg@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>
 <20160410165113.GF17895@unequivocal.co.uk>
 <CACfEFw90e8k8Xza95=j_-_9kaS6A-kaa_+aROJ17i3VTPC8VaA@mail.gmail.com>
 <CACfEFw_gMtrPYAF6igEXi2ESq30qepdCCMTPp6WKk13S1b6Gmg@mail.gmail.com>
Message-ID: <CACfEFw-M8k6qgua_VVXhUWPVC9mPr1PNmd6XP4XUEp2m4X53aw@mail.gmail.com>

On Apr 10, 2016 11:51 AM, "Jon Ribbens" <jon+python-dev at unequivocal.co.uk> wrote:
>
> On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote:
> > On 9 April 2016 at 22:43, Victor Stinner <victor.stinner at gmail.com> wrote:
> > > See pysandbox test suite for a lot of ways to escape a sandbox. CPython has
> > > a list of known code to crash CPython (I don't recall the directory in
> > > sources), even with the latest version of CPython.
> >
> > They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers
>
> Thanks. I take your point that sandboxing Python requires CPython to
> be free of code execution bugs. However I will note that none of the
> crashers in that directory will work inside my experiment (except
> "infinite_loop_re.py", which isn't a crasher, just a long loop).
>
> > Even without those considerations though, there are system level
> > denial of service attacks that untrusted code can perform without even
> > trying to break out of the sandbox - the most naive is "while 1:
> > pass", but there are more interesting ones like "from itertools import
> > count; sum(count())", or even "sum(iter(int, 1))" and "list(iter(int,
> > 1))".
>
> Yes, of course. I have already explicitly noted that infinite loops
> and memory exhaustion are not preventable.
>
> > Operating system level security sandboxes still aren't particularly
> > easy to use correctly, but they're a lot more reliable than language
> > runtime level sandboxes, can be used to defend against many more
> > attack vectors, and even offer increased flexibility (e.g. "can write
> > to these directories, but no others", "can read these files, but no
> > others", "can contact these IP addresses, but no others").
>
> I don't entirely trust operating system sandboxes either - I generally
> assume that if someone can execute arbitrary code on my machine, then
> they can do anything they want to that machine.
>
> What I *might* trust, though, would be a "sandbox Python" that is
> itself running inside an operating system sandbox...
>

* https://github.com/jupyter/jupyterhub/wiki/Spawners
  - Docker LXC Containers
  - https://github.com/jupyter/jupyterhub/wiki/Authenticators
    - DOS is still trivial
    - Segfault is still trivial
* http://doc.pypy.org/en/latest/sandbox.html#introduction

From storchaka at gmail.com  Sun Apr 10 17:07:48 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 11 Apr 2016 00:07:48 +0300
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160410165113.GF17895@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>
 <20160410165113.GF17895@unequivocal.co.uk>
Message-ID: <neefb4$cjn$1@ger.gmane.org>

On 10.04.16 19:51, Jon Ribbens wrote:
> On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote:
>> On 9 April 2016 at 22:43, Victor Stinner <victor.stinner at gmail.com> wrote:
>>> See pysandbox test suite for a lot of ways to escape a sandbox. CPython has
>>> a list of known code to crash CPython (I don't recall the directory in
>>> sources), even with the latest version of CPython.
>>
>> They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers
>
> Thanks. I take your point that sandboxing Python requires CPython to
> be free of code execution bugs. However I will note that none of the
> crashers in that directory will work inside my experiment (except
> "infinite_loop_re.py", which isn't a crasher, just a long loop).

Try the following example:

     it = iter([1])
     for i in range(1000000):
         it = filter(None, it)
     next(it)



From Nikolaus at rath.org  Sun Apr 10 17:08:16 2016
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Sun, 10 Apr 2016 14:08:16 -0700
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160410164308.GE17895@unequivocal.co.uk> (Jon Ribbens's message
 of "Sun, 10 Apr 2016 17:43:08 +0100")
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
Message-ID: <87k2k5p3y7.fsf@vostro.rath.org>

On Apr 10 2016, Jon Ribbens <jon+python-dev at unequivocal.co.uk> wrote:
> On Sat, Apr 09, 2016 at 02:43:19PM +0200, Victor Stinner wrote:
>>    Please don't lose time trying yet another sandbox inside CPython. It's
>>    just a waste of time. It's broken by design.
>> 
>>    Please read my email about my attempt (pysandbox):
>>    https://lwn.net/Articles/574323/
>> 
>>    And the LWN article:
>>    https://lwn.net/Articles/574215/
>> 
>>    There are a lot of safe ways to run CPython inside a sandbox (and not the
>>    opposite).
>> 
>>    I started as you, add more and more things to a blacklist, but it doesn't
>>    work.
>
> That's the opposite of my approach though - I'm starting small and
> adding things, not starting with everything and removing stuff.

That contradicts what you said in another mail:


On Apr 08 2016, Jon Ribbens <jon+python-dev at unequivocal.co.uk> wrote:
> Ah, I've not used Python 3.5, and I can't find any documentation on
> this cr_frame business, but I've added cr_frame and f_back to the
> disallowed attributes list.


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             "Time flies like an arrow, fruit flies like a Banana."

From jon+python-dev at unequivocal.co.uk  Sun Apr 10 17:53:41 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Sun, 10 Apr 2016 22:53:41 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <neefb4$cjn$1@ger.gmane.org>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>
 <20160410165113.GF17895@unequivocal.co.uk>
 <neefb4$cjn$1@ger.gmane.org>
Message-ID: <20160410215341.GI17895@unequivocal.co.uk>

On Mon, Apr 11, 2016 at 12:07:48AM +0300, Serhiy Storchaka wrote:
> On 10.04.16 19:51, Jon Ribbens wrote:
> >On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote:
> >>On 9 April 2016 at 22:43, Victor Stinner <victor.stinner at gmail.com> wrote:
> >>>See pysandbox test suite for a lot of ways to escape a sandbox. CPython has
> >>>a list of known code to crash CPython (I don't recall the directory in
> >>>sources), even with the latest version of CPython.
> >>
> >>They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers
> >
> >Thanks. I take your point that sandboxing Python requires CPython to
> >be free of code execution bugs. However I will note that none of the
> >crashers in that directory will work inside my experiment (except
> >"infinite_loop_re.py", which isn't a crasher, just a long loop).
> 
> Try the following example:
> 
>     it = iter([1])
>     for i in range(1000000):
>         it = filter(None, it)
>     next(it)

That does indeed segfault. I guess you should report that as a bug!

From jon+python-dev at unequivocal.co.uk  Sun Apr 10 18:31:57 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Sun, 10 Apr 2016 23:31:57 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <87k2k5p3y7.fsf@vostro.rath.org>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <87k2k5p3y7.fsf@vostro.rath.org>
Message-ID: <20160410223157.GJ17895@unequivocal.co.uk>

On Sun, Apr 10, 2016 at 02:08:16PM -0700, Nikolaus Rath wrote:
> On Apr 10 2016, Jon Ribbens <jon+python-dev at unequivocal.co.uk> wrote:
> > On Sat, Apr 09, 2016 at 02:43:19PM +0200, Victor Stinner wrote:
> > That's the opposite of my approach though - I'm starting small and
> > adding things, not starting with everything and removing stuff.
> 
> That contradicts what you said in another mail:
> 
> On Apr 08 2016, Jon Ribbens <jon+python-dev at unequivocal.co.uk> wrote:
> > Ah, I've not used Python 3.5, and I can't find any documentation on
> > this cr_frame business, but I've added cr_frame and f_back to the
> > disallowed attributes list.

No, you've just misunderstood my meaning. Obviously I'm not only
allowing access to whitelisted variable and property names, that
would be ridiculous ("your code may only use variables called
'foo', 'bar' and 'baz'...").

The point is that we can start with, say, only allowing expressions
and not statements, and a __builtins__ that contains literally
nothing. We can even limit ourselves to disallow, say, lambda and
yield and generator expressions if we like. Can this minimal
language be made "safe"? If so, we have already won something - the
ability to use "eval" as a powerful calculator function. Then, can
we allow statements? Can we allow user-defined classes? Can we allow
try/except? etc.

With regard to names by the way, I suspect that disallowing just
anything starting "_" and the names of the properties of frame
objects would be good enough. Unless someone knows a way to get
to an object's __dict__ or its type without using vars() or type()
or underscore attributes...
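
A minimal sketch of that starting point, assuming Python 3.8 or later.
It accepts expressions only, rejects any syntax that is not plain
arithmetic, refuses names beginning with "_", and evaluates with an
empty __builtins__. The names safe_calc, ALLOWED_NODES and
MAX_SOURCE_LENGTH are made up for illustration and are not taken from
the experiment's code; the node whitelist is likewise an assumption.

```python
import ast

# Hypothetical whitelist: arithmetic only, no calls, attributes or lambdas.
ALLOWED_NODES = (
    ast.Expression, ast.Constant, ast.Name, ast.Load,
    ast.BinOp, ast.UnaryOp,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod,
    ast.UAdd, ast.USub,
)

# Crude cap on input size (an assumption, not part of the original proposal).
MAX_SOURCE_LENGTH = 200


def safe_calc(source, names=None):
    """Evaluate a small arithmetic expression with no builtins available."""
    if len(source) > MAX_SOURCE_LENGTH:
        raise ValueError("expression too long")
    # mode="eval" means statements are a SyntaxError before we even look.
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("disallowed syntax: " + type(node).__name__)
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            raise ValueError("underscore names are not allowed")
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError("only numeric literals are allowed")
    code = compile(tree, "<calc>", "eval")
    # Empty __builtins__: the expression sees only the supplied names.
    return eval(code, {"__builtins__": {}}, dict(names or {}))
```

Something like safe_calc("x * 2", {"x": 21}) is accepted, while
"().__class__" or "__import__('os')" is rejected before anything is
evaluated. This only illustrates the "start with nothing and add"
direction; it makes no claim that the result is actually safe.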

From oscar.j.benjamin at gmail.com  Sun Apr 10 19:02:28 2016
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 11 Apr 2016 00:02:28 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160410215341.GI17895@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>
 <20160410165113.GF17895@unequivocal.co.uk>
 <neefb4$cjn$1@ger.gmane.org>
 <20160410215341.GI17895@unequivocal.co.uk>
Message-ID: <CAHVvXxTbg6W3G2QjNPMFrvx3W-JvWOuTgaUZBYJe1zNGVNmE+g@mail.gmail.com>

On 10 Apr 2016 22:55, "Jon Ribbens" <jon+python-dev at unequivocal.co.uk> wrote:
>
> On Mon, Apr 11, 2016 at 12:07:48AM +0300, Serhiy Storchaka wrote:
> > On 10.04.16 19:51, Jon Ribbens wrote:
> > >On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote:
> > >>On 9 April 2016 at 22:43, Victor Stinner <victor.stinner at gmail.com> wrote:
> > >>>See pysandbox test suite for a lot of ways to escape a sandbox. CPython has
> > >>>a list of known code to crash CPython (I don't recall the directory in
> > >>>sources), even with the latest version of CPython.
> > >>
> > >>They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers
> > >
> > >Thanks. I take your point that sandboxing Python requires CPython to
> > >be free of code execution bugs. However I will note that none of the
> > >crashers in that directory will work inside my experiment (except
> > >"infinite_loop_re.py", which isn't a crasher, just a long loop).
> >
> > Try the following example:
> >
> >     it = iter([1])
> >     for i in range(1000000):
> >         it = filter(None, it)
> >     next(it)
>
> That does indeed segfault. I guess you should report that as a bug!

There will always be obscure ways to crash the interpreter. That one can
be fixed but if someone really wants to break your sandbox this way then
they will be able to. Remember that exploits are often based on bugs and
any codebase the size of CPython has bugs.

I haven't looked at your sandbox but for a different approach try this one:

  L = [None]
  L.extend(iter(L))

On my Linux machine that doesn't just crash Python.

--
Oscar

From jcgoble3 at gmail.com  Sun Apr 10 20:12:30 2016
From: jcgoble3 at gmail.com (Jonathan Goble)
Date: Sun, 10 Apr 2016 20:12:30 -0400
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAHVvXxTbg6W3G2QjNPMFrvx3W-JvWOuTgaUZBYJe1zNGVNmE+g@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>
 <20160410165113.GF17895@unequivocal.co.uk> <neefb4$cjn$1@ger.gmane.org>
 <20160410215341.GI17895@unequivocal.co.uk>
 <CAHVvXxTbg6W3G2QjNPMFrvx3W-JvWOuTgaUZBYJe1zNGVNmE+g@mail.gmail.com>
Message-ID: <CAK256p21uQfPo7imTJtn9vB7hAZhuKY3wW2T6soTbw4esWX-cw@mail.gmail.com>

On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:
> I haven't looked at your sandbox but for a different approach try this one:
>
>   L = [None]
>   L.extend(iter(L))
>
> On my Linux machine that doesn't just crash Python.

For the record: don't try this if you have unsaved files open on your
computer, because you will lose them. When I typed these two lines
into the Py3.5 interactive prompt, it completely and totally froze
Windows to the point that nothing would respond and I had to resort to
the old trick of holding the power button down for five seconds to
forcibly shut the computer down.

Fortunately, I made extra certain everything was fully saved before I
opened the Python interpreter, so I'm not TOTALLY dumb. :-P

From tseaver at palladion.com  Sun Apr 10 21:49:27 2016
From: tseaver at palladion.com (Tres Seaver)
Date: Sun, 10 Apr 2016 21:49:27 -0400
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160410223157.GJ17895@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk> <87k2k5p3y7.fsf@vostro.rath.org>
 <20160410223157.GJ17895@unequivocal.co.uk>
Message-ID: <neevra$i4k$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 04/10/2016 06:31 PM, Jon Ribbens wrote:
> Unless someone knows a way to get to an object's __dict__ or its type
> without using vars() or type() or underscore attributes...

Hmm, 'classmethod'-wrapped functions get passed the type.
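
A small illustration of that point, assuming the restricted environment
permits class definitions and exposes the classmethod builtin (both are
assumptions about the sandbox; Probe, Child and who are made-up names):

```python
class Probe:
    @classmethod
    def who(cls):
        # cls is the class object itself, supplied by the descriptor
        # machinery; no type() call and no underscore name is used.
        return cls


class Child(Probe):
    pass


obj = Child()
leaked = obj.who()  # the type of obj, reached without type() or __class__
```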


Tres.
- -- 
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJXCwKgAAoJEPKpaDSJE9HYHbAP/ibVrlKBTqkwePFr4n4hfA5Z
6te+FCzYm4RfAiIMq0Mitc9mFzeeAx5J9Z6kxONkbCBoBbhttcngR1uHWHHR/7tk
a9OVKCu0fzvQvKM9J1wPWdu6uB50TZ2PmRiZ1nChXG2XKC8F3xnj/JwZod0N+3vK
zus1T6/5vB6pm+q/hm9gh1yd9gTRldzoVQ9T2Tp8vo6PiYxe5qBwfhIHKR8xtWVs
eUG0OR1w8QzaU97NDTOShotDq9Ekow66zqlhppqUGSmt2nOTDndLekse6q1l/oir
nMuPBxgyb/CkQ9+KNXb3UvT5l8MLmCtJaMm/To0n8OUBSXG8sspP0oUSiMLUXc5a
F/haZnpD2jLmCFz9ivBxIpFRVkLIajwovzLLItSzePclZHj6TChctSQvGPY0roVD
BYVnGa4i7vi46mSzkeWvXKT2XFed2pCklD+FLnS6RnShxaxj1VEct8LVAJHFNAJ4
qg1dyLlTeclWUdoerRdGG2J7oa3Ib04ydh9OxnB1Y5KGa5iDCmfydHw24BU0gzvu
DIX8tEpq5XSqzN5QAkIbtIV5nyqFwPj1Jun275ETkESTvI0fdja/8RJvJ5npYZj0
yJ5Gc5iXwQWazF18ALFYdyeV+ZKKv2Q5UiYEOBxG02XYaH8GZypAqMbf5apJKQAj
PXHMjfW/YIuASrzcporx
=1Wrb
-----END PGP SIGNATURE-----


From steve at pearwood.info  Sun Apr 10 23:09:19 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 11 Apr 2016 13:09:19 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAK256p21uQfPo7imTJtn9vB7hAZhuKY3wW2T6soTbw4esWX-cw@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>
 <20160410165113.GF17895@unequivocal.co.uk> <neefb4$cjn$1@ger.gmane.org>
 <20160410215341.GI17895@unequivocal.co.uk>
 <CAHVvXxTbg6W3G2QjNPMFrvx3W-JvWOuTgaUZBYJe1zNGVNmE+g@mail.gmail.com>
 <CAK256p21uQfPo7imTJtn9vB7hAZhuKY3wW2T6soTbw4esWX-cw@mail.gmail.com>
Message-ID: <20160411030919.GC12526@ando.pearwood.info>

On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote:
> On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin
> <oscar.j.benjamin at gmail.com> wrote:
> > I haven't looked at your sandbox but for a different approach try this one:
> >
> >   L = [None]
> >   L.extend(iter(L))
> >
> > On my Linux machine that doesn't just crash Python.
> 
> For the record: don't try this if you have unsaved files open on your
> computer, because you will lose them. When I typed these two lines
> into the Py3.5 interactive prompt, it completely and totally froze
> Windows to the point that nothing would respond and I had to resort to
> the old trick of holding the power button down for five seconds to
> forcibly shut the computer down.


I think this might improve matters:

http://bugs.python.org/issue26351

although I must admit I don't understand why the entire OS is affected.




-- 
Steve

From phd at phdru.name  Sun Apr 10 23:50:31 2016
From: phd at phdru.name (Oleg Broytman)
Date: Mon, 11 Apr 2016 05:50:31 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160411030919.GC12526@ando.pearwood.info>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>
 <20160410165113.GF17895@unequivocal.co.uk>
 <neefb4$cjn$1@ger.gmane.org>
 <20160410215341.GI17895@unequivocal.co.uk>
 <CAHVvXxTbg6W3G2QjNPMFrvx3W-JvWOuTgaUZBYJe1zNGVNmE+g@mail.gmail.com>
 <CAK256p21uQfPo7imTJtn9vB7hAZhuKY3wW2T6soTbw4esWX-cw@mail.gmail.com>
 <20160411030919.GC12526@ando.pearwood.info>
Message-ID: <20160411035031.GA7952@phdru.name>

On Mon, Apr 11, 2016 at 01:09:19PM +1000, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote:
> > On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin
> > <oscar.j.benjamin at gmail.com> wrote:
> > > I haven't looked at your sandbox but for a different approach try this one:
> > >
> > >   L = [None]
> > >   L.extend(iter(L))
> > >
> > > On my Linux machine that doesn't just crash Python.
> > 
> > For the record: don't try this if you have unsaved files open on your
> > computer, because you will lose them. When I typed these two lines
> > into the Py3.5 interactive prompt, it completely and totally froze
> > Windows to the point that nothing would respond and I had to resort to
> > the old trick of holding the power button down for five seconds to
> > forcibly shut the computer down.
> 
> 
> I think this might improve matters:
> 
> http://bugs.python.org/issue26351
> 
> although I must admit I don't understand why the entire OS is affected.

   Memory exhaustion?

> -- 
> Steve

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From wes.turner at gmail.com  Mon Apr 11 01:42:47 2016
From: wes.turner at gmail.com (Wes Turner)
Date: Mon, 11 Apr 2016 00:42:47 -0500
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160411035031.GA7952@phdru.name>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>
 <20160410165113.GF17895@unequivocal.co.uk>
 <neefb4$cjn$1@ger.gmane.org>
 <20160410215341.GI17895@unequivocal.co.uk>
 <CAHVvXxTbg6W3G2QjNPMFrvx3W-JvWOuTgaUZBYJe1zNGVNmE+g@mail.gmail.com>
 <CAK256p21uQfPo7imTJtn9vB7hAZhuKY3wW2T6soTbw4esWX-cw@mail.gmail.com>
 <20160411030919.GC12526@ando.pearwood.info>
 <20160411035031.GA7952@phdru.name>
Message-ID: <CACfEFw_GHGsLuvf5eED=dZ-d2LNTbzMCBLwdyaqzaJbrTT-Cqw@mail.gmail.com>

On Sun, Apr 10, 2016 at 10:50 PM, Oleg Broytman <phd at phdru.name> wrote:

> On Mon, Apr 11, 2016 at 01:09:19PM +1000, Steven D'Aprano <steve at pearwood.info> wrote:
> > On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote:
> > > On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin
> > > <oscar.j.benjamin at gmail.com> wrote:
> > > > I haven't looked at your sandbox but for a different approach try this one:
> > > >
> > > >   L = [None]
> > > >   L.extend(iter(L))
> > > >
> > > > On my Linux machine that doesn't just crash Python.
> > >
> > > For the record: don't try this if you have unsaved files open on your
> > > computer, because you will lose them. When I typed these two lines
> > > into the Py3.5 interactive prompt, it completely and totally froze
> > > Windows to the point that nothing would respond and I had to resort to
> > > the old trick of holding the power button down for five seconds to
> > > forcibly shut the computer down.
> >
> >
> > I think this might improve matters:
> >
> > http://bugs.python.org/issue26351
> >
> > although I must admit I don't understand why the entire OS is affected.
>
>    Memory exhaustion?
>

* https://docs.docker.com/compose/compose-file/#cpu-shares-cpu-quota-cpuset-domainname-hostname-ipc-mac-address-mem-limit-memswap-limit-privileged-read-only-restart-stdin-open-tty-user-working-dir

* https://github.com/jupyter/dockerspawner/blob/master/systemuser/Dockerfile



>
> > --
> > Steve
>
> Oleg.
> --
>      Oleg Broytman            http://phdru.name/            phd at phdru.name
>            Programmers don't die, they just GOSUB without RETURN.

From phd at phdru.name  Mon Apr 11 02:06:34 2016
From: phd at phdru.name (Oleg Broytman)
Date: Mon, 11 Apr 2016 08:06:34 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CACfEFw_GHGsLuvf5eED=dZ-d2LNTbzMCBLwdyaqzaJbrTT-Cqw@mail.gmail.com>
References: <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>
 <20160410165113.GF17895@unequivocal.co.uk>
 <neefb4$cjn$1@ger.gmane.org>
 <20160410215341.GI17895@unequivocal.co.uk>
 <CAHVvXxTbg6W3G2QjNPMFrvx3W-JvWOuTgaUZBYJe1zNGVNmE+g@mail.gmail.com>
 <CAK256p21uQfPo7imTJtn9vB7hAZhuKY3wW2T6soTbw4esWX-cw@mail.gmail.com>
 <20160411030919.GC12526@ando.pearwood.info>
 <20160411035031.GA7952@phdru.name>
 <CACfEFw_GHGsLuvf5eED=dZ-d2LNTbzMCBLwdyaqzaJbrTT-Cqw@mail.gmail.com>
Message-ID: <20160411060634.GA16992@phdru.name>

On Mon, Apr 11, 2016 at 12:42:47AM -0500, Wes Turner <wes.turner at gmail.com> wrote:
> On Sun, Apr 10, 2016 at 10:50 PM, Oleg Broytman <phd at phdru.name> wrote:
> 
> > On Mon, Apr 11, 2016 at 01:09:19PM +1000, Steven D'Aprano <steve at pearwood.info> wrote:
> > > On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote:
> > > > On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin
> > > > <oscar.j.benjamin at gmail.com> wrote:
> > > > > I haven't looked at your sandbox but for a different approach try this one:
> > > > >
> > > > >   L = [None]
> > > > >   L.extend(iter(L))
> > > > >
> > > > > On my Linux machine that doesn't just crash Python.
> > > >
> > > > For the record: don't try this if you have unsaved files open on your
> > > > computer, because you will lose them. When I typed these two lines
> > > > into the Py3.5 interactive prompt, it completely and totally froze
> > > > Windows to the point that nothing would respond and I had to resort to
> > > > the old trick of holding the power button down for five seconds to
> > > > forcibly shut the computer down.
> > >
> > >
> > > I think this might improve matters:
> > >
> > > http://bugs.python.org/issue26351
> > >
> > > although I must admit I don't understand why the entire OS is affected.
> >
> >    Memory exhaustion?
> 
> * https://docs.docker.com/compose/compose-file/#cpu-shares-cpu-quota-cpuset-domainname-hostname-ipc-mac-address-mem-limit-memswap-limit-privileged-read-only-restart-stdin-open-tty-user-working-dir
> 
> * https://github.com/jupyter/dockerspawner/blob/master/systemuser/Dockerfile

   I think memory control groups in Linux can be used to limit memory
usage. I have memory control groups configured and I'll try to find
time to experiment with the code above.
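
   For anyone without control groups set up, a rough stand-in is a
per-process address-space limit. The sketch below uses the POSIX-only
resource module rather than cgroups, so it is a different and weaker
mechanism; run_with_memory_cap is a made-up helper name, and it assumes
Python 3.7 or later on a Unix-like system.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(source, max_bytes, timeout=30):
    """Run `source` in a child interpreter whose address space is capped."""

    def apply_limit():
        # Executed in the child just before exec (POSIX only). Allocations
        # beyond the cap fail, which Python reports as MemoryError.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(
        [sys.executable, "-c", source],
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

   With a 1 GiB cap, a child that asks for a 4 GiB bytearray should end
with a MemoryError traceback instead of dragging the whole machine into
swap. The limit covers only that one process, not any children it may
spawn in total, which is one reason control groups remain the better
tool.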

> > > --
> > > Steve

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From ncoghlan at gmail.com  Mon Apr 11 02:20:05 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 11 Apr 2016 16:20:05 +1000
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <F97151F6-0808-4E80-A9F8-E73D8397AABF@stufft.io>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CALFfu7Asr3WkV+0rxLVjvgsMuqMZL=5eGzrJc_8mi-GaV6LVfA@mail.gmail.com>
 <-8088150910827119255@unknownmsgid>
 <CAP1=2W4N9FdEYbBs9A1a0zK-dDnU4_bz+d_up85xTmu8n_J58A@mail.gmail.com>
 <5709EB70.7030308@canterbury.ac.nz>
 <CADiSq7fASLApV6_begjq9A7HLJaY32kyduQV0RJWmqDeZAKTxw@mail.gmail.com>
 <F97151F6-0808-4E80-A9F8-E73D8397AABF@stufft.io>
Message-ID: <CADiSq7eUZ8HV41gNcAKUG6-8ktoy15oP4OuAqirMFrXU9mz7_A@mail.gmail.com>

On 11 April 2016 at 01:50, Donald Stufft <donald at stufft.io> wrote:
>
>> On Apr 10, 2016, at 2:43 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> This does raise a concrete API design question: how should
>> PurePath.__fspath__ behave when called on a mismatched OS?
>
> I think that PurePath.__fspath__ should return a string. There's no
> reason why we can't in my opinion and doing so just limits the usefulness
> of the method. For instance, it'd prevent it from being possible to
> serialize a pure windows path and send it over the wire to a process running
> on a Windows machine, like say if you have a build master running on Linux
> and a build slave running on Windows.

Yeah, given that you have to go out of your way to create a path
object for an alternate platform, this makes sense - the "I know what
I'm doing" indicator is calling pathlib.Pure[Windows|Posix]Path
instead of pathlib.PurePath in the first place, and so __fspath__
can just do its thing as a pure text-based operation, without worrying
about the current platform.
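
A minimal sketch of that text-only behaviour, assuming a Python version
in which os.fspath() and PurePath.__fspath__ exist (both were still
under discussion in this thread). The directory and file names are made
up for illustration.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths never touch the filesystem, so either flavour can be
# created and converted to text on any host platform.
win = PureWindowsPath("C:/build", "out", "pkg.zip")
posix = PurePosixPath("/srv", "build", "pkg.zip")

win_text = os.fspath(win)      # a str using Windows separators
posix_text = os.fspath(posix)  # a str using POSIX separators

print(win_text)
print(posix_text)
```

The resulting strings can then be serialized and sent to a process on
the matching platform, as in the build master example quoted above.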

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Mon Apr 11 02:27:07 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 11 Apr 2016 16:27:07 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570A7C67.3010304@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
Message-ID: <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>

On 11 April 2016 at 02:16, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/09/2016 10:31 PM, Nick Coghlan wrote:
>>
>> On 10 April 2016 at 02:41, Ethan Furman wrote:
>
>
>> When somebody hands you bytes rather than text you need to worry about
>> the encoding, and you need to worry about returning bytes rather than
>> text yourself. https://hg.python.org/cpython/rev/e44410e5928e#l4.1
>> provides an illustration of how fiddly that can get, and that's in the
>> URL context - cross-platform filesystem path handling is worse, since
>> you need to worry about the significant differences between the way
>> Windows and *nix handle binary paths, and you can't use os.sep
>> directly any more (since that's always text).
>
>
> Okay, that makes sense.
>
>> DirEntry can still get the check, it can just throw TypeError when it
>> represents a binary path (that's one of the advantages of using a
>> method-based protocol - exceptions on method calls are more acceptable
>> than exceptions on property access).
>
>
> I guess I don't see the point of this.  Either DirEntry's [1] only get
> partial support (which is only marginally better than the no support pathlib
> currently has), or stdlib code will need to catch those errors and then do
> an isinstance check to see if it knows what the type is and how to deal with it
> [1].

What's wrong with only gaining partial support? Standard library code
that doesn't currently support DirEntry at all will gain the ability
to support str-based DirEntry objects, while bytes-based DirEntry
objects will continue to be a low level object that isn't
interoperable with most other APIs (which is fine - anyone writing low
level POSIX-specific code can deal with unpacking the values
explicitly, it just won't happen implicitly anywhere).
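
A rough sketch of what "partial support" could look like from the
caller's side. It assumes a Python version where os.fspath() and
DirEntry.__fspath__ are available; text_fspath is a hypothetical helper
written for this illustration, not a stdlib function.

```python
import os
import tempfile


def text_fspath(obj):
    """Hypothetical helper: return a str path, reject bytes-based paths."""
    path = os.fspath(obj)
    if not isinstance(path, str):
        raise TypeError("expected a str-based path, got " + type(path).__name__)
    return path


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()

    # A str argument to os.scandir() yields str-based DirEntry objects.
    with os.scandir(tmp) as entries:
        str_result = text_fspath(next(entries))

    # A bytes argument yields bytes-based DirEntry objects, which the
    # helper refuses instead of converting implicitly.
    with os.scandir(os.fsencode(tmp)) as entries:
        try:
            text_fspath(next(entries))
            bytes_rejected = False
        except TypeError:
            bytes_rejected = True

print(os.path.basename(str_result), bytes_rejected)
```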

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From raymond.hettinger at gmail.com  Mon Apr 11 02:36:29 2016
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sun, 10 Apr 2016 23:36:29 -0700
Subject: [Python-Dev] PEP 506 secrets module
In-Reply-To: <CAP7+vJLph9pxOuL_19pTP_BakggnrJdos2f8OOjwZmAp1mJCnA@mail.gmail.com>
References: <20151016005711.GC11980@ando.pearwood.info>
 <CAP1=2W6p3CK-_ksvf7Jxk9UC2ugniraqkRGvv2FGrVxnvuhv=Q@mail.gmail.com>
 <CAP7+vJ+7cR3do5LOApqFjR+PvavKdOpA2Q0rLC0pYAAS4LGwKQ@mail.gmail.com>
 <20160410050845.GA12526@ando.pearwood.info>
 <CAP7+vJLph9pxOuL_19pTP_BakggnrJdos2f8OOjwZmAp1mJCnA@mail.gmail.com>
Message-ID: <C8EF3BA7-B89F-43AE-8BF5-8FB1383C2D44@gmail.com>


> On Apr 10, 2016, at 11:43 AM, Guido van Rossum <guido at python.org> wrote:
> 
> I will approve the PEP as soon as you've updated the two function
> names in the PEP. 

Congratulations Steven.


Raymond


From robertc at robertcollins.net  Mon Apr 11 03:08:54 2016
From: robertc at robertcollins.net (Robert Collins)
Date: Mon, 11 Apr 2016 19:08:54 +1200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <neevra$i4k$1@ger.gmane.org>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <87k2k5p3y7.fsf@vostro.rath.org>
 <20160410223157.GJ17895@unequivocal.co.uk>
 <neevra$i4k$1@ger.gmane.org>
Message-ID: <CAJ3HoZ3ci7H2tbS-pbgpfdeH1sDaxvoU9F5sqjyV=QGz9VOG2A@mail.gmail.com>

On 11 April 2016 at 13:49, Tres Seaver <tseaver at palladion.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 04/10/2016 06:31 PM, Jon Ribbens wrote:
>> Unless someone knows a way to get to an object's __dict__ or its type
>> without using vars() or type() or underscore attributes...
>
> Hmm, 'classmethod'-wrapped functions get passed the type.

yeah, but to access that you need to assign the descriptor to the type
- circular loop. If you can arrange that assignment it's easy:


thetype = []
class gettype:
    def __get__(self, obj, type=None):
        thetype.append((obj, type))
        return None

classIwant.query = gettype()
classIwant().query
thetype[0][1]...

but you've already gotten to classIwant there.
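
For reference, a self-contained version of the same idea. ClassIWant is
a placeholder class defined here only so the snippet runs; in the
sandbox scenario it would be the type the attacker is after.

```python
captured = []


class GetType:
    """Descriptor that records the (instance, owner type) pair it is given."""

    def __get__(self, obj, objtype=None):
        captured.append((obj, objtype))
        return None


class ClassIWant:  # placeholder for the target class
    pass


# Attaching the descriptor needs a reference to the class already,
# which is the circularity pointed out above.
ClassIWant.query = GetType()
ClassIWant().query

leaked_type = captured[0][1]
print(leaked_type is ClassIWant)
```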

-Rob

-- 
Robert Collins <rbtcollins at hpe.com>
Distinguished Technologist
HP Converged Cloud

From storchaka at gmail.com  Mon Apr 11 03:26:57 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 11 Apr 2016 10:26:57 +0300
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160410215341.GI17895@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>
 <20160410165113.GF17895@unequivocal.co.uk> <neefb4$cjn$1@ger.gmane.org>
 <20160410215341.GI17895@unequivocal.co.uk>
Message-ID: <nefjk1$br4$1@ger.gmane.org>

On 11.04.16 00:53, Jon Ribbens wrote:
>> Try following example:
>>
>>      it = iter([1])
>>      for i in range(1000000):
>>          it = filter(None, it)
>>      next(it)
>
> That does indeed segfault. I guess you should report that as a bug!

This is an old issue that doesn't have an adequate solution. And this is only 
one example; you can get a segfault with other recursive iterators.
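
The structure behind the crash can be sketched at a harmless depth. Each
filter object wraps the previous one, so a single next() call descends
through every layer in C; the depth of 100 is an arbitrary safe value
chosen for this illustration, not the 1000000 used above.

```python
def nested_filter(depth):
    """Wrap a one-element iterator in `depth` layers of filter()."""
    it = iter([1])
    for _ in range(depth):
        it = filter(None, it)
    return it


# One next() call walks through all 100 layers before yielding the item.
value = next(nested_filter(100))
print(value)
```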



From phd at phdru.name  Mon Apr 11 05:17:58 2016
From: phd at phdru.name (Oleg Broytman)
Date: Mon, 11 Apr 2016 11:17:58 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160411060634.GA16992@phdru.name>
References: <CADiSq7ctMHv_aHSBTO6aohdHOXFfXHV_42VchRVTpM-yJpN=Sg@mail.gmail.com>
 <20160410165113.GF17895@unequivocal.co.uk>
 <neefb4$cjn$1@ger.gmane.org>
 <20160410215341.GI17895@unequivocal.co.uk>
 <CAHVvXxTbg6W3G2QjNPMFrvx3W-JvWOuTgaUZBYJe1zNGVNmE+g@mail.gmail.com>
 <CAK256p21uQfPo7imTJtn9vB7hAZhuKY3wW2T6soTbw4esWX-cw@mail.gmail.com>
 <20160411030919.GC12526@ando.pearwood.info>
 <20160411035031.GA7952@phdru.name>
 <CACfEFw_GHGsLuvf5eED=dZ-d2LNTbzMCBLwdyaqzaJbrTT-Cqw@mail.gmail.com>
 <20160411060634.GA16992@phdru.name>
Message-ID: <20160411091758.GA20672@phdru.name>

On Mon, Apr 11, 2016 at 08:06:34AM +0200, Oleg Broytman <phd at phdru.name> wrote:
> On Mon, Apr 11, 2016 at 12:42:47AM -0500, Wes Turner <wes.turner at gmail.com> wrote:
> > On Sun, Apr 10, 2016 at 10:50 PM, Oleg Broytman <phd at phdru.name> wrote:
> > 
> > > On Mon, Apr 11, 2016 at 01:09:19PM +1000, Steven D'Aprano <
> > > steve at pearwood.info> wrote:
> > > > On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote:
> > > > > On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin
> > > > > <oscar.j.benjamin at gmail.com> wrote:
> > > > > > I haven't looked at your sandbox but for a different approach try
> > > this one:
> > > > > >
> > > > > >   L = [None]
> > > > > >   L.extend(iter(L))
> > > > > >
> > > > > > On my Linux machine that doesn't just crash Python.
> > > > >
> > > > > For the record: don't try this if you have unsaved files open on your
> > > > > computer, because you will lose them. When I typed these two lines
> > > > > into the Py3.5 interactive prompt, it completely and totally froze
> > > > > Windows to the point that nothing would respond and I had to resort to
> > > > > the old trick of holding the power button down for five seconds to
> > > > > forcibly shut the computer down.
> > > >
> > > >
> > > > I think this might improve matters:
> > > >
> > > > http://bugs.python.org/issue26351
> > > >
> > > > although I must admit I don't understand why the entire OS is affected.
> > >
> > >    Memory exhaustion?
> > *
> > https://docs.docker.com/compose/compose-file/#cpu-shares-cpu-quota-cpuset-domainname-hostname-ipc-mac-address-mem-limit-memswap-limit-privileged-read-only-restart-stdin-open-tty-user-working-dir
> > 
> > * https://github.com/jupyter/dockerspawner/blob/master/systemuser/Dockerfile
> 
>    I think memory control groups in Linux can be used to limit memory
> usage. I have mem. c. g. configured and I'll try to find time to
> experiment with the code above.

   With limited memory it was fast:

$ ulimit -d 50000 -m 80000 -s 10000 -v 100000
$ python
Python 2.7.9 (default, Mar  1 2015, 18:22:53) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> L = [None]
>>> L.extend(iter(L))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

   Memory control groups don't help because they don't limit virtual
memory so the process simply starts thrashing.
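
The same kind of limit can be set from Python itself. This is only a
sketch: it assumes a POSIX system where the resource module is available
and where RLIMIT_AS (the counterpart of "ulimit -v") is enforced, and
the 512 MiB cap is an arbitrary value picked for the example. The child
makes one large allocation rather than running the snippet from the
thread, so the demo finishes quickly even if the limit is not enforced.

```python
import subprocess
import sys

LIMIT_BYTES = 512 * 1024 * 1024  # assumed cap, chosen only for this sketch


def run_with_address_space_limit(code, limit_bytes=LIMIT_BYTES):
    """Run `code` in a child interpreter after capping its address space."""
    prelude = (
        "import resource\n"
        "resource.setrlimit(resource.RLIMIT_AS, ({0}, {0}))\n".format(limit_bytes)
    )
    return subprocess.run(
        [sys.executable, "-c", prelude + code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


# Ask for roughly 1 GB, which is more than the cap allows.
result = run_with_address_space_limit("data = bytearray(10**9)\n")
print(result.returncode)
print(result.stderr.strip().splitlines()[-1:])
```

Passing "L = [None]\nL.extend(iter(L))\n" as the code string would
reproduce the session above under the same cap.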

> > > > --
> > > > Steve

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From victor.stinner at gmail.com  Mon Apr 11 05:40:05 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 11 Apr 2016 11:40:05 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160410164308.GE17895@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
Message-ID: <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>

2016-04-10 18:43 GMT+02:00 Jon Ribbens <jon+python-dev at unequivocal.co.uk>:
> On Sat, Apr 09, 2016 at 02:43:19PM +0200, Victor Stinner wrote:
>>    Please don't lose time trying yet another sandbox inside CPython. It's
>>    just a waste of time. It's broken by design.
>>
>>    Please read my email about my attempt (pysandbox):
>>    https://lwn.net/Articles/574323/
>>
>>    And the LWN article:
>>    https://lwn.net/Articles/574215/
>>
>>    There are a lot of safe ways to run CPython inside a sandbox (and not the
>>    opposite).
>>
>>    I started as you, add more and more things to a blacklist, but it doesn't
>>    work.
>
> That's the opposite of my approach though - I'm starting small and
> adding things, not starting with everything and removing stuff. Even
> if what we end up with is an extremely restricted subset of Python,
> there are still cases where that could be a useful tool to have.

Your design relies on the assumption that CPython is only pure Python.
That's wrong. A *lot* of Python features are implemented in C and
"ignore" your sandboxing code. Quick reminder: 50% of CPython is
written in the C language.

It means that your protections like hiding builtin functions from the
Python scope don't work. If an attacker gets access to a C function
giving access to the hidden builtin, the game is over.

pysandbox is based on the idea of tav (his project safelite.py):
remove features in the dictionary of builtin C types like FrameType,
CodeObject, etc. See sandbox/attributes.py. It's not enough to be 100%
safe, a C function can still access fields of the C structure
directly, but it was enough to protect "most" C functions.

It's hard to list all features of the C code which are indirectly
accessible from the Python scope. Some examples: warnings and
tracebacks. These features killed the pysandbox project because they
open files on the filesystem directly, and it's not possible to control
these features from the Python scope.

Another example which exposes a vulnerability of your sandbox:
str.format() gets object attributes directly, without the getattr()
builtin function, so it's possible to escape your sandbox. Example:
"{0.__class__}".format(obj) shows the type of an object.

Think also about the new f-string which allows arbitrary Python code: f"{code}".
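
The str.format() point can be checked with a short sketch. It parses
the expression, lists the attribute names a static checker would see in
the AST, and then evaluates it. The variable names are chosen for this
illustration only.

```python
import ast

source = '"{0.__class__}".format(obj)'
tree = ast.parse(source, mode="eval")

# The only attribute access visible in the AST is the call to .format;
# "__class__" lives inside a string constant.
attribute_names = [
    node.attr for node in ast.walk(tree) if isinstance(node, ast.Attribute)
]

# At run time, str.format() still resolves the attribute by itself.
result = eval(compile(tree, "<demo>", "eval"), {"obj": 3.5})

print(attribute_names)
print(result)
```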


> However on the other hand, nobody has tried before to do what I am
> doing (static code analysis),

You're wrong.

Zope Security ("RestrictedPython") has a similar design. Analyzing AST
is a common design to build a sandbox. But it's not safe.

The "See also" section of my pysandbox has a long list of Python
sandboxes with various designs.


> so it's not necessarily a safe
> assumption that the idea is doomed. For example, as far as I can see,
> none of the methods used to break out of your pysandbox would work to
> break out of my experiment.

What I see is that you asked to break your sandbox, and less than 1
hour later, a first vulnerability was found (exec called with two
parameters). A few hours later, a second vulnerability was found
(async generator and cr_frame). By the way, are you sure that you
fixed the vulnerability? You blacklisted "cb_frame", not cr_frame ;-)

You should look closer, pysandbox is very close to your project. It
also uses whitelists for some protections (ex: builtins) and blacklist
for other protections (ex: hide sensitive attributes). You are using a
blacklist for attributes. By the way, you hide cr_frame but not
cr_code. I'm quite sure that it's possible to execute arbitrary
bytecode in your sandbox, I just don't have enough time to dig into
the code. Your sandbox is not fully based on whitelists.
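
As a quick illustration of why a name blacklist is easy to get wrong,
the attributes in question can be listed directly on a coroutine object.
This assumes a Python version with "async def" (3.5 or later); the exact
set of cr_* names varies between versions.

```python
async def demo():
    return 1


coro = demo()

# Collect every attribute of the coroutine object that starts with "cr_".
cr_names = [name for name in dir(coro) if name.startswith("cr_")]

# Close the coroutine so it is not left un-awaited.
coro.close()

print(cr_names)
```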

Victor

From k7hoven at gmail.com  Mon Apr 11 07:46:12 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Mon, 11 Apr 2016 14:46:12 +0300
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To: <CADiSq7ci_fK0S1u_PADFPNMR-4GRsOjcd8sJSXnSrFwo+Wc8cA@mail.gmail.com>
References: <5707F4DB.7000501@stoneleaf.us>
 <CAP1=2W49VL9AQpLWrrTs2CDxQfTzctpfwbn_cvkaEoRTOHu5Ag@mail.gmail.com>
 <CADiSq7ci_fK0S1u_PADFPNMR-4GRsOjcd8sJSXnSrFwo+Wc8cA@mail.gmail.com>
Message-ID: <CAMiohogxY3EWKT+P+xQ7EcNoPN9QeFBAvnVATJiiSOOVGmizgg@mail.gmail.com>

On Sat, Apr 9, 2016 at 10:48 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 9 April 2016 at 04:25, Brett Cannon <brett at python.org> wrote:
>> On Fri, 8 Apr 2016 at 11:13 Ethan Furman <ethan at stoneleaf.us> wrote:
>>> On 04/08/2016 10:46 AM, Koos Zevenhoven wrote:
>>>  > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker  wrote:
>>>  >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote:
>>>  >>>
>>>  >>> I'm still thinking a little bit about 'pathname', which to me sounds
>>>  >>> more like a string than fspath does.
>>>  >>
>>>  >>
>>>  >> I like that a lot - or even "__pathstr__" or "__pathstring__"
>>>  >> after all, we're making a big deal out of the fact that a path is
>>>  >> *not a string*, but rather a string is a *representation* (or
>>>  >> serialization) of a path.
>>>
>>> That's a decent point.
>>>
>>> So the plausible choices are, I think:
>>>
>>> - __fspath__  # File System Path -- possible confusion with Path
>>
>> +1
>
> I like __fspath__, but I'm also sympathetic to Koos' point that we're
> really dealing with path *names* being produced via this protocol,
> rather than the paths themselves.
>
> That would bring the completely explicit "__fspathname__" into the
> mix, which would be comparable in length to "__getattribute__" as a
> magic method name (both in terms of number of syllables and number of
> characters).
>
> Considering the helper function usage, here's some examples in
> combination with os.fsencode and os.fsdecode:
>
>     # Status quo for binary/text path conversions
>     text_path = os.fsdecode(bytes_path)
>     bytes_path = os.fsencode(text_path)
>
>     # Getting a text path from an arbitrary object
>     text_path = os.fspath(obj) # This doesn't scream "returns text!" to me
>     text_path = os.fspathname(obj) # This does
>
>     # Getting a binary path from an arbitrary object
>     bytes_path = os.fsencode(os.fspath(obj))
>     bytes_path = os.fsencode(os.fspathname(obj))
>
> I'm starting to think the semantic nudge from the "name" suffix when
> reading the code is worth the extra four characters when writing it
> (keeping in mind that the whole point of this exercise is that most
> folks *won't* be writing explicit conversions - the stdlib will handle
> it on their behalf).
>

Regarding the name, I completely agree with Nick's reasoning (above).
I'm not sure it's a high priority to make dunder-method names short.
They are not typed very often, and when the number of these
"protocols" increases, you face potentially ambiguous names more and
more often (there already is a '__path__' and a '__file__' etc., as
has been brought up earlier in these threads.). In other words, it's a
good idea to have some information in the name.

> I also think the more explicit name helps answer some of the type
> signature questions that have arisen:
>
> 1. Does os.fspathname return rich Path objects? No, it returns names
> as str objects

Or byte strings, it seems, unfortunately.

> 2. Will file descriptors pass through os.fspathname? No, as they're
> not names, they're numeric descriptors.
> 3. Will bytes-like objects pass through os.fspathname? No, as they're
> not names, they're encodings of names
>

If fspathname(...) is to be used in os.path.*, it will break things if
it starts to turn encoded bytes pathnames into str pathnames, which it
did not previously do.

And if fspathname is not to be used in os.path.*, who would be our
intended user of fspathname? I assume we don't want to encourage
typical 'users' to manipulate pathnames by hand.

>> I personally still like __ospath__ as well.
>
> That one fails the "Is it ambiguous when spoken aloud?" test for me:
> if someone mentions "oh-ess-path", are they talking about os.path or
> __ospath__? With "eff-ess-path" or "eff-ess-path-name", that problem
> doesn't arise.
>

+1 to this too.

-Koos

From mail at timgolden.me.uk  Mon Apr 11 10:41:35 2016
From: mail at timgolden.me.uk (Tim Golden)
Date: Mon, 11 Apr 2016 15:41:35 +0100
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #25910:
 Fixed more links in the docs.
In-Reply-To: <20160411143851.18859.27207.908B2F75@psf.io>
References: <20160411143851.18859.27207.908B2F75@psf.io>
Message-ID: <570BB79F.2000708@timgolden.me.uk>

On 11/04/2016 15:38, serhiy.storchaka wrote:
> -  <http://www.openssl.org/docs/apps/ciphers.html#CIPHER_LIST_FORMAT>`__.
> +  <http://www.openssl.org/docs/apps/ciphers.html#CIPHER-LIST-FORMAT>`__.

Is there any intended irony in our link to openssl not being via https?

:)

TJG

From jon+python-dev at unequivocal.co.uk  Mon Apr 11 10:46:44 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Mon, 11 Apr 2016 15:46:44 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
Message-ID: <20160411144644.GA8206@unequivocal.co.uk>

On Mon, Apr 11, 2016 at 11:40:05AM +0200, Victor Stinner wrote:
> 2016-04-10 18:43 GMT+02:00 Jon Ribbens <jon+python-dev at unequivocal.co.uk>:
> > That's the opposite of my approach though - I'm starting small and
> > adding things, not starting with everything and removing stuff. Even
> > if what we end up with is an extremely restricted subset of Python,
> > there are still cases where that could be a useful tool to have.
> 
> Your design relies on the assumption that CPython is only pure Python.

No it doesn't. Obviously I know CPython is written in C - the clue is
in the name. I'm not sure what you mean here. 

> It means that your protections like hiding builtin functions from the
> Python scope don't work. If an attacker gets access to a C function
> giving access to the hidden builtin, the game is over.

The former is only true if you assume the latter is possible.
Is there any reason to believe it is?

> It's hard to list all features of the C code which are indirectly
> accessible from the Python scope. Some examples: warnings and
> tracebacks. These features killed the pysandbox project because they
> open files on the filesystem directly, and it's not possible to control
> these features from the Python scope.

I think what you're referring to is when they show context for errors,
for which they try and find the source code lines to display by
identifying the filename, and you can subvert that process by changing
__file__ and/or __name__. If so, you can't do that within my
experiment because you're not allowed to access either of those names.

> Another example which exposes a vulnerability of your sandbox:
> str.format() gets object attributes directly, without the getattr()
> builtin function, so it's possible to escape your sandbox. Example:
> "{0.__class__}".format(obj) shows the type of an object.

Yes, I'd thought of that. However getting access to a string which
contains the name or a representation of an object is not at all the
same thing as getting access to the object itself. 

> Think also about the new f-string which allows arbitrary Python
> code: f"{code}".

Obviously I can't speak to features of future versions of Python.
I'd have to see the ast generated by an f-string to know if they
pose a problem or not, but I suspect they would compile to
expression nodes and hence be caught by the existing checks.
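
On a Python version that supports f-strings (3.6 or later), that
suspicion can be checked with a small sketch: the parsed f-string
contains ordinary expression nodes, so an attribute access inside it
shows up as an ast.Attribute node. The variable names are chosen for
this illustration only.

```python
import ast

tree = ast.parse('f"{obj.__class__}"', mode="eval")

# Names of all node types that occur in the parsed f-string.
node_types = [type(node).__name__ for node in ast.walk(tree)]

# Attribute accesses a static checker would be able to see.
attribute_names = [
    node.attr for node in ast.walk(tree) if isinstance(node, ast.Attribute)
]

print(node_types)
print(attribute_names)
```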

> > However on the other hand, nobody has tried before to do what I am
> > doing (static code analysis),
> 
> You're wrong.
> 
> Zope Security ("RestrictedPython") has a similar design. Analyzing AST
> is a common design to build a sandbox. But it's not safe.

Ah, I hadn't seen that one. Yes, they are doing something similar
(but also much more complex!) I don't know why you say this is
a "common design" though, that one is the only one that appears to
use it.

> What I see is that you asked to break your sandbox, and less than 1
> hour later, a first vulnerability was found (exec called with two
> parameters). A few hours later, a second vulnerability was found
> (async generator and cr_frame).

The former was just a stupid bug, it says nothing about the viability
of the methodology. The latter was a new feature in a Python version
later than I have ever used, and again does not imply anything much
about the viability. I think now I've blocked the names of frame
object attributes it wouldn't be a vulnerability any more anyway.

> By the way, are you sure that you fixed the vulnerability? You
> blacklisted "cb_frame", not cr_frame ;-)

Ah, thanks. As above, I think this doesn't actually make any
difference, but I've updated the code regardless.

> You should look closer, pysandbox is very close to your project.

I've just looked through it all again, and I don't understand why you
are saying that. It's nothing like my experiment. It's trying to alter
the global Python environment so that arbitrary code can be executed,
whereas I am not even trying to allow execution of arbitrary code and
am not altering the global environment.

From antoine at python.org  Mon Apr 11 10:48:27 2016
From: antoine at python.org (Antoine Pitrou)
Date: Mon, 11 Apr 2016 14:48:27 +0000 (UTC)
Subject: [Python-Dev] Pathlib enhancments - method name only
References: <5707F4DB.7000501@stoneleaf.us>
Message-ID: <loom.20160411T164804-574@post.gmane.org>

Ethan Furman <ethan <at> stoneleaf.us> writes:
> 
> That's a decent point.
> 
> So the plausible choices are, I think:
> 
> - __fspath__  # File System Path -- possible confusion with Path

This would have my preference.

Regards

Antoine.



From antoine at python.org  Mon Apr 11 10:56:25 2016
From: antoine at python.org (Antoine Pitrou)
Date: Mon, 11 Apr 2016 14:56:25 +0000 (UTC)
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
References: <5709309D.8030007@stoneleaf.us>
Message-ID: <loom.20160411T164910-864@post.gmane.org>

Ethan Furman <ethan <at> stoneleaf.us> writes:
>  > I also think the more explicit name helps answer some of the type
>  > signature questions that have arisen:
>  >
>  > 1. Does os.fspathname return rich Path objects? No, it returns names
>  > as str objects
>  > 2. Will file descriptors pass through os.fspathname? No, as they're
>  > not names, they're numeric descriptors.
>  > 3. Will bytes-like objects pass through os.fspathname? No, as they're
>  > not names, they're encodings of names
> 
> If we add os.fspath(), but don't allow bytes to be returned from it, our 
> above example looks more like:
> 
>    if isinstance(a_path_thingy, bytes):
>        # because os can accept bytes
>        pass
>    else:
>        a_path_thingy = os.fspath(a_path_thingy)
>    # do something with the path
> 
> Yes, it's better -- but it still requires a pre-check before calling 
> os.fspath().
> 
> It is my contention that this is better:
> 
>    a_path_thingy = os.fspath(a_path_thingy)

It's not better, because a_path_thingy then may be a bytes object,
and the os.fspath() caller has to deal with it.  Conversely, if
os.fspath() is guaranteed to return a unicode string, then the caller
only has to worry about bytes paths if it really wants to; most callers
probably don't care.

I know what some people say: support for bytes paths is necessary
for "low-level functions" (definition required ;-)).  But in a
PEP 383 world, it's not necessary at all.
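
A minimal sketch of the PEP 383 round trip behind that claim. It uses
the "surrogateescape" error handler with an explicit UTF-8 codec so the
result does not depend on the platform's filesystem encoding; the file
name is made up for illustration.

```python
# A byte string that is not valid UTF-8 (0xE9 is Latin-1 for an accented e).
raw_name = b"caf\xe9.txt"

# Undecodable bytes become lone surrogates instead of raising an error.
text_name = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_name.encode("utf-8", "surrogateescape")

print(ascii(text_name))
print(round_tripped == raw_name)
```

os.fsdecode() and os.fsencode() apply the same handler with the
filesystem encoding on POSIX systems, which is why a str can carry an
arbitrary bytes path there.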

> 2) pathlib.Path accepts bytes --

Does it? Or are you proposing such a change?

>>> pathlib.Path(b".")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/35/lib/python3.5/pathlib.py", line 956, in __new__
    self = cls._from_parts(args, init=False)
  File "/home/antoine/35/lib/python3.5/pathlib.py", line 638, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/home/antoine/35/lib/python3.5/pathlib.py", line 630, in _parse_args
    % type(a))
TypeError: argument should be a path or str object, not <class 'bytes'>


Regards

Antoine.



From k7hoven at gmail.com  Mon Apr 11 11:02:47 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Mon, 11 Apr 2016 18:02:47 +0300
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
Message-ID: <CAMiohoiJhS00jnYm5NH3jKq=8BTnkz=hhnV6MAO8CFo9vt6yWg@mail.gmail.com>

On Mon, Apr 11, 2016 at 9:27 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 11 April 2016 at 02:16, Ethan Furman <ethan at stoneleaf.us> wrote:
>>
>> I guess I don't see the point of this.  Either DirEntry's [1] only get
>> partial support (which is only marginally better than the no support pathlib
>> currently has), or stdlib code will need to catch those errors and then do
>> an isinstance check to see if it knows what the type is and how to deal with it
>> [1].
>
> What's wrong with only gaining partial support? Standard library code
> that doesn't currently support DirEntry at all will gain the ability
> to support str-based DirEntry objects, while bytes-based DirEntry
> objects will continue to be a low level object that isn't
> interoperable with most other APIs (which is fine - anyone writing low
> level POSIX-specific code can deal with unpacking the values
> explicitly, it just won't happen implicitly anywhere).
>

While I'm also tempted to lean towards 'marginalizing bytes support',
it seems a little bit dangerous to me. Currently, os.path is heavily
based on duck typing of str and bytes, so there may be code out there
that does all kinds of things with paths without knowing whether it
deals with bytes or str objects. If such code gets in contact with
this pathname protocol, it will raise an exception whenever it happens
to be fed a bytes path. That is, if the approach of 'partial support'
is taken.
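
The duck typing in question can be seen with a short sketch. It uses
posixpath directly (the POSIX flavour of os.path) so the output is the
same on every platform; the path components are made up.

```python
import posixpath

# The same function works on str components and on bytes components,
# and returns the same type it was given.
text_result = posixpath.join("usr", "lib")
bytes_result = posixpath.join(b"usr", b"lib")

# Mixing the two types in one call is rejected.
try:
    posixpath.join("usr", b"lib")
    mixing_rejected = False
except TypeError:
    mixing_rejected = True

print(text_result, bytes_result, mixing_rejected)
```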

And still there is the question I just posted in another branch of
this mess: Who should use os.fspathname(...)? If it's os.path.* and
other traditional (low-level?) functions that deal with paths, then
fspathname should, in the name of backwards compatibility, be able to
deal with bytes and return bytes in those cases.  Otherwise fspathname
would do nothing for you, and all the work of
isinstance/hasattr/whatever would be left to the caller of
os.fspathname (or maybe this is what you want?).

So a somewhat useful fspathname might indeed look something like this:

 def fspathname(pathlike) -> Union[str, bytes]:
     pathname = getattr(pathlike, '__fspathname__', pathlike)
     if not isinstance(pathname, (str, bytes)):
         raise TypeError("your thing is not pathlike")
     return pathname

But maybe it is enough to have the __fspathname__ attribute, and make
fspathname() some internal implementation detail of os.path.* and the
like.
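
A usage sketch of the function as posted. Note that it reads
__fspathname__ as a plain attribute rather than calling it, so the
example class below stores a string there. BuildArtifact and its path
are hypothetical, invented only for this illustration.

```python
from typing import Union


def fspathname(pathlike) -> Union[str, bytes]:
    # Same logic as above: use the attribute if present, else the object.
    pathname = getattr(pathlike, '__fspathname__', pathlike)
    if not isinstance(pathname, (str, bytes)):
        raise TypeError("your thing is not pathlike")
    return pathname


class BuildArtifact:
    """Hypothetical path-like object exposing a str attribute."""

    def __init__(self, name):
        self.__fspathname__ = "/tmp/build/" + name


from_object = fspathname(BuildArtifact("pkg.zip"))  # attribute value
from_bytes = fspathname(b"raw/path")                # bytes pass through

try:
    fspathname(42)
    int_rejected = False
except TypeError:
    int_rejected = True

print(from_object, from_bytes, int_rejected)
```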

-Koos

From p.f.moore at gmail.com  Mon Apr 11 11:04:21 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 11 Apr 2016 16:04:21 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160411144644.GA8206@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
Message-ID: <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>

On 11 April 2016 at 15:46, Jon Ribbens <jon+python-dev at unequivocal.co.uk> wrote:
> It's trying to alter
> the global Python environment so that arbitrary code can be executed,
> whereas I am not even trying to allow execution of arbitrary code and
> am not altering the global environment.

However, it's not at all clear (to me at least) what you *are* trying
to do. You're limiting the subset of Python that people can use,
understood. And you're trying to ensure that people can't do "bad
things". Again, understood. But what subset are you actually allowing,
and what things are you trying to protect against? (For example, I
can't calculate sin(1.2) using the math module - why is that not
allowed? It's just as safe as using the built in exponential
operator, and indeed I could write a sin() function in pure Python,
although it would be too slow to be useful, unlike math.sin...)
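
For illustration, such a pure-Python sin() might look like the sketch
below. It is a plain Taylor series with no range reduction, so it is
only meant for small arguments such as 1.2; the function name and the
number of terms are arbitrary choices made here.

```python
import math  # used only to check the result, not inside pure_sin()


def pure_sin(x, terms=20):
    """Taylor series for sin(x) around 0, using only arithmetic operators."""
    term = x
    total = x
    for n in range(1, terms):
        # Each term is the previous one times -x^2 / ((2n)(2n+1)).
        term *= -x * x / ((2 * n) * (2 * n + 1))
        total += term
    return total


approx = pure_sin(1.2)
print(abs(approx - math.sin(1.2)) < 1e-12)
```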

It feels at the moment as if I'm playing a game where I don't know the
rules, and every time I think I scored a point, the rules are changed
to retroactively disallow it.

Paul

From ethan at stoneleaf.us  Mon Apr 11 11:18:47 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 08:18:47 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <loom.20160411T164910-864@post.gmane.org>
References: <5709309D.8030007@stoneleaf.us>
 <loom.20160411T164910-864@post.gmane.org>
Message-ID: <570BC057.4040809@stoneleaf.us>

On 04/11/2016 07:56 AM, Antoine Pitrou wrote:

>> 2) pathlib.Path accepts bytes --
>
> Does it? Or are you proposing such a change?

It used to (I posted a couple examples from 3.5.0).  I finally rebuilt 
with the latest and it no longer does.

--
~Ethan~

From Nikolaus at rath.org  Mon Apr 11 11:35:11 2016
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Mon, 11 Apr 2016 08:35:11 -0700
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160411144644.GA8206@unequivocal.co.uk> (Jon Ribbens's message
 of "Mon, 11 Apr 2016 15:46:44 +0100")
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
Message-ID: <8760vorweo.fsf@thinkpad.rath.org>

On Apr 11 2016, Jon Ribbens <jon+python-dev at unequivocal.co.uk> wrote:
>> What I see is that you asked to break your sandbox, and less than 1
>> hour later, a first vulnerability was found (exec called with two
>> parameters). A few hours later, a second vulnerability was found
>> (async generator and cr_frame).
>
> The former was just a stupid bug, it says nothing about the viability
> of the methodology. The latter was a new feature in a Python version
> later than I have ever used, and again does not imply anything much
> about the viability.

It implies that new versions of Python may break your sandbox. That
doesn't sound like a viable long-term solution.

> I think now I've blocked the names of frame
> object attributes it wouldn't be a vulnerability any more anyway.

It seems like you're playing whack-a-mole. 


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             "Time flies like an arrow, fruit flies like a Banana."

From ijmorlan at uwaterloo.ca  Mon Apr 11 07:04:56 2016
From: ijmorlan at uwaterloo.ca (Isaac Morland)
Date: Mon, 11 Apr 2016 07:04:56 -0400 (EDT)
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
Message-ID: <alpine.DEB.2.02.1604110700450.24255@ubuntu1204-102.cs.uwaterloo.ca>

On Mon, 11 Apr 2016, Victor Stinner wrote:

> 2016-04-10 18:43 GMT+02:00 Jon Ribbens <jon+python-dev at unequivocal.co.uk>:
>>
>> That's the opposite of my approach though - I'm starting small and
>> adding things, not starting with everything and removing stuff. Even
>> if what we end up with is an extremely restricted subset of Python,
>> there are still cases where that could be a useful tool to have.
>
> Your design relies on the assumption that CPython is only pure Python.
> That's wrong. A *lot* of Python features are implemented in C and
> "ignore" your sandboxing code. Quick reminder: 50% of CPython is
> written in the C language.
>
> It means that your protections like hiding builtin functions from the
> Python scope don't work. If an attacker gets access to a C function
> giving access to the hidden builtin, the game is over.
[....]

Non-Python core developer, non-expert-specifically-in-computer-security 
here, so won't take up much room on this list.

I know enough about almost everything in Computer Science to know just how 
ignorant I am about almost everything in Computer Science.

But I would not use for security purposes a Python sandbox that was not 
formally verified to be correct and unbreakable.  Of course in order for 
this to be possible, there first has to be a formal semantics for Python. 
Has anybody made a formal semantics for Python?  If not, then this project 
is missing a pretty important pre-requisite.

Isaac Morland           CSCF Web Guru
DC 2619, x36650         WWW Software Specialist

From jcristau at debian.org  Mon Apr 11 09:49:10 2016
From: jcristau at debian.org (Julien Cristau)
Date: Mon, 11 Apr 2016 15:49:10 +0200
Subject: [Python-Dev] tp_new selection regression in the 2.7 branch
Message-ID: <20160411134910.GG2889@betterave.cristau.org>

Hi,

changeset https://hg.python.org/cpython/rev/e7062dd9085e in the 2.7
branch changes how tp_new is assigned, and causes regressions with
multiple inheritance from extension classes.
http://bugs.python.org/issue25731#msg262922 has a fairly simple
reproducer using cython.  The __base__ attribute is set correctly, but
tp_new is now wrong and thus the object initialization is broken.

Can this change be fixed or reverted before the next 2.7.x release?

(I have not verified if this regression also affects the 3.5 branch)

Thanks,
Julien

From rosuav at gmail.com  Mon Apr 11 12:01:33 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 12 Apr 2016 02:01:33 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <alpine.DEB.2.02.1604110700450.24255@ubuntu1204-102.cs.uwaterloo.ca>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <alpine.DEB.2.02.1604110700450.24255@ubuntu1204-102.cs.uwaterloo.ca>
Message-ID: <CAPTjJmriG4-jiZ22Z_B3AG53u7tpBbWMfh7JLGFvKS7t3HHFoA@mail.gmail.com>

On Mon, Apr 11, 2016 at 9:04 PM, Isaac Morland <ijmorlan at uwaterloo.ca> wrote:
> But I would not use for security purposes a Python sandbox that was not
> formally verified to be correct and unbreakable.  Of course in order for
> this to be possible, there first has to be a formal semantics for Python.
> Has anybody made a formal semantics for Python?  If not, then this project
> is missing a pretty important pre-requisite.

Formal semantics for the language? Yes; most of docs.python.org is
about the language, independently of any particular implementation.
(There are odd notes here and there about "CPython implementation
detail" and such, and there are some entire modules that are
specifically stated as being implementation-specific, but they're a
tiny proportion.) You can also read through the PEPs, which (again,
for the most part) deal with language-level concerns ahead of
implementation details.

However, even with that information, it's virtually impossible to
formally verify that the sandbox is unbreakable. A Python-in-Python
sandbox is almost guaranteed to leak information across the boundary,
and when information is leaked, it's extremely hard to prove that
privilege escalation is impossible.
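
To illustrate the kind of leak meant here, a minimal sketch (not taken
from the sandbox under discussion; the variable names are made up for the
example): even with an empty `__builtins__`, plain attribute access on a
literal reaches `object`, and from there every class currently loaded.

```python
# Evaluate an expression with no builtins available at all.
env = {"__builtins__": {}}

# Attribute access needs no builtins: a tuple literal leads to `object`,
# and `object.__subclasses__()` lists the classes currently loaded.
leaked = eval("().__class__.__base__.__subclasses__()", env)

# `type` is a direct subclass of `object`, so it is among the leaked classes.
print(type in leaked)
```

Hiding names from the namespace does nothing about objects that are
reachable through other objects, which is why proving the absence of an
escalation path is so hard.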

ChrisA

From ethan at stoneleaf.us  Mon Apr 11 12:18:01 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 09:18:01 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>	<CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>	<570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
Message-ID: <570BCE39.8090306@stoneleaf.us>

On 04/10/2016 11:27 PM, Nick Coghlan wrote:
> On 11 April 2016 at 02:16, Ethan Furman <ethan at stoneleaf.us> wrote:

>>> DirEntry can still get the check, it can just throw TypeError when it
>>> represents a binary path (that's one of the advantages of using a
>>> method-based protocol - exceptions on method calls are more acceptable
>>> than exceptions on property access).
>>
>>
>> I guess I don't see the point of this.  Either DirEntry's [1] only get
>> partial support (which is only marginally better than the no support pathlib
>> currently has), or stdlib code will need to catch those errors and then do
>> an isinstance check to see if it knows what the type is and how to deal with it
>> [1].
>
> What's wrong with only gaining partial support? Standard library code
> that doesn't currently support DirEntry at all will gain the ability
> to support str-based DirEntry objects, while bytes-based DirEntry
> objects will continue to be a low level object [...]

Let's consider two functions, one that accepts bytes/str for the path, 
and one that only accepts str:


   str-only support
   ----------------
   # before new protocol
   def do_fritz(a_path):
       if not isinstance(a_path, str):
           raise TypeError('str required')
       ...

   # after new protocol with str-only support
   def do_fritz(a_path):
       a_path = fspath(a_path)
       ...

   # after new protocol with bytes/str support
   def do_fritz(a_path):
       a_path = fspath(a_path)
       if not isinstance(a_path, str):
           raise TypeError('str required')
       ...


   bytes/str support
   -----------------
   # before new protocol
   def zingar(a_path):
       if not isinstance(a_path, (bytes,str)):
           raise TypeError('bytes or str required')
       ...

   # after new protocol with str-only support
   def zingar(a_path):
       if not isinstance(a_path, bytes):
           try:
               a_path = fspath(a_path)
           except FSPathError:
               raise TypeError('bytes or str required')
       ...

   # after new protocol with bytes/str support
   def zingar(a_path):
       a_path = fspath(a_path)
       if not isinstance(a_path, (bytes,str)):
           raise TypeError('bytes or str required')
       ...


If those examples are anywhere close to accurate, an fspath protocol 
that supported both bytes and str seems a lot easier to work with.
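
For concreteness, a runnable sketch of the bytes/str variant follows. The
`fspath()` helper and the `BytesEntry` class are hypothetical stand-ins
written for this example (no such API has been agreed on), and the two
consumers return the path only so the behaviour can be checked.

```python
def fspath(path):
    """Hypothetical bytes/str-aware fspath(); a sketch, not an agreed API."""
    if isinstance(path, (bytes, str)):
        return path
    # Look the method up on the type, as is usual for special methods.
    method = getattr(type(path), '__fspath__', None)
    if method is None:
        raise TypeError('path-like object, bytes, or str required')
    return method(path)


class BytesEntry:
    """Illustrative stand-in for a bytes-based DirEntry."""
    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


def do_fritz(a_path):
    # str-only consumer: one extra isinstance check after the protocol call
    a_path = fspath(a_path)
    if not isinstance(a_path, str):
        raise TypeError('str required')
    return a_path


def zingar(a_path):
    # bytes/str consumer: same shape, just a wider isinstance check
    a_path = fspath(a_path)
    if not isinstance(a_path, (bytes, str)):
        raise TypeError('bytes or str required')
    return a_path
```

Both consumers end up with the same two-step shape (call the protocol,
then check the type), and neither needs a try/except.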

--
~Ethan~

From ajm at flonidan.dk  Mon Apr 11 09:54:49 2016
From: ajm at flonidan.dk (Anders Munch)
Date: Mon, 11 Apr 2016 13:54:49 +0000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
Message-ID: <AMSPR05MB439721828924C4F7831BD14B4940@AMSPR05MB439.eurprd05.prod.outlook.com>

Steven D'Aprano:
> although I must admit I don't understand why the entire OS is affected.

A consequence of memory overcommit, I'd wager.  The crasher code not only allocates vast swathes of memory, but accesses it as well, which is bad news for Linux with overcommit enabled. When the OS runs out of backing store to handle page faults, anything can happen.
 
- Anders


From zachary.ware+pydev at gmail.com  Mon Apr 11 12:32:29 2016
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Mon, 11 Apr 2016 11:32:29 -0500
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570BCE39.8090306@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
Message-ID: <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>

On Mon, Apr 11, 2016 at 11:18 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> If those examples are anywhere close to accurate, an fspath protocol that
> supported both bytes and str seems a lot easier to work with.

But why are you working with bytes paths in the first place? Where did
you get them from, and why couldn't you decode them at that boundary?
In 7ish years of working with Python (almost exclusively Python 3) on
Windows and UNIX, I have never used bytes paths on any platform.

-- 
Zach

From jon+python-dev at unequivocal.co.uk  Mon Apr 11 12:44:49 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Mon, 11 Apr 2016 17:44:49 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <8760vorweo.fsf@thinkpad.rath.org>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <8760vorweo.fsf@thinkpad.rath.org>
Message-ID: <20160411164449.GB8206@unequivocal.co.uk>

On Mon, Apr 11, 2016 at 08:35:11AM -0700, Nikolaus Rath wrote:
> On Apr 11 2016, Jon Ribbens <jon+python-dev at unequivocal.co.uk> wrote:
> >> What I see is that you asked to break your sandbox, and less than 1
> >> hour later, a first vulnerability was found (exec called with two
> >> parameters). A few hours later, a second vulnerability was found
> >> (async generator and cr_frame).
> >
> > The former was just a stupid bug, it says nothing about the viability
> > of the methodology. The latter was a new feature in a Python version
> > later than I have ever used, and again does not imply anything much
> > about the viability.
> 
> It implies that new versions of Python may break your sandbox. That
> doesn't sound like a viable long-term solution.

That is obviously always going to be true of major new versions with
major new features, no matter what language we're talking about or
what method is being used to sandbox - unless the sandboxing were to
be built in to the language itself, which I have deliberately not
suggested.

But having said that, I already pointed out in the message you're
responding to that with the method I'm using now, coroutines would
not have been an issue even if I hadn't specifically fixed them.

> > I think now I've blocked the names of frame
> > object attributes it wouldn't be a vulnerability any more anyway.
> 
> It seems like you're playing whack-a-mole. 

Well, no, quite the opposite in fact. If that was true then I would
have given up already as the method having been proved useless.
So far it looks like blocking "_*" and the frame object attributes
appears to be sufficient.
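
A simplified sketch of that kind of check, not the actual sandbox code:
the FRAME_ATTRS deny-list and the function names are assumptions made up
for illustration, and the list is certainly not complete. It only looks
at Name and Attribute nodes, so function arguments, keywords and import
names are not covered here.

```python
import ast

# Assumed deny-list of frame-related attribute names (illustrative only).
FRAME_ATTRS = {"gi_frame", "cr_frame", "tb_frame",
               "f_back", "f_builtins", "f_globals", "f_locals"}


def check(source):
    """Raise ValueError if source uses an underscore or frame-related name."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute):
            name = node.attr
        elif isinstance(node, ast.Name):
            name = node.id
        else:
            continue
        if name.startswith("_") or name in FRAME_ATTRS:
            raise ValueError("blocked name: " + name)


def is_allowed(source):
    """Return True if check() accepts the source, False otherwise."""
    try:
        check(source)
    except ValueError:
        return False
    return True
```

With this, `is_allowed("x = 1 + 2")` passes while `().__class__` and
`g.gi_frame` are rejected before anything is executed.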

From jon+python-dev at unequivocal.co.uk  Mon Apr 11 12:53:54 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Mon, 11 Apr 2016 17:53:54 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
Message-ID: <20160411165354.GC8206@unequivocal.co.uk>

On Mon, Apr 11, 2016 at 04:04:21PM +0100, Paul Moore wrote:
> However, it's not at all clear (to me at least) what you *are* trying
> to do.

I'm trying to see to what extent we can use ast node inspection to
remedy the failures of prior attempts at Python sandboxing. Is there
*any* extent to which Python can be sandboxed, or is even trying to
use it as a calculator function unfixably insecure?

> You're limiting the subset of Python that people can use,
> understood. And you're trying to ensure that people can't do "bad
> things". Again, understood. But what subset are you actually allowing,
> and what things are you trying to protect against? (For example, I
> can't calculate sin(1.2) using the math module - why is that not
> allowed?

It wasn't allowed in the earlier version because I wasn't allowing
import at all, because this is just an experiment. As it happens,
I added 'import' yesterday so yes you can use math.sin.

> It feels at the moment as if I'm playing a game where I don't know the
> rules, and every time I think I scored a point, the rules are changed
> to retroactively disallow it.

The challenge is to show some code that will escape from the sandbox,
in a way that is not trivially fixable with a tiny patch, or in a way
that demonstrates that such a large number of tiny patches would be
required as to be unworkable.

From rosuav at gmail.com  Mon Apr 11 13:02:54 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 12 Apr 2016 03:02:54 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160411165354.GC8206@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
Message-ID: <CAPTjJmriCFBHQgBrX7JnpO9WhaqmW4fMv0xTqd-qOjfZ3mR3tQ@mail.gmail.com>

On Tue, Apr 12, 2016 at 2:53 AM, Jon Ribbens
<jon+python-dev at unequivocal.co.uk> wrote:
> On Mon, Apr 11, 2016 at 04:04:21PM +0100, Paul Moore wrote:
>> However, it's not at all clear (to me at least) what you *are* trying
>> to do.
>
> I'm trying to see to what extent we can use ast node inspection to
> remedy the failures of prior attempts at Python sandboxing. Is there
> *any* extent to which Python can be sandboxed, or is even trying to
> use it as a calculator function unfixably insecure?
>

It all depends on how much functionality you want. If all you need is
a numeric expression evaluator, that's not too hard - disallow all
forms of attribute access, etc, and just have simple numbers and
operators. That's pretty useful, and safe. Alternatively, go
completely the other way. Let people run whatever code they like... in
an environment where it can't hurt anyone else. That's what PyPyJS
does - don't bother looking for security holes in it, because all
you're doing is attacking your own computer.

The hard part comes when you want to allow *some*, but not all,
interaction with the outside world. When I was looking into this kind
of sandboxing (although it was Python-in-C++ rather than
Python-in-Python), it was to allow untrusted users to control certain
parts of server-side execution. The result was dismal, because it's
fundamentally impossible to allow the level of control I wanted
without allowing a level of control I didn't want.

So before you can ask whether Python is unfixably insecure, you first
have to decide what the minimum level of functionality is that you'll
accept. Do you need basic arithmetic plus trigonometric functions? Easy
enough - disallow all attribute access and imports, and populate
builtins with "from math import *". Need them to be able to assign
variables and define functions? That's gonna be harder.
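
As a rough sketch of the "numbers and operators" end of that spectrum
(my own illustration, not code from any of the sandboxes discussed; the
names `evaluate`, `BIN_OPS`, `UNARY_OPS` and `FUNCS` are made up, and
`ast.Constant` assumes Python 3.8 or later): walk the AST and accept only
an explicit whitelist of node types, so attribute access, imports and
everything else are rejected by default. Exponentiation is left out on
purpose, since something like 2**10**10 would happily eat all the memory.

```python
import ast
import math
import operator

BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
FUNCS = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt}


def evaluate(text):
    """Evaluate a numeric expression, rejecting any non-whitelisted syntax."""
    return _eval(ast.parse(text, mode="eval").body)


def _eval(node):
    # Plain int/float literals only (bool, str, etc. are rejected).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        return BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return UNARY_OPS[type(node.op)](_eval(node.operand))
    # Calls are allowed only to whitelisted names, positional args only.
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in FUNCS and not node.keywords):
        return FUNCS[node.func.id](*[_eval(arg) for arg in node.args])
    raise ValueError("disallowed syntax: " + type(node).__name__)
```

Here `evaluate("sin(1.2)")` works, while `evaluate("().__class__")`
raises ValueError because Attribute nodes are simply not on the list.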

ChrisA

From ethan at stoneleaf.us  Mon Apr 11 13:12:55 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 10:12:55 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
Message-ID: <570BDB17.5000601@stoneleaf.us>

On 04/11/2016 09:32 AM, Zachary Ware wrote:
> On Mon, Apr 11, 2016 at 11:18 AM, Ethan Furman wrote:

>> If those examples are anywhere close to accurate, an fspath protocol that
>> supported both bytes and str seems a lot easier to work with.
>
> But why are you working with bytes paths in the first place? Where did
> you get them from, and why couldn't you decode them at that boundary?
> In 7ish years of working with Python (almost exclusively Python 3) on
> Windows and UNIX, I have never used bytes paths on any platform.

I'm not saying that bytes paths are common -- and if this was a 
brand-new feature I wouldn't be pushing for it so hard;  however, bytes 
paths are already supported and it seems to me to be much less of a 
headache to continue the support in this new protocol instead of drawing 
an artificial line in the sand.

Also, let me be clear that the new protocol will not adversely affect my 
own library as it directly subclasses bytes and strings (bPath and 
uPath), so they will pass through either way (or be appropriately 
rejected if the function only supports str -- are there any?).

This kind of feels like PEP 461 again -- the vast majority of Python 
programmers do not need %-interpolation for bytes, but what a pain in 
the rear for those that did!  (Yes, I was one of those.)  Admittedly, 
the pain from this will not be nearly as severe as that was, but why 
should we have any unnecessary pain at all?

Asked another way, what are we gaining by disallowing bytes in this new 
way of getting paths versus the pain caused when bytes are needed and/or 
accepted?

 From my point of view the pain of simply implementing this without 
bytes support in the existing os and os.path modules is not worth 
excluding bytes.

--
~Ethan~

From donald at stufft.io  Mon Apr 11 13:18:01 2016
From: donald at stufft.io (Donald Stufft)
Date: Mon, 11 Apr 2016 13:18:01 -0400
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570BDB17.5000601@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
Message-ID: <FFBD240E-4352-40C0-81BC-3AF0A7AB4FB1@stufft.io>


> On Apr 11, 2016, at 1:12 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> 
> Asked another way, what are we gaining by disallowing bytes in this new way of getting paths versus the pain caused when bytes are needed and/or accepted?


It seems fine to me to allow __fspath__ to return bytes as well as str. The only argument I can think of against it is that something like pathlib.Path() would not work with a bytes returning __fspath__, but that's not any different than what happens if you pass a bytes object directly into pathlib.Path as well.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From brett at python.org  Mon Apr 11 13:36:33 2016
From: brett at python.org (Brett Cannon)
Date: Mon, 11 Apr 2016 17:36:33 +0000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570BDB17.5000601@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
Message-ID: <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>

On Mon, 11 Apr 2016 at 10:13 Ethan Furman <ethan at stoneleaf.us> wrote:

> On 04/11/2016 09:32 AM, Zachary Ware wrote:
> > On Mon, Apr 11, 2016 at 11:18 AM, Ethan Furman wrote:
>
> >> If those examples are anywhere close to accurate, an fspath protocol
> that
> >> supported both bytes and str seems a lot easier to work with.
> >
> > But why are you working with bytes paths in the first place? Where did
> > you get them from, and why couldn't you decode them at that boundary?
> > In 7ish years of working with Python (almost exclusively Python 3) on
> > Windows and UNIX, I have never used bytes paths on any platform.
>
> I'm not saying that bytes paths are common -- and if this was a
> brand-new feature I wouldn't be pushing for it so hard;  however, bytes
> paths are already supported and it seems to me to be much less of a
> headache to continue the support in this new protocol instead of drawing
> an artificial line in the sand.
>

Headache for you? The stdlib? Library authors? Users of libraries? There
are a lot of users of this who have varying levels of pain for this.


>
> Also, let me be clear that the new protocol will not adversely affect my
> own library as it directly subclasses bytes and strings (bPath and
> uPath), so they will pass through either way (or be appropriately
> rejected if the function only supports str -- are there any?).
>

Well, technically it depends on whether we prefer the protocol or explicit
type checking and how we define the protocol. If we say __ospath__ has to
return str and we check for that first then that would be bad for you. If
we do isinstance() checks before calling the protocol or allow both str and
bytes then we open it up.


>
> This kind of feels like PEP 461 again -- the vast majority of Python
> programmers do not need %-interpolation for bytes, but what a pain in
> the rear for those that did!  (Yes, I was one of those.)  Admittedly,
> the pain from this will not be nearly as severe as that was, but why
> should we have any unnecessary pain at all?
>
> Asked another way, what are we gaining by disallowing bytes in this new
> way of getting paths versus the pain caused when bytes are needed and/or
> accepted?
>

Type consistency. E.g. if I pass in a DirEntry object into os.fspath() and
I don't know what the heck I'm getting back then that can lead to subtle
bugs, especially when you didn't check ahead of time what DirEntry.path
was. To me, that bumps up against "In the face of ambiguity, refuse the
temptation to guess". Having the return type vary even when the argument
type doesn't can get messy if you don't expect it to always vary (i.e. this
isn't getattr()).


>
>  From my point of view the pain of simply implementing this without
> bytes support in the existing os and os.path modules is not worth
> excluding bytes.
>

How about we take something from the "explicit is better than implicit"
playbook and add a keyword argument to os.fspath() to allow bytes to pass
through?

  def fspath(path, *, allow_bytes=False):
      if isinstance(path, str):
          return path
      # Allow bytearray?
      elif allow_bytes and isinstance(path, bytes):
          return path
      try:
          protocol_path = path.__fspath__()
      except AttributeError:
          pass
      else:
          # Explicit type check worth it, or better to rely on duck typing?
          if isinstance(protocol_path, str):
              return protocol_path
      raise TypeError("expected a path-like object, str, or bytes (if "
                      f"allowed), not {type(path)}")

For DirEntry users who use bytes, they will simply have to pass around
DirEntry.path which is not as nice as simply passing around DirEntry, but
it does allow them to continue to operate without having to decode the
bytes if allow_bytes is True. We get type consistency in the protocol as
we can continue to expect people to return strings for __fspath__. And for
those APIs where supporting bytes won't be an issue, they can explicitly
choose to support bytes or not and then not have to juggle support for both
str and bytes if they choose not to. IOW adults who consent to bytes paths
don't get cut out or face a ton of hoops to jump through as long as they
opt in, but those adults who don't consent to bytes paths have their lives
simplified.

From antoine at python.org  Mon Apr 11 13:48:51 2016
From: antoine at python.org (Antoine Pitrou)
Date: Mon, 11 Apr 2016 17:48:51 +0000 (UTC)
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
References: <5709309D.8030007@stoneleaf.us>
 <loom.20160411T164910-864@post.gmane.org> <570BC057.4040809@stoneleaf.us>
Message-ID: <loom.20160411T194619-277@post.gmane.org>

Ethan Furman <ethan <at> stoneleaf.us> writes:
> 
> On 04/11/2016 07:56 AM, Antoine Pitrou wrote:
> 
> >> 2) pathlib.Path accepts bytes --
> >
> > Does it? Or are you proposing such a change?
> 
> It used to (I posted a couple examples from 3.5.0).  I finally rebuilt 
> with the latest and it no longer does.

This is surprising, since in its entire lifetime, pathlib was never
supposed to support bytes inputs. See the argument check in the
initial checkin of pathlib.py:
https://hg.python.org/cpython/rev/43377dcfb801/#l6.571

Perhaps that slipped through at some point (and obviously no test was
there to prevent it :-)).

Regards

Antoine.



From steve at pearwood.info  Mon Apr 11 13:50:37 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 12 Apr 2016 03:50:37 +1000
Subject: [Python-Dev] PEP 506 secrets module
In-Reply-To: <CAP7+vJLph9pxOuL_19pTP_BakggnrJdos2f8OOjwZmAp1mJCnA@mail.gmail.com>
References: <20151016005711.GC11980@ando.pearwood.info>
 <CAP1=2W6p3CK-_ksvf7Jxk9UC2ugniraqkRGvv2FGrVxnvuhv=Q@mail.gmail.com>
 <CAP7+vJ+7cR3do5LOApqFjR+PvavKdOpA2Q0rLC0pYAAS4LGwKQ@mail.gmail.com>
 <20160410050845.GA12526@ando.pearwood.info>
 <CAP7+vJLph9pxOuL_19pTP_BakggnrJdos2f8OOjwZmAp1mJCnA@mail.gmail.com>
Message-ID: <20160411175036.GA1819@ando.pearwood.info>

On Sun, Apr 10, 2016 at 11:43:08AM -0700, Guido van Rossum wrote:
> Hi Steven,
> 
> No problem with the delay -- it's still before 3.6.0. I do think it's
> just about a record gap in the PEP review process. :-)
> 
> I will approve the PEP as soon as you've updated the two function
> names in the PEP. (If you don't have write access to the peps repo,
> send the new version to peps at python.org -- or send a link to the new
> draft somewhere online, e.g. github if you're using that. If you do
> have peps repo write access, just reply here when it's done.)

I have done that, and updated the API and Implementation section to be 
less wishy-washy and more committal about what exactly will be included. 
Hope it meets with your approval, and thanks for your guidance!


-- 
Steve

From random832 at fastmail.com  Mon Apr 11 14:18:08 2016
From: random832 at fastmail.com (Random832)
Date: Mon, 11 Apr 2016 14:18:08 -0400
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
Message-ID: <1460398688.3275807.575485137.7B32BC19@webmail.messagingengine.com>

On Mon, Apr 11, 2016, at 13:36, Brett Cannon wrote:
> How about we take something from the "explicit is better than implicit"
> playbook and add a keyword argument to os.fspath() to allow bytes to pass
> through?

Except, we already know how to convert a bytes-path into a str (and vice
versa) with sys.getfilesystemencoding and surrogateescape. So why not
just have the argument specify what return type is desired?

def fspath(path, *, want_bytes=False):
    if isinstance(path, (bytes, str)):
        ppath = path
    else:
        try:
            ppath = path.__fspath__()
        except AttributeError:
            raise TypeError
    if isinstance(ppath, str):
        return ppath.encode(...) if want_bytes else ppath
    elif isinstance(ppath, bytes):
        return ppath if want_bytes else ppath.decode(...)
    else:
        raise TypeError

This way the posix os module can call the function and have the bytes
value already prepared for it to pass to the real open() syscall.

You could even add the same thing in other places, e.g. os.path.join
(with the default being whether the first argument is bytes).

From ethan at stoneleaf.us  Mon Apr 11 14:28:22 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 11:28:22 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
Message-ID: <570BECC6.1080708@stoneleaf.us>

On 04/11/2016 10:36 AM, Brett Cannon wrote:
> On Mon, 11 Apr 2016 at 10:13 Ethan Furman wrote:

>> I'm not saying that bytes paths are common -- and if this was a
>> brand-new feature I wouldn't be pushing for it so hard;  however, bytes
>> paths are already supported and it seems to me to be much less of a
>> headache to continue the support in this new protocol instead of drawing
>> an artificial line in the sand.
>
> Headache for you? The stdlib? Library authors? Users of libraries? There
> are a lot of users of this who have varying levels of pain for this.

Yes, yes, maybe, maybe.  :)

>> Asked another way, what are we gaining by disallowing bytes in this new
>> way of getting paths versus the pain caused when bytes are needed and/or
>> accepted?
>
> Type consistency. E.g. if I pass in a DirEntry object into os.fspath()
> and I don't know what the heck I'm getting back then that can lead to
> subtle bugs [...]

> How about we take something from the "explicit is better than implicit"
> playbook and add a keyword argument to os.fspath() to allow bytes to
> pass through?
>
>    def fspath(path, *, allow_bytes=False):
>        if isinstance(path, str):
>            return path
>        # Allow bytearray?
>        elif allow_bytes and isinstance(path, bytes):
>            return path
>        try:
>            protocol_path = path.__fspath__()
>        except AttributeError:
>            pass
>        else:
>            # Explicit type check worth it, or better to rely on duck typing?
>            if isinstance(protocol_path, str):
>                return protocol_path
>        raise TypeError("expected a path-like object, str, or bytes (if
> allowed), not {type(path)}")

I think that might work.  We currently have four path related things: 
bytes, str, Path, DirEntry -- two are str-only, one is bytes-only, and 
one can be either.

I would write the above as:

   def fspath(path, *, allow_bytes=False):
      try:
         path = path.__fspath__()
      except AttributeError:
         pass
      if isinstance(path, str):
         return path
      elif allow_bytes and isinstance(path, bytes):
         return path
      else:
         raise SomeError()

> For DirEntry users who use bytes, they will simply have to pass around
> DirEntry.path which is not as nice as simply passing around DirEntry,

If we go with the above we allow DirEntry.__fspath__ to return bytes and 
still get type-consistency of str unless the user explicitly declares 
they're okay with getting either (and even then the field is narrowed 
from four possible source types (or more as time goes on) to two).

To recap, this would allow both str & bytes in __fspath__, but the 
fspath() function defaults to only allowing str through.

I can live with that.
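
A runnable version of the sketch above, assuming TypeError in place of
SomeError. BytesEntry is a hypothetical stand-in for a DirEntry created
from a bytes path; it is not a real stdlib class:

```python
def fspath(path, *, allow_bytes=False):
    """Return a str path; bytes pass through only if allow_bytes is true."""
    try:
        path = path.__fspath__()
    except AttributeError:
        pass  # plain str/bytes (or an unsupported object) falls through
    if isinstance(path, str):
        return path
    if allow_bytes and isinstance(path, bytes):
        return path
    raise TypeError("expected str, bytes (if allowed) or a path-like "
                    "object, got " + type(path).__name__)


class BytesEntry:
    """Hypothetical path object whose __fspath__ returns bytes."""

    def __init__(self, path):
        self.path = path

    def __fspath__(self):
        return self.path


entry = BytesEntry(b"spam.txt")
print(fspath(entry, allow_bytes=True))   # bytes only because we opted in
```

Calling fspath(entry) without allow_bytes=True raises TypeError, which is
the str-only default described in the recap.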

--
~Ethan~

From storchaka at gmail.com  Mon Apr 11 14:29:02 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 11 Apr 2016 21:29:02 +0300
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #25910:
 Fixed more links in the docs.
In-Reply-To: <570BB79F.2000708@timgolden.me.uk>
References: <20160411143851.18859.27207.908B2F75@psf.io>
 <570BB79F.2000708@timgolden.me.uk>
Message-ID: <negqde$osj$1@ger.gmane.org>

On 11.04.16 17:41, Tim Golden wrote:
> On 11/04/2016 15:38, serhiy.storchaka wrote:
>> -  <http://www.openssl.org/docs/apps/ciphers.html#CIPHER_LIST_FORMAT>`__.
>> +  <http://www.openssl.org/docs/apps/ciphers.html#CIPHER-LIST-FORMAT>`__.
>
> Is there any intended irony in our link to openssl not being via https?
>
> :)

http://bugs.python.org/issue26736



From guido at python.org  Mon Apr 11 14:35:31 2016
From: guido at python.org (Guido van Rossum)
Date: Mon, 11 Apr 2016 11:35:31 -0700
Subject: [Python-Dev] PEP 506 secrets module
In-Reply-To: <20160411175036.GA1819@ando.pearwood.info>
References: <20151016005711.GC11980@ando.pearwood.info>
 <CAP1=2W6p3CK-_ksvf7Jxk9UC2ugniraqkRGvv2FGrVxnvuhv=Q@mail.gmail.com>
 <CAP7+vJ+7cR3do5LOApqFjR+PvavKdOpA2Q0rLC0pYAAS4LGwKQ@mail.gmail.com>
 <20160410050845.GA12526@ando.pearwood.info>
 <CAP7+vJLph9pxOuL_19pTP_BakggnrJdos2f8OOjwZmAp1mJCnA@mail.gmail.com>
 <20160411175036.GA1819@ando.pearwood.info>
Message-ID: <CAP7+vJJMx3+34xVxGiRPnN0jsDhtmb6cS6MEO=7xvKtsSUzRPQ@mail.gmail.com>

Most excellent! PEP 506 is hereby approved. Congrats again.

On Mon, Apr 11, 2016 at 10:50 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sun, Apr 10, 2016 at 11:43:08AM -0700, Guido van Rossum wrote:
>> Hi Steven,
>>
>> No problem with the delay -- it's still before 3.6.0. I do think it's
>> just about a record gap in the PEP review process. :-)
>>
>> I will approve the PEP as soon as you've updated the two function
>> names in the PEP. (If you don't have write access to the peps repo,
>> send the new version to peps at python.org -- or send a link to the new
>> draft somewhere online, e.g. github if you're using that. If you do
>> have peps repo write access, just reply here when it's done.)
>
> I have done that, and updated the API and Implementation section to be
> less wishy-washy and more committal about what exactly will be included.
> Hope it meets with your approval, and thanks for your guidance!
>
>
> --
> Steve
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org



-- 
--Guido van Rossum (python.org/~guido)

From ethan at stoneleaf.us  Mon Apr 11 14:45:21 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 11:45:21 -0700
Subject: [Python-Dev] PEP 506 secrets module
In-Reply-To: <CAP7+vJJMx3+34xVxGiRPnN0jsDhtmb6cS6MEO=7xvKtsSUzRPQ@mail.gmail.com>
References: <20151016005711.GC11980@ando.pearwood.info>
 <CAP1=2W6p3CK-_ksvf7Jxk9UC2ugniraqkRGvv2FGrVxnvuhv=Q@mail.gmail.com>
 <CAP7+vJ+7cR3do5LOApqFjR+PvavKdOpA2Q0rLC0pYAAS4LGwKQ@mail.gmail.com>
 <20160410050845.GA12526@ando.pearwood.info>
 <CAP7+vJLph9pxOuL_19pTP_BakggnrJdos2f8OOjwZmAp1mJCnA@mail.gmail.com>
 <20160411175036.GA1819@ando.pearwood.info>
 <CAP7+vJJMx3+34xVxGiRPnN0jsDhtmb6cS6MEO=7xvKtsSUzRPQ@mail.gmail.com>
Message-ID: <570BF0C1.6070209@stoneleaf.us>

On 04/11/2016 11:35 AM, Guido van Rossum wrote:

> Most excellent! PEP 506 is hereby approved. Congrats again.

Congratulations, Steven!

--
~Ethan~


From brett at python.org  Mon Apr 11 15:00:41 2016
From: brett at python.org (Brett Cannon)
Date: Mon, 11 Apr 2016 19:00:41 +0000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570BECC6.1080708@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
Message-ID: <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>

On Mon, 11 Apr 2016 at 11:28 Ethan Furman <ethan at stoneleaf.us> wrote:

> On 04/11/2016 10:36 AM, Brett Cannon wrote:
> > On Mon, 11 Apr 2016 at 10:13 Ethan Furman wrote:
>
> >> I'm not saying that bytes paths are common -- and if this was a
> >> brand-new feature I wouldn't be pushing for it so hard;  however, bytes
> >> paths are already supported and it seems to me to be much less of a
> >> headache to continue the support in this new protocol instead of drawing
> >> an artificial line in the sand.
> >
> > Headache for you? The stdlib? Library authors? Users of libraries? There
> > are a lot of users of this who have varying levels of pain for this.
>
> Yes, yes, maybe, maybe.  :)
>
> >> Asked another way, what are we gaining by disallowing bytes in this new
> >> way of getting paths versus the pain caused when bytes are needed and/or
> >> accepted?
> >
> > Type consistency. E.g. if I pass in a DirEntry object into os.fspath()
> > and I don't know what the heck I'm getting back then that can lead to
> > subtle bugs [...]
>
> > How about we take something from the "explicit is better than implicit"
> > playbook and add a keyword argument to os.fspath() to allow bytes to
> > pass through?
> >
> >    def fspath(path, *, allow_bytes=False):
> >        if isinstance(path, str):
> >            return path
> >        # Allow bytearray?
> >        elif allow_bytes and isinstance(path, bytes):
> >            return path
> >        try:
> >            protocol_path = path.__fspath__()
> >        except AttributeError:
> >            pass
> >        else:
> >            # Explicit type check worth it, or better to rely on duck
> typing?
> >            if isinstance(protocol_path, str):
> >                return protocol_path
> >        raise TypeError("expected a path-like object, str, or bytes (if
> > allowed), not {type(path)}")
>
> I think that might work.  We currently have four path related things:
> bytes, str, Path, DirEntry -- two are str-only, one is bytes-only, and
> one can be either.
>
> I would write the above as:
>
>    def fspath(path, *, allow_bytes=False):
>       try:
>          path = path.__fspath__()
>       except AttributeError:
>          pass
>       if isinstance(path, str):
>          return path
>       elif allow_bytes and isinstance(path, bytes):
>          return path
>       else:
>          raise SomeError()
>
> > For DirEntry users who use bytes, they will simply have to pass around
> > DirEntry.path which is not as nice as simply passing around DirEntry,
>
> If we go with the above we allow DirEntry.__fspath__ to return bytes and
> still get type-consistency of str unless the user explicitly declares
> they're okay with getting either (and even then the field is narrowed
> from four possible source types (or more as time goes on) to two).
>

You get type consistency from os.fspath(), not the protocol, though.


>
> To recap, this would allow both str & bytes in __fspath__, but the
> fspath() function defaults to only allowing str through.
>
> I can live with that.
>

I'm -0 on allowing __fspath__ to return bytes, but we can see what others
think.

From ethan at stoneleaf.us  Mon Apr 11 16:19:39 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 13:19:39 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
Message-ID: <570C06DB.3050705@stoneleaf.us>

On 04/11/2016 12:00 PM, Brett Cannon wrote:
> On Mon, 11 Apr 2016 at 11:28 Ethan Furman wrote:

>> I would write the above as:
>>
>>     def fspath(path, *, allow_bytes=False):
>
> You get type consistency from os.fspath(), not the protocol, though.

Well, since the protocol is also a function, we could put the 
allow_bytes on that as well -- not sure if that is a good idea or not.

--
~Ethan~


From tritium-list at sdamon.com  Mon Apr 11 16:33:15 2016
From: tritium-list at sdamon.com (Alexander Walters)
Date: Mon, 11 Apr 2016 16:33:15 -0400
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
Message-ID: <570C0A0B.90109@sdamon.com>

In reviewing the ongoing arguments about how to make pathlib better, 
there have been circular arguments about if it is even broken, if it 
should support bytes, if there should be a path protocol that all 
functions that touch the filesystem should use, if that protocol should 
support bytes, how that protocol should be open or closed to allow third 
party modules to act as paths, etc., etc.

If there is headway being made, I do not see it.

I don't think we can come to an agreement that will make anyone happy, 
or have any effect on the adoption of the pathlib module in the standard 
library.  Maybe, just maybe, since there is an ecosystem of third party 
modules already doing this job (and arguably doing it much better than 
pathlib, and for more supported versions of python than any future 
version of pathlib will), it should be dropped from the standard library 
and left on pypi as a third party module.

From srkunze at mail.de  Mon Apr 11 16:39:19 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Mon, 11 Apr 2016 22:39:19 +0200
Subject: [Python-Dev] pathlib+os/shutil feedback
In-Reply-To: <CACac1F-H23S0okraU_QqrmL-S7Q7gHNKqFba_4R04d95kR-+3w@mail.gmail.com>
References: <570A5E36.2070606@mail.de>
 <CACac1F-H23S0okraU_QqrmL-S7Q7gHNKqFba_4R04d95kR-+3w@mail.gmail.com>
Message-ID: <570C0B77.7080505@mail.de>

On 10.04.2016 16:51, Paul Moore wrote:
> On 10 April 2016 at 15:07, Sven R. Kunze <srkunze at mail.de> wrote:
>> If there's some agreement to change things with respect to those 5 points, I
>> am willing to put some time into it.
> In broad terms I agree with these points. Thanks for doing the
> research. It would certainly be good to try to improve pathlib based
> on this sort of feedback while it is still provisional.

I'd appreciate some guidance on this. Just let me know what I can do 
since I don't know the processes of hacking CPython.

> """
> Path.rglob(pattern)
> Walk down a given path; a wrapper for "os.scandir"/"os.listdir".
> """
>
> However, at least in 3.5, Path.rglob does *not* wrap scandir. There's
> a difference in principle, in that scandir (DirEntry) objects cache
> stat data, where pathlib does not. Whether that makes using scandir in
> Path.rglob impossible, I don't know. Ideally I'd like to see pathlib
> modified to use scandir (because otherwise there will always be people
> saying "use os.walk rather than scandir, as it's faster") - or if it's
> not possible to do so because of the difference in principle, then I'd
> like to see a clear discussion of the issue in the docs, including the
> recommended approach for people who want scandir performance *without*
> having to abandon pathlib for lower level functions.

Good point. The proposed docstring was just to illustrate the 
functionality to the uninformed reader. People mostly trust the docs 
without digging deeper but they should be accurate of course.
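
For illustration only, a minimal sketch of what a scandir-based recursive
walk that yields Path objects could look like. The name rglob_scandir is
hypothetical, this is not how Path.rglob is implemented, and the stat
data cached on each DirEntry is lost once it is converted to a Path:

```python
import os
from pathlib import Path


def rglob_scandir(top):
    """Yield a Path for every entry below `top`, walking with os.scandir."""
    for entry in os.scandir(top):
        yield Path(entry.path)
        # is_dir() can usually be answered from data scandir already has.
        if entry.is_dir(follow_symlinks=False):
            yield from rglob_scandir(entry.path)
```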


Best,
Sven

From marky1991 at gmail.com  Mon Apr 11 16:40:15 2016
From: marky1991 at gmail.com (marky1991 .)
Date: Mon, 11 Apr 2016 16:40:15 -0400
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C0A0B.90109@sdamon.com>
References: <570C0A0B.90109@sdamon.com>
Message-ID: <CAG3cHaYgNWcJMQ-ca=Ot8B5epFvBA1PX=W4pgv1nFn4x9WzTLw@mail.gmail.com>

Neverending email chains aside, as a mere user, I like pathlib even as it
is today and like the convenience of it being in the stdlib. (And would
like it even more if the stdlib played nicely with it) I would be
disappointed if it were taken out. (It's one of the few recent additions
that I find useful to be honest)

From victor.stinner at gmail.com  Mon Apr 11 16:42:20 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 11 Apr 2016 22:42:20 +0200
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
Message-ID: <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>

2016-04-11 21:00 GMT+02:00 Brett Cannon <brett at python.org>:
> I'm -0 on allowing __fspath__ to return bytes, but we can see what others
> think.

With the PEP 383, a bytes filename can be stored as str using the
surrogateescape error handler. So DirEntry can convert a bytes path to
str using os.fsdecode().
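
A small sketch of the PEP 383 round trip, using an explicit UTF-8 codec
so that it behaves the same on every platform (os.fsdecode() and
os.fsencode() do the equivalent with the filesystem encoding on POSIX).
The file name is made up for the example:

```python
raw = b"caf\xe9.txt"  # Latin-1 encoded name, not valid UTF-8

# Undecodable bytes become lone surrogates in the range U+DC80..U+DCFF.
name = raw.decode("utf-8", "surrogateescape")
assert name == "caf\udce9.txt"

# Encoding with the same error handler restores the original bytes.
assert name.encode("utf-8", "surrogateescape") == raw
```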

A "byte string" is unclear in Python. There is the immutable "bytes"
type. But there is also the mutable "bytearray" type. And the buffer
protocol which can have different shapes.

I like the idea of a simple protocol: only allow a single type, str.

Victor

From srkunze at mail.de  Mon Apr 11 16:48:39 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Mon, 11 Apr 2016 22:48:39 +0200
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C0A0B.90109@sdamon.com>
References: <570C0A0B.90109@sdamon.com>
Message-ID: <570C0DA7.6030407@mail.de>

On 11.04.2016 22:33, Alexander Walters wrote:
> If there is headway being made, I do not see it.

Funny that you brought it up. I was about to post something myself. I 
cannot agree completely. But starting with a comment from Paul, I 
realized that pathlib is something different than a string. After doing 
the research and reviewing our issues with pathlib, I found:


- pathlib just needs to be improved (see my 5 points)
- os[.path] should not be tinkered with


I know that all of those discussions of a new protocol (path->str, 
__fspath__ etc. etc.) might be rendered worthless by these two 
statements. But that's my conclusion.

"os" and "os.path" are just lower level. "pathlib" is a high-level, 
convenience library. When using it, I don't want to use "os" or 
"os.path" anymore. If I still do, "pathlib" needs improving. *Not "os" 
nor "os.path"*.


Best,
Sven

From ethan at stoneleaf.us  Mon Apr 11 16:51:28 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 13:51:28 -0700
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C0A0B.90109@sdamon.com>
References: <570C0A0B.90109@sdamon.com>
Message-ID: <570C0E50.3080502@stoneleaf.us>

On 04/11/2016 01:33 PM, Alexander Walters wrote:

> In reviewing the ongoing arguments about how to make pathlib better,
> there have been circular arguments about if it is even broken, if it
> should support bytes, if there should be a path protocol that all
> functions that touch the filesystem should use, if that protocol should
> support bytes, how that protocol should be open or closed to allow third
> party modules to act as paths, etc., etc.

Do not take lots of discussion as a negative.  It's better to thrash it 
out thoroughly first.

> If there is headway being made, I do not see it.

It's being made, and I dare say we are close to the end.

--
~Ethan~

From tritium-list at sdamon.com  Mon Apr 11 16:55:05 2016
From: tritium-list at sdamon.com (Alexander Walters)
Date: Mon, 11 Apr 2016 16:55:05 -0400
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C0DA7.6030407@mail.de>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
Message-ID: <570C0F29.5010904@sdamon.com>

If I had my druthers, this thread would be kept to either:

"Shut up alex, we are really close to figuring this out"

or

"Ok, maybe you have a point."

Every conceivable way to fix pathlib has already been argued.  Are any 
of them worth doing?  Can we get consensus enough to implement one of 
them?  If not, we should consider either dropping the matter or dropping 
the module.


On 4/11/2016 16:48, Sven R. Kunze wrote:
> On 11.04.2016 22:33, Alexander Walters wrote:
>> If there is headway being made, I do not see it.
>
> Funny that you brought it up. I was about posting something myself. I 
> cannot agree completely. But starting with a comment from Paul, I 
> realized that pathlib is something different than a string. After 
> doing the research and our issues with pathlib, I found:
>
>
> - pathlib just needs to be improved (see my 5 points)
> - os[.path] should not tinkered with
>
>
> I know that all of those discussions of a new protocol (path->str, 
> __fspath__ etc. etc.) might be rendered worthless by these two 
> statements. But that's my conclusion.
>
> "os" and "os.path" are just lower level. "pathlib" is a high-level, 
> convenience library. When using it, I don't want to use "os" or 
> "os.path" anymore. If I still do, "pathlib" needs improving. *Not "os" 
> nor "os.path"*.
>
>
> Best,
> Sven
>
>


From tritium-list at sdamon.com  Mon Apr 11 16:56:15 2016
From: tritium-list at sdamon.com (Alexander Walters)
Date: Mon, 11 Apr 2016 16:56:15 -0400
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C0E50.3080502@stoneleaf.us>
References: <570C0A0B.90109@sdamon.com> <570C0E50.3080502@stoneleaf.us>
Message-ID: <570C0F6F.6060606@sdamon.com>

That is great news.  I just couldn't see it myself in the threads.

On 4/11/2016 16:51, Ethan Furman wrote:
>> If there is headway being made, I do not see it.
>
> It's being made, and I dare say we are close to the end. 


From srkunze at mail.de  Mon Apr 11 17:04:29 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Mon, 11 Apr 2016 23:04:29 +0200
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C0F29.5010904@sdamon.com>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com>
Message-ID: <570C115D.1030104@mail.de>

On 11.04.2016 22:55, Alexander Walters wrote:
> Every conceivable way to fix pathlib have already been argued. Are any 
> of them worth doing?  Can we get consensus enough to implement one of 
> them?  If not, we should consider either dropping the matter or 
> dropping the module.

Right now, I don't see pathlib removed. Why? Because using strings alone 
has its caveats (we all know that). So, I cannot imagine an alternative 
concept to pathlib right now. We might call it differently, but the 
concept stays unchanged.

MAYBE, if there's an alternative concept, I could be convinced to 
support dropping the module.

Best,
Sven

PS: The only way out that I can imagine is to fix pathlib. I am not in 
favor of fixing functions of "os" and "os.path" to accept "path" 
objects, which is what the majority here is discussing now with the new 
__fspath__ protocol. But shaping what we have is definitely worth it.

From ethan at stoneleaf.us  Mon Apr 11 17:10:26 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 14:10:26 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
Message-ID: <570C12C2.9000602@stoneleaf.us>

On 04/11/2016 01:42 PM, Victor Stinner wrote:
> 2016-04-11 21:00 GMT+02:00 Brett Cannon:

>> I'm -0 on allowing __fspath__ to return bytes, but we can see what others
>> think.
>
> With the PEP 383, a bytes filename can be stored as str using the
> surrogateescape error handler. So DirEntry can convert a bytes path to
> str using os.fsdecode().

I am far from a unicode expert, but if I understand this correctly you 
are proposing that DirEntry.__whatever__ can always return a str using 
the surrogateescape (SE) method.

However, before this SE string can be used, it would need to be 
converted back to bytes, and with the same SE method, yes?  And this has 
already been implemented in the stdlib?

So my concern in such a case is what happens if we pass this SE string 
somewhere else: a UTF-8 file, or over a socket, or into a database? 
Does this have issues that we wouldn't face if we just used bytes?
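
A short sketch of the concern, assuming an explicit UTF-8 codec and a
made-up file name: a surrogate-escaped string cannot be encoded with the
strict error handler, so writing it to a UTF-8 file or a socket fails
unless the same handler is used on the way out:

```python
name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")

try:
    name.encode("utf-8")  # strict handler, as used by most text sinks
except UnicodeEncodeError:
    strict_failed = True
else:
    strict_failed = False

# The original bytes come back only with the matching error handler.
restored = name.encode("utf-8", "surrogateescape")
```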

--
~Ethan~

From random832 at fastmail.com  Mon Apr 11 17:05:59 2016
From: random832 at fastmail.com (Random832)
Date: Mon, 11 Apr 2016 17:05:59 -0400
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C0DA7.6030407@mail.de>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
Message-ID: <1460408759.3318333.575686073.50BEA1FB@webmail.messagingengine.com>

On Mon, Apr 11, 2016, at 16:48, Sven R. Kunze wrote:
> On 11.04.2016 22:33, Alexander Walters wrote:
> > If there is headway being made, I do not see it.
> 
> Funny that you brought it up. I was about posting something myself. I 
> cannot agree completely. But starting with a comment from Paul, I 
> realized that pathlib is something different than a string. After doing 
> the research and our issues with pathlib, I found:
> 
> 
> - pathlib just needs to be improved (see my 5 points)
> - os[.path] should not tinkered with

I'm not so sure. Is there any particular reason os.path.join should
require its arguments to be homogenous, rather than allowing
os.path.join('a', b'b', Path('c')) to return 'a/b/c'?
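
As a sketch of what that could mean in practice: os.path.join raises
TypeError when str and bytes are mixed, so a wrapper would have to pick
one type first. The helper name join_any is hypothetical, and the code
assumes a Python where os.fsdecode() accepts path objects (3.6 or later):

```python
import os
from pathlib import PurePosixPath


def join_any(*parts):
    """Hypothetical helper: coerce every part to str, then join."""
    return os.path.join(*(os.fsdecode(p) for p in parts))


try:
    os.path.join("a", b"b")  # mixing str and bytes is rejected
except TypeError:
    mixed_rejected = True
else:
    mixed_rejected = False

joined = join_any("a", b"b", PurePosixPath("c"))
```

The same approach would work in the bytes direction with os.fsencode().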

> I know that all of those discussions of a new protocol (path->str, 
> __fspath__ etc. etc.) might be rendered worthless by these two 
> statements. But that's my conclusion.
> 
> "os" and "os.path" are just lower level. "pathlib" is a high-level, 
> convenience library. When using it, I don't want to use "os" or 
> "os.path" anymore. If I still do, "pathlib" needs improving. *Not "os" 
> nor "os.path"*.

The problem isn't you using os. It's you using other modules that use
os, or io, shutil, or builtins.open. Or pathlib, if what *you're* using
is some other path library. Are you content living in a walled garden
where there is only your code and pathlib, and you never might want to
pass a Path to some function someone else (who didn't use pathlib)
wrote?

os is being used as an example because fixing os probably gets you most
other things (that just pass it through to builtins.open which passes it
through to os.open) for free.

From random832 at fastmail.com  Mon Apr 11 17:08:51 2016
From: random832 at fastmail.com (Random832)
Date: Mon, 11 Apr 2016 17:08:51 -0400
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C115D.1030104@mail.de>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
Message-ID: <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com>

On Mon, Apr 11, 2016, at 17:04, Sven R. Kunze wrote:
> PS: The only way out that I can imagine is to fix pathlib. I am not in 
> favor of fixing functions of "os" and "os.path" to except "path" 
> objects;

Why not?

From tritium-list at sdamon.com  Mon Apr 11 17:11:22 2016
From: tritium-list at sdamon.com (Alexander Walters)
Date: Mon, 11 Apr 2016 17:11:22 -0400
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C115D.1030104@mail.de>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
Message-ID: <570C12FA.3030609@sdamon.com>

This stance was probably already argued in the threads in question. This 
thread is more of a health-check.  As an observer, it did not look like 
any headway was being made, and I suggested the Solomonic solution.  It 
has been pointed out to me that headway IS being made and they are close 
to a solution.  I think this thread can safely be sunset.

On 4/11/2016 17:04, Sven R. Kunze wrote:
> On 11.04.2016 22:55, Alexander Walters wrote:
>> Every conceivable way to fix pathlib have already been argued. Are 
>> any of them worth doing?  Can we get consensus enough to implement 
>> one of them?  If not, we should consider either dropping the matter 
>> or dropping the module.
>
> Right now, I don't see pathlib removed. Why? Because using strings 
> alone has its caveats (we all know that). So, I cannot imagine an 
> alternative concept to pathlib right now. We might call it 
> differently, but the concept stays unchanged.
>
> MAYBE, if there's an alternative concept, I could be convinced to 
> support dropping the module.
>
> Best,
> Sven
>
> PS: The only way out that I can imagine is to fix pathlib. I am not in 
> favor of fixing functions of "os" and "os.path" to except "path" 
> objects; which does the majority here discuss now with the new 
> __fspath__ protocol. But shaping what we have is definitely worth it.
>
>


From ethan at stoneleaf.us  Mon Apr 11 17:15:02 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 14:15:02 -0700
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C115D.1030104@mail.de>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
Message-ID: <570C13D6.4090609@stoneleaf.us>

On 04/11/2016 02:04 PM, Sven R. Kunze wrote:
> On 11.04.2016 22:55, Alexander Walters wrote:

>> Every conceivable way to fix pathlib have already been argued. Are any
>> of them worth doing?  Can we get consensus enough to implement one of
>> them?  If not, we should consider either dropping the matter or
>> dropping the module.
>
> Right now, I don't see pathlib removed. Why? Because using strings alone
> has its caveats (we all know that). So, I cannot imagine an alternative
> concept to pathlib right now. We might call it differently, but the
> concept stays unchanged.

We've pretty much decided that we have two options:

1. remove pathlib
2. make the stdlib work with pathlib

So we're trying to make option 2 work before falling back to option 1.

If you have a way to make pathlib work with the stdlib that doesn't 
involve "fixing" os and os.path, now is the time to speak up.

--
~Ethan~

From srkunze at mail.de  Mon Apr 11 17:21:36 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Mon, 11 Apr 2016 23:21:36 +0200
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com>
Message-ID: <570C1560.7070105@mail.de>

On 11.04.2016 23:08, Random832 wrote:
> On Mon, Apr 11, 2016, at 17:04, Sven R. Kunze wrote:
>> PS: The only way out that I can imagine is to fix pathlib. I am not in
>> favor of fixing functions of "os" and "os.path" to accept "path"
>> objects;
> Why not?

It occurred to me after pondering over Paul's comments.

"os" and "os.path" are just a completely different level of abstraction. 
There is just no need to mess with them.

The initial failure of my colleague and me in using pathlib can be 
attributed solely to pathlib's lack of functionality, not to the 
incompatibility of "os" or "os.path" with "Path" objects.


Best,
Sven

From srkunze at mail.de  Mon Apr 11 17:33:38 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Mon, 11 Apr 2016 23:33:38 +0200
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <1460408759.3318333.575686073.50BEA1FB@webmail.messagingengine.com>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <1460408759.3318333.575686073.50BEA1FB@webmail.messagingengine.com>
Message-ID: <570C1832.6010509@mail.de>

On 11.04.2016 23:05, Random832 wrote:
> On Mon, Apr 11, 2016, at 16:48, Sven R. Kunze wrote:
>> On 11.04.2016 22:33, Alexander Walters wrote:
>>> If there is headway being made, I do not see it.
>> Funny that you brought it up. I was about to post something myself. I
>> cannot agree completely. But starting with a comment from Paul, I
>> realized that pathlib is something different than a string. After
>> researching our issues with pathlib, I found:
>>
>>
>> - pathlib just needs to be improved (see my 5 points)
>> - os[.path] should not be tinkered with
> I'm not so sure. Is there any particular reason os.path.join should
> require its arguments to be homogenous, rather than allowing
> os.path.join('a', b'b', Path('c')) to return 'a/b/c'?

Besides the fact that I don't like mixing types (this was something 
that worried me about the discussion from the beginning), you can 
achieve the same using pathlib alone.

There's no need for it, let alone for the maintenance burden and 
slowdown of these implicit conversions.
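
To illustrate the two positions, here is a minimal sketch. It uses
posixpath and PurePosixPath (rather than os.path and Path) only so that
the result does not depend on the platform; the variable names are made
up for the example.

```python
import posixpath
from pathlib import PurePosixPath

# As things stand, mixing str and bytes components is rejected outright.
try:
    posixpath.join('a', b'b')
except TypeError as exc:
    mixed_error = exc
else:
    mixed_error = None

# pathlib alone composes the same path from str components and other paths.
joined = PurePosixPath('a') / 'b' / PurePosixPath('c')
```

Here str(joined) gives the 'a/b/c' from the question, without any
bytes/str mixing inside os.path.join.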

>> I know that all of those discussions of a new protocol (path->str,
>> __fspath__ etc. etc.) might be rendered worthless by these two
>> statements. But that's my conclusion.
>>
>> "os" and "os.path" are just lower level. "pathlib" is a high-level,
>> convenience library. When using it, I don't want to use "os" or
>> "os.path" anymore. If I still do, "pathlib" needs improving. *Not "os"
>> nor "os.path"*.
> The problem isn't you using os. It's you using other modules that use
> os. or io, shutil, or builtins.open. Or pathlib, if what *you're* using
> is some other path library. Are you content living in a walled garden
> where there is only your code and pathlib, and you never might want to
> pass a Path to some function someone else (who didn't use pathlib)
> wrote?
>
> os is being used as an example because fixing os probably gets you most
> other things (that just pass it through to builtins.open which passes it
> through to os.open) for free.

Hypothetical assumptions meeting implicit type conversions. You might 
prefer those; I don't, for good reason. I was one of those who started 
the discussion around pathlib improvements. I understand now that this 
is one of its minor issues. And by the way, using some "other pathlib" 
is no argument for or against improving "THE pathlib".

The .path attribute will do it from what I can see.


Best,
Sven

From brett at python.org  Mon Apr 11 17:40:29 2016
From: brett at python.org (Brett Cannon)
Date: Mon, 11 Apr 2016 21:40:29 +0000
Subject: [Python-Dev] pathlib+os/shutil feedback
In-Reply-To: <570C0B77.7080505@mail.de>
References: <570A5E36.2070606@mail.de>
 <CACac1F-H23S0okraU_QqrmL-S7Q7gHNKqFba_4R04d95kR-+3w@mail.gmail.com>
 <570C0B77.7080505@mail.de>
Message-ID: <CAP1=2W4jAdjVsv+=6Z2gVfawt-XSyTaTxF0w2py9Nchnu9x7rA@mail.gmail.com>

On Mon, 11 Apr 2016 at 13:40 Sven R. Kunze <srkunze at mail.de> wrote:

> On 10.04.2016 16:51, Paul Moore wrote:
> > On 10 April 2016 at 15:07, Sven R. Kunze <srkunze at mail.de> wrote:
> >> If there's some agreement to change things with respect to those 5
> >> points, I am willing to put some time into it.
> > In broad terms I agree with these points. Thanks for doing the
> > research. It would certainly be good to try to improve pathlib based
> > on this sort of feedback while it is still provisional.
>
> I'd appreciate some guidance on this. Just let me know what I can do
> since I don't know the processes of hacking CPython.
>

https://docs.python.org/devguide/ and
https://mail.python.org/mailman/listinfo/core-mentorship are your friends.
:)

For new features of a module you can discuss it on python-ideas first
before proposing a patch if you're worried a patch implementing the feature
might get rejected and you don't want to risk wasting your time.

-Brett

From ben+python at benfinney.id.au  Mon Apr 11 17:41:30 2016
From: ben+python at benfinney.id.au (Ben Finney)
Date: Tue, 12 Apr 2016 07:41:30 +1000
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
References: <570C0A0B.90109@sdamon.com> <570C0E50.3080502@stoneleaf.us>
 <570C0F6F.6060606@sdamon.com>
Message-ID: <85mvoz2585.fsf@benfinney.id.au>

Alexander Walters <tritium-list at sdamon.com> writes:

> That is great news.  I just couldn't see it myself in the threads

Agreed. A summary posting, from someone who has a good handle on the
issue and outcome, would be very helpful.

-- 
 \       "Firmness in decision is often merely a form of stupidity. It |
  `\        indicates an inability to think the same thing out twice." |
_o__)                                               --Henry L. Mencken |
Ben Finney


From brett at python.org  Mon Apr 11 17:43:01 2016
From: brett at python.org (Brett Cannon)
Date: Mon, 11 Apr 2016 21:43:01 +0000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570C12C2.9000602@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
Message-ID: <CAP1=2W50+Bag+p43MRNFR+i==GFj7d2hG-zaqvM-zK4mj9ydxw@mail.gmail.com>

On Mon, 11 Apr 2016 at 14:11 Ethan Furman <ethan at stoneleaf.us> wrote:

> On 04/11/2016 01:42 PM, Victor Stinner wrote:
> > 2016-04-11 21:00 GMT+02:00 Brett Cannon:
>
> >> I'm -0 on allowing __fspath__ to return bytes, but we can see what
> >> others think.
> >
> > With the PEP 383, a bytes filename can be stored as str using the
> > surrogateescape error handler. So DirEntry can convert a bytes path to
> > str using os.fsdecode().
>
> I am far from a Unicode expert, but if I understand this correctly you
> are proposing that DirEntry.__whatever__ can always return a str using
> the surrogateescape (SE) method.
>
> However, before this SE string can be used, it would need to be
> converted back to bytes, and with the same SE method, yes?  And this has
> already been implemented in the stdlib?
>
> So my concern in such a case is what happens if we pass this SE string
> somewhere else: a UTF-8 file, or over a socket, or into a database?
> Does this have issues that we wouldn't face if we just used bytes?
>

This is my worry as well, and it is why I have not proposed this kind of
universal normalizing of bytes paths using os.fsdecode() w/
surrogateescape. Doing this sort of thing at the system boundary, and
documenting it as such, as PEP 383 proposed, makes a bit more sense: the
expectation is more controlled and there is a clear input boundary.

From srkunze at mail.de  Mon Apr 11 17:43:49 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Mon, 11 Apr 2016 23:43:49 +0200
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C13D6.4090609@stoneleaf.us>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <570C13D6.4090609@stoneleaf.us>
Message-ID: <570C1A95.1060100@mail.de>

On 11.04.2016 23:15, Ethan Furman wrote:
> We've pretty much decided that we have two options:
>
> 1. remove pathlib
> 2. make the stdlib work with pathlib
>
> So we're trying to make option 2 work before falling back to option 1.
>
> If you have a way to make pathlib work with the stdlib that doesn't 
> involve "fixing" os and os.path, now is the time to speak up.

As I said, I don't like messing with os or os.path. They are built with 
a different level of abstraction in mind.


What makes people want to go down from pathlib to os (speaking in terms 
of abstraction) is the fact that pathlib promises a convenience that it 
cannot deliver. You might have seen my "feedback" post here on 
python-dev. If those points were corrected in a reasonable way, we 
wouldn't have needed to go down to os or other stdlib modules. As it 
presents itself, it feels like a poor wrapper for os and os.path. I 
hope that makes sense.

So, I might add:

3. add more high-level features to pathlib to prevent a downgrade to os 
or os.path


Best,
Sven

From brett at python.org  Mon Apr 11 17:55:55 2016
From: brett at python.org (Brett Cannon)
Date: Mon, 11 Apr 2016 21:55:55 +0000
Subject: [Python-Dev] Summary of the pathlib discussion (Re:  Maybe,
 just maybe, pathlib doesn't belong.)
In-Reply-To: <85mvoz2585.fsf@benfinney.id.au>
References: <570C0A0B.90109@sdamon.com> <570C0E50.3080502@stoneleaf.us>
 <570C0F6F.6060606@sdamon.com> <85mvoz2585.fsf@benfinney.id.au>
Message-ID: <CAP1=2W44n4wgObVZz7EJcK3WNmm7H=0WQJiq1Pg3QDC0-8DQ6Q@mail.gmail.com>

On Mon, 11 Apr 2016 at 14:42 Ben Finney <ben+python at benfinney.id.au> wrote:

> Alexander Walters <tritium-list at sdamon.com> writes:
>
> > That is great news.  I just couldn't see it myself in the threads
>
> Agreed. A summary posting, from someone who has a good handle on the
> issue and outcome, would be very helpful.
>


   - Guido has put Chris Angelico and myself in charge of drafting a
   proposal once we are done discussing things as a PEP (probably an amendment
   to the pathlib PEP where I will also explain why we are still not
   subclassing str)
   - Ethan Furman has volunteered to help out with code work (as have I)
   - Name bikeshedding never seems to end, but there seems to be coalescing
   around __fspath__ or __fspathname__ (I think, although __fspath__ seems to
   be what everyone has been typing today; I'm trying to stay out of it so as
   to not influence too much)
   - We are only discussing two things still (all going on in the threads
   relating to return values, arguments, types, etc. in their titles)...
      - Should path.__fspath__() be allowed to return bytes on top of
      strings? (we seem to have found an amicable way for os.fspath() to
      let a bytes argument pass through just like str in an explicit
      fashion)
      - Should we explicitly type check in os.fspath() what
      path.__fspath__() returns or just let it fall through and hope people do
      the right thing?

That's pretty much it unless Chris or Ethan disagree. So I think pathlib is
far from being as dead as a parrot. ;)

From ethan at stoneleaf.us  Mon Apr 11 17:58:43 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 14:58:43 -0700
Subject: [Python-Dev] pathlib - current status of discussions
Message-ID: <570C1E13.4090909@stoneleaf.us>

name:
----

We are down to two choices:

- __fspath__, or
- __fspathname__

The final choice I suspect will be affected by the choice to allow (or 
not) bytes.


method or attribute:
-------------------

method


built-in:
--------

Almost - we'll put it in the os module


add to str:
----------

No, not all strings are paths.


add to C API:
------------

Yes.  Possible names include PyUnicode_FromFSPath and PyObject_Path -- 
again, the choice of bytes inclusion will affect the final choice of name.


add a Path ABC:
--------------

undecided


Sticking points:
---------------

Do we allow bytes to be returned from os.fspath()?  If yes, then do we 
allow bytes from __fspath__()?
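
A minimal sketch of how the two sticking points interact, assuming the
name __fspath__ wins. The fspath() helper, its allow_bytes flag and the
DemoPath class are all hypothetical, written only for this example;
nothing here is an agreed design.

```python
class DemoPath:
    """Hypothetical path object that takes part in the protocol."""

    def __init__(self, text):
        self._text = text

    def __fspath__(self):
        return self._text


def fspath(path, allow_bytes=False):
    """Hypothetical stand-in for the proposed os.fspath()."""
    accepted = (str, bytes) if allow_bytes else (str,)
    if isinstance(path, accepted):
        # Plain paths pass straight through, unchanged.
        return path
    method = getattr(type(path), '__fspath__', None)
    if method is None:
        raise TypeError('not a path: ' + type(path).__name__)
    # Whether this result may be bytes, and whether it should be
    # type-checked here, are exactly the open questions.
    return method(path)
```

With allow_bytes=False a bytes argument is rejected; with
allow_bytes=True it passes through just like str.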

--
~Ethan~

From ethan at stoneleaf.us  Mon Apr 11 18:00:46 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 15:00:46 -0700
Subject: [Python-Dev] Summary of the pathlib discussion (Re:  Maybe,
 just maybe, pathlib doesn't belong.)
In-Reply-To: <CAP1=2W44n4wgObVZz7EJcK3WNmm7H=0WQJiq1Pg3QDC0-8DQ6Q@mail.gmail.com>
References: <570C0A0B.90109@sdamon.com> <570C0E50.3080502@stoneleaf.us>
 <570C0F6F.6060606@sdamon.com> <85mvoz2585.fsf@benfinney.id.au>
 <CAP1=2W44n4wgObVZz7EJcK3WNmm7H=0WQJiq1Pg3QDC0-8DQ6Q@mail.gmail.com>
Message-ID: <570C1E8E.1080205@stoneleaf.us>

On 04/11/2016 02:55 PM, Brett Cannon wrote:

> That's pretty much it unless Chris or Ethan disagree. So I think pathlib
> is far from being as dead as a parrot. ;)

That's nearly exactly what I wrote in my summary.  :)

So, yes, we are nearly there!

--
~Ethan~

From wes.turner at gmail.com  Mon Apr 11 18:02:46 2016
From: wes.turner at gmail.com (Wes Turner)
Date: Mon, 11 Apr 2016 17:02:46 -0500
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
Message-ID: <CACfEFw8PwRvjPw4jEYcW5jCw-OQyGRNT-X+pJ2Vk4fp01nHmHQ@mail.gmail.com>

You seem to be defining a (restricted subset of an existing) language;
which will need version strings and ABI tags for compatibility purposes:

* Build Tags (for Python variants):
   * https://www.python.org/dev/peps/pep-0425/
     * Python tag
     * ABI tag
     * Platform tag
  * https://www.python.org/dev/peps/pep-0513/ manylinux1
  * https://www.python.org/dev/peps/pep-3149/ .so file tags
  * RestrictedPython does not have ABI tags

An Android CPython build discussion about just exposing an extra attribute
in the platform module (the Android build also ships without some modules
IIRC):
* https://mail.python.org/pipermail/python-dev/2014-August/135606.html
* https://mail.python.org/pipermail/python-dev/2014-August/thread.html#135640

On 11 April 2016 at 15:46, Jon Ribbens <jon+python-dev at unequivocal.co.uk>
wrote:
> It's trying to alter
> the global Python environment so that arbitrary code can be executed,
> whereas I am not even trying to allow execution of arbitrary code and
> am not altering the global environment.

However, it's not at all clear (to me at least) what you *are* trying
to do. You're limiting the subset of Python that people can use,
understood. And you're trying to ensure that people can't do "bad
things". Again, understood. But what subset are you actually allowing,
and what things are you trying to protect against? (For example, I
can't calculate sin(1.2) using the math module - why is that not
allowed? It's just as safe as using the built in exponential
operator, and indeed I could write a sin() function in pure Python,
although it would be too slow to be useful, unlike math.sin...)

It feels at the moment as if I'm playing a game where I don't know the
rules, and every time I think I scored a point, the rules are changed
to retroactively disallow it.

Paul

From donald at stufft.io  Mon Apr 11 18:38:56 2016
From: donald at stufft.io (Donald Stufft)
Date: Mon, 11 Apr 2016 18:38:56 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570C1E13.4090909@stoneleaf.us>
References: <570C1E13.4090909@stoneleaf.us>
Message-ID: <E90564BC-FD56-46D7-9514-C347283715B2@stufft.io>


> On Apr 11, 2016, at 5:58 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> 
> name:
> ----
> 
> We are down to two choices:
> 
> - __fspath__, or
> - __fspathname__
> 
> The final choice I suspect will be affected by the choice to allow (or not) bytes.


+1 on __fspath__, -0 on __fspathname__

> 
> 
> 
> add a Path ABC:
> --------------
> 
> undecided


I think it makes sense to add it, but maybe only in 3.6? Path accepting code could be updated to do something like `isinstance(obj, (bytes, str, PathMeta))` which seems like a net win to me.
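
A rough sketch of what such an ABC could look like, assuming the
protocol method ends up being called __fspath__. The names FSPathABC,
DemoPath and is_path_like are hypothetical and stand in for whatever the
real ABC would be called.

```python
from abc import ABC, abstractmethod


class FSPathABC(ABC):
    """Hypothetical ABC for objects implementing __fspath__."""

    @abstractmethod
    def __fspath__(self):
        raise NotImplementedError

    @classmethod
    def __subclasshook__(cls, subclass):
        # Treat any class that defines __fspath__ as a virtual subclass.
        if cls is FSPathABC and any(
            '__fspath__' in vars(base) for base in subclass.__mro__
        ):
            return True
        return NotImplemented


class DemoPath:
    """Hypothetical third-party path class; it does not inherit the ABC."""

    def __init__(self, text):
        self._text = text

    def __fspath__(self):
        return self._text


def is_path_like(obj):
    # The check that path-accepting code could then perform.
    return isinstance(obj, (bytes, str, FSPathABC))
```

Because of the subclass hook, third-party path classes would not need to
register or inherit anything to pass the isinstance() check.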

> 
> 
> Sticking points:
> ---------------
> 
> Do we allow bytes to be returned from os.fspath()?  If yes, then do we allow bytes from __fspath__()?

I think yes and yes. Making it needlessly harder to deal with a bytes path in the scenarios where you're actually dealing with them seems like the kind of change 3.0 made that ended up getting rolled back where it could.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From jon+python-dev at unequivocal.co.uk  Mon Apr 11 18:43:17 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Mon, 11 Apr 2016 23:43:17 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAPTjJmriCFBHQgBrX7JnpO9WhaqmW4fMv0xTqd-qOjfZ3mR3tQ@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CAPTjJmriCFBHQgBrX7JnpO9WhaqmW4fMv0xTqd-qOjfZ3mR3tQ@mail.gmail.com>
Message-ID: <20160411224317.GD8206@unequivocal.co.uk>

On Tue, Apr 12, 2016 at 03:02:54AM +1000, Chris Angelico wrote:
> On Tue, Apr 12, 2016 at 2:53 AM, Jon Ribbens
> <jon+python-dev at unequivocal.co.uk> wrote:
> > On Mon, Apr 11, 2016 at 04:04:21PM +0100, Paul Moore wrote:
> >> However, it's not at all clear (to me at least) what you *are* trying
> >> to do.
> >
> > I'm trying to see to what extent we can use ast node inspection to
> > remedy the failures of prior attempts at Python sandboxing. Is there
> > *any* extent to which Python can be sandboxed, or is even trying to
> > use it as a calculator function unfixably insecure?
> 
> It all depends on how much functionality you want. If all you need is
> a numeric expression evaluator, that's not too hard - disallow all
> forms of attribute access, etc, and just have simple numbers and
> operators. That's pretty useful, and safe.

By "calculator" I didn't necessarily mean to imply numeric-only,
sorry if I was unclear. Also perhaps I should have said "non-trivial",
inasmuch as if we restrict it that far then it would quite possibly be
simpler and quicker just to write the expression evaluator from scratch
and not use the Python interpreter at all.
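
For reference, the numeric-only case can be sketched with a whitelist
over ast nodes. This is only an illustration of that narrow case, not
the sandbox under discussion; safe_eval and the two operator tables are
made-up names, and it assumes a recent Python 3 where numeric literals
parse to ast.Constant.

```python
import ast
import operator

# Whitelisted operators; anything not listed here is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expr):
    """Evaluate a purely numeric expression by walking its AST."""

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](visit(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError('disallowed syntax: ' + type(node).__name__)

    return visit(ast.parse(expr, mode='eval'))
```

Exponentiation is left out on purpose, since huge powers would make even
this tiny evaluator easy to stall.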

> Alternatively, go completely the other way. Let people run whatever
> code they like... in an environment where it can't hurt anyone else.
> That's what PyPyJS does - don't bother looking for security holes in
> it, because all you're doing is attacking your own computer.

That's a very specific use case though: running client-side in the
user's browser.

> So before you can ask whether Python is unfixably insecure, you first
> have to decide what the minimum level of functionality is that you'll
> accept. Do you need basic arithmetic plus trigonometric functions? Easy
> enough - disallow all attribute access and imports, and populate
> builtins with "from math import *". Need them to be able to assign
> variables and define functions? That's gonna be harder.

I think calling functions and accessing variables and attributes is
likely a minimum. Defining functions would be useful, and of course
defining classes would be another useful step further.

From random832 at fastmail.com  Mon Apr 11 18:56:05 2016
From: random832 at fastmail.com (Random832)
Date: Mon, 11 Apr 2016 18:56:05 -0400
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C1A95.1060100@mail.de>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <570C13D6.4090609@stoneleaf.us> <570C1A95.1060100@mail.de>
Message-ID: <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com>

On Mon, Apr 11, 2016, at 17:15, Ethan Furman wrote:
> So we're trying to make option 2 work before falling back to option 1.
> 
> If you have a way to make pathlib work with the stdlib that doesn't 
> involve "fixing" os and os.path, now is the time to speak up.

Fully general re-dispatch from argument types on any call to a function
that raises TypeError or NotImplemented? [e.g. call
Path.__missing_func__(os.open, path, mode)]

Have pathlib monkey-patch things at import?


On Mon, Apr 11, 2016, at 17:43, Sven R. Kunze wrote:
> So, I might add:
> 
> 3. add more high-level features to pathlib to prevent a downgrade to os 
> or os.path

3. reimplement the entire ecosystem in every walled garden so no-one has
to leave their walled gardens.

What's the point of batteries being included if you can't wire them to
anything?

I don't get what you mean by this whole "different level of abstraction"
thing, anyway. The fact that there is one obvious thing to want to do
with open and a Path strongly suggests that that should be able to be
done by passing the Path to open.

Also, what level of abstraction is builtin open? Maybe we should _just_
leave os alone on the grounds of some holy sacred lowest-level-itude,
but allow io and shutil to accept Path?

From victor.stinner at gmail.com  Mon Apr 11 19:43:16 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 12 Apr 2016 01:43:16 +0200
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570C12C2.9000602@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
Message-ID: <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>

On 11 Apr 2016 11:11 PM, "Ethan Furman" <ethan at stoneleaf.us> wrote:
> So my concern in such a case is what happens if we pass this SE string
> somewhere else: a UTF-8 file, or over a socket, or into a database? Does
> this have issues that we wouldn't face if we just used bytes?

"SE strings" have been returned by os.listdir(str), os.walk(str),
os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under
the sun.

Trying to encode a surrogate to ascii, latin1 or utf8 raises an encoding
error. A surrogate is created to store an undecodable byte in a filename.
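
A small sketch of that behaviour, using an explicit UTF-8 codec so that
it does not depend on the local filesystem encoding. The file name is
made up for the example.

```python
raw = b'caf\xe9.txt'  # Latin-1 bytes, not valid UTF-8

# Decoding with surrogateescape maps the undecodable byte 0xE9 to U+DCE9.
name = raw.decode('utf-8', 'surrogateescape')

# Encoding back with the same handler restores the original bytes.
restored = name.encode('utf-8', 'surrogateescape')

# A strict encode refuses the lone surrogate instead of writing garbage.
try:
    name.encode('utf-8')
except UnicodeEncodeError:
    strict_failed = True
else:
    strict_failed = False
```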

IMHO it's safer to get an encoding error than to get no error when you
concatenate two byte strings encoded in two different encodings (mojibake).

print(os.fspath(obj)) will more likely do what you expect if os.fspath()
always returns str. I mean that it will encode your filename to the encoding
of the terminal, which can be different from the filesystem encoding.

If fspath() can return bytes, you should write
print(os.fsdecode(os.fspath(obj))).

--

On Linux, open(DirEntry) for a bytes entry (os.scandir(bytes)) would have
to first decode a bytes filename with os.fsdecode() to then encode it back
with os.fsencode().

Yeah, that's inefficient. But we now have super fast codecs (e.g. encoding
and decoding are almost a memcpy for pure ASCII). And filenames are usually
very short (less than 300 bytes). IMHO the interface matters more than
performance.

As I showed with my print example, filenames are not only used to access
the filesystem, you also want to display them. Using Unicode avoids bad
surprises (mojibake).

--

Well, the question is more why you want to get bytes in the first place.
Why not use only Unicode?

I understood that some people expect mojibake when using Unicode, whereas
using bytes cannot lead to mojibake. Well, in practice it's simply the
opposite :-)

Maybe devs read that Linux syscalls and C functions take bytes, so using
bytes gives access to any filename, including "invalid filenames". That's
true. But it's also true for Unicode if you use os.fsdecode().

Maybe devs don't understand Unicode, don't know it, and fear it :-)

My goal is more to educate users and help them to avoid mojibake.

Did I mention that you must not use bytes filenames on Windows? So using
Unicode everywhere helps to write really portable code. On Windows, using
Unicode is required to be able to open any file.

Victor

From rosuav at gmail.com  Mon Apr 11 20:01:14 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 12 Apr 2016 10:01:14 +1000
Subject: [Python-Dev] Summary of the pathlib discussion (Re:  Maybe,
 just maybe, pathlib doesn't belong.)
In-Reply-To: <CAP1=2W44n4wgObVZz7EJcK3WNmm7H=0WQJiq1Pg3QDC0-8DQ6Q@mail.gmail.com>
References: <570C0A0B.90109@sdamon.com> <570C0E50.3080502@stoneleaf.us>
 <570C0F6F.6060606@sdamon.com> <85mvoz2585.fsf@benfinney.id.au>
 <CAP1=2W44n4wgObVZz7EJcK3WNmm7H=0WQJiq1Pg3QDC0-8DQ6Q@mail.gmail.com>
Message-ID: <CAPTjJmpBJpgXZPP1N6jUtv+=wQM+-JN21oDrkBibXV=R5fF=3w@mail.gmail.com>

On Tue, Apr 12, 2016 at 7:55 AM, Brett Cannon <brett at python.org> wrote:
> That's pretty much it unless Chris or Ethan disagree. So I think pathlib is
> far from being as dead as a parrot. ;)

That looks like an accurate summary!

ChrisA

From ethan at stoneleaf.us  Mon Apr 11 20:40:50 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 17:40:50 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
Message-ID: <570C4412.4070600@stoneleaf.us>

On 04/11/2016 01:42 PM, Victor Stinner wrote:

> With the PEP 383, a bytes filename can be stored as str using the
> surrogateescape error handler. So DirEntry can convert a bytes path to
> str using os.fsdecode().

Does this mean that os.fsdecode() is simply a wrapper that sets the 
errors to the surrogateescape handler?
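
One way to check this, as a sketch: it assumes a CPython recent enough
to have sys.getfilesystemencodeerrors() (3.6 or later) and a typical
POSIX build, where the reported error handler is 'surrogateescape'. The
byte string is made up for the example.

```python
import os
import sys

raw = b'caf\xe9.txt'

# The codec and error handler the interpreter uses for file names.
encoding = sys.getfilesystemencoding()
errors = sys.getfilesystemencodeerrors()

# Decoding by hand with those settings ...
by_hand = raw.decode(encoding, errors)

# ... should match what os.fsdecode() returns for bytes input,
# and os.fsencode() should reverse it.
via_os = os.fsdecode(raw)
round_trip = os.fsencode(via_os)
```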

--

~Ethan~

From songofacandy at gmail.com  Mon Apr 11 20:51:21 2016
From: songofacandy at gmail.com (INADA Naoki)
Date: Tue, 12 Apr 2016 09:51:21 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAEfz+Tx0u7S-Ni1HsSMSVySH+vG8X0E07RUpXEv0=H=ErDYXyg@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <CAEfz+Tx0u7S-Ni1HsSMSVySH+vG8X0E07RUpXEv0=H=ErDYXyg@mail.gmail.com>
Message-ID: <CAEfz+Txb3LtCrUEr+oBxb1H2VzMOMBZoJ+y_oy9jX8Gndd3KRg@mail.gmail.com>

Sorry, I forgot to use "Reply All".

On Tue, Apr 12, 2016 at 9:49 AM, INADA Naoki <songofacandy at gmail.com> wrote:

> IMHO it's safer to get an encoding error rather than no error when you
>> concatenate two byte strings encoded to two different encodings (mojibake).
>>
>> print(os.fspath(obj)) will more likely do what you expect if os.fspath()
>> always returns str. I mean that it will encode your filename to the encoding
>> of the terminal which can be different than the filesystem encoding.
>>
>> If fspath() can return bytes, you should write
>> print(os.fsdecode(os.fspath(obj))).
>>
>>
> Why not print(obj)?
> str() is the normal high-level API, and __fspath__ and os.fspath() should
> be low-level APIs.
> Normal users shouldn't use __fspath__ and os.fspath().  Only library
> developers should use them.
>
> --
> INADA Naoki  <songofacandy at gmail.com>
>

-- 
INADA Naoki  <songofacandy at gmail.com>

From greg.ewing at canterbury.ac.nz  Mon Apr 11 20:55:43 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 12 Apr 2016 12:55:43 +1200
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570BCE39.8090306@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
Message-ID: <570C478F.6050400@canterbury.ac.nz>

Ethan Furman wrote:
>   # after new protocol with bytes/str support
>   def zingar(a_path):
>       a_path = fspath(a_path)
>       if not isinstance(a_path, (bytes,str)):
>           raise TypeError('bytes or str required')
>       ...

I think that one would be just

    def zingar(a_path):
        a_path = fspath(a_path)

because fspath() would presumably check the result for
str/bytesness itself. At least I can't think of a reason
for it not to, since returning either str or bytes is
part of its contract.
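
A rough sketch of what such a self-checking fspath() might look like.
This is only an illustration under the assumptions in this thread (the
name fspath_sketch is hypothetical, and the final API may differ):

```python
def fspath_sketch(path):
    """Hypothetical sketch of a checking fspath(); not the real implementation."""
    if isinstance(path, (str, bytes)):
        return path
    try:
        method = type(path).__fspath__
    except AttributeError:
        raise TypeError("expected str, bytes or an object with __fspath__, "
                        "not " + type(path).__name__) from None
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ returned " + type(result).__name__
                        + ", expected str or bytes")
    return result


def zingar(a_path):
    a_path = fspath_sketch(a_path)  # guaranteed str or bytes from here on
    return a_path
```

With the check living in one place, callers like zingar() need no
isinstance test of their own.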

-- 
Greg

From greg.ewing at canterbury.ac.nz  Mon Apr 11 21:08:36 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 12 Apr 2016 13:08:36 +1200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160411164449.GB8206@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk> <8760vorweo.fsf@thinkpad.rath.org>
 <20160411164449.GB8206@unequivocal.co.uk>
Message-ID: <570C4A94.1010402@canterbury.ac.nz>

Jon Ribbens wrote:
> So far it looks like blocking "_*" and the frame object attributes
> appears to be sufficient.

Even if your sandbox as it currently exists is secure, it's
only an extremely restricted subset. You seem to be assuming
that if your technique works so far, then it can be extended
to cover a larger subset, but I don't think that's certain.

One problem that's been raised is how to prevent untrusted
code from monkeypatching imported modules. Possibly that
could be addressed by giving the untrusted code a copy of
the module, but I'm not entirely sure -- accidentally
importing two copies of the same source file is a well-known
source of bugs, after all.

A related, but more difficult problem is that if we allow
the untrusted code to import any pure-Python classes, it
will be able to monkeypatch them. So it seems like it will
need its own copy of those classes as well -- and having
two copies of the same class around is *another* well
known source of bugs.
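
To illustrate the kind of bug meant here, a small sketch that simulates
loading the same source twice by exec'ing it into two namespaces (the
Token class and the namespace names are invented for the example):

```python
SOURCE = """
class Token:
    def __init__(self, value):
        self.value = value
"""

# Simulate loading the same source file twice under two module names.
trusted_ns, sandbox_ns = {}, {}
exec(SOURCE, trusted_ns)
exec(SOURCE, sandbox_ns)

token = sandbox_ns["Token"]("abc")

same_name = trusted_ns["Token"].__name__ == sandbox_ns["Token"].__name__
same_class = trusted_ns["Token"] is sandbox_ns["Token"]
recognised = isinstance(token, trusted_ns["Token"])
```

The two classes share a name but are distinct objects, so an instance
made from the sandbox copy fails isinstance checks (and "except"
clauses, for exception classes) written against the trusted copy.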

-- 
Greg

From wes.turner at gmail.com  Mon Apr 11 21:52:10 2016
From: wes.turner at gmail.com (Wes Turner)
Date: Mon, 11 Apr 2016 20:52:10 -0500
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <570C4A94.1010402@canterbury.ac.nz>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <8760vorweo.fsf@thinkpad.rath.org>
 <20160411164449.GB8206@unequivocal.co.uk>
 <570C4A94.1010402@canterbury.ac.nz>
Message-ID: <CACfEFw_GzLZz=ato1w-uywgEEnKGL=2-6zNtqW=3Edz+LeogMA@mail.gmail.com>

On Mon, Apr 11, 2016 at 8:08 PM, Greg Ewing <greg.ewing at canterbury.ac.nz>
wrote:

> Jon Ribbens wrote:
>
>> So far it looks like blocking "_*" and the frame object attributes
>> appears to be sufficient.
>>
>
> Even if your sandbox as it currently exists is secure, it's
> only an extremely restricted subset. You seem to be assuming
> that if your technique works so far, then it can be extended
> to cover a larger subset, but I don't think that's certain.
>

How would you test that?


> One problem that's been raised is how to prevent untrusted
> code from monkeypatching imported modules. Possibly that
> could be addressed by giving the untrusted code a copy of
> the module, but I'm not entirely sure -- accidentally
> importing two copies of the same source file is a well-known
> source of bugs, after all.
>

https://en.wikipedia.org/wiki/Monkey_patch#Pitfalls

*
https://pypi.python.org/pypi?%3Aaction=search&term=monkeypatch&submit=search

  * https://pypi.python.org/pypi/apparmor_monkeys
  *
http://eventlet.net/doc/patching.html#monkeypatching-the-standard-library
  * http://www.gevent.org/gevent.monkey.html
  * https://docs.python.org/3/library/asyncio-sync.html#locks
  * https://docs.python.org/2/library/threading.html#lock-objects
  *
https://docs.python.org/2/library/sets.html?highlight=immutable#sets.ImmutableSet
  * http://doc.pypy.org/en/latest/stm.html#locks
   - " Infinite recursion just segfaults for now."
  * https://github.com/tobgu/pyrsistent #justfoundthis
    - https://github.com/tobgu/pyrsistent#invariants
    - https://github.com/tobgu/pyrsistent#freeze-and-thaw
      - freeze, thaw

  * define a @property (and no @propname.setter)
    - https://docs.python.org/2/howto/descriptor.html#properties
    - https://docs.python.org/2/library/functions.html#property


> A related, but more difficult problem is that if we allow
> the untrusted code to import any pure-Python classes, it
> will be able to monkeypatch them. So it seems like it will
> need its own copy of those classes as well --


* https://docs.python.org/3/library/importlib.html#importlib.__import__
*



> and having
> two copies of the same class around is *another* well
> known source of bugs.


One way to reduce the likelihood of this is to
bundle all dependencies into a self-contained
PEX ZIP package
and specify entry points.

* http://legacy.python.org/dev/peps/pep-0441/
*
https://pex.readthedocs.org/en/stable/buildingpex.html#specifying-entry-points
*
https://pex.readthedocs.org/en/stable/buildingpex.html#tailoring-pex-execution-at-build-time


>
>
> --
> Greg
>

From jon+python-dev at unequivocal.co.uk  Mon Apr 11 22:00:29 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Tue, 12 Apr 2016 03:00:29 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <570C4A94.1010402@canterbury.ac.nz>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <8760vorweo.fsf@thinkpad.rath.org>
 <20160411164449.GB8206@unequivocal.co.uk>
 <570C4A94.1010402@canterbury.ac.nz>
Message-ID: <20160412020029.GE8206@unequivocal.co.uk>

On Tue, Apr 12, 2016 at 01:08:36PM +1200, Greg Ewing wrote:
> Jon Ribbens wrote:
> >So far it looks like blocking "_*" and the frame object attributes
> >appears to be sufficient.
> 
> Even if your sandbox as it currently exists is secure, it's
> only an extremely restricted subset.

I'm not sure what you think the restrictions are, but yes a highly
restricted Python that was secure would be very useful sometimes.

> You seem to be assuming that if your technique works so far, then it
> can be extended to cover a larger subset, but I don't think that's
> certain.

No, I'm not assuming that.

> One problem that's been raised is how to prevent untrusted
> code from monkeypatching imported modules. Possibly that
> could be addressed by giving the untrusted code a copy of
> the module,

Yes, that's what it does.

> but I'm not entirely sure -- accidentally importing two copies of
> the same source file is a well-known source of bugs, after all.

I'm not sure what you mean by that.

> A related, but more difficult problem is that if we allow
> the untrusted code to import any pure-Python classes, it
> will be able to monkeypatch them. So it seems like it will
> need its own copy of those classes as well

Yes, that's also what it does.

> -- and having two copies of the same class around is *another* well
> known source of bugs.

I'm not sure what you mean by that either.

From rosuav at gmail.com  Mon Apr 11 22:13:07 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 12 Apr 2016 12:13:07 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160411224317.GD8206@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CAPTjJmriCFBHQgBrX7JnpO9WhaqmW4fMv0xTqd-qOjfZ3mR3tQ@mail.gmail.com>
 <20160411224317.GD8206@unequivocal.co.uk>
Message-ID: <CAPTjJmpWVmXAn6R7eR80TLswMtgWyAbdgvJm535R-DsvMn8WZg@mail.gmail.com>

On Tue, Apr 12, 2016 at 8:43 AM, Jon Ribbens
<jon+python-dev at unequivocal.co.uk> wrote:
> On Tue, Apr 12, 2016 at 03:02:54AM +1000, Chris Angelico wrote:
>> It all depends on how much functionality you want. If all you need is
>> a numeric expression evaluator, that's not too hard - disallow all
>> forms of attribute access, etc, and just have simple numbers and
>> operators. That's pretty useful, and safe.
>
> By "calculator" I didn't necessarily mean to imply numeric-only,
> sorry if I was unclear. Also perhaps I should have said "non-trivial",
> inasmuch as if we restrict it that far then it would quite possibly be
> simpler and quicker just to write the expression evaluator from scratch
> and not use the Python interpreter at all.

I'm aware you wanted more. My point is that it's not hard to secure
the trivially simple, and it doesn't have to be entirely useless. But
every bit of additional power brings with it additional risk.
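
As a sketch of the "trivially simple" end of that spectrum: a numeric
evaluator that whitelists a few AST node types and rejects everything
else. This is an illustration only, not an audited sandbox; it assumes
Python 3.8 or later (for ast.Constant), and the name evaluate is made up.

```python
import ast
import operator

_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate(expression):
    """Evaluate +, -, *, / over int and float literals; reject everything else."""

    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Names, calls, attribute access, subscripts, etc. all end up here.
        raise ValueError("disallowed syntax: " + type(node).__name__)

    return walk(ast.parse(expression, mode="eval").body)
```

Exponentiation is left out on purpose, since something like 9**9**9
would already be a denial-of-service risk. Each operator added to the
tables is one more thing to reason about.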

ChrisA

From ncoghlan at gmail.com  Mon Apr 11 23:45:00 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 12 Apr 2016 13:45:00 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570C1E13.4090909@stoneleaf.us>
References: <570C1E13.4090909@stoneleaf.us>
Message-ID: <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>

On 12 April 2016 at 07:58, Ethan Furman <ethan at stoneleaf.us> wrote:
> Sticking points:
> ---------------
>
> Do we allow bytes to be returned from os.fspath()?  If yes, then do we allow
> bytes from __fspath__()?

I've come around to the point of view that allowing both str and
bytes-like objects to pass through unchanged makes sense, with the
rationale being the one someone mentioned regarding ease-of-use in
os.path.

Consider os.path.join: with a permissive os.fspath, the necessary
update should just be to introduce "map(os.fspath, args)" (or its C
equivalent), and then continue with the existing bytes vs str handling
logic.
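
A minimal sketch of that update, assuming os.fspath exists and is
permissive (join_sketch is a hypothetical wrapper; posixpath is used
only so the result does not depend on the platform):

```python
import os
import pathlib
import posixpath


def join_sketch(first, *rest):
    """Hypothetical wrapper: normalise with os.fspath, then join as before."""
    return posixpath.join(*map(os.fspath, (first, *rest)))


text_result = join_sketch(pathlib.PurePosixPath("/usr"), "lib", "python3")
bytes_result = join_sketch(b"/usr", b"lib")
```

Mixing str and bytes components still raises TypeError, because that
check comes from the existing join logic rather than from fspath.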

Functions consuming os.fspath can then decide on a case-by-case basis
how they want to handle binary paths: either use them as is (which
will usually work on mostly-ASCII systems), convert them to text with
os.fsdecode (which will usually work on *nix systems), or disallow
them entirely (which would probably only be appropriate for libraries
that wanted to ensure support for non-ASCII paths on Windows systems).
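
The second and third options could look roughly like this (a sketch
only; as_text and text_only are invented names, and os.fspath is assumed
to be the permissive variant):

```python
import os


def as_text(path):
    """Accept str, bytes or path-like input; always hand back str."""
    return os.fsdecode(os.fspath(path))


def text_only(path):
    """Accept str or str-based path-like input; refuse binary paths."""
    result = os.fspath(path)
    if isinstance(result, bytes):
        raise TypeError("bytes paths are not supported here")
    return result
```

The first option, using the value as is, needs no helper at all.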

That then cascades into the other open questions mentioned:

- permitted return types for both fspath and __fspath__ would be (str, bytes)
- the names would be fspath and __fspath__, since the result may be
either a path name as text, or an encoded path name as bytes

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Mon Apr 11 23:58:29 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 12 Apr 2016 13:58:29 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
Message-ID: <CADiSq7fO0wvyO6iS6hwzrooDf8imYj7EKYxqjggFiXXWEtae4g@mail.gmail.com>

On 12 April 2016 at 13:45, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Consider os.path.join: with a permissive os.fspath, the necessary
> update should just be to introduce "map(os.fspath, args)" (or its C
> equivalent), and then continue with the existing bytes vs str handling
> logic.

That does remind me: once a patch is available, we should check the
benchmark numbers with the patch applied. I'd expect the new protocol
overhead to be swamped by the actual IO costs, but this kind of low
level change can have surprising consequences.

Regarding the type checks, PyObject_AsFilesystemPath (or whatever we
call it) will be implemented in C, with os.fspath just calling that,
so doing "PyUnicode_Check(path) || PyBytes_Check(path)" on the result
will be both cheap and convenient for API consumers (since it means
they know they only have to cope with bytes or str instances
internally, and will get a clear error message if handed something
else).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From chris.barker at noaa.gov  Tue Apr 12 01:14:29 2016
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Mon, 11 Apr 2016 22:14:29 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
Message-ID: <-9219200259368253896@unknownmsgid>

>  with the
> rationale being the one someone mentioned regarding ease-of-use in
> os.path.
>
> Consider os.path.join:

Why in the world do the  os.path functions need to work with Path
objects? ( and other conforming objects)

This all started with the goal of using Path objects in the stdlib,
but that's for opening files, etc. Path is an alternative to os.path
-- you don't need to use both.

And if you do have a byte path, you can stick with os.path....

BTW,

I'm confused about what a bytes path IS -- is it encoded? Can you
assume it can be decoded ? It seems to me that the ONLY time you
should get a byte path is from a low level system call on a posix
system, and you may have no idea how it's encoded. So the ONLY thing
you should do with it is pass it along to another low level system
call.

I can't see why we should support anything else with bytes objects.

> - the names would be fspath and __fspath__, since the result may be
> either a path name as text, or an encoded path name as bytes

You just used the phrase "path name as bytes" -- so why is
__pathname__ inappropriate if it might return bytes?

I like __pathname__ better because this entire effort is because we've
decided it's important to make the distinction between a "path" and
the text representation of said path.

Just sayin'

-CHB

From stephen at xemacs.org  Tue Apr 12 01:28:51 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 12 Apr 2016 14:28:51 +0900
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <E90564BC-FD56-46D7-9514-C347283715B2@stufft.io>
References: <570C1E13.4090909@stoneleaf.us>
 <E90564BC-FD56-46D7-9514-C347283715B2@stufft.io>
Message-ID: <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp>

Donald Stufft writes:

 > I think yes and yes [__fspath__ and fspath should be allowed to
 > handle bytes, otherwise] it seems like making it needlessly harder
 > to deal with a bytes path

It's not needless.  This kind of polymorphism makes it hard to review
code locally.  Once bytes get a foothold inside a text application,
they metastasize altogether too easily, and you end up with TypeErrors
or UnicodeErrors quite far from the origin.  Debugging often requires
tracing data flows over hill and over dale while choking from the
dusty trail, or band-aids like a top-level "except UnicodeError:
log_and_quarantine(bytes)".  I can't prove that returning bytes from
these APIs is a big risk in this sense, but I can't see a way to prove
that it's not, either, given that their point is duck-typing, and
therefore they may be generalized in the future, and by third parties.

I understand that there are applications where it's bytes all the way
down, but by the very nature of computing systems, there are systems
where bytes are decoded to text.  For historical reasons (the encoding
Tower of Babel), it's very error-prone to do that on demand.  Best
practice is to do the conversion as close to the boundary as possible,
and process only text internally.

In text applications, "bytes as carcinogen" is an apt metaphor.

Now, I'm not Dutch, so I can't tell you it's obvious that the risk to
text-processing applications is more important than the inconvenience
to byte-shoveling applications.  But there is a need to be
parsimonious with polymorphism.


From robertc at robertcollins.net  Tue Apr 12 01:30:04 2016
From: robertc at robertcollins.net (Robert Collins)
Date: Tue, 12 Apr 2016 17:30:04 +1200
Subject: [Python-Dev] thoughts on backporting __wrapped__ to 2.7?
In-Reply-To: <22276.31903.569346.438240@turnbull.sk.tsukuba.ac.jp>
References: <CAJ3HoZ38C2rs0EF+pW3OKXcJfwBwscYi2ijG+rTEqUiqui5xYQ@mail.gmail.com>
 <CAMpsgwb-f2+xXf5KCnY5882gVODM4x3EQk=s56CYm0Ku3rKZsA@mail.gmail.com>
 <CAJ3HoZ33RdKQcjYp5EZf8QMFHe+NrVSRALzgL8TqcaVAqNLOjg@mail.gmail.com>
 <22276.31903.569346.438240@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CAJ3HoZ2bqJigfpc+Cw97E5oyuVm_dy7jCe1OEARaFKK9sJoXBg@mail.gmail.com>

On 6 April 2016 at 15:03, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Robert Collins writes:
>
>  > Sadly that has the ordering bug of assigning __wrapped__ first and appears
>  > a little unmaintained based on the bug tracker :(
>
> You can fix two problems with one patch, then!
>

Not really - taking over a project is somewhat long winded; it would
be centralising yet another backport which
may-or-may-not-be-a-good-thing, and I'm not exactly overflowing with
spare tuits. If someone wants to do it - great, more power to them,
but the last thing we need is to move it from one unmaintained spot to
another unmaintained spot.

-Rob



-- 
Robert Collins <rbtcollins at hpe.com>
Distinguished Technologist
HP Converged Cloud

From greg.ewing at canterbury.ac.nz  Tue Apr 12 01:40:16 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 12 Apr 2016 17:40:16 +1200
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <-9219200259368253896@unknownmsgid>
References: <570C1E13.4090909@stoneleaf.us>
 <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
 <-9219200259368253896@unknownmsgid>
Message-ID: <570C8A40.6020903@canterbury.ac.nz>

Chris Barker - NOAA Federal wrote:
> Why in the world do the  os.path functions need to work with Path
> objects?

So that applications using path objects can pass them
to library code that uses os.path to manipulate them.

> I'm confused about what a bytes path IS -- is it encoded?

It's a sequence of bytes identifying a file. Often it
will be an encoding of some piece of text in the file
system encoding, but there's no guarantee of that.

> Can you assume it can be decoded ?

Only if you use an encoding in which all byte sequences
are valid, such as latin1 or utf8+surrogateescape.
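
A quick sketch of the difference, using every possible byte value as
the test input (variable names are arbitrary):

```python
raw = bytes(range(256))

# latin-1 maps each byte to one code point, so it always decodes.
via_latin1 = raw.decode("latin-1")

# UTF-8 with surrogateescape smuggles undecodable bytes through as
# lone surrogates and restores them on encoding.
via_escape = raw.decode("utf-8", "surrogateescape")

# Strict UTF-8 gives up on the first invalid byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

Both of the lossless decodings encode back to the original bytes; the
strict one never produces a string in the first place.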

> So the ONLY thing
> you should do with it is pass it along to another low level system
> call.

Not quite -- you can separate it into components and
work with them. Essentially the same set of operations
that os.path provides.
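
For example (a sketch using posixpath so the result is the same on any
platform; the path itself is made up and deliberately not valid UTF-8):

```python
import posixpath

raw_path = b"/srv/data/caf\xe9.txt"

# The os.path family never needs to know the encoding: it only looks
# for the separator and dot bytes.
directory, filename = posixpath.split(raw_path)
stem, extension = posixpath.splitext(filename)
```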

>>- the names would be fspath and __fspath__, since the result may be
>>either a path name as text, or an encoded path name as bytes
> 
> I like __pathname__ better because this entire effort is because we've
> decided it's important to make the distinction between a "path" and
> the text representation of said path.

I agree -- the term "pathname" can cover both text and
bytes. When posix talks about pathnames it's really
talking about bytes.

-- 
Greg

From ethan at stoneleaf.us  Tue Apr 12 02:00:14 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 23:00:14 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <-9219200259368253896@unknownmsgid>
References: <570C1E13.4090909@stoneleaf.us>
 <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
 <-9219200259368253896@unknownmsgid>
Message-ID: <570C8EEE.6050904@stoneleaf.us>

On 04/11/2016 10:14 PM, Chris Barker - NOAA Federal wrote:

>> Consider os.path.join:
>
> Why in the world do the  os.path functions need to work with Path
> objects? ( and other conforming objects)

Because library XYZ that takes a path and wants to open it shouldn't 
have to care whether that path is a string or pathlib.Path -- but if 
os.open can't use pathlib.Path then the library has to care (or the user 
has to care).

> This all started with the goal of using Path objects in the stdlib,
> but that's for opening files, etc.

Etc. as in os.path.join?  os.stat? os.path.split?

> Path is an alternative to os.path -- you don't need to use both.

As a user you don't, no.  As a library that has no control over what 
kind of "path" is passed to you -- well, if os and os.path can accept 
Path objects then you can just use os and os.path; otherwise you have to 
use os and os.path if passed a str or bytes, and pathlib.Path if passed 
a pathlib.Path -- so you do have to use both.

>> - the names would be fspath and __fspath__, since the result may be
>> either a path name as text, or an encoded path name as bytes
>
> You just used the phrase "path name as bytes" -- so why is
> __pathname__ inappropriate if it might return bytes?

No, he used the phrase "*encoded* path name as bytes".  Names are 
typically represented as text, and since bytes might be returned we 
don't want a signal that says text.

> I like __pathname__ better because this entire effort is because we've
> decided it's important to make the distinction between a "path" and
> the text representation of said path.

No, this entire effort is to make pathlib work with the rest of the stdlib.

--
~Ethan~

From stephen at xemacs.org  Tue Apr 12 02:21:12 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 12 Apr 2016 15:21:12 +0900
Subject: [Python-Dev]  Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C0A0B.90109@sdamon.com>
References: <570C0A0B.90109@sdamon.com>
Message-ID: <22284.37848.204411.503483@turnbull.sk.tsukuba.ac.jp>

Alexander Walters writes:

 > If there is headway being made, I do not see it.

Filter out everything but the posts by Brett, and see if you still
feel that way.  (Other people have contributed[1], but that filter
has about 20dB better S/N than the whole thread does.)


Footnotes: 
[1]  Brett may even claim none of the ideas are his.


From stephen at xemacs.org  Tue Apr 12 03:52:19 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 12 Apr 2016 16:52:19 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAEfz+Txb3LtCrUEr+oBxb1H2VzMOMBZoJ+y_oy9jX8Gndd3KRg@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <CAEfz+Tx0u7S-Ni1HsSMSVySH+vG8X0E07RUpXEv0=H=ErDYXyg@mail.gmail.com>
 <CAEfz+Txb3LtCrUEr+oBxb1H2VzMOMBZoJ+y_oy9jX8Gndd3KRg@mail.gmail.com>
Message-ID: <22284.43315.54899.838953@turnbull.sk.tsukuba.ac.jp>

INADA Naoki writes:

 > > Why not print(obj)?

print(obj) will give mojibake by default if
sys.getfilesystemencoding() != sys.getdefaultencoding().

 > > str() is the normal high-level API, and __fspath__ and os.fspath() should
 > > be low-level APIs.
 > > Normal users shouldn't use __fspath__ and os.fspath().  Only library
 > > developers should use them.

This is the price we pay for the stubbornness of the
bytes-are-text-too meme.




From p.f.moore at gmail.com  Tue Apr 12 04:17:28 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 12 Apr 2016 09:17:28 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160411165354.GC8206@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
Message-ID: <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>

On 11 April 2016 at 17:53, Jon Ribbens <jon+python-dev at unequivocal.co.uk> wrote:
>> You're limiting the subset of Python that people can use,
>> understood. And you're trying to ensure that people can't do "bad
>> things". Again, understood. But what subset are you actually allowing,
>> and what things are you trying to protect against? (For example, I
>> can't calculate sin(1.2) using the math module - why is that not
>> allowed?
>
> It wasn't allowed in the earlier version because I wasn't allowing
> import at all, because this is just an experiment. As it happens,
> I added 'import' yesterday so yes you can use math.sin.

Well, I'll ask the obvious question, then. In allowing "import" did
you allow "import ctypes"? If so, then I win :-) Or did you explicitly
whitelist certain modules? And if so, which ones are they, and did I
succeed if I manage to import a module you hadn't whitelisted?

>> It feels at the moment as if I'm playing a game where I don't know the
>> rules, and every time I think I scored a point, the rules are changed
>> to retroactively disallow it.
>
> The challenge is to show some code that will escape from the sandbox,
> in a way that is not trivially fixable with a tiny patch, or in a way
> that demonstrates that such a large number of tiny patches would be
> required as to be unworkable.

But I'm still not clear when I count as "outside the sandbox", given
that I don't know what the rules of what is allowed *in* the sandbox
are...

Paul

From rosuav at gmail.com  Tue Apr 12 04:28:34 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 12 Apr 2016 18:28:34 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
Message-ID: <CAPTjJmqms4g6r436M5ZYttQT89VV=gaBQisKmh9w1VP9oEmSdA@mail.gmail.com>

On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens
<jon+python-dev at unequivocal.co.uk> wrote:
> Anyway the code is at https://github.com/jribbens/unsafe
> It requires Python 3.4 or later (it could probably be made to work on
> Python 2.7 as well, but it would need some changes).

Rather annoying point: Your interactive mode allows no editing keys
(readline etc), and also doesn't have underscore for "last result", as
that's a forbidden name. :( Makes tinkering fiddly.

ChrisA

From p.f.moore at gmail.com  Tue Apr 12 04:31:21 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 12 Apr 2016 09:31:21 +0100
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp>
References: <570C1E13.4090909@stoneleaf.us>
 <E90564BC-FD56-46D7-9514-C347283715B2@stufft.io>
 <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CACac1F9U7J_t1hNpGuKdMhrPB9-WteOd_Wr_+s5sJC0-fMz_RA@mail.gmail.com>

On 12 April 2016 at 06:28, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Donald Stufft writes:
>
>  > I think yes and yes [__fspath__ and fspath should be allowed to
>  > handle bytes, otherwise] it seems like making it needlessly harder
>  > to deal with a bytes path
>
> It's not needless.  This kind of polymorphism makes it hard to review
> code locally.  Once bytes get a foothold inside a text application,
> they metastasize altogether too easily, and you end up with TypeErrors
> or UnicodeErrors quite far from the origin.  Debugging often requires
> tracing data flows over hill and over dale while choking from the
> dusty trail, or band-aids like a top-level "except UnicodeError:
> log_and_quarantine(bytes)".  I can't prove that returning bytes from
> these APIs is a big risk in this sense, but I can't see a way to prove
> that it's not, either, given that their point is duck-typing, and
> therefore they may be generalized in the future, and by third parties.
>
> I understand that there are applications where it's bytes all the way
> down, but by the very nature of computing systems, there are systems
> where bytes are decoded to text.  For historical reasons (the encoding
> Tower of Babel), it's very error-prone to do that on demand.  Best
> practice is to do the conversion as close to the boundary as possible,
> and process only text internally.
>
> In text applications, "bytes as carcinogen" is an apt metaphor.
>
> Now, I'm not Dutch, so I can't tell you it's obvious that the risk to
> text-processing applications is more important than the inconvenience
> to byte-shoveling applications.  But there is a need to be
> parsimonious with polymorphism.

As someone who has done a lot of work helping projects to port from
the 2.x bytes/text model to the 3.x model, I have similar concerns
that rooting out the source of bytes objects appearing in a program
could be an issue with the proposed "return either" approach. The most
effective tool I have found in fixing programs with text/bytes issues
is carefully and thoroughly annotating precisely which functions
accept and return bytes, and which accept and return text. The sort of
mixed-mode processing we're talking about here makes that
substantially harder. And note that the signature of os.fspath can
return bytes or text *independent* of the type of the argument - it's
not a "bytes in, bytes out" function like the usual pattern of
"polymorphic support for bytes".

But just like Stephen, I have no feel for how significant the risk
will be in real life. I've never worked on code that actually has a
need for bytestring paths (particularly now that surrogateescape
ensures that most cases "just work").
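
To make the point about os.fspath concrete, here is a small sketch. It
assumes the os.fspath() / __fspath__ protocol in the form that later
shipped in Python 3.6; the BytesBacked class is hypothetical and exists
only for illustration. os.path.basename follows the usual "bytes in,
bytes out" pattern, while the result type of os.fspath is decided by the
object, not by anything visible at the call site:

```python
import os


class BytesBacked:
    """Hypothetical path-like object that stores its path as bytes."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        # The protocol lets the object choose str or bytes.
        return self._raw


# Usual polymorphism: the result type mirrors the argument type.
text_base = os.path.basename("/tmp/data.txt")
bytes_base = os.path.basename(b"/tmp/data.txt")

# os.fspath: the caller passes an object and may get bytes back.
from_object = os.fspath(BytesBacked(b"/tmp/data.txt"))

print(type(text_base).__name__, type(bytes_base).__name__,
      type(from_object).__name__)
```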

Paul

From ncoghlan at gmail.com  Tue Apr 12 04:56:44 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 12 Apr 2016 18:56:44 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp>
References: <570C1E13.4090909@stoneleaf.us>
 <E90564BC-FD56-46D7-9514-C347283715B2@stufft.io>
 <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7coi+=PcF-5SYksUkdMRvuz+Bx8APKZGaFzHbXgB59t7w@mail.gmail.com>

On 12 April 2016 at 15:28, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Donald Stufft writes:
>
>  > I think yes and yes [__fspath__ and fspath should be allowed to
>  > handle bytes, otherwise] it seems like making it needlessly harder
>  > to deal with a bytes path
>
> It's not needless.  This kind of polymorphism makes it hard to review
> code locally.  Once bytes get a foothold inside a text application,
> they metastasize altogether too easily, and you end up with TypeErrors
> or UnicodeErrors quite far from the origin.  Debugging often requires
> tracing data flows over hill and over dale while choking from the
> dusty trail, or band-aids like a top-level "except UnicodeError:
> log_and_quarantine(bytes)".  I can't prove that returning bytes from
> these APIs is a big risk in this sense, but I can't see a way to prove
> that it's not, either, given that their point is duck-typing, and
> therefore they may be generalized in the future, and by third parties.
>
> I understand that there are applications where it's bytes all the way
> down, but by the very nature of computing systems, there are systems
> where bytes are decoded to text.  For historical reasons (the encoding
> Tower of Babel), it's very error-prone to do that on demand.  Best
> practice is to do the conversion as close to the boundary as possible,
> and process only text internally.

One possible way to address this concern would be to have the
underlying protocol be bytes/str (since boundary code frequently needs
to handle the paths-are-bytes assumption in POSIX), but offer an
"os.fspathname" API that rejected bytes output from os.fspath. That
is, it would be equivalent to:

    import os

    def fspathname(path):
        # Delegate to the lower-level protocol, then insist on str.
        name = os.fspath(path)
        if not isinstance(name, str):
            raise TypeError(
                "Expected str for pathname, not {}".format(type(name)))
        return name

That way folks that wanted the clean "must be str" signature could use
os.fspathname, while those that wanted to accept either could use the
lower level os.fspath.
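
As a rough usage sketch of that split (assuming os.fspath behaves as it
does in Python 3.6 and later; RawPath is a made-up class, and fspathname
here is a module-level helper rather than a real os function):

```python
import os
import pathlib


def fspathname(path):
    """Strict variant: accept only path-likes that resolve to str."""
    name = os.fspath(path)
    if not isinstance(name, str):
        raise TypeError(
            "Expected str for pathname, not {}".format(type(name)))
    return name


class RawPath:
    """Hypothetical path-like object that resolves to bytes."""

    def __fspath__(self):
        return b"/srv/raw"


# str and pathlib objects pass straight through the strict helper.
accepted = fspathname("/srv/text")
from_pathlib = fspathname(pathlib.PurePosixPath("/srv/text"))

# The lower-level call happily returns bytes...
lower_level = os.fspath(RawPath())

# ...while the strict helper fails at the boundary.
try:
    fspathname(RawPath())
except TypeError as exc:
    rejected = str(exc)
else:
    rejected = None
```

The TypeError is raised where the bytes value first enters, rather than
somewhere downstream in text-processing code.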

The ambiguity in question here is inherent in the differences between
the way POSIX and Windows work, so there are limits to how far we can
go in hiding it without making things worse rather than better.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From rosuav at gmail.com  Tue Apr 12 04:57:37 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 12 Apr 2016 18:57:37 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
Message-ID: <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>

On Tue, Apr 12, 2016 at 6:17 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> Well, I'll ask the obvious question, then. In allowing "import" did
> you allow "import ctypes"? If so, then I win :-) Or did you explicitly
> whitelist certain modules? And if so, which ones are they, and did I
> succeed if I manage to import a module you hadn't whitelisted?

The module whitelist is given at the top of the source code:

_SAFE_MODULES = frozenset((
    "base64", "binascii", "bisect", "calendar", "cmath", "crypt", "datetime",
    "decimal", "enum", "errno", "fractions", "functools", "hashlib", "hmac",
    "ipaddress", "itertools", "math", "numbers", "queue", "re", "statistics",
    "textwrap", "unicodedata", "urllib.parse",
))

And yes, you win if you get another module. Interestingly, you're
allowed to import urllib.parse, but not urllib itself; but "import
urllib.parse" makes urllib available - and, since modules inside
modules are blacklisted, "urllib.parse" doesn't exist
(AttributeError).
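
The name-binding half of this is ordinary Python behaviour and can be
seen without the sandbox. A minimal sketch in plain CPython (nothing here
uses unsafe.py):

```python
# Run the import statement in a fresh namespace and see what it binds.
namespace = {}
exec("import urllib.parse", namespace)

# Ignore the "__builtins__" entry that exec() adds on its own.
bound = sorted(name for name in namespace if not name.startswith("__"))
top_level = namespace["urllib"]

print(bound)                      # names created by the statement
print(top_level.__name__)         # which module the name refers to
print(hasattr(top_level, "parse"))
```

The statement binds only the top-level package name; the submodule is
reached as an attribute of it, which is the attribute the sandbox then
refuses to hand over.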

You can access the decimal module, and call decimal.getcontext(). This
returns the same default context object that the "outer" Python uses;
consequently, this sandboxing technique MUST NOT be used in any
program that, now or ever in the future, uses the decimal module (or
at least its default context; but I'm not sure how you'd be absolutely
sure you never EVER use the default context).
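
A sketch of why that matters, in plain Python rather than inside the
sandbox (untrusted_snippet is a stand-in for code that would run on the
restricted side):

```python
import decimal


def untrusted_snippet():
    # Stand-in for sandboxed code: it only touches a whitelisted module.
    decimal.getcontext().prec = 3


ctx = decimal.getcontext()
saved_prec = ctx.prec
ctx.prec = 28  # pin the starting precision so the demo is repeatable

before = decimal.Decimal(1) / decimal.Decimal(7)
untrusted_snippet()
after = decimal.Decimal(1) / decimal.Decimal(7)

ctx.prec = saved_prec  # undo the damage for the rest of the program

print(before)
print(after)
```

The host's own arithmetic silently changes precision after the snippet
runs, because both sides share the current thread's context object.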

Even more curiously, you can "import fractions", but you don't get
fractions.Fraction - though you *do* get fractions.Decimal. And
importing enum gives you EnumMeta, but metaclasses seem to be broken,
and you can't get enum.Enum.

The sandbox code assumes that an attacker cannot create files in the
current directory.

rosuav at sikorsky:~/tmp/unsafe$ echo 'import sys; real_module = lambda mod: sys.modules[mod]' >hashlib.py
rosuav at sikorsky:~/tmp/unsafe$ ./unsafe.py -i
Python 3.6.0a0 (default:78b84ae0b745+, Apr  6 2016, 03:43:18)
[GCC 5.3.1 20160323] on linux
Type "help", "copyright", "credits" or "license" for more information.
(SafeInteractiveConsole)
>>> import hashlib
>>> hashlib.real_module("sys")
<module 'sys' (built-in)>
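
The same effect can be reproduced in-process. This is only a sketch: a
temporary directory at the front of sys.path stands in for the current
directory, and import_shadowed is a hypothetical helper, not part of
unsafe.py:

```python
import importlib
import os
import sys
import tempfile


def import_shadowed(name, source):
    """Import `name` from a throwaway directory placed first on sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, name + ".py"), "w") as handle:
            handle.write(source)
        saved = sys.modules.pop(name, None)  # forget any cached real module
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return importlib.import_module(name)
        finally:
            # Put the interpreter back the way we found it.
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
            if saved is not None:
                sys.modules[name] = saved


fake = import_shadowed(
    "hashlib",
    "import sys\nreal_module = lambda mod: sys.modules[mod]\n",
)
print(fake.real_module("sys") is sys)
print(hasattr(fake, "md5"))
```

The whitelist only checks the module's name, so whatever file the import
system finds first under that name is what gets loaded.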

Setting LC_ALL and then working with calendar.LocaleTextCalendar()
causes locale files to be read. I'm not sure if you can turn that into
an exploit, but the attack surface depends on the installed locales on
the system.

This is still a massive game of whack-a-mole.

ChrisA

From jon+python-dev at unequivocal.co.uk  Tue Apr 12 05:08:05 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Tue, 12 Apr 2016 10:08:05 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAPTjJmqms4g6r436M5ZYttQT89VV=gaBQisKmh9w1VP9oEmSdA@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAPTjJmqms4g6r436M5ZYttQT89VV=gaBQisKmh9w1VP9oEmSdA@mail.gmail.com>
Message-ID: <20160412090805.GF8206@unequivocal.co.uk>

On Tue, Apr 12, 2016 at 06:28:34PM +1000, Chris Angelico wrote:
> On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens
> <jon+python-dev at unequivocal.co.uk> wrote:
> > Anyway the code is at https://github.com/jribbens/unsafe
> > It requires Python 3.4 or later (it could probably be made to work on
> > Python 2.7 as well, but it would need some changes).
> 
> Rather annoying point: Your interactive mode allows no editing keys
> (readline etc), and also doesn't have underscore for "last result", as
> that's a forbidden name. :( Makes tinkering fiddly.

It's just a subclass of the stdlib class code.InteractiveConsole,
which seems not to offer those features unfortunately.

From jon+python-dev at unequivocal.co.uk  Tue Apr 12 06:06:23 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Tue, 12 Apr 2016 11:06:23 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
Message-ID: <20160412100623.GG8206@unequivocal.co.uk>

On Tue, Apr 12, 2016 at 06:57:37PM +1000, Chris Angelico wrote:
> And yes, you win if you get another module. Interestingly, you're
> allowed to import urllib.parse, but not urllib itself; but "import
> urllib.parse" makes urllib available - and, since modules inside
> modules are blacklisted, "urllib.parse" doesn't exist
> (AttributeError).

Yes, this is issue #3 on github. I'd need to spend a few minutes
thinking about how to make importing of submodules work out properly.

> You can access the decimal module, and call decimal.getcontext(). This
> returns the same default context object that the "outer" Python uses;

OK, decimal goes ;-)

> Even more curiously, you can "import fractions", but you don't get
> fractions.Fraction - though you *do* get fractions.Decimal.

That seems to be because Fraction inherits from numbers.Number,
which has a metaclass, so type(Fraction) is abc.ABCMeta not 'type'.
That's obviously not a security hole and may well be fixable.
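
The metaclass difference is easy to check in plain Python. This sketch
only inspects the types; the assumption that the sandbox filters on
type(obj) being exactly 'type' comes from the description above, not
from a reading of unsafe.py:

```python
import abc
import decimal
import fractions

# Fraction derives from numbers.Rational, so it picks up ABCMeta.
fraction_meta = type(fractions.Fraction)

# Decimal is a plain class registered with the numbers ABCs instead.
decimal_meta = type(decimal.Decimal)

print(fraction_meta.__name__, decimal_meta.__name__)
```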

> The sandbox code assumes that an attacker cannot create files in the
> current directory.

If the attacker can create such files then the system is already
compromised even if you're not using any sandboxing system, because
you won't be able to trust any normal imports from your own code.

> Setting LC_ALL and then working with calendar.LocaleTextCalendar()
> causes locale files to be read.

I don't think that has any obvious relevance. Doing "import enum"
causes "enum.py" to be read too, and that isn't a security hole.

> This is still a massive game of whack-a-mole.

No, it still isn't. If the names blacklist had to keep being extended
then you would be right, but that hasn't happened so far. Whitelists
by definition contain only a small, limited number of potential moles.

The only thing you found above that even remotely approaches an
exploit is the decimal.getcontext() thing, and even that I don't
think you could use to do any code execution.

From rosuav at gmail.com  Tue Apr 12 06:27:14 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 12 Apr 2016 20:27:14 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160412100623.GG8206@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
Message-ID: <CAPTjJmp3MyJ7xvbKSC4SswADSw8hz7dvZw+e6PL_yPhFRteKTQ@mail.gmail.com>

On Tue, Apr 12, 2016 at 8:06 PM, Jon Ribbens
<jon+python-dev at unequivocal.co.uk> wrote:
> On Tue, Apr 12, 2016 at 06:57:37PM +1000, Chris Angelico wrote:
>> The sandbox code assumes that an attacker cannot create files in the
>> current directory.
>
> If the attacker can create such files then the system is already
> compromised even if you're not using any sandboxing system, because
> you won't be able to trust any normal imports from your own code.

Just confirming that, yeah. Though you could protect against it
somewhat by pre-importing everything that can legally be imported;
that way, at least the attack surface ceases once untrusted code
starts executing. Consider it a privilege escalation attack; you can
move from "create file in current directory" to "remote code
execution" simply by creating hashlib.py and then importing it.
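
A minimal sketch of the pre-import idea, assuming a hypothetical
preimport helper and an illustrative subset of the whitelist (this is
not how unsafe.py is written):

```python
import importlib
import sys

# Illustrative subset of the whitelist quoted earlier in the thread.
ALLOWED = ("base64", "hashlib", "math", "urllib.parse")


def preimport(names):
    """Load every permitted module before any untrusted code runs."""
    return {name: importlib.import_module(name) for name in names}


loaded = preimport(ALLOWED)

# From here on, "import hashlib" is satisfied from the sys.modules cache
# and does not search the filesystem again.
print(all(name in sys.modules for name in ALLOWED))
```

This narrows the window but does nothing if the rogue file was already
in place when the host program started.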

>> Setting LC_ALL and then working with calendar.LocaleTextCalendar()
>> causes locale files to be read.
>
> I don't think that has any obvious relevance. Doing "import enum"
> causes "enum.py" to be read too, and that isn't a security hole.

I mean the system locale files, not just locale.py itself. If nothing
else, it's a means of discovering info about the system. I don't know
what you can get by figuring out what locales are installed, but it's
another concern to think about.

>> This is still a massive game of whack-a-mole.
>
> No, it still isn't. If the names blacklist had to keep being extended
> then you would be right, but that hasn't happened so far. Whitelists
> by definition contain only a small, limited number of potential moles.
>
> The only thing you found above that even remotely approaches an
> exploit is the decimal.getcontext() thing, and even that I don't
> think you could use to do any code execution.

decimal.getcontext is a simple and obvious example of a way that
global mutable objects can be accessed across the boundary. There is
no way to mathematically prove that there are no more, so it's still a
matter of blacklisting.

I still think you need to work out a "minimum viable set" and set down
some concrete rules: if any feature in this set has to be blacklisted
in order to achieve security, the experiment has failed.

ChrisA

From p.f.moore at gmail.com  Tue Apr 12 06:41:14 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 12 Apr 2016 11:41:14 +0100
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C1560.7070105@mail.de>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com>
 <570C1560.7070105@mail.de>
Message-ID: <CACac1F_rT4fJnr2NjxB=10CepW6TzL6LSTh4Yk0DTS8OetmEmw@mail.gmail.com>

On 11 April 2016 at 22:21, Sven R. Kunze <srkunze at mail.de> wrote:
> On 11.04.2016 23:08, Random832 wrote:
>>
>> On Mon, Apr 11, 2016, at 17:04, Sven R. Kunze wrote:
>>>
>>> PS: The only way out that I can imagine is to fix pathlib. I am not in
>>> favor of fixing functions of "os" and "os.path" to accept "path"
>>> objects;
>>
>> Why not?
>
>
> It occurred to me after pondering over Paul's comments.
>
> "os" and "os.path" is just a completely different level of abstraction.
> There is just no need to mess with them.
>
> The initial failure of my colleague and me of using pathlib can be solely
> attributed to pathlib's lack of functionality. Not to the incompatibility of
> "os" nor "os.path" with "Path" objects.

As your thoughts appear to have been triggered by my comments, I feel
I should clarify.

1. I like pathlib even as it is right now, and I'm strongly -1 on removing it.
2. The "external dependency" aspect of 3rd party solutions makes them
far less useful to me.
3. The work on improving integration with the stdlib (which is nearly
sorted now, as far as I can see) is a big improvement, and I'm all in
favour. But even without it, I wouldn't want pathlib to be removed.
4. There are further improvements that could be made to pathlib,
certainly, but again they are optional, and pathlib is fine without
them.
5. I wish more 3rd party code integrated better with pathlib. The
improved integration work might help with this. But ultimately, Python
2 compatibility is likely to be the biggest block (either perceived or
real - we can make pathlib support as simple as possible, but some 3rd
party authors will remain unwilling to add support for Python 3 only
features in the short term). This isn't a pathlib problem.
6. There will probably always be a place for low-level os/os.path
code. Adding support in those modules for pathlib doesn't affect that
fact, but does make it easier to use pathlib "seamlessly", so why not
do so?

tl;dr: I'm 100% in favour of pathlib, and of the direction the
current discussion (excluding "let's give up on pathlib" digressions)
is going.

Paul

From jon+python-dev at unequivocal.co.uk  Tue Apr 12 07:10:40 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Tue, 12 Apr 2016 12:10:40 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAPTjJmp3MyJ7xvbKSC4SswADSw8hz7dvZw+e6PL_yPhFRteKTQ@mail.gmail.com>
References: <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
 <CAPTjJmp3MyJ7xvbKSC4SswADSw8hz7dvZw+e6PL_yPhFRteKTQ@mail.gmail.com>
Message-ID: <20160412111040.GH8206@unequivocal.co.uk>

On Tue, Apr 12, 2016 at 08:27:14PM +1000, Chris Angelico wrote:
> On Tue, Apr 12, 2016 at 8:06 PM, Jon Ribbens
> <jon+python-dev at unequivocal.co.uk> wrote:
> > No, it still isn't. If the names blacklist had to keep being extended
> > then you would be right, but that hasn't happened so far. Whitelists
> > by definition contain only a small, limited number of potential moles.
> >
> > The only thing you found above that even remotely approaches an
> > exploit is the decimal.getcontext() thing, and even that I don't
> > think you could use to do any code execution.
> 
> decimal.getcontext is a simple and obvious example of a way that
> global mutable objects can be accessed across the boundary. There is
> no way to mathematically prove that there are no more, so it's still a
> matter of blacklisting.

No, it's a matter of reducing the whitelist. I must admit that
I don't understand in what way this is not already clear. Look:

  >>> len(unsafe._SAFE_MODULES)
  23

I could "mathematically prove" that there are no more security holes
in that list by reducing its length to zero. There are still plenty
of circumstances in which the experiment would be a useful tool even
with no modules allowed to be imported.

> I still think you need to work out a "minimum viable set" and set down
> some concrete rules: if any feature in this set has to be blacklisted
> in order to achieve security, the experiment has failed.

The "minimum viable set" in my view would be: no builtins at all,
only allowing eval() not exec(), and disallowing yield [from],
lambdas and generator expressions.

From jon+python-dev at unequivocal.co.uk  Tue Apr 12 07:14:45 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Tue, 12 Apr 2016 12:14:45 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <alpine.DEB.2.02.1604120616530.5446@ubuntu1204-102.cs.uwaterloo.ca>
References: <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
 <alpine.DEB.2.02.1604120616530.5446@ubuntu1204-102.cs.uwaterloo.ca>
Message-ID: <20160412111445.GI8206@unequivocal.co.uk>

On Tue, Apr 12, 2016 at 06:21:04AM -0400, Isaac Morland wrote:
> On Tue, 12 Apr 2016, Jon Ribbens wrote:
> >>This is still a massive game of whack-a-mole.
> >
> >No, it still isn't. If the names blacklist had to keep being extended
> >then you would be right, but that hasn't happened so far. Whitelists
> >by definition contain only a small, limited number of potential moles.
> >
> >The only thing you found above that even remotely approaches an
> >exploit is the decimal.getcontext() thing, and even that I don't
> >think you could use to do any code execution.
> 
> "I don't think"?
> 
> Where's the formal proof?

I disallowed the module completely, that's the proof.

> Without a proof, this is indeed just a game of whack-a-mole.

Almost no computer programs are ever "formally proved" to be secure.
None of those that run the global Internet are. I don't see why it
makes any sense to demand that my experiment be held to a massively
higher standard than the rest of the code everyone relies on every day.

From fijall at gmail.com  Tue Apr 12 07:38:09 2016
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 12 Apr 2016 13:38:09 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160412111445.GI8206@unequivocal.co.uk>
References: <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
 <alpine.DEB.2.02.1604120616530.5446@ubuntu1204-102.cs.uwaterloo.ca>
 <20160412111445.GI8206@unequivocal.co.uk>
Message-ID: <CAK5idxQ9uVeZH5yA_n5PP6BO5vqubMWUqbTNOLH169-Wo9HN4A@mail.gmail.com>

On Tue, Apr 12, 2016 at 1:14 PM, Jon Ribbens
<jon+python-dev at unequivocal.co.uk> wrote:
> On Tue, Apr 12, 2016 at 06:21:04AM -0400, Isaac Morland wrote:
>> On Tue, 12 Apr 2016, Jon Ribbens wrote:
>> >>This is still a massive game of whack-a-mole.
>> >
>> >No, it still isn't. If the names blacklist had to keep being extended
>> >then you would be right, but that hasn't happened so far. Whitelists
>> >by definition contain only a small, limited number of potential moles.
>> >
>> >The only thing you found above that even remotely approaches an
>> >exploit is the decimal.getcontext() thing, and even that I don't
>> >think you could use to do any code execution.
>>
>> "I don't think"?
>>
>> Where's the formal proof?
>
> I disallowed the module completely, that's the proof.
>
>> Without a proof, this is indeed just a game of whack-a-mole.
>
> Almost no computer programs are ever "formally proved" to be secure.
> None of those that run the global Internet are. I don't see why it
> makes any sense to demand that my experiment be held to a massively
> higher standard than the rest of the code everyone relies on every day.

Jon, let me reiterate. You asked people to break it (that's the title
of the thread) and they did so almost immediately. Then you patched
the thing and asked them to break it again and they did. Now the
faulty assumption here is that this procedure, repeated enough times
will produce a secure environment - this is not how security works,
you need to be secure against people who will spend more than 5
minutes and who are not on this list or reading this incredibly long
email chain. You can't do that just by asking on the mailing list and
whacking all the examples. As others pointed out, this particular
approach (with maybe different details) has been tried again and again
and again and the result has been the same - you end up with either a
completely unusable python (the python that can't run anything is
trivially secure) or you end up with something that's insecure. I
suggest you look instead at something like PyPy sandbox - which
systematically replaces all external calls with a call to a proxy.
Because PyPy is written in RPython, you can do that - the amount of
code that needs reviewing is relatively small, a couple pages of code.
The code you need to review in order to be even remotely secure is
much larger - it's the amount of C code you can call from your python
with or without knowing that it can happen.

Cheers,
fijal

From rosuav at gmail.com  Tue Apr 12 08:05:22 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 12 Apr 2016 22:05:22 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160412111040.GH8206@unequivocal.co.uk>
References: <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
 <CAPTjJmp3MyJ7xvbKSC4SswADSw8hz7dvZw+e6PL_yPhFRteKTQ@mail.gmail.com>
 <20160412111040.GH8206@unequivocal.co.uk>
Message-ID: <CAPTjJmpAF-qPdvO4fUDe_SgFMkcYRH_kbZdCqa+GCzhOUB3-Uw@mail.gmail.com>

On Tue, Apr 12, 2016 at 9:10 PM, Jon Ribbens
<jon+python-dev at unequivocal.co.uk> wrote:
> On Tue, Apr 12, 2016 at 08:27:14PM +1000, Chris Angelico wrote:
>> decimal.getcontext is a simple and obvious example of a way that
>> global mutable objects can be accessed across the boundary. There is
>> no way to mathematically prove that there are no more, so it's still a
>> matter of blacklisting.
>
> No, it's a matter of reducing the whitelist. I must admit that
> I don't understand in what way this is not already clear. Look:
>
>   >>> len(unsafe._SAFE_MODULES)
>   23
>
> I could "mathematically prove" that there are no more security holes
> in that list by reducing its length to zero. There are still plenty
> of circumstances in which the experiment would be a useful tool even
> with no modules allowed to be imported.

Yes, you just removed decimal because of getcontext. What about the
next module with that kind of issue? Or what about the next
non-underscore attribute on a core type that can cause you grief (like
how async functions leak stack frames)?
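
For the async case, a sketch in plain Python (handler is a made-up
coroutine function; nothing here is specific to unsafe.py). None of the
attribute names involved starts with an underscore, so a filter on "_"
names would not notice them:

```python
async def handler():
    return None


coro = handler()

# cr_frame is a public attribute of coroutine objects.
frame = coro.cr_frame
leaked_globals = frame.f_globals    # globals of the defining module
leaked_builtins = frame.f_builtins  # the builtins that frame would use

coro.close()  # avoid a "never awaited" warning

print("handler" in leaked_globals)
print("getattr" in leaked_builtins)
```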

>> I still think you need to work out a "minimum viable set" and set down
>> some concrete rules: if any feature in this set has to be blacklisted
>> in order to achieve security, the experiment has failed.
>
> The "minimum viable set" in my view would be: no builtins at all,
> only allowing eval() not exec(), and disallowing yield [from],
> lambdas and generator expressions.

Then start with that. Don't give ANYTHING else. Otherwise you're still
playing with the blacklist.

But at that point, you pretty much have something that can't be
recognized as Python. You may as well start from a completely
different basis and design your own expression evaluator, maybe making
use of parse-to-AST, but not actually eval'ing the source code. That's
how fundamental this issue is - to dodge the security problems, you
get to the point where you've dodged all of what makes Python Python.
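
A rough sketch of that alternative, using ast.parse for parsing but never
calling eval() on the source. The names calc, MAX_ABS, MAX_EXPONENT and
MAX_SOURCE are invented for this example, and the limits are arbitrary:

```python
import ast
import operator

_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARYOPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

MAX_SOURCE = 200     # characters of input accepted
MAX_ABS = 10 ** 12   # largest magnitude any value may reach
MAX_EXPONENT = 64    # largest exponent allowed for **


def _check(value):
    if abs(value) > MAX_ABS:
        raise ValueError("value too large")
    return value


def _evaluate(node):
    # Only numeric literals and a fixed set of operators are understood;
    # every other node type (names, calls, attributes, ...) is rejected.
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return _check(node.value)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARYOPS:
        return _UNARYOPS[type(node.op)](_evaluate(node.operand))
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        left = _evaluate(node.left)
        right = _evaluate(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _check(_BINOPS[type(node.op)](left, right))
    raise ValueError("unsupported expression: " + type(node).__name__)


def calc(source):
    if len(source) > MAX_SOURCE:
        raise ValueError("expression too long")
    tree = ast.parse(source, mode="eval")
    return _evaluate(tree.body)


print(calc("1 + 2"))
print(calc("2 * (3 + 4)"))
```

Because the evaluator owns every operation, an expression such as
2**(2**100) is refused before any large number is computed, and anything
that is not arithmetic never reaches the interpreter at all.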

ChrisA

From victor.stinner at gmail.com  Tue Apr 12 08:05:06 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 12 Apr 2016 14:05:06 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160412111040.GH8206@unequivocal.co.uk>
References: <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
 <CAPTjJmp3MyJ7xvbKSC4SswADSw8hz7dvZw+e6PL_yPhFRteKTQ@mail.gmail.com>
 <20160412111040.GH8206@unequivocal.co.uk>
Message-ID: <CAMpsgwZifbSPT5Lov-KXbhLvufar4vHj7P8p8KY3qmTf25Mikg@mail.gmail.com>

2016-04-12 13:10 GMT+02:00 Jon Ribbens <jon+python-dev at unequivocal.co.uk>:
> No, it's a matter of reducing the whitelist. I must admit that
> I don't understand in what way this is not already clear. Look:
>
>   >>> len(unsafe._SAFE_MODULES)
>   23

You don't understand that even if the visible "Python scope", "Python
namespace", or whatever you want to call it (the code that is
accessible from your sandbox) looks very tiny, the real effective code
is HUGE. For example, you give full access to the str type, which is
made of 20K lines of C code:

haypo at smithers$ wc -l Objects/unicodeobject.c Objects/unicodectype.c Objects/stringlib/*h
 15670 Objects/unicodeobject.c
   297 Objects/unicodectype.c
    29 Objects/stringlib/asciilib.h
   827 Objects/stringlib/codecs.h
    27 Objects/stringlib/count.h
   109 Objects/stringlib/ctype.h
    25 Objects/stringlib/eq.h
   250 Objects/stringlib/fastsearch.h
   201 Objects/stringlib/find.h
   133 Objects/stringlib/find_max_char.h
   140 Objects/stringlib/join.h
   180 Objects/stringlib/localeutil.h
   116 Objects/stringlib/partition.h
    53 Objects/stringlib/replace.h
   390 Objects/stringlib/split.h
    28 Objects/stringlib/stringdefs.h
   266 Objects/stringlib/transmogrify.h
    30 Objects/stringlib/ucs1lib.h
    29 Objects/stringlib/ucs2lib.h
    29 Objects/stringlib/ucs4lib.h
    11 Objects/stringlib/undef.h
    32 Objects/stringlib/unicodedefs.h
  1284 Objects/stringlib/unicode_format.h
 20156 total

Did you review carefully *all* these lines? If a single C line gives
access to the real Python namespace, the game is over.

In a few minutes, I found "{0.__class__}".format(obj) which is not a
full escape of the sandbox, but it's just to give one example. With
more time, I'm sure that a line can be found in the str type to escape
your sandbox.
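
A sketch of why a static check has trouble with this one. It is plain
Python and makes no claim about how unsafe.py treats str.format; the
Account class is invented for the example. The attribute lookup lives
inside a string literal, so it never appears as an Attribute node in the
tree:

```python
import ast


class Account:
    """Hypothetical object handed to the untrusted code."""


# The format mini-language performs the attribute lookups at run time.
rendered = "{0.__class__.__name__}".format(Account())

# What a static pass over the same source actually sees.
tree = ast.parse('"{0.__class__}".format(obj)')
attribute_names = [node.attr for node in ast.walk(tree)
                   if isinstance(node, ast.Attribute)]
string_constants = [node.value for node in ast.walk(tree)
                    if isinstance(node, ast.Constant)]

print(rendered)
print(attribute_names)
print(string_constants)
```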


> I could "mathematically prove" that there are no more security holes
> in that list by reducing its length to zero.

You only see a very tiny portion of the real attack surface.

> The "minimum viable set" in my view would be: no builtins at all,
> only allowing eval() not exec(), and disallowing yield [from],
> lambdas and generator expressions.

IMHO it's a waste of time to try to reduce the great Python with
batteries included to a simple calculator to compute 1+2. You will
never be able to fix all holes, there are too many holes in your sandbox.

It's very easy to implement your own calculator in pure Python, from
the parser to the code to compute the operators. If you write the
whole code yourself, it's much easier to control what is allowed and
to put limits in place. For example, with your own code, you can put
limits on the maximum number, whereas your sandbox will kill your CPU
and memory if you try 2**(2**100) (no builtin function required for
this "exploit").

Victor

From victor.stinner at gmail.com  Tue Apr 12 08:16:57 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 12 Apr 2016 14:16:57 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
Message-ID: <CAMpsgwbd_knm2V_fHz5WOHwPmCdy47Mm8b0EgQLSYD5GFCO_vg@mail.gmail.com>

2016-04-08 16:18 GMT+02:00 Jon Ribbens <jon+python-dev at unequivocal.co.uk>:
> I've made another attempt at Python sandboxing, which does something
> which I've not seen tried before - using the 'ast' module to do static
> analysis of the untrusted code before it's executed, to prevent most
> of the sneaky tricks that have been used to break out of past attempts
> at sandboxes.

Right, it blocks the most trivial attacks against sandboxes. But you
only fixed a few holes, there is still a wide area of holes to escape
your sandbox.

I read your code and the code of CPython. I found many issues.

Your sandbox runs untrusted code in a new namespace. The game is to
get access to the outer namespace, the real Python namespace. For
example, get the namespace of the unsafe module.

Your bet is that blocking access to "_" variables, using a whitelist
of modules and a few other protections is enough to block access to
the real namespace. The problem is that Python provides a very wide
range of tools for introspection.

I expected to find a hole using the C code, but in fact, it was much
simpler than that.

Your "safe import" hides real functions with a proxy. Ok. But the code
of modules is still run in the real namespace, where I expected that
modules run in the untrusted (restricted) namespace. The game is now
to find a way to retrieve content from the real namespace using any
function exposed in modules.

I found functools.update_wrapper(). I was very surprised because this
function calls getattr() and setattr(), whereas your sandbox replaces
these builtin functions. In fact, the "safe" getattr and setattr are
only installed in the untrusted namespace, and as I wrote, the modules
run in the real Python namespace.
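
A small stand-alone sketch of that last point (this is not the code of
unsafe.py; blocked_getattr, sandbox_globals, target and Holder are
made-up names). Replacing getattr in the namespace handed to exec()
only affects code compiled for that namespace. A function defined in
another module still resolves getattr through its own module globals:

```python
import functools


def blocked_getattr(obj, name, *default):
    raise RuntimeError("getattr is blocked in the sandbox")


def target():
    "docstring of target"


class Holder:
    pass


holder = Holder()

# Hypothetical restricted namespace: its only builtin is a blocking getattr.
sandbox_globals = {
    "__builtins__": {"getattr": blocked_getattr},
    "functools": functools,
    "target": target,
    "holder": holder,
}

# A direct getattr() call from sandboxed code hits the replacement.
try:
    exec("getattr(target, '__doc__')", sandbox_globals)
    direct_call_blocked = False
except RuntimeError:
    direct_call_blocked = True

# update_wrapper() was compiled in the functools module, so it uses the
# real getattr/setattr and copies the attributes anyway.
exec("functools.update_wrapper(holder, target, updated=())", sandbox_globals)

print(direct_call_blocked, holder.__doc__)
```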


> I would be very interested to see if anyone can manage to break it.

So here you have:
---
import functools

# any proxy function from unsafe.py
import base64
src = base64.main

# hack to get any attribute of an object
def getattr(obj, attr):
    secret = None

    class A:
        def __setattr__(self, key, value):
            nonlocal secret
            if key == attr:
                secret = value

    dst = A()
    functools.update_wrapper(dst, src, assigned=(attr,), updated=())
    return secret

builtins = getattr(base64.main, "__globals__")["__builtins__"]

fn = "/tmp/owned"
with builtins.open(fn, "w") as f:
    f.write("game over!\n")
---

The exploit is based on two things:

* update_wrapper() is used to get the secret attribute using the real
getattr() function
* update_wrapper() + A.__setattr__ are used to pass the secret from
the real namespace to the untrusted namespace
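
The same two steps can be tried outside any sandbox. The sketch below
is a stand-alone variant of the helper above (leak_attribute, Catcher
and sample are illustrative names); it relies only on the documented
behaviour of functools.update_wrapper(), which reads each name in
"assigned" from the wrapped object and sets it on the wrapper:

```python
import functools


def leak_attribute(obj, attr):
    """Read obj.<attr> without writing an attribute access or getattr()."""
    captured = []

    class Catcher:
        # update_wrapper() calls setattr(wrapper, attr, value); intercept it.
        def __setattr__(self, key, value):
            if key == attr:
                captured.append(value)

    functools.update_wrapper(Catcher(), obj, assigned=(attr,), updated=())
    return captured[0] if captured else None


def sample():
    pass


leaked = leak_attribute(sample, "__globals__")
print(type(leaked).__name__)
```

Note that the dunder name only ever appears as a string, so a check on
ast.Name and ast.Attribute nodes has nothing to reject.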


> Bugs which are trivially fixable are of course welcomed, but the real
> question is: is this approach basically sound, or is it fundamentally
> unworkable?

You can block the functools.update_wrapper(), or even the whole
functools module. But it will not fix the root cause: modules must run
in the untrusted namespace.

In pysandbox, I have code to ensure that all modules run in the
untrusted namespace: see CleanupBuiltins in sandbox/builtins.py. But
it was not enough, many vulnerabilities were found even with all my
protections.

I'm sure that many others will find other ways to escape your sandbox
with enough time. It's a matter of time, not a matter of whitelists.

As I wrote in my long email explaining why pysandbox is broken by design,
writing a sandbox inside CPython doesn't work. In fact, what you
want to restrict is the access to limited resources like CPU and
memory, and block access to the filesystem. This is the job of the
operating system, and external sandboxes help to block access to the
filesystem.
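
As a hedged illustration of letting the operating system enforce a CPU
limit: the sketch below is Unix-only (it uses the resource module and
preexec_fn), the one-second limit is arbitrary, and the helper names
are made up. It does not restrict memory or filesystem access; those
would need further limits or an external sandbox.

```python
import resource
import subprocess
import sys


def limit_cpu_seconds():
    # Runs in the child just before exec: lower the soft CPU limit to
    # one second and leave the hard limit unchanged.
    _, hard = resource.getrlimit(resource.RLIMIT_CPU)
    resource.setrlimit(resource.RLIMIT_CPU, (1, hard))


def run_with_cpu_limit(code):
    # "-I" starts the child interpreter in isolated mode.
    return subprocess.run(
        [sys.executable, "-I", "-c", code],
        preexec_fn=limit_cpu_seconds,
        capture_output=True,
        text=True,
        timeout=30,
    )


# A busy loop is stopped by the kernel once the CPU limit is reached,
# so the child exits with a non-zero status.
completed = run_with_cpu_limit("while True: pass")
print(completed.returncode)
```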

Victor

From jon+python-dev at unequivocal.co.uk  Tue Apr 12 08:18:33 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Tue, 12 Apr 2016 13:18:33 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAK5idxQ9uVeZH5yA_n5PP6BO5vqubMWUqbTNOLH169-Wo9HN4A@mail.gmail.com>
References: <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
 <alpine.DEB.2.02.1604120616530.5446@ubuntu1204-102.cs.uwaterloo.ca>
 <20160412111445.GI8206@unequivocal.co.uk>
 <CAK5idxQ9uVeZH5yA_n5PP6BO5vqubMWUqbTNOLH169-Wo9HN4A@mail.gmail.com>
Message-ID: <20160412121833.GK8206@unequivocal.co.uk>

On Tue, Apr 12, 2016 at 01:38:09PM +0200, Maciej Fijalkowski wrote:
> Jon, let me reiterate. You asked people to break it (that's the title
> of the thread) and they did so almost immediately. Then you patched
> the thing and asked them to break it again and they did. Now the
> faulty assumption here is that this procedure, repeated enough times
> will produce a secure environment - this is not how security works,

That is not an accurate summary of what has happened so far,
nor am I making that assumption. You are misunderstanding the
purpose of the experiment - I am not sure how, as I have tried
to be quite clear.

The question is: with a minimal (or empty) set of builtins, and a
restriction on ast.Name and ast.Attribute nodes, can exec/eval be
made 'safe' so they cannot execute code outside the sandbox. The
answer appears to be "yes", if the restriction is "^f?_". (If you
additionally inject external objects to the namespace then they need
to be proxied and mro() prevented.)
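
For readers who have not looked at the code, the kind of check being
discussed can be sketched roughly as follows. This is a simplified
illustration, not the actual implementation; find_forbidden_names is a
made-up name. The pattern "^f?_" matches any identifier starting with
"_" or "f_":

```python
import ast
import re

FORBIDDEN_NAME = re.compile(r"^f?_")


def find_forbidden_names(source):
    """Return the Name/Attribute identifiers in source that match ^f?_."""
    forbidden = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and FORBIDDEN_NAME.match(node.id):
            forbidden.append(node.id)
        elif isinstance(node, ast.Attribute) and FORBIDDEN_NAME.match(node.attr):
            forbidden.append(node.attr)
    return forbidden


print(find_forbidden_names("x.__class__"))
print(find_forbidden_names("frame.f_globals"))
print(find_forbidden_names("a.b + c"))
```

Code would be refused before execution whenever the returned list is
non-empty.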

> You can't do that just by asking on the mailing list and whacking
> all the examples.

If anyone had managed to find any more examples of holes in the
original featureset after the first couple then I would agree with
you, but they haven't. 

> As others pointed out, this particular approach (with maybe
> different details) has been tried again and again and again

This simply isn't true either. As far as I can see, only
RestrictedPython has tried anything remotely similar, and
to the best of my ability to determine, that project is not
considered a failure.

From victor.stinner at gmail.com  Tue Apr 12 08:20:37 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 12 Apr 2016 14:20:37 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAK5idxQ9uVeZH5yA_n5PP6BO5vqubMWUqbTNOLH169-Wo9HN4A@mail.gmail.com>
References: <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
 <alpine.DEB.2.02.1604120616530.5446@ubuntu1204-102.cs.uwaterloo.ca>
 <20160412111445.GI8206@unequivocal.co.uk>
 <CAK5idxQ9uVeZH5yA_n5PP6BO5vqubMWUqbTNOLH169-Wo9HN4A@mail.gmail.com>
Message-ID: <CAMpsgwZVuYqPpUA-aaTgNti-OE8=qhkphRGw-ccydRY6UR4E1g@mail.gmail.com>

2016-04-12 13:38 GMT+02:00 Maciej Fijalkowski <fijall at gmail.com>:
> (...) you end up with either a
> completely unusable python (the python that can't run anything is
> trivially secure)

Yeah, that's the obvious question: what's the purpose of such a very
limited Python subset, for example something limited to int with a few
operators (+ - * /)?

That's also why I gave up on pysandbox. It became impossible to
execute anything more complex than a hello world.

By the way, I noticed that enum.Enum and enum.EnumMeta don't work in
your sandbox.

Victor

From victor.stinner at gmail.com  Tue Apr 12 08:24:31 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 12 Apr 2016 14:24:31 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160412121833.GK8206@unequivocal.co.uk>
References: <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
 <alpine.DEB.2.02.1604120616530.5446@ubuntu1204-102.cs.uwaterloo.ca>
 <20160412111445.GI8206@unequivocal.co.uk>
 <CAK5idxQ9uVeZH5yA_n5PP6BO5vqubMWUqbTNOLH169-Wo9HN4A@mail.gmail.com>
 <20160412121833.GK8206@unequivocal.co.uk>
Message-ID: <CAMpsgwae1kd2kQ8+qB26r67JrJLcBBZ7g5Jy_E5R6BgDVPo5YA@mail.gmail.com>

2016-04-12 14:18 GMT+02:00 Jon Ribbens <jon+python-dev at unequivocal.co.uk>:
> The question is: with a minimal (or empty) set of builtins, and a
> restriction on ast.Name and ast.Attribute nodes, can exec/eval be
> made 'safe' so they cannot execute code outside the sandbox.

According to multiple exploits listed in this thread, no, it's not possible.


> If anyone had managed to find any more examples of holes in the
> original featureset after the first couple then I would agree with
> you, but they haven't.

See my latest exploit using functools.update_wrapper() + A.__setattr__() ;-)


>> As others pointed out, this particular approach (with maybe
>> different details) has been tried again and again and again
>
> This simply isn't true either. As far as I can see, only
> RestrictedPython has tried anything remotely similar, and
> to the best of my ability to determine, that project is not
> considered a failure.

IMHO nobody seriously audited RestrictedPython. It doesn't mean that
it's secure.

When it was created, security was less important than nowadays.

Victor

From victor.stinner at gmail.com  Tue Apr 12 08:31:19 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 12 Apr 2016 14:31:19 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAMpsgwbd_knm2V_fHz5WOHwPmCdy47Mm8b0EgQLSYD5GFCO_vg@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwbd_knm2V_fHz5WOHwPmCdy47Mm8b0EgQLSYD5GFCO_vg@mail.gmail.com>
Message-ID: <CAMpsgwbja0chB+OPDV0Fr0P11qb6YAiMumu=sVa8AGVr3yGuwg@mail.gmail.com>

2016-04-12 14:16 GMT+02:00 Victor Stinner <victor.stinner at gmail.com>:
> I read your code and the code of CPython. I found many issues.
> (...)
> The exploit is based on two things:
>
> * update_wrapper() is used to get the secret attribute using the real
> getattr() function
> * update_wrapper() + A.__setattr__ are used to pass the secret from
> the real namespace to the untrusted namespace

Oh, I forgot to mention another vulnerability: you block access to
attributes by replacing getattr and by analyzing the AST. Ok, but one
more time, it's not enough. If you get access to obj.__dict__, you
will likely get access to any attribute using obj_dict[attr] instead
of obj.attr.
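
A tiny sketch of the difference (Account, attribute_names and the
token value are made up, and vars() merely stands in for whatever
means an attacker used to obtain the dictionary). Once the __dict__ is
in hand, the lookup is a subscript with a string key, so a filter on
ast.Attribute nodes has nothing to object to:

```python
import ast


class Account:
    def __init__(self):
        self._token = "s3cret"


def attribute_names(source):
    """List the attribute names a static AST check would see."""
    return [
        node.attr
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Attribute)
    ]


account = Account()
obj_dict = vars(account)  # stand-in for a leaked obj.__dict__

names_direct = attribute_names("account._token")
names_via_dict = attribute_names("obj_dict['_token']")
leaked = obj_dict["_token"]

print(names_direct, names_via_dict, leaked)
```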

I wrote pysandbox because I liked Tav's idea of *removing* sensitive
dictionary keys of sensitive types like functions, frames and code
objects. Again, it was not enough.

Victor

From jon+python-dev at unequivocal.co.uk  Tue Apr 12 08:31:45 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Tue, 12 Apr 2016 13:31:45 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAMpsgwZifbSPT5Lov-KXbhLvufar4vHj7P8p8KY3qmTf25Mikg@mail.gmail.com>
References: <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
 <CAPTjJmp3MyJ7xvbKSC4SswADSw8hz7dvZw+e6PL_yPhFRteKTQ@mail.gmail.com>
 <20160412111040.GH8206@unequivocal.co.uk>
 <CAMpsgwZifbSPT5Lov-KXbhLvufar4vHj7P8p8KY3qmTf25Mikg@mail.gmail.com>
Message-ID: <20160412123145.GL8206@unequivocal.co.uk>

On Tue, Apr 12, 2016 at 02:05:06PM +0200, Victor Stinner wrote:
> 2016-04-12 13:10 GMT+02:00 Jon Ribbens <jon+python-dev at unequivocal.co.uk>:
> > No, it's a matter of reducing the whitelist. I must admit that
> > I don't understand in what way this is not already clear. Look:
> >
> >   >>> len(unsafe._SAFE_MODULES)
> >   23
> 
> You don't understand that even if the visible "Python scope", "Python
> namespace", or call it as you want (the code that is accessible from
> your sandbox) looks very tiny, the real effective code is HUGE.

You are mistaken, I do understand that.

> In a few minutes, I found "{0.__class__}".format(obj) which is not a
> full escape of the sandbox, but it's just to give one example.

It's something I'd already thought of, and it's not an escape at all.

> > I could "mathematically prove" that there are no more security holes
> > in that list by reducing its length to zero.
> 
> You only see a very tiny portion of the real attack surface.

You've misunderstood my comment - I was saying that the security holes
from imported modules can be easily eliminated. That doesn't say
anything about security holes not from imported modules, of course.

> > The "minimum viable set" in my view would be: no builtins at all,
> > only allowing eval() not exec(), and disallowing yield [from],
> > lambdas and generator expressions.
> 
> IMHO it's a waste of time to try to reduce the great Python with
> battery included to a simple calculator to compute 1+2.

And in my opinion it isn't. There are plenty of use cases for such
a thing. Take a look at this for example:
https://developer.blender.org/D1862 

> It's very easy to implement your own calculator in pure Python, from
> the parser to the code to compute the operators. If you write yourself
> the whole code, it's much easier to control what is allowed and put
> limits. For example, with your own code, you can put limits on the
> maximum number, whereas your sandbox will kill your CPU and memory if
> you try 2**(2**100) (no builtin function required for this "exploit").

Yes, I'd already thought of that too, although if you allow functions
and methods to be called (which they are, in my minimal viable set
suggestion above) then I think perhaps you've not actually bought
yourself very much with all that work.

From jon+python-dev at unequivocal.co.uk  Tue Apr 12 08:42:31 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Tue, 12 Apr 2016 13:42:31 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAMpsgwbja0chB+OPDV0Fr0P11qb6YAiMumu=sVa8AGVr3yGuwg@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwbd_knm2V_fHz5WOHwPmCdy47Mm8b0EgQLSYD5GFCO_vg@mail.gmail.com>
 <CAMpsgwbja0chB+OPDV0Fr0P11qb6YAiMumu=sVa8AGVr3yGuwg@mail.gmail.com>
Message-ID: <20160412124231.GM8206@unequivocal.co.uk>

On Tue, Apr 12, 2016 at 02:31:19PM +0200, Victor Stinner wrote:
> Oh, I forgot to mention another vulnerability: you block access to
> attributes by replacing getattr and by analyzing the AST. Ok, but one
> more time, it's not enough. If you get access to obj.__dict__, you
> will likely get access to any attribute using obj_dict[attr] instead
> of obj.attr.

That's not a vulnerability, and it's something I already explicitly
mentioned - if you can get a function to return an object's __dict__
then you win. The question is: can you do that?

From rosuav at gmail.com  Tue Apr 12 08:45:06 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 12 Apr 2016 22:45:06 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160412124231.GM8206@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwbd_knm2V_fHz5WOHwPmCdy47Mm8b0EgQLSYD5GFCO_vg@mail.gmail.com>
 <CAMpsgwbja0chB+OPDV0Fr0P11qb6YAiMumu=sVa8AGVr3yGuwg@mail.gmail.com>
 <20160412124231.GM8206@unequivocal.co.uk>
Message-ID: <CAPTjJmqYHoUKbFg39ELyrXPsPpJG_q3trcdNROF2zr4is8y1Dw@mail.gmail.com>

On Tue, Apr 12, 2016 at 10:42 PM, Jon Ribbens
<jon+python-dev at unequivocal.co.uk> wrote:
> On Tue, Apr 12, 2016 at 02:31:19PM +0200, Victor Stinner wrote:
>> Oh, I forgot to mention another vulnerability: you block access to
>> attributes by replacing getattr and by analyzing the AST. Ok, but one
>> more time, it's not enough. If you get access to obj.__dict__, you
>> will likely get access to any attribute using obj_dict[attr] instead
>> of obj.attr.
>
> That's not a vulnerability, and it's something I already explicitly
> mentioned - if you can get a function to return an object's __dict__
> then you win. The question is: can you do that?

The question is, rather: Can you prove that we cannot?

ChrisA

From jon+python-dev at unequivocal.co.uk  Tue Apr 12 08:48:22 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Tue, 12 Apr 2016 13:48:22 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAMpsgwbd_knm2V_fHz5WOHwPmCdy47Mm8b0EgQLSYD5GFCO_vg@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwbd_knm2V_fHz5WOHwPmCdy47Mm8b0EgQLSYD5GFCO_vg@mail.gmail.com>
Message-ID: <20160412124822.GN8206@unequivocal.co.uk>

On Tue, Apr 12, 2016 at 02:16:57PM +0200, Victor Stinner wrote:
> I read your code and the code of CPython. I found many issues.

Thanks for your efforts.

> Your "safe import" hides real functions with a proxy. Ok. But the code
> of modules is still run in the real namespace,

Yes, that was the intention.

> I found functools.update_wrapper(). I was very surprised because this
> function calls getattr() and setattr(), whereas your sandbox replaces
> these builtin functions.

Good point. It seems it was almost certainly foolish of me to add
'import' back in in response to people's comments while my original
concept was still being discussed.

> So here you have:
> ---
> import functools

Thanks, that was pretty clever. I've of course fixed it by reducing
the list of imports (a lot, since I hadn't really audited them at all).
But you make a good point.

From jon+python-dev at unequivocal.co.uk  Tue Apr 12 08:49:50 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Tue, 12 Apr 2016 13:49:50 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAPTjJmqYHoUKbFg39ELyrXPsPpJG_q3trcdNROF2zr4is8y1Dw@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwbd_knm2V_fHz5WOHwPmCdy47Mm8b0EgQLSYD5GFCO_vg@mail.gmail.com>
 <CAMpsgwbja0chB+OPDV0Fr0P11qb6YAiMumu=sVa8AGVr3yGuwg@mail.gmail.com>
 <20160412124231.GM8206@unequivocal.co.uk>
 <CAPTjJmqYHoUKbFg39ELyrXPsPpJG_q3trcdNROF2zr4is8y1Dw@mail.gmail.com>
Message-ID: <20160412124950.GO8206@unequivocal.co.uk>

On Tue, Apr 12, 2016 at 10:45:06PM +1000, Chris Angelico wrote:
> On Tue, Apr 12, 2016 at 10:42 PM, Jon Ribbens
> <jon+python-dev at unequivocal.co.uk> wrote:
> > That's not a vulnerability, and it's something I already explicitly
> > mentioned - if you can get a function to return an object's __dict__
> > then you win. The question is: can you do that?
> 
> The question is, rather: Can you prove that we cannot?

I refer you to the answer given previously. Can you prove you cannot
write code to escape JavaScript sandboxes? No? Then why have you not
disabled JavaScript in your browser?

From rosuav at gmail.com  Tue Apr 12 09:03:11 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 12 Apr 2016 23:03:11 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160412124950.GO8206@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwbd_knm2V_fHz5WOHwPmCdy47Mm8b0EgQLSYD5GFCO_vg@mail.gmail.com>
 <CAMpsgwbja0chB+OPDV0Fr0P11qb6YAiMumu=sVa8AGVr3yGuwg@mail.gmail.com>
 <20160412124231.GM8206@unequivocal.co.uk>
 <CAPTjJmqYHoUKbFg39ELyrXPsPpJG_q3trcdNROF2zr4is8y1Dw@mail.gmail.com>
 <20160412124950.GO8206@unequivocal.co.uk>
Message-ID: <CAPTjJmrJ6YzHwuJGAiPKuzFaOWedtz0=hGm1qhnk3-dFM-uecw@mail.gmail.com>

On Tue, Apr 12, 2016 at 10:49 PM, Jon Ribbens
<jon+python-dev at unequivocal.co.uk> wrote:
> On Tue, Apr 12, 2016 at 10:45:06PM +1000, Chris Angelico wrote:
>> On Tue, Apr 12, 2016 at 10:42 PM, Jon Ribbens
>> <jon+python-dev at unequivocal.co.uk> wrote:
>> > That's not a vulnerability, and it's something I already explicitly
>> > mentioned - if you can get a function to return an object's __dict__
>> > then you win. The question is: can you do that?
>>
>> The question is, rather: Can you prove that we cannot?
>
> I refer you to the answer given previously. Can you prove you cannot
> write code to escape JavaScript sandboxes? No? Then why have you not
> disabled JavaScript in your browser?

I personally cannot, any more than I can prove that SSL is secure or
that my Linux+Apache system doesn't allow remote code execution [1]. I
trust other people to, and then make a value judgement: is it worth
breaking all the web sites that depend on it? (And sometimes the
answer is "yes".)

One of the key differences with scripts in web browsers is that there
*is* no "outer environment" to access. Remember what I said about the
difference between Python-in-Python sandboxing and, say,
Lua-in-Python? One tiny exploit in Python-in-Python and you suddenly
gain access to the entire outer environment, and it's game over. One
tiny exploit in Lua-in-Python and you have whatever that exploit gave
you, nothing more.

In fact, if you're prepared to forfeit almost all of Python's power to
achieve security, you probably should look into embedding a JavaScript
or Lua engine in your Python code. You'll get a comparable expression
evaluator, and most people won't be able to tell the difference.
You've already cut the set of modules down to just cmath, datetime,
math, and re; I suspect re is next on the chopping block (it has a
global cache - if the outer system uses a regular expression more than
once, it would potentially be possible to mess with it in the cache,
and then next time it gets used, the injected code gets run), and
datetime might not be that far behind. And if they do go, all you have
left is a scientific calculator. You can implement that in any
language you like.

ChrisA

[1] And if anyone mentions PHP, I will set him to work on the hardest
PHP problem I know of - no, not securing it. I mean convincing end
users that it's not necessary. Securing it is trivial by comparison.

From steve at pearwood.info  Tue Apr 12 09:12:27 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 12 Apr 2016 23:12:27 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAMpsgwZifbSPT5Lov-KXbhLvufar4vHj7P8p8KY3qmTf25Mikg@mail.gmail.com>
References: <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
 <CAPTjJmp3MyJ7xvbKSC4SswADSw8hz7dvZw+e6PL_yPhFRteKTQ@mail.gmail.com>
 <20160412111040.GH8206@unequivocal.co.uk>
 <CAMpsgwZifbSPT5Lov-KXbhLvufar4vHj7P8p8KY3qmTf25Mikg@mail.gmail.com>
Message-ID: <20160412131226.GB1819@ando.pearwood.info>

I haven't been following this thread in detail, so perhaps I have 
missed something, but I have a question...


On Tue, Apr 12, 2016 at 02:05:06PM +0200, Victor Stinner wrote:

> You don't understand that even if the visible "Python scope", "Python
> namespace", or call it as you want (the code that is accessible from
> your sandbox) looks very tiny, the real effective code is HUGE. For
> example, you give a full access to the str type which is made of 20K
> lines of C code:
> 
> haypo at smithers$ wc -l Objects/unicodeobject.c Objects/unicodectype.c
> Objects/stringlib/*h
>  15670 Objects/unicodeobject.c
[...]
>   1284 Objects/stringlib/unicode_format.h
>  20156 total
> 
> Did you review carefully *all* these lines? If a single C line gives
> access to the real Python namespace, the game is over.

I don't follow this logic. Jon's sandbox doesn't provide an interface to 
calling arbitrary lines of C code from Python. It is limited to only a 
restricted set of Python operations.

So sticking to string methods for the sake of discussion, it doesn't 
matter if (let's say) str.upper has access to the real Python namespace. 
There is no API for str.upper to return that namespace. It only returns 
a new string. So where is the error in the following reasoning?

There are 44 string methods, excluding those that start with an 
underscore. So if Jon audits those 44 methods, and determines which ones 
return (let's say) strings and which give access to namespaces, then he 
can block the ones which give access to namespaces and allow the ones 
which return strings.
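
As a rough starting point for such an audit, something like the sketch
below could enumerate the public methods and record what the
zero-argument ones return. The helper name and the sample string are
arbitrary, and the exact number of methods depends on the Python
version. Methods that need arguments are skipped here and would have
to be reviewed by hand:

```python
public_str_methods = sorted(
    name for name in dir(str) if not name.startswith("_")
)


def classify_zero_argument_methods(sample="aardvark"):
    """Map each str method callable without arguments to its result type."""
    result_types = {}
    for name in public_str_methods:
        try:
            value = getattr(sample, name)()
        except TypeError:
            continue  # needs arguments; not covered by this sketch
        result_types[name] = type(value).__name__
    return result_types


result_types = classify_zero_argument_methods()
print(len(public_str_methods), "public methods")
print(result_types["isdigit"], result_types["upper"])
```

This only looks at return types for one input, so it says nothing
about side effects or about bugs in the underlying C code.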

To give a concrete example... suppose that the C locale library is 
unsafe. Further, let's suppose that the str.isdigit method calls code 
from the C locale library, to determine whether or not the string is 
made up of locale-specific digits. How does this make str.isdigit 
(potentially) unsafe? Regardless of what happens inside the method, it 
still returns either True or False and nothing else. There's no 
str.isdigit API to access the locale library.

I can think of one possible threat. Suppose that the locale library has 
a bug, so that calling "aardvark".isdigit seg faults, potentially 
executing arbitrary C code, but at the very least crashing the 
application. Is that the sort of attack you're concerned by?



> In a few minutes, I found "{0.__class__}".format(obj) which is not a
> full escape of the sandbox, but it's just to give one example. With
> more time, I'm sure that a line can be found in the str type to escape
> your sandbox.

Maybe so. And then Jon will fix that vulnerability. And somebody will 
find a new one. And he'll fix that too, or decide that it is too hard to 
fix and give up.

That's how security works. Even software designed for security can have 
exploitable bugs:

http://securityvulns.com/news/FreeBSD/jail/chdir.html

It seems unfair to me to hold Jon to a higher standard than we hold 
people like Apple, or the Linux kernel devs.

I fully accept and respect your personal opinion, based on your 
experience, that Jon's tactic is doomed to failure. But if he needs to 
learn this for himself, just as you had to learn it for yourself 
(otherwise you wouldn't have started your own sandbox project), I can 
respect that too. Progress depends on the unreasonable person who thinks 
they can overturn the conventional wisdom.

You're telling Jon not to bother trying to sandbox CPython, he should 
use PyPy's sandbox instead. But if the PyPy people had believed the 
conventional wisdom that you can't sandbox Python, they wouldn't have a 
sandbox either.

Even if the only thing we learn from Jon's experiment is a new set of 
tricks for breaking out of the sandbox, that's still interesting, if not 
useful. And maybe he'll find some combination of whitelist and OS-level 
jail that together makes a practical sandbox. And if not, well, it's his 
own time he is wasting.


> IMHO it's a waste of time to try to reduce the great Python with
> battery included to a simple calculator to compute 1+2.

Completely agree. But hopefully the whitelist won't be that restrictive, 
and will allow subtraction and multiplication as well :-)



-- 
Steve

From rosuav at gmail.com  Tue Apr 12 09:19:53 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 12 Apr 2016 23:19:53 +1000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160412131226.GB1819@ando.pearwood.info>
References: <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
 <CAPTjJmp3MyJ7xvbKSC4SswADSw8hz7dvZw+e6PL_yPhFRteKTQ@mail.gmail.com>
 <20160412111040.GH8206@unequivocal.co.uk>
 <CAMpsgwZifbSPT5Lov-KXbhLvufar4vHj7P8p8KY3qmTf25Mikg@mail.gmail.com>
 <20160412131226.GB1819@ando.pearwood.info>
Message-ID: <CAPTjJmpCByUCd3gP6_nr-fQup8RhqHY+AOORN4EjdTvXmx65Dg@mail.gmail.com>

On Tue, Apr 12, 2016 at 11:12 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> To give a concrete example... suppose that the C locale library is
> unsafe. Further, let's suppose that the str.isdigit method calls code
> from the C locale library, to determine whether or not the string is
> made up of locale-specific digits. How does this make str.isdigit
> (potentially) unsafe? Regardless of what happens inside the method, it
> still returns either True or False and nothing else. There's no
> str.isdigit API to access the locale library.
>
> I can think of one possible threat. Suppose that the locale library has
> a bug, so that calling "aardvark".isdigit seg faults, potentially
> executing arbitrary C code, but at the very least crashing the
> application. Is that the sort of attack you're concerned by?

That is a potentially significant attack vector, as it depends on a
lot of external-to-Python information (the current locale, for
instance; and we've seen exploits that involve remotely setting
environment variables, which could include LC_ALL). However, you're
right that it isn't the concern here.

There is one other thing to worry about, and that's anything where the
"inner" system can affect or influence the "outer" system. With the
str type, that's unlikely (since strings are immutable), but I raised
the potential concern of the regex cache, as there's a chance someone
could attack that. The mere presence of decimal.getcontext() resulted
in the whole module getting off the whitelist.
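
To illustrate why decimal.getcontext() is a concern, here is a short
sketch (untrusted_snippet and outer_calculation are made-up names
standing in for sandboxed and host code). The context object is shared
mutable state, so a change made by the "inner" code alters later
results in the "outer" code:

```python
import decimal


def untrusted_snippet():
    # No underscore names and no builtins needed: just mutate shared state.
    decimal.getcontext().prec = 5


def outer_calculation():
    return decimal.Decimal(1) / decimal.Decimal(3)


original_prec = decimal.getcontext().prec
before = outer_calculation()
untrusted_snippet()
after = outer_calculation()
decimal.getcontext().prec = original_prec  # restore for the rest of the program

print(before, after)
```

This is not code execution, but it is the inner system influencing the
outer one.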

If you want complete isolation of one and the other, that's easy: have
no communication whatsoever. But then there's no point in having them
both execute in the same interpreter. You may as well create a chroot
and run Python inside that, have it serialize the result to JSON and
write it to stdout, which you can then retrieve. That would pretty
much solve the problem. (And in fact, if I were to do-over the project
where I wanted Python sandboxing, that's probably what I'd do.)
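
The communication half of that idea could be sketched as below. The
names CHILD_SOURCE and run_in_child are hypothetical, and the actual
isolation (chroot, resource limits, dropped privileges) is
deliberately left out. The child here only sums a list of numbers; the
point is that nothing but JSON text crosses the process boundary:

```python
import json
import subprocess
import sys

# Program run by the separate interpreter: JSON in on stdin, JSON out
# on stdout.
CHILD_SOURCE = """
import json
import sys

numbers = json.load(sys.stdin)
json.dump({"total": sum(numbers)}, sys.stdout)
"""


def run_in_child(numbers):
    completed = subprocess.run(
        [sys.executable, "-I", "-c", CHILD_SOURCE],
        input=json.dumps(numbers),
        capture_output=True,
        text=True,
        timeout=10,
        check=True,
    )
    # Only plain data comes back; no live objects from the child exist here.
    return json.loads(completed.stdout)


print(run_in_child([1, 2, 3]))
```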

ChrisA

From ijmorlan at uwaterloo.ca  Tue Apr 12 06:21:04 2016
From: ijmorlan at uwaterloo.ca (Isaac Morland)
Date: Tue, 12 Apr 2016 06:21:04 -0400 (EDT)
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160412100623.GG8206@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwZ7mZfkOaVop5CKZovR5Vp2t_00Re-z7EqJk5HBuy3oCA@mail.gmail.com>
 <20160410164308.GE17895@unequivocal.co.uk>
 <CAMpsgwbi7SNbLkacwF6b_20u1s4azd+s0ydDVby5X1ch7rajZg@mail.gmail.com>
 <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
Message-ID: <alpine.DEB.2.02.1604120616530.5446@ubuntu1204-102.cs.uwaterloo.ca>

On Tue, 12 Apr 2016, Jon Ribbens wrote:

>> This is still a massive game of whack-a-mole.
>
> No, it still isn't. If the names blacklist had to keep being extended
> then you would be right, but that hasn't happened so far. Whitelists
> by definition contain only a small, limited number of potential moles.
>
> The only thing you found above that even remotely approaches an
> exploit is the decimal.getcontext() thing, and even that I don't
> think you could use to do any code execution.

"I don't think"?

Where's the formal proof?

Without a proof, this is indeed just a game of whack-a-mole.

I don't "think" Python is a suitable foundation for a sandboxing system 
intended for security purposes, but my "think" won't lead to security 
holes whereas yours will.  So, I would respectfully suggest that unless 
you increase the rigour of your effort substantially, it is not 
worthwhile.  Python is great for lots of applications already - there is 
no need to force it into unsuitable problem domains.

Isaac Morland           CSCF Web Guru
DC 2619, x36650         WWW Software Specialist

From dw+python-dev at hmmz.org  Tue Apr 12 09:40:57 2016
From: dw+python-dev at hmmz.org (David Wilson)
Date: Tue, 12 Apr 2016 13:40:57 +0000
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160412131226.GB1819@ando.pearwood.info>
References: <20160411144644.GA8206@unequivocal.co.uk>
 <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
 <CAPTjJmp3MyJ7xvbKSC4SswADSw8hz7dvZw+e6PL_yPhFRteKTQ@mail.gmail.com>
 <20160412111040.GH8206@unequivocal.co.uk>
 <CAMpsgwZifbSPT5Lov-KXbhLvufar4vHj7P8p8KY3qmTf25Mikg@mail.gmail.com>
 <20160412131226.GB1819@ando.pearwood.info>
Message-ID: <20160412134057.GA15550@k3>

On Tue, Apr 12, 2016 at 11:12:27PM +1000, Steven D'Aprano wrote:

> I can think of one possible threat. Suppose that the locale library
> has a bug, so that calling "aardvark".isdigit seg faults, potentially
> executing arbitrary C code, but at the very least crashing the
> application. Is that the sort of attack you're concerned by?

This thread already covered the need to address SEGV at length. For a
truly evil user, almost any kind of crash is an opportunity to take
control of the system, and a security solution ignoring this is no
security solution at all.


> Maybe so. And then Jon will fix that vulnerability. And somebody will
> find a new one. And he'll fix that too, or decide that it is too hard
> to fix and give up.
> 
> That's how security works. Even software designed for security can
> have exploitable bugs:
> 
> It seems unfair to me to hold Jon to a higher standard than we hold 
> people like Apple, or the Linux kernel devs.

I don't believe that's what is happening here. In the OS analogy, Jon is
generating busywork trying to secure an environment similar to Windows
3.1 that was simply never designed with e.g. memory protection in mind
to begin with, and there is no evidence after numerous attempts spanning
many years by multiple people that such an environment can be secured
meaningfully while still remaining generally useful.


> I fully accept and respect your personal opinion, based on your
> experience, that Jon's tactic is doomed to failure. But if he needs to
> learn this for himself, just as you had to learn it for yourself
> (otherwise you wouldn't have started your own sandbox project), I can
> respect that too. Progress depends on the unreasonable person who
> thinks they can overturn the conventional wisdom.

I'd deeply prefer it if this turned into an investigation or patchset
making CPython work nicely with seccomp, sandbox(7), pledge(2) or
whatever capability minimization mechanisms exist on Windows. They are
all mechanisms that make it much safer for random code to be executing
on your system, designed by folk who at all times expressly had security
in mind.

But that's not what's happening, instead a dead horse is being flogged
over a hundred messages in our inboxes and IMHO it is excruciating to
watch.


> Even if the only thing we learn from Jon's experiment is a new set of
> tricks for breaking out of the sandbox, that's still interesting, if
> not useful.

Don't forget the worst case: a fundamentally broken security module
heavily marketed to the naive using claims the core team couldn't break
it.


David

From jon+python-dev at unequivocal.co.uk  Tue Apr 12 09:48:12 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Tue, 12 Apr 2016 14:48:12 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <CAPTjJmrJ6YzHwuJGAiPKuzFaOWedtz0=hGm1qhnk3-dFM-uecw@mail.gmail.com>
References: <20160408141847.GQ4951@unequivocal.co.uk>
 <CAMpsgwbd_knm2V_fHz5WOHwPmCdy47Mm8b0EgQLSYD5GFCO_vg@mail.gmail.com>
 <CAMpsgwbja0chB+OPDV0Fr0P11qb6YAiMumu=sVa8AGVr3yGuwg@mail.gmail.com>
 <20160412124231.GM8206@unequivocal.co.uk>
 <CAPTjJmqYHoUKbFg39ELyrXPsPpJG_q3trcdNROF2zr4is8y1Dw@mail.gmail.com>
 <20160412124950.GO8206@unequivocal.co.uk>
 <CAPTjJmrJ6YzHwuJGAiPKuzFaOWedtz0=hGm1qhnk3-dFM-uecw@mail.gmail.com>
Message-ID: <20160412134812.GP8206@unequivocal.co.uk>

On Tue, Apr 12, 2016 at 11:03:11PM +1000, Chris Angelico wrote:
> One of the key differences with scripts in web browsers is that there
> *is* no "outer environment" to access.

If you think that then I think you considerably misunderstand how
modern browsers work.

> Remember what I said about the difference between Python-in-Python
> sandboxing and, say, Lua-in-Python? One tiny exploit in
> Python-in-Python and you suddenly gain access to the entire outer
> environment, and it's game over. One tiny exploit in Lua-in-Python
> and you have whatever that exploit gave you, nothing more.

Are you imagining the Lua-in-Python as being completely isolated from
the Python namespace then?

> In fact, if you're prepared to forfeit almost all of Python's power to
> achieve security, you probably should look into embedding a JavaScript
> or Lua engine in your Python code.

Yes, I have in fact already done this (JavaScript using SpiderMonkey).
It allows the JavaScript to access Python objects and methods directly
from JavaScript so it doesn't actually help, but I think I could put
limits on that (e.g. making things read-only) and unlike most of this
Python stuff, that could be made a solid rule with no clever ways
around it.

> I suspect re is next on the chopping block (it has a global cache -
> if the outer system uses a regular expression more than once, it
> would potentially be possible to mess with it in the cache, and then
> next time it gets used, the injected code gets run),

All you could do would be to give misleading results from the regular
expression methods, but yes that is a good point. I regret that
I added the import stuff at all now - it has just been a distraction
from my original point.
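
To illustrate why the cache counts as shared state, here is a small
sketch. It relies on a CPython implementation detail (re.compile()
returning the cached object for a repeated pattern), and the "outer" and
"inner" labels are only illustrative names for the two sides.

```python
import re

# re.compile() consults a module-level cache keyed on the pattern and flags,
# so two unrelated callers asking for the same pattern get the same object.
outer = re.compile(r"\d+")   # compiled by the "outer" system
inner = re.compile(r"\d+")   # compiled later by "sandboxed" code

shared = outer is inner      # same object when the cache entry is still live
```

Because both sides hold the very same pattern object, anything that could
alter the cache entry would be visible to the outer system too.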

> [1] And if anyone mentions PHP, I will set him to work on the hardest
> PHP problem I know of - no, not securing it. I mean convincing end
> users that it's not necessary. Securing it is trivial by comparison.

Fortunately I have managed to exclude PHP completely these days from
any system I have anything to do with!

From jon+python-dev at unequivocal.co.uk  Tue Apr 12 10:03:47 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Tue, 12 Apr 2016 15:03:47 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted
 mode revisited)
In-Reply-To: <20160412134057.GA15550@k3>
References: <CACac1F91eRPLTjJSo-=83A2vbPpUSYkuV4q2RrW5HXjFiyc96Q@mail.gmail.com>
 <20160411165354.GC8206@unequivocal.co.uk>
 <CACac1F8uLR-pRvqSGdC2KESu66nP+wLVSmRVfi9is24B73aMxQ@mail.gmail.com>
 <CAPTjJmqDvY2JibBhoZoaLVdaT4SHh9E_k_jKWV=f65L8i2UDRg@mail.gmail.com>
 <20160412100623.GG8206@unequivocal.co.uk>
 <CAPTjJmp3MyJ7xvbKSC4SswADSw8hz7dvZw+e6PL_yPhFRteKTQ@mail.gmail.com>
 <20160412111040.GH8206@unequivocal.co.uk>
 <CAMpsgwZifbSPT5Lov-KXbhLvufar4vHj7P8p8KY3qmTf25Mikg@mail.gmail.com>
 <20160412131226.GB1819@ando.pearwood.info>
 <20160412134057.GA15550@k3>
Message-ID: <20160412140347.GQ8206@unequivocal.co.uk>

On Tue, Apr 12, 2016 at 01:40:57PM +0000, David Wilson wrote:
> On Tue, Apr 12, 2016 at 11:12:27PM +1000, Steven D'Aprano wrote:
> > I can think of one possible threat. Suppose that the locale library
> > has a bug, so that calling "aardvark".isdigit seg faults, potentially
> > executing arbitrary C code, but at the very least crashing the
> > application. Is that the sort of attack you're concerned by?
> 
> This thread already covered the need to address SEGV at length. For a
> truly evil user, almost any kind of crash is an opportunity to take
> control of the system, and a security solution ignoring this is no
> security solution at all.

Indeed.

> But that's not what's happening, instead a dead horse is being flogged
> over a hundred messages in our inboxes and IMHO it is excruciating to
> watch.

I don't think that is true at all, and I personally have found this
thread very interesting. I apologise if others have not.

> > Even if the only thing we learn from Jon's experiment is a new set of
> > tricks for breaking out of the sandbox, that's still interesting, if
> > not useful.
> 
> Don't forget the worst case: a fundamentally broken security module
> heavily marketed to the naive using claims the core team couldn't break
> it.

I should point out that my module is called "unsafe.py", is titled
an "experiment", and prominently states in the README:

  Do not use this code for any purpose in the real world.

I will not be putting it up as an installable package, and as already
stated it was never my intention to suggest that it or anything like
it be included in the stdlib. I will however leave it on github for
anyone who wants to have a go at breaking into it in the future.

From srkunze at mail.de  Tue Apr 12 10:52:30 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Tue, 12 Apr 2016 16:52:30 +0200
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <570C13D6.4090609@stoneleaf.us> <570C1A95.1060100@mail.de>
 <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com>
Message-ID: <570D0BAE.9020404@mail.de>

On 12.04.2016 00:56, Random832 wrote:
> Fully general re-dispatch from argument types on any call to a function
> that raises TypeError or NotImplemented? [e.g. call
> Path.__missing_func__(os.open, path, mode)]
>
> Have pathlib monkey-patch things at import?

Implicit conversion. No, thanks.

> On Mon, Apr 11, 2016, at 17:43, Sven R. Kunze wrote:
>> So, I might add:
>>
>> 3. add more high-level features to pathlib to prevent a downgrade to os
>> or os.path
> 3. reimplement the entire ecosystem in every walled garden so no-one has
> to leave their walled gardens.
>
> What's the point of batteries being included if you can't wire them to
> anything?

Huh? That makes no sense to me.

> I don't get what you mean by this whole "different level of abstraction"
> thing, anyway.

Strings are strings. Paths are paths. That's where the difference is.

> The fact that there is one obvious thing to want to do
> with open and a Path strongly suggests that that should be able to be
> done by passing the Path to open.

Path(...).open() is your friend then. I don't see why you need os.open.

Refusing to upgrade is like saying that everything was better in the old 
days, so let's use os.open instead of Path(...).open().

> Also, what level of abstraction is builtin open? Maybe we should _just_
> leave os alone on the grounds of some holy sacred lowest-level-itude,
> but allow io and shutils to accept Path?

os, io and shutil accept strings. Not Path objects. Why? Because the 
semantics of "being a path" are applied implicitly by those modules. You 
are free to use a random string as a path and later as the name of your 
pet. The semantics of a string come from its usage. Path objects, however, 
have built-in semantics.

Furthermore, if os, io and shutil are changed, we allow code like the 
following:


my_path.touch()
os.remove(my_path)


I don't know how to explain reasonably to newbies why my_path sometimes 
stands in front of the method call and sometimes behind it.
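
For comparison, a small sketch of the two styles side by side, using a
temporary directory so it is safe to run. The variable names are only
illustrative; str(my_path) is used for os.remove so the example does not
depend on os accepting Path objects.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    my_path = Path(tmp) / "example.txt"

    # Mixed style: a method on the object, then a function taking a string.
    my_path.touch()
    os.remove(str(my_path))
    removed_via_os = not my_path.exists()

    # pathlib-only style: the path stays in front of the call both times.
    my_path.touch()
    my_path.unlink()
    removed_via_pathlib = not my_path.exists()
```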

Best,
Sven

From srkunze at mail.de  Tue Apr 12 10:54:03 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Tue, 12 Apr 2016 16:54:03 +0200
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <CACac1F_rT4fJnr2NjxB=10CepW6TzL6LSTh4Yk0DTS8OetmEmw@mail.gmail.com>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com>
 <570C1560.7070105@mail.de>
 <CACac1F_rT4fJnr2NjxB=10CepW6TzL6LSTh4Yk0DTS8OetmEmw@mail.gmail.com>
Message-ID: <570D0C0B.7000208@mail.de>

On 12.04.2016 12:41, Paul Moore wrote:
> As your thoughts appear to have been triggered by my comments, I feel
> I should clarify.
>
> 1. I like pathlib even as it is right now, and I'm strongly -1 on removing it.
> 2. The "external dependency" aspect of 3rd party solutions makes them
> far less useful to me.
> 3. The work on improving integration with the stdlib (which is nearly
> sorted now, as far as I can see) is a big improvement, and I'm all in
> favour. But even without it, I wouldn't want pathlib to be removed.
> 4. There are further improvements that could be made to pathlib,
> certainly, but again they are optional, and pathlib is fine without
> them.

My conclusion is that these changes are not optional and tweaking os, io 
and shutil is just yet another workaround for a clean solution. :)

Just my two cents.

> 5. I wish more 3rd party code integrated better with pathlib. The
> improved integration work might help with this. But ultimately, Python
> 2 compatibility is likely to be the biggest block (either perceived or
> real - we can make pathlib support as simple as possible, but some 3rd
> party authors will remain unwilling to add support for Python 3 only
> features in the short term). This isn't a pathlib problem.
> 6. There will probably always be a place for low-level os/os.path
> code. Adding support in those modules for pathlib doesn't affect that
> fact, but does make it easier to use pathlib "seamlessly", so why not
> do so?
>
> tl; dr; I'm 100% in favour of pathlib, and in the direction the
> current discussion (excluding "let's give up on pathlib" digressions)
> is going.

Best,
Sven

From donald at stufft.io  Tue Apr 12 10:58:11 2016
From: donald at stufft.io (Donald Stufft)
Date: Tue, 12 Apr 2016 10:58:11 -0400
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570D0BAE.9020404@mail.de>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <570C13D6.4090609@stoneleaf.us> <570C1A95.1060100@mail.de>
 <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com>
 <570D0BAE.9020404@mail.de>
Message-ID: <196C5476-DEC9-4822-93BD-A7C53D76D50C@stufft.io>


> On Apr 12, 2016, at 10:52 AM, Sven R. Kunze <srkunze at mail.de> wrote:
> 
> Path(...).open() is your friend then. I don't see why you need os.open.
> 
> Refusing to upgrade it like saying, everything was better in the old days. So let's use os.open instead of Path(...).open().


I think it was a mistake to have Path(...).open, to be honest, and I think the main reason it exists is because open(Path(...)) doesn't work (yet!). You can't hang every single thing you might ever want to do to a Path off the path object.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From random832 at fastmail.com  Tue Apr 12 10:59:05 2016
From: random832 at fastmail.com (Random832)
Date: Tue, 12 Apr 2016 10:59:05 -0400
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570D0BAE.9020404@mail.de>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <570C13D6.4090609@stoneleaf.us> <570C1A95.1060100@mail.de>
 <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com>
 <570D0BAE.9020404@mail.de>
Message-ID: <1460473145.3562520.576479217.31E08CDE@webmail.messagingengine.com>

On Tue, Apr 12, 2016, at 10:52, Sven R. Kunze wrote:
> On 12.04.2016 00:56, Random832 wrote:
> > Fully general re-dispatch from argument types on any call to a function
> > that raises TypeError or NotImplemented? [e.g. call
> > Path.__missing_func__(os.open, path, mode)]
> >
> > Have pathlib monkey-patch things at import?
> 
> Implicit conversion. No, thanks.

No more so than __radd__ - I didn't actually mean this as a serious
suggestion, but Python *does* already have multiple dispatch.

> > On Mon, Apr 11, 2016, at 17:43, Sven R. Kunze wrote:
> > I don't get what you mean by this whole "different level of abstraction"
> > thing, anyway.
> 
> Strings are strings. Paths are paths. That's where the difference is.

Yes but why aren't these both "things that you may want to use to open a
file"?

> > The fact that there is one obvious thing to want to do
> > with open and a Path strongly suggests that that should be able to be
> > done by passing the Path to open.
> 
> Path(...).open() is your friend then. I don't see why you need os.open.

Because I'm passing it to modfoo.dosomethingwithafile() which takes a
filename and passes it to shutil, which passes it to builtin open,
which passes it to os.open.

Should Path grow a dosomethingwithmodfoo method?

From rosuav at gmail.com  Tue Apr 12 11:25:15 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 13 Apr 2016 01:25:15 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570C1E13.4090909@stoneleaf.us>
References: <570C1E13.4090909@stoneleaf.us>
Message-ID: <CAPTjJmpP+0LyNMj0bp5gY5hCgqw_cLxn-+KjFyFsKX3bBHeAbA@mail.gmail.com>

On Tue, Apr 12, 2016 at 7:58 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> Sticking points:
> ---------------
>
> Do we allow bytes to be returned from os.fspath()?  If yes, then do we allow
> bytes from __fspath__()?
>

I would say No and No, on the basis that it's *far* easier to widen
their scope in 3.7 than to narrow it. Once you declare that one or
both of these may return bytes, it becomes an annoying incompatibility
to change that (even if it *is* marked provisional), which almost
certainly means it won't happen. By restricting them both, we force
the issue: if you want bytes, you'll know about it.

I'd also prefer to stick to Unicode path names, for reasons I've
stated in other threads. Undecodable path byte streams can be handled
already, so what are we really gaining by allowing a Path-like object
to emit bytes? If it becomes a major issue for a lot of types, it
wouldn't be hard to add a helper function somewhere (or a mixin class
that provides a ready-to-go __fspath__, which might well be
sufficient).

ChrisA

From srkunze at mail.de  Tue Apr 12 11:38:36 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Tue, 12 Apr 2016 17:38:36 +0200
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570C8EEE.6050904@stoneleaf.us>
References: <570C1E13.4090909@stoneleaf.us>
 <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
 <-9219200259368253896@unknownmsgid> <570C8EEE.6050904@stoneleaf.us>
Message-ID: <570D167C.8040202@mail.de>

Sorry for disturbing this thread's harmony.


On 12.04.2016 08:00, Ethan Furman wrote:
> On 04/11/2016 10:14 PM, Chris Barker - NOAA Federal wrote:
>
>>> Consider os.path.join:
>>
>> Why in the world do the  os.path functions need to work with Path
>> objects? ( and other conforming objects)
>
> Because library XYZ that takes a path and wants to open it shouldn't 
> have to care whether that path is a string or pathlib.Path -- but if 
> os.open can't use pathlib.Path then the library has to care (or the 
> user has to care).
>
>> This all started with the goal of using Path objects in the stdlib,
>> but that's for opening files, etc.
>
> Etc. as in os.join?  os.stat? os.path.split?
>
>> Path is an alternative to os.path -- you don't need to use both.
>

I agree with that quote of Chris.

> As a user you don't, no.  As a library that has no control over what 
> kind of "path" is passed to you -- well, if os and os.path can accept 
> Path objects then you can just use os and os.path; otherwise you have 
> to use os and os.path if passed a str or bytes, and pathlib.Path if 
> passed a pathlib.Path -- so you do have to use both.

I don't agree here. There's no need to increase the convenience for a 
library maintainer when it comes to implicit conversions.

When people want to use your library and it requires a string, they can 
simply use "my_path.path" and everything still works for them when they 
switch to pathlib.


Best,
Sven

From stephen at xemacs.org  Tue Apr 12 11:52:43 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 13 Apr 2016 00:52:43 +0900
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CADiSq7coi+=PcF-5SYksUkdMRvuz+Bx8APKZGaFzHbXgB59t7w@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <E90564BC-FD56-46D7-9514-C347283715B2@stufft.io>
 <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7coi+=PcF-5SYksUkdMRvuz+Bx8APKZGaFzHbXgB59t7w@mail.gmail.com>
Message-ID: <22285.6603.756030.873091@turnbull.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > One possible way to address this concern would be to have the
 > underlying protocol be bytes/str (since boundary code frequently
 > needs to handle the paths-are-bytes assumption in POSIX),

What "needs"?  As has been pointed out several times, with PEP 383 you
can deal with bytes losslessly by using an arbitrary codec and
errors=surrogateescape.  I know why *I* use bytes nevertheless:
because when I must guess the encoding, it just makes more sense to
read bytes and then iterate over codecs until the result looks like
words I know in some language.
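
A short sketch of that lossless round trip, using an explicit codec so the
result does not depend on the platform's filesystem encoding. The byte
string is made up for illustration; on POSIX, os.fsdecode and os.fsencode
apply the same error handler with the filesystem encoding.

```python
raw = b"report-\xff\xfe.txt"   # not valid UTF-8

# Undecodable bytes become lone surrogates in the range U+DC80..U+DCFF ...
name = raw.decode("utf-8", errors="surrogateescape")

# ... and encoding with the same codec and handler restores them exactly.
restored = name.encode("utf-8", errors="surrogateescape")
```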

I don't understand why people who mostly believe "bytes are text, too"
because almost all they ever see are bytes in the range 0x00-0x7f need
bytes.  For them, fsdecode and fsencode DTRT.

If you want to claim "efficiency", I can't gainsay since I don't know
the applications, but if you're trying to manipulate file names
millions of times per second, I have to wonder what you're doing with
them that benefits so much from Path.

 > but offer an "os.fspathname" API that rejected bytes output from
 > os.fspath.

Either it's a YAGNI because I'm not going to get any bytes in the
first place, or it raises where I probably could have done something
useful with bytes if I were expecting them (see "pathological" below).

 > That way folks that wanted the clean "must be str" signature

Er, I don't need no steenkin' "clean signature".  I need str, and if
I can't get it from __fspath__, there's always os.fsdecode.  But this
is serious horse-before cart-putting, punishing those who do things
Python-3-ishly right.

 > The ambiguity in question here is inherent in the differences between
 > the way POSIX and Windows work,

Not with PEP 383, it's not.  And I don't do Windows, so my preference
for str has nothing to do with it mapping to native OS APIs well.

The ambiguity in question here is inherent in the differences between
the ways Python 2 and Python 3 programmers work on POSIX AFAICS.
Certainly, there will be times when fsdecode doesn't DTRT.  So those
times you have to use an explicit bytes.decode.  Note that when you
*do* care enough to do that, it's because the Path is *text* -- you're
going to display it to a human, or pass it out of the module.  If all
you're going to do is access the filesystem object denoted, fsdecode
does a sufficiently accurate job.

So if for some reason you're getting bytes at the boundary, I see no
reason why you can't have a convenience constructor

import os
import pathlib

def pathological(str_or_bytes_or_path_seq):
    # Decode bytes components with os.fsdecode; pass str and Path through.
    args = []
    for s_o_b in str_or_bytes_or_path_seq:
        args.append(os.fsdecode(s_o_b) if isinstance(s_o_b, bytes) else s_o_b)
    return pathlib.Path(*args)

for when that's good enough (maybe Antoine would even allow it into
pathlib?)

 > so there are limits to how far we can go in hiding it without
 > making things worse rather than better.

What "hide"?  Nobody is suggesting that the polymorphic os APIs should
go away.  Indeed, they are perfect TOOWTDI, giving the programmer
exactly the flexibility needed *and no more*, *at* the boundary.

The questions on my mind are:

(A) Why does anybody need bytes out of a pathlib.Path (or other
    __fspath__-toting, higher-level API) *inside* the boundary?  Note
    that the APIs in os (etc) *don't need* bytes because they are
    already polymorphic.

(B) If they do, why can't they just apply bytes() to the object?  I
    understand that that would offend Ethan's aesthetic sense, so it's
    worth looking for a nice way around it.  But allowing __fspath__
    to return bytes or str is hideous, because Paths are clearly on
    the application side of the boundary.

Note that bytes() may not have the serious problem that str() does of
being too catholic about its argument: nothing in __builtins__ has a
__bytes__!  Of course there are a few things that do work: ints, and
sequences of ints.


From srkunze at mail.de  Tue Apr 12 11:57:24 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Tue, 12 Apr 2016 17:57:24 +0200
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <1460473145.3562520.576479217.31E08CDE@webmail.messagingengine.com>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <570C13D6.4090609@stoneleaf.us> <570C1A95.1060100@mail.de>
 <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com>
 <570D0BAE.9020404@mail.de>
 <1460473145.3562520.576479217.31E08CDE@webmail.messagingengine.com>
Message-ID: <570D1AE4.9010607@mail.de>

On 12.04.2016 16:59, Random832 wrote:
>
>> Strings are strings. Paths are paths. That's where the difference is.
> Yes but why aren't these both "things that you may want to use to open a
> file"?

Because "things that you may want to use to open a file" is a bit vague 
and thus conceals what we really need.

As an example: time.sleep takes a number of seconds (notice the 
primitive datatype just like a string) and does not take timedelta.

Why don't we add datetime.timedelta support to time.sleep? Very same thing.
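
As a small sketch of what the explicit conversion looks like today (the
20 ms value is arbitrary):

```python
import time
from datetime import timedelta

pause = timedelta(milliseconds=20)

# time.sleep() wants a plain number of seconds, so convert explicitly.
seconds = pause.total_seconds()
time.sleep(seconds)
```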

>>> The fact that there is one obvious thing to want to do
>>> with open and a Path strongly suggests that that should be able to be
>>> done by passing the Path to open.
>> Path(...).open() is your friend then. I don't see why you need os.open.
> Because I'm passing it to modfoo.dosomethingwithafile() which takes a
> filename and passes it to shutils, which passes it to builtin open,
> which passes it to os.open.
>
> Should Path grow a dosomethingwithmodfoo method?

Because we can argue here the other way round and say:

"oh, pathlib can do things, I cannot do with os.path."

Should os.path grow those things?


Put differently, you cannot do everything. But the most common issues 
should be resolved in the correct module. This is no argument for or 
against either solution.


I am sorry if my contributions to the python-ideas threads made it seem 
that I would always support this idea. I don't anymore. However, I will 
still be happy if the outcome, even if not perfect, helps make the Python 
stdlib better. :)

Best,
Sven

From chris.barker at noaa.gov  Tue Apr 12 11:59:19 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 12 Apr 2016 08:59:19 -0700
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <CACac1F_rT4fJnr2NjxB=10CepW6TzL6LSTh4Yk0DTS8OetmEmw@mail.gmail.com>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com>
 <570C1560.7070105@mail.de>
 <CACac1F_rT4fJnr2NjxB=10CepW6TzL6LSTh4Yk0DTS8OetmEmw@mail.gmail.com>
Message-ID: <CALGmxEJaivV-LJ+BHHux4xZyEj37+vK7Qh3vhotVDhBD8JdgvQ@mail.gmail.com>

one little note:

On Tue, Apr 12, 2016 at 3:41 AM, Paul Moore <p.f.moore at gmail.com> wrote:

> 4. There are further improvements that could be made to pathlib,
> certainly, but again they are optional, and pathlib is fine without
> them.
>

Exactly -- "improvements to pathlib" and "make the stdlib pathlib
compatible" are completely orthogonal.


> 5. I wish more 3rd party code integrated better with pathlib. The
> improved integration work might help with this. But ultimately, Python
> 2 compatibility is likely to be the biggest block (either perceived or
> real - we can make pathlib support as simple as possible, but some 3rd
> party authors will remain unwilling to add support for Python 3 only
> features in the short term). This isn't a pathlib problem.
>

true -- though the proposed protocol approach opens doors there -- any
third party lib can check for a __whatever_it's_called__ and run fine in
py2 or py3 or, indeed, any version of python.

Also if you really don't like pathlib, then the protocol allows you to
write/use a different path implementation -- really win-win.
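
A rough sketch of what that could look like on the library side, assuming
the protocol method ends up being called __fspath__ (the name used
elsewhere in this discussion). The helper as_path_string and the class
RemoteFile are hypothetical names; nothing here imports pathlib.

```python
def as_path_string(obj):
    """Return a str path for `obj` (hypothetical helper).

    str is returned as-is; anything else is asked for __fspath__ via its
    type, so the check is pure duck typing.
    """
    if isinstance(obj, str):
        return obj
    method = getattr(type(obj), "__fspath__", None)
    if method is None:
        raise TypeError("expected str or path-like object, not "
                        + type(obj).__name__)
    return method(obj)


class RemoteFile:
    """A toy path type unrelated to pathlib (hypothetical)."""

    def __init__(self, name):
        self.name = name

    def __fspath__(self):
        return "/mnt/remote/" + self.name
```

A library that funnels its arguments through such a helper accepts plain
strings and any third-party path type alike.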

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From chris.barker at noaa.gov  Tue Apr 12 12:04:21 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 12 Apr 2016 09:04:21 -0700
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570D0C0B.7000208@mail.de>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com>
 <570C1560.7070105@mail.de>
 <CACac1F_rT4fJnr2NjxB=10CepW6TzL6LSTh4Yk0DTS8OetmEmw@mail.gmail.com>
 <570D0C0B.7000208@mail.de>
Message-ID: <CALGmxEJvJ_k5CC6s5Wd_GdgXE6krs1Pko5QHSC=Ga7cUOifUkg@mail.gmail.com>

On Tue, Apr 12, 2016 at 7:54 AM, Sven R. Kunze <srkunze at mail.de> wrote:

>
> My conclusion is that these changes are not optional and tweaking os, io
> and shutil is just yet another workaround for a clean solution. :)
>

Is the clean solution to re-implement EVERYTHING in the stdlib that
involves a path in a new, fancy pathlib way?

If we were starting from scratch, I _might_ like that idea, but we're not
starting from scratch. And that would cement in pathlib itself, leaving no
room for other path implementations, kind of like how pre-__index__
Python cemented in Python integers as the only objects one could use to
index a sequence.
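
For readers who have not met it, a small sketch of what __index__ allows.
The Position class is a made-up example, not anything from the stdlib.

```python
class Position:
    """A non-int type usable as a sequence index (hypothetical example)."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Called by Python whenever an integer index is required.
        return self.value


letters = ["a", "b", "c", "d"]
second = letters[Position(1)]
middle = letters[Position(1):Position(3)]
```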

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From ethan at stoneleaf.us  Tue Apr 12 12:10:55 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 12 Apr 2016 09:10:55 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570C1E13.4090909@stoneleaf.us>
References: <570C1E13.4090909@stoneleaf.us>
Message-ID: <570D1E0F.5040502@stoneleaf.us>

On 04/11/2016 02:58 PM, Ethan Furman wrote:
> Sticking points:
> ---------------
>
> Do we allow bytes to be returned from os.fspath()?  If yes, then do we
> allow bytes from __fspath__()?

On 04/11/2016 10:28 PM, Stephen J. Turnbull wrote:
 > In text applications, "bytes as carcinogen" is an apt metaphor.

On 04/12/2016 08:25 AM, Chris Angelico wrote:
 > I would say No and No, on the basis that it's *far* easier to widen
 > their scope in 3.7 than to narrow it.

On 04/11/2016 08:45 PM, Nick Coghlan wrote:
 > I've come around to the point of view that allowing both str and
 > bytes-like objects to pass through unchanged makes sense, with the
 > rationale being the one someone mentioned regarding ease-of-use in
 > os.path.
[...]
> One possible way to address this concern would be to have the
> underlying protocol be bytes/str (since boundary code frequently needs
> to handle the paths-are-bytes assumption in POSIX), but offer an
> "os.fspathname" API that rejected bytes output from os.fspath.

I think this is the way forward:  offer a standard way to get 
paths-as-strings, with an easily supported way of working with 
paths-as-bytes.

This could be with an os.fspathname() & os.fspath() pair of functions, 
or with a single function that has a parameter specifying what to do 
with bytes objects: reject (default), accept, or (maybe) an encoding to 
use to coerce to bytes.
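
A minimal sketch of the single-function variant, written against
os.fspath() as it later shipped in Python 3.6. The name fspath_checked
and its bytes_mode parameter are hypothetical, and the "encoding"
option is left out:

```python
import os
import pathlib


def fspath_checked(path, bytes_mode="reject"):
    """Hypothetical helper: return the file system form of `path`.

    bytes_mode="reject" (the default) raises TypeError for a bytes result;
    bytes_mode="accept" passes bytes through unchanged.
    """
    if bytes_mode not in ("reject", "accept"):
        raise ValueError("bytes_mode must be 'reject' or 'accept'")
    name = os.fspath(path)  # str or bytes, possibly via __fspath__()
    if isinstance(name, bytes) and bytes_mode == "reject":
        raise TypeError("expected str for pathname, not bytes")
    return name


print(fspath_checked(pathlib.PurePosixPath("a/b")))
print(fspath_checked(b"a/b", bytes_mode="accept"))
```

With the default, callers get the clean "must be str" behaviour; code
at the OS boundary opts in to bytes explicitly.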

--
~Ethan~

From srkunze at mail.de  Tue Apr 12 12:14:29 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Tue, 12 Apr 2016 18:14:29 +0200
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <CALGmxEJvJ_k5CC6s5Wd_GdgXE6krs1Pko5QHSC=Ga7cUOifUkg@mail.gmail.com>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com>
 <570C1560.7070105@mail.de>
 <CACac1F_rT4fJnr2NjxB=10CepW6TzL6LSTh4Yk0DTS8OetmEmw@mail.gmail.com>
 <570D0C0B.7000208@mail.de>
 <CALGmxEJvJ_k5CC6s5Wd_GdgXE6krs1Pko5QHSC=Ga7cUOifUkg@mail.gmail.com>
Message-ID: <570D1EE5.4090904@mail.de>

On 12.04.2016 18:04, Chris Barker wrote:
> On Tue, Apr 12, 2016 at 7:54 AM, Sven R. Kunze <srkunze at mail.de 
> <mailto:srkunze at mail.de>> wrote:
>
>
>     My conclusion is that these changes are not optional and tweaking
>     os, io and shutil is just yet another workaround for a clean
>     solution. :)
>
>
> Is the clean solution to re-implement EVERYTHING in the stdlib that 
> involves a path in a new, fancy pathlib way?
>
> If we were starting from scratch, I _might_ like that idea, but we're 
> not starting from scratch. And that would cement in pathlib itself, 
> leaving no room for other path implementations. Kind of like how 
> pre-__index__ Python cemented in Python integers as the only objects 
> one could use to index a sequence.

I cannot remember us using another datetime library. So, I don't value 
this "advantage" as much as you do.


Best,
Sven

From ethan at stoneleaf.us  Tue Apr 12 12:15:34 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 12 Apr 2016 09:15:34 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>	<CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>	<570A7C67.3010304@stoneleaf.us>	<CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>	<570BCE39.8090306@stoneleaf.us>	<CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>	<570BDB17.5000601@stoneleaf.us>	<CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>	<570BECC6.1080708@stoneleaf.us>	<CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>	<CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>	<570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
Message-ID: <570D1F26.5090800@stoneleaf.us>

On 04/11/2016 04:43 PM, Victor Stinner wrote:
> On 11 Apr 2016 11:11 PM, "Ethan Furman" wrote:

>> So my concern in such a case is what happens if we pass this SE
>> string somewhere else: a UTF-8 file, or over a socket, or into a
>> database? Does this have issues that we wouldn't face if we just used bytes?
>
> "SE string" are returned by os.listdir(str), os.walk(str),
> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under
> the sun.

So when we pass a bytes object in, Python (on posix) converts that to a 
string using surrogateescape, gets back strings from the os, and encodes 
them back to bytes, again using surrogateescape?


> Trying to encode a surrogate to ascii, latin1 or utf8 raises an encoding
> error.

latin1?  I thought latin1 had a code point for 0-255, so how could using 
it raise an encoding error?

--
~Ethan~

From rosuav at gmail.com  Tue Apr 12 12:20:17 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 13 Apr 2016 02:20:17 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570D1F26.5090800@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
Message-ID: <CAPTjJmp_aAAbYWNFcLkMFsWTCj8tP4G6tAfKcgTg=B4kt6PZ-Q@mail.gmail.com>

On Wed, Apr 13, 2016 at 2:15 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/11/2016 04:43 PM, Victor Stinner wrote:
>>
>> On 11 Apr 2016 11:11 PM, "Ethan Furman" wrote:
>
>
>>> So my concern in such a case is what happens if we pass this SE
>>> string somewhere else: a UTF-8 file, or over a socket, or into a
>>> database? Does this have issues that we wouldn't face if we just used
>>> bytes?
>>
>>
>> "SE string" are returned by os.listdir(str), os.walk(str),
>> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under
>> the sun.
>
>
> So when we pass a bytes object in, Python (on posix) converts that to a
> string using surrogateescape, gets back strings from the os, and encodes
> them back to bytes, again using surrogateescape?
>
>
>> Trying to encode a surrogate to ascii, latin1 or utf8 raises an encoding
>> error.
>
>
> latin1?  I thought latin1 had a code point for 0-255, so how could using it
> raise an encoding error?

Latin-1 / ISO-8859-1 defines a character for every byte, so any byte
string will *decode*. It only defines 256 characters as having
equivalent bytes, though, so *encoding* can fail.
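
A short sketch of that asymmetry, using only the built-in codec (the
helper name encodes_as_latin1 is made up for illustration):

```python
def encodes_as_latin1(text):
    """Return True if every character in `text` has a Latin-1 byte."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Decoding: all 256 byte values map to a character, so this never fails.
every_byte = bytes(range(256))
decoded = every_byte.decode("latin-1")

# Encoding: only U+0000..U+00FF map back to a byte.
print(encodes_as_latin1(decoded))    # characters that came from bytes
print(encodes_as_latin1("\u20ac"))   # EURO SIGN, outside Latin-1
print(encodes_as_latin1("\udcff"))   # lone surrogate, as PEP 383 produces
```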

ChrisA

From chris.barker at noaa.gov  Tue Apr 12 12:19:41 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 12 Apr 2016 09:19:41 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570C8A40.6020903@canterbury.ac.nz>
References: <570C1E13.4090909@stoneleaf.us>
 <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
 <-9219200259368253896@unknownmsgid> <570C8A40.6020903@canterbury.ac.nz>
Message-ID: <CALGmxE+-nWPj5AcNCnM2GAGbSqPxnQCixJTk0f=+Orur_TAy+Q@mail.gmail.com>

On Mon, Apr 11, 2016 at 10:40 PM, Greg Ewing <greg.ewing at canterbury.ac.nz>
wrote:
>
> So the ONLY thing
>> you should do with it is pass it along to another low level system
>> call.
>>
>
> Not quite -- you can separate it into components and
> work with them. Essentially the same set of operations
> that os.path provides.
>

ahh yes, so while posix claims that paths are "just a char*", they are
really bytes where we can assume that the byte with value 2F is the pathsep
(and that 2E separates an extension?), so I suppose os.path is useful. But
I still think that most of us should never deal with bytes paths, and the
few that need to should just work with the low level functions and be done
with it.

One more thought came up just now: there are different levels of
abstractions and representations for paths. We don't want to make Path a
subclass of string, because Path is supposed to be a higher level
abstraction -- good.

Then at the bottom of the stack, we NEED the bytes level path, because that's
what ultimately gets passed to the OS.

The legacy from the single-byte encoding days is that bytes and strings
were the same, so we could let people work with nice human readable
strings, while also working with byte paths in the same way -- but those
days are gone -- py3 makes a clear (and important) distinction between nice
human readable strings and the bytes that represent them.

So: why use strings as the lingua franca of paths? i.e. the basis of the
path protocol. maybe we should support only two path representations:

1) A "proper" path object -- i.e. pathlib.Path or anything else that
supports the path protocol.

2) the bytes that the OS actually needs.

this would mean that the protocol would be to have a __pathbytes__() method
that would return the bytes that should be passed off to the OS.

A posix Path implementation could store that internal bytes representation,
so it could pass it off unchanged if that's all you need to do.

Any current API that takes bytes could be made to easily work.
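
A rough sketch of what such a protocol could look like. Everything
here is hypothetical: __pathbytes__ is only the idea floated above, and
BytesBackedPath and pathbytes() are names invented for the example.
os.fsencode() is the existing stdlib function:

```python
import os


class BytesBackedPath:
    """Hypothetical path object that keeps the OS-level bytes internally."""

    def __init__(self, raw):
        # Store bytes unchanged; encode str with the file system encoding.
        self._raw = raw if isinstance(raw, bytes) else os.fsencode(raw)

    def __pathbytes__(self):
        # Hand the stored bytes to the caller without re-encoding.
        return self._raw


def pathbytes(obj):
    """Hypothetical consumer: accept bytes or any __pathbytes__ object."""
    if isinstance(obj, bytes):
        return obj
    return type(obj).__pathbytes__(obj)


print(pathbytes(BytesBackedPath(b"/tmp/x\xff")))
print(pathbytes(b"already-bytes"))
```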

I'm SURE I'm missing something really big here, but it seems like maybe
it's better to get farther from "strings as paths" rather than closer to
it....

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From k7hoven at gmail.com  Tue Apr 12 12:26:24 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Tue, 12 Apr 2016 19:26:24 +0300
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CADiSq7coi+=PcF-5SYksUkdMRvuz+Bx8APKZGaFzHbXgB59t7w@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <E90564BC-FD56-46D7-9514-C347283715B2@stufft.io>
 <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7coi+=PcF-5SYksUkdMRvuz+Bx8APKZGaFzHbXgB59t7w@mail.gmail.com>
Message-ID: <CAMiohoj66ac8+-rZDupeO3BpqZd_2BQ67YDhm=bUvkTyoYdubA@mail.gmail.com>

On Tue, Apr 12, 2016 at 11:56 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> One possible way to address this concern would be to have the
> underlying protocol be bytes/str (since boundary code frequently needs
> to handle the paths-are-bytes assumption in POSIX), but offer an
> "os.fspathname" API that rejected bytes output from os.fspath. That
> is, it would be equivalent to:
>
>     def fspathname(path):
>         name = os.fspath(path)
>         if not isinstance(name, str):
>             raise TypeError(
>                 "Expected str for pathname, not {}".format(type(name)))
>         return name
>
> That way folks that wanted the clean "must be str" signature could use
> os.fspathname, while those that wanted to accept either could use the
> lower level os.fspath.

I'm not necessarily opposed to this. I kept bringing up bytes in the
discussion because os.path.* etc. and DirEntry support bytes and will
need to keep doing so for backwards compatibility.  I have no
intention to use bytes pathnames myself. But it may break existing
code if functions, for instance, began to decode bytes paths to str if
they did not previously do so (or to reject them). It is indeed a lot
safer to make new code not support bytes paths than to change the
behavior of old code.

But then again, do we really recommend new code to use os.fspath (or
os.fspathname)? Should they not be using either pathlib or os.path.*
etc. so they don't have to care? I'm sure Ethan and his library (or
some other path library) will manage without the function in the
stdlib, as long as the dunder attribute is there.

So I'm, once again, posing this question (that I don't think got any
reactions previously): Is there a significant audience for this new
function, or is it enough to keep it a private function for the stdlib
to use? That handful of third-party path libraries can decide for
themselves if they want to (a) reject bytes or (b) implicitly fsdecode
them or (c) pass them through just like str, depending on whatever
their case requires in terms of backwards compatibility or other goals.

If we forget about the os.fswhatever function, we only have to decide
whether the magic dunder attribute can be str or bytes or just str.

-Koos

From k7hoven at gmail.com  Tue Apr 12 12:32:13 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Tue, 12 Apr 2016 19:32:13 +0300
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CALGmxE+-nWPj5AcNCnM2GAGbSqPxnQCixJTk0f=+Orur_TAy+Q@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
 <-9219200259368253896@unknownmsgid>
 <570C8A40.6020903@canterbury.ac.nz>
 <CALGmxE+-nWPj5AcNCnM2GAGbSqPxnQCixJTk0f=+Orur_TAy+Q@mail.gmail.com>
Message-ID: <CAMiohoit+SS3oun32cCGbqJSP+eApG0B2N5ybq3E3hx-YgZomQ@mail.gmail.com>

On Tue, Apr 12, 2016 at 7:19 PM, Chris Barker <chris.barker at noaa.gov> wrote:
>
> One more thought came up just now: there are different levels of abstractions
> and representations for paths. We don't want to make Path a subclass of
> string, because Path is supposed to be a higher level abstraction -- good.
>
> Then at the bottom of the stack, we NEED the bytes level path, because that's
> what ultimately gets passed to the OS.
>
> The legacy from the single-byte encoding days is that bytes and strings were
> the same, so we could let people work with nice human readable strings,
> while also working with byte paths in the same way -- but those days are
> gone -- py3 makes a clear (and important) distinction between nice human
> readable strings and the bytes that represent them.
>
> So: why use strings as the lingua franca of paths? i.e. the basis of the
> path protocol. maybe we should support only two path representations:
>
> 1) A "proper" path object -- i.e. pathlib.Path or anything else that
> supports the path protocol.
>
> 2) the bytes that the OS actually needs.
>

You do have a point there. But since bytes pathnames are deprecated on
Windows, this seems to lead to supporting both str and bytes in the
protocol, or having two protocols __fspathbytes__ and __fspathstr__
(and one being preferred over the other, potentially even depending on
the platform).

-Koos

From chris.barker at noaa.gov  Tue Apr 12 12:33:54 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 12 Apr 2016 09:33:54 -0700
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570D1AE4.9010607@mail.de>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <570C13D6.4090609@stoneleaf.us> <570C1A95.1060100@mail.de>
 <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com>
 <570D0BAE.9020404@mail.de>
 <1460473145.3562520.576479217.31E08CDE@webmail.messagingengine.com>
 <570D1AE4.9010607@mail.de>
Message-ID: <CALGmxE++wm=wuHKdJe1HqgF8wci9J4AmHqQ-HSggO3ufkX_1Ew@mail.gmail.com>

On Tue, Apr 12, 2016 at 8:57 AM, Sven R. Kunze <srkunze at mail.de> wrote:

> As an example: time.sleep takes a number of seconds (notice the primitive
> datatype just like a string) and does not take timedelta.
>
> Why don't we add datetime.timedelta support to time.sleep? Very same thing.


yup -- and if there were a lot of commonly used APIs that took strings, and
multiple timedelta implementations, then it would make sense to introduce a
__seconds_int__ protocol.

I don't think the use-cases rise to that level, myself. Though if someone
wanted to put a call to obj.total_seconds() into time.sleep, that might
actually be worth it :-)
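
A sketch of what that could look like as a wrapper rather than a change
to time.sleep itself. timedelta.total_seconds() is the real method;
sleep_for is a hypothetical name:

```python
import datetime
import time


def sleep_for(delay):
    """Hypothetical wrapper: accept seconds or a datetime.timedelta.

    Returns the number of seconds actually passed to time.sleep().
    """
    if isinstance(delay, datetime.timedelta):
        delay = delay.total_seconds()
    time.sleep(delay)
    return delay


sleep_for(0.001)
sleep_for(datetime.timedelta(milliseconds=1))
```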

(Now that you mention it -- I have a substantial library that uses seconds
internally, and currently has an ugly sometimes integer seconds, sometimes
timedelta API -- maybe I'll introduce that protocol. Not sure why I didn't
think of that before now.)

>> Because I'm passing it to modfoo.dosomethingwithafile() which takes a
>> filename and passes it to shutil, which passes it to builtin open,
>> which passes it to os.open.
>>
>> Should Path grow a dosomethingwithmodfoo method?
>
>
It can't -- modfoo could be a third-party module -- it is impossible for
Path to grow everything that any third party module might support.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From ethan at stoneleaf.us  Tue Apr 12 12:37:13 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 12 Apr 2016 09:37:13 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAPTjJmp_aAAbYWNFcLkMFsWTCj8tP4G6tAfKcgTg=B4kt6PZ-Q@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CAPTjJmp_aAAbYWNFcLkMFsWTCj8tP4G6tAfKcgTg=B4kt6PZ-Q@mail.gmail.com>
Message-ID: <570D2439.7010005@stoneleaf.us>

On 04/12/2016 09:20 AM, Chris Angelico wrote:
> On Wed, Apr 13, 2016 at 2:15 AM, Ethan Furman

>> latin1?  I thought latin1 had a code point for 0-255, so how could using it
>> raise an encoding error?
>
> Latin-1 / ISO-8859-1 defines a character for every byte, so any byte
> string will *decode*. It only defines 256 characters as having
> equivalent bytes, though, so *encoding* can fail.

Ah, right -- so if you start with bytes it cannot fail, if you start 
with a string it can.

--
~Ethan~


From chris.barker at noaa.gov  Tue Apr 12 12:36:32 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 12 Apr 2016 09:36:32 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAPTjJmp_aAAbYWNFcLkMFsWTCj8tP4G6tAfKcgTg=B4kt6PZ-Q@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CAPTjJmp_aAAbYWNFcLkMFsWTCj8tP4G6tAfKcgTg=B4kt6PZ-Q@mail.gmail.com>
Message-ID: <CALGmxEL=NE+JVaU0Cpv+-03tTh=+4=oo1b57FABX8pff8HfLJQ@mail.gmail.com>

On Tue, Apr 12, 2016 at 9:20 AM, Chris Angelico <rosuav at gmail.com> wrote:

> > latin1?  I thought latin1 had a code point for 0-255, so how could
> > using it raise an encoding error?
>
> Latin-1 / ISO-8859-1 defines a character for every byte, so any byte
> string will *decode*. It only defines 256 characters as having
> equivalent bytes, though, so *encoding* can fail.
>

unless it was decoded as latin-1 in the first place. doesn't the surrogate
escape thing only work properly if you decode/encode with the same encoding?
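
A small sketch of what that question is getting at, using only the
built-in codecs (the roundtrip helper is made up for illustration).
Escaped bytes come back unchanged under either codec, but characters
that decoded normally are re-encoded by whichever codec is used on the
way out, so the bytes only round-trip when both directions match:

```python
def roundtrip(raw, decode_with, encode_with):
    """Decode and re-encode `raw`, using surrogateescape both ways."""
    text = raw.decode(decode_with, "surrogateescape")
    return text.encode(encode_with, "surrogateescape")


# A valid UTF-8 sequence for U+00E9 followed by one undecodable byte.
raw = "\u00e9".encode("utf-8") + b"\xff"

same = roundtrip(raw, "utf-8", "utf-8")      # same codec both ways
mixed = roundtrip(raw, "utf-8", "latin-1")   # U+00E9 becomes a single byte
print(same == raw, mixed == raw)
```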

-CHB




Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From ethan at stoneleaf.us  Tue Apr 12 12:39:59 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 12 Apr 2016 09:39:59 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAMiohoj66ac8+-rZDupeO3BpqZd_2BQ67YDhm=bUvkTyoYdubA@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <E90564BC-FD56-46D7-9514-C347283715B2@stufft.io>
 <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7coi+=PcF-5SYksUkdMRvuz+Bx8APKZGaFzHbXgB59t7w@mail.gmail.com>
 <CAMiohoj66ac8+-rZDupeO3BpqZd_2BQ67YDhm=bUvkTyoYdubA@mail.gmail.com>
Message-ID: <570D24DF.4050902@stoneleaf.us>

On 04/12/2016 09:26 AM, Koos Zevenhoven wrote:

> So I'm, once again, posing this question (that I don't think got any
> reactions previously): Is there a significant audience for this new
> function, or is it enough to keep it a private function for the stdlib
> to use?

Quite frankly, I expect the stdlib itself to be the primary consumer. 
But I see no reason to not publish the function so that users who need 
the advanced functionality have easy access to it.

--
~Ethan~


From chris.barker at noaa.gov  Tue Apr 12 12:40:00 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 12 Apr 2016 09:40:00 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAMiohoit+SS3oun32cCGbqJSP+eApG0B2N5ybq3E3hx-YgZomQ@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
 <-9219200259368253896@unknownmsgid> <570C8A40.6020903@canterbury.ac.nz>
 <CALGmxE+-nWPj5AcNCnM2GAGbSqPxnQCixJTk0f=+Orur_TAy+Q@mail.gmail.com>
 <CAMiohoit+SS3oun32cCGbqJSP+eApG0B2N5ybq3E3hx-YgZomQ@mail.gmail.com>
Message-ID: <CALGmxEJGCoorhwcJHRe=MAEdzzTMP3KLAtm3QYBirOoQ+m1kbg@mail.gmail.com>

On Tue, Apr 12, 2016 at 9:32 AM, Koos Zevenhoven <k7hoven at gmail.com> wrote:

> > 1) A "proper" path object -- i.e. pathlib.Path or anything else that
> > supports the path protocol.
> >
> > 2) the bytes that the OS actually needs.
> >
>
> You do have a point there. But since bytes pathnames are deprecated on
> windows,


Ah -- there's the fatal flaw -- even Windows needs bytes at the lowest
level, but the decision was already made there to use str as the
lingua-franca -- i.e. the user NEVER sees a path as a bytestring on
Windows? I guess that's decided then. str is the exchange format.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From random832 at fastmail.com  Tue Apr 12 12:45:42 2016
From: random832 at fastmail.com (Random832)
Date: Tue, 12 Apr 2016 12:45:42 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CALGmxEJGCoorhwcJHRe=MAEdzzTMP3KLAtm3QYBirOoQ+m1kbg@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
 <-9219200259368253896@unknownmsgid> <570C8A40.6020903@canterbury.ac.nz>
 <CALGmxE+-nWPj5AcNCnM2GAGbSqPxnQCixJTk0f=+Orur_TAy+Q@mail.gmail.com>
 <CAMiohoit+SS3oun32cCGbqJSP+eApG0B2N5ybq3E3hx-YgZomQ@mail.gmail.com>
 <CALGmxEJGCoorhwcJHRe=MAEdzzTMP3KLAtm3QYBirOoQ+m1kbg@mail.gmail.com>
Message-ID: <1460479542.3589839.576602801.5DD53A06@webmail.messagingengine.com>

On Tue, Apr 12, 2016, at 12:40, Chris Barker wrote:
> Ah -- there's the fatal flaw -- even Windows needs bytes at the lowest
> level,

Only in the sense that literally everything's bytes at the lowest level.
But the bytes Windows needs are not in an ASCII-compatible encoding so
it's not reasonable to talk about them in the same way as every other
kind of bytes filename.

> but the decision was already made there to use str as the
> lingua-franca -- i.e. the user NEVER sees a path as a bytestring on
> Windows? I guess that's decided then. str is the exchange format.

From barry at barrys-emacs.org  Tue Apr 12 13:03:56 2016
From: barry at barrys-emacs.org (Barry Scott)
Date: Tue, 12 Apr 2016 18:03:56 +0100
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570C13D6.4090609@stoneleaf.us>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <570C13D6.4090609@stoneleaf.us>
Message-ID: <20160412180356.000005a2@barrys-emacs.org>

On Mon, 11 Apr 2016 14:15:02 -0700
Ethan Furman <ethan at stoneleaf.us> wrote:

> We've pretty much decided that we have two options:
> 
> 1. remove pathlib
> 2. make the stdlib work with pathlib
> 
> So we're trying to make option 2 work before falling back to option 1.

I have been doing a lot of porting to Python 3 and have really enjoyed
having pathlib, even in its current state.

In one of my previous projects using Python 2 on Linux we had to write code
to handle files with names that were not UTF-8. (Users could FTP a file
into the file system and it could end up non-UTF-8.)

Today we would have used pathlib to represent paths in the app.
But we would need to be able to detect the paths that do not follow 
the fs encoding rules.

I would suggest a predicate in Path to report that the path cannot be
encoded without the use of surrogates. Not sure what to call the
predicate.
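
As a rough sketch only, such a predicate could be written today as a
free function. The name has_escaped_bytes is hypothetical, and it
assumes the path string came from a PEP 383 (surrogateescape) decode,
which maps undecodable bytes to U+DC80..U+DCFF:

```python
import pathlib


def has_escaped_bytes(path):
    """Hypothetical predicate: True if str(path) holds PEP 383 escapes.

    Undecodable bytes 0x80..0xFF appear as lone surrogates
    U+DC80..U+DCFF after a surrogateescape decode.
    """
    return any("\udc80" <= ch <= "\udcff" for ch in str(path))


print(has_escaped_bytes(pathlib.PurePosixPath("/tmp/ok.txt")))
print(has_escaped_bytes(pathlib.PurePosixPath("/tmp/caf\udce9.txt")))
```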

This can be used by code that cares to handle converting the path into
a suitable presentation string for showing to a user. I'm assuming here
that PEP 383 may not provide a presentation string that is suitable for
showing to users.

In the case of our product we refused to use files that did not encode
to UTF-8 and had a UI to allow the user to fix the name.

I suspect one reason for making files that can only be represented as
bytes() detectable is to avoid security issues. If I had my black hat
on, I would probe a Python 3 app with filenames that are non-UTF-8 and
see if I can break the app.

Barry

From k7hoven at gmail.com  Tue Apr 12 13:31:21 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Tue, 12 Apr 2016 20:31:21 +0300
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <22285.6603.756030.873091@turnbull.sk.tsukuba.ac.jp>
References: <570C1E13.4090909@stoneleaf.us>
 <E90564BC-FD56-46D7-9514-C347283715B2@stufft.io>
 <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7coi+=PcF-5SYksUkdMRvuz+Bx8APKZGaFzHbXgB59t7w@mail.gmail.com>
 <22285.6603.756030.873091@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CAMiohoiLpDaMG5tB9egoS3n8cuNwxV6Lx3AW1PVWGPpEQeew1w@mail.gmail.com>

On Tue, Apr 12, 2016 at 6:52 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>
> (A) Why does anybody need bytes out of a pathlib.Path (or other
>     __fspath__-toting, higher-level API) *inside* the boundary?  Note
>     that the APIs in os (etc) *don't need* bytes because they are
>     already polymorphic.
>

Indeed not from pathlib.*Path , but from DirEntry, which may have a
path as bytes. So the options for DirEntry (or things like Ethan's
'antipathy') are:

(1) Provide bytes or str via the protocol, depending on which type
this DirEntry has

Downside: The protocol needs to support str and bytes.

(2) Decode bytes using os.fsdecode and provide a str via the protocol

Downside: The user passed in bytes and maybe had a reason to do so.
This might lead to a weird mixture of str and bytes in the same code.

(3) Do not implement the protocol when dealing with bytes

Downside: If a function calling os.scandir accepts both bytes and str
in a duck-typing fashion, then, if it adopted something that uses
the new protocol, it would lose its bytes compatibility. This risk might
not be huge, so perhaps (3) is an option?


> (B) If they do, why can't they just apply bytes() to the object?  I
>     understand that that would offend Ethan's aesthetic sense, so it's
>     worth looking for a nice way around it.  But allowing __fspath__
>     to return bytes or str is hideous, because Paths are clearly on
>     the application side of the boundary.
>
> Note that bytes() may not have the serious problem that str() does of
> being too catholic about its argument: nothing in __builtins__ has a
> __bytes__!  Of course there are a few things that do work: ints, and
> sequences of ints.

Good point. But this only applies when the user _explicitly_ deals
with bytes. When the user just deals with the type (str or bytes)
that is passed in, as os.path.* as well as DirEntry now do, this does
not work.

-Koos

From tritium-list at sdamon.com  Tue Apr 12 13:41:12 2016
From: tritium-list at sdamon.com (Alexander Walters)
Date: Tue, 12 Apr 2016 13:41:12 -0400
Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <570D1EE5.4090904@mail.de>
References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de>
 <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de>
 <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com>
 <570C1560.7070105@mail.de>
 <CACac1F_rT4fJnr2NjxB=10CepW6TzL6LSTh4Yk0DTS8OetmEmw@mail.gmail.com>
 <570D0C0B.7000208@mail.de>
 <CALGmxEJvJ_k5CC6s5Wd_GdgXE6krs1Pko5QHSC=Ga7cUOifUkg@mail.gmail.com>
 <570D1EE5.4090904@mail.de>
Message-ID: <570D3338.5060402@sdamon.com>

On 4/12/2016 12:14, Sven R. Kunze wrote:
> I cannot remember us using another datetime library. So, I don't value 
> this "advantage" as much as you do.

They exist, and there are many cases where you would use a datetime 
library other than datetime for various reasons (integration in third 
party systems is only one reason).  But this is just a tangent.

In fact the situation with pathlib is similar to datetime - before the 
inclusion of datetime in the stdlib, there were several datetime 
libraries available.  Before pathlib, there were several path object 
libraries.  Only now, the third party options offer a great deal of 
competition over the stdlib option, thus these many hundreds, if not 
thousands, of emails on the subject.

From chris.barker at noaa.gov  Tue Apr 12 18:37:14 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 12 Apr 2016 15:37:14 -0700
Subject: [Python-Dev] ping on issue 18378: locale.getdefaultlocale() fails
 on recent Mac OS X
Message-ID: <CALGmxEKh-pk9zonXS3Lp+WSvJG0fCbJC3uxRqZMxK8ocY8PMcw@mail.gmail.com>

Hi folks,

There have been multiple reports of folks having failures on startup of
matplotlib, which appears to be due to the most recent OS-X version setting
the locale weirdly. This was identified last summer in this issue:

http://bugs.python.org/issue18378

It looks like the issue was figured out, and even a patch contributed, but
it stalled out before being applied.

I have no idea if the patch is any good, but it would be great to get this
fixed!

-Thanks,
  -Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From stephen at xemacs.org  Tue Apr 12 22:56:50 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 13 Apr 2016 11:56:50 +0900
Subject: [Python-Dev] List posting custom [was: current status of
 discussions]
In-Reply-To: <570D167C.8040202@mail.de>
References: <570C1E13.4090909@stoneleaf.us>
 <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
 <-9219200259368253896@unknownmsgid> <570C8EEE.6050904@stoneleaf.us>
 <570D167C.8040202@mail.de>
Message-ID: <22285.46450.255405.357217@turnbull.sk.tsukuba.ac.jp>

The following is my opinion, as will become obvious, but it's based on
over a decade of observing these lists, and other open source
development lists.  In a context where some core developers have
unsubscribed from these lists, and others regularly report muting
threads with a certain air of asperity, I think it's worth the risk of
seeming arrogant to explain some of the customs (which are complex and
subtle) around posting to Python developer lists.  I'm posting
publicly because there are several new developers whose activity and
fresh perspective is very welcome, but harmony *is* being disturbed,
IMO unnecessarily.

This particular post caught my eye, but it's only an example of one of
the most unharmonious posting styles that has become common recently.
Attribution deliberately removed.

 > Sorry for disturbing this thread's harmony.

*sigh*  There is way too much of this on Python-Ideas recently, and
there shouldn't be any on Python-Dev.  So please don't.  Specifically,
disagreement with an apparently developing consensus is fine but
please avoid this:

 > >> Path is an alternative to os.path -- you don't need to use both.
 > 
 > I agree with that quote of Chris.

It's a waste of time to post *what* you agree with.[1]  Decisions are
not taken by vote in this community, except for the color of the
bikeshed, where it is agreed that *what* decision is taken doesn't
matter, but that some decision should be taken expeditiously.[2]
Chris already stated this position clearly and it's not a "color", so
there is no need to reiterate.  It simply wastes others' time to read
it.  (Whether it was a waste of the poster's time is not for me to
comment on.)

What matters to the decision is *why* you agree (or disagree).  If you
think that some of Chris's arguments are bogus (and should be
disregarded) and others are important, that is valuable information.
It's even better if you can shed additional light on the matter
(example below).

Also, expression of agreement is often a prelude to a request for
information.  "I agree with Z's post.  At least, I have never needed
X.  *When* do you need X?  Let's look for a better way than X!"

Unsupported (dis)agreement to statements about "needs" also may be
taken as *rude*, because others may infer your arrogant claim to know
what *they* do or don't need.  Admittedly there's a difficult
distinction here between Chris's *idiom* where "you don't need to"
translates to "In my understanding, it is generally not necessary to",
and your *unsupported* agreement, which in my dialect of English
changes the emphasis to imply you know better than those who disagree
with you and Chris.  And, of course, the position that others are "too
easily offended" is often reasonable, but you should be aware that
there will be an impact on your reputation and ability to influence
development of Python (even if it doesn't come near the point where
a moderator invokes "Code of Conduct").

"Me too" posts aren't entirely forbidden, but I feel that in Python
custom they are most appropriate when voting on bikeshed colors, and
as applause for a *technically* excellent suggestion.  They should be
avoided in the context of value judgments (of "need" and "simplicity",
for example) for the reason given above.

 > When people want to use your library and it requires a string, they
 > can simply use "my_path.path" and everything still works for them
 > when they switch to pathlib.

This is disrespectful in tone.  I don't know if you're responding to
Ethan here, but he's one of the authors in question.  We *know* that
Ethan doesn't like such inelegant idioms -- he said so -- where "this
object has an appropriate conversion to your argument type, so you
should apply it implicitly" is unambiguous.[3] So for him, it's *not*
so simple.  Since it's not a matter of voting, each proponent should
provide more contexts where preferred programming idioms are
"Pythonic" to sway the sense of the community, or if necessary, the
BDFL.

Where that aesthetic came up was in the context of consistently
wrapping arguments that might be Paths in str, as in

    p = Path(*stuff) or defaultstring
    # 500 lines crossing function and module boundaries!
    with open(str(p)) as f:
        process(f)

I think it was Nick who posted agreement with Ethan on the aesthetics
of str-wrapping.  If that were all, he probably wouldn't have posted
(see fn. 1), but he further pointed out that this application of str
is *dangerous* because *everything* in Python can be coerced to str.
That was a very valuable observation, which swayed the list in favor
of "Uh-oh, we can't recommend 'os.method(str(Path))'!"
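
The hazard is easy to reproduce.  A minimal sketch, in which the helper
names and the FakePath class are hypothetical and the .path attribute is
the one mentioned in the quoted mail above:

```python
def path_text_unsafe(p):
    # str() never fails, so a non-path object silently becomes a "filename".
    return str(p)


def path_text_strict(p):
    # Accept only real strings, or objects exposing a str-valued .path.
    if isinstance(p, str):
        return p
    path = getattr(p, "path", None)
    if isinstance(path, str):
        return path
    raise TypeError("expected str or path object, got " + type(p).__name__)


class FakePath:
    """Stand-in for a path object carrying its text in .path."""

    def __init__(self, path):
        self.path = path


p = None  # e.g. a lookup that failed 500 lines earlier

unsafe = path_text_unsafe(p)  # the literal filename "None"

try:
    path_text_strict(p)
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)
```

With the str() wrapper the mistake surfaces, if at all, as a puzzling
attempt to open a file named "None"; the strict variant fails at the
point of conversion instead.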

This is my last post on this particular topic, but I will be happy to
discuss off-list.  (I may discuss further in public on my blog, but
first I have to get a blog. :-)


Footnotes: 
[1]  "You" is generic here.  There are a couple of developers whose
agreement has the status of pronouncement of Pythonicity.  Aspire to
that, but don't assume it -- very few have it, and it's actually
*very* rarely exercised.  And you can recognize them because they are
*asked* to pronounce -- by people whose statements you thought were
already authoritative!

[2]  And even so votes are often overturned by later arguments, both
theoretical and based in experience.  See for example the several
threads over time on the naming of Py_XSETREF.

[3]  Interpreting Zen koans frequently requires figure-ground
inversion.  In this case we can apply "In the face of ambiguity,
refuse to guess" in the form "in the absence of ambiguity, don't wait
to be asked".  I'm hardly authoritative, but FWIW :-) I think Ethan's
esthetic sense here accords with Pythonicity.


From tjreedy at udel.edu  Wed Apr 13 00:39:11 2016
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 13 Apr 2016 00:39:11 -0400
Subject: [Python-Dev] Not receiving bug tracker emails
In-Reply-To: <ndukuq$or3$1@ger.gmane.org>
References: <CA+eR4cFgk+UmB-3TZP_m-Ng=hx9gYrdOcY_gHPGJfnheDDaKmw@mail.gmail.com>
 <ndukuq$or3$1@ger.gmane.org>
Message-ID: <nekii9$9lr$1@ger.gmane.org>

On 4/4/2016 5:05 PM, Terry Reedy wrote:

For the past few days I have been getting bug tracker emails again, in 
my Inbox.  I just got a Rietveld review in the Inbox, and I believe it 
went there directly instead of first to Junk.  Thank you to whoever made 
the improvements.

-- 
Terry Jan Reedy


From cybersol at yahoo.com  Wed Apr 13 01:37:01 2016
From: cybersol at yahoo.com (Michael Mysinger)
Date: Wed, 13 Apr 2016 05:37:01 +0000 (UTC)
Subject: [Python-Dev] pathlib - current status of discussions
References: <570C1E13.4090909@stoneleaf.us>
Message-ID: <loom.20160413T071958-483@post.gmane.org>

Ethan Furman <ethan <at> stoneleaf.us> writes:
 
> Do we allow bytes to be returned from os.fspath()?  If yes, then do we 
> allow bytes from __fspath__()?

De-lurking. Especially since the ultimate goal is better interoperability, I 
feel like an implementation that people can play with would help guide the 
few remaining decisions. To help test the various options you could 
temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to both 
pathlib.__fspath__() and os.fspath(), with distinct configurable defaults for 
each. 

In the spirit of Python 3 I feel like bytes might not be needed in practice, 
but something like this with defaults of False will allow people to easily 
test all the various options.




From victor.stinner at gmail.com  Wed Apr 13 07:40:44 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 13 Apr 2016 13:40:44 +0200
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
Message-ID: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>

Hi,

Over the last few months, most 3.x buildbots failed randomly. Some of
them were always failing. I spent some time fixing almost all Windows
and Linux buildbots. There were a lot of different issues.

So please try not to break the buildbots again, and remember to watch them
from time to time:

  http://buildbot.python.org/all/waterfall?category=3.x.stable&category=3.x.unstable

Over the next few weeks, I will try to backport some fixes to Python 3.5
(if needed) to make these buildbots more stable too.

Python 2.7 buildbots are also in a sad state (e.g. test_marshal
segfaults on Windows, see issue #25264). But it's not easy to get a
Windows machine with the right compiler to develop Python 2.7 on Windows.

--

Maybe it's time to move more 3.x buildbots to the "stable" category?
http://buildbot.python.org/all/waterfall?category=3.x.stable

By the way, I don't understand why "AMD64 OpenIndiana 3.x" is
considered stable, since it has been failing with multiple issues for
many months and nobody is working on these failures. I suggest moving
this buildbot back to the unstable category.

--

We have many offline buildbots. What's the status of these buildbots?
Should we expect that they come back soon?

Or would it be possible to hide them? It would help to check the
status of all buildbots.

--

Failing buildbots:

- AMD64 FreeBSD CURRENT 3.x: http://bugs.python.org/issue26566 -- I
installed a fresh FreeBSD CURRENT in a VM and I'm unable to reproduce
failures. Maybe the buildbot slave is outdated and FreeBSD must be
upgraded?

- AMD64 OpenIndiana 3.x, x86 OpenIndiana 3.x: test_socket failures on
sendfile. Sorry, but I'm not really interested in this OS.

- PPC64 AIX 3.x: failing tests: test_httplib, test_httpservers,
test_socket, test_distutils, test_asyncio, (...); random timeout
failure in test_eintr, etc. I don't have access to AIX and I'm not
interested in acquiring an AIX license or installing it. I'm not sure
that it's useful to have an AIX buildbot when no core developer has
access to AIX, and nobody is working on AIX failures. Maybe IBM wants
to help us to support AIX? (Provide manpower, access to AIX servers,
or something like that.)

- x86 OpenBSD 3.x: 5 tests failed, test_crypt test_socket test_ssl
test_strptime test_time. This OS needs some love ;-)

- the 4 ICC buildbots are failing with stack overflow, segfault, etc.
Again, I'm not sure that these buildbots are useful since it looks
like we don't support this compiler yet. Or does it help to work on
supporting this compiler? Who is working on ICC support?

--

FYI I also made some enhancements on regrtest (our test runner for the
test suite), mostly to debug failures:

- display the duration of tests taking longer than 30 seconds
- new timestamp prefix, used to debug buildbot hangs
- when parallel tests are interrupted, display progress on waiting for
completion
- add timeout to main process when using -jN: it should help to debug
buildbot hang
- "Run tests in parallel using 3 child processes" or "Run tests
sequentially" message which helps to understand how tests are running.
There is the -j1 trap which has no effect: tests are still run
sequentially. By the way, I proposed to really use subprocesses when
-j1 is used: http://bugs.python.org/issue25285

The default timeout changed from 1 hour to 15 min; this is the maximum
duration allowed for a single test file (e.g. test_os.py). On my Linux box,
running the whole test suite in parallel (10 child processes for my 4
CPU cores with hyperthreading) with Python compiled in debug mode
(slow) takes 4 min 37 sec.

Tell me if the default timeout is too low. It can be configured per
buildbot if needed (TESTTIMEOUT env var).

--

By the way, I'm always surprised by the huge difference in the time needed
to run a build on the different slaves: from a few minutes to more
than 3 hours. The fastest Windows slave takes 28 minutes (it runs tests in
parallel using 4 child processes), whereas the 3 others (which run tests
sequentially) take between 2 hours and more than 3 hours! Why does
running tests on Windows take so long?

Maybe we should make sure that no buildbot runs tests sequentially,
because it creates a lot of annoying side effects (even if it sometimes
helps to find tricky bugs, some of them restricted to the tests
themselves) and because a lot of tests simply wait for a few seconds. So
running multiple tests in parallel doesn't burn your CPU, it's just
faster. IMHO the risk of random timeout failures is low compared to
the speedup.

--

The most interesting bug was a deadlock in locale.setlocale() on
Windows 7: the bug made the buildbot hang "sometimes" (randomly).
Jeremy Kloth identified the bug, but Steve Dower pointed out to us that
it's already fixed in Visual Studio 2015 Update 1, so please update VS if
you have not done so yet. Steve added a post-build test to check if the
ucrtbase/ucrtbased DLL has the known bug.
=> http://bugs.python.org/issue26624

Victor

From rosuav at gmail.com  Wed Apr 13 08:19:34 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 13 Apr 2016 22:19:34 +1000
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
In-Reply-To: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
References: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
Message-ID: <CAPTjJmpMh01+iXm=Zjzf1cdS054YfOgK5qK18YogW22_s0x96w@mail.gmail.com>

On Wed, Apr 13, 2016 at 9:40 PM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> Maybe it's time to move more 3.x buildbots to the "stable" category?
> http://buildbot.python.org/all/waterfall?category=3.x.stable

Move the Bruces into stable, perhaps? The AMD64 Debian Root one. Been
fairly consistently green.

ChrisA

From eric at trueblade.com  Wed Apr 13 08:32:34 2016
From: eric at trueblade.com (Eric V. Smith)
Date: Wed, 13 Apr 2016 08:32:34 -0400
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
In-Reply-To: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
References: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
Message-ID: <570E3C62.2080305@trueblade.com>

On 4/13/2016 7:40 AM, Victor Stinner wrote:
> Over the last few months, most 3.x buildbots failed randomly. Some of
> them were always failing. I spent some time fixing almost all Windows
> and Linux buildbots. There were a lot of different issues.

Thanks for all of your work on this, Victor. It's much appreciated.

Eric.


From mail at timgolden.me.uk  Wed Apr 13 08:56:53 2016
From: mail at timgolden.me.uk (Tim Golden)
Date: Wed, 13 Apr 2016 13:56:53 +0100
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
In-Reply-To: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
References: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
Message-ID: <570E4215.5090101@timgolden.me.uk>

On 13/04/2016 12:40, Victor Stinner wrote:
> Over the last few months, most 3.x buildbots failed randomly. Some of
> them were always failing. I spent some time fixing almost all Windows
> and Linux buildbots. There were a lot of different issues.

Can I state the obvious and offer a huge vote of thanks for this work,
which is often tedious and unrewarding?

Thank you

TJG


From stefan at bytereef.org  Wed Apr 13 09:13:07 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Wed, 13 Apr 2016 13:13:07 +0000 (UTC)
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
References: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
Message-ID: <loom.20160413T150406-192@post.gmane.org>

Victor Stinner <victor.stinner <at> gmail.com> writes:
> Maybe it's time to move more 3.x buildbots to the "stable" category?
> http://buildbot.python.org/all/waterfall?category=3.x.stable

+1 I think anything that is actually stable should be in that category.


> By the way, I don't understand why "AMD64 OpenIndiana 3.x" is
> considered stable, since it has been failing with multiple issues for
> many months and nobody is working on these failures. I suggest moving
> this buildbot back to the unstable category.

+1 The bot was very stable and fast for some time but has been unstable
for at least a year.



> - PPC64 AIX 3.x: failing tests: test_httplib, test_httpservers,
> test_socket, test_distutils, test_asyncio, (...); random timeout
> failure in test_eintr, etc. I don't have access to AIX and I'm not
> interested in acquiring an AIX license or installing it. I'm not sure
> that it's useful to have an AIX buildbot when no core developer has
> access to AIX, and nobody is working on AIX failures. Maybe IBM wants
> to help us to support AIX? (Provide manpower, access to AIX servers,
> or something like that.)

Well, I think in this case it's the gcc AIX maintainer running it, so...


I think we should have a policy to stop reporting issues on unstable
bots unless someone has a concrete fix OR the bot maintainers are
known to fix issues fast (but that does not seem to be the case).



Stefan Krah

From ncoghlan at gmail.com  Wed Apr 13 09:51:02 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 13 Apr 2016 23:51:02 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570D1F26.5090800@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
Message-ID: <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>

On 13 April 2016 at 02:15, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/11/2016 04:43 PM, Victor Stinner wrote:
>>
>> On 11 Apr 2016 at 11:11 PM, "Ethan Furman" wrote:
>
>
>>> So my concern in such a case is what happens if we pass this SE
>>> string somewhere else: a UTF-8 file, or over a socket, or into a
>>> database? Does this have issues that we wouldn't face if we just used
>>> bytes?
>>
>>
>> "SE strings" are returned by os.listdir(str), os.walk(str),
>> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under
>> the sun.
>
>
> So when we pass a bytes object in, Python (on posix) converts that to a
> string using surrogateescape, gets back strings from the os, and encodes
> them back to bytes, again using surrogateescape?

On POSIX, if you pass bytes to the os module, it will pass bytes to
the underlying system API, and then pass bytes back to your
application.

The potentially SE-strings only come back when you pass str, and the
operating system data isn't properly encoded according to the nominal
filesystem encoding. They round trip nicely to other operating system
APIs, but can indeed be a problem if they escape to other parts of
your program (hence ideas like
http://bugs.python.org/issue18814#msg251694 and the preceding
discussion in that issue)
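
A minimal sketch of that round trip, using an explicit UTF-8 codec with
the surrogateescape error handler rather than the os module itself, so
that the result does not depend on the local filesystem encoding; the
file name is made up:

```python
# Latin-1 encoded name: the byte 0xE9 is not valid UTF-8 in this position.
raw = b"caf\xe9.txt"

# Decoding with surrogateescape smuggles the bad byte through as the
# lone surrogate U+DCE9 instead of raising.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly,
# which is why such strings round trip to operating system APIs.
round_trip = text.encode("utf-8", "surrogateescape")

# Anything that encodes strictly (a UTF-8 file, a socket, a database
# driver) rejects the lone surrogate.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The strict encode failing is the "escape to other parts of your
program" problem: the string looks like ordinary text until something
outside the os layer tries to serialise it.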

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Wed Apr 13 10:04:29 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 00:04:29 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CALGmxE+-nWPj5AcNCnM2GAGbSqPxnQCixJTk0f=+Orur_TAy+Q@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
 <-9219200259368253896@unknownmsgid>
 <570C8A40.6020903@canterbury.ac.nz>
 <CALGmxE+-nWPj5AcNCnM2GAGbSqPxnQCixJTk0f=+Orur_TAy+Q@mail.gmail.com>
Message-ID: <CADiSq7ftfnfG1DLX9mmi-ofhUBxOKVAw5vZiv9Ju8TNsUMcyTg@mail.gmail.com>

On 13 April 2016 at 02:19, Chris Barker <chris.barker at noaa.gov> wrote:
> So: why use strings as the lingua franca of paths? i.e. the basis of the
> path protocol. maybe we should support only two path representations:
>
> 1) A "proper" path object -- i.e. pathlib.Path or anything else that
> supports the path protocol.
>
> 2) the bytes that the OS actually needs.
>
> this would mean that the protocol would be to have a __pathbytes__() method
> that would return the bytes that should be passed off to the OS.

The reason to favour strings over raw bytes for path manipulation is
the same reason to favour them anywhere else: to avoid having to worry
about encodings *while* you're manipulating things, and instead only
worry about the encoding when actually talking to the OS (which may be
UTF-16-LE to talk to a Windows API, or UTF-8 to talk to a *nix API, or
something else entirely if your OS is set up that way, or you're
writing the path to a file or network packet, rather than using it
locally).

Regardless of what we decide about os.fspath's return type, that
general principle won't change - if you're manipulating bytes paths
directly, you're doing something relatively specialised (like working
on CPython's own os module).
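
As a rough illustration of that principle (the path itself is made up,
and posixpath is used only so the result is the same on every platform):

```python
import posixpath

# Manipulate the path as text; no encoding is involved at this stage.
name = posixpath.join("donn\u00e9es", "r\u00e9sum\u00e9.txt")
stem, ext = posixpath.splitext(posixpath.basename(name))

# Only at the boundary is an encoding chosen, and the right one
# depends on who consumes the path.
for_nix_api = name.encode("utf-8")
for_windows_api = name.encode("utf-16-le")
```

The same str value yields two different byte sequences, so code that
manipulated the bytes directly would have to know in advance which
consumer it was preparing them for.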

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From p.f.moore at gmail.com  Wed Apr 13 10:11:13 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 13 Apr 2016 15:11:13 +0100
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
Message-ID: <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>

On 13 April 2016 at 14:51, Nick Coghlan <ncoghlan at gmail.com> wrote:
> The potentially SE-strings only come back when you pass str, and the
> operating system data isn't properly encoded according to the nominal
> filesystem encoding. They round trip nicely to other operating system
> APIs, but can indeed be a problem if they escape to other parts of
> your program

If the operating system APIs handle SE-strings correctly, is it not
acceptable to require the fspath protocol to return strings, and then
places like DirEntry or Ethan's module, when they want to return
bytes, can just SE-encode the bytes and return those?

Or will the fspath protocol be used at a low enough level that it's
*below* the point where SE-encoded strings are handled properly?

Paul

From ncoghlan at gmail.com  Wed Apr 13 10:21:37 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 00:21:37 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
Message-ID: <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>

On 14 April 2016 at 00:11, Paul Moore <p.f.moore at gmail.com> wrote:
> On 13 April 2016 at 14:51, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> The potentially SE-strings only come back when you pass str, and the
>> operating system data isn't properly encoded according to the nominal
>> filesystem encoding. They round trip nicely to other operating system
>> APIs, but can indeed be a problem if they escape to other parts of
>> your program
>
> If the operating system APIs handle SE-strings correctly, is it not
> acceptable to require the fspath protocol to return strings, and then
> places like DirEntry or Ethan's module, when they want to return
> bytes, can just SE-encode the bytes and return those?
>
> Or will the fspath protocol be used at a low enough level that it's
> *below* the point where SE-encoded strings are handled properly?

I'd expect the main consumers to be os and os.path, and would honestly
be surprised if we needed many explicit invocations above that layer,
other than in pathlib itself.

That's actually the main factor in my suggesting the two level API
design - from a protocol consumer perspective, bytes-or-str is a
natural fit for os and os.path, while str-only is a natural fit for
pathlib.

I also now believe it makes sense to postpone a final decision on this
aspect of the design until after a draft implementation has been put
together, as my and Ethan's assumption that os and os.path will be the
main consumers is exactly that: an assumption. Putting the draft
implementation together will let us know whether or not it's an
accurate one.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ethan at stoneleaf.us  Wed Apr 13 11:09:36 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 13 Apr 2016 08:09:36 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
Message-ID: <570E6130.6080507@stoneleaf.us>

On 04/13/2016 07:21 AM, Nick Coghlan wrote:
> On 14 April 2016 at 00:11, Paul Moore wrote:
>> On 13 April 2016 at 14:51, Nick Coghlan wrote:

>>> The potential SE-strings only come back when you pass str, and the
>>> operating system data isn't properly encoded according to the nominal
>>> filesystem encoding. They round trip nicely to other operating system
>>> APIs, but can indeed be a problem if they escape to other parts of
>>> your program
>>
>> If the operating system APIs handle SE-strings correctly, is it not
>> acceptable to require the fspath protocol to return strings, and then
>> places like DirEntry or Ethan's module, when they want to return
>> bytes, can just SE-encode the bytes and return those?
>>
>> Or will the fspath protocol be used at a low enough level that it's
>> *below* the point where SE-encoded strings are handled properly?
>
> I'd expect the main consumers to be os and os.path, and would honestly
> be surprised if we needed many explicit invocations above that layer,
> other than in pathlib itself.
>
> That's actually the main factor in my suggesting the two level API
> design - from a protocol consumer perspective, bytes-or-str is a
> natural fit for os and os.path, while str-only is a natural fit for
> pathlib.
>
> I also now believe it makes sense to postpone a final decision on this
> aspect of the design until after a draft implementation has been put
> together, as my and Ethan's assumption that os and os.path will be the
> main consumers is exactly that: an assumption. Putting the draft
> implementation together will let us know whether or not it's an
> accurate one.

Sounds reasonable.

However, there is still one choice that needs to be made:

- a single os.fspath() with an allow_bytes parameter
   (mostly True in os and os.path, mostly False everywhere
   else)

- a str-only os.fspathname() and a str/bytes os.fspath()

I'm partial to the first choice as it is simplicity itself to know when 
looking at it if bytes might be coming back by the presence or absence 
of a second argument to the call; otherwise one has to keep straight in 
one's head which is str-only and which might allow bytes (I'm not very 
good at keeping similar sounding functions separate -- what's the 
difference between shutil.copy and shutil.copy2?  I have to look it up 
every time).
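
To compare the two shapes side by side, here is a rough pure-Python
sketch.  Nothing here is an actual os API: fspath_flag, fspath_any and
the two demo classes are hypothetical names, and the lookup rules are
only an assumption about how such a protocol might behave.

```python
def fspath_flag(path, allow_bytes=False):
    """Option 1: a single function; bytes come back only on request."""
    if isinstance(path, str) or (allow_bytes and isinstance(path, bytes)):
        return path
    # Look the method up on the type, as is usual for special methods.
    method = getattr(type(path), "__fspath__", None)
    if method is not None:
        result = method(path)
        if isinstance(result, str) or (
                allow_bytes and isinstance(result, bytes)):
            return result
    raise TypeError("no acceptable path representation for "
                    + type(path).__name__)


def fspath_any(path):
    """Option 2, str/bytes variant (the role proposed for os.fspath())."""
    return fspath_flag(path, allow_bytes=True)


def fspathname(path):
    """Option 2, str-only variant (the role proposed for os.fspathname())."""
    return fspath_flag(path, allow_bytes=False)


class TextPath:
    def __fspath__(self):
        return "spam/eggs"


class BytesPath:
    def __fspath__(self):
        return b"spam/eggs"
```

In this sketch a call such as fspath_flag(p, allow_bytes=True) announces
at the call site that bytes may come back, whereas with the two-function
form the reader has to remember which of fspath_any and fspathname is
the permissive one.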

--
~Ethan~

From random832 at fastmail.com  Wed Apr 13 11:17:41 2016
From: random832 at fastmail.com (Random832)
Date: Wed, 13 Apr 2016 11:17:41 -0400
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
Message-ID: <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>

On Wed, Apr 13, 2016, at 10:21, Nick Coghlan wrote:
> I'd expect the main consumers to be os and os.path, and would honestly
> be surprised if we needed many explicit invocations above that layer,
> other than in pathlib itself.

I made a toy implementation to try this out, and making os.open support
it does not get you builtin open "for free" as I had suspected; builtin
open has its own type checks in _iomodule.c.

Probably anything not implemented in pure python that deals with
filenames is going to have to have its type checking revised.

From ethan at stoneleaf.us  Wed Apr 13 11:28:28 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 13 Apr 2016 08:28:28 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
Message-ID: <570E659C.8010108@stoneleaf.us>

On 04/13/2016 08:17 AM, Random832 wrote:
> On Wed, Apr 13, 2016, at 10:21, Nick Coghlan wrote:

>> I'd expect the main consumers to be os and os.path, and would honestly
>> be surprised if we needed many explicit invocations above that layer,
>> other than in pathlib itself.
>
> I made a toy implementation to try this out, and making os.open support
> it does not get you builtin open "for free" as I had suspected; builtin
> open has its own type checks in _iomodule.c.

Yup, it will take some effort to make this work.

> Probably anything not implemented in pure python that deals with
> filenames is going to have to have its type checking revised.

Agreed.

You can see why there was no point in pursuing the conversation unless 
someone was willing to do the work.

--
~Ethan~


From fred at fdrake.net  Wed Apr 13 12:18:36 2016
From: fred at fdrake.net (Fred Drake)
Date: Wed, 13 Apr 2016 12:18:36 -0400
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570E6130.6080507@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <570E6130.6080507@stoneleaf.us>
Message-ID: <CAFT4OTHng5dAESHNnrzvJk9GFxoLU_py_jov1j0aMPNWKyMYhw@mail.gmail.com>

On Wed, Apr 13, 2016 at 11:09 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> - a single os.fspath() with an allow_bytes parameter
>   (mostly True in os and os.path, mostly False everywhere
>   else)

-0

> - a str-only os.fspathname() and a str/bytes os.fspath()

+1 on using separate functions.

> I'm partial to the first choice as it is simplicity itself to know when
> looking at it if bytes might be coming back by the presence or absence of a
> second argument to the call; otherwise one has to keep straight in one's
> head which is str-only and which might allow bytes (I'm not very good at
> keeping similar sounding functions separate -- what's the difference between
> shutil.copy and shutil.copy2?  I have to look it up every time).

I do the same, but... this is one of those cases where a caller will
usually be passing a constant directly. If passed as a positional
argument, it'll just be confusing ("what's True?" is my usual reaction
to a Boolean positional argument). If passed as a keyword argument
with a descriptive name, it'll be longer than I'd like to see:

    path_str = os.fspath(path, allow_bytes=True)

Names like os.fspath() and os.fssyspath() seem good to me.


  -Fred

-- 
Fred L. Drake, Jr.    <fred at fdrake.net>
"A storm broke loose in my mind."  --Albert Einstein

From victor.stinner at gmail.com  Wed Apr 13 12:24:44 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 13 Apr 2016 18:24:44 +0200
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
Message-ID: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>

Hi,

During the recent discussions about Python performance, changing the
Python bytecode came up. Serhiy proposed reusing MicroPython's short
bytecode to reduce both disk space and memory footprint.

Demur Rumed proposes a different change: a regular bytecode using
16-bit units, where an instruction always has one 8-bit argument, which
is zero if the instruction doesn't take an argument:

   http://bugs.python.org/issue26647

According to benchmarks, it looks faster:

  http://bugs.python.org/issue26647#msg263339

IMHO it's a nice enhancement: it makes the code simpler. The most
interesting change is made in Python/ceval.c:

-        if (HAS_ARG(opcode))
-            oparg = NEXTARG();
+        oparg = NEXTARG();

This code is in the very hot loop that evaluates Python bytecode. I
expect that removing a conditional branch here can reduce CPU branch
mispredictions.
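
As a rough illustration of why the fetch becomes unconditional, here is a toy decoder for both layouts written in Python rather than C. It is only a sketch: the HAVE_ARGUMENT value of 90 is assumed to mirror CPython's threshold, the function names are made up for this example, and EXTENDED_ARG handling is left out.

```python
HAVE_ARGUMENT = 90  # assumed threshold: opcodes >= this take an argument


def decode_old(code):
    """Old layout: 1 byte per opcode, plus a 2-byte little-endian argument
    only when the opcode takes one (the conditional branch)."""
    out = []
    i = 0
    while i < len(code):
        opcode = code[i]
        i += 1
        oparg = 0
        if opcode >= HAVE_ARGUMENT:      # the branch wordcode removes
            oparg = code[i] | (code[i + 1] << 8)
            i += 2
        out.append((opcode, oparg))
    return out


def decode_wordcode(code):
    """Wordcode layout: every instruction is exactly 2 bytes, so the
    argument byte is always fetched, with no test on the opcode."""
    return [(code[i], code[i + 1]) for i in range(0, len(code), 2)]
```

Both decoders yield the same (opcode, oparg) pairs for equivalent input; the second one simply never has to ask whether an argument follows.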

I reviewed the first versions of the change, and IMHO it's almost ready
to be merged. But I would prefer to have a review from at least a second
core reviewer.

Can someone please review the change?

--

A side effect of wordcode is that instructions with arguments in 0..255
now use 2 bytes instead of 3, so it also reduces the size of the
bytecode for the most common case.

A larger, 16-bit argument (0..65,535) now uses 4 bytes instead of 3.
Arguments are supported up to 32 bits: a 24-bit argument uses 3 units
(6 bytes), a 32-bit argument uses 4 units (8 bytes). MAKE_FUNCTION uses
a 16-bit argument for keyword defaults and a 24-bit argument for
annotations. Other common instructions known to use large arguments are
jumps in bytecode longer than 256 bytes.
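
The size figures above can be checked with a small calculation. This is a sketch under two assumptions: that wordcode spends one extra 2-byte unit for every additional 8 bits of argument, and that the old format used 3 bytes per argument-taking instruction plus a 3-byte prefix once the argument no longer fit in 16 bits. The function names are invented for this example.

```python
def old_size(arg):
    """Bytes for one argument-taking instruction in the old format
    (assumed: opcode + 16-bit arg, plus a 3-byte prefix beyond 16 bits)."""
    if not 0 <= arg < 2 ** 32:
        raise ValueError("argument must fit in 32 bits")
    return 3 if arg <= 0xFFFF else 6


def wordcode_size(arg):
    """Bytes for the same instruction as wordcode: one 2-byte unit per
    8 bits of argument, with a minimum of one unit."""
    if not 0 <= arg < 2 ** 32:
        raise ValueError("argument must fit in 32 bits")
    units = 1
    while arg > 0xFF:
        arg >>= 8
        units += 1
    return 2 * units


for value in (0, 255, 256, 65535, 65536, 2 ** 24, 2 ** 32 - 1):
    print(value, old_size(value), wordcode_size(value))
```

Under these assumptions wordcode is smaller for arguments up to 255, one byte larger for 16-bit arguments, equal for 24-bit arguments and two bytes larger for 32-bit arguments.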

--

Right now, ceval.c still fetches opcode and then oparg with two separate
8-bit reads. Later, we can discuss whether it would be possible to
ensure that the bytecode is always aligned to 16 bits in memory, to
fetch the two bytes using a uint16_t* pointer.

Maybe we can overallocate 1 byte in codeobject.c and align manually
the memory block if needed. Or ceval.c should maybe copy the code if
it's not aligned?

Raymond Hettinger proposes something like that, but it looks like
there are concerns about non-aligned memory accesses:

   http://bugs.python.org/issue25823

The cost of non-aligned memory accesses depends on the CPU
architecture, but they can raise SIGBUS on some architectures (MIPS and
SPARC?).

Victor

From brett at python.org  Wed Apr 13 12:26:34 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 16:26:34 +0000
Subject: [Python-Dev] Not receiving bug tracker emails
In-Reply-To: <nekii9$9lr$1@ger.gmane.org>
References: <CA+eR4cFgk+UmB-3TZP_m-Ng=hx9gYrdOcY_gHPGJfnheDDaKmw@mail.gmail.com>
 <ndukuq$or3$1@ger.gmane.org> <nekii9$9lr$1@ger.gmane.org>
Message-ID: <CAP1=2W4Pzim9w7Z5Jz+7hb9sjzhmhzAkR1as0Wt0eLn_w8vy8w@mail.gmail.com>

Glad it's working again! And it was a combination of R. David Murray, Ezio
Melotti, Mark Mangoba (
http://pyfound.blogspot.com/2016/04/the-psf-has-hired-it-manager.html in
case you don't know who Mark is), and myself along with Upfront (b.p.o
hosting provider).

On Tue, 12 Apr 2016 at 21:40 Terry Reedy <tjreedy at udel.edu> wrote:

> On 4/4/2016 5:05 PM, Terry Reedy wrote:
>
> For a few days now, I have been getting bug tracker emails again, in my Inbox.  I
> just got a Rietveld review in the Inbox and I believe it went there
> directly instead of first to Junk.  Thank you to whoever made the
> improvements.
>
> --
> Terry Jan Reedy
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/brett%40python.org
>

From p.f.moore at gmail.com  Wed Apr 13 12:27:48 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 13 Apr 2016 17:27:48 +0100
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAFT4OTHng5dAESHNnrzvJk9GFxoLU_py_jov1j0aMPNWKyMYhw@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <570E6130.6080507@stoneleaf.us>
 <CAFT4OTHng5dAESHNnrzvJk9GFxoLU_py_jov1j0aMPNWKyMYhw@mail.gmail.com>
Message-ID: <CACac1F9MGyjbAxqO17ojrCHL12HvU9dNJ2mFTNfy0Zg1iuHtnA@mail.gmail.com>

On 13 April 2016 at 17:18, Fred Drake <fred at fdrake.net> wrote:
> Names like os.fspath() and os.fssyspath() seem good to me.

-1 on fssyspath - the "system" representation is bytes on POSIX, but
not on Windows. Let's be explicit and go with fsbytespath().

But agreed that always-constant boolean parameters are a bad idea. The
hard bit is good naming of the separate functions (100% agree that
shutil is a good example of how not to do it :-))

Paul

From ethan at stoneleaf.us  Wed Apr 13 12:30:37 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 13 Apr 2016 09:30:37 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAFT4OTHng5dAESHNnrzvJk9GFxoLU_py_jov1j0aMPNWKyMYhw@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <570E6130.6080507@stoneleaf.us>
 <CAFT4OTHng5dAESHNnrzvJk9GFxoLU_py_jov1j0aMPNWKyMYhw@mail.gmail.com>
Message-ID: <570E742D.2050303@stoneleaf.us>

On 04/13/2016 09:18 AM, Fred Drake wrote:
> On Wed, Apr 13, 2016 at 11:09 AM, Ethan Furman wrote:
>> - a single os.fspath() with an allow_bytes parameter
>>    (mostly True in os and os.path, mostly False everywhere
>>    else)
>
> -0
>
>> - a str-only os.fspathname() and a str/bytes os.fspath()
>
> +1 on using separate functions.

> Names like os.fspath() and os.fssyspath() seem good to me.

Ooh, I like that!  I could probably keep those names separate in my 
head.  :)

--
~Ethan~


From ethan at stoneleaf.us  Wed Apr 13 12:31:32 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 13 Apr 2016 09:31:32 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CACac1F9MGyjbAxqO17ojrCHL12HvU9dNJ2mFTNfy0Zg1iuHtnA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>	<CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>	<570BCE39.8090306@stoneleaf.us>	<CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>	<570BDB17.5000601@stoneleaf.us>	<CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>	<570BECC6.1080708@stoneleaf.us>	<CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>	<CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>	<570C12C2.9000602@stoneleaf.us>	<CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>	<570D1F26.5090800@stoneleaf.us>	<CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>	<CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>	<CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>	<570E6130.6080507@stoneleaf.us>	<CAFT4OTHng5dAESHNnrzvJk9GFxoLU_py_jov1j0aMPNWKyMYhw@mail.gmail.com>
 <CACac1F9MGyjbAxqO17ojrCHL12HvU9dNJ2mFTNfy0Zg1iuHtnA@mail.gmail.com>
Message-ID: <570E7464.2070008@stoneleaf.us>

On 04/13/2016 09:27 AM, Paul Moore wrote:
> On 13 April 2016 at 17:18, Fred Drake wrote:

>> Names like os.fspath() and os.fssyspath() seem good to me.
>
> -1 on fssyspath - the "system" representation is bytes on POSIX, but
> not on Windows. Let's be explicit and go with fsbytespath().

It will be confusing that fsbytespath() can return a string.

--
~Ethan~


From fred at fdrake.net  Wed Apr 13 12:31:09 2016
From: fred at fdrake.net (Fred Drake)
Date: Wed, 13 Apr 2016 12:31:09 -0400
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CACac1F9MGyjbAxqO17ojrCHL12HvU9dNJ2mFTNfy0Zg1iuHtnA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <570E6130.6080507@stoneleaf.us>
 <CAFT4OTHng5dAESHNnrzvJk9GFxoLU_py_jov1j0aMPNWKyMYhw@mail.gmail.com>
 <CACac1F9MGyjbAxqO17ojrCHL12HvU9dNJ2mFTNfy0Zg1iuHtnA@mail.gmail.com>
Message-ID: <CAFT4OTGCizmGTRvxQy=cyrMjT4rdEq23geYRz5GoJxxwNno4fg@mail.gmail.com>

On Wed, Apr 13, 2016 at 12:27 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> -1 on fssyspath - the "system" representation is bytes on POSIX, but
> not on Windows. Let's be explicit and go with fsbytespath().

Depends on the semantics; if we're expecting it to return
str-or-bytes, os.fssyspath() seems fine.  If only returning bytes (not
sure that ever makes sense on Windows, since I don't use Windows),
then I'd be happy with os.fsbytespath().


  -Fred

-- 
Fred L. Drake, Jr.    <fred at fdrake.net>
"A storm broke loose in my mind."  --Albert Einstein

From guido at python.org  Wed Apr 13 12:33:34 2016
From: guido at python.org (Guido van Rossum)
Date: Wed, 13 Apr 2016 09:33:34 -0700
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
Message-ID: <CAP7+vJK5EdoNA6aQNL328bBgr++Tpc+gaQqcKu=j8ANK7hBsOA@mail.gmail.com>

Nice work. I think that for CPython, speed is much more important than
memory use for the code. Disk space is practically free for anything
smaller than a video. :-)

On Wed, Apr 13, 2016 at 9:24 AM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> Hi,
>
> During the recent discussions about Python performance, changing the
> Python bytecode came up. Serhiy proposed reusing MicroPython's short
> bytecode to reduce both disk space and memory footprint.
>
> Demur Rumed proposes a different change: a regular bytecode using
> 16-bit units, where an instruction always has one 8-bit argument, which
> is zero if the instruction doesn't take an argument:
>
>    http://bugs.python.org/issue26647
>
> According to benchmarks, it looks faster:
>
>   http://bugs.python.org/issue26647#msg263339
>
> IMHO it's a nice enhancement: it makes the code simpler. The most
> interesting change is made in Python/ceval.c:
>
> -        if (HAS_ARG(opcode))
> -            oparg = NEXTARG();
> +        oparg = NEXTARG();
>
> This code is in the very hot loop that evaluates Python bytecode. I
> expect that removing a conditional branch here can reduce CPU branch
> mispredictions.
>
> I reviewed the first versions of the change, and IMHO it's almost ready
> to be merged. But I would prefer to have a review from at least a second
> core reviewer.
>
> Can someone please review the change?
>
> --
>
> A side effect of wordcode is that instructions with arguments in 0..255
> now use 2 bytes instead of 3, so it also reduces the size of the
> bytecode for the most common case.
>
> A larger, 16-bit argument (0..65,535) now uses 4 bytes instead of 3.
> Arguments are supported up to 32 bits: a 24-bit argument uses 3 units
> (6 bytes), a 32-bit argument uses 4 units (8 bytes). MAKE_FUNCTION uses
> a 16-bit argument for keyword defaults and a 24-bit argument for
> annotations. Other common instructions known to use large arguments are
> jumps in bytecode longer than 256 bytes.
>
> --
>
> Right now, ceval.c still fetches opcode and then oparg with two separate
> 8-bit reads. Later, we can discuss whether it would be possible to
> ensure that the bytecode is always aligned to 16 bits in memory, to
> fetch the two bytes using a uint16_t* pointer.
>
> Maybe we can overallocate 1 byte in codeobject.c and align manually
> the memory block if needed. Or ceval.c should maybe copy the code if
> it's not aligned?
>
> Raymond Hettinger proposes something like that, but it looks like
> there are concerns about non-aligned memory accesses:
>
>    http://bugs.python.org/issue25823
>
> The cost of non-aligned memory accesses depends on the CPU
> architecture, but it can raise a SIGBUS on some arch (MIPS and
> SPARC?).
>
> Victor



-- 
--Guido van Rossum (python.org/~guido)

From p.f.moore at gmail.com  Wed Apr 13 12:41:11 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 13 Apr 2016 17:41:11 +0100
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570E7464.2070008@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <570E6130.6080507@stoneleaf.us>
 <CAFT4OTHng5dAESHNnrzvJk9GFxoLU_py_jov1j0aMPNWKyMYhw@mail.gmail.com>
 <CACac1F9MGyjbAxqO17ojrCHL12HvU9dNJ2mFTNfy0Zg1iuHtnA@mail.gmail.com>
 <570E7464.2070008@stoneleaf.us>
Message-ID: <CACac1F8bLjpT-EgjVuBsxopq-HYPny+x0EMv-qPyZuFEJKdmMw@mail.gmail.com>

On 13 April 2016 at 17:31, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/13/2016 09:27 AM, Paul Moore wrote:
>>
>> On 13 April 2016 at 17:18, Fred Drake wrote:
>
>
>>> Names like os.fspath() and os.fssyspath() seem good to me.
>>
>>
>> -1 on fssyspath - the "system" representation is bytes on POSIX, but
>> not on Windows. Let's be explicit and go with fsbytespath().
>
>
> It will be confusing that fsbytespath() can return a string.

Oh, wait, yes fssyspath is for allow_bytes=True which *may* be bytes,
but could still be a string. My mistake. On that basis, I could go
with fssyspath (thinking "sys" = "low level").

Paul

From brett at python.org  Wed Apr 13 12:43:31 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 16:43:31 +0000
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
In-Reply-To: <loom.20160413T150406-192@post.gmane.org>
References: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
 <loom.20160413T150406-192@post.gmane.org>
Message-ID: <CAP1=2W6-t81SvA_ZOn1qRM1Gt2sAf55UncTW8uWAGjzPYCPy8A@mail.gmail.com>

On Wed, 13 Apr 2016 at 06:14 Stefan Krah <stefan at bytereef.org> wrote:

> Victor Stinner <victor.stinner <at> gmail.com> writes:
> > Maybe it's time to move more 3.x buildbots to the "stable" category?
> > http://buildbot.python.org/all/waterfall?category=3.x.stable
>
> +1 I think anything that is actually stable should be in that category.
>
>
> > By the way, I don't understand why "AMD64 OpenIndiana 3.x" is
> > considered stable, since it has been failing with multiple issues for
> > many months and nobody is working on these failures. I suggest moving
> > this buildbot back to the unstable category.
>
> +1 The bot was very stable and fast for some time but has been unstable
> for at least a year.
>
>
>
> > - PPC64 AIX 3.x: failing tests: test_httplib, test_httpservers,
> > test_socket, test_distutils, test_asyncio, (...); random timeout
> > failure in test_eintr, etc. I don't have access to AIX and I'm not
> > interested in acquiring an AIX license, nor in installing it. I'm not
> > sure that it's useful to have an AIX buildbot when no core developer has
> > access to AIX and nobody is working on AIX failures. Maybe HP wants
> > to help us to support AIX? (Provide manpower, access to AIX servers,
> > or something like that.)
>
> Well, I think in this case it's the gcc AIX maintainer running it, so...
>
>
> I think we should have a policy to stop reporting issues on unstable
> bots unless someone has a concrete fix OR the bot maintainers are
> known to fix issues fast (but that does not seem to be the case).
>

Official policy per
https://www.python.org/dev/peps/pep-0011/#supporting-platforms states that
there must be a core developer to maintain the compatibility, so if there's
no one helping to keep a particular buildbot green then I agree it should
be marked as unstable and thus not supported.

From brett at python.org  Wed Apr 13 12:44:08 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 16:44:08 +0000
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
In-Reply-To: <570E4215.5090101@timgolden.me.uk>
References: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
 <570E4215.5090101@timgolden.me.uk>
Message-ID: <CAP1=2W776KHxZvyEibrzLNjkiPTngbDKkH+0W+SGYPkGN36Y+w@mail.gmail.com>

On Wed, 13 Apr 2016 at 05:57 Tim Golden <mail at timgolden.me.uk> wrote:

> On 13/04/2016 12:40, Victor Stinner wrote:
> > In recent months, most 3.x buildbots failed randomly. Some of them were
> > always failing. I spent some time to fix almost all Windows and Linux
> > buildbots. There were a lot of different issues.
>
> Can I state the obvious and offer a huge vote of thanks for this work,
> which is often tedious and unrewarding?
>

Yep, big thanks from me as well!

From random832 at fastmail.com  Wed Apr 13 12:51:29 2016
From: random832 at fastmail.com (Random832)
Date: Wed, 13 Apr 2016 12:51:29 -0400
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570E659C.8010108@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
Message-ID: <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>

On Wed, Apr 13, 2016, at 11:28, Ethan Furman wrote:
> On 04/13/2016 08:17 AM, Random832 wrote:
> > On Wed, Apr 13, 2016, at 10:21, Nick Coghlan wrote:
> 
> >> I'd expect the main consumers to be os and os.path, and would honestly
> >> be surprised if we needed many explicit invocations above that layer,
> >> other than in pathlib itself.
> >
> > I made a toy implementation to try this out, and making os.open support
> > it does not get you builtin open "for free" as I had suspected; builtin
> > open has its own type checks in _iomodule.c.
> 
> Yup, it will take some effort to make this work.

A corner case just occurred to me...

For functions that will continue to accept str/bytes (and functions that
accept some other type such as Number or file-like objects), what should
be done with an object that is one of these, *and* has an __fspath__
method, *and* this method returns a value other than the object's own
value? Basically, should the protocol check be done unconditionally
(before attempting to use the argument as a string) or only if the
argument is not a string (there's an efficiency argument for this). Or
should it be left "unspecified", with the understanding that such
objects are badly behaved and may not be handled consistently across
different functions / python implementations / cpython versions?

Also, should the os.fspath (or whatever we call it) function itself
accept str/bytes, even if these are not going to implement the protocol?
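
To make the corner case concrete, here is a small sketch of the two possible orderings. OddStr and both function names are hypothetical; the sketch does not claim either ordering is the one that will be adopted.

```python
class OddStr(str):
    """A badly behaved str subclass whose __fspath__ disagrees with its
    own string value."""

    def __fspath__(self):
        return "/somewhere/else"


def fspath_protocol_first(path):
    """Check the protocol unconditionally, before looking at the type."""
    method = getattr(type(path), "__fspath__", None)
    if method is not None:
        return method(path)
    if isinstance(path, (str, bytes)):
        return path
    raise TypeError("not a path: " + type(path).__name__)


def fspath_str_first(path):
    """Take the fast path for str/bytes; consult the protocol only for
    other types."""
    if isinstance(path, (str, bytes)):
        return path
    method = getattr(type(path), "__fspath__", None)
    if method is not None:
        return method(path)
    raise TypeError("not a path: " + type(path).__name__)


odd = OddStr("here")
print(fspath_protocol_first(odd))  # uses the method's result
print(fspath_str_first(odd))       # uses the string value itself
```

For ordinary str and bytes arguments the two orderings agree; they only diverge for objects like OddStr, which is why leaving the behaviour unspecified is a realistic option.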

From brett at python.org  Wed Apr 13 12:58:22 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 16:58:22 +0000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAFT4OTHng5dAESHNnrzvJk9GFxoLU_py_jov1j0aMPNWKyMYhw@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <570E6130.6080507@stoneleaf.us>
 <CAFT4OTHng5dAESHNnrzvJk9GFxoLU_py_jov1j0aMPNWKyMYhw@mail.gmail.com>
Message-ID: <CAP1=2W5tJkR=hr-13oHYVfBhNAuHMbTyB3jwx3VLXGQE__tocA@mail.gmail.com>

On Wed, 13 Apr 2016 at 09:19 Fred Drake <fred at fdrake.net> wrote:

> On Wed, Apr 13, 2016 at 11:09 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> > - a single os.fspath() with an allow_bytes parameter
> >   (mostly True in os and os.path, mostly False everywhere
> >   else)
>
> -0
>
> > - a str-only os.fspathname() and a str/bytes os.fspath()
>
> +1 on using separate functions.
>
> > I'm partial to the first choice as it is simplicity itself to know when
> > looking at it if bytes might be coming back by the presence or absence
> of a
> > second argument to the call; otherwise one has to keep straight in one's
> > head which is str-only and which might allow bytes (I'm not very good at
> > keeping similar sounding functions separate -- what's the difference
> between
> > shutil.copy and shutil.copy2?  I have to look it up every time).
>
> I do the same, but... this is one of those cases where a caller will
> usually be passing a constant directly. If passed as a positional
> argument, it'll just be confusing ("what's True?" is my usual reaction
> to a Boolean positional argument).


It would be keyword-only so this isn't even a possibility.
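
To make the keyword-only point concrete, here is a minimal sketch. The
function below is a hypothetical stand-in, not the real os.fspath(); the
name allow_bytes comes from this thread, everything else is assumed for
illustration only.

```python
def fspath(path, *, allow_bytes=False):
    """Hypothetical stand-in for the proposed os.fspath(); not the real API."""
    if isinstance(path, str):
        return path
    if allow_bytes and isinstance(path, bytes):
        return path
    expected = "str or bytes" if allow_bytes else "str"
    raise TypeError("expected " + expected + ", got " + type(path).__name__)


print(fspath("/tmp/spam"))                      # common case, no flag at all
print(fspath(b"/tmp/spam", allow_bytes=True))   # the flag has to be named

try:
    fspath(b"/tmp/spam", True)                  # a bare positional True
except TypeError as exc:
    print("TypeError:", exc)                    # rejected by the signature
```

With the bare `*` in the signature, a call site can never carry an
unexplained positional True.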


> If passed as a keyword argument
> with a descriptive name, it'll be longer than I'd like to see:
>
>     path_str = os.fspath(path, allow_bytes=True)
>

I expect that the number of people directly calling this function with
that argument specified will be rather small, so the common case will
simply be:

    path_str = os.fspath(path)


>
> Names like os.fspath() and os.fssyspath() seem good to me.
>

-Brett

From Nikolaus at rath.org  Wed Apr 13 12:59:35 2016
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Wed, 13 Apr 2016 09:59:35 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570E6130.6080507@stoneleaf.us> (Ethan Furman's message of "Wed, 
 13 Apr 2016 08:09:36 -0700")
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <570E6130.6080507@stoneleaf.us>
Message-ID: <87y48hcumg.fsf@thinkpad.rath.org>

On Apr 13 2016, Ethan Furman <ethan at stoneleaf.us> wrote:
> (I'm not very good at keeping similar sounding functions separate --
> what's the difference between shutil.copy and shutil.copy2?  I have to
> look it up every time).

Well, "2" is more than "" (or 1), so copy2() copies *more* than copy() -
it includes the metadata. That always helps me.
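
A small sketch of that difference, using the modification time as the
piece of metadata to watch. The file names and the timestamp are made up
for the example.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.txt")
    with open(src, "w") as f:
        f.write("spam")

    # Give the source an obviously old modification time.
    old = 1000000000
    os.utime(src, (old, old))

    plain = shutil.copy(src, os.path.join(tmp, "copy.txt"))    # data + mode
    full = shutil.copy2(src, os.path.join(tmp, "copy2.txt"))   # + metadata

    plain_mtime = int(os.stat(plain).st_mtime)
    full_mtime = int(os.stat(full).st_mtime)

print("copy  kept mtime:", plain_mtime == old)
print("copy2 kept mtime:", full_mtime == old)
```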


Best,
-Nikolaus
-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             “Time flies like an arrow, fruit flies like a Banana.”

From ethan at stoneleaf.us  Wed Apr 13 13:06:33 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 13 Apr 2016 10:06:33 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAP1=2W5tJkR=hr-13oHYVfBhNAuHMbTyB3jwx3VLXGQE__tocA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <570E6130.6080507@stoneleaf.us>
 <CAFT4OTHng5dAESHNnrzvJk9GFxoLU_py_jov1j0aMPNWKyMYhw@mail.gmail.com>
 <CAP1=2W5tJkR=hr-13oHYVfBhNAuHMbTyB3jwx3VLXGQE__tocA@mail.gmail.com>
Message-ID: <570E7C99.3020402@stoneleaf.us>

On 04/13/2016 09:58 AM, Brett Cannon wrote:
 > On Wed, 13 Apr 2016 at 09:19 Fred Drake wrote:

 >> I do the same, but... this is one of those cases where a caller will
 >> usually be passing a constant directly. If passed as a positional
 >> argument, it'll just be confusing ("what's True?" is my usual
 >> reaction to a Boolean positional argument).
 >
 > It would be keyword-only so this isn't even a possibility.
 >
 >> If passed as a keyword argument
 >> with a descriptive name, it'll be longer than I'd like to see:
 >>
 >>      path_str = os.fspath(path, allow_bytes=True)
 >
 > I expect that the number of people directly calling this function
 > with that argument specified will be rather small, so the common
 > case will simply be:
 >
 >      path_str = os.fspath(path)

That is certainly my expectation.  :)

 >> Names like os.fspath() and os.fssyspath() seem good to me.

A single function is definitely my preference, but if that's not 
possible then I'm fine with that pair of names.

--
~Ethan~

From brett at python.org  Wed Apr 13 13:10:09 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 17:10:09 +0000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <loom.20160413T071958-483@post.gmane.org>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
Message-ID: <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>

On Tue, 12 Apr 2016 at 22:38 Michael Mysinger via Python-Dev <
python-dev at python.org> wrote:

> Ethan Furman <ethan <at> stoneleaf.us> writes:
>
> > Do we allow bytes to be returned from os.fspath()?  If yes, then do we
> > allow bytes from __fspath__()?
>
> De-lurking. Especially since the ultimate goal is better interoperability,
> I feel like an implementation that people can play with would help guide
> the few remaining decisions. To help test the various options you could
> temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to
> both pathlib.__fspath__() and os.fspath(), with distinct configurable
> defaults for each.
>
> In the spirit of Python 3 I feel like bytes might not be needed in
> practice, but something like this with defaults of False will allow people
> to easily test all the various options.
>

https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
the four potential approaches implemented (although it doesn't follow the
"separate functions" approach some are proposing and instead goes with the
allow_bytes approach I originally proposed).

From brett at python.org  Wed Apr 13 13:20:07 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 17:20:07 +0000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
Message-ID: <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>

On Wed, 13 Apr 2016 at 09:52 Random832 <random832 at fastmail.com> wrote:

> On Wed, Apr 13, 2016, at 11:28, Ethan Furman wrote:
> > On 04/13/2016 08:17 AM, Random832 wrote:
> > > On Wed, Apr 13, 2016, at 10:21, Nick Coghlan wrote:
> >
> > >> I'd expect the main consumers to be os and os.path, and would honestly
> > >> be surprised if we needed many explicit invocations above that layer,
> > >> other than in pathlib itself.
> > >
> > > I made a toy implementation to try this out, and making os.open support
> > > it does not get you builtin open "for free" as I had suspected; builtin
> > > open has its own type checks in _iomodule.c.
> >
> > Yup, it will take some effort to make this work.
>
> A corner case just occurred to me...
>
> For functions that will continue to accept str/bytes (and functions that
> accept some other type such as Number or file-like objects), what should
> be done with an object that is one of these, *and* has an __fspath__
> method, *and* this method returns a value other than the object's own
> value? Basically, should the protocol check be done unconditionally
> (before attempting to use the argument as a string) or only if the
> argument is not a string (there's an efficiency argument for this). Or
> should it be left "unspecified", with the understanding that such
> objects are badly behaved and may not be handled consistently across
> different functions / python implementations / cpython versions?
>
> Also, should the os.fspath (or whatever we call it) function itself
> accept str/bytes, even if these are not going to implement the protocol?
>

All of this is demonstrated in
https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 by the
various possibilities. In the end it's not a corner case because the
definition of __fspath__ will be such that there's no ambiguity in what
os.fspath() will accept and what __fspath__ can return and the code will be
written to conform to what the PEP dictates (IOW I'm aware that this needs
to be considered in the implementation :) .
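
For readers following along, the corner case can be sketched as below.
Both helper functions and the OddStr class are hypothetical and exist
only to show the two possible orderings; neither is taken from the gist
or from any proposed patch.

```python
class OddStr(str):
    """Hypothetical badly behaved object: a str whose __fspath__ disagrees."""
    def __fspath__(self):
        return "/from/protocol"


def protocol_first(path):
    # Consult __fspath__ unconditionally, even for str instances.
    try:
        method = type(path).__fspath__
    except AttributeError:
        return path
    return method(path)


def str_first(path):
    # Short-circuit on str/bytes; only consult the protocol otherwise.
    if isinstance(path, (str, bytes)):
        return path
    return type(path).__fspath__(path)


odd = OddStr("/from/value")
print(protocol_first(odd))   # the protocol's answer wins
print(str_first(odd))        # the string's own value wins
```

Whichever ordering is specified, the two only differ for objects like
OddStr, which is why the definition needs to say which one applies.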

From tritium-list at sdamon.com  Wed Apr 13 13:22:48 2016
From: tritium-list at sdamon.com (Alexander Walters)
Date: Wed, 13 Apr 2016 13:22:48 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
Message-ID: <570E8068.4060303@sdamon.com>

On 4/13/2016 13:10, Brett Cannon wrote:
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has 
> the four potential approaches implemented (although it doesn't follow 
> the "separate functions" approach some are proposing and instead goes 
> with the allow_bytes approach I originally proposed). 

Number 4 is my personal favorite - it has a simple control flow path and 
is the least needlessly restrictive.

(I could rant about needless restrictions, but I am about a decade late 
for that, so I won't bother.)

From ethan at stoneleaf.us  Wed Apr 13 13:49:48 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 13 Apr 2016 10:49:48 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570E8068.4060303@sdamon.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <570E8068.4060303@sdamon.com>
Message-ID: <570E86BC.2030703@stoneleaf.us>

On 04/13/2016 10:22 AM, Alexander Walters wrote:
> On 4/13/2016 13:10, Brett Cannon wrote:

>> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1
>> has the four potential approaches implemented (although it doesn't
>> follow the "separate functions" approach some are proposing and
>> instead goes with the allow_bytes approach I originally proposed).
>
> Number 4 is my personal favorite - it has a simple control flow path and
> is the least needlessly restrictive.

Number 3: it allows bytes, but only when told it's okay to do so. 
Having code get a bytes object when one is not expected is not a 
headache we need to inflict on anyone.

--
~Ethan~


From antoine at python.org  Wed Apr 13 14:25:52 2016
From: antoine at python.org (Antoine Pitrou)
Date: Wed, 13 Apr 2016 18:25:52 +0000 (UTC)
Subject: [Python-Dev] pathlib - current status of discussions
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
Message-ID: <loom.20160413T202415-731@post.gmane.org>

Brett Cannon <brett <at> python.org> writes:
> In the spirit of Python 3 I feel like bytes might not be needed in practice,
> but something like this with defaults of False will allow people to easily
> test all the various options.
> 
> 
> 
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> the four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).

Either number 1 or number 3 for me (I don't think bytes path-like
objects are useful in Python).

Regards

Antoine.

From rosuav at gmail.com  Wed Apr 13 15:24:35 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 14 Apr 2016 05:24:35 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
Message-ID: <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>

On Thu, Apr 14, 2016 at 3:10 AM, Brett Cannon <brett at python.org> wrote:
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
> four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).

All of them have this construct:

try:
    path = path.__fspath__()
except AttributeError:
    pass

Is that the intention, or should the exception catching be narrower? I
know it's clunky to write it in Python, but AIUI it's less so in C:

try:
    callme = path.__fspath__
except AttributeError:
    pass
else:
    path = callme()

ChrisA

From brett at python.org  Wed Apr 13 15:30:30 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 19:30:30 +0000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
Message-ID: <CAP1=2W6NZjzr0naSBMU6E6iCFTr+R4Rr+ULn-X80G5sta=2Gfw@mail.gmail.com>

On Wed, 13 Apr 2016 at 12:25 Chris Angelico <rosuav at gmail.com> wrote:

> On Thu, Apr 14, 2016 at 3:10 AM, Brett Cannon <brett at python.org> wrote:
> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1
> > has the four potential approaches implemented (although it doesn't
> > follow the "separate functions" approach some are proposing and instead
> > goes with the allow_bytes approach I originally proposed).
>
> All of them have this construct:
>
> try:
>     path = path.__fspath__()
> except AttributeError:
>     pass
>
> Is that the intention, or should the exception catching be narrower? I
> know it's clunky to write it in Python, but AIUI it's less so in C:
>
> try:
>     callme = path.__fspath__
> except AttributeError:
>     pass
> else:
>     path = callme()
>

I'm assuming the C code will do what you're suggesting. My way is just
faster to write in 2 minutes of coding. :)

From fred at fdrake.net  Wed Apr 13 15:36:12 2016
From: fred at fdrake.net (Fred Drake)
Date: Wed, 13 Apr 2016 15:36:12 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
Message-ID: <CAFT4OTEv2NXMku=aQBdeq2YUP-gYeURSgsWnNdj9yD8P=aL3Sw@mail.gmail.com>

On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico <rosuav at gmail.com> wrote:
> Is that the intention, or should the exception catching be narrower? I
> know it's clunky to write it in Python, but AIUI it's less so in C:
>
> try:
>     callme = path.__fspath__
> except AttributeError:
>     pass
> else:
>     path = callme()

+1 for this variant; I really don't like masking errors inside the
__fspath__ implementation.
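
A short sketch of the masking in question. The Broken class and the two
helper names are hypothetical; the try/except bodies are the two
constructs quoted above.

```python
class Broken:
    """Hypothetical path-like object with a bug inside __fspath__."""
    def __fspath__(self):
        return self.missing_attribute   # bug: raises AttributeError


def broad(path):
    # The construct from the gist: lookup and call share one try block.
    try:
        path = path.__fspath__()
    except AttributeError:
        pass
    return path


def narrow(path):
    # The narrower variant: only the attribute lookup is guarded.
    try:
        callme = path.__fspath__
    except AttributeError:
        pass
    else:
        path = callme()
    return path


obj = Broken()
print(broad(obj) is obj)     # the bug is swallowed; obj comes back unchanged

try:
    narrow(obj)
except AttributeError as exc:
    print("surfaced:", exc)  # the bug inside __fspath__ is visible
```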


  -Fred

-- 
Fred L. Drake, Jr.    <fred at fdrake.net>
"A storm broke loose in my mind."  --Albert Einstein

From rosuav at gmail.com  Wed Apr 13 15:37:30 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 14 Apr 2016 05:37:30 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAP1=2W6NZjzr0naSBMU6E6iCFTr+R4Rr+ULn-X80G5sta=2Gfw@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
 <CAP1=2W6NZjzr0naSBMU6E6iCFTr+R4Rr+ULn-X80G5sta=2Gfw@mail.gmail.com>
Message-ID: <CAPTjJmqjLBCBbQTZ8ZEvwrrez836QQ_W9k55pJex0r3qRBWFAQ@mail.gmail.com>

On Thu, Apr 14, 2016 at 5:30 AM, Brett Cannon <brett at python.org> wrote:
>
>
> On Wed, 13 Apr 2016 at 12:25 Chris Angelico <rosuav at gmail.com> wrote:
>>
>> On Thu, Apr 14, 2016 at 3:10 AM, Brett Cannon <brett at python.org> wrote:
>> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1
>> > has the four potential approaches implemented (although it doesn't
>> > follow the "separate functions" approach some are proposing and
>> > instead goes with the allow_bytes approach I originally proposed).
>>
>> All of them have this construct:
>>
>> try:
>>     path = path.__fspath__()
>> except AttributeError:
>>     pass
>>
>> Is that the intention, or should the exception catching be narrower? I
>> know it's clunky to write it in Python, but AIUI it's less so in C:
>>
>> try:
>>     callme = path.__fspath__
>> except AttributeError:
>>     pass
>> else:
>>     path = callme()
>
>
> I'm assuming the C code will do what you're suggesting. My way is just
> faster to write in 2 minutes of coding. :)

Cool cool. Just checking!

You're already aware that my preference is for the first one,
str-only. I don't think the second one has much value (a path-like
object can only ever return a str, but a bytes can be passed through
unchanged?), and the fourth strikes me as a bad idea (just allowing
bytes any time). So my votes are +1, -0.5, +0, -1.
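
For anyone who has not opened the gist, the four options can be sketched
roughly as follows. This is a reconstruction from how they are described
in this thread, not the gist's code; the function names, the helpers and
the BytesPath class are all assumptions made for illustration.

```python
def _via_protocol(path):
    """Return path.__fspath__() if the type defines it, else path itself."""
    try:
        method = type(path).__fspath__
    except AttributeError:
        return path
    return method(path)


def _require(value, allowed):
    if isinstance(value, allowed):
        return value
    raise TypeError("unexpected path type: " + type(value).__name__)


def fspath_1(path):
    # Option 1: str only, whether passed directly or returned by __fspath__.
    return _require(_via_protocol(path), str)


def fspath_2(path):
    # Option 2: bytes passed in directly go through; __fspath__ must give str.
    if isinstance(path, bytes):
        return path
    return _require(_via_protocol(path), str)


def fspath_3(path, *, allow_bytes=False):
    # Option 3: bytes are accepted only when the caller opts in.
    return _require(_via_protocol(path), (str, bytes) if allow_bytes else str)


def fspath_4(path):
    # Option 4: str or bytes are always accepted.
    return _require(_via_protocol(path), (str, bytes))


class BytesPath:
    """Hypothetical path-like object whose __fspath__ returns bytes."""
    def __fspath__(self):
        return b"/tmp/spam"


for func in (fspath_1, fspath_2, fspath_3, fspath_4):
    try:
        print(func.__name__, "->", func(BytesPath()))
    except TypeError as exc:
        print(func.__name__, "-> TypeError:", exc)
```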

ChrisA

From tritium-list at sdamon.com  Wed Apr 13 15:42:43 2016
From: tritium-list at sdamon.com (Alexander Walters)
Date: Wed, 13 Apr 2016 15:42:43 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570E86BC.2030703@stoneleaf.us>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <570E8068.4060303@sdamon.com> <570E86BC.2030703@stoneleaf.us>
Message-ID: <570EA133.6030504@sdamon.com>

On 4/13/2016 13:49, Ethan Furman wrote:
> Number 3: it allows bytes, but only when told it's okay to do so. 
> Having code get a bytes object when one is not expected is not a 
> headache we need to inflict on anyone. 

This is an artifact of the other needless restrictions I said I wouldn't 
rant about.  I think it is in the best interest not to perpetuate those 
needless restrictions.

From random832 at fastmail.com  Wed Apr 13 15:46:37 2016
From: random832 at fastmail.com (Random832)
Date: Wed, 13 Apr 2016 15:46:37 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
Message-ID: <1460576797.3970855.577940577.074B7E19@webmail.messagingengine.com>

On Wed, Apr 13, 2016, at 15:24, Chris Angelico wrote:
> Is that the intention, or should the exception catching be narrower? I
> know it's clunky to write it in Python, but AIUI it's less so in C:

How is it less so in C? You lose the ability to PyObject_CallMethod.

From rosuav at gmail.com  Wed Apr 13 15:54:37 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 14 Apr 2016 05:54:37 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <1460576797.3970855.577940577.074B7E19@webmail.messagingengine.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
 <1460576797.3970855.577940577.074B7E19@webmail.messagingengine.com>
Message-ID: <CAPTjJmo=bSsyi03YtQ5fZct7i4Wnugahhrb0geRa81S96xOjQw@mail.gmail.com>

On Thu, Apr 14, 2016 at 5:46 AM, Random832 <random832 at fastmail.com> wrote:
> On Wed, Apr 13, 2016, at 15:24, Chris Angelico wrote:
>> Is that the intention, or should the exception catching be narrower? I
>> know it's clunky to write it in Python, but AIUI it's less so in C:
>
> How is it less so in C? You lose the ability to PyObject_CallMethod.

I might be wrong, then. Wasn't sure how it was all implemented.
Anyway, it's a correctness thing, not a simplicity one, so even if it
is clunkier, it ought to be the case.

And that is the intention, so we're fine.

ChrisA

From brett at python.org  Wed Apr 13 15:54:44 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 19:54:44 +0000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAFT4OTEv2NXMku=aQBdeq2YUP-gYeURSgsWnNdj9yD8P=aL3Sw@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
 <CAFT4OTEv2NXMku=aQBdeq2YUP-gYeURSgsWnNdj9yD8P=aL3Sw@mail.gmail.com>
Message-ID: <CAP1=2W7PqGQhcAbHXE7O2kTwqgS=0g+oGxYG4xLUwXeLE_y3xg@mail.gmail.com>

On Wed, 13 Apr 2016 at 12:39 Fred Drake <fred at fdrake.net> wrote:

> On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico <rosuav at gmail.com> wrote:
> > Is that the intention, or should the exception catching be narrower? I
> > know it's clunky to write it in Python, but AIUI it's less so in C:
> >
> > try:
> >     callme = path.__fspath__
> > except AttributeError:
> >     pass
> > else:
> >     path = callme()
>
> +1 for this variant; I really don't like masking errors inside the
> __fspath__ implementation.
>

Don't read too much into the code in that gist. I just did them quickly to
get the point across of the proposals in terms of str/bytes, not what will
be proposed in any final patch.

From k7hoven at gmail.com  Wed Apr 13 15:59:50 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Wed, 13 Apr 2016 22:59:50 +0300
Subject: [Python-Dev] List posting custom [was: current status of
 discussions]
In-Reply-To: <22285.46450.255405.357217@turnbull.sk.tsukuba.ac.jp>
References: <570C1E13.4090909@stoneleaf.us>
 <CADiSq7fGnEq3eMo_wgPVYMC2ZGq7A4=v882dFHTuKGTv2mFoog@mail.gmail.com>
 <-9219200259368253896@unknownmsgid> <570C8EEE.6050904@stoneleaf.us>
 <570D167C.8040202@mail.de>
 <22285.46450.255405.357217@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CAMiohojA4xJkbSZLEgdz9V7u9YHJZJy_xxRc=OxN3cbR-+68eA@mail.gmail.com>

On Wed, Apr 13, 2016 at 5:56 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> The following is my opinion, as will become obvious, but it's based on
> over a decade of observing these lists, and other open source
> development lists.  In a context where some core developers have
> unsubscribed from these lists, and others regularly report muting
> threads with a certain air of asperity, I think it's worth the risk of
> seeming arrogant to explain some of the customs (which are complex and
> subtle) around posting to Python developer lists.  I'm posting
> publicly because there are several new developers whose activity and
> fresh perspective is very welcome, but harmony *is* being disturbed,
> IMO unnecessarily.
>

Thank you for this thoughtful post. While none of the quotes you refer
to are mine, I did try to find whether any of the advice is something
I should learn from. While I didn't find a whole lot (please do
correct me if you think otherwise), it is also valuable to hear these
things from someone more experienced, even just to confirm what I may
have thought or guessed. I can't really tell, but possibly some of the
thoughts are interesting even to people significantly more experienced
than me.

I know you are not interested in discussing this further here, but
I'll add some inexperienced points of view inline below, just in case
someone is interested:

> This particular post caught my eye, but it's only an example of one of
> the most unharmonious posting styles that has become common recently.
> Attribution deliberately removed.
>
>  > Sorry for disturbing this thread's harmony.
>
> *sigh*  There is way too much of this on Python-Ideas recently, and
> there shouldn't be any on Python-Dev.  So please don't.  Specifically,
> disagreement with an apparently developing consensus is fine but
> please avoid this:
>
>  > >> Path is an alternative to os.path -- you don't need to use both.
>  >
>  > I agree with that quote of Chris.
>
> It's a waste of time to post *what* you agree with.[1]  Decisions are
> not taken by vote in this community, except for the color of the
> bikeshed, where it is agreed that *what* decision is taken doesn't
> matter, but that some decision should be taken expeditiously.[2]
> Chris already stated this position clearly and it's not a "color", so
> there is no need to reiterate.  It simply wastes others' time to read
> it.  (Whether it was a waste of the poster's time is not for me to
> comment on.)
>
> What matters to the decision is *why* you agree (or disagree).  If you
> think that some of Chris's arguments are bogus (and should be
> disregarded) and others are important, that is valuable information.
> It's even better if you can shed additional light on the matter
> (example below).
>
> Also, expression of agreement is often a prelude to a request for
> information.  "I agree with Z's post.  At least, I have never needed
> X.  *When* do you need X?  Let's look for a better way than X!"
>

That's what I thought too. I remember several times recently that I
have mentioned I agreed about something, then continuing to add more
to it, or even saying I disagree about something else. Part of the
reason to also state that I agree is an attempt to keep the overall
tone more positive. After all, the other person might be a highly
experienced core developer who just did not happen to have gone through
all the same thoughts regarding that specific question recently. I
hope that has not been interpreted as arrogance such as "I know better
than these people".

For me, as one of the (many?) newcomers, especially on -dev, it can
sometimes be difficult to tell whether not getting a reaction means
"Good point, I agree", "I did not understand so I'll just ignore it",
"I don't want to argue with you" or something else. Then again,
someone just saying essentially the same thing without a reference a
few posts later just feels strange. Also, if the only thing people
apparently do is disagree about things, it makes the overall tone of
the discussions at least *seem* very negative. From this point of view
there seems to be some good in positive comments.

> Unsupported (dis)agreement to statements about "needs" also may be
> taken as *rude*, because others may infer your arrogant claim to know
> what *they* do or don't need.  Admittedly there's a difficult
> distinction here between Chris's *idiom* where "you don't need to"
> translates to "In my understanding, it is generally not necessary to",
> and your *unsupported* agreement, which in my dialect of English
> changes the emphasis to imply you know better than those who disagree
> with you and Chris.  And, of course, the position that others are "too
> easily offended" is often reasonable, but you should be aware that
> there will be an impact on your reputation and ability to influence
> development of Python (even if it doesn't come near the point where
> a moderator invokes "Code of Conduct").
>
> "Me too" posts aren't entirely forbidden, but I feel that in Python
> custom they are most appropriate when voting on bikeshed colors, and
> as applause for a *technically* excellent suggestion.  They should be
> avoided in the context of value judgments (of "need" and "simplicity",
> for example) for the reason given above.

Personally, I've sometimes felt the urge to give a positive comment
just to make sure something gets noticed, or to help keep the
discussion from going around in circles by pointing out the important
points more clearly to people not as involved in the topic of
discussion. But I've tried to resist this urge when I don't have
anything to add. I find the notion of S/N (signal-to-noise ratio),
which you in fact brought up recently in another thread, very
important.

>  > When people want to use your library and it requires a string, they
>  > can simply use "my_path.path" and everything still works for them
>  > when they switch to pathlib.
>
> This is disrespectful in tone.  I don't know if you're responding to
> Ethan here, but he's one of the authors in question.  We *know* that
> Ethan doesn't like such inelegant idioms -- he said so -- where "this
> object has an appropriate conversion to your argument type, so you
> should apply it implicitly" is unambiguous.[3] So for him, it's *not*
> so simple.  Since it's not a matter of voting, each proponent should
> provide more contexts where preferred programming idioms are
> "Pythonic" to sway the sense of the community, or if necessary, the
> BDFL.
>
> Where that aesthetic came up was in the context of consistently
> wrapping arguments that might be Paths in str, as in
>
>     p = Path(*stuff) or defaultstring
>     # 500 lines crossing function and module boundaries!
>     with open(str(p)) as f:
>         process(f)
>
> I think it was Nick who posted agreement with Ethan on the aesthetics
> of str-wrapping.  If that were all, he probably wouldn't have posted
> (see fn. 1), but he further pointed out that this application of str
> is *dangerous* because *everything* in Python can be coerced to str.
> That was a very valuable observation, which swayed the list in favor
> of "Uh-oh, we can't recommend 'os.method(str(Path))'!"
>
> This is my last post on this particular topic, but I will be happy to
> discuss off-list.  (I may discuss further in public on my blog, but
> first I have to get a blog. :-)
>
>
> Footnotes:
> [1]  "You" is generic here.  There are a couple of developers whose
> agreement has the status of pronouncement of Pythonicity.  Aspire to
> that, but don't assume it -- very few have it, and it's actually
> *very* rarely exercised.  And you can recognize them because they are
> *asked* to pronounce -- by people whose statements you thought were
> already authoritative!
>
> [2]  And even so votes are often overturned by later arguments, both
> theoretical and based in experience.  See for example the several
> threads over time on the naming of Py_XSETREF.
>
> [3]  Interpreting Zen koans frequently requires figure-ground
> inversion.  In this case we can apply "In the face of ambiguity,
> refuse to guess" in the form "in the absence of ambiguity, don't wait
> to be asked".  I'm hardly authoritative, but FWIW :-) I think Ethan's
> esthetic sense here accords with Pythonicity.
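
The str-wrapping hazard described in the quoted text (everything in
Python can be coerced to str) can be shown with a minimal sketch. The
helper name open_target is hypothetical and only stands in for any code
that wraps its argument in str() before handing it to a file API:

```python
from pathlib import PurePosixPath


def open_target(p):
    # Hypothetical helper: blindly str()-wraps whatever it is given,
    # the way "open(str(p))" would.
    return str(p)


# A real path converts as intended.
print(open_target(PurePosixPath("data/input.txt")))

# Non-path objects convert just as happily, so a mistake far upstream
# silently turns into a plausible-looking file name.
print(open_target(None))
print(open_target(42))
```

Nothing here raises, which is the problem: the error only surfaces later,
if at all, as a missing or oddly named file.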

From zachary.ware+pydev at gmail.com  Wed Apr 13 16:16:08 2016
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Wed, 13 Apr 2016 15:16:08 -0500
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
In-Reply-To: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
References: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
Message-ID: <CAKJDb-MD=+x0_2TLp2N-EvAXZrbtUMwEijzvQ1Q5745BsOoxEw@mail.gmail.com>

On Wed, Apr 13, 2016 at 6:40 AM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> Hi,
>
> In recent months, most 3.x buildbots failed randomly. Some of them were
> always failing. I spent some time fixing almost all Windows and Linux
> buildbots. There were a lot of different issues.

Thank you for doing this!

> Maybe it's time to move more 3.x buildbots to the "stable" category?
> http://buildbot.python.org/all/waterfall?category=3.x.stable

A few months ago, I put together a list of suggestions for updating
the stable/unstable list, but never got around to implementing it.

> We have many offline buildbots. What's the status of these buildbots?
> Should we expect that they come back soon?

My Windows 8.1 bot is a VM that resides on a machine that has been
disturbingly unstable lately, and it's starting to seem like the
instability is due to that VM.  I hope to have it back up (and stable)
again soon, but have no timetable for it.  My Docs bot was off after
losing power over the weekend, and I just hadn't noticed yet.  It's
back now.

I'll ping the python-buildbots list about other offline bots.

> Or would it be possible to hide them? It would help to check the
> status of all buildbots.

I'm not sure, but that would be a nice feature.

> - the 4 ICC buildbots are failing with stack overflow, segfault, etc.
> Again, I'm not sure that these buildbots are useful since it looks
> like we don't support this compiler yet. Or does it help to work on
> supporting this compiler? Who is working on ICC support?

The Ubuntu ICC bot is generally quite stable.  The OSX ICC bot is
currently offline, but has only a couple of known issues.  The Windows
ICC bot is still a bit experimental, but has inched closer to
producing a working build.  R. David Murray and I have been working
with Intel on ICC support.

> By the way, I'm always surprised by the huge difference in the time needed
> to run a build on the different slaves: from a few minutes to more
> than 3 hours. The fastest Windows slave takes 28 minutes (running tests in
> parallel using 4 child processes), whereas the 3 others (running tests
> sequentially) take between 2 hours and more than 3 hours! Why does
> running tests on Windows take so long?

Most of that is down to debug mode; building Python in debug mode
links with the debug CRT which also enables all manner of extra
checks.  When it's up, the non-debug Windows bot also runs the test
suite in ~28 minutes, running sequentially.

---

After receiving a suggestion from koobs several months ago, I've been
intermittently thinking about completely redoing our buildmaster setup
such that instead of a single builder per version on each slave, we
instead set up a series of builders with particular 'tags', and each
builder attaches to each slave that satisfies the tags (running each
build only on the first slave available).  This would allow us to test
some of the rarer options (such as --without-threads) significantly
more often than 'never', and generally get a lot more
customization/flexibility of builds.  I haven't had a chance to sit
down and think out all the edge cases of this idea, but what do people
generally think of it?  I think the GitHub switchover will be a good
time to do this if it's generally seen as a decent idea, since there
will need to be some work on the buildmaster to do the switch anyway.
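
As a rough illustration of the tag idea above, here is a minimal sketch in
plain Python. It is not buildbot configuration; the slave names, builder
names and tags are all made up for the example, and "satisfies the tags"
is assumed to mean that a slave advertises every tag a builder requires:

```python
# Hypothetical data: the tags each slave advertises.
SLAVES = {
    "ubuntu-gcc": {"linux", "gcc", "threads", "without-threads"},
    "win81": {"windows", "msvc", "threads"},
    "freebsd-clang": {"freebsd", "clang", "threads", "without-threads"},
}

# Hypothetical data: the tags each builder requires.
BUILDERS = {
    "3.x without-threads": {"without-threads"},
    "3.x clang": {"clang"},
    "3.x windows": {"windows", "msvc"},
}


def eligible_slaves(required, slaves):
    """Return the names of slaves whose tags include every required tag."""
    return sorted(name for name, tags in slaves.items() if required <= tags)


for builder, required in BUILDERS.items():
    print(builder, "->", eligible_slaves(required, SLAVES))
```

Picking "the first slave available" would then be a scheduling decision
made over the list that eligible_slaves() returns.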

-- 
Zach

From brett at python.org  Wed Apr 13 16:37:46 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 20:37:46 +0000
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
In-Reply-To: <CAKJDb-MD=+x0_2TLp2N-EvAXZrbtUMwEijzvQ1Q5745BsOoxEw@mail.gmail.com>
References: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
 <CAKJDb-MD=+x0_2TLp2N-EvAXZrbtUMwEijzvQ1Q5745BsOoxEw@mail.gmail.com>
Message-ID: <CAP1=2W40YWky6vJ1sxUVc2Oc6r232S6r7NEgGJuK7QYik-O5Bw@mail.gmail.com>

On Wed, 13 Apr 2016 at 13:17 Zachary Ware <zachary.ware+pydev at gmail.com>
wrote:

> [SNIP]
> ---
>
> After receiving a suggestion from koobs several months ago, I've been
> intermittently thinking about completely redoing our buildmaster setup
> such that instead of a single builder per version on each slave, we
> instead set up a series of builders with particular 'tags', and each
> builder attaches to each slave that satisfies the tags (running each
> build only on the first slave available).  This would allow us to test
> some of the rarer options (such as --without-threads) significantly
> more often than 'never', and generally get a lot more
> customization/flexibility of builds.  I haven't had a chance to sit
> down and think out all the edge cases of this idea, but what do people
> generally think of it?  I think the GitHub switchover will be a good
> time to do this if it's generally seen as a decent idea, since there
> will need to be some work on the buildmaster to do the switch anyway.
>

So we have slaves connect to multiple builders who have requirements of
what they are testing? So the --without-threads master would have all
slaves able to compile --without-threads connect to it and then do that
build? And those same slaves may also connect to the gcc and clang masters
to do those builds as well? So would that mean slaves could potentially do
a bunch of builds per change? That sounds nice to me as long as the slave
maintainers are also up to utilizing this by double/triple/quadrupling
their builds.

From chris.barker at noaa.gov  Wed Apr 13 16:39:42 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 13 Apr 2016 13:39:42 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAP1=2W7PqGQhcAbHXE7O2kTwqgS=0g+oGxYG4xLUwXeLE_y3xg@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
 <CAFT4OTEv2NXMku=aQBdeq2YUP-gYeURSgsWnNdj9yD8P=aL3Sw@mail.gmail.com>
 <CAP1=2W7PqGQhcAbHXE7O2kTwqgS=0g+oGxYG4xLUwXeLE_y3xg@mail.gmail.com>
Message-ID: <CALGmxELo59SkNwZNZsK6+jGJ+M8aTLNPa1Pj+5ptem_-rh6uPA@mail.gmail.com>

so are we worried that __fspath__ will exist and be callable, but  might
raise an AttributeError somewhere inside itself? if so isn't it broken
anyway, so should it be ignored?

and I know it's asking permission rather than forgiveness, but what's
wrong with:

if hasattr(path, "__fspath__"):
    path = path.__fspath__()

if you really want to check for the existence of the attribute first?

or even:

path = path.__fspath__() if hasattr(path, "__fspath__") else path


(OK, really a Pythonic style question now....)

-CHB



On Wed, Apr 13, 2016 at 12:54 PM, Brett Cannon <brett at python.org> wrote:

>
>
> On Wed, 13 Apr 2016 at 12:39 Fred Drake <fred at fdrake.net> wrote:
>
>> On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico <rosuav at gmail.com> wrote:
>> > Is that the intention, or should the exception catching be narrower? I
>> > know it's clunky to write it in Python, but AIUI it's less so in C:
>> >
>> > try:
>> >     callme = path.__fspath__
>> > except AttributeError:
>> >     pass
>> > else:
>> >     path = callme()
>>
>> +1 for this variant; I really don't like masking errors inside the
>> __fspath__ implementation.
>>
>
> Don't read too much into the code in that gist. I just did them quickly to
> get the point across of the proposals in terms of str/bytes, not what will
> be proposed in any final patch.
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From brett at python.org  Wed Apr 13 16:42:48 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 20:42:48 +0000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CALGmxELo59SkNwZNZsK6+jGJ+M8aTLNPa1Pj+5ptem_-rh6uPA@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
 <CAFT4OTEv2NXMku=aQBdeq2YUP-gYeURSgsWnNdj9yD8P=aL3Sw@mail.gmail.com>
 <CAP1=2W7PqGQhcAbHXE7O2kTwqgS=0g+oGxYG4xLUwXeLE_y3xg@mail.gmail.com>
 <CALGmxELo59SkNwZNZsK6+jGJ+M8aTLNPa1Pj+5ptem_-rh6uPA@mail.gmail.com>
Message-ID: <CAP1=2W40VMN5NdAtcr6AMLaoaW2mL=RH-NvPnzLgb+u1UgKVOg@mail.gmail.com>

On Wed, 13 Apr 2016 at 13:40 Chris Barker <chris.barker at noaa.gov> wrote:

> so are we worried that __fspath__ will exist and be callable, but  might
> raise an AttributeError somewhere inside itself? if so isn't it broken
> anyway, so should it be ignored?
>

It should propagate instead of swallowing up the exception, otherwise it's
hard to debug why __fspath__ seems to be ignored.
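
To make the difference concrete, here is a minimal sketch. The class
BrokenPath and the helper names fspath_broad and fspath_narrow are
hypothetical and are not the proposed os.fspath(); they only contrast a
try block that covers the call with one that covers just the attribute
lookup:

```python
class BrokenPath:
    def __fspath__(self):
        # Bug inside the implementation: this attribute does not exist,
        # so calling __fspath__ raises AttributeError.
        return self.missing_attribute


def fspath_broad(path):
    # The try block covers the call too, so an AttributeError raised
    # *inside* __fspath__ is swallowed and the protocol looks ignored.
    try:
        return path.__fspath__()
    except AttributeError:
        return path


def fspath_narrow(path):
    # Only the attribute lookup is guarded; errors from the call propagate.
    try:
        method = path.__fspath__
    except AttributeError:
        return path
    return method()


broken = BrokenPath()
print(fspath_broad(broken) is broken)   # the bug is masked
try:
    fspath_narrow(broken)
except AttributeError as exc:
    print("propagated:", exc)
```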


>
> and I know it's asking permission rather than forgiveness, but what's
> wrong with:
>
> if hasattr(path, "__fspath__"):
>     path = path.__fspath__()
>
> if you really want to check for the existence of the attribute first?
>
>
Nothing.


> or even:
>
> path = path.__fspath__() if hasattr(path, "__fspath__") else path
>
>
That also works.


>
> (OK, really a Pythonic style question now....)
>

Yes, this is getting a bit side-tracked over some example code to just get
a concept across.

-Brett


>
> -CHB
>
>
>
> On Wed, Apr 13, 2016 at 12:54 PM, Brett Cannon <brett at python.org> wrote:
>
>>
>>
>> On Wed, 13 Apr 2016 at 12:39 Fred Drake <fred at fdrake.net> wrote:
>>
>>> On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico <rosuav at gmail.com>
>>> wrote:
>>> > Is that the intention, or should the exception catching be narrower? I
>>> > know it's clunky to write it in Python, but AIUI it's less so in C:
>>> >
>>> > try:
>>> >     callme = path.__fspath__
>>> > except AttributeError:
>>> >     pass
>>> > else:
>>> >     path = callme()
>>>
>>> +1 for this variant; I really don't like masking errors inside the
>>> __fspath__ implementation.
>>>
>>
>> Don't read too much into the code in that gist. I just did them quickly
>> to get the point across of the proposals in terms of str/bytes, not what
>> will be proposed in any final patch.
>>

From random832 at fastmail.com  Wed Apr 13 16:47:44 2016
From: random832 at fastmail.com (Random832)
Date: Wed, 13 Apr 2016 16:47:44 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CALGmxELo59SkNwZNZsK6+jGJ+M8aTLNPa1Pj+5ptem_-rh6uPA@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
 <CAFT4OTEv2NXMku=aQBdeq2YUP-gYeURSgsWnNdj9yD8P=aL3Sw@mail.gmail.com>
 <CAP1=2W7PqGQhcAbHXE7O2kTwqgS=0g+oGxYG4xLUwXeLE_y3xg@mail.gmail.com>
 <CALGmxELo59SkNwZNZsK6+jGJ+M8aTLNPa1Pj+5ptem_-rh6uPA@mail.gmail.com>
Message-ID: <1460580464.3984789.578009321.29A3CE1D@webmail.messagingengine.com>

On Wed, Apr 13, 2016, at 16:39, Chris Barker wrote:
> so are we worried that __fspath__ will exist and be callable, but  might
> raise an AttributeError somewhere inside itself? if so isn't it broken
> anyway, so should it be ignored?

Well, if you're going to say "ignore the protocol because it's broken",
where do you stop? What if it raises some other exception? What if it
raises SystemExit? 

From ericfahlgren at gmail.com  Wed Apr 13 17:02:27 2016
From: ericfahlgren at gmail.com (Eric Fahlgren)
Date: Wed, 13 Apr 2016 14:02:27 -0700
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
Message-ID: <030201d195c7$ca9de130$5fd9a390$@gmail.com>

On Wednesday, April 13, 2016 09:25, Victor Stinner wrote:
> The side effect of wordcode is that arguments in 0..255 now use 2 bytes per
> instruction instead of 3, so it also reduces the size of bytecode for the most
> common case.
> 
> Larger argument, 16-bit argument (0..65,535), now uses 4 bytes instead of 3.
> Arguments are supported up to 32-bit: 24-bit uses 3 units (6 bytes), 32-bit uses 4
> units (8 bytes). MAKE_FUNCTION uses 16-bit argument for keyword defaults and
> 24-bit argument for annotations.
> Other common instruction known to use large argument are jumps for bytecode
> longer than 256 bytes.

A couple of months ago, during an earlier discussion of wordcode, I got curious enough to instrument dis.dis so that I could calculate the actual size changes expected in practice.  I ran it on a large chunk of our product code; here are the results (looks best with a fixed font).  I suspect the fairly significant reduction in footprint will also give better cache hit characteristics, so we might see some "magic" speed ups from that, too.

Code-generating source lines =    70,792
Total bytes                  = 1,196,653
Argument-bearing operators   =   380,978
Operands over 1 byte long    =    12,191
Extended arguments           =         0
Percentage of 1-byte args    = 96.80%

Total operators              =   434,697
Non-argument ops             =    53,719
One-byte args                =   368,787
Multi-byte args              =    12,191
Byte code size               = 1,196,653
Word code size               =   893,776
Word:byte size               = 74.69%

Just for the record, here's my arithmetic:
byteCodeSize     = 1*nonArgumentOps + 3*oneByteArgs + 3*multiByteArgs
wordCodeSize     = 2*nonArgumentOps + 2*oneByteArgs + 4*multiByteArgs
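
The same arithmetic as a short Python check, using only the counts from
the table above (the variable names are mine):

```python
# Counts taken from the table above.
non_argument_ops = 53_719
one_byte_args = 368_787
multi_byte_args = 12_191

# Old bytecode: 1 byte per argument-less op, 3 bytes per op with an argument.
byte_code_size = 1 * non_argument_ops + 3 * one_byte_args + 3 * multi_byte_args
# Wordcode: 2 bytes per op, 4 bytes when the argument needs a second unit.
word_code_size = 2 * non_argument_ops + 2 * one_byte_args + 4 * multi_byte_args

print(byte_code_size)                                   # 1196653
print(word_code_size)                                   # 893776
print(f"{word_code_size / byte_code_size:.2%}")         # 74.69%
print(f"{one_byte_args / (one_byte_args + multi_byte_args):.2%}")  # 96.80%
```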

(It is interesting to note that I have never encountered an EXTENDED_ARG operator in the wild, only in my own synthetic examples.)


From victor.stinner at gmail.com  Wed Apr 13 17:23:59 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 13 Apr 2016 23:23:59 +0200
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <030201d195c7$ca9de130$5fd9a390$@gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
 <030201d195c7$ca9de130$5fd9a390$@gmail.com>
Message-ID: <CAMpsgwbvNpoi969FWDyNkxr1X3qeN-1yuc4nkRiq7RjU9rDvZA@mail.gmail.com>

2016-04-13 23:02 GMT+02:00 Eric Fahlgren <ericfahlgren at gmail.com>:
> Percentage of 1-byte args    = 96.80%

Yeah, I expected such a high ratio. Good to see you confirm it.


> Non-argument ops             =    53,719
> One-byte args                =   368,787
> Multi-byte args              =    12,191

Again, only a very few arguments take multiple bytes. Good, the
bytecode will be smaller.

IMHO it's more a nice side effect than a real goal. Runtime
performance matters more than the size of the bytecode; it's not as if
a bytecode takes 4 MB. It's probably closer to 1 KB and so can probably
benefit from the fastest CPU caches.


> Just for the record, here's my arithmetic:
> byteCodeSize     = 1*nonArgumentOps + 3*oneByteArgs + 3*multiByteArgs
> wordCodeSize     = 2*nonArgumentOps + 2*oneByteArgs + 4*multiByteArgs

If multiByteArgs means any size > 1 byte, the wordCodeSize formula is wrong:

- no parameter: 2 bytes
- 8-bit parameter: 2 bytes
- 16-bit parameter: 4 bytes
- 24-bit parameter: 6 bytes
- 32-bit parameter: 8 bytes

But you wrote that you didn't see EXTENDED_ARG, so I guess that
multibyte means 16-bit in your case, and so your formula is correct.

I don't expect 32-bit parameters in the wild, only 24-bit
parameters for functions with annotations.
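
The size list above can be written as a small function. This is only a
sketch of the sizes as listed in this message (one 16-bit unit per 8 bits
of argument, minimum one unit); the function name is hypothetical and it
does not emit any real bytecode:

```python
def wordcode_size(arg=None):
    """Return the number of bytes one instruction takes in wordcode.

    arg is the instruction argument, or None for an argument-less op.
    """
    if arg is None:
        return 2  # opcode plus a zero argument byte
    if not 0 <= arg < 2**32:
        raise ValueError("argument must fit in 32 bits")
    units = 1
    while arg > 0xFF:
        # Each extra 8 bits of argument costs one more 16-bit unit.
        arg >>= 8
        units += 1
    return 2 * units


for value in (None, 0xFF, 0xFFFF, 0xFFFFFF, 0xFFFFFFFF):
    print(value, wordcode_size(value))
```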


> (It is interesting to note that I have never encountered an EXTENDED_ARG operator in the wild, only in my own synthetic examples.)

As I wrote, EXTENDED_ARG can be seen when MAKE_FUNCTION is used with
annotations.

Victor

From rymg19 at gmail.com  Wed Apr 13 17:29:05 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Wed, 13 Apr 2016 16:29:05 -0500
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
Message-ID: <CAO41-mNHebkPgiC1A410nbVpy0wOsxmE8C7RNP3rinufoqwwLQ@mail.gmail.com>

What is the value of HAS_ARG going to be now?

--
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something's wrong.
http://kirbyfan64.github.io/
On Apr 13, 2016 11:26 AM, "Victor Stinner" <victor.stinner at gmail.com> wrote:

> Hi,
>
> In the middle of recent discussions about Python performance, it was
> discussed to change the Python bytecode. Serhiy proposed to reuse
> MicroPython short bytecode to reduce the disk space and reduce the
> memory footprint.
>
> Demur Rumed proposes a different change to use a regular bytecode
> using 16-bit units: an instruction always has one 8-bit argument, which is
> zero if the instruction doesn't have an argument:
>
>    http://bugs.python.org/issue26647
>
> According to benchmarks, it looks faster:
>
>   http://bugs.python.org/issue26647#msg263339
>
> IMHO it's a nice enhancement: it makes the code simpler. The most
> interesting change is made in Python/ceval.c:
>
> -        if (HAS_ARG(opcode))
> -            oparg = NEXTARG();
> +        oparg = NEXTARG();
>
> This code is the very hot loop evaluating Python bytecode. I expect
> that removing a conditional branch here can reduce the CPU branch
> misprediction.
>
> I reviewed the first versions of the change, and IMHO it's almost ready to
> be merged. But I would prefer to have a review from at least a second
> core reviewer.
>
> Can someone please review the change?
>
> --
>
> The side effect of wordcode is that arguments in 0..255 now use 2
> bytes per instruction instead of 3, so it also reduces the size of
> bytecode for the most common case.
>
> Larger arguments, 16-bit arguments (0..65,535), now use 4 bytes instead
> of 3. Arguments are supported up to 32-bit: 24-bit uses 3 units (6
> bytes), 32-bit uses 4 units (8 bytes). MAKE_FUNCTION uses a 16-bit
> argument for keyword defaults and a 24-bit argument for annotations.
> Other common instructions known to use large arguments are jumps for
> bytecode longer than 256 bytes.
>
> --
>
> Right now, ceval.c still fetches opcode and then oparg with two 8-bit
> instructions. Later, we can discuss if it would be possible to ensure
> that the bytecode is always aligned to 16-bit in memory to fetch the
> two bytes using a uint16_t* pointer.
>
> Maybe we can overallocate 1 byte in codeobject.c and align manually
> the memory block if needed. Or ceval.c should maybe copy the code if
> it's not aligned?
>
> Raymond Hettinger proposes something like that, but it looks like
> there are concerns about non-aligned memory accesses:
>
>    http://bugs.python.org/issue25823
>
> The cost of non-aligned memory accesses depends on the CPU
> architecture, but it can raise a SIGBUS on some arch (MIPS and
> SPARC?).
>
> Victor

From ericfahlgren at gmail.com  Wed Apr 13 17:35:27 2016
From: ericfahlgren at gmail.com (Eric Fahlgren)
Date: Wed, 13 Apr 2016 14:35:27 -0700
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CAMpsgwbvNpoi969FWDyNkxr1X3qeN-1yuc4nkRiq7RjU9rDvZA@mail.gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
 <030201d195c7$ca9de130$5fd9a390$@gmail.com>
 <CAMpsgwbvNpoi969FWDyNkxr1X3qeN-1yuc4nkRiq7RjU9rDvZA@mail.gmail.com>
Message-ID: <CAP2Qz+VEy5GzXC-N69dzSwU2p696akj11WimRnSqM4vvWMq3iw@mail.gmail.com>

The EXTENDED_ARG is included in the multibyte ops, I treat it just like any
other operator.  Here's a snippet of my hacked-dis.dis output, which made
it clear to me that I could just count them as an "operator with word
operand."

Line 3000: x = x if x or not x and x is None else x
0001dc83 7c 00 00         LOAD_FAST           x
0001dc86 91 01 00         EXTENDED_ARG        1
0001dc89 70 9f dc         JUMP_IF_TRUE_OR_POP L1dc9f
0001dc8c 7c 00 00         LOAD_FAST           x
0001dc8f 0c               UNARY_NOT
0001dc90 91 01 00         EXTENDED_ARG        1
0001dc93 6f 9f dc         JUMP_IF_FALSE_OR_POP L1dc9f
0001dc96 7c 00 00         LOAD_FAST           x
0001dc99 74 01 00         LOAD_GLOBAL         None
0001dc9c 6b 08 00         COMPARE_OP          'is'
                  L1dc9f:
0001dc9f 91 01 00         EXTENDED_ARG        1
0001dca2 72 ab dc         POP_JUMP_IF_FALSE   L1dcab
0001dca5 7c 00 00         LOAD_FAST           x
0001dca8 6e 03 00         JUMP_FORWARD        L1dcae (+3)
                  L1dcab:
0001dcab 7c 00 00         LOAD_FAST           x
                  L1dcae:
0001dcae 7d 00 00         STORE_FAST          x
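
For anyone who wants to try a similar count, here is a minimal sketch
using the standard dis.get_instructions() API instead of a patched
dis.dis. The helper name opcode_histogram and the sample function are
mine, and the opcodes you see depend on the Python version you run it on:

```python
import dis
from collections import Counter


def opcode_histogram(func):
    """Count how often each opcode name appears in func's bytecode."""
    return Counter(ins.opname for ins in dis.get_instructions(func))


def sample(x):
    # Same expression as line 3000 in the listing above.
    return x if x or not x and x is None else x


hist = opcode_histogram(sample)
print(hist.most_common(5))
print("EXTENDED_ARG count:", hist["EXTENDED_ARG"])
```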


On Wed, Apr 13, 2016 at 2:23 PM, Victor Stinner <victor.stinner at gmail.com>
wrote:

> 2016-04-13 23:02 GMT+02:00 Eric Fahlgren <ericfahlgren at gmail.com>:
> > Percentage of 1-byte args    = 96.80%
>
> Yeah, I expected such high ratio. Good news that you confirm it.
>
>
> > Non-argument ops             =    53,719
> > One-byte args                =   368,787
> > Multi-byte args              =    12,191
>
> Again, only a very few arguments take multiple bytes. Good, the
> bytecode will be smaller.
>
> IMHO it's more a nice side effect than a real goal. Runtime
> performance matters more than the size of the bytecode; it's not as if
> a bytecode takes 4 MB. It's probably closer to 1 KB and so can probably
> benefit from the fastest CPU caches.
>
>
> > Just for the record, here's my arithmetic:
> > byteCodeSize     = 1*nonArgumentOps + 3*oneByteArgs + 3*multiByteArgs
> > wordCodeSize     = 2*nonArgumentOps + 2*oneByteArgs + 4*multiByteArgs
>
> If multiByteArgs means any size > 1 byte, the wordCodeSize formula is
> wrong:
>
> - no parameter: 2 bytes
> - 8-bit parameter: 2 bytes
> - 16-bit parameter: 4 bytes
> - 24-bit parameter: 6 bytes
> - 32-bit parameter: 8 bytes
>
> But you wrote that you didn't see EXTENDED_ARG, so I guess that
> multibyte means 16-bit in your case, and so your formula is correct.
>
> I don't expect 32-bit parameters in the wild, only 24-bit
> parameters for functions with annotations.
>
>
> > (It is interesting to note that I have never encountered an EXTENDED_ARG
> operator in the wild, only in my own synthetic examples.)
>
> As I wrote, EXTENDED_ARG can be seen when MAKE_FUNCTION is used with
> annotations.
>
> Victor
>

From victor.stinner at gmail.com  Wed Apr 13 17:37:33 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 13 Apr 2016 23:37:33 +0200
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
Message-ID: <CAMpsgwY7XD5KJB95CFUKa7h70q1CgO4ebOh_0VWR0notBu+hYQ@mail.gmail.com>

On Wednesday, 13 April 2016, Brett Cannon <brett at python.org> wrote:
>
> All of this is demonstrated in
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 by
> the various possibilities. In the end it's not a corner case because the
> definition of __fspath__ will be such that there's no ambiguity in what
> os.fspath() will accept and what __fspath__ can return and the code will be
> written to conform to what the PEP dictates (IOW I'm aware that this needs
> to be considered in the implementation :) .
>

I'm not a big fan of a flag parameter to change the return type of a
function. Usually, two functions are preferred. In the os module we have
getcwd/getcwdb for example. I don't know if it's a good example.

Do you know other examples of Python functions taking a (flag) parameter to
change the result type?
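
For reference, a small sketch of the two API shapes being compared.
os.getcwd() and os.getcwdb() are the real functions; current_dir() and
its as_bytes flag are hypothetical, written only to show what the
flag-style alternative would look like:

```python
import os

# Two-function style: the return type is fixed by the function name.
cwd_text = os.getcwd()     # always str
cwd_bytes = os.getcwdb()   # always bytes


def current_dir(as_bytes=False):
    # Hypothetical flag style: the return type depends on an argument,
    # so a caller has to read the call site to know what comes back.
    return os.getcwdb() if as_bytes else os.getcwd()


print(type(cwd_text).__name__, type(cwd_bytes).__name__)
print(type(current_dir()).__name__, type(current_dir(as_bytes=True)).__name__)
```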

Victor

From victor.stinner at gmail.com  Wed Apr 13 17:39:29 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 13 Apr 2016 23:39:29 +0200
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
Message-ID: <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>

Oops sorry, I forgot to add that I have no strong opinion on the type (I
only have a minor preference for str only).

Victor

From zachary.ware+pydev at gmail.com  Wed Apr 13 16:50:52 2016
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Wed, 13 Apr 2016 15:50:52 -0500
Subject: [Python-Dev] Tag-based buildmaster (was: Most 3.x buildbots are
 green again ... )
Message-ID: <CAKJDb-Pr0Cp5xVOxe9UYq62U-hVFp03jYKOZxqUVzkziKLqDWg@mail.gmail.com>

(Cross-posting to python-buildbots, discussion is probably best continued there)

On Wed, Apr 13, 2016 at 3:37 PM, Brett Cannon <brett at python.org> wrote:
> On Wed, 13 Apr 2016 at 13:17 Zachary Ware <zachary.ware+pydev at gmail.com>
> wrote:
>> After receiving a suggestion from koobs several months ago, I've been
>> intermittently thinking about completely redoing our buildmaster setup
>> such that instead of a single builder per version on each slave, we
>> instead set up a series of builders with particular 'tags', and each
>> builder attaches to each slave that satisfies the tags (running each
>> build only on the first slave available).  This would allow us to test
>> some of the rarer options (such as --without-threads) significantly
>> more often than 'never', and generally get a lot more
>> customization/flexibility of builds.  I haven't had a chance to sit
>> down and think out all the edge cases of this idea, but what do people
>> generally think of it?  I think the GitHub switchover will be a good
>> time to do this if it's generally seen as a decent idea, since there
>> will need to be some work on the buildmaster to do the switch anyway.
>
> So we have slaves connect to multiple builders who have requirements of what
> they are testing? So the --without-threads master would have all slaves able
> to compile --without-threads connect to it and then do that build? And those
> same slaves may also connect to the gcc and clang masters to do those builds
> as well? So would that mean slaves could potentially do a bunch of builds
> per change? That sounds nice to me as long as the slave maintainers are also
> up to utilizing this by double/triple/quadrupling their builds.

Basically, yes.  I'm unsure as to whether the build would be done on
all matching slaves on each change, or rotate between them (or use the
next available) on each change; that would likely come down to which
scheme we collectively want.  I also have vague ideas about having
'daily' or even 'weekly' tags for builds that are deemed to not need a
build for every changeset, which could alleviate some of the
multiplying.
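
A minimal sketch of the tag-matching idea described above, assuming each builder lists the tags it requires and each worker ("slave" in this thread) lists the tags it satisfies. Every name here (eligible_workers, pick_worker, the worker names and tags) is hypothetical and not taken from the real buildmaster configuration.

```python
# Hypothetical sketch: match builders to workers by tag sets.

def eligible_workers(builder_tags, workers):
    """Return the sorted names of workers that provide every required tag."""
    required = set(builder_tags)
    return sorted(name for name, tags in workers.items()
                  if required <= set(tags))


def pick_worker(builder_tags, workers, busy=()):
    """Return the first eligible worker that is not busy, or None."""
    for name in eligible_workers(builder_tags, workers):
        if name not in busy:
            return name
    return None


# Made-up worker inventory, for illustration only.
workers = {
    "freebsd-clang": {"clang", "without-threads"},
    "ubuntu-gcc": {"gcc", "without-threads"},
    "windows-msvc": {"msvc"},
}

print(eligible_workers(["without-threads"], workers))
print(pick_worker(["without-threads"], workers, busy={"freebsd-clang"}))
```

Whether a build runs on every eligible worker or only on the next available one is the open scheduling question from the message; pick_worker models only the "next available" variant.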

-- 
Zach

From victor.stinner at gmail.com  Wed Apr 13 17:44:14 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 13 Apr 2016 23:44:14 +0200
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CAO41-mNHebkPgiC1A410nbVpy0wOsxmE8C7RNP3rinufoqwwLQ@mail.gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
 <CAO41-mNHebkPgiC1A410nbVpy0wOsxmE8C7RNP3rinufoqwwLQ@mail.gmail.com>
Message-ID: <CAMpsgwa-XuuCtGBMz1-gqy12VxiiibA8xBZw0On+y4kinkVRmg@mail.gmail.com>

On Wednesday, April 13, 2016, Ryan Gonzalez <rymg19 at gmail.com> wrote:

> What is the value of HAS_ARG going to be now?
>

I asked Demur to keep HAS_ARG(). Not really for backward compatibility, but
for the dis module: to keep a nice assembler. There are also debug traces
in ceval.c which use it.

For ceval.c, we might use HAS_ARG() to micro-optimize oparg=0 (hardcode 0
rather than reading the bytecode) for operators with no argument. Or maybe
it's completely useless :-)

Victor

From rymg19 at gmail.com  Wed Apr 13 18:11:14 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Wed, 13 Apr 2016 17:11:14 -0500
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CAMpsgwa-XuuCtGBMz1-gqy12VxiiibA8xBZw0On+y4kinkVRmg@mail.gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
 <CAO41-mNHebkPgiC1A410nbVpy0wOsxmE8C7RNP3rinufoqwwLQ@mail.gmail.com>
 <CAMpsgwa-XuuCtGBMz1-gqy12VxiiibA8xBZw0On+y4kinkVRmg@mail.gmail.com>
Message-ID: <CAO41-mM8Xsqv0xizAN3Hv9ZPvSkantioViSSdZdRj0QvwXzXGg@mail.gmail.com>

So code that depends on iterating through bytecode via HAS_ARG is going to
break...

Darn it. :/

--
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something's wrong.
http://kirbyfan64.github.io/
On Apr 13, 2016 4:44 PM, "Victor Stinner" <victor.stinner at gmail.com> wrote:

> On Wednesday, April 13, 2016, Ryan Gonzalez <rymg19 at gmail.com> wrote:
>
>> What is the value of HAS_ARG going to be now?
>>
>
> I asked Demur to keep HAS_ARG(). Not really for backward compatibility,
> but for the dis module: to keep a nice assembler. There are also debug
> traces in ceval.c which use it.
>
> For ceval.c, we might use HAS_ARG() to micro-optimize oparg=0 (hardcode 0
> rather than reading the bytecode) for operators with no argument. Or maybe
> it's completely useless :-)
>
> Victor
>

From victor.stinner at gmail.com  Wed Apr 13 18:19:42 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 00:19:42 +0200
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
Message-ID: <CAMpsgwakxmpKzCvcCGB4BhjL1oLYmTmTAgDG7V0rCkOy9UQAkA@mail.gmail.com>

Oh, since others voted, I will also vote and explain my vote.

I like choice 1, str only, because it's very well defined. In Python
3, Unicode is simply the native type for text. It's accepted by almost
all functions. In other emails, I also explained that Unicode is fine
for storing undecodable filenames on UNIX; it has worked as expected for
many years (since Python 3.3).

--

If you cannot survive without bytes, I suggest adding two functions:
one for str only, another which can return str or bytes.

Maybe you in fact want two protocols: __fspath__ (str only) and
__fspathb__ (bytes only)? os.fspathb() would first try __fspathb__, or
fall back to os.fsencode(__fspath__). os.fspath() would first try
__fspath__, or fall back to os.fsdecode(__fspathb__). IMHO such
complexity is not worth having while Unicode handles all use cases.
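
A rough sketch of the two-protocol idea described above, written as module-level functions rather than as members of os. The __fspathb__ protocol and the fspathb() function are hypothetical; they exist only in this proposal, and the example classes are made up for illustration.

```python
import os


def fspath(obj):
    """Return a str path: try __fspath__, fall back to decoding __fspathb__."""
    if isinstance(obj, str):
        return obj
    cls = type(obj)
    if hasattr(cls, "__fspath__"):
        return cls.__fspath__(obj)
    if hasattr(cls, "__fspathb__"):
        return os.fsdecode(cls.__fspathb__(obj))
    raise TypeError("expected str or path object, not " + cls.__name__)


def fspathb(obj):
    """Return a bytes path: try __fspathb__, fall back to encoding __fspath__."""
    if isinstance(obj, bytes):
        return obj
    cls = type(obj)
    if hasattr(cls, "__fspathb__"):
        return cls.__fspathb__(obj)
    if hasattr(cls, "__fspath__"):
        return os.fsencode(cls.__fspath__(obj))
    raise TypeError("expected bytes or path object, not " + cls.__name__)


class StrPath:
    """Made-up path type implementing only the str protocol."""
    def __fspath__(self):
        return "spam/eggs.txt"


class BytesPath:
    """Made-up path type implementing only the hypothetical bytes protocol."""
    def __fspathb__(self):
        return b"spam/eggs.txt"


print(fspath(BytesPath()), fspathb(StrPath()))
```

Each function needs two lookups and a conversion path, which is the complexity the paragraph above argues is not worth carrying.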

Or do you know functions implemented in Python accepting str *and* bytes?

--

The C implementation of the os module has an important
path_converter() function:

 * path_converter accepts (Unicode) strings and their
 * subclasses, and bytes and their subclasses.  What
 * it does with the argument depends on the platform:
 *
 *   * On Windows, if we get a (Unicode) string we
 *     extract the wchar_t * and return it; if we get
 *     bytes we extract the char * and return that.
 *
 *   * On all other platforms, strings are encoded
 *     to bytes using PyUnicode_FSConverter, then we
 *     extract the char * from the bytes object and
 *     return that.

This function will implement something like os.fspath().

With os.fspath() only accepting str, we will directly return the
Unicode string on Windows. On UNIX, Unicode will be encoded, as is
already done for Unicode strings.

This specific function would benefit from flavor 4 (os.fspath() can
return str and bytes), but it's more the exception than the rule. It
would be more of a micro-optimization than a good reason to drive the API
design.

Victor

On Wednesday, April 13, 2016, Brett Cannon <brett at python.org> wrote:
>
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the four potential approaches implemented (although it doesn't follow the "separate functions" approach some are proposing and instead goes with the allow_bytes approach I originally proposed).

From victor.stinner at gmail.com  Wed Apr 13 18:26:00 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 00:26:00 +0200
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CAO41-mM8Xsqv0xizAN3Hv9ZPvSkantioViSSdZdRj0QvwXzXGg@mail.gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
 <CAO41-mNHebkPgiC1A410nbVpy0wOsxmE8C7RNP3rinufoqwwLQ@mail.gmail.com>
 <CAMpsgwa-XuuCtGBMz1-gqy12VxiiibA8xBZw0On+y4kinkVRmg@mail.gmail.com>
 <CAO41-mM8Xsqv0xizAN3Hv9ZPvSkantioViSSdZdRj0QvwXzXGg@mail.gmail.com>
Message-ID: <CAMpsgwZhkwxasLybJNToQmFV8ugjgGwmhxKScJeSfmgt77ktFA@mail.gmail.com>

2016-04-14 0:11 GMT+02:00 Ryan Gonzalez <rymg19 at gmail.com>:
> So code that depends on iterating through bytecode via HAS_ARG is going to
> break...

Sure. This change is backward incompatible for applications parsing
bytecode in C or Python. That's why the patch also has to update the
dis module.

I don't see how you plan to keep backward compatibility, since the
argument size changed from 2 bytes to 1 byte. You must update your
code (written in C or Python or whatever).
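
To illustrate why existing parsers break, here is a simplified sketch of walking both layouts. It assumes the old format stores a 2-byte little-endian argument only for opcodes at or above HAVE_ARGUMENT, and that wordcode stores every instruction as one opcode byte plus one argument byte. EXTENDED_ARG handling is left out, and the byte strings are hand-made samples, not real compiler output.

```python
# Assumed threshold, mirroring dis.HAVE_ARGUMENT in CPython 3.5.
HAVE_ARGUMENT = 90


def iter_old_bytecode(code):
    """Yield (offset, opcode, arg) for the variable-length pre-wordcode layout."""
    i = 0
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:
            # Opcode followed by a 2-byte little-endian argument.
            yield i, op, code[i + 1] | (code[i + 2] << 8)
            i += 3
        else:
            # Opcode with no argument occupies a single byte.
            yield i, op, None
            i += 1


def iter_wordcode(code):
    """Yield (offset, opcode, arg) for fixed 2-byte units."""
    for i in range(0, len(code), 2):
        yield i, code[i], code[i + 1]


# Hand-made samples: opcode 100 with argument 1, then opcode 83.
old_sample = bytes([100, 1, 0, 83])
new_sample = bytes([100, 1, 83, 0])

print(list(iter_old_bytecode(old_sample)))
print(list(iter_wordcode(new_sample)))
```

In the wordcode loop the HAS_ARG-style test is no longer needed to find the next instruction, which is why code that steps through bytecode with it has to be rewritten.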

Fortunately, the dis module was enhanced in Python 3.4: get_instructions()
now gives nice Instruction objects rather than only pure text output.

FYI I wrote my own library to encode and decode bytecode. It provides
abstract bytecode objects to easily modify bytecode:
https://bytecode.readthedocs.org/

I suggest using such a library (or simply the dis module for simple
needs) if you have to handle bytecode, rather than writing your own
code.
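
For simple needs, a short example of the dis approach: dis.get_instructions() (available since Python 3.4) yields Instruction objects, so the caller never decodes the raw byte layout. The helper name describe and the sample function add are made up for this illustration, and the exact opcodes printed vary between Python versions.

```python
import dis


def add(a, b):
    return a + b


def describe(func):
    """Return (offset, opname, argrepr) for each instruction in func."""
    return [(instr.offset, instr.opname, instr.argrepr)
            for instr in dis.get_instructions(func)]


for offset, opname, argrepr in describe(add):
    print(offset, opname, argrepr)
```

Because the decoding lives in the dis module, code written this way keeps working when the underlying bytecode format changes.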

I know a few other projects which handle bytecode directly:

* https://pypi.python.org/pypi/codetransformer
* https://github.com/serprex/byteplay
* https://pypi.python.org/pypi/coverage

IMHO it's not a big deal to update these projects for the future
Python 3.6. I can even help them support the new bytecode format.

Victor

From yselivanov.ml at gmail.com  Wed Apr 13 18:45:06 2016
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Wed, 13 Apr 2016 18:45:06 -0400
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
Message-ID: <570ECBF2.7020205@gmail.com>



On 2016-04-13 12:24 PM, Victor Stinner wrote:
> Can someone please review the change?

+1 for the change.  I can take a look at the patch in a few days.

Yury

From Nikolaus at rath.org  Wed Apr 13 18:45:23 2016
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Wed, 13 Apr 2016 15:45:23 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 (Brett Cannon's message of "Wed, 13 Apr 2016 17:10:09 +0000")
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
Message-ID: <87vb3lcem4.fsf@thinkpad.rath.org>

On Apr 13 2016, Brett Cannon <brett at python.org> wrote:
> On Tue, 12 Apr 2016 at 22:38 Michael Mysinger via Python-Dev <
> python-dev at python.org> wrote:
>
>> Ethan Furman <ethan <at> stoneleaf.us> writes:
>>
>> > Do we allow bytes to be returned from os.fspath()?  If yes, then do we
>> > allow bytes from __fspath__()?
>>
>> De-lurking. Especially since the ultimate goal is better interoperability,
>> I
>> feel like an implementation that people can play with would help guide the
>> few remaining decisions. To help test the various options you could
>> temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to
>> both
>> pathlib.__fspath__() and os.fspath(), with distinct configurable defaults
>> for
>> each.
>>
>> In the spirit of Python 3 I feel like bytes might not be needed in
>> practice,
>> but something like this with defaults of False will allow people to easily
>> test all the various options.
>>
>
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> the four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).


When passing an object that is of type str and has a __fspath__
attribute, all approaches return the value of __fspath__().

However, when passing something of type bytes, the second approach
returns the object, while the third returns the value of __fspath__().

Is this intentional? I think a __fspath__ attribute should always be
preferred.


Best,
-Nikolaus


-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             “Time flies like an arrow, fruit flies like a Banana.”

From ethan at stoneleaf.us  Wed Apr 13 18:58:54 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 13 Apr 2016 15:58:54 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <87vb3lcem4.fsf@thinkpad.rath.org>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <87vb3lcem4.fsf@thinkpad.rath.org>
Message-ID: <570ECF2E.1070004@stoneleaf.us>

On 04/13/2016 03:45 PM, Nikolaus Rath wrote:

> When passing an object that is of type str and has a __fspath__
> attribute, all approaches return the value of __fspath__().
>
> However, when passing something of type bytes, the second approach
> returns the object, while the third returns the value of __fspath__().
>
> Is this intentional? I think a __fspath__ attribute should always be
> preferred.

Yes, it is intentional.  The second approach assumes __fspath__ can only 
contain str, so there is no point in checking it for bytes.

--
~Ethan~


From brett at python.org  Wed Apr 13 19:06:35 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 23:06:35 +0000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <87vb3lcem4.fsf@thinkpad.rath.org>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <87vb3lcem4.fsf@thinkpad.rath.org>
Message-ID: <CAP1=2W6EW8DaftH0DKzfna1n0CTnNK_MWof_yOY1=-761_GMoQ@mail.gmail.com>

On Wed, 13 Apr 2016 at 15:46 Nikolaus Rath <Nikolaus at rath.org> wrote:

> On Apr 13 2016, Brett Cannon <brett at python.org> wrote:
> > On Tue, 12 Apr 2016 at 22:38 Michael Mysinger via Python-Dev <
> > python-dev at python.org> wrote:
> >
> >> Ethan Furman <ethan <at> stoneleaf.us> writes:
> >>
> >> > Do we allow bytes to be returned from os.fspath()?  If yes, then do we
> >> > allow bytes from __fspath__()?
> >>
> >> De-lurking. Especially since the ultimate goal is better
> interoperability,
> >> I
> >> feel like an implementation that people can play with would help guide
> the
> >> few remaining decisions. To help test the various options you could
> >> temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to
> >> both
> >> pathlib.__fspath__() and os.fspath(), with distinct configurable
> defaults
> >> for
> >> each.
> >>
> >> In the spirit of Python 3 I feel like bytes might not be needed in
> >> practice,
> >> but something like this with defaults of False will allow people to
> easily
> >> test all the various options.
> >>
> >
> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> > the four potential approaches implemented (although it doesn't follow the
> > "separate functions" approach some are proposing and instead goes with
> the
> > allow_bytes approach I originally proposed).
>
>
> When passing an object that is of type str and has a __fspath__
> attribute, all approaches return the value of __fspath__().
>
> However, when passing something of type bytes, the second approach
> returns the object, while the third returns the value of __fspath__().
>
> Is this intentional? I think a __fspath__ attribute should always be
> preferred.
>

It's very much intentional. If we define __fspath__() to only return
strings but still want to minimize boilerplate of allowing bytes to simply
pass through without checking a path argument to see if it is bytes then
approach #2 is warranted. But if __fspath__() can return bytes then
approach #3 allows for it.

From brett at python.org  Wed Apr 13 19:09:57 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 23:09:57 +0000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAMpsgwakxmpKzCvcCGB4BhjL1oLYmTmTAgDG7V0rCkOy9UQAkA@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwakxmpKzCvcCGB4BhjL1oLYmTmTAgDG7V0rCkOy9UQAkA@mail.gmail.com>
Message-ID: <CAP1=2W7bRf8SzXJdY+6SM41WL93m235kKLvoPAijZh-6Ou9gAw@mail.gmail.com>

On Wed, 13 Apr 2016 at 15:20 Victor Stinner <victor.stinner at gmail.com>
wrote:

> Oh, since others voted, I will also vote and explain my vote.
>
> I like choice 1, str only, because it's very well defined. In Python
> 3, Unicode is simply the native type for text. It's accepted by almost
> all functions. In other emails, I also explained that Unicode is fine
> for storing undecodable filenames on UNIX; it has worked as expected for
> many years (since Python 3.3).
>
> --
>
> If you cannot survive without bytes, I suggest adding two functions:
> one for str only, another which can return str or bytes.
>
> Maybe you in fact want two protocols: __fspath__ (str only) and
> __fspathb__ (bytes only)? os.fspathb() would first try __fspathb__, or
> fall back to os.fsencode(__fspath__). os.fspath() would first try
> __fspath__, or fall back to os.fsdecode(__fspathb__). IMHO such
> complexity is not worth having while Unicode handles all use cases.
>

Implementing two magic methods for this seems like overkill. Best I would
be willing to do with automatic encode/decode is use
os.fsencode()/os.fsdecode() on the argument or what __fspath__() returned.


>
> Or do you know functions implemented in Python accepting str *and* bytes?
>

On purpose, nothing off the top of my head.


>
> --
>
> The C implementation of the os module has an important
> path_converter() function:
>
>  * path_converter accepts (Unicode) strings and their
>  * subclasses, and bytes and their subclasses.  What
>  * it does with the argument depends on the platform:
>  *
>  *   * On Windows, if we get a (Unicode) string we
>  *     extract the wchar_t * and return it; if we get
>  *     bytes we extract the char * and return that.
>  *
>  *   * On all other platforms, strings are encoded
>  *     to bytes using PyUnicode_FSConverter, then we
>  *     extract the char * from the bytes object and
>  *     return that.
>
> This function will implement something like os.fspath().
>
> With os.fspath() only accepting str, we will directly return the
> Unicode string on Windows. On UNIX, Unicode will be encoded, as is
> already done for Unicode strings.
>
> This specific function would benefit from flavor 4 (os.fspath() can
> return str and bytes), but it's more the exception than the rule. It
> would be more of a micro-optimization than a good reason to drive the API
> design.
>

Yep, it's interesting to know but Chris and I won't let it drive the
decision (I assume).

-Brett


>
> Victor
>
> On Wednesday, April 13, 2016, Brett Cannon <brett at python.org> wrote:
> >
> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1
> has the four potential approaches implemented (although it doesn't follow
> the "separate functions" approach some are proposing and instead goes with
> the allow_bytes approach I originally proposed).
>

From chris.barker at noaa.gov  Wed Apr 13 20:06:41 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 13 Apr 2016 17:06:41 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <1460580464.3984789.578009321.29A3CE1D@webmail.messagingengine.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
 <CAFT4OTEv2NXMku=aQBdeq2YUP-gYeURSgsWnNdj9yD8P=aL3Sw@mail.gmail.com>
 <CAP1=2W7PqGQhcAbHXE7O2kTwqgS=0g+oGxYG4xLUwXeLE_y3xg@mail.gmail.com>
 <CALGmxELo59SkNwZNZsK6+jGJ+M8aTLNPa1Pj+5ptem_-rh6uPA@mail.gmail.com>
 <1460580464.3984789.578009321.29A3CE1D@webmail.messagingengine.com>
Message-ID: <CALGmxEKymtZU38wbO9imU2Ue7=HzU1fCCwWe+8m6MOmRHMJ4dQ@mail.gmail.com>

On Wed, Apr 13, 2016 at 1:47 PM, Random832 <random832 at fastmail.com> wrote:

> On Wed, Apr 13, 2016, at 16:39, Chris Barker wrote:
> > so are we worried that __fspath__ will exist and be callable, but  might
> > raise an AttributeError somewhere inside itself? if so isn't it broken
> > anyway, so should it be ignored?
>
> Well, if you're going to say "ignore the protocol because it's broken",
> where do you stop? What if it raises some other exception? What if it
> raises SystemExit?


this is pretty much always the case with EAFP coding:

try:
    something()
except SomeError:
    do_something_else()

unless SomeError is a custom defined error that you know is never going to
get raised anywhere else, then something() could raise SomeError for the
reason you expect, or some code deep in the call stack could raise
SomeError also, and you wouldn't know that.

I had a student run into this and it took him a good while to debug it. But
that was because the code in something() was pretty darn buggy. If he had
tested something() by itself, there would have been no issue finding the
problem.

In this case, I don't know that we need to be tolerant of buggy
__fspath__() implementations -- they should be tested outside these
checks, and not be buggy. So a buggy implementation may raise and may be
ignored, depending on what Exception the bug triggers -- big deal. The only
time it would matter is when the implementer is debugging the
implementation.

-CHB





-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From ethan at stoneleaf.us  Wed Apr 13 20:29:19 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 13 Apr 2016 17:29:19 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CALGmxEKymtZU38wbO9imU2Ue7=HzU1fCCwWe+8m6MOmRHMJ4dQ@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAPTjJmqo7yocuN73BsVTAzysvphZP-u93VjP+ckfeHiCADKJNw@mail.gmail.com>
 <CAFT4OTEv2NXMku=aQBdeq2YUP-gYeURSgsWnNdj9yD8P=aL3Sw@mail.gmail.com>
 <CAP1=2W7PqGQhcAbHXE7O2kTwqgS=0g+oGxYG4xLUwXeLE_y3xg@mail.gmail.com>
 <CALGmxELo59SkNwZNZsK6+jGJ+M8aTLNPa1Pj+5ptem_-rh6uPA@mail.gmail.com>
 <1460580464.3984789.578009321.29A3CE1D@webmail.messagingengine.com>
 <CALGmxEKymtZU38wbO9imU2Ue7=HzU1fCCwWe+8m6MOmRHMJ4dQ@mail.gmail.com>
Message-ID: <570EE45F.2060303@stoneleaf.us>

On 04/13/2016 05:06 PM, Chris Barker wrote:

> In this case, I don't know that we need to be tolerant of buggy
> __fspath__() implementations -- they should be tested outside these
> checks, and not be buggy. So a buggy implementation may raise and may be
> ignored, depending on what Exception the bug triggers -- big deal. The
> only time it would matter is when the implementer is debugging the
> implementation.

Yet the idea behind robust exception handling is to test as little as 
possible and only catch what you know how to correct.

This code catches only one thing, only at one place, and we know how to 
deal with it:

   try:
      fsp = obj.__fspath__
   except AttributeError:
      pass
   else:
      fsp = fsp()

Contrarily, this next code catches the same error, but it could happen 
at the one place we know how to deal with it *or* anywhere further down 
the call stack where we have no clue what the proper course is to handle 
the problem... yet we suppress it anyway:

   try:
     fsp = obj.__fspath__()
   except AttributeError:
     pass

Certainly not code I want to see in the stdlib.
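
A small demonstration of the difference between the two snippets above, using a deliberately buggy __fspath__() that raises AttributeError from inside its own body. The class Buggy and the function names narrow and broad are made up for this illustration.

```python
class Buggy:
    """Made-up path type whose __fspath__ contains a bug."""
    def __fspath__(self):
        # Bug: this attribute does not exist, so AttributeError is raised
        # from inside the method, not from the protocol lookup.
        return self.missing_attribute


def narrow(obj):
    """Catch AttributeError only for the __fspath__ lookup itself."""
    try:
        fsp = obj.__fspath__
    except AttributeError:
        return None
    else:
        return fsp()


def broad(obj):
    """Catch AttributeError from the lookup and from anything the call does."""
    try:
        return obj.__fspath__()
    except AttributeError:
        return None


print(broad(Buggy()))  # the bug is silently swallowed

try:
    narrow(Buggy())
except AttributeError as exc:
    print("narrow() surfaced the bug:", exc)
```

With the narrow form, an object lacking __fspath__ is still handled quietly, while a broken implementation fails loudly at the point of the bug.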

--
~Ethan~

From random832 at fastmail.us  Wed Apr 13 19:55:32 2016
From: random832 at fastmail.us (Random832)
Date: Wed, 13 Apr 2016 19:55:32 -0400
Subject: [Python-Dev] pathlib - current status of discussions
Message-ID: <20160414003711.079666800CE@frontend2.nyi.internal>


On Apr 13, 2016 19:06, Brett Cannon <brett at python.org> wrote:
> On Wed, 13 Apr 2016 at 15:46 Nikolaus Rath <Nikolaus at rath.org> wrote:
>> When passing an object that is of type str and has a __fspath__
>> attribute, all approaches return the value of __fspath__().
>>
>> However, when passing something of type bytes, the second approach
>> returns the object, while the third returns the value of __fspath__().
>>
>> Is this intentional? I think a __fspath__ attribute should always be
>> preferred.
>
>
> It's very much intentional. If we define __fspath__() to only return strings but still want to minimize boilerplate of allowing bytes to simply pass through without checking a path argument to see if it is bytes then approach #2 is warranted. But if __fspath__() can return bytes then approach #3 allows for it.

Er, the difference comes in when the object passed to os.fspath is a subclass of bytes that, itself, has a __fspath__ method (which may return a str). It's unlikely to occur in the wild, but is a semantic difference between this case and all other objects with __fspath__ methods.
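
The semantic difference can be shown with a rough reconstruction of the two approaches. This is a sketch based only on the behaviour described in this thread, not the code from Brett's gist; the function names fspath_approach2 and fspath_approach3 and the class Special are assumptions made for illustration.

```python
def fspath_approach2(path, *, allow_bytes=False):
    """Sketch of approach #2: bytes pass through before __fspath__ is tried."""
    if allow_bytes and isinstance(path, bytes):
        return path  # short-circuit: __fspath__ is never consulted
    try:
        method = type(path).__fspath__
    except AttributeError:
        if isinstance(path, str):
            return path
        raise TypeError("expected str or path object") from None
    return method(path)


def fspath_approach3(path, *, allow_bytes=False):
    """Sketch of approach #3: __fspath__ is tried first and may return bytes."""
    try:
        method = type(path).__fspath__
    except AttributeError:
        result = path
    else:
        result = method(path)
    if isinstance(result, str) or (allow_bytes and isinstance(result, bytes)):
        return result
    raise TypeError("expected str, bytes or path object")


class Special(bytes):
    """A bytes subclass that also implements __fspath__."""
    def __fspath__(self):
        return "str-val"


obj = Special(b"bytes-val")
print(fspath_approach2(obj, allow_bytes=True))  # the bytes object itself
print(fspath_approach3(obj, allow_bytes=True))  # the __fspath__() result
```

For any object that is not a bytes subclass, the two sketches agree, so the divergence is limited to the corner case described above.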

From random832 at fastmail.us  Wed Apr 13 20:25:28 2016
From: random832 at fastmail.us (Random832)
Date: Wed, 13 Apr 2016 20:25:28 -0400
Subject: [Python-Dev] pathlib - current status of discussions
Message-ID: <20160414003712.1B876680160@frontend2.nyi.internal>

An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20160413/5ceca972/attachment.html>

From ncoghlan at gmail.com  Wed Apr 13 22:49:09 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 12:49:09 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMpsgwY7XD5KJB95CFUKa7h70q1CgO4ebOh_0VWR0notBu+hYQ@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwY7XD5KJB95CFUKa7h70q1CgO4ebOh_0VWR0notBu+hYQ@mail.gmail.com>
Message-ID: <CADiSq7e_BZwun6UgBZtrHhKewOZ19bakNej1DzttgpxczphvLA@mail.gmail.com>

On 14 April 2016 at 07:37, Victor Stinner <victor.stinner at gmail.com> wrote:
> On Wednesday, April 13, 2016, Brett Cannon <brett at python.org> wrote:
>>
>> All of this is demonstrated in
>> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 by the
>> various possibilities. In the end it's not a corner case because the
>> definition of __fspath__ will be such that there's no ambiguity in what
>> os.fspath() will accept and what __fspath__ can return and the code will be
>> written to conform to what the PEP dictates (IOW I'm aware that this needs
>> to be considered in the implementation :) .
>
> I'm not a big fan of a flag parameter to change the return type of a
> function. Usually, two functions are preferred. In the os module we have
> getcwd/getcwdb for example. I don't know if it's a good example

It is, as one of the benefits of the "two separate functions" model is
to improve type inference during static analysis - you don't
necessarily know the values of parameters at analysis time, but you do
know which function is being called.

> Do you know other examples of Python functions taking a (flag) parameter to
> change the result type?

subprocess.Popen has a couple of flags that can do that (more
precisely, they change the return type of some methods on the
resulting object), but that's not an especially pretty API in general.
String based type variations are more common (e.g. file mode flags,
using the codec module registry), but they're still used only
sparingly (since they make the code harder to reason about for both
humans and static analysers).

In terms of types for filesystem path APIs:

1. I assume we'll want a fast path for bytes & str to avoid
performance regressions (especially in os.path, where we may be doing
pure data manipulation without any IO operations)
2. I favour defining __fspath__ and os.fspath() in terms of what the
os and os.path modules need to handle both DirEntry and pathlib (which
I currently expect to be str-or-bytes)
3. For the benefit of higher level cross-platform code like pathlib,
it likely makes sense to also have a str-only API that throws an
exception rather than returning bytes

However, I also suggest deferring a decision on 3 until 2 has been
definitively answered by way of implementing the changes. If I'm right
about 2, then the API could be something like:

- os.fspath -> str-or-bytes
- os.fsencode -> bytes (with coercion from str)
- os.fsdecode -> str (with coercion from bytes)
- os.strpath -> str (no coercion)

It's also worth noting that os.fsencode and os.fsdecode are already
idempotent - their current signatures are "str-or-bytes -> bytes" and
"str-or-bytes -> str". With a str-or-bytes return type on os.fspath,
adapting them to handle rich path objects should just be a matter of
adding an os.fspath call as the first step.
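
A minimal sketch of how the str-only variant from the list above could sit on top of a str-or-bytes os.fspath(). It assumes an os.fspath() with exactly that return type, which is what Python 3.6 and later provide; strpath itself is hypothetical and is written here as a plain function rather than as part of os.

```python
import os
import pathlib


def strpath(path):
    """Hypothetical str-only variant: like os.fspath(), but reject bytes."""
    result = os.fspath(path)
    if isinstance(result, bytes):
        raise TypeError("expected a str path, got bytes")
    return result


# os.fsencode() and os.fsdecode() accept either type and are idempotent.
print(os.fsencode(os.fsencode("spam")))
print(os.fsdecode(os.fsdecode(b"spam")))

print(strpath(pathlib.PurePosixPath("spam/eggs")))
```

Keeping the coercing functions (fsencode, fsdecode) separate from the non-coercing ones (fspath, strpath) also fits the static-analysis argument made earlier in the message: the return type follows from the function name alone.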

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From Nikolaus at rath.org  Wed Apr 13 22:57:57 2016
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Wed, 13 Apr 2016 19:57:57 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570ECF2E.1070004@stoneleaf.us> (Ethan Furman's message of "Wed, 
 13 Apr 2016 15:58:54 -0700")
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us>
Message-ID: <87oa9c3nii.fsf@vostro.rath.org>

On Apr 13 2016, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/13/2016 03:45 PM, Nikolaus Rath wrote:
>
>> When passing an object that is of type str and has a __fspath__
>> attribute, all approaches return the value of __fspath__().
>>
>> However, when passing something of type bytes, the second approach
>> returns the object, while the third returns the value of __fspath__().
>>
>> Is this intentional? I think a __fspath__ attribute should always be
>> preferred.
>
> Yes, it is intentional.  The second approach assumes __fspath__ can
> only contain str, so there is no point in checking it for bytes.

Either I haven't understood your answer, or you haven't understood my
question. I'm concerned about this case:

  class Special(bytes):
      def __fspath__(self):
        return 'str-val'
  obj = Special('bytes-val', 'utf8')
  path_obj = fspath(obj, allow_bytes=True)  

With #2, path_obj == b'bytes-val'. With #3, path_obj == 'str-val'.

I would expect that fspath(obj, allow_bytes=True) == 'str-val' (after
all, it's allow_bytes, not require_bytes). Bu


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             “Time flies like an arrow, fruit flies like a Banana.”

From ncoghlan at gmail.com  Wed Apr 13 23:04:09 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 13:04:09 +1000
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CAMpsgwZhkwxasLybJNToQmFV8ugjgGwmhxKScJeSfmgt77ktFA@mail.gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
 <CAO41-mNHebkPgiC1A410nbVpy0wOsxmE8C7RNP3rinufoqwwLQ@mail.gmail.com>
 <CAMpsgwa-XuuCtGBMz1-gqy12VxiiibA8xBZw0On+y4kinkVRmg@mail.gmail.com>
 <CAO41-mM8Xsqv0xizAN3Hv9ZPvSkantioViSSdZdRj0QvwXzXGg@mail.gmail.com>
 <CAMpsgwZhkwxasLybJNToQmFV8ugjgGwmhxKScJeSfmgt77ktFA@mail.gmail.com>
Message-ID: <CADiSq7eH_DE_035veojyh+ja6HTJAbbdgT_Fq67ARCqBWzZu-w@mail.gmail.com>

On 14 April 2016 at 08:26, Victor Stinner <victor.stinner at gmail.com> wrote:
> 2016-04-14 0:11 GMT+02:00 Ryan Gonzalez <rymg19 at gmail.com>:
>> So code that depends on iterating through bytecode via HAS_ARG is going to
>> break...
>
> Sure. This change is backward incompatible for applications parsing
> bytecode in C or Python. That's why the patch also has to update the
> dis module.
>
> I don't see how you plan to keep the backward compatibility, since the
> argument size changed from 2 bytes to 1 byte. You must update your
> code (written in C or Python or whatever).
>
> Fortunately, the dis module was enhanced in Python 3.4: get_instructions()
> now gives nice Instruction objects rather than only pure text output.
>
> FYI I wrote my own library to encode and decode bytecode. It provides
> abstract bytecode objects to easily modify bytecode:
> https://bytecode.readthedocs.org/
>
> I suggest using such a library (or simply the dis module for simple
> needs) if you have to handle bytecode, rather than writing your own
> code.
>
> I know a few other projects which handle bytecode directly:
>
> * https://pypi.python.org/pypi/codetransformer
> * https://github.com/serprex/byteplay
> * https://pypi.python.org/pypi/coverage
>
> IMHO it's not a big deal to update these projects for the future
> Python 3.6. I can even help them to support the new bytecode format.

+1

We've also had previous discussions on adding a "minimum viable
bytecode editing" API to the standard library, and updating these
third party modules to support wordcode instead of bytecode could
provide a good use-case-driven opportunity for defining that (i.e. it
wouldn't be about providing an end user facing API directly, but
rather about letting CPython take care of the bookkeeping details for
things like lnotab and sorting out jump targets).
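
As a small illustration of the dis-based approach mentioned in the quoted
text, the sketch below walks a function's instructions through
dis.get_instructions() instead of decoding co_code by hand. The function
`sample` is made up for the example, and the exact opcode names printed
depend on the interpreter version.

```python
import dis


def sample(a, b):
    # Hypothetical function used only as something to disassemble.
    return a + b


# Each item is a dis.Instruction; the caller never has to know how many
# bytes an opcode argument occupies in the underlying code string.
for instr in dis.get_instructions(sample):
    print(instr.offset, instr.opname, instr.argrepr)

opnames = [instr.opname for instr in dis.get_instructions(sample)]
```

Code written this way should keep working across a change in argument
size, since only the dis module has to know the encoding.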

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ethan at stoneleaf.us  Wed Apr 13 23:14:44 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 13 Apr 2016 20:14:44 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <87oa9c3nii.fsf@vostro.rath.org>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us>
 <87oa9c3nii.fsf@vostro.rath.org>
Message-ID: <570F0B24.50705@stoneleaf.us>

On 04/13/2016 07:57 PM, Nikolaus Rath wrote:
> On Apr 13 2016, Ethan Furman wrote:
>> On 04/13/2016 03:45 PM, Nikolaus Rath wrote:

>>> When passing an object that is of type str and has a __fspath__
>>> attribute, all approaches return the value of __fspath__().
>>>
>>> However, when passing something of type bytes, the second approach
>>> returns the object, while the third returns the value of __fspath__().
>>>
>>> Is this intentional? I think a __fspath__ attribute should always be
>>> preferred.
>>
>> Yes, it is intentional.  The second approach assumes __fspath__ can
>> only contain str, so there is no point in checking it for bytes.
>
> Either I haven't understood your answer, or you haven't understood my
> question. I'm concerned about this case:
>
>    class Special(bytes):
>        def __fspath__(self):
>          return 'str-val'
>    obj = Special('bytes-val', 'utf8')
>    path_obj = fspath(obj, allow_bytes=True)
>
> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'.

I misunderstood your question.  That is... an interesting case.  ;)

--
~Ethan~


From ncoghlan at gmail.com  Wed Apr 13 23:17:36 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 13:17:36 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7e_BZwun6UgBZtrHhKewOZ19bakNej1DzttgpxczphvLA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwY7XD5KJB95CFUKa7h70q1CgO4ebOh_0VWR0notBu+hYQ@mail.gmail.com>
 <CADiSq7e_BZwun6UgBZtrHhKewOZ19bakNej1DzttgpxczphvLA@mail.gmail.com>
Message-ID: <CADiSq7cn_TAnt_+-VEq-9gWRM6V9=4L777bDK+eyaPfM1rgcYA@mail.gmail.com>

On 14 April 2016 at 12:49, Nick Coghlan <ncoghlan at gmail.com> wrote:
> The API could be something like:
>
> - os.fspath -> str-or-bytes
> - os.fsencode -> bytes (with coercion from str)
> - os.fsdecode -> str (with coercion from bytes)
> - os.strpath -> str (no coercion)

There seems to be fairly broad opposition to the idea of defining the
public API in terms of what os and os.path are likely to need, which
reminded me of Koos's suggestion of using a private API for the
str-or-bytes variant. That approach would give us something like:

- os.fspath -> str (no coercion)
- os.fsdecode -> str (with coercion from bytes)
- os.fsencode -> bytes (with coercion from str)
- os._raw_fspath -> str-or-bytes (no coercion)

(with "coercion" referring to how the result of __fspath__ and any
directly passed in str or bytes objects are handled)

The leading underscore on _raw_fspath would be of the "this is a
documented and stable API, but you probably don't want to use it
unless you really know what you're doing" variety, rather than the
"this is an undocumented and potentially unstable private API"
variety.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Wed Apr 13 23:27:41 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 13:27:41 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570F0B24.50705@stoneleaf.us>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us>
 <87oa9c3nii.fsf@vostro.rath.org> <570F0B24.50705@stoneleaf.us>
Message-ID: <CADiSq7eTbL=LAMi+HSAwR5gwtGMWwv_Z_xVAfEf_2Z2xAbLm5Q@mail.gmail.com>

On 14 April 2016 at 13:14, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/13/2016 07:57 PM, Nikolaus Rath wrote:
>> Either I haven't understood your answer, or you haven't understood my
>> question. I'm concerned about this case:
>>
>>    class Special(bytes):
>>        def __fspath__(self):
>>          return 'str-val'
>>    obj = Special('bytes-val', 'utf8')
>>    path_obj = fspath(obj, allow_bytes=True)
>>
>> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'.
>
> I misunderstood your question.  That is... an interesting case.  ;)

In this kind of case, inheritance tends to trump protocol. For
example, int subclasses can't override operator.index:

>>> from operator import index
>>> class NotAnInt():
...     def __index__(self):
...         return 42
...
>>> index(NotAnInt())
42
>>> class MyInt(int):
...     def __index__(self):
...         return 42
...
>>> index(MyInt(53))
53

The reasons for that behaviour are more pragmatic than philosophical:
builtins and their subclasses are extensively special-cased for speed
reasons, and those shortcuts are encountered before the interpreter
even considers using the general protocol.

In cases where the magic method return types are polymorphic (so
subclasses may want to override them) we'll use more restrictive exact
type checks for the shortcuts, but that argument doesn't apply for
typechecked protocols where the result is required to be an instance
of a particular builtin type (but subclasses are considered
acceptable).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From random832 at fastmail.com  Wed Apr 13 23:54:52 2016
From: random832 at fastmail.com (Random832)
Date: Wed, 13 Apr 2016 23:54:52 -0400
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7cn_TAnt_+-VEq-9gWRM6V9=4L777bDK+eyaPfM1rgcYA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwY7XD5KJB95CFUKa7h70q1CgO4ebOh_0VWR0notBu+hYQ@mail.gmail.com>
 <CADiSq7e_BZwun6UgBZtrHhKewOZ19bakNej1DzttgpxczphvLA@mail.gmail.com>
 <CADiSq7cn_TAnt_+-VEq-9gWRM6V9=4L777bDK+eyaPfM1rgcYA@mail.gmail.com>
Message-ID: <1460606092.516946.578278417.49F19066@webmail.messagingengine.com>

On Wed, Apr 13, 2016, at 23:17, Nick Coghlan wrote:

> - os.fspath -> str (no coercion)
> - os.fsdecode -> str (with coercion from bytes)
> - os.fsencode -> bytes (with coercion from str)
> - os._raw_fspath -> str-or-bytes (no coercion)
> 
> (with "coercion" referring to how the result of __fspath__ and any
> directly passed in str or bytes objects are handled)
> 
> The leading underscore on _raw_fspath would be of the "this is a
> documented and stable API, but you probably don't want to use it
> unless you really know what you're doing" variety, rather than the
> "this is an undocumented and potentially unstable private API"
> variety.

In this scenario could the protocol return bytes?

If the protocol cannot return bytes, then _raw_fspath will only return
bytes if directly passed bytes. This limits its utility for the
functions that consume it (presumably path_convert (os.open and friends)
and builtin open), since they already have to act specially based on the
types of their arguments (builtin open can accept an integer;
path_convert has to behave radically differently on str or bytes input)
and there's no reason they couldn't simply accept bytes directly while
they're doing that.

If the protocol can return bytes, then that means that types (DirEntry?
someone had an alternate path library with a bPath?) which return bytes
via the protocol will proliferate, and cannot be safely passed to
anything that uses os.fspath. Numerous copies of "def myfspath(x):
return os.fsdecode(os._raw_fspath(x))" will proliferate (or they'll just
monkey-patch os.fspath), and no-one actually uses os.fspath except toy
examples.

Why is it so objectionable for os.fspath to do coercion?

From random832 at fastmail.com  Thu Apr 14 00:05:43 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 14 Apr 2016 00:05:43 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CADiSq7eTbL=LAMi+HSAwR5gwtGMWwv_Z_xVAfEf_2Z2xAbLm5Q@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us>
 <87oa9c3nii.fsf@vostro.rath.org> <570F0B24.50705@stoneleaf.us>
 <CADiSq7eTbL=LAMi+HSAwR5gwtGMWwv_Z_xVAfEf_2Z2xAbLm5Q@mail.gmail.com>
Message-ID: <1460606743.519577.578286961.5D1CB3F1@webmail.messagingengine.com>

On Wed, Apr 13, 2016, at 23:27, Nick Coghlan wrote:
> In this kind of case, inheritance tends to trump protocol. For
> example, int subclasses can't override operator.index:
...
> The reasons for that behaviour are more pragmatic than philosophical:
> builtins and their subclasses are extensively special-cased for speed
> reasons, and those shortcuts are encountered before the interpreter
> even considers using the general protocol.
> 
> In cases where the magic method return types are polymorphic (so
> subclasses may want to override them) we'll use more restrictive exact
> type checks for the shortcuts, but that argument doesn't apply for
> typechecked protocols where the result is required to be an instance
> of a particular builtin type (but subclasses are considered
> acceptable).

Then why aren't we doing it for str? Because "try: path =
path.__fspath__()" is more idiomatic than the alternative?

If some sort of reasoned decision has been made to require the protocol
to trump the special case for str subclasses, it's unreasonable not to
apply the same decision to bytes subclasses. The decision should be
"always use the protocol first" or "always use the type match first".

In other words, why not this:

def fspath(path, *, allow_bytes=False):
    # Type match first: accept str (and bytes, if allowed) as-is.
    if isinstance(path, (bytes, str) if allow_bytes else str):
        return path
    try:
        m = path.__fspath__
    except AttributeError:
        raise TypeError
    path = m()
    if isinstance(path, (bytes, str) if allow_bytes else str):
        return path
    raise TypeError
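
To make the two orderings concrete, here is a rough sketch that runs the
Special class from earlier in the thread through both. The function names
fspath_type_first and fspath_protocol_first, the _acceptable helper and
the error messages are made up for this comparison; neither function is
an actual os API.

```python
def _acceptable(value, allow_bytes):
    # Shared type check used by both orderings.
    return isinstance(value, (bytes, str) if allow_bytes else str)


def fspath_type_first(path, *, allow_bytes=False):
    """Return str/bytes instances unchanged before trying __fspath__."""
    if _acceptable(path, allow_bytes):
        return path
    try:
        method = type(path).__fspath__
    except AttributeError:
        raise TypeError("not a path") from None
    result = method(path)
    if _acceptable(result, allow_bytes):
        return result
    raise TypeError("unacceptable __fspath__ result")


def fspath_protocol_first(path, *, allow_bytes=False):
    """Try __fspath__ first and fall back to the type check."""
    try:
        method = type(path).__fspath__
    except AttributeError:
        if _acceptable(path, allow_bytes):
            return path
        raise TypeError("not a path") from None
    result = method(path)
    if _acceptable(result, allow_bytes):
        return result
    raise TypeError("unacceptable __fspath__ result")


class Special(bytes):
    def __fspath__(self):
        return 'str-val'


obj = Special('bytes-val', 'utf8')
type_first = fspath_type_first(obj, allow_bytes=True)
protocol_first = fspath_protocol_first(obj, allow_bytes=True)
```

Under these assumptions type_first is the original bytes object and
protocol_first is 'str-val', and the same rule is applied to str and
bytes subclasses alike in each function.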

From ncoghlan at gmail.com  Thu Apr 14 02:00:22 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 16:00:22 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <1460606092.516946.578278417.49F19066@webmail.messagingengine.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwY7XD5KJB95CFUKa7h70q1CgO4ebOh_0VWR0notBu+hYQ@mail.gmail.com>
 <CADiSq7e_BZwun6UgBZtrHhKewOZ19bakNej1DzttgpxczphvLA@mail.gmail.com>
 <CADiSq7cn_TAnt_+-VEq-9gWRM6V9=4L777bDK+eyaPfM1rgcYA@mail.gmail.com>
 <1460606092.516946.578278417.49F19066@webmail.messagingengine.com>
Message-ID: <CADiSq7ezm1JRkeSjGo7P44g9bX7-gGKPEG2nb6Kftz9-QfkMNA@mail.gmail.com>

On 14 April 2016 at 13:54, Random832 <random832 at fastmail.com> wrote:
> On Wed, Apr 13, 2016, at 23:17, Nick Coghlan wrote:
>
>> - os.fspath -> str (no coercion)
>> - os.fsdecode -> str (with coercion from bytes)
>> - os.fsencode -> bytes (with coercion from str)
>> - os._raw_fspath -> str-or-bytes (no coercion)
>>
>> (with "coercion" referring to how the result of __fspath__ and any
>> directly passed in str or bytes objects are handled)
>>
>> The leading underscore on _raw_fspath would be of the "this is a
>> documented and stable API, but you probably don't want to use it
>> unless you really know what you're doing" variety, rather than the
>> "this is an undocumented and potentially unstable private API"
>> variety.
>
> In this scenario could the protocol return bytes?

Yes, that's desirable to handle DirEntry transparently regardless of type.

> If the protocol can return bytes, then that means that types (DirEntry?
> someone had an alternate path library with a bPath?) which return bytes
> via the protocol will proliferate, and cannot be safely passed to
> anything that uses os.fspath. Numerous copies of "def myfspath(x):
> return os.fsdecode(os._raw_fspath(x))" will proliferate (or they'll just
> monkey-patch os.fspath), and no-one actually uses os.fspath except toy
> examples.

If folks want coercion, they can just use os.fsdecode(x), as that
already has a str -> str passthrough from the input to the output
(unlike codecs.decode) and will presumably be updated to include an
implicit call to os._raw_fspath() on the passed in object.
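
A quick check of the passthrough behaviour described here, using only
existing calls. The path literal is arbitrary, and the snippet assumes an
ASCII-compatible filesystem encoding.

```python
import codecs
import os

# os.fsdecode returns str input unchanged and decodes bytes input.
from_str = os.fsdecode("some/path")
from_bytes = os.fsdecode(b"some/path")

# codecs.decode has no such passthrough: str input is rejected.
try:
    codecs.decode("some/path", "utf-8")
except TypeError:
    codecs_rejects_str = True
else:
    codecs_rejects_str = False
```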

> Why is it so objectionable for os.fspath to do coercion?

The first problem is that binary paths on Windows basically don't
work, so it's preferable for them to fail fast regardless of platform,
rather than to have them implicitly work on *nix, only to fail for
Windows users using non-ASCII paths later.

The second is that it would make os.fspath and os.fsdecode
functionally equivalent, so we'd have two different spellings for the
same operation.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu Apr 14 02:09:17 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 16:09:17 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <1460606743.519577.578286961.5D1CB3F1@webmail.messagingengine.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us>
 <87oa9c3nii.fsf@vostro.rath.org> <570F0B24.50705@stoneleaf.us>
 <CADiSq7eTbL=LAMi+HSAwR5gwtGMWwv_Z_xVAfEf_2Z2xAbLm5Q@mail.gmail.com>
 <1460606743.519577.578286961.5D1CB3F1@webmail.messagingengine.com>
Message-ID: <CADiSq7eY3xChtL5rVwBDR0=XFK1tADX3AGLF5HWpr+YnGh83Pg@mail.gmail.com>

On 14 April 2016 at 14:05, Random832 <random832 at fastmail.com> wrote:
> On Wed, Apr 13, 2016, at 23:27, Nick Coghlan wrote:
>> In this kind of case, inheritance tends to trump protocol. For
>> example, int subclasses can't override operator.index:
> ...
>> The reasons for that behaviour are more pragmatic than philosophical:
>> builtins and their subclasses are extensively special-cased for speed
>> reasons, and those shortcuts are encountered before the interpreter
>> even considers using the general protocol.
>>
>> In cases where the magic method return types are polymorphic (so
>> subclasses may want to override them) we'll use more restrictive exact
>> type checks for the shortcuts, but that argument doesn't apply for
>> typechecked protocols where the result is required to be an instance
>> of a particular builtin type (but subclasses are considered
>> acceptable).
>
> Then why aren't we doing it for str? Because "try: path =
> path.__fspath__()" is more idiomatic than the alternative?

The sketches Brett posted will bear little resemblance to the actual
implementation - that will be in C and use similar idioms to those we
use for other abstract protocols (such as shortcuts for instances of
builtin types, and doing the method lookup via the passed in object's
type, rather than on the instance).
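
The "lookup via the type" idiom can be seen with an existing protocol
such as __index__. The classes below are made up for the example; the
point is only that an attribute set on the instance is not consulted.

```python
import operator


class Plain:
    pass


obj = Plain()
obj.__index__ = lambda: 42  # set on the instance only, not on the class

# operator.index looks the method up on type(obj), so this fails.
try:
    operator.index(obj)
except TypeError:
    instance_attr_ignored = True
else:
    instance_attr_ignored = False


class WithIndex:
    def __index__(self):
        return 42


# Defined on the class, the method is found and used.
value = operator.index(WithIndex())
```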

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From stephen at xemacs.org  Thu Apr 14 02:55:49 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 14 Apr 2016 15:55:49 +0900
Subject: [Python-Dev] Pathlib enhancements - improve fsdecode and fsencode
In-Reply-To: <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
Message-ID: <22287.16117.707682.669635@turnbull.sk.tsukuba.ac.jp>

Please please please, junk both "filter out bytes" proposals.

Since they involve an exception, they impose an unnecessary "try" on
all text applications that fear death on bytes returns.  May as well
just wrap all objects with __fspath__ in fsdecode, and all is
happy.

Counterproposal: make fsdecode and fsencode grok __fspath__.  Then:
(1) Bytes-lovers and str-addicts are both safe.
(2) They can omit fspath, too!

No, that doesn't work if the bytes objects aren't in the file system
encoding, but these are *bytes*, mon ami: you have no way to find out
what that encoding is, so you either know already and you substitute
that + fspath for fsdecode, or you're hosed.  And in the only concrete
use case so far, fsdecode Just Works.

I suppose a similar argument holds for applications that want bytes
and fsencode, but I leave that as an exercise for the reader.


From stephen at xemacs.org  Thu Apr 14 03:02:36 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 14 Apr 2016 16:02:36 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
Message-ID: <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>

I was going to read the new posts that came in since I started this
one (at one point it was 5X as long as it is now), but this thread is
way out of control.  My apologies to anybody who has presented[1] use
cases in support of the wildly speculative proposals under discussion,
but my bet is that there have been none.

Victor Stinner writes:

 > Oops sorry, I forgot to add that I have no strong opinion on the type (I
 > only have a minor preference for str only).

I have a strong preference for str only, because I still don't see a
use case for polymorphic __fspath__.

os functions and os.path functions need to *accept* both str and bytes
because they are interfaces to OS functionality used by both text and
non-text applications, and so must check and convert to OS native type.
Many of these functions produce what they receive because both text and
non-text applications use names of filesystem objects internally, as
well as passing them to OS wrappers.  The question is how far to take
that logic.

So let me propose what I think is the elephant in the room.  If you're
going to have a polymorphic __fspath__, then pathlib is *the* example
of a module that *desperately* needs to be polymorphic.  Consider:

    A non-text Application has some bytes and passes them to
        pathlib.Path as <type A>
    manipulates them and passes the result to
        os.scandir as <type B>
    expecting a return of
        DirEntries of <type C>

<type A> == <type C> == bytes, and <type B> == Path is TOOWTDI, no?
But under the current proposal which doesn't touch the internal
mechanisms of pathlib and allows, but has no way to request, bytes
returns, <type A> == str, <type B> == Path, and <type C> == str,
requiring two explicit conversions that bytes-shoveling developers
will tell you should be unnecessary.  QED, pathlib should be
polymorphic as a central part of this proposal.

IMO that's not the right way to go (slippery slope, very quickly you
hit manipulations that are "really" text operations).  See also my
proposal "Pathlib enhancements - improve fsdecode and fsencode" which
suggests a (primitive) way for code to request the type it likes
better.

But WDOT?  I'd especially like to hear if Nick is tempted to flip-flop
(so far he's been in the "pathlib is a text utility" camp).


Footnotes: 
[1]  Just because I don't know of any I consider persuasive doesn't
mean there aren't any, but what you don't tell me I don't know.
(Maybe you'd have to kill me?  If so, thanks for not telling!)


From cybersol at yahoo.com  Thu Apr 14 03:03:00 2016
From: cybersol at yahoo.com (Michael Mysinger)
Date: Thu, 14 Apr 2016 07:03:00 +0000 (UTC)
Subject: [Python-Dev] pathlib - current status of discussions
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
Message-ID: <loom.20160414T083232-765@post.gmane.org>

Brett Cannon <brett <at> python.org> writes:

> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> the four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).
> 

Thanks Brett, it is definitely a start! Maybe I am just more unimaginative 
than most, but since interoperability is the goal, I would ideally be able 
to play with a full implementation where all the stdlib functions Nick 
originally mentioned accepted these "rich path" objects. 

However, for concrete example purposes, maybe it is sufficient to start with 
your fspath function, a toy RichPath class implementing __fspath__, and 
something like os.path.join, which is a meaty enough example to test some of 
the functionality. I posted a gist of a string only example at 
https://gist.github.com/mmysinger/0b5ae2cfb866f7013c387a2683c7fc39

After playing with and considering the 4 possibilities, anything where 
__fspath__ can return bytes seems like insanity that flies in the face of 
everything Python 3 is trying to accomplish. In particular, one RichPath 
class might return bytes and another str, or even worse the same class might 
sometimes return bytes and sometimes str. When will os.path.join blow up due 
to mixing bytes and str and when will it work in those situations? So for me 
that eliminates #3 and #4.
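
For reference, this is how the existing join already treats mixed
arguments. posixpath is used instead of os.path only so the separators in
the results are the same on every platform; the path components are
arbitrary.

```python
import posixpath

# Homogeneous arguments work for either type.
text_result = posixpath.join("base", "name")
bytes_result = posixpath.join(b"base", b"name")

# Mixing str and bytes raises TypeError rather than guessing an encoding.
try:
    posixpath.join("base", b"name")
except TypeError:
    mixing_rejected = True
else:
    mixing_rejected = False
```

So a rich path object whose __fspath__ may return either type would fail
or succeed depending on what the other components happen to be.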

Also the version #2 accepting bytes in os.fspath felt like it could be a 
very minor convenience, but even the str only version #1 just requires 
one isinstance check in the rare case you need to also deal with bytes (see 
the os.path.join example in the gist above). So I lean toward the str only 
#1 version. 

In any case I would start with the strict str only full implementation and 
loosen it either in 3.6 or 3.7 depending on what people think after actually 
using it.


From storchaka at gmail.com  Thu Apr 14 04:36:29 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 14 Apr 2016 11:36:29 +0300
Subject: [Python-Dev] Bytes path
Message-ID: <nenkqe$glb$1@ger.gmane.org>

What types should be accepted as a bytes path?

For now os.path is strict and accepts only bytes and bytes subclasses 
(even bytearray is not accepted) as a bytes path. This is enough for 
working with low-level Posix paths and supporting backward compatibility.

On the other hand, most os functions have been too permissive since 3.3 
and accept any type that supports the buffer protocol as a bytes path. 
Even such meaningless objects as array('h') are accepted.

Some functions (zipimport.zipimporter() in 3.x, _imp.load_dynamic() in 
3.3+, builtin compile() etc in 3.4) accept even arbitrary iterables, 
e.g. [116, 101, 115, 116] (see http://bugs.python.org/issue26754).

I think we should accept only bytes (and subclasses). Even bytearray is 
less acceptable since it is mutable and can't be used as a key in caches.
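
The caching point is easy to check: bytes is hashable and can be a dict
key, while bytearray is not. The cache dict and the path value are made
up for the example.

```python
cache = {}

# bytes is immutable and hashable, so it works as a cache key.
cache[b"/tmp/data"] = "cached result"

# bytearray is mutable and unhashable, so the same operation fails.
try:
    cache[bytearray(b"/tmp/data")] = "cached result"
except TypeError:
    bytearray_usable_as_key = False
else:
    bytearray_usable_as_key = True
```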


From storchaka at gmail.com  Thu Apr 14 04:51:53 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 14 Apr 2016 11:51:53 +0300
Subject: [Python-Dev] Not receiving bug tracker emails
In-Reply-To: <nekii9$9lr$1@ger.gmane.org>
References: <CA+eR4cFgk+UmB-3TZP_m-Ng=hx9gYrdOcY_gHPGJfnheDDaKmw@mail.gmail.com>
 <ndukuq$or3$1@ger.gmane.org> <nekii9$9lr$1@ger.gmane.org>
Message-ID: <nenln9$v2a$1@ger.gmane.org>

On 13.04.16 07:39, Terry Reedy wrote:
> On 4/4/2016 5:05 PM, Terry Reedy wrote:
>
> For the past few days, I have been getting bug tracker emails again, in my Inbox.  I
> just got a Rietveld review in the Inbox and I believe it went there
> directly instead of first to Junk.  Thank you to whoever made the
> improvements.

AFAIK David just disabled IPv6 support.

Most bug tracker emails still went into the Spam folder. I have a filter
for Roundup emails, but there isn't any mark that I can use for
filtering Rietveld emails.


From storchaka at gmail.com  Thu Apr 14 05:15:01 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 14 Apr 2016 12:15:01 +0300
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
In-Reply-To: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
References: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
Message-ID: <nenn2m$lqu$1@ger.gmane.org>

On 13.04.16 14:40, Victor Stinner wrote:
> Last months, most 3.x buildbots failed randomly. Some of them were
> always failing. I spent some time to fix almost all Windows and Linux
> buildbots. There were a lot of different issues.

Excellent! Many thanks for doing this. And the new features of regrtest
look nice.

> So please try to not break buildbots again and remind to watch them sometimes:
>
>    http://buildbot.python.org/all/waterfall?category=3.x.stable&category=3.x.unstable

A desirable but nonexistent feature is to write emails to authors of
commits that broke buildbots. How hard would this be to implement?

> Next weeks, I will try to backport some fixes to Python 3.5 (if
> needed) to make these buildbots more stable too.
>
> Python 2.7 buildbots are also in a sad state (ex: test_marshal
> segfaults on Windows, see issue #25264). But it's not easy to get a
> Windows with the right compiler to develop on Python 2.7 on Windows.

What do you think about backporting the recent regrtest to 2.7? The
features I need most are the -m and -G options.

> Maybe it's time to move more 3.x buildbots to the "stable" category?
> http://buildbot.python.org/all/waterfall?category=3.x.stable

+1

> By the way, I don't understand why "AMD64 OpenIndiana 3.x" is
> considered as stable since it's failing with multiple issues since
> many months and nobody is working on these failures. I suggest to move
> this buildbot back to the unstable category.

I think the main cause is the lack of memory in this buildbot. I tried
to minimize memory consumption and leaks, but some leaks are left, and
they provoke other test failures and additional resource leaks. It would
be nice to add a feature for running every test in a separate subprocess.
This would isolate the effect of failed tests.



From p.f.moore at gmail.com  Thu Apr 14 06:07:49 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 14 Apr 2016 11:07:49 +0100
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CACac1F_JO==kPDthSGFtZRzgY7OVRm=4pXY7XHMJauaHFoO58g@mail.gmail.com>

On 14 April 2016 at 08:02, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> So let me propose what I think is the elephant in the room.  If you're
> going to have a polymorphic __fspath__, then pathlib is *the* example
> of a module that *desperately* needs to be polymorphic.  Consider:
>
>     A non-text Application has some bytes and passes them to
>         pathlib.Path as <type A>
>     manipulates them and passes the result to
>         os.scandir as <type B>
>     expecting a return of
>         DirEntries of <type C>
>
> <type A> == <type C> == bytes, and <type B> == Path is TOOWTDI, no?

I'm not sure I follow this logic at all. But from my reading your
argument contradicts your conclusion, so maybe I'm misunderstanding.

To me, the "obvious" conclusion is that pathlib is not appropriate in
non-text applications, because <type A> *cannot* be bytes (the
constructor rejects bytes). I see no reason to change that - non-text
applications are inherently low level, and shouldn't expect to use
high-level abstractions like pathlib.

> But under the current proposal which doesn't touch the internal
> mechanisms of pathlib and allows, but has no way to request, bytes
> returns, <type A> == str, <type B> == Path, and <type C> == str,
> requiring two explicit conversions that bytes-shoveling developers
> will tell you should be unnecessary.  QED, pathlib should be
> polymorphic as a central part of this proposal.

Nope, QED pathlib is not a low level abstraction.

So your argument to me doesn't help much, because it's a given that
pathlib is str-only. The debate is about how things like scandir
(specifically DirEntry objects) and Ethan's pathlib replacement, which
*do* allow bytes in and out, should participate in the new protocol,
when they are bytes (they obviously should work just like pathlib when
they are strings).

In my opinion, they *shouldn't*: the new protocol should be string-only
(at least initially).

If I understand correctly (from a couple of brief mentions), Ethan has a
string-like path object and a bytes-like path object, so he could
support fspath on the string-like one but not the bytes-like one. He
may not like having slightly different APIs for the two types, I don't
know, but it's possible. But DirEntry is polymorphic, so it *will*
have a __fspath__ method, and needs to know what to do when it's
bytes-like (I guess with a bit of getattr hacking DirEntry *could*
expose a __fspath__ method only if it's string-like, but that seems
like a pretty gross hack).

So:

1. pathlib remains string-like, and is the canonical example of
__fspath__, returning strings only
2. DirEntry is the only other example of the protocol in the stdlib,
but is polymorphic
3. I'm not aware of any 3rd party library that has polymorphic classes
(Ethan can correct me if I'm wrong here)

So the only purpose I know of for discussing __fspath__ returning
bytes is for scandir, and hypothetical polymorphic 3rd party path
abstractions (and possibly Ethan's preference to have a common API for
his 2 classes).

I propose we should have a string-only __fspath__ protocol in 3.6.
Bytes-format DirEntry objects can raise an error in __fspath__. If it
becomes obvious with usage that we need bytes support in __fspath__ we
can add it (compatibly - string-only code wouldn't need to change) in
3.7. That seems far better to me than trying to design bytes support
without actual use cases.
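
A minimal sketch of what that could look like, with SketchEntry as a
made-up stand-in for a polymorphic DirEntry (it is not the real class).
It assumes an os.fspath() that simply calls the __fspath__ method, as the
one in Python 3.6 and later does:

```python
import os

class SketchEntry:
    """Illustrative stand-in for a polymorphic DirEntry; not the real class."""

    def __init__(self, path):
        self.path = path  # either str or bytes, as given by the caller

    def __fspath__(self):
        # Under the str-only proposal, a bytes-format entry refuses to take part.
        if isinstance(self.path, bytes):
            raise TypeError("bytes-format entry does not support __fspath__")
        return self.path

str_result = os.fspath(SketchEntry("some/dir"))

try:
    os.fspath(SketchEntry(b"some/dir"))
except TypeError:
    bytes_rejected = True
else:
    bytes_rejected = False
```

String-only callers never see the difference; only code that hands a
bytes-format entry to the protocol gets the error.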

Paul

From vadmium+py at gmail.com  Thu Apr 14 06:21:42 2016
From: vadmium+py at gmail.com (Martin Panter)
Date: Thu, 14 Apr 2016 10:21:42 +0000
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
In-Reply-To: <nenn2m$lqu$1@ger.gmane.org>
References: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
 <nenn2m$lqu$1@ger.gmane.org>
Message-ID: <CA+eR4cF0AWX1FtLNQNfxYcN7EC6zuPYmqr0LGr8WWLRCrkKhJg@mail.gmail.com>

On 14 April 2016 at 09:15, Serhiy Storchaka <storchaka at gmail.com> wrote:
> On 13.04.16 14:40, Victor Stinner wrote:
>> By the way, I don't understand why "AMD64 OpenIndiana 3.x" is
>> considered as stable since it's failing with multiple issues since
>> many months and nobody is working on these failures. I suggest to move
>> this buildbot back to the unstable category.
>
> I think the main cause is the lack of memory in this buildbot. I tried to
> minimize memory consumption and leaks, but some leaks are left, and they
> provoke other test failures and additional resource leaks. It would be nice
> to add a feature for running every test in a separate subprocess. This would
> isolate the effect of failed tests.

Last time I looked into the Open Indiana buildbot, I concluded that
the biggest problem was Python using fork() to spawn subprocesses. I
understand that OS does not do "memory overcommitment" like Linux
does, so every time you fork, the OS has to double the amount of
memory that is reserved. It is ironic, but running each test using the
current subprocess module (which uses fork) would probably make the
problem worse.

I suspect using posix_spawn() if possible would help a lot. But this
was rejected in <https://bugs.python.org/issue20104> for not being
flexible enough and making maintenance too complicated.

From victor.stinner at gmail.com  Thu Apr 14 06:25:40 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 12:25:40 +0200
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
In-Reply-To: <CAMpsgwaQDAA-icsAnRAYZpg18og569pdeZQ94HRgs=GyS_NHZQ@mail.gmail.com>
References: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
 <nenn2m$lqu$1@ger.gmane.org>
 <CAMpsgwaQDAA-icsAnRAYZpg18og569pdeZQ94HRgs=GyS_NHZQ@mail.gmail.com>
Message-ID: <CAMpsgwaC=5Y4g5YV6h_eLgYzBaeKwm0N8FC2pdfeKt8s1D+vYQ@mail.gmail.com>

On 14 Apr 2016 11:16 AM, "Serhiy Storchaka" <storchaka at gmail.com> wrote:
> A desirable but nonexistent feature is to write emails to authors of
> commits that broke buildbots. How hard would this be to implement?

Yeah, I have also had this idea for many years, but the buildbots were
quite unstable. Maybe we should be stricter about when we consider a
buildbot stable?

I propose to experiment with sending failure notifications to the authors
of changes *and* to a new mailing list. I would subscribe to such a list.
An even safer starting point would be to start with only the mailing list.

FYI I'm connected to the #python-dev IRC channel, which already contains
these notifications. But I agree that mails are better.

> What do you think about backporting the recent regrtest to 2.7? The
> features I need most are the -m and -G options.

Regrtest changed a lot in Python 3.6 (new test.libregrtest library).
I suggest starting from Python 3.5.

For -m: if it doesn't need to modify the unittest module, I agree.

I don't know the -G option.

> It would be nice to add a feature for running every test in a separate
> subprocess. This would isolate the effect of failed tests.

See my email :-) I proposed to modify -j1 to run tests in subprocesses. I
even mentioned my issue.

I suggest using -jN on all buildbots, at least -j1.

Maybe -j2 is even better since many tests are waiting on IO or simple sleep.

Victor

From victor.stinner at gmail.com  Thu Apr 14 06:29:21 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 12:29:21 +0200
Subject: [Python-Dev] Bytes path
In-Reply-To: <nenkqe$glb$1@ger.gmane.org>
References: <nenkqe$glb$1@ger.gmane.org>
Message-ID: <CAMpsgwa19eWbemDEKUq0saDTNHN4acKFLT10+Aw4HzexNP9Lcw@mail.gmail.com>

IMHO it's more a side effect of the implementation than a deliberate
choice. For new code which really wants to support bytes paths, I suggest
accepting only bytes and bytes subclasses.

Victor

From victor.stinner at gmail.com  Thu Apr 14 06:32:05 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 12:32:05 +0200
Subject: [Python-Dev] Not receiving bug tracker emails
In-Reply-To: <nenln9$v2a$1@ger.gmane.org>
References: <CA+eR4cFgk+UmB-3TZP_m-Ng=hx9gYrdOcY_gHPGJfnheDDaKmw@mail.gmail.com>
 <ndukuq$or3$1@ger.gmane.org> <nekii9$9lr$1@ger.gmane.org>
 <nenln9$v2a$1@ger.gmane.org>
Message-ID: <CAMpsgwaXgTNz3o3+1iy4sxALhRS5zLC05sNW6etvqN-_MuxiqA@mail.gmail.com>

On 14 Apr 2016 10:53 AM, "Serhiy Storchaka" <storchaka at gmail.com> wrote:
> Most bug tracker emails still went into the Spam folder. I have a filter
> for Roundup emails, but there isn't any mark that I can use for filtering
> Rietveld emails.

I'm using the base URL of Rietveld and match it in the mail body. Gmail
filters have an option to never mark emails as spam.

Victor

From vadmium+py at gmail.com  Thu Apr 14 06:33:19 2016
From: vadmium+py at gmail.com (Martin Panter)
Date: Thu, 14 Apr 2016 10:33:19 +0000
Subject: [Python-Dev] Not receiving bug tracker emails
In-Reply-To: <nenln9$v2a$1@ger.gmane.org>
References: <CA+eR4cFgk+UmB-3TZP_m-Ng=hx9gYrdOcY_gHPGJfnheDDaKmw@mail.gmail.com>
 <ndukuq$or3$1@ger.gmane.org> <nekii9$9lr$1@ger.gmane.org>
 <nenln9$v2a$1@ger.gmane.org>
Message-ID: <CA+eR4cHCSKppxrbm8HGxqXziDUvZOMZ_WBBbLj3bcs-nZUuESw@mail.gmail.com>

On 14 April 2016 at 08:51, Serhiy Storchaka <storchaka at gmail.com> wrote:
> On 13.04.16 07:39, Terry Reedy wrote:
>>
>> On 4/4/2016 5:05 PM, Terry Reedy wrote:
>>
>> Since a few days, I am getting bug tracker emails again, in my Inbox.  I
>> just got a Rietveld review in the Inbox and I believe it went there
>> directly instead of first to Junk.  Thank you to whoever made the
>> improvements.
>
>
> AFAIK David just disabled IPv6 support.
>
> Most bug tracker emails still went into the Spam folder. I have a filter for
> Roundup emails, but there isn't any mark that I can use for filtering
> Rietveld emails.

FWIW I set up the following filter in Gmail for Rietveld reviews:

Matches: http://bugs.python.org/review
Do this: Never send it to Spam

I suspect it helps, but occasionally I think stuff still goes to spam.
(Just don't tell this secret rule to actual spammers :)

From storchaka at gmail.com  Thu Apr 14 07:01:37 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 14 Apr 2016 14:01:37 +0300
Subject: [Python-Dev] Not receiving bug tracker emails
In-Reply-To: <CA+eR4cHCSKppxrbm8HGxqXziDUvZOMZ_WBBbLj3bcs-nZUuESw@mail.gmail.com>
References: <CA+eR4cFgk+UmB-3TZP_m-Ng=hx9gYrdOcY_gHPGJfnheDDaKmw@mail.gmail.com>
 <ndukuq$or3$1@ger.gmane.org> <nekii9$9lr$1@ger.gmane.org>
 <nenln9$v2a$1@ger.gmane.org>
 <CA+eR4cHCSKppxrbm8HGxqXziDUvZOMZ_WBBbLj3bcs-nZUuESw@mail.gmail.com>
Message-ID: <nentai$t7l$1@ger.gmane.org>

On 14.04.16 13:33, Martin Panter wrote:
> On 14 April 2016 at 08:51, Serhiy Storchaka <storchaka at gmail.com> wrote:
>> Most bug tracker emails still went into the Spam folder. I have a filter for
>> Roundup emails, but there isn't any mark that I can use for filtering
>> Rietveld emails.
>
> FWIW I set up the following filter in Gmail for Rietveld reviews:
>
> Matches: http://bugs.python.org/review
> Do this: Never send it to Spam
>
> I suspect it helps, but occasionally I think stuff still goes to spam.
> (Just don't tell this secret rule to actual spammers :)

Thank you and Victor for this advice.

But this filter is not quite robust, for example it will cause this mail 
to be moved to the folder for Rietveld reviews.

I was going to try a different approach, append "+py" to my address for 
the tracker, as in your address.


From victor.stinner at gmail.com  Thu Apr 14 07:26:12 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 13:26:12 +0200
Subject: [Python-Dev] Not receiving bug tracker emails
In-Reply-To: <nentai$t7l$1@ger.gmane.org>
References: <CA+eR4cFgk+UmB-3TZP_m-Ng=hx9gYrdOcY_gHPGJfnheDDaKmw@mail.gmail.com>
 <ndukuq$or3$1@ger.gmane.org> <nekii9$9lr$1@ger.gmane.org>
 <nenln9$v2a$1@ger.gmane.org>
 <CA+eR4cHCSKppxrbm8HGxqXziDUvZOMZ_WBBbLj3bcs-nZUuESw@mail.gmail.com>
 <nentai$t7l$1@ger.gmane.org>
Message-ID: <CAMpsgwYDXMDHmoZML_9-eg3nJr1-S6pH_kV_jA9Qk-OmQTqJvg@mail.gmail.com>

2016-04-14 13:01 GMT+02:00 Serhiy Storchaka <storchaka at gmail.com>:
> But this filter is not quite robust, for example it will cause this mail to
> be moved to the folder for Rietveld reviews.

Right, it's just a workaround since I'm unable to fix the root cause
(emails being marked as spam, which looks like a configuration issue in
the SMTP server).

From ncoghlan at gmail.com  Thu Apr 14 07:44:58 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 21:44:58 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>

On 14 April 2016 at 17:02, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> But WDOT?  I'd especially like to hear if Nick is tempted to flip-flop
> (so far he's been in the "pathlib is a text utility" camp).

pathlib is too high level (i.e. has too many dependencies) to be used
in low level boundary code.

The use case for returning bytes from __fspath__ is DirEntry, so you
can write things like this in low level code:

    def myscandir(dirpath):
        for entry in os.scandir(dirpath):
            if entry.is_file():
                with open(entry) as f:
                    pass  # do something

and still have them automatically inherit the str/bytes handling of
the core standard library APIs.
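
As a rough illustration of that inheritance, assuming Python 3.6 or later
(where os.fspath() exists and DirEntry implements __fspath__), the same
helper yields str paths for a str directory and bytes paths for a bytes
directory. The helper name and the temporary file are made up for the
example:

```python
import os
import tempfile

def collect_entry_paths(dirpath):
    """Return the path of every regular file directly inside dirpath."""
    with os.scandir(dirpath) as entries:
        # os.fspath(entry) has the same type (str or bytes) as dirpath
        return [os.fspath(entry) for entry in entries if entry.is_file()]

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.txt"), "w") as f:
        f.write("hello")
    str_paths = collect_entry_paths(tmp)                 # str in, str out
    bytes_paths = collect_entry_paths(os.fsencode(tmp))  # bytes in, bytes out
```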

By contrast, as soon as you type "import pathlib" at the top of your
file, you've stepped outside the world of potentially pure boundary
code, and are instead dealing with structured application level
objects (which means traversing the bytes->str boundary before the
str->Path one).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From victor.stinner at gmail.com  Thu Apr 14 08:02:55 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 14:02:55 +0200
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CADiSq7eH_DE_035veojyh+ja6HTJAbbdgT_Fq67ARCqBWzZu-w@mail.gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
 <CAO41-mNHebkPgiC1A410nbVpy0wOsxmE8C7RNP3rinufoqwwLQ@mail.gmail.com>
 <CAMpsgwa-XuuCtGBMz1-gqy12VxiiibA8xBZw0On+y4kinkVRmg@mail.gmail.com>
 <CAO41-mM8Xsqv0xizAN3Hv9ZPvSkantioViSSdZdRj0QvwXzXGg@mail.gmail.com>
 <CAMpsgwZhkwxasLybJNToQmFV8ugjgGwmhxKScJeSfmgt77ktFA@mail.gmail.com>
 <CADiSq7eH_DE_035veojyh+ja6HTJAbbdgT_Fq67ARCqBWzZu-w@mail.gmail.com>
Message-ID: <CAMpsgwaOxd8aKya4pYVpkZr0-PWLT4R5K7CGa1WuiOju72HQ4w@mail.gmail.com>

On Thursday, 14 April 2016, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
> > IHMO it's not a big deal to update these projects for the future
> > Python 3.6. I can even help them to support the new bytecode format.
>
> We've also had previous discussions on adding a "minimum viable
> bytecode editing" API to the standard library, and updating these
> third party modules to support wordcode instead of bytecode could
> provide a good use-case-driven opportunity for defining that (i.e. it
> wouldn't be about providing an end user facing API directly, but
> rather about letting CPython take care of the bookkeeping details for
> things like lnotab and sorting out jump targets).

Yeah, I know this discussion well since it started with my PEP 511. I
wrote the bytecode project as a tool for the discussion, to try to
understand the use case better. The main task was to design the API.

I first looked at the byteplay and codetransformer projects, but I found
some issues in their design. IMHO their API is not the best one for
modifying bytecode.

My goal is to support Bytecode.from_code(code).to_code()==code: store
enough information to be able to emit again exactly the same bytecode
(line numbers, exact argument value, etc.).

I started with a long email, but I decided to document the differences in
the bytecode documentation instead:
https://bytecode.readthedocs.org/en/latest/byteplay_codetransformer.html

Victor

From victor.stinner at gmail.com  Thu Apr 14 08:16:03 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 14:16:03 +0200
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
Message-ID: <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>

2016-04-13 19:10 GMT+02:00 Brett Cannon <brett at python.org>:
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
> four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).

IMHO the best argument against the flavor 4 (fspath: str or bytes
allowed) is the os.path.join() function.

I consider that the final goal of the whole discussion is to support
something like:

    path = os.path.join(pathlib_path, "str_path", direntry)

This should work even if direntry uses a bytes filename. I expect
genericpath.join() to be patched to use os.fspath(). If os.fspath()
returns bytes, os.path.join() will fail with an annoying TypeError.

I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
just to make my life easier.

I recall that I used to say that Python 2 doesn't support Unicode
filenames because os.path.join() raises a UnicodeDecodeError when you
try to join a Unicode filename with a byte filename which contains
non-ASCII bytes. The problem occurs indirectly in code using hardcoded
paths, Unicode or bytes paths. Saying that "Python 2 doesn't support
Unicode filenames" is wrong, but since Unicode is a hard problem, I
tried to simplify my explanation :-)

You can apply the same rationale to flavors 2 and 3
(os.fspath(path, allow_bytes=True)). Indirectly, you will get a similar
TypeError on os.path.join().

Victor

From random832 at fastmail.com  Thu Apr 14 08:28:29 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 14 Apr 2016 08:28:29 -0400
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7ezm1JRkeSjGo7P44g9bX7-gGKPEG2nb6Kftz9-QfkMNA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwY7XD5KJB95CFUKa7h70q1CgO4ebOh_0VWR0notBu+hYQ@mail.gmail.com>
 <CADiSq7e_BZwun6UgBZtrHhKewOZ19bakNej1DzttgpxczphvLA@mail.gmail.com>
 <CADiSq7cn_TAnt_+-VEq-9gWRM6V9=4L777bDK+eyaPfM1rgcYA@mail.gmail.com>
 <1460606092.516946.578278417.49F19066@webmail.messagingengine.com>
 <CADiSq7ezm1JRkeSjGo7P44g9bX7-gGKPEG2nb6Kftz9-QfkMNA@mail.gmail.com>
Message-ID: <1460636909.4186032.578623185.789BF90D@webmail.messagingengine.com>

On Thu, Apr 14, 2016, at 02:00, Nick Coghlan wrote:
> > If the protocol can return bytes, then that means that types (DirEntry?
> > someone had an alternate path library with a bPath?) which return bytes
> > via the protocol will proliferate, and cannot be safely passed to
> > anything that uses os.fspath. Numerous copies of "def myfspath(x):
> > return os.fsdecode(os._raw_fspath(x))" will proliferate (or they'll just
> > monkey-patch os.fspath), and no-one actually uses os.fspath except toy
> > examples.
> 
> If folks want coercion, they can just use os.fsdecode(x), as that
> already has a str -> str passthrough from the input to the output
> (unlike codecs.decode) and will presumably be updated to include an
> implicit call to os._raw_fspath() on the passed in object.

This is the first I've heard of any suggestion to have fsdecode accept
non-strings.

> > Why is it so objectionable for os.fspath to do coercion?
> 
> The first problem is that binary paths on Windows basically don't
> work, so it's preferable for them to fail fast regardless of platform,
> rather than to have them implicitly work on *nix, only to fail for
> Windows users using non-ASCII paths later.

Ideally, this warning would be raised from a central place, and even
fspath (and even fsdecode) would go through it.

From random832 at fastmail.com  Thu Apr 14 08:33:23 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 14 Apr 2016 08:33:23 -0400
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
Message-ID: <1460637203.4187117.578627337.3BC93F7D@webmail.messagingengine.com>

On Thu, Apr 14, 2016, at 03:02, Stephen J. Turnbull wrote:
> I have a strong preference for str only, because I still don't see a
> use case for polymorphic __fspath__.

Ultimately we're talking about redundancy and performance here. The "use
case", such as it is, is a class (be it DirEntry or whatever else) that
natively stores bytes: if __fspath__ has to return str, then it calls
fsdecode, and open immediately turns around and calls fsencode on the
result, accomplishing nothing versus just passing everything straight
through.
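
A small sketch of that round trip, assuming a POSIX system (where the
filesystem encoding uses the surrogateescape error handler) and a made-up
file name:

```python
import os

raw_name = b"caf\xc3\xa9.txt"        # bytes as a class might store them natively

decoded = os.fsdecode(raw_name)      # what a str-only __fspath__ would return
re_encoded = os.fsencode(decoded)    # what open() needs again on POSIX
```

The two conversions cancel out: re_encoded is the same bytes object
content the class started with.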

From ncoghlan at gmail.com  Thu Apr 14 09:40:33 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 23:40:33 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
Message-ID: <CADiSq7fEzV7Zm5J_KHoU_+-mgr1jtSBt3_VP78cL9KgMd_6N7g@mail.gmail.com>

On 14 April 2016 at 22:16, Victor Stinner <victor.stinner at gmail.com> wrote:
> 2016-04-13 19:10 GMT+02:00 Brett Cannon <brett at python.org>:
>> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
>> four potential approaches implemented (although it doesn't follow the
>> "separate functions" approach some are proposing and instead goes with the
>> allow_bytes approach I originally proposed).
>
> IMHO the best argument against the flavor 4 (fspath: str or bytes
> allowed) is the os.path.join() function.
>
> I consider that the final goal of the whole discussion is to support
> something like:
>
>     path = os.path.join(pathlib_path, "str_path", direntry)

That's not a *new* problem though, it already exists if you pass in a
mix of bytes and str:

>>> import os.path
>>> os.path.join("str", b"bytes")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.4/posixpath.py", line 89, in join
    "components") from None
TypeError: Can't mix strings and bytes in path components

There's also already a solution (regardless of whether you want bytes
or str as the result), which is to explicitly coerce all the arguments
to the same type:

>>> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
'str/bytes'
>>> os.path.join(*map(os.fsencode, ("str", b"bytes")))
b'str/bytes'

Assuming os.fsdecode and os.fsencode are updated to call os.fspath on
their argument before continuing with the current logic, the latter
two forms would both start automatically handling both DirEntry and
pathlib objects, while the first form would continue to throw
TypeError if handed an unexpected bytes value (whether directly or via
an __fspath__ call).
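
A sketch of how that would play out, assuming Python 3.6 or later, where
os.fsdecode and os.fsencode do accept objects with __fspath__. BytesEntry
is a made-up stand-in for a bytes-format DirEntry:

```python
import os
import pathlib

class BytesEntry:
    """Illustrative stand-in for a bytes-format DirEntry; not a real class."""
    def __fspath__(self):
        return b"entry"

parts = (pathlib.PurePath("base"), "str_part", BytesEntry())

# Explicit coercion: every component ends up with the same type.
as_str = os.path.join(*map(os.fsdecode, parts))
as_bytes = os.path.join(*map(os.fsencode, parts))

# No coercion: str and bytes components get mixed, which is rejected.
try:
    os.path.join(*parts)
except TypeError:
    mixed_fails = True
else:
    mixed_fails = False
```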

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From random832 at fastmail.com  Thu Apr 14 09:45:41 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 14 Apr 2016 09:45:41 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CADiSq7fEzV7Zm5J_KHoU_+-mgr1jtSBt3_VP78cL9KgMd_6N7g@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
 <CADiSq7fEzV7Zm5J_KHoU_+-mgr1jtSBt3_VP78cL9KgMd_6N7g@mail.gmail.com>
Message-ID: <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com>

On Thu, Apr 14, 2016, at 09:40, Nick Coghlan wrote:
> That's not a *new* problem though, it already exists if you pass in a
> mix of bytes and str:
> 
> There's also already a solution (regardless of whether you want bytes
> or str as the result), which is to explicitly coerce all the arguments
> to the same type:

It'd be nice if that went away. Having to do that makes about as much
sense to me as if you had to explicitly coerce an int to a float to add
them together. Sure, explicit is better than implicit, but there are
limits. You're explicitly calling os.path.join; isn't that explicit
enough?

From rosuav at gmail.com  Thu Apr 14 09:50:57 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 14 Apr 2016 23:50:57 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
 <CADiSq7fEzV7Zm5J_KHoU_+-mgr1jtSBt3_VP78cL9KgMd_6N7g@mail.gmail.com>
 <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com>
Message-ID: <CAPTjJmofbDW5ptfFb11+PVLvjrnUXY+sbK0bu2-2B7H=ZXdMew@mail.gmail.com>

On Thu, Apr 14, 2016 at 11:45 PM, Random832 <random832 at fastmail.com> wrote:
> On Thu, Apr 14, 2016, at 09:40, Nick Coghlan wrote:
>> That's not a *new* problem though, it already exists if you pass in a
>> mix of bytes and str:
>>
>> There's also already a solution (regardless of whether you want bytes
>> or str as the result), which is to explicitly coerce all the arguments
>> to the same type:
>
> It'd be nice if that went away. Having to do that makes about as much
> sense to me as if you had to explicitly coerce an int to a float to add
> them together. Sure, explicit is better than implicit, but there are
> limits. You're explicitly calling os.path.join; isn't that explicit
> enough?

Adding integers and floats is considered "safe" because most people's
use of floats completely compasses their use of ints. (You'll get
OverflowError if it can't be represented.) But float and Decimal are
considered "unsafe":

>>> 1.5 + decimal.Decimal("1.5")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'float' and 'decimal.Decimal'

This is more what's happening here. Floats and Decimals can represent
similar sorts of things, but with enough incompatibilities that you
can't simply merge them.

ChrisA

From victor.stinner at gmail.com  Thu Apr 14 09:56:24 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 15:56:24 +0200
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CADiSq7fEzV7Zm5J_KHoU_+-mgr1jtSBt3_VP78cL9KgMd_6N7g@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
 <CADiSq7fEzV7Zm5J_KHoU_+-mgr1jtSBt3_VP78cL9KgMd_6N7g@mail.gmail.com>
Message-ID: <CAMpsgwbuABRRshbCMSe1NMAYndq=D6iDpqDQDJimcoY6FyyQaA@mail.gmail.com>

2016-04-14 15:40 GMT+02:00 Nick Coghlan <ncoghlan at gmail.com>:
>> I consider that the final goal of the whole discussion is to support
>> something like:
>>
>>     path = os.path.join(pathlib_path, "str_path", direntry)
>
> That's not a *new* problem though, it already exists if you pass in a
> mix of bytes and str:
> (...)
> There's also already a solution (regardless of whether you want bytes
> or str as the result), which is to explicitly coerce all the arguments
> to the same type:
>
>>>> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
> (...)

I don't understand. What is the point of adding a new __fspath__
protocol to *implicitly* convert path objects to strings, if you still
have to use an explicit conversion?

I would really expect that a high-level API like pathlib would solve
encodings issues for me. IMHO DirEntry entries created by
os.scandir(bytes) must use os.fsdecode() in their __fspath__ method.

os.path.join() is just one example of an operation on multiple paths.
Look at os.path for other examples ;-)

> os.path.join(*map(os.fsdecode, ("str", b"bytes")))

This code is quite complex for a newbie, don't you think so?

My example was os.path.join(pathlib_path, "str_path", direntry) where
we can do something to make the API easier to use.

I don't propose to do anything for os.path.join("str", b"bytes") which
would continue to fail with TypeError, *as expected*.

Victor

From random832 at fastmail.com  Thu Apr 14 10:01:44 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 14 Apr 2016 10:01:44 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAPTjJmofbDW5ptfFb11+PVLvjrnUXY+sbK0bu2-2B7H=ZXdMew@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
 <CADiSq7fEzV7Zm5J_KHoU_+-mgr1jtSBt3_VP78cL9KgMd_6N7g@mail.gmail.com>
 <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com>
 <CAPTjJmofbDW5ptfFb11+PVLvjrnUXY+sbK0bu2-2B7H=ZXdMew@mail.gmail.com>
Message-ID: <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com>

On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote:
> Adding integers and floats is considered "safe" because most people's
> use of floats completely compasses their use of ints. (You'll get
> OverflowError if it can't be represented.) But float and Decimal are
> considered "unsafe":
> 
> >>> 1.5 + decimal.Decimal("1.5")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unsupported operand type(s) for +: 'float' and
> 'decimal.Decimal'
> 
> This is more what's happening here. Floats and Decimals can represent
> similar sorts of things, but with enough incompatibilities that you
> can't simply merge them.

And what such incompatibilities exist between bytes and str for the
purpose of representing file paths? At the end of the day, there's
exactly one answer to "what file on disk this represents (or would
represent if it existed)".

From ethan at stoneleaf.us  Thu Apr 14 10:47:20 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 Apr 2016 07:47:20 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <loom.20160414T083232-765@post.gmane.org>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <loom.20160414T083232-765@post.gmane.org>
Message-ID: <570FAD78.60505@stoneleaf.us>

On 04/14/2016 12:03 AM, Michael Mysinger via Python-Dev wrote:
> Brett Cannon writes:

> After playing with and considering the 4 possibilities, anything where
> __fspath__ can return bytes seems like insanity that flies in the face of
> everything Python 3 is trying to accomplish. In particular, one RichPath
> class might return bytes and another str, or even worse the same class might
> sometimes return bytes and sometimes str. When will os.path.join blow up due
> to mixing bytes and str and when will it work in those situations?

What are you asking here?  Exactly where in os.path.join the exception
will occur when mixing bytes & str, or whether mixing bytes & str will
ever work?

The answer to the first is irrelevant (except for performance).

The answer to the second is always/never.  Meaning allowing os.fspath() 
and __fspath__ to return either bytes or str will never cause the 
combination of bytes and str to work.  Said another way: if you are 
using os.path.join then all the pieces have to be str or all the pieces 
have to be bytes.
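
A minimal sketch of that rule, assuming a Python where os.path.join
already consults __fspath__ (3.6 and later). BytesEntry is a hypothetical
class invented for the illustration, and posixpath.join is used so the
result does not depend on the platform:

```python
import os
import posixpath


class BytesEntry:
    """Hypothetical path object whose __fspath__ hands back bytes."""

    def __init__(self, name):
        self._name = name

    def __fspath__(self):
        return self._name


entry = BytesEntry(b"data.bin")

# All pieces are bytes: the join works and the result is bytes.
all_bytes = posixpath.join(b"dir", entry)

# A str piece mixed with a bytes-returning object: TypeError,
# just as with a plain bytes argument.
try:
    posixpath.join("dir", entry)
except TypeError:
    mixed_failed = True
else:
    mixed_failed = False

print(all_bytes)     # b'dir/data.bin'
print(mixed_failed)  # True
```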

--
~Ethan~


From stephen at xemacs.org  Thu Apr 14 10:52:29 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 14 Apr 2016 23:52:29 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
Message-ID: <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > The use case for returning bytes from __fspath__ is DirEntry, so you
 > can write things like this in low level code:
 > 
 >     def myscandir(dirpath):
 >         for entry in os.scandir(dirpath):
 >             if entry.is_file():
 >                 with open(entry) as f:
 >                     # do something

Excuse me, but that is *not* a use case for returning bytes from
DirEntry.__fspath__.  open() is perfectly happy taking str (including
surrogate-encoded rawbytes).  If the trivial thing is for __fspath__
to return bytes, then implicitly applying os.fsencode to the value
being returned is almost as trivial, and just as safe.  A low price to
pay for ensuring that text applications don't crash just because a
bytes-oriented object decides to implement __fspath__.

If there's any cost to defining __fspath__ as str-only, it's some
other use case.  What consumer of __fspath__ that expects bytes but
not str do you envision?  Is it generalizable, so that applying
fsencode to the value of __fspath__ would lead to "unacceptably"
widespread sprinkling of fsencode all over bytes-oriented code?

The more I think about this, the more I like my proposal to junk
fspath, and have fsdecode and fsencode consume __fspath__.  That way
application code can request its native type.

 > By contrast, as soon as you type "import pathlib" at the top of your
 > file, you've stepped outside the world of potentially pure boundary
 > code,

"Potentially pure" is an odd term to apply to the boundary code IMO.
We are agreed that conceptually paths are text, for human consumption
(at least at last report we were).  Therefore, paths represented as
bytes are inherently an impure construct.  Viz, surrogateescape.

 > and are instead dealing with structured application level
 > objects (which means traversing the bytes->str boundary before the
 > str->Path one).

That assumes that pathlib.Path's str-only design is appropriate.  I'm
questioning that, primarily as a thought experiment.


From ethan at stoneleaf.us  Thu Apr 14 10:54:39 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 Apr 2016 07:54:39 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
Message-ID: <570FAF2F.6080304@stoneleaf.us>

On 04/14/2016 05:16 AM, Victor Stinner wrote:

> I consider that the final goal of the whole discussion is to support
> something like:
>
>      path = os.path.join(pathlib_path, "str_path", direntry)
>
> Even if direntry uses a bytes filename. I expect genericpath.join() to
> be patched to use os.fspath(). If os.fspath() returns bytes,
> path.join() will fail with an annoying TypeError.
>
> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
> just to make my life easier.

This would be where we strongly disagree.  If pathlib, as a high-level 
construct, wants to take that approach I have no issues, but the 
functions in os are low-level and as such should not be changing data 
types unless I ask for it.  I see __fspath__ as a retrieval mechanism, 
not a data-transformation mechanism.

> You can apply the same rationale for the flavors 2 and 3
> (os.fspath(path, allow_bytes=True)). Indirectly, you will get similar
> TypeError on os.path.join().

And that's fine.  Low-level interfaces should not change data types 
unless explicitly requested -- and we have fsencode() and fsdecode() for 
that.

--
~Ethan~


From stephen at xemacs.org  Thu Apr 14 10:57:10 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 14 Apr 2016 23:57:10 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <1460637203.4187117.578627337.3BC93F7D@webmail.messagingengine.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <1460637203.4187117.578627337.3BC93F7D@webmail.messagingengine.com>
Message-ID: <22287.44998.72924.402412@turnbull.sk.tsukuba.ac.jp>

Random832 writes:
 > On Thu, Apr 14, 2016, at 03:02, Stephen J. Turnbull wrote:

 > > I have a strong preference for str only, because I still don't see a
 > > use case for polymorphic __fspath__.
 > 
 > Ultimately we're talking about redundancy and performance here.

Ultimately, yes.  Right now I have some epithets for you:  Premature!
Optimization!!  Get thee behind me, Satan!

More seriously, concrete use cases where this overhead matters?

Church-of-Don-Knuth-member-ly y'rs,

From nikita at nemkin.ru  Thu Apr 14 05:04:34 2016
From: nikita at nemkin.ru (Nikita Nemkin)
Date: Thu, 14 Apr 2016 14:04:34 +0500
Subject: [Python-Dev] MAKE_FUNCTION simplification
Message-ID: <CANawmycgJwN-vQ5BkmBLs59TSjPDB=8QE-wRNYDjPNFzDsanMQ@mail.gmail.com>

MAKE_FUNCTION opcode is complex due to the way it receives
input arguments:

 1) default args, individually;
 2) default kwonly args, individual name-value pairs;
 3) a tuple of parameter names (single constant);
 4) annotation values, individually;
 5) code object;
 6) qualname.

The counts for 1,2,4 are packed into oparg bitfields, making oparg large.

My suggestion is to pre-package 1-4 before calling MAKE_FUNCTION,
i.e. explicitly emit BUILD_TUPLE for default args and BUILD_MAP
for keyword defaults and annotations.

Then, MAKE_FUNCTION will become a dramatically simpler
5 argument opcode, taking

 1) default args tuple (optional);
 2) default keyword only args dict (optional);
 3) annotations dict (optional);
 4) code object;
 5) qualname.

These arguments correspond exactly to __defaults__, __kwdefaults__,
__annotations__, __code__ and __qualname__ attributes.
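
To illustrate the correspondence at the Python level, here is a small
sketch that assembles a function from already-packaged pieces. The
make_function helper and the template function are made up for the
example; only types.FunctionType and the function attributes are real:

```python
import types


def make_function(code, globals_, qualname,
                  defaults=None, kwdefaults=None, annotations=None):
    """Build a function from pre-packaged pieces (hypothetical helper)."""
    func = types.FunctionType(code, globals_, argdefs=defaults)
    func.__qualname__ = qualname
    # Optional pieces are only set when present, mirroring the oparg bits.
    if kwdefaults is not None:
        func.__kwdefaults__ = kwdefaults
    if annotations is not None:
        func.__annotations__ = annotations
    return func


def template(a, b, *, c):
    return a + b + c


# Reuse the code object of `template`, supplying defaults as whole objects.
add3 = make_function(template.__code__, globals(), "add3",
                     defaults=(10,), kwdefaults={"c": 100},
                     annotations={"return": int})

print(add3(1))  # 111
```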

For optional args, oparg bits should indicate individual arg presence.
(This also saves None checks in opcode implementation.)

If we add another optional argument (and oparg bit) for __closure__
attribute, then separate MAKE_CLOSURE opcode becomes unnecessary.

Default args tuple is likely to be a constant and can be packaged whole,
compensating for the extra size of explicit BUILD_* instructions.

Compare the current implementation:

    https://github.com/python/cpython/blob/master/Python/ceval.c#L3262

with this provisional implementation (untested):

    TARGET(MAKE_FUNCTION) {
        PyObject *qualname = POP();
        PyObject *codeobj = POP();
        PyFunctionObject *func;
        func = (PyFunctionObject *)PyFunction_NewWithQualName(
                                       codeobj, f->f_globals, qualname);
        Py_DECREF(codeobj);
        Py_DECREF(qualname);
        if (func == NULL)
            goto error;

        /* NB: Py_None is not an acceptable value for these. */
        if (oparg & 0x08)
            func->func_closure = POP();
        if (oparg & 0x04)
            func->func_annotations = POP();
        if (oparg & 0x02)
            func->func_kwdefaults = POP();
        if (oparg & 0x01)
            func->func_defaults = POP();

        PUSH((PyObject *)func);
        DISPATCH();
    }
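
The operand order implied by the POP() calls above can be modelled in a
few lines of Python. This is only a toy model of the value stack (a list
with the top at the end); the flag names are invented here, while the bit
values are the ones used in the provisional C code:

```python
HAS_DEFAULTS = 0x01
HAS_KWDEFAULTS = 0x02
HAS_ANNOTATIONS = 0x04
HAS_CLOSURE = 0x08


def decode_make_function(stack, oparg):
    """Pop MAKE_FUNCTION operands from `stack` in the proposed order."""
    fields = {"qualname": stack.pop(), "code": stack.pop()}
    # Same order as the C sketch: closure, annotations, kwdefaults, defaults.
    for flag, name in ((HAS_CLOSURE, "closure"),
                       (HAS_ANNOTATIONS, "annotations"),
                       (HAS_KWDEFAULTS, "kwdefaults"),
                       (HAS_DEFAULTS, "defaults")):
        if oparg & flag:
            fields[name] = stack.pop()
    return fields


# The compiler would push defaults first and qualname last.
stack = [(1,), {"c": 2}, "<code object>", "f"]
fields = decode_make_function(stack, HAS_DEFAULTS | HAS_KWDEFAULTS)
print(fields["defaults"], fields["kwdefaults"])  # (1,) {'c': 2}
```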

compile.c also gets a bit simpler, but not much.

What do you think?

From ethan at stoneleaf.us  Thu Apr 14 11:02:22 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 Apr 2016 08:02:22 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAMpsgwbuABRRshbCMSe1NMAYndq=D6iDpqDQDJimcoY6FyyQaA@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
 <CADiSq7fEzV7Zm5J_KHoU_+-mgr1jtSBt3_VP78cL9KgMd_6N7g@mail.gmail.com>
 <CAMpsgwbuABRRshbCMSe1NMAYndq=D6iDpqDQDJimcoY6FyyQaA@mail.gmail.com>
Message-ID: <570FB0FE.3060308@stoneleaf.us>

On 04/14/2016 06:56 AM, Victor Stinner wrote:
> 2016-04-14 15:40 GMT+02:00 Nick Coghlan:
 >> Even earlier, Victor Stinner wrote:

>>> I consider that the final goal of the whole discussion is to support
>>> something like:
>>>
>>>      path = os.path.join(pathlib_path, "str_path", direntry)
>>
>> That's not a *new* problem though, it already exists if you pass in a
>> mix of bytes and str:
>> (...)
>> There's also already a solution (regardless of whether you want bytes
>> or str as the result), which is to explicitly coerce all the arguments
>> to the same type:
>>
>>--> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
>> (...)
>
> I don't understand. What is the point of adding a new __fspath__
> protocol to *implicitly* convert path objects to strings, if you still
> have to use an explicit conversion?

That's the crux of the issue -- some of us think the job of __fspath__ 
is to simply retrieve the inherent data from the pathy object, *not* to 
do any implicit conversions.

> I would really expect that a high-level API like pathlib would solve
> encodings issues for me. IMHO DirEntry entries created by
> os.scandir(bytes) must use os.fsdecode() in their __fspath__ method.

Then let pathlib do it. As a high-level interface I have no issue with 
pathlib converting DirEntry bytes objects to str using fsdecode (or 
whatever makes sense); os.path.join (and by extension os.fspath and 
__fspath__) should do no such thing.

>> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
>
> This code is quite complex for a newbie, don't you think so?

A newbie should be using pathlib.  If pathlib is not low-level enough, 
then the newbie needs to learn about low-level stuff.

--
~Ethan~

From victor.stinner at gmail.com  Thu Apr 14 11:19:42 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 17:19:42 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
Message-ID: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>

Hi,

I updated my PEP 509 to make the dictionary version globally unique.
With *two* use cases of this PEP (Yury's method call patch and my FAT
Python project), I think that the PEP is now ready to be accepted.

A globally unique identifier is a requirement for Yury's patch
optimizing method calls ( https://bugs.python.org/issue26110 ). It
makes it possible to check for free whether the dictionary was replaced.

I also renamed the ma_version field to ma_version_tag.

HTML version:
https://www.python.org/dev/peps/pep-0509/

Victor


PEP: 509
Title: Add a private version to dict
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6


Abstract
========

Add a new private version to the builtin ``dict`` type, incremented at
each dictionary creation and at each dictionary change, to implement
fast guards on namespaces.


Rationale
=========

In Python, the builtin ``dict`` type is used by many instructions. For
example, the ``LOAD_GLOBAL`` instruction searches for a variable in the
global namespace, or in the builtins namespace (two dict lookups).
Python uses ``dict`` for the builtins namespace, globals namespace, type
namespaces, instance namespaces, etc. The local namespace (namespace of
a function) is usually optimized to an array, but it can be a dict too.

Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
modified at runtime. Implementing optimizations that respect the Python
semantics requires detecting when "something changes": we will call
these checks "guards".

The speedup of optimizations depends on the speed of guard checks. This
PEP proposes to add a version to dictionaries to implement fast guards
on namespaces.

Dictionary lookups can be skipped if the version does not change, which
is the common case for most namespaces. Since the version is globally
unique, the version is also enough to check that the namespace dictionary
was not replaced with a new dictionary. When the dictionary version does
not change, the cost of a guard check is O(1): it does not depend on the
number of watched dictionary entries.

Example of optimization: copy the value of a global variable to function
constants.  This optimization requires a guard on the global variable to
check if it was modified. If the variable is modified, the variable must
be loaded at runtime when the function is called, instead of using the
constant.

See the `PEP 510 -- Specialized functions with guards
<https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of
guards to specialize functions and for the rationale on Python static
optimizers.


Guard example
=============

Pseudo-code of a fast guard to check if a dictionary entry was modified
(created, updated or deleted) using a hypothetical
``dict_get_version(dict)`` function::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.value = dict.get(key, UNSET)
            self.version = dict_get_version(dict)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dict structure
            version = dict_get_version(self.dict)
            if version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            value = self.dict.get(self.key, UNSET)
            if value is self.value:
                # another key was modified:
                # cache the new dictionary version
                self.version = version
                return True

            # the key was modified
            return False
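
The pseudo-code above can be tried out with a pure-Python stand-in for
the versioned dictionary. ``VersionedDict`` below is a hypothetical
subclass written only for this illustration: it covers ``__setitem__``
and ``__delitem__`` but not the other mutating methods, and it says
nothing about how the C implementation works:

```python
import itertools

_next_version = itertools.count(1)  # stand-in for the global version
_MISSING = object()
UNSET = object()


class VersionedDict(dict):
    """Hypothetical dict tracking a globally unique version in Python."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_next_version)

    def __setitem__(self, key, value):
        # Values are compared by identity, not by content.
        if dict.get(self, key, _MISSING) is not value:
            super().__setitem__(key, value)
            self.version = next(_next_version)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_next_version)


def dict_get_version(mapping):
    return mapping.version


class GuardDictKey:
    def __init__(self, mapping, key):
        self.dict = mapping
        self.key = key
        self.value = mapping.get(key, UNSET)
        self.version = dict_get_version(mapping)

    def check(self):
        version = dict_get_version(self.dict)
        if version == self.version:
            return True  # fast path: no lookup
        if self.dict.get(self.key, UNSET) is self.value:
            self.version = version  # another key changed
            return True
        return False


ns = VersionedDict(x=1)
guard = GuardDictKey(ns, "x")
unchanged = guard.check()        # nothing changed: fast path
ns["y"] = 2
other_key = guard.check()        # other key changed: slow path, still valid
ns["x"] = 3
watched_key = guard.check()      # watched key changed: guard fails
print(unchanged, other_key, watched_key)  # True True False
```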


Usage of the dict version
=========================

Speedup method calls 1.2x
-------------------------

Yury Selivanov wrote a `patch to optimize method calls
<https://bugs.python.org/issue26110>`_. The patch depends on the
`implement per-opcode cache in ceval
<https://bugs.python.org/issue26219>`_ patch which requires dictionary
versions to invalidate the cache if the globals dictionary or the
builtins dictionary has been modified.

The cache also requires that the dictionary version is globally unique.
It is possible to define a function in a namespace and call it
in a different namespace: using ``exec()`` with the *globals* parameter
for example. In this case, the globals dictionary was changed and the
cache must be invalidated.


Specialized functions using guards
----------------------------------

The `PEP 510 -- Specialized functions with guards
<https://www.python.org/dev/peps/pep-0510/>`_ proposes an API to support
specialized functions with guards. It allows implementing static
optimizers for Python without breaking the Python semantics.

Example of a static Python optimizer: the `fatoptimizer
<http://fatoptimizer.readthedocs.org/>`_ of the `FAT Python
<http://faster-cpython.readthedocs.org/fat_python.html>`_ project
implements many optimizations which require guards on namespaces.
Examples:

* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
  ``builtins.__dict__['len']`` and ``globals()['len']`` are required
* Loop unrolling: to unroll the loop ``for i in range(...): ...``,
  guards on ``builtins.__dict__['range']`` and ``globals()['range']``
  are required


Pyjion
------

According to Brett Cannon, one of the two main developers of Pyjion,
Pyjion can also benefit from the dictionary version to implement
optimizations.

Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET
Core runtime).


Unladen Swallow
---------------

Even if the dictionary version was not explicitly mentioned, optimizing
globals and builtins lookup was part of the Unladen Swallow plan:
"Implement one of the several proposed schemes for speeding lookups of
globals and builtins." Source: `Unladen Swallow ProjectPlan
<https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_.

Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
implemented with LLVM. The project stopped in 2011: `Unladen Swallow
Retrospective
<http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_.


Changes
=======

Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
the C type ``PY_UINT64_T``, a 64-bit unsigned integer. Also add a global
dictionary version. Each time a dictionary is created, the global
version is incremented and the dictionary version is initialized to the
global version. The global version is also incremented and copied to the
dictionary version at each dictionary change:

* ``clear()`` if the dict was non-empty
* ``pop(key)`` if the key exists
* ``popitem()`` if the dict is non-empty
* ``setdefault(key, value)`` if the `key` does not exist
* ``__delitem__(key)`` if the key exists
* ``__setitem__(key, value)`` if the `key` doesn't exist or if the value
  is not ``dict[key]``
* ``update(...)`` if new values are different than existing values:
  values are compared by identity, not by their content; the version can
  be incremented multiple times

The ``PyDictObject`` structure is not part of the stable ABI.

The field is called ``ma_version_tag`` rather than ``ma_version`` to
suggest comparing it using ``version_tag == old_version_tag`` rather
than ``version <= old_version``, which makes a bug on integer overflow
much more likely.

Example using a hypothetical ``dict_get_version(dict)`` function::

    >>> d = {}
    >>> dict_get_version(d)
    100
    >>> d['key'] = 'value'
    >>> dict_get_version(d)
    101
    >>> d['key'] = 'new value'
    >>> dict_get_version(d)
    102
    >>> del d['key']
    >>> dict_get_version(d)
    103

The version is not incremented if an existing key is set to the same
value. For efficiency, values are compared by their identity:
``new_value is old_value``, not by their content:
``new_value == old_value``. Example::

    >>> d = {}
    >>> value = object()
    >>> d['key'] = value
    >>> dict_get_version(d)
    40
    >>> d['key'] = value
    >>> dict_get_version(d)
    40

.. note::
   CPython uses some singletons, such as integers in the range [-5; 256],
   the empty tuple, empty strings, Unicode strings of a single character
   in the range [U+0000; U+00FF], etc. When a key is set twice to the same
   singleton, the version is not modified.


Implementation and Performance
==============================

The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
<https://bugs.python.org/issue26058>`_ contains a patch implementing
this PEP.

On pybench and timeit microbenchmarks, the patch does not seem to add
any overhead on dictionary operations.

When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover,
a guard can watch for multiple keys. For example, for an optimization
using 10 global variables in a function, 10 dictionary lookups cost 148
ns, whereas the guard still only costs 3.8 ns when the version does not
change (39x as fast).

The `fat module
<http://fatoptimizer.readthedocs.org/en/latest/fat.html>`_ implements
such guards: ``fat.GuardDict`` is based on the dictionary version.


Integer overflow
================

The implementation uses the C type ``PY_UINT64_T`` to store the version:
a 64-bit unsigned integer. The C code uses ``version++``. On integer
overflow, the version is wrapped to ``0`` (and then continues to be
incremented) according to the C standard.

After an integer overflow, a guard can succeed even though the watched
dictionary key was modified. The bug only occurs at a guard check if
there are exactly ``2 ** 64`` dictionary creations or modifications
since the previous guard check.

If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
take longer than 584 years. Using a 32-bit version, it only takes 4
seconds. That's why a 64-bit unsigned type is also used on 32-bit
systems. A dictionary lookup at the C level takes 14.8 ns.

A risk of a bug every 584 years is acceptable.
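
The figures above can be checked with a few lines of arithmetic. The
sketch below assumes one modification per nanosecond and a year of
365.25 days, and it models the 64-bit wraparound with a bit mask, since
Python integers do not overflow on their own:

```python
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Time needed to exhaust the counter at one modification per nanosecond.
years_with_64_bits = 2**64 / NS_PER_SECOND / SECONDS_PER_YEAR
seconds_with_32_bits = 2**32 / NS_PER_SECOND

# Model of ``version++`` on a 64-bit unsigned integer at its maximum.
MASK = 2**64 - 1
old_tag = MASK
new_tag = (old_tag + 1) & MASK  # wraps to 0

print(int(years_with_64_bits))         # 584
print(round(seconds_with_32_bits, 2))  # 4.29
print(new_tag == old_tag)              # False: equality detects the change
print(new_tag <= old_tag)              # True: an ordered test is fooled
```

The last two lines show why the tag should be compared for equality
only: after the wrap, the new tag is numerically smaller than the old
one.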


Alternatives
============

Expose the version at Python level as a read-only __version__ property
----------------------------------------------------------------------

The first version of the PEP proposed to expose the dictionary version
as a read-only ``__version__`` property at Python level, and also to add
the property to ``collections.UserDict`` (since this type must mimic
the ``dict`` API).

There are multiple issues:

* To be consistent and avoid bad surprises, the version must be added to
  all mapping types. Implementing a new mapping type would require extra
  work for no benefit, since the version is only required on the
  ``dict`` type in practice.
* All Python implementations must implement this new property. It gives
  more work to other implementations, whereas they may not use the
  dictionary version at all.
* Exposing the dictionary version at Python level can lead to
  false assumptions about performance. Checking ``dict.__version__`` at
  the Python level is not faster than a dictionary lookup. A dictionary
  lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5
  ns, the difference is only 1.2 ns (3%)::


    $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
    10000000 loops, best of 3: 0.0487 usec per loop
    $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
    10000000 loops, best of 3: 0.0475 usec per loop

* The ``__version__`` can wrap on integer overflow. It is error
  prone: using ``dict.__version__ <= guard_version`` is wrong,
  ``dict.__version__ == guard_version`` must be used instead to reduce
  the risk of a bug on integer overflow (even if the integer overflow is
  unlikely in practice).

Mandatory bikeshedding on the property name:

* ``__cache_token__``: name proposed by Nick Coghlan, name coming from
  `abc.get_cache_token()
  <https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_.
* ``__version__``
* ``__timestamp__``


Add a version to each dict entry
--------------------------------

A single version per dictionary requires the guard to keep a strong
reference to the value, which can keep the value alive longer than
expected. If we also add a version per dictionary entry, the guard can
store only the entry version to avoid the strong reference to the value
(only strong references to the dictionary and to the key are needed).

Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure,
the field has the C type ``PY_INT64_T``. When a key is created or
modified, the entry version is set to the dictionary version which is
incremented at any change (create, modify, delete).

Pseudo-code of a fast guard to check if a dictionary key was modified
using hypothetical ``dict_get_version(dict)`` and
``dict_get_entry_version(dict, key)`` functions::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.dict_version = dict_get_version(dict)
            self.entry_version = dict_get_entry_version(dict, key)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dict structure
            dict_version = dict_get_version(self.dict)
            if dict_version == self.dict_version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            entry_version = dict_get_entry_version(self.dict, self.key)
            if entry_version == self.entry_version:
                # another key was modified:
                # cache the new dictionary version
                self.dict_version = dict_version
                return True

            # the key was modified
            return False
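
The guard logic can be exercised with a pure-Python stand-in. The sketch
below is only a simulation under stated assumptions: ``VersionedDict`` is
a hypothetical toy class that tracks a dictionary version and one version
per entry, and the two ``dict_get_*`` helpers are the hypothetical
functions named above, implemented here on top of that toy class.

```python
class VersionedDict:
    """Toy stand-in for a dict with a global and a per-entry version."""

    def __init__(self):
        self.data = {}
        self.version = 0
        self.entry_versions = {}

    def __setitem__(self, key, value):
        # Any change bumps the dictionary version; the entry records it.
        self.version += 1
        self.data[key] = value
        self.entry_versions[key] = self.version

    def __delitem__(self, key):
        self.version += 1
        del self.data[key]
        del self.entry_versions[key]


def dict_get_version(d):
    return d.version


def dict_get_entry_version(d, key):
    # None stands for "key is missing" in this sketch.
    return d.entry_versions.get(key)


class GuardDictKey:
    def __init__(self, d, key):
        self.dict = d
        self.key = key
        self.dict_version = dict_get_version(d)
        self.entry_version = dict_get_entry_version(d, key)

    def check(self):
        dict_version = dict_get_version(self.dict)
        if dict_version == self.dict_version:
            return True  # fast path: nothing changed at all
        if dict_get_entry_version(self.dict, self.key) == self.entry_version:
            # another key was modified: cache the new dictionary version
            self.dict_version = dict_version
            return True
        return False  # the watched key was modified


ns = VersionedDict()
ns["x"] = 1
guard = GuardDictKey(ns, "x")
unchanged = guard.check()
ns["y"] = 2
other_key_changed = guard.check()
ns["x"] = 3
watched_key_changed = guard.check()
print(unchanged, other_key_changed, watched_key_changed)
```

Note that the guard never stores the value itself, only the two version
numbers, which is the point of the per-entry version.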

The main drawback of this option is the impact on the memory footprint.
It increases the size of each dictionary entry, so the overhead depends
on the number of buckets (dictionary entries, used or unused yet). For
example, it increases the size of each dictionary entry by 8 bytes on
a 64-bit system.

In Python, the memory footprint matters and the trend is to reduce it.
Examples:

* `PEP 393 -- Flexible String Representation
  <https://www.python.org/dev/peps/pep-0393/>`_
* `PEP 412 -- Key-Sharing Dictionary
  <https://www.python.org/dev/peps/pep-0412/>`_


Add a new dict subtype
----------------------

Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
use the ``verdict`` for namespaces (module namespace, type namespace,
instance namespace, etc.) instead of ``dict``.

Leave the ``dict`` type unchanged to not add any overhead (memory
footprint) when guards are not needed.

Technical issue: a lot of C code in the wild, including CPython core,
expects the exact ``dict`` type. Issues:

* ``exec()`` requires a ``dict`` for globals and locals. A lot of code
  uses ``globals={}``. It is not possible to cast the ``dict`` to a
  ``dict`` subtype because the caller expects the ``globals`` parameter
  to be modified (``dict`` is mutable).
* Functions call ``PyDict_xxx()`` functions directly, instead of calling
  ``PyObject_xxx()`` if the object is a ``dict`` subtype.
* The ``PyDict_CheckExact()`` check fails on ``dict`` subtypes, whereas
  some functions require the exact ``dict`` type.
* ``Python/ceval.c`` does not completely support dict subtypes for
  namespaces.
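
The ``exec()`` problem from the first item can be shown at the Python
level. In this sketch the ``verdict`` class is an empty placeholder (no
versioning is implemented); it only illustrates that converting the
caller's ``dict`` to a subtype creates a copy, so the caller no longer
sees the changes.

```python
class verdict(dict):
    """Hypothetical versioned dict subtype (no versioning implemented here)."""


caller_globals = {}

# Converting to the subtype makes a copy...
converted = verdict(caller_globals)
exec("x = 1", converted)

# ...so the assignment lands in the copy, not in the caller's dictionary.
seen_by_caller = "x" in caller_globals
seen_in_copy = "x" in converted

# Passing the dict itself works as callers expect.
exec("x = 1", caller_globals)
print(seen_by_caller, seen_in_copy, caller_globals["x"])
```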


The ``exec()`` issue is a blocker issue.

Other issues:

* The garbage collector has special code to "untrack" ``dict``
  instances. If a ``dict`` subtype is used for namespaces, the garbage
  collector may be unable to break some reference cycles.
* Some functions have a fast-path for ``dict`` which would not be taken
  for ``dict`` subtypes, and so it would make Python a little bit
  slower.


Prior Art
=========

Method cache and type version tag
---------------------------------

In 2007, Armin Rigo wrote a patch to implement a cache of methods. It
was merged into Python 2.6.  The patch adds a "type attribute cache
version tag" (``tp_version_tag``) and a "valid version tag" flag to
types (the ``PyTypeObject`` structure).

The type version tag is not available at the Python level.

The version tag has the C type ``unsigned int``. The cache is a global
hash table of 4096 entries, shared by all types. The cache is global to
"make it fast, have a deterministic and low memory footprint, and be
easy to invalidate". Each cache entry has a version tag. A global
version tag is used to create the next version tag, it also has the C
type ``unsigned int``.

By default, a type has its "valid version tag" flag cleared to indicate
that the version tag is invalid. When the first method of the type is
cached, the version tag and the "valid version tag" flag are set. When a
type is modified, the "valid version tag" flag of the type and its
subclasses is cleared. Later, when a cache entry of these types is used,
the entry is removed because its version tag is outdated.

On integer overflow, the whole cache is cleared and the global version
tag is reset to ``0``.

See `Method cache (issue #1685986)
<https://bugs.python.org/issue1685986>`_ and `Armin's method cache
optimization updated for Python 2.6 (issue #1700288)
<https://bugs.python.org/issue1700288>`_.


Globals / builtins cache
------------------------

In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue
#10401) <http://bugs.python.org/issue10401>`_ which adds a private
``ma_version`` field to the ``PyDictObject`` structure (``dict`` type),
the field has the C type ``Py_ssize_t``.

The patch adds a "global and builtin cache" to functions and frames, and
changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the
cache.

The change on the ``PyDictObject`` structure is very similar to this
PEP.


Cached globals+builtins lookup
------------------------------

In 2006, Andrea Griffini proposed a patch implementing a `Cached
globals+builtins lookup optimization
<https://bugs.python.org/issue1616125>`_.  The patch adds a private
``timestamp`` field to the ``PyDictObject`` structure (``dict`` type),
the field has the C type ``size_t``.

Thread on python-dev: `About dictionary lookup caching
<https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.


Guard against changing dict during iteration
--------------------------------------------

In 2013, Serhiy Storchaka proposed `Guard against changing dict during
iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which
adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict``
type), the field has the C type ``size_t``.  This field is incremented
when the dictionary is modified, and so is very similar to the proposed
dictionary version.

Sadly, the dictionary version proposed in this PEP doesn't help to
detect dictionary mutation. The dictionary version changes when values
are replaced, whereas modifying dictionary values while iterating on
dictionary keys is legit in Python.
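
The difference between the two kinds of change can be seen in a short
example. It only uses the current ``dict`` behaviour of CPython, with no
version involved: replacing values during iteration is accepted, while
adding a key is detected.

```python
d = {"a": 1, "b": 2}

# Replacing values while iterating on keys is allowed.
for key in d:
    d[key] = d[key] * 10

# Adding a key while iterating is detected by CPython.
error = None
try:
    for key in d:
        d[key + "_copy"] = d[key]
except RuntimeError as exc:
    error = exc

print(d["a"], d["b"], type(error).__name__)
```

A check based on the dictionary version would reject the first loop as
well, since each value replacement changes the version.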


PySizer
-------

`PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python,
Google Summer of Code 2005 project by Nick Smallbone.

This project has a patch for CPython 2.4 which adds ``key_time`` and
``value_time`` fields to dictionary entries. It uses a global
process-wide counter for dictionaries, incremented each time that a
dictionary is modified. The times are used to decide when child objects
first appeared in their parent objects.


Discussion
==========

Thread on the mailing lists:

* python-dev: `PEP 509: Add a private version to dict
  <https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
  (January 2016)
* python-ideas: `RFC: PEP: Add dict.__version__
  <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_
  (January 2016)


Copyright
=========

This document has been placed in the public domain.

From ethan at stoneleaf.us  Thu Apr 14 11:25:04 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 Apr 2016 08:25:04 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
Message-ID: <570FB650.203@stoneleaf.us>

On 04/14/2016 07:52 AM, Stephen J. Turnbull wrote:
> Nick Coghlan writes:

>> The use case for returning bytes from __fspath__ is DirEntry, so you
>> can write things like this in low level code:
>>
>>     def myscandir(dirpath):
>>         for entry in os.scandir(dirpath):
>>             if entry.is_file():
>>                 with open(entry) as f:
>>                     # do something
>
> Excuse me, but that is *not* a use case for returning bytes from
> DirEntry.__fspath__.  open() is perfectly happy taking str (including
> surrogate-encoded rawbytes).

Substitute open() with sending those bytes somewhere else: why should I 
have to reencode this str back to bytes, when bytes are what I asked for 
in the first place?
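
A small sketch of "bytes are what I asked for", using only os.scandir()
as it exists today. The temporary directory and the file name
"data.bin" are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so that the directory has an entry to list.
    open(os.path.join(tmp, "data.bin"), "wb").close()

    # Scanning with a str path gives str names...
    str_entries = list(os.scandir(tmp))
    # ...and scanning with a bytes path gives bytes names and paths.
    bytes_entries = list(os.scandir(os.fsencode(tmp)))

    str_name = str_entries[0].name
    bytes_name = bytes_entries[0].name
    bytes_path = bytes_entries[0].path

print(repr(str_name), repr(bytes_name), type(bytes_path).__name__)
```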

> If the trivial thing is for __fspath__
> to return bytes, then implicitly applying os.fsencode to the value
> being returned is almost as trivial, and just as safe.  A low price to
> pay for ensuring that text applications don't crash just because a
> bytes-oriented object decides to implement __fspath__.

How did this application get a bytes path object to begin with?  Either 
it explicitly used bytes when calling scandir and friends (in which case 
it shouldn't be surprised to be working with bytes); or it got that 
bytes object from a database, over-the-wire, an-other-language-lib, etc. 
  Those are the boundaries where bytes should be transformed to str if 
the app doesn't want to deal with bytes (whether for path manipulation 
or other text manipulation).  os.fspath() is not a boundary function and 
shouldn't be used as if it were.

> If there's any cost to defining __fspath__ as str-only, it's some
> other use case.  What consumer of __fspath__ that expects bytes but
> not str do you envision?  Is it generalizable, so that applying
> fsencode to the value of __fspath__ would lead to "unacceptably"
> widespread sprinkling of fsencode all over bytes-oriented code?

If I'm working with bytes, why would I want to work with str?  Python is 
a glue language, and Python practitioners don't always have the luxury 
of working only with text.

--
~Ethan~

From ethan at stoneleaf.us  Thu Apr 14 11:29:02 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 Apr 2016 08:29:02 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
 <CADiSq7fEzV7Zm5J_KHoU_+-mgr1jtSBt3_VP78cL9KgMd_6N7g@mail.gmail.com>
 <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com>
 <CAPTjJmofbDW5ptfFb11+PVLvjrnUXY+sbK0bu2-2B7H=ZXdMew@mail.gmail.com>
 <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com>
Message-ID: <570FB73E.5000408@stoneleaf.us>

On 04/14/2016 07:01 AM, Random832 wrote:
> On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote:
>> Adding integers and floats is considered "safe" because most people's
>> use of floats completely compasses their use of ints. (You'll get
>> OverflowError if it can't be represented.) But float and Decimal are
>> considered "unsafe":
>>
>>--> 1.5 + decimal.Decimal("1.5")
>> Traceback (most recent call last):
>>    File "<stdin>", line 1, in <module>
>> TypeError: unsupported operand type(s) for +: 'float' and
>> 'decimal.Decimal'
>>
>> This is more what's happening here. Floats and Decimals can represent
>> similar sorts of things, but with enough incompatibilities that you
>> can't simply merge them.
>
> And what such incompatibilities exist between bytes and str for the
> purpose of representing file paths? At the end of the day, there's
> exactly one answer to "what file on disk this represents (or would
> represent if it existed)".

Interoperability with other systems and/or libraries.  If we use 
surrogateescape to transform str to bytes, and the other side does not, 
we no longer have a workable path.
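
A minimal sketch of that failure, using the codec error handlers directly
rather than os.fsencode()/os.fsdecode() so that it does not depend on the
platform's file system encoding. The file name is made up.

```python
raw = b"caf\xe9.txt"    # Latin-1 encoded name, not valid UTF-8

# Our side: surrogateescape smuggles the undecodable byte through str.
text = raw.decode("utf-8", "surrogateescape")
round_trip = text.encode("utf-8", "surrogateescape")

# The other side: a strict encoder chokes on the lone surrogate.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(repr(text), round_trip == raw, strict_ok)
```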

--
~Ethan~


From guido at python.org  Thu Apr 14 11:27:52 2016
From: guido at python.org (Guido van Rossum)
Date: Thu, 14 Apr 2016 08:27:52 -0700
Subject: [Python-Dev] MAKE_FUNCTION simplification
In-Reply-To: <CANawmycgJwN-vQ5BkmBLs59TSjPDB=8QE-wRNYDjPNFzDsanMQ@mail.gmail.com>
References: <CANawmycgJwN-vQ5BkmBLs59TSjPDB=8QE-wRNYDjPNFzDsanMQ@mail.gmail.com>
Message-ID: <CAP7+vJJd1rrG7UR3Q8yzscubvzHG+T1C2e6ixMm813qCebeGKQ@mail.gmail.com>

Great analysis! What might stand in the way of adoption is concern for
bytecode manipulation libraries that would have to be changed. What
might encourage adoption would be a benchmark showing this saves a lot
of time.

Personally I'm expecting it won't make much of a difference for real
programs since almost always the cost of creating the function is
dwarfed by the (total) cost of running it. But Python does create a
lot of functions, and there's also lambdas.

There's also talk of switching to wordcode, in a different thread.
Maybe the idea would be easier to introduce there? (Bytecode libraries
would have to change anyways, so the additional concern for this
change would be minimal.)

On Thu, Apr 14, 2016 at 2:04 AM, Nikita Nemkin <nikita at nemkin.ru> wrote:
> MAKE_FUNCTION opcode is complex due to the way it receives
> input arguments:
>
>  1) default args, individually;
>  2) default kwonly args, individual name-value pairs;
>  3) a tuple of parameter names (single constant);
>  4) annotation values, individually;
>  5) code object;
>  6) qualname.
>
> The counts for 1,2,4 are packed into oparg bitfields, making oparg large.
>
> My suggestion is to pre-package 1-4 before calling MAKE_FUNCTION,
> i.e. explicitly emit BUILD_TUPLE for defaults args and BUILD_MAPs
> for keyword defaults and annotations.
>
> Then, MAKE_FUNCTION will become a dramatically simpler
> 5 argument opcode, taking
>
>  1) default args tuple (optional);
>  2) default keyword only args dict (optional);
>  3) annotations dict (optional);
>  4) code object;
>  5) qualname.
>
> These arguments correspond exactly to __annotations__, __kwdefaults__,
> __defaults__, __code__ and __qualname__ attributes.
>
> For optional args, oparg bits should indicate individual arg presence.
> (This also saves None checks in opcode implementation.)
>
> If we add another optional argument (and oparg bit) for __closure__
> attribute, then separate MAKE_CLOSURE opcode becomes unnecessary.
>
> Default args tuple is likely to be a constant and can be packaged whole,
> compensating for the extra size of explicit BUILD_* instructions.
>
> Compare the current implementation:
>
>     https://github.com/python/cpython/blob/master/Python/ceval.c#L3262
>
> with this provisional implementation (untested):
>
>     TARGET(MAKE_FUNCTION) {
>         PyObject *qualname = POP();
>         PyObject *codeobj = POP();
>         PyFunctionObject *func;
>         func = (PyFunctionObject *)PyFunction_NewWithQualName(
>                                        codeobj, f->f_globals, qualname);
>         Py_DECREF(codeobj);
>         Py_DECREF(qualname);
>         if (func == NULL)
>             goto error;
>
>         /* NB: Py_None is not an acceptable value for these. */
>         if (oparg & 0x08)
>             func->func_closure = POP();
>         if (oparg & 0x04)
>             func->func_annotations = POP();
>         if (oparg & 0x02)
>             func->func_kwdefaults = POP();
>         if (oparg & 0x01)
>             func->func_defaults = POP();
>
>         PUSH((PyObject *)func);
>         DISPATCH();
>     }
>
> compile.c also gets a bit simpler, but not much.
>
> What do you think?



-- 
--Guido van Rossum (python.org/~guido)

From victor.stinner at gmail.com  Thu Apr 14 11:32:14 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 17:32:14 +0200
Subject: [Python-Dev] MAKE_FUNCTION simplification
In-Reply-To: <CANawmycgJwN-vQ5BkmBLs59TSjPDB=8QE-wRNYDjPNFzDsanMQ@mail.gmail.com>
References: <CANawmycgJwN-vQ5BkmBLs59TSjPDB=8QE-wRNYDjPNFzDsanMQ@mail.gmail.com>
Message-ID: <CAMpsgwb_387ugR73St3Uw9=VZ5m0xECtxMvraLQbpVMoC48Ezw@mail.gmail.com>

2016-04-14 11:04 GMT+02:00 Nikita Nemkin <nikita at nemkin.ru>:
> MAKE_FUNCTION opcode is complex due to the way it receives
> input arguments: (...)

Yeah, I was always disturbed by how this opcode gets parameters.

> My suggestion is to pre-package 1-4 before calling MAKE_FUNCTION,
> i.e. explicitly emit BUILD_TUPLE for defaults args and BUILD_MAPs
> for keyword defaults and annotations.

I read the code. In fact, I don't understand why it wasn't done like
that since the beginning :-p

> Then, MAKE_FUNCTION will become a dramatically simpler
> 5 argument opcode, taking

Would you like to work on a patch to implement that change?

Since Python 3.6 may get a new bytecode format  (wordcode, see the
other thread on this mailing list), I think that it's ok to change
MAKE_FUNCTION in the same release.

Victor

From victor.stinner at gmail.com  Thu Apr 14 11:36:10 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 17:36:10 +0200
Subject: [Python-Dev] MAKE_FUNCTION simplification
In-Reply-To: <CAP7+vJJd1rrG7UR3Q8yzscubvzHG+T1C2e6ixMm813qCebeGKQ@mail.gmail.com>
References: <CANawmycgJwN-vQ5BkmBLs59TSjPDB=8QE-wRNYDjPNFzDsanMQ@mail.gmail.com>
 <CAP7+vJJd1rrG7UR3Q8yzscubvzHG+T1C2e6ixMm813qCebeGKQ@mail.gmail.com>
Message-ID: <CAMpsgwYbeemA0kfj_OaKg6jjQiTZq+bXgU4kGn7U8nLo+6WhZA@mail.gmail.com>

2016-04-14 17:27 GMT+02:00 Guido van Rossum <guido at python.org>:
> Great analysis! What might stand in the way of adoption is concern for
> bytecode manipulation libraries that would have to be changed.
> (...)
> There's also talk of switching to wordcode, in a different thread.

I agree that breaking backward compatibility just for MAKE_FUNCTION is
not worth it. But if we accept the wordcode change, IMHO it's ok to take
this as an opportunity to also modify MAKE_FUNCTION.

> Maybe the idea would be easier to introduce there? (Bytecode libraries
> would have to change anyways, so the additional concern for this
> change would be minimal.)

Exactly ;-)

Victor

From brett at python.org  Thu Apr 14 11:40:55 2016
From: brett at python.org (Brett Cannon)
Date: Thu, 14 Apr 2016 15:40:55 +0000
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
In-Reply-To: <CAMpsgwaC=5Y4g5YV6h_eLgYzBaeKwm0N8FC2pdfeKt8s1D+vYQ@mail.gmail.com>
References: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
 <nenn2m$lqu$1@ger.gmane.org>
 <CAMpsgwaQDAA-icsAnRAYZpg18og569pdeZQ94HRgs=GyS_NHZQ@mail.gmail.com>
 <CAMpsgwaC=5Y4g5YV6h_eLgYzBaeKwm0N8FC2pdfeKt8s1D+vYQ@mail.gmail.com>
Message-ID: <CAP1=2W5ii5tWywi7dOWpmFFh1YoqCBLqUk12=PpQ0kFVbsN3Pg@mail.gmail.com>

On Thu, 14 Apr 2016 at 03:26 Victor Stinner <victor.stinner at gmail.com>
wrote:

>
> On 14 Apr 2016 11:16 AM, "Serhiy Storchaka" <storchaka at gmail.com>
> wrote:
> > A desirable but nonexistent feature is to write emails to authors of
> commits that broke buildbots. How hard to implement this?
>
> Yeah I also had this idea since many years but buildbots were quite
> unstable. Maybe we should be more strict to consider a buildbot as stable?
>

Depending on how fancy we get with our infrastructure after we move to
GitHub, we could theoretically end up with a PR-merging bot that can detect
which commit broke things and report on the PR that did it (as well as
report anywhere else we wanted to).


> I propose to experiment sending notifications of failure to the authors of
> changes *and* to a new mailing list. I would subscribe to such list. An
> even safer starting point would be to only start with the mailing list.
>
> FYI I'm connected to the #python-dev IRC channel which already contain
> these notifications. But I agree that mails are better.
>

Yeah, I'm one of those that doesn't sit on #python-dev due to the lack of a
persistently connected machine, so an email would work better (unless we
want to be trendy and write a bot for Slack/Skype/FB Messenger :).


> > What are you think about backporting recent regrtest to 2.7? Most needed
> features to me are the -m and -G options.
>
> Regrtest changed a lot in python 3.6 (new test.libregrtest library).
> I suggest to start from python 3.5.
>
> For -m: if it doesn't need to modify the unittest module, I agree.
>
> I don't know -G option.
>
> > Would be nice to add a feature for running every test in separate
> subprocess. This will isolate the effect of failed tests.
>
> See my email :-) I proposed to modify -j1 to run tests in subprocesses. I
> even mentioned my issue.
>
> I suggest to use -jN on all buildbot, at least -j1.
>
> Maybe -j2 is even better since many tests are waiting on IO or simple
> sleep.
>

Both ideas seem reasonable.

From cybersol at yahoo.com  Thu Apr 14 11:59:56 2016
From: cybersol at yahoo.com (Michael Mysinger)
Date: Thu, 14 Apr 2016 15:59:56 +0000 (UTC)
Subject: [Python-Dev] pathlib - current status of discussions
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <loom.20160414T083232-765@post.gmane.org> <570FAD78.60505@stoneleaf.us>
Message-ID: <loom.20160414T172609-785@post.gmane.org>

Ethan Furman <ethan <at> stoneleaf.us> writes:

> On 04/14/2016 12:03 AM, Michael Mysinger via Python-Dev wrote:
> > In particular, one RichPath class might return bytes and another str,
> > or even worse the same class might sometimes return bytes and
> > sometimes str. When will os.path.join blow up due to mixing bytes and
> > str and when will it work in those situations?
> 
> What are you asking here?  ...  Meaning allowing os.fspath() 
> and __fspath__ to return either bytes or str will never cause the 
> combination of bytes and str to work.  Said another way: if you are 
> using os.path.join then all the pieces have to be str or all the pieces
> have to be bytes.

I am saying that if os.path.join now accepts RichPath objects, and those
objects can return either str or bytes, then it's much harder to reason about
when I have all bytes or all strings. In essence, you will force me to
pre-wrap all RichPath objects in either os.fsencode(os.fspath(path)) or
os.fsdecode(os.fspath(path)), just so I can reason about the type. And if I
have to always do that wrapping then os.path.join doesn't need to accept
RichPath objects and call fspath at all.
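
A rough sketch of the wrapping I mean. Plain str and bytes values stand
in for whatever a RichPath's __fspath__ would return, and the helper name
as_str is made up for the example.

```python
import os

# Mixing the two types in one call fails.
try:
    os.path.join("dir", b"name")
    mixed_ok = True
except TypeError:
    mixed_ok = False


def as_str(path):
    """Normalize a str or bytes path to str (hypothetical helper)."""
    return os.fsdecode(path)


# After normalizing every piece, the result type is predictable.
joined = os.path.join(as_str("dir"), as_str(b"name"))
print(mixed_ok, repr(joined))
```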




From victor.stinner at gmail.com  Thu Apr 14 12:04:36 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 18:04:36 +0200
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570FB73E.5000408@stoneleaf.us>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
 <CADiSq7fEzV7Zm5J_KHoU_+-mgr1jtSBt3_VP78cL9KgMd_6N7g@mail.gmail.com>
 <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com>
 <CAPTjJmofbDW5ptfFb11+PVLvjrnUXY+sbK0bu2-2B7H=ZXdMew@mail.gmail.com>
 <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com>
 <570FB73E.5000408@stoneleaf.us>
Message-ID: <CAMpsgwZ+DshmQ1x-=8uNhRwSB2Xfs11mF=eVBpjaguDCRO_pTw@mail.gmail.com>

2016-04-14 17:29 GMT+02:00 Ethan Furman <ethan at stoneleaf.us>:
> Interoperability with other systems and/or libraries.  If we use
> surrogateescape to transform str to bytes, and the other side does not, we
> no longer have a workable path.

I guess that you mean a Python library? When you exchange with
external programs or call C libraries, Python is responsible for
encoding Unicode to bytes with os.fsencode(). The external part is not
aware that Python uses surrogateescape, it gets "regular" bytes.

I suggest treating such Python libraries like external programs and
libraries: convert Unicode to bytes with os.fsencode(), but also
process paths as Unicode "inside" your application.

It's the basic rule for handling Unicode correctly in an application:
decode inputs as soon as possible, and encode back as late as
possible. Encode/decode at borders.
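
A minimal sketch of that rule, assuming a bytes directory name arrives
from outside and bytes must be handed back to the other side. The
function name handle_incoming and the paths are made up.

```python
import os


def handle_incoming(raw_dir, filename):
    """Decode at the input border, work on str, encode at the output border."""
    directory = os.fsdecode(raw_dir)            # bytes -> str, as early as possible
    full = os.path.join(directory, filename)    # str-only processing inside
    return os.fsencode(full)                    # str -> bytes, as late as possible


result = handle_incoming(b"logs", "app.txt")
print(repr(result))
```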

Victor

From stephen at xemacs.org  Thu Apr 14 12:05:57 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 15 Apr 2016 01:05:57 +0900
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
 <CADiSq7fEzV7Zm5J_KHoU_+-mgr1jtSBt3_VP78cL9KgMd_6N7g@mail.gmail.com>
 <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com>
 <CAPTjJmofbDW5ptfFb11+PVLvjrnUXY+sbK0bu2-2B7H=ZXdMew@mail.gmail.com>
 <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com>
Message-ID: <22287.49125.859016.872121@turnbull.sk.tsukuba.ac.jp>

Random832 writes:

 > And what such incompatibilities exist between bytes and str for the
 > purpose of representing file paths?

A plethora of encodings.

 > At the end of the day, there's exactly one answer to "what file on
 > disk this represents (or would represent if it existed)".

Nope.  Suppose those bytes were read from a file or a socket?  It's
dangerous to assume that encoding matches the file system's.


From victor.stinner at gmail.com  Thu Apr 14 12:09:14 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 18:09:14 +0200
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570FAF2F.6080304@stoneleaf.us>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
 <570FAF2F.6080304@stoneleaf.us>
Message-ID: <CAMpsgwY8=241ZA4jJpUUoXe=DOYcADmYTOu3Vn59UcnELhyGRQ@mail.gmail.com>

2016-04-14 16:54 GMT+02:00 Ethan Furman <ethan at stoneleaf.us>:
>> I consider that the final goal of the whole discussion is to support
>> something like:
>>
>>      path = os.path.join(pathlib_path, "str_path", direntry)
>>
>> (...)
>> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
>> just to make my life easier.
>
> This would be where we strongly disagree.

FYI it's ok that we disagree on this point, at least I expressed my opinion ;-)

At least, we now identified better a point of disagreement.

Victor

From donald at stufft.io  Thu Apr 14 12:13:02 2016
From: donald at stufft.io (Donald Stufft)
Date: Thu, 14 Apr 2016 12:13:02 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <loom.20160414T172609-785@post.gmane.org>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <loom.20160414T083232-765@post.gmane.org> <570FAD78.60505@stoneleaf.us>
 <loom.20160414T172609-785@post.gmane.org>
Message-ID: <1B962989-D6E6-4557-BDAD-3087F1E733E6@stufft.io>


> On Apr 14, 2016, at 11:59 AM, Michael Mysinger via Python-Dev <python-dev at python.org> wrote:
> 
> In essence, you will force me to pre-wrap all RichPath objects in
> either os.fsencode(os.fspath(path)) or
> os.fsdecode(os.fspath(path)), just so I can reason about the type.


This is only the case if you have a singular RichPath object that can represent both bytes and str (which is what DirEntry does, which I agree makes it harder... but that's already the case with DirEntry.path). However that's not the case if you have a bRichPath and uRichPath.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From nikita at nemkin.ru  Thu Apr 14 12:03:01 2016
From: nikita at nemkin.ru (Nikita Nemkin)
Date: Thu, 14 Apr 2016 21:03:01 +0500
Subject: [Python-Dev] MAKE_FUNCTION simplification
Message-ID: <CANawmyeN11WsfxEBNPuUo=8YPDqDZBdasUr5mkhOJEr_q26+GQ@mail.gmail.com>

On Thu, Apr 14, 2016 at 8:27 PM, Guido van Rossum <guido at python.org> wrote:
> Great analysis! What might stand in the way of adoption is concern for
> bytecode manipulation libraries that would have to be changed. What
> might encourage adoption would be a benchmark showing this saves a lot
> of time.
>
> Personally I'm expecting it won't make much of a difference for real
> programs since almost always the cost of creating the function is
> dwarfed by the (total) cost of running it. But Python does create a
> lot of functions, and there's also lambdas.

This change alone is very unlikely to have a measurable performance impact.
The intention is to clean up ceval.c/compile.c a bit, nothing more.
If many other opcodes were somehow slimmed down in a similar fashion,
then we might (or might not) see perf gains.

For example, most slot dispatch opcodes can be compressed into a single
opcode+slot index with inlined dispatch logic, instead of each one individually
calling C API functions...

> There's also talk of switching to wordcode, in a different thread.
> Maybe the idea would be easier to introduce there? (Bytecode libraries
> would have to change anyways, so the additional concern for this
> change would be minimal.)

Wordcode can benefit from this change, because it guarantees
single-byte MAKE_FUNCTION oparg.

I think that Python should make bytecode explicitly unstable and subject
to change with any major release. The potential for a faster Python
interpreter (or simple JIT) is huge; requiring bytecode compatibility
will slow down any progress in this area.

From nikita at nemkin.ru  Thu Apr 14 12:14:43 2016
From: nikita at nemkin.ru (Nikita Nemkin)
Date: Thu, 14 Apr 2016 21:14:43 +0500
Subject: [Python-Dev] MAKE_FUNCTION simplification
Message-ID: <CANawmydUcqNn5NsegNdPns_8yYF1E7XA5RK_vj4jKXS9KT6jeg@mail.gmail.com>

On Thu, Apr 14, 2016 at 8:32 PM, Victor Stinner
<victor.stinner at gmail.com> wrote:
>
> Would you like to work on a patch to implement that change?

I'll work on a patch. Should I post it to bugs.python.org?

> Since Python 3.6 may get a new bytecode format  (wordcode, see the
> other thread on this mailing list), I think that it's ok to change
> MAKE_FUNCTION in the same release.

Wordcode looks like a pure win from (projected) 25% bytecode size
reduction alone.

From random832 at fastmail.com  Thu Apr 14 12:18:18 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 14 Apr 2016 12:18:18 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <22287.49125.859016.872121@turnbull.sk.tsukuba.ac.jp>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
 <CADiSq7fEzV7Zm5J_KHoU_+-mgr1jtSBt3_VP78cL9KgMd_6N7g@mail.gmail.com>
 <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com>
 <CAPTjJmofbDW5ptfFb11+PVLvjrnUXY+sbK0bu2-2B7H=ZXdMew@mail.gmail.com>
 <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com>
 <22287.49125.859016.872121@turnbull.sk.tsukuba.ac.jp>
Message-ID: <1460650698.48886.578872145.4926E180@webmail.messagingengine.com>

On Thu, Apr 14, 2016, at 12:05, Stephen J. Turnbull wrote:
> Random832 writes:
> 
>  > And what such incompatibilities exist between bytes and str for the
>  > purpose of representing file paths?
> 
> A plethora of encodings.

Only one encoding, fsencode/fsdecode. All other encodings are not for
filenames.

>  > At the end of the day, there's exactly one answer to "what file on
>  > disk this represents (or would represent if it existed)".
> 
> Nope.  Suppose those bytes were read from a file or a socket?  It's
> dangerous to assume that encoding matches the file system's.

Why can I pass them to os.open, then, or to os.path.join so long as
everything else is also bytes?

On UNIX, the filesystem is in bytes, so saying that bytes can't match
the filesystem is absurd. Converting it to str with fsdecode will
*always, absolutely, 100% of the time* give a str that will address the
same file that the bytes does (even if it's "dangerous" to assume that
was the name the user wanted, that's beyond the scope of what the module
is capable of dealing with).
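
A minimal sketch of the round trip described above, assuming a POSIX
system where the filesystem encoding uses the "surrogateescape" error
handler (the file name is made up for illustration, and nothing is
created on disk):

```python
import os

# Hypothetical file name containing bytes that are not valid UTF-8.
raw_name = b"report-\xff\xfe.txt"

# On POSIX, fsdecode() maps undecodable bytes to lone surrogates
# instead of raising, so every bytes name has a str counterpart.
text_name = os.fsdecode(raw_name)

# fsencode() reverses that mapping and restores the original bytes.
round_tripped = os.fsencode(text_name)

print(repr(text_name))
print(round_tripped == raw_name)
```

This only shows that the conversion is lossless; it says nothing about
whether the bytes were the name the user intended.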

From brett at python.org  Thu Apr 14 12:28:36 2016
From: brett at python.org (Brett Cannon)
Date: Thu, 14 Apr 2016 16:28:36 +0000
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
Message-ID: <CAP1=2W7uMBnmCdYYcx8ySdg5-vMAp7f3opcNs_yOYmDr5uq=FQ@mail.gmail.com>

+1 from me!

A couple of grammar/typo suggestions below.

On Thu, 14 Apr 2016 at 08:20 Victor Stinner <victor.stinner at gmail.com>
wrote:

> Hi,
>
> I updated my PEP 509 to make the dictionary version globally unique.
> With *two* use cases of this PEP (Yury's method call patch and my FAT
> Python project), I think that the PEP is now ready to be accepted.
>
> Globally unique identifier is a requirement for Yury's patch
> optimizing method calls ( https://bugs.python.org/issue26110 ). It
> allows checking for free whether the dictionary was replaced.
>
> I also renamed the ma_version field to ma_version_tag.
>
> HTML version:
> https://www.python.org/dev/peps/pep-0509/
>
> Victor
>
>
> PEP: 509
> Title: Add a private version to dict
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner <victor.stinner at gmail.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 4-January-2016
> Python-Version: 3.6
>
>
> Abstract
> ========
>
> Add a new private version to the builtin ``dict`` type, incremented at
> each dictionary creation and at each dictionary change, to implement
> fast guards on namespaces.
>
>
> Rationale
> =========
>
> In Python, the builtin ``dict`` type is used by many instructions. For
> example, the ``LOAD_GLOBAL`` instruction searches for a variable in the
> global namespace, or in the builtins namespace (two dict lookups).
> Python uses ``dict`` for the builtins namespace, globals namespace, type
> namespaces, instance namespaces, etc. The local namespace (namespace of
> a function) is usually optimized to an array, but it can be a dict too.
>
> Python is hard to optimize because almost everything is mutable: builtin
> functions, function code, global variables, local variables, ... can be
> modified at runtime. Implementing optimizations respecting the Python
> semantics requires detecting when "something changes": we will call
> these checks "guards".
>
> The speedup of optimizations depends on the speed of guard checks. This
> PEP proposes to add a version to dictionaries to implement fast guards
> on namespaces.
>
> Dictionary lookups can be skipped if the version does not change which
> is the common case for most namespaces. Since the version is globally
> unique, the version is also enough to check if the namespace dictionary
> was not replaced with a new dictionary. The performance of a guard does
> not depend on the number of watched dictionary entries, complexity of
> O(1), if the dictionary version does not change.
>
> Example of optimization: copy the value of a global variable to function
> constants.  This optimization requires a guard on the global variable to
> check if it was modified. If the variable is modified, the variable must
> be loaded at runtime when the function is called, instead of using the
> constant.
>
> See the `PEP 510 -- Specialized functions with guards
> <https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of
> guards to specialize functions and for the rationale on Python static
> optimizers.
>
>
> Guard example
> =============
>
> Pseudo-code of a fast guard to check if a dictionary entry was modified
> (created, updated or deleted) using a hypothetical
> ``dict_get_version(dict)`` function::
>
>     UNSET = object()
>
>     class GuardDictKey:
>         def __init__(self, dict, key):
>             self.dict = dict
>             self.key = key
>             self.value = dict.get(key, UNSET)
>             self.version = dict_get_version(dict)
>
>         def check(self):
>             """Return True if the dictionary entry did not changed
>             and the dictionary was not replaced."""
>

"did not change"


>
>             # read the version of the dict structure
>             version = dict_get_version(self.dict)
>             if version == self.version:
>                 # Fast-path: dictionary lookup avoided
>                 return True
>
>             # lookup in the dictionary
>             value = self.dict.get(self.key, UNSET)
>             if value is self.value:
>                 # another key was modified:
>                 # cache the new dictionary version
>                 self.version = version
>                 return True
>
>             # the key was modified
>             return False
>
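
For readers who want to run the guard, here is a rough pure-Python
simulation. It is only a sketch: VersionedDict, its version_tag
attribute and the module-level counter are hypothetical stand-ins for
the C-level field, and only item assignment and deletion are tracked.

```python
import itertools

# Hypothetical stand-in for the C-level global version counter.
_global_version = itertools.count(1)
UNSET = object()


class VersionedDict(dict):
    """Toy dict that bumps a version tag on creation and on item changes.

    Only __setitem__ and __delitem__ are tracked in this sketch.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version_tag = next(_global_version)

    def __setitem__(self, key, value):
        # Values are compared by identity, not by content.
        if self.get(key, UNSET) is not value:
            self.version_tag = next(_global_version)
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)  # raises KeyError before any bump
        self.version_tag = next(_global_version)


def dict_get_version(d):
    """Hypothetical accessor matching the name used in the pseudo-code."""
    return d.version_tag


class GuardDictKey:
    def __init__(self, d, key):
        self.dict = d
        self.key = key
        self.value = d.get(key, UNSET)
        self.version = dict_get_version(d)

    def check(self):
        version = dict_get_version(self.dict)
        if version == self.version:
            return True  # fast path: dictionary lookup avoided
        if self.dict.get(self.key, UNSET) is self.value:
            self.version = version  # another key was modified
            return True
        return False  # the watched key was modified


ns = VersionedDict(len=len)
guard = GuardDictKey(ns, "len")
unchanged = guard.check()            # nothing changed yet
ns["other"] = 1
other_key_changed = guard.check()    # unrelated key: guard still holds
ns["len"] = lambda obj: 0
watched_key_changed = guard.check()  # watched key replaced: guard fails
print(unchanged, other_key_changed, watched_key_changed)
```

The real proposal keeps the counter in C, so the fast path is a single
integer comparison rather than a Python attribute lookup.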
>
> Usage of the dict version
> =========================
>
> Speedup method calls 1.2x
> -------------------------
>
> Yury Selivanov wrote a `patch to optimize method calls
> <https://bugs.python.org/issue26110>`_. The patch depends on the
> `implement per-opcode cache in ceval
> <https://bugs.python.org/issue26219>`_ patch which requires dictionary
> versions to invalidate the cache if the globals dictionary or the
> builtins dictionary has been modified.
>
> The cache also requires that the dictionary version is globally unique.
> It is possible to define a function in a namespace and call it
> in a different namespace: using ``exec()`` with the *globals* parameter
> for example. In this case, the globals dictionary was changed and the
> cache must be invalidated.
>
>
> Specialized functions using guards
> ----------------------------------
>
> The `PEP 510 -- Specialized functions with guards
> <https://www.python.org/dev/peps/pep-0510/>`_ proposes an API to support
> specialized functions with guards. It allows implementing static
> optimizers for Python without breaking the Python semantics.
>
> Example of a static Python optimizer: the `fatoptimizer
> <http://fatoptimizer.readthedocs.org/>`_ of the `FAT Python
> <http://faster-cpython.readthedocs.org/fat_python.html>`_ project
> implements many optimizations which require guards on namespaces.
> Examples:
>
> * Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
>   ``builtins.__dict__['len']`` and ``globals()['len']`` are required
> * Loop unrolling: to unroll the loop ``for i in range(...): ...``,
>   guards on ``builtins.__dict__['range']`` and ``globals()['range']``
>   are required
>
>
> Pyjion
> ------
>
> According to Brett Cannon, one of the two main developers of Pyjion,
> Pyjion can also benefit from a dictionary version to implement
> optimizations.
>
> Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET
> Core runtime).
>
>
> Unladen Swallow
> ---------------
>
> Even if dictionary version was not explicitly mentioned, optimizing
> globals and builtins lookup was part of the Unladen Swallow plan:
> "Implement one of the several proposed schemes for speeding lookups of
> globals and builtins." Source: `Unladen Swallow ProjectPlan
> <https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_.
>
> Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
> implemented with LLVM. The project stopped in 2011: `Unladen Swallow
> Retrospective
> <http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_.
>
>
> Changes
> =======
>
> Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
> the C type ``PY_INT64_T``, 64-bit unsigned integer.


Don't you mean ``PY_UINT64_T``?


> Add also a global
> dictionary version. Each time a dictionary is created, the global
> version is incremented and the dictionary version is initialized to the
> global version. The global version is also incremented and copied to the
> dictionary version at each dictionary change:
>
> * ``clear()`` if the dict was non-empty
> * ``pop(key)`` if the key exists
> * ``popitem()`` if the dict is non-empty
> * ``setdefault(key, value)`` if the `key` does not exist
> * ``__delitem__(key)`` if the key exists
> * ``__setitem__(key, value)`` if the `key` doesn't exist or if the value
>   is not ``dict[key]``
> * ``update(...)`` if new values are different than existing values:
>   values are compared by identity, not by their content; the version can
>   be incremented multiple times
>
> The ``PyDictObject`` structure is not part of the stable ABI.
>
> The field is called ``ma_version_tag`` rather than ``ma_version`` to
> suggest comparing it using ``version_tag == old_version_tag`` rather
> than ``version <= old_version``, which makes bugs on integer overflow
> much more likely.
>
> Example using a hypothetical ``dict_get_version(dict)`` function::
>
>     >>> d = {}
>     >>> dict_get_version(d)
>     100
>     >>> d['key'] = 'value'
>     >>> dict_get_version(d)
>     101
>     >>> d['key'] = 'new value'
>     >>> dict_get_version(d)
>     102
>     >>> del d['key']
>     >>> dict_get_version(d)
>     103
>
> The version is not incremented if an existing key is set to the same
> value. For efficiency, values are compared by their identity:
> ``new_value is old_value``, not by their content:
> ``new_value == old_value``. Example::
>
>     >>> d = {}
>     >>> value = object()
>     >>> d['key'] = value
>     >>> dict_get_version(d)
>     40
>     >>> d['key'] = value
>     >>> dict_get_version(d)
>     40
>
> .. note::
>    CPython uses some singletons, such as integers in the range [-5; 256],
>    empty tuple, empty strings, Unicode strings of a single character in
>    the range [U+0000; U+00FF], etc. When a key is set twice to the same
>    singleton, the version is not modified.
>
>
> Implementation and Performance
> ==============================
>
> The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
> <https://bugs.python.org/issue26058>`_ contains a patch implementing
> this PEP.
>
> On pybench and timeit microbenchmarks, the patch does not seem to add
> any overhead on dictionary operations.
>
> When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
> a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover,
> a guard can watch for multiple keys. For example, for an optimization
> using 10 global variables in a function, 10 dictionary lookups cost 148
> ns, whereas the guard still only costs 3.8 ns when the version does not
> change (39x as fast).
>
> The `fat module
> <http://fatoptimizer.readthedocs.org/en/latest/fat.html>`_ implements
> such guards: ``fat.GuardDict`` is based on the dictionary version.
>
>
> Integer overflow
> ================
>
> The implementation uses the C type ``PY_UINT64_T`` to store the version:
> a 64-bit unsigned integer. The C code uses ``version++``. On integer
> overflow, the version wraps to ``0`` (and then continues to be
> incremented) according to the C standard.
>
> After an integer overflow, a guard can succeed whereas the watched
> dictionary key was modified. The bug only occurs at a guard check if
> there are exactly ``2 ** 64`` dictionary creations or modifications
> since the previous guard check.
>
> If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
> take longer than 584 years. Using a 32-bit version, it only takes 4
> seconds. That's why a 64-bit unsigned type is also used on 32-bit
> systems. A dictionary lookup at the C level takes 14.8 ns.
>
> A risk of a bug every 584 years is acceptable.
>
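
The figures in the overflow section can be checked with a few lines of
arithmetic. This is only a sketch: the one-change-per-nanosecond rate
comes from the text, while the 365-day year and the helper name are
assumptions of mine.

```python
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: 365-day year


def wrap_time_seconds(bits, ns_per_change=1):
    """Seconds until a counter of the given width wraps around."""
    return (2**bits * ns_per_change) / NS_PER_SECOND


years_64 = wrap_time_seconds(64) / SECONDS_PER_YEAR
seconds_32 = wrap_time_seconds(32)

# Unsigned wrap-around: the maximum value plus one becomes zero.
MASK_64 = 2**64 - 1
wrapped = (MASK_64 + 1) & MASK_64

print(f"64-bit counter wraps after about {years_64:.1f} years")
print(f"32-bit counter wraps after about {seconds_32:.2f} seconds")
print(f"max + 1 wraps to {wrapped}")
```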
>
> Alternatives
> ============
>
> Expose the version at Python level as a read-only __version__ property
> ----------------------------------------------------------------------
>
> The first version of the PEP proposed to expose the dictionary version
> as a read-only ``__version__`` property at Python level, and also to add
> the property to ``collections.UserDict`` (since this type must mimic
> the ``dict`` API).
>
> There are multiple issues:
>
> * To be consistent and avoid bad surprises, the version must be added to
>   all mapping types. Implementing a new mapping type would require extra
>   work for no benefit, since the version is only required on the
>   ``dict`` type in practice.
> * All Python implementations must implement this new property, which
>   gives more work to other implementations, whereas they may not use the
>   dictionary version at all.
> * Exposing the dictionary version at the Python level can lead to false
>   assumptions about performance. Checking ``dict.__version__`` at
>   the Python level is not faster than a dictionary lookup. A dictionary
>   lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5
>   ns, the difference is only 1.2 ns (3%)::
>
>
>     $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
>     10000000 loops, best of 3: 0.0487 usec per loop
>     $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
>     10000000 loops, best of 3: 0.0475 usec per loop
>
> * The ``__version__`` can be wrapped on integer overflow. It is error
>   prone: using ``dict.__version__ <= guard_version`` is wrong,
>   ``dict.__version__ == guard_version`` must be used instead to reduce
>   the risk of a bug on integer overflow (even if the integer overflow is
>   unlikely in practice).
>
> Mandatory bikeshedding on the property name:
>
> * ``__cache_token__``: name proposed by Nick Coghlan, name coming from
>   `abc.get_cache_token()
>   <https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_.
> * ``__version__``
> * ``__timestamp__``
>
>
> Add a version to each dict entry
> --------------------------------
>
> A single version per dictionary requires keeping a strong reference to
> the value, which can keep the value alive longer than expected. If we
> also add a version per dictionary entry, the guard can store only the
> entry version and avoid the strong reference to the value (only strong
> references to the dictionary and to the key are needed).
>
> Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure,
> the field has the C type ``PY_INT64_T``. When a key is created or
> modified, the entry version is set to the dictionary version which is
> incremented at any change (create, modify, delete).
>
> Pseudo-code of a fast guard to check if a dictionary key was modified
> using hypothetical ``dict_get_version(dict)`` and
> ``dict_get_entry_version(dict, key)`` functions::
>
>     UNSET = object()
>
>     class GuardDictKey:
>         def __init__(self, dict, key):
>             self.dict = dict
>             self.key = key
>             self.dict_version = dict_get_version(dict)
>             self.entry_version = dict_get_entry_version(dict, key)
>
>         def check(self):
>             """Return True if the dictionary entry did not change
>             and the dictionary was not replaced."""
>
>             # read the version of the dict structure
>             dict_version = dict_get_version(self.dict)
>             if dict_version == self.dict_version:
>                 # Fast-path: dictionary lookup avoided
>                 return True
>
>             # lookup in the dictionary
>             entry_version = dict_get_entry_version(self.dict, self.key)
>             if entry_version == self.entry_version:
>                 # another key was modified:
>                 # cache the new dictionary version
>                 self.dict_version = dict_version
>                 return True
>
>             # the key was modified
>             return False
>
> The main drawback of this option is the impact on the memory footprint.
> It increases the size of each dictionary entry, so the overhead depends
> on the number of buckets (dictionary entries, used or unused yet). For
> example, it increases the size of each dictionary entry by 8 bytes on
> a 64-bit system.
>
> In Python, the memory footprint matters and the trend is to reduce it.
> Examples:
>
> * `PEP 393 -- Flexible String Representation
>   <https://www.python.org/dev/peps/pep-0393/>`_
> * `PEP 412 -- Key-Sharing Dictionary
>   <https://www.python.org/dev/peps/pep-0412/>`_
>
>
> Add a new dict subtype
> ----------------------
>
> Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
> use the ``verdict`` for namespaces (module namespace, type namespace,
> instance namespace, etc.) instead of ``dict``.
>
> Leave the ``dict`` type unchanged to not add any overhead (memory
> footprint) when guards are not needed.
>
> Technical issue: a lot of C code in the wild, including CPython core,
> expects the exact ``dict`` type. Issues:
>
> * ``exec()`` requires a ``dict`` for globals and locals. A lot of code
>   uses ``globals={}``. It is not possible to cast the ``dict`` to a
>   ``dict`` subtype because the caller expects the ``globals`` parameter
>   to be modified (``dict`` is mutable).
> * Functions directly call ``PyDict_xxx()`` functions, instead of calling
>   ``PyObject_xxx()`` if the object is a ``dict`` subtype
> * ``PyDict_CheckExact()`` check fails on ``dict`` subtypes, whereas some
>   functions require the exact ``dict`` type.
> * ``Python/ceval.c`` does not completely support dict subtypes for
>   namespaces
>
>
> The ``exec()`` issue is a blocker issue.
>
> Other issues:
>
> * The garbage collector has special code to "untrack" ``dict``
>   instances. If a ``dict`` subtype is used for namespaces, the garbage
>   collector can be unable to break some reference cycles.
> * Some functions have a fast-path for ``dict`` which would not be taken
>   for ``dict`` subtypes, and so it would make Python a little bit
>   slower.
>
>
> Prior Art
> =========
>
> Method cache and type version tag
> ---------------------------------
>
> In 2007, Armin Rigo wrote a patch to implement a cache of methods. It
> was merged into Python 2.6.  The patch adds a "type attribute cache
> version tag" (``tp_version_tag``) and a "valid version tag" flag to
> types (the ``PyTypeObject`` structure).
>
> The type version tag is not available at the Python level.
>
> The version tag has the C type ``unsigned int``. The cache is a global
> hash table of 4096 entries, shared by all types. The cache is global to
> "make it fast, have a deterministic and low memory footprint, and be
> easy to invalidate". Each cache entry has a version tag. A global
> version tag is used to create the next version tag, it also has the C
> type ``unsigned int``.
>
> By default, a type has its "valid version tag" flag cleared to indicate
> that the version tag is invalid. When the first method of the type is
> cached, the version tag and the "valid version tag" flag are set. When a
> type is modified, the "valid version tag" flag of the type and its
> subclasses is cleared. Later, when a cache entry of these types is used,
> the entry is removed because its version tag is outdated.
>
> On integer overflow, the whole cache is cleared and the global version
> tag is reset to ``0``.
>
> See `Method cache (issue #1685986)
> <https://bugs.python.org/issue1685986>`_ and `Armin's method cache
> optimization updated for Python 2.6 (issue #1700288)
> <https://bugs.python.org/issue1700288>`_.
>
>
> Globals / builtins cache
> ------------------------
>
> In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue
> #10401) <http://bugs.python.org/issue10401>`_ which adds a private
> ``ma_version`` field to the ``PyDictObject`` structure (``dict`` type),
> the field has the C type ``Py_ssize_t``.
>
> The patch adds a "global and builtin cache" to functions and frames, and
> changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the
> cache.
>
> The change on the ``PyDictObject`` structure is very similar to this
> PEP.
>
>
> Cached globals+builtins lookup
> ------------------------------
>
> In 2006, Andrea Griffini proposed a patch implementing a `Cached
> globals+builtins lookup optimization
> <https://bugs.python.org/issue1616125>`_.  The patch adds a private
> ``timestamp`` field to the ``PyDictObject`` structure (``dict`` type),
> the field has the C type ``size_t``.
>
> Thread on python-dev: `About dictionary lookup caching
> <https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.
>
>
> Guard against changing dict during iteration
> --------------------------------------------
>
> In 2013, Serhiy Storchaka proposed `Guard against changing dict during
> iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which
> adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict``
> type), the field has the C type ``size_t``.  This field is incremented
> when the dictionary is modified, and so is very similar to the proposed
> dictionary version.
>
> Sadly, the dictionary version proposed in this PEP doesn't help to
> detect dictionary mutation. The dictionary version changes when values
> are replaced, whereas modifying dictionary values while iterating on
> dictionary keys is legit in Python.
>
>
> PySizer
> -------
>
> `PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python,
> Google Summer of Code 2005 project by Nick Smallbone.
>
> This project has a patch for CPython 2.4 which adds ``key_time`` and
> ``value_time`` fields to dictionary entries. It uses a global
> process-wide counter for dictionaries, incremented each time that a
> dictionary is modified. The times are used to decide when child objects
> first appeared in their parent objects.
>
>
> Discussion
> ==========
>
> Thread on the mailing lists:
>
> * python-dev: `PEP 509: Add a private version to dict
>   <https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
>   (January 2016)
> * python-ideas: `RFC: PEP: Add dict.__version__
>   <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_
>   (January 2016)
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.

From brett at python.org  Thu Apr 14 12:30:37 2016
From: brett at python.org (Brett Cannon)
Date: Thu, 14 Apr 2016 16:30:37 +0000
Subject: [Python-Dev] MAKE_FUNCTION simplification
In-Reply-To: <CANawmydUcqNn5NsegNdPns_8yYF1E7XA5RK_vj4jKXS9KT6jeg@mail.gmail.com>
References: <CANawmydUcqNn5NsegNdPns_8yYF1E7XA5RK_vj4jKXS9KT6jeg@mail.gmail.com>
Message-ID: <CAP1=2W4nALdGPCfLOv4ehEpj_4Ct2YwpGT207xcfGsfbtR-4tw@mail.gmail.com>

On Thu, 14 Apr 2016 at 09:16 Nikita Nemkin <nikita at nemkin.ru> wrote:

> On Thu, Apr 14, 2016 at 8:32 PM, Victor Stinner
> <victor.stinner at gmail.com> wrote:
> >
> > Would you like to work on a patch to implement that change?
>
> I'll work on a patch. Should I post it to bugs.python.org?
>

Yep.


>
> > Since Python 3.6 may get a new bytecode format  (wordcode, see the
> > other thread on this mailing list), I think that it's ok to change
> > MAKE_FUNCTION in the same release.
>
> Wordcode looks like a pure win from (projected) 25% bytecode size
> reduction alone.
>

CPU performance is more the worry here (which looks mostly unaffected,
maybe even faster), but smaller .pyc files are a nice perk. :)

From cybersol at yahoo.com  Thu Apr 14 12:30:51 2016
From: cybersol at yahoo.com (Michael Mysinger)
Date: Thu, 14 Apr 2016 16:30:51 +0000 (UTC)
Subject: [Python-Dev] pathlib - current status of discussions
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <loom.20160414T083232-765@post.gmane.org> <570FAD78.60505@stoneleaf.us>
 <loom.20160414T172609-785@post.gmane.org>
 <1B962989-D6E6-4557-BDAD-3087F1E733E6@stufft.io>
Message-ID: <loom.20160414T181958-889@post.gmane.org>

Donald Stufft <donald <at> stufft.io> writes:

> > On Apr 14, 2016, at 11:59 AM, Michael Mysinger via Python-Dev <python-dev <at> python.org> wrote:
> >
> > In essence, you will force me to pre-wrap all RichPath objects in
> > either os.fsencode(os.fspath(path)) or os.fsdecode(os.fspath(path)),
> > just so I can reason about the type.
>
> This is only the case if you have a singular RichPath object that can
> represent both bytes and str (which is what DirEntry does, which I agree
> makes it harder, but that's already the case with DirEntry.path).
> However that's not the case if you have a bRichPath and uRichPath.

And you might even be able to retain your sanity if you enforce any 
particular class to be either bRichPath or uRichPath. But if you do that, 
then that still leaves DirEntry out in the cold, likely converting to str in 
its __fspath__. Which leaves me in the camp that bRichPath falls under YAGNI, 
and RichPath should be str only.
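
To make the typing concern concrete, here is a small sketch. It assumes
the __fspath__ protocol in the form it eventually took in Python 3.6
(os.fspath() and os.path.join() accepting path-like objects); RichPath
below is a hypothetical class, not an existing type.

```python
import os


class RichPath:
    """Hypothetical path object; __fspath__ returns what it was given."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path


# A str-only RichPath mixes freely with ordinary str components.
joined = os.path.join(RichPath("/tmp"), "str_path")
print(joined)

# A bytes-returning RichPath cannot be mixed with str components.
try:
    os.path.join(RichPath(b"/tmp"), "str_path")
    mixing_failed = False
except TypeError as exc:
    mixing_failed = True
    print("TypeError:", exc)
```

With a str-only RichPath the second case never arises, which is the
simplification argued for above.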


From ethan at stoneleaf.us  Thu Apr 14 12:39:06 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 Apr 2016 09:39:06 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAMpsgwY8=241ZA4jJpUUoXe=DOYcADmYTOu3Vn59UcnELhyGRQ@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
 <570FAF2F.6080304@stoneleaf.us>
 <CAMpsgwY8=241ZA4jJpUUoXe=DOYcADmYTOu3Vn59UcnELhyGRQ@mail.gmail.com>
Message-ID: <570FC7AA.7050805@stoneleaf.us>

On 04/14/2016 09:09 AM, Victor Stinner wrote:
> 2016-04-14 16:54 GMT+02:00 Ethan Furman:

>>> I consider that the final goal of the whole discussion is to support
>>> something like:
>>>
>>>       path = os.path.join(pathlib_path, "str_path", direntry)
>>>
>>> (...)
>>> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
>>> just to make my life easier.
>>
>> This would be where we strongly disagree.
>
> FYI it's ok that we disagree on this point, at least I expressed my opinion ;-)

Absolutely.  I appreciate you explaining your point of view.

> At least, we now identified better a point of disagreement.

Agreed.  :)

~Ethan~

From victor.stinner at gmail.com  Thu Apr 14 12:44:05 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 18:44:05 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAP1=2W7uMBnmCdYYcx8ySdg5-vMAp7f3opcNs_yOYmDr5uq=FQ@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAP1=2W7uMBnmCdYYcx8ySdg5-vMAp7f3opcNs_yOYmDr5uq=FQ@mail.gmail.com>
Message-ID: <CAMpsgwap5T=dbPmJ=H3znZWDOF17L6wzhDSx4GW2LJ6Pq4tuLg@mail.gmail.com>

2016-04-14 18:28 GMT+02:00 Brett Cannon <brett at python.org>:
> +1 from me!

Thanks.

> A couple of grammar/typo suggestions below.

Fixed. (Yes, I want to use unsigned type, so PY_UINT64_T.)

Victor

From ethan at stoneleaf.us  Thu Apr 14 12:46:13 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 Apr 2016 09:46:13 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <loom.20160414T172609-785@post.gmane.org>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <loom.20160414T083232-765@post.gmane.org> <570FAD78.60505@stoneleaf.us>
 <loom.20160414T172609-785@post.gmane.org>
Message-ID: <570FC955.3080908@stoneleaf.us>

On 04/14/2016 08:59 AM, Michael Mysinger via Python-Dev wrote:

> I am saying that if os.path.join now accepts RichPath objects, and those
> objects can return either str or bytes, then it's much harder to reason about
> when I have all bytes or all strings. In essence, you will force me to pre-
> wrap all RichPath objects in either os.fsencode(os.fspath(path)) or
> os.fsdecode(os.fspath(path)), just so I can reason about the type. And if I
> have to always do that wrapping then os.path.join doesn't need to accept
> RichPath objects and call fspath at all.

What many folks seem to be missing is that *you* (generic you) have 
control of your data.

If you are not working at the bytes layer, you shouldn't be getting 
bytes objects because:

- you specified str when asking for data from the OS, or
- you transformed the incoming bytes from whatever external source
   to str when you received them.
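
A minimal sketch of those two options, using only calls that exist in the os module today (the helper name as_text_path is made up for illustration):

```python
import os


def as_text_path(raw):
    """Return a str path; bytes are decoded with the filesystem encoding."""
    # os.fsdecode() returns str unchanged and decodes bytes
    return os.fsdecode(raw)


# Option 1: ask the OS with a str argument and get str names back
names_as_str = os.listdir(".")

# Option 2: bytes arrived from somewhere (here: a bytes query),
# so convert them once, at the boundary
names_as_bytes = os.listdir(b".")
normalized = [as_text_path(name) for name in names_as_bytes]
```

After that boundary, everything downstream only ever sees str.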

--
~Ethan~

From stefan_ml at behnel.de  Thu Apr 14 12:48:45 2016
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 14 Apr 2016 18:48:45 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
Message-ID: <neohld$sts$1@ger.gmane.org>

+1 from me, too. I'm sure we can make some use of this in Cython.

Stefan


Victor Stinner schrieb am 14.04.2016 um 17:19:
> PEP: 509
> Title: Add a private version to dict



From tjreedy at udel.edu  Thu Apr 14 12:56:54 2016
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 14 Apr 2016 12:56:54 -0400
Subject: [Python-Dev] MAKE_FUNCTION simplification
In-Reply-To: <CANawmyeN11WsfxEBNPuUo=8YPDqDZBdasUr5mkhOJEr_q26+GQ@mail.gmail.com>
References: <CANawmyeN11WsfxEBNPuUo=8YPDqDZBdasUr5mkhOJEr_q26+GQ@mail.gmail.com>
Message-ID: <neoi5m$6n1$1@ger.gmane.org>

On 4/14/2016 12:03 PM, Nikita Nemkin wrote:

> I think that Python should make bytecode explicitly unstable and subject
> to change with any major release.

https://docs.python.org/3/library/dis.html#module-dis
CPython implementation detail: Bytecode is an implementation detail of 
the CPython interpreter. No guarantees are made that bytecode will not 
be added, removed, or changed between versions of Python.

Version = minor release, as opposed to maintenance release.

-- 
Terry Jan Reedy


From guido at python.org  Thu Apr 14 12:59:50 2016
From: guido at python.org (Guido van Rossum)
Date: Thu, 14 Apr 2016 09:59:50 -0700
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
Message-ID: <CAP7+vJ+ZjahgH=pq6XeXzWX8mZ=JTpZz6XM8fMprpPy8erTaEw@mail.gmail.com>

I'll wait a day before formally pronouncing to see if any objections
are made, but it looks good to me.

On Thu, Apr 14, 2016 at 8:19 AM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> Hi,
>
> I updated my PEP 509 to make the dictionary version globally unique.
> With *two* use cases of this PEP (Yury's method call patch and my FAT
> Python project), I think that the PEP is now ready to be accepted.
>
> A globally unique identifier is a requirement for Yury's patch
> optimizing method calls ( https://bugs.python.org/issue26110 ). It
> allows checking for free whether the dictionary was replaced.
>
> I also renamed the ma_version field to ma_version_tag.
>
> HTML version:
> https://www.python.org/dev/peps/pep-0509/
>
> Victor
>
>
> PEP: 509
> Title: Add a private version to dict
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner <victor.stinner at gmail.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 4-January-2016
> Python-Version: 3.6
>
>
> Abstract
> ========
>
> Add a new private version to the builtin ``dict`` type, incremented at
> each dictionary creation and at each dictionary change, to implement
> fast guards on namespaces.
>
>
> Rationale
> =========
>
> In Python, the builtin ``dict`` type is used by many instructions. For
> example, the ``LOAD_GLOBAL`` instruction searches for a variable in the
> global namespace, or in the builtins namespace (two dict lookups).
> Python uses ``dict`` for the builtins namespace, globals namespace, type
> namespaces, instance namespaces, etc. The local namespace (namespace of
> a function) is usually optimized to an array, but it can be a dict too.
>
> Python is hard to optimize because almost everything is mutable: builtin
> functions, function code, global variables, local variables, ... can be
> modified at runtime. Implementing optimizations that respect the Python
> semantics requires detecting when "something changes": we will call
> these checks "guards".
>
> The speedup of optimizations depends on the speed of guard checks. This
> PEP proposes to add a version to dictionaries to implement fast guards
> on namespaces.
>
> Dictionary lookups can be skipped if the version does not change which
> is the common case for most namespaces. Since the version is globally
> unique, the version is also enough to check if the namespace dictionary
> was not replaced with a new dictionary. The performance of a guard does
> not depend on the number of watched dictionary entries, complexity of
> O(1), if the dictionary version does not change.
>
> Example of optimization: copy the value of a global variable to function
> constants.  This optimization requires a guard on the global variable to
> check if it was modified. If the variable is modified, the variable must
> be loaded at runtime when the function is called, instead of using the
> constant.
>
> See the `PEP 510 -- Specialized functions with guards
> <https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of
> guards to specialize functions and for the rationale on Python static
> optimizers.
>
>
> Guard example
> =============
>
> Pseudo-code of a fast guard to check if a dictionary entry was modified
> (created, updated or deleted) using a hypothetical
> ``dict_get_version(dict)`` function::
>
>     UNSET = object()
>
>     class GuardDictKey:
>         def __init__(self, dict, key):
>             self.dict = dict
>             self.key = key
>             self.value = dict.get(key, UNSET)
>             self.version = dict_get_version(dict)
>
>         def check(self):
>             """Return True if the dictionary entry did not change
>             and the dictionary was not replaced."""
>
>             # read the version of the dict structure
>             version = dict_get_version(self.dict)
>             if version == self.version:
>                 # Fast-path: dictionary lookup avoided
>                 return True
>
>             # lookup in the dictionary
>             value = self.dict.get(self.key, UNSET)
>             if value is self.value:
>                 # another key was modified:
>                 # cache the new dictionary version
>                 self.version = version
>                 return True
>
>             # the key was modified
>             return False
>
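To try the guard pseudo-code above without a patched interpreter, here is a rough pure-Python sketch. It is only an emulation under stated assumptions: VersionedDict and its version_tag attribute are invented stand-ins for the proposed C-level field, only __setitem__ and __delitem__ bump the version (the other mutating methods are left out for brevity), and dict_get_version() stays hypothetical.

```python
import itertools

# stand-in for the proposed global version counter (assumption)
_global_version = itertools.count(1)


class VersionedDict(dict):
    """Hypothetical dict carrying a version tag, emulated in Python."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version_tag = next(_global_version)

    def __setitem__(self, key, value):
        # bump only if the key is new or the value is a different object
        if key not in self or self[key] is not value:
            self.version_tag = next(_global_version)
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)  # raises KeyError before any bump
        self.version_tag = next(_global_version)


def dict_get_version(d):
    """Hypothetical accessor; here it just reads the emulated tag."""
    return d.version_tag


UNSET = object()


class GuardDictKey:
    def __init__(self, d, key):
        self.dict = d
        self.key = key
        self.value = d.get(key, UNSET)
        self.version = dict_get_version(d)

    def check(self):
        version = dict_get_version(self.dict)
        if version == self.version:
            return True  # fast path: no lookup
        if self.dict.get(self.key, UNSET) is self.value:
            self.version = version  # another key changed
            return True
        return False  # the watched key changed


namespace = VersionedDict(len=len)
guard = GuardDictKey(namespace, "len")
unchanged = guard.check()            # nothing modified yet
namespace["other"] = 1
still_valid = guard.check()          # another key was modified
namespace["len"] = lambda obj: 0
invalidated = guard.check()          # the watched key was replaced
```

The point of the exercise is the first branch of check(): as long as the tag is unchanged, the guard never touches the dictionary.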
>
> Usage of the dict version
> =========================
>
> Speedup method calls 1.2x
> -------------------------
>
> Yury Selivanov wrote a `patch to optimize method calls
> <https://bugs.python.org/issue26110>`_. The patch depends on the
> `implement per-opcode cache in ceval
> <https://bugs.python.org/issue26219>`_ patch which requires dictionary
> versions to invalidate the cache if the globals dictionary or the
> builtins dictionary has been modified.
>
> The cache also requires that the dictionary version is globally unique.
> It is possible to define a function in a namespace and call it
> in a different namespace: using ``exec()`` with the *globals* parameter
> for example. In this case, the globals dictionary was changed and the
> cache must be invalidated.
>
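A small sketch of the ``exec()`` case described above, assuming a cache attached to the code object: the same code object ends up running against two different globals dictionaries, so a cache keyed on the code alone has to notice that the dictionary is a different one. The names get_answer and answer are made up for the example.

```python
source = "def get_answer():\n    return answer\n"
code = compile(source, "<sketch>", "exec")

# run the same compiled code in two separate namespaces
first_globals = {"answer": 1}
second_globals = {"answer": 2}
exec(code, first_globals)
exec(code, second_globals)

first_result = first_globals["get_answer"]()
second_result = second_globals["get_answer"]()

# both functions share one code object but have different globals
shared_code = (first_globals["get_answer"].__code__
               is second_globals["get_answer"].__code__)
```

With a globally unique version, two distinct dictionaries can never carry the same tag, which is what makes the "was the dict replaced?" check free.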
>
> Specialized functions using guards
> ----------------------------------
>
> The `PEP 510 -- Specialized functions with guards
> <https://www.python.org/dev/peps/pep-0510/>`_ proposes an API to support
> specialized functions with guards. It allows implementing static
> optimizers for Python without breaking the Python semantics.
>
> Example of a static Python optimizer: the `fatoptimizer
> <http://fatoptimizer.readthedocs.org/>`_ of the `FAT Python
> <http://faster-cpython.readthedocs.org/fat_python.html>`_ project
> implements many optimizations which require guards on namespaces.
> Examples:
>
> * Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
>   ``builtins.__dict__['len']`` and ``globals()['len']`` are required
> * Loop unrolling: to unroll the loop ``for i in range(...): ...``,
>   guards on ``builtins.__dict__['range']`` and ``globals()['range']``
>   are required
>
>
> Pyjion
> ------
>
> According to Brett Cannon, one of the two main developers of Pyjion,
> Pyjion can also benefit from the dictionary version to implement
> optimizations.
>
> Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET
> Core runtime).
>
>
> Unladen Swallow
> ---------------
>
> Even if dictionary version was not explicitly mentioned, optimizing
> globals and builtins lookup was part of the Unladen Swallow plan:
> "Implement one of the several proposed schemes for speeding lookups of
> globals and builtins." Source: `Unladen Swallow ProjectPlan
> <https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_.
>
> Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
> implemented with LLVM. The project stopped in 2011: `Unladen Swallow
> Retrospective
> <http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_.
>
>
> Changes
> =======
>
> Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
> the C type ``PY_UINT64_T``, a 64-bit unsigned integer. Also add a global
> dictionary version. Each time a dictionary is created, the global
> version is incremented and the dictionary version is initialized to the
> global version. The global version is also incremented and copied to the
> dictionary version at each dictionary change:
>
> * ``clear()`` if the dict was non-empty
> * ``pop(key)`` if the key exists
> * ``popitem()`` if the dict is non-empty
> * ``setdefault(key, value)`` if the `key` does not exist
> * ``__delitem__(key)`` if the key exists
> * ``__setitem__(key, value)`` if the `key` doesn't exist or if the value
>   is not ``dict[key]``
> * ``update(...)`` if new values are different than existing values:
>   values are compared by identity, not by their content; the version can
>   be incremented multiple times
>
> The ``PyDictObject`` structure is not part of the stable ABI.
>
> The field is called ``ma_version_tag`` rather than ``ma_version`` to
> suggest comparing it using ``version_tag == old_version_tag`` rather
> than ``version <= old_version``, which gives wrong results after an
> integer overflow.
>
> Example using an hypothetical ``dict_get_version(dict)`` function::
>
>     >>> d = {}
>     >>> dict_get_version(d)
>     100
>     >>> d['key'] = 'value'
>     >>> dict_get_version(d)
>     101
>     >>> d['key'] = 'new value'
>     >>> dict_get_version(d)
>     102
>     >>> del d['key']
>     >>> dict_get_version(d)
>     103
>
> The version is not incremented if an existing key is set to the same
> value. For efficiency, values are compared by their identity:
> ``new_value is old_value``, not by their content:
> ``new_value == old_value``. Example::
>
>     >>> d = {}
>     >>> value = object()
>     >>> d['key'] = value
>     >>> dict_get_version(d)
>     40
>     >>> d['key'] = value
>     >>> dict_get_version(d)
>     40
>
> .. note::
>    CPython uses some singletons, such as integers in the range [-5; 256],
>    empty tuple, empty strings, Unicode strings of a single character in
>    the range [U+0000; U+00FF], etc. When a key is set twice to the same
>    singleton, the version is not modified.
>
>
> Implementation and Performance
> ==============================
>
> The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
> <https://bugs.python.org/issue26058>`_ contains a patch implementing
> this PEP.
>
> On pybench and timeit microbenchmarks, the patch does not seem to add
> any overhead on dictionary operations.
>
> When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
> a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover,
> a guard can watch for multiple keys. For example, for an optimization
> using 10 global variables in a function, 10 dictionary lookups cost 148
> ns, whereas the guard still only costs 3.8 ns when the version does not
> change (39x as fast).
>
> The `fat module
> <http://fatoptimizer.readthedocs.org/en/latest/fat.html>`_ implements
> such guards: ``fat.GuardDict`` is based on the dictionary version.
>
>
> Integer overflow
> ================
>
> The implementation uses the C type ``PY_UINT64_T`` to store the version:
> a 64-bit unsigned integer. The C code uses ``version++``. On integer
> overflow, the version wraps to ``0`` (and then continues to be
> incremented) according to the C standard.
>
> After an integer overflow, a guard can succeed even though the watched
> dictionary key was modified. The bug only occurs at a guard check if
> there are exactly ``2 ** 64`` dictionary creations or modifications
> since the previous guard check.
>
> If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
> take longer than 584 years. Using a 32-bit version, it only takes 4
> seconds. That's why a 64-bit unsigned type is also used on 32-bit
> systems. A dictionary lookup at the C level takes 14.8 ns.
>
> A risk of a bug every 584 years is acceptable.
>
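The 584 years and 4 seconds figures above can be checked with a few lines of arithmetic. This is just a sanity-check sketch; the helper name and the choice of a 365.25-day year are assumptions of the example.

```python
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year (assumption)


def wrap_time_seconds(bits, ns_per_change=1):
    """Seconds until a counter of `bits` bits wraps around,
    at one change every `ns_per_change` nanoseconds."""
    return (2 ** bits) * ns_per_change / NS_PER_SECOND


# 64-bit counter, one modification per nanosecond
years_64 = wrap_time_seconds(64) / SECONDS_PER_YEAR

# 32-bit counter at the same rate
seconds_32 = wrap_time_seconds(32)
```

years_64 comes out a little above 584 and seconds_32 a little above 4, matching the numbers quoted in the section.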
>
> Alternatives
> ============
>
> Expose the version at Python level as a read-only __version__ property
> ----------------------------------------------------------------------
>
> The first version of the PEP proposed to expose the dictionary version
> as a read-only ``__version__`` property at Python level, and also to add
> the property to ``collections.UserDict`` (since this type must mimic
> the ``dict`` API).
>
> There are multiple issues:
>
> * To be consistent and avoid bad surprises, the version must be added to
>   all mapping types. Implementing a new mapping type would require extra
>   work for no benefit, since the version is only required on the
>   ``dict`` type in practice.
> * All Python implementations must implement this new property. It gives
>   more work to other implementations, whereas they may not use the
>   dictionary version at all.
> * Exposing the dictionary version at Python level can lead to
>   false assumptions about performance. Checking ``dict.__version__`` at
>   the Python level is not faster than a dictionary lookup. A dictionary
>   lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5
>   ns, the difference is only 1.2 ns (3%)::
>
>
>     $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
>     10000000 loops, best of 3: 0.0487 usec per loop
>     $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
>     10000000 loops, best of 3: 0.0475 usec per loop
>
> * The ``__version__`` can be wrapped on integer overflow. It is error
>   prone: using ``dict.__version__ <= guard_version`` is wrong,
>   ``dict.__version__ == guard_version`` must be used instead to reduce
>   the risk of bug on integer overflow (even if the integer overflow is
>   unlikely in practice).
>
> Mandatory bikeshedding on the property name:
>
> * ``__cache_token__``: name proposed by Nick Coghlan, name coming from
>   `abc.get_cache_token()
>   <https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_.
> * ``__version__``
> * ``__timestamp__``
>
>
> Add a version to each dict entry
> --------------------------------
>
> A single version per dictionary requires keeping a strong reference to
> the value, which can keep the value alive longer than expected. If we
> also add a version per dictionary entry, the guard can store only the
> entry version to avoid the strong reference to the value (only strong
> references to the dictionary and to the key are needed).
>
> Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure,
> the field has the C type ``PY_INT64_T``. When a key is created or
> modified, the entry version is set to the dictionary version which is
> incremented at any change (create, modify, delete).
>
> Pseudo-code of a fast guard to check if a dictionary key was modified
> using hypothetical ``dict_get_version(dict)`` and
> ``dict_get_entry_version(dict, key)`` functions::
>
>     UNSET = object()
>
>     class GuardDictKey:
>         def __init__(self, dict, key):
>             self.dict = dict
>             self.key = key
>             self.dict_version = dict_get_version(dict)
>             self.entry_version = dict_get_entry_version(dict, key)
>
>         def check(self):
>             """Return True if the dictionary entry did not change
>             and the dictionary was not replaced."""
>
>             # read the version of the dict structure
>             dict_version = dict_get_version(self.dict)
>             if dict_version == self.dict_version:
>                 # Fast-path: dictionary lookup avoided
>                 return True
>
>             # lookup in the dictionary
>             entry_version = dict_get_entry_version(self.dict, self.key)
>             if entry_version == self.entry_version:
>                 # another key was modified:
>                 # cache the new dictionary version
>                 self.dict_version = dict_version
>                 return True
>
>             # the key was modified
>             return False
>
> The main drawback of this option is the impact on the memory footprint.
> It increases the size of each dictionary entry, so the overhead depends
> on the number of buckets (dictionary entries, used or unused yet). For
> example, it increases the size of each dictionary entry by 8 bytes on
> a 64-bit system.
>
> In Python, the memory footprint matters and the trend is to reduce it.
> Examples:
>
> * `PEP 393 -- Flexible String Representation
>   <https://www.python.org/dev/peps/pep-0393/>`_
> * `PEP 412 -- Key-Sharing Dictionary
>   <https://www.python.org/dev/peps/pep-0412/>`_
>
>
> Add a new dict subtype
> ----------------------
>
> Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
> use the ``verdict`` for namespaces (module namespace, type namespace,
> instance namespace, etc.) instead of ``dict``.
>
> Leave the ``dict`` type unchanged to not add any overhead (memory
> footprint) when guards are not needed.
>
> Technical issue: a lot of C code in the wild, including CPython core,
> expects the exact ``dict`` type. Issues:
>
> * ``exec()`` requires a ``dict`` for globals and locals. A lot of code
>   uses ``globals={}``. It is not possible to cast the ``dict`` to a
>   ``dict`` subtype because the caller expects the ``globals`` parameter
>   to be modified (``dict`` is mutable).
> * Functions directly call ``PyDict_xxx()`` functions, instead of calling
>   ``PyObject_xxx()`` if the object is a ``dict`` subtype
> * ``PyDict_CheckExact()`` check fails on ``dict`` subtype, whereas some
>   functions require the exact ``dict`` type.
> * ``Python/ceval.c`` does not completely support dict subtypes for
>   namespaces
>
>
> The ``exec()`` issue is a blocker issue.
>
> Other issues:
>
> * The garbage collector has special code to "untrack" ``dict``
>   instances. If a ``dict`` subtype is used for namespaces, the garbage
>   collector may be unable to break some reference cycles.
> * Some functions have a fast-path for ``dict`` which would not be taken
>   for ``dict`` subtypes, and so it would make Python a little bit
>   slower.
>
>
> Prior Art
> =========
>
> Method cache and type version tag
> ---------------------------------
>
> In 2007, Armin Rigo wrote a patch to implement a cache of methods. It
> was merged into Python 2.6.  The patch adds a "type attribute cache
> version tag" (``tp_version_tag``) and a "valid version tag" flag to
> types (the ``PyTypeObject`` structure).
>
> The type version tag is not available at the Python level.
>
> The version tag has the C type ``unsigned int``. The cache is a global
> hash table of 4096 entries, shared by all types. The cache is global to
> "make it fast, have a deterministic and low memory footprint, and be
> easy to invalidate". Each cache entry has a version tag. A global
> version tag is used to create the next version tag, it also has the C
> type ``unsigned int``.
>
> By default, a type has its "valid version tag" flag cleared to indicate
> that the version tag is invalid. When the first method of the type is
> cached, the version tag and the "valid version tag" flag are set. When a
> type is modified, the "valid version tag" flag of the type and its
> subclasses is cleared. Later, when a cache entry of these types is used,
> the entry is removed because its version tag is outdated.
>
> On integer overflow, the whole cache is cleared and the global version
> tag is reset to ``0``.
>
> See `Method cache (issue #1685986)
> <https://bugs.python.org/issue1685986>`_ and `Armin's method cache
> optimization updated for Python 2.6 (issue #1700288)
> <https://bugs.python.org/issue1700288>`_.
>
>
> Globals / builtins cache
> ------------------------
>
> In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue
> #10401) <http://bugs.python.org/issue10401>`_ which adds a private
> ``ma_version`` field to the ``PyDictObject`` structure (``dict`` type),
> the field has the C type ``Py_ssize_t``.
>
> The patch adds a "global and builtin cache" to functions and frames, and
> changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the
> cache.
>
> The change on the ``PyDictObject`` structure is very similar to this
> PEP.
>
>
> Cached globals+builtins lookup
> ------------------------------
>
> In 2006, Andrea Griffini proposed a patch implementing a `Cached
> globals+builtins lookup optimization
> <https://bugs.python.org/issue1616125>`_.  The patch adds a private
> ``timestamp`` field to the ``PyDictObject`` structure (``dict`` type),
> the field has the C type ``size_t``.
>
> Thread on python-dev: `About dictionary lookup caching
> <https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.
>
>
> Guard against changing dict during iteration
> --------------------------------------------
>
> In 2013, Serhiy Storchaka proposed `Guard against changing dict during
> iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which
> adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict``
> type), the field has the C type ``size_t``.  This field is incremented
> when the dictionary is modified, and so is very similar to the proposed
> dictionary version.
>
> Sadly, the dictionary version proposed in this PEP doesn't help to
> detect dictionary mutation. The dictionary version changes when values
> are replaced, whereas modifying dictionary values while iterating on
> dictionary keys is legit in Python.
>
>
> PySizer
> -------
>
> `PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python,
> Google Summer of Code 2005 project by Nick Smallbone.
>
> This project has a patch for CPython 2.4 which adds ``key_time`` and
> ``value_time`` fields to dictionary entries. It uses a global
> process-wide counter for dictionaries, incremented each time that a
> dictionary is modified. The times are used to decide when child objects
> first appeared in their parent objects.
>
>
> Discussion
> ==========
>
> Thread on the mailing lists:
>
> * python-dev: `PEP 509: Add a private version to dict
>   <https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
>   (January 2016)
> * python-ideas: `RFC: PEP: Add dict.__version__
>   <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_
>   (January 2016)
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.



-- 
--Guido van Rossum (python.org/~guido)

From random832 at fastmail.com  Thu Apr 14 13:02:10 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 14 Apr 2016 13:02:10 -0400
Subject: [Python-Dev] MAKE_FUNCTION simplification
In-Reply-To: <neoi5m$6n1$1@ger.gmane.org>
References: <CANawmyeN11WsfxEBNPuUo=8YPDqDZBdasUr5mkhOJEr_q26+GQ@mail.gmail.com>
 <neoi5m$6n1$1@ger.gmane.org>
Message-ID: <1460653330.59950.578918777.61ACC6D9@webmail.messagingengine.com>

On Thu, Apr 14, 2016, at 12:56, Terry Reedy wrote:
> https://docs.python.org/3/library/dis.html#module-dis
> CPython implementation detail: Bytecode is an implementation detail of 
> the CPython interpreter. No guarantees are made that bytecode will not 
> be added, removed, or changed between versions of Python.
> 
> Version = minor release, as opposed to maintenance release.

"between versions" is ambiguous. It could mean that there's no guarantee
that there will be no changes from one version to the next, or it could
mean, even more strongly, that there's no guarantee that there will be
no changes in a maintenance release (which are, after all, released
*between* minor releases)

From p.f.moore at gmail.com  Thu Apr 14 13:22:55 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 14 Apr 2016 18:22:55 +0100
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570FC955.3080908@stoneleaf.us>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <loom.20160414T083232-765@post.gmane.org>
 <570FAD78.60505@stoneleaf.us>
 <loom.20160414T172609-785@post.gmane.org>
 <570FC955.3080908@stoneleaf.us>
Message-ID: <CACac1F8JRZk0=RJ322U2Ft3z+w_inCOvs+MfVv6C5ER91JWMRg@mail.gmail.com>

On 14 April 2016 at 17:46, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/14/2016 08:59 AM, Michael Mysinger via Python-Dev wrote:
>
>> I am saying that if os.path.join now accepts RichPath objects, and those
>> objects can return either str or bytes, then it's much harder to reason
>> about
>> when I have all bytes or all strings. In essence, you will force me to
>> pre-
>> wrap all RichPath objects in either os.fsencode(os.fspath(path)) or
>> os.fsdecode(os.fspath(path)), just so I can reason about the type. And if
>> I
>> have to always do that wrapping then os.path.join doesn't need to accept
>> RichPath objects and call fspath at all.
>
>
> What many folks seem to be missing is that *you* (generic you) have control
> of your data.
>
> If you are not working at the bytes layer, you shouldn't be getting bytes
> objects because:
>
> - you specified str when asking for data from the OS, or
> - you transformed the incoming bytes from whatever external source
>   to str when you received them.

My experience is that (particularly with code that was originally
written for Python 2) "you have control of your data" is often an
illusion - bytes can appear in code from unexpected sources, and when
they do I'd rather see an error if I'm using code where I expect a
string. Certainly that's a bug in the code - all I'm saying is that it
should fail early rather than late.

Having said this, I don't have an actual use case - but equally it
seems to me that our problem is that *nobody* does (yet) because
uptake of pathlib has been slow, thanks to limited stdlib support. My
view remains that we should get the (relatively simple and
uncontroversial) str support in place, and defer bytes support for
when we have experience with that.

I'd appreciate it if anyone can clarify why "gracefully extending" the
protocol to include bytes support at a later date isn't practical.
Paul

From k7hoven at gmail.com  Thu Apr 14 13:56:54 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Thu, 14 Apr 2016 20:56:54 +0300
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570FC955.3080908@stoneleaf.us>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <loom.20160414T083232-765@post.gmane.org>
 <570FAD78.60505@stoneleaf.us>
 <loom.20160414T172609-785@post.gmane.org>
 <570FC955.3080908@stoneleaf.us>
Message-ID: <CAMiohog8k70ScxzHb_oZ026iXDa4xRPNLnKmiFOV97YTj1Si+A@mail.gmail.com>

On Thu, Apr 14, 2016 at 7:46 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>
> What many folks seem to be missing is that *you* (generic you) have control
> of your data.
>
> If you are not working at the bytes layer, you shouldn't be getting bytes
> objects because:
>
> - you specified str when asking for data from the OS, or
> - you transformed the incoming bytes from whatever external source
>   to str when you received them.

There is an apparent contradiction of the above with some previous
posts, including your own. Let me try to fix it:

Code that deals with paths can be divided in groups as follows:

(1) Code that has access to pathname/filename data and has some level
of control over what data type comes in. This code may for instance
choose to deal with either bytes or str

(2) Code that takes the path or file name that it happens to get and
does something with it. This type of code can be divided into
subgroups as follows:

  (2a) Code that accepts only one type of paths (e.g. str, bytes or
pathlib) and fails if it gets something else.

  (2b) Code that wants to support different types of paths such as
str, bytes or pathlib objects. This includes os.path.*, os.scandir,
and various other standard library code. Presumably there is also
third-party code that does the same. These functions may want to
preserve the str-ness or bytes-ness of the paths in case they return
paths, as the stdlib now does. But new code may even want to return
pathlib objects when they get such objects as inputs. This is the
duck-typing or polymorphic code we have been talking about. Code of
this type (2b) may want to avoid implicit conversions because it makes
the life of code of the other types more difficult.

(feel free to fill in more categories of code)
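
As a concrete illustration of category (2b), this is how os.path.join already behaves today: it hands back whatever type it was given and refuses to mix the two. A minimal sketch using only existing stdlib behaviour:

```python
import os.path

# str in, str out
text_result = os.path.join("spam", "eggs")

# bytes in, bytes out
bytes_result = os.path.join(b"spam", b"eggs")

# mixing str and bytes is rejected instead of being converted implicitly
try:
    os.path.join("spam", b"eggs")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

No implicit conversion happens anywhere in there, which is exactly what keeps the other categories of code easy to reason about.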

So the code of type (2b) is trying to make all categories happy by
returning objects of the same type that it gets as input, while the
other categories are probably in the situation where they don't
necessarily need to make other categories of code happy.

And the question is this: Do we need to make code using both bytes
*and* scandir happy? This is largely the same question as whether we
have to support bytes in addition to str in the protocol.

(We may of course talk about third-party path libraries that have the
same problem as scandir's DirEntry. Ethan's library is not exactly in
the same category as DirEntry since its path objects *are* instances
of bytes or str and therefore do not need this protocol to begin with,
except perhaps for conversions from other high-level path types so
that different path libraries work together nicely).

-Koos

From ethan at stoneleaf.us  Thu Apr 14 14:12:33 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 Apr 2016 11:12:33 -0700
Subject: [Python-Dev] MAKE_FUNCTION simplification
In-Reply-To: <1460653330.59950.578918777.61ACC6D9@webmail.messagingengine.com>
References: <CANawmyeN11WsfxEBNPuUo=8YPDqDZBdasUr5mkhOJEr_q26+GQ@mail.gmail.com>
 <neoi5m$6n1$1@ger.gmane.org>
 <1460653330.59950.578918777.61ACC6D9@webmail.messagingengine.com>
Message-ID: <570FDD91.9030406@stoneleaf.us>

On 04/14/2016 10:02 AM, Random832 wrote:

> "between versions" is ambiguous. It could mean that there's no guarantee
> that there will be no changes from one version to the next, or it could
> mean, even more strongly, that there's no guarantee that there will be
> no changes in a maintenance release (which are, after all, released
> *between* minor releases)

I don't see us making a breaking change in a maintenance release except 
to fix something that was already broken.

--
~Ethan~

From ethan at stoneleaf.us  Thu Apr 14 14:17:25 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 Apr 2016 11:17:25 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CACac1F8JRZk0=RJ322U2Ft3z+w_inCOvs+MfVv6C5ER91JWMRg@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>	<loom.20160413T071958-483@post.gmane.org>	<CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>	<loom.20160414T083232-765@post.gmane.org>	<570FAD78.60505@stoneleaf.us>	<loom.20160414T172609-785@post.gmane.org>	<570FC955.3080908@stoneleaf.us>
 <CACac1F8JRZk0=RJ322U2Ft3z+w_inCOvs+MfVv6C5ER91JWMRg@mail.gmail.com>
Message-ID: <570FDEB5.4050507@stoneleaf.us>

On 04/14/2016 10:22 AM, Paul Moore wrote:
> On 14 April 2016 at 17:46, Ethan Furman wrote:

>> If you are not working at the bytes layer, you shouldn't be getting bytes
>> objects because:
>>
>> - you specified str when asking for data from the OS, or
>> - you transformed the incoming bytes from whatever external source
>>    to str when you received them.
>
> My experience is that (particularly with code that was originally
> written for Python 2) "you have control of your data" is often an
> illusion - bytes can appear in code from unexpected sources, and when
> they do I'd rather see an error if I'm using code where I expect a
> string. Certainly that's a bug in the code - all I'm saying is that it
> should fail early rather than late.

If we have one function that uses a flag and you leave the flag alone 
(it defaults to rejecting bytes) -- voila!  An error is raised when 
bytes show up.
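
As a rough illustration only: the sketch below assumes a hypothetical
helper named fspath_sketch with a hypothetical allow_bytes flag. Neither
name comes from an agreed design; it just shows a single function that
rejects bytes unless the caller opts in.

```python
def fspath_sketch(path, allow_bytes=False):
    """Hypothetical helper: return path unchanged if its type is accepted.

    str is always accepted; bytes only when allow_bytes is true.
    """
    if isinstance(path, str):
        return path
    if isinstance(path, bytes):
        if allow_bytes:
            return path
        # Default behaviour: fail early when bytes show up unexpectedly.
        raise TypeError("bytes path rejected; pass allow_bytes=True to accept it")
    raise TypeError("expected str or bytes, got %s" % type(path).__name__)
```

Code that never touches the flag gets an error as soon as bytes appear,
while code working at the bytes layer passes allow_bytes=True explicitly.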

> I'd appreciate it if anyone can clarify why "gracefully extending" the
> protocol to include bytes support at a later date isn't practical.

It's going to be a bunch of work.  I don't want to do the work twice.

On the other hand, if while doing the work it becomes apparent that 
supporting bytes and str in the protocol is either infeasible, 
confusing, or a plain ol' bad idea I have no problem ripping out the 
bytes support and going to str only.

--
~Ethan~


From random832 at fastmail.com  Thu Apr 14 14:35:35 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 14 Apr 2016 14:35:35 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CAMiohog8k70ScxzHb_oZ026iXDa4xRPNLnKmiFOV97YTj1Si+A@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <loom.20160414T083232-765@post.gmane.org> <570FAD78.60505@stoneleaf.us>
 <loom.20160414T172609-785@post.gmane.org>
 <570FC955.3080908@stoneleaf.us>
 <CAMiohog8k70ScxzHb_oZ026iXDa4xRPNLnKmiFOV97YTj1Si+A@mail.gmail.com>
Message-ID: <1460658935.81222.579003873.3129D94A@webmail.messagingengine.com>

On Thu, Apr 14, 2016, at 13:56, Koos Zevenhoven wrote:
> (1) Code that has access to pathname/filename data and has some level
> of control over what data type comes in. This code may for instance
> choose to deal with either bytes or str
> 
> (2) Code that takes the path or file name that it happens to get and
> does something with it. This type of code can be divided into
> subgroups as follows:
> 
>   (2a) Code that accepts only one type of paths (e.g. str, bytes or
> pathlib) and fails if it gets something else.

Ideally, these should go away.

>   (2b) Code that wants to support different types of paths such as
> str, bytes or pathlib objects. This includes os.path.*, os.scandir,
> and various other standard library code. Presumably there is also
> third-party code that does the same. These functions may want to
> preserve the str-ness or bytes-ness of the paths in case they return
> paths, as the stdlib now does. But new code may even want to return
> pathlib objects when they get such objects as inputs.

Hold on. None of the discussion I've seen has included any way to
specify how to construct a new object representing a different path
other than the ones passed in. Surely you're not suggesting type(a)(b).

Also, how does DirEntry fit in with any of this?

> This is the
> duck-typing or polymorphic code we have been talking about. Code of
> this type (2b) may want to avoid implicit conversions because it makes
> the life of code of the other types more difficult.

As long as the type it returns is still a path/bytes/str (and therefore
can be accepted when the caller passes it somewhere else) what's the
problem?

From ethan at stoneleaf.us  Thu Apr 14 14:39:43 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 Apr 2016 11:39:43 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMpsgwY7XD5KJB95CFUKa7h70q1CgO4ebOh_0VWR0notBu+hYQ@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us> <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwY7XD5KJB95CFUKa7h70q1CgO4ebOh_0VWR0notBu+hYQ@mail.gmail.com>
Message-ID: <570FE3EF.8050206@stoneleaf.us>

On 04/13/2016 02:37 PM, Victor Stinner wrote:

> I'm not a big fan of a flag parameter to change the return type of a
> function. Usually, two functions are preferred. In the os module we have
> getcwd/getcwdb for example. I don't know if it's a good example

I think of os.fspath() as more of a filter/reduce operation:

- str -> str
- str DirEntry -> str

- bytes -> bytes
- bytes DirEntry -> bytes

The purpose of os.fspath() (at least the one I'm arguing for ;) is to 
distil its inputs to the lowest common denominator, and no lower -- 
which is either str for string-based path objects, or bytes for 
bytes-based path objects.
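
A minimal sketch of that reduction, under stated assumptions: the
function name reduce_path and the FakeDirEntry class are hypothetical
stand-ins made up for this example, and the only protocol piece used is
the __fspath__ method named in the subject line.

```python
class FakeDirEntry:
    """Hypothetical stand-in for a DirEntry wrapping a str or bytes path."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path


def reduce_path(obj):
    """Reduce obj to str or bytes, and no lower (hypothetical helper)."""
    if isinstance(obj, (str, bytes)):
        # Already at the lowest common denominator: pass through unchanged.
        return obj
    try:
        method = type(obj).__fspath__
    except AttributeError:
        raise TypeError("not a path-like object: %r" % (obj,)) from None
    result = method(obj)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ must return str or bytes")
    return result
```

With this sketch a str-based entry comes back as str and a bytes-based
entry comes back as bytes; nothing is encoded or decoded along the way.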

--
~Ethan~


From k7hoven at gmail.com  Thu Apr 14 15:17:21 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Thu, 14 Apr 2016 22:17:21 +0300
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <1460658935.81222.579003873.3129D94A@webmail.messagingengine.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <loom.20160414T083232-765@post.gmane.org>
 <570FAD78.60505@stoneleaf.us>
 <loom.20160414T172609-785@post.gmane.org>
 <570FC955.3080908@stoneleaf.us>
 <CAMiohog8k70ScxzHb_oZ026iXDa4xRPNLnKmiFOV97YTj1Si+A@mail.gmail.com>
 <1460658935.81222.579003873.3129D94A@webmail.messagingengine.com>
Message-ID: <CAMiohojbQCWqRKwqcixTWq0RmfZh8-R4HswK3pVFXuajw-RMgQ@mail.gmail.com>

On Thu, Apr 14, 2016 at 9:35 PM, Random832 <random832 at fastmail.com> wrote:
> On Thu, Apr 14, 2016, at 13:56, Koos Zevenhoven wrote:
>> (1) Code that has access to pathname/filename data and has some level
>> of control over what data type comes in. This code may for instance
>> choose to deal with either bytes or str
>>
>> (2) Code that takes the path or file name that it happens to get and
>> does something with it. This type of code can be divided into
>> subgroups as follows:
>>
>>   (2a) Code that accepts only one type of paths (e.g. str, bytes or
>> pathlib) and fails if it gets something else.
>
> Ideally, these should go away.
>

I don't think so. (1) might even be the most common type of all code.
This is code that gets a path from user input, from a config file,
from a database etc. and then does things with it, typically including
passing it to type (2) code and potentially getting a path back from
there too.

>>   (2b) Code that wants to support different types of paths such as
>> str, bytes or pathlib objects. This includes os.path.*, os.scandir,
>> and various other standard library code. Presumably there is also
>> third-party code that does the same. These functions may want to
>> preserve the str-ness or bytes-ness of the paths in case they return
>> paths, as the stdlib now does. But new code may even want to return
>> pathlib objects when they get such objects as inputs.
>
> Hold on. None of the discussion I've seen has included any way to
> specify how to construct a new object representing a different path
> other than the ones passed in. Surely you're not suggesting type(a)(b).
>

That's right. This protocol is not solving the issue of returning
'rich' path objects. It's solving the issue of passing those objects
to lower-level functions or to interact with other 'rich' path types.
What I meant by this is that there may be code that *does* want to do
type(a)(b), which is out of our control. Maybe I should not have
mentioned that.

> Also, how does DirEntry fit in with any of this?
>

os.scandir + DirEntry are one of the many things in the stdlib that
give you pathnames of the same type as those that were put in.

>> This is the
>> duck-typing or polymorphic code we have been talking about. Code of
>> this type (2b) may want to avoid implicit conversions because it makes
>> the life of code of the other types more difficult.
>
> As long as the type it returns is still a path/bytes/str (and therefore
> can be accepted when the caller passes it somewhere else) what's the
> problem?

The problem is that not all paths are passed to the function that does
the implicit conversion. If the caller then, for instance, calls
os.path.join on two paths of different types, it raises an error.

In other words: Most non-library code (even library code?) deals with
one specific type and does not want implicit conversions to other
types. Some code (2b) deals with several types and, at least in the
stdlib, such code returns paths of the same type as they are given,
which makes said "most non-library code" happy, because it does not
force the programmer to think about type conversions.

(Then there is also code that explicitly deals with type conversions,
such as os.fsencode and os.fsdecode.)
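
To make the os.path.join point concrete, here is a small sketch using
only existing stdlib calls (os.path.join and os.fsdecode). The variable
names are just for illustration.

```python
import os

# Same-type inputs: the result keeps the str-ness or bytes-ness.
str_joined = os.path.join("dir", "file")
bytes_joined = os.path.join(b"dir", b"file")

# Mixed-type inputs: joining str with bytes raises TypeError.
try:
    os.path.join("dir", b"file")
except TypeError as exc:
    mixed_error = exc
else:
    mixed_error = None

# An explicit conversion makes the types agree again.
joined = os.path.join("dir", os.fsdecode(b"file"))
```

This is the situation that arises if one path went through an implicit
conversion and the other one did not.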

-Koos

From victor.stinner at gmail.com  Thu Apr 14 15:49:12 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 21:49:12 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAP7+vJ+ZjahgH=pq6XeXzWX8mZ=JTpZz6XM8fMprpPy8erTaEw@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAP7+vJ+ZjahgH=pq6XeXzWX8mZ=JTpZz6XM8fMprpPy8erTaEw@mail.gmail.com>
Message-ID: <CAMpsgwYj_PZLnOHm42wwTQOweoL4cHB0dibf7ydkcu0eHHi8ww@mail.gmail.com>

It would be nice to hear from Barry Warsaw, who was opposed to the PEP in
January. He wanted to wait until FAT Python was proven to really be faster,
which is still not the case right now. (I mean that I haven't run serious
benchmarks, but early macro benchmarks are not really promising, only micro
benchmarks are. I expect better results when the implementation is more
complete.)

The main change since January is that Yury wrote a patch that uses the PEP
to optimize method calls.
https://mail.python.org/pipermail/python-dev/2016-January/142772.html

Victor

On Thursday, 14 April 2016, Guido van Rossum <guido at python.org> wrote:

> I'll wait a day before formally pronouncing to see if any objections
> are made, but it looks good to me.
>
> On Thu, Apr 14, 2016 at 8:19 AM, Victor Stinner
> <victor.stinner at gmail.com> wrote:
> > Hi,
> >
> > I updated my PEP 509 to make the dictionary version globally unique.
> > With *two* use cases of this PEP (Yury's method call patch and my FAT
> > Python project), I think that the PEP is now ready to be accepted.
> >
> > A globally unique identifier is a requirement for Yury's patch
> > optimizing method calls ( https://bugs.python.org/issue26110 ). It
> > makes it possible to check for free whether the dictionary was replaced.
> >
> > I also renamed the ma_version field to ma_version_tag.
> >
> > HTML version:
> > https://www.python.org/dev/peps/pep-0509/
> >
> > Victor
> >
> >
> > PEP: 509
> > Title: Add a private version to dict
> > Version: $Revision$
> > Last-Modified: $Date$
> > Author: Victor Stinner <victor.stinner at gmail.com>
> > Status: Draft
> > Type: Standards Track
> > Content-Type: text/x-rst
> > Created: 4-January-2016
> > Python-Version: 3.6
> >
> >
> > Abstract
> > ========
> >
> > Add a new private version to the builtin ``dict`` type, incremented at
> > each dictionary creation and at each dictionary change, to implement
> > fast guards on namespaces.
> >
> >
> > Rationale
> > =========
> >
> > In Python, the builtin ``dict`` type is used by many instructions. For
> > example, the ``LOAD_GLOBAL`` instruction searches for a variable in the
> > global namespace, or in the builtins namespace (two dict lookups).
> > Python uses ``dict`` for the builtins namespace, globals namespace, type
> > namespaces, instance namespaces, etc. The local namespace (namespace of
> > a function) is usually optimized to an array, but it can be a dict too.
> >
> > Python is hard to optimize because almost everything is mutable: builtin
> > functions, function code, global variables, local variables, ... can be
> > modified at runtime. Implementing optimizations respecting the Python
> > semantics requires detecting when "something changes": we will call
> > these checks "guards".
> >
> > The speedup of optimizations depends on the speed of guard checks. This
> > PEP proposes to add a version to dictionaries to implement fast guards
> > on namespaces.
> >
> > Dictionary lookups can be skipped if the version does not change which
> > is the common case for most namespaces. Since the version is globally
> > unique, the version is also enough to check if the namespace dictionary
> > was not replaced with a new dictionary. As long as the dictionary
> > version does not change, the cost of a guard check is O(1): it does not
> > depend on the number of watched dictionary entries.
> >
> > Example of optimization: copy the value of a global variable to function
> > constants.  This optimization requires a guard on the global variable to
> > check if it was modified. If the variable is modified, the variable must
> > be loaded at runtime when the function is called, instead of using the
> > constant.
> >
> > See the `PEP 510 -- Specialized functions with guards
> > <https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of
> > guards to specialize functions and for the rationale on Python static
> > optimizers.
> >
> >
> > Guard example
> > =============
> >
> > Pseudo-code of a fast guard to check if a dictionary entry was modified
> > (created, updated or deleted) using a hypothetical
> > ``dict_get_version(dict)`` function::
> >
> >     UNSET = object()
> >
> >     class GuardDictKey:
> >         def __init__(self, dict, key):
> >             self.dict = dict
> >             self.key = key
> >             self.value = dict.get(key, UNSET)
> >             self.version = dict_get_version(dict)
> >
> >         def check(self):
> >             """Return True if the dictionary entry did not change
> >             and the dictionary was not replaced."""
> >
> >             # read the version of the dict structure
> >             version = dict_get_version(self.dict)
> >             if version == self.version:
> >                 # Fast-path: dictionary lookup avoided
> >                 return True
> >
> >             # lookup in the dictionary
> >             value = self.dict.get(self.key, UNSET)
> >             if value is self.value:
> >                 # another key was modified:
> >                 # cache the new dictionary version
> >                 self.version = version
> >                 return True
> >
> >             # the key was modified
> >             return False
> >
> >
> > Usage of the dict version
> > =========================
> >
> > Speedup method calls 1.2x
> > -------------------------
> >
> > Yury Selivanov wrote a `patch to optimize method calls
> > <https://bugs.python.org/issue26110>`_. The patch depends on the
> > `implement per-opcode cache in ceval
> > <https://bugs.python.org/issue26219>`_ patch which requires dictionary
> > versions to invalidate the cache if the globals dictionary or the
> > builtins dictionary has been modified.
> >
> > The cache also requires that the dictionary version is globally unique.
> > It is possible to define a function in a namespace and call it
> > in a different namespace: using ``exec()`` with the *globals* parameter
> > for example. In this case, the globals dictionary was changed and the
> > cache must be invalidated.
> >
> >
> > Specialized functions using guards
> > ----------------------------------
> >
> > The `PEP 510 -- Specialized functions with guards
> > <https://www.python.org/dev/peps/pep-0510/>`_ proposes an API to support
> > specialized functions with guards. It makes it possible to implement
> > static optimizers for Python without breaking the Python semantics.
> >
> > Example of a static Python optimizer: the `fatoptimizer
> > <http://fatoptimizer.readthedocs.org/>`_ of the `FAT Python
> > <http://faster-cpython.readthedocs.org/fat_python.html>`_ project
> > implements many optimizations which require guards on namespaces.
> > Examples:
> >
> > * Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
> >   ``builtins.__dict__['len']`` and ``globals()['len']`` are required
> > * Loop unrolling: to unroll the loop ``for i in range(...): ...``,
> >   guards on ``builtins.__dict__['range']`` and ``globals()['range']``
> >   are required
> >
> >
> > Pyjion
> > ------
> >
> > According to Brett Cannon, one of the two main developers of Pyjion,
> > Pyjion can also benefit from dictionary version to implement
> > optimizations.
> >
> > Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET
> > Core runtime).
> >
> >
> > Unladen Swallow
> > ---------------
> >
> > Even though the dictionary version was not explicitly mentioned, optimizing
> > globals and builtins lookup was part of the Unladen Swallow plan:
> > "Implement one of the several proposed schemes for speeding lookups of
> > globals and builtins." Source: `Unladen Swallow ProjectPlan
> > <https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_.
> >
> > Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
> > implemented with LLVM. The project stopped in 2011: `Unladen Swallow
> > Retrospective
> > <http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_.
> >
> >
> > Changes
> > =======
> >
> > Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
> > the C type ``PY_UINT64_T``, a 64-bit unsigned integer. Also add a global
> > dictionary version. Each time a dictionary is created, the global
> > version is incremented and the dictionary version is initialized to the
> > global version. The global version is also incremented and copied to the
> > dictionary version at each dictionary change:
> >
> > * ``clear()`` if the dict was non-empty
> > * ``pop(key)`` if the key exists
> > * ``popitem()`` if the dict is non-empty
> > * ``setdefault(key, value)`` if the `key` does not exist
> > * ``__delitem__(key)`` if the key exists
> > * ``__setitem__(key, value)`` if the `key` doesn't exist or if the value
> >   is not ``dict[key]``
> > * ``update(...)`` if new values are different than existing values:
> >   values are compared by identity, not by their content; the version can
> >   be incremented multiple times
> >
> > The ``PyDictObject`` structure is not part of the stable ABI.
> >
> > The field is called ``ma_version_tag`` rather than ``ma_version`` to
> > suggest comparing it using ``version_tag == old_version_tag`` rather
> > than ``version <= old_version``, which makes a bug on integer overflow
> > much more likely.
> >
> > Example using a hypothetical ``dict_get_version(dict)`` function::
> >
> >     >>> d = {}
> >     >>> dict_get_version(d)
> >     100
> >     >>> d['key'] = 'value'
> >     >>> dict_get_version(d)
> >     101
> >     >>> d['key'] = 'new value'
> >     >>> dict_get_version(d)
> >     102
> >     >>> del d['key']
> >     >>> dict_get_version(d)
> >     103
> >
> > The version is not incremented if an existing key is set to the same
> > value. For efficiency, values are compared by their identity:
> > ``new_value is old_value``, not by their content:
> > ``new_value == old_value``. Example::
> >
> >     >>> d = {}
> >     >>> value = object()
> >     >>> d['key'] = value
> >     >>> dict_get_version(d)
> >     40
> >     >>> d['key'] = value
> >     >>> dict_get_version(d)
> >     40
> >
> > .. note::
> >    CPython uses some singletons, such as integers in the range [-5; 256],
> >    the empty tuple, empty strings, Unicode strings of a single character
> >    in the range [U+0000; U+00FF], etc. When a key is set twice to the same
> >    singleton, the version is not modified.
> >
> >
> > Implementation and Performance
> > ==============================
> >
> > The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
> > <https://bugs.python.org/issue26058>`_ contains a patch implementing
> > this PEP.
> >
> > On pybench and timeit microbenchmarks, the patch does not seem to add
> > any overhead on dictionary operations.
> >
> > When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
> > a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover,
> > a guard can watch for multiple keys. For example, for an optimization
> > using 10 global variables in a function, 10 dictionary lookups cost 148
> > ns, whereas the guard still only costs 3.8 ns when the version does not
> > change (39x as fast).
> >
> > The `fat module
> > <http://fatoptimizer.readthedocs.org/en/latest/fat.html>`_ implements
> > such guards: ``fat.GuardDict`` is based on the dictionary version.
> >
> >
> > Integer overflow
> > ================
> >
> > The implementation uses the C type ``PY_UINT64_T`` to store the version:
> > a 64-bit unsigned integer. The C code uses ``version++``. On integer
> > overflow, the version wraps around to ``0`` (and then continues to be
> > incremented) according to the C standard.
> >
> > After an integer overflow, a guard can succeed even though the watched
> > dictionary key was modified. The bug only occurs at a guard check if
> > there are exactly ``2 ** 64`` dictionary creations or modifications
> > since the previous guard check.
> >
> > If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
> > take longer than 584 years. Using a 32-bit version, it only takes 4
> > seconds. That's why a 64-bit unsigned type is also used on 32-bit
> > systems. A dictionary lookup at the C level takes 14.8 ns.
> >
> > A risk of a bug every 584 years is acceptable.
> >
> >
> > Alternatives
> > ============
> >
> > Expose the version at Python level as a read-only __version__ property
> > ----------------------------------------------------------------------
> >
> > The first version of the PEP proposed to expose the dictionary version
> > as a read-only ``__version__`` property at Python level, and also to add
> > the property to ``collections.UserDict`` (since this type must mimic
> > the ``dict`` API).
> >
> > There are multiple issues:
> >
> > * To be consistent and avoid bad surprises, the version must be added to
> >   all mapping types. Implementing a new mapping type would require extra
> >   work for no benefit, since the version is only required on the
> >   ``dict`` type in practice.
> > * All Python implementations would have to implement this new property,
> >   which gives more work to other implementations, whereas they may not
> >   use the dictionary version at all.
> > * Exposing the dictionary version at Python level can lead to false
> >   assumptions about performance. Checking ``dict.__version__`` at
> >   the Python level is not faster than a dictionary lookup. A dictionary
> >   lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5
> >   ns, the difference is only 1.2 ns (3%)::
> >
> >
> >     $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
> >     10000000 loops, best of 3: 0.0487 usec per loop
> >     $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
> >     10000000 loops, best of 3: 0.0475 usec per loop
> >
> > * The ``__version__`` can wrap around on integer overflow. It is
> >   error-prone: using ``dict.__version__ <= guard_version`` is wrong,
> >   ``dict.__version__ == guard_version`` must be used instead to reduce
> >   the risk of a bug on integer overflow (even if the integer overflow is
> >   unlikely in practice).
> >
> > Mandatory bikeshedding on the property name:
> >
> > * ``__cache_token__``: name proposed by Nick Coghlan, name coming from
> >   `abc.get_cache_token()
> >   <https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_.
> > * ``__version__``
> > * ``__timestamp__``
> >
> >
> > Add a version to each dict entry
> > --------------------------------
> >
> > A single version per dictionary requires keeping a strong reference to
> > the value, which can keep the value alive longer than expected. If we
> > also add a version per dictionary entry, the guard can store only the
> > entry version and avoid the strong reference to the value (only strong
> > references to the dictionary and to the key are needed).
> >
> > Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure,
> > the field has the C type ``PY_UINT64_T``. When a key is created or
> > modified, the entry version is set to the dictionary version which is
> > incremented at any change (create, modify, delete).
> >
> > Pseudo-code of a fast guard to check if a dictionary key was modified
> > using hypothetical ``dict_get_version(dict)`` and
> > ``dict_get_entry_version(dict, key)`` functions::
> >
> >     UNSET = object()
> >
> >     class GuardDictKey:
> >         def __init__(self, dict, key):
> >             self.dict = dict
> >             self.key = key
> >             self.dict_version = dict_get_version(dict)
> >             self.entry_version = dict_get_entry_version(dict, key)
> >
> >         def check(self):
> >             """Return True if the dictionary entry did not change
> >             and the dictionary was not replaced."""
> >
> >             # read the version of the dict structure
> >             dict_version = dict_get_version(self.dict)
> >             if dict_version == self.dict_version:
> >                 # Fast-path: dictionary lookup avoided
> >                 return True
> >
> >             # lookup in the dictionary
> >             entry_version = dict_get_entry_version(self.dict, self.key)
> >             if entry_version == self.entry_version:
> >                 # another key was modified:
> >                 # cache the new dictionary version
> >                 self.dict_version = dict_version
> >                 return True
> >
> >             # the key was modified
> >             return False
> >
> > The main drawback of this option is the impact on the memory footprint.
> > It increases the size of each dictionary entry, so the overhead depends
> > on the number of buckets (dictionary entries, used or unused yet). For
> > example, it increases the size of each dictionary entry by 8 bytes on
> > a 64-bit system.
> >
> > In Python, the memory footprint matters and the trend is to reduce it.
> > Examples:
> >
> > * `PEP 393 -- Flexible String Representation
> >   <https://www.python.org/dev/peps/pep-0393/>`_
> > * `PEP 412 -- Key-Sharing Dictionary
> >   <https://www.python.org/dev/peps/pep-0412/>`_
> >
> >
> > Add a new dict subtype
> > ----------------------
> >
> > Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
> > use the ``verdict`` for namespaces (module namespace, type namespace,
> > instance namespace, etc.) instead of ``dict``.
> >
> > Leave the ``dict`` type unchanged to not add any overhead (memory
> > footprint) when guards are not needed.
> >
> > Technical issue: a lot of C code in the wild, including CPython core,
> > expects the exact ``dict`` type. Issues:
> >
> > * ``exec()`` requires a ``dict`` for globals and locals. A lot of code
> >   uses ``globals={}``. It is not possible to cast the ``dict`` to a
> >   ``dict`` subtype because the caller expects the ``globals`` parameter
> >   to be modified (``dict`` is mutable).
> > * Functions directly call ``PyDict_xxx()`` functions, instead of calling
> >   ``PyObject_xxx()`` if the object is a ``dict`` subtype.
> > * The ``PyDict_CheckExact()`` check fails on a ``dict`` subtype, whereas
> >   some functions require the exact ``dict`` type.
> > * ``Python/ceval.c`` does not completely support dict subtypes for
> >   namespaces.
> >
> >
> > The ``exec()`` issue is a blocker.
> >
> > Other issues:
> >
> > * The garbage collector has special code to "untrack" ``dict``
> >   instances. If a ``dict`` subtype is used for namespaces, the garbage
> >   collector may be unable to break some reference cycles.
> > * Some functions have a fast-path for ``dict`` which would not be taken
> >   for ``dict`` subtypes, and so it would make Python a little bit
> >   slower.
> >
> >
> > Prior Art
> > =========
> >
> > Method cache and type version tag
> > ---------------------------------
> >
> > In 2007, Armin Rigo wrote a patch to implement a cache of methods. It
> > was merged into Python 2.6.  The patch adds a "type attribute cache
> > version tag" (``tp_version_tag``) and a "valid version tag" flag to
> > types (the ``PyTypeObject`` structure).
> >
> > The type version tag is not available at the Python level.
> >
> > The version tag has the C type ``unsigned int``. The cache is a global
> > hash table of 4096 entries, shared by all types. The cache is global to
> > "make it fast, have a deterministic and low memory footprint, and be
> > easy to invalidate". Each cache entry has a version tag. A global
> > version tag is used to create the next version tag, it also has the C
> > type ``unsigned int``.
> >
> > By default, a type has its "valid version tag" flag cleared to indicate
> > that the version tag is invalid. When the first method of the type is
> > cached, the version tag and the "valid version tag" flag are set. When a
> > type is modified, the "valid version tag" flag of the type and its
> > subclasses is cleared. Later, when a cache entry of these types is used,
> > the entry is removed because its version tag is outdated.
> >
> > On integer overflow, the whole cache is cleared and the global version
> > tag is reset to ``0``.
> >
> > See `Method cache (issue #1685986)
> > <https://bugs.python.org/issue1685986>`_ and `Armin's method cache
> > optimization updated for Python 2.6 (issue #1700288)
> > <https://bugs.python.org/issue1700288>`_.
> >
> >
> > Globals / builtins cache
> > ------------------------
> >
> > In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue
> > #10401) <http://bugs.python.org/issue10401>`_ which adds a private
> > ``ma_version`` field to the ``PyDictObject`` structure (``dict`` type),
> > the field has the C type ``Py_ssize_t``.
> >
> > The patch adds a "global and builtin cache" to functions and frames, and
> > changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the
> > cache.
> >
> > The change on the ``PyDictObject`` structure is very similar to this
> > PEP.
> >
> >
> > Cached globals+builtins lookup
> > ------------------------------
> >
> > In 2006, Andrea Griffini proposed a patch implementing a `Cached
> > globals+builtins lookup optimization
> > <https://bugs.python.org/issue1616125>`_.  The patch adds a private
> > ``timestamp`` field to the ``PyDictObject`` structure (``dict`` type),
> > the field has the C type ``size_t``.
> >
> > Thread on python-dev: `About dictionary lookup caching
> > <https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.
> >
> >
> > Guard against changing dict during iteration
> > --------------------------------------------
> >
> > In 2013, Serhiy Storchaka proposed `Guard against changing dict during
> > iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which
> > adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict``
> > type); the field has the C type ``size_t``.  This field is incremented
> > when the dictionary is modified, and so is very similar to the proposed
> > dictionary version.
> >
> > Sadly, the dictionary version proposed in this PEP doesn't help to
> > detect dictionary mutation during iteration. The dictionary version
> > changes when values are replaced, whereas modifying dictionary values
> > while iterating over dictionary keys is legal in Python.
> >
> >
> > PySizer
> > -------
> >
> > `PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python,
> > Google Summer of Code 2005 project by Nick Smallbone.
> >
> > This project has a patch for CPython 2.4 which adds ``key_time`` and
> > ``value_time`` fields to dictionary entries. It uses a global
> > process-wide counter for dictionaries, incremented each time that a
> > dictionary is modified. The times are used to decide when child objects
> > first appeared in their parent objects.
> >
> >
> > Discussion
> > ==========
> >
> > Threads on the mailing lists:
> >
> > * python-dev: `PEP 509: Add a private version to dict
> >   <https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
> >   (January 2016)
> > * python-ideas: `RFC: PEP: Add dict.__version__
> >   <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_
> >   (January 2016)
> >
> >
> > Copyright
> > =========
> >
> > This document has been placed in the public domain.
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>

From victor.stinner at gmail.com  Thu Apr 14 15:56:10 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 21:56:10 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <neohld$sts$1@ger.gmane.org>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <neohld$sts$1@ger.gmane.org>
Message-ID: <CAMpsgwZkEO3fHPDg=vaHb4FYXBJmEznt3saJzivXCMjRVXpPxQ@mail.gmail.com>

Which kind of usage do you see in Cython?

Off-topic (PEP 510):

I really want to experiment with automatically generating Cython code from
Python code, using profiling to discover function parameter types. PEP 510
could then be used to attach the fast Cython code to a Python function, with
a fallback to bytecode if the types are different. See the example of builtin
functions in the PEP:
https://www.python.org/dev/peps/pep-0510/#using-builtin-function
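
The "specialize, but fall back if the types differ" idea can be sketched
in pure Python. This is only an illustration under stated assumptions: the
`specialize` decorator and `fast_add_ints` are hypothetical names, and the
sketch does not use the C API that PEP 510 actually proposes.

```python
import functools


def specialize(fast_func, *arg_types):
    """Attach fast_func to a function, guarded by exact argument types.

    Hypothetical helper for illustration; not the PEP 510 API.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Guard: take the fast path only when every positional argument
            # has exactly the expected type.
            if len(args) == len(arg_types) and all(
                type(arg) is expected
                for arg, expected in zip(args, arg_types)
            ):
                return fast_func(*args)
            # Guard failed: fall back to the original (bytecode) function.
            return func(*args)
        return wrapper
    return decorator


def fast_add_ints(a, b):
    # Stand-in for code that would be compiled for (int, int) arguments.
    return a + b


@specialize(fast_add_ints, int, int)
def add(a, b):
    return a + b


print(add(1, 2))      # guard passes, fast path is used
print(add("a", "b"))  # guard fails, generic fallback is used
```

The guard runs on every call, so the specialized code has to be clearly
faster than the guard is expensive for this to pay off.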

Before having something fully automated, we can use some manual steps, such
as manually annotating function types, manually compiling the code, etc.

Victor

On Thursday, 14 April 2016, Stefan Behnel <stefan_ml at behnel.de> wrote:

> +1 from me, too. I'm sure we can make some use of this in Cython.
>
> Stefan
>
>
> Victor Stinner wrote on 14.04.2016 at 17:19:
> > PEP: 509
> > Title: Add a private version to dict
>
>

From stefan_ml at behnel.de  Thu Apr 14 16:34:28 2016
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 14 Apr 2016 22:34:28 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwZkEO3fHPDg=vaHb4FYXBJmEznt3saJzivXCMjRVXpPxQ@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <neohld$sts$1@ger.gmane.org>
 <CAMpsgwZkEO3fHPDg=vaHb4FYXBJmEznt3saJzivXCMjRVXpPxQ@mail.gmail.com>
Message-ID: <neousk$pag$1@ger.gmane.org>

Victor Stinner wrote on 14.04.2016 at 21:56:
> Which kind of usage do you see in Cython?

Mainly caching, I guess. We could avoid global/module name lookups in some
cases, especially inside of loops.
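
A rough pure-Python sketch of that kind of caching follows. The dict
version is not exposed to Python code, so the sketch simulates it with a
hypothetical `VersionedDict` subclass; `CachedLookup` is likewise an
invented name. It only tracks `__setitem__` and `__delitem__`, which is
enough to show the idea.

```python
import itertools

# Process-wide counter standing in for the global dict version.
_global_version = itertools.count(1)


class VersionedDict(dict):
    """dict that takes a fresh global version on each tracked change.

    Illustration only: update(), pop(), etc. are not tracked here.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_global_version)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version = next(_global_version)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_global_version)


class CachedLookup:
    """Cache one name lookup; redo it only when the namespace changed."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.version = None
        self.value = None
        self.misses = 0

    def get(self):
        if self.version != self.namespace.version:
            # Namespace was modified since the last lookup: refresh.
            self.value = self.namespace[self.name]
            self.version = self.namespace.version
            self.misses += 1
        return self.value


namespace = VersionedDict(limit=10)
lookup = CachedLookup(namespace, "limit")
total = sum(lookup.get() for _ in range(1000))  # one real lookup in the loop
namespace["limit"] = 20                         # invalidates the cache
print(total, lookup.get(), lookup.misses)
```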


> Off-topic (PEP 510):
> 
> I really want to experiment with automatically generating Cython code from
> Python code, using profiling to discover function parameter types. PEP 510
> could then be used to attach the fast Cython code to a Python function, with
> a fallback to bytecode if the types are different. See the example of builtin
> functions in the PEP:
> https://www.python.org/dev/peps/pep-0510/#using-builtin-function
> 
> Before having something fully automated, we can use some manual steps, such
> as manually annotating function types, manually compiling the code, etc.

Sounds like Cython's "Fused Types" could help here:

http://docs.cython.org/src/userguide/fusedtypes.html

It's essentially an implementation of generic functions, and you get
dispatch either at compile time or at runtime, depending on where
(Python/Cython) and how you call a function.
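
For readers who do not know Cython, runtime dispatch on the argument type
can be approximated in plain Python with `functools.singledispatch`. This
is only an analogy, not Cython's fused types: there is no compile-time
specialization here, and the `double` function is an invented example.

```python
from functools import singledispatch


@singledispatch
def double(value):
    # Fallback for types without a registered specialization.
    raise TypeError(f"unsupported type: {type(value).__name__}")


@double.register(int)
def _(value):
    return value * 2


@double.register(float)
def _(value):
    return value * 2.0


@double.register(str)
def _(value):
    return value + value


print(double(21), double(1.5), double("ab"))
```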

Stefan



From arigo at tunes.org  Thu Apr 14 16:42:21 2016
From: arigo at tunes.org (Armin Rigo)
Date: Thu, 14 Apr 2016 22:42:21 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
Message-ID: <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>

Hi Victor,

On 14 April 2016 at 17:19, Victor Stinner <victor.stinner at gmail.com> wrote:
> Each time a dictionary is created, the global
> version is incremented and the dictionary version is initialized to the
> global version.

A detail, but why not set the version tag of new empty dictionaries to
zero, always?   Same after a clear().  This would satisfy the
condition: equality of the version tag is supposed to mean "the
dictionary content is precisely the same".


See you soon,

Armin.

From barry at python.org  Thu Apr 14 16:50:51 2016
From: barry at python.org (Barry Warsaw)
Date: Thu, 14 Apr 2016 16:50:51 -0400
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwYj_PZLnOHm42wwTQOweoL4cHB0dibf7ydkcu0eHHi8ww@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAP7+vJ+ZjahgH=pq6XeXzWX8mZ=JTpZz6XM8fMprpPy8erTaEw@mail.gmail.com>
 <CAMpsgwYj_PZLnOHm42wwTQOweoL4cHB0dibf7ydkcu0eHHi8ww@mail.gmail.com>
Message-ID: <20160414165051.341ab547@subdivisions>

On Apr 14, 2016, at 09:49 PM, Victor Stinner wrote:

>It would be nice to hear from Barry Warsaw, who was opposed to the PEP in
>January. He wanted to wait until FAT Python was proven to really be faster,
>which is still not the case right now. (I mean that I didn't run serious
>benchmarks, but early macro benchmarks are not really promising, only micro
>benchmarks. I expect better results when the implementation is more
>complete.)

Although I'm not totally convinced, I won't continue to object.  You've
provided some performance numbers in the PEP even without FAT, and you aren't
exposing the API to Python, so it's not a burden being imposed on other
implementations.

Cheers,
-Barry

From vgr255 at live.ca  Thu Apr 14 17:00:58 2016
From: vgr255 at live.ca (Émanuel Barry)
Date: Thu, 14 Apr 2016 17:00:58 -0400
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
Message-ID: <BLU403-EAS165DDB94D5883CE17E9C4F491970@phx.gbl>

> From Armin Rigo
> Sent: Thursday, April 14, 2016 4:42 PM
> To: Victor Stinner
> Cc: Python Dev
> Subject: Re: [Python-Dev] RFC: PEP 509: Add a private version to dict
> 
> Hi Victor,
> 
> On 14 April 2016 at 17:19, Victor Stinner <victor.stinner at gmail.com> wrote:
> > Each time a dictionary is created, the global
> > version is incremented and the dictionary version is initialized to the
> > global version.
> 
> A detail, but why not set the version tag of new empty dictionaries to
> zero, always?   Same after a clear().  This would satisfy the
> condition: equality of the version tag is supposed to mean "the
> dictionary content is precisely the same".

From Victor's original post:

"Globally unique identifier is a requirement for Yury's patch
optimizing method calls ( https://bugs.python.org/issue26110 ). It
allows to check for free if the dictionary was replaced."

I think it's a good design idea, and the counter is very unlikely to
overflow in practice (I think Victor is using a 64-bit unsigned integer).
I don't think there's really any drawback to using a global counter instead
of a per-dict one (but Victor is better placed to answer that :))
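
A back-of-the-envelope check of the overflow claim, assuming (purely for
illustration) one version increment per nanosecond, sustained without
pause:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_overflow(bits, increments_per_second):
    """Years needed to exhaust an unsigned counter of the given width."""
    return 2 ** bits / increments_per_second / SECONDS_PER_YEAR


# Assumed rate: 1e9 increments per second (one per nanosecond).
print(years_until_overflow(64, 1e9))  # several centuries
print(years_until_overflow(32, 1e9))  # a few seconds, expressed in years
```

With these assumptions a 64-bit counter lasts a bit under 600 years, while
a 32-bit counter would wrap in a few seconds.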

-Emanuel

~Ducks lay where no programmer has ever been~

From victor.stinner at gmail.com  Thu Apr 14 17:17:30 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 23:17:30 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
Message-ID: <CAMpsgwZ4_wn5h_fCEkNKikZ99SWvcUqayEG_Ctj4vyyH1qHnPQ@mail.gmail.com>

Hi,

2016-04-14 22:42 GMT+02:00 Armin Rigo <arigo at tunes.org>:
> Hi Victor,
>
> On 14 April 2016 at 17:19, Victor Stinner <victor.stinner at gmail.com> wrote:
>> Each time a dictionary is created, the global
>> version is incremented and the dictionary version is initialized to the
>> global version.
>
> A detail, but why not set the version tag of new empty dictionaries to
> zero, always?   Same after a clear().  This would satisfy the
> condition: equality of the version tag is supposed to mean "the
> dictionary content is precisely the same".

You're right that incrementing the global version is useless for these
specific cases, and using the version 0 should work. It only matters
that the version (version? version tag?) is different.

I will play with that. If I don't see any issue, I will update the PEP.

It's more of an implementation detail, but it may help to mention it in the PEP.

Victor

From victor.stinner at gmail.com  Thu Apr 14 17:19:24 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 23:19:24 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <20160414165051.341ab547@subdivisions>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAP7+vJ+ZjahgH=pq6XeXzWX8mZ=JTpZz6XM8fMprpPy8erTaEw@mail.gmail.com>
 <CAMpsgwYj_PZLnOHm42wwTQOweoL4cHB0dibf7ydkcu0eHHi8ww@mail.gmail.com>
 <20160414165051.341ab547@subdivisions>
Message-ID: <CAMpsgwazBXmdLTa=gFuT4OE_67n0PoMniSfRc6ai9VMgURMujg@mail.gmail.com>

2016-04-14 22:50 GMT+02:00 Barry Warsaw <barry at python.org>:
> Although I'm not totally convinced, I won't continue to object.  You've
> provided some performance numbers in the PEP even without FAT, and you aren't
> exposing the API to Python, so it's not a burden being imposed on other
> implementations.

Cool!

Ah right, the PEP has evolved since the first version sent to
python-ideas. I didn't recall the full context of the discussion. The
PEP is now more complete and it has more known (future) use cases ;-)
(now maybe also Cython?)

Victor

From barry at python.org  Thu Apr 14 17:29:26 2016
From: barry at python.org (Barry Warsaw)
Date: Thu, 14 Apr 2016 17:29:26 -0400
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwZ4_wn5h_fCEkNKikZ99SWvcUqayEG_Ctj4vyyH1qHnPQ@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
 <CAMpsgwZ4_wn5h_fCEkNKikZ99SWvcUqayEG_Ctj4vyyH1qHnPQ@mail.gmail.com>
Message-ID: <20160414172926.44085562@subdivisions>

On Apr 14, 2016, at 11:17 PM, Victor Stinner wrote:

>You're right that incrementing the global version is useless for these
>specific cases, and using the version 0 should work. It only matters
>that the version (version? version tag?) is different.
>
>I will play with that. If I don't see any issue, I will update the PEP.
>
>It's more an implementation detail, but it may help to mention it in the PEP.

I can see why you might want a global version number, but not doing so would
eliminate an implicit reliance on the GIL, or in a GIL-less implementation
<wink> a lock around incrementing the global version number.
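
What "a lock around incrementing the global version number" might look
like, as a pure-Python sketch. The `GlobalVersion` class and the
`collect_versions` helper are hypothetical; a real implementation would
live in C and would more likely use an atomic increment.

```python
import threading


class GlobalVersion:
    """Process-wide counter whose increment is protected by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def next(self):
        # Read-modify-write must be one critical section, otherwise two
        # threads could hand out the same version.
        with self._lock:
            self._value += 1
            return self._value


def collect_versions(counter, n_threads=4, per_thread=10_000):
    """Draw versions from several threads and return all of them."""
    buckets = [[] for _ in range(n_threads)]

    def worker(bucket):
        for _ in range(per_thread):
            bucket.append(counter.next())

    threads = [
        threading.Thread(target=worker, args=(bucket,)) for bucket in buckets
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [version for bucket in buckets for version in bucket]


versions = collect_versions(GlobalVersion())
print(len(versions) == len(set(versions)))  # every version handed out once
```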

-Barry

From victor.stinner at gmail.com  Thu Apr 14 18:13:21 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 00:13:21 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <20160414172926.44085562@subdivisions>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
 <CAMpsgwZ4_wn5h_fCEkNKikZ99SWvcUqayEG_Ctj4vyyH1qHnPQ@mail.gmail.com>
 <20160414172926.44085562@subdivisions>
Message-ID: <CAMpsgwavyNNLJ0O_sm=U-+2_5Md83O4JeK_+cT5b3UDrxtPJWg@mail.gmail.com>

2016-04-14 23:29 GMT+02:00 Barry Warsaw <barry at python.org>:
> I can see why you might want a global version number, but not doing so would
> eliminate an implicit reliance on the GIL, or in a GIL-less implementation
> <wink> a lock around incrementing the global version number.

It's not like the builtin dict type is going to become GIL-free... So
I think that it's ok to use a global version.

Very few people know it, but the GIL sometimes has advantages...

Victor

From brett at python.org  Thu Apr 14 18:22:04 2016
From: brett at python.org (Brett Cannon)
Date: Thu, 14 Apr 2016 22:22:04 +0000
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwavyNNLJ0O_sm=U-+2_5Md83O4JeK_+cT5b3UDrxtPJWg@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
 <CAMpsgwZ4_wn5h_fCEkNKikZ99SWvcUqayEG_Ctj4vyyH1qHnPQ@mail.gmail.com>
 <20160414172926.44085562@subdivisions>
 <CAMpsgwavyNNLJ0O_sm=U-+2_5Md83O4JeK_+cT5b3UDrxtPJWg@mail.gmail.com>
Message-ID: <CAP1=2W7v1-V+LPw-xYPN9KJ3b78ePEwqk=7VBR_BBzgKu-EELQ@mail.gmail.com>

On Thu, 14 Apr 2016 at 15:14 Victor Stinner <victor.stinner at gmail.com>
wrote:

> 2016-04-14 23:29 GMT+02:00 Barry Warsaw <barry at python.org>:
> > I can see why you might want a global version number, but not doing so would
> > eliminate an implicit reliance on the GIL, or in a GIL-less implementation
> > <wink> a lock around incrementing the global version number.
>
> It's not like the builtin dict type is going to become GIL-free... So
> I think that it's ok to use a global version.
>
> A very few know that, but the GIL has some advantages sometimes...
>

And even if it was GIL-free you do run the risk of two dicts ending up at
the same version # by simply mutating the same number of times if the
counters were per-dict instead of process-wide.
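
The collision described above is easy to reproduce with a simulated
per-dict counter. `PerDictVersion` is a hypothetical class used only for
this demonstration:

```python
class PerDictVersion(dict):
    """dict with its own private mutation counter (illustration only)."""

    def __init__(self):
        super().__init__()
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1


old_namespace = PerDictVersion()
old_namespace["x"] = 1
seen_version = old_namespace.version  # what a guard would remember

# The namespace is swapped for a different dict that happens to have been
# mutated the same number of times.
new_namespace = PerDictVersion()
new_namespace["x"] = 2

# A guard that compares only version numbers is fooled by the swap.
version_only_guard_passes = new_namespace.version == seen_version
print(version_only_guard_passes, old_namespace["x"], new_namespace["x"])
```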

From victor.stinner at gmail.com  Thu Apr 14 18:33:23 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 00:33:23 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAP1=2W7v1-V+LPw-xYPN9KJ3b78ePEwqk=7VBR_BBzgKu-EELQ@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
 <CAMpsgwZ4_wn5h_fCEkNKikZ99SWvcUqayEG_Ctj4vyyH1qHnPQ@mail.gmail.com>
 <20160414172926.44085562@subdivisions>
 <CAMpsgwavyNNLJ0O_sm=U-+2_5Md83O4JeK_+cT5b3UDrxtPJWg@mail.gmail.com>
 <CAP1=2W7v1-V+LPw-xYPN9KJ3b78ePEwqk=7VBR_BBzgKu-EELQ@mail.gmail.com>
Message-ID: <CAMpsgwabYMTYwUEpCX33aQYK5vLtkT3EN747XVmRWw4xJVX2pg@mail.gmail.com>

2016-04-15 0:22 GMT+02:00 Brett Cannon <brett at python.org>:
> And even if it was GIL-free you do run the risk of two dicts ending up at
> the same version # by simply mutating the same number of times if the
> counters were per-dict instead of process-wide.

For some optimizations, there is no need to check whether the dictionary
was replaced, or you can check that directly. In those cases it doesn't
matter if two dictionaries have the same version after the same number of
operations.

For the use case of Yury's optimization, having a globally unique
version tag makes the guard much cheaper, and the guard must check
that the dictionary was not replaced.
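
A sketch of why a globally unique version makes the guard cheap: a single
integer comparison covers both "was the dict modified?" and "was the dict
replaced?". The `GloballyVersionedDict` and `Guard` classes are
hypothetical stand-ins, not the code from Yury's patch.

```python
import itertools

# Process-wide counter: no two dict states ever share a version.
_global_version = itertools.count(1)


class GloballyVersionedDict(dict):
    """dict whose version comes from one global counter (illustration)."""

    def __init__(self):
        super().__init__()
        self.version = next(_global_version)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version = next(_global_version)


class Guard:
    """Remember one version; the check is a single comparison."""

    def __init__(self, namespace):
        self.version = namespace.version

    def check(self, namespace):
        return namespace.version == self.version


first = GloballyVersionedDict()
first["x"] = 1
guard = Guard(first)
unchanged_ok = guard.check(first)

# A different dict with the same content and mutation count is detected.
second = GloballyVersionedDict()
second["x"] = 1
replaced_ok = guard.check(second)

# A modification of the original dict is detected as well.
first["x"] = 2
modified_ok = guard.check(first)
print(unchanged_ok, replaced_ok, modified_ok)
```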

IMHO it's cheap enough to make the version globally unique. I don't
see any technical drawback of having a globally unique version. It
doesn't make the integer overflow much more likely. We are still
talking about many years before an overflow occurs.

--

When we are able to get rid of the GIL for the dict type, we will
probably be able to get an atomic "global_version++" for 64-bit
integers. Right now, I don't think that an atomic int64++ is available
on 32-bit archs.

Victor

From v+python at g.nevcal.com  Thu Apr 14 19:56:55 2016
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Thu, 14 Apr 2016 16:56:55 -0700
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwabYMTYwUEpCX33aQYK5vLtkT3EN747XVmRWw4xJVX2pg@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
 <CAMpsgwZ4_wn5h_fCEkNKikZ99SWvcUqayEG_Ctj4vyyH1qHnPQ@mail.gmail.com>
 <20160414172926.44085562@subdivisions>
 <CAMpsgwavyNNLJ0O_sm=U-+2_5Md83O4JeK_+cT5b3UDrxtPJWg@mail.gmail.com>
 <CAP1=2W7v1-V+LPw-xYPN9KJ3b78ePEwqk=7VBR_BBzgKu-EELQ@mail.gmail.com>
 <CAMpsgwabYMTYwUEpCX33aQYK5vLtkT3EN747XVmRWw4xJVX2pg@mail.gmail.com>
Message-ID: <57102E47.4020005@g.nevcal.com>

On 4/14/2016 3:33 PM, Victor Stinner wrote:
> When we are able to get rid of the GIL for the dict type, we will
> probably be able to get an atomic "global_version++" for 64-bit
> integers. Right now, I don't think that an atomic int64++ is available
> on 32-bit archs.
By the time we get an atomic increment for 64-bit integers, we'll be
wanting it for 128-bit...

From yselivanov.ml at gmail.com  Thu Apr 14 20:06:25 2016
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Thu, 14 Apr 2016 20:06:25 -0400
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
Message-ID: <57103081.5060803@gmail.com>



On 2016-04-14 4:42 PM, Armin Rigo wrote:
> Hi Victor,
>
> On 14 April 2016 at 17:19, Victor Stinner <victor.stinner at gmail.com> wrote:
>> Each time a dictionary is created, the global
>> version is incremented and the dictionary version is initialized to the
>> global version.
> A detail, but why not set the version tag of new empty dictionaries to
> zero, always?   Same after a clear().  This would satisfy the
> condition: equality of the version tag is supposed to mean "the
> dictionary content is precisely the same".

So

{}.version_tag == {}.version_tag == 0
{'a':1}.version_tag != {'a':1}.version_tag

right?

For my patches I need globally unique version tags
(making an exception for empty dicts is OK).
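
The behaviour spelled out above can be modelled with a small simulation.
`TaggedDict` and its `version_tag` attribute are hypothetical; the real
field would be private and only visible from C.

```python
import itertools

# Global counter; 0 is reserved for empty dictionaries.
_next_version = itertools.count(1)


class TaggedDict(dict):
    """dict with version tag 0 when empty, globally unique otherwise.

    Illustration only: just __setitem__ and clear() are tracked.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version_tag = next(_next_version) if self else 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version_tag = next(_next_version)

    def clear(self):
        super().clear()
        self.version_tag = 0


empty_a, empty_b = TaggedDict(), TaggedDict()
full_a, full_b = TaggedDict(a=1), TaggedDict(a=1)
print(empty_a.version_tag == empty_b.version_tag == 0)
print(full_a.version_tag != full_b.version_tag)
```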

Yury


From python at mrabarnett.plus.com  Thu Apr 14 20:11:16 2016
From: python at mrabarnett.plus.com (MRAB)
Date: Fri, 15 Apr 2016 01:11:16 +0100
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
Message-ID: <571031A4.3090308@mrabarnett.plus.com>

On 2016-04-14 21:42, Armin Rigo wrote:
> Hi Victor,
>
> On 14 April 2016 at 17:19, Victor Stinner <victor.stinner at gmail.com> wrote:
>> Each time a dictionary is created, the global
>> version is incremented and the dictionary version is initialized to the
>> global version.
>
> A detail, but why not set the version tag of new empty dictionaries to
> zero, always?   Same after a clear().  This would satisfy the
> condition: equality of the version tag is supposed to mean "the
> dictionary content is precisely the same".
>
If you did that, wouldn't it then be possible to replace an empty dict 
with another empty dict with you noticing? Would that matter?


From stephen at xemacs.org  Thu Apr 14 20:20:42 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 15 Apr 2016 09:20:42 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <570FB650.203@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <570FB650.203@stoneleaf.us>
Message-ID: <22288.13274.270220.803532@turnbull.sk.tsukuba.ac.jp>

Ethan Furman writes:

 > Substitute open() with sending those bytes somewhere else:

Eg, pathlib.Path, which will raise?  Surely it should be safe to pass
a DirEntry to a pathlib constructor?  Note that having Path call
fsdecode implicitly is a bad idea, because we don't know the
provenance of generic bytes.  But by design of __fspath__, its value
(if str) is suitable for passing to Path, for further processing.

 > why should I have to reencode this str back to bytes, when bytes
 > are what I asked for in the first place?

Erm, you didn't *ask* for bytes.  You asked for whatever __fspath__ is
going to give you.  And in many cases, like pathlib, it will be str.
I imagine that doesn't bother you; you plan to use antipathy anyway.
But if there's uptake on the protocol, I'll bet that str-only
implementations are the majority.

And your question also cuts the other way.  Why should *I* have to
decode bytes to str, or suffer unexpected TypeErrors, or deal with the
possibility of TypeErrors, just because __fspath__ is polymorphic?

We're here to improve pathlib.  There's been a huge amount of mission
creep, with no use cases to provide intuition.  You pit your abstract
inconvenience against my 20 years of whack-a-mole with UnicodeErrors
and TypeErrors in Mailman.  I *know* that if you let bytes that
represent text loose inside an application, eventually they'll end up
in a str context and "blooey!"

 > How did this application get a bytes path object to begin with?
 > Either it explicitly used bytes when calling scandir and friends
 > (in which case it shouldn't be surprised to be working with bytes);
 > or it got that bytes object from a database, over-the-wire,
 > an-other-language-lib, etc.

No, it got it from an __fspath__-toting object (such as a DirEntry) it
received from some library, which constructed it polymorphically from
bytes it got from some other place -- and so lost the original
encoding.  That's the scenario I think is impossible to rule out, and
reducing that kind of scenario to the bare minimum is why bytes got
demoted from being the default representation of text in Python 3 in
the first place.

 > If I'm working with bytes, why would I want to work with str?

First, are you actually *working* on those bytes, or are you just
passing them to os functions?  If the latter, you shouldn't care.

Second, because paths are conceptually text (you may not agree, but
Nick inter alia has indicated he does).  Working with bytes paths
(except literals) is a good way to get in trouble, because there are
all kinds of ways they can end up inappropriately encoded.  For
example, the odds are very high that a bytes path read from a file
(including from a zipfile directory) in Japan will be encoded in Shift
JIS.  On Mac OS X, that will either produce mojibake in the directory
(if the access creates the file) or fail to access the intended file,
because the filesystem encoding is UTF-8.
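
The Shift JIS scenario can be reproduced without touching the filesystem.
The file name below is an invented example; `surrogateescape` is the error
handler that `os.fsdecode()` uses on POSIX systems.

```python
name = "日本語.txt"

# Bytes as they might arrive from a file or zip directory written in Japan.
sjis_bytes = name.encode("shift_jis")
# Bytes a UTF-8 filesystem would use for the same name.
utf8_bytes = name.encode("utf-8")

same_bytes = sjis_bytes == utf8_bytes

# Strict decoding with the filesystem encoding fails outright.
try:
    sjis_bytes.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# With surrogateescape the bytes survive a round trip, but the resulting
# str is not the intended name, so it would not match the intended file.
escaped = sjis_bytes.decode("utf-8", errors="surrogateescape")
round_trip = escaped.encode("utf-8", errors="surrogateescape")

print(same_bytes, strict_decode_failed, escaped == name)
print(round_trip == sjis_bytes)
```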

Third, because you want to be portable to Windows, where you have no
choice about whether paths are str or bytes.

These reasons probably don't apply to you with much strength, but the
question is how typical you are, vs. the nearly universal experience
of mojibake and the dominant market share of Windows.

 > Python is a glue language, and Python practitioners don't always
 > have the luxury of working only with text.

For paths?  Of course you can work with them as text.  ISTM what you
really want is the luxury of working only with bytes, because you're
in the habit of pretending they are text.  I don't object to you
having your luxury as long as it doesn't increase risk for my use
cases.  I think you're asking for trouble, and the practice is
definitely nonportable, but consenting adults applies.

However, the proposed polymorphism does create ambiguity and risk for
my uses.  I rarely have the luxury of *not* ensuring paths are text,
regardless of the bytes-ness of the underlying application, because I
can be pretty darn sure that somebody's going to feed me non-
filesystem encodings, and soon.  Even when I am working with bytes
representing paths in the filesystem encoding, I need to convert to
text to read the darn things when debugging!  So I don't consent;
you'll have to impose it on me.


From ethan at stoneleaf.us  Thu Apr 14 21:01:00 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 Apr 2016 18:01:00 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22288.13274.270220.803532@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>	<570C12C2.9000602@stoneleaf.us>	<CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>	<570D1F26.5090800@stoneleaf.us>	<CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>	<CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>	<CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>	<1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>	<570E659C.8010108@stoneleaf.us>	<1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>	<CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>	<CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>	<22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>	<CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>	<22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>	<570FB650.203@stoneleaf.us>
 <22288.13274.270220.803532@turnbull.sk.tsukuba.ac.jp>
Message-ID: <57103D4C.7000001@stoneleaf.us>

On 04/14/2016 05:20 PM, Stephen J. Turnbull wrote:

> However, the proposed polymorphism does create ambiguity and risk for
> my uses.  I rarely have the luxury of *not* ensuring paths are text,
> regardless of the bytes-ness of the underlying application, because I
> can be pretty darn sure that somebody's going to feed me non-
> filesystem encodings, and soon.  Even when I am working with bytes
> representing paths in the filesystem encoding, I need to convert to
> text to read the darn things when debugging!  So I don't consent;
> you'll have to impose it on me.

Hmm.  Well, the good news is you have convinced me that letting bytes 
through willy-nilly is akin to loosing the hounds of hell on our code. 
The bad news is I was never in that camp.  ;)

The camp I'm in is a function* that, by default, will raise if bytes
enters the picture -- but will allow them through if the user
specifically says they are okay with getting bytes.

Would that work for you?

--
~Ethan~

*Or pair of functions, one that is str-only, one that allows both -- but 
I'd rather just have one function with a flag.
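
A minimal sketch of the "one function with a flag" variant. The function
name `checked_fspath` and the `allow_bytes` parameter are hypothetical,
chosen only for illustration; nothing here is a settled API.

```python
def checked_fspath(path, *, allow_bytes=False):
    """Return the path as str; return bytes only if explicitly allowed."""
    if isinstance(path, (str, bytes)):
        result = path
    else:
        try:
            # Look the method up on the type, like other protocols do.
            method = type(path).__fspath__
        except AttributeError:
            raise TypeError(
                "expected str, bytes or an object with __fspath__, "
                f"not {type(path).__name__}"
            ) from None
        result = method(path)

    if isinstance(result, str):
        return result
    if isinstance(result, bytes):
        if allow_bytes:
            return result
        raise TypeError("bytes paths are rejected unless allow_bytes=True")
    raise TypeError(
        f"__fspath__ returned {type(result).__name__}, expected str or bytes"
    )


class StrEntry:
    def __fspath__(self):
        return "spam.txt"


class BytesEntry:
    def __fspath__(self):
        return b"spam.txt"


print(checked_fspath(StrEntry()))
print(checked_fspath(BytesEntry(), allow_bytes=True))
```

Callers that never want bytes get an immediate TypeError at the boundary,
while bytes-aware callers opt in explicitly.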

From brett at python.org  Thu Apr 14 21:42:39 2016
From: brett at python.org (Brett Cannon)
Date: Fri, 15 Apr 2016 01:42:39 +0000
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <571031A4.3090308@mrabarnett.plus.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
 <571031A4.3090308@mrabarnett.plus.com>
Message-ID: <CAP1=2W64LVNxkup6fFfCTVODtxOCVuwtpbFLRyn1nUAaVw3v6g@mail.gmail.com>

On Thu, Apr 14, 2016, 17:14 MRAB <python at mrabarnett.plus.com> wrote:

> On 2016-04-14 21:42, Armin Rigo wrote:
> > Hi Victor,
> >
> > On 14 April 2016 at 17:19, Victor Stinner <victor.stinner at gmail.com> wrote:
> >> Each time a dictionary is created, the global
> >> version is incremented and the dictionary version is initialized to the
> >> global version.
> >
> > A detail, but why not set the version tag of new empty dictionaries to
> > zero, always?   Same after a clear().  This would satisfy the
> > condition: equality of the version tag is supposed to mean "the
> > dictionary content is precisely the same".
> >
> If you did that, wouldn't it then be possible to replace an empty dict
> with another empty dict with you noticing?


If you meant to say "without" then yes.


> Would that matter?

Nope, because this is about versioning content, so having identical/empty
content compare equal is fine.

-brett



From ethan at stoneleaf.us  Fri Apr 15 00:22:07 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 Apr 2016 21:22:07 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <57103D4C.7000001@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>	<570C12C2.9000602@stoneleaf.us>	<CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>	<570D1F26.5090800@stoneleaf.us>	<CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>	<CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>	<CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>	<1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>	<570E659C.8010108@stoneleaf.us>	<1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>	<CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>	<CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>	<22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>	<CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>	<22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>	<570FB650.203@stoneleaf.us>
 <22288.13274.270220.803532@turnbull.sk.tsukuba.ac.jp>
 <57103D4C.7000001@stoneleaf.us>
Message-ID: <57106C6F.7020303@stoneleaf.us>

On 04/14/2016 06:01 PM, Ethan Furman wrote:
> On 04/14/2016 05:20 PM, Stephen J. Turnbull wrote:

>> you'll have to impose it on me.
>
> Hmm.  Well, the good news is you have convinced me that letting bytes
> through willy-nilly is akin to loosing the hounds of hell on our code.
> The bad news is I was never in that camp.  ;)

Actually, in retrospect, I was in that camp at the beginning.  But 
Brett's code (and your arguments, amongst others) convinced me that
<one function with a flag> or <a pair of functions> would be better/safer.

--
~Ethan~

From steve at pearwood.info  Fri Apr 15 00:52:54 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 15 Apr 2016 14:52:54 +1000
Subject: [Python-Dev] Should secrets include a fallback for
 hmac.compare_digest?
Message-ID: <20160415045254.GI1819@ando.pearwood.info>

Now that PEP 506 has been approved, I've checked in the secrets module, 
but an implementation question has come up regarding compare_digest.

Currently, the module tries to import hmac.compare_digest, and if that 
fails, then it falls back to a Python version. But since compare_digest 
has been available since 3.3, I'm now questioning whether the fallback 
is useful at all. Perhaps for alternate Python implementations?

So, two questions:

- should secrets include a fallback?

- if so, what is the preferred way of doing this?

# option 1: fallback if compare_digest is missing

try:
    from hmac import compare_digest
except ImportError:
    def compare_digest(a, b):
        ...


# option 2: "C accelerator idiom"

def compare_digest(a, b):
    ...

try:
    from hmac import compare_digest
except ImportError:
    pass


Option 1 is closer to how I would write hybrid 2/3 code, but option 2 is 
how PEP 399 suggests it should be written.

https://www.python.org/dev/peps/pep-0399/


Currently, hmac imports compare_digest from _operator. There's no Python 
version in operator either. Should there be?



-- 
Steve

From senthil at uthcode.com  Fri Apr 15 01:30:13 2016
From: senthil at uthcode.com (Senthil Kumaran)
Date: Thu, 14 Apr 2016 22:30:13 -0700
Subject: [Python-Dev] Most 3.x buildbots are green again,
 please don't break them and watch them!
In-Reply-To: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
References: <CAMpsgwbHX5NT1gUUduA27Ocv5LLR-FzrPdHYrBLYPTOytYc3Lg@mail.gmail.com>
Message-ID: <CAPOVWOTh7ZJ=z5TwF-VWQsrwTEW9WrCbtSPiqprpCL301YSQ3A@mail.gmail.com>

On Wed, Apr 13, 2016 at 4:40 AM, Victor Stinner <victor.stinner at gmail.com>
wrote:

> Over the last few months, most 3.x buildbots failed randomly. Some of them were
> always failing. I spent some time to fix almost all Windows and Linux
> buildbots. There were a lot of different issues.
>
> So please try not to break the buildbots again, and remember to watch them
> sometimes:
>

Piling in my thanks again, Victor. This is a great gesture from you to fix
all the build bots.
Keeping them stable is a proper thing to do and should be expected from all
committers.

--
Senthil

From stefan_ml at behnel.de  Fri Apr 15 01:39:03 2016
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 15 Apr 2016 07:39:03 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwabYMTYwUEpCX33aQYK5vLtkT3EN747XVmRWw4xJVX2pg@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
 <CAMpsgwZ4_wn5h_fCEkNKikZ99SWvcUqayEG_Ctj4vyyH1qHnPQ@mail.gmail.com>
 <20160414172926.44085562@subdivisions>
 <CAMpsgwavyNNLJ0O_sm=U-+2_5Md83O4JeK_+cT5b3UDrxtPJWg@mail.gmail.com>
 <CAP1=2W7v1-V+LPw-xYPN9KJ3b78ePEwqk=7VBR_BBzgKu-EELQ@mail.gmail.com>
 <CAMpsgwabYMTYwUEpCX33aQYK5vLtkT3EN747XVmRWw4xJVX2pg@mail.gmail.com>
Message-ID: <nepupo$5qn$1@ger.gmane.org>

Victor Stinner wrote on 15.04.2016 at 00:33:
> 2016-04-15 0:22 GMT+02:00 Brett Cannon:
>> And even if it was GIL-free you do run the risk of two dicts ending up at
>> the same version # by simply mutating the same number of times if the
>> counters were per-dict instead of process-wide.
> 
> For some optimizations, it is not needed to check if the dictionary
> was replaced, or you check it directly. So it doesn't matter to have
> the same version with the same number of operations.
> 
> For the use case of Yury's optimization, having a globally unique
> version tag makes the guard much cheaper, and the guard must check
> that the dictionary was not replaced.

How can that be achieved? If the tag is just a sequentially growing number,
creating two dicts and applying one operation to the first one should give
both the same version tag, right?

Stefan



From ncoghlan at gmail.com  Fri Apr 15 03:11:35 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 15 Apr 2016 17:11:35 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <CAMpsgwbD=y0OD=e0aSv-fWjV3HJRcCwE9v83CqQ+8TY6Z7tv3Q@mail.gmail.com>
 <CADiSq7fEzV7Zm5J_KHoU_+-mgr1jtSBt3_VP78cL9KgMd_6N7g@mail.gmail.com>
 <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com>
 <CAPTjJmofbDW5ptfFb11+PVLvjrnUXY+sbK0bu2-2B7H=ZXdMew@mail.gmail.com>
 <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com>
Message-ID: <CADiSq7cSygrSb1BAyXWow-uY-wgbkCxeqCKJ+7Et2MRQs_EFjw@mail.gmail.com>

On 15 April 2016 at 00:01, Random832 <random832 at fastmail.com> wrote:
> On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote:
>> Adding integers and floats is considered "safe" because most people's
>> use of floats completely compasses their use of ints. (You'll get
>> OverflowError if it can't be represented.) But float and Decimal are
>> considered "unsafe":
>>
>> >>> 1.5 + decimal.Decimal("1.5")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: unsupported operand type(s) for +: 'float' and
>> 'decimal.Decimal'
>>
>> This is more what's happening here. Floats and Decimals can represent
>> similar sorts of things, but with enough incompatibilities that you
>> can't simply merge them.
>
> And what such incompatibilities exist between bytes and str for the
> purpose of representing file paths? At the end of the day, there's
> exactly one answer to "what file on disk this represents (or would
> represent if it existed)".

Bytes paths on Windows are encoded as mbcs for use with the ASCII-only
Windows APIs, and hence don't support the full range of characters
that str does. The colloquial shorthand for that is "bytes paths don't
work properly on Windows" (the more strictly accurate description is
"bytes paths only work correctly on Windows if every code point in the
path can be encoded using the 'mbcs' codec").

Even on *nix, os.fsencode may fail outright if the system is
configured to use a non-universal encoding, while os.fsdecode may
pollute the resulting string with surrogate escaped characters.

Regardless of platform, if somebody hands you *mixed* bytes and str
data, the appropriate default reaction is to complain about it rather
than assume they meant one or the other. That complaint may take one
of two forms:

- for a high level, platform independent API, bytes should just be
rejected outright
- for a low level API with input type dependent behaviour, the input
should be rejected as ambiguous - the API doesn't know whether the str
behaviour or the bytes behaviour is the intended one

pathlib falls into the first category - it just rejects bytes as input.
os.path.join falls into the second category - all str is fine, and all
bytes is fine, but mixing them fails.

However, once somebody reaches for the coercion APIs (fsdecode and
fsencode), they're now *explicitly* telling the interpreter what they
want, since there's no ambiguity about the possible return types from
those functions.

In relation to Victor's comment about this being complex code to show
to a novice:

  os.path.join(*map(os.fsdecode, ("str", b"bytes")))

I agree, but also think that's a good reason for people to switch to
teaching novices pathlib rather than os.path, and letting them
discover the underlying libraries as required by the code and examples
they encounter.
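
A minimal sketch of the two rejection behaviours and the explicit coercion
described above. The path components are made up for illustration, and the
sketch assumes a Python 3 interpreter where both calls raise TypeError as
described:

```python
import os
from pathlib import PurePosixPath

# Low-level API: all-str and all-bytes are both fine.
print(os.path.join("dir", "file.txt"))
print(os.path.join(b"dir", b"file.txt"))

# Mixing the two is rejected as ambiguous.
try:
    os.path.join("dir", b"file.txt")
except TypeError as exc:
    mixed_error = type(exc).__name__
    print("os.path.join rejected mixed input:", mixed_error)

# High-level API: pathlib rejects bytes outright.
try:
    PurePosixPath(b"dir")
except TypeError as exc:
    pathlib_error = type(exc).__name__
    print("pathlib rejected bytes:", pathlib_error)

# Explicit coercion removes the ambiguity: everything becomes str.
parts = ("dir", b"file.txt")
joined = os.path.join(*map(os.fsdecode, parts))
print(joined)
```

Using os.fsencode instead of os.fsdecode in the last step would give an
all-bytes result; either way the caller has stated which type is wanted.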

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From victor.stinner at gmail.com  Fri Apr 15 04:20:48 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 10:20:48 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <nepupo$5qn$1@ger.gmane.org>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
 <CAMpsgwZ4_wn5h_fCEkNKikZ99SWvcUqayEG_Ctj4vyyH1qHnPQ@mail.gmail.com>
 <20160414172926.44085562@subdivisions>
 <CAMpsgwavyNNLJ0O_sm=U-+2_5Md83O4JeK_+cT5b3UDrxtPJWg@mail.gmail.com>
 <CAP1=2W7v1-V+LPw-xYPN9KJ3b78ePEwqk=7VBR_BBzgKu-EELQ@mail.gmail.com>
 <CAMpsgwabYMTYwUEpCX33aQYK5vLtkT3EN747XVmRWw4xJVX2pg@mail.gmail.com>
 <nepupo$5qn$1@ger.gmane.org>
Message-ID: <CAMpsgwYp8xe2x3-yOLjyOA3Kux1A4zKJghqOj4gqvi5w78fJUA@mail.gmail.com>

On Friday, 15 April 2016, Stefan Behnel <stefan_ml at behnel.de> wrote:

> How can that be achieved? If the tag is just a sequentially growing number,
> creating two dicts and applying one operation to the first one should give
> both the same version tag, right?
>

Armin didn't propose to get rid of the global version.

a = dict() # version = 0
b = dict() # version = 0
a['key'] = 'value' # version = 300
b['key'] = 'value' # version = 301
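
The same behaviour can be modelled in a toy Python class. This is only a
sketch: VersionedDict and _global_version are hypothetical names for
illustration, not CPython internals, and the class assumes the variant in
which new and cleared dictionaries get version 0:

```python
# Toy model only: these names are hypothetical, not CPython internals.
_global_version = 0


class VersionedDict:
    """Mapping wrapper whose version tag comes from one global counter."""

    def __init__(self):
        self._data = {}
        self.version = 0  # assumed variant: new empty dicts start at zero

    def __setitem__(self, key, value):
        global _global_version
        # It is always the global counter that advances, never a
        # per-dict counter, so two mutated dicts cannot share a tag.
        _global_version += 1
        self.version = _global_version
        self._data[key] = value

    def clear(self):
        self._data.clear()
        self.version = 0  # assumed variant: empty again, tag resets


a = VersionedDict()
b = VersionedDict()
a["key"] = "value"
b["key"] = "value"
print(a.version, b.version)  # the tags differ after identical operations
```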

Victor
PS: It looks like the iPad Gmail app forces me to use HTML, I don't know how
to use plain text :-/

From victor.stinner at gmail.com  Fri Apr 15 04:26:31 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 10:26:31 +0200
Subject: [Python-Dev] Should secrets include a fallback for
 hmac.compare_digest?
In-Reply-To: <20160415045254.GI1819@ando.pearwood.info>
References: <20160415045254.GI1819@ando.pearwood.info>
Message-ID: <CAMpsgwaT2GP=4qqr5zRESYZjcxit35Uy1_7ByKXAer6YBGt2eg@mail.gmail.com>

It's easy to implement this function (in the native language of your Python
implementation), it's short. I'm not sure that a Python version is really
safe.

The secrets module is for Python 3.6; in that version the hmac module already
"requires" the compare_digest() function, no?

Victor

From antoine at python.org  Fri Apr 15 05:01:00 2016
From: antoine at python.org (Antoine Pitrou)
Date: Fri, 15 Apr 2016 09:01:00 +0000 (UTC)
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
 <CAMpsgwZ4_wn5h_fCEkNKikZ99SWvcUqayEG_Ctj4vyyH1qHnPQ@mail.gmail.com>
Message-ID: <loom.20160415T105902-955@post.gmane.org>

Victor Stinner <victor.stinner <at> gmail.com> writes:
> 
> Hi,
> 
> 2016-04-14 22:42 GMT+02:00 Armin Rigo <arigo <at> tunes.org>:
> > Hi Victor,
> >
> > On 14 April 2016 at 17:19, Victor Stinner <victor.stinner <at> gmail.com> wrote:
> >> Each time a dictionary is created, the global
> >> version is incremented and the dictionary version is initialized to the
> >> global version.
> >
> > A detail, but why not set the version tag of new empty dictionaries to
> > zero, always?   Same after a clear().  This would satisfy the
> > condition: equality of the version tag is supposed to mean "the
> > dictionary content is precisely the same".
> 
> You're right that incrementing the global version is useless for these
> specific cases, and using the version 0 should work. It only matters
> that the version (version? version tag?) is different.

Why do this? It's a nice property that two dicts always have different
version tags, and now you're killing this property for... no obvious
reason?

Do you really think dict.clear() is in need of micro-optimizing a
couple CPU cycles away?

Regards

Antoine.



From stefan_ml at behnel.de  Fri Apr 15 05:03:21 2016
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 15 Apr 2016 11:03:21 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwYp8xe2x3-yOLjyOA3Kux1A4zKJghqOj4gqvi5w78fJUA@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
 <CAMpsgwZ4_wn5h_fCEkNKikZ99SWvcUqayEG_Ctj4vyyH1qHnPQ@mail.gmail.com>
 <20160414172926.44085562@subdivisions>
 <CAMpsgwavyNNLJ0O_sm=U-+2_5Md83O4JeK_+cT5b3UDrxtPJWg@mail.gmail.com>
 <CAP1=2W7v1-V+LPw-xYPN9KJ3b78ePEwqk=7VBR_BBzgKu-EELQ@mail.gmail.com>
 <CAMpsgwabYMTYwUEpCX33aQYK5vLtkT3EN747XVmRWw4xJVX2pg@mail.gmail.com>
 <nepupo$5qn$1@ger.gmane.org>
 <CAMpsgwYp8xe2x3-yOLjyOA3Kux1A4zKJghqOj4gqvi5w78fJUA@mail.gmail.com>
Message-ID: <neqaoq$tic$1@ger.gmane.org>

Victor Stinner wrote on 15.04.2016 at 10:20:
> On Friday, 15 April 2016, Stefan Behnel wrote:
> 
>> How can that be achieved? If the tag is just a sequentially growing number,
>> creating two dicts and applying one operation to the first one should give
>> both the same version tag, right?
>>
> 
> Armin didn't propose to get rid of the global version.
> 
> a = dict() # version = 0
> b = dict() # version = 0
> a['key'] = 'value' # version = 300
> b['key'] = 'value' # version = 301

Ah, sorry, should have read the PEP more closely. It's *always* the global
version that gets incremented. Then yes, that's a safe point of distinction
for dicts and their status.

Stefan



From steve at pearwood.info  Fri Apr 15 05:21:55 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 15 Apr 2016 19:21:55 +1000
Subject: [Python-Dev] Should secrets include a fallback for
 hmac.compare_digest?
In-Reply-To: <CAMpsgwaT2GP=4qqr5zRESYZjcxit35Uy1_7ByKXAer6YBGt2eg@mail.gmail.com>
References: <20160415045254.GI1819@ando.pearwood.info>
 <CAMpsgwaT2GP=4qqr5zRESYZjcxit35Uy1_7ByKXAer6YBGt2eg@mail.gmail.com>
Message-ID: <20160415092155.GK1819@ando.pearwood.info>

On Fri, Apr 15, 2016 at 10:26:31AM +0200, Victor Stinner wrote:
> It's easy to implement this function (in the native language of your Python
> implementation), it's short. I'm not sure that a Python version is really
> safe.
> 
> The secrets module is for Python 3.6, in this version the hmac already
> "requires" the compare_digest() function no?

The current version looks like this:

try:
   from hmac import compare_digest
except ImportError:
   # fallback version defined


but I'm having second thoughts about this. I don't think it needs to 
support older versions of Python, but perhaps it needs to support 
implementations which don't include compare_digest?

This isn't just a question about the secrets module. PEP 399 suggests 
that any C classes/functions should have a pure Python version as 
fallback, but compare_digest doesn't. I don't know whether it should or 
not.

https://www.python.org/dev/peps/pep-0399/



-- 
Steve

From victor.stinner at gmail.com  Fri Apr 15 05:34:44 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 11:34:44 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <loom.20160415T105902-955@post.gmane.org>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
 <CAMpsgwZ4_wn5h_fCEkNKikZ99SWvcUqayEG_Ctj4vyyH1qHnPQ@mail.gmail.com>
 <loom.20160415T105902-955@post.gmane.org>
Message-ID: <CAMpsgwYV3LaSeJJ7WDR6iP7U5fS34gGawV6uTmsjLufuCsA9DA@mail.gmail.com>

2016-04-15 11:01 GMT+02:00 Antoine Pitrou <antoine at python.org>:
> Victor Stinner <victor.stinner <at> gmail.com> writes:
>> You're right that incrementing the global version is useless for these
>> specific cases, and using the version 0 should work. It only matters
>> that the version (version? version tag?) is different.
>
> Why do this? It's a nice property that two dicts always have different
> version tags, and now you're killing this property for... no obvious
> reason?

I guess that the reason is to reduce *a little bit* the risk of
integer overflow (especially the bug where a guard doesn't see a change
because new_version is equal to old_version modulo 2**64).
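
A small sketch of that wrap-around case, assuming a 64-bit unsigned
version counter. The starting value and the rate of one increment per
nanosecond are hypothetical, chosen only to give a sense of scale:

```python
UINT64_MODULUS = 2 ** 64


def stored_tag(version):
    """Value actually kept in an assumed 64-bit unsigned field."""
    return version % UINT64_MODULUS


old_version = 41                            # hypothetical starting tag
new_version = old_version + UINT64_MODULUS  # exactly 2**64 increments later

# A guard comparing the stored tags would not notice any change.
guard_fooled = stored_tag(new_version) == stored_tag(old_version)
print("guard fooled:", guard_fooled)

# Rough scale, assuming (hypothetically) one increment per nanosecond.
seconds_to_wrap = UINT64_MODULUS / 1e9
years_to_wrap = seconds_to_wrap / (365.25 * 24 * 3600)
print("years until wrap-around: about", round(years_to_wrap))
```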

> Do you really think dict.clear() is in need of micro-optimizing a
> couple CPU cycles away?

The advantage of having a different version for empty dict is to be
able to use the version to check that they are different. Using the
dictionary pointer is not enough, since it's common that a new
dictionary gets the address of a previously destroyed dictionary. This
case can be avoided if you keep dictionaries alive by keeping a strong
reference, but there are good reasons to not keep a strong reference.

Victor

From victor.stinner at gmail.com  Fri Apr 15 05:35:56 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 11:35:56 +0200
Subject: [Python-Dev] Should secrets include a fallback for
 hmac.compare_digest?
In-Reply-To: <20160415092155.GK1819@ando.pearwood.info>
References: <20160415045254.GI1819@ando.pearwood.info>
 <CAMpsgwaT2GP=4qqr5zRESYZjcxit35Uy1_7ByKXAer6YBGt2eg@mail.gmail.com>
 <20160415092155.GK1819@ando.pearwood.info>
Message-ID: <CAMpsgwYc8Tyw5RTehY8WDcTwkyHG5aTu2=tncniWM4JxH75WSg@mail.gmail.com>

2016-04-15 11:21 GMT+02:00 Steven D'Aprano <steve at pearwood.info>:
> This isn't just a question about the secrets module. PEP 399 suggests
> that any C classes/functions should have a pure Python version as
> fallback, but compare_digest doesn't. I don't know whether it should or
> not.

The hmac module is responsible for providing a fallback, not the secrets module.

Victor

From victor.stinner at gmail.com  Fri Apr 15 05:39:02 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 11:39:02 +0200
Subject: [Python-Dev] PEP 506 secrets module
In-Reply-To: <CAP7+vJJMx3+34xVxGiRPnN0jsDhtmb6cS6MEO=7xvKtsSUzRPQ@mail.gmail.com>
References: <20151016005711.GC11980@ando.pearwood.info>
 <CAP1=2W6p3CK-_ksvf7Jxk9UC2ugniraqkRGvv2FGrVxnvuhv=Q@mail.gmail.com>
 <CAP7+vJ+7cR3do5LOApqFjR+PvavKdOpA2Q0rLC0pYAAS4LGwKQ@mail.gmail.com>
 <20160410050845.GA12526@ando.pearwood.info>
 <CAP7+vJLph9pxOuL_19pTP_BakggnrJdos2f8OOjwZmAp1mJCnA@mail.gmail.com>
 <20160411175036.GA1819@ando.pearwood.info>
 <CAP7+vJJMx3+34xVxGiRPnN0jsDhtmb6cS6MEO=7xvKtsSUzRPQ@mail.gmail.com>
Message-ID: <CAMpsgwbxakgAMy_eQFj5pL_tFcc40_TqMd9ctY=s44Jq2VoMAA@mail.gmail.com>

Hi,

Would it make sense to add a function to generate a random UUID4 (as a
string) in secrets?

The current implementation in uuid.py of CPython 3.6 already uses os.urandom():

def uuid4():
    """Generate a random UUID."""
    return UUID(bytes=os.urandom(16), version=4)

Victor

From p.f.moore at gmail.com  Fri Apr 15 05:55:38 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 15 Apr 2016 10:55:38 +0100
Subject: [Python-Dev] Should secrets include a fallback for
 hmac.compare_digest?
In-Reply-To: <CAMpsgwYc8Tyw5RTehY8WDcTwkyHG5aTu2=tncniWM4JxH75WSg@mail.gmail.com>
References: <20160415045254.GI1819@ando.pearwood.info>
 <CAMpsgwaT2GP=4qqr5zRESYZjcxit35Uy1_7ByKXAer6YBGt2eg@mail.gmail.com>
 <20160415092155.GK1819@ando.pearwood.info>
 <CAMpsgwYc8Tyw5RTehY8WDcTwkyHG5aTu2=tncniWM4JxH75WSg@mail.gmail.com>
Message-ID: <CACac1F8yivVaTwiC1ws--2wWdPQY3K5ODieQipiXur4V1mG7Lg@mail.gmail.com>

On 15 April 2016 at 10:35, Victor Stinner <victor.stinner at gmail.com> wrote:
> 2016-04-15 11:21 GMT+02:00 Steven D'Aprano <steve at pearwood.info>:
>> This isn't just a question about the secrets module. PEP 399 suggests
>> that any C classes/functions should have a pure Python version as
>> fallback, but compare_digest doesn't. I don't know whether it should or
>> not.
>
> The hmac module is responsible for providing a fallback, not the secrets module.

Agreed. The library docs state that the hmac module provides
compare_digest, so you are therefore entitled to unconditionally
import it (just as end user code would).

Paul

From ncoghlan at gmail.com  Fri Apr 15 06:16:53 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 15 Apr 2016 20:16:53 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>

On 15 April 2016 at 00:52, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Nick Coghlan writes:
>
>  > The use case for returning bytes from __fspath__ is DirEntry, so you
>  > can write things like this in low level code:
>  >
>  >     def myscandir(dirpath):
>  >         for entry in os.scandir(dirpath):
>  >             if entry.is_file():
>  >                 with open(entry) as f:
>  >                     # do something
>
> Excuse me, but that is *not* a use case for returning bytes from
> DirEntry.__fspath__.  open() is perfectly happy taking str (including
> surrogate-encoded rawbytes).

That results in a different type for the file object's name:

>>> open("README.md").name
'README.md'
>>> open(b"README.md").name
b'README.md'

Implicitly level shifting in a low level API isn't a good thing,
especially when there are idempotent level shifting commands available
(so you can always ensure a given value is on the level you expect,
even if you don't know which level it was on originally).
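
A minimal sketch of what "idempotent level shifting" means here, using
os.fsdecode and os.fsencode with a made-up ASCII file name. Each function
returns its input unchanged when the value is already on the requested
level:

```python
import os

for name in ("README.md", b"README.md"):
    as_text = os.fsdecode(name)   # always str, whatever came in
    as_bytes = os.fsencode(name)  # always bytes, whatever came in

    # Applying the same shift a second time changes nothing.
    assert os.fsdecode(as_text) == as_text
    assert os.fsencode(as_bytes) == as_bytes

    print(type(name).__name__, "->", repr(as_text), repr(as_bytes))
```

Code that needs a particular level can call the matching function
unconditionally instead of first checking the input type.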

I completely agree with you that folks working with text in the binary
domain are asking for trouble, but at the same time, that's the
reality of the way a lot of *nix system interfaces operate. The
guarantee we want to provide those folks is that if they're operating
in the binary domain they'll stay there unless they explicitly shift
out of it using a decoding API of some kind - doing it behind their
back would be akin to implicitly shifting from the time domain to the
frequency domain in an engineering library.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Fri Apr 15 06:42:06 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 15 Apr 2016 20:42:06 +1000
Subject: [Python-Dev] PEP 506 secrets module
In-Reply-To: <CAMpsgwbxakgAMy_eQFj5pL_tFcc40_TqMd9ctY=s44Jq2VoMAA@mail.gmail.com>
References: <20151016005711.GC11980@ando.pearwood.info>
 <CAP1=2W6p3CK-_ksvf7Jxk9UC2ugniraqkRGvv2FGrVxnvuhv=Q@mail.gmail.com>
 <CAP7+vJ+7cR3do5LOApqFjR+PvavKdOpA2Q0rLC0pYAAS4LGwKQ@mail.gmail.com>
 <20160410050845.GA12526@ando.pearwood.info>
 <CAP7+vJLph9pxOuL_19pTP_BakggnrJdos2f8OOjwZmAp1mJCnA@mail.gmail.com>
 <20160411175036.GA1819@ando.pearwood.info>
 <CAP7+vJJMx3+34xVxGiRPnN0jsDhtmb6cS6MEO=7xvKtsSUzRPQ@mail.gmail.com>
 <CAMpsgwbxakgAMy_eQFj5pL_tFcc40_TqMd9ctY=s44Jq2VoMAA@mail.gmail.com>
Message-ID: <CADiSq7e9aKtO464xk9duL94BVdAwNVGHjPCK2WB3MoejHgC+gQ@mail.gmail.com>

On 15 April 2016 at 19:39, Victor Stinner <victor.stinner at gmail.com> wrote:
> Hi,
>
> Would it make sense to add a function to generate a random UUID4 (as a
> string) in secrets?
>
> The current implementation in uuid.py of CPython 3.6 already uses os.urandom():
>
> def uuid4():
>     """Generate a random UUID."""
>     return UUID(bytes=os.urandom(16), version=4)

I don't think so, as folks looking to generate a UUID specifically are
already likely to end up at the uuid module docs rather than trying to
craft their own based on the random module (and the uuid module
already does the right thing, and it would be a bug if it didn't).

The new secrets module fills the gap for cases where random is
otherwise an attractive nuisance by making it easy to say "use this
instead".
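
A short sketch of that division of labour, assuming Python 3.6 or later
where the secrets module is available. The variable names and token sizes
are illustrative only:

```python
import secrets
import uuid

# Someone who specifically wants a UUID goes to the uuid module.
request_id = str(uuid.uuid4())

# Someone who wants an unguessable token goes to secrets, not random.
session_token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
reset_code = secrets.token_hex(16)         # 16 random bytes as hex digits

print(request_id)
print(session_token)
print(reset_code)
```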

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Fri Apr 15 06:48:44 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 15 Apr 2016 20:48:44 +1000
Subject: [Python-Dev] Should secrets include a fallback for
 hmac.compare_digest?
In-Reply-To: <20160415045254.GI1819@ando.pearwood.info>
References: <20160415045254.GI1819@ando.pearwood.info>
Message-ID: <CADiSq7erjGOxxrZxauS3++X1FmghRBfmQ6=F9Zg54hQDTQCXDg@mail.gmail.com>

On 15 April 2016 at 14:52, Steven D'Aprano <steve at pearwood.info> wrote:
> Now that PEP 506 has been approved, I've checked in the secrets module,
> but an implementation question has come up regarding compare_digest.
>
> Currently, the module tries to import hmac.compare_digest, and if that
> fails, then it falls back to a Python version. But since compare_digest
> has been available since 3.3, I'm now questioning whether the fallback
> is useful at all. Perhaps for alternate Python implementations?
>
> So, two questions:
>
> - should secrets include a fallback?

It definitely *shouldn't* include a fallback, as the function needs to
be written in C (or some other not-normal-Python-code language) in
order to provide the appropriate timing guarantees.

We added hmac.compare_digest in response to Python web frameworks
providing their own pure Python "constant time" comparison functions
that were nevertheless still subject to remote timing attacks.

I'd forgotten about the hmac vs operator indirection, but it's still
better to import the public API from hmac (since
operator._compare_digest is a Python implementation detail, and you
may as well make it easy to extract the secrets module for use in
earlier versions - 2.7 also gained hmac.compare_digest as part of PEP
466).
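
A minimal sketch of the unconditional import recommended above. The helper
token_matches and the token values are hypothetical, and this is not the
actual code of the secrets module:

```python
# Import the public API unconditionally; no pure-Python fallback.
from hmac import compare_digest


def token_matches(supplied: str, expected: str) -> bool:
    """Hypothetical helper: compare two tokens via hmac.compare_digest."""
    # compare_digest accepts either two ASCII-only str objects or two
    # bytes-like objects; encoding first keeps the call valid for any text.
    return compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


print(token_matches("s3cret-token", "s3cret-token"))
print(token_matches("s3cret-token", "s3cret-tokem"))
```

An ordinary == comparison may stop at the first differing character, which
is the kind of timing difference the C implementation is meant to avoid.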

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From status at bugs.python.org  Fri Apr 15 12:08:25 2016
From: status at bugs.python.org (Python tracker)
Date: Fri, 15 Apr 2016 18:08:25 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20160415160825.E7ECE5667A@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2016-04-08 - 2016-04-15)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    5489 (+12)
  closed 33039 (+46)
  total  38528 (+58)

Open issues with patches: 2381 


Issues opened (45)
==================

#11205: Evaluation order of dictionary display is different from refer
http://bugs.python.org/issue11205  reopened by ncoghlan

#25609: Add a ContextManager ABC and type
http://bugs.python.org/issue25609  reopened by brett.cannon

#25731: Assigning and deleting __new__ attr on the class does not allo
http://bugs.python.org/issue25731  reopened by barry

#26673: Tkinter error when opening IDLE configuration menu
http://bugs.python.org/issue26673  reopened by terry.reedy

#26716: EINTR handling in fcntl
http://bugs.python.org/issue26716  opened by Jack Zhou

#26717: wsgiref.simple_server: mojibake with cp1252 bytes in PATH_INFO
http://bugs.python.org/issue26717  opened by Anthony Sottile

#26720: memoryview from BufferedWriter becomes garbage
http://bugs.python.org/issue26720  opened by martin.panter

#26721: Avoid socketserver.StreamRequestHandler.wfile doing partial wr
http://bugs.python.org/issue26721  opened by martin.panter

#26724: Serialize dict with non-string keys to JSON - unexpected res
http://bugs.python.org/issue26724  opened by anton-ryzhov

#26726: Incomplete Internationalization in Argparse Module
http://bugs.python.org/issue26726  opened by IronGrid

#26728: make pdb.set_trace() accept debugger commands as arguments and
http://bugs.python.org/issue26728  opened by irdb

#26729: Incorrect __text_signature__ for sorted
http://bugs.python.org/issue26729  opened by eriknw

#26730: SpooledTemporaryFile doesn't correctly preserve data for text 
http://bugs.python.org/issue26730  opened by James Hennessy

#26731: subprocess on windows leaks stdout/stderr handle to child proc
http://bugs.python.org/issue26731  opened by saifujinaro

#26732: multiprocessing sentinel resource leak
http://bugs.python.org/issue26732  opened by quick-b

#26733: staticmethod and classmethod are ignored when disassemble clas
http://bugs.python.org/issue26733  opened by xiang.zhang

#26736: Use HTTPS protocol in links
http://bugs.python.org/issue26736  opened by serhiy.storchaka

#26739: idle: Errno 10035 a non-blocking socket operation could not be
http://bugs.python.org/issue26739  opened by MICHAEL JACOBSON

#26740: tarfile: accessing (listing and extracting) tarball fails with
http://bugs.python.org/issue26740  opened by Tomas Tomecek

#26741: subprocess.Popen should emit a ResourceWarning in destructor i
http://bugs.python.org/issue26741  opened by haypo

#26742: imports in test_warnings changes warnings.filters
http://bugs.python.org/issue26742  opened by haypo

#26743: Unable to import random with python2.7 on power pc based machi
http://bugs.python.org/issue26743  opened by ragreddy

#26744: print() function hangs on MS-Windows 10
http://bugs.python.org/issue26744  opened by Ma Lin

#26745: Redundant code in _PyObject_GenericSetAttrWithDict
http://bugs.python.org/issue26745  opened by xiang.zhang

#26746: struct.pack(): trailing padding bytes on x64
http://bugs.python.org/issue26746  opened by skrah

#26750: Mock autospec does not work with subclasses of property()
http://bugs.python.org/issue26750  opened by amaury.forgeotdarc

#26751: Possible bug in sorting algorithm
http://bugs.python.org/issue26751  opened by David.Manowitz

#26752: Mock(2.0.0).assert_has_calls() raise AssertionError in two sam
http://bugs.python.org/issue26752  opened by jekin000

#26753: Obmalloc lock LOCK_INIT and LOCK_FINI are never used
http://bugs.python.org/issue26753  opened by larry

#26754: PyUnicode_FSDecoder() accepts arbitrary iterable
http://bugs.python.org/issue26754  opened by serhiy.storchaka

#26755: Update version{added,changed} docs in devguide
http://bugs.python.org/issue26755  opened by berker.peksag

#26756: fileinput handling of unicode errors from standard input
http://bugs.python.org/issue26756  opened by jmb236

#26757: test_urllib2net.test_http_basic() timeout after 15 min on
http://bugs.python.org/issue26757  opened by haypo

#26758: Unnecessary format string handling for no argument slot wrappe
http://bugs.python.org/issue26758  opened by josh.r

#26759: PyBytes_FromObject accepts arbitrary iterable
http://bugs.python.org/issue26759  opened by serhiy.storchaka

#26760: Document PyFrameObject
http://bugs.python.org/issue26760  opened by brett.cannon

#26762: test_multiprocessing_spawn leaves processes running in backgro
http://bugs.python.org/issue26762  opened by martin.panter

#26763: Update PEP-8 regarding binary operators
http://bugs.python.org/issue26763  opened by IanLee1521

#26764: SystemError in bytes.__rmod__
http://bugs.python.org/issue26764  opened by serhiy.storchaka

#26765: Factor out common bytes and bytearray implementation
http://bugs.python.org/issue26765  opened by serhiy.storchaka

#26766: The result type of bytearray formatting is not stable
http://bugs.python.org/issue26766  opened by berker.peksag

#26767: Inconsistant error messages for failed attribute modification
http://bugs.python.org/issue26767  opened by serhiy.storchaka

#26769: Python 2.7: make private file descriptors non inheritable
http://bugs.python.org/issue26769  opened by haypo

#26770: _Py_set_inheritable(): do nothing if the FD_CLOEXEC close is a
http://bugs.python.org/issue26770  opened by haypo

#26771: python-config.sh.in INCDIR does not match python version if ex
http://bugs.python.org/issue26771  opened by benzea



Most recent 15 issues with no replies (15)
==========================================

#26771: python-config.sh.in INCDIR does not match python version if ex
http://bugs.python.org/issue26771

#26769: Python 2.7: make private file descriptors non inheritable
http://bugs.python.org/issue26769

#26767: Inconsistant error messages for failed attribute modification
http://bugs.python.org/issue26767

#26765: Factor out common bytes and bytearray implementation
http://bugs.python.org/issue26765

#26760: Document PyFrameObject
http://bugs.python.org/issue26760

#26758: Unnecessary format string handling for no argument slot wrappe
http://bugs.python.org/issue26758

#26752: Mock(2.0.0).assert_has_calls() raise AssertionError in two sam
http://bugs.python.org/issue26752

#26750: Mock autospec does not work with subclasses of property()
http://bugs.python.org/issue26750

#26739: idle: Errno 10035 a non-blocking socket operation could not be
http://bugs.python.org/issue26739

#26728: make pdb.set_trace() accept debugger commands as arguments and
http://bugs.python.org/issue26728

#26726: Incomplete Internationalization in Argparse Module
http://bugs.python.org/issue26726

#26700: Make digest_size a class variable
http://bugs.python.org/issue26700

#26697: tkFileDialog crash on askopenfilename Python 2.7 64-bit Win7
http://bugs.python.org/issue26697

#26696: Document collections.abc.ByteString
http://bugs.python.org/issue26696

#26695: pickle and _pickle accelerator have different behavior when un
http://bugs.python.org/issue26695



Most recent 15 issues waiting for review (15)
=============================================

#26770: _Py_set_inheritable(): do nothing if the FD_CLOEXEC close is a
http://bugs.python.org/issue26770

#26769: Python 2.7: make private file descriptors non inheritable
http://bugs.python.org/issue26769

#26766: The result type of bytearray formatting is not stable
http://bugs.python.org/issue26766

#26765: Factor out common bytes and bytearray implementation
http://bugs.python.org/issue26765

#26764: SystemError in bytes.__rmod__
http://bugs.python.org/issue26764

#26763: Update PEP-8 regarding binary operators
http://bugs.python.org/issue26763

#26755: Update version{added,changed} docs in devguide
http://bugs.python.org/issue26755

#26750: Mock autospec does not work with subclasses of property()
http://bugs.python.org/issue26750

#26745: Redundant code in _PyObject_GenericSetAttrWithDict
http://bugs.python.org/issue26745

#26742: imports in test_warnings changes warnings.filters
http://bugs.python.org/issue26742

#26741: subprocess.Popen should emit a ResourceWarning in destructor i
http://bugs.python.org/issue26741

#26736: Use HTTPS protocol in links
http://bugs.python.org/issue26736

#26733: staticmethod and classmethod are ignored when disassemble clas
http://bugs.python.org/issue26733

#26730: SpooledTemporaryFile doesn't correctly preserve data for text 
http://bugs.python.org/issue26730

#26729: Incorrect __text_signature__ for sorted
http://bugs.python.org/issue26729



Top 10 most discussed issues (10)
=================================

#26743: Unable to import random with python2.7 on power pc based machi
http://bugs.python.org/issue26743  20 msgs

#26766: The result type of bytearray formatting is not stable
http://bugs.python.org/issue26766  12 msgs

#25702: Link Time Optimizations support for GCC and CLANG
http://bugs.python.org/issue25702  10 msgs

#25910: Fixing links in documentation
http://bugs.python.org/issue25910  10 msgs

#26647: ceval: use Wordcode, 16-bit bytecode
http://bugs.python.org/issue26647  10 msgs

#26716: EINTR handling in fcntl
http://bugs.python.org/issue26716   9 msgs

#26729: Incorrect __text_signature__ for sorted
http://bugs.python.org/issue26729   9 msgs

#26763: Update PEP-8 regarding binary operators
http://bugs.python.org/issue26763   9 msgs

#26359: CPython build options for out-of-the box performance
http://bugs.python.org/issue26359   8 msgs

#26601: Use new madvise()'s MADV_FREE on the private heap
http://bugs.python.org/issue26601   8 msgs



Issues closed (48)
==================

#13410: String formatting bug in interactive mode
http://bugs.python.org/issue13410  closed by serhiy.storchaka

#13952: mimetypes doesn't recognize .csv
http://bugs.python.org/issue13952  closed by berker.peksag

#14784: Re-importing _warnings changes warnings.filters
http://bugs.python.org/issue14784  closed by martin.panter

#15984: Wrong documentation for PyUnicode_FromObject() and PyUnicode_F
http://bugs.python.org/issue15984  closed by martin.panter

#16329: mimetypes does not support webm type
http://bugs.python.org/issue16329  closed by berker.peksag

#17264: Update Building C and C++ Extensions with distutils documentat
http://bugs.python.org/issue17264  closed by berker.peksag

#17339: bytes() TypeError message is misleadingly narrow
http://bugs.python.org/issue17339  closed by serhiy.storchaka

#18461: X Error in tkinter
http://bugs.python.org/issue18461  closed by serhiy.storchaka

#21069: test_fileno of test_urllibnet intermittently fails
http://bugs.python.org/issue21069  closed by martin.panter

#22659: SyntaxError in the configure_ctypes
http://bugs.python.org/issue22659  closed by berker.peksag

#23397: PEP 431 implementation
http://bugs.python.org/issue23397  closed by berker.peksag

#24951: Idle test_configdialog fails on Fedora 23, 3.6
http://bugs.python.org/issue24951  closed by terry.reedy

#25339: sys.stdout.errors is set to "surrogateescape"
http://bugs.python.org/issue25339  closed by serhiy.storchaka

#25496: tarfile: Default value for compresslevel is not documented
http://bugs.python.org/issue25496  closed by martin.panter

#25654: test_multiprocessing_spawn ResourceWarning with -Werror
http://bugs.python.org/issue25654  closed by martin.panter

#26057: Avoid nonneeded use of PyUnicode_FromObject()
http://bugs.python.org/issue26057  closed by serhiy.storchaka

#26257: Eliminate buffer_tests.py
http://bugs.python.org/issue26257  closed by martin.panter

#26404: socketserver context manager
http://bugs.python.org/issue26404  closed by martin.panter

#26585: Use html.escape to replace _quote_html in http.server
http://bugs.python.org/issue26585  closed by martin.panter

#26587: Possible duplicate entries in sys.path if .pth files are used 
http://bugs.python.org/issue26587  closed by brett.cannon

#26609: Wrong request target in test_httpservers.py
http://bugs.python.org/issue26609  closed by martin.panter

#26610: test_venv.test_with_pip() fails when ctypes is missing
http://bugs.python.org/issue26610  closed by haypo

#26623: JSON encode: more informative error
http://bugs.python.org/issue26623  closed by serhiy.storchaka

#26624: Windows hangs in call to CRT setlocale()
http://bugs.python.org/issue26624  closed by python-dev

#26639: Tools/i18n/pygettext.py: replace deprecated imp module with im
http://bugs.python.org/issue26639  closed by haypo

#26668: Remove Lib/test/test_importlib/regrtest.py?
http://bugs.python.org/issue26668  closed by brett.cannon

#26685: Raise errors from socket.close()
http://bugs.python.org/issue26685  closed by martin.panter

#26687: Use Py_RETURN_NONE in sqlite3 module
http://bugs.python.org/issue26687  closed by berker.peksag

#26699: locale.str docstring is incorrect: "Convert float to integer"
http://bugs.python.org/issue26699  closed by orsenthil

#26706: Update OpenSSL version in readme
http://bugs.python.org/issue26706  closed by python-dev

#26712: Unify (r)split(), (l/r)strip() method tests
http://bugs.python.org/issue26712  closed by martin.panter

#26714: telnetlib.Telnet should act as a context manager
http://bugs.python.org/issue26714  closed by SilentGhost

#26715: can not deactivate venv (deactivate.bat) if the venv was activ
http://bugs.python.org/issue26715  closed by zach.ware

#26718: super.__init__ leaks memory if called multiple times
http://bugs.python.org/issue26718  closed by brett.cannon

#26719: More efficient formatting of ints and floats in json
http://bugs.python.org/issue26719  closed by serhiy.storchaka

#26722: Fold compare operators on constants (peephole)
http://bugs.python.org/issue26722  closed by ncoghlan

#26723: Add an option to skip _decimal module
http://bugs.python.org/issue26723  closed by skrah

#26725: list() destroys map object data
http://bugs.python.org/issue26725  closed by ned.deily

#26727: ctypes.util.find_msvcrt() does not work in python 3.5.1
http://bugs.python.org/issue26727  closed by steve.dower

#26734: Repeated mmap\munmap calls during temporary allocation
http://bugs.python.org/issue26734  closed by pitrou

#26735: os.urandom(2500) fails on Solaris 11.3
http://bugs.python.org/issue26735  closed by haypo

#26737: csv.DictReader throws generic error when fieldnames is accesse
http://bugs.python.org/issue26737  closed by serhiy.storchaka

#26738: listname.strip() does not work right if the name ends with an 
http://bugs.python.org/issue26738  closed by SilentGhost

#26747: types.InstanceType only for old style class only in 2.7
http://bugs.python.org/issue26747  closed by berker.peksag

#26748: enum.Enum is False-y
http://bugs.python.org/issue26748  closed by ethan.furman

#26749: Update devguide to include Fedora's DNF
http://bugs.python.org/issue26749  closed by berker.peksag

#26761: winsound module very unstable in Windows 10
http://bugs.python.org/issue26761  closed by zach.ware

#26768: Fix instructions at WindowsCompilers for MSVC/SDKs
http://bugs.python.org/issue26768  closed by berker.peksag

From guido at python.org  Fri Apr 15 12:53:13 2016
From: guido at python.org (Guido van Rossum)
Date: Fri, 15 Apr 2016 09:53:13 -0700
Subject: [Python-Dev] PEP 8 updated on whether to break before or after a
 binary update
Message-ID: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>

After a fruitful discussion on python-ideas I've decided that it's fine to
break lines *before* a binary operator. It looks better and Knuth
recommends it.

The head of the python-ideas discussion:
https://mail.python.org/pipermail/python-ideas/2016-April/039752.html

See also the discussion in the tracker: http://bugs.python.org/issue26763

Here's the diff I applied: https://hg.python.org/peps/rev/3857909d7956

The talk by Brandon Rhodes where Knuth is referenced ([3] below):
http://rhodesmill.org/brandon/slides/2012-11-pyconca/#laying-down-the-law

The key section in PEP 8 that was updated (apart from fixing up references):

Should a line break before or after a binary operator?
------------------------------------------------------

For decades the recommended style has been to break after binary
operators.  However, recent research unearthed recommendations by
Donald Knuth to break *before* binary operators, in his writings about
typesetting [3]_.  Therefore it is permissible to break before or
after a binary operator, as long as the convention is consistent
locally.  For new code Knuth's style is suggested.

Some examples of code breaking before binary Boolean operators::

    class Rectangle(Blob):

        def __init__(self, width, height,
                     color='black', emphasis=None, highlight=0):
            if (width == 0
                and height == 0
                and color == 'red'
                and emphasis == 'strong'
                or highlight > 100):
                raise ValueError("sorry, you lose")
            if (width == 0 and height == 0
                and (color == 'red' or emphasis is None)):
                raise ValueError("I don't think so -- values are %s, %s" %
                                 (width, height))
            Blob.__init__(self, width, height,
                          color, emphasis, highlight)
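
For comparison, here is a minimal sketch of the same expression written in
both styles.  The variable names and values are made up for illustration and
do not come from the PEP text quoted above; the only point is that the two
layouts are equivalent to the interpreter and differ in how easily the eye
finds the operators.

```python
# Hypothetical values, present only so that the snippet runs.
gross_wages, taxable_interest, dividends, ira_deduction = 50000, 300, 1200, 2000

# Older recommendation: break *after* the binary operator.
income_after = (gross_wages +
                taxable_interest +
                dividends -
                ira_deduction)

# Knuth's style: break *before* the binary operator, so each operator
# sits next to the operand it applies to.
income_before = (gross_wages
                 + taxable_interest
                 + dividends
                 - ira_deduction)

assert income_after == income_before
```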


-- 
--Guido van Rossum (python.org/~guido)

From tim.peters at gmail.com  Fri Apr 15 13:02:43 2016
From: tim.peters at gmail.com (Tim Peters)
Date: Fri, 15 Apr 2016 12:02:43 -0500
Subject: [Python-Dev] PEP 8 updated on whether to break before or after
 a binary update
In-Reply-To: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
References: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
Message-ID: <CAExdVNmP2pDjiockWFDJzL9NPaHKQNRQ8zfp4MpfCY_xEnHRmQ@mail.gmail.com>

[Guido]
> After a fruitful discussion on python-ideas I've decided that it's fine to
> break lines *before* a binary operator. It looks better and Knuth recommends
> it.
> ...
> Therefore it is permissible to break before or
> after a binary operator, as long as the convention is consistent
> locally.  For new code Knuth's style is suggested.
>
> Some examples of code breaking before binary Boolean operators::
>
>     class Rectangle(Blob):
>
>         def __init__(self, width, height,
>                      color='black', emphasis=None, highlight=0):
>             if (width == 0
>                 and height == 0
>                 and color == 'red'
>                 and emphasis == 'strong'
>                 or highlight > 100):
>                 raise ValueError("sorry, you lose")
>             if (width == 0 and height == 0
>                 and (color == 'red' or emphasis is None)):
>                 raise ValueError("I don't think so -- values are %s, %s" %
>                                  (width, height))
>             Blob.__init__(self, width, height,
>                           color, emphasis, highlight)
>

Note that this code still breaks a line after a binary operator (the
string formatting "%" operator in the 2nd ValueError call).  But it's
perfectly clear the way it is.  Good taste can't be reduced to rules
;-)
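
For anyone curious what the stricter reading would look like, here is a
sketch of that one statement with the break moved in front of the "%".
The width and height values are placeholders chosen for the example; this is
not a suggestion that the PEP text needs to change.

```python
width, height = 0, 0   # placeholder values for illustration

# Break before the string formatting operator instead of after it.
message = ("I don't think so -- values are %s, %s"
           % (width, height))
```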

From victor.stinner at gmail.com  Fri Apr 15 13:03:44 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 19:03:44 +0200
Subject: [Python-Dev] PEP 8 updated on whether to break before or after
 a binary update
In-Reply-To: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
References: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
Message-ID: <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>

Hum.

        if (width == 0
            and height == 0
            and color == 'red'
            and emphasis == 'strong'
            or highlight > 100):
            raise ValueError("sorry, you lose")

Please remove one space to vertically align "and" operators with the
opening parenthesis:

        if (width == 0
           and height == 0
           and color == 'red'
           and emphasis == 'strong'
           or highlight > 100):
            raise ValueError("sorry, you lose")

(I'm not sure that the difference is obvious in a mail client, you
need a fixed width font which is not the case in my Gmail editor.)

It helps to visually see that the multiline test and the raise
instruction are in two different blocks.

(Moreover, the pep8 checks of OpenStack simply reject such syntax, so
I cannot use this syntax anymore :-))

Victor

From guido at python.org  Fri Apr 15 13:06:00 2016
From: guido at python.org (Guido van Rossum)
Date: Fri, 15 Apr 2016 10:06:00 -0700
Subject: [Python-Dev] PEP 8 updated on whether to break before or after
 a binary update
In-Reply-To: <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
References: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
 <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
Message-ID: <CAP7+vJJxwknSm5AuLB8JFM1jHGPojog0iiyPEhavk+PtD=QLtw@mail.gmail.com>

On Fri, Apr 15, 2016 at 10:03 AM, Victor Stinner <victor.stinner at gmail.com>
wrote:

> Hum.
>
>         if (width == 0
>             and height == 0
>             and color == 'red'
>             and emphasis == 'strong'
>             or highlight > 100):
>             raise ValueError("sorry, you lose")
>
> Please remove one space to vertically align "and" operators with the
> opening parenthesis:
>
>         if (width == 0
>            and height == 0
>            and color == 'red'
>            and emphasis == 'strong'
>            or highlight > 100):
>             raise ValueError("sorry, you lose")
>
> (I'm not sure that the difference is obvious in a mail client, you
> need a fixed width font which is not the case in my Gmail editor.)
>

I can see it perfectly fine and I disagree.


> It helps to visually see that the multiline test and the raise
> instruction are in two different blocks.
>
> (Moreover, the pep8 checks of OpenStack simply reject such syntax, but
> I cannot use this syntax anymore :-))


That's why that tool shouldn't be named after the PEP. See
https://github.com/PyCQA/pycodestyle/issues/466

-- 
--Guido van Rossum (python.org/~guido)

From gjcarneiro at gmail.com  Fri Apr 15 13:15:09 2016
From: gjcarneiro at gmail.com (Gustavo Carneiro)
Date: Fri, 15 Apr 2016 18:15:09 +0100
Subject: [Python-Dev] PEP 8 updated on whether to break before or after
 a binary update
In-Reply-To: <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
References: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
 <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
Message-ID: <CAO-CpEKVjrHxPSqOxHfJULeh8mXnesCWvCD3t7dbqyyL1ZEw=Q@mail.gmail.com>

On 15 April 2016 at 18:03, Victor Stinner <victor.stinner at gmail.com> wrote:

> Hum.
>
>         if (width == 0
>             and height == 0
>             and color == 'red'
>             and emphasis == 'strong'
>             or highlight > 100):
>             raise ValueError("sorry, you lose")
>
> Please remove one space to vertically align "and" operators with the
> opening parenthesis:
>
>         if (width == 0
>            and height == 0
>            and color == 'red'
>            and emphasis == 'strong'
>            or highlight > 100):
>             raise ValueError("sorry, you lose")
>

Personally, I think what you propose looks ugly.  The first version looks
so much better.


> It helps to visually see that the multiline test and the raise
> instruction are in two different blocks.


The only thing I would add would be an empty line to help distinguish the
if expression block from the "then" code block:

        if (width == 0
            and height == 0
            and color == 'red'
            and emphasis == 'strong'
            or highlight > 100):

            raise ValueError("sorry, you lose")


-- 
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert

From storchaka at gmail.com  Fri Apr 15 13:24:12 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 15 Apr 2016 20:24:12 +0300
Subject: [Python-Dev] PEP 8 updated on whether to break before or after
 a binary update
In-Reply-To: <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
References: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
 <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
Message-ID: <ner83s$u6d$1@ger.gmane.org>

On 15.04.16 20:03, Victor Stinner wrote:
> Hum.
>
>          if (width == 0
>              and height == 0
>              and color == 'red'
>              and emphasis == 'strong'
>              or highlight > 100):
>              raise ValueError("sorry, you lose")
>
> Please remove one space to vertically align "and" operators with the
> opening parenthesis:
>
>          if (width == 0
>             and height == 0
>             and color == 'red'
>             and emphasis == 'strong'
>             or highlight > 100):
>              raise ValueError("sorry, you lose")

I would rather *add* spaces to wrapped condition lines.

         if (width == 0
                 and height == 0
                 and color == 'red'
                 and emphasis == 'strong'
                 or highlight > 100):
             raise ValueError("sorry, you lose")



From guido at python.org  Fri Apr 15 13:43:44 2016
From: guido at python.org (Guido van Rossum)
Date: Fri, 15 Apr 2016 10:43:44 -0700
Subject: [Python-Dev] PEP 8 updated on whether to break before or after
 a binary update
In-Reply-To: <ner83s$u6d$1@ger.gmane.org>
References: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
 <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
 <ner83s$u6d$1@ger.gmane.org>
Message-ID: <CAP7+vJJRLnvgr_8ZNCjVvCCozPdMasyXfmWUMM3nx4-rTNqbVA@mail.gmail.com>

The update is already serving its real purpose: showing that style is
debatable and cannot always easily be reduced to fixed rules.

On Fri, Apr 15, 2016 at 10:24 AM, Serhiy Storchaka <storchaka at gmail.com>
wrote:

> On 15.04.16 20:03, Victor Stinner wrote:
>
>> Hum.
>>
>>          if (width == 0
>>              and height == 0
>>              and color == 'red'
>>              and emphasis == 'strong'
>>              or highlight > 100):
>>              raise ValueError("sorry, you lose")
>>
>> Please remove one space to vertically align "and" operators with the
>> opening parenthesis:
>>
>>          if (width == 0
>>             and height == 0
>>             and color == 'red'
>>             and emphasis == 'strong'
>>             or highlight > 100):
>>              raise ValueError("sorry, you lose")
>>
>
> I would rather *add* spaces to wrapped condition lines.
>
>         if (width == 0
>                 and height == 0
>                 and color == 'red'
>                 and emphasis == 'strong'
>                 or highlight > 100):
>             raise ValueError("sorry, you lose")
>
>



-- 
--Guido van Rossum (python.org/~guido)

From python at mrabarnett.plus.com  Fri Apr 15 13:49:03 2016
From: python at mrabarnett.plus.com (MRAB)
Date: Fri, 15 Apr 2016 18:49:03 +0100
Subject: [Python-Dev] PEP 8 updated on whether to break before or after
 a binary update
In-Reply-To: <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
References: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
 <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
Message-ID: <5711298F.7060308@mrabarnett.plus.com>

On 2016-04-15 18:03, Victor Stinner wrote:
 > Hum.
 >
 >          if (width == 0
 >              and height == 0
 >              and color == 'red'
 >              and emphasis == 'strong'
 >              or highlight > 100):
 >              raise ValueError("sorry, you lose")
 >
 > Please remove one space to vertically align "and" operators with the
 > opening parenthesis:
 >
 >          if (width == 0
 >             and height == 0
 >             and color == 'red'
 >             and emphasis == 'strong'
 >             or highlight > 100):
 >              raise ValueError("sorry, you lose")
 >
 > (I'm not sure that the difference is obvious in a mail client, you
 > need a fixed width font which is not the case in my Gmail editor.)
 >
 > It helps to visually see that the multiline test and the raise
 > instruction are in two different blocks.
 >
 > (Moreover, the pep8 checks of OpenStack simply reject such syntax, but
 > I cannot use this syntax anymore :-))
 >
I always half-indent continuation lines:

         if (width == 0
           and height == 0
           and color == 'red'
           and emphasis == 'strong'
           or highlight > 100):
             raise ValueError("sorry, you lose")

From jimjjewett at gmail.com  Fri Apr 15 13:54:59 2016
From: jimjjewett at gmail.com (Jim J. Jewett)
Date: Fri, 15 Apr 2016 10:54:59 -0700 (PDT)
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
Message-ID: <57112af3.641d8c0a.3bf39.03b0@mx.google.com>


On Thu Apr 14 11:19:42 EDT 2016, Victor Stinner posted the latest
draft of PEP 509; dict version_tag

(1)  Meta Question:  If this is really only for CPython, then is
"Standards Track" the right classification?

(2)  Why *promise* not to update the version_tag when replacing a
value with itself?  Isn't that the sort of quality-of-implementation
issue that got pushed to a note for objects that happen to be
represented as singletons, such as small integers or ASCII chars?

I think it is a helpful optimization, and worth documenting ... I
just think it should be at the layer of "this particular patch",
rather than something that sounds like part of the contract.

e.g.,

... The global version is also incremented and copied to the
dictionary version at each dictionary change.  The following
dict methods can trigger changes:

* ``clear()`` 
* ``pop(key)``
* ``popitem()`` 
* ``setdefault(key, value)`` 
* ``__delitem__(key)``
* ``__setitem__(key, value)`` 
* ``update(...)``

.. note::  As a quality of implementation issue, the actual patch
does not increment the version_tag when it can prove that there
was no actual change.  For example, clear() on an already-empty
dict will not trigger a version_tag change, nor will updating a
dict with itself, since the values will be unchanged.  For efficiency,
the analysis considers only object identity (not equality) when
deciding whether to increment the version_tag.
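
To make the identity-versus-equality distinction concrete, here is a rough
pure-Python model.  It is only a sketch: VersionedDict, _MISSING and the
counter are names invented for this mail, it covers just __setitem__ and
clear(), and the real proposal is a C-level field rather than a subclass.

```python
import itertools

_MISSING = object()  # private sentinel meaning "key was not present"


class VersionedDict(dict):
    """Toy model of the proposed version_tag; not the actual C patch."""

    _global_version = itertools.count(1)  # shared by all instances

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version_tag = next(VersionedDict._global_version)

    def _bump(self):
        self.version_tag = next(VersionedDict._global_version)

    def __setitem__(self, key, value):
        old = self.get(key, _MISSING)
        super().__setitem__(key, value)
        if old is not value:          # identity check, not equality
            self._bump()

    def clear(self):
        if self:                      # clearing an empty dict changes nothing
            super().clear()
            self._bump()


d = VersionedDict()
first, second = [1, 2], [1, 2]        # equal, but two distinct objects

d['x'] = first
tag_after_insert = d.version_tag
d['x'] = first                        # same object: tag is left alone
tag_after_same = d.version_tag
d['x'] = second                       # equal but not identical: tag moves
tag_after_equal = d.version_tag
```

Under this model, question [2A] is whether the last assignment is
*required* to move the tag, or merely happens to in this implementation.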

[2A] Do you want to promise that replacing a value with a
non-identical object *will* trigger a version_tag update *even*
if the objects are equal?

I would vote no, but I realize backwards-compatibility may create
such a promise implicitly.

(3)  It is worth being explicit on whether empty dicts can share
a version_tag of 0.  If this PEP is about dict content, then that
seems fine, and it may well be worth optimizing dict creation.

There are times when it is important to keep the same empty dict;
I can't think of any use cases where it is important to verify
that some *other* code has done so, *and* I can't get a reference
to the correct dict for an identity check.

(4)  Please be explicit about the locking around version++; it
is enough to say that the relevant methods already need to hold
the GIL (assuming that is true).

(5)  I'm not sure I understand the arguments around a per-entry
version.

On the one hand, you never need a strong reference to the value;
if it has been collected, then it has obviously been removed from
the dict and should trigger a change even with per-dict.

On the other hand, I'm not sure per-entry would really allow
finer-grained guards to avoid lookups; just because an entry hasn't
been modified doesn't prove it hasn't been moved to another location,
perhaps by replacing a dummy in a slot it would have preferred.

(6)  I'm also not sure why version_tag *doesn't* solve the problem
of dicts that fool the iteration guards by mutating without changing
size ( https://bugs.python.org/issue19332 ) ... are you just saying
that the iterator views aren't allowed to rely on the version-tag
remaining stable, because replacing a value (as opposed to a
key-value pair) is allowed?

I had always viewed the failing iterators as a supporting-this-case-
makes-the-code-too-slow-and-ugly limitation, rather than a data
integrity check.  When I do care about the data not changing,
(an exposed variant of) version_tag is as likely to be what I want as
a hypothetical keys_version_tag would be. 
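
For reference, a small sketch of what the existing iteration guard does and
does not catch, as I understand current CPython behaviour (other
implementations may differ).  Replacing values during iteration is
accepted silently, while inserting a key changes the size and is reported.

```python
counts = {'a': 1, 'b': 2}

# Replacing values while iterating is allowed: no key is added or
# removed, so the size-based guard has nothing to object to.
for key in counts:
    counts[key] += 10
values_replaced = dict(counts)

# Inserting a new key changes the size, which CPython detects on the
# next step of the iteration.
size_change_detected = False
try:
    for key in counts:
        counts[key + '_new'] = 0
except RuntimeError:
    size_change_detected = True
```

A guard based on version_tag would, as far as I can tell, also fire in the
first loop, which is why I am asking whether that is the reason it is not
used there.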

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ

From barry at python.org  Fri Apr 15 14:37:23 2016
From: barry at python.org (Barry Warsaw)
Date: Fri, 15 Apr 2016 14:37:23 -0400
Subject: [Python-Dev] PEP 8 updated on whether to break before or after
 a binary update
In-Reply-To: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
References: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
Message-ID: <20160415143723.239895bc@subdivisions>

On Apr 15, 2016, at 09:53 AM, Guido van Rossum wrote:

>After a fruitful discussion on python-ideas I've decided that it's fine to
>break lines *before* a binary operator.

Thanks Guido, your changes look great.

-Barry

From oscar.j.benjamin at gmail.com  Fri Apr 15 16:33:44 2016
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Fri, 15 Apr 2016 21:33:44 +0100
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
Message-ID: <CAHVvXxS08BD86ePSAHz=HMLrJ7phENdY4sm5r_8Fouh5KrhOJw@mail.gmail.com>

On 15 April 2016 at 18:54, Jim J. Jewett <jimjjewett at gmail.com> wrote:
>
> [2A] Do you want to promise that replacing a value with a
> non-identical object *will* trigger a version_tag update *even*
> if the objects are equal?
>
> I would vote no, but I realize backwards-compatibility may create
> such a promise implicitly.

It needs to trigger a version update. Equality doesn't guarantee any
kind of equivalence in Python. It's not even guaranteed that a==b will
come to the same value if evaluated twice in a row.

An example:

>>> from fractions import Fraction as F
>>> F(1) == 1
True
>>> d = globals()
>>> d['a'] = F(1)
>>> a.limit_denominator()
Fraction(1, 1)
>>> d['a'] = 1
>>> a.limit_denominator()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'limit_denominator'

--
Oscar

From victor.stinner at gmail.com  Fri Apr 15 16:41:44 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 22:41:44 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
Message-ID: <CAMpsgwbYOcZ7W_RqqR1whi3jbiL+ssMn74hrzGJTZkp1SrVbXA@mail.gmail.com>

2016-04-15 19:54 GMT+02:00 Jim J. Jewett <jimjjewett at gmail.com>:
> (1)  Meta Question:  If this is really only for CPython, then is
> "Standards Track" the right classification?

Yes, I think so. It doesn't seem to be an Informational or a Process PEP:
https://www.python.org/dev/peps/pep-0001/#pep-types


> (2)  Why *promise* not to update the version_tag when replacing a
> value with itself?

It's a useful property. For example, let's say that you have a guard
on globals()['value']. The guard is created with value=3. A unit test
replaces the value with 50, but then restores the value to its previous
value (3). Later, the guard is checked to decide if an optimization
can be used.

If the dict version is increased, you need a lookup. If the dict
version is not increased, the guard is cheap.

In C, it's very cheap to implement the test "new_value == old_value",
it just compares two pointers.

If an overhead is visible, I can drop it from the PEP, and implement
the check in the guard.


>  Isn't that the sort of quality-of-implementation
> issue that got pushed to a note for objects that happen to be
> represented as singletons, such as small integers or ASCII chars?

I prefer to require this property.


> [2A] Do you want to promise that replacing a value with a
> non-identical object *will* trigger a version_tag update *even*
> if the objects are equal?

It's already written in the PEP:

"The version is not incremented if an existing key is set to the same
value. For efficiency, values are compared by their identity:
new_value is old_value, not by their content: new_value == old_value."
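
The rule quoted above can be illustrated with a minimal sketch. It
assumes a hypothetical pure-Python stand-in, VersionedDict, with a
public per-dict counter; the PEP's real version tag is a private
C-level field and its counter scheme may differ, so this only shows
the identity-versus-equality behaviour.

```python
class VersionedDict(dict):
    """Hypothetical pure-Python stand-in for a dict with a version tag."""

    _MISSING = object()  # sentinel meaning "the key was not present"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0  # simplification: a plain per-dict counter

    def __setitem__(self, key, value):
        old = self.get(key, self._MISSING)
        # Compare by identity ("is"), never by content ("==").
        if old is not value:
            self.version += 1
        super().__setitem__(key, value)


d = VersionedDict()
value = [1, 2]
d["k"] = value    # new key: the version is bumped
after_insert = d.version
d["k"] = value    # same object again: the version is unchanged
after_same = d.version
d["k"] = [1, 2]   # equal but not identical: the version is bumped
after_equal = d.version
```

Only item assignment is covered here; a fuller stand-in would also
have to bump the version in deletion, update(), clear() and so on.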


> (3)  It is worth being explicit on whether empty dicts can share
> a version_tag of 0.  If this PEP is about dict content, then that
> seems fine, and it may well be worth optimizing dict creation.

This is not part of the PEP yet. I'm not sure that I will modify the
PEP to use the version 0 for empty dictionaries. Antoine doesn't seem
to be convinced :-)


> (4)  Please be explicit about the locking around version++; it
> is enough to say that the relevant methods already need to hold
> the GIL (assuming that is true).

I don't think that it's important to mention it in the PEP. It's more
an implementation detail. The version can be protected by atomic
operations.


> (5)  I'm not sure I understand the arguments around a per-entry
> version.

It doesn't matter since I don't want this option :-)


> On the one hand, you never need a strong reference to the value;
> if it has been collected, then it has obviously been removed from
> the dict and should trigger a change even with per-dict.

Let's say that you watch key1 of a dict. key2 is modified, which
increases the version. Later, you test the guard: to check whether key1
was modified, you need to look up the key and compare the value. You
need the value to compare it.
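
A rough sketch of that scenario, under stated assumptions: the
VersionedDict and KeyGuard classes below are hypothetical pure-Python
stand-ins (the real version tag is not exposed to Python code), and
the lookups counter exists only to make the fast and slow paths
visible.

```python
_MISSING = object()  # sentinel meaning "the key is not present"


class VersionedDict(dict):
    """Hypothetical stand-in: bumps a counter on item set and delete."""

    def __init__(self):
        super().__init__()
        self.version = 0

    def __setitem__(self, key, value):
        if self.get(key, _MISSING) is not value:
            self.version += 1
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


class KeyGuard:
    """Hypothetical guard watching a single key of a VersionedDict."""

    def __init__(self, d, key):
        self.d = d
        self.key = key
        self.value = d[key]       # strong reference to the watched value
        self.version = d.version  # version seen when the guard was armed
        self.lookups = 0          # number of slow-path lookups so far

    def check(self):
        if self.d.version == self.version:
            return True           # fast path: nothing changed at all
        # Slow path: some key changed, so look up the watched key.
        self.lookups += 1
        if self.d.get(self.key, _MISSING) is self.value:
            self.version = self.d.version  # re-arm the fast path
            return True
        return False


d = VersionedDict()
d["key1"] = "spam"
d["key2"] = "eggs"
guard = KeyGuard(d, "key1")

first = guard.check()    # version unchanged: no lookup needed
d["key2"] = "ham"        # unrelated key modified: version bumped
second = guard.check()   # lookup needed, key1 still holds the same object
d["key1"] = "other"      # watched key modified
third = guard.check()    # lookup needed, value differs
```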

> On the other hand, I'm not sure per-entry would really allow
> finer-grained guards to avoid lookups; just because an entry hasn't
> been modified doesn't prove it hasn't been moved to another location,
> perhaps by replacing a dummy in a slot it would have preferred.

The main advantage of per-entry version is to avoid the strong
reference to values.

According to my tests, the drawbacks are too significant to take this
option. I prefer a simple version per dictionary.


> (6)  I'm also not sure why version_tag *doesn't* solve the problem
> of dicts that fool the iteration guards by mutating without changing
> size ( https://bugs.python.org/issue19332 ) ... are you just saying
> that the iterator views aren't allowed to rely on the version-tag
> remaining stable, because replacing a value (as opposed to a
> key-value pair) is allowed?

If the dictionary values are modified during the loop, the dict
version is increased. But modifying values is allowed while you
iterate over *keys*.

Victor

From ianlee1521 at gmail.com  Fri Apr 15 16:48:25 2016
From: ianlee1521 at gmail.com (Ian Lee)
Date: Fri, 15 Apr 2016 13:48:25 -0700
Subject: [Python-Dev] PEP 8 updated on whether to break before or after
 a binary update
In-Reply-To: <5711298F.7060308@mrabarnett.plus.com>
References: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
 <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
 <5711298F.7060308@mrabarnett.plus.com>
Message-ID: <3A1DA6EE-DD06-47D1-80D0-BF6822C5B041@gmail.com>

Cross posting the comment I'd left on the issue [1].

> My preference is to actually break that logic up and avoid the wrapping in the first place, as in [2]. Which in this particular class has the side benefit of that value being used again in the same function anyways.

> I'm starting to realize that Brandon Rhodes really had a big impact on my ideas of styling as I've been learning Python these past few years, as this was another one style I'm stealing from that same talk [3].

[1] http://bugs.python.org/msg263509
[2] https://github.com/python/peps/commit/0c790e7b721bd13ad12ab9e6f6206836f398f9c4

~ Ian Lee | IanLee1521 at gmail.com
> On Apr 15, 2016, at 10:49, MRAB <python at mrabarnett.plus.com> wrote:
> 
> On 2016-04-15 18:03, Victor Stinner wrote:
> > Hum.
> >
> >          if (width == 0
> >              and height == 0
> >              and color == 'red'
> >              and emphasis == 'strong'
> >              or highlight > 100):
> >              raise ValueError("sorry, you lose")
> >
> > Please remove one space to vertically align "and" operators with the
> > opening parenthesis:
> >
> >          if (width == 0
> >             and height == 0
> >             and color == 'red'
> >             and emphasis == 'strong'
> >             or highlight > 100):
> >              raise ValueError("sorry, you lose")
> >
> > (I'm not sure that the difference is obvious in a mail client, you
> > need a fixed width font which is not the case in my Gmail editor.)
> >
> > It helps to visually see that the multiline test and the raise
> > instruction are in two different blocks.
> >
> > (Moreover, the pep8 checks of OpenStack simply reject such syntax, but
> > I cannot use this syntax anymore :-))
> >
> I always half-indent continuation lines:
> 
>        if (width == 0
>          and height == 0
>          and color == 'red'
>          and emphasis == 'strong'
>          or highlight > 100):
>            raise ValueError("sorry, you lose")
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/ianlee1521%40gmail.com

From random832 at fastmail.com  Fri Apr 15 17:07:26 2016
From: random832 at fastmail.com (Random832)
Date: Fri, 15 Apr 2016 17:07:26 -0400
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwbYOcZ7W_RqqR1whi3jbiL+ssMn74hrzGJTZkp1SrVbXA@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
 <CAMpsgwbYOcZ7W_RqqR1whi3jbiL+ssMn74hrzGJTZkp1SrVbXA@mail.gmail.com>
Message-ID: <1460754446.441936.580207105.774469D1@webmail.messagingengine.com>

On Fri, Apr 15, 2016, at 16:41, Victor Stinner wrote:
> If the dictionary values are modified during the loop, the dict
> version is increased. But it's allowed to modify values when you
> iterate on *keys*.

Why is iterating over items different from iterating over keys?

in other words, why do I have to write:

for k in dict:
    v = dict[k]
    ...do some stuff...
    dict[k] = something

rather than

for k, v in dict.items():
    ...do some stuff...
    dict[k] = something


It's not clear why the latter is something you want to prevent.

From ethan at stoneleaf.us  Fri Apr 15 17:16:22 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 15 Apr 2016 14:16:22 -0700
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwbYOcZ7W_RqqR1whi3jbiL+ssMn74hrzGJTZkp1SrVbXA@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
 <CAMpsgwbYOcZ7W_RqqR1whi3jbiL+ssMn74hrzGJTZkp1SrVbXA@mail.gmail.com>
Message-ID: <57115A26.10402@stoneleaf.us>

On 04/15/2016 01:41 PM, Victor Stinner wrote:
> 2016-04-15 19:54 GMT+02:00 Jim J. Jewett:

>> (2)  Why *promise* not to update the version_tag when replacing a
>> value with itself?
>
> It's an useful property. For example, let's say that you have a guard
> on globals()['value']. The guard is created with value=3. An unit test
> replaces the value with 50, but then restore the value to its previous
> value (3). Later, the guard is checked to decide if an optimization
> can be used.

I don't understand -- shouldn't the version be incremented when the
value was replaced with 50, and again when re-replaced with 3?


>> (6)  I'm also not sure why version_tag *doesn't* solve the problem
>> of dicts that fool the iteration guards by mutating without changing
>> size ( https://bugs.python.org/issue19332 ) ... are you just saying
>> that the iterator views aren't allowed to rely on the version-tag
>> remaining stable, because replacing a value (as opposed to a
>> key-value pair) is allowed?
>
> If the dictionary values are modified during the loop, the dict
> version is increased. But it's allowed to modify values when you
> iterate on *keys*.

I don't understand.  Could you provide a small example?

--
~Ethan~


From victor.stinner at gmail.com  Fri Apr 15 17:19:10 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 23:19:10 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <1460754446.441936.580207105.774469D1@webmail.messagingengine.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
 <CAMpsgwbYOcZ7W_RqqR1whi3jbiL+ssMn74hrzGJTZkp1SrVbXA@mail.gmail.com>
 <1460754446.441936.580207105.774469D1@webmail.messagingengine.com>
Message-ID: <CAMpsgwa3z811mTLZ=1UVsMSW12maDo_3Q9hA_E-PPP7k0bBgOA@mail.gmail.com>

2016-04-15 23:07 GMT+02:00 Random832 <random832 at fastmail.com>:
> Why is iterating over items different from iterating over keys?
>
> in other words, why do I have to write:
>
> for k in dict:
>     v = dict[k]
>     ...do some stuff...
>     dict[k] = something
>
> rather than
>
> for k, v in dict.items():
>     ...do some stuff...
>     dict[k] = something
>
> It's not clear why the latter is something you want to prevent.

Hum, I think that you misunderstood what should be prevented. Please
see https://bugs.python.org/issue19332

Sorry, I don't know this issue well. I just know that, sadly, PEP 509
doesn't help to fix it. Maybe it's not worth mentioning...

Victor

From victor.stinner at gmail.com  Fri Apr 15 17:24:21 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 23:24:21 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <57115A26.10402@stoneleaf.us>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
 <CAMpsgwbYOcZ7W_RqqR1whi3jbiL+ssMn74hrzGJTZkp1SrVbXA@mail.gmail.com>
 <57115A26.10402@stoneleaf.us>
Message-ID: <CAMpsgwY-3C6s=jBNgz5KDGie75Jg3mgusvVC7r=Tt-qBsMOSDA@mail.gmail.com>

2016-04-15 23:16 GMT+02:00 Ethan Furman <ethan at stoneleaf.us>:
>> It's an useful property. For example, let's say that you have a guard
>> on globals()['value']. The guard is created with value=3. An unit test
>> replaces the value with 50, but then restore the value to its previous
>> value (3). Later, the guard is checked to decide if an optimization
>> can be used.
>
> I don't understand -- shouldn't the version be incremented when the value
> was replaced with 50, and again when re-replaced with 3?

Oh wait, I'm tired and you are right.

Not increasing the version only helps with this code:

dict[key] = value
dict[key] = value  # version doesn't change


>> If the dictionary values are modified during the loop, the dict
>> version is increased. But it's allowed to modify values when you
>> iterate on *keys*.
>
> I don't understand.  Could you provide a small example?

For example, this loop is fine:

for key in dict:
   dict[key] = None

In this loop, the dict version is increased at each loop iteration.

For iter(dict), the check prevents a crash.

The following example raises a RuntimeError("dictionary changed size
during iteration"):

d={1:2}
for k in d:
    d[k+1] = None
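
The two loops above can be run side by side. This is only a small
sketch using plain dicts; the variable names values_replaced and
size_error are introduced here for illustration.

```python
# Replacing values while iterating over keys is allowed.
d = {1: 2, 3: 4}
for key in d:
    d[key] = None
values_replaced = dict(d)

# Adding a key changes the size, which the iterator detects on the
# next step and reports as a RuntimeError.
size_error = None
d = {1: 2}
try:
    for k in d:
        d[k + 1] = None
except RuntimeError as exc:
    size_error = str(exc)
```

The case from issue #19332 (adding one key and removing another, so
that the size stays the same) is deliberately left out, since its
behaviour is exactly what that issue is about.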

Victor

From victor.stinner at gmail.com  Fri Apr 15 17:38:51 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 23:38:51 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <loom.20160415T105902-955@post.gmane.org>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <CAMSv6X0U4-Pw8BnG3XegQBcm5umtwyYa_f623B1WBvbV=P5fNA@mail.gmail.com>
 <CAMpsgwZ4_wn5h_fCEkNKikZ99SWvcUqayEG_Ctj4vyyH1qHnPQ@mail.gmail.com>
 <loom.20160415T105902-955@post.gmane.org>
Message-ID: <CAMpsgwbk+JmaQGbLW5=6=FfgQ4qqTsN7_9se3NPkooS4X_8TeA@mail.gmail.com>

Hi,

FYI I updated the implementation of PEP 509:
https://bugs.python.org/issue26058


2016-04-15 11:01 GMT+02:00 Antoine Pitrou <antoine at python.org>:
> Why do this? It's a nice property that two dicts always have different
> version tags, and now you're killing this property for... no obvious
> reason?
>
> Do you really think dict.clear() is in need of micro-optimizing a
> couple CPU cycles away?

So, I played with Armin's idea. I confirm that it works for my use
case, guards on dict keys. It should also work on Yury's use case.

Antoine is right, it's really a micro-optimization. It shouldn't help
much for the integer overflow (which is not a real issue in practice).

I propose to leave the PEP unchanged to keep the nice property of a
unique identifier for empty dictionaries. It can help for future use
cases.

Victor

From jimjjewett at gmail.com  Fri Apr 15 17:45:32 2016
From: jimjjewett at gmail.com (Jim J. Jewett)
Date: Fri, 15 Apr 2016 17:45:32 -0400
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwbYOcZ7W_RqqR1whi3jbiL+ssMn74hrzGJTZkp1SrVbXA@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
 <CAMpsgwbYOcZ7W_RqqR1whi3jbiL+ssMn74hrzGJTZkp1SrVbXA@mail.gmail.com>
Message-ID: <CA+OGgf4B1xtU8vrf5DEho5+W-6YrMgCtvhXBCVofESU5zKhxJQ@mail.gmail.com>

On Fri, Apr 15, 2016 at 4:41 PM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> 2016-04-15 19:54 GMT+02:00 Jim J. Jewett <jimjjewett at gmail.com>:

>> (2)  Why *promise* not to update the version_tag when replacing a
>> value with itself?

> It's an useful property. For example, let's say that you have a guard
> on globals()['value']. The guard is created with value=3. An unit test
> replaces the value with 50, but then restore the value to its previous
> value (3). Later, the guard is checked to decide if an optimization
> can be used.

> If the dict version is increased, you need a lookup. If the dict
> version is not increased, the guard is cheap.

I would expect the version to be increased twice, and therefore to
require a lookup.  Are you suggesting that unittest should provide an
example of resetting the version back to the original value when it
cleans up after itself?

> In C, it's very cheap to implement the test "new_value == old_value",
> it just compares two pointers.

Yeah, I understand that it is likely a win in terms of performance,
and a good way to start off (given that you're willing to do the
work).

I just worry that you may end up closing off even better optimizations
later, if you make too many promises about exactly how you will do
which ones.

Today, dict only cares about ==, and you (reasonably) think that full
== isn't always worth running ... but when it comes to which tests
*are* worth running, I'm not confident that the answers won't change
over the years.

>> [2A] Do you want to promise that replacing a value with a
>> non-identical object *will* trigger a version_tag update *even*
>> if the objects are equal?

> It's already written in the PEP:

I read that as a description of what the code does, rather than a spec
for what it should do... so it isn't clear whether I could count on
that remaining true.

For example, if I know that my dict values are all 4-digit integers,
can I write:

    d[k]  = d[k] + 0

and be assured that the version_tag will bump?  Or is that something
that a future optimizer might optimize out?

>> (3)  It is worth being explicit on whether empty dicts can share
>> a version_tag of 0.  If this PEP is about dict content, then that
>> seems fine, and it may well be worth optimizing dict creation.

> This is not part of the PEP yet. I'm not sure that I will modify the
> PEP to use the version 0 for empty dictionaries. Antoine doesn't seem
> to be convinced :-)

True.  But do note that "not hitting the global counter an extra time
for every dict creation" is a more compelling reason than "we could
speed up dict.clear(), sometimes".


>> (4)  Please be explicit about the locking around version++; it
>> is enough to say that the relevant methods already need to hold
>> the GIL (assuming that is true).

> I don't think that it's important to mention it in the PEP. It's more
> an implementation detail. The version can be protected by atomic
> operations.

Now I'm the one arguing from a specific implementation.  :D

My thought was that any sort of locking (including atomic operations)
is slow, but if the GIL is already held, then there is no *extra*
locking cost. (Well, a slightly longer hold on the lock, but...)

>> (5)  I'm not sure I understand the arguments around a per-entry
>> version.

>> On the one hand, you never need a strong reference to the value;
>> if it has been collected, then it has obviously been removed from
>> the dict and should trigger a change even with per-dict.
>
> Let's say that you watch the key1 of a dict. The key2 is modified, it
> increases the version. Later, you test the guard: to check if the key1
> was modified, you need to lookup the key and compare the value. You
> need the value to compare it.

And the value for key1 is still there, so you can.

The only reason you would notice that the key2 value had gone away is
if you also care about key2 -- in which case the cached value is out
of date, regardless of what specific value it used to hold.

>> (6)  I'm also not sure why version_tag *doesn't* solve the problem
>> of dicts that fool the iteration guards by mutating without changing
>> size ( https://bugs.python.org/issue19332 ) ... are you just saying
>> that the iterator views aren't allowed to rely on the version-tag
>> remaining stable, because replacing a value (as opposed to a
>> key-value pair) is allowed?

> If the dictionary values are modified during the loop, the dict
> version is increased. But it's allowed to modify values when you
> iterate on *keys*.

Sure.  So?

I see three cases:

(A)  I don't care that the collection changed.  The python
implementation might, but I don't.  (So no bug even today.)

(B)  I want to process exactly the collection that I started with.  If
some of the values get replaced, then I want to complain, even if
python doesn't.  version_tag is what I want.

(C)  I want to process exactly the original keys, but go ahead and use
updated values.  The bug still bites, but ... I don't think this case
is any more common than B.

-jJ

From victor.stinner at gmail.com  Fri Apr 15 19:31:45 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 16 Apr 2016 01:31:45 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CA+OGgf4B1xtU8vrf5DEho5+W-6YrMgCtvhXBCVofESU5zKhxJQ@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
 <CAMpsgwbYOcZ7W_RqqR1whi3jbiL+ssMn74hrzGJTZkp1SrVbXA@mail.gmail.com>
 <CA+OGgf4B1xtU8vrf5DEho5+W-6YrMgCtvhXBCVofESU5zKhxJQ@mail.gmail.com>
Message-ID: <CAMpsgwZ9uv+cWHU25GiczbacDeHA4Pw-5NvR6nQ_6YxaXNYrOA@mail.gmail.com>

2016-04-15 23:45 GMT+02:00 Jim J. Jewett <jimjjewett at gmail.com>:
>> It's an useful property. For example, let's say that you have a guard
>> on globals()['value']. The guard is created with value=3. An unit test
>> replaces the value with 50, but then restore the value to its previous
>> value (3). Later, the guard is checked to decide if an optimization
>> can be used.
>
>> If the dict version is increased, you need a lookup. If the dict
>> version is not increased, the guard is cheap.
>
> I would expect the version to be increased twice, and therefore to
> require a lookup.  Are you suggesting that unittest should provide an
> example of resetting the version back to the original value when it
> cleans up after itself?

Sorry, as I wrote in another email, I was wrong. If you modify the
value, the version is increased. The discussed case is really a corner
case: the version does not change if the key is set again to exactly
the same value.

d[key] = value
d[key] = value

It's just that it's cheap to implement it :-)


>> In C, it's very cheap to implement the test "new_value == old_value",
>> it just compares two pointers.
>
> Yeah, I understand that it is likely a win in terms of performance,
> and a good way to start off (given that you're willing to do the
> work).
>
> I just worry that you may end up closing off even better optimizations
> later, if you make too many promises about exactly how you will do
> which ones.
>
> Today, dict only cares about ==, and you (reasonably) think that full
> == isn't always worth running ... but when it comes to which tests
> *are* worth running, I'm not confident that the answers won't change
> over the years.

I checked, currently there is no unit test for a==b, only for a is b.
I will add a test for a==b but a is not b, and ensure that the
version is increased.


>>> [2A] Do you want to promise that replacing a value with a
>>> non-identical object *will* trigger a version_tag update *even*
>>> if the objects are equal?
>
>> It's already written in the PEP:
>
> I read that as a description of what the code does, rather than a spec
> for what it should do... so it isn't clear whether I could count on
> that remaining true.
>
> For example, if I know that my dict values are all 4-digit integers,
> can I write:
>
>     d[k]  = d[k] + 0
>
> and be assured that the version_tag will bump?  Or is that something
> that a future optimizer might optimize out?

Hum, I will try to clarify that.


>>> (4)  Please be explicit about the locking around version++; it
>>> is enough to say that the relevant methods already need to hold
>>> the GIL (assuming that is true).
>
>> I don't think that it's important to mention it in the PEP. It's more
>> an implementation detail. The version can be protected by atomic
>> operations.
>
> Now I'm the one arguing from a specific implementation.  :D
>
> My thought was that any sort of locking (including atomic operations)
> is slow, but if the GIL is already held, then there is no *extra*
> locking cost. (Well, a slightly longer hold on the lock, but...)

Hum, since the PEP clearly targets CPython, I will simply describe
its implementation and explain that the GIL ensures that version++ is
atomic.


>>> On the one hand, you never need a strong reference to the value;
>>> if it has been collected, then it has obviously been removed from
>>> the dict and should trigger a change even with per-dict.
>>
>> Let's say that you watch the key1 of a dict. The key2 is modified, it
>> increases the version. Later, you test the guard: to check if the key1
>> was modified, you need to lookup the key and compare the value. You
>> need the value to compare it.
>
> And the value for key1 is still there, so you can.

Sorry, how do you want to check that the dict[key1] value didn't change,
using the value identifier? dict[key1] is old_value_id?

The problem with storing an identifier (a pointer in C) with no strong
reference is when the object is destroyed, a new object can likely get
the same identifier. So it's likely that "dict[key] is old_value_id"
can be true even if dict[key] is now a new object.
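
A minimal sketch of the difference, assuming two hypothetical guard
classes (IdOnlyGuard and StrongRefGuard are illustrative names, not
part of the PEP or of CPython):

```python
class IdOnlyGuard:
    """Remembers only id(value), with no strong reference (unsafe)."""

    def __init__(self, d, key):
        self.d = d
        self.key = key
        self.value_id = id(d[key])

    def check(self):
        return self.key in self.d and id(self.d[self.key]) == self.value_id


class StrongRefGuard:
    """Keeps the watched value alive, so its identity cannot be reused."""

    def __init__(self, d, key):
        self.d = d
        self.key = key
        self.value = d[key]  # strong reference

    def check(self):
        return self.key in self.d and self.d[self.key] is self.value


d = {"key1": object()}
strong = StrongRefGuard(d, "key1")
before = strong.check()
del d["key1"]
d["key1"] = object()     # the old object is still alive inside the guard
after = strong.check()   # reliably detects the replacement

e = {"key1": object()}
weak = IdOnlyGuard(e, "key1")
del e["key1"]            # the old object is destroyed here
e["key1"] = object()     # the new object may get the same address
maybe_wrong = weak.check()  # may be True or False; not reliable
```

While the strong guard holds the old object, no other live object can
share its identity, so the "is" test is trustworthy. The id-only guard
has no such guarantee once the old object has been freed.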


> The only reason you would notice that the key2 value had gone away is
> if you also care about key2 -- in which case the cached value is out
> of date, regardless of what specific value it used to hold.

I don't understand, technically, what do you mean by "out of date" for
an object?


>> If the dictionary values are modified during the loop, the dict
>> version is increased. But it's allowed to modify values when you
>> iterate on *keys*.
>
> Sure.  So?
>
> I see three cases:
>
> (A)  I don't care that the collection changed.  The python
> implementation might, but I don't.  (So no bug even today.)

I'm sorry, I don't understand your description. What do you mean by
"collection"? It's different if you modify dict *keys*, or dict
*values*, or both.

Serhiy opened an issue because he wants to raise an exception if keys
are modified while you iterate on keys:
https://bugs.python.org/issue19332

But only modifying values must *not* raise an exception.


> (B)  I want to process exactly the collection that I started with.  If
> some of the values get replaced, then I want to complain, even if
> python doesn't.  version_tag is what I want.

This is not the issue #19332.


> (C)  I want to process exactly the original keys, but go ahead and use
> updated values.  The bug still bites, but ... I don't think this case
> is any more common than B.

I don't understand exactly your definition either. Maybe you need to
provide a code example.

Sorry, I don't understand why you want to discuss issue #19332
here. I only mentioned the issue in "Prior Work" because the
implementation is *similar*, but PEP 509 is different and so it
doesn't help to fix this issue.

Do you want to modify PEP 509 to fix this issue? Or do you not
understand why PEP 509 cannot be used to fix the issue? I'm
lost...

Victor

From pludemann at google.com  Fri Apr 15 23:46:51 2016
From: pludemann at google.com (Peter Ludemann)
Date: Fri, 15 Apr 2016 20:46:51 -0700
Subject: [Python-Dev] PEP 8 updated on whether to break before or after
 a binary update
In-Reply-To: <3A1DA6EE-DD06-47D1-80D0-BF6822C5B041@gmail.com>
References: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
 <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
 <5711298F.7060308@mrabarnett.plus.com>
 <3A1DA6EE-DD06-47D1-80D0-BF6822C5B041@gmail.com>
Message-ID: <CACsRUKJWgNdUJ1gqHr+7hfrfBbo682NyRRJNWDZegBWUAXA7+g@mail.gmail.com>

If Python ever adopts the BCPL rule for implicit line continuation if the
last thing on a line is an operator (or if there's an open parenthesis),
then the break-after-an-operator rule would be more persuasive.
;)

[IIRC, the BCPL rule was that there was an implicit continuation if the
grammar would not allow inserting a semicolon at the end of the line, which
covered both the open-parens and last-item-is-operator cases, and probably
a few others.] But I should shut up and leave such discussions to
python-ideas.


On 15 April 2016 at 13:48, Ian Lee <ianlee1521 at gmail.com> wrote:

> Cross posting the comment I'd left on the issue [1].
>
> > My preference is to actually break that logic up and avoid the wrapping
> in the first place, as in [2]. Which in this particular class has the side
> benefit of that value being used again in the same function anyways.
>
> > I'm starting to realize that Brandon Rhodes really had a big impact on
> my ideas of styling as I've been learning Python these past few years, as
> this was another one style I'm stealing from that same talk [3].
>
> [1] http://bugs.python.org/msg263509
> [2]
> https://github.com/python/peps/commit/0c790e7b721bd13ad12ab9e6f6206836f398f9c4
>
> ~ Ian Lee | IanLee1521 at gmail.com
>
> On Apr 15, 2016, at 10:49, MRAB <python at mrabarnett.plus.com> wrote:
>
> On 2016-04-15 18:03, Victor Stinner wrote:
> > Hum.
> >
> >          if (width == 0
> >              and height == 0
> >              and color == 'red'
> >              and emphasis == 'strong'
> >              or highlight > 100):
> >              raise ValueError("sorry, you lose")
> >
> > Please remove one space to vertically align "and" operators with the
> > opening parenthesis:
> >
> >          if (width == 0
> >             and height == 0
> >             and color == 'red'
> >             and emphasis == 'strong'
> >             or highlight > 100):
> >              raise ValueError("sorry, you lose")
> >
> > (I'm not sure that the difference is obvious in a mail client, you
> > need a fixed width font which is not the case in my Gmail editor.)
> >
> > It helps to visually see that the multiline test and the raise
> > instruction are in two different blocks.
> >
> > (Moreover, the pep8 checks of OpenStack simply reject such syntax, but
> > I cannot use this syntax anymore :-))
> >
> I always half-indent continuation lines:
>
>        if (width == 0
>          and height == 0
>          and color == 'red'
>          and emphasis == 'strong'
>          or highlight > 100):
>            raise ValueError("sorry, you lose")

From random832 at fastmail.com  Sat Apr 16 00:09:23 2016
From: random832 at fastmail.com (Random832)
Date: Sat, 16 Apr 2016 00:09:23 -0400
Subject: [Python-Dev] PEP 8 updated on whether to break before or after
 a binary update
In-Reply-To: <CACsRUKJWgNdUJ1gqHr+7hfrfBbo682NyRRJNWDZegBWUAXA7+g@mail.gmail.com>
References: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
 <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
 <5711298F.7060308@mrabarnett.plus.com>
 <3A1DA6EE-DD06-47D1-80D0-BF6822C5B041@gmail.com>
 <CACsRUKJWgNdUJ1gqHr+7hfrfBbo682NyRRJNWDZegBWUAXA7+g@mail.gmail.com>
Message-ID: <1460779763.2138868.580401713.09908C97@webmail.messagingengine.com>

On Fri, Apr 15, 2016, at 23:46, Peter Ludemann via Python-Dev wrote:
> If Python ever adopts the BCPL rule for implicit line continuation if
> the last thing on a line is an operator (or if there's an open
> parenthesis), then the break-after-an-operator rule would be more
> persuasive. ;)
>
> [IIRC, the BCPL rule was that there was an implicit continuation if
> the grammar would not allow inserting a semicolon at the end of the
> line, which covered both the open-parens and last-item-is-operator
> cases, and probably a few others.] But I should shut up and leave such
> discussions to python-ideas.

Sounds like Visual Basic. Meanwhile, JavaScript's rule is that there's
an implicit semicolon if and only if the grammar would not allow the
two lines to be considered as a single statement. Insanity comes in
all flavors.
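
For comparison, a minimal sketch of Python's own rule: an open parenthesis
keeps the expression going across newlines, so each continuation line can
start with its operator. The function name is hypothetical; the condition
is the one from the PEP 8 example quoted earlier in this thread.

```python
def should_reject(width, height, color, emphasis, highlight):
    # The open parenthesis makes the newlines insignificant, so every
    # continuation line may begin with its binary operator.
    # "and" binds tighter than "or", so this reads as
    # (all four tests) or (highlight > 100).
    return (width == 0
            and height == 0
            and color == 'red'
            and emphasis == 'strong'
            or highlight > 100)
```

Without the parentheses, the first line alone would be a complete
expression and the next line would be a syntax error.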

From stephen at xemacs.org  Sat Apr 16 07:21:54 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 16 Apr 2016 20:21:54 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
Message-ID: <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > On 15 April 2016 at 00:52, Stephen J. Turnbull <stephen at xemacs.org> wrote:
 > > Nick Coghlan writes:
 > >
 > >  > The use case for returning bytes from __fspath__ is DirEntry, so you
 > >  > can write things like this in low level code:
 > >  >
 > >  >     def myscandir(dirpath):
 > >  >         for entry in os.scandir(dirpath):
 > >  >             if entry.is_file():
 > >  >                 with open(entry) as f:
 > >  >                     # do something
 > >
 > > Excuse me, but that is *not* a use case for returning bytes from
 > > DirEntry.__fspath__.  open() is perfectly happy taking str (including
 > > surrogate-encoded rawbytes).
 > 
 > That results in a different type for the file object's name:
 > 
 > >>> open("README.md").name
 > 'README.md'
 > >>> open(b"README.md").name
 > b'README.md'

OK, you win, __fspath__ needs to be polymorphic.

But you've just shifted me to -1 on "os.fspath": it's an attractive
nuisance.  EIBTI, applications and high-level library functions should
use os.fsdecode or os.fsencode.  Functions that take a polymorphic
argument and want to preserve type should invoke __fspath__ on the
argument.  That will visually signal that the caller is not merely
low-level, but is explicitly a boundary function.  (You could rename
the generic function as "os._fspath", I guess, but I *really* want to
deprecate calling the polymorphic version in user code.  _fspath can
be added if experience shows that polymorphic usage is very desirable
outside the stdlib.  This remark is in my not-so-Dutch opinion, of
course.)

 > The guarantee we want to provide those folks is that if they're
 > operating in the binary domain they'll stay there.

Et tu, Nick?  "Guarantee"?!  You can't guarantee any such thing with
an implicitly invoked polymorphic API like this one -- unless you
consider a crashed program to be in the binary domain. ;-)  Note that
the current proposals don't even do that for the binary domain, only
for the text domain!


From p.f.moore at gmail.com  Sat Apr 16 08:05:25 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 16 Apr 2016 13:05:25 +0100
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CACac1F_wdG4um9nfFTM3zBRuYLi7REgQ6-tj5+Dq547Hmd_mZA@mail.gmail.com>

On 16 April 2016 at 12:21, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> OK, you win, __fspath__ needs to be polymorphic.
>
> But you've just shifted me to -1 on "os.fspath": it's an attractive
> nuisance.  EIBTI, applications and high-level library functions should
> use os.fsdecode or os.fsencode.

I presume your expectation is that os.fsencode/os.fsdecode will work
with objects supporting the __fspath__ protocol?

So the question for me is, if I'm writing a function that takes a path
argument p (in the most general sense - I want my function to be able
to handle anything the stdlib functions can) then how do I write the
code? There are 4 cases I can think of:

1. I just want to pass the argument on to other functions - just do
so, stdlib functions will work fine.
2. I need a string - use os.fsdecode(p)
3. I need bytes - use os.fsencode(p)
4. I need a guaranteed pathlib.Path object so that I can use Path
methods - convert via pathlib.Path(os.fsdecode(p))
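
A minimal sketch of the four cases above. It assumes Python 3.6 or later,
where os.fsdecode() and os.fsencode() accept objects implementing
__fspath__ (which is the expectation stated at the top of this message,
not something settled at the time). The helper names are hypothetical.

```python
import os
import pathlib


def pass_through(p):
    # Case 1: hand the argument straight to a stdlib function.
    # The result type follows the input type (str in, str out).
    return os.path.basename(p)


def as_text(p):
    # Case 2: always get a str, whatever was passed in.
    return os.fsdecode(p)


def as_bytes(p):
    # Case 3: always get bytes, whatever was passed in.
    return os.fsencode(p)


def as_path(p):
    # Case 4: always get a pathlib.Path, even for bytes input.
    return pathlib.Path(os.fsdecode(p))
```

With these, as_text(b"README.md") and as_text(pathlib.Path("README.md"))
both give the str "README.md", so the caller never has to inspect the
input type.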

I guess there's the possibility that you want to deliberately reject
bytes-like paths, and it's not immediately obvious how you'd do that
without os.fspath or using the __fspath__ protocol directly, but I'm
not sure what anyone gains by doing so (maybe the chance to fail
early? but doesn't using fsdecode mean I never need to fail at all?)

While I don't have any specific reason to object to os.fspath, I'd
appreciate someone describing a concrete use case that needs it (and
isn't covered by any of the options above).

Paul

From francismb at email.de  Sat Apr 16 09:29:41 2016
From: francismb at email.de (francismb)
Date: Sat, 16 Apr 2016 15:29:41 +0200
Subject: [Python-Dev] PEP 8 updated on whether to break before or after
 a binary update
In-Reply-To: <CAP7+vJJRLnvgr_8ZNCjVvCCozPdMasyXfmWUMM3nx4-rTNqbVA@mail.gmail.com>
References: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
 <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
 <ner83s$u6d$1@ger.gmane.org>
 <CAP7+vJJRLnvgr_8ZNCjVvCCozPdMasyXfmWUMM3nx4-rTNqbVA@mail.gmail.com>
Message-ID: <57123E45.9050902@email.de>

Hi,

On 04/15/2016 07:43 PM, Guido van Rossum wrote:
> The update is already serving its real purpose: showing that style is
> debatable and cannot always easily be reduced to fixed rules.
> 

As you said, there will always be some kind of personal preference or
style taste, and one can see from the debate that the current rules are
context dependent. But I wonder how far that style context/rule
(function) evaluation/application issue could be solved in a machine
learning context.

Regards,
francis

From stephen at xemacs.org  Sat Apr 16 09:46:02 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 16 Apr 2016 22:46:02 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CACac1F_wdG4um9nfFTM3zBRuYLi7REgQ6-tj5+Dq547Hmd_mZA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CACac1F_wdG4um9nfFTM3zBRuYLi7REgQ6-tj5+Dq547Hmd_mZA@mail.gmail.com>
Message-ID: <22290.16922.481493.207376@turnbull.sk.tsukuba.ac.jp>

Paul Moore writes:
 > On 16 April 2016 at 12:21, Stephen J. Turnbull <stephen at xemacs.org> wrote:
 > > OK, you win, __fspath__ needs to be polymorphic.
 > >
 > > But you've just shifted me to -1 on "os.fspath": it's an attractive
 > > nuisance.  EIBTI, applications and high-level library functions should
 > > use os.fsdecode or os.fsencode.
 > 
 > I presume your expectation is that os.fsencode/os.fsdecode will work
 > with objects supporting the __fspath__ protocol?

Yes, I've suggested that before, and I think it's TOOWTDI, rather than
insisting on a os.fspath intervening, even if os.fspath is included
after all.

 > So the question for me is, if I'm writing a function that takes a path
 > argument p:

 > 1. I just want to pass the argument on to other functions - just do
 > so, stdlib functions will work fine.

I think this is a bad idea unless you *need* polymorphism, but OK,
it's "consenting adults".

 > 2. I need a string - use os.fsdecode(p)
 > 3. I need bytes - use os.fsencode(p)
 > 4. I need a guaranteed pathlib.Path object so that I can use Path
 > methods - convert via pathlib.Path(os.fsdecode(p))

LGTM.  Applications or user toolkits could provide a derived
IFeelLuckyPath(Path) for symmetry with the os functions.<wink/>

 > I guess there's the possibility that you want to deliberately reject
 > bytes-like paths,

I wouldn't put it that way.  I think more likely is the possibility
that you want to restrict yourself to a particular type, as all your
code is written in terms of that type and expects that type.  Note
that Nick's example shows that in both the bytes domain and the text
domain you can easily end up with a filelike.name of the wrong type.

 > and it's not immediately obvious how you'd do that without
 > os.fspath or using the __fspath__ protocol directly, but I'm not
 > sure what anyone gains by doing so (maybe the chance to fail early? 
 > but doesn't using fsdecode mean I never need to fail at all?)

Well, wouldn't you like to raise there if your dataflow spec says only
one type should ever be observed?

The reasons that I wouldn't bother are that (1) I suspect it's going
to be very rare to see bytes in a text application, and (2) in bytes-
oriented code I would be fairly likely to either specify literals as
str (a bug, but nobody would ever notice) or import them from an
.ini or other text source (which might very well be in a non-
filesystem encoding in my environment!)  In either case it's probably
the filename I want but specified in the wrong form.


From stephen at xemacs.org  Sat Apr 16 09:48:47 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 16 Apr 2016 22:48:47 +0900
Subject: [Python-Dev] PEP 8 updated on whether to break before or after
 a binary update
In-Reply-To: <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
References: <CAP7+vJ++jCV6-Vt98u88Cw72JytxwiGqE5vxoO=vxjSwkhM1vw@mail.gmail.com>
 <CAMpsgwZE0NAbH_tXP1O3+bUT9yFWW3V3WA_bAQ=mWqDjqxfCng@mail.gmail.com>
Message-ID: <22290.17087.797345.923061@turnbull.sk.tsukuba.ac.jp>

Victor Stinner writes:
 > Hum.
 > 
 >         if (width == 0
 >             and height == 0
 >             and color == 'red'
 >             and emphasis == 'strong'
 >             or highlight > 100):
 >             raise ValueError("sorry, you lose")
 > 
 > Please remove one space to vertically align "and" operators with the
 > opening parenthesis:
 > 
 >         if (width == 0
 >            and height == 0
 >            and color == 'red'
 >            and emphasis == 'strong'
 >            or highlight > 100):
 >             raise ValueError("sorry, you lose")

The RightThang[tm] is to remove "if" and replace it with the Japanese
"moshi":

        moshi (width == 0
               and height == 0
               and color == 'red'
               and emphasis == 'strong'
               or highlight > 100):
            raise ValueError("sorry, you lose")

It-works-for-me-ly y'rs,


From p.f.moore at gmail.com  Sat Apr 16 12:30:15 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 16 Apr 2016 17:30:15 +0100
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22290.16922.481493.207376@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CACac1F_wdG4um9nfFTM3zBRuYLi7REgQ6-tj5+Dq547Hmd_mZA@mail.gmail.com>
 <22290.16922.481493.207376@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CACac1F_Z0oyZ=WCtRADXup2-cGNZEwbVEXdOVDrBmjUfiGi6LQ@mail.gmail.com>

On 16 April 2016 at 14:46, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Paul Moore writes:
[...]
>  > 1. I just want to pass the argument on to other functions - just do
>  > so, stdlib functions will work fine.
>
> I think this is a bad idea unless you *need* polymorphism, but OK,
> it's "consenting adults".

All I'm really saying here is that if you don't need to care about
type checking (and 99% of Python programs rely on duck typing, so this
is pretty much the norm) then everything will be OK. I'm not
suggesting encouraging polymorphism, just pointing out that most code
should simply work and this whole debate is a non-issue for code like
that. (That's the whole point of getting the stdlib functions to
accept Path objects, after all :-))

>  > 2. I need a string - use os.fsdecode(p)
>  > 3. I need bytes - use os.fsencode(p)
>  > 4. I need a guaranteed pathlib.Path object so that I can use Path
>  > methods - convert via pathlib.Path(os.fsdecode(p))
>
> LGTM.  Applications or user toolkits could provide a derived
> IFeelLuckyPath(Path) for symmetry with the os functions.<wink/>
>
>  > I guess there's the possibility that you want to deliberately reject
>  > bytes-like paths,
>
> I wouldn't put it that way.  I think more likely is the possibility
> that you want to restrict yourself to a particular type, as all your
> code is written in terms of that type and expects that type.  Note
> that Nick's example shows that in both the bytes domain and the text
> domain you can easily end up with a filelike.name of the wrong type.

But within your own code, you do that by convention and good coding
practices, not by explicit type checks (except in boundary code). If
you're writing a library to be used by others, you should be as
permissive as possible - you may not expect your code to be called
with bytes-like paths, but why go out of your way to reject it? That's
not Pythonic, IMO. (On the other hand, documenting that only text-like
path objects are supported by your library is fine).

In my experience, bytes/text safety is about being aware of where the
two different types appear in your program, not about forcing only one
type. So my cases are about keeping the types clear - the output of
(1) is "same as input", of (2) is "string", of (3) is "bytes" and of
(4) is "Path". Call me with whatever you like, I can work with it in
terms I need.

But we're mostly just debating coding style here, I think we agree on
the basic principle.

>  > and it's not immediately obvious how you'd do that without
>  > os.fspath or using the __fspath__ protocol directly, but I'm not
>  > sure what anyone gains by doing so (maybe the chance to fail early?
>  > but doesn't using fsdecode mean I never need to fail at all?)
>
> Well, wouldn't you like to raise there if your dataflow spec says only
> one type should ever be observed?

Meh. Maybe asserts, maybe unit tests. But typechecks throughout my
code sounds more like strong typing than Python. But as I say, coding
style - I write scripts, glue code, and general-use libraries. None of
these lend themselves to that sort of rigorous dataflow analysis (this
is the same reason I have little personal use for the new typechecking
stuff).

> The reasons that I wouldn't bother are that (1) I suspect it's going
> to be very rare to see bytes in a text application, and (2) in bytes-
> oriented code I would be fairly likely to either specify literals as
> str (a bug, but nobody would ever notice) or import them from an
> .ini or other text source (which might very well be in a non-
> filesystem encoding in my environment!)  In either case it's probably
> the filename I want but specified in the wrong form.

Also, that feels very much like the sort of boundary code that needs
to do the fiddly rigorous stuff so the rest of us don't have to :-)

Paul

From chris.barker at noaa.gov  Sat Apr 16 14:47:26 2016
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Sat, 16 Apr 2016 11:47:26 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <CADiSq7eTbL=LAMi+HSAwR5gwtGMWwv_Z_xVAfEf_2Z2xAbLm5Q@mail.gmail.com>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us>
 <87oa9c3nii.fsf@vostro.rath.org> <570F0B24.50705@stoneleaf.us>
 <CADiSq7eTbL=LAMi+HSAwR5gwtGMWwv_Z_xVAfEf_2Z2xAbLm5Q@mail.gmail.com>
Message-ID: <8690193008049818583@unknownmsgid>

> On Apr 13, 2016, at 8:31 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
>>>   class Special(bytes):
>>>       def __fspath__(self):
>>>         return 'str-val'
>>>   obj = Special('bytes-val', 'utf8')
>>>   path_obj = fspath(obj, allow_bytes=True)
>>>
>>> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'.
>
> In this kind of case, inheritance tends to trump protocol.

Sure, but...

> example, int subclasses can't override operator.index:
...
> The reasons for that behaviour are more pragmatic than philosophical:
> builtins and their subclasses are extensively special-cased for speed
> reasons,

OK, but in this case, purity can beat practicality. If the author
writes an __fspath__ method, presumably it's because it should be
used.

And I can certainly imagine one might want to store a path
representation as bytes, but NOT want the raw bytes passed off to file
handling libs.

(of course you could use composition rather than subclassing if you had to)
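
A minimal sketch of the difference, assuming the os.fspath() that
eventually shipped in Python 3.6, which to my knowledge returns str and
bytes instances (including subclasses) unchanged before ever looking for
__fspath__. The Wrapped class is hypothetical.

```python
import os


class Special(bytes):
    # Inheritance: this is a bytes object, so os.fspath() returns it
    # as-is and never calls this method.
    def __fspath__(self):
        return 'str-val'


class Wrapped:
    # Composition: the raw bytes are stored, not inherited, so the
    # protocol method is what os.fspath() uses.
    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return 'str-val'


inherited = os.fspath(Special('bytes-val', 'utf8'))
composed = os.fspath(Wrapped(b'bytes-val'))
```

Here inherited compares equal to b'bytes-val' while composed is
'str-val', so composition is the way to keep the raw bytes away from
file-handling code.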

-CHB

From gunkmute at gmail.com  Sat Apr 16 20:04:57 2016
From: gunkmute at gmail.com (Demur Rumed)
Date: Sun, 17 Apr 2016 00:04:57 +0000
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
Message-ID: <CAA1B2Jy-ewfNo2x8NJKe-mrwBF80Xw3r9qg6nT-ePKmauzoT3g@mail.gmail.com>

The outstanding bug with this patch right now is a regression in line
numbers causing the test for http://bugs.python.org/issue9936 to fail. I've
tried to debug it without success.

From ncoghlan at gmail.com  Sat Apr 16 21:28:09 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 17 Apr 2016 11:28:09 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>

On 16 April 2016 at 21:21, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Nick Coghlan writes:
>
>  > On 15 April 2016 at 00:52, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>  > > Nick Coghlan writes:
>  > >
>  > >  > The use case for returning bytes from __fspath__ is DirEntry, so you
>  > >  > can write things like this in low level code:
>  > >  >
>  > >  >     def myscandir(dirpath):
>  > >  >         for entry in os.scandir(dirpath):
>  > >  >             if entry.is_file():
>  > >  >                 with open(entry) as f:
>  > >  >                     # do something
>  > >
>  > > Excuse me, but that is *not* a use case for returning bytes from
>  > > DirEntry.__fspath__.  open() is perfectly happy taking str (including
>  > > surrogate-encoded rawbytes).
>  >
>  > That results in a different type for the file object's name:
>  >
>  > >>> open("README.md").name
>  > 'README.md'
>  > >>> open(b"README.md").name
>  > b'README.md'
>
> OK, you win, __fspath__ needs to be polymorphic.
>
> But you've just shifted me to -1 on "os.fspath": it's an attractive
> nuisance.
>
> EIBTI, applications and high-level library functions should
> use os.fsdecode or os.fsencode.  Functions that take a polymorphic
> argument and want to preserve type should invoke __fspath__ on the
> argument. That will visually signal that the caller is not merely
> low-level, but is explicitly a boundary function.

str and bytes aren't going to implement __fspath__ (since they're only
*sometimes* path objects), so asking people to call the protocol
method directly for any purpose would be a pain.

>  (You could rename
> the generic function as "os._fspath", I guess, but I *really* want to
> deprecate calling the polymorphic version in user code.  _fspath can
> be added if experience shows that polymorphic usage is very desirable
> outside the stdlib.  This remark is in my not-so-Dutch opinion, of
> course.)

You may have missed my email where I agreed os.fspath() itself needs
to ensure the output is a str object and throw an exception otherwise.
The remaining API design debate relates to whether the polymorphic
version should be "os.fspath(obj, allow_bytes=True)" or
"os._raw_fspath(obj)" (with Ethan favouring the former, and me the
latter).
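
A rough sketch of the two spellings being debated, written as plain
functions so they can be compared side by side. Both names are
hypothetical stand-ins (raw_fspath for the proposed "os._raw_fspath",
fspath for the proposed strict "os.fspath"), and this is not how any
released os module is implemented.

```python
def raw_fspath(obj):
    # Polymorphic version: str and bytes pass through unchanged,
    # anything else must implement __fspath__.
    if isinstance(obj, (str, bytes)):
        return obj
    try:
        method = type(obj).__fspath__
    except AttributeError:
        raise TypeError("expected str, bytes or an object with "
                        "__fspath__, not " + type(obj).__name__) from None
    result = method(obj)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ must return str or bytes")
    return result


def fspath(obj, allow_bytes=False):
    # Strict version: refuse bytes unless the caller opts in, so a
    # text-domain caller fails loudly instead of mixing types.
    result = raw_fspath(obj)
    if isinstance(result, bytes) and not allow_bytes:
        raise TypeError("bytes path not allowed here")
    return result
```

Either spelling gives the same behaviour; the disagreement is only about
whether the opt-in is a keyword argument or a separate function.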

>
>  > The guarantee we want to provide those folks is that if they're
>  > operating in the binary domain they'll stay there.
>
> Et tu, Nick?  "Guarantee"?!  You can't guarantee any such thing with
> an implicitly invoked polymorphic API like this one -- unless you
> consider a crashed program to be in the binary domain. ;-)

I do, as one of the core changes in design philosophy between Python 2
and 3 is attempting to remove the implicit level shifting between the
binary and text domains, and instead throw exceptions in those cases.
Pragmatism requires us to keep some of them (e.g. the codecs module is
officially object<->object in both Python 2 and Python 3, and string
formatting codes can still do unexpected things), but a great many of
them are already gone, and we don't want to add any new ones if
alternative designs are available.

> Note that
> the current proposals don't even do that for the binary domain, only
> for the text domain!

Folks that want to ensure they're working in the binary domain can
already do "memoryview(obj)" to ensure they have a bytes-like object
without constraining it to a specific type.
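
As a small illustration (the function name is hypothetical), memoryview()
accepts any object that supports the buffer protocol and raises TypeError
for str:

```python
def require_bytes_like(obj):
    # Succeeds for bytes, bytearray, and other buffer-protocol objects;
    # raises TypeError for str, so text cannot slip into binary code.
    return memoryview(obj)


view = require_bytes_like(bytearray(b"README.md"))
```

Calling require_bytes_like("README.md") raises TypeError instead of
silently encoding the text.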

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sat Apr 16 21:38:11 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 17 Apr 2016 11:38:11 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <8690193008049818583@unknownmsgid>
References: <570C1E13.4090909@stoneleaf.us>
 <loom.20160413T071958-483@post.gmane.org>
 <CAP1=2W5PWxd_Jzxz=D_Ox8eorTiY-NdP8NvvhYoKCtERqpaLFg@mail.gmail.com>
 <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us>
 <87oa9c3nii.fsf@vostro.rath.org> <570F0B24.50705@stoneleaf.us>
 <CADiSq7eTbL=LAMi+HSAwR5gwtGMWwv_Z_xVAfEf_2Z2xAbLm5Q@mail.gmail.com>
 <8690193008049818583@unknownmsgid>
Message-ID: <CADiSq7cCcadB7fYkxMTGvY2sUniGRKa2LonPyQ4W88e1JT9FtQ@mail.gmail.com>

On 17 April 2016 at 04:47, Chris Barker - NOAA Federal
<chris.barker at noaa.gov> wrote:
>> On Apr 13, 2016, at 8:31 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>>>>   class Special(bytes):
>>>>       def __fspath__(self):
>>>>         return 'str-val'
>>>>   obj = Special('bytes-val', 'utf8')
>>>>   path_obj = fspath(obj, allow_bytes=True)
>>>>
>>>> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'.
>>
>> In this kind of case, inheritance tends to trump protocol.
>
> Sure, but...
>
>> example, int subclasses can't override operator.index:
> ...
>> The reasons for that behaviour are more pragmatic than philosophical:
>> builtins and their subclasses are extensively special-cased for speed
>> reasons,
>
> OK, but in this case, purity can beat practicality. If the author
> writes an __fspath__ method, presumably it's because it should be
> used.
>
> And I can certainly imagine one might want to store a path
> representation as bytes, but NOT want the raw bytes passed off to file
> handling libs.
>
> (of course you could use composition rather than subclassing if you had to)

Exactly - inheritance is a really strong relationship that directly
affects the in-memory layout of instances (at least in CPython), and
also the kinds of assumption other code will make about that type (for
example, subclasses are special cased to allow them to override the
behaviour of numeric binary operators when they appear as the right
operand with an instance of the parent type as the left operand, while
with unrelated types, the left operand always gets the first chance to
handle the operation).
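
The operator special case can be seen in a small sketch (all class names
are hypothetical): a subclass that overrides the reflected method is
tried first when it is the right operand, while an unrelated type has to
wait for the left operand to decline.

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"


class Child(Base):
    # Overrides the reflected method, so it gets priority when a Child
    # appears to the right of a Base.
    def __radd__(self, other):
        return "Child.__radd__"


class Unrelated:
    def __radd__(self, other):
        return "Unrelated.__radd__"


with_subclass = Base() + Child()
with_unrelated = Base() + Unrelated()
```

Here with_subclass is "Child.__radd__" and with_unrelated is
"Base.__add__".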

When folks don't want to trigger those "this is an <X>" behaviours,
the appropriate design pattern is composition, not inheritance (and
many of the ABCs were introduced to make it easier to implement
particular interfaces without inheriting from the corresponding
builtin types).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From stephen at xemacs.org  Sun Apr 17 04:03:54 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 17 Apr 2016 17:03:54 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
Message-ID: <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > str and bytes aren't going to implement __fspath__ (since they're
 > only *sometimes* path objects), so asking people to call the
 > protocol method directly for any purpose would be a pain.

It *should* be a pain.  People who need bytes should call fsencode,
people who need str should call fsdecode, and Ethan's antipathy checks
for bytes and str, then calls __fspath__ if needed.  Who's left?  Just
the bartender and the janitor, last call was hours ago.  OK, maybe
there are enough clients to make it worthwhile to provide the utility,
but it should be clearly marked as "double opt-in, for experts only
(consenting adults must show proof of insurance)".

The functionality of raising on wrong types can be incorporated in
fsencode and fsdecode, but I think there's still some discussion
needed about the conditions for raising, and what flags are needed.

Of course with this reinterpretation, names like "fs_ensure_str" and
"fs_ensure_bytes" might be more appropriate (much as y'all hate
putting types in function names, in this case I think that's best).
But backward compatibility, and the existing names aren't *that* bad I
guess.

 > You may have missed my email where I agreed os.fspath() itself
 > needs to ensure the output is a str object and throw an exception
 > otherwise.

Presumably it should do the same for bytes when those are desired,
though.  I don't find the "cast to bytes using memoryview" approach
plausible, especially not where I live: if str, very likely some of
the characters will be outside of the latin1 repertoire, and thus the
internal representation will likely be full of NULs, and certainly not
be what the user wants.

 > The remaining API design debate relates to whether the polymorphic
 > version should be "os.fspath(obj, allow_bytes=True)" or
 > "os._raw_fspath(obj)" (with Ethan favouring the former, and me the
 > latter).

 > > Et tu, Nick?  "Guarantee"?!  You can't guarantee any such thing
 > > with an implicitly invoked polymorphic API like this one --
 > > unless you consider a crashed program to be in the binary
 > > domain. ;-)
 > 
 > I do, as one of the core changes in design philosophy between
 > Python 2 and 3 is attempting to remove the implicit level shifting
 > between the binary and text domains,

Hey, Reverend, I've been singing those hymns since the early '90s.

 > and instead throw exceptions in those cases.

Then I don't understand the current design of fsdecode and fsencode.
Shouldn't they raise on str and bytes respectively, rather than
passing them through?  In general, I would expect that something
that's explicitly intended to be polymorphic would be documented as
such, and the *caller* would be responsible for type-checking and
raising if it got the wrong thing.

Steve

From ncoghlan at gmail.com  Sun Apr 17 08:36:15 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 17 Apr 2016 22:36:15 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7ddSjcNG04xWuZWaw058KjKoiNG8Q3pcQfjfw-pV25pEQ@mail.gmail.com>

On 17 April 2016 at 18:03, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> Nick Coghlan writes:
>  > and instead throw exceptions in those cases.
>
> Then I don't understand the current design of fsdecode and fsencode.
> Shouldn't they raise on str and bytes respectively, rather than
> passing them through?  In general, I would expect that something
> that's explicitly intended to be polymorphic would be documented as
> such, and the *caller* would be responsible for type-checking and
> raising if it got the wrong thing.
>

I was initially surprised myself, but then realised it made sense for their
intended use cases - if almost every usage looks like "obj if
isinstance(obj, str) else os.fsdecode(obj)", then there ends up being a
strong pragmatic case for pushing the pass-through down into the underlying
function to reduce code duplication and rejecting str input in the cases
where it isn't supported. By contrast, there are lots of places where
"obj.decode()" gets called without a pass-through for objects that are
already decoded.
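
A small sketch of the pass-through behaviour described above, using only the existing os.fsencode and os.fsdecode functions (the file name is an arbitrary ASCII example, chosen so the result does not depend on the filesystem encoding):

```python
import os

# fsdecode: str input passes straight through, bytes input is decoded
# using the filesystem encoding.
text_from_str = os.fsdecode("spam.txt")
text_from_bytes = os.fsdecode(b"spam.txt")

# fsencode is the mirror image: bytes pass through, str is encoded.
raw_from_bytes = os.fsencode(b"spam.txt")
raw_from_str = os.fsencode("spam.txt")

print(text_from_str, text_from_bytes)
print(raw_from_bytes, raw_from_str)

# By contrast, there is no pass-through for plain decoding:
# str objects simply have no decode() method.
str_has_decode = hasattr("spam.txt", "decode")
print(str_has_decode)
```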

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From k7hoven at gmail.com  Sun Apr 17 09:58:19 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Sun, 17 Apr 2016 16:58:19 +0300
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>

On Sun, Apr 17, 2016 at 11:03 AM, Stephen J. Turnbull
<stephen at xemacs.org> wrote:
> Nick Coghlan writes:
>
>  > str and bytes aren't going to implement __fspath__ (since they're
>  > only *sometimes* path objects), so asking people to call the
>  > protocol method directly for any purpose would be a pain.
>
> It *should* be a pain.  People who need bytes should call fsencode,
> people who need str should call fsdecode, and Ethan's antipathy checks
> for bytes and str, then calls __fspath__ if needed.  Who's left?  Just
> the bartender and the janitor, last call was hours ago.  OK, maybe
> there are enough clients to make it worthwhile to provide the utility,
> but it should be clearly marked as "double opt-in, for experts only
> (consenting adults must show proof of insurance)".

My doubts, expressed several times in these threads, about the need
for a *public* os.fspath function to complement the __fspath__
protocol, are now perhaps gone. I'll explain why (and how). The
reasons for my doubts were that

(1) The audience outside the stdlib for such a function should be
small, because it is preferred to either use existing tools in
os.path.* or pathlib (or similar) for manipulating paths.

(2) There are just too many different possible versions of this
function: rejecting str, rejecting bytes, coercion to str, coercion to
bytes, and accepting both str and bytes. That's a total of 5 different
cases. People also used to talk about versions that would not allow
passing through objects that are already bytes or str. That would make
it a total of 10 different versions!
(in principle, there could be even more, but let's not go there :-).
In other words, this argument was that it is probably best to
implement whatever flavor is needed for the context, perhaps based on
documented recipes.


Regarding (2), we can first rule out half of the 10 cases---the ones
that reject plain instances of bytes and/or str---because they would
not be very useful as all the isinstance/hasattr checking etc. would
be left to the caller. And here are the remaining five, explained
based on what they accept as argument, what they return, and where
they would be used:

(A) "polymorphic"
*Accept*: str and bytes, provided via __fspath__ as well as plain str
and bytes instances.
*Return*: str/bytes depending on input.
*Audience*: the stdlib, including os.path.things, os.things,
shutil.things, open, ... (some functions would need a C version).
There may even be a small audience outside the stdlib.

(B) "str-based only"
*Accept*: str, provided via __fspath__ as well as plain str.
*Return*: str.
*Audience*: relatively low-level code that works exclusively with str
paths but accepts specialized path objects as input.

(C) "bytes-based only"
*Accept*: bytes, provided via __fspath__ as well as plain bytes.
*Return*: bytes.
*Audience*: low-level code that explicitly deals with paths as bytes
(probably to deal with undefined/ill-defined encodings).

(D) "coerce to str"
*Accept*: str and bytes, provided via __fspath__ as well as plain str
and bytes instances.
*Return*: str (coerced / decoded if needed).
*Audience*: code that deals explicitly with str but wants to 'try'
supporting bytes-based path inputs too via implicit decoding (even if
it may result in surrogate escapes, which one cannot for instance
print(...).)

(E) "coerce to bytes"
*Accept*: str and bytes, provided via __fspath__ as well as plain str
and bytes instances.
*Return*: bytes (coerced / encoded if needed).
*Audience*: low-level code that explicitly deals with bytes paths but
wants to accept str-based path inputs too via implicit encoding.


Even if all options (A-E) probably have small audiences (compared to
e.g. os.path.*), some of them have larger audiences than others. But
all of them have at least *some* reasonable audience (as described
above).

Recently (well, a few days ago, but 'recently', considering the scale
of these discussions anyway ;-), Nick pointed out something I hadn't
realized---os.fsencode and os.fsdecode actually already implement
coercion to bytes and str, respectively. With those two functions made
compatible with the __fspath__ protocol [using (A) above], they would
in fact *be* (D) and (E), respectively.
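
As a rough illustration of that layering, (D) and (E) could be written as thin wrappers around a polymorphic helper. This is only a sketch: polymorphic_fspath, coerce_to_str, coerce_to_bytes and StrPath are hypothetical names invented for the example, and only os.fsdecode and os.fsencode are existing functions:

```python
import os


def polymorphic_fspath(pathlike):
    """Variant (A): return str or bytes, whichever the object provides."""
    if hasattr(pathlike, "__fspath__"):
        ret = pathlike.__fspath__()
    else:
        ret = pathlike
    if not isinstance(ret, (str, bytes)):
        raise TypeError("argument is not and does not provide an "
                        "acceptable pathname")
    return ret


def coerce_to_str(pathlike):
    """Variant (D): unwrap via (A), then let fsdecode coerce to str."""
    return os.fsdecode(polymorphic_fspath(pathlike))


def coerce_to_bytes(pathlike):
    """Variant (E): unwrap via (A), then let fsencode coerce to bytes."""
    return os.fsencode(polymorphic_fspath(pathlike))


class StrPath:
    """Hypothetical path object that provides a str path."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path


print(coerce_to_str(StrPath("a/b.txt")))
print(coerce_to_bytes(StrPath("a/b.txt")))
print(coerce_to_str(b"a/b.txt"))
```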

Now, we only have options (A-C) left. They could all be implemented
roughly as follows:

def fspath(pathlike, *, output_types = (str,)):
  if hasattr(pathlike, '__fspath__'):
    ret = pathlike.__fspath__()  # or pathlike.__fspath__ if it's not a method
  else:
    ret = pathlike
  if not isinstance(ret, output_types):
    raise TypeError("argument is not and does not provide an "
                    "acceptable pathname")
  return ret

With an implementation like the above, (A) would correspond to
output_types = (str, bytes), (B) to the default, and (C) to
output_types = (bytes,).
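
To make the three modes concrete, here is a self-contained sketch that repeats the drafted function and exercises it. StrPath, BytesPath and the accepted() helper are hypothetical names used only for this demonstration:

```python
def fspath(pathlike, *, output_types=(str,)):
    """The drafted helper, repeated here so the example runs on its own."""
    if hasattr(pathlike, "__fspath__"):
        ret = pathlike.__fspath__()
    else:
        ret = pathlike
    if not isinstance(ret, output_types):
        raise TypeError("argument is not and does not provide an "
                        "acceptable pathname")
    return ret


class StrPath:
    """Hypothetical path object providing a str path."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path


class BytesPath(StrPath):
    """Hypothetical path object providing a bytes path."""


def accepted(pathlike, **kwargs):
    """Return True if fspath() accepts the argument in the given mode."""
    try:
        fspath(pathlike, **kwargs)
    except TypeError:
        return False
    return True


text, raw = StrPath("data.txt"), BytesPath(b"data.txt")

# (B) default: str only
print(accepted(text), accepted(raw))
# (A) polymorphic: str and bytes
both = (str, bytes)
print(accepted(text, output_types=both), accepted(raw, output_types=both))
# (C) bytes only
print(accepted(text, output_types=(bytes,)), accepted(raw, output_types=(bytes,)))
```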


So, with the above considerations as a counterargument, I consider
argument (2) gone.

What about argument (1), that the audience for the os.fspath(...)
function (especially for one selected version of the 5 or 10
variations!) is quite small, and we should not encourage manipulating
pathnames by hand, but to use os.path.* or pathlib instead?

The counterargument for (1):

It seems to me we now "all" agree that __fspath__ should allow
str+bytes polymorphism. I could try to list who I mean by "all"
(Ethan, Brett, Stephen T, Nick, ... ?), but obviously I won't be able
to list all or speak for them so I won't even try :-). Anyway, for
this argument, I'm assuming we agree on that. So, __fspath__ can
provide either str or bytes, even if str is *highly preferred* in most
places. Therefore, the os.fspath function, as part of the protocol,
has the important role of *by default* rejecting bytes, so that the
protocol effectively becomes str-only by default. With the fspath
implementation like the one I drafted above, and
os.fsencode+os.fsdecode, we in fact cover all cases (A-E).

So, as a summary: With a str+bytes-polymorphic __fspath__, with the
above argumentation and the rough implementation of os.fspath(...),
the conclusion is that the os.fspath function should indeed be public,
and that no further variations are needed.

-Koos

P.S. There is also the possibility of two dunder methods corresponding
to str and bytes, leading to one being preferred over the other in
some cases etc. I have gone through various aspects and possible
versions of that approach, but concluded it's not worth it, as some of
us may also have implied in earlier posts. After all, we want
something that's *almost* exclusively str.

From ericfahlgren at gmail.com  Sun Apr 17 12:08:19 2016
From: ericfahlgren at gmail.com (Eric Fahlgren)
Date: Sun, 17 Apr 2016 09:08:19 -0700
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CAA1B2Jy-ewfNo2x8NJKe-mrwBF80Xw3r9qg6nT-ePKmauzoT3g@mail.gmail.com>
References: <CAA1B2Jy-ewfNo2x8NJKe-mrwBF80Xw3r9qg6nT-ePKmauzoT3g@mail.gmail.com>
Message-ID: <008901d198c3$5cba8f70$162fae50$@gmail.com>

Just on the off chance that it's related, could it have something to do with the bug in findlabels?

 

http://bugs.python.org/issue26448

 

(I have high confidence that my patch fixes the problem, just haven't gotten around to completing the tests.)

 

From: Demur Rumed [mailto:gunkmute at gmail.com] 
Sent: Saturday, April 16, 2016 17:05
To: python-dev at python.org
Subject: Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

 

 The outstanding bug with this patch right now is a regression in line numbers causing the test for http://bugs.python.org/issue9936 to fail. I've tried to debug it without success


From ethan at stoneleaf.us  Sun Apr 17 14:14:02 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 17 Apr 2016 11:14:02 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
Message-ID: <5713D26A.4000704@stoneleaf.us>

On 04/17/2016 06:58 AM, Koos Zevenhoven wrote:

> So, as a summary: With a str+bytes-polymorphic __fspath__, with the
> above argumentation and the rough implementation of os.fspath(...),
> the conclusion is that the os.fspath function should indeed be public,
> and that no further variations are needed.

Nice summation, thank you.  :)

--
~Ethan~


From k7hoven at gmail.com  Sun Apr 17 17:05:24 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Mon, 18 Apr 2016 00:05:24 +0300
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <5713D26A.4000704@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <5713D26A.4000704@stoneleaf.us>
Message-ID: <CAMiohoiPqbBE9P=hoEPfg7xqGwEQ_wXXdN70Av9gyuZDgAQu_g@mail.gmail.com>

On Sun, Apr 17, 2016 at 9:14 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/17/2016 06:58 AM, Koos Zevenhoven wrote:
>
>> So, as a summary: With a str+bytes-polymorphic __fspath__, with the
>> above argumentation and the rough implementation of os.fspath(...),
>> the conclusion is that the os.fspath function should indeed be public,
>> and that no further variations are needed.
>
>
> Nice summation, thank you.  :)
>

Come on, Ethan, that summary was not for you ;) It was for lazy
people, people with bad memory, or people not so involved in the
topic. I wrote a big post, provided new arguments, with other points
collected into the same logical framework, wrote a new version of
os.fspath and argued why it is the right one --- and all you do is
read the stupid summary. You can do better than that: read the whole
thing! ;-).

-Koos

From rosuav at gmail.com  Sun Apr 17 17:14:19 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 18 Apr 2016 07:14:19 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMiohoiPqbBE9P=hoEPfg7xqGwEQ_wXXdN70Av9gyuZDgAQu_g@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <5713D26A.4000704@stoneleaf.us>
 <CAMiohoiPqbBE9P=hoEPfg7xqGwEQ_wXXdN70Av9gyuZDgAQu_g@mail.gmail.com>
Message-ID: <CAPTjJmqYY=c3kFfWwRe+34xtwoY=Jjgnq16z9E1fgb+ZeWaLAQ@mail.gmail.com>

On Mon, Apr 18, 2016 at 7:05 AM, Koos Zevenhoven <k7hoven at gmail.com> wrote:
> On Sun, Apr 17, 2016 at 9:14 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>> On 04/17/2016 06:58 AM, Koos Zevenhoven wrote:
>>
>>> So, as a summary: With a str+bytes-polymorphic __fspath__, with the
>>> above argumentation and the rough implementation of os.fspath(...),
>>> the conclusion is that the os.fspath function should indeed be public,
>>> and that no further variations are needed.
>>
>>
>> Nice summation, thank you.  :)
>>
>
> Come on, Ethan, that summary was not for you ;) It was for lazy
> people, people with bad memory, or people not so involved in the
> topic. I wrote a big post, provided new arguments, with other points
> collected into the same logical framework, wrote a new version of
> os.fspath and argued why it is the right one --- and all you do is
> read the stupid summary. You can do better than that: read the whole
> thing! ;-).

Yes, but people like me who haven't read every single post appreciate
the vote of support from someone who has. Ethan's post says that this
one-paragraph summary has twice as much weight as it had when only one
person attests it.

So, thank you Koos for summarizing, and thank you Ethan for affirming
the summary.

ChrisA

From ethan at stoneleaf.us  Sun Apr 17 17:52:37 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 17 Apr 2016 14:52:37 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMiohoiPqbBE9P=hoEPfg7xqGwEQ_wXXdN70Av9gyuZDgAQu_g@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>	<1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>	<570E659C.8010108@stoneleaf.us>	<1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>	<CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>	<CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>	<22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>	<CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>	<22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>	<CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>	<22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>	<CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>	<22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>	<CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>	<5713D26A.4000704@stoneleaf.us>
 <CAMiohoiPqbBE9P=hoEPfg7xqGwEQ_wXXdN70Av9gyuZDgAQu_g@mail.gmail.com>
Message-ID: <571405A5.7090406@stoneleaf.us>

On 04/17/2016 02:05 PM, Koos Zevenhoven wrote:
> On Sun, Apr 17, 2016 at 9:14 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>> On 04/17/2016 06:58 AM, Koos Zevenhoven wrote:
>>
>>> So, as a summary: With a str+bytes-polymorphic __fspath__, with the
>>> above argumentation and the rough implementation of os.fspath(...),
>>> the conclusion is that the os.fspath function should indeed be public,
>>> and that no further variations are needed.
>>
>>
>> Nice summation, thank you.  :)
>>
>
> Come on, Ethan, that summary was not for you ;)

Heh.

> You can do better than that: read the whole thing! ;-).

Ah, but I did read the whole thing!  I just didn't want to quote it all 
and then add one line, so I snipped the rest.

Let me try again:

Good, well thought-out post.  Thank you.  :)

if-at-first-you-don't-succeed'ly yrs,

--
~Ethan~

From burkhardameier at gmail.com  Mon Apr 18 01:23:49 2016
From: burkhardameier at gmail.com (Burkhard Meier)
Date: Sun, 17 Apr 2016 22:23:49 -0700
Subject: [Python-Dev] My first post here ~ do you need more Python core
 developers on Windows?
Message-ID: <CACKxkAxVGLOtaOhJVjbCJQ6ezc2QQjAQVwjPV8qaFVRd=wVR7g@mail.gmail.com>

Hi,

I just subscribed to the "Python-Dev" mailing list and the "Welcome" reply
asked me to introduce myself.

My name is Burkhard Meier and I wrote the "Python GUI Programming Cookbook"
published by Packt.

It is available on Amazon and PacktPub.com.

Maybe I can become more involved in the Python community as a Python
developer on Windows.

Kind regards,
Burkhard Meier

From ncoghlan at gmail.com  Mon Apr 18 03:41:16 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 18 Apr 2016 17:41:16 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMiohoiPqbBE9P=hoEPfg7xqGwEQ_wXXdN70Av9gyuZDgAQu_g@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <5713D26A.4000704@stoneleaf.us>
 <CAMiohoiPqbBE9P=hoEPfg7xqGwEQ_wXXdN70Av9gyuZDgAQu_g@mail.gmail.com>
Message-ID: <CADiSq7d_gt+vfPAwtE1Knx77M_BegTghCLzFA6TkqFWE24rSpg@mail.gmail.com>

On 18 April 2016 at 07:05, Koos Zevenhoven <k7hoven at gmail.com> wrote:

> On Sun, Apr 17, 2016 at 9:14 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> > On 04/17/2016 06:58 AM, Koos Zevenhoven wrote:
> >
> >> So, as a summary: With a str+bytes-polymorphic __fspath__, with the
> >> above argumentation and the rough implementation of os.fspath(...),
> >> the conclusion is that the os.fspath function should indeed be public,
> >> and that no further variations are needed.
> >
> >
> > Nice summation, thank you.  :)
> >
>
> Come on, Ethan, that summary was not for you ;)


As Chris noted though, the "Yes, that summary is accurate" from active
participants in the discussion helps assure readers that it's a good
overview :)

Given the variant you suggested, what if we defined the API semantics like
this:

    # Offer the simplest possible API as the public version
    def fspath(pathlike) -> str:
        return os._raw_fspath(pathlike)

    # Expose the complexity in the "private" variant
    def _raw_fspath(pathlike, *, output_types = (str,)) -> (str, bytes):
        # Short-circuit for instances of the output type
        if isinstance(pathlike, output_types):
            return pathlike
        # We'd have a tidier error message here for non-path objects
        result = pathlike.__fspath__()
        if not isinstance(result, output_types):
            raise TypeError("argument is not and does not provide an "
                            "acceptable pathname")
        return result

That way, the default API would be saying unambiguously that the preferred
way of manipulating filesystem paths is as text, but the lower level
"mainly for the standard library" API would explicitly handle the 3
different scenarios (binary-input-is-a-bug, text-input-is-a-bug, and
either-binary-or-text-input-is-fine).

That way the structure of the additional parameters on _raw_fspath can be
tailored specifically to the needs of the standard library, without
worrying as much about 3rd party use cases.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Mon Apr 18 03:44:12 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 18 Apr 2016 17:44:12 +1000
Subject: [Python-Dev] My first post here ~ do you need more Python core
 developers on Windows?
In-Reply-To: <CACKxkAxVGLOtaOhJVjbCJQ6ezc2QQjAQVwjPV8qaFVRd=wVR7g@mail.gmail.com>
References: <CACKxkAxVGLOtaOhJVjbCJQ6ezc2QQjAQVwjPV8qaFVRd=wVR7g@mail.gmail.com>
Message-ID: <CADiSq7dSZBmy8U5B5srQ-rGFx3HZo266pvexF14Ma8MX4u8_kw@mail.gmail.com>

On 18 April 2016 at 15:23, Burkhard Meier <burkhardameier at gmail.com> wrote:

> Maybe I can become more involved in the Python community as a Python
> developer on Windows .
>

Welcome! We definitely still have a marked skew towards Linux and *nix
programmers in general relative to the global software development
population, so participation from additional experienced Windows developers
is always appreciated :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From victor.stinner at gmail.com  Mon Apr 18 04:16:05 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 18 Apr 2016 10:16:05 +0200
Subject: [Python-Dev] My first post here ~ do you need more Python core
 developers on Windows?
In-Reply-To: <CACKxkAxVGLOtaOhJVjbCJQ6ezc2QQjAQVwjPV8qaFVRd=wVR7g@mail.gmail.com>
References: <CACKxkAxVGLOtaOhJVjbCJQ6ezc2QQjAQVwjPV8qaFVRd=wVR7g@mail.gmail.com>
Message-ID: <CAMpsgwbO1rX0wx2hUAfceTNUU+4oL0=OuD0LRBGBMkQrU24M-Q@mail.gmail.com>

2016-04-18 7:23 GMT+02:00 Burkhard Meier <burkhardameier at gmail.com>:
> My name is Burkhard Meier and I wrote the "Python GUI Programming Cookbook"
> published by Packt.
>
> It is available on Amazon and PacktPub.com.

Welcome!

> Maybe I can become more involved in the Python community as a Python
> developer on Windows .

You can use the Developer Guide to start:
https://docs.python.org/devguide/

See also the Python mentors to get help on a dedicated and private
mailing list:
http://pythonmentors.com/

Sadly yes, we have many open issues specific to Windows. I'm trying to
sometimes give time to fix some of them, but I'm less interested than
in open source operating systems ;-)

Victor

From jimjjewett at gmail.com  Mon Apr 18 07:20:47 2016
From: jimjjewett at gmail.com (Jim J. Jewett)
Date: Mon, 18 Apr 2016 07:20:47 -0400
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <CAMpsgwZ9uv+cWHU25GiczbacDeHA4Pw-5NvR6nQ_6YxaXNYrOA@mail.gmail.com>
References: <CAMpsgwbKdUmCsNd2mCpPg5_3LBQG6ZxMiQHa6Pj4PUH2=eFPkQ@mail.gmail.com>
 <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
 <CAMpsgwbYOcZ7W_RqqR1whi3jbiL+ssMn74hrzGJTZkp1SrVbXA@mail.gmail.com>
 <CA+OGgf4B1xtU8vrf5DEho5+W-6YrMgCtvhXBCVofESU5zKhxJQ@mail.gmail.com>
 <CAMpsgwZ9uv+cWHU25GiczbacDeHA4Pw-5NvR6nQ_6YxaXNYrOA@mail.gmail.com>
Message-ID: <CA+OGgf6AzCcTqZjkUM0Mv828g3xN3g-JTCGpv1yvZeRxzggzyA@mail.gmail.com>

On Fri, Apr 15, 2016 at 7:31 PM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> 2016-04-15 23:45 GMT+02:00 Jim J. Jewett <jimjjewett at gmail.com>:
...
>> I just worry that you may end up closing off even better optimizations
>> later, if you make too many promises about exactly how you will do
>> which ones.

>> Today, dict only cares about ==, and you (reasonably) think that full
>> == isn't always worth running ... but when it comes to which tests
>> *are* worth running, I'm not confident that the answers won't change
>> over the years.

> I checked, currently there is no unit test for a==b, only for a is b.
> I will add a test for a==b but a is not b, and ensure that the
> version is increased.

Again, why?  Why not just say "If an object is replaced by something
equal to itself, the version_tag may not be changed.  While the
initial heuristics are simply to check for identity but not full
equality, this may change in future releases."

>> For example, if I know that my dict values are all 4-digit integers,
>> can I write:
>>
>>     d[k]  = d[k] + 0
>>
>> and be assured that the version_tag will bump?  Or is that something
>> that a future optimizer might optimize out?

> Hum, I will try to clarify that.

I would prefer that you clarify it to say that while the initial patch
doesn't optimize that out, a future optimizer might.

> The problem with storing an identifier (a pointer in C) with no strong
> reference is when the object is destroyed, a new object can likely get
> the same identifier. So it's likely that "dict[key] is old_value_id"
> can be true even if dict[key] is now a new object.

Yes, but it shouldn't actually be destroyed until it is removed from
the dict, which should change version_tag, so that there will be no
need to compare it.

> Do you want to modify the PEP 509 to fix this issue? Or you don't
> understand why the PEP 509 cannot be used to fix the issue? I'm
> lost...

I believe it *does* fix the issue in some (but not all) cases.

-jJ

From jimjjewett at gmail.com  Mon Apr 18 07:46:44 2016
From: jimjjewett at gmail.com (Jim J. Jewett)
Date: Mon, 18 Apr 2016 07:46:44 -0400
Subject: [Python-Dev] Updated PEP 509
In-Reply-To: <CAMpsgwZYEop0FQ5eqroeBDvfpSqZGvq_xz5cRnDqv4wGMVQ9VA@mail.gmail.com>
References: <CAMpsgwZYEop0FQ5eqroeBDvfpSqZGvq_xz5cRnDqv4wGMVQ9VA@mail.gmail.com>
Message-ID: <CA+OGgf6BvwcUcUz958VRQZEByDoXS2OzGnY7=mDyUGP+hjtOnw@mail.gmail.com>

On Sat, Apr 16, 2016 at 5:01 PM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> * I mentioned that version++ must be atomic, and that in the case of
> CPython, it's done by the GIL

Better; if those methods *already* hold the GIL, it is worth saying
"already", to indicate that the change is not expensive.

> * I removed the dict[key]=value; dict[key]=value. It's really a
> micro-optimization. I also fear that Raymond will complain because it
> adds an if in the hot code of dict, and the dict type is very
> important for Python performance.

That is an acceptable answer.  Though I really do prefer explicitly
*refusing to promise* either way when the replacement/replaced objects
are ==.

dicts (and other collections) already assume sensible ==, even
explicitly allowing self-matches of objects that are not equal to
themselves.  I don't like the idea of making new promises that violate
(or rely on violations of) that sensible == assumption.

-jJ

From ethan at stoneleaf.us  Mon Apr 18 10:03:28 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 18 Apr 2016 07:03:28 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7d_gt+vfPAwtE1Knx77M_BegTghCLzFA6TkqFWE24rSpg@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>	<570E659C.8010108@stoneleaf.us>	<1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>	<CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>	<CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>	<22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>	<CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>	<22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>	<CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>	<22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>	<CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>	<22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>	<CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>	<5713D26A.4000704@stoneleaf.us>	<CAMiohoiPqbBE9P=hoEPfg7xqGwEQ_wXXdN70Av9gyuZDgAQu_g@mail.gmail.com>
 <CADiSq7d_gt+vfPAwtE1Knx77M_BegTghCLzFA6TkqFWE24rSpg@mail.gmail.com>
Message-ID: <5714E930.50305@stoneleaf.us>

On 04/18/2016 12:41 AM, Nick Coghlan wrote:

> Given the variant you [Koos] suggested, what if we defined the API semantics
> like this:
>
>      # Offer the simplest possible API as the public version
>      def fspath(pathlike) -> str:
>          return os._raw_fspath(pathlike)
>
>      # Expose the complexity in the "private" variant
>      def _raw_fspath(pathlike, *, output_types = (str,)) -> (str, bytes):
>          # Short-circuit for instances of the output type
>          if isinstance(pathlike, output_types):
>              return pathlike
>          # We'd have a tidier error message here for non-path objects
>          result = pathlike.__fspath__()
>          if not isinstance(result, output_types):
>              raise TypeError("argument is not and does not provide an acceptable pathname")
>          return result

My initial reaction was that this was overly complex, but after thinking 
about it a couple days I /really/ like it.  It has a reasonable default 
for the 99% real-world use-case, while still allowing for custom and 
exact tailoring (for the 99% stdlib use-case ;) .

--
~Ethan~


From oscar.j.benjamin at gmail.com  Mon Apr 18 10:38:54 2016
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 18 Apr 2016 15:38:54 +0100
Subject: [Python-Dev] Updated PEP 509
In-Reply-To: <CA+OGgf6BvwcUcUz958VRQZEByDoXS2OzGnY7=mDyUGP+hjtOnw@mail.gmail.com>
References: <CAMpsgwZYEop0FQ5eqroeBDvfpSqZGvq_xz5cRnDqv4wGMVQ9VA@mail.gmail.com>
 <CA+OGgf6BvwcUcUz958VRQZEByDoXS2OzGnY7=mDyUGP+hjtOnw@mail.gmail.com>
Message-ID: <CAHVvXxTAHZ5HH4+W8gfv_voJ1iDwhTPC28mLhJmJtWTaJRt1Ow@mail.gmail.com>

On 18 April 2016 at 12:46, Jim J. Jewett <jimjjewett at gmail.com> wrote:
>>
>> * I removed the dict[key]=value; dict[key]=value. It's really a
>> micro-optimization. I also fear that Raymond will complain because it
>> adds an if in the hot code of dict, and the dict type is very
>> important for Python performance.
>
> That is an acceptable answer.  Though I really do prefer explicitly
> *refusing to promise* either way when the replacement/replaced objects
> are ==.
>
> dicts (and other collections) already assume sensible ==, even
> explicitly allowing self-matches of objects that are not equal to
> themselves.  I don't like the idea of making new promises that violate
> (or rely on violations of) that sensible == assumption.

dicts make assumptions about the behaviour of __eq__ for the *keys*
but not for the *values* (on which no assumptions are made). The only
way to replace a key in a dict with another equal key (having a
well-behaved hash function) is to pop the key out and then insert the
new key so it's not possible to replace a key with another equal key
without bumping the version twice. So presumably you're referring to
the values here right?

The purpose of the PEP is to be able to guard for changes to
namespaces which are implemented as dicts. So if
builtins.__dict__['abs'] is replaced by foo then we don't care what
foo.__eq__ says about the situation: any optimisation that assumed
builtins.abs was not monkeypatched is invalidated. That's why the
version update is needed. Without it the version cannot be relied upon
as an optimisation guard. Consider:

import builtins

class MyAbs:
    def __eq__(self, other):
        return True
    def __call__(self, arg):
        return - arg

builtins.abs = MyAbs()
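
To make the guard argument concrete, here is a rough pure-Python sketch.
`VersionedDict`, `AlwaysEqual` and `guarded_abs` are hypothetical names made
up for the illustration; the PEP's version tag lives in the C dict struct,
not in an attribute like this, and the real guard would be in C as well.

```python
class VersionedDict(dict):
    """Stand-in for a dict with a version that bumps on every store."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # Bump unconditionally: no identity or equality check on the value
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


class AlwaysEqual:
    """Same shape as MyAbs above: claims equality with everything."""

    def __eq__(self, other):
        return True

    def __call__(self, arg):
        return -arg


namespace = VersionedDict(abs=abs)

# The "optimizer" caches the lookup and remembers the version it saw
cached_abs = namespace["abs"]
guard_version = namespace.version


def guarded_abs(x):
    # Fast path is only valid while the namespace is unchanged
    if namespace.version == guard_version:
        return cached_abs(x)
    return namespace["abs"](x)


before = guarded_abs(3)

replacement = AlwaysEqual()
# An equality-based "nothing changed" check would be fooled here
looks_equal = (replacement == cached_abs)
namespace["abs"] = replacement

after = guarded_abs(3)
```

If the store were skipped whenever the new value compared equal to the old
one, `guarded_abs` would keep calling the cached builtin after the
monkeypatch.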

--
Oscar

From cr0hn at cr0hn.com  Mon Apr 18 06:05:28 2016
From: cr0hn at cr0hn.com (cr0hn)
Date: Mon, 18 Apr 2016 12:05:28 +0200
Subject: [Python-Dev] [Question][Asyncio] Process + Threads + asyncio... has
 sense?
Message-ID: <CAO5w5vZxhAyCaiL+xMGRxGnUCEHdf8gk-fmPX=jNroQ0AYSPwQ@mail.gmail.com>

Hi all,

It's the first time I write in this list. Sorry if it's not the best place
for this question.

After I read the Asyncio's documentation, PEPs, Guido/Jesse/David Beazley
articles/talks, etc, I developed a PoC library that mixes: Process +
Threads + Asyncio Tasks, following a scheme like this diagram:

 main -> Process 1 -> Thread 1.1 -> Task 1.1.1
                                 -> Task 1.1.2
                                 -> Task 1.1.3

                   -> Thread 1.2 -> Task 1.2.1
                                 -> Task 1.2.2
                                 -> Task 1.2.3

         Process 2 -> Thread 2.1 -> Task 2.1.1
                                 -> Task 2.1.2
                                 -> Task 2.1.3

                   -> Thread 2.2 -> Task 2.2.1
                                 -> Task 2.2.2
                                 -> Task 2.2.3

In my local tests, this approach appears to improve (and simplify) the
concurrency/parallelism for some tasks but, before releasing the library on
GitHub, I would like to know whether my approach is wrong, and I would
appreciate your opinion.

Thank you very much for your time.

Regards!

-- 
Daniel García a.k.a. cr0hn - Security researcher and pentester
@ggdaniel
http://www.cr0hn.com/me/

From guido at python.org  Mon Apr 18 12:40:14 2016
From: guido at python.org (Guido van Rossum)
Date: Mon, 18 Apr 2016 09:40:14 -0700
Subject: [Python-Dev] [Question][Asyncio] Process + Threads + asyncio...
 has sense?
In-Reply-To: <CAO5w5vZxhAyCaiL+xMGRxGnUCEHdf8gk-fmPX=jNroQ0AYSPwQ@mail.gmail.com>
References: <CAO5w5vZxhAyCaiL+xMGRxGnUCEHdf8gk-fmPX=jNroQ0AYSPwQ@mail.gmail.com>
Message-ID: <CAP7+vJLNtSrH0oqkSY=jRGTdSjPLRUWbOCKfgxLsF4_JzV=Eaw@mail.gmail.com>

A better place for this question would be the tulip Google group:
https://groups.google.com/forum/#!forum/python-tulip

On Mon, Apr 18, 2016 at 3:05 AM, cr0hn <cr0hn at cr0hn.com> wrote:

> Hi all,
>
> It's the first time I write in this list. Sorry if it's not the best place
> for this question.
>
> After I read the Asyncio's documentation, PEPs, Guido/Jesse/David Beazley
> articles/talks, etc, I developed a PoC library that mixes: Process +
> Threads + Asyncio Tasks, following a scheme like this diagram:
>
>  main -> Process 1 -> Thread 1.1 -> Task 1.1.1
>                                  -> Task 1.1.2
>                                  -> Task 1.1.3
>
>                    -> Thread 1.2 -> Task 1.2.1
>                                  -> Task 1.2.2
>                                  -> Task 1.2.3
>
>          Process 2 -> Thread 2.1 -> Task 2.1.1
>                                  -> Task 2.1.2
>                                  -> Task 2.1.3
>
>                    -> Thread 2.2 -> Task 2.2.1
>                                  -> Task 2.2.2
>                                  -> Task 2.2.3
>
> In my local tests, this approach appears to improve (and simplify) the
> concurrency/parallelism for some tasks but, before releasing the library on
> GitHub, I would like to know whether my approach is wrong, and I would
> appreciate your opinion.
>
> Thank you very much for your time.
>
> Regards!
>
> --
> Daniel García a.k.a. cr0hn - Security researcher and pentester
> @ggdaniel
> http://www.cr0hn.com/me/
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>
>


-- 
--Guido van Rossum (python.org/~guido)

From brett at python.org  Mon Apr 18 13:13:51 2016
From: brett at python.org (Brett Cannon)
Date: Mon, 18 Apr 2016 17:13:51 +0000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
Message-ID: <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>

On Sun, 17 Apr 2016 at 06:59 Koos Zevenhoven <k7hoven at gmail.com> wrote:

> On Sun, Apr 17, 2016 at 11:03 AM, Stephen J. Turnbull
> <stephen at xemacs.org> wrote:
> > Nick Coghlan writes:
> >
> >  > str and bytes aren't going to implement __fspath__ (since they're
> >  > only *sometimes* path objects), so asking people to call the
> >  > protocol method directly for any purpose would be a pain.
> >
> > It *should* be a pain.  People who need bytes should call fsencode,
> > people who need str should call fsdecode, and Ethan's antipathy checks
> > for bytes and str, then calls __fspath__ if needed.  Who's left?  Just
> > the bartender and the janitor, last call was hours ago.  OK, maybe
> > there are enough clients to make it worthwhile to provide the utility,
> > but it should be clearly marked as "double opt-in, for experts only
> > (consenting adults must show proof of insurance)".
>
> My doubts, expressed several times in these threads, about the need
> for a *public* os.fspath function to complement the __fspath__
> protocol, are now perhaps gone. I'll explain why (and how). The
> reasons for my doubts were that
>
> (1) The audience outside the stdlib for such a function should be
> small, because it is preferred to either use existing tools in
> os.path.* or pathlib (or similar) for manipulating paths.
>
> (2) There are just too many different possible versions of this
> function: rejecting str, rejecting bytes, coercion to str, coercion to
> bytes, and accepting both str and bytes. That's a total of 5 different
> cases. People also used to talk about versions that would not allow
> passing through objects that are already bytes or str. That would make
> it a total of 10 different versions!
> (in principle, there could be even more, but let's not go there :-).
> In other words, this argument was that it is probably best to
> implement whatever flavor is needed for the context, perhaps based on
> documented recipes.
>
>
> Regarding (2), we can first rule out half of the 10 cases---the ones
> that reject plain instances of bytes and/or str---because they would
> not be very useful as all the isinstance/hasattr checking etc. would
> be left to the caller. And here are the remaining five, explained
> based on what they accept as argument, what they return, and where
> they would be used:
>
> (A) "polymorphic"
> *Accept*: str and bytes, provided via __fspath__ as well as plain str
> and bytes instances.
> *Return*: str/bytes depending on input.
> *Audience*: the stdlib, including os.path.things, os.things,
> shutil.things, open, ... (some functions would need a C version).
> There may even be a small audience outside the stdlib.
>
> (B) "str-based only"
> *Accept*: str, provided via __fspath__ as well as plain str.
> *Return*: str.
> *Audience*: relatively low-level code that works exclusively with str
> paths but accepts specialized path objects as input.
>
> (C) "bytes-based only"
> *Accept*: bytes, provided via __fspath__ as well as plain bytes.
> *Return*: bytes.
> *Audience*: low-level code that explicitly deals with paths as bytes
> (probably to deal with undefined/ill-defined encodings).
>
> (D) "coerce to str"
> *Accept*: str and bytes, provided via __fspath__ as well as plain str
> and bytes instances.
> *Return*: str (coerced / decoded if needed).
> *Audience*: code that deals explicitly with str but wants to 'try'
> supporting bytes-based path inputs too via implicit decoding (even if
> it may result in surrogate escapes, which one cannot for instance
> print(...).)
>
> (E) "coerce to bytes"
> *Accept*: str and bytes, provided via __fspath__ as well as plain str
> and bytes instances.
> *Return*: bytes (coerced / encoded if needed).
> *Audience*: low-level code that explicitly deals with bytes paths but
> wants to accept str-based path inputs too via implicit encoding.
>
>
> Even if all options (A-E) probably have small audiences (compared to
> e.g. os.path.*), some of them have larger audiences than others. But
> all of them have at least *some* reasonable audience (as described
> above).
>
> Recently (well, a few days ago, but 'recently', considering the scale
> of these discussions anyway ;-), Nick pointed out something I hadn't
> realized---os.fsencode and os.fsdecode actually already implement
> coercion to bytes and str, respectively. With those two functions made
> compatible with the __fspath__ protocol [using (A) above], they would
> in fact *be* (D) and (E), respectively.
>
> Now, we only have options (A-C) left. They could all be implemented
> roughly as follows:
>
> def fspath(pathlike, *, output_types = (str,)):
>   if hasattr(pathlike, '__fspath__'):
>     ret = pathlike.__fspath__()  # or pathlike.__fspath__ if it's not a method
>   else:
>     ret = pathlike
>   if not isinstance(ret, output_types):
>     raise TypeError("argument is not and does not provide an acceptable pathname")
>   return ret
>
> With an implementation like the above, (A) would correspond to
> output_types = (str, bytes), (B) to the default, and (C) to
> output_types = (bytes,).
>
>
> So, with the above considerations as a counterargument, I consider
> argument (2) gone.
>
> What about argument (1), that the audience for the os.fspath(...)
> function (especially for one selected version of the 5 or 10
> variations!) is quite small, and we should not encourage manipulating
> pathnames by hand, but to use os.path.* or pathlib instead?
>
> The counterargument for (1):
>
> It seems to me we now "all" agree that __fspath__ should allow
> str+bytes polymorphism. I could try to list who I mean by "all"
> (Ethan, Brett, Stephen T, Nick, ... ?), but obviously I won't be able
> to list all or speak for them so I won't even try :-). Anyway, for
> this argument, I'm assuming we agree on that. So, __fspath__ can
> provide either str or bytes, even if str is *highly preferred* in most
> places. Therefore, the os.fspath function, as part of the protocol,
> has the important role of *by default* rejecting bytes, so that the
> protocol effectively becomes str-only by default. With the fspath
> implementation like the one I drafted above, and
> os.fsencode+os.fsdecode, we in fact cover all cases (A-E).
>
> So, as a summary: With a str+bytes-polymorphic __fspath__, with the
> above argumentation and the rough implementation of os.fspath(...),
> the conclusion is that the os.fspath function should indeed be public,
> and that no further variations are needed.
>
> -Koos
>
> P.S. There is also the possibility of two dunder methods corresponding
> to str and bytes, leading to one being preferred over the other in
> some cases etc. I have gone through various aspects and possible
> versions of that approach, but concluded it's not worth it, as some of
> us may also have implied in earlier posts. After all, we want
> something that's *almost* exclusively str.
>

Just to add to the chorus of praise, thanks for the summary, Koos!

I just wanted to add a rephrasing to your overall conclusion that I reached
independently Friday night but couldn't post earlier as I promised my wife
I wouldn't write or say the "P" word all weekend which meant I didn't read
or respond to any python-dev email all weekend (if you think that's cruel
and unusual punishment, her Twitter is https://twitter.com/AndreaMcInnes21 ;).

If we continue with the "str is an encoding of file paths", you can then
build from "bytes is an encoding of str" to get a pyramid of file path
encodings: Path -> str -> bytes. I don't think this is in any way a
controversial view.

Now Stephen has been promoting the idea of enhancing os.fsencode() and
os.fsdecode() to understand what __fspath__ is (I'm ignoring the str/bytes
return points for now). With os.fsencode() this would mean giving it
anything in the Path -> str -> bytes pyramid would lead to following the
steps to reach bytes at the bottom of the encoding pyramid. That's fine and
easy to explain: whatever you pass into os.fsencode() you know it will get
encoded to bytes using the file system encoding and surrogate escape.

The trick becomes os.fsdecode() and its str return value. Looking at our
encoding pyramid of Path -> str -> bytes we notice that the return value
for os.fsdecode() is actually now in the *middle* of our encoding pyramid.
What that means is that while passing in bytes and decoding them to str
makes sense, passing in a Path object and getting back str is actually an
*encoding*! My brain wanting semantic purity for the "decode" part of
os.fsdecode() started to hurt.

But that's when I realized that by adding __fspath__ support to os.fsdecode()
and os.fsencode(), they become more coercion functions rather than
encoding/decoding functions. It also means that os.fspath() has a place
when you want to say "I only want to encode a file path to str" and avoid
the decode bit that os.fsdecode() would do (IOW it's like a half step of
os.fsencode() for full control). You probably also want control of getting
just bytes and skipping os.fsencode() and its automatic encoding call so
that you don't accidentally get mojibake or something.
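
A small sketch of the pyramid, assuming a Python where os.fspath() exists
and os.fsencode()/os.fsdecode() understand __fspath__ (3.6 or later, i.e.
not what was available when this was written). The path is pure ASCII so
the result does not depend on the file system encoding.

```python
import os
from pathlib import PurePosixPath

path = PurePosixPath("spam/eggs.txt")

# Half step: Path -> str, with no further coercion
as_str = os.fspath(path)

# Full walk down the pyramid: Path -> str -> bytes
as_bytes = os.fsencode(path)

# Coercion to the middle of the pyramid: bytes -> str, Path -> str
from_bytes = os.fsdecode(as_bytes)
from_path = os.fsdecode(path)

# Plain str and bytes pass through os.fspath() untouched
passthrough = (os.fspath("spam"), os.fspath(b"spam"))
```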

Now going back to what __fspath__ returns, this starts to promote that it
returns the highest level in the Path -> str -> bytes pyramid that isn't
the top. We then provide whatever support we need to allow to go straight
to the encoding someone might want through the os module. Koos outlined all
of this above so I'm not going to rehash it all here, but the point will be
the protocol will be more low-level than we expect people to work with and
we will promote the use of the proper helper functions in the os module to
get the results people desire (although I still feel a little bad for
people writing libraries that will be manipulating paths prior to Python
3.6 who don't get this helper code, but my assumption is that they will get
TypeError from using whatever __fspath__() returns and e.g. os.path.join()
w/ a different type, otherwise they are just passing paths down to the
stdlib and so shouldn't inhibit usage of specific path encodings).

From stephen at xemacs.org  Mon Apr 18 15:25:16 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 19 Apr 2016 04:25:16 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
Message-ID: <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp>

I don't disagree with the basic analysis, but there are a number of
issues with motivational statements.

Koos Zevenhoven writes:

 > (B) "str-based only"
 > *Accept*: str, provided via __fspath__ as well as plain str.
 > *Return*: str.
 > *Audience*: relatively low-level code that works exclusively with str
 > paths but accepts specialized path objects as input.

Why "low-level"?  All code that stores paths persistently is likely to
store them in text files or database strings or the like, rather than
as Path (read: specialized path objects, not necessarily
pathlib.Path).  But if there is any low-level manipulation of the
paths to be done before storing, it would be done as Path.  Thus
high-level code might also want to accept Path transparently.

 > (C) "bytes-based only"
 > *Accept*: bytes, provided via __fspath__ as well as plain bytes.
 > *Return*: bytes.
 > *Audience*: low-level code that explicitly deals with paths as bytes
 > (probably to deal with undefined/ill-defined encodings).

No, if it's to deal with encoding issues, we wouldn't accept this.
PEP 383 eliminates that concern.  We accept bytes to support people
who are representing paths with bytes because they think that it's a
good idea and that encoding doesn't matter in their application.

 > (D) "coerce to str"
 > *Accept*: str and bytes, provided via __fspath__ as well as plain str
 > and bytes instances.
 > *Return*: str (coerced / decoded if needed).
 > *Audience*: code that deals explicitly with str but wants to 'try'
 > supporting bytes-based path inputs too via implicit decoding (even if
 > it may result in surrogate escapes, which one cannot for instance
 > print(...).)

No.  As Nick points out with respect to fsencode/fsdecode, it's not
a question of supporting known bytes via implicit decoding (that's
what __fspath__ does for the types that support it), but rather
of supporting ambiguity.  Best practice is to convert explicitly at
the boundary, because it's too likely that data with unexpected type
is just the wrong data.  

Printing surrogates can be done with errors=backslashreplace, and if
you're using fsdecode, you probably should use that, namereplace, or
xmlcharrefreplace.
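
A minimal sketch of the printing point, for illustration only.  The
file name is made-up sample data, and the decode is written out
explicitly (UTF-8 with surrogateescape, which is what fsdecode
amounts to on a typical UTF-8 POSIX system) so that the result does
not depend on the platform:

```python
# Hypothetical sample data: a file name containing a byte that is not
# valid UTF-8.
raw_name = b"caf\xe9.txt"

# PEP 383 style decoding: the undecodable byte becomes a lone surrogate.
name = raw_name.decode("utf-8", errors="surrogateescape")

# A strict encode for output would raise UnicodeEncodeError on the
# surrogate; backslashreplace turns it into a visible escape instead.
printable = name.encode("utf-8", errors="backslashreplace").decode("utf-8")
print(printable)

# The original bytes are still recoverable from the decoded name.
round_trip = name.encode("utf-8", errors="surrogateescape")
```

The escaped form is for display only; the surrogate-carrying str is
the value to keep passing to filesystem APIs.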

 > (E) "coerce to bytes"
 > *Accept*: str and bytes, provided via __fspath__ as well as plain str
 > and bytes instances.
 > *Return*: bytes (coerced / encoded if needed).
 > *Audience*: low-level code that explicitly deals with bytes paths but
 > wants to accept str-based path inputs too via implicit encoding.

Again, it's a question of ambiguity, or perhaps sloppy programming
(eg, using str literals for paths in a bytes-oriented program).

Use cases D and E are basically "guessing when faced with ambiguity",
and fsencode and fsdecode are code smells because (as Nick claims)
they almost always conceal a situation where you don't know whether
you've got bytes or str (and it's way too much work to find out by
tracing them back to where they came from).

 > It seems to me we now "all" agree that __fspath__ should allow
 > str+bytes polymorphism.

I don't agree that we *should* allow polymorphism, because (purity)
paths are in the text domain[1] and (practicality) I don't believe that
use of os.fspath will be restricted to "low-level boundary code".  I
would be perfectly happy telling bytes users that the idiom is not
"os.fspath(maybe_direntry, allow_types=(bytes,))", but rather
"os.fsencode(os.fspath(maybe_direntry))", so that code in the text
domain can safely use os.fspath(maybe_direntry) without worrying that
it will raise because maybe_direntry.__fspath__() returns bytes.
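
To make that idiom concrete, here is a minimal sketch.  It assumes an
os.fspath along the lines discussed in this thread (the function that
later shipped in Python 3.6 is used), and FakeEntry is a hypothetical
stand-in for a DirEntry-like object whose __fspath__ returns str:

```python
import os


class FakeEntry:
    """Hypothetical path object; its __fspath__ always returns str."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path


maybe_direntry = FakeEntry("data/report.txt")

# Code in the text domain stops here and never sees bytes.
text_path = os.fspath(maybe_direntry)

# Bytes users convert explicitly, at the boundary.
bytes_path = os.fsencode(os.fspath(maybe_direntry))
```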

This would allow pathlib.Path to handle arguments providing __fspath__
transparently.  With the current proposal, it would need to rule out
bytes before invoking os.fspath, or handle the exception, or leave the
exception to its caller.  None of these options are pleasant.

Unfortunately, as Nick points out, defining __fspath__ to return str
is very unpleasant because bytes applications will now have to guard
*everything* that might provide __fspath__ with that incantation
before passing to open and other APIs that store the path on the
object returned.  So we don't really have a choice about polymorphism
if we want to support both __fspath__ and bytes paths.

 > After all, we want something that's *almost* exclusively str.

But we don't want that, AFAICT.  Some clearly want this API to be
unbiased against bytes in the same way the os APIs are unbiased[2],
because that's what we've got in the current proposal.  Further, due
to the existing ambiguity in fsencode and fsdecode, we're extending
the field of ambiguity where bytes and str can mix indiscriminately.

If we are serious about "*almost* exclusively str" we should accept
that "exclusively str" is a very good approximation and much easier to
use correctly, and regretfully postpone inclusion of DirEntry in this
protocol to the future.  But that's not on the table, is it?


Footnotes: 
[1]  Representation on disk as (basically unconstrained) byte
sequences is an historical accident.

[2]  That doesn't mean the bytes variants will be used as often as the
str variants, just that the bytes variants are as easy to use. 


From stephen at xemacs.org  Mon Apr 18 15:26:56 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 19 Apr 2016 04:26:56 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
Message-ID: <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>

Brett Cannon writes:

 > If we continue with the "str is an encoding of file paths",

It's not.  It's a representation, but not an encoding.  In Python 3,
encoding means a representation of a character string using bytes.
It's using "encoding" generically for "representation" that makes your
head hurt.

 > you can then build from "bytes is an encoding of str" to get a
 > pyramid of file path encodings: Path -> str -> bytes. I don't think
 > this is in any way a controversial view.

Perhaps not.  But it's not particularly useful. ;-)  Here's the
pyramid I think about:

                                 Path
                                /    \
                               /      \
                              V        V
                            str <-> bytes

That is, str and bytes are interchangeable *without* any knowledge of
paths, which are on a higher level of complexity and abstraction.
Although in pathlib, there's an assumption that paths are serialized
to str which is (implicitly) serialized to bytes when talking to the
OS, this is not necessarily true for other structured path classes, in
particular it is not true for DirEntry (which is an "enhanced
degenerate" path containing only one path segment but also other
useful information about the filesystem object addressed).

I haven't looked at Antipathy, but I would guess from Ethan's
promotion of bytes paths and concern with efficiency that "bytes
antipaths" do *not* "go through" str to get to bytes, they already are
bytes (in the sense of class inheritance).

 > But that's when I realized that adding __fspath__ support to os.fsdecode()
 > and os.fsencode(), they become more coercion functions rather than
 > encoding/decoding functions. It also means that os.fspath() has a place
 > when you want to say "I only want to encode a file path to str" and avoid
 > the decode bit that os.fsdecode() would do

I don't understand what you're trying to say here.  fsdecode currently
does not promise to decode anything, because it's polymorphic,
accepting str and bytes.  fsdecode and fsencode already *are* coercion
functions.
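
A short illustration of that polymorphism, using only the existing
os.fsdecode and os.fsencode (the file name is arbitrary sample data):

```python
import os

# fsdecode: str passes through unchanged, bytes are decoded.
from_str = os.fsdecode("spam.txt")
from_bytes = os.fsdecode(b"spam.txt")

# fsencode: bytes pass through unchanged, str is encoded.
to_bytes_a = os.fsencode("spam.txt")
to_bytes_b = os.fsencode(b"spam.txt")

# Either way the caller cannot tell from the result what came in.
same_text = from_str == from_bytes
same_bytes = to_bytes_a == to_bytes_b
```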

It's this kind of semantic confusion and broken nomenclature that is
*why* I dislike these polymorphic functions and objects so much.  It
is impossible to reason correctly about them.  We're stuck with
invoking "practicality" and muddling through.  And the names mislead
even experienced Pythonistas.

Steve


From random832 at fastmail.com  Mon Apr 18 15:42:59 2016
From: random832 at fastmail.com (Random832)
Date: Mon, 18 Apr 2016 15:42:59 -0400
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
Message-ID: <1461008579.2246251.582474273.555E64CC@webmail.messagingengine.com>



On Mon, Apr 18, 2016, at 15:26, Stephen J. Turnbull wrote:
> in
> particular it is not true for DirEntry (which is an "enhanced
> degenerate" path containing only one path segment but also other
> useful information about the filesystem object addressed)

DirEntry contains multiple path segments - it has the name, and the
directory path that was passed into scandir.
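
A small sketch showing both attributes, assuming Python 3.6 or later
(where the scandir iterator can be used as a context manager).  The
temporary directory and the file name are just sample data:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Create one file so that scandir has something to report.
    open(os.path.join(top, "spam.txt"), "w").close()

    with os.scandir(top) as entries:
        entry = next(entries)

    name = entry.name   # final path segment only
    path = entry.path   # the scandir argument joined with the name
    expected_path = os.path.join(top, "spam.txt")
```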

From ethan at stoneleaf.us  Mon Apr 18 15:50:56 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 18 Apr 2016 12:50:56 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp>
Message-ID: <57153AA0.5090103@stoneleaf.us>

On 04/18/2016 12:25 PM, Stephen J. Turnbull wrote:
> Koos Zevenhoven writes:

>> After all, we want something that's *almost* exclusively str.
>
> But we don't want that, AFAICT.  Some clearly want this API to be
> unbiased against bytes in the same way the os APIs are unbiased[2],
> because that's what we've got in the current proposal.

Are we reading the same thread?  For my last several replies I am very 
biased against bytes (and I know I'm not the only one).

Just not so biased that I'm unwilling to let clients say, "No, I'm 
really okay with getting bytes back".

I really like Koos' ideas because they allow the client to say:

- I only want str
- I only want bytes
- I'm okay with either

If the client says "I'm okay with either" then I fully expect the client 
to have code to properly handle str vs bytes after the fspath (or 
whatever it's called) call.
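
A rough sketch of those three choices, for illustration only.  The
function below is a hypothetical stand-in written for this example,
not the stdlib implementation, and the output_types parameter name is
taken from the variant discussed earlier in the thread:

```python
def fspath(pathlike, *, output_types=(str,)):
    """Sketch of the proposed protocol; not the stdlib implementation."""
    # Plain str/bytes of an accepted type pass straight through.
    if isinstance(pathlike, output_types):
        return pathlike
    # Look the method up on the type, as is usual for special methods.
    method = getattr(type(pathlike), "__fspath__", None)
    if method is None:
        raise TypeError("expected a path-like object, got "
                        + type(pathlike).__name__)
    result = method(pathlike)
    if not isinstance(result, output_types):
        raise TypeError("__fspath__ returned an unacceptable type: "
                        + type(result).__name__)
    return result


class BytesEntry:
    """Hypothetical bytes-based path object."""

    def __fspath__(self):
        return b"spam.txt"


only_str = fspath("spam.txt")                              # only str
only_bytes = fspath(BytesEntry(), output_types=(bytes,))   # only bytes
either = fspath(BytesEntry(), output_types=(str, bytes))   # either

# A client that accepts either has to handle both cases itself.
if isinstance(either, bytes):
    either_text = either.decode("ascii")
else:
    either_text = either
```

With the default output_types, fspath(BytesEntry()) raises TypeError,
which is the str-only behaviour most callers would get.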

--
~Ethan~

From wes.turner at gmail.com  Mon Apr 18 15:54:49 2016
From: wes.turner at gmail.com (Wes Turner)
Date: Mon, 18 Apr 2016 14:54:49 -0500
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <57153AA0.5090103@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp>
 <57153AA0.5090103@stoneleaf.us>
Message-ID: <CACfEFw9VsfgH41ZEi8gZGxmN19QXa2mhfUv_avrR40DDX1tX3w@mail.gmail.com>

On Apr 18, 2016 2:50 PM, "Ethan Furman" <ethan at stoneleaf.us> wrote:
>
> On 04/18/2016 12:25 PM, Stephen J. Turnbull wrote:
>
>> Koos Zevenhoven writes:
>
>
>>> After all, we want something that's *almost* exclusively str.
>>
>>
>> But we don't want that, AFAICT.  Some clearly want this API to be
>> unbiased against bytes in the same way the os APIs are unbiased[2],
>> because that's what we've got in the current proposal.
>
>
> Are we reading the same thread?  For my last several replies I am very
> biased against bytes (and I know I'm not the only one).
>
> Just not so biased that I'm unwilling to let clients say, "No, I'm really
> okay with getting bytes back".
>
> I really like Koos' ideas because they allow the client to say:
>
> - I only want str
> - I only want bytes
> - I'm okay with either
>
> If the client says "I'm okay with either" then I fully expect the client
> to have code to properly handle str vs bytes after the fspath (or whatever
> it's called) call.

Don't we *have* to always support bytes because other programs can create
filenames containing bytes?

>
> --
> ~Ethan~
>

From ethan at stoneleaf.us  Mon Apr 18 16:19:22 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 18 Apr 2016 13:19:22 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CACfEFw9VsfgH41ZEi8gZGxmN19QXa2mhfUv_avrR40DDX1tX3w@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>	<1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>	<570E659C.8010108@stoneleaf.us>	<1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>	<CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>	<CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>	<22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>	<CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>	<22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>	<CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>	<22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>	<CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>	<22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>	<CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>	<22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp>	<57153AA0.5090103@stoneleaf.us>
 <CACfEFw9VsfgH41ZEi8gZGxmN19QXa2mhfUv_avrR40DDX1tX3w@mail.gmail.com>
Message-ID: <5715414A.7000901@stoneleaf.us>

On 04/18/2016 12:54 PM, Wes Turner wrote:

> Don't we *have* to always support bytes because other programs can
> create filenames containing bytes?

Yes, but not every function has to support bytes.

--
~Ethan~


From cr0hn at cr0hn.com  Mon Apr 18 13:13:58 2016
From: cr0hn at cr0hn.com (cr0hn)
Date: Mon, 18 Apr 2016 13:13:58 -0400
Subject: [Python-Dev] [Question][Asyncio] Process + Threads + asyncio...
 has sense?
In-Reply-To: <CAP7+vJLNtSrH0oqkSY=jRGTdSjPLRUWbOCKfgxLsF4_JzV=Eaw@mail.gmail.com>
References: <CAP7+vJLNtSrH0oqkSY=jRGTdSjPLRUWbOCKfgxLsF4_JzV=Eaw@mail.gmail.com>
Message-ID: <CAO5w5vZ2TaeFqyZNhpkwCvCwFM7AbvN37bMe9NbVB26q8G9EDg@mail.gmail.com>

OK. Thank you very much.


---
*Daniel García (cr0hn)*
Security researcher and ethical hacker

*Personal site*: http://cr0hn.com
*Linkedin*: https://www.linkedin.com/in/garciagarciadaniel
*Company*: http://abirtone.com
*Twitter*: @ggdaniel <https://twitter.com/ggdaniel>

On 18 April 2016 at 18:40:14, Guido van Rossum (
guido at python.org) wrote:

> A better place for this question would be the tulip Google group:
> https://groups.google.com/forum/#!forum/python-tulip
>
> On Mon, Apr 18, 2016 at 3:05 AM, cr0hn <cr0hn at cr0hn.com> wrote:
>
>> Hi all,
>>
>> This is the first time I have written to this list. Sorry if it's not the
>> best place for this question.
>>
>> After reading the asyncio documentation, PEPs, Guido/Jesse/David Beazley
>> articles/talks, etc., I developed a PoC library that mixes Processes +
>> Threads + Asyncio Tasks, following a scheme like this diagram:
>>
>>  main -> Process 1 -> Thread 1.1 -> Task 1.1.1
>>                                  -> Task 1.1.2
>>                                  -> Task 1.1.3
>>
>>                    -> Thread 1.2 -> Task 1.2.1
>>                                  -> Task 1.2.2
>>                                  -> Task 1.2.3
>>
>>          Process 2 -> Thread 2.1 -> Task 2.1.1
>>                                  -> Task 2.1.2
>>                                  -> Task 2.1.3
>>
>>                    -> Thread 2.2 -> Task 2.2.1
>>                                  -> Task 2.2.2
>>                                  -> Task 2.2.3
>>
>> In my local tests, this approach appears to improve (and simplify) the
>> concurrency/parallelism for some tasks but, before releasing the library
>> on GitHub, I don't know if my approach is wrong and I would appreciate
>> your opinion.
>>
>> Thank you very much for your time.
>>
>> Regards!
>>
>> --
>> Daniel García a.k.a. cr0hn - Security researcher and pentester
>> @ggdaniel
>> http://www.cr0hn.com/me/
>>
>>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>

From rosuav at gmail.com  Mon Apr 18 16:27:05 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 19 Apr 2016 06:27:05 +1000
Subject: [Python-Dev] [Python-ideas] pep 7 line break suggestion differs
 from pep 8
In-Reply-To: <CAHGq92XnVkP77fX9ibEZuBfxa_25Y7neKSgO=vuCv4OfkJ25aQ@mail.gmail.com>
References: <CAHGq92XnVkP77fX9ibEZuBfxa_25Y7neKSgO=vuCv4OfkJ25aQ@mail.gmail.com>
Message-ID: <CAPTjJmrL_gsvWKfBoB2AE1+vCzV8Rhg5LYJ9N6V6cqYZvtUSeg@mail.gmail.com>

On Tue, Apr 19, 2016 at 5:33 AM, Joseph Jevnik <joejev at gmail.com> wrote:
> I saw that there was recently a change to pep 8 to suggest adding a line
> break before a binary operator. Pep 7 suggests the opposite:
>
>> When you break a long expression at a binary operator, the operator goes
>> at the end of the previous line, e.g.:
>
>> if (type->tp_dictoffset != 0 && base->tp_dictoffset == 0 &&
>>     type->tp_dictoffset == b_size &&
>>     (size_t)t_size == b_size + sizeof(PyObject *))
>>     return 0; /* "Forgive" adding a __dict__ only */
>
> I imagine that some of the readability reasons for making the change in
> pep 8 will also translate to C; maybe pep 7 should also be updated.

I would agree with this. Passing it directly to python-dev as that's
where the key decision makers are.

ChrisA

From brett at python.org  Mon Apr 18 17:40:37 2016
From: brett at python.org (Brett Cannon)
Date: Mon, 18 Apr 2016 21:40:37 +0000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CAP1=2W49ARd5WBqGuk7EccahML6Ze4cvqhhh+Q42Z7aDTmKbOA@mail.gmail.com>

On Mon, 18 Apr 2016 at 12:26 Stephen J. Turnbull <stephen at xemacs.org> wrote:

> Brett Cannon writes:
>
>  > If we continue with the "str is an encoding of file paths",
>
> It's not.  It's a representation, but not an encoding.  In Python 3,
> encoding means a representation of a character string using bytes.
> It's using "encoding" generically for "representation" that makes your
> head hurt.
>

Well, it makes *your* head hurt; for me it helped clarify some things. :)


>
>  > you can then build from "bytes is an encoding of str" to get a
>  > pyramid of file path encodings: Path -> str -> bytes. I don't think
>  > this is in any way a controversial view.
>
> Perhaps not.  But it's not particularly useful. ;-)  Here's the
> pyramid I think about:
>
>                                  Path
>                                 /    \
>                                /      \
>                               V        V
>                             str <-> bytes
>
> That is, str and bytes are interchangeable *without* any knowledge of
> paths, which are on a higher level of complexity and abstraction.
> Although in pathlib, there's an assumption that paths are serialized
> to str which is (implicitly) serialized to bytes when talking to the
> OS, this is not necessarily true for other structured path classes, in
> particular it is not true for DirEntry (which is an "enhanced
> degenerate" path containing only one path segment but also other
> useful information about the filesystem object addressed)
>
> I haven't looked at Antipathy, but I would guess from Ethan's
> promotion of bytes paths and concern with efficiency that "bytes
> antipaths" do *not* "go through" str to get to bytes, they already are
> bytes (in the sense of class inheritance).
>
>  > But that's when I realized that adding __fspath__ support to
>  > os.fsdecode() and os.fsencode(), they become more coercion functions
>  > rather than encoding/decoding functions. It also means that os.fspath()
>  > has a place when you want to say "I only want to encode a file path to
>  > str" and avoid the decode bit that os.fsdecode() would do
>
> I don't understand what you're trying to say here.  fsdecode currently
> does not promise to decode anything, because it's polymorphic,
> accepting str and bytes.  fsdecode and fsencode already *are* coercion
> functions.
>

And they will continue to be coercion functions. My point is that since
they coerce there is no way to use them in a way to dictate that you don't
want any str/bytes encoding/decoding to occur without checking the
arguments going into the function (i.e. "no guessing about encodings,
please"). By providing os.fspath() I can say that I do not, under any
circumstances, want someone to guess at the encoding some bytes path is
under to get me a string and instead I want to start and end entirely in a
world of strings. IOW os.fspath() lets me work in such a way that the
instant bytes are introduced into my code for file paths it triggers a
TypeError.
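
A minimal sketch of that str-only behaviour, assuming Python 3.6 or
later. Note that os.fspath as it eventually shipped passes bytes
through unchanged, so the hypothetical wrapper text_fspath below adds
the type check itself:

```python
import os
import pathlib


def text_fspath(pathlike):
    """Hypothetical helper: return a str path, never bytes."""
    # os.fspath raises TypeError for objects that are not path-like.
    result = os.fspath(pathlike)
    if not isinstance(result, str):
        raise TypeError("expected a str-based path, got "
                        + type(result).__name__)
    return result


from_str = text_fspath("docs/readme.txt")
from_path = text_fspath(pathlib.PurePosixPath("docs/readme.txt"))

# Bytes are rejected at the boundary, with no guessing about encodings.
try:
    text_fspath(b"docs/readme.txt")
    bytes_rejected = False
except TypeError:
    bytes_rejected = True
```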


>
> It's this kind of semantic confusion and broken nomenclature that is
> *why* I dislike these polymorphic functions and objects so much.  It
> is impossible to reason correctly about them.  We're stuck with
> invoking "practicality" and muddling through.  And the names mislead
> even experienced Pythonistas.
>

Yep, we are stuck with the names unless you want to propose a new name and
deprecate the old one.

From k7hoven at gmail.com  Mon Apr 18 17:58:59 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Tue, 19 Apr 2016 00:58:59 +0300
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <5714E930.50305@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us> <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <5713D26A.4000704@stoneleaf.us>
 <CAMiohoiPqbBE9P=hoEPfg7xqGwEQ_wXXdN70Av9gyuZDgAQu_g@mail.gmail.com>
 <CADiSq7d_gt+vfPAwtE1Knx77M_BegTghCLzFA6TkqFWE24rSpg@mail.gmail.com>
 <5714E930.50305@stoneleaf.us>
Message-ID: <CAMiohoifYE=iCEP=gr5bx+onJqsiU6vCxk4mxQcvLf2LnX-CQg@mail.gmail.com>

On Mon, Apr 18, 2016 at 5:03 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/18/2016 12:41 AM, Nick Coghlan wrote:
>
>> Given the variant you [Koos] suggested, what if we defined the API
>> semantics
>> like this:
>>
>>      # Offer the simplest possible API as the public version
>>      def fspath(pathlike) -> str:
>>          return os._raw_fspath(pathlike)
>>
>>      # Expose the complexity in the "private" variant
>>      def _raw_fspath(pathlike, *, output_types = (str,)) -> (str, bytes):
>>          # Short-circuit for instances of the output type
>>          if isinstance(pathlike, output_types):
>>              return pathlike
>>          # We'd have a tidier error message here for non-path objects
>>          result = pathlike.__fspath__()
>>          if not isinstance(result, output_types):
>>              raise TypeError("argument is not and does not provide an "
>>                              "acceptable pathname")
>>          return result
>
> My initial reaction was that this was overly complex, but after thinking
> about it a couple days I /really/ like it.  It has a reasonable default for
> the 99% real-world use-case, while still allowing for custom and exact
> tailoring (for the 99% stdlib use-case ;) .
>

While it does seem we finally might be nearly there :), this still
seems to need some further discussion.

As described in that long post of mine, I suppose some third-party
code may need the variations (A-C), while it seems that in the stdlib,
most places need (str, bytes), i.e. (A), except in pathlib, which
needs (str,), i.e. (B). I'm not sure what I think about making the
variations private, even if "hiding" the bytes version is, as I said,
an important role of the public function.

Except for that type hint, there is *nothing* in the function that
might mislead the user to think bytes paths are something important in
Python 3. It's a matter of documentation whether it "supports" bytes
or not. In fact, that function (assuming the name os.fspath) could now
even be documented to support this:

    patharg = os.fspath(patharg, output_types = (str, pathlib.PurePath))  # :-)

So are we still going to end up with two functions or can we deal with one?
What should the typehint be? Something new in typing.py? How about
FSPath[...] as follows:

FSPath[bytes]  # bytes-based pathlike, including bytes
FSPath[str]       # str-based pathlike, including str

# could be extended with PurePath or some path ABC
pathstring = typing.TypeVar('pathstring', str, bytes)

So the above variation might become:

def fspathname(pathlike: FSPath[pathstring],
           *, output_types: tuple = (str,)) -> pathstring:
    # Short-circuit for instances of the output type
    if isinstance(pathlike, output_types):
        return pathlike
    # We'd have a tidier error message here for non-path objects
    result = pathlike.__fspath__()
    if not isinstance(result, output_types):
        raise TypeError("valid output type not provided via __fspath__")
    return result

And similar type hints would apply to os.path functions. For instance,
os.path.dirname:

def dirname(p: FSPath[pathstring]) -> pathstring:
    ...

This would say pathstring all over and not give anyone any ideas about
bytes, unless they know what they're doing.

Complicated? Yes, typing is. But I think we will need this kind of
hints for os.path functions anyway.

-Koos

From ethan at stoneleaf.us  Mon Apr 18 18:12:53 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 18 Apr 2016 15:12:53 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMiohoifYE=iCEP=gr5bx+onJqsiU6vCxk4mxQcvLf2LnX-CQg@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>	<CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>	<CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>	<22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>	<CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>	<22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>	<CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>	<22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>	<CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>	<22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>	<CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>	<5713D26A.4000704@stoneleaf.us>	<CAMiohoiPqbBE9P=hoEPfg7xqGwEQ_wXXdN70Av9gyuZDgAQu_g@mail.gmail.com>	<CADiSq7d_gt+vfPAwtE1Knx77M_BegTghCLzFA6TkqFWE24rSpg@mail.gmail.com>	<5714E930.50305@stoneleaf.us>
 <CAMiohoifYE=iCEP=gr5bx+onJqsiU6vCxk4mxQcvLf2LnX-CQg@mail.gmail.com>
Message-ID: <57155BE5.3010100@stoneleaf.us>

On 04/18/2016 02:58 PM, Koos Zevenhoven wrote:

> It's a matter of documentation whether it "supports" bytes
> or not. In fact, that function (assuming the name os.fspath) could now
> even be documented to support this:
>
>      patharg = os.fspath(patharg, output_types = (str, pathlib.PurePath))  # :-)

While the os.fspath() function could be abused in such a way, we 
certainly wouldn't advertise it.  (Leave that to StackOverflow. ;)

--
~Ethan~

From ethan at stoneleaf.us  Mon Apr 18 18:30:38 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 18 Apr 2016 15:30:38 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
Message-ID: <5715600E.2060304@stoneleaf.us>

On 04/18/2016 12:26 PM, Stephen J. Turnbull wrote:

> I haven't looked at Antipathy, but I would guess from Ethan's
> promotion of bytes paths and concern with efficiency that "bytes
> antipaths" do *not* "go through" str to get to bytes, they already are
> bytes (in the sense of class inheritance).

Couple points:

- Correct: if you create an antipathy.Path with bytes, you get a
   bytes path (bPath); if you create an antipathy.Path with str
   you get a str path (uPath)

- if you mix a bPath with a uPath, or bytes with a uPath, or str with
   a bPath -- an exception is raised (conversions are *not* implicit, on
   3.0 at least -- on 2.x you can activate that behavior if you want it)

- my concern with supporting bytes is primarily for the sake of the
   stdlib, and secondarily for anyone who needs to work with bytes; it
   really has no effect on my library (since antipathy uses subclasses
   of bytes/str)

--
~Ethan~

From guido at python.org  Mon Apr 18 18:52:38 2016
From: guido at python.org (Guido van Rossum)
Date: Mon, 18 Apr 2016 15:52:38 -0700
Subject: [Python-Dev] [Python-ideas] pep 7 line break suggestion differs
 from pep 8
In-Reply-To: <CAPTjJmrL_gsvWKfBoB2AE1+vCzV8Rhg5LYJ9N6V6cqYZvtUSeg@mail.gmail.com>
References: <CAHGq92XnVkP77fX9ibEZuBfxa_25Y7neKSgO=vuCv4OfkJ25aQ@mail.gmail.com>
 <CAPTjJmrL_gsvWKfBoB2AE1+vCzV8Rhg5LYJ9N6V6cqYZvtUSeg@mail.gmail.com>
Message-ID: <CAP7+vJ+HXAS0-pUH8_3CHBVBWzG3CuXwEVuzWKKf5QzQu338jA@mail.gmail.com>

[ideas to bcc]

I'm not as excited about this as I am about the PEP 8 change.

PEP 8 affects most Python programmers.

But PEP 7 is really just for CPython and its extensions, and I don't think
it has found anything like as widespread a following as PEP 8.

I worry that if we change this in PEP 7 we'll just see either massively
inconsistent code or endless diffs that do nothing but change the
formatting (and occasionally introduce a bug).

And I don't think it would do as much good -- reading and understanding C
code is primarily a matter of knowing the language, and the audience is
much more heavily skewed towards experts.

IOW, -1.

On Mon, Apr 18, 2016 at 1:27 PM, Chris Angelico <rosuav at gmail.com> wrote:

> On Tue, Apr 19, 2016 at 5:33 AM, Joseph Jevnik <joejev at gmail.com> wrote:
> > I saw that there was recently a change to pep 8 to suggest adding a line
> > break before a binary operator. Pep 7 suggests the opposite:
> >
> >> When you break a long expression at a binary operator, the operator goes
> >> at the end of the previous line, e.g.:
> >
> >> if (type->tp_dictoffset != 0 && base->tp_dictoffset == 0 &&
> >>     type->tp_dictoffset == b_size &&
> >>     (size_t)t_size == b_size + sizeof(PyObject *))
> >>     return 0; /* "Forgive" adding a __dict__ only */
> >
> > I imagine that some of the reasons for making the change in pep 8 for
> > readability reasons will also
> > translate to C; maybe pep 7 should also be updated.
>
> I would agree with this. Passing it directly to python-dev as that's
> where the key decision makers are.
>
> ChrisA



-- 
--Guido van Rossum (python.org/~guido)

From wes.turner at gmail.com  Mon Apr 18 19:08:42 2016
From: wes.turner at gmail.com (Wes Turner)
Date: Mon, 18 Apr 2016 18:08:42 -0500
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CACfEFw-ah9F6UCJS1AY9w3JCG6SGgsCSbDH=k6M4BpYZCmiK3Q@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp>
 <57153AA0.5090103@stoneleaf.us>
 <CACfEFw9VsfgH41ZEi8gZGxmN19QXa2mhfUv_avrR40DDX1tX3w@mail.gmail.com>
 <5715414A.7000901@stoneleaf.us>
 <CACfEFw8vkUk4FdhD2S2=VK1y=ciuG4Mrw4XoAw6ndfgg4MWnng@mail.gmail.com>
 <CACfEFw9PHOf8abJdBssz2k2+8u310njRpicfXQjD_+--+1YuJg@mail.gmail.com>
 <CACfEFw-8=Z+25bV5nXHSORCJajB5u80mB=sWytzxjzE42mdc2g@mail.gmail.com>
 <CACfEFw9Cp5cMAMvUSQUaOUEeaL6pbmMRg0EQ237iPChrPs8pmA@mail.gmail.com>
 <CACfEFw-NePuvTzYdoz1mSt0cVUBUo==iuJ-sON7_fMn0ae5wwQ@mail.gmail.com>
 <CACfEFw_rvcJX7x3zoOHOQ4GJmQO0cn0DXd-iZBfORiJRFiUyKA@mail.gmail.com>
 <CACfEFw-4jgSO5HkkJBbp+rr-N5QoO5ym3g5KgbfQs+4YJ9J=Ow@mail.gmail.com>
 <CACfEFw_7KJTKDPL+eRhNXkJFQ8LQiFe1Bc_C05dhzu-cOGF7UA@mail.gmail.com>
 <CACfEFw_xPvfi-J89_vk1QAn6Zcar22S3Y8P4rWxZu_BXt2bFvg@mail.gmail.com>
 <CACfEFw-ah9F6UCJS1AY9w3JCG6SGgsCSbDH=k6M4BpYZCmiK3Q@mail.gmail.com>
Message-ID: <CACfEFw9bwkU5_GEur0nqgPeDEO4KP-y6hmk5EMhVM9q=BOamzg@mail.gmail.com>

On Apr 18, 2016 3:19 PM, "Ethan Furman" <ethan at stoneleaf.us> wrote:
>
> On 04/18/2016 12:54 PM, Wes Turner wrote:
>
>> Don't we *have* to always support bytes because other programs can
>> create filenames containing bytes?
>
>
> Yes, but not every function has to support bytes.

Because there's no function overloading in Python, we then must have
explicit typing conditionals.

I haven't the time to dig through and compare this with the other fine
solutions presented; is there a reason that a proxy/facade PrimitiveType
wouldn't solve for this?

class TextThing:
    def __init__(self, data):
        self.data = data
        self.type_ = type(data)

    def __getattr__(self, key):
        # Delegate everything else to the wrapped str/bytes object
        return getattr(self.data, key)


>
>
> --
> ~Ethan~
>

From burkhardameier at gmail.com  Mon Apr 18 21:13:01 2016
From: burkhardameier at gmail.com (Burkhard Meier)
Date: Mon, 18 Apr 2016 18:13:01 -0700
Subject: [Python-Dev] My first post here ~ do you need more Python core
 developers on Windows?
In-Reply-To: <CAMpsgwbO1rX0wx2hUAfceTNUU+4oL0=OuD0LRBGBMkQrU24M-Q@mail.gmail.com>
References: <CACKxkAxVGLOtaOhJVjbCJQ6ezc2QQjAQVwjPV8qaFVRd=wVR7g@mail.gmail.com>
 <CAMpsgwbO1rX0wx2hUAfceTNUU+4oL0=OuD0LRBGBMkQrU24M-Q@mail.gmail.com>
Message-ID: <CACKxkAwey9BgVuhtJJdK_EZAb4BNkc2mOFCs9XdiR3M9oitKpg@mail.gmail.com>

Thank you for the warm welcome and the links. I will definitely check them
out.

Burkhard

On Mon, Apr 18, 2016 at 1:16 AM, Victor Stinner <victor.stinner at gmail.com>
wrote:

> 2016-04-18 7:23 GMT+02:00 Burkhard Meier <burkhardameier at gmail.com>:
> > My name is Burkhard Meier and I wrote the "Python GUI Programming
> Cookbook"
> > published by Packt.
> >
> > It is available on Amazon and PacktPub.com.
>
> Welcome!
>
> > Maybe I can become more involved in the Python community as a Python
> > developer on Windows .
>
> You can use the Developer Guide to start:
> https://docs.python.org/devguide/
>
> See also the Python mentors to get help on a dedicated and private
> mailing list:
> http://pythonmentors.com/
>
> Sadly yes, we have many open issues specific to Windows. I'm trying to
> sometimes give time to fix some of them, but I'm less interested than
> in open source operating systems ;-)
>
> Victor
>

From victor.stinner at gmail.com  Tue Apr 19 06:27:38 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 19 Apr 2016 12:27:38 +0200
Subject: [Python-Dev] PEP 509: Add a private version to dict (version 3)
Message-ID: <CAMpsgwa_==3yMjNWWFOM=xZi1Zeg_TW3gNJWf6sc1XmbzZofrQ@mail.gmail.com>

Hi,

Below is the third version of my PEP 509 (dict version).

Changes since version 2:

* __setitem__() and update() now always increase the version: remove
the micro-optimization on "dict[key] is new_value". Exception: the
version is not changed when dict.update() is called without arguments.
* be more explicit on version++: explain that the operation must be
atomic, and that dict methods are already atomic thanks to the GIL
* Usage of the dict version: add Cython
* "Guard against changing dict during iteration": don't guess if the
new dict version can be used or not. Let's discuss that later.
* rephrase/complete some sections
* add links to new threads on python-dev

I hope that I addressed all of Jim's concerns about version 2.

Note: I also updated the implementation. The implementation now
contains more tests for identical values and more tests on equal
values.

HTML version:
https://www.python.org/dev/peps/pep-0509/


PEP: 509
Title: Add a private version to dict
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6


Abstract
========

Add a new private version to the builtin ``dict`` type, incremented at
each dictionary creation and at each dictionary change, to implement
fast guards on namespaces.


Rationale
=========

In Python, the builtin ``dict`` type is used by many instructions. For
example, the ``LOAD_GLOBAL`` instruction looks up a variable in the
global namespace, or in the builtins namespace (two dict lookups).
Python uses ``dict`` for the builtins namespace, globals namespace, type
namespaces, instance namespaces, etc. The local namespace (function
namespace) is usually optimized to an array, but it can be a dict too.

Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
modified at runtime. Implementing optimizations respecting the Python
semantics requires detecting when "something changes": we will call
these checks "guards".

The speedup of optimizations depends on the speed of guard checks. This
PEP proposes to add a private version to dictionaries to implement fast
guards on namespaces.

Dictionary lookups can be skipped if the version does not change, which
is the common case for most namespaces. The version is globally unique,
so checking the version is also enough to verify that the namespace
dictionary was not replaced with a new dictionary.

When the dictionary version does not change, the performance of a guard
does not depend on the number of watched dictionary entries: the
complexity is O(1).

Example of optimization: copy the value of a global variable to function
constants.  This optimization requires a guard on the global variable to
check if it was modified. If the global variable is not modified, the
function uses the cached copy. If the global variable is modified, the
function uses a regular lookup, and may also deoptimize the function
(to remove the overhead of the guard check for subsequent function calls).

See the `PEP 510 -- Specialized functions with guards
<https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of
guards to specialize functions and for a more general rationale on
Python static optimizers.


Guard example
=============

Pseudo-code of a fast guard to check if a dictionary entry was modified
(created, updated or deleted) using a hypothetical
``dict_get_version(dict)`` function::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.value = dict.get(key, UNSET)
            self.version = dict_get_version(dict)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dictionary
            version = dict_get_version(self.dict)
            if version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            value = self.dict.get(self.key, UNSET)
            if value is self.value:
                # another key was modified:
                # cache the new dictionary version
                self.version = version
                return True

            # the key was modified
            return False
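
The pseudo-code can be exercised with a pure Python simulation. This is
only a sketch under stated assumptions: ``VersionedDict`` and its
``version`` attribute are hypothetical stand-ins for the private C field,
and only ``__setitem__`` and ``__delitem__`` are instrumented:

```python
import itertools

# Simulated global dictionary version (the PEP keeps this counter in C).
_global_version = itertools.count(1)


class VersionedDict(dict):
    """Hypothetical dict subclass simulating the private version."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_global_version)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version = next(_global_version)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_global_version)


def dict_get_version(d):
    """Hypothetical accessor, named as in the pseudo-code."""
    return d.version


UNSET = object()


class GuardDictKey:
    def __init__(self, dict, key):
        self.dict = dict
        self.key = key
        self.value = dict.get(key, UNSET)
        self.version = dict_get_version(dict)

    def check(self):
        version = dict_get_version(self.dict)
        if version == self.version:
            return True          # fast path: no lookup
        value = self.dict.get(self.key, UNSET)
        if value is self.value:
            self.version = version   # another key changed
            return True
        return False             # the watched key changed


namespace = VersionedDict(x="original")
guard = GuardDictKey(namespace, "x")
unchanged = guard.check()            # nothing happened: fast path
namespace["y"] = "other"
other_key_changed = guard.check()    # slow path, "x" is untouched
namespace["x"] = "replaced"
watched_key_changed = guard.check()  # the guard now fails
```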


Usage of the dict version
=========================

Speedup method calls
--------------------

Yury Selivanov wrote a `patch to optimize method calls
<https://bugs.python.org/issue26110>`_. The patch depends on the
`"implement per-opcode cache in ceval"
<https://bugs.python.org/issue26219>`_ patch which requires dictionary
versions to invalidate the cache if the globals dictionary or the
builtins dictionary has been modified.

The cache also requires that the dictionary version is globally unique.
It is possible to define a function in a namespace and call it in a
different namespace, using ``exec()`` with the *globals* parameter for
example. In this case, the globals dictionary was replaced and the cache
must also be invalidated.
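
Why global uniqueness matters can be sketched in a few lines. The
``VersionedDict`` class below is a hypothetical simulation of the
proposed field, not the C implementation: two namespaces with equal
content still get distinct versions, so a cache keyed on the version
notices that the globals dictionary was replaced:

```python
import itertools

# Simulated global dictionary version shared by all dictionaries.
_global_version = itertools.count(1)


class VersionedDict(dict):
    """Hypothetical dict subclass; the version is set at creation."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_global_version)


globals_a = VersionedDict(answer=42)
globals_b = VersionedDict(answer=42)   # same content, different dict

# A cache filled while running with globals_a remembers its version.
cached_version = globals_a.version


def cache_is_valid(namespace):
    """Return True if the cache was filled for this exact namespace state."""
    return namespace.version == cached_version


valid_for_a = cache_is_valid(globals_a)
valid_for_b = cache_is_valid(globals_b)
```

With a per-dictionary counter starting at zero, both dictionaries would
report the same version and the cache could not tell them apart.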


Specialized functions using guards
----------------------------------

The `PEP 510 -- Specialized functions with guards
<https://www.python.org/dev/peps/pep-0510/>`_ proposes an API to support
specialized functions with guards. It allows implementing static
optimizers for Python without breaking the Python semantics.

The `fatoptimizer <http://fatoptimizer.readthedocs.org/>`_ of the `FAT
Python <http://faster-cpython.readthedocs.org/fat_python.html>`_ project
is an example of a static Python optimizer. It implements many
optimizations which require guards on namespaces:

* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
  ``builtins.__dict__['len']`` and ``globals()['len']`` are required
* Loop unrolling: to unroll the loop ``for i in range(...): ...``,
  guards on ``builtins.__dict__['range']`` and ``globals()['range']``
  are required
* etc.


Pyjion
------

According to Brett Cannon, one of the two main developers of Pyjion,
Pyjion can benefit from the dictionary version to implement optimizations.

`Pyjion <https://github.com/Microsoft/Pyjion>`_ is a JIT compiler for
Python based upon CoreCLR (Microsoft .NET Core runtime).


Cython
------

Cython can benefit from the dictionary version to implement optimizations.

`Cython <http://cython.org/>`_ is an optimising static compiler for both
the Python programming language and the extended Cython programming
language.


Unladen Swallow
---------------

Even if dictionary version was not explicitly mentioned, optimizing
globals and builtins lookup was part of the Unladen Swallow plan:
"Implement one of the several proposed schemes for speeding lookups of
globals and builtins." (source: `Unladen Swallow ProjectPlan
<https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_).

Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
implemented with LLVM. The project stopped in 2011: `Unladen Swallow
Retrospective
<http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_.


Changes
=======

Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
the C type ``PY_UINT64_T``, a 64-bit unsigned integer. Also add a global
dictionary version. Each time a dictionary is created, the global
version is incremented and the dictionary version is initialized to the
global version. The global version is also incremented and copied to the
dictionary version at each dictionary change:

* ``clear()`` if the dict is non-empty
* ``pop(key)`` if the key exists
* ``popitem()`` if the dict is non-empty
* ``setdefault(key, value)`` if the key does not exist
* ``__delitem__(key)`` if the key exists
* ``__setitem__(key, value)`` always increases the version
* ``update(...)`` if called with arguments

The version increase must be atomic. In CPython, the Global Interpreter
Lock (GIL) already protects ``dict`` methods to make them atomic.

Example using a hypothetical ``dict_get_version(dict)`` function::

    >>> d = {}
    >>> dict_get_version(d)
    100
    >>> d['key'] = 'value'
    >>> dict_get_version(d)
    101
    >>> d['key'] = 'new value'
    >>> dict_get_version(d)
    102
    >>> del d['key']
    >>> dict_get_version(d)
    103

``dict.__setitem__(key, value)`` and ``dict.update(...)`` always
increase the version, even if the new value is identical or is equal to
the current value (even if ``(dict[key] is value) or (dict[key] ==
value)``).

The field is called ``ma_version_tag``, rather than ``ma_version``, to
suggest comparing it using ``version_tag == old_version_tag``, rather
than ``version <= old_version`` which is wrong most of the time after an
integer overflow.
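
The rules listed in this section can be simulated in pure Python. This
is a sketch only: ``VersionedDict``, ``_bump()`` and the ``version``
attribute are hypothetical names, and the real change is made in C on
``PyDictObject``:

```python
import itertools

# Simulated global dictionary version; the start value is arbitrary.
_global_version = itertools.count(100)


class VersionedDict(dict):
    """Hypothetical dict subclass following the rules of this section."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._bump()                      # creation gets a new version

    def _bump(self):
        self.version = next(_global_version)

    def clear(self):
        if self:                          # only if the dict is non-empty
            super().clear()
            self._bump()

    def pop(self, key, *default):
        existed = key in self
        value = super().pop(key, *default)
        if existed:                       # only if the key exists
            self._bump()
        return value

    def popitem(self):
        item = super().popitem()          # raises KeyError if empty
        self._bump()
        return item

    def setdefault(self, key, default=None):
        missing = key not in self
        value = super().setdefault(key, default)
        if missing:                       # only if the key does not exist
            self._bump()
        return value

    def __delitem__(self, key):
        super().__delitem__(key)          # raises KeyError if missing
        self._bump()

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._bump()                      # always, even for an equal value

    def update(self, *args, **kwargs):
        super().update(*args, **kwargs)
        if args or kwargs:                # only if called with arguments
            self._bump()


d = VersionedDict()
history = [d.version]
for action in (
    lambda: d.__setitem__("key", "value"),   # bumps
    lambda: d.__setitem__("key", "value"),   # bumps again (same value)
    lambda: d.update(),                      # no arguments: no bump
    lambda: d.pop("missing", None),          # key absent: no bump
    lambda: d.clear(),                       # non-empty: bumps
    lambda: d.clear(),                       # already empty: no bump
):
    action()
    history.append(d.version)
```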


Backwards Compatibility
=======================

Since the ``PyDictObject`` structure is not part of the stable ABI and
the new dictionary version is not exposed at the Python scope, changes
are backward compatible.


Implementation and Performance
==============================

The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
<https://bugs.python.org/issue26058>`_ contains a patch implementing
this PEP.

On pybench and timeit microbenchmarks, the patch does not seem to add
any overhead on dictionary operations. For example, the following timeit
micro-benchmark takes 318 nanoseconds before and after the change::

    python3.6 -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'

When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover,
a guard can watch for multiple keys. For example, for an optimization
using 10 global variables in a function, 10 dictionary lookups cost 148
ns, whereas the guard still only costs 3.8 ns when the version does not
change (39x as fast).
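
The arithmetic behind the "39x" figure, using only the timings quoted
above (the variable names are just for illustration):

```python
LOOKUP_NS = 14.8      # one PyDict_GetItem() lookup, as measured above
GUARD_NS = 3.8        # one guard check when the version is unchanged
WATCHED_KEYS = 10     # global variables used by the optimized function

lookups_total_ns = WATCHED_KEYS * LOOKUP_NS    # cost without a guard
speedup = lookups_total_ns / GUARD_NS          # guard cost is constant

print("%.0f ns vs %.1f ns: %.0fx as fast"
      % (lookups_total_ns, GUARD_NS, speedup))
```

The guard cost stays constant while the lookup cost grows linearly with
the number of watched keys, so the ratio improves as more keys are
watched.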

The `fat module
<http://fatoptimizer.readthedocs.org/en/latest/fat.html>`_ implements
such guards: ``fat.GuardDict`` is based on the dictionary version.


Integer overflow
================

The implementation uses the C type ``PY_UINT64_T`` to store the version:
a 64-bit unsigned integer. The C code uses ``version++``. On integer
overflow, the version wraps to ``0`` (and then continues to be
incremented) according to the C standard.

After an integer overflow, a guard can succeed even though the watched
dictionary key was modified. The bug only occurs at a guard check if
there are exactly ``2 ** 64`` dictionary creations or modifications
since the previous guard check.

If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
take longer than 584 years. Using a 32-bit version, it only takes 4
seconds. That's why a 64-bit unsigned type is also used on 32-bit
systems. A dictionary lookup at the C level takes 14.8 ns.

A risk of a bug every 584 years is acceptable.
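
The two durations can be checked with a short calculation. It assumes
one modification per nanosecond, as in the text, and a year of 365.25
days:

```python
NS_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Time needed to wrap a 64-bit counter at one increment per nanosecond
years_to_wrap_64 = 2 ** 64 / NS_PER_SECOND / SECONDS_PER_YEAR

# Same computation for a 32-bit counter
seconds_to_wrap_32 = 2 ** 32 / NS_PER_SECOND

# Unsigned wrap-around as done by version++ in C, emulated with a mask
UINT64_MAX = 2 ** 64 - 1
wrapped = (UINT64_MAX + 1) & UINT64_MAX

print("64-bit: about %.0f years" % years_to_wrap_64)
print("32-bit: about %.1f seconds" % seconds_to_wrap_32)
```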


Alternatives
============

Expose the version at Python level as a read-only __version__ property
----------------------------------------------------------------------

The first version of the PEP proposed to expose the dictionary version
as a read-only ``__version__`` property at Python level, and also to add
the property to ``collections.UserDict`` (since this type must mimic
the ``dict`` API).

There are multiple issues:

* To be consistent and avoid bad surprises, the version must be added to
  all mapping types. Implementing a new mapping type would require extra
  work for no benefit, since the version is only required on the
  ``dict`` type in practice.
* All Python implementations would have to implement this new property,
  which gives more work to other implementations, whereas they may not
  use the dictionary version at all.
* Exposing the dictionary version at the Python level can lead to
  false assumptions about performance. Checking ``dict.__version__`` at
  the Python level is not faster than a dictionary lookup. A dictionary
  lookup in Python has a cost of 48.7 ns and checking the version has a
  cost of 47.5 ns, the difference is only 1.2 ns (3%)::


    $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
    10000000 loops, best of 3: 0.0487 usec per loop
    $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
    10000000 loops, best of 3: 0.0475 usec per loop

* The ``__version__`` can wrap on integer overflow. It is error
  prone: using ``dict.__version__ <= guard_version`` is wrong,
  ``dict.__version__ == guard_version`` must be used instead to reduce
  the risk of a bug on integer overflow (even if the integer overflow is
  unlikely in practice).
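
The last point can be illustrated with a simulated 64-bit counter. This
is only a sketch: ``next_version()`` is a hypothetical helper emulating
the unsigned ``version++`` of the C code:

```python
UINT64_MAX = 2 ** 64 - 1


def next_version(version):
    """Emulate an unsigned 64-bit ``version++`` with wrap-around."""
    return (version + 1) & UINT64_MAX


# The guard recorded a version just below the maximum value.
guard_version = UINT64_MAX - 1

# Three modifications later, the counter has wrapped:
# UINT64_MAX - 1 -> UINT64_MAX -> 0 -> 1
current_version = guard_version
for _ in range(3):
    current_version = next_version(current_version)

# An ordering test wrongly concludes that nothing newer happened ...
ordering_says_unchanged = current_version <= guard_version

# ... whereas an equality test correctly detects the change.
equality_says_unchanged = current_version == guard_version
```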

Mandatory bikeshedding on the property name:

* ``__cache_token__``: name proposed by Nick Coghlan, name coming from
  `abc.get_cache_token()
  <https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_.
* ``__version__``
* ``__version_tag__``
* ``__timestamp__``


Add a version to each dict entry
--------------------------------

A single version per dictionary requires keeping a strong reference to
the value, which can keep the value alive longer than expected. If we
also add a version per dictionary entry, the guard only needs to store
the entry version (a simple integer) to avoid the strong reference to
the value: only strong references to the dictionary and to the key are
needed.

Changes: add a ``me_version_tag`` field to the ``PyDictKeyEntry``
structure, the field has the C type ``PY_UINT64_T``. When a key is
created or modified, the entry version is set to the dictionary version
which is incremented at any change (create, modify, delete).

Pseudo-code of a fast guard to check if a dictionary key was modified
using hypothetical ``dict_get_version(dict)`` and
``dict_get_entry_version(dict, key)`` functions::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.dict_version = dict_get_version(dict)
            self.entry_version = dict_get_entry_version(dict, key)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dictionary
            dict_version = dict_get_version(self.dict)
            if dict_version == self.dict_version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary to read the entry version
            entry_version = dict_get_entry_version(self.dict, self.key)
            if entry_version == self.entry_version:
                # another key was modified:
                # cache the new dictionary version
                self.dict_version = dict_version
                self.entry_version = entry_version
                return True

            # the key was modified
            return False

The main drawback of this option is the impact on the memory footprint.
It increases the size of each dictionary entry, so the overhead depends
on the number of buckets (dictionary entries, used or not used). For
example, it increases the size of each dictionary entry by 8 bytes on
a 64-bit system.

In Python, the memory footprint matters and the trend is to reduce it.
Examples:

* `PEP 393 -- Flexible String Representation
  <https://www.python.org/dev/peps/pep-0393/>`_
* `PEP 412 -- Key-Sharing Dictionary
  <https://www.python.org/dev/peps/pep-0412/>`_
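
Coming back to the per-entry guard described earlier in this section, it
can be tried with a pure Python simulation. Everything here is a sketch:
``EntryVersionedDict`` and its ``entry_versions`` mapping are
hypothetical stand-ins for the ``me_version_tag`` field, and only
``__setitem__`` and ``__delitem__`` are instrumented:

```python
import itertools

_global_version = itertools.count(1)   # simulated global version


class EntryVersionedDict(dict):
    """Hypothetical dict subclass with a version per entry."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_global_version)
        self.entry_versions = {key: self.version for key in self}

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version = next(_global_version)
        self.entry_versions[key] = self.version

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_global_version)
        del self.entry_versions[key]


def dict_get_version(d):
    return d.version


def dict_get_entry_version(d, key):
    # None stands for "no such entry" in this simulation
    return d.entry_versions.get(key)


class GuardDictKey:
    def __init__(self, dict, key):
        self.dict = dict
        self.key = key
        # Only integers are stored: no reference to the value is kept
        self.dict_version = dict_get_version(dict)
        self.entry_version = dict_get_entry_version(dict, key)

    def check(self):
        dict_version = dict_get_version(self.dict)
        if dict_version == self.dict_version:
            return True                    # fast path
        entry_version = dict_get_entry_version(self.dict, self.key)
        if entry_version == self.entry_version:
            self.dict_version = dict_version
            return True                    # another key changed
        return False                       # the watched entry changed


namespace = EntryVersionedDict(x=1)
guard = GuardDictKey(namespace, "x")
namespace["y"] = 2
after_other_key = guard.check()
namespace["x"] = 1          # same value, but the entry version changes
after_same_value = guard.check()
```

Unlike the guard based on ``value is self.value``, this variant fails
even when the key is set to the same object again, because it compares
entry versions and never looks at the value.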


Add a new dict subtype
----------------------

Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
use the ``verdict`` for namespaces (module namespace, type namespace,
instance namespace, etc.) instead of ``dict``.

Leave the ``dict`` type unchanged to not add any overhead (CPU, memory
footprint) when guards are not used.

Technical issue: a lot of C code in the wild, including CPython core,
expects the exact ``dict`` type. Issues:

* ``exec()`` requires a ``dict`` for globals and locals. A lot of code
  uses ``globals={}``. It is not possible to cast the ``dict`` to a
  ``dict`` subtype because the caller expects the ``globals`` parameter
  to be modified (``dict`` is mutable).
* C functions call ``PyDict_xxx()`` functions directly, instead of calling
  ``PyObject_xxx()`` if the object is a ``dict`` subtype
* The ``PyDict_CheckExact()`` check fails on ``dict`` subtypes, whereas
  some functions require the exact ``dict`` type.
* ``Python/ceval.c`` does not completely support dict subtypes for
  namespaces


The ``exec()`` issue is a blocker issue.

Other issues:

* The garbage collector has special code to "untrack" ``dict``
  instances. If a ``dict`` subtype is used for namespaces, the garbage
  collector may be unable to break some reference cycles.
* Some functions have a fast-path for ``dict`` which would not be taken
  for ``dict`` subtypes, and so it would make Python a little bit
  slower.
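
Two of the issues above (the "cast" problem and the exact type check)
can be observed from Python. This is only an illustration: the
``verdict`` class below is an empty hypothetical subtype, and the
``type(...) is dict`` test is the Python-level analogue of
``PyDict_CheckExact()``:

```python
class verdict(dict):
    """Hypothetical dict subtype, named as in this section."""


# The caller owns a plain dict and expects to see modifications in it.
caller_globals = {}

# "Casting" to the subtype really creates a new, separate object.
namespace = verdict(caller_globals)
namespace["x"] = 1

caller_sees_change = "x" in caller_globals     # the caller's dict is untouched

# A subtype passes the isinstance test but not the exact type test.
passes_subtype_check = isinstance(namespace, dict)
passes_exact_check = type(namespace) is dict
```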


Prior Art
=========

Method cache and type version tag
---------------------------------

In 2007, Armin Rigo wrote a patch to implement a cache of methods. It
was merged into Python 2.6.  The patch adds a "type attribute cache
version tag" (``tp_version_tag``) and a "valid version tag" flag to
types (the ``PyTypeObject`` structure).

The type version tag is not exposed at the Python level.

The version tag has the C type ``unsigned int``. The cache is a global
hash table of 4096 entries, shared by all types. The cache is global to
"make it fast, have a deterministic and low memory footprint, and be
easy to invalidate". Each cache entry has a version tag. A global
version tag is used to create the next version tag, it also has the C
type ``unsigned int``.

By default, a type has its "valid version tag" flag cleared to indicate
that the version tag is invalid. When the first method of the type is
cached, the version tag and the "valid version tag" flag are set. When a
type is modified, the "valid version tag" flag of the type and its
subclasses is cleared. Later, when a cache entry of these types is used,
the entry is removed because its version tag is outdated.

On integer overflow, the whole cache is cleared and the global version
tag is reset to ``0``.

See `Method cache (issue #1685986)
<https://bugs.python.org/issue1685986>`_ and `Armin's method cache
optimization updated for Python 2.6 (issue #1700288)
<https://bugs.python.org/issue1700288>`_.


Globals / builtins cache
------------------------

In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue
#10401) <http://bugs.python.org/issue10401>`_ which adds a private
``ma_version`` field to the ``PyDictObject`` structure (``dict`` type),
the field has the C type ``Py_ssize_t``.

The patch adds a "global and builtin cache" to functions and frames, and
changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the
cache.

The change on the ``PyDictObject`` structure is very similar to this
PEP.


Cached globals+builtins lookup
------------------------------

In 2006, Andrea Griffini proposed a patch implementing a `Cached
globals+builtins lookup optimization
<https://bugs.python.org/issue1616125>`_.  The patch adds a private
``timestamp`` field to the ``PyDictObject`` structure (``dict`` type),
the field has the C type ``size_t``.

Thread on python-dev: `About dictionary lookup caching
<https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_
(December 2006).


Guard against changing dict during iteration
--------------------------------------------

In 2013, Serhiy Storchaka proposed `Guard against changing dict during
iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which
adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict``
type), the field has the C type ``size_t``.  This field is incremented
when the dictionary is modified.


PySizer
-------

`PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python,
Google Summer of Code 2005 project by Nick Smallbone.

This project has a patch for CPython 2.4 which adds ``key_time`` and
``value_time`` fields to dictionary entries. It uses a global
process-wide counter for dictionaries, incremented each time that a
dictionary is modified. The times are used to decide when child objects
first appeared in their parent objects.


Discussion
==========

Threads on the mailing lists:

* python-dev: `Updated PEP 509
  <https://mail.python.org/pipermail/python-dev/2016-April/144250.html>`_
* python-dev: `RFC: PEP 509: Add a private version to dict
  <https://mail.python.org/pipermail/python-dev/2016-April/144137.html>`_
* python-dev: `PEP 509: Add a private version to dict
  <https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
  (January 2016)
* python-ideas: `RFC: PEP: Add dict.__version__
  <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_
  (January 2016)


Copyright
=========

This document has been placed in the public domain.

From stephen at xemacs.org  Tue Apr 19 07:46:38 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 19 Apr 2016 20:46:38 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAP1=2W49ARd5WBqGuk7EccahML6Ze4cvqhhh+Q42Z7aDTmKbOA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W49ARd5WBqGuk7EccahML6Ze4cvqhhh+Q42Z7aDTmKbOA@mail.gmail.com>
Message-ID: <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp>

Brett Cannon writes:
 > On Mon, 18 Apr 2016 at 12:26 Stephen J. Turnbull <stephen at xemacs.org> wrote:

 > Well, it makes *your* head hurt;

It doesn't, because I have a different (and IMHO better) model.  I can
interpret yours without pain by comparing to that.

 > By providing os.fspath() I can say that I do not, under any
 > circumstances, want someone to guess at the encoding some bytes
 > path is under to get me a string and instead I want to start and
 > end entirely in a world of strings. IOW os.fspath() lets me work in
 > such a way that the instant bytes are introduced into my code for
 > file paths it triggers a TypeError.

Does it really help you work that way?  open is polymorphic, and will
use os._raw_fspath(obj, (bytes,str)).  Ditto os.scandir etc.  If they
don't, there's no point in supporting bytes returns from __fspath__,
is there?  Application code will normally not be calling os.fspath.
In the future, pathlib will, I suppose, but even without os.fspath
pathlib already protects you, as does antipathy.[1]

More effective, then, is just to use pathlib for your Path-hacking
work as soon as the path-representing object appears, and Path will
complain about bytes for you.  This is an analogue of the "decode
bytes at the boundary" principle.

 > Yep, we are stuck with the names unless you want to propose a new
 > name and deprecate the old one.

I already proposed fs_ensure_bytes and fs_ensure_str.  I think they're
sufficiently ugly to prove my point.<wink/>


Footnotes: 
[1]  Strictly speaking, antipathy protects you from inadvertent mixing
of bytes and str.


From stephen at xemacs.org  Tue Apr 19 07:55:55 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 19 Apr 2016 20:55:55 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <57153AA0.5090103@stoneleaf.us>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp>
 <57153AA0.5090103@stoneleaf.us>
Message-ID: <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp>

Ethan Furman writes:
 > On 04/18/2016 12:25 PM, Stephen J. Turnbull wrote:
 > > Koos Zevenhoven writes:
 > 
 > >> After all, we want something that's *almost* exclusively str.
 > >
 > > But we don't want that, AFAICT.  Some clearly want this API to be
 > > unbiased against bytes in the same way the os APIs are unbiased,
 > > because that's what we've got in the current proposal.
 > 
 > Are we reading the same thread?  For my last several replies I am
 > very biased against bytes (and I know I'm not the only one).

I'm not "reinterpreting" what people *write*, I'm looking at *the APIs
they propose and advocate*.  As I wrote, and you quoted.<wink/>

Except for the original proposal that only supported pathlib.Path, the
facilities advocated are actually unbiased.  It's just as easy to use
bytes as str, but it's proposed not to advertise that fact.  So what?
A 'my.fspath' is trivial to write, and hard to get wrong AFAICS.

Consider a truly biased alternative: __fspath__ of types like DirEntry
would return self when bytes-oriented.  (This addresses the issue of
__fspath__ that coerces to str becoming a timebomb in bytes apps.)
bytes-oriented applications would have to use DirEntry.path.  No
visible difference from now (you get the same API for bytes and the
same TypeError from open), and no loss, except for str-envy.  So use
str!  Why isn't that acceptable to you?  Maybe even TOOWTDI?

I really want to know.  I'm not 100% sure that's the right way to go,
mostly because Nick and Brett are signed up for polymorphism.  But I
sure haven't seen any explicit arguments for polymorphism, though I've
asked for them.  AFAICS, everybody just assumed that because some
related APIs are polymorphic, this one should be, too, and dove into
the problem of how to make a polymorphic API safe for Python 3.

 > If the client says "I'm okay with either" then I fully expect the
 > client to have code to properly handle str vs bytes after the
 > fspath (or whatever it's called) call.

I would too, but, uh, examples of such clients?  And no, antipathy
isn't an example -- it doesn't consume bytes, it passes them through
to the kind of client I want to hear about.

AFAICS bytes return from __fspath__ is just YAGNI.  Show me something
that actually wants it.

Steve

From k7hoven at gmail.com  Tue Apr 19 08:50:01 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Tue, 19 Apr 2016 15:50:01 +0300
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp>
 <57153AA0.5090103@stoneleaf.us>
 <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CAMiohohLRS1XEJUUxfUMUASpgQ7c142bLG-2Jj1vyCz-ezzf+g@mail.gmail.com>

On Tue, Apr 19, 2016 at 2:55 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>
> AFAICS bytes return from __fspath__ is just YAGNI.  Show me something
> that actually wants it.

It might be, but as long as bytes paths are supported polymorphically
all over the stdlib, we won't get rid of supporting bytes paths. So
are you proposing to deprecate bytes paths?

-Koos

From victor.stinner at gmail.com  Tue Apr 19 09:33:17 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 19 Apr 2016 15:33:17 +0200
Subject: [Python-Dev] PEP 509: Add a private version to dict (version 3)
In-Reply-To: <CAMpsgwa_==3yMjNWWFOM=xZi1Zeg_TW3gNJWf6sc1XmbzZofrQ@mail.gmail.com>
References: <CAMpsgwa_==3yMjNWWFOM=xZi1Zeg_TW3gNJWf6sc1XmbzZofrQ@mail.gmail.com>
Message-ID: <CAMpsgwZdp8-pUknyfDPe7P2hQRvvwnVZLwk3tey4Lkz7nVwTdQ@mail.gmail.com>

Hi,

> Backwards Compatibility
> =======================
>
> Since the ``PyDictObject`` structure is not part of the stable ABI and
> the new dictionary version is not exposed at the Python scope, changes are
> backward compatible.

My current implementation inserts the new ma_version_tag field in the
middle of the PyDictObject structure, so it obviously changes the ABI.

Can someone please confirm (double check) that the PyDictObject
structure is explicitly excluded from the stable ABI? I'm talking
about the "#ifndef Py_LIMITED_API" in Include/dictobject.h.

I learned what an ABI is the hard way. When I ran the perf.py
benchmark, I got a crash in ctypes on django_v3. The ctypes module
uses a C type which inherits from the dict type. I compiled Python
with and without my patch in the same directory and then I renamed the
./python binary, but the _ctypes.so was shared between the two
binaries.
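
To make the layout problem concrete, here is a small ctypes sketch. The
two structures are simplified, hypothetical stand-ins and not the real
PyDictObject; the uint64 type of the inserted field is an assumption
for the example. It only shows that inserting a field in the middle moves
every later field, which is what breaks an extension compiled against the
old layout.

```python
import ctypes


class DictOld(ctypes.Structure):
    # Hypothetical, simplified layout before the patch.
    _fields_ = [
        ("ma_used", ctypes.c_ssize_t),
        ("ma_keys", ctypes.c_void_p),
        ("ma_values", ctypes.c_void_p),
    ]


class DictNew(ctypes.Structure):
    # Same layout with a version field inserted in the middle.
    _fields_ = [
        ("ma_used", ctypes.c_ssize_t),
        ("ma_version_tag", ctypes.c_uint64),  # assumed type for this sketch
        ("ma_keys", ctypes.c_void_p),
        ("ma_values", ctypes.c_void_p),
    ]


old_offset = DictOld.ma_keys.offset
new_offset = DictNew.ma_keys.offset
print("ma_keys offset: old=%d new=%d" % (old_offset, new_offset))
print("struct size:    old=%d new=%d"
      % (ctypes.sizeof(DictOld), ctypes.sizeof(DictNew)))
```

Code built against the old layout would keep reading ma_keys at the old
offset, which now holds something else in the new layout.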

Victor

From ncoghlan at gmail.com  Tue Apr 19 10:26:44 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 20 Apr 2016 00:26:44 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp>
 <57153AA0.5090103@stoneleaf.us>
 <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7en9xZZCLgsLpdaD3gNitAkTn2e_N4VfZpVsvS5vYxvYw@mail.gmail.com>

On 19 April 2016 at 21:55, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> I really want to know.  I'm not 100% sure that's the right way to go,
> mostly because Nick and Brett are signed up for polymorphism.  But I
> sure haven't seen any explicit arguments for polymorphism, though I've
> asked for them.  AFAICS, everybody just assumed that because some
> related APIs are polymorphic, this one should be, too, and dove into
> the problem of how to make a polymorphic API safe for Python 3.
>

In my case, it's ~5 years of peripheral involvement in porting the Fedora
ecosystem to Python 3. I haven't personally done that much of the actual
porting work, but I've spent plenty of time talking to the folks that are,
and tweaking various things to make their lives easier where I could make
the case that there was either a benefit to Python 3, or at least no harm
to it.

The gist of the motivation for bytes/str polymorphism here is similar to
that for restoring __mod__ polymorphism in
https://www.python.org/dev/peps/pep-0461/: the bytes/str duality is as much
a fact of life when dealing with OS interfaces as it is when dealing with
wire protocols, so if __fspath__ is polymorphic, then it's easier for
compatibility modules like six and future to define their own "fspath"
helper functions that work on both Python 2 and Python 3 across all
supported platforms.

This is also why I ended up proposing pushing the complexity down into a
documented-but-underscore-prefixed API: folks writing pure Python 3
application code *really* shouldn't need to worry about the bytes support
in the protocol, but for operating system level use cases, not having it
readily available to 2/3 compatible Python code would be a pain.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From andreas.r.maier at gmx.de  Tue Apr 19 07:38:48 2016
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Tue, 19 Apr 2016 13:38:48 +0200
Subject: [Python-Dev] Dependent packages not listed on PyPI
Message-ID: <571618C8.5010002@gmx.de>

Hi,
I have a package "pywbem" which in its setup script specifies a number 
of dependent packages via "install_requires".

I should also say that it extends setuptools/distutils with its own 
additional keywords, e.g. it adds a "develop_requires", but I believe 
(hope) that is irrelevant for my problem.

In pywbem 0.8.3, the dependencies are:

     args = {
         ...,
         'install_requires': [
             'six',
             'ply',
         ],
         ...,
     }

and when running on Python 2.x, an additional one is added, dependent on 
the OS platform and bit size:

     if sys.version_info[0] == 2:
         if platform.system() == 'Windows':
             if platform.architecture()[0] == '64bit':
                 m2crypto_req = 'M2CryptoWin64>=0.21'
             else:
                 m2crypto_req = 'M2CryptoWin32>=0.21'
         else:
             m2crypto_req = 'M2Crypto>=0.24'
         args['install_requires'] += [
             m2crypto_req,
         ]

The problem is that the pywbem package on PyPI does not show these 
dependencies: https://pypi.python.org/pypi/pywbem/0.8.3

I wonder whether this is the reason for a particular installation 
problem we have seen (https://github.com/pywbem/pywbem/issues/113).

I do see other projects on PyPI, that show the dependencies they specify 
in their setup scripts, on their PyPI package page in a "*Requires 
Distributions*" section:

* https://pypi.python.org/pypi/bandit/0.17.3
* https://pypi.python.org/pypi/json-spec/0.9.14

Many others also do not have their dependencies shown, including six, 
pbr, PyYAML, lxml, to name just a few.

So far, I was unable to find out what the presence or absence of that 
information is related to, in the source of the project.

Here are my questions:

1. What causes the "Requires Distributions" section on a PyPI package 
page to show up there?

2. Is it important to show up there (e.g. for some tools)?

Andy


From brett at python.org  Tue Apr 19 12:17:51 2016
From: brett at python.org (Brett Cannon)
Date: Tue, 19 Apr 2016 16:17:51 +0000
Subject: [Python-Dev] Dependent packages not listed on PyPI
In-Reply-To: <571618C8.5010002@gmx.de>
References: <571618C8.5010002@gmx.de>
Message-ID: <CAP1=2W6zm5=Sug7geBv8PwhB3W3+g-WLVcMH46ZVghAZNxuw5Q@mail.gmail.com>

Questions about PyPI should be directed at the distutils-sig mailing list.

On Tue, 19 Apr 2016 at 08:12 Andreas Maier <andreas.r.maier at gmx.de> wrote:

> Hi,
> I have a package "pywbem" which in its setup script specifies a number of
> dependent packages via "install_requires".
>
> I should also say that it extends setuptools/distutils with its own
> additional keywords, e.g. it adds a "develop_requires", but I believe
> (hope) that is irrelevant for my problem.
>
> In pywbem 0.8.3, the dependencies are:
>
>     args = {
>         ...,
>         'install_requires': [
>             'six',
>             'ply',
>         ],
>         ...,
>     }
>
> and when running on Python 2.x, an additional one is added, dependent on
> the OS platform and bit size:
>
>     if sys.version_info[0] == 2:
>         if platform.system() == 'Windows':
>             if platform.architecture()[0] == '64bit':
>                 m2crypto_req = 'M2CryptoWin64>=0.21'
>             else:
>                 m2crypto_req = 'M2CryptoWin32>=0.21'
>         else:
>             m2crypto_req = 'M2Crypto>=0.24'
>         args['install_requires'] += [
>             m2crypto_req,
>         ]
>
> The problem is that the pywbem package on PyPI does not show these
> dependencies: https://pypi.python.org/pypi/pywbem/0.8.3
>
> I wonder whether this is the reason for a particular installation problem
> we have seen (https://github.com/pywbem/pywbem/issues/113).
>
> I do see other projects on PyPI, that show the dependencies they specify
> in their setup scripts, on their PyPI package page in a "*Requires
> Distributions*" section:
>
> * https://pypi.python.org/pypi/bandit/0.17.3
> * https://pypi.python.org/pypi/json-spec/0.9.14
>
> Many others also do not have their dependencies shown, including six, pbr,
> PyYAML, lxml, to name just a few.
>
> So far, I was unable to find out what the presence or absence of that
> information is related to, in the source of the project.
>
> Here are my questions:
>
> 1. What causes the "Requires Distributions" section on a PyPI package
> page to show up there?
>
> 2. Is it important to show up there (e.g. for some tools)?
>
> Andy

From brett at python.org  Tue Apr 19 12:50:07 2016
From: brett at python.org (Brett Cannon)
Date: Tue, 19 Apr 2016 16:50:07 +0000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W49ARd5WBqGuk7EccahML6Ze4cvqhhh+Q42Z7aDTmKbOA@mail.gmail.com>
 <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CAP1=2W4=rXouXP2bVLEzKppXMdVUHqq46FreRmUbgz=_uy9kwA@mail.gmail.com>

On Tue, 19 Apr 2016 at 04:46 Stephen J. Turnbull <stephen at xemacs.org> wrote:

> Brett Cannon writes:
>  > On Mon, 18 Apr 2016 at 12:26 Stephen J. Turnbull <stephen at xemacs.org>
> wrote:
>
>  > Well, it makes *your* head hurt;
>
> It doesn't, because I have a different (and IMHO better) model.  I can
> interpret yours without pain by comparing to that.
>
>  > By providing os.fspath() I can say that I do not, under any
>  > circumstances, want someone to guess at the encoding some bytes
>  > path is under to get me a string and instead I want to start and
>  > end entirely in a world of strings. IOW os.fspath() lets me work in
>  > such a way that the instant bytes are introduced into my code for
>  > file paths it triggers a TypeError.
>
> Does it really help you work that way?  open is polymorphic, and will
> use os._raw_fspath(obj, (bytes,str)).  Ditto os.scandir etc.  If they
> don't, there's no point in supporting bytes returns from __fspath__,
> is there?


You're leaving out all of the os.path functions, but you're right that if
they didn't support it like Windows then this entire discussion of bytes
paths would be moot.


>   Application code will normally not be calling os.fspath.
> In the future, pathlib will, I suppose, but even without os.fspath
> pathlib already protects you, as does antipathy.[1]
>

I disagree that application code won't be calling os.fspath.


>
> More effective, then, is just to use pathlib for your Path-hacking
> work as soon as the path-representing object appears, and Path will
> complain about bytes for you.  This is an analogue of the "decode
> bytes at the boundary" principle.
>

Ah, but you see that doesn't make porting easy. If I have a bunch of
path-manipulating code using os.path already and I want to add support for
pathlib I can either (a) rewrite all of that path-manipulating code to work
using pathlib, or (b) simply call `path = os.fspath(path)` and be done with
it. Basically if you have written any code that uses os.path then you will
have to care about (a) or (b) as a way to add support for pathlib short of
the `str(path)` hack we're all working to get away from. And if people
truly liked option (a) then this conversation wouldn't be such a big deal
as we would have seen more people using pathlib already (yes, the
provisional tag may have scared some off, but my guess is it's more from
not wanting to rewrite os.path-using code).
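
A minimal sketch of option (b), assuming os.fspath() ends up behaving
roughly as discussed in this thread (path objects are converted through
__fspath__, str passes through unchanged). backup_name is a made-up
helper standing in for existing os.path-based code:

```python
import os
import pathlib


def backup_name(path):
    """Return path with '.bak' appended to its file name (hypothetical)."""
    path = os.fspath(path)  # the one-line change needed to accept pathlib
    directory, name = os.path.split(path)
    return os.path.join(directory, name + ".bak")


from_str = backup_name(os.path.join("data", "report.txt"))
from_path = backup_name(pathlib.PurePath("data") / "report.txt")
print(from_str, from_path)
```

The rest of the function stays untouched os.path code; only the first
line is new.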

Now if you can convince me that the use of bytes paths is very minimal and
thus people doing path manipulations with them will be a very small
minority then I'm happy to try and use this to keep pushing people towards
avoiding bytes for file paths. But over the years people such as yourself,
Stephen, have convinced me that people do some really crazy stuff with
their file systems and that it isn't isolated to just one or two people.
And so it becomes this situation where we need to ask ourselves if we are
going to tell them to just deal with it or help them transition.

The other way to convince me is that people needing to support older
versions of Python will use `path = path.__fspath__() if hasattr(path,
'__fspath__') else path` and that allowing bytes with that idiom is going
to cost them dearly. My current assumption is that it won't because people
using that idiom are using os.path and those functions will complain when
mixing str and bytes together, but I'm open to being convinced otherwise.
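
For illustration, that idiom wrapped in a helper, with a made-up class
whose __fspath__ returns bytes. compat_fspath and BytesPath are
hypothetical names for this sketch only:

```python
import os


def compat_fspath(path):
    """Hypothetical helper wrapping the idiom quoted above."""
    return path.__fspath__() if hasattr(path, "__fspath__") else path


class BytesPath:
    """Made-up path object whose __fspath__ returns bytes."""

    def __fspath__(self):
        return b"file.txt"


rendered = compat_fspath(BytesPath())
try:
    os.path.join("dir", rendered)  # str mixed with bytes
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(repr(rendered), mixed_ok)
```

Here os.path.join refuses to combine the str component with the bytes
one, which is the complaint I am counting on.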

I guess what I'm trying to get at is that I understand the desire to get
people to get the bytes path habit, but to me the best way will be to get
people quickly and easily transitioned over to pathlib as a carrot rather
than using the lack of bytes path support in this transition as a stick.

-Brett



>
>  > Yep, we are stuck with the names unless you want to propose a new
>  > name and deprecate the old one.
>
> I already proposed fs_ensure_bytes and fs_ensure_str.  I think they're
> sufficiently ugly to prove my point.<wink/>
>
>
> Footnotes:
> [1]  Strictly speaking, antipathy protects you from inadvertent mixing
> of bytes and str.
>
>

From ericsnowcurrently at gmail.com  Tue Apr 19 18:22:46 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Tue, 19 Apr 2016 16:22:46 -0600
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAP1=2W4=rXouXP2bVLEzKppXMdVUHqq46FreRmUbgz=_uy9kwA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W49ARd5WBqGuk7EccahML6Ze4cvqhhh+Q42Z7aDTmKbOA@mail.gmail.com>
 <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W4=rXouXP2bVLEzKppXMdVUHqq46FreRmUbgz=_uy9kwA@mail.gmail.com>
Message-ID: <CALFfu7DbJAq0wwZrarkepQY4dVx-fj6MYgNGvU0Bz+WrMM9V-A@mail.gmail.com>

On Tue, Apr 19, 2016 at 10:50 AM, Brett Cannon <brett at python.org> wrote:
> Ah, but you see that doesn't make porting easy. If I have a bunch of
> path-manipulating code using os.path already and I want to add support for
> pathlib I can either (a) rewrite all of that path-manipulating code to work
> using pathlib, or (b) simply call `path = os.fspath(path)` and be done with
> it. Basically if you have written any code that uses os.path then you will
> have to care about (a) or (b) as a way to add support for pathlib short of
> the `str(path)` hack we're all working to get away from. And if people truly
> liked option (a) then this conversation wouldn't be such a big deal as we
> would have seen more people using pathlib already (yes, the provisional tag
> may have scared some off, but my guess is it's more from not wanting to
> rewrite os.path-using code).
>
> Now if you can convince me that the use of bytes paths is very minimal and
> thus people doing path manipulations with them will be a very small minority
> then I'm happy to try and use this to keep pushing people towards avoiding
> bytes for file paths. But over the years people such as yourself, Stephen,
> have convinced me that people do some really crazy stuff with their file
> systems and that it isn't isolated to just one or two people. And so it
> becomes this situation where we need to ask ourselves if we are going to
> tell them to just deal with it or help them transition.
>
> The other way to convince me is that people needing to support older
> versions of Python will use `path = path.__fspath__() if hasattr(path,
> '__fspath__') else path` and that allowing bytes with that idiom is going to
> cost them dearly. My current assumption is that it won't because people
> using that idiom are using os.path and those functions will complain when
> mixing str and bytes together, but I'm open to being convinced otherwise.
>
> I guess what I'm trying to get at is that I understand the desire to get
> people to get the bytes path habit, but to me the best way will be to get
> people quickly and easily transitioned over to pathlib as a carrot rather
> than using the lack of bytes path support in this transition as a stick.

Perhaps I missed previous discussion on the point, but why not support
both __fspath__() -> str and __fssyspath__() -> bytes?  Returning
NotImplemented would indicate "try the other one".  For example,
DirEntry.__fspath__() would return NotImplemented when the underlying
value is bytes and vice-versa.

A str-specific os.fspath would look something like this:

    def fspath(path):
        try:
            fspath = type(path).__fspath__
        except AttributeError:
            pass
        else:
            rendered = fspath(path)
            if rendered is not NotImplemented:
                return rendered
        raise TypeError

...and a more lenient, polymorphic version (for use by os.path.*,
etc.) would look like this:

    def _fspath(path):
        try:
            fspath = type(path).__fspath__
        except AttributeError:
            pass
        else:
            rendered = fspath(path)
            if rendered is not NotImplemented:
                return rendered

        try:
            fspath = type(path).__fssyspath__
        except AttributeError:
            pass
        else:
            rendered = fspath(path)
            if rendered is not NotImplemented:
                return rendered

        # nothing to do
        return path

The hard distinction between the two dunder methods preserves the
conceptual str/bytes division we're aiming for.  It will be much
easier to identify which path implementations are dealing with (or
supporting) bytes paths.  Likewise with the two helpers and their
usage.
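
To make the NotImplemented handshake concrete, here is a minimal sketch.
FakeDirEntry is a hypothetical stand-in for os.DirEntry (not the real
class), strict_fspath is just a compact restatement of the str-specific
helper above, and __fssyspath__ is only the name floated in this message,
not an existing protocol.

```python
class FakeDirEntry:
    """Hypothetical stand-in for os.DirEntry; illustration only."""

    def __init__(self, path):
        self.path = path

    def __fspath__(self):
        # Only answer when the underlying value is text.
        return self.path if isinstance(self.path, str) else NotImplemented

    def __fssyspath__(self):
        # Proposed bytes counterpart; answers only for bytes values.
        return self.path if isinstance(self.path, bytes) else NotImplemented


def strict_fspath(path):
    """Compact restatement of the str-specific helper sketched above."""
    method = getattr(type(path), "__fspath__", None)
    if method is not None:
        rendered = method(path)
        if rendered is not NotImplemented:
            return rendered
    raise TypeError("no str path available for %r" % (path,))


text_entry = FakeDirEntry("spam/eggs.txt")
bytes_entry = FakeDirEntry(b"spam/eggs.txt")

print(strict_fspath(text_entry))
try:
    strict_fspath(bytes_entry)
except TypeError as exc:
    print("rejected:", exc)
```

The bytes-backed entry is refused by the strict helper but would still be
reachable through the lenient one via __fssyspath__.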

-eric

From brett at python.org  Tue Apr 19 19:05:28 2016
From: brett at python.org (Brett Cannon)
Date: Tue, 19 Apr 2016 23:05:28 +0000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CALFfu7DbJAq0wwZrarkepQY4dVx-fj6MYgNGvU0Bz+WrMM9V-A@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W49ARd5WBqGuk7EccahML6Ze4cvqhhh+Q42Z7aDTmKbOA@mail.gmail.com>
 <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W4=rXouXP2bVLEzKppXMdVUHqq46FreRmUbgz=_uy9kwA@mail.gmail.com>
 <CALFfu7DbJAq0wwZrarkepQY4dVx-fj6MYgNGvU0Bz+WrMM9V-A@mail.gmail.com>
Message-ID: <CAP1=2W6HBNfWSRtCACTigXXarifkHJqsqd1Oz_ABncpcmdxkxQ@mail.gmail.com>

On Tue, 19 Apr 2016 at 15:22 Eric Snow <ericsnowcurrently at gmail.com> wrote:

> On Tue, Apr 19, 2016 at 10:50 AM, Brett Cannon <brett at python.org> wrote:
> > Ah, but you see that doesn't make porting easy. If I have a bunch of
> > path-manipulating code using os.path already and I want to add support for
> > pathlib I can either (a) rewrite all of that path-manipulating code to work
> > using pathlib, or (b) simply call `path = os.fspath(path)` and be done with
> > it. Basically if you have written any code that uses os.path then you will
> > have to care about (a) or (b) as a way to add support for pathlib short of
> > the `str(path)` hack we're all working to get away from. And if people truly
> > liked option (a) then this conversation wouldn't be such a big deal as we
> > would have seen more people using pathlib already (yes, the provisional tag
> > may have scared some off, but my guess is it's more from not wanting to
> > rewrite os.path-using code).
> >
> > Now if you can convince me that the use of bytes paths is very minimal and
> > thus people doing path manipulations with them will be a very small minority
> > then I'm happy to try and use this to keep pushing people towards avoiding
> > bytes for file paths. But over the years people such as yourself, Stephen,
> > have convinced me that people do some really crazy stuff with their file
> > systems and that it isn't isolated to just one or two people. And so it
> > becomes this situation where we need to ask ourselves if we are going to
> > tell them to just deal with it or help them transition.
> >
> > The other way to convince me is that people needing to support older
> > versions of Python will use `path = path.__fspath__() if hasattr(path,
> > '__fspath__') else path` and that allowing bytes with that idiom is going to
> > cost them dearly. My current assumption is that it won't because people
> > using that idiom are using os.path and those functions will complain when
> > mixing str and bytes together, but I'm open to being convinced otherwise.
> >
> > I guess what I'm trying to get at is that I understand the desire to get
> > people to get the bytes path habit, but to me the best way will be to get
> > people quickly and easily transitioned over to pathlib as a carrot rather
> > than using the lack of bytes path support in this transition as a stick.
>
> Perhaps I missed previous discussion on the point, but why not support
> both __fspath__() -> str and __fssyspath__() -> bytes?  Returning
> NotImplemented would indicate "try the other one".  For example,
> DirEntry.__fspath__() would return NotImplemented when the underlying
> value is bytes and vice-versa.
>

It was deemed more complexity than necessary for the protocol to have two
functions. Either __fspath__ will be polymorphic or it will only return str.

From victor.stinner at gmail.com  Tue Apr 19 19:33:44 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 20 Apr 2016 01:33:44 +0200
Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance
In-Reply-To: <CAMpsgwYaEw6vO7GrK82N4QVWupSk1cAPVyZPMcqTwn-xzn89CQ@mail.gmail.com>
References: <CAMpsgwaB43ygVr8-fY0obW+BJ4UORKKJeDXMR6r46thXQAVrnQ@mail.gmail.com>
 <56B3254F.7020605@egenix.com>
 <CAMpsgwbcac2ZNqeeCUNZpbVGyzb_vg8cdRZbFuwkZAzn9X7x1A@mail.gmail.com>
 <56B34A1E.4010501@egenix.com>
 <CAMpsgwbtfrSeiUbuGa3=eVUKqUdwC0==-Z++M+Rhu_mNRq56qg@mail.gmail.com>
 <56B35AB5.5090308@egenix.com>
 <CAMpsgwYNvSv44t2qG8bypcVA2T5YOSkyJkXsH9Ea2ZmkdNkHxw@mail.gmail.com>
 <CAMpsgwaazYP+nfhtmwNgBx2jN6Y1jTySe=yJwG=ZFkjwNy7QWQ@mail.gmail.com>
 <56BDDEA3.2060702@egenix.com>
 <CAMpsgwYaEw6vO7GrK82N4QVWupSk1cAPVyZPMcqTwn-xzn89CQ@mail.gmail.com>
Message-ID: <CAMpsgwabMLVwFPGa=6si9vfeZLNpU2HWWcDL_y5WavL0CJG7+w@mail.gmail.com>

Ping? Is someone still opposed to my change #26249 "Change
PyMem_Malloc to use pymalloc allocator"? If no, I think that I will
push my change.

My change only changes two lines, so it can be easily reverted before
CPython 3.6 if we detect major issues in third-party extensions. And
maybe it's better to push such change today to get more time to play
with it, than pushing it late in the development of CPython 3.6.

The new PYTHONMALLOC=debug feature makes it possible to quickly and
easily check the usage of the PyMem_Malloc() API, even if Python is
compiled in release mode.
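
As a rough sketch of how to try this from a script: the snippet below
starts a child interpreter with PYTHONMALLOC=debug set in its
environment. The one-line child command is only a hypothetical stand-in
for "run my extension's test suite", and the variable is only understood
by CPython 3.6 and later.

```python
import os
import subprocess
import sys

# Copy the current environment and switch on the allocator debug hooks
# (PYTHONMALLOC is understood by CPython 3.6 and later).
env = dict(os.environ, PYTHONMALLOC="debug")

# Hypothetical stand-in for "run the test suite of my C extension".
command = [sys.executable, "-c", "import json; print(json.dumps([1, 2, 3]))"]

result = subprocess.run(command, env=env, stdout=subprocess.PIPE,
                        universal_newlines=True)
print(result.returncode, result.stdout.strip())
```

A misuse caught by the debug hooks would show up as a fatal error in the
child process rather than as a clean exit.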

I checked multiple Python extensions written in C. I only found one
bug in numpy and I sent a patch (not merged yet).

victor

2016-03-15 0:19 GMT+01:00 Victor Stinner <victor.stinner at gmail.com>:
> 2016-02-12 14:31 GMT+01:00 M.-A. Lemburg <mal at egenix.com>:
>>>> If your program has bugs, you can use a debug build of Python 3.5 to
>>>> detect misusage of the API.
>>
>> Yes, but people don't necessarily do this, e.g. I have
>> for a very long time ignored debug builds completely
>> and when I started to try them, I found that some of the
>> things I had been doing with e.g. free list implementations
>> did not work in debug builds.
>
> I just added support for debug hooks on Python memory allocators on
> Python compiled in *release* mode. Set the environment variable
> PYTHONMALLOC to debug to try with Python 3.6.
>
> I added a check on PyObject_Malloc() debug hook to ensure that the
> function is called with the GIL held. I opened an issue to add a
> similar check on PyMem_Malloc():
> https://bugs.python.org/issue26563
>
>
>> Yes, but those are part of the stdlib. You'd need to check
>> a few C extensions which are not tested as part of the stdlib,
>> e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom
>> types in C since these will often need the memory management
>> APIs).
>>
>> It may also be a good idea to check wrapper generators such
>> as cython, swig, cffi, etc.
>
> I ran the test suite of numpy, lxml, Pillow and cryptography (which uses cffi).
>
> I found a bug in numpy. numpy calls PyMem_Malloc() without holding the GIL:
> https://github.com/numpy/numpy/pull/7404
>
> Apart from this bug, all other tests pass with PyMem_Malloc() using
> pymalloc and all debug checks.
>
> Victor

From stephen at xemacs.org  Tue Apr 19 23:11:16 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 20 Apr 2016 12:11:16 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAMiohohLRS1XEJUUxfUMUASpgQ7c142bLG-2Jj1vyCz-ezzf+g@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp>
 <57153AA0.5090103@stoneleaf.us>
 <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp>
 <CAMiohohLRS1XEJUUxfUMUASpgQ7c142bLG-2Jj1vyCz-ezzf+g@mail.gmail.com>
Message-ID: <22294.62292.860019.21366@turnbull.sk.tsukuba.ac.jp>

Koos Zevenhoven writes:
 > On Tue, Apr 19, 2016 at 2:55 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
 > >
 > > AFAICS bytes return from __fspath__ is just YAGNI.  Show me something
 > > that actually wants it.
 > 
 > It might be,

May I take that as meaning you just jumped to the conclusion that
extending polymorphism is useful on no actual evidence of usefulness?

 > but as long as bytes paths are supported polymorphically all over the
 > stdlib, we won't get rid of supporting bytes paths. So are you
 > proposing to deprecate bytes paths?

You claim "almost always want str", Ethan claims "bias against bytes."
Sorry, guys, you can't have it both ways.  Either bytes paths are
discouraged (not "deprecated", not yet), or they aren't.

I say, let's not encourage them.  Ie, keep the status quo for bytes,
and make things better for the preferred str.  Yes, that means
discouraging bytes relative to str in this context.  That's a Python 3
principle, one strong enough to justify the huge compatibility break
involved in making str be Unicode.  That compatibility break has been
extremely successful in my personal experience as a sometime Python
teacher and Mailman developer, though the Mercurial developers have a
different POV.


From stephen at xemacs.org  Tue Apr 19 23:16:29 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 20 Apr 2016 12:16:29 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CAP1=2W4=rXouXP2bVLEzKppXMdVUHqq46FreRmUbgz=_uy9kwA@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W49ARd5WBqGuk7EccahML6Ze4cvqhhh+Q42Z7aDTmKbOA@mail.gmail.com>
 <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W4=rXouXP2bVLEzKppXMdVUHqq46FreRmUbgz=_uy9kwA@mail.gmail.com>
Message-ID: <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp>

Brett Cannon writes:

 > Now if you can convince me that the use of bytes paths is very
 > minimal

I doubt that I can do that, because all that Python 2 code is
effectively bytes.  To the extent that people are just passing it into
their bytes-domain code and it works for them, they probably "port" to
Python 3 by using bytes for paths.  I just don't think bytes usage per
se matters to the issue of polymorphism of __fspath__.

 > Ah, but you see that doesn't make porting easy. If I have a bunch
 > of path-manipulating code using os.path already and I want to add
 > support for pathlib I can either (a) rewrite all of that
 > path-manipulating code to work using pathlib, or (b) simply call
 > `path = os.fspath(path)` and be done with it.

OK, so what matters here is not "how many people are using bytes".
They can keep using os.path, which is what they probably have already
been using.  What we are worrying about is that

(1) some really attractive producer of pathlib.Paths will be
    published, and

(2) people will want to plug that producer into their bytes paths
    consumers using os.fspath(path) "and be done with it".

Excuse me, but that doesn't make sense as written.  Path.__fspath__
will return str, in any case.  So these developers have to consume
text to use pathlib, even merely as a consumer of Paths.  No need for
polymorphism here, simply because it won't be used in this instance.
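
A small sketch of that point, written against os.fspath as it later
shipped in Python 3.6 (so the exact API postdates this message).
PurePosixPath and posixpath are used only to keep the result the same on
every platform.

```python
import os
import posixpath
from pathlib import PurePosixPath

path = PurePosixPath("spam") / "eggs.txt"

# A Path renders as text, whatever the consumer would have preferred.
rendered = os.fspath(path)
print(type(rendered).__name__, rendered)

# A bytes-oriented consumer cannot use the result as-is ...
try:
    posixpath.join(b"/srv", rendered)
except TypeError as exc:
    print("bytes consumer rejects it:", exc)

# ... it has to convert explicitly, i.e. it must deal with text anyway.
print(posixpath.join(b"/srv", os.fsencode(rendered)))
```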

What's left is DirEntry (and perhaps other producers of byte-oriented
objects in os and os.path).  If they're currently using DirEntry,
they're currently accessing .path.  Surely bytes users can continue
doing that, even if we offer str users the advantage of new protocols?

I conclude that there is no real use in having a polymorphic
__fspath__ unless callers of os.fspath can communicate desired return
type to it, and it implicitly coerces to that type.  But then open and
friends *implicitly* consume __fspath__.  So there probably needs to
be a way to communicate the desired type to them in the case where
they receive an __fspath__-bearing object so they can tell os.fspath
what their callers want, no?

Supporting both "pipeline polymorphism" of this kind and implicit
conversion protocols at the same time is quite complicated, I think.
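
To show where the complication comes from, here is a sketch under loud
assumptions: fspath_as and open_like are hypothetical names invented for
this illustration, and os.fspath is used as it later shipped in Python
3.6. Every implicit consumer would need something like the extra "kind"
argument just to forward the caller's wish.

```python
import os
from pathlib import PurePosixPath


def fspath_as(path, kind):
    """Hypothetical helper: render `path` as `kind` (str or bytes)."""
    if kind not in (str, bytes):
        raise ValueError("kind must be str or bytes")
    rendered = os.fspath(path)  # str or bytes, whatever the object offers
    if isinstance(rendered, kind):
        return rendered
    # Implicit coercion through the filesystem encoding.
    return os.fsencode(rendered) if kind is bytes else os.fsdecode(rendered)


def open_like(path, kind=str):
    """Hypothetical consumer that must accept and forward the desired type."""
    return fspath_as(path, kind)


print(open_like(PurePosixPath("spam/eggs.txt")))
print(open_like(PurePosixPath("spam/eggs.txt"), kind=bytes))
print(open_like(b"spam/eggs.txt", kind=str))
```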

 > [Folks] have convinced me that people do some really crazy stuff
 > with their file systems and that it isn't isolated to just one or
 > two people.  And so it becomes this situation where we need to ask
 > ourselves if we are going to tell them to just deal with it or help
 > them transition.

People who have to deal with really crazy stuff in filesystems are
already manipulating paths as text.  It's not we who need help with
the transition that matters (bytes to text).  We can use os.path or
pathlib, but bytes just don't matter because we're not using them in
path manipulations.

It's people who live in monolingual mono-encoding environments who
will be using bytes successfully, and be resistant to costly changes
that don't make their lives better.  But the bytes vs. text cost is
inherent in using pathlib, so polymorphism doesn't help promote
pathlib.  It might help promote use of os.scandir in bytes-oriented
code, though I don't see that as a huge effect nor more than mildly
desirable.  Is it?

Steve

From stephen at xemacs.org  Tue Apr 19 23:19:31 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 20 Apr 2016 12:19:31 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CADiSq7en9xZZCLgsLpdaD3gNitAkTn2e_N4VfZpVsvS5vYxvYw@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp>
 <57153AA0.5090103@stoneleaf.us>
 <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7en9xZZCLgsLpdaD3gNitAkTn2e_N4VfZpVsvS5vYxvYw@mail.gmail.com>
Message-ID: <22294.62787.882889.852338@turnbull.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > The gist of the motivation for bytes/str polymorphism here is similar to
 > that for restoring __mod__ polymorphism in
 > https://www.python.org/dev/peps/pep-0461/:

I don't think it is, actually.  Filenames off the wire cannot be
relied on to be in the local file system encoding, and that matters.
The semantics of a filename or path requires getting the encodings
matched.  You cannot be encoding-agnostic.

On the other hand, streams of characters are merely a special case of
streams of tokens, and the principles that apply to editing streams of
characters apply to more general tokens, including bytes and XML.  You
*can* be content-agnostic as long as you define semantics in terms of
moving tokens around, and not in terms of their content.

BTW, my opposition to PEP 461 was based on the same mistake with
opposite polarity: I think of bytes as encoded text *first*, and
therefore feared PEP 461 for quite insufficient reason.  Most
applications of PEP 461 won't be for text.
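
A minimal sketch of the contrast, assuming a peer that sends a Latin-1
encoded name to a host whose file system encoding is UTF-8 (both
encodings are picked purely for illustration):

```python
# Bytes as they might arrive off the wire from a Latin-1 peer.
wire_name = "caf\u00e9.txt".encode("latin-1")

# Treating them as a filename means picking an encoding, and a wrong
# pick fails (or, worse, silently names a different file).
try:
    wire_name.decode("utf-8")
except UnicodeDecodeError as exc:
    print("cannot interpret as UTF-8:", exc.reason)

# surrogateescape carries the undecodable byte along, but the "text"
# is then no longer meaningful to a human reader.
escaped = wire_name.decode("utf-8", "surrogateescape")
print(ascii(escaped))

# Moving tokens around is content-agnostic: PEP 461 style formatting
# never needs to know what the bytes mean.
joined = b"%s/%s" % (b"incoming", wire_name)
print(joined)
```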

 > This is also why I ended up proposing pushing the complexity down into a
 > documented-but-underscore-prefixed API: folks writing pure Python 3
 > application code *really* shouldn't need to worry about the bytes
 > support

You can't have that with your proposal.  They are going to (at least
in theory) get a new TypeError which they will not be expecting (vs
bytes, which are implicit in the object they have, where previously
they would have got one vs. Path or DirEntry which they were
expecting).  So they will have to learn that much about bytes support.

 > in the protocol, but for operating system level use cases, not having it
 > readily available to 2/3 compatible Python code would be a pain.

Erm, how do you propose to make this protocol available to Python-2-
compatible code?  Pervasively monkey-patch the Python 2 os module?
Even if so, is it our responsibility to worry about that?

BTW, I came to this conclusion thinking about the poster boy for PEP
461, Mercurial.


From rosuav at gmail.com  Tue Apr 19 23:34:44 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 20 Apr 2016 13:34:44 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W49ARd5WBqGuk7EccahML6Ze4cvqhhh+Q42Z7aDTmKbOA@mail.gmail.com>
 <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W4=rXouXP2bVLEzKppXMdVUHqq46FreRmUbgz=_uy9kwA@mail.gmail.com>
 <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CAPTjJmoCO=JUniZhTOAVoNYwYo0E0M72JtW6=fPTfKEbyajxMA@mail.gmail.com>

On Wed, Apr 20, 2016 at 1:16 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Brett Cannon writes:
>
>  > Now if you can convince me that the use of bytes paths is very
>  > minimal
>
> I doubt that I can do that, because all that Python 2 code is
> effectively bytes.  To the extent that people are just passing it into
> their bytes-domain code and it works for them, they probably "port" to
> Python 3 by using bytes for paths.  I just don't think bytes usage per
> se matters to the issue of polymorphism of __fspath__.
>

I would prefer to see this kind of code ported to Python 3 by using
native strings.

Python 2 code:

import json
with open(".config/obs-studio/basic/scenes/Standard.json") as f:
    data = json.load(f)
for scene in data["scene_order"]:
    print scene["name"]

Python 3 code:

import json
with open(".config/obs-studio/basic/scenes/Standard.json") as f:
    data = json.load(f)
for scene in data["scene_order"]:
    print(scene["name"])

The bulk of path string literals in Python programs will be all-ASCII.
Porting to Py3 won't fundamentally change this code, yet suddenly now
it's using Unicode strings. In reality, both versions of this example
are using *text* strings. The Py3 version has text in the source code,
a stream of Unicode codepoints in the runtime, and then (since I ran
this on Linux) encodes that to bytes for the file system. The Py2
version just does that conversion a little earlier: text in the source
code, a stream of eight-bit "texty bytes" in the runtime, and those
same bytes get given to the fs.

There's no reason to slap a b"..." prefix on every path for Py3. There
might be specific situations where you want that, but for the most
part, those paths came from human-readable text anyway, so they should
stay that way.
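
As a small sketch of the conversion described above (assuming, as is the
case on common platforms, that the file system encoding is
ASCII-compatible), os.fsencode shows the bytes a text path turns into at
the OS boundary:

```python
import os

path = ".config/obs-studio/basic/scenes/Standard.json"

# What the text path becomes when handed to the file system.
as_bytes = os.fsencode(path)
print(as_bytes)

# An all-ASCII path survives the round trip unchanged.
print(os.fsdecode(as_bytes) == path)
```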

ChrisA

From stephen at xemacs.org  Wed Apr 20 02:31:33 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 20 Apr 2016 15:31:33 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <CALFfu7DbJAq0wwZrarkepQY4dVx-fj6MYgNGvU0Bz+WrMM9V-A@mail.gmail.com>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W49ARd5WBqGuk7EccahML6Ze4cvqhhh+Q42Z7aDTmKbOA@mail.gmail.com>
 <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W4=rXouXP2bVLEzKppXMdVUHqq46FreRmUbgz=_uy9kwA@mail.gmail.com>
 <CALFfu7DbJAq0wwZrarkepQY4dVx-fj6MYgNGvU0Bz+WrMM9V-A@mail.gmail.com>
Message-ID: <m2ega0pz96.fsf@xemacs.org>

Eric Snow writes:
 > On Tue, Apr 19, 2016 at 10:50 AM, Brett Cannon <brett at python.org> wrote:
 > > Ah, but you see that doesn't make porting easy.

 > Perhaps I missed previous discussion on the point, but why not support
 > both __fspath__() -> str and __fssyspath__() -> bytes?

That's fine by me, I can live with that although I don't really like
it.  But the proponents of polymorphic __fspath__ think it's
unnecessary.

Why I don't like it: what's going to end up happening is that a
__fspath__- or __fssyspath__-bearing object of unknown provenance is
going to get passed to polymorphic os functions that won't complain,
and a few million cycles later something is going to access
fileobj.path expecting bytes and getting str, and blooey!
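
A sketch of that failure mode, with hypothetical function names
(normalize, make_backup_name) standing in for real application code:

```python
import posixpath


def normalize(path):
    # Polymorphic: accepts str or bytes and returns the same type,
    # so it never complains about either.
    return posixpath.normpath(path)


def make_backup_name(path):
    # Hypothetical bytes-oriented consumer written long ago.
    return path + b".bak"


# The all-bytes pipeline works as it always has.
print(make_backup_name(normalize(b"data//log.txt")))

# A str sneaks in; the polymorphic step accepts it silently ...
stray = normalize("data//log.txt")

# ... and the error only surfaces later, far from where it was introduced.
try:
    make_backup_name(stray)
except TypeError as exc:
    print("blooey:", exc)
```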

Also I just don't see a need for bytes when the original purpose of
this was to support passing pathlib.Path objects to open.  It's also
nice to pass DirEntry objects to open, but it's not obvious to me that
we need to support bytes since only new code can use this feature, and
there's a way to not-support them that doesn't cause any new problems.

It's not that I want bytes to go away[1], it's just that the playing
field will tilt a little more against them in new code.

Footnotes: 
[1]  I wouldn't weep, but I wouldn't laugh, either.


From pjenvey at underboss.org  Wed Apr 20 03:54:59 2016
From: pjenvey at underboss.org (Philip Jenvey)
Date: Wed, 20 Apr 2016 00:54:59 -0700
Subject: [Python-Dev] Bytes path
In-Reply-To: <CAMpsgwa19eWbemDEKUq0saDTNHN4acKFLT10+Aw4HzexNP9Lcw@mail.gmail.com>
References: <nenkqe$glb$1@ger.gmane.org>
 <CAMpsgwa19eWbemDEKUq0saDTNHN4acKFLT10+Aw4HzexNP9Lcw@mail.gmail.com>
Message-ID: <AF0698D9-CE0A-4CDC-B3EE-89521D05A2AE@underboss.org>

Yes, in the 3.2 time frame there was a consensus that only bytes and their subclasses should be accepted. Buffer support crept back into the posix module with the major changes in 3.3, likely by mistake. Two new issues propose removing these inconsistencies/regressions:

http://bugs.python.org/issue26754
http://bugs.python.org/issue26800

--
Philip Jenvey

> On Apr 14, 2016, at 3:29 AM, Victor Stinner <victor.stinner at gmail.com> wrote:
> 
> IMHO it's more a side effect of the implementation than a deliberate choice. For new code which really wants to support bytes paths, I suggest only accepting bytes and bytes subclasses.
> 


From k7hoven at gmail.com  Wed Apr 20 04:20:10 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Wed, 20 Apr 2016 11:20:10 +0300
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22294.62292.860019.21366@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp>
 <57153AA0.5090103@stoneleaf.us>
 <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp>
 <CAMiohohLRS1XEJUUxfUMUASpgQ7c142bLG-2Jj1vyCz-ezzf+g@mail.gmail.com>
 <22294.62292.860019.21366@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CAMiohoiPYYiVY-X91-zuhOaZi5HycfuP07uH8rg9W85_1o-x3A@mail.gmail.com>

On Wed, Apr 20, 2016 at 6:11 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Koos Zevenhoven writes:
>  > On Tue, Apr 19, 2016 at 2:55 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>  > >
>  > > AFAICS bytes return from __fspath__ is just YAGNI.  Show me something
>  > > that actually wants it.
>  >
>  > It might be,
>
> May I take that as meaning you just jumped to the conclusion that
> extending polymorphism is useful on no actual evidence of usefulness?

No you may not! YAGNI almost never means "you are *never* going to
need it". And if you implement a feature, better implement it well. If
a variation of the feature is rarely used, that is perfectly fine. I
think leaving bytes out would complicate things. If os.fspath does its
job well, everyone should be happy.

I kept bringing up bytes paths, because that is already a feature in
Python 3. Then (already some time ago in these discussions) I briefly
visited the thought of 'can we deprecate bytes paths', and it then
quickly became clear to me that is not going to happen any time soon.

In other words: As long as bytes paths are supported, they should be
supported consistently. I don't want DirEntry to behave differently
when the underlying type is bytes, which is one of the things I've
been talking about all the time. That would just be broken. And as you
also understand, one point is to allow passing DirEntry to open. Or
any of the os.path functions.

And some more: I don't want open(direntry_obj) to ever raise because it
is the bytes flavor of direntry, because, when they are created,
DirEntry objects always point to existing objects on the file system.
I also don't want implicit conversions between str and bytes paths,
because there are cases where they will produce strange results and
exceptions. [Yes, way back in the p-string thread, I did first suggest
a similar thing that implied implicit conversion, but I soon
abandoned that part.]
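
A small sketch of the consistency argued for here. It assumes a Python where os.fspath exists and where os.scandir entries and open() take part in the path protocol (3.6 or later, which postdates this message); `read_entries` is a hypothetical helper name.

```python
import os
import tempfile


def read_entries(directory):
    """Map each file entry's low-level path to its text content."""
    contents = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                # open() takes the DirEntry directly, whichever flavor it is.
                with open(entry) as f:
                    # os.fspath() returns str or bytes as-is: no implicit conversion.
                    contents[os.fspath(entry)] = f.read()
    return contents


with tempfile.TemporaryDirectory() as tmpdir:
    with open(os.path.join(tmpdir, "example.txt"), "w") as f:
        f.write("hello")
    str_result = read_entries(tmpdir)                 # str flavor
    bytes_result = read_entries(os.fsencode(tmpdir))  # bytes flavor

assert all(isinstance(key, str) for key in str_result)
assert all(isinstance(key, bytes) for key in bytes_result)
```

The same function body serves both flavors, and the type of the path that comes out always matches the type that went in.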

Not that I will ever use these features---just to do this right.

>  > but as long as bytes paths are supported polymorphically all over the
>  > stdlib, we won't get rid of supporting bytes paths. So are you
>  > proposing to deprecate bytes paths?
>
> You claim "almost always want str", Ethan claims "bias against bytes."
> Sorry, guys, you can't have it both ways.  Either bytes paths are
> discouraged (not "deprecated", not yet), or they aren't.
>
> I say, let's not encourage them.

It's all essentially the same thing:

"almost always want str":
Yes, I still claim this. This is the reason for str (and rejecting
bytes) being the default for third-party code. If we wanted to, we
could even leave bytes support out of the documentation, so no-one
will know about it unless they already deal with bytes paths. However,
I don't think we should do that---we should just strongly discourage
using the bytes version unless there is a reason to, and you know what
you are doing.

"bias against bytes":
I agree with this too. This is in line with making str (and rejecting
bytes) the default for third-party code.

"let's not encourage them":
And I even agree with this, as you may have noticed.

I just don't believe in deliberately making implementations awkward
for the bytes-based paths. Bytes paths already exist, not because of
Python 2 (as you know), but because not all operating systems
guarantee that paths make sense in any encoding, and people may need
to work at that level.

There is no need to make working with bytes-based paths awkward, and
we can support them with little additional work compared to supporting
str-based rich path objects. The additional work is mostly this
discussion.

> Ie, keep the status quo for bytes,
> and make things better for the preferred str.  Yes, that means
> discouraging bytes relative to str in this context.  That's a Python 3
> principle, one strong enough to justify the huge compatibility break
> involved in making str be Unicode.  That compatibility break has been
> extremely successful in my personal experience as a sometime Python
> teacher and Mailman developer, though the Mercurial developers have a
> different POV.

Yes. Luckily, people are already using str-based paths. We don't need
any more discrete transitions. If Linux starts to enforce an
encoding, as Guido and Random832 may be suggesting on python-ideas,
these already obscure bytes paths will slowly fade away.

-Koos

From k7hoven at gmail.com  Wed Apr 20 06:19:50 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Wed, 20 Apr 2016 13:19:50 +0300
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W49ARd5WBqGuk7EccahML6Ze4cvqhhh+Q42Z7aDTmKbOA@mail.gmail.com>
 <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W4=rXouXP2bVLEzKppXMdVUHqq46FreRmUbgz=_uy9kwA@mail.gmail.com>
 <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CAMiohoj1jiVFrhr1Xtufnk2s7yeEGuo1xE=6Yg2QeHjTUgGZ7w@mail.gmail.com>

On Wed, Apr 20, 2016 at 6:16 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>
> (1) some really attractive producer of pathlib.Paths will be
>     published, and
>

Yes, pathlib is str-only, so this sounds just right.

> (2) people will want to plug that producer into their bytes paths
>     consumers using os.fspath(path) "and be done with it".
>

No, fspath can't know that is the right thing to do.  There should
be *someone* who is aware of the encoding that happens, either the
provider or the consumer. That bytes path consumer, assuming it wants
to support the behavior you describe, should use os.fsencode instead
of os.fspath, which will do exactly what you want, and is just as easy
for the bytes path consumer to implement!

(Unless you want to explicitly reject plain str objects, which you
would then indeed do *explicitly*, but I'm not sure there is a point
in accepting plain bytes and str-based pathlib objects but not str).
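
A sketch of the distinction drawn above, assuming a Python where os.fspath exists and os.fsencode accepts path objects (3.6 or later). `consume_bytes_path` is a hypothetical consumer, not part of any library.

```python
import os
import pathlib


def consume_bytes_path(path):
    """A bytes-path consumer that explicitly opts in to encoding str-based paths."""
    # The consumer is the party that knows encoding happens here.
    return os.fsencode(path)


p = pathlib.PurePosixPath("/tmp/spam.txt")

# fspath only extracts the underlying representation; a pathlib path stays str.
assert os.fspath(p) == "/tmp/spam.txt"

# fsencode coerces str, bytes and path objects alike to bytes.
assert consume_bytes_path(p) == b"/tmp/spam.txt"
assert consume_bytes_path("/tmp/spam.txt") == b"/tmp/spam.txt"
assert consume_bytes_path(b"/tmp/spam.txt") == b"/tmp/spam.txt"
```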

To avoid further unnecessary discussion, please read [1] carefully,
where I already explained this, among other things.

-Koos

[1] https://mail.python.org/pipermail/python-dev/2016-April/144239.html

From victor.stinner at gmail.com  Wed Apr 20 07:52:22 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 20 Apr 2016 13:52:22 +0200
Subject: [Python-Dev] Discussion on fspath: please wait for a PEP
Message-ID: <CAMpsgwYkr-bZcmOZJeMMgwETJiK3z91yv7rn7JH5zMWHrZ7ejA@mail.gmail.com>

Hi,

I'm unable to count the number of threads about the fspath protocol.
It's even more difficult to count the total number of emails. IMHO
everyone has had enough time to give his/her opinion. We even had multiple
summaries :-)

Can you please wait for a PEP? Brett Cannon and Ethan Furman are
working on a PEP. So please give them time to write it.

The PEP should summarize the discussion and help a lot to make
concrete progress on the design (avoid restarting to discuss the same
points forever). I don't expect that more emails would add anything at
the current state of the discussion.

I think that we have enough other topics to discuss in the meanwhile ;-)

FYI there is already an article about fspath/pathlib on LWN. Here is a
free link until the article is freely accessible:

"Python looks at paths" By Jake Edge (April 13, 2016)
https://lwn.net/SubscriberLink/683350/4f52334af09653c8/

Victor

From k7hoven at gmail.com  Wed Apr 20 09:30:39 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Wed, 20 Apr 2016 16:30:39 +0300
Subject: [Python-Dev] Pathlib enhancements - improve fsdecode and
 fsencode
In-Reply-To: <22287.16117.707682.669635@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16117.707682.669635@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CAMiohoi32k46BufnLcdCnT1-rJuSkdGVWFxE_P3Pnb3cpLyH=Q@mail.gmail.com>

On Thu, Apr 14, 2016 at 9:55 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Please please please, junk both "filter out bytes" proposals.

If you were referring to some of the fspath versions, I think we will
need a bytes-rejecting version, for reasons explained in [1-2]. Of
course not everyone wants or has to use it.

> Since they involve an exception, they impose an unnecessary "try" on
> all text applications that fear death on bytes returns.  May as well
> just wrap all objects with __fspath__ in fsdecode, and all is
> happy.
>
> Counterproposal: make fsdecode and fsencode grok __fspath__.  Then:

Not being a native English speaker, I'm relying on a Wikipedia
explanation of "grok", but if you mean that fsdecode and fsencode
would accept objects that implement __fspath__, then I think we all
agree on this. Making the stdlib accept path objects, after all, is
the whole point of the pathlib discussions :).

Anyway, I am happy that Nick [3] (and you [4] ?) pointed out that
os.fsencode and os.fsdecode currently implement coercion, i.e., they
both accept both str and bytes, and return just one of them. This was
important for my conclusion in [1]. When these two functions are made
__fspath__ compatible using `fspath(patharg, output_types = (str,
bytes))`, like most os functions, they will indeed implement coercion
to bytes or str from "any pathlike object".
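
A pure-Python sketch of the `fspath(patharg, output_types=...)` idea mentioned above. This is one reading of the proposal under discussion, not an existing function: the name `output_types`, the str-only default, and the `BytesEntry` class are all assumptions made for illustration.

```python
def fspath(path, output_types=(str,)):
    """Hypothetical: return the str/bytes form of path, rejecting unwanted types."""
    if isinstance(path, (str, bytes)):
        result = path
    else:
        method = getattr(type(path), "__fspath__", None)
        if method is None:
            raise TypeError("expected str, bytes or a path object, not "
                            + type(path).__name__)
        result = method(path)
        if not isinstance(result, (str, bytes)):
            raise TypeError("__fspath__ must return str or bytes, not "
                            + type(result).__name__)
    # No encoding or decoding happens here: the result is accepted or refused.
    if not isinstance(result, output_types):
        raise TypeError("paths of type " + type(result).__name__
                        + " are not accepted here")
    return result


class BytesEntry:
    """Hypothetical path object whose underlying representation is bytes."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


entry = BytesEntry(b"spam.txt")
assert fspath("spam.txt") == "spam.txt"
assert fspath(entry, output_types=(str, bytes)) == b"spam.txt"
try:
    fspath(entry)  # the str-only default rejects the bytes flavor
except TypeError as exc:
    print("rejected:", exc)
```

With this shape, str-only code gets an exception on accidental bytes, while polymorphic stdlib-style code opts in to both types explicitly.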

[Side note: One may, for instance, ask why os.fsdecode passes str
objects through silently, even if they can't be decoded. Well, that's
the way it is, and I'm not expecting that to change. But maybe
fsdecode should have an additional keyword-only argument to tell it
that it should strictly return something it actually did decode. (And
similarly for os.fsencode.) But this has nothing to do with the path
protocol we are discussing.]
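
To illustrate the side note: the assertions below show the existing coercion behavior of os.fsdecode and os.fsencode, and the wrapper sketches the kind of keyword-only argument floated above. The wrapper and its `strict` parameter are hypothetical; no such argument exists in the os module.

```python
import os

# Existing behavior: both functions accept str and bytes and return one type.
assert os.fsdecode(b"spam.txt") == "spam.txt"
assert os.fsdecode("spam.txt") == "spam.txt"    # str passes through untouched
assert os.fsencode("spam.txt") == b"spam.txt"
assert os.fsencode(b"spam.txt") == b"spam.txt"  # bytes pass through untouched


def fsdecode(filename, *, strict=False):
    """Hypothetical wrapper: with strict=True, only return what was actually decoded."""
    if strict and not isinstance(filename, bytes):
        raise TypeError("strict fsdecode expects bytes, not "
                        + type(filename).__name__)
    return os.fsdecode(filename)


assert fsdecode("spam.txt") == "spam.txt"
assert fsdecode(b"spam.txt", strict=True) == "spam.txt"
```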

> (1) Bytes-lovers and str-addicts are both safe.

I don't think everyone is safe if you can't say "I don't want implicit
encoding/decoding".

> (2) They can omit fspath, too!

I think having *one* additional function for the
non-encoding/non-decoding cases is too much, and as shown in [1], one
is enough.

> No, that doesn't work if the bytes objects aren't in the file system
> encoding, but these are *bytes*, mon ami: you have no way to find out
> what that encoding is, so you either know already and you substitute
> that + fspath for fsdecode, or you're hosed.  And in the only concrete
> use case so far, fsdecode Just Works.

Well, as you say yourself, fsdecode indeed works if your bytes are in
the default fs encoding, and when you know they are, go for it, use
fsdecode. But I, for instance, rarely have my paths as bytes.
Therefore, I would be happy to get an exception if I'm accidentally
passing bytes to some non-bytes-supporting function because I've
forgotten to decode some input that I got in an encoding other than
the file system encoding.

> I suppose a similar argument holds for applications that want bytes
> and fsencode, but I leave that as an exercise for the reader.

A similar counterargument holds, too :).

Unrelated to this particular post, I believe these discussions are
almost done and I truly hope we at least won't have to keep addressing
the same questions that we have already gone through, unless there is
something new on the table.

I hope it takes a shorter time to read these emails than it takes to
write them :).

-Koos

[1] https://mail.python.org/pipermail/python-dev/2016-April/144239.html
[2] https://mail.python.org/pipermail/python-dev/2016-April/144290.html

And somewhat older ones:

[3] https://mail.python.org/pipermail/python-dev/2016-April/144101.html
[4] https://mail.python.org/pipermail/python-dev/2016-April/144107.html

From ethan at stoneleaf.us  Wed Apr 20 09:58:07 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 20 Apr 2016 06:58:07 -0700
Subject: [Python-Dev] Discussion on fspath: please wait for a PEP
In-Reply-To: <CAMpsgwYkr-bZcmOZJeMMgwETJiK3z91yv7rn7JH5zMWHrZ7ejA@mail.gmail.com>
References: <CAMpsgwYkr-bZcmOZJeMMgwETJiK3z91yv7rn7JH5zMWHrZ7ejA@mail.gmail.com>
Message-ID: <57178AEF.7040205@stoneleaf.us>

On 04/20/2016 04:52 AM, Victor Stinner wrote:

> Can you please wait for a PEP? Brett Cannon and Ethan Furman are
> working on a PEP.

Actually, Brett Cannon and Chris Angelico.

 > So please give them time to write it.

Okay, I'll shut-up now.  ;)

--
~Ethan~

From rosuav at gmail.com  Wed Apr 20 10:00:56 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 21 Apr 2016 00:00:56 +1000
Subject: [Python-Dev] Discussion on fspath: please wait for a PEP
In-Reply-To: <57178AEF.7040205@stoneleaf.us>
References: <CAMpsgwYkr-bZcmOZJeMMgwETJiK3z91yv7rn7JH5zMWHrZ7ejA@mail.gmail.com>
 <57178AEF.7040205@stoneleaf.us>
Message-ID: <CAPTjJmq-bqPi=V7jv2zXuo93modC__h1MDhXq95qX8GfKNVR0A@mail.gmail.com>

On Wed, Apr 20, 2016 at 11:58 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 04/20/2016 04:52 AM, Victor Stinner wrote:
>
>> Can you please wait for a PEP? Brett Cannon and Ethan Furman are
>> working on a PEP.
>
>
> Actually, Brett Cannon and Chris Angelico.

I thought just Brett; my half of the proposal (the generic
"string-like" protocol) was withdrawn as being too broad in scope for
the justifying use-cases.

Brett, your turn. :)

ChrisA

From k7hoven at gmail.com  Wed Apr 20 11:27:59 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Wed, 20 Apr 2016 18:27:59 +0300
Subject: [Python-Dev] Discussion on fspath: please wait for a PEP
In-Reply-To: <CAMpsgwYkr-bZcmOZJeMMgwETJiK3z91yv7rn7JH5zMWHrZ7ejA@mail.gmail.com>
References: <CAMpsgwYkr-bZcmOZJeMMgwETJiK3z91yv7rn7JH5zMWHrZ7ejA@mail.gmail.com>
Message-ID: <CAMiohojHnor0nm4W=YZ-X=AS5umf2dBaP7NULnqbcbucpmBmMw@mail.gmail.com>

On Wed, Apr 20, 2016 at 2:52 PM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> Hi,
>
> I'm unable to count the number of threads about the fspath protocol.
> It's even more difficult to count the total number of emails. IMHO
> everyone has had enough time to give his/her opinion.

Couldn't agree more.

> We even had multiple
> summaries :-)

I'm not quite as sure about this. Maybe the meaning of "summary" in
the subculture of python lists is different from the one I know.

> Can you please wait for a PEP? Brett Cannon and Ethan Furman are
> working on a PEP. So please give them time to write it.

I wonder what happened there...

> The PEP should summarize the discussion and help a lot to make
> concrete progress on the design (avoid restarting to discuss the same
> points forever). I don't expect that more emails would add anything at
> the current state of the discussion.

Again, agreed, and this part makes me feel relieved. Personally, I got
tired of the discussion a long time ago, but felt it had to be
finished.

> I think that we have enough other topics to discuss in the meanwhile ;-)

No doubt about that.

> FYI there is already an article about fspath/pathlib on LWN. Here is a
> free link until the article is freely accessible:
>
> "Python looks at paths" By Jake Edge (April 13, 2016)
> https://lwn.net/SubscriberLink/683350/4f52334af09653c8/

Wow. Wasn't expecting that. A whole story about the notorious "path
discussions"! (well, up to some date). Anyway, the beginning seems
fairly accurate, but then, among other things, it fails to mention
this for example:

https://mail.python.org/pipermail/python-ideas/2016-March/039179.html
https://mail.python.org/pipermail/python-ideas/2016-April/039595.html

Since I did not get any responses to that suggestion, it felt like a
dead end, and I continued experimenting with other things and ended up
taking the approach of "subclassing path-types from str gives more
complete pathlib support, but the objects should not pretend to be
strings in every way". By the way, I even implemented this, which I
suppose I failed to mention. Admittedly, it became a little awkward in
the end, but the main point was to provide a smooth transition from a
str world to a PurePath-subclass world (as opposed to a discrete one
like Py3k).

While I was working on that, the discussions on -dev seemed to have
reopened the gate at exactly that 'dead end' I mentioned before, and
had started to step through it.

-Koos

>
> Victor
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/k7hoven%40gmail.com

From ethan at stoneleaf.us  Wed Apr 20 11:58:50 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 20 Apr 2016 08:58:50 -0700
Subject: [Python-Dev] Discussion on fspath: please wait for a PEP
In-Reply-To: <CAMpsgwYkr-bZcmOZJeMMgwETJiK3z91yv7rn7JH5zMWHrZ7ejA@mail.gmail.com>
References: <CAMpsgwYkr-bZcmOZJeMMgwETJiK3z91yv7rn7JH5zMWHrZ7ejA@mail.gmail.com>
Message-ID: <5717A73A.3070106@stoneleaf.us>

On 04/20/2016 04:52 AM, Victor Stinner wrote:

> FYI there is already an article about fspath/pathlib on LWN. Here is a
> free link until the article is freely accessible:
>
> "Python looks at paths" By Jake Edge (April 13, 2016)
> https://lwn.net/SubscriberLink/683350/4f52334af09653c8/

Nice article, thanks for sharing!

--
~Ethan~


From brett at python.org  Wed Apr 20 12:12:01 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 20 Apr 2016 16:12:01 +0000
Subject: [Python-Dev] Discussion on fspath: please wait for a PEP
In-Reply-To: <CAPTjJmq-bqPi=V7jv2zXuo93modC__h1MDhXq95qX8GfKNVR0A@mail.gmail.com>
References: <CAMpsgwYkr-bZcmOZJeMMgwETJiK3z91yv7rn7JH5zMWHrZ7ejA@mail.gmail.com>
 <57178AEF.7040205@stoneleaf.us>
 <CAPTjJmq-bqPi=V7jv2zXuo93modC__h1MDhXq95qX8GfKNVR0A@mail.gmail.com>
Message-ID: <CAP1=2W50YY=HPbtSfaeSnm-ps5eLz5hKi43AG-zNzYAr3oqBRw@mail.gmail.com>

On Wed, 20 Apr 2016 at 07:07 Chris Angelico <rosuav at gmail.com> wrote:

> On Wed, Apr 20, 2016 at 11:58 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> > On 04/20/2016 04:52 AM, Victor Stinner wrote:
> >
> >> Can you please wait for a PEP? Brett Cannon and Ethan Furman are
> >> working on a PEP.
>

I was actually going to send this email when I got in to work today, but
Victor and timezones beat me to it. :)


> >
> >
> > Actually, Brett Cannon and Chris Angelico.
>
> I thought just Brett; my half of the proposal (the generic
> "string-like" protocol) was withdrawn as being too broad in scope for
> the justifying use-cases.
>
> Brett, your turn. :)
>

I thought Chris and I w/ Ethan helping with coding, but if it's just me for
the PEP then that's fine; luckily my firefighter gear is well-worn:
https://goo.gl/photos/R8oWdLE45d99ebaw8

I'll try to get a PEP draft written and posted prior to PyCon US. I will
reply to any dangling comments/issues that have appeared overnight to close
those threads, but otherwise I will start ignoring all discussions so I can
focus on the PEP. Everyone can now consider themselves spared from any
further path-related discussions. :)

From victor.stinner at gmail.com  Wed Apr 20 12:34:54 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 20 Apr 2016 18:34:54 +0200
Subject: [Python-Dev] Discussion on fspath: please wait for a PEP
In-Reply-To: <CAP1=2W50YY=HPbtSfaeSnm-ps5eLz5hKi43AG-zNzYAr3oqBRw@mail.gmail.com>
References: <CAMpsgwYkr-bZcmOZJeMMgwETJiK3z91yv7rn7JH5zMWHrZ7ejA@mail.gmail.com>
 <57178AEF.7040205@stoneleaf.us>
 <CAPTjJmq-bqPi=V7jv2zXuo93modC__h1MDhXq95qX8GfKNVR0A@mail.gmail.com>
 <CAP1=2W50YY=HPbtSfaeSnm-ps5eLz5hKi43AG-zNzYAr3oqBRw@mail.gmail.com>
Message-ID: <CAMpsgwaGnwWKv6UbyHsnoNAJZz1Cfhdcwd2-xGwgTyQg4vj4eQ@mail.gmail.com>

Hi,

2016-04-20 18:12 GMT+02:00 Brett Cannon <brett at python.org>:
>> >> Can you please wait for a PEP? Brett Cannon and Ethan Furman are
>> >> working on a PEP.
>
> I was actually going to send this email when I got in to work today, but
> Victor and timezones beat me to it. :)

Ha ha, bitten by the french connection!

> I thought Chris and I w/ Ethan helping with coding, but if it's just me for
> the PEP then that's fine; luckily my firefighter gear is well-worn:
> https://goo.gl/photos/R8oWdLE45d99ebaw8

LOL, it seems appropriate for this topic...

> I'll try to get a PEP draft written and posted prior to PyCon US. I will
> reply to any dangling comments/issues that have appeared overnight to close
> those threads, but otherwise I will start ignoring all discussions so I can
> focus on the PEP. Everyone can now consider themselves spared from any
> further path-related discussions. :)

I hesitated to propose to create a fspath-sig mailing list, but I suck
at humor and so I skipped this joke in my email ;-)

Victor

From k7hoven at gmail.com  Wed Apr 20 13:51:16 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Wed, 20 Apr 2016 20:51:16 +0300
Subject: [Python-Dev] Discussion on fspath: please wait for a PEP
In-Reply-To: <CAMpsgwaGnwWKv6UbyHsnoNAJZz1Cfhdcwd2-xGwgTyQg4vj4eQ@mail.gmail.com>
References: <CAMpsgwYkr-bZcmOZJeMMgwETJiK3z91yv7rn7JH5zMWHrZ7ejA@mail.gmail.com>
 <57178AEF.7040205@stoneleaf.us>
 <CAPTjJmq-bqPi=V7jv2zXuo93modC__h1MDhXq95qX8GfKNVR0A@mail.gmail.com>
 <CAP1=2W50YY=HPbtSfaeSnm-ps5eLz5hKi43AG-zNzYAr3oqBRw@mail.gmail.com>
 <CAMpsgwaGnwWKv6UbyHsnoNAJZz1Cfhdcwd2-xGwgTyQg4vj4eQ@mail.gmail.com>
Message-ID: <CAMiohoi36jQqGWAwPREtS1QFdd09fzTa2d0_xADagvBn7R2QXg@mail.gmail.com>

On Wed, Apr 20, 2016 at 7:34 PM, Victor Stinner
<victor.stinner at gmail.com> wrote:
>
> 2016-04-20 18:12 GMT+02:00 Brett Cannon <brett at python.org>:
>>
>> I thought Chris and I w/ Ethan helping with coding, but if it's just me for
>> the PEP then that's fine;

Well, just in case you didn't notice this on python-ideas, I offered
to work on the PEP in case there turns out to be one. This was when
Guido had asked if there is going to be a PEP, in response to my "Type
hinting for path-related functions" email. That offer is certainly
still valid.

>> luckily my firefighter gear is well-worn:
>> https://goo.gl/photos/R8oWdLE45d99ebaw8
>
> LOL, it seems appropriate for this topic...

It sure has been flammable XD

>> I'll try to get a PEP draft written and posted prior to PyCon US. I will
>> reply to any dangling comments/issues that have appeared overnight to close
>> those threads, but otherwise I will start ignoring all discussions so I can
>> focus on the PEP. Everyone can now consider themselves spared from any
>> further path-related discussions. :)

Yes, going from endless discussion to PEP seems like a very healthy
direction at this point.

> I hesitated to propose to create a fspath-sig mailing list, but I suck
> at humor and so I skipped this joke in my email ;-)

You did not have to tell that joke, the joke was present all the time ;).

> Victor

From brett at python.org  Wed Apr 20 14:21:30 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 20 Apr 2016 18:21:30 +0000
Subject: [Python-Dev] Discussion on fspath: please wait for a PEP
In-Reply-To: <CAMiohoi36jQqGWAwPREtS1QFdd09fzTa2d0_xADagvBn7R2QXg@mail.gmail.com>
References: <CAMpsgwYkr-bZcmOZJeMMgwETJiK3z91yv7rn7JH5zMWHrZ7ejA@mail.gmail.com>
 <57178AEF.7040205@stoneleaf.us>
 <CAPTjJmq-bqPi=V7jv2zXuo93modC__h1MDhXq95qX8GfKNVR0A@mail.gmail.com>
 <CAP1=2W50YY=HPbtSfaeSnm-ps5eLz5hKi43AG-zNzYAr3oqBRw@mail.gmail.com>
 <CAMpsgwaGnwWKv6UbyHsnoNAJZz1Cfhdcwd2-xGwgTyQg4vj4eQ@mail.gmail.com>
 <CAMiohoi36jQqGWAwPREtS1QFdd09fzTa2d0_xADagvBn7R2QXg@mail.gmail.com>
Message-ID: <CAP1=2W6_umhqtO-ogxX8uW7v9FXfUrQ6SmpRW4+ShyBKhFc2vQ@mail.gmail.com>

On Wed, 20 Apr 2016 at 10:51 Koos Zevenhoven <k7hoven at gmail.com> wrote:

> On Wed, Apr 20, 2016 at 7:34 PM, Victor Stinner
> <victor.stinner at gmail.com> wrote:
> >
> > 2016-04-20 18:12 GMT+02:00 Brett Cannon <brett at python.org>:
> >>
> >> I thought Chris and I w/ Ethan helping with coding, but if it's just me
> for
> >> the PEP then that's fine;
>
> Well, just in case you didn't notice this on python-ideas, I offered
> to work on the PEP in case there turns out to be one. This was when
> Guido had asked if there is going to be a PEP, in response to my "Type
> hinting for path-related functions" email. That offer is certainly
> still valid.
>

I'm going to host the draft PEP at
https://github.com/brettcannon/path-pep/blob/master/pep-0NNN.rst . Let me
get a basic first draft done so there is something to work off of and if
you still want to be co-author I'll be more than happy to add you as a
co-author to help me finish it!

-Brett


>
> >> luckily my firefighter gear is well-worn:
> >> https://goo.gl/photos/R8oWdLE45d99ebaw8
> >
> > LOL, it seems appropriate for this topic...
>
> It sure has been flammable XD
>
> >> I'll try to get a PEP draft written and posted prior to PyCon US. I will
> >> reply to any dangling comments/issues that have appeared overnight to
> close
> >> those threads, but otherwise I will start ignoring all discussions so I
> can
> >> focus on the PEP. Everyone can now consider themselves spared from any
> >> further path-related discussions. :)
>
> Yes, going from endless discussion to PEP seems like a very healthy
> direction at this point.
>
> > I hesitated to propose to create a fspath-sig mailing list, but I suck
> > at humor and so I skipped this joke in my email ;-)
>
> You did not have to tell that joke, the joke was present all the time ;).
>
> > Victor

From victor.stinner at gmail.com  Wed Apr 20 19:34:55 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 21 Apr 2016 01:34:55 +0200
Subject: [Python-Dev] PEP 509: Add a private version to dict (version 3)
In-Reply-To: <CAMpsgwa_==3yMjNWWFOM=xZi1Zeg_TW3gNJWf6sc1XmbzZofrQ@mail.gmail.com>
References: <CAMpsgwa_==3yMjNWWFOM=xZi1Zeg_TW3gNJWf6sc1XmbzZofrQ@mail.gmail.com>
Message-ID: <CAMpsgwbh_pwz9Cv1x7NZMC_GhKJgUfPsW6Xv0zG3LHt4BgtjGw@mail.gmail.com>

Hi,

Guido van Rossum and Jim J. Jewett suggested that I *not require* the
dict version to always be increased when a dict method does not modify
the content. I modified the Changes section to only require that the
version is increased when the dictionary content is modified.

I also explained the nice side effect of giving two different empty
dictionaries distinct versions: a guard can check whether a namespace
(dictionary) was replaced without holding a strong reference to it.
Yury Selivanov's opcode cache uses this property to avoid a strong
reference to the builtin and global namespaces. It's also a written
reply to Armin Rigo's suggestion to use version 0 for empty
dictionaries (a new empty dict or after dict.clear()).

The modified Changes section:

Changes
=======

Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
the C type ``PY_UINT64_T``, a 64-bit unsigned integer. Also add a global
dictionary version.

Each time a dictionary is created, the global version is incremented and
the dictionary version is initialized to the global version.

Each time the dictionary content is modified, the global version must be
incremented and copied to the dictionary version. Dictionary methods
which can modify its content:

* ``clear()``
* ``pop(key)``
* ``popitem()``
* ``setdefault(key, value)``
* ``__delitem__(key)``
* ``__setitem__(key, value)``
* ``update(...)``

Whether to increase the version when a dictionary method does not
change the content is left to the Python implementation. A Python
implementation can decide not to increase the version to avoid
dictionary lookups in guards. Examples of cases where dictionary methods
do not modify the content:

* ``clear()`` if the dict is already empty
* ``pop(key)`` if the key does not exist
* ``popitem()`` if the dict is empty
* ``setdefault(key, value)`` if the key already exists
* ``__delitem__(key)`` if the key does not exist
* ``__setitem__(key, value)`` if the new value is identical to the
  current value
* ``update()`` if called without argument or if new values are identical
  to current values

Setting a key to a new value that is equal to the old value, but is not
the same object, is also considered an operation modifying the
dictionary content.

Two different empty dictionaries must have a different version to be
able to identify a dictionary just by its version. This makes it
possible to verify in a guard that a namespace was not replaced without
storing a strong reference to the dictionary. Using a borrowed
reference does not work:
if the old dictionary is destroyed, it is possible that a new dictionary
is allocated at the same memory address. By the way, dictionaries don't
support weak references.
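
The versioning rules and the guard use case above can be sketched in
pure Python. This is only an illustration under stated assumptions: the
``VersionedDict`` and ``NamespaceGuard`` classes, the ``version_tag``
attribute and the ``_global_version`` counter are hypothetical names,
only three mutators are covered, and the actual proposal lives in the C
``PyDictObject`` structure.

```python
import itertools

# Hypothetical process-wide counter standing in for the global dict version.
_global_version = itertools.count(1)


class VersionedDict(dict):
    """Toy dict with a version tag; only a few mutators are covered."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Every new dictionary gets a fresh version, even an empty one.
        self.version_tag = next(_global_version)

    def _bump(self):
        self.version_tag = next(_global_version)

    def __setitem__(self, key, value):
        # Implementation choice: skip the bump only when the very same
        # object is stored again (identity, not equality).
        unchanged = key in self and self[key] is value
        super().__setitem__(key, value)
        if not unchanged:
            self._bump()

    def __delitem__(self, key):
        super().__delitem__(key)  # raises KeyError before any bump
        self._bump()

    def clear(self):
        if self:  # clearing an empty dict is not a modification
            super().clear()
            self._bump()


class NamespaceGuard:
    """Remembers only the version tag, never a reference to the dict."""

    def __init__(self, namespace):
        self.expected_tag = namespace.version_tag

    def check(self, namespace):
        return namespace.version_tag == self.expected_tag


namespace = VersionedDict(x=1)
guard = NamespaceGuard(namespace)
still_valid = guard.check(namespace)          # nothing changed yet
namespace["x"] = 2                            # content modified
valid_after_change = guard.check(namespace)   # guard now fails
valid_for_other = guard.check(VersionedDict())  # replaced namespace also fails
```

Because the guard stores an integer rather than the dictionary itself,
it cannot keep an old namespace alive, and a freshly created dictionary
can never be mistaken for the old one in this sketch.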

The version increase must be atomic. In CPython, the Global Interpreter
Lock (GIL) already protects ``dict`` methods to make changes atomic.

Example using a hypothetical ``dict_get_version(dict)`` function::

    >>> d = {}
    >>> dict_get_version(d)
    100
    >>> d['key'] = 'value'
    >>> dict_get_version(d)
    101
    >>> d['key'] = 'new value'
    >>> dict_get_version(d)
    102
    >>> del d['key']
    >>> dict_get_version(d)
    103

The field is called ``ma_version_tag``, rather than ``ma_version``, to
suggest comparing it using ``version_tag == old_version_tag``, rather
than ``version <= old_version``, which becomes wrong after an integer
overflow.
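
A small sketch of why equality is the safe comparison. It emulates a
64-bit unsigned counter with a mask; ``MASK64`` and ``next_version`` are
illustrative names, not part of the proposal.

```python
MASK64 = (1 << 64) - 1  # largest value of a 64-bit unsigned integer


def next_version(version):
    """Increment with wrap-around, as an unsigned 64-bit field would."""
    return (version + 1) & MASK64


old_tag = MASK64                  # tag recorded by a guard just before overflow
new_tag = next_version(old_tag)   # one modification later the counter wraps to 0

# Equality correctly reports that the dictionary changed.
unchanged_by_equality = new_tag == old_tag

# Ordering wrongly reports "not changed" because 0 <= 2**64 - 1.
unchanged_by_ordering = new_tag <= old_tag
```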

--
Victor

From burkhardameier at gmail.com  Thu Apr 21 07:00:29 2016
From: burkhardameier at gmail.com (Burkhard Meier)
Date: Thu, 21 Apr 2016 04:00:29 -0700
Subject: [Python-Dev] When the infamous Bier trunk hits ... where is our
 Python backup?
Message-ID: <CACKxkAyXqzLBMRLOKSPfZGR=xNHYKGrETPtczTcz2cD5Qg6ysg@mail.gmail.com>

Well,

Just a thought.

From steve at pearwood.info  Thu Apr 21 07:32:16 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 21 Apr 2016 21:32:16 +1000
Subject: [Python-Dev] When the infamous Bier trunk hits ... where is our
 Python backup?
In-Reply-To: <CACKxkAyXqzLBMRLOKSPfZGR=xNHYKGrETPtczTcz2cD5Qg6ysg@mail.gmail.com>
References: <CACKxkAyXqzLBMRLOKSPfZGR=xNHYKGrETPtczTcz2cD5Qg6ysg@mail.gmail.com>
Message-ID: <20160421113214.GH1819@ando.pearwood.info>

On Thu, Apr 21, 2016 at 04:00:29AM -0700, Burkhard Meier wrote:
> Well,
> 
> Just a thought.

I'm afraid I have no idea what you are referring to.


-- 
Steve

From burkhardameier at gmail.com  Thu Apr 21 07:41:29 2016
From: burkhardameier at gmail.com (Burkhard Meier)
Date: Thu, 21 Apr 2016 04:41:29 -0700
Subject: [Python-Dev] When the infamous Bier trunk hits ... where is our
 Python backup?
In-Reply-To: <20160421113214.GH1819@ando.pearwood.info>
References: <CACKxkAyXqzLBMRLOKSPfZGR=xNHYKGrETPtczTcz2cD5Qg6ysg@mail.gmail.com>
 <20160421113214.GH1819@ando.pearwood.info>
Message-ID: <CACKxkAx-XeKyHYkf5M1exkZJKCggc3w+fexpsz-y6EiF5_ohpw@mail.gmail.com>

Don't be afraid.

This is just CEO talk.

Let's imagine Python without a leader.

All commercial companies...well ... are we free?

Burkhard



On Thursday, April 21, 2016, Steven D'Aprano <steve at pearwood.info> wrote:

> On Thu, Apr 21, 2016 at 04:00:29AM -0700, Burkhard Meier wrote:
> > Well,
> >
> > Just a thought.
>
> I'm afraid I have no idea what you are referring to.
>
>
> --
> Steve

From burkhardameier at gmail.com  Thu Apr 21 07:54:32 2016
From: burkhardameier at gmail.com (Burkhard Meier)
Date: Thu, 21 Apr 2016 04:54:32 -0700
Subject: [Python-Dev] I hope this won't be my last comment here ~ yet it may
 well be...
Message-ID: <CACKxkAxsB9hfCa_52THge=voZdPCc018jkboBTnK-SQr8siqcw@mail.gmail.com>

Please do allow me to share my humble experiences of being a software
professional on a Windows platform.

Almost 20 years.

You know what: when I tried out 'sugar Linux' or Peppermint... the "admin"
dude kicked me out 5 times in one sole eve.

Maybe this is just *me*..

You know what: I did have my time with this *open source community*...

I was just asking a sincere question.

C'mon

This was rather very ridiculous.

From brian at python.org  Thu Apr 21 08:36:30 2016
From: brian at python.org (Brian Curtin)
Date: Thu, 21 Apr 2016 08:36:30 -0400
Subject: [Python-Dev] I hope this won't be my last comment here ~ yet it
 may well be...
In-Reply-To: <CACKxkAxsB9hfCa_52THge=voZdPCc018jkboBTnK-SQr8siqcw@mail.gmail.com>
References: <CACKxkAxsB9hfCa_52THge=voZdPCc018jkboBTnK-SQr8siqcw@mail.gmail.com>
Message-ID: <CAD+XWwppZw71K_=_d-EHCFD=+P3SFtoibRCYLoT4Y6oVbPjjQg@mail.gmail.com>

On Thursday, April 21, 2016, Burkhard Meier <burkhardameier at gmail.com>
wrote:

> Please do allow me to share my humble experiences of being a software
> professional on a Windows platform.
>
> Almost 20 years.
>
> You know what; when I tried out 'sugar Linux' or Peppermint,,,the "admin'
> dude kicked me out 5 times in one sole eve,
>
> Maybe this is just *me*..
>
> You know what: I did have my time with this *open source community*...
>
> I was just asking a sincere question.
>
> C'mon
>
> This was rather very ridiculous.
>
>
>
As someone who spent many years as a Windows user and several years as a
contributor to the Windows build here, if you have constructive thoughts to
share on Python-on-Windows, please share them...but I can't decipher what
any of this message is actually about.

Additionally, you may want to try the python-list mailing list.

From rosuav at gmail.com  Thu Apr 21 08:43:42 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 21 Apr 2016 22:43:42 +1000
Subject: [Python-Dev] When the infamous Bier trunk hits ... where is our
 Python backup?
In-Reply-To: <CACKxkAx-XeKyHYkf5M1exkZJKCggc3w+fexpsz-y6EiF5_ohpw@mail.gmail.com>
References: <CACKxkAyXqzLBMRLOKSPfZGR=xNHYKGrETPtczTcz2cD5Qg6ysg@mail.gmail.com>
 <20160421113214.GH1819@ando.pearwood.info>
 <CACKxkAx-XeKyHYkf5M1exkZJKCggc3w+fexpsz-y6EiF5_ohpw@mail.gmail.com>
Message-ID: <CAPTjJmqR_N4frF+KHOh8uBGZyS1-ESQAfd7AQc_UaSdLJMX34Q@mail.gmail.com>

On Thu, Apr 21, 2016 at 9:41 PM, Burkhard Meier
<burkhardameier at gmail.com> wrote:
> Don't be afraid.
>
> This is just CEO talk.
>
> Let's imagine Python without a leader.
>
> All commercial companies...well ... are we free?
>

I still have no clue what you're talking about. Every project has a
leader. If Guido dies, goes insane [1], or gets bored with the Python
project, someone else can and will take over.

ChrisA
[1] More than usual, I mean. It has to be bad enough that we'd notice.

From ncoghlan at gmail.com  Thu Apr 21 09:13:38 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 21 Apr 2016 23:13:38 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W49ARd5WBqGuk7EccahML6Ze4cvqhhh+Q42Z7aDTmKbOA@mail.gmail.com>
 <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W4=rXouXP2bVLEzKppXMdVUHqq46FreRmUbgz=_uy9kwA@mail.gmail.com>
 <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7cdBDeVhDh1tRdeX5bp=kFH78KM+Mo+E7rq2XuQ5NR-qA@mail.gmail.com>

On 20 April 2016 at 13:16, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> What's left is DirEntry (and perhaps other producers of byte-oriented
> objects in os and os.path).  If they're currently using DirEntry,
> they're currently accessing .path.  Surely bytes users can continue
> doing that, even if we offer str users the advantage of new protocols?
>

The consuming functions aren't currently allowing DirEntry objects either
(since scandir is even newer than pathlib), so we want to allow both
pathlib and DirEntry objects with a single change to consuming functions.

I'd like to see that change in consuming functions be as simple as
possible: an unconditional "path = os._raw_fspath(path)" at the start of
their existing input processing.

Those consuming functions fall into one of three categories:

1. They're bytes/str polymorphic
2. They're bytes only
3. They're str only

Whichever category they're in, their existing argument processing will be
readily able to cope with a polymorphic result from os._raw_fspath, since
that's no different from today, where the argument passed in may be bytes
or str and they need to handle that appropriately.

Having os.fspath(path) as a specifically str-only layer then gives
consuming functions in category 3 an alternative option, and encourages
category 3 functions and APIs (like pathlib) as the future default, without
getting in the way of the folks that need to mess about down in the low
level weeds of operating system interfaces.
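
A rough sketch of the two layers described above. The function names
``raw_fspath`` and ``str_fspath`` and the ``Entry`` class are
hypothetical stand-ins for the proposed os._raw_fspath and os.fspath,
written only to show the split between the polymorphic layer and the
str-only layer:

```python
def raw_fspath(path):
    """Polymorphic layer: returns str or bytes, whichever the object holds."""
    if isinstance(path, (str, bytes)):
        return path
    # Look the method up on the type, as protocol lookups usually do.
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError(
            "expected str, bytes or path-like object, not "
            + type(path).__name__
        )
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ must return str or bytes")
    return result


def str_fspath(path):
    """Str-only layer for category 3 consumers: bytes results are rejected."""
    result = raw_fspath(path)
    if isinstance(result, bytes):
        raise TypeError("bytes paths are not accepted by the str-only API")
    return result


class Entry:
    """Toy path-like object; it may carry a str or a bytes path."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path


text_path = raw_fspath(Entry("spam/eggs.txt"))     # str passes through
bytes_path = raw_fspath(Entry(b"spam/eggs.txt"))   # bytes passes through
strict_path = str_fspath(Entry("spam/eggs.txt"))   # accepted by the str-only layer
```

A consuming function in categories 1 or 2 would call the polymorphic
layer first and then run its existing bytes/str handling unchanged.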

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From barry at python.org  Thu Apr 21 09:20:21 2016
From: barry at python.org (Barry Warsaw)
Date: Thu, 21 Apr 2016 09:20:21 -0400
Subject: [Python-Dev] When the infamous Bier trunk hits ... where is our
 Python backup?
In-Reply-To: <CAPTjJmqR_N4frF+KHOh8uBGZyS1-ESQAfd7AQc_UaSdLJMX34Q@mail.gmail.com>
References: <CACKxkAyXqzLBMRLOKSPfZGR=xNHYKGrETPtczTcz2cD5Qg6ysg@mail.gmail.com>
 <20160421113214.GH1819@ando.pearwood.info>
 <CACKxkAx-XeKyHYkf5M1exkZJKCggc3w+fexpsz-y6EiF5_ohpw@mail.gmail.com>
 <CAPTjJmqR_N4frF+KHOh8uBGZyS1-ESQAfd7AQc_UaSdLJMX34Q@mail.gmail.com>
Message-ID: <20160421092021.23d381da@subdivisions.wooz.org>

On Apr 21, 2016, at 10:43 PM, Chris Angelico wrote:

>I still have no clue what you're talking about. Every project has a
>leader. If Guido dies, goes insane [1], or gets bored with the Python
>project, someone else can and will take over.

Fortunately, the Python Secret Underground (PSU) which most emphatically does
not exist, has a succession plan involving three questions and holy gr


From ncoghlan at gmail.com  Thu Apr 21 09:29:47 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 21 Apr 2016 23:29:47 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and
 outputs for __fspath__ and os.fspath()
In-Reply-To: <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us>
 <CADiSq7e9ZeWqb1NXfOnMY1omeY4ypL+HiMcN8+3d2t0W=u3Dwg@mail.gmail.com>
 <570A7C67.3010304@stoneleaf.us>
 <CADiSq7ch9tE7B1O2c2a6=0rL_BMxLOZK01cJ=dtX3730i1NKYw@mail.gmail.com>
 <570BCE39.8090306@stoneleaf.us>
 <CAKJDb-PysN=VR80Nz=MbdCv9SVdXY97TQsC7514fG8bcKqq1Yg@mail.gmail.com>
 <570BDB17.5000601@stoneleaf.us>
 <CAP1=2W5ZBR9MiJsQzKqqYfhb26NQtzz-mMhYRJfn7O6mjeY09Q@mail.gmail.com>
 <570BECC6.1080708@stoneleaf.us>
 <CAP1=2W6zdoRVu64Uj0KdE2bqbfiZHpACJA+fJbw_v4SiS4fYoA@mail.gmail.com>
 <CAMpsgwZSRxrngNF9_fuZJ9UEAHkiqzD_DGo0jQ9vYTBH+erazA@mail.gmail.com>
 <570C12C2.9000602@stoneleaf.us>
 <CAMpsgwYezPynJyDbcVBSdfXifCOEYF0cL9Vbo-VViFY4kT8txg@mail.gmail.com>
 <570D1F26.5090800@stoneleaf.us>
 <CADiSq7f+smrMYSJ=LQ2M5_OaWU8QOXUar7iiXT92DQ1oxUyNyA@mail.gmail.com>
 <CACac1F_m-=vv_0Z6cjR=PU=cZeqArHJ4UDyYMKXddFD68by9bg@mail.gmail.com>
 <CADiSq7fk_MB4Pu_svrc9Rivp2gzAuyNF+Qg=r90F2hgqzj+PKQ@mail.gmail.com>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <CAP1=2W5fmpuh0T9x03j0JrR-3MAUY31n7Mg+G11o6ecRY-T6Fw@mail.gmail.com>
 <CAMpsgwa8aStrJ7tapmGR-6n2Q4ZaAoaenDovbtG0Rhm-+V8JZQ@mail.gmail.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7d-hDpii=5MOL2Yzg4GzHLfT5rUPfXQarsb6hwY3o8v6g@mail.gmail.com>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7dtABjpST+f8UhY8n5f95WeAkFYEz-Op-FUmRSC1xWpfQ@mail.gmail.com>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <CADiSq7efdg5TbEMbTsmGdNe6pMJtZT_PkdQsH=njH0ae+4jUGg@mail.gmail.com>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <CAMiohogkGF1W2n_75i8hf=7jF5jmjjzdXpRPvA9t6Z=Xo_4tBw@mail.gmail.com>
 <CAP1=2W6DMYTjM-qQA6SPp7idZLS6xMnnnn0_9XUCfCTz7F+g3A@mail.gmail.com>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W49ARd5WBqGuk7EccahML6Ze4cvqhhh+Q42Z7aDTmKbOA@mail.gmail.com>
 <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp>
 <CAP1=2W4=rXouXP2bVLEzKppXMdVUHqq46FreRmUbgz=_uy9kwA@mail.gmail.com>
 <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7cCPxq_xeB44TtPvpY3K6QGd8iQZJVdyHqkHePs+EGR6A@mail.gmail.com>

On 20 April 2016 at 13:16, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> It's people who live in monolingual mono-encoding environments who
> will be using bytes successfully, and be resistant to costly changes
> that don't make their lives better.  But the bytes vs. text cost is
> inherent in using pathlib, so polymorphism doesn't help promote
> pathlib.  It might help promote use of os.scandir in bytes-oriented
> code, though I don't see that as a huge effect nor more than mildly
> desirable.  Is it?
>

Some of us are also interested in optimised network service development use
cases where UTF-8 already rules the world [1]. It's a vastly different
domain from desktop computing, and different even from traditional stateful
servers where the same instance may be kept running for years.

When "absolutely everything is UTF-8, and your system boundaries are
policed accordingly" is a valid assumption, then writing bytes level
network code is a far more viable option than when you're writing software
to give to other people to run in arbitrary environments (that's how Go is
able to get away with its "all system boundaries use UTF-8" approach - if
you're not prepared to meet that precondition, you don't choose to use Go
in the first place).

I think this is also why we're talking past each other - as a default, I
completely agree it makes sense to present a "str-only" API (that's where
my proposed fspath/_raw_fspath split came from). However, there really are
contexts where "our text is always stored as bytes, those bytes are always
UTF-8 encoded, and our software only needs to work on *nix systems" is a
reasonable approach, and those are the domains where being *able* to stay
entirely in the binary domain is actually a desirable characteristic,
rather than merely a tool for migrating from Python 2.

Cheers,
Nick.

[1] http://utf8everywhere.org/

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From neil at python.ca  Thu Apr 21 17:44:52 2016
From: neil at python.ca (Neil Schemenauer)
Date: Thu, 21 Apr 2016 14:44:52 -0700
Subject: [Python-Dev] obmalloc mmap/munmap thrashing
Message-ID: <20160421214452.GA22080@python.ca>

I was running Python 2.4.11 under strace and I noticed some odd
looking system calls:

mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9848681000
munmap(0x7f9848681000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9848681000
munmap(0x7f9848681000, 262144)          = 0
[... repeated a number of times ...]

Looking at obmalloc.c, there doesn't seem to be any high/low
watermark (hysteresis) associated with unallocating arenas.  Is that
true?  If so, does it seem prudent to implement something to avoid
this behavior?  It seems potentially expensive if your program is
running just at the threshold of needing another arena.
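
To illustrate the concern, here is a toy model of the thrashing and of
a simple hysteresis policy. It is not obmalloc's algorithm: the
``count_syscalls`` function and its ``keep_spare`` parameter are
assumptions made for the sketch, and "arena" here is just a counter.

```python
def count_syscalls(demand, keep_spare):
    """Count simulated mmap/munmap calls for a sequence of arena demands.

    demand: number of arenas the program needs at each step.
    keep_spare: how many unused arenas to retain before unmapping
                (0 means unmap as soon as an arena becomes empty).
    """
    mapped = 0
    mmaps = munmaps = 0
    for needed in demand:
        if needed > mapped:
            # Not enough arenas: map the missing ones.
            mmaps += needed - mapped
            mapped = needed
        elif mapped - needed > keep_spare:
            # More spare arenas than the policy allows: release the excess.
            release = mapped - needed - keep_spare
            munmaps += release
            mapped -= release
    return mmaps, munmaps


# A program oscillating right at the threshold between 1 and 2 arenas.
demand = [1, 2] * 1000
no_hysteresis = count_syscalls(demand, keep_spare=0)
with_hysteresis = count_syscalls(demand, keep_spare=1)
```

In this model, keeping a single spare arena removes the repeated
mmap/munmap pairs entirely, at the cost of holding one unused arena.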

From tritium-list at sdamon.com  Thu Apr 21 17:55:48 2016
From: tritium-list at sdamon.com (Alexander Walters)
Date: Thu, 21 Apr 2016 17:55:48 -0400
Subject: [Python-Dev] obmalloc mmap/munmap thrashing
In-Reply-To: <20160421214452.GA22080@python.ca>
References: <20160421214452.GA22080@python.ca>
Message-ID: <57194C64.8030001@sdamon.com>

...is that a typo for 2.7.11?

On 4/21/2016 17:44, Neil Schemenauer wrote:
> I was running Python 2.4.11 under strace and I noticed some odd
> looking system calls:
>
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9848681000
> munmap(0x7f9848681000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9848681000
> munmap(0x7f9848681000, 262144)          = 0
> [... repeated a number of times ...]
>
> Looking at obmalloc.c, there doesn't seem to be any high/low
> watermark (hysteresis) associated with unallocating arenas.  Is that
> true?  If so, does it seem prudent to implement something to avoid
> this behavior?  It seems potentially expensive if your program is
> running just at the threshold of needing another arena.


From tim.peters at gmail.com  Thu Apr 21 17:59:20 2016
From: tim.peters at gmail.com (Tim Peters)
Date: Thu, 21 Apr 2016 16:59:20 -0500
Subject: [Python-Dev] obmalloc mmap/munmap thrashing
In-Reply-To: <20160421214452.GA22080@python.ca>
References: <20160421214452.GA22080@python.ca>
Message-ID: <CAExdVNmPSDsOpaz+3kpg-rwRsChgxw9YMOFm2pF3JsoqxsyjRQ@mail.gmail.com>

You may be interested in this seemingly related bug report:

    http://bugs.python.org/issue26601

[Neil Schemenauer <neil at python.ca>]
> I was running Python 2.4.11 under strace and I noticed some odd
> looking system calls:
>
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9848681000
> munmap(0x7f9848681000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9848681000
> munmap(0x7f9848681000, 262144)          = 0
> [... repeated a number of times ...]
>
> Looking at obmalloc.c, there doesn't seem to be any high/low
> watermark (hysteresis) associated with unallocating arenas.  Is that
> true?  If so, does it seem prudent to implement something to avoid
> this behavior?  It seems potentially expensive if your program is
> running just at the threshold of needing another arena.

From chris.barker at noaa.gov  Thu Apr 21 18:43:08 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 21 Apr 2016 15:43:08 -0700
Subject: [Python-Dev] I hope this won't be my last comment here ~ yet it
 may well be...
In-Reply-To: <CACKxkAxsB9hfCa_52THge=voZdPCc018jkboBTnK-SQr8siqcw@mail.gmail.com>
References: <CACKxkAxsB9hfCa_52THge=voZdPCc018jkboBTnK-SQr8siqcw@mail.gmail.com>
Message-ID: <CALGmxEJWL5HrGTKEhNJz0pHyyOfURmSrGMFvS5z-hnEipOnptA@mail.gmail.com>

I'm really confused -- you had a handful of very positive responses to
your offer to help with Python on Windows.

Then a couple of off-the-cuff remarks (at least one of which was serious)
about what is often known as "the bus factor".

But I think you may want to take into account the history here. This has
been talked about A LOT in the Python community for years -- so we may be a
bit blase about it. Note what Wikipedia's page on the bus factor says:

https://en.wikipedia.org/wiki/Bus_factor

"""An early instance of this sort of query was when Michael McLay publicly
asked, in 1994, what would happen to the Python language
<https://en.wikipedia.org/wiki/Python_(programming_language)> if Guido van
Rossum <https://en.wikipedia.org/wiki/Guido_van_Rossum> were hit by a bus.
[8] <https://en.wikipedia.org/wiki/Bus_factor#cite_note-8>"""

So this has been very, very well hashed out in the Python community.

And a quick look at the existence of this list, the messages on it, and the
source repo will tell you that Python is in no way a personal project of
one person (not to mention the PSF).

I think the lessons here are:

- don't be too sensitive

and, important for every open source community:

- your comments and questions will be taken far more seriously if you have
done your homework.

-CHB


On Thu, Apr 21, 2016 at 4:54 AM, Burkhard Meier <burkhardameier at gmail.com>
wrote:

> Please do allow me to share my humble experiences of being a software
> professional on a Windows platform.
>
> Almost 20 years.
>
> You know what; when I tried out 'sugar Linux' or Peppermint,,,the "admin'
> dude kicked me out 5 times in one sole eve,
>
> Maybe this is just *me*..
>
> You know what: I did have my time with this *open source community*...
>
> I was just asking a sincere question.
>
> C'mon
>
> This was rather very ridiculous.
>
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From erik.m.bray at gmail.com  Fri Apr 22 09:09:53 2016
From: erik.m.bray at gmail.com (Erik Bray)
Date: Fri, 22 Apr 2016 15:09:53 +0200
Subject: [Python-Dev] Issue with DLL import library installation in Cygwin
Message-ID: <CAOTD34ZFKnUC8VKZD5wvnyVLGmYPFDtf4XQYbC0-nz4ycYq6RA@mail.gmail.com>

Hi all,

I've been working on compiling/installing Python on Cygwin and have
hit upon an odd issue in the Makefile that seems to have been around
for as long as there's been Cygwin support in it.

When building Python on Cygwin, both a libpython-X.Y.dll and a
libpython-X.Y.dll.a are created.  The latter is an "import library"
consisting of stubs for functions in the DLL so that it can be linked
to statically when building, for example, extension modules.

The odd bit is that in the altbininstall target (see [1]) if the
$(DLLLIBRARY) variable is defined then only it is installed, while
$(LDLIBRARY) (which in this cases references the import library) is
*not* installed, except in $(prefix)/lib/pythonX.Y/config, which is
not normally on the linker search path, or even included by
python-config --ldflags.  Therefore static linking to libpython fails,
unless the search path is explicitly modified, or a symlink is created
from $(prefix)/lib/pythonX.Y/config/libpython.dll.a to $(prefix)/lib.

In fact Cygwin's own package for Python manually creates the latter
symlink in its install script.  But it's not clear why Python's
Makefile doesn't install this file in the first place.  In other
words, why not install $LDLIBRARY regardless?

Thanks,
Erik


[1] https://hg.python.org/cpython/file/496e094f4734/Makefile.pre.in#l1097

From victor.stinner at gmail.com  Fri Apr 22 10:46:12 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 22 Apr 2016 16:46:12 +0200
Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance
In-Reply-To: <CAMpsgwabMLVwFPGa=6si9vfeZLNpU2HWWcDL_y5WavL0CJG7+w@mail.gmail.com>
References: <CAMpsgwaB43ygVr8-fY0obW+BJ4UORKKJeDXMR6r46thXQAVrnQ@mail.gmail.com>
 <56B3254F.7020605@egenix.com>
 <CAMpsgwbcac2ZNqeeCUNZpbVGyzb_vg8cdRZbFuwkZAzn9X7x1A@mail.gmail.com>
 <56B34A1E.4010501@egenix.com>
 <CAMpsgwbtfrSeiUbuGa3=eVUKqUdwC0==-Z++M+Rhu_mNRq56qg@mail.gmail.com>
 <56B35AB5.5090308@egenix.com>
 <CAMpsgwYNvSv44t2qG8bypcVA2T5YOSkyJkXsH9Ea2ZmkdNkHxw@mail.gmail.com>
 <CAMpsgwaazYP+nfhtmwNgBx2jN6Y1jTySe=yJwG=ZFkjwNy7QWQ@mail.gmail.com>
 <56BDDEA3.2060702@egenix.com>
 <CAMpsgwYaEw6vO7GrK82N4QVWupSk1cAPVyZPMcqTwn-xzn89CQ@mail.gmail.com>
 <CAMpsgwabMLVwFPGa=6si9vfeZLNpU2HWWcDL_y5WavL0CJG7+w@mail.gmail.com>
Message-ID: <CAMpsgwa5pvvSXMSGC3GizhNwdQYThRVtmSD9G+E00PT5QVmkXg@mail.gmail.com>

Hi,

My pull request has been merged into numpy. numpy now uses
PyMem_RawMalloc() rather than PyMem_Malloc() since it uses the memory
allocator without holding the GIL:
https://github.com/numpy/numpy/pull/7404

It was proposed to modify numpy to hold the GIL. Maybe it will be done later.

It means that there are no more C extensions known to use the Python
memory allocators incorrectly. So I pushed my change in CPython to use
the pymalloc memory allocator in PyMem_Malloc():
https://hg.python.org/cpython/rev/68b2a43d8653

I documented that porting C extensions to Python 3.6 requires running
tests with PYTHONMALLOC=debug. This environment variable enables
checks at runtime to validate the usage of Python memory allocators,
including checks on the GIL. PYTHONMALLOC=debug and the check on the
GIL are new in Python 3.6.
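
One possible way to script such a run is sketched below. The helper
name ``run_with_debug_allocator`` is made up for the example, and the
one-line snippet stands in for a real extension test suite:

```python
import os
import subprocess
import sys


def run_with_debug_allocator(code):
    """Run a Python snippet in a child interpreter with PYTHONMALLOC=debug."""
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, PYTHONMALLOC="debug")
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


# Placeholder workload; a real run would import and exercise the extension.
result = run_with_debug_allocator("import json; print(json.dumps([1, 2]))")
```

A non-zero ``result.returncode`` together with a fatal error message in
``result.stderr`` would point at a misuse caught by the debug hooks.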

By the way, I modified the code to log the fatal error. If a buffer
overflow/underflow is detected in a free function like PyObject_Free()
and tracemalloc is enabled, the traceback where the memory block was
allocated is now displayed:
https://docs.python.org/dev/whatsnew/3.6.html#pythonmalloc-environment-variable

Moreover, the warning logger now also logs where files, sockets, etc.
were allocated on ResourceWarning:

It looks like Python 3.6 will help developers ;-)

Victor

2016-04-20 1:33 GMT+02:00 Victor Stinner <victor.stinner at gmail.com>:
> Ping? Is someone still opposed to my change #26249 "Change
> PyMem_Malloc to use pymalloc allocator"? If no, I think that I will
> push my change.
>
> My change only changes two lines, so it can be easily reverted before
> CPython 3.6 if we detect major issues in third-party extensions. And
> maybe it's better to push such change today to get more time to play
> with it, than pushing it late in the development of CPython 3.6.
>
> The new PYTHONMALLOC=debug feature allows to quickly and easily check
> the usage of the PyMem_Malloc() API, even if Python is compiled in
> release mode.
>
> I checked multiple Python extensions written in C. I only found one
> bug in numpy and I sent a patch (not merged yet).
>
> victor
>
> 2016-03-15 0:19 GMT+01:00 Victor Stinner <victor.stinner at gmail.com>:
>> 2016-02-12 14:31 GMT+01:00 M.-A. Lemburg <mal at egenix.com>:
>>>>> If your program has bugs, you can use a debug build of Python 3.5 to
>>>>> detect misusage of the API.
>>>
>>> Yes, but people don't necessarily do this, e.g. I have
>>> for a very long time ignored debug builds completely
>>> and when I started to try them, I found that some of the
>>> things I had been doing with e.g. free list implementations
>>> did not work in debug builds.
>>
>> I just added support for debug hooks on Python memory allocators on
>> Python compiled in *release* mode. Set the environment variable
>> PYTHONMALLOC to debug to try with Python 3.6.
>>
>> I added a check on PyObject_Malloc() debug hook to ensure that the
>> function is called with the GIL held. I opened an issue to add a
>> similar check on PyMem_Malloc():
>> https://bugs.python.org/issue26563
>>
>>
>>> Yes, but those are part of the stdlib. You'd need to check
>>> a few C extensions which are not tested as part of the stdlib,
>>> e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom
>>> types in C since these will often need the memory management
>>> APIs).
>>>
>>> It may also be a good idea to check wrapper generators such
>>> as cython, swig, cffi, etc.
>>
>> I ran the test suite of numpy, lxml, Pillow and cryptography (used cffi).
>>
>> I found a bug in numpy. numpy calls PyMem_Malloc() without holding the GIL:
>> https://github.com/numpy/numpy/pull/7404
>>
>> Except for this bug, all other tests pass with PyMem_Malloc() using
>> pymalloc and all debug checks.
>>
>> Victor

From status at bugs.python.org  Fri Apr 22 12:08:42 2016
From: status at bugs.python.org (Python tracker)
Date: Fri, 22 Apr 2016 18:08:42 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20160422160842.D4D7956645@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2016-04-15 - 2016-04-22)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    5491 ( +2)
  closed 33095 (+56)
  total  38586 (+58)

Open issues with patches: 2384 


Issues opened (41)
==================

#26773: Shelve works inconsistently when carried over to child process
http://bugs.python.org/issue26773  opened by Paul Ellenbogen

#26774: Elide Py_atomic fences when WITH_THREAD is disabled?
http://bugs.python.org/issue26774  opened by larry

#26776: Determining the failure of C API call is ambiguous
http://bugs.python.org/issue26776  opened by serhiy.storchaka

#26779: pdb continue followed by an exception in the same frame shows 
http://bugs.python.org/issue26779  opened by Sriram Rajagopalan

#26781: os.walk max_depth
http://bugs.python.org/issue26781  opened by palaviv

#26786: bdist_msi duplicates directories with names in ALL CAPS to a b
http://bugs.python.org/issue26786  opened by Ivan.Pozdeev

#26787: test_distutils fails when configured --with-lto
http://bugs.python.org/issue26787  opened by gregory.p.smith

#26788: test_gdb fails all tests on a profile-opt build configured --w
http://bugs.python.org/issue26788  opened by gregory.p.smith

#26789: Please do not log during shutdown
http://bugs.python.org/issue26789  opened by smurfix

#26790: bdist_msi package duplicates everything to a bogus location wh
http://bugs.python.org/issue26790  opened by Ivan.Pozdeev

#26791: shutil.move fails to move symlink (Invalid cross-device link)
http://bugs.python.org/issue26791  opened by Unode

#26792: docstrings of runpy.run_{module,path} are rather sparse
http://bugs.python.org/issue26792  opened by Antony.Lee

#26793: uuid causing thread issues when forking using os.fork py3.4+
http://bugs.python.org/issue26793  opened by Steven Adams

#26794: curframe can be None in pdb.py
http://bugs.python.org/issue26794  opened by Jacek.Pliszka

#26796: BaseEventLoop.run_in_executor shouldn't specify max_workers fo
http://bugs.python.org/issue26796  opened by Hans Lawrenz

#26797: Segafault in _PyObject_Alloc
http://bugs.python.org/issue26797  opened by yselivanov

#26798: add BLAKE2 to hashlib
http://bugs.python.org/issue26798  opened by Zooko.Wilcox-O'Hearn

#26800: Don't accept bytearray as filenames part 2
http://bugs.python.org/issue26800  opened by pjenvey

#26801: Fix shutil.get_terminal_size() to catch AttributeError
http://bugs.python.org/issue26801  opened by ebarry

#26803: syslog logging handler fails with address in unix abstract nam
http://bugs.python.org/issue26803  opened by xdegaye

#26804: Prioritize lowercase proxy variables in urllib.request
http://bugs.python.org/issue26804  opened by frispete

#26806: IDLE not displaying RecursionError tracebacks
http://bugs.python.org/issue26806  opened by terry.reedy

#26807: mock_open()().readline() fails at EOF
http://bugs.python.org/issue26807  opened by rbcollins

#26809: `string` exposes ChainMap from `collections`
http://bugs.python.org/issue26809  opened by leewz

#26810: inconsistent garbage collector behavior across platforms when 
http://bugs.python.org/issue26810  opened by unsec treedee

#26811: segfault due to null pointer in tuple
http://bugs.python.org/issue26811  opened by random832

#26812: ExtendedInterpolation drops user-defined 'vars' during _interp
http://bugs.python.org/issue26812  opened by yab-arz

#26814: [WIP] Add a new _PyObject_FastCall() function which avoids the
http://bugs.python.org/issue26814  opened by haypo

#26815: SIGBUS in test_ssl.test_dealloc_warn() on "AMD64 FreeBSD 10.0 
http://bugs.python.org/issue26815  opened by haypo

#26816: Make concurrent.futures.Executor an abc
http://bugs.python.org/issue26816  opened by xiang.zhang

#26817: Docs for StringIO should link to io.BytesIO
http://bugs.python.org/issue26817  opened by guettli

#26818: trace CLI doesn't respect -s option
http://bugs.python.org/issue26818  opened by berker.peksag

#26819: _ProactorReadPipeTransport pause_reading()/resume_reading() br
http://bugs.python.org/issue26819  opened by Fulvio Esposito

#26820: Prevent uses of format string based PyObject_Call* that do not
http://bugs.python.org/issue26820  opened by josh.r

#26822: itemgetter/attrgetter/methodcaller objects ignore keyword argu
http://bugs.python.org/issue26822  opened by serhiy.storchaka

#26823: Shrink recursive tracebacks
http://bugs.python.org/issue26823  opened by ebarry

#26824: Make some macros use Py_TYPE
http://bugs.python.org/issue26824  opened by xiang.zhang

#26826: Expose new copy_file_range() syscal in os module and use it to
http://bugs.python.org/issue26826  opened by StyXman

#26827: PyObject *PyInit_myextention -> PyMODINIT_FUNC PyInit_myextent
http://bugs.python.org/issue26827  opened by prinsherbert

#26828: Implement __length_hint__() on map() and filter() to optimize 
http://bugs.python.org/issue26828  opened by haypo

#26829: update docs: when creating classes a new dict is created for t
http://bugs.python.org/issue26829  opened by ethan.furman



Most recent 15 issues with no replies (15)
==========================================

#26829: update docs: when creating classes a new dict is created for t
http://bugs.python.org/issue26829

#26819: _ProactorReadPipeTransport pause_reading()/resume_reading() br
http://bugs.python.org/issue26819

#26818: trace CLI doesn't respect -s option
http://bugs.python.org/issue26818

#26817: Docs for StringIO should link to io.BytesIO
http://bugs.python.org/issue26817

#26812: ExtendedInterpolation drops user-defined 'vars' during _interp
http://bugs.python.org/issue26812

#26794: curframe can be None in pdb.py
http://bugs.python.org/issue26794

#26792: docstrings of runpy.run_{module,path} are rather sparse
http://bugs.python.org/issue26792

#26790: bdist_msi package duplicates everything to a bogus location wh
http://bugs.python.org/issue26790

#26789: Please do not log during shutdown
http://bugs.python.org/issue26789

#26786: bdist_msi duplicates directories with names in ALL CAPS to a b
http://bugs.python.org/issue26786

#26779: pdb continue followed by an exception in the same frame shows 
http://bugs.python.org/issue26779

#26771: python-config.sh.in INCDIR does not match python version if ex
http://bugs.python.org/issue26771

#26767: Inconsistant error messages for failed attribute modification
http://bugs.python.org/issue26767

#26752: Mock(2.0.0).assert_has_calls() raise AssertionError in two sam
http://bugs.python.org/issue26752

#26750: Mock autospec does not work with subclasses of property()
http://bugs.python.org/issue26750



Most recent 15 issues waiting for review (15)
=============================================

#26824: Make some macros use Py_TYPE
http://bugs.python.org/issue26824

#26823: Shrink recursive tracebacks
http://bugs.python.org/issue26823

#26822: itemgetter/attrgetter/methodcaller objects ignore keyword argu
http://bugs.python.org/issue26822

#26818: trace CLI doesn't respect -s option
http://bugs.python.org/issue26818

#26816: Make concurrent.futures.Executor an abc
http://bugs.python.org/issue26816

#26814: [WIP] Add a new _PyObject_FastCall() function which avoids the
http://bugs.python.org/issue26814

#26811: segfault due to null pointer in tuple
http://bugs.python.org/issue26811

#26809: `string` exposes ChainMap from `collections`
http://bugs.python.org/issue26809

#26804: Prioritize lowercase proxy variables in urllib.request
http://bugs.python.org/issue26804

#26803: syslog logging handler fails with address in unix abstract nam
http://bugs.python.org/issue26803

#26801: Fix shutil.get_terminal_size() to catch AttributeError
http://bugs.python.org/issue26801

#26796: BaseEventLoop.run_in_executor shouldn't specify max_workers fo
http://bugs.python.org/issue26796

#26787: test_distutils fails when configured --with-lto
http://bugs.python.org/issue26787

#26786: bdist_msi duplicates directories with names in ALL CAPS to a b
http://bugs.python.org/issue26786

#26781: os.walk max_depth
http://bugs.python.org/issue26781



Top 10 most discussed issues (10)
=================================

#26801: Fix shutil.get_terminal_size() to catch AttributeError
http://bugs.python.org/issue26801  16 msgs

#26814: [WIP] Add a new _PyObject_FastCall() function which avoids the
http://bugs.python.org/issue26814  16 msgs

#26824: Make some macros use Py_TYPE
http://bugs.python.org/issue26824  15 msgs

#26809: `string` exposes ChainMap from `collections`
http://bugs.python.org/issue26809  14 msgs

#26601: Use new madvise()'s MADV_FREE on the private heap
http://bugs.python.org/issue26601  13 msgs

#26803: syslog logging handler fails with address in unix abstract nam
http://bugs.python.org/issue26803  13 msgs

#26811: segfault due to null pointer in tuple
http://bugs.python.org/issue26811  13 msgs

#26793: uuid causing thread issues when forking using os.fork py3.4+
http://bugs.python.org/issue26793  12 msgs

#26804: Prioritize lowercase proxy variables in urllib.request
http://bugs.python.org/issue26804  10 msgs

#26058: PEP 509: Add ma_version to PyDictObject
http://bugs.python.org/issue26058   9 msgs



Issues closed (52)
==================

#4806: Function calls taking a generator as star argument can mask Ty
http://bugs.python.org/issue4806  closed by martin.panter

#7694: DeprecationWarnings in distutils are pointless
http://bugs.python.org/issue7694  closed by berker.peksag

#8978: "tarfile.ReadError: file could not be opened successfully" if 
http://bugs.python.org/issue8978  closed by lars.gustaebel

#9317: Incorrect coverage file from trace test_pickle.py
http://bugs.python.org/issue9317  closed by berker.peksag

#10261: tarfile iterator without members caching
http://bugs.python.org/issue10261  closed by lars.gustaebel

#13876: Sporadic failure in test_socket: testRecvmsgEOF
http://bugs.python.org/issue13876  closed by berker.peksag

#15933: flaky test in test_datetime
http://bugs.python.org/issue15933  closed by berker.peksag

#17859: improve error message for saving ints to file
http://bugs.python.org/issue17859  closed by serhiy.storchaka

#18591: threading.Thread.run returning a result
http://bugs.python.org/issue18591  closed by berker.peksag

#20739: PEP 463 (except expression) implementation
http://bugs.python.org/issue20739  closed by berker.peksag

#21668: The select and time modules uses libm functions without linkin
http://bugs.python.org/issue21668  closed by haypo

#22625: When cross-compiling, don't try to execute binaries
http://bugs.python.org/issue22625  closed by martin.panter

#22873: Re: SSLsocket.getpeercert - return ALL the fields of the certi
http://bugs.python.org/issue22873  closed by berker.peksag

#23029: test_warnings produces extra output in quiet mode
http://bugs.python.org/issue23029  closed by berker.peksag

#23251: mention in time.sleep() docs that it does not block other Pyth
http://bugs.python.org/issue23251  closed by berker.peksag

#24173: curses HOWTO/implementation disagreement
http://bugs.python.org/issue24173  closed by berker.peksag

#24838: tarfile.py: fix GNU and USTAR formats to properly handle paths
http://bugs.python.org/issue24838  closed by lars.gustaebel

#24922: assertWarnsRegex doesn't allow multiple warning messages
http://bugs.python.org/issue24922  closed by berker.peksag

#25314: Documentation: argparse's actions store_{true,false} default t
http://bugs.python.org/issue25314  closed by martin.panter

#25642: Setting maxsize breaks asyncio.JoinableQueue/Queue
http://bugs.python.org/issue25642  closed by berker.peksag

#25989: documentation version switcher is broken fro 2.6, 3.2, 3.3
http://bugs.python.org/issue25989  closed by berker.peksag

#26535: Minor typo in the docs for struct.unpack
http://bugs.python.org/issue26535  closed by martin.panter

#26615: Missing entry in WRAPPER_ASSIGNMENTS in update_wrapper's doc
http://bugs.python.org/issue26615  closed by berker.peksag

#26657: Directory traversal with http.server and SimpleHTTPServer on w
http://bugs.python.org/issue26657  closed by martin.panter

#26659: slice() leaks memory when part of a cycle
http://bugs.python.org/issue26659  closed by python-dev

#26717: wsgiref.simple_server: mojibake with cp1252 bytes in PATH_INFO
http://bugs.python.org/issue26717  closed by martin.panter

#26720: memoryview from BufferedWriter becomes garbage
http://bugs.python.org/issue26720  closed by martin.panter

#26745: Redundant code in _PyObject_GenericSetAttrWithDict
http://bugs.python.org/issue26745  closed by serhiy.storchaka

#26751: Possible bug in sorting algorithm
http://bugs.python.org/issue26751  closed by benjamin.peterson

#26755: Update version{added,changed} docs in devguide
http://bugs.python.org/issue26755  closed by berker.peksag

#26760: Document PyFrameObject
http://bugs.python.org/issue26760  closed by brett.cannon

#26763: Update PEP-8 regarding binary operators
http://bugs.python.org/issue26763  closed by gvanrossum

#26766: The result type of bytearray formatting is not stable
http://bugs.python.org/issue26766  closed by berker.peksag

#26770: _Py_set_inheritable(): do nothing if the FD_CLOEXEC close is a
http://bugs.python.org/issue26770  closed by haypo

#26772: regex.ENHANCEMATCH crashes interpreter
http://bugs.python.org/issue26772  closed by SilentGhost

#26775: Improve test coverage on urllib.parse
http://bugs.python.org/issue26775  closed by orsenthil

#26777: test_asyncio: test_timeout_disable() fails randomly
http://bugs.python.org/issue26777  closed by haypo

#26778: More typo fixes
http://bugs.python.org/issue26778  closed by serhiy.storchaka

#26780: Illustrate both binary operator conventions in PEP-8
http://bugs.python.org/issue26780  closed by orsenthil

#26782: subprocess.__all__ incomplete on Windows
http://bugs.python.org/issue26782  closed by martin.panter

#26783: test_os.WalkTests.test_walk_topdown don't test fwalk and Bytes
http://bugs.python.org/issue26783  closed by serhiy.storchaka

#26784: regular expression problem at umlaut handling
http://bugs.python.org/issue26784  closed by serhiy.storchaka

#26785: repr of -nan value should contain the sign
http://bugs.python.org/issue26785  closed by mark.dickinson

#26795: Fix PEP 344 Python version
http://bugs.python.org/issue26795  closed by SilentGhost

#26799: gdb support fails with "Invalid cast."
http://bugs.python.org/issue26799  closed by haypo

#26802: Avoid copy in call_function_var when no extra stack args are p
http://bugs.python.org/issue26802  closed by serhiy.storchaka

#26805: Refer to types.SimpleNamespace in namedtuple documentation
http://bugs.python.org/issue26805  closed by paul.moore

#26808: wsgiref.simple_server breaks unicode in URIs
http://bugs.python.org/issue26808  closed by martin.panter

#26813: Wrong Japanese translation of "Adverb" on Documentation
http://bugs.python.org/issue26813  closed by benjamin.peterson

#26821: array module "minimum size in bytes" table is wrong for int/lo
http://bugs.python.org/issue26821  closed by georg.brandl

#26825: Variable defined in exec(code) unreachable inside function cal
http://bugs.python.org/issue26825  closed by eryksun

#1612012: builtin compile() doc needs PyCF_DONT_IMPLY_DEDENT
http://bugs.python.org/issue1612012  closed by berker.peksag

From burkhardameier at gmail.com  Fri Apr 22 16:12:19 2016
From: burkhardameier at gmail.com (Burkhard Meier)
Date: Fri, 22 Apr 2016 13:12:19 -0700
Subject: [Python-Dev] I hope this won't be my last comment here ~ yet it
 may well be...
In-Reply-To: <CALGmxEJWL5HrGTKEhNJz0pHyyOfURmSrGMFvS5z-hnEipOnptA@mail.gmail.com>
References: <CACKxkAxsB9hfCa_52THge=voZdPCc018jkboBTnK-SQr8siqcw@mail.gmail.com>
 <CALGmxEJWL5HrGTKEhNJz0pHyyOfURmSrGMFvS5z-hnEipOnptA@mail.gmail.com>
Message-ID: <CACKxkAyJQcK_s6vafCnp+ac9SzcPUM6eCSZvJSnNw0NvrADG7Q@mail.gmail.com>

Ok. no more ellipses...what I was trying to share is an unhappy experience
I had with the open source Linux community.

I am sure this will not happen on this Python Dev list of professionals.

Please ignore my comments.

From now on I will focus on contributing to Python (especially on a Windows
platform) and not taking up valuable reading time.


Burkhard

On Thu, Apr 21, 2016 at 3:43 PM, Chris Barker <chris.barker at noaa.gov> wrote:

>
>  I'm really confused -- you had a handful of very positive responses to
> your offer to help with Python on Windows.
>
> Then a couple off the cuff remarks (at least one of which was serious)
> about what is often known as "the bus factor":
>
> But I think you may want to take into account the history here. This has
> been talked about A LOT in the Python community for years -- so we may be a
> bit blase about it. Note that Wikipedia's page on the bus factor:
>
> https://en.wikipedia.org/wiki/Bus_factor
>
> """An early instance of this sort of query was when Michael McLay publicly
> asked, in 1994, what would happen to the Python language
> <https://en.wikipedia.org/wiki/Python_(programming_language)> if Guido
> van Rossum <https://en.wikipedia.org/wiki/Guido_van_Rossum> were hit by a
> bus.[8] <https://en.wikipedia.org/wiki/Bus_factor#cite_note-8>"""
>
> So this has been very, very well hashed out in the Python community.
>
> And a quick look at the existence of this list, the messages on it, and
> the source repo will tell you that Python is in no way a personal project
> of one person. (not to mention the PSF)
>
> I think the lessons here are:
>
> - don't be too sensitive
>
> and, important for every open source community:
>
> - your comments and questions will be taken far more seriously if you have
> done your homework.
>
> -CHB
>
>
> On Thu, Apr 21, 2016 at 4:54 AM, Burkhard Meier <burkhardameier at gmail.com>
> wrote:
>
>> Please do allow me to share my humble experiences of being a software
>> professional on a Windows platform.
>>
>> Almost 20 years.
>>
>> You know what; when I tried out 'sugar Linux' or Peppermint,,,the "admin'
>> dude kicked me out 5 times in one sole eve,
>>
>> Maybe this is just *me*..
>>
>> You know what: I did have my time with this *open source community*...
>>
>> I was just asking a sincere question.
>>
>> C'mon
>>
>> This was rather very ridiculous.
>>
>>
>>
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov
>>
>>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
>

From storchaka at gmail.com  Sat Apr 23 11:59:23 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 23 Apr 2016 18:59:23 +0300
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CAP7+vJK5EdoNA6aQNL328bBgr++Tpc+gaQqcKu=j8ANK7hBsOA@mail.gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
 <CAP7+vJK5EdoNA6aQNL328bBgr++Tpc+gaQqcKu=j8ANK7hBsOA@mail.gmail.com>
Message-ID: <nfg64s$qh8$1@ger.gmane.org>

On 13.04.16 19:33, Guido van Rossum wrote:
> Nice work. I think that for CPython, speed is much more important than
> memory use for the code. Disk space is practically free for anything
> smaller than a video. :-)

I collected statistics on the use of opcodes with different arguments while
running the CPython tests. The estimated size with wordcode is 1.33 times
smaller than with the current bytecode.

[1] http://comments.gmane.org/gmane.comp.python.ideas/38293
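
To make the size comparison concrete, here is a rough sketch of how such an
estimate could be computed. It is not the script used for the statistics
above. It assumes the current bytecode uses 1 byte for an opcode without an
argument and 3 bytes for one with a 16-bit argument (plus a 3-byte
EXTENDED_ARG prefix for larger arguments), and that wordcode uses 2 bytes per
instruction with one 2-byte EXTENDED_ARG prefix per extra argument byte. The
function names and the sample data are made up for illustration.

```python
def old_bytecode_size(arg):
    """Bytes used by one instruction in the assumed pre-wordcode format."""
    if arg is None:
        return 1          # opcode only
    size = 3              # opcode + 16-bit argument
    if arg > 0xFFFF:
        size += 3         # one EXTENDED_ARG prefix carrying the high 16 bits
    return size


def wordcode_size(arg):
    """Bytes used by one instruction in the assumed 16-bit wordcode format."""
    size = 2              # opcode byte + argument byte, always present
    arg = arg or 0
    while arg > 0xFF:
        size += 2         # one EXTENDED_ARG prefix per extra argument byte
        arg >>= 8
    return size


def size_ratio(args):
    """Return old size divided by wordcode size for a list of arguments.

    Each entry is the argument of one instruction, or None if the
    instruction takes no argument.
    """
    old = sum(old_bytecode_size(a) for a in args)
    new = sum(wordcode_size(a) for a in args)
    return old / new


# Hypothetical instruction mix: three small arguments, one argument-less opcode.
sample = [0, 1, 2, None]
print(size_ratio(sample))
```

Under these assumptions, code dominated by instructions with small arguments
shrinks, while code dominated by argument-less opcodes grows, so the overall
ratio depends on the real opcode mix.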


From xdegaye at gmail.com  Sun Apr 24 03:20:08 2016
From: xdegaye at gmail.com (Xavier de Gaye)
Date: Sun, 24 Apr 2016 09:20:08 +0200
Subject: [Python-Dev] support of the android platform
Message-ID: <571C73A8.1030908@gmail.com>

Starting with API level 21 (Android 5.0), the build of python3 with the
official android toolchains (that is, without resorting to external libraries
for wide character support) runs correctly.  With the set of patches described
in the patches/Makefile file at [1], the cpython test suite runs[2] on the
android x86 and armv7 emulators with only a few errors[3].  Those errors are
listed with their corresponding error messages, which may give a rough idea of
the effort needed to support this platform.

Xavier

[1] https://bitbucket.org/xdegaye/pyona/src
[2] To reproduce these results, follow the instructions found in INSTALL
     at https://bitbucket.org/xdegaye/pyona/wiki/install
[3] https://bitbucket.org/xdegaye/pyona/wiki/testsuite


From stefan at bytereef.org  Sun Apr 24 05:50:10 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Sun, 24 Apr 2016 09:50:10 +0000 (UTC)
Subject: [Python-Dev] support of the android platform
References: <571C73A8.1030908@gmail.com>
Message-ID: <loom.20160424T113853-795@post.gmane.org>

Xavier de Gaye <xdegaye <at> gmail.com> writes:
> Starting with API level 21 (Android 5.0), the build of python3 with the
> official android toolchains (that is, without resorting to external libraries
> for wide character support) runs correctly.  With the set of patches described
> in the patches/Makefile file at [1], the cpython test suite runs[2] on the
> android x86 and armv7 emulators with only few errors[3].  Those errors are
> listed with their corresponding error messages, this may give a raw idea of
> the effort needed to support this platform.
> 
> Xavier
> 
> [1] https://bitbucket.org/xdegaye/pyona/src
> [2] To reproduce these results, follow the instructions found in INSTALL
>      at https://bitbucket.org/xdegaye/pyona/wiki/install
> [3] https://bitbucket.org/xdegaye/pyona/wiki/testsuite


This looks great, very clean!  As I understand the patches, the locale.h and
langinfo.h problems are solved.  Do you think the following issues on the
Python bug tracker could be closed?


http://bugs.python.org/issue20305
http://bugs.python.org/issue22747
http://bugs.python.org/issue17905


Stefan Krah








From raymond.hettinger at gmail.com  Sun Apr 24 15:45:15 2016
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sun, 24 Apr 2016 12:45:15 -0700
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <nfg64s$qh8$1@ger.gmane.org>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
 <CAP7+vJK5EdoNA6aQNL328bBgr++Tpc+gaQqcKu=j8ANK7hBsOA@mail.gmail.com>
 <nfg64s$qh8$1@ger.gmane.org>
Message-ID: <E6D01DA5-D064-48F9-956F-8780F991A6DF@gmail.com>


> On Apr 23, 2016, at 8:59 AM, Serhiy Storchaka <storchaka at gmail.com> wrote:
> 
> I collected statistics for use opcodes with different arguments during running CPython tests. Estimated size with using wordcode is 1.33 times less than with using current bytecode.
> 
> [1] http://comments.gmane.org/gmane.comp.python.ideas/38293

I think the word code patch should go in sooner rather than later.  Several of us have been through the patch and it is in pretty good shape (some parts still need work though).  The earlier this goes in, the more time we'll have to shake out any unexpected secondary effects.

perfect-is-the-enemy-of-good-ly yours,


Raymond


P.S. The patch is smaller, more tractable, and in better shape than the C version of OrderedDict was when it went in.

From victor.stinner at gmail.com  Sun Apr 24 16:16:35 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sun, 24 Apr 2016 22:16:35 +0200
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <E6D01DA5-D064-48F9-956F-8780F991A6DF@gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
 <CAP7+vJK5EdoNA6aQNL328bBgr++Tpc+gaQqcKu=j8ANK7hBsOA@mail.gmail.com>
 <nfg64s$qh8$1@ger.gmane.org> <E6D01DA5-D064-48F9-956F-8780F991A6DF@gmail.com>
Message-ID: <CAMpsgwb5cXmEhxEg5nXQbO_+odPpoGd=YSX4S7aLnG6zX5iJHg@mail.gmail.com>

Hi Raymond,

2016-04-24 21:45 GMT+02:00 Raymond Hettinger <raymond.hettinger at gmail.com>:
> I think the word code patch should go in sooner rather than later.  Several of us have been through the patch and it is in pretty good shape (some parts still need work though).  The earlier this goes in, the more time we'll have to shake out any unexpected secondary effects.

Yury Selivanov and Serhiy Storchaka told me that they will review
the patch shortly. I'll give them one more week and then I will push the
patch.

I agree that the patch is in good shape. I reviewed the first versions
of the change. I pushed some minor and obvious changes. I also asked
to revert unrelated changes.

I proposed not to try to optimize ceval.c to fetch (opcode, oparg) in a
single 16-bit operation for now. It should be easy to implement later, but
I prefer to focus first on changing the format of the bytecode.
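
For readers unfamiliar with the idea, the difference between the two fetch
strategies can be sketched in Python. This is only an illustration, not the
ceval.c code from the patch. It assumes each instruction is two bytes, opcode
first and argument second, and that the 16-bit unit is read as little-endian.
The function names and the sample byte values are arbitrary.

```python
import struct


def decode_bytewise(code):
    """Fetch the opcode and the argument as two separate byte reads."""
    return [(code[i], code[i + 1]) for i in range(0, len(code), 2)]


def decode_wordwise(code):
    """Fetch each instruction as one 16-bit unit, then split it."""
    count = len(code) // 2
    # "<H" is an unsigned little-endian 16-bit integer, so the first byte
    # (the opcode) lands in the low 8 bits of the word.
    words = struct.unpack("<%dH" % count, code)
    return [(word & 0xFF, word >> 8) for word in words]


# Two made-up instructions: (opcode 100, arg 1) and (opcode 83, arg 0).
code = bytes([100, 1, 83, 0])
assert decode_bytewise(code) == decode_wordwise(code)
print(decode_wordwise(code))
```

Both functions return the same pairs; the second one only changes how many
memory reads are needed per instruction.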

Victor

From raymond.hettinger at gmail.com  Sun Apr 24 17:16:25 2016
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sun, 24 Apr 2016 14:16:25 -0700
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CAMpsgwb5cXmEhxEg5nXQbO_+odPpoGd=YSX4S7aLnG6zX5iJHg@mail.gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
 <CAP7+vJK5EdoNA6aQNL328bBgr++Tpc+gaQqcKu=j8ANK7hBsOA@mail.gmail.com>
 <nfg64s$qh8$1@ger.gmane.org> <E6D01DA5-D064-48F9-956F-8780F991A6DF@gmail.com>
 <CAMpsgwb5cXmEhxEg5nXQbO_+odPpoGd=YSX4S7aLnG6zX5iJHg@mail.gmail.com>
Message-ID: <4D8D6768-F161-435A-9176-05BDAD316105@gmail.com>


> On Apr 24, 2016, at 1:16 PM, Victor Stinner <victor.stinner at gmail.com> wrote:
> 
> I proposed to not try to optimize ceval.c to fetch (oparg, opval) in a
> single 16-bit operation. It should be easy to implement it later, but
> I prefer to focus on changing the format of the bytecode.

Improving instruction decoding was the whole point and it was what kicked-off the work on the patch.  It is also where most of the performance improvement comes from, and it isn't the difficult part of the patch. The persnickety parts of the patch lie elsewhere, so there is really nothing to be gained by gutting our actual objective.

The OP's original patch had already gotten this part done and it ran fine for me.


Raymond




From victor.stinner at gmail.com  Sun Apr 24 17:31:44 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sun, 24 Apr 2016 23:31:44 +0200
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <4D8D6768-F161-435A-9176-05BDAD316105@gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
 <CAP7+vJK5EdoNA6aQNL328bBgr++Tpc+gaQqcKu=j8ANK7hBsOA@mail.gmail.com>
 <nfg64s$qh8$1@ger.gmane.org> <E6D01DA5-D064-48F9-956F-8780F991A6DF@gmail.com>
 <CAMpsgwb5cXmEhxEg5nXQbO_+odPpoGd=YSX4S7aLnG6zX5iJHg@mail.gmail.com>
 <4D8D6768-F161-435A-9176-05BDAD316105@gmail.com>
Message-ID: <CAMpsgwZvFzKtBqJ3Y0o-k_i+04+V=Lb+gO_8-mTBzdHn8chrRA@mail.gmail.com>

2016-04-24 23:16 GMT+02:00 Raymond Hettinger <raymond.hettinger at gmail.com>:
>> On Apr 24, 2016, at 1:16 PM, Victor Stinner <victor.stinner at gmail.com> wrote:
>> I proposed to not try to optimize ceval.c to fetch (oparg, opval) in a
>> single 16-bit operation. It should be easy to implement it later, but
>> I prefer to focus on changing the format of the bytecode.
>
> Improving instruction decoding was the whole point and it was what kicked-off the work on the patch.  It is also where most of the performance improvement comes from and isn't the difficult part of the patch. The persnickety parts of the patch lay elsewhere, so there is really nothing to be gained gutting out our actual objective.
>
> The OPs original patch had already gotten this part done and it ran fine for me.

Oh wait, my phrasing was unclear. I do want to optimize the (opcode,
oparg) fetch; I just suggested splitting the patch into two parts and
carefully reviewing the first part first.

Victor

From raymond.hettinger at gmail.com  Mon Apr 25 02:51:51 2016
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sun, 24 Apr 2016 23:51:51 -0700
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <CAMpsgwZvFzKtBqJ3Y0o-k_i+04+V=Lb+gO_8-mTBzdHn8chrRA@mail.gmail.com>
References: <CAMpsgwaaDW2ZcPWXqnzTNz19CN-vKw7=fUj==32HbdEh4+oEuA@mail.gmail.com>
 <CAP7+vJK5EdoNA6aQNL328bBgr++Tpc+gaQqcKu=j8ANK7hBsOA@mail.gmail.com>
 <nfg64s$qh8$1@ger.gmane.org> <E6D01DA5-D064-48F9-956F-8780F991A6DF@gmail.com>
 <CAMpsgwb5cXmEhxEg5nXQbO_+odPpoGd=YSX4S7aLnG6zX5iJHg@mail.gmail.com>
 <4D8D6768-F161-435A-9176-05BDAD316105@gmail.com>
 <CAMpsgwZvFzKtBqJ3Y0o-k_i+04+V=Lb+gO_8-mTBzdHn8chrRA@mail.gmail.com>
Message-ID: <5C78B40A-B9A1-4343-8104-7E946674A858@gmail.com>


> On Apr 24, 2016, at 2:31 PM, Victor Stinner <victor.stinner at gmail.com> wrote:
> 
> 2016-04-24 23:16 GMT+02:00 Raymond Hettinger <raymond.hettinger at gmail.com>:
>>> On Apr 24, 2016, at 1:16 PM, Victor Stinner <victor.stinner at gmail.com> wrote:
>>> I proposed to not try to optimize ceval.c to fetch (oparg, opval) in a
>>> single 16-bit operation. It should be easy to implement it later, but
>>> I prefer to focus on changing the format of the bytecode.
>> 
>> Improving instruction decoding was the whole point and it was what kicked-off the work on the patch.  It is also where most of the performance improvement comes from and isn't the difficult part of the patch. The persnickety parts of the patch lay elsewhere, so there is really nothing to be gained gutting out our actual objective.
>> 
>> The OPs original patch had already gotten this part done and it ran fine for me.
> 
> Oh wait, my phrasing is unclear. I do want optimize the (opcode,
> oparg) fetch, I just suggested to split the patch in two parts, and
> first review carefully the first part.

Unless it is presenting a tough review challenge, we should do whatever we can to make it easier on the OP who seems to be working with very limited computational resources (I had to run the benchmarks for him because his setup lacked the requisite resources).  He's already put a lot of work into the patch, which was in pretty good shape when it arrived.

The opcode/oparg fetch logic is mostly already isolated to the part of the patch that touches ceval.c.  I found that part to be relatively clean and clear.  The part that took the most time to go through was for peephole.c.

How about we let Yury and Serhiy take a pass at it as is.  And, if they would benefit from splitting the patch into parts, then perhaps one of us with better tooling can pitch in to help the OP.


Raymond





From xdegaye at gmail.com  Mon Apr 25 04:11:38 2016
From: xdegaye at gmail.com (Xavier de Gaye)
Date: Mon, 25 Apr 2016 10:11:38 +0200
Subject: [Python-Dev] support of the android platform
In-Reply-To: <loom.20160424T113853-795@post.gmane.org>
References: <571C73A8.1030908@gmail.com>
 <loom.20160424T113853-795@post.gmane.org>
Message-ID: <571DD13A.6070401@gmail.com>

On 04/24/2016 11:50 AM, Stefan Krah wrote:
 > Xavier de Gaye <xdegaye <at> gmail.com> writes:
 >> Starting with API level 21 (Android 5.0), the build of python3 with the
 >> official android toolchains (that is, without resorting to external libraries
 >> for wide character support) runs correctly.  With the set of patches described
 >> in the patches/Makefile file at [1], the cpython test suite runs[2] on the
 >> android x86 and armv7 emulators with only few errors[3].  Those errors are
 >> listed with their corresponding error messages, this may give a raw idea of
 >> the effort needed to support this platform.
 >>
 >> Xavier
 >>
 >> [1] https://bitbucket.org/xdegaye/pyona/src
 >> [2] To reproduce these results, follow the instructions found in INSTALL
 >>       at https://bitbucket.org/xdegaye/pyona/wiki/install
 >> [3] https://bitbucket.org/xdegaye/pyona/wiki/testsuite
 >
 >
 > This looks great, very clean!  As I understand the patches, the locale.h and
 > langinfo.h problems are solved.  Do you think the following issues on the
 > Python bug tracker could be closed?
 >
 >
 > http://bugs.python.org/issue20305
 > http://bugs.python.org/issue22747
 > http://bugs.python.org/issue17905


Thanks.
A fix is still needed because Android does not HAVE_LANGINFO_H.
I have tried to answer your question directly in those issues.
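
As a rough illustration of what this means at the Python level: as far as I
know, locale.nl_langinfo() is only built when langinfo.h is available, so
portable code has to guard its use. The sketch below is one way to do that;
the helper name and the "utf-8" fallback are assumptions for the example, not
part of the patches.

```python
import locale


def preferred_codeset(default="utf-8"):
    """Return the locale's codeset, or a default if nl_langinfo is missing."""
    nl_langinfo = getattr(locale, "nl_langinfo", None)
    codeset = getattr(locale, "CODESET", None)
    if nl_langinfo is None or codeset is None:
        # Platform built without langinfo.h support (assumed to be the
        # situation on Android described above).
        return default
    # An empty answer is treated the same as a missing one.
    return nl_langinfo(codeset) or default


print(preferred_codeset())
```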

Xavier

From ericsnowcurrently at gmail.com  Mon Apr 25 10:36:34 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 25 Apr 2016 08:36:34 -0600
Subject: [Python-Dev] support of the android platform
In-Reply-To: <571C73A8.1030908@gmail.com>
References: <571C73A8.1030908@gmail.com>
Message-ID: <CALFfu7C5+3KkCVrtwyUezLRY6J3VaE0qo4rp1GHRCDre48GRjw@mail.gmail.com>

On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye <xdegaye at gmail.com> wrote:
> Starting with API level 21 (Android 5.0), the build of python3 with the
> official android toolchains (that is, without resorting to external
> libraries
> for wide character support) runs correctly.  With the set of patches
> described
> in the patches/Makefile file at [1], the cpython test suite runs[2] on the
> android x86 and armv7 emulators with only few errors[3].  Those errors are
> listed with their corresponding error messages, this may give a raw idea of
> the effort needed to support this platform.

How does this relate to http://bugs.python.org/issue23496?

-eric

From stefan at bytereef.org  Mon Apr 25 10:53:07 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Mon, 25 Apr 2016 14:53:07 +0000 (UTC)
Subject: [Python-Dev] support of the android platform
References: <571C73A8.1030908@gmail.com>
 <CALFfu7C5+3KkCVrtwyUezLRY6J3VaE0qo4rp1GHRCDre48GRjw@mail.gmail.com>
Message-ID: <loom.20160425T164909-296@post.gmane.org>

Eric Snow <ericsnowcurrently <at> gmail.com> writes:
> On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye <xdegaye <at> gmail.com> wrote:
> > Starting with API level 21 (Android 5.0), the build of python3 with the
> > official android toolchains (that is, without resorting to external

> How does this relate to http://bugs.python.org/issue23496?

As I understand, that issue seems abandoned and the patches are
(despite core devs asking otherwise) against 3.4.


If Xavier is willing to do so, I think it would be best to start over
with a new issue that integrates his work into 3.6.


Stefan Krah



From xdegaye at gmail.com  Mon Apr 25 16:22:56 2016
From: xdegaye at gmail.com (Xavier de Gaye)
Date: Mon, 25 Apr 2016 22:22:56 +0200
Subject: [Python-Dev] support of the android platform
In-Reply-To: <CALFfu7C5+3KkCVrtwyUezLRY6J3VaE0qo4rp1GHRCDre48GRjw@mail.gmail.com>
References: <571C73A8.1030908@gmail.com>
 <CALFfu7C5+3KkCVrtwyUezLRY6J3VaE0qo4rp1GHRCDre48GRjw@mail.gmail.com>
Message-ID: <571E7CA0.6030703@gmail.com>

On 04/25/2016 04:36 PM, Eric Snow wrote:
 > On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye <xdegaye at gmail.com> wrote:
 >> Starting with API level 21 (Android 5.0), the build of python3 with the
 >> official android toolchains (that is, without resorting to external
 >> libraries
 >> for wide character support) runs correctly.  With the set of patches
 >> described
 >> in the patches/Makefile file at [1], the cpython test suite runs[2] on the
 >> android x86 and armv7 emulators with only few errors[3].  Those errors are
 >> listed with their corresponding error messages, this may give a raw idea of
 >> the effort needed to support this platform.
 >
 > How does this relate to http://bugs.python.org/issue23496?


The patches in issue 23496 address the native compilation of Python on
Android 4.4.2, on an Android device, using a port of gcc to that device.  Some
of these patches are no longer needed on Android 5.0, and it seems that
kbox_fix.patch is needed only because the KBOX application is used to build
python in issue 23496.

The existing issues that are relevant to the android platform are, I think:
     issue #26723: Add an option to skip _decimal module
     issue #22747: Interpreter fails in initialize on systems where
                   HAVE_LANGINFO_H is undefined
     issue #16353: add function to os module for getting path to default shell
     issue #20306: Lack of pw_gecos field in Android's struct passwd causes
                   cross-compilation for the pwd module to fail

Xavier


From xdegaye at gmail.com  Mon Apr 25 16:25:55 2016
From: xdegaye at gmail.com (Xavier de Gaye)
Date: Mon, 25 Apr 2016 22:25:55 +0200
Subject: [Python-Dev] support of the android platform
In-Reply-To: <loom.20160425T164909-296@post.gmane.org>
References: <571C73A8.1030908@gmail.com>
 <CALFfu7C5+3KkCVrtwyUezLRY6J3VaE0qo4rp1GHRCDre48GRjw@mail.gmail.com>
 <loom.20160425T164909-296@post.gmane.org>
Message-ID: <571E7D53.2000809@gmail.com>

On 04/25/2016 04:53 PM, Stefan Krah wrote:
 > Eric Snow <ericsnowcurrently <at> gmail.com> writes:
 >> On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye <xdegaye <at> gmail.com>
 > wrote:
 >>> Starting with API level 21 (Android 5.0), the build of python3 with the
 >>> official android toolchains (that is, without resorting to external
 >
 >> How does this relate to http://bugs.python.org/issue23496?
 >
 > As I understand, that issue seems abandoned and the patches are
 > (despite core devs asking otherwise) against 3.4.
 >
 >
 > If Xavier is willing to do so, I think it would be best to start over
 > with a new issue that integrates his work into 3.6.


I will enter a new issue that lists all the new issues and the other already
existing issues that, had they been fixed, would have allowed a successful
cross-build and the same test suite results as described in my previous post.

Xavier


From kennethjwright at yahoo.co.uk  Mon Apr 25 16:51:03 2016
From: kennethjwright at yahoo.co.uk (Kenny)
Date: Mon, 25 Apr 2016 21:51:03 +0100
Subject: [Python-Dev] thingy
Message-ID: <aw2rc0ndjigr8fb383mhi0n4.1461617358307@email.android.com>

Dear thingy,

Please replace me with DZWORD. Put in HKEY\SYSTEM_IO_MEMORY\%USB%\%DZWORD%\%ADD\%CDATA\%DATA\

FI thingy


Sent from Samsung Mobile

From kennethjwright at yahoo.co.uk  Mon Apr 25 17:15:07 2016
From: kennethjwright at yahoo.co.uk (Kenny)
Date: Mon, 25 Apr 2016 22:15:07 +0100
Subject: [Python-Dev] Terminal console
Message-ID: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>


fopen Terminal.app.python.
3.5.0.()

def fopen Termina.app.python.3.5.0.()

%add.%data(CDATA[])::true||false

fclose();

end Terminal.app.python.3.5.0.()

Yours thingy


Sent from Samsung Mobile

From brett at python.org  Mon Apr 25 17:27:48 2016
From: brett at python.org (Brett Cannon)
Date: Mon, 25 Apr 2016 21:27:48 +0000
Subject: [Python-Dev] Terminal console
In-Reply-To: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
Message-ID: <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>

Can someone disable this person's subscription?

On Mon, 25 Apr 2016 at 14:15 Kenny via Python-Dev <python-dev at python.org>
wrote:

>
> fopen Terminal.app.python.
> 3.5.0.()
>
> def fopen Termina.app.python.3.5.0.()
>
> %add.%data(CDATA[])::true||false
>
> fclose();
>
> end Terminal.app.python.3.5.0.()
>
> Yours thingy
>
>
> Sent from Samsung Mobile
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/brett%40python.org
>

From tim.peters at gmail.com  Mon Apr 25 17:33:43 2016
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 25 Apr 2016 16:33:43 -0500
Subject: [Python-Dev] Terminal console
In-Reply-To: <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
Message-ID: <CAExdVNksi8jZDS-ZzW0CcxfFU+n6KS4N3wfbmRKtJXxOvgoiEA@mail.gmail.com>

[Brett Cannon <brett at python.org>]
> Can someone disable this person's subscription?

Done.


> On Mon, 25 Apr 2016 at 14:15 Kenny via Python-Dev <python-dev at python.org>
> wrote:
>>
>>
>> fopen Terminal.app.python.
>> 3.5.0.()
>>
>> def fopen Termina.app.python.3.5.0.()
>>
>> %add.%data(CDATA[])::true||false
>>
>> fclose();
>>
>> end Terminal.app.python.3.5.0.()
>>
>> Yours thingy
>>
>>
>> Sent from Samsung Mobile
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/brett%40python.org

From mail at timgolden.me.uk  Mon Apr 25 17:37:37 2016
From: mail at timgolden.me.uk (Tim Golden)
Date: Mon, 25 Apr 2016 22:37:37 +0100
Subject: [Python-Dev] Terminal console
In-Reply-To: <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
Message-ID: <571E8E21.7020600@timgolden.me.uk>

Not subscribed; probably via gmane.

I've added him to a hold list via spam filter. See if that works.

TJG

On 25/04/2016 22:27, Brett Cannon wrote:
> Can someone disable this person's subscription?
>
> On Mon, 25 Apr 2016 at 14:15 Kenny via Python-Dev <python-dev at python.org
> <mailto:python-dev at python.org>> wrote:
>
>
>     fopen Terminal.app.python.
>     3.5.0.()
>
>     def fopen Termina.app.python.3.5.0.()
>
>     %add.%data(CDATA[])::true||false
>
>     fclose();
>
>     end Terminal.app.python.3.5.0.()
>
>     Yours thingy
>
>
>     Sent from Samsung Mobile
>     _______________________________________________
>     Python-Dev mailing list
>     Python-Dev at python.org <mailto:Python-Dev at python.org>
>     https://mail.python.org/mailman/listinfo/python-dev
>     Unsubscribe:
>     https://mail.python.org/mailman/options/python-dev/brett%40python.org
>
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/mail%40timgolden.me.uk
>


From tim.peters at gmail.com  Mon Apr 25 17:43:07 2016
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 25 Apr 2016 16:43:07 -0500
Subject: [Python-Dev] Terminal console
In-Reply-To: <571E8E21.7020600@timgolden.me.uk>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
 <571E8E21.7020600@timgolden.me.uk>
Message-ID: <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>

[Tim Golden <mail at timgolden.me.uk>, on Kenny the "thingy" guy]
> Not subscribed; probably via gmane.

They were subscribed, but I already did the unsub.


> I've added him to a hold list via spam filter. See if that works.

So now we're doubly safe ;-)

From brett at python.org  Mon Apr 25 17:48:04 2016
From: brett at python.org (Brett Cannon)
Date: Mon, 25 Apr 2016 21:48:04 +0000
Subject: [Python-Dev] Terminal console
In-Reply-To: <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
 <571E8E21.7020600@timgolden.me.uk>
 <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
Message-ID: <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>

On Mon, 25 Apr 2016 at 14:45 Tim Peters <tim.peters at gmail.com> wrote:

> [Tim Golden <mail at timgolden.me.uk>, on Kenny the "thingy" guy]
> > Not subscribed; probably via gmane.
>
> They were subscribed, but I already did the unsub.
>
>
> > I've added him to a hold list via spam filter. See if that works.
>
> So now we're doubly safe ;-)
>

Well, now I just received an attempted unsubscribe, so maybe safe from more
email to the list, but it looks like a start at harassment of me.

From oreilldf at gmail.com  Mon Apr 25 18:02:20 2016
From: oreilldf at gmail.com (Dan O'Reilly)
Date: Mon, 25 Apr 2016 22:02:20 +0000
Subject: [Python-Dev] Terminal console
In-Reply-To: <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
 <571E8E21.7020600@timgolden.me.uk>
 <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
 <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>
Message-ID: <CAP3foKLusgaf6C9m7x+Gh=UobNu7dmijOBK-MmzHEWJOhefZ4w@mail.gmail.com>

Brett, your initial email shows up in Google Inbox (and maybe Gmail, too)
like this (including the ellipses):



*Can someone disable this person's subscription?...*

*  Unsubscribe: <a link to unsubscribe Brett at python.org
<http://python.org> from python-dev here>...*

So someone might have mistakenly clicked that link, thinking they were
helping to remove Kenny's subscription, for what it's worth.


On Mon, Apr 25, 2016 at 5:48 PM Brett Cannon <brett at python.org> wrote:

> On Mon, 25 Apr 2016 at 14:45 Tim Peters <tim.peters at gmail.com> wrote:
>
>> [Tim Golden <mail at timgolden.me.uk>, on Kenny the "thingy" guy]
>> > Not subscribed; probably via gmane.
>>
>> They were subscribed, but I already did the unsub.
>>
>>
>> > I've added him to a hold list via spam filter. See if that works.
>>
>> So now we're doubly safe ;-)
>>
>
> Well, now I just received an attempted unsubscribe, so maybe safe from
> more email to the list, but it looks like a start at harassment of me.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/oreilldf%40gmail.com
>

From brett at python.org  Mon Apr 25 18:07:31 2016
From: brett at python.org (Brett Cannon)
Date: Mon, 25 Apr 2016 22:07:31 +0000
Subject: [Python-Dev] Terminal console
In-Reply-To: <CAP3foKLusgaf6C9m7x+Gh=UobNu7dmijOBK-MmzHEWJOhefZ4w@mail.gmail.com>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
 <571E8E21.7020600@timgolden.me.uk>
 <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
 <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>
 <CAP3foKLusgaf6C9m7x+Gh=UobNu7dmijOBK-MmzHEWJOhefZ4w@mail.gmail.com>
Message-ID: <CAP1=2W680M2=JkmPjvLm+TG0RERuyrhk4y-S3sUToW7pXUktqQ@mail.gmail.com>

On Mon, 25 Apr 2016 at 15:02 Dan O'Reilly <oreilldf at gmail.com> wrote:

> Brett, your initial email shows up in Google Inbox (and maybe Gmail, too)
> like this (including the ellipses):
>
>
> *Can someone disable this person's subscription?*
>
> *...*
>
> *  Unsubscribe: <a link to unsubscribe Brett at python.org
> <http://python.org> from python-dev here>...*
>
> So someone might have mistakenly clicked that link, thinking they were
> helping to remove Kenny's subscription, for what it's worth.
>

Good point. Hopefully that's all it was then.

-Brett


>
>
> On Mon, Apr 25, 2016 at 5:48 PM Brett Cannon <brett at python.org> wrote:
>
>> On Mon, 25 Apr 2016 at 14:45 Tim Peters <tim.peters at gmail.com> wrote:
>>
>>> [Tim Golden <mail at timgolden.me.uk>, on Kenny the "thingy" guy]
>>> > Not subscribed; probably via gmane.
>>>
>>> They were subscribed, but I already did the unsub.
>>>
>>>
>>> > I've added him to a hold list via spam filter. See if that works.
>>>
>>> So now we're doubly safe ;-)
>>>
>>
>> Well, now I just received an attempted unsubscribe, so maybe safe from
>> more email to the list, but it looks like a start at harassment of me.
>>
> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>>
> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/oreilldf%40gmail.com
>>
>

From zachary.ware+pydev at gmail.com  Mon Apr 25 18:12:56 2016
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Mon, 25 Apr 2016 17:12:56 -0500
Subject: [Python-Dev] Terminal console
In-Reply-To: <CAKJDb-OS-BQCXkSh7KjqVADV_VsTbvtG=LHGcUmDmm0j=abiLQ@mail.gmail.com>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
 <571E8E21.7020600@timgolden.me.uk>
 <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
 <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>
 <CAP3foKLusgaf6C9m7x+Gh=UobNu7dmijOBK-MmzHEWJOhefZ4w@mail.gmail.com>
 <CAP1=2W680M2=JkmPjvLm+TG0RERuyrhk4y-S3sUToW7pXUktqQ@mail.gmail.com>
 <CAKJDb-OS-BQCXkSh7KjqVADV_VsTbvtG=LHGcUmDmm0j=abiLQ@mail.gmail.com>
Message-ID: <CAKJDb-OGC-Okmaw1crPpp=v8KsDNUUkVhycCdn8RGwGiFxSwpA@mail.gmail.com>

On Apr 25, 2016 17:08, "Brett Cannon" <brett at python.org> wrote:
>
> Good point. Hopefully that's all it was then.

Is there any particular reason we include that link in python-dev emails?
We don't for any other list as far as I know.

--
Zach
(On a phone)

From leewangzhong+python at gmail.com  Mon Apr 25 18:55:10 2016
From: leewangzhong+python at gmail.com (Franklin? Lee)
Date: Mon, 25 Apr 2016 18:55:10 -0400
Subject: [Python-Dev] Terminal console
In-Reply-To: <CAKJDb-OGC-Okmaw1crPpp=v8KsDNUUkVhycCdn8RGwGiFxSwpA@mail.gmail.com>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
 <571E8E21.7020600@timgolden.me.uk>
 <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
 <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>
 <CAP3foKLusgaf6C9m7x+Gh=UobNu7dmijOBK-MmzHEWJOhefZ4w@mail.gmail.com>
 <CAP1=2W680M2=JkmPjvLm+TG0RERuyrhk4y-S3sUToW7pXUktqQ@mail.gmail.com>
 <CAKJDb-OS-BQCXkSh7KjqVADV_VsTbvtG=LHGcUmDmm0j=abiLQ@mail.gmail.com>
 <CAKJDb-OGC-Okmaw1crPpp=v8KsDNUUkVhycCdn8RGwGiFxSwpA@mail.gmail.com>
Message-ID: <CAB_e7iykPXf83tdmm526t8DSyOHE132oiJT8kd7M94fGXXs_5A@mail.gmail.com>

FWIW, Gmail's policies require:
"""
    A user must be able to unsubscribe from your mailing list through
one of the following means:

    * A prominent link in the body of an email leading users to a page
confirming his or her unsubscription (no input from the user, other
than confirmation, should be required).
    * By replying to your email with an unsubscribe request.
"""
(https://support.google.com/mail/answer/81126)

That link is currently the only obvious way to unsubscribe.


On Mon, Apr 25, 2016 at 6:12 PM, Zachary Ware
<zachary.ware+pydev at gmail.com> wrote:
> On Apr 25, 2016 17:08, "Brett Cannon" <brett at python.org> wrote:
>>
>> Good point. Hopefully that's all it was then.
>
> Is there any particular reason we include that link in python-dev emails? We
> don't for any other list as far as I know.
>
> --
> Zach
> (On a phone)
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com
>

From ncoghlan at gmail.com  Mon Apr 25 22:02:52 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 26 Apr 2016 12:02:52 +1000
Subject: [Python-Dev] support of the android platform
In-Reply-To: <571E7D53.2000809@gmail.com>
References: <571C73A8.1030908@gmail.com>
 <CALFfu7C5+3KkCVrtwyUezLRY6J3VaE0qo4rp1GHRCDre48GRjw@mail.gmail.com>
 <loom.20160425T164909-296@post.gmane.org>
 <571E7D53.2000809@gmail.com>
Message-ID: <CADiSq7doiu1qVahak8NGmJoKeoABSP_C_rkoi36Xx6zZG14sjw@mail.gmail.com>

On 26 April 2016 at 06:25, Xavier de Gaye <xdegaye at gmail.com> wrote:

> On 04/25/2016 04:53 PM, Stefan Krah wrote:
> > Eric Snow <ericsnowcurrently <at> gmail.com> writes:
> >> On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye <xdegaye <at> gmail.com> wrote:
> >>> Starting with API level 21 (Android 5.0), the build of python3 with the
> >>> official android toolchains (that is, without resorting to external
> >
> >> How does this relate to http://bugs.python.org/issue23496?
> >
> > As I understand, that issue seems abandoned and the patches are
> > (despite core devs asking otherwise) against 3.4.
> >
> >
> > If Xavier is willing to do so, I think it would be best to start over
> > with a new issue that integrates his work into 3.6.
>
> I will enter a new issue that lists all the new issues and the other
> already existing issues that, had they been fixed, would have allowed a
> successful cross-build and the same test suite results as described in my
> previous post.
>

Thanks for this, Xavier!

Once you have that, in addition to posting the link back here, you may also
want to ping the Mobile SIG list:
https://www.python.org/community/sigs/current/mobile-sig/

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From rymg19 at gmail.com  Mon Apr 25 22:07:32 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Mon, 25 Apr 2016 21:07:32 -0500
Subject: [Python-Dev] support of the android platform
In-Reply-To: <571E7CA0.6030703@gmail.com>
References: <571C73A8.1030908@gmail.com>
 <CALFfu7C5+3KkCVrtwyUezLRY6J3VaE0qo4rp1GHRCDre48GRjw@mail.gmail.com>
 <571E7CA0.6030703@gmail.com>
Message-ID: <CAO41-mP5z-NN5aLZaYoOEi6R-Ow0UOUXTHJBWLof-Dp5FETNwg@mail.gmail.com>

Oh wow, has a year passed already? I don't have access to an Android device
suitable for development, and Cyd seems to have disappeared, which is why
the issue ended up abandoned. I'd be happy to try to help with the new
effort if possible!

--
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something's wrong.
http://kirbyfan64.github.io/
On Apr 25, 2016 3:24 PM, "Xavier de Gaye" <xdegaye at gmail.com> wrote:

> On 04/25/2016 04:36 PM, Eric Snow wrote:
> > On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye <xdegaye at gmail.com> wrote:
> >> Starting with API level 21 (Android 5.0), the build of python3 with the
> >> official android toolchains (that is, without resorting to external
> >> libraries for wide character support) runs correctly.  With the set of
> >> patches described in the patches/Makefile file at [1], the cpython test
> >> suite runs[2] on the android x86 and armv7 emulators with only few
> >> errors[3].  Those errors are listed with their corresponding error
> >> messages, this may give a raw idea of the effort needed to support this
> >> platform.
> >
> > How does this relate to http://bugs.python.org/issue23496?
>
>
> The patches in issue 23496 address the native compilation of Android 4.4.2
> on an android device using a port of gcc on this device.  Some of these
> patches are not needed anymore on Android 5.0 and it seems that the
> kbox_fix.patch is needed because the KBOX application is used to build
> python in issue 23496.
>
> The existing issues that are relevant to the android platform are, I think:
>     issue #26723: Add an option to skip _decimal module
>     issue #22747: Interpreter fails in initialize on systems where
>                   HAVE_LANGINFO_H is undefined
>     issue #16353: add function to os module for getting path to default shell
>     issue #20306: Lack of pw_gecos field in Android's struct passwd causes
>                   cross-compilation for the pwd module to fail
>
> Xavier
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com
>

From p.f.moore at gmail.com  Tue Apr 26 04:02:29 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 26 Apr 2016 09:02:29 +0100
Subject: [Python-Dev] Terminal console
In-Reply-To: <CAB_e7iykPXf83tdmm526t8DSyOHE132oiJT8kd7M94fGXXs_5A@mail.gmail.com>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
 <571E8E21.7020600@timgolden.me.uk>
 <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
 <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>
 <CAP3foKLusgaf6C9m7x+Gh=UobNu7dmijOBK-MmzHEWJOhefZ4w@mail.gmail.com>
 <CAP1=2W680M2=JkmPjvLm+TG0RERuyrhk4y-S3sUToW7pXUktqQ@mail.gmail.com>
 <CAKJDb-OS-BQCXkSh7KjqVADV_VsTbvtG=LHGcUmDmm0j=abiLQ@mail.gmail.com>
 <CAKJDb-OGC-Okmaw1crPpp=v8KsDNUUkVhycCdn8RGwGiFxSwpA@mail.gmail.com>
 <CAB_e7iykPXf83tdmm526t8DSyOHE132oiJT8kd7M94fGXXs_5A@mail.gmail.com>
Message-ID: <CACac1F9XuS4NQ1zq1ACOAgz6OdtuO-orUES0TcvX8p1O1vTmUw@mail.gmail.com>

On 25 April 2016 at 23:55, Franklin? Lee <leewangzhong+python at gmail.com> wrote:
> FWIW, Gmail's policies require:
[...]
> That link is currently the only obvious way to unsubscribe.

I'm not sure why gmail's policies should apply to this list.

I'm not against having an easy reminder of how to unsubscribe, but the
clickable link on every message that requests that the poster be
unsubscribed seems like the wrong way to do it, to me...

Paul

From ben+python at benfinney.id.au  Tue Apr 26 04:24:43 2016
From: ben+python at benfinney.id.au (Ben Finney)
Date: Tue, 26 Apr 2016 18:24:43 +1000
Subject: [Python-Dev] Mailing list metadata via RFC 2369 (was: Terminal
 console)
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
 <571E8E21.7020600@timgolden.me.uk>
 <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
 <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>
 <CAP3foKLusgaf6C9m7x+Gh=UobNu7dmijOBK-MmzHEWJOhefZ4w@mail.gmail.com>
 <CAP1=2W680M2=JkmPjvLm+TG0RERuyrhk4y-S3sUToW7pXUktqQ@mail.gmail.com>
 <CAKJDb-OS-BQCXkSh7KjqVADV_VsTbvtG=LHGcUmDmm0j=abiLQ@mail.gmail.com>
 <CAKJDb-OGC-Okmaw1crPpp=v8KsDNUUkVhycCdn8RGwGiFxSwpA@mail.gmail.com>
 <CAB_e7iykPXf83tdmm526t8DSyOHE132oiJT8kd7M94fGXXs_5A@mail.gmail.com>
Message-ID: <85a8kgn5f8.fsf_-_@benfinney.id.au>

"Franklin? Lee" <leewangzhong+python at gmail.com> writes:

> FWIW, Gmail's policies require:
> """
>     A user must be able to unsubscribe from your mailing list through
> one of the following means:
>
>     * A prominent link in the body of an email leading users to a page
> confirming his or her unsubscription (no input from the user, other
> than confirmation, should be required).
>     * By replying to your email with an unsubscribe request.
> """
> (https://support.google.com/mail/answer/81126)

GMail already has all the information needed to offer mailing list
functionality to every user.

The header of every message delivered via the mailing list has full RFC
2369 fields <URL:https://tools.ietf.org/html/rfc2369> which is ample
information, correctly structured for any application to provide the
functions GMail is referring to.
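
As a minimal sketch of how an application could use those fields, the following reads the RFC 2369 List-Unsubscribe header with Python's standard email package and extracts the URIs, which the RFC encloses in angle brackets and separates with commas. The sample message, its example.org addresses, and the helper name list_unsubscribe_uris are invented for illustration; they are not taken from python-dev's actual headers.

```python
import re
from email import message_from_string

# Hypothetical message; the header values are placeholders, not real list data.
SAMPLE = """\
From: someone@example.org
List-Unsubscribe: <https://lists.example.org/options/demo>,
 <mailto:demo-request@example.org?subject=unsubscribe>
Subject: demo

body
"""


def list_unsubscribe_uris(raw_message):
    """Return the URIs found in the List-Unsubscribe header, in order."""
    msg = message_from_string(raw_message)
    value = msg.get("List-Unsubscribe", "")
    # RFC 2369 wraps each URI in angle brackets; the header may be folded
    # across lines, which the pattern tolerates because it only looks
    # inside each <...> pair.
    return re.findall(r"<([^>]+)>", value)


for uri in list_unsubscribe_uris(SAMPLE):
    print(uri)
```

A mail client could offer its own unsubscribe control from these URIs without relying on any link in the message body.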

GMail support staff have known this for many years because RFC 2369
support has been requested for their interface over and over again.

There are reports they even make some use of that standard information
<URL:http://www.itworld.com/article/2693280/unified-communications/gmail-s--unsubscribe--tool-comes-out-of-the-weeds.html>
though as I never use GMail I can't verify that.

If not, then their refusal to follow a mature, well-implemented internet
standard is no reason for anyone else to change behaviour. It is up to
GMail to use the standard information.

-- 
 \           "Anything that we scientists can do to weaken the hold of |
  `\        religion should be done and may in the end be our greatest |
_o__)                 contribution to civilization." --Steven Weinberg |
Ben Finney


From leewangzhong+python at gmail.com  Tue Apr 26 08:45:20 2016
From: leewangzhong+python at gmail.com (Franklin? Lee)
Date: Tue, 26 Apr 2016 08:45:20 -0400
Subject: [Python-Dev] Terminal console
In-Reply-To: <CACac1F9XuS4NQ1zq1ACOAgz6OdtuO-orUES0TcvX8p1O1vTmUw@mail.gmail.com>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
 <571E8E21.7020600@timgolden.me.uk>
 <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
 <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>
 <CAP3foKLusgaf6C9m7x+Gh=UobNu7dmijOBK-MmzHEWJOhefZ4w@mail.gmail.com>
 <CAP1=2W680M2=JkmPjvLm+TG0RERuyrhk4y-S3sUToW7pXUktqQ@mail.gmail.com>
 <CAKJDb-OS-BQCXkSh7KjqVADV_VsTbvtG=LHGcUmDmm0j=abiLQ@mail.gmail.com>
 <CAKJDb-OGC-Okmaw1crPpp=v8KsDNUUkVhycCdn8RGwGiFxSwpA@mail.gmail.com>
 <CAB_e7iykPXf83tdmm526t8DSyOHE132oiJT8kd7M94fGXXs_5A@mail.gmail.com>
 <CACac1F9XuS4NQ1zq1ACOAgz6OdtuO-orUES0TcvX8p1O1vTmUw@mail.gmail.com>
Message-ID: <CAB_e7izs0oBaPAuddd2PDMV9UgN0SV85EDZcUYQWtAU5co-8TQ@mail.gmail.com>

On Apr 26, 2016 4:02 AM, "Paul Moore" <p.f.moore at gmail.com> wrote:
>
> On 25 April 2016 at 23:55, Franklin? Lee <leewangzhong+python at gmail.com> wrote:
> > FWIW, Gmail's policies require:
> [...]
> > That link is currently the only obvious way to unsubscribe.
>
> I'm not sure why gmail's policies should apply to this list.

They're Gmail's policies on how not to get your messages filtered by Gmail
as spam.

I am not clear on whether they're descriptive (i.e. users will mark you as
spam) or prescriptive (i.e. Google's algorithms will determine that you're
spam).

From zachary.ware+pydev at gmail.com  Tue Apr 26 08:57:16 2016
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Tue, 26 Apr 2016 07:57:16 -0500
Subject: [Python-Dev] Terminal console
In-Reply-To: <CAB_e7izs0oBaPAuddd2PDMV9UgN0SV85EDZcUYQWtAU5co-8TQ@mail.gmail.com>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
 <571E8E21.7020600@timgolden.me.uk>
 <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
 <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>
 <CAP3foKLusgaf6C9m7x+Gh=UobNu7dmijOBK-MmzHEWJOhefZ4w@mail.gmail.com>
 <CAP1=2W680M2=JkmPjvLm+TG0RERuyrhk4y-S3sUToW7pXUktqQ@mail.gmail.com>
 <CAKJDb-OS-BQCXkSh7KjqVADV_VsTbvtG=LHGcUmDmm0j=abiLQ@mail.gmail.com>
 <CAKJDb-OGC-Okmaw1crPpp=v8KsDNUUkVhycCdn8RGwGiFxSwpA@mail.gmail.com>
 <CAB_e7iykPXf83tdmm526t8DSyOHE132oiJT8kd7M94fGXXs_5A@mail.gmail.com>
 <CACac1F9XuS4NQ1zq1ACOAgz6OdtuO-orUES0TcvX8p1O1vTmUw@mail.gmail.com>
 <CAB_e7izs0oBaPAuddd2PDMV9UgN0SV85EDZcUYQWtAU5co-8TQ@mail.gmail.com>
Message-ID: <CAKJDb-Ptccyq6wUki+1j50HK2wyFh9fp2DoSqJjEfFSh4fJC3Q@mail.gmail.com>

On Apr 26, 2016 07:45, "Franklin? Lee" <leewangzhong+python at gmail.com> wrote:
>
> On Apr 26, 2016 4:02 AM, "Paul Moore" <p.f.moore at gmail.com> wrote:
> >
> > On 25 April 2016 at 23:55, Franklin? Lee <leewangzhong+python at gmail.com> wrote:
> > > FWIW, Gmail's policies require:
> > [...]
> > > That link is currently the only obvious way to unsubscribe.
> >
> > I'm not sure why gmail's policies should apply to this list.
>
> They're Gmail's policies on how not to get your messages filtered by
> Gmail as spam.
>
> I am not clear on whether they're descriptive (i.e. users will mark you
> as spam) or prescriptive (i.e. Google's algorithms will determine that
> you're spam).

I have no trouble with Gmail with several other Python lists that do not
include an unsubscribe link.

--
Zach
(On a phone)

From ncoghlan at gmail.com  Tue Apr 26 09:13:50 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 26 Apr 2016 23:13:50 +1000
Subject: [Python-Dev] Terminal console
In-Reply-To: <CAKJDb-Ptccyq6wUki+1j50HK2wyFh9fp2DoSqJjEfFSh4fJC3Q@mail.gmail.com>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
 <571E8E21.7020600@timgolden.me.uk>
 <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
 <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>
 <CAP3foKLusgaf6C9m7x+Gh=UobNu7dmijOBK-MmzHEWJOhefZ4w@mail.gmail.com>
 <CAP1=2W680M2=JkmPjvLm+TG0RERuyrhk4y-S3sUToW7pXUktqQ@mail.gmail.com>
 <CAKJDb-OS-BQCXkSh7KjqVADV_VsTbvtG=LHGcUmDmm0j=abiLQ@mail.gmail.com>
 <CAKJDb-OGC-Okmaw1crPpp=v8KsDNUUkVhycCdn8RGwGiFxSwpA@mail.gmail.com>
 <CAB_e7iykPXf83tdmm526t8DSyOHE132oiJT8kd7M94fGXXs_5A@mail.gmail.com>
 <CACac1F9XuS4NQ1zq1ACOAgz6OdtuO-orUES0TcvX8p1O1vTmUw@mail.gmail.com>
 <CAB_e7izs0oBaPAuddd2PDMV9UgN0SV85EDZcUYQWtAU5co-8TQ@mail.gmail.com>
 <CAKJDb-Ptccyq6wUki+1j50HK2wyFh9fp2DoSqJjEfFSh4fJC3Q@mail.gmail.com>
Message-ID: <CADiSq7cw3FggftWC_znmR9-8TDyVLivGSS8meCgvj2Wmj9aFmw@mail.gmail.com>

On 26 April 2016 at 22:57, Zachary Ware <zachary.ware+pydev at gmail.com> wrote:
>
> On Apr 26, 2016 07:45, "Franklin? Lee" <leewangzhong+python at gmail.com> wrote:
> >
> > On Apr 26, 2016 4:02 AM, "Paul Moore" <p.f.moore at gmail.com> wrote:
> > >
> > > On 25 April 2016 at 23:55, Franklin? Lee <leewangzhong+python at gmail.com> wrote:
> > > > FWIW, Gmail's policies require:
> > > [...]
> > > > That link is currently the only obvious way to unsubscribe.
> > >
> > > I'm not sure why gmail's policies should apply to this list.
> >
> > They're Gmail's policies on how not to get your messages filtered by Gmail as spam.
> >
> > I am not clear on whether they're descriptive (i.e. users will mark you as spam) or prescriptive (i.e. Google's algorithms will determine that you're spam).
>
> I have no trouble with Gmail with several other Python lists that do not include an unsubscribe link.

Indeed, Mailman inserts the appropriate List-Unsubscribe headers, so
there's no need for a link in the body of the emails (and including it
can cause problems when link scrapers hit the archives, or link
pre-fetching in a webmail client misbehaves).
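
A minimal sketch of how a mail client could pick up those headers, using
Python's standard email package. The sample message, its example.org
addresses and the helper name unsubscribe_targets are made up for
illustration; only the List-Unsubscribe field name comes from RFC 2369.

```python
import re
from email import message_from_string

# Hypothetical list message; the addresses are placeholders.
RAW = """\
From: some-list@example.org
List-Unsubscribe: <https://example.org/options/some-list>,
 <mailto:some-list-request@example.org?subject=unsubscribe>
Subject: hello

body
"""


def unsubscribe_targets(raw_message):
    """Return the URLs listed in the List-Unsubscribe header, if any."""
    msg = message_from_string(raw_message)
    header = msg.get("List-Unsubscribe", "")
    # RFC 2369 puts each URL in angle brackets, separated by commas.
    return re.findall(r"<([^>]+)>", header)


targets = unsubscribe_targets(RAW)
```

A client would typically offer the first target it knows how to handle,
which is what makes a button in the UI possible without any link in the
message body.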

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From barry at python.org  Tue Apr 26 09:17:43 2016
From: barry at python.org (Barry Warsaw)
Date: Tue, 26 Apr 2016 09:17:43 -0400
Subject: [Python-Dev] Terminal console
In-Reply-To: <CACac1F9XuS4NQ1zq1ACOAgz6OdtuO-orUES0TcvX8p1O1vTmUw@mail.gmail.com>
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com>
 <CAP1=2W4FkUA=S5AuLeTkKAfUnp+nfCoUAF=B=PNsceXXBuqoJg@mail.gmail.com>
 <571E8E21.7020600@timgolden.me.uk>
 <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
 <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>
 <CAP3foKLusgaf6C9m7x+Gh=UobNu7dmijOBK-MmzHEWJOhefZ4w@mail.gmail.com>
 <CAP1=2W680M2=JkmPjvLm+TG0RERuyrhk4y-S3sUToW7pXUktqQ@mail.gmail.com>
 <CAKJDb-OS-BQCXkSh7KjqVADV_VsTbvtG=LHGcUmDmm0j=abiLQ@mail.gmail.com>
 <CAKJDb-OGC-Okmaw1crPpp=v8KsDNUUkVhycCdn8RGwGiFxSwpA@mail.gmail.com>
 <CAB_e7iykPXf83tdmm526t8DSyOHE132oiJT8kd7M94fGXXs_5A@mail.gmail.com>
 <CACac1F9XuS4NQ1zq1ACOAgz6OdtuO-orUES0TcvX8p1O1vTmUw@mail.gmail.com>
Message-ID: <20160426091743.797027b2@subdivisions.wooz.org>

On Apr 26, 2016, at 09:02 AM, Paul Moore wrote:

>I'm not against having an easy reminder of how to unsubscribe, but the
>clickable link on every message that requests that the poster be
>unsubscribed seems like the wrong way to do it, to me...

And yet, we have it anyway!

This list turns on full personalization so the footers will all have a link to
your unsubscribe page.  As Ben pointed out, we also implement RFC 2369, which
is only an 18 year old standard.

Cheers,
-Barry

From ethan at stoneleaf.us  Tue Apr 26 10:14:35 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 26 Apr 2016 07:14:35 -0700
Subject: [Python-Dev] support of the android platform
In-Reply-To: <571C73A8.1030908@gmail.com>
References: <571C73A8.1030908@gmail.com>
Message-ID: <571F77CB.3000905@stoneleaf.us>

On 04/24/2016 12:20 AM, Xavier de Gaye wrote:

> [1] https://bitbucket.org/xdegaye/pyona/src

The license:
-----------
This software is licensed under the GNU General Public License version 3 
or later.
-----------


Will combining your code with Python 3 be a problem?

--
~Ethan~

From steve at pearwood.info  Tue Apr 26 11:40:30 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 27 Apr 2016 01:40:30 +1000
Subject: [Python-Dev] Terminal console
In-Reply-To: <CAB_e7izs0oBaPAuddd2PDMV9UgN0SV85EDZcUYQWtAU5co-8TQ@mail.gmail.com>
References: <571E8E21.7020600@timgolden.me.uk>
 <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
 <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>
 <CAP3foKLusgaf6C9m7x+Gh=UobNu7dmijOBK-MmzHEWJOhefZ4w@mail.gmail.com>
 <CAP1=2W680M2=JkmPjvLm+TG0RERuyrhk4y-S3sUToW7pXUktqQ@mail.gmail.com>
 <CAKJDb-OS-BQCXkSh7KjqVADV_VsTbvtG=LHGcUmDmm0j=abiLQ@mail.gmail.com>
 <CAKJDb-OGC-Okmaw1crPpp=v8KsDNUUkVhycCdn8RGwGiFxSwpA@mail.gmail.com>
 <CAB_e7iykPXf83tdmm526t8DSyOHE132oiJT8kd7M94fGXXs_5A@mail.gmail.com>
 <CACac1F9XuS4NQ1zq1ACOAgz6OdtuO-orUES0TcvX8p1O1vTmUw@mail.gmail.com>
 <CAB_e7izs0oBaPAuddd2PDMV9UgN0SV85EDZcUYQWtAU5co-8TQ@mail.gmail.com>
Message-ID: <20160426154029.GM13497@ando.pearwood.info>

On Tue, Apr 26, 2016 at 08:45:20AM -0400, Franklin? Lee wrote:
> On Apr 26, 2016 4:02 AM, "Paul Moore" <p.f.moore at gmail.com> wrote:
> >
> > On 25 April 2016 at 23:55, Franklin? Lee <leewangzhong+python at gmail.com> wrote:
> > > FWIW, Gmail's policies require:
> > [...]
> > > That link is currently the only obvious way to unsubscribe.
> >
> > I'm not sure why gmail's policies should apply to this list.
> 
> They're Gmail's policies on how not to get your messages filtered by Gmail
> as spam.
> 
> I am not clear on whether they're descriptive (i.e. users will mark you as
> spam) or prescriptive (i.e. Google's algorithms will determine that you're
> spam).

I don't think it's just Google. If I remember correctly, having a 
clearly visible and *working* unsubscribe link in the body of the email 
(not merely hidden away in the headers where non-technical users would 
never think to look) is a requirement for the CanSpam act, or whatever 
it was called. 

In any case, whether it is a legal or practical requirement or not, it's
a fairly small burden. As I see it, the only time it causes a (tiny)
issue is if somebody accidentally includes the footer from a list mail
they received when forwarding to somebody else (or sending to the list),
and the receiver mistakenly (or in an attempt to cause trouble) clicks
on that link. Which is harmless.

Considering how many hundreds, thousands (hundreds of thousands? 
sometimes it feels like that *wink*) of emails go through this list 
alone, I don't think this is a problem that needs fixing.


-- 
Steve

From xdegaye at gmail.com  Tue Apr 26 11:41:51 2016
From: xdegaye at gmail.com (Xavier de Gaye)
Date: Tue, 26 Apr 2016 17:41:51 +0200
Subject: [Python-Dev] support of the android platform
In-Reply-To: <CADiSq7doiu1qVahak8NGmJoKeoABSP_C_rkoi36Xx6zZG14sjw@mail.gmail.com>
References: <571C73A8.1030908@gmail.com>
 <CALFfu7C5+3KkCVrtwyUezLRY6J3VaE0qo4rp1GHRCDre48GRjw@mail.gmail.com>
 <loom.20160425T164909-296@post.gmane.org> <571E7D53.2000809@gmail.com>
 <CADiSq7doiu1qVahak8NGmJoKeoABSP_C_rkoi36Xx6zZG14sjw@mail.gmail.com>
Message-ID: <571F8C3F.7060803@gmail.com>

On 04/26/2016 04:02 AM, Nick Coghlan wrote:
 > On 26 April 2016 at 06:25, Xavier de Gaye <xdegaye at gmail.com> wrote:
 >
 >     On 04/25/2016 04:53 PM, Stefan Krah wrote:
 >     > Eric Snow <ericsnowcurrently <at> gmail.com> writes:
 >     >> On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye <xdegaye <at> gmail.com> wrote:
 >     >>> Starting with API level 21 (Android 5.0), the build of python3 with the
 >     >>> official android toolchains (that is, without resorting to external
 >     >
 >     >> How does this relate to http://bugs.python.org/issue23496?
 >     >
 >     > As I understand, that issue seems abandoned and the patches are
 >     > (despite core devs asking otherwise) against 3.4.
 >     >
 >     >
 >     > If Xavier is willing to do so, I think it would be best to start over
 >     > with a new issue that integrates his work into 3.6.
 >
 >     I will enter a new issue that lists all the new issues and the other already
 >     existing issues that, had they been fixed, would have allowed a
 >     successful cross-build and the same test suite results as described in my
 >     previous post.
 >
 >
 > Thanks for this, Xavier!
 >
 > Once you have that, in addition to posting the link back here, you may also want to ping the Mobile SIG list: https://www.python.org/community/sigs/current/mobile-sig/


Issue 26865 [1] lists the issues that may have to be fixed with a view to
future support of the android platform.

Xavier

[1] http://bugs.python.org/issue26865


From xdegaye at gmail.com  Tue Apr 26 11:53:28 2016
From: xdegaye at gmail.com (Xavier de Gaye)
Date: Tue, 26 Apr 2016 17:53:28 +0200
Subject: [Python-Dev] support of the android platform
In-Reply-To: <571F77CB.3000905@stoneleaf.us>
References: <571C73A8.1030908@gmail.com> <571F77CB.3000905@stoneleaf.us>
Message-ID: <571F8EF8.5080807@gmail.com>

On 04/26/2016 04:14 PM, Ethan Furman wrote:
 > On 04/24/2016 12:20 AM, Xavier de Gaye wrote:
 >
 >> [1] https://bitbucket.org/xdegaye/pyona/src
 >
 > The license:
 > -----------
 > This software is licensed under the GNU General Public License version 3 or later.
 > -----------
 >
 >
 > Will combining your code with Python 3 be a problem?


This code, or part of it, could be used to set up a buildbot and in this case
there would not be any conflict between the GPL v3 license and the Python
license, I think. I don't see how it can be combined with Python 3.

Xavier

From barry at python.org  Tue Apr 26 12:12:01 2016
From: barry at python.org (Barry Warsaw)
Date: Tue, 26 Apr 2016 12:12:01 -0400
Subject: [Python-Dev] Terminal console
In-Reply-To: <20160426154029.GM13497@ando.pearwood.info>
References: <571E8E21.7020600@timgolden.me.uk>
 <CAExdVNmeXiR+rPWR6NBXAW_e19xvar-P3EzmCQKFDFs4Bp2Q-Q@mail.gmail.com>
 <CAP1=2W6t5foUfpYH9eXZM95a-Xo6=zsG-i_yFuvq8H-xDoM2Tw@mail.gmail.com>
 <CAP3foKLusgaf6C9m7x+Gh=UobNu7dmijOBK-MmzHEWJOhefZ4w@mail.gmail.com>
 <CAP1=2W680M2=JkmPjvLm+TG0RERuyrhk4y-S3sUToW7pXUktqQ@mail.gmail.com>
 <CAKJDb-OS-BQCXkSh7KjqVADV_VsTbvtG=LHGcUmDmm0j=abiLQ@mail.gmail.com>
 <CAKJDb-OGC-Okmaw1crPpp=v8KsDNUUkVhycCdn8RGwGiFxSwpA@mail.gmail.com>
 <CAB_e7iykPXf83tdmm526t8DSyOHE132oiJT8kd7M94fGXXs_5A@mail.gmail.com>
 <CACac1F9XuS4NQ1zq1ACOAgz6OdtuO-orUES0TcvX8p1O1vTmUw@mail.gmail.com>
 <CAB_e7izs0oBaPAuddd2PDMV9UgN0SV85EDZcUYQWtAU5co-8TQ@mail.gmail.com>
 <20160426154029.GM13497@ando.pearwood.info>
Message-ID: <20160426121201.323dcf2b@subdivisions.wooz.org>

On Apr 27, 2016, at 01:40 AM, Steven D'Aprano wrote:

>I don't think it's just Google. If I remember correctly, having a clearly
>visible and *working* unsubscribe link in the body of the email (not merely
>hidden away in the headers where non-technical users would never think to
>look) is a requirement for the CanSpam act, or whatever it was called.

BTW, the whole point of RFC 2369 headers is so that MUAs can implement a nice
big fat blinky UNSUBSCRIBE button in their UI.

-Barry

From stefan at bytereef.org  Tue Apr 26 13:12:26 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Tue, 26 Apr 2016 17:12:26 +0000 (UTC)
Subject: [Python-Dev] support of the android platform
References: <571C73A8.1030908@gmail.com> <571F77CB.3000905@stoneleaf.us>
 <571F8EF8.5080807@gmail.com>
Message-ID: <loom.20160426T190419-873@post.gmane.org>

Xavier de Gaye <xdegaye <at> gmail.com> writes:
> This code, or part of it, could be used to set up a buildbot and in this case
> there would not be any conflict between the GPL v3 license and the Python
> license, I think. I don't see how it can be combined with Python 3.

For the patches on the tracker I just went by your contributor agreement.
I didn't check the lineage of the patches. Can I assume that either you
are re-licensing GPL-stuff written by yourself to the PSF (which is a
perfectly valid use case of the agreement) or rewriting from scratch?


Stefan Krah

From xdegaye at gmail.com  Tue Apr 26 15:59:02 2016
From: xdegaye at gmail.com (Xavier de Gaye)
Date: Tue, 26 Apr 2016 21:59:02 +0200
Subject: [Python-Dev] support of the android platform
In-Reply-To: <loom.20160426T190419-873@post.gmane.org>
References: <571C73A8.1030908@gmail.com> <571F77CB.3000905@stoneleaf.us>
 <571F8EF8.5080807@gmail.com> <loom.20160426T190419-873@post.gmane.org>
Message-ID: <571FC886.1020904@gmail.com>

On 04/26/2016 07:12 PM, Stefan Krah wrote:
 > Xavier de Gaye <xdegaye <at> gmail.com> writes:
 >> This code, or part of it, could be used to set up a buildbot and in this case
 >> there would not be any conflict between the GPL v3 license and the Python
 >> license, I think. I don't see how it can be combined with Python 3.
 >
 > For the patches on the tracker I just went by your contributor agreement.
 > I didn't check the lineage of the patches. Can I assume that either you
 > are re-licensing GPL-stuff written by yourself to the PSF (which is a
 > perfectly valid use case of the agreement) or rewriting from scratch?


Yes, I am re-licensing GPL code to the PSF for all the patches written by me
in the issues listed on http://bugs.python.org/issue26865#msg264310.  I have
only rewritten the patches from scratch in the following issues:

issue #26849: android does not support versioning in SONAME
               (using a switch case on ac_sys_system)
issue #26854: missing header on android for the ossaudiodev module
               (actually it's difficult to rewrite such an obvious patch)
issue #26855: add platform.android_ver() for android
               (using configparser; Chi Hsuan Yen is proposing a more complete approach)

Fixes for those three issues can also be found in other projects porting
python3 to android, the ones that I know of are:
   * Python 3 Android at https://github.com/yan12125/python3-android, author
     Chi Hsuan Yen
   * python-for-android at https://github.com/kuri65536/python-for-android,
     author shimoda dragon

I also quickly browsed issue 23496 and could not find any overlap with my
patches.

Xavier


From stefan at bytereef.org  Tue Apr 26 16:35:52 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Tue, 26 Apr 2016 20:35:52 +0000 (UTC)
Subject: [Python-Dev] support of the android platform
References: <571C73A8.1030908@gmail.com> <571F77CB.3000905@stoneleaf.us>
 <571F8EF8.5080807@gmail.com> <loom.20160426T190419-873@post.gmane.org>
 <571FC886.1020904@gmail.com>
Message-ID: <loom.20160426T223151-882@post.gmane.org>

Xavier de Gaye <xdegaye <at> gmail.com> writes:
> Yes, I am re-licensing GPL code to the PSF for all the patches written by me
> in the issues listed on http://bugs.python.org/issue26865#msg264310.  I have
> only rewritten the patches from scratch in the following issues:

Thanks, this all sounds good.


> issue #26854: missing header on android for the ossaudiodev module
>                (actually it's difficult to rewrite such an obvious patch)

Indeed. :)



Stefan Krah

From storchaka at gmail.com  Wed Apr 27 03:14:41 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 27 Apr 2016 10:14:41 +0300
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
Message-ID: <nfpotl$i8j$1@ger.gmane.org>

There are three functions (or at least three documented functions) in the
C API that "steal" references: PyList_SetItem(), PyTuple_SetItem() and
PyModule_AddObject(). The first two "steal" the reference even on failure,
and this is well-known behaviour. But PyModule_AddObject() "steals" a
reference only on success. Nothing in the documentation points this out.
Most uses of PyModule_AddObject() in the stdlib don't decref the value
when PyModule_AddObject() fails. The only exceptions are in the _json,
_io, and _tkinter modules. In many cases, including examples in the
documentation, the result of PyModule_AddObject() is not checked either,
but that is a different issue.

We could just fix the documentation by adding a note that
PyModule_AddObject() doesn't steal a reference on failure, and add
explicit decrefs after PyModule_AddObject() in hundreds of places in the
code.

But I think it would be better to "fix" PyModule_AddObject() by making
it decref the reference on failure, as most developers expect. But this
is a dangerous change: if the author of third-party code read not only
the documentation but also the CPython code, and added an explicit decref
on PyModule_AddObject() failure, we would get a double decref.

I think that we can resolve this issue by the following steps:

1. Add a new function PyModule_AddObject2(), that steals a reference 
even on failure.

2. Introduce a special macro like PY_SSIZE_T_CLEAN (any suggestions 
about a name?). If it is defined, define PyModule_AddObject as 
PyModule_AddObject2. Define this macro before including Python.h in all 
CPython modules except _json, _io, and _tkinter.

3. Make the old PyModule_AddObject emit a warning about a possible leak,
with a suggestion to define the above macro.


From berker.peksag at gmail.com  Wed Apr 27 06:00:29 2016
From: berker.peksag at gmail.com (Berker Peksağ)
Date: Wed, 27 Apr 2016 13:00:29 +0300
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <nfpotl$i8j$1@ger.gmane.org>
References: <nfpotl$i8j$1@ger.gmane.org>
Message-ID: <CAF4280Lg0ALo4wZZSjGjrLEDWDn0cG7JwK28_feb8iynT9Pu1Q@mail.gmail.com>

On Wed, Apr 27, 2016 at 10:14 AM, Serhiy Storchaka <storchaka at gmail.com> wrote:
> I think that we can resolve this issue by the following steps:
>
> 1. Add a new function PyModule_AddObject2(), that steals a reference even on
> failure.

+1

It would be good to document PyModule_AddObject's current behavior in
3.5+ (already attached a patch).

> 2. Introduce a special macro like PY_SSIZE_T_CLEAN (any suggestions about a
> name?). If it is defined, define PyModule_AddObject as PyModule_AddObject2.
> Define this macro before including Python.h in all CPython modules except
> _json, _io, and _tkinter.

+1

> 3. Make the old PyModule_AddObject emit a warning about a possible leak,
> with a suggestion to define the above macro.

+0

From hrvoje.niksic at avl.com  Wed Apr 27 08:31:37 2016
From: hrvoje.niksic at avl.com (Hrvoje Niksic)
Date: Wed, 27 Apr 2016 14:31:37 +0200
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <nfpotl$i8j$1@ger.gmane.org>
References: <nfpotl$i8j$1@ger.gmane.org>
Message-ID: <5720B129.5010900@avl.com>

On 04/27/2016 09:14 AM, Serhiy Storchaka wrote:
> There are three functions (or at least three documented functions) in the
> C API that "steal" references: PyList_SetItem(), PyTuple_SetItem() and
> PyModule_AddObject(). The first two "steal" the reference even on failure,
> and this is well-known behaviour. But PyModule_AddObject() "steals" a
> reference only on success. Nothing in the documentation points this out.

This inconsistency has caused bugs (or, more fairly, potential leaks) 
before, see http://bugs.python.org/issue1782

Unfortunately, the suggested Python 3 change to PyModule_AddObject was 
not accepted.

> 1. Add a new function PyModule_AddObject2(), that steals a reference
> even on failure.

This sounds like a good idea, except the name could be prettier :), e.g. 
PyModule_InsertObject. PyModule_AddObject could be deprecated.

Hrvoje


From ncoghlan at gmail.com  Wed Apr 27 09:08:37 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 27 Apr 2016 23:08:37 +1000
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <nfpotl$i8j$1@ger.gmane.org>
References: <nfpotl$i8j$1@ger.gmane.org>
Message-ID: <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>

On 27 April 2016 at 17:14, Serhiy Storchaka <storchaka at gmail.com> wrote:
> I think that we can resolve this issue by the following steps:
>
> 1. Add a new function PyModule_AddObject2(), that steals a reference even on
> failure.

I'd suggest a variant on this that more closely matches the
PyList_SetItem and PyTuple_SetItem cases: PyModule_SetAttrString

The first two match the signature of PySequence_SetItem, but steal the
reference instead of making a new one, and the same relationship would
exist between PyObject_SetAttrString and the new
PyModule_SetAttrString.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Wed Apr 27 09:10:55 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 27 Apr 2016 23:10:55 +1000
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
Message-ID: <CADiSq7cXLNZjFHa3tsvir=ZAq3oM4+Kf8=h_=m6NVGCe_wQtEA@mail.gmail.com>

On 27 April 2016 at 23:08, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 27 April 2016 at 17:14, Serhiy Storchaka <storchaka at gmail.com> wrote:
>> I think that we can resolve this issue by the following steps:
>>
>> 1. Add a new function PyModule_AddObject2(), that steals a reference even on
>> failure.
>
> I'd suggest a variant on this that more closely matches the
> PyList_SetItem and PyTuple_SetItem cases: PyModule_SetAttrString

And for the record: that suggestion was prompted by Hrvoje's email
suggesting using a more descriptive name, I just went and looked up
the name of the corresponding PyObject_* API.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From storchaka at gmail.com  Wed Apr 27 13:55:47 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 27 Apr 2016 20:55:47 +0300
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <nfpotl$i8j$1@ger.gmane.org>
References: <nfpotl$i8j$1@ger.gmane.org>
Message-ID: <nfquf3$ec5$1@ger.gmane.org>

On 27.04.16 10:14, Serhiy Storchaka wrote:
> There are three functions (or at least three documented functions) in the
> C API that "steal" references: PyList_SetItem(), PyTuple_SetItem() and
> PyModule_AddObject(). The first two "steal" the reference even on failure,
> and this is well-known behaviour. But PyModule_AddObject() "steals" a
> reference only on success. Nothing in the documentation points this out.
> Most uses of PyModule_AddObject() in the stdlib don't decref the value
> when PyModule_AddObject() fails. The only exceptions are in the _json,
> _io, and _tkinter modules. In many cases, including examples in the
> documentation, the result of PyModule_AddObject() is not checked either,
> but that is a different issue.
>
> We could just fix the documentation by adding a note that
> PyModule_AddObject() doesn't steal a reference on failure, and add
> explicit decrefs after PyModule_AddObject() in hundreds of places in the
> code.
>
> But I think it would be better to "fix" PyModule_AddObject() by making
> it decref the reference on failure, as most developers expect. But this
> is a dangerous change: if the author of third-party code read not only
> the documentation but also the CPython code, and added an explicit decref
> on PyModule_AddObject() failure, we would get a double decref.
>
> I think that we can resolve this issue by the following steps:
>
> 1. Add a new function PyModule_AddObject2(), that steals a reference
> even on failure.
>
> 2. Introduce a special macro like PY_SSIZE_T_CLEAN (any suggestions
> about a name?). If it is defined, define PyModule_AddObject as
> PyModule_AddObject2. Define this macro before including Python.h in all
> CPython modules except _json, _io, and _tkinter.
>
> 3. Make the old PyModule_AddObject emit a warning about a possible leak,
> with a suggestion to define the above macro.

Opened an issue: http://bugs.python.org/issue26871 .

The provided patch introduces a new macro, PY_MODULE_ADDOBJECT_CLEAN, that
controls the behavior of PyModule_AddObject() in the same way that
PY_SSIZE_T_CLEAN controls the behavior of the PyArg_Parse* functions. If
the macro is defined before including "Python.h", PyModule_AddObject()
steals a reference unconditionally.  Otherwise it steals a reference only
on success, and the caller is responsible for decref'ing it on error
(current behavior).
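
The two behaviours can be compared with a toy model. This is only a sketch
in Python with a hand-maintained counter, not the C API and not the patch
itself; ToyObject, decref, add_object_current, add_object_clean and the
fail flag are all hypothetical names invented for the illustration.

```python
class ToyObject:
    """Stand-in for a PyObject with an explicit reference count."""

    def __init__(self):
        self.refcnt = 1  # the creator owns one reference


def decref(obj):
    obj.refcnt -= 1


def add_object_current(module, name, value, fail=False):
    """Models the current behaviour: steal the reference only on success."""
    if fail:
        return -1  # the caller still owns its reference
    module[name] = value  # the module takes over the caller's reference
    return 0


def add_object_clean(module, name, value, fail=False):
    """Models the behaviour with the macro defined: always steal."""
    if fail:
        decref(value)  # consumed even though nothing was stored
        return -1
    module[name] = value
    return 0


def outstanding_refs(add, explicit_decref):
    """Force a failure and report how many references remain afterwards."""
    value = ToyObject()
    if add({}, "name", value, fail=True) < 0 and explicit_decref:
        decref(value)
    return value.refcnt


leak = outstanding_refs(add_object_current, explicit_decref=False)
careful = outstanding_refs(add_object_current, explicit_decref=True)
clean = outstanding_refs(add_object_clean, explicit_decref=False)
double = outstanding_refs(add_object_clean, explicit_decref=True)
```

In this model a remaining count of 1 is the leak, 0 is correct cleanup,
and a negative count is the double decref that code with an explicit
decref would hit if the unconditional behaviour were enabled for it.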



From storchaka at gmail.com  Wed Apr 27 14:02:19 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 27 Apr 2016 21:02:19 +0300
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <5720B129.5010900@avl.com>
References: <nfpotl$i8j$1@ger.gmane.org> <5720B129.5010900@avl.com>
Message-ID: <nfqurc$n66$1@ger.gmane.org>

On 27.04.16 15:31, Hrvoje Niksic wrote:
> On 04/27/2016 09:14 AM, Serhiy Storchaka wrote:
>> There are three functions (or at least three documented functions) in the
>> C API that "steal" references: PyList_SetItem(), PyTuple_SetItem() and
>> PyModule_AddObject(). The first two "steal" the reference even on failure,
>> and this is well-known behaviour. But PyModule_AddObject() "steals" a
>> reference only on success. Nothing in the documentation points this out.
>
> This inconsistency has caused bugs (or, more fairly, potential leaks)
> before, see http://bugs.python.org/issue1782

Glad to hear I'm not the first to face this problem.

> Unfortunately, the suggested Python 3 change to PyModule_AddObject was
> not accepted.

Bad. Maybe it happened because of the risk of breaking working
third-party code.

I propose a gradual path to change PyModule_AddObject.

>> 1. Add a new function PyModule_AddObject2(), that steals a reference
>> even on failure.
>
> This sounds like a good idea, except the name could be prettier :), e.g.
> PyModule_InsertObject. PyModule_AddObject could be deprecated.

I have decided not to introduce a new public function, but just to control
the behavior of the old function with the macro. This needs minimal
changes to user code.



From storchaka at gmail.com  Wed Apr 27 14:06:25 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 27 Apr 2016 21:06:25 +0300
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
Message-ID: <nfqv32$qu1$1@ger.gmane.org>

On 27.04.16 16:08, Nick Coghlan wrote:
> On 27 April 2016 at 17:14, Serhiy Storchaka <storchaka at gmail.com> wrote:
>> I think that we can resolve this issue by the following steps:
>>
>> 1. Add a new function PyModule_AddObject2(), that steals a reference even on
>> failure.
>
> I'd suggest a variant on this that more closely matches the
> PyList_SetItem and PyTuple_SetItem cases: PyModule_SetAttrString
>
> The first two match the signature of PySequence_SetItem, but steal the
> reference instead of making a new one, and the same relationship would
> exist between PyObject_SetAttrString and the new
> PyModule_SetAttrString.

I think it is better to relate it to PyModule_AddIntConstant() etc. than
to PyObject_SetAttrString.

My patch doesn't introduce a new public function, but changes the behavior
of the old function. This needs minimal changes to user code, which mostly
uses PyModule_AddObject() incorrectly (not blaming the authors).



From stefan at bytereef.org  Wed Apr 27 16:51:29 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Wed, 27 Apr 2016 20:51:29 +0000 (UTC)
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
References: <nfpotl$i8j$1@ger.gmane.org> <5720B129.5010900@avl.com>
Message-ID: <loom.20160427T223657-893@post.gmane.org>

Hrvoje Niksic <hrvoje.niksic <at> avl.com> writes:
> This inconsistency has caused bugs (or, more fairly, potential leaks) 
> before, see http://bugs.python.org/issue1782
> 
> Unfortunately, the suggested Python 3 change to PyModule_AddObject was 
> not accepted.

First, these "leaks" only potentially show up when you already have
much bigger problems (i.e. on Linux the machine would already freeze
due to overallocation).

Second, these "leaks" don't even show up as "definitely lost" in
Valgrind (yes, I checked).


On the bright side, Python must be in a very healthy state if we
can afford to spend time on issues such as this one.



Stefan Krah


From casevh at gmail.com  Wed Apr 27 18:24:39 2016
From: casevh at gmail.com (Case Van Horsen)
Date: Wed, 27 Apr 2016 15:24:39 -0700
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <nfqv32$qu1$1@ger.gmane.org>
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
 <nfqv32$qu1$1@ger.gmane.org>
Message-ID: <CANerV6k=mo0VH=RTm78D_-8BQifSPzznvy66W+diK5_tPbqMEQ@mail.gmail.com>

On Wed, Apr 27, 2016 at 11:06 AM, Serhiy Storchaka <storchaka at gmail.com> wrote:
> I think it is better to relate it to PyModule_AddIntConstant() etc. than
> to PyObject_SetAttrString.
>
> My patch doesn't introduce a new public function, but changes the behavior
> of the old function. This needs minimal changes to user code, which mostly
> uses PyModule_AddObject() incorrectly (not blaming the authors).

How will this impact code that uses PyModule_AddObject() correctly?

From storchaka at gmail.com  Thu Apr 28 04:15:35 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 28 Apr 2016 11:15:35 +0300
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <CANerV6k=mo0VH=RTm78D_-8BQifSPzznvy66W+diK5_tPbqMEQ@mail.gmail.com>
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
 <nfqv32$qu1$1@ger.gmane.org>
 <CANerV6k=mo0VH=RTm78D_-8BQifSPzznvy66W+diK5_tPbqMEQ@mail.gmail.com>
Message-ID: <nfsgr8$2l2$1@ger.gmane.org>

On 28.04.16 01:24, Case Van Horsen wrote:
> On Wed, Apr 27, 2016 at 11:06 AM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>> I think it is better to relate it to PyModule_AddIntConstant() etc. than
>> to PyObject_SetAttrString.
>>
>> My patch doesn't introduce a new public function, but changes the behavior
>> of the old function. This needs minimal changes to user code, which mostly
>> uses PyModule_AddObject() incorrectly (not blaming the authors).
>
> How will this impact code that uses PyModule_AddObject() correctly?

No impact except a deprecation warning at build time. But we can remove
the deprecation warning and add it in a future release if this is
annoying.

But are you sure that your code uses PyModule_AddObject() correctly?
Only two modules in the stdlib (_json and _tkinter) use it correctly.
Other modules have bugs even where they try to use PyModule_AddObject()
correctly for some operations.



From stefan at bytereef.org  Thu Apr 28 04:38:19 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 28 Apr 2016 08:38:19 +0000 (UTC)
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
 <nfqv32$qu1$1@ger.gmane.org>
 <CANerV6k=mo0VH=RTm78D_-8BQifSPzznvy66W+diK5_tPbqMEQ@mail.gmail.com>
 <nfsgr8$2l2$1@ger.gmane.org>
Message-ID: <loom.20160428T103425-380@post.gmane.org>

Serhiy Storchaka <storchaka <at> gmail.com> writes:
> No impact except emitting a deprecation warning at build time. But we 
> can remove a deprecation warning and add it in future release if this is 
> annoying.
> 
> But are you sure, that your code uses PyModule_AddObject() correctly? 
> Only two modules in the stdlib (_json and _tkinter) used it correctly. 
> Other modules have bugs even in tries to use PyModule_AddObject() 
> correctly for some operations.

Could you perhaps stop labeling this as a bug? Usually we are talking
about a *single* "leak" that a) does not even show up in Valgrind and
b) only occurs under severe memory pressure when the OOM-killer is
already waiting.


I'm honestly mystified by your terminology and it's beginning to feel
that you need to justify this patch at all costs.


Stefan Krah

From stefan at bytereef.org  Thu Apr 28 05:05:13 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 28 Apr 2016 09:05:13 +0000 (UTC)
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
 <nfqv32$qu1$1@ger.gmane.org>
 <CANerV6k=mo0VH=RTm78D_-8BQifSPzznvy66W+diK5_tPbqMEQ@mail.gmail.com>
 <nfsgr8$2l2$1@ger.gmane.org>
Message-ID: <loom.20160428T110122-28@post.gmane.org>

Serhiy Storchaka <storchaka <at> gmail.com> writes:
> But are you sure, that your code uses PyModule_AddObject() correctly? 
> Only two modules in the stdlib (_json and _tkinter) used it correctly. 
> Other modules have bugs even in tries to use PyModule_AddObject() 
> correctly for some operations.

For the list, this is the extent of this horrible "bug":


diff --git a/Modules/_decimal/_decimal.c b/Modules/_decimal/_decimal.c
--- a/Modules/_decimal/_decimal.c
+++ b/Modules/_decimal/_decimal.c
@@ -5804,8 +5804,7 @@
                PyObject_CallObject((PyObject *)&PyDecContext_Type, NULL));
     init_basic_context(basic_context_template);
     Py_INCREF(basic_context_template);
-    CHECK_INT(PyModule_AddObject(m, "BasicContext",
-                                 basic_context_template));
+    CHECK_INT(-1);



$ valgrind --suppressions=Misc/valgrind-python.supp ./python -c "import decimal"

[...]
==16945== LEAK SUMMARY:
==16945==    definitely lost: 0 bytes in 0 blocks
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[...]


Stefan Krah

From random832 at fastmail.com  Thu Apr 28 09:28:46 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 28 Apr 2016 09:28:46 -0400
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <loom.20160428T110122-28@post.gmane.org>
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
 <nfqv32$qu1$1@ger.gmane.org>
 <CANerV6k=mo0VH=RTm78D_-8BQifSPzznvy66W+diK5_tPbqMEQ@mail.gmail.com>
 <nfsgr8$2l2$1@ger.gmane.org> <loom.20160428T110122-28@post.gmane.org>
Message-ID: <1461850126.3154397.592289905.70140D82@webmail.messagingengine.com>

On Thu, Apr 28, 2016, at 05:05, Stefan Krah wrote:
> $ valgrind --suppressions=Misc/valgrind-python.supp ./python -c "import
> decimal"
> 
> [...]
> ==16945== LEAK SUMMARY:
> ==16945==    definitely lost: 0 bytes in 0 blocks
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Well, the obvious flaw with your test case is that a reference is
retained forever in the C static variable basic_context_template.

Now, it is arguable that this may be a reasonably common pattern, and
that this doesn't actually constitute misuse of the API (the reference
count will be wrong, but the object itself is immortal anyway, so it
doesn't matter if it's 2 or 1 since it can't be 0 even with correct
usage).
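
The point about the static variable can be modelled in pure Python. This
is only a sketch under stated assumptions: `Context`, `template` and
`module_namespace` are made-up names, and a weak reference stands in for
a reachability-based leak checker. It is not how _decimal or Valgrind
actually work.

```python
import gc
import weakref


class Context:
    """Stand-in for the context object created during module init."""


# Plays the role of the C static variable that keeps its own reference.
template = Context()

# Plays the role of the module namespace holding a second reference.
module_namespace = {"BasicContext": template}

# A weak reference lets us observe whether the object is still alive.
probe = weakref.ref(template)

# Drop the module's reference, as if the module had never stored it.
del module_namespace["BasicContext"]
gc.collect()

# The object is still reachable through the "static" variable, so a
# reachability-based checker has nothing to report as lost.
still_alive = probe() is not None
print(still_alive)
```

As long as one long-lived reference remains, the object stays reachable
whether or not the module stored its own reference.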

From storchaka at gmail.com  Thu Apr 28 09:55:32 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 28 Apr 2016 16:55:32 +0300
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <loom.20160428T103425-380@post.gmane.org>
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
 <nfqv32$qu1$1@ger.gmane.org>
 <CANerV6k=mo0VH=RTm78D_-8BQifSPzznvy66W+diK5_tPbqMEQ@mail.gmail.com>
 <nfsgr8$2l2$1@ger.gmane.org> <loom.20160428T103425-380@post.gmane.org>
Message-ID: <nft4ol$it1$1@ger.gmane.org>

On 28.04.16 11:38, Stefan Krah wrote:
> Serhiy Storchaka <storchaka <at> gmail.com> writes:
>> No impact except emitting a deprecation warning at build time. But we
>> can remove a deprecation warning and add it in future release if this is
>> annoying.
>>
>> But are you sure, that your code uses PyModule_AddObject() correctly?
>> Only two modules in the stdlib (_json and _tkinter) used it correctly.
>> Other modules have bugs even in tries to use PyModule_AddObject()
>> correctly for some operations.
>
> Could you perhaps stop labeling this as a bug? Usually we are talking
> about a *single* "leak" that a) does not even show up in Valgrind and
> b) only occurs under severe memory pressure when the OOM-killer is
> already waiting.
>
>
> I'm honestly mystified by your terminology and it's beginning to feel
> that you need to justify this patch at all costs.

I say this is a bug because

1. PyModule_AddObject() behavior doesn't match the documentation.

2. Most code that uses PyModule_AddObject() doesn't work as intended. 
Since the behavior of PyModule_AddObject() contradicts the documentation 
and is counterintuitive, we can't blame the authors for this.

I don't say this is a high-impact bug; I even agree that there is no 
need to fix the second part in maintained releases. But this is a bug 
unless you propose a different definition of a bug.
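
The behavior under discussion can be illustrated with a toy model. This
is a sketch only: `ToyObject`, `toy_add_object` and the explicit
`refcount` field are hypothetical stand-ins, not CPython's real
reference counting. The one assumption carried over from the C API is
that the reference is taken over on success only, so on failure the
caller still owns it.

```python
class ToyObject:
    """Toy object with an explicit reference count (not CPython's real one)."""

    def __init__(self):
        self.refcount = 1  # the creator owns one reference


def toy_add_object(module, name, obj, fail=False):
    """Model of the current behavior: the reference is taken only on success."""
    if fail:
        return -1  # nothing stored; the caller still owns its reference
    module[name] = obj  # the module takes over the caller's reference
    return 0


def init_unchecked(fail):
    """Common pattern: the caller never releases the object on failure."""
    module, obj = {}, ToyObject()
    toy_add_object(module, "X", obj, fail)
    return module, obj


def init_checked(fail):
    """Careful pattern: release the reference when the call fails."""
    module, obj = {}, ToyObject()
    if toy_add_object(module, "X", obj, fail) < 0:
        obj.refcount -= 1  # models Py_DECREF(obj)
    return module, obj


for init in (init_unchecked, init_checked):
    module, obj = init(fail=True)
    leaked = obj.refcount > 0 and "X" not in module
    print(init.__name__, "leaks on failure:", leaked)
```

In the model, only the unchecked pattern leaves an object with a
nonzero count and no owner, and only when the call fails.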

What can we do about this?

1. Change the documentation of PyModule_AddObject(). I think this is not 
controversial, and Berker provided a patch in
http://bugs.python.org/issue26868 .

2. Update examples in the documentation to correctly handle errors of 
PyModule_AddObject(). This is more questionable, due to the case (3c) 
below and because correct error handling code distracts attention from 
the main purpose of the examples.

3. One of the alternatives:

3a) Fix almost all usages of PyModule_AddObject() in stdlib extension 
modules. This is hundreds of occurrences in over fifty files.

3b) Allow changing the behavior of PyModule_AddObject() to match most 
authors' expectations. This needs only one added line to switch on the 
new behavior in most files.

3c) Ignore the issue. In this case we can skip checking the result of 
PyModule_AddObject() at all. But I am afraid that correctly fixing issues 
with subinterpreters will require us to return to this issue.



From stefan at bytereef.org  Thu Apr 28 10:11:29 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 28 Apr 2016 14:11:29 +0000 (UTC)
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
 <nfqv32$qu1$1@ger.gmane.org>
 <CANerV6k=mo0VH=RTm78D_-8BQifSPzznvy66W+diK5_tPbqMEQ@mail.gmail.com>
 <nfsgr8$2l2$1@ger.gmane.org> <loom.20160428T110122-28@post.gmane.org>
 <1461850126.3154397.592289905.70140D82@webmail.messagingengine.com>
Message-ID: <loom.20160428T155718-564@post.gmane.org>

Random832 <random832 <at> fastmail.com> writes:
> On Thu, Apr 28, 2016, at 05:05, Stefan Krah wrote:
> > $ valgrind --suppressions=Misc/valgrind-python.supp ./python -c "import
> > decimal"
> > 
> > [...]
> > ==16945== LEAK SUMMARY:
> > ==16945==    definitely lost: 0 bytes in 0 blocks
> >              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Well, the obvious flaw with your test case is that a reference is
> retained forever in the C static variable basic_context_template.

For actual users of Valgrind this is patently obvious and was
pretty much the point of my post.


Stefan Krah


From nileshdate1990 at gmail.com  Thu Apr 28 08:00:56 2016
From: nileshdate1990 at gmail.com (Nilesh Date)
Date: Thu, 28 Apr 2016 17:30:56 +0530
Subject: [Python-Dev] Needs to install python 3.4.4 in RHEL 6
Message-ID: <CABoQ3y-g9=9Laphg+M5jjYfXScOfMWqe4bLkeKSYfCd1-B1AAA@mail.gmail.com>

Hi team,

I want to install Python version 3.4.4 on my RHEL 6 system.
Can someone give me the installation process, or a reference link from
which I can get the required steps and download the desired package?

Thanks,
*Nilesh Date*

From random832 at fastmail.com  Thu Apr 28 11:17:35 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 28 Apr 2016 11:17:35 -0400
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <loom.20160428T155718-564@post.gmane.org>
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
 <nfqv32$qu1$1@ger.gmane.org>
 <CANerV6k=mo0VH=RTm78D_-8BQifSPzznvy66W+diK5_tPbqMEQ@mail.gmail.com>
 <nfsgr8$2l2$1@ger.gmane.org> <loom.20160428T110122-28@post.gmane.org>
 <1461850126.3154397.592289905.70140D82@webmail.messagingengine.com>
 <loom.20160428T155718-564@post.gmane.org>
Message-ID: <1461856655.3183220.592420497.7429979F@webmail.messagingengine.com>

On Thu, Apr 28, 2016, at 10:11, Stefan Krah wrote:
> For actual users of Valgrind this is patently obvious and was
> pretty much the point of my post.

A more relevant point would be that _decimal does *not* use the API in a
way *which would be broken by the proposed change*, regardless of
whether the way in which it uses it is subjectively correct or can cause
leaks.

From stefan at bytereef.org  Thu Apr 28 11:26:07 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 28 Apr 2016 15:26:07 +0000 (UTC)
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
 <nfqv32$qu1$1@ger.gmane.org>
 <CANerV6k=mo0VH=RTm78D_-8BQifSPzznvy66W+diK5_tPbqMEQ@mail.gmail.com>
 <nfsgr8$2l2$1@ger.gmane.org> <loom.20160428T110122-28@post.gmane.org>
 <1461850126.3154397.592289905.70140D82@webmail.messagingengine.com>
 <loom.20160428T155718-564@post.gmane.org>
 <1461856655.3183220.592420497.7429979F@webmail.messagingengine.com>
Message-ID: <loom.20160428T172130-46@post.gmane.org>

Random832 <random832 <at> fastmail.com> writes:
> A more relevant point would be that _decimal does *not* use the API in a
> way *which would be broken by the proposed change*, regardless of
> whether the way in which it uses it is subjectively correct or can cause
> leaks.

And the ultimate point is that I don't want to spend about a week per year
to evaluate the effect of needless code changes on a highly audited module.

And no, this isn't theoretical...


Stefan Krah

From stefan at bytereef.org  Thu Apr 28 11:29:11 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 28 Apr 2016 15:29:11 +0000 (UTC)
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
 <nfqv32$qu1$1@ger.gmane.org>
 <CANerV6k=mo0VH=RTm78D_-8BQifSPzznvy66W+diK5_tPbqMEQ@mail.gmail.com>
 <nfsgr8$2l2$1@ger.gmane.org> <loom.20160428T103425-380@post.gmane.org>
 <nft4ol$it1$1@ger.gmane.org>
Message-ID: <loom.20160428T172636-391@post.gmane.org>

Serhiy Storchaka <storchaka <at> gmail.com> writes:
> 2. Most code that uses PyModule_AddObject() doesn't work as intended. 
> Since the behavior of PyModule_AddObject() contradicts the documentation 
> and is counterintuitive, we can't blame the authors for this.
> 
> I don't say this is a high-impacting bug, I even agree that there is no 
> need to fix the second part in maintained releases. But this is a bug 
> unless you propose different definition for a bug.


Why do you think that module authors don't know that?  For _decimal, I was
aware of the strange behavior.  Yes, a single reference can "leak" on
failure.


The problem is that we don't seem to have any common ground here.


Do you accept the following?

  1) PyModule_AddObject() can only fail if malloc() fails.

    a) Normally (for small allocations) this is such a serious problem
       that the whole application fails anyway.

    b) Say that you're lucky and the application continues.

         i) The import fails. In some cases ImportError is caught and
            a fallback is imported (example _pydecimal). In that case
            you leak an entire DSO and something small like a single
            context object. What is the practical difference between the
            two?

        ii) The import fails and there's no fallback. Usually the
            application stops, otherwise DSO+small leak again.

       iii) Retry the import (I have never seen this):

              while(1):
                  try:
                      import leftpad
                  except (ImportError, MemoryError):
                      continue
                  break

            You could have a legitimate leak here, but see a).
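
Case i) above, where ImportError is caught and a fallback is imported,
can be sketched as follows. The helper name `import_with_fallback` is
made up for illustration; the module names `_decimal` and `_pydecimal`
come from the example in the list. This is a minimal sketch, not the
code the decimal module itself uses.

```python
import importlib


def import_with_fallback(preferred, fallback):
    """Import `preferred`; if that raises ImportError, import `fallback`."""
    try:
        return importlib.import_module(preferred)
    except ImportError:
        # The failed attempt may already have allocated resources that are
        # never released; that is the small leak discussed in the list.
        return importlib.import_module(fallback)


decimal_impl = import_with_fallback("_decimal", "_pydecimal")
print(decimal_impl.Decimal("1.1") + decimal_impl.Decimal("2.2"))
```

Either implementation exposes the same `Decimal` type to the caller, so
application code does not need to know which import succeeded.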



Module initializations are intricate and boring.  I suspect that if
we promote wide changes across PyPI packages we'll see more additional
segfaults than theoretically plugged memory leaks.


Stefan Krah

From zachary.ware+pydev at gmail.com  Thu Apr 28 11:38:22 2016
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Thu, 28 Apr 2016 10:38:22 -0500
Subject: [Python-Dev] Needs to install python 3.4.4 in RHEL 6
In-Reply-To: <CABoQ3y-g9=9Laphg+M5jjYfXScOfMWqe4bLkeKSYfCd1-B1AAA@mail.gmail.com>
References: <CABoQ3y-g9=9Laphg+M5jjYfXScOfMWqe4bLkeKSYfCd1-B1AAA@mail.gmail.com>
Message-ID: <CAKJDb-PbSyGFJ84c3syR6E1koL4wRLrvafGjnaFS7B+_esY=Xg@mail.gmail.com>

Hi Nilesh,

On Thu, Apr 28, 2016 at 7:00 AM, Nilesh Date <nileshdate1990 at gmail.com> wrote:
> Hi team,
>
> I wanted to install python version 3.4.4 in my RHEL 6 system.
> Can someone give installation process or any reference link from which I can
> get required steps and download desire package.

You have a couple of options.

Option 1: use software collections [1].  As I vaguely understand it
(having never used this myself), the rh-python34 package is supported
by Red Hat, and is like any other package for the most part.  Looking
at that page it does look a bit more complex than option 2 to me, but
I've built and installed Python several times over the past few years
:)

Option 2: compile and install yourself.  At a minimum, you'll need a C
compiler (gcc, icc, or clang are recommended), and development headers
for any extension modules that you require (I'd recommend
openssl-devel and readline-devel at the least).  Then download the
source [2], extract it, and run `cd Python-3.4.4 && ./configure &&
make profile-opt && make test && sudo make install`.  That series of
commands will give you python installed in `/usr/local/` that has been
compiled with profile-guided optimization (PGO) and has passed the
full Python test suite.  If any but the last step fails, nothing will
have changed on your system.

[1] https://www.softwarecollections.org/en/scls/rhscl/rh-python34/
[2] https://www.python.org/downloads/source/

Hope this helps,
-- 
Zach

From gvanrossum at gmail.com  Thu Apr 28 12:30:21 2016
From: gvanrossum at gmail.com (Guido van Rossum)
Date: Thu, 28 Apr 2016 09:30:21 -0700
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <loom.20160428T172130-46@post.gmane.org>
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
 <nfqv32$qu1$1@ger.gmane.org>
 <CANerV6k=mo0VH=RTm78D_-8BQifSPzznvy66W+diK5_tPbqMEQ@mail.gmail.com>
 <nfsgr8$2l2$1@ger.gmane.org>
 <loom.20160428T110122-28@post.gmane.org>
 <1461850126.3154397.592289905.70140D82@webmail.messagingengine.com>
 <loom.20160428T155718-564@post.gmane.org>
 <1461856655.3183220.592420497.7429979F@webmail.messagingengine.com>
 <loom.20160428T172130-46@post.gmane.org>
Message-ID: <CAP7+vJKKj1xgtXWck48TD3E-fERsCHFCGiUwfvcV9C1q61POAg@mail.gmail.com>

Stefan, could you explain which module you are talking about and why it
would cost you a week? What is your responsibility here?

--Guido (mobile)
On Apr 28, 2016 8:28 AM, "Stefan Krah" <stefan at bytereef.org> wrote:

> Random832 <random832 <at> fastmail.com> writes:
> > A more relevant point would be that _decimal does *not* use the API in a
> > way *which would be broken by the proposed change*, regardless of
> > whether the way in which it uses it is subjectively correct or can cause
> > leaks.
>
> And the ultimate point is that I don't want to spend about a week per year
> to evaluate the effect of needless code changes on a highly audited module.
>
> And no, this isn't theoretical...
>
>
> Stefan Krah

From ethan at stoneleaf.us  Thu Apr 28 12:56:36 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 28 Apr 2016 09:56:36 -0700
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To: <loom.20160428T172130-46@post.gmane.org>
References: <nfpotl$i8j$1@ger.gmane.org>
 <CADiSq7dFx0sHfbVUGE+9QxDV1oTUbSUNz_RiXipfLCFwK838yQ@mail.gmail.com>
 <nfqv32$qu1$1@ger.gmane.org>
 <CANerV6k=mo0VH=RTm78D_-8BQifSPzznvy66W+diK5_tPbqMEQ@mail.gmail.com>
 <nfsgr8$2l2$1@ger.gmane.org> <loom.20160428T110122-28@post.gmane.org>
 <1461850126.3154397.592289905.70140D82@webmail.messagingengine.com>
 <loom.20160428T155718-564@post.gmane.org>
 <1461856655.3183220.592420497.7429979F@webmail.messagingengine.com>
 <loom.20160428T172130-46@post.gmane.org>
Message-ID: <572240C4.9010107@stoneleaf.us>

On 04/28/2016 08:26 AM, Stefan Krah wrote:
> Random832 writes:

>> A more relevant point would be that _decimal does *not* use the API in a
>> way *which would be broken by the proposed change*, regardless of
>> whether the way in which it uses it is subjectively correct or can cause
>> leaks.
>
> And the ultimate point is that I don't want to spend about a week per year
> to evaluate the effect of needless code changes on a highly audited module.
>
> And no, this isn't theoretical...

Considering you have to opt-in to the change, why would this be a big 
deal for you?

Or are you saying you'd rather have PyModule_AddObject deprecated 
(without removal?), and a new PyWhatever_Whatever to take its place?

--
~Ethan~

From brett at python.org  Thu Apr 28 15:07:48 2016
From: brett at python.org (Brett Cannon)
Date: Thu, 28 Apr 2016 19:07:48 +0000
Subject: [Python-Dev] Anyone want to lead the sprints at PyCon US 2016?
In-Reply-To: <CAP1=2W4sDeTTZWUxkoieD+bJprXFK=5NfBkLUg=vopgNFPSMYg@mail.gmail.com>
References: <CADx+GQPKnvWWB1e_kL6Hqh7skuDiVRBMK8szs5QGS4NGt-rnJg@mail.gmail.com>
 <CAP1=2W4sDeTTZWUxkoieD+bJprXFK=5NfBkLUg=vopgNFPSMYg@mail.gmail.com>
Message-ID: <CAP1=2W4Sb94TPeViPwTmOEkCNikjm4+BCVM32E4wF-j54-tyFw@mail.gmail.com>

No one stepped forward to lead the sprints this year, so I will put myself
as the sprint leader and lean on everyone else who appears to help. :)

On Tue, 5 Apr 2016 at 09:36 Brett Cannon <brett at python.org> wrote:

> The call has started to go out for sprint groups to list themselves
> online. Anyone want to specifically lead the core sprint this year? If no
> one specifically does then I will sign us up and do my usual thing of
> pointing people at the devguide and encourage people to ask questions but
> not do a lot of hand-holding (I'm expecting to be busy either working on
> GitHub migration stuff or doing other things that I have been neglecting
> due to my GitHub migration work).
>
> ---------- Forwarded message ---------
> From: Ewa Jodlowska <ewa at python.org>
> Date: Mon, 4 Apr 2016 at 07:14
> Subject: [PSF-Community] Sprinting at PyCon US 2016
> To: <psf-community at python.org>
>
>
> Are you coming to PyCon US? Have you thought about sprinting?
>
> The coding Sprints are the hidden gem of PyCon, up to 4 days (June 2-5) of
> coding with many Python projects and their maintainers. And if you're
> coming to PyCon, taking part in the Sprints is easy!
>
> You don't need to change your registration* to join the Sprints. There's
> no additional registration fee, and you even get lunch. You do need to
> cover the additional lodging and other meals, but that's it. If you've
> booked a room through the PyCon registration system, you'll need to contact
> the registration team at pycon2016 at cteusa.com as soon as possible to
> request the extra nights. The sprinting itself (along with lunch every day)
> is free, so your only expenses are your room and other meals.
>
> If you're interested in what projects will be sprinting, just keep an eye
> on the sprints page on the PyCon web site at
> https://us.pycon.org/2016/community/sprints/ Be sure to check back, as
> groups are being added all the time.
>
> If you haven't sprinted before, or if you just need to brush up on
> sprinting tools and techniques, there will again be an 'Intro to Sprinting'
> session the evening of June 1, led by Shauna Gordon-McKeon and other
> members of the Python community. To grab a free ticket for this session, just
> visit
> https://www.eventbrite.com/e/introduction-to-open-source-the-pycon-sprints-tickets-22435151141
> .
>
> *Please note that conference registration is sold out, but you do not need
> a conference registration to come to the Sprints.

From larry at hastings.org  Thu Apr 28 21:35:57 2016
From: larry at hastings.org (Larry Hastings)
Date: Thu, 28 Apr 2016 18:35:57 -0700
Subject: [Python-Dev] Release schedule for Python 3.5.2
Message-ID: <5722BA7D.2020008@hastings.org>



I've been holding off on the hope that one or two bugs would get fixes.  
But those seem to have stalled.  So I think it's time that we pushed out 
a 3.5.2.  Maybe announcing a schedule will light a fire under some rumps.

I put "Spring 2016" as the release date for 3.5.2 on the 3.5 release 
schedule PEP.  Officially, spring ends--and summer begins--Tuesday June 
21 at 12:24am EDT.  However on the off chance that the PyCon sprints are 
productive, I want to hold off until those are done, and maybe give it a 
couple extra days for the dust to settle.  Last sprint day is Sunday 
June 5th.  So, bottom line, the RC will happen during spring, but the 
final release will technically be during summer.

3.5.2 RC 1 - tag Sat June 11, release Sun June 12
3.5.2 Final - tag Sat June 25, release Sun June 26

Any problems with that?  Speak up now.


//arry/

From ncoghlan at gmail.com  Fri Apr 29 04:37:58 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 29 Apr 2016 18:37:58 +1000
Subject: [Python-Dev] Needs to install python 3.4.4 in RHEL 6
In-Reply-To: <CAKJDb-PbSyGFJ84c3syR6E1koL4wRLrvafGjnaFS7B+_esY=Xg@mail.gmail.com>
References: <CABoQ3y-g9=9Laphg+M5jjYfXScOfMWqe4bLkeKSYfCd1-B1AAA@mail.gmail.com>
 <CAKJDb-PbSyGFJ84c3syR6E1koL4wRLrvafGjnaFS7B+_esY=Xg@mail.gmail.com>
Message-ID: <CADiSq7faXQCsDHpfz1zCGa_WU9wx3t7O-j_wPqoYjWHi8tT_bA@mail.gmail.com>

On 29 April 2016 at 01:38, Zachary Ware <zachary.ware+pydev at gmail.com> wrote:
> Hi Nilesh,
>
> On Thu, Apr 28, 2016 at 7:00 AM, Nilesh Date <nileshdate1990 at gmail.com> wrote:
>> Hi team,
>>
>> I wanted to install python version 3.4.4 in my RHEL 6 system.
>> Can someone give installation process or any reference link from which I can
>> get required steps and download desire package.
>
> You have a couple of options.
>
> Option 1: use software collections [1].  As I vaguely understand it
> (having never used this myself), the rh-python34 package is supported
> by Red Hat, and is like any other package for the most part.  Looking
> at that page it does look a bit more complex than option 2 to me, but
> I've built and installed Python several times over the past few years
> :)

Note that the versions hosted on softwarecollections.org are provided
by the SCLo CentOS SIG.

For the commercially supported versions, most RHEL subscriptions
include access to the relevant channels:
https://access.redhat.com/solutions/472793

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From mdione at grulic.org.ar  Fri Apr 29 10:45:04 2016
From: mdione at grulic.org.ar (Marcos Dione)
Date: Fri, 29 Apr 2016 16:45:04 +0200
Subject: [Python-Dev] Convert int() to size_t in Python/C
Message-ID: <20160429144504.GA17754@diablo.grulicueva.local>


    First of all, I'm not subscribed to the list (too much traffic for
me), so please CC: me in any answers if possible.

    I'm trying to add a new syscall to the os module:

https://bugs.python.org/issue26826

    One of the few missing parts is to convert a parameter, which would
be a Python int object, to a size_t variable using
PyArg_ParseTupleAndKeywords(). For something similar, the 'n' format
exists, but that one converts to Py_ssize_t (which is ssize_t, really),
which is signed.

    One possible solution that was suggested to me in the #python IRC
channel was to use that, then test if the resulting value is negative,
and adjust accordingly, but I wonder if there is a cleaner, more general
solution (for instance, what if the type was something else, like loff_t,
although for that one in particular there *is* a conversion
function/macro).

-- 
(Not so) Random fortune:
Premature optimization is the root of all evil.
	    -- Donald Knuth

From ruizriverafelipejavier at yahoo.com.mx  Fri Apr 29 04:04:08 2016
From: ruizriverafelipejavier at yahoo.com.mx (ruizriverafelipejavier at yahoo.com.mx)
Date: Fri, 29 Apr 2016 08:04:08 +0000 (UTC)
Subject: [Python-Dev] Problemas con modulos
References: <1634012024.3637822.1461917048943.JavaMail.yahoo.ref@mail.yahoo.com>
Message-ID: <1634012024.3637822.1461917048943.JavaMail.yahoo@mail.yahoo.com>

Hello,
I am trying to connect to Twitter to receive tweets. Some code that I downloaded from the internet says that I need to install tweepy and matplotlib; I install them and still get the message that they are not installed. tweepy reports no problems: I invoke it from the command line and everything is fine. Likewise, matplotlib requires several dependencies (dateutils, numpy, tornado, etc.; I have already installed them) before it can be installed. But in the Python editor, when I run the code, I get the following message:
ImportError: No module named 'matplotlib'
Any ideas?
Felipe
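
A symptom like this often means that the editor runs a different
interpreter from the one the packages were installed into. The sketch
below is one hedged way to check that; the helper name `describe_module`
is made up, and the module names are the ones mentioned in the message.

```python
import importlib.util
import sys


def describe_module(name):
    """Return where `name` would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# The interpreter actually running this code; compare it with the one
# used on the command line when installing the packages.
print(sys.executable)

for name in ("tweepy", "matplotlib"):
    print(name, "->", describe_module(name))
```

Running this both from the command line and from the editor shows
whether the two environments agree.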

From random832 at fastmail.com  Fri Apr 29 11:26:31 2016
From: random832 at fastmail.com (Random832)
Date: Fri, 29 Apr 2016 11:26:31 -0400
Subject: [Python-Dev] Convert int() to size_t in Python/C
In-Reply-To: <20160429144504.GA17754@diablo.grulicueva.local>
References: <20160429144504.GA17754@diablo.grulicueva.local>
Message-ID: <1461943591.3516157.593502305.734D3C1F@webmail.messagingengine.com>

On Fri, Apr 29, 2016, at 10:45, Marcos Dione wrote:
>     One possible solution that was suggested to me in the #python IRC
> channel was to use that, then test if the resulting value is negative,
> and adjust accordingly, but I wonder if there is a cleaner, more general
> solution (for instance, what if the type was something else, like loff_t,
> although for that one in particular there *is* a conversion
> function/macro).

In principle, you could just use PyLong_AsUnsignedLong (or LongLong),
and raise OverflowError manually if the value happens to be out of
size_t's range. (I'm 99% sure that on every Linux platform unsigned long
is the same size as size_t.)

But it's not like it'd be the first function in os to call a system call
that takes a size_t. read just uses Py_ssize_t. write uses the buffer
protocol, which uses Py_ssize_t. How concerned are you really about the
lost range here? What does the system call return (its return type is
ssize_t) if it writes more than SSIZE_MAX bytes? (This shouldn't be hard
to test, just try copying a >2GB file on a 32-bit system)
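
The manual range check described above can be sketched in Python, with
ctypes supplying the platform's size_t width. The helper name
`as_size_t` and the error messages are made up for illustration; the
real conversion would be done in C.

```python
import ctypes
import sys

# Largest value a C size_t can hold on this platform.
SIZE_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_size_t)) - 1


def as_size_t(value):
    """Return `value` unchanged if it fits in size_t, else raise."""
    if not isinstance(value, int):
        raise TypeError("an integer is required")
    if value < 0:
        raise OverflowError("can't convert negative value to size_t")
    if value > SIZE_MAX:
        raise OverflowError("value too large for size_t")
    return value


# sys.maxsize is the largest Py_ssize_t, so a signed conversion covers
# only about the lower half of the size_t range.
print(SIZE_MAX, sys.maxsize)
```

The "lost range" in question is the interval between sys.maxsize and
SIZE_MAX, which a signed Py_ssize_t cannot represent.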

I'm more curious about what your calling convention is going to be for
off_in and off_out. I can't think of any other interfaces that have
optional output parameters. Python functions generally deal with output
parameters in the underlying C function (there are a few examples in
math) by returning a tuple.

Maybe return a tuple (returned value, off_in, off_out), where None
corresponds to the input parameter having been NULL (and passing None in
makes it use NULL)?
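
That calling convention could look roughly like this. It is a sketch
only: `copy_range_sketch` is a hypothetical pure-Python function that
works on seekable file objects rather than making a real system call,
and only the parameter names off_in and off_out come from the
discussion.

```python
import io


def copy_range_sketch(src, dst, count, off_in=None, off_out=None):
    """Copy up to `count` bytes from src to dst.

    An offset of None means "use and advance the file's own position"
    (the NULL pointer case). Returns (copied, off_in, off_out), where an
    offset that was passed as None comes back as None.
    """
    if off_in is not None:
        saved_in = src.tell()
        src.seek(off_in)
    data = src.read(count)
    if off_in is not None:
        src.seek(saved_in)  # an explicit offset leaves the position alone
        off_in += len(data)

    if off_out is not None:
        saved_out = dst.tell()
        dst.seek(off_out)
    dst.write(data)
    if off_out is not None:
        dst.seek(saved_out)
        off_out += len(data)

    return len(data), off_in, off_out


src = io.BytesIO(b"hello world")
dst = io.BytesIO()
print(copy_range_sketch(src, dst, 5, off_in=6), dst.getvalue())
```

Returning the updated offsets in the tuple avoids output parameters
while still letting the caller pass None for either one.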

From status at bugs.python.org  Fri Apr 29 12:08:40 2016
From: status at bugs.python.org (Python tracker)
Date: Fri, 29 Apr 2016 18:08:40 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20160429160840.62E8B5688D@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2016-04-22 - 2016-04-29)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    5475 (-16)
  closed 33167 (+72)
  total  38642 (+56)

Open issues with patches: 2380 


Issues opened (40)
==================

#26348: activate.fish sets VENV prompt incorrectly
http://bugs.python.org/issue26348  reopened by brett.cannon

#26830: Refactor Tools/scripts/google.py
http://bugs.python.org/issue26830  opened by franciscouzo

#26832: ProactorEventLoop doesn't support stdin/stdout nor files with 
http://bugs.python.org/issue26832  opened by Gabriel Mesquita Cangussu

#26833: returning ctypes._SimpleCData objects from callbacks
http://bugs.python.org/issue26833  opened by tilsche

#26834: Add truncated SHA512/224 and SHA512/256
http://bugs.python.org/issue26834  opened by christian.heimes

#26835: Add file-sealing ops to fcntl
http://bugs.python.org/issue26835  opened by christian.heimes

#26836: Add memfd_create to os module
http://bugs.python.org/issue26836  opened by christian.heimes

#26839: Python 3.5 running in a virtual machine with Linux kernel 3.17
http://bugs.python.org/issue26839  opened by doko

#26844: Wrong error message during import
http://bugs.python.org/issue26844  opened by lev.maximov

#26845: Misleading variable name in exception handling
http://bugs.python.org/issue26845  opened by Valentin.Lorentz

#26848: asyncio.subprocess's communicate() method mishandles empty inp
http://bugs.python.org/issue26848  opened by oconnor663

#26849: android does not support versioning in SONAME
http://bugs.python.org/issue26849  opened by xdegaye

#26850: PyMem_RawMalloc(): update also sys.getallocatedblocks() in deb
http://bugs.python.org/issue26850  opened by haypo

#26851: android compilation and link flags
http://bugs.python.org/issue26851  opened by xdegaye

#26852: add a COMPILEALL_FLAGS Makefile variable
http://bugs.python.org/issue26852  opened by xdegaye

#26855: add platform.android_ver() for android
http://bugs.python.org/issue26855  opened by xdegaye

#26856: android does not have pwd.getpwall()
http://bugs.python.org/issue26856  opened by xdegaye

#26858: setting SO_REUSEPORT fails on android
http://bugs.python.org/issue26858  opened by xdegaye

#26859: unittest fails with "Start directory is not importable"
http://bugs.python.org/issue26859  opened by xdegaye

#26860: os.walk and os.fwalk yield namedtuple instead of tuple
http://bugs.python.org/issue26860  opened by palaviv

#26861: shutil.copyfile() doesn't close the opened files
http://bugs.python.org/issue26861  opened by vocdetnojz

#26862: SYS_getdents64 does not need to be defined on android API 21
http://bugs.python.org/issue26862  opened by xdegaye

#26864: urllib.request no_proxy check differs from curl
http://bugs.python.org/issue26864  opened by Daniel Morrison

#26865: Meta-issue: support of the android platform
http://bugs.python.org/issue26865  opened by xdegaye

#26866: Inconsistent environment in Windows using "Open With"
http://bugs.python.org/issue26866  opened by busfault

#26867: test_ssl test_options fails on ubuntu 16.04
http://bugs.python.org/issue26867  opened by xiang.zhang

#26868: Document PyModule_AddObject's behavior on error
http://bugs.python.org/issue26868  opened by berker.peksag

#26869: unittest longMessage docs
http://bugs.python.org/issue26869  opened by guettli

#26870: Unexpected call to readline's add_history in call_readline
http://bugs.python.org/issue26870  opened by tylercrompton

#26871: Change weird behavior of PyModule_AddObject()
http://bugs.python.org/issue26871  opened by serhiy.storchaka

#26872: Default ConfigParser in python is not able to load values habi
http://bugs.python.org/issue26872  opened by sorin

#26873: xmlrpclib raises when trying to convert an int to string when 
http://bugs.python.org/issue26873  opened by Nathan Williams

#26876: Extend MSVCCompiler class to respect environment variables
http://bugs.python.org/issue26876  opened by rohitjamuar

#26877: tarfile use wrong code when read from fileobj
http://bugs.python.org/issue26877  opened by mmarkk

#26878: Allow doctest to deep copy globals
http://bugs.python.org/issue26878  opened by DqASe

#26881: modulefinder should reuse the dis module
http://bugs.python.org/issue26881  opened by haypo

#26882: The Python process stops responding immediately after starting
http://bugs.python.org/issue26882  opened by ?????????????????? ????????????????????

#26883: input() call blocks multiprocessing
http://bugs.python.org/issue26883  opened by the

#26884: cross-compilation of extension module links to the wrong pytho
http://bugs.python.org/issue26884  opened by xdegaye

#26885: Add parsing support for more types in xmlrpc
http://bugs.python.org/issue26885  opened by serhiy.storchaka



Most recent 15 issues with no replies (15)
==========================================

#26885: Add parsing support for more types in xmlrpc
http://bugs.python.org/issue26885

#26884: cross-compilation of extension module links to the wrong pytho
http://bugs.python.org/issue26884

#26883: input() call blocks multiprocessing
http://bugs.python.org/issue26883

#26858: setting SO_REUSEPORT fails on android
http://bugs.python.org/issue26858

#26856: android does not have pwd.getpwall()
http://bugs.python.org/issue26856

#26852: add a COMPILEALL_FLAGS Makefile variable
http://bugs.python.org/issue26852

#26851: android compilation and link flags
http://bugs.python.org/issue26851

#26845: Misleading variable name in exception handling
http://bugs.python.org/issue26845

#26836: Add memfd_create to os module
http://bugs.python.org/issue26836

#26835: Add file-sealing ops to fcntl
http://bugs.python.org/issue26835

#26834: Add truncated SHA512/224 and SHA512/256
http://bugs.python.org/issue26834

#26833: returning ctypes._SimpleCData objects from callbacks
http://bugs.python.org/issue26833

#26829: update docs: when creating classes a new dict is created for t
http://bugs.python.org/issue26829

#26819: _ProactorReadPipeTransport pause_reading()/resume_reading() br
http://bugs.python.org/issue26819

#26818: trace CLI doesn't respect -s option
http://bugs.python.org/issue26818



Most recent 15 issues waiting for review (15)
=============================================

#26885: Add parsing support for more types in xmlrpc
http://bugs.python.org/issue26885

#26884: cross-compilation of extension module links to the wrong pytho
http://bugs.python.org/issue26884

#26881: modulefinder should reuse the dis module
http://bugs.python.org/issue26881

#26876: Extend MSVCCompiler class to respect environment variables
http://bugs.python.org/issue26876

#26873: xmlrpclib raises when trying to convert an int to string when 
http://bugs.python.org/issue26873

#26871: Change weird behavior of PyModule_AddObject()
http://bugs.python.org/issue26871

#26868: Document PyModule_AddObject's behavior on error
http://bugs.python.org/issue26868

#26864: urllib.request no_proxy check differs from curl
http://bugs.python.org/issue26864

#26862: SYS_getdents64 does not need to be defined on android API 21
http://bugs.python.org/issue26862

#26860: os.walk and os.fwalk yield namedtuple instead of tuple
http://bugs.python.org/issue26860

#26859: unittest fails with "Start directory is not importable"
http://bugs.python.org/issue26859

#26858: setting SO_REUSEPORT fails on android
http://bugs.python.org/issue26858

#26856: android does not have pwd.getpwall()
http://bugs.python.org/issue26856

#26855: add platform.android_ver() for android
http://bugs.python.org/issue26855

#26852: add a COMPILEALL_FLAGS Makefile variable
http://bugs.python.org/issue26852



Top 10 most discussed issues (10)
=================================

#26826: Expose new copy_file_range() syscal in os module.
http://bugs.python.org/issue26826  20 msgs

#22234: urllib.parse.urlparse accepts any falsy value as an url
http://bugs.python.org/issue22234  12 msgs

#26839: Python 3.5 running in a virtual machine with Linux kernel 3.17
http://bugs.python.org/issue26839  11 msgs

#26864: urllib.request no_proxy check differs from curl
http://bugs.python.org/issue26864  11 msgs

#19251: bitwise ops for bytes of equal length
http://bugs.python.org/issue19251   8 msgs

#26800: Don't accept bytearray as filenames part 2
http://bugs.python.org/issue26800   8 msgs

#26439: ctypes.util.find_library fails when ldconfig/glibc not availab
http://bugs.python.org/issue26439   7 msgs

#19317: ctypes.util.find_library should examine binary's RPATH on Sola
http://bugs.python.org/issue19317   6 msgs

#26039: More flexibility in zipfile write interface
http://bugs.python.org/issue26039   6 msgs

#26348: activate.fish sets VENV prompt incorrectly
http://bugs.python.org/issue26348   6 msgs



Issues closed (64)
==================

#7504: Same name cookies
http://bugs.python.org/issue7504  closed by berker.peksag

#9321: CGIHTTPServer cleanup htbin
http://bugs.python.org/issue9321  closed by berker.peksag

#12305: Building PEPs doesn't work on Python 3
http://bugs.python.org/issue12305  closed by berker.peksag

#12640: test_ctypes seg fault (test_callback_register_double); armv7; 
http://bugs.python.org/issue12640  closed by berker.peksag

#14713: PEP 414 installation hook fails with an AssertionError
http://bugs.python.org/issue14713  closed by berker.peksag

#16394: Reducing tee() memory footprint
http://bugs.python.org/issue16394  closed by rhettinger

#18353: PyUnicode_WRITE_CHAR macro definition missing
http://bugs.python.org/issue18353  closed by berker.peksag

#18551: child_exec() doesn't check return value of fcntl()
http://bugs.python.org/issue18551  closed by berker.peksag

#18572: Remove redundant note about surrogates in string escape doc
http://bugs.python.org/issue18572  closed by berker.peksag

#19731: Fix copyright footer
http://bugs.python.org/issue19731  closed by berker.peksag

#20077: Format of TypeError differs between comparison and arithmetic 
http://bugs.python.org/issue20077  closed by berker.peksag

#20112: The documentation for http.server error_message_format is inad
http://bugs.python.org/issue20112  closed by berker.peksag

#20247: Condition._is_owned is wrong
http://bugs.python.org/issue20247  closed by berker.peksag

#20305: Android's incomplete locale.h implementation prevents cross-co
http://bugs.python.org/issue20305  closed by skrah

#20306: Lack of pw_gecos field in Android's struct passwd causes cross
http://bugs.python.org/issue20306  closed by skrah

#20447: doctest.debug_script: insecure use of /tmp
http://bugs.python.org/issue20447  closed by berker.peksag

#20453: json.load() error message changed in 3.4
http://bugs.python.org/issue20453  closed by berker.peksag

#20598: argparse docs: '7'.split() is confusing magic
http://bugs.python.org/issue20598  closed by martin.panter

#21382: Signal module doesnt raises ValueError Exception
http://bugs.python.org/issue21382  closed by berker.peksag

#22477: GCD in Fractions
http://bugs.python.org/issue22477  closed by serhiy.storchaka

#23277: Cleanup unused and duplicate imports in tests
http://bugs.python.org/issue23277  closed by berker.peksag

#23662: Cookie.domain is undocumented
http://bugs.python.org/issue23662  closed by berker.peksag

#23806: documentation for no_proxy is missing from the python3 urllib 
http://bugs.python.org/issue23806  closed by orsenthil

#23961: IDLE autocomplete window does not automatically close when sel
http://bugs.python.org/issue23961  closed by berker.peksag

#23986: Inaccuracy about "in" keyword for list and tuple
http://bugs.python.org/issue23986  closed by rhettinger

#24296: Queue documentation note needed
http://bugs.python.org/issue24296  closed by rhettinger

#24331: *** Error in `/usr/bin/python': double free or corruption (!pr
http://bugs.python.org/issue24331  closed by berker.peksag

#24715: Sorting HOW TO: bad example for reverse sort stability
http://bugs.python.org/issue24715  closed by rhettinger

#24902: http.server: on startup, show host/port as URL
http://bugs.python.org/issue24902  closed by berker.peksag

#24911: Context manager of socket.socket is not documented
http://bugs.python.org/issue24911  closed by martin.panter

#25243: decouple string-to-boolean logic from ConfigParser.getboolean 
http://bugs.python.org/issue25243  closed by rhettinger

#25420: "import random" blocks on entropy collection on Linux with low
http://bugs.python.org/issue25420  closed by haypo

#25551: Event's test_reset_internal_locks too fragile
http://bugs.python.org/issue25551  closed by berker.peksag

#25788: fileinput.hook_encoded has no way to pass arguments to codecs
http://bugs.python.org/issue25788  closed by serhiy.storchaka

#25981: Intern namedtuple field names
http://bugs.python.org/issue25981  closed by serhiy.storchaka

#26041: Update deprecation messages of platform.dist() and platform.li
http://bugs.python.org/issue26041  closed by berker.peksag

#26089: Duplicated keyword in distutils metadata
http://bugs.python.org/issue26089  closed by berker.peksag

#26249: Change PyMem_Malloc to use pymalloc allocator
http://bugs.python.org/issue26249  closed by haypo

#26322: Missing docs for typing.Set
http://bugs.python.org/issue26322  closed by berker.peksag

#26634: recursive_repr forgets to override __qualname__ of wrapper
http://bugs.python.org/issue26634  closed by serhiy.storchaka

#26672: regrtest missing in the module name
http://bugs.python.org/issue26672  closed by berker.peksag

#26733: staticmethod and classmethod are ignored when disassemble clas
http://bugs.python.org/issue26733  closed by serhiy.storchaka

#26804: Prioritize lowercase proxy variables in urllib.request
http://bugs.python.org/issue26804  closed by orsenthil

#26822: itemgetter/attrgetter/methodcaller objects ignore keyword argu
http://bugs.python.org/issue26822  closed by serhiy.storchaka

#26824: Make some macros use Py_TYPE
http://bugs.python.org/issue26824  closed by serhiy.storchaka

#26827: PyObject *PyInit_myextention -> PyMODINIT_FUNC PyInit_myextent
http://bugs.python.org/issue26827  closed by python-dev

#26831: ConfigParser parsing failures with default_section and Extende
http://bugs.python.org/issue26831  closed by SilentGhost

#26837: assertSequenceEqual() raises BytesWarning when format message
http://bugs.python.org/issue26837  closed by serhiy.storchaka

#26838: sax.xmlreader.InputSource.setCharacterStream() does not work?
http://bugs.python.org/issue26838  closed by sourcejedi

#26840: Hidden test in test_heapq
http://bugs.python.org/issue26840  closed by berker.peksag

#26841: Hidden test in ctypes tests
http://bugs.python.org/issue26841  closed by berker.peksag

#26842: Python Tutorial 4.7.1: Need to explain default parameter lifet
http://bugs.python.org/issue26842  closed by rhettinger

#26843: tokenize does not include Other_ID_Start or Other_ID_Continue 
http://bugs.python.org/issue26843  closed by serhiy.storchaka

#26846: Workaround for non-standard stdlib.h on Android
http://bugs.python.org/issue26846  closed by skrah

#26847: filter docs unclear wording
http://bugs.python.org/issue26847  closed by georg.brandl

#26853: missing symbols in curses and readline modules on android
http://bugs.python.org/issue26853  closed by xdegaye

#26854: missing header on android for the ossaudiodev module
http://bugs.python.org/issue26854  closed by skrah

#26857: gethostbyname_r() is broken on android
http://bugs.python.org/issue26857  closed by skrah

#26863: android lacks some declarations for the posix module
http://bugs.python.org/issue26863  closed by skrah

#26874: Docstring error in divmod function
http://bugs.python.org/issue26874  closed by python-dev

#26875: mmap doc gives wrong code example
http://bugs.python.org/issue26875  closed by python-dev

#26879: Spam
http://bugs.python.org/issue26879  closed by ethan.furman

#26880: Remove redundant checks from set.__init__
http://bugs.python.org/issue26880  closed by serhiy.storchaka

#1145257: shutil.copystat() may fail...
http://bugs.python.org/issue1145257  closed by berker.peksag

From facundobatista at gmail.com  Fri Apr 29 12:37:20 2016
From: facundobatista at gmail.com (Facundo Batista)
Date: Fri, 29 Apr 2016 13:37:20 -0300
Subject: [Python-Dev] Problemas con modulos
In-Reply-To: <1634012024.3637822.1461917048943.JavaMail.yahoo@mail.yahoo.com>
References: <1634012024.3637822.1461917048943.JavaMail.yahoo.ref@mail.yahoo.com>
 <1634012024.3637822.1461917048943.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <CAM09pzTH27Rq29WG=ta_p-Forz3xMxoXdoGf9tK_XEqtsogjjw@mail.gmail.com>

Just to mention that I already answered this (in Spanish, in private),
redirecting to proper lists.

Regards,

2016-04-29 5:04 GMT-03:00 Felipe Ruiz via Python-Dev <python-dev at python.org>:
>   Hello,
>
> I am trying to connect to Twitter to receive tweets. However, some code
> that I downloaded from the internet tells me that I must install tweepy
> and matplotlib; I do so and keep getting the message that they are not
> installed. tweepy reports no problems: I invoke it on the command line
> and all is well. Likewise, matplotlib requires several dependencies
> (dateutils, numpy, tornado, etc.; I have already installed them) before
> its installation, but in the Python editor, when I run the code, I get
> the following message:
>
> ImportError: No module named 'matplotlib'
>
> Any ideas?
>
> Felipe
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/facundobatista%40gmail.com
>



-- 
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Twitter: @facundobatista

From guido at python.org  Fri Apr 29 12:52:03 2016
From: guido at python.org (Guido van Rossum)
Date: Fri, 29 Apr 2016 09:52:03 -0700
Subject: [Python-Dev] Problemas con modulos
In-Reply-To: <CAM09pzTH27Rq29WG=ta_p-Forz3xMxoXdoGf9tK_XEqtsogjjw@mail.gmail.com>
References: <1634012024.3637822.1461917048943.JavaMail.yahoo.ref@mail.yahoo.com>
 <1634012024.3637822.1461917048943.JavaMail.yahoo@mail.yahoo.com>
 <CAM09pzTH27Rq29WG=ta_p-Forz3xMxoXdoGf9tK_XEqtsogjjw@mail.gmail.com>
Message-ID: <CAP7+vJK1-z28V3ga=e-3LYSia3p6LAGTO25A+rxKdjAYK3HxdA@mail.gmail.com>

 Thank you Facundo, and thanks for following up here! (I wonder if it
wouldn't have been just as efficient if you had just BCC'ed the list to
your original response? Or perhaps with a brief English note at the top?)

2016-04-29 9:37 GMT-07:00 Facundo Batista <facundobatista at gmail.com>:

> Just to mention that I already answered this (in Spanish, in private),
> redirecting to proper lists.
>
> Regards,
>
> 2016-04-29 5:04 GMT-03:00 Felipe Ruiz via Python-Dev <
> python-dev at python.org>:
> >   Hello,
> >
> > I am trying to connect to Twitter to receive tweets. However, some code
> > that I downloaded from the internet tells me that I must install tweepy
> > and matplotlib; I do so and keep getting the message that they are not
> > installed. tweepy reports no problems: I invoke it on the command line
> > and all is well. Likewise, matplotlib requires several dependencies
> > (dateutils, numpy, tornado, etc.; I have already installed them) before
> > its installation, but in the Python editor, when I run the code, I get
> > the following message:
> >
> > ImportError: No module named 'matplotlib'
> >
> > Any ideas?
> >
> > Felipe
> >
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev at python.org
> > https://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe:
> >
> https://mail.python.org/mailman/options/python-dev/facundobatista%40gmail.com
> >
>
>
>
> --
> .    Facundo
>
> Blog: http://www.taniquetil.com.ar/plog/
> PyAr: http://www.python.org/ar/
> Twitter: @facundobatista
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>



-- 
--Guido van Rossum (python.org/~guido)

From tjreedy at udel.edu  Fri Apr 29 13:22:07 2016
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 29 Apr 2016 13:22:07 -0400
Subject: [Python-Dev] Convert int() to size_t in Python/C
In-Reply-To: <20160429144504.GA17754@diablo.grulicueva.local>
References: <20160429144504.GA17754@diablo.grulicueva.local>
Message-ID: <92907886-f0bb-4216-b7a1-91774828706b@udel.edu>

On 4/29/2016 10:45 AM, Marcos Dione wrote:
>
>     First of all, I'm not subscribed to the list (too much traffic for
> me), so please CC: me in any answers if possible.

I am indulging you this once, but the proper solution is to read pydev 
via the gmane.comp.python.devel mirror at news.gmane.org.  You can do so 
either with a newsreader, part of most mail clients, subscribed to the 
group, or with a browser pointed at the site.

There are multiple problems with CC:.  First, the paragraph above may be 
(properly) snipped from replies, so you will not get replies to replies. 
  Second, 'Reply all' is a nuisance as it takes 'all' too literally. 
Since I receive via gmane, Thunderbird tries to reply to both gmane and 
mail.python.org, but the latter is invalid and generates a nuisance 
email as I am not subscribed.  If I were subscribed, sending and posting 
this twice would also be wrong.  Third, and related, CC lists tend to 
grow.  If someone hits 'Reply all' to this message, I will be added to 
the list, and will receive a nuisance duplicate email, unless the 
person takes the trouble to remove me.  (They often do not.)

-- 
Terry Jan Reedy


From mdione at grulic.org.ar  Fri Apr 29 14:11:15 2016
From: mdione at grulic.org.ar (Marcos Dione)
Date: Fri, 29 Apr 2016 20:11:15 +0200
Subject: [Python-Dev] Convert int() to size_t in Python/C
In-Reply-To: <1461946726.3529591.593566721.7CF3D87D@webmail.messagingengine.com>
Message-ID: <20160429181115.GA19359@diablo.grulicueva.local>

On Fri, Apr 29, 2016 at 12:18:46PM -0400, Random832 wrote:
> On Fri, Apr 29, 2016, at 10:45, Marcos Dione wrote:
> >     One possible solution that was suggested to me in the #python IRC
> > channel was to use that, then test if the resulting value is negative,
> > and adjust accordingly, but I wonder if there is a cleaner, more general
> > solution (for instance, what if the type was something else, like loff_t,
> > although for that one in particular there *is* a conversion
> > function/macro).
> 
> In principle, you could just use PyLong_AsUnsignedLong (or LongLong),
> and raise OverflowError manually if the value happens to be out of
> size_t's range. (99% sure that on every linux platform unsigned long is
> the same size as size_t.)
> 
> But it's not like it'd be the first function in OS to call a system call
> that takes a size_t. Read just uses Py_ssize_t. Write uses the buffer
> protocol, which uses Py_ssize_t. How concerned are you really about the
> lost range here? What does the system call return (its return type is
> ssize_t) if it writes more than SSIZE_MAX bytes? (This shouldn't be hard
> to test, just try copying a >2GB file on a 32-bit system)

    It's a very good point, but I don't have any 32-bit systems around
with a 4.5 kernel. I'll try to figure it out and/or ask in the kernel ML.

> I'm more curious about what your calling convention is going to be for
> off_in and off_out. I can't think of any other interfaces that have
> optional output parameters. Python functions generally deal with output
> parameters in the underlying C function (there are a few examples in
> math) by returning a tuple.

    These are not output parameters, even if they're pointers. They're
using the NULL pointer to signal that the current offsets should not be
touched, to differentiate it from an offset of 0, something for which in
Python we would use None.

From random832 at fastmail.com  Fri Apr 29 14:25:31 2016
From: random832 at fastmail.com (Random832)
Date: Fri, 29 Apr 2016 14:25:31 -0400
Subject: [Python-Dev] Convert int() to size_t in Python/C
In-Reply-To: <20160429181115.GA19359@diablo.grulicueva.local>
References: <20160429181115.GA19359@diablo.grulicueva.local>
Message-ID: <1461954331.3559602.593676209.6D0351F5@webmail.messagingengine.com>

On Fri, Apr 29, 2016, at 14:11, Marcos Dione wrote:
>     These are not output parameters, even if they're pointers. They're
> using the NULL pointer to signal that the current offsets should not be
> touched, to differentiate it from an offset of 0, something for which in
> Python we would use None.

That's not actually true according to the documentation. (And if it
were, they could simply use -1 rather than a null pointer)

If you pass a null pointer in, the file's offset is used and *is*
updated, same as if you used an ordinary read/write call. If you pass a
value in, that value is used *and updated* (which makes it an output
parameter) and the file's offset is left alone.

Documentation below, I've >>>highlighted<<< the part that shows they are
used as output parameters:

       The following semantics apply for off_in, and similar statements
       apply to off_out:

       *  If off_in is NULL, then bytes are read from fd_in starting from
          the file offset, and the file offset is adjusted by the number of
          bytes copied.

       *  If off_in is not NULL, then off_in must point to a buffer that
          specifies the starting offset where bytes from fd_in will be read.
          The file offset of fd_in is not changed, >>>but off_in is adjusted
          appropriately.<<<
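
The semantics quoted above suggest one possible Python calling
convention: None stands in for the NULL pointer, and the adjusted
offsets come back in a tuple. What follows is only a sketch under stated
assumptions, not the proposed os function: copy_range and demo are
hypothetical names, and the copy is emulated in user space with
os.read/os.pread and os.write/os.pwrite instead of the real system call.

```python
import os
import tempfile


def copy_range(fd_in, fd_out, count, off_in=None, off_out=None):
    """Hypothetical emulation; returns (copied, off_in, off_out).

    An offset of None plays the role of NULL: the descriptor's own file
    offset is used and advanced. An integer offset is used instead of
    the file offset, which is left alone, and the adjusted value is
    returned to the caller.
    """
    if off_in is None:
        data = os.read(fd_in, count)           # advances fd_in's offset
    else:
        data = os.pread(fd_in, count, off_in)  # fd_in's offset untouched
        off_in += len(data)

    written = 0
    while written < len(data):
        if off_out is None:
            written += os.write(fd_out, data[written:])
        else:
            n = os.pwrite(fd_out, data[written:], off_out)
            off_out += n
            written += n
    return written, off_in, off_out


def demo():
    """Copy with explicit offsets, then with None, and report positions."""
    with tempfile.TemporaryDirectory() as tmp:
        fd_in = os.open(os.path.join(tmp, "src"), os.O_RDWR | os.O_CREAT)
        fd_out = os.open(os.path.join(tmp, "dst"), os.O_RDWR | os.O_CREAT)
        try:
            os.write(fd_in, b"0123456789")
            os.lseek(fd_in, 0, os.SEEK_SET)
            explicit = copy_range(fd_in, fd_out, 4, off_in=2, off_out=0)
            pos_explicit = os.lseek(fd_in, 0, os.SEEK_CUR)
            implicit = copy_range(fd_in, fd_out, 3)
            pos_implicit = os.lseek(fd_in, 0, os.SEEK_CUR)
        finally:
            os.close(fd_in)
            os.close(fd_out)
    return explicit, pos_explicit, implicit, pos_implicit


print(demo())
```

With explicit offsets the descriptor's position stays where it was and
the caller gets the updated offsets back; with None the position moves
and there is nothing extra to return.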

From stephen at xemacs.org  Fri Apr 29 14:35:51 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 30 Apr 2016 03:35:51 +0900
Subject: [Python-Dev] Problemas con modulos
In-Reply-To: <CAP7+vJK1-z28V3ga=e-3LYSia3p6LAGTO25A+rxKdjAYK3HxdA@mail.gmail.com>
References: <1634012024.3637822.1461917048943.JavaMail.yahoo.ref@mail.yahoo.com>
 <1634012024.3637822.1461917048943.JavaMail.yahoo@mail.yahoo.com>
 <CAM09pzTH27Rq29WG=ta_p-Forz3xMxoXdoGf9tK_XEqtsogjjw@mail.gmail.com>
 <CAP7+vJK1-z28V3ga=e-3LYSia3p6LAGTO25A+rxKdjAYK3HxdA@mail.gmail.com>
Message-ID: <22307.43399.857616.124558@turnbull.sk.tsukuba.ac.jp>

Guido van Rossum writes:

 >  Thank you Facundo, and thanks for following up here! (I wonder if it
 > wouldn't have been just as efficient if you had just BCC'ed the list to
 > your original response? Or perhaps with a brief English note at the
 > top?)

BCC'ing lists usually gets your post held, rejected, or just
discarded, although I don't have access to the python-dev
configuration.  IIRC reject is the default in Mailman.


From facundobatista at gmail.com  Fri Apr 29 15:19:02 2016
From: facundobatista at gmail.com (Facundo Batista)
Date: Fri, 29 Apr 2016 16:19:02 -0300
Subject: [Python-Dev] Problemas con modulos
In-Reply-To: <CAP7+vJK1-z28V3ga=e-3LYSia3p6LAGTO25A+rxKdjAYK3HxdA@mail.gmail.com>
References: <1634012024.3637822.1461917048943.JavaMail.yahoo.ref@mail.yahoo.com>
 <1634012024.3637822.1461917048943.JavaMail.yahoo@mail.yahoo.com>
 <CAM09pzTH27Rq29WG=ta_p-Forz3xMxoXdoGf9tK_XEqtsogjjw@mail.gmail.com>
 <CAP7+vJK1-z28V3ga=e-3LYSia3p6LAGTO25A+rxKdjAYK3HxdA@mail.gmail.com>
Message-ID: <CAM09pzS0eBcc1mJ04T-KHjSc7_3ixsKH4AcYVtmC7Zg7um2YVw@mail.gmail.com>

2016-04-29 13:52 GMT-03:00 Guido van Rossum <guido at python.org>:

>  Thank you Facundo, and thanks for following up here! (I wonder if it
> wouldn't have been just as efficient if you had just BCC'ed the list to your
> original response? Or perhaps with a brief English note at the top?)

Probably yes, I didn't want to mess the list with non-english stuff :)

Regards,

-- 
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Twitter: @facundobatista

From vadmium+py at gmail.com  Fri Apr 29 19:26:53 2016
From: vadmium+py at gmail.com (Martin Panter)
Date: Fri, 29 Apr 2016 23:26:53 +0000
Subject: [Python-Dev] Convert int() to size_t in Python/C
In-Reply-To: <1461954331.3559602.593676209.6D0351F5@webmail.messagingengine.com>
References: <20160429181115.GA19359@diablo.grulicueva.local>
 <1461954331.3559602.593676209.6D0351F5@webmail.messagingengine.com>
Message-ID: <CA+eR4cHp4pqUXAeCZT2NrL2=BWeMmrKjd85Vo43oOe3WfitDaw@mail.gmail.com>

On 29 April 2016 at 18:25, Random832 <random832 at fastmail.com> wrote:
> On Fri, Apr 29, 2016, at 14:11, Marcos Dione wrote:
>>     These are not output parameters, even if they're pointers. They're
>> using the NULL pointer to signal that the current offsets should not be
>> touched, to differentiate it from an offset of 0, something for which in
>> Python we would use None.
>
> That's not actually true according to the documentation. (And if it
> were, they could simply use -1 rather than a null pointer)
> . . .
>        *  If off_in is not NULL, then off_in must point to a buffer that
>           specifies the starting offset where bytes from fd_in will be read.
>           The file offset of fd_in is not changed, >>>but off_in is adjusted
>           appropriately.<<<

Linux's sendfile() syscall takes a similar offset parameter that may
be updated, but Python's os.sendfile() wrapper does not return the
updated offset. Do you think we need to return the updated offsets for
copy_file_range()?

From vadmium+py at gmail.com  Fri Apr 29 19:42:03 2016
From: vadmium+py at gmail.com (Martin Panter)
Date: Fri, 29 Apr 2016 23:42:03 +0000
Subject: [Python-Dev] Convert int() to size_t in Python/C
In-Reply-To: <20160429181115.GA19359@diablo.grulicueva.local>
References: <1461946726.3529591.593566721.7CF3D87D@webmail.messagingengine.com>
 <20160429181115.GA19359@diablo.grulicueva.local>
Message-ID: <CA+eR4cEyYKa=R-AcvmFm9oHAuT0QibQ5-EHRe5TUiyzH-oodsA@mail.gmail.com>

On 29 April 2016 at 18:11, Marcos Dione <mdione at grulic.org.ar> wrote:
> On Fri, Apr 29, 2016 at 12:18:46PM -0400, Random832 wrote:
>> On Fri, Apr 29, 2016, at 10:45, Marcos Dione wrote:
>> >     One possible solution that was suggested to me in the #python IRC
>> > channel was to use that, then test if the resulting value is negative,
>> > and adjust accordingly, but I wonder if there is a cleaner, more general
>> > solution (for instance, what if the type was something else, like loff_t,
>> > although for that one in particular there *is* a conversion
>> > function/macro).
>>
>> In principle, you could just use PyLong_AsUnsignedLong (or LongLong),
>> and raise OverflowError manually if the value happens to be out of
>> size_t's range. (99% sure that on every linux platform unsigned long is
>> the same size as size_t.)
>>
>> But it's not like it'd be the first function in OS to call a system call
>> that takes a size_t. Read just uses Py_ssize_t. Write uses the buffer
>> protocol, which uses Py_ssize_t. How concerned are you really about the
>> lost range here? What does the system call return (its return type is
>> ssize_t) if it writes more than SSIZE_MAX bytes? (This shouldn't be hard
>> to test, just try copying a >2GB file on a 32-bit system)

I would probably just use Py_ssize_t, since that is what the return
value is. Otherwise, a large positive count input could return a
negative value, which would be inconsistent, and could be mistaken as
an error.
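
The sign problem described above can be seen from Python with ctypes,
whose integer types wrap silently instead of checking for overflow. This
is a sketch only: as_size_t and as_ssize_t are hypothetical helper
names, and the manual range check merely stands in for what the C code
would do after calling PyLong_AsUnsignedLong.

```python
import ctypes

# Limits derived from the platform's actual size_t width.
SIZE_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_size_t)) - 1
SSIZE_MAX = SIZE_MAX // 2


def as_size_t(value):
    """Range-check a Python int as a manual size_t conversion would."""
    if not 0 <= value <= SIZE_MAX:
        raise OverflowError("value does not fit in size_t")
    return value


def as_ssize_t(count):
    """Reinterpret an unsigned count as the signed ssize_t return type."""
    return ctypes.c_ssize_t(count).value


big = SSIZE_MAX + 1            # valid as size_t, too large for ssize_t
print(as_size_t(big) == big)   # accepted as an input count
print(as_ssize_t(big) < 0)     # but it would come back negative
```

Restricting the count to the Py_ssize_t range avoids the second case
entirely, at the cost of the upper half of size_t's range.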

>     It's a very good point, but I don't have any 32-bit systems around
> with a 4.5 kernel. I'll try to figure it out and/or ask in the kernel ML.

Maybe you can compile a 32-bit program and run it on a 64-bit computer
(gcc -m32).

From greg.ewing at canterbury.ac.nz  Sat Apr 30 21:22:56 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 01 May 2016 13:22:56 +1200
Subject: [Python-Dev] Bug in 2to3 concerning import statements?
Message-ID: <57255A70.3000404@canterbury.ac.nz>

It seems that 2to3 is a bit simplistic when it comes to
translating import statements. I have a module GUI.py2exe
containing:

    import py2exe.mf as modulefinder

2to3 translates this into:

    from . import py2exe.mf as modulefinder

which is a syntax error.

It looks like 2to3 is getting confused by the fact that
there is both a submodule and a top-level module here
called py2exe. But the original can only be an absolute
import because it has a dot in it, so 2to3 shouldn't be
translating it into a relative one.

Putting "from __future__ import absolute_import" at the
top fixes it, but I shouldn't have to do that, should I?
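
A quick way to confirm the syntax error, independent of 2to3 itself, is
to hand both statements to the Python 3 parser. This is just a sketch;
parses is a hypothetical helper name, and since the statements are only
parsed and never executed, py2exe does not need to be installed.

```python
import ast


def parses(source):
    """Return True if source is syntactically valid for this Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


original = "import py2exe.mf as modulefinder"
translated = "from . import py2exe.mf as modulefinder"

print(parses(original))    # the absolute import with a dotted name
print(parses(translated))  # the relative form emitted by 2to3
```

The names listed after "from . import" have to be plain identifiers, so
a dotted name cannot appear there.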

-- 
Greg