I think it would be a good idea if Python tracebacks could be translated
into languages other than English - and it would set a good example.
For example, with French as my default locale, instead of

>>> 1/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero

I might get something like

>>> 1/0
Suivi d'erreur (appel le plus récent en dernier) :
  Fichier "<stdin>", à la ligne 1, dans <module>
ZeroDivisionError: division entière ou modulo par zéro
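
Roughly, this could be prototyped today with gettext and a custom
excepthook. A minimal sketch, assuming a hypothetical
"python-traceback" message catalogue (no such catalogue exists yet):

import gettext
import sys
import traceback

_ = gettext.translation('python-traceback', fallback=True).gettext

def translated_excepthook(exc_type, exc_value, tb):
    # Re-implement the default output, routing the fixed English
    # templates through gettext so a catalogue can translate them.
    sys.stderr.write(_('Traceback (most recent call last):') + '\n')
    for filename, lineno, name, line in traceback.extract_tb(tb):
        sys.stderr.write(_('  File "{0}", line {1}, in {2}').format(
            filename, lineno, name) + '\n')
        if line:
            sys.stderr.write('    ' + line + '\n')
    sys.stderr.write('{0}: {1}\n'.format(exc_type.__name__, exc_value))

sys.excepthook = translated_excepthook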
André
If we could explicitate an overly complex expression on an indented
follow-up line, I would use this feature very often:

htmltable = ''.join('<tr>{}</tr>'.format(htmlline) for line in table) :  # main line
    htmlline : ''.join('<td>{}</td>'.format(cell) for cell in line)      # explicitation line(s)
(Sorry if this has already been discussed earlier on this list; I have
not read all the archives.)
*******
In detail:
A list comprehension "<expression> for x in mylist" often greatly
improves the readability of Python programs, when <expression> is not
too complex. When <expression> is too complex (e.g. nested lists), it
becomes unreadable, so we have to find another solution:
a) defining a function expression(x), or an iterator function, which
will be used only once in the code
b) or dropping this beautiful syntax and replacing it with the very
basic list construction:

newlist = []
for x in myiterable:
    newlist.append(<expression>)
I often choose b), but I dislike both solutions:
- in solution a), the function definition can be far from the list
comprehension; in effect the instructions to build the new list are
split across two different places in the code.
- solution b) seems a bit better to me, but the fact that we build a
new list from myiterable is not visible at a glance, unlike with list
comprehensions.
Giving up list comprehensions happens rather often when I write Python
code.
I think we could greatly improve readability if we could keep the list
comprehension in all cases, and, when necessary, explicitate an overly
complex expression on an indented line:

htmltable = ''.join('<tr>{}</tr>'.format(htmlline) for line in table) :  # main line
    htmlline : ''.join('<td>{}</td>'.format(cell) for cell in line)      # explicitation line(s)
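
For comparison, the closest equivalent in today's Python is to bind
the inner expression to a name on a preceding line:

htmllines = (''.join('<td>{}</td>'.format(cell) for cell in line)
             for line in table)
htmltable = ''.join('<tr>{}</tr>'.format(htmlline)
                    for htmlline in htmllines)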
In the case where the main line is the header of a "classical" indented
block (starting with "for", "if", "with"...), this indented block would
simply follow the explicitation line(s).
The explicitation lines can be reliably identified as the lines that
begin with "identifier :" (when we are not inside an unclosed dict):
with open('data.txt') as f :
    if line in enumerate(mylist) :           # main line
        mylist : f.read().strip().lower()    # explicitation line(s)
        print line                           # "classical" indented block
Another possible use of explicitation lines is a coding style which
would start with "the whole picture" first, completing it with details
afterwards, which is the usual way we mentally solve problems.
Let's take an example: we want to write a function which returns a
multiplication table in a simple html document.
When I solve this problem, I think a bit like this:
- I need to return an html page. For that I need a "header" and a
"body". My body will contain an "htmltable", which will be built from a
"table" of numbers, etc.
My code could look like this:

def tag(content, *tags):  # little convenience function
    retval = content
    for t in tags:
        retval = '<{0}>{1}</{0}>'.format(t, retval)
    return retval

def xhtml_mult_table(a, b):
    return tag(header + body, 'html') :
        header : tag('multiplication table', 'title')
        body : tag(htmltable, 'tbody', 'table', 'body') :
            htmltable : ''.join(tag(xhtmlline, 'tr') for line in table) :
                table : headerline + otherlines :
                    headerline : [[''] + range(a)]
                    otherlines : [[y] + [x*y for x in range(a)] for y in range(b)]
                xhtmlline : ''.join(tag(str(cell), 'td') for cell in line)
This example is a "heavy use" of the "explicitation line" feature, to
illustrate how it could work.
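
Note that the tag() helper above is already valid Python; only
xhtml_mult_table uses the proposed syntax. For instance:

>>> tag('3x4', 'td', 'tr')
'<tr><td>3x4</td></tr>'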
I don't mean this should replace the "classical" syntax everywhere
possible, but for me it would be a nice way to explicitate complex
expressions from time to time, plus the ability to use list
comprehensions everywhere I want.
Daniel
Hi,
Python 3 has two string prefixes r"" for raw strings and b"" for bytes.
So if you want to create a regex based on bytes, as far as I can tell,
you have to do something like this:
FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))
# or
FONTNAME_RE = re.compile(b"/FontName\\s+/(\\S+)")
I think it would be much nicer if one could write:
FONTNAME_RE = re.compile(br"/FontName\s+/(\S+)")
# or
FONTNAME_RE = re.compile(rb"/FontName\s+/(\S+)")
I _slightly_ prefer rb"" to br"" but either would be great:-)
Why would you want a bytes regex?
In my case I am reading PostScript files and PostScript .pfa font files
so that I can embed the latter into the former. But I don't know what
encoding these files use beyond the fact that it is ASCII or some ASCII
superset like Latin1. So in true Python style I don't assume: instead I
read the files as bytes and do all my processing using bytes, at no
point decoding since I only ever insert ASCII characters. I don't think
this is a rare example: with Python 3's clean separation between strings
& bytes (a major advance IMO), I think there will often be cases where
all the processing is done using bytes.
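
For instance, a minimal sketch of that style of processing (the
filename is just a placeholder):

import re

FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))

with open("font.pfa", "rb") as f:   # read as bytes, never decode
    data = f.read()
match = FONTNAME_RE.search(data)
if match:
    name = match.group(1)           # bytes, e.g. b'Helvetica'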
--
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
Hi,
I'm interested in a feature which allows users to discard the locals
and globals references from frames held by a traceback object.
Currently, traceback objects are used when capturing and re-raising
exceptions. However, they hold a reference to all frames, and each
frame holds a reference to its locals and globals. These are not
needed by the default traceback output, and can cause serious memory
bloat if a reference to a traceback object is kept for any significant
length of time; there are even big red warnings in the Python docs
about keeping a reference to the traceback in the handling frame
(http://docs.python.org/release/3.1/library/sys.html#sys.exc_info).
Example usage would be something like:
import sys

try:
    1/0
except:
    t, v, tb = sys.exc_info()
    tb.clean()

# ... much later ...
raise t, v, tb
Which would be basically a function to do this:
import sys

try:
    1/0
except:
    t, v, tb = sys.exc_info()
    c = tb
    while c:
        # Illustrative only: frame attributes are read-only today,
        # so this loop would need interpreter support to work.
        c.tb_frame.f_locals = None
        c.tb_frame.f_globals = None
        c = c.tb_next

# ... much later ...
raise t, v, tb
Twisted has done a very similar thing with their
twisted.python.failure.Failure object, which stringifies the traceback
data and discards the reference to the Python traceback entirely (
http://twistedmatrix.com/trac/browser/tags/releases/twisted-10.0.0/twisted/…
) - they also replicate a lot of traceback printing functions to make
use of this stringified data.
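
For reference, a minimal sketch of what is achievable in pure Python
today, at the cost of losing the live traceback for re-raising:

import sys
import traceback

try:
    1/0
except ZeroDivisionError:
    t, v, tb = sys.exc_info()
    # Keep only the formatted text; dropping tb releases the frames,
    # and with them their locals and globals.
    tb_text = ''.join(traceback.format_exception(t, v, tb))
    del tb

# ... much later ...
sys.stderr.write(tb_text)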
It's worth noting that cgitb and other applications make use of locals
and globals in their traceback output. However, I believe the vast
majority of traceback usage does not make use of these references, and
a significant penalty is paid as a result.
Is there any interest in such a feature?
-Greg
I'm moving this thread to python-ideas, where it belongs.
I've looked at the implementation code (even stepped through it with
pdb!), read the sample/test code, and read the two papers on
animats.com fairly closely (they have a lot of overlap, and the memory
model described below seems copied verbatim from
http://www.animats.com/papers/languages/pythonconcurrency.html version
0.8).
Some reactions (trying to hide my responses to the details of the code):
- First of all, I'm very happy to see radical ideas proposed, even if
they are at present unrealistic. We need a big brainstorm to come up
with ideas from which an eventual solution to the multicore problem
might be chosen. (Jesse Noller's multiprocessing is another; Adam
Olsen's work yet another, at a different end of the spectrum.)
- The proposed new semantics (frozen objects, memory model,
auto-freezing of globals, enforcement of naming conventions) are
radically different from Python's current semantics. They will break
every 3rd party library in many more ways than Python 3. This is not
surprising given the goals of the proposal (and its roots in Adam
Olsen's work) but places a huge roadblock for acceptance. I see no
choice but to keep trying to come up with a compromise that is more
palatable and compatible without throwing away all the advantages. As
it now stands, the proposal might as well be a new and different
language.
- SynchronizedObject looks like a mixture of a Java synchronized class
(a non-standard concept in Java but easily understood as a class all
of whose public methods are synchronized) and a condition variable (which
has the same semantics of releasing the lock while waiting but without
crawling the stack for other locks to release). It looks like the
examples showing off SynchronizedObject could be implemented just as
elegantly using a condition variable (and voluntary abstention from
using shared mutable objects).
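
For concreteness, here is roughly what such a condition-variable
version could look like, using only the stdlib; the class and its
names are illustrative, not taken from the prototype:

import threading

class BoundedQueue:
    # All access funnels through one condition variable; waiting
    # releases the lock, as with SynchronizedObject, but explicitly.
    def __init__(self, limit):
        self._items = []
        self._limit = limit
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            while len(self._items) >= self._limit:
                self._cond.wait()     # releases the lock while waiting
            self._items.append(item)
            self._cond.notify_all()

    def get(self):
        with self._cond:
            while not self._items:
                self._cond.wait()
            item = self._items.pop(0)
            self._cond.notify_all()
            return item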
- If the goal is to experiment with new control structures, I
recommend decoupling them from the memory model and frozen objects,
instead relying (as is traditional in Python) on programmer caution to
avoid races. This would make it much easier to see how programmers
respond to the new control structures.
- You could add the freeze() function for voluntary use, and you could
even add automatic wrapping of arguments and return values for certain
classes using a class decorator or a metaclass, but the performance
overhead makes this unlikely to win over many converts. I don't see
much use for the "whole program freezing" done by the current
prototype -- there are way too many backdoors in Python for the
prototype approach to be anywhere near foolproof, and if we want a
non-foolproof approach, voluntary constraint (and, in some cases,
voluntary, i.e. explicit, wrapping of modules or classes) would work
just as well.
- For a larger-scale experiment with the new memory model and semantic
restrictions (or would it be better to call them syntactic
restrictions? -- after all they are about statically detectable
properties like naming conventions) I recommend looking at PyPy, which
has as one of its explicitly stated project goals easy experimentation
with different object models.
- I'm sure I've forgotten something, but I wanted to keep my impressions fresh.
- Again, John, thanks for taking the time to come up with an
implementation of your idea!
--Guido
On Sat, Jun 26, 2010 at 9:39 AM, John Nagle <nagle(a)animats.com> wrote:
> On 6/26/2010 7:44 AM, Jesse Noller wrote:
>>
>> On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord
>> <fuzzyman(a)voidspace.org.uk> wrote:
>>>
>>> On 26/06/2010 07:11, John Nagle wrote:
>>>>
>>>> We have just released a proof-of-concept implementation of a new
>>>> approach to thread management - "newthreading".
>
> ....
>
>>> The import * form is considered bad practise in *general* and
>>> should not be recommended unless there is a good reason.
>
> I agree. I just did that to make the examples cleaner.
>
>>> however the introduction of free-threading in Python has not been
>>> hampered by lack of synchronization primitives but by the
>>> difficulty of changing the interpreter without unduly impacting
>>> single threaded code.
>
> That's what I'm trying to address here.
>
>>> Providing an alternative garbage collection mechanism other than
>>> reference counting would be a more interesting first-step as far as
>>> I can see, as that removes the locking required around every access
>>> to an object (which currently touches the reference count).
>>> Introducing free-threading by *changing* the threading semantics
>>> (so you can't share non-frozen objects between threads) would not
>>> be acceptable. That comment is likely to be based on a
>>> misunderstanding of your future intentions though. :-)
>
> This work comes out of a discussion a few of us had at a restaurant
> in Palo Alto after a Stanford talk by the group at Facebook which
> is building a JIT compiler for PHP. We were discussing how to
> make threading both safe for the average programmer and efficient.
> Javascript and PHP don't have threads at all; Python has safe
> threading, but it's slow. C/C++/Java all have race condition
> problems, of course. The Facebook guy pointed out that you
> can't redefine a function dynamically in PHP, and they get
> a performance win in their JIT by exploiting this.
>
> I haven't gone into the memory model in enough detail in the
> technical paper. The memory model I envision for this has three
> memory zones:
>
> 1. Shared fully-immutable objects: primarily strings, numbers,
> and tuples, all of whose elements are fully immutable. These can
> be shared without locking, and reclaimed by a concurrent garbage
> collector like Boehm's. They have no destructors, so finalization
> is not an issue.
>
> 2. Local objects. These are managed as at present, and
> require no locking. These can either be thread-local, or local
> to a synchronized object. There are no links between local
> objects under different "ownership". Whether each thread and
> object has its own private heap, or whether there's a common heap with
> locks at the allocator is an implementation decision.
>
> 3. Shared mutable objects: mostly synchronized objects, but
> also immutable objects like tuples which contain references
> to objects that aren't fully immutable. These are the high-overhead
> objects, and require locking during reference count updates, or
> atomic reference count operations if supported by the hardware.
> The general idea is to minimize the number of objects in this
> zone.
>
> The zone of an object is determined when the object is created,
> and never changes. This is relatively simple to implement.
> Tuples (and frozensets, frozendicts, etc.) are normally zone 2
> objects. Only "freeze" creates collections in zones 1 and 3.
> Synchronized objects are always created in zone 3.
> There are no difficult handoffs, where an object that was previously
> thread-local now has to be shared and has to acquire locks during
> the transition.
>
> Existing interlinked data structures, like parse trees and GUIs,
> are by default zone 2 objects, with the same semantics as at
> present. They can be placed inside a SynchronizedObject if
> desired, which makes them usable from multiple threads.
> That's optional; they're thread-local otherwise.
>
> The rationale behind "freezing" some of the language semantics
> when the program goes multi-thread comes from two sources -
> Adam Olsen's Safethread work, and the acceptance of the
> multiprocessing module. Olsen tried to retain all the dynamism of
> the language in a multithreaded environment, but locking all the
> underlying dictionaries was a boat-anchor on the whole system,
> and slowed things down so much that he abandoned the project.
> The Unladen Swallow documentation indicates that early thinking
> on the project was that Olsen's approach would allow getting
> rid of the GIL, but later notes indicate that no path to a
> GIL-free JIT system is currently in development.
>
> The multiprocessing module provides semantics similar to
> threading with "freezing". Data passed between processes is "frozen"
> by pickling. Processes can't modify each other's code. Restrictive
> though the multiprocessing module is, it appears to be useful.
> It is sometimes recommended as the Pythonic approach to multi-core CPUs.
> This is an indication that "freezing" is not unacceptable to the
> user community.
>
> Most of the real-world use cases for extreme dynamism
> involve events that happen during startup. Configuration files are
> read, modules are selectively included, functions are overridden, tables
> of references to functions are set up, regular expressions are compiled,
> and the code is brought into the appropriately configured state. Then
> the worker threads are started and the real work starts. The
> "newthreading" approach allows all that.
>
> After two decades of failed attempts to remove the Global
> Interpreter Lock without making performance worse, it is perhaps
> time to take a harder look at scalable threading semantics.
>
> John Nagle
> Animats
--
--Guido van Rossum (python.org/~guido)
On Sat, Jun 26, 2010 at 10:39, John Nagle <nagle(a)animats.com> wrote:
> The rationale behind "freezing" some of the language semantics
> when the program goes multi-thread comes from two sources -
> Adam Olsen's Safethread work, and the acceptance of the
> multiprocessing module. Olsen tried to retain all the dynamism of
> the language in a multithreaded environment, but locking all the
> underlying dictionaries was a boat-anchor on the whole system,
> and slowed things down so much that he abandoned the project.
> The Unladen Swallow documentation indicates that early thinking
> on the project was that Olsen's approach would allow getting
> rid of the GIL, but later notes indicate that no path to a
> GIL-free JIT system is currently in development.
That's not true. Refcounting was the boat-anchor, not dicts. I was
unable to come up with a relatively simple replacement that scaled
fully.
The dicts shared as module globals and class dicts were a design
issue, but more of an ideological one: concurrency mentality says you
should only share immutable objects. Python prefers ad-hoc design,
where you can do what you want so long as it's not particularly nasty.
I was unable to find a way to have both, so I declared the Python
mentality the winner.
The shareddict I came up with uses a read/write lock, so that it's
safe when you do mutate and doesn't bottleneck when you don't.
The only fancy thing was my method of checkpointing when doing a
readlock->writelock transition, but there are a hundred other ways to
accomplish that.
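
Roughly, the idea is a lock like this (a minimal sketch built on a
Condition, not my actual implementation; note this naive version can
starve writers):

import threading

class RWLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if not self._readers:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            # Wait until no writer holds the lock and all readers left.
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()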
One of the common complaints about working with time values in Python
is that some functionality is available in the time module, some in
the datetime module, and some in both.
I propose a series of steps towards improving this situation.
1. Create posixtime.py initially containing just "from time import *"
2. Add python implementation of time.* functions to posixtime.py.
3. Rename the time module to _posixtime and add a time.py containing a
deprecation warning and "from _posixtime import *" (see the sketch
below).
Note that #2 may require moving some code from timemodule.c to
datetimemodule.c, but at the binary level the code compiled from these
files is already linked together in datetimemodule. Moving the
necessary code to datetimemodule.c will help to eliminate the current
circular dependency between time and datetime.
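
A sketch of the step 3 shim, using the module names proposed above:

# time.py, forwarding to the renamed module
import warnings
warnings.warn("the time module has been renamed to posixtime",
              DeprecationWarning, stacklevel=2)
from _posixtime import *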
On Thu, Jun 17, 2010 at 1:01 AM, Bruce Leban <bruce(a)leapyear.org> wrote:
..
> When you say "And where in the docs would you explain the following: :-)"
> that sounds like you're saying "this is too confusing we shouldn't document
> it." To which I can only say :-(
I presented what I consider to be a bug. I opened issue 9004, [1]
"datetime.utctimetuple() should not set tm_isdst flag to 0", for that.
There is no point in documenting the following as expected behavior:
>>> time.strftime('%c %z %Z', datetime.utcnow().utctimetuple())
'Wed Jun 16 03:26:26 2010 -0500 EST'
I believe it is better to fix it so that it produces
>>> time.strftime('%c %z %Z', datetime.utcnow().utctimetuple())
'Wed Jun 16 03:26:26 2010 '
instead.
This, however, shows a limitation of the datetime-to-timetuple
conversion: there is currently no mechanism to store daylight saving
time info in a datetime object. See issue 9013. [2] Rather than
fixing that, it would be much better to eliminate the need for the
datetime-to-timetuple conversion in the first place.
[1] http://bugs.python.org/issue9004
[2] http://bugs.python.org/issue9013
-1 to moving anything
The situation is confusing and moving things will add to that confusion for
a significant length of time.
What I would instead suggest is improving the docs. If I could look in one
place to find any time function it would mitigate the fact that they're
implemented in multiple places.
--- Bruce
(via android)
On Jun 16, 2010 12:56 AM, "M.-A. Lemburg" <mal(a)egenix.com> wrote:
Brett Cannon wrote:
> On Tue, Jun 15, 2010 at 16:01, Cameron Simpson <cs(a)zip.com.au> wrote:
>> On 15...
-1.
Please note that the time module provides access to low-level OS
provided services which the datetime module does not expose.
You cannot seriously expect an application which happily uses
the time module (only) for its limited date/time functionality
to be rewritten just to stay compatible with Python.
Note that not all applications are interested in sub-second
accuracy, and a computer without properly configured NTP and a good
internal clock doesn't even provide this accuracy to begin with
(even if it happily pretends to by exposing sub-second floats).
You might want to do that for Python 4 and then add all those
time module functions using struct_time to the datetime
module (returning datetime instances), but for Python3, we've
had the stdlib reorg already.
Renaming time -> posixtime falls into the same category.
The only improvement I could see, would be to move
calendar.timegm() to the time module, since that's where
it belongs (keeping an alias in the calendar module, of
course).
--
Marc-Andre Lemburg
eGenix.com
On Wed, Jun 16, 2010 at 1:37 PM, Brett Cannon <brett(a)python.org> wrote:
..
>> The only improvement I could see, would be to move
>> calendar.timegm() to the time module, since that's where
>> it belongs (keeping an alias in the calendar module, of
>> course).
>
> That should definitely happen at some point.
>
This is discussed in Issue 6280 <http://bugs.python.org/issue6280>.
There are several issues with this proposal:
1. According to help(time),
"""
The Epoch is system-defined; on Unix, it is generally January 1st, 1970.
The actual value can be retrieved by calling gmtime(0).
"""
The current calendar.timegm implementation ignores this. The solution
may be to change help(time), though.
2. The current calendar.timegm supports float values for hours,
minutes and seconds in the time tuple. This is probably an unintended
implementation artifact, but it is relied upon even in the stdlib. See
http://bugs.python.org/issue6280#msg107808 .
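For example, with the current pure-Python implementation:
>>> import calendar
>>> calendar.timegm((1970, 1, 1, 0, 0, 30.5, 0, 0, 0))
30.5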