Python provides a way to iterate over the characters of a string by using the string as an iterable. But there's no way to iterate over Unicode graphemes (a cluster consisting of a base character plus any number of combining marks and other modifiers -- or what the human eye would consider to be one "character").
I think this ought to be provided either in the unicodedata module (as unicodedata.itergraphemes(string)), since it exposes the character database information needed to make this work, or as a method on the built-in str type (str.itergraphemes() or str.graphemes()).
Below is my own implementation of this as a generator, as an example and for reference.
import unicodedata

def itergraphemes(string):
    def ismodifier(char):
        return unicodedata.category(char).startswith('M')  # Mn, Mc, Me
    start = 0
    for end, char in enumerate(string):
        if not ismodifier(char) and start != end:
            yield string[start:end]
            start = end
    yield string[start:]
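A quick demonstration of the idea, inlined here so the snippet is self-contained (a combining acute accent attaches to the preceding base character):

```python
import unicodedata

def itergraphemes(string):
    start = 0
    for end, char in enumerate(string):
        if not unicodedata.category(char).startswith('M') and start != end:
            yield string[start:end]
            start = end
    if string:
        yield string[start:]

# 'e' followed by COMBINING ACUTE ACCENT renders as one user-perceived character.
word = "Cafe\u0301"
assert len(word) == 5                                           # five code points...
assert list(itergraphemes(word)) == ["C", "a", "f", "e\u0301"]  # ...four graphemes
```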
>>> import pickle
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Runtimes\Python\lib\pickle.py", line 1374, in dumps
File "C:\Runtimes\Python\lib\pickle.py", line 224, in dump
File "C:\Runtimes\Python\lib\pickle.py", line 306, in save
rv = reduce(self.proto)
File "C:\Runtimes\Python\lib\copy_reg.py", line 70, in _reduce_ex
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle module objects
I know that you can't, but why not? You can pickle class objects and
function objects in modules, and their static path is stored so that when
you unpickle it, the correct module is imported and the function is
retrieved from that module. It seems odd and inconsistent that you can't
pickle the module object itself; can't it just store itself as a name, and
have loads() import it and return the resultant module object?
This isn't entirely of academic interest; I'm working with some code which
is meant to pickle/unpickle arbitrary things, and occasionally it blows up
when I accidentally pass in a module. It's always possible to work around
the TypeErrors by just changing the stuff I pass in for serializing from
the module to the exact function/class I want, but it seems like needless busywork.
Is there any good reason we don't let people pickle module objects using
the same technique that we use to pickle classes and top-level functions?
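For what it's worth, the store-by-name behaviour the question asks for can be prototyped in Python 3 today by registering a reduction function with copyreg (the helper name here is illustrative, not an existing API):

```python
import copyreg
import importlib
import pickle
import types

def _pickle_module(module):
    # Store only the module's dotted name; unpickling re-imports it,
    # exactly as pickling a top-level function stores its module path.
    return importlib.import_module, (module.__name__,)

# Register the reducer for all module objects.
copyreg.pickle(types.ModuleType, _pickle_module)

import json
restored = pickle.loads(pickle.dumps(json))
assert restored is json  # import_module returns the cached sys.modules entry
```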
I found myself repeating something that I know I have used before, several
times: I get two sets of results, maybe the sets of passing tests when a
design has changed, and I need to work out what has changed, i.e.:
1. What passed only the first time round.
2. What passed both times.
3. What passed only the second time round.
I usually use something like the set equations in the title to do this but
I recognise that this requires both sets to be traversed at least three
times which seems wasteful.
I wondered if there was an algorithm to partition the two sets of data into
three as above, but cutting down on the number of set traversals?
I also wondered whether, if such an algorithm existed, it would be useful
enough to be worth incorporating into the Python library?
Maybe defined as:
exclusively1, common, exclusively2 = set1.partition(set2)
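A sketch of how such a method could work, traversing each set only once (the function name follows the proposal; it is not an existing API):

```python
def partition(set1, set2):
    """Return (only-in-set1, in-both, only-in-set2), one pass over each set."""
    only1, common = set(), set()
    for x in set1:                                 # one traversal of set1
        (common if x in set2 else only1).add(x)    # membership test is O(1)
    only2 = {x for x in set2 if x not in common}   # one traversal of set2
    return only1, common, only2

first_run = {"t1", "t2", "t3"}
second_run = {"t2", "t3", "t4"}
assert partition(first_run, second_run) == ({"t1"}, {"t2", "t3"}, {"t4"})
```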
I know this ship has already sailed with PEP 435, but I've been working on
a prototype implementation of enums using MacroPy.
The goal was to have enums that provide most of the nice
capabilities of int enums (auto-generated indexes, fast comparison,
incrementing, index arithmetic, find-by-index) and string enums (nice
__repr__, find-by-name), but that can smoothly scale up to
full-fledged objects with methods *and* fields, all in an extremely lightweight way.
The gimmick here is that it uses macros to provide a concise syntax to
allow you to construct each enum instance with whatever constructor
parameters you wish, java-enum-style
This allows a degree of enums-as-objects which I haven't seen in any other
library (most don't allow custom fields).
Probably not standard-library worthy, but I thought it was pretty cool.
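For comparison, the stdlib Enum from PEP 435 does already support per-member constructor arguments and custom fields, Java-enum-style, via __init__ (this follows the Planet pattern from the enum documentation):

```python
from enum import Enum

class Planet(Enum):
    # Each member's value tuple is passed to __init__.
    MERCURY = (3.303e+23, 2.4397e6)
    EARTH = (5.976e+24, 6.37814e6)

    def __init__(self, mass, radius):
        self.mass = mass      # kilograms
        self.radius = radius  # meters

    @property
    def surface_gravity(self):
        G = 6.67300e-11  # universal gravitational constant (m^3 kg^-1 s^-2)
        return G * self.mass / (self.radius * self.radius)

assert Planet.EARTH.value == (5.976e+24, 6.37814e6)
assert round(Planet.EARTH.surface_gravity, 1) == 9.8
```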
While I was implementing JSON-JWS (JSON web signatures), a format
which in Python 3 has to go from bytes > unicode > bytes > unicode
several times in its construction, I noticed I wrote a lot of bugs:
When I meant to say:
Everything worked perfectly on Python 3 because the verifying code
also generated the sha256=b'abcdef1234' as a comparison. I would never have
noticed at all if I hadn't tried to verify the Python 3 output
with Python 2.
I know I'm a bad person for not having unit tests capable enough to
catch this bug, a bug I wrote repeatedly in each layer of the bytes >
unicode > bytes > unicode dance, and that there is no excuse for being
confused at any time about the type of a variable, but I'm not willing to rely on vigilance alone.
Instead, I would like a new string formatting operator tentatively
called 'notbytes': "sha256=%notbytes" % (b'abcdef1234'). It gives the
same error as 'sha256='+b'abc1234' would: TypeError: Can't convert
'bytes' object to str implicitly
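The proposed operator can be approximated today with a small helper (fmt_not_bytes is hypothetical, purely for illustration):

```python
def fmt_not_bytes(template, *args):
    """%-format, but refuse bytes the way str + bytes already does."""
    for arg in args:
        if isinstance(arg, (bytes, bytearray)):
            raise TypeError("Can't convert %r object to str implicitly"
                            % type(arg).__name__)
    return template % args

assert fmt_not_bytes("sha256=%s", "abcdef1234") == "sha256=abcdef1234"
try:
    fmt_not_bytes("sha256=%s", b"abcdef1234")
except TypeError:
    pass  # bytes are rejected instead of silently formatting as "b'abcdef1234'"
```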
---------- Forwarded message (apologies, the CC says it all) ----------
From: Joshua Landau <joshua.landau.ws(a)gmail.com>
Date: 4 July 2013 00:57
Subject: Re: [Python-ideas] exclusively1, common, exclusively2 = set1
- set2, set1 & set2, set2 - set1
To: Paddy3118 <paddy3118(a)gmail.com>
On 3 July 2013 21:50, Paddy3118 <paddy3118(a)gmail.com> wrote:
> I found myself repeating something that I know I have used before, several
> times: I get two sets of results, may be sets of the passing tests when a
> design has changed, and I need to work out what has changed so work out
> 1. What passed first time round
> 2. What passed both times.
> 3. What passed only the second time round.
> I usually use something like the set equations in the title to do this but I
> recognise that this requires both sets to be traversed at least three times
> which seems wasteful.
As far as I understand, this requires only 3 traversals in total.
> I wondered if their was am algorithm to partition the two sets of data into
> three as above, but cutting down on the number of set traversals?
You could cut it down to two, AFAIK. This seems like a minor gain.
> I also wondered that if such an algorithm existed, would it be useful enough
> to be worth incorporating into the Python library?
> Maybe defined as:
> exclusively1, common, exclusively2 = set1.partition(set2)
Something more useful, as it's just as good in this case, could be:
set1.partition(set2) === set1 - set2, set1 & set2
similarly to how we have divmod.
As you all know, Python supports a compound "with" statement to avoid the
necessity of nesting these statements.
Unfortunately, I find that using this feature often leads to exceeding the
79-character recommendation set forward by PEP 8.
# The following is over 79 characters
with open("/long/path/to/file1") as file1, open("/long/path/to/file2") as file2:
This can be avoided by using the line continuation character, like so:
with open("/long/path/to/file1") as file1, \
open("/long/path/to/file2") as file2:
But PEP 8 prefers using implicit continuation with parentheses over line
continuation. PEP 328 states that using the line continuation character is
"unpalatable", which was the justification for allowing multi-line imports:
from package.subpackage import (UsefulClass1, UsefulClass2,
                                UsefulClass3)
Is there a reason we cannot do the same thing with compound with
statements? Has this been suggested before? If so, why was it rejected?
with (open("/long/path/to/file1") as file1,
open("/long/path/to/file2") as file2):
I would be happy to write the PEP for this and get plugged in to the Python
development process if this is an idea worth pursuing.
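For reference, the line-length problem can also be sidestepped today with contextlib.ExitStack, which works especially well when the number of files varies (the throwaway temp files below stand in for the long real paths):

```python
import os
import tempfile
from contextlib import ExitStack

# Create two throwaway files to open.
paths = []
for _ in range(2):
    fd, path = tempfile.mkstemp()
    os.close(fd)
    paths.append(path)

with ExitStack() as stack:
    # Each file is registered with the stack as it is opened.
    files = [stack.enter_context(open(p)) for p in paths]
    assert all(not f.closed for f in files)
assert all(f.closed for f in files)  # ExitStack closed everything on exit

for p in paths:
    os.remove(p)
```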
On Jun 27, 2013, at 5:19, Nick Coghlan <ncoghlan(a)gmail.com> wrote:
>> If you're willing, I'm actually thinking this may be one of those
>> discussions that's worth summarising in a PEP, even if it's just to
>> immediately mark it Rejected. Similar to PEP 315 and a few other PEPs,
>> it can help to have a document that clearly spells out the objective
>> (which I think you summarised nicely as "trying to find a syntactic
>> replacement for itertools.takewhile, just as comprehensions replaced
>> many uses of map and filter"), even if no acceptable solution has been
>The ideas are pretty broad-ranging. In particular, #1 got two somewhat
>supportive responses that had nothing to do with the comprehension
>idea. Do side issues like that need to be discussed first/separately before
>referencing them in a while clause PEP?
>Also, we seem pretty far from a consensus on what the actual tradeoffs are for
>most of the options. For example, is the definition of comps in terms of genexps
>the core idea behind the abstraction, or a minor triviality that's only relevant
>in understanding bad code? Are the differences between comps and genexps a bug
>or a feature?
>Finally, how does this connect up to the original idea of this thread, which was
>to draft a PEP for while clauses in loop statements rather than in
>comprehensions? Obviously if that existed, it would change the options for
>comprehension syntax. Do we need to include that idea in the discussion? (I completely forgot
>about it while digging up all of the ideas that spun off from it...)
I don't think while clauses in loop statements are a big gain for the
language. There's break, and despite all the programming-style discussions going on in
this part of the thread, it has been working well for many years and most people
find it intuitive to use.
>> That phrasing of the objective also highlights a counter argument I
>> hadn't considered before: if we don't consider takewhile a common
>> enough use case to make it a builtin, why on *earth* are we even
>> discussing the possibility of giving it dedicated syntax?
>Besides early termination for comps, the only good use case I know of is in code
>that already makes heavy use of itertools, and making one of those functions a
>builtin wouldn't change anything.
>And if early termination for comps is a special case, it doesn't seem too
>unreasonable to consider other ways to handle it.
>But you're right that "make takewhile a builtin" should probably be listed
>among the alternatives.
But a builtin takewhile would still not come with nicer and easier to read
syntax. I guess the use cases are not that rare; it's just that right now people resort
to explicit loops when they need early termination because it keeps things
simple. I'm encountering this situation quite regularly (reading a small header from
files is the most common example, but there are others).
Let me suggest one more solution, although it requires a new keyword:
breakif condition
and define its translation as: if condition: break.
You can now write
(x for x in iterable breakif x < 0)
and I don't see a way how that could possibly be misread by anyone.
Also it would translate unambiguously to the explicit:
for x in iterable:
    breakif x < 0  # itself translating to: if x < 0: break
It would work with genexps, comprehensions and explicit loops alike (with
little benefit for the latter, though maybe it increases readability even there
by making it clear from the start of the line what the purpose of the
condition is).
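For comparison, the closest existing spellings of the proposed breakif (the sample data is illustrative):

```python
from itertools import takewhile

data = [3, 1, 4, -1, 5]

# Proposed: (x for x in data breakif x < 0)
# Today, with takewhile and the negated condition:
via_takewhile = list(takewhile(lambda x: not (x < 0), data))

# Or the explicit loop that breakif would translate to:
via_loop = []
for x in data:
    if x < 0:
        break
    via_loop.append(x)

assert via_takewhile == via_loop == [3, 1, 4]
```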
Yes, but it only works for generator expressions and not comprehensions. My opinion of that workaround is that it's also a step backward in terms of readability. I suspect
i if i < 50 else stop() would probably also work, since it throws an exception. That's better, IMHO.
On Jun 28, 2013, at 6:38 PM, Andrew Carter <acarter(a)g.hmc.edu> wrote:
> Digging through the archives (with a quick google search) http://mail.python.org/pipermail/python-ideas/2013-January/019051.html, if you really want an expression it seems you can just do
> def stop():
> raise StopIteration
> list(i for i in range(100) if i < 50 or stop())
> it seems to me that this would provide syntax that doesn't require lambdas.
> On Fri, Jun 28, 2013 at 4:50 PM, Alexander Belopolsky <alexander.belopolsky(a)gmail.com> wrote:
> On Fri, Jun 28, 2013 at 7:38 PM, Shane Green <shane(a)umbrellacode.com> wrote:
> [x until condition for x in l ...] or
> [x for x in l until condition]
> Just to throw in one more variation:
> [expr for item in iterable break if condition]
> (inversion of "if" and "break" reinforces the idea that we are dealing with an expression rather than a statement - compare with "a if cond else b")