Hi.
[Mark Hammond]
> The point isn't about my suffering as such. The point is more that
> python-dev owns a tiny amount of the code out there, and I don't believe we
> should put Python's users through this.
>
> Sure - I would be happy to "upgrade" all the win32all code, no problem. I
> am also happy to live on the bleeding edge and take some pain that it will
> cause.
>
> The issue is simply the user base, and giving Python a reputation of not
> being able to painlessly upgrade even dot revisions.
I agree with all this.
[As I imagined, explicit syntax did not catch on and would require a lot
of discussion.]
[GvR]
> > Another way is to use special rules
> > (similar to those for class defs), e.g. having
> >
> > <frag>
> > y=3
> > def f():
> >     exec "y=2"
> >     def g():
> >         return y
> >     return g()
> >
> > print f()
> > </frag>
> >
> > # prints 3.
> >
> > Is that confusing for users? Maybe they will more naturally expect 2
> > as the outcome (given nested scopes).
>
> This seems the best compromise to me. It will lead to the least
> broken code, because this is the behavior that we had before nested
> scopes! It is also quite easy to implement given the current
> implementation, I believe.
>
> Maybe we could introduce a warning rather than an error for this
> situation though, because even if this behavior is clearly documented,
> it will still be confusing to some, so it is better if we outlaw it in
> some future version.
>
Yes, this would be easy to implement, but more confusing situations can arise:
<frag>
y=3
def f():
    y=9
    exec "y=2"
    def g():
        return y
    return y,g()
print f()
</frag>
What should this print? Unlike the class def case, the situation does not
lead to a canonical solution.
Or consider:
<frag>
def f():
    from foo import *
    def g():
        return y
    return g()
print f()
</frag>
[Mark Hammond]
> > This probably won't be a very popular suggestion, but how about pulling
> > nested scopes (I assume they are at the root of the problem)
> > until this can be solved cleanly?
>
> Agreed. While I think nested scopes are kinda cool, I have lived without
> them, and really without missing them, for years. At the moment the cure
> appears worse than the symptoms in at least a few cases. If nothing else,
> it compromises the elegant simplicity of Python that drew me here in the
> first place!
>
> Assuming that people really _do_ want this feature, IMO the bar should be
> raised so there are _zero_ backward compatibility issues.
I won't say anything about pulling nested scopes (I don't think my opinion
can change things in this respect), but I must insist that, without explicit
syntax, raising the bar IMO either has too high an implementation cost
(in both performance and complexity) or creates confusion.
[Andrew Kuchling]
> >Assuming that people really _do_ want this feature, IMO the bar should be
> >raised so there are _zero_ backward compatibility issues.
>
> Even at the cost of additional implementation complexity? At the cost
> of having to learn "scopes are nested, unless you do these two things
> in which case they're not"?
>
> Let's not waffle. If nested scopes are worth doing, they're worth
> breaking code. Either leave exec and from..import illegal, or back
> out nested scopes, or think of some better solution, but let's not
> introduce complicated backward compatibility hacks.
IMO breaking code would be OK if we issued warnings today and implemented
nested scopes, with errors, tomorrow. But this is simply a statement of
principle and of the impression it gives.
IMO 'import *' in an inner scope should end up being an error; I'm not
sure about 'exec'.
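For what it's worth, the unambiguous spellings already exist today; e.g.
giving exec an explicit namespace keeps the compiler out of the guessing
game (just a sketch, legal under nested scopes; 'import foo' plus 'foo.y'
plays the same role for the import * case):
<frag>
def f():
    ns = {}
    exec "y=2" in ns      # bindings land in an explicit dict, not f's locals
    def g():
        return ns['y']    # g refers to ns, an ordinary local of f
    return g()
print f()                 # prints 2, with nothing left to guess
</frag>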
We will need a final BDFL statement.
regards, Samuele Pedroni.
PEP: 0???
Title: Support for System Upgrades
Version: $Revision: 0.0 $
Author: mal(a)lemburg.com (Marc-André Lemburg)
Status: Draft
Type: Standards Track
Python-Version: 2.3
Created: 19-Jul-2001
Post-History:
Abstract
This PEP proposes strategies to allow the Python standard library
to be upgraded in parts without having to reinstall the complete
distribution or having to wait for a new patch level release.
Problem
Python currently does not allow overriding modules or packages in
the standard library per default. Even though this is possible by
defining a PYTHONPATH environment variable (the paths defined in
this variable are prepended to the Python standard library path),
there is no standard way of achieving this without changing the
configuration.
Since Python's standard library is starting to host packages which
are also available separately, e.g. the distutils, email and PyXML
packages, which can also be installed independently of the Python
distribution, it is desirable to have an option to upgrade these
packages without having to wait for a new patch level release of
the Python interpreter to bring along the changes.
Proposed Solutions
This PEP proposes two different but not necessarily conflicting
solutions:
1. Adding a new standard search path to sys.path:
$stdlibpath/system-packages just before the $stdlibpath
entry. This complements the already existing entry for site
add-ons $stdlibpath/site-packages which is appended to the
sys.path at interpreter startup time.
To make use of this new standard location, distutils will need
to grow support for installing certain packages in
$stdlibpath/system-packages rather than the standard location
for third-party packages $stdlibpath/site-packages.
2. Tweaking distutils to install directly into $stdlibpath for the
system upgrades rather than into $stdlibpath/site-packages.
The first solution has a few advantages over the second:
* upgrades can be easily identified (just look in
$stdlibpath/system-packages)
* upgrades can be deinstalled without affecting the rest
of the interpreter installation
* modules can be virtually removed from packages; this is
due to the way Python imports packages: once it finds the
top-level package directory it stays in this directory for
all subsequent package submodule imports
* the approach has an overall much cleaner design than the
hackish install-on-top-of-an-existing-installation approach
The only advantages of the second approach are that the Python
interpreter does not have to be changed and that it works with
older Python versions.
Both solutions require changes to distutils. These changes can
also be implemented by package authors, but it would be better to
define a standard way of switching on the proposed behaviour.
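To illustrate solution 1, the effect on sys.path could be achieved
with a few lines at interpreter startup (only a sketch; the real hook
would live in site.py or the startup code, and the directory name is
still open to discussion):

    import sys, os

    def add_system_packages():
        # locate the standard library directory, e.g.
        # /usr/local/lib/python2.3, and insert the new
        # system-packages entry just before it
        stdlibpath = os.path.dirname(os.__file__)
        syspackages = os.path.join(stdlibpath, 'system-packages')
        if os.path.isdir(syspackages) and syspackages not in sys.path:
            try:
                pos = sys.path.index(stdlibpath)
            except ValueError:
                pos = 0
            sys.path.insert(pos, syspackages)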
Scope
Solution 1: Python 2.3 and up
Solution 2: all Python versions supported by distutils
Credits
None
References
None
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/
I've uploaded my logging module, the proposed implementation for PEP 282,
for committer review, to the SourceForge patch manager:
http://sourceforge.net/tracker/index.php?func=detail&aid=578494&group_id=5470&atid=305470
I've assigned it to Mark Hammond as (a) he had posted some comments to Trent
Mick's original PEP posting, and (b) Barry Warsaw advised not assigning to
PythonLabs people on account of their current workload.
The file logging.py is (apart from some test scripts) all that's supposed to
go into Python 2.3. The file logging-0.4.6.tar.gz contains the module, an
updated version of the PEP (which I mailed to Barry Warsaw on 26th June),
numerous test/example scripts, TeX documentation etc. You can also refer to
http://www.red-dove.com/python_logging.html
Here's hoping for a speedy review :-)
Regards,
Vinay Sajip
tim> Straight character n-grams are very appealing because they're the
tim> simplest and most language-neutral; I didn't have any luck with
tim> them over the weekend, but the size of my training data was
tim> trivial.
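Just to make sure we mean the same thing by straight character n-grams,
here's the sort of tokenizer I have in mind (untested sketch):

    def char_ngrams(text, n=3):
        # slide a window of n characters over the raw message text
        return [text[i:i+n] for i in range(len(text) - n + 1)]

    # char_ngrams("python-dev", 3) ->
    #   ['pyt', 'yth', 'tho', 'hon', 'on-', 'n-d', '-de', 'dev']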
Anybody up for pooling corpi (corpora?)?
Skip
While I was driving to work today, I had a thought about the
iterator/iterable discussion of a few weeks ago. My impression is
that that discussion was inconclusive, but a few general principles
emerged from it:
1) Some types are iterators -- that is, they support calls
to next() and raise StopIteration when they have no more
information to give.
2) Some types are iterables -- that is, they support calls
to __iter__() that yield an iterator as the result.
3) Every iterator is also an iterable, because iterators are
required to implement __iter__() as well as next().
4) The way to determine whether an object is an iterator
is to call its next() method and see what happens.
5) The way to determine whether an object is an iterable
is to call its __iter__() method and see what happens.
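To make (1)-(3) concrete, here is a minimal made-up type that is both an
iterator and an iterable:

    class Countdown:
        "Yields n, n-1, ..., 1 and then stops."
        def __init__(self, n):
            self.n = n
        def __iter__(self):       # (3): every iterator is also an iterable
            return self
        def next(self):           # (1): return a value or raise StopIteration
            if self.n <= 0:
                raise StopIteration
            self.n = self.n - 1
            return self.n + 1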
I'm uneasy about (4) because if an object is an iterator, calling its
next() method is destructive. The implication is that you had better
not use this method to test if an object is an iterator until you are
ready to take irrevocable action based on that test. On the other
hand, calling __iter__() is safe, which means that you can test
nondestructively whether an object is an iterable, which includes
all iterators.
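In code, the two tests might look like this (a sketch; the function names
are made up):

    def is_iterable(obj):
        # safe: asking for an iterator consumes nothing
        try:
            obj.__iter__()
            return 1
        except AttributeError:
            return 0

    def is_iterator(obj):
        # destructive: the only direct test calls next(),
        # which throws away a value whenever the test succeeds
        try:
            obj.next()
            return 1
        except StopIteration:
            return 1
        except AttributeError:
            return 0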
Here is what I realized this morning. It may be obvious to you,
but it wasn't to me (until after I realized it, of course):
``iterator'' and ``iterable'' are just two of many type
categories that exist in Python.
Some other categories:
callable
sequence
generator
class
instance
type
number
integer
floating-point number
complex number
mutable
tuple
mapping
method
built-in
As far as I know, there is no uniform method of determining into which
category or categories a particular object falls. Of course, there
are non-uniform ways of doing so, but in general, those ways are, um,
nonuniform. Therefore, if you want to check whether an object is in
one of these categories, you haven't necessarily learned much about
how to check if it is in a different one of these categories.
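For instance, the checks I know of today are all different in kind
(a sketch, surely incomplete):

    def categories(obj):
        cats = []
        if callable(obj):                                    # a builtin predicate
            cats.append('callable')
        if isinstance(obj, (int, long, float, complex)):     # an isinstance check
            cats.append('number')
        if hasattr(obj, 'keys') and hasattr(obj, '__getitem__'):   # hasattr probes
            cats.append('mapping')
        if hasattr(obj, '__iter__'):
            cats.append('iterable')
        if hasattr(obj, 'next'):
            cats.append('iterator')
        return cats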
So what I wonder is this: Has there been much thought about making
these type categories more explicitly part of the type system?
Don't count words multiple times, and you'll probably
get fewer false positives. That's the main reason I
don't do it-- because it magnifies the effect of some
random word like water happening to have a big spam
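In other words, score over the set of distinct words in a message rather
than over every occurrence, something like (sketch):

    def distinct_words(words):
        # each word counts at most once per message
        seen = {}
        for w in words:
            seen[w] = 1
        return seen.keys()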
probability. (Incidentally, why so high? In my db it's
only 0.3930784.) --pg
Tim Peters wrote:
> FYI. After cleaning the blatant spam identified by the classifier out of my
> ham corpus, and replacing it with new random msgs from Barry's corpus, the
> reported false positive rate fell to about 0.2% (averaging 8 per each batch
> of 4000 ham test messages). This seems remarkable given that it's ignoring
> headers, and just splitting the raw text on whitespace in total ignorance of
> HTML & MIME etc.
>
> 'FREE' (all caps) moved into the ranks of best spam indicators. The false
> negative rate got reduced by a small amount, but I doubt it's a
> statistically significant reduction (I'll compute that stuff later; I'm
> looking for Big Things now).
>
> Some of these false positives are almost certainly spam, and at least one is
> almost certainly a virus: these are msgs that are 100% base64-encoded, or
> maximally obfuscated quoted-printable. That could almost certainly be fixed
> by, e.g., decoding encoded text.
>
> The other false positives seem harder to deal with:
>
> + Brief HTML msgs from newbies. I doubt the headers will help these
> get through, as they're generally first-time posters, and aren't
> replies to earlier msgs. There's little positive content, while
> all elements of raw HTML have high "it's spam" probability.
>
> Example:
>
> """
> --------------=_4D4800B7C99C4331D7B8
> Content-Description: filename="text1.txt"
> Content-Type: text/plain; charset=ISO-8859-1
> Content-Transfer-Encoding: quoted-printable
>
> Is there a version of Python with Prolog Extension??
> Where can I find it if there is?
>
> Thanks,
> Luis.
>
> P.S. Could you please reply to the sender too.
>
>
> --------------=_4D4800B7C99C4331D7B8
> Content-Description: filename="text1.html"
> Content-Type: text/html
> Content-Transfer-Encoding: quoted-printable
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
> <HTML>
> <HEAD>
> <TITLE>Prolog Extension</TITLE>
> <META NAME=3D"GENERATOR" CONTENT=3D"StarOffice/5.1 (Linux)">
> <META NAME=3D"CREATED" CONTENT=3D"19991127;12040200">
> <META NAME=3D"CHANGEDBY" CONTENT=3D"Luis Cortes">
> <META NAME=3D"CHANGED" CONTENT=3D"19991127;12044700">
> </HEAD>
> <BODY>
> <PRE>Is there a version of Python with Prolog Extension??
> Where can I find it if there is?
>
> Thanks,
> Luis.
>
> P.S. Could you please reply to the sender too.</PRE>
> </BODY>
> </HTML>
>
> --------------=_4D4800B7C99C4331D7B8--"""
> """
>
> Here's how it got scored:
>
> prob = 0.999958816093
> prob('<META') = 0.957529
> prob('<META') = 0.957529
> prob('<META') = 0.957529
> prob('<BODY>') = 0.979284
> prob('Prolog') = 0.01
> prob('<HEAD>') = 0.97989
> prob('Thanks,') = 0.0337316
> prob('Prolog') = 0.01
> prob('Python') = 0.01
> prob('NAME=3D"GENERATOR"') = 0.99
> prob('<HTML>') = 0.99
> prob('</HTML>') = 0.989494
> prob('</BODY>') = 0.987429
> prob('Thanks,') = 0.0337316
> prob('Python') = 0.01
>
> Note that '<META' gets penalized 3 times. More on that later.
>
> + Msgs talking *about* HTML, and including HTML in examples. This one
> may be troublesome, but there are mercifully few of them.
>
> + Brief msgs with obnoxious employer-generated signatures. Example:
>
> """
> Hi there,
>
> I am looking for you recommendations on training courses available in the UK
> on Python. Can you help?
>
> Thanks,
>
> Vickie Mills
> IS Training Analyst
>
> Tel: 0131 245 1127
> Fax: 0131 245 1550
> E-mail: vickie_mills(a)standardlife.com
>
> For more information on Standard Life, visit our website
> http://www.standardlife.com/ The Standard Life Assurance Company, Standard
> Life House, 30 Lothian Road, Edinburgh EH1 2DH, is registered in Scotland
> (No SZ4) and regulated by the Personal Investment Authority. Tel: 0131 225
> 2552 - calls may be recorded or monitored. This confidential e-mail is for
> the addressee only. If received in error, do not retain/copy/disclose it
> without our consent and please return it to us. We virus scan all e-mails
> but are not responsible for any damage caused by a virus or alteration by a
> third party after it is sent.
> """
>
> The scoring:
>
> prob = 0.98654879055
> prob('our') = 0.928936
> prob('sent.') = 0.939891
> prob('Tel:') = 0.0620155
> prob('Thanks,') = 0.0337316
> prob('received') = 0.940256
> prob('Tel:') = 0.0620155
> prob('Hi') = 0.0533333
> prob('help?') = 0.01
> prob('Personal') = 0.970976
> prob('regulated') = 0.99
> prob('Road,') = 0.01
> prob('Training') = 0.99
> prob('e-mails') = 0.987542
> prob('Python.') = 0.01
> prob('Investment') = 0.99
>
> The brief human-written part is fine, but the longer boilerplate sig is
> indistinguishable from spam.
>
> + The occasional non-Python conference announcement(!). These are
> long, so I'll skip an example. In effect, it's automated bulk email
> trying to sell you a conference, so is prone to use the language and
> artifacts of advertising. Here's typical scoring, for the TOOLS
> Europe '99 conference announcement:
>
> prob = 0.983583974285
> prob('THE') = 0.983584
> prob('Object') = 0.01
> prob('Bell') = 0.01
> prob('Object-Oriented') = 0.01
> prob('**************************************************************') =
> 0.99
> prob('Bertrand') = 0.01
> prob('Rational') = 0.01
> prob('object-oriented') = 0.01
> prob('CONTACT') = 0.99
> prob('**************************************************************') =
> 0.99
> prob('innovative') = 0.99
> prob('**************************************************************') =
> 0.99
> prob('Olivier') = 0.01
> prob('VISIT') = 0.99
> prob('OUR') = 0.99
>
> Note the repeated penalty for the lines of asterisks. That segues into the
> next one:
>
> + Artifacts of the fact that the algorithm counts multiple instances of "a word"
> multiple times. These are baffling at first sight! The two clearest
> examples:
>
> """
> > > Can you create and use new files with dbhash.open()?
> >
> > Yes. But if I run db_dump on these files, it says "unexpected file type
> > or format", regardless which db_dump version I use (2.0.77, 3.0.55,
> > 3.1.17)
> >
>
> It may be that db_dump isn't compatible with version 1.85 database files. I
> can't remember. I seem to recall that there was an option to build 1.85
> versions of db_dump and db_load. Check the configure options for
> BerkeleyDB to find out. (Also, while you are there, make sure that
> BerkeleyDB was built the same on both of your platforms...)
>
>
> >
> > > Try running db_verify (one of the utilities built
> > > when you compiled DB) on the file and see what it tells you.
> >
> > There is no db_verify among my Berkeley DB utilities.
>
> There should have been a bunch of them built when you compiled DB. I've got
> these:
>
Guido van Rossum <guido(a)python.org> writes:
> This might belong on SF, except it's already been solved in Python
> 2.3, and I need guidance about what to do for Python 2.2.2.
>
> In 2.2.1, a lone surrogate encoded into utf8 gives a utf8 string that
> cannot be decoded back. In 2.3, this is fixed. Should this be fixed
> in 2.2.2 as well?
I think this was discussed really quite a long time ago, like six
months or so.
> I'm asking because it caused problems with reading .pyc files: if
> there's a Unicode literal containing a lone surrogate, reading the
> .pyc file causes an exception:
>
> UnicodeError: UTF-8 decoding error: unexpected code byte
>
> It looks like revision 2.128 fixed this for 2.3, but that patch
> doesn't cleanly apply to the 2.2 maintenance branch. Can someone
> help?
I think the reason this didn't get fixed in 2.2.1 is that it
necessitates bumping MAGIC.
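For reference, the failure Guido describes looks something like this on a
2.2.1 build (a sketch from memory, not re-verified just now):

    >>> data = u'\ud800'.encode('utf-8')   # a lone surrogate encodes quietly
    >>> data.decode('utf-8')               # ... but cannot be decoded back
    Traceback (most recent call last):
      ...
    UnicodeError: UTF-8 decoding error: unexpected code byte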
I can probably dig up more references if you want.
Cheers,
M.
--
34. The string is a stark data structure and everywhere it is
passed there is much duplication of process. It is a perfect
vehicle for hiding information.
-- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html
How about adding some mixins to simplify the
implementation of some of the fatter interfaces?
class CompareMixin:
    """
    Given an __eq__ method in a subclass, adds a __ne__ method.
    Given __eq__ and __lt__, adds !=, <=, >, >=.
    """

class MappingMixin:
    """
    Given __setitem__, __getitem__, and keys,
    implements values, items, update, get, setdefault, len,
    iterkeys, iteritems, itervalues, has_key, and __contains__.
    If __delitem__ is also supplied, implements clear, pop,
    and popitem.
    Takes advantage of __iter__ if supplied (recommended).
    Takes advantage of __contains__ or has_key if supplied
    (recommended).
    """
The idea is to make it easier to implement these interfaces.
Also, if the interfaces get expanded, the clients are automatically
updated.
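To make the idea concrete, CompareMixin might be little more than this
(a rough sketch, untested, assuming a total ordering):

    class CompareMixin:
        # the subclass supplies __eq__ (and __lt__ for the ordering methods)
        def __ne__(self, other):
            return not self.__eq__(other)
        def __le__(self, other):
            return self.__lt__(other) or self.__eq__(other)
        def __gt__(self, other):
            return not self.__le__(other)
        def __ge__(self, other):
            return not self.__lt__(other)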
Raymond Hettinger
Patch http://www.python.org/sf/554192 adds a function to
mimetypes.py that returns all known extensions for a mimetype,
e.g.
>>> import mimetypes
>>> mimetypes.guess_all_extensions("image/jpeg")
['.jpg', '.jpe', '.jpeg']
Martin v. Loewis and I were discussing whether it would make
sense to make the helper method add_type (which is used for
adding a mapping between one type and one extension) visible
at the module level.
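If it were exposed, usage might look something like this (a hypothetical
spelling; nothing is settled yet):

    import mimetypes

    # register a mapping that isn't in the default tables
    mimetypes.add_type("application/x-parrot", ".parrot")

    mimetypes.guess_type("dead.parrot")
    # -> ('application/x-parrot', None)
    mimetypes.guess_all_extensions("application/x-parrot")
    # -> ['.parrot']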
Any comments?
Bye,
Walter Dörwald