It seems there could be a cleaner way of reading the first n lines of
a file without advancing the file position past those lines (i.e. a
peek). This is obviously a trivial task for one line, i.e.:
f.readline()
f.seek(0)
but I think it would make sense to add this to the IO implementation.
Given that we already have readline, readlines, and peek, I think
peekline() or peeklines(n) is a natural addition. The argument
for doing so (in 3.3 of course) is primarily readability, but also
that the maintenance burden *seems* like it would be low. This
addition would also be helpful and more concise where n > 1.
I think readlines() should also take an optional argument for a maximum
number of lines to read. It seems more common and helpful to me than
'hint' for max bytes. In the n > 1 case one could do:
f.readlines(maxlines=10)
or for the 'peek' case
f.peeklines(10)
I also didn't find any of the answers from
http://stackoverflow.com/questions/1767513/read-first-n-lines-of-a-file-i...
to be very compelling.
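As a rough illustration of what the proposed methods would do, here is a minimal sketch written against today's API. The names peeklines and readlines_max are hypothetical stand-ins for the proposed f.peeklines(n) and f.readlines(maxlines=n); neither exists in the io module, and the sketch assumes a seekable file object.

```python
import io
from itertools import islice


def readlines_max(f, maxlines):
    """Read at most maxlines lines from f (hypothetical helper)."""
    return list(islice(f, maxlines))


def peeklines(f, n):
    """Return up to n lines without moving f's position (hypothetical helper).

    Assumes f is seekable. readline() is used instead of iteration because
    text files disable tell() while they are being iterated.
    """
    pos = f.tell()
    try:
        lines = []
        for _ in range(n):
            line = f.readline()
            if not line:  # end of file
                break
            lines.append(line)
        return lines
    finally:
        f.seek(pos)  # restore the original position


f = io.StringIO("one\ntwo\nthree\n")
print(peeklines(f, 2))      # ['one\n', 'two\n']
print(readlines_max(f, 2))  # ['one\n', 'two\n'] again, since peeking did not advance f
```

The tell()/seek() pair generalises the readline()/seek(0) idiom to files that are not positioned at the start, at the cost of requiring a seekable stream.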
I am more than willing to propose a patch if the idea(s) are supported.
- John

Hey,
not sure how people do this, or if I missed something obvious in the
stdlib, but I often have this pattern:
starts = ('a', 'b', 'c')
somestring = 'acapulco'
for start in starts:
    if somestring.startswith(start):
        print("yeah")
So what about a startsin() method, that would iterate over a sequence:
if somestring.startsin('a', 'b', 'c'):
    print("yeah")
Implementing it in C should be faster as well
same deal with .endswith I guess
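For comparison, str.startswith() and str.endswith() already accept a tuple of candidates (since Python 2.5, as far as I know), so the loop above can be written without a new method. A minimal sketch using the same names as the example:

```python
starts = ('a', 'b', 'c')
somestring = 'acapulco'

# startswith() accepts a tuple of prefixes and is true if any of them matches.
if somestring.startswith(starts):
    print("yeah")

# endswith() accepts a tuple of suffixes in the same way.
ends_ok = somestring.endswith(('co', 'xyz'))
```

Note that the argument has to be a tuple; passing a list raises TypeError.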
Cheers
Tarek
--
Tarek Ziadé | http://ziade.org

> Hmm, not really true in my experience. Here's some actual code from my
> codebase at work:
>
> v = float(row[dat]) if row[dat] else 0.0
> d.append(float(row[t]) if row[t] else 0.0)
> gen = (float(i) if i != '.' else None for i in row[1:])
> limits = [(float(i) if i != '.' else None) for i in ln[5:15]]
> line[i] = (None if line[i] == '.' else float(line[i]))
> ls.append(float(row[i]) if row[i] else None)
> data[row['s']] = float(val) if '.' in val else int(val)
> cur.append(float(ln[f]) if ln[f] else None)
> cur.append(float(ln['DL']) if ln['DL'] else None)
> pv = float(ln['PV']) if ln['PV'] else None
> mgn = float(ln['MGN']) if ln['MGN'] else None
> f = lambda x: float(x) if x else 1
> data[sn] += float(row['PC']) if row['PC'] else 0.0, row['PCC']
> ubsc = 1 if not row['CSCALE'] else float(row['CSCALE'])
> scale = float(row['ESCALE']) if row['ESCALE'] else 1.0
> efp = float(row['FSCALE']) if row['FSCALE'] else 1.0
> convert = lambda x: float(x) if x else None
>
> In other words, this happens a lot in code where you deal with data
> from a third party that you want to convert to some neater structure
> of Python objects (in cases where a null value occurs in that data,
> which I would suggest is fairly common out there in the Real World).
> Throwing a ValueError is usually not the right thing to do here,
> because you still want to use all the other data that you got even if
> one or two values are unavailable.
>
> Converting two of the above examples:
>
> pv = float(ln['PV']) if ln['PV'] else None
> pv = float(ln['PV'], default=None)
>
> d.append(float(row[t]) if row[t] else 0.0)
> d.append(float(row[t], default=0.0))
>
This is one of the cases where I typically just use logical 'or', when I'm
expecting some false but nonzero thing (an empty string, say), which is
reasonably readable.
v = float(row[t] or 0)
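A short sketch of how the two spellings compare. The helper name to_float is hypothetical (float() itself takes no default argument), and the row contents are made up for illustration:

```python
def to_float(value, default=None):
    """Convert value to float, or return default when value is falsy.

    Hypothetical helper standing in for the proposed float(x, default=...).
    """
    return float(value) if value else default


row = {'PV': '', 'MGN': '2.5'}

print(float(row['PV'] or 0))  # 0.0
print(to_float(row['MGN']))   # 2.5
print(to_float(row['PV']))    # None
```

The 'or' idiom only works when the fallback is itself something float() accepts, so it covers the 0.0 and 1.0 defaults from the quoted list but not the None ones.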

I propose adding a basic calculator statistics module to the standard
library, similar to the sorts of functions you would get on a scientific
calculator:
    mean (average)
    variance (population and sample)
    standard deviation (population and sample)
    correlation coefficient
and similar. I am volunteering to provide, and support, this module,
written in pure Python so other implementations will be able to use it.
Simple calculator-style statistics seem to me to be a fairly obvious
"battery" to be included, more useful in practice than some functions
already available such as factorial and the hyperbolic functions.
The lack of a standard solution leads people who need basic stats to
roll their own. This seems seductively simple, as the basic stats
formulae are quite simple. Unfortunately doing it *correctly* is much
harder than it seems. Variance, in particular, is prone to serious
inaccuracies. Here is the most obvious algorithm, using the so-called
"computational formula for the variance":
def variance(data):
    # σ**2 = 1/n**2 * (n*Σ(x**2) - (Σx)**2)
    n = len(data)
    s1 = sum(x**2 for x in data)
    s2 = sum(data)
    return (n*s1 - s2**2)/(n*n)
Many stats textbooks recommend this as the best way to calculate
variance, advice which makes sense for hand calculations on a small
number of moderate-sized data points, but not for floating point.
It appears to work:
>>> data = [1, 2, 4, 5, 8]
>>> variance(data) # exact value = 6
6.0
but unfortunately it is numerically unstable. Shifting all the data
points by a constant amount shouldn't change the variance, but it does:
>>> data = [x+1e12 for x in data]
>>> variance(data)
171798691.84
Even worse, variance should never be negative:
>>> variance(data*100)
-1266637395.197952
Note that using math.fsum instead of the built-in sum does not fix the
numeric instability problem, and it adds the additional problem that it
coerces the data points to float. (If you use Decimal, this may not be
what you want.)
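For contrast, here is a minimal sketch of one commonly used alternative, the two-pass algorithm: compute the mean first, then average the squared deviations from it. This is only an illustration of a more stable approach, not necessarily what the proposed module would do, and the function name is made up for the example.

```python
def pvariance_two_pass(data):
    """Population variance via the two-pass algorithm (illustrative name)."""
    data = list(data)
    n = len(data)
    if n == 0:
        raise ValueError("variance requires at least one data point")
    mean = sum(data) / n  # first pass: the mean
    # second pass: average of the squared deviations from the mean
    return sum((x - mean) ** 2 for x in data) / n


data = [1, 2, 4, 5, 8]
shifted = [x + 1e12 for x in data]
print(pvariance_two_pass(data))           # 6.0
print(pvariance_two_pass(shifted))        # 6.0
print(pvariance_two_pass(shifted * 100))  # 6.0
```

Because the deviations are small numbers even when the data are large, the subtraction no longer cancels two huge, nearly equal quantities. Harder inputs and non-float types such as Decimal still need more care than this sketch takes.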
Here is an example of published code which suffers from exactly this
problem:
https://bitbucket.org/larsyencken/simplestats/src/c42e048a6625/src/basic.py
and here is an example on StackOverflow. Note the most popular answer
given is to use the Computational Formula, which is the wrong answer.
http://stackoverflow.com/questions/2341340/calculate-mean-and-variance-wi...
I would like to add a module to the standard library to solve these
sorts of simple stats problems the right way, once and for all.
Thoughts, comments, objections or words of encouragement are welcome.
--
Steven

I think `strtr`_ in php is also very useful when escaping something.
.. _strtr: http://jp.php.net/manual/en/function.strtr.php
For example:
.. code-block:: php

    php> = strtr("foo\\\"bar\\'baz\\\\", array("\\\\"=>"\\", '\\"'=>'"', "\\'"=>"'"));
    "foo\"bar'baz\\"

.. code-block:: python

    In [1]: "foo\\\"bar\\'baz\\\\".replace('\\"', '"').replace("\\'", "'").replace('\\\\', '\\')
    Out[1]: 'foo"bar\'baz\\'

In Python, the 'replace' method is looked up several times, and a temporary
string is created at each step.
This makes Python slower than PHP here.
The order of the replacements is also a very common source of mistakes.
.. code-block:: python

    In [4]: "foo\\\"bar\\'baz\\\\'".replace('\\\\', '\\').replace('\\"', '"').replace("\\'", "'")
    Out[4]: 'foo"bar\'baz\''

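A single-pass replacement avoids the ordering problem, because text that has already been replaced is never scanned again. Here is a minimal sketch of such a helper built on re.sub; the name strtr is borrowed from PHP for illustration only and is not an existing Python function.

```python
import re


def strtr(text, mapping):
    """Replace every key of mapping found in text, in a single pass.

    Hypothetical helper loosely modelled on PHP's strtr(): longer keys are
    tried first, and replaced text is never scanned again.
    """
    keys = sorted((k for k in mapping if k), key=len, reverse=True)
    if not keys:
        return text
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


unescape = {'\\\\': '\\', '\\"': '"', "\\'": "'"}

first = strtr("foo\\\"bar\\'baz\\\\", unescape)
second = strtr("foo\\\"bar\\'baz\\\\'", unescape)
print(repr(first))   # 'foo"bar\'baz\\'
print(repr(second))  # 'foo"bar\'baz\\\''
```

In the second call the escaped backslash before the final quote survives as a literal backslash, whereas the chained replace() calls in In [4] consume it.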
When I wrote a HandlerSocket_ client in pure Python, I used a dirty hack for speed.
http://bazaar.launchpad.net/~songofacandy/+junk/pyhandlersocket/view/head...
I believe Pythonic means simple and efficient. My code is not Pythonic at all.
.. _HandlerSocket: https://github.com/ahiguti/HandlerSocket-Plugin-for-MySQL
On Sat, Oct 1, 2011 at 12:30 AM, Tarek Ziadé <ziade.tarek(a)gmail.com> wrote:
> Hey,
>
> not sure how people do this, or if I missed something obvious in the
> stdlib, but I often have this pattern:
>
> starts = ('a', 'b', 'c')
> somestring = 'acapulco'
>
> for start in starts:
>     if somestring.startswith(start):
>         print("yeah")
>
>
> So what about a startsin() method, that would iterate over a sequence:
>
> if somestring.startsin('a', 'b', 'c'):
>     print("yeah")
>
> Implementing it in C should be faster as well
>
> same deal with .endswith I guess
>
> Cheers
> Tarek
>
> --
> Tarek Ziadé | http://ziade.org
--
INADA Naoki <songofacandy(a)gmail.com>