Read and count
Peter Otten
__peter__ at web.de
Thu Mar 10 04:33:09 EST 2016
Jussi Piitulainen wrote:
> Val Krem writes:
>
>> Hi all,
>>
>> I am new to Python (moving from R to Python) and am trying to
>> read and count the number of observations by year for each city.
>>
>>
>> The data set looks like:
>> city year x
>>
>> XC1 2001 10
>> XC1 2001 20
>> XC1 2002 20
>> XC1 2002 10
>> XC1 2002 10
>>
>> Yv2 2001 10
>> Yv2 2002 20
>> Yv2 2002 20
>> Yv2 2002 10
>> Yv2 2002 10
>>
>> The output will be:
>>
>> city
>> xc1 2001 2
>> xc1 2002 3
>> yv2 2001 1
>> yv2 2002 4
>>
>>
>> Below is my starting code:
>> count = 0
>> fo = open("dat", "r")
>> text = fo.read()
>> print("Read String is :", text)
>>
>> fo.close()
>
> Below's some of the basics that you want to study. Also look up the csv
> module in Python's standard library. You will want to learn these things
> even if you end up using some sort of third-party data-frame library (I
> don't know those but they exist).
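A minimal sketch of the csv route, assuming the columns are separated by one or more spaces (the exact layout of the original file isn't known). The name count_by_city_year and the SAMPLE text are made up for illustration, and io.StringIO stands in for the file "dat":

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the file "dat"; with a real file, the csv
# docs recommend open("dat", newline="").
SAMPLE = """city year x

XC1 2001 10
XC1 2001 20
XC1 2002 20
XC1 2002 10
XC1 2002 10

Yv2 2001 10
Yv2 2002 20
Yv2 2002 20
Yv2 2002 10
Yv2 2002 10
"""


def count_by_city_year(stream):
    """Count rows per (city, year) with csv.DictReader.

    delimiter=" " splits on spaces; skipinitialspace=True ignores the
    extra spaces that follow a delimiter, so runs of spaces act as one
    separator. DictReader takes the field names from the header line
    and skips blank lines.
    """
    reader = csv.DictReader(stream, delimiter=" ", skipinitialspace=True)
    return Counter((row["city"], row["year"]) for row in reader)


counts = count_by_city_year(io.StringIO(SAMPLE))
print(sorted(counts.items()))
```

DictReader is used here so the columns can be addressed by the names in the header line rather than by position.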
With pandas:
$ cat sample.txt
city year x
XC1 2001 10
XC1 2001 20
XC1 2002 20
XC1 2002 10
XC1 2002 10
Yv2 2001 10
Yv2 2002 20
Yv2 2002 20
Yv2 2002 10
Yv2 2002 10
$ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> table = pandas.read_csv("sample.txt", delimiter=r"\s+")
>>> table
  city  year   x
0  XC1  2001  10
1  XC1  2001  20
2  XC1  2002  20
3  XC1  2002  10
4  XC1  2002  10
5  Yv2  2001  10
6  Yv2  2002  20
7  Yv2  2002  20
8  Yv2  2002  10
9  Yv2  2002  10
[10 rows x 3 columns]
>>> table.groupby(["city", "year"])["x"].count()
city  year
XC1   2001    2
      2002    3
Yv2   2001    1
      2002    4
dtype: int64
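For readers without pandas installed, roughly the same grouped count can be sketched with the standard library's itertools.groupby. This is a substitute technique, not what pandas does internally; the hard-coded rows just repeat sample.txt as line.split() would return them:

```python
from itertools import groupby
from operator import itemgetter

# Rows of sample.txt as (city, year, x) string tuples.
rows = [
    ("XC1", "2001", "10"), ("XC1", "2001", "20"),
    ("XC1", "2002", "20"), ("XC1", "2002", "10"), ("XC1", "2002", "10"),
    ("Yv2", "2001", "10"), ("Yv2", "2002", "20"), ("Yv2", "2002", "20"),
    ("Yv2", "2002", "10"), ("Yv2", "2002", "10"),
]

# Group key: the (city, year) pair.
by_city_year = itemgetter(0, 1)

# groupby() only merges *adjacent* items with equal keys, so the rows
# must be sorted by the same key first.
counts = {
    key: sum(1 for _ in group)
    for key, group in groupby(sorted(rows, key=by_city_year), key=by_city_year)
}

for (city, year), n in counts.items():
    print(city, year, n)
```

The sort step is the price of groupby; collections.Counter needs no sorting because it looks each key up in a dictionary.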
> from collections import Counter
>
> # collections.Counter is a special dictionary type for just this
> counts = Counter()
>
> # with statement ensures closing the file
> with open("dat") as fo:
>     # file object provides lines
>     next(fo)  # skip header line
>     for line in fo:
>         # test requires non-empty string, but lines
>         # contain at least newline character so ok
>         if line.isspace(): continue
>         # .split() at whitespace, omits empty fields
>         city, year, x = line.split()
>         # collections.Counter has default 0,
>         # key is a tuple (city, year), parentheses omitted here
>         counts[city, year] += 1
>
> print("city")
> for city, year in sorted(counts):  # iterate over keys
>     print(city.lower(), year, counts[city, year], sep="\t")
>
> # Alternatively:
> # for cy, n in sorted(counts.items()):
> #     city, year = cy
> #     print(city.lower(), year, n, sep="\t")
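The same loop can be wrapped in functions so it works on any iterable of lines, not just an open file, which makes it easy to try out. A sketch only; the names count_city_years and report are hypothetical, and io.StringIO stands in for open("dat"):

```python
import io
from collections import Counter


def count_city_years(lines):
    """Count data lines per (city, year), as in the loop above.

    `lines` may be an open file or any iterable of strings; the first
    line is treated as the header and skipped, blank lines are ignored.
    """
    counts = Counter()
    lines = iter(lines)
    next(lines, None)  # skip the header; no error if the input is empty
    for line in lines:
        if not line.strip():  # blank or whitespace-only line
            continue
        city, year, x = line.split()
        counts[city, year] += 1
    return counts


def report(counts):
    """Format counts like the requested output: lower-case city, tab-separated."""
    # Unpacking the (city, year) key in the loop header replaces the
    # separate `city, year = cy` step of the alternative above.
    return ["\t".join((city.lower(), year, str(n)))
            for (city, year), n in sorted(counts.items())]


sample = io.StringIO("city year x\n\nXC1 2001 10\nXC1 2002 20\nYv2 2002 10\nYv2 2002 20\n")
for row in report(count_city_years(sample)):
    print(row)
```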