Read and count

Thu Mar 10 04:33:09 EST 2016

Jussi Piitulainen wrote:

> Val Krem writes:
> 
>> Hi all,
>>
>> I am new to Python (moving from R) and am trying to
>> read a file and count the number of observations by year for each city.
>>
>>
>> The data set looks like
>> city year  x
>>
>> XC1 2001  10
>> XC1   2001  20
>> XC1   2002   20
>> XC1   2002   10
>> XC1 2002   10
>>
>> Yv2 2001   10
>> Yv2 2002   20
>> Yv2 2002   20
>> Yv2 2002   10
>> Yv2 2002   10
>>
>> The output should be
>>
>> city
>> xc1  2001  2
>> xc1  2002  3
>> yv2  2001  1
>> yv2  2002  4
>>
>>
>> Below is my starting code
>> count=0
>> fo=open("dat", "r+")
>> str = fo.read();
>> print "Read String is : ", str
>>
>> fo.close()
> 
> Below are some of the basics that you will want to study. Also look up the csv
> module in Python's standard library. You will want to learn these things
> even if you end up using some sort of third-party data-frame library (I
> don't know those but they exist).
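
Regarding the csv module mentioned above: the sample data is separated by
runs of spaces rather than by a single delimiter, so a rough sketch (one
possible way, not the only one) is to collapse the whitespace first and
then hand the lines to csv.DictReader. SAMPLE, normalized() and
count_rows() are names made up for this illustration; SAMPLE is a
shortened in-memory stand-in for the "dat" file.

```python
import csv
import io
from collections import Counter

# Hypothetical, shortened stand-in for the "dat" file from the question.
SAMPLE = """\
city year  x
XC1 2001  10
XC1   2001  20
XC1   2002   20

Yv2 2001   10
Yv2 2002   20
"""


def normalized(lines):
    """Yield non-blank lines with each run of whitespace collapsed to one space."""
    for line in lines:
        fields = line.split()
        if fields:
            yield " ".join(fields)


def count_rows(lines):
    """Count rows per (city, year), using the header line for the field names."""
    reader = csv.DictReader(normalized(lines), delimiter=" ")
    return Counter((row["city"], row["year"]) for row in reader)


counts = count_rows(io.StringIO(SAMPLE))
for (city, year), n in sorted(counts.items()):
    print(city.lower(), year, n, sep="\t")
```

For a real file, pass the open file object instead of io.StringIO(SAMPLE).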

With pandas:

$ cat sample.txt
city year  x 
XC1 2001  10
XC1   2001  20
XC1   2002   20
XC1   2002   10
XC1 2002   10
Yv2 2001   10
Yv2 2002   20
Yv2 2002   20
Yv2 2002   10
Yv2 2002   10
$ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> table = pandas.read_csv("sample.txt", delimiter=r"\s+")
>>> table
  city  year   x
0  XC1  2001  10
1  XC1  2001  20
2  XC1  2002  20
3  XC1  2002  10
4  XC1  2002  10
5  Yv2  2001  10
6  Yv2  2002  20
7  Yv2  2002  20
8  Yv2  2002  10
9  Yv2  2002  10

[10 rows x 3 columns]
>>> table.groupby(["city", "year"])["x"].count()
city  year
XC1   2001    2
      2002    3
Yv2   2001    1
      2002    4
dtype: int64

> from collections import Counter
> 
> # collections.Counter is a special dictionary type for just this
> counts = Counter()
> 
> # with statement ensures closing the file
> with open("dat") as fo:
>     # file object provides lines
>     next(fo) # skip header line
>     for line in fo:
>         # str.isspace() is False for an empty string, but every line
>         # read here has at least a newline, so this skips blank lines
>         if line.isspace(): continue
>         # .split() at whitespace, omits empty fields
>         city, year, x = line.split()
>         # collections.Counter has default 0,
>         # key is a tuple (city, year), parentheses omitted here
>         counts[city, year] += 1
> 
> print("city")
> for city, year in sorted(counts): # iterate over keys
>     print(city.lower(), year, counts[city, year], sep = "\t")
> 
> # Alternatively:
> # for cy, n in sorted(counts.items()):
> #   city, year = cy
> #   print(city.lower(), year, n, sep = "\t")
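
To try the Counter approach without creating a file, here is a small
self-contained sketch that runs the same logic on the sample data held in
a string. The function name count_by_city_year() and the SAMPLE constant
are made up for this illustration; the blank line in SAMPLE mirrors the
one in the original posting.

```python
import io
from collections import Counter

# The sample data from the question, kept in memory instead of in "dat".
SAMPLE = """\
city year  x
XC1 2001  10
XC1   2001  20
XC1   2002   20
XC1   2002   10
XC1 2002   10

Yv2 2001   10
Yv2 2002   20
Yv2 2002   20
Yv2 2002   10
Yv2 2002   10
"""


def count_by_city_year(lines):
    """Return a Counter mapping (city, year) to the number of rows."""
    counts = Counter()
    lines = iter(lines)
    next(lines, None)  # skip the header line, if there is one
    for line in lines:
        if line.isspace():  # skip blank lines
            continue
        city, year, _x = line.split()
        counts[city, year] += 1
    return counts


counts = count_by_city_year(io.StringIO(SAMPLE))
for (city, year), n in sorted(counts.items()):
    print(city.lower(), year, n, sep="\t")

# Prints (tab-separated):
# xc1   2001   2
# xc1   2002   3
# yv2   2001   1
# yv2   2002   4
```

The counts agree with the pandas groupby result shown earlier.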