[Tutor] Counting number of occurrence of each character in a string

Wed Sep 2 08:42:05 EDT 2020

Alan Gauld via Tutor wrote:

> On 02/09/2020 11:19, Manprit Singh wrote:
> 
>>>>> a = "ABRACADABRA"
>>>>> d = {}
>>>>> for i in a:
>>             d[i] = d.get(i, 0) + 1
>> 
>> will result in
>>>>> d
>> {'A': 5, 'B': 2, 'R': 2, 'C': 1, 'D': 1}
>> 
>> Here keys are characters in the string a, and the values are the counts.
>> 
>> in an another way the same result can be achieved by replacing d[i] =
>> d.get(i, 0) + 1 with  dict .update () method,as the code written below :
>> 
>> a = "ABRACADABRA"
>>>>> d = {}
>>>>> for i in a:
>>             d.update({i : d.get(i, 0) + 1})
>> 
>> The result  again is
>>>>> d
>> {'A': 5, 'B': 2, 'R': 2, 'C': 1, 'D': 1}
>> 
>> So in this situation which way should be adopted, :
> 
> Since the get call appears in both solutions the second one does nothing
> but add complexity. Our aim as programmers is to remove unnecessary
> complexity therefore the first option is better.
> 
> However an arguably simpler solution exists using sets and the built
> in methods:
> 
>>>> a = "ABRACADABRA"
>>>> s = sorted(set(a))
>>>> s
> ['A', 'B', 'C', 'D', 'R']
>>>> d = {key:a.count(key) for key in s}
>>>> d
> {'A': 5, 'B': 2, 'C': 1, 'D': 1, 'R': 2}
>>>>
> 
> Which could be done in a one-liner as:
> 
> d2 = {k:a.count(k) for k in sorted(set(a))}

This may be acceptable if the alphabet is small, but will cause trouble 
otherwise. To illustrate, here are some timings for the pathological case 
(each character occurs just once):

$ python3 -m timeit -s 'from collections import Counter; import sys; s = 
"".join(map(chr, range(sys.maxunicode//50)))'  '{c: s.count(c) for c in 
set(s)}'
10 loops, best of 3: 955 msec per loop

$ python3 -m timeit -s 'from collections import Counter; import sys; s = 
"".join(map(chr, range(sys.maxunicode//50)))'  'd = {}
> for c in s: d[c] = d.get(c, 0) + 1'
10 loops, best of 3: 25.6 msec per loop

$ python3 -m timeit -s 'from collections import Counter; import sys; s = 
"".join(map(chr, range(sys.maxunicode//50)))'  'd = {}
for c in s: d[c] = d[c] + 1 if c in d else 1'
10 loops, best of 3: 20.5 msec per loop

$ python3 -m timeit -s 'from collections import Counter; import sys; s = 
"".join(map(chr, range(sys.maxunicode//50)))'  'Counter(s)'
100 loops, best of 3: 18.8 msec per loop

> 
> Although personally I'd say that was overly complex for 1 line.
> 
> If you are not yet familiar with the generator style of dictionary
> creation you could unwrap it to
> 
> d = {}
> for ch in s:
>    d[ch] = a.count(ch)
>