[Tutor] Counting number of occurrence of each character in a string
Peter Otten
__peter__ at web.de
Wed Sep 2 08:42:05 EDT 2020
Alan Gauld via Tutor wrote:
> On 02/09/2020 11:19, Manprit Singh wrote:
>
>>>>> a = "ABRACADABRA"
>>>>> d = {}
>>>>> for i in a:
>> d[i] = d.get(i, 0) + 1
>>
>> will result in
>>>>> d
>> {'A': 5, 'B': 2, 'R': 2, 'C': 1, 'D': 1}
>>
>> Here keys are characters in the string a, and the values are the counts.
>>
>> in an another way the same result can be achieved by replacing d[i] =
>> d.get(i, 0) + 1 with dict .update () method,as the code written below :
>>
>> a = "ABRACADABRA"
>>>>> d = {}
>>>>> for i in a:
>> d.update({i : d.get(i, 0) + 1})
>>
>> The result again is
>>>>> d
>> {'A': 5, 'B': 2, 'R': 2, 'C': 1, 'D': 1}
>>
>> So in this situation which way should be adopted, :
>
> Since the get call appears in both solutions the second one does nothing
> but add complexity. Our aim as programmers is to remove unnecessary
> complexity therefore the first option is better.
>
> However an arguably simpler solution exists using sets and the built
> in methods:
>
>>>> a = "ABRACADABRA"
>>>> s = sorted(set(a))
>>>> s
> ['A', 'B', 'C', 'D', 'R']
>>>> d = {key:a.count(key) for key in s}
>>>> d
> {'A': 5, 'B': 2, 'C': 1, 'D': 1, 'R': 2}
>>>>
>
> Which could be done in a one-liner as:
>
> d2 = {k:a.count(k) for k in sorted(set(a))}
This may be acceptable if the alphabet is small, but will cause trouble
otherwise. To illustrate, here are some timings for the pathological case
(each character occurs just once):
$ python3 -m timeit -s 'from collections import Counter; import sys; s =
"".join(map(chr, range(sys.maxunicode//50)))' '{c: s.count(c) for c in
set(s)}'
10 loops, best of 3: 955 msec per loop
$ python3 -m timeit -s 'from collections import Counter; import sys; s =
"".join(map(chr, range(sys.maxunicode//50)))' 'd = {}
> for c in s: d[c] = d.get(c, 0) + 1'
10 loops, best of 3: 25.6 msec per loop
$ python3 -m timeit -s 'from collections import Counter; import sys; s =
"".join(map(chr, range(sys.maxunicode//50)))' 'd = {}
for c in s: d[c] = d[c] + 1 if c in d else 1'
10 loops, best of 3: 20.5 msec per loop
$ python3 -m timeit -s 'from collections import Counter; import sys; s =
"".join(map(chr, range(sys.maxunicode//50)))' 'Counter(s)'
100 loops, best of 3: 18.8 msec per loop
>
> Although personally I'd say that was overly complex for 1 line.
>
> If you are not yet familiar with the generator style of dictionary
> creation you could unwrap it to
>
> d = {}
> for ch in s:
> d[ch] = a.count(ch)
>
More information about the Tutor
mailing list