[Tutor] Possible Useless Python Challenge. [huffman encoding]

Tesla Coil tescoil@irtc.net
Tue, 08 May 2001 10:20:01 -0500


On 7 May 2001, Daniel Yoo wrote:
> What Huffman realized is that we're not taking advantage
> of a certain curious property about human language: we=20
> use certain words over and over, certain letters over=20
> and over, and our language is fraught with repetitious
> repetition.  What if we don't always represent a=20
> character by 8 bits, but by something that depends
> on how frequently the letter occurs in our language?
> Instead of having every letter be represented by=20
> eight bits, we can make a "variable length" encoding.
> That is, perhaps we can use the following code:
>
>    'h' =3D=3D=3D> 0000
>    'w' =3D=3D=3D> 0001
>    'd' =3D=3D=3D> 0010
>    'e' =3D=3D=3D> 0011
>    'o' =3D=3D=3D> 01
>    'r' =3D=3D=3D> 100
>    ' ' =3D=3D=3D> 101
>    'l' =3D=3D=3D> 11

Incidentally, same principle used by an early
binary compression scheme known as Morse code:

'h' =3D=3D=3D> 0000
'w' =3D=3D=3D> 011
'd' =3D=3D=3D> 100
'e' =3D=3D=3D> 0
'o' =3D=3D=3D> 111
'r' =3D=3D=3D> 010
' ' =3D=3D=3D>
'l' =3D=3D=3D> 0100

"Plus =E7a change...," I guess.