efficient way to get a sufficient set of identifying attributes
Robin Becker
robin at reportlab.com
Thu Oct 19 12:05:57 EDT 2017
On 19/10/2017 16:42, Stefan Ram wrote:
> Robin Becker <robin at reportlab.com> writes:
>> Presumably the information in any attribute is highest
>> if the number of distinct occurrences is the same as the list length and
>> pairs of attributes are more likely to be unique, but is there some proper way
>> to go about determining what tests to use?
>
> When there is a list
>
> |>>> list = [ 'b', 'b', 'c', 'd', 'c', 'b' ]
> |>>> l = len( list )
>
> , the length of its set can be obtained:
>
> |>>> s = len( set( list ))
>
> . The entries are unique if the length of the set is the
> length of the list
>
> |>>> l == s
> |False
>
> And the ratio between the length of the set and the length
> of the list can be used to quantify the amount of repetition.
>
> |>>> s / l
> |0.5
.......
This sort of makes sense for single attributes, but it ignores the possibility of
combining attributes to make the checks more discerning.
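
As a rough sketch of the combined check (not a worked-out answer): the same
set-length ratio can be applied to tuples of attribute values, trying
combinations in order of increasing size until one distinguishes every item.
The names distinct_ratio and smallest_identifying_sets, and the sample records,
are made up for illustration; the sketch assumes the items are dicts with
hashable values and that the list is non-empty.

```python
from itertools import combinations


def distinct_ratio(records, attrs):
    """Ratio of distinct value tuples to records; 1.0 means attrs identify every record."""
    keys = {tuple(rec[a] for a in attrs) for rec in records}
    return len(keys) / len(records)


def smallest_identifying_sets(records, attrs):
    """Return every smallest attribute combination that is unique across records.

    Brute force: the number of combinations grows exponentially with len(attrs).
    """
    for size in range(1, len(attrs) + 1):
        found = [c for c in combinations(attrs, size)
                 if distinct_ratio(records, c) == 1.0]
        if found:
            return found
    return []  # at least two records agree on every attribute


# Hypothetical sample data: no single attribute is unique on its own.
records = [
    {"name": "a", "colour": "red", "size": 1},
    {"name": "a", "colour": "blue", "size": 1},
    {"name": "b", "colour": "red", "size": 2},
    {"name": "b", "colour": "blue", "size": 2},
]

for attr in ("name", "colour", "size"):
    print(attr, distinct_ratio(records, [attr]))

print(smallest_identifying_sets(records, ["name", "colour", "size"]))
```

Here each single attribute has a ratio of 0.5, while the pairs (name, colour)
and (colour, size) are both unique; (name, size) is not, because those two
attributes carry the same information in this data.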
--
Robin Becker