[Ironpython-users] Hashing a directory is magnitudes slower than in cPython

Pawel Jasinski pawel.jasinski at gmail.com
Thu Feb 27 20:16:58 CET 2014


Is there any reason not to use the code from Mono?
It looks like it supports SHA2 and RIPEMD160.
https://bugzilla.xamarin.com/show_bug.cgi?id=11703

On Thu, Feb 27, 2014 at 2:10 PM, Markus Schaber <m.schaber at codesys.com> wrote:
> Hi,
>
> From: Jeff Hardy [mailto:jdhardy at gmail.com]
>> On Thu, Feb 27, 2014 at 11:11 AM, Markus Schaber <m.schaber at codesys.com> wrote:
>> > Hi,
>> >
>> > I'm just trying to sum it up:
>> >
>> > 1) The current code:
>> >    - High memory usage.
>> >    - High load on the large object heap.
>> >    - Limited by the available amount of memory (which might be considered a
>> >      violation of the Python API).
>> >    - High CPU usage when used incrementally (quadratic in the number of
>> >      blocks added).
>> >
>> > 2) Optimizing with MemoryStream and lazy calculation:
>> >    - High memory usage.
>> >    - High load on the large object heap.
>> >    - Limited by the available amount of memory (which might be considered a
>> >      violation of the Python API).
>> >    + Optimal CPU usage when the hash is only fetched once.
>> >    ± Better than the current code, but still not optimal when the hash is
>> >      incrementally fetched several times.
>> >
>> > 3) Optimizing with jagged arrays and lazy calculation:
>> >    - High memory usage.
>> >    + Improved or no impact on the large object heap (depending on the exact
>> >      implementation).
>> >    - Limited by the available amount of memory (which might be considered a
>> >      violation of the Python API).
>> >    + Optimal CPU usage when the hash is only fetched once.
>> >    ± Better than the current code, but still not optimal when the hash is
>> >      incrementally fetched several times.
>> >
>> > 4) Using the existing .NET incremental APIs:
>> >    + Low, constant memory usage.
>> >    + No impact on the large object heap.
>> >    + No limit of data length by the amount of memory.
>> >    + Optimal CPU usage when the hash is only fetched once.
>> >    - Breaks when the hash is incrementally fetched several times (which
>> >      is likely a violation of the Python API).
>> >
>> > 5) Finding or porting a different Hash implementation in C#:
>> >    + Low, constant memory usage.
>> >    + No impact on the large object heap.
>> >    + No limit of data length by the amount of memory.
>> >    + Optimal CPU usage when the hash is only fetched once.
>> >    + Optimal CPU usage when the hash is incrementally fetched several times.
>> >
>> > I have a local prototype of 2), but I'm not sure whether that's the best
>> > way to go...
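For reference, the "Python API" that the options above are measured against is CPython's hashlib behaviour: digest() does not finalize the object, so update() may follow it, and copy() clones the running state. The check below uses only the standard hashlib module and should pass on CPython 3; it is an illustration of the contract, not code from the thread.

```python
import hashlib

# digest() does not finalize the object: more data may be fed afterwards.
h = hashlib.sha256()
h.update(b"hello ")
first = h.digest()            # digest of b"hello "
h.update(b"world")            # continuing after digest() is allowed
second = h.digest()           # digest of b"hello world"

assert first == hashlib.sha256(b"hello ").digest()
assert second == hashlib.sha256(b"hello world").digest()

# copy() clones the running state, so a shared prefix is hashed only once.
prefix = hashlib.sha256(b"common header|")
a = prefix.copy()
a.update(b"payload A")
b = prefix.copy()
b.update(b"payload B")

assert a.digest() == hashlib.sha256(b"common header|payload A").digest()
assert b.digest() == hashlib.sha256(b"common header|payload B").digest()
print("update() after digest() and copy() both behave as expected")
```

Option 4 has trouble with exactly these two patterns: a .NET HashAlgorithm is finalized once the hash is produced, so it cannot continue accepting data or be cloned mid-stream.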
>>
>> Good analysis!
>>
>> My preference would be for (4), raising an exception if .update() is called
>> after .digest(), or if .copy() is called at all. As a fallback, an extra
>> parameter to hashlib.new (etc.) could trigger (2) for cases where it's
>> needed. I can't say for sure, but I would think calling .update() after
>> .digest() would be rare, and so would .copy() (damn you, Google, for
>> shutting down Code Search). At least then the common case is fast and edge
>> cases are (usually) possible.
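A minimal Python 3 sketch of this proposal, assuming a hypothetical wrapper class `StrictHash` and a hypothetical `buffered` flag standing in for the suggested extra hashlib.new parameter (neither exists in IronPython or CPython). CPython's hashlib serves only as the underlying incremental engine here. Strict mode models option (4): the first digest() is cached and later update() or any copy() is rejected. Buffered mode models option (2): all data is kept and hashed lazily on demand.

```python
import hashlib


class StrictHash:
    """Hypothetical model of option (4) with a fallback to option (2)."""

    def __init__(self, name, data=b"", buffered=False):
        self.name = name
        self.buffered = buffered
        self._digest = None              # cached result of the last digest()
        if buffered:
            self._buffer = bytearray()   # option (2): keep every byte
        else:
            self._hasher = hashlib.new(name)  # option (4): constant memory
        self.update(data)

    def update(self, data):
        if self.buffered:
            self._buffer += data
            self._digest = None          # invalidate cache; recompute lazily
        else:
            if self._digest is not None:
                raise ValueError("update() called after digest()")
            self._hasher.update(data)

    def digest(self):
        if self._digest is None:
            if self.buffered:
                # Hash the whole buffer in one pass, only when asked.
                self._digest = hashlib.new(self.name, bytes(self._buffer)).digest()
            else:
                # Model finalize-once semantics: cache the result and
                # reject further update() calls from now on.
                self._digest = self._hasher.digest()
        return self._digest

    def hexdigest(self):
        return self.digest().hex()

    def copy(self):
        if not self.buffered:
            raise ValueError("copy() is unsupported in strict mode")
        clone = StrictHash(self.name, buffered=True)
        clone._buffer = bytearray(self._buffer)
        clone._digest = self._digest
        return clone


# Strict mode: the common case, fast and constant memory.
strict = StrictHash("sha256", b"abc")
print(strict.hexdigest())
try:
    strict.update(b"def")
except ValueError as exc:
    print("rejected:", exc)

# Buffered mode: the edge cases keep working, at the cost of memory.
loose = StrictHash("sha256", b"abc", buffered=True)
loose.digest()
loose.update(b"def")                     # allowed in buffered mode
assert loose.digest() == hashlib.sha256(b"abcdef").digest()
```

The trade-off matches the summary: strict mode never holds more than the hasher's internal state, while buffered mode grows with the input and rehashes everything whenever the digest is requested after new data.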
>
> Do you think asking on some CPython lists could give usable feedback on how
> common it is to call copy() or to continue feeding data after calling
> digest()?
>
>> > Maybe we should google for purely managed implementations of the hash
>> > algorithms with a suitable license...
>>
>> There seem to be some for MD5 and SHA1, but not for SHA2 or RIPEMD160. They
>> could be ported from the public domain Crypto++ library, but that seems like
>> a lot of work for an edge case.
>
> Yes, that seems to be a lot of work.
>
> On the other hand, it's the 100% solution. :-)
>
> Best regards
>
> Markus Schaber
>

