[Ironpython-users] Hashing a directory is magnitudes slower than in cPython

Thu Feb 27 20:48:17 CET 2014

I had asked this question before, but there was some hesitation based on
the licensing of Mono. I'm not sure if that is an issue anymore.

On Thu, Feb 27, 2014 at 12:16 PM, Pawel Jasinski
<pawel.jasinski at gmail.com> wrote:

> Is there any reason not to use code out of mono?
> It looks like it supports SHA2 and RIPEMD160.
> https://bugzilla.xamarin.com/show_bug.cgi?id=11703
>
> On Thu, Feb 27, 2014 at 2:10 PM, Markus Schaber <m.schaber at codesys.com>
> wrote:
> > Hi,
> >
> > From: Jeff Hardy [mailto:jdhardy at gmail.com]
> >> On Thu, Feb 27, 2014 at 11:11 AM, Markus Schaber <m.schaber at codesys.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > I'm just trying to sum it up:
> >> >
> >> > 1) The current code:
> >> >    - High memory usage.
> >> >    - High load on the large object heap.
> >> >    - Limited by the available amount of memory (which might be
> >> >      considered a violation of the Python API).
> >> >    - High CPU usage when used incrementally (quadratic in the number
> >> >      of blocks added).
> >> >
> >> > 2) Optimizing with MemoryStream and lazy calculation:
> >> >    - High memory usage.
> >> >    - High load on the large object heap.
> >> >    - Limited by the available amount of memory (which might be
> >> >      considered a violation of the Python API).
> >> >    + Optimal CPU usage when the hash is only fetched once.
> >> >    ± Better than current code, but still not optimal when hash is
> >> >      incrementally fetched several times.
> >> >
> >> > 3) Optimizing with jagged arrays and lazy calculation:
> >> >    - High memory usage.
> >> >    + Improved or no impact on the large object heap (depending on the
> >> >      exact implementation).
> >> >    - Limited by the available amount of memory (which might be
> >> >      considered a violation of the Python API).
> >> >    + Optimal CPU usage when the hash is only fetched once.
> >> >    ± Better than current code, but still not optimal when hash is
> >> >      incrementally fetched several times.
> >> >
> >> > 4) Using the existing .NET incremental APIs:
> >> >    + Low, constant memory usage.
> >> >    + No impact on the large object heap.
> >> >    + No limit of data length by the amount of memory.
> >> >    + Optimal CPU usage when the hash is only fetched once.
> >> >    - Breaks when hash is incrementally fetched several times (which
> >> >      likely is a violation of the Python API).
> >> >
> >> > 5) Finding or porting a different Hash implementation in C#:
> >> >    + Low, constant memory usage.
> >> >    + No impact on the large object heap.
> >> >    + No limit of data length by the amount of memory.
> >> >    + Optimal CPU usage when the hash is only fetched once.
> >> >    + Optimal CPU usage when the hash is incrementally fetched several
> >> >      times.
> >> >
> >> > I have a local prototype implemented for 2), but I'm not sure whether
> >> > that's the best way to go...
> >>
> >> Good analysis!
> >>
> >> My preference would be for (4), raising an exception if .update() is
> >> called after .digest(), or .copy() is called at all. As a fallback, an
> >> extra parameter to hashlib.new (&c) that triggers (2), for cases where
> >> it's needed - I can't say for sure, but I would think calling .update()
> >> after .digest() would be rare, and so would .copy() (damn you Google for
> >> shutting down code search). At least then the common case is fast and
> >> edge cases are (usually) possible.
> >
> > Do you think asking on some CPython lists could give usable feedback on
> > how common it is to call copy() or to continue feeding data after calling
> > digest()?
> >
> >> > Maybe we should google for purely managed implementations of the hash
> >> > codes with a sensible license...
> >>
> >> There seem to be some for MD5 and SHA1 but not SHA2 or RIPEMD160. They
> >> could be ported from the public domain Crypto++ library, but that seems
> >> like a lot of work for an edge case.
> >
> > Yes, that seems to be a lot of work.
> >
> > On the other hand, it's the 100% solution. :-)
> >
> > Best regards
> >
> > Markus Schaber
> >
> > CODESYS® a trademark of 3S-Smart Software Solutions GmbH
> >
> > Inspiring Automation Solutions
> >
> > 3S-Smart Software Solutions GmbH
> > Dipl.-Inf. Markus Schaber | Product Development Core Technology
> > Memminger Str. 151 | 87439 Kempten | Germany
> > Tel. +49-831-54031-979 | Fax +49-831-54031-50
> >
> > E-Mail: m.schaber at codesys.com | Web: http://www.codesys.com
> > CODESYS store: http://store.codesys.com
> > CODESYS forum: http://forum.codesys.com
> >
> > Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner
> > Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915
> >
> > _______________________________________________
> > Ironpython-users mailing list
> > Ironpython-users at python.org
> > https://mail.python.org/mailman/listinfo/ironpython-users
>
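
For illustration only (this is not code from the thread or from IronPython):
a minimal Python sketch of the behaviour proposed under (4), with (2) as an
opt-in fallback. The class name FinalizeOnceHash and the buffered parameter
are hypothetical, and hashlib merely stands in for a backend that can be
finalized only once. The default mode streams data and refuses update()
after digest() and any copy(); buffered mode keeps every chunk in memory and
re-hashes lazily.

```python
import hashlib


class FinalizeOnceHash(object):
    """Hypothetical wrapper; hashlib stands in for a finalize-once backend."""

    def __init__(self, name, buffered=False):
        self._name = name
        self._buffered = buffered
        # Buffered mode (option 2) keeps all data; streaming mode (option 4)
        # feeds the backend directly and keeps nothing.
        self._chunks = [] if buffered else None
        self._backend = None if buffered else hashlib.new(name)
        self._digest = None

    def update(self, data):
        if self._buffered:
            self._chunks.append(bytes(data))
            self._digest = None  # cached digest is stale now
        elif self._digest is not None:
            raise ValueError("update() after digest() needs buffered=True")
        else:
            self._backend.update(data)

    def digest(self):
        if self._digest is None:
            if self._buffered:
                # Lazy calculation: re-hash everything seen so far.
                backend = hashlib.new(self._name)
                for chunk in self._chunks:
                    backend.update(chunk)
                self._digest = backend.digest()
            else:
                self._digest = self._backend.digest()
        return self._digest

    def copy(self):
        if not self._buffered:
            raise ValueError("copy() needs buffered=True")
        clone = FinalizeOnceHash(self._name, buffered=True)
        clone._chunks = list(self._chunks)
        return clone
```

In this sketch the common case (feed data, fetch the digest once) uses
constant memory, while callers that need update() after digest() or copy()
have to ask for the buffered mode explicitly and pay the memory cost
described under (2).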

-- 
Website: http://earl-of-code.com