
We have a streaming application that processes JSON messages. Our system has been mostly stable running PyPy2.7 6.0 for a couple of years. Recently we tried upgrading to 7.3.4 and saw our processes leak memory until the container was killed. This happened with both PyPy2.7 7.3.4 and PyPy3.7 7.3.4. The only difference was that CPU usage with PyPy3.7 was almost double. :(

My question is: could this be caused by the new JSON parser introduced in 7.2? According to https://morepypy.blogspot.com/2019/10/pypys-new-json-parser.html, the parser now caches values, not just keys. Is there any way to disable the value caching? Or are there other things we can try?

Thanks,
Brad.

Hi Brad,

The new json decoder caches values, but not across json.load calls. I.e., if you don't process a single gigantic message but many tiny messages, the caching of (string) values cannot be the problem.

What *could* be the problem, in theory, is the key caching. The key cache is partially persisted across calls. Are there arbitrarily many different keys in your messages? The key cache shouldn't grow without bounds, however. What do you set your max heap size to?

To debug further, I would first try to find out whether it is indeed the json module or something else. Maybe you could try another json decoder and see whether the problem persists with that? E.g. ujson works on PyPy (slowly), or you could even use the pure Python builtin one that you get by importing json.decoder.JSONDecoder.

Cheers,
Carl Friedrich

(In reply to Brad Kish's message of 26.05.21 16:39.)
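The decoder comparison suggested above could be set up roughly as follows. This is only a sketch under stated assumptions: make_messages, run and peak_rss_kb are hypothetical helpers, the generated messages are a stand-in for the real stream, and the peak RSS reported by resource.getrusage is a coarse, Unix-only measure (kilobytes on Linux, bytes on macOS). It feeds the same messages through json.loads and through json.decoder.JSONDecoder and reports how much the peak memory grew in each case.

```python
import json
import resource
from json.decoder import JSONDecoder


def peak_rss_kb():
    # Peak resident set size of this process (Unix only).
    # Units are kilobytes on Linux and bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def make_messages(count, distinct_keys):
    # Hypothetical stand-in for the real stream: many small messages
    # whose first key cycles through `distinct_keys` different names.
    for i in range(count):
        yield json.dumps({"key_%d" % (i % distinct_keys): i, "payload": "x" * 32})


def run(decode, count=50000, distinct_keys=1000):
    # Decode every message and return the growth in peak RSS.
    before = peak_rss_kb()
    for raw in make_messages(count, distinct_keys):
        decode(raw)
    return peak_rss_kb() - before


# The decoder obtained via json.decoder.JSONDecoder, as suggested above.
pure_python_decode = JSONDecoder().decode

if __name__ == "__main__":
    print("json.loads:", run(json.loads))
    print("JSONDecoder().decode:", run(pure_python_decode))
```

Raising distinct_keys towards count approximates the "arbitrarily many different keys" case. Because the peak RSS never decreases, running each decoder in a separate process gives a cleaner comparison than running both in one.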

participants (2): Brad Kish, Carl Friedrich Bolz-Tereick