Re: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in logging module to simplify structured logs support

June 28, 2015

      2015-05-25 22:08 GMT+02:00 Andrew Barnert <abarnert@yahoo.com>:
...
On Monday, May 25, 2015 6:57 AM, Ludovic Gasc <gmludo@gmail.com> wrote:
...
2015-05-25 4:19 GMT+02:00 Steven D'Aprano <steve@pearwood.info>:
...
...
At the other extreme, there is the structlog module:
https://structlog.readthedocs.org/en/stable/
Thank you for the link, it's an interesting project, it's like "logging"
module but on steroids, some good logging ideas inside.
...
However, in fact, if I understand correctly, it's the same approach that
the previous recipe: Generate a log file with JSON content, use
logstash-forwarder to reparse the JSON content, to finally send the
structure to logstash, for the query part:
https://structlog.readthedocs.org/en/stable/standard-library.html#suggested-...
...
...
How does your change compare to those?
In the use case of structlog, drop the logstash-forwarder step to
interconnect directly Python daemon with structured log daemon.
...
Even if logstash-forwarder should be efficient, why to have an additional
step to rebuild a structure you have at the beginning ?
Sorry for the delay, I was very busy since one month.
...
You can't send a Python dictionary over the wire, or store a Python
dictionary in a database.You need to encode it to some transmission and/or
storage format; there's no way around that. And what's wrong with using
JSON as that format?
Maybe I should be more clear about my objective: I'm trying to build the
simplest architecture for logging, based on the existing python logging
features because all Python libraries use that, and with similar features
that with ELK (Elasticsearch, Logstash, Kinbana).

On the paper, the features ELK are very interesting for a sysadmin and,
with my sysadmin hat, I'm strongly agree with that.
The issue is that, based on my experience where I work, (sorry eventual
ELK-lovers on this ML), but, it's very complicated to setup, to maintain
and to keep scalable when you have a lot of logs: We passed a lot of time
to have a working ELK, and finally we dropped that because the cost of
maintenance was too important for us compare to use grep in rsyslog logs.
Maybe we aren't enough smart to maintain ELK, it's possible.
However, if we're not too smart to do that, certainly some people have the
same issue as us.
In fact, the issue shouldn't be our brains, but it was clearly a time
consuming task, and we have too much directly paid-work to take care.

Don't be wrong: I don't say that ELK doesn't work, only it's time consuming
with a high level of logs.
I'm pretty sure that a lot of people are happy with ELK, it's cool for them
;-)

It's like Oracle and PostgreSQL databases: Where with Oracle you need a
full-time DBA, with PostgreSQL: apt-get install postgresql
With this last sentence, I'm totally caricatural, but only to show where I
see an issue that should be fixed, at least for us.
(FYI, in a previous professional life, I've maintained Oracle, MySQL and
PostgreSQL servers for several clients, I know a little bit the subject).
...
From my point of view, the features in journald are enough to replace most
usages of ELK, at least for us, and, contrary to ELK, journald is already
installed in all latest Linux distributions, even in Debian Jessie. You
have almost no maintenance cost.
More importantly, when you drop logstash-forwarder, how are you intending
...
to get the messages to the upstream server? You don't want to make your log
calls synchronously wait for acknowledgement before returning. So you need
some kind of buffering. And just buffering in memory doesn't work: if your
service shuts down unexpectedly, you've lost the last batch of log messages
which would tell you why it went down (plus, if the network goes down
temporarily, your memory use becomes unbounded). You can of course buffer
to disk, but then you've just reintroduced the same need for some kind of
intermediate storage format you were trying to eliminate—and it doesn't
really solve the problem, because if your service shuts down, the last
messages won't get sent until it starts up again. So you could write a
separate simple store-and-forward daemon that either reads those file
buffers or listens on localhost UDP… but then you've just recreated
logstash-forwarder.
In the past, we used directly a local rsyslog to play this role on each VM,
connected directly with the Python daemons via a datagram UNIX socket.
See a logging config file example:
https://github.com/Eyepea/cookiecutter-API-Hour/blob/master/%7B%7Bcookiecutt...

Now, it's journald that plays this role, also via a datagram UNIX socket.
...
And even if you wanted to do all that, I don't see why you couldn't do it
all with structlog. They recommend using an already-working workflow
instead of designing a different one from scratch, but it's just a
recommendation.
You're right: Don't reinvent the wheel.
However, if I follow your argument in another context: instead of to create
AsyncIO, Guido should integrate Twisted in Python ?
As an end-user of Twisted and AsyncIO, it isn't for the pleasure or to be
fancy that we migrated from Twisted to AsyncIO ;-)
To me, the expression should be: "Don't reinvent the wheel, except if you
can provide a more efficient wheel"

Now, in the context of logging: Please let me to try another approach,
maybe I'll waste my time, or maybe I'll find an interesting gold nugget,
who knows before to dig ?
You can think I'm trying to be only different from the "ELK" standard, and
it's possible, who knows ?

If I revive this thread, it isn't to troll you, but because I'm interested
in by your opinion.
I may found a better approach that doesn't need a CPython patch and it's
more powerful.

In the source code of logging package, I've found this:
https://github.com/python/cpython/blob/master/Lib/logging/__init__.py#L269
BTW, this approach should have more promotion: I didn't know you can use a
dict to replace text in a log message, I thought only strings.

Now, instead of to use extra parameter, I use directly this feature.

For the developer, instead to write this:

logger.debug('Receive a create_or_update request from "%s" account',
account_id)

he writes this:

logger.debug('Receive a create_or_update request from "%(account_id)s"
account',
            {'request_id': request.request_id,
            'account_id': account_id,
            'aiohttp_request': request,
            'payload': payload})

With that, you can write logs as usual in your source code, and use the
handler you want.

However, if you use the systemDream handler, all metadata with your log
will be sent to journald:
https://github.com/Eyepea/systemDream/blob/master/src/systemdream/journal/ha...

The another bonus of this approach is that you can use an element of your
dict to improve your log message.
With my previous approach with extra parameter, you must pass two times the
values.
The cherry on the cake is that extra can be used for something else.
And the bonus of bonus, for the developers who already use this logging
feature, they are already journald compliant without to know.

I see no drawbacks of this approach, except that the developers who already
use this feature: he must be consistent in the key names of the dict to be
useful with journald.

I'm very interested in by your feedbacks, maybe I've missed something.

If anybody doesn't find an issue, I'll push this pattern also in the
official Python binding of journald, systemDream is only my laboratory to
experiment around systemd/journald (and secondarily, it's impossible to
setup the official Python binding of systemd/journald in a pyvenv, at least
to me).

I'll publish also a step-by-step tutorial for the new comers on my blog.

Thanks for your attention.
--
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/