2015-05-25 22:08 GMT+02:00 Andrew Barnert <abarnert@yahoo.com>:
On Monday, May 25, 2015 6:57 AM, Ludovic Gasc <gmludo@gmail.com> wrote:

>2015-05-25 4:19 GMT+02:00 Steven D'Aprano <steve@pearwood.info>:

>>At the other extreme, there is the structlog module:
>>
>>https://structlog.readthedocs.org/en/stable/
>
>Thank you for the link, it's an interesting project, it's like "logging" module but on steroids, some good logging ideas inside.


>However, in fact, if I understand correctly, it's the same approach that the previous recipe: Generate a log file with JSON content, use logstash-forwarder to reparse the JSON content, to finally send the structure to logstash, for the query part: https://structlog.readthedocs.org/en/stable/standard-library.html#suggested-configuration

>>How does your change compare to those?
>>
>
>
>In the use case of structlog, drop the logstash-forwarder step to interconnect directly Python daemon with structured log daemon.

>Even if logstash-forwarder should be efficient, why to have an additional step to rebuild a structure you have at the beginning ?


Sorry for the delay, I was very busy since one month.
 
You can't send a Python dictionary over the wire, or store a Python dictionary in a database.You need to encode it to some transmission and/or storage format; there's no way around that. And what's wrong with using JSON as that format?

Maybe I should be more clear about my objective: I'm trying to build the simplest architecture for logging, based on the existing python logging features because all Python libraries use that, and with similar features that with ELK (Elasticsearch, Logstash, Kinbana).

On the paper, the features ELK are very interesting for a sysadmin and, with my sysadmin hat, I'm strongly agree with that.
The issue is that, based on my experience where I work, (sorry eventual ELK-lovers on this ML), but, it's very complicated to setup, to maintain and to keep scalable when you have a lot of logs: We passed a lot of time to have a working ELK, and finally we dropped that because the cost of maintenance was too important for us compare to use grep in rsyslog logs.
Maybe we aren't enough smart to maintain ELK, it's possible.
However, if we're not too smart to do that, certainly some people have the same issue as us.
In fact, the issue shouldn't be our brains, but it was clearly a time consuming task, and we have too much directly paid-work to take care.

Don't be wrong: I don't say that ELK doesn't work, only it's time consuming with a high level of logs.
I'm pretty sure that a lot of people are happy with ELK, it's cool for them ;-)

It's like Oracle and PostgreSQL databases: Where with Oracle you need a full-time DBA, with PostgreSQL: apt-get install postgresql
With this last sentence, I'm totally caricatural, but only to show where I see an issue that should be fixed, at least for us.
(FYI, in a previous professional life, I've maintained Oracle, MySQL and PostgreSQL servers for several clients, I know a little bit the subject).

From my point of view, the features in journald are enough to replace most usages of ELK, at least for us, and, contrary to ELK, journald is already installed in all latest Linux distributions, even in Debian Jessie. You have almost no maintenance cost.

More importantly, when you drop logstash-forwarder, how are you intending to get the messages to the upstream server? You don't want to make your log calls synchronously wait for acknowledgement before returning. So you need some kind of buffering. And just buffering in memory doesn't work: if your service shuts down unexpectedly, you've lost the last batch of log messages which would tell you why it went down (plus, if the network goes down temporarily, your memory use becomes unbounded). You can of course buffer to disk, but then you've just reintroduced the same need for some kind of intermediate storage format you were trying to eliminate—and it doesn't really solve the problem, because if your service shuts down, the last messages won't get sent until it starts up again. So you could write a separate simple store-and-forward daemon that either reads those file buffers or listens on localhost UDP… but then you've just recreated logstash-forwarder.

In the past, we used directly a local rsyslog to play this role on each VM, connected directly with the Python daemons via a datagram UNIX socket.
See a logging config file example:
https://github.com/Eyepea/cookiecutter-API-Hour/blob/master/%7B%7Bcookiecutter.app_name%7D%7D/etc/%7B%7Bcookiecutter.app_name%7D%7D/api_hour/logging.ini#L21

Now, it's journald that plays this role, also via a datagram UNIX socket.
 
And even if you wanted to do all that, I don't see why you couldn't do it all with structlog. They recommend using an already-working workflow instead of designing a different one from scratch, but it's just a recommendation.

You're right: Don't reinvent the wheel.
However, if I follow your argument in another context: instead of to create AsyncIO, Guido should integrate Twisted in Python ?
As an end-user of Twisted and AsyncIO, it isn't for the pleasure or to be fancy that we migrated from Twisted to AsyncIO ;-)
To me, the expression should be: "Don't reinvent the wheel, except if you can provide a more efficient wheel"

Now, in the context of logging: Please let me to try another approach, maybe I'll waste my time, or maybe I'll find an interesting gold nugget, who knows before to dig ?
You can think I'm trying to be only different from the "ELK" standard, and it's possible, who knows ?

If I revive this thread, it isn't to troll you, but because I'm interested in by your opinion.
I may found a better approach that doesn't need a CPython patch and it's more powerful.

In the source code of logging package, I've found this:
https://github.com/python/cpython/blob/master/Lib/logging/__init__.py#L269
BTW, this approach should have more promotion: I didn't know you can use a dict to replace text in a log message, I thought only strings.

Now, instead of to use extra parameter, I use directly this feature.

For the developer, instead to write this:

logger.debug('Receive a create_or_update request from "%s" account', account_id)

he writes this:

logger.debug('Receive a create_or_update request from "%(account_id)s" account',
            {'request_id': request.request_id,
            'account_id': account_id,
            'aiohttp_request': request,
            'payload': payload})

With that, you can write logs as usual in your source code, and use the handler you want.

However, if you use the systemDream handler, all metadata with your log will be sent to journald:
https://github.com/Eyepea/systemDream/blob/master/src/systemdream/journal/handler.py#L79

The another bonus of this approach is that you can use an element of your dict to improve your log message.
With my previous approach with extra parameter, you must pass two times the values.
The cherry on the cake is that extra can be used for something else.
And the bonus of bonus, for the developers who already use this logging feature, they are already journald compliant without to know.

I see no drawbacks of this approach, except that the developers who already use this feature: he must be consistent in the key names of the dict to be useful with journald.

I'm very interested in by your feedbacks, maybe I've missed something.

If anybody doesn't find an issue, I'll push this pattern also in the official Python binding of journald, systemDream is only my laboratory to experiment around systemd/journald (and secondarily, it's impossible to setup the official Python binding of systemd/journald in a pyvenv, at least to me).

I'll publish also a step-by-step tutorial for the new comers on my blog.

Thanks for your attention.
--
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/