Three Ideas to Customise Celery Logging Handlers

Python logging handlers define what happens with your log messages. For instance, you might want to
write your log messages to the screen, to a file, and to a log management service such as Papertrail. In that
case, you add three logging handlers to your application's root logger:

import logging
import logging.handlers

logger = logging.getLogger()
formatter = logging.Formatter('[%(asctime)s] [%(levelname)s] [%(name)s] %(message)s [%(lineno)d]')

# StreamHandler
sh = logging.StreamHandler()
sh.setFormatter(formatter)
logger.addHandler(sh)

# FileHandler
fh = logging.FileHandler('logs.log')
fh.setFormatter(formatter)
logger.addHandler(fh)

# SysLogHandler
slh = logging.handlers.SysLogHandler(address=('logsN.papertrailapp.com', '...'))
slh.setFormatter(formatter)
logger.addHandler(slh)
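Once the handlers are attached, a single logging call fans out to all of them. As a quick sanity check, here is a minimal sketch using in-memory list handlers (hypothetical names) standing in for the real destinations:

```python
import logging

logger = logging.getLogger('fanout_demo')
logger.setLevel(logging.INFO)

# Two in-memory handlers standing in for e.g. a file and a syslog destination
class ListHandler(logging.Handler):
    def __init__(self, bucket):
        super().__init__()
        self.bucket = bucket

    def emit(self, record):
        self.bucket.append(self.format(record))

bucket_a, bucket_b = [], []
formatter = logging.Formatter('[%(levelname)s] [%(name)s] %(message)s')
for bucket in (bucket_a, bucket_b):
    handler = ListHandler(bucket)
    handler.setFormatter(formatter)
    logger.addHandler(handler)

logger.info('hello')
# each handler received the same formatted record
```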

Adding custom logging handlers in Celery (and configuring logging in Celery in general) can involve a
great deal of head-scratching. The Celery docs are a bit sparse. StackOverflow is full of contradictory answers.
And a variety of blog posts suggest rather complicated workarounds and monkeypatches.

In this post, I'll show you three alternative strategies to configure your Celery loggers and explain how and why each strategy works.
The code snippets provided are fully working stand-alone scripts. They require Python 3.6+ and Celery 4.2.0; a message broker is not necessary, as I use the filesystem as the broker (check out this blog post for more details on how to use the filesystem as a message broker).

Start the Celery worker with:

celery worker --app=app.app --concurrency=1 --loglevel=INFO

And kick off the task asynchronously:

python app.py

Celery logging 101: Not simple, but not messed up

Celery logging feels rather unpythonic and unnecessarily complicated. However, the reason is not that Celery messed up (contrary
to common rants); it is simply not as trivial as you might think. The underlying Python logging system
needs to support all the concurrency settings Celery supports: eventlet, gevent (greenlets), prefork (multiprocessing), threads and so forth.
Unfortunately, out of the box, the Python logging system does not support all these different configurations. Hence
the Celery-specific logging look and feel.

Configuration aside, Celery provides a special get_task_logger function in celery.utils.log. It returns a logger that inherits
from a special logger named celery.task, which automatically includes the task name and unique id in the logs.
The Celery docs recommend using get_task_logger.
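Because get_task_logger returns a child of the celery.task logger, any handler attached to celery.task also sees the records from per-task loggers. The hierarchy can be illustrated with plain stdlib logging (a sketch, no Celery required; the module name is made up):

```python
import logging

# get_task_logger(__name__) is roughly equivalent to fetching a child
# of the 'celery.task' logger; records propagate up the dotted hierarchy.
base = logging.getLogger('celery.task')
task_logger = logging.getLogger('celery.task.my_module')

# stdlib logging wires the parent automatically
assert task_logger.parent is base
```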

I prefer the standard getLogger approach to get the logger object. The reason is that you most likely have code that can be called from within your Celery tasks as well as from your web app. If you stick to logging.getLogger(__name__), you keep your lower-level code base
clean and clearly separated from the context your code is executed in, which is what I use in the code examples.
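For example, a lower-level module can use logging.getLogger(__name__) and stay completely agnostic of whether it runs inside a Celery worker or a web request (a sketch; the module and function names are made up):

```python
import logging

# calculations.py -- hypothetical shared module; it neither imports Celery
# nor knows about the web framework, it just logs under its own name
logger = logging.getLogger(__name__)

def add(x, y):
    result = x + y
    logger.info('Add: %s + %s = %s', x, y, result)
    return result
```

Whichever process imports this module, its records end up on the handlers configured by that process (the Celery worker or the web app).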

Strategy 1: Augment Celery logger

Celery provides an after_setup_logger signal that is triggered after Celery has set up the logger
(http://docs.celeryproject.org/en/latest/userguide/signals.html#after-setup-logger). The signal passes (among a few other
arguments) the logger object, to which you can conveniently add your custom logging handlers.

import os
import logging
import logging.handlers
from celery import Celery
from celery.signals import after_setup_logger


for f in ['./broker/out', './broker/processed']:
    if not os.path.exists(f):
        os.makedirs(f)


logger = logging.getLogger(__name__)


app = Celery('app')
app.conf.update({
    'broker_url': 'filesystem://',
    'broker_transport_options': {
        'data_folder_in': './broker/out',
        'data_folder_out': './broker/out',
        'data_folder_processed': './broker/processed'
    }})


@after_setup_logger.connect
def setup_loggers(logger, *args, **kwargs):
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    # FileHandler
    fh = logging.FileHandler('logs.log')
    fh.setFormatter(formatter)
    logger.addHandler(fh)

    # SysLogHandler
    slh = logging.handlers.SysLogHandler(address=('logsN.papertrailapp.com', '...'))
    slh.setFormatter(formatter)
    logger.addHandler(slh)


@app.task()
def add(x, y):
    result = x + y
    logger.info(f'Add: {x} + {y} = {result}')
    return result


if __name__ == '__main__':
    task = add.s(x=2, y=3).delay()
    print(f'Started task: {task}')

Strategy 2: Override Celery root logger

You can prevent Celery from configuring any loggers at all by connecting to the setup_logging signal. This allows you
to completely override the logging configuration with your own.

import os
import logging
import logging.handlers
from celery import Celery
from celery.signals import setup_logging

app = Celery('app')
app.conf.update({
    'broker_url': 'filesystem://',
    'broker_transport_options': {
        'data_folder_in': './broker/out',
        'data_folder_out': './broker/out',
        'data_folder_processed': './broker/processed'
    }})


for f in ['./broker/out', './broker/processed']:
    if not os.path.exists(f):
        os.makedirs(f)


logger = logging.getLogger(__name__)


@setup_logging.connect
def setup_loggers(*args, **kwargs):
    logger = logging.getLogger()
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    # StreamHandler
    sh = logging.StreamHandler()
    sh.setFormatter(formatter)
    logger.addHandler(sh)

    # FileHandler
    fh = logging.FileHandler('logs.log')
    fh.setFormatter(formatter)
    logger.addHandler(fh)

    # SysLogHandler
    slh = logging.handlers.SysLogHandler(address=('logsN.papertrailapp.com', '...'))
    slh.setFormatter(formatter)
    logger.addHandler(slh)

@app.task()
def add(x, y):
    result = x + y
    logger.info(f'Add: {x} + {y} = {result}')
    return result


if __name__ == '__main__':
    task = add.s(x=2, y=3).delay()
    print(f'Started task: {task}')

Strategy 3: Disable Celery logger configuration

An alternative solution is to let Celery set up its logger but not use it, and to prevent it from hijacking the root logger.
By default, Celery removes any previously configured handlers from the root logger. If you want to customise your own
logging handlers without Celery getting in your way, you can disable this behaviour by setting worker_hijack_root_logger=False
(make sure to check out the docs).
This allows you to take back control of the root logger and fall back to the standard Python logging setup.
Be careful with this approach, as you are fully responsible for making sure that your Python logging setup
is compatible with your Celery concurrency settings (eventlet, gevent (greenlets), prefork (multiprocessing), threads etc).

import os
import logging
import logging.handlers
from celery import Celery

app = Celery('app')
app.conf.update({
    'broker_url': 'filesystem://',
    'broker_transport_options': {
        'data_folder_in': './broker/out',
        'data_folder_out': './broker/out',
        'data_folder_processed': './broker/processed'
    },
    'worker_hijack_root_logger': False})


# set up folders for the filesystem message broker
for f in ['./broker/out', './broker/processed']:
    if not os.path.exists(f):
        os.makedirs(f)


formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger()

# StreamHandler
sh = logging.StreamHandler()
sh.setFormatter(formatter)
logger.addHandler(sh)

# FileHandler
fh = logging.FileHandler('logs.log')
fh.setFormatter(formatter)
logger.addHandler(fh)

# SysLogHandler
slh = logging.handlers.SysLogHandler(address=('logsN.papertrailapp.com', '...'))
slh.setFormatter(formatter)
logger.addHandler(slh)


logger = logging.getLogger(__name__)


@app.task()
def add(x, y):
    result = x + y
    logger.info(f'Add: {x} + {y} = {result}')
    return result


if __name__ == '__main__':
    task = add.s(x=2, y=3).delay()
    print(f'Started task: {task}')

Conclusion

In this blog post, you've learned about three strategies to customise your Celery loggers. They
range from fine-tuning Celery's logger after Celery has set it up, to not using the
Celery logger at all and rolling your own logging.

Unless you have a good reason not to, my recommendation is to go with the minimally invasive strategy
(Strategy 1: Augment Celery logger), as you can simply rely on Celery's logging
and do not have to reinvent the wheel.

Did you find this article valuable?

Support Bjoern Stiel by becoming a sponsor. Any amount is appreciated!