Dagster is compatible and configurable with Python's logging module. At the moment, you can access the configuration options through your dagster.yaml
file, which will apply the contained settings to any run launched from your instance. These settings include which python loggers you'd like to capture from, what log level to set your loggers to, and which handlers / formatters you'd like to use to process log messages produced from your runs.
Name | Description |
---|---|
get_dagster_logger | A function that returns a python logger that will automatically be captured by Dagster. |
By default, logs generated using the python logging module are not captured into the Dagster ecosystem. This means that they are not stored in the Dagster event log, will not be associated with any Dagster metadata (such as step key, run id, etc.), and will not show up in the default view of Dagit.
For example, imagine you have the following code:
import logging
from dagster import graph, op
@op
def ambitious_op():
my_logger = logging.getLogger("my_logger")
try:
x = 1 / 0
return x
except ZeroDivisionError:
my_logger.error("Couldn't divide by zero!")
return None
With the default behavior, because this code uses a custom python logger (instead of context.log
), this log statement will not be added as an event in the Dagster event log, and therefore won't show up in Dagit. However, sometimes it is desirable to change this default behavior, and treat these sorts of log statements identically to how Dagster treats context.log
calls.
This can be accomplished by setting the managed_python_loggers
key in your dagster.yaml file to a list of python logger names that you would like to capture:
python_logs:
managed_python_loggers:
- my_logger
- my_other_logger
Once this key is set, Dagster will treat any normal python log call from one of the listed loggers in the exact same way as a context.log
call, which means you should be able to see this log statement in Dagit:
Please note that you should generally be fairly selective about which logs you wish to capture, especially in a production context. It is possible to overload the event log storage with these events, which may cause certain pages in Dagit to take a long time to load. Consider only capturing the most critical logs, and avoid including debug information if you expect to maintain a large amount of run history.
Note: if a python_log_level
is set (see: Configuring A Python Log Level), then the loggers listed here will be set to the given level before a run is launched.
dagster.yaml
#If you would like to create a logger that is captured by Dagster without modifying your dagster.yaml
file, you can use the provided get_dagster_logger
utility function. This pattern is useful when logging from inside of nested functions, or other cases where it would be inconvenient to thread through the context parameter to enable calls to context.log
.
from dagster import get_dagster_logger, graph, op
@op
def ambitious_op():
my_logger = get_dagster_logger()
try:
x = 1 / 0
return x
except ZeroDivisionError:
my_logger.error("Couldn't divide by zero!")
return None
Note: The logging module retains global state, meaning the logger returned by this function will be identical if this function is called multiple times with the same arguments in the same process. This means that there may be unpredictable or unituitive results if you set the level of the returned python logger to different values in different parts of your code.
If you want to set a global log level for your Dagster instance, you can do this by setting the python_log_level
in your dagster.yaml file. This will set the log level of all loggers managed by Dagster. By default, this will just be the context.log
logger. If there are custom python loggers that you wish to capture, see Capturing Python Logs.
This allows you to filter out logs below a given level. For example, setting a log level of INFO
will filter out all DEBUG
level logs.
python_logs:
python_log_level: INFO
In your dagster.yaml file, you can configure handlers and formatters that will apply to the Dagster instance, so all pipeline runs will share the same logging configuration.
python_logs:
dagster_handler_config:
handlers:
myHandler:
class: logging.StreamHandler
level: INFO
stream: ext://sys.stdout
formatter: myFormatter
formatters:
myFormatter:
format: "My formatted message: %(message)s"
Handler and formatter configuration follows the dictionary config schema format in the Python logging module. Only the handlers
and formatters
dictionary keys will be accepted, as Dagster creates loggers internally.
From there, standard context.log
calls will output with your configured handlers and formatters.
Suppose we'd like to output all of our Dagster logs to a file. We can use the Python logging module's built-in logging.FileHandler
class to send log output to a disk file. We format our config YAML file by defining a new handler myHandler
to be a logging.FileHandler
object.
Optionally, we can configure a formatter to apply a custom format to our logs. Since we'd like our logs to appear with a timestamp, we define a custom formatter named timeFormatter
, attaching it to myHandler
.
python_logs:
dagster_handler_config:
handlers:
myHandler:
class: logging.FileHandler
level: INFO
filename: "my_dagster_logs.log"
mode: "a"
formatter: timeFormatter
formatters:
timeFormatter:
format: "%(asctime)s :: %(message)s"
Then, we execute the following pipeline:
@solid
def file_log_solid(context):
context.log.info("Hello world!")
@pipeline
def file_log_pipeline():
file_log_solid()
After execution, we can read the output log file my_dagster_logs.log
. As expected, the log file contains the formatted log!