The easiest way to start building a Dagster project is by using the dagster project CLI. This CLI tool generates the files and folder structure you need to get started with Dagster quickly. You can scaffold a new project from the default project skeleton, or start from one of the official Dagster examples.
To get started, you can run:
pip install dagster
dagster project scaffold --name my-dagster-project
The dagster project scaffold command generates a folder structure with a single Dagster repository and other boilerplate files, such as workspace.yaml. This gives you an empty project with everything set up, so you can start quickly.
Here's a breakdown of the files and directories that are generated:
File/Directory | Description |
---|---|
my_dagster_project/ | A Python package that contains code for your new Dagster repository. |
my_dagster_project_tests/ | A Python package that contains tests for my_dagster_project. |
workspace.yaml | A file that specifies the location of your code for Dagit and the Dagster CLI. Check out Workspaces for more details. |
README.md | A description and starter guide for your new Dagster project. |
pyproject.toml | A file that specifies package core metadata in a static, tool-agnostic way. Note: pyproject.toml was introduced in PEP 518 and is meant to replace setup.py, but we may still include a setup.py for compatibility with tools that do not use this spec. |
setup.py | A build script with Python package dependencies for your new project as a package. |
setup.cfg | An ini file that contains option defaults for setup.py commands. |
Inside the my_dagster_project/ directory, the following files and directories are generated:
File/Directory | Description |
---|---|
my_dagster_project/repository.py | A Python module that contains a RepositoryDefinition, which specifies the assets, jobs, schedules, and sensors available in your repository. |
my_dagster_project/assets/ | A Python package that contains all the Assets. It is helpful to include all assets in a subpackage so that you can use load_assets_from_package_module to load assets into your repository, rather than needing to add assets to the repository every time you define one. |
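As a quick sanity check after scaffolding, the layout described in the two tables above can be verified with a few lines of Python. This is only a sketch: the helper name missing_entries is hypothetical, the entry list is copied from the tables, and other Dagster versions may scaffold a different layout.

```python
from pathlib import Path

# Entries copied from the tables above. This list is an assumption about
# the scaffolded layout; adjust it if your Dagster version differs.
EXPECTED_ENTRIES = [
    "my_dagster_project",
    "my_dagster_project/repository.py",
    "my_dagster_project/assets",
    "my_dagster_project_tests",
    "workspace.yaml",
    "README.md",
    "pyproject.toml",
    "setup.py",
    "setup.cfg",
]


def missing_entries(project_root):
    """Return the expected files/directories that do not exist under project_root."""
    root = Path(project_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    # "my-dagster-project" is the directory name used in the scaffold command above.
    missing = missing_entries("my-dagster-project")
    print("missing:", missing if missing else "nothing")
```

An empty result means every entry listed in the tables was found under the project directory.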
Alternatively, you can start with one of the official Dagster examples.
To get started, you can run:
pip install dagster
dagster project from-example \
--name my-dagster-project \
--example project_fully_featured
The dagster project from-example command downloads one of the official Dagster examples to the current directory, so you can bootstrap your project from an officially maintained example.
For more info about the examples, visit the Dagster GitHub repository or run dagster project list-examples.
The newly generated my-dagster-project directory is a fully functioning Python package and can be installed with pip. To install the package along with its Python dependencies, run:
pip install -e ".[dev]"
The -e (--editable) flag tells pip to install your repository in "editable mode", so that local code changes apply automatically as you develop.
Then, start the Dagit web server:
dagit
Open http://localhost:3000 in your browser to see the project.
Now you can start writing assets in my_dagster_project/assets/, or define your own ops or jobs and include them in my_dagster_project/repository.py.
You can specify new Python dependencies in setup.py.
You can add tests in the my_dagster_project_tests directory and run them with pytest:
pytest my_dagster_project_tests
If you want to enable Dagster Schedules or Sensors, you will need to start a Dagster Daemon.
Start a daemon process in the same folder as your workspace.yaml file, but in a different shell or terminal:
dagster-daemon run
The $DAGSTER_HOME environment variable must be set to a directory for the daemon to work. Note: using directories within /tmp/ may cause issues. See Dagster Instance default local behavior for more details.
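The requirements above can be checked before launching the daemon. The following is a small sketch, not part of Dagster: check_dagster_home is a hypothetical helper that only encodes the two points stated here (the variable must be set to a directory, and paths under /tmp/ may cause issues), and it assumes POSIX-style paths.

```python
import os
from pathlib import Path


def check_dagster_home(environ=None):
    """Return (path, notes) for $DAGSTER_HOME; raise RuntimeError if it is unset.

    Hypothetical helper for illustration only; it is not a Dagster API.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("DAGSTER_HOME")
    if not value:
        raise RuntimeError("DAGSTER_HOME is not set")

    path = Path(value)
    notes = []
    if not path.is_dir():
        notes.append(f"{path} is not an existing directory")
    # Flag locations under /tmp, which the text above warns about.
    if path.parts[:2] == ("/", "tmp"):
        notes.append(f"{path} is under /tmp/, which may cause issues")
    return path, notes


if __name__ == "__main__":
    try:
        home, notes = check_dagster_home()
        print("DAGSTER_HOME:", home)
        for note in notes:
            print("note:", note)
    except RuntimeError as err:
        print("error:", err)
```

Passing a dictionary instead of relying on os.environ makes the check easy to exercise in tests without changing the real environment.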
Once your project is ready to move to production, check out our recommendation for Transitioning Data Pipelines from Development to Production.
Visit the Deployment Guides to learn more about how to run Dagster in production environments, such as Docker, Kubernetes, and AWS EC2.