Add a resource

We've now created our own ingest assets and combined them with assets from the dbt component to model the data. In this step, we will revisit the ingest assets and add another Dagster object to help manage our DuckDB connection. Currently, each of our assets handles its own connection separately. Adding a resource will allow us to centralize our connection to DuckDB in a single object that can be shared across all our Dagster objects.

1. Define the DuckDB resource

In Dagster, resources are reusable objects that provide external context or functionality, such as database connections, clients, or configurations. Resources can be used by a number of different Dagster objects.

First, we will need to install the dagster-duckdb library, along with pandas:

uv pip install dagster-duckdb pandas

Next, we need to scaffold our resources file with dg:

dg scaffold defs dagster.resources resources.py
Creating a component at <YOUR PATH>/etl-tutorial/src/etl_tutorial/defs/resources.py.

This adds a generic resources file to our project. The resources.py file is now part of our etl_tutorial module:

src
└── etl_tutorial
    └── defs
        └── resources.py

Within this file, we will define our DuckDBResource resource from the dagster-duckdb library. This consolidates the database connection in one place. Next, we will define a resources function decorated with @dg.definitions. This function returns a Definitions object that maps each of our resources to a specific key used throughout our Dagster project:

src/etl_tutorial/defs/resources.py
from dagster_duckdb import DuckDBResource

import dagster as dg

database_resource = DuckDBResource(database="/tmp/jaffle_platform.duckdb")


@dg.definitions
def resources():
    return dg.Definitions(
        resources={
            "duckdb": database_resource,
        }
    )

Here we are setting the key duckdb to the DuckDBResource we just defined. Now any Dagster object that uses that resource key will use the underlying resource set for our DuckDB database.

2. Add a resource to our assets

With our resource defined, we need to update our asset code. Since all of our ingestion assets rely on the import_url_to_duckdb function to execute the query, we will first update that function to use the DuckDBResource to handle query execution:

src/etl_tutorial/defs/assets.py
from dagster_duckdb import DuckDBResource


def import_url_to_duckdb(url: str, duckdb: DuckDBResource, table_name: str):
    with duckdb.get_connection() as conn:
        row_count = conn.execute(
            f"""
            create or replace table {table_name} as (
                select * from read_csv_auto('{url}')
            )
            """
        ).fetchone()
        assert row_count is not None
        row_count = row_count[0]

The DuckDBResource is designed to handle concurrent queries, so we no longer need the serialize_duckdb_query function. Now we can update the assets themselves. We will add duckdb as a parameter to each asset function. Dagster will supply the DuckDBResource registered under that key, which we can pass through to the import_url_to_duckdb function:

src/etl_tutorial/defs/assets.py
@dg.asset(
    kinds={"duckdb"},
    key=["target", "main", "raw_customers"],
)
def raw_customers(duckdb: DuckDBResource) -> None:
    import_url_to_duckdb(
        url="https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_customers.csv",
        duckdb=duckdb,
        table_name="jaffle_platform.main.raw_customers",
    )


@dg.asset(
    kinds={"duckdb"},
    key=["target", "main", "raw_orders"],
)
def raw_orders(duckdb: DuckDBResource) -> None:
    import_url_to_duckdb(
        url="https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_orders.csv",
        duckdb=duckdb,
        table_name="jaffle_platform.main.raw_orders",
    )


@dg.asset(
    kinds={"duckdb"},
    key=["target", "main", "raw_payments"],
)
def raw_payments(duckdb: DuckDBResource) -> None:
    import_url_to_duckdb(
        url="https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_payments.csv",
        duckdb=duckdb,
        table_name="jaffle_platform.main.raw_payments",
    )

We can run dg check again to ensure that the assets and resources are configured properly. If there were a mismatch between the key set in the resource and the resource key required by an asset, dg check would fail.

3. View the resource

Back in the UI, your assets will not appear any different, but you can view the resource in the Definitions tab:

  1. Click Deployment, then click "etl-tutorial" to see your deployment.
  2. Click Definitions.
  3. Navigate to the "Resources" section to view all of your resources and select "duckdb".

You can see that this resource has three uses that line up with our three assets.

Summary

We have introduced resources into our project. The etl_tutorial module should look like this:

src
└── etl_tutorial
    ├── __init__.py
    ├── definitions.py
    └── defs
        ├── __init__.py
        ├── assets.py
        ├── resources.py
        └── transform
            └── defs.yaml

Resources become more useful as projects grow more complex: they ensure that all assets use the same connection details and reduce the amount of custom code that needs to be written. We will also see that resources can be used by other Dagster objects.

Next steps

In the next step, we will ensure data quality with asset checks.