Add a resource
We've now created our own ingest assets and combined them with assets from the dbt component to model the data. In this step, we will revisit the ingest assets and add another Dagster object to help manage our DuckDB connection. Currently, each of our assets handles its own connection separately. Adding a resource will allow us to centralize our connection to DuckDB in a single object that can be shared across all our Dagster objects.
1. Define the DuckDB resource
In Dagster, resources are reusable objects that provide external context or functionality, such as database connections, clients, or configurations. Resources can be used by a number of different Dagster objects.
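To illustrate the idea of centralizing connection details in one shared object, here is a minimal sketch in plain Python. It is not Dagster code: `ToyDatabaseResource`, `write_rows`, and `count_rows` are hypothetical names, and the standard library's `sqlite3` stands in for DuckDB so the example runs without extra dependencies.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDatabaseResource:
    """Hypothetical stand-in for a resource: connection details live in one place."""

    database: str

    @contextmanager
    def get_connection(self):
        # Open a connection on demand and always close it afterwards.
        conn = sqlite3.connect(self.database)
        try:
            yield conn
            conn.commit()
        finally:
            conn.close()


def write_rows(db: ToyDatabaseResource) -> None:
    with db.get_connection() as conn:
        conn.execute("create table if not exists raw_customers (id integer)")
        conn.execute("insert into raw_customers values (1), (2)")


def count_rows(db: ToyDatabaseResource) -> int:
    with db.get_connection() as conn:
        return conn.execute("select count(*) from raw_customers").fetchone()[0]


# Both functions receive the same resource object, so the path is defined once.
with tempfile.TemporaryDirectory() as tmp:
    resource = ToyDatabaseResource(database=os.path.join(tmp, "toy.db"))
    write_rows(resource)
    row_count = count_rows(resource)

print(row_count)  # 2
```

Neither function knows where the database lives. Changing the path means changing one line where the resource is created, which is the same benefit the DuckDB resource gives our assets.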
First, we will need to install the dagster-duckdb library:
uv pip install dagster-duckdb pandas
Next, we need to scaffold our resources file with dg:
dg scaffold defs dagster.resources resources.py
Creating a component at <YOUR PATH>/etl-tutorial/src/etl_tutorial/defs/resources.py.
This adds a generic resources file to our project. The resources.py file is now part of our etl_tutorial module:
src
└── etl_tutorial
    └── defs
        └── resources.py
Within this file, we will define our DuckDBResource resource from the dagster-duckdb library. This consolidates the database connection in one place. Next, we will define a resources function decorated with @dg.definitions. This function maps each of our resources to a key that the rest of our Dagster project can reference:
from dagster_duckdb import DuckDBResource

import dagster as dg

database_resource = DuckDBResource(database="/tmp/jaffle_platform.duckdb")


@dg.definitions
def resources():
    return dg.Definitions(
        resources={
            "duckdb": database_resource,
        }
    )
Here we are setting the key duckdb to the DuckDBResource we just defined. Now any Dagster object that uses that resource key will use the underlying resource set for our DuckDB database.
2. Add a resource to our assets
With our resource defined, we need to update our asset code. Since all of our ingestion assets rely on the import_url_to_duckdb function to execute the query, we will first update that function to use the DuckDBResource to handle query execution:
from dagster_duckdb import DuckDBResource


def import_url_to_duckdb(url: str, duckdb: DuckDBResource, table_name: str):
    with duckdb.get_connection() as conn:
        row_count = conn.execute(
            f"""
            create or replace table {table_name} as (
                select * from read_csv_auto('{url}')
            )
            """
        ).fetchone()
        assert row_count is not None
        row_count = row_count[0]
The DuckDBResource is designed to handle concurrent queries, so we no longer need the serialize_duckdb_query function. Now we can update the assets themselves. We will add duckdb as a parameter to each asset function. Because the parameter name matches the duckdb resource key, Dagster supplies the DuckDBResource to each asset, and we can pass it through to the import_url_to_duckdb function:
@dg.asset(
    kinds={"duckdb"},
    key=["target", "main", "raw_customers"],
)
def raw_customers(duckdb: DuckDBResource) -> None:
    import_url_to_duckdb(
        url="https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_customers.csv",
        duckdb=duckdb,
        table_name="jaffle_platform.main.raw_customers",
    )


@dg.asset(
    kinds={"duckdb"},
    key=["target", "main", "raw_orders"],
)
def raw_orders(duckdb: DuckDBResource) -> None:
    import_url_to_duckdb(
        url="https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_orders.csv",
        duckdb=duckdb,
        table_name="jaffle_platform.main.raw_orders",
    )


@dg.asset(
    kinds={"duckdb"},
    key=["target", "main", "raw_payments"],
)
def raw_payments(duckdb: DuckDBResource) -> None:
    import_url_to_duckdb(
        url="https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_payments.csv",
        duckdb=duckdb,
        table_name="jaffle_platform.main.raw_payments",
    )
We can run dg check again to ensure that the assets and resources are configured properly. If there were a mismatch between the key set in the resource and the resource key required by an asset, dg check would fail.
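To make the key matching concrete, here is a toy model in plain Python of the kind of mismatch described above. It is only a sketch, not how Dagster or dg check is implemented: `missing_resource_keys` is a hypothetical helper, the resource value is a placeholder string, and real asset functions can also take other parameters (such as a context or upstream inputs) that this model ignores.

```python
import inspect

# Toy stand-in for the resources mapping defined in resources.py.
# The value is a placeholder string here, not a real DuckDBResource.
resources = {"duckdb": "placeholder for DuckDBResource"}


def raw_customers(duckdb) -> None:
    """Parameter name matches the "duckdb" resource key."""


def raw_orders(database) -> None:
    """Parameter name does not match any resource key."""


def missing_resource_keys(fn, available: dict) -> list[str]:
    """Return the parameter names of fn that have no matching resource key.

    Hypothetical helper for illustration; not part of Dagster or dg.
    """
    params = inspect.signature(fn).parameters
    return [name for name in params if name not in available]


print(missing_resource_keys(raw_customers, resources))  # []
print(missing_resource_keys(raw_orders, resources))     # ['database']
```

In this model, renaming the asset parameter from duckdb to database leaves it without a matching key, which is the sort of misconfiguration that should be caught before any asset runs.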
3. Viewing the resource
Back in the UI, your assets will not appear any different, but you can view the resource in the Definitions tab:
- Click Deployment, then click "etl-tutorial" to see your deployment.
- Click Definitions.
- Navigate to the "Resources" section to view all of your resources and select "duckdb".
You can see that this resource has three uses that line up with our three assets.
Summary
We have introduced resources into our project. The etl_tutorial module should look like this:
src
└── etl_tutorial
    ├── __init__.py
    ├── definitions.py
    └── defs
        ├── __init__.py
        ├── assets.py
        ├── resources.py
        └── transform
            └── defs.yaml
Resources become more helpful as projects grow more complex: they ensure that all assets use the same connection details and reduce the amount of custom code that needs to be written. We will also see that resources can be used by other Dagster objects.
Next steps
In the next step, we will ensure data quality with asset checks.