Memento DBT
Introduction to dbt (Data Build Tool)¶
What is dbt? - dbt is a command-line tool that enables data analysts and engineers to transform data in their warehouse more effectively. - It helps manage data transformations using SQL and software engineering best practices like modularity, testing, and documentation. - Commonly used with data warehouses like Snowflake, BigQuery, Redshift, and Databricks.
Prerequisites¶
Before starting, you’ll need: - Python (3.7+) - Access to a supported data warehouse (e.g., Snowflake, BigQuery) - Basic knowledge of SQL and command-line interface - Docker (for containerized dbt setup, optional)
Setting Up dbt with Docker¶
Why use Docker? - Isolates dependencies - Ensures reproducibility - Easy setup for teams Run dbt with Docker:
docker run -it --rm \
--name dbt \
-v $(pwd):/usr/app \
-w /usr/app \
ghcr.io/dbt-labs/dbt-core:1.7.0 debug
Creating a dbt Project¶
Initialize a new project:
dbt init my_project
Configure Your Connection¶
Edit profiles.yml
:
- Located at ~/.dbt/profiles.yml
- Configure according to your warehouse (example for Snowflake):
my_project:
target: dev
outputs:
dev:
type: snowflake
account: your_account
user: your_user
password: your_password
role: your_role
database: your_database
warehouse: your_warehouse
schema: your_schema
Building Your First Model¶
Create a SQL file in models/
:
- Example: models/my_first_model.sql
SELECT *
FROM {{ ref('raw_customers') }}
WHERE is_active = TRUE
ref()
is a dbt macro that manages dependencies between models
Run dbt models:
dbt run
Testing Your Models¶
Create a test in models/tests/
:
version: 2
models:
- name: my_first_model
tests:
- unique:
column_name: id
- not_null:
column_name: id
dbt test
Documenting Models¶
Add descriptions in YAML:
models:
- name: my_first_model
description: "This model filters active customers."
columns:
- name: id
description: "Unique identifier for each customer"
dbt docs generate
dbt docs serve
Using Sources¶
Define raw tables as sources:
version: 2
sources:
- name: raw
database: your_database
schema: raw_schema
tables:
- name: customers
SELECT *
FROM {{ source('raw', 'customers') }}
Deployment & Scheduling¶
Use dbt Cloud or orchestrators like Airflow or Prefect to: - Schedule dbt runs - Monitor job health - Handle dependencies
Conclusion¶
Key Takeaways:
- dbt transforms data using modular SQL models and best practices
- Use ref()
and source()
for maintainability
- Automate testing and documentation
- Docker helps keep dbt environments reproducible
Next Steps:
- Explore dbt packages: https://hub.getdbt.com
- Learn about snapshots and incremental models
- Connect dbt with your BI tools
Last update : 2025-05-04T19:26:13Z