Back to tutorials

dbt DAG: Definition, Usage, and Examples

Overview of dbt: dbt is a transformation tool that allows data analysts and engineers to transform and model data in the data warehouse. It's a popular tool in the modern analytics engineering workflow.

DAG in Analytics: A Directed Acyclic Graph (DAG) is a type of graph where nodes represent tasks and edges represent dependencies between these tasks. In the context of dbt, these tasks are often data models, which are saved as SQL statements and result in data transformations.

Basics of DAG

What is a DAG?: When people refer to "the DAG" in the context of dbt, they usually mean the graph with directionally related nodes. The nodes can flow left to right and even back in some cases, as long as they don't form a closed loop. A closed loop would mean there's a circular dependency and the system will not work.

Application: In businesses, DAGs ensure that data processes run in the correct sequence. For instance, sales data might need to be cleaned and transformed before it's merged with inventory data.

DAGs in dbt

dbt's Lineage Graph: This is a visual representation of your dbt project's DAG. It showcases how different data models relate to each other.

DAGs and Data Models: In dbt, each model represents a SQL transformation. The DAG ensures models run in the correct order based on their dependencies.

Unpacking Relationships with dbt dag

Upstream and Downstream Models:

  • Upstream models are the ones your current model depends on.
  • Downstream models are those that depend on your current model.

dbt Lineage Graph: In this graph, upstream models are to the left, and downstream models are to the right. The arrows indicate the flow of data.

Example: Consider a business with sales and inventory data. The sales_cleaned model (which cleans raw sales data) might be upstream, and a sales_inventory_merged model (which merges sales and inventory data) would be downstream.

Auditing Projects with a dbt DAG

  • Unique DAGs: Every business has unique data needs, leading to unique DAGs.
  • Diagnosing Inefficiencies: If a model takes too long to run, it might be due to expensive joins or complex logic. The DAG can help identify such bottlenecks.

Modular Data Modeling in dbt

Traditional vs. Modular: Traditional modeling often pulls directly from raw sources. Modular modeling, recommended for scalability, uses intermediary stages.

Best Practices:

  • Staging Layers: Start with a staging model to clean raw data. For instance, stg_sales for raw sales data.
  • Intermediate Layers: Transform staged data further, like calculating total sales in int_total_sales.
  • Mart Layers: Final transformations, like merging sales and inventory in mart_sales_inventory.

Advanced Features in dbt related to DAG

  • dbt Project Evaluator Package: This tool audits your dbt project, suggesting improvements.
  • dbt Docs: A feature that visualizes your DAG, helping you understand data flow and dependencies.
  • Exposures in dbt: This newer feature lets you define downstream uses of your data models, like dashboards or machine learning pipelines, enhancing transparency.

Practical Exercise: Building Your Own dbt DAG

Setting up: Install dbt and initiate a new project.

Creating Models

Below is the model code for three models that are all connected in the DAG.

Raw sales staging model:

-- stg__sales.sql

select 
  date(created_at) as date,
  to_double(amount/100.00) as amount

from {{ source('raw', 'sales') }};

Total sales intermediate model:

-- int__total_sales.sql

select 
  date, 
  sum(amount) as total_sales 

from {{ ref('stg__sales') }} 
group by date;

Sales inventory mart:

-- mart__sales_inventory.sql

select 
  s.date, 
  s.total_sales, 
  i.total_inventory

from  {{ ref('int__total_sales') }}  s

  inner join {{ ref('stg__inventory') }} i 
    on s.date = i.date;
  • Visualizing the DAG: Run dbt run to execute the models, then dbt docs serve to view the Lineage Graph on the hosted docs page. For this example, the DAG would look like this:

simiple dbt DAG

Conclusion

  • Recap: DAGs in dbt ensure data transformations run in the correct order, providing a visual representation of data flow and dependencies.
  • Next Steps: Explore more advanced dbt features and keep refining your DAG for efficiency.

Additional Resources

database icon
Unified workspace for your dbt workflow
Forget about the painful parts of dbt development, focus on what matters the most - data analysis