DAG | dbt Developer Hub (2024)

A DAG is a Directed Acyclic Graph, a type of graph whose nodes are directionally related to each other and don’t form a directional closed loop. In the practice of analytics engineering, DAGs are often used to visually represent the relationships between your data models.
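To make the definition concrete, here is a minimal Python sketch using the standard library's `graphlib` module. The model names (`raw_orders`, `stg_orders`, and so on) are hypothetical, and the sketch only illustrates the concept; it is not a description of how dbt builds or validates its own graph.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical project: each key is a model, each value is the set of
# models it depends on (its direct upstream parents).
dependencies = {
    "raw_orders": set(),
    "raw_customers": set(),
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "customer_orders": {"stg_orders", "stg_customers"},
}


def build_order(deps):
    """Return one valid build order, or raise CycleError if deps has a loop."""
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
# Every model appears after all of the models it depends on.
print(order)

# Adding an edge that points back upstream closes a loop, so the graph
# is no longer acyclic and no valid build order exists.
cyclic = dict(dependencies, raw_orders={"customer_orders"})
try:
    build_order(cyclic)
    has_cycle = False
except CycleError:
    has_cycle = True
print("cycle detected:", has_cycle)
```

Because the graph is directed and acyclic, a build order always exists; the moment a loop is introduced, the same call fails.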

While the concept of a DAG originated in mathematics and gained popularity in computational work, DAGs have found a home in the modern data world. They offer a great way to visualize data pipelines and lineage, and they offer an easy way to understand dependencies between data models.

DAGs are an effective tool to help you understand relationships between your data models and areas of improvement for your overall data transformations.

Unpacking relationships and data lineage

Can you look at one of your data models today and quickly identify all the upstream and downstream models? If you can’t, that’s probably a good sign to start building a DAG or looking at the one you already have.

Upstream or downstream?

How do you know if a model is upstream or downstream from the model you’re currently looking at? Upstream models are models that must be built before the current model. In simple terms, the current model depends on upstream models in order to exist. Downstream models are those that depend on the outputs of your current model. In a visual DAG, such as the dbt Lineage Graph, upstream models are to the left of your selected model and downstream models are to the right. If you’re ever confused, follow the arrows that give a DAG its directedness to understand the direction of movement.
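The upstream and downstream sets described above can be computed with a simple graph traversal. The following is a sketch in Python with hypothetical model names and edges; it assumes nothing about dbt's internals.

```python
from collections import deque

# Hypothetical edges, written as (upstream, downstream) pairs.
edges = [
    ("raw_orders", "stg_orders"),
    ("raw_customers", "stg_customers"),
    ("stg_orders", "customer_orders"),
    ("stg_customers", "customer_orders"),
    ("customer_orders", "customer_report"),
]

# Index the edges in both directions.
children = {}
parents = {}
for up, down in edges:
    children.setdefault(up, set()).add(down)
    parents.setdefault(down, set()).add(up)


def reachable(start, neighbors):
    """Breadth-first search: every node reachable from start via neighbors."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def upstream(model):
    """All models the given model depends on, directly or indirectly."""
    return reachable(model, parents)


def downstream(model):
    """All models that depend on the given model, directly or indirectly."""
    return reachable(model, children)


print(sorted(upstream("customer_orders")))
print(sorted(downstream("stg_orders")))
```

Following the arrows backwards gives the upstream set, and following them forwards gives the downstream set.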

One of the great things about DAGs is that they are visual. You can clearly identify the nodes that connect to each other and follow the lines of directions. When looking at a DAG, you should be able to identify where your data sources are going and where that data is potentially being referenced.

Take this mini-DAG as an example:

Auditing projects

A potentially bold statement, but there is no such thing as a perfect DAG. DAGs are special in part because they are unique to your business, data, and data models. There’s usually room for improvement, whether that means making a CTE into its own view or performing a join earlier upstream, and your DAG can be an effective way to diagnose inefficient data models and relationships.

You can also use your DAG to help identify bottlenecks: long-running data models that severely impact the performance of your data pipeline. Bottlenecks can happen for multiple reasons:

Expensive joins
Extensive filtering or use of window functions
Complex logic stored in views
Good old large volumes of data

...to name just a few. Understanding the factors impacting model performance can help you decide on refactoring approaches, changing model materializations, replacing multiple joins with surrogate keys, or other methods.
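One way to reason about bottlenecks is to combine the DAG with per-model run times and look for the slowest chain of dependent models (the critical path). The sketch below is in Python; the model names and timings are invented for illustration, and in practice you would substitute run times from your own logs.

```python
from graphlib import TopologicalSorter

# Hypothetical run times in seconds.
run_seconds = {
    "stg_orders": 20,
    "stg_customers": 5,
    "int_order_items": 300,
    "customer_orders": 40,
    "customer_report": 10,
}

# Each model maps to its direct upstream parents (hypothetical project).
parents = {
    "stg_orders": set(),
    "stg_customers": set(),
    "int_order_items": {"stg_orders"},
    "customer_orders": {"int_order_items", "stg_customers"},
    "customer_report": {"customer_orders"},
}


def critical_path(parents, run_seconds):
    """Return (total_seconds, path) for the slowest chain of dependent models."""
    finish = {}          # earliest finish time, assuming unlimited parallelism
    slowest_parent = {}  # the parent each model has to wait for the longest
    for model in TopologicalSorter(parents).static_order():
        best = max(parents[model], key=lambda p: finish[p], default=None)
        slowest_parent[model] = best
        finish[model] = run_seconds[model] + (finish[best] if best else 0)
    # Walk back from the model that finishes last.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = slowest_parent[node]
    return total, path[::-1]


total, path = critical_path(parents, run_seconds)
print(total, path)
slowest_model = max(path, key=run_seconds.get)
print("largest single contributor:", slowest_model)
```

The model that contributes the most time on the critical path is a natural first candidate for refactoring or a different materialization.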

A bad DAG, one that follows non-modular data modeling techniques

Modular data modeling best practices

See the DAG above? It follows a more traditional approach to data modeling where new data models are often built from raw sources instead of relying on intermediary and reusable data models. This type of project does not scale with team or data growth. As a result, analytics engineers tend to aim for DAGs that do not look like this.
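As a rough illustration of how this pattern can be spotted programmatically, the Python sketch below flags models that read raw sources directly. The `raw.` and `stg_` prefixes, and the idea that only staging models should touch raw sources, are naming conventions assumed for this example rather than anything dbt enforces.

```python
# Hypothetical project: each model maps to the nodes it selects from.
# Names starting with "raw." stand for raw sources and names starting
# with "stg_" for staging models; both prefixes are assumptions.
selects_from = {
    "stg_orders": {"raw.orders"},
    "stg_customers": {"raw.customers"},
    "customer_orders": {"stg_orders", "stg_customers"},
    "revenue_report": {"raw.orders", "raw.customers"},
    "churn_report": {"raw.customers", "customer_orders"},
}


def models_bypassing_staging(selects_from):
    """Map each non-staging model to the raw sources it reads directly."""
    flagged = {}
    for model, inputs in selects_from.items():
        if model.startswith("stg_"):
            continue  # staging models are expected to read raw sources
        raw_inputs = sorted(i for i in inputs if i.startswith("raw."))
        if raw_inputs:
            flagged[model] = raw_inputs
    return flagged


flagged = models_bypassing_staging(selects_from)
for model, raw_inputs in flagged.items():
    print(f"{model} reads raw sources directly: {raw_inputs}")
```

Models that show up in the result are candidates for rewiring onto shared intermediary models.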

dbt and DAGs

The marketing team at dbt Labs would be upset with us if we told you we think dbt actually stood for “dag build tool,” but one of the key elements of dbt is its ability to generate documentation and infer relationships between models. And one of the hallmark features of dbt Docs is the Lineage Graph (DAG) of your dbt project.

Whether you’re using dbt Core or dbt Cloud, dbt Docs and the Lineage Graph are available to all dbt developers. The Lineage Graph in dbt Docs can show a model or source’s entire lineage, all within a visual frame. When you click into a model, you can view the Lineage Graph and adjust selectors to only show certain models within the DAG. Analyzing the DAG here is a great way to diagnose potential inefficiencies or lack of modularity in your dbt project.

The Lineage Graph in dbt Docs

The DAG is also available in the dbt Cloud IDE, so you and your team can refer to your lineage while you build your models.

Leverage exposures

One of the newer features of dbt is exposures, which allow you to define, within your dbt project, the downstream uses of your data models that live outside of it. What does this mean? You can add key dashboards, machine learning or data science pipelines, reverse ETL syncs, or other downstream use cases to your dbt project’s DAG.

This level of interconnectivity can help boost data governance (who has access to and who owns this data) and transparency (what are the data sources and models affecting your key reports).

A directed acyclic graph (DAG) is a visual representation of your data models and their connections to each other. The key properties of a DAG are that its nodes (sources, models, and exposures) are directionally linked and don’t form closed loops. Overall, DAGs are an effective tool for understanding data lineage, dependencies, and areas of improvement in your data models.

Get started with dbt today to start building your own DAG!

FAQs

What are the four generic tests that dbt ships with?

There are four generic data tests that are available out of the box, for everyone using dbt.

not_null
unique
accepted_values
relationships

What file type is used for specifying which generic tests to run by model and column?

Generic tests are specified in a .yml file, within the context of a model or a column. A test can apply to one specific column or to the whole model. For example, a test for a whole model could compare two columns with each other.

Does dbt build run snapshots?

Yes. The dbt build command will run models, test tests, and snapshot snapshots.

How do you use a dbt model?

A dbt model is a representation of a table or view in the data model. To write a model, you use a SQL SELECT statement, in which you can use CTEs (Common Table Expressions) and apply transformations using SQL.

What is the difference between dbt run and dbt test?

dbt run executes the models you defined in your project. dbt test executes the tests you defined for your project. dbt build does both: it builds and tests your selected resources, such as models, seeds, snapshots, and tests.

What is the difference between constraints and tests in dbt?

Constraints depend on platform-specific support, while tests are more flexible: you can test anything, as long as you can write a query for it. A failed constraint prevents the table from being materialized, whereas tests run after the model has already been materialized.

What is the unique key test in dbt?

The unique test operates by examining each value in the targeted column(s) of a model (such as a table or view) to verify that no two rows have the same value in that column(s). This is essential for fields that are expected to uniquely identify each record, such as order IDs, customer IDs, or any composite keys.
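The logic of a uniqueness check can be sketched outside of SQL as well. The Python example below is only an analogy: dbt's tests are written in SQL and run against your warehouse, and the rows and column names here are made up.

```python
from collections import Counter


def non_unique_values(rows, column):
    """Return the values of column that appear in more than one row."""
    counts = Counter(row[column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical rows standing in for a model's output.
orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": 11},
    {"order_id": 2, "customer_id": 12},
]

failures = non_unique_values(orders, "order_id")
print(failures)  # an empty list would mean the check passes
```

As with a dbt test, the check passes when nothing is returned and fails when any offending values come back.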

What is the empty flag in dbt?

The --empty flag, available as of dbt 1.8, allows dbt to perform schema-only dry runs without processing large datasets, which makes it useful for efficient CI/CD workflows.

What is the difference between dbt compile and dbt run?

dbt compile is similar to dbt run, except that it only generates the compiled SQL for your models and doesn't execute it to materialize them in the warehouse.

Does dbt build include seed?

Yes. dbt build loads seeds in addition to running models, tests, and snapshots. Seeds are CSV files in your dbt project that let you load static data into your warehouse.

What programming language does dbt use?

dbt uses SQL as its core language for transformations, so you need to be proficient in writing SQL SELECT statements.

What is a generic test in dbt?

A generic dbt test is configured in a YAML file and references a macro that contains the SQL logic. This setup allows for greater flexibility and reuse. A dbt test macro typically contains a select statement that returns the records that don't pass the test.
