In the ever-evolving landscape of data engineering, organizations are continuously seeking more efficient, scalable, and collaborative ways to manage their data transformation pipelines. For years, graphical user interface (GUI)-based ETL tools like Talend have been a popular choice for their visual workflow design. However, a modern, code-centric approach using tools like dbt (data build tool) is rapidly gaining traction, especially for cloud data warehouses like Google BigQuery.
This article will walk you through a simple data aggregation task, first demonstrating how to build it as a Talend job and then showing the equivalent, and arguably more streamlined, process in dbt. We will use a common business scenario: calculating the total sales amount per product category from a sales data table and loading the aggregated results into a BigQuery table.
The Use Case: Aggregating Sales Data
Imagine you have a table in BigQuery named `raw_sales` with the following columns: `order_id`, `product_category`, `quantity`, and `price`. Your goal is to create a new table, `agg_sales_by_category`, that summarizes the total sales amount for each product category.
Here's a small sample of our source data in the `raw_sales` table (values are illustrative):

| order_id | product_category | quantity | price |
| --- | --- | --- | --- |
| 1001 | Electronics | 2 | 300.00 |
| 1002 | Clothing | 3 | 20.00 |
| 1003 | Electronics | 1 | 150.00 |
| 1004 | Home | 5 | 10.00 |
Our desired output in the `agg_sales_by_category` table would be:

| product_category | total_sales |
| --- | --- |
| Electronics | 750.00 |
| Clothing | 60.00 |
| Home | 50.00 |
The Talend Approach: A Visual ETL Job
Talend allows you to build data pipelines visually by connecting components on a canvas. For our use case, the job would look like this:
Talend Job Flow: `tBigQueryInput` → `tMap` → `tAggregateRow` → `tBigQueryOutput`
1. Reading Data from BigQuery (`tBigQueryInput`)
This component is configured to connect to your Google Cloud project and read data from the `raw_sales` table. You would specify your project ID, dataset name, and table name. The component's schema is defined to match the columns of the `raw_sales` table.
2. Calculating Total Price (`tMap`)
The `tMap` component is a powerful tool for row-level transformations. In this step, we'll add a new column called `sales_amount` by multiplying `quantity` and `price` for each row.
Input: `order_id`, `product_category`, `quantity`, `price`
Output: `product_category`, `sales_amount` (Expression: `row1.quantity * row1.price`)
3. Aggregating the Data (`tAggregateRow`)
This is where the core aggregation logic resides. We will group the data by `product_category` and calculate the sum of `sales_amount`.
Group by: `product_category`
Operations: output column `total_sales`, function `sum`, input column `sales_amount`
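Taken together, steps 2 and 3 compute what a single SQL statement would express in BigQuery (table reference left unqualified for brevity):

```sql
-- Equivalent of the tMap + tAggregateRow steps combined
select
    product_category,
    sum(quantity * price) as total_sales
from raw_sales
group by product_category
```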
4. Writing to BigQuery (`tBigQueryOutput`)
Finally, this component writes the aggregated data to our target table, `agg_sales_by_category`, in BigQuery. You'll configure the connection details, dataset, and table name. You can also specify the write disposition, such as `CREATE_IF_NEEDED` combined with `WRITE_TRUNCATE`, to create the table if it doesn't exist and overwrite its contents on each run.
While this visual approach can be intuitive for simple pipelines, it can become cumbersome to manage, version control, and debug as the logic grows in complexity.
The dbt Approach: SQL-based and Git-friendly
dbt allows you to transform data in your warehouse using SQL. It brings software engineering best practices like version control, testing, and documentation to your analytics code.
Here's how you would achieve the same aggregation in a dbt project:
dbt Project Structure:
A typical dbt project has a well-defined folder structure. For our use case, we would create a new SQL model file.
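A minimal layout might look like this (the project name and `marts` subfolder are illustrative):

```
my_dbt_project/
├── dbt_project.yml
└── models/
    └── marts/
        └── agg_sales_by_category.sql
```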
The dbt Model (`agg_sales_by_category.sql`):
This SQL file contains the logic for our transformation. dbt will execute this SQL against your BigQuery database and materialize the result as a new table (or view).
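A minimal sketch of what the model file might contain, assuming `raw_sales` is declared as a dbt source named `sales` (that source name is an assumption; you could also use `ref()` or a fully qualified table name):

```sql
-- models/marts/agg_sales_by_category.sql
-- Rebuilt as a table on each run, mirroring WRITE_TRUNCATE in the Talend job
{{ config(materialized='table') }}

select
    product_category,
    sum(quantity * price) as total_sales
from {{ source('sales', 'raw_sales') }}
group by product_category
```

The `materialized='table'` config tells dbt to drop and recreate the table on every run, which matches the overwrite behavior of the Talend job's write disposition.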
Running the dbt Job:
To run this transformation, you would simply execute the following command in your terminal:
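The command is `dbt run`; you can optionally scope it to a single model with the `--select` flag:

```
# Build every model in the project
dbt run

# Or build only this model
dbt run --select agg_sales_by_category
```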
dbt connects to your BigQuery project (configured in a `profiles.yml` file), runs the SQL query, and creates the `agg_sales_by_category` table in the schema specified in your `dbt_project.yml`.
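For reference, a minimal BigQuery `profiles.yml` might look like this (a sketch; the profile name, GCP project, and dataset are placeholders, and `method: oauth` assumes application-default credentials are already set up):

```yaml
my_dbt_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: analytics
      threads: 4
```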
The Verdict: Why Modern Data Teams are Flocking to dbt
While Talend has been a stalwart in the ETL world, the shift towards cloud-native data warehouses like BigQuery has highlighted the advantages of a code-first, in-warehouse transformation tool like dbt.
If your organization is currently using a traditional ETL tool like Talend to feed your BigQuery warehouse, it's worth exploring how a migration to dbt can streamline your workflows, improve your data quality, and empower your team to move faster and more reliably. The transition from clicking to coding might seem daunting, but the long-term benefits in terms of efficiency, scalability, and collaboration are well worth the investment.
Reach out to us at sales@warehows.io if you have any Talend migration requirements.