🏭 PHASE 4 · PRODUCTION & PUBLISHING 🟡 Intermediate MODULE 14

Tableau Prep Builder

⏱️ 50 min read

🔧 7 Node Types

❓ 5 Quiz Questions

🔄 ETL Flows

Real-world data is almost never analysis-ready. It arrives from multiple systems in inconsistent formats, with wrong data types, merged columns, duplicate rows, and mismatched naming conventions. Tableau Prep Builder is a dedicated ETL (Extract, Transform, Load) tool that lets you visually build data preparation flows — cleaning, reshaping, and combining data before it reaches Tableau Desktop. Mastering Prep makes you dramatically more self-sufficient as an analyst.

Module 14 of 17 · Phase 4 Progress 82%

🆚 Section 1

Tableau Prep vs Tableau Desktop — What's the Difference?

Many beginners confuse these two tools. They serve completely different purposes in the analytics pipeline and are typically used in sequence, not interchangeably.

Aspect	Tableau Prep Builder	Tableau Desktop
Purpose	Data preparation (ETL)	Data visualization & analysis
Output	.hyper extract file or published data source	Workbook (.twb, .twbx) with dashboards
Primary User	Data engineers, advanced analysts	All analysts and business users
Strength	Complex multi-step data wrangling	Interactive charts and dashboards
Calculates?	Yes — row-level calculations in flows	Yes — aggregations and LODs in views
Visualizes?	No — flow diagram only, data preview only	Yes — full visualization engine
Scheduling	Yes — run flows automatically via Server	No — dashboards refresh, don't "run"
File Format	.tfl (flow file)	.twb / .twbx (workbook)

🔄

The Typical Pipeline

In a professional environment, the data pipeline typically goes: Source Systems → Tableau Prep → .hyper Extract → Tableau Desktop → Published Dashboard → Business Users. Prep handles the messy transformation work so that Desktop only deals with clean, well-structured data — keeping workbook performance high and calculations simple.

🔌 Section 2

Connecting Sources in Tableau Prep

Tableau Prep connects to the same data sources as Tableau Desktop — files, databases, cloud services. But in Prep, connections are just the starting point of a flow, not the end goal. You build a directed graph of steps that progressively transform the data.

Starting a New Flow

Open Tableau Prep Builder

Tableau Prep Builder is a separate application from Tableau Desktop. On launch, you see the home screen. Click Connect to Data to begin.

Add Your First Input Step

Choose your data source (Excel, CSV, Database, etc.). In Prep, each data source becomes an Input node — shown as a colored circle at the left of the flow canvas. The Input node shows a data preview panel at the bottom when selected.

Inspect the Data Profile

Click any field in the Input node's data grid. Prep shows a data profile bar — a histogram or frequency distribution of values in that field. This instantly reveals nulls, typos, outliers, and unexpected values. No formulas needed.

Add a Second Input if Needed

Click the + icon at the top of the canvas or drag another file to add a second Input node. You can then connect nodes downstream with Join or Union steps to combine the sources.

📊

The Data Profile — Your Best Friend in Prep

The profile bar beneath each step in Prep is one of its most powerful features. It shows the distribution of every field visually — you can immediately spot: null values (shown as gray bars), misspelled category values (e.g., "Techology" vs "Technology"), outlier numeric values, and date format inconsistencies. Clicking a bar in the profile instantly filters the data preview to just those rows, letting you investigate in seconds what would take hours in Excel.

🔧 Section 3

Cleaning Steps — The Core of Data Prep

A Clean step is where most of the transformation work happens. Add one by clicking the + icon on any node and selecting "Add Step". You can chain multiple Clean steps — each one sees the output of the previous step.

The 5 Most Essential Clean Step Operations

Filter — Remove Unwanted Rows

Click any field profile bar, then select values to Exclude or Keep Only. Or use the Filters panel (top-right): select a field → Add Filter → set conditions (equals, contains, is not null, date range, etc.). Filters remove rows from the downstream data permanently in the flow.

Pivot — Reshape Wide to Tall

When data has multiple value columns (e.g., Jan_Sales, Feb_Sales, Mar_Sales), use Pivot Columns to Rows: click the three-dot menu → Pivot Columns to Rows → select all the month columns. Prep creates two new columns: the pivot field name (Month) and the pivot value (Sales). This normalized structure is essential for time-series analysis in Desktop.

Split — Separate Combined Fields

Right-click a field with combined values (e.g., "John Smith" or "2024-Q1-West") → Split Values. Prep automatically detects the separator (space, dash, comma) and creates separate columns for each part. Use Custom Split to specify your own delimiter or split by character position.

Aggregate — Summarize Before Output

Add an Aggregate step (not a Clean step — it's a separate node type) to pre-compute summaries. Drag fields to the Grouped Fields area (these become your dimensions) and drag numeric fields to Aggregated Fields (choose SUM, AVG, COUNT, MIN, MAX). The output has one row per group combination.

Group & Replace — Fix Inconsistent Values

In the profile bar, if you see "Technology" and "Techology" as separate values, click the dirty value → select Group and Replace. Prep can also use fuzzy matching to automatically suggest similar values that should be grouped. This fixes categorical data quality issues in a single click.

💡

Calculated Fields in Prep

Just like in Desktop, you can create calculated fields in Prep's Clean step. Click the + icon in the field list header → "Create Calculated Field". Prep supports row-level functions (IF/THEN, string functions, date functions, arithmetic). These calculations run on every row — ideal for business logic that should apply before aggregation. Example: IF [Region] = "West" THEN "Pacific" ELSE [Region] END standardizes regional naming.

🔗 Section 4

Join and Union Nodes — Combining Data Sources

Real analytics typically requires combining data from multiple sources. Tableau Prep handles this with dedicated Join and Union nodes that are visually configured — no SQL required (though the underlying logic is identical to SQL joins and unions).

Join Nodes

A Join combines two data sources horizontally — adding columns from one source alongside matching rows from another, based on a shared key field. To create a join in Prep: drag one node on top of another until you see the Join/Union option appear, then drop on "Join".

Join Type	What It Returns	Use Case
Inner Join	Only rows where the key matches in BOTH sources	Customer orders + customer profile (only customers who ordered)
Left Join	ALL rows from the left source + matching right rows (nulls where no match)	All customers + their orders (include customers with zero orders)
Right Join	ALL rows from right source + matching left rows	All orders + customer profile (include orders with unknown customers)
Full Outer Join	ALL rows from BOTH sources, nulls where no match	Reconciliation reports — see everything that doesn't match

Create the Join

Hover over the first Input or Clean step node, click the + that appears, and drag it toward the second node. Drop on "Join" in the overlay that appears. The Join step appears in the flow between the two sources.

Configure the Join Keys

In the Join configuration panel, Prep automatically suggests join keys based on field names. Verify the key pairing is correct (e.g., [Customer ID] = [Cust_ID]). You can add multiple join conditions for composite keys.

Choose the Join Type

Click the Venn diagram icon in the join configuration to switch between Inner, Left, Right, and Full Outer. Prep shows live counts for matched and unmatched rows, helping you immediately detect join problems.

Handle Duplicate Columns

After joining, you'll often have duplicate columns (e.g., [Customer ID-1] and [Customer ID-2]). Clean step: hide or remove duplicates, keeping the version from the authoritative source.

Union Nodes — Stacking Rows

A Union stacks two datasets vertically — appending rows from one source below rows from another. This requires both sources to have the same (or similar) column structure. Classic use case: monthly sales files where January data is in one CSV and February data in another. To create: drag one node onto another → drop on "Union".

Prep's Union dialog shows field mapping — it automatically aligns fields with the same name and highlights mismatches in red. You can manually map fields with different names (e.g., Cust_Name in one file and Customer Name in another) by dragging them to the same row in the mapping view.

⚠️

Join vs Union — Don't Confuse Them

Join = wider data (more columns). Combines sources that share a key, adding attributes from one to the other. Union = taller data (more rows). Stacks sources that have the same structure but different time periods, regions, or files. Applying a Join when you need a Union (or vice versa) is one of the most common data preparation mistakes — the result looks plausible but the numbers are completely wrong.

📤 Section 5

Output Nodes — Publishing Your Clean Data

Every Prep flow needs at least one Output node — this is where your clean data goes. Outputs can be files or published data sources on Tableau Server/Cloud.

Output Options

Output Type	Result	Best For
Hyper Extract (.hyper)	Fast local extract file	Small teams, sharing via Desktop
CSV File	Plain text comma-separated	Sharing with non-Tableau tools (Excel, Python)
Published Data Source	Data source on Tableau Server/Cloud	Enterprise deployments, shared access
Database Table	Writes back to a database table	Feeding downstream systems

Running the Flow

Add an Output Step

Click the + icon on your final Clean or Aggregate step → "Add Output". An Output node (light blue) appears in the flow. Select the output type and configure the destination path.

Run the Flow

Click the Run Flow button (play triangle) in the toolbar or right-click the Output node → Run. Prep processes every step in sequence and writes the result to your specified output. Progress is shown for each step.

Check for Errors

If any step has an error (e.g., a null in a required join key, a type mismatch), Prep highlights that node in red and shows the error message. Fix the issue in the flow — common fixes: add a filter to remove nulls before the join, or change a field's data type in a Clean step.

Scheduling Flows on Tableau Server

Once your flow is working correctly, publish it to Tableau Server or Tableau Cloud: Server → Publish Flow. Once published, navigate to the flow on Server and set a Schedule — daily at 6 AM, every hour, weekly on Mondays. At the scheduled time, Server connects to your original data sources, runs every step in the flow, and writes fresh output to the published data source. Any Desktop workbooks built on that published source will then show up-to-date data automatically.

🚀

One Flow, Multiple Outputs

A single Prep flow can have multiple Output nodes branching from different points in the flow. This is powerful: you can produce a detailed row-level extract AND a pre-aggregated summary extract from the same flow, both refreshing on the same schedule. Teams needing detailed data get one published source; executives needing fast summaries get another — all maintained by one flow without duplication of effort.

⚖️ Section 6

When to Use Tableau Prep vs SQL

If you know SQL, you might wonder: why use Tableau Prep instead of writing a SQL view or stored procedure? The answer depends on your environment and the complexity of the transformation.

Consideration	Use Tableau Prep	Use SQL
Visual documentation	Flow diagram is self-documenting	SQL requires comments/documentation
Non-technical team	Prep is more accessible	Requires SQL knowledge
Very large datasets	Can be slower than database-side processing	Runs on the database server — faster
Cross-database joins	Prep handles joins across different DBs	Cannot join across different databases in SQL
Complex window functions	Limited support in Prep	Full window function support
Version control / CI-CD	Manual — no Git-native workflow	SQL files commit naturally to Git
No database access	Works with files directly	Requires database server

🎯

The Practical Decision Rule

Use Tableau Prep when: the data comes from files (Excel/CSV), you need to combine sources from different systems, the transformation needs to be maintained by non-SQL analysts, or visual self-documentation is valuable for team collaboration. Use SQL when: all data lives in one database system, the transformation involves complex window functions or recursive CTEs, performance at massive scale is critical, or your team already has a mature SQL/dbt workflow. Many organizations use both: SQL for heavy database transformations, Prep for cross-source assembly and final shaping.

🔑 Key Concepts — Lesson 14

Flow

A Tableau Prep file (.tfl) containing a directed graph of Input, Clean, Join, Union, Aggregate, and Output nodes that transform data step by step.

Pivot (Columns to Rows)

Reshapes wide data (many value columns) into tall data (one value column + one label column). Normalizes data for easier time-series analysis in Desktop.

Join vs Union

Join = adds columns (horizontal combination by matching key). Union = adds rows (vertical stacking of same-structure sources). Critical distinction for correct data combining.

Published Data Source

A Prep flow output published to Tableau Server/Cloud that multiple Desktop workbooks can connect to. Enables centralized, scheduled data refreshes.

🧠 Knowledge Check

1. You have monthly sales files: Jan.csv, Feb.csv, Mar.csv — all with the same column structure. Which Tableau Prep node should you use to combine them into a single dataset?

2. In Tableau Prep, what does the "Pivot Columns to Rows" operation produce?

3. Which join type returns ALL rows from the left source and only matching rows from the right source (with nulls where there is no match)?

4. What is the key advantage of using Tableau Prep instead of SQL for cross-database joins?

5. How do you set up a Tableau Prep flow to automatically refresh data on a daily schedule?

🏆

✅ Lesson Summary

What You Learned

📋

Lesson 14 — Key Takeaways

✅ Tableau Prep Builder is a separate ETL tool — it prepares data for Tableau Desktop, not visualizes it
✅ Flows are directed graphs of Input → Clean → Join/Union → Aggregate → Output nodes
✅ The data profile bar instantly reveals nulls, typos, and outliers in every field
✅ Pivot converts wide (many column) data into tall (normalized) data — essential for time-series analysis
✅ Join = adds columns (horizontal); Union = adds rows (vertical) — never confuse these
✅ Publish flows to Tableau Server/Cloud and set schedules for fully automated, fresh data pipelines

← Lesson 13: Forecasting & Analytics