🏭 PHASE 4 · PRODUCTION & PUBLISHING 🟡 Intermediate MODULE 14

Tableau Prep Builder

⏱️ 50 min read
🔧 7 Node Types
❓ 5 Quiz Questions
🔄 ETL Flows
Real-world data is almost never analysis-ready. It arrives from multiple systems in inconsistent formats, with wrong data types, merged columns, duplicate rows, and mismatched naming conventions. Tableau Prep Builder is a dedicated ETL (Extract, Transform, Load) tool that lets you visually build data preparation flows — cleaning, reshaping, and combining data before it reaches Tableau Desktop. Mastering Prep makes you dramatically more self-sufficient as an analyst.
Module 14 of 17 · Phase 4 Progress 82%

Tableau Prep vs Tableau Desktop — What's the Difference?

Many beginners confuse these two tools. They serve completely different purposes in the analytics pipeline and are typically used in sequence, not interchangeably.

AspectTableau Prep BuilderTableau Desktop
PurposeData preparation (ETL)Data visualization & analysis
Output.hyper extract file or published data sourceWorkbook (.twb, .twbx) with dashboards
Primary UserData engineers, advanced analystsAll analysts and business users
StrengthComplex multi-step data wranglingInteractive charts and dashboards
Calculates?Yes — row-level calculations in flowsYes — aggregations and LODs in views
Visualizes?No — flow diagram only, data preview onlyYes — full visualization engine
SchedulingYes — run flows automatically via ServerNo — dashboards refresh, don't "run"
File Format.tfl (flow file).twb / .twbx (workbook)
🔄
The Typical Pipeline
In a professional environment, the data pipeline typically goes: Source Systems → Tableau Prep → .hyper Extract → Tableau Desktop → Published Dashboard → Business Users. Prep handles the messy transformation work so that Desktop only deals with clean, well-structured data — keeping workbook performance high and calculations simple.

Connecting Sources in Tableau Prep

Tableau Prep connects to the same data sources as Tableau Desktop — files, databases, cloud services. But in Prep, connections are just the starting point of a flow, not the end goal. You build a directed graph of steps that progressively transform the data.

Starting a New Flow

1
Open Tableau Prep Builder
Tableau Prep Builder is a separate application from Tableau Desktop. On launch, you see the home screen. Click Connect to Data to begin.
2
Add Your First Input Step
Choose your data source (Excel, CSV, Database, etc.). In Prep, each data source becomes an Input node — shown as a colored circle at the left of the flow canvas. The Input node shows a data preview panel at the bottom when selected.
3
Inspect the Data Profile
Click any field in the Input node's data grid. Prep shows a data profile bar — a histogram or frequency distribution of values in that field. This instantly reveals nulls, typos, outliers, and unexpected values. No formulas needed.
4
Add a Second Input if Needed
Click the + icon at the top of the canvas or drag another file to add a second Input node. You can then connect nodes downstream with Join or Union steps to combine the sources.
📊
The Data Profile — Your Best Friend in Prep
The profile bar beneath each step in Prep is one of its most powerful features. It shows the distribution of every field visually — you can immediately spot: null values (shown as gray bars), misspelled category values (e.g., "Techology" vs "Technology"), outlier numeric values, and date format inconsistencies. Clicking a bar in the profile instantly filters the data preview to just those rows, letting you investigate in seconds what would take hours in Excel.

Cleaning Steps — The Core of Data Prep

A Clean step is where most of the transformation work happens. Add one by clicking the + icon on any node and selecting "Add Step". You can chain multiple Clean steps — each one sees the output of the previous step.

The 5 Most Essential Clean Step Operations

1
Filter — Remove Unwanted Rows
Click any field profile bar, then select values to Exclude or Keep Only. Or use the Filters panel (top-right): select a field → Add Filter → set conditions (equals, contains, is not null, date range, etc.). Filters remove rows from the downstream data permanently in the flow.
2
Pivot — Reshape Wide to Tall
When data has multiple value columns (e.g., Jan_Sales, Feb_Sales, Mar_Sales), use Pivot Columns to Rows: click the three-dot menu → Pivot Columns to Rows → select all the month columns. Prep creates two new columns: the pivot field name (Month) and the pivot value (Sales). This normalized structure is essential for time-series analysis in Desktop.
3
Split — Separate Combined Fields
Right-click a field with combined values (e.g., "John Smith" or "2024-Q1-West") → Split Values. Prep automatically detects the separator (space, dash, comma) and creates separate columns for each part. Use Custom Split to specify your own delimiter or split by character position.
4
Aggregate — Summarize Before Output
Add an Aggregate step (not a Clean step — it's a separate node type) to pre-compute summaries. Drag fields to the Grouped Fields area (these become your dimensions) and drag numeric fields to Aggregated Fields (choose SUM, AVG, COUNT, MIN, MAX). The output has one row per group combination.
5
Group & Replace — Fix Inconsistent Values
In the profile bar, if you see "Technology" and "Techology" as separate values, click the dirty value → select Group and Replace. Prep can also use fuzzy matching to automatically suggest similar values that should be grouped. This fixes categorical data quality issues in a single click.
💡
Calculated Fields in Prep
Just like in Desktop, you can create calculated fields in Prep's Clean step. Click the + icon in the field list header → "Create Calculated Field". Prep supports row-level functions (IF/THEN, string functions, date functions, arithmetic). These calculations run on every row — ideal for business logic that should apply before aggregation. Example: IF [Region] = "West" THEN "Pacific" ELSE [Region] END standardizes regional naming.

Join and Union Nodes — Combining Data Sources

Real analytics typically requires combining data from multiple sources. Tableau Prep handles this with dedicated Join and Union nodes that are visually configured — no SQL required (though the underlying logic is identical to SQL joins and unions).

Join Nodes

A Join combines two data sources horizontally — adding columns from one source alongside matching rows from another, based on a shared key field. To create a join in Prep: drag one node on top of another until you see the Join/Union option appear, then drop on "Join".

Join TypeWhat It ReturnsUse Case
Inner JoinOnly rows where the key matches in BOTH sourcesCustomer orders + customer profile (only customers who ordered)
Left JoinALL rows from the left source + matching right rows (nulls where no match)All customers + their orders (include customers with zero orders)
Right JoinALL rows from right source + matching left rowsAll orders + customer profile (include orders with unknown customers)
Full Outer JoinALL rows from BOTH sources, nulls where no matchReconciliation reports — see everything that doesn't match
1
Create the Join
Hover over the first Input or Clean step node, click the + that appears, and drag it toward the second node. Drop on "Join" in the overlay that appears. The Join step appears in the flow between the two sources.
2
Configure the Join Keys
In the Join configuration panel, Prep automatically suggests join keys based on field names. Verify the key pairing is correct (e.g., [Customer ID] = [Cust_ID]). You can add multiple join conditions for composite keys.
3
Choose the Join Type
Click the Venn diagram icon in the join configuration to switch between Inner, Left, Right, and Full Outer. Prep shows live counts for matched and unmatched rows, helping you immediately detect join problems.
4
Handle Duplicate Columns
After joining, you'll often have duplicate columns (e.g., [Customer ID-1] and [Customer ID-2]). Clean step: hide or remove duplicates, keeping the version from the authoritative source.

Union Nodes — Stacking Rows

A Union stacks two datasets vertically — appending rows from one source below rows from another. This requires both sources to have the same (or similar) column structure. Classic use case: monthly sales files where January data is in one CSV and February data in another. To create: drag one node onto another → drop on "Union".

Prep's Union dialog shows field mapping — it automatically aligns fields with the same name and highlights mismatches in red. You can manually map fields with different names (e.g., Cust_Name in one file and Customer Name in another) by dragging them to the same row in the mapping view.

⚠️
Join vs Union — Don't Confuse Them
Join = wider data (more columns). Combines sources that share a key, adding attributes from one to the other. Union = taller data (more rows). Stacks sources that have the same structure but different time periods, regions, or files. Applying a Join when you need a Union (or vice versa) is one of the most common data preparation mistakes — the result looks plausible but the numbers are completely wrong.

Output Nodes — Publishing Your Clean Data

Every Prep flow needs at least one Output node — this is where your clean data goes. Outputs can be files or published data sources on Tableau Server/Cloud.

Output Options

Output TypeResultBest For
Hyper Extract (.hyper)Fast local extract fileSmall teams, sharing via Desktop
CSV FilePlain text comma-separatedSharing with non-Tableau tools (Excel, Python)
Published Data SourceData source on Tableau Server/CloudEnterprise deployments, shared access
Database TableWrites back to a database tableFeeding downstream systems

Running the Flow

1
Add an Output Step
Click the + icon on your final Clean or Aggregate step → "Add Output". An Output node (light blue) appears in the flow. Select the output type and configure the destination path.
2
Run the Flow
Click the Run Flow button (play triangle) in the toolbar or right-click the Output node → Run. Prep processes every step in sequence and writes the result to your specified output. Progress is shown for each step.
3
Check for Errors
If any step has an error (e.g., a null in a required join key, a type mismatch), Prep highlights that node in red and shows the error message. Fix the issue in the flow — common fixes: add a filter to remove nulls before the join, or change a field's data type in a Clean step.

Scheduling Flows on Tableau Server

Once your flow is working correctly, publish it to Tableau Server or Tableau Cloud: Server → Publish Flow. Once published, navigate to the flow on Server and set a Schedule — daily at 6 AM, every hour, weekly on Mondays. At the scheduled time, Server connects to your original data sources, runs every step in the flow, and writes fresh output to the published data source. Any Desktop workbooks built on that published source will then show up-to-date data automatically.

🚀
One Flow, Multiple Outputs
A single Prep flow can have multiple Output nodes branching from different points in the flow. This is powerful: you can produce a detailed row-level extract AND a pre-aggregated summary extract from the same flow, both refreshing on the same schedule. Teams needing detailed data get one published source; executives needing fast summaries get another — all maintained by one flow without duplication of effort.

When to Use Tableau Prep vs SQL

If you know SQL, you might wonder: why use Tableau Prep instead of writing a SQL view or stored procedure? The answer depends on your environment and the complexity of the transformation.

ConsiderationUse Tableau PrepUse SQL
Visual documentationFlow diagram is self-documentingSQL requires comments/documentation
Non-technical teamPrep is more accessibleRequires SQL knowledge
Very large datasetsCan be slower than database-side processingRuns on the database server — faster
Cross-database joinsPrep handles joins across different DBsCannot join across different databases in SQL
Complex window functionsLimited support in PrepFull window function support
Version control / CI-CDManual — no Git-native workflowSQL files commit naturally to Git
No database accessWorks with files directlyRequires database server
🎯
The Practical Decision Rule
Use Tableau Prep when: the data comes from files (Excel/CSV), you need to combine sources from different systems, the transformation needs to be maintained by non-SQL analysts, or visual self-documentation is valuable for team collaboration. Use SQL when: all data lives in one database system, the transformation involves complex window functions or recursive CTEs, performance at massive scale is critical, or your team already has a mature SQL/dbt workflow. Many organizations use both: SQL for heavy database transformations, Prep for cross-source assembly and final shaping.
🔑 Key Concepts — Lesson 14
Flow
A Tableau Prep file (.tfl) containing a directed graph of Input, Clean, Join, Union, Aggregate, and Output nodes that transform data step by step.
Pivot (Columns to Rows)
Reshapes wide data (many value columns) into tall data (one value column + one label column). Normalizes data for easier time-series analysis in Desktop.
Join vs Union
Join = adds columns (horizontal combination by matching key). Union = adds rows (vertical stacking of same-structure sources). Critical distinction for correct data combining.
Published Data Source
A Prep flow output published to Tableau Server/Cloud that multiple Desktop workbooks can connect to. Enables centralized, scheduled data refreshes.
🧠 Knowledge Check
1. You have monthly sales files: Jan.csv, Feb.csv, Mar.csv — all with the same column structure. Which Tableau Prep node should you use to combine them into a single dataset?
2. In Tableau Prep, what does the "Pivot Columns to Rows" operation produce?
3. Which join type returns ALL rows from the left source and only matching rows from the right source (with nulls where there is no match)?
4. What is the key advantage of using Tableau Prep instead of SQL for cross-database joins?
5. How do you set up a Tableau Prep flow to automatically refresh data on a daily schedule?
🏆

What You Learned

📋
Lesson 14 — Key Takeaways
✅ Tableau Prep Builder is a separate ETL tool — it prepares data for Tableau Desktop, not visualizes it
✅ Flows are directed graphs of Input → Clean → Join/Union → Aggregate → Output nodes
✅ The data profile bar instantly reveals nulls, typos, and outliers in every field
✅ Pivot converts wide (many column) data into tall (normalized) data — essential for time-series analysis
✅ Join = adds columns (horizontal); Union = adds rows (vertical) — never confuse these
✅ Publish flows to Tableau Server/Cloud and set schedules for fully automated, fresh data pipelines
🎉
Lesson 14 Complete!
You can now build professional data preparation flows in Tableau Prep Builder — cleaning, reshaping, and combining data automatically. Next: learn how to publish your work to Tableau Server and Online to share dashboards with the world.