📊 PHASE 1 · LEVEL 2 🟢 Beginner MODULE 04

Data Types & Data Preparation

⏱️ 45–60 min read
🔗 4 Join Types
❓ 5 Quiz Questions
🔄 Pivoting Covered
Real-world data is rarely in one perfect table. It's spread across multiple tables with different structures, data types, and formats. This lesson teaches you how to combine tables using joins, stack data with unions, understand Tableau's modern Relationships model, and reshape messy data with pivoting — all before a single visualization is built.
Module 4 of 25+ · Phase 1 Progress 16%

Data Types in Tableau

Every field in Tableau has a data type that determines how it behaves in calculations and visualizations. Tableau automatically detects types when connecting to data, but you'll often need to correct or convert them.

TypeIconDescriptionExample Values
StringAbcText data — any sequence of characters"Pakistan", "Furniture", "John Smith"
Number (Integer)#Whole numbers, no decimals42, 1500, -7
Number (Decimal)#Floating point numbers with decimals23.45, -0.12, 9999.99
Date📅Calendar date (no time component)2024-03-15, January 1 2025
Date & Time🕐Date with hours, minutes, seconds2024-03-15 09:30:00
BooleanT|FTrue/False binary valuesTRUE, FALSE, 1, 0
Geographic🌍Location data recognized by TableauCountry, State, City, ZIP, Lat/Long

Changing Data Types

To change a field's type: on the Data Source page, click the type icon above the column header and select the new type. In a worksheet, right-click the field in the Data pane → Change Data Type. Important: changing from String to Date only works if the string values are valid date formats Tableau can parse.

⚠️
Geographic Role Assignment
If your location column (e.g., "City") isn't recognized as geographic, Tableau treats it as plain text. Fix: right-click the field in the Data pane → Geographic Role → City (or Country, State, ZIP Code, etc.). The globe icon appears and Tableau can now plot it on maps. For non-standard locations, you may need to provide a latitude/longitude file manually.

Joins in Tableau

A join combines columns from two tables based on a matching key field. When you drag two tables to the canvas on the Data Source page, Tableau creates a join dialog. Tableau supports four standard SQL join types — understanding each is critical for correct data modeling.

Inner Join
(A ∩ B) [A] [A∩B] [B] ^^^^
Returns only rows where the key exists in both tables. Most common join. Rows without a match in either table are excluded. Use when you only want data that has a complete record on both sides.
Left Join
(A + A∩B) [A] [A∩B] [B] ^^^^^^^^^^
Returns all rows from the left (primary) table, plus matching rows from the right table. Non-matching right rows become NULL. Use when you need all records from your main table, even those with no match.
Right Join
(A∩B + B) [A] [A∩B] [B] ^^^^^^^^^^
Returns all rows from the right table, plus matching rows from the left. The mirror of Left Join. Rarely needed — you can always switch table positions and use a Left Join instead.
Full Outer Join
(A ∪ B) [A] [A∩B] [B] ^^^^^^^^^^^^^^
Returns all rows from both tables. Where there's no match, NULL fills the gaps. Use for data quality audits — to find records that exist in one table but not the other (the NULLs reveal the gaps).

Configuring Joins in Tableau

1
Drag Two Tables to Canvas
On the Data Source page, drag a second table next to the first. Tableau auto-detects matching field names and proposes an Inner Join. A Venn diagram icon appears between the two table names.
2
Click the Join Icon
Click the Venn diagram to open the Join dialog. Choose your join type (Inner/Left/Right/Full Outer) by clicking the appropriate part of the Venn diagram.
3
Set Join Conditions
Specify the key fields to join on. Example: Left table Orders.Customer ID = Right table Customers.ID. You can add multiple conditions with AND logic for compound keys.
4
Verify the Row Count
Check the row count shown at the bottom of the Data Source preview. If it's dramatically higher than expected, you likely have a many-to-many join creating duplicate rows — a common mistake!
⚠️
Many-to-Many Joins — The Row Multiplication Trap
If your join key is not unique in both tables (e.g., one customer has many orders AND you're joining to a table where each customer appears multiple times), Tableau creates a cross-product — every matching row from Table A pairs with every matching row from Table B. A customer with 10 orders × 5 address records = 50 rows instead of 10. Always check row counts after joining and use Relationships instead when this is a risk.

Unions — Combining Rows

While joins combine columns, a Union combines rows. Use a union when you have multiple tables with the same structure (same columns) but different periods or regions — like separate Excel sheets for each month of sales data, or separate files per sales region that you want to analyze together.

Creating a Union in Tableau

1
Drag the First Table
Bring your first table to the canvas on the Data Source page as normal (e.g., "January Sales").
2
Drag Second Table Below
Drag the second table and drop it below the first (not beside it). A blue "Union" indicator appears. Drop when you see the Union drag target. Tableau stacks the rows.
3
Use Wildcard Union for Multiple Files
For many files (all 12 months), use New Union → Wildcard → specify a pattern like Sales_*.xlsx. Tableau automatically includes all matching files. Tableau also adds a "Sheet" column identifying which file each row came from.
💡
Mismatched Column Names in Unions
If your tables have slightly different column names (e.g., "Product_Name" vs "ProductName"), Tableau creates separate columns with NULLs where data doesn't align. Fix by using the union configuration panel to merge mismatched fields: drag one field name onto the other to tell Tableau they're the same column. The merged column gets a combined header and no more NULLs.

Relationships vs Joins vs Blending

Tableau offers three ways to combine data from multiple sources. Choosing the right one prevents data quality issues and unlocks the right analysis capabilities.

MethodHow It WorksBest Use CaseRisk
Relationships Defines logical connections between tables. Tableau resolves joins at query time based on what fields are in the view. Default choice for multi-table data models. Avoids row duplication. Low — safest option
Joins Merges tables into one flat table at connection time. Runs before any analysis. Simple two-table merges with guaranteed unique keys Row duplication with non-unique keys
Blending Left joins two data sources from different connections at the aggregated level in the view. Combining data from different databases (e.g., MySQL + Excel) that can't be joined directly Only LEFT join, limited aggregations

Relationships — The Modern Default (Tableau 2020.2+)

Relationships are the recommended approach for multi-table data in Tableau. Instead of joining tables into a flat structure, you define relationships between tables in a logical layer — similar to how you define foreign keys in a database. The actual join type is determined dynamically when you drag fields into the view, preventing row duplication and enabling correct aggregations across related tables.

To create a relationship: drag two tables to the canvas on the Data Source page (without double-clicking a table first). A noodle/line appears between them — click it to define the relationship key fields. Tableau handles the rest at query time.

Pivoting Data & Data Interpreter

Real-world data often comes in formats optimized for human reading, not machine analysis. Two tools help you reshape this data without leaving Tableau.

Pivoting Wide Data to Tall

A common spreadsheet format puts each time period as a column: "Jan Sales", "Feb Sales", "Mar Sales"... Tableau works best with tall (normalized) data where each row represents one observation. Use Pivot to convert:

1
Select the Wide Columns
On the Data Source page, click the first month column header, then Shift+click the last to select all month columns simultaneously.
2
Right-Click → Pivot
Right-click any selected column → Pivot. Tableau creates two new columns: "Pivot Field Names" (containing "Jan Sales", "Feb Sales", etc.) and "Pivot Field Values" (the actual numbers).
3
Rename the New Columns
Double-click "Pivot Field Names" and rename it to "Month". Double-click "Pivot Field Values" and rename it to "Sales". Now building a time-series line chart is trivial — drag Month to Columns, Sales to Rows.

Data Interpreter — Smart Excel Cleanup

When you connect to a messy Excel file, Tableau may show a "Use Data Interpreter" button in the Data Source panel on the left. Clicking it runs an analysis of your spreadsheet and attempts to automatically: remove blank rows and columns, detect and skip title rows above the real headers, merge split headers, and identify the actual data range. After running, a "Review the results" link shows you exactly what changed.

🔬
When Data Interpreter Isn't Enough
For highly irregular data, Data Interpreter may still leave problems. Your next steps: (1) Use Custom SQL to pre-clean the data if it's in a database. (2) Use Tableau Prep Builder (covered in Phase 4) for complex transformations — splitting, merging, filtering, and aggregating data in a visual flow before it reaches Tableau Desktop. (3) Pre-process in Python/R if the transformation is too complex for no-code tools.
🔑 Key Concepts — Lesson 4
Inner Join
Returns only rows where the join key exists in both tables. Excludes any rows without a match on either side.
Left Join
Returns all rows from the left table. Adds matching data from the right table; non-matching right rows become NULL.
Relationship
A flexible, context-aware connection between tables that resolves at query time. Prevents row duplication. The modern default in Tableau.
Union
Stacks rows from two or more tables with the same column structure. Used to combine data from multiple files or periods.
Pivot
Reshapes wide (column-per-period) data into tall (one-row-per-observation) data. Essential for time-series analysis.
Data Blending
Combines data from two different database connections at the aggregated view level. Acts as a left join at query time.
🧠 Knowledge Check
1. You have an Orders table and a Customers table. You want ALL orders, even those where the customer record was accidentally deleted. Which join type should you use?
2. What does a Union do in Tableau (as opposed to a Join)?
3. What is the main advantage of Tableau Relationships over traditional Joins?
4. Your Excel file has sales data spread across 12 monthly sheets (Jan, Feb... Dec), all with identical columns. What is the most efficient way to combine them in Tableau?
5. A spreadsheet has columns: "Product", "Jan_Sales", "Feb_Sales", "Mar_Sales". What Tableau operation converts this to three rows per product (one for each month)?
🏆

What You Learned

📋
Lesson 4 — Key Takeaways
✅ Tableau supports 7 data types: String, Integer, Decimal, Date, DateTime, Boolean, Geographic
✅ Inner Join = matching rows only · Left Join = all left rows + matching right rows
✅ Full Outer Join = all rows from both tables (NULLs where no match)
✅ Relationships resolve join type at query time — prevents row duplication
✅ Unions stack rows vertically (same structure) · Pivot reshapes wide data to tall
✅ Data Interpreter cleans messy Excel files automatically
🎉
Lesson 4 Complete!
You now understand data types, joins, unions, relationships, and data reshaping. Next: put it all together by building your first complete set of charts and visualizations!