📊 PHASE 1 · LEVEL 1 🟢 Beginner MODULE 02

Connecting Data Sources

⏱️ 40–55 min read
🔌 5 Connection Types
❓ 5 Quiz Questions
📅 Week 1
Data is only useful once it's connected. In this lesson you'll learn exactly how Tableau connects to every major data source — from a simple Excel file on your desktop to a PostgreSQL database in the cloud. You'll also master the critical choice between Live and Extract connections, which has a huge impact on dashboard performance.
Module 2 of 25+ · Phase 1 Progress 8%

Connecting to Excel & CSV Files

Excel and CSV files are the most common starting points for Tableau users. Whether you received a report from a colleague or exported data from your accounting software, Tableau connects to flat files in seconds with no configuration needed.

Connecting Step by Step

1
Open the Connect Panel
On the Tableau Start Page, look at the left column under "Connect". Click Microsoft Excel for .xlsx/.xls files, or Text File for .csv and .txt files.
2
Browse to Your File
A file browser opens. Navigate to your file and click Open. Tableau immediately analyzes the file structure and opens the Data Source page.
3
Examine the Data Source Page
You'll see your sheets or tables in the left panel. Drag the sheet you want (e.g., Orders) to the canvas area. Tableau previews the first 1,000 rows of data.
4
Check Field Types
Each column has a data type icon: Abc = String, # = Number, 📅 = Date. Click any type icon to change it if Tableau guessed incorrectly.
5
Click "Sheet 1" to Analyze
Once you've confirmed the fields look right, click the "Sheet 1" tab at the bottom to go to the worksheet and start building visualizations.
💡
Data Interpreter — Your Cleaning Assistant
If your Excel file has merged cells, headers in the wrong row, or extra blank rows, Tableau's Data Interpreter button appears automatically. Click it and Tableau will attempt to clean and reformat the sheet. It highlights exactly what it changed so you can review and accept. This saves hours of manual cleanup for messy spreadsheets.

Understanding Field Types at Connection

IconData TypeExamplesTableau Role
AbcString (Text)Names, Categories, CitiesDimension
#Number (Float/Int)Sales, Quantity, ProfitMeasure
📅DateOrder Date, Ship DateDimension
🕐Date & TimeTimestamps, Login TimesDimension
T|FBooleanIs Returned, Is ActiveDimension
🌍GeographicCountry, State, ZIP CodeDimension
⚠️
Date Columns Stored as Text
A common issue: date columns that look like dates but are stored as strings (you'll see Abc instead of 📅). Right-click the column header → Change Data Type → Date. If the format is unusual (e.g., 2024_01_15), you may need a calculated field with DATEPARSE() to convert it properly.

Database Connections

For production-level analytics, data lives in relational databases — MySQL, PostgreSQL, SQL Server, Oracle, and more. Tableau connects natively to all of them. You'll need connection credentials from your database administrator (or your own if you're the admin).

Connecting to MySQL

1
Select MySQL from Connect Panel
On the Start Page, under "To a Server", click MySQL. A connection dialog opens. If you don't see MySQL listed, click "More..." to see all connectors.
2
Enter Connection Details
Server: hostname or IP (e.g., localhost or db.mycompany.com). Port: 3306 (MySQL default). Database: your database name. Username & Password: your DB credentials.
3
SSL Configuration
If your database requires encrypted connections (recommended for production!), check "Require SSL". You may need to provide certificate files (.pem) if using mutual TLS authentication.
4
Select Schema & Tables
After connecting, the Data Source page shows your database schema. Drag tables to the canvas to build joins, or use Custom SQL to write a query that pre-filters and shapes your data before it hits Tableau.
🔧
Custom SQL — Query Before You Visualize
Instead of loading an entire table, you can write a SQL query in the "Custom SQL" option. This is powerful when you need to: pre-aggregate millions of rows, apply business logic, join tables with complex conditions, or restrict data to specific date ranges. The query runs on the database server, sending only the result set to Tableau — much faster for large datasets.

PostgreSQL Connection

PostgreSQL works identically to MySQL but uses port 5432 by default. One key difference: PostgreSQL is case-sensitive for schema and table names, so Orders and orders are different tables. Always confirm the exact casing with your DBA.

DatabaseDefault PortNotes
MySQL3306Most common in web apps and startups
PostgreSQL5432Case-sensitive; strong JSON support
SQL Server1433Common in enterprises; Windows Auth available
Oracle1521Requires Oracle JDBC driver installation
Amazon Redshift5439PostgreSQL-compatible cloud data warehouse

Cloud Data Sources

Modern data doesn't just live in local databases — it lives in the cloud. Tableau connects natively to Google Sheets, Salesforce, cloud data warehouses, and dozens of SaaS platforms. Cloud connections often require OAuth authentication rather than username/password.

Google Sheets

1
Connect via "Google Sheets"
On the Start Page, click Google Sheets under "To a Server". A browser window opens for Google OAuth authentication — sign in with the Google account that owns the Sheet.
2
Search for Your Sheet
A search dialog lets you find your spreadsheet by name. Select the file and click Connect. Tableau lists all tabs — drag the relevant tab to the canvas.
3
Choose Live or Extract
Google Sheets can be either. Live means the dashboard always shows the current Sheet data. Extract takes a snapshot — faster, but you need to refresh manually or on a schedule.

Salesforce

Tableau integrates deeply with Salesforce CRM data. From the Connect panel, select Salesforce and authenticate via your Salesforce credentials. You can connect to standard Salesforce objects (Accounts, Opportunities, Leads, Cases) or write SOQL queries for custom objects. This is especially powerful for sales pipeline and customer analytics dashboards.

AWS Athena & Cloud Warehouses

Amazon Athena lets you query data stored in S3 buckets using standard SQL. In Tableau, connect using the "Amazon Athena" connector — you'll need your AWS Access Key, Secret Key, S3 staging directory (for query results), and the AWS region. Similarly, Google BigQuery and Snowflake are first-class connectors with full OAuth support and high-performance query pushdown.

🚀
Query Pushdown — The Key to Cloud Performance
When Tableau connects to cloud warehouses like Snowflake, BigQuery, or Redshift, it uses query pushdown: instead of pulling all raw data to your machine, Tableau sends SQL queries to the cloud and only receives the aggregated results. A billion-row table becomes a manageable result set. This is why cloud data warehouse connections can be just as fast as local extracts for complex analyses.

Live vs Extract Connections

This is one of the most important architectural decisions in Tableau. Every connection is either Live (queries the source in real-time) or an Extract (a snapshot of data stored locally in a .hyper file). Understanding the trade-offs is essential for building reliable, performant dashboards.

AspectLive ConnectionExtract (.hyper)
Data FreshnessAlways currentSnapshot (refresh needed)
PerformanceDepends on database speedVery fast (columnar storage)
Network DependencyRequires DB connectionWorks offline
Best ForReal-time ops, live KPIsLarge datasets, slow DBs, sharing
SharingRecipient needs DB accessPackaged .twbx includes data
Row LimitUnlimited (server-side)Billions of rows supported

When to Use Live

Choose Live connections when your business requires real-time data — think operations dashboards monitoring live inventory, fraud detection, or manufacturing line metrics. Also use Live when your database already has great performance and the extra extract step creates unnecessary overhead.

When to Use Extract

Choose Extract when: (1) your database is slow or heavily loaded, (2) you need to share the workbook with someone who doesn't have database access, (3) you're working with cloud sources that charge per-query (minimize costs!), or (4) you need to work offline while traveling.

Scheduling Extract Refreshes

When you publish a workbook with an Extract to Tableau Server or Cloud, you can set an automatic refresh schedule — hourly, daily, weekly. Tableau connects to the original data source at the scheduled time and rebuilds the extract. This gives you the performance of extracts with the freshness of live connections for most business use cases.

📌
Incremental Extract Refresh
Full refreshes rebuild the entire extract. But for large datasets, you can configure an Incremental Refresh — Tableau only pulls new rows added since the last refresh (using a date/ID column you specify). A 100M-row extract that takes 2 hours to fully refresh might take only 5 minutes incrementally by pulling just today's new records.

Managing & Cleaning Data Sources

Once connected, Tableau gives you several tools to shape and clean your data before analysis. These are applied at the Data Source level, so they affect every worksheet using that connection.

Data Source Filters

A Data Source Filter permanently restricts which rows are available in the entire workbook — it runs before any other filter. Click "Add" in the Filters panel (top-right of the Data Source page) to add one. Use cases: filter to only the current year's data to reduce extract size, or restrict sensitive data so users only see their region's records.

Renaming & Hiding Fields

Database column names are often cryptic (cust_acq_dt instead of "Customer Acquisition Date"). On the Data Source page, right-click any column header to rename it. You can also hide fields you never use — they disappear from the Data pane, reducing clutter without deleting any data.

Splitting Columns

If a column contains combined values (e.g., "John Smith" in a single Name field, or "2024-Q1" in a Period field), use the Split option: right-click the column → Transform → Split. Tableau creates new calculated fields for each part. You can also use Custom Split to specify your delimiter.

💡
Pivot — Reshape Wide Data to Tall
Many datasets are in "wide" format: one row per entity with many value columns (Jan Sales, Feb Sales, Mar Sales...). Tableau works best with "tall" (normalized) data. Select the month columns, right-click → Pivot. Tableau creates two new columns: "Pivot Field Names" (Month) and "Pivot Field Values" (Sales). Now you have one row per entity-month, and building a time-series chart is effortless.

Multiple Connections in One Workbook

A single Tableau workbook can connect to multiple data sources simultaneously. Click the small cylinder icon (or "Add" in the Data pane) to add a second connection. Each data source gets its own entry in the Data pane. You can then use Data Blending (covered in Lesson 11) to combine them when a direct join isn't possible because the data lives in different database systems.

🔑 Key Concepts — Lesson 2
Live Connection
Queries the original data source in real-time every time the view loads. Always shows current data but performance depends on the source.
Extract (.hyper)
A local snapshot of data stored in Tableau's fast columnar format. Enables offline use, faster performance, and easy sharing.
Data Source Filter
A permanent filter applied at the connection level that restricts which rows are available across the entire workbook.
Data Interpreter
Tableau's built-in tool to detect and fix common Excel formatting issues like merged cells and extra header rows automatically.
🧠 Knowledge Check
1. You need to build a dashboard that always shows live inventory levels as they update every minute. Which connection type should you use?
2. What does the "Data Interpreter" feature in Tableau do?
3. Which of the following is an advantage of an Extract connection over a Live connection?
4. What is the default port for a MySQL database connection in Tableau?
5. A Data Source Filter in Tableau is applied at which level?
🏆

What You Learned

📋
Lesson 2 — Key Takeaways
✅ Connect to Excel/CSV files and use Data Interpreter for messy spreadsheets
✅ Database connections require server, port, database name, and credentials
✅ Cloud sources (Google Sheets, Salesforce, Athena) use OAuth authentication
✅ Live = real-time but database-dependent · Extract = fast snapshot, offline-capable
✅ Data Source Filters restrict data for the entire workbook before any analysis
✅ Rename, hide, split, and pivot columns directly on the Data Source page
🎉
Lesson 2 Complete!
You now know how to connect Tableau to any major data source and make the right Live vs Extract decision. Next: explore every corner of the Tableau workspace — shelves, cards, and the Show Me panel.