Add Dataset from Redshift

Overview

The Redshift Dataset lets you expose a specific table from your connected Amazon Redshift cluster as a structured Dataset in Sourcesible. Once configured, the Dataset can be used across Audiences, Computed Fields, and Dataset Models.

Redshift follows the same 3-step wizard as other warehouse sources (for example, Azure Synapse), with tables organized by schema and native data types detected automatically. Both the table list and the Data Settings field schema can take additional time to load because of Redshift cluster connection latency.

Before creating a Dataset, ensure you have a Redshift Source already connected under Data Sources.

Creating a Dataset from Redshift

Step 1 — Open the Dataset Page

In the left navigation, click Datasets.
Click Add Dataset in the top-right corner.

Step 2 — Choose a Data Source

On the Choose Data Source screen, select the radio button next to Redshift.
Click Next.

Step 3 — Choose Method (Step 1 of 3)

Select how you want to define your dataset:

Method	Description	Best For
Table Selector	Browse and select a table directly from your connected Redshift cluster. Requires a user with read and write permissions.	Quick setup when you want the full table with no transformation
SQL	Write a custom query using the online SQL editor.	Custom datasets, filtered views, joins, or aggregations

Click Next after selecting your method.
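
For the SQL method, a query along the following lines could define a custom dataset. This is only a sketch: the oak_edu schema and its students and enrollments tables come from the example in this guide, but the column names (student_id, email, course_id) are assumptions and should be replaced with the columns in your own tables.

```sql
-- Hypothetical columns: student_id, email, and course_id are illustrative only.
-- One row per student with the number of courses they are enrolled in.
SELECT
    s.student_id,
    s.email,
    COUNT(e.course_id) AS enrolled_course_count
FROM oak_edu.students AS s
JOIN oak_edu.enrollments AS e
    ON e.student_id = s.student_id
GROUP BY s.student_id, s.email;
```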

Step 4 — Choose a Redshift Table (Step 2 of 3)

The Choose a Redshift Table screen appears after you select Table Selector. Sourcesible fetches the list of schemas and tables from your connected Redshift cluster.

The table list may take several seconds to load while Sourcesible establishes a connection to your Redshift cluster. A loading spinner is displayed during this time. Do not navigate away until the list appears.

Wait for the table list to finish loading.
Tables are grouped under their parent Redshift schema (e.g., oak_edu, pg_auto_copy). Select the radio button next to the table you want to use.
Use the Search field to filter tables by name if needed.
Click Confirm.

In the example shown, the available tables under oak_edu include: courses, enrollments, payments, session, students, students_sample_v4, test_data_v5, testdata_v5, testdataforredshift, testdataforredshift_v2, testdatareddd, v5. Under pg_auto_copy: copy_job_detail.

If your Redshift cluster contains many schemas and tables, use the Search field to quickly narrow down to the table you need rather than scrolling through the full list.

Step 5 — Set Up Dataset (Step 3 of 3)

The wizard advances to Set up Dataset.

Dataset Identity

In the Dataset Name field, enter a name to identify this dataset in Sourcesible (maximum 50 characters). You can update this at any time under Settings.
Optionally, enter a Dataset Description (maximum 50 characters).

Source Reference (Read-only)

The following fields are auto-populated from your table selection and cannot be edited here:

Schema Name from Redshift schema — displays the Redshift schema name (e.g., oak_edu)
Table Name from Redshift Table — displays the selected table name (e.g., courses)

Data Settings

After arriving at the Set up Dataset step, Data Settings may display a loading spinner for several additional seconds while Sourcesible retrieves the column schema from Redshift. This is separate from the table list loading in Step 4 and is expected behavior.

Once loaded, the Data Settings section lists all fields detected from your selected Redshift table with their native Redshift data types auto-detected (e.g., VARCHAR, INTEGER, DATE). For each field, configure the following:

Column	Description
Show In Filter	Makes this field available as a filter criterion in Audience and segmentation tools. Click the header checkbox to toggle all fields at once.
PII	Marks this field as Personally Identifiable Information. PII fields are displayed as masked in the data preview.
Exclude from Personalization	Prevents this field from being used in personalization or activation contexts.

Configure the Show In Filter, PII, and Exclude from Personalization checkboxes for each field as required.
Click Preview to validate that Sourcesible can read records from the table. A preview of up to 10 rows renders inline below Data Settings.
Review the preview data to confirm columns and values look correct.
Click Save.

The note at the bottom of the preview reads: "For data privacy, your selected PII fields will be displayed as masked."

Test Your Connection (Preview)

When you click Preview, Sourcesible queries the first 10 records from your selected Redshift table and renders them inline. The following are validated during preview:

Sourcesible can establish a connection to the Redshift cluster using the stored credentials
The selected schema and table exist and are accessible by the configured database user
The field schema matches the columns listed in Data Settings
PII-flagged fields render as masked values in the preview output

If the preview returns no records or an error, do not click Save. Verify that the Redshift user has SELECT permission on the selected table and that the cluster's security group allows inbound connections from Sourcesible's IP range.
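
To check permissions outside the wizard, the statements below can be run in a SQL client against the same cluster. They are a sketch that assumes the database user is named sourcesible_user and that the selected table is oak_edu.courses, as in the examples in this guide; substitute your own names. The first statement is typically run by an administrator, the second while connected as the Sourcesible database user.

```sql
-- Returns true/false for each privilege the preview depends on.
SELECT
    HAS_SCHEMA_PRIVILEGE('sourcesible_user', 'oak_edu', 'USAGE') AS has_schema_usage,
    HAS_TABLE_PRIVILEGE('sourcesible_user', 'oak_edu.courses', 'SELECT') AS has_table_select;

-- Rough equivalent of the preview: read the first 10 rows of the table.
SELECT *
FROM oak_edu.courses
LIMIT 10;
```

If either privilege check returns false, grant the missing permission before retrying the preview.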

Next Steps

Once your Redshift Dataset is saved, you can:

Create a Dataset Model — Join multiple Datasets together or define relationships for unified customer profiles.
Build Computed Fields — Derive new attributes from Redshift table fields.
Define Audiences — Use fields marked Show In Filter as segmentation conditions.
Configure Single View of Customer — Map Dataset fields to identity resolution and profile unification.

Tips and Troubleshooting

Table List Takes a Long Time to Load or Does Not Appear

Symptom: After reaching the Choose a Redshift Table step, the loading spinner displays for more than 30 seconds or the table list never populates.

Cause: Redshift connection latency can be higher than for other warehouse sources, especially for clusters in regions far from Sourcesible's infrastructure. This can also occur if the Redshift cluster is paused, the VPC security group blocks the connection, or the database user lacks permission to query the system catalog metadata (pg_catalog or information_schema) needed to enumerate schemas and tables.

Fix: First verify the cluster is active in the AWS Console. Then confirm the security group allows inbound TCP on port 5439 from Sourcesible's IP range. Finally, confirm the database user has access to information_schema:

GRANT USAGE ON SCHEMA information_schema TO sourcesible_user;

GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO sourcesible_user;
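
After granting access, one way to confirm that the user can enumerate tables is to connect as that user and list the visible schemas and tables. The query below is a sketch of such a check. The exact query Sourcesible issues is not documented here, so treat this as an approximation rather than the product's own behavior.

```sql
-- Lists user tables visible to the connected user, grouped by schema.
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name;
```

If the expected schema or tables are missing from the result, the table list in the wizard will most likely be missing them too.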

Data Settings Takes a Long Time to Load After Confirming Table

Symptom: After clicking Confirm on the table selection and reaching Set up Dataset, the Data Settings section shows a loading spinner for an extended period and the Save button remains greyed out.

Cause: Sourcesible runs a separate query to fetch column-level metadata from Redshift after the table is confirmed. This is an additional round-trip to the cluster and can be slow for large tables or under high cluster load.

Fix: Wait for the spinner to resolve — this is expected behavior for Redshift and does not indicate an error. If it does not resolve after 60 seconds, click Back, reselect the table, and click Confirm again to retry the metadata fetch.
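
To see whether column metadata is slow or unavailable on the Redshift side, a similar lookup can be run directly in a SQL client. This is a sketch that assumes the selected table is oak_edu.courses; it approximates the kind of metadata fetch described above and is not necessarily the query Sourcesible runs.

```sql
-- Column names and data types for one table, in table order.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'oak_edu'
  AND table_name = 'courses'
ORDER BY ordinal_position;
```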

No Tables Appear Under a Schema in the Table List

Symptom: A schema group header is visible in the table list (e.g., oak_edu) but no tables are listed beneath it, or the schema is missing entirely.

Cause: The Redshift database user does not have USAGE permission on that schema, or does not have SELECT permission on any tables within it.

Fix: Grant the required permissions in Redshift:

GRANT USAGE ON SCHEMA oak_edu TO sourcesible_user;

GRANT SELECT ON ALL TABLES IN SCHEMA oak_edu TO sourcesible_user;

To automatically grant access to future tables added to the schema:

ALTER DEFAULT PRIVILEGES IN SCHEMA oak_edu
GRANT SELECT ON TABLES TO sourcesible_user;
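
Note that in Redshift, default privileges apply only to objects created by the user who runs ALTER DEFAULT PRIVILEGES, unless FOR USER is specified. If tables in the schema are created by a different user, a variant like the following sketch may be needed. The name etl_user is hypothetical and stands for whichever user creates tables in oak_edu.

```sql
-- etl_user is a hypothetical name for the user that creates new tables.
ALTER DEFAULT PRIVILEGES FOR USER etl_user IN SCHEMA oak_edu
GRANT SELECT ON TABLES TO sourcesible_user;
```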

Preview Returns Empty or Truncated Timestamp Values

Symptom: The created_at or other DATE/TIMESTAMP columns in the preview show values like 2024-12-06 00:00:00.00... with trailing truncation.

Cause: Redshift TIMESTAMP columns store fractional seconds at up to microsecond precision, and Redshift returns the full value. Sourcesible truncates the display in the preview table for readability, but the underlying data stored in the Dataset retains full precision.

Fix: This is display-only behavior in the preview. No action is required. The full timestamp value will be available when the Dataset is used in Computed Fields, Audiences, or downstream activations.
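
If you want to confirm the full stored value, you can format the column explicitly in a SQL client. The sketch below assumes a TIMESTAMP column named created_at on oak_edu.courses. The column name comes from the symptom above, but its presence on that table is an assumption.

```sql
-- 'US' in the format string prints fractional seconds as microseconds.
SELECT
    created_at,
    TO_CHAR(created_at, 'YYYY-MM-DD HH24:MI:SS.US') AS created_at_full
FROM oak_edu.courses
LIMIT 10;
```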