---
title: "Connecting to Databricks"
description: "You can establish a direct connection to Databricks to pull data into xP&A. This article describes the prerequisites and the individual steps of the setup process."
source_url: https://support.lucanet.cloud/en/documentation/xp-a---extended-planning-and-analysis/integrating-data/connecting-to-a-data-source/connecting-to-a-bi-data-warehouse-system/connecting-to-databricks
language: en
last_updated: 2023-08-16
---
# Connecting to Databricks

## Overview

You can establish a direct connection to Databricks to pull data into xP&A.

This article describes the prerequisites and the individual steps of the setup process.

{% info-box %}
The fields that are pulled in must be defined during the setup process using a database query. For detailed instructions on how to structure such a query, see [Defining Database Queries](https://support.lucanet.cloud/en/documentation/xp-a---extended-planning-and-analysis/integrating-data/connecting-to-a-data-source/connecting-to-a-bi-data-warehouse-system/defining-database-queries.md).
{% /info-box %}

## Prerequisites for the Setup

### Whitelist IP Address in Databricks

Before connecting Databricks to xP&A, you must whitelist the following IP address in your Databricks workspace:

- 52.59.129.235
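
If your workspace restricts access with Databricks IP access lists, the whitelist entry can also be created through the Databricks REST API instead of the admin settings. The following is a minimal Python sketch using only the standard library. It assumes that the IP access list feature is already enabled for the workspace and that you have a workspace admin token. The endpoint `/api/2.0/ip-access-lists` and its `label`, `list_type`, and `ip_addresses` fields come from the Databricks IP Access Lists API; the function names, the `xpa` label, and the workspace URL and token values are hypothetical placeholders.

```python
from __future__ import annotations

import json
import urllib.request

# IP address that xP&A connects from (taken from the list above).
XPA_IP_ADDRESS = "52.59.129.235"


def build_allow_list_payload(label: str, ip_addresses: list[str]) -> dict:
    """Build the JSON body for an ALLOW-type IP access list."""
    return {
        "label": label,
        "list_type": "ALLOW",
        "ip_addresses": ip_addresses,
    }


def create_allow_list(workspace_url: str, token: str, payload: dict) -> dict:
    """Send the payload to the workspace's IP access list endpoint and return the parsed response."""
    request = urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/ip-access-lists",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for a non-2xx response.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example: `create_allow_list("https://<your-workspace-url>", "<admin-token>", build_allow_list_payload("xpa", [XPA_IP_ADDRESS]))`. Keep in mind that once an allow list exists, Databricks only accepts connections from listed addresses, so make sure the addresses of your own users are covered as well.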

## Creating a Databricks Data Source

{% stepper %}
{% stepper-step %}
Choose one of the following options:

- Open the **Data** workspace from the overview on the start page and click **+ New**.

- Open the model into which you would like to integrate the data, click the **+** sign next to **Data** in the overview, and choose **New data source**.
{% /stepper-step %}
{% stepper-step %}
In the **Data sources** dialog, open the **BI/Database** tab and choose **Databricks**.
{% /stepper-step %}
{% stepper-step %}
The **New Data Source** dialog is displayed.
{% /stepper-step %}
{% stepper-step %}
Configure the settings as described in the following section.
{% /stepper-step %}
{% stepper-step %}
Click **Create data source**.
{% /stepper-step %}
{% /stepper %}

### Setup Steps

#### Choose a connection

Choose an existing connection or, if you have not configured a connection yet, click **New Connection** and enter the following in the **New Databricks connection** dialog:

- **Host**: the host name of the Databricks cluster you want to connect to
- **Port**: the port of the Databricks cluster you want to connect to
- **HTTP path**: the HTTP path of the workspace/warehouse you want to connect to
- The authentication type to use when connecting to Databricks. Choose one of the following options:
  - **Personal account token**: Paste the personal access token used to access Databricks into the **Token** field. (For more information on how to get the access token, see [Authenticate with Databricks personal access tokens](https://docs.databricks.com/aws/en/dev-tools/auth/pat).)
  - **Machine-to-Machine OAuth**: Enter the **Client ID** and the **Client secret** used in the machine-to-machine authentication workflow with Databricks. (For more information on the client ID and client secret, see [Authenticate access to Databricks using OAuth token federation](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-federation).)
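
The connection fields and the two authentication options can be summarized as a small data model. The sketch below is purely illustrative: the `DatabricksConnectionSettings` class and its checks are hypothetical and not part of xP&A or any Databricks library. It assumes that the host is entered as a bare host name (for example, `adb-1234567890123456.7.azuredatabricks.net`), that the port is usually 443, and that exactly one authentication option is filled in at a time.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class DatabricksConnectionSettings:
    """Hypothetical model of the 'New Databricks connection' dialog fields."""

    host: str
    port: int
    http_path: str
    # Used with the "Personal account token" option
    token: Optional[str] = None
    # Used with the "Machine-to-Machine OAuth" option
    client_id: Optional[str] = None
    client_secret: Optional[str] = None

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the fields look complete."""
        problems: list[str] = []
        # Assumption: the host is a bare host name, not a URL with a scheme.
        if not self.host or "://" in self.host:
            problems.append("host must be a bare host name such as adb-123.4.azuredatabricks.net")
        if not 1 <= self.port <= 65535:
            problems.append("port must be between 1 and 65535 (Databricks typically uses 443)")
        if not self.http_path.strip():
            problems.append("HTTP path must not be empty")
        uses_token = bool(self.token)
        uses_oauth = bool(self.client_id or self.client_secret)
        if uses_token == uses_oauth:
            # Either both options or neither option was filled in.
            problems.append("use exactly one authentication option: a token, or a client ID and secret")
        elif uses_oauth and not (self.client_id and self.client_secret):
            problems.append("machine-to-machine OAuth needs both a client ID and a client secret")
        return problems
```

Modelling the two authentication options as mutually exclusive mirrors the dialog, where you choose one authentication type per connection.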


#### Complete the query form

Enter the following:

- **Data Source Name**: a name for the data source
- **Query** that defines the fields to be pulled in. For more information, see [Defining Database Queries](https://support.lucanet.cloud/en/documentation/xp-a---extended-planning-and-analysis/integrating-data/connecting-to-a-data-source/connecting-to-a-bi-data-warehouse-system/defining-database-queries.md).
- Name of the **Date column**; this column must use one of Databricks' date data types
- Names of the columns that contain **variables**; these columns must have a **numeric** data type

{% idea-box %}
All remaining columns are treated as dimensions and must have a **string** data type.

The only exception is the **cohort** dimension: its column must have a date data type, and its header must be labelled exactly **Cohort**.
{% /idea-box %}
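
Before entering a query in the form, you can check its result columns against the rules above. The following Python sketch takes a mapping of column names to Databricks SQL type names (for example, the `col_name` and `data_type` values that `DESCRIBE QUERY` returns for your query) and reports columns that break the rules. The function names and the exact type sets are assumptions; in particular, treating `TIMESTAMP` and `TIMESTAMP_NTZ` as acceptable date types is not confirmed by this article.

```python
from __future__ import annotations

# Assumed Databricks SQL type names for each role; adjust if your data differs.
NUMERIC_TYPES = {"TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL"}
DATE_TYPES = {"DATE", "TIMESTAMP", "TIMESTAMP_NTZ"}  # TIMESTAMP variants are an assumption


def base_type(type_name: str) -> str:
    """Normalize a type name, e.g. 'decimal(18,2)' -> 'DECIMAL'."""
    return type_name.split("(")[0].strip().upper()


def check_query_columns(
    schema: dict[str, str], date_column: str, variable_columns: list[str]
) -> list[str]:
    """Report columns whose types break the date/variable/dimension rules."""
    problems: list[str] = []
    for name, type_name in schema.items():
        kind = base_type(type_name)
        if name == date_column or name == "Cohort":
            # The date column and the Cohort dimension must hold dates.
            if kind not in DATE_TYPES:
                problems.append(f"{name}: expected a date type, got {type_name}")
        elif name in variable_columns:
            # Variables must be numeric.
            if kind not in NUMERIC_TYPES:
                problems.append(f"{name}: expected a numeric type, got {type_name}")
        elif kind != "STRING":
            # Every other column is treated as a dimension and must be a string.
            problems.append(f"{name}: dimensions must be STRING, got {type_name}")
    # The columns named in the form must actually be returned by the query.
    for required in [date_column, *variable_columns]:
        if required not in schema:
            problems.append(f"{required}: column not returned by the query")
    return problems
```

For example, `check_query_columns({"Date": "date", "Revenue": "decimal(18,2)", "Region": "string", "Cohort": "date"}, "Date", ["Revenue"])` returns an empty list, because every column matches its role.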