Connecting to Databricks

Overview

You can establish a direct connection to Databricks to pull data into xP&A.

This article contains a description of the prerequisites and the individual steps of the set-up process.

The fields to be pulled in must be defined during the set-up process using a database query. For detailed instructions on how to structure such a query, see Defining Database Queries.

This article contains the following sections:

  • Prerequisites for the Setup
  • Connecting to Databricks
  • Set-up Steps

Prerequisites for the Setup
Whitelist IP Address in Databricks

Before connecting Databricks with xP&A, you have to whitelist the following IP address in your Databricks database:

  • 52.59.129.235
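
If your allow list is maintained as a set of single addresses and CIDR ranges, a quick check can confirm that the xP&A address is covered. The following is a minimal sketch using Python's standard `ipaddress` module; the function name and the example allow-list entries are hypothetical, and the actual list has to be read from your own Databricks configuration.

```python
import ipaddress

# The address that must be whitelisted, taken from this article.
XPA_ADDRESS = ipaddress.ip_address("52.59.129.235")


def is_allowed(address, allow_list):
    """Return True if address falls inside any entry of allow_list.

    Entries may be single addresses ("52.59.129.235") or CIDR ranges
    ("52.59.129.0/24").
    """
    return any(
        address in ipaddress.ip_network(entry, strict=False)
        for entry in allow_list
    )


# Hypothetical allow-list entries, for illustration only.
allow_list = ["203.0.113.0/24", "52.59.129.235/32"]
covered = is_allowed(XPA_ADDRESS, allow_list)
print("xP&A address covered:", covered)
```

If the check reports that the address is not covered, add it to the allow list before setting up the connection.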

Connecting to Databricks

To connect to Databricks:

Choose one of the following options:

  • Open the Data workspace from the overview on the start page and click + New.
Figure: New data source connection from the Data workspace (the start page of Lucanet xP&A, with the buttons for creating a new data source connection highlighted)
  • Open the model into which you would like to integrate the data, click the + sign next to Data in the overview, and choose New data source:
Figure: New data source connection from within a model (the model view, with the buttons for creating a new data source connection highlighted)

In the Data sources dialog, open the BI/Database tab and choose Databricks.

Figure: 'BI/Database' tab (the Data Sources dialog, with the 'BI/Database' tab highlighted in red)

The New Data Source dialog is displayed as follows:

Figure: 'New Data Source' dialog for Databricks

Configure the steps as described in the following section.

Click Create data source.

Set-up Steps

Step: Choose a connection

Choose an existing connection, or, if you have not configured a connection yet, click New Connection and enter the following in the New Databricks connection dialog:

  • Host name of the Databricks cluster you want to connect to
  • Port of the Databricks cluster you want to connect to
  • HTTP path of the workspace/warehouse you want to connect to
  • Authentication type to be used when connecting to Databricks. Choose one of the following options:

 

Figure: New Databricks connection
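
The connection fields listed above can be sanity-checked before they are entered in the dialog. The following is a minimal sketch, not part of xP&A: the `ConnectionSettings` class, the `find_problems` function, and the example values are all hypothetical placeholders. The authentication type is left out because its options depend on your setup.

```python
from dataclasses import dataclass


@dataclass
class ConnectionSettings:
    """Hypothetical container mirroring the fields of the connection dialog."""
    host: str
    port: int
    http_path: str


def find_problems(settings):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    # The host field expects a bare host name, not a URL.
    if not settings.host.strip() or "/" in settings.host:
        problems.append("host: enter the bare host name, without scheme or path")
    if not 1 <= settings.port <= 65535:
        problems.append("port: must be between 1 and 65535")
    if not settings.http_path.strip():
        problems.append("http_path: must not be empty")
    return problems


# Placeholder values for illustration; copy the real ones from the
# connection details of your Databricks cluster or warehouse.
settings = ConnectionSettings(
    host="dbc-a1b2c3d4-e5f6.cloud.databricks.com",
    port=443,
    http_path="/sql/1.0/warehouses/abc123def456",
)
print(find_problems(settings))
```

A common slip is pasting the full workspace URL, including `https://`, into the host field; the check above flags that case.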

Step: Complete the query form

Enter the following:

  • Data Source Name
  • Query to define the fields which are to be pulled in. For more information, see Defining Database Queries.
  • Name of the date column, whose values must be in one of Databricks' date formats
  • Names of the columns that contain variables (which must have a numeric data type)

Any remaining columns are treated as dimensions and must have a string data type.

An exception is the cohort dimension, which must be a date, with the column header explicitly labelled Cohort.
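
The column rules above can be expressed as a small check that you run against the schema of your query result before creating the data source. This is a minimal sketch under stated assumptions: the function names, the input shape (column name mapped to Databricks type name), and the example schema are hypothetical, and which Databricks date and numeric types xP&A accepts is assumed here rather than confirmed.

```python
# Assumption: these Databricks SQL type names count as date and numeric types.
DATE_TYPES = {"DATE", "TIMESTAMP", "TIMESTAMP_NTZ"}
NUMERIC_TYPES = {"TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL"}


def base_type(type_name):
    """Normalize a type name such as 'decimal(18,2)' to 'DECIMAL'."""
    return type_name.split("(")[0].strip().upper()


def check_query_columns(schema, date_column, variable_columns):
    """Assign a role to every column and collect rule violations.

    schema maps column name -> Databricks type name (hypothetical input shape).
    Returns (roles, problems).
    """
    roles = {}
    problems = []
    for name, type_name in schema.items():
        kind = base_type(type_name)
        if name == date_column:
            roles[name] = "date"
            if kind not in DATE_TYPES:
                problems.append(f"{name}: date column must be a date, found {kind}")
        elif name in variable_columns:
            roles[name] = "variable"
            if kind not in NUMERIC_TYPES:
                problems.append(f"{name}: variable must be numeric, found {kind}")
        elif name == "Cohort":
            # The cohort dimension is the one dimension that must be a date.
            roles[name] = "cohort dimension"
            if kind not in DATE_TYPES:
                problems.append(f"{name}: cohort must be a date, found {kind}")
        else:
            roles[name] = "dimension"
            if kind != "STRING":
                problems.append(f"{name}: dimension must be a string, found {kind}")
    # Columns named in the form must actually be returned by the query.
    for name in [date_column, *variable_columns]:
        if name not in schema:
            problems.append(f"{name}: column not returned by the query")
    return roles, problems


# Hypothetical query result schema, for illustration only.
schema = {
    "month": "DATE",
    "revenue": "DECIMAL(18,2)",
    "headcount": "INT",
    "region": "STRING",
    "Cohort": "DATE",
    "customer_id": "BIGINT",
}
roles, problems = check_query_columns(schema, "month", ["revenue", "headcount"])
print(roles)
print(problems)
```

In this example, `customer_id` is flagged because it would be treated as a dimension but is not a string. One way to resolve such a case is to cast the column in the query itself, for example `CAST(customer_id AS STRING)`.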

