Databricks Widgets in SQL Notebook

Databricks Widgets in SQL Notebook

Databricks widget API enables users to apply different parameters for notebooks and dashboards. It’s best for re-running the same code using different parameter values. When used in dashboards, it creates filters for the dashboard charts.

Databricks supports both python and SQL code for the widget API. In this tutorial, we will talk about how to create widgets using SQL. We will cover:

  • How to create different types of widgets using SQL?
  • How to pass the widget values to notebooks and dashboards?
  • How to configure the widget setup?
  • How to delete widgets?

To learn about python widgets, please refer to my tutorial Databricks Widgets in Python Notebook.

Resources for this post:

  • Databricks notebook with code
  • More video tutorials on Databricks
  • More blog posts on Databricks
  • If you prefer the video version of the tutorial, watch the video below on YouTube
Databricks Widgets in SQL – GrabNGoInfo.com

Step 0: Databricks Widget Types

There are four types of Databricks widgets:

  • text takes text as inputs.
  • dropdown creates a dropdown list with values.
  • combobox is a combination of text and dropdown. Users can either select values from the dropdown list or input their own values.
  • multiselect creates a list of values. Users can select one or more values from the list.

To get the help information about widgets, use dbutils.widgets.help(). We need to add %python at the beginning of the cell because the default language for the notebook is SQL. The output has the methods available for widgets and their syntax.

%python
# Get documentation about widgets
dbutils.widgets.help()

Output:

dbutils.widgets provides utilities for working with notebook widgets. You can create different types of widgets and get their bound value. For more info about a method, use dbutils.widgets.help("methodName").
combobox(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a combobox input widget with a given name, default value and choices
dropdown(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a dropdown input widget a with given name, default value and choices
get(name: String): String -> Retrieves current value of an input widget
getArgument(name: String, optional: String): String -> (DEPRECATED) Equivalent to get
multiselect(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a multiselect input widget with a given name, default value and choices
remove(name: String): void -> Removes an input widget from the notebook
removeAll: void -> Removes all widgets in the notebook
text(name: String, defaultValue: String, label: String): void -> Creates a text input widget with a given name and default value

To get the help information about a specific widget method, use dbutils.widgets.help(methodName). For example, I can use dbutils.widgets.help('combobox') to get information about the combobox method.

%python
# Get help informatuon for one method
dbutils.widgets.help('combobox')

Output:

/**
* Creates a combobox input widget with a given name, default value and choices. Optionally, you
* can provide a label for the combobox widget that will be rendered in place of the name. If a
* widget with a given name already exists, its properties will be overwritten. The default
* value does not have to be one choices.
*
* Example: dbutils.widgets.combobox("product", "Other", Seq("Camera", "GPS", "Smartphone"))
*
* @param name unique name identifying the widget
* @param defaultValue value value which widget is populated by default
* @param choices possible choices for the dropdown menu
* @param label optional widget label
*/
combobox(name: java.lang.String, defaultValue: java.lang.String, choices: scala.collection.Seq, label: java.lang.String): void

Step 1: Read In Dataset

In step 1, A CSV dataset on cryptocurrency prices is read from a mounted S3 bucket. The dataset is a subset of the Kaggle G-Research Crypto Forecasting dataset. To learn how to mount an AWS S3 bucket to Databricks, please refer to my previous tutorial Databricks Mount To AWS S3 And Import Data.

-- mode "FAILFAST" will abort file parsing with a RuntimeException if any malformed lines are encountered
CREATE OR REPLACE TEMPORARY VIEW crypto_100k_records
USING CSV
OPTIONS (path "/mnt/demo4tutorial/data/crypto_100k_records.csv", header "true", mode "FAILFAST");

-- Take a look at the data
SELECT * FROM crypto_100k_records

After reading the data, we will do some data processing. The timestamp is in UNIX epoch format, which is the number of seconds since January 1st of 1970 Coordinated Universal Time (UTC). Using from_unixtime, we changed it to a DateTime format. The columns that are not used in the visualization are dropped. We also created a new column for asset names.

-- Data processing
CREATE OR REPLACE TEMPORARY VIEW df AS
SELECT 
  asset_id,
  volume,
  --Change epoch to datetime format
  from_unixtime(timestamp) AS datetime, 
  -- Create asset name
  CASE WHEN asset_id = 1 THEN 'Bitcoin' WHEN asset_id = 6 THEN 'Ethereum' ELSE 'Other' END AS asset_name
FROM crypto_100k_records;

-- Take a look at the data
SELECT * FROM df

Output:

Cryptocurrency Data for Databricks Widgets – GrabNGoInfo.com

Step 2: Create Databricks Widgets Using SQL

In step 2, we use SQL to create different types of Databricks widgets.

  • The dropdown widget is for the Asset_Name column. It has the name of dropdown_filter and the default value of Bitcoin. There are four choices in the dropdown. BitcoinEthereumOther, and AllAll means selecting all the asset names.
  • The multiselect widget is based on the Asset_Name column too. It has the name of multiselect_filter and the default value of Bitcoin. The three choices, BitcoinEthereum, and Other are the three unique values for the Asset_Name column. We can select multi-values using the multiselect widget.
  • The combobox widget is based on the Asset_ID column. It has the name of combobox_filter and the default value of 0. The unique values of the Asset_ID column is pulled using SELECT DISTINCT.
  • The text widget is based on the Asset_ID column too. It has the name of text_filter and the default value of 0. Users can enter the asset ID into the box.
-- Create a dropdown widget
CREATE WIDGET DROPDOWN dropdown_filter DEFAULT "Bitcoin"  CHOICES (VALUES 'Bitcoin', 'Ethereum', 'Other', 'All');

-- Create a multiselect widget
CREATE WIDGET MULTISELECT multiselect_filter DEFAULT "Bitcoin"  CHOICES (VALUES 'Bitcoin', 'Ethereum', 'Other');

-- Create a combobox widget
CREATE WIDGET COMBOBOX combobox_filter DEFAULT "0"  CHOICES SELECT DISTINCT asset_id FROM df;

-- Create a text widget
CREATE WIDGET TEXT text_filter DEFAULT "0";

After running the code, the widgets show on the top of the notebook.

Databricks combobox, dropdown, multi-select, and text widgets – GrabNGoInfo.com

Step 3: Pass Widget Values in SQL Code

After creating the widgets, in step 3, we will talk about how to pass the widget values using SQL. A widget value can be retrieved by passing the widget name into the getArgument() function.

-- Pass widget value to SQL code
SELECT * 
FROM df
WHERE asset_id = getArgument('text_filter')

Output:

Output after Filtered by Databricks Text SQL Widget – GrabNGoInfo.com

When the dropdown widget has All as an option, we need to treat All and other options differently because All is not a value in the dataframe column. One way to do this is to use the CASE WHEN conditions to include all records if the dropdown widget value is All, and filter by the widget value otherwise.

-- Dropdown widget with All as one option
SELECT 
  datetime,
  volume
FROM df
WHERE CASE getArgument('dropdown_filter') WHEN 'Bitcoin' THEN asset_name = 'Bitcoin'
                                          WHEN 'Ethereum' THEN asset_name = 'Ethereum'
                                          WHEN 'Other' THEN asset_name = 'Other'    
                                          ELSE asset_name IN (SELECT DISTINCT asset_name FROM df) END

Output:

Output after Filtered by Databricks Dropdown SQL Widget – GrabNGoInfo.com

Step 4: Use Widgets As Filters For Dashboard

In step 4, we will talk about how to use widgets as filters for Databricks dashboard.

Firstly, let’s create a chart using Databricks’ built-in tool.

-- Dropdown widget with All as one option
SELECT 
  datetime,
  volume
FROM df
WHERE CASE getArgument('dropdown_filter') WHEN 'Bitcoin' THEN asset_name = 'Bitcoin'
                                          WHEN 'Ethereum' THEN asset_name = 'Ethereum'
                                          WHEN 'Other' THEN asset_name = 'Other'    
                                          ELSE asset_name IN (SELECT DISTINCT asset_name FROM df) END

Click the downward triangle next to the bar chart icon, then select the chart type.

Databricks SQL Create Line Chart – GrabNGoInfo.com

Next, click the Plot Options icon to check if the settings for the chart are correct and make changes if necessary.

Databricks Line Chart with SQL Widget Filter – GrabNGoInfo.com

To create a dashboard, click the bar chart icon on the upper right corner of the cell, then click Add to New Dashboard.

Databricks SQL Notebook Add New Dashboard – GrabNGoInfo.com

This opens the dashboard. We can see the filters on the top of the dashboard.

Databricks SQL Dashboard with Filters – GrabNGoInfo.com

Step 5: Configure Databricks Widgets

In step 5, we will configure the Databricks widgets by clicking the gear icon on the top right of the notebook.

Configure Databricks SQL Widgets – GrabNGoInfo.com

The Widgets Panel Settings window will pop up. Under On Widget Change, there are three options.

  • Run Accessed Commands means that when the widget values change, only the cells that directly retrieve the changed widget are rerun. This is the default setting, but it does not work for SQL cells.
  • Do Nothing means that the notebook will not rerun based on the new widget values.
  • Run Notebook means rerun the whole notebook. I recommend choosing this option to prevent missing some of the important steps in the code.
Databricks SQL Widgets Panel Settings – GrabNGoInfo.com

The widgets panel is pin to the top of the notebook by default, but we can uncheck the Pinned to top option to show it above the first cell.

Databricks SQL Widgets Location – GrabNGoInfo.com

Step 6: Pass Values to Widgets in Another Notebook

Sometimes we may need to run the notebook with specific parameters from another notebook. In that case, we can use %run to run the notebook and pass the parameters at the same time. The sample code below is from the Databricks documentation for widgets.

%python
# Pass parameters to widgets in another notebook
%run /path/to/notebook $X="10" $Y="1"

Step 7: Delete Databricks Widgets

In step 7, we will talk about how to delete Databricks widgets. To delete one widget, use REMOVE WIDGET widget_name.

-- Remove one widget
REMOVE WIDGET text_filter

To remove all widgets, we need to use python code dbutils.widgets.removeAll().

%python
# Remove all widgets
dbutils.widgets.removeAll()

Summary

In this tutorial, we talked about how to create widgets using SQL. You learned:

  • How to create different types of widgets using SQL?
  • How to pass the widget values to notebooks and dashboards?
  • How to configure the widget setup?
  • How to delete widgets?

Note that you cannot use widgets to pass arguments between different languages within a notebook and run it in a scheduled job. So when the widgets are created in SQL, the value of the widgets need to be retrieved by SQL. To learn about how to run Databricks jobs, please refer to my tutorial Databricks Multi-Task Job Scheduling

For more information about data science and machine learning, please check out my YouTube channel and Medium Page or follow me on LinkedIn.

Recommended Tutorials

References

Leave a Comment

Your email address will not be published. Required fields are marked *