Databricks Widgets in Python Notebook

Databricks Widgets in Python Notebook

Databricks widget API enables users to apply different parameters for notebooks and dashboards. It’s best for re-running the same code using different parameter values. When used in dashboards, it creates filters for the dashboard charts.

Databricks supports both python and SQL code for the widget API. In this tutorial, we will talk about how to create widgets using python. We will cover:

  • How to create different types of widgets using python?
  • How to retrieve widgets values?
  • How to pass the widget values to notebooks and dashboards?
  • How to configure the widget setup?
  • How to delete widgets?

To learn about SQL widgets, please refer to my tutorial Databricks Widgets in SQL Notebook.

Resources for this post:

  • Databricks notebook with code
  • More video tutorials on Databricks
  • More blog posts on Databricks
  • If you prefer the video version of the tutorial, watch the video below on YouTube.

Step 0: Databricks Widget Types

There are four types of Databricks widgets:

  • text takes text as inputs.
  • dropdown creates a dropdown list with values.
  • combobox is a combination of text and dropdown. Users can either select values from the dropdown list or input their own values.
  • multiselect creates a list of values. Users can select one or more values from the list.

To get the help information about widgets, use dbutils.widgets.help(). The output has the methods available for widgets and their syntax.

# Get documentation about widgets
dbutils.widgets.help()

Output:

dbutils.widgets provides utilities for working with notebook widgets. You can create different types of widgets and get their bound value. For more info about a method, use dbutils.widgets.help("methodName").
combobox(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a combobox input widget with a given name, default value and choices
dropdown(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a dropdown input widget a with given name, default value and choices
get(name: String): String -> Retrieves current value of an input widget
getArgument(name: String, optional: String): String -> (DEPRECATED) Equivalent to get
multiselect(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a multiselect input widget with a given name, default value and choices
remove(name: String): void -> Removes an input widget from the notebook
removeAll: void -> Removes all widgets in the notebook
text(name: String, defaultValue: String, label: String): void -> Creates a text input widget with a given name and default value

To get the help information about a specific widget method, use dbutils.widgets.help(methodName). For example, I can use dbutils.widgets.help('combobox') to get information about the combobox method.

# Get help informatuon for one method
dbutils.widgets.help('combobox')

Output:

/**
* Creates a combobox input widget with a given name, default value and choices. Optionally, you
* can provide a label for the combobox widget that will be rendered in place of the name. If a
* widget with a given name already exists, its properties will be overwritten. The default
* value does not have to be one choices.
*
* Example: dbutils.widgets.combobox("product", "Other", Seq("Camera", "GPS", "Smartphone"))
*
* @param name unique name identifying the widget
* @param defaultValue value value which widget is populated by default
* @param choices possible choices for the dropdown menu
* @param label optional widget label
*/
combobox(name: java.lang.String, defaultValue: java.lang.String, choices: scala.collection.Seq, label: java.lang.String): void

Step 1: Import Libraries

In the first step, we will import the pyspark SQL functions for data processing. to_timestamp is for processing time data, when is for creating new columns based on conditions, and col is for working with columns.

# Functions for data processing
from pyspark.sql.functions import to_timestamp, when, col

Step 2: Read In Dataset

In step 2, A CSV dataset on cryptocurrency prices is read from a mounted S3 bucket. The dataset is a subset of the Kaggle G-Research Crypto Forecasting dataset. To learn how to mount an AWS S3 bucket to Databricks, please refer to my previous tutorial Databricks Mount To AWS S3 And Import Data.

# Read in CSV data
df = (spark.read.format('csv')
  .option("inferSchema", True)
  .option("header", True)
  .option("sep", ',')
  .load("/mnt/demo4tutorial/data/crypto_100k_records.csv"))
 
# Take a look at the data
display(df)

After reading the data, we will do some data processing. The timestamp is in UNIX epoch format, which is the number of seconds since January 1st of 1970 Coordinated Universal Time (UTC). Using to_timestamp, we changed it to a DateTime format. The columns that are not used in the visualization are dropped. We also created a new column for asset names.

# Change epoch to datetime format and drop unwanted columns
df = df.withColumn('DateTime', to_timestamp(df['timestamp'])).drop('timestamp', 'Count', 'Open', 'High', 'Low', 'Close', 'VWAP', 'Target')
 
# Create asset name
df = df.withColumn('Asset_Name', when(df['Asset_ID']==1, 'Bitcoin')
                                  .when(df['Asset_ID']==6, 'Ethereum')
                                  .otherwise('Other'))

# Take a look at the data    
display(df)
CryptoCurrency Data for Databricks Widgets – GrabNGoInfo.com

Step 3: Create Databricks Widgets Using Python

In step 3, we use python to create different types of Databricks widgets.

  • The dropdown widget is for the Asset_Name column. It has the name of dropdown_filter and the default value of Bitcoin. There are four choices in the dropdown. BitcoinEthereumOther, and AllAll means selecting all the asset names.
  • The multiselect widget is based on the Asset_Name column too. It has the name of multiselect_filter and the default value of Bitcoin. The three choices, BitcoinEthereum, and Other are the three unique values for the Asset_Name column. We can select multi-values using the multiselect widget.
  • The combobox widget is based on the Asset_ID column. It has the name of combobox_filter and the default value of 0. The unique values of the Asset_ID column is pulled using the distinct() function.
  • The text widget is based on the Asset_ID column too. It has the name of text_filter and the default value of 0. Users can enter the asset ID into the box.
# Create a dropdown widget
dbutils.widgets.dropdown(name='dropdown_filter', defaultValue='Bitcoin', choices=['Bitcoin', 'Ethereum', 'Other', 'All'], label='Select asset from the dropdown')

# Create a multiselect widget
dbutils.widgets.multiselect(name='multiselect_filter', defaultValue='Bitcoin', choices=['Bitcoin', 'Ethereum', 'Other'], label='Select multiple assets')

# Get unique asset id
unique_asset_id = [str(df.select('Asset_ID').distinct().collect()[i][0]) for i in range(len(df.select('Asset_ID').distinct().collect()))]

# Create a combobox widget
dbutils.widgets.combobox(name='combobox_filter', defaultValue='0', choices=unique_asset_id, label='Select or enter asset ID')

# Create a text widget
dbutils.widgets.text(name='text_filter', defaultValue='0', label='Enter asset ID')

Step 4: Get Values From Databricks Widgets

After creating the widgets, in step 4, we will check the value of each widget. A widget value can be retrieved by passing the widget name into dbutils.widgets.get(). We saved the retrieved widget values into variables and printed it out.

# Save the dropdown widget value into a variable
dropdown_filter_value = dbutils.widgets.get("dropdown_filter")

# Save the multi-select widget value into a variable
multiselect_filter_value = dbutils.widgets.get("multiselect_filter")

# Save the combobox widget value into a variable
combobox_filter_value = dbutils.widgets.get("combobox_filter")

# Save the text widget value into a variable
text_filter_value = dbutils.widgets.get("text_filter")

# Print the widget values
print(f'The dropdown filter value is {dropdown_filter_value}.\nThe multiselect filter value is {multiselect_filter_value}.\nThe combobox filter value is {combobox_filter_value}.\nThe text filter value is {text_filter_value}.')

Output:

The dropdown filter value is Bitcoin.
The multiselect filter value is Bitcoin,Ethereum.
The combobox filter value is 9.
The text filter value is 0.

Step 5: Pass Widget Values in Python Code

In step 5, we will talk about how to pass the widget values using python.

We can use dbutils.widgets.get() function to pull the value directly from the widget like the code below.

# Pass widget values using get()
display(df.filter(col('Asset_ID')==dbutils.widgets.get('text_filter')))
Output after Filtered by Databricks Text Widget – GrabNGoInfo.com

Or alternatively, we can pass the value using the variable created in step 4. In this example, we take the variable called multiselect_filter_value, split the string using comma as the delimeter, and filter the dataframe by checking if the asset name is in the multiselect filter list.

# Pass widget values using the varaibles created in step 4.
display(df.filter(col('Asset_Name').isin(multiselect_filter_value.split(','))))
Output after Filtered by Databricks Multiselect Widget – GrabNGoInfo.com

When the dropdown widget has All as an option, we need to treat All and other options differently because All is not a value in the dataframe column. One way to do this is to use if else conditions to display all records if the dropdown widget value is All, and filter by the widget value otherwise.

# Pass dropdown value with All as one option
if dropdown_filter_value == 'All':
    display(df)
else:
    display(df.filter(col('Asset_Name').isin(dropdown_filter_value)))
Output after Filtered by Databricks Dropdown Widget – GrabNGoInfo.com

Step 6: Use Widgets As Filters For Dashboard

In step 6, we will talk about how to use widgets as filters for Databricks dashboard.

Firstly, let’s create a time series chart for volume using Databricks’s built-in tool.

# Example for dashboard filter
if dropdown_filter_value == 'All':
    display(df)
else:
    display(df.filter(col('Asset_Name').isin(dropdown_filter_value)))

Click the downward triangle next to the bar chart icon, then select the line chart option.

Databricks Create Line Chart – GrabNGoInfo.com

Next, click the Plot Options icon to check if the settings for the chart are correct and make changes if necessary.

Databricks Line Chart with Widget Filter – GrabNGoInfo.com

To create a dashboard, click the bar chart icon on the upper right corner of the cell, then click Add to New Dashboard.

Databricks Add New Dashboard – GrabNGoInfo.com

This opens the dashboard. We can see the filters on the top of the dashboard.

Databricks Dashboard with Filters – GrabNGoInfo.com

To learn more about how to create a Databricks dashboard, please refer to my tutorial Databricks Dashboard For Big Data

Step 7: Configure Databricks Widgets

In step 7, we will configure the Databricks widgets by clicking the gear icon on the top right of the notebook.

Configure Databricks Widgets – GrabNGoInfo.com

The Widgets Panel Settings window will pop up. Under On Widget Change, there are three options.

  • Run Accessed Commands means that when the widget values change, only the cells that directly retrieve the changed widget are rerun. This is the default setting, but it does not work for SQL cells.
  • Do Nothing means that the notebook will not rerun based on the new widget values.
  • Run Notebook means rerun the whole notebook. I recommend choosing this option to prevent missing some of the important steps in the code.
Databricks Widgets Panel Settings – GrabNGoInfo.com

The widgets panel is pin to the top of the notebook by default, but we can uncheck the Pinned to top option to show it above the first cell.

Databricks Widgets Location – GrabNGoInfo.com

Step 8: Pass Values to Widgets in Another Notebook

Sometimes we may need to run the notebook with specific parameters from another notebook. In that case, we can use %run to run the notebook and pass the parameters at the same time. The sample code below is from the Databricks documentation for widgets.

# Pass parameters to widgets in another notebook
%run /path/to/notebook $X="10" $Y="1"

Step 9: Delete Databricks Widgets

In step 9, we will talk about how to delete Databricks widgets. To delete one widget, use dbutils.widgets.remove("widget_name").

# Remove one widget
dbutils.widgets.remove("text_filter")

To remove all widgets, use dbutils.widgets.removeAll().

# Remove all widgets
dbutils.widgets.removeAll()

After removing a widget, we cannot create new widgets in the same cell. New widgets need to be created in a separate cell.

Summary

In this tutorial, we talked about how to create widgets using python. You learned:

  • How to create different types of widgets using python?
  • How to retrieve widgets values?
  • How to pass the widget values to notebooks and dashboards?
  • How to configure the widget setup?
  • How to delete widgets?

Note that you cannot use widgets to pass arguments between different languages within a notebook and run it in a scheduled job. So when the widgets are created in python, the value of the widgets need to be retrieved by python as well. To learn about how to run Databricks jobs, please refer to my tutorial Databricks Multi-Task Job Scheduling

For more information about data science and machine learning, please check out my YouTube channel and Medium Page or follow me on LinkedIn.

Recommended Tutorials

References

Leave a Comment

Your email address will not be published. Required fields are marked *