Databricks Dashboard For Big Data

Databricks provides a dashboard view of the notebook results. Users can choose which output or charts to include in the dashboard with a single click. The dashboard can be shared in a presentation format with others. It can also be published and shared as a link. Compared to other visualization tools, Databricks dashboard has the advantage of being able to visualize very large datasets.

In this tutorial, we will talk about:

  • How to create filters for Databricks dashboard using widgets?
  • How to create tables and charts in Databricks dashboard?
  • How to format a Databricks dashboard?
  • How to switch between notebook view and dashboard view?
  • How to share the notebook with others?
  • How to delete a Databricks dashboard?

Databricks Dashboard Visualization – GrabNGoInfo.com

Step 1: Import Libraries

In the first step, we will import the pyspark SQL functions for data processing.

# Functions for data processing
import pyspark.sql.functions as F

Step 2: Read In Dataset

In step 2, a CSV dataset of cryptocurrency prices is read from a mounted S3 bucket. To learn how to mount an AWS S3 bucket to Databricks, please refer to my previous tutorial Databricks Mount To AWS S3 And Import Data.

# Read in CSV data
df = (spark.read.format('csv')
  .option("inferSchema", True)
  .option("header", True)
  .option("sep", ',')
  .load("/mnt/demo4tutorial/data/crypto_100k_records.csv"))

# Take a look at the data
display(df)

After reading the data, we do some data processing. The timestamp is in UNIX epoch format, which is the number of seconds elapsed since 00:00:00 Coordinated Universal Time (UTC) on January 1, 1970. Using F.to_timestamp, we convert it to a timestamp (datetime) type. The columns that are not used in the visualization are dropped. We also create a new column for asset names.

# Change epoch to datetime format and drop unwanted columns
df = df.withColumn('DateTimeType', F.to_timestamp(df['timestamp'])).drop('timestamp', 'Count','High', 'Low', 'Close', 'VWAP', 'Target')
 
# Create asset name
df = df.withColumn('Asset_Name', F.when(df['Asset_ID']==1, 'Bitcoin')
                                  .when(df['Asset_ID']==6, 'Ethereum')
                                  .otherwise('Other'))

# Take a look at the data    
display(df)
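
The two transformations above can be sanity-checked locally without a cluster. The following is a minimal plain-Python sketch, not the Spark implementation: the function names epoch_to_utc and asset_name are made up for illustration, and the sample epoch value is hypothetical rather than taken from the dataset. Note that Spark renders timestamps in the session time zone, so the displayed value in a notebook may differ from UTC.

```python
from datetime import datetime, timezone

def epoch_to_utc(epoch_seconds):
    """Convert UNIX epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

def asset_name(asset_id):
    """Mirror the when/otherwise chain: 1 is Bitcoin, 6 is Ethereum, anything else is Other."""
    return {1: "Bitcoin", 6: "Ethereum"}.get(asset_id, "Other")

# Hypothetical sample values for illustration only
print(epoch_to_utc(1514764860).isoformat())
print(asset_name(1), asset_name(6), asset_name(3))
```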

Step 3: Create Dashboard Filters Using Widget

In step 3, we will create dashboard filters using widgets. Widgets enable the dashboard to rerun with different parameters. There are four types of widgets:

  • text takes text as inputs.
  • dropdown creates a dropdown list with values.
  • combobox is a combination of text and dropdown. Users can either select values from the dropdown list or input their own values.
  • multiselect creates a list of values. Users can select one or more values from the list.
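
For reference, the other three widget types can be created in much the same way as multiselect. The sketch below is only an illustration: the widget names, default values, and labels are hypothetical, and the helper create_example_widgets is not part of Databricks. It takes any object that exposes text, dropdown, and combobox methods, which in a Databricks notebook would be dbutils.widgets.

```python
def create_example_widgets(widgets):
    """Create one text, one dropdown, and one combobox widget.

    `widgets` is expected to behave like dbutils.widgets.
    All names, defaults, and labels here are hypothetical.
    """
    choices = ["Bitcoin", "Ethereum", "Other"]
    # text: name, default value, label
    widgets.text("note", "", "Free-text Note")
    # dropdown: name, default value, choices, label
    widgets.dropdown("asset", "Bitcoin", choices, "Select One Asset")
    # combobox: name, default value, choices, label
    widgets.combobox("asset_or_custom", "Bitcoin", choices, "Select Or Type An Asset")

# dbutils is predefined in a Databricks notebook; skip the call elsewhere
if "dbutils" in globals():
    create_example_widgets(dbutils.widgets)
```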

In this tutorial, we will use multiselect as an example.

Before creating the new widget, we first remove all existing widgets using the removeAll() function. When creating a multiselect widget, we need to give it a name, a default value, a list of values to choose from, and the label displayed next to the filter.

# Remove existing widgets if there are any
dbutils.widgets.removeAll()

# Create widget
dbutils.widgets.multiselect('name', 'Bitcoin', ['Bitcoin', 'Ethereum', 'Other'], 'Select Crypto Currency Name')
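
The chart cells later in this tutorial read the widget with dbutils.widgets.get("name") and split the result on commas, because a multiselect widget returns its selections as one comma-separated string. A small sketch of that parsing step is shown below. The helper name parse_multiselect is made up for illustration, and the whitespace stripping is an extra precaution rather than something the tutorial code does.

```python
def parse_multiselect(raw_value):
    """Split the comma-separated string from a multiselect widget into a list of values."""
    return [item.strip() for item in raw_value.split(",") if item.strip()]

# Hypothetical widget value with two selections
print(parse_multiselect("Bitcoin,Ethereum"))
```

In a notebook, the argument would be dbutils.widgets.get("name").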

After creating the widget, we can configure it by clicking the gear icon in the upper-right corner of the notebook.

Clicking this icon opens the widget configuration. There are three settings to choose from:

  • Run Notebook reruns the entire notebook when there is a change in the widget value selection.
  • Run Accessed Commands reruns only the cells that retrieve the values for the particular widget.
  • Do Nothing does not rerun any cell when there is a change in widget value selection.

Databricks Widget Configuration – GrabNGoInfo.com

To make sure the dashboard is updated with the newly selected values, either Run Notebook or Run Accessed Commands needs to be selected. We will choose Run Accessed Commands in this example because we only want to rerun the cells that are in the dashboard.

Step 4: Create Charts

In step 4, we will create some commonly used charts using the Databricks built-in chart functions in the notebook.

Step 4.1: Line Chart

Step 4.1 creates a line chart. We first select the columns needed for the chart, then filter the data to include only the values selected in the widget. After that, we group the data by date and asset name and calculate the average open price. Finally, we sort the data by date.

# Line chart: daily average open price for the assets selected in the widget
display(df.select('Asset_Name', 'DateTimeType', 'Open')
          .filter(F.col('Asset_Name').isin(dbutils.widgets.get("name").split(",")))
          .groupby(F.to_date('DateTimeType').alias('date'), 'Asset_Name').agg(F.avg('Open').alias('average open price'))
          .sort('date'))
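
To make the filter, group, average, and sort steps concrete, here is a minimal plain-Python sketch of the same logic on a handful of rows. It is not the Spark code path: the function name daily_average_open and the sample rows are hypothetical and exist only to illustrate what the aggregation produces.

```python
from collections import defaultdict
from datetime import datetime

def daily_average_open(rows, selected_names):
    """Average the open price per (date, asset name) for the selected assets, sorted by date."""
    sums = defaultdict(lambda: [0.0, 0])  # (date, name) -> [running total, row count]
    for name, timestamp, open_price in rows:
        if name not in selected_names:
            continue  # same role as the isin() filter
        key = (timestamp.date(), name)
        sums[key][0] += open_price
        sums[key][1] += 1
    return sorted((day, name, total / count) for (day, name), (total, count) in sums.items())

# Hypothetical sample rows: (Asset_Name, DateTimeType, Open)
rows = [
    ("Bitcoin", datetime(2018, 1, 1, 0, 1), 100.0),
    ("Bitcoin", datetime(2018, 1, 1, 0, 2), 200.0),
    ("Ethereum", datetime(2018, 1, 1, 0, 1), 10.0),
    ("Bitcoin", datetime(2018, 1, 2, 0, 1), 300.0),
]
for row in daily_average_open(rows, {"Bitcoin"}):
    print(row)
```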

The default output is a data table, but we can click the downward arrow next to the bar chart icon and select the line chart option.

To make changes to the chart, click Plot Options below the chart.

We can change Keys, Series groupings, Values, and Aggregation calculations. We can also change the Y-axis Range, Show Points, and make the color consistent for groups.

The legend for the groups is clickable. We can show a subset of groups by deselecting some groups.

Databricks Line Chart – GrabNGoInfo.com

Step 4.2: Bar Chart

Step 4.2 creates a bar chart for cryptocurrency volume.

# Bar chart: daily transaction volume for the assets selected in the widget
display(df.select('Asset_Name', 'DateTimeType', 'Volume')
          .filter(F.col('Asset_Name').isin(dbutils.widgets.get("name").split(",")))
          .groupby(F.to_date('DateTimeType').alias('date'), 'Asset_Name').agg(F.sum('Volume').alias('transaction volume'))
          .sort('date'))

To create a grouped bar chart, choose Grouped in Plot Options.

Because the Global color consistency is selected, the color for different groups is consistent with the previous line chart.

We can also create a stacked bar chart by selecting Stacked in Plot Options.

To get the percent stacked bar chart, choose 100% stacked in Plot Options.

Step 4.3: Histogram

Step 4.3 creates a histogram for cryptocurrency volume distributions.

# Histogram: raw volume values for the assets selected in the widget
display(df.select('Asset_Name', 'DateTimeType', 'Volume')
          .filter(F.col('Asset_Name').isin(dbutils.widgets.get("name").split(",")))
       )

When there is more than one group, use the group category name as the key. This produces one histogram for each group. We can change the number of bins for the histogram.
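
To see what changing the number of bins does, the sketch below counts values in equal-width bins. This is an assumption made for illustration: it shows the general idea of a histogram, not necessarily how the Databricks chart computes its bins. The function name histogram_counts and the sample values are hypothetical.

```python
def histogram_counts(values, num_bins):
    """Count values in num_bins equal-width bins spanning min(values) to max(values)."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical, so everything falls in the first bin
        return [len(values)] + [0] * (num_bins - 1)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for value in values:
        # The maximum value is placed in the last bin instead of a bin of its own
        index = min(int((value - low) / width), num_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical volume values
volumes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(histogram_counts(volumes, 5))
print(histogram_counts(volumes, 2))
```

Fewer bins give a coarser picture of the distribution, while more bins show finer detail at the cost of noisier counts.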

Databricks Histogram – GrabNGoInfo.com

Step 4.4: Pie Chart

Step 4.4 creates a pie chart for cryptocurrency volume.

We set Asset_Name as the key and take the sum of the volume by asset name. The pie chart can also be displayed as a donut chart by selecting the Donut option.

Databricks Pie Chart – GrabNGoInfo.com

The pie chart shows the volume percent of the selected cryptocurrency names.
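
The percentages in the pie chart correspond to each asset's summed volume divided by the total volume of the selected assets. A minimal plain-Python sketch of that calculation follows. The function name volume_share_by_asset and the sample rows are hypothetical and are not part of the tutorial notebook.

```python
from collections import defaultdict

def volume_share_by_asset(rows):
    """Return each asset's share of total volume as a percentage."""
    totals = defaultdict(float)
    for name, volume in rows:
        totals[name] += volume
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing to chart
    return {name: 100.0 * total / grand_total for name, total in totals.items()}

# Hypothetical sample rows: (Asset_Name, Volume)
sample_rows = [("Bitcoin", 30.0), ("Ethereum", 10.0), ("Bitcoin", 20.0), ("Other", 40.0)]
print(volume_share_by_asset(sample_rows))
```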

Step 4.5: Area Chart

Step 4.5 creates an area chart for cryptocurrency volume.

We set the date as the key, Asset_Name as the grouping, and transaction volume as the values. The area chart has three options: Overlapped, Stacked, and 100% Stacked. We can choose the option that best fits the business needs.
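
The difference between the three options comes down to how the values for each group are combined at every key. Overlapped plots the raw values, Stacked plots running totals across groups, and 100% Stacked scales those running totals so the top layer always reaches 100. The sketch below illustrates the general idea only: the function name stack_series and the sample numbers are hypothetical, and it is not the Databricks rendering code.

```python
def stack_series(series_by_group, normalize=False):
    """Return running totals across groups at each point; scale to 100% if normalize is True."""
    group_names = list(series_by_group)
    num_points = len(series_by_group[group_names[0]])
    stacked = {name: [] for name in group_names}
    for i in range(num_points):
        total = sum(series_by_group[name][i] for name in group_names)
        running = 0.0
        for name in group_names:
            running += series_by_group[name][i]
            value = running
            if normalize:
                value = 100.0 * running / total if total else 0.0
            stacked[name].append(value)
    return stacked

# Hypothetical daily transaction volume for two dates
volume = {"Bitcoin": [10.0, 30.0], "Ethereum": [30.0, 10.0]}
print(stack_series(volume))                  # Stacked
print(stack_series(volume, normalize=True))  # 100% Stacked
```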

Databricks Area Chart – GrabNGoInfo.com

Step 5: Create Databricks Dashboard

In step 5, we will talk about how to create a new Databricks dashboard.

To create a new dashboard, click the picture icon in the menu, and click the last item, + New Dashboard.

Databricks Create New Dashboard – GrabNGoInfo.com

All the existing markdown cells and outputs in the notebook are automatically included in the dashboard. We can delete unwanted content by clicking the cross icon in the upper-right corner of the content.

Notebook outputs generated after the dashboard is created are not included automatically. To include new content, click the bar chart icon in the upper-right corner of the cell and check the dashboard name.

Step 6: Format a Databricks Dashboard

Databricks Dashboard tables and charts can be moved around by clicking and dragging.

We can also resize dashboard objects by dragging the arrow in the lower-right corner. Using markdown cells, we can include headers, text, images, math equations, and more in the dashboard.

My previous tutorial Databricks Notebook Markdown Cheat Sheet covers how to use Databricks markdown.

Step 7: Switch Between Notebook View and Dashboard View

In step 7, we will talk about how to switch between a notebook and its dashboard.

To switch from a notebook view to a dashboard view, click the picture icon in the menu and select the dashboard name under the Dashboards section. One notebook can have multiple dashboards associated with it.

To switch from a dashboard view to a notebook view, we can click the picture icon in the menu and select Standard.

We can also open the dashboard view by clicking the bar chart icon in the upper-right corner of the cell, and then clicking the square-and-arrow icon at the end of the dashboard name.

To get back to the notebook, we can click the notebook name link on the right pane under the dashboard name.

Databricks Switch Between Notebook And Dashboard – GrabNGoInfo.com

Step 8: Share Databricks Dashboard

A Databricks dashboard can be shared with others by clicking the lock icon in the notebook menu. In the Databricks Community Edition, permission control is disabled, but users can publish the notebook by clicking the share button in the upper-right corner. Anyone with the link can access the notebook and dashboard.

To present the dashboard results to others, click the Present Dashboard button on the right panel of the dashboard.

To delete a dashboard, click the red Delete this dashboard button.

Summary

In this tutorial, we covered:

  • How to create filters for Databricks dashboard using widgets?
  • How to create tables and charts for Databricks dashboard?
  • How to format a Databricks dashboard?
  • How to switch between notebook view and dashboard view?
  • How to share the notebook with others?
  • How to delete a Databricks dashboard?

For more information about data science and machine learning, please check out my YouTube channel and Medium Page or follow me on LinkedIn.
