Databricks GitHub Repo Integration Setup

Databricks supports integration with version control tools such as GitHub and Bitbucket. In this tutorial, we will talk about how to set up integration with GitHub repositories. You will learn:

  • How to get a GitHub token for Databricks integration
  • How to create a GitHub repo for Databricks
  • How to set up Git integration on Databricks
  • How to add a GitHub repo to Databricks

Note that the Databricks free community edition does not support Git integration. You need to upgrade to the paid Premium plan to integrate with GitHub. To learn how to upgrade, check out my tutorial on Databricks Community Edition Upgrade To Paid Plan AWS Setup. If you decide to skip the Databricks free community edition and start the Premium plan directly, check out my tutorial on Databricks AWS Account Setup.

Resources for this post:

  • More video tutorials on Databricks
  • More blog posts on Databricks
  • If you prefer the video version of the tutorial, please check out the video on YouTube

Let’s get started!

Step 1: Get GitHub Token For Databricks Integration

First, let’s create a token for the Databricks integration.

Step 1.1: After logging in to your GitHub account, click the profile picture in the upper-right corner and select Settings.

Step 1.2: On the settings page, click Developer settings at the bottom of the left pane.

Step 1.3: On the Developer settings page, click Personal access tokens on the left pane.

Step 1.4: Click Generate new token.

Step 1.5: Confirm access by entering the password for the GitHub account, then give the token a name under Note.

Step 1.6: Choose the expiration period for the token. A shorter expiration period is more secure, and a longer one is more convenient. I chose No expiration here.

Step 1.7: Select the scopes that define the access for personal tokens. I selected repo.

Step 1.8: Click the green Generate token button to generate the token. You may be asked to confirm the password.

After the token is successfully generated, copy the token and save it in a secure place. You won’t be able to see the token again.
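Before pasting the token into Databricks, it can be useful to confirm that it works. The snippet below is a minimal sketch of one way to do that, not part of the steps above: it builds an authenticated request to GitHub's `GET /user` REST endpoint with the Python standard library. The function names `build_token_check_request` and `fetch_login` are hypothetical, and the token value is whatever you copied in step 1.8.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_token_check_request(token: str) -> urllib.request.Request:
    """Build a GET request for the profile of the user who owns the token."""
    return urllib.request.Request(
        f"{GITHUB_API}/user",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="GET",
    )


def fetch_login(token: str) -> str:
    """Send the request and return the GitHub login the token belongs to."""
    with urllib.request.urlopen(build_token_check_request(token)) as resp:
        return json.load(resp)["login"]
```

Calling `fetch_login` with a valid token should return your GitHub username. With an invalid or expired token, `urllib.request.urlopen` raises `urllib.error.HTTPError`.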

Step 2: Create GitHub Repo For Databricks

Step 2.1: Go back to the GitHub homepage and click the green Create repository button in the upper-left corner of the page.

Step 2.2: Give the repository a name, and choose Add a README file. The readme file enables us to write a long description for the project. Click the green Create repository button.

Step 2.3: Click the pencil icon to edit the readme file for the repository. I included a brief description of GrabNGoInfo, the website link, YouTube channel link, and the subscription link. After editing the repo description, click the green Commit changes button.

Step 2.4: Click Code on the menu, then click the green Code button, copy the URL for the repo, and save it somewhere for later use.
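If you prefer to script step 2 instead of clicking through the UI, the repo can also be created through GitHub's REST API. The snippet below is a rough sketch under a few assumptions: it uses the `POST /user/repos` endpoint with the token from step 1, the helper name `build_create_repo_request` is hypothetical, and the repo name and description are placeholders. Setting `auto_init` to true asks GitHub to create the initial commit with a README, which mirrors the Add a README file option in step 2.2.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_create_repo_request(
    token: str, name: str, description: str = ""
) -> urllib.request.Request:
    """Build a POST request that creates a repo for the authenticated user."""
    payload = {
        "name": name,
        "description": description,
        "auto_init": True,  # create an initial commit with a README
    }
    return urllib.request.Request(
        f"{GITHUB_API}/user/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it. The JSON response includes a `clone_url` field, which is the same HTTPS URL that step 2.4 copies from the Code button.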

Step 3: Set Up Git Integration On Databricks

Step 3.1: After logging in to the Databricks account, go to the User Settings page by clicking Settings, then User Settings.

Step 3.2: On the user settings page, select Git Integration. Choose GitHub under Git provider, enter the GitHub username or email, paste the GitHub personal access token we just created, and click the blue Save button.

You will see a green Successfully saved message.
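The same Git credential can also be saved through the Databricks REST API. The snippet below is a sketch based on my understanding of the Git Credentials API (`POST /api/2.0/git-credentials`), so please check the field names against the Databricks documentation for your workspace. The workspace URL, the Databricks personal access token (which is separate from the GitHub token), and the helper name `build_git_credential_request` are all assumptions for illustration.

```python
import json
import urllib.request


def build_git_credential_request(
    workspace_url: str,
    databricks_token: str,
    github_username: str,
    github_token: str,
) -> urllib.request.Request:
    """Build a POST request that stores a GitHub credential in Databricks."""
    payload = {
        "git_provider": "gitHub",
        "git_username": github_username,
        "personal_access_token": github_token,  # the token from step 1
    }
    return urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.0/git-credentials",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Databricks token authenticates the API call itself
            "Authorization": f"Bearer {databricks_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` would do the same job as filling in the Git Integration form in step 3.2.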

Step 4: Add GitHub Repo To Databricks

Step 4.1: Select Repos from the sidebar and click Add Repo.

Step 4.2: Paste the Git repo URL saved in step 2. The version control tool name and the repo name will be automatically populated. Click the blue Create button to add the repo.

A popup window shows up saying “NOTE: Databricks Repos will only clone Databricks notebooks from the remote repo. This repo does not contain any Databricks notebooks.” This is expected because we just created the repo, and there are no notebooks in it yet. Click the blue OK button.
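To see why the provider and repo name in step 4.2 can be filled in automatically, note that both are already encoded in the repo URL. The snippet below is only an illustration of that idea, not how Databricks actually implements it. The `PROVIDERS` mapping and the function name `describe_repo_url` are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical mapping from Git host name to a provider label
PROVIDERS = {"github.com": "GitHub", "bitbucket.org": "Bitbucket"}


def describe_repo_url(url: str) -> tuple[str, str]:
    """Return (provider label, repo name) derived from an HTTPS repo URL."""
    parsed = urlparse(url)
    provider = PROVIDERS.get(parsed.hostname or "", "Unknown")
    # The repo name is the last path segment, without a trailing ".git"
    name = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return provider, name


print(describe_repo_url("https://github.com/example-user/tutorials.git"))
# → ('GitHub', 'tutorials')
```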

Step 4.3: Under Repos, click the user name. We can see that the repo called “tutorials” has been added to Databricks.
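Step 4 can be scripted as well. The snippet below is a sketch based on my understanding of the Databricks Repos API (`POST /api/2.0/repos`), so please verify the details against the Databricks documentation before relying on it. The workspace URL, the Databricks personal access token, the user name, and the helper name `build_add_repo_request` are assumptions for illustration.

```python
import json
import urllib.request


def build_add_repo_request(
    workspace_url: str,
    databricks_token: str,
    repo_url: str,
    user_name: str,
    repo_name: str,
) -> urllib.request.Request:
    """Build a POST request that adds a GitHub repo under /Repos/<user>."""
    payload = {
        "url": repo_url,  # the URL saved in step 2.4
        "provider": "gitHub",
        "path": f"/Repos/{user_name}/{repo_name}",
    }
    return urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.0/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {databricks_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending this request with `urllib.request.urlopen` should leave the workspace in the same state as step 4.3, with the repo listed under your user name in Repos.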

Summary

In this tutorial, we talked about how to set up integration with GitHub repositories. You learned:

  • How to get a GitHub token for Databricks integration
  • How to create a GitHub repo for Databricks
  • How to set up Git integration on Databricks
  • How to add a GitHub repo to Databricks

For more information about data science and machine learning, please check out my YouTube channel and Medium Page or follow me on LinkedIn.
