Connect to Google Cloud Storage

Note

This article describes legacy patterns for configuring access to GCS. Databricks recommends using Unity Catalog to configure access to GCS and volumes for direct interaction with files. See Connect to cloud object storage using Unity Catalog.

This article describes how to configure a connection from Databricks to read and write tables and data stored on Google Cloud Storage (GCS).

Access GCS buckets using Google Cloud service accounts on clusters

You can access GCS buckets using Google Cloud service accounts on clusters. You must grant the service account permission to read from and write to the GCS bucket. Databricks recommends giving this service account the least privileges needed to perform its tasks. You can then associate that service account with a Databricks cluster.

You can connect to the bucket directly using the service account email address (recommended approach) or a key that you generate for the service account.

Important

The service account must be in the Google Cloud project that you used to set up the Databricks workspace.

The Google Cloud user who creates the service account must have permission to create service accounts and to grant the roles needed to read from and write to a GCS bucket.

The Databricks user who adds the service account to a cluster must have the Can manage permission on the cluster.

Step 1: Set up Google Cloud service account using Google Cloud Console

  1. Click IAM and Admin in the left navigation pane.

  2. Click Service Accounts.

  3. Click + CREATE SERVICE ACCOUNT.

  4. Enter the service account name and description.

  5. Click CREATE.

  6. Click CONTINUE.

  7. Click DONE.

  8. Navigate to the Google Cloud Console list of service accounts and select a service account.

    Copy the associated email address. You will need it when you set up Databricks clusters.
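
If you prefer to script this step rather than click through the console, the following is a minimal sketch using the google-api-python-client library with Application Default Credentials (an assumption; any tool that calls the IAM API works). The project ID and account ID are hypothetical placeholders.

import googleapiclient.discovery

# Build the IAM API client; authentication uses Application Default Credentials.
service = googleapiclient.discovery.build("iam", "v1")

# Create the service account in the project that hosts your Databricks workspace.
# "my-gcp-project" and "databricks-gcs-access" are hypothetical names.
account = service.projects().serviceAccounts().create(
    name="projects/my-gcp-project",
    body={
        "accountId": "databricks-gcs-access",
        "serviceAccount": {"displayName": "Databricks GCS access"},
    },
).execute()

# The email address is what you will paste into the cluster configuration.
print(account["email"])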

Step 2: Configure your GCS bucket

Create a bucket

If you do not already have a bucket, create one:

  1. Click Storage in the left navigation pane.

  2. Click CREATE BUCKET.

  3. Name your bucket. Pick a globally unique and permanent name that complies with Google’s naming requirements for GCS buckets.

    Important

    To work with DBFS mounts, your bucket name must not contain an underscore.

  4. Click CREATE.

Configure the bucket

Configure the bucket:

  1. Configure the bucket details.

  2. Click the Permissions tab.

  3. Next to the Permissions label, click ADD.

  4. Grant the service account the desired role on the bucket from the Cloud Storage roles:

    • Storage Admin: Grants full privileges on this bucket.

    • Storage Object Viewer: Grants read and list permissions on objects in this bucket.

  5. Click SAVE.
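
Granting the role can also be scripted. The sketch below uses the google-cloud-storage Python library; the bucket name and service account email are hypothetical, and roles/storage.objectViewer can be swapped for roles/storage.admin if the cluster needs write access.

from google.cloud import storage

# Hypothetical names; substitute your own bucket and service account email.
BUCKET_NAME = "my-databricks-bucket"
SA_EMAIL = "databricks-gcs-access@my-gcp-project.iam.gserviceaccount.com"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# Fetch the bucket's IAM policy and append a binding for the service account.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append(
    {"role": "roles/storage.objectViewer", "members": {f"serviceAccount:{SA_EMAIL}"}}
)
bucket.set_iam_policy(policy)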

Step 3: Configure a Databricks cluster

When you configure your cluster, expand Advanced Options and set the Google Service Account field to your service account email address.
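
The same setting can be supplied when creating a cluster through the Clusters API. The sketch below is an assumption about the request shape on Databricks for Google Cloud (the google_service_account field under gcp_attributes); the workspace URL, token, and cluster values are placeholders.

import requests

resp = requests.post(
    "https://<workspace-url>/api/2.0/clusters/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "cluster_name": "gcs-cluster",
        "spark_version": "<spark-version>",
        "node_type_id": "<node-type-id>",
        "num_workers": 1,
        # Assumed field for attaching the service account to the cluster.
        "gcp_attributes": {
            "google_service_account": "databricks-gcs-access@my-gcp-project.iam.gserviceaccount.com"
        },
    },
)
resp.raise_for_status()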

Use both cluster access control and notebook access control together to protect access to the service account and data in the GCS bucket. See Compute permissions and Collaborate using Databricks notebooks.

Access a GCS bucket directly with a Google Cloud service account key

To read from and write to a bucket directly, you can either set the service account email address on the cluster or configure a key in your Spark configuration.

Note

Databricks recommends using the service account email address because no keys are involved, so there is no risk of leaking them. One reason to use a key is when the service account must be in a different Google Cloud project than the one used to create the workspace. To use a service account email address, see Access GCS buckets using Google Cloud service accounts on clusters.

Step 1: Set up Google Cloud service account using Google Cloud Console

You must create a service account for the Databricks cluster. Databricks recommends giving this service account the least privileges needed to perform its tasks.

  1. Click IAM and Admin in the left navigation pane.

  2. Click Service Accounts.

  3. Click + CREATE SERVICE ACCOUNT.

  4. Enter the service account name and description.

  5. Click CREATE.

  6. Click CONTINUE.

  7. Click DONE.

Step 2: Create a key to access GCS bucket directly

Warning

The JSON key you generate for the service account is a private key that should only be shared with authorized users as it controls access to datasets and resources in your Google Cloud account.

  1. In the Google Cloud console, in the service accounts list, click the newly created account.

  2. In the Keys section, click ADD KEY > Create new key.

  3. Accept the JSON key type.

  4. Click CREATE. The key file is downloaded to your computer.
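
Key creation can also be scripted. This sketch uses the google-api-python-client library (an assumption, as above); the service account email is hypothetical. The API returns the key material base64-encoded in the privateKeyData field.

import base64
import googleapiclient.discovery

service = googleapiclient.discovery.build("iam", "v1")

# Hypothetical service account email.
sa_email = "databricks-gcs-access@my-gcp-project.iam.gserviceaccount.com"

# Create a new key; the response contains the key file, base64-encoded.
key = service.projects().serviceAccounts().keys().create(
    name=f"projects/-/serviceAccounts/{sa_email}", body={}
).execute()

# Decode and save the JSON key file. Treat this file as a secret.
with open("key.json", "w") as f:
    f.write(base64.b64decode(key["privateKeyData"]).decode("utf-8"))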

Step 3: Configure the GCS bucket

Create a bucket

If you do not already have a bucket, create one:

  1. Click Storage in the left navigation pane.

  2. Click CREATE BUCKET.

  3. Name your bucket. Pick a globally unique and permanent name that complies with Google’s naming requirements for GCS buckets.

  4. Click CREATE.

Configure the bucket

  1. Configure the bucket details.

  2. Click the Permissions tab.

  3. Next to the Permissions label, click ADD.

  4. Grant the service account the Storage Admin role on the bucket from the Cloud Storage roles.

  5. Click SAVE.

Step 4: Put the service account key in Databricks secrets

Databricks recommends storing all credentials in secret scopes. Put the private key and private key ID from your key JSON file into a Databricks secret scope. You can grant users, service principals, and groups in your workspace access to read the secret scope. This protects the service account key while still allowing users to access GCS. To create a secret scope, see Secrets.
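
As a sketch of this step, the snippet below reads the downloaded key file and stores the two sensitive fields using the Databricks SDK for Python (an assumption; the Databricks CLI or the Secrets API work as well). The scope name gcs and the key file path are hypothetical; whatever scope name you choose must match the one referenced in the Spark configuration in the next step.

import json
from databricks.sdk import WorkspaceClient

# Load the key file downloaded in Step 2 (path is hypothetical).
with open("key.json") as f:
    key = json.load(f)

w = WorkspaceClient()  # reads workspace URL and token from the environment or a config profile

# Create a scope and store the two sensitive fields as secrets.
w.secrets.create_scope(scope="gcs")
w.secrets.put_secret(scope="gcs", key="gsa_private_key", string_value=key["private_key"])
w.secrets.put_secret(scope="gcs", key="gsa_private_key_id", string_value=key["private_key_id"])

# client_email and project_id are not sensitive; they go directly into the Spark config.
print(key["client_email"], key["project_id"])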

Step 5: Configure a Databricks cluster

  1. In the Spark Config tab, use the following snippet to set the keys stored in secret scopes:

    spark.hadoop.google.cloud.auth.service.account.enable true
    spark.hadoop.fs.gs.auth.service.account.email <client-email>
    spark.hadoop.fs.gs.project.id <project-id>
    spark.hadoop.fs.gs.auth.service.account.private.key {{secrets/scope/gsa_private_key}}
    spark.hadoop.fs.gs.auth.service.account.private.key.id {{secrets/scope/gsa_private_key_id}}

    Replace <client-email> and <project-id> with the values of the client_email and project_id fields from your key JSON file.

Use both cluster access control and notebook access control together to protect access to the service account and data in the GCS bucket. See Compute permissions and Collaborate using Databricks notebooks.

Step 6: Read from GCS

To read from the GCS bucket, use a Spark read command in any supported format, for example:

df = spark.read.format("parquet").load("gs://<bucket-name>/<path>")

To write to the GCS bucket, use a Spark write command in any supported format, for example:

df.write.mode("<mode>").save("gs://<bucket-name>/<path>")

Replace <bucket-name> with the name of the bucket you created in Step 3: Configure the GCS bucket.
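
As a quick sanity check that the configuration works, you can list the bucket contents from a notebook attached to the configured cluster (dbutils is available in Databricks notebooks; the bucket name is a placeholder):

display(dbutils.fs.ls("gs://<bucket-name>/"))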

Example notebooks

Read from Google Cloud Storage notebook

Write to Google Cloud Storage notebook
