Google BigQuery
Google BigQuery is a fully managed, serverless data warehouse with built-in machine learning capabilities.
What can you do with the Google BigQuery integration?
As with all of Deepnote's SQL-related integrations, the Google BigQuery integration allows you to query databases and explore the structure of your warehouse. Think "SQL editor" but with all the modern productivity boosters that come with Deepnote's notebook. For example,
- Write native SQL and Python in the same notebook
- Search your entire warehouse instantly via the integrated schema explorer
- Get intelligent autocomplete for columns, tables, and databases
- Interactively explore data without writing any additional code
- Run queries powered by BigQuery DataFrames
How to connect to Google BigQuery
Authenticating with a service account
A service account will provide a shared connection to Google BigQuery. That is, all collaborators with least Editor privileges will be able run queries against databases provisioned in the service account.
To use the service account authentication, you will need to supply a JSON service account key. Click here for a guide on creating a JSON service account key. Your service account key will be encrypted and stored in Deepnote's database.
Authenticating to BigQuery with Google OAuth
With BigQuery's Google OAuth authentication you can give every member of your Deepnote workspace their own set of credentials. This ensures greater security by using short-lived tokens and enabling the use of multi-factor authentication. Follow the principle of least privilege and use granular access control for various BigQuery resources to ensure users can only access the data they need. Click here to learn how to set up BigQuery's Google OAuth authentication in Deepnote.
Working with data from Google BigQuery
Now that you are connected to your Google BigQuery can do the following actions in Deepnote:
-
Click the newly created integration in the Integrations section to open the schema browser. Click here to learn more about the schema browser.
-
Create an SQL block that points to your warehouse and begin querying your data. Autocomplete for columns, tables, and databases will be at your fingertips as you type. Click here to learn more about SQL blocks.
-
Explore column distributions as well as the sorting and filtering capabilities on the results set. All results are displayed as an interactive Pandas DataFrame. Click here to learn about interactive DataFrame output.
-
Pipe the results of your query into a chart block for rapid interactive data visualization. Click here to learn more about chart blocks.
Using pure Python to connect to BigQuery
To go beyond querying (like listing tables, creating datasets, etc.), you may need to use the official Python client library (docs).
Use this code snippet to authenticate the python client using the integration's service account:
import json
import os
from google.oauth2 import service_account
from google.cloud import bigquery
bq_credentials = service_account.Credentials.from_service_account_info(
json.loads(os.environ['INTEGRATION_NAME_SERVICE_ACCOUNT']))
client = bigquery.Client(credentials=bq_credentials,
project=bq_credentials.project_id)
Just replace the INTEGRATION_NAME
with an uppercased, underscore-connected name of your integration. If you have trouble finding it, run this one-liner to list environment variables that contain service accounts:
[var for var in os.environ if '_SERVICE_ACCOUNT' in var]
Once the BigQuery client
is initialized, you can use it to run queries and materialize the results as dataframes like this:
sql = """
SELECT *
FROM `bigquery-public-data.usa_names.usa_1910_current`
LIMIT 1000
"""
df = client.query(sql).to_dataframe()
Working with Multiple Projects
Deepnote's BigQuery integration supports querying data over multiple projects, offering a flexible and powerful way to handle diverse datasets across your Google Cloud Platform (GCP) environment. This capability is particularly useful for users managing or analyzing data across various GCP projects.
To leverage this feature, you must set up the necessary access permissions within GCP. Ensure your service account or Google OAuth credentials are configured to access all the projects you intend to work with. For detailed guidance on managing access and permissions in GCP, you can refer to the GCP documentation on access control.
Once you have the required access, querying data from different projects is straightforward. In your SQL queries, use the projectId.dataset.table
format to specify exactly where your data resides. This format allows you to seamlessly switch between datasets hosted in different projects without changing your connection settings.
Deepnote's schema explorer enhances your ability to browse through various projects and tables. With this tool, you can visually inspect the structure of your datasets, including the columns and their data types, across different GCP projects. This feature simplifies the process of understanding and navigating through your data, especially when dealing with multiple projects.
By utilizing these features, Deepnote users can effectively manage and analyze data across multiple GCP projects, making it a robust solution for complex data environments.