Back to Blog

BigQuery Tutorial – How to Use BigQuery for Your Projects

What should a perfect BigQuery Tutorial consist of? It should probably answer three questions: what, why, and how? We have dared to write such a perfect piece and go beyond. Our goal is to look at BigQuery from the business perspective: what business benefits this cloud data warehouse has compared to its competitors. Read on and tell us whether we succeeded and our tutorial hit the target. 🙂

What is Google BigQuery?

In our opinion, there can be a few answers to this question: 

BigQuery is a database 

In the widest sense, BigQuery is a database. As you know, databases are collections of related data, and BigQuery allows you to store terabytes of records.  

BigQuery is a cloud data warehouse

Data warehouse is BigQuery’s official title. Data warehouses are systems that allow you not only to collect structured data from multiple sources but also analyze it. So, you can call it an analytics database for querying and getting insights from your data.

BigQuery is a columnar database

This title rests on BigQuery’s columnar storage system that supports semi-structured data — nested and repeated columns. This is mostly a technical definition, which we have introduced to broaden your horizons.

BigQuery is a spreadsheet database 😮

This is a down-to-earth definition of BigQuery if the ones above are not enough. BigQuery combines the features of both a spreadsheet software, such as Google Sheets, and a database management system, such as MySQL. 

Why you should use BigQuery

The main reason to opt for BigQuery is analytical querying. BigQuery allows you to run complex analytical queries on large sets of data. Queries are requests for data that can include calculation, modification, merging, and other manipulations with data. 

Let’s say, in Google Sheets, you can also query data sets using the QUERY function. This may work for different kinds of reports and charts based on small to medium data sets. However, any spreadsheet app (even Excel) won’t be able to handle complex queries of large data sets that include millions of rows in a table.

BigQuery is aimed at making analytical queries beyond simple CRUD operations and can boast a really good throughput. However, in our BigQuery tutorial, we do not claim it to be the best database solution, and definitely not a replacement for a relational database. 

BigQuery setup guide 

Another reason why you may consider BigQuery is that it’s a cloud service. You don’t have to install any software. Google handles the infrastructure and you just need to set up BigQuery, which is quite easy.  

Your journey will start with Google Cloud Platform. If it’s your first visit, you’ll need to select your country and agree to the Terms of Service.

After that, go to BigQuery – you can use either the search bar or find it manually in the left menu.

Create a BigQuery project

Here is what BigQuery looks like on your first visit.

Click the Create Project button to spin the prop. Name your project, choose organization if needed, and click Create

Now you’re officially welcomed to BigQuery. 

What is a BigQuery sandbox

Two messages on the top of the BigQuery console have likely drawn your attention.

SANDBOX means that you’re using a sandbox account, which does not require you to enter payment information. This free tier option grants you 10 GB of active storage and 1 TB of processed query data per month. Using this account, your tables will expire in 60 days. Click Learn more to discover other limits.

The second banner offers you to activate a free trial. The difference from the sandbox account is that if you activate the trial, you’ll need to enter your billing details. If you do, you’ll get $300 of cloud credits free.

For our Google BigQuery tutorial, we’ll be using the sandbox option. So, feel free to click Dismiss for both options.

How to use Google BigQuery

Create a data set in BigQuery

Let’s add some data into BigQuery to check out how it works. Click the project you want, and then Create Dataset

Assign a Dataset ID – you can enter letters and numbers. If needed, you can select the Data location as well as a table expiration (up to 60 days) and encryption. After that, click Create dataset.

A new dataset is now created. You can find it by clicking the Expand node button next to your project name:

The next step is to create a table in the dataset. Here is the button to click:

You have a few options here:

  • Create an empty table and fill it manually
  • Upload a table from your device in one of the supported formats (explained in the next section)
  • Import a table from Google Cloud Storage or Google Drive (this option allows you to import Google Sheets)
  • Import a table from Google Cloud Bigtable through the CLI

File formats you can import into BigQuery

You can easily load your tabular data into BigQuery in the following formats:

  • CSV
  • JSONL (JSON lines)
  • Avro
  • Parquet
  • ORC
  • Google Sheets (for Google Drive only)
  • Cloud Datastore Backup (for Google Cloud Storage only)

Note: You cannot import Excel files directly into BigQuery. To do this, you’ll need to either convert your Excel file to CSV, or convert Excel to Google Sheets and then load it to BigQuery. In this BigQuery tutorial, we won’t focus on Excel cases. 

Upload CSV data to BigQuery

Once you click the Create table button, you need to complete the following steps:

  1. Choose source – Upload
  2. Select file – click Browse and choose the CSV file from your device
  3. File format – choose CSV, but usually, the system auto-detects the file format
  4. Table name – enter the table name
  5. Check the Auto detect checkbox 
  6. Click Create table

This is what the basic flow looks like. Additionally, you can define partition settings (to divide your table into smaller segments), cluster settings (to organize data based on the contents of specified columns), as well as configure the Advanced options. This is what your table uploaded to BigQuery looks like:

Note: The table preview feature shows previews for tables stored inside BigQuery. For example, when you upload CSV, it is saved in BigQuery – you’ll see the preview. However, when you pull data from Google Sheets, it is a real-time connection since BigQuery scans Google Sheets every time you query it. In this case, you won’t have the preview available.

Import data from Google Sheets to BigQuery manually

Most of you would probably like to learn more about importing tables from Google Sheets to BigQuery. The workflow is very similar, but with a few modifications. Click the Create table button and:

  1. Choose source – Drive
  2. Select Drive URI – insert the URL of your Google Sheets spreadsheet 
  3. File format – choose Google Sheets
  4. Sheet range – specify the sheet and data range to import. If you leave this field blank, BigQuery will retrieve the data from the first sheet of your spreadsheet.
  5. Table name – enter the table name
  6. Mark the Auto detect checkbox 
  7. Click Create table

You may be interested in setting up Advanced options since they let you:

  • Skip rows with the column values that do not match the schema.
  • Skip a specific number of rows from the top.
  • Enable including newlines contained in quoted data sections.
  • Enable accepting rows that are missing trailing optional columns.
  • Select an encryption key management solution.

Once you click Create table, the specified sheet from your spreadsheet will be imported into BigQuery. Here are the details (table preview is not available for importing Google Sheets):

Import data to BigQuery from Google Sheets and other software on a schedule

Let’s say you have a data set in Google Sheets that you need to import to BigQuery every day. You can do this manually as was described above or automate the import of data to BigQuery using one of the integrations provided by Coupler.io. 

Coupler.io is a data integration tool to automate data flow from different sources to BigQuery, Google Sheets, and Microsoft Excel. 

You need to sign in to Coupler.io. You can use your Google or Microsoft account for this or create an account with a custom email address. Then click Add importer and select your source application, Google Sheets, and the destination app which is BigQuery.

google sheets and bigquery apps

Click Proceed to configure these app connections.

  • Set up Source

Connect to Google account, then select a file and a sheet to export data from.

Note: If you select multiple sheets, their data will be combined in one master view in a BigQuery table.

google sheets source
  • Set up Destination

Connect your BigQuery account. For this, you’ll need to create a service account with two roles: BigQuery Data Editor and BigQuery Job User. Then download and add your .json key file. Here is a detailed instruction.

You must create a service account with two roles: BigQuery Data Editor and BigQuery Job User. 

Then enter the names of the BigQuery dataset and table where the data from Google Sheets will be loaded.

bigquery destination
  • Set up Schedule

You’ll need to customize the schedule for your automatic exports of data from Google Sheets to BigQuery.

8 pipedrive export schedule

Once you’re ready, click Save and Run to get your data into the specified BigQuery table. 

In a similar way, you can load data from Airtable, HubSpot, Excel and other sources to BigQuery.

Query tables in BigQuery

The real power of BigQuery lies in querying. You can query the tables in your database using the standard SQL dialect. The non-standard or legacy SQL dialect is also supported, but BigQuery recommends using the standard SQL dialect.

Check out our Google BigQuery SQL Tutorial to delve into this.

If you know what the Google Sheets QUERY function looks like, then you should understand how queries work. For example, here is a QUERY formula example:

=query(Deals!A:EU,"select E, N, T order by T Desc")

"select E, N, T order by T Desc" – this is the query to retrieve three columns of the entire data set and order the results in descending order.

In BigQuery, the same query on the dataset imported from Pipedrive to BigQuery will look like this:

SELECT 
       string_field_4, 
       string_field_13, 
       string_field_19 
FROM `test-project-310714.test.pipedrive-deals` 
ORDER BY string_field_19 DESC

 Now we’ll explain how it works.

How to query data in BigQuery + syntax example

Click the Query table button to start querying.

You’ll see the query boilerplate, like this:

SELECT FROM `test-project-310714.test.pipedrive-deals` LIMIT 1000

This is the basic example you can use to start your introduction to querying. Add * after the SELECT method, so that the query looks like this:

SELECT * FROM `test-project-310714.test.pipedrive-deals` LIMIT 1000

This query will return all available columns from the specified table, but no more than 1,000 rows. Click Run, and there you go!

Now, let’s query specific fields (columns) and order them. So, instead of using *, we need to specify the field names we need. You can find the field names in the Schema tab, or from your last query.

Let’s replace the LIMIT method from the default query with ORDER BY – this will let you sort data by a specified column. To sort data in descending order, add DESC to the end of the query. Here is what it looks like:

SELECT 
       string_field_4, 
       string_field_13, 
       string_field_19 
FROM `test-project-310714.test.pipedrive-deals` 
ORDER BY string_field_19 DESC

Here is our Google BigQuery SQL tutorial. You can also check out this SQL Video Tutorial for Beginners, made by Railsware. 

Query settings

If you click More and select Query Settings, you’ll be able to configure the destination for your query results and other settings.

Query Settings

Here you can also set up to run queries in batches. Batch queries are queued and started as soon as idle resources are available in the BigQuery shared resource pool.

How to save queries in BigQuery

You can save your queries for later use. To do this, click Save => Save Query.

In the next window, name your query and choose its visibility:

  • personal – only you will be able to edit the query
  • project – only project members will be able to edit the query
  • public – the query will be available publicly for edit

Click Save.

You can find your saved queries in the respective popup tab.

How to schedule queries in BigQuery

Next to the Save button, there is a Schedule button, which allows you to enable scheduled queries. Your first thought: “Why would I run queries on a schedule?” Well, there are at least two reasons to do this:

  • Queries can be huge and take a lot of time to run, so it is better to prepare data in advance.
  • Google charges money for data queries, so if it is OK for you to have data updated daily, it is better to do that and use the already prepared views to query them ad-hoc.

Note: Query scheduling is only available for projects with billing enabled. It won’t work for SANDBOX account projects.

Once you click the Schedule button, you’ll get a notification that you need to first enable the BigQuery Data Transfer API. 

Click Enable API and wait a short time. After that, you’ll be able to create scheduled queries when you click the Schedule button.

Click Create new scheduled query and define the following parameters:

  • Name for scheduled query
  • Schedule options
    • Repeats
    • Start date and run time
    • End date
  • Destination 
    • Table name
    • Write preference (overwrite or append)
      • Overwrite – query results will overwrite the data in the table
      • Append – query results will be appended to the data in the table

Optionally, you can set up advanced options and notification options. Click Schedule when the setup is complete.

Scheduled query setup

Next, you’ll need to select your Google account to continue to BigQuery Data Transfer Service.  

Query history

Let’s say you forgot to save your advanced query, but you want to restore it now. No worries, BigQuery provides you with logs of the queries and jobs you made. You will find them in the popup tabs Jobs history and Query history.

Note: BigQuery displays all load, export, copy, and query jobs for the past 6 months. It limits the job and query histories to 1,000 entries.

Export queries from BigQuery manually and automatically 

In most cases, users need to export the results of their queries outside BigQuery. The common destinations are spreadsheet apps, such as Google Sheets and Excel, visualization and dashboarding tools, such as Google Data Studio (we have a wonderful Google Data Studio tutorial on our blog) and Tableau, and other software. You can also connect Power BI to BigQuery.

BigQuery exporting limits

To export your query results, you need to click the Save Results button and select one of the available options:

  • CSV file
    • Download to your device (up to 16K rows)
    • Download to Google Drive (up to 1GB)
  • JSON file
    • Download to your device (up to 16K rows)
    • Download to Google Drive (up to 1GB)
  • BigQuery table
  • Google Sheets (up to 16K rows)
  • Copy to clipboard (up to 16K rows)

Read more about BigQuery data export.

Example of exporting query from BigQuery

As an example, let’s choose the BigQuery table option. You’ll need to choose the Project and Dataset, as well as name your table. 

Click Save, and there you go!

Export queries from BigQuery to Google Sheets automatically

Saving your query results to Google Sheets manually is not particularly efficient, especially if you need to do this recurrently. You should automate exporting queries from BigQuery to Google Sheets – it only takes a few minutes and no coding skills are required. 

You need to sign in to Coupler.io, a solution for importing data to Google Sheets from apps, APIs, and other sources. They have a dedicated BigQuery integration that schedules the export of queries to Google Sheets! So, once you’ve signed in to Coupler.io, click Add importer, select BigQuery as a source app and Google Sheets as a destination app, and proceed to their configuration::

  • Set up Source

Get credentials to connect your BigQuery account, then enter your query.

  • Set up Destination

Connect your Google account, then choose a spreadsheet and sheet to export query results to.

  • Set up Schedule

You’ll need to customize the schedule for your automatic exports of queries from BigQuery to Google Sheets.

Once you’re ready, click Save and Run to get your query results into the spreadsheet. 

Bonus: How BigQuery stores data

Unlike traditional relational databases that store data row by row, BigQuery’s storage is column by column. This means that a separate file block is used to store each column. This columnar format, called Capacitor, allows BigQuery to achieve a very high throughput, which is crucial for online analytical processing.

BigQuery architecture

In BigQuery’s serverless architecture, the resources for storage and computing are decoupled. This allows you to get your data of any size into the warehouse and get on with data analysis right away. Here are the infrastructure technologies that make this happen:

  • Colossus – responsible for storage. This is a global storage system optimized for reading large amounts of structured data, as well as handling replication, recovery, and distributed management.
  • Dremel – responsible for compute. This is a multi-tenant cluster that turns SQL queries into execution trees. These trees have leaves called slots, and a single user can get thousands of slots to run their queries.
  • Jupiter – responsible for data movement between storage (Colossus) and compute (Dremel). This is a petabit network, which moves data from one place to another, and does so very quickly.
  • Borg – responsible for allocation of hardware resources. This is a cluster management system to run hundreds of thousands of jobs in BigQuery.

Learn more about BigQuery

Let’s be honest – our goal was not to write a perfect BigQuery tutorial but to provide you with the answers every beginner has when they discover a new tool or technology. We are sure that you’ll probably have more questions after reading it, but you’ll have to find the answers to them yourself. 

The premier source of information is the official BigQuery documentation,  where you’ll find an extensive knowledge base of BigQuery usage. The drawback of this source is that it’s absolutely huge and sometimes over-structured (as most of Google documentation is).

From our side, we’ll try to cover other BigQuery-related topics to clarify some advanced points like BigQuery SQL syntax or using client libraries to get started with the BigQuery API in your favorite programming language. Good luck!

Back to Blog

Comments are closed.

Access your data
in a simple format for free!

Start Free