
BigQuery Tutorial – How to Add Agility to Your Business

What should a perfect BigQuery tutorial consist of? It should probably answer three questions: what, why, and how. We have dared to write such a perfect piece and go beyond. Our goal is to look at BigQuery from the business perspective: what business benefits this cloud data warehouse offers compared to its competitors. Read on and tell us whether we succeeded and our tutorial hit the target. 🙂

What is Google BigQuery?

In our opinion, there can be a few answers to this question: 

BigQuery is a database 

In the broadest sense, BigQuery is a database. As you know, databases are collections of related data, and BigQuery allows you to store terabytes of records.

BigQuery is a cloud data warehouse

Data warehouse is BigQuery’s official title. Data warehouses are systems that allow you not only to collect structured data from multiple sources but also to analyze it. So, you can call BigQuery an analytics database for querying and getting insights from your data.

BigQuery is a columnar database

This title rests on BigQuery’s columnar storage system that supports semi-structured data — nested and repeated columns. This is mostly a technical definition, which we have introduced to broaden your horizons.
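
To make this less abstract, here is a tiny standard SQL sketch of what nested and repeated columns look like. It is built from literal values only, so you can run it as-is without referencing any table:

-- Nested (STRUCT) and repeated (ARRAY) columns built from literals
SELECT
       STRUCT('ACME Inc.' AS name, 'Berlin' AS city) AS customer,  -- nested column
       ['email', 'phone', 'chat'] AS contact_channels              -- repeated column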

BigQuery is a spreadsheet database 😮

This is a down-to-earth definition of BigQuery if the ones above are not enough. BigQuery combines the features of both spreadsheet software, such as Google Sheets, and a database management system, such as MySQL.

Why you should use BigQuery

The main reason to opt for BigQuery is analytical querying. BigQuery allows you to run complex analytical queries on large sets of data. Queries are requests for data that can include calculations, modifications, merges, and other data manipulations.

For example, in Google Sheets you can also query data sets using the QUERY function. This may work for different kinds of reports and charts based on small to medium data sets. However, no spreadsheet app (not even Excel) can handle complex queries against large data sets with millions of rows in a table.

BigQuery is designed for analytical queries that go beyond simple CRUD operations, and it boasts really good throughput. However, in our BigQuery tutorial, we do not claim it to be the best database solution, and it is definitely not a replacement for a relational database.
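
To give you a flavor of such analytical queries, here is a hedged sketch in BigQuery standard SQL – the project, dataset, table, and column names are placeholders you would replace with your own:

-- Hypothetical example: total deal value per status across millions of rows
SELECT
       status,
       COUNT(*) AS deals,
       SUM(value) AS total_value
FROM `your-project.your_dataset.deals`
GROUP BY status
ORDER BY total_value DESC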

How to use Google BigQuery

Another reason why you may consider BigQuery is that it’s a cloud service. You don’t have to install any software. Google handles the infrastructure and you just need to set up BigQuery, which is quite easy.  

BigQuery setup guide 

First steps

Your journey will start with Google Cloud Platform. If it’s your first visit, you’ll need to select your country and agree to the Terms of Service.

After that, go to BigQuery – you can either use the search bar or find it manually in the left menu.

Create a project

Here is what BigQuery looks like on your first visit.

Click the “Create Project” button to get the ball rolling. Name your project, choose an organization if needed, and click “Create”.

Now you’re officially welcomed to BigQuery. 

BigQuery sandbox

Two messages at the top of the BigQuery console have likely caught your attention.

SANDBOX means that you’re using a sandbox account, which does not require you to enter payment information. This free tier grants you 10 GB of active storage and 1 TB of processed query data per month. With this account, your tables expire after 60 days. Click “Learn more” to discover the other limits.

The second banner invites you to activate a free trial. The difference from the sandbox account is that, to activate the trial, you’ll need to enter your billing details. If you do, you’ll get $300 in free cloud credits.

For our Google BigQuery tutorial, we’ll be using the sandbox option. So, feel free to click “Dismiss” for both options.

Create a data set in BigQuery

Let’s add some data into BigQuery to check out how it works. Click the project you want, and then “Create Dataset“. 

Assign a Dataset ID – you can enter letters and numbers. If needed, you can select the Data location as well as a table expiration (up to 60 days) and encryption. After that, click “Create dataset“.

A new dataset is now created. You can find it by clicking the “Expand node” button next to your project name:

The next step is to create a table in the dataset. Here is the button to click:

You have a few options here:

  • Create an empty table and fill it manually
  • Upload a table from your device in one of the supported formats (explained in the next section)
  • Import a table from Google Cloud Storage or Google Drive (this option allows you to import Google Sheets)
  • Import a table from Google Cloud Bigtable through the CLI
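
By the way, if you prefer SQL to clicking through the UI, a dataset and an empty table can also be created with DDL statements in the query editor. The sketch below uses made-up names and columns purely for illustration:

-- Create a dataset (DDL calls it a schema)
CREATE SCHEMA IF NOT EXISTS `your-project.test`;

-- Create an empty table inside it
CREATE TABLE IF NOT EXISTS `your-project.test.deals` (
  deal_name STRING,
  value NUMERIC,
  created_at TIMESTAMP
);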

File formats you can import into BigQuery

You can easily load your tabular data into BigQuery in the following formats:

  • CSV
  • JSONL (JSON lines)
  • Avro
  • Parquet
  • ORC
  • Google Sheets (for Google Drive only)
  • Cloud Datastore Backup (for Google Cloud Storage only)

Note: You cannot import Excel files directly into BigQuery. To work around this, you’ll need to either convert your Excel file to CSV, or convert it to Google Sheets and then load it into BigQuery. In this BigQuery tutorial, we won’t focus on Excel cases.

Upload CSV data to BigQuery

Once you click the “Create table” button, you need to complete the following steps:

  1. Choose source – Upload
  2. Select file – click “Browse” and choose the CSV file from your device
  3. File format – choose CSV, but usually the system auto-detects the file format
  4. Table name – enter the table name
  5. Check the “Auto detect” checkbox 
  6. Click “Create table”

This is what the basic flow looks like. Additionally, you can define partition settings (to divide your table into smaller segments), cluster settings (to organize data based on the contents of specified columns), as well as configure the “Advanced options“. This is what your table uploaded to BigQuery looks like:

Note: The table preview feature shows previews for tables stored inside BigQuery. For example, when you upload CSV, it is saved in BigQuery – you’ll see the preview. However, when you pull data from Google Sheets, it is a real-time connection since BigQuery scans Google Sheets every time you query it. In this case, you won’t have the preview available.
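
The upload flow above covers files stored on your device. If your CSV already sits in a Google Cloud Storage bucket, a hedged alternative is the LOAD DATA statement – the bucket, dataset, and table names below are placeholders:

-- Append rows from a CSV that is already in a Cloud Storage bucket
LOAD DATA INTO `your-project.test.deals`
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,               -- skip the header row
  uris = ['gs://your-bucket/deals.csv']
);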

Import data from Google Sheets to BigQuery

Most of you would probably like to learn more about importing tables from Google Sheets to BigQuery. The workflow is very similar, but with a few modifications. Click the “Create table” button and:

  1. Choose source – Drive
  2. Select Drive URI – insert the URL of your Google Sheets spreadsheet 
  3. File format – choose Google Sheets
  4. Sheet range – specify the sheet and data range to import. If you leave this field blank, BigQuery will retrieve the data from the first sheet of your spreadsheet.
  5. Table name – enter the table name
  6. Mark the “Auto detect” checkbox 
  7. Click “Create table”

You may be interested in setting up “Advanced options” since they let you:

  • Skip rows with column values that do not match the schema.
  • Skip a specific number of rows from the top.
  • Allow newlines contained in quoted data sections.
  • Accept rows that are missing trailing optional columns.
  • Select an encryption key management solution.

Once you click “Create table“, the specified sheet from your spreadsheet will be imported into BigQuery. Here are the details (table preview is not available for importing Google Sheets):
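
For reference, a Drive-backed table like this can also be defined with DDL. The sketch below mirrors the options from the UI flow above; the table name, spreadsheet URL, and sheet range are placeholders:

-- Define an external table that reads directly from a Google Sheets spreadsheet
CREATE EXTERNAL TABLE `your-project.test.sheet_deals`
OPTIONS (
  format = 'GOOGLE_SHEETS',
  uris = ['https://docs.google.com/spreadsheets/d/your-spreadsheet-id'],
  sheet_range = 'Deals!A1:T1000',      -- sheet and range to read
  skip_leading_rows = 1                -- treat the first row as headers
);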

Import data from your source to BigQuery

Let’s say you have a data set in Airtable, QuickBooks, or another source that you want to import into BigQuery. You can do this manually via the CSV option described above, or automate the import of data to BigQuery using Coupler.io. We’ll explain how to do this in the “Import data into BigQuery automatically” section of this Google BigQuery tutorial.

Query tables in BigQuery

The real power of BigQuery lies in querying. You can query the tables in your database using the standard SQL dialect. 

Note: The legacy (non-standard) SQL dialect is also supported, but Google recommends using the standard SQL dialect.

If you know what the Google Sheets QUERY function looks like, then you should understand how queries work. For example, here is a QUERY formula example:

=query(Deals!A:EU,"select E, N, T order by T Desc")

"select E, N, T order by T Desc" – this is the query to retrieve three columns of the entire data set and order the results in a descending order.

In BigQuery, the same query will look like this:

SELECT 
       string_field_4, 
       string_field_13, 
       string_field_19 
FROM `test-project-310714.test.pipedrive-deals` 
ORDER BY string_field_19 DESC

Now we’ll explain how it works.

How to query data in BigQuery + syntax example

Click the “Query table” button to start querying.

You’ll see the query boilerplate, like this:

SELECT FROM `test-project-310714.test.pipedrive-deals` LIMIT 1000

This is the basic example you can use to start your introduction to querying. Add * after the SELECT keyword, so that the query looks like this:

SELECT * FROM `test-project-310714.test.pipedrive-deals` LIMIT 1000

This query will return all available columns from the specified table, but no more than 1,000 rows. Click “Run“, and there you go!

Now, let’s query specific fields (columns) and order them. So, instead of using *, we need to specify the field names we need. You can find the field names in the Schema tab, or from your last query.

Let’s replace the LIMIT clause in the default query with ORDER BY – this will let you sort data by a specified column. To sort data in descending order, add DESC to the end of the query. Here is what it looks like:

SELECT 
       string_field_4, 
       string_field_13, 
       string_field_19 
FROM `test-project-310714.test.pipedrive-deals` 
ORDER BY string_field_19 DESC

We’re going to create a detailed Google BigQuery tutorial on SQL quite soon, but for now you can check out this SQL Video Tutorial for Beginners, made by Railsware. 

Query settings

If you click the “More” button and select “Query Settings“, you’ll be able to configure the destination for your query results, as well as other settings.


Here you can also set queries to run in batches. Batch queries are queued and started as soon as idle resources are available in the BigQuery shared resource pool.

How to save queries in BigQuery

You can save your queries for later use. To do this, click “Save” => “Save Query“.

In the next window, name your query and choose its visibility:

  • personal – only you will be able to edit the query
  • project – only project members will be able to edit the query
  • public – the query will be publicly available for editing

Click “Save“.

You can find your saved queries in the respective popup tab.

How to schedule queries in BigQuery

Next to the “Save” button, there is a “Schedule” button, which allows you to enable scheduled queries. Your first thought may be, “Why would I run queries on a schedule?” Well, there are at least two reasons to do this:

  • Queries can be huge and take a lot of time to run, so it is better to prepare data in advance.
  • Google charges for the data your queries process, so if it’s OK for you to have data updated daily, it’s better to prepare it on a schedule and query the already prepared results ad hoc.

Note: Query scheduling is only available for projects with billing enabled. It won’t work for SANDBOX account projects.

Once you click the “Schedule” button, you’ll get a notification that you need to first enable the BigQuery Data Transfer API. 

Click “Enable API” and wait a short time. After that, you’ll be able to create scheduled queries when you click the “Schedule” button.

Click “Create new scheduled query” and define the following parameters:

  • Name for scheduled query
  • Schedule options
    • Repeats
    • Start date and run time
    • End date
  • Destination 
    • Table name
    • Write preference (overwrite or append)
      • Overwrite – query results will overwrite the data in the table
      • Append – query results will be appended to the data in the table

Optionally, you can set up advanced and notification options. Click “Schedule” when the setup is complete.


Next, you’ll need to select your Google account to continue to BigQuery Data Transfer Service.  
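
To illustrate, here is the kind of query you might schedule daily with the Append write preference – each run adds a one-row summary of the previous day to the destination table. The table and column names are hypothetical:

-- Hypothetical daily rollup of yesterday's deals
SELECT
       DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AS report_date,
       COUNT(*) AS deals,
       SUM(value) AS total_value
FROM `your-project.test.deals`
WHERE DATE(created_at) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY report_date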

Query history

Let’s say you forgot to save your advanced query, but you want to restore it now. No worries – BigQuery keeps logs of the queries and jobs you’ve run. You will find them in the Query history and Job history tabs.

Note: BigQuery displays all load, export, copy, and query jobs for the past 6 months. It limits the job and query histories to 1,000 entries.
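
If you prefer SQL to the history panels, you can also list your recent jobs with a query against INFORMATION_SCHEMA. Here is a sketch that assumes your data lives in the US multi-region (adjust the region qualifier if it doesn’t):

-- List your 10 most recent jobs, newest first
SELECT
       creation_time,
       job_type,
       query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
ORDER BY creation_time DESC
LIMIT 10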

Export queries from BigQuery… and get your data into BigQuery 

In most cases, users need to export the results of their queries outside BigQuery. The common destinations are spreadsheet apps, such as Google Sheets and Excel, visualization and dashboarding tools, such as Google Data Studio and Tableau, and other software. 

To export your query results, you need to click the “Save Results” button and select one of the available options:

  • CSV file
    • Download to your device (up to 16K rows)
    • Download to Google Drive (up to 1GB)
  • JSON file
    • Download to your device (up to 16K rows)
    • Download to Google Drive (up to 1GB)
  • BigQuery table
  • Google Sheets (up to 16K rows)
  • Copy to clipboard (up to 16K rows)

As an example, let’s choose the BigQuery table option. You’ll need to choose the Project and Dataset, as well as name your table. 

Click “Save“, and there you go!
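
If your results exceed the limits listed above, one more SQL-level option is to write them straight to a Cloud Storage bucket with the EXPORT DATA statement. A sketch with a placeholder bucket and the sample table used earlier (the * lets BigQuery split the output into multiple files):

-- Export query results to Cloud Storage as CSV
EXPORT DATA OPTIONS (
  uri = 'gs://your-bucket/deals-*.csv',
  format = 'CSV',
  overwrite = true,
  header = true
) AS
SELECT *
FROM `test-project-310714.test.pipedrive-deals`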

Export queries from BigQuery to Google Sheets automatically

Saving your query results to Google Sheets manually is not particularly efficient, especially if you need to do this recurrently. You should automate exporting queries from BigQuery to Google Sheets – it only takes a few minutes and no coding skills are required.

You need to sign in to Coupler.io, a solution for importing data to Google Sheets from apps, APIs, and other sources. They have a dedicated BigQuery integration that schedules export of queries to Google Sheets! So, once you’ve signed in to Coupler.io, click “Add importer” and complete these three steps:

  • Set up Source

You’ll need to choose BigQuery as the source, get credentials to connect to BigQuery, and enter your query.

  • Set up Destination

You’ll need to choose Google Sheets as the destination, connect your Google account, and choose the spreadsheet, as well as the sheet, to export query results to.

  • Set up Schedule

You’ll need to customize the schedule for your automatic exports of queries from BigQuery to Google Sheets.

Once you’re ready, click “Save and Run” to get your query results into the spreadsheet. 

Read our blog post “BigQuery to Google Sheets” to learn how to get BigQuery credentials and how to use the add-on version of BigQuery importer.

Import data into BigQuery automatically

In the previous section, we chose Google Sheets as the destination. However, Coupler.io allows you to use BigQuery as the destination as well – this means that you can import data into BigQuery automatically on a schedule!

It works as follows:

  • You need to configure your source (where you want to import data from). Currently, Coupler.io supports more than ten ready-to-use sources, as well as connection to the REST API. 
  • You need to configure your destination – choose BigQuery, click connect and insert the credentials file to connect to BigQuery. It is similar to how we set up BigQuery as the source above, but with an important difference:

You must create a service account with two roles: BigQuery Data Editor and BigQuery Job User. 

The .json key file generated for BigQuery as a source won’t work for BigQuery as the destination.

After that, choose the BigQuery destination account and specify the names of the Dataset and Table to import data to.

That’s it. Choose the schedule you want, and your data will be imported to BigQuery accordingly.

Bonus: How BigQuery stores data

Unlike traditional relational databases that store data row by row, BigQuery’s storage is column by column. This means that a separate file block is used to store each column. This columnar format, called Capacitor, allows BigQuery to achieve very high throughput, which is crucial for online analytical processing.

BigQuery architecture

In BigQuery’s serverless architecture, the resources for storage and computing are decoupled. This allows you to get your data of any size into the warehouse and get on with data analysis right away. Here are the infrastructure technologies that make this happen:

  • Colossus – responsible for storage. This is a global storage system optimized for reading large amounts of structured data, as well as handling replication, recovery, and distributed management.
  • Dremel – responsible for compute. This is a multi-tenant cluster that turns SQL queries into execution trees. These trees have leaves called slots, and a single user can get thousands of slots to run their queries.
  • Jupiter – responsible for data movement between storage (Colossus) and compute (Dremel). This is a petabit network, which moves data from one place to another, and does so very quickly.
  • Borg – responsible for allocation of hardware resources. This is a cluster management system to run hundreds of thousands of jobs in BigQuery.

Learn more about BigQuery

Let’s be honest – our goal was not to write a perfect BigQuery tutorial, but to answer the questions every beginner has when they discover a new tool or technology. We are sure that you’ll probably have more questions after reading it, but you’ll have to find the answers to them yourself.

The premier source of information is the official BigQuery documentation, where you’ll find an extensive knowledge base on BigQuery usage. The drawback of this source is that it’s absolutely huge and sometimes over-structured (as is most of Google’s documentation).

From our side, we’ll try to cover other BigQuery-related topics to clarify some advanced points like BigQuery SQL syntax or using client libraries to get started with the BigQuery API in your favorite programming language. Good luck!


