
How to Connect MySQL to BigQuery?

MySQL is a relational database management system that uses the client-server model. Many web applications rely on MySQL for handling, storing, and manipulating data. BigQuery is an enterprise data warehouse with a built-in, SQL-powered query engine.

Since BigQuery offers better performance and scalability for analytical workloads, you may decide to connect MySQL to Google BigQuery. In this article, we’ll discuss several methods that can help you migrate and replicate data between the two systems.

Why sync MySQL to Google BigQuery?

Although MySQL is widely popular and has been around for a long time, it’s possible to encounter some limitations when dealing with big data. One great solution would be to sync MySQL to BigQuery. Here are some reasons why:

  • BigQuery allows users to store and query massive datasets using the processing power of Google’s infrastructure.
  • You can seamlessly load bulk data records into BigQuery or stream smaller batches in real time. 
  • BigQuery can rapidly scale up and down to any size based on query complexity and volume of datasets.  
  • It enables users to load data in a variety of formats, such as CSV, JSON, Avro, ORC, or Parquet.
  • BigQuery allows users to analyze data using machine learning, geospatial analysis, and business intelligence.
  • You can access BigQuery through the Cloud Console, the command-line tool, or by making calls to the BigQuery REST API using client libraries in languages such as Java, .NET, or Python.
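
As an illustration of the last point, here’s a minimal sketch that runs a query with the official google-cloud-bigquery Python client. It assumes the library is installed and credentials are configured; the project, dataset, and table names are placeholders:

from google.cloud import bigquery

# Uses Application Default Credentials (e.g., set up via the gcloud CLI)
client = bigquery.Client()

# Placeholder table: replace with your own project.dataset.table
query = """
    SELECT name, created_at
    FROM `my_project.my_dataset.customer`
    ORDER BY created_at DESC
    LIMIT 10
"""

# client.query() starts the job; iterating waits for and streams the result rows
for row in client.query(query):
    print(row["name"], row["created_at"])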

Import data from MySQL to BigQuery on a schedule

Replicating data between database systems is an ongoing process. You can handle it with manual techniques or use a third-party tool to automate the entire workflow.

Coupler.io is a data integration tool that allows you to automatically import data from different sources to BigQuery, Google Sheets, and Excel. Here’s how to get started with our tool. 


Select MySQL as the Source application and BigQuery as the destination app. Then click Proceed to continue. 

  • Set up your Source account

Connect to your MySQL account, then provide the database and table names. 

  • Set up your destination account

To connect to your BigQuery account, you have to do a few things:

  • First, go to your Google Cloud Platform and create a service account with two roles: BigQuery Data Editor and BigQuery Job User. 
  • Then, download your .json key file and use it to configure the integration. Here’s our guide on how to complete this setup.  
  • Input the names of the BigQuery dataset and table where your data will be loaded. 
  • For the import mode, select Replace to fully replace all previously imported data. Or choose Append to place newly imported data under previous entries. 
  • Create a custom schedule for the data importer to run.

Click Save and Run to transfer your data into BigQuery. This will also enable automatic import from MySQL to BigQuery.
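
If you want to verify that the service account and key file from the destination setup have the access they need, here’s a hedged Python check using the official google-cloud-bigquery client (the key file path, dataset, and table names are placeholders):

from google.cloud import bigquery

# Authenticate with the downloaded service-account key file (placeholder path)
client = bigquery.Client.from_service_account_json("key.json")

# Placeholder names: use the dataset and table you entered in the importer
table = client.get_table("my_dataset.my_table")
print(f"{table.table_id} currently holds {table.num_rows} rows")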

Manual MySQL to BigQuery ETL

ETL is the process of collecting data from multiple sources and transferring it to a central repository. As its initials suggest, ETL involves three steps: Extract, Transform, and Load. Let’s walk through the ETL process for extracting data from MySQL and loading it into Google BigQuery.

Extract MySQL to CSV

BigQuery can’t ingest MySQL’s native storage format directly, so it’s best to export your data as a CSV file. Below are two methods for a MySQL export to CSV.

  • Use mysqldump

mysqldump is a client utility that comes with MySQL. It allows users to export tables and databases. On the same machine as your MySQL server, run this command to directly create a CSV file:

mysqldump -u [username] -p -t -T /path/to/directory [database] [tableName] --fields-terminated-by=,
  • Fill in your MySQL username, the location you prefer, the database name, and the name of the table you want to export. 
  • The above command writes the table’s data to a [tableName].txt file in the specified directory. Note that the MySQL server itself writes the file, so the directory must be writable by the server and permitted by its secure_file_priv setting.
  • Use MySQL Workbench

If you prefer using a graphical user interface, MySQL Workbench includes a Table Data Export Wizard. You can output your data in CSV or JSON format.

  • First, open your database in Workbench and navigate to the Schemas tab on the left.
  • Next, choose the table you want to export from the database. We are going to export the customer table here. 
  • Right-click on the table, then click Table Data Export Wizard. Now, select all the columns you’d like to export.
  • Click on Next. Now, choose the path where you want to save the file and select CSV as the format.
  • Click on Next to begin the export process. Once it’s done, you should find your CSV file in the chosen directory.
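
If you’d rather script the export than use either method above, here’s a minimal Python sketch that dumps a table to a CSV file. It assumes the PyMySQL driver and placeholder connection details; any MySQL driver would work the same way:

import csv
import pymysql

# Placeholder connection details
conn = pymysql.connect(host="localhost", user="root",
                       password="secret", database="mydb")

with conn.cursor() as cur, open("customer.csv", "w", newline="") as f:
    cur.execute("SELECT * FROM customer")
    writer = csv.writer(f)
    writer.writerow(col[0] for col in cur.description)  # header row from column names
    writer.writerows(cur)  # the cursor yields one tuple per row

conn.close()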

MySQL to BigQuery schema translation

BigQuery has a schema auto-detection feature for supported data formats, but you can also specify a table’s schema manually when you load data. If you define a table’s schema yourself, you have to supply each column’s name and data type.

  • Column Names

Column names in BigQuery can only contain letters, numbers, and underscores, and must start with a letter or underscore. The maximum length for column names is 300 characters, and duplicate column names aren’t allowed. If any of your MySQL columns break these rules, rename them before loading, as in the sketch below.
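
A small sanitizer can handle the renaming. A hedged Python sketch (the function name and the example column are made up):

import re

def to_bigquery_column(name: str) -> str:
    """Rewrite a MySQL column name to satisfy BigQuery's naming rules."""
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", name)  # keep only letters, digits, underscores
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned  # must not start with a digit
    return cleaned[:300]  # enforce the 300-character limit

print(to_bigquery_column("order-date (UTC)"))  # prints: order_date__UTC_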

  • BigQuery Data Types

To translate MySQL data types into their BigQuery equivalents, refer to the table below:

MySQL Data Type     BigQuery Data Type
BIT                 BOOL
CHAR                STRING
VARCHAR             STRING
BINARY              BYTES
VARBINARY           BYTES
INT                 INT64
TINYINT             INT64
SMALLINT            INT64
MEDIUMINT           INT64
BIGINT              INT64
FLOAT               FLOAT64
DOUBLE              FLOAT64
DECIMAL             NUMERIC
TEXT                STRING
TINYTEXT            STRING
MEDIUMTEXT          STRING
LONGTEXT            STRING
ENUM                No type. Use any data type that can represent the values in ENUM
SET                 No type. Use any data type that can represent the values in SET

Data types such as DATE, TIME, DATETIME, and TIMESTAMP are the same in MySQL and Google BigQuery.
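
For example, when defining a schema explicitly with the official google-cloud-bigquery Python client, the translation looks like this (a sketch with made-up column and table names):

from google.cloud import bigquery

# Hypothetical MySQL table: id BIGINT, name VARCHAR, balance DECIMAL, created_at DATETIME
schema = [
    bigquery.SchemaField("id", "INT64", mode="REQUIRED"),  # BIGINT   -> INT64
    bigquery.SchemaField("name", "STRING"),                # VARCHAR  -> STRING
    bigquery.SchemaField("balance", "NUMERIC"),            # DECIMAL  -> NUMERIC
    bigquery.SchemaField("created_at", "DATETIME"),        # DATETIME -> DATETIME
]

client = bigquery.Client()
table = bigquery.Table("my_project.my_dataset.customer", schema=schema)
client.create_table(table)  # creates an empty table with the explicit schema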

Load CSV data to BigQuery

A Comma Separated Values (CSV) file is one of the most popular data formats. Once you’ve exported your file from MySQL, the next step is to load your CSV into BigQuery. 

Although you can upload data directly into BigQuery from a local data source, it’s better to use Google Cloud Storage to upload large files. Here’s the easy two-step process:

  • Upload to Google Cloud Storage

Sign in to your Google Cloud account and follow the steps below:

  • Open the navigation bar and click Cloud Storage.
  • Create or select a GCS bucket. You should see it under Buckets.
  • Click on your bucket. On the bucket details page, click on Upload files and select the file. 
  • Once your CSV file has been uploaded, it will be listed there.
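
If you’d rather upload from a script instead of the console, here’s a minimal sketch using the official google-cloud-storage Python client (the bucket and file names are placeholders):

from google.cloud import storage

client = storage.Client()  # uses Application Default Credentials

# Placeholder bucket and object names
bucket = client.bucket("my-test-bucket")
blob = bucket.blob("customer.csv")
blob.upload_from_filename("customer.csv")  # path to the local CSV file

print(f"Uploaded to gs://{bucket.name}/{blob.name}")
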
  • Create and load a table in BigQuery

On the Google Cloud console, navigate to BigQuery. Then follow these steps:

  • Create or select an existing dataset, then click on Create Table.
  • Next, choose Google Cloud Storage as the Source. Select your file from the GCS bucket and specify CSV as the file format. 

Fill out the table name and check the Auto detect checkbox for Schema. You can also configure the Advanced Options. Then, click Create Table. This creates the table and populates it with data from the CSV file.
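
The same load can be scripted with the official Python client’s load_table_from_uri method and schema auto-detection. A minimal sketch (the GCS URI and table ID are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the CSV header row
    autodetect=True,      # let BigQuery infer the schema
)

# Placeholder GCS URI and destination table
load_job = client.load_table_from_uri(
    "gs://my-test-bucket/customer.csv",
    "my_project.my_dataset.customer",
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish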

Once the load job completes, the new table appears in your dataset, populated with rows from the CSV file.

How to migrate MySQL to BigQuery?

As streaming analytics becomes more popular, it’s imperative for companies to optimize how they capture and process data. With streams of data records generated continuously, you can either process them in real time or batch process at regular intervals (hourly or daily).

However, you’d have to migrate data from your data sources to destination databases that can handle real-time data and also accommodate batch processes. A technique known as Change Data Capture (CDC) makes it possible for us to work with changed data in real time. 

Here are two Google Cloud products to help you perform a MySQL CDC to Google BigQuery:

Google Datastream 

Google Datastream is a serverless Change Data Capture (CDC) and replication service. It allows you to synchronize data across different databases, storage systems, and applications with minimal latency. This self-paced lab will help you learn how to migrate MySQL to BigQuery using Datastream. 

To work with Google Datastream, you have to be familiar with standard Linux environments and have a good understanding of change data capture (CDC) concepts. Below are the steps for using Google Datastream:

  • Create a MySQL Cloud SQL instance in Google Cloud Console 
  • Prepare a Cloud Storage Bucket
  • Create Pub/Sub topics, subscriptions, and notifications
  • Import database file into MySQL
  • Configure Datastream connection profiles for MySQL and Cloud Storage
  • Create and define Datastream stream source and stream destination
  • Create a BigQuery dataset for your project
  • Deploy the Dataflow job and view the data in BigQuery. 

Google Dataflow

Dataflow is a fast and cost-effective service for unified stream and batch data processing. It enables real-time streaming data analytics, simplifies operations and management, and also reduces the total cost of ownership. This sample CDC solution lets you ingest data from a MySQL database and sync it into BigQuery. 

To use the Dataflow CDC solution, you need to be familiar with Java 8 and Apache Maven. Here’s a summary of the steps:

  • Create a MySQL Cloud Instance and deploy your database
  • Create a set of Pub/Sub topics for the tables you want to export (a scripted example follows this list)
  • Deploy a Debezium-based connector for MySQL
  • Start a Dataflow pipeline to synchronize MySQL and BigQuery tables
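
As referenced in the second step, the per-table Pub/Sub topics can be created programmatically. A minimal hedged sketch with the official google-cloud-pubsub Python client (the project and table names are placeholders):

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()

# Placeholder project and table names: one topic per table to export
project_id = "my-project"
for table in ["customer", "orders"]:
    topic_path = publisher.topic_path(project_id, table)
    publisher.create_topic(request={"name": topic_path})
    print(f"Created {topic_path}")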

How to replicate MySQL to BigQuery?

Database replication is the process of sending data from a primary database to another data warehouse. It’s necessary for building and maintaining a strong data integration system. Change Data Capture (CDC) also allows you to replicate a critical database. There’s a Google tool to help you replicate MySQL to BigQuery.

  • Google Cloud Data Fusion

Cloud Data Fusion is a data integration service that helps users efficiently develop ETL/ELT data pipelines. It allows you to easily replicate databases like MySQL directly into BigQuery. It provides adequate support and performance for both real-time and batch data integration. 

An entire blog post would be needed to cover this process in full, but here’s a quick rundown of the steps for working with Google Cloud Data Fusion:

  • Set up a Compute Engine instance and install the MySQL database on it.
  • Enable replication by setting up your MySQL CDC source
  • Configure and start a Cloud Data Fusion Replication job
  • Check out the replicated database and table in BigQuery

What’s the best way to import from MySQL to BigQuery?

Although MySQL is an efficient database management system, it falls short when dealing with big data. So we have to migrate or replicate our databases to BigQuery for better scaling and to run complex analytical queries against massive data volumes.

As discussed earlier, there are a number of methods that you can use to connect MySQL to BigQuery. However, it is best to choose a method based on your specific data needs. Coupler.io is an efficient and code-free approach to help you automate the entire data integration process. 
