BigQuery Data Types Explained
Google BigQuery supports lots of different types of data, ranging from simple numbers to even the most complex arrays or structs. There are many of them so you can have the flexibility you need when working with data. Inevitably, it also builds up a little bit of a learning curve for beginners.
No worries, though. They’re fairly simple to grasp. To make things easier, we have gone through the list and explained each BigQuery data type for you in a simple manner, with examples.
Google BigQuery data types
Each platform that allows for storage and manipulation of data has its own data types. Because of them, data can be displayed in a logical order – for example, date and time entries follow the ‘DD-MM-YY, HH:MM:SS format (or similar).
Even more importantly, it allows for all sorts of operations to be performed on data, regardless of its type.
As you import the data into BigQuery, you will want to map the data types from the source to those available in BigQuery. They can be grouped into eight categories:
- NUMERIC (including INT64, NUMERIC, BIGNUMERIC, and FLOAT64)
- TIME (including DATE, TIME, DATETIME, and TIMESTAMP)
For the purpose of demonstration, we’ve set up a Google Sheets file with sample data for each category we’ll cover. Then, we quickly ran a Coupler.io importer that moved our table into BigQuery. After import, it looks like this:
Let’s now go through each data type, one by one.
This type can be further broken down into four sub-types:
- Integer type (INT64)
- Numeric type (NUMERIC DECIMAL)
- Bignumeric type (BIGNUMERIC BIGDECIMAL)
- Floating point type (FLOAT64)
Numeric data is used in all sorts of calculations, data reporting, and analysis. It’s arguably the most common data type.
Integers can range from -2^63 to 2^63 – 1 (or, in other words, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807), so the maximum capacity of 64bit integer. The values can’t contain decimals or fractional components.
SELECT cast('12345' as INT64) int
Numeric and bignumeric types can contain fractional numbers and are defined with precision (number of digits) as well as scale (number of digits after the decimal point). The specs are as follows:
|Precision||38||76.76 (77th digit is partial)|
SELECT cast('5.2E11' as NUMERIC) numeric, cast('5.2E37' as BIGNUMERIC) bignumeric
Both types are frequently used in financial formulas.
Floating points are numbers with approximate numeric values, with decimal or fractional components. These include non-numeric values: NaN, +inf, and -inf.
SELECT cast('5.4321' as FLOAT64) float
The hierarchy of values looks as follows (starting with the least valuable)
- Negative numbers
- Positive numbers
BOOLEAN type can be TRUE or FALSE (both case insensitive), but also NULL. The latter is the least valuable, followed by FALSE and TRUE.
SELECT cast('false' AS BOOL) AS boolean
Boolean values are common when exporting data from applications or other software solutions. They can indicate particular settings, states, or responses.
STRING type refers to variable-length data that operates on Unicode characters (rather than bytes). It must be UTF-8 encoded.
To quote STRINGs, you have to use either single (‘) or double (“) quotation marks. Alternatively, they can be triple-quoted with groups of three single (”’) or three double (“””) quotation marks.
SELECT cast(555 AS STRING) AS string
Strings are in frequent use for storing any user-generated inputs, such as logins, email addresses, replies to surveys, etc. They’re particularly common in eCommerce where they represent shipping and billing addresses, customers’ requests, tracking numbers, and many other things.
BYTES also represent variable-length data but, as the name suggests, they operate on raw bytes rather than Unicode characters. For that reason, STRING and BYTE types can’t be used interchangeably.
SELECT cast ('coupler_io' as BYTES) bytes
Similar to boolean type, bytes are in common use in computer programming when certain strings need to be encoded into a common format.
DATE type represents a calendar date, independent of a time zone. It doesn’t refer to a particular 24-hours window but, instead, will represent a different period in each different time zone.
For example, for PST (Pacific Standard Time, UTC-7) date type defined as 2021-05-01 will represent the period between 5 p.m. on April 30th and 5 p.m. on May 1st, while for Japan Standard Time (UTC+9), the same data type will represent the period between 9 a.m. on May 1st and 9 a.m. on May 2nd.
SELECT cast('2021-05-01' AS DATE) AS date
This type represents time in the way you would expect to see on a digital watch – for example, 10:37:52. It’s independent of a specific date.
The canonical format is
[H]H:[M]M:[S]S[.DDDDDD], so hours, minutes, and seconds can be expressed both as single and double-digit values. “D” stands for fractional digits and can amount to up to 6 digits (microseconds precision).
SELECT cast('5:59:12.0422' AS TIME) AS time
The same as with Date type, Time type can refer to different times, depending on the time zone in question. At the same time, Date and Time types can’t be CAST() with a specific time zone.
The only option is running a CAST() as a timestamp. For example:
SELECT CAST('2021-06-07 05:59:12.0422-3:00' AS TIMESTAMP) AS time
And would return:
DATETIME combines Date and Time types and can be CAST() as follows:
SELECT cast('2021-05-01 21:32:45' AS DATETIME) AS datetime
TIMESTAMP represents an absolute point in time, with date and time. TIMESTAMP type supports time zones and is always displayed in UTC. This time zone will also be used if you fail to specify one.
SELECT cast('2021-05-27 8:05:01-3:00' AS TIMESTAMP) AS timestamp
Making sense of date/time types
All four Google BigQuery data types are in common use in solutions built for tracking – monitoring employee’s time, estimating the delivery time, logging visits, and many many other things.
They’re quite similar to each user and often can be used interchangeably. The choice of data type depends mainly on the precision you’re after.
TIMESTAMP will always assume a time zone and can be precise down to microseconds. The same precision is also available for TIME and DATETIME. The difference is that TIME doesn’t require a time zone to be specified while DATETIME won’t support it.
DATE type works with time zones but won’t assume one by default.
GEOGRAPHY type is used to represent the collections of points, lines, and polygons and is based on OFC Simple Features specification (SFS). Most commonly, points are used to specify a particular location on Earth, using the same longitude/latitude values as any GPS system.
There are also more sophisticated use cases. Data types for BigQuery, such as GEOGRAPHY, can be exported into Google Data Studio, for example, for visualization purposes. Entire maps and routes can be built automatically with the data from BigQuery tables.
SELECT cast(ST_GEOGFROMTEXT('POINT(51.500989020415034 -0.12471081312336843)') as GEOGRAPHY) geography
ARRAY type represents an ordered list of 0 or more elements of the same data type. Note that none of these elements can be of an ARRAY type themselves. To use an ARRAY as an argument of a query, you’ll need to incorporate a STRUCT type instead.
SELECT cast(['word1', 'word2', 'word3'] as ARRAY<string>)
When working with arrays, there are two limitations that concern NULL values:
- If an ARRAY contains a NULL, BigQuery will return an error. However, queries with NULL can be, however, used inside a query.
- In query results, an ARRAY that contains only NULL values will be returned as an empty ARRAY. In the query itself, an empty ARRAY and NULL are two distinct values.
Arrays are in common use in computer programming as well as when working with large data sets. They eliminate the need to declare thousands of variables or repeat the same data over and over again (for example, when thousands of users have the same settings but only differ by the email address they used to sign up).
STRUCT type represents a container of ordered data of any type. Data types can be freely mixed up within a STRUCT. It’s required to specify the type while a field name is optional.
SELECT cast((555,'name') as STRUCT<INT64,STRING>) struct_
While declaring a STRUCT type, the types of elements are defined within angle brackets (< and >). Here are some examples of STRUCT types:
STRUCT<INT64>– the simplest STRUCT that contains only integers
STRUCT<123 STRUCT<4 INT64, 5 INT64>>– a STRUCT can also be nested within another STRUCT. Here, a STRUCT type named ‘123’, containing two fields, named ‘4’ and ‘5’, was nested within another STRUCT. Both fields are of Integer type.
STRUCT<some_array ARRAY<INT64>>– a STRUCT contains an ARRAY, named some_array.
STRUCT<NULL>– a STRUCT containing only NULL values.
STRUCT has many uses as it can contain any other Google BigQuery data types, and it can combine different types together.
A STRUCT can be, for example, a representation of an entire row in a table. Events companies could, for example, use a STRUCT element to store all the available information about a certain participant. It could be their contact details, dates of past events they participated in, timestamp of their registration and last login, all preferences, payments, and even the GPS coordinates of their favorite venue – all of these in a single STRUCT element.
BigQuery data types & standard SQL expressions
Many of the BigQuery data types work with easy with common SQL expressions, such as:
- ORDER BY
- GROUP BY
- PARTITION BY
- COMPARE TO (other values of the same type)
The exceptions are GEOGRAPHY, ARRAY, AND STRUCT types because of the wide range of arguments they can accept. It’s understandable that, for example, a STRUCT that only contains integers won’t be easy to group along with a STRUCT nesting another STRUCT or an ARRAY containing different data types.
GEOGRAPHY and STRUCT, however, offer a limited ability to be compared to other values of the same type.
All in all, the support for expressions among all data types looks like this:
|ORDER BY||GROUP BY||DISTINCT||PARTITION BY||COMPARE TO OTHERS|
|GEOGRAPHY||❌||❌||❌||❌||✅ / ❌|
|STRUCT||❌||❌||❌||❌||✅ / ❌|
When working with BigQuery data, you’ll often need to export it to different applications or, quite the opposite, bring it from outside to your BigQuery database. The latter can be done with the BigQuery Data Transfer Service.
Coupler.io supports both flows, making it easy to sync data between apps both manually, and on schedule. Be sure to give it a try if you haven’t yet.
Also, check out our detailed Google BigQuery SQL tutorial where we touched on the basics and dived deeper into the syntax and common functions. If you’re just getting started with BigQuery, it should be quite a read!
Thanks for your time, and see you around.Back to Blog