Google BigQuery Regexp: How to Apply Regular Expressions in BigQuery
Regular Expressions (RegExp for short) are maybe one of the most important skills in the field of data analytics and data manipulation. They are used in search engines, word processors, text editors, and other popular tools.
In this article, we’ll present the functions that BigQuery provides for the use of regular expressions and some useful tips and tricks on how to perform popular actions on your data.
BigQuery Regexp Functions
Regular expressions are a pattern or a sequence of characters that allows you to match, search and replace or validate a string input. There are a lot of cases where during analysis we want to extract certain parts of a string, validate whether a string has a specific format or not, and even replace or remove certain characters.
As BigQuery is one of the most popular cloud data warehouse tools used by millions of people for storage and analysis, RegEx functions could not be missing from the list of the provided default functions:
Function | Syntax | Description |
---|---|---|
REGEXP_CONTAINS | REGEXP_CONTAINS(value, regexp) | Verifies whether the input matches the given regular expression pattern |
REGEXP_EXTRACT | REGEXP_EXTRACT(value, regexp) | Extracts the first matched sequence of characters that matches the regular expression pattern. |
REGEXP_EXTRACT_ALL | REGEXP_EXTRACT_ALL(value, regexp) | Extracts every matched sequence of characters that matches the regular expression pattern. |
REGEXP_INSTR | REGEXP_INSTR(source_value, regexp, [position], [occurrence], [occurrence_position]) | Extracts the position where the first matched pattern is starting. |
REGEXP_REPLACE | REGEXP_REPLACE(value, regexp, replacement) | Replaces the matched sequence of characters that matches the regular expression pattern with the given string. |
REGEXP_SUBSTR | REGEXP_SUBSTR(value, regexp, [position], [occurrence]) | It’s a synonym to the REGEXP_EXTRACT function, and it works the same way. |
BigQuery RegExp Tips & Tricks
Before showing you great tips and tricks on applying regular expressions and BigQuery RegExp functions on your data to extract and manipulate your data, we will first import our example dataset into BigQuery. For this, we will use Coupler.io, a solution for importing data from multiple sources into BigQuery, or Google Sheets, or Excel. The key benefit here is that you can automate data load on a schedule.

You need to:
- Set up source (the app or file to get data from)
- Set up destination (configure your BigQuery project and select the dataset and table to import data to)
- Set up a schedule (customize the frequency for automatic data refresh)
Read our BigQuery Tutorial guide to check out how it works. And here is the dataset we imported from Excel to BigQuery for the sake of examples:

Here is the preview:

Check out the many BigQuery integrations available.
Now that we’ve loaded our sample dataset, it’s time to see all the tips and tricks on how to apply regular expressions to it and make the most out of our data.
BigQuery RegExp: How to check if a column is a valid phone number
The first tip we’ll show you is how to verify whether a column contains a valid phone number or not. For this case, we will use the REGEXP_CONTAINS function and the below regular expression, which is created to match any US formatted number:
^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$
So we can use it in the below query:
SELECT phone_number, REGEXP_CONTAINS(phone_number, r"^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$") AS is_valid FROM `projectname.datasetname.tablename`

In case we use the same regular expression with the “email” column, we would get “false” as a result because it’s not a valid phone number:
SELECT email, REGEXP_CONTAINS(email, r"^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$") AS is_valid FROM `projectname.datasetname.tablename`

BigQuery RegExp: How to find a specific word
On a similar approach, we will try to find a specific word in one of our columns. In detail, we will use the “message” column to search for the word “dolor” (random word). So, if we use the RegExp .*dolor.*
, we can see the following results:
SELECT message, REGEXP_CONTAINS(message, r".*dolor.*") AS contains_word FROM `projectname.datasetname.tablename`

BigQuery RegExp: How to check if a column does not contain a word
Now let’s see a trick where you can check if a column does not contain a word. For just a single word you can also use the [NOT] LIKE operator, but RegExp are also handy for more than one word or to make search case insensitive.
Similar to the previous operation, we can try to find if a column contains a word and reverse the outcome by using the NOT operator. For example, using the same regular expression but adding NOT to the start of the expression, we can achieve the desired result:
SELECT message, NOT REGEXP_CONTAINS(message, r".*dolor.*") AS contains_word FROM `projectname.datasetname.tablename`

BigQuery RegExp: How to check if a column contains a word from another column
There are many cases where we want to evaluate whether a column contains a word from another column. For this case, we will again use the REGEXP_CONTAINS function. For the regular expression, we will use the CONCAT function to concatenate the information from the other column with the regular expression pattern
CONCAT(r'(?i)', word, r'')
Now, let’s construct the query:
SELECT message, word, REGEXP_CONTAINS(message, CONCAT(r'(?i)', word, r'')) FROM `projectname.datasetname.tablename`

BigQuery RegExp: How to replace a word
Another popular operation where regular expressions are used is to replace a specific word from an entry with another one. To do that, we’ll use the REGEXP_REPLACE function and we’ll replace the word “tellus” with the word “REPLACED”:
SELECT message, REGEXP_REPLACE(message, r"tellus", "REPLACED") AS replaced_word FROM `projectname.datasetname.tablename`

BigQuery RegExp: How to escape a special character
There are some cases where we want to use a special character in our regular expression that’s also reserved by regular expressions. Below are some examples of popular special characters reserved for regular expressions:
- Dot (
.
): Matches any possible character or symbol - Asterisk (
*
): Expects zero or more occurrences of preceding character - Question Mark (
?
): Expects zero or one occurrence of preceding character - Plus sign (
+
): Expects one or more occurrences of preceding character - Vertical Bar (
|
): Add to denote an alternative match - Circumflex (
^
): Used to denote the beginning of a buffer - Dollar Sign (
$
): Used to denote the end of a buffer
To review all reserved special characters, you can visit Mozilla’s Regular expression syntax cheat sheet.
To escape a special character like the above and use it by it’s literal value, we can use the backslash (\
). If you use a backslash right before a special regular expression character, then this will be used as a literal value rather than as an expression. For example, if we want to find all messages that contain a dot (.
) we can use the function REGEXP_CONTAINS and the escape character in our regular expression:
SELECT message FROM `projectname.datasetname.tablename` WHERE REGEXP_CONTAINS(message, r"\.")

As you can see, in this query, we returned only the messages that contain a dot using regular expressions.
BigQuery RegExp: How to split a string
A great example of how regular expressions can be useful in your analysis is when you want to split a string on a given delimiter (e.g., a space) and take the first or the second part. Using the function REGEXP_EXTRACT and this regular expression ^(.+?),
, we can extract everything that appears before the first comma in a string:
SELECT message, REGEXP_EXTRACT(message, r"^(.+?),") AS extracted FROM `projectname.datasetname.tablename`

BigQuery RegExp: How to lookahead
Lookaheads might not be that popular but can be a really useful approach in some cases. A positive lookahead will look for a matched pattern only if it’s being followed by another pattern but without making the latter one a part of the result. A negative lookahead is when we want to look for a matched pattern only if it’s not followed by a specific matched pattern.
Unfortunately, Google BigQuery is using the RE2 expression library, which is not currently supporting either positive or negative lookaheads.
BigQuery RegExp: How to replace special characters
Special characters can be really useful in a text or something that is not wanted. So there are cases where we either want to remove or replace all special characters in a text. A special character can be anything other than a letter or a number, including dots, commas, spaces, and others.
This can be achieved by regular expressions like this [^a-zA-Z0-9]+
and REGEXP_REPLACE function as shown below, where we remove all special characters from an email address:
SELECT message, REGEXP_REPLACE(message, r"[^a-zA-Z0-9]+", "") AS removed_special_characters FROM `projectname.datasetname.tablename`

BigQuery RegExp: How to use extract groups
The extraction of groups is a very popular operation of regular expressions. Groups can be a set of matched patterns that exist in the same text of interest.
The most common usage is to extract every available word from a given text. A word is defined as any set of alphabetical characters that do not contain a special character between them. To extract all words from a given text using regular expressions, we can use the REGEXP_EXTRACT_ALL function and this regular expression (\w+)
to identify all words. This function will return an array of all identified words:
SELECT message, REGEXP_EXTRACT_ALL(message, r"(\w+)") AS words FROM `projectname.datasetname.tablename`

BigQuery RegExp: How to extract a number from a string
Finally, the last trick is how to extract all numbers from a string. This can be useful if you want to find out all numbers that exist in a string and/or even replace them. To do that, we can use the REGEXP_EXTRACT_ALL function again along with this regular expression ([0-9]+)
to extract all numbers that appear in a given address:
SELECT address, REGEXP_EXTRACT_ALL(address, r"([0-9]+)") AS numbers FROM `projectname.datasetname.tablename`

When should I use BigQuery Regexp?
Regular expressions are great when it comes to string manipulation and can be an amazing asset in your analysis arsenal. With the above tips and tricks, we hope we gave you a strong head start on the topic and smooth the learning curve.
Using these tricks you will be able to boost your analysis capabilities and manipulate your data more powerfully. Regular expressions are a great skill to learn, and you will not regret getting your hands dirty.
Back to Blog