At first glance, Google BigQuery (BQ), a data warehouse, does not offer much in terms of SEO. However, there are some important restrictions that BQ can help to overcome.
Also, besides solving problems, it can provide insights that are not available using other SEO tools and services. As part of the “Google Cloud” service, it can be natively integrated with other Google apps for easy data importing or visualization.
Why choose BigQuery for SEO data storage?
Nowadays, advanced SEO looks more like data science. It involves analyzing huge amounts of information that needs to be stored somewhere.
BigQuery comes to mind first when thinking about data storage for SEO, simply because it is owned by the same company that has the most popular search engine.
Additionally, other Google services come up when you think of analytics – Google Search Console (GSC) and Google Analytics (GA).
Considering all of the above, BigQuery is an obvious choice to store and manage SEO-related data.
How BigQuery can help you with SEO
Mainly used for data storage, BigQuery is often overlooked when speaking of SEO. At first glance, it makes sense: what can you do besides dumping data into it? But there are a few non-obvious ways to take advantage of using BQ in the SEO race.
Solution 1: Store all the data from GA
Google Analytics is the most popular tool for analyzing website performance. Its capabilities are enough for most websites, but there is a serious flaw.
The problem with GA is that it cannot show 100% accurate data for big projects. Google calls it “Data sampling”. What it does is basically show you calculations based on a part of the data (sample), instead of the full data.
What information is sampled
Google posted an article about sampling where it defined the thresholds for data sampling:
- Default reports will not be sampled
- Analytics Standard: 500k sessions
- Analytics 360: 100M sessions with events, custom variables, and custom dimensions and metrics – other queries are restricted to 1M. Also, historical data is limited to up to 14 months.
Looking through these limitations, it can be difficult to understand whether you need to transfer to BigQuery or not. The easiest way to check if your reports are calculated or based on real data is to check the reports you are using: if you see this icon on your dashboard, then you see sampled statistics.
How to store and use GA data?
What you are doing with BigQuery is basically making a copy of all your GA data. You need to transfer every session to BQ to make it work (we will talk about this later).
After this is completed, it is time to recreate reports based on this data. The easiest way to do this is via Google Data Studio which is also a part of Google Cloud, so connecting data sources from BQ is not challenging at all.
The main limitation of GDS is its speed – for heavy dashboards, it can take a long time to load. An alternative is Microsoft Power BI. Its pros and cons are the exact opposite of GDS: it is fast but harder to set up.
Solution 2: Make a keyword rank checker from GSC
The Google Search Console brings the most valuable data for any SEO: the search queries data. While GSC’s dashboards are good enough to briefly check how the site’s going and the performance of some specific pages or keywords, it cannot show you a position change for each keyword in a single view.
This is where BigQuery comes into play. Using the GSC API, it is possible to store the positions of each keyword by every country and date. Having this database structure, it is not a problem to display positions in a similar format:
Google Search Console limitations
Although it is an avowed #1 SEO tool, GSC, unfortunately, has some data loss. All its metrics (clicks, impressions, positions) are recorded when the page is shown to a user (this action counts as one impression).
Another fact is that not all pages from the top 100 are shown to the user. By default, the first page of SERP is limited to 10 results. This means that pages beyond the tenth position get much fewer impressions, and impressions decrease for pages that rank lower.
In fact, it is possible that pages that rank in the 20th and 98th positions will receive 0 impressions. That way, both URLs will have no data in GSC, so determining their positions will not be possible.
This is how the GSC position graph looks like in this case:
Those gaps are the data loss we are talking about. What positions did we have when there was no position recorded? Did we drop below 10th, 20th, 50th, or 100th place? Or did no one search for this term that day? Those are questions we will never know the answer to.
Or will we?
Actually, there is a way to overcome this flaw: you can use a 3rd-party keyword rank checker. It works in a different way than the Search Console: each day it performs a Google search of the keywords you want to track the positions of. Its settings let it see the top 100 SERPs per query.
That way, if your URL is not presented in the results, its position is outside of the top 100. You can also know your exact rankings on each specific date. This way, you will be able to see a complete picture of your site’s performance.
Solution 3: Store and analyze your server logs
Not so long ago, Google Search Console released a Coverage report. With it, it is possible to see how long ago Googlebot visited your page. This data is valuable because if the page is not crawled often, it means that Google does not consider it valuable enough. It is a strong sign that there is something wrong with the page and you need to fix it.
Unfortunately, the Crawling report provides only the last crawling date for a limited amount of pages.
With BigQuery, it is possible to store every single Googlebot visit to precisely know at what time, what page, and how many times Google visited your site. From this data, you will know how many times each page is crawled and discover URLs that are rarely crawled (compared to others).
Solution 4: Merge GA, GSC, and custom data
Since BigQuery is based on SQL, it is the perfect place to save all the data in one place because it makes data merging much easier via DML.
To merge data, you need a property that will be presented in several datasets. Basically, it is very similar to the VLOOKUP function in Microsoft Excel.
Luckily, all SEO metrics relate to one of the following items: domain, URL, keyword, or date.
That way, data from different sources can be merged by common properties. Let’s see an example: you have several tables with SEO data. They are:
- Position tracker
- Server logs
All of them relate to some specific URL, so we can create a single database containing all the data of this format:
There can be much more data. For example, you can display your GA events and goals to see how users interact with your page or calculate the page’s search visibility – it is all up to you. The point is that you will be able to see all the stats of your URLs to detect specific problems on each page.
How to import data to BigQuery
To store and process data, it should be imported to BQ beforehand. There are several methods to do this, depending on the data source and regularity of imports: for some services, you need to set up daily uploads, and some data needs to be uploaded just once.
Luckily, BQ supports various uploading methods, so let’s review them in detail.
Currently, there are 350+ apps integrated with BigQuery. You are most likely using some of them already. The list includes mainly Google products:
- Google Analytics
- Google Tag Manager
- Google Sheets
- Google Calendar
- Google Maps
- Play Market
- PageSpeed Insights, etc.
For non-Google products, there is a separate marketplace of pre-made APIs. There are much more apps to choose from: you can connect anything from CMS (e.g., WordPress) to financial services (e.g., Stripe)
Besides Google’s marketplace, you can use BigQuery integrations by Coupler.io.
Coupler.io is a solution for importing data from multiple sources into BigQuery or spreadsheets, Google Sheets, or Excel. It provides more sources to choose from that are not presented in Google’s marketplace, such as Airtable, Facebook Ads, Xero, and many more.
In addition to loading data to BigQuery, you can export queries from it to Google Sheets or Excel on a custom schedule. The BigQuery data export is quite beneficial for making miscellaneous reports and live dashboards.
Try Coupler.io for free to check out how easier it is to get data from your app to BigQuery!
If you have your own application to transfer data from, BigQuery allows this: there are libraries that support any type of import from a third-party application, such as migration from another data storage platform or regular imports from your application.
Something to consider: BQ’s API is divided into five parts:
- Data Transfer
We will not dive deeply into this, because if you have your own application you know how to work with API.
If you have files that do not need to be regularly uploaded or you are okay with regular manual uploads, you can upload local files or folders.
However, if you need to regularly upload local files, it will be much more effective to set up automatic uploads to BigQuery.
Wrapping up: Should I use a BigQuery for SEO?
Yes, absolutely! The current big data era has an influence on the entire web, including SEO.
The fastest way for SEOs to enter this era is to use BigQuery since it provides simple data import, transformation, and further visualization. Use the rich integrations library to automate imports, the well-known language SQL to perform operations with databases, and Google Data Studio to build reports in a couple of minutes.
BigQuery is a comparatively simple tool for big data storage and processing and its capabilities are enough for websites of any size.
Kelly Breland is a Digital Marketing Manager at SE Ranking with experience in SEO, digital and content marketing. She is a persistent advocate of using content marketing to build a solid brand. In her spare time, she is engaged in gardening.Back to Blog