
How to Retrieve Large Data Sets Through APIs With Paginated Requests

When you make a call to a REST API, you expect to get the entire requested data set. However, if the result includes hundreds or thousands of records, you will likely retrieve only the first batch, with a limited number of records. For example, the Shopify API limits a response to 250 records per page. So, to get a data set containing 534 records, you’ll have to make three paginated requests – separate calls for three pages:

  • page #1 for 250 records
  • page #2 for 250 records
  • page #3 for 34 records

Read on to learn how to deal with pagination, and what to do if you can’t send a paginated request from your app. 

What is a paginated request?

REST API pagination is the process of splitting a data set into discrete pages served by a paginated endpoint. An API call to a paginated endpoint is a paginated request.

Three pagination techniques are most common, each defining its own way to make paginated requests:

  • Cursor pagination 
  • Page-based pagination (also known as offset pagination)
  • Keyset pagination (also known as seek pagination)

Cursor pagination

When you make a request, the server returns the first page of data along with a cursor that points to the next page (and to the previous one, if applicable). The page size falls back to a default limit unless you set the “limit=” parameter, up to the maximum permissible value. As a rule, the cursor comes in the Link response header, which may appear as follows:

link →<https://sf-sandwiches.myshopify.com/admin/api/2020-10/customers.json?limit=250&page_info=ey2IiwibGFzdJDgxODYwMzE2MiwiI6MTYkaXJlY3Rpb24iOiJwcmVF9pZCI6SMNDI3OwNTA4bGFzdF92YWx1ZjI4NTAwMH0>; rel="previous", 
<https://sf-sandwiches.myshopify.com/admin/api/2020-10/customers.json?limit=250&page_info=ey2IiwibGFzdJkaXJlYw4MNDI4ODgxODYwMzE2Y3Rpb25iOiJwcmVF9pZCI6SI6MTMiwNTAibGFzdF92YWx1ZjI5BTAwMT6>; rel="next"

How to make cursor paginated requests

Let’s take a look at how it works in the example of the Shopify API. We need to retrieve data for 534 customers in our Shopify store. Since the Shopify API limit is 250 records per page, we’ll have to make three requests: 250 + 250 + 34 records.

To make requests to the Shopify API, you’ll first need to create a custom Shopify app. For more on this, read our blog post How to Connect Shopify to Google Sheets Without Coding.

Here is what our initial request looks like:

https://KEY:shppa_PASSWORD@sf-sandwiches.myshopify.com/admin/api/2020-10/customers.json?limit=250

Note: If you forget to specify limit=250, the Shopify API will return records according to the default limit, which is 50. In this case, you’ll have to make 11 requests instead of 3.  

We now have the first batch of data limited to 250 records. If we check out the response headers, we’ll find the link header, which is our cursor to the next page:

link →<https://sf-sandwiches.myshopify.com/admin/api/2020-10/customers.json?limit=250&page_info=ey2IiwibGFzdJDgxODYwMzE2MiwiI6MTYkaXJlY3Rpb24iOiJwcmVF9pZCI6SMNDI3OwNTA4bGFzdF92YWx1ZjI4NTAwMH0>; rel="next"

Use this URL to retrieve the second page of records. After the request, you’ll get the cursors to the next and previous pages:

link →<https://sf-sandwiches.myshopify.com/admin/api/2020-10/customers.json?limit=250&page_info=ey2IiwibGFzdJDgxODYwMzE2MiwiI6MTYkaXJlY3Rpb24iOiJwcmVF9pZCI6SMNDI3OwNTA4bGFzdF92YWx1ZjI4NTAwMH0>; rel="previous", 
<https://sf-sandwiches.myshopify.com/admin/api/2020-10/customers.json?limit=250&page_info=ey2IiwibGFzdJkaXJlYw4MNDI4ODgxODYwMzE2Y3Rpb25iOiJwcmVF9pZCI6SI6MTMiwNTAibGFzdF92YWx1ZjI5BTAwMT6>; rel="next"

Use the URL marked as rel="next" to retrieve the next (here, the final) batch of records. This is how importing data through cursor pagination works. Although it is the fastest option for paginated requests, it has a significant drawback: you can’t skip pages or send parallel requests for different batches, so you have to fetch one batch after another until you reach the page you need.
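The loop described above can be sketched in Python. The header parsing follows the Link format shown earlier; the endpoint URL and the “customers” response key are placeholders from the article’s example, and the fetch function is a minimal sketch rather than a production client:

```python
import json
import re
import urllib.request

def parse_link_header(header):
    """Turn '<url1>; rel="previous", <url2>; rel="next"' into {'previous': url1, 'next': url2}."""
    return {rel: url for url, rel in re.findall(r'<([^>]+)>;\s*rel="(\w+)"', header)}

def fetch_all_pages(first_url, key="customers"):
    """Follow rel="next" cursors until the server stops returning one."""
    records, url = [], first_url
    while url:
        with urllib.request.urlopen(url) as resp:
            records.extend(json.load(resp).get(key, []))
            url = parse_link_header(resp.headers.get("Link", "")).get("next")
    return records
```

The parsing relies only on the `<url>; rel="…"` structure, so it works for any API that sends RFC 8288-style Link headers.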

Page-based or offset pagination 

With offset pagination, the data set is split into pages by two parameters:

  • “limit=” – the number of records per page
  • “offset=” – the number of records to skip before the page starts

Many APIs expose this through a “page=” parameter instead, which indicates the page number to return; the server computes the offset as (page - 1) * limit. For example:

https://api.trello.com/1/members/HGJH23/notifications?limit=400&page=2

This is the most common pagination method for apps backed by SQL databases, such as Jira and Trello.

How to make offset paginated requests

Unlike cursor pagination, where you get a response cursor to the next page, here your API request should contain the record limit and the page number based on that limit. Let’s say we need to retrieve 3,500 member notifications in Trello. The maximum limit of records per page is 1,000, so you’ll need to make four requests with the following parameters:

  • Request #1: GET /notifications?limit=1000&page=1
  • Request #2: GET /notifications?limit=1000&page=2
  • Request #3: GET /notifications?limit=1000&page=3
  • Request #4: GET /notifications?limit=1000&page=4
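The page arithmetic behind such a request list can be generated programmatically. This is a hypothetical helper, not part of any Trello client; it only builds the query parameters for each page:

```python
import math

def page_requests(total_records, limit):
    """Return the limit/page query parameters for every page needed to cover the data set."""
    pages = math.ceil(total_records / limit)
    return [{"limit": limit, "page": p} for p in range(1, pages + 1)]

# 3,500 notifications at 1,000 per page -> 4 pages
print(page_requests(3500, 1000))
```

Because every page is addressable up front, these requests can also be fired in parallel, e.g. with `concurrent.futures`.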

The best features of the page-based method are that you can jump straight to a specific page and send parallel requests for different pages.

However, the larger the offset, the slower the request. That’s why some apps, like Shopify and Slack, replaced page-based pagination with the cursor-based format.

Keyset or seek pagination

With the seek pagination method, the data set is split into pages by a key or delimiter that sets the sort order of the records. For example, if the records are sorted by the “created_at” key, the key parameter for your paginated request would be something like “since_created_at=” or “after_created_at=”.

Some APIs also translate seek pagination into a SQL-style WHERE clause. For example, if “product_id” is the delimiter, you can use parameters such as “product_id>”, “product_id<”, etc.

Although seek pagination looks quite handy thanks to the WHERE-style filtering, it has several drawbacks:

  • Cannot jump to a specific page 
  • Cannot make parallel requests for different pages
  • Delimiter is tied to the sort order 

How to make keyset paginated requests

For example, we have an inventory of 890 products sorted by id in ascending order: 1, 2, 3 … 890. The maximum limit of records per page is 300, so you’ll need to make three requests with the following parameters:

  • Request #1: GET /products?limit=300

You’ll get records from 1 to 300

  • Request #2: GET /products?limit=300&since_id=300

You’ll get records from 301 to 600

  • Request #3: GET /products?limit=300&since_id=600

You’ll get records from 601 to 890
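The three requests above can be simulated against an in-memory product list. The functions below are illustrative stand-ins: one mimics what the server does with limit and since_id, the other is the client loop:

```python
def get_products_page(products, limit=300, since_id=0):
    """Server-side view of GET /products?limit=...&since_id=...:
    records with id greater than since_id, capped at limit."""
    return [p for p in products if p["id"] > since_id][:limit]

def fetch_all_keyset(products, limit=300):
    """Client loop: use the last id of each batch as the next since_id."""
    batches, since_id = [], 0
    while True:
        page = get_products_page(products, limit, since_id)
        if not page:
            break
        batches.append(page)
        since_id = page[-1]["id"]
    return batches

inventory = [{"id": i} for i in range(1, 891)]  # 890 products with ids 1..890
# Three batches: 300 + 300 + 290 records
```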

How to retrieve large data sets with Coupler.io 

Coupler.io is an out-of-the-box solution to import data from different sources into Google Sheets. It provides the JSON Client importer to get JSON data from REST APIs and convert it to spreadsheets without coding. Read more about importing JSON to Google Sheets in our blog post. 

The JSON Client importer does not currently support cursor-paginated requests. Therefore, some users who import data from Shopify, Slack, and other cursor-paginated apps with Coupler.io find that they can’t import more data than the maximum limit per request.

Check out a few workarounds to handle this.

Simulate paginated requests in Coupler.io with the help of an API testing tool

Let’s get back to our Shopify case and use Coupler.io to import a data set containing 534 records. Use the following parameters to set up the JSON Client importer and get the first page of records:

Source
JSON URL: https://KEY:shppa_PASSWORD@sf-sandwiches.myshopify.com/admin/api/2020-10/customers.json
HTTP method: GET
URL query string: limit: 250
Path: customers
Settings
Import mode: append

Click Save & Import and welcome the first batch of data in Google Sheets.

After that, use an API testing tool, such as Postman or Hoppscotch, to send the paginated request using the same JSON URL with the “limit=” parameter appended:

https://KEY:shppa_PASSWORD@sf-sandwiches.myshopify.com/admin/api/2020-10/customers.json?limit=250

This is needed to read the response header containing the link to the next page. Enter the URL in the corresponding field and click Send to make the request. After that, check the Headers section of the response to find the link header. This is how it looks in Postman:

Copy the URL from the link header, take the value of its “page_info” parameter, and add it as a “page_info” entry in the URL query string field of your JSON Client importer. Here is how the setup will look:

Source
JSON URL: https://KEY:shppa_PASSWORD@sf-sandwiches.myshopify.com/admin/api/2020-10/customers.json
HTTP method: GET
URL query string: limit: 250
page_info: eyJsYXN0X2lVlIjoxNjA1MDgyMjkIjo0Mjc4ODE4NjY4Njk4LCJsYXN0X3ZhbHg1MDAwLCJkaXJlY3Rpb24iOiJuZXh0In0
Path: customers
Settings
Import mode: append

Click Save & Import and welcome the second page of records in Google Sheets.

Repeat this for the next requests using both Coupler.io and your API testing tool. 

Export the data set manually in CSV and use Coupler.io to keep it updated 

This approach works in the case of Shopify, since the manual export of customers is supported.

Manually export the data set from Shopify into a CSV file.

Upload this CSV file to Google Sheets from your device. 

Now you can set up the JSON Client importer to import the recently added entries from Shopify to Google Sheets. 

Note: the structure of manually exported data differs from that of data imported via the API. So, you’ll need to import new entries to a separate sheet and map them to the main data set using the QUERY function. Learn more about using QUERY in Google Sheets.

Bonus: How to import 250+ Shopify orders to Google Sheets

This is a real-life use case that we had to deal with. One of the Coupler.io users needed to automatically import around 10K Shopify orders to Google Sheets. The following solution did the job well:

Step 1: Set up a JSON Client importer 

Set up a JSON Client importer with the following parameters and run it to import the first batch of orders data:

Source
JSON URL: https://KEY:shppa_PASSWORD@sf-sandwiches.myshopify.com/admin/api/2020-10/orders.json
HTTP method: GET
URL query string: limit: 250
Path: orders
Settings
Import mode: append

Step 2: Tailor a delimiter for orders to send requests 

We’ll use an approach similar to seek pagination to retrieve data, with “created_at” as the delimiter. Scroll down to the final record of the imported data.

Since Coupler.io uses the spreadsheet’s time zone, the timestamps you see are already in your local time. To fetch the next 250 rows, you need to subtract one second from the creation time of the last imported order (DATE – 1 second). Here is the formula to do this:

={date-value} - 1/(24*60*60)

Now, transform the resulting date-time value into the following format:

yyyymmddThh:mm:ss{time-zone}

where {time-zone} is the time zone of your Google Sheets account in the format ±hh:mm (for example, +02:00).

To find your time zone, open File => Spreadsheet settings.

Here is how the date-time value looks in our case:

20200810T09:11:32-07:00
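If you prefer to compute the created_at_max value outside of Google Sheets, the subtract-one-second-and-format step looks like this in Python (the function name is ours; the compact date format and the example timestamp come from the steps above):

```python
from datetime import datetime, timedelta

def created_at_max(last_order_time, tz_offset):
    """Subtract one second from the last order's creation time and format it
    as yyyymmddThh:mm:ss{time-zone} for the created_at_max query parameter."""
    t = last_order_time - timedelta(seconds=1)
    return t.strftime("%Y%m%dT%H:%M:%S") + tz_offset

# The article's example: last order created at 09:11:33 in UTC-07:00
print(created_at_max(datetime(2020, 8, 10, 9, 11, 33), "-07:00"))
# -> 20200810T09:11:32-07:00
```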

Step 3: Add the “created_at_max” parameter to the JSON Client importer setup 

Insert the “created_at_max” parameter into the URL query string field. This is how the JSON Client importer configuration should look:

Source
JSON URL: https://KEY:shppa_PASSWORD@sf-sandwiches.myshopify.com/admin/api/2020-10/orders.json
HTTP method: GET
URL query string: limit: 250
created_at_max: 20200810T09:11:32-07:00
Path: orders
Settings
Import mode: append

Click Save & Import and welcome the next 250 records of your data set. 

You’ll have to repeat Steps 2 and 3 for each consecutive request. 

How to append the downloaded history data on a daily basis

The following approach will help you import recently added orders from Shopify and accumulate them as a historical database in Google Sheets. For this, create four separate sheets:

  • Sheet 1 titled “Last 250 orders”
  • Sheet 2 titled “Last order”
  • Sheet 3 titled “Only new orders”
  • Sheet 4 titled “History data”

JSON Client importer for Last 250 orders

For the sheet “Last 250 orders”, set up a JSON Client importer that will import the last 250 orders from Shopify on a daily basis. Here are the parameters to use:

Source
JSON URL: https://KEY:shppa_PASSWORD@sf-sandwiches.myshopify.com/admin/api/2020-10/orders.json
HTTP method: GET
URL query string: limit: 250
Path: orders
Destination
Sheet name: Last 250 orders
Settings
Automatic data refresh: Interval: Every day
Day of week: {choose the days you need}
At: 4 {or another time you need}
Time zone: {choose the time zone you need}
Import mode: replace

Google Sheets importer for Last order

For the sheet “Last order”, set up a Google Sheets importer that will extract the latest order from the list of the last 250 orders. Here are the parameters to use:

Source
Spreadsheet: {URL or ID of the "Last 250 orders" sheet}
Range: 2:2
Sheet name(s): Last 250 orders
Destination
Sheet name: Last order
Settings
Automatic data refresh: Interval: Every day
Day of week: {choose the days you need}
At: 6 {or any time after the new orders have been appended to the historical data}
Time zone: {choose the time zone you need}
Import mode: replace

The Shopify API returns orders sorted by creation date, newest first, so today’s latest order will be in the first row.

QUERY+FILTER formula for Only new orders

For the “Only new orders” sheet, use the QUERY and FILTER functions to fetch only the orders created after the last order. The following formula in cell A1 will do the job:

=QUERY(FILTER('Last 250 orders'!{data-range},'Last 250 orders'!A2:A>'Last order'!A1),"select Col1,Col2,Col3")

In our case, it looks as follows:

=QUERY(FILTER('Last 250 orders'!A2:H,'Last 250 orders'!A2:A>'Last order'!A1),"select Col1,Col2,Col3")

Note: Specify as many columns as you need, both in the data range and in the select clause ("select Col1,Col2,Col3").
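For reference, the filtering that the QUERY+FILTER formula performs, keeping only the rows whose column A value exceeds the recorded last order, can be sketched in Python (the field names are illustrative):

```python
def only_new_orders(last_250, last_order_key):
    """Keep only the rows whose sort key (the value the formula reads from
    column A) is greater than the last order recorded on the previous run."""
    return [row for row in last_250 if row["key"] > last_order_key]

# Orders arrive newest first; 101 was the last order seen on the previous run
orders = [{"key": 105}, {"key": 104}, {"key": 101}]
```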

Google Sheets importer to accumulate historical data

Finally, set up another Google Sheets importer to accumulate the historical data, using the following parameters:

Source
Spreadsheet: {URL or ID of the "Only new orders" sheet}
Sheet name(s): Only new orders
Destination
Sheet name: History data
Settings
Automatic data refresh: Interval: Every day
Day of week: {choose the days you need}
At: 5 {or another time one hour later than the import of Last 250 orders}
Time zone: {choose the time zone you need}
Import mode: append

As a result, you’ll get the following daily flow:

  • 4 am: the last 250 orders from Shopify are imported; the new orders are filtered and stored on a separate sheet.
  • 5 am: Only new orders are appended to your historical data sheet.
  • 6 am: The last order is updated with a new value for the next import.

None of the above worked for me

Sign up for Coupler.io and import data from APIs and other sources into Google Sheets. If you fail to import your data using Coupler.io, don’t worry! Our team will be happy to help you solve your use case and make your importer work seamlessly. Feel free to contact our support team for professional advice. Good luck with your data!

  • Julia Ryzhkova

    Product Manager at Railsware. I consider myself a 4 P's manager, focusing on Product, Process, Project, and People. In any of those fields, the main driver for me is the ability to make people's lives easier. That's also why I like to share my experience via blog. I have worked in consultancy, R&D, support, marketing, and PMO, which helps me look at any question from diverse points of view. Besides work, I really enjoy traveling and playing board games🎲
