Google Analytics presents its data sampling technology as a way to report meaningful results from a subset of a large data set (see the official documentation on this topic). However, many medium-sized online businesses worry when they get a notification that a report is based on 70% of their data rather than 100%.

And they worry with good reason, especially when they analyze small segments of their audience.

Google Analytics users with a large amount of traffic normally know about the data sampling issues they may face. However, my clients sometimes ask me whether sites with smaller amounts of traffic can also be affected by data sampling.

Strictly speaking, data sampling in Google Analytics does not depend on traffic volume as such; it depends on the number of sessions that have to be analyzed to build the report. This means that even if traffic is high, reports for a short time frame (one day or a couple of days) will usually not be sampled. Conversely, if traffic is low but you request a report covering 5 years, there is still a chance that the report will contain sampled data.

It is not obvious to most Google Analytics users, but if they analyze a segment that contains only a few sessions while the total amount of traffic in the Google Analytics view is large, the report will still be sampled. This happens because, when a report is built for a segment, all the data for that time frame is analyzed, not only the data that belongs to the particular segment, as many people assume.

I had a case where a client of mine generated a report covering every month since the beginning of the year. Everything was fine through June, but in July their data started to be sampled because the time frame now included too many sessions.
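The effect described above can be sketched in a few lines of Python. This is only an illustration, not how Google Analytics works internally: the function name, the monthly figures, and the 500,000-session threshold are all assumptions, so check the official documentation for the limit that applies to your account. Note that the size of the segment is not an input at all; only the total number of sessions in the time frame matters.

```python
from itertools import accumulate

# Assumed threshold: number of sessions in the report's time frame above
# which a report may be sampled. Verify the real value for your account.
SAMPLING_THRESHOLD = 500_000


def first_sampled_month(monthly_sessions, threshold=SAMPLING_THRESHOLD):
    """Return the 1-based month at which a report that starts in month 1
    first covers more sessions than the threshold, or None if it never does."""
    running_totals = accumulate(monthly_sessions)
    for month, total in enumerate(running_totals, start=1):
        if total > threshold:
            return month
    return None


# Hypothetical site with 80,000 sessions every month, January to December.
monthly_sessions = [80_000] * 12
print(first_sampled_month(monthly_sessions))  # 7, i.e. July
```

With these made-up numbers a January to June report covers 480,000 sessions and stays under the assumed limit, while extending it to July brings the total to 560,000 and crosses it.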

The sampling level also depends on the number and kind of dimensions in the report. If you apply Country as a secondary dimension to the traffic source report, there is a chance that the report will be sampled.

The good news is that the Google Analytics interface always shows when a report has been sampled and the percentage of sessions the sample is based on.

If your data is affected by sampling and you are not satisfied with the accuracy you get, there are a number of solutions you can use.

Subdivide your data and send it to multiple Google Analytics properties

This is a good approach if you have multiple regional versions of the site. In this case, you can use an individual Google Analytics property for each of them. Each of these properties will receive less data and will therefore either not be affected by sampling at all or reach the sampling threshold only over longer time frames.

The downside of this approach is that you will not have a view that aggregates all your data. To analyze general trends, you will have to merge the data in some external tool.
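As a rough idea of what such a merge can look like, here is a minimal Python sketch. It assumes you have already exported daily session counts from each property into a simple date-to-count mapping; the function name, the variable names, and the numbers are hypothetical.

```python
from collections import Counter


def merge_daily_sessions(*per_property):
    """Sum daily session counts exported from several properties.

    Each argument maps an ISO date string to that day's session count.
    """
    total = Counter()
    for daily in per_property:
        total.update(daily)  # adds counts for dates that are already present
    return dict(sorted(total.items()))


# Hypothetical exports from two regional properties.
site_us = {"2024-01-01": 1200, "2024-01-02": 1350}
site_de = {"2024-01-01": 800, "2024-01-03": 640}

merged = merge_daily_sessions(site_us, site_de)
print(merged)
```

Additive metrics such as sessions or pageviews can be summed this way. User counts cannot, because a person who visits two regional sites would be counted twice.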

You can also send all your data to two different properties: an individual property and a roll-up property that is the same for all the sites. In this case, the individual properties will reach the sampling threshold much later than the roll-up property. Sampling matters less, however, when you are only exploring general trends.

Subdividing the data is not an ideal solution, but it is very low-cost compared to the other solutions and is suitable for many situations.

Send your data to BigQuery or another database

Officially, only Google Analytics 360 has an option to export data to BigQuery. However, there are ways for free Google Analytics users to send data to BigQuery as well.

This is a more advanced solution than splitting your data between multiple Google Analytics properties. It requires quite a lot of development work, but in the long run it is worth it: you will have all your data aggregated in one place, it will not be affected by any sampling, you will own your data, and you can analyze and process it the way you want.

Here is an article that explains how free Google Analytics users can export Google Analytics data to BigQuery.
