This article is largely inspired by the talk "The impact of Big Data and Machine Learning on online marketing", given at SEO Camp 2017 by Rodolphe Quenette of Authoritas and Cécile Beroni, SEO manager at PriceMinister.



At PriceMinister, SEO represents:

  • 24 million pages
  • 17 million pages indexed
  • 30-40% of total visits

Hence the high stakes around Big Data.

This article does not aim to cover these methods exhaustively or to give complete answers: PriceMinister is willing to tell us what it does, but not to give away all its recipes!

Do you know the recipe for Nutella?

>> The goal is rather to list three specific examples of Big Data optimizations that PriceMinister has worked on.

Predict Google's crawl in order to optimize it

At the beginning, crawling the whole PriceMinister site took... 6 months! As a result, Google's view of the site was never up to date, which did not work well :).

“We had to think about optimizing Google's crawl. How could we get Google to crawl what we wanted, and optimize caching accordingly, so that Google spends its daily crawl time on the things that matter?”

  • PriceMinister has managed to predict 80% of the URLs that Google will crawl each day!
  • This opens up a lot of room to optimize caching, rotation time, etc.
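
The talk did not explain how this prediction is made. As a purely illustrative baseline, and not PriceMinister's actual method, one could predict that a URL will be crawled tomorrow if Googlebot requested it on at least a given number of recent days. The input format, the function names, and the `min_days` threshold below are all assumptions made for this sketch.

```python
from collections import Counter


def predict_crawled_urls(daily_crawls, min_days=2):
    """Predict tomorrow's crawled URLs from past Googlebot hits.

    daily_crawls: list of sets, one per past day, each holding the URLs
    Googlebot requested that day (hypothetical, pre-parsed server logs).
    A URL is predicted if it was crawled on at least `min_days` days.
    """
    days_seen = Counter()
    for urls in daily_crawls:
        days_seen.update(urls)  # a set counts each URL once per day
    return {url for url, n in days_seen.items() if n >= min_days}


def recall(predicted, actual):
    """Share of the actually crawled URLs that were predicted."""
    if not actual:
        return 0.0
    return len(predicted & actual) / len(actual)


# Made-up example data: three days of crawl history.
history = [
    {"/jewels", "/books", "/phones"},
    {"/jewels", "/phones"},
    {"/jewels", "/toys"},
]
predicted = predict_crawled_urls(history, min_days=2)
actual_next_day = {"/jewels", "/phones", "/toys"}

print(sorted(predicted))
print(round(recall(predicted, actual_next_day), 2))
```

The `recall` helper measures the same kind of figure as the 80% quoted above: the share of URLs really crawled that had been predicted. Once the likely URLs are known, they are natural candidates for cache warming.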

Prediction of the best keywords

Keyword analysis often goes pretty well; see this article on keyword research, for example. But how do you do it when you are dealing with millions of keywords rather than hundreds?

Here is an example of a method you can follow:

  • Extract the h1 of every page:
    • detect the keywords
    • then find similar keywords by looking at the Google results for each one
    • then look at competitors’ h1s, and repeat the sequence
    • in the end, we recover... 2.4 million keyword opportunities!
    • yeah, that's not bad :D
  • Obviously, this data then has to be processed...
  • Scoring keywords: we sort by search volume (easy), check (of course!) that we actually sell products matching those search queries, and bam, we get a prioritized list. On a site like PriceMinister, landing pages can then be created quickly for each keyword, so no problem :) (I know, that's not the case for everyone!).
    • how do we estimate the ranking potential for each keyword?
    • of course:
      • search volume
      • CPC
      • CTR %
    • plus: competitors’ URLs with Majestic data (Trust Flow, Citation Flow), analyzed against the best page of the PriceMinister site for the same keyword, to see how our metrics compare to theirs
    • and many others apparently, but we won't find out!
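
The exact scoring formula was not disclosed. To make the idea concrete, here is a minimal sketch that assumes one common heuristic: estimate a keyword's value as search volume × CTR × CPC, after discarding keywords with no matching product. The `Keyword` fields, the formula, and the sample figures are assumptions, not PriceMinister's recipe.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    term: str
    search_volume: int  # monthly searches
    cpc: float          # cost per click, in euros
    ctr: float          # expected click-through rate, between 0 and 1
    in_catalog: bool    # do we sell a matching product?


def score(kw):
    """Assumed score: estimated monthly value of the organic clicks."""
    return kw.search_volume * kw.ctr * kw.cpc


def prioritize(keywords):
    """Keep only the keywords we can serve, highest score first."""
    sellable = [kw for kw in keywords if kw.in_catalog]
    return sorted(sellable, key=score, reverse=True)


# Made-up example data.
keywords = [
    Keyword("gold ring", 10000, 0.50, 0.10, True),
    Keyword("silver necklace", 4000, 1.00, 0.20, True),
    Keyword("diamond mine tour", 50000, 0.30, 0.10, False),
]

for kw in prioritize(keywords):
    print(kw.term, round(score(kw), 2))
```

Note that the keyword with the largest search volume is dropped here because nothing in the catalog matches it, which mirrors the "check that we sell the products" step. Link metrics such as Trust Flow could be added as further factors in the same function.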

Predictions on urls

Beyond keywords, we try to identify the URLs with the highest growth potential because, given the volume of pages, we cannot work on everything.

“And after all, looking at the URLs makes sense, right? With a website, with SEO, we always talk about keywords, but in the end, it's pages that we manage, right? :)”

Here we work by category, cocoon, or cluster (use whichever word you prefer), and then we run the analysis.

For example, for the semantic cluster "jewels":

  • we compare our URLs with the competitors' URLs
  • we sort these URLs by type, using Majestic data for example
  • we can then detect the pages with the highest potential for improvement by comparing our pages to theirs, and understand what we are missing in terms of on-page content, linking, etc.
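
The comparison described in the list above can be sketched as follows. This is only one possible reading, under assumptions: for each keyword of the cluster, take our best page and the best competitor page by Trust Flow, and rank our URLs by the size of the gap. The data layout, the function name, and the sample values are hypothetical; real Trust Flow values would come from Majestic.

```python
def improvement_gaps(our_pages, competitor_pages):
    """Rank our URLs by how far they trail the best competitor page.

    Both arguments map a keyword to a list of (url, trust_flow) tuples.
    Returns (our_url, keyword, gap) tuples, largest gap first.
    """
    gaps = []
    for keyword, pages in our_pages.items():
        rivals = competitor_pages.get(keyword)
        if not pages or not rivals:
            continue  # nothing to compare against
        our_url, our_tf = max(pages, key=lambda p: p[1])
        _, rival_tf = max(rivals, key=lambda p: p[1])
        if rival_tf > our_tf:
            gaps.append((our_url, keyword, rival_tf - our_tf))
    return sorted(gaps, key=lambda g: g[2], reverse=True)


# Made-up example data for the "jewels" cluster.
ours = {
    "gold ring": [("/jewels/gold-ring", 20), ("/jewels/rings", 35)],
    "silver necklace": [("/jewels/silver-necklace", 40)],
    "pearl earrings": [("/jewels/pearl-earrings", 10)],
}
theirs = {
    "gold ring": [("https://competitor.example/gold-ring", 50)],
    "silver necklace": [("https://competitor.example/necklaces", 30)],
    "pearl earrings": [("https://competitor.example/pearls", 45)],
}

for url, keyword, gap in improvement_gaps(ours, theirs):
    print(url, keyword, gap)
```

Pages where we already lead are left out of the result, so the output is a short worklist for the cluster. A single link metric is a simplification: on-page and internal linking signals would need their own comparisons.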

There you have it! Hopefully this short article gave you some hints on how to use Machine Learning and Big Data to optimize your SEO when you manage a marketplace with tens of millions of web pages.

   Article written by Louis Chevant

Further reading

The complete guide to using the Google Search Console

Google Search Console: free tool provided by Google to analyze the SEO performance of your website.