This article is largely inspired by the talk "The Impact of Big Data and Machine Learning on Online Marketing" at SEO Camp 2017, given by Rodolphe Quenette of Authoritas and Cécile Beroni, SEO Manager at Price Minister.
At Price Minister, SEO represents:
- 24 million pages
- 17 million pages indexed
- 30-40% of total visits
At that scale, Big Data problems come up everywhere...
The purpose of this article is not to cover these methods exhaustively, nor even to give complete answers: Price Minister is willing to tell us what it does, but not to reveal all of its recipes, of course!
Do you know the recipe for Nutella yourself?
>> The objective is rather to walk through 3 concrete examples of Big Data-related optimizations that PriceMinister has worked on.
At first, crawling the whole Price Minister website took... 6 months! So Google's crawls were never up to date, and things weren't working well :).
>> The team had to think about how to optimize Google's crawl: how to get it to crawl what they want, and optimize caching accordingly, so that in the daily crawl budget Google allots, it spends its time on the things that matter!
- Price Minister managed to predict 80% of the URLs that Google will crawl each day!
- That opens up big opportunities to optimize caching, response times, etc.
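Price Minister doesn't reveal how the prediction is done. As a purely illustrative sketch, one of the simplest baselines is a frequency heuristic over recent server logs: a URL Googlebot hit on several of the last few days is likely to be hit again tomorrow. The log data and threshold below are assumptions, not the actual method:

```python
from collections import Counter

def predict_crawled_urls(daily_crawl_logs, min_days=2):
    """Naive crawl prediction: a URL crawled by Googlebot on at least
    `min_days` of the recent days is predicted to be crawled again.
    daily_crawl_logs is a list of sets, one set of crawled URLs per day."""
    counts = Counter(url for day in daily_crawl_logs for url in day)
    return {url for url, n in counts.items() if n >= min_days}

# Hypothetical Googlebot hits extracted from 3 days of access logs
logs = [
    {"/p/iphone", "/p/tv", "/home"},
    {"/p/iphone", "/home", "/p/shoes"},
    {"/p/iphone", "/p/tv", "/home"},
]
predicted = predict_crawled_urls(logs, min_days=2)
# {"/p/iphone", "/p/tv", "/home"} -- candidates worth pre-warming in cache
```

Pages in the predicted set are exactly the ones worth pre-generating or keeping warm in the cache before Googlebot arrives.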
Keyword analysis usually goes fine on a handful of keywords... see for example this article on keyword research. But what do you do when you're dealing not with a few hundred keywords, but millions?
Here's an example of a methodology:
- Extract all the h1s from all the pages:
- detect the keywords they contain,
- then find all the similar keywords by looking at the Google results for each one,
- and likewise by looking at the competitors' h1s,
- cross-reference it all...
- and we end up with... 2.4 million keyword opportunities!!
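The first step of that methodology, pulling the h1s out of every page, can be sketched with the standard library alone (the sample HTML is invented for illustration; at Price Minister's scale this would run over a full crawl dump, not single pages):

```python
from html.parser import HTMLParser

class H1Extractor(HTMLParser):
    """Collect the text content of every <h1> on a page (stdlib only)."""
    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.h1s = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.h1s.append("")  # start a new h1 buffer

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.h1s[-1] += data  # accumulate text inside the current h1

def extract_h1s(html):
    parser = H1Extractor()
    parser.feed(html)
    return [h.strip() for h in parser.h1s]

page = "<html><body><h1>Silver jewels</h1><p>...</p><h1>Gold rings</h1></body></html>"
# extract_h1s(page) -> ["Silver jewels", "Gold rings"]
```

Running the same extractor over competitor pages gives the second corpus of h1s to cross-reference against.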
- Of course, then you have to process this data...
- Keyword scoring: sort by search volume and difficulty, check (again!) that we actually have the products behind each keyword, and there you go, a prioritized list. Of course, on a site like Priceminister you can quickly create landing pages for all of them, so that part isn't a problem 🙂 (I know it's not the case for everyone!). Back to scoring:
- how do you estimate the ranking potential on each keyword?
- the obvious signals, of course:
- and also competitors' URLs via Majestic: Trust Flow, Citation Flow, compared against the best Priceminister page for that keyword, to see how our data stacks up against theirs,
- and apparently many other signals that we won't get to know!
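The scoring step above can be sketched as a toy priority function. The formula and weights here are assumptions for illustration, not Price Minister's actual scoring; the only constraints taken from the talk are: reward volume, penalize difficulty, and drop keywords with no products behind them:

```python
def score_keyword(volume, difficulty, has_products):
    """Toy priority score (weights are illustrative assumptions):
    higher search volume is better, higher difficulty is worse,
    and a keyword with no matching products scores zero."""
    if not has_products:
        return 0.0
    return volume / (1 + difficulty)

# Hypothetical (keyword, monthly volume, difficulty 0-100, products?) tuples
candidates = [
    ("silver jewels", 5400, 30, True),
    ("gold rings", 8800, 70, True),
    ("unicorn saddle", 900, 5, False),
]
ranked = sorted(candidates,
                key=lambda kw: score_keyword(kw[1], kw[2], kw[3]),
                reverse=True)
# "silver jewels" ranks first (5400/31), "unicorn saddle" drops to zero
```

Applied to the 2.4 million opportunities, a function like this is what turns the raw list into a prioritized backlog of landing pages to create.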
Beyond keywords, the goal is to identify the URLs with the highest growth potential, because given the volume of pages, it's impossible to work on everything.
After all, looking at URLs makes sense, right? In SEO we always talk about keywords, but in the end, concretely, it's pages that we manage, isn't it? 🙂
Here the team works by category, cocoon, cluster, whichever word you prefer, and then runs the analysis.
For example the semantic cluster "jewels":
- compare these URLs against the competitors' URLs,
- profile these URLs with, e.g., Majestic data,
- from there, detect the pages with the highest improvement potential by comparing our pages to theirs, and understand what our site is missing from an on-page, internal-linking, etc. point of view.
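One simple way to express that comparison is a per-metric gap score: for each of our pages, how far it trails the best competitor page on the same cluster. The metric names below (trust_flow, citation_flow, word_count) and the sample values are illustrative assumptions, not Price Minister's actual data:

```python
def improvement_potential(ours, competitor):
    """Per-metric gap between our page and the best competitor page.
    Only shortfalls count: metrics where we already lead contribute 0."""
    return {m: max(0, competitor[m] - ours.get(m, 0)) for m in competitor}

# Hypothetical pages from the "jewels" cluster
our_page = {"trust_flow": 20, "citation_flow": 35, "word_count": 250}
best_rival = {"trust_flow": 45, "citation_flow": 40, "word_count": 900}

gaps = improvement_potential(our_page, best_rival)
# {"trust_flow": 25, "citation_flow": 5, "word_count": 650}
# -> the word-count gap dominates: this page mostly lacks on-page content
```

Summing the gaps per URL and sorting the cluster by that total gives the "highest improvement potential" list the article describes.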
That's the end of this little article, which gives some ideas on how to use Machine Learning and Big Data to optimize your SEO strategy when you manage a marketplace with tens of millions of web pages.