
Screaming Frog SEO

Tutorial by Screaming Frog.

Screaming Frog SEO is an application for automated data retrieval on the web. It can crawl and analyse single websites and, as a more specialised option, crawl several web pages and extract their hyperlinks.

This latter type of targeted information extraction is commonly referred to as "scraping". (A formal distinction between "web harvesting" and "web scraping" does not seem to be established, but the term "web scraping" is used in most cases where specific data is extracted.)

Working from a list of URLs gives you a list of the other URLs that the web pages in your crawl point to, which you can then use for network analyses, e.g. in Gephi. A CDMM step-by-step guide to link extraction can be found at the bottom of this page.
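As an illustration of the step from extracted links to a network analysis, the sketch below turns pairs of (linking page, linked page) into a weighted host-to-host edge list. It assumes the link export can be read as source and destination URL pairs; the function names are hypothetical and not part of Screaming Frog. The `Source`, `Target` and `Weight` headers are the column names Gephi recognises when importing an edge table from CSV.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def build_edge_list(link_pairs):
    """Aggregate (source URL, destination URL) pairs into weighted host edges."""
    weights = Counter()
    for source_url, destination_url in link_pairs:
        source, target = host_of(source_url), host_of(destination_url)
        # Skip unparsable URLs and links that stay on the same host.
        if source and target and source != target:
            weights[(source, target)] += 1
    return sorted((s, t, w) for (s, t), w in weights.items())


def write_edge_csv(edges, handle):
    """Write the edges as a CSV edge table with Gephi's column headers."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)


if __name__ == "__main__":
    # Hypothetical example links, for illustration only.
    pairs = [
        ("https://www.example.org/a", "https://example.net/x"),
        ("https://example.org/b", "https://example.net/y"),
        ("https://example.org/b", "https://example.org/c"),
    ]
    buffer = io.StringIO()
    write_edge_csv(build_edge_list(pairs), buffer)
    print(buffer.getvalue())
```

Collapsing URLs to hosts is a design choice that suits site-level networks; for a page-level network, the full URLs could be kept as node names instead.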

If you need the full set of external links from an entire website, you can start in the free edition by scraping the main pages, which gives you an overview of the other links on the website. This list may have to be cleaned of external links and duplicates, and the process will likely have to be repeated for a couple of iterations before you have a full list of the individual web pages on the website to use for a complete crawl of external links. The free edition's limit of 500 URLs per website means that this method is not feasible for large-scale websites or research.
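The cleaning step between iterations can be sketched as follows. This is a minimal example under stated assumptions: the extracted links are available as a plain list of URL strings, and `example.org` stands in for the website being studied. The function name is hypothetical and the matching rules (ignore fragments, ignore a trailing slash, accept the `www.` variant of the host) are one reasonable choice, not a fixed procedure.

```python
from urllib.parse import urldefrag, urlsplit


def internal_pages(urls, site_host):
    """Return the unique URLs that belong to site_host, in first-seen order."""
    seen = set()
    pages = []
    for url in urls:
        # Drop surrounding whitespace and any '#fragment' part.
        url, _fragment = urldefrag(url.strip())
        host = (urlsplit(url).hostname or "").lower()
        # Discard external links; accept the host with or without 'www.'.
        if host != site_host and host != "www." + site_host:
            continue
        # Treat '/page' and '/page/' as the same page when de-duplicating.
        key = url.rstrip("/")
        if key not in seen:
            seen.add(key)
            pages.append(url)
    return pages


if __name__ == "__main__":
    # Hypothetical links from a first crawl of the main pages.
    found = [
        "https://example.org/a",
        "https://example.org/a/",
        "https://example.org/a#top",
        "https://other.net/x",
        "https://www.example.org/b",
    ]
    for page in internal_pages(found, "example.org"):
        print(page)
```

The resulting list can be fed into the next crawl, and the step repeated until no new internal pages appear.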

Alternatively, the licensed version of the application allows unlimited-depth crawling by unticking "crawl depth limit", thus extracting outgoing links from an entire website, starting from its front page, in one go. This method may, however, be time-consuming and demanding on computing capacity if you attempt to scrape many websites at once. The results from this kind of scraping may also be merged and require data cleaning.

CDMM step-by-step guide (also includes tips on data cleaning): Link extraction with Screaming Frog SEO.

Service, including help pages: https://www.screamingfrog.co.uk/

Application download: https://www.screamingfrog.co.uk/seo-spider/

Works on: