Have you ever been stuck in scrapping meta tags’ information from thousands of URLs?
You might think about using a third-party tool to scrap these data for you including important metadata, hreflang, H1s, and more. then do the required actions before running the tool again.
While that seems good for some people, there are some who need a fast and free way to do that, and that when our Google Sheet Scrapper comes.
Google Sheets Scrapper
Google Sheets has been every SEO’s best friend for a long time. Due to its various extensions and scripts to do various tasks on our behalf. And now, we’ll extend Google Sheets usability even further. Using SEO Basha’s Free Google Sheet Scrapper, you can do the following:
- Page Titles & Meta Titles
- Meta Description
- H1s
- The canonical link on the page
- Hreflangs
While that seems useful, you can also check for duplicate meta on every page (i.e., duplicate titles, descriptions, H1s, etc.).
Benefits of Google Sheet Scrapper
- Get the data you want: You can get the title tag, description tag, H1, robots tag, canonical tag, and hreflangs for each URL.
- Easy to use: we all used to manage our data using Google Sheets with its functionality of creating graphs, getting an overview of the data, and more.
- Right in place: you don’t need to import/ export your data anymore. You can run the functions, get the data you want, take your actions, and re-check again.
- No Code Required: you don’t need to write a line of code using this sheet, you can use simple functions to get the job done for you.
- Get data for thousands of URLs: using this sheet, you can get your needed data for thousands of URLs from any website.
- Free: No need to purchase a third-party tool, pay for your quota, or pay for this service. It’s a 100% FREE.
Getting Website Meta Tags In Google Sheets 100% Free
In the following steps, we will guide you on how to get the most out of the Google Sheets Scrapper in scrapping URL data and checking for errors. The main function we’re gonna use is the ImportXML built-in function in Google Sheets. In the following steps, we will see how to extract/ scrape our data from a set of URLs. The data we want to scrape are:
- Page Titles & Meta Titles
- Meta Description
- H1s
- The canonical link on the page
- Robots
- Hreflangs
1. Getting Title& Meta Title
As you know the difference between title and meta title tags: the title tag is the one showing in the browser tab while viewing a page, while the meta title is the one set for search engines to show it within SERP.
To scrape the title tag, it’s pretty straightforward. You should only use the importXML function as the following. Given the first parameter in the function, which is the cell containing your URL. The second parameter is set to “//title” to tell the function about the data we want.
=IMPORTXML(A2,"//title")
As we said before, we want to scrape the data and check for duplicate ones at the same time. The previous solution works fine and returns all the tags one above the other on each cell as the following image shows.
The previous image shows how titles are shown one above the other in each cell. While that function works, it will cause an error while trying to get title for multiple URLs and will cause a lot of confusion tracking each title correctly.
Solution:
The best way to get title tags for a URL without interfering with other URLs is to use another useful function in Google Sheets called textJoin. This function allows us to join different text instances in one cell using a delimiter. In our case, we can use any special character delimiter (i.e., $$$). You can use any special characters that you know will not appear in any place in the data. So, the final formula will be as follows:
=textjoin(" $$$",1,IMPORTXML(A2,"//title"))
The textjoin function will join any number of title tags in the cell A2 delimitered by a new line char(10).
Getting the meta title
Getting the meta tile is a little bit tricky and you need to add another layer for the importXML function to get it as it isn’t shown in the HTML code of the page like <title>this is a title</title>. But instead, it’s shown like <meta name=”title” content=”this is meta title”>.
So, to track the new layer of content in the meta title we need to make it clear in the function as the following:
=IMPORTXML(A2,"//meta[@name='description']/@content")
The second parameter looks different as the tags start with //meta referring to the meta tag, and the name of this tag is the title. While the final value we want to get appears in the content field.
As stated before, we need to check if there are multiple meta title tags on the page. And we can easily do this using the textjoin function. The final function formula will be as follows:
=textjoin(" $$$",1,IMPORTXML(A2,"//meta[@name='description']/@content"))
In this case, there is no meta title on our page, but if you have one, it should appear there.
2. Getting Meta Description
Getting the meta description is very similar to getting the meta title mentioned above. The only difference is @name=’description’. So, combining this with the textjoin function can give us a quick look at the meta description and any duplicates if any.
=textjoin(" $$$",1,IMPORTXML(A2,"//meta[@name='description']/@content"))
3. Getting H1 (Headings)
To get the H1, you can use the importXML function and the second parameter will be “//h1”.
=IMPORTXML(A2,"//h1")
Adding the textJoin from before to check if there multiple instances of H1 on the same page, the final formula will look as the following:
=textjoin(" $$$",1,IMPORTXML(A2,"//h1"))
You can also apply the same method for finding H2, H3, etc.
4. Getting Canonical Links
Canonical links are crucial to let the search engines (i.e., Google) know which URL is the main copy of this page. You can check Google’s Guide For Canonicals.
As you can see from the image above, the canonical link is shown in the HTML source of the page with the rel reference and specifically, as a value of href. To get that in Google Sheets using ImportXML, we can use the following formula:
=IMPORTXML(A2,"//link[@rel='canonical']/@href")
Combining this formula with the previous textJoin function will give us a quick overview of any other canonical link was found on the page. So, the final formula is:
=textjoin(" $$$",1,IMPORTXML(A2,"//link[@rel='canonical']/@href"))
5. Getting Robots Data (index, noindex)
Getting the index status of any page is very important. It helps SEs crawl and index our valuable content, remove other poor content from indexing, and save the crawling budget.
To extract the robots status of any page, we should access meta > robots > content. To do that we will use the following importXML formula combined with textJoin to get any duplicates as all the previous formulas:
=textjoin(" $$$",1,IMPORTXML(A2,"//meta[@name='robots']/@content"))
6. Getting Hreflang Versions
Hreflangs are important if your website is targeting clients from different languages. It helpfully specifies the best version for each user by the hreflang attribute. you can always return to Google’s Guide on Hreflangs.
To extract the hreflang versions on any page, you can use the following formula to get hreflang versions separated by a new line:
=textjoin(char(10),1,IMPORTXML(A2,"//link[@rel='alternate']/@hreflang"))
Bonus (Checking Duplicates)
Now you have all URLs listed in the Google Sheet, you can see some duplicate titles, meta titles, H1s, canonicals, robots, and hreflangs. To better view the errors in this sheet you can make a conditional formatting as follows:
- Select all the data ranges in your sheet.
- Go to Format > Conditional Formatting
- Set a new rule for text containing the special characters we added before ” $$$”
- Select the color you want and click enter.
Limitations
Every tool has certain limitations as well as this one. The limitation of this tool is 50,000 requests for the importXML function per sheet. So, If you want to get all the data listed in this article, you will scrape nearly 10,000 URLs. If you want to get around that limitation, you can always use different sheets.
Get Your Copy!
If you reached this point, Congrats!
At this point, we provide you with the final sheet you can use to get all the different data specified in this article. After downloading the .xlsx sheet, you can import it in any Google Sheet and work freely.
Notice: Make sure to allow access to use the functions properly!
More SEO Automation Tools
Finally, don’t forget to check our other Free SEO Automation Tools on our website, and feel free to contact us anytime 🙂
Credits
Great thanks to Freepik for its large library of photos and icons. Icon made by Freepik from www.flaticon.com