Google Sheets
by Hady ElHady | Mar 13, 2024
If you use spreadsheets to store data from the web or other external sources, both Excel and Google Sheets offer built-in functions that allow you to import data without having to recur to copying and pasting from the web.
For example, theIMPORTHTML function in Google Sheetsallows you to import data from actionable tables and lists on websites. Moreover, theGoogle Sheets IMPORTDATA functionlets you import .csv and .tsv files from a URL into your spreadsheet.
In this article, you’ll learn what the IMPORTXML function is in Google Sheets, understand the syntax of the IMPORTXML formula, and how to use the IMPORTXML formula through examples. In case this function is not working, you can also find recommendations on how to solve this issue at the end.
What is IMPORTXML in Google Sheets?
The IMPORTXML function imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds. The IMPORTXML function allows you to import data from websites for data mining or web scraping.
Although this function works with most webs, there may be web owners that block these types of functions, as well as other bots, to access their data. In the last section, you can learn how to work around these issues.
IMPORTXML Syntax
The syntax for the IMPORTXML function is:
=IMPORTXML(url, xpath_query)
- urlrefers to the URL of the website. You can represent this value in the formula by including the protocol “http://” or “https://” and URL between double quotation marks or a reference to a cell containing the corresponding text.
- xpath_queryis the parameter in charge of telling the function what data to import. For example, if you define this parameter as “//h1/@title”, you’ll get all the attributes from < h1 > tags on the website.
As you can see, XPath is a type of query language in itself that is used to retrieve data from the web. If you want to learn more about the XPath language, read this tutorial provided by W3Schools.
How To Use IMPORTRANGE Function In Google Sheets?
Google Sheets allows you to import and link a specific range of cells from another spreadsheet. Here’s how to use the IMPORTRANGE function in Google Sheets
READ MORE →
How to use IMPORTXML in Google Sheets?
The IMPORTXML function allows you to import data from websites directly into your Google Sheets. Now you’ll learn the different ways this function can be used through practical examples.
There are many ways to use the IMPORTXML function in Google Sheets. Here, you’ll learn the most widely used according to the type of XPath query.
Use the “//@href” to scrape all links
Web scraping is a common practice for businesses that want to grow with web data. For example, let’s say you’re looking into your competitor’s website and want to have a list of all the links used on their website. This is how you can use IMPORTXML to scrape all links on a website.
1.Go to your Google Sheets and type in “=IMPORTXML” to trigger the built-in function.
How to Use IMPORTXML in Google Sheets (Examples) – Trigger function
2.Type in the URL from the website you’d like to scrape all links. Include the URL between quotation marks.
How to Use IMPORTXML in Google Sheets (Examples) – Include URL
3.Include the XPath query that will scrape all links, “//@href”, between quotations marks and close with parenthesis.
How to Use IMPORTXML in Google Sheets (Examples) – Include XPath query
4.Press “Enter” to obtain all links from the website, as shown below.
How to Use IMPORTXML in Google Sheets (Examples) – Scraped links
Use the “//a[not(contains(@href, ‘example.com’))]/@href” to scrape external links
An external link is a hyperlink that targets a domain outside the source website; these usually have more impact on search engine rankings since they represent more of an objective stamp of approval for a website. This is how you can scrape an external link using the IMPORTXML function.
1.Repeat steps 1-3 from the previous section.
2.Include the XPath query that will scrape all links, “//a[not(contains(@href, ‘example.com’))]/@href”, between quotations marks and close with parenthesis.
How to Use IMPORTXML in Google Sheets (Examples) – XPath query for external links
3.Press “Enter” to obtain all external links from the website, as shown below.
How To Share Only One Tab in Google Sheets
When sharing a Google Sheets spreadsheet Google usually tries to share the entire document. Here’s how to share only one tab instead.
READ MORE →
Use the “//link[@rel=’canonical’]/@href” to scrape the canonical link
A canonical link is the “preferred” version of a web page that webmasters specify to prevent duplicate content. This is how you can scrape a canonical link using the IMPORTXML function.
1.Repeat steps 1-3 from the previous section.
2.Include the XPath query that will scrape all links, “//link[@rel=’canonical’]/@href”, between quotations marks and close with parenthesis.
How to Use IMPORTXML in Google Sheets (Examples) – Include XPath query for canonical link
3.Press “Enter” to obtain the canonical link from the website, as shown below.
How to Use IMPORTXML in Google Sheets (Examples) – Scraped canonical link
If you’re interested in importing financial data in your Google Sheets, you can combine the IMPORTXML function with either theGOOGLEFINANCEorYAHOOFINANCEfunctions. However, this will require using a split function that is a bit more complex.
Google Sheets IMPORTXML is not working
If you’ve encountered errors when using the IMPORTXML function, such as “Could not fetch the URL” or “Imported content is empty”, check the following:
- Is the URL correct?The data you’re trying to scrape may have been moved to another URL, so it’s worth double-checking.
- Is it the protocol “https” or “http”?The protocol may have changed as well without you noticing it. Sometimes, websites are not optimized to redirect from one to the other, so you’ll need to make sure to use the proper one.
- Does the website block scraping?Some web owners include blocks to stop others from reading or scraping their web content.
Conclusion
As you can see, the IMPORTXML function in Google Sheets can be a powerful function when it comes to importing data from external sources, especially websites. The example provided here can be applied to any domain. However, if you own a website or work with web data, this function is the fastest and easiest way to perform web scraping without having to leave your spreadsheet.
You’ve learned what the IMPORTXML function is and the syntax behind the IMPORTXML formula. Through a set of useful examples, you’ve also seen how the IMPORTXML function can be used and applied to your data. You should also feel comfortable fixing any issue that may occur in case the IMPORTXML doesn’t seem to work properly.
If you want to learn more about how to import data, read the following articles below:
- How to use IMPORTDATA in Google Sheets
- How to use IMPORHTML in Google Sheets
- How to use IMPORTRANGE function In Google Sheets
- How to Import CSV to Google Sheets Automatically