Popular Scraping Libraries in C#
Web scraping (also called data scraping) is the process of extracting data from websites. In C#, several libraries can be used to perform web scraping.
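One point worth adding here is that two similar spellings come up in the field of scraping, and only one of them is correct.
1. Scraper: the software used to extract data from a source. The same word is also used informally for the person/programmer who writes that software.
2. Scrapper: a common misspelling of "scraper". In standard English the word means a fighter, so it is best avoided in this context.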
Here are some of the most popular packages that we can use in C#:
1. HtmlAgilityPack
2. ScrapySharp
3. AngleSharp
These packages overlap in what they can do, so the choice between them usually comes down to the programmer's preferences and the needs of the project.
1. HtmlAgilityPack
This library allows you to parse HTML documents and extract data from them. It supports XPath expressions, which makes it easy to navigate through the document and extract specific data.
Here's an example of how to use HtmlAgilityPack to extract the title and meta description from a web page:
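using HtmlAgilityPack;
// Download and parse the page
var web = new HtmlWeb();
var doc = web.Load("http://example.com");
// SelectSingleNode returns null when nothing matches the XPath expression
var titleNode = doc.DocumentNode.SelectSingleNode("//title");
var descriptionNode = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");
// Fall back to an empty string if a node is missing
var title = titleNode?.InnerText ?? "";
var description = descriptionNode?.GetAttributeValue("content", "") ?? "";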
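Both SelectSingleNode and SelectNodes return null when the XPath expression matches nothing, which is an easy source of NullReferenceException in scrapers. The following is a minimal sketch of guarding against that, assuming a recent C# version with top-level statements; the inline HTML string is made-up sample markup used so the code runs without a network connection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical sample markup, used instead of a live page so the sketch runs offline.
const string html = @"<html><head><title>Sample page</title></head>
<body><h2>First</h2><h2>Second</h2></body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// SelectSingleNode returns null when nothing matches, so guard before reading.
var title = doc.DocumentNode.SelectSingleNode("//title")?.InnerText ?? "(no title)";
var description = doc.DocumentNode
    .SelectSingleNode("//meta[@name='description']")
    ?.GetAttributeValue("content", "") ?? "(no description)";

// SelectNodes also returns null (not an empty collection) when nothing matches.
var headings = doc.DocumentNode.SelectNodes("//h2")
    ?.Select(h => h.InnerText.Trim())
    .ToList() ?? new List<string>();

Console.WriteLine($"{title} | {description} | {string.Join(", ", headings)}");
```

The sample markup deliberately has no meta description, so the fallback value is used for it while the title and headings are read normally.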
2. ScrapySharp
This is a web scraping framework that is built on top of HtmlAgilityPack. It provides a higher-level API for performing web scraping, making it easier to write complex scrapers.
Here's an example of how to use ScrapySharp to extract links from a web page:
// Include library references at the top of the .cs file.
using System;
using System.Linq;
using ScrapySharp.Extensions;
using ScrapySharp.Network;
// Create an instance of the library's main scraping class
var browser = new ScrapingBrowser();
// Navigate to the web page at the given address
var page = browser.NavigateToPage(new Uri("http://example.com"));
// Query the page with a CSS selector, skipping anchors that have no href attribute
var links = page.Html.CssSelect("a").Select(a => a.GetAttributeValue("href", "")).Where(href => href != "");
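Extracted href values are often relative (for example /about), so a scraper usually has to resolve them against the page address before following them. Here is a rough sketch of that step, assuming top-level statements; the base address and the inline markup are hypothetical stand-ins for a downloaded page, and the resolution itself uses the standard System.Uri class, not anything specific to ScrapySharp.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;
using ScrapySharp.Extensions;

// Hypothetical base address and markup, standing in for a downloaded page.
var baseUri = new Uri("http://example.com/articles/");
var doc = new HtmlDocument();
doc.LoadHtml("<a href='/about'>About</a><a href='page2.html'>Next</a><a>No link</a>");

var links = doc.DocumentNode
    .CssSelect("a")
    // Anchors without an href yield an empty string and are skipped.
    .Select(a => a.GetAttributeValue("href", ""))
    .Where(href => !string.IsNullOrWhiteSpace(href))
    // Resolve relative addresses against the page address.
    .Select(href => new Uri(baseUri, href).AbsoluteUri)
    .ToList();

links.ForEach(Console.WriteLine);
```

Reading the attribute with GetAttributeValue and a default avoids the exception that indexing Attributes["href"] would throw on an anchor that has no href.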
3. AngleSharp
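This is another library for parsing and manipulating HTML documents. It queries documents with CSS selectors through the standard DOM methods QuerySelector and QuerySelectorAll. XPath expressions are not part of the core library but are available through the separate AngleSharp.XPath package.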
Here's an example of how to use AngleSharp to extract the text of all the paragraphs on a web page:
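using System.Linq;
using AngleSharp;
using AngleSharp.Dom;
// The default loader is required so that OpenAsync can download the page
var config = Configuration.Default.WithDefaultLoader();
// OpenAsync is asynchronous, so this line must run inside an async method (or top-level statements)
var document = await BrowsingContext.New(config).OpenAsync("http://example.com");
// Collect the text content of every <p> element
var paragraphs = document.QuerySelectorAll("p").Select(p => p.TextContent);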
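When the HTML is already in memory (for instance, downloaded separately or read from a file), AngleSharp can parse it directly without a browsing context. The sketch below assumes a reasonably recent AngleSharp version in which HtmlParser exposes ParseDocument, and it uses a made-up HTML string so that it runs offline.

```csharp
using System;
using System.Linq;
using AngleSharp.Html.Parser;

// Hypothetical sample markup so the sketch runs without network access.
const string html = "<html><body><p> First paragraph. </p><p></p><p>Second one.</p></body></html>";

var parser = new HtmlParser();
var document = parser.ParseDocument(html);

var paragraphs = document.QuerySelectorAll("p")
    // Trim surrounding whitespace and drop paragraphs that contain no text.
    .Select(p => p.TextContent.Trim())
    .Where(text => text.Length > 0)
    .ToList();

foreach (var text in paragraphs)
    Console.WriteLine(text);
```

Parsing from a string like this is also convenient for unit-testing selectors against saved copies of a page.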
These are just a few examples of the libraries that are available for web scraping in C#. There are many other options as well, so you should choose the one that best fits your needs.
With that said, I recommend HtmlAgilityPack. In my experience it is the most convenient of the three, with handy extension methods that help you scrape data effectively and quickly. Furthermore, this library is very easy to learn.
I will write more about scraping with HtmlAgilityPack in my upcoming articles.
Support Me
Join me on Codementor for more helpful tips. Make sure to like and follow to stay in the loop with my latest articles on different topics, including programming tips and tricks, tools, frameworks, and the latest technology updates.