Web scraping means downloading a website's HTML and parsing useful data out of it. The use cases are endless. Is the government data you need published only as HTML, with no download option? There's an API for that! Do you want to keep abreast of what your competitors are featuring on their sites? There's an API for that. Need an alert when a website changes, for example when enrollment opens at your college and you want to be first in line to avoid the 8am Monday morning course slot? There's an API for that.<br/> <br/>
That API is screen scraping, and Attila Tóth from Scrapinghub is here to tell us all about it.<br/> <br/>
<strong>Links from the show</strong><br/> <br/>
<div><b>Attila Tóth on LinkedIn</b>: <a href="https://ift.tt/2G7HWm8" target="_blank" rel="noopener">linkedin.com</a><br/>
<b>Scrapy project</b>: <a href="https://scrapy.org/" target="_blank" rel="noopener">scrapy.org</a><br/>
<b>Scrapinghub on Twitter</b>: <a href="https://twitter.com/scrapinghub" target="_blank" rel="noopener">@scrapinghub</a><br/>
<b>Scrapinghub</b>: <a href="https://ift.tt/1sM8ueu" target="_blank" rel="noopener">scrapinghub.com</a><br/>
<b>cookiecutter template for Scrapy projects</b>: <a href="https://ift.tt/3mML5Ja" target="_blank" rel="noopener">github.com</a><br/>
<b>Splash: headless browser designed specifically for web scraping</b>: <a href="https://ift.tt/303L8qi" target="_blank" rel="noopener">scrapinghub.com/splash</a><br/>
<b>Awesome Web Scraping list</b>: <a href="https://ift.tt/1TPVJgT" target="_blank" rel="noopener">github.com</a><br/> <br/>
<b>Talk Python episode 50 on web scraping</b>: <a href="https://ift.tt/22ibVfa" target="_blank" rel="noopener">talkpython.fm</a><br/>
<b>How Web Scraping is Revealing Lobbying and Corruption in Peru</b>: <a href="https://ift.tt/3iVFVrS" target="_blank" rel="noopener">blog.scrapinghub.com</a><br/>
<b>Web Data Extraction Summit event</b>: <a href="https://ift.tt/32M6V87" target="_blank" rel="noopener">extractsummit.io</a><br/></div><br/>
<strong>Sponsors</strong><br/> <br/>
<a href='https://ift.tt/2PVc9qH'>Python Training</a><br/>
<a href='https://ift.tt/34qNZMT'>ift.tt/34qNZMT</a>
from Planet Python