PROJECT 3 (web scraping): RAPE NEWS WEB-SCRAPER BOT
No doubt, a major challenge for data-oriented professions in third world countries is the lack of a reliable and democratic central database.
Nigeria, for example, is yet to solve this problem fully.
As a result, data professionals need to collect data on their own, whether by traditional methods or by data mining.
As we all know: DATA IS LIFE
WEB SCRAPING PROCESS
So what is the RAPE NEWS web scraper about?
A few weeks ago, a friend messaged me that he needed a web scraper for his project: he needed data from popular sites in Nigeria on rape cases during the lockdown (March - September 2020). He had done most of the work, but he felt he should spice things up by introducing Machine Learning, Data Science and Python. In less than 3 hours, I completed it.
GUESS WHAT?
The initial manual data collection yielded a little over 40 news items from about 15 different websites. But with Selenium, I was able to scrape over 200 rape-related news items from just one site (Linda-Ikeji alone) in less than 2 minutes of runtime.
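To give a rough idea of how such a scrape can work, here is a minimal sketch. It is not the actual project code. The keyword pattern, the function names and the CSS selector argument are all assumptions made for illustration; the real selectors depend on the markup of the site being scraped.

```python
import re

# Assumed keyword pattern (illustrative only). Word boundaries stop
# unrelated words such as "grape" or "scrape" from matching.
KEYWORD_PATTERN = re.compile(
    r"\b(rap(e|ed|es|ist|ists)|defil\w*|sexual(ly)? assault\w*)\b"
)


def is_relevant(headline):
    """Return True if the headline contains one of the assumed keywords."""
    return KEYWORD_PATTERN.search(headline.lower()) is not None


def filter_headlines(headlines):
    """Normalise whitespace, drop duplicates and keep relevant headlines."""
    seen = set()
    kept = []
    for raw in headlines:
        text = " ".join(raw.split())
        if text and text not in seen and is_relevant(text):
            seen.add(text)
            kept.append(text)
    return kept


def scrape_headlines(driver, url, css_selector):
    """Load a page with Selenium and return the text of matching elements.

    `css_selector` is a hypothetical argument: pass whatever selector
    matches the headline elements on the page you are scraping.
    """
    # Imported here so the filtering helpers above work without Selenium.
    from selenium.webdriver.common.by import By

    driver.get(url)
    elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
    return [element.text for element in elements]
```

In use, the list returned by scrape_headlines for each results page would be passed to filter_headlines, so that only unique, relevant headlines are saved.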
WEB CRAWLER
He later asked me to modify the code to suit their earlier work (he didn’t want any disruption), which I did. Well, the difference is clear, isn’t it?
There are a lot of web-scraping tools, such as Requests, BeautifulSoup, lxml, pandas and Scrapy, but I prefer using Selenium for my scraping.
Initially, I had difficulty configuring a scraper, especially in a virtual environment like Google Colab, but I was able to configure it (in headless mode) and master the scrapers as time went by.
Check my GitHub for the source code.
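For anyone facing the same setup problem, a headless configuration might look roughly like the sketch below. This is an assumption about a typical setup, not the exact configuration used in this project. The names HEADLESS_FLAGS and make_headless_driver are made up for illustration, and it assumes Chrome (or Chromium) and a matching driver are already installed in the environment.

```python
# Chrome flags commonly used on machines without a display, such as
# hosted notebooks. This list is an assumption, not the project's own.
HEADLESS_FLAGS = [
    "--headless",               # run the browser without opening a window
    "--no-sandbox",             # often needed when running as root in a container
    "--disable-dev-shm-usage",  # avoid crashes when /dev/shm is small
]


def make_headless_driver():
    """Create a Chrome driver that runs without a visible window."""
    # Imported inside the function so the flag list can be inspected
    # even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for flag in HEADLESS_FLAGS:
        options.add_argument(flag)
    return webdriver.Chrome(options=options)
```

Once the driver is created this way, it can be used like a normal Selenium driver; just remember to call driver.quit() when the scrape is done.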