Since conspiracy theories about the coronavirus took off, social media platforms such as Facebook, Twitter, and Instagram have been scrutinizing and fact-checking posts to fight misinformation. As more reliable sources get amplified, Twitter has become more supportive than it was during the early stage of the outbreak. I figured it would be interesting to hear the real public voice and discover the true sentiment regarding the coronavirus.
Don’t be intimidated by the word “scraping.” If you can browse a web page, you can scrape it like a pro, even if you are a newbie. So bear with me.
The easiest way to gauge that attitude is to collect all tweets containing the word “coronavirus.” I narrow the research scope further by setting the language to English and the location to the United States. This keeps the sample consistent with the search topic and should improve the accuracy of the prediction.
With the research scope settled, we can start scraping. My preferred web scraping tool is Octoparse: its auto-detection features save me a lot of time that I would otherwise spend hand-picking and selecting the data.
Twitter is a dynamic page with infinite scrolling, meaning new tweets keep showing up as we scroll down the page. To get as many tweets as possible, I build a loop that keeps scrolling while fetching the information. This keeps the scraping workflow running without interruption.
Next, I create an extraction action. Octoparse renders the web page once we input the search URL. It breaks the page structure down into sub-components, so I can click on a target element to set up a command and tell the robot: go get the information for me. When I click one of the tweets, the tips panel pops up and suggests selecting the sub-elements.
There it is! A corresponding event is added to the workflow automatically. It also finds the other tweets. Follow the tips guide and click the “Select All” command. The final workflow should look like this:
The logic is simple: the scraper first visits the page. Then it extracts tweets until it has gone through all the tweets inside the loop. It then repeats the scrolling action to load another set of tweets and continues extracting until all the information has been collected.
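For readers who think better in code, the same scroll-and-extract logic can be sketched in a few lines of Python. This is only an illustration of the loop, not how Octoparse works internally: the two callables `load_visible_tweets` and `scroll_down`, as well as the `max_scrolls` limit, are hypothetical names made up for this sketch.

```python
def scrape_with_scrolling(load_visible_tweets, scroll_down, max_scrolls=50):
    """Collect tweets from an infinitely scrolling page.

    load_visible_tweets: hypothetical callable returning the tweets currently on screen.
    scroll_down: hypothetical callable that scrolls the page to load more tweets.
    """
    seen = set()
    collected = []
    for _ in range(max_scrolls):
        found_new = False
        # Extract every tweet currently rendered, skipping ones already collected.
        for tweet in load_visible_tweets():
            if tweet not in seen:
                seen.add(tweet)
                collected.append(tweet)
                found_new = True
        # Stop once a scroll brings nothing new; otherwise scroll and repeat.
        if not found_new:
            break
        scroll_down()
    return collected
```

The duplicate check matters because an infinite-scroll page usually still shows some tweets from the previous screen after each scroll.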
This is the final result I’ve got:
NLP for sentiment analysis:
NLP stands for Natural Language Processing. It is widely used to analyze the sentiment of text. The idea is to build a classifier model that scores the words and picks up the connotations they carry. For example, if I input a tweet, it should tell me whether the sentence is positive or negative. A finer-grained sentiment classification is, of course, a more challenging task.
I already have a well-trained model, so in this case I just use FastText to predict the sentiment of the tweets. The result looks like this:
As you can see, the tweets have been classified into two groups: Positive and Negative. Each prediction also comes with a probability score. The higher the score, the more confident the model is in its prediction. Scores around 0.5 indicate a neutral sentiment that is neither clearly positive nor negative.
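As a rough sketch of the prediction step: with the fastText Python package, a trained model would typically be loaded with `fasttext.load_model(...)` and queried with `model.predict(text, k=1)`, which returns the top label and its probability. The code below assumes that interface. The model file, the label names `positive` and `negative`, and the `StubModel` class are assumptions for illustration only; the stub stands in for my trained model so the example runs without it.

```python
def classify_tweets(model, tweets):
    """Return (tweet, label, score) for each tweet using a fastText-style model."""
    results = []
    for text in tweets:
        # fastText's predict() handles one line at a time, so newlines are flattened.
        labels, probs = model.predict(text.replace("\n", " "), k=1)
        # Labels carry fastText's default "__label__" prefix; strip it for readability.
        label = labels[0].replace("__label__", "")
        results.append((text, label, float(probs[0])))
    return results


class StubModel:
    """Hypothetical stand-in for a trained model; NOT a real sentiment classifier."""

    def predict(self, text, k=1):
        if "thank" in text.lower():
            return ("__label__positive",), [0.91]
        return ("__label__negative",), [0.84]


if __name__ == "__main__":
    sample = ["Thank you, nurses!", "Stuck at home\nagain"]
    for row in classify_tweets(StubModel(), sample):
        print(row)
```

To use a real model, pass the object returned by `fasttext.load_model` instead of `StubModel()`; the label names you get back depend on how the training data was labeled.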
I filter out the tweets with scores below 0.7 and make a graph:
As the graph shows, 42.2% of the tweets are positive towards the novel coronavirus, whereas 57.8% are negative. Tweets that gained the most replies tend to be more positive, whereas the most-liked tweets appear to be more negative. The result is ironic: public attitudes show a dichotomy rather than unity. This helps explain why people are protesting over the reopening of the economy on one side, while on the other they are concerned about the winding down of the medical task force. We are in a situation of paradox and uncertainty.
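The filtering and percentage calculation behind a graph like this can be sketched as follows. It is a minimal example under my own assumptions: the function name `sentiment_shares`, the `(label, score)` input format, and the sample values in the usage note are all made up for illustration and are not the real data set.

```python
from collections import Counter


def sentiment_shares(predictions, min_score=0.7):
    """Drop low-confidence predictions and return each label's share in percent.

    predictions: iterable of (label, score) pairs, e.g. ("positive", 0.93).
    min_score: predictions scoring below this value are filtered out.
    """
    confident = [label for label, score in predictions if score >= min_score]
    if not confident:
        return {}
    counts = Counter(confident)
    return {label: round(100 * n / len(confident), 1) for label, n in counts.items()}
```

For example, calling it on `[("positive", 0.9), ("negative", 0.8), ("negative", 0.95), ("positive", 0.55)]` discards the last pair because its score is below 0.7, and the shares are then computed over the remaining three predictions.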
The news media always has the loudest voice in informing the public. But we know that most major players hold political perspectives that have a profound impact on our decision making, especially when conspiracy theories stir in and muddy the waters, which is a typical phenomenon during a crisis.
Apart from defeating the disease, we should all do our part in containing the spread of fear and loathing. How do we stay clear-headed? Don’t just read one side of the story; listen to more voices. We should read the news critically and responsibly, so that we never make mistakes like “scarves ‘better’ than masks” in front of millions of people. Most importantly, we should not blame each other but unite to find a cure.