A few days ago I published an article analyzing the social impact of the coronavirus (COVID-19) in China. However, many people still lack a full understanding of this outbreak, so I thought it would be interesting to visualize the situation from a more objective perspective.
Please note that the data I’ve collected runs up to February 11. By the time you read this article, the figures may be out of date and may no longer reflect the current state of the outbreak. Later in the article I will explain an easy way to keep up with the live data. I used a web scraping tool to extract the data instead of writing code, since it can convert the data into a usable format without a separate data cleaning step.
Choose a data source:
If you google “coronavirus data”, I’m sure you will find many resources. Sources like Kaggle are secondary data collected by others, and they lag behind the latest figures from a primary source such as the official Chinese health website. If you are a data analyst with strict standards for accuracy and timeliness, you should avoid drawing conclusions from secondary data. So which source should you use? Primary data is the way to go. For this article, I chose Coronavirus Update Source because it is stored as JSON, which lets us stream the data for individual cities into our system through an API pipeline. (Read this guideline of a JSON file)
Another way to extract live data is to use a scraping template, as I did in the last article. It’s a cut-and-dried solution for people who can’t code (watch this video for details). You can set up a task scheduler to get up-to-the-minute data. Here is the data I’ve collected; feel free to play with it.
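To give an idea of what working with JSON city data involves, here is a minimal Python sketch that flattens a nested province/city payload into rows that can be saved as CSV for Tableau. The payload structure, the field names and the zero counts are all assumptions made up for illustration; the real source may be organized differently, so treat this as a starting point rather than the actual schema.

```python
import csv
import io
import json

# Hypothetical payload: the nesting and field names are assumptions for
# illustration, and the zeros are placeholders, not real figures.
SAMPLE_PAYLOAD = """
[
  {"province": "Hubei", "cities": [
    {"city": "Wuhan", "confirmed": 0, "recovered": 0, "deaths": 0},
    {"city": "Xiaogan", "confirmed": 0, "recovered": 0, "deaths": 0}
  ]},
  {"province": "Guangdong", "cities": [
    {"city": "Shenzhen", "confirmed": 0, "recovered": 0, "deaths": 0}
  ]}
]
"""

FIELDS = ["province", "city", "confirmed", "recovered", "deaths"]


def flatten(payload):
    """Turn nested province -> cities JSON into one flat row per city."""
    rows = []
    for province in json.loads(payload):
        for city in province.get("cities", []):
            rows.append({
                "province": province["province"],
                "city": city["city"],
                "confirmed": city.get("confirmed", 0),
                "recovered": city.get("recovered", 0),
                "deaths": city.get("deaths", 0),
            })
    return rows


def to_csv(rows):
    """Serialize the flat rows as CSV text that Tableau can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(flatten(SAMPLE_PAYLOAD)))
```

In practice the payload string would come from the data source instead of a constant, and the CSV text would be written to a file before loading it into Tableau.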
Data Visualization with Tableau
After collecting the data, we can upload it to Tableau. I first create a map layer by simply dragging Province/State to the drop field. After that, I add a time series and accumulate the values to give a full view of the data trends in each province. I single out Hubei province so that I can look closely at its data trends. The map shows the historical spread of the coronavirus over the 20 days since January 22nd. As of February 11th, the number of confirmed infections in Hubei alone had hit 33,366.
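For readers who want to see what accumulating values over a time series means outside Tableau, here is a small Python sketch of a per-province running total. It assumes the raw rows hold daily new cases (if your export is already cumulative, this step is unnecessary), and the province names "A" and "B" and their counts are toy placeholders, not real figures.

```python
from collections import defaultdict
from itertools import accumulate

# Placeholder daily new-case records: (date, province, new_cases).
# The provinces and counts are made-up toy values, not real figures.
DAILY = [
    ("2020-01-22", "A", 2),
    ("2020-01-23", "A", 3),
    ("2020-01-22", "B", 1),
    ("2020-01-24", "A", 5),
    ("2020-01-23", "B", 4),
]


def running_totals(records):
    """Return {province: [(date, cumulative_cases), ...]} ordered by date."""
    by_province = defaultdict(list)
    # ISO-formatted dates sort correctly as plain strings.
    for date, province, new_cases in sorted(records):
        by_province[province].append((date, new_cases))

    totals = {}
    for province, series in by_province.items():
        dates = [date for date, _ in series]
        cumulative = accumulate(count for _, count in series)
        totals[province] = list(zip(dates, cumulative))
    return totals


if __name__ == "__main__":
    for province, series in running_totals(DAILY).items():
        print(province, series)
```

This is roughly the same idea as a running total table calculation: each point on the line is the sum of everything reported up to that date.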
We can tell that besides Hubei, this outbreak has also had a large impact on Guangdong, Zhejiang, Hunan and Henan.
Notice that the reported cases from Hubei are significantly greater than those of all the other provinces combined. I create a group that divides the provinces into two categories: Hubei and Others. To get a better idea of where this outbreak is heading, I also add trend lines to analyze the current situation.
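The grouping and trend line steps can be sketched in Python as well. This is only an approximation under stated assumptions: I assume a straight-line (linear least-squares) trend, and the function names and input layout are hypothetical. A point with a negative residual sits below the trendline, and a positive one sits above it.

```python
def group_totals(rows):
    """Sum (date, province, cases) rows into two series: Hubei and Others."""
    grouped = {"Hubei": {}, "Others": {}}
    for date, province, cases in rows:
        group = "Hubei" if province == "Hubei" else "Others"
        grouped[group][date] = grouped[group].get(date, 0) + cases
    # Return each group's values ordered by date.
    return {
        group: [value for _, value in sorted(series.items())]
        for group, series in grouped.items()
    }


def linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...

    Returns (slope, intercept). Assumes a linear model.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(values):
    """Actual value minus trendline value for each point in the series."""
    slope, intercept = linear_trend(values)
    return [y - (slope * x + intercept) for x, y in enumerate(values)]
```

To use it, pass the rows through `group_totals`, then call `residuals` on each group's series and look at the sign of the most recent values.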
Both Hubei and Others begin to slide underneath the trendline, which indicates a declining tendency in confirmed cases. However, the death toll doesn’t show a positive change, as the numbers are still above the trendline.
The recovery rate among provinces other than Hubei is some cheerful news: the trendline gets steeper over time, and more places move upwards, indicating a rise in recoveries. The recovery rate will likely continue to grow as people are now taking prompt action to fight the virus.
I made an animation since it is a great way to see the big picture and follow the progression of this outbreak. Once we visualize the data, it becomes much easier to analyze. The biggest challenge in data analysis is data collection. I would usually spend most of my time on mindless manual work, and often I also need to repair the data format by hand. I found that a web scraping tool can greatly improve productivity. However, I wouldn’t recommend abusing it or scraping any website excessively, as this could lead to serious legal consequences. Check out this article for more information: Is web crawling legal?
I will keep working to improve the visualization. Feel free to share your thoughts or email me.