Visualizing Suicide Trends Using Microsoft Power BI
An Introduction to Interactive Data Visualization
“A picture is worth a thousand words”
If you are like me, you would probably rather watch movies, YouTube or TikTok videos, or scroll through your Facebook timeline than read a full book. Humans generally absorb and relate to visuals more readily than other forms of communication, which makes visualization one of the key skills required of a data scientist or data analyst. The ability to generate pictorial insights from raw data and communicate results effectively to other people, whether technical colleagues, non-technical stakeholders or management, is very important.
In this article, we will use interactive visualization to draw insights on the causes and trends of suicide over a number of years. We will do so in two sections. The first briefly introduces data visualization and the tool we will be using (Power BI), while the second goes straight into visualizing and drawing insights from our dataset. If you are already familiar with the fundamentals of visualization, you can scroll down to the actual visualization part instead.
What is Data Visualization?
We can simply refer to data visualization as the graphical representation of information and data using visual elements such as charts, graphs, timelines and maps. Generally, data visualization can be categorized as either explanatory or exploratory. It is important to understand the distinction between these two categories, as they serve different purposes and hence require slightly different tools and techniques.
We can use exploratory visualization to find the story behind our dataset. It is useful when we are working with an unfamiliar dataset: it helps us get a sense of what’s going on inside the data, so we can quickly identify its features, including interesting correlations, trends and outliers. We can consider our visualization in this article to be exploratory in nature.
Explanatory visualization, on the other hand, is appropriate for storytelling when we already know what the data has to say. It is useful for presenting to an audience such as team members, managers or the general public. This involves selecting focused data that will support the story we are trying to tell and removing any distracting or irrelevant details.
Data visualization tools
A data visualization tool is software that takes data from different sources and presents it in visual charts, graphs, tables, dashboards, etc. Depending on their features, data visualization tools can create anything from simple pie charts to complicated interactive choropleths. The learning curve varies from tool to tool: some are easy to use, while others are a bit more challenging. Some of my favorite visualization tools so far are Tableau, Microsoft Power BI and the Python visualization packages Matplotlib and Seaborn (forgive my bias for now; I might add to the list as I skill up). In this article, we will be exploring Power BI to visualize an interesting dataset. I am hoping to find the time to write about the other visualization tools soon.
Power BI
Microsoft introduced Power BI in 2014 and describes it as “a business analytics solution that lets you visualize your data and share insights across your organization, or embed them in your app or website.” It is a suite of business intelligence (BI), reporting, and data visualization products and services for individuals and organizations.
There are different versions of Power BI available, as summarized below:
- Power BI Desktop: Free; can be downloaded by individuals and small to midsize businesses
- Power BI Service: Secure Microsoft-hosted cloud service that lets users view dashboards, reports, and Power BI apps
- Power BI Pro: Paid per-user license, needed to get access to advanced features and the ability to share reports
- Power BI Premium: Licensed by scale, intended for large businesses and enterprises
- Power BI Mobile: Device-based app for phones and tablets
- Power BI Embedded: A white-label version of Power BI which independent software vendors can embed in their own apps, rather than build their own analytical features
- Power BI Report Server: An on-premises version for businesses that need to keep their data and reports on their own servers
The Power BI product is made up of a number of apps such as Power Query, Power Pivot, Power View, Power Map, and Power Q&A. Each product has its own features and uses. The detailed Power BI documentation is a great guide to using the software.
Now that we’ve gotten a good grasp of data visualization and of Power BI as our visualization tool of choice, we will proceed to get our hands dirty with some practice.
The dataset
We will be looking at a compiled dataset pulled from four other datasets linked by time and place. The data was built to find signals correlated with increased suicide rates among cohorts globally across the socio-economic spectrum. We will try to derive some insights through exploratory visualization. It’s important to note that in developing countries, especially African countries, causes of death are not properly documented (or such documents are not readily available to data analysts). Hence, our dataset might not cover the story of suicides in these countries.
First, we load and clean the data. Data transformation and cleaning are done using Power Query. We check that our variables have the right data types and formatting.
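In the article this step happens inside Power Query, but the idea behind a data type check can be sketched in a few lines of Python. This is only an illustration: the column names (`country`, `year`, `suicides_no`, `population`) and the three toy rows are assumptions made up for the example, not the actual dataset.

```python
import csv
import io

# Toy CSV with hypothetical column names and made-up values (not the article's data).
RAW_CSV = """country,year,sex,suicides_no,population
Country A,1995,male,"1,200",1700000
Country B,1995,female,7000,63000000
Country B,n/a,male,15000,61000000
"""


def to_int(text):
    """Turn text such as ' 1,200 ' into an int; return None if it is not a whole number."""
    cleaned = text.strip().replace(",", "")
    return int(cleaned) if cleaned.isdigit() else None


def clean_rows(csv_text, int_columns=("year", "suicides_no", "population")):
    """Split rows into (clean, rejected) depending on whether the numeric columns parse."""
    clean, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        parsed = {name: to_int(row[name]) for name in int_columns}
        if None in parsed.values():
            rejected.append(row)  # keep the raw row so it can be inspected later
        else:
            clean.append({**row, **parsed})  # overwrite text values with ints
    return clean, rejected


clean, rejected = clean_rows(RAW_CSV)
```

Keeping the rejected rows in a separate list, rather than silently dropping them, makes it easier to decide whether a bad value is a typo to fix or a record to exclude.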
By exploring the Power BI chart and map tools, we can answer some questions such as:
Which country has the highest rate of suicide?
A map visual helps us answer this question efficiently: we place each country on the map by its location and size the markers by the number of suicides. We can also create a DAX measure (DAX, Data Analysis Expressions, is the formula language used for data analysis and calculations in Power BI) to calculate the percentage of each population that committed suicide in a country or generation. From the size of the markers, we can see that the Russian Federation (Asia) recorded the highest number of suicides between 1985 and 2016, followed by the United States and Japan. However, it is important to take the population size of each country into consideration. When considered as a percentage of population, Lithuania (Europe) appears to have the highest suicide rate, at 4.12%.
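To make the difference between raw counts and percentages concrete, here is a minimal Python sketch of one way such a percentage could be defined: total suicides divided by total population, per group. The exact DAX measure used for the report is not shown in this article, so treat the formula, the column names (`suicides_no`, `population`) and the toy numbers as assumptions.

```python
from collections import defaultdict

# Made-up rows with hypothetical column names; the values are arbitrary.
rows = [
    {"country": "Country A", "suicides_no": 30, "population": 10_000},
    {"country": "Country A", "suicides_no": 50, "population": 10_000},
    {"country": "Country B", "suicides_no": 200, "population": 1_000_000},
]


def suicide_percentage_by(rows, key):
    """Return {group: 100 * total suicides / total population} for each value of `key`."""
    totals = defaultdict(lambda: [0, 0])  # group -> [suicides, population]
    for row in rows:
        totals[row[key]][0] += row["suicides_no"]
        totals[row[key]][1] += row["population"]
    # Skip groups with zero population to avoid dividing by zero.
    return {group: 100 * s / p for group, (s, p) in totals.items() if p}


percentages = suicide_percentage_by(rows, "country")
# Countries ordered from highest to lowest percentage.
ranked = sorted(percentages, key=percentages.get, reverse=True)
```

In this toy example Country B has more suicides in absolute terms, yet Country A ranks first once population is taken into account. Passing `"generation"` as the key would give the same calculation per generation, assuming the rows carry such a column.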
In what year so far has the highest rate of suicide been recorded?
Is there an increasing or decreasing trend of suicide over the years?
A column chart with the number of suicides on the y-axis and years on the x-axis shows that the highest number of suicides occurred in 1999, with 0.26 million cases. However, the numbers have not been steadily increasing or declining; they have fluctuated over the years, with 1985 recording the lowest number at 0.12 million. (NB: I have deliberately ignored 2016, as its data appears to be incomplete: its figure is far out of line with the otherwise similar figures for the other years.)
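The aggregation behind such a column chart is a sum per year, with the suspect year left out before looking for the peak and the low point. Below is a rough Python sketch of that logic; the column names and the tiny numbers are invented for illustration and are not the real figures.

```python
from collections import Counter

# Made-up rows with hypothetical column names; the values are arbitrary.
rows = [
    {"year": 1985, "suicides_no": 120},
    {"year": 1985, "suicides_no": 30},
    {"year": 1999, "suicides_no": 260},
    {"year": 2015, "suicides_no": 200},
    {"year": 2016, "suicides_no": 15},
]


def yearly_totals(rows, exclude_years=()):
    """Sum suicides per year, skipping any year listed in `exclude_years`."""
    totals = Counter()
    for row in rows:
        if row["year"] not in exclude_years:
            totals[row["year"]] += row["suicides_no"]
    return dict(sorted(totals.items()))  # ordered by year, like the chart's x-axis


totals = yearly_totals(rows, exclude_years={2016})
peak_year = max(totals, key=totals.get)
lowest_year = min(totals, key=totals.get)
```

Excluding the incomplete year explicitly, instead of deleting its rows from the data, keeps the decision visible and easy to reverse.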
Which generation committed suicide the most by percentage?
Our visual shows that the Greatest Generation (G.I. Generation) had the highest suicide rate. With the world war and all, that is not surprising. We seem to have witnessed a decreasing suicide rate down the generations, except for the Baby Boomers.
Power BI lets us present our visuals on a single report page. We can also use slicers to filter for more granular details in our data. I used the multi-row card feature to show numerical summaries at a glance.
Wondering which gender is more likely to commit suicide?
A pie chart can answer this question for us. We can use the quick measure feature to calculate the percentage of suicides by gender. Our viz shows that males are more likely to commit suicide than females. This may be because of societal fear of failure or an inability to express emotions, but that is for further analysis to decide.
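A pie chart is essentially a picture of each group’s share of the total. As a rough Python equivalent of that quick measure (an assumption about what it computes, not an export of it), the sketch below divides each group’s count by the grand total. The column names and the numbers are arbitrary toy values, not the article’s results.

```python
from collections import Counter

# Made-up rows with hypothetical column names; the values are arbitrary.
rows = [
    {"sex": "male", "suicides_no": 75},
    {"sex": "female", "suicides_no": 25},
    {"sex": "male", "suicides_no": 225},
    {"sex": "female", "suicides_no": 75},
]


def share_of_total(rows, key, value="suicides_no"):
    """Return each group's percentage share of the grand total of `value`."""
    totals = Counter()
    for row in rows:
        totals[row[key]] += row[value]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing to divide by
    return {group: 100 * count / grand_total for group, count in totals.items()}


shares = share_of_total(rows, "sex")
```

Note that this is a share of all recorded cases, which is different from a rate per population; the two only tell the same story when the groups are of similar size.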
Which age range has a higher tendency to commit suicide?
Using a line chart, we can see that more middle-aged people (35–54) have committed suicide over time. This could be because this stage of life comes with lots of challenges and bills.
Does a country’s GDP (gross domestic product) correlate with its rate of suicide?
Using a scatter plot, we visualize the relationship between GDP and the number of suicides per country. Countries with low GDP, such as Lithuania and the Russian Federation, tend to have a higher rate of suicide. However, this is not absolute, as some other countries with very low GDP still have low suicide rates.
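A scatter plot shows the relationship visually; a correlation coefficient puts a number on it. The article itself does not compute one, so the following is an optional extra: a small Python sketch of the Pearson correlation coefficient, run on four made-up (GDP per capita, suicide rate) pairs that are not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up values, one pair per hypothetical country.
gdp_per_capita = [1000, 2000, 3000, 4000]
suicide_rate = [20, 15, 12, 5]

r = pearson_r(gdp_per_capita, suicide_rate)
```

A value of `r` near -1 would mean that higher GDP goes with lower suicide rates, a value near 0 would mean no linear relationship, and in either case correlation alone does not tell us about cause.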
Power BI is very interactive, and we can click around our dashboards for specific details. Unfortunately, at the time of publishing this article, I haven’t been able to activate a premium Power BI account that would let me publish my dashboards here. I do hope the screenshots provide effective visuals for us all.
There are a ton of other insights and questions that can be derived from our dataset; however, I wouldn’t like to keep you here any longer. I hope you enjoyed exploring Power BI and practicing some visualization with me.
I would appreciate your claps, criticism and feedback in the comments. Feel free to share this with your buddies to let them know what you are reading about.