Matplotlib charts 1

Let’s start a series of posts about matplotlib, a Python module for creating charts and other visual representations of data.

Data on the screen

The first thing we are going to look at is the simplest possible chart, showing a sequence of my blog’s monthly views. We start from this example from the matplotlib site.

import numpy as np
import matplotlib.pyplot as plt


N = 5
menMeans = (20, 35, 30, 35, 27)
womenMeans = (25, 32, 34, 20, 25)
menStd = (2, 3, 4, 1, 2)
womenStd = (3, 5, 2, 3, 3)
ind = np.arange(N)    # the x locations for the groups
width = 0.35       # the width of the bars: can also be len(x) sequence

p1 = plt.bar(ind, menMeans, width, yerr=menStd)
p2 = plt.bar(ind, womenMeans, width,
             bottom=menMeans, yerr=womenStd)

plt.ylabel('Scores')
plt.title('Scores by group and gender')
plt.xticks(ind, ('G1', 'G2', 'G3', 'G4', 'G5'))
plt.yticks(np.arange(0, 81, 10))
plt.legend((p1[0], p2[0]), ('Men', 'Women'))

plt.show()

And this is the output

Let’s transform this into the simplest chart ever

I want a basic chart that just shows the views of an imaginary blog across the months. So, after some deletions and a few changes, we have this code:

import numpy as np
import matplotlib.pyplot as plt


N = 10
ind = np.arange(N)    # the x locations for the groups
views = (634, 754, 937, 1300, 2200, 3000, 2800, 3600, 4200, 4600)
width = 0.35       # the width of the bars: can also be len(x) sequence
p1 = plt.bar(ind, views, width)
plt.ylabel('Views')
plt.title('My blog views')
plt.xticks(ind, ('N\n18', 'D\n18', 'J\n19', 'F\n19', 'M\n19', 'A\n19', 'M\n19', 'J\n19', 'J\n19', 'A\n19'))
plt.yticks(np.arange(0, 81, 10))
plt.show()

And this brought me to this chart

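Now, the thing I am missing is the number of views on the left… right? The tick values on the vertical axis are still the ones from the original example (0 to 80 in steps of 10), far below the views we are plotting.

We need to change this line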

plt.yticks(np.arange(0, 81, 10))

to this

plt.yticks(np.arange(0, 5000, 1000))

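The ticks on the vertical axis now start at 0 and go up in steps of 1,000. I chose 5000 as the stop because the maximum value is 4600. Note that np.arange leaves out the stop value itself, so the last labelled tick is 4000; the axis still stretches far enough to fit the tallest bar.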
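
Because np.arange never includes its stop value, it can be handy to derive the tick range from the data instead of typing it by hand. The following is only a minimal sketch, not part of the original example; the helper name tick_range is made up for illustration.

```python
import math

import numpy as np

views = (634, 754, 937, 1300, 2200, 3000, 2800, 3600, 4200, 4600)

# np.arange(start, stop, step) never includes stop itself
ticks_as_written = np.arange(0, 5000, 1000)


def tick_range(values, step):
    """Hypothetical helper: ticks from 0 up to the first multiple of
    step that is >= max(values), with that top value included."""
    top = math.ceil(max(values) / step) * step
    return np.arange(0, top + step, step)


ticks_from_data = tick_range(views, 1000)

print(ticks_as_written.tolist())
print(ticks_from_data.tolist())
```

Passing ticks_from_data to plt.yticks would also label the 5000 mark above the tallest bar.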

So, now the result is this:

In this video: live coding of the most basic chart ever

Let’s take a look at the live coding of a basic bar chart, even more basic than the one in the code above. It’s a short video, a little more than 6 minutes, that shows how easy it is to create charts with this very famous tool: matplotlib for Python.

What if I don’t like having two lists for views and months?

I do not like having to look up which view count corresponds to which month across different lists, so I decided to rearrange the code like this:

import numpy as np
import matplotlib.pyplot as plt


views = np.array([
	[634, "N", 2018],
	[754, "D", 2018],
	[937, "J", 2019],
	[1300, "F", 2019],
	[2200, "M", 2019],
	[3000, "A", 2019],
	[2800, "M", 2019],
	[3600, "J", 2019],
	[4200, "J", 2019],
	[4600, "A", 2019]
])
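# NumPy stores this mixed array as strings, so convert the counts back to int
v = np.array([int(x[0]) for x in views])
# Tick labels: month letter on one line, year on the next
m = np.array([x[1] + "\n" + str(x[2]) for x in views])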
ind = np.arange(len(views))
p1 = plt.bar(ind, v, 0.35)
plt.ylabel('Views')
plt.title('My blog views')
plt.xticks(ind, m)
plt.yticks(np.arange(0, 5000, 1000))
plt.show()

The result is just the same.
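
One detail worth knowing about the array above: when a NumPy array mixes numbers and text, every element is stored as a string, which is why the counts have to go through int() again. As a sketch of an alternative (my own assumption, not the approach used in this post), a plain list of tuples keeps each value’s type and can be split with zip:

```python
import numpy as np

mixed = np.array([[634, "N", 2018], [754, "D", 2018]])
# Mixed rows are coerced to one common type: here, Unicode strings
print(mixed.dtype.kind)          # 'U' means Unicode string
print(repr(mixed[0][0]))

# A list of tuples keeps ints as ints
rows = [
    (634, "N", 2018),
    (754, "D", 2018),
    (937, "J", 2019),
]
counts, months, years = zip(*rows)
labels = [f"{month}\n{year}" for month, year in zip(months, years)]

print(counts)
print(labels)
```

The tuples counts and labels can be passed to plt.bar and plt.xticks in place of v and m.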

Other types of charts

This time we want to compare three different ways of showing the same data. We are going to use a dictionary to store it. I took the data from the code above (an array) and turned it into a dictionary. Since a dictionary cannot hold the same key twice, I made the keys unique by using three-letter month names and concatenating the month with the year, so that no two entries share the same name and year.
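
The reason for the longer month names can be checked with a small sketch (the variable names here are mine, for illustration only): with one-letter months, March and May of 2019 would both become the key M19, and the later entry would silently overwrite the earlier one.

```python
rows = [
    (634, "Nov", 2018), (754, "Dec", 2018), (937, "Jan", 2019),
    (1300, "Feb", 2019), (2200, "Mar", 2019), (3000, "Apr", 2019),
    (2800, "May", 2019), (3600, "Jun", 2019), (4200, "Jul", 2019),
    (4600, "Aug", 2019),
]

# One-letter month + two-digit year: several months collide
short_keys = {}
for count, month, year in rows:
    short_keys[month[0] + str(year)[2:]] = count

# Three-letter month + two-digit year: every key is unique
long_keys = {}
for count, month, year in rows:
    long_keys[month + str(year)[2:]] = count

print(len(rows), len(short_keys), len(long_keys))  # 10 6 10
```

The full listing that follows uses the three-letter form, with a line break between the month and the year.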

import matplotlib.pyplot as plt
import numpy as np

views = np.array([
	[634, "Nov", 2018],
	[754, "Dec", 2018],
	[937, "Jan", 2019],
	[1300, "Feb", 2019],
	[2200, "Mar", 2019],
	[3000, "Apr", 2019],
	[2800, "May", 2019],
	[3600, "Jun", 2019],
	[4200, "Jul", 2019],
	[4600, "Aug", 2019]
])

data = {}
for x in views:
	data[x[1] + "\n" + str(x[2][2:])] = int(x[0])
print(data)

names = list(data.keys())
print(names)
values = list(data.values())

fig, axs = plt.subplots(1, 3, figsize=(9, 3), sharey=True)
axs[0].bar(names, values)
axs[1].scatter(names, values)
axs[2].plot(names, values)
fig.suptitle('Categorical Plotting')
plt.show()

The result will be this if you run the code:

Now it’s up to you to choose the chart that best suits your data visualization goal. There is a clear rise in views, with a small ‘pause’ in May that could be worth investigating. I think the third chart (the line) is the most effective for spotting changes in the trend, while the first (the bars) shows more clearly the gap between neighbouring months. In the bar chart, in fact, I can see more clearly that the gap between Feb 19 and Mar 19 is bigger than the earlier ones: growth over the first three months is slow, while from the fourth month on the increase is much bigger.

The second chart (the scatter plot) is perhaps the least useful for this kind of data, a sequence along a timeline. It is more useful for comparing different entities (different brands in a market analysis, for example) along more dimensions, which can be shown through the color and the size of the dots.
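
These observations can also be read off the numbers. A minimal sketch, assuming the same monthly values as in the listing above, uses np.diff to get the month-over-month change:

```python
import numpy as np

names = ["Nov 18", "Dec 18", "Jan 19", "Feb 19", "Mar 19",
         "Apr 19", "May 19", "Jun 19", "Jul 19", "Aug 19"]
values = np.array([634, 754, 937, 1300, 2200, 3000, 2800, 3600, 4200, 4600])

# changes[i] is the difference between month i+1 and month i
changes = np.diff(values)

for start, end, delta in zip(names[:-1], names[1:], changes):
    print(f"{start} -> {end}: {int(delta):+d}")

# Index of the largest month-over-month rise
biggest = int(np.argmax(changes))
# Months whose views are lower than the month before
drops = [names[i + 1] for i in np.where(changes < 0)[0]]

print("Largest rise:", names[biggest], "->", names[biggest + 1])
print("Months with a drop:", drops)
```

The largest single rise is between Feb 19 and Mar 19, and May 19 is the only month below its predecessor, which matches what the bar and line charts show.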

Not using dictionaries

We can achieve the same result as above with this code:

import matplotlib.pyplot as plt
import numpy as np

views = np.array([
	[634, "Nov", 2018],
	[754, "Dec", 2018],
	[937, "Jan", 2019],
	[1300, "Feb", 2019],
	[2200, "Mar", 2019],
	[3000, "Apr", 2019],
	[2800, "May", 2019],
	[3600, "Jun", 2019],
	[4200, "Jul", 2019],
	[4600, "Aug", 2019]
])

values = [int(x[0]) for x in views]
names = [x[1] + "\n" + x[2][2:] for x in views]

fig, axs = plt.subplots(1, 3, figsize=(9, 3), sharey=True)
axs[0].bar(names, values)
axs[1].scatter(names, values)
axs[2].plot(names, values)
fig.suptitle('Categorical Plotting')
plt.show()

The result is the same.
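
Both versions give the same lists because Python dictionaries (since Python 3.7) keep their keys in insertion order, so data.keys() and data.values() line up with the original rows. A quick sketch to check this, using a shortened copy of the data:

```python
rows = [
    ("634", "Nov", "2018"),
    ("754", "Dec", "2018"),
    ("937", "Jan", "2019"),
    ("1300", "Feb", "2019"),
]  # strings, as they come out of the mixed NumPy array

# Dictionary version
data = {}
for count, month, year in rows:
    data[month + "\n" + year[2:]] = int(count)
names_from_dict = list(data.keys())
values_from_dict = list(data.values())

# List-comprehension version
values_from_lists = [int(count) for count, _, _ in rows]
names_from_lists = [month + "\n" + year[2:] for _, month, year in rows]

print(names_from_dict == names_from_lists)
print(values_from_dict == values_from_lists)
```

The dictionary only stays equivalent as long as every key is unique; the list version has no such restriction.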

Let’s experiment with mixing different chart types

Why don’t we blend two types of chart together? Here is some code that does that.

import matplotlib.pyplot as plt
import numpy as np

views = np.array([
	[637, "Nov", 2018],
	[754, "Dec", 2018],
	[937, "Jan", 2019],
	[1267, "Feb", 2019],
	[2166, "Mar", 2019],
	[3019, "Apr", 2019],
	[2829, "May", 2019],
	[3643, "Jun", 2019],
	[4189, "Jul", 2019],
	[5019, "Aug", 2019]
])
# Data for x and y
values = [int(x[0]) for x in views]
names = [x[1] + "\n" + x[2][2:] for x in views]
# This is the line chart
plt.plot(names, values)
# This is the chart with bars, placed on the same month labels as the line
plt.bar(names, values, 0.35)
plt.show()

This is the output:

Now we get more visual information about the data at once. As we said previously, the most interesting charts for this data are the bar chart and the line chart; now we have mixed them together, so we get the best of both in a single chart, for a more intuitive view of the data series.
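
The same idea can be written with matplotlib’s object-oriented interface, drawing both chart types on one Axes and using the month labels as x positions for both. This is a sketch under my own assumptions: the colors, the marker, the legend labels and the Agg backend lines are illustrative choices, not part of the code above.

```python
import matplotlib

# Assumption: Agg is a non-interactive backend, so this runs without a
# display. Remove this call and add plt.show() at the end to open a window.
matplotlib.use("Agg")
import matplotlib.pyplot as plt

views = [
    (637, "Nov", 2018), (754, "Dec", 2018), (937, "Jan", 2019),
    (1267, "Feb", 2019), (2166, "Mar", 2019), (3019, "Apr", 2019),
    (2829, "May", 2019), (3643, "Jun", 2019), (4189, "Jul", 2019),
    (5019, "Aug", 2019),
]
values = [count for count, _, _ in views]
names = [f"{month}\n{str(year)[2:]}" for _, month, year in views]

fig, ax = plt.subplots()
# Bars and line share the same categorical x positions
bars = ax.bar(names, values, width=0.35, label="Monthly views")
(line,) = ax.plot(names, values, color="tab:orange", marker="o", label="Trend")
ax.set_ylabel("Views")
ax.set_title("My blog views")
ax.legend()
```

Keeping a reference to ax makes it easy to add labels, a legend or further layers to the same chart later.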

Published by pythonprogramming

Started with BASIC on the Spectrum, loved JavaScript in the 90s and Python in the 2000s; now I am back with Python, still writing some JavaScript when needed.