Matplotlib: Histogram

A histogram is a graphical representation of data as rectangles whose areas are proportional to the class frequencies and whose widths equal the class intervals (bins). Unlike in a bar chart, the bars in a histogram can be of unequal width. If all the class intervals are of equal width, then the heights of the bars are proportional to the frequencies.
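To make the area rule concrete, here is a minimal sketch with deliberately unequal bin widths. The sample values and bin edges are made up for illustration and are not the data used in the rest of this tutorial. It uses NumPy's histogram() function with density=True, which scales the bar heights so that the total area is 1; each bar's area then equals its share of the observations.

```python
import numpy as np

# Hypothetical sample and unequal bin edges (illustration only).
data = np.array([1, 2, 3, 4, 12, 15, 18, 25, 60, 90])
edges = [0, 5, 20, 100]

# Raw frequencies per bin.
counts, _ = np.histogram(data, bins=edges)

# Heights normalised so that the areas (height * width) sum to 1.
heights, _ = np.histogram(data, bins=edges, density=True)
widths = np.diff(edges)
areas = heights * widths

print(counts)  # frequency in each bin
print(areas)   # each area is the bin's frequency divided by the total
```

The widest bin gets the lowest bar even though it holds as many values as the middle one, because the same frequency is spread over a larger width.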

In this tutorial, we will take as data the prime numbers between 1 and 100 and create a histogram of them using the Matplotlib function hist().

We specify the bins (or intervals) between 0 and 100 as [0,20,40,60,80,100]. The first bin is [0, 20), which includes 0 but excludes 20, and the same holds for every bin except the last. The last bin, [80,100], includes both 80 and 100.
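The bin boundaries can be checked without drawing anything. The sketch below assumes that counting with NumPy's histogram() function is an acceptable stand-in for hist(), which follows the same half-open convention. It counts the primes per bin and then bins the two boundary values 20 and 100 on their own.

```python
import numpy as np

a = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
              61, 67, 71, 73, 79, 83, 89, 97])
bins = [0, 20, 40, 60, 80, 100]

# Number of primes falling in each of the five bins.
counts, edges = np.histogram(a, bins=bins)
print(counts)  # [8 4 5 5 3]

# Boundary behaviour: 20 lands in the second bin [20, 40),
# while 100 is still counted in the last bin [80, 100].
boundary_counts, _ = np.histogram([20, 100], bins=bins)
print(boundary_counts)  # [0 1 0 0 1]
```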

Inside the hist() function, the facecolor property sets the colour of the bars (we set it to r, red), and the alpha property sets their opacity, taking values from 0 (fully transparent) to 1 (fully opaque). Bars without an outline would be hard to tell apart, so we set edgecolor to a darker colour (say k, black) and give the outline a width of 1 (linewidth = 1).

				
					from matplotlib import pyplot as plt
					import numpy as np
					# Primes between 1 and 100
					a = np.array([2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,
						71,73,79,83,89,97])
					# Bin edges: five intervals of width 20
					bins = [0,20,40,60,80,100]
					# Red, slightly transparent bars with a black outline of width 1
					plt.hist(a, bins, facecolor='r', alpha=0.7, edgecolor='k', linewidth=1)
					plt.title("Primes between 1 and 100")
					plt.xlabel("Primes")
					plt.ylabel("Frequency")
					plt.show()
				
			

If you save the above Python program as histogram.py, you can run it by typing the command

				
					$ python3 histogram.py
				
			
[Figure: Matplotlib histogram of the primes between 1 and 100]

We can also specify the bins by giving just the number of intervals required, in which case hist() divides the range of the data into that many equal-width bins. The statement bins = [0,20,40,60,80,100] in the above program can then be replaced by bins = 5.

				
					from matplotlib import pyplot as plt 
					import numpy as np  					   
					a = np.array([2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,
						61,67,71,73,79,83,89,97]) 
					bins = 5  # five equal-width bins spanning the range of the data
					plt.hist(a, bins, facecolor='r', alpha=0.7, edgecolor='k', 
						linewidth=1) 
					plt.title("Primes between 1 and 100")
					plt.xlabel("Primes")
					plt.ylabel("Frequency")
					plt.show()
				
			

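It will generate a similar graph, but not an identical one. When bins is an integer, hist() spreads the bins evenly between the smallest and largest values in the data (here 2 and 97), so the bin edges become [2, 21, 40, 59, 78, 97] rather than [0, 20, 40, 60, 80, 100]. To reproduce the earlier graph exactly, also pass range=(0, 100) to hist().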
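It is worth checking which bin edges an integer bins value actually produces. The sketch below again uses NumPy's histogram() function as a stand-in for hist(), on the assumption that both compute the bins the same way, and compares bins=5 with and without an explicit range.

```python
import numpy as np

a = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
              61, 67, 71, 73, 79, 83, 89, 97])

# bins=5 alone: edges run from the smallest to the largest data value.
auto_counts, auto_edges = np.histogram(a, bins=5)

# bins=5 with an explicit range: edges run from 0 to 100.
fixed_counts, fixed_edges = np.histogram(a, bins=5, range=(0, 100))

print(auto_edges)    # edges from 2 to 97 in steps of 19
print(auto_counts)   # [8 4 4 5 4]
print(fixed_edges)   # edges from 0 to 100 in steps of 20
print(fixed_counts)  # [8 4 5 5 3]
```

The two sets of counts differ because values near the edges change bins: 59, for instance, is the left edge of the fourth automatic bin but sits inside [40, 60) when the range is fixed.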

Multiple Histograms

Now let us plot two histograms in a single graph. As a second set of data between 1 and 100, we will consider the exponents $n$ that give Mersenne Primes.

[Image: Marin Mersenne, via Wikimedia Commons, Public Domain]

Mersenne Primes are prime numbers of the form $2^{n} - 1$ for some integer $n$. Below are the exponents $n < 100$ that give Mersenne Primes:

[2, 3, 5, 7, 13, 17, 19, 31, 61, 89]
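The list above can be reproduced with a short check. The sketch below uses the Lucas-Lehmer test, a standard primality test for numbers of the form $2^{n} - 1$; it is our own illustration rather than part of the plotting program, and the function names are chosen only for this example.

```python
def is_prime(n):
    """Trial division; adequate for the small exponents considered here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def is_mersenne_exponent(n):
    """Return True if 2**n - 1 is prime, using the Lucas-Lehmer test."""
    if not is_prime(n):
        return False  # 2**n - 1 is composite whenever n is composite
    if n == 2:
        return True   # 2**2 - 1 = 3 is prime; the loop below needs odd n
    m = (1 << n) - 1
    s = 4
    for _ in range(n - 2):
        s = (s * s - 2) % m
    return s == 0


exponents = [n for n in range(2, 100) if is_mersenne_exponent(n)]
print(exponents)  # [2, 3, 5, 7, 13, 17, 19, 31, 61, 89]
```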

In the Python program below, we create another array b for these exponents. Also note that there are 10 equal class intervals here, assigned to bins.

					
					from matplotlib import pyplot as plt
					import numpy as np
					a = np.array([2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,
						71,73,79,83,89,97]) # primes
					b = np.array([2,3,5,7,13,17,19,31,61,89]) # exponents
					# Bin edges: ten intervals of width 10
					bins = [0,10,20,30,40,50,60,70,80,90,100]
					# Passing a list of arrays draws the two data sets side by side in each bin
					plt.hist([a,b], bins, label=['Primes','Exponents for Mersenne Primes'])
					plt.legend(loc='upper right')
					plt.title("Primes & Exponents for Mersenne Primes")
					plt.xlabel("Primes & Exponents for Mersenne Primes")
					plt.ylabel("Frequency")
					plt.show()
					
				

On execution, the program plots the following graph.

[Figure: Matplotlib histogram of the primes and the Mersenne exponents, side by side in each bin]
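The bar heights in this graph can be tabulated directly. The sketch below counts both data sets over the same ten bins with NumPy's histogram() function, assuming it bins the values the same way hist() does; the variable names counts_a and counts_b are ours.

```python
import numpy as np

a = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
              61, 67, 71, 73, 79, 83, 89, 97])      # primes
b = np.array([2, 3, 5, 7, 13, 17, 19, 31, 61, 89])  # exponents
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Frequencies of each data set over the shared bins.
counts_a, _ = np.histogram(a, bins=bins)
counts_b, _ = np.histogram(b, bins=bins)

# One row per bin: lower edge, upper edge, and both frequencies.
for lo, hi, ca, cb in zip(bins[:-1], bins[1:], counts_a, counts_b):
    print(f"{lo:>3}-{hi:<3}  primes: {ca}  exponents: {cb}")
```

Every exponent is itself prime, so the exponent count in a bin can never exceed the prime count in the same bin.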

Notes

  • If you do not want any outline on the bars, set edgecolor='none' (a string, with quotes) inside the hist() function.