A Brief Introduction to Plotting in Python

Datascience George
5 min read · Apr 30, 2020


Plotting using matplotlib.pyplot

In this blog post I demonstrate how to make several kinds of plots in Python using Matplotlib. Simply import matplotlib.pyplot with an alias (I use “plt”) and follow the instructions to start making plots!

import matplotlib.pyplot as plt

Before making plots, note that you can change the size of a plot by calling plt.figure() and passing in a figsize (width and height in inches). You can also set labels such as a title, an x-axis label, and a y-axis label. Finally, call plt.show() after every plot so that only the plot is displayed; in a Jupyter notebook this keeps the return value of the last plotting call from being printed above the plot. Here are some examples of how to do all that.

import matplotlib.pyplot as plt
plt.figure(figsize=(7, 5))  # Sets the size to 7 inches wide and 5 inches tall
plt.title('Speed')  # Sets the title above the plot to "Speed"
plt.xlabel('Time')  # Sets the x-axis label to "Time"
plt.ylabel('Distance')  # Sets the y-axis label to "Distance"
plt.show()  # Displays the plot (and only the plot)
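
To check what each call does without opening a window, here is a minimal sketch that assumes the non-interactive Agg backend (selected with matplotlib.use, which is not part of the original example). It builds the same figure and then reads the size and labels back from the figure and its axes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering, so no display is needed
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(7, 5))  # figsize is (width, height) in inches
plt.title("Speed")
plt.xlabel("Time")
plt.ylabel("Distance")

ax = plt.gca()  # the axes that plt.title/xlabel/ylabel were applied to
size = fig.get_size_inches().tolist()
print(size)  # [7.0, 5.0]
print(ax.get_title(), ax.get_xlabel(), ax.get_ylabel())  # Speed Time Distance
plt.close(fig)  # free the figure instead of showing it
```

In a notebook you would call plt.show() instead of plt.close(); the read-back lines are only there to make the effect of figsize and the label functions visible.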

Histograms

Histograms are good for seeing the distribution of a set of data. They show how often values fall within each of a series of ranges (bins). A histogram only needs one variable, which I will call “x”. To plot a histogram, call plt.hist() and pass in x. Below is a fictional example of daily temperatures in the month of July. In this example run, the majority of days had a temperature between 65 and 75 degrees; because the temperatures are randomly generated, your histogram will look different each time the code runs.

import matplotlib.pyplot as plt # this only has to be done once
import numpy as np # for data generation
base_temp = [65] # base temperature
x = [i + 25*np.random.rand() for i in base_temp*31] # temperatures
# plot
plt.figure(figsize=(7, 5))
plt.hist(x)
plt.title('Daily Temperatures in July')
plt.xlabel('Temperature')
plt.ylabel('Frequency')
plt.show()
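
plt.hist() also returns the counts it computed, which makes the idea of “frequency” concrete. The sketch below is a variant of the example above under a few assumptions: it uses a seeded NumPy generator (np.random.default_rng) so the numbers repeat between runs, builds the 31 temperatures in one vectorized step, and uses the Agg backend instead of showing a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (an assumption) so runs repeat
x = 65 + 25 * rng.random(31)    # 31 fictional July temperatures in [65, 90)

counts, edges, _ = plt.hist(x)  # default: 10 equal-width bins
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:5.1f}-{hi:5.1f}: {int(c)} days")

share_below_75 = np.mean(x < 75)  # fraction of days between 65 and 75 degrees
print(f"{share_below_75:.0%} of days were below 75 degrees")
plt.close()
```

By default plt.hist() uses 10 equal-width bins; passing bins=5, for example, would group the temperatures more coarsely.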

Bar Plots

Bar plots are good for comparing values across categories. Some examples could be showing the GDP of different countries or the win rate of different sports teams. To make a bar plot you need two variables, which I will call “x” and “y”. The x variable can be a list or array of numbers or strings. The y variable should be a list or array of numbers (integers or floats). The two lists or arrays must be aligned: the first item in x matches the first item in y, and so on. To make a bar plot, call plt.bar() and pass in the x and y variables in that order. Below is an example of a bar plot comparing different countries’ GDPs, using data I took from Wikipedia (see Sources).

import matplotlib.pyplot as plt  # This only has to be done once
x = ['Japan', 'Germany', 'India']  # Different countries' names
y = [5.2, 3.9, 2.9]  # Different countries' GDPs (nominal, trillions of US dollars)
# plot
plt.figure(figsize=(7, 5))
plt.bar(x, y)
plt.title("Different Countries' GDPs")
plt.ylabel('GDP (trillions of US dollars)')
plt.show()
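
Because bar heights are matched to names by position, any reordering has to move x and y together. A minimal sketch, assuming you want the bars sorted from smallest to largest GDP (the sorting and the Agg backend are not part of the original example): keeping each name and value as a pair while sorting guarantees they stay aligned.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering
import matplotlib.pyplot as plt

gdp = {"Japan": 5.2, "Germany": 3.9, "India": 2.9}  # values from the example above

# Sort (name, value) pairs together so each name keeps its own value
pairs = sorted(gdp.items(), key=lambda pair: pair[1])
x = [name for name, _ in pairs]
y = [value for _, value in pairs]
print(x)  # ['India', 'Germany', 'Japan']

bars = plt.bar(x, y)
heights = [float(bar.get_height()) for bar in bars]  # one rectangle per category, in x order
print(heights)
plt.close()
```

Sorting the pairs rather than each list separately is the design choice that matters here: sorting only y would silently attach the wrong GDP to each country.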

Scatter Plots

Scatter plots are good for checking the relationship between two variables. For example, it may be important to check whether two sets of data have a linear relationship. Just like bar plots, scatter plots require two sets of data, x and y, and they must be aligned in the same way. One difference is that both sets of data should be numerical (integers or floats), since these values determine where each point lies along the x and y axes. To make a scatter plot, simply call plt.scatter() and pass in x and y in that order. Below is an example of a scatter plot showing a fictional relationship between bike sales and advertising.

import matplotlib.pyplot as plt # This only has to be done once
import numpy as np # for data generation
x = np.linspace(100, 200, 40)  # Dollars spent on advertisement
y = [i*10 + 500*np.random.rand() for i in x]  # Sales in dollars
# plot
plt.figure(figsize=(7, 5))
plt.scatter(x, y)
plt.title('Advertisement and Sales')
plt.ylabel('Sales')
plt.xlabel('Dollars Spent on Advertisement')
plt.show()
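
To see that each (x, y) pair really becomes one point, and to put a number on how linear the relationship looks, here is a sketch that reads the plotted positions back and computes the Pearson correlation with np.corrcoef. The seeded generator, the Agg backend, and the correlation check are additions for illustration, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)     # seeded for repeatable runs (assumption)
x = np.linspace(100, 200, 40)      # dollars spent on advertisement
y = x * 10 + 500 * rng.random(40)  # fictional sales in dollars

points = plt.scatter(x, y)
offsets = np.asarray(points.get_offsets())  # one (x, y) row per plotted point
print(offsets.shape)  # (40, 2)

r = np.corrcoef(x, y)[0, 1]  # Pearson correlation: near 1 means a strong positive linear trend
print(f"r = {r:.2f}")
plt.close()
```

A correlation close to 1 suggests a strong positive linear relationship, but the scatter plot is still worth looking at, because very different patterns can share the same correlation.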

Line Plots

Line plots are good for showing how something changes over time. A line plot is much like a scatter plot, except that consecutive points are connected by a line. One thing to keep in mind is that the points are connected in the order they appear, so the x values should be sorted in ascending or descending order; otherwise the line will jump back and forth. To make a line plot, simply call plt.plot() and pass in x and y. Below is a fictional example of a company’s stock prices over the last 50 days illustrated with a line plot.

import matplotlib.pyplot as plt # this only has to be done once
import numpy as np # for data generation
x = np.linspace(0, 50, 90) # Days
y = [5000 + 0.7*i + 20*np.random.rand() for i in x] # Stock prices
# plot
plt.figure(figsize=(7, 5))
plt.plot(x, y)
plt.title('Company Stock Prices Over Last 50 Days')
plt.ylabel('Stock Price')
plt.xlabel('Day')
plt.show()
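
The ordering warning is easy to reproduce. This sketch, which assumes a seeded generator and the Agg backend (neither is in the original), shuffles the days and then uses np.argsort to reorder days and prices with the same indices, so the pairs stay matched and the line runs left to right.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)                    # seeded (assumption) for repeatability
days = rng.permutation(np.linspace(0, 50, 90))    # days in shuffled order
prices = 5000 + 0.7 * days + 20 * rng.random(90)  # fictional prices, still paired with days

order = np.argsort(days)                          # indices that sort days ascending
line, = plt.plot(days[order], prices[order])      # reorder both arrays the same way

xdata = line.get_xdata()
print(bool(np.all(np.diff(xdata) >= 0)))  # True: x values now increase left to right
plt.close()
```

Plotting days and prices without the reordering would draw the same points, but the connecting line would zigzag across the chart.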

Final Notes

Now that you know the basics of plotting, it is time to introduce colors and alpha levels. Pass in a color parameter (e.g. color='green') to change the color of your plot and an alpha parameter (e.g. alpha=0.7, where 0 is fully transparent and 1 is fully opaque) to change its transparency! Also, to show two plots on top of each other, simply call a plotting function more than once before calling plt.show(). If you draw more than one plot on the same axes, make sure to pass in a label parameter (e.g. label='customers') to each plot and call plt.legend() at the end to create a legend for your plot!

import matplotlib.pyplot as plt # This only has to be done once
import numpy as np # for data generation
x = np.linspace(100, 200, 40) # advertisement
np.random.shuffle(x)
y1 = [i*10 + 500*np.random.rand() for i in x] # sales
y2 = [i*10 + 1000*np.random.rand() for i in x] # customers
y3 = [i*10 + 250 for i in x] # trend
# plot
plt.figure(figsize=(7, 5))
plt.scatter(x, y1, color="blue", label='sales')
plt.scatter(x, y2, color="green", label='customers * 100')
plt.plot(x, y3, color="red", alpha=0.7, label='trend')
plt.title('Effects of Advertisement')
plt.xlabel('Dollars Spent on Advertisement')
plt.ylabel('Sales')
plt.legend()
plt.show()
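
To confirm what the color, alpha, and label parameters were attached to, here is a small sketch (with simplified, made-up data and the Agg backend as assumptions) that overlays a scatter plot and a line and then reads the legend entries and the line's styling back.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(100, 200, 40)
plt.figure(figsize=(7, 5))
plt.scatter(x, x * 10 + 100, color="blue", label="sales")                   # first plot
line, = plt.plot(x, x * 10 + 250, color="red", alpha=0.7, label="trend")  # second plot, same axes
legend = plt.legend()  # builds one entry from each label

labels = sorted(t.get_text() for t in legend.get_texts())
print(labels)  # ['sales', 'trend']
print(line.get_color(), line.get_alpha())  # red 0.7
plt.close()
```

Any plot drawn without a label is simply left out of the legend.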

Sources

  1. Wikipedia, “List of countries by GDP (nominal)”. https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)

Written by Datascience George

Data scientist learning at Flat Iron School
