A Brief Introduction to Plotting in Python
Plotting using matplotlib.pyplot
In this post I demonstrate how to make several kinds of plots in Python using matplotlib. Simply import matplotlib.pyplot with an alias (I use “plt”) and follow along to start making plots!
import matplotlib.pyplot as plt
Before making plots, note that you can change the size of a plot by calling plt.figure() and passing in a figsize. You can also set labels such as a title, a label for the x-axis, and a label for the y-axis. Finally, call plt.show() after every plot to display it; in a notebook this also keeps the text output of the last plotting call from appearing above your plot. Here are some examples of how to do all that.
plt.figure(figsize=(7, 5)) # Changes size to be 7 inches wide & 5 tall
plt.title('Speed') # Sets a title above the plot to "Speed"
plt.xlabel('Time') # Sets the x-axis label to "Time"
plt.ylabel('Distance') # Sets the y-axis label to "Distance"
plt.show() # Makes sure only the plot is shown
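If you want to check that these settings took effect, one option is to read them back. This is only a sketch: the names fig, ax, width, and height are my own, and it relies on plt.figure() returning the Figure object and plt.gca() returning the current axes.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(7, 5))  # plt.figure returns the new Figure object
plt.title('Speed')
plt.xlabel('Time')
plt.ylabel('Distance')

# figsize is measured in inches: width first, then height
width, height = fig.get_size_inches()
print(f"Figure is {width} inches wide and {height} inches tall")

# The title and labels are stored on the current axes
ax = plt.gca()
print(ax.get_title(), ax.get_xlabel(), ax.get_ylabel())

plt.show()
```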
Histograms
Histograms are good for finding the distribution of a set of data. They show how frequently values fall within each range (bin) of the data. A histogram only needs one variable, which I will call “x”. To plot a histogram, call plt.hist() and pass in x. Below is a fictional example of daily temperatures in the month of July. Because the data is generated randomly, the exact shape changes on every run, but all of the temperatures fall between 65 and 90 degrees.
import matplotlib.pyplot as plt # this only has to be done once
import numpy as np # for data generation

base_temp = [65] # base temperature
x = [i + 25*np.random.rand() for i in base_temp*31] # temperatures

# plot
plt.figure(figsize=(7, 5))
plt.hist(x)
plt.title('Daily Temperatures in July')
plt.xlabel('Temperature')
plt.ylabel('Frequency')
plt.show()
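To see the numbers behind the bars, the counts can also be computed directly. The sketch below assumes NumPy's np.histogram, which the example above does not use, plus a fixed random seed that I added so the result is repeatable. Both np.histogram and plt.hist split the data into 10 equal-width bins by default.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the example is repeatable
x = [65 + 25*np.random.rand() for _ in range(31)]  # 31 fictional daily temperatures

# Count how many temperatures fall into each of 10 equal-width bins
counts, bin_edges = np.histogram(x, bins=10)
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.1f} to {right:.1f} degrees: {count} days")

# plt.hist draws one bar per bin, with the bar height equal to the count
plt.figure(figsize=(7, 5))
plt.hist(x, bins=10)
plt.title('Daily Temperatures in July')
plt.xlabel('Temperature')
plt.ylabel('Frequency')
plt.show()
```

The counts always add up to the number of data points, here 31 days.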
Bar Plots
Bar plots are good for comparing values across categories. Some examples could be showing the GDP of different countries or the win rate of different sports teams. To make a bar plot you need two variables, which I will call “x” and “y”. The x variable can be a list or array of numbers or strings. The y variable should be a list or array of numbers (integers or floats). The two must be in matching order: the first item in x corresponds to the first item in y, and so on. To make a bar plot, call plt.bar() and pass in the x and y variables in that order. Below is an example of a bar plot showing different countries’ GDPs, using data I took from Wikipedia.¹
import matplotlib.pyplot as plt # This only has to be done once

x = ['Japan', 'Germany', 'India'] # Different countries' names
y = [5.2, 3.9, 2.9] # Different countries' GDPs

# plot
plt.figure(figsize=(7, 5))
plt.bar(x, y)
plt.title("Different Countries' GDPs")
plt.ylabel('GDP')
plt.show()
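Because the bars are matched up by position, any reordering has to be applied to both lists together. Here is a minimal sketch of one way to do that, sorting the countries from smallest to largest GDP with Python's built-in zip and sorted. The names pairs, x_sorted, and y_sorted are my own.

```python
import matplotlib.pyplot as plt

x = ['Japan', 'Germany', 'India']  # countries' names
y = [5.2, 3.9, 2.9]                # countries' GDPs, in the same order as x

# Pair each country with its GDP, then sort the pairs by GDP
pairs = sorted(zip(x, y), key=lambda pair: pair[1])
x_sorted = [name for name, gdp in pairs]
y_sorted = [gdp for name, gdp in pairs]

plt.figure(figsize=(7, 5))
plt.bar(x_sorted, y_sorted)
plt.title("Different Countries' GDPs")
plt.ylabel('GDP')
plt.show()
```

Sorting only one of the two lists would attach the wrong GDP to each country, which is why the pairs are sorted as a unit.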
Scatter Plots
Scatter plots are good for checking the relationship between two variables. For example, it may be important to check whether two sets of data have a linear relationship. Just like bar plots, scatter plots require two sets of data, x and y, whose items must match up by position. The difference is that in a scatter plot both sets of data have to be numerical (integer or float), as each pair of values gives the coordinates of a point along the x and y axes. To make a scatter plot, simply call plt.scatter() and pass in x and y in that order. Below is an example of a scatter plot showing a fictional relationship between bike sales and advertising.
import matplotlib.pyplot as plt # This only has to be done once
import numpy as np # for data generation

x = np.linspace(100, 200, 40) # Dollars spent on advertisement
y = [i*10 + 500*np.random.rand() for i in x] # Sales in dollars

# plot
plt.figure(figsize=(7, 5))
plt.scatter(x, y)
plt.title('Advertisement and Sales')
plt.ylabel('Sales')
plt.xlabel('Dollars Spent on Advertisement')
plt.show()
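A scatter plot shows a linear relationship visually. To put a number on it, one option is the correlation coefficient. This is a sketch assuming NumPy's np.corrcoef, which is not part of the example above, and a fixed random seed that I added. Values near 1 or -1 suggest a strong linear relationship, while values near 0 suggest little or none.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the example is repeatable
x = np.linspace(100, 200, 40)           # dollars spent on advertisement
y = x*10 + 500*np.random.rand(len(x))   # fictional sales in dollars

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# correlation between x and y
r = np.corrcoef(x, y)[0, 1]
print(f"Correlation between advertisement and sales: {r:.2f}")

plt.figure(figsize=(7, 5))
plt.scatter(x, y)
plt.title(f'Advertisement and Sales (r = {r:.2f})')
plt.ylabel('Sales')
plt.xlabel('Dollars Spent on Advertisement')
plt.show()
```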
Line Plots
Line plots are good for showing how something changes over time. A line plot is much like a scatter plot, but consecutive points are connected. One thing to keep in mind is that the points are connected in the order they are given, so the x values need to be in ascending or descending order or else the line will jump back and forth. To make a line plot, simply call plt.plot() and pass in x and y. Below is a fictional example of a company’s stock prices over the last 50 days illustrated with a line plot.
import matplotlib.pyplot as plt # this only has to be done once
import numpy as np # for data generation

x = np.linspace(0, 50, 90) # Days
y = [5000 + 0.7*i + 20*np.random.rand() for i in x] # Stock prices

# plot
plt.figure(figsize=(7, 5))
plt.plot(x, y)
plt.title('Company Stock Prices Over Last 50 Days')
plt.ylabel('Stock Price')
plt.xlabel('Day')
plt.show()
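If your data does not arrive in order, both arrays can be sorted together before plotting. Here is a minimal sketch, assuming NumPy's np.argsort and data that I shuffled on purpose. The names order, x_sorted, and y_sorted are my own.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the example is repeatable
x = np.linspace(0, 50, 90)                     # days
np.random.shuffle(x)                           # deliberately out of order
y = 5000 + 0.7*x + 20*np.random.rand(len(x))   # fictional stock prices

# argsort returns the indices that would put x in ascending order;
# applying them to both arrays keeps each price paired with its day
order = np.argsort(x)
x_sorted = x[order]
y_sorted = y[order]

plt.figure(figsize=(7, 5))
plt.plot(x_sorted, y_sorted)
plt.title('Company Stock Prices Over Last 50 Days')
plt.ylabel('Stock Price')
plt.xlabel('Day')
plt.show()
```

Plotting the unsorted x and y instead would connect the points in shuffled order and produce a tangle of lines.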
Final Notes
Now that you know the basics of plotting, it is time to introduce colors and alpha levels. Pass in a color parameter (e.g. color='green') to change the color of your plot and an alpha parameter (e.g. alpha=0.7) to change its transparency! Also, to show two plots on top of each other, simply call the plotting functions one after another before calling plt.show(). If you draw more than one plot on the same figure, make sure to pass a label parameter (e.g. label='customers') to each plotting call and call plt.legend() at the end to create a legend for your plot!
import matplotlib.pyplot as plt # This only has to be done once
import numpy as np # for data generation

x = np.linspace(100, 200, 40) # advertisement
np.random.shuffle(x)

y1 = [i*10 + 500*np.random.rand() for i in x] # sales
y2 = [i*10 + 1000*np.random.rand() for i in x] # customers
y3 = [i*10 + 250 for i in x] # trend

# plot
plt.figure(figsize=(7, 5))
plt.scatter(x, y1, color="blue", label='sales')
plt.scatter(x, y2, color="green", label='customers * 100')
plt.plot(x, y3, color="red", alpha=0.7, label='trend')
plt.title('Effects of Advertisement')
plt.xlabel('Dollars Spent on Advertisement')
plt.ylabel('Sales')
plt.legend()
plt.show()
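The alpha parameter is most useful when plots overlap. As a small sketch, here are two fictional months of temperatures drawn as overlapping histograms. The August data, the variable names, and the fixed random seed are my own additions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the example is repeatable
july = 65 + 25*np.random.rand(31)    # fictional July temperatures
august = 70 + 25*np.random.rand(31)  # fictional August temperatures

plt.figure(figsize=(7, 5))
# An alpha below 1 lets the overlapping parts of both histograms show through
plt.hist(july, color='blue', alpha=0.5, label='July')
plt.hist(august, color='green', alpha=0.5, label='August')
plt.title('Daily Temperatures in July and August')
plt.xlabel('Temperature')
plt.ylabel('Frequency')
legend = plt.legend()  # uses the label passed to each plt.hist call
plt.show()
```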
Sources
- Wikipedia on different countries’ GDPs. https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)