Where I Sometimes Write Stuff

Plotting Data with Python and matplotlib
2015-03-17

I've recently become fascinated by data visualization, and how large sets of data can be distilled down to an easily consumed visualization.

Luckily, data visualization appears to be a great fit for my other learning interest, Python. I'm slowly working my way through the different aspects of the language, and as it is my first language, I still have a long way to go. I find that writing about the things I'm learning and experimenting with helps me retain the knowledge and find errors or places for improvement.

So, I've been playing around with Matplotlib and wanted to figure out a way to easily plot some data from an external file.

I decided to create some fictitious comic book sales numbers for the past 60 or so years and then try to plot them on a standard line graph.

Here's how I worked through the problem...

Creating My Fake Data

My first issue was creating some fake data that I could plot.

As a side note, I'm currently looking for some actual data to try this on, but this was just a quick project that I wanted to get done in an evening.

First, I needed to create the file where I was going to write the years and sales data. Then I needed to generate the years between 1945 and 2014. You could type these out by hand, but that would take forever, so I used range() to speed up that process.

Finally I needed to generate some fake sales numbers, and combine the years with their fake sales data. I did this using random.randrange().

```
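import random

# range() excludes its stop value, so this covers 1945 through 2014
years = range(1945, 2015)

# "w" starts a fresh file on every run; "a" would keep appending
# duplicate years each time the script is rerun
with open("comicsSold.txt", "w") as comicYears:
    for year in years:
        # One fake sales figure per year
        sold = random.randrange(5000, 150000000)
        sold = str(sold)
        comicYears.write(str(year) + ',' + sold + '\n')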

```

Here I am using a for loop that creates a fake sales number for each year, converts it to a string, and concatenates the year and the fake sales data, separated by a comma.
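As a quick sanity check, the same loop can be run in memory before anything is written to disk. This is only a sketch: the function name `fake_sales_lines` and its `seed` parameter are my own additions for illustration, and the seed is there just to make the random numbers repeatable between runs.

```python
import random


def fake_sales_lines(first_year=1945, last_year=2014, seed=None):
    """Return 'year,sold' strings, one per year (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, so the seed stays local
    lines = []
    # range() excludes its stop value, hence last_year + 1
    for year in range(first_year, last_year + 1):
        sold = rng.randrange(5000, 150000000)
        lines.append(str(year) + ',' + str(sold))
    return lines


lines = fake_sales_lines(seed=42)
print(len(lines))               # 70 (one line per year, 1945 through 2014)
print(lines[0].split(',')[0])   # 1945
```

Checking the number of lines and the first year this way catches an off-by-one in the range before it ends up in the data file.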

Here's what the code looks like put together

```
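import random

# range() excludes its stop value, so this covers 1945 through 2014
years = range(1945, 2015)

# "w" starts a fresh file on every run; "a" would keep appending
# duplicate years each time the script is rerun
with open("comicsSold.txt", "w") as comicYears:
    for year in years:
        # One fake sales figure per year
        sold = random.randrange(5000, 150000000)
        sold = str(sold)
        comicYears.write(str(year) + ',' + sold + '\n')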

```

Here's a gist of this code for easier readability

The output looks like this

```
1945,114950089
1946,5327739
1947,92066212
1948,8359428
1949,104528851
1950,87344945
1951,111024866
1952,85318191
1953,146137175
1954,97641070
1955,144609067
1956,142349233
1957,64969373

```

Now I've got my fake data that I will be plotting with matplotlib.

Now it's time to start graphing!

I'm going to use the popular Python library matplotlib to do the heavy lifting of my graphing.

Let's start by importing the matplotlib library so we can have access to its modules. Calling plt.ion() turns on matplotlib's interactive mode, so the graph is redrawn as we change it.

```
import matplotlib.pyplot as plt
plt.ion()
file_open = open('comicsSold.txt', 'r')
years = []
sold = []

```

So, now I've got a series of strings with a year and a sales number separated by a comma. Next, I'll need to add all my years to a list and all my sales figures to a sales list so that I can use them to plot.

```
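for line in file_open:
    # Each line looks like "1945,114950089\n"; split it into its two parts
    year, num = line.strip().split(',')
    years.append(int(year))
    sold.append(int(num))  # numbers, not strings, so they plot correctly

file_open.close()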

```
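It helps to look at what `split(',')` actually returns for a single line. It gives back a list of two strings, so each piece has to be picked out by position (or unpacked) before it is appended to a list. The sample line below is copied from the output shown earlier; the variable names are just for illustration.

```python
line = "1945,114950089\n"  # one row as read from the file, newline included

parts = line.strip().split(',')  # strip() drops the trailing newline
print(parts)  # ['1945', '114950089']

year, num = parts  # unpack the two pieces
year = int(year)
num = int(num)     # convert so the values are numbers, not strings
print(year, num)   # 1945 114950089
```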

Next I want to set up the correct x axis that will display the range of years I'm working with. To do this I need to access the first year and last year in my list.

```
first_year = years[0]
last_year = years[-1]

```

This last bit of code below sets my x axis to the range of years I'm plotting. I had to use +1 because range() excludes its stop value, so without it the last year in my list, 2014, would be left out.

Besides making the graph inaccurate, that would also cause an error, because you can only plot when the x axis and y axis have the same number of elements. Adding 1 to the stop value gets me that last year and makes my two axes equal in length.
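Here is a small check of that off-by-one, using a made-up five-year list rather than the real file. The values are placeholders I invented for the example.

```python
years = [2010, 2011, 2012, 2013, 2014]  # placeholder years
sold = [10, 20, 30, 40, 50]             # placeholder sales, one per year

first_year = years[0]
last_year = years[-1]

too_short = range(first_year, last_year)  # stops at 2013
x = range(first_year, last_year + 1)      # includes 2014

print(len(too_short), len(sold))  # 4 5 (lengths differ, plotting would fail)
print(len(x), len(sold))          # 5 5 (lengths match)
print(list(x) == years)           # True
```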

```
x = range(int(first_year), int(last_year)+1)

plt.xlabel('Years')
plt.ylabel('# of Comics Sold')
plt.title('Comic Book Sales Data')
plt.grid(True)

plt.plot(x, sold)
plt.show()

```

Here is the code all together

```
import matplotlib.pyplot as plt
plt.ion()

file_open = open('comicsSold.txt', 'r')

years = []
sold = []

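for line in file_open:
    # Each line looks like "1945,114950089\n"; split it into its two parts
    year, num = line.strip().split(',')
    years.append(int(year))
    sold.append(int(num))  # numbers, not strings, so they plot correctly

file_open.close()

# First and last years in the data set the ends of the x axis
first_year = years[0]
last_year = years[-1]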

x = range(int(first_year), int(last_year)+1)

plt.xlabel('Years')
plt.ylabel('# of Comics Sold')
plt.title('Comic Book Sales Data')
plt.grid(True)

plt.plot(x, sold)
plt.show()

```

And here's the result.

Here's a gist of this script