Weather data visualization for San Francisco Bay Area – a Python Pandas and Matplotlib Tutorial


Weather data is a good input for practicing data science tools and techniques. This project introduces the basics of the Pandas and Matplotlib Python libraries using data for San Francisco, San Mateo, Santa Clara, Mountain View and San Jose in California.

Why Pandas?

Pandas is an open-source Python library for data cleaning, manipulation, transformation, and visualization. This project introduces basic concepts such as:

  • data cleaning
  • data frames
  • data manipulation
  • data transformation
  • data visualization

Weather data analysis – the code

Let’s start by defining the classes and functions we need and loading the weather data. You can download the weather.csv file here.
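A minimal sketch of the loading step might look like the code below. The column names (`Date`, `Mean TemperatureC`, `Events`, `ZIP`) and the inline sample rows are assumptions made so the example runs on its own; the real weather.csv may use different headers. The helper name `load_weather` is also hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for weather.csv; the real file's columns may differ.
SAMPLE_CSV = """Date,Mean TemperatureC,Events,ZIP
2014-01-01,11,,94107
2014-01-02,12,Rain,94107
2014-01-01,10,,95113
"""


def load_weather(source):
    """Read weather records from a path or file-like object into a DataFrame."""
    # parse_dates converts the Date column from text to datetime values.
    return pd.read_csv(source, parse_dates=["Date"])


# For the real data, pass the file path instead: load_weather("weather.csv")
weather = load_weather(io.StringIO(SAMPLE_CSV))

print(weather.head())   # first rows, to check the data looks sensible
print(weather.dtypes)   # column types inferred by Pandas
```

Printing `head()` and `dtypes` right after loading is a quick way to confirm that dates and numbers were parsed as expected before doing any cleaning.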

Then we make the variable names a little friendlier.
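One way to do the renaming is with `DataFrame.rename`, as sketched here. Both the original headers and the friendlier names are assumptions for illustration; adjust the mapping to match the headers in your copy of the data.

```python
import pandas as pd

# Small illustrative frame; the headers are assumed, not taken from weather.csv.
weather = pd.DataFrame({
    "Mean TemperatureC": [11, 12],
    "ZIP": [94107, 95113],
})

# Map each raw header to a lowercase, underscore-separated name.
friendly_names = {
    "Mean TemperatureC": "mean_temperature_c",
    "ZIP": "zip_code",
}

# rename returns a new DataFrame; columns not listed in the mapping are kept as-is.
weather = weather.rename(columns=friendly_names)

print(list(weather.columns))
```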

After that, we can delete the unused column from the data and remove unrealistic samples (e.g. -100 degrees Celsius), which makes the chart more accurate. In our example, we remove unrealistic temperature values.
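The cleaning step could be sketched as follows. The column names, the choice of `events` as the unused column, and the plausibility bounds of -20 to 50 degrees Celsius are all assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative data with two obviously wrong readings (-100 and 95 degrees Celsius).
weather = pd.DataFrame({
    "mean_temperature_c": [11.0, -100.0, 14.0, 95.0],
    "events": [None, "Rain", None, None],
    "zip_code": [94107, 94107, 95113, 95113],
})

# Drop a column we do not need for the chart (assumed here to be "events").
weather = weather.drop(columns=["events"])

# Assumed bounds for a plausible Bay Area temperature, in degrees Celsius.
MIN_C, MAX_C = -20.0, 50.0

# between() is inclusive on both ends; missing values fail the check and are dropped too.
plausible = weather["mean_temperature_c"].between(MIN_C, MAX_C)
cleaned = weather[plausible].reset_index(drop=True)

print(cleaned)
```

Filtering with a boolean mask keeps the original frame intact, so it is easy to compare row counts before and after to see how many samples were discarded.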

Next, we extract the unique ZIP code values from the CSV data. This tells us how many different cities the data covers.
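A short sketch of this step, assuming the ZIP codes live in a column named `zip_code` (a hypothetical name). The values in the sample frame are placeholders, not the actual contents of the file.

```python
import pandas as pd

# Placeholder rows; several readings share the same ZIP code.
weather = pd.DataFrame({
    "zip_code": [94107, 94107, 95113, 94107, 95113],
})

# unique() returns the distinct values in order of first appearance.
zip_codes = weather["zip_code"].unique()

print(zip_codes)        # the distinct ZIP codes
print(len(zip_codes))   # how many different locations the data covers
```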

The results are the following ZIP codes, which we will use in the next steps.

After that, we can plot the data, for example the mean temperature in the San Francisco area.
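A plot like this can be sketched with the Pandas plotting interface on top of Matplotlib. The column names, the sample readings, the use of ZIP code 94107 to select San Francisco rows, and the output file name are assumptions for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script also runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Illustrative readings for two ZIP codes over three days.
weather = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03"] * 2),
    "mean_temperature_c": [11, 12, 13, 9, 10, 12],
    "zip_code": [94107] * 3 + [95113] * 3,
})

# Keep only the rows for one location (94107 is a San Francisco ZIP code).
san_francisco = weather[weather["zip_code"] == 94107]

fig, ax = plt.subplots(figsize=(8, 4))
# DataFrame.plot draws a line chart by default and uses the given Matplotlib axes.
san_francisco.plot(x="date", y="mean_temperature_c", ax=ax, legend=False)
ax.set_title("Mean temperature, ZIP 94107")
ax.set_xlabel("Date")
ax.set_ylabel("Mean temperature (°C)")
fig.tight_layout()
fig.savefig("mean_temperature.png")  # hypothetical output file name
```

In an interactive session you can drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving to a file.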

[Figure: Pandas line plot of temperature]

…and that’s it 🙂

Want to launch the project? Download the code from GitHub.

If you have any questions about the project, the libraries, or this post, please ask in the comments.

