{"id":3054,"date":"2018-08-12T14:04:29","date_gmt":"2018-08-12T14:04:29","guid":{"rendered":"https:\/\/ermlab.com\/?p=3054"},"modified":"2018-09-12T20:52:05","modified_gmt":"2018-09-12T20:52:05","slug":"pandas-weather-data-visualization-tutorial","status":"publish","type":"post","link":"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-weather-data-visualization-tutorial\/","title":{"rendered":"Weather data visualization for San Francisco Bay Area &#8211; a Python Pandas and Matplotlib Tutorial"},"content":{"rendered":"<p>Weather data is a great input for learning the tools and technologies of data science. This project introduces the basics of the Pandas and Matplotlib Python libraries using data for San Francisco, San Mateo, Santa Clara, Mountain View and San Jose in California.<\/p>\n<p><!--more--><\/p>\n<h2><b>Why Pandas?<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Pandas is an open-source Python library that helps with data cleaning, manipulation, transformation, and visualization.\u00a0<\/span><span style=\"font-weight: 400;\">This project introduces basic concepts such as:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">data cleaning<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">data frames<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">data manipulation<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">data transformation<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">data\u00a0visualization<\/span><\/li>\n<\/ul>\n<h2><b>Weather data analysis &#8211; the code<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Let\u2019s start by importing the libraries we need and loading the weather data. 
You can download the weather.csv file\u00a0<\/span><a href=\"https:\/\/www.data.gov\/\"><span style=\"font-weight: 400;\">here<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<pre class=\"lang:python decode:true \">import numpy as np\r\nimport pandas as pd\r\nimport matplotlib as mpl\r\nimport matplotlib.pyplot as plt\r\nimport seaborn as sns\r\n\r\n# Import San Francisco Bay Area Weather data from CSV file\r\ndata = pd.read_csv('weather.csv')<\/pre>\n<p>Then we give the columns friendlier names:<\/p>\n<pre class=\"lang:python decode:true \"># Give the columns friendlier names (old_names and new_names are matched by position)\r\nold_names = ['Max TemperatureF', 'Min TemperatureF', 'Mean TemperatureF', 'Max Dew PointF', 'MeanDew PointF',\r\n             'Min DewpointF', 'Max Humidity',\r\n             ' Mean Humidity', ' Min Humidity', ' Max Sea Level PressureIn', ' Mean Sea Level PressureIn',\r\n             ' Min Sea Level PressureIn', ' Max VisibilityMiles', ' Mean VisibilityMiles',\r\n             ' Min VisibilityMiles', ' Max Wind SpeedMPH', ' Mean Wind SpeedMPH', ' Max Gust SpeedMPH', 'PrecipitationIn',\r\n             ' CloudCover', ' WindDirDegrees', ' Events']\r\nnew_names = ['maxTemp', 'minTemp', 'meanTemp', 'maxDew', 'meanDew', 'minDew', 'maxHum', 'meanHum', 'minHum', 'maxPress',\r\n             'meanPress', 'minPress', 'maxVis', 'meanVis',\r\n             'minVis', 'maxWind', 'meanWind', 'maxGust', 'preIn', 'cloud', 'WindDir', 'events']\r\ndata.rename(columns=dict(zip(old_names, new_names)), inplace=True)<\/pre>\n<p><span style=\"font-weight: 400;\">After that we can delete the unused column from the data frame and remove implausible samples (e.g. a reading of -100 degrees) &#8211; this makes our chart more accurate. In our example, we keep only rows with a maximum temperature of at most 110 and a minimum temperature of at least 25 degrees Fahrenheit. 
For this we will use the following code:<\/span><\/p>\n<pre class=\"\"># Delete the unused column from the data frame\r\n\r\ndel data['preIn']\r\n\r\n# Remove rows with implausible temperatures (values are in Fahrenheit)\r\ndata = data[(data['maxTemp'] &lt;= 110) &amp; (data['minTemp'] &gt;= 25)]<\/pre>\n<p><span style=\"font-weight: 400;\">Now we are going to extract the unique ZIP code values from the data. This will allow us to determine how many different cities we have in our data.<\/span><\/p>\n<pre class=\"\"># List the unique values of a column using drop_duplicates() (we could also use unique())\r\ndf2 = pd.DataFrame(data, columns=['ZIP'])\r\nu = df2.drop_duplicates(['ZIP'])<\/pre>\n<p><span style=\"font-weight: 400;\">The results are the following ZIP codes, which we will use in the next steps.<\/span><\/p>\n<pre class=\"\"># Get data for cities\r\n# 94107 -&gt; San Francisco\r\n# 94063 -&gt; San Mateo\r\n# 94301 -&gt; Santa Clara\r\n# 94041 -&gt; Mountain View\r\n# 95113 -&gt; San Jose\r\nzipcodes = [94107, 94063, 94301, 94041, 95113]<\/pre>\n<p><span style=\"font-weight: 400;\">After that, we can plot the data, for example the mean temperature in the San Francisco Bay Area, using the\u00a0following\u00a0code:<\/span><\/p>\n<pre class=\"\"># Plot the mean temperature (in Fahrenheit) for each city\r\n\r\nplt.figure()\r\nfor zcode in zipcodes:\r\n  local = data.loc[data['ZIP'] == zcode]\r\n  df1 = pd.DataFrame(local, columns=['meanTemp'])\r\n  # to_numpy() replaces as_matrix(), which was removed in pandas 1.0\r\n  plt.plot(df1.to_numpy(), '-', label=str(zcode))\r\n\r\n# x (tick positions) and labels (tick labels) are not defined in this excerpt and must be set beforehand\r\nplt.xticks(x,labels,rotation='vertical',fontsize=12)\r\nplt.grid(True)\r\nplt.xlabel('Month')\r\nplt.ylabel('Temperature in Fahrenheit scale', fontsize=15)\r\nplt.title('Fahrenheit Mean Temperature on Bay Area Cities',fontsize=20)\r\nplt.legend([\"San Francisco\", \"San Mateo\",\"Santa Clara\", \"Mountain View\",\"San Jose\"])\r\nplt.show()<\/pre>\n<p><a href=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_temperature_plot.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-3055 
size-large\" src=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_temperature_plot-1024x501.png\" alt=\"Pandas linear plot Temperature\" width=\"1024\" height=\"501\" srcset=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_temperature_plot-1024x501.png 1024w, https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_temperature_plot-300x147.png 300w, https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_temperature_plot-768x376.png 768w, https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_temperature_plot.png 2000w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<p>&#8230;and that\u2019s it \ud83d\ude42<\/p>\n<p><strong>Want to launch the project?\u00a0<a href=\"https:\/\/github.com\/Ermlab\/pandas-weather-plots\">Download code from\u00a0GitHub<\/a>\u00a0.<\/strong><\/p>\n<p><span style=\"font-weight: 400;\">If you have any questions about the project, the libraries or this post, please ask the questions in the comments.<\/span><\/p>\n<h3><b>Resources<\/b><\/h3>\n<ul>\n<li style=\"font-weight: 400;\"><a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/stable\/\"><span style=\"font-weight: 400;\">Official Pandas Documentation<\/span><\/a><span style=\"font-weight: 400;\">\u00a0(You can also download it in\u00a0<\/span><a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/stable\/pandas.pdf\"><span style=\"font-weight: 400;\">PDF version<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><\/li>\n<li style=\"font-weight: 400;\"><a href=\"https:\/\/www.packtpub.com\/big-data-and-business-intelligence\/mastering-pandas\"><span style=\"font-weight: 400;\">Femi Anthony &#8220;Mastering Pandas&#8221;<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\"><a href=\"https:\/\/www.amazon.com\/Learning-Pandas-Python-Discovery-Analysis\/dp\/1783985127\"><span style=\"font-weight: 400;\">Michael Heydt &#8220;Learning Pandas&#8221;<\/span><\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Weather data is a great 
type of input when starting to learn tools and technologies for your data science skills. This project will introduce us to the basics of Pandas and Matplotlib Python libraries using data for San Francisco, San Mateo, Santa Clara, Mountain View and San Jose in California.<\/p>\n","protected":false},"author":2,"featured_media":3055,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[113],"tags":[116,114,115],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v15.9.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Weather data visualization for San Francisco Bay Area - a Python Pandas and Matplotlib Tutorial - Ermlab Software<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-weather-data-visualization-tutorial\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Weather data visualization for San Francisco Bay Area - a Python Pandas and Matplotlib Tutorial - Ermlab Software\" \/>\n<meta property=\"og:description\" content=\"Weather data is a great type of input when starting to learn tools and technologies for your data science skills. 
This project will introduce us to the basics of Pandas and Matplotlib Python libraries using data for San Francisco, San Mateo, Santa Clara, Mountain View and San Jose in California.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-weather-data-visualization-tutorial\/\" \/>\n<meta property=\"og:site_name\" content=\"Ermlab Software\" \/>\n<meta property=\"article:author\" content=\"https:\/\/www.facebook.com\/krzysztof.sopyla\" \/>\n<meta property=\"article:published_time\" content=\"2018-08-12T14:04:29+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-09-12T20:52:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_temperature_plot.png\" \/>\n\t<meta property=\"og:image:width\" content=\"2000\" \/>\n\t<meta property=\"og:image:height\" content=\"978\" \/>\n<meta name=\"twitter:card\" content=\"summary\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/twitter.com\/ksopyla\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\">\n\t<meta name=\"twitter:data1\" content=\"3 minutes\">\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/ermlab.com\/#website\",\"url\":\"https:\/\/ermlab.com\/\",\"name\":\"Ermlab Software\",\"description\":\"Data science, aplikacje web i mobilne. 
Projektujemy aplikacje na zam\\u00f3wienie.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/ermlab.com\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-weather-data-visualization-tutorial\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_temperature_plot.png\",\"width\":2000,\"height\":978,\"caption\":\"Pandas linear plot Temperature\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-weather-data-visualization-tutorial\/#webpage\",\"url\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-weather-data-visualization-tutorial\/\",\"name\":\"Weather data visualization for San Francisco Bay Area - a Python Pandas and Matplotlib Tutorial - Ermlab Software\",\"isPartOf\":{\"@id\":\"https:\/\/ermlab.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-weather-data-visualization-tutorial\/#primaryimage\"},\"datePublished\":\"2018-08-12T14:04:29+00:00\",\"dateModified\":\"2018-09-12T20:52:05+00:00\",\"author\":{\"@id\":\"https:\/\/ermlab.com\/#\/schema\/person\/c060870e04525bb2fbf8b4964686ad73\"},\"breadcrumb\":{\"@id\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-weather-data-visualization-tutorial\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-weather-data-visualization-tutorial\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-weather-data-visualization-tutorial\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ermlab.com\/en\/\",\"url\":\"https:\/\/ermlab.com\/en\/\",\"name\":\"Strona 
g\\u0142\\u00f3wna\"}},{\"@type\":\"ListItem\",\"position\":2,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-weather-data-visualization-tutorial\/\",\"url\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-weather-data-visualization-tutorial\/\",\"name\":\"Weather data visualization for San Francisco Bay Area &#8211; a Python Pandas and Matplotlib Tutorial\"}}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/ermlab.com\/#\/schema\/person\/c060870e04525bb2fbf8b4964686ad73\",\"name\":\"Krzysztof Sopy\\u0142a\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/ermlab.com\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/9c872ab9609beb8ec82c3a72b0310974?s=96&r=g\",\"caption\":\"Krzysztof Sopy\\u0142a\"},\"description\":\"Wsp\\u00f3\\u0142za\\u0142o\\u017cyciel firmy i prezes zarz\\u0105du. Pasjonat technologii \\u0142\\u0105cz\\u0105cy wiedz\\u0119 akademick\\u0105 z wieloletni\\u0105 praktyk\\u0105 programisty i architekta. W ci\\u0105gu dnia p\\u0142ywa, je\\u017adzi na rowerze oraz biega.\",\"sameAs\":[\"https:\/\/ksopyla.com\",\"https:\/\/www.facebook.com\/krzysztof.sopyla\",\"https:\/\/twitter.com\/https:\/\/twitter.com\/ksopyla\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","_links":{"self":[{"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/posts\/3054"}],"collection":[{"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/comments?post=3054"}],"version-history":[{"count":2,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/posts\/3054\/revisions"}],"predecessor-version":[{"id":3058,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/posts\/3054\/revisions\/3058"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/media\/3055"}],"wp:attachment":[{"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/media?parent=3054"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/categories?post=3054"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/tags?post=3054"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}