{"id":3059,"date":"2018-07-10T14:19:39","date_gmt":"2018-07-10T14:19:39","guid":{"rendered":"https:\/\/ermlab.com\/?p=3059"},"modified":"2018-09-12T20:52:41","modified_gmt":"2018-09-12T20:52:41","slug":"pandas-seaborn-world-bank-gdp-analysis","status":"publish","type":"post","link":"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-seaborn-world-bank-gdp-analysis\/","title":{"rendered":"The World Bank GDP Analysis using Pandas and Seaborn Python libraries"},"content":{"rendered":"<p><a href=\"http:\/\/pandas.pydata.org\/\"><span style=\"font-weight: 400;\">Pandas<\/span><\/a><span style=\"font-weight: 400;\">\u00a0and\u00a0<\/span><a href=\"http:\/\/seaborn.pydata.org\/\"><span style=\"font-weight: 400;\">Seaborn<\/span><\/a><span style=\"font-weight: 400;\">\u00a0are two of the most useful Python libraries for data science. The first provides easy-to-use, high-performance data structures and methods for data manipulation. The second is built on top of matplotlib and provides a high-level interface for drawing attractive statistical graphics. 
How do they work?<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Let\u2019s check it out using World Bank GDP data from 10 European countries &#8211; Poland, Germany, Belarus, the Czech Republic, the Slovak Republic, Hungary, Estonia, France, Ukraine\u00a0and\u00a0the United Kingdom.<\/span><\/p>\n<p><!--more--><\/p>\n<h2><span style=\"font-weight: 400;\">What are we looking for?<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">The question: how far behind developed countries like Germany and France are the countries of eastern Europe in economic development?<\/span><\/p>\n<p>To answer it, we analyze four GDP indicators &#8211; GDP per capita (current US$), GDP per capita growth (annual %), GDP growth (annual %) and GDP (current US$).<\/p>\n<p><span style=\"font-weight: 400;\">The data from the World Bank (from the\u00a0<\/span><a href=\"http:\/\/data.worldbank.org\/products\/wdi\"><span style=\"font-weight: 400;\">World Development Indicators<\/span><\/a><span style=\"font-weight: 400;\">\u00a0website, to be exact) are in an open format and include long historical records of economic and social indicators for many countries.<\/span><\/p>\n<p>We chose the years 1990 &#8211; 2016 because only these were available for the selected indicators.<\/p>\n<p>You can find the data\u00a0<a href=\"http:\/\/ksopyla.com\/wp-content\/uploads\/2016\/12\/GDP_Poland_neighbours.csv\">here<\/a>.<\/p>\n<h2><span style=\"font-weight: 400;\">The code<\/span><\/h2>\n<p>First, we load the data from the CSV file. Then we remove the last 5 rows, because they contain empty values and a note with the date of the last data update. We also have to remove the column for the year 2016 because, as it turned out, it is empty (no data). 
&#8220;<em>gdp.replace<\/em>&#8221; replaces the two-dot string (&#8216;..&#8217;), which marks missing values in the file, with NaN.<\/p>\n<pre class=\"lang:python decode:true\" title=\"Loading and cleaning the data - Pandas\">import numpy as np\r\nimport pandas as pd\r\nimport matplotlib as mpl\r\nimport matplotlib.pyplot as plt\r\n \r\ngdp = pd.read_csv('.\/shared\/WorldBank\/GDP_Poland_neighbours.csv')\r\n \r\n# keep only the data rows; the last 5 rows hold notes, not data\r\ngdp = gdp[0:-5]\r\n# delete the empty 2016 column\r\ndel gdp['2016 [YR2016]']\r\n# replace the '..' placeholder string with NaN values\r\ngdp.replace('..', np.nan, inplace=True)<\/pre>\n<p>Further work with the DataFrame produced mysterious errors, and at first I could not tell what was wrong. After a while I decided to check the types of the individual columns:<\/p>\n<pre class=\"lang:python decode:true\">gdp.dtypes\r\n \r\nCountry Name      object\r\nCountry Code      object\r\nSeries Name       object\r\nSeries Code       object\r\n1990 [YR1990]     object\r\n1991 [YR1991]     object\r\n1992 [YR1992]     object\r\n1993 [YR1993]     object\r\n1994 [YR1994]     object\r\n1995 [YR1995]     object\r\n1996 [YR1996]    float64\r\n1997 [YR1997]    float64\r\n1998 [YR1998]    float64\r\n1999 [YR1999]    float64\r\n2000 [YR2000]    float64\r\n2001 [YR2001]    float64\r\n2002 [YR2002]    float64\r\n2003 [YR2003]    float64\r\n2004 [YR2004]    float64\r\n2005 [YR2005]    float64\r\n2006 [YR2006]    float64\r\n2007 [YR2007]    float64\r\n2008 [YR2008]    float64\r\n2009 [YR2009]    float64\r\n2010 [YR2010]    float64\r\n2011 [YR2011]    float64\r\n2012 [YR2012]    float64\r\n2013 [YR2013]    float64\r\n2014 [YR2014]    float64\r\n2015 [YR2015]    float64\r\ndtype: object<\/pre>\n<p>To my surprise, the columns for 1990 to 1995 had the type object instead of float64, so to be safe I decided to convert all of the year columns to numeric values. 
To do this, I select the columns from index 4 to the end (that is, all of the years) and use the &#8220;<em>apply<\/em>&#8221; method to run &#8220;<em>pd.to_numeric<\/em>&#8221; on each of them. This converts all of the year columns to floating-point numbers.<\/p>\n<pre class=\"lang:python decode:true\" title=\"Loading and cleaning the data - Pandas\"># some of the columns are objects, so we have to convert them to floats;\r\n# then pivot_table will take them into consideration\r\ncol_list = gdp.columns[4:].values\r\ngdp[col_list]=gdp[col_list].apply(pd.to_numeric)<\/pre>\n<p>Each row held the name of the country, its code, the name of the World Bank data series, the series code, and then one column per year. This arrangement was not very convenient, so I decided to reindex the table using the &#8220;<em>pivot_table<\/em>&#8221; function.<\/p>\n<pre class=\"lang:default decode:true\" title=\"Loading and cleaning the data - Pandas\"># reindex the whole table, create a pivot view\r\npv2 = pd.pivot_table(gdp,index=['Series Name','Country Code'], dropna=False, fill_value=0.0)\r\n# label the columns with the years 1990 to 2015\r\npv2.columns= np.arange(1990,2016)<\/pre>\n<p>This changed the DataFrame from this form:<\/p>\n<p><img loading=\"lazy\" class=\"alignnone wp-image-60 size-full\" src=\"https:\/\/blog.plon.io\/wp-content\/uploads\/2017\/03\/wordbank_dataframe.png\" alt=\"worldbank pandas dataframe\" width=\"706\" height=\"349\" \/><\/p>\n<p>to this one:<\/p>\n<p><img loading=\"lazy\" class=\"alignnone wp-image-59 size-full\" src=\"https:\/\/blog.plon.io\/wp-content\/uploads\/2017\/03\/wordbank_pivot.png\" alt=\"worldbank pivoted table in pandas \" width=\"572\" height=\"603\" \/><\/p>\n<p>This way I can pull out any economic indicator and immediately get all the countries together with all the years.<\/p>\n<p>Now I can easily visualize the 4 selected indicators. 
For nicer graphs, import Seaborn and set the color palette so that each line on the graph is plotted in a different color. Try comparing the charts with and without Seaborn.<\/p>\n<p>Plotting directly with pandas is really simple &#8211; select the indicator of interest from the pivot table, transpose the data (&#8220;<em>.T<\/em>&#8221;) and plot it (&#8220;<em>.plot<\/em>&#8221;).<\/p>\n<pre class=\"lang:python decode:true\">import seaborn as sns\r\npalette = sns.color_palette(\"Paired\", 10)\r\nsns.set_palette(palette)\r\n \r\npv2.loc['GDP (current US$)'].T.plot(alpha=0.75, rot=45)\r\npv2.loc['GDP per capita (current US$)'].T.plot(alpha=0.75, rot=45)\r\npv2.loc['GDP per capita growth (annual %)'].T.plot(alpha=0.75, rot=45)\r\npv2.loc['GDP growth (annual %)'].T.plot(alpha=0.75, rot=45)<\/pre>\n<p>The first two charts:<\/p>\n<ol>\n<li>GDP (current US$), data from the World Bank<\/li>\n<\/ol>\n<p><a href=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/gdp_europe_wordbank_pandas.png\"><img loading=\"lazy\" class=\"wp-image-3063 size-full aligncenter\" src=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/gdp_europe_wordbank_pandas.png\" alt=\"\" width=\"689\" height=\"506\" srcset=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/gdp_europe_wordbank_pandas.png 689w, https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/gdp_europe_wordbank_pandas-300x220.png 300w\" sizes=\"(max-width: 689px) 100vw, 689px\" \/><\/a><\/p>\n<p>2.\u00a0GDP per capita, data from the World Bank<\/p>\n<p><a href=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/gdp_capita_europe_wordbank_pandas.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-3061 size-full\" src=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/gdp_capita_europe_wordbank_pandas.png\" alt=\"\" width=\"710\" height=\"497\" srcset=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/gdp_capita_europe_wordbank_pandas.png 710w, 
https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/gdp_capita_europe_wordbank_pandas-300x210.png 300w\" sizes=\"(max-width: 710px) 100vw, 710px\" \/><\/a><\/p>\n<p>Let&#8217;s fit a simple linear regression to the GDP data to see whether there is a chance that one day we can catch up with Germany. This time we will use the &#8220;lmplot&#8221; function from the Seaborn library, which requires the data to be reshaped into a long, time-series form first.<\/p>\n<p>Starting from a table with countries as columns, we need to create a table with only three columns: [year, country, GDP]. We do this in a few steps: we reset the index, because after transposing the table is indexed by year (one unique row per year), and we rename the resulting column. The key operation here is the &#8220;melt&#8221; function, which takes the values from the country columns and stacks them into rows. This gives us the following transformation. The attached images omit some of the columns and rows, but I hope it&#8217;s clear.<\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-64\" src=\"https:\/\/blog.plon.io\/wp-content\/uploads\/2017\/03\/wordbank_pandas_pivot-1.png\" alt=\"\" width=\"274\" height=\"318\" \/>\u00a0<img loading=\"lazy\" class=\"alignnone size-full wp-image-63\" src=\"https:\/\/blog.plon.io\/wp-content\/uploads\/2017\/03\/wordbank_pandas_melt.png\" alt=\"\" width=\"254\" height=\"330\" \/><\/p>\n<pre class=\"lang:python decode:true \" title=\"A simple regression of the GDP\">#seaborn plots\r\nplot_data = pv2.loc['GDP (current US$)'].T.reset_index()\r\nplot_data.rename(columns={'index':'Years'}, inplace=True)\r\n# unpivot the data: change from the table view, where we have a column for each \r\n# country, to one long time series, [year, country code, value]\r\nmelt_data = pd.melt(plot_data, id_vars=['Years'],var_name='Country')\r\nmelt_data.rename(columns={'value':'GDP'}, inplace=True)\r\nsns.lmplot(x=\"Years\", 
y=\"GDP\", hue=\"Country\", data=melt_data, palette=\"Set1\");<\/pre>\n<p>We should get a result similar to this:<\/p>\n<p><a href=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_seaborn_lmplot_wordbank_gdp.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-3065 size-full\" src=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_seaborn_lmplot_wordbank_gdp.png\" alt=\"\" width=\"566\" height=\"485\" srcset=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_seaborn_lmplot_wordbank_gdp.png 566w, https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_seaborn_lmplot_wordbank_gdp-300x257.png 300w\" sizes=\"(max-width: 566px) 100vw, 566px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Important links<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><a href=\"https:\/\/github.com\/ksopyla\/Pandas_Wordbank_GDP\">Download the project from GitHub<\/a><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">A great\u00a0<\/span><a href=\"http:\/\/pandas.pydata.org\/pandas-docs\/stable\/10min.html\"><span style=\"font-weight: 400;\">10 minutes to pandas<\/span><\/a><span style=\"font-weight: 400;\">\u00a0tutorial<\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Pandas\u00a0and\u00a0Seaborn\u00a0are two of the most useful Python libraries for data science. The first provides easy-to-use, high-performance data structures and methods for data manipulation. The second is built on top of matplotlib and provides a high-level interface for drawing attractive statistical graphics. How do they work? 
Let\u2019s check it out using World Bank GDP [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":3065,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[113],"tags":[116,114,117],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v15.9.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>The World Bank GDP Analysis using Pandas and Seaborn Python libraries - Ermlab Software<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-seaborn-world-bank-gdp-analysis\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"The World Bank GDP Analysis using Pandas and Seaborn Python libraries - Ermlab Software\" \/>\n<meta property=\"og:description\" content=\"Pandas\u00a0and\u00a0Seaborn\u00a0are two of the most useful Python libraries for data science. The first provides easy-to-use, high-performance data structures and methods for data manipulation. The second is built on top of matplotlib and provides a high-level interface for drawing attractive statistical graphics. How do they work? 
Let\u2019s check it out using World Bank GDP [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-seaborn-world-bank-gdp-analysis\/\" \/>\n<meta property=\"og:site_name\" content=\"Ermlab Software\" \/>\n<meta property=\"article:author\" content=\"https:\/\/www.facebook.com\/krzysztof.sopyla\" \/>\n<meta property=\"article:published_time\" content=\"2018-07-10T14:19:39+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-09-12T20:52:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_seaborn_lmplot_wordbank_gdp.png\" \/>\n\t<meta property=\"og:image:width\" content=\"566\" \/>\n\t<meta property=\"og:image:height\" content=\"485\" \/>\n<meta name=\"twitter:card\" content=\"summary\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/twitter.com\/ksopyla\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\">\n\t<meta name=\"twitter:data1\" content=\"5 minutes\">\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/ermlab.com\/#website\",\"url\":\"https:\/\/ermlab.com\/\",\"name\":\"Ermlab Software\",\"description\":\"Data science, aplikacje web i mobilne. 
Projektujemy aplikacje na zam\\u00f3wienie.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/ermlab.com\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-seaborn-world-bank-gdp-analysis\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/ermlab.com\/wp-content\/uploads\/2018\/09\/pandas_seaborn_lmplot_wordbank_gdp.png\",\"width\":566,\"height\":485},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-seaborn-world-bank-gdp-analysis\/#webpage\",\"url\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-seaborn-world-bank-gdp-analysis\/\",\"name\":\"The World Bank GDP Analysis using Pandas and Seaborn Python libraries - Ermlab Software\",\"isPartOf\":{\"@id\":\"https:\/\/ermlab.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-seaborn-world-bank-gdp-analysis\/#primaryimage\"},\"datePublished\":\"2018-07-10T14:19:39+00:00\",\"dateModified\":\"2018-09-12T20:52:41+00:00\",\"author\":{\"@id\":\"https:\/\/ermlab.com\/#\/schema\/person\/c060870e04525bb2fbf8b4964686ad73\"},\"breadcrumb\":{\"@id\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-seaborn-world-bank-gdp-analysis\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-seaborn-world-bank-gdp-analysis\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-seaborn-world-bank-gdp-analysis\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ermlab.com\/en\/\",\"url\":\"https:\/\/ermlab.com\/en\/\",\"name\":\"Strona 
g\\u0142\\u00f3wna\"}},{\"@type\":\"ListItem\",\"position\":2,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-seaborn-world-bank-gdp-analysis\/\",\"url\":\"https:\/\/ermlab.com\/en\/blog\/data-science\/pandas-seaborn-world-bank-gdp-analysis\/\",\"name\":\"The World Bank GDP Analysis using Pandas and Seaborn Python libraries\"}}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/ermlab.com\/#\/schema\/person\/c060870e04525bb2fbf8b4964686ad73\",\"name\":\"Krzysztof Sopy\\u0142a\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/ermlab.com\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/9c872ab9609beb8ec82c3a72b0310974?s=96&r=g\",\"caption\":\"Krzysztof Sopy\\u0142a\"},\"description\":\"Wsp\\u00f3\\u0142za\\u0142o\\u017cyciel firmy i prezes zarz\\u0105du. Pasjonat technologii \\u0142\\u0105cz\\u0105cy wiedz\\u0119 akademick\\u0105 z wieloletni\\u0105 praktyk\\u0105 programisty i architekta. W ci\\u0105gu dnia p\\u0142ywa, je\\u017adzi na rowerze oraz biega.\",\"sameAs\":[\"https:\/\/ksopyla.com\",\"https:\/\/www.facebook.com\/krzysztof.sopyla\",\"https:\/\/twitter.com\/https:\/\/twitter.com\/ksopyla\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","_links":{"self":[{"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/posts\/3059"}],"collection":[{"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/comments?post=3059"}],"version-history":[{"count":3,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/posts\/3059\/revisions"}],"predecessor-version":[{"id":3068,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/posts\/3059\/revisions\/3068"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/media\/3065"}],"wp:attachment":[{"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/media?parent=3059"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/categories?post=3059"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ermlab.com\/en\/wp-json\/wp\/v2\/tags?post=3059"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}