Create a Map Showing Population Change by State Using Census Bureau Data

Data Sources

Shape files: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html

Data files: https://www.census.gov/data/datasets/time-series/demo/popest/2020s-national-total.html#v2022

In [2]:
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
In [3]:
data_path = 'US_Pop_2020_2022.csv'
shape_path = 'Shape_Files/cb_2018_us_state_500k.shp'

df = pd.read_csv(data_path)
print(df)
         Geographic_Area        2020        2021        2022
0                Alabama   5,031,362   5,049,846   5,074,296
1                 Alaska     732,923     734,182     733,583
2                Arizona   7,179,943   7,264,877   7,359,197
3               Arkansas   3,014,195   3,028,122   3,045,637
4             California  39,501,653  39,142,991  39,029,342
5               Colorado   5,784,865   5,811,297   5,839,926
6            Connecticut   3,597,362   3,623,355   3,626,205
7               Delaware     992,114   1,004,807   1,018,396
8   District of Columbia     670,868     668,791     671,803
9                Florida  21,589,602  21,828,069  22,244,823
10               Georgia  10,729,828  10,788,029  10,912,876
11                Hawaii   1,451,043   1,447,154   1,440,196
12                 Idaho   1,849,202   1,904,314   1,939,033
13              Illinois  12,786,580  12,686,469  12,582,032
14               Indiana   6,788,799   6,813,532   6,833,037
15                  Iowa   3,190,571   3,197,689   3,200,517
16                Kansas   2,937,919   2,937,922   2,937,150
17              Kentucky   4,507,445   4,506,589   4,512,310
18             Louisiana   4,651,664   4,627,098   4,590,241
19                 Maine   1,363,557   1,377,238   1,385,340
20              Maryland   6,173,205   6,174,610   6,164,660
21         Massachusetts   6,995,729   6,989,690   6,981,974
22              Michigan  10,069,577  10,037,504  10,034,113
23             Minnesota   5,709,852   5,711,471   5,717,184
24           Mississippi   2,958,141   2,949,586   2,940,057
25              Missouri   6,153,998   6,169,823   6,177,957
26               Montana   1,087,075   1,106,227   1,122,867
27              Nebraska   1,962,642   1,963,554   1,967,923
28                Nevada   3,115,648   3,146,402   3,177,772
29         New Hampshire   1,378,587   1,387,505   1,395,231
30            New Jersey   9,271,689   9,267,961   9,261,699
31            New Mexico   2,118,390   2,116,677   2,113,344
32              New York  20,108,296  19,857,492  19,677,151
33        North Carolina  10,449,445  10,565,885  10,698,973
34          North Dakota     779,518     777,934     779,261
35                  Ohio  11,797,517  11,764,342  11,756,058
36              Oklahoma   3,964,912   3,991,225   4,019,800
37                Oregon   4,244,795   4,256,301   4,240,137
38          Pennsylvania  12,994,440  13,012,059  12,972,008
39          Rhode Island   1,096,345   1,096,985   1,093,734
40        South Carolina   5,131,848   5,193,266   5,282,634
41          South Dakota     887,799     896,164     909,824
42             Tennessee   6,925,619   6,968,351   7,051,339
43                 Texas  29,232,474  29,558,864  30,029,572
44                  Utah   3,283,785   3,339,113   3,380,800
45               Vermont     642,893     646,972     647,064
46              Virginia   8,636,471   8,657,365   8,683,619
47            Washington   7,724,031   7,740,745   7,785,786
48         West Virginia   1,791,420   1,785,526   1,775,156
49             Wisconsin   5,896,271   5,880,101   5,892,539
50               Wyoming     577,605     579,483     581,381
In [4]:
print(df.columns)
Index(['Geographic_Area', '2020', '2021', '2022'], dtype='object')
In [5]:
# Only the 2021 to 2022 change is mapped, so the 2020 column is not needed
df = df.drop(columns=['2020'])
In [6]:
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51 entries, 0 to 50
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Geographic_Area  51 non-null     object
 1   2021             51 non-null     object
 2   2022             51 non-null     object
dtypes: object(3)
memory usage: 1.3+ KB
None
In [7]:
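# The populations were read as strings with thousands separators;
# strip the commas and convert to integers
df['2021'] = df['2021'].str.replace(',', '', regex=False).astype('int')
df['2022'] = df['2022'].str.replace(',', '', regex=False).astype('int')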
In [8]:
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51 entries, 0 to 50
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Geographic_Area  51 non-null     object
 1   2021             51 non-null     int32 
 2   2022             51 non-null     int32 
dtypes: int32(2), object(1)
memory usage: 944.0+ bytes
None
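
The comma stripping above could also be handled when the file is read. As a minimal sketch, assuming the CSV stores the populations as quoted strings such as "5,049,846", the `thousands` parameter of `pd.read_csv` parses them as integers directly. The inline `sample_csv` below is a stand-in for US_Pop_2020_2022.csv and reuses three rows from the table above.

```python
import io
import pandas as pd

# Stand-in for US_Pop_2020_2022.csv (assumed layout: quoted, comma-grouped numbers)
sample_csv = io.StringIO(
    'Geographic_Area,2020,2021,2022\n'
    'Alabama,"5,031,362","5,049,846","5,074,296"\n'
    'Alaska,"732,923","734,182","733,583"\n'
    'Arizona,"7,179,943","7,264,877","7,359,197"\n'
)

# thousands=',' tells the parser to treat commas inside numbers as group separators
sample = pd.read_csv(sample_csv, thousands=',')
print(sample.dtypes)
```

With this approach the year columns arrive as integers, so the separate conversion step would not be needed.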

Create a new column for the percent population change

In [9]:
df['Pop_Change'] = ((df['2022'] - df['2021']) / df['2021']) * 100
In [10]:
print(df)
         Geographic_Area      2021      2022  Pop_Change
0                Alabama   5049846   5074296    0.484173
1                 Alaska    734182    733583   -0.081587
2                Arizona   7264877   7359197    1.298301
3               Arkansas   3028122   3045637    0.578411
4             California  39142991  39029342   -0.290343
5               Colorado   5811297   5839926    0.492644
6            Connecticut   3623355   3626205    0.078656
7               Delaware   1004807   1018396    1.352399
8   District of Columbia    668791    671803    0.450365
9                Florida  21828069  22244823    1.909257
10               Georgia  10788029  10912876    1.157273
11                Hawaii   1447154   1440196   -0.480806
12                 Idaho   1904314   1939033    1.823176
13              Illinois  12686469  12582032   -0.823216
14               Indiana   6813532   6833037    0.286269
15                  Iowa   3197689   3200517    0.088439
16                Kansas   2937922   2937150   -0.026277
17              Kentucky   4506589   4512310    0.126947
18             Louisiana   4627098   4590241   -0.796547
19                 Maine   1377238   1385340    0.588279
20              Maryland   6174610   6164660   -0.161144
21         Massachusetts   6989690   6981974   -0.110391
22              Michigan  10037504  10034113   -0.033783
23             Minnesota   5711471   5717184    0.100027
24           Mississippi   2949586   2940057   -0.323062
25              Missouri   6169823   6177957    0.131835
26               Montana   1106227   1122867    1.504212
27              Nebraska   1963554   1967923    0.222505
28                Nevada   3146402   3177772    0.997012
29         New Hampshire   1387505   1395231    0.556827
30            New Jersey   9267961   9261699   -0.067566
31            New Mexico   2116677   2113344   -0.157464
32              New York  19857492  19677151   -0.908176
33        North Carolina  10565885  10698973    1.259601
34          North Dakota    777934    779261    0.170580
35                  Ohio  11764342  11756058   -0.070416
36              Oklahoma   3991225   4019800    0.715946
37                Oregon   4256301   4240137   -0.379766
38          Pennsylvania  13012059  12972008   -0.307799
39          Rhode Island   1096985   1093734   -0.296358
40        South Carolina   5193266   5282634    1.720844
41          South Dakota    896164    909824    1.524275
42             Tennessee   6968351   7051339    1.190927
43                 Texas  29558864  30029572    1.592443
44                  Utah   3339113   3380800    1.248445
45               Vermont    646972    647064    0.014220
46              Virginia   8657365   8683619    0.303256
47            Washington   7740745   7785786    0.581869
48         West Virginia   1785526   1775156   -0.580781
49             Wisconsin   5880101   5892539    0.211527
50               Wyoming    579483    581381    0.327533
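
As a sanity check on the formula, the sketch below recomputes the percent change for three states and ranks them. The small `check` frame is illustrative only; its values are copied from the table above.

```python
import pandas as pd

# Three rows copied from the output above
check = pd.DataFrame({
    'Geographic_Area': ['Florida', 'Idaho', 'New York'],
    '2021': [21828069, 1904314, 19857492],
    '2022': [22244823, 1939033, 19677151],
})

# Percent change = (new - old) / old * 100
check['Pop_Change'] = (check['2022'] - check['2021']) / check['2021'] * 100

# Rank from fastest growing to fastest shrinking
ranked = check.sort_values('Pop_Change', ascending=False).reset_index(drop=True)
print(ranked[['Geographic_Area', 'Pop_Change']].round(2))
```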
In [11]:
shape = gpd.read_file(shape_path)
print(shape.columns)
Index(['STATEFP', 'STATENS', 'AFFGEOID', 'GEOID', 'STUSPS', 'NAME', 'LSAD',
       'ALAND', 'AWATER', 'geometry'],
      dtype='object')
In [12]:
print(shape['NAME'])
0                                      Mississippi
1                                   North Carolina
2                                         Oklahoma
3                                         Virginia
4                                    West Virginia
5                                        Louisiana
6                                         Michigan
7                                    Massachusetts
8                                            Idaho
9                                          Florida
10                                        Nebraska
11                                      Washington
12                                      New Mexico
13                                     Puerto Rico
14                                    South Dakota
15                                           Texas
16                                      California
17                                         Alabama
18                                         Georgia
19                                    Pennsylvania
20                                        Missouri
21                                        Colorado
22                                            Utah
23                                       Tennessee
24                                         Wyoming
25                                        New York
26                                          Kansas
27                                          Alaska
28                                          Nevada
29                                        Illinois
30                                         Vermont
31                                         Montana
32                                            Iowa
33                                  South Carolina
34                                   New Hampshire
35                                         Arizona
36                            District of Columbia
37                                  American Samoa
38                    United States Virgin Islands
39                                      New Jersey
40                                        Maryland
41                                           Maine
42                                          Hawaii
43                                        Delaware
44                                            Guam
45    Commonwealth of the Northern Mariana Islands
46                                    Rhode Island
47                                        Kentucky
48                                            Ohio
49                                       Wisconsin
50                                          Oregon
51                                    North Dakota
52                                        Arkansas
53                                         Indiana
54                                       Minnesota
55                                     Connecticut
Name: NAME, dtype: object

Merge the two dataframes

In [13]:
shape = pd.merge(
    left=shape,
    right=df,
    left_on='NAME',
    right_on='Geographic_Area',
    how='left'
)

print(shape.columns)
Index(['STATEFP', 'STATENS', 'AFFGEOID', 'GEOID', 'STUSPS', 'NAME', 'LSAD',
       'ALAND', 'AWATER', 'geometry', 'Geographic_Area', '2021', '2022',
       'Pop_Change'],
      dtype='object')

Check if there is any missing data

In [14]:
print(shape.info())
<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 56 entries, 0 to 55
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype   
---  ------           --------------  -----   
 0   STATEFP          56 non-null     object  
 1   STATENS          56 non-null     object  
 2   AFFGEOID         56 non-null     object  
 3   GEOID            56 non-null     object  
 4   STUSPS           56 non-null     object  
 5   NAME             56 non-null     object  
 6   LSAD             56 non-null     object  
 7   ALAND            56 non-null     int64   
 8   AWATER           56 non-null     int64   
 9   geometry         56 non-null     geometry
 10  Geographic_Area  51 non-null     object  
 11  2021             51 non-null     float64 
 12  2022             51 non-null     float64 
 13  Pop_Change       51 non-null     float64 
dtypes: float64(3), geometry(1), int64(2), object(8)
memory usage: 6.6+ KB
None
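
The five rows without population data are the territories in the shapefile (Puerto Rico, American Samoa, the U.S. Virgin Islands, Guam, and the Northern Mariana Islands), which the population file does not cover. One way to list unmatched rows explicitly is the `indicator=True` option of `pd.merge`. The sketch below uses two small stand-in frames (`left` and `right` are hypothetical names) rather than the real data.

```python
import pandas as pd

# Stand-ins for the shapefile names and the population table
left = pd.DataFrame({'NAME': ['Texas', 'Puerto Rico', 'Guam']})
right = pd.DataFrame({'Geographic_Area': ['Texas'], 'Pop_Change': [1.592443]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
checked = pd.merge(
    left, right,
    left_on='NAME', right_on='Geographic_Area',
    how='left', indicator=True,
)

# Rows present in the shapefile but absent from the population data
unmatched = checked.loc[checked['_merge'] == 'left_only', 'NAME'].tolist()
print(unmatched)
```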

Remove missing data

In [15]:
# Drop only the rows that have no population data
shape = shape.dropna(subset=['Pop_Change'])

Map the dataframe

In [16]:
ax = shape.boundary.plot()
shape.plot(ax=ax,column='Pop_Change')
plt.show()

To zoom in on the contiguous U.S., I drop the areas that I am not interested in

In [17]:
shape = shape[~shape['NAME'].isin(['Alaska','Hawaii','Puerto Rico'])]
In [18]:
ax = shape.boundary.plot()
shape.plot(ax=ax,column='Pop_Change')
plt.show()

The overall map style needs to be improved

In [19]:
legend_kwds = {'shrink': 0.3, 'orientation': 'horizontal', 'format': '%.1f%%'}
ax = shape.boundary.plot(edgecolor='black', linewidth=0.2, figsize=(10, 5))
shape.plot(ax=ax, column='Pop_Change', legend=True, cmap='RdBu', legend_kwds=legend_kwds)
plt.show()
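
One thing to keep in mind with a diverging colormap such as RdBu: by default the color scale runs from the minimum to the maximum of the data, so the neutral midpoint of the colormap does not fall at 0% here (the values range from about -0.9% to 1.9%). A possible refinement, not part of the original notebook, is to pin the midpoint to zero with matplotlib's `TwoSlopeNorm` and pass it to the plot call through the `norm` keyword, which geopandas should forward to matplotlib.

```python
from matplotlib.colors import TwoSlopeNorm

# Range of Pop_Change taken from the table above
# (New York is the lowest value, Florida the highest)
norm = TwoSlopeNorm(vmin=-0.908176, vcenter=0.0, vmax=1.909257)

# Show where 0% and the two extremes land on the 0..1 colormap scale
print(float(norm(0.0)), float(norm(-0.908176)), float(norm(1.909257)))

# Assumed usage with the notebook's GeoDataFrame:
# shape.plot(ax=ax, column='Pop_Change', cmap='RdBu', norm=norm, legend=True)
```

With the midpoint fixed at zero, states that lost population and states that gained population fall on opposite sides of the neutral color.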

Remove the plot frame and the x and y axes, add a title, and change the matplotlib colormap

In [20]:
legend_kwds = {'shrink': 0.3, 'orientation': 'horizontal', 'format': '%.1f%%'}
ax = shape.boundary.plot(edgecolor='black', linewidth=0.2, figsize=(10, 5))
shape.plot(ax=ax, column='Pop_Change', legend=True, cmap='coolwarm', legend_kwds=legend_kwds)

ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)

for edge in ['right','bottom','left','top']:
    ax.spines[edge].set_visible(False)

ax.set_title('U.S. Population Change 2021 to 2022',size=18,weight='bold')

plt.show()
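
The axis and spine calls above can likely be collapsed into a single `ax.set_axis_off()`, which hides the ticks, the tick labels, and the frame at once while leaving the title in place. A small matplotlib-only sketch, where the empty axes is just a placeholder for the map:

```python
import matplotlib.pyplot as plt

# Placeholder axes; in the notebook this would be the axes returned by shape.boundary.plot(...)
fig, ax = plt.subplots(figsize=(10, 5))

# Hide the x/y axes and all four spines in one call
ax.set_axis_off()

# The title is still drawn after the axes are switched off
ax.set_title('U.S. Population Change 2021 to 2022', size=18, weight='bold')
print(ax.axison)
```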