Spatial Panel with Fixed Effects
In this post I want to show some of the estimations I’m working on for my GSoC project on spatial econometrics with panel data in PySAL. Specifically, I will estimate the relationship between homicide rates and two variables: resource deprivation and population structure. For the estimation, I will use a panel of 3085 US counties over three decades. The panel model will account for fixed effects.
In a second post, I will talk about the estimation with random effects and how to choose the best model.
import numpy as np
import libpysal
import spreg
Data
I’m going to use the NCOVR US County Homicides data (3085 areas). The dependent variable is the homicide rate, and the independent variables are resource deprivation (a principal component composed of percent black, log of median family income, Gini index of family income inequality, and more) and population structure (a principal component composed of the log of population and the log of population density). The time period covers three decades: 1970, 1980, and 1990.
# Open data on NCOVR US County Homicides (3085 areas).
nat = libpysal.examples.load_example("NCOVR")
db = libpysal.io.open(nat.get_path("NAT.dbf"), "r")
# Create spatial weight matrix
nat_shp = nat.get_path("NAT.shp")
w = libpysal.weights.Queen.from_shapefile(nat_shp)
w.transform = 'r'
# Define dependent variable
name_y = ["HR70", "HR80", "HR90"]
y = np.array([db.by_col(name) for name in name_y]).T
# Define independent variables
name_x = ["RD70", "RD80", "RD90", "PS70", "PS80", "PS90"]
x = np.array([db.by_col(name) for name in name_x]).T
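The arrays built above are in wide format: y has one column per decade, and x holds the three RD columns followed by the three PS columns. As a minimal sketch of that layout, the snippet below reshapes a toy wide panel into one column per regressor. The numbers and variable names are made up for illustration, and the row ordering chosen here (all areas at the first period, then the second, and so on) is an assumption, not necessarily what spreg does internally.

```python
import numpy as np

n, t, k = 4, 3, 2  # toy sizes: 4 areas, 3 periods, 2 regressors

# Wide layout: columns are [k1_T0, k1_T1, k1_T2, k2_T0, k2_T1, k2_T2]
x_wide = np.arange(n * t * k, dtype=float).reshape(n, k * t)

# Split into one (n, t) block per regressor
blocks = [x_wide[:, j * t:(j + 1) * t] for j in range(k)]

# Stack the periods on top of each other: all areas at T0, then T1, then T2
x_long = np.column_stack([b.T.reshape(-1) for b in blocks])

print(x_long.shape)  # (12, 2)
```

With the real data, n would be 3085, t would be 3, and k would be 2, so the long array would have 9255 rows, which matches the number of observations reported by the models.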
Spatial Lag model
Let’s estimate a spatial lag panel model with fixed effects:
$$ y_{it} = \rho \sum_{j=1}^N w_{ij} y_{jt} + x_{it} \beta + \mu_i + e_{it} $$
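Where $\sum_{j=1}^N w_{ij} y_{jt}$ is the spatial lag of the homicide rate. Because the weights are row-standardized (w.transform = 'r' above), it equals the mean homicide rate of the neighbors of county $i$. The term $\mu_i$ is the county fixed effect.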
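To see why the spatial lag is a neighbor average, here is a small sketch with four hypothetical areas arranged on a line. The weights and rates are invented for illustration. Row-standardizing a binary contiguity matrix, which is the same idea as w.transform = 'r', makes each row sum to one, so multiplying by the vector of rates averages over each area's neighbors.

```python
import numpy as np

# Toy contiguity for 4 areas on a line: 0-1-2-3 (hypothetical, not NCOVR)
w_binary = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Row-standardize so that every row sums to one
w_row = w_binary / w_binary.sum(axis=1, keepdims=True)

y_t = np.array([2.0, 4.0, 6.0, 8.0])  # made-up rates for one period

# Spatial lag: for each area, the mean rate of its neighbors
wy = w_row @ y_t
print(wy)  # [4. 4. 6. 6.]
```

Area 1 has neighbors 0 and 2, so its lag is (2 + 6) / 2 = 4. Area 0 has a single neighbor, so its lag is just that neighbor's rate.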
fe_lag = spreg.Panel_FE_Lag(y, x, w, name_y=name_y, name_x=name_x, name_ds="NAT")
Warning: Assuming panel is in wide format, i.e. y[:, 0] refers to T0, y[:, 1] refers to T1, etc.
Similarly, assuming x[:, 0:T] refers to T periods of k1, x[:, T+1:2T] refers to k2, etc.
print(fe_lag.summary)
REGRESSION
----------
SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG PANEL - FIXED EFFECTS
-----------------------------------------------------------------------
Data set : NAT
Weights matrix : unknown
Dependent Variable : HR Number of Observations: 9255
Mean dependent var : 0.0000 Number of Variables : 3
S.D. dependent var : 3.9228 Degrees of Freedom : 9252
Pseudo R-squared : 0.0319
Spatial Pseudo R-squared: 0.0079
Sigma-square ML : 14.935 Log likelihood : -67936.533
S.E of regression : 3.865 Akaike info criterion : 135879.066
Schwarz criterion : 135900.465
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
RD 0.8005886 0.1614474 4.9588189 0.0000007
PS -2.6003523 0.4935486 -5.2686851 0.0000001
W_HR 0.1903043 0.0159991 11.8947008 0.0000000
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
All the coefficients are statistically significant. Resource deprivation has a positive relationship with the homicide rate, while population structure has a negative one. Finally, the positive and significant coefficient on W_HR is evidence of spatial interaction between the homicide rates of neighboring counties.
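The significance statements can be checked by hand from the table. As a rough sketch, assuming the reported probability is a two-sided p-value under a standard normal distribution (which the "z-Statistic" label suggests), the z-statistic is the coefficient divided by its standard error:

```python
import math

# Values copied from the W_HR row of the spatial lag output
coef, std_err = 0.1903043, 0.0159991

z = coef / std_err

# Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
p_value = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 4), p_value)
```

The computed z should agree with the 11.8947 shown in the report up to rounding of the printed inputs, and the p-value is far below any conventional threshold.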
Spatial Durbin model
Let’s estimate a spatial Durbin panel model with fixed effects:
$$ y_{it} = \rho \sum_{j=1}^N w_{ij} y_{jt} + \sum_{j=1}^N w_{ij} x_{jt} \theta + x_{it} \beta + \mu_i + e_{it} $$
Where $\sum_{j=1}^N w_{ij} x_{jt}$ represents the mean resource deprivation and mean population structure of the neighbors of county $i$.
name_x_durbin = ["RD70", "RD80", "RD90", "PS70", "PS80", "PS90", "W_RD70", "W_RD80", "W_RD90", "W_PS70", "W_PS80", "W_PS90"]
xlag = w.full()[0] @ x
x_durbin = np.hstack((x, xlag))
fe_durbin = spreg.Panel_FE_Lag(y, x_durbin, w, name_y=name_y, name_x=name_x_durbin, name_ds="NAT")
Warning: Assuming panel is in wide format, i.e. y[:, 0] refers to T0, y[:, 1] refers to T1, etc.
Similarly, assuming x[:, 0:T] refers to T periods of k1, x[:, T+1:2T] refers to k2, etc.
print(fe_durbin.summary)
REGRESSION
----------
SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG PANEL - FIXED EFFECTS
-----------------------------------------------------------------------
Data set : NAT
Weights matrix : unknown
Dependent Variable : HR Number of Observations: 9255
Mean dependent var : 0.0000 Number of Variables : 5
S.D. dependent var : 3.9228 Degrees of Freedom : 9250
Pseudo R-squared : 0.0332
Spatial Pseudo R-squared: 0.0088
Sigma-square ML : 14.916 Log likelihood : -67931.868
S.E of regression : 3.862 Akaike info criterion : 135873.736
Schwarz criterion : 135909.400
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
RD 0.9447922 0.1902488 4.9660866 0.0000007
PS -3.4882306 0.6842330 -5.0980155 0.0000003
W_RD -0.5184882 0.2888065 -1.7952792 0.0726092
W_PS 1.7251554 0.9116001 1.8924476 0.0584314
W_HR 0.1940036 0.0160269 12.1048752 0.0000000
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
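The own-county coefficients (RD, PS) and the spatial lag of the homicide rate (W_HR) are significant at the 1% level, while the spatially lagged regressors (W_RD, W_PS) are significant only at the 10% level. Interestingly, the spatially lagged regressors have the opposite sign to their own-county counterparts: W_RD is negative while RD is positive, and W_PS is positive while PS is negative.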
Spatial Error model
Now, let’s estimate a spatial error panel model with fixed effects:
$$ y_{it} = x_{it} \beta + \mu_i + v_{it} $$
where
$$ v_{it} = \lambda \sum_{j=1}^N w_{ij} v_{jt} + e_{it} $$
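Solving the second equation for the errors gives $v_t = (I - \lambda W)^{-1} e_t$ for each period, so a shock in one county spills over to its neighbors. The following is a minimal sketch of that process on a toy row-standardized matrix. The weights, the value of lambda, and the random innovations are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy row-standardized weights for 4 areas on a line (hypothetical)
w_row = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
lam = 0.2  # illustrative spatial error parameter

e = rng.normal(size=4)  # independent innovations for one period

# v = lam * W v + e  is equivalent to  (I - lam * W) v = e
v = np.linalg.solve(np.eye(4) - lam * w_row, e)

# Confirm that v satisfies the spatial error equation
print(np.allclose(v, lam * w_row @ v + e))  # True
```

Using np.linalg.solve avoids forming the inverse explicitly, which matters more with thousands of areas than with four.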
fe_error = spreg.Panel_FE_Error(y, x, w, name_y=name_y, name_x=name_x, name_ds="NAT")
Warning: Assuming panel is in wide format, i.e. y[:, 0] refers to T0, y[:, 1] refers to T1, etc.
Similarly, assuming x[:, 0:T] refers to T periods of k1, x[:, T+1:2T] refers to k2, etc.
print(fe_error.summary)
REGRESSION
----------
SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL ERROR PANEL - FIXED EFFECTS
-------------------------------------------------------------------------
Data set : NAT
Weights matrix : unknown
Dependent Variable : HR Number of Observations: 9255
Mean dependent var : 0.0000 Number of Variables : 2
S.D. dependent var : 3.9228 Degrees of Freedom : 9253
Pseudo R-squared : 0.0000
Sigma-square ML : 68.951 Log likelihood : -67934.005
S.E of regression : 8.304 Akaike info criterion : 135872.010
Schwarz criterion : 135886.276
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
RD 0.8697923 0.3692968 2.3552662 0.0185094
PS -2.9660674 1.1703765 -2.5342849 0.0112677
lambda 0.1943460 0.0160253 12.1274197 0.0000000
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
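Again, all the coefficients are statistically significant, here at the 5% level. The coefficients on RD and PS are close to the ones estimated in the spatial lag model, and the spatial error parameter lambda (0.194) is close to the spatial lag coefficient on W_HR (0.190).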
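The information criteria in the three reports can be reproduced from the printed log likelihood, the number of variables, and the number of observations. This is a small sketch using the usual formulas, AIC = 2k - 2 logL and Schwarz = k ln(n) - 2 logL, with k taken exactly as printed in each report. Note that the error report counts 2 variables while the lag report counts 3, so comparisons across reports should be read with that in mind.

```python
import math

n_obs = 9255

# (log likelihood, number of variables) as printed in the three reports
models = {
    "lag": (-67936.533, 3),
    "durbin": (-67931.868, 5),
    "error": (-67934.005, 2),
}

criteria = {}
for name, (loglik, k) in models.items():
    aic = 2 * k - 2 * loglik
    schwarz = k * math.log(n_obs) - 2 * loglik
    criteria[name] = (aic, schwarz)
    print(f"{name:7s} AIC={aic:.3f}  Schwarz={schwarz:.3f}")
```

The values match the reported criteria up to rounding. This is only a consistency check on the printed output, not a substitute for formal model selection.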
Conclusions
Three spatial panel models with fixed effects have been estimated. The next step is to apply diagnostic tests to choose among them. Before that, the models need to be estimated with random effects instead of fixed effects, but I prefer to leave that part for another blog post :)