Spatial Panel with Fixed Effects

Last updated on Mar 20, 2024

In this post I want to show some of the estimations I’m working on for my GSoC project on spatial econometrics with panel data in PySAL. Specifically, I will focus on estimating the relationship between homicide rates and two explanatory variables, Resource Deprivation and Population Structure. For the estimation, I will use a panel of 3085 US counties observed over three decades. The panel model will account for fixed effects.

In a second post, I will talk about the estimation with random effects and how to choose the best model.

import numpy as np
import libpysal
import spreg

Data

I’m going to use the NCOVR US County Homicides data (3085 areas). The dependent variable is the homicide rate, and the independent variables are Resource Deprivation (a principal component composed of percent black, log of median family income, Gini index of family income inequality, and more) and Population Structure (a principal component composed of the log of population and the log of population density). The panel covers three decades: 1970, 1980, and 1990.

# Open data on NCOVR US County Homicides (3085 areas).
nat = libpysal.examples.load_example("NCOVR")
db = libpysal.io.open(nat.get_path("NAT.dbf"), "r")

# Create spatial weight matrix
nat_shp = nat.get_path("NAT.shp")
w = libpysal.weights.Queen.from_shapefile(nat_shp)
w.transform = 'r'

# Define dependent variable
name_y = ["HR70", "HR80", "HR90"]
y = np.array([db.by_col(name) for name in name_y]).T

# Define independent variables
name_x = ["RD70", "RD80", "RD90", "PS70", "PS80", "PS90"]
x = np.array([db.by_col(name) for name in name_x]).T
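The arrays built above are in wide format: one row per county, one column per period for y, and one block of T columns per regressor for x. As a rough illustration of that layout, here is a sketch on a made-up toy panel (3 areas, 2 periods, 2 regressors). The values and the long-format stacking are assumptions for illustration only; this is not spreg's internal code.

```python
import numpy as np

# Hypothetical toy data; the column names only mimic the NAT naming pattern.
toy = {
    "HR70": [1.0, 2.0, 3.0], "HR80": [1.5, 2.5, 3.5],
    "RD70": [0.1, 0.2, 0.3], "RD80": [0.4, 0.5, 0.6],
    "PS70": [10.0, 20.0, 30.0], "PS80": [11.0, 21.0, 31.0],
}
name_y = ["HR70", "HR80"]
name_x = ["RD70", "RD80", "PS70", "PS80"]

# Same construction as in the post: list of columns, then transpose.
y_wide = np.array([toy[name] for name in name_y]).T  # shape (N, T)
x_wide = np.array([toy[name] for name in name_x]).T  # shape (N, T * k)

n, t = y_wide.shape
k = x_wide.shape[1] // t

# Long format: all areas for the first period, then all areas for the next.
y_long = y_wide.reshape(-1, 1, order="F")  # shape (N * T, 1)
x_long = np.column_stack(
    [x_wide[:, j * t:(j + 1) * t].reshape(-1, order="F") for j in range(k)]
)  # shape (N * T, k)

print(y_wide.shape, x_wide.shape, y_long.shape, x_long.shape)
```

The point to check in real data is the column order of name_x: all periods of the first regressor must come before all periods of the second.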

Spatial Lag model

Let’s estimate a spatial lag panel model with fixed effects:

$$ y_{it} = \rho \sum_{j=1}^N w_{ij} y_{jt} + x_{it} \beta + \mu_i + e_{it} $$

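where $\mu_i$ is the county fixed effect and $\rho$ is the spatial autoregressive coefficient. Because the weights are row-standardized (w.transform = 'r'), $\sum_{j=1}^N w_{ij} y_{jt}$ is the mean homicide rate of the neighbors of county $i$ in period $t$.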
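To see why the spatial lag is a neighbor average, here is a minimal sketch with a hypothetical set of four areas arranged on a line. The weights matrix and values are made up; only the row-standardization step mirrors what w.transform = 'r' does.

```python
import numpy as np

# Hypothetical binary contiguity for 4 areas on a line: 0 - 1 - 2 - 3
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Row-standardize: every row sums to one.
W_row = W / W.sum(axis=1, keepdims=True)

# Made-up values of the dependent variable in a single period.
y_t = np.array([2.0, 4.0, 6.0, 8.0])

# Spatial lag: for each area, the mean of y over its neighbors.
wy = W_row @ y_t
print(wy)
```

Area 1, for example, has neighbors 0 and 2, so its lag is the mean of their two values.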

fe_lag = spreg.Panel_FE_Lag(y, x, w, name_y=name_y, name_x=name_x, name_ds="NAT")

Warning: Assuming panel is in wide format, i.e. y[:, 0] refers to T0, y[:, 1] refers to T1, etc.
Similarly, assuming x[:, 0:T] refers to T periods of k1, x[:, T+1:2T] refers to k2, etc.

print(fe_lag.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG PANEL - FIXED EFFECTS
-----------------------------------------------------------------------
Data set            :         NAT
Weights matrix      :     unknown
Dependent Variable  :          HR                Number of Observations:        9255
Mean dependent var  :      0.0000                Number of Variables   :           3
S.D. dependent var  :      3.9228                Degrees of Freedom    :        9252
Pseudo R-squared    :      0.0319
Spatial Pseudo R-squared:  0.0079
Sigma-square ML     :      14.935                Log likelihood        :  -67936.533
S.E of regression   :       3.865                Akaike info criterion :  135879.066
                                                 Schwarz criterion     :  135900.465

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
                  RD       0.8005886       0.1614474       4.9588189       0.0000007
                  PS      -2.6003523       0.4935486      -5.2686851       0.0000001
                W_HR       0.1903043       0.0159991      11.8947008       0.0000000
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
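The summary reports a mean of 0.0000 for the dependent variable. A likely explanation (an assumption about the estimator's internals, not something the output states) is that the fixed effects $\mu_i$ are removed by the within transformation, which subtracts each county's own time average. A small sketch on made-up numbers:

```python
import numpy as np

# Hypothetical toy panel in wide format: 3 areas (rows), 3 periods (columns).
y_wide = np.array([
    [1.0, 2.0, 6.0],
    [4.0, 4.0, 4.0],
    [0.0, 3.0, 9.0],
])


def within(a):
    """Subtract each area's own time mean (row mean in wide format)."""
    return a - a.mean(axis=1, keepdims=True)


y_within = within(y_wide)

# Adding an area-specific constant (a stand-in for mu_i) changes nothing
# after the transformation, which is why mu_i drops out of the model.
mu = np.array([[10.0], [-5.0], [2.0]])
same_after_shift = np.allclose(within(y_wide + mu), y_within)
overall_mean = y_within.mean()

print(y_within)
print(same_after_shift, overall_mean)
```

This also means that only variation over time within each county identifies the coefficients.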

All the coefficients are statistically significant. Resource Deprivation has a positive relationship with the homicide rate, while Population Structure has a negative one. Finally, the positive and significant coefficient on W_HR (about 0.19) is evidence of spatial interaction between the homicide rates of neighboring counties.
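One common way to read a spatial lag coefficient, which goes beyond what is estimated in this post, is through the spatial multiplier $(I - \rho W)^{-1}$. With row-standardized weights its rows sum to $1/(1-\rho)$, so a rough total effect of a regressor is $\beta/(1-\rho)$. The sketch below checks the row-sum property on a hypothetical 4-area weights matrix and then applies the formula to the coefficients reported above.

```python
import numpy as np

# Coefficients copied from the spatial lag summary above.
rho = 0.1903043
beta_rd = 0.8005886
beta_ps = -2.6003523

# Hypothetical row-standardized weights for 4 areas on a line.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
W = W / W.sum(axis=1, keepdims=True)

# Spatial multiplier and its row sums.
multiplier = np.linalg.inv(np.eye(4) - rho * W)
row_sums = multiplier.sum(axis=1)

# Rough total effects implied by the reported estimates.
total_rd = beta_rd / (1 - rho)
total_ps = beta_ps / (1 - rho)

print(row_sums, 1 / (1 - rho))
print(total_rd, total_ps)
```

These figures carry no standard errors and should be read only as an order of magnitude.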

Spatial Durbin model

Let’s estimate a spatial Durbin panel model with fixed effects:

$$ y_{it} = \rho \sum_{j=1}^N w_{ij} y_{jt} + x_{it} \beta + \sum_{j=1}^N w_{ij} x_{jt} \theta + \mu_i + e_{it} $$

where $\sum_{j=1}^N w_{ij} x_{jt}$ represents the mean Resource Deprivation and the mean Population Structure of the neighbors of county $i$ in period $t$.
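As a small illustration of this term, the sketch below lags a two-column regressor matrix with a hypothetical row-standardized weights matrix and stacks the result next to the original columns. The data are made up; the resulting column order (original regressors first, then their lags) is what the variable names passed to the estimator have to follow.

```python
import numpy as np

# Hypothetical row-standardized weights for 3 areas on a line: 0 - 1 - 2
W = np.array([
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
])

# Made-up regressors; columns stand in for one period of RD and PS.
x = np.array([
    [1.0, 10.0],
    [2.0, 40.0],
    [6.0, 30.0],
])

# Each column of xlag is the neighbor average of the same column of x.
xlag = W @ x

# Original regressors first, then their spatial lags.
x_durbin = np.hstack((x, xlag))
print(x_durbin)
```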

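# Names of the regressors followed by the names of their spatial lags
name_x_durbin = name_x + ["W_" + name for name in name_x]
# Spatial lag of the regressors: dense weights matrix times x
xlag = w.full()[0] @ x
# Stack the original regressors and their lags column-wise
x_durbin = np.hstack((x, xlag))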
fe_durbin = spreg.Panel_FE_Lag(y, x_durbin, w, name_y=name_y, name_x=name_x_durbin, name_ds="NAT")

Warning: Assuming panel is in wide format, i.e. y[:, 0] refers to T0, y[:, 1] refers to T1, etc.
Similarly, assuming x[:, 0:T] refers to T periods of k1, x[:, T+1:2T] refers to k2, etc.

print(fe_durbin.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG PANEL - FIXED EFFECTS
-----------------------------------------------------------------------
Data set            :         NAT
Weights matrix      :     unknown
Dependent Variable  :          HR                Number of Observations:        9255
Mean dependent var  :      0.0000                Number of Variables   :           5
S.D. dependent var  :      3.9228                Degrees of Freedom    :        9250
Pseudo R-squared    :      0.0332
Spatial Pseudo R-squared:  0.0088
Sigma-square ML     :      14.916                Log likelihood        :  -67931.868
S.E of regression   :       3.862                Akaike info criterion :  135873.736
                                                 Schwarz criterion     :  135909.400

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
                  RD       0.9447922       0.1902488       4.9660866       0.0000007
                  PS      -3.4882306       0.6842330      -5.0980155       0.0000003
                W_RD      -0.5184882       0.2888065      -1.7952792       0.0726092
                W_PS       1.7251554       0.9116001       1.8924476       0.0584314
                W_HR       0.1940036       0.0160269      12.1048752       0.0000000
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
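The z-statistics and probabilities in the table can be recomputed from the coefficients and standard errors: z is the ratio of the two, and the probability is the two-sided tail of a standard normal. A minimal sketch using only the standard library, with the W_RD and W_PS rows as input (the helper name is hypothetical):

```python
import math


def z_and_p(coef, std_err):
    """Return the z-statistic and its two-sided standard normal p-value."""
    z = coef / std_err
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values copied from the Durbin summary above.
z_rd, p_rd = z_and_p(-0.5184882, 0.2888065)  # W_RD
z_ps, p_ps = z_and_p(1.7251554, 0.9116001)   # W_PS

print(z_rd, p_rd)
print(z_ps, p_ps)
```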

The coefficients of the Durbin model are all statistically significant at the 10% level or better, although W_RD and W_PS are not significant at the 5% level. Interestingly, the spatial lags of Resource Deprivation and Population Structure have the opposite sign to the coefficients of the corresponding own-county variables.
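Since the spatial lag model is the Durbin model with $\theta = 0$, the two lagged regressors can also be assessed jointly with a likelihood-ratio test. This test is not part of the estimations in this post; the sketch below is a back-of-the-envelope check that assumes the two reported log likelihoods are directly comparable. For 2 degrees of freedom the chi-square tail probability has the closed form $\exp(-LR/2)$, so no extra library is needed.

```python
import math

# Log likelihoods copied from the two summaries above.
ll_lag = -67936.533
ll_durbin = -67931.868

# Likelihood-ratio statistic for H0: theta = 0 (two restrictions).
lr = 2 * (ll_durbin - ll_lag)
df = 2

# Chi-square survival function; this closed form holds for df = 2 only.
p_value = math.exp(-lr / 2)

print(lr, df, p_value)
```

A formal comparison would still need the diagnostic tests mentioned in the conclusions.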

Spatial Error model

Now, let’s estimate a spatial error panel model with fixed effects:

$$ y_{it} = x_{it} \beta + \mu_i + v_{it} $$

where

$$ v_{it} = \lambda \sum_{j=1}^N w_{ij} v_{jt} + e_{it} $$
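To make the error process concrete: solving the second equation for $v$ in a single period gives $v_t = (I - \lambda W)^{-1} e_t$. The sketch below generates such errors for a hypothetical 4-area weights matrix and checks that they satisfy the equation. The weights and random draws are made up; only the value of $\lambda$ is borrowed from the estimate reported further down.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical row-standardized weights for 4 areas on a line.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
W = W / W.sum(axis=1, keepdims=True)

lam = 0.1943460             # spatial error coefficient, used for illustration
e = rng.normal(size=4)      # independent innovations for one period

# Spatially correlated errors: solve (I - lam * W) v = e for v.
v = np.linalg.solve(np.eye(4) - lam * W, e)

# Check the defining equation v = lam * W v + e.
holds = np.allclose(v, lam * (W @ v) + e)
print(holds)
```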

fe_error = spreg.Panel_FE_Error(y, x, w, name_y=name_y, name_x=name_x, name_ds="NAT")

Warning: Assuming panel is in wide format, i.e. y[:, 0] refers to T0, y[:, 1] refers to T1, etc.
Similarly, assuming x[:, 0:T] refers to T periods of k1, x[:, T+1:2T] refers to k2, etc.

print(fe_error.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL ERROR PANEL - FIXED EFFECTS
-------------------------------------------------------------------------
Data set            :         NAT
Weights matrix      :     unknown
Dependent Variable  :          HR                Number of Observations:        9255
Mean dependent var  :      0.0000                Number of Variables   :           2
S.D. dependent var  :      3.9228                Degrees of Freedom    :        9253
Pseudo R-squared    :      0.0000
Sigma-square ML     :      68.951                Log likelihood        :  -67934.005
S.E of regression   :       8.304                Akaike info criterion :  135872.010
                                                 Schwarz criterion     :  135886.276

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
                  RD       0.8697923       0.3692968       2.3552662       0.0185094
                  PS      -2.9660674       1.1703765      -2.5342849       0.0112677
              lambda       0.1943460       0.0160253      12.1274197       0.0000000
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================

Again, all the coefficients are statistically significant. The point estimates are similar to those of the Spatial Lag model, although the standard errors are more than twice as large.
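The information criteria in the three summaries can be reproduced from the reported log likelihoods with the usual formulas, AIC = -2 logL + 2k and Schwarz = -2 logL + k ln(n), where k is the "Number of Variables" shown in each summary. A minimal sketch (the dictionary layout and function names are hypothetical):

```python
import math

n_obs = 9255

# (log likelihood, number of variables) copied from the summaries above.
models = {
    "lag": (-67936.533, 3),
    "durbin": (-67931.868, 5),
    "error": (-67934.005, 2),
}


def aic(loglik, k):
    """Akaike information criterion."""
    return -2 * loglik + 2 * k


def schwarz(loglik, k, n):
    """Schwarz (Bayesian) information criterion."""
    return -2 * loglik + k * math.log(n)


results = {
    name: (aic(ll, k), schwarz(ll, k, n_obs))
    for name, (ll, k) in models.items()
}

for name, (a, s) in results.items():
    print(f"{name:>6}: AIC = {a:.3f}, Schwarz = {s:.3f}")
```

Note that the error summary counts 2 variables while the lag summary counts 3 (W_HR is included, lambda is not), so the penalties are not on the same footing and the criteria should not be used to rank these models without adjusting for that.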

Conclusions

Three models have been estimated using spatial econometrics in a panel data setting. The next step is to apply diagnostic tests to choose among them. Before that, the models need to be estimated with random effects instead of fixed effects, but I will leave that part for another blog post :)
