POOLED REGRESSION

Time-series regression models typically need a sufficient history of data to yield robust results; as a rule of thumb, you need at least 2 years of data to get sensible results. If you have less than 2 years of data but have it for multiple groups, such as stores or similar products, you can still build a "pooled" model by combining time-series observations across the groups.

Pooled Regression is usually carried out on Time-Series Cross-Sectional data: data that has observations over time for several different units or ‘cross-sections’. For example, concatenating Monthly Net Income data for different companies with Quarterly GDP information allows an analyst to model the relationship between Net Income and GDP even with only a few Quarters of data per company, because concatenating across companies increases the number of observations and so yields more degrees of freedom.
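
As a rough illustration of the concatenation step, the sketch below stacks a few companies' figures into one long table and attaches the shared GDP value to each row. All names and numbers are made up, and it assumes Net Income has already been aggregated to quarters so that it lines up with GDP.

```python
# Hypothetical quarterly figures; every number here is invented for illustration.
gdp_by_quarter = {"2023Q1": 100.0, "2023Q2": 101.2, "2023Q3": 102.1, "2023Q4": 103.0}

net_income = {
    "CompanyA": {"2023Q1": 5.0, "2023Q2": 5.4, "2023Q3": 5.9, "2023Q4": 6.1},
    "CompanyB": {"2023Q1": 2.1, "2023Q2": 2.0, "2023Q3": 2.4, "2023Q4": 2.6},
    "CompanyC": {"2023Q1": 9.3, "2023Q2": 9.8, "2023Q3": 10.1, "2023Q4": 10.7},
}


def stack_panel(income_by_company, gdp):
    """Return one row per (company, quarter), attaching the shared GDP value."""
    rows = []
    for company in sorted(income_by_company):
        for quarter in sorted(income_by_company[company]):
            rows.append({
                "company": company,
                "quarter": quarter,
                "net_income": income_by_company[company][quarter],
                "gdp": gdp[quarter],
            })
    return rows


panel = stack_panel(net_income, gdp_by_quarter)
print(len(panel))  # 3 companies x 4 quarters = 12 rows
```

Each company contributes only 4 quarters, but the stacked table has 12 rows available for estimation.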

Another example: suppose you have 70 weeks of sales data from each of 10 different stores. You would not be able to build a regular model for any single store, because none of them has 104 weeks (2 years) of data, but you would be able to build a Pooled Regression model, because pooling the data gives you 10 × 70 = 700 data points instead of 70.

Pooled Regression works much like regular regression, except that an extra intercept or ‘dummy’ is added for each store. It is important to remember that Pooled Regression coefficients do not measure the demand effect separately for each store, but yield an ‘overall’ measure of demand.
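
A minimal sketch of the 10-store, 70-week case, using simulated data rather than real sales. The price variable, the slope of -2.0 and the baseline levels are assumptions chosen only to show the mechanics: one dummy per store plus a single slope shared by all stores.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stores, n_weeks = 10, 70

# Simulated data (an assumption for illustration): each store has its own
# baseline sales level, but all stores share one price slope of -2.0.
store = np.repeat(np.arange(n_stores), n_weeks)        # store id for each row
price = rng.uniform(1.0, 5.0, size=n_stores * n_weeks)
baseline = rng.uniform(50.0, 100.0, size=n_stores)
sales = baseline[store] - 2.0 * price + rng.normal(0.0, 1.0, size=store.size)

# Design matrix: one dummy column per store (the store intercepts) plus price.
dummies = (store[:, None] == np.arange(n_stores)).astype(float)
X = np.column_stack([dummies, price])

coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
store_intercepts, price_slope = coef[:n_stores], coef[-1]

print(X.shape)       # (700, 11): 700 pooled rows, 10 dummies + 1 slope
print(price_slope)   # a single overall slope, not one per store
```

With 700 rows and 11 coefficients there are 689 residual degrees of freedom, compared with 68 if a single store were modeled on its own with an intercept and one slope.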

This technique can also be used with product groups instead of stores, provided the products are similar. In this case it is important to remember that the model doesn’t measure the demand effects of the variables for a specific product; its coefficients are measures of overall cross-product demand.

Pooled Regression is part of the Panel family of Regression models. Below is a non-exhaustive taxonomy of these models.

Panel Regression Models

Pooled Regression: This approach can be used when the groups to be pooled are relatively similar or homogeneous. Level differences can be removed by 'mean-centering' the data within each group, that is, subtracting each group's mean from that group's observations (similar to the Within-Effects Model). The model can then be run directly using Ordinary Least Squares on the concatenated groups. If the model yields large standard errors (small T-Stats), this could be a warning flag that the groups are not all that homogeneous and that a more advanced approach, like the Random Effects Model, may be more appropriate.
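
The mean-centering step can be sketched on a tiny made-up dataset. The two groups below are hypothetical: both follow a slope of exactly 3 but sit at very different levels, so a naive regression on the raw concatenated data is distorted while the centered version is not.

```python
import numpy as np

# Two hypothetical groups: y = 10 + 3x for group 0 and y = 50 + 3x for group 1.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([13.0, 16.0, 19.0, 22.0, 68.0, 71.0, 74.0, 77.0])


def center_by_group(values, group_ids):
    """Subtract each group's own mean from that group's observations."""
    centered = values.astype(float).copy()
    for g in np.unique(group_ids):
        mask = group_ids == g
        centered[mask] -= values[mask].mean()
    return centered


x_c = center_by_group(x, group)
y_c = center_by_group(y, group)

# OLS slope on the concatenated, centered data. No intercept is needed
# because centering makes every group's mean zero.
slope = (x_c @ y_c) / (x_c @ x_c)

# For contrast: OLS on the raw concatenated data, ignoring level differences.
naive_slope = np.polyfit(x, y, 1)[0]

print(slope)        # recovers the common slope of 3
print(naive_slope)  # much larger, because it absorbs the level difference
```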

Fixed Effects Model: Fixed Effects Models measure differences in intercepts for each group, calculated using a separate dummy variable for each group. For this reason the approach is also called the "Least Squares Dummy Variable" method. It is basically an OLS model with dummy variables to control for group differences, assuming constant slopes (coefficients) for the independent variables and constant variance across groups. For SAS users, the appropriate procedures are the TSCSREG Procedure or the PANEL Procedure. The Within-Effects Model avoids the dummies by mean-centering all modeled variables, including the dependent, within each group; this gives the same slopes with far fewer coefficients to estimate, although the group means still use up one degree of freedom per group.
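
To make the link between the dummy-variable and Within-Effects versions concrete, here is a small sketch on simulated data. The group intercepts, the slope of 1.5 and the noise level are all assumptions; the point is only that both routes give the same slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n_groups, n_periods = 5, 30
g = np.repeat(np.arange(n_groups), n_periods)      # group id for each row
x = rng.normal(size=g.size) + g                    # regressor correlated with group
alpha = np.array([2.0, 4.0, 6.0, 8.0, 10.0])       # hypothetical group intercepts
y = alpha[g] + 1.5 * x + rng.normal(scale=0.5, size=g.size)

# Least Squares Dummy Variable: regress y on a full set of group dummies plus x.
D = (g[:, None] == np.arange(n_groups)).astype(float)
lsdv_coef, *_ = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)
lsdv_slope = lsdv_coef[-1]


def demean(v):
    """Subtract each group's mean from its own observations."""
    group_means = np.bincount(g, weights=v) / np.bincount(g)
    return v - group_means[g]


# Within-Effects: mean-center y and x by group, then regress without dummies.
x_w, y_w = demean(x), demean(y)
within_slope = (x_w @ y_w) / (x_w @ x_w)

print(lsdv_slope, within_slope)   # the two slopes agree up to rounding
```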

Random Effects Model: This approach treats group differences as a random component of the error term rather than as separate intercepts. It models the groups together by exploiting the structure of the error variance, assuming a constant intercept and constant slopes. Compared to Fixed Effects Models, Random Effects Models are more complex to estimate. Again, SAS users can estimate these models with the TSCSREG and PANEL Procedures.

Random Parameters (Coefficients) Model: This approach is similar to the Random Effects Model, except that it allows slopes and intercepts to vary across cross-sections or groups, assuming they are normally distributed around a mean. If they are not normally distributed, a Hierarchical Bayes approach can be used to estimate distribution-independent parameters by sampling from the posterior distributions. These models can be estimated using the MIXED or NLMIXED Procedures in SAS.
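
One way to see how the four specifications above differ is to write them side by side. The notation is an assumption introduced here, not taken from any particular package: i indexes the group, t indexes time, x is a single independent variable, and ε is the ordinary error term.

```latex
\begin{align*}
\text{Pooled:} \quad
  & y_{it} = \alpha + \beta x_{it} + \varepsilon_{it} \\
\text{Fixed Effects:} \quad
  & y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it} \\
\text{Random Effects:} \quad
  & y_{it} = \alpha + \beta x_{it} + u_i + \varepsilon_{it},
    \quad \mathrm{E}[u_i] = 0,\ \mathrm{Var}(u_i) = \sigma_u^2 \\
\text{Random Parameters:} \quad
  & y_{it} = \alpha_i + \beta_i x_{it} + \varepsilon_{it},
    \quad (\alpha_i, \beta_i) \sim N(\mu, \Sigma)
\end{align*}
```

Reading down the list, the group-specific part moves from nothing, to a fixed intercept, to a random piece of the error term, to randomly varying intercepts and slopes.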

Panel VAR Model: A traditional VAR (Vector Auto-regression) model is a reduced-form model that estimates a system of equations using non-contemporaneous lags of each dependent variable in the system, creating a Dynamic Model. A Panel VAR model estimates a VAR across multiple Panels or groups by using lags of endogenous and exogenous variables for each group. Panel VAR Analysis cannot be conducted in SAS presently. EViews, a popular Time-Series package, does provide this functionality.
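
As a very simplified sketch of the idea, the code below estimates a one-lag VAR with coefficients common to all groups by stacking each group's lagged observations and running least squares. It uses simulated data with a hypothetical coefficient matrix, and it leaves out group-specific effects and exogenous variables, so it should not be read as what a full Panel VAR routine does.

```python
import numpy as np

rng = np.random.default_rng(2)
A_true = np.array([[0.5, 0.1],
                   [0.2, 0.3]])      # hypothetical, stable VAR(1) coefficients
n_groups, n_periods = 10, 200

lagged, current = [], []
for _ in range(n_groups):
    # Simulate one group's two-variable series: y_t = A y_{t-1} + noise.
    y = np.zeros((n_periods, 2))
    for t in range(1, n_periods):
        y[t] = A_true @ y[t - 1] + rng.normal(size=2)
    # Pair each observation with its own group's previous period only,
    # so no lag ever crosses a group boundary.
    lagged.append(y[:-1])
    current.append(y[1:])

X = np.vstack(lagged)     # stacked y_{t-1}, one row per group-period
Y = np.vstack(current)    # stacked y_t

# Solve Y = X @ B by least squares; in this row layout B is the transpose of A.
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat = B.T

print(np.round(A_hat, 2))  # should be close to A_true
```

Building the lag pairs group by group is the important design choice here: simply shifting the whole stacked table by one row would treat the last period of one group as the lag of the first period of the next.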