# Multiple Regression

Multiple regression analyzes the relationship between one dependent variable and several independent variables (called predictors). The regression equation takes the form

*Y* = *b*_{0} + *b*_{1}*x*_{1} + *b*_{2}*x*_{2} + … + *b*_{p}*x*_{p} + *e*

where *Y* is the dependent variable, the *b*'s are the regression coefficients for the corresponding *x* (independent) terms, *b*_{0} is a constant or intercept, and *e* is the error term reflected in the residuals. The parameters of the regression equation are estimated using the ordinary least squares (OLS) method.
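As a concrete illustration, the sketch below fits such an equation with NumPy's least-squares solver. The data, coefficient values, and variable names are fabricated for the example, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two predictors and a known linear relationship plus noise.
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
e = rng.normal(scale=0.5, size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + e   # true b0 = 1, b1 = 2, b2 = -3

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares: minimizes the sum of squared residuals.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated coefficients (b0, b1, b2):", b)
```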

**Ordinary least squares**: This method derives its name from the criterion used to draw the best-fit regression line: the line such that the sum of the squared deviations of all the points from the line is minimized.
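To make the criterion concrete, the snippet below (continuing the previous sketch, so `X`, `y`, and `b` are already defined) compares the sum of squared deviations at the OLS solution with that at a perturbed coefficient vector; any other coefficients give a larger sum.

```python
def sum_sq_dev(coef):
    """Sum of squared deviations of the points from the fitted plane."""
    resid = y - X @ coef
    return float(resid @ resid)

print(sum_sq_dev(b))        # minimized at the OLS estimate
print(sum_sq_dev(b + 0.1))  # strictly larger away from the OLS estimate
```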
**Intercept**: The intercept, *b*_{0}, is where the regression plane intersects the *Y*-axis. It equals the estimated *Y* value when all the independent variables have a value of 0.

**Regression coefficient**: The regression coefficients *b*_{i} are the slopes of the regression plane in the direction of *x*_{i}. Each regression coefficient represents the net effect the *i*^{th} variable has on the dependent variable, holding the remaining *x*'s in the equation constant.

**Beta weights** are the regression coefficients for standardized data. Beta is the average amount by which the dependent variable increases when the independent variable increases by one standard deviation and the other independent variables are held constant. The ratio of the beta weights is the ratio of the predictive importance of the independent variables (see the sketch below).

**Standardized** means that for each datum the mean is subtracted and the result is divided by the standard deviation. As a result, all variables have a mean of 0 and a standard deviation of 1.
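One way to obtain beta weights, continuing the sketch above, is to standardize every variable and refit; the coefficients of the standardized fit are the beta weights (and the intercept becomes 0). They also equal *b*_{i} scaled by the ratio of standard deviations, as the check shows.

```python
def z(v):
    """Standardize: subtract the mean, divide by the standard deviation."""
    return (v - v.mean()) / v.std()

# Refit on standardized data; no intercept column is needed since all means are 0.
Zx = np.column_stack([z(x1), z(x2)])
beta, *_ = np.linalg.lstsq(Zx, z(y), rcond=None)
print("beta weights:", beta)

# Equivalently, beta_i = b_i * sd(x_i) / sd(y).
print("check:", b[1] * x1.std() / y.std(), b[2] * x2.std() / y.std())
```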
**Residuals** are the differences between the observed values and those predicted by the regression equation.

**Dummy variables**: Nominal and ordinal categories can be transformed into sets of dichotomies, called dummy variables. To prevent perfect multicollinearity, one category must be left out (a dummy-coding sketch follows these definitions).

**Interpretation of *b* for dummy variables**: For *b* coefficients of dummy variables that have been *binary coded* (the usual 1 = present, 0 = not present), *b* is interpreted relative to the *reference category* (the category left out).

**Multiple *R***: The correlation coefficient between the observed and predicted values. It ranges in value from 0 to 1; a small value indicates a weak relationship between the observed and predicted values.

**Multiple *R*^{2}** is the percent of the variance in the dependent variable explained by the independent variables. It is also called the coefficient of multiple determination. Mathematically, *R*^{2} = 1 − (SSE/SST), where

**SSE** = error sum of squares = Σ(*Y*_{i} − Est *Y*_{i})^{2}, where *Y*_{i} is the actual value of *Y* for the *i*^{th} case and Est *Y*_{i} is the regression prediction for the *i*^{th} case.

**SST** = total sum of squares = Σ(*Y*_{i} − Mean *Y*)^{2}.

**Adjusted R-Square**: When there are a large number of independent variables, *R*^{2} may become artificially large, simply because some independent variables' chance variations "explain" small parts of the variance of the dependent variable. It is therefore essential to adjust the value of *R*^{2} as the number of independent variables increases. With a few independent variables, *R*^{2} and adjusted *R*^{2} will be close; with many, adjusted *R*^{2} may be noticeably lower. A sketch below computes both quantities.

**Multicollinearity** is the intercorrelation of the independent variables. *r*^{2} values near 1 violate the assumption of no perfect collinearity, while high *r*^{2} values increase the standard error of the regression coefficients and make assessment of the unique role of each independent variable difficult or impossible. While simple correlations tell something about multicollinearity, the preferred method of assessing it is to compute the determinant of the correlation matrix: determinants near zero indicate that some or all independent variables are highly correlated (see the determinant sketch below).

**Partial correlation** is the correlation of two variables while controlling for a third or more other variables. For example, *r*_{12.34} is the correlation of variables 1 and 2, controlling for variables 3 and 4. If the partial correlation *r*_{12.34} equals the uncontrolled correlation *r*_{12}, the control variables have no effect; if the partial correlation is near 0, the original correlation is spurious. The final sketch below illustrates both checks.
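A minimal sketch of dummy coding, assuming a hypothetical three-level nominal variable `region` (not from the text): pandas builds the set of dichotomies and drops the reference category in one call.

```python
import pandas as pd

# Hypothetical nominal variable with three categories.
region = pd.Series(["north", "south", "west", "north", "west"], name="region")

# One dichotomy per category, dropping the first level ("north") as the
# reference category to prevent perfect multicollinearity.
dummies = pd.get_dummies(region, drop_first=True)
print(dummies)
# A b coefficient estimated for "south" or "west" is then read as the
# difference from the reference category, "north".
```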
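Continuing the NumPy sketch above (reusing `X`, `y`, `b`, and `n`), SSE, SST, *R*^{2}, and adjusted *R*^{2} can be computed directly. The adjustment uses the common (n − 1)/(n − p − 1) correction, an assumption here since the text does not state a formula.

```python
pred = X @ b                              # Est Y_i for each case
sse = float(np.sum((y - pred) ** 2))      # error sum of squares
sst = float(np.sum((y - y.mean()) ** 2))  # total sum of squares

r2 = 1 - sse / sst
p = X.shape[1] - 1                        # number of predictors, intercept excluded
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print("R^2:", r2, "adjusted R^2:", adj_r2)
```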
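The determinant check for multicollinearity can be sketched as follows; the predictors are fabricated, with a deliberately near-collinear third one so the determinant collapses toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 + rng.normal(scale=0.01, size=n)  # nearly collinear with x1

# Correlation matrix of the predictors (variables in columns).
corr = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
print("determinant:", np.linalg.det(corr))  # near 0 -> high multicollinearity
```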
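Partial correlation can be computed in several ways; the sketch below uses one common approach, the residual method: regress each of the two variables on the controls and correlate the residuals. All variable names and data are made up for the illustration, which builds a spurious correlation driven by a shared control variable.

```python
import numpy as np

def partial_corr(v1, v2, controls):
    """Correlation of v1 and v2 after removing the linear effect of controls."""
    C = np.column_stack([np.ones(len(v1))] + list(controls))
    def resid(v):
        return v - C @ np.linalg.lstsq(C, v, rcond=None)[0]
    return float(np.corrcoef(resid(v1), resid(v2))[0, 1])

# r_12.34: correlation of variables 1 and 2, controlling for variables 3 and 4.
rng = np.random.default_rng(2)
v3, v4 = rng.normal(size=100), rng.normal(size=100)
v1 = v3 + rng.normal(size=100)           # variable 1 depends on control v3
v2 = v3 + rng.normal(size=100)           # so does variable 2
print(np.corrcoef(v1, v2)[0, 1])         # uncontrolled r_12: clearly nonzero
print(partial_corr(v1, v2, [v3, v4]))    # r_12.34 near 0 -> correlation is spurious
```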