See the repo on github


Chapter 4 Problems 2,9

2 Consider an equation to explain salaries of CEOs in terms of annual firm sales, return on equity (roe, in percentage form), and return on the firm's stock (ros, in percentage form):

\[ log(salary) = \beta_0 +\beta_1 log(sales)+\beta_2 roe+\beta_3 ros +u \]

  1. In terms of the model parameters, state the null hypothesis that, after controlling for \(sales\) and \(roe\), \(ros\) has no effect on CEO salary. State the alternative that better stock market performance increases a CEO's salary.

Answer:

\[ H_0 : \beta_3 = 0,\qquad H_1 : \beta_3 > 0 \]

  1. Using the data CEOSAL1, the following equation was obtained by OLS: \[\begin{align} \widehat{log(salary)} = & 4.32 + .280log(sales) +.0174roe+.00024ros\\ & (.32)\quad\quad(.035)\quad\quad(.0041)\quad\quad\quad(.00054)\\ &n=209\quad R^{2}=.283 \end{align}\]

By what percentage is \(salary\) predicted to increase if \(ros\) increases by 50 points? Does \(ros\) have a practically large effect on \(salary\)?

Answer:

By exploring the CEOSAL1 data, we can see:

library(wooldridge)

str(ceosal1)
'data.frame':   209 obs. of  12 variables:
 $ salary  : int  1095 1001 1122 578 1368 1145 1078 1094 1237 833 ...
 $ pcsalary: int  20 32 9 -9 7 5 10 7 16 5 ...
 $ sales   : num  27595 9958 6126 16246 21783 ...
 $ roe     : num  14.1 10.9 23.5 5.9 13.8 ...
 $ pcroe   : num  106.4 -30.6 -16.3 -25.7 -3 ...
 $ ros     : int  191 13 14 -21 56 55 62 44 37 37 ...
 $ indus   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ finance : int  0 0 0 0 0 0 0 0 0 0 ...
 $ consprod: int  0 0 0 0 0 0 0 0 0 0 ...
 $ utility : int  0 0 0 0 0 0 0 0 0 0 ...
 $ lsalary : num  7 6.91 7.02 6.36 7.22 ...
 $ lsales  : num  10.23 9.21 8.72 9.7 9.99 ...
 - attr(*, "time.stamp")= chr "25 Jun 2011 23:03"
write.csv(ceosal1,"ceosal1.csv")

library(xlsx)

write.xlsx(ceosal1,"ceosal1.xlsx")

Now you can download this data here.

Using R, the estimated model looks like this:

fit<-lm(lsalary~lsales+roe+ros, data=ceosal1)

summary(fit)

Call:
lm(formula = lsalary ~ lsales + roe + ros, data = ceosal1)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.96060 -0.27144 -0.03264  0.22563  2.79805 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 4.3117125  0.3154329  13.669  < 2e-16 ***
lsales      0.2803149  0.0353200   7.936 1.34e-13 ***
roe         0.0174168  0.0040923   4.256 3.17e-05 ***
ros         0.0002417  0.0005418   0.446    0.656    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4832 on 205 degrees of freedom
Multiple R-squared:  0.2827,    Adjusted R-squared:  0.2722 
F-statistic: 26.93 on 3 and 205 DF,  p-value: 1.001e-14

Click here to see how to run this regression in Stata

The proportionate effect on \(\widehat{salary}\) is \(.00024(50) = .012\), or 1.2%. Therefore, a 50 point ceteris paribus increase in \(ros\) is predicted to increase \(salary\) by only 1.2%. Practically speaking, this is a very small effect for such a large change in \(ros\).

  1. Test the null hypothesis that \(ros\) has no effect on \(salary\) against the alternative that \(ros\) has a positive effect. Carry out the test at the 10% significance level.

Answer:

The 10% critical value for a one-tailed test, using \(df = \infty\), is obtained from Table G.2 as 1.282. The \(t\) statistic on \(ros\) is \(.00024/.00054 ≈ .44\), which is well below the critical value. Therefore, we fail to reject \(H_0\) at the 10% significance level.
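Equivalently, R can compute the one-sided \(p\)-value from the fit above (a quick check, not part of the textbook answer):

tstat <- summary(fit)$coef["ros", "t value"]          # about 0.45
pt(tstat, df = df.residual(fit), lower.tail = FALSE)  # one-sided p-value, roughly 0.33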

  1. Would you include \(ros\) in a final model explaining CEO compensation in terms of firm performance? Explain.

Answer:

Based on this sample, the estimated \(ros\) coefficient appears to be different from zero only because of sampling variation. On the other hand, including \(ros\) may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with \(ros\) in the equation).


 


9 In Problem 3 in Chapter 3, we estimated the equation:

\[\begin{align} \widehat{sleep} =& 3,638.25 - .148 totwrk - 11.13 educ + 2.20 age\\ & (112.28)\quad (.0172)\quad\quad\quad (5.88)\quad\quad (1.45)\\ &\quad\quad\quad\quad n=706, R^2 = .113, \end{align}\]

where we now report standard errors along with the estimates.

  1. Is either \(educ\) or \(age\) individually significant at the 5% level against a two-sided alternative?

Show your work.

Answer:

With \(df = 706 - 4 = 702\), we use the standard normal critical value (\(df = \infty\) in Table G.2), which is 1.96 for a two-tailed test at the 5% level. Now \(t_{educ} = -11.13/5.88 \approx -1.89\), so \(|t_{educ}| = 1.89 < 1.96\), and we fail to reject \(H_0: \beta_{educ} = 0\) at the 5% level. Also, \(t_{age} = 2.20/1.45 \approx 1.52\), so \(age\) is also statistically insignificant at the 5% level.
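The corresponding two-sided \(p\)-values can be checked with the normal approximation (an illustrative computation, not required by the problem):

2 * pnorm(-abs(-11.13 / 5.88))   # educ: about 0.058
2 * pnorm(-abs(2.20 / 1.45))     # age:  about 0.129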

  1. Dropping educ and age from the equation gives \[\begin{align} \widehat{sleep} =& 3,586.38 - .151 totwrk\\ & \quad(38.91)\quad (.017)\\ &\quad n=706, R^2 = .103 \end{align}\]

Are \(educ\) and \(age\) jointly significant in the original equation at the 5% level? Justify your answer.

Answer:

We can compute the \(R^2\) form of the \(F\) statistic for joint significance: \(F = \frac{0.113-0.103}{1-0.113} \cdot \frac{702}{2} \approx 3.96\). The 5% critical value of the \(F_{2,702}\) distribution can be obtained using denominator \(df = \infty\): 3.00. Therefore, \(educ\) and \(age\) are jointly significant at the 5% level. (In fact, the \(p\)-value is about 0.019, so \(educ\) and \(age\) are jointly significant even at the 2% level.)
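The same arithmetic can be verified in R using the reported \(R^2\) values:

# R-squared form of the F statistic for H0: beta_educ = 0, beta_age = 0
r2.ur <- 0.113; r2.r <- 0.103
Fstat <- ((r2.ur - r2.r) / 2) / ((1 - r2.ur) / 702)
Fstat                                              # about 3.96
pf(Fstat, df1 = 2, df2 = 702, lower.tail = FALSE)  # p-value, about 0.019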

  1. Does including \(educ\) and \(age\) in the model greatly affect the estimated tradeoff between \(sleeping\) and \(working\)?

Answer:

Not really. These variables are jointly significant, but including them only changes the coefficient on \(totwrk\) from −.151 to −.148.

  1. Suppose that the sleep equation contains heteroskedasticity. What does this mean about the tests computed in parts (i) and (ii)?

Answer:

The \(t\) and \(F\) statistics that we used assume homoskedasticity. If there is heteroskedasticity in the equation, the tests are no longer valid.

 


Chapter 5 Problems 4

4 In the simple regression model (5.16), under the first four Gauss-Markov assumptions, we showed that estimators of the form (5.17) are consistent for the slope, \(\beta_1\). Given such an estimator, define an estimator of \(\beta_0\) by \(\widetilde{\beta}_0 = \overline{y}-\widetilde{\beta}_{1}\overline{x}\). Show that \(plim \widetilde{\beta_0}=\beta_0\).

Answer:

Write \(y = \beta_0 + \beta_1 x + u\), and take the expected value: \(E(y) = \beta_0 + \beta_1 E(x) + E(u)\), or \(\mu_y = \beta_0 + \beta_1 \mu_x\), since \(E(u) = 0\), where \(\mu_y = E(y)\) and \(\mu_x = E(x)\). We can rewrite this as \(\beta_0 = \mu_y - \beta_1 \mu_x\). Now, \(\widetilde{\beta}_0 = \overline{y} - \widetilde{\beta}_1 \overline{x}\). Taking the plim of this we have \(plim(\widetilde{\beta}_0) = plim(\overline{y} - \widetilde{\beta}_1 \overline{x}) = plim(\overline{y}) - plim(\widetilde{\beta}_1)\cdot plim(\overline{x}) = \mu_y - \beta_1 \mu_x = \beta_0\), where we use the fact that \(plim(\overline{y}) = \mu_y\) and \(plim(\overline{x}) = \mu_x\) by the law of large numbers, and \(plim(\widetilde{\beta}_1) = \beta_1\). We have also used parts of Property PLIM.2 from Appendix C.
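A quick simulation illustrates the consistency result (an illustrative sketch; the true values \(\beta_0 = 2\) and \(\beta_1 = 0.5\) are chosen arbitrarily):

# beta0-tilde = ybar - beta1-tilde * xbar settles near beta0 as n grows
set.seed(1)
b0 <- 2; b1 <- 0.5
for (n in c(100, 10000, 1000000)) {
  x <- rnorm(n, mean = 3); u <- rnorm(n)
  y <- b0 + b1 * x + u
  b1.tilde <- cov(x, y) / var(x)       # a consistent estimator of the slope
  print(mean(y) - b1.tilde * mean(x))  # approaches b0 = 2
}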

 


Chapter 6 Problems 3,10,C8

3 Using the data in RDCHEM, the following equation was obtained by OLS:

\[\begin{align} \widehat{rdintens}=&2.613+.00030sales-.0000000070 sales^2\\ &\quad(.429)\quad(.00014)\quad\quad(.0000000037)\\ &n=32,R^2 = .1484 \end{align}\]

  1. At what point does the marginal effect of \(sales\) on \(rdintens\) become negative?

Answer:

\[\begin{align} \frac{\partial\, \widehat{rdintens}}{\partial\, sales} =& .00030 - 2(.0000000070)sales = 0\\ \Rightarrow sales =& \frac{.00030}{.000000014} \approx 21,428.57 \end{align}\]

At about $21,428.57 million of \(sales\), \(\widehat{rdintens}\) reaches its highest point. When \(sales\) exceeds $21,428.57 million, the marginal effect of \(sales\) on \(rdintens\) becomes negative.
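The turning point can be recovered directly from the reported coefficients:

# sales* = -beta1 / (2 * beta2), in millions of dollars
b1 <- 0.00030; b2 <- -0.0000000070
-b1 / (2 * b2)   # about 21428.57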

  1. Would you keep the quadratic term in the model? Explain.

Answer:

\[ H_0: \beta_2 = 0,\qquad H_1: \beta_2 \neq 0 \]

\(t = -.0000000070/.0000000037 \approx -1.89\), and the two-sided critical value at the 10% level with \(df = 29\) is \(t_{29,.05} \approx 1.699\).

  • Since \(|t| = 1.89 > 1.699\), we reject \(H_0\) at the 10% level.

  • \(sales^2\) has a marginally significant impact on \(rdintens\), so the quadratic term should be kept in the model.
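The two-sided \(p\)-value confirms the marginal significance:

2 * pt(-1.89, df = 29)   # about 0.069, significant at the 10% level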

  1. Define \(salesbil\) as sales measured in billions of dollars: \(salesbil = sales/1,000\). Rewrite the estimated equation with \(salesbil\) and \(salesbil2\) as the independent variables. Be sure to report standard errors and the R-squared. [Hint: Note that \(salesbil^2 = sales^2/(1,000)^2\).]

Answer:

Using R we can do:

str(rdchem)
'data.frame':   32 obs. of  8 variables:
 $ rd      : num  430.6 59 23.5 3.5 1.7 ...
 $ sales   : num  4570 2830 597 134 42 ...
 $ profits : num  186.9 467 107.4 -4.3 8 ...
 $ rdintens: num  9.42 2.08 3.94 2.62 4.05 ...
 $ profmarg: num  4.09 16.5 18 -3.22 19.05 ...
 $ salessq : num  20886730 8008900 356170 17849 1764 ...
 $ lsales  : num  8.43 7.95 6.39 4.89 3.74 ...
 $ lrd     : num  6.065 4.078 3.157 1.253 0.531 ...
 - attr(*, "time.stamp")= chr "25 Jun 2011 23:03"
library(dplyr)
rdchem<-rdchem%>%
  mutate(salesbil = sales/1000,
         salesbil2 = sales^2 / 1000^2) #Adding the new variables on the dataset

str(rdchem)
'data.frame':   32 obs. of  10 variables:
 $ rd       : num  430.6 59 23.5 3.5 1.7 ...
 $ sales    : num  4570 2830 597 134 42 ...
 $ profits  : num  186.9 467 107.4 -4.3 8 ...
 $ rdintens : num  9.42 2.08 3.94 2.62 4.05 ...
 $ profmarg : num  4.09 16.5 18 -3.22 19.05 ...
 $ salessq  : num  20886730 8008900 356170 17849 1764 ...
 $ lsales   : num  8.43 7.95 6.39 4.89 3.74 ...
 $ lrd      : num  6.065 4.078 3.157 1.253 0.531 ...
 $ salesbil : num  4.57 2.83 0.597 0.134 0.042 ...
 $ salesbil2: num  20.88673 8.0089 0.35617 0.01785 0.00176 ...
 - attr(*, "time.stamp")= chr "25 Jun 2011 23:03"

We can export the new dataset to run in other econometric software:

write.csv(rdchem,"rdchem.csv")

write.xlsx(rdchem,"rdchem.xlsx")

You can download the .csv file here and the .xlsx file here

The new fitted model is:

summary(lm(rdintens~salesbil+salesbil2, data=rdchem))

Call:
lm(formula = rdintens ~ salesbil + salesbil2, data = rdchem)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.1418 -1.3630 -0.2257  1.0688  5.5808 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.612512   0.429442   6.084 1.27e-06 ***
salesbil     0.300571   0.139295   2.158   0.0394 *  
salesbil2   -0.006946   0.003726  -1.864   0.0725 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.788 on 29 degrees of freedom
Multiple R-squared:  0.1484,    Adjusted R-squared:  0.08969 
F-statistic: 2.527 on 2 and 29 DF,  p-value: 0.09733
  1. For the purpose of reporting the results, which equation do you prefer?

Answer:

Comparing the two estimated equations:

\[\begin{align} \widehat{rdintens}=&2.613+.00030sales-.0000000070 sales^2\\ &\quad(.429)\quad(.00014)\quad\quad(.0000000037)\\ &n=32,R^2 = .1484 \end{align}\]

versus

\[\begin{align} \widehat{rdintens}=&2.613+.301salesbil-.0069salesbil2\\ &\quad\quad(.429)\quad(.139)\quad\quad(.0037)\\ &n=32,R^2 = .1484 \end{align}\]

The two equations are the same model in different units: the \(R^2\) and every \(t\) statistic are identical, because \(salesbil\) is just \(sales\) rescaled. The second equation is preferable for reporting, since its coefficients and standard errors involve far fewer leading zeros and are easier to read.
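A quick check that the rescaling changes nothing substantive (using the \(salessq\) variable already in the dataset for the \(sales^2\) term):

fit.sales <- lm(rdintens ~ sales + salessq, data = rdchem)
fit.bil   <- lm(rdintens ~ salesbil + salesbil2, data = rdchem)
# t statistics are identical across the two parameterizations
all.equal(summary(fit.sales)$coef[, "t value"],
          summary(fit.bil)$coef[, "t value"], check.attributes = FALSE)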

 


10 The following two equations were estimated using the data in MEAPSINGLE. The key explanatory variable is \(lexppp\), the log of expenditures per student at the school level.
  1. If you are a policy maker trying to estimate the causal effect of per-student spending on math test performance, explain why the first equation is more relevant than the second. What is the estimated effect of a 10% increase in expenditures per student?

Answer:

Stating the hypotheses:

\[ H_0 : \beta_{lexppp} = 0,\qquad H_1 : \beta_{lexppp} \neq 0 \]

Dividing the coefficient \(\widehat{\beta}_{lexppp}\) by its standard error gives \(t_{lexppp} = \widehat{\beta}_{lexppp}/se(\widehat{\beta}_{lexppp}) = 9.01/4.04 \approx 2.23 > 1.96\), the critical value at the 5% level of significance with 224 degrees of freedom (four explanatory variables).

The coefficient on \(lexppp\) is therefore statistically significant: the null hypothesis is rejected.

For the second regression model, the coefficient on \(lexppp\) is statistically insignificant: \(t_{lexppp} = 1.93/2.82 \approx 0.68 < 1.96\), so the null hypothesis is not rejected.

The first equation is the more relevant one for a policy maker because \(read4\) is itself a school outcome that responds to spending; holding it fixed absorbs part of the causal effect of \(lexppp\) on \(math4\). Using the first equation, a 10% increase in expenditure per student is estimated to increase the math pass rate by about 0.9 percentage points:

\[\begin{align} \widehat{\Delta math4} =& 9.01(\Delta lexppp)\\ \approx& (9.01/100)(\%\Delta exppp) = (9.01/100)(10) \approx 0.90 \end{align}\]

  1. Does adding \(read4\) to the regression have strange effects on coefficients and statistical significance other than \(\beta_{lexppp}\)?

Answer:

By calling the dataset, we can see:

str(meapsingle)
'data.frame':   229 obs. of  18 variables:
 $ dcode   : int  63010 63010 63270 63270 63010 63010 63010 63130 63130 63130 ...
 $ bcode   : int  3030 3133 2023 2978 316 5670 1494 1631 1753 2254 ...
 $ math4   : num  92.8 100 72.1 76.1 95.2 88.6 95.2 66.7 83.9 95.7 ...
 $ read4   : num  82.5 94.3 46.5 65.7 80.6 72.7 90.5 46.3 44.6 56.5 ...
 $ enroll  : int  607 370 220 356 329 331 288 452 428 238 ...
 $ exppp   : num  6620 6620 5608 5830 6620 ...
 $ free    : num  1 0 5.9 8.1 0.3 1.2 12.2 50.2 40.2 24.4 ...
 $ reduced : num  0.7 0 5 2.8 0.3 0.9 5.2 17.5 10 17.6 ...
 $ lunch   : num  1.7 0 10.9 10.9 0.6 2.1 17.4 67.7 50.2 42 ...
 $ medinc  : int  110322 110322 65119 65119 109313 109313 109313 43750 43750 43750 ...
 $ totchild: int  4076 4076 2524 2524 3486 3486 3486 4651 4651 4651 ...
 $ married : int  3542 3542 2091 2091 3241 3241 3241 3258 3258 3258 ...
 $ single  : int  534 534 433 433 245 245 245 1393 1393 1393 ...
 $ pctsgle : num  13.1 13.1 17.16 17.16 7.03 ...
 $ zipcode : int  48009 48009 48017 48017 48025 48025 48025 48030 48030 48030 ...
 $ lenroll : num  6.41 5.91 5.39 5.87 5.8 ...
 $ lexppp  : num  8.8 8.8 8.63 8.67 8.8 ...
 $ lmedinc : num  11.6 11.6 11.1 11.1 11.6 ...

We can generate the dataset in .csv or .xlsx format using:

write.csv(meapsingle, "meapsingle.csv")

write.xlsx(meapsingle,"meapsingle.xlsx")

The files are available here and here.

By comparing the two regression models:

model1<-lm(math4~lexppp + free + lmedinc + pctsgle, data=meapsingle)

model2<-lm(math4~lexppp + free + lmedinc + pctsgle +read4, data=meapsingle)

summary(model1)

Call:
lm(formula = math4 ~ lexppp + free + lmedinc + pctsgle, data = meapsingle)

Residuals:
    Min      1Q  Median      3Q     Max 
-33.259  -7.422   1.615   7.274  49.524 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 24.48949   59.23781   0.413   0.6797    
lexppp       9.00648    4.03530   2.232   0.0266 *  
free        -0.42164    0.07064  -5.969 9.27e-09 ***
lmedinc     -0.75221    5.35816  -0.140   0.8885    
pctsgle     -0.27444    0.16086  -1.706   0.0894 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.59 on 224 degrees of freedom
Multiple R-squared:  0.4716,    Adjusted R-squared:  0.4622 
F-statistic: 49.98 on 4 and 224 DF,  p-value: < 2.2e-16
summary(model2)

Call:
lm(formula = math4 ~ lexppp + free + lmedinc + pctsgle + read4, 
    data = meapsingle)

Residuals:
     Min       1Q   Median       3Q      Max 
-29.5690  -4.6729  -0.0349   4.3644  24.8425 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 149.37870   41.70293   3.582 0.000419 ***
lexppp        1.93215    2.82480   0.684 0.494688    
free         -0.06004    0.05399  -1.112 0.267297    
lmedinc     -10.77595    3.75746  -2.868 0.004529 ** 
pctsgle      -0.39663    0.11143  -3.559 0.000454 ***
read4         0.66656    0.04249  15.687  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.012 on 223 degrees of freedom
Multiple R-squared:  0.7488,    Adjusted R-squared:  0.7432 
F-statistic: 132.9 on 5 and 223 DF,  p-value: < 2.2e-16

By putting the results side by side:

                       Dependent variable: math4
                    --------------------------------
                       model1 (1)       model2 (2)
Constant                24.489          149.379***
                       (59.238)         (41.703)
lexppp                   9.006**          1.932
                        (4.035)          (2.825)
free                    -0.422***        -0.060
                        (0.071)          (0.054)
lmedinc                 -0.752          -10.776***
                        (5.358)          (3.757)
pctsgle                 -0.274*          -0.397***
                        (0.161)          (0.111)
read4                                     0.667***
                                         (0.042)
Observations               229              229
R2                       0.472            0.749
Adjusted R2              0.462            0.743
Residual Std. Error     11.594 (df = 224)     8.012 (df = 223)
F Statistic             49.979*** (df = 4; 224)   132.941*** (df = 5; 223)
Note: *p<0.1; **p<0.05; ***p<0.01

 

Including the variable \(read4\) increases the \(R^2\), but \(lexppp\) and \(free\) lose statistical significance, while \(lmedinc\) and \(pctsgle\) become significant.
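Because \(read4\) is strongly correlated both with \(math4\) and with the other regressors, adding it reshuffles the remaining coefficients. One can inspect this directly (a quick diagnostic, not part of the textbook answer):

# Pairwise correlations among the outcome, read4, and two key regressors
with(meapsingle, cor(cbind(math4, read4, lexppp, free)))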

  1. How would you explain to someone with only basic knowledge of regression why, in this case, you prefer the equation with the smaller adjusted R-squared?

Answer:

When the goal is to estimate a causal effect, identifying that effect is more fundamental than maximizing statistical fit; choosing a model purely on fit can lead to an inadequate interpretation of the economic phenomenon. Here, adding \(read4\) raises the adjusted \(R^2\), but \(read4\) is itself an outcome affected by spending, so it distorts the coefficient that answers the policy question.

A specification whose variables are jointly meaningful for the question at hand gives a more reliable interpretation than a higher, but misleading, \(R^2\).

 


C8 Use the data in HPRICE1 for this exercise.

  1. Estimate the model

\[ price = \beta_0 + \beta_1 lotsize + \beta_2 sqrft + \beta_3 bdrms + u \]

and report the results in the usual form, including the standard error of the regression. Obtain the predicted price when we plug in \(lotsize = 10,000\), \(sqrft = 2,300\), and \(bdrms = 4\); round this price to the nearest dollar.

Answer:

Calling R to access the dataset:

str(hprice1)
'data.frame':   88 obs. of  10 variables:
 $ price   : num  300 370 191 195 373 ...
 $ assess  : num  349 352 218 232 319 ...
 $ bdrms   : int  4 3 3 3 4 5 3 3 3 3 ...
 $ lotsize : num  6126 9903 5200 4600 6095 ...
 $ sqrft   : int  2438 2076 1374 1448 2514 2754 2067 1731 1767 1890 ...
 $ colonial: int  1 1 0 1 1 1 1 1 0 0 ...
 $ lprice  : num  5.7 5.91 5.25 5.27 5.92 ...
 $ lassess : num  5.86 5.86 5.38 5.45 5.77 ...
 $ llotsize: num  8.72 9.2 8.56 8.43 8.72 ...
 $ lsqrft  : num  7.8 7.64 7.23 7.28 7.83 ...
 - attr(*, "time.stamp")= chr "25 Jun 2011 23:03"

Generating dataset in .csv and .xlsx format for download:

write.csv(hprice1, "hprice1.csv")

write.xlsx(hprice1, "hprice1.xlsx")

Now estimating the model:

model <- lm(price~lotsize + sqrft + bdrms, data=hprice1)

summary(model)

Call:
lm(formula = price ~ lotsize + sqrft + bdrms, data = hprice1)

Residuals:
     Min       1Q   Median       3Q      Max 
-120.026  -38.530   -6.555   32.323  209.376 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.177e+01  2.948e+01  -0.739  0.46221    
lotsize      2.068e-03  6.421e-04   3.220  0.00182 ** 
sqrft        1.228e-01  1.324e-02   9.275 1.66e-14 ***
bdrms        1.385e+01  9.010e+00   1.537  0.12795    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 59.83 on 84 degrees of freedom
Multiple R-squared:  0.6724,    Adjusted R-squared:  0.6607 
F-statistic: 57.46 on 3 and 84 DF,  p-value: < 2.2e-16

Using the estimated coefficients, we can compute the predicted price at \(lotsize = 10,000\), \(sqrft = 2,300\), and \(bdrms = 4\):

estimated <- summary(model)$coef[1,1]+summary(model)$coef[2,1]*10000+summary(model)$coef[3,1]*2300+summary(model)$coef[4,1]*4

estimated
[1] 336.7067

This is the same as writing \(\widehat{price}=-2.177e+01 + 2.068e-03lotsize+1.228e-01sqrft+1.385e+01bdrms\) and plugging in the given values:

\[ \widehat{price}=-21.77 + .002068\times 10,000 + .1228\times 2,300 + 13.85\times 4 \approx 336.71, \]

that is, about US$ 336,707, since \(price\) is measured in thousands of dollars.

  1. Run a regression that allows you to put a 95% confidence interval around the predicted value in part (i). Note that your prediction will differ somewhat due to rounding error.

Answer:

confint(model)
                    2.5 %       97.5 %
(Intercept) -80.384661400 36.844045104
lotsize       0.000790769  0.003344644
sqrft         0.096454149  0.149102222
bdrms        -4.065140551 31.770184040

The estimated \(\beta_1=2.068e-03\) lies within the interval [0.000790769, 0.003344644], \(\beta_2=1.228e-01\) within [0.096454149, 0.149102222], and \(\beta_3=1.385e+01\) within [-4.065140551, 31.770184040]. These, however, are confidence intervals for the individual coefficients; a 95% CI around the predicted value itself can be obtained with the centered regression sketched below.
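A minimal sketch of the reparametrization the problem has in mind (centering each regressor at the values of interest, so that the intercept of the new regression equals the predicted value and confint() delivers the interval around it):

# Intercept of the centered regression = predicted price at the chosen values
model.c <- lm(price ~ I(lotsize - 10000) + I(sqrft - 2300) + I(bdrms - 4),
              data = hprice1)
confint(model.c)["(Intercept)", ]   # 95% CI around the predicted value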

  1. Let \(price^0\) be the unknown future selling price of the house with the characteristics used in parts (i) and (ii). Find a 95% CI for \(price^0\) and comment on the width of this confidence interval.

Answer:

First we can simulate the future selling price oriented by (i) results:

lotsize <- c(10000)

sqrft <- c(2300)

bdrms <- c(4)

unknown.price.zero <- data.frame(lotsize, sqrft, bdrms)

Using the predict function with interval = "prediction" (part (iii) concerns a single future selling price, so a prediction interval is needed, not a confidence interval for the mean):

unknown.price.zero
  lotsize sqrft bdrms
1   10000  2300     4
predict(model, newdata = unknown.price.zero, interval = "prediction")

The point prediction is about 336.71, i.e., roughly $336,707. The 95% prediction interval is much wider than the confidence interval around the expected price, because it adds the variance of the unobserved error \(u^0\) to the sampling variance of the prediction: its half-width is at least \(t_{.025,84}\times\hat{\sigma} \approx 1.99 \times 59.83 \approx 119\). An interval more than $230,000 wide tells us that the model is of limited use for pricing an individual house.
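This lower bound on the interval's width can be checked directly, using \(\hat{\sigma} = 59.83\) and \(df = 84\) from the regression summary above:

qt(0.975, df = 84) * 59.83 * 2   # width is at least about 238 (thousand dollars)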

 


References

Wooldridge, J. M. Introductory Econometrics: A Modern Approach, 6th ed., Cengage Learning, 2016. Available in zlib.