

Analysis
Tolerances (1/VIF) are between 0.4 and 0.9. This means that between 40% and 90% of the
variance of a particular independent variable is not explained by the
other independent variables.
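The relationship between tolerance and VIF used above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are assumptions and are not part of the original analysis.

```python
def vif_from_tolerance(tolerance: float) -> float:
    """VIF is the reciprocal of tolerance."""
    if not 0.0 < tolerance <= 1.0:
        raise ValueError("tolerance must lie in (0, 1]")
    return 1.0 / tolerance


def tolerance_from_r_squared(r_squared: float) -> float:
    """Tolerance is the share of a predictor's variance NOT explained by the
    other predictors, i.e. 1 - R^2 of the auxiliary regression."""
    return 1.0 - r_squared


# The tolerance range reported in the text: 0.4 to 0.9.
for tol in (0.4, 0.9):
    print(f"tolerance={tol:.1f} -> VIF={vif_from_tolerance(tol):.2f}")
```

Under these relations, the reported tolerances of 0.4 to 0.9 correspond to VIF values of roughly 2.5 down to 1.1.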
Conclusion
All variables have a high tolerance and a low VIF value,
indicating at most a low degree of multicollinearity. The VIF values for bloc and mpat are higher than those of the
other independent variables, but not high enough to indicate a clear problem.
(ii) Correlation Matrix
Second, I performed a sample estimation of the
correlations between the independent variables. As a general rule of thumb, correlation coefficients (R) of


Analysis
There are varying opinions in
the literature on what level of correlation constitutes multicollinearity. Jensen (2003) indicates that it would be
conservative to assume multicollinearity if two variables have a correlation
coefficient greater than R=0.5. He
indicates a liberal view would be to assume multicollinearity if two variables
have a correlation coefficient greater than R=0.9.
None of the correlation
coefficients reaches the "conservative" threshold of R=0.5, except
the correlation between mpat and bloc.
Therefore, the correlation matrix confirms the conclusions of
the VIF test: if there is any collinearity problem at all, it is likely to be between
the two "blockbuster" variables mpat and bloc.
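The screening rule described above (flag pairs above the conservative R=0.5 threshold, and separately those above the liberal R=0.9 threshold) might be sketched as follows. The data values are made up for illustration and are not the dissertation sample; the function names and the variable `other` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_correlated_pairs(data, conservative=0.5, liberal=0.9):
    """Return (name_a, name_b, r, label) for every pair with |r| > conservative."""
    flagged = []
    for a, b in combinations(data, 2):
        r = pearson_r(data[a], data[b])
        if abs(r) > liberal:
            flagged.append((a, b, r, "liberal"))
        elif abs(r) > conservative:
            flagged.append((a, b, r, "conservative"))
    return flagged


# Made-up illustrative values, NOT the dissertation data.
toy = {
    "mpat": [0.0, 0.0, 0.5, 1.0, 0.0, 0.25],
    "bloc": [0.0, 0.0, 1.0, 0.5, 0.0, 2.0],
    "other": [3.0, 1.0, 2.0, 5.0, 4.0, 2.5],
}
for a, b, r, label in flag_correlated_pairs(toy):
    print(f"{a}-{b}: r={r:.3f} exceeds the {label} threshold")
```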
MPAT
is the number of blockbuster drugs in the acquirer’s marketed portfolio with
patent expiration within two years of the announcement of the transaction divided
by the total number of the acquirer’s marketed blockbuster drugs.
BLOC
is the number of blockbuster drugs in the target’s marketed portfolio (marketed
or achieved) or pipeline divided by number of blockbuster drugs in the acquirer’s
marketed portfolio (marketed or achieved) or pipeline.
Both variables require that
the acquirer owns at least one blockbuster drug before the acquisition;
otherwise, the values of BLOC and MPAT are zero.
In addition, BLOC requires that the target owns at least one blockbuster
drug before the acquisition; otherwise, BLOC is zero.
In other words, the acquirer owning at
least one blockbuster drug is a necessary but not sufficient condition for
either BLOC or MPAT to be different from zero. Therefore, some correlation between
BLOC and MPAT is expected, even though the two variables measure
substantially different things.
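The shared zero condition can be made concrete with a small sketch of how the two ratios might be computed. The function and argument names are hypothetical, and the exact counting rules of the original data set (for example, how pipeline drugs are counted) are not reproduced here.

```python
def mpat(acquirer_expiring: int, acquirer_marketed: int) -> float:
    """Share of the acquirer's marketed blockbusters whose patents expire
    within two years of the announcement; zero if the acquirer has none."""
    if acquirer_marketed == 0:
        return 0.0
    return acquirer_expiring / acquirer_marketed


def bloc(target_blockbusters: int, acquirer_blockbusters: int) -> float:
    """Target blockbusters relative to acquirer blockbusters; zero if either
    party has no blockbuster drug."""
    if acquirer_blockbusters == 0 or target_blockbusters == 0:
        return 0.0
    return target_blockbusters / acquirer_blockbusters


# An acquirer without any blockbuster drug forces BOTH variables to zero,
# which is the common condition behind the expected correlation.
print(mpat(0, 0), bloc(3, 0))
```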
Eliminating either BLOC or
MPAT from the model would eliminate any doubt about a possible
multicollinearity problem (all correlation coefficients smaller than
R=0.5).
However, the elimination of
either BLOC or MPAT would also result in significantly lower R-squared
values:
| | Full model | Removing MPAT | Removing BLOC |
| F | 5.57 | 4.76 | 3.61 |
| R-squared | 0.5821 | 0.4876 | 0.4195 |
In the literature, it is
typically argued that in case of multicollinearity, an independent variable is
redundant in the model. Removing a redundant
variable from the model would reduce R-squared and Adjusted R-squared only
marginally.
In contrast, removing either
MPAT or BLOC from my model significantly reduces both the R-squared and
Adjusted R-squared values. Therefore,
both MPAT and BLOC make their own individual contribution and cannot be considered
redundant in the model, even though their correlation is 0.6723.
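As a rough illustration of the size of these reductions, the drops in R-squared can be computed directly from the table above. The helper `adjusted_r_squared` is a hypothetical name implementing the standard adjustment formula; it takes the sample size and number of predictors as inputs because neither is stated in this section.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R-squared values reported in the table above.
r2 = {"full": 0.5821, "without MPAT": 0.4876, "without BLOC": 0.4195}

for name in ("without MPAT", "without BLOC"):
    drop = r2["full"] - r2[name]
    print(f"{name}: R-squared falls by {drop:.4f}")
```

Removing MPAT lowers R-squared by 0.0945 and removing BLOC lowers it by 0.1626, which is far from the marginal change expected for a redundant variable.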
Conclusion
The correlation between MPAT and BLOC may initially appear
too high, which would indicate multicollinearity. However, the high correlation is explained by a common necessary
condition for both variables to be different from zero. Each variable makes its own significant contribution to the
quality of the overall model and is not redundant. Therefore, multicollinearity does not seem to be a problem.
2. Heteroskedasticity
Heteroskedasticity is defined as unequal variance in
regression errors. It typically arises when the sample contains
different kinds of cases.
In other words, the error variance is systematically larger or smaller
in some portions of a sample than in others.
When heteroskedasticity is present, ordinary least-squares estimation
places more weight on the observations with large error variances than on those
with small error variances. This can lead to biased estimates of the variances
of the estimated parameters (Pindyck and Rubinfeld, 1991). I used three different methods to detect
heteroskedasticity:
(i) Graphical Method
Heteroskedasticity can be detected through a
post-regression analysis of the
residuals, to see whether their spread shows any systematic pattern. I created a scatter plot of the residuals
against the fitted values.

Analysis
The graph plots the residuals
against the fitted (predicted) values. The
variability of the residuals for fitted values below 50 appears smaller than
the variability of the residuals for fitted values of 75 and higher. This observation may indicate mild
heteroskedasticity.
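The visual impression can be backed by an informal numerical comparison of the residual spread in the two regions of the plot. The sketch below is not a formal test; the cutoffs 50 and 75 come from the text, while the function name and the data are made up for illustration.

```python
from statistics import stdev


def residual_spread_by_group(fitted, residuals, low_cutoff=50.0, high_cutoff=75.0):
    """Sample standard deviation of the residuals for fitted values below
    low_cutoff and for fitted values at or above high_cutoff."""
    low = [e for f, e in zip(fitted, residuals) if f < low_cutoff]
    high = [e for f, e in zip(fitted, residuals) if f >= high_cutoff]
    if len(low) < 2 or len(high) < 2:
        raise ValueError("each group needs at least two observations")
    return stdev(low), stdev(high)


# Made-up illustrative values, NOT the dissertation data.
fitted_values = [10.0, 20.0, 30.0, 80.0, 90.0, 100.0]
residual_values = [1.0, -1.0, 0.0, 10.0, -10.0, 0.0]
low_sd, high_sd = residual_spread_by_group(fitted_values, residual_values)
print(f"spread below 50: {low_sd:.2f}, spread at 75 and above: {high_sd:.2f}")
```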
(ii) Breusch-Pagan Test
Typically, a formal test is designed to test the null
hypothesis of homoskedasticity (constant error variance across observations) versus
an alternative hypothesis of heteroskedasticity. I used the Breusch-Pagan test, as suggested by Dr. Shackman. I ran the test with the fitted values for pre30
as the independent variable:

The Breusch-Pagan test indicates that the model as a whole is subject to mild
heteroskedasticity (p-value 0.0421).
Based on this result, the null hypothesis of homoskedasticity would be rejected
at the 5% level in favor of the alternative hypothesis that the error variance
is not homogeneous.
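The mechanics of a test on the fitted values can be sketched as follows. Note the assumptions: this is the studentized (n times R-squared) form of the Breusch-Pagan statistic, which may differ from the variant reported by the statistical package used for the results above, and the function name is hypothetical. With a single auxiliary regressor, the auxiliary R-squared equals the squared correlation between the squared residuals and the fitted values.

```python
from math import erfc, sqrt


def breusch_pagan_fitted(residuals, fitted):
    """Studentized Breusch-Pagan LM test with the fitted values as the only
    auxiliary regressor. Returns (lm_statistic, p_value)."""
    n = len(residuals)
    sq = [e * e for e in residuals]  # squared residuals
    mean_sq, mean_f = sum(sq) / n, sum(fitted) / n
    s_qf = sum((q - mean_sq) * (f - mean_f) for q, f in zip(sq, fitted))
    s_qq = sum((q - mean_sq) ** 2 for q in sq)
    s_ff = sum((f - mean_f) ** 2 for f in fitted)
    if s_qq == 0.0 or s_ff == 0.0:
        return 0.0, 1.0  # no variation to explain
    # With one regressor, the auxiliary R^2 is the squared correlation.
    r_squared = s_qf ** 2 / (s_qq * s_ff)
    lm = n * r_squared
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(lm / 2.0))
    return lm, p_value
```

A small p-value leads to rejecting the null hypothesis of homoskedasticity, as in the result reported above.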
I repeated the test to detect heteroskedasticity related
to any particular independent variable:

Analysis
For the individual variables, only AETB
comes close to being significantly related to the error variance (p-value
0.0621); all others have considerably higher p-values. Based on these results, the null
hypothesis of homoskedasticity cannot be rejected for any individual variable.
(iii) White Test
The White test is considered less sensitive to outliers
than the Breusch-Pagan test.

The p-value of the White test is 0.5744, which is clearly
not significant. Based on this result, the null hypothesis
of homoskedasticity cannot be rejected.
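For readers who want to see what the White test computes, the sketch below regresses the squared residuals on the regressors, their squares and their cross-products, and compares n times the auxiliary R-squared with a chi-square distribution. It is a plain standard-library illustration under stated assumptions: all function names are hypothetical, the small least-squares solver is only suitable for well-conditioned toy data, and duplicate auxiliary columns (for example, the square of a dummy variable) would have to be dropped first.

```python
from itertools import combinations_with_replacement
from math import erfc, exp, pi, sqrt


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def _r_squared(rows, y):
    """R^2 of an OLS regression of y on the columns in rows plus a constant."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    if ss_tot == 0.0:
        return 0.0  # nothing to explain
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * v for d, v in zip(design, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed-form series)."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return exp(-x / 2.0) * total
    chi = sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    pdf = exp(-x / 2.0) / sqrt(2.0 * pi)
    return erfc(chi / sqrt(2.0)) + 2.0 * pdf * total


def white_test(x_rows, residuals):
    """White's general test. Returns (lm_statistic, degrees_of_freedom, p_value)."""
    aux = [
        list(row) + [a * b for a, b in combinations_with_replacement(row, 2)]
        for row in x_rows
    ]
    squared = [e * e for e in residuals]
    df = len(aux[0])
    lm = len(residuals) * _r_squared(aux, squared)
    return lm, df, chi2_sf(lm, df)
```

Each row of `x_rows` holds the regressor values of one observation (without the constant). A large p-value, such as the 0.5744 reported above, gives no grounds to reject homoskedasticity.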
3. Bootstrap Inference
Bootstrapping is
a general approach to statistical inference.
It builds a sampling
distribution for a statistic by resampling from the collected data. In other words, the bootstrap treats the
observed values of the independent and dependent variables as the population and the
sample estimates as the true values.
Instead of drawing at random from a specific assumed distribution, the bootstrap
draws with replacement from the sample.
From these random samples, the
bootstrap standard
error and confidence intervals are estimated
by their empirical counterparts (Efron and Tibshirani, 1993).
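The resampling idea can be sketched in a few lines. For simplicity, this illustration bootstraps the sample mean rather than regression coefficients, uses the percentile method for the confidence interval, and relies on made-up data. The function name, the number of resamples and the seed are assumptions.

```python
import random
from statistics import mean, stdev


def bootstrap(sample, statistic=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap. Returns (standard_error, (ci_low, ci_high))
    using the percentile method."""
    rng = random.Random(seed)  # fixed seed for reproducible resamples
    n = len(sample)
    # Each resample draws n observations WITH replacement from the sample.
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    se = stdev(estimates)  # empirical standard error of the statistic
    low = estimates[int((alpha / 2) * n_resamples)]
    high = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return se, (low, high)


# Made-up illustrative values, NOT the dissertation data.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 10.0, 11.0, 12.0, 15.0]
se, (ci_low, ci_high) = bootstrap(data)
print(f"bootstrap SE={se:.3f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

For regression coefficients, the same loop would resample whole observations (rows) and re-estimate the model on each resample.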