MGT 100 Week 3
This version: April 2026 | License: CC BY 4.0 | We use javascript to track readership.
We welcome reuse with attribution. Please share widely.

Correlation is the empirical tendency for two variables to move together. We hypothesize about a causal relationship using logic and judgment. If X “causes” Y, then if we intervene to change prob(X=x), prob(Y=y|X=x) will change as a result.

Philip Wright was an economics professor on sabbatical at the US Tariff Commission. Wright (1928) predicted the tax revenue that would be raised by a tariff on linseed oil, by predicting how much foreign demand would fall as a function of the change in price. Does he describe demand as a correlational or a causal relationship? Similar ideas go farther back, to Alfred Marshall (1890).

The demand curve is the relationship between price and quantity demanded. Why do we call it “inverse”? Marginal revenue shows how total revenue changes with each additional unit sold. Marginal cost is the cost of producing each additional unit of quantity supplied. Where does the firm maximize profit? Where does the firm maximize revenue?
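As a minimal numerical sketch — using a hypothetical linear demand curve \(q = 100 - 2p\) and a constant marginal cost of 10, not numbers from the reading — the profit-maximizing and revenue-maximizing prices differ:

```r
# Hypothetical linear demand q(p) = a - b*p with constant marginal cost c
a <- 100  # demand intercept (assumed)
b <- 2    # demand slope (assumed)
c <- 10   # constant marginal cost (assumed)

revenue <- function(p) p * (a - b * p)        # total revenue
profit  <- function(p) (p - c) * (a - b * p)  # (price - cost) * quantity

# The firm maximizes profit where MR = MC, and revenue where MR = 0
p_profit  <- optimize(profit,  c(0, a / b), maximum = TRUE)$maximum
p_revenue <- optimize(revenue, c(0, a / b), maximum = TRUE)$maximum

round(p_profit, 2)   # 30: profit-maximizing price
round(p_revenue, 2)  # 25: revenue-maximizing price
```

Note the revenue-maximizing price is below the profit-maximizing one: once production is costly, the firm gives up some sales volume to earn a margin on each unit.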
Have you ever heard any coworker or relative talk about a business’s demand curve? Why or why not?
Demand curves enable counterfactual predictions — “what would happen to sales if we changed the price?”
Have you heard of price endogeneity before? What is it?
The key insight: you need exogenous variation in price to isolate the causal effect from correlates of both price and sales. What does “exogenous” mean here?
Firms use many approaches to learn demand, but best practice is triangulation — combining multiple methods. Why might relying on a single approach be risky?
How might competitors, distribution partners, or suppliers react to price variation?
Why might combining experiments with demand modeling be better than either alone?
Demand modeling uses real consumer choices (revealed preferences) rather than survey responses (stated preferences). It’s also confidential and fast compared to running market experiments. What’s the difference between revealed preferences and stated preferences, and why does it matter?
To be fair, all sophisticated predictive and prescriptive analytic techniques require these three. Which of these challenges do you think is the biggest? Which are shared by other demand estimation techniques?

McFadden (1974, 1978, 1981) introduced the Multinomial Logit (MNL) model into Econ by showing it was consistent with theories of rational choice. MNL was based on earlier work in stats & psych (“logistic regression”), and became probably the most popular demand model. Ken Train, who wrote the pre-class reading, assisted McFadden’s research at Berkeley, and later became a Berkeley professor.

This table shows how McFadden’s predictions mapped onto the post-BART choices: he pretty much nailed his BART ridership predictions. This stood in contrast to survey-based results, which predicted much larger BART ridership. This high-profile early “win,” combined with the theoretical interpretation, led to wide adoption of the model. What was the confidence interval on the MNL-predicted share for BART ridership? Why did survey-based predictions overestimate BART ridership compared to the model-based approach?
\[u_{ijt}=V_{jt}+\epsilon_{ijt}=x_{jt}\beta-\alpha p_{jt}+\epsilon_{ijt}\]
\[s_{jt}=Prob\{u_{ijt}>u_{ikt}\forall{k\ne j}\}=\int \left(\prod_{k\ne j} e^{-e^{-(\epsilon_{ijt}+V_{jt}-V_{kt})}}\right) e^{-\epsilon_{ijt}}e^{-e^{-\epsilon_{ijt}}} \, d\epsilon_{ijt} = \frac{e^{x_{jt}\beta-\alpha p_{jt}}}{\sum_{k=1}^J e^{x_{kt}\beta-\alpha p_{kt}}}\]
With \(N_t\) consumers, \(q_{jt}(\vec{x}; \vec{p})=N_t s_{jt}\). Estimating \(\alpha\) and \(\beta\) enables us to predict every product’s quantity response to a change in any product’s attributes \(x_{jt}\) or price \(p_{jt}\).
The integral simplifies because of the distributional assumption on \(\epsilon_{ijt}\); see Train 3.10. What does i.i.d. mean?
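The closed form can be checked by simulation: draw i.i.d. type-1 extreme value (Gumbel) errors, let each simulated consumer pick the max-utility option, and compare empirical choice frequencies to the logit shares. A sketch with made-up utilities, not course data:

```r
set.seed(1)
V <- c(1.0, 0.5, 0)  # assumed deterministic utilities for 3 products
n <- 100000          # number of simulated consumers

# i.i.d. EV1 (Gumbel) draws via the inverse CDF: F(e) = exp(-exp(-e))
eps <- matrix(-log(-log(runif(n * 3))), nrow = n, ncol = 3)

# Each consumer chooses the product with the highest total utility
choice <- max.col(matrix(V, n, 3, byrow = TRUE) + eps)

empirical   <- tabulate(choice, 3) / n
closed_form <- exp(V) / sum(exp(V))
round(rbind(empirical, closed_form), 3)  # the two rows nearly match
```

The match confirms that the messy-looking integral really does collapse to the simple ratio of exponentiated utilities.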
Suppose \(\gamma_{t}\) indicates how popular the category is in market \(t\), so utility is \(u_{ijt}=\gamma_{t}+x_{jt}\beta-\alpha p_{jt}+\epsilon_{ijt}\). Then market share becomes
\[s_{jt}=\frac{e^{\gamma_{t}+x_{jt}\beta-\alpha p_{jt}}}{\sum_{k=1}^J e^{\gamma_{t}+x_{kt}\beta-\alpha p_{kt}}}=\frac{e^{x_{jt}\beta-\alpha p_{jt}}}{\sum_{k=1}^J e^{x_{kt}\beta-\alpha p_{kt}}}\]
We usually normalize \(V_{1t}=0\) for one product, and the other products’ utilities are identified relative to product 1
\[s_{1t}=\frac{1}{\sum_{k=1}^J e^{x_{kt}\beta-\alpha p_{kt}}}\]
Define \(y_{ijt} \equiv 1\{i \text{ chose } j \text{ in } t\}\). I.e., \(y_{ijt}=1\) iff \(i\) chose \(j\) at \(t\); otherwise \(y_{ijt}=0\).
\[\max_{\alpha,\beta}\sum_{i,j,t} y_{ijt}\ln s_{jt}(\alpha,\beta)\]
\[\ln(s_{jt})-\ln(s_{1t})=x_{jt}\beta-\alpha p_{jt}+\xi_{jt}\]
Estimating the model means finding \(\alpha\) and \(\beta\) to maximize the predicted probabilities of chosen products (\(y=1\)), and minimize the probabilities of not-chosen products (\(y=0\))
| Person | Product | \(y\) | \(p\) | \(x\) |
|---|---|---|---|---|
| A | 1 | 0 | — | — |
| A | 2 | 0 | 4 | 5 |
| A | 3 | 1 | 2 | 4 |
| B | 1 | 0 | — | — |
| B | 2 | 1 | 2 | 5 |
| B | 3 | 0 | 2 | 4 |
If \(\alpha=1\) and \(\beta=1\): \(\quad V_1=0\) (normalization); \(\quad V_2=5-p_2\); \(\quad V_3=4-2=2\) \(\quad s_{jt}=\frac{e^{x_{jt}\beta-\alpha p_{jt}}}{\sum_{k=1}^J e^{x_{kt}\beta-\alpha p_{kt}}}\)
Person A chose product 3 and Person B chose product 2. Product 1 is the “outside option” — not purchasing. What happens to the choice probabilities if you change \(\alpha\) to \(1.1\)? What if you change \(\alpha\) to \(0.9\)?
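Plugging the table’s values into the share formula (a sketch of the worked example: person A faces \(p_2=4\), person B faces \(p_2=2\), and product 1 is the zero-utility outside option):

```r
alpha <- 1
beta  <- 1

# Utilities from the table (product 1 is the outside option, V = 0)
V_A <- c(0, beta * 5 - alpha * 4, beta * 4 - alpha * 2)  # person A: (0, 1, 2)
V_B <- c(0, beta * 5 - alpha * 2, beta * 4 - alpha * 2)  # person B: (0, 3, 2)

shares <- function(V) exp(V) / sum(exp(V))
round(shares(V_A), 3)  # P(A chooses 1, 2, 3): product 3 most likely
round(shares(V_B), 3)  # P(B chooses 1, 2, 3): product 2 most likely

# Log-likelihood of the observed choices (A chose 3, B chose 2)
log(shares(V_A)[3]) + log(shares(V_B)[2])
```

Raising \(\alpha\) to 1.1 lowers the utilities of the priced products relative to the outside option, shifting probability toward product 1; lowering \(\alpha\) to 0.9 does the opposite.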
\[\rho=1-\frac{\ln L(\hat\beta)}{\ln L(0)}\]
Hit Rate: % of individuals for whom most-probable choice was actually chosen
R-sq using prediction errors at the \(jt\) level
*Andrews, Fudenberg, Lei, Liang, and Wu (2023) supports this belief
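These fit measures can be sketched on the two-person example above (with \(\alpha=\beta=1\) assumed; \(\ln L(0)\) is the log-likelihood when all coefficients are zero, so each of the three options gets share 1/3):

```r
# Choice probabilities for persons A and B at alpha = beta = 1
s_A <- exp(c(0, 1, 2)) / sum(exp(c(0, 1, 2)))  # A faces V = (0, 1, 2), chose 3
s_B <- exp(c(0, 3, 2)) / sum(exp(c(0, 3, 2)))  # B faces V = (0, 3, 2), chose 2

logL_hat  <- log(s_A[3]) + log(s_B[2])  # log-likelihood at the estimates
logL_zero <- 2 * log(1 / 3)             # all coefficients 0 -> shares = 1/3

pseudo_R2 <- 1 - logL_hat / logL_zero   # McFadden's rho

# Hit rate: fraction of people whose most-probable option was actually chosen
hits     <- c(which.max(s_A) == 3, which.max(s_B) == 2)
hit_rate <- mean(hits)
c(pseudo_R2 = pseudo_R2, hit_rate = hit_rate)
```

Here both people chose their highest-probability option, so the hit rate is 1 even though the pseudo-\(R^2\) is well below 1 — the two measures capture different notions of fit.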

IIA is testable and usually rejected by data. Common remedies: (1) impose structure on the choice set (e.g., Nested Logit, Ordered Logit); (2) relax the i.i.d. \(EV_1\) error assumption (e.g., Multivariate Probit with correlated errors); or (3) change model structure so IIA does not obtain (e.g., heterogeneous logit).
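The classic red-bus/blue-bus example shows why IIA is descriptively implausible. A sketch with assumed utilities (car and bus equally attractive):

```r
shares <- function(V) exp(V) / sum(exp(V))

# Two options, car and red bus, equally attractive: a 50/50 split
s2 <- shares(c(car = 0, red_bus = 0))
s2

# Add a blue bus identical to the red bus. Intuitively the car should keep
# about 1/2 while the buses split the other half, but MNL's IIA forces
# proportional substitution: every option gets exactly 1/3.
s3 <- shares(c(car = 0, red_bus = 0, blue_bus = 0))
s3
```

Under IIA the ratio of car to red-bus shares cannot change when the blue bus enters, which is exactly what the remedies above are designed to relax.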


Better prediction does not always lead to better decisions. What matters is not just how likely a customer is to buy each product, but how products compete with each other for the customer’s attention.

The same Philip Wright we opened with: on the very next page after introducing supply and demand, Wright describes the price endogeneity problem, plus how to resolve it. We need supply shifters to trace out the demand curve. We need demand shifters to trace out the supply curve. If we only observe endogenous P&Q, we cannot estimate either curve, since both shift frequently. We’ve known about price endogeneity for about 100 years now.
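Wright’s supply-shifter idea is the ancestor of instrumental variables. A minimal sketch with simulated data (all parameters assumed): an observed cost shifter moves price but not demand, so a manual two-stage least squares recovers the demand slope that naive OLS misses:

```r
set.seed(100)
n <- 10000
true_slope <- -2                  # assumed demand slope

cost <- rnorm(n)                  # supply shifter: moves price, not demand
d    <- rnorm(n)                  # demand shock, unobserved by the analyst
price <- 10 + 1.5 * cost + 0.8 * d + rnorm(n, sd = 0.2)  # price reacts to both
q     <- 50 + true_slope * price + 5 * d + rnorm(n)      # demand curve

# Naive OLS: biased toward zero because price co-moves with the demand shock
b_ols <- coef(lm(q ~ price))["price"]
b_ols

# Manual 2SLS: keep only the price variation driven by the cost shifter
price_hat <- fitted(lm(price ~ cost))
b_iv <- coef(lm(q ~ price_hat))["price_hat"]
b_iv  # close to the true slope of -2
```

In practice the two stages are run jointly (e.g., `AER::ivreg`) to get correct standard errors; the point here is only that exogenous supply-side variation identifies the demand slope.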
Data science is amazingly useful for making data-driven decisions, but doesn’t always consider the importance of missing data. Econometrics, by contrast, carefully considers what we can or cannot learn from available data, but is more focused on inference than action. What can you gain by combining the two disciplines?
This is a concrete example of how automated pricing creates price endogeneity. The price increase was caused by the demand shock, not the other way around. What would happen if you tried to estimate demand from this data without accounting for the shock?


The Luka trade shifted demand — more fans want to see the Lakers now. Prices and attendance both rose, but the higher price didn’t cause the higher attendance. What’s the omitted variable here?

These automated systems’ pricing decisions ensure that price and quantity variables will both be correlated with unobserved demand- and supply-shifters. Would it be possible to escape price endogeneity in demand estimation?

What is the best measure of price? Is it per-unit, or per-volume? Consumers are more responsive to price adjustments than to changes in product size, so shrinkflation and skimpflation can effectively obfuscate price increases. But what would this do to demand estimates?
Any of these may correlate with equilibrium prices, leading to endogeneity bias if left uncontrolled
This is called the “Problem of Multiple Determinants,” emphasizing the challenge of isolating individual causal relationships when many causes exist, and when causes may co-occur with each other. Where else does this challenge manifest?
This is a great place to ask questions.
This type of mistake could arise from a company failing to identify a relevant competitor, and therefore neglecting to include it in the choice set. Is that a plausible scenario?
```r
library(dplyr)  # for tibble(), mutate(), ntile(), and %>%

# 1. Simulation parameters and correlated cost shocks
set.seed(14)          # for reproducibility
n_periods   <- 100    # number of periods
market_size <- 100    # total market size (e.g., number of customers)
rho         <- 0.9    # influence of shock1 on shock2

# Demand model parameters
alpha      <- 0.2     # price sensitivity (common across products)
intercept1 <- 9       # baseline utility for product 1
intercept2 <- 9       # baseline utility for product 2
# (Outside option utility is normalized to 0)

# Simulate correlated cost shocks for the two firms
shock1 <- rnorm(n_periods)
shock2 <- rho * shock1 + (1 - rho) * rnorm(n_periods)

# Derive prices from costs (higher cost shock -> higher price)
base_cost <- 1
price1 <- 3 * base_cost + shock1
price2 <- 3 * base_cost + shock2
cor(price1, price2)

# 2. Compute market shares and quantities using multinomial logit demand
data <- tibble(
  period = 1:n_periods,
  shock1 = shock1,
  shock2 = shock2,
  price1 = price1,
  price2 = price2
) %>%
  mutate(
    # Indirect utilities for each product and the outside option:
    U1 = intercept1 - alpha * price1,
    U2 = intercept2 - alpha * price2,
    U0 = 0,  # outside option utility (normalized to 0)
    # Convert utilities to choice probabilities (logit formula):
    expU1 = exp(U1),
    expU2 = exp(U2),
    expU0 = exp(U0),
    share1 = expU1 / (expU1 + expU2 + expU0),
    share2 = expU2 / (expU1 + expU2 + expU0),
    Q1 = market_size * share1,
    Q2 = market_size * share2,
    Q0 = market_size * (1 - share1 - share2)
  )

# Create a decile variable for Firm 2's price
data <- data %>%
  mutate(p2_decile = ntile(price2, 10))

# 3. OLS regressions for Firm 1's demand
model_naive <- lm(Q1 ~ price1, data = data)
summary(model_naive)

model_full <- lm(Q1 ~ price1 + price2, data = data)
summary(model_full)
```
```
> # 3. OLS regressions for Firm 1's demand
> model_naive <- lm(Q1 ~ price1, data = data)
> summary(model_naive)

Call:
lm(formula = Q1 ~ price1, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-1.31867 -0.34908 -0.00637  0.32919  1.19378

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.78942    0.17660  293.26   <2e-16 ***
price1      -0.59737    0.05565  -10.73   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5011 on 98 degrees of freedom
Multiple R-squared:  0.5404,    Adjusted R-squared:  0.5357
F-statistic: 115.2 on 1 and 98 DF,  p-value: < 2.2e-16

> model_full <- lm(Q1 ~ price1 + price2, data = data)
> summary(model_full)

Call:
lm(formula = Q1 ~ price1 + price2, data = data)

Residuals:
       Min         1Q     Median         3Q        Max
-4.177e-04 -6.103e-05  7.340e-06  7.187e-05  7.270e-04

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.000e+01  7.456e-05  670528   <2e-16 ***
price1      -4.999e+00  1.321e-04  -37832   <2e-16 ***
price2       4.998e+00  1.489e-04   33571   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0001478 on 97 degrees of freedom
Multiple R-squared:      1,    Adjusted R-squared:      1
F-statistic: 1.226e+09 on 2 and 97 DF,  p-value: < 2.2e-16
```
How do the demand slope estimates differ between the two models? Compare the price1 coefficients.

We used the exact same data to estimate both models, but they produced very different demand curves. Which appears to better fit the data?

Each color indicates a decile of Price2 realizations, such as the lowest 1-10%, 11-20%, …, highest 91-100%. Is price2 an unobserved variable that biases the Naive OLS model’s estimate of how Q1 responds to price1?
Best practice is to use multiple approaches and triangulate.


Use the class script as a starting point. Follow good visualization practices from week 1. Make your visualization beautiful and clear.


