UCSD MGT 100 Week 03
Segmentation: How do customers differ?
Targeting: Which segments do we seek to attract and serve?
Positioning
- What value proposition do we present?
- How do our product's objective attributes compare to competitors'?
- Where do customers perceive us to be?
- How do we want to influence consumer perceptions?
Market mapping helps with Positioning
Market maps use customer data to depict competitive situations. Why?
- Understand brand/product positions in the market
- Track changes
- Identify new products or features to develop
- Understand competitor imitation/differentiation decisions
- Evaluate results of recent tactics
- Cross-selling, advertising, identifying complements or substitutes, bundles...
We often lack ground truth data
- Using a single map to set strategy is risky
Repeated mapping builds confidence
(“Movies, not pictures”)
Many large brands do this regularly
Suppose you are the UCSD Chancellor
You want to know how much each UC Campus competes with you for California freshman applicants
You posit that selectivity and time-to-degree matter most
- Students want to connect with smart students
- Students want to graduate on time
Enter Principal Components Analysis
- Powerful way to summarize data
- Projects high-dimensional data into a lower dimensional space
- Designed to minimize information loss during compression
- Pearson (1901) invented it; Hotelling rediscovered it (1933 & 1936)
For \(J\) products with \(K<J\) continuous attributes,
we have \(X\), a \(J\times K\) matrix
Consider this a \(K\)-dimensional space containing \(J\) points
Center the attributes, then calculate \(X'X\), the \(K \times K\) covariance matrix of the attributes
The first \(x\) eigenvectors of the attribute covariance matrix give unit vectors for mapping the \(J\) products into \(x\)-dimensional space
- We'll use first 1 or 2 eigenvectors for visualization
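A minimal sketch of these mechanics in Python with NumPy; the campus list and the admit-rate and time-to-degree values below are made-up placeholders, not real UC data:

```python
import numpy as np

# Hypothetical attribute matrix X: J campuses (rows) by K attributes (columns),
# e.g. admit rate and average time-to-degree. Values are illustrative placeholders.
campuses = ["UCSD", "UCLA", "Berkeley", "Irvine", "Davis"]
X = np.array([
    [0.24, 4.4],
    [0.09, 4.2],
    [0.11, 4.1],
    [0.21, 4.3],
    [0.37, 4.3],
])

# Center (and scale) each attribute so X'X behaves like a covariance matrix
Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# K x K covariance matrix of the attributes
S = Xc.T @ Xc / (Xc.shape[0] - 1)

# Eigendecomposition; the eigenvectors are the principal components (new axes)
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]      # sort descending by variance explained
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the J products onto the first 2 principal components for mapping
scores = Xc @ eigvecs[:, :2]
for name, (pc1, pc2) in zip(campuses, scores):
    print(f"{name:10s} PC1={pc1:+.2f}  PC2={pc2:+.2f}")
```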
How do I interpret the principal components?
- Each principal component is a linear combination of the larger space's axes
- Principal components are the "new axes" for the newly-compressed space
- Principal components are always orthogonal to each other, by construction
What are the main assumptions of PCA?
- Variables are continuous and linearly related
- Principal components that explain the most variation matter most
- Drawbacks: information loss, reduced spatial interpretability, outlier sensitivity
How do I choose the # of principal components?
- Business criteria: 1 or 2 if you want to visualize the data
- Business criteria: Or, value of compressed data in subsequent operations
- Statistical criteria: Cumulative variance explained, scree plot, eigenvalue > 1
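A brief sketch of these statistical criteria using scikit-learn; the data here are random placeholders, just to show where cumulative variance explained, the eigenvalue > 1 rule, and the scree plot come from:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# X: any J x K attribute matrix (random placeholder data for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))

pca = PCA().fit(StandardScaler().fit_transform(X))

# Cumulative variance explained by the first m components
cum_var = np.cumsum(pca.explained_variance_ratio_)
print("Cumulative variance explained:", np.round(cum_var, 3))

# Kaiser rule: keep components with eigenvalue > 1 (on standardized data)
print("Components with eigenvalue > 1:", np.sum(pca.explained_variance_ > 1))

# A scree plot is just the eigenvalues in order; look for the 'elbow'
# import matplotlib.pyplot as plt
# plt.plot(pca.explained_variance_, marker="o"); plt.show()
```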
What are some similar tools to PCA?
- Factor analysis, linear discriminant analysis, independent component analysis...
K-Means identifies clusters within a dataset
- K-Means augments a dataset by identifying similarities within it
- K-Means never discards data
PCA combines data dimensions to condense data with minimal information loss
- PCA is designed to optimally reduce data dimensionality
- PCA facilitates visual interpretation but does not identify similarities
Both are unsupervised ML algos
- Both have "tuning parameters" (e.g. # segments, # principal components)
- They serve different purposes & can be used together
- E.g. run PCA to first compress large data, then K-Means to group points
- Or, K-Means to identify clusters, then PCA to visualize them in 2D space
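A sketch of both orderings with scikit-learn, on a placeholder customer-attribute matrix; the numbers of components and clusters are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder data: rows = customers, columns = attributes
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 12))
Xs = StandardScaler().fit_transform(X)

# Option 1: compress with PCA first, then run K-Means on the compressed data
Z = PCA(n_components=3).fit_transform(Xs)
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)

# Option 2: cluster the full data, then use 2-component PCA only to visualize
segments_full = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Xs)
coords_2d = PCA(n_components=2).fit_transform(Xs)
print(coords_2d[:5], segments_full[:5])
```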
How to measure intangible attributes like trust?
- Ask consumers, e.g. "How much do you trust this brand?"
- Marketing Research exists to measure subjective attributes and perceptions
What if we don’t know, or can’t measure, the most important attributes?
- Multidimensional scaling
How should we weigh attributes?
Suppose you can measure product similarity
For \(J\) products, populate the \(J\times J\) matrix of similarity scores
- With \(J\) brands, we have \(J\) points in \(J\) dimensions. Each dimension \(j\) indicates similarity to brand \(j\). PCA can project the \(J\) dimensions into 2D for plotting
Use PCA to reduce to a lower-dimensional space
- Pro: We don't need to predefine attributes
- Con: Axes can be hard to interpret
MDS Intuition, in 2D space
- With a ruler and map, measure distances between 20 US cities ("similarity")
- Record distances in a 20x20 matrix: PCA into 2D should recreate the map
- But, we don't usually know the map we are recreating, so we look for ground-truth comparisons to indicate credibility and reliability
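A small sketch of the city-map intuition, assuming rough illustrative intercity distances: metric MDS on the precomputed distance matrix recovers 2D coordinates whose pairwise distances approximate the originals, up to rotation and reflection:

```python
import numpy as np
from sklearn.manifold import MDS

# Approximate straight-line distances (miles) among four US cities; illustrative values
cities = ["New York", "Chicago", "Denver", "Los Angeles"]
D = np.array([
    [   0,  790, 1630, 2450],
    [ 790,    0,  920, 1740],
    [1630,  920,    0,  830],
    [2450, 1740,  830,    0],
])

# Metric MDS on the precomputed dissimilarity matrix returns 2D coordinates
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)
for name, (x, y) in zip(cities, coords):
    print(f"{name:12s} x={x:8.1f}  y={y:8.1f}")
```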
Examples:
- Political science: candidate positioning, e.g. left to right
- Psychology: perceptions and evaluations of personality traits
- Marketing: how consumers perceive brands and product attributes
Demand modeling uses product attributes and prices to explain customer purchases
Heterogeneous demand modeling uses product attributes, prices and customer attributes to explain purchases
- "Revealed preferences": Demand models explain observed choices in uncontrolled market environments
Related: Conjoint analysis estimates attribute weights in simulated choice environments
- "Stated preferences": Conjoint explain hypothetical choice data in controlled experiments
Suppose an English speaker knows \(n\) words, say \(n=10,000\)
How many unique strings of \(N\) words can they generate?
- N=1: 10,000
- N=2: 10,000^2=100,000,000
- N=3: 10,000^3=1,000,000,000,000=1 Trillion
- N=4: 10,000^4=10^16
- N=5: 10,000^5=10^20
- N=6: 10,000^6=10^24=1 Trillion Trillions
- ....
Why do we make kids learn proper grammar?
- Average formal written English sentence is ~15 words
Represent words as vectors in high-dimensional space
- Really, "tokens," but assume words==tokens for simplicity
Assume \(W\) words, \(A<W\) abstract concepts
- Assume we have all text data from all history. Each sentence is a point in \(W\)-dimensional space
We could run PCA to reduce from \(W\) to \(A\) dimensions
- Assume we have infinite computing resources
- We now have every sentence represented as a point in continuous A-space
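A toy sketch of the idea, not how production embeddings are actually trained: count word co-occurrences in a tiny made-up corpus (a crude \(W \times W\) representation) and compress it with PCA so each word becomes a point in a 2-dimensional concept-space:

```python
import numpy as np
from sklearn.decomposition import PCA

# Tiny made-up corpus; here W = vocabulary size and A = 2 compressed dimensions
corpus = [
    "the river bank is steep",
    "the bank approved the loan",
    "the loan rate at the bank",
    "the steep river flows fast",
]
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Word-by-word co-occurrence counts within each sentence (a crude W x W matrix)
C = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    words = s.split()
    for w1 in words:
        for w2 in words:
            if w1 != w2:
                C[idx[w1], idx[w2]] += 1

# Compress W dimensions down to A = 2: each word becomes a point in concept-space
A = PCA(n_components=2).fit_transform(C)
for w in ["bank", "river", "loan", "steep"]:
    print(f"{w:6s} -> {np.round(A[idx[w]], 2)}")
```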
Step 1: Recode the prompt to maximize contextual understanding
- E.g. 'the bank of the river is steep' vs 'the bank near the river is solvent'
- This step is the 'attention' step you hear a lot about
- Basically, modify every word's location depending on every other word's position in the prompt sequence (a minimal numeric sketch follows these steps)
Step 2: Feed the recoded prompt into the transformer as a sequence of points in concept-space
Step 3: Predict the next point and add it to the sequence
Step 4: Repeat Step 3 until there are no more good predictions
Step 5: Repeat Steps 1-4 many, many times; then hire humans to evaluate the results and use those evaluations for RLHF to refine the process
Step 6: Sell access to customers
- Use the money to train a bigger LLM
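A minimal numeric sketch of the attention step in Step 1, assuming a toy 4-token prompt with random placeholder embeddings and projection matrices standing in for trained weights:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy prompt of 4 tokens, each already embedded as a point in 3-dim concept-space
rng = np.random.default_rng(0)
E = rng.normal(size=(4, 3))                  # token embeddings (placeholder values)

# Placeholder "learned" projection matrices for queries, keys, values
Wq, Wk, Wv = (rng.normal(size=(3, 3)) for _ in range(3))
Q, K, V = E @ Wq, E @ Wk, E @ Wv

# Scaled dot-product attention: each token's new location is a weighted average
# of every token's value vector, with weights set by query-key similarity
weights = softmax(Q @ K.T / np.sqrt(K.shape[1]))
E_recoded = weights @ V                      # each position now depends on all others
print(np.round(weights, 2))
print(np.round(E_recoded, 2))
```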
Can generate intelligible semantic sequences
Can help humans save time and effort in semantic tasks
Can uncover previously unknown relations in training data
Can enable semantic analysis of product review corpuses to understand customer perceptions, evaluations and satisfaction
Can’t distinguish truth from frequency in training data
- Need a conceptual model of the world for this
- LLMs propagate popular biases in training data, unless taught otherwise
Can’t reliably evaluate previously-unknown relationships in training data
- At least, not by themselves, or not yet; but maybe soon
Can’t discover new relationships that are not present in training data
- At least, not by themselves, or not yet; but maybe soon
Can’t think, reason, imagine, feel, want, question
- But might complement other components that do these things
Truly, no one knows yet. The tech is far ahead of science
- LLMs are productive combinations of existing components
- This has happened before: stats/ML theory chases applications
- Spellchecker and calculator are wrong long-term analogies
My guesses
- "It's easy to predict everything, except for the future."
- Simple tasks: LLMs outcompete humans
- Medium-complexity tasks: LLMs help low-skill humans compete
- Complex tasks: Skillful LLM use requires highly skilled humans
- Law matters a LOT: Personal liability, copyright, privacy, disclosure
- In equilibrium, typical quality should rise; *not* using LLMs will be a handicap
- Long term: More automation, more products, more concentration of capital
- More word math techniques will be invented, some will be useful
What future new technologies might complement LLMs?
- Argument about Sentient AI comes down to this
- Robots? World models? Causal reasoning engines? Volition?
Until now, we studied existing product attributes
Enter conjoint analysis: generate consumer choice data, then analyze it
- A survey plus a choice model to estimate attribute utilities
Probably the most popular quant marketing framework
- Autos, phones, hardware, durables
- Travel, hospitality, entertainment
- Professional services, transportation
- Consumer package goods
Combines well with cost data to select optimal attributes
Identify \(K\) product attributes and levels/values \(x_k\)
Hire consumers to make choices
Sample from product space, record consumer choices
Specify model, i.e. \(U_j=\sum_{k}x_{jk}\beta_k-\alpha p_j+\epsilon_j\)
and \(P_j=\frac{\exp\left(\sum_{k}x_{jk}\beta_k-\alpha p_j\right)}{\sum_l\exp\left(\sum_{k}x_{lk}\beta_k-\alpha p_l\right)}\)
- Beware: \(p\) is price, \(P\) is choice probability or market share (a numeric sketch follows these steps)
Calibrate choice model to estimate attribute utilities
Combine estimated model with cost data to choose product locations and predict outcomes
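A sketch of the last two steps under hypothetical estimates: made-up part-worths \(\beta_k\), price sensitivity \(\alpha\), candidate designs, prices, and our own unit cost yield logit choice probabilities and an expected profit for our design:

```python
import numpy as np

# Hypothetical estimates from a calibrated conjoint choice model
beta = np.array([0.8, 1.2, 0.5])      # part-worths for K = 3 attributes
alpha = 0.10                          # price sensitivity

# Candidate product designs: rows = products, columns = attribute levels x_jk
X = np.array([
    [1, 0, 1],    # our candidate design
    [0, 1, 1],    # competitor A
    [1, 1, 0],    # competitor B
])
p = np.array([25.0, 30.0, 22.0])      # prices
our_unit_cost = 12.0                  # we only know our own cost

# Multinomial logit choice probabilities / predicted market shares
v = X @ beta - alpha * p              # deterministic utilities
P = np.exp(v) / np.exp(v).sum()

# Expected per-customer profit for our design, given the predicted share
margin = p[0] - our_unit_cost
print("Predicted shares:", np.round(P, 3))
print("Our expected profit per customer:", round(P[0] * margin, 2))
```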
In 2013, Uber hypothesized
- some riders would wait and walk for lower price
- some riders would trade pre-trip predictability for lower price
- shared ridership could ↓ average price and ↑ quantity
- more efficient use of drivers, cars, roads, fuel
Shared rides were new for Uber
- Rider/driver matching algo could reflect various tradeoffs
- POOL reduces routing and timing predictability
Uber had little experience with price-sensitive segments
- What price tradeoffs would incentivize new behaviors?
- How much would POOL expand Uber usage vs cannibalize other services?
Coordination costs were unknown
- "I will never take POOL when I need to be somewhere at a specific time"
- Would riders wait at designated pickup points?
- How would communicating costs upfront affect rider behavior?
So, Uber used market research to design UberPOOL
23 diverse in-home interviews in Chicago and DC
- Interviewed prospective, new, and experienced riders to (1) map each rider's regular travel, (2) explore decision factors and criteria, and (3) do a ride-along for context
- Findings identified 6 attributes for testing
Online MaxDiff (maximum difference scaling) survey
- Participants selected by city, Uber experience and product; N = 3,000, 22 minutes
We may fail to consider the most important attributes or levels
- But we can consult with experts beforehand and ask participants during
Additive utility model may miss interactions or other nonlinearities
- But this is testable & can be specified in the utility model
Exploring a large attribute space gets expensive
- But we can prioritize attributes and use efficient, adaptive sampling algorithms
Assumes accurate hypothetical choices based on attributes
- But we can train participants to mimic more organic choices
Participants may not represent the market
- But we can measure & debias some dimensions of selection
Choice setting may not represent typical purchase contexts
- But we can model the retail channel and vary # of competing products
Participant fatigue or inattention
- But we can incentivize them and check for preference reversals
Consumer preferences evolve
- But we can repeat our conjoints regularly to gauge durability
Market maps use customer data to depict competitive situations
PCA projects high-dimensional data into a lower-dimensional space with minimal information loss
Embeddings represent words as points in concept-space, enabling word-math
Conjoint analysis uses survey choice data to
- map markets
- estimate product attribute utilities
- help design products as bundles of attributes
- predict how location choice leads to revenues, profits