MGT 100 Week 2
This version: March 2026 | License: CC BY 4.0 | We use JavaScript to track readership.
We welcome reuse with attribution. Please share widely.
Product needs predict behavior far better than demographics. Why might marketers still default to demographic segmentation?

Few people ask for room-temperature tea. Why did Coke introduce Coke Zero when it already had Diet Coke? Sometimes when you try to appeal to everyone, it works; at other times, you appeal to nobody. How can you predict how the market will react to your product?
Segments in this class, ranked by size: (1) BE majors partially interested in Customers/Marketing, (2) BE majors keenly interested in Customers/Marketing, (3) Non-BE majors, often minoring in Business, Marketing or Business Analytics.

Good segments must be measurable – you need data to identify who belongs to each group. Why do you think those high-FICO accounts are unprofitable?

Small segments can sustain profitable niche-focused services, but if you go too small, you can’t sustain operations. Which of these services do you think survived?

Literally, yes – segmenting by gender is sexist by definition. Does gender reflect genuine differences in product needs, or is it just lazy categorization? Who gets to decide?

We have high-quality evidence that some products targeted toward women do not cost more than identical products targeted toward men. Gender-based price differences are sometimes conflated with preferences or costs that correlate with gender. Should we ban price differences between products that target different genders?
Demographics are easily measurable, visually perceivable, and central to how many people identify themselves. But what really predicts customer purchases? Why do customers buy things?

| | Low Price Sensitivity | Mid Price Sensitivity | High Price Sensitivity |
|---|---|---|---|
| Low Income | | | |
| Mid Income | | | |
| High Income | | | |
If Walmart only targets 4 out of 9 segments, how many segments should your business target?

This is a fairly common procedural approach to market research, which usually involves data generation to answer questions about customers (see MGT 108R). What do you notice about segment sizes? Why do the demographic stats follow the user needs?

Why did Mozilla publicize its segmentation scheme? To convince its volunteer developers that typical users are much less technical and interested in simpler features than what those engineers want to build. You must always remember that your customers, by and large, are very different from you, and that you need to rely on data to understand them: your own experience is not reliably predictive.

Internal language shows deep understanding of target customer needs. How would UO customers feel about this characterization?

Previously called “CRM systems,” “data warehouses,” “data lakes.” The names change but the core jobs remain the same. Which of these 4 jobs do you think is hardest to do well?
How can we find out how our competitors are segmenting the market?

This process illustrates how analytics techniques are often most useful when integrated within cross-functional teams, enabling the domain experts and business users to collaborate on the outputs.
\[ \min \sum_{k=1}^K W(C_k)\]
K-Means is an early unsupervised ML algorithm that categorizes objects without known truth. Many alternative approaches exist. K-Means formalizes an intuitive idea: find K groups that maximize internal similarity. How does average internal similarity change with K?
\[ W(C_k)=\sqrt {\sum_{i \in I_k} \sum_{p} (x_{ip}-\bar{x}_{kp})^2}\]
where \(\bar{x}_{kp}\) is the average of \(x_{ip}\) over all \(i \in I_k\), and the centroid is \(C_k=(\bar{x}_{k1}, ..., \bar{x}_{kP})\)
Next, we talk about how we get the \(I_k\) segment membership sets.
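To make the objective concrete, here is a small numpy sketch (the toy data and function name are illustrative, not from the course) that computes \(W(C_k)\) exactly as defined above and sums it across segments:

```python
import numpy as np

def within_cluster_w(X, labels, k):
    """W(C_k): square root of the summed squared deviations of the
    cluster's points from the cluster centroid, per the formula above."""
    cluster = X[labels == k]           # the rows i in I_k
    centroid = cluster.mean(axis=0)    # (xbar_k1, ..., xbar_kP)
    return np.sqrt(((cluster - centroid) ** 2).sum())

# Toy data: 6 customers measured on P = 2 attributes, K = 2 segments.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
labels = np.array([0, 0, 0, 1, 1, 1])

# The quantity K-Means minimizes: the sum of W(C_k) over segments.
objective = sum(within_cluster_w(X, labels, k) for k in (0, 1))
```

Because each toy segment is tight around its centroid, the objective is small; assigning points to the wrong segment would inflate it.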

We usually use ‘hillclimbers’ to optimize numeric functions. Imagine I asked you to find the highest point on campus; how? What if I asked you to find the highest point on Earth?
\(W(C_k)\) is not globally convex, so we can’t guarantee a global minimum. Thus, we run the algorithm from many starting points and keep the run with the lowest total \(\sum_k W(C_k)\).
Note: Some algorithms promise to find the global minimum, but this is only provable for a globally convex function.
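A minimal numpy sketch of the multi-start idea (data, function names, and segment counts are my own, for illustration): each restart is one “hillclimb” from a random starting point, and we keep the best run.

```python
import numpy as np

def lloyd(X, K, rng, iters=50):
    """One K-Means 'hillclimb' from a random start; returns labels and objective."""
    centroids = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid...
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # ...then move each centroid to the mean of its assigned points
        # (keeping the old centroid if a cluster ends up empty).
        centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                              else centroids[k] for k in range(K)])
    obj = sum(np.sqrt(((X[labels == k] - centroids[k]) ** 2).sum())
              for k in range(K))
    return labels, obj

rng = np.random.default_rng(0)
# 60 customers in 3 well-separated segments, 2 attributes each.
X = np.vstack([rng.normal(m, 0.3, size=(20, 2)) for m in (0.0, 4.0, 8.0)])

# Many starting points; keep the run with the lowest objective.
runs = [lloyd(X, K=3, rng=rng) for _ in range(10)]
best_labels, best_obj = min(runs, key=lambda r: r[1])
```

A single unlucky start can park two centroids in one segment; restarting many times makes that outcome increasingly unlikely.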


The idea of Positioning was adapted from military battlefield tactics: Where are our infantry? Where is the enemy? Which troops do we direct where to win? Market mapping enables quantifiable and evaluable strategic decisions.
Vertical (Quality)
Horizontal (Fit or Match)

Hotelling argued persuasively that sellers of a commodity could generate economic profits, simply because consumer transportation costs generate local market power. What is a “marginal consumer”?

What happens when an ice cream vendor moves toward her competitor?
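A quick numerical sketch of the Hotelling intuition (vendor positions and consumer counts are made up): consumers spread uniformly along a line buy from the nearest vendor, so moving toward your competitor captures the customers in between.

```python
import numpy as np

def share_a(a, b, n=100_000):
    """Fraction of uniformly spread consumers whose nearest vendor is A."""
    x = np.linspace(0, 1, n)
    return np.mean(np.abs(x - a) < np.abs(x - b))

# A at 0.2, B at 0.8: they split the beach at the midpoint, 0.5.
even_split = share_a(0.2, 0.8)   # ≈ 0.5

# A moves toward B: the midpoint shifts to 0.65, and A's share grows.
after_move = share_a(0.5, 0.8)   # ≈ 0.65
```

The marginal consumer is the one sitting exactly at the midpoint, indifferent between the two vendors; every move relocates that midpoint.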

A famous theoretical application of the Hotelling line predicts that two competing political parties may adopt similar positions, if both try to appeal to the same median voter, even if nearly every other voter might prefer a different position.
This two-attribute restriction is simply to enable graphing. What other attributes might matter for university choice?

LA vs. Berkeley: Is that ordering right? Market maps are only as good as the data and attributes we choose.

“Movies, not pictures” – tracking positions over time reveals trends that a single snapshot might miss. What changes do you notice?
PCA facilitates market mapping by optimally compressing high-dimensional product attribute spaces to graphable lower-dimensional spaces.
We customarily rank-order the principal components in descending order of variance explained, so the first is always the most important, second is second-most important, etc. What is the max number of principal components possible in a K-dimensional space?

Project each datapoint orthogonally onto the best-fit line (R-sq = .86). This is the core geometric idea behind PCA – finding the direction that explains the most variation in the data.

Now, just graph the 1-dim line with the original points projected onto it. Do the relative conclusions hold up? What does PC1 mean? How much information did we lose? Note, these are standardized data; that’s why some relative positions change. Now imagine projecting from 20-dimensions down to 2.
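A minimal numpy sketch of that projection idea (the data are simulated for illustration): standardize, find the top principal direction via the SVD, and project each point orthogonally onto it.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two correlated attributes for 50 "universities" (simulated).
x1 = rng.normal(0, 1, 50)
X = np.column_stack([x1, 0.9 * x1 + rng.normal(0, 0.3, 50)])

# Standardize the data, as in the example above.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# The right-singular vectors of Z are the principal directions,
# ordered by variance explained.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
pc1 = Vt[0]                 # direction of maximum variance
scores = Z @ pc1            # each point projected onto the PC1 line

# Share of total variance captured by PC1 — the "information kept."
var_explained = S[0] ** 2 / (S ** 2).sum()
```

With strongly correlated attributes, PC1 keeps most of the variance, which is why the 1-dimensional picture preserves the relative conclusions.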
Key PCA choices: Which variables to include, how many PCs to keep, how to interpret locations in space. What would you do if the first component explains only 30% of variance?
PCA compresses; K-Means groups. They complement each other well: compress first to remove noise, then cluster, or cluster first then compress to visualize. Why does the order matter?
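A sketch of the compress-then-cluster order, using numpy for the PCA step and scipy’s `kmeans2` for the clustering (data, dimensions, and segment counts are invented for illustration):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(2)
# 3 true segments of 50 customers each, observed in 10 noisy attributes.
centers = rng.normal(0, 5, size=(3, 10))
X = np.vstack([c + rng.normal(0, 1, size=(50, 10)) for c in centers])

# Compress first: PCA down to 2 dimensions via the SVD.
Z = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
X2 = Z @ Vt[:2].T          # 150 points, now in a graphable 2-dim space

# ...then cluster in the compressed, denoised space.
np.random.seed(0)
centroids, labels = kmeans2(X2, 3, minit='++')
```

Clustering after compression works in a space where noise dimensions have been discarded; clustering first and compressing afterward instead gives you a 2-dimensional picture of groups formed in the full space. The two orders can disagree, which is why the choice matters.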

Understanding the landscape of ML algorithms helps you choose the right tool for the job. Where does K-Means fit?
Marketing research measures subjective perceptions. Demand modeling can help weigh attributes.

Applications: political candidate positioning, personality trait perception, brand/product perception.

Netzer et al. pioneered using text data from online car reviews to construct market maps. Brand similarity rankings came from organic pairwise comparisons within text reviews. How does this approach compare to marketing research surveys?

The map on the left is based on co-occurrence of car brands within text reviews; the map on the right is based on switching data from JD Power. The map clearly shows distinct submarkets. How do these two data sources compare? Comparisons like this are called “convergent validity.”

The authors also showed how to relate MDS results to attribute space. Notice that “attributes” are N-grams of words. Which features do consumers use to compare these brands?
How does demand modeling differ from PCA in choosing product attribute weights?
Grammar narrows down the set of unique sentences a person can generate, making prediction easier and enabling mutual understanding. The combinatorial explosion of language is a core challenge for any text analysis method.

Embeddings connect text analysis to the PCA and dimensionality reduction ideas we just covered. Words become points in a continuous space, enabling mathematical operations on language. How does this compare to what we did with product attributes?

Other classics include Paris - France + Germany ≈ Berlin ; dollar - USA + UK ≈ pound ; Google - search + social ≈ Facebook. Triton - UCSD + UCLA ≈ ____ ?
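These analogies amount to vector arithmetic plus a nearest-neighbor lookup. A toy sketch with hand-built 3-dimensional “embeddings” (real embeddings are learned from text and have hundreds of dimensions; these values are invented so that each capital sits at its country plus a shared offset):

```python
import numpy as np

vecs = {
    "France":  np.array([1.0, 0.0, 0.20]),
    "Paris":   np.array([1.0, 1.0, 0.25]),
    "Germany": np.array([0.0, 0.0, 0.10]),
    "Berlin":  np.array([0.0, 1.0, 0.15]),
    "UK":      np.array([0.5, 0.0, 0.90]),
    "London":  np.array([0.5, 1.0, 0.95]),
}

def nearest(v, exclude):
    """Closest word to v by cosine similarity, skipping the query words."""
    sims = {w: v @ u / (np.linalg.norm(v) * np.linalg.norm(u))
            for w, u in vecs.items() if w not in exclude}
    return max(sims, key=sims.get)

query = vecs["Paris"] - vecs["France"] + vecs["Germany"]
answer = nearest(query, exclude={"Paris", "France", "Germany"})  # "Berlin"
```

The subtraction isolates the “capital-of” direction; adding it to Germany lands near Berlin.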

Suppose we had many sentences, and defined 13 binary attributes for the presence of each of these 13 words. We could run PCA to reduce that 13-dimensional word space to a 2-dimensional concept space. Suppose we were to add a third dimension; what might it represent?

We could visualize each sentence within the concept space as a sequence of points. We could use this sequence data to train an autoregressive model to predict each sentence’s next point given the sequence’s history up to that point. What would that model commonly be called?

Transformers were developed to translate languages, e.g. mapping ‘sombrero’ to ‘hat.’ When trained well, they can transform data of generic type x to generic type y. There has been rapid progress in the past 20 years thanks to digital data, faster computers, and better algorithms.

Have you ever trained a dog to sit? If yes, you provided the reward function: You get treat/praise if and only if you sit. The dog learns its “policy function” through experimentation. What is the reward function in RLHF for LLMs?
Train once: Pre-train on all digital text to learn possible sequences in concept-space; train via RLHF
Then, when you input a prompt:
Sell access to customers, train a bigger model, teach the model to improve itself, take over the world.
Newer ‘reasoning’ features break this process into sub-steps for complex prompts, and evaluate sub-sequence utility via narrower versions of RLHF. For example, if a prompt response includes specifying and then solving a math problem, the output can be compared against a database of correctly solved problems; if the solution is wrong, it is discarded and a new sequence is generated. This usually involves generating multiple sequences and outputting the most-rewarded one.

Try to make your visualization easy to understand. Test it on your friends before you submit.


