Customer Data and Visualization

UCSD MGT 100 Week 02

Kenneth C. Wilbur and Dan Yavorsky

Let’s reflect

Customer Data

  • Customer analytics is … ?

  • Marketing is … ?

  • Customer data show how customers and consumers learn, feel, behave and use products and services

  • Customer data increasingly drive marketing, but implementation varies widely

How can we use customer data?

  • Customer relationship management: Acquire, develop, retain and “fire” customers
  • Marketing mix: Improve product offerings, prices, promotion, distribution
  • Understand customer heterogeneity for targeting, personalization, recommendations, product development…
  • Privacy and security, e.g. misuse, theft, regulatory compliance

How to evaluate customer data

  • Accurate: The data are what we think they are
  • Representative: The data reflect the relevant customer population as a whole
  • Private: The data do no harm & comply with laws & ethics
  • Relevant: The right data for the decision at hand
  • Complete: Missingness causes problems

3 main types of Customer Data

  • Customer Attributes:
    Demographics, Psychographics, Needs, Behaviors

  • Product Attributes:
    Tangible features, intangible features (e.g., brand, name, warranty), costs

  • Transaction/Event Attributes:
    Sales, retail context, time, “4 P’s”, ads, ratings, reviews, posts, complaints, support requests, returns, referrals

Example: Nielsen

CONSUMER PANEL DATA

      - The Consumer Panel Data include longitudinal data beginning in 2004 with annual updates. These data track a panel of 40,000–60,000 US households and their purchases of fast-moving consumer goods from a wide range of retail outlets across all US markets. 

RETAIL SCANNER DATA

      - Retail Scanner Data consist of weekly pricing, volume, and store environment information generated by point-of-sale systems from more than 90 participating retail chains across all US markets. Data begin in 2006 and include annual updates. 

Customer Data: Guiding Principle

  • Start simple. Complexify slowly. Why?

  • Never assume data are correct, clean, complete or as described.

      - It's impossible to certify an absence of problems
      - Most commercial datasets have issues
      - Issues often detected months after project starts
      - ~70-90% of data scientist time spent checking and cleaning data
      - Credibility is hard to gain, easy to lose
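
A minimal sketch of what "never assume" can look like in practice. The records, field names and rules below are hypothetical, made up for illustration. Checks like these can find problems; they cannot certify that none exist.

```python
from datetime import datetime

# Hypothetical transaction records; all field names and values are invented.
transactions = [
    {"order_id": "A1", "customer_id": "C1", "price": 19.99, "date": "2024-01-05"},
    {"order_id": "A2", "customer_id": "C2", "price": -5.00, "date": "2024-01-06"},
    {"order_id": "A2", "customer_id": "C2", "price": -5.00, "date": "2024-01-06"},
    {"order_id": "A3", "customer_id": None, "price": 42.50, "date": "2024-13-01"},
]


def is_valid_date(text):
    """Return True if text parses as YYYY-MM-DD."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def audit(rows):
    """Count a few common problems. A clean report does not prove clean data."""
    seen_order_ids = set()
    report = {
        "duplicate_order_id": 0,
        "missing_customer_id": 0,
        "negative_price": 0,
        "bad_date": 0,
    }
    for row in rows:
        if row["order_id"] in seen_order_ids:
            report["duplicate_order_id"] += 1
        seen_order_ids.add(row["order_id"])
        if row["customer_id"] is None:
            report["missing_customer_id"] += 1
        # Negative prices may be legitimate returns: flag for review, don't delete.
        if row["price"] < 0:
            report["negative_price"] += 1
        if not is_valid_date(row["date"]):
            report["bad_date"] += 1
    return report


report = audit(transactions)
print(report)
```

Each flagged count is a question to take back to whoever supplied the data, not an automatic correction.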

Data takeouts

  • A great way to start looking at customer data:
    Look at your own!

  • Google Takeout

  • Instagram

  • Amazon Request My Data

        - Or, search "[brand] + data takeout"
        - Let's take a quick look
        - Brands with data takeout often have good analytics

Using Customer Data for Customer Analytics

How do we “do” customer analytics?

  • Decide what we want to do & how to judge performance
  • Collect, wrangle, clean & verify relevant data
  • Analyze data
  • Communicate analyses and recommendations
  • Make decisions
  • Implement data-driven decisions
  • Retrospectively evaluate and improve
  • Repeat
  • …Once you have a stable process, automate carefully & monitor

Challenges: Executives

  • May be territorial, or incentivized to be
  • May worry that analytics will constrain or replace them
  • May think data == magic
  • May prefer hunches or misunderstand uncertainty

Challenges: Analysts

  • Expensive
  • Hard to find
  • Not always current
  • Not always interested in business

Challenges: Cultural

  • Do analytics make or justify decisions?
  • High- or low-trust environment? Tolerance for uncertainty?
  • Do messengers get rewarded or shot?
  • Are data available and integrated?
  • Do teams work together or compete?

Signs of a great analytics org

  • C-level champion(s), i.e. C{E,A,F,M,O}O
  • Centralized team regulates data, arch., standards & tools
  • Decentralized analysts collaborate with execs
  • Analytics career tracks are well established
  • Careful in-housing/outsourcing decisions about analytics
  • Good examples?

Evaluating a commercial data set

  • Utility: What profit impact?
  • Frequency: Movies > Pictures (repeated observations over time beat a one-time snapshot)
  • Reliability: Can you verify it?
  • Privacy: Is it legal? Will customers object?
    Would your use be newsworthy?
  • Size: Sample or population? What identifiers?
  • Cost: Commensurate with utility? Can you test-drive it?

Customer analytics: Tools

  • FORTRAN, C++
  • SQL, SAS
  • Python, R
  • Matlab, Octave, Julia
  • S-Plus, STATA, SPSS
  • Best bundle for new analysts: SQL + Python + R

Analytics truisms

  • Analytics matters more in B2C than B2B (why?)

  • Selection effects are usually large;
    treatment effects are usually small

  • Demographics don’t predict behavior very well

  • Agencies lie about data sometimes

  • You have limited credibility. You may only get a few strikes

  • “If it’s written in LaTeX, it’s probably correct”
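
One way to see the selection-versus-treatment truism is a small simulation. This is a sketch under invented assumptions: a hypothetical loyalty program whose true effect on spend is fixed at 1.0, with heavier spenders more likely to join. None of the numbers come from real data.

```python
import random
from statistics import mean

TRUE_EFFECT = 1.0  # assumed lift in spend caused by joining (hypothetical)


def naive_gap(n=10_000, seed=0):
    """Return mean spend of members minus mean spend of non-members."""
    rng = random.Random(seed)
    members, non_members = [], []
    for _ in range(n):
        baseline = rng.gauss(50, 10)  # spend the customer would have anyway
        # Selection: customers who already spend more are more likely to join.
        join_prob = 0.8 if baseline > 50 else 0.2
        if rng.random() < join_prob:
            members.append(baseline + TRUE_EFFECT)
        else:
            non_members.append(baseline)
    return mean(members) - mean(non_members)


gap = naive_gap()
print(f"naive member vs non-member gap: {gap:.2f}")
print(f"true treatment effect:          {TRUE_EFFECT:.2f}")
```

The naive gap comes out several times larger than the true effect because it mostly measures who chose to join, not what joining did. Randomizing who gets the program would remove the selection part.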

Analytics applications

Marketing Science

The OG Customer Analytics discipline

Marketing Science

  • Branched from mgmt sci & econ in 1950s
  • Theories begin from observations, not assumptions
  • Research often driven by real problems (e.g., “How to…”)
  • Assimilated “consumer behavior” experiments from psychology
  • Open: Relies on anthro, CS, DS, econ, psych, soc, stats, …

Marketing Science Milestones

  • 1910s: Opinion polls became popular
  • 1920s: Questionnaire survey research
  • 1920s: Copy research based on AIDA model
  • 1920s: Retail product sales measurements (Nielsen)
  • 1930s: Field experiments, telephone surveys and panel data
  • 1930s: Radio audiences (Arbitron)
  • 1940s: Panel data on consumer purchases
  • 1950s: TV audiences (Nielsen)
  • 1960s: Companies’ own customer data, RFM metrics
  • 1960s: Inventory tracking data

Marketing Science Milestones

  • 1970s: Census geodemographic data and credit agencies
  • 1970s: UPC codes and digital registers enabled scanner panel data
  • 1980s: Retail loyalty cards, TV set-top boxes
  • 1990s: IRI in-home barcode scanning panel
  • 1990s: CRM software, metrics and personalization
  • 1990s: Email, Online display ads, WWW clickstream data
  • 1990s: Search engine optimization and marketing
  • 2000s: Social network data, Online marketplace data
  • 2000s: User-generated content e.g. reviews, blogs, video; big data tech
  • 2010s: Phones & app usage, geolocation surveillance; many new ML techniques
  • 2020s: Large language models, generative models, …
  • All of these are still in active use; more will be invented

Study Groups: Overview

  • Best approach: Use group to check your work

        - Teaches you & prepares you for the paper-based final
        - Exposes you to no risk
        - Schedule a regular meeting shortly before quiz deadline
        - Schedule a kick-off as get-to-know-you
  • A shaky approach that is permitted

        - Divide up the quizzes; each member prepares 1-2 quizzes and scripts
        - Good luck on the final, maybe you will get lucky!
  • A bad approach that violates academic integrity

        - Sharing answers or scripts across groups
        - We can detect this: Don't let this happen to you
        - The academic integrity process is stressful
  • Note: You are not obligated to join a group; all work is individual

  • Meet your study group
  • Schedule a kick-off meal & regular weekly meetings
  • Input your schedule into Canvas

Data Visualizations (Viz)

  • “The greatest value of a picture is when it forces us to notice what we never expected to see.”

        - John Tukey, inventor of the boxplot
        - Great visualizations raise new questions

When Elon bought Twitter

DraftKings, FanDuel Searches during an NFL Game, 9-10 P.M.

SDPD Crime Reports Near UCSD

Viz build trust and understanding

  • Viz are lingua franca across disciplines
  • Eyeballs can interpret data quickly
  • Easily replicated -> more easily trusted
  • Understandable to managers
  • Can detect unknown errors
  • Best when they raise deeper questions

4 Datasets, Same \(\hat{\beta}^{OLS}\)
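
The slide does not name its four datasets. A well-known example of the same idea is Anscombe's quartet (1973), which is assumed here. The sketch fits a simple OLS line to each dataset by hand.

```python
# Anscombe's quartet (assumed to be the kind of example the slide has in mind).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


fits = {name: ols(x, y) for name, (x, y) in quartet.items()}
for name, (intercept, slope) in fits.items():
    print(f"{name}: intercept={intercept:.2f}, slope={slope:.2f}")
```

All four fits print an intercept near 3.00 and a slope near 0.50, yet scatterplots of the four datasets look nothing alike: a noisy line, a curve, a line with one outlier, and a vertical stack with one leverage point. The coefficients alone cannot tell them apart; the plots can.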

Univariate Viz

  • Numeric data: Summary stats, boxplots for quantiles, histograms for distributions

  • Categorical data: Values, counts and frequencies

  • Alphanumeric data: Random entries and outliers

  • All types: What observations are zero, missing or NA? Why?

      - Perennially a key issue
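
A minimal sketch of these univariate checks using only the Python standard library. The `ages` and `plans` values are hypothetical, and `None` stands in for a missing value.

```python
from collections import Counter
from statistics import mean, median, quantiles

# Hypothetical customer fields; None marks a missing value.
ages = [23, 35, None, 41, 29, 0, 35, 52, None, 38]
plans = ["basic", "pro", "basic", None, "basic", "pro", "free", "basic", "pro", "basic"]


def summarize_numeric(values):
    """Summary stats plus counts of missing and zero entries."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # quartile cut points
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "zeros": sum(v == 0 for v in present),
        "min": min(present),
        "q1": q1,
        "median": median(present),
        "mean": mean(present),
        "q3": q3,
        "max": max(present),
    }


def summarize_categorical(values):
    """Map each level to (count, share of all rows), most common first."""
    counts = Counter("<missing>" if v is None else v for v in values)
    total = len(values)
    return {level: (count, count / total) for level, count in counts.most_common()}


age_summary = summarize_numeric(ages)
plan_summary = summarize_categorical(plans)
print(age_summary)
print(plan_summary)
```

The age of 0 is the kind of value worth asking about: it may be a real entry, a default, or a disguised missing value.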

3 Bivariate Viz Types

  • Continuous vs continuous
  • Continuous vs categorical
  • Categorical vs categorical
  • Which visualization?
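
A sketch of one summary statistic per pair type, with a commonly used plot noted in each comment. The plot suggestions are typical choices rather than the only answers, and the customer records are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt
from statistics import mean

# Hypothetical customers: (monthly_visits, monthly_spend, plan, region)
customers = [
    (2, 25.0, "basic", "west"),
    (4, 38.0, "basic", "east"),
    (6, 61.0, "pro", "west"),
    (8, 77.0, "pro", "east"),
    (10, 104.0, "pro", "west"),
]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Continuous vs continuous: correlation (often plotted as a scatterplot).
visits = [c[0] for c in customers]
spend = [c[1] for c in customers]
r = pearson(visits, spend)

# Continuous vs categorical: group means (often plotted as side-by-side boxplots).
spend_by_plan = defaultdict(list)
for _, s, plan, _ in customers:
    spend_by_plan[plan].append(s)
mean_spend_by_plan = {plan: mean(v) for plan, v in spend_by_plan.items()}

# Categorical vs categorical: cross-tabulated counts (often a heatmap or stacked bars).
crosstab = Counter((plan, region) for _, _, plan, region in customers)

print(f"corr(visits, spend) = {r:.3f}")
print(mean_spend_by_plan)
print(dict(crosstab))
```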

Class script

  • Univariate statistics
  • Univariate plots
  • Bivariate statistics
  • Bivariate plots

Wrapping up

Homework

  • Let’s take a look

Recap

  • Customer data consist of customer attributes, product attributes, transaction/event attributes
  • Good data are Accurate, Representative, Private, Relevant, Complete
  • Data viz are the best way to start in customer analytics

Going further