WeWork NPR Podcast

This podcast from NPR is pretty great and sounds an interesting anticipation of all the stuff that just unfolded at WeWork. Some highlights:

  • Adam Neumann used to walk around barechested and barefooted in the elevator in his NYC condo building. There’s a couple of other details of him but he definitely comes out as this quite odd character – no surprise:
seems a pretty normal character, walking barefoot and dodging dog turds in NYC right?
  • At some point in the recording Guy Raz, the host, suggests quickly “…so you guys were bs a little bit?”, (min 21) when referring to the fake credentials the founders used to approach landlords.
    Now I guess it is common to “bs” your way around things when bootstrapping a company but it seems they went as far as installing all the electric wiring on one of their first location apparently without relevant experience or certification…

You can listen the podcast here:

<iframe src="https://www.npr.org/player/embed/643774290/643778532" width="100%" height="290" frameborder="0" scrolling="no" title="NPR embedded audio player"></iframe>
WeWork NPR Podcast

Ecommerce Analytics Playbook – Cohort Analysis

This article describe the steps on how to run a cohort analysis in python. However is also part of something more extended that I always wanted to work on. In the past few weeks I’ve written a series of articles on practical use of Python for specific eCommerce use case. Consider it as a basic toolkit of concepts and methodology for analysts to quickly answer the most relevant questions around an eCommerce business, from overall performance to active/repeat buyers, profitability, lifetime value or basket composition. These should be the essential for any transactional business on the internet.

Please note this is still a work-in-progress. I’m consolidating years of notes and smaller projects, so the list below is likely to change:

  1. Cohort Analysis
  2. eCommerce Exploratory Data Analysis
  3. Association Analysis
  4. Customer Segmentation
  5. RFM Analysis
  6. Customer Lifetime Value
  7. TBD

…ok, but WHY??

Most machine learning and data analytics courses provide a solid foundation on methodology and overall statistical knowledge. This is clearly necessarily and indeed it should be the core of any course, however most of them show significant weaknesses in two main areas:
  1. Examples are usually extremely simplified versions of real problems. While this is great for beginners, the reality is often way less forgiving. The average data available in most companies is often incomplete, requires significant cleaning and feature engineering. The transition from class materials to the real world can be quite unforgiving. The closer to a real world scenario we train and learn, the less intimidated we’re going to be.
  2. second and to me the most relevant: there is very little information available for commerce analytics specifically. Most courses provide general knowledge on methods and techniques, but hardly specific use cases for commerce or marketplaces. And most of these courses focus on quite some aspirational goals totally overlooking the basics (e.g. active/repeat buyers, CLV, basked analysis…). Most students would finalize those courses with a pretty decent understanding of statistics but could hardly relate which tool is best for which problem.
The goal here is to provide the reader with enough knowledge to solve real day to day problems in any commerce or transactional environment using Python and machine learning libraries, clearly relating each use case with the right tool and eventually having a positive impact to the bottom line of the company.

…so who is this for??

I’d rather say for who is this NOT:

  • If you’re an accomplished data scientist on the 0.5% of Kaggle, you’d be better closing this tab and switch over to reddit. I’m afraid there’rs nothing
  • If you’re not a data scientist but know enough Python to build and deploy your own models, you’d be also better getting on reddit now


  • If you’re a marketer or a business lead with a pretty decent technical background and some coding, you could enjoy this
  • If you’re a business analyst with strong SQL skills and decent business acumen, this could be a decent intro (and mostly immediately actionable) to Python and data science.

…fine, I’m neither one of the last two categories but I want to do this. What do I need?

  • Some familiarity with Python. Don’t expect to understand each and every line of code but at least have a grasp of the language
  • Python3.6 installed on your machine (look into this guide for OSX and this other one for windows if you dont have any). Also Jupiter Notebooks is a great tool to build and test stuff locally.
  • Some business understanding (e.g. you must know how an eCom business work, but that should go without saying).

OK, let’s start.

What is Cohort Analysis

Straight up from wikipedia:

Cohort analysis is a subset of behavioral analytics that takes the data from a given data set (e.g. an EMRS, an e-commerce platform, web application, or online game) and rather than looking at all users as one unit, it breaks them into related groups for analysis. These related groups, or cohorts, usually share common characteristics or experiences within a defined time-span.

What this means is that we’ll be slicing our dataset in different bits with the same characteristics and we’ll try to understand how differently these groups behave through time.

Always keep in mind that cohorts are mutually exclusive segments that compare metrics across product lifecycle but also compare metrics across customer lifecycle. Essentially you can have time cohorts, behavior cohorts and size cohorts. For this example we will be looking at behaviour through the customer lifecycle, measuring time between first and following purchases.

Why (eventually) every business needs to do Cohort Analysis

Current eCommerce often struggle to switch from a transactional/sales-oriented to a customer-based perspective. A last-click-sales-oriented perspective of the business is one that primates gross revenue, average order value and sales and often neglects active buyers, repeat buyers, new or reactivated buyers. These metrics are all complementary to each other.
This mindset has a lot to do with the level of maturity of an analytics organization. Early stages eCommerce are mostly focused on sales, mostly because of the lack of volume of clients and the constant needs to refine their operational
These metrics are even more relevant for any SaaS product in  which the transactional perspective is less relevant (you might have a limited number of tiers or plans, with little impact in terms of sales variance) therefore the important metrics are tied directly to the user, the amount of time it decides to remain with the product and what takes them to leave the platform.

The dataset

Here’s the dataset. This is a pretty basic purchase dataset, with customer IDs, OrderDate, OrderID. In the real world you might need to do quite some wrangling before getting to a clear table like this (possibly join a user data table with a purchase table for instance). The most relevant characteristic here is that each purchase or order has a “pickup date” associated and a Order Date, which is also related to a specific user ID. With these three pieces of information we can clearly understand what was a user first purchase, if the same user went back purchasing again later, which is essentially what we want to understand. The time of purchase will be the defying dimension of our dataset.

Now we could easily make things a notch more complicated, adding for instance the “channel” dimension, or the “device” dimension (are users from mobile devices coming from paid search campaigns more likely to repeat their purchase etc. etc.?). In that case we should add this new variable to the mix and create a further drill-down dimension to our cohorts, however for the purpose of this example we would stick to the most basic scenario.

The process

  1. Review the dataset
  2. Create additional attribute and necessary feature engineering
  3. Review and visualize the cohorts

The code

Let’s fire up all the relevant libraries. Numpy and Pandas will be required for some wrangling, Seaborn and Matplotlib for visualization. Datetime will be required to work on the time attribute of the dataset

import numpy as np
import pandas as pd
import datetime as dt
import seaborn as sns
import matplotlib.pyplot as plt

Now we need to read the excel file:

df = pd.read_excel('relay_foods.xlsx')

Let’s look at the dataframe:

OrderId OrderDate UserId TotalCharges CommonId PupId PickupDate
0 262 2009-01-11 47 50.67 TRQKD 2 2009-01-12
1 278 2009-01-20 47 26.60 4HH2S 3 2009-01-20
2 294 2009-02-03 47 38.71 3TRDC 2 2009-02-04
3 301 2009-02-06 47 53.38 NGAZJ 2 2009-02-09
4 302 2009-02-06 47 14.28 FFYHD 2 2009-02-09

The OrderDate column shows year, month and day. For the purpose of this exercise, we’ll be creating cohorts based on months. For this reason, we’ll need an additional column based on the OrderDate column, stating only the month:

#process order period date to cohort label

df['OrderPeriod'] = df.OrderDate.apply(lambda x: x.strftime('%Y-%m'))

Let’s look again at our dataframe:

OrderId OrderDate TotalCharges CommonId PupId PickupDate CohortGroup OrderPeriod
47 262 2009-01-11 50.67 TRQKD 2 2009-01-12 2009-01 2009-01
47 278 2009-01-20 26.60 4HH2S 3 2009-01-20 2009-01 2009-01
47 294 2009-02-03 38.71 3TRDC 2 2009-02-04 2009-01 2009-02
47 301 2009-02-06 53.38 NGAZJ 2 2009-02-09 2009-01 2009-02
47 302 2009-02-06 14.28 FFYHD 2 2009-02-09 2009-01 2009-02

With the lambda function above we’ve taken the data from the OrderDate column and transformed into a YYYY-MM string. This is now our CohortGroup, since it allows as to group the data by the month of the purchase.

df['CohortGroup'] = df.groupby(level=0)['OrderDate'].min().apply(lambda x: x.strftime('%Y-%m'))

If we look at our dataframe now, it will be ordered by UserID with the earliest purchase on top with the relevant cohort group:

UserId OrderId OrderDate TotalCharges CommonId PupId PickupDate CohortGroup OrderPeriod
0 47 262 2009-01-11 50.6700 TRQKD 2 2009-01-12 2009-01 2009-01
1 47 278 2009-01-20 26.6000 4HH2S 3 2009-01-20 2009-01 2009-01
2 47 294 2009-02-03 38.7100 3TRDC 2 2009-02-04 2009-01 2009-02
3 47 301 2009-02-06 53.3800 NGAZJ 2 2009-02-09 2009-01 2009-02
4 47 302 2009-02-06 14.2800 FFYHD 2 2009-02-09 2009-01 2009-02
5 47 321 2009-02-17 29.5000 HA5R3 3 2009-02-17 2009-01 2009-02
6 47 333 2009-02-23 18.9100 RSXQG 2 2009-02-23 2009-01 2009-02
7 47 341 2009-03-01 64.4100 KHBDJ 3 2009-03-03 2009-01 2009-02

It would be all good only if the UserID column wouldnt be used as Index. For the purpose of the following steps, we must make sure that the UserID column is usable and not employed as the dataframe index:


Now we can start aggregating all the relevant variables, Users, Orders and TotalCharges. The lines below apply sum to TotalCharges and run a SQL-like count distinct select to OrderIDs and UserIDs:

grouped = df.groupby(['OrderPeriod','CohortGroup'])

# count the unique users, orders, and total charges per Group + Period
cohorts = grouped.agg({'UserId': pd.Series.nunique,
 'OrderId': pd.Series.nunique,
 'TotalCharges': np.sum})

# make the column names more meaningful
cohorts.rename(columns={'UserId': 'TotalUsers',
 'OrderId': 'TotalOrders',
 'TotalCharges': 'TotalValue'}, inplace=True)

Now that we have all our relevant data nicely aggregated, we still need to build the actual reference for the period of the cohort we want to analise:

def cohort_period(df):
 df['cohort_period'] = np.arange(len(df))+1
 return df

cohorts = cohorts.groupby(level=0).apply(cohort_period)

The function above simply reference each cohort to the month of the first purchase.

We’re almost there: now we just need to visualize things the right way. The Pandas unstuck function allows us to pivot to the required level so that we can shape the dataframe the way we need to to plot a headmap:

cohort_group_size = cohorts['TotalUsers'].groupby(level=0).first()
user_retention = cohorts['TotalUsers'].unstack(0).divide(cohort_group_size, axis=1)

And finally, this is the seaborn code to plot the heatmap:

plt.figure(figsize = (11,9))
plt.title('Cohort Analysis - Retention Rate')
sns.heatmap(data = user_retention, 
 annot = True, 
 fmt = '.0%', 
 vmin = 0.0,
 vmax = 0.5,
 cmap = "YlGnBu")

The end result should look something like this:

Ecommerce Analytics Playbook – Cohort Analysis

Why Amazon is probably the best managed company in the world

From https://www.reddit.com/r/business/comments/544bmu/amazon_closes_at_alltime_high_above_800_for_the/

Jeff Bezos keeps his eye on long-term growth and cares nothing for short-term stock value. So many companies fall apart because their CEOs’ number 1 goal is to raise the value of the stock quarter after quarter. Bezos, on the other hand, does all sorts of things that make good sense but would sound insane to CEOs with this attitude, most famously avoiding profits (in favor of using all of the money to grow bigger). That’s one of the reasons why the stock has a ridiculous P/E ratio, sometimes in the thousands, and yet is a lot better than that would normally indicate.

He also is willing to take plenty of measured risks. Sure, the Fire Phone was a flaming pile of garbage, but the Kindle sure as hell worked out. There’s constantly a non-zero chance that Amazon is going to stumble into some new thing that makes them an unpredicted pile of money.

Also, Bezos is afraid of markets that are too good to be true. He gives the appearance of straight up hating high margins, rightfully fearing that if he’s successful in a high margin business, somebody will undercut him. So he starts with margins low enough to discourage competition.

And everything eventually gets funneled into making deliveries cheaper and faster, trying to make sure that it’s impossible to compete with Amazon on fulfillment.

It reminds me of the same ethic and vision behind Japanese companies (focus on long term growth instead of near term profit, the customer is God etc.). Nintendo is one great example of that, being founded in the late 1800 and went through different business cycles, diversifying to a crazy extent (from playing cards to brothels).

Why Amazon is probably the best managed company in the world

Richard Stallman Uses The Internet His Own Way

I am careful in how I use the Internet.

I generally do not connect to web sites from my own machine, aside from a few sites I have some special relationship with. I fetch web pages from other sites by sending mail to a program (seegit://git.gnu.org/womb/hacks.git[1] ) that fetches them, much like wget, and then mails them back to me. Then I look at them using a web browser, unless it is easy to see the text in the HTML page directly. I usually try lynx first, then a graphical browser if the page needs it (but I make sure I have no net connection, so that it won’t fetch anything else).

I also browse from other people’s computers, with their permission. Since I don’t identify myself to the sites I visit, this browsing can’t be connected with me.

One consequence of this method is that most of the survellance methods used on the Internet can’t see me.

Another consequence is that I never pay for anything on the Web. Anything on the net that requires payment, I don’t do. (I made an exception for the fees for the stallman.org domain, since that is connected with me anyway.)

I would not mind paying for a copy of an e-book or music recording on the Internet if I could do so anonymously, and it were ethical in other ways (no DRM or EULA). But that option almost never exists. I keep looking for ways to make it happen.

His experience is just so different the average user’s. I deeply respect and appreciate his relentless work on privacy, I just wonder how close he is to the average user  and how he can understand the infinite nuances of the media if he is eventually exposed to a very limited and filtered amount of information.

Richard Stallman Uses The Internet His Own Way

CRO and Pricing Optimization

This post previously appeared on semwired.com and is also available on the EIM Blog.

In the world of ecommerce and online marketing, conversion rate optimization (CRO) is used to ensure companies maximize their revenue by providing the best experiences to consumers. It’s a blend of user experience design, data analysis, and strict experimental methodology to test what kind of online experience and product pricing consumers respond most favorably to in order to then convert. In this article, envisionit’s CRO expert, Fabio Fanni, explains some of the methodology we use for our clients to ensure they’re getting the most out of their spending to reach their goals of a larger user base, higher profits, or whatever their target may be.

A/B and Multivariate Testing

A/B testing, or split testing, is a process where we run a simultaneous experiment between two pages to see which performs better. With A/B testing we validate or reject changes and modifications depending on their performance and how they affect our KPIs.

We usually hear about A/B and multivariate testing in UX design, user engagement and conversion optimization. These are some of the most common contexts of use for these tools.

However, for those cases in which there is no customer relationship management (CRM) backing up the pricing strategy, we can actually use these tools to find out the right price range for products yielding the highest revenue per click and profit margin, thus helping us to maximize ROI and advertisement spending.

With this method we are able to find out if a lower price drives more traffic or a higher price reduces conversions but delivers a higher profit for sale. This can be extremely helpful for those products whose pricing can be hard to define (online/digital SaaS) or industries in which the acquisition costs significantly affect net profits (travel, ecommerce retail etc.).

pricing optimization with a/b testing

This chart shows the revenue per visitor on the y-axis and the actual product price on the x-axis. In this example, we tested different prices on a $2 increase scale and tracked revenue per visitor. Considering that a lower price generates a higher conversion rate, revenue per visitor increased up to $3.1 with a $11 price.

Despite the fact that the lowest prices increased the conversion rate and brought in more clients, the improvement was actually offset by the lower price and eventually affected the profit per user. With this method, it is fairly easy to find the sweet spot that generates the highest profit, and most importantly, provides valuable insights about price elasticity.

This process can also be applied with a different objective in mind. Let´s say that we want to grow our user base within the break-even limit, regardless of profit: we would choose the intersection of price with the highest number of conversions in which a profit per user = $0.

Base price and upselling optimization

Regardless of how far we can go optimizing our landing page price, this can turn pretty tricky if we consider a base price and upselling price.

In fact, the right price range for your base product might not be the same for the total upselling price. This case is pretty common in those industries that strongly rely on ancillary revenues (travel and accommodation) and generic upselling (banks, insurance). Upsells tend to create bias and skew value perception, eventually affecting price elasticity.

pricing optimization with a/b and multivariate testing

As for this chart, the optimal price range for the base product is not the same as for the total up selling product. In this case, we would need a multivariate test rather than a standard A/B. We would run multiple combinations of base and total price.

Things to Consider

Now this looks like a reliable way to make some very important pricing decisions. Even though this methodology is solid, before taking on an extensive pricing optimization effort it’s important to always take into consideration the following:

1. Achieving statistical significance could take too long or require too much traffic.

The difference in performance between the different pricing levels could be minimal, which means that the amount of traffic required to achieve significance would be too high. Always make sure you are comfortable with the level of significance you want to achieve and the amount of traffic you need. You might want to find out how much traffic and time you might need in order to achieve those results beforehand using this tool from Optimizely.

2. Continuously testing different prices could confuse users and eventually backfire.

This could be especially true for digital SaaS products whose purchase journey can have multiple touch points and prospective clients might be visiting the pricing page with some frequency. Narrow down the testing to certain geographical areas or in order to avoid this type of problems.

It’s important to keep hard data in mind and use strict methodology when evaluating pricing and optimizing conversions, but intuition and creativity play key roles as well – something Fabio and the rest of the envisionit teams excel at in order to reach our clients’ target customers. Get in touch with us today to learn how we can bring the latest in marketing and creative work to your brand.

CRO and Pricing Optimization