Sales for Engineers

Most engineers don’t think about sales; to them it is a foreign concept. In my nearly seven years at Intel I never once saw a salesperson, and Jake never once met one at Google. If you worked at smaller companies you probably saw these elusive creatures, and even interacted with them. (After leaving, we both met salespeople from these companies; yes, they do exist.) Now say you are an engineer with a good idea and you want to turn it into a product. Guess what: if you can’t sell, or hire someone who can, your product is just a science project, not a business.

Don’t worry: sales can be learned, and if you read on I will give you a brief introduction. Once we define a few terms, sales can become very intuitive for an engineer.

OK, the first thing you need to know is that you are going to have to pay some money to get your product or service into the hands of customers. Put another way, you are going to have to allocate some of the profit from each sale to getting it into the hands of your customer. Peter Thiel calls this distribution. There are other terms, but to be consistent we will stick with Peter’s terminology. Distribution has a cost: everything that is sold has some cost associated with getting it in front of a buyer. If you buy a commodity on an exchange, you pay a transaction cost. If you buy a house, the real estate agent gets a commission. If you can think of an example of a sale without a transaction cost, please email me. You can think of this cost as the frictional losses from your physics class.
OK, here is the big point: these friction losses have to be less than the profit you make on each item. If the losses are greater, you will lose money on every sale. This frictional loss, or transaction cost, is usually referred to as:
CPA (Cost per customer acquisition)
We usually talk in terms of how much money the customer will give us after we have acquired them, which is denoted by:
LTV (Lifetime Value) of a customer.
If we can convince a customer to buy a product and keep coming back and paying us every day, month, or year, that is great: we only have to sell to them once. Most companies think this way, so they define the value of a customer as the amount they can extract from that customer over the customer’s lifetime. If the customer buys once and never comes back, the LTV is just the value of that one sale.

So back to the frictional losses, the frictional losses (CPA) have to be less than profit per customer (LTV):


CPA < LTV

This is the bound within which all your distribution decisions will be made. If your pricing is flexible, you will probably need to adjust it until this inequality holds. I have read some rules of thumb about CPA and LTV, such as: LTV/CPA should be greater than three. For this discussion, let’s just say it has to be greater than one.
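To make the inequality concrete, here is a minimal Python sketch of the check. It assumes LTV is simply profit per purchase times the number of purchases over a customer’s lifetime (no discounting or churn modeling), and the function names and dollar figures are hypothetical illustrations, not data from any company mentioned here.

```python
# Minimal sketch of the CPA < LTV check. Assumes LTV = profit per purchase
# x purchases over the customer's lifetime; all numbers are hypothetical.

def lifetime_value(profit_per_purchase, purchases_per_lifetime):
    """Total profit collected from one customer over their lifetime."""
    return profit_per_purchase * purchases_per_lifetime


def check_unit_economics(ltv, cpa, target_ratio=1.0):
    """True if LTV/CPA exceeds target_ratio (1.0 means CPA < LTV)."""
    if cpa <= 0:
        raise ValueError("CPA must be positive")
    return ltv / cpa > target_ratio


# Hypothetical subscription: $20 profit per month, kept for 24 months.
ltv = lifetime_value(20.0, 24)
cpa = 150.0
print(ltv, ltv / cpa)                        # 480.0 3.2
print(check_unit_economics(ltv, cpa))        # True: CPA < LTV
print(check_unit_economics(ltv, cpa, 3.0))   # True: also clears the 3x rule of thumb
```

Passing target_ratio=3.0 applies the stricter rule of thumb instead of the break-even bound.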

The next thing I want to talk about is the methods of distribution. When I was first learning about sales and distribution, I took notes on the LTVs of every company I met with. I will not share their data here, but I have given some illustrative examples below in Figure 1.


Figure 1. Lifetime value vs. different distribution methods

Figure 1 plots the LTV of many products against the distribution methods used to reach their customers. The right-hand side has sales, which here means humans are involved in every sale. Sales is a very broad term here: it covers people making calls, sending emails, or visiting the customer face to face. At the extreme end you have space vehicles and government contracts, where the sale takes a long time, involves many people, and has the CEO calling on customers directly. At the other end of the sales spectrum you may have a call center in the Philippines cold-calling businesses.

The eye-opening thing for me, when I first started collecting data like Figure 1, was that there was a sharp line where companies stopped using “sales”. Enterprise sales experts usually say this line is somewhere around $10k; in practice, however, I’ve seen scrappy, hungry companies get very creative and push it down to near $500.

Companies may choose to use multiple distribution methods.  We will next talk about the funnel and what sets the dotted line somewhere between $500 and $10k.  

The sales funnel describes the steps along the path to a sale. It applies to both sides of the dotted line in Figure 1: on one side a human being is involved, while on the other the process is completely automated. In Figure 2 I have shown two example funnels; the left-hand funnel shows an online text advertisement, while the right-hand one shows a targeted phone-calling campaign.

Figure 2. Hypothetical sales funnels: an online text ad (left) and inside sales (right).

The two funnels in Figure 2 use different technologies, but the process is the same: you start with an initial group, and it is winnowed down until you reach a group that will purchase. The images in Figure 2 are not drawn to scale; for the online text ad, probably only around 1 out of 100 people who see the ad actually click on it. This all depends on your ad and what you are selling, but for the sake of argument let’s say 1 out of 1000 people who click actually purchase the product, and that it costs $0.10 each time someone clicks the ad. How much profit do you have to make on the product for this to be worth your time? How much are you spending to get that one sale? You are spending 1000 × $0.10 = $100 to get one sale, so you need to make over $100 in profit from that sale or you will lose money on each transaction. I described the cost as CPC (Cost Per Click); you can instead pay by the number of people who see your ad, in which case you will be interested in the CPM (cost per thousand impressions). There is a whole universe of terms to look up here. Google makes this super easy: you can set up a campaign over a few days to see how it works, spend $50 or $100, and see how many people come to your site.
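The same arithmetic as a small Python sketch. The first function assumes pay-per-click pricing, the second pay-per-impression (CPM) pricing; the function names and the CPM example numbers ($5 CPM, 1% click-through, 1% conversion) are hypothetical.

```python
# Ad spend needed per sale under two pricing models. Names and the CPM
# example figures are hypothetical illustrations.

def cost_per_sale_cpc(cost_per_click, conversion_rate):
    """CPC pricing: pay for every click; conversion_rate = sales per click."""
    return cost_per_click / conversion_rate


def cost_per_sale_cpm(cpm, click_through_rate, conversion_rate):
    """CPM pricing: cpm is the price of 1,000 impressions."""
    cost_per_impression = cpm / 1000.0
    # Fraction of impressions that end in a purchase.
    sales_per_impression = click_through_rate * conversion_rate
    return cost_per_impression / sales_per_impression


# The example from the text: $0.10 per click, 1 purchase per 1,000 clicks.
print(cost_per_sale_cpc(0.10, 1 / 1000))    # about $100 per sale
# Hypothetical CPM campaign: $5 CPM, 1% of viewers click, 1% of clickers buy.
print(cost_per_sale_cpm(5.0, 0.01, 0.01))   # about $50 per sale
```

Either way, the number to compare against is the profit per customer: the ad spend per sale is your CPA for this channel.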
The right-hand funnel in Figure 2 is similar, but transactions are handled on the phone by sales reps. On the left-hand side you have to pay for advertisements and for hosting your site, and that’s about it for your costs. On the right-hand side you have to pay people, and you may have to pay for them to travel, eat, etc.

Both of these funnels have multiple stages that you can optimize to produce the maximum output (sales). When sales is presented this way, it begins to resemble a mining machine or a factory, which is much easier for a technical person to understand. I think mining is the best analogy, as you are trying to separate valuable material from worthless material.

Both of these funnels work a lot better with higher-quality input. For online advertising, that means making sure the ad is targeted correctly. For sales with humans making the sale, that means getting the best list of people to start calling. There are a number of ways companies can prefilter the list of people they want to contact before they pick up a phone. A large team may have a set of junior sales professionals doing prefiltering before passing prospects on to more senior (expensive) sales masters. A number of startups have even used algorithms to prefilter prospects before a junior sales professional interacts with them.
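One way to see the mining-machine view is to model a funnel as a chain of stage conversion rates: expected sales are the input size times the product of the rates. The sketch below uses hypothetical stage names and rates for an inside-sales funnel, and models a better-targeted list, as an assumption, by raising only the "interested" rate.

```python
from math import prod


def funnel_output(leads, stage_rates):
    """Expected sales: leads times the product of every stage's pass-through rate."""
    return leads * prod(stage_rates.values())


# Hypothetical inside-sales funnel (fraction passing each stage).
inside_sales = {"reached by phone": 0.30, "interested": 0.20, "buys": 0.25}

# Same funnel fed by a better-targeted list: more reached prospects are interested.
targeted = dict(inside_sales, interested=0.40)

print(funnel_output(1000, inside_sales))  # roughly 15 sales
print(funnel_output(1000, targeted))      # roughly 30 sales
```

Because the stages multiply, doubling any single stage's rate doubles the output, which is why both stage-by-stage optimization and input quality matter.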
Where do these prospects come from? It depends on the organization: some have inbound prospects, some buy lists of prospects, and I’ve even heard of people using a telephone book. Obviously the telephone book is not as good as a solid targeted list. With Zillabyte you can generate these lists segmented along a number of variables, such as business properties or the technologies a company is using. The key thing to remember is that you want to find people who need what you are selling and are ready to buy it. The more intelligence you have on your customers, the easier it will be to determine this.

Why Scientific Sales?

Here at Zillabyte we are scientists and engineers who have taken an interest in sales. Why would we take an interest in sales when there are so many other things we could study? Sales is very important to a business; without a functioning sales mechanism a business goes nowhere. Sales is the point of contact between a business and its customers. This is similar to the tires on a car: a car may have a monster engine, but without tires it is not moving. Even with a set of tires, a car will not perform up to its potential without the right tires. Tires are the mechanism that provides contact between the car and the road. Sales provides the revenue needed to build a factory and provide jobs for people.
For an engineer or scientist, sales is interesting in the same way that the stock market is interesting: it is an intersection of social science, psychology, data science, statistics, and common sense. Many PhDs who found their way to Wall Street can testify that understanding these elements is not easy, but it is incredibly rewarding.
That’s what motivates us at Zillabyte. I hope you will continue reading our Scientific Sales blog, and I hope it is helpful for you and your sales endeavors.

Do people give higher ratings to more expensive restaurants?

I was sitting in the SGD Tofu House and decided to see what people thought of this restaurant online. I looked at a popular restaurant review site and found that the SGD Tofu House had only four stars out of five. What? How could this be true? The place is always packed, a series of newspaper articles has been written about it, and you always get enough to eat for a decent price. How could anyone give it anything other than five stars? I did some more searching on this restaurant review site and found that restaurants with five stars cost a lot more. I decided to collect some data to see whether price and average rating were indeed correlated.

I started looking into this phenomenon and found that something similar happens with iPhone apps: people rate paid apps higher than free apps (see more here). This has actually been studied before and is called the endowment effect. I wanted to title this article:

Evidence of endowment effects in online restaurant reviews

I’m not sure I could get much attention with a title like that. The effect makes sense: if you paid a lot for something, you wouldn’t want to admit that you were a fool to buy a bad product, since that would reflect poor decision-making on your part.

The more I thought about restaurant ratings, the more I started to wonder: what does a star mean? Is it a measure of the pleasure or satisfaction you gain from a dining experience? I am more interested in satisfaction per dollar. I expect to get better food and service for more money; no one should be lauded for that. That’s just my opinion, and after thinking about this too much I collected some data and created a linear model to predict the number of stars from the price. The data is shown below in Figure 0.

Figure 0. Restaurant average rating vs. price.

The data in Figure 0 was taken from 100 restaurants in the San Francisco Bay Area. The price can take one of four distinct values, and the average rating ranges from one to five stars. Restaurants were only considered if they had more than 20 reviews. Many of the data points fall on top of each other, so Figure 0 is not very informative to look at. A linear regression was performed on the data in Figure 0, yielding the following coefficients:

Average Rating = 0.0736(price) + 3.3864

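What this means is that price does not have a huge impact on the rating. With the four price levels coded as consecutive numbers, each step up in price adds only about 0.07 stars, so the most expensive level is predicted to score only about 0.22 stars (3 × 0.0736) above the cheapest. Or, to answer the question posed in this post’s title: “not much”.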
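For readers who want to reproduce this kind of fit, here is ordinary least squares with one predictor in plain Python. It is a sketch, assuming price is coded as the numbers 1 through 4; the sample ratings are made up for illustration and are not the 100 restaurants behind Figure 0, so they will not reproduce the coefficients above.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the line minimizing squared vertical error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: price level 1-4 ($ to $$$$) and average star rating.
prices = [1, 1, 2, 2, 3, 3, 4, 4]
ratings = [3.5, 3.5, 3.5, 4.0, 3.5, 4.0, 3.5, 4.0]
slope, intercept = least_squares(prices, ratings)
print(f"Average Rating = {slope:.4f}(price) + {intercept:.4f}")
```

A small slope relative to the intercept is the signature of "price barely moves the rating".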

Looking into this led to some interesting observations that warrant more study. While collecting the data, it became apparent that some of the data presented by this restaurant review site is altered. It is well known that this restaurant review site will allow restaurants to remove negative reviews for a fee. There was a surprisingly high number of cupcake shops with perfect ratings. I’m not sure why this is, but it would be interesting to look into this further and come up with a method for identifying altered average ratings. Send me an email if you are interested in looking into this.

Hedging the Election

The elections are upon us and the candidates are vying for campaign contributions. Most donations are given to a single candidate. However, a few people choose to donate to multiple candidates. We call this a pair-wise contribution.

This is an interesting behavior. It amounts to funding two warring parties. If a person invests in both the Romney campaign and the Perry campaign, then surely at least one will lose. We can only guess at people’s motivations, but it seems likely that donors are approaching the election like an investment. They are hedging their bets across multiple candidates.

The Federal Election Commission publishes a dataset with all campaign contributors.  We filter this dataset to find all pair-wise contributors. That is, we find all people who have contributed to at least two candidates. The interesting point is that not all pair-wise contributions are split evenly. 

The above visualization shows how pair-wise contributions are divided. For example, consider the Romney-Huntsman pair: the data indicates that, on average, people who contributed to both campaigns gave Romney about 10x more money.

Even more interesting is the number of pair-wise contributions involving President Obama. The data indicates that people who contribute to Obama AND a Republican favor Obama by roughly 10x. The exception is Romney: people who contribute to Romney AND Obama favor Romney by roughly 10x.
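A sketch of the pair-wise filter and split computation, assuming the FEC records have already been parsed into (donor, candidate, amount) tuples. The tuple layout, donor IDs, and amounts are hypothetical, and averaging each donor's ratio is only one plausible reading of "on average"; the post does not say exactly how the averages were taken.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_ratios(contributions):
    """Map each candidate pair (a, b), sorted by name, to the average of
    dollars_to_a / dollars_to_b over donors who gave to both."""
    # donor -> candidate -> total dollars given
    totals = defaultdict(lambda: defaultdict(float))
    for donor, candidate, amount in contributions:
        totals[donor][candidate] += amount

    ratios = defaultdict(list)
    for by_candidate in totals.values():
        # Ignore candidates with a non-positive net total (e.g. refunds).
        given = {c: amt for c, amt in by_candidate.items() if amt > 0}
        if len(given) < 2:
            continue  # single-candidate donor: not a pair-wise contributor
        for a, b in combinations(sorted(given), 2):
            ratios[(a, b)].append(given[a] / given[b])
    return {pair: sum(rs) / len(rs) for pair, rs in ratios.items()}


rows = [  # hypothetical sample records
    ("donor1", "Romney", 2500.0), ("donor1", "Huntsman", 250.0),
    ("donor2", "Romney", 1000.0), ("donor2", "Huntsman", 100.0),
    ("donor3", "Obama", 500.0),
]
for (a, b), ratio in pairwise_ratios(rows).items():
    favored, factor = (a, ratio) if ratio >= 1 else (b, 1 / ratio)
    print(f"{a}-{b}: donors to both favored {favored} by {factor:.1f}x")
```

Averaging per-donor ratios and taking the ratio of summed dollars are both reasonable definitions, and they can disagree when a few large donors dominate.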

Hue Histograms

In a previous post, we described hue histograms and color pies as great ways to visualize the color structure of an image. Today, we’re open-sourcing the Python scripts used to create these visualizations, and introducing a new hybrid version that looks like this:

This graph is based on this source image:

The center of the graph is a color pie - a set of representative colors for the given image, with pie slices sized to match the portion of the image they represent. Along the outside is a hue histogram that’s been wrapped around the pie, which works well because the range of hues itself is periodic - there is no natural starting or ending point in the color spectrum.

As of today, generating these graphs is easy. The source is available here:

https://github.com/tylerneylon/imghist

Make sure to install pypng and pycairo to get this script to work. Then type:

./imghist.py --mode=both --size=600x600 sourceImage.png outputImage.png

to create a hue histogram and color pie graph for sourceImage.png. You can also set the mode to “hist” for just a hue histogram, or to “pie” for just a color pie.

Here are a few more examples to see how this graph relates to the source images:

Image credits: 1 2 3 4.

Correlations Between Endowment Funds And School Rankings

I was listening to a speech the other day and the speaker asserted that the best schools keep getting better because they have money. Money allows the school to better prepare its students to make money. These successful alumni donate money back to the school, and the cycle repeats itself.

Around this same time I was reading More Money Than God by Sebastian Mallaby. This book is a history of hedge funds, and it dedicates one chapter to the story of Yale’s (and later other schools’) switch to aggressive management of the school’s endowment. An endowment is simply a fund with restrictions on how the money can be used. The idea is that the fund stays the same size and the profits made from investing it are used to fund the school. Private schools, which do not draw support from the government, typically have larger endowments than public schools. Schools typically take ~5% of the value of the endowment each year. I wanted to see whether the size of the endowment really made schools better, so I decided to look at some data. I took the top 50 schools from the U.S. News & World Report rankings and looked up the size of their endowments. I was surprised at how well the results turned out. Figure 0 plots each school’s ranking on the vertical axis against its endowment in billions of USD on the horizontal axis. Harvard (ranked #1) is the rightmost data point, with an endowment of $27.4 billion.

When I first plotted the data, a pattern seemed to emerge, so I fit a best-fit curve to the data points. The data tends to follow an inverse pattern like ranking = c/endowment, where c is some constant. This best-fit curve describes how the average school performs. Since we have an idea of the average school, we can see which schools underperform or overperform. Figure 1 shows the regions where schools performed better or worse than the average given the size of their endowment.
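As a sketch of how such a curve can be fit: minimizing the sum of (r - c/e)^2 over c has a closed form, since setting its derivative to zero gives c = sum(r/e) / sum(1/e^2). This is an assumption about the fitting procedure, not necessarily the one used for Figure 1, and the endowments and ranks below are hypothetical.

```python
def fit_inverse(ranks, endowments):
    """Least-squares estimate of c in rank = c / endowment."""
    num = sum(r / e for r, e in zip(ranks, endowments))
    den = sum(1 / e ** 2 for e in endowments)
    return num / den


def residuals(ranks, endowments, c):
    """Actual rank minus predicted rank. Positive = worse (larger-numbered)
    rank than the fit predicts, i.e. underperforming for its endowment."""
    return [r - c / e for r, e in zip(ranks, endowments)]


# Hypothetical schools: endowment in billions of USD, and ranking.
endowments = [20.0, 10.0, 4.0, 2.0]
ranks = [1, 3, 4, 10]
c = fit_inverse(ranks, endowments)
for e, res in zip(endowments, residuals(ranks, endowments, c)):
    verdict = "under" if res > 0 else "over"
    print(f"${e:.0f}B endowment: {verdict}performs by {abs(res):.2f} ranks")
```

Fitting in rank space weights schools with small endowments heavily, because c/e changes fastest there; fitting on a log scale would be a reasonable alternative.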

The school that performed worst relative to its endowment was UT Austin; the University of Michigan was also a poor performer. Tech-centric schools like Caltech and Carnegie Mellon have small endowments but are ranked high. The data may be skewed against public schools like the University of Texas and the University of Michigan for multiple reasons: 1. The endowment data is given for the whole university system, while the ranking is for a single campus. 2. Traditionally, public schools have received funding from the state government, so they may not have historically been reliant on begging alumni for money.

I would like to see what this looks like with more data. If anyone wants to continue my work, send me an email and I can give you the data for this plot.

Color as Data

Great data analysis is beautiful.  Data is only useful once we understand it; it is critical to use tools and perspectives that fit the information you begin with and provide the information you want to extract.

Typical Color Histograms

Let’s take a look at treating color as data.  When working with a digital image, artists or photographers often view its color histogram to be aware of the image’s overall brightness, saturation, and primary hues.  A histogram is a graph showing the frequency of different items - higher points denote more frequent items.  A typical (not beautiful) color histogram looks like this:

There are three overlapping graphs here, one each for red, green, and blue, with their combinations shown where they overlap (for example, red and green light combine to give yellow). The color of every pixel in an image is described by three numbers, called channels, giving the proportions of red, green, and blue light to emit. This graph is created by building one histogram per color channel. The left end of the graph denotes low values, corresponding to pixels with very little of that color, while the right side denotes high values.

In the example histogram, there’s a blue spike near the left side.  This means the image in question has a large number of pixels with low blue values.
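To make the per-channel construction concrete, here is a minimal Python sketch that builds one independent histogram per channel from 8-bit RGB pixels. The function name and the tiny two-pixel images are hypothetical examples.

```python
def channel_histograms(pixels, bins=256):
    """One independent histogram per RGB channel, for 8-bit (0-255) values."""
    hists = {"red": [0] * bins, "green": [0] * bins, "blue": [0] * bins}
    for pixel in pixels:
        for name, value in zip(("red", "green", "blue"), pixel):
            # Bin 0 holds the lowest values (left end), the last bin the highest.
            hists[name][value * bins // 256] += 1
    return hists


yellow_and_black = [(255, 255, 0), (0, 0, 0)]
red_and_green = [(255, 0, 0), (0, 255, 0)]
print(channel_histograms(yellow_and_black, bins=4)
      == channel_histograms(red_and_green, bins=4))  # True
```

Because each channel is counted on its own, a yellow-and-black image and a red-and-green image produce identical histograms: the channel correlations are lost.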

New Perspective: Color Pies

A big weakness of typical color histograms is that they ignore correlations between the color channels. You get three separate histograms, one per color, without any idea whether all the greens go with all the reds to make a yellow image, or whether you have a half-green, half-red image.

Let’s do something about that.

We’ll use an analysis tool called k-means clustering to find a very small representative set of colors from any image, which I’ll call a color pie. K-means clustering works by choosing a small set of sample colors, assigning each of the image’s pixels to the nearest sample color, moving each sample color to the middle of its cluster, and repeating until the sample colors stop moving. The cool thing about this algorithm is that it often converges quickly, leaving us with sample colors that represent the pixels well (strictly, k-means finds a locally optimal set, not necessarily the best possible one). (And this technique applies to a lot more data than just pixels!)
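A minimal k-means sketch in plain Python, assuming pixels are RGB tuples and using squared Euclidean distance in RGB space. The initialization (k randomly chosen pixels), the iteration cap, and the function names are assumptions for illustration; this is not the code from the imghist repository.

```python
import random


def _nearest(pixel, centers):
    """Index of the center closest to pixel (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, centers[i])))


def kmeans_colors(pixels, k, iterations=20, seed=0):
    """Return [(color, share), ...]: k representative colors and the fraction
    of pixels assigned to each, i.e. the slices of a color pie."""
    rng = random.Random(seed)
    centers = [tuple(map(float, p)) for p in rng.sample(pixels, k)]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in pixels:
            clusters[_nearest(p, centers)].append(p)
        # Move each center to the mean of its cluster; empty clusters stay put.
        new_centers = [
            tuple(sum(ch) / len(members) for ch in zip(*members)) if members else c
            for c, members in zip(centers, clusters)
        ]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    counts = [0] * k
    for p in pixels:
        counts[_nearest(p, centers)] += 1
    return [(c, n / len(pixels)) for c, n in zip(centers, counts)]


reds = [(255, 0, 0), (250, 5, 0), (245, 0, 5)]
blues = [(0, 0, 255), (5, 0, 250), (0, 5, 245)]
for color, share in kmeans_colors(reds + blues, k=2):
    print([round(ch) for ch in color], f"{share:.0%}")
```

Because the result depends on the starting colors, practical tools often run k-means several times with different seeds and keep the best clustering.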

Here are some example images along with their k-means color pies:

Hue Histograms

Color pies are great for visualizing the color theme of a complex image at a glance.  However, they throw away a lot of information.  Let’s build a visualization that captures most of the critical color-frequency information at a glance.

Our graph will be organized so that each hue gets its own place along the horizontal axis.  This is easier to understand with an example:

Every pixel in the image has a representative portion in the hue histogram, and vice versa. Every channel is taken into account, and each pixel’s channels are kept together in the histogram, addressing the weakness of a typical color histogram. This is achieved by drawing each vertical stripe in the hue that corresponds to its horizontal position (red at left, then greens, blues, etc. until we get back to red on the right), and then setting that stripe’s saturation and lightness to the average values of all the pixels in the image with that particular hue. The result is a histogram that very obviously matches the image.
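A sketch of the data behind a hue histogram in Python, assuming the HLS color model from the standard colorsys module supplies hue, lightness, and saturation. The bin count and function name are hypothetical, and this is not the imghist implementation. Each bin's count would set its stripe height, and its mean lightness and saturation would set the stripe's color.

```python
import colorsys


def hue_histogram(pixels, bins=360):
    """Per hue bin: (pixel count, mean lightness, mean saturation).
    Bin 0 starts at red; hue wraps around back to red at the last bin."""
    counts = [0] * bins
    light_sum = [0.0] * bins
    sat_sum = [0.0] * bins
    for r, g, b in pixels:
        h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)  # h in [0, 1)
        i = min(int(h * bins), bins - 1)
        counts[i] += 1
        light_sum[i] += l
        sat_sum[i] += s
    return [(n, light_sum[i] / n, sat_sum[i] / n) if n else (0, 0.0, 0.0)
            for i, n in enumerate(counts)]


hist = hue_histogram([(255, 0, 0), (255, 0, 0), (0, 255, 0)], bins=6)
print(hist[0])  # the two pure-red pixels: (2, 0.5, 1.0)
```

One limitation of this sketch: grays have zero saturation and colorsys reports hue 0 for them, so they land in the red bin; a real tool would likely handle them separately.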

Scroll back up to the typical color histogram to compare.  Which would you rather work with?

Great data analysis is our passion at Zillabyte - inventing these two new ways to visualize your image data is just one small step toward our vision.  What data visualizations would you like to see reinvented?

(Photo credits: 1st row: 1 2 3; 2nd row: 1 2 3; beach image)

Hadoop Doesn’t Solve All Problems

Hadoop is a hot topic in Silicon Valley these days. Walk into any coffee shop and you’ll likely hear people discussing Hadoop. More companies are adopting it. Some companies, like Cloudera, are built solely on top of it.

Here at Zillabyte, we gratefully use Hadoop. Hands down, it’s helped us build our product quickly and efficiently. However, it is important to note that Hadoop is not the end-all-be-all for distributed processing. In fact, Hadoop only solves a small slice of the solvable-problem universe.

Map-reduce (and by extension, Hadoop) has limitations. It performs poorly on algorithms that rely on intra-data relationships. For example, clustering algorithms are supposed to find geometric regions of data. To pull this off, the algorithm must effectively compare every data point with every other data point. These intra-data relationships are the death knell for Hadoop: map-reduce fundamentally struggles to compare data points with other data points.

Consider another example: recommendations. A recommendation engine can be implemented as a clustering algorithm. Although it’s possible to run this on Hadoop, our experience has shown that it takes six times longer than a non-Hadoop implementation.

While we gratefully use Hadoop, we’re cognizant of its limits. Other solutions need to be found for different types of problems.

The State of the Zillabyte Repository

The following is a visualization of our Zillabyte code repository. Red bands indicate backend code, blue indicates frontend, and gray indicates open-source. The color intensity of each segment indicates how “pure” a segment is with respect to frontend, backend, or open-source code.  The levels correspond to directories and files. The thickness of each segment indicates how old the file/directory is. The arc-length of each segment indicates how much space it consumes relative to its siblings.  

Aside from aesthetics, visualizing a repository is a convenient way to understand the state of a software product. This visualization tells us that our frontend and backend code size is roughly equal, and the repo is relatively young. 
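The arc-length rule (each segment's angle proportional to its size relative to its siblings) can be sketched in Python. The sketch assumes the repository has already been summarized as a nested dict whose leaves are file sizes; the tree below is hypothetical, and color and ring thickness are left out.

```python
def subtree_size(node):
    """Size of a file (a number) or the total size of a directory (a dict)."""
    if isinstance(node, (int, float)):
        return node
    return sum(subtree_size(child) for child in node.values())


def arc_spans(node, start=0.0, span=360.0, path=""):
    """Yield (path, start_angle, angular_span) for every file and directory.
    Each child gets a share of its parent's span proportional to its size."""
    if isinstance(node, (int, float)):
        return  # files have no children
    total = subtree_size(node)
    for name, child in node.items():
        child_span = span * subtree_size(child) / total if total else 0.0
        child_path = f"{path}/{name}"
        yield child_path, start, child_span
        yield from arc_spans(child, start, child_span, child_path)
        start += child_span


repo = {"backend": {"a.py": 300, "b.py": 100}, "frontend": {"app.js": 400}}
for p, s, w in arc_spans(repo):
    print(f"{p}: starts at {s:.0f} deg, spans {w:.0f} deg")
```

Each directory's children exactly fill the directory's own arc, which is what lets the rings nest.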

S&P Sovereign Debt Rating Visualization