Wednesday, November 4, 2015

The Calm Before the Storm

Have you ever spent an afternoon in the backyard, maybe grilling or enjoying a game of croquet, when suddenly you notice that everything goes quiet? The air seems still and calm -- even the birds stop singing and quickly return to their nests.

After a few minutes, you feel a change in the air, and suddenly a line of clouds ominously appears on the horizon -- clouds with a look that tells you they aren't fooling around. You quickly dash in the house and narrowly miss the first fat raindrops that fall right before the downpour. At this moment, you might stop and ask yourself, "Why was it so calm and peaceful right before the storm hit?"
-How Stuff Works

The last five years, with the onset of Hadoop, cloud, and mobile, were merely the calm before the storm. A new modern technology stack, a data revolution, and the onslaught of machine learning will shape the storm to come over the next decade.


The mobile supply chain has wreaked havoc on the traditional technology stack. With the advent of high-volume chips, screens, and storage, it has become cost-effective to move away from a vertically integrated architecture to one that is much more flexible and dynamic. We have evolved to a six-layer Next Generation Technology Stack:

Layer 1: There are two aspects to Layer 1: a) the repositories and b) the data itself. The repositories include the new breed of flexible and fluid data layers, ranging from Hadoop to Cassandra to other NoSQL data stores; these are flexible, adaptable, and tuned to modern internet and mobile applications. This layer also includes databases, data warehouses, and mainframes; said another way, anything that stores data of strategic and operational relevance. Within the repositories, the data itself creates a competitive moat and offers strategic advantage when used appropriately.

Layer 2: A highly performant processing layer that enables access to all data in a unified way, easily incorporates machine learning, and produces real-time insights. This is why I have called Spark the Analytics Operating System.
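To make "unified access" concrete, here is a minimal PySpark sketch. The paths, keyspace, and table names are hypothetical, and it assumes the spark-cassandra-connector package is installed; it illustrates the idea of one processing layer over many repositories, not any particular vendor's implementation.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("layer2-sketch").getOrCreate()

# One DataFrame API across different repositories (names are hypothetical).
# Assume events holds (user_id, amount); profiles holds (user_id, segment).
events = spark.read.parquet("hdfs:///lake/events")   # a Hadoop data lake
profiles = (spark.read                               # a Cassandra table, via the
            .format("org.apache.spark.sql.cassandra")  # assumed connector package
            .options(keyspace="crm", table="profiles")

# The same job joins, aggregates, and surfaces an insight in one pass.
insight = (events.join(profiles, "user_id")
```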

Layer 3: Machine learning, on a corpus of strategically relevant data, is the new competitive moat for an enterprise. This layer automates the application of analytics and delivers real-time insights for business impact. It's the holy grail that has never quite been found in most organizations.

Layer 4: A unified application layer, which provides seamless access to analytical models, data, and insights. This is the glue that enables most business users to leverage and understand data-rich applications.

Layer 5: The easiest way to democratize access to data in an organization is to give users something elegant and insightful. Vertical and horizontal applications, built for a specific purpose, serve this role in an organization.

Layer 6: The number of people connected to the Internet has surged from 400 million in 1999 to 3 billion last year. The number of connected devices is estimated at 50 billion by 2020. These are all access points for the Next Generation Technology Stack.


In Big Data Revolution, I dissected 3 Business Models for the Data Era. In summary, these are the three dominant business models that I see emerging:

Data as a competitive advantage: While this is somewhat incremental in its approach, it is evident that data can be utilized and applied to create a competitive advantage in a business. For example, an investment bank, with all the traditional processes of a bank, can gain significant advantage by applying advanced analytics and data science to problems like risk management. While it may not change the core functions or processes of the bank, it enables the bank to perform them better, thereby creating a market advantage.

Data as improvement to existing products or services: This class of business model plugs data into existing offerings, effectively differentiating them in the market. It may not provide competitive advantage (but it could), although it certainly differentiates the capabilities of the company. A simple example could be a real estate firm that utilizes local data to better target potential customers and match them to vacancies. This is a step beyond the data that would come from the Multiple Listing Service (MLS). Hence, it improves the services that the real estate firm can provide.

Data as the product: This class of business is a step beyond utilizing data for competitive advantage or plugging data into existing products. In this case, data is the asset or product to be monetized. An example of this would be Dun & Bradstreet, which has been known as the definitive source of business-related data for years.

Since my work on business models was published, my thinking has evolved a bit. While I think each of those business models is still valid, I am less certain that any of them on their own will create a distinctive competitive advantage. Instead, I believe that the value is where the software meets the data, and access is democratized. Said another way, it's hard to create value by only looking at one layer of the Next Generation Technology Stack.

Enter The Weather Company...


Last week, we announced our intention to acquire The Weather Company. The media reaction has ranged from, "IBM is buying a TV station?" (we are not), to "IBM is buying the clouds", to "IBM is entering the data business." Some of the reactions are wrong, others are humorous, and some are overly simplistic. The reality is that IBM has just made a significant move in defining and leading in the Next Generation Technology Stack. This interview captures it well. Let's look at this in terms of each layer:

Layer 1: IBM has long been a leader in Layer 1, across all types of repositories. From Netezza to DB2 to the mainframe to Informix to Cloudant to BigInsights to enterprise content, most of the world's valuable enterprise data is stored in IBM technology. With The Weather Company, we now have a rich set of data assets. The Weather Company can decompose what is happening on earth into over 3 billion elements. And it's not just weather data: in an increasingly mobile world, location matters.

Layer 2: IBM is the enterprise leader in Spark. Through a variety of partnerships, including Databricks and Typesafe, we are a key part of this blossoming community.

Layer 3: Through our open source contributions to machine learning, our rich portfolio of analytical models, and the world's greatest cognitive system (Watson), IBM can provide applications (Layer 5) and insights that are unmatched. Just think how powerful Watson becomes when it understands location and environment, as well as everything else it already knows.

Layer 4: The Weather Company has an internet-scale, high-volume platform for IoT. It can seamlessly be extended to other sources of data and “can ingest data at a very high volume in fractions of a second that will be an engine that feeds Watson”.

Layer 5: IBM has a rich set of industry applications and solutions across Commerce, Watson, and countless other areas. The Weather Company applications and websites handle 65 billion unique accesses to weather and related data per day. That scale is unmatched.

Layer 6: The Weather Company mobile application has 66 million unique visitors a month and connective tissue to tap into the 50 billion connected devices that are emerging.

In summary, this is much more than weather data. Overnight, IBM has become the leader in the Next Generation Technology Stack. It is the basis for extension into financial services, automotive, telematics, healthcare, and every other industry being transformed by data.


It's always calm before the storm hits. Sometimes, in the moment, you don't even recognize the calm for what it is. My guess is that most people have not considered the last five years the calm before the storm. But they were.

Thursday, October 22, 2015

Preparing your career for the data science revolution

I became familiar with Geoffrey Moore in 2002, while consulting at Symbol Technologies, helping it transform from a product-only company into a solutions company (that is, both products and services). The president of Symbol mentioned Moore’s work, and he often used the terms “core” and “context” when describing Symbol’s business operations. As I began reading Moore, his ideas crystallized in my mind, developing in me a growing appreciation for his work.

Innovating in a complex business environment

In 2005, Moore published Dealing with Darwin, offering a glimpse into how companies innovate during each phase of their evolution. Moore paints the picture of a business environment that is ever more competitive, globalized, deregulated and commoditized. Unsurprisingly, the combination of these forces puts immense pressure on companies to find ways of innovating in an increasingly complex environment.

Back in 2005, not all those forces were pronounced, but they have since become indisputably so. Indeed, not only are they present, but they are beginning to make many large companies irrelevant. The unbundling chart in Figure 1 demonstrates the point.


Competition and innovation have been forever changed. But such forces’ effects on companies merely marked a beginning. The next assault will be on individuals within companies. In the next five years, every employee will be dealing with Darwin personally.

Individuals first dealt with Darwin on a substantial level during the Industrial Revolution, continuing through the rest of the 1900s, when factories began replacing traditional small industries. As factories appeared, demand for labor heightened. But factory work was quite different from traditional work and quickly became known for its poor working conditions. Even worse, because most factory work did not require a particular strength or skill of its workers, workers were considered unskilled and were thus easily replaceable. At first, factory workers were replaced by other workers, but eventually their jobs were automated away or moved to lower-cost countries.

Shifting roles in the workforce

Economists typically categorize three kinds of work in a country: agriculture (farming), industry (manufacturing) and services. Each type of work plays a role in the economy, but macroeconomic forces have changed the mix over time. As seen in Figure 2, although 70 percent of the labor force worked in agriculture in 1840, the agricultural share had fallen to 40 percent by 1900 and today sits at a mere 2 percent. Such shifts force employees to adapt, dealing with Darwin at a very personal level.


As Figure 3 shows, this shift has accelerated in the past 15 years, a period during which productivity has exploded.

Clearly, Darwin has struck at traditional workers and approaches. Thanks to widespread automation, broad acceptance of best practices and experience, skills that were once valued—and, indeed, that were necessary for industry—have been reduced to commodity services. Could the same thing happen to traditional IT jobs?

Redefining the skilled worker

In Big Data Revolution: What farmers, doctors, and insurance agents teach us about discovering big data patterns, I identify 54 big data patterns across a variety of industries. Whether you are a retailer, an insurance agent, a commercial banker or a doctor, these patterns will affect your role and the industry in which you play it. One pattern I identify is that of redefining the skilled worker:

The data era is demanding a new definition of skilled workers. While this may require skills like statistics or math, that is merely one aspect of the skill gap that must be filled. In medicine, it’s about redefining medical school to include skills like data analysis and data collection. In farming, the new skill set involves understanding how to utilize multiple sources of data (from drones, GPS, or otherwise) and apply that insight to deliver better yields and productivity.

The definition of a skilled worker has changed dramatically, based on different eras. In the 1700s and 1800s, skilled workers were defined by either their physical abilities or their knowledge of a certain craft (think bookkeeper). At the time of the Industrial Revolution, skilled workers were defined by their physical capacity to operate a machine or work on an assembly line. In the last 20 years, skilled workers have evolved further, with a premium placed on customer service and technology skills.

A skilled worker in the next decade will be defined by her ability to acquire, analyze, and utilize data and information. This new skilled worker will emerge in every industry, with a slightly different definition of the skill set for each industry.

The key patterns to redefine a skilled worker are

-Understanding the skill sets needed today, tomorrow, and further in the future, based on the potential for data disruption.
-Redefining roles and skill sets to take advantage of the new data available that can impact business processes.
-Making the training and retraining of current and new workers a distinguishing capability, in order to remain relevant.

An organization must accept that the skills it has today may not be sufficient for the data era. The challenge is that the skill gaps, once understood, will likely prove to be excessively broad. The data era demands skills in data science, statistics, and probability, and that's just on the business side. On the IT side, the skill needs are much different from traditional IT, with a premium placed on programming and modeling skills. Leading organizations will document the skills they have today versus the skills needed tomorrow and systematically begin to fill the inevitable gaps.

Though companies must understand their skill gaps and solve them at a company level, the challenge for individuals is even more pressing: What skill will you master? What skill aligns with the data era? For what craft will you be known? I believe that data science is the answer to all three questions.

Taking the cloud into account

The cloud has profoundly affected IT. Although moving to the cloud can help diminish the costs of starting a company and can help cut capital expenses, I believe that the cloud’s bigger effect will be on the traditional definition of a skilled IT worker. As organizations move to the cloud, I expect the importance of traditional IT roles and skills—systems administrator, architect, DBA, IT operations—to diminish or even be eliminated over the next 5 to 10 years. Make no mistake: I think this shift will take a long time to play out. But such a belief means that now is the time to prepare for the coming revolution.

Let’s imagine an organization with revenue of $1 billion, and let’s assume that it spends 6 percent of its revenue per year on IT. Figure 4 shows some related factors that we can calculate under such an assumption:

For such a company, the decision to shift to the cloud will be predicated on increasing leverage and efficiency and will force a rethinking of the workforce. There is good news: salaries of the employees will go up as they acquire rare and sophisticated skills. But as Figure 5 depicts, there is also bad: traditional IT skills will be less in demand, because much of that work will have shifted to the cloud.
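Since Figures 4 and 5 are not reproduced here, a back-of-the-envelope sketch makes the arithmetic concrete. Only the $1 billion revenue and the 6 percent IT-spend ratio come from the text; the budget split below is my own illustrative assumption, not the original figures.

```python
revenue = 1_000_000_000      # the $1B company described above
it_budget = revenue * 0.06   # 6% of revenue -> $60M per year on IT

# Illustrative split of that budget (assumed, not from Figure 4):
split = {"staff": 0.40, "infrastructure": 0.35, "software": 0.25}

print(f"Annual IT budget: ${it_budget:,.0f}")
for bucket, share in split.items():
    print(f"  {bucket}: ${it_budget * share:,.0f}")
```

As infrastructure and operations move to cloud providers, it is the staff and infrastructure buckets that get squeezed, which is the point of the workforce discussion above.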

This shift presents both pitfalls and opportunities for individuals. Indeed, the greatest risk lies in doing nothing, in merely continuing business as usual. More tools and education are available than ever before for individuals who decide that they want to be leaders in the data era. To be blunt: data science will be the data era's defining skill. Though traditional IT skills will remain important for some, such skills will be increasingly less relevant in the cloud-centric data era.

Monday, September 14, 2015

IBM | Spark - An Update

IBM made some significant announcements around our investment in Spark back in June (see here and here). Ninety days later, it seems fitting to provide an update on where we stand in our community efforts.


Before I get to the details, I want to first re-state why I believe Spark will be a critical force in technology, on the same scale as Linux.

1) Spark is the Analytics operating system: Any company interested in analytics will be using Spark.
2) Spark unifies data sources. Hadoop is one of many repositories that Spark may tap into.
3) The unified programming model of Spark makes it the best choice for developers building data-rich analytic applications.
4) The real value of Spark is realized through Machine Learning. Machine Learning automates analytics and is the next great scale effect for organizations; the sketch below illustrates the idea.
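To make point 4 tangible, here is a minimal PySpark MLlib sketch: a pipeline that assembles features and trains a model with the same unified programming model used for data processing. The tiny inline dataset is invented for illustration; this is a sketch of the pattern, not any IBM product code.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Toy training data: two numeric features and a binary label.
df = spark.createDataFrame(
    [(0.0, 1.1, 0), (2.0, 0.4, 1), (1.5, 2.3, 0), (3.1, 0.2, 1)],
    ["f1", "f2", "label"],

# One pipeline handles feature engineering and model training together.
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),

model = pipeline.fit(df)                    # training...
model.transform(df).select("label", "prediction").show()  # ...and scoring, same API
```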

I was recently interviewed by Datanami on the topic of Spark. You can read it here. I like this article because it presents an industry perspective on Hadoop and Spark, but it's also clear on our unique point of view.

Also, this slideshare illustrates the point of view:


So, what have we accomplished and where are we going?

1) We have been hiring like crazy, as fast as we can. The Spark Technology Center (STC) in San Francisco is filling up, and we have started to bring on community leaders like @cfregly.

2) Client traction and response has been phenomenal. We have references already and more on the way.

3) We have open sourced SystemML as promised (see it on GitHub), and we are working on it in the open. This contribution is over 100,000 lines of code.

4) Spark 1.5 was just released. We went from 3 contributors to 11, in one release cycle. Read more here.

5) Our Spark-specific JIRAs have amounted to ~5,000 lines of code. You can watch them in action here.

6) We are working closely with partners like Databricks and Typesafe.

7) We have trained ~300,000 data scientists through a number of forums. You can also find some deep technical content here.

8) We have seen huge adoption of the Spark Service on Bluemix.

9) We have ~10 IBM products that are leveraging Spark and many more in the pipeline.

10) We launched a Spark newsletter. Anyone can subscribe here.


Between now and the end of the year, we have five significant events where we will have much more to share regarding Spark:

a) Strata NY- Sept 29-Oct 1 in New York, NY.
b) Apache Big Data Conference- Sept 28-30 in Budapest, Hungary.
c) Spark Summit Europe- Oct 27-29 in Amsterdam.
d) Insight- Oct 26-29 in Las Vegas.
e) Datapalooza- November 10-12 in San Francisco.

In closing, here is a peek inside:

Friday, August 21, 2015

3 Business Models for the Data Era


A walk through the meatpacking district in New York City is a lively affair. In the early 1900s, this part of town was known for precisely what its name implies: slaughterhouses and packing plants. At its peak, there were nearly 300 such establishments, located fairly centrally in the city and not far from shipping ports. Through the years, this area of town has declined at times, but in the early 1990s, a resurgence began that shows no signs of ending.

Located near the fashion district of Manhattan, the Meatpacking District stands for what is modern, hip, and trendy: a bastion of culture in the work-like city. Numerous fashionable retailers have popped up, along with trendy restaurants. And, on the fringes, there is evidence of the New York startup culture flowing from the Flatiron District, sometimes known as Silicon Alley. A visit to one of the companies in this area opened my eyes to some of the innovation that is occurring where data is the business, instead of being an enabler or addition to the business.


As I entered the office of this relatively newborn company, I was confident that I understood their business. It was pretty simple: They were a social sharing application on the web that enabled users to easily share links, share content, and direct their networks of friends to any items of interest. The term social bookmarking was one description I had heard. The business seemed straightforward: Attract some power users, enable them to attract their friends and associates, and based on the information shared, the company could be an effective ad-placement platform, since it would know the interests of the networks and users.

But what if social bookmarking and ad placement was not the business model at all? What if all that functionality was simply a means to another end?


Data has the power to create new businesses and even new industries. The challenge is that there are many biases about the use of data in a business. There is a view that data is just about analytics or reporting. In this scenario, it’s relegated to providing insight about the business. There is another view that data is simply an input into existing products. In this case, data would be used to enrich a current business process, but not necessarily change the process. While these cases are both valid, the power of the Data era enables much greater innovation than simply these incremental approaches.

There are three classes of business models for leveraging data:

Data as a competitive advantage: While this is somewhat incremental in its approach, it is evident that data can be utilized and applied to create a competitive advantage in a business. For example, an investment bank, with all the traditional processes of a bank, can gain significant advantage by applying advanced analytics and data science to problems like risk management. While it may not change the core functions or processes of the bank, it enables the bank to perform them better, thereby creating a market advantage.

Data as improvement to existing products or services: This class of business model plugs data into existing offerings, effectively differentiating them in the market. It may not provide competitive advantage (but it could), although it certainly differentiates the capabilities of the company. A simple example could be a real estate firm that utilizes local data to better target potential customers and match them to vacancies. This is a step beyond the data that would come from the Multiple Listing Service (MLS). Hence, it improves the services that the real estate firm can provide.

Data as the product: This class of business is a step beyond utilizing data for competitive advantage or plugging data into existing products. In this case, data is the asset or product to be monetized. An example of this would be Dun & Bradstreet, which has been known as the definitive source of business-related data for years.

In these business models, there are best practices that can be applied. These best practices are the patterns that teach us how to innovate in the Data era.


Procter & Gamble is legendary for the discipline of brand management. To be the leading consumer packaged-goods company in the world, brand is everything. Ultimately, consumers will buy brands that they know and understand, and those that fulfill an expectation. This was one of the first lessons that Scott Cook learned when he joined Procter & Gamble in the 1970s. He also observed that the core business processes for managing a brand (accounting, inventory, etc.) would be transformed and subsumed by the emerging personal computer revolution. This insight led him to cofound Intuit in 1983, really at the dawn of the expansion of personal computers. The premise was simple: Everyone, small companies and individuals alike, should have access to the financial tools that previously had been reserved for large enterprise companies.

Now broadly known for tax preparation software (TurboTax), along with software solutions for small businesses (QuickBooks) and individuals (Quicken), Intuit has transformed many lives. At over $4 billion in revenue and nearly 8,000 employees, the company is a success by any barometer. Intuit is one of the few software companies to challenge Microsoft head-on and not only live to tell about it, but to prosper in its wake. Microsoft Money versus Quicken was a battle for many years; Microsoft was trying to win with software, while Intuit was focused on winning with data, utilizing software as the delivery vehicle.

Robbie Cape, who ran Microsoft’s Money business from 1999 to 2001, believes that Intuit’s advantage had very little to do with technology. Instead, he attributes Intuit’s success to its marketing prowess. While there may be some truth to that statement, it's very hard to believe that Intuit had deep enough pockets to out-market Microsoft. Instead, the differentiation seems to come from data.

The NPD Group analyst Stephen Baker said that Intuit won by building out critical mass in financial software and the surrounding ecosystem. Intuit had the insight that adjacency products and services, leveraging data, made the core software much more attractive. This insight led to their early and sustained domination of the retail channel.

Intuit’s ability to collect and analyze a large amount of sensitive and confidential data is nearly unsurpassed. Nearly 20 million taxpayers use TurboTax online, sharing their most personal data. Over 10 million customers use QuickBooks, with employee information for over 12 million people flowing through its software. Brad Smith has been cited as declaring that 20 percent of the United States Gross Domestic Product flows through QuickBooks. No other collection of data has this type and extent of financial information on individuals and small businesses.

With these data assets, Intuit began to publish the Intuit Small Business Index. The Index provides summarized insights about sales, profit, and employment data from the small businesses that use QuickBooks. This information can provide headlights to business and salary trends, which ultimately becomes insight that can be fed back into the product. This was the point that Microsoft Money either missed or simply could not achieve: The value was never in the software itself. The value was in the collection, analysis, and repurposing of data to improve the outcomes for the users.

In 2009, Intuit purchased Mint, a free web-based personal-finance application. Mint took the Intuit business model a step further: It provided its software for free, knowing that it was a means to another end. The social aspects of Mint enabled users to do much more than simply track their spending; it became a vehicle to compare the spending habits of an individual to others of a similar geography or demographic. The user can run these comparisons, or the comparisons can show up as recommendations from Mint. Further, Mint brought an entirely different demographic of data to Intuit. While the Intuit customer base was largely the 40-and-over demographic (people who had grown up with Quicken, QuickBooks, etc.), Mint attracted the Millennial crowd. The opportunity to combine those two entirely different sets of data was too attractive for Intuit to pass up.

To date, Intuit has not had a strategy for monetizing the data itself. Perhaps that may change in the future. However, with data at the core of its strategy, Intuit has used that data to drive competitive advantage, while software was merely the delivery vehicle. The companies that tried the opposite have not fared so well.


Chapters 1 through 9 offer a multitude of examples in which data is being utilized to improve existing products or services. In the pursuit of a business model leveraging data, this category is often the low-hanging fruit; more obvious, although not necessarily easy to do. The examples covered previously are:

Farming and agriculture: Monsanto is using data to augment applications like FieldScripts, which provides seeding prescriptions to farmers based on their local environments. While Monsanto could provide prescriptions through their normal course of operation, data has served to personalize and thereby improve that offering.

Insurance: Dynamic risk management in insurance, such as pay-as-you-drive insurance, leverages data to redefine the core offering of insurance. It changes how insurance is assessed, underwritten, and applied.

Retail and fashion: Stitch Fix is redefining the supply chain in retail and fashion, through the application of data. Data is augmenting the buying process to redefine traditional retail metrics of inventory, days sales outstanding, etc.

Customer service: Zendesk integrates all sources of customer engagement in a single place, leveraging that data to improve how an organization services customers and fosters loyalty over time.

Intelligent machines: Vestas has taken wind turbines — previously regarded as dumb windmills — and turned them into intelligent machines through the application of data. The use of data changes how their customers utilize the turbines and ultimately optimizes the return on their investment.

Most companies that have an impetus to lead in the Data era will start here: leveraging data to augment their current products or services. It’s a natural place to start, and it is relatively easy to explore patterns in this area and apply them to a business. However, it is unlikely that this approach alone is sufficient to compete in the Data era. It’s a great place to start, but not necessarily an endpoint in and of itself.


Previously in this chapter, the examples demonstrated how data is used to augment existing businesses. However, in some cases, data becomes the product; the sole means for the company to deliver value to shareholders. There are a number of examples historically, but this business model is on the cusp of becoming more mainstream.


In 1841, Lewis Tappan first saw the value to be derived from a network of information. At the time, he cultivated a group of individuals, known as the Mercantile Agency, to act “as a source of reliable, consistent, and objective” credit information. This vision, coupled with the progress that the idea made under Tappan and later under Benjamin Douglass, led to the creation of a new profession: the credit reporter. In 1859, Douglass passed the Agency to his brother-in-law, Robert Graham Dun, who continued expansion under the new name of R.G. Dun & Company.

With a growing realization of the value of the information networks being created, the John M. Bradstreet company was founded in 1849, creating an intense rivalry for information and insight. Later, under the strain caused by the Great Depression, the two firms (known at this time as R.G. Dun & Company and Bradstreet & Company) merged, becoming what is now known as Dun & Bradstreet.

Dun & Bradstreet (D&B) continued its expansion and saw more rapid growth in the 1960s, as the company learned how to apply technology to evolve its offerings. With the application of technology, the company introduced the Data Universal Numbering System (known as D&B D-U-N-S), which provided a numerical identification for businesses of the time. This was a key enabler of data-processing capabilities for what had previously been difficult-to-manage data.

By 2011, the company had gained insight on over 200 million businesses. Sara Mathew, the Chairman and CEO of D&B, commented, “Providing insight on more than 200 million businesses matters because in today’s world of exploding information, businesses need information they can trust.”

Perhaps the most remarkable thing about D&B is the number of companies that have been born out of that original entity. As the company has restructured over the years, it has spun off entities such as Nielsen Corporation, Cognizant, Moody's, IMS Health, and many others. These have all become substantial businesses in their own right. They are each unique in the markets served and all generate value directly from offering data as a product:

Nielsen Corporation: Formerly known as AC Nielsen, the Nielsen Corporation is a global marketing research firm. The company was founded in 1923 in Chicago, by Arthur C. Nielsen, Sr., in order to give marketers reliable and objective information on the impact of marketing and sales programs. One of Nielsen's best known creations is the Nielsen ratings, an audience measurement system that measures television, radio, and newspaper audiences in their respective media markets. Nielsen now studies consumers in more than 100 countries to provide a view of trends and habits worldwide and offers insights to help drive profitable growth.

Cognizant: Starting as a joint venture between Dun & Bradstreet and Satyam Computers, the entity was originally designed to be the in-house IT operation for D&B. As the entity matured, it began to provide similar services outside of D&B. The entity was renamed Cognizant Technology Solutions to focus on IT services, while the former parent company of Cognizant Corporation was split into two companies: IMS Health and Nielsen Media Research. Cognizant Technology Solutions became a public subsidiary of IMS Health and was later spun off as a separate company. The fascinating aspect of this story is the amount of intellectual property, data, and capability that existed in this one relatively small part of Dun & Bradstreet. The interplay of data, along with technology services, formed the value proposition for the company.

IMS Health: IMS became an independent company in 1998. IMS's competitive advantage comes from the network of drug manufacturers, wholesalers, retailers, pharmacies, hospitals, managed care providers, long-term care facilities and other facilities that it has developed over time. With more than 29,000 sources of data across that network, IMS has amassed tremendous data assets that are valuable to a number of constituents: pharmaceutical companies, researchers, and regulatory agencies, to name a few. Like Lewis Tappan's original company back in 1841, IMS recognized the value of a network of information that could be collected and then provided to others. In 2000, with over 10,000 data reports available, IMS introduced an online store, offering market intelligence for small pharmaceutical companies and businesses, enabling anyone with a credit card to access and download data for their productive use. This online store went a long way towards democratizing access to data that had previously been available primarily to large enterprise buyers.

Moody’s: Moody’s began in 1900, with the publishing of Moody’s Manual of Industrial and Miscellaneous Securities. This manual provided in-depth statistics and information on stocks and bonds, and it quickly sold out. Through the ups and downs of a tumultuous period, Moody’s ultimately decided to provide analysis, as opposed to just data. John Moody, the founder, believed that analysis of security values is what investors really wanted, as opposed to just raw data. This analysis of securities eventually evolved into a variety of services Moody’s provides, including bond ratings, credit risk, research tools, related analysis, and ultimately data.

Dun & Bradstreet is perhaps the original innovator of the data-is-the-product business model. For many years, their reach and access to data was unsurpassed, creating an effective moat for competitive differentiation. However, as is often the case, the focus on narrow industries (like healthcare) and new methods for acquiring data have slowly brought a new class of competitors to the forefront.


Despite its lack of broad awareness, CoStar is a NASDAQ publicly traded company with revenues of $440 million, 2,500 employees, a stock price that has appreciated 204 percent over the last three years, a customer base that is unrivaled, and a treasure trove of data. CoStar's network and tools have enabled it to amass data on 4.2 million commercial real estate properties around the world. Simon Law, the Director of Research at CoStar, says, "We're number one for one very simple reason, and it's our research. No one else can do what we do." Here are some key metrics:

* 5.1 million data changes per day
* 10,000 calls per day to brokers and developers
* 500,000 properties canvassed nationwide annually
* 1 million property photographs taken annually

CoStar has an abundance of riches when it comes to real estate data. Founded in 1987 by Andrew Florance, CoStar invested years becoming the leading provider of data about space available for lease, comparable sales information, tenant information, and many other factors. The data covers all commercial property types, ranging from office to multi-family to industrial to retail properties.

The company offers a set of subscription-based services, including

CoStar Property Professional: The company's flagship product, which offers data on inventory of office, industrial, retail, and other commercial properties. It is used by commercial real estate professionals and others to analyze properties, market trends, and key factors that could impact food service or even construction.

CoStar Comps Professional: Provides comparable sales information for nearly one million sales transactions primarily for the United States and the United Kingdom. This service includes deeds of trust for properties, along with space surveys and demographic information.

CoStar Tenant: A prospecting and analytical tool utilized by professionals. The data profiles tenants, lease expirations, occupancy levels, and related information. It can be an effective business development tool for professionals looking to attract new tenants.

CoStarGo: A mobile (iPad) application, merging the capabilities of Property Professional, Comps Professional, Tenant, and other data sources.

The value of these services is obviously determined by the quantity and quality of the data. Accordingly, ensuring that the data remains relevant is a critical part of CoStar's business development strategy.

Since its inception, CoStar has grown organically, but it has also accelerated growth through a series of strategic acquisitions. In 2012, CoStar acquired LoopNet, which is an online marketplace for the rental and sale of properties. CoStar's interest in the acquisition was less about the business (the marketplace) and much more about the data. Said another way, their acquisition strategy is about acquiring data assets, not people or technology assets (although those are often present). As a result of the acquisition, it is projected that CoStar will double their paid subscriber base to 160,000 professionals, which represents about 15 percent of the approximately 1 million real estate professionals. Even more recently, in 2014, CoStar acquired Apartments.com, a digital alternative to classified ads. The war chest of data assets continues to grow.

The year 2008 brought one of the most significant financial crises the world has seen. Financial institutions collapsed, and the real estate market entered a depression based on the hangover from subprime debt. Certainly, you would expect a company such as CoStar to see a similar collapse, given its dependence on the real estate market. But that's not exactly what happened.

From 2008 to 2009, CoStar saw an insignificant revenue drop of about 1 percent. This drop was followed by an exponential rebound to growth in 2010 and beyond. Is it possible that data assets create a recession-proof business model?

CoStar Financial Results

While there are other data alternatives in the market (Reis Reports, Xceligent, CompStak, ProspectNow, and others), the largest collection of data is a differentiator. In fact, it is a defensible moat that makes it very hard for any other competitors to enter. For CoStar, data is the product, the business, and a control point in the industry.


In 1959, Richard O’Brien founded IHS, a provider of product catalog databases on microfilm for aerospace engineers. O’Brien, an engineer himself, saw how difficult it was to design products and knew that utilizing common components could dramatically increase the productivity of engineers. However, he took it one step further by applying technology to the problem — using electronic databases and CD-ROMs to deliver the knowledge furthered the productivity gains. And engineers love productivity gains.

This attitude toward data was set in the company’s DNA from the start as the company could see how to improve the lives and jobs of their clients, just through a better application of data. In the 1990s, IHS started distributing their data over the Internet, and with an even more cost-effective way to share data assets, they decided to expand into other industries. Soon enough, IHS had a presence in the automotive industry, construction, and electronics.

As seen with CoStar, IHS quickly realized that business development for a data business could be accelerated through acquisitions. Between 2010 and 2013, they acquired 31 companies. This acquisition tear continued, with the recent high-profile $1.4-billion acquisition of R.L. Polk & Company. As the parent company of CARFAX, R.L. Polk cemented IHS’s relevance in the automotive industry.

IHS’s stated strategy is to leverage their data assets across the interconnected supply chains of a variety of industries. Across their focus industries, there is $32 trillion of annual spending. IHS data assets can be utilized to enhance, eliminate, or streamline that spending, which makes them an indispensable part of the supply chain. IHS’s data expertise lies in

*Renewable energy
*Automotive parts
*Aerospace and defense
*Maritime logistics

IHS also has a broad mix of data from different disciplines.

While some view data as a competitive differentiator or something to augment current offerings, CoStar, IHS, and D&B are examples of companies that have a much broader view of data: a highly profitable and defensible business model.


The role of data in enterprises has evolved over time. Initially, data was used for competitive advantage to support a business model. This evolved to data being used to augment or improve existing products and services. Both of these applications of data are relevant historically and in the future. However, companies leading in the data era are quickly shifting to a business model of data as the product. While there have been examples of this in history, as discussed in this chapter, we are at the dawn of a new era, the Data era, where this approach will become mainstream.

The company that started as a social bookmarking service quickly realized the value of the data that it was able to collect via the service. This allowed it to build a product strategy around the data it collected, instead of around the service that it offered. This opportunity is available to many businesses, if they choose to act on it.

This post is adapted from the book, Big Data Revolution: What farmers, doctors, and insurance agents teach us about discovering big data patterns, Wiley, 2015. Find more on the web at

Monday, July 27, 2015

Reinventing Retail: Customer Intimacy in the Data Era

Retail has continually reinvented itself over the past 100-plus years. Every 20 to 30 years, the form of retail has changed to meet the changing tastes of the public. McKinsey & Company, the global strategy consultancy, has explored the history of retail in depth, citing five distinct timeframes:

*1900s: The local corner store was prominent in many towns. These small variety stores offered a range of items, including food, clothes, tools, and other necessities. The primary goal was to offer anything a person would need for day-to-day life.

*1920–1940: The corner store was still prominent but had grown to a much larger scale. In this era, department stores first began to emerge, and some specialization of stores began to occur.

*1940–1970: In order to effectively deal with some of the specialization seen in the previous era, this timeframe was marked by the emergence of malls and shopping centers. This allowed for concentration of merchants, many of whom served a unique purpose.

*1970–1990: Perhaps best described as the Walmart era — a time when large players emerged, putting pressure on local store owners. These massive stores offered one-stop shopping and previously unseen value in terms of pricing and promotions. The size of these stores gave them economies of scale, which enabled aggressive pricing, with the savings passed on to the consumer.

*1990–2008: This era was marked by increased focus on discounting and large selection, coupled with the emergence of e-commerce.

Each era represented a significant innovation in the business model, but more important was the impact it had on each part of the retail value chain: merchandise and pricing, store experience, and the approach to marketing. Each new era has sought to balance its new innovations and expansion with a key hallmark of the past: customer intimacy.


Retail, by definition, is mass market. It has been through every era. While subtle changes in approach have occurred, very few have captured the intimacy of the original corner store. The corner store’s owner knew the customers personally; he understood what was happening in their lives, and the store became an extension of the community. In the Data era, mass marketing can reclaim the corner-store experience.

Stitch Fix

Stitch Fix is a data era retailer, focused on personalizing the shopping experience for women. While many women love clothes shopping, Stitch Fix realized that it is an inefficient experience today: it requires visiting many stores, selecting items to try on, and repeating. In fact, a successful shopping trip requires a relatively perfect set of variables to align:

*Location: A store must be near the shopper.
*Store: The store itself must interest the shopper and draw them in.
*Clothing: The clothing in the store must be of interest to the shopper.
*Circumstance: The clothing must match the circumstance for which the shopper needs clothes (dinner party, wedding, outing, etc.).
*Size: Even if all the preceding elements are present, the store must have the right size clothing in stock.
*Price: Even if all the preceding elements exist, the shopper must be able to afford the clothing.

To some extent, it’s amazing that all of these variables ever align. And perhaps they do not, which leads to compromise. But if all the variables could align and occurred repeatedly, would the shopper be more inclined to buy? Yes, and hence the premise of Stitch Fix.

Stitch Fix is disrupting fashion and retail, targeting professional women shoppers who want all the variables to align. These women have neither the time nor, perhaps, the inclination to search for that alignment. Hence, Katrina Lake, the CEO and cofounder, states, “We’ve created a way to provide scalable curation. We combine data analytics and retail in the same system.”

When a person signs up for the service, she provides a profile of her preferences: style, size, profession, budget, etc. The data from that profile become attributes in Stitch Fix's systems, which promptly schedule the dates to receive the clothes, assign a stylist based on best fit, and enable the stylist to see the person's profile (meaning her likes and dislikes). The customer also specifies when and how often she wants to receive a fix, which is a customized selection of five clothing items. Then the data-and-algorithms team will present suggestions to the stylist. This recommendation system helps the stylist make great decisions. Once the customer receives the fix, she can keep what she wants and send back the rest. Stitch Fix obviously maintains the data on preferences so that, over time, it becomes a giant analytics platform, where recommendations can be catered to a unique shopper. Not since the corner store has such intimacy been available, and it's all because of the data. Clients are happier, the job of the stylist is easier, and this data then feeds into the backend processes.
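To make the recommendation flow concrete, here is a toy sketch of profile-driven scoring in Python. The attributes, weights, and inventory are invented for illustration; Stitch Fix's actual system is far richer than this.

```python
# Score inventory against a client's stated preferences, highest first.
client = {"style": "classic", "size": "M", "budget": 150}

inventory = [
    {"item": "blazer",      "style": "classic",  "size": "M", "price": 120},
    {"item": "print dress", "style": "bohemian", "size": "M", "price": 90},
    {"item": "silk blouse", "style": "classic",  "size": "S", "price": 140},

def score(item, profile):
    """Simple additive match score; the weights are arbitrary assumptions."""
    s = 0
    s += 2 if item["style"] == profile["style"] else 0
    s += 2 if item["size"] == profile["size"] else 0
    s += 1 if item["price"] <= profile["budget"] else 0
    return s

# The 'fix' presented to the stylist: top five candidates by score.
fix = sorted(inventory, key=lambda i: score(i, client), reverse=True)[:5]
for item in fix:
    print(item["item"], score(item, client))
```

Explicit feedback on each fix (what was kept, what was returned, and why) would then update the profile, which is the loop the paragraph above describes.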

Retail is a difficult business. Fashion retail is even harder. It’s not as simple as managing the supply chain (although that’s not simple) because changing styles, seasons, and tastes are overlaid against the more traditional issues of sizes and stock. Any one poor decision can destroy the profit of a fashion retailer for a particular period, and therefore making the right decisions is at a premium. Stitch Fix attacks this challenge with human capital. Said a different way, this is not your typical management team for a fashion retailer. The leader of Operations at Stitch Fix comes from, while the analytics leader was previously an executive at Netflix. In a sense, Stitch Fix is building a supply chain and data analytics company that happens to focus on fashion. Not the other way around.

The company is making the bet that better customer insight will resolve many of the common fashion retailer issues: returns (ensuring fewer returns), inventory (predicting what people will want), and higher inventory turns (stocking things that customers will buy in the near-term). While Stitch Fix may not succeed as a retailer (although we think it will), it is laying the groundwork for the architecture of a retailer in the Data era.

Ms. Lake makes it clear that the company is first and foremost a retailer, but a retailer with a unique business model incorporating data and technology. Lake says, “We are a retailer. We just use data to be better at the core functions of retail. It’s hard to buy inventory accurately without knowing your customer, so we use data in the sourcing process as well.” She cites the example of looking at not just basic sizes (S, M, L or 2, 4, 6) as most buyers would, but looking at the detail of inseam size too. They can use this level of granularity in the buying process because of data. This attention to detail leads to a better fit for their clients and a higher likelihood those clients will buy.

Most data leveraged by Stitch Fix is generated by the company. Their advantage comes from the large amount of what Lake calls explicit data, which is direct feedback from clients on every fix. That’s specific, unique, and real-time feedback that can be incorporated into future fixes and purchases. The buyers at Stitch Fix, responsible for stocking inventory according to new trends and feedback, love this data, as it tells them what to buy and focus on. As Lake says, “What customers buy and why, and what they don’t buy and why not, is very powerful.”

Stitch Fix has analyzed over 500 million individual data points. While the company has shipped over 100,000 fixes, no two have ever been the same. That’s personalization. The company sells 90 percent of the inventory that it buys each month at full price, again because of personalization. Data and personalization have the impact of delighting clients while revolutionizing the metrics of retail.


Zara’s business model is based on scarcity. In a store, if a shopper sees a pair of pants he likes, in his size, he knows it’s the only one that will ever be available, which drives him to purchase impulsively and with conviction. Scarcity is a powerful motivator. In 2012, Inditex (the parent company of Zara) reported total sales of $20.7 billion, with Zara representing 66 percent of total sales (or $13.6 billion), with 120 stores worldwide. Scarcity can also be a revolutionary business model and profit producer.

Amancio Ortega was born in Spain in 1936. In 1972, he founded Confecciones Goa to sell quilted bathrobes. He quickly learned the complexity of fashion, extending to retail, as he operated this supply chain of his own creation. Using sewing cooperatives, Ortega relied on thousands of local women to produce the bathrobes. This was the most cost-effective way for him to produce robes, but it came with the complexity of managing literally thousands of suppliers. This experience taught Ortega the importance of vertical integration or, said another way, the value of owning every step of the value chain. He founded Zara in 1975, with this understanding.

Zara uses data to expedite the entire process of the value chain. While it takes a typical retailer 9 to 12 months to go from concept to store shelf, Zara can do the same in approximately two weeks. This reduced timetable is accomplished through the use of data: The stores directly feed the design team with real-time behavioral data. Zara's designers create approximately 40,000 new designs annually, from which 10,000 are selected for production. Given the range of sizes and colors, this variety of choice leads to approximately 300,000 new stock keeping units (SKUs) every year.
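The SKU fan-out implied by those numbers is easy to check. Only the 40,000, 10,000, and 300,000 figures come from the text; the split into sizes and colors below is my own assumption to show how the multiplication could work.

```python
designs_created = 40_000     # new designs per year (from the text)
designs_produced = 10_000    # selected for production (from the text)
sizes, colors = 6, 5         # assumed: ~30 size/color variants per design

skus = designs_produced * sizes * colors
print(skus)                  # 300,000 SKUs per year, matching the text
```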

Zara’s approach to the business has become known as fast fashion: they quickly adapt their designs to what is happening on the store floor, usher new products quickly to market, and just as swiftly move on to the next thing. This fast pace drives incredible efficiency in the implementation of the business model, yet at the same time, it creates enormous customer loyalty and intimacy, given the role of scarcity. Since the business can react so quickly, there is always sufficient capacity to produce the right design at the right time.

Zara’s system depends on the frequent sharing and exchange of data throughout the supply chain. Customers, store managers, designers, production staff, buyers, and warehouse managers are all connected by data and react accordingly. Data drives the business model, but it’s the reaction to the data that produces competitive advantage. Many businesses have a lot of data, but very few utilize it to rapidly effect decision making.

Unsold items account for less than 10 percent of Zara’s stock, compared with the industry average of 17 to 20 percent. This is the data in action. According to Forbes, “Zara’s success proves the theory that if a retailer can forecast demand accurately, far enough in advance, it can enable mass production under push control and lead to well managed inventories, lower markdowns, higher profitability (gross margins), and value creation for shareholders in the short- and long-term.”


Stitch Fix and Zara each provide a glimpse into the future of retail. It's not simply about ecommerce and automation. Instead, with the power of data, a retailer can redefine core business processes and, in many cases, invent new ways of interacting with customers. This new level of intimacy changes the role that a retailer plays in a consumer's life: from a sales outlet to a trusted advisor. However, knowing what needs to be done is easier than actually doing it, and therein lies the challenge for all fashion designers and retailers.

This post is adapted from the book, Big Data Revolution: What farmers, doctors, and insurance agents teach us about discovering big data patterns, Wiley, 2015. Find more on the web at

Monday, July 6, 2015

100% Effectiveness

In a recent profile, Reid Hoffman declared that he is operating at only 60% of capacity/effectiveness. Given that this comes from the founder/Chairman of LinkedIn, who is also a Partner at Greylock, it makes you think twice. It made me wonder if I'm setting the bar too low.


The Stanford Graduate School of Business has done a nice job with its 'Insights' program; most of the interviews are available to view online. I recently watched the one with Steve Schwarzman, and his views on talent and hiring resonated with me.

He talks about assessing the talent in your organization on a scale of 1 to 10 (10 being best). He says,

"If you're a 10, God bless you. You'll be wildly successful. If you attract 10's, they always make it rain if you need rain. A 10 knows how to sense problems, design solutions, and do new things.

A nine is great at executing. They come up with good strategies, but not great strategies. A firm full of nines, that's a winning firm. Eights, they just do stuff that you tell them. And sevens and below, I don't know what they are since we don't tolerate them."

Let me paraphrase and augment the descriptions a bit:

A 10:
-designs great strategies
-leads from the front
-senses problems/issues and resolves them
-constantly drives new initiatives and creates new value
-executes and delivers...over and over again

A 9:
-designs good strategies
-demonstrates attributes of a great leader
-executes flawlessly
-resolves issues quickly, as they are understood or highlighted

An 8:
-executes flawlessly

7 and below: not tolerated, per Schwarzman.

When I heard Schwarzman talk, and then paraphrased his descriptions above, I realized that my post on "Principles of Great Performance" was a bit off. In that post, I really defined the principles of an 8 or 9 performer. This confirmed that I am mentally setting the bar too low.

Perhaps I am operating at a mere 50% of capacity.


Other great Stanford Insights interviews:

Ajay Banga, Mastercard
Marc Andreessen, a16z
Vinod Khosla, Khosla Ventures

Tuesday, June 16, 2015