Second-Level Thinking

Howard Marks is a well-respected investor and the founder of Oaktree Capital Management. In a recent letter to investors, he introduced a concept he calls 'Second-Level Thinking'. In his words:

This is a crucial subject that has to be understood by everyone who aspires to be a superior investor. Remember your goal in investing isn’t to earn average returns; you want to do better than average. Thus your thinking has to be better than that of others – both more powerful and at a higher level. Since others may be smart, well-informed and highly computerized, you must find an edge they don’t have. You must think of something they haven’t thought of, see things they miss, or bring insight they don’t possess. You have to react differently and behave differently. In short, being right may be a necessary condition for investment success, but it won’t be sufficient. You must be more right than others . . . which by definition means your thinking has to be different. . .

For your performance to diverge from the norm, your expectations have to diverge from the norm, and you have to be more right than the consensus. Different and better: that’s a pretty good description of second-level thinking.

Second-level thinking is deep, complex and convoluted.

Certainly, he sets a high mark for how to stretch our thinking.

In the context of the technology industry, I would use the following examples to contrast first-level and second-level thinking around building products:

First-level thinking says, “Clients are asking for this; this functionality will fill a need.” Second-level thinking says, “It’s something that our clients are asking for, but everyone is asking for it. Therefore, every competitor is pursuing it; it’s just a race to the finish that will quickly commoditize. Let’s go in a different direction.”

First-level thinking says, “The IT analyst firms say this market will have low growth and most companies already have the capability. Let’s focus on a different market.” Second-level thinking says, “The outlook stinks, but everyone else is abandoning it. We could reinvent how clients are consuming in this critical area. Double down!”

These examples are rudimentary, but hopefully they are sufficient to show how Second-Level Thinking may apply in the technology industry.


Market Forces at Work

We are in an unprecedented business cycle. Protracted low interest rates have discouraged saving, and therefore money is put to work. At the same time, the rise of activist investors has altered traditional approaches to capital allocation. Public companies are being pushed to monetize their shareholders' investments, either in the form of dividends or buybacks (and most often both). Because of this unrelenting pressure on public companies, investment has begun to flow more drastically towards private enterprises (at later and later stages), leading to the 'unicorn' phenomenon. These 'unicorn' companies, which have the time and resources in their current form, are doing three things:

1) Paying anything for talent, causing wage inflation for engineers and some other roles.
2) Attempting to re-invent many industries by applying technology and, in many cases, shifting them to a pay-as-you-go (or as-a-service) model.
3) Spending aggressively, in any form necessary, to drive growth.

Public companies, in some cases, are crowded out of the investments they would normally make, given this landscape. But a central truth remains: at some point, an enterprise must make money. That timeline is typically compressed when capital begins to dry up. The term 'unicorn' was first used to connote something that is rarely seen. The fact that unicorns are now on every street corner is perhaps an indication that time is short.


The Impact

1) "Winter is coming" for the engineering wage cycle. Currently, this inflation is driven in part by supply/demand, but more so by the cult of "free money" and the lack of anything better to do with it. At some point, when 'hire at any cost' dissipates, we will know who has truly built differentiated skills.

2) The rise of cloud and data science will eliminate 50% of traditional IT jobs over the next decade. Read more here. The great re-skilling must start now, for companies that want to lead in the data era. Try this.

3) As-a-service is a cyclical change (not secular). The length of the cycle is anyone's guess. And, as with most cycles, it will probably last longer and end faster than most people believe. Much of this cycle is driven by the market forces described above (less money for capex, since all of it is being spent on buybacks/dividends). At some point, companies will realize that 'paying more in perpetuity' is not a good idea, and there will be a reversion to the mean.

4) Centralized computing architectures (cloud) will eventually diminish in importance. Right now, we are in a datacenter capital arms race, much like the telcos were in 1999. But as edge devices (smartphones, IoT, etc.) continue to advance and the world is blanketed with supercomputers, there will be less need for a centralized processing center.

5) Machine Learning is the new 'Intel inside'. This will become a default capability in every product/device, instrumenting business processes and decision making. This will put even more pressure on the traditional definition of roles in an organization.

6) There is now general agreement that data is a strategic asset. Because of this, many IT and cloud providers are seeking to capture data, under the notion that 'data has gravity'. Once data is captured, the belief goes, it is hard to move, and therefore it can be monetized. While I understand the concept, it's not very user-centric. Who likes having their data trapped? No one. Therefore, I believe the real winners in this next cycle will be those that can enable open and decentralized data access. This is effectively the opposite of capturing it: enabling a transparent and open architecture, with the ability to analyze and drive insights from anywhere. Yet another reason to believe in Spark.


It's debatable whether the six impacts above represent Second-Level Thinking. While they may to some extent, the real thinking would be to flesh out the implications of each and place bets on those implications. These bets could be made in the form of financial investments, product investments, or "start a new company" investments.

IBM | Spark - An Update

IBM made some significant announcements around our investment in Spark back in June (see here and here). Ninety days later, it seems fitting to provide an update on where we stand in our community efforts.


Before I get to the details, I want to first re-state why I believe Spark will be a critical force in technology, on the same scale as Linux.

1) Spark is the Analytics operating system: Any company interested in analytics will be using Spark.
2) Spark unifies data sources. Hadoop is one of many repositories that Spark may tap into.
3) Spark's unified programming model makes it the best choice for developers building data-rich analytic applications.
4) The real value of Spark is realized through Machine Learning. Machine Learning automates analytics and is the next great scale effect for organizations.

I was recently interviewed by Datanami on the topic of Spark. You can read it here. I like this article because it presents an industry perspective on Hadoop and Spark, but it's also clear on our unique point of view.



So, what have we accomplished and where are we going?

1) We have been hiring like crazy, as fast as we can. The Spark Technology Center (STC) in San Francisco is filling up, and we have started to bring on community leaders like @cfregly.

2) Client traction and response has been phenomenal. We have references already and more on the way.

3) We have open-sourced SystemML as promised (see on github), and we are working on it in the open. This contribution is over 100,000 lines of code.

4) Spark 1.5 was just released. We went from 3 contributors to 11 in one release cycle. Read more here.

5) Our Spark-specific JIRAs have amounted to ~5,000 lines of code. You can watch them in action here.

6) We are working closely with partners like Databricks and Typesafe.

7) We have trained ~300,000 data scientists through a number of forums. You can also find some deep technical content here.

8) We have seen huge adoption of the Spark Service on Bluemix.

9) We have ~10 IBM products that are leveraging Spark and many more in the pipeline.

10) We launched a Spark newsletter. Anyone can subscribe here.


Between now and the end of the year, we have 5 significant events where we will have much more to share regarding Spark:

a) Strata NY- Sept 29-Oct 1 in New York, NY.
b) Apache Big Data Conference- Sept 28-30 in Budapest, Hungary.
c) Spark Summit Europe- Oct 27-29 in Amsterdam.
d) Insight- Oct 26-29 in Las Vegas.
e) Datapalooza- November 10-12 in San Francisco.

In closing, here is a peek inside:

3 Business Models for the Data Era


A walk through the meatpacking district in New York City is a lively affair. In the early 1900s, this part of town was known for precisely what its name implies: slaughterhouses and packing plants. At its peak, there were nearly 300 such establishments, fairly central to the city and not far from shipping ports. Through the years, this area of town has declined at times, but in the early 1990s, a resurgence began that shows no signs of ending.

Located in some proximity to the fashion district of Manhattan, the Meatpacking District stands for what is modern, hip, and trendy; a bastion of culture in a workaday city. Numerous fashionable retailers have popped up, along with trendy restaurants. And, on the fringes, there is evidence of the New York startup culture flowing from the Flatiron District, sometimes known as Silicon Alley. A visit to one of the companies in this area opened my eyes to some of the innovation that is occurring where data is the business, instead of being an enabler or addition to the business.


As I entered the office of this relatively newborn company, I was confident that I understood their business. It was pretty simple: They were a social sharing application on the web that enabled users to easily share links, share content, and direct their networks of friends to any items of interest. The term social bookmarking was one description I had heard. The business seemed straightforward: Attract some power users, enable them to attract their friends and associates, and based on the information shared, the company could be an effective ad-placement platform, since it would know the interests of the networks and users.

But what if social bookmarking and ad placement was not the business model at all? What if all that functionality was simply a means to another end?


Data has the power to create new businesses and even new industries. The challenge is that there are many biases about the use of data in a business. There is a view that data is just about analytics or reporting. In this scenario, it’s relegated to providing insight about the business. There is another view that data is simply an input into existing products. In this case, data would be used to enrich a current business process, but not necessarily change the process. While these cases are both valid, the power of the Data era enables much greater innovation than simply these incremental approaches.

There are three classes of business models for leveraging data:

Data as a competitive advantage: While this is somewhat incremental in its approach, it is evident that data can be utilized and applied to create a competitive advantage in a business. For example, an investment bank, with all the traditional processes of a bank, can gain significant advantage by applying advanced analytics and data science to problems like risk management. While it may not change the core functions or processes of the bank, it enables the bank to perform them better, thereby creating a market advantage.

Data as improvement to existing products or services: This class of business model plugs data into existing offerings, effectively differentiating them in the market. It may not provide competitive advantage (but it could), although it certainly differentiates the capabilities of the company. A simple example could be a real estate firm that utilizes local data to better target potential customers and match them to vacancies. This is a step beyond the data that would come from the Multiple Listing Service (MLS). Hence, it improves the services that the real estate firm can provide.

Data as the product: This class of business is a step beyond utilizing data for competitive advantage or plugging data into existing products. In this case, data is the asset or product to be monetized. An example of this would be Dun & Bradstreet, which has been known as the definitive source of business-related data for years.

In these business models, there are best practices that can be applied. These best practices are the patterns that teach us how to innovate in the Data era.


Procter & Gamble is legendary for the discipline of brand management. To be the leading consumer packaged-goods company in the world, brand is everything. Ultimately, consumers will buy brands that they know and understand, and those that fulfill an expectation. This was one of the first lessons that Scott Cook learned when he joined Procter & Gamble in the 1970s. He also observed that the core business processes for managing a brand (accounting, inventory, etc.) would be transformed and subsumed by the emerging personal computer revolution. This insight led him to co-found Intuit in 1983, really at the dawn of the expansion of personal computers. The premise was simple: Everyone, small companies and individuals alike, should have access to the financial tools that previously had been reserved for large enterprise companies.

Now broadly known for tax preparation software (TurboTax), along with software solutions for small businesses (QuickBooks) and individuals (Quicken), Intuit has transformed many lives. At over $4 billion in revenue and nearly 8,000 employees, the company is a success by any barometer. Intuit is one of the few software companies to challenge Microsoft head-on and not only live to tell about it, but to prosper in its wake. Microsoft Money versus Quicken was a battle for many years; Microsoft was trying to win with software, while Intuit was focused on winning with data, utilizing software as the delivery vehicle.
Robbie Cape, who ran Microsoft’s Money business from 1999 to 2001, believes that Intuit’s advantage had very little to do with technology. Instead, he attributes Intuit’s success to its marketing prowess. While there may be some truth to the statement, it's very hard to believe that Intuit had deep enough pockets to out-market Microsoft. Instead, the differentiation seems to come from data.

The NPD Group analyst Stephen Baker said that Intuit won by building out critical mass in financial software and the surrounding ecosystem. Intuit had the insight that adjacency products and services, leveraging data, made the core software much more attractive. This insight led to their early and sustained domination of the retail channel.

Intuit’s ability to collect and analyze a large amount of sensitive and confidential data is nearly unsurpassed. Nearly 20 million taxpayers use TurboTax online, sharing their most personal data. Over 10 million customers use QuickBooks, with employee information for over 12 million people flowing through its software. Brad Smith has been cited as declaring that 20 percent of the United States Gross Domestic Product flows through QuickBooks. No other collection of data has this type and extent of financial information on individuals and small businesses.

With these data assets, Intuit began to publish the Intuit Small Business Index. The Index provides summarized insights about sales, profit, and employment data from the small businesses that use QuickBooks. This information can provide headlights to business and salary trends, which ultimately becomes insight that can be fed back into the product. This was the point that Microsoft Money either missed or simply could not achieve: The value was never in the software itself. The value was in the collection, analysis, and repurposing of data to improve the outcomes for the users.

In 2009, Intuit purchased Mint, a free web-based personal-finance application. Mint took the Intuit business model a step further: They provide their software for free, knowing that it’s a means to an end. The social aspects of Mint enable users to do much more than simply track their spending. Instead, it became a vehicle to compare the spending habits of an individual to others of a similar geography or demographic. The user can do these comparisons, or the comparisons can show up as recommendations from Mint. Further, Mint brings an entirely different demographic of data to Intuit. While the Intuit customer base was largely the 40-and-over demographic (people who had grown up with Quicken, QuickBooks, etc.), Mint attracted the Millennial crowd. The opportunity to combine those two entirely different sets of data was too attractive for Intuit to pass up.

To date, Intuit has not had a strategy for monetizing the data itself. Perhaps that may change in the future. However, with data at the core of its strategy, Intuit has used that data to drive competitive advantage, while software was merely the delivery vehicle. The companies that tried the opposite have not fared so well.


Chapters 1 through 9 offer a multitude of examples in which data is being utilized to improve existing products or services. In the pursuit of a business model leveraging data, this category is often the low-hanging fruit; more obvious, although not necessarily easy to do. The examples covered previously are:

Farming and agriculture: Monsanto is using data to augment applications like FieldScripts, which provides seeding prescriptions to farmers based on their local environments. While Monsanto could provide prescriptions through their normal course of operation, data has served to personalize and thereby improve that offering.

Insurance: Dynamic risk management in insurance, such as pay-as-you- drive insurance, leverages data to redefine the core offering of insurance. It changes how insurance is assessed, underwritten, and applied.

Retail and fashion: Stitch Fix is redefining the supply chain in retail and fashion, through the application of data. Data is augmenting the buying process to redefine traditional retail metrics of inventory, days sales outstanding, etc.

Customer service: Zendesk integrates all sources of customer engagement in a single place, leveraging that data to improve how an organization services customers and fosters loyalty over time.

Intelligent machines: Vestas has taken wind turbines — previously regarded as dumb windmills — and turned them into intelligent machines through the application of data. The use of data changes how their customers utilize the turbines and ultimately optimizes the return on their investment.

Most companies that have an impetus to lead in the Data era will start here: leveraging data to augment their current products or services. It’s a natural place to start, and it is relatively easy to explore patterns in this area and apply them to a business. However, it is unlikely that this approach alone is sufficient to compete in the Data era. It’s a great place to start, but not necessarily an endpoint in and of itself.


Previously in this chapter, the examples demonstrated how data is used to augment existing businesses. However, in some cases, data becomes the product; the sole means for the company to deliver value to shareholders. There are a number of examples historically, but this business model is on the cusp of becoming more mainstream.


In 1841, Lewis Tappan first saw the value to be derived from a network of information. At the time, he cultivated a group of individuals, known as the Mercantile Agency, to act “as a source of reliable, consistent, and objective” credit information. This vision, coupled with the progress that the idea made under Tappan and later under Benjamin Douglass, led to the creation of a new profession: the credit reporter. In 1859, Douglass passed the Agency to his brother-in-law, Robert Graham Dun, who continued expansion under the new name of R.G. Dun & Company.

With a growing realization of the value of the information networks being created, the John M. Bradstreet company was founded in 1849, creating an intense rivalry for information and insight. Later, under the strain caused by the Great Depression, the two firms (known at this time as R.G. Dun & Company and Bradstreet & Company) merged, becoming what is now known as Dun & Bradstreet.

Dun & Bradstreet (D&B) continued its expansion and saw more rapid growth in the 1960s, as the company learned how to apply technology to evolve its offerings. With the application of technology, the company introduced the Data Universal Numbering System (known as D&B D-U-N-S), which provided a numerical identification for businesses of the time. This was a key enabler of data-processing capabilities for what had previously been difficult-to-manage data.

By 2011, the company had gained insight on over 200 million businesses. Sara Mathew, the Chairman and CEO of D&B, commented, “Providing insight on more than 200 million businesses matters because in today’s world of exploding information, businesses need information they can trust.”

Perhaps the most remarkable thing about D&B is the number of companies that have been born out of that original entity. As the company has restructured over the years, it has spun off entities such as Nielsen Corporation, Cognizant, Moody’s, IMS Health, and many others. These have all become substantial businesses in their own right. They are each unique in the markets served and all generate value directly from offering data as a product:

Nielsen Corporation: Formerly known as AC Nielsen, the Nielsen Corporation is a global marketing research firm. The company was founded in 1923 in Chicago, by Arthur C. Nielsen, Sr., in order to give marketers reliable and objective information on the impact of marketing and sales programs. One of Nielsen’s best known creations is the Nielsen ratings, an audience measurement system that measures television, radio, and newspaper audiences in their respective media markets. Nielsen now studies consumers in more than 100 countries to provide a view of trends and habits worldwide and offers insights to help drive profitable growth.

Cognizant: Starting as a joint venture between Dun & Bradstreet and Satyam Computers, the entity was originally designed to be the in-house IT operation for D&B. As the entity matured, it began to provide similar services outside of D&B. The entity was renamed Cognizant Technology Solutions to focus on IT services, while the former parent company of Cognizant Corporation was split into two companies: IMS Health and Nielsen Media Research. Cognizant Technology Solutions became a public subsidiary of IMS Health and was later spun off as a separate company. The fascinating aspect of this story is the amount of intellectual property, data, and capability that existed in this one relatively small part of Dun & Bradstreet. The interplay of data, along with technology services, formed the value proposition for the company.

IMS Health: IMS became an independent company in 1998. IMS’s competitive advantage comes from the network of drug manufacturers, wholesalers, retailers, pharmacies, hospitals, managed care providers, long-term care facilities and other facilities that it has developed over time. With more than 29,000 sources of data across that network, IMS has amassed tremendous data assets that are valuable to a number of constituents — pharmaceutical companies, researchers, and regulatory agencies, to name a few. Like Lewis Tappan’s original company back in 1841, IMS recognized the value of a network of information that could
be collected and then provided to others. In 2000, with over 10,000 data reports available, IMS introduced an online store, offering market intelligence for small pharmaceutical companies and businesses, enabling anyone with a credit card to access and download data for their productive use. This online store went a long way towards democratizing access to the data that had previously been primarily available to large enterprise buyers.

Moody’s: Moody’s began in 1900, with the publishing of Moody’s Manual of Industrial and Miscellaneous Securities. This manual provided in-depth statistics and information on stocks and bonds, and it quickly sold out. Through the ups and downs of a tumultuous period, Moody’s ultimately decided to provide analysis, as opposed to just data. John Moody, the founder, believed that analysis of security values is what investors really wanted, as opposed to just raw data. This analysis of securities eventually evolved into a variety of services Moody’s provides, including bond ratings, credit risk, research tools, related analysis, and ultimately data.

Dun & Bradstreet is perhaps the original innovator of the data-is-the-product business model. For many years, their reach and access to data was unsurpassed, creating an effective moat for competitive differentiation. However, as is often the case, the focus on narrow industries (like healthcare) and new methods for acquiring data have slowly brought a new class of competitors to the forefront.


Despite its lack of broad awareness, CoStar is a NASDAQ publicly traded company with revenues of $440 million, 2,500 employees, a stock price that has appreciated 204 percent over the last three years, a customer base that is unrivaled, and a treasure trove of data. CoStar’s network and tools have enabled it to amass data on 4.2 million commercial real estate properties around the world. Simon Law, the Director of Research at CoStar, says, “We’re number one for one very simple reason, and it’s our research. No one else can do what we do.” Here are some key metrics:

* 5.1 million data changes per day
* 10,000 calls per day to brokers and developers
* 500,000 properties canvassed nationwide annually
* 1 million property photographs taken annually

CoStar has an abundance of riches when it comes to real estate data. Founded in 1987 by Andrew Florance, CoStar invested years becoming the leading provider of data about space available for lease, comparable sales information, tenant information, and many other factors. The data covers all commercial property types, ranging from office to multi-family to industrial to retail properties.

The company offers a set of subscription-based services, including

CoStar Property Professional: The company’s flagship product, which offers data on inventory of office, industrial, retail, and other commercial properties. It is used by commercial real estate professionals and others to analyze properties, market trends, and key factors that could impact food service or even construction.

CoStar Comps Professional: Provides comparable sales information for nearly one million sales transactions primarily for the United States and the United Kingdom. This service includes deeds of trust for properties, along with space surveys and demographic information.

CoStar Tenant: A prospecting and analytical tool utilized by professionals. The data profiles tenants, lease expirations, occupancy levels, and related information. It can be an effective business development tool for professionals looking to attract new tenants.

CoStarGo: A mobile (iPad) application, merging the capabilities of Property Professional, Comps Professional, Tenant, and other data sources.

The value of these services is obviously determined by the quantity and quality of the data. Accordingly, ensuring that the data remains relevant is a critical part of CoStar’s business development strategy.

Since its inception, CoStar has grown organically, but it has also accelerated growth through a series of strategic acquisitions. In 2012, CoStar acquired LoopNet, an online marketplace for the rental and sale of properties. CoStar’s interest in the acquisition was less about the business (the marketplace) and much more about the data. Said another way, their acquisition strategy is about acquiring data assets, not people or technology assets (although those are often present). As a result of the acquisition, it is projected that CoStar will double their paid subscriber base to about 160,000 professionals, which represents about 15 percent of the approximately 1 million real estate professionals. Even more recently, in 2014, CoStar acquired a digital alternative to classified ads. The war chest of data assets continues to grow.

The year 2008 was one of the most significant financial crises the world has seen. Financial institutions collapsed, and the real estate market entered a depression based on the hangover from subprime debt. Certainly, you would expect a company such as CoStar to see a similar collapse, given their dependence on the real estate market. But that’s not exactly what happened.

From 2008 to 2009, CoStar saw an insignificant revenue drop of about 1 percent. This drop was followed by an exponential rebound to growth in 2010 and beyond. Is it possible that data assets create a recession-proof business model?

CoStar Financial Results

While there are other data alternatives in the market (Reis Reports, Xceligent, CompStak, ProspectNow, and others), the largest collection of data is a differentiator. In fact, it is a defensible moat that makes it very hard for any other competitors to enter. For CoStar, data is the product, the business, and a control point in the industry.


In 1959, Richard O’Brien founded IHS, a provider of product catalog databases on microfilm for aerospace engineers. O’Brien, an engineer himself, saw how difficult it was to design products and knew that utilizing common components could dramatically increase the productivity of engineers. However, he took it one step further by applying technology to the problem — using electronic databases and CD-ROMs to deliver the knowledge furthered the productivity gains. And engineers love productivity gains.

This attitude toward data was set in the company’s DNA from the start as the company could see how to improve the lives and jobs of their clients, just through a better application of data. In the 1990s, IHS started distributing their data over the Internet, and with an even more cost-effective way to share data assets, they decided to expand into other industries. Soon enough, IHS had a presence in the automotive industry, construction, and electronics.

As seen with CoStar, IHS quickly realized that business development for a data business could be accelerated through acquisitions. Between 2010 and 2013, they acquired 31 companies. This acquisition tear continued, with the recent high-profile $1.4-billion acquisition of R.L. Polk & Company. As the parent company of CARFAX, R.L. Polk cemented IHS’s relevance in the automotive industry.

IHS’s stated strategy is to leverage their data assets across the interconnected supply chains of a variety of industries. Across their focus industries, there is $32 trillion of annual spending. IHS data assets can be utilized to enhance, eliminate, or streamline that spending, which makes them an indispensable part of the supply chain. IHS’s data expertise lies in

*Renewable energy
*Automotive parts
*Aerospace and defense
*Maritime logistics

IHS also has a broad mix of data from different disciplines.

While some view data as a competitive differentiator or something to augment current offerings, CoStar, IHS, and D&B are examples of companies that have a much broader view of data: a highly profitable and defensible business model.


The role of data in enterprises has evolved over time. Initially, data was used for competitive advantage to support a business model. This evolved to data being used to augment or improve existing products and services. Both of these applications of data are relevant historically and in the future. However, companies leading in the data era are quickly shifting to a business model of data as the product. While there have been examples of this in history as discussed in this chapter, we are at the dawn of a new era, the Data era, where this approach will become mainstream.

The company that started as a social bookmarking service quickly realized the value of the data it was able to collect via the service. This allowed it to build a product strategy around the data it collected, instead of around the service that it offered. This opportunity is available to many businesses, if they choose to act on it.

This post is adapted from the book, Big Data Revolution: What farmers, doctors, and insurance agents teach us about discovering big data patterns, Wiley, 2015. Find more on the web at

Technical Leadership

As companies grow and mature, it is difficult to maintain the pace of innovation that existed in the early days. This is why many companies, as they mature (e.g., the Fortune 500), sometimes lose their innovation edge. The edge is lost when technical leadership in the company either takes a backseat or evolves into a different role than it had in the early days. I see a number of companies where, over time, the technical managers give way to "personnel" or "process" managers, which tends to be a death knell for innovation.

Great technical leaders provide a) team support and motivation, b) technical excellence, and c) innovation. Said another way, they lead through their actions and thought leadership.


As I look at large organizations today, I believe that technical leaders fall into three types (this is just my framework for characterizing what I see).

The Ambassador
A technical leader of this type brings broad insight and knowledge and typically spends a lot of time with the clients of the company. They guide clients in broad directional discussions and will often be a part of laying out logical architectures and approaches. They are typically not as involved where the rubber hits the road (i.e., implementing architectures or influencing specific product roadmaps). Most of the artifacts from The Ambassador are in email, PowerPoint, and discussion (internally and with clients).

The Developer
A technical leader who is very deep, typically in a particular area. They know their user base intimately and use that knowledge to drive changes to the product roadmap. They are heavily involved in critical client situations, as they have the depth of knowledge to solve the toughest problems, and they make the client comfortable due to their immense knowledge. Most of the artifacts from The Developer are code in a product and a long resume of client problems solved and new innovations delivered in a particular area.

The Ninja
A technical leader who is deep, but broad as appropriate. They integrate across capabilities and products to drive toward a market need. They have a 'build first' mentality, or what I call a 'hacker mentality'. They would prefer to hack up a functional prototype in 45 days rather than produce a single slide of PowerPoint. Their success is defined by their ability to introduce a new order of things. They thrive on user feedback and iterate quickly as they hear from users. Said another way, they build products like a start-up would. Brian, profiled here, is a great example of a Ninja. Think about the key attributes of Brian's approach:

1) Building a broad and varied network of relationships
2) Identifying 'strategy gaps'
3) Linking work to existing priorities
4) Working with an eye toward scale
5) Orchestrating milestones to build credibility

That's what Ninjas do.


Most large companies need Ambassadors, Developers, and Ninjas. They are all critical and they all have a role. But, the biggest gap tends to be in the Ninja category. A company cannot have too many, and typically does not have enough.

Big Data Revolution

My book is finally coming (looks like January). I shared a draft of the foreword here, and thought I would share Chapter 1 now, as we are getting down to final proofs. Patrick McSharry, my co-author, is a friend that I met on the journey. Note: the graphics and formatting will be cleaner/sharper in the book.

Blog subscribers (you can subscribe on the right side of this page) will receive a copy of Chapter 2: Why Doctors Will Have Math Degrees, once it's ready.

Chapter 1: Transforming Farms with Data


AS THE WHEELS came down on my cross-country flight, I prepared for our landing at San Francisco International Airport (SFO). Looking out the window, I could see the sprawl of Silicon Valley, the East Bay, and, in the distance, the San Francisco skyline. It was hard to believe that I was here to explore agriculture in 2013, given that what I could see from the plane was mostly concrete, highways, and heavy construction.

Not too many miles away from SFO, I began to wind through the tight curves of back roads, making my way to the headquarters of a major agricultural producer. While I had never visited this company before, I had the opportunity to sit down with the executive team to explore the topic of big data in farming and agriculture.

I embraced the calm and serene scene, a far cry from the vibrancy of San Francisco and the rush of Silicon Valley. As we entered a conference room, the discussion turned to produce, as I asked, “Why is it that the strawberries that I bought last week taste so much better than the ones I bought the week before?” While I posed the question as a conversation starter, it became the crux of our discussion.

It seems that quality — and, more specifically, consistency of quality — is the foremost issue on the mind of major producers. I asked about the exquisite quality of produce in Japan. The executive team quickly noted that Japan achieves quality at the price of waste. Said another way, they keep only 10 percent of what a grower provides. This clarified the point in my mind that quality, consistency of quality, and eliminating waste create the three sides of a balanced triangle.

The conversation that followed revealed one significant consensus in the room: Weather, more than anything else, impacts crop production and the consistency of crops. And since no one in the room knew how to change the weather, they believed that this was the way things would always be. I realized that, by blaming the weather, this team believed their future did not belong in their own hands but was controlled by the luck, or the misfortune, of each passing season.


The evolution of farming in the developed world provides context to much of the conventional wisdom about farming that exists today. Dating back to the 1700s, farming has been defined by four eras:

1700s (Subsistence Farming): Farmers produced the minimum amount of food necessary to feed their families and have some in reserve for the cold winter months.

1800s (Farming for Profit): This era marked the transition from subsistence farming to for-profit farming. This is when the widespread use of barns began, for the purpose of storing tools, crops, and related equipment. These were called pioneer farms.

Early 1900s (Power Farming): At this time, the “power” came in the form of 1,800-pound horses. The farmers used animals for plowing, planting, and transporting crops. The use of animal labor drove the first significant increase in crop productivity.

Mid-to-Late 1900s (Machine Farming): Sparked by the Industrial Revolution, this era’s farmers relied on the automation of many of the tasks formerly done by hand or animal. The addition of machinery created tremendous gains in productivity and quality.

Each era represented a significant step forward, based on the introduction of new and tangible innovations: barns, tools, horses, or machines. The progress was physical in nature, as you could easily see the change happening on the farms. In each era, production and productivity increased, with the most significant increases in the latter part of the 20th century.

Through these stages, farming became more productive, but not necessarily more intelligent.


The current era of farming is being driven by the application of data. It is less understood than previous eras because it is not necessarily physical in nature. It’s easy to look at a horse and understand quickly how it can make farm labor easier, but understanding how to use geospatial information is a different proposition. The advantage is driven by intangibles: knowledge, insight, decision making. Ultimately, data is the fertilizer for each of those intangibles.

A simple understanding of how a crop grows can aid in understanding the impact of data on farms. The basic idea is that a plant needs sunlight, nutrients from the soil, and water to grow into a healthy plant, through a process called photosynthesis. Healthy plants must keep cool through a process called transpiration (similar to how a human sweats when physically stressed). But, if a plant lacks the nutrients or conditioning to transpire, then its functions will start to break down, which leads to damage. Using data to improve farming is fundamentally about having the ability to monitor, control, and if necessary, alter these processes.

Today, according to the Environmental Protection Agency, there are 2.2 million farms in the United States and many more outside of the U.S. The average farm spends $110,000 per year on pest control, fertilizer, and related items to drive yield. Improving profit and harvest yields across such a vast territory requires better collection, use, and application of data.


Potato farming can be exceedingly difficult, especially when attempted at a large scale with the goal of near perfect quality. The problem with potato farming is that the crop you are interested in is underground; therefore, producing a high-quality and high-yield potato crop depends on agronomic management during the growing process.

At the Talking Data South West Conference in 2013, Dr. Robert Allen, a Senior GIS Analyst at Landmark Information Group, highlighted the importance of data in potato farming, in his talk titled, “Using Smartphones to Improve Agronomic Decision Making in Potato Crops.” Dr. Allen makes the case that leveraging data that describes the growth and the maturation of a crop during the growing season is instrumental to a successful yield. Continuous insight, delivered throughout the growing season, may have a material impact on the productivity of a crop.

One of the key variables required for yield prediction, and needed to manage irrigation, is groundcover. Groundcover, the percentage of ground covered by green leaf, provides critical input into the agronomic management of potato crops. Measuring groundcover is not as simple as pulling out a measuring tape; it requires capturing imagery of potato crops and large-scale collection of data related to the images (the water balance in soil, etc.), and the data must be put in the hands of farm managers and agronomists so that they can actually do something about what the data is telling them. The goal is not to collect data, but to act on it.
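To make the measurement concrete, groundcover can be estimated from imagery as the fraction of pixels classified as green leaf. The threshold rule and values below are purely illustrative assumptions, not CanopyCheck's actual algorithm:

```python
# Illustrative sketch (not CanopyCheck's actual method): estimate groundcover
# as the percentage of image pixels classified as green leaf. A pixel counts
# as "leaf" when its green channel clearly dominates red and blue.

def is_leaf(pixel, margin=20):
    """Classify an (R, G, B) pixel as green leaf via a simple threshold."""
    r, g, b = pixel
    return g > r + margin and g > b + margin

def groundcover(pixels):
    """Return the percentage of pixels classified as leaf."""
    leaf = sum(1 for p in pixels if is_leaf(p))
    return 100.0 * leaf / len(pixels)

# Toy 2x2 "image": two leafy pixels, one of soil, one of stone
image = [(30, 120, 40), (25, 140, 60), (120, 100, 80), (90, 90, 90)]
print(groundcover(image))  # → 50.0
```

A real pipeline would add geo-tagging of each image and calibration for lighting, but the core metric remains this simple ratio.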

Dr. Allen describes four considerations in potato farming related to using data:

Time: Data needs to be collected at regular intervals and decisions need to be made in near-real-time.

Geography: These tend to be large-scale operations (10,000 to 20,000 acres), with fields distributed over large areas.

Manpower: Data is often collected by farm field assistants (not scientists) and must be distributed, because decision makers tend to be remote from the field.

Irrigation: Irrigation, while very expensive, is a primary factor in the maturation of a potato crop. Utilizing data to optimize the use of irrigation can lead to a productive crop, at the lowest possible cost.

These considerations led to a data collection and analysis solution called CanopyCheck. While it requires only a download from Apple’s App Store, it provides a rich data experience to compare groundcover and other related data to optimize the quality and yield of a potato crop.

The Landmark Information Group describes CanopyCheck as

This app is for potato growers, using the CanopyCheck groundcover monitoring system, and captures accurate and reliable images of the potato crop which can be used to describe crop development. Each image is geo-located and labelled with farm and field information specified by the potato grower on the accompanying CanopyCheck website.

Conventional wisdom states that growing potatoes is easy: They don’t need sunlight, they do not need daily care, and by controlling the amount of water they receive, growing potatoes is a fairly simple process. However, as is often the case, conventional wisdom overlooks the art of the possible. In the case of potatoes, the application of data and agronomy can drive yield productivity up 30 to 50 percent, which is material in terms of the economics and the waste that is reduced.


Whether you strike up a conversation with a farmer in the 1800s, 1900s, or even in the early part of this century, they would highlight:

1) Their growing strategy evolves each year.

2) While the strategy evolves, their ability to execute improves each year, based on increased knowledge.

While this farming approach has been good enough for the better part of three centuries, the Data era ushers in the notion of precision farming. According to Tom Goddard, of the Conservation and Development Branch of Alberta Agriculture, Food and Rural Development, the key components of precision farming are:

Yield monitoring: Track crop yield by time or distance, as well as bushels per load, number of loads, and fields.

Yield mapping: Global Positioning System (GPS) receivers, along with yield monitors, provide spatial coordinates, which can be used to map entire fields.

Variable-rate fertilizer: Managing the application of a variety of fertilizer materials.

Weed mapping: Mapping weeds using a computer connected to a GPS receiver while adjusting the planting strategy, as needed.

Variable spraying: Once you know weed locations from weed mapping, spot control can be practiced.

Topography and boundaries: Creating highly accurate topographic maps using a Differential Global Positioning System (DGPS). This data can be used to take action on yield maps.

Salinity mapping: This is valuable in interpreting yield maps and weed maps, as well as tracking the salinity over a period of time.

Guidance systems: Guidance systems, such as DGPS (accurate to a foot or less) are valuable for assessing fields.

Records and analyses: Large data collection is necessary to store pertinent data assets, along with images and geospatial information. It is important that this information can be archived and retrieved for future use.

The extensive insight that can be gained by collecting each of these data points is potentially revolutionary. It evolves a process from instinctual to data-driven — which, as seen in the potato example, has a fundamental impact on yields and productivity.
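The yield-mapping component above can be sketched in a few lines: pair GPS fixes with yield-monitor readings and bin them into a coarse field grid. The cell size, coordinates, and data layout below are assumptions for illustration, not any vendor's actual format:

```python
# Hypothetical sketch of yield mapping: each sample pairs a GPS fix with a
# yield-monitor reading; samples are binned into grid cells, and each cell
# reports its average yield. Cell size is an illustrative assumption.
from collections import defaultdict

CELL = 0.01  # grid cell size in degrees (illustrative)

def yield_map(readings):
    """readings: list of (lat, lon, bushels) samples from the yield monitor.
    Returns {(cell_lat, cell_lon): mean bushels} for each grid cell."""
    cells = defaultdict(list)
    for lat, lon, bushels in readings:
        # Snap each coordinate to the lower-left corner of its grid cell
        key = (round(lat // CELL * CELL, 4), round(lon // CELL * CELL, 4))
        cells[key].append(bushels)
    return {k: sum(v) / len(v) for k, v in cells.items()}

samples = [(41.505, -90.515, 180), (41.505, -90.514, 170), (41.512, -90.515, 140)]
for cell, avg in sorted(yield_map(samples).items()):
    print(cell, avg)
```

The resulting cell averages are exactly the kind of spatial layer that variable-rate fertilizer and variable spraying decisions can then be driven from.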

The underlying assumption is that the tools and methodology for capturing farm data are available and utilized efficiently. This is a big assumption because many farms today are not set up to actively collect and capitalize on new data assets. Accordingly, the ability to capture farm data becomes the source of competitive advantage.


It sounds easy. Collect data. Then use that data to deliver insights. But, for anyone who has been on a rural farm in the last decade, it is easier said than done. There are limitations that exist on many farms: lack of digital equipment, lack of skilled technology labor, poor distribution of electricity, and poorly defined processes. Because of these factors, each farmer must establish a new order of doing things to take advantage of the Data era. The data landscape for farming consists of three primary inputs.

Sensing equipment: Mounted devices on machinery, in fields, or anywhere near crops could be designed to collect/stream data or to control the application of water, pesticides, etc. This could range from instrumented tractors for harvesting to devices to monitor crop transpiration. The evolution of machines to collect data on crops and soil has been dramatic. In the last decade alone, equipment has evolved from mechanical-only to a combination of mechanical and digital technology. This change has been expedited by early insights that even small adjustments in planting depth or spacing can have huge impact on yields. So, while today the sensing equipment is largely a digitized version of common farm machines, the future will see a marked advancement in machines. Drones, driverless tractors, and other innovations will become commonplace.

Global Positioning System (GPS): GPS provides the ability to pinpoint location to within one meter. While GPS first emerged for automobiles in the early 1990s in places like Japan, it has only recently become common in all automobiles. Farming equipment, as you may expect, has been a step further behind, with the wide use of GPS accelerating only in the last decade.

Geographic Information System (GIS): GIS assesses changes in the environment and tracks the spread of disease, as well as identifying where soil is moist, eroded, or has experienced similar changes in condition. Topography and geology are important considerations in the practice of farming. Both are well accounted for with modern-day Geographic Information Systems.

By combining these three inputs, farmers will be able to accurately pinpoint machinery on their farms, send and receive data on their crops, and know which areas need immediate attention.
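A toy sketch of how these three inputs might come together: a sensor reading carries a GPS fix, which is looked up against a GIS soil layer to flag locations that need immediate attention. All field names, coordinates, and thresholds here are hypothetical:

```python
# Hypothetical sketch of fusing sensor, GPS, and GIS data: flag locations
# where a dry sensor reading falls in a cell the GIS layer marks as
# erosion-prone. Field names and the moisture threshold are illustrative.

def needs_attention(sensor_readings, gis_soil):
    """sensor_readings: list of dicts with 'lat', 'lon', 'moisture' (0-1).
    gis_soil: {(lat, lon): condition} layer keyed by the same coordinates.
    Returns the coordinates that warrant immediate attention."""
    alerts = []
    for r in sensor_readings:
        cell = (r["lat"], r["lon"])
        if r["moisture"] < 0.2 and gis_soil.get(cell) == "erosion-prone":
            alerts.append(cell)
    return alerts

readings = [{"lat": 41.50, "lon": -90.52, "moisture": 0.15},
            {"lat": 41.51, "lon": -90.52, "moisture": 0.40}]
soil = {(41.50, -90.52): "erosion-prone", (41.51, -90.52): "stable"}
print(needs_attention(readings, soil))  # → [(41.5, -90.52)]
```

In practice the GIS lookup would be a spatial query rather than an exact key match, but the decision logic is the same: location plus sensor value plus environmental context yields an action.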


John Deere founded Deere & Company in 1836, when he moved to Grand Detour, Illinois, to open a repair shop for farming tools. Deere eventually moved beyond tools and into the production of plows, which became a mainstay in the Farming for Profit and Power Farming eras. In 1848, Deere relocated the company to its current home in Moline, Illinois, and upon his death in 1886, the presidency of the company passed to Charles Deere.

Charles led the company into the 20th century, where the company pioneered the move to gasoline tractors, which became the defining product not only of the company, but of farming and agriculture in this time. The dominance of the company was ensured by continuous innovation in their tractors, innovation in their business model (a robust dealer network), and their defining image: John Deere green. As of 2010, the company employed 55,000 people and was operating in 30 countries worldwide. A shoo-in for continued dominance, right?

Monsanto, founded in 1901, took a bit longer to come into its defining moment. Moving into detergents and pesticides, Monsanto eventually became the pioneer in applying biotechnology to farming and agriculture. With biotechnology at its core, Monsanto applies data and insight to solve problems. Accordingly, Monsanto was a data-first company from its birth, which continued to drive its innovation and relevance. But sometimes it takes time for an industry to catch up to its innovative leaders, and the first major evidence of how Monsanto would lead a change in the landscape was seen around 2010. That is when the fortunes of Deere & Company and Monsanto start to go in different directions.

Monsanto had one critical insight: Establishing data-driven planting advice could increase worldwide crop production by 30 percent, which would deliver an estimated $20-billion economic impact — all through the use and application of data. As Monsanto bet the company on the Data era, the stock market began to realize the value of the decision, leading to a period of substantial stock appreciation.

Data is disrupting farming, and we are starting to see that in the business performance of companies driving innovation in the industry. Gone are the days in which a better gasoline tractor will drive business performance. Instead, farmers demand data and analytics from their suppliers, as they know that data will drive productivity.


Monsanto calls their approach to farming in the Data era Integrated Farming Systems (IFS). Their platform provides farmers with field-by-field recommendations for their farm, including ways to increase yield, optimize inputs, and enhance sustainability. Listening to the data and making small adjustments to planting depth or the spacing between rows makes a vast difference in production. As Monsanto says, this is “Leveraging Science-Based Analytics to Drive a Step Change in Yield and Reduced Risk.”

Monsanto’s prescribed process for Integrated Farming Systems involves six steps:

1. Data backbone: Seed-by-environment testing to produce on-farm prescriptions
2. Variable rate fertility: Adjusting prescriptions based on conditions
3. Precision seeding: Optimal spacing between rows
4. Fertility and disease management: Custom applications, as needed
5. Yield monitor: Delivering higher-resolution data
6. Breeding: Increasing data-point collection to increase genetic gain

FieldScripts became the first commercial product to be made available as a component of Monsanto’s overall IFS platform. FieldScripts provides accurate seeding prescriptions for each farmer and each field.

Monsanto, through its seed dealer network, engages directly with farmers to optimize two variables: planting and seeding technology data. The seeding technology, which is primarily data about seeding, is the differentiating factor. Applying that insight to a personalized planting plan enables Monsanto to deliver personalized prescriptions for every field.

FieldScripts, delivered via iPad, utilizes a custom application called FieldView. FieldView, deployed to farmers and leveraging the data acquired throughout the years, equips farmers with the tools and insights needed to make adjustments for optimal yields.

Deere & Company and Monsanto both have bright futures. According to Jeremy Grantham, chief investment strategist of Grantham Mayo Van Otterloo (GMO), with the world’s population forecast to reach almost 10 billion by 2050, the current approach cannot sustainably feed the world’s population. The demand presented by population growth creates an opportunity for all companies that service the industry. For the moment, though, Monsanto has leaped ahead in this new era of data farming over the past five years, forcing Deere & Company to play catch-up.


Data is starting to prevail in agriculture. This is evident not only in the changing practices of farmers, but also in the ecosystem. New companies are being built, focused on exploiting the application of data.


Monsanto’s aggressive move into the Data era was perhaps punctuated in October 2013 with their announced acquisition of the Climate Corporation for $930 million. Why would a firm with its roots in fertilizers and pesticides spend nearly $1 billion on an information technology (IT) company? This aggressive acquisition demonstrates the evolution of the industry. “The Climate Corporation is focused on unlocking new value for the farm through data science,” commented Hugh Grant, the chairman and chief executive officer for Monsanto. Founded in 2006, the Climate Corporation employs a team unlike any other in the agriculture industry. The team is composed of many former Google employees, along with other elite technology minds from the Silicon Valley scene. The tools they develop help farmers boost productivity, improve yields, and manage risks, all based on data.

At the heart of this acquisition lies the core belief that every farmer has an unrealized opportunity of around 50 bushels of crop (corn, potatoes, etc.) in each of their fields. The key to unlocking these additional bushels lies in the data.

While the leaders of the past would provide better machines, Monsanto focuses on providing better data. By combining a variety of data sources (historical yield data, satellite imagery, information on soil and moisture, and best practices around planting and fertility), Monsanto equips farmers with the information they need to drive productivity.


GrowSafe Systems began studying cattle in 1990. This was not a group of former cattle hands, but a team of engineers and scientists who foresaw data science as playing a role in cattle raising. In 2013, the GrowSafe team won the Ingenious Award from the Information Technology Association of Canada for best innovation. This was the first time that this organization gave an innovation award to anyone in the world of cattle.

GrowSafe developed a proprietary way of collecting data through the use of sensors in water troughs and feedlots. With these sensors, they track every movement of cattle, including specifics about the cattle themselves: consumption, weight, movement, behaviors, and health. Each night, the data is collected and then compared against a larger corpus of historical data. The goal is to look for outliers. GrowSafe knows that the data reveals information that cattle farmers often cannot detect. This innovative approach enables farmers to prevent a disease before it begins.
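The nightly comparison can be sketched as a simple statistical outlier check. GrowSafe's actual models are proprietary, so the z-score rule, animal IDs, and numbers below are purely illustrative:

```python
# Illustrative sketch of a nightly outlier check (not GrowSafe's actual
# model): compare each animal's daily feed intake against its own history
# and flag readings more than k standard deviations from the mean.
import statistics

def flag_outliers(history, today, k=2.0):
    """history: {animal_id: [past daily intakes]}; today: {animal_id: intake}.
    Returns ids of animals whose intake deviates by more than k std devs."""
    flagged = []
    for animal, intake in today.items():
        past = history[animal]
        mean = statistics.mean(past)
        sd = statistics.stdev(past)
        if sd and abs(intake - mean) > k * sd:
            flagged.append(animal)
    return flagged

history = {"A12": [9.8, 10.1, 10.0, 9.9, 10.2],
           "B07": [8.0, 8.2, 7.9, 8.1, 8.0]}
today = {"A12": 10.1, "B07": 5.5}  # B07 barely ate: possible illness
print(flag_outliers(history, today))  # → ['B07']
```

The point is the pattern, not the statistic: an animal that deviates from its own baseline is flagged before a human could spot anything wrong.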


The mainstays of today’s farms are people, fertilizers, irrigation, gas machines, trucks and carts for transport, and local knowledge. It is a craft, and typically the only person who can run a certain farm is the person who started it. This is why so many farms fold after the head of the operation retires. The success is in their hands — it’s their craft.

Farms in 2020 will have a completely different feel from today’s farms. In fact, they may be unrecognizable to a farmer of the early 21st century. Approaches that seem futuristic today will be common to all farmers in 2020:

Digital machines: Digital machines, acting as sensors, will be the norm. The days of simple gas machines will be far in the past. In fact, by 2020, many farm machines will be battery- or solar-powered, with gas itself becoming a rarity. The digital machines will be much more than Internet-enabled tractors. There will be drones. Many drones. In fact, drones will become the most cost-effective and precise mechanism for managing many chores that farmers do via hand or by tractor today. With the sprawl of digital machines, device management will become the “cattle herding of the future,” as all devices will have to be managed and maintained appropriately.

IT back office: Every farm will have an information technology (IT) back office. Some will manage it themselves, while many will rely on a third party. The IT office will be responsible for the aforementioned device management, as well as remote monitoring and, ultimately, data-driven decision making. The IT back office will be the modern-day farm hand, responding to the farmers’ every need and ultimately ensuring that everything operates as programmed.

Asset optimization: With the sprawl of new devices and machines, asset optimization will be at the forefront. Maximizing the useful life of machines, optimizing location, and managing tasks (workloads) will be key inputs into determining the productivity of a farm.

Preventative maintenance: Digital machines, like gas machines, break. It is a reality of complex systems. This fact places the burden on preventing or minimizing outages because of maintenance and repairs. Many of the digital machines and devices will be designed to predict and prevent failures, but ultimately, this must become a core competence of the farmer or his IT back office. Given that each farm will use the machines differently, the maintenance needs will likely be unique.

Predictable productivity: In today’s farms, the yield and productivity of crops vary significantly. Whether it is the weather, impacts of deforestation, or the impact of certain pesticides and fertilizers, it is an often-unpredictable environment. By 2020, productivity will be more predictable. Given all of the sources of data, GIS and GPS capabilities, and the intensive learning that will happen over the next five years, yields will become predictable, creating greater flexibility in the financial model for a farmer.

Risk management: In 2020, instead of being a key determinant of success, weather will simply be another variable in the overall risk management profile of an asset (in this case, a farm). Given the predictable productivity, risk management will be more about managing catastrophic outliers and, in some cases, sharing that risk with counterparties. For example, index-based insurance offers great potential in this area.

Real-time decision-making: Decisions will be made in the moment. With the growth of streaming data, farms will be analyzed as variables are collected and acted upon immediately. Issues will be remediated in 2020 faster than they will be identified in 2014. This is part of what drives the predictable productivity.

Production variability: Farms will no longer produce a single crop or focus on a subset. Instead, they will produce whatever will drive the greatest yield and productivity based on their pre-planting-season analysis. Farms will also begin to factor in external data sources (supply and demand) and optimize their asset for the products in greatest demand. This will completely change the variability that we see in commodities and food prices today.


As I left the headquarters of the agricultural company outside of San Francisco, I was amazed that a belief persists, in some places, that weather is the major force impacting our ability to grow consistent and productive crops. That does not seem much different from the pioneer farms of the 1800s, where the weather determined not only their business, but also their livelihoods.

Perhaps, as postulated before, that is the easy answer, as opposed to the real answer. The innovations we’ve seen with precision farming, the use of data to transform potato crops, and the emergence of leaders like Monsanto make it evident that the weather is merely one variable that could impact crops in the future.

Data trumps weather. Farming and agriculture will be transformed by making the leap of acknowledging this truth.


Copyright 2014 Rob Thomas.
All Rights Reserved.