Big Data Revolution

My book is finally coming (looks like January). I shared a draft of the foreword here, and thought I would share Chapter 1 now, as we are getting down to final proofs. Patrick McSharry, my co-author, is a friend that I met on the journey. Note: the graphics and formatting will be cleaner/sharper in the book.

Blog subscribers (you can subscribe on the right side of this page) will receive a copy of Chapter 2: Why Doctors Will Have Math Degrees, once it's ready.

Chapter 1: Transforming Farms with Data


AS THE WHEELS came down on my cross-country flight, I prepared for our landing at San Francisco International Airport (SFO). Looking out the window, I could see the sprawl of Silicon Valley, the East Bay, and in the distance, the San Francisco skyline. It was hard to believe that I was here to explore agriculture in 2013, given that what I could see from the plane was mostly concrete, highways, and heavy construction.

Not too many miles away from SFO, I began to wind through the tight curves of back roads, making my way to the headquarters of a major agricultural producer. While I had never visited this company before, I had the opportunity to sit down with the executive team to explore the topic of big data in farming and agriculture.

I embraced the calm and serene scene, a far cry from the vibrancy of San Francisco and the rush of Silicon Valley. As we entered a conference room, the discussion turned to produce, as I asked, “Why is it that the strawberries that I bought last week taste so much better than the ones I bought the week before?” While I posed the question as a conversation starter, it became the crux of our discussion.

It seems that quality, and more specifically consistency of quality, is the foremost issue on the mind of major producers. I asked about the exquisite quality of produce in Japan. The executive team quickly noted that Japan achieves quality at the price of waste. Said another way, they keep only 10 percent of what a grower provides. This clarified the point in my mind that quality, consistency of quality, and eliminating waste create the three sides of a balanced triangle.

The conversation that followed revealed one significant consensus in the room: Weather alone impacts crop production and the consistency of crops. And since no one in the room knew how to change the weather, they believed that this was the way things would always be. I realized that by blaming the weather this team believed their future did not belong in their own hands but was controlled by the luck, or the misfortune, of each passing season.


The evolution of farming in the developed world provides context to much of the conventional wisdom about farming that exists today. Dating back to the 1700s, farming has been defined by four eras:

1700s (Subsistence Farming): Farmers produced the minimum amount of food necessary to feed their families and have some in reserve for the cold winter months.

1800s (Farming for Profit): This era marked the transition from subsistence farming to for-profit farming. This is when the widespread use of barns began, for the purpose of storing tools, crops, and related equipment. These were called pioneer farms.

Early 1900s (Power Farming): At this time, the “power” came in the form of 1,800-pound horses. The farmers used animals for plowing, planting, and transporting crops. The use of animal labor drove the first significant increase in crop productivity.

Mid-to-Late 1900s (Machine Farming): Sparked by the Industrial Revolution, this era’s farmers relied on the automation of many of the tasks formerly done by hand or animal. The addition of machinery created tremendous gains in productivity and quality.

Each era represented a significant step forward, based on the introduction of new and tangible innovations: barns, tools, horses, or machines. The progress was physical in nature, as you could easily see the change happening on the farms. In each era, production and productivity increased, with the most significant increases in the latter part of the 20th century.

Through these stages, farming became more productive, but not necessarily more intelligent.


The current era of farming is being driven by the application of data. It is less understood than previous eras because it is not necessarily physical in nature. It’s easy to look at a horse and understand quickly how it can make farm labor easier, but understanding how to use geospatial information is a different proposition. The advantage is driven by intangibles: knowledge, insight, decision making. Ultimately, data is the fertilizer for each of those intangibles.

A simple understanding of how a crop grows can aid in understanding the impact of data on farms. The basic idea is that a plant needs sunlight, nutrients from the soil, and water to grow into a healthy plant, through a process called photosynthesis. Healthy plants must keep cool through a process called transpiration (similar to how a human sweats when physically stressed). But, if a plant lacks the nutrients or conditioning to transpire, then its functions will start to break down, which leads to damage. Using data to improve farming is fundamentally about having the ability to monitor, control, and if necessary, alter these processes.

Today, according to the Environmental Protection Agency, there are 2.2 million farms in the United States and many more outside of the U.S. The average farm spends $110,000 per year on pest control, fertilizer, and related items to drive yield. Improving profit and harvest yields across such a vast territory requires better collection, use, and application of data.


Potato farming can be exceedingly difficult, especially when attempted at a large scale with the goal of near perfect quality. The problem with potato farming is that the crop you are interested in is underground; therefore, producing a high-quality and high-yield potato crop depends on agronomic management during the growing process.

At the Talking Data South West Conference in 2013, Dr. Robert Allen, a Senior GIS Analyst at Landmark Information Group, highlighted the importance of data in potato farming, in his talk titled, “Using Smartphones to Improve Agronomic Decision Making in Potato Crops.” Dr. Allen makes the case that leveraging data that describes the growth and the maturation of a crop during the growing season is instrumental to a successful yield. Continuous insight, delivered throughout the growing season, may have a material impact on the productivity of a crop.

One of the key variables required for yield prediction, and needed to manage irrigation, is groundcover. Groundcover, the percentage of ground covered by green leaf, provides critical input in the agronomic management of potato crops. Measuring groundcover is not as simple as pulling out a measuring tape. It requires capturing imagery of potato crops and collecting, at large scale, data related to the images (the water balance in soil, etc.). That data must then be put in the hands of farm managers and agronomists so that they can actually do something about what it is telling them. The goal is not to collect data, but to act on it.
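To make the groundcover measure described above concrete, here is a minimal sketch that estimates it as the share of green pixels in an image. The excess-green index (2G - R - B), the threshold value, and the function name are all assumptions chosen for illustration; this is not the method used by CanopyCheck.

```python
# Hypothetical sketch: estimate groundcover as the share of "green" pixels.
# The excess-green index (2G - R - B) and the threshold are illustrative
# assumptions, not the algorithm of any particular product.

def groundcover_percent(pixels, threshold=20):
    """Return the percentage of pixels classified as green leaf.

    pixels: iterable of (r, g, b) tuples with 0-255 channel values.
    threshold: minimum excess-green value for a pixel to count as leaf.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    green = sum(1 for r, g, b in pixels if 2 * g - r - b > threshold)
    return 100.0 * green / len(pixels)


# Tiny made-up "image": three leaf-coloured pixels and one soil-coloured pixel.
sample = [(40, 160, 50), (60, 180, 70), (30, 140, 40), (120, 90, 60)]
print(groundcover_percent(sample))  # 75.0
```

In practice the threshold would need tuning for lighting and soil colour, which is one reason consistent image capture matters.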

Dr. Allen describes four considerations in potato farming related to using data:

Time: Data needs to be collected at regular intervals and decisions need to be made in near-real-time.

Geography: These tend to be large-scale operations (10,000 to 20,000 acres), with fields distributed over large areas.

Manpower: Data is often collected by farm field assistants (not scientists) and must be distributed because decision makers tend to be remote from the field.

Irrigation: Irrigation, while very expensive, is a primary factor in the maturation of a potato crop. Utilizing data to optimize the use of irrigation can lead to a productive crop, at the lowest possible cost.

These considerations led to a data collection and analysis solution called CanopyCheck. While it requires only a download from Apple’s App Store, it provides a rich data experience to compare groundcover and other related data to optimize the quality and yield of a potato crop.

The Landmark Information Group describes CanopyCheck (http:// 3260-20_4-10094055.html) as follows:

This app is for potato growers, using the CanopyCheck groundcover monitoring system, and captures accurate and reliable images of the potato crop which can be used to describe crop development. Each image is geo-located and labelled with farm and field information specified by the potato grower on the accompanying CanopyCheck website.

Conventional wisdom states that growing potatoes is easy: They don’t need sunlight, they do not need daily care, and by controlling the amount of water they receive, growing potatoes is a fairly simple process. However, as is often the case, conventional wisdom overlooks the art of the possible. In the case of potatoes, the application of data and agronomy can drive yield productivity up 30 to 50 percent, which is material in terms of the economics and the waste that is reduced.


Whether you struck up a conversation with a farmer from the 1800s, the 1900s, or even the early part of this century, they would highlight two things:

1) Their growing strategy evolves each year.

2) While the strategy evolves, their ability to execute improves each year, based on increased knowledge.

While this farming approach has been good enough for the better part of three centuries, the Data era ushers in the notion of precision farming. According to Tom Goddard, of the Conservation and Development Branch of Alberta Agriculture, Food and Rural Development, the key components of precision farming are:

Yield monitoring: Track crop yield by time or distance, as well as distance and bushels per load, number of loads, and fields.

Yield mapping: Global Positioning System (GPS) receivers, along with yield monitors, provide spatial coordinates, which can be used to map entire fields.

Variable-rate fertilizer: Managing the application of a variety of fertilizer materials.

Weed mapping: Mapping weeds using a computer connected to a GPS receiver while adjusting the planting strategy, as needed.

Variable spraying: Once you know weed locations from weed mapping, spot control can be practiced.

Topography and boundaries: Creating highly accurate topographic maps using a Differential Global Positioning System (DGPS). This data can be used to take action on yield maps.

Salinity mapping: This is valuable in interpreting yield maps and weed maps, as well as tracking the salinity over a period of time.

Guidance systems: Guidance systems, such as DGPS (accurate to a foot or less), are valuable for assessing fields.

Records and analyses: Large data collection is necessary to store pertinent data assets, along with images and geospatial information. It is important that this information can be archived and retrieved for future use.

The extensive insight that can be gained by collecting each of these data points is potentially revolutionary. It evolves a process from instinctual to data-driven — which, as seen in the potato example, has a fundamental impact on yields and productivity.
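As one illustration of the yield mapping component listed above, the following sketch averages position-tagged yield readings into grid cells. The local field coordinates in meters, the 10-meter cell size, and the function name are assumptions made for the example; real yield monitors and GPS receivers report data in their own formats.

```python
from collections import defaultdict

# Hypothetical sketch: build a coarse yield map from position-tagged readings.
# Coordinates are assumed to be meters in a local field grid, not raw GPS.

def yield_map(readings, cell_size=10.0):
    """Return the average yield per grid cell.

    readings: iterable of (x_m, y_m, yield_value) tuples.
    cell_size: edge length of a square grid cell, in meters.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of yields, count]
    for x, y, value in readings:
        cell = (int(x // cell_size), int(y // cell_size))
        totals[cell][0] += value
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}


# Made-up readings: two fall in cell (0, 0), one in cell (1, 0).
readings = [(2.0, 3.0, 50.0), (8.0, 9.0, 70.0), (15.0, 4.0, 40.0)]
print(yield_map(readings))  # {(0, 0): 60.0, (1, 0): 40.0}
```

Cells with unusually low averages are the natural candidates for the variable-rate and spot-control steps.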

The underlying assumption is that the tools and methodology for capturing farm data are available and utilized efficiently. This is a big assumption because many farms today are not set up to actively collect and capitalize on new data assets. Accordingly, the ability to capture farm data becomes the source of competitive advantage.


It sounds easy. Collect data. Then use that data to deliver insights. But, for anyone who has been on a rural farm in the last decade, it is easier said than done. There are limitations that exist on many farms: lack of digital equipment, lack of skilled technology labor, poor distribution of electricity, and poorly defined processes. Because of these factors, each farmer must establish a new order of doing things to take advantage of the Data era. The data landscape for farming consists of three primary inputs.

Sensing equipment: Mounted devices on machinery, in fields, or anywhere near crops could be designed to collect/stream data or to control the application of water, pesticides, etc. This could range from instrumented tractors for harvesting to devices to monitor crop transpiration. The evolution of machines to collect data on crops and soil has been dramatic. In the last decade alone, equipment has evolved from mechanical-only to a combination of mechanical and digital technology. This change has been expedited by early insights that even small adjustments in planting depth or spacing can have huge impact on yields. So, while today the sensing equipment is largely a digitized version of common farm machines, the future will see a marked advancement in machines. Drones, driverless tractors, and other innovations will become commonplace.

Global Positioning System (GPS): GPS provides the ability to pinpoint location accuracy within one meter. While GPS first emerged for automobiles in the early 1990s in places like Japan, it has just now become common in all automobiles. Farming equipment, as you may expect, has been even a step further behind, with the wide use of GPS just accelerating in the last decade.

Geographic Information System (GIS): GIS assesses changes in the environment and tracks the spread of disease, as well as showing where soil is moist, eroded, or has experienced similar changes in condition. Once you know weed locations from weed mapping, spot control can be implemented. Topography and geology are important considerations in the practice of farming. Both are well accounted for with modern-day Geographic Information Systems.

By combining these three inputs, farmers will be able to accurately pinpoint machinery on their farms, send and receive data on their crops, and know which areas need immediate attention.


John Deere founded Deere & Company in 1836, when he moved to Grand Detour, Illinois to open a repair shop for farming tools. Deere eventually moved beyond tools and into the production of plows, which became a mainstay in the Farming for Profit and Power Farming eras. In 1848, Deere relocated the company to its still-current home in Moline, Illinois, and after his death in 1886, the presidency of the company passed to Charles Deere.

Charles led the company into the 20th century, where the company pioneered the move to gasoline tractors, which became the defining product of not only the company, but of farming and agriculture in this time. The dominance of the company was ensured by continuous innovations in their tractors, innovation in their business model (a robust dealer network), and their defining image: John Deere green. As of 2010, the company employed 55,000 people and was operating in 30 countries worldwide. A shoo-in for continued dominance, right?

Monsanto, founded in 1901, took a bit longer to come into its defining moment. Moving into detergents and pesticides, Monsanto eventually became the pioneer in applying biotechnology to farming and agriculture. With biotechnology at its core, Monsanto applies data and insight to solve problems. Accordingly, Monsanto was a data-first company in its birth, which continued to drive its innovation and relevance. But sometimes, it takes time for an industry to catch up to its innovative leaders, and the first major evidence of how Monsanto would lead a change in the landscape was seen around 2010. That is when you see the fortunes of Deere & Company and Monsanto start to go in different directions.

Monsanto had one critical insight: Establishing data-driven planting advice could increase worldwide crop production by 30 percent, which would deliver an estimated $20-billion economic impact — all through the use and application of data. As Monsanto bet the company on the Data era, the stock market began to realize the value of the decision, leading to a period of substantial stock appreciation.

Data is disrupting farming, and we are starting to see that in the business performance of companies driving innovation in the industry. Gone are the days in which a better gasoline tractor will drive business performance. Instead, farmers demand data and analytics from their suppliers, as they know that data will drive productivity.


Monsanto calls its approach to farming in the Data era Integrated Farming Systems (IFS). The platform provides farmers with field-by-field recommendations for their farm, including ways to increase yield, optimize inputs, and enhance sustainability. Listening to the data and making small adjustments to planting depth or the spacing between rows makes a vast difference in production. As Monsanto says, this is “Leveraging Science-Based Analytics to Drive a Step Change in Yield and Reduced Risk.”

Monsanto’s prescribed process for Integrated Farming Systems involves six steps:

1. Data backbone: Seed-by-environment testing to produce on-farm prescriptions
2. Variable rate fertility: Adjusting prescriptions, based on conditions
3. Precision seeding: Optimal spacing between rows
4. Fertility and disease management: Custom applications, as needed
5. Yield monitor: Delivering higher resolution data
6. Breeding: Increase data points collection to increase genetic gain

FieldScripts became the first commercial product to be made available as a component of Monsanto’s overall IFS platform. FieldScripts provides accurate seeding prescriptions for each farmer and each field.

Monsanto, through its seed dealer network, engages directly with farmers to optimize two variables: planting and seeding technology data. The seeding technology, which is primarily data about seeding, is the differentiating factor. Applying that insight to a personalized planting plan enables Monsanto to deliver personalized prescriptions for every field.

FieldScripts, delivered via iPad, utilizes a custom application called FieldView. FieldView, deployed to farmers, while leveraging the data acquired throughout the years, equips farmers with the tools and insights needed to make adjustments for optimal yields.

Deere & Company and Monsanto both have bright futures. According to Jeremy Grantham, chief investment strategist of Grantham Mayo Van Otterloo (GMO), with the world’s population forecasted to reach almost 10 billion by 2050, the current approach cannot sustainably feed the world’s population. The demand presented by population growth creates an opportunity for all companies that service the industry. For the moment, Monsanto has leaped ahead in this new era of data-farming over the past five years, forcing Deere & Company to play catch-up.


Data is starting to prevail in agriculture. This is evident not only in the changing practices of farmers, but also in the ecosystem. New companies are being built, focused on exploiting the application of data.


Monsanto’s aggressive move into the Data era was perhaps punctuated in October 2013 with their announced acquisition of the Climate Corporation for $930 million. Why would a firm with its roots in fertilizers and pesticides spend nearly $1 billion on an information technology (IT) company? This aggressive acquisition demonstrates the evolution of the industry. “The Climate Corporation is focused on unlocking new value for the farm through data science,” commented Hugh Grant, the chairman and chief executive officer for Monsanto. Founded in 2006, the Climate Corporation employs a team unlike any other in the agriculture industry. The team is composed of many former Google employees, along with other elite technology minds from the Silicon Valley scene. The tools they develop help farmers boost productivity, improve yields, and manage risks, all based on data.

At the heart of this acquisition lies the core belief that every farmer has an unrealized opportunity of around 50 bushels of crop (corn, potatoes, etc.) in each of their fields. The key to unlocking these additional bushels lies in the data.

While the leaders of the past would provide better machines, Monsanto focuses on providing better data. By combining a variety of data sources (historical yield data, satellite imagery, information on soil/moisture, best practices around planting and fertility), this information equips the farmers with the information they need to drive productivity.


GrowSafe Systems began studying cattle in 1990. This was not a group of former cattle hands, but a team of engineers and scientists who foresaw data science as playing a role in cattle raising. In 2013, the GrowSafe team won the Ingenious Award from the Information Technology Association of Canada for best innovation. This was the first time that this organization gave an innovation award to anyone in the world of cattle.

GrowSafe developed a proprietary way of collecting data through the use of sensors in water troughs and feedlots. With these sensors, they track every movement of cattle, including specifics about the cattle themselves: consumption, weight, movement, behaviors, and health. Each night, the data is collected and then compared against a larger corpus of historical data. The goal is to look for outliers. GrowSafe knows that the data reveals information that cattle farmers often cannot detect. This innovative approach enables farmers to prevent a disease before it begins.
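The nightly comparison against historical data can be pictured with a simple z-score check, sketched below. GrowSafe's actual method is proprietary and not described here; the function name, the per-animal history, the feed-intake values, and the cutoff of three standard deviations are all assumptions for illustration.

```python
from statistics import mean, stdev

# Hypothetical sketch: flag animals whose reading tonight is far from their
# own history. A z-score cutoff is an assumed stand-in for whatever
# proprietary analysis a real system performs.

def find_outliers(history, tonight, z_limit=3.0):
    """Return IDs of animals whose value tonight is an outlier.

    history: dict mapping animal ID -> list of past daily values.
    tonight: dict mapping animal ID -> value recorded tonight.
    z_limit: number of standard deviations that counts as an outlier.
    """
    flagged = []
    for animal_id, value in tonight.items():
        past = history.get(animal_id, [])
        if len(past) < 2:
            continue  # not enough history to estimate spread
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            continue  # no variation on record; z-score is undefined
        if abs(value - mu) / sigma > z_limit:
            flagged.append(animal_id)
    return flagged


# Made-up daily feed intake in kilograms.
history = {
    "cow-1": [10.1, 9.8, 10.3, 10.0, 9.9],
    "cow-2": [11.0, 11.2, 10.9, 11.1, 11.0],
}
tonight = {"cow-1": 10.2, "cow-2": 7.5}
print(find_outliers(history, tonight))  # ['cow-2']
```

A sharp drop in intake, like the one flagged for the second animal, is the kind of signal that could prompt a health check before symptoms are visible.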


The mainstays of today’s farms are people, fertilizers, irrigation, gas machines, trucks and carts for transport, and local knowledge. It is a craft, and typically the only person that can run a certain farm is the person that started it. This is why so many farms fold after the head of the operation retires. The success is in their hands — it’s their craft.

Farms in 2020 will have a completely different feel from today’s farms. In fact, they may be unrecognizable to a farmer of the early 21st century. Approaches that seem futuristic today will be common to all farmers in 2020:

Digital machines: Digital machines, acting as sensors, will be the norm. The days of simple gas machines will be far in the past. In fact, by 2020, many farm machines will be battery- or solar-powered, with gas itself becoming a rarity. The digital machines will be much more than Internet-enabled tractors. There will be drones. Many drones. In fact, drones will become the most cost-effective and precise mechanism for managing many chores that farmers do via hand or by tractor today. With the sprawl of digital machines, device management will become the “cattle herding of the future,” as all devices will have to be managed and maintained appropriately.

IT back office: Every farm will have an information technology (IT) back office. Some will manage it themselves, while many will rely on a third party. The IT office will be responsible for the aforementioned device management, as well as remote monitoring and, ultimately, data-driven decision making. The IT back office will be the modern-day farm hand, responding to the farmers’ every need and ultimately ensuring that everything operates as programmed.

Asset optimization: With the sprawl of new devices and machines, asset optimization will be at the forefront. Maximizing the useful life of machines, optimizing location, and managing tasks (workloads) will be key inputs into determining the productivity of a farm.

Preventative maintenance: Digital machines, like gas machines, break. It is a reality of complex systems. This fact places the burden on preventing or minimizing outages because of maintenance and repairs. Many of the digital machines and devices will be designed to predict and prevent failures, but ultimately, this must become a core competence of the farmer or his IT back office. Given that each farm will use the machines differently, the maintenance needs will likely be unique.

Predictable productivity: In today’s farms, the yield and productivity of crops vary significantly. Whether it is the weather, impacts of deforestation, or the impact of certain pesticides and fertilizers, it is an often-unpredictable environment. By 2020, productivity will be more predictable. Given all of the sources of data, GIS and GPS capabilities, and the intensive learning that will happen over the next five years, yields will become predictable, creating greater flexibility in the financial model for a farmer.

Risk management: In 2020, instead of being a key determinant of success, weather will simply be another variable in the overall risk management profile of an asset (in this case, a farm). Given the predictable productivity, risk management will be more about managing catastrophic outliers and, in some cases, sharing that risk with counterparties. For example, index-based insurance offers great potential in this area.

Real-time decision-making: Decisions will be made in the moment. With the growth of streaming data, farms will be analyzed as variables are collected and acted upon immediately. Issues will be remediated in 2020 faster than they will be identified in 2014. This is part of what drives the predictable productivity.

Production variability: Farms will no longer produce a single crop or focus on a subset. Instead, they will produce whatever will drive the greatest yield and productivity based on their pre-planting-season analysis. Farms will also begin to factor in external data sources (supply and demand) and optimize their asset for the products in greatest demand. This will completely change the variability that we see in commodities and food prices today.


As I left the headquarters of the agricultural company outside of San Francisco, I was amazed that a belief persists, in some places, that weather is the major force impacting our ability to grow consistent and productive crops. That does not seem much different from the pioneer farms of the 1800s, where the weather determined not only their business, but also their livelihoods.

Perhaps, as postulated before, that is the easy answer, as opposed to the real answer. The innovations that we’ve seen with precision farming, using data to transform potato crops, and the emergence of leaders like Monsanto make it evident that the weather is merely one variable that could impact crops in the future.

Data trumps weather. Farming and agriculture will be transformed by making the leap of acknowledging this truth.


Copyright 2014 Rob Thomas.
All Rights Reserved.
