One philosophy has dominated Porsche engineering since the company’s formation: there is not one vehicle for all situations and people. Instead, each vehicle needs to perform a specific job for its user.

Patterns in Big Data
I first wrote about Next Generation Middleware in October of 2011. While a lot has changed since then, many of my views on how Big Data will evolve have not. That being said, they have certainly become more granular.
I've had a front-row seat to how Big Data is changing client environments for a few years now. Two things are quite evident to me:
1) This change is quite real, it’s accelerating, and it’s much more than Hadoop.
2) There is a set of emerging deployment patterns.
As we’ve moved through the experimental phase of Hadoop and Big Data, I’m seeing clients take a much more strategic approach to the topic. It’s less about trying out the flavor of the month (Cassandra, Mongo, Hadoop, etc.) and more about figuring out how to integrate many of these components into their existing environment.
A key tenet of developing a Big Data strategy is to take a page from Porsche's strategy and acknowledge that one size does not fit all. There are many technologies, most have a unique and special purpose, and the leaders in Big Data will leverage all or most of them in a complementary way. Hence, the pattern that I am seeing around building a Big Data strategy revolves around three cornerstone environments:
The Landing Zone
The Discovery Zone
The Guided Zone
This is what it looks like logically:

You will recognize that the IT environments of the last 20 years have been largely focused on the ‘white areas’. These are traditional data repositories, providing data to business applications. This is how companies ran their business in the e-business era. Certainly, as data warehousing and analytics have risen to prominence, we have seen more investment in the ‘blue boxes’, or Big Data Zone. However, most of that investment to date has been an augmentation of the ‘white areas’ (i.e., providing analytics of structured data from transactional systems).
The Big Data Zone is where companies will separate themselves from others in the next 5-10 years. Those that can execute on this vision and get there faster will be more efficient and more information rich, and will make better decisions.
The Landing Zone
This is the place where you 'land' your data in its native form. All data types, sizes, and levels of veracity are accepted and expected. It's the innovation 'manufacturing floor', and as you begin to harvest your data assets, you can send those refined assets to other zones. The Landing Zone must be cost effective and differentiated by analytics and analysis (not just the run-time), as the effectiveness of your other zones may depend on the Landing Zone. I expect that we will see Hadoop and the plethora of NoSQL options take root in the Landing Zone.
The Discovery Zone
This is the place for discovery and deep analytics, primarily of structured data assets, but not limited to that. Have large complex analytic queries? Do them here. Need high performance analytics? Do it here. This becomes the core analysis and analytics hub for the organization. This will be the most efficient and cost effective place for high performance analytics. Obviously, this requires tight integration with the Landing Zone.
The Guided Zone
This is the place for mixed analytic workloads. It's not just deep analytics like the Discovery Zone; it encompasses thousands of concurrent users, operational workloads, analytic workloads, and all of them in combination. It's the best place for mixed workloads, but it's too expensive to use for just landing data or for data discovery. This zone will be more important in some companies (like credit card companies tracking fraudulent transactions in real time) than in others (a retailer analyzing last month's sales).
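To make the division of labor between the three zones concrete, here is a minimal sketch in Python of how a workload might be routed to a zone. Everything in it is an illustrative assumption rather than part of any product or client deployment: the Workload fields, the choose_zone function, and the 1,000-user threshold are hypothetical names and numbers chosen only to mirror the descriptions above.

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    LANDING = "Landing Zone"
    DISCOVERY = "Discovery Zone"
    GUIDED = "Guided Zone"


@dataclass
class Workload:
    """Hypothetical description of a workload; the fields are illustrative."""
    name: str
    raw_ingest: bool = False    # data arriving in its native form
    operational: bool = False   # mixes operational and analytic work
    concurrent_users: int = 1   # rough concurrency estimate


def choose_zone(w: Workload, mixed_user_threshold: int = 1000) -> Zone:
    """Pick a zone using the rules of thumb from the zone descriptions.

    The threshold is an assumed cut-off for "thousands of concurrent
    users", not a figure taken from the post.
    """
    if w.raw_ingest:
        # Native-form data of any type, size, or veracity lands here first.
        return Zone.LANDING
    if w.operational or w.concurrent_users >= mixed_user_threshold:
        # Mixed operational and analytic workloads at high concurrency.
        return Zone.GUIDED
    # Remaining work is treated as deep, exploratory analytics.
    return Zone.DISCOVERY


if __name__ == "__main__":
    for w in (
        Workload("clickstream dump", raw_ingest=True),
        Workload("fraud scoring", operational=True, concurrent_users=5000),
        Workload("last month's sales deep dive"),
    ):
        print(f"{w.name}: {choose_zone(w).value}")
```

Real routing decisions would also weigh cost, latency, and governance requirements; the point of the sketch is only that each zone has a distinct job, so the first question for any new workload is which zone it belongs in.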
This pattern of Big Data Zones is gaining steam in forward-looking IT environments across the industry. As Porsche realized long ago, many companies know that there is not a single answer to every problem. Leaders in Big Data will embrace this notion of the Zones and start to build a plan to meet the analytic needs of the organization, leveraging all aspects of Next Generation Middleware.
This is a nice straightforward approach to implementing a Big Data platform. Where do you see Information Quality and Governance being used to ensure the analysis is built on good information vs. questionable answers? It looks like the Landing Zone, but I just want to be sure.
I think Info Quality and Gov have to exist across all of the zones. I suppose you could have your quality only on the Landing Zone and then feed the others, but Governance definitely has to exist in all of the zones...thoughts?
Great post; very easy to read and understand. How difficult would it be to add new sources and varieties of data post implementation? I would imagine this architecture lends itself more easily to changes, but this could be one reason governance would be needed across all zones.