While we have had our Big Data Symposium this year and numerous analyst meetings, I think an update on our progress in Big Data is always helpful. Some of this may be well known already, but I want to give a comprehensive view. I've also included a slideshare below, which provides more detail. I would break down our Big Data strategy and focus into 6 areas:
1) Commitment to Open Source
I blogged about it here too, but I will restate: we are committed to Apache for our Big Data products and efforts. We have a long history of contributing to open source (see the slideshare below), and we have Hadoop committers and Lucene committers, to name a few. Given our commitment to Apache, we have no desire to be in the Hadoop distribution business.
2) Big Data as a Platform
We believe that clients will look to Big Data as a platform for managing all of their information assets. This is why I continue to point to the application server as the best analogy for Big Data, rather than data warehousing or databases. With Big Data as a platform, clients can write their own applications, buy third-party applications, or work with IBM for applications; in every case, those applications will leverage the Big Data platform as the source of all data types, at scale. This will be very economical for large enterprises, as they leverage the platform across their business. Enterprises that become myopically focused on building out unique infrastructure for every use case will find that approach very expensive and unsustainable over time.
3) Data-in-Motion and Data-at-rest
If an enterprise cannot address both classes of problems, its Big Data strategy will not scale to meet its needs. For this reason, we continue to focus on delivering a Big Data platform that addresses both: real-time/streaming analysis, as well as annotating and managing various data types at scale. A Big Data strategy that addresses only one of these is not a strategy: it's a short-term Band-Aid.
4) Research-led Capabilities
A Big Data platform is only as valuable as the capabilities that define its utility. Luckily, IBM Research has been focused on this problem for over a decade. Accordingly, we can deliver enterprise capabilities around Big Data that do not exist anywhere else:
-Text Analytics (System T)
-Machine Learning (System ML)
-Hardened and tested File System (POSIX compliant)
-Optimization (Adaptive MapReduce)
-Large Scale Indexing
5) Integration
Big Data cannot be a silo. For enterprises, it must plug into all of their existing infrastructure investments. Therefore, we made integration a forethought, not an afterthought. We are delivering integration with IBM and non-IBM enterprise software products.
6) Client Success
The first 5 points don't mean much if you can't demonstrate success. Fortunately, we can. We will release our first Big Data reference book at IOD in October. In it, you will see clients that are seeing real business impact from our Big Data products. You will also see what IBM has done for our own business with our Big Data capabilities. That being said, our real secret to creating success is partnership: working side-by-side with clients to prove out their business cases.