IBM made some significant announcements about our investment in Spark back in June (see here and here). Ninety days later, I thought it fitting to provide an update on where we stand in our community efforts.
Before I get to the details, I want to first restate why I believe Spark will be a critical force in technology, on the same scale as Linux.
1) Spark is the Analytics operating system: Any company interested in analytics will be using Spark.
2) Spark unifies data sources. Hadoop is one of many repositories that Spark may tap into.
3) The unified programming model of Spark makes it the best choice for developers building data-rich analytic applications.
4) The real value of Spark is realized through Machine Learning. Machine Learning automates analytics and is the next great scale effect for organizations.
I was recently interviewed by Datanami on the topic of Spark. You can read it here. I like this article because it presents an industry perspective on Hadoop and Spark, but it's also clear on our unique point of view.
Also, this slideshare illustrates the point of view:
So, what have we accomplished and where are we going?
1) We have been hiring as fast as we can. The Spark Technology Center (STC) in San Francisco is filling up, and we have started to bring on community leaders like @cfregly.
2) Client traction and response have been phenomenal. We have references already and more on the way.
3) We have open sourced SystemML as promised (see it on GitHub), and we are working on it with the community, in the open. This contribution is over 100,000 lines of code.
4) Spark 1.5 was just released. We went from 3 contributors to 11 in one release cycle. Read more here.
5) Our Spark-specific JIRAs have amounted to ~5,000 lines of code. You can watch them in action here.
6) We are working closely with partners like Databricks and Typesafe.
7) We have trained ~300,000 data scientists through a number of forums, including BigDataUniversity.com. You can also find some deep technical content here.
8) We have seen huge adoption of the Spark Service on Bluemix.
9) We have ~10 IBM products that are leveraging Spark and many more in the pipeline.
10) We launched a Spark newsletter. Anyone can subscribe here.
Between now and the end of the year, we have five significant events where we will have much more to share regarding Spark:
a) Strata NY: Sept 29-Oct 1 in New York, NY.
b) Apache Big Data Conference: Sept 28-30 in Budapest, Hungary.
c) Spark Summit Europe: Oct 27-29 in Amsterdam.
d) Insight: Oct 26-29 in Las Vegas.
e) Datapalooza: Nov 10-12 in San Francisco.
In closing, here is a peek inside: