Speaker: Reynold Xin (辛湜), one of the leading figures in the Apache Spark open source community
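Bio: Reynold Xin (辛湜) is one of the leading figures in the Apache Spark open source community. During his PhD studies at the UC Berkeley AMPLab, he took part in the development of Spark and wrote two open source frameworks on top of it, Shark and GraphX. In the middle of this year, he co-founded Databricks with colleagues from the AMPLab.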
Talk title: The Spark Ecosystem: Fast and Expressive Big Data Analytics
Abstract: Apache Spark is an open source cluster computing system that aims to make data analytics fast, both fast to run and fast to write. It outperforms Hadoop MapReduce by up to 100x in many real-world applications. In addition, Spark programs are often much shorter than their MapReduce counterparts, thanks to Spark's high-level APIs and language integration in Java, Scala, and Python. The system has been adopted by many organizations large and small (e.g. Yahoo, Intel, Adobe, Alibaba, Tencent) to implement data-intensive applications such as ETL, interactive SQL, and machine learning.
In this talk, I will introduce Spark and cover many projects in the ecosystem, including Shark for SQL query processing, GraphX for graph analytics, and DStream for streaming computations. We will demonstrate the power of rich analytics on a single platform and present the Spark team's vision for the future of Big Data processing.