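Speaker: Huang Jie (黄洁), Apache Chukwa PMC Committer at Intel
Speaker: Huang Jie, China Hadoop Summit 2015 Beijing (CHS2015 Beijing)
Huang Jie tags: Intel Big Data Technology Center, R&D Manager, Spark, Apache Chukwa PMC Committer, Shanghai Jiao Tong University, distributed big data frameworks
Huang Jie, R&D Manager at the Intel Big Data Technology Center, Apache Chukwa PMC & Committer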
Grace Huang is currently an engineering manager in Intel SSG (Software and Services Group), responsible for enhancing and optimizing advanced Big Data technologies, including Hadoop and Spark. Prior to that, she worked in the big data area at Intel for over 6 years, with extensive experience in Hadoop and HBase performance tuning and optimization.
Grace Huang's topic at the China Hadoop Summit (CHS2015 Beijing) is "StreamSQL on Spark: Manipulating Streams by 'SQL' using Spark".
New big data analysis paradigms beyond MapReduce have inevitably emerged. In particular, mining and analyzing data in a real-time, streaming manner has become a pressing and growing demand in production. As the next-generation big data analytics framework after Hadoop MR, Spark already serves this new Real-Time Analytical Processing (RTAP) paradigm well, offering a simple, easy-to-use development platform for real-time analytics. To be a complete data stream management system (DSMS), however, it also needs a simple query interface (for example, SQL) for manipulating streams. That SQL-like interface is what more data analysts care about and expect, and it is essential for a good user experience in the RTAP paradigm.
In this talk, we will present our proof-of-concept (POC) implementation of StreamSQL, built on the Spark Streaming and Catalyst modules, which lets SQL users pick up stream processing quickly and easily. It currently supports simple stream queries, operations between streams and structured data, and typical Catalyst usages (e.g., LINQ expressions and mixing SQL with DStream operators).
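The idea of querying a stream with SQL and combining it with structured data can be illustrated with a small, self-contained sketch. This is not StreamSQL and does not use Spark: it is an assumption-laden simulation that uses Python's standard sqlite3 module, treats a stream as a list of micro-batches (the discretized-stream model that Spark Streaming uses), and runs the same SQL join against each batch. The table names, columns, and sample data are all hypothetical.

```python
import sqlite3

# Hypothetical static (structured) table: user id -> region.
USERS = [(1, "north"), (2, "south"), (3, "north")]

# Hypothetical stream, already cut into micro-batches of (user_id, amount) events.
MICRO_BATCHES = [
    [(1, 10.0), (2, 5.0)],
    [(3, 7.5), (1, 2.5), (2, 1.0)],
]

# One SQL query, applied unchanged to every micro-batch: it joins the
# streaming 'events' table with the static 'users' table.
QUERY = """
    SELECT u.region, SUM(e.amount) AS total
    FROM events AS e JOIN users AS u ON e.user_id = u.id
    GROUP BY u.region
    ORDER BY u.region
"""


def run_stream_query(batches, users, query=QUERY):
    """Run the same SQL query once per micro-batch and collect the results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, region TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
    results = []
    for batch in batches:
        # Each micro-batch replaces the previous contents of 'events'.
        conn.execute("DELETE FROM events")
        conn.executemany("INSERT INTO events VALUES (?, ?)", batch)
        results.append(conn.execute(query).fetchall())
    conn.close()
    return results


if __name__ == "__main__":
    for i, rows in enumerate(run_stream_query(MICRO_BATCHES, USERS)):
        print(f"batch {i}: {rows}")
```

A real system would plan such a query once and execute it continuously on distributed data as batches arrive; the sketch only shows the user-facing shape, where the analyst writes plain SQL and the stream appears as just another table.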
Chinahadoop.com is the content website of the China Hadoop Summit.