Big Data Analysis – Kexin Rong | Stanford MLSys #61

Episode 61 of the Stanford MLSys Seminar Series!

Learned indexing and sampling for improving query performance in Big Data Analytics
Speaker: Kexin Rong

Abstract:
Traditional data analytics systems improve query efficiency through fine-grained indexing and row-level sampling. To keep up with growing data volumes, more and more systems store and process datasets in large partitions of hundreds of thousands of rows. These systems therefore need to adapt traditional techniques to treat coarse-grained data partitions as the basic unit of query processing. In this talk, I will discuss two related ideas that combine learning techniques with partition design to improve query efficiency in analytics systems. First, I'll describe PS3, the first approximate query processing system that supports non-uniform sampling at the partition level. To achieve the same error as a uniform sample of partitions, PS3 reduces the number of partitions it needs to access by up to 70x. Next, I'll present OLO, an online learning framework that dynamically adjusts data organization as query workloads change, minimizing overall data access and movement. We show that dynamic reorganization outperforms a single, optimized partitioning scheme by up to 30% in end-to-end runtime. I conclude by discussing outstanding issues in this area.
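The abstract does not spell out how PS3 samples partitions. As a rough illustration of why non-uniform partition-level sampling can beat a uniform sample, here is a minimal sketch of the textbook Hansen-Hurwitz estimator: draw k partitions with replacement, partition i with probability p_i, and reweight each draw by 1 / (k * p_i). This is a generic technique, not PS3's algorithm. The function name `estimate_total`, the per-partition sums, and the weights are all hypothetical; in a real system the weights would have to come from cheap per-partition metadata, not from the answer itself.

```python
import random
from typing import Sequence


def estimate_total(partition_sums: Sequence[float],
                   weights: Sequence[float],
                   k: int,
                   rng: random.Random) -> float:
    """Estimate sum(partition_sums) from k partitions drawn with replacement.

    Partition i is drawn with probability p_i = weights[i] / sum(weights).
    Each draw contributes partition_sums[i] / (k * p_i) (Hansen-Hurwitz),
    which is unbiased as long as every p_i > 0.

    Hypothetical setup: partition_sums[i] stands for the exact aggregate
    (e.g. SUM of a column) obtained by scanning partition i; only the
    drawn partitions would actually be read in practice.
    """
    total_w = float(sum(weights))
    probs = [w / total_w for w in weights]
    # Sample partition indices non-uniformly, with replacement.
    drawn = rng.choices(range(len(partition_sums)), weights=probs, k=k)
    # Reweight each sampled partition by the inverse of its draw probability.
    return sum(partition_sums[i] / (k * probs[i]) for i in drawn)


if __name__ == "__main__":
    sums = [10, 20, 30, 40, 1000]  # hypothetical per-partition aggregates
    rng = random.Random(42)
    print("uniform weights:", estimate_total(sums, [1] * len(sums), 2, rng))
    # Weights proportional to the true sums make every draw return the exact total.
    print("ideal weights:  ", estimate_total(sums, sums, 2, rng))
```

With uniform weights, the estimate swings widely depending on whether the one large partition is drawn. With weights proportional to each partition's contribution, every draw yields the exact total. Partition-level importance sampling tries to approximate that second case using statistics that are cheap to obtain.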

Bio:
Kexin Rong is a postdoctoral researcher at VMware Research Group. Her research focuses on improving the efficiency and usability of large-scale data analysis. She received her Ph.D. in computer science from Stanford, advised by Peter Bailis and Philip Levis. She will join Georgia Tech in the fall as an assistant professor in the School of Computer Science.


0:00 Presentation
32:20 Discussion

Stanford MLSys Seminar hosts: Dan Fu, Karan Goel, Fiodar Kazhamiaka and Piero Molino
Executive producers: Matei Zaharia, Chris Ré

Twitter:
https://twitter.com/realDanFu
https://twitter.com/krandiash
https://twitter.com/w4nderlus7

Check our website for the class schedule: http://mlsys.stanford.edu
Join our mailing list to receive weekly updates: https://groups.google.com/forum/#!forum/stanford-mlsys-seminars/join

#machinelearning #ai #artificialintelligence #systems #mlsys #computerscience #stanford #vmware #georgiatech #bigdata
