Creating an extensible big data platform | Uber

Download the slides: https://www.datacouncil.ai/talks/creating-an-extensible-big-data-platform

Download slides for this talk: https://www.dataengconf.com/speaker/creating-an-extensible-big-data-platform

ABOUT THE TALK

Uber's mission is to provide transportation as reliable as running water, everywhere and for everyone. To fulfill this mission, Uber relies heavily on making data-driven decisions at every level. As the business grows, we need to store an ever-increasing amount of data while also providing faster, more reliable, and better-performing access to our analytical data. In practice, this has resulted in 100 petabytes of analytical data with minute-level data latency.

At the same time, because of Uber's global presence, regional regulations (such as GDPR) require additional and potentially more complex operations to be supported on the stored analytical data. These additional operations are typically unknown in advance, in many cases contradict the way data lakes are traditionally built and stored, and may require fundamental changes to the underlying assumptions and architecture. A good example is GDPR's requirement to support update/delete operations on all historical Hadoop data, which is traditionally considered append-only and is stored in a read-only columnar file format within the analytical data lake.

In this talk, we'll dive into how to build a generic big data platform that is flexible enough to support many of these unknown additional regulations and requirements out of the box and with minimal effort. This isn't an overview of how Uber addressed all the GDPR requirements, but a deep dive into how Uber's big data platform team came up with the foundational primitives that allowed all other teams within the company to build their solutions on top of Hadoop.

We will explore which technologies we could leverage from the open source community (e.g. Hadoop, Spark, Hive, Presto, Kafka, Avro, and Vertica) and which solutions we had to build in-house (and open-source) to make this possible. You'll leave the talk with a greater understanding of how things work at Uber and be inspired to rethink your own data platform to make it more generic and flexible for future requirements.

Presenter:
Reza Shiftehfar
Engineering Manager at Uber Technologies Inc

Reza currently leads the Hadoop Platform team at Uber, where his team is building a reliable and scalable data platform that serves petabytes of data using technologies such as Hadoop, Hive, Kafka, Spark, and Presto. Reza is a founding member of the data team at Uber and helped scale the Uber data platform from a few terabytes to 100 petabytes, while reducing big data latency from 24 hours to minutes. Reza holds a Ph.D. in computer science from the University of Illinois at Urbana-Champaign and previously worked at Twitter and Apple on similar infrastructure and big data platforms.

If you are interested in the field of distributed systems and big data analytics and would like to follow Reza, you can reach him on LinkedIn (https://linkedin.com/in/reza-shiftehfar-39301b6/) or on Twitter (https://twitter.com/RezaSH).

ABOUT DATA COUNCIL:
Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Be sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from the top open source projects and startups.

FOLLOW DATA COUNCIL:
Twitter: https://twitter.com/DataCouncilAI
LinkedIn: https://www.linkedin.com/company/datacouncil-ai
Facebook: https://www.facebook.com/datacouncilai
Eventbrite: https://www.eventbrite.com/o/data-council-30357384520

Please take the opportunity to connect and share this video with your friends and family if you find it helpful.