Program

SESSION 2 13:50 ~ 14:30

Apache Kafka: Inside LinkedIn's distributed publish/subscribe messaging system

Richard Park | LinkedIn

Richard Park graduated from the University of Waterloo with a degree in Computer Science. For the past three years, he has been a software engineer in the Search, Network and Analytics group at LinkedIn. His primary focus has been distributed systems, and he has helped grow LinkedIn's Hadoop infrastructure from dozens to thousands of nodes. He has also assisted in integrating Kafka and Voldemort into LinkedIn's data pipeline, and is a primary developer on Azkaban. Richard previously worked on fraud detection software at PayPal, video editing software at Autodesk, and mobile development at Research in Motion.

Now an Apache Incubator project, Kafka was designed to be a scalable, high-throughput, low-latency distributed publish-subscribe messaging system. Initially created at LinkedIn to replace its aging logging and tracking system, it has been used successfully at LinkedIn for two years to collect, process, and aggregate data for both real-time and offline consumption. Much of the data flowing through Kafka feeds directly into our Hadoop system for use in data analytics and various data products. Additionally, several other companies have decided to adopt Kafka into their infrastructure. In this presentation, we will highlight the core design principles of the system, the operational aspects of running Kafka in production, performance metrics, and how the system fits into LinkedIn's data ecosystem, as well as some of the products and monitoring applications that Apache Kafka supports.

Slides: http://prezi.com/mxenutdbqzij/sdec-2012-apache-kafka/?auth_key=0db584dcee91bb1c77e867ce313eda4f12243891

