Results-driven Data Engineer with experience designing and developing terabyte- to petabyte-scale Data Platforms/Data Lakes and ETL Frameworks for processing batch and real-time data.
Passionate about Distributed Systems and the Big Data ecosystem of tools and technologies. Strong acumen in choosing the right tools and technologies for building scalable architectures and platform solutions that process and analyze structured and unstructured datasets, supporting Engineers, Data Analysts, Data Scientists, and many customers. Able to build production-grade Data Pipelines from scratch based on business requirements.
• 9 years of experience architecting and designing highly scalable Big Data platforms spanning Data Ingestion and Data Processing. Experienced in building homegrown frameworks for both Batch Processing and Real-Time/Streaming analytics.
• Built a Streaming Data Analytics platform that processes ~500 TB of telemetry events over ~80 PB of raw data per day. Designed Cisco's Data Lake serving Cisco's cross-functional teams as well as its external customers.
• Instrumental in scaling the Cisco Syslog NG [Next Generation] platform from processing ~13 million events per day to ~55 million events per day. Syslog Next Generation is a highly scalable, highly available, event-driven, real-time Distributed Data Pipeline that uses Apache Kafka as the message bus and for inter-process communication, orchestrating functional services built with Java and Tomcat-based Spring Boot containers. Data is processed using Spark Streaming jobs.
• Saved Cisco Systems, Inc. $3,400 per quarter by redesigning and migrating Pentaho Data Integration-based ETL pipelines to Microservices-based pipelines running at scale.
• Led Data Engineering efforts to build a scalable Batch Processing platform that processes Mobility data for Cisco customers, gathering metrics and calculating KPIs to analyze Quality of Experience.