Data Engineer
Expedia Group
2019-10-01 to Present
Building a data pipeline for the booking trends application.
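A minimal sketch of the consumer side such a pipeline might start from, assuming a hypothetical `booking-events` topic with string payloads; broker address, group id, and topic name are placeholders. The role lists Java, but the sketch is in Scala for consistency with the other examples (the Kafka client API is the same).

```scala
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.serialization.StringDeserializer
import scala.jdk.CollectionConverters._

object BookingTrendsConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder broker
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "booking-trends")          // hypothetical consumer group
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(List("booking-events").asJava) // hypothetical topic name

    // Poll booking events and hand each record off to downstream aggregation
    // (for example an AWS Lambda-backed service or a stream processor).
    while (true) {
      val records = consumer.poll(Duration.ofMillis(500))
      records.asScala.foreach { r =>
        println(s"booking event key=${r.key} value=${r.value}")
      }
    }
  }
}
```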
Java
Apache Kafka
AWS Lambda
Senior Data Engineer
Noodle.ai
2018-08-01 to 2019-10-01
- Built a data transporter tool to migrate data between any two sources using Kafka Connect, in batch or stream mode (sketch below)
- Worked on a centralised alert monitoring system to monitor data alerts
- Built a data deploy tool to deploy staging schemas to production
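A minimal sketch of how such a transporter might register a source connector through the Kafka Connect REST API. The connector class, connection URL, table, and host names are illustrative assumptions; Confluent's JDBC source connector is used here only to show a batch ("bulk") versus stream ("incrementing") mode switch.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object RegisterTransporterConnector {
  def main(args: Array[String]): Unit = {
    // Illustrative connector config: mode=bulk re-copies the table each run (batch),
    // mode=incrementing streams only new rows keyed on an incrementing column.
    val mode = if (args.headOption.contains("batch")) "bulk" else "incrementing"
    val config =
      s"""{
         |  "name": "orders-transporter",
         |  "config": {
         |    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
         |    "connection.url": "jdbc:postgresql://source-db:5432/app",
         |    "table.whitelist": "orders",
         |    "mode": "$mode",
         |    "incrementing.column.name": "id",
         |    "topic.prefix": "transporter-"
         |  }
         |}""".stripMargin

    // POST the connector config to the Kafka Connect worker's REST endpoint (placeholder host).
    val request = HttpRequest.newBuilder(URI.create("http://connect:8083/connectors"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(config))
      .build()
    val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    println(s"${response.statusCode}: ${response.body}")
  }
}
```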
Python
Scala
PostgreSQL
Apache Spark
Apache Kafka
Big Data Engineer
VideoTap
2015-12-01 to 2018-07-01
Worked on the VideoTap product:
- Set up a Hadoop cluster of 4 DataNodes and 1 MasterNode on an Azure server
- Coded a Spark job in Scala to compute user-wise and location-wise analytics from 13 GB of data (sketch below)
- Created processors in Apache NiFi for data flow between Cassandra and HDFS
- Worked on Azure's Video Indexer API to identify context from news.
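A minimal sketch of the shape of that analytics job, assuming a hypothetical events dataset with userId, location, and watchSeconds columns; the input/output paths and column names are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object VideoAnalyticsJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("video-analytics").getOrCreate()

    // Hypothetical input: view events with userId, location and watchSeconds columns.
    val events = spark.read.parquet("hdfs:///videotap/events") // placeholder path

    // User-wise analytics: number of views and total watch time per user.
    val perUser = events.groupBy("userId")
      .agg(count("*").as("views"), sum("watchSeconds").as("totalWatchSeconds"))

    // Location-wise analytics: number of views per location.
    val perLocation = events.groupBy("location")
      .agg(count("*").as("views"))

    perUser.write.mode("overwrite").parquet("hdfs:///videotap/analytics/per_user")
    perLocation.write.mode("overwrite").parquet("hdfs:///videotap/analytics/per_location")
    spark.stop()
  }
}
```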
I have also worked on two priority services projects here:
1) Value First: extract attributes such as gender, income, and location (17 attributes in total) from raw SMS data.
- Text mining of 6 TB of SMS data on a 6-node Hadoop cluster to evaluate gender, age, location, and income (17 attributes in total) for 5 crore (50 million) unique mobile numbers
- Set up the Hadoop and Spark cluster with 5 DataNodes and one NameNode
- Set up the HBase cluster with 1 MasterNode and 4 RegionServers
- Coded the text-mining rules in Spark and used Hive for analysing the data
- Built machine learning models on top of Spark in Scala: Logistic Regression for gender, and Neural Networks for income and age (sketch below)
- Integrated Apache Phoenix with HBase for fast querying of data
- Exposure: Spark (MLlib, SQL), Scala, Hive, MapReduce, HBase, Apache Phoenix, Machine Learning (Logistic Regression, Neural Network)
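A minimal sketch of the Spark MLlib side of the gender model, assuming training rows that already carry numeric features mined from the SMS text plus a 0/1 gender label; the feature names, label column, and paths are illustrative.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object GenderModelJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("gender-model").getOrCreate()

    // Hypothetical training data: one row per mobile number with numeric
    // features extracted from SMS text and a 0/1 gender label.
    val training = spark.read.parquet("hdfs:///valuefirst/features") // placeholder path

    val assembler = new VectorAssembler()
      .setInputCols(Array("smsCount", "shoppingScore", "salaryScore")) // illustrative features
      .setOutputCol("features")

    val lr = new LogisticRegression()
      .setLabelCol("gender")
      .setFeaturesCol("features")
      .setMaxIter(50)

    // Assemble features and fit the logistic regression in one pipeline.
    val model = new Pipeline().setStages(Array(assembler, lr)).fit(training)
    model.write.overwrite().save("hdfs:///valuefirst/models/gender")
    spark.stop()
  }
}
```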
2) Goals101: involved in building a rule-based recommendation engine product used by RBL Bank to increase the number and value of their customers' transactions by sending offers at the right point in time.
- Used Cassandra to store a large volume of the bank's customer, transaction, and statement data
- Coded all Spark jobs in Scala on top of the Hadoop file system and stored the processed output in Cassandra (sketch below)
- Set up the Hadoop, Spark, and Cassandra clusters on 3 EC2 instances and deployed the code there
- Also coded 2 event-driven Python REST APIs in Flask for real-time transactions.
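A minimal sketch of the write path for those Spark jobs, assuming the DataStax spark-cassandra-connector and a hypothetical goals101.offers keyspace/table; the aggregation here is only a stand-in for the actual recommendation rules, and the spend threshold, paths, and host name are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OfferRecommendationJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("offer-recommendations")
      .config("spark.cassandra.connection.host", "cassandra-host") // placeholder host
      .getOrCreate()

    // Hypothetical input: card transactions landed on HDFS.
    val txns = spark.read.parquet("hdfs:///goals101/transactions") // placeholder path

    // Stand-in for the rule engine: customers spending above a threshold
    // in a category get flagged for an offer in that category.
    val offers = txns
      .groupBy("customerId", "category")
      .agg(sum("amount").as("monthlySpend"))
      .filter(col("monthlySpend") > 10000)
      .withColumn("offerCode", concat(lit("OFFER_"), col("category")))

    // Write the processed output to Cassandra via the DataStax connector.
    offers.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "goals101", "table" -> "offers")) // hypothetical keyspace/table
      .mode("append")
      .save()

    spark.stop()
  }
}
```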
Scala
MapReduce
Machine learning
Cassandra
HBase
Apache Spark
Apache Hadoop
Apache Hive