Data Engineer
Expedia Group
2019-10-01 to Present
Building a data pipeline for the booking trends application.
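A minimal sketch of the consumer side such a pipeline might start from, assuming a hypothetical `booking-events` topic with string payloads; broker address, group id, and topic name are placeholders. The role lists Java, but the sketch is in Scala for consistency with the other examples (the Kafka client API is the same).

```scala
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.serialization.StringDeserializer
import scala.jdk.CollectionConverters._

object BookingTrendsConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder broker
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "booking-trends")          // hypothetical consumer group
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(List("booking-events").asJava) // hypothetical topic name

    // Poll booking events and hand each record off to downstream aggregation
    // (for example an AWS Lambda-backed service or a stream processor).
    while (true) {
      val records = consumer.poll(Duration.ofMillis(500))
      records.asScala.foreach { r =>
        println(s"booking event key=${r.key} value=${r.value}")
      }
    }
  }
}
```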
Java
Apache Kafka
AWS Lambda
Senior Data Engineer
Noodle.ai
2018-08-01 to 2019-10-01
- Built a data transporter tool to migrate data between any two sources using Kafka Connect, in batch or stream mode (sketch below)
- Worked on a centralised alert monitoring system to monitor data alerts
- Built a data deploy tool to deploy staging schemas to production
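A minimal sketch of how such a transporter might register a source connector through the Kafka Connect REST API. The connector class, connection URL, table, and host names are illustrative assumptions; Confluent's JDBC source connector is used here only to show a batch ("bulk") versus stream ("incrementing") mode switch.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object RegisterTransporterConnector {
  def main(args: Array[String]): Unit = {
    // Illustrative connector config: mode=bulk re-copies the table each run (batch),
    // mode=incrementing streams only new rows keyed on an incrementing column.
    val mode = if (args.headOption.contains("batch")) "bulk" else "incrementing"
    val config =
      s"""{
         |  "name": "orders-transporter",
         |  "config": {
         |    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
         |    "connection.url": "jdbc:postgresql://source-db:5432/app",
         |    "table.whitelist": "orders",
         |    "mode": "$mode",
         |    "incrementing.column.name": "id",
         |    "topic.prefix": "transporter-"
         |  }
         |}""".stripMargin

    // POST the connector config to the Kafka Connect worker's REST endpoint (placeholder host).
    val request = HttpRequest.newBuilder(URI.create("http://connect:8083/connectors"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(config))
      .build()
    val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    println(s"${response.statusCode}: ${response.body}")
  }
}
```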
Python
Scala
PostgreSQL
Apache Spark
Apache Kafka
Big Data Engineer
VideoTap
2015-12-01 to 2018-07-01
Worked on the VideoTap product:
- Set up a Hadoop cluster of 4 DataNodes and 1 MasterNode on an Azure server
- Coded a Spark job in Scala to compute user-wise and location-wise analytics from 13 GB of data (sketch below)
- Created processors in Apache NiFi for data flow between Cassandra and HDFS
- Worked on Azure's Video Indexer API to identify context from news.
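A minimal sketch of the shape of that analytics job, assuming a hypothetical events dataset with userId, location, and watchSeconds columns; the input/output paths and column names are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object VideoAnalyticsJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("video-analytics").getOrCreate()

    // Hypothetical input: view events with userId, location and watchSeconds columns.
    val events = spark.read.parquet("hdfs:///videotap/events") // placeholder path

    // User-wise analytics: number of views and total watch time per user.
    val perUser = events.groupBy("userId")
      .agg(count("*").as("views"), sum("watchSeconds").as("totalWatchSeconds"))

    // Location-wise analytics: number of views per location.
    val perLocation = events.groupBy("location")
      .agg(count("*").as("views"))

    perUser.write.mode("overwrite").parquet("hdfs:///videotap/analytics/per_user")
    perLocation.write.mode("overwrite").parquet("hdfs:///videotap/analytics/per_location")
    spark.stop()
  }
}
```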
I have also worked on two priority services projects here:
1) Value First: extract attributes such as gender, income, and location (17 attributes in total) from raw SMS data.
- Text mining of 6 TB of SMS data on a 6-node Hadoop cluster to evaluate gender, age, location, and income (17 attributes in total) for 5 crore (50 million) unique mobile numbers
- Set up the Hadoop and Spark cluster with 5 DataNodes and one NameNode
- Set up the HBase cluster with 1 MasterNode and 4 RegionServers
- Coded the text-mining rules in Spark and used Hive for analysing the data
- Built machine learning models on top of Spark in Scala: Logistic Regression for gender, and Neural Networks for income and age (sketch below)
- Integrated Apache Phoenix with HBase for fast querying of data
- Exposure: Spark (MLlib, SQL), Scala, Hive, MapReduce, HBase, Apache Phoenix, Machine Learning (Logistic Regression, Neural Network)
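A minimal sketch of the Spark MLlib side of the gender model, assuming training rows that already carry numeric features mined from the SMS text plus a 0/1 gender label; the feature names, label column, and paths are illustrative.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object GenderModelJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("gender-model").getOrCreate()

    // Hypothetical training data: one row per mobile number with numeric
    // features extracted from SMS text and a 0/1 gender label.
    val training = spark.read.parquet("hdfs:///valuefirst/features") // placeholder path

    val assembler = new VectorAssembler()
      .setInputCols(Array("smsCount", "shoppingScore", "salaryScore")) // illustrative features
      .setOutputCol("features")

    val lr = new LogisticRegression()
      .setLabelCol("gender")
      .setFeaturesCol("features")
      .setMaxIter(50)

    // Assemble features and fit the logistic regression in one pipeline.
    val model = new Pipeline().setStages(Array(assembler, lr)).fit(training)
    model.write.overwrite().save("hdfs:///valuefirst/models/gender")
    spark.stop()
  }
}
```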
2) Goals101: involved in building a rule-based recommendation engine product used by RBL Bank to increase the number and value of their customers' transactions by sending offers at the right point in time.
- Used Cassandra to store a large volume of the bank's customer, transaction, and statement data
- Coded all Spark jobs in Scala on top of the Hadoop file system and stored the processed output in Cassandra (sketch below)
- Set up the Hadoop, Spark, and Cassandra clusters on 3 EC2 instances and deployed the code there
- Also coded 2 event-driven Python REST APIs in Flask for real-time transactions.
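A minimal sketch of the write path for those Spark jobs, assuming the DataStax spark-cassandra-connector and a hypothetical goals101.offers keyspace/table; the aggregation here is only a stand-in for the actual recommendation rules, and the spend threshold, paths, and host name are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OfferRecommendationJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("offer-recommendations")
      .config("spark.cassandra.connection.host", "cassandra-host") // placeholder host
      .getOrCreate()

    // Hypothetical input: card transactions landed on HDFS.
    val txns = spark.read.parquet("hdfs:///goals101/transactions") // placeholder path

    // Stand-in for the rule engine: customers spending above a threshold
    // in a category get flagged for an offer in that category.
    val offers = txns
      .groupBy("customerId", "category")
      .agg(sum("amount").as("monthlySpend"))
      .filter(col("monthlySpend") > 10000)
      .withColumn("offerCode", concat(lit("OFFER_"), col("category")))

    // Write the processed output to Cassandra via the DataStax connector.
    offers.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "goals101", "table" -> "offers")) // hypothetical keyspace/table
      .mode("append")
      .save()

    spark.stop()
  }
}
```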
Scala
MapReduce
Machine learning
Cassandra
HBase
Apache Spark
Apache Hadoop
Apache Hive