BigData, Cloud, ETL, SQL & NoSQL DB, Spark, Hive, Python, Scala, Kafka
Have 13+ years of experience in software architecture, design, development, support, maintenance, troubleshooting, performance tuning & environment/cluster setup using various Open Source, BigData, ETL & other technologies.
Have expertise in data analysis, data storage, Data Lakes, data processing, mining, optimization, reporting & cluster tuning.
Have domain knowledge of Insurance, Product Information Management (PIM) and Network & Media, along with hands-on experience with Agile methodologies, Scrum & Waterfall.
Have 5 years of experience in the BigData ecosystem, Hadoop Distributed File System (HDFS) & YARN; well versed with tools & interpreters such as Hive, Pig, Flume, Oozie, Sqoop, Spark, HBase, Beeline, Impala, Tez, Scala, Athena, Qubole & Git.
Also have good exposure to various file formats (TXT, Parquet, ORC) & compression codecs (Snappy, LZO, GZIP, BZIP2) in the BigData space.
Have hands-on experience installing, configuring and maintaining the Apache Hadoop ecosystem (versions 1.0, 2.0, 3.0) and various components & tools around the Hadoop Distributed File System.
Hands-on with Amazon AWS EC2, S3, EMR, EBS & Redshift Spectrum for data processing & storage.
Hands-on with Amazon Redshift & Redshift Spectrum, the Schema Conversion Tool (SCT) and DMS, along with Amazon's data catalog.
Expertise in working on CentOS/Linux/UNIX/Fedora/Ubuntu operating systems and their flavors as a user & super-user, with a good amount of administration knowledge.
Good understanding of Hadoop architecture and various components such as YARN, HDFS, JobTracker, TaskTracker, NameNode, DataNode, Resource/Node/Application Managers, MRv1 & MRv2.
Hands-on with ETL & DWH processes; worked on the near-real-time ETL tool Oracle GoldenGate covering implementation, installation, administration, configuration, maintenance & troubleshooting, sourcing from multiple databases such as MSSQL, Oracle, Teradata, DB2 and UDB.
Worked on various Hadoop distributions from Apache, Cloudera, Hortonworks & Qubole.
Hands-on with capacity planning & sizing of Hadoop clusters, Hadoop cluster tuning, and query optimization & performance tuning to utilize clusters at their best.
Hands-on in evaluating tools by comparing their statistics on process/resource/cost/maintenance utilization.
Good at Data Analysis and Reporting, along with designing and implementing Data Lakes using BigData tools; analyzed and extracted periodic data to gain insights into usage trends and issues of resources over the network, helping address critical & medium-severity issues while solving business problems.
Worked on Data modeling - Conceptual, Logical & Physical models on BigData.
Have good knowledge of NoSQL technologies such as MongoDB & HBase.
Experience in analyzing, processing, mining, optimizing, extracting, importing & exporting data using Hive queries (HiveQL), Pig Latin, Impala, Spark & Scala.
Good at utilizing the Oozie scheduler & workflows in single/parallel/dependent execution modes with multiple actions attached.
Designed & implemented an automated process to balance cluster resource usage via Oozie by controlling the degree of parallel operations.
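The parallel-execution pattern above can be sketched as an Oozie workflow using fork/join; the workflow, action and script names below are illustrative, not taken from the actual project.

```xml
<workflow-app name="parallel-load" xmlns="uri:oozie:workflow:0.5">
  <start to="fan-out"/>
  <!-- fork launches both branches concurrently; join waits for both to finish -->
  <fork name="fan-out">
    <path start="load-a"/>
    <path start="load-b"/>
  </fork>
  <action name="load-a">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_a.hql</script>
    </hive>
    <ok to="fan-in"/>
    <error to="fail"/>
  </action>
  <action name="load-b">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_b.hql</script>
    </hive>
    <ok to="fan-in"/>
    <error to="fail"/>
  </action>
  <join name="fan-in" to="end"/>
  <kill name="fail"><message>Parallel load failed</message></kill>
  <end name="end"/>
</workflow-app>
```

Capping the number of paths per fork (or chaining forks) is one way to bound how much of the cluster the workflow consumes at once.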
Handled enhancements to the core design by adding complex datatypes (Arrays, Maps & Structs) to the existing data model without affecting system functionality.
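A minimal HiveQL sketch of adding complex types to an existing table; the table and column names are hypothetical.

```sql
-- Extend an existing table with complex-typed columns (additive, so
-- existing rows and queries are unaffected; new columns read as NULL).
ALTER TABLE customer_profile ADD COLUMNS (
  phone_numbers ARRAY<STRING>,
  attributes    MAP<STRING, STRING>,
  address       STRUCT<street:STRING, city:STRING, zip:STRING>
);

-- Accessing the new columns: index into the array, key into the map,
-- and dot into the struct.
SELECT phone_numbers[0], attributes['segment'], address.city
FROM customer_profile;
```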
Hands-on experience with SQL, NoSQL, Pentaho DI, Datameer & shell scripting, along with basic knowledge of Java.
Also have experience handling various clean-up activities, such as S3 and Athena/Hive table/database clean-ups.
Have extensively worked on network application development in C with Linux internals (sockets, threads, message queues, shared memory, semaphores, processes & signals), gdb and Valgrind.
Hands-on with shell scripting (Bsh & Ksh), writing application, monitoring, automation, alerting, deployment and S3 size-calculation scripts; also have exposure to Perl scripting.
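An S3 size-calculation script of the kind mentioned above can be sketched in Python; this is a hypothetical example that assumes the AWS CLI is installed and parses the `aws s3 ls --recursive` listing format (date, time, size, key).

```python
import subprocess

def total_size_from_listing(listing: str) -> int:
    """Sum object sizes in bytes from `aws s3 ls --recursive` style output.

    Each object line looks like: '2020-01-01 10:00:00   12345 path/to/key'.
    Lines without a numeric size field (e.g. 'PRE dir/') are skipped.
    """
    total = 0
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[2].isdigit():
            total += int(parts[2])
    return total

def bucket_size(s3_uri: str) -> int:
    # Hypothetical wrapper: shells out to the AWS CLI and sums the listing.
    out = subprocess.run(
        ["aws", "s3", "ls", "--recursive", s3_uri],
        capture_output=True, text=True, check=True,
    ).stdout
    return total_size_from_listing(out)
```

Separating the parsing from the CLI call keeps the size arithmetic testable without AWS credentials.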
Work experience with reporting tools on HDFS such as Datameer, and with Excel pivot tables and graphs.
Proficient with various development & debugging tools for coding, unit testing and integration testing.
Experience in writing makefiles, generating archives & shared libraries, and writing spec files for building RPM packages.
Also played various other roles such as build, deploy & quality manager and deployment-process & testing coordinator, apart from my primary roles.
Extensive experience in handling and executing jobs both as part of a team and as an individual contributor, including running work in parallel.
Excellent documentation skills, with good knowledge of Visio.
Experience in generating various diagrams like workflow, use-case, sequence, communication & timing diagrams.
Maintained good rapport with all clients/managers across different projects, with strong commitment to application success and support.
Hands-on with Microsoft Office tools such as Word, PowerPoint, Excel, Access & Visio.