How and why I built Event data ingestion on AWS using SQS/Kinesis/DynamoDB
About me
Hands-on architect. I architect, design, and code cloud-based software systems and integrations.
The problem I wanted to solve
A huge volume of events needed to be ingested in a short span of time, and actionable intelligence gleaned from them.
The approaches used previously did not scale and did not meet the targets set for actionable intelligence.
What is Event data ingestion on AWS using SQS/Kinesis/DynamoDB?
A staged, event-driven pipeline on the cloud (AWS). A high volume of events generated by upstream systems is converted into analytics output, which is presented as a dashboard for business users to base decisions on.
It involves several stages of event processing, certain NLP techniques to glean knowledge from the events, and analytics patterns applied on top to build actionable intelligence.
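As a rough illustration of the staged idea, here is a minimal in-memory sketch in Java. It is not the production pipeline: the `RawEvent` shape, the class and method names, and the choice of a simple term count as a stand-in for the NLP and analytics steps are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical event shape; the post does not describe the real event schema.
final class RawEvent {
    final String source;
    final String text;

    RawEvent(String source, String text) {
        this.source = source;
        this.text = text;
    }
}

final class StagedPipelineSketch {
    // Stage 1: split an event's free text into lowercase alphanumeric tokens.
    static List<String> tokenize(RawEvent event) {
        List<String> tokens = new ArrayList<>();
        for (String token : event.text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stage 2: count how often each token occurs across the whole batch.
    static Map<String, Integer> countTerms(List<List<String>> tokenized) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> tokens : tokenized) {
            for (String token : tokens) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Runs the two stages in order over one batch of events.
    static Map<String, Integer> run(List<RawEvent> batch) {
        List<List<String>> tokenized = new ArrayList<>();
        for (RawEvent event : batch) {
            tokenized.add(tokenize(event));
        }
        return countTerms(tokenized);
    }
}
```

In a deployed version each stage would presumably sit behind its own queue or stream rather than a direct method call; the sketch only shows how the output of one stage becomes the input of the next.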
Tech stack
AWS (SQS/Kinesis/DynamoDB/Redshift/ECS/Lambda).
Compared to the cost of standing up the same set of technologies in a data center, AWS provides flexibility and scale with minimal up-front expenditure.
The process of building Event data ingestion on AWS using SQS/Kinesis/DynamoDB
The process started with analyzing the critical requirements and mapping them to a set of candidate architectures and technologies that could fulfill them. AWS was chosen as the primary IaaS provider for reasons of cost and scale.
Each stage of the processing pipeline was built in isolation so that it could be tested on its own (with mocks), and the stages were then gradually integrated into a complete multistage pipeline, each stage acting as a microservice with its own data source.
All of the pipeline was built in Java, and instrumentation was done using Python, Terraform, Ansible, Jenkins, and Maven.
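To make the "built in isolation, tested with mocks" point concrete, here is a small sketch of one way to structure a stage so it can run without any AWS resources. The `EventSource` and `EventSink` interfaces, the in-memory implementations, and `FilterStage` are hypothetical names invented for this example; in a real deployment an adapter over SQS or Kinesis could implement the same interfaces.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

// Hypothetical input port of a stage (could be backed by SQS or Kinesis in production).
interface EventSource {
    Optional<String> poll();
}

// Hypothetical output port of a stage.
interface EventSink {
    void publish(String event);
}

// Test double: serves a fixed list of events from memory.
final class InMemorySource implements EventSource {
    private final Queue<String> queue = new ArrayDeque<>();

    InMemorySource(List<String> events) {
        queue.addAll(events);
    }

    @Override
    public Optional<String> poll() {
        return Optional.ofNullable(queue.poll());
    }
}

// Test double: records everything the stage publishes.
final class InMemorySink implements EventSink {
    final List<String> published = new ArrayList<>();

    @Override
    public void publish(String event) {
        published.add(event);
    }
}

// Example stage: trims events and drops the blank ones.
final class FilterStage {
    private final EventSource in;
    private final EventSink out;

    FilterStage(EventSource in, EventSink out) {
        this.in = in;
        this.out = out;
    }

    // Drains the source until it is empty; returns the number of events forwarded.
    int drain() {
        int forwarded = 0;
        Optional<String> next;
        while ((next = in.poll()).isPresent()) {
            String event = next.get().trim();
            if (!event.isEmpty()) {
                out.publish(event);
                forwarded++;
            }
        }
        return forwarded;
    }
}
```

Because the stage depends only on the two interfaces, a unit test can feed it an `InMemorySource`, inspect the `InMemorySink`, and never touch the network.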
Challenges I faced
Getting a grip on DynamoDB partitioning was challenging when data was being written at a very rapid rate and then read at an even larger scale. Various indexing and caching mechanisms alleviated those issues.
Balancing cost against functionality was another challenge: every resource used on the cloud costs money, and that has to be taken into account when designing, coding, testing, and deploying software onto the cloud.
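The caching side of this can be illustrated with a generic read-through cache. This is only a sketch of the general technique, not the mechanism actually used in the pipeline: the `ReadThroughCache` class, its capacity parameter, and the `loader` function (which stands in for a DynamoDB read) are assumptions for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Small least-recently-used read-through cache placed in front of a slower lookup.
final class ReadThroughCache<K, V> {
    private final Function<K, V> loader; // stand-in for the backing store read
    private final Map<K, V> entries;
    private int loads = 0;

    ReadThroughCache(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true makes iteration order least-recently-used first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached value, or loads and caches it on a miss.
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    // Number of times the backing store was actually consulted.
    synchronized int loadCount() {
        return loads;
    }
}
```

Repeated reads of a hot key are served from memory, so only misses reach the table. A production cache would also need expiry or invalidation so that readers do not see stale items after writes, which this sketch leaves out.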
Key learnings
The main learning was that AWS is a fairly easy platform to code for and to scale on rapidly, but it comes with the downsides mentioned above.
Other AWS services could have delivered some of the functionality that was coded as part of the pipeline, but using them carried risk because of the maturity of those services.
Tips and advice
My advice is to always build testable components that can be developed, tested, and deployed in isolation and then combined with other components to build whole systems.
Final thoughts and next steps
The system has met its objectives. The next step is integration with services outside of AWS, which requires a completely new set of services and resources to support inter-cloud use.