I'm a backend software engineer working primarily with Python. I see software as a tool for solving problems, not as a goal in itself. Sometimes it's necessary to compromise on software-quality perfection in favor of other goals and needs; when that's the case, I strive to make the trade-offs consciously and judiciously.
My background involves building products with complex third-party API integrations, large-scale data processing and ETL jobs, unstructured data sources, and automated unit and integration testing. I also have experience implementing AI/ML models and libraries, especially for LLM- and NLP-related work.
AWS is the cloud provider I have the most experience with, covering both traditional architectures (EC2, RDS, ELB, etc.) and serverless architectures (Lambda, API Gateway, DynamoDB, Aurora Serverless, S3, Kinesis, Athena, etc.).
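As an illustration of the serverless pattern I tend to reach for, here is a minimal sketch of a Lambda handler behind API Gateway persisting records to DynamoDB. The `articles` table name and the payload fields are hypothetical, not taken from any specific project:

```python
import json
import boto3

# Hypothetical table name, for illustration only.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("articles")

def handler(event, context):
    # With the API Gateway proxy integration, the request payload
    # arrives as a JSON string under event["body"].
    body = json.loads(event["body"])
    table.put_item(Item={
        "article_id": body["article_id"],
        "title": body["title"],
    })
    return {"statusCode": 201, "body": json.dumps({"status": "stored"})}
```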
Fully automated machine learning pipeline for news article monitoring, feature extraction, NLP parsing, text classification, and data enrichment with multiple external APIs (geolocation and corporate data, among others). Used AWS Lambda for compute and multiple datastores: Amazon S3 (data lake), MySQL (analytical querying), and DynamoDB (highly scalable primary DB). Built an internal user interface for tagging training data for the machine learning models, as well as for human analysis and vetting of model outputs. A second user interface was built for the company's customers to consume the data, with full-text search and faceting capabilities (using AWS CloudSearch) and export to Microsoft Excel.
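As a sketch of how the customer-facing search can be served, the snippet below queries a CloudSearch domain with full-text search plus faceting via boto3. The endpoint URL and the field names (`category`, `country`, etc.) are placeholders, not the project's actual schema:

```python
import json
import boto3

# Each CloudSearch domain has its own search endpoint; this URL is a placeholder.
client = boto3.client(
    "cloudsearchdomain",
    endpoint_url="https://search-articles-xxxx.us-east-1.cloudsearch.amazonaws.com",
)

def search_announcements(text, country=None):
    # Full-text query with a facet on a hypothetical 'category' field.
    kwargs = {
        "query": text,
        "queryParser": "simple",
        "facet": json.dumps({"category": {"sort": "count", "size": 10}}),
        "returnFields": "title,category,published_at",
        "size": 20,
    }
    if country:
        # Narrow results with a structured filter query.
        kwargs["filterQuery"] = f"country:'{country}'"
    return client.search(**kwargs)
```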
Created a fully automated pipeline to collect, extract, and enrich data, building a database of corporate announcements from around the world. It included two frontend applications: one for internal analysts to curate the data, and another for end users to query and interact with it.
Text announcements are collected continuously from thousands of sources; NLP is used for named-entity recognition and part-of-speech tagging, and machine learning models classify each part of the text into several categories.
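A minimal sketch of the NER and part-of-speech step, using spaCy as a stand-in for the actual stack (the pipeline's real models and label set are not shown here):

```python
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def extract_entities(announcement: str):
    doc = nlp(announcement)
    # Named entities: e.g. ORG, GPE, DATE, MONEY spans.
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    # Part-of-speech tags for every token, usable as downstream features.
    pos_tags = [(tok.text, tok.pos_) for tok in doc]
    return entities, pos_tags

entities, pos_tags = extract_entities(
    "Acme Corp. announced a $50 million acquisition in Berlin on Monday."
)
print(entities)  # e.g. [('Acme Corp.', 'ORG'), ('$50 million', 'MONEY'), ...]
```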