Data Citizen & Modern Data Stack Specialist
Data Engineer specializing in building scalable data platforms and pipelines
I'm a Senior Data Engineer who's passionate about building systems that scale — and making data actually work for teams.
With 8+ years of experience across startups and Web3, I focus on designing modern data platforms, automating pipelines, and optimizing cloud infrastructure for performance and cost.
At companies like Spectral Finance, OpenBlock Labs, and Olist, I've had the opportunity to put that into practice.
I enjoy working across Python, Spark, dbt, Terraform, ECS, FastAPI, and more — but what I really care about is solving real problems, mentoring teams, and delivering clean, maintainable solutions.
Needed a scalable, reliable, and cost-effective data transformation pipeline for cross-chain blockchain analytics across Ethereum, Arbitrum, Linea, and EigenLayer.
Build a dbt-based transformation pipeline using a 3-layer architecture, integrating with multiple cloud data warehouses and supporting advanced analytics.
Delivered faster insights, scalable models, improved data reliability, and reduced infrastructure costs for blockchain analytics.
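For a flavor of what the layered runs look like, here is a minimal sketch of a wrapper that executes the project one layer at a time; the staging/intermediate/marts directory names and the prod target are assumptions, not the actual project layout.

```python
import subprocess

# Hypothetical layer names for a 3-layer dbt project:
# staging -> intermediate -> marts (a common dbt convention).
LAYERS = ["staging", "intermediate", "marts"]

def run_layer(layer: str, target: str = "prod") -> None:
    """Run all dbt models under one layer directory, failing fast on errors."""
    cmd = ["dbt", "run", "--select", f"path:models/{layer}", "--target", target]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    for layer in LAYERS:
        run_layer(layer)
```

In practice the same layering can also be expressed with dbt tags or a selectors.yml file, which keeps the orchestration logic inside the dbt project itself.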
Manual data ingestion from Dune Analytics was slow, error-prone, and limited the team's ability to quickly access blockchain analytics data.
Build an automated pipeline to integrate Dune Analytics with the enterprise data platform, enabling self-service ingestion into Athena.
Reduced data ingestion time by 90%, eliminated manual steps, improved data reliability, enhanced team productivity, and reduced operational costs.
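A rough sketch of the ingestion step, assuming a hypothetical Dune query ID, S3 prefix, and Glue database/table; the result-fetching call follows the shape of Dune's v1 REST API, and awswrangler registers the Parquet output in Glue so Athena can query it immediately.

```python
import os
import requests
import pandas as pd
import awswrangler as wr

DUNE_API_KEY = os.environ["DUNE_API_KEY"]
QUERY_ID = 1234567  # hypothetical Dune query ID

def fetch_dune_results(query_id: int) -> pd.DataFrame:
    """Fetch the latest results of a saved Dune query (payload shape assumed from Dune's v1 API)."""
    resp = requests.get(
        f"https://api.dune.com/api/v1/query/{query_id}/results",
        headers={"X-Dune-API-Key": DUNE_API_KEY},
        timeout=60,
    )
    resp.raise_for_status()
    return pd.DataFrame(resp.json()["result"]["rows"])

def land_to_athena(df: pd.DataFrame) -> None:
    """Write results as Parquet on S3 and register the table in Glue so Athena can query it."""
    wr.s3.to_parquet(
        df=df,
        path="s3://my-data-lake/raw/dune/my_query/",  # hypothetical bucket/prefix
        dataset=True,
        database="raw",          # hypothetical Glue database
        table="dune_my_query",   # hypothetical table name
        mode="overwrite",
    )

if __name__ == "__main__":
    land_to_athena(fetch_dune_results(QUERY_ID))
```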
Teams relied on notebooks for prototyping and manual deployment, leading to slow, inconsistent, and unscalable data pipelines.
Build a data pipeline factory to automate project scaffolding, configuration management, and Airflow DAG generation via YAML, enabling seamless dev-to-prod deployment.
Reduced setup time from days to minutes, improved collaboration, automated deployment, enhanced scalability, and cut pipeline deployment from one month to 1-2 weeks.
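A minimal sketch of the YAML-to-DAG idea rather than the actual factory: each YAML file under a hypothetical pipelines/ folder declares a name, a schedule, and an ordered list of tasks, and the loop below registers one DAG per file.

```python
# dags/pipeline_factory.py -- minimal sketch of YAML-driven DAG generation.
from datetime import datetime
from pathlib import Path

import yaml
from airflow import DAG
from airflow.operators.bash import BashOperator

CONFIG_DIR = Path(__file__).parent / "pipelines"  # hypothetical folder of YAML specs

def build_dag(spec: dict) -> DAG:
    """Turn one YAML spec into a DAG with a linear chain of bash tasks.

    Expected (hypothetical) YAML keys:
      name, schedule (cron string), tasks: [{id, command}, ...]
    """
    dag = DAG(
        dag_id=spec["name"],
        schedule=spec.get("schedule", "@daily"),  # Airflow 2.4+; older versions use schedule_interval
        start_date=datetime(2023, 1, 1),
        catchup=False,
    )
    previous = None
    for task in spec["tasks"]:
        op = BashOperator(task_id=task["id"], bash_command=task["command"], dag=dag)
        if previous:
            previous >> op
        previous = op
    return dag

# Register one DAG per YAML file so the Airflow scheduler picks them all up.
for path in CONFIG_DIR.glob("*.yaml"):
    spec = yaml.safe_load(path.read_text())
    globals()[spec["name"]] = build_dag(spec)
```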
Company needed a robust data platform, moving from a NoSQL database to a comprehensive solution that could support a data-driven culture.
Design and implement a Data Lakehouse Platform to handle multiple data sources including blockchain, lending protocols, and cryptocurrency platforms.
Successfully established a functional Data Lakehouse platform enabling data-driven culture and efficient data access for analytics.
Needed to obtain unique EOA (externally owned account) addresses from DeFi events (AAVE V1, V2, and Compound), with specific data requirements.
Create a data pipeline to track unique EOA addresses using Apache Hudi.
Successfully created tracking system for unique EOA addresses, enabling efficient DeFi event analysis.
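A condensed sketch of the Hudi write, with hypothetical S3 paths and a hypothetical `user` column standing in for the decoded event schema; upserting on the address record key is what keeps the table deduplicated across runs.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes a Spark session with the Hudi bundle on the classpath.
spark = (
    SparkSession.builder.appName("unique-eoa-addresses")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Hypothetical source: decoded AAVE/Compound events with a `user` address column.
events = spark.read.parquet("s3://my-data-lake/silver/defi_events/")

unique_eoas = (
    events.select(F.col("user").alias("address"))
    .withColumn("last_seen_at", F.current_timestamp())
    .dropDuplicates(["address"])
)

hudi_options = {
    "hoodie.table.name": "unique_eoa_addresses",
    "hoodie.datasource.write.recordkey.field": "address",
    "hoodie.datasource.write.precombine.field": "last_seen_at",
    "hoodie.datasource.write.operation": "upsert",
    # Single non-partitioned table keeps the example simple.
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
}

# Upserting on the address key keeps the table deduplicated across runs.
(
    unique_eoas.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://my-data-lake/gold/unique_eoa_addresses/")
)
```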
Organization needed comprehensive Ethereum blockchain transaction data for on-chain credit scoring, requiring faster and more reliable data access than available through Etherscan API.
Design and implement a custom Ethereum node solution for on-chain credit scoring, including an evaluation of different node client implementations.
Successfully implemented a high-performance blockchain data processing system with custom Ethereum node, enabling real-time analysis of transactions for on-chain credit scoring.
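A simplified sketch of pulling transactions straight from the node over JSON-RPC with web3.py; the internal endpoint and the selected fields are illustrative, not the production schema.

```python
from web3 import Web3

# Hypothetical endpoint of the self-hosted Ethereum node's JSON-RPC interface.
w3 = Web3(Web3.HTTPProvider("http://ethereum-node.internal:8545"))

def transactions_for_block(block_number: int) -> list[dict]:
    """Pull every transaction in a block as a flat dict, ready for feature extraction."""
    block = w3.eth.get_block(block_number, full_transactions=True)
    return [
        {
            "hash": tx["hash"].hex(),
            "from": tx["from"],
            "to": tx["to"],
            "value_wei": tx["value"],
            "gas": tx["gas"],
            "block_number": block_number,
            "timestamp": block["timestamp"],
        }
        for tx in block["transactions"]
    ]

if __name__ == "__main__":
    latest = w3.eth.block_number
    print(transactions_for_block(latest))
```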
We needed to fetch DeFi data from The Graph: specifically, events generated by lending protocols, exposed through the Polygon subgraph.
Set up a reliable extraction and processing pipeline to pull this DeFi data from the Polygon subgraph on The Graph.
Successfully implemented a pipeline to fetch and process DeFi data from the Polygon subgraph, enabling detailed analysis of lending protocol events.
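A sketch of the extraction loop using id-based pagination, a common pattern for subgraph queries; the subgraph URL, entity name, and fields are placeholders rather than the actual lending-protocol schema.

```python
import requests

# Hypothetical subgraph endpoint; the real URL depends on the indexed lending protocol.
SUBGRAPH_URL = "https://api.thegraph.com/subgraphs/name/example/lending-polygon"

QUERY = """
query Borrows($first: Int!, $lastId: ID!) {
  borrows(first: $first, where: { id_gt: $lastId }, orderBy: id, orderDirection: asc) {
    id
    amount
    timestamp
  }
}
"""

def fetch_all_borrows(page_size: int = 1000) -> list[dict]:
    """Page through borrow events using id-based cursoring until no rows remain."""
    rows, last_id = [], ""
    while True:
        resp = requests.post(
            SUBGRAPH_URL,
            json={"query": QUERY, "variables": {"first": page_size, "lastId": last_id}},
            timeout=60,
        )
        resp.raise_for_status()
        page = resp.json()["data"]["borrows"]
        if not page:
            return rows
        rows.extend(page)
        last_id = page[-1]["id"]
```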
The MWAA environment's networking configuration was causing high data transfer costs and security concerns.
Reconfigure MWAA to operate within a private network accessible only through a VPN.
Reduced data traffic costs by 90% and improved security through VPN-only access.
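The actual work was networking reconfiguration (private subnets plus VPN-only routing), but the key switch can be sketched as a single boto3 call that moves the Airflow web server to private-only access; the environment name is hypothetical.

```python
import boto3

# Sketch: restrict the MWAA web server so it is only reachable from inside the
# VPC, i.e. over the VPN. Environment name below is hypothetical.
mwaa = boto3.client("mwaa", region_name="us-east-1")

mwaa.update_environment(
    Name="data-platform-airflow",
    WebserverAccessMode="PRIVATE_ONLY",
)
```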
A project required defining business rules with the business team while developing the supporting data infrastructure.
Cooperate with the business team to define rules and assist with service deployment, working with a consulting company on data warehouse (DW) data modeling.
Successfully defined business rules and established robust data infrastructure, leading to enhanced data management capabilities.
Company was facing a high number of product stockouts across all branches.
Develop an algorithm that suggests product transfers between branches, based on business rules, to reduce stockouts.
Significantly reduced product stockouts across branches through data-driven inventory management.
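The real rules are business-specific, but the core idea can be sketched as a greedy matching of surplus and deficit branches per SKU; the columns and example numbers below are hypothetical.

```python
import pandas as pd

# Hypothetical inventory snapshot: one row per (branch, sku) with stock on hand
# and a target stock level derived from the business rules.
inventory = pd.DataFrame(
    {
        "branch": ["A", "B", "C"],
        "sku": ["X", "X", "X"],
        "on_hand": [50, 2, 0],
        "target": [20, 10, 10],
    }
)

def suggest_transfers(inv: pd.DataFrame) -> list[dict]:
    """Greedy matching: move surplus units from over-stocked branches to branches below target."""
    suggestions = []
    for sku, group in inv.groupby("sku"):
        surplus = group[group.on_hand > group.target].copy()
        deficit = group[group.on_hand < group.target].copy()
        for _, need in deficit.iterrows():
            missing = need.target - need.on_hand
            for si, have in surplus.iterrows():
                available = surplus.loc[si, "on_hand"] - have.target
                if available <= 0 or missing <= 0:
                    continue
                qty = min(available, missing)
                suggestions.append(
                    {"sku": sku, "from": have.branch, "to": need.branch, "qty": int(qty)}
                )
                surplus.loc[si, "on_hand"] -= qty
                missing -= qty
    return suggestions

print(suggest_transfers(inventory))
```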
Company needed effective customer segmentation to improve marketing efficiency and sales outcomes.
Implement RFM analysis and cluster analysis for customer segmentation to improve marketing targeting.
Improved marketing effectiveness through targeted campaigns and increased sales through better customer understanding.
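A compact sketch of the RFM-plus-clustering approach, assuming a hypothetical orders table with customer_id, order_date, and amount columns; the number of clusters is illustrative.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical order history: one row per order.
orders = pd.read_parquet("orders.parquet")  # columns: customer_id, order_date, amount

snapshot = orders["order_date"].max()
rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Scale the three features so no single dimension dominates the clustering.
scaled = StandardScaler().fit_transform(rfm)
rfm["segment"] = KMeans(n_clusters=4, random_state=42, n_init=10).fit_predict(scaled)

# Profile each segment to decide how marketing should target it.
print(rfm.groupby("segment").mean())
```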
Academic project requiring Twitter data collection and analysis for specific hashtags.
Design and implement a data pipeline for tweet extraction and analysis using Apache NiFi and StreamSets.
Successfully gathered and analyzed Twitter data, gaining practical experience with data pipeline tools.
20+ AWS Lambda functions lacked a failure notification system.
Create an observability stack that sends Lambda failure alerts to Slack.
Improved Lambda function monitoring with real-time Slack alerts, reducing downtime and improving reliability.
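One common way to wire this up (and a plausible reading of the stack described above, though the exact setup is assumed) is CloudWatch alarms on each function's Errors metric publishing to SNS, with a small forwarder Lambda posting the alarm to a Slack incoming webhook.

```python
import json
import os
import urllib.request

# Assumed setup: CloudWatch alarms on each function's Errors metric publish to
# an SNS topic, and this Lambda is subscribed to that topic.
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]

def handler(event, context):
    """Forward CloudWatch alarm notifications delivered via SNS to a Slack channel."""
    for record in event["Records"]:
        alarm = json.loads(record["Sns"]["Message"])
        text = (
            f":rotating_light: *{alarm['AlarmName']}* is {alarm['NewStateValue']}\n"
            f"{alarm.get('NewStateReason', '')}"
        )
        req = urllib.request.Request(
            SLACK_WEBHOOK_URL,
            data=json.dumps({"text": text}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)
    return {"statusCode": 200}
```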
Company sought to become a data-driven organization and needed to construct a data lakehouse platform on AWS to handle 30+ databases, 300+ tables, and 35TB+ of data on S3.
Create a comprehensive data lakehouse platform using modern data stack technologies including Spark on EMR with Hudi, Athena, ECS, Airbyte, Airflow, and Power BI.
Successfully established a scalable data lakehouse platform enabling data-driven decision making and improved data accessibility.
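Once tables land in the Glue catalog, consumption is straightforward; here is a hypothetical example of querying the lakehouse from Python via Athena (the database and table names are made up).

```python
import awswrangler as wr

# Hypothetical example: query a lakehouse table registered in the Glue catalog
# straight through Athena, e.g. to validate an ingestion or feed an ad-hoc analysis.
df = wr.athena.read_sql_query(
    sql="""
        SELECT order_status, COUNT(*) AS orders
        FROM orders            -- hypothetical table ingested via Airbyte/Spark
        GROUP BY order_status
    """,
    database="analytics",      # hypothetical Glue database
    ctas_approach=False,
)
print(df)
```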
Marketplace platform needed to summarize event data from DynamoDB for business analysis, requiring a robust data pipeline solution.
Design and implement a pipeline to extract data from DynamoDB and load it into the Data Lakehouse for analysis.
Successfully implemented a real-time data pipeline enabling efficient analysis of DynamoDB event data, leading to improved business insights.
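A minimal sketch of one plausible implementation: a Lambda on the table's DynamoDB Stream that flattens new images and lands them as JSON lines under a hypothetical raw S3 prefix, from where the lakehouse jobs pick them up.

```python
import json
import os
from datetime import datetime, timezone

import boto3
from boto3.dynamodb.types import TypeDeserializer

# Hypothetical landing bucket for the raw layer of the lakehouse.
RAW_BUCKET = os.environ.get("RAW_BUCKET", "my-data-lake")
s3 = boto3.client("s3")
deserializer = TypeDeserializer()

def handler(event, context):
    """Triggered by a DynamoDB Stream: flatten new images and land them as JSON lines on S3."""
    rows = []
    for record in event["Records"]:
        image = record["dynamodb"].get("NewImage")
        if not image:
            continue  # skip REMOVE events
        rows.append({k: deserializer.deserialize(v) for k, v in image.items()})
    if rows:
        key = f"raw/marketplace_events/{datetime.now(timezone.utc):%Y/%m/%d/%H%M%S}.json"
        body = "\n".join(json.dumps(r, default=str) for r in rows)
        s3.put_object(Bucket=RAW_BUCKET, Key=key, Body=body.encode("utf-8"))
    return {"records": len(rows)}
```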
Sales team needed real-time insights from CRM events but lacked a system to capture and process these events for immediate analysis.
Create a webhook system to listen to CRM events, process them, and store the data in the Data Lakehouse for real-time analysis.
Successfully implemented a real-time CRM event processing system, enabling immediate access to sales data and improving decision-making capabilities.
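A stripped-down sketch of what such a webhook service can look like with FastAPI (part of my usual stack); the route, bucket, and key layout are hypothetical, and a production version would likely also need signature validation and retries.

```python
import json
from datetime import datetime, timezone

import boto3
from fastapi import FastAPI, Request

app = FastAPI()
s3 = boto3.client("s3")
RAW_BUCKET = "my-data-lake"  # hypothetical raw-layer bucket

@app.post("/webhooks/crm")
async def crm_webhook(request: Request):
    """Receive a CRM event payload and land it in the raw layer of the lakehouse."""
    payload = await request.json()
    now = datetime.now(timezone.utc)
    key = f"raw/crm_events/{now:%Y/%m/%d}/{now:%H%M%S%f}.json"
    s3.put_object(Bucket=RAW_BUCKET, Key=key, Body=json.dumps(payload).encode("utf-8"))
    return {"status": "accepted", "key": key}
```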