<evan.rosa/>

About Me

Transforming complex data challenges into scalable, efficient solutions

Evan Rosa - Lead Data Engineer

My Journey

With almost 15 years of experience in the data space, I've transitioned from analyst roles into architecting scalable batch and real-time data infrastructure that supports millions of users and billions of events. I specialize in building modern data platforms powered by Airflow, Spark, Kafka, Flink, and cloud-native tools.

My work focuses on optimizing ETL workflows, enhancing data accessibility, and driving cost-efficiency for high-impact teams across marketing, product, and engineering. I'm passionate about designing systems that turn complex data into clean, reliable insights—and thrive on tackling infrastructure challenges that unlock business growth.

Technical Expertise

Data Processing & ETL

Apache Airflow, Apache Spark, Apache Kafka, Apache Flink, SQLMesh, ETL Pipelines

Cloud & Storage

Google BigQuery, Google Cloud Storage, Google Composer, Google Dataflow, Databricks, Amazon Redshift, Postgres, MongoDB, Firebase, Kubernetes, Terraform (Currently learning), Docker, CI/CD

Languages & Frameworks

Python, SQL, REST APIs, Jupyter Notebooks, Pandas, PySpark, TypeScript

Data Visualization & BI

Apache Superset, Looker, Data Modeling, Pandas, Tableau, PowerBI

Analytics & Monitoring

Google Analytics, Google Tag Manager, SEO, A/B Testing

Professional Experience

Lead Data Engineer

Digital Turbine, May 2020 - November 2024
  • Optimized GCS, Composer, and BigQuery ETL pipelines by refactoring legacy workflows, reducing processing time, cutting data costs by over $100K, and enhancing scalability across 20+ content products.
  • Developed and optimized batch data pipelines using Apache Airflow, Spark, SQL, and Python, supporting 10M daily active users and over 3B monthly ad impressions. Experimented with Flink for enhanced real-time streaming performance.
  • Integrated Looker with BigQuery and other data sources to create interactive dashboards, improving data accessibility for stakeholders.
  • Developed API-based data ingestion pipelines that improved ETL efficiency and reduced both processing time and data maintenance complexity.
  • Maintained Databricks workflows, working with notebooks to troubleshoot Spark-based pipeline issues and ensure 100% uptime for critical data operations.

Associate Lead Analyst

Booz Allen Hamilton, March 2015 - May 2020
  • Managed analytics for multiple HHS and NIH government websites under the Digital Analytics Program (DAP), driving performance and user engagement improvements through SEO audits, A/B testing, and goal funneling strategies.
  • Developed and managed marketing tag strategies using Google Analytics and Google Tag Manager, ensuring 100% data accuracy and alignment with client objectives.
  • Spearheaded the development of data warehousing solutions to understand key trends, enabling data-driven decision-making and actionable insights.
  • Regularly analyzed website metrics and delivered comprehensive analytics reports that shaped and enhanced client strategies, aligning with organizational goals.

Data Analyst

The American Chemical Society, December 2010 - May 2015
  • Oversaw daily operations of the ACS Web Stats System, supporting marketing and sales analytics. Created monthly and annual reports to deliver strategic insights for editorial and marketing teams.
  • Analyzed ad performance across Google Search, YouTube, Google Analytics, Business Objects, and external platforms to optimize campaign effectiveness.
  • Implemented metrics dashboards with Tableau and Google Data Studio for real-time web traffic monitoring.
  • Automated monthly and quarterly analytics reporting, reducing manual effort by 50%.
  • Migrated analytics infrastructure from legacy Google Analytics to Universal Analytics, ensuring seamless tracking and improved reporting.

Education & Certifications

B.S.B.A. in Business Administration

Western New England University

GCP (In Progress), GA: Data Sci, GA: Full Stack

Featured Projects

Explore my data engineering projects and technical solutions

Stock Market ETL Pipeline

Designed and implemented an automated ETL pipeline to ingest, transform, and store AAPL stock data using Apache Airflow, Apache Spark, and MinIO, ensuring efficient data integration and real-time processing.

Apache Airflow, Apache Spark, MinIO, Python, Parquet

BigQuery ETL Pipelines for Digital Turbine

Designed and optimized BigQuery ETL pipelines for scalable, high-performance data processing, supporting over 10M daily users and enabling analytics for 3B+ monthly ad impressions.

Google BigQuery, Apache Airflow, Apache Flink, Apache Spark, Python

English Premier League Match Outcome Prediction

Developed a comprehensive data pipeline to scrape, process, and analyze English Premier League match data, employing machine learning techniques to predict match outcomes.

Python, Scikit-Learn, Pandas, Web Scraping, Machine Learning

Real-Time Subscription and Revenue Analytics

Built an end-to-end real-time data pipeline for tracking subscription metrics, revenue trends, and customer churn using Kafka, BigQuery, SQLMesh, and Preset.

Apache Kafka, Google BigQuery, SQLMesh, Preset, Python

Real-Time Flight Analytics

Developing an end-to-end streaming and batch pipeline to analyze flight trends, airline performance, and visitor influx in Puerto Rico using Kafka, Flink, Spark, and Airflow.

Apache Kafka, Apache Flink, Apache Spark, Apache Airflow, Apache Superset

Streaming and Batch Experiments

Designing real-time streaming and batch workflows for soccer match data using Kafka, Flink, Spark, and Airflow to enable high-velocity data ingestion, transformation, and analytics at scale.

Apache Kafka, Apache Flink, Apache Spark, Apache Airflow, Confluent

Wine Review Rating Prediction

Developed a data pipeline to preprocess and analyze wine reviews, using machine learning models to predict ratings based on price, region, and variety, achieving an RMSE of 2.3.

Python, Pandas, Scikit-Learn, Machine Learning, Data Preprocessing

Data Engineering Consultancy

Turn your complex data challenges into strategic business advantages

Data Solutions

With nearly 15 years of experience in data, I help businesses build scalable, efficient data pipelines and infrastructure that deliver real value and solve complex data challenges.

Core Technologies

Python, SQL, Apache Airflow, Apache Kafka, Apache Spark, SQLMesh, Apache Flink, Apache Superset, Looker, Apache Iceberg, Nessie, Google Analytics, Google Tag Manager

Ready to transform your data infrastructure?

Let's discuss how my expertise can help your organization build scalable, efficient solutions.

Get in Touch

Data Pipeline Architecture

Custom-designed batch and streaming data pipelines that scale with your business needs and optimize for cost efficiency.

Apache Airflow, Apache Kafka, Apache Spark, SQLMesh, Apache Flink

Data Infrastructure Optimization

Refactoring and optimizing existing data workflows to reduce costs, improve reliability, and enhance performance.

Python, SQL, Apache Iceberg, Nessie, Cloud Optimization

Analytics & Visualization

Implementation of comprehensive analytics solutions and interactive dashboards to unlock actionable insights.

Apache Superset, Looker, Google Analytics, Google Tag Manager

The Inner Join
Build Scalable Data Pipelines.

Less NULLs. More value.

Why Streaming ETL Is the Future — and How to Get Started

Streaming ETL is no longer a niche; it's the foundation of real-time, event-driven systems. In this post, I break down when to use streaming pipelines and how Kafka and Flink fit together, then walk through a real-world example.

More Stories

What is ETL in 2025? Moving Beyond Extract, Transform, Load

ETL has evolved—fast. Here's a clear, thoughtful guide on modern ETL vs. ELT, highlighting real-world use cases, tooling insights, and best practices for data engineers.

From Pipelines to Purpose: Why I’m Sharing My Journey in Data Engineering

A senior data engineer's story of building real-time and batch pipelines—and why I'm sharing my journey to land my dream role.

Let's Connect

Interested in working together? I'm always open to discussing new projects, creative ideas or opportunities to be part of your vision.

Get In Touch

Feel free to reach out for collaborations or just a friendly hello

Connect with me

Currently

Location: Washington, DC Metro Area

Availability: Open to Work

Looking for: Full-time opportunities