Hi, I'm Nitish Songara, a Data Engineer with 4+ years of experience building scalable ETL pipelines and cloud-based data solutions at Deloitte and Ascendeum. I specialize in Python, PySpark, SQL, MongoDB, and cloud platforms (Azure, AWS), delivering high-performance systems that process millions of events and power data-driven decision-making.

I'm actively pursuing freelancing opportunities to collaborate on challenging data engineering projects worldwide, whether it's architecting pipelines, optimizing databases, or building custom analytics platforms.

technical skills.

< programming_languages_and_frameworks />

  • Python (PySpark, pandas)
  • SQL (Query Optimization, Complex Queries, Databricks SQL)
  • Java (Basic to Intermediate)
  • Databricks Delta Lake

< cloud_platforms_and_data_orchestration />

  • Microsoft Azure (Azure Data Factory; DP-203, DP-300, AZ-900 certified)
  • Amazon Web Services (S3, Lambda, EC2 basics)
  • Apache Airflow (Workflow Orchestration & Scheduling)
  • Data Pipeline Orchestration & Automation

< data_storage_and_modeling />

  • MongoDB (Indexing, Sharding, Performance Tuning)
  • SQL Server (Performance Optimization)
  • Data Modeling & Schema Design
  • Delta Lake (scalable storage management)

< data_engineering_operations />

  • ETL/ELT Pipeline Development
  • Data Quality Checks & Monitoring
  • Debugging & Troubleshooting Data Pipelines
  • Workflow Migration (Pandas to PySpark)

< business_intelligence_and_analytics />

  • Power BI Dashboarding & Visualization
  • KPI Derivation & Performance Metric Creation
  • Data-driven Decision Support

< additional_skills />

  • Data Structures & Algorithms
  • AI Tools for Development Automation & Speed
  • Modular, Maintainable Pipeline Design
  • Collaborative Project Management & Version Control (Git)

< communication_and_consulting_skills />

  • Technical documentation
  • Client requirement gathering
  • Presentation of analytics insights

featured projects.

Bathroom Pricing Engine

  • AI-powered pricing engine that converts natural language bathroom renovation descriptions into detailed cost estimates with materials, labor, margins, and VAT calculations
  • Parses free-form project descriptions using OpenRouter API and generates structured JSON quotes with comprehensive cost breakdowns
  • Built modular pricing logic with separate modules for material database management (1300+ materials), labor calculations with difficulty-based rates, and multi-country VAT rules (a simplified version is sketched below)
  • Implements intelligent fallback systems with API integration for missing materials and regional pricing adjustments across major French cities
  • Features automated testing suite and handles edge cases including unknown materials, invalid inputs, and API failures with graceful degradation
  • Designed for future enhancement with a local AI model (targeting 10-50x faster processing), vendor comparisons, and budget-optimized recommendations
Python · OpenRouter API · JSON · Natural Language Processing · RESTful API Integration
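
A minimal sketch of that pricing flow, assuming parsing has already happened: `MATERIALS`, `LABOR_RATES`, and `VAT_RATES` below are hypothetical stand-ins for the real 1300+ item material database, difficulty-based labor rates, and multi-country VAT rules (the production engine first parses free-form text via the OpenRouter API).

```python
# Hypothetical, simplified sketch of the pricing flow: parsed tasks -> costed JSON quote.
import json

MATERIALS = {"ceramic_tile_m2": 24.50, "wall_paint_l": 8.90}  # stand-in for the 1300+ item DB
LABOR_RATES = {"easy": 35.0, "medium": 45.0, "hard": 60.0}    # EUR/hour by difficulty
VAT_RATES = {"FR": 0.20, "DE": 0.19}                          # simplified multi-country VAT

def price_quote(tasks, country="FR", margin=0.15):
    """Turn parsed tasks into a quote with materials, labor, margin, and VAT."""
    lines = []
    for t in tasks:
        # .get() with a default is a trivial stand-in for the real missing-material fallback
        materials = sum(MATERIALS.get(m, 0.0) * qty for m, qty in t["materials"].items())
        labor = LABOR_RATES[t["difficulty"]] * t["hours"]
        lines.append({"task": t["name"], "materials": materials, "labor": labor})
    subtotal = sum(l["materials"] + l["labor"] for l in lines) * (1 + margin)
    vat = subtotal * VAT_RATES[country]
    return {"lines": lines, "subtotal": round(subtotal, 2),
            "vat": round(vat, 2), "total": round(subtotal + vat, 2)}

tasks = [{"name": "Tile shower walls", "difficulty": "medium", "hours": 6,
          "materials": {"ceramic_tile_m2": 8}}]
print(json.dumps(price_quote(tasks), indent=2))
```
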
  • Multi-site web scraping system that extracts bathroom product data (pricing, specs, images) from e-commerce platforms with automated anti-bot bypass mechanisms
  • Implemented dual scraping approach: lightweight requests-based scraper for Castorama and Selenium-based automation with undetected Chromium for ManoMano's Cloudflare protection
  • Built FastAPI REST server with pagination, fuzzy search, price filtering, and category/brand endpoints, serving scraped data with auto-generated API documentation
  • Engineered comprehensive technical specifications parser handling multiple HTML formats (tables, lists, nested structures) with smart data cleaning and fallback mechanisms
  • Designed modular YAML-based configuration system for CSS selectors, anti-bot settings (rotating user agents, random delays), and scalable pagination control (sketched below)
  • Architected for future enhancements including multithreading with worker thread isolation, VPN integration for geo-blocked sites, and ML-based captcha solving
Python · BeautifulSoup · Selenium · FastAPI · Undetected Chromium · YAML · REST API · Pytest
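
A sketch of the lightweight, requests-based half of that design, with illustrative YAML: CSS selectors and anti-bot settings (rotating user agents, random delays) live in configuration, so onboarding a new site is a config change rather than a code change. The URLs and selectors here are placeholders, not the real configuration.

```python
# Hypothetical sketch of the config-driven scraper.
import random
import time

import requests
import yaml
from bs4 import BeautifulSoup

CONFIG = yaml.safe_load("""
castorama:  # illustrative URL and selectors, not the real configuration
  list_url: "https://www.castorama.fr/search?q=baignoire&page={page}"
  selectors: {card: "div.product-card", name: "h2", price: "span.price"}
  anti_bot:
    user_agents: ["Mozilla/5.0 (X11; Linux x86_64)", "Mozilla/5.0 (Windows NT 10.0)"]
    delay_range: [1.5, 4.0]
""")

def scrape(site, page=1):
    cfg = CONFIG[site]
    ab = cfg["anti_bot"]
    time.sleep(random.uniform(*ab["delay_range"]))              # random delay between requests
    headers = {"User-Agent": random.choice(ab["user_agents"])}  # rotating user agents
    html = requests.get(cfg["list_url"].format(page=page), headers=headers, timeout=30).text
    sel = cfg["selectors"]
    return [{"name": card.select_one(sel["name"]).get_text(strip=True),
             "price": card.select_one(sel["price"]).get_text(strip=True)}
            for card in BeautifulSoup(html, "html.parser").select(sel["card"])
            if card.select_one(sel["name"]) and card.select_one(sel["price"])]
```
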
  • Automated data extraction pipeline for European electricity spot prices from ENTSO-E Transparency Platform API, supporting both daily and historical bulk data collection
  • Built dual operation modes: daily automated extraction for real-time market monitoring and historical data loader for large-scale backtesting and analysis with batch processing
  • Integrated PostgreSQL database with structured schema for efficient storage of hourly electricity prices across multiple European market domains (bidding zones)
  • Implemented robust error handling with retry logic, API rate limiting compliance, and data validation to ensure reliable extraction from ENTSO-E's market transparency platform (retry/backoff sketched below)
  • Designed for energy trading analytics, market research, and data science applications with configurable domain codes covering all European electricity markets
  • Features comprehensive logging, performance optimizations for large datasets, and production-ready security considerations for API key management and database connections
Python · PostgreSQL · ENTSO-E API · REST API Integration · Batch Processing · Data Validation · Error Handling
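
A sketch of that retry and rate-limit handling, assuming the ENTSO-E day-ahead price query (documentType A44); parameter names should be checked against the official API documentation, and the XML parsing and PostgreSQL load steps are omitted.

```python
# Sketch of the retry/backoff wrapper around the ENTSO-E Transparency Platform API.
import os
import time

import requests

BASE_URL = "https://web-api.tp.entsoe.eu/api"

def fetch_prices(domain, period_start, period_end, max_retries=5):
    """Fetch hourly day-ahead prices for one bidding zone, retrying with backoff."""
    params = {
        "securityToken": os.environ["ENTSOE_API_KEY"],  # key from the environment, never hard-coded
        "documentType": "A44",                          # day-ahead prices
        "in_Domain": domain, "out_Domain": domain,      # bidding zone (EIC code)
        "periodStart": period_start, "periodEnd": period_end,  # yyyymmddHHMM
    }
    for attempt in range(max_retries):
        resp = requests.get(BASE_URL, params=params, timeout=60)
        if resp.status_code == 429:        # rate limited: back off exponentially, then retry
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()
        return resp.text                   # XML payload, parsed and loaded downstream
    raise RuntimeError(f"gave up after {max_retries} attempts for {domain}")
```
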
  • Desktop dashboard application with Tkinter GUI for real-time aggregation and AI-powered sentiment analysis of Indian news articles from 10+ major national sources
  • Built multi-threaded web scraper supporting both HTML and RSS parsing for leading outlets including Hindustan Times, Indian Express, Times of India, The Hindu, News18, and Zee News (the threading pattern is sketched below)
  • Integrated Google Gemini API for automated sentiment classification (Positive/Negative/Neutral) of news headlines and article content with customizable keyword filtering
  • Implemented SQLite database for persistent local archiving, enabling historical trend analysis and research with example SQL analytics for data querying
  • Engineered robust error handling for network failures, parsing errors, and database exceptions with fallback selectors for reliable data extraction across diverse site structures
  • Designed for media monitoring, political sentiment tracking, and news analytics with configurable search terms and privacy-focused local data storage
Python · Tkinter · BeautifulSoup · Google Gemini API · SQLite · Multi-threading · RSS Parsing · Sentiment Analysis
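
A sketch of the threads-plus-SQLite pattern, with an illustrative feed list and schema; the Gemini sentiment call is stubbed out, fetching runs in worker threads, and all database writes stay on the main thread.

```python
# Hypothetical sketch of the multi-threaded RSS collector with SQLite archiving.
import sqlite3
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

import requests

FEEDS = ["https://www.thehindu.com/feeder/default.rss"]  # one of the 10+ configured sources

def fetch_items(url):
    root = ET.fromstring(requests.get(url, timeout=30).content)
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]

def main():
    db = sqlite3.connect("news.db")
    db.execute("""CREATE TABLE IF NOT EXISTS articles
                  (title TEXT, link TEXT UNIQUE, sentiment TEXT)""")
    with ThreadPoolExecutor(max_workers=8) as pool:      # fetch feeds concurrently
        for items in pool.map(fetch_items, FEEDS):
            for title, link in items:
                sentiment = "Neutral"                    # stub: the real app calls Gemini here
                db.execute("INSERT OR IGNORE INTO articles VALUES (?, ?, ?)",
                           (title, link, sentiment))
    db.commit()

if __name__ == "__main__":
    main()
```
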
  • End-to-end batch data pipeline for aggregating historical and daily stock prices, financial metrics, and market news from NSE/BSE APIs, screener.in, and financial news sources for comprehensive market analysis
  • Built automated ETL workflows using Azure Data Factory and Airflow for scheduled data extraction, PySpark transformations for processing large-scale stock data, and PostgreSQL/MongoDB for structured storage
  • Implemented technical indicator calculation engine (moving averages, RSI, MACD, Bollinger Bands) and sentiment analysis on financial news to generate actionable trading signals and screening criteria (indicator sketch below)
  • Designed scalable batch processing pipeline handling 5000+ stocks daily with Delta Lake for data versioning, quality checks, and Power BI dashboards for market trend visualization and analysis
  • Engineered small-cap stock screening system with custom filters for undervalued stocks based on P/E ratios, volume patterns, and financial metrics aligned with swing trading strategies
  • Features comprehensive backtesting framework with historical data analysis, ROI calculation engine, portfolio simulation, and performance tracking for evaluating trading strategies
Python · PySpark · Azure Data Factory · Apache Airflow · MongoDB · PostgreSQL · Delta Lake · Power BI · Yahoo Finance API · Sentiment Analysis · Batch Processing
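A sketch of how two of those indicators can be computed with PySpark window functions: a 20-day simple moving average and a simplified 14-day RSI (plain rolling averages rather than Wilder smoothing). The input path and column names are illustrative.

```python
# Sketch of the indicator engine: per-symbol rolling windows over daily closes.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("indicators").getOrCreate()
df = spark.read.parquet("/data/daily_prices")            # columns: symbol, date, close

by_symbol = Window.partitionBy("symbol").orderBy("date")
sma20 = by_symbol.rowsBetween(-19, 0)                    # trailing 20 rows incl. current
last14 = by_symbol.rowsBetween(-13, 0)                   # trailing 14 rows incl. current

delta = F.col("close") - F.lag("close").over(by_symbol)  # day-over-day change
df = (df
      .withColumn("sma_20", F.avg("close").over(sma20))
      .withColumn("gain", F.when(delta > 0, delta).otherwise(F.lit(0.0)))
      .withColumn("loss", F.when(delta < 0, -delta).otherwise(F.lit(0.0)))
      .withColumn("rs", F.avg("gain").over(last14) / F.avg("loss").over(last14))
      .withColumn("rsi_14", 100 - 100 / (1 + F.col("rs"))))
```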

work experience.

Associate Data Engineer

Ascendeum

June 2025 - Present · Remote
  • Automated end-to-end ETL pipelines processing 500M+ advertisement telemetry events per month, extracting data from diverse sources including CSV, Parquet, GZ files, Excel spreadsheets, email attachments, and RESTful APIs
  • Transformed raw data using pandas and PySpark frameworks before loading into MongoDB, powering real-time ad analytics dashboards for 50+ business stakeholders
  • Built a dynamic SQL query creation engine for in-house dashboarding software that automatically generates optimized queries based on user-selected columns and KPIs, with results powering interactive chart visualizations
  • Implemented SQLGlot to make the query engine database-agnostic, enabling seamless addition of SQL databases beyond MySQL with minimal code changes and improving platform scalability (sketched below)
  • Improved MongoDB performance through advanced indexing and sharding strategies, achieving 60% boost in ingestion and read speeds and reducing dashboard latency from 30 seconds to under 10 seconds
  • Migrated legacy pandas-based workflows to PySpark, delivering 5x performance improvement on 200GB+ daily datasets and enabling efficient processing of terabyte-scale data volumes
  • Derived actionable KPIs from raw event data to create performance metrics, enabling clients to track campaign outcomes and optimize strategies in real time
  • Implemented automated data quality checks and proactive monitoring systems, ensuring 99.9% pipeline reliability and data accuracy across all workflows
  • Collaborated with business teams to design data-driven advertising strategies, resulting in 12-15% increase in client ROI
Python · pandas · PySpark · SQL · MongoDB · AWS S3 · ETL/ELT Pipeline Development · API Integration · Email Processing · Database Sharding · Performance Optimization · Data Quality Automation
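A minimal illustration of the SQLGlot portability layer: the engine emits one canonical query, and `sqlglot.transpile` re-renders it for whichever dialect the connected database speaks. The query itself is a made-up example.

```python
# One generated query, rendered for several SQL dialects via SQLGlot.
import sqlglot

generated = (
    "SELECT campaign, SUM(impressions) AS impressions "
    "FROM ad_events GROUP BY campaign ORDER BY impressions DESC LIMIT 10"
)

for dialect in ("mysql", "postgres", "tsql"):
    print(f"{dialect}: {sqlglot.transpile(generated, read='mysql', write=dialect)[0]}")
```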

Data Engineer

Deloitte

August 2021 - May 2025 · Bengaluru, India
  • Designed and maintained end-to-end Azure Data Factory and Databricks pipelines, processing 20M+ records weekly (~1TB/month) from 10+ diverse data sources
  • Built modular, reusable PySpark and SQL frameworks that reduced pipeline development time by 30% and significantly improved code maintainability across the engineering team
  • Developed optimized SQL models and curated database views that powered Power BI dashboards for 200+ business users across the organization
  • Implemented advanced SQL Server optimization routines including indexing strategies, query refactoring, and execution plan analysis, reducing reporting query times by 20–25%
  • Improved query performance by 40% through database optimization techniques, saving 300+ engineering hours annually and accelerating executive decision-making cycles
  • Enhanced pipeline reliability to 99.9% uptime by implementing automated data quality checks, proactive monitoring systems, and robust schema evolution handling
  • Worked with Delta Lake on Databricks for efficient data storage and versioning, enabling ACID transactions on large-scale datasets (an upsert pattern is sketched below)
  • Collaborated with cross-functional teams to translate business requirements into scalable data solutions, supporting strategic decision-making across multiple business units
  • Completed MBA in Finance alongside full-time role, applying financial acumen to data engineering projects for ROI analysis and cost optimization initiatives
Azure Data Factory · Databricks · PySpark · SQL Server · Delta Lake · Power BI · ETL/ELT Pipeline Development · Database Optimization · Data Quality Automation · Pipeline Monitoring · Business Requirements Translation · Cost Optimization
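A sketch of the Delta Lake upsert pattern behind those ACID guarantees, with illustrative paths and join keys (on Databricks the `spark` session already exists; elsewhere the delta-spark package must be configured):

```python
# Hypothetical Delta Lake MERGE (upsert) plus a peek at the version history.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
updates = spark.read.parquet("/mnt/raw/customers_daily")   # incoming daily batch

target = DeltaTable.forPath(spark, "/mnt/curated/customers")
(target.alias("t")
       .merge(updates.alias("u"), "t.customer_id = u.customer_id")
       .whenMatchedUpdateAll()                             # update existing rows
       .whenNotMatchedInsertAll()                          # insert new rows
       .execute())                                         # one ACID transaction

# Every write creates a new, queryable table version (time travel / auditing).
target.history(5).select("version", "operation", "timestamp").show()
```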

education.

M.B.A in Finance

2021 - 2023

Laxmi Narayan College of Technology, Bhopal

Score: 68%

B.E in Information Technology

2017 - 2021

Institute of Engineering and Technology, Devi Ahilya University, Indore

Score: 68%

certifications.

Microsoft Certified: Azure Data Engineer Associate

Feb 2023

Microsoft

Microsoft Certified: Azure Database Administrator Associate

May 2023

Microsoft

Microsoft Certified: Azure Data Fundamentals

Jun 2023

Microsoft


let's connect.

I'd love to hear from you! Whether you have a question, want to discuss a project, or just want to say hi, feel free to reach out using the form below.