Hi, I'm Nitish Songara, a Data Engineer with 4+ years of experience building scalable ETL pipelines and cloud-based data solutions at Deloitte and Ascendeum. I specialize in Python, PySpark, SQL, MongoDB, and cloud platforms (Azure, AWS), delivering high-performance systems that process millions of events and power data-driven decision-making.

I'm actively pursuing freelancing opportunities to collaborate on challenging data engineering projects worldwide, whether it's architecting pipelines, optimizing databases, or building custom analytics platforms.

technical skills.

< programming_languages_and_frameworks />

  • Python (PySpark, pandas)
  • SQL (Query Optimization, Complex Queries, Databricks SQL)
  • Java (Basic to Intermediate)
  • Databricks Delta Lake

< cloud_platforms_and_data_orchestration />

  • Microsoft Azure (Azure Data Factory; DP-203, DP-300, AZ-900 certified)
  • Amazon Web Services (S3, Lambda, EC2 basics)
  • Apache Airflow (Workflow Orchestration & Scheduling)
  • Data Pipeline Orchestration & Automation

< data_storage_and_modeling />

  • MongoDB (Indexing, Sharding, Performance Tuning)
  • SQL Server (Performance Optimization)
  • Data Modeling & Schema Design
  • Delta Lake (scalable storage management)

< data_engineering_operations />

  • ETL/ELT Pipeline Development
  • Data Quality Checks & Monitoring
  • Debugging & Troubleshooting Data Pipelines
  • Workflow Migration (Pandas to PySpark)

< business_intelligence_and_analytics />

  • Power BI Dashboarding & Visualization
  • KPI Derivation & Performance Metric Creation
  • Data-driven Decision Support

< additional_skills />

  • Data Structures & Algorithms
  • AI Tools for Development Automation & Speed
  • Modular, Maintainable Pipeline Design
  • Collaborative Project Management & Version Control (Git)

< communication_and_consulting_skills />

  • Technical documentation
  • Client requirement gathering
  • Presentation of analytics insights

featured projects.

Bathroom Pricing Engine

  • AI-powered pricing engine that converts natural language bathroom renovation descriptions into detailed cost estimates with materials, labor, margins, and VAT calculations
  • Parses free-form project descriptions using OpenRouter API and generates structured JSON quotes with comprehensive cost breakdowns
  • Built modular pricing logic with separate modules for material database management (1300+ materials), labor calculations with difficulty-based rates, and multi-country VAT rules (a simplified version is sketched below)
  • Implements intelligent fallback systems with API integration for missing materials and regional pricing adjustments across major French cities
  • Features automated testing suite and handles edge cases including unknown materials, invalid inputs, and API failures with graceful degradation
  • Designed for future enhancement with a local AI model (targeting 10-50x faster processing), vendor comparisons, and budget-optimized recommendations
Python · OpenRouter API · JSON · Natural Language Processing · RESTful API Integration
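
A minimal sketch of that pricing flow, assuming parsing has already happened: `MATERIALS`, `LABOR_RATES`, and `VAT_RATES` below are hypothetical stand-ins for the real 1300+ item material database, difficulty-based labor rates, and multi-country VAT rules (the production engine first parses free-form text via the OpenRouter API).

```python
# Hypothetical, simplified sketch of the pricing flow: parsed tasks -> costed JSON quote.
import json

MATERIALS = {"ceramic_tile_m2": 24.50, "wall_paint_l": 8.90}  # stand-in for the 1300+ item DB
LABOR_RATES = {"easy": 35.0, "medium": 45.0, "hard": 60.0}    # EUR/hour by difficulty
VAT_RATES = {"FR": 0.20, "DE": 0.19}                          # simplified multi-country VAT

def price_quote(tasks, country="FR", margin=0.15):
    """Turn parsed tasks into a quote with materials, labor, margin, and VAT."""
    lines = []
    for t in tasks:
        # .get() with a default is a trivial stand-in for the real missing-material fallback
        materials = sum(MATERIALS.get(m, 0.0) * qty for m, qty in t["materials"].items())
        labor = LABOR_RATES[t["difficulty"]] * t["hours"]
        lines.append({"task": t["name"], "materials": materials, "labor": labor})
    subtotal = sum(l["materials"] + l["labor"] for l in lines) * (1 + margin)
    vat = subtotal * VAT_RATES[country]
    return {"lines": lines, "subtotal": round(subtotal, 2),
            "vat": round(vat, 2), "total": round(subtotal + vat, 2)}

tasks = [{"name": "Tile shower walls", "difficulty": "medium", "hours": 6,
          "materials": {"ceramic_tile_m2": 8}}]
print(json.dumps(price_quote(tasks), indent=2))
```
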
  • Multi-site web scraping system that extracts bathroom product data (pricing, specs, images) from e-commerce platforms with automated anti-bot bypass mechanisms
  • Implemented dual scraping approach: lightweight requests-based scraper for Castorama and Selenium-based automation with undetected Chromium for ManoMano's Cloudflare protection
  • Built FastAPI REST server with pagination, fuzzy search, price filtering, and category/brand endpoints, serving scraped data with auto-generated API documentation
  • Engineered comprehensive technical specifications parser handling multiple HTML formats (tables, lists, nested structures) with smart data cleaning and fallback mechanisms
  • Designed modular YAML-based configuration system for CSS selectors, anti-bot settings (rotating user agents, random delays), and scalable pagination control (sketched below)
  • Architected for future enhancements including multithreading with worker thread isolation, VPN integration for geo-blocked sites, and ML-based captcha solving
Python · BeautifulSoup · Selenium · FastAPI · Undetected Chromium · YAML · REST API · Pytest
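
A sketch of the lightweight, requests-based half of that design, with illustrative YAML: CSS selectors and anti-bot settings (rotating user agents, random delays) live in configuration, so onboarding a new site is a config change rather than a code change. The URLs and selectors here are placeholders, not the real configuration.

```python
# Hypothetical sketch of the config-driven scraper.
import random
import time

import requests
import yaml
from bs4 import BeautifulSoup

CONFIG = yaml.safe_load("""
castorama:  # illustrative URL and selectors, not the real configuration
  list_url: "https://www.castorama.fr/search?q=baignoire&page={page}"
  selectors: {card: "div.product-card", name: "h2", price: "span.price"}
  anti_bot:
    user_agents: ["Mozilla/5.0 (X11; Linux x86_64)", "Mozilla/5.0 (Windows NT 10.0)"]
    delay_range: [1.5, 4.0]
""")

def scrape(site, page=1):
    cfg = CONFIG[site]
    ab = cfg["anti_bot"]
    time.sleep(random.uniform(*ab["delay_range"]))              # random delay between requests
    headers = {"User-Agent": random.choice(ab["user_agents"])}  # rotating user agents
    html = requests.get(cfg["list_url"].format(page=page), headers=headers, timeout=30).text
    sel = cfg["selectors"]
    return [{"name": card.select_one(sel["name"]).get_text(strip=True),
             "price": card.select_one(sel["price"]).get_text(strip=True)}
            for card in BeautifulSoup(html, "html.parser").select(sel["card"])
            if card.select_one(sel["name"]) and card.select_one(sel["price"])]
```
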
  • Automated data extraction pipeline for European electricity spot prices from ENTSO-E Transparency Platform API, supporting both daily and historical bulk data collection
  • Built dual operation modes: daily automated extraction for real-time market monitoring and historical data loader for large-scale backtesting and analysis with batch processing
  • Integrated PostgreSQL database with structured schema for efficient storage of hourly electricity prices across multiple European market domains (bidding zones)
  • Implemented robust error handling with retry logic, API rate limiting compliance, and data validation to ensure reliable extraction from ENTSO-E's market transparency platform (retry/backoff sketched below)
  • Designed for energy trading analytics, market research, and data science applications with configurable domain codes covering all European electricity markets
  • Features comprehensive logging, performance optimizations for large datasets, and production-ready security considerations for API key management and database connections
Python · PostgreSQL · ENTSO-E API · REST API Integration · Batch Processing · Data Validation · Error Handling
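
A sketch of that retry and rate-limit handling, assuming the ENTSO-E day-ahead price query (documentType A44); parameter names should be checked against the official API documentation, and the XML parsing and PostgreSQL load steps are omitted.

```python
# Sketch of the retry/backoff wrapper around the ENTSO-E Transparency Platform API.
import os
import time

import requests

BASE_URL = "https://web-api.tp.entsoe.eu/api"

def fetch_prices(domain, period_start, period_end, max_retries=5):
    """Fetch hourly day-ahead prices for one bidding zone, retrying with backoff."""
    params = {
        "securityToken": os.environ["ENTSOE_API_KEY"],  # key from the environment, never hard-coded
        "documentType": "A44",                          # day-ahead prices
        "in_Domain": domain, "out_Domain": domain,      # bidding zone (EIC code)
        "periodStart": period_start, "periodEnd": period_end,  # yyyymmddHHMM
    }
    for attempt in range(max_retries):
        resp = requests.get(BASE_URL, params=params, timeout=60)
        if resp.status_code == 429:        # rate limited: back off exponentially, then retry
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()
        return resp.text                   # XML payload, parsed and loaded downstream
    raise RuntimeError(f"gave up after {max_retries} attempts for {domain}")
```
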
  • Desktop dashboard application with Tkinter GUI for real-time aggregation and AI-powered sentiment analysis of Indian news articles from 10+ major national sources
  • Built multi-threaded web scraper supporting both HTML and RSS parsing for leading outlets including Hindustan Times, Indian Express, Times of India, The Hindu, News18, and Zee News (the threading pattern is sketched below)
  • Integrated Google Gemini API for automated sentiment classification (Positive/Negative/Neutral) of news headlines and article content with customizable keyword filtering
  • Implemented SQLite database for persistent local archiving, enabling historical trend analysis and research with example SQL analytics for data querying
  • Engineered robust error handling for network failures, parsing errors, and database exceptions with fallback selectors for reliable data extraction across diverse site structures
  • Designed for media monitoring, political sentiment tracking, and news analytics with configurable search terms and privacy-focused local data storage
Python · Tkinter · BeautifulSoup · Google Gemini API · SQLite · Multi-threading · RSS Parsing · Sentiment Analysis
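
A sketch of the threads-plus-SQLite pattern, with an illustrative feed list and schema; the Gemini sentiment call is stubbed out, fetching runs in worker threads, and all database writes stay on the main thread.

```python
# Hypothetical sketch of the multi-threaded RSS collector with SQLite archiving.
import sqlite3
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

import requests

FEEDS = ["https://www.thehindu.com/feeder/default.rss"]  # one of the 10+ configured sources

def fetch_items(url):
    root = ET.fromstring(requests.get(url, timeout=30).content)
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]

def main():
    db = sqlite3.connect("news.db")
    db.execute("""CREATE TABLE IF NOT EXISTS articles
                  (title TEXT, link TEXT UNIQUE, sentiment TEXT)""")
    with ThreadPoolExecutor(max_workers=8) as pool:      # fetch feeds concurrently
        for items in pool.map(fetch_items, FEEDS):
            for title, link in items:
                sentiment = "Neutral"                    # stub: the real app calls Gemini here
                db.execute("INSERT OR IGNORE INTO articles VALUES (?, ?, ?)",
                           (title, link, sentiment))
    db.commit()

if __name__ == "__main__":
    main()
```
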
  • End-to-end batch data pipeline for aggregating historical and daily stock prices, financial metrics, and market news from NSE/BSE APIs, screener.in, and financial news sources for comprehensive market analysis
  • Built automated ETL workflows using Azure Data Factory and Airflow for scheduled data extraction, PySpark transformations for processing large-scale stock data, and PostgreSQL/MongoDB for structured storage
  • Implemented technical indicator calculation engine (moving averages, RSI, MACD, Bollinger Bands) and sentiment analysis on financial news to generate actionable trading signals and screening criteria (indicator sketch below)
  • Designed scalable batch processing pipeline handling 5000+ stocks daily with Delta Lake for data versioning, quality checks, and Power BI dashboards for market trend visualization and analysis
  • Engineered small-cap stock screening system with custom filters for undervalued stocks based on P/E ratios, volume patterns, and financial metrics aligned with swing trading strategies
  • Features comprehensive backtesting framework with historical data analysis, ROI calculation engine, portfolio simulation, and performance tracking for evaluating trading strategies
Python · PySpark · Azure Data Factory · Apache Airflow · MongoDB · PostgreSQL · Delta Lake · Power BI · Yahoo Finance API · Sentiment Analysis · Batch Processing
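A sketch of how two of those indicators can be computed with PySpark window functions: a 20-day simple moving average and a simplified 14-day RSI (plain rolling averages rather than Wilder smoothing). The input path and column names are illustrative.

```python
# Sketch of the indicator engine: per-symbol rolling windows over daily closes.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("indicators").getOrCreate()
df = spark.read.parquet("/data/daily_prices")            # columns: symbol, date, close

by_symbol = Window.partitionBy("symbol").orderBy("date")
sma20 = by_symbol.rowsBetween(-19, 0)                    # trailing 20 rows incl. current
last14 = by_symbol.rowsBetween(-13, 0)                   # trailing 14 rows incl. current

delta = F.col("close") - F.lag("close").over(by_symbol)  # day-over-day change
df = (df
      .withColumn("sma_20", F.avg("close").over(sma20))
      .withColumn("gain", F.when(delta > 0, delta).otherwise(F.lit(0.0)))
      .withColumn("loss", F.when(delta < 0, -delta).otherwise(F.lit(0.0)))
      .withColumn("rs", F.avg("gain").over(last14) / F.avg("loss").over(last14))
      .withColumn("rsi_14", 100 - 100 / (1 + F.col("rs"))))
```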

work experience.

Associate Data Engineer

Ascendeum

June 2025 - Present · Remote
  • Automated end-to-end ETL pipelines processing 500M+ advertisement telemetry events per month, extracting data from diverse sources including CSV, Parquet, GZ files, Excel spreadsheets, email attachments, and RESTful APIs
  • Transformed raw data using pandas and PySpark frameworks before loading into MongoDB, powering real-time ad analytics dashboards for 50+ business stakeholders
  • Built a dynamic SQL query creation engine for in-house dashboarding software that automatically generates optimized queries based on user-selected columns and KPIs, with results powering interactive chart visualizations
  • Implemented SQLGlot to make the query engine database-agnostic, enabling seamless addition of SQL databases beyond MySQL with minimal code changes and improving platform scalability (sketched below)
  • Improved MongoDB performance through advanced indexing and sharding strategies, achieving 60% boost in ingestion and read speeds and reducing dashboard latency from 30 seconds to under 10 seconds
  • Migrated legacy pandas-based workflows to PySpark, delivering 5x performance improvement on 200GB+ daily datasets and enabling efficient processing of terabyte-scale data volumes
  • Derived actionable KPIs from raw event data to create performance metrics, enabling clients to track campaign outcomes and optimize strategies in real time
  • Implemented automated data quality checks and proactive monitoring systems, ensuring 99.9% pipeline reliability and data accuracy across all workflows
  • Collaborated with business teams to design data-driven advertising strategies, resulting in 12-15% increase in client ROI
Python · pandas · PySpark · SQL · MongoDB · AWS S3 · ETL/ELT Pipeline Development · API Integration · Email Processing · Database Sharding · Performance Optimization · Data Quality Automation
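A minimal illustration of the SQLGlot portability layer: the engine emits one canonical query, and `sqlglot.transpile` re-renders it for whichever dialect the connected database speaks. The query itself is a made-up example.

```python
# One generated query, rendered for several SQL dialects via SQLGlot.
import sqlglot

generated = (
    "SELECT campaign, SUM(impressions) AS impressions "
    "FROM ad_events GROUP BY campaign ORDER BY impressions DESC LIMIT 10"
)

for dialect in ("mysql", "postgres", "tsql"):
    print(f"{dialect}: {sqlglot.transpile(generated, read='mysql', write=dialect)[0]}")
```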

Data Engineer

Deloitte

August 2021 - May 2025 · Bengaluru, India
  • Designed and maintained end-to-end Azure Data Factory and Databricks pipelines, processing 20M+ records weekly (~1TB/month) from 10+ diverse data sources
  • Built modular, reusable PySpark and SQL frameworks that reduced pipeline development time by 30% and significantly improved code maintainability across the engineering team
  • Developed optimized SQL models and curated database views that powered Power BI dashboards for 200+ business users across the organization
  • Implemented advanced SQL Server optimization routines including indexing strategies, query refactoring, and execution plan analysis, reducing reporting query times by 20–25%
  • Improved query performance by 40% through database optimization techniques, saving 300+ engineering hours annually and accelerating executive decision-making cycles
  • Enhanced pipeline reliability to 99.9% uptime by implementing automated data quality checks, proactive monitoring systems, and robust schema evolution handling
  • Worked with Delta Lake on Databricks for efficient data storage and versioning, enabling ACID transactions on large-scale datasets (an upsert pattern is sketched below)
  • Collaborated with cross-functional teams to translate business requirements into scalable data solutions, supporting strategic decision-making across multiple business units
  • Completed MBA in Finance alongside full-time role, applying financial acumen to data engineering projects for ROI analysis and cost optimization initiatives
Azure Data Factory · Databricks · PySpark · SQL Server · Delta Lake · Power BI · ETL/ELT Pipeline Development · Database Optimization · Data Quality Automation · Pipeline Monitoring · Business Requirements Translation · Cost Optimization
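A sketch of the Delta Lake upsert pattern behind those ACID guarantees, with illustrative paths and join keys (on Databricks the `spark` session already exists; elsewhere the delta-spark package must be configured):

```python
# Hypothetical Delta Lake MERGE (upsert) plus a peek at the version history.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
updates = spark.read.parquet("/mnt/raw/customers_daily")   # incoming daily batch

target = DeltaTable.forPath(spark, "/mnt/curated/customers")
(target.alias("t")
       .merge(updates.alias("u"), "t.customer_id = u.customer_id")
       .whenMatchedUpdateAll()                             # update existing rows
       .whenNotMatchedInsertAll()                          # insert new rows
       .execute())                                         # one ACID transaction

# Every write creates a new, queryable table version (time travel / auditing).
target.history(5).select("version", "operation", "timestamp").show()
```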

education.

M.B.A in Finance

2021 - 2023

Laxmi Narayan College of Technology, Bhopal

Score: 68%

B.E in Information Technology

2017 - 2021

Institute of Engineering and Technology, Devi Ahilya University, Indore

Score: 68%

certifications.

Microsoft Certified: Azure Data Engineer Associate

Feb 2023

Microsoft

Microsoft Certified: Azure Database Administrator Associate

May 2023

Microsoft

Microsoft Certified: Azure Data Fundamentals

Jun 2023

Microsoft


let's connect.

I'd love to hear from you! Whether you have a question, want to discuss a project, or just want to say hi, feel free to reach out using the form below.