
Nitish Songara
Data Engineer
Hi, I'm Nitish Songara — a Data Engineer with 4+ years of experience building scalable ETL pipelines and cloud-based data solutions at Deloitte and Ascendeum. I specialize in Python, PySpark, SQL, cloud platforms (Azure, AWS), and databases such as MongoDB, delivering high-performance systems that process millions of events and power data-driven decision-making.
I'm actively pursuing freelancing opportunities to collaborate on challenging data engineering projects worldwide, whether it's architecting pipelines, optimizing databases, or building custom analytics platforms.
technical skills.
< programming_languages_and_frameworks />
< cloud_platforms_and_data_orchestration />
< data_storage_and_modeling />
< data_engineering_operations />
< business_intelligence_and_analytics />
< additional_skills />
< communication_and_consulting_skills />
featured projects.
Bathroom Pricing Engine
- AI-powered pricing engine that converts natural language bathroom renovation descriptions into detailed cost estimates with materials, labor, margins, and VAT calculations
- Parses free-form project descriptions using OpenRouter API and generates structured JSON quotes with comprehensive cost breakdowns
- Built modular pricing logic with separate modules for material database management (1300+ materials), labor calculations with difficulty-based rates, and multi-country VAT rules
- Implements intelligent fallback systems with API integration for missing materials and regional pricing adjustments across major French cities
- Features automated testing suite and handles edge cases including unknown materials, invalid inputs, and API failures with graceful degradation
- Designed for future enhancements including a local AI model for 10-50x faster processing, vendor comparisons, and budget-optimized recommendations
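The quote-assembly step above can be sketched as follows. This is a minimal, illustrative version: the VAT table, difficulty rates, and function names are assumptions for the example, not the engine's real module API.

```python
# Hypothetical sketch of quote assembly: material cost, difficulty-weighted
# labor, margin, and country-specific VAT combined into a structured quote.

VAT_RATES = {"FR": 0.20, "DE": 0.19}                # assumed VAT table
DIFFICULTY_RATES = {"easy": 35.0, "standard": 45.0, "hard": 60.0}  # EUR/hour

def build_quote(materials, labor_hours, difficulty="standard",
                country="FR", margin=0.15):
    """Return a structured cost breakdown for one renovation task."""
    material_cost = sum(m["unit_price"] * m["qty"] for m in materials)
    labor_cost = labor_hours * DIFFICULTY_RATES[difficulty]
    subtotal = (material_cost + labor_cost) * (1 + margin)
    vat = subtotal * VAT_RATES[country]
    return {
        "material_cost": round(material_cost, 2),
        "labor_cost": round(labor_cost, 2),
        "margin": round(subtotal - material_cost - labor_cost, 2),
        "vat": round(vat, 2),
        "total": round(subtotal + vat, 2),
    }

quote = build_quote(
    [{"name": "wall tile", "unit_price": 24.5, "qty": 12}],
    labor_hours=6,
)
```

In the real engine this dict would be one entry in the JSON quote, with the 1300+ material database supplying `unit_price` lookups.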
Bathroom Materials Scraper & API
- Multi-site web scraping system that extracts bathroom product data (pricing, specs, images) from e-commerce platforms with automated anti-bot bypass mechanisms
- Implemented dual scraping approach: lightweight requests-based scraper for Castorama and Selenium-based automation with undetected Chromium for ManoMano's Cloudflare protection
- Built FastAPI REST server with pagination, fuzzy search, price filtering, and category/brand endpoints, serving scraped data with auto-generated API documentation
- Engineered comprehensive technical specifications parser handling multiple HTML formats (tables, lists, nested structures) with smart data cleaning and fallback mechanisms
- Designed modular YAML-based configuration system for CSS selectors, anti-bot settings (rotating user agents, random delays), and scalable pagination control
- Architected for future enhancements including multithreading with worker thread isolation, VPN integration for geo-blocked sites, and ML-based captcha solving
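The fuzzy-search and pagination logic behind those endpoints can be illustrated with the standard library alone; the real server wraps equivalent logic in FastAPI routes, and the product records here are made-up stand-ins for scraped data.

```python
# Illustrative pagination + fuzzy search over scraped product records.
from difflib import SequenceMatcher

PRODUCTS = [  # stand-in for scraped Castorama/ManoMano data
    {"name": "Ceramic wall tile", "price": 24.5, "brand": "Castorama"},
    {"name": "Ceramic floor tile", "price": 31.0, "brand": "ManoMano"},
    {"name": "Shower head chrome", "price": 18.9, "brand": "Castorama"},
]

def fuzzy_search(query, items, min_ratio=0.4, page=1, per_page=2):
    """Rank items by string similarity to the query, then paginate."""
    scored = [
        (SequenceMatcher(None, query.lower(), p["name"].lower()).ratio(), p)
        for p in items
    ]
    ranked = [p for r, p in sorted(scored, key=lambda t: -t[0]) if r >= min_ratio]
    start = (page - 1) * per_page
    return ranked[start:start + per_page]

results = fuzzy_search("ceramic tile", PRODUCTS)
```

A `page`/`per_page` pair like this maps directly onto query parameters of a paginated REST endpoint.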
European Energy Market Data Extractor
- Automated data extraction pipeline for European electricity spot prices from ENTSO-E Transparency Platform API, supporting both daily and historical bulk data collection
- Built dual operation modes: daily automated extraction for real-time market monitoring and historical data loader for large-scale backtesting and analysis with batch processing
- Integrated PostgreSQL database with structured schema for efficient storage of hourly electricity prices across multiple European market domains (bidding zones)
- Implemented robust error handling with retry logic, API rate limiting compliance, and data validation to ensure reliable extraction from ENTSO-E's market transparency platform
- Designed for energy trading analytics, market research, and data science applications with configurable domain codes covering all European electricity markets
- Features comprehensive logging, performance optimizations for large datasets, and production-ready security considerations for API key management and database connections
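The retry-with-backoff pattern around the API calls can be sketched as below. `flaky_fetch` is a stand-in that simulates transient failures; the actual ENTSO-E request code and credentials handling are omitted.

```python
# Hedged sketch of retry logic with exponential backoff, as used to
# tolerate transient API failures while respecting rate limits.
import time

def with_retries(fn, attempts=4, base_delay=0.01):
    """Call fn, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # grows per attempt

calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:            # simulate two transient API failures
        raise ConnectionError("ENTSO-E temporarily unavailable")
    return {"domain": "10YFR-RTE------C", "hourly_prices": [42.1, 39.8]}

data = with_retries(flaky_fetch)
```

Capping `attempts` and raising on the final failure keeps bad credentials or schema changes visible instead of silently retrying forever.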
India News Sentiment Dashboard and Scraper
- Desktop dashboard application with Tkinter GUI for real-time aggregation and AI-powered sentiment analysis of Indian news articles from 10+ major national sources
- Built multi-threaded web scraper supporting both HTML and RSS parsing for leading outlets including Hindustan Times, Indian Express, Times of India, The Hindu, News18, and Zee News
- Integrated Google Gemini API for automated sentiment classification (Positive/Negative/Neutral) of news headlines and article content with customizable keyword filtering
- Implemented SQLite database for persistent local archiving, enabling historical trend analysis and research with example SQL analytics for data querying
- Engineered robust error handling for network failures, parsing errors, and database exceptions with fallback selectors for reliable data extraction across diverse site structures
- Designed for media monitoring, political sentiment tracking, and news analytics with configurable search terms and privacy-focused local data storage
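The local archiving layer can be sketched with `sqlite3`: classified headlines go into a table, and a grouped query produces the kind of trend output the dashboard charts. The schema and sample rows here are assumptions for illustration.

```python
# Minimal sketch of SQLite archiving plus an example analytics query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE articles (
    source TEXT, headline TEXT, sentiment TEXT, published TEXT)""")
rows = [
    ("The Hindu", "Markets rally on budget news", "Positive", "2024-07-01"),
    ("News18", "Monsoon delays hit farmers", "Negative", "2024-07-01"),
    ("Times of India", "Metro line opens in Indore", "Positive", "2024-07-02"),
]
conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", rows)

# Example analytics: sentiment counts per day for trend analysis
trend = conn.execute("""
    SELECT published, sentiment, COUNT(*) FROM articles
    GROUP BY published, sentiment
    ORDER BY published, sentiment
""").fetchall()
```

Keeping the archive in a local SQLite file is what makes the historical analysis both offline-capable and privacy-preserving.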
Indian Stock Market Analytics & Screening Platform
- End-to-end batch data pipeline for aggregating historical and daily stock prices, financial metrics, and market news from NSE/BSE APIs, screener.in, and financial news sources for comprehensive market analysis
- Built automated ETL workflows using Azure Data Factory and Airflow for scheduled data extraction, PySpark transformations for processing large-scale stock data, and PostgreSQL/MongoDB for structured storage
- Implemented technical indicator calculation engine (moving averages, RSI, MACD, Bollinger Bands) and sentiment analysis on financial news to generate actionable trading signals and screening criteria
- Designed scalable batch processing pipeline handling 5000+ stocks daily with Delta Lake for data versioning, quality checks, and Power BI dashboards for market trend visualization and analysis
- Engineered small-cap stock screening system with custom filters for undervalued stocks based on P/E ratios, volume patterns, and financial metrics aligned with swing trading strategies
- Features comprehensive backtesting framework with historical data analysis, ROI calculation engine, portfolio simulation, and performance tracking for evaluating trading strategies
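Two of the indicators above can be shown in pure Python for clarity; the pipeline itself computes them at scale in PySpark, and the price series here is invented.

```python
# Illustrative moving-average and RSI calculations from closing prices.

def sma(prices, window):
    """Simple moving average over a trailing window."""
    return [
        sum(prices[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(prices))
    ]

def rsi(prices, period=14):
    """Relative Strength Index from average gains vs average losses."""
    deltas = [b - a for a, b in zip(prices, prices[1:])]
    gains = [max(d, 0) for d in deltas[:period]]
    losses = [max(-d, 0) for d in deltas[:period]]
    avg_gain, avg_loss = sum(gains) / period, sum(losses) / period
    if avg_loss == 0:
        return 100.0
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

closes = [100, 101, 103, 102, 105, 107, 106, 108]
three_day = sma(closes, 3)
```

Signals like "RSI below 30 on above-average volume" are then just boolean filters over these columns, which is how the screening criteria compose.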
work experience.
Associate Data Engineer
- Automated end-to-end ETL pipelines processing 500M+ advertisement telemetry events per month, extracting data from diverse sources including CSV, Parquet, GZ files, Excel spreadsheets, email attachments, and RESTful APIs
- Transformed raw data using pandas and PySpark frameworks before loading into MongoDB, powering real-time ad analytics dashboards for 50+ business stakeholders
- Built a dynamic SQL query creation engine for in-house dashboarding software that automatically generates optimized queries based on user-selected columns and KPIs, with results powering interactive chart visualizations
- Implemented the SQLGlot library to make the query engine database-agnostic, enabling seamless addition of SQL databases beyond MySQL with minimal code changes and improved platform scalability
- Improved MongoDB performance through advanced indexing and sharding strategies, achieving 60% boost in ingestion and read speeds and reducing dashboard latency from 30 seconds to under 10 seconds
- Migrated legacy pandas-based workflows to PySpark, delivering 5x performance improvement on 200GB+ daily datasets and enabling efficient processing of terabyte-scale data volumes
- Derived actionable KPIs from raw event data to create performance metrics, enabling clients to track campaign outcomes and optimize strategies in real time
- Implemented automated data quality checks and proactive monitoring systems, ensuring 99.9% pipeline reliability and data accuracy across all workflows
- Collaborated with business teams to design data-driven advertising strategies, resulting in 12-15% increase in client ROI
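The dynamic query engine's core idea can be sketched as below: dashboard selections become an aggregated SQL string. The KPI catalogue and function names are illustrative assumptions; the production engine additionally transpiles the result per target dialect with SQLGlot.

```python
# Hypothetical sketch of the dynamic query engine: user-selected
# dimensions and KPIs compose into a GROUP BY query string.

KPI_EXPRS = {  # assumed KPI catalogue
    "impressions": "SUM(impressions)",
    "ctr": "SUM(clicks) / NULLIF(SUM(impressions), 0)",
}

def build_query(table, dimensions, kpis):
    """Compose an aggregated SQL query from dashboard selections."""
    select = dimensions + [f"{KPI_EXPRS[k]} AS {k}" for k in kpis]
    return (
        f"SELECT {', '.join(select)} FROM {table} "
        f"GROUP BY {', '.join(dimensions)}"
    )

sql = build_query("ad_events", ["campaign_id"], ["impressions", "ctr"])
```

Keeping KPI definitions in one catalogue means adding a metric to every dashboard is a one-line change.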
Data Engineer
- Designed and maintained end-to-end Azure Data Factory and Databricks pipelines, processing 20M+ records weekly (~1TB/month) from 10+ diverse data sources
- Built modular, reusable PySpark and SQL frameworks that reduced pipeline development time by 30% and significantly improved code maintainability across the engineering team
- Developed optimized SQL models and curated database views that powered Power BI dashboards for 200+ business users across the organization
- Implemented advanced SQL Server optimization routines including indexing strategies, query refactoring, and execution plan analysis, reducing reporting query times by 20–25%
- Improved query performance by 40% through database optimization techniques, saving 300+ engineering hours annually and accelerating executive decision-making cycles
- Enhanced pipeline reliability to 99.9% uptime by implementing automated data quality checks, proactive monitoring systems, and robust schema evolution handling
- Worked with Delta Lake on Databricks for efficient data storage and versioning, enabling ACID transactions on large-scale datasets
- Collaborated with cross-functional teams to translate business requirements into scalable data solutions, supporting strategic decision-making across multiple business units
- Completed MBA in Finance alongside full-time role, applying financial acumen to data engineering projects for ROI analysis and cost optimization initiatives
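The reusable-framework idea above can be sketched with plain Python: small named transforms composed into a pipeline. The production version chains PySpark `DataFrame.transform` calls the same way; here plain dicts stand in for DataFrames and the transform names are invented.

```python
# Sketch of composable, reusable transforms applied as an ordered pipeline.
from functools import reduce

def drop_nulls(rows, key):
    """Filter out records missing a required field."""
    return [r for r in rows if r.get(key) is not None]

def add_derived(rows, key, fn):
    """Append a derived column computed from each record."""
    return [{**r, key: fn(r)} for r in rows]

def run_pipeline(rows, steps):
    """Apply each (transform, kwargs) step in order."""
    return reduce(lambda acc, step: step[0](acc, **step[1]), steps, rows)

records = [{"amount": 100, "qty": 2}, {"amount": None, "qty": 1}]
clean = run_pipeline(records, [
    (drop_nulls, {"key": "amount"}),
    (add_derived, {"key": "unit_price",
                   "fn": lambda r: r["amount"] / r["qty"]}),
])
```

Because each step is a standalone, unit-testable function, new pipelines are assembled from the existing catalogue rather than written from scratch, which is where the development-time savings come from.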
education.
B.E in Information Technology
2017 - 2021, Institute of Engineering and Technology, Devi Ahilya University, Indore
Score: 68%
certifications.
github.
let's connect.
I'd love to hear from you! Whether you have a question, want to discuss a project, or just want to say hi, feel free to reach out using the form below.