Improved cross-functional collaboration between development, QA, and product teams by standardizing communication and documentation, accelerating deployments by 30%.
Standardized configuration management using Salt, reducing configuration drift and environment inconsistencies by 30%.
Led on-call operations and incident response, reducing mean time to recovery (MTTR) by 35% through improved root cause analysis (RCA) and postmortem practices.
Administered Kubernetes clusters supporting production workloads, achieving 99.9% service availability.
Architected and operated highly available, fault-tolerant systems meeting strict uptime and performance SLAs.
Defined and tested disaster recovery strategies through regular drills, ensuring recovery objectives were consistently met.
Managed SSL certificate lifecycle using DigiCert CertCentral and Trust Lifecycle Manager (TLM), preventing certificate expirations and security incidents.
Defined and tracked SLIs and SLOs for critical services to measure reliability and ensure SLA compliance.
Tuned monitoring alerts and thresholds, reducing on-call noise and false positives by 30%.
Performed capacity planning and resource optimization for Kubernetes workloads to prevent performance degradation during peak usage.
Streamlined routine operational tasks and runbooks using Bash and CI/CD workflows, reducing operational toil.
Supported production readiness and change management processes, validating monitoring, rollback, and deployment strategies before releases.
Movate (InMobi) — Senior SRE (Oct 2021 – Feb 2024)
Promoted to Senior Site Reliability Engineer based on ownership of production reliability, CI/CD automation, and consistent incident management performance.
Drove CI/CD pipeline automation, doubling deployment frequency and reducing release failures by 30% across multiple services.
Orchestrated Kubernetes clusters supporting 15+ microservices, enabling zero-downtime production deployments and stable rollouts.
Deployed monitoring and observability using Prometheus and Grafana, improving system visibility and alert accuracy, and reducing false alerts by 25%.
Directed incident response and escalation handling, performing RCA and publishing postmortems, contributing to a 30% reduction in repeat incidents.
Executed production and UAT releases, managed SSL certificate renewals, and supported environment promotions through CI/CD workflows, ensuring release reliability.
Automated web and mobile testing using Selenium and Appium, increasing regression coverage by 40% and improving release confidence.
Created and executed TestNG-based test suites, reducing manual testing effort by 50% and speeding up validation cycles.
Designed and maintained data-driven automation frameworks in Java, improving test reusability by 40% and enabling scalable execution across multiple test suites.
Developed web scraping solutions to extract and validate test data, reducing manual data validation effort by 50% and improving test execution efficiency.
Provisioned AWS EC2 and S3 resources to support testing and application environments, enabling scalable test execution.
Hotel Zaryab — System Engineer (Nov 2019 – Jun 2021)
Installed, configured, and maintained software, hardware, and system components, ensuring stable day-to-day operations and reducing unplanned downtime by 25%.
Administered network servers and infrastructure tools, maintaining over 99.5% system availability and consistent performance across production environments.
Monitored system performance and capacity metrics, proactively identifying resource constraints and preventing service impact during peak usage periods.
Diagnosed and resolved system outages, restoring services within defined SLAs and reducing average downtime per incident by 30%.
KVCH — Cloud AWS Intern (Jan 2019 – May 2019)
Provisioned and tuned EC2 instances to support secure and scalable workloads in AWS environments.
Monitored AWS services using CloudWatch and CloudTrail, improving operational visibility, audit readiness, and compliance reporting across cloud environments.
Designed AWS infrastructure components including VPC, IAM, Load Balancers, Route 53, EBS, EFS, and Glacier, following AWS best practices.