$ whoami

Shakir Bhat — Cloud • Site Reliability Engineer • DevOps

$ pwd

/Kashmir • /Bangalore

$ uptime

up 6.3 years, load average: 0.00, 0.01, 0.05

$ printenv CONTACT

Email=shakirbhatpc@gmail.com

GitHub=github.com/shakirbhattt

LinkedIn=linkedin.com/in/shakirbhattt

Dev=dev.to/shakirbhattt

Medium=medium.com/@shakirbhattt

$ cat skills/all.txt

Windows

Linux

AWS

VMware

Kubernetes

Docker

Git

Jenkins

Bamboo

Terraform

Ansible

Salt

Prometheus

Grafana

New Relic

Splunk

Nagios

Vault

SSL/TLS

Bash

MySQL

PostgreSQL

TCP/IP

DNS

Nexus

Docker Hub

GitHub

Helm

YAML

Jira

OpsGenie

Confluence

$ history | grep experience

DigiCert — Site Reliability Engineer (Feb 2024 – Present)

Automated deployment, scaling, and monitoring workflows, reducing manual operational effort by 40% and improving deployment consistency.
Implemented centralized monitoring, logging, and alerting using New Relic and Splunk, reducing incident detection time by 35%.
Owned CI/CD pipelines supporting 20+ services, improving deployment success rates and reducing release-related incidents.
Improved collaboration across development, QA, and product teams by standardizing communication and documentation, accelerating deployments by 30%.
Standardized configuration management using Salt, reducing configuration drift and environment inconsistencies by 30%.
Led on-call operations and incident response, reducing mean time to recovery (MTTR) by 35% through improved RCA and postmortem practices.
Administered Kubernetes clusters supporting production workloads, achieving 99.9% service availability.
Architected and operated highly available, fault-tolerant systems meeting strict uptime and performance SLAs.
Defined and tested disaster recovery strategies through regular drills, ensuring recovery objectives were consistently met.
Managed the SSL/TLS certificate lifecycle using DigiCert CertCentral and Trust Lifecycle Manager (TLM), preventing certificate expirations and security incidents.
Defined and tracked SLIs and SLOs for critical services to measure reliability and ensure SLA compliance.
Tuned monitoring alerts and thresholds, reducing on-call noise and false positives by 30%.
Performed capacity planning and resource optimization for Kubernetes workloads to prevent performance degradation during peak usage.
Streamlined routine operational tasks and runbooks using Bash and CI/CD workflows, reducing operational toil.
Supported production readiness and change management processes, validating monitoring, rollback, and deployment strategies before releases.

Movate (InMobi) — SRE (Oct 2021 – Feb 2024)

Promoted to Senior Site Reliability Engineer based on ownership of production reliability, CI/CD automation, and consistent incident management performance.
Drove CI/CD automation pipelines, increasing deployment frequency by 2x and reducing release failures by 30% across multiple services.
Orchestrated Kubernetes clusters supporting 15+ microservices, enabling zero-downtime production deployments and stable rollouts.
Deployed monitoring and observability using Prometheus and Grafana, improving system visibility and alert accuracy, and reducing false alerts by 25%.
Directed incident response and escalation handling, performing RCA and publishing postmortems, contributing to a 30% reduction in repeat incidents.
Executed production and UAT releases, managed SSL certificate renewals, and supported environment promotions through CI/CD workflows, ensuring release reliability.

Snapbizz — QA DevOps Engineer (Jun 2021 – Sep 2021)

Automated web and mobile testing using Selenium and Appium, increasing regression coverage by 40% and improving release confidence.
Created and executed TestNG-based test suites, reducing manual testing effort by 50% and speeding up validation cycles.
Designed and maintained data-driven automation frameworks in Java, improving test reusability by 40% and enabling scalable execution across multiple test suites.
Developed web scraping solutions to extract and validate test data, reducing manual data validation effort by 50% and improving test execution efficiency.
Provisioned AWS EC2 and S3 resources to support testing and application environments, enabling scalable test execution.

Hotel Zaryab — System Engineer (Nov 2019 – Jun 2021)

Installed, configured, and maintained software, hardware, and system components, ensuring stable day-to-day operations and reducing unplanned downtime by 25%.
Administered network servers and infrastructure tools, maintaining >99.5% system availability and consistent performance across production environments.
Monitored system performance and capacity metrics, proactively identifying resource constraints and preventing service impact during peak usage periods.
Diagnosed and resolved system outages, restoring services within defined SLAs and reducing average downtime per incident by 30%.

KVCH — Cloud AWS Intern (Jan 2019 – May 2019)

Provisioned and tuned EC2 instances to support secure and scalable workloads in AWS environments.
Monitored AWS services using CloudWatch and CloudTrail, improving operational visibility, audit readiness, and compliance reporting across cloud environments.
Designed AWS infrastructure components including VPC, IAM, Load Balancers, Route 53, EBS, EFS, and Glacier, following AWS best practices.

$ cat education.txt

B.Tech, Computer Science Engineering (2015 – 2019)

MRS Punjab Technical University, Punjab

$ cat achievements.log

Maintained 99.9% production uptime through improved observability and reliability practices.
Reduced incident MTTR by 35% via alert tuning and structured RCA processes.
Reduced on-call alert noise by 30% by optimizing alert thresholds, routing rules, and escalation policies.
Enabled faster and safer production releases by improving CI/CD deployment workflows, reducing release-related incidents by 25%.

$ cat awards.log

High-Flyer Award 2023 - InMobi (Movate)

$