Ani Sridharan
Building resilient systems at scale
Delivering 99.999% reliability for Fortune 500 enterprises and hyper-growth startups alike.
Principal Engineer and hands-on platform architect focused on reliability, performance, and AI-powered operations. 10+ years across AWS, Azure, and GCP delivering meaningful business outcomes.
Engineering Leadership
Built high-performing teams from scratch and scaled multi-team initiatives across infrastructure operations, observability, and systems reliability.
Thought Leadership
Practicing customer-obsessed engineering—using real user journeys to shape reliability, UX, and operational guardrails.
Hands-on Engineer
Designing and shipping infrastructure as code, Kubernetes platforms, and back-end services; debugging production systems.
AI/ML Expertise
Applying AIOps and self-healing to reduce toil, auto-remediating recurring issues and accelerating incident triage.
Systems Reliability & Operations
Defining SLIs/SLOs, capacity planning, and chaos/performance testing to make quiet on-call a first-class outcome.
Reliability at Scale
Delivering multi-region architectures and high-throughput telemetry pipelines with five-nines availability targets.
Operations & Incident Management
Leading incident command, tuning escalation policies, and using correlation-ID tracing to accelerate root-cause analysis.
Tech stack preview
Platforms and tooling powering reliability, automation, and observability.
AI & ML
Secure MCPs, LangChain, MLOps, OpenAI, PyTorch
Cloud
Multi-cloud expertise (AWS, GCP, Azure) + PCF/private clouds
Automation & CI/CD
Terraform, Jenkins, GitHub Actions
Containers & Orchestration
Docker, Kubernetes, Helm
Observability
Prometheus, Grafana, Datadog, ELK, AppDynamics, Dynatrace, New Relic, Splunk, SignalFx
Data
Databricks, Microsoft Fabric, Snowflake
Featured tool
Plan SLO targets, error budgets, and allowable downtime in seconds.
99.95% uptime
21m 54s / month
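The arithmetic behind a figure like the one above can be sketched in a few lines. This is a minimal illustration, not the featured tool's actual implementation: the function names are hypothetical, and the month length is an assumption (an average month of 365/12 days, i.e. 730 hours) chosen because it reproduces the 21m 54s shown for 99.95%.

```python
# Sketch of an error-budget calculation. Names and the month length are
# illustrative assumptions, not taken from the featured tool.

# Assumption: an "average" month of 365 / 12 days = 730 hours.
MONTH_SECONDS = 730 * 3600


def allowed_downtime_seconds(slo_percent: float,
                             period_seconds: int = MONTH_SECONDS) -> int:
    """Return the allowable downtime (error budget) in whole seconds."""
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    budget_fraction = (100 - slo_percent) / 100
    # round() absorbs floating-point noise such as 1313.9999999.
    return round(budget_fraction * period_seconds)


def format_duration(total_seconds: int) -> str:
    """Format seconds as 'Xm Ys', adding hours only when non-zero."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}h {minutes}m {seconds}s"
    return f"{minutes}m {seconds}s"


if __name__ == "__main__":
    budget = allowed_downtime_seconds(99.95)
    print(f"99.95% uptime allows {format_duration(budget)} / month")
```

With a 30-day month instead, the same target allows 21m 36s, so the assumed period length should be stated alongside any SLO.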
Latest insights
Experiments, playbooks from the field, and 2 AM thoughts.
- SLOs vs SLIs vs SLAs: A Practical Guide (1 min read)
- The Hidden Costs of Technical Debt (1 min read)
- Building High-Performing SRE Teams (1 min read)
