What is SRE lab?

Site reliability engineering (SRE) is a bridge between development and the production maintenance/operations team. Companies build SRE teams to make sure that, while new features are developed and deployed continuously, the production systems keep running smoothly. Sometimes SRE teams are designed to handle both the DevOps part and the maintenance/ops part.

DevOps assignments and duties:

  • Creating and maintaining pipelines across environments
  • Improving and automating deployments for new services/features
  • Monitoring builds and upgrading existing pipelines for better performance
  • Improving automation for testing, integration, deployment, configuration and monitoring
  • Providing detailed DevOps documentation (about pipelines and automation)
  • Providing system self-healing capability
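The self-healing duty in the list above can be pictured as a small watchdog: probe a service, and restart it once it has failed several checks in a row. The following is only a minimal sketch under stated assumptions. The names `self_heal`, `check_health`, `restart` and `failure_threshold` are hypothetical and not taken from any particular tool, and a real watchdog would wait between probes and call an actual health endpoint and restart command.

```python
from typing import Callable


def self_heal(
    check_health: Callable[[], bool],
    restart: Callable[[], None],
    probes: int,
    failure_threshold: int = 3,
) -> int:
    """Probe a service `probes` times and restart it after
    `failure_threshold` consecutive failed probes.

    Both callables are hypothetical placeholders supplied by the caller.
    Returns the number of restarts that were triggered.
    """
    consecutive_failures = 0
    restarts = 0
    for _ in range(probes):
        if check_health():
            # A healthy probe clears the failure streak.
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restart()
            restarts += 1
            # Start counting again after the restart.
            consecutive_failures = 0
    return restarts


if __name__ == "__main__":
    # Simulated probe results: one healthy probe, then three failures in a row.
    simulated = iter([True, False, False, False])
    count = self_heal(lambda: next(simulated), lambda: print("restarting"), probes=4)
    print("restarts:", count)
```

Requiring several consecutive failures, instead of reacting to a single failed probe, is one common way to avoid restarting a service because of a short network blip.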

Maintenance/Ops assignments and duties:

  • Proactively monitor and review application performance (commonly used tools include Elasticsearch and Dynatrace)
  • Ensure scalability and availability of the system
  • Work with development to set up and configure the API Gateway
  • Provide monitoring and tracing for the API Gateway using monitoring tools (like Prometheus and Grafana)
  • Handle on-call and emergency support
  • Ensure the system’s logging and diagnostics are in place
  • Help triage escalated support tickets
  • Work on feature requests, defects and other development tasks on pipelines and monitoring
  • Contribute to overall product roadmap
  • Provide detailed ops documentation (about pipelines and automation)
  • Apply AI and predictive maintenance techniques
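To make the availability duty in the list above more concrete, an availability target can be translated into the downtime it allows over a period. The sketch below is plain arithmetic and not a method from the text. The function name `allowed_downtime_minutes`, the 30-day period and the 99.9% target are assumptions chosen purely for illustration.

```python
def allowed_downtime_minutes(availability_target: float, period_days: int = 30) -> float:
    """Return the downtime, in minutes, that an availability target permits.

    `availability_target` is a fraction such as 0.999 (i.e. 99.9%).
    The 30-day default period is an illustrative assumption.
    """
    if not 0.0 < availability_target <= 1.0:
        raise ValueError("availability_target must be in the range (0, 1]")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability_target)


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(0.999), 1))
```

A number like this gives the on-call and monitoring duties a shared yardstick: an outage can be compared against the minutes still available in the period.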
