Country/Region: IN
Requisition ID: 31238
Location: INDIA - NOIDA - BIRLASOFT OFFICE

Title: Technical Lead-Cloud & Infra Engg

Description: 

Area(s) of responsibility

Grafana Administrator – Job Description

Core Responsibilities

User & Access Management

  • Create, update, and delete user accounts.
  • Assign roles and permissions via OKTA groups:
      - Grafana_Admin_Assignment_Group (Admins)
      - Grafana_Editors_Assignment_Group (regular users)
      - Grafana_SOC_Admin_Group and Grafana_SOC_Editor_Group (SOC environments)
  • Ensure admin access is granted only upon ARF approval.

Dashboard & Visualization Management

  • Create and manage dashboards using data sources such as Prometheus, Loki, and Tempo.
  • Customize panels, variables, and layouts to support dynamic filtering.
  • Add trace visualizations to dashboards, using Tempo as the data source and looking up traces by trace ID.

Alerting & Monitoring

  • Set up and manage alerts based on log and metric data.
  • Ensure alerts are configured correctly and that notifications reach the appropriate users.
  • Monitor the health and performance of the Grafana instance.

System Administration

  • Perform regular backups of Grafana configurations and data.
  • Restore data from backups when necessary.
  • Escalate issues to platform owners as needed.

Documentation & Compliance

  • Maintain documentation for Grafana configurations, dashboards, and processes.
  • Support audit and compliance requirements by maintaining traceability and access logs.

Stack Deployment & Maintenance

  • Deploy and manage the Grafana stack (Grafana with Prometheus, Loki, and Tempo) using Docker Compose.
  • Configure Prometheus to scrape metrics and Loki to aggregate logs.
  • Maintain and update the Docker Compose and Prometheus configuration files.

Required Qualifications

Education & Certifications

  • Bachelor’s degree in Computer Science, IT, or related field.
  • Certifications preferred: Grafana Cloud Admin, Prometheus Certified Associate, or equivalent.

Experience

  • 3–5 years of experience with monitoring and observability platforms.
  • Hands-on experience with Grafana, Prometheus, Loki, Tempo, and Docker.
  • Familiarity with OKTA, ARF workflows, and enterprise access control.

Skills

  • Strong troubleshooting and analytical skills.
  • Proficiency in scripting (Bash, Python) and automation tools (Ansible, Terraform).
  • Excellent communication and documentation abilities.
  • Willingness to work in 24x7 support environments and on rotational shifts.