Title: Technical Lead-App Development
Area(s) of responsibility
Java Backend Engineer with Reliability Engineering
We are looking for a highly skilled Java Backend Engineer with Reliability Engineering experience to help design, build, and maintain reliable, scalable, and observable backend services. The ideal candidate will have strong hands-on expertise in Java/Spring Boot, microservices, API development, and a deep understanding of Splunk for observability, monitoring, alerting, and troubleshooting.
This role focuses on improving application reliability, performance, and logging quality across distributed systems while collaborating with DevOps, SRE, and platform teams.
Key Responsibilities
Backend Engineering
- Design, develop, and maintain backend services using Java (8+), Spring Boot, and microservices architecture.
- Build scalable RESTful APIs and backend components with strong emphasis on performance and security.
- Improve service reliability through better error handling, resiliency patterns, and design best practices.
- Contribute to architecture discussions, technical design, and code reviews.
Reliability Engineering & Observability
- Implement and maintain application observability using Splunk (dashboards, alerts, log analysis, correlation searches).
- Optimize log ingestion pipelines and ensure consistent logging standards (structured logs, correlation IDs, trace IDs).
- Monitor application health, performance metrics, latency, errors, and resource utilization.
- Troubleshoot production issues by analyzing logs and monitoring data with SPL queries.
- Identify and resolve reliability bottlenecks and proactively improve system stability.
Systems Performance & Monitoring
- Develop actionable Splunk dashboards for service KPIs, throughput, latency, and error rates.
- Set up real-time alerts to detect anomalies, failures, or degradation in service behavior.
- Tune SPL queries to improve performance and reduce compute cost.
- Work with DevOps/SRE teams to strengthen monitoring, alerting, and incident response.
Collaboration & Continuous Improvement
- Work closely with SRE, DevOps, QA, Cloud, and Product Engineering teams.
- Contribute to on-call readiness, root cause analysis (RCA), and reliability improvement plans.
- Promote logging best practices and observability standards across engineering teams.
- Participate in CI/CD pipeline improvements and automation efforts.
Required Skills & Qualifications
- Strong experience with Java (8 or above), Spring Boot, and REST API development.
- Solid understanding of microservices, multi-threading, and design patterns.
- Hands-on expertise with Splunk:
  - SPL queries
  - Dashboards (Classic or Dashboard Studio)
  - Alerts and reports
  - Field extractions & log parsing
- Deep knowledge of logging frameworks (Log4j2, SLF4J, Logback).
- Experience with JSON logging, structured logs, and correlation identifiers.
- Familiarity with CI/CD pipelines, Git, Maven/Gradle.
- Strong debugging and production troubleshooting skills using logs and monitoring tools.
- Experience with Kubernetes, Docker, or cloud-native applications.
- Knowledge of observability tools like ELK, OpenTelemetry, Grafana, Prometheus.
- Understanding of SRE concepts: SLIs, SLOs, SLAs, error budgets.
- Familiarity with message brokers (Kafka, RabbitMQ).
- Experience with Splunk configuration-as-code or automation via REST APIs.