Job Description
Key Responsibilities:
- Provide end-to-end operational support for GCP services including GKE, BigQuery, Cloud SQL, Redis, Cassandra, Bigtable, Cloud Filestore, Apigee, Kafka, Dataflow, and GCS.
- Manage and support GCP container, PaaS, and IaaS services across all environments from DEV to PROD, ensuring performance, availability, and compliance.
- Troubleshoot and resolve complex production issues and manage incidents (P1-P4) following ITIL processes.
- Develop and maintain CI/CD pipelines using GitHub Actions, Jenkins, and related tools.
- Design and deploy Helm charts and Kubernetes configurations (cluster roles, services, deployments, network policies, ingress controllers, certificate manager, service mesh, etc.).
- Implement and maintain compliance policies and scripts with tools such as Google Cloud Organization Policy, Aqua Security, and Wiz.
- Monitor environments using Dynatrace, Datadog, or similar tools to optimize performance and utilization.
- Collaborate with cross-functional teams to improve automation, scalability, and container security.
- Participate in on-call rotations and handle security incidents, investigations, and post-mortem analysis.
- Provide operational consultancy for future-state technologies and support continuous improvement initiatives.
Required Skills & Experience:
- 3+ years of hands-on experience with Kubernetes, GKE, AKS, Docker, or Podman.
- Strong understanding of network security, encryption protocols, and identity management.
- Proficiency with CI/CD pipelines (GitHub Actions, Jenkins).
- Experience with Google Cloud Run, GKE Autopilot, and Anthos Service Mesh.
- Solid knowledge of Kubernetes resource types, Helm, and Kubernetes networking.
- Experience with Google Cloud services (BigQuery, Cloud SQL, Dataflow, Apigee, etc.).
- Familiarity with monitoring tools (Dynatrace, Datadog).
- Strong scripting experience in Python, Bash, PowerShell, or JavaScript.
- Working knowledge of Linux (Red Hat) and Windows operating systems.
- Understanding of LAN/WAN networking and of DevOps and Agile methodologies.
- Familiarity with ITIL processes (Change, Incident, Problem Management).
- Excellent analytical, problem-solving, and communication skills.
Nice-to-Have:
- Experience supporting Azure Cloud environments.
Professional Certifications:
- Certified Kubernetes Administrator (CKA)
- Certified Kubernetes Security Specialist (CKS)
- HashiCorp Certified: Terraform Associate