Job Description

Responsibilities:

  • End-to-End Alert Management: Respond to alerts from overseas Kubernetes (K8s) clusters, public cloud resources, and core business services. Follow standard operating procedures (SOPs) for tiered troubleshooting, prioritize P0/P1 incidents, and collaborate with domestic technical teams to drive issues to resolution. Regularly review the root causes of alerts and optimize alerting rules.
  • Business Demand Support: Handle daily operational requests from overseas business teams (e.g., alert handling, version updates, resource scaling). Ensure responses meet SLA timelines, provide progress and result updates, and maintain detailed operation records.
  • K8s and Public Cloud Operations: Perform day-to-day operations and maintenance for overseas K8s clusters (including managed clusters) and core public cloud resources.
  • Documentation and Synchronization: Record alert-handling processes, request-fulfillment steps, and K8s cluster operation logs as required. Provide daily updates to overseas project teams. Help compile an overseas operations FAQ to support knowledge sharing within the team.

Requirements:

  • Bachelor's degree or above in computer science or a related major.
  • Experience and Basic Skills: Proficient in day-to-day Linux operations (command-line troubleshooting, process management, log analysis), with the ability to resolve basic failures independently. Experience in overseas business operations or public cloud maintenance is preferred.
  • K8s and Public Cloud Proficiency: Skilled in day-to-day K8s operations, with an understanding of how the core K8s components work. Proficient in at least one major public cloud platform (AWS/Azure/GCP) and able to independently start and stop instances, mount storage, configure security groups, and set up cloud monitoring alerts. Experience with managed K8s services (e.g., AWS EKS, Azure AKS) is preferred.
  • Overseas Collaboration Skills: Fluent in written English and Mandarin, with basic spoken communication skills (able to handle daily meetings). Able to adapt to overseas business time zones and collaborate efficiently across time zones.
  • Tooling and Development Skills: Experience writing operational scripts or tools in Shell, Python, or Go. Familiar with basic Docker operations. Knowledge of Jenkins/GitLab CI workflows is preferred.
  • Soft Skills: Strong sense of ownership and responsibility, with strict adherence to SOPs for alerts, requests, and public cloud operations so that no critical step is missed. Resilient under pressure and able to respond swiftly to emergencies. Communicates proactively and escalates issues to domestic teams promptly when they cannot be resolved independently.

Key Focus Areas:

  • Operational excellence in distributed systems and cloud-native environments.
  • Agility in cross-cultural and cross-time-zone collaboration.
  • Continuous improvement through incident reviews and knowledge retention.
