Snr IT Infrastructure Engineer
Posted: 2 days ago
Job Description
Position Overview
The Principal Infrastructure Engineer (L4) serves as the senior-most technical authority for server and data center infrastructure, responsible for designing, implementing, and maintaining mission-critical compute, storage, virtualization systems, and enterprise Windows and Linux server environments. This role drives architectural strategy, provides escalation for complex incidents, and ensures platform stability, resilience, and performance across hybrid on-premises and cloud environments.

Key Responsibilities

Architecture & Design
- Design and architect enterprise server, storage, virtualization, and Windows/Linux server platforms for high availability and performance.
- Develop infrastructure blueprints, build standards, and reference architectures for server and data center environments.
- Define strategies for hardware refresh, capacity scaling, and hybrid integration (on-prem to cloud).
- Ensure compute, storage, and server designs align with security, networking, and disaster recovery standards.
- Participate in infrastructure roadmap planning and technology selection.

Implementation & Operations
- Oversee the installation, configuration, and maintenance of physical and virtual servers, including Windows and Linux systems.
- Manage hypervisors and virtualization clusters (VMware vSphere, Hyper-V, Proxmox, or similar).
- Maintain and monitor SAN, NAS, and hyper-converged infrastructure (HCI) solutions.
- Serve as final escalation point for complex incidents, root cause analysis, and recovery operations.
- Lead data center migrations, hardware refreshes, and platform upgrades with minimal downtime.
- Ensure OS hardening, patch management, and compliance with security baselines across Windows and Linux servers.

Data Center & Hardware Management
- Manage lifecycle of physical infrastructure: servers, racks, PDUs, cabling, and environmental monitoring.
- Perform capacity and performance assessments of compute, memory, and storage resources.
- Collaborate with vendors for warranty, maintenance, and hardware replacement cycles.
- Develop documentation for data center topology, rack layouts, and asset tracking.
- Oversee installation and decommissioning of physical equipment.

Governance & Standards
- Define and enforce configuration management and operational standards for infrastructure systems.
- Maintain detailed documentation of infrastructure designs, builds, and procedures.
- Ensure systems meet organizational compliance requirements (e.g., ISO 27001, POPIA, PCI-DSS).
- Establish SLAs, KPIs, and reporting metrics for infrastructure availability and performance.

Leadership & Mentorship
- Mentor and guide L2-L3 engineers, providing training and career development support.
- Lead post-incident reviews, ensuring knowledge transfer and operational improvement.
- Collaborate with cross-functional teams across cloud, network, and security disciplines.
- Provide executive-level input on infrastructure budgets, projects, and technology direction.

Required Technical Expertise

Compute, Virtualization & Servers
- Expert-level knowledge of server platforms (Dell PowerEdge, HPE ProLiant, Lenovo, Supermicro, etc.).
- Extensive experience with VMware vSphere, Hyper-V, or Proxmox VE environments.
- Skilled in vMotion, HA/DRS clusters, templates, snapshots, and VM lifecycle management.
- Deep knowledge and hands-on experience supporting both Windows Server and Linux server environments in enterprise data centers.
- Familiar with container orchestration (Docker, Kubernetes) in hybrid or on-prem setups.

Storage & Backup
- Proficiency in SAN/NAS technologies (iSCSI, Fibre Channel, NFS, SMB).
- Experience with enterprise storage arrays (Dell EMC, NetApp, HPE 3PAR, Synology, etc.).
- Knowledge of hyper-converged platforms (vSAN, Nutanix, StarWind, etc.).
- Integration expertise with enterprise backup systems (Veeam, Commvault, Rubrik).

Operating Systems
- Deep knowledge of Windows Server (AD, DNS, DHCP, GPO) and Linux distributions (Ubuntu, RHEL, CentOS).
- Expertise in system hardening, patch automation, and configuration management.
- Understanding of Active Directory replication, FSMO roles, and domain trust design.
- Proficient in troubleshooting and performance tuning for Windows and Linux server environments.

Data Center Operations
- Hands-on experience managing physical infrastructure in data centers.
- Familiarity with power management, UPS, cooling, and structured cabling standards.
- Understanding of rack elevation design and airflow optimization.

Automation & Monitoring
- Experience with infrastructure automation tools (Ansible, Terraform, PowerShell DSC).
- Strong understanding of monitoring and observability platforms (PRTG, Zabbix, SolarWinds, Prometheus).
- Skilled in writing scripts for task automation, reporting, and performance analysis.

Qualifications & Experience
- Bachelor's degree in Information Technology, Computer Science, or related field (Master's preferred), or relevant industry experience.
- 12+ years of hands-on experience in server and infrastructure engineering, with 5+ years in a senior/principal role.
- Relevant certifications preferred: VMware VCP/VCAP, Microsoft Certified: Azure Administrator/Server Expert, Dell EMC Proven Professional, or HPE ASE.
- Proven experience leading complex data center and server transformation projects, including Windows and Linux servers.

Soft Skills
- Strong analytical and troubleshooting skills in high-pressure situations.
- Excellent communication and documentation skills.
- Leadership and mentoring ability to guide multi-disciplinary technical teams.
- Strategic thinking with the ability to translate business needs into infrastructure solutions.
- Commitment to operational excellence and continuous improvement.