This is a full-time, direct-hire opportunity; the role can be 100% remote within the United States (Texas preferred).
What you will be doing:
- Design, develop, and maintain Azure DevOps pipelines and build processes
- Bring Azure-specific skills, with proficiency in Azure CLI
- Manage cloud-native tech stacks with Kubernetes and container orchestration
- Drive monitoring and observability standards
- Lead automation testing frameworks, integrated into the CI/CD process
- Operate and maintain cloud-native tech stacks such as Azure Kubernetes Service (AKS) and KEDA, with advanced proficiency in Helm for automating Kubernetes.
- Use Azure Monitor, including setting up alerts, and APM tooling such as Datadog and/or Dynatrace for observability and performance monitoring.
- Apply expertise in Azure DevOps to manage pipelines and build processes.
- Troubleshoot data pipelines effectively.
- Use SonarQube and lint tools for code quality and security scanning (experience required).
- Lead unit, integration, and load testing practices, with tools like SonarQube and linters integrated into the CI/CD process.
- Proficiency in Azure CLI and networking services like Azure Virtual Networks (VNets), Private Endpoints, DNS, Private DNS, and Virtual Machine Scale Sets (VMSS) will be critical for managing our cloud infrastructure.
- Maintain all server-side cloud and security documentation to ensure compliance with the government’s cybersecurity policy.
- Develop strategies to manage support package releases and patch management.
- Monitor cloud infrastructure and VM system maintenance, and provide technical recommendations for improvements.
What we are looking for:
- 5+ years of experience
- Bachelor’s degree or comparable work experience
- Close collaboration with DevSecOps teams to integrate security best practices into DevOps processes.
- Proficiency in Python and Bash scripting for automation tasks and managing infrastructure.
- Experience in supporting centralized management and automation systems using Ansible playbooks and ad-hoc commands.
- Familiarity with data pipelines and the ability to troubleshoot them effectively
- In-depth knowledge of Linux-based operating systems and of managing services from the command line.
- Familiarity with industry best practices for Linux server security; ability to implement organizational security standards on the Linux platform
- Experience with automated software update mechanisms such as yum, up2date and apt-get
- An SRE (Site Reliability Engineering) mindset, to ensure a strong focus on reliability, scalability, and continuous improvement within the operation.
- Thorough working understanding of incident handling protocols, personally identifiable information (PII), server network procedures, and information handling
- Thorough working understanding of cloud service models: IaaS, PaaS, DaaS, SaaS
- Experience migrating existing applications to the cloud
- Good consultative, communication, analytical and judgement skills
- Experience implementing and maintaining highly available, mission-critical environments, performing disaster recovery testing, and creating technical documentation
- Demonstrated strong verbal and written communication and interpersonal skills; attention to detail and accuracy; and time management and organizational skills