Job Details

Site Reliability Engineer



Required Skills

    Linux shell

Infinity Consulting Solutions, Inc



Job Description

Site Reliability Engineer (SRE)

$100K - $135K

Direct Hire

Site Reliability Engineers (SREs) are responsible for keeping all systems up and running reliably.

We have a mindset of constant improvement and of implementing good practices for running our distributed systems.

SREs will combine infrastructure knowledge with software engineering practices to automate and engineer solutions that achieve our goals.

SREs work in a DevOps environment in close collaboration with software engineers and traditional infrastructure groups.

SREs maintain a diverse set of technologies, including in-house-developed SaaS platforms for our customers, IoT systems for our electronic monitoring devices, and both cloud and traditional datacenters.


The following list shows the skill sets an ideal candidate would have.

Not all of these are required, but you will need most of them to be successful in the position.

Cloud experience (IaaS and PaaS, preferably in Azure)

Windows Server experience (2012 - 2019)

Linux shell experience

Programming experience ideally in one of the following languages (Python, Ruby, Go, C#)

Infrastructure as Code (Terraform, Ansible)

Container experience (Docker, Kubernetes)

Layer 7 reverse proxy / Load Balancing experience (NGINX, Azure Application Gateway, F5)

Strong skills for debugging and troubleshooting production issues

A mindset for constant improvement and documentation

Initiative to fix things you find broken


Responsibilities:

Assist with CI/CD builds and releases of our software and infrastructure

Handle monitoring and alerting for our production environment

Apply a GitOps approach to Infrastructure as Code (IaC)

Enable developers by promoting good practices

Help with containerization efforts for production systems

Build and manage cloud environments

Help track and maintain uptime for production systems

Help lead and document Root Cause Analysis (RCA)

When an issue occurs, fix the underlying problem once so it never recurs

Participate in an on-call rotation to triage and fix production issues

Experience and Skills:


Education/Certification:

College education preferred, but not required

Technical certification preferred in cloud, networking, or server technologies

Experience Required:

Technical experience troubleshooting servers, networks, and applications

Experience with or familiarity with at least one of the following tools/technologies (Ansible, Terraform, Helm, Docker, Kubernetes, Git)

Preferred Experience:

Azure Cloud experience (IaaS and PaaS)

Load Balancing experience

Experience running distributed systems

Physical Requirements/Work Environment:

Must participate in an on-call rotation and respond to pager alerts after hours.

Information Technology

No Preference
Full-Time Job

Recruiter Details
Doug Klares
1350 Broadway, Suite 2205, New York, NY 10018