Site Reliability Engineer Job at LightEdge Solutions, Austin, TX

  • LightEdge Solutions
  • Austin, TX

Job Description

LightEdge Solutions is developing the IT solutions that will propel businesses forward over the next 10 years. Using a combination of shared and private/dedicated platforms, LightEdge has been successful in offering businesses alternatives that streamline operations, improve reliability and reduce costs.

If you are passionate about creating real solutions that help businesses with cutting-edge technology, want to be challenged to think outside the box, and want to be in a position where you can effect change on a daily basis, then LightEdge can offer you a dynamic corporate environment built on teamwork and personal responsibility.

As a Site Reliability Engineer (SRE), you will be an integral part of the team at LightEdge Solutions. This position will report to the DevOps Manager and will be responsible for the reliable operation of the organization's systems and services. You will play a key role in defining our monitoring strategy and vision across multiple products and will work with a variety of teams to improve the accuracy of our monitoring systems.

Responsibilities:
  • Monitoring and Observability: Design and implement monitoring solutions to track the performance, availability, and health of various systems and services. Establish robust monitoring frameworks, set up alerts, and analyze system metrics to identify and resolve issues proactively.
  • Establish and align metrics, including SLAs, SLOs, and SLIs, to closely tie system performance to business objectives, ensuring that the site reliability engineering efforts support the overall goals and customer satisfaction.
  • Use AIOps techniques to automate Incident Management and Response. Develop and maintain automated incident response systems that can detect and mitigate issues automatically. This includes automated incident triaging, remediation, and escalation workflows to minimize manual intervention and improve response times.
  • Leverage the IT Service Management (ITSM) platform's capabilities to integrate monitoring into incident management, change management, and other operational processes, enhancing the efficiency and effectiveness of site reliability engineering practices.
  • Work closely with IT functional owners and SMEs.
  • Perform implementation, monitoring system administration, and integration functions.
  • Develop detailed designs and execute and troubleshoot strategic solutions in support of effective monitoring, alerting, escalation, automation, reporting, and event correlation.

Experience
  • 5 years of hands-on experience with enterprise monitoring solutions
  • Must possess knowledge of network switches, server hardware, storage, and virtualization technologies
  • Understanding of VMware Infrastructure
  • Experience working with a variety of monitoring systems such as Zabbix, vRealize Operations Manager, Nagios, and ScienceLogic
  • Experience and proficiency in integrating with ServiceNow or similar IT service management platforms.
  • Experience with managing automations within a monitoring environment.
  • Ability to provide guidance with design, maintenance, and improvements to enterprise level monitoring solutions.
  • Excellent verbal and written communication skills, including the ability to present complex ideas and designs to both technical and non-technical stakeholders.
  • Experience with design, implementation, and support of monitoring tools in a complex, multi-platform environment.
  • Strong understanding of monitoring requirements for storage, network, and compute servers.

With over 20 years in business, LightEdge offers a full stack of best-in-class IT services delivering flexibility, security, and control. Our solutions include premier colocation across seven purpose-built data centers spanning Des Moines, IA; Kansas City, MO; Omaha, NE; Austin, TX; and Raleigh, NC, along with industry-leading private Infrastructure as a Service (IaaS) and cloud platforms and top global security and compliance measures. Our owned and operated facilities, integrated DR solutions, and premium compliant cloud choices make up a true Hybrid Cloud Solution Center. LightEdge annually undergoes third-party audits for ISO 20000-1, ISO 27001, HIPAA, PCI-DSS 3.2, SSAE 18 SOC 1 Type II, SOC 2 Type II, and SOC 3.
