Job Description
TO BE CONSIDERED FOR THIS POSITION YOU MUST HAVE AN ACTIVE TS/SCI W/ POLYGRAPH SECURITY CLEARANCE (U.S. CITIZENSHIP REQUIRED).
This opening is for a Site Reliability Engineer 3 with development and system administration experience on large systems. The right candidate can use that experience to formulate and implement automation solutions that support our monitoring and system administration teams. The target tasks are risky to the system, error-prone, labor-intensive, time-consuming, and/or repetitive. They may include tasks for which a standard operating procedure (SOP) already exists, or could be written, but is unlikely to be followed consistently. The goal is to create sustainable tools that act as a force multiplier and perform at least as well as the manual methods they replace. Experience with the pros and cons of tools such as Salt and Puppet will be useful for some tasks. Other tasks will require development skills, for example building a GUI that lets the shift perform tasks on the clusters, or automating those tasks entirely.
The Site Reliability Engineer provides software development/engineering support, including requirements analysis, software development, installation, integration, evaluation, enhancement, maintenance, testing, and problem diagnosis/resolution. The role also supports highly distributed, massively parallel computation platforms such as HBase, Hadoop, Accumulo, Bigtable, Cassandra, and Scality.
Cloud Systems Administrator or Developer Certification.
A Bachelor's degree in Computer Science or a related technical field is highly desired and will be considered equivalent to two (2) years of experience. A Master's degree in a technical field will be considered equivalent to four (4) years of experience. NOTE: A degree in Mathematics, Information Systems, Engineering, or a similar discipline will be considered a technical field.
•Ten (10) years of demonstrated experience developing software for UNIX or Linux operating systems.
•Knowledge of and experience with developing distributed storage, routing, and querying algorithms.
•Experience developing the documentation needed to support a program's technical issues and training.
•Ten (10) years of experience developing software systems using object-oriented programming languages (e.g., Java, Python).
•Experience developing solutions integrating and extending COTS products.
•Demonstrated knowledge of analytical needs and requirements, query syntax, data flows, and traffic manipulation.
•Ten (10) years of experience developing system performance, availability, scalability, manageability, and security requirements for mid- to large-scale programs.
•Experience designing, developing, testing, evaluating, and integrating information systems into a service-oriented environment.
•Experience optimizing storage, retrieval, backup, and retention strategies for globally distributed, high-throughput text and multimedia storage in clustered or cloud environments.
•Experience working in a multithreaded environment.
•Experience debugging and troubleshooting complex software in a cloud environment.
•Familiarity with Configuration Management and monitoring tools.
•Familiarity with Agile software methodologies and practices.
•Significant experience provisioning and sustaining network infrastructure, and experience developing, operating, and managing networks in a secure PKI-, IPsec-, or VPN-enabled environment.
- Shall have fourteen (14) years of experience in software development/engineering, including requirements analysis, software development, installation, integration, evaluation, enhancement, maintenance, testing, and problem diagnosis/resolution.
- Shall have ten (10) years of experience in systems engineering/architecture.
- Shall have ten (10) years of experience working with products that support highly distributed, massively parallel computation needs, such as HBase, Hadoop, Accumulo, Bigtable, Cassandra, and Scality.
- At least ten (10) years of experience writing scripts in languages such as Perl, Python, or Ruby for software automation.
- At least four (4) years of experience managing and monitoring large cloud systems (more than 1,000 nodes).
- Experience performing and providing technical direction for the development, engineering, interfacing, integration, and testing of complete hardware/software systems, including monitoring a system's technical health, improving organizational processes, and implementing postmortem (failure) analysis and incident management.
Job Tags
Shift work