Vacancy expired!
Company Description
Join us and make YOUR mark on the World!
Are you interested in joining some of the brightest talent in the world to strengthen the United States' security? Come join Lawrence Livermore National Laboratory (LLNL) where our employees apply their expertise to create solutions for BIG ideas that make our world a better place.
We are committed to a diverse and equitable workforce with an inclusive culture that values and celebrates the diversity of our people, talents, ideas, experiences, and perspectives. This is essential to innovation and creativity for continued success of the Laboratory's mission.
Job Description
We have an opening for a High Performance Computing (HPC) Development Operations (DevOps) Storage Engineer. You will combine software development and systems engineering with a focus on extreme scale high performance storage to provide systems capable of storing billions of files and hundreds of petabytes. You will work with a small team of DevOps engineers and developers to help architect, deploy, and manage the High Performance Storage Systems (HPSS) that provide a reliable, massively-distributed, long-term archival system for storing our irreplaceable data. This position is in the Livermore Computing (LC) Division within the Computing Directorate.
This position will be filled at either level based on knowledge and related experience as assessed by the hiring team. Additional job responsibilities (outlined below) will be assigned if hired at the higher level.
In this role, you will
- Perform hardware/software deployments, upgrades, configuration, monitoring, management, performance tuning, and ongoing support of HPSS in LC production archives.
- Perform software design, development, testing, and deployment of HPSS client interfaces.
- Troubleshoot, determine root cause, and fix complex storage system issues in a team of technical staff having different levels and areas of expertise.
- Apply site reliability engineering/systems engineering practices to manage and improve one or more production aspects of HPSS and underlying storage architecture (e.g., ZFS).
- Develop and maintain tools and utilities that aid in the operation, automation, and reliability of software-based administrative tasks associated with LC production archives.
- Monitor and manage general system health, security incidents, and other archive events.
- Participate in installation of software releases, patching of the various subsystems, and third-party utilities with emphasis on overall system reliability, availability and serviceability.
- Provide 24/7 customer support as a member of a rotating call list in a fast-paced and mission-critical environment.
- Perform other duties as assigned.
Additional job responsibilities at the higher level
- Independently troubleshoot, determine root cause, and fix highly complex storage system issues that may involve interfacing with various technical staff across multiple organizations with differing levels of knowledge and expertise.
- Analyze and tune multiple aspects of archive service (e.g., database design, networks, large-scale disk and/or tape subsystems performance).
- Investigate, evaluate, test, and recommend technical solutions for future systems.
Qualifications
- Ability to secure and maintain a U.S. DOE Q-level security clearance which requires U.S. citizenship.
- Bachelor's degree in computer science or related field or the equivalent combination of education and related experience.
- Comprehensive skills performing Linux/UNIX or storage system administration, including software installations, updates and patching, configuration management, system security, networking, and/or storage allocation.
- Broad experience with software development using a high-level programming language (e.g., C/C++, Java, Python), and/or broad experience with system administration using shell scripting languages (e.g., Bash, Perl).
- Ability to engage with technical staff and end-users, requiring deep technical knowledge and critical thinking necessary to effectively work with members of the Scalable Storage Group, HPSS development community, other LC staff, LC end-users, and to represent the Laboratory publicly (e.g., user groups and technical conferences).
- Experience setting priorities and solving complex problems in a fast-paced, rapidly changing, customer-focused team environment with multiple competing priorities.
- Experience with software version control and configuration management systems, such as Git, Subversion, Ansible, and Puppet.
- Proficient verbal and written communication skills necessary to effectively collaborate in a team environment and present and explain technical information.
- Ability to work off-hours and on-call (intermittently either as needed or as part of a rotation).
- Significant experience with Linux/UNIX systems programming, large scale application debugging/testing techniques, and/or system administration in support of several independent but inter-related systems and software packages.
- Advanced knowledge of and significant experience providing innovative solutions to broadly defined tasks and problems.
- Advanced communication, interpersonal skills, and the ability to effectively interact with system developers and vendors with minimal direction.
- Master's degree in Computer Science or related field.
- Experience with high performance computing, large scale data centers, HPSS and/or other mass storage systems.
- Knowledge of one or more storage system components (e.g., Spectra Logic or Oracle robotics, Oracle/IBM tape drives, ZFS, QLogic HBAs, direct-attach fiber).
- Included in 2022 Best Places to Work by Glassdoor!
- Work for a premier innovative national Laboratory
- Comprehensive Benefits Package
- Flexible schedules (depending on project needs)
- Collaborative, creative, inclusive, and fun team environment