Vacancy expired!
Company Description
Join us and make YOUR mark on the World!
Are you interested in joining some of the brightest talent in the world to strengthen the United States' security? Come join Lawrence Livermore National Laboratory (LLNL), where our employees apply their expertise to create solutions for BIG ideas that make our world a better place.
We are committed to a diverse and equitable workforce with an inclusive culture that values and celebrates the diversity of our people, talents, ideas, experiences, and perspectives. This is essential to innovation and creativity for continued success of the Laboratory's mission.
Job Description
We have an opening for a High Performance Computer (HPC) System Engineer to support one of the largest supercomputer centers in the world. You will work in a challenging and team-oriented environment supporting Livermore Computing's (LC) high performance computing clusters. You will apply fundamental knowledge of HPC systems and contribute to technical projects using creativity and imagination. This position is in the Livermore Computing Division within the Computing Directorate.
This position will be filled at either level based on knowledge and related experience as assessed by the hiring team. Additional job responsibilities (outlined below) will be assigned if hired at the higher level.
In this role you will
- Administer and deploy multiple Linux-based HPC, infrastructure, and parallel file system servers and clusters.
- Contribute to the deployment, configuration, and management of high-speed cluster fabrics for computer and storage networks.
- Conduct installations of software releases, patches of the operating system, and third-party utilities with emphasis on overall system security.
- Improve the quality of service for end users by working with system administrators, Hotline, and Operations staff.
- Analyze, diagnose, and respond to system problems and user questions in person, via email and trouble ticket system, while collaborating with other team members.
- Troubleshoot and determine root cause of system issues with limited complexity.
- Perform other duties as assigned.
- Manage and deploy multiple RAID controllers and disk enclosure systems.
- Analyze performance and implement moderately complex strategies to improve the operation and efficiency of the computer, network, file system, and disk sub-systems.
- Develop and maintain programs and scripts that aid in the operation and automation of administrative tasks.
- Ability to secure and maintain a U.S. DOE Q-level security clearance which requires U.S. citizenship.
- Bachelor's degree in Computer Science or related field or the equivalent combination of education and related experience.
- Fundamental experience with Linux/Unix systems including installation, configuration, networking, backups, updates and patching, and system security.
- Understanding of programming and scripting languages, such as C/C++, Java, Perl, Python, and bash/csh/ksh.
- Experience with version control and configuration management systems, such as git, Ansible, and cfengine.
- Experience with disk and storage systems, such as host-based RAID controllers, software RAID, and vendor RAID systems.
- Experience with virtualization environments, such as KVM and VMware vSphere ESXi 6.7/7.x.
- Experience using and developing solutions and plugins for host monitoring systems, such as Splunk or Nagios.
- Sufficient communication and interpersonal skills necessary to effectively work with members of the system administration group, application developers, LC staff, and end users.
- Ability to serve on a rotating off-hours on-call list.
- Comprehensive knowledge of HPC environments and HPC technologies, such as InfiniBand, Slurm, and Lustre.
- Experience with local, parallel, and distributed file systems, such as Ext4, XFS, NFS, ZFS, Lustre, and GPFS.
- Ability to work with limited direction in a dynamic environment with competing priorities.
- Master's degree in Computer Science or related field.
- Experience developing software with C/C++ or Python within Linux or UNIX environments.
- Experience with container technologies (e.g., Singularity, Docker, Podman) and Kubernetes.
- Included in 2022 Best Places to Work by Glassdoor!
- Work for a premier innovative national Laboratory
- Comprehensive Benefits Package
- Flexible schedules (depending on project needs)
- Collaborative, creative, inclusive, and fun team environment