Job Details

ID #14857779
State Massachusetts
City Holliston
Job type Contract
Salary USD, depends on experience
Source Atlantic Partners
Showed 2021-06-01
Date 2021-05-21
Deadline 2021-07-20
Category Architect/engineer/CAD

Principal Digital Solution Architect

Holliston, Massachusetts 01746, USA

Vacancy expired!

Job Description: The HPC Engineer is responsible for designing, implementing, and supporting HPC services, including compute, network, and storage, on AWS. The engineer collaborates with the global research team to deliver healthy, reliable services that adhere to standards.

Responsibilities:
  • Architect a framework that is readily available and demonstrably easy to use. When designing new architecture, make build-versus-buy decisions and account for cost.
  • Work in coordination with other internal teams to ensure the infrastructure fully and effectively supports current and planned application systems.
  • Troubleshoot OS, networking, storage, and software issues, engaging internal teams for solutions as needed.
  • Deliver changes to the HPC production platforms according to the change control process, communicating with and seeking approval from business owners.
  • Perform network asset management, including maintaining the network component inventory and related documentation.
  • Develop tools to deploy, manage, monitor, and troubleshoot HPC systems at scale.
  • Maintain asset lists of all servers, applications, and licenses to ensure compliance.
  • Maintain security standards according to internal policies.
  • Execute the day-to-day activities of the Incident Management process.
  • Manage and respond to tickets/requests in accordance with SLA timeframes.

Required Experience:
  • 3-5 years of experience in High Performance Computing System Administration
  • Minimum of 2 years of customer-facing experience in HPC and AWS

Required Skills:
  • Working knowledge of the complete HPC stack for building high-availability infrastructure
  • Demonstrated experience with the AWS Cloud platform
  • Strong understanding of, and hands-on experience deploying and troubleshooting, compute, networking, storage, and database services on AWS
  • Experience designing and implementing HPC clusters using AWS ParallelCluster
  • Experience with DevOps tools such as Ansible Tower, Bitbucket, Terraform, and CloudFormation
  • Experience administering various Linux distributions, such as Red Hat, Amazon Linux, and CentOS
  • Team player willing to work in a 24x7 environment
  • Strong verbal and written communication skills are a must
  • Familiarity with Cloud platforms, products, and tools.
  • Experience with job schedulers like Grid Engine, LSF, PBS, SLURM, Torque, Symphony, TIBCO.
  • Experience with compilers and libraries such as GCC, MPI, and CUDA.
  • Experience with scripting (bash, Python, PowerShell, etc.).
  • Experience with file systems such as NFS, Lustre, and GPFS.
  • Experience installing and troubleshooting applications on CPU- and GPU-based HPC clusters.

Certifications (Desirable):
  1. AWS Administrator Associate or higher
  2. Linux Administration

Optional Skills:
  • Docker, Singularity, Kubernetes, and Google Cloud Platform will be a plus.
  • Knowledge of distributed computing
  • Ansible, Jira, Confluence, ServiceNow, Excel, and presentation skills
  • Experience building clusters from individual machines (rather than using a managed service such as EMR)

Education: BS or higher degree in Computer Science or an equivalent engineering discipline from a reputable college.
