Vacancy expired!
- Architect a framework that is readily available and easy to use; when evaluating new architecture, make build-vs-buy decisions and consider cost.
- Work in coordination with other internal teams to ensure the infrastructure fully and effectively supports current and planned application systems
- Troubleshoot OS, Networking, Storage, and Software issues while leveraging internal teams for solutions.
- Deliver changes to the HPC production platforms according to the change control process, communicating with and seeking approvals from business owners.
- Practice network asset management, including maintenance of network component inventory and related documentation
- Develop tools to deploy, manage, monitor, and troubleshoot HPC systems at scale.
- Maintain asset lists of all servers, applications, and licensing, ensuring compliance.
- Maintain security standards according to internal policies.
- Execute the day-to-day activities of the Incident Management process
- Manage and respond to tickets/requests in accordance with SLA timeframes.
- 3-5 years of experience in High Performance Computing System Administration
- Minimum of 2 years of customer-facing experience in HPC and AWS
- Working knowledge of the complete HPC stack for building high-availability infrastructure
- Demonstrated experience with the AWS Cloud platform
- Strong understanding of, and hands-on experience with, deploying and troubleshooting Compute, Networking, Storage, and Database services on AWS
- Experience designing and implementing HPC clusters using AWS ParallelCluster
- Experience with DevOps tools such as Ansible Tower, Bitbucket, Terraform, and CloudFormation
- Experience administering Linux across various distributions such as Red Hat, Amazon Linux, and CentOS
- Team player with willingness to work in 24x7 environment
- Strong verbal and written communications skills are a must
- Familiarity with Cloud platforms, products, and tools.
- Experience with job schedulers like Grid Engine, LSF, PBS, SLURM, Torque, Symphony, TIBCO.
- Experience with compilers and libraries such as GCC, MPI, and CUDA.
- Experience with scripting (bash, Python, PowerShell, etc.).
- Experience with filesystems such as NFS, Lustre, and GPFS.
- Experience installing and troubleshooting applications on CPU- and GPU-based HPC clusters.
- Experience with Docker, Singularity, Kubernetes, and Google Cloud Platform is a plus.
- Knowledge of distributed computing
- Familiarity with Ansible, Jira, Confluence, ServiceNow, and Excel; strong presentation skills
- Experience building clusters from individual machines (rather than a managed service such as EMR)