Job Details

ID #15167087
State Massachusetts
City Holliston
Job type Permanent
Salary USD Depends on Experience
Source Atlantic Partners
Showed 2021-06-06
Date 2021-05-21
Deadline 2021-07-20
Category Et cetera

HPC Architect

Holliston, Massachusetts 01746, USA

Vacancy expired!

Job Description: HPC Architect. This role is offered as a full-time or contract position.

The HPC Cloud Architect will administer high performance scientific computing platforms and infrastructure and support research projects.

Responsibilities:
  • Utilize your experience in multiple disciplines including high performance computing (HPC), cloud, architecture, design, network, security and systems to implement and provide advanced system engineering services to customers.
  • Manage, administer and support daily operation of computing systems both onsite and in the cloud.
  • Design, implement and maintain scalable High-Availability (HA) and Fault-Tolerant (FT) computing systems.
  • Follow cloud computing best practices by using Amazon Virtual Private Cloud (VPC), Amazon Elastic Compute Cloud (EC2) and other advanced cloud features.
  • Investigate and provide technical options to managers and researchers for selecting effective computing solutions based on requirements.
  • Architect a framework that is more readily available and easier to use. When evaluating new architecture, make build-vs-buy decisions and consider cost.
  • Work in coordination with other internal teams to ensure the infrastructure fully and effectively supports current and planned application systems.
  • Troubleshoot OS, Networking, Storage, and Software issues while leveraging internal teams for solutions.
  • Deliver changes to the HPC production platforms according to the change control process, communicating with and seeking approval from business owners.
  • Practice network asset management, including maintenance of network component inventory and related documentation.
  • Develop tools to deploy, manage, monitor, and troubleshoot HPC systems at scale.
  • Maintain asset lists of all servers, applications and licensing to ensure compliance.
  • Maintain security standards according to internal policies.
  • Execute the day-to-day activities of the Incident Management process.
  • Manage and respond to tickets/requests in accordance with SLA timeframes.

Required Experience:
  • 8-10 years of hands-on systems administration/engineering experience with Linux.
  • Experience with high performance computing systems in Engineering, Manufacturing or Financial Services; experience in Life Sciences will be an added advantage.
  • Minimum three years with Amazon Web Services (AWS) cloud computing.
  • Extensive administration experience in GPU-based platforms.
  • Excellent written and oral communication skills and ability to work with people at every level.

Required Skills:
  • Demonstrated experience in optimizing computing performance and measurement.
  • Comprehensive knowledge of security compliance and security control.
  • Proficiency in shell scripting, Ruby, Perl or Python.
  • Excellent organization and time management skills and ability to identify priorities to accomplish a variety of tasks simultaneously.
  • Comprehensive knowledge of the Configuration Management (CM) process and software development tools such as Git, GitLab, Nexus, Jenkins, Maven or JIRA.
  • Working knowledge of HPC schedulers, distributed/parallel file systems, underlying IT systems, and the HPC development process, including high-throughput and tightly coupled approaches.
  • Knowledge of statistics, numeric modeling, data analysis and machine learning.
  • AWS certification at Professional level.
  • Experience with cloud CLI and SDK.
  • An understanding of the cloud computing delivery model as it relates to HPC.
  • Knowledge of the underlying infrastructure requirements such as Networking, Storage, and Hardware Optimization.
  • Experience in a customer-facing, sales-aligned role such as consultant, solutions engineer or solutions architect.
  • Track record of implementing AWS services in a variety of business environments such as large enterprises and start-ups.
  • AWS Certification, e.g., AWS Solutions Architect Associate.
  • Understanding of application, server, and network security.
  • Experience with DevOps tools such as Ansible Tower, Bitbucket, Terraform, and CloudFormation.
  • Experience in Linux administration across distributions such as Red Hat, Amazon Linux and CentOS.
  • Experience with job schedulers like Grid Engine, LSF, PBS, SLURM, Torque, Symphony, TIBCO.
  • Experience with compilers and libraries such as MPI, GCC and CUDA.
  • Experience with scripting (bash, Python, PowerShell, etc.).
  • Experience with file systems such as NFS, Lustre and GPFS.
  • Experience with application installation and troubleshooting on CPU- and GPU-based HPC clusters.

Certifications (Desirable):
  • AWS Administrator Professional or higher
  • Linux Administration

Optional Skills:
  • Experience with Docker, Singularity, Kubernetes or Google Cloud Platform is a plus.
  • Knowledge of distributed computing
  • Ansible, Jira, Confluence, ServiceNow, Excel, Presentation Skills
  • Experience building clusters from individual machines (not a managed service such as EMR)
