Vacancy expired!
- Utilize your experience in multiple disciplines including high performance computing (HPC), cloud, architecture, design, network, security and systems to implement and provide advanced system engineering services to customers.
- Manage, administer and support daily operation of computing systems both onsite and in the cloud.
- Design, implement and maintain scalable High-Availability (HA) and Fault-Tolerant (FT) computing systems.
- Follow cloud computing best practices by utilizing Amazon Virtual Private Cloud (VPC), Amazon Elastic Compute Cloud (EC2) and other advanced cloud features.
- Investigate and provide technical options to managers and researchers for selecting effective computing solutions based on requirements.
- Architect a framework that is more readily available and easier to use. When evaluating a new architecture, make build-versus-buy decisions and consider cost.
- Work in coordination with other internal teams to ensure the infrastructure fully and effectively supports current and planned application systems.
- Troubleshoot OS, Networking, Storage, and Software issues while leveraging internal teams for solutions.
- Deliver changes to the HPC production platforms according to the change control process, communicating with business owners and seeking their approval.
- Practice network asset management, including maintenance of network component inventory and related documentation
- Develop tools to deploy, manage, monitor, and troubleshoot HPC systems at scale.
- Maintain asset lists of all servers, applications and licenses to ensure compliance.
- Maintain security standards according to internal policies.
- Execute the day-to-day activities of the Incident Management process
- Manage and respond to tickets/requests in accordance with SLA timeframes.
- 8-10 years of hands-on systems administration/engineering experience with Linux.
- Experience with high performance computing systems in Life Sciences, Engineering, Manufacturing or Financial Services will be an added advantage.
- Minimum three years with Amazon Web Services (AWS) cloud computing.
- Extensive administration experience in GPU-based platforms.
- Excellent written and oral communication skills and ability to work with people at every level.
- Demonstrated experience in measuring and optimizing computing performance.
- Comprehensive knowledge of security compliance and security control.
- Proficiency in shell scripting, Ruby, Perl or Python.
- Excellent organization and time management skills and ability to identify priorities to accomplish a variety of tasks simultaneously.
- Comprehensive knowledge of the Configuration Management (CM) process and software development tools such as Git, GitLab, Nexus, Jenkins, Maven or JIRA.
- Working knowledge of HPC schedulers, distributed/parallel file systems, underlying IT systems, and the HPC development process, including high-throughput and tightly coupled approaches.
- Knowledge of statistics, numerical modeling, data analysis and machine learning.
- AWS certification at Professional level.
- Experience with cloud CLI and SDK.
- An understanding of the cloud computing delivery model as it relates to HPC
- Knowledge of the underlying infrastructure requirements such as Networking, Storage, and Hardware Optimization.
- Experience in a customer-facing, sales-aligned role such as consultant, solutions engineer or solutions architect
- Track record of implementing AWS services in a variety of business environments such as large enterprises and start-ups.
- AWS certification, e.g. AWS Solutions Architect Associate
- Understanding of application, server, and network security
- Experience with DevOps tools such as Ansible Tower, Bitbucket, Terraform and CloudFormation.
- Experience in Linux administration across distributions such as Red Hat, Amazon Linux and CentOS.
- Experience with job schedulers like Grid Engine, LSF, PBS, SLURM, Torque, Symphony, TIBCO.
- Experience with compilers and libraries such as MPI, GCC and CUDA.
- Experience with scripting (bash, Python, PowerShell, etc.).
- Experience with file systems such as NFS and Lustre/GPFS.
- Experience in application installation and troubleshooting on CPU- and GPU-based HPC clusters.
- AWS Administrator, Professional level or above
- Linux Administration
- Experience with Docker, Singularity, Kubernetes or Google Cloud Platform will be a plus.
- Knowledge of distributed computing
- Ansible, Jira, Confluence, ServiceNow, Excel, presentation skills
- Experience building clusters from individual machines (not a managed service such as EMR).