Operation & Maintenance Development Engineer (SRE)

Aethir

Exp: 0-2 Years

Malaysia, Kuala Lumpur

Job Description

Aethir is the only enterprise-grade, AI-focused GPU-as-a-service provider in the market. Its decentralized cloud computing infrastructure connects GPU providers (containers) with enterprise clients who need powerful GPU chips for professional AI/ML tasks. Thanks to a constantly growing network of over 40,000 top-shelf GPUs, including 3,000 NVIDIA H100s, Aethir is able to provide enterprise-grade GPU computing wherever it's needed, at scale.

Backed by leading Web3 investors like Framework Ventures, Merit Circle, Hashkey, Animoca Brands, Sanctor Capital, Infinity Ventures Crypto (IVC), and others, with over $130M in funds raised for the ecosystem, Aethir is paving the way for the future of decentralized computing.

We are looking for an Operations and Maintenance Development Engineer (SRE) to join our new headquarters in Kuala Lumpur, Malaysia. The engineer will play a critical role in monitoring, troubleshooting, and optimizing our production system to ensure the highest levels of performance and stability for our AI and gaming customers worldwide.

Responsibilities

Monitor, Review, and Respond to Faults: Monitor and review the production system, respond to faults, troubleshoot and resolve them, and subsequently optimize the system
System Architecture and Performance: Continuously monitor and review the system architecture, process logic, system performance, stability, and other technical areas and indicators to ensure they remain sound
Coordination with Business Team: Drive the business team in resolving any issues related to operations and maintenance
Production Failure Response: Respond promptly to production failures, acting as the overall coordinator for resolution
Collaborative Problem-Solving: Organize relevant R&D, operations and maintenance, and product teams to collaboratively investigate and resolve problems
Failure Response Time: Own the failure response time and resolution time, ensuring timely resolution of issues
Case Studies and Optimization: Conduct case studies on production issues and follow up with optimizations to improve system performance and stability
Documentation: Maintain comprehensive documentation of system architecture, processes, and troubleshooting procedures
Continuous Improvement: Identify areas for improvement in the operations and maintenance processes and implement necessary changes

Requirements

Bachelor's degree in Computer Science, Engineering, or related field
Experience in operations and maintenance development, preferably in a cloud computing or AI-focused environment
Strong understanding of system architecture, performance monitoring, and troubleshooting methodologies
Excellent communication and collaboration skills
Ability to work in a fast-paced, startup environment

More Info

Industry: Other

Function: Cloud Computing

Job Type: Permanent Job

Skills Required

Troubleshooting Methodologies

Performance Monitoring

System Architecture

Cloud Computing

Date Posted: 29/05/2024

Job ID: 80248007

About Company

Aethir

Job Source: www.linkedin.com

Last Updated: 23-11-2024 06:54:57 PM
