Are you ready to get ahead in your career?
Why does this job exist and why is it critical?
The IT Surveillance, Incident Management, and Business Continuity Specialist is responsible for monitoring and leading incident management events, including coordinating with relevant teams, declaring the service level/impact, and communicating with stakeholders through timely updates on incident status, resolution, and postmortem review. The role focuses on ensuring the integrity, availability, and confidentiality of critical systems and data. It also includes developing, maintaining, and testing the Business Continuity Plan (BCP) for smooth recovery from disruptions.
What are you accountable for?
Continuously monitor IT systems, applications, and networks to detect irregularities or threats.
Utilize monitoring tools (e.g., BMC, log management, network monitoring tools) to analyze system behavior and identify security or performance issues.
Investigate alerts, logs, and system anomalies to determine their impact and take appropriate action.
Generate and review reports related to system performance, availability, and security events.
Respond to and manage IT incidents, including system outages, security breaches, and application failures.
Coordinate cross-functional teams to troubleshoot, resolve, and mitigate incidents in a timely manner.
Maintain an incident management process and ensure documentation of incident reports and post-incident reviews (PIRs).
Escalate critical incidents to senior management and provide status updates during the lifecycle of the incident.
Conduct root cause analysis (RCA) for major incidents and implement corrective actions to prevent recurrence.
Ensure that Service Level Agreements (SLAs) are met during incident resolution.
Develop and maintain the organization's Business Continuity Plan (BCP), ensuring it is aligned with business priorities.
Conduct regular risk assessments and impact analysis to identify potential threats to IT operations.
Design strategies for disaster recovery, data backups, and redundancy to minimize downtime and data loss.
Test and update the BCP regularly, ensuring that all stakeholders are familiar with the procedures.
Lead BCP simulations and drills to ensure the readiness of the organization in case of emergencies.
Collaborate with key business units to ensure that their continuity requirements are met.
Identify potential IT risks (security, operational, or environmental) and develop mitigation strategies.
Work closely with cybersecurity teams to prevent and mitigate security threats such as malware, phishing attacks, and data breaches.
Maintain compliance with relevant IT and business continuity standards (e.g., ISO 27001, ISO 22301).
Proactive Issue Resolution: Implement automation for common tasks (e.g., password resets, access requests) and provide self-service options through a portal. This reduces waiting time and empowers users to solve minor issues independently.
AI-Driven Insights: Use analytics to predict common issues before they occur, allowing the service desk to proactively reach out to users or prepare knowledge base articles that address these concerns.
Comprehensive Documentation: Maintain a well-organized knowledge base that is regularly updated and easy to navigate, with guides, video tutorials, and FAQs.
User Training Sessions: Conduct periodic training sessions for end users to familiarize them with IT tools, cybersecurity best practices, and self-service options.
What do you need to have for the role?
Bachelor's degree in Information Technology, Computer Science, or related field.
10-15 years of total experience in IT Operations, Incident Management, or Business Continuity Planning.
Experience in incident response, disaster recovery, or security operations is highly preferred.
Strong understanding of IT infrastructure (servers, networks, databases, cloud services).
Familiarity with monitoring tools (e.g., BMC, SolarWinds, Nagios, Splunk).
Knowledge of incident management tools (e.g., BMC, Jira, PagerDuty).
Understanding of cybersecurity principles and common IT risks.
Experience with disaster recovery technologies (e.g., backups, data replication).
Strong troubleshooting skills with the ability to quickly diagnose issues.
Capable of conducting root cause analysis and implementing effective resolutions.
Excellent communication skills, both written and verbal.
Ability to communicate complex technical issues to non-technical stakeholders.
Strong leadership skills to manage teams during incident resolution or BCP drills.
Meticulous in monitoring, documentation, and implementation of risk management processes.
What's next?
Maxis values diverse voices & people. We hire and reward our employees based on capability & performance - regardless of ethnicity, gender, age, education, religion, nationality or physical ability.
Date Posted: 27/10/2024
Job ID: 98212793