Production System Engineer, Infrastructure Engineering

ByteDance

Singapore
  • Job Type: Full-Time
  • Function: IT
  • Post Date: 06/21/2025
  • Website: bytedance.com
  • Company Address: Beijing, China, 100098

About ByteDance

ByteDance is a global incubator of platforms at the cutting edge of commerce, content, entertainment and enterprise services - over 2.5bn people interact with ByteDance products including TikTok.

Job Description

The Infrastructure Engineering team supports the company's fast growth by building and operating hyperscale datacenters. The team manages the end-to-end lifecycle of the server fleet and provides cloud solutions and infrastructure services that are scalable and reliable.
ByteDance is expanding rapidly in the United States, Europe, and Asia, and the Infrastructure Engineering team is building data centers around the world that house hundreds of thousands of servers. As a production systems engineer, you will manage the full life cycle of these servers: initial deployment, OS installation, service provisioning, and inventory management; troubleshooting and repair when faults occur; and, at end of life, decommissioning and recycling.
 
Key Responsibilities:
 
- Operation: Enhance the stability, efficiency, effectiveness, and scalability of our data center and server operations, platforms, and services worldwide.
- Lifecycle Enhancement: Participate in and enhance the entire lifecycle of the server fleet - from system design/introduction consultation to launch reviews, deployment, operation, and retirement.
- Automation: Develop and deploy tools and solutions to enhance the automation, reliability, scalability, and operability of servers in the datacenter.
- Monitoring: Develop and deploy tools and solutions that monitor and improve the availability, latency, and overall health of datacenter infrastructure, servers, and networks.
- Disaster Recovery: Troubleshoot and resolve complex technical issues in a high-pressure, time-sensitive environment. Conduct high-level root-cause analysis of service interruptions and establish preventive measures. Practice sustainable incident response and postmortems.
- Cross-team Collaboration: Collaborate with stakeholders such as infrastructure architects, project managers, data center operations engineers, platform developers, supply chain teams, and our internal customers to comprehend overarching business objectives. Additionally, you will have the chance to design and implement innovative solutions for our Core IDCs and CDN/Edge.
- On-call: Participate in on-call support across regions and incident response teams to address critical issues in the production environment.
 
Qualifications
 
Minimum Qualifications
 
- Education: Bachelor's degree in Computer Science, Electronic Engineering, or a related technical field, or equivalent practical experience.
- Experience: A minimum of 3 years of experience in at least one of the areas below:
  - Server Operations: Demonstrated proficiency in Linux system administration, with an in-depth understanding of Linux kernels, drivers, and modules. Able to script in Bash and Python to automate routine system operations, including system configuration, performance tuning, and security management.
    - In-depth understanding of server hardware and the ability to troubleshoot and diagnose hardware issues. 3+ years of experience participating in the planning, delivery, and operation of large-scale data centers in different countries.
  - Tooling Adaptation, Deployment, and Maintenance: Proficient in customizing operation and maintenance tools to meet the specific demands of new server hardware. This includes monitoring server performance, provisioning resources, handling faults promptly, and carrying out repairs to keep new server hardware running smoothly.
    - 3+ years of experience developing and maintaining hardware, network, or service monitoring software for more than 10,000 servers.
 
Preferred Qualifications
 
- Data Center: An intermediate level of expertise is preferred. We are looking for individuals who are proficient in areas ranging from OS installations and break-fix operations to larger projects such as planning and operations (covering the entire infrastructure lifecycle), as well as new design-build or retrofit work on existing systems.
  - Proficiency in the operation and maintenance of GPU servers is strongly preferred.
- Full Stack Software Development: We are actively looking for individuals proficient in full stack software development. Ideal candidates have the following preferred skills:
  - Ability to create and integrate RESTful APIs, including using Flask for Python-based back-end development to build robust API endpoints.
  - A thorough understanding of JavaScript and the ability to use it, along with Node.js, for both front-end and back-end development.
  - Proficiency in SQL for database management, including designing schemas, writing queries, and ensuring data integrity; familiarity with Redis.
  - Experience with Ansible for configuration management, application deployment, and task execution.
 
About Us
 
Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, CapCut and Pico as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.
 
Why Join ByteDance
 
Inspiring creativity is at the core of ByteDance's mission. Our innovative products are built to help people authentically express themselves, discover and connect – and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity and enrich life - a mission we work towards every day.
 
As ByteDancers, we strive to do great things with great people. We lead with curiosity, humility, and a desire to make impact in a rapidly growing tech company. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our Company, and our users. When we create and grow together, the possibilities are limitless. Join us.
 
Diversity & Inclusion
 
ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

Disclaimer: Local Candidates Only
This company does NOT accept candidates from outside recruiting firms. Agency contacts are not welcome.