SoC RAS Design Tech Lead, Machine Learning Accelerators
Job Details
Minimum qualifications:
- Bachelor's degree in Electrical Engineering, Computer Engineering, Computer Science, a related field, or equivalent practical experience.
- 10 years of experience with industry-standard tools, languages and methodologies relevant to the development of silicon-based ICs and chips.
- 3 years of experience working with system and hardware teams to define RAS requirements and architecture.
- Experience in computer architecture and logic design, and in leading block-level or subsystem-level RTL development.
Preferred qualifications:
- Master's degree or PhD in Electrical Engineering, Computer Engineering or Computer Science, with an emphasis on computer architecture, or a related field.
- 12 years of experience in SOC architecture and design, including 6 years of experience architecting and designing RAS features.
- Experience with SOC subsystem-level logic redundancy design and test architecture.
- Understanding of circuit-level SER (Soft Error Rate) modeling, measurement and mitigation techniques.
- Understanding of error coding techniques and experience designing ECC implementations.
- Understanding of SDC, DUE and DCE, and associated metrics, analysis and calculations.
In this role, you will join a team building SOC designs for our data center accelerators. As a RAS (Reliability, Availability and Serviceability) SOC Design Technical Lead, you will own and lead the requirement definition, architecture, microarchitecture and development of the SOC RAS features. This is a highly cross-functional role that requires a high level of coordination and co-design with our platform and system hardware counterparts. You will have experience in RAS, computer architecture and logic design, and an aptitude for leading multi-faceted efforts involving many stakeholders.
Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.
Responsibilities:
- Define the architecture and microarchitecture of RAS features of TPU SOCs.
- Lead the design and implementation of the RAS features.
- Collaborate with the Platform team to co-design the SOC-level RAS requirements.
- Set the DCE (Detectable and Correctable Errors), DUE (Detected but Unrecoverable Errors), SDC (Silent Data Corruption) and DPPM (Defective Parts Per Million) goals for TPUs.