Job details

QUALIFICATIONS

Job title: AI Lab - LLM Applied Evaluation and Benchmark Intern

The Cummins AI Lab is seeking undergraduate or graduate students to participate in the evaluation and benchmarking of Large Language Model (LLM) applications. The role focuses on testing, analyzing, and improving LLM-based applications such as Chatbot and ChatBI systems in automotive and industrial domains.

Job summary:

• Design and execute evaluation frameworks for LLM-based applications
• Build and maintain benchmarking datasets and evaluation metrics for LLM applications
• Test and validate AI Agent-based applications in real-world scenarios
• Analyze and improve the quality, reliability, and robustness of LLM systems

Key responsibilities:

• Design and execute LLM application evaluation: Develop systematic evaluation plans for Chatbot and ChatBI applications, covering functional, logical, and edge-case testing
• Build benchmarking datasets and metrics: Construct evaluation datasets and define metrics (e.g., accuracy, robustness, consistency, hallucination rate) to assess model performance
• Develop and test AI Agents: Participate in building and testing Agent-based applications, validating multi-step reasoning and tool-use capabilities
• Perform model output evaluation: Conduct both human evaluation and automated evaluation (e.g., LLM-as-a-judge) to assess response quality
• Analyze model behavior and issues: Identify problems such as hallucinations, logical inconsistencies, and bias, and provide actionable recommendations for improvement
• Data processing and desensitization: Perform data cleaning, anonymization, and masking to ensure compliance with data privacy requirements
• Write evaluation reports: Document evaluation methodologies, findings, and recommendations, and communicate insights to stakeholders
• Present findings to team members and stakeholders throughout the internship and at its conclusion

Qualifications and competencies:

• Undergraduate or graduate student in Computer Science, Data Science, Artificial Intelligence, or a related discipline
• Familiarity with software testing methodologies: Understanding of test case design, boundary testing, and logical validation
• Strong analytical and logical thinking skills: Ability to design structured evaluation scenarios and identify system weaknesses
• Familiarity with Large Language Model (LLM) concepts: Understanding of Prompt Engineering, RAG, Agent frameworks, etc.
• Experience with Chatbot or data Q&A systems is preferred
• Familiarity with Python or another programming language: Ability to perform basic data processing and analysis
• Understanding of data privacy and desensitization techniques (e.g., anonymization, masking)
• Familiarity with AI frameworks/tools (e.g., LangChain, LlamaIndex, OpenAI API) is a plus
• Experience with SQL or data analysis is a plus (especially for ChatBI scenarios)
• Strong learning ability, problem-solving skills, and team spirit

Job: Systems/Information Technology

Organization: Cummins Inc.

Role Category: On-site with Flexibility

Job Type: Student - Internship

ReqID: 2427151

Relocation Package: No

100% On-Site: Yes

Due to the operational nature and specific job duties of this role, work is required to be completed 100% in person/On-site.
