About the Role

This role is part of a structured AI evaluation initiative focused on improving the accuracy, reasoning, and reliability of conversational systems in data science and software engineering. The work targets how models generate, interpret, and explain code, particularly in statistical computing and analytical workflows.

This opportunity is ideal for experienced R practitioners with strong analytical thinking and a deep understanding of statistical programming, code correctness, and algorithmic reasoning. It suits individuals who can independently validate outputs and critically assess technical explanations.

The work involves reviewing AI-generated responses, executing and validating R code, and providing structured feedback on correctness and clarity. Attention to detail and consistency are critical to improving model performance.

What You'll Do

Evaluate AI-generated responses to coding and data science queries
Execute and validate R code to ensure accuracy and correctness
Identify logical flaws, statistical errors, and edge-case failures
Annotate outputs with precise feedback on strengths and weaknesses
Assess code quality, readability, and analytical soundness
Perform fact-checking using reliable technical and statistical references
Apply standardized evaluation frameworks and scoring criteria
Ensure outputs align with expected technical and conversational standards

Requirements

5+ years of professional experience in software engineering, data science, or related fields
Strong expertise in R programming language
Ability to independently solve medium-to-hard algorithmic and analytical problems
Experience executing, testing, and debugging data-driven code
Strong foundation in statistics, data analysis, and algorithm design
High attention to detail in reviewing technical reasoning and outputs
Fluent English communication skills
Experience using LLMs in analytical or coding workflows and understanding their limitations
Ability to follow structured evaluation guidelines and frameworks
Bachelor’s degree or higher in Computer Science, Data Science, Statistics, or related discipline
A track record of accepted contributions to open-source projects
Familiarity with additional programming languages or data ecosystems (preferred)
Experience in model evaluation, RLHF, or data annotation (preferred)
Background in competitive programming or technical problem-solving (preferred)
Experience reviewing code in production or analytical environments (preferred)
Ability to communicate complex statistical or technical concepts clearly (preferred)

Senior R Software Engineer (AI Evaluation)
