Job Description

Principal / Director, AI Research – Reinforcement Learning for LLMs

We're hiring a Principal or Director-level AI Researcher with deep expertise in Reinforcement Learning and LLM post-training to join our growing AI research group. This is a research-first role, with a mandate to push the frontier of model alignment, safety, and performance — working with foundation models in real-world, high-stakes environments.

You won’t be handed toy problems or legacy systems. Instead, you’ll lead applied research efforts focused on tuning, aligning, and optimizing large models for privacy, security, and interpretability, in one of the few domains where LLMs operate at massive scale with measurable consequences.

What You’ll Work On:

This role centers on building and refining intelligent agents that interact with sensitive data and complex access controls, using modern reinforcement learning and post-training techniques:

Post-training of LLMs using RL: Design and run experiments with methods like PPO, DPO, RLAIF, and other fine-tuning strategies to align model behavior with security and privacy goals
RL for Self-Correction & Redaction: Enable models to iteratively improve their predictions on document classification, redaction, and identity resolution through self-rewarded feedback loops
Model Alignment & Safety: Contribute to the development of our “LLM Firewall” — filtering prompts/responses to prevent jailbreaking, data leakage, and adversarial exploits
Inference Stack & Optimization: Collaborate with engineers optimizing our in-house inference stack to make LLaMA-class models performant at scale

What We’re Looking For:

Demonstrated expertise in Reinforcement Learning applied to language models or decision-making agents
Strong understanding of post-training methodologies (e.g., RLHF, DPO, preference modeling, rejection sampling, offline RL)
Solid background in LLMs, token-level reasoning, and language modeling internals
Publication record or research contributions in top-tier venues (NeurIPS, ICLR, ICML, ACL, etc.) preferred
Ability to work independently and iterate quickly — experience in scrappy, high-output research environments a plus
Industry experience is not required — we care more about the depth of your research thinking and experimentation rigor

Why This Role:

Join a company with massive real-world data, impactful use cases, and a mature infrastructure
Avoid the grind of infra-focused roles — we’ve already solved those problems
Shape the next phase of LLM alignment, self-correcting models, and AI safety at inference time
Work on problems with technical depth and direct product impact

Job Tags

Part time

Principal Machine Learning Researcher (San Jose) Job at Alldus, San Jose, CA
