Language-Guided Preference Learning

Abstract

Expressive robotic behavior is essential for the widespread acceptance of robots in social environments. Recent advancements in learned legged locomotion controllers have enabled more dynamic and versatile robot behaviors. However, determining the optimal behavior for interactions with different users across varied scenarios remains a challenge. Current methods either rely on natural language input, which is efficient but low-resolution, or learn from human preferences, which, although high-resolution, is sample inefficient. This paper introduces a novel approach that leverages priors generated by pre-trained LLMs alongside the precision of preference learning. Our method, termed Language-Guided Preference Learning (LGPL), uses LLMs to generate initial behavior samples, which are then refined through preference-based feedback to learn behaviors that closely align with human expectations. Our core insight is that LLMs can guide the sampling process for preference learning, leading to a substantial improvement in sample efficiency. We demonstrate that LGPL can quickly learn accurate and expressive behaviors with as few as four queries, outperforming both purely language-parameterized models and traditional preference learning approaches.
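As a rough illustration of the idea in the abstract, the sketch below seeds a small candidate set (standing in for LLM-generated samples) and narrows it down with pairwise preference queries. It is a minimal sketch under stated assumptions, not the LGPL algorithm itself: the `Behavior` fields, the hard-coded candidates, the sequential-elimination strategy, and the simulated user are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Behavior:
    """Hypothetical behavior parameters (names are assumptions)."""
    velocity: float
    pitch: float
    gait: str  # "bound", "trot", or "pace"


def refine_by_preference(
    candidates: List[Behavior],
    prefers: Callable[[Behavior, Behavior], bool],
) -> Tuple[Behavior, int]:
    """Keep the winner of successive pairwise queries.

    Uses len(candidates) - 1 queries. This simple elimination scheme is an
    assumption for illustration, not the method described in the paper.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    best = candidates[0]
    queries = 0
    for challenger in candidates[1:]:
        queries += 1
        if prefers(challenger, best):
            best = challenger
    return best, queries


# Stand-in for samples an LLM might propose for "a happy dog" (assumed values).
llm_candidates = [
    Behavior(0.5, 0.3, "bound"),
    Behavior(0.25, 0.15, "trot"),
    Behavior(0.5, 0.0, "pace"),
    Behavior(0.0, 0.3, "bound"),
]


def simulated_user(a: Behavior, b: Behavior) -> bool:
    """Return True if `a` is preferred over `b`.

    Simulates a user whose hidden target is a fast bound with slight pitch.
    """
    target = Behavior(0.5, 0.15, "bound")

    def distance(x: Behavior) -> float:
        gait_penalty = 0.0 if x.gait == target.gait else 1.0
        return (abs(x.velocity - target.velocity)
                + abs(x.pitch - target.pitch)
                + gait_penalty)

    return distance(a) < distance(b)


best, n_queries = refine_by_preference(llm_candidates, simulated_user)
print(best, n_queries)
```

In this toy setting, the language prior does most of the work: because the candidates already cluster around plausible behaviors, only a handful of comparisons are needed to pick among them.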

L2R

After receiving semantic feedback, L2RF generates an inaccurate gait due to low-quality feedback from a non-expert user. One round of language feedback improved behaviors only 40 percent of the time.

Prompting

You are a dog foot contact pattern expert. Your job is to give a velocity, pitch, and a foot contact pattern based on the input. You will always give the output in the correct format no matter what the input is.

The following are description about gaits:
1. Trotting is a gait where two diagonally opposite legs strike the ground at the same time.
2. Pacing is a gait where the two legs on the left/right side of the body strike the ground at the same time. This gate is smoother and more controlled
3. Bounding is a gait where the two front/rear legs strike the ground at the same time. It has a longer suspension phase where all feet are off the ground, for example, for at least 25% of the cycle length. This gait is particularly emotive.

The following are rules for describing the velocity, pitch, and foot contact patterns:
4. You should first output the velocity, then the foot contact pattern. If necessary, also output a pitch pattern
5. There are 5 pitch to choose from: [-0.3, -0.15, 0.0, 0.15, 0.3]
6. There are five velocities to choose from: [-0.5, -0.25, 0.0, 0.25, 0.5].
7. A pattern is either bound, trot, or pace. Bound is [0,1,0] trot is [0,0,1] and pace is [1,0,0]

Input: Trot slowly
Output: 0.25 Gait: [1,0,0] Pitch:0

Input: Bound in place
Output: 0.0 Gait: [0,1,0] Pitch: 0

Input: Pace backward fast
Output: -0.5 Gait: [1,0,0] Pitch:0.0

Input: Walk forward and look up
Output: 0.5 Gait: [1,0,0] Pitch: 0.3

Please generate 4 gaits covering all possible behaviors for a happy dog
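A reply in the format requested by the prompt above could be parsed along the following lines. This is a minimal sketch, not code from the project: the names `GaitCommand` and `parse_reply` are hypothetical, the gait names follow rule 7 of the prompt, and a missing pitch is assumed to default to 0.0 because the prompt makes pitch optional.

```python
import re
from dataclasses import dataclass
from typing import List

# One-hot gait encodings, taken from rule 7 of the prompt.
GAIT_NAMES = {(0, 1, 0): "bound", (0, 0, 1): "trot", (1, 0, 0): "pace"}
# Discrete choices, taken from rules 5 and 6 of the prompt.
ALLOWED_VELOCITIES = (-0.5, -0.25, 0.0, 0.25, 0.5)
ALLOWED_PITCHES = (-0.3, -0.15, 0.0, 0.15, 0.3)

# Matches "Output: <velocity> Gait: [a,b,c]" with an optional "Pitch: <pitch>".
_PATTERN = re.compile(
    r"Output:\s*(?P<velocity>-?\d+(?:\.\d+)?)\s*"
    r"Gait:\s*\[(?P<gait>[01]\s*,\s*[01]\s*,\s*[01])\]"
    r"(?:\s*Pitch:\s*(?P<pitch>-?\d+(?:\.\d+)?))?"
)


@dataclass(frozen=True)
class GaitCommand:
    """Hypothetical container for one parsed behavior."""
    velocity: float
    gait: str
    pitch: float


def parse_reply(text: str) -> List[GaitCommand]:
    """Extract every well-formed behavior from an LLM reply.

    Raises ValueError if a matched value is outside the sets the prompt allows.
    """
    commands = []
    for match in _PATTERN.finditer(text):
        velocity = float(match.group("velocity"))
        one_hot = tuple(int(bit) for bit in match.group("gait").split(","))
        # Assumption: pitch defaults to 0.0 when the reply omits it.
        pitch = float(match.group("pitch")) if match.group("pitch") else 0.0
        if velocity not in ALLOWED_VELOCITIES:
            raise ValueError(f"velocity {velocity} is not an allowed value")
        if pitch not in ALLOWED_PITCHES:
            raise ValueError(f"pitch {pitch} is not an allowed value")
        if one_hot not in GAIT_NAMES:
            raise ValueError(f"gait {one_hot} is not a valid one-hot pattern")
        commands.append(GaitCommand(velocity, GAIT_NAMES[one_hot], pitch))
    return commands


reply = (
    "Output: 0.0 Gait: [0,1,0] Pitch: 0\n"
    "Output: -0.5 Gait: [1,0,0] Pitch:0.0\n"
    "Output: 0.25 Gait: [0,0,1]\n"
)
for command in parse_reply(reply):
    print(command)
```

Validating against the discrete value sets up front keeps malformed or out-of-range LLM output from reaching the locomotion controller, which matters when the reply seeds later preference queries.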

BibTeX

@article{clark2024lgpl,
  author  = {Clark, Jaden and Hejna, Joey and Sadigh, Dorsa},
  title   = {Efficiently Generating Expressive Quadruped Behaviors via Language-Guided Preference Learning},
  journal = {preprint},
  year    = {2024}
}

LGPL generates accurate, expressive behaviors from a single task-conditioned policy.
