Reinforcement Learning From Human Ratings

dc.contributor.advisor: Cao, Yongcan
dc.contributor.author: White, Devin
dc.contributor.committeeMember: Jin, Yufang
dc.contributor.committeeMember: Kudithipudi, Dhireesha
dc.description.abstract: Reinforcement learning (RL), an important subject in artificial intelligence, focuses on learning control policies for tasks ranging from Atari games to autonomous driving and large language models. Deriving control policies via RL usually requires that reward functions be designed beforehand. Because a reward function, which maps states or state-action pairs to quantitative reward values, must reflect both the task objectives and the underlying environment, designing one is often difficult and costly. An alternative is to learn reward functions from human guidance. In this thesis, we develop a new rating-based reinforcement learning approach that uses human ratings to learn reward functions. Unlike existing human guidance methods, namely the preference-based and ranking-based reward learning paradigms, which rely on human relative preferences over sample pairs, the proposed approach relies on human evaluation of individual trajectories, without relative comparisons between sample pairs. It builds on a new prediction model for human ratings and a novel multi-class loss function to learn the reward functions, which are then used to learn RL-based control policies. We conduct experimental studies with both synthetic ratings and real human ratings to evaluate the effectiveness and benefits of the new approach.
dc.description.department: Electrical and Computer Engineering
dc.format.extent: 1 electronic resource (43 pages)
dc.subject: Artificial Intelligence
dc.subject: Human Guided Reinforcement Learning
dc.subject: Machine Learning
dc.subject: Reinforcement Learning
dc.subject.classification: Artificial intelligence
dc.subject.classification: Computer engineering
dc.subject.classification: Computer science
dc.title: Reinforcement Learning From Human Ratings
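dc.type.dcmi: Text
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.grantor: University of Texas at San Antonio
thesis.degree.name: Master of Science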


Original bundle

White - Thesis.pdf
1.42 MB
Adobe Portable Document Format