Steerable Preference Optimization of Reward Models
Minsik Oh, Advit Deepak, Sophie Wu, Douwe Kiela, Ekaterina Shutova
Minsik Oh, Jiwei Li, Guoyin Wang.
Minsik Oh, Sungju Kim, Douwe Kiela, Guoyin Wang, Kang Min Yoo, Joosung Lee
Mehmet Hamza Erol*, Minsik Oh*, Douwe Kiela
Minsik Oh, Joosung Lee, Jiwei Li, Guoyin Wang.
Joosung Lee, Minsik Oh, Donghun Lee.
Minsik Oh, Guoyin Wang, Taiwoo Park, Puyang Xu.
Minsik Oh*, Minsang Kim*.