Understanding preference datasets
The principles for creating high-quality preference datasets are the same as those discussed in Chapter 5 for instruction datasets. We want to maximize the accuracy, diversity, and complexity of our samples. To achieve this, we follow the same stages, as outlined in Figure 6.1: data curation, deduplication, decontamination, quality evaluation, exploration, generation, and augmentation.
Figure 6.1 – Overview of the post-training data pipeline covered in this chapter
To avoid repetition, this section focuses on the main differences between instruction and preference datasets. We first introduce the structure of preference samples and the ideal size for preference datasets, and then turn to the two stages that differ most from creating instruction datasets: data generation and evaluation.
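As a rough preview of what a preference sample can look like, the sketch below assumes the triple layout commonly used for preference-optimization methods such as DPO: a prompt, a preferred answer, and a dispreferred answer. The field names `prompt`, `chosen`, and `rejected`, the `PreferenceSample` class, and the `is_valid` helper are illustrative assumptions, not a fixed standard, and other training setups may expect a different layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferenceSample:
    """Hypothetical container for one preference sample (assumed layout)."""
    prompt: str    # the instruction given to the model
    chosen: str    # the answer we want the model to prefer
    rejected: str  # the answer we want the model to avoid


def is_valid(sample: PreferenceSample) -> bool:
    """Basic sanity check: no empty field, and the two answers must differ."""
    fields = (sample.prompt, sample.chosen, sample.rejected)
    if any(not text.strip() for text in fields):
        return False
    # Identical answers carry no preference signal.
    return sample.chosen.strip() != sample.rejected.strip()


# Made-up example data, for illustration only.
sample = PreferenceSample(
    prompt="Summarize the purpose of deduplication in one sentence.",
    chosen="Deduplication removes repeated samples so the dataset stays diverse.",
    rejected="Deduplication is a thing you do to data.",
)
duplicate_answers = PreferenceSample(prompt="Hi", chosen="Hello!", rejected="Hello!")
```

A check like `is_valid` would sit naturally in the quality evaluation stage: a pair whose two answers are identical tells the model nothing about which answer is preferred.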
Preference data
Preference datasets lack the standardization of instruction datasets due to varying data requirements across different...