Dataset Management
Last updated
Effective dataset preparation is crucial for fine-tuning machine learning models: high-quality, well-structured datasets help models learn accurately and generalize well. To support this, the Dataset Management feature provides two methods for handling datasets, serving both advanced users who prefer to prepare data manually and users who want it generated automatically:
Manual upload: users can prepare their own Dataset JSON files in the specified format and upload them. The example format is as follows:
Uploaded JSON files must follow this format and must not exceed 10 MB in size.
Once uploaded, the files are listed in the Dataset List with their file name and size.
Users can delete any uploaded file.
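Before uploading, a prepared file can be checked locally against the two stated constraints (valid JSON, at most 10 MB). The following is a minimal Python sketch, not part of the system; `check_dataset_json` and `MAX_UPLOAD_BYTES` are hypothetical names, and it assumes "10 MB" means 10 × 1024 × 1024 bytes. It checks only the size and JSON syntax, not the record schema, which is not assumed here.

```python
import json
import os

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes; the service may use 10,000,000 instead.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def check_dataset_json(path: str) -> list[str]:
    """Return a list of problems found in a Dataset JSON file (empty if none).

    Hypothetical pre-upload helper: it checks the size limit and that the file
    parses as JSON. It does not validate the structure of individual records.
    """
    problems = []
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_UPLOAD_BYTES}-byte limit")
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        problems.append(f"not valid JSON: {exc}")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_dataset.py file1.json file2.json ...
    for p in sys.argv[1:]:
        issues = check_dataset_json(p)
        print(p, "OK" if not issues else "; ".join(issues))
```

A file that passes this check can still be rejected if its records do not match the expected format.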
Automatic generation: users can upload PDF (.pdf), Word (.docx), plain text (.txt), or Excel (.xlsx) documents, and the system will automatically generate a specified number of datasets from these files.
Uploaded files must not exceed 10 MB in size.
Users need to specify the number of datasets to be generated and click "Start" to initiate the process.
If the file does not contain enough data, the message "The amount of dataset is too small" may appear.
Uploaded documents will be displayed in the Document List, where each entry can be edited or deleted.
The system will show the generation progress and status, such as "Stopped by user" or "Completed."
Users can click on individual entries in the Document List to view the detailed contents of the generated datasets and edit them in real time.
Clicking "Generate dataset files" allows users to select multiple documents and combine them into a single JSON file, which can be used for subsequent fine-tuning.
Together, these two methods let users either upload fully prepared JSON files or use the system's tools to generate datasets quickly, depending on their needs.
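For automatic generation, the document upload limits and the "Generate dataset files" combine step can be approximated locally. The Python sketch below is hypothetical: `is_uploadable_document`, `merge_dataset_files`, and `MAX_UPLOAD_BYTES` are illustrative names, "10 MB" is assumed to mean 10 × 1024 × 1024 bytes, and it assumes each generated dataset is stored as a JSON array of entries. The system's actual export format is not described on this page.

```python
import json
from pathlib import Path

# File types accepted for automatic generation, as listed in this documentation.
ALLOWED_SUFFIXES = {".pdf", ".docx", ".txt", ".xlsx"}
# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def is_uploadable_document(path: Path) -> bool:
    """Check the documented file-type and size limits for a source document."""
    return path.suffix.lower() in ALLOWED_SUFFIXES and path.stat().st_size <= MAX_UPLOAD_BYTES


def merge_dataset_files(inputs: list[Path], output: Path) -> int:
    """Concatenate per-document entry lists into one JSON file; return the entry count.

    Assumption: each input file holds a JSON array of entries.
    """
    merged = []
    for path in inputs:
        entries = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(entries, list):
            raise ValueError(f"{path} does not contain a JSON array")
        merged.extend(entries)  # keep entries in input-file order
    output.write_text(json.dumps(merged, ensure_ascii=False, indent=2), encoding="utf-8")
    return len(merged)
```

For example, `merge_dataset_files([Path("a.json"), Path("b.json")], Path("combined.json"))` writes one array containing the entries of both files and returns their total count. Plain concatenation is the simplest possible combine; the system's own behavior may differ.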