GenAI Studio

Dataset Management

Effective dataset preparation is crucial for successfully fine-tuning machine learning models: high-quality, well-structured datasets help models learn accurately and generalize well. The Dataset Management feature therefore offers two methods for handling datasets. One suits advanced users who prefer to prepare data manually, and the other suits users who want datasets generated automatically:

Method 1: Uploading Pre-Processed Dataset JSON Files

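Users can prepare their own dataset JSON files in the required format and upload them. Each file is a JSON array of objects, and each object pairs an "instruct" field (the instruction or question) with an "output" field (the expected response). For example: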

[
  {
    "instruct": "What processor is integrated into the AIR-100 system?",
    "output": "The AIR-100 system is integrated with an Intel Atom Processor E3950."
  }
]
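If the question/answer pairs already exist elsewhere (for example in a spreadsheet or a script), a few lines of Python can write them out in this shape. The following is a minimal sketch, not part of GenAI Studio: the helper name write_dataset and the sample pairs are hypothetical, and only the "instruct" and "output" keys come from the format above.

```python
import json
from pathlib import Path


def write_dataset(pairs, path):
    """Write (instruction, answer) pairs to `path` as a JSON array of
    {"instruct": ..., "output": ...} objects. Hypothetical helper."""
    records = [{"instruct": question, "output": answer} for question, answer in pairs]
    # ensure_ascii=False keeps any non-ASCII characters readable in the file
    text = json.dumps(records, indent=2, ensure_ascii=False)
    Path(path).write_text(text, encoding="utf-8")
    return records
```

Calling write_dataset with the single pair from the example above and a path such as "dataset.json" produces a file with the same structure, pretty-printed with two-space indentation.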
  • Uploaded JSON files must follow this format and must not exceed 10 MB in size.

  • Once uploaded, the files will be listed in the Dataset List, showing the file name and size.

  • Users can delete any uploaded files.
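Because uploads must match this format and stay under 10 MB, checking a file locally first can save a failed upload. The sketch below is a hypothetical pre-upload check, not GenAI Studio's own validation. It assumes 10 MB means 10 × 1024 × 1024 bytes and treats a missing or blank "instruct" or "output" value as an error; the server may apply different or additional rules.

```python
import json
from pathlib import Path

# Assumption: "10 MB" is interpreted as 10 MiB.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def validate_dataset_file(path):
    """Return a list of problems; an empty list means the file matches the
    documented shape and size limit. Hypothetical check, not the server's."""
    p = Path(path)
    problems = []
    size = p.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size} bytes, over the 10 MB limit")
    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return problems + [f"not valid JSON: {exc}"]
    if not isinstance(data, list):
        return problems + ["top level must be a JSON array"]
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            problems.append(f"entry {i} is not an object")
            continue
        for key in ("instruct", "output"):
            value = item.get(key)
            # Assumption: blank strings are treated as errors.
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: '{key}' missing or empty")
    return problems
```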


Method 2: Automatically Generating Datasets via the Dataset Generator

Users can upload PDF (.pdf), Word (.docx), plain text (.txt), or Excel (.xlsx) documents, and the system automatically generates a specified number of datasets from these files.

Important: The Dataset Generator currently produces datasets in English only, even if the source document is written in another language.
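A similar local check can catch unsupported file types before a document is sent to the Dataset Generator. This is a hypothetical sketch: the extension list comes from the supported formats named above and the 10 MB figure from the upload rules for this method, while case-insensitive extension matching and reading 10 MB as 10 × 1024 × 1024 bytes are assumptions.

```python
from pathlib import Path

# Formats accepted by the Dataset Generator, per the documentation.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".xlsx"}
# Assumption: "10 MB" is interpreted as 10 MiB.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def check_source_document(path):
    """Return None if the file looks uploadable, otherwise a reason string.
    Hypothetical client-side pre-check, not GenAI Studio's own code."""
    p = Path(path)
    # Assumption: extensions are matched case-insensitively.
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        return f"unsupported type {p.suffix or '(none)'}; use .pdf, .docx, .txt or .xlsx"
    if not p.is_file():
        return "file not found"
    size = p.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        return f"{size} bytes exceeds the 10 MB limit"
    return None
```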

  • Uploaded files must not exceed 10 MB in size.

  • Users need to specify the number of datasets to be generated and click "Start" to initiate the process.

  • If the data in the file is insufficient, the message "The amount of dataset is too small" may appear.

  • Uploaded documents will be displayed in the Document List, where each entry can be edited or deleted.

  • The system will show the generation progress and status, such as "Stopped by user" or "Completed."

  • Users can click on individual entries in the Document List to view the detailed contents of the generated datasets and edit them in real time.

  • Clicking "Generate dataset files" allows users to select multiple documents and combine them into a single JSON file, which can be used for subsequent fine-tuning.
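Inside GenAI Studio, "Generate dataset files" merges the data generated from several documents into one JSON file. If you instead have several files in the Method 1 format on disk, a comparable merge can be done locally before uploading. The sketch below assumes every input is a JSON array of "instruct"/"output" objects. Dropping exact duplicate pairs is an extra choice made here, not documented GenAI Studio behavior, and the function name is hypothetical.

```python
import json
from pathlib import Path


def merge_dataset_files(paths, out_path):
    """Concatenate JSON arrays of {"instruct", "output"} records into one file.
    Exact duplicate pairs are dropped (an assumption, not documented behavior)."""
    merged, seen = [], set()
    for path in paths:
        records = json.loads(Path(path).read_text(encoding="utf-8"))
        for rec in records:
            key = (rec["instruct"], rec["output"])
            if key not in seen:  # keep the first occurrence, preserving order
                seen.add(key)
                merged.append(rec)
    text = json.dumps(merged, indent=2, ensure_ascii=False)
    Path(out_path).write_text(text, encoding="utf-8")
    return merged
```

Keeping the first occurrence preserves input order, so the merged file reads in the same sequence as its sources. The result is still subject to the 10 MB upload limit.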

Together, these two methods let users either upload fully prepared JSON files or use the system's tools to generate datasets quickly, depending on their needs.