Configuration Types ⚙️¶
SynthGenAI uses three main configuration types to generate synthetic datasets. These configurations work together to define the dataset parameters, LLM settings, and overall generation process:
- Dataset Configuration - Configure dataset parameters like topic, domains, language, and number of entries
 - LLM Configuration - Configure the language model settings including model selection, temperature, and API credentials
 - Dataset Generator Configuration - Combine dataset and LLM configurations for the complete generation setup
 
Configuration Overview 🔧¶
Dataset Configuration¶
The DatasetConfig defines what kind of dataset you want to generate, including:
- Topic and domains
 - Target language
 - Number of entries
 - Additional descriptions
 
LLM Configuration¶
The LLMConfig specifies which language model to use and how, including:
- Model provider and name
 - Generation parameters (temperature, top_p, max_tokens)
 - API credentials and endpoints
 
Dataset Generator Configuration¶
The DatasetGeneratorConfig combines both configurations to create a complete setup for dataset generation across all supported dataset types.
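To make the relationship between the three configuration types concrete, here is a minimal sketch using plain dataclasses. These are hypothetical stand-ins that mirror the fields listed above, not SynthGenAI's actual classes: the field names (`num_entries`, `additional_description`, `api_base`, `api_key`), the default values, and the `"provider/model-name"` string are all assumptions for illustration, so check the library's own reference for the real signatures.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-ins mirroring the fields described above.
# They are NOT the library's real classes; names and defaults are assumed.


@dataclass
class DatasetConfig:
    topic: str
    domains: List[str]
    language: str = "English"
    num_entries: int = 10
    additional_description: str = ""


@dataclass
class LLMConfig:
    model: str                      # provider and model name (placeholder format)
    temperature: float = 0.5
    top_p: float = 0.9
    max_tokens: int = 1024
    api_base: Optional[str] = None  # custom endpoint, if one is needed
    api_key: Optional[str] = None   # credential, usually supplied via environment


@dataclass
class DatasetGeneratorConfig:
    dataset_config: DatasetConfig
    llm_config: LLMConfig


# Combine the "what" (dataset) and the "how" (LLM) into one setup object.
config = DatasetGeneratorConfig(
    dataset_config=DatasetConfig(
        topic="Artificial Intelligence",
        domains=["Machine Learning", "Deep Learning"],
        num_entries=100,
    ),
    llm_config=LLMConfig(model="provider/model-name", temperature=0.7),
)
```

Keeping the dataset description and the model settings in separate objects means either one can be swapped without touching the other.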
Environment Variables 🔐¶
SynthGenAI uses several environment variables to control behavior and configuration:
Logging Configuration¶
SYNTHGENAI_DETAILED_MODE - Controls logging verbosity
- "true" (default): Minimal logging output, recommended for production
- "false": Detailed debug logging, useful for development and troubleshooting
# Enable detailed logging for debugging
export SYNTHGENAI_DETAILED_MODE="false"
# Minimal logging (default)
export SYNTHGENAI_DETAILED_MODE="true"
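If you want your own scripts to follow the same switch, the sketch below reads the variable from Python and maps it to a standard `logging` level. The helper name `resolve_log_level` and the choice of `logging.DEBUG` and `logging.WARNING` are assumptions made for illustration; the source does not say how SynthGenAI interprets the value internally.

```python
import logging
import os


def resolve_log_level(environ=os.environ):
    """Map SYNTHGENAI_DETAILED_MODE to a logging level (assumed mapping)."""
    # Unset behaves like the documented default, "true".
    raw = environ.get("SYNTHGENAI_DETAILED_MODE", "true")
    # As described above: "false" selects detailed debug output,
    # while "true" keeps logging minimal.
    if raw.strip().lower() == "false":
        return logging.DEBUG
    return logging.WARNING


# Apply the resolved level to this script's own logging.
logging.basicConfig(level=resolve_log_level())
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check without changing the real environment.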
API Configuration¶
Environment variables for different LLM providers are documented in the LLM Configuration section.
Usage Pattern 📋¶
All dataset generators follow the same configuration pattern:
- Create a DatasetConfig with your dataset requirements
- Create an LLMConfig with your preferred language model settings
- Combine them into a DatasetGeneratorConfig
- Use this configuration with any dataset generator type
 
This unified approach ensures consistency across all dataset types while providing the flexibility to customize both the dataset characteristics and the underlying language model behavior.
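The four steps can be sketched as follows, with one combined configuration reused across two generator types. Everything here is a hypothetical stand-in: the trimmed-down config classes, the `PlaceholderGenerator` base class, the `InstructionGenerator` and `SummaryGenerator` names, and the `plan` method are invented for illustration and only describe a run rather than calling a model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetConfig:           # step 1: what to generate (stand-in)
    topic: str
    domains: List[str]
    num_entries: int


@dataclass(frozen=True)
class LLMConfig:               # step 2: which model and how (stand-in)
    model: str
    temperature: float = 0.5


@dataclass(frozen=True)
class DatasetGeneratorConfig:  # step 3: the combined setup (stand-in)
    dataset_config: DatasetConfig
    llm_config: LLMConfig


class PlaceholderGenerator:
    """Hypothetical generator that describes a run instead of calling a model."""

    kind = "generic"

    def __init__(self, config: DatasetGeneratorConfig):
        self.config = config

    def plan(self) -> str:
        d, m = self.config.dataset_config, self.config.llm_config
        return f"{self.kind}: {d.num_entries} entries on {d.topic!r} via {m.model}"


class InstructionGenerator(PlaceholderGenerator):
    kind = "instruction"


class SummaryGenerator(PlaceholderGenerator):
    kind = "summary"


config = DatasetGeneratorConfig(
    DatasetConfig("Astronomy", ["Exoplanets"], 50),
    LLMConfig("provider/model-name"),
)

# Step 4: the same configuration object works with every generator type.
plans = [gen(config).plan() for gen in (InstructionGenerator, SummaryGenerator)]
```

Because each generator only receives the combined object, switching dataset types means changing the generator class and nothing else.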