Perplexica/sample.config.toml
haddadrm 5a603a7fd4 Implemented the configurable stream delay feature for
the reasoning models using the custom ReasoningChatModel class.

1. Added the STREAM_DELAY parameter to the sample.config.toml file:

[MODELS.DEEPSEEK]
API_KEY = ""
STREAM_DELAY = 20  # Milliseconds between token emissions for reasoning models (higher = slower, 0 = no delay)

2. Updated the Config interface in src/config.ts to include the new parameter:

DEEPSEEK: {
  API_KEY: string;
  STREAM_DELAY: number;
};

3. Added a getter function in src/config.ts to retrieve the configured value:

export const getDeepseekStreamDelay = () =>
  loadConfig().MODELS.DEEPSEEK.STREAM_DELAY ?? 20; // Default to 20ms if not specified; ?? keeps an explicit 0 (no delay)
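
The fallback in the getter matters for the documented "0 = no delay" case. The sketch below is only an illustration: resolveStreamDelay, resolveStreamDelayWithOr and DEFAULT_STREAM_DELAY_MS are hypothetical names, not part of the project. It shows that `||` treats a configured 0 as missing, while `??` falls back only for null or undefined.

```typescript
// Hypothetical helpers for illustration only; not taken from src/config.ts.
const DEFAULT_STREAM_DELAY_MS = 20;

function resolveStreamDelay(configured: number | undefined): number {
  // `??` falls back only when the value is null or undefined,
  // so an explicit 0 (no delay) is preserved.
  return configured ?? DEFAULT_STREAM_DELAY_MS;
}

function resolveStreamDelayWithOr(configured: number | undefined): number {
  // `||` falls back for every falsy value, which includes 0.
  return configured || DEFAULT_STREAM_DELAY_MS;
}

console.log(resolveStreamDelay(0)); // 0
console.log(resolveStreamDelayWithOr(0)); // 20
console.log(resolveStreamDelay(undefined)); // 20
```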

4. Updated the deepseek.ts provider to use the configured stream delay:

const streamDelay = getDeepseekStreamDelay();
logger.debug(`Using stream delay of ${streamDelay}ms for ${model.id}`);

// Then pass it to the model configuration
model: new ReasoningChatModel({
  // ...other params
  streamDelay
}),

5. This implementation provides several benefits:

- User-Configurable: Users can now adjust the stream delay without modifying code
- Descriptive Naming: The parameter name "STREAM_DELAY" clearly indicates its purpose
- Documented: The comment in the config file explains what the parameter does
- Fallback Default: If not specified, it defaults to 20ms
- Logging: Added debug logging to show the configured value when loading models

To adjust the stream delay, users can modify the STREAM_DELAY value in
their config.toml file. Higher values slow down token emission (making
the output easier to read in real time), while lower values speed it up.
Setting it to 0 disables the delay entirely.
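
As a rough illustration of what a per-token delay does, here is a minimal sketch. It is not the project's ReasoningChatModel; sleep, withStreamDelay and collect are hypothetical names, and the token source is a plain array rather than a real model stream.

```typescript
// Hypothetical sketch: pauses between token emissions, skipping the pause when the delay is 0.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function* withStreamDelay(
  tokens: Iterable<string>,
  streamDelayMs: number,
): AsyncGenerator<string> {
  for (const token of tokens) {
    yield token;
    // A delay of 0 (or less) means tokens are passed through as fast as possible.
    if (streamDelayMs > 0) {
      await sleep(streamDelayMs);
    }
  }
}

// Collects the delayed stream back into one string.
async function collect(tokens: string[], streamDelayMs: number): Promise<string> {
  let out = "";
  for await (const token of withStreamDelay(tokens, streamDelayMs)) {
    out += token;
  }
  return out;
}

collect(["Hel", "lo", "!"], 5).then((text) => console.log(text)); // Hello!
```

The text produced is the same for any delay value; only the pacing of the emissions changes.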
2025-02-26 00:03:36 +04:00

[GENERAL]
PORT = 3001 # Port to run the server on
SIMILARITY_MEASURE = "cosine" # "cosine" or "dot"
KEEP_ALIVE = "5m" # How long to keep Ollama models loaded into memory. (Instead of using -1 use "-1m")

[MODELS.OPENAI]
API_KEY = ""

[MODELS.GROQ]
API_KEY = ""

[MODELS.ANTHROPIC]
API_KEY = ""

[MODELS.GEMINI]
API_KEY = ""

[MODELS.DEEPSEEK]
API_KEY = ""
STREAM_DELAY = 5 # Milliseconds between token emissions for reasoning models (higher = slower, 0 = no delay)

[MODELS.OLLAMA]
API_URL = "" # Ollama API URL - http://host.docker.internal:11434

[MODELS.LMSTUDIO]
API_URL = "" # LM STUDIO API URL - http://host.docker.internal:1234

[MODELS.CUSTOM_OPENAI]
API_KEY = ""
API_URL = ""
MODEL_NAME = ""

[API_ENDPOINTS]
SEARXNG = "http://localhost:32768" # SearxNG API URL