Implementing AI on the "How to Spell?" Website

Our “How to Spell?” website has always aimed to be a useful tool for those seeking the correct forms of Russian words. Initially, its functionality was limited to providing declensions and conjugations from our database. This was good, but we saw the potential for much more. Our goal wasn’t just to provide dry grammatical forms, but to help users understand how these words work in real contexts. This is where artificial intelligence came to our aid.

The Search and Selection of an AI Solution

Before implementing AI, we identified key areas where it could bring the most benefit:

Contextual examples of word usage: Users often needed not just word forms, but also examples of their use in sentences to better understand the nuances of cases, number, and gender.
Improved search: The ability to offer more relevant results and understand queries, even if they contained typos or incomplete phrases.
Feedback and error correction: Automated processing of error reports and improvement suggestions.

To achieve these goals, we initially considered various NLP (Natural Language Processing) models. While we explored options like Google Cloud Vertex AI, we ultimately found that OpenAI’s GPT models (Generative Pre-trained Transformer) offered superior flexibility and text generation quality in Russian, significantly simplifying integration and speeding up development. This choice allowed us to get more natural and diverse examples of word usage without extensive custom training.

Architecture After AI Implementation

Improving Website Functionality with AI

1. Intelligent Word Usage Examples

This is arguably the most significant improvement. Previously, users saw a table with word forms, for example, “главарь” (leader) – “главаря”, “главарю”, “главарей,” and so on. But what does that mean in practice? When do you use “главарей” in the genitive case, and when in the accusative? AI helped us answer this question.

How It Works:

When a user views a word form, for example, “главарей,” and clicks the “Examples” button, our AI module performs the following actions:

Request Analysis: The AI receives information about the word (“главарей”) and its grammatical characteristics (e.g., “noun, genitive case, animate, plural” or “noun, accusative case, animate, plural”).
Context Generation: Using its trained model, the AI generates several realistic sentences where the given word form is used in the appropriate case and context. The model can understand that “главарь” is an animate masculine noun and generate examples that reflect this.
Filtering and Ranking: The generated sentences are filtered for grammatical correctness, relevance, and naturalness. The best ones are displayed to the user.
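The three steps can be illustrated with a small JavaScript sketch. JavaScript is used for all sketches in this article because it is already the frontend's language, although the backend described here is PHP. The helper names `buildExamplePrompt`, `containsWordForm` and `filterAndRank` are hypothetical, and the length-based ranking is only an illustrative heuristic, not how naturalness is actually scored. The filter keeps complete, distinct sentences that contain the exact requested form as a whole word. JavaScript's `\b` only treats ASCII letters as word characters, so the whole-word check uses Unicode property escapes instead.

```javascript
// Hypothetical helpers illustrating "Request Analysis" and "Filtering and Ranking".

// Request analysis: turn the word form and its grammatical description into a prompt
function buildExamplePrompt(wordForm, grammarInfo, count = 3) {
  return `Generate ${count} example sentences in Russian where the word "${wordForm}" ` +
    `is used as ${grammarInfo}. Return only the sentences, one per line.`;
}

// Whole-word, case-insensitive match. JavaScript's \b only knows ASCII word characters,
// so Unicode letter/digit lookarounds (flag "u") are used for Cyrillic text.
function containsWordForm(sentence, wordForm) {
  const escaped = wordForm.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`(?<![\\p{L}\\p{N}])${escaped}(?![\\p{L}\\p{N}])`, "iu");
  return pattern.test(sentence);
}

// Filtering: keep complete, distinct sentences that contain the exact form.
// Ranking: prefer sentences close to a target length (an illustrative heuristic only).
function filterAndRank(candidates, wordForm, { targetLength = 70, limit = 3 } = {}) {
  const seen = new Set();
  const kept = [];
  for (const raw of candidates) {
    const sentence = raw.trim();
    const key = sentence.toLowerCase();
    if (sentence === "" || seen.has(key)) continue;      // empty line or duplicate
    if (!/[.!?…]$/u.test(sentence)) continue;             // no sentence-final punctuation
    if (!containsWordForm(sentence, wordForm)) continue; // exact form missing
    seen.add(key);
    kept.push(sentence);
  }
  return kept
    .sort((a, b) => Math.abs(a.length - targetLength) - Math.abs(b.length - targetLength))
    .slice(0, limit);
}
```

The filter deliberately rejects a sentence containing “главарях”: a different form of the same word is not a valid example for “главарей”.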

Example Before and After AI Implementation:

Before AI: You see: “главарей” – noun, genitive case, animate, plural. User thinks: “Okay, but how do I use this?”

After AI (upon clicking the “Examples” button):

A pop-up appears with examples:

Examples for “главарей” – noun, genitive case, animate, plural:

“В донесении говорилось о поимке нескольких главарей преступного синдиката.” (The report mentioned the capture of several leaders of a criminal syndicate.)
“Для поимки всех главарей банды понадобилось несколько месяцев.” (It took several months to capture all the leaders of the gang.)
“Объявлена награда за сведения о местонахождении главарей сопротивления.” (A reward has been offered for information on the whereabouts of the resistance leaders.)

Examples for “главарей” – noun, accusative case, animate, plural:

“Разведка выследила главарей повстанцев в горном районе.” (Intelligence tracked down the leaders of the rebels in a mountainous area.)
“Мы видели, как арестовывали главарей заговора.” (We saw the leaders of the conspiracy being arrested.)

This not only gives the user concrete examples but also helps them intuitively understand the difference between homonymous forms (e.g., genitive vs. accusative case for animate plural nouns).
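The homonymy follows from a simple rule: in the plural, animate nouns of every gender use the genitive form in the accusative, while inanimate nouns use the nominative form. A minimal sketch of that rule, with hand-written plural paradigms for “главарь” (animate) and “стол” (table, inanimate); the data layout and function names are illustrative, not our database schema:

```javascript
// Illustrative plural paradigms keyed by case (accusative deliberately omitted)
const GLAVAR_PLURAL = { nominative: "главари", genitive: "главарей", dative: "главарям",
  instrumental: "главарями", prepositional: "главарях" };
const STOL_PLURAL = { nominative: "столы", genitive: "столов", dative: "столам",
  instrumental: "столами", prepositional: "столах" };

// Fills in the accusative plural from the animacy rule:
// animate nouns borrow the genitive form, inanimate nouns the nominative form
function withAccusative(pluralForms, animate) {
  return { ...pluralForms, accusative: animate ? pluralForms.genitive : pluralForms.nominative };
}

// Lists every case in which a form occurs, i.e. detects homonymous forms
function casesOf(form, paradigm) {
  return Object.entries(paradigm)
    .filter(([, value]) => value === form)
    .map(([caseName]) => caseName);
}

const glavar = withAccusative(GLAVAR_PLURAL, true); // "главарь" is animate
const stol = withAccusative(STOL_PLURAL, false);    // "стол" is inanimate
console.log(casesOf("главарей", glavar)); // [ 'genitive', 'accusative' ]
console.log(casesOf("столы", stol));      // [ 'nominative', 'accusative' ]
```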

2. Enhanced and “Smart” Search

AI has also improved our search system. Now, when a user types a word into the search bar:

Autocompletion with Form Awareness: AI can suggest word variants for autocompletion, considering not only the dictionary (base) form but also inflected forms for different cases and numbers.
Typo Correction: If a user makes a small typo (e.g., “гловори” instead of “главари”), the AI is capable of recognizing the likely correct word and suggesting it or automatically redirecting to the correct page.
Synonym Search (future): While still in development, AI will allow us to understand synonyms and suggest forms for words similar in meaning if there’s no direct match.
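As a rough, non-AI illustration of the first two behaviors, the sketch below does form-aware prefix completion over a tiny hypothetical index that maps each word form to its dictionary form, and suggests a typo correction using classic Levenshtein edit distance. Edit distance is a swapped-in textbook technique chosen for clarity, not the model-based correction described above; `FORM_INDEX`, `autocomplete` and `suggestCorrection` are hypothetical names.

```javascript
// Hypothetical index: each word form mapped to its dictionary form (lemma)
const FORM_INDEX = new Map([
  ["главарь", "главарь"], ["главаря", "главарь"], ["главарю", "главарь"],
  ["главари", "главарь"], ["главарей", "главарь"], ["главарям", "главарь"],
  ["глава", "глава"], ["главы", "глава"],
]);

// Form-aware autocompletion: prefix match over all forms, reported with their lemma
function autocomplete(prefix, index = FORM_INDEX, limit = 5) {
  const p = prefix.toLowerCase();
  const out = [];
  for (const [form, lemma] of index) {
    if (form.startsWith(p)) out.push({ form, lemma });
    if (out.length === limit) break;
  }
  return out;
}

// Classic Levenshtein distance (insertions, deletions, substitutions), two-row DP
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Suggests the closest known form within maxDistance edits, or null if none is close enough
function suggestCorrection(query, index = FORM_INDEX, maxDistance = 2) {
  const q = query.toLowerCase();
  let best = null;
  let bestDistance = maxDistance + 1;
  for (const form of index.keys()) {
    const d = levenshtein(q, form);
    if (d < bestDistance) {
      best = form;
      bestDistance = d;
    }
  }
  return best;
}
```

With this index, `suggestCorrection("гловори")` returns “главари” (two substitutions away), and `FORM_INDEX` then leads to the page for “главарь”.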

Example: a user types “главарём”, spelled with “ё”. The search maps it to “главарем”, recognizes it as the instrumental singular of “главарь”, and shows the forms of that word.
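The ё/е mapping in this example does not need a model: a plain normalization step before lookup is enough. A sketch, assuming the dictionary stores words with “е” rather than “ё” (the function name and the exact rules are illustrative):

```javascript
// Normalizes a search query before lookup (illustrative rules).
// Assumption: the dictionary stores words with "е", so "ё" is folded into "е".
function normalizeQuery(query) {
  return query
    .normalize("NFC")       // compose "е" + U+0308 (combining diaeresis) into "ё" first
    .trim()
    .toLowerCase()          // also turns "Ё" into "ё"
    .replace(/\s+/g, " ")   // collapse runs of whitespace
    .replace(/ё/g, "е");
}
```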

3. Automated Feedback

Previously, when users reported typos or errors via the “Ctrl+Enter” form, our administrators manually processed these messages. Now, AI assists in this process:

Message Classification: The AI analyzes the message text and classifies it (e.g., “grammatical error,” “declension inaccuracy,” “improvement suggestion”).
Prioritization: Important messages requiring immediate attention (critical errors) are automatically flagged as high priority.
Preliminary Analysis: For grammatical errors, the AI can suggest possible corrections or confirm them, significantly speeding up the review process for our editors.

This has greatly reduced the workload on the team and allowed us to respond to feedback more quickly, improving the quality of the data on the site.
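One way to make classification output machine-readable is to ask the model for a small JSON verdict and validate it before trusting it. The sketch below assumes the three categories listed above and a hypothetical reply format `{"category": ..., "critical": ...}`; `buildClassificationMessages` and `parseClassification` are illustrative names, and the returned array would be sent as the `messages` field of a Chat Completions request. Anything malformed falls back to manual review instead of being silently misfiled.

```javascript
// Assumed label set, taken from the categories above
const CATEGORIES = ["grammatical error", "declension inaccuracy", "improvement suggestion"];

// Builds Chat Completions messages asking for a JSON verdict on one feedback message
function buildClassificationMessages(feedbackText) {
  return [
    {
      role: "system",
      content: "Classify user feedback for a Russian spelling website. Reply with JSON only: " +
        '{"category": one of ' + JSON.stringify(CATEGORIES) + ', "critical": true or false}.',
    },
    { role: "user", content: feedbackText },
  ];
}

// Parses the (hypothetical) reply format defensively:
// unknown categories or invalid JSON become "unclassified" for manual review
function parseClassification(reply) {
  try {
    const parsed = JSON.parse(reply);
    const category = CATEGORIES.includes(parsed.category) ? parsed.category : "unclassified";
    const priority = parsed.critical === true ? "high" : "normal";
    return { category, priority };
  } catch {
    return { category: "unclassified", priority: "normal" };
  }
}
```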

Challenges and Solutions

Implementing AI wasn’t without its challenges:

Data Quality: Training AI models required a vast amount of high-quality Russian language data. We utilized open text corpora and carefully filtered them.
Context Relevance: Ensuring that generated examples were natural and grammatically correct 100% of the time was tricky. It required fine-tuning the models and continuous monitoring.
Performance: AI models can be resource-intensive. We optimized their operation and used caching to ensure responses were fast and didn’t slow down the site.

We addressed these issues through iterative development, continuous model training with new data, and close collaboration with linguists to ensure grammatical accuracy.
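The caching mentioned under Performance can be as simple as a time-limited map keyed by word form and grammatical description, since the same form with the same reading can reuse the same examples. This is a minimal in-memory sketch under that assumption; the class name, key format and TTL are hypothetical, and a real deployment would more likely use a shared store such as Redis. The clock is injectable so that expiry can be tested without waiting.

```javascript
// Minimal in-memory TTL cache for generated examples (illustrative only)
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock
    this.entries = new Map();
  }

  // Hypothetical key format: same form + same grammatical reading => same examples
  static key(wordForm, grammarInfo) {
    return `${wordForm.toLowerCase()}|${grammarInfo}`;
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) { // expired: evict and report a miss
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the cached value, or computes, stores and returns a fresh one
  getOrCompute(key, compute) {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = compute();
    this.set(key, value);
    return value;
  }
}
```

If `compute` returns a promise, the promise itself is cached, which also deduplicates concurrent requests; a rejected promise should then be evicted so that a transient API error is not served from the cache.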

Code Examples and OpenAI Integration

Let’s delve into the technical details and see how OpenAI is integrated.

1. Integrating Contextual Example Generation

When a user clicks the “Examples” button next to a word form, an asynchronous request is made to our backend.

Frontend (JavaScript):

The frontend code is independent of the AI provider: it simply sends a request to our backend and does not “know” which AI platform is used on the server.

JavaScript

// Inside your scripts.js or in a <script> tag
document.addEventListener('DOMContentLoaded', function() {
    const showExamplesButtons = document.querySelectorAll('.examples-btn');
    const popup = document.getElementById('examplePopup');
    const popupContent = document.getElementById('popupContent');
    const closePopupBtn = document.getElementById('closePopup');

    showExamplesButtons.forEach(button => {
        button.addEventListener('click', async function() {
            const wordForm = this.getAttribute('data-word-form'); // E.g., "главарей"
            const grammarInfo = this.getAttribute('data-grammar-info'); // E.g., "noun, genitive case, animate, plural"

            popupContent.innerHTML = `<p>Loading examples for "${wordForm}" (${grammarInfo})...</p>`;
            popup.style.display = 'flex'; // Show the modal

            try {
                // Send request to our backend
                const response = await fetch('/api/get-examples', {
                    method: 'POST',
                    headers: {
                        'Content-Type': 'application/json',
                    },
                    body: JSON.stringify({ word_form: wordForm, grammar_info: grammarInfo })
                });

                if (!response.ok) {
                    throw new Error(`HTTP error! status: ${response.status}`);
                }

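                const data = await response.json();

                // Escape HTML so that model output cannot inject markup into the page
                const escapeHtml = (text) => String(text).replace(/[&<>"']/g, (ch) => ({
                    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
                })[ch]);
                // Escape regex metacharacters so the word form is matched literally
                const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
                const safeWord = escapeHtml(wordForm);

                if (data.success && Array.isArray(data.examples) && data.examples.length > 0) {
                    // JavaScript's \b only knows ASCII word characters and never matches around
                    // Cyrillic words, so Unicode letter/digit lookarounds (flag "u") are used instead
                    const wordPattern = new RegExp(`(?<![\\p{L}\\p{N}])${escapeRegExp(safeWord)}(?![\\p{L}\\p{N}])`, 'giu');
                    let examplesHtml = `<h4>Examples for "${safeWord}" (${escapeHtml(grammarInfo)}):</h4><ul>`;
                    data.examples.forEach(example => {
                        // Highlight the word form, keeping its original capitalization ($&)
                        const highlightedExample = escapeHtml(example).replace(wordPattern, '<span class="example-highlight">$&</span>');
                        examplesHtml += `<li>${highlightedExample}</li>`;
                    });
                    examplesHtml += `</ul>`;
                    popupContent.innerHTML = examplesHtml;
                } else {
                    popupContent.innerHTML = `<p>Examples for "${safeWord}" not found yet. We're working on it!</p>`;
                }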

            } catch (error) {
                console.error('Error getting examples:', error);
                popupContent.innerHTML = `<p>An error occurred while loading examples. Please try again.</p>`;
            }
        });
    });

    closePopupBtn.addEventListener('click', function() {
        popup.style.display = 'none';
    });

    window.addEventListener('click', function(event) {
        if (event.target === popup) {
            popup.style.display = 'none';
        }
    });
});

Backend (PHP) – api/get-examples.php (using OpenAI):

This is where the OpenAI integration lives. The example uses PHP's cURL extension to call the OpenAI API; an HTTP client such as Guzzle (installed via Composer) would work equally well.

PHP

<?php
header('Content-Type: application/json; charset=utf-8');

// Accept only POST requests
if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
    http_response_code(405); // Method Not Allowed
    echo json_encode(['success' => false, 'message' => 'Method Not Allowed.']);
    exit();
}

$input = json_decode(file_get_contents('php://input'), true);
$wordForm = is_string($input['word_form'] ?? null) ? trim($input['word_form']) : '';
$grammarInfo = is_string($input['grammar_info'] ?? null) ? trim($input['grammar_info']) : '';

// Expect a single Cyrillic word (hyphens allowed) and a short grammar description;
// this also keeps arbitrary user text out of the prompt
if ($wordForm === '' || mb_strlen($wordForm) > 50 || !preg_match('/^[\p{Cyrillic}-]+$/u', $wordForm)
    || mb_strlen($grammarInfo) > 100) {
    http_response_code(400); // Bad Request
    echo json_encode(['success' => false, 'message' => 'Word form not specified or invalid.']);
    exit();
}

// --- OpenAI API Integration ---
$openai_api_key = getenv('OPENAI_API_KEY'); // Recommended to store key in environment variables
$openai_api_url = 'https://api.openai.com/v1/chat/completions'; // For GPT-3.5/4

if (!$openai_api_key) {
    // In a real application: log error, return message to user
    http_response_code(500);
    echo json_encode(['success' => false, 'message' => 'API key configuration error.']);
    exit();
}

$prompt_text = "Generate 3 example sentences in Russian where the word \"" . $wordForm . "\" is used as " . $grammarInfo . ". "
    . "Sentences should be natural and grammatically correct. Avoid boilerplate phrases and simple enumerations. "
    . "Return only the sentences, one per line, without numbering or commentary.";

$data = [
    'model' => 'gpt-3.5-turbo', // Or 'gpt-4o' if available and within budget
    'messages' => [
        ['role' => 'system', 'content' => 'You are an expert in the Russian language and a generator of word usage examples.'],
        ['role' => 'user', 'content' => $prompt_text]
    ],
    'max_tokens' => 200, // Limit tokens for the response
    'temperature' => 0.7, // Make responses a bit more diverse
    'n' => 1 // Number of response variations
];

$ch = curl_init($openai_api_url);
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Content-Type: application/json',
    'Authorization: Bearer ' . $openai_api_key
]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // Fail fast if the API is unreachable
curl_setopt($ch, CURLOPT_TIMEOUT, 20);       // Upper bound for the whole request

$response = curl_exec($ch);
$curl_error = curl_error($ch);
$http_status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($response === false || $http_status !== 200) {
    // Network failure (cURL error) or API error (non-200 status): log details, return a generic message
    $error_data = is_string($response) ? json_decode($response, true) : null;
    $details = $curl_error !== '' ? $curl_error : ($error_data['error']['message'] ?? 'Unknown error');
    error_log("OpenAI API Error: HTTP Status {$http_status}, Message: {$details}");
    http_response_code(500);
    echo json_encode(['success' => false, 'message' => 'Error querying AI service.']);
    exit();
}

$openai_response_data = json_decode($response, true);
$examples = [];

$raw_content = $openai_response_data['choices'][0]['message']['content'] ?? '';
if (is_string($raw_content) && $raw_content !== '') {
    // Despite the prompt, the model may return a numbered or bulleted list,
    // so split into lines and strip list markers and surrounding quotes
    foreach (preg_split('/\R/u', $raw_content) as $line) {
        $clean_line = preg_replace('/^\s*(?:[-*•]|\d+[.)])\s*/u', '', trim($line));
        // Multibyte quotes are removed with a /u regex: trim() works byte by byte
        // and would corrupt Cyrillic characters
        $clean_line = preg_replace('/^["“”«»]+|["“”«»]+$/u', '', $clean_line);
        // Skip empty lines, preambles such as "Вот примеры:" and lines without the requested form
        if ($clean_line === '' || str_ends_with($clean_line, ':') || mb_stripos($clean_line, $wordForm) === false) {
            continue;
        }
        $examples[] = $clean_line;
    }
}

// If nothing usable came back, return an empty list: the frontend then shows its own
// "not found yet" message instead of rendering a placeholder as if it were an example
echo json_encode(['success' => true, 'examples' => $examples], JSON_UNESCAPED_UNICODE);
exit();

Explanation of Changes:

API Key and OpenAI URL: We use the official https://api.openai.com/v1/chat/completions URL for GPT model calls. The API key must be obtained from an OpenAI account and stored securely (e.g., via environment variables).
prompt_text Construction: We create a detailed prompt for the AI, specifying not only the word itself and its grammatical characteristics but also the desired number of examples and quality requirements (“natural,” “grammatically correct,” “avoid boilerplate phrases”).
messages Array: For gpt-3.5-turbo and gpt-4 models, the request uses an array of messages, each with a role: system (instructions for the AI) or user (the user’s query).
max_tokens and temperature Parameters:
- max_tokens: Limits the length of the AI’s response to avoid overly long or irrelevant texts and control costs.
- temperature: Influences the “creativity” or randomness of responses. A value of 0.7 makes responses varied enough while maintaining relevance.
Response Handling: The response from the OpenAI API comes in JSON format. We extract the content from the first element of the choices array and then parse it into an array of individual sentences. More complex parsing logic might be needed here, as the AI can format the list in various ways.
Error Handling: Basic handling of HTTP statuses and errors returned by the OpenAI API is included.
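An alternative to line-by-line parsing is the Chat Completions JSON mode (`response_format: { type: "json_object" }`), available on models that support it. The API requires the word “JSON” to appear somewhere in the messages, and the output can still be cut off by `max_tokens`, so it must be validated. A sketch of building such a request and extracting the examples; the `{"examples": [...]}` shape is an assumption expressed in our own prompt, not something the API defines, and the function names are hypothetical:

```javascript
// Builds a Chat Completions request body using JSON mode.
// Assumption: the chosen model supports response_format "json_object".
function buildJsonModeRequest(wordForm, grammarInfo, model = "gpt-3.5-turbo") {
  return {
    model,
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        // JSON mode requires the word "JSON" to appear in the messages
        content: 'You are an expert in the Russian language. Reply with JSON of the form {"examples": ["...", "..."]}.',
      },
      {
        role: "user",
        content: `Generate 3 example sentences in Russian where the word "${wordForm}" is used as ${grammarInfo}.`,
      },
    ],
    max_tokens: 300,
    temperature: 0.7,
  };
}

// Extracts the (assumed) examples array from a Chat Completions response, tolerating bad output
function extractExamples(apiResponse) {
  const content = apiResponse?.choices?.[0]?.message?.content;
  if (typeof content !== "string") return [];
  try {
    const parsed = JSON.parse(content);
    return Array.isArray(parsed.examples)
      ? parsed.examples.filter((s) => typeof s === "string" && s.trim() !== "").map((s) => s.trim())
      : [];
  } catch {
    return []; // e.g. output truncated by max_tokens, leaving invalid JSON
  }
}
```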

2. Improving Search with NLP and Elasticsearch (OpenAI’s Role)

For search, Elasticsearch is still used for fast full-text searching within our word database. PyMorphy2 (or a similar library) handles basic lemmatization and morphological analysis of user queries on the server side.
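For the fuzzy part of the search, a request body in the Elasticsearch Query DSL might look like the sketch below. It assumes a hypothetical index in which each document has a `lemma` keyword field and a `forms` text field listing all inflected forms; the field names, boost value and function name are illustrative. `fuzziness: "AUTO"` lets Elasticsearch derive the allowed edit distance from the term length (by default: exact match for 1 to 2 characters, one edit for 3 to 5, two edits for longer terms).

```javascript
// Builds an Elasticsearch Query DSL body (hypothetical "lemma" and "forms" fields)
function buildSearchQuery(lemma, rawQuery, size = 10) {
  const should = [
    // Fuzzy match over all inflected forms tolerates small typos in the raw query
    { match: { forms: { query: rawQuery, fuzziness: "AUTO" } } },
  ];
  // An exact lemma match (from PyMorphy2 analysis) ranks highest, when a lemma was found
  if (lemma) should.unshift({ term: { lemma: { value: lemma, boost: 3 } } });
  return {
    size,
    query: { bool: { should, minimum_should_match: 1 } },
  };
}
```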

OpenAI’s Role in Search:

OpenAI is not used for every search query directly due to potential latency and cost. However, it can be leveraged for more complex typo correction scenarios or understanding query context if basic Elasticsearch and PyMorphy2 methods are insufficient.

Example of Using OpenAI for Complex Corrections/Clarifications:

PHP

<?php
// ... (initial request processing and Elasticsearch connection)

// Function to correct typos using OpenAI
function correct_typo_with_openai(string $query, string $openai_api_key): string {
    $openai_api_url = 'https://api.openai.com/v1/chat/completions';
    $prompt = "Correct any typos in the given Russian word or phrase and return only the corrected word or phrase. If there are no typos, return the original unchanged. Examples: 'гловори' -> 'главари', 'котенок' -> 'котенок'.";

    $data = [
        'model' => 'gpt-3.5-turbo',
        'messages' => [
            ['role' => 'system', 'content' => $prompt],
            ['role' => 'user', 'content' => $query]
        ],
        'max_tokens' => 30, // Limit for a concise response
        'temperature' => 0.1, // Make responses highly deterministic (less creativity)
    ];

    $ch = curl_init($openai_api_url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        'Content-Type: application/json',
        'Authorization: Bearer ' . $openai_api_key
    ]);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
    curl_setopt($ch, CURLOPT_TIMEOUT, 3); // Search must stay fast: give up early and fall back to the original query

    $response = curl_exec($ch);
    $http_status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($response !== false && $http_status === 200) {
        $openai_response_data = json_decode($response, true);
        $content = $openai_response_data['choices'][0]['message']['content'] ?? '';
        // Strip whitespace, quotes and trailing punctuation the model sometimes adds
        $corrected = is_string($content) ? preg_replace('/^[\s"\'«»“”]+|[\s"\'«»“”.!]+$/u', '', $content) : null;
        if (is_string($corrected) && $corrected !== '') {
            return $corrected;
        }
    }
    return $query; // In case of error or an empty answer, return the original query
}

// ... (further in your search query processing logic)
$openai_api_key = getenv('OPENAI_API_KEY') ?: '';
$userQuery = is_string($_GET['word'] ?? null) ? trim($_GET['word']) : '';

if ($userQuery !== '') {
    // First try the cheap local methods, then AI for hard cases.
    // lemmatize_russian_word() is not shown here: it is assumed to return the lemma found by
    // PyMorphy2 (a Python library, e.g. called through a small service) or a falsy value.
    $correctedQuery = $userQuery;
    if ($openai_api_key !== '' && mb_strlen($userQuery) > 3 && !lemmatize_russian_word($userQuery)) {
        $correctedQuery = correct_typo_with_openai($userQuery, $openai_api_key);
    }

    $lemmaQuery = lemmatize_russian_word($correctedQuery);

    // ... (then build the Elasticsearch query from $lemmaQuery) ...
}

Explanation:

correct_typo_with_openai: This function sends the query to OpenAI with a strict prompt and a low temperature (0.1), so the model returns only the corrected word and behaves as deterministically as possible.
Conditional AI Use: We don’t call OpenAI for every query. The AI model is invoked only in cases where our basic methods (e.g., PyMorphy2 or Elasticsearch with fuzziness) cannot provide a confident result, or the query seems heavily distorted. This helps optimize costs and response time.
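The gating rule can be written as a small pure function. The inputs (`lemma` from the PyMorphy2 analysis, `topFuzzyScore` as the best Elasticsearch fuzzy-match score, or `null` when nothing matched) and both thresholds are assumptions for illustration. Elasticsearch scores are not normalized, so a usable score threshold depends on the index and has to be tuned against real traffic.

```javascript
// Decides whether a query justifies a slower, paid OpenAI call (hypothetical thresholds)
function shouldAskOpenAI({ query, lemma, topFuzzyScore }, { minLength = 4, minScore = 5.0 } = {}) {
  if (query.trim().length < minLength) return false; // very short queries are too ambiguous
  if (lemma) return false;                            // PyMorphy2 already recognized the word
  // No lemma: call the AI only if fuzzy search also failed to give a confident hit
  return topFuzzyScore === null || topFuzzyScore < minScore;
}
```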

Conclusion

The adoption of OpenAI for generating contextual examples and assisting in complex search scenarios has been a significant leap forward for “How to Spell?”. GPT models provide high-quality and natural Russian language output, which is critical for our educational resource.

We use OpenAI where deep language understanding and content generation are required, while traditional tools like Elasticsearch and PyMorphy2 handle fast, high-volume search and morphological analysis. This hybrid approach balances output quality against latency and cost, giving users an intelligent and valuable tool for working with the Russian language.