In this article, we’ll explore automated testing with the Oracle AI Optimizer & Toolkit, and then generate Python LangChain or Java Spring AI applications from our tested AI configurations.
What is the Oracle AI Optimizer & Toolkit? It’s a free and open-source tool designed to make it easier for developers and data engineers to build, benchmark, and optimize AI workflows running on Oracle Database.
Prerequisites: A running AI Optimizer and Toolkit server with vector search configured.
- Part 1: Set Up A Local Sandbox
- Part 2: Connect LLMs and use your data
I suggest finishing parts 1 & 2 before continuing!
Create a Q&A Test Set
A Q&A test set is a collection of prompts and expected answers. We use test sets to evaluate a given AI configuration (models, vector data, parameters) for accuracy.
Q&A test sets are created and evaluated from the http://localhost:8501/testbed page. Q&A test set data is either user-provided or generated using the AI Optimizer.
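Conceptually, a Q&A test set is just a list of question/answer pairs. The sketch below models that shape in plain Python; the class and field names are my own illustration, not the optimizer's actual schema, and the sample pairs are made up in the spirit of the Vector Search guide:

```python
from dataclasses import dataclass

@dataclass
class QATestCase:
    # One prompt plus the answer we expect the AI configuration to produce.
    question: str
    reference_answer: str

# Hypothetical entries -- not generated by the optimizer itself.
test_set = [
    QATestCase(
        question="What is a vector index?",
        reference_answer="A specialized index structure for similarity "
                         "search over embeddings.",
    ),
    QATestCase(
        question="Which SQL function computes vector distance?",
        reference_answer="VECTOR_DISTANCE",
    ),
]

print(f"{len(test_set)} test cases loaded")
```

Whether user-provided or AI-generated, each entry pairs a prompt with a reference answer the evaluation can judge against.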
In this example, we’ll use an AI model to generate a Q&A test set for us. To do this, download the Oracle AI Database Vector Search User Guide PDF and upload it using the “Browse Files” button on the testbed page:

Next, choose the language and embedding models you want to test with. Here we’ll use OpenAI models configured in part 2, but you can use any supported model provider. Note that the testbed “judge” model doesn’t have to match the model that built your knowledge base. In fact, using a different model often produces better test coverage.
With the PDF uploaded, click “Generate Q&A” to create the Q&A test set. Generation may take some time depending on the size of your data set:

Review the details of the generated Q&A test set, which the optimizer uses to score your model configuration:

Score the Q&A Test Set
Now it’s time to score the Q&A test set with our model configuration! In the sidebar, ensure you have “Vector Search” selected as the Toolkit, and any model parameters configured as desired. Then, click “Start Evaluation” to score the Q&A test set:


The evaluation runs and generates a test report, showing a correctness score (% of questions passed) with failures flagged:
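To make the report concrete, here is a minimal sketch of how a correctness score like the testbed's could be computed: each Q&A pair is judged pass or fail, and the report summarizes the percentage passed. The pass/fail booleans below are invented for illustration; in the real testbed the judge model produces them:

```python
# Hypothetical per-question results from a judge model (made-up data).
results = {
    "What is a vector index?": True,
    "Which SQL function computes vector distance?": True,
    "What is the default distance metric?": False,  # flagged as a failure
}

passed = sum(results.values())
correctness = 100.0 * passed / len(results)

# Failures are surfaced separately so you can inspect them in the report.
failures = [question for question, ok in results.items() if not ok]

print(f"Correctness: {correctness:.1f}% ({passed}/{len(results)} passed)")
print("Failures:", failures)
```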

The optimizer stores the generated Q&A test sets and test reports so you can review and reuse them. You can keep as many test sets as you need.
If your scores fall short of expectations, use the optimizer to experiment with the Q&A pairs, model parameters, models used, and vector search settings. After all, the purpose of the optimizer is experimentation!
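That experimentation loop (try a configuration, score it, keep the best) can be sketched in a few lines. Everything here is a stand-in: `evaluate` fakes the testbed's judging with an arbitrary formula, and the parameter names (`temperature`, `top_k`) are illustrative, not the optimizer's exact setting names:

```python
def evaluate(config: dict) -> float:
    # Placeholder scorer: in the real workflow, the testbed runs the Q&A
    # test set against this configuration and returns a correctness score.
    return 90.0 - 10 * config["temperature"] + config["top_k"]

# A few candidate configurations to compare (illustrative values).
configs = [
    {"temperature": 0.0, "top_k": 3},
    {"temperature": 0.5, "top_k": 5},
    {"temperature": 1.0, "top_k": 8},
]

# Keep whichever configuration scores highest on the test set.
best = max(configs, key=evaluate)
print("Best configuration:", best, "score:", evaluate(best))
```

The optimizer's UI does this comparison for you by storing each test report, but the principle is the same: vary one setting at a time and compare scores on the same test set.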
Download an app template
From the http://localhost:8501/config page, choose to download either a LangChain or Spring AI app template.

If you download the Spring AI sample, the resulting application contains the following files:

If you’re a Python or Java developer, you can extend this template to build an AI application on top of your optimizer configuration!
