Introduction
Welcome to Chapter 10! So far, we’ve explored Stoolap’s core features, from its embedded nature and MVCC transactions to parallel query execution and the exciting world of vector search. Now, it’s time to put that knowledge into action by building a practical project: a hybrid OLTP/OLAP analytics dashboard.
In this chapter, you’ll learn how to leverage Stoolap’s unique capabilities to manage both high-volume transactional data ingestion (OLTP) and complex analytical queries (OLAP) within a single, embedded application. We’ll design a schema suitable for both workloads, insert dynamic data, and then query it to extract meaningful insights, simulating a real-time analytics dashboard. This project will solidify your understanding of Stoolap’s power as an HTAP database.
Before we dive in, make sure you have the Rust toolchain installed and a basic understanding of SQL. Familiarity with the concepts covered in previous chapters, especially schema design, querying, and basic Stoolap setup, will be beneficial. Let’s get building!
Core Concepts for an HTAP Dashboard
Building an effective HTAP dashboard requires a thoughtful approach to data modeling, transaction management, and query optimization. Stoolap, with its modern architecture, provides an excellent foundation for this.
HTAP Refresher: Why Stoolap Shines
Recall that Stoolap is designed to handle Hybrid Transactional/Analytical Processing (HTAP) workloads efficiently. This means it can gracefully manage:
- OLTP (Online Transaction Processing): Frequent, small, concurrent read/write operations (e.g., inserting new sales records, updating customer profiles). Stoolap’s Multi-Version Concurrency Control (MVCC) ensures high concurrency by allowing readers to see a consistent snapshot of the database without being blocked by writers.
- OLAP (Online Analytical Processing): Complex, read-heavy queries over large datasets, often involving aggregations, joins, and time-series analysis (e.g., calculating total sales by region, identifying top-selling products). Stoolap’s parallel query execution and cost-based optimizer are crucial here, enabling efficient processing of these demanding queries.
- Vector Search: For advanced analytical use cases like semantic recommendations, anomaly detection based on data embeddings, or intelligent content retrieval.
The goal of our dashboard is to ingest new transactions (OLTP) and immediately reflect those changes in aggregated analytical views (OLAP), all powered by a single Stoolap instance embedded directly in our application.
Dashboard Application Architecture
Our project will simulate a simple “Sales Analytics Dashboard.” It won’t have a fancy graphical user interface (GUI), but rather a command-line interface that demonstrates data ingestion and report generation.
Conceptually, the application embeds a single Stoolap instance: an ingestion path writes individual sales transactions (the OLTP side), while report functions query the very same tables for aggregates, trends, and vector similarity (the OLAP side). No ETL pipeline or separate analytics database sits in between.
Data Model Design for HTAP
For an HTAP system, schema design is a critical balancing act. We need a schema that allows for fast inserts (OLTP) while also being efficient for complex aggregations (OLAP). A common approach is to use a slightly denormalized “fact” table for transactions, which makes analytical queries simpler and faster, potentially combined with dimension tables.
For our sales dashboard, we’ll focus on a sales_transactions table. This table will capture each individual sale. To demonstrate vector search, we’ll also include a product_vector field, which could represent semantic features of a product.
Consider the sales_transactions table with these columns:
- `transaction_id`: A unique identifier for each sale (primary key).
- `product_id`: Identifier for the product sold.
- `customer_id`: Identifier for the customer making the purchase.
- `amount`: The monetary value of the transaction.
- `quantity`: Number of items sold in this transaction.
- `transaction_timestamp`: When the transaction occurred (crucial for time-series analysis).
- `product_vector`: A `BLOB` column storing an embedding for the product. As of March 2026, Stoolap primarily uses `BLOB` for custom binary data types like vectors, leaving serialization and deserialization to the application. This enables semantic search capabilities.
This design is relatively flat, making it easy to insert new records and efficient for GROUP BY operations on product_id, customer_id, and transaction_timestamp.
Parallel Query Execution and Cost-Based Optimization
Stoolap is designed for performance, and two key features enabling this for OLAP workloads are:
- Parallel Query Execution: Stoolap can automatically detect and leverage available CPU cores to execute parts of a complex query in parallel. This significantly speeds up operations like large aggregations or joins. While often managed internally, some `DatabaseOptions` might allow hints or configurations for thread pools.
- Cost-Based Query Optimization: Before executing a query, Stoolap’s optimizer analyzes various potential execution plans and chooses the one estimated to be most efficient (lowest “cost”), taking into account table statistics, indexes, and query predicates. Understanding the optimizer’s choices, often visible via an `EXPLAIN` command, is crucial for tuning complex queries.
Leveraging Vector Search in Analytics
Imagine you want to recommend products similar to a user’s past purchases or group products semantically, even if their product_id is different. This is where product_vector comes in. By performing a vector similarity search, we can find products that are “semantically close” to a given product, adding a powerful dimension to our analytics beyond simple categorical matching. Stoolap provides internal functions or operators to efficiently compute similarity metrics like cosine similarity directly within SQL queries.
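To make “semantically close” concrete: cosine similarity measures the angle between two embedding vectors, ranging from 1.0 (same direction) through 0.0 (orthogonal) to -1.0 (opposite). Whatever function Stoolap ultimately exposes for this, the underlying math is small enough to sketch in plain Rust:

```rust
/// Cosine similarity between two equal-length embedding vectors:
/// dot(a, b) / (|a| * |b|). Returns 0.0 if either vector is all zeros.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

A database-side implementation evaluates the same formula per candidate row, ideally accelerated by a vector index so it doesn’t have to scan every stored embedding.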
Step-by-Step Implementation
Let’s start building our Stoolap-powered sales analytics dashboard.
1. Project Setup
First, create a new Rust project and add the stoolap dependency.
cargo new stoolap_htap_dashboard
cd stoolap_htap_dashboard
Now, open Cargo.toml and add Stoolap as a dependency. As of March 2026, we’ll assume a stable version like 1.2.0. Always check the official Stoolap GitHub releases page for the absolute latest version.
# Cargo.toml
[package]
name = "stoolap_htap_dashboard"
version = "0.1.0"
edition = "2021"
[dependencies]
stoolap = "1.2.0" # Use the latest stable version as of 2026-03-20
rand = "0.8" # For generating random data
chrono = { version = "0.4", features = ["serde"] } # For handling timestamps
serde = { version = "1.0", features = ["derive"] } # For (de)serializing if needed, good practice
bincode = "1.3" # For serializing/deserializing vectors into BLOBs
tokio = { version = "1", features = ["full"] } # For async runtime
Run cargo build to fetch dependencies and ensure everything compiles.
2. Database Initialization and Schema Definition
Now, let’s write the Rust code to open our Stoolap database and define the sales_transactions table. We’ll store our database file as sales_data.db.
Open src/main.rs and add the following code:
// src/main.rs
use stoolap::database::{Database, DatabaseOptions};
use stoolap::error::StoolapError;
use stoolap::types::Value;
use rand::Rng;
use chrono::{Utc, Duration};
use std::sync::Arc; // Arc for shared ownership across async tasks
// We'll define a simple struct to represent our transaction data later
#[derive(Debug, Clone)]
struct SaleTransaction {
product_id: u32,
customer_id: u32,
amount: f64,
quantity: u32,
transaction_timestamp: chrono::DateTime<Utc>,
product_vector: Vec<f32>, // A simple vector of floats for demonstration
}
// Helper function to create a random product vector for demonstration
fn generate_random_vector(size: usize) -> Vec<f32> {
let mut rng = rand::thread_rng();
(0..size).map(|_| rng.gen_range(-1.0..1.0)).collect()
}
#[tokio::main] // Stoolap often uses async operations, so we'll use tokio for the main function
async fn main() -> Result<(), StoolapError> {
println!("Initializing Stoolap HTAP Dashboard...");
// 1. Open/Create the Stoolap Database
let db_path = "sales_data.db";
let options = DatabaseOptions {
// Example: Configure parallel execution if Stoolap exposes this option.
// As of 2026-03-20, Stoolap might auto-detect or offer specific settings.
// Let's assume a plausible option for demonstration:
num_worker_threads: Some(std::thread::available_parallelism().map_or(4, |p| p.get())),
..Default::default()
};
let db = Arc::new(Database::open(db_path, options).await?);
println!("Database opened at: {}", db_path);
// 2. Define the Schema
// Stoolap supports various SQL types. For vectors, BLOB is used for custom binary data.
let create_table_sql = "
CREATE TABLE IF NOT EXISTS sales_transactions (
transaction_id INTEGER PRIMARY KEY AUTOINCREMENT,
product_id INTEGER NOT NULL,
customer_id INTEGER NOT NULL,
amount REAL NOT NULL,
quantity INTEGER NOT NULL,
transaction_timestamp TIMESTAMP NOT NULL,
product_vector BLOB -- Storing serialized Vec<f32> as a binary large object
);
";
db.execute(create_table_sql, &[]).await?;
println!("'sales_transactions' table ensured.");
// The rest of our logic will go here
// ...
Ok(())
}
Explanation:
- `use stoolap::...`: We import necessary components from the Stoolap library. `Arc` is used because the `Database` object needs to be safely shared across different asynchronous operations without transferring ownership. Cloning the `Arc` increments its reference count, allowing multiple parts of your program to hold a reference to the same database instance.
- `#[tokio::main]`: Stoolap’s async nature means we need an async runtime. `tokio` is a popular choice in Rust.
- `Database::open(db_path, options).await?`: This is how we open or create a Stoolap database file. If `sales_data.db` doesn’t exist, it will be created. `DatabaseOptions` can be customized for things like cache size or, as shown, `num_worker_threads` to hint at parallel execution capabilities. This helps Stoolap leverage available CPU cores.
- `db.execute(create_table_sql, &[]).await?`: This executes our SQL `CREATE TABLE` statement. `IF NOT EXISTS` prevents errors if the table already exists. `PRIMARY KEY AUTOINCREMENT` for `transaction_id` handles unique IDs automatically. `REAL` for `amount` is a floating-point number. `TIMESTAMP` for `transaction_timestamp` is crucial for time-based analytics. `BLOB` for `product_vector` is a generic binary large object. As of March 2026, Stoolap typically uses `BLOB` for custom data structures like vectors, requiring the application to serialize (`bincode::serialize`) and deserialize (`bincode::deserialize`) them.
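The BLOB round-trip is the one place where correctness rests entirely on the application. The chapter uses `bincode`, but the underlying idea is just a fixed byte layout. Here is a dependency-free sketch using little-endian `f32` encoding; note that this layout is not bincode-compatible (bincode additionally prefixes a length), so whichever scheme you choose must be used on both the write and read paths:

```rust
/// Encode a slice of f32 as little-endian bytes (4 bytes per element).
/// NOT bincode-compatible: pick one encoding and use it consistently.
fn vector_to_blob(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|f| f.to_le_bytes()).collect()
}

/// Decode little-endian bytes back into a Vec<f32>.
fn blob_to_vector(blob: &[u8]) -> Vec<f32> {
    blob.chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

A round trip (`blob_to_vector(&vector_to_blob(&v)) == v`) is a cheap unit test worth keeping next to any serialization code like this.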
3. Data Ingestion (OLTP)
Now, let’s simulate some transactional data coming into our system. We’ll generate random SaleTransaction objects and insert them into our sales_transactions table.
Add this function to src/main.rs, after the generate_random_vector function (and before main):
// src/main.rs
// ... (previous code including SaleTransaction struct and generate_random_vector)
async fn insert_sample_data(db: Arc<Database>, count: usize) -> Result<(), StoolapError> {
println!("Inserting {} sample sales transactions...", count);
let mut rng = rand::thread_rng();
let product_vector_size = 8; // Small vector size for demo, typically 32-1536+ dimensions
for i in 0..count {
let transaction = SaleTransaction {
product_id: rng.gen_range(100..110), // 10 unique products
customer_id: rng.gen_range(1000..1020), // 20 unique customers
amount: rng.gen_range(5.0..250.0),
quantity: rng.gen_range(1..5),
transaction_timestamp: Utc::now() - Duration::hours(rng.gen_range(0..24*30)), // Last 30 days
product_vector: generate_random_vector(product_vector_size),
};
// Serialize the vector into a BLOB using bincode
let serialized_vector = bincode::serialize(&transaction.product_vector)
.map_err(|e| StoolapError::Other(format!("Failed to serialize vector: {}", e)))?;
let insert_sql = "
INSERT INTO sales_transactions (product_id, customer_id, amount, quantity, transaction_timestamp, product_vector)
VALUES (?, ?, ?, ?, ?, ?);
";
db.execute(
insert_sql,
&[
Value::Integer(transaction.product_id as i64),
Value::Integer(transaction.customer_id as i64),
Value::Real(transaction.amount),
Value::Integer(transaction.quantity as i64),
Value::Timestamp(transaction.transaction_timestamp),
Value::Blob(serialized_vector),
],
).await?;
if i % 1000 == 0 && i > 0 {
println!(" Inserted {} transactions...", i);
}
}
println!("Finished inserting {} transactions.", count);
Ok(())
}
Then, call insert_sample_data from main:
// Inside `main` function, after table creation
// ...
db.execute(create_table_sql, &[]).await?;
println!("'sales_transactions' table ensured.");
// Insert some sample data (OLTP workload)
let num_transactions = 10_000; // Let's insert 10,000 transactions
insert_sample_data(Arc::clone(&db), num_transactions).await?; // We clone the Arc to safely share the database connection across different asynchronous operations.
Explanation of insert_sample_data:
- `rand::Rng`: Used to generate random product IDs, customer IDs, amounts, and timestamps, simulating real-world variability.
- `chrono::Utc::now() - Duration::hours(...)`: Generates timestamps within the last month.
- `bincode::serialize(...)`: Converts our `Vec<f32>` (the product vector) into a byte array (`Vec<u8>`) that Stoolap can store as a `BLOB`. This is a common pattern when dealing with complex types not directly supported by SQL.
- `INSERT INTO ... VALUES (?, ?, ?, ?, ?, ?)`: A prepared statement for efficient and safe insertion. The `?` placeholders are replaced by the `Value` enum variants.
- `Value::Integer`, `Value::Real`, `Value::Timestamp`, `Value::Blob`: Stoolap’s way of representing different data types for query parameters.
- MVCC in action: While `insert_sample_data` is running, other parts of an application (or even separate threads performing analytical queries) could theoretically query the `sales_transactions` table. Thanks to Stoolap’s MVCC, these concurrent readers would see a consistent snapshot of the data from before the current write operation started, preventing read-write contention and ensuring high availability.
4. Analytical Queries (OLAP)
Now that we have data, let’s run some analytical queries to simulate dashboard reports.
Add the following functions to src/main.rs, after insert_sample_data (and before main):
// src/main.rs
// ... (previous code)
async fn run_total_sales_by_product_report(db: Arc<Database>) -> Result<(), StoolapError> {
println!("\n--- Report: Top 5 Products by Total Sales ---");
let query_sql = "
SELECT
product_id,
SUM(amount) AS total_sales,
COUNT(transaction_id) AS total_transactions
FROM sales_transactions
GROUP BY product_id
ORDER BY total_sales DESC
LIMIT 5;
";
// You could run EXPLAIN here to see the query plan:
// let explain_sql = format!("EXPLAIN {}", query_sql);
// let explain_rows = db.query(&explain_sql, &[]).await?;
// for row in explain_rows {
// println!("EXPLAIN: {:?}", row);
// }
let rows = db.query(query_sql, &[]).await?;
for row in rows {
let product_id: i64 = row.get(0)?;
let total_sales: f64 = row.get(1)?;
let total_transactions: i64 = row.get(2)?;
println!(
"Product ID: {}, Total Sales: {:.2}, Total Transactions: {}",
product_id, total_sales, total_transactions
);
}
Ok(())
}
async fn run_daily_sales_trend_report(db: Arc<Database>) -> Result<(), StoolapError> {
println!("\n--- Report: Daily Sales Trend (Last 7 Days) ---");
// Stoolap's SQL dialect, like many embedded databases, often provides
// `DATE()` and `DATETIME()` functions for timestamp manipulation.
// `DATE(timestamp)` extracts the date part. `DATETIME('now', '-7 days')` calculates a past date.
let query_sql = "
SELECT
DATE(transaction_timestamp) AS sale_date,
SUM(amount) AS daily_total_sales
FROM sales_transactions
WHERE transaction_timestamp >= DATETIME('now', '-7 days')
GROUP BY sale_date
ORDER BY sale_date ASC;
";
let rows = db.query(query_sql, &[]).await?;
for row in rows {
// Stoolap might return DATE as a string or a timestamp, we adapt here.
let sale_date_str: String = row.get(0)?;
let daily_total_sales: f64 = row.get(1)?;
println!(
"Date: {}, Daily Sales: {:.2}",
sale_date_str, daily_total_sales
);
}
Ok(())
}
async fn run_semantic_product_recommendations(db: Arc<Database>) -> Result<(), StoolapError> {
println!("\n--- Report: Semantic Product Recommendations ---");
// For demonstration, let's pick a 'query product' vector.
// In a real application, this would come from a user's past purchase or a specific product.
let query_product_id = 105; // Example product ID for which to find similar products
let mut query_vector_bytes: Option<Vec<u8>> = None;
// First, retrieve the vector for our query product
let query_vector_sql = "SELECT product_vector FROM sales_transactions WHERE product_id = ? LIMIT 1;";
let query_rows = db.query(query_vector_sql, &[Value::Integer(query_product_id as i64)]).await?;
if let Some(row) = query_rows.into_iter().next() {
query_vector_bytes = Some(row.get(0)?)
} else {
println!("Query product ID {} not found.", query_product_id);
return Ok(());
}
let query_vector: Vec<f32> = bincode::deserialize(&query_vector_bytes.unwrap())
.map_err(|e| StoolapError::Other(format!("Failed to deserialize query vector: {}", e)))?;
// Now, find other products with similar vectors.
// Stoolap's vector search is typically exposed via specific functions or operators.
// As of March 2026, assuming a `VECTOR_COSINE_SIMILARITY(vector_col, query_vector_blob)`
// function is available and optimized for efficient similarity search.
// This is a common pattern for embedded databases supporting vector search.
// Always refer to the [official Stoolap documentation](https://github.com/stoolap/stoolap)
// for the exact function signature and usage.
let recommendation_sql = "
SELECT
product_id,
-- Hypothetical Stoolap vector function for cosine similarity;
-- MAX picks each product's best-matching stored vector, so the
-- selected expression is a valid aggregate under GROUP BY.
MAX(VECTOR_COSINE_SIMILARITY(product_vector, ?)) AS similarity_score
FROM sales_transactions
WHERE product_id != ? -- Exclude the query product itself
GROUP BY product_id
ORDER BY similarity_score DESC
LIMIT 5;
";
// Re-serialize the query vector for the SQL parameter
let serialized_query_vector = bincode::serialize(&query_vector)
.map_err(|e| StoolapError::Other(format!("Failed to serialize query vector for search: {}", e)))?;
let rows = db.query(
recommendation_sql,
&[
Value::Blob(serialized_query_vector),
Value::Integer(query_product_id as i64),
],
).await?;
if rows.is_empty() {
println!("No similar products found.");
} else {
println!("Top 5 products semantically similar to Product ID {}:", query_product_id);
for row in rows {
let recommended_product_id: i64 = row.get(0)?;
let similarity_score: f64 = row.get(1)?;
println!(
" Product ID: {}, Similarity Score: {:.4}",
recommended_product_id, similarity_score
);
}
}
Ok(())
}
Now, call these report functions from main:
// Inside `main` function, after `insert_sample_data`
// ...
insert_sample_data(Arc::clone(&db), num_transactions).await?;
// Run our analytical reports (OLAP workload)
run_total_sales_by_product_report(Arc::clone(&db)).await?;
run_daily_sales_trend_report(Arc::clone(&db)).await?;
run_semantic_product_recommendations(Arc::clone(&db)).await?;
println!("\nStoolap HTAP Dashboard simulation complete!");
Ok(())
}
Explanation of Analytical Queries:
- `run_total_sales_by_product_report`:
  - Uses `SUM(amount)` and `COUNT(transaction_id)` with `GROUP BY product_id`. This is a classic OLAP aggregation.
  - `ORDER BY total_sales DESC LIMIT 5` shows us the top-selling products.
  - Cost-Based Optimization: The commented `EXPLAIN` block demonstrates how you would typically inspect the query optimizer’s plan. Running `EXPLAIN` on a query reveals details like which indexes are used, join order, and intermediate steps, helping you understand and optimize its performance.
- `run_daily_sales_trend_report`:
  - `DATE(transaction_timestamp)` is used to truncate timestamps to just the date, allowing us to group by day. This is a common SQL function.
  - `WHERE transaction_timestamp >= DATETIME('now', '-7 days')` filters for recent data, a common dashboard requirement.
- `run_semantic_product_recommendations`:
  - This is where vector search comes into play. We first retrieve the `product_vector` for a specific product.
  - Then, we use a hypothetical but plausible `VECTOR_COSINE_SIMILARITY(vector_column, query_vector_blob)` function. This function calculates the cosine similarity between the stored `product_vector` and our `query_vector`. Stoolap would internally optimize this operation, potentially using specialized indexes for vector data.
  - The results are ordered by `similarity_score DESC` to show the most similar products.
  - Important: The exact SQL syntax for vector search (e.g., function names, operators) will depend on Stoolap’s specific implementation. Always refer to the official Stoolap documentation for the most accurate syntax, as this is an evolving feature.
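If `GROUP BY` aggregation feels abstract, the first report is logically equivalent to the following in-memory fold over `(product_id, amount)` pairs. Stoolap simply runs the same computation inside the engine, in parallel and without materializing every row in your process; the function below is purely illustrative, not part of any API:

```rust
use std::collections::HashMap;

/// In-memory equivalent of:
///   SELECT product_id, SUM(amount), COUNT(*)
///   FROM sales_transactions GROUP BY product_id
fn total_sales_by_product(rows: &[(u32, f64)]) -> HashMap<u32, (f64, u64)> {
    let mut totals: HashMap<u32, (f64, u64)> = HashMap::new();
    for &(product_id, amount) in rows {
        let entry = totals.entry(product_id).or_insert((0.0, 0));
        entry.0 += amount; // SUM(amount)
        entry.1 += 1;      // COUNT(*)
    }
    totals
}
```

The `ORDER BY total_sales DESC LIMIT 5` step would then be a sort of the map’s entries by the summed amount, keeping the first five.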
Now, run your application:
cargo run --release
You should see output showing the database initialization, data insertion progress, and then the results of your three analytical reports!
Mini-Challenge
You’ve built a basic HTAP dashboard! Now, let’s extend its capabilities.
Challenge: Implement a new analytical report function called run_customer_spending_distribution that calculates the average transaction value per customer and identifies the top 5 customers by their average transaction amount.
- Hint: You’ll need to use `AVG(amount)` and `GROUP BY customer_id`. Don’t forget to order the results!
- What to observe/learn: This exercise reinforces your understanding of `GROUP BY` with aggregate functions and how to extract specific insights about user behavior from transactional data.
Once implemented, call this new function from your main function alongside the other reports.
Common Pitfalls & Troubleshooting
- Incorrect SQL Syntax for Stoolap: Stoolap, while SQL-compliant, might have minor differences in specific functions (especially date/time or vector functions) compared to other databases.
  - Troubleshoot: Always cross-reference with the official Stoolap documentation for exact function names and syntax. Error messages from Stoolap typically point to syntax issues.
- Vector Serialization/Deserialization Issues: Storing `Vec<f32>` as `BLOB` requires careful serialization and deserialization.
  - Troubleshoot: Ensure you’re using the same serialization library (`bincode` in our case) and version for both writing and reading. Check for `StoolapError::Other` messages related to serialization failures. Verify the `product_vector_size` is consistent.
- Performance Bottlenecks with Large Datasets: As your `sales_transactions` table grows, some analytical queries might become slow.
  - Troubleshoot:
    - Indexing: Ensure appropriate indexes are created (e.g., on `product_id`, `customer_id`, `transaction_timestamp`) for frequently queried columns. Stoolap’s query optimizer relies heavily on indexes.
    - Query Optimization: As demonstrated, use `EXPLAIN` to analyze the execution plan of your slow queries. Look for full table scans, inefficient joins, or missing indexes.
    - Parallel Execution: While Stoolap aims to manage this automatically, ensure your `DatabaseOptions` (e.g., `num_worker_threads`) are configured appropriately to leverage available CPU cores for complex OLAP queries.
- Resource Contention (HTAP challenge): If you’re constantly inserting data while running very heavy analytical queries, you might observe temporary slowdowns.
  - Troubleshoot: Stoolap’s MVCC is designed to largely mitigate this by allowing reads to proceed without blocking writes. However, for extremely high contention scenarios, consider batching inserts, optimizing indexes, or potentially running very heavy OLAP reports during off-peak hours (though the goal of HTAP is to minimize the need for this).
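For the indexing advice above, the DDL would look like the following in standard SQL. This is a hedged sketch: the exact `CREATE INDEX` syntax and `IF NOT EXISTS` support must be confirmed against Stoolap’s documentation.

```sql
-- Hypothetical syntax; verify against Stoolap's docs before relying on it.
CREATE INDEX IF NOT EXISTS idx_sales_product   ON sales_transactions (product_id);
CREATE INDEX IF NOT EXISTS idx_sales_customer  ON sales_transactions (customer_id);
CREATE INDEX IF NOT EXISTS idx_sales_timestamp ON sales_transactions (transaction_timestamp);
```

These three columns are the ones our reports filter and group on, so they are the natural first candidates; confirm with `EXPLAIN` that the optimizer actually uses them.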
Summary
Congratulations! You’ve successfully built a basic Hybrid OLTP/OLAP analytics dashboard using Stoolap.
Here are the key takeaways from this chapter:
- HTAP in Practice: You saw how Stoolap can simultaneously handle transactional data ingestion (OLTP) and complex analytical reporting (OLAP) within a single, embedded application.
- Schema Design for HTAP: We designed a `sales_transactions` table that balances the needs of both fast writes and efficient analytical queries, using `BLOB` for flexible storage of vector embeddings.
- Practical Data Ingestion: You learned how to programmatically insert data into Stoolap using Rust, including handling complex types like vectors via serialization (`bincode`).
- Diverse Analytical Queries: You implemented several common OLAP patterns, including aggregations (`SUM`, `COUNT`), grouping (`GROUP BY`), and time-series analysis using Stoolap’s SQL date functions (`DATE`, `DATETIME`).
- Vector Search Integration: We explored how Stoolap’s vector search capabilities can be integrated into analytical reports for semantic recommendations, using a specialized SQL function (`VECTOR_COSINE_SIMILARITY`).
- Performance Considerations: We touched upon how Stoolap’s parallel query execution and cost-based optimizer, along with proper indexing and `EXPLAIN` analysis, are crucial for high-performance HTAP.
This project demonstrates the real-world power and flexibility of Stoolap as a modern embedded database. You now have a solid foundation for building more sophisticated data-driven applications that require high performance for mixed workloads.
What’s next? In the final chapters, we might delve into advanced topics like Stoolap’s configuration, performance tuning, or deployment considerations for various environments. Keep exploring and building!
References
- Stoolap GitHub Repository
- Stoolap Releases - GitHub
- Rust `rand` crate documentation
- Rust `chrono` crate documentation
- Rust `bincode` crate documentation
- Rust `tokio` crate documentation