Introduction
Welcome to Chapter 10! So far, we’ve explored Stoolap’s core features, from its embedded nature and MVCC transactions to parallel query execution and the exciting world of vector search. Now, it’s time to put that knowledge into action by building a practical project: a hybrid OLTP/OLAP analytics dashboard.
In this chapter, you’ll learn how to leverage Stoolap’s unique capabilities to manage both high-volume transactional data ingestion (OLTP) and complex analytical queries (OLAP) within a single, embedded application. We’ll design a schema suitable for both workloads, insert dynamic data, and then query it to extract meaningful insights, simulating a real-time analytics dashboard. This project will solidify your understanding of Stoolap’s power as an HTAP database.
Before we dive in, make sure you have the Rust toolchain installed and a basic understanding of SQL. Familiarity with the concepts covered in previous chapters, especially schema design, querying, and basic Stoolap setup, will be beneficial. Let’s get building!
Core Concepts for an HTAP Dashboard
Building an effective HTAP dashboard requires a thoughtful approach to data modeling, transaction management, and query optimization. Stoolap, with its modern architecture, provides an excellent foundation for this.
HTAP Refresher: Why Stoolap Shines
Recall that Stoolap is designed to handle Hybrid Transactional/Analytical Processing (HTAP) workloads efficiently. This means it can gracefully manage:
- OLTP (Online Transaction Processing): Frequent, small, concurrent read/write operations (e.g., inserting new sales records, updating customer profiles). Stoolap’s Multi-Version Concurrency Control (MVCC) ensures high concurrency by allowing readers to see a consistent snapshot of the database without being blocked by writers.
- OLAP (Online Analytical Processing): Complex, read-heavy queries over large datasets, often involving aggregations, joins, and time-series analysis (e.g., calculating total sales by region, identifying top-selling products). Stoolap’s parallel query execution and cost-based optimizer are crucial here, enabling efficient processing of these demanding queries.
- Vector Search: For advanced analytical use cases like semantic recommendations, anomaly detection based on data embeddings, or intelligent content retrieval.
The goal of our dashboard is to ingest new transactions (OLTP) and immediately reflect those changes in aggregated analytical views (OLAP), all powered by a single Stoolap instance embedded directly in our application.
Dashboard Application Architecture
Our project will simulate a simple “Sales Analytics Dashboard.” It won’t have a fancy graphical user interface (GUI), but rather a command-line interface that demonstrates data ingestion and report generation.
Conceptually, the application embeds a single Stoolap instance: an ingestion path writes individual sales transactions (the OLTP side), while report functions query the very same tables for aggregates, trends, and vector similarity (the OLAP side). No ETL pipeline or separate analytics database sits in between.
Data Model Design for HTAP
For an HTAP system, schema design is a critical balancing act. We need a schema that allows for fast inserts (OLTP) while also being efficient for complex aggregations (OLAP). A common approach is to use a slightly denormalized “fact” table for transactions, which makes analytical queries simpler and faster, potentially combined with dimension tables.
For our sales dashboard, we’ll focus on a sales_transactions table. This table will capture each individual sale. To demonstrate vector search, we’ll also include a product_vector field, which could represent semantic features of a product.
Consider the sales_transactions table with these columns:
- `transaction_id`: A unique identifier for each sale (primary key).
- `product_id`: Identifier for the product sold.
- `customer_id`: Identifier for the customer making the purchase.
- `amount`: The monetary value of the transaction.
- `quantity`: Number of items sold in this transaction.
- `transaction_timestamp`: When the transaction occurred (crucial for time-series analysis).
- `product_vector`: A `BLOB` column storing an embedding for the product. As of March 2026, Stoolap primarily uses `BLOB` for custom binary data types like vectors, leaving serialization and deserialization to the application. This enables semantic search capabilities.
This design is relatively flat, making it easy to insert new records and efficient for GROUP BY operations on product_id, customer_id, and transaction_timestamp.
Parallel Query Execution and Cost-Based Optimization
Stoolap is designed for performance, and two key features enabling this for OLAP workloads are:
- Parallel Query Execution: Stoolap can automatically detect and leverage available CPU cores to execute parts of a complex query in parallel. This significantly speeds up operations like large aggregations or joins. While often managed internally, some `DatabaseOptions` might allow hints or configurations for thread pools.
- Cost-Based Query Optimization: Before executing a query, Stoolap’s optimizer analyzes various potential execution plans and chooses the one estimated to be most efficient (lowest “cost”), taking into account table statistics, indexes, and query predicates. Understanding the optimizer’s choices, often visible via an `EXPLAIN` command, is crucial for tuning complex queries.
Leveraging Vector Search in Analytics
Imagine you want to recommend products similar to a user’s past purchases or group products semantically, even if their product_id is different. This is where product_vector comes in. By performing a vector similarity search, we can find products that are “semantically close” to a given product, adding a powerful dimension to our analytics beyond simple categorical matching. Stoolap provides internal functions or operators to efficiently compute similarity metrics like cosine similarity directly within SQL queries.
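To make “semantically close” concrete: cosine similarity measures the angle between two embedding vectors, ranging from 1.0 (same direction) through 0.0 (orthogonal) to -1.0 (opposite). Whatever function Stoolap ultimately exposes for this, the underlying math is small enough to sketch in plain Rust:

```rust
/// Cosine similarity between two equal-length embedding vectors:
/// dot(a, b) / (|a| * |b|). Returns 0.0 if either vector is all zeros.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

A database-side implementation evaluates the same formula per candidate row, ideally accelerated by a vector index so it doesn’t have to scan every stored embedding.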
Step-by-Step Implementation
Let’s start building our Stoolap-powered sales analytics dashboard.
1. Project Setup
First, create a new Rust project and add the stoolap dependency.
cargo new stoolap_htap_dashboard
cd stoolap_htap_dashboard
Now, open Cargo.toml and add Stoolap as a dependency. As of March 2026, we’ll assume a stable version like 1.2.0. Always check the official Stoolap GitHub releases page for the absolute latest version.
# Cargo.toml
[package]
name = "stoolap_htap_dashboard"
version = "0.1.0"
edition = "2021"
[dependencies]
stoolap = "1.2.0" # Use the latest stable version as of 2026-03-20
rand = "0.8" # For generating random data
chrono = { version = "0.4", features = ["serde"] } # For handling timestamps
serde = { version = "1.0", features = ["derive"] } # For (de)serializing if needed, good practice
bincode = "1.3" # For serializing/deserializing vectors into BLOBs
tokio = { version = "1", features = ["full"] } # For async runtime
Run cargo build to fetch dependencies and ensure everything compiles.
2. Database Initialization and Schema Definition
Now, let’s write the Rust code to open our Stoolap database and define the sales_transactions table. We’ll store our database file as sales_data.db.
Open src/main.rs and add the following code:
// src/main.rs
use stoolap::database::{Database, DatabaseOptions};
use stoolap::error::StoolapError;
use stoolap::types::Value;
use rand::Rng;
use chrono::{Utc, Duration};
use std::sync::Arc; // Arc for shared ownership across async tasks
// We'll define a simple struct to represent our transaction data later
#[derive(Debug, Clone)]
struct SaleTransaction {
product_id: u32,
customer_id: u32,
amount: f64,
quantity: u32,
transaction_timestamp: chrono::DateTime<Utc>,
product_vector: Vec<f32>, // A simple vector of floats for demonstration
}
// Helper function to create a random product vector for demonstration
fn generate_random_vector(size: usize) -> Vec<f32> {
let mut rng = rand::thread_rng();
(0..size).map(|_| rng.gen_range(-1.0..1.0)).collect()
}
#[tokio::main] // Stoolap often uses async operations, so we'll use tokio for the main function
async fn main() -> Result<(), StoolapError> {
println!("Initializing Stoolap HTAP Dashboard...");
// 1. Open/Create the Stoolap Database
let db_path = "sales_data.db";
let options = DatabaseOptions {
// Example: Configure parallel execution if Stoolap exposes this option.
// As of 2026-03-20, Stoolap might auto-detect or offer specific settings.
// Let's assume a plausible option for demonstration:
num_worker_threads: Some(std::thread::available_parallelism().map_or(4, |p| p.get())),
..Default::default()
};
let db = Arc::new(Database::open(db_path, options).await?);
println!("Database opened at: {}", db_path);
// 2. Define the Schema
// Stoolap supports various SQL types. For vectors, BLOB is used for custom binary data.
let create_table_sql = "
CREATE TABLE IF NOT EXISTS sales_transactions (
transaction_id INTEGER PRIMARY KEY AUTOINCREMENT,
product_id INTEGER NOT NULL,
customer_id INTEGER NOT NULL,
amount REAL NOT NULL,
quantity INTEGER NOT NULL,
transaction_timestamp TIMESTAMP NOT NULL,
product_vector BLOB -- Storing serialized Vec<f32> as a binary large object
);
";
db.execute(create_table_sql, &[]).await?;
println!("'sales_transactions' table ensured.");
// The rest of our logic will go here
// ...
Ok(())
}
Explanation:
- `use stoolap::...`: We import necessary components from the Stoolap library. `Arc` is used because the `Database` object needs to be safely shared across different asynchronous operations without transferring ownership. Cloning the `Arc` increments its reference count, allowing multiple parts of your program to hold a reference to the same database instance.
- `#[tokio::main]`: Stoolap’s async nature means we need an async runtime. `tokio` is a popular choice in Rust.
- `Database::open(db_path, options).await?`: This is how we open or create a Stoolap database file. If `sales_data.db` doesn’t exist, it will be created. `DatabaseOptions` can be customized for things like cache size or, as shown, `num_worker_threads` to hint at parallel execution capabilities. This helps Stoolap leverage available CPU cores.
- `db.execute(create_table_sql, &[]).await?`: This executes our SQL `CREATE TABLE` statement. `IF NOT EXISTS` prevents errors if the table already exists. `PRIMARY KEY AUTOINCREMENT` for `transaction_id` handles unique IDs automatically. `REAL` for `amount` is a floating-point number. `TIMESTAMP` for `transaction_timestamp` is crucial for time-based analytics. `BLOB` for `product_vector` is a generic binary large object. As of March 2026, Stoolap typically uses `BLOB` for custom data structures like vectors, requiring the application to serialize (`bincode::serialize`) and deserialize (`bincode::deserialize`) them.
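The BLOB round-trip is the one place where correctness rests entirely on the application. The chapter uses `bincode`, but the underlying idea is just a fixed byte layout. Here is a dependency-free sketch using little-endian `f32` encoding; note that this layout is not bincode-compatible (bincode additionally prefixes a length), so whichever scheme you choose must be used on both the write and read paths:

```rust
/// Encode a slice of f32 as little-endian bytes (4 bytes per element).
/// NOT bincode-compatible: pick one encoding and use it consistently.
fn vector_to_blob(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|f| f.to_le_bytes()).collect()
}

/// Decode little-endian bytes back into a Vec<f32>.
fn blob_to_vector(blob: &[u8]) -> Vec<f32> {
    blob.chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

A round trip (`blob_to_vector(&vector_to_blob(&v)) == v`) is a cheap unit test worth keeping next to any serialization code like this.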
3. Data Ingestion (OLTP)
Now, let’s simulate some transactional data coming into our system. We’ll generate random SaleTransaction objects and insert them into our sales_transactions table.
Add this function to src/main.rs, after the generate_random_vector function (and before main):
// src/main.rs
// ... (previous code including SaleTransaction struct and generate_random_vector)
async fn insert_sample_data(db: Arc<Database>, count: usize) -> Result<(), StoolapError> {
println!("Inserting {} sample sales transactions...", count);
let mut rng = rand::thread_rng();
let product_vector_size = 8; // Small vector size for demo, typically 32-1536+ dimensions
for i in 0..count {
let transaction = SaleTransaction {
product_id: rng.gen_range(100..110), // 10 unique products
customer_id: rng.gen_range(1000..1020), // 20 unique customers
amount: rng.gen_range(5.0..250.0),
quantity: rng.gen_range(1..5),
transaction_timestamp: Utc::now() - Duration::hours(rng.gen_range(0..24*30)), // Last 30 days
product_vector: generate_random_vector(product_vector_size),
};
// Serialize the vector into a BLOB using bincode
let serialized_vector = bincode::serialize(&transaction.product_vector)
.map_err(|e| StoolapError::Other(format!("Failed to serialize vector: {}", e)))?;
let insert_sql = "
INSERT INTO sales_transactions (product_id, customer_id, amount, quantity, transaction_timestamp, product_vector)
VALUES (?, ?, ?, ?, ?, ?);
";
db.execute(
insert_sql,
&[
Value::Integer(transaction.product_id as i64),
Value::Integer(transaction.customer_id as i64),
Value::Real(transaction.amount),
Value::Integer(transaction.quantity as i64),
Value::Timestamp(transaction.transaction_timestamp),
Value::Blob(serialized_vector),
],
).await?;
if i % 1000 == 0 && i > 0 {
println!(" Inserted {} transactions...", i);
}
}
println!("Finished inserting {} transactions.", count);
Ok(())
}
Then, call insert_sample_data from main:
// Inside `main` function, after table creation
// ...
db.execute(create_table_sql, &[]).await?;
println!("'sales_transactions' table ensured.");
// Insert some sample data (OLTP workload)
let num_transactions = 10_000; // Let's insert 10,000 transactions
insert_sample_data(Arc::clone(&db), num_transactions).await?; // We clone the Arc to safely share the database connection across different asynchronous operations.
Explanation of insert_sample_data:
- `rand::Rng`: Used to generate random product IDs, customer IDs, amounts, and timestamps, simulating real-world variability.
- `chrono::Utc::now() - Duration::hours(...)`: Generates timestamps within the last month.
- `bincode::serialize(...)`: Converts our `Vec<f32>` (the product vector) into a byte array (`Vec<u8>`) that Stoolap can store as a `BLOB`. This is a common pattern when dealing with complex types not directly supported by SQL.
- `INSERT INTO ... VALUES (?, ?, ?, ?, ?, ?)`: A prepared statement for efficient and safe insertion. The `?` placeholders are replaced by the `Value` enum variants.
- `Value::Integer`, `Value::Real`, `Value::Timestamp`, `Value::Blob`: Stoolap’s way of representing different data types for query parameters.
- MVCC in action: While `insert_sample_data` is running, other parts of an application (or even separate threads performing analytical queries) could theoretically query the `sales_transactions` table. Thanks to Stoolap’s MVCC, these concurrent readers would see a consistent snapshot of the data from before the current write operation started, preventing read-write contention and ensuring high availability.
4. Analytical Queries (OLAP)
Now that we have data, let’s run some analytical queries to simulate dashboard reports.
Add the following functions to src/main.rs, after insert_sample_data (and before main):
// src/main.rs
// ... (previous code)
async fn run_total_sales_by_product_report(db: Arc<Database>) -> Result<(), StoolapError> {
println!("\n--- Report: Top 5 Products by Total Sales ---");
let query_sql = "
SELECT
product_id,
SUM(amount) AS total_sales,
COUNT(transaction_id) AS total_transactions
FROM sales_transactions
GROUP BY product_id
ORDER BY total_sales DESC
LIMIT 5;
";
// You could run EXPLAIN here to see the query plan:
// let explain_sql = format!("EXPLAIN {}", query_sql);
// let explain_rows = db.query(&explain_sql, &[]).await?;
// for row in explain_rows {
// println!("EXPLAIN: {:?}", row);
// }
let rows = db.query(query_sql, &[]).await?;
for row in rows {
let product_id: i64 = row.get(0)?;
let total_sales: f64 = row.get(1)?;
let total_transactions: i64 = row.get(2)?;
println!(
"Product ID: {}, Total Sales: {:.2}, Total Transactions: {}",
product_id, total_sales, total_transactions
);
}
Ok(())
}
async fn run_daily_sales_trend_report(db: Arc<Database>) -> Result<(), StoolapError> {
println!("\n--- Report: Daily Sales Trend (Last 7 Days) ---");
// Stoolap's SQL dialect, like many embedded databases, often provides
// `DATE()` and `DATETIME()` functions for timestamp manipulation.
// `DATE(timestamp)` extracts the date part. `DATETIME('now', '-7 days')` calculates a past date.
let query_sql = "
SELECT
DATE(transaction_timestamp) AS sale_date,
SUM(amount) AS daily_total_sales
FROM sales_transactions
WHERE transaction_timestamp >= DATETIME('now', '-7 days')
GROUP BY sale_date
ORDER BY sale_date ASC;
";
let rows = db.query(query_sql, &[]).await?;
for row in rows {
// Stoolap might return DATE as a string or a timestamp, we adapt here.
let sale_date_str: String = row.get(0)?;
let daily_total_sales: f64 = row.get(1)?;
println!(
"Date: {}, Daily Sales: {:.2}",
sale_date_str, daily_total_sales
);
}
Ok(())
}
async fn run_semantic_product_recommendations(db: Arc<Database>) -> Result<(), StoolapError> {
println!("\n--- Report: Semantic Product Recommendations ---");
// For demonstration, let's pick a 'query product' vector.
// In a real application, this would come from a user's past purchase or a specific product.
let query_product_id = 105; // Example product ID for which to find similar products
let mut query_vector_bytes: Option<Vec<u8>> = None;
// First, retrieve the vector for our query product
let query_vector_sql = "SELECT product_vector FROM sales_transactions WHERE product_id = ? LIMIT 1;";
let query_rows = db.query(query_vector_sql, &[Value::Integer(query_product_id as i64)]).await?;
if let Some(row) = query_rows.into_iter().next() {
query_vector_bytes = Some(row.get(0)?)
} else {
println!("Query product ID {} not found.", query_product_id);
return Ok(());
}
let query_vector: Vec<f32> = bincode::deserialize(&query_vector_bytes.unwrap())
.map_err(|e| StoolapError::Other(format!("Failed to deserialize query vector: {}", e)))?;
// Now, find other products with similar vectors.
// Stoolap's vector search is typically exposed via specific functions or operators.
// As of March 2026, assuming a `VECTOR_COSINE_SIMILARITY(vector_col, query_vector_blob)`
// function is available and optimized for efficient similarity search.
// This is a common pattern for embedded databases supporting vector search.
// Always refer to the [official Stoolap documentation](https://github.com/stoolap/stoolap)
// for the exact function signature and usage.
let recommendation_sql = "
SELECT
product_id,
-- Hypothetical Stoolap vector function for cosine similarity;
-- MAX picks each product's best-matching stored vector, so the
-- selected expression is a valid aggregate under GROUP BY.
MAX(VECTOR_COSINE_SIMILARITY(product_vector, ?)) AS similarity_score
FROM sales_transactions
WHERE product_id != ? -- Exclude the query product itself
GROUP BY product_id
ORDER BY similarity_score DESC
LIMIT 5;
";
// Re-serialize the query vector for the SQL parameter
let serialized_query_vector = bincode::serialize(&query_vector)
.map_err(|e| StoolapError::Other(format!("Failed to serialize query vector for search: {}", e)))?;
let rows = db.query(
recommendation_sql,
&[
Value::Blob(serialized_query_vector),
Value::Integer(query_product_id as i64),
],
).await?;
if rows.is_empty() {
println!("No similar products found.");
} else {
println!("Top 5 products semantically similar to Product ID {}:", query_product_id);
for row in rows {
let recommended_product_id: i64 = row.get(0)?;
let similarity_score: f64 = row.get(1)?;
println!(
" Product ID: {}, Similarity Score: {:.4}",
recommended_product_id, similarity_score
);
}
}
Ok(())
}
Now, call these report functions from main:
// Inside `main` function, after `insert_sample_data`
// ...
insert_sample_data(Arc::clone(&db), num_transactions).await?;
// Run our analytical reports (OLAP workload)
run_total_sales_by_product_report(Arc::clone(&db)).await?;
run_daily_sales_trend_report(Arc::clone(&db)).await?;
run_semantic_product_recommendations(Arc::clone(&db)).await?;
println!("\nStoolap HTAP Dashboard simulation complete!");
Ok(())
}
Explanation of Analytical Queries:
- `run_total_sales_by_product_report`:
  - Uses `SUM(amount)` and `COUNT(transaction_id)` with `GROUP BY product_id`. This is a classic OLAP aggregation.
  - `ORDER BY total_sales DESC LIMIT 5` shows us the top-selling products.
  - Cost-Based Optimization: The commented `EXPLAIN` block demonstrates how you would typically inspect the query optimizer’s plan. Running `EXPLAIN` on a query reveals details like which indexes are used, join order, and intermediate steps, helping you understand and optimize its performance.
- `run_daily_sales_trend_report`:
  - `DATE(transaction_timestamp)` is used to truncate timestamps to just the date, allowing us to group by day. This is a common SQL function.
  - `WHERE transaction_timestamp >= DATETIME('now', '-7 days')` filters for recent data, a common dashboard requirement.
- `run_semantic_product_recommendations`:
  - This is where vector search comes into play. We first retrieve the `product_vector` for a specific product.
  - Then, we use a hypothetical but plausible `VECTOR_COSINE_SIMILARITY(vector_column, query_vector_blob)` function. This function calculates the cosine similarity between the stored `product_vector` and our `query_vector`. Stoolap would internally optimize this operation, potentially using specialized indexes for vector data.
  - The results are ordered by `similarity_score DESC` to show the most similar products.
  - Important: The exact SQL syntax for vector search (e.g., function names, operators) will depend on Stoolap’s specific implementation. Always refer to the official Stoolap documentation for the most accurate syntax, as this is an evolving feature.
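If `GROUP BY` aggregation feels abstract, the first report is logically equivalent to the following in-memory fold over `(product_id, amount)` pairs. Stoolap simply runs the same computation inside the engine, in parallel and without materializing every row in your process; the function below is purely illustrative, not part of any API:

```rust
use std::collections::HashMap;

/// In-memory equivalent of:
///   SELECT product_id, SUM(amount), COUNT(*)
///   FROM sales_transactions GROUP BY product_id
fn total_sales_by_product(rows: &[(u32, f64)]) -> HashMap<u32, (f64, u64)> {
    let mut totals: HashMap<u32, (f64, u64)> = HashMap::new();
    for &(product_id, amount) in rows {
        let entry = totals.entry(product_id).or_insert((0.0, 0));
        entry.0 += amount; // SUM(amount)
        entry.1 += 1;      // COUNT(*)
    }
    totals
}
```

The `ORDER BY total_sales DESC LIMIT 5` step would then be a sort of the map’s entries by the summed amount, keeping the first five.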
Now, run your application:
cargo run --release
You should see output showing the database initialization, data insertion progress, and then the results of your three analytical reports!
Mini-Challenge
You’ve built a basic HTAP dashboard! Now, let’s extend its capabilities.
Challenge: Implement a new analytical report function called run_customer_spending_distribution that calculates the average transaction value per customer and identifies the top 5 customers by their average transaction amount.
- Hint: You’ll need to use `AVG(amount)` and `GROUP BY customer_id`. Don’t forget to order the results!
- What to observe/learn: This exercise reinforces your understanding of `GROUP BY` with aggregate functions and how to extract specific insights about user behavior from transactional data.
Once implemented, call this new function from your main function alongside the other reports.
Common Pitfalls & Troubleshooting
- Incorrect SQL Syntax for Stoolap: Stoolap, while SQL-compliant, might have minor differences in specific functions (especially date/time or vector functions) compared to other databases.
  - Troubleshoot: Always cross-reference with the official Stoolap documentation for exact function names and syntax. Error messages from Stoolap typically point to syntax issues.
- Vector Serialization/Deserialization Issues: Storing `Vec<f32>` as `BLOB` requires careful serialization and deserialization.
  - Troubleshoot: Ensure you’re using the same serialization library (`bincode` in our case) and version for both writing and reading. Check for `StoolapError::Other` messages related to serialization failures. Verify the `product_vector_size` is consistent.
- Performance Bottlenecks with Large Datasets: As your `sales_transactions` table grows, some analytical queries might become slow.
  - Troubleshoot:
    - Indexing: Ensure appropriate indexes are created (e.g., on `product_id`, `customer_id`, `transaction_timestamp`) for frequently queried columns. Stoolap’s query optimizer relies heavily on indexes.
    - Query Optimization: As demonstrated, use `EXPLAIN` to analyze the execution plan of your slow queries. Look for full table scans, inefficient joins, or missing indexes.
    - Parallel Execution: While Stoolap aims to manage this automatically, ensure your `DatabaseOptions` (e.g., `num_worker_threads`) are configured appropriately to leverage available CPU cores for complex OLAP queries.
- Resource Contention (HTAP challenge): If you’re constantly inserting data while running very heavy analytical queries, you might observe temporary slowdowns.
  - Troubleshoot: Stoolap’s MVCC is designed to largely mitigate this by allowing reads to proceed without blocking writes. However, for extremely high contention scenarios, consider batching inserts, optimizing indexes, or potentially running very heavy OLAP reports during off-peak hours (though the goal of HTAP is to minimize the need for this).
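For the indexing advice above, the DDL would look like the following in standard SQL. This is a hedged sketch: the exact `CREATE INDEX` syntax and `IF NOT EXISTS` support must be confirmed against Stoolap’s documentation.

```sql
-- Hypothetical syntax; verify against Stoolap's docs before relying on it.
CREATE INDEX IF NOT EXISTS idx_sales_product   ON sales_transactions (product_id);
CREATE INDEX IF NOT EXISTS idx_sales_customer  ON sales_transactions (customer_id);
CREATE INDEX IF NOT EXISTS idx_sales_timestamp ON sales_transactions (transaction_timestamp);
```

These three columns are the ones our reports filter and group on, so they are the natural first candidates; confirm with `EXPLAIN` that the optimizer actually uses them.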
Summary
Congratulations! You’ve successfully built a basic Hybrid OLTP/OLAP analytics dashboard using Stoolap.
Here are the key takeaways from this chapter:
- HTAP in Practice: You saw how Stoolap can simultaneously handle transactional data ingestion (OLTP) and complex analytical reporting (OLAP) within a single, embedded application.
- Schema Design for HTAP: We designed a `sales_transactions` table that balances the needs of both fast writes and efficient analytical queries, using `BLOB` for flexible storage of vector embeddings.
- Practical Data Ingestion: You learned how to programmatically insert data into Stoolap using Rust, including handling complex types like vectors via serialization (`bincode`).
- Diverse Analytical Queries: You implemented several common OLAP patterns, including aggregations (`SUM`, `COUNT`), grouping (`GROUP BY`), and time-series analysis using Stoolap’s SQL date functions (`DATE`, `DATETIME`).
- Vector Search Integration: We explored how Stoolap’s vector search capabilities can be integrated into analytical reports for semantic recommendations, using a specialized SQL function (`VECTOR_COSINE_SIMILARITY`).
- Performance Considerations: We touched upon how Stoolap’s parallel query execution and cost-based optimizer, along with proper indexing and `EXPLAIN` analysis, are crucial for high-performance HTAP.
This project demonstrates the real-world power and flexibility of Stoolap as a modern embedded database. You now have a solid foundation for building more sophisticated data-driven applications that require high performance for mixed workloads.
What’s next? In the final chapters, we might delve into advanced topics like Stoolap’s configuration, performance tuning, or deployment considerations for various environments. Keep exploring and building!
References
- Stoolap GitHub Repository
- Stoolap Releases - GitHub
- Rust `rand` crate documentation
- Rust `chrono` crate documentation
- Rust `bincode` crate documentation
- Rust `tokio` crate documentation