Introduction to Parallel Execution
Welcome back, intrepid data explorer! In our journey through Stoolap, we’ve already covered the foundational concepts of setting up your database, modeling data, and managing concurrent operations with MVCC transactions. These are crucial building blocks for any robust application.
Today, we’re going to dive into a feature that truly sets modern embedded databases like Stoolap apart: parallel query execution. Imagine you have a huge pile of work, and instead of doing it all yourself, you can enlist a team of helpers to tackle different parts simultaneously. That’s the essence of parallel execution in a database!
Why does this matter, especially for an embedded database? Traditionally, embedded databases were seen as lightweight solutions for simple data storage. Stoolap challenges this notion by bringing advanced features like parallel execution to the embedded world. This means you can perform complex analytical queries, crunching large datasets directly within your application, leveraging all the processing power of modern multi-core CPUs. This chapter will demystify how Stoolap achieves this, why it’s a game-changer, and how you can harness its power to make your applications blazingly fast.
The Power of Parallelism: Core Concepts
At its heart, parallel execution is about performing multiple tasks concurrently to reduce the overall time taken to complete a larger job. In the context of a database, this means breaking down a single, complex SQL query into smaller, independent sub-tasks that can be processed simultaneously by different CPU cores or threads.
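To make this concrete before we touch any SQL, here is a minimal, Stoolap-independent sketch in plain Rust: a "table scan" (summing a large vector) is split into chunks, each chunk is processed on its own thread, and the partial results are combined at the end. The chunk count and thread usage are illustrative choices for this sketch, not Stoolap internals.

```rust
use std::thread;

/// Sum a large slice by splitting it into `workers` chunks, summing each
/// chunk on its own thread, and combining the partial results -- the same
/// split/process/merge pattern a parallel table scan follows.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    // Ceiling division so every element lands in some chunk.
    let chunk_size = (data.len() + workers - 1) / workers;
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // Result aggregation: combine each worker's partial sum.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000_000).collect();
    println!("total = {}", parallel_sum(&data, 4));
}
```

Note that the answer is identical no matter how many workers we use; only the wall-clock time changes. That independence of result from degree of parallelism is exactly what a database's parallel operators must guarantee.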
Why Stoolap Embraces Parallel Execution
Traditional embedded databases like SQLite are primarily designed for single-threaded, transactional workloads. While incredibly efficient for their purpose, they typically don’t offer built-in parallel query execution. Stoolap, however, is built with a modern architecture from the ground up to support both high-performance OLTP (transactional) and OLAP (analytical) workloads within a single, embedded system.
Here’s why Stoolap excels at parallel execution:
- Rust’s Concurrency Prowess: Stoolap is written in Rust, a language renowned for its memory safety and powerful concurrency primitives. This allows Stoolap’s developers to build highly efficient, thread-safe parallel execution mechanisms without the common pitfalls of other languages.
- Modern Query Optimizer: Stoolap’s cost-based query optimizer (which we’ll explore in the next chapter!) is sophisticated enough to analyze a query and identify parts that can be executed in parallel. It intelligently decides how to best distribute the workload.
- HTAP Design: As a Hybrid Transactional/Analytical Processing (HTAP) database, Stoolap is engineered to handle both quick, individual data operations and heavy, analytical aggregations. Parallel execution is crucial for the latter, allowing it to process vast amounts of data efficiently.
How Parallel Execution Works (The Big Picture)
Let’s simplify the process. When you submit a complex SQL query to Stoolap, here’s a high-level overview of what happens:
- Parsing: The query string is broken down into its constituent parts and checked for valid syntax.
- Optimization: The query optimizer takes this parsed query and figures out the most efficient way to execute it. This is where the magic of parallel execution planning happens. The optimizer identifies operations like large table scans, complex joins, or aggregations that can be split and run in parallel.
- Task Distribution: The execution engine, acting like a project manager, takes the optimized plan and dispatches sub-tasks to a pool of worker threads.
- Parallel Processing: Each worker thread processes its assigned portion of the data or sub-task concurrently.
- Result Aggregation: Once all worker threads complete their tasks, their partial results are gathered and combined to produce the final result set for your query.
This collaborative approach significantly reduces the total query execution time, especially for data-intensive operations.
The flow breaks down into these phases:
- Planning: Your SQL query is parsed and converted into a logical execution plan.
- Optimization: The optimizer analyzes the logical plan and determines the most efficient physical plan, including identifying opportunities for parallel execution.
- Execution Engine: The engine orchestrates the actual execution, dispatching tasks.
- Parallel Execution: The work is split among multiple worker threads, each processing a portion of the data independently.
- Result Assembly: The partial results from all threads are combined to form the final output.
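The parallel-execution and result-assembly phases described above can be sketched in plain Rust. This is an illustrative model, not Stoolap's actual engine: each worker computes partial `(sum, count)` pairs per group over its partition, and the assembly step merges them into averages — which is precisely why an `AVG ... GROUP BY` parallelizes cleanly.

```rust
use std::collections::HashMap;
use std::thread;

/// One (sensor_id, temperature) reading.
type Reading = (u32, f64);

/// Worker step: fold one partition into partial (sum, count) per sensor.
fn partial_agg(rows: &[Reading]) -> HashMap<u32, (f64, u64)> {
    let mut acc: HashMap<u32, (f64, u64)> = HashMap::new();
    for &(id, temp) in rows {
        let e = acc.entry(id).or_insert((0.0, 0));
        e.0 += temp;
        e.1 += 1;
    }
    acc
}

/// Result assembly: merge every worker's partials, then finish the AVG.
fn parallel_avg(rows: &[Reading], workers: usize) -> HashMap<u32, f64> {
    let chunk = (rows.len() + workers - 1) / workers;
    let merged = thread::scope(|s| {
        let handles: Vec<_> = rows
            .chunks(chunk)
            .map(|part| s.spawn(move || partial_agg(part)))
            .collect();
        let mut merged: HashMap<u32, (f64, u64)> = HashMap::new();
        for h in handles {
            for (id, (sum, n)) in h.join().unwrap() {
                let e = merged.entry(id).or_insert((0.0, 0));
                e.0 += sum;
                e.1 += n;
            }
        }
        merged
    });
    merged.into_iter().map(|(id, (s, n))| (id, s / n as f64)).collect()
}

fn main() {
    let rows = vec![(1, 20.0), (2, 30.0), (1, 22.0), (2, 34.0)];
    println!("{:?}", parallel_avg(&rows, 2));
}
```

The key design point: `AVG` cannot be merged directly (an average of averages is wrong for unequal group sizes), so the workers exchange `(sum, count)` pairs and the division happens only once, in the serial assembly step.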
Configuring Parallelism in Stoolap
Stoolap aims to be smart about parallel execution out-of-the-box, but you can often influence its behavior. While the exact configuration might evolve, typically databases allow you to set parameters like:
- `max_worker_threads` (or similar): Defines the maximum number of threads the database can use for parallel operations. A common recommendation is to set this to the number of CPU cores available on your system, or slightly fewer to leave resources for other application tasks.
- `parallel_threshold`: A hint to the optimizer, indicating the minimum amount of data or complexity a query must have before it considers parallel execution. Small queries often incur more overhead from parallelization than they gain in speed.
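Whatever names Stoolap ultimately exposes, the decision logic such settings drive can be sketched in a few lines. The struct and field names below simply echo the hypothetical parameters discussed above; they are not actual Stoolap configuration options.

```rust
use std::thread;

/// Hypothetical parallelism settings, mirroring the parameters
/// described above -- not real Stoolap configuration names.
struct ParallelConfig {
    max_worker_threads: usize,
    parallel_threshold: usize, // minimum row count before parallelizing
}

impl ParallelConfig {
    /// Default to the machine's core count, falling back to 1.
    fn detect() -> Self {
        let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
        ParallelConfig { max_worker_threads: cores, parallel_threshold: 10_000 }
    }

    /// Decide how many workers a scan of `rows` rows should get:
    /// small scans stay serial; large ones get up to the cap.
    fn workers_for(&self, rows: usize) -> usize {
        if rows < self.parallel_threshold {
            1 // overhead would outweigh the gain
        } else {
            // Roughly one worker per threshold's worth of rows, capped.
            (rows / self.parallel_threshold).clamp(1, self.max_worker_threads)
        }
    }
}

fn main() {
    let cfg = ParallelConfig { max_worker_threads: 8, parallel_threshold: 10_000 };
    assert_eq!(cfg.workers_for(500), 1);       // too small: stay serial
    assert_eq!(cfg.workers_for(50_000), 5);    // five threshold-sized partitions
    assert_eq!(cfg.workers_for(5_000_000), 8); // capped at max_worker_threads
    let auto = ParallelConfig::detect();
    println!("detected {} cores", auto.max_worker_threads);
}
```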
Important Note for Stoolap (as of 2026-03-20): Stoolap is actively developed. Configuration options for parallel execution might be exposed through its API when embedding it, or via specific pragmas within SQL. Always consult the official Stoolap GitHub repository and documentation for the most up-to-date configuration methods. For our examples, we’ll assume a PRAGMA or connection string option for simplicity.
Step-by-Step Implementation: Seeing Parallelism in Action
Let’s get our hands dirty and create a scenario where parallel execution can shine. We’ll simulate an IoT sensor data logging system, generating a large number of entries, and then run an analytical query that benefits from Stoolap’s parallel processing capabilities.
First, ensure you have your Rust environment set up and Stoolap added as a dependency, as covered in earlier chapters. We’ll use Stoolap version 0.5.2 for this example (a hypothetical stable release as of March 2026).
1. Project Setup and Dependencies
If you don’t have a Cargo.toml ready, create a new Rust project:
cargo new stoolap_parallel_example
cd stoolap_parallel_example
Now, add Stoolap to your Cargo.toml:
# Cargo.toml
[package]
name = "stoolap_parallel_example"
version = "0.1.0"
edition = "2021"
[dependencies]
stoolap = "0.5.2" # Verify latest stable release on GitHub: https://github.com/stoolap/stoolap/releases
rand = "0.8" # For generating random data
chrono = { version = "0.4", features = ["serde"] } # For timestamps
Run cargo build to fetch dependencies.
2. Generating a Large Dataset
We need a substantial amount of data to make parallel execution relevant. Let’s create a table for sensor readings and populate it with a million rows.
Open src/main.rs and add the following code:
// src/main.rs
use stoolap::{Connection, Error};
use rand::Rng;
use chrono::{Utc, Duration};
use std::time::Instant;
fn main() -> Result<(), Error> {
// 1. Open a file-backed Stoolap database so the generated data persists
// between runs (the analytical query below will reuse it)
let db_path = "sensor_data.db";
let mut conn = Connection::open(db_path)?;
println!("Database opened at: {}", db_path);
// 2. Create the 'sensor_readings' table
conn.execute_batch(
"CREATE TABLE IF NOT EXISTS sensor_readings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
sensor_id INTEGER NOT NULL,
timestamp DATETIME NOT NULL,
temperature REAL NOT NULL,
humidity REAL NOT NULL
);"
)?;
println!("Table 'sensor_readings' created or already exists.");
// 3. Insert a large amount of data (e.g., 1,000,000 rows)
let num_rows = 1_000_000;
let mut rng = rand::thread_rng();
let start_time = Utc::now() - Duration::days(365); // Data from the last year
println!("Inserting {} rows...", num_rows);
let insert_start = Instant::now();
// Use a transaction for faster inserts
conn.execute("BEGIN TRANSACTION", &[])?;
for i in 0..num_rows {
let sensor_id = rng.gen_range(1..=100); // 100 different sensors
let offset_seconds = rng.gen_range(0..=(365 * 24 * 60 * 60));
let timestamp = start_time + Duration::seconds(offset_seconds);
let temperature = rng.gen_range(15.0..=35.0); // Temperature between 15 and 35
let humidity = rng.gen_range(30.0..=80.0); // Humidity between 30 and 80
conn.execute(
"INSERT INTO sensor_readings (sensor_id, timestamp, temperature, humidity) VALUES (?, ?, ?, ?)",
&[&sensor_id, &timestamp.to_rfc3339(), &temperature, &humidity],
)?;
if (i + 1) % 100_000 == 0 {
println!(" Inserted {} rows...", i + 1);
}
}
conn.execute("COMMIT", &[])?;
let insert_duration = insert_start.elapsed();
println!("Inserted {} rows in {:?}", num_rows, insert_duration);
// Placeholder for query execution
// We'll add the analytical query here next
Ok(())
}
Run this once with cargo run to populate your sensor_data.db file. This might take a little while, which is exactly what we want to demonstrate the benefits of parallel execution!
3. Executing a Parallelized Analytical Query
Now, let’s add an analytical query that computes average temperature and humidity for each sensor over a specific period. This type of query is a prime candidate for parallel execution because it involves scanning a large portion of the table and performing aggregations.
In src/main.rs, replace the placeholder comments and the final Ok(()) at the end of main with the following code:
// ... (previous code)
// 4. Configure Stoolap for parallel execution (if direct configuration is available)
// For Stoolap 0.5.2, parallel execution is often enabled by default for suitable queries.
// However, if there were a PRAGMA to set thread count, it might look like this:
// conn.execute("PRAGMA parallel_threads = 4;", &[])?; // Example: Use 4 worker threads
// println!("Configured Stoolap for parallel execution (if applicable).");
// 5. Define an analytical query
let analytical_query = "
SELECT
sensor_id,
AVG(temperature) AS avg_temperature,
AVG(humidity) AS avg_humidity,
COUNT(*) AS readings_count
FROM
sensor_readings
WHERE
timestamp >= ? AND timestamp < ?
GROUP BY
sensor_id
ORDER BY
sensor_id;
";
let query_start_date = (Utc::now() - Duration::days(180)).to_rfc3339(); // Last 6 months
let query_end_date = Utc::now().to_rfc3339();
println!("\nExecuting analytical query for last 6 months...");
let query_start = Instant::now();
let mut stmt = conn.prepare(analytical_query)?;
let rows = stmt.query(&[&query_start_date, &query_end_date])?;
let mut result_count = 0;
for row in rows {
let sensor_id: i64 = row.get(0)?;
let avg_temp: f64 = row.get(1)?;
let avg_humidity: f64 = row.get(2)?;
let count: i64 = row.get(3)?;
// println!("Sensor ID: {}, Avg Temp: {:.2}, Avg Humidity: {:.2}, Readings: {}", sensor_id, avg_temp, avg_humidity, count);
result_count += 1;
}
let query_duration = query_start.elapsed();
println!("Query returned {} results in {:?}", result_count, query_duration);
// Optional: Explain the query plan to see if parallelism was used
println!("\nExplaining query plan (if supported by Stoolap's API)...");
// Stoolap might expose an EXPLAIN or EXPLAIN ANALYZE command.
// For demonstration, let's assume `EXPLAIN` returns a textual plan.
let mut explain_stmt = conn.prepare(&format!("EXPLAIN {}", analytical_query))?;
let explain_rows = explain_stmt.query(&[&query_start_date, &query_end_date])?;
println!("--- Query Plan ---");
for row in explain_rows {
let plan_node: String = row.get(0)?;
println!("{}", plan_node);
}
println!("------------------\n");
Ok(())
}
Explanation of the new code:
- `analytical_query`: This SQL query calculates the average temperature and humidity for each `sensor_id` within a given date range. It uses `AVG()`, `COUNT()`, `GROUP BY`, and `WHERE` clauses, making it a good candidate for parallelization.
- `conn.prepare()` / `query()`: We prepare the query once and then execute it, passing the date range as parameters.
- `Instant::now()` / `elapsed()`: We use Rust's `Instant` to measure the execution time of the query.
- `EXPLAIN` (hypothetical): Many databases offer an `EXPLAIN` command to show the execution plan. If Stoolap supports `EXPLAIN` (or `EXPLAIN ANALYZE`), running it will reveal how the optimizer plans to execute the query and, critically, whether it intends to use parallel workers for specific steps. Look for keywords like "Parallel Scan", "Parallel Aggregate", or "Worker Threads" in the output.
Run cargo run again. This time, observe the query execution time. On a multi-core machine, Stoolap should automatically leverage parallel execution for this type of query, leading to a faster result compared to a purely serial execution model, especially as the data size grows.
What to Observe
- Execution Time: Notice the `Query returned X results in Y` output. On systems with multiple cores, this query should execute significantly faster than if it were processed entirely by a single thread, especially with a million rows.
- CPU Usage: While the query is running, open your system's task manager or activity monitor. You should see multiple CPU cores actively engaged, indicating Stoolap is distributing the workload.
- Explain Plan (if supported): If `EXPLAIN` provides detailed output, you might see evidence of parallel operators. For instance, a "Parallel Table Scan" would indicate that different threads are scanning different parts of the `sensor_readings` table simultaneously.
Mini-Challenge: Optimize Another Query
Now it’s your turn! Let’s practice identifying and optimizing another common analytical pattern.
Challenge:
- Add a new column `event_type TEXT` to the `sensor_readings` table. You can do this with an `ALTER TABLE` statement or by recreating the table (if you don't mind losing data, or just create a new table).
- Update a portion of your existing data (or insert new data) with different `event_type` values (e.g., "ALERT", "NORMAL", "WARNING").
- Write a SQL query that calculates the maximum temperature and the count of 'ALERT' events for each `sensor_id` over the entire dataset.
- Execute this query and measure its performance.
- (Optional, if Stoolap has a direct way to disable parallelism) Try to disable parallel execution and compare the performance.

Hint:
- You'll need `MAX(temperature)` and `SUM(CASE WHEN event_type = 'ALERT' THEN 1 ELSE 0 END)` or `COUNT(CASE WHEN event_type = 'ALERT' THEN 1 END)` for the alert count.
- Remember to use `GROUP BY sensor_id`.
- To add a column and update existing rows:

  ALTER TABLE sensor_readings ADD COLUMN event_type TEXT DEFAULT 'NORMAL';
  UPDATE sensor_readings SET event_type = 'ALERT' WHERE temperature > 30 AND humidity > 70;
  -- Set some more to WARNING
  UPDATE sensor_readings SET event_type = 'WARNING' WHERE temperature > 25 AND event_type = 'NORMAL' LIMIT 100000;

- You might need to re-run your initial data generation script if you drop and recreate the table.
What to Observe/Learn:
- How adding another complex aggregation (conditional counting) still benefits from parallel execution.
- The impact of `ALTER TABLE` and `UPDATE` on your data schema and contents.
- The relative performance difference between your parallelized query and a potentially serial execution (if you manage to disable it).
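As a sanity check for the challenge, here is an in-memory Rust analogue of the target query — `MAX(temperature)` plus a conditional count of 'ALERT' rows per sensor — that you can use to verify your SQL results on a small sample. This is an illustrative helper, not Stoolap API.

```rust
use std::collections::HashMap;

/// One reading: (sensor_id, temperature, event_type).
type Row = (u32, f64, &'static str);

/// In-memory equivalent of:
///   SELECT sensor_id, MAX(temperature),
///          COUNT(CASE WHEN event_type = 'ALERT' THEN 1 END)
///   FROM sensor_readings GROUP BY sensor_id;
fn max_and_alerts(rows: &[Row]) -> HashMap<u32, (f64, u64)> {
    let mut out: HashMap<u32, (f64, u64)> = HashMap::new();
    for &(id, temp, event) in rows {
        let e = out.entry(id).or_insert((f64::NEG_INFINITY, 0));
        if temp > e.0 {
            e.0 = temp; // running MAX(temperature)
        }
        if event == "ALERT" {
            e.1 += 1; // conditional count, like COUNT(CASE WHEN ...)
        }
    }
    out
}

fn main() {
    let rows = vec![
        (1, 31.5, "ALERT"),
        (1, 24.0, "NORMAL"),
        (2, 28.0, "WARNING"),
        (2, 33.0, "ALERT"),
        (2, 19.5, "ALERT"),
    ];
    println!("{:?}", max_and_alerts(&rows));
}
```

Like `AVG`, both `MAX` and a conditional count are decomposable aggregates: partial maxima combine via another max and partial counts via a sum, which is what makes the challenge query parallel-friendly.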
Common Pitfalls & Troubleshooting
While parallel execution is powerful, it’s not a silver bullet. Understanding its limitations and common issues can help you leverage it effectively.
- Overhead for Small Queries: Parallelizing a very simple query or one that processes only a few rows can actually be slower than running it serially. The overhead of task creation, distribution, and result aggregation can outweigh any gains.
  - Troubleshooting: Stoolap's optimizer is usually smart enough to avoid parallelizing trivial queries. If you suspect a small query is slow due to parallelism, check its `EXPLAIN` plan.
- Resource Contention: If your application is already heavily multi-threaded or you're running many parallel queries simultaneously, you might hit CPU or I/O bottlenecks. Too many parallel worker threads can lead to context-switching overhead, slowing things down.
  - Troubleshooting: Monitor system CPU usage. If it's consistently at 100% across all cores, you might be over-parallelizing. Consider setting `max_worker_threads` to a lower value if Stoolap exposes this option, or stagger your complex queries.
- Non-Parallelizable Operations: Not all parts of a SQL query can be parallelized. For example, operations that require global ordering (like a single `ORDER BY` without a `LIMIT` on the final result set) or certain types of scalar functions might need to be performed serially after the parallel steps.
  - Troubleshooting: Again, the `EXPLAIN` plan is your best friend. It will show which operations are parallel and which are serial, helping you understand bottlenecks.
- Ineffective Schema Design: While parallel execution helps, a poorly designed schema (e.g., missing indexes on `WHERE` clause columns) can still lead to full table scans even with parallelism, which might be inefficient.
  - Troubleshooting: Ensure appropriate indexes are in place for columns used in `WHERE`, `JOIN`, and `ORDER BY` clauses to reduce the amount of data that needs to be processed, even by parallel workers.
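The "serial step after parallel steps" pitfall can be seen in miniature: chunks of data can be sorted in parallel, but producing one globally ordered result still requires a final serial merge. The sketch below is a simplified k-way merge in plain Rust, not Stoolap's implementation.

```rust
use std::thread;

/// Parallel phase: each worker sorts its own chunk.
/// Serial phase: a k-way merge produces the global order -- that final
/// merge is the part that cannot be farmed out the same way.
fn parallel_sort(data: &[i64], workers: usize) -> Vec<i64> {
    let chunk = (data.len() + workers - 1) / workers;
    let runs: Vec<Vec<i64>> = thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|c| {
                s.spawn(move || {
                    let mut v = c.to_vec();
                    v.sort_unstable(); // parallel: one sorted run per worker
                    v
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Serial merge: repeatedly take the smallest head among all runs.
    let mut out = Vec::with_capacity(data.len());
    let mut cursors = vec![0usize; runs.len()];
    while out.len() < data.len() {
        let mut best: Option<usize> = None;
        for i in 0..runs.len() {
            if cursors[i] < runs[i].len()
                && best.map_or(true, |b| runs[i][cursors[i]] < runs[b][cursors[b]])
            {
                best = Some(i);
            }
        }
        let i = best.expect("some run still has elements");
        out.push(runs[i][cursors[i]]);
        cursors[i] += 1;
    }
    out
}

fn main() {
    let data = vec![5, 3, 9, 1, 8, 2, 7, 4, 6];
    println!("sorted: {:?}", parallel_sort(&data, 3));
}
```

This mirrors what an `EXPLAIN` plan typically shows for a large `ORDER BY`: parallel scan/sort operators feeding a single serial merge or gather node at the top.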
Summary
Phew! You’ve just explored one of the most exciting features of modern embedded databases: parallel query execution. Let’s quickly recap the key takeaways:
- Leveraging Modern Hardware: Stoolap utilizes parallel execution to take full advantage of multi-core CPUs, speeding up complex analytical queries within your embedded application.
- Stoolap’s Advantages: Built with Rust’s concurrency features and a sophisticated query optimizer, Stoolap is designed for HTAP workloads, making parallel processing a core strength.
- How it Works: Queries are parsed, optimized to identify parallelizable tasks, distributed to worker threads, executed concurrently, and then their partial results are aggregated.
- Practical Application: You learned how to set up a scenario with a large dataset and execute an analytical query that benefits from Stoolap’s parallel capabilities.
- Configuration & Monitoring: While often automatic, understanding how to configure parallel threads (if exposed) and interpret `EXPLAIN` plans is crucial for optimization.
- Common Pitfalls: Be mindful of overhead for small queries, resource contention, and inherently serial operations.
You’re now equipped to understand how Stoolap can handle demanding analytical workloads, right alongside your transactional operations, all from within your application. This capability is a huge differentiator for Stoolap in the embedded database landscape.
In our next chapter, we’ll delve deeper into the brain behind these optimizations: Cost-Based Query Optimization. You’ll learn how Stoolap’s optimizer makes intelligent decisions about query plans, including when and how to apply parallel execution, to ensure your queries run as fast as possible!
References
- Stoolap GitHub Repository
- Stoolap Releases (check for latest version)
- The Rust Programming Language (for understanding Rust’s concurrency)
- Chrono Crate Documentation (for date/time handling in Rust)
- Rand Crate Documentation (for random number generation)