In WebAssembly, std::thread is implemented on top of Web Workers, so it depends on the browser, which limits how many Workers can be active at the same time (usually around the number of CPU cores, such as 4-8).

Why does a two-tier loop (say 10×10) create threads that consume more resources than a single-tier loop (100 iterations)?

In a WebAssembly (WASM) or browser environment, a two-tier loop that creates threads (for example, 10 outer iterations with 10 inner iterations each, for a total of 100 thread creations) is more likely to cause resource exhaustion and performance degradation than a single-tier loop that creates 100 threads directly.

The reason is that the single-layer loop benefits from thread (Web Worker) reuse. In a two-tier loop, once the outer layer has created its 10 threads, every thread created in the inner layer has to initialize its own resources (execution context, stack memory, kernel resources) all over again.
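
To make the difference concrete, the sketch below counts how many Workers each approach starts. It assumes that every outer iteration starts one parent Worker and every inner iteration starts one child Worker, which is how the doubleThreads files in this post are structured. `countNestedWorkers` and `countPooledWorkers` are hypothetical helper names used only for illustration.

```javascript
// Hypothetical helpers: count the Workers started by each strategy.

// Nested creation: each outer iteration starts a parent Worker,
// and each parent starts `inner` child Workers of its own.
function countNestedWorkers(outer, inner) {
    return outer + outer * inner;
}

// Pooled execution: at most `poolSize` Workers exist, however many tasks run.
function countPooledWorkers(taskCount, poolSize) {
    return Math.min(taskCount, poolSize);
}

console.log(countNestedWorkers(10, 10)); // 110
console.log(countPooledWorkers(100, 8)); // 8
```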

What does it mean that a single-layer loop of 100 threads can reuse threads through a thread pool (for example, one with a fixed size of 8)? Does that mean only 8 Web Workers are actually used, not 100?

Yes, that understanding is correct. When a single-layer loop of 100 tasks runs on a thread pool with a fixed size of 8, only eight Web Workers are actually created, not 100. This is achieved through the thread pool's task queue and thread reuse mechanism. The principles are explained in detail below:

Thread Pool Core Design Concepts

The core design philosophy of a thread pool aims to reuse a fixed number of threads to handle a large batch of tasks, thereby avoiding the overhead associated with frequent creation and destruction of threads. The key elements in its design include:

Fixed Number of Threads: The maximum number of threads is determined based on the browser’s hardware concurrency (e.g., navigator.hardwareConcurrency). For example, setting up 8 threads helps prevent excessive thread creation.
Task Queue: Tasks that are not immediately assigned are placed into a queue. When a thread becomes available, it takes tasks from the queue rather than creating a new thread for each task.
Thread Lifecycle Management: The thread pool reuses the same set of threads throughout its lifecycle until it is destroyed. This avoids terminating threads as soon as their individual tasks are completed.
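
The three elements above can be sketched as a small pool class. This is a minimal illustration, not the mechanism the browser or the wasm runtime actually uses: `WorkerPool` and the `createWorker` factory are hypothetical names, and the sketch assumes each Worker replies with exactly one message per task. Error handling is left out.

```javascript
// Minimal fixed-size worker pool sketch.
// `createWorker` is a hypothetical factory supplied by the caller.
class WorkerPool {
    constructor(size, createWorker) {
        this.queue = [];  // tasks waiting for a free worker
        // Fixed number of threads: all workers are created once, up front.
        this.workers = Array.from({ length: size }, () => createWorker());
        this.idle = [...this.workers];  // workers that are currently free
    }

    // Queue a task; `onDone` receives the data of the worker's reply.
    submit(payload, onDone) {
        this.queue.push({ payload, onDone });
        this.dispatch();
    }

    // Hand queued tasks to idle workers until either list runs out.
    dispatch() {
        while (this.idle.length > 0 && this.queue.length > 0) {
            const worker = this.idle.pop();
            const task = this.queue.shift();
            worker.onmessage = (e) => {
                this.idle.push(worker);  // reuse: the worker returns to the pool
                task.onDone(e.data);
                this.dispatch();         // pick up the next queued task, if any
            };
            worker.postMessage(task.payload);
        }
    }
}
```

In a browser the pool could be created with something like `new WorkerPool(navigator.hardwareConcurrency, () => new Worker('task.js'))`, where `task.js` is a placeholder for a script that answers each message with one result.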

Execution Process for a Single Loop with 100 Tasks

Assuming a single loop needs to execute 100 tasks and a fixed-size pool of 8 threads is used, execution proceeds as follows:

Initialization of Web Workers: The thread pool initializes 8 Web Workers when it starts. Each Worker consumes approximately 5MB of memory (totaling around 40MB).
Task Assignment: Each Worker has a status marked as either “idle” or “busy”. Initially, all Workers are idle. The first 8 tasks are assigned to these 8 Workers, and the remaining 92 tasks enter the queue waiting for execution.
Thread Reuse: Once a Worker completes its task, it automatically takes the next available task from the queue without needing to create a new thread.

In the end, the 8 Workers together complete all 100 tasks, with each Worker handling 12.5 tasks on average (100 ÷ 8).
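
The 12.5 average can be checked with a small simulation. The sketch assumes every task takes the same time, so free Workers pick up queued tasks in round-robin order; `distributeTasks` is a hypothetical helper, and real task durations would shift the per-Worker counts.

```javascript
// Simulate a fixed pool draining a queue of equally long tasks:
// task n is picked up by Worker n % poolSize.
function distributeTasks(taskCount, poolSize) {
    const perWorker = new Array(poolSize).fill(0);
    for (let task = 0; task < taskCount; task++) {
        perWorker[task % poolSize]++;
    }
    return perWorker;
}

const counts = distributeTasks(100, 8);
const average = counts.reduce((sum, n) => sum + n, 0) / counts.length;
console.log(counts);  // the first four Workers run 13 tasks, the other four run 12
console.log(average); // 12.5
```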

Resource Consumption Comparison

Without Thread Pool (100 Web Workers): Consumes roughly 500MB of memory (100 × 5MB). This may also trigger browser thread limits (e.g., Chrome’s limit of a maximum of 100 Workers per page).
With Thread Pool (8 Web Workers): Uses only 40MB of memory, fully adhering to hardware concurrency constraints.
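
The arithmetic behind this comparison is shown below. The 5 MB per Worker figure is the approximation used in this post, not a measured value, and `estimateWorkerMemoryMB` is a hypothetical helper.

```javascript
// Assumption taken from the text above: roughly 5 MB per Worker.
const MB_PER_WORKER = 5;

// Estimated memory, in MB, used by `workerCount` Workers.
function estimateWorkerMemoryMB(workerCount, mbPerWorker = MB_PER_WORKER) {
    return workerCount * mbPerWorker;
}

console.log(estimateWorkerMemoryMB(100)); // 500 (no thread pool)
console.log(estimateWorkerMemoryMB(8));   // 40  (pool of 8)
```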

Example doubleThreads

The first layer loops through 10 tasks. The second layer loops through 10 tasks and does a lot of computation on each task.

This example has no thread pool size limit (unlike wasm) and uses pure JS to simulate how Web Workers operate. CPU utilization reaches 100%.

Example singleThreads

In the single-layer loop example, every worker does the same task.

Performance of single-layer loop with 100 iterations (69% CPU utilization)

Task on childWorker.js:

        const len = 1000;
        for (let i = 0; i < len; i++) {
            result += Math.sqrt(i) * Math.sin(i);
            for (let j = 0; j < len*50; j++) {
                result += Math.sqrt(j) * Math.sin(j);
            }
        }

Files in doubleThreads

server.py:

# Configure the response headers required for SharedArrayBuffer using the http.server module from the Python standard library

import os
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CustomRequestHandler(SimpleHTTPRequestHandler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, directory=os.getcwd(), **kwargs)

    def send_head(self):
        path = self.translate_path(self.path)
        
        # Check that the file exists; answer with 404 instead of sending no response
        if not os.path.exists(path) or not os.path.isfile(path):
            self.send_error(404, "File not found")
            return None
        
        ctype = self.guess_type(path)
        try:
            f = open(path, 'rb')
        except OSError:
            self.send_error(404, "File not found")
            return None
        
        self.send_response(200)
        self.send_header("Content-type", ctype)
        fs = os.fstat(f.fileno())
        self.send_header("Content-Length", str(fs[6]))
        self.send_header('Cross-Origin-Embedder-Policy', 'require-corp')
        self.send_header('Cross-Origin-Opener-Policy', 'same-origin')
        self.end_headers()
        return f

def run_server(host='127.0.0.1', port=4003):
    server_address = (host, port)
    httpd = HTTPServer(server_address, CustomRequestHandler)
    print(f'Starting server on {host}:{port}...')
    httpd.serve_forever()

if __name__ == '__main__':
    run_server()
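
Whether the two headers set by this server took effect can be checked from the page or from a Worker. The sketch below is one way to do that, assuming the standard `crossOriginIsolated` global; `canUseSharedMemory` is a hypothetical helper name and is not part of the original files.

```javascript
// Returns true only when shared memory is usable in the current context.
function canUseSharedMemory() {
    return typeof SharedArrayBuffer !== 'undefined' &&
           globalThis.crossOriginIsolated === true;
}

if (!canUseSharedMemory()) {
    console.warn('Not cross-origin isolated: SharedArrayBuffer cannot be shared between threads.');
}
```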

index.html

<!DOCTYPE html>
<html>
<head>
    <title>Nested Web Workers Test</title>
    <style>
        #status {
            max-height: 400px;
            overflow-y: auto;
            padding: 10px;
            border: 1px solid #ccc;
            margin: 10px;
            font-family: monospace;
        }
        #status div {
            margin: 2px 0;
            border-bottom: 1px solid #eee;
        }
    </style>
</head>
<body>
    <h1>Nested Web Workers Test</h1>
    <div id="status"></div>
    <script src="main.js"></script>
</body>
</html>

main.js

const PARENT_WORKERS = 10;
const parentWorkers = [];
let totalWorkers = 0;
const childWorkerCounts = new Map(); // Number of child Workers for each parent Worker
const maxWorkersLimit = navigator.hardwareConcurrency * 4;

function updateTotalWorkers() {
    const totalChildWorkers = Array.from(childWorkerCounts.values()).reduce((sum, count) => sum + count, 0);
    totalWorkers = parentWorkers.length + totalChildWorkers;
    return totalWorkers;
}

function checkWorkerLimit(count) {
    const memoryUsage = window.performance?.memory?.usedJSHeapSize;
    const memoryLimit = window.performance?.memory?.jsHeapSizeLimit;
    
    const totalChildWorkers = Array.from(childWorkerCounts.values()).reduce((sum, count) => sum + count, 0);
    updateStatus(`Parent workers: ${parentWorkers.length}`);
    updateStatus(`Child workers: ${totalChildWorkers}`);
    updateStatus(`Total workers: ${updateTotalWorkers()}`);
    updateStatus(`CPU cores: ${navigator.hardwareConcurrency}`);
    
    if (memoryUsage && memoryLimit) {
        const memoryPercentage = (memoryUsage / memoryLimit) * 100;
        updateStatus(`Memory usage: ${memoryPercentage.toFixed(2)}%`);
    }

    if (totalWorkers >= maxWorkersLimit) {
        updateStatus('WARNING: Reached maximum worker limit!');
        return false;
    }
    
    return true;
}

function updateStatus(message) {
    console.log(message);

    const statusDiv = document.getElementById('status');
    const timestamp = new Date().toLocaleTimeString();
    const logEntry = document.createElement('div');
    logEntry.textContent = `[${timestamp}] ${message}`;
    statusDiv.appendChild(logEntry);

    // Keep the latest message visible
    statusDiv.scrollTop = statusDiv.scrollHeight;
}

// Create the parent workers
for (let i = 0; i < PARENT_WORKERS; i++) {
    if (!checkWorkerLimit(totalWorkers)) {
        updateStatus('Stopping worker creation due to resource limits');
        break;
    }

    const worker = new Worker('parentWorker.js');
    
    worker.onmessage = function(e) {
        if (e.data.type === 'workerCount') {
            childWorkerCounts.set(i, e.data.count); // Record the child Worker count separately for each parent Worker
            checkWorkerLimit();
        } else {
            updateStatus(`Parent Worker ${i}: ${JSON.stringify(e.data)}`);
        }
    };

    worker.onerror = function(error) {
        childWorkerCounts.delete(i); // On error, clear this parent Worker's child Worker count
        updateTotalWorkers();
        updateStatus(`Parent Worker ${i} Error: ${error.message}`);
    };

    worker.postMessage({ id: i, type: 'start' });
    parentWorkers.push(worker);
}

parentWorker.js

const CHILD_WORKERS = 10;
const childWorkers = [];
let activeChildWorkers = 0;

function reportWorkerCount() {
    postMessage({
        type: 'workerCount',
        count: activeChildWorkers
    });
}

// Create the child workers
for (let i = 0; i < CHILD_WORKERS; i++) {
    try {
        const worker = new Worker('childWorker.js');
        activeChildWorkers++;
        reportWorkerCount();
        
        worker.onmessage = function(e) {
            if (e.data.type === 'error' || e.data.type === 'resourceLimit') {
                postMessage({
                    type: 'warning',
                    childId: i,
                    message: e.data.message
                });
            } else {
                postMessage({
                    childId: i,
                    data: e.data
                });
            }
        };

        worker.onerror = function(error) {
            activeChildWorkers--;
            reportWorkerCount();
            postMessage({
                type: 'error',
                childId: i,
                message: error.message
            });
        };

        worker.postMessage({ id: i, type: 'start' });
        childWorkers.push(worker);
    } catch (e) {
        postMessage({
            type: 'error',
            message: `Failed to create child worker ${i}: ${e.message}`
        });
        break;
    }
}

// Receive messages from the main thread
onmessage = function(e) {
    postMessage({
        type: 'info',
        message: `Parent worker ${e.data.id} started with ${CHILD_WORKERS} child workers`
    });
};

childWorker.js

let count = 0;
let isResourceLimited = false;
let memoryLeakArray = []; // Used to hold large amounts of data

// Check resource usage
function checkResources() {
    try {
        const memory = performance.memory;
        if (memory) {
            const usedHeap = memory.usedJSHeapSize;
            const heapLimit = memory.jsHeapSizeLimit;
            
            if (usedHeap / heapLimit > 0.8) {
                postMessage({
                    type: 'resourceLimit',
                    message: 'Memory usage too high',
                    usage: usedHeap,
                    limit: heapLimit
                });
                isResourceLimited = true;
                return false;
            }
        }
        return true;
    } catch (e) {
        return true; // If memory information is unavailable, keep running
    }
}

// Perform the calculation task
function performCalculations() {
    if (isResourceLimited || !checkResources()) {
        return null;
    }

    let result = 0;
    try {
        // const dataLen = 170000*1;
        // // Allocate a large amount of memory
        // let largeObject = new Array(dataLen).fill(0).map(() => ({
        //     data: new Float64Array(1000),
        //     string: 'x'.repeat(10000),
        //     array: new Array(1000).fill(Math.random()),
        //     date: new Date(),
        //     objects: new Array(100).fill(null).map(() => ({
        //         nested: new Array(100).fill(Math.random())
        //     }))
        // }));
        
        // // Release the data after it has been used
        // const processData = () => {
        //     memoryLeakArray.push(...largeObject);
        //     // Clean up largeObject
        //     largeObject.forEach(item => {
        //         item.data = null;
        //         item.array = null;
        //         item.objects = null;
        //     });
        //     largeObject = null;
            
        //     // Periodically trim memoryLeakArray
        //     if (memoryLeakArray.length > 5000000) {
        //         memoryLeakArray.splice(0, 1000000);
        //     }
        // };

        // processData();

        // The original calculation task
        const len = 1000;
        for (let i = 0; i < len; i++) {
            result += Math.sqrt(i) * Math.sin(i);
            for (let j = 0; j < len*50; j++) {
                result += Math.sqrt(j) * Math.sin(j);
            }
        }

        return result;
    } catch (e) {
        postMessage({
            type: 'error',
            message: `Calculation error: ${e.message}`
        });
        return null;
    }
}

// Run the task periodically and report the result
const intervalId = setInterval(() => {
    if (isResourceLimited) {
        clearInterval(intervalId);
        return;
    }

    count++;
    const result = performCalculations();
    if (result !== null) {
        postMessage({
            type: 'calculation',
            iteration: count,
            result: result
        });
    }
}, 5000);

// Receive messages
onmessage = function(e) {
    postMessage({
        type: 'info',
        message: `Child worker ${e.data.id} started`
    });
};

GitHub

All code has been uploaded to GitHub:

tutorials/learnWebAssembly/MultipleThreadsOnWasm at main · theArcticOcean/tutorials

Two-layer multithreaded loops in the web consume more resources than single-layer loops

Published by StephenWei on 2025-02-23
