## Threadpool

### Overview

![overview](/doc-assets/threadpool.svg)

An api that lets you create a pool of worker threads and a queue of tasks, each
bound to a wsi.  Each task runs in a pool thread and synchronizes with the lws
service thread of its wsi via `LWS_CALLBACK_SERVER_WRITEABLE` and friends.

Tasks can produce some output, then return that they want to "sync" with the
service thread.  That causes a `LWS_CALLBACK_SERVER_WRITEABLE` in the service
thread context, where the output can be consumed and the task told to continue,
or a completed task reaped.

ALL of the details related to thread synchronization and an associated wsi in
the lws service thread context are handled by the threadpool api, without needing
any pthreads in user code.

### Example

https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/ws-server/minimal-ws-server-threadpool

### Lifecycle considerations

#### Tasks vs wsi

Although all tasks start out associated with a wsi, the lifetime of a task and
that of the wsi are not necessarily linked.

You may start a long task, eg, that runs atomically in its thread for 30s, and
at any time the client may close the connection, eg, close a browser window.

A task can "check in" periodically with lws to see if it has been asked to
stop, which lets the task lifetime follow the wsi lifetime to some extent, but
some tasks are going to be atomic and long-lived.

For that reason, at wsi close an ongoing task can detach from the wsi and
continue until it ends or understands it has been asked to stop.  To make
that work, the task is created with a `cleanup` callback that performs any
freeing without needing a wsi to still be around.  From the moment the task is
created, the task owns the user pointer and is responsible for freeing it when
the task is destroyed.

![Threadpool States](/doc-assets/threadpool-states.svg)

#### Reaping completed tasks

Although tasks run asynchronously, a task is not destroyed when it completes;
it is added to a "done queue" instead.  The task is only reaped, and its memory
freed, when the lws service thread context queries the task state with
`lws_threadpool_task_status()`.

This is analogous to unix processes and `wait()`.

If a task became detached from its wsi, then joining the done queue is enough
to get the task reaped, since there's nobody left any more to synchronize the
reaping with.

### User interface

The api is declared at https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-threadpool.h

#### Threadpool creation / destruction

The threadpool should be created at program or vhost init using
`lws_threadpool_create()` and destroyed on exit or vhost destruction using
first `lws_threadpool_finish()` and then `lws_threadpool_destroy()`.

Threadpools should be named.  The create function takes varargs to
facilitate, eg, naming the threadpool after the vhost it's associated with.

Threadpool creation takes an args struct with the following members:

Member|function
---|---
threads|The maximum number of independent threads in the pool
max_queue_depth|The maximum number of tasks allowed to wait for a place in the pool

#### Task creation / destruction

Tasks are created and queued using `lws_threadpool_enqueue()`, which takes an
args struct with the following members:

Member|function
---|---
wsi|The wsi the task is initially associated with
user|An opaque user-private pointer used for communication with the lws service thread and private state / data
task|A pointer to the function that will run in the pool thread
cleanup|A pointer to a function that will clean up finished or stopped tasks (perhaps freeing user)

Tasks should also have a name.  The creation function again takes varargs
to simplify naming the task with string elements related to who started it
and why.

#### The task function itself

The task function receives the task user pointer and the task state.  The
possible task states are:

State|Meaning
---|---
LWS_TP_STATUS_QUEUED|Task is still waiting for a pool thread
LWS_TP_STATUS_RUNNING|Task is supposed to do its work
LWS_TP_STATUS_SYNCING|Task is blocked waiting for sync from lws service thread
LWS_TP_STATUS_STOPPING|Task has been asked to stop but has not stopped yet
LWS_TP_STATUS_FINISHED|Task has reported it has completed
LWS_TP_STATUS_STOPPED|Task has aborted

The task function will only be told `LWS_TP_STATUS_RUNNING` or
`LWS_TP_STATUS_STOPPING` in its status argument... RUNNING means continue with the
user task and STOPPING means clean up and return `LWS_TP_RETURN_STOPPED`.

If possible, the task should return `LWS_TP_RETURN_CHECKING_IN` every 100ms or
so.  That lets lws tell it reasonably quickly either that it has been asked to
stop (eg, because the related wsi has closed) or that it can continue.  If that
is not possible it's okay, but, eg, exiting the application may be delayed
until the running task finishes, and since the wsi may have gone, the work
is wasted.

The task function may return one of:

Return|Meaning
---|---
LWS_TP_RETURN_CHECKING_IN|Still wants to run, but is confirming nobody has asked it to stop.  Will be called again immediately with `LWS_TP_STATUS_RUNNING` or `LWS_TP_STATUS_STOPPING`
LWS_TP_RETURN_SYNC|Task wants to trigger a WRITABLE callback and block until lws service thread restarts it with `lws_threadpool_task_sync()`
LWS_TP_RETURN_FINISHED|Task has finished, successfully as far as it goes
LWS_TP_RETURN_STOPPED|Task has finished, aborting in response to a request to stop
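
The status and return values above can be sketched as a task function that
works in short slices and checks in between them.  This is a minimal sketch,
not lws code: to compile without the lws headers it uses stand-in `DEMO_*`
enums in place of `LWS_TP_STATUS_*` / `LWS_TP_RETURN_*`, and `struct demo_priv`
and `demo_task()` are hypothetical names.  The "work" is only a counter
increment.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for LWS_TP_STATUS_* and LWS_TP_RETURN_* (not the lws definitions) */
enum demo_status { DEMO_STATUS_RUNNING, DEMO_STATUS_STOPPING };
enum demo_return {
	DEMO_RETURN_CHECKING_IN,
	DEMO_RETURN_SYNC,
	DEMO_RETURN_FINISHED,
	DEMO_RETURN_STOPPED
};

/* Hypothetical task state, reached through the task user pointer */
struct demo_priv {
	int chunks_done;
	int chunks_total;
};

/*
 * Called repeatedly from a pool thread.  Each call does one short slice of
 * work and returns, so a stop request is noticed on the next call.
 */
static enum demo_return
demo_task(void *user, enum demo_status s)
{
	struct demo_priv *priv = (struct demo_priv *)user;

	/* Asked to stop: do no more work and report that we stopped */
	if (s == DEMO_STATUS_STOPPING)
		return DEMO_RETURN_STOPPED;

	if (priv->chunks_done >= priv->chunks_total)
		return DEMO_RETURN_FINISHED;

	priv->chunks_done++;	/* one slice of the real work would go here */

	if (priv->chunks_done == priv->chunks_total)
		return DEMO_RETURN_FINISHED;

	/* More to do: check in so we can be told to continue or to stop */
	return DEMO_RETURN_CHECKING_IN;
}
```

Keeping the progress in the user pointer rather than in local variables is
what allows the function to return and be called again without losing its
place.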

The SYNC or CHECKING_IN return may also have a flag `LWS_TP_RETURN_FLAG_OUTLIVE`
applied to it, which indicates to threadpool that this task wishes to remain
unstopped after the wsi closes.  This is useful in the case where the task
understands it will take a long time to complete, and wants to return a
complete status and maybe close the connection, perhaps with a token identifying
the task.  The task can then be monitored separately by using the token.

#### Synchronizing

The task can choose to "SYNC" with the lws service thread, in other words
cause a WRITABLE callback on the associated wsi in the lws service thread
context and block itself until it hears back from there via
`lws_threadpool_task_sync()` to resume the task.

This is typically used when, eg, the task has filled its buffer, or ringbuffer,
and needs to pause operations until what's done has been sent and some buffer
space is open again.

In the WRITABLE callback, in lws service thread context, the buffer can be
sent with `lws_write()`, and then `lws_threadpool_task_sync()` called to allow
the task to fill another buffer and continue that way.

If the WRITABLE callback determines that the task should stop, it can just call
`lws_threadpool_task_sync()` with the second argument as 1, to force the task
to stop immediately after it resumes.
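
The fill / sync / send / resume cycle described above can be modelled in a
single thread.  This is a sketch under stated assumptions, not lws code: the
`DEMO_*` enums stand in for the `LWS_TP_*` values, `demo_on_writable()` stands
in for the WRITABLE handler, and `demo_run()` is a hypothetical driver that
replaces the real pool thread and service thread.  Comments mark where
`lws_write()` and `lws_threadpool_task_sync()` would be called.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for LWS_TP_STATUS_* and LWS_TP_RETURN_* (not the lws definitions) */
enum demo_status { DEMO_STATUS_RUNNING, DEMO_STATUS_STOPPING };
enum demo_return { DEMO_RETURN_SYNC, DEMO_RETURN_FINISHED, DEMO_RETURN_STOPPED };

/* Hypothetical task state shared through the user pointer */
struct demo_priv {
	char buf[8];	/* output waiting to be sent */
	size_t len;	/* valid bytes in buf */
	int next;	/* next item to produce */
	int limit;	/* total items to produce */
};

/* Task side: fill the buffer, then ask to sync so it can be sent */
static enum demo_return
demo_task(void *user, enum demo_status s)
{
	struct demo_priv *priv = (struct demo_priv *)user;

	if (s == DEMO_STATUS_STOPPING)
		return DEMO_RETURN_STOPPED;

	while (priv->len < sizeof(priv->buf) && priv->next < priv->limit) {
		priv->buf[priv->len] = (char)('a' + (priv->next % 26));
		priv->len++;
		priv->next++;
	}

	return priv->len ? DEMO_RETURN_SYNC : DEMO_RETURN_FINISHED;
}

/*
 * Service side: "send" the buffer (real code would call lws_write() here),
 * mark it empty, then choose the status the task resumes with (real code
 * would call lws_threadpool_task_sync() with stop as its second argument).
 */
static enum demo_status
demo_on_writable(struct demo_priv *priv, char *out, size_t *out_len, int stop)
{
	memcpy(out + *out_len, priv->buf, priv->len);
	*out_len += priv->len;
	priv->len = 0;

	return stop ? DEMO_STATUS_STOPPING : DEMO_STATUS_RUNNING;
}

/*
 * Single-threaded driver that alternates between the two sides.
 * stop_at_sync == 0 means the task is never asked to stop.
 * Returns the number of bytes "sent"; *final gets the task's last return.
 */
static size_t
demo_run(struct demo_priv *priv, char *out, int stop_at_sync,
	 enum demo_return *final)
{
	enum demo_status s = DEMO_STATUS_RUNNING;
	size_t out_len = 0;
	int syncs = 0;

	for (;;) {
		*final = demo_task(priv, s);
		if (*final != DEMO_RETURN_SYNC)
			return out_len;
		syncs++;
		s = demo_on_writable(priv, out, &out_len,
				     stop_at_sync && syncs == stop_at_sync);
	}
}
```

With `limit` set to 20 and an 8-byte buffer, the task syncs three times before
finishing.  Passing 1 as `stop_at_sync` makes the first WRITABLE handling
resume the task with the stopping status, so it ends after a single buffer.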

#### The cleanup function

When a finished task is reaped, or a task that has become detached from its
initial wsi completes or is stopped, threadpool calls the `.cleanup` function
defined in the task creation args struct to free anything related to the user
pointer.

With threadpool, responsibility for freeing allocations used by the task belongs
strictly with the task, via the `.cleanup` function, once the task has been
enqueued.  That's different from a typical non-threadpool protocol where the
wsi lifecycle controls deallocation.  This reflects the fact that the task
may outlive the wsi.
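
The ownership rule can be sketched as follows.  This is illustrative only:
`struct demo_priv`, `struct demo_pss`, `demo_on_close()` and `demo_cleanup()`
are hypothetical names, and `demo_cleanup()` takes just the user pointer for
brevity, whereas the real `.cleanup` callback type is the one declared in
lws-threadpool.h.  The `freed_count` member exists only so the behaviour can
be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical task state handed over as the task's user pointer */
struct demo_priv {
	char *scratch;		/* working memory owned by the task */
	int *freed_count;	/* test hook: counts cleanup calls */
};

/* Hypothetical per-connection state that only borrows the task state */
struct demo_pss {
	struct demo_priv *priv;
};

static struct demo_priv *
demo_priv_create(size_t scratch_len, int *freed_count)
{
	struct demo_priv *priv = malloc(sizeof(*priv));

	if (!priv)
		return NULL;

	priv->scratch = malloc(scratch_len);
	if (!priv->scratch) {
		free(priv);
		return NULL;
	}
	priv->freed_count = freed_count;

	return priv;
}

/* Connection close path: forget the pointer, do not free it */
static void
demo_on_close(struct demo_pss *pss)
{
	pss->priv = NULL;
}

/* Stand-in for the .cleanup callback: the only place the state is freed */
static void
demo_cleanup(void *user)
{
	struct demo_priv *priv = (struct demo_priv *)user;

	if (!priv)
		return;

	(*priv->freed_count)++;
	free(priv->scratch);
	free(priv);
}
```

Because the close path only drops its reference, a task that is still running
after the connection has gone keeps valid state until the cleanup runs.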

#### Protecting against WRITABLE and / or SYNC duplication

Care should be taken that data prepared by the task thread in the user priv
memory is only sent once.  For example, after sending data from a user priv
buffer whose length is also stored in the priv, set that length to zero.

Task execution and the SYNC writable callbacks are mutually exclusive, so there
is no danger of collision between the task thread and the lws service thread if
the reason for the callback is a SYNC operation from the task thread.
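
A minimal sketch of the "zero the length after sending" guard follows.
`struct demo_priv` and `demo_send_pending()` are hypothetical, and a `memcpy()`
into an output array stands in for the real send, so that the effect of a
duplicate WRITABLE callback can be seen without lws.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical user priv: a buffer plus the length still waiting to go out */
struct demo_priv {
	char buf[16];
	size_t len;	/* 0 means nothing is pending */
};

/*
 * Stand-in for the sending part of a WRITABLE handler.  Appends any pending
 * data to out and returns the new total length of out.  A second call with
 * nothing new pending sends nothing.
 */
static size_t
demo_send_pending(struct demo_priv *priv, char *out, size_t out_len)
{
	if (!priv->len)
		return out_len;	/* duplicate callback: nothing to send */

	memcpy(out + out_len, priv->buf, priv->len);
	out_len += priv->len;

	priv->len = 0;		/* mark the data as consumed */

	return out_len;
}
```

Calling `demo_send_pending()` twice in a row after the task has prepared five
bytes leaves exactly five bytes in the output, not ten.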

### Thread overcommit

If the tasks running on the threads are ultimately network-bound for all or some
of their processing (via the SYNC with the WRITEABLE callback), it's possible
to overcommit the number of threads in the pool compared to the number of
threads the processor has in hardware to get better occupancy in the CPU.
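
As a rough illustration of how far to overcommit, one common rule of thumb
(not something the threadpool api prescribes) sizes the pool as hardware
threads multiplied by (1 + wait time / compute time).  The helper below is a
hypothetical sketch of that arithmetic; the timings are assumed to come from
your own measurements of a typical task.

```c
#include <assert.h>

/*
 * Suggest a pool size from the number of hardware threads and the average
 * time a task spends computing versus blocked waiting (eg, in SYNC).
 * Rounds up, so the result is never below hw_threads.
 */
static unsigned int
demo_pool_threads(unsigned int hw_threads, unsigned int compute_ms,
		  unsigned int wait_ms)
{
	/* No compute time measured: fall back to one thread per hw thread */
	if (!compute_ms)
		return hw_threads;

	/* ceil(hw_threads * (compute_ms + wait_ms) / compute_ms) */
	return (hw_threads * (compute_ms + wait_ms) + compute_ms - 1) /
	       compute_ms;
}
```

For example, with 4 hardware threads and tasks that compute for 10ms and then
wait 30ms for the network, this suggests 16 pool threads; with no waiting at
all it suggests 4.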