TIP 509: Implement reentrant mutexes on all platforms

Login
Bounty program for improvements to Tcl and certain Tcl packages.
Author:         Frédéric Bonnet <fredericbonnet@free.fr>
State:          Final
Type:           Project
Vote:           Done
Created:        24-May-2018
Post-History:   
Keywords:       Tcl,threads
Tcl-Version:	8.7
Vote-Results:   8/0/1 accepted
Votes-For:      DKF, KBK, JN, JD, DGP, FV, SL, AK
Votes-Against:  none
Votes-Present:  BG
Tcl-Branch:     tip-509

Abstract

This TIP proposes to improve the Tcl_Mutex API by enforcing a consistent behavior on all core-supported platforms regarding reentrancy.

Context

This TIP is inspired by a request from FlightAware to fix deadlock issues with TclX signal handling. A specific issue has been opened to discuss the proposed implementation here.

Rationale

As of Tcl 8.6, the man page for thread support Thread.3 states that:

The result of locking a mutex twice from the same thread is undefined. On some platforms it will result in a deadlock.

On Windows platforms, mutexes are implemented using Win32 critical sections, which are reentrant. Tcl_Mutex are just plain CRITICAL_SECTION *.

On Unix platforms (this includes MacOS X), Tcl relies on the pthread library for multithreading and synchronization primitives such as mutexes. Tcl_Mutex are just plain pthread_mutex_t *. pthread mutexes are not reentrant by default, though the PTHREAD_MUTEX_RECURSIVE attribute can be used at creation time to make them so, but this possibility is not available on older systems.

The Tcl philosophy has always been to erase platform-specific peculiarities in favor of overall multi-platform consistency, sometimes to the point of implementing or emulating commonly available features on less capable platforms. Therefore it feels natural to pursue this goal by making Tcl_Mutex reentrant on all platforms and achieving a consistent behavior on both Windows and Unix.

Specifications

This TIP proposes to replace the following text from the Thread.3 man page:

The result of locking a mutex twice from the same thread is undefined. On some platforms it will result in a deadlock.

by the following text:

Mutexes are reentrant: they can be locked several times from the same thread. However there must be exactly one call to Tcl_MutexUnlock for each call to Tcl_MutexLock in order for a thread to release a mutex completely.

Portability issues

Windows

Mutexes are naturally reentrant on Windows systems, so no special work is required.

Unix

On pthread-based Unix systems that support the PTHREAD_MUTEX_RECURSIVE attribute, all pthread_mutex_t made available as Tcl_Mutex will be created using this attribute. This includes all but the oldest variants of Unix.

On pthread-based Unix systems that do not support the PTHREAD_MUTEX_RECURSIVE attribute, reentrancy will be achieved by combining a regular, non-reentrant pthread_mutex_t, with a thread-specific lock counter accessible through a pthread_key_t data key. This counter keeps track of the number of calls to Tcl_MutexLock minus the number of calls to Tcl_Mutex_Unlock. Tcl_MutexLock increments the counter, but only calls pthread_mutex_lock when the initial value is zero. Tcl_Mutex_Unlock behaves symmetrically: it decrements the counter, and only calls pthread_mutex_unlock when it reaches zero. This ensures that a thread never calls pthread_mutex_lock twice on the same mutex, and only calls pthread_mutex_unlock when the thread no longer holds it.

Detection of PTHREAD_MUTEX_RECURSIVE availability is done at configure time thanks to the AC_CHECK_DECLS autoconf macro in tcl.m4.

Potential incompatibilities

Although this TIP introduces a major change to Tcl_Mutex behavior on Unix, it is very unlikely that this will break any existing code:

  • Windows-only code will see no change in behavior.
  • Unix-only code with reentrant mutexes is fatally flawed in the current state of the Tcl core, since this results in a deadlock. At best, this TIP will fix such hard-to-reproduce situations (case in point: TclX signals), and the code will transition from the nonworking state to the working state (which, according to Dr. John Ousterhout, is the best performance improvement). At worst, this change will trigger a new class of reentrancy-related bugs on already broken code.
  • Multiplatform code in its current form behaves either inconsistently or consistently, depending on whether it uses reentrant mutexes or not. Consistent code will remain consistent, inconsistent code will become consistent as the Unix version aligns with the Windows version.

Related Bugs

Bug #f4f44174 demonstrates the deadlock issue with a script based on TclX. The root cause is the asynchronous event handler's Tcl_Mutex being locked twice from the same thread when a signal handler interrupts a thread in the middle of a mutex-protected section, which on Unix platforms results in a deadlock. The proposed implementation fixes this issue.

Implementation

The proposed implementation is available on branch tip-509 in the Tcl Fossil repository.

Copyright

This document has been placed in the public domain.

History