TIP 548: Support wchar_t conversion functions and deprecate Tcl_WinUtfToTChar() and Tcl_WinTCharToUtf()

Author:         Jan Nijtmans <jan.nijtmans@users.sf.net>
Author:         Jan Nijtmans <jan.nijtmans@gmail.com>
State:          Final
Type:           Project
Vote:           Done
Created:        3-June-2019
Post-History:   
Discussions-To: Tcl Core list
Keywords:       Tcl
Tcl-Version:    8.7
Tcl-Branch:     tip-548
Vote-Results:   4/0/2 accepted
Votes-For:      JN, DKF, KW, KBK
Votes-Against:  none
Votes-Present:  DGP, SL

Abstract

This TIP proposes to add wchar_t conversion functions and deprecate Tcl_WinUtfToTChar() and Tcl_WinTCharToUtf() in favour of those new functions.

Rationale

The functions Tcl_WinUtfToTChar() and Tcl_WinTCharToUtf() originally performed one of two different conversions, depending on the runtime platform. On Windows 95/98/ME they converted between UTF-8 and the Windows default encoding (usually CP1252); on later Windows versions they convert between UTF-8 and UTF-16. The length parameter of Tcl_WinTCharToUtf() has always been in bytes, whereas most other Unicode-related Tcl functions expect their length in Unicode characters.

Since Windows 95/98/ME are not supported any more, it's time to fix this inconsistency.

Modern systems provide a wchar_t type for Unicode-like characters, which is either 16 bits wide (like unsigned short) or 32 bits wide (like int). This TIP proposes three additional functions which convert between wchar_t-related types and UTF-8, and three more which convert between unsigned short-related types (UTF-16) and UTF-8. The new functions work identically on all platforms, not only Windows.
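To illustrate why the width of wchar_t matters, here is a minimal sketch of storing one Unicode code point in a wchar_t buffer. The helper name put_code_point is hypothetical and not part of Tcl. It only shows that a code point above U+FFFF needs a surrogate pair when wchar_t is 16 bits wide, and a single unit when it is 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper, not part of Tcl: store one code point as wchar_t
 * units. "out" must have room for two units. Returns the number of units
 * written (1 or 2).
 */
static int
put_code_point(unsigned long cp, wchar_t *out)
{
    assert(out != NULL);
    if (sizeof(wchar_t) == 2 && cp > 0xFFFFUL) {
        /* 16-bit wchar_t: split into a high and a low surrogate. */
        cp -= 0x10000UL;
        out[0] = (wchar_t)(0xD800UL + (cp >> 10));
        out[1] = (wchar_t)(0xDC00UL + (cp & 0x3FFUL));
        return 2;
    }
    /* 32-bit wchar_t, or a code point that fits in one 16-bit unit. */
    out[0] = (wchar_t)cp;
    return 1;
}
```

On a platform with a 32-bit wchar_t this always writes one unit. On Windows, a code point such as U+1F600 takes two.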

Specification

This document proposes:

  • Enhance the Tcl_UniCharToUtfDString() function such that the uniLength parameter is allowed to have the value -1. In that case, the UniChar string will be read up to the terminating \u0000 character.

  • Enhance the Tcl_UniCharToUtfDString() and Tcl_UtfToUniCharDString() functions such that the src/uniStr parameters are allowed to have the value NULL. In that case, the functions return NULL, without doing anything.

  • New functions:

    Tcl_UtfToWCharDString(), similar to Tcl_UtfToUniCharDString(), but has a wchar_t pointer type in its signature.

    Tcl_WCharToUtfDString(), similar to Tcl_UniCharToUtfDString(), but has a wchar_t pointer type in its signature.

    Tcl_UtfToWChar(), similar to Tcl_UtfToUniChar(), but has a wchar_t pointer in its signature.

    On Windows, wchar_t is the same type as unsigned short, but on other platforms wchar_t might be a 32-bit type ('int' usually). These functions map to either the 32-bit or the 16-bit versions, depending on the size of wchar_t, automatically.

    Tcl_UtfToChar16DString(), similar to Tcl_UtfToUniCharDString(), but has an unsigned short pointer type in its signature.

    Tcl_Char16ToUtfDString(), similar to Tcl_UniCharToUtfDString(), but has an unsigned short pointer type in its signature.

    Tcl_UtfToChar16(), similar to Tcl_UtfToUniChar(), but has an unsigned short pointer in its signature.

  • Deprecate the following functions:

    Tcl_WinUtfToTChar(), in favour of Tcl_UtfToWCharDString()

    Tcl_WinTCharToUtf(), in favour of Tcl_WCharToUtfDString()

If Tcl is compiled with either -DTCL_UTF_MAX=6 (which is not officially supported) or -DTCL_NO_DEPRECATED, those functions become macros which do exactly the same thing. In Tcl 9.0, the two deprecated functions will be removed from the stub tables, but the replacement macros will remain. The functions can therefore still be used in extensions; calls are mapped to the new functions automatically.

  • The Windows dde and registry extensions (tclWinDde.c and tclWinReg.c) are updated to use the new functions Tcl_UtfToWCharDString() and Tcl_WCharToUtfDString(), serving as proof/demonstration that the functions in this TIP actually work.
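The two enhancements at the top of this list (a length of -1 meaning "read up to the terminating NUL", and a NULL source yielding NULL) can be illustrated with a small standalone converter. This is a minimal sketch, not Tcl's implementation. The name utf16_to_utf8 and the caller-supplied output buffer are assumptions made for illustration; the real functions append to a Tcl_DString instead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in, not part of Tcl: convert "len" UTF-16 units from
 * "src" to NUL-terminated UTF-8 in "dst". "dst" must hold at least
 * 3 * len + 1 bytes. A negative "len" means "src" is NUL-terminated.
 * A NULL "src" returns NULL without touching "dst". Unpaired surrogates
 * are encoded as ordinary three-byte sequences (a simplification).
 */
static char *
utf16_to_utf8(const unsigned short *src, int len, char *dst)
{
    char *p = dst;
    int i;

    if (src == NULL) {
        return NULL;
    }
    assert(dst != NULL);
    if (len < 0) {
        /* Count units up to, but not including, the terminating zero. */
        for (len = 0; src[len] != 0; len++) {
        }
    }
    for (i = 0; i < len; i++) {
        unsigned long cp = src[i];

        /* Combine a high/low surrogate pair into one code point. */
        if (cp >= 0xD800UL && cp <= 0xDBFFUL && i + 1 < len
                && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            cp = 0x10000UL + ((cp - 0xD800UL) << 10)
                    + (src[i + 1] - 0xDC00UL);
            i++;
        }
        if (cp < 0x80UL) {
            *p++ = (char)cp;
        } else if (cp < 0x800UL) {
            *p++ = (char)(0xC0 | (cp >> 6));
            *p++ = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000UL) {
            *p++ = (char)(0xE0 | (cp >> 12));
            *p++ = (char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (char)(0x80 | (cp & 0x3F));
        } else {
            *p++ = (char)(0xF0 | (cp >> 18));
            *p++ = (char)(0x80 | ((cp >> 12) & 0x3F));
            *p++ = (char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (char)(0x80 | (cp & 0x3F));
        }
    }
    *p = '\0';
    return dst;
}
```

Note that the length is counted in 16-bit units, not bytes, which matches the convention of the other Unicode-related functions mentioned in the Rationale.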

How to upgrade

No action is required. In Tcl 9.0, the two deprecated functions are replaced by macros which do the same thing. But if you want to prevent a (future) deprecation warning, you can do the following:

In your extension, replace the call:

 `Tcl_WinUtfToTChar(....., bufPtr);`

with the following two lines:

 `Tcl_DStringInit(bufPtr);`
 `Tcl_UtfToWCharDString(....., bufPtr);`

And also, replace:

 `Tcl_WinTCharToUtf(....., bufPtr);`

with the following two lines:

 `Tcl_DStringInit(bufPtr);`
 `Tcl_WCharToUtfDString(....., bufPtr);`

If the Tcl_WinTCharToUtf() call originally had a "length" parameter not equal to -1, divide it by 2 (or simply stop multiplying it by 2), because Tcl_WCharToUtfDString() expects the length in wchar_t units rather than in bytes.
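The length adjustment can be sketched as a small helper. The name wchar_len_from_byte_len is hypothetical and not part of Tcl. It assumes the Windows case, where each wchar_t unit occupies two bytes.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of Tcl: translate a length that used to be
 * passed to Tcl_WinTCharToUtf() (in bytes) into the unit count expected by
 * Tcl_WCharToUtfDString(). Assumes 2-byte units, as on Windows.
 */
static int
wchar_len_from_byte_len(int byteLen)
{
    if (byteLen < 0) {
        /* -1 keeps its meaning: the string is NUL-terminated. */
        return -1;
    }
    /* A byte length covering whole 16-bit units must be even. */
    assert(byteLen % 2 == 0);
    return byteLen / 2;
}
```

In an extension, the result would be passed as the length argument of Tcl_WCharToUtfDString(). Where the original code computed the byte length by multiplying a character count by 2, it is simpler to pass the character count directly.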

Compatibility

This is fully upwards compatible.

Reference Implementation

A reference implementation is available in the tip-548 branch. https://core.tcl-lang.org/tcl/timeline?r=tip-548

Copyright

This document has been placed in the public domain.
