Type punning

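In computer science, type punning is any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal language.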

In C and C++, constructs such as pointer type conversion and union, along with the reinterpret_cast operator that C++ adds, are used to implement type punning.

Pascal uses records with variants to treat a particular data type in more than one way.

Sockets example

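Berkeley sockets uses type punning to handle IP addresses. The bind function binds a socket that has been created but not yet assigned an address to an IP address; it is declared as follows: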

int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);

The bind function is typically called like this:

struct sockaddr_in sa = {0};
int sockfd = ...;
sa.sin_family = AF_INET;
sa.sin_port = htons(port);
bind(sockfd, (struct sockaddr *)&sa, sizeof sa);

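This works because struct sockaddr_in and struct sockaddr share the same memory layout for their leading fields, including the address family, so a pointer to one can be converted to a pointer to the other.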
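
As a minimal sketch of why the cast works, the code below fills in a struct sockaddr_in and then reads the address family back through a generic struct sockaddr pointer, much as bind would. The helper names make_ipv4_any and family_of are hypothetical and not part of the sockets API, and the sketch assumes a POSIX system that provides the headers shown.

```c
#include <arpa/inet.h>    /* htons, htonl */
#include <assert.h>
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_ANY */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>   /* struct sockaddr, AF_INET */

/* Hypothetical helper: build an IPv4 "any address" endpoint for a port. */
struct sockaddr_in make_ipv4_any(uint16_t port) {
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);               /* network byte order */
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    return sa;
}

/* Hypothetical helper: look only at the generic header, as a function
 * that accepts any address family has to do before it knows the real type. */
int family_of(const struct sockaddr *addr) {
    assert(addr != NULL);
    return addr->sa_family;
}
```

Calling family_of((const struct sockaddr *)&sa) on a value built by make_ipv4_any is expected to yield AF_INET, which is the information a generic socket function needs to decide how to interpret the remaining bytes.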

Floating-point example

Type punning is not limited to structs. Consider testing whether a floating-point number is negative:

bool is_negative(float x) {
    return x < 0.0;
}

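Assuming that floating-point comparisons are expensive, that float follows the IEEE 754 standard, and that unsigned int is 32 bits wide, type punning can extract the sign bit of the number so that an integer operation is used instead: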

bool is_negative(float x) {
    unsigned int *ui = (unsigned int *)&x;
    return *ui & 0x80000000;
}

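Note that the two implementations disagree on some special cases: when x is negative zero, the first returns false while the second returns true.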

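An implementation like this is suited to real-time computation in cases where the compiler's optimizer cannot produce fast enough code. Every assumption should be recorded in comments and checked with static assertions to verify that the portability expectations hold. The game Quake III Arena used this method to implement its fast inverse square root algorithm.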
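
The advice about static assertions can be sketched as follows. This is one possible form, not the only one: the function name is_negative_bits is hypothetical, the checks use C11 static_assert, and the bits are copied with memcpy rather than read through a cast pointer so that the example does not itself depend on aliasing behavior.

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>    /* FLT_RADIX, FLT_MANT_DIG */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption 1: float occupies exactly 32 bits. */
static_assert(sizeof(float) == sizeof(uint32_t),
              "float must be 32 bits wide");
/* Assumption 2: float has the shape of IEEE 754 binary32
 * (radix 2, 24-bit significand including the implicit bit). */
static_assert(FLT_RADIX == 2 && FLT_MANT_DIG == 24,
              "float must look like IEEE 754 binary32");

/* Hypothetical name: returns true when the sign bit of x is set. */
bool is_negative_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* copy the object representation */
    return (bits & UINT32_C(0x80000000)) != 0;
}
```

If either assumption fails on a given platform, compilation stops with the stated message instead of producing a silently wrong result.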

Using a union

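To stay within the strict aliasing rules of C99, a union can be used. ISO C++, by contrast, does not define the result of reading a union member other than the one last written:[1]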

bool is_negative(float x) {
    union {
        unsigned int ui;
        float d;
    } my_union = { .d = x };
    return my_union.ui & 0x80000000;
}


The GCC compiler supports this use of unions as a language extension.[2]

For other examples of type punning, see stride of an array.

Pascal

A variant record permits treating a data type as multiple kinds of data depending on which variant is being referenced. In the following example, integer is presumed to be 16 bits, longint and real are presumed to be 32 bits, and character is presumed to be 8 bits:

  type variant_record = record
     case rec_type : longint of
         1: ( I : array [1..2] of integer );
         2: ( L : longint );
         3: ( R : real );
         4: ( C : array [1..4] of character);
     end;
   Var V: Variant_record;
      K: Integer;
      LA: Longint;
      RA: Real;
      Ch: character;
  ...
   V.I[1] := 1;
   Ch := V.C[1];   (* This would extract the first binary byte of V.I *)
   V.R := 8.3;   
   LA := V.L;     (* This would store a real into an integer *)

In Pascal, converting a real to an integer (for example with trunc) yields the truncated value. The variant record instead reinterprets the binary representation of the floating-point number as a 32-bit long integer. The result will not equal the truncated value and may not be a valid long integer value on some systems.
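
The difference between converting a value and reinterpreting its bits can be sketched in C as well. The function names below are hypothetical, and the sketch assumes a 32-bit float; memcpy stands in for the variant record as the means of reinterpretation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(int32_t),
              "sketch assumes a 32-bit float");

/* Value conversion: the fractional part is discarded. */
int32_t convert_value(float r) {
    return (int32_t)r;
}

/* Reinterpretation: the 32 bits that encode the float are read back
 * unchanged as an integer, which is what reading V.L after writing V.R does. */
int32_t reinterpret_bits(float r) {
    int32_t l;
    memcpy(&l, &r, sizeof l);
    return l;
}
```

For 8.3, convert_value gives 8, while reinterpret_bits gives whatever integer shares the float's bit pattern on the platform in question.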

These examples could be used to create strange conversions, although, in some cases, there may be legitimate uses for these types of constructs, such as for determining locations of particular pieces of data. In the following example a pointer and a longint are both presumed to be 32 bit:

 Type PA = ^Arec;
 
    Arec = record
      case rt : longint of
         1: (P: PA);
         2: (L: Longint);
    end;
 
  Var PP: PA;
   K: Longint;
  ...
   New(PP);
   PP^.P := PP;
   Writeln('Variable PP is located at address ', hex(PP^.L));

Here "new" is the standard Pascal routine for allocating memory for a pointer, and "hex" is presumed to be a routine that returns the hexadecimal string for the value of an integer. This allows the address held in a pointer to be displayed, something which is not normally permitted. (Pointers cannot be read or written, only assigned.) Assigning a value to an integer variant of a pointer would allow examining or writing to any location in system memory:

 PP^.L := 0;
 PP := PP^.P;  (*PP now points to address 0 *)
 K := PP^.L;   (*K contains the value of word 0 *)
 Writeln('Word 0 of this machine contains ',K);

This construct may cause a program check or protection violation if address 0 is protected against reading on the machine the program is running upon or the operating system it is running under.
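
For comparison, C offers a sanctioned way to view a pointer as an integer without overlaying two fields. The sketch below assumes the implementation provides the optional type uintptr_t; the function names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>   /* PRIxPTR */
#include <stdint.h>     /* uintptr_t (optional in ISO C) */
#include <stdio.h>

/* Convert an object pointer to an integer that holds its address. */
uintptr_t address_as_integer(const void *p) {
    assert(p != NULL);
    return (uintptr_t)p;
}

/* Print the address in hexadecimal, in the spirit of the Pascal example. */
void print_address(const char *name, const void *p) {
    printf("Variable %s is located at address 0x%" PRIxPTR "\n",
           name, address_as_integer(p));
}
```

Converting the integer back to void * yields a pointer that compares equal to the original, so the round trip is safe. Manufacturing an arbitrary address such as 0 and dereferencing it remains as hazardous in C as in the Pascal example.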

C#

In C# (and other .NET languages), this is a bit harder to achieve because of the type system, but can be done nonetheless, using pointers or struct unions.

Pointers

C# only allows pointers to so-called unmanaged (native) types, i.e. any primitive numeric type, bool, char, enum, pointer, or struct that is composed only of other unmanaged types; string and arrays are excluded. Note that pointers are only allowed in code blocks marked 'unsafe'.

 float pi = 3.14159f;
 uint piAsRawData = *(uint*)&pi;

Struct unions

Struct unions are allowed without any notion of 'unsafe' code, but they do require the definition of a new type.

 [StructLayout(LayoutKind.Explicit)]
 struct FloatAndUIntUnion
 {
     [FieldOffset(0)]
     public float DataAsFloat;
     [FieldOffset(0)]
     public uint DataAsUInt;
 }

 // ...

 FloatAndUIntUnion union = new FloatAndUIntUnion(); // zero-initialize so every field is definitely assigned
 union.DataAsFloat = 3.14159f;
 uint piAsRawData = union.DataAsUInt;

Raw CIL code

Raw CIL can be used instead of C#, because it doesn't have most of the type limitations. This allows one to, for example, combine two enum values of a generic type:

 TEnum a = ...;
 TEnum b = ...;
 TEnum combined = a | b; // illegal

This can be circumvented by the following CIL code:

 .method public static hidebysig
     !!TEnum CombineEnums<valuetype .ctor ([mscorlib]System.ValueType) TEnum>(
         !!TEnum a,
         !!TEnum b
     ) cil managed
 {
     .maxstack 2

     ldarg.0 
     ldarg.1
     or  // this will not cause an overflow, because a and b have the same type, and therefore the same size.
     ret
 }

The cpblk CIL opcode allows for some other tricks, such as converting a struct to a byte array:

 .method public static hidebysig
     uint8[] ToByteArray<valuetype .ctor ([mscorlib]System.ValueType) T>(
         !!T& v // 'ref T' in C#
     ) cil managed
 {
     .locals init (
         [0] uint8[]
     )

     .maxstack 3

     // create a new byte array with length sizeof(T) and store it in local 0
     sizeof !!T
     newarr uint8
     dup           // keep a copy on the stack for later (1)
     stloc.0

     ldc.i4.0
     ldelema uint8

     // memcpy(local 0, &v, sizeof(T));
     // <the array is still on the stack, see (1)>
     ldarg.0 // this is the *address* of 'v', because its type is '!!T&'
     sizeof !!T
     cpblk

     ldloc.0
     ret
 }
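
The comment in the listing above describes the cpblk step as a memcpy. A minimal C sketch of the same idea follows; the struct sample type and the to_byte_array name are hypothetical, and because C has no generics the function is written for one concrete type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example type; any object type is copied the same way. */
struct sample {
    uint16_t id;
    uint16_t flags;
};

/* Copy the object representation of *v into out.
 * Returns the number of bytes written. */
size_t to_byte_array(const struct sample *v, unsigned char *out, size_t out_len) {
    assert(v != NULL && out != NULL && out_len >= sizeof *v);
    memcpy(out, v, sizeof *v);
    return sizeof *v;
}
```

The bytes come out in the platform's own byte order and include any padding, so the buffer is meaningful only to code that shares the same layout assumptions.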

References

  1. ISO/IEC 9899:1999, section 6.5, paragraph 7
  2. GCC: Non-Bugs. Retrieved 2017-11-20. (Archived from the original on 2021-03-25.)
