Memory-Layout-Based Hacking in Swift

A Real Requirement: Enum’s Associated Value Accessor

Swift introduced enums with associated values, a feature also found in some other modern programming languages. It improves Swift’s ability to express abstractions, but it can also force verbose switch-case statements. Here is a practical example:

enum Vehicle {  
    case Car(windows: Int, wheels: Int)
    case Ship(windows: Int, funnels: Int, anchors: Int)
    case Plane(windows: Int, wheels: Int, wings: Int, engines: Int)
}

In the example above, every vehicle has windows. But to read a vehicle’s number of windows, we have to write a switch-case statement that handles each case. Since the number of windows always sits in the first slot of the associated values, you might ask: is there a way to pull the number of windows out directly?

The answer is: Yes! But how? The memory layout holds the answer.
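For reference, here is a minimal sketch of the conventional switch-based accessor that we would like to avoid. It is written in current Swift spelling, so the case names are lowercased; the property name `windowsViaSwitch` is hypothetical and used only for illustration.

```swift
// Hypothetical re-declaration of the enum in current Swift spelling
// (lowercase case names).
enum Vehicle {
    case car(windows: Int, wheels: Int)
    case ship(windows: Int, funnels: Int, anchors: Int)
    case plane(windows: Int, wheels: Int, wings: Int, engines: Int)
}

extension Vehicle {
    /// The conventional accessor: every case has to be matched explicitly,
    /// even though `windows` is always the first associated value.
    var windowsViaSwitch: Int {
        switch self {
        case .car(let windows, _):
            return windows
        case .ship(let windows, _, _):
            return windows
        case .plane(let windows, _, _, _):
            return windows
        }
    }
}

let ship = Vehicle.ship(windows: 12, funnels: 2, anchors: 1)
print(ship.windowsViaSwitch)
```

Each new case added to the enum needs another arm in this switch, which is the verbosity the rest of the article tries to work around.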

Memory Layout of Non-Recursive Associated Value Enums

First, we can evaluate the expression below in a playground:

print(sizeof(Vehicle))  

The result should be “33” on 64-bit architectures or “17” on 32-bit architectures, which means each value of type Vehicle occupies 33 or 17 bytes in memory. This number is a hint. The case .Plane, which has the largest group of associated values in Vehicle, has four elements: windows: Int, wheels: Int, wings: Int, engines: Int. Each Int is word sized, that is, 8 bytes (64 bits) on 64-bit architectures and 4 bytes (32 bits) on 32-bit architectures. With four Ints in the group, they occupy 32 or 16 bytes in total.

That number is telling: it is just 1 byte less than the size of Vehicle. We can now assume that:

  1. Vehicle uses 1 byte to store which case it is in.
  2. The remaining bytes store the associated values.
  3. Size of Vehicle = 1 byte + the size of the largest associated value group (like a C union, hmm).

But there is another question: where is the 1-byte case slot located? At the head? Or at the tail?
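The three assumptions can be checked with a short sketch. It assumes current Swift, where `sizeof` has been replaced by `MemoryLayout`, and it re-declares the enum with lowercase case names. The constant names are hypothetical, and the layout it observes is compiler behavior rather than a documented guarantee.

```swift
// Hypothetical re-declaration of the enum in current Swift spelling.
enum Vehicle {
    case car(windows: Int, wheels: Int)
    case ship(windows: Int, funnels: Int, anchors: Int)
    case plane(windows: Int, wheels: Int, wings: Int, engines: Int)
}

// Size of the largest associated value group (the payload of `.plane`).
let payloadSize = MemoryLayout<(Int, Int, Int, Int)>.size
// Size of the whole enum value.
let enumSize = MemoryLayout<Vehicle>.size

print("payload size:", payloadSize)
print("enum size:", enumSize)
print("alignment:", MemoryLayout<Vehicle>.alignment)
print("stride:", MemoryLayout<Vehicle>.stride)

// Assumption 3: the enum is exactly one byte larger than its largest payload.
print("one extra byte:", enumSize == payloadSize + 1)
```

Note that `stride`, the distance between consecutive elements in an array, is rounded up to the alignment, so it is larger than `size` here.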

For a given Vehicle value:

var aPlane: Vehicle = .Plane(windows: 1, wheels: 9, wings: 8, engines: 4)  

Since a value is stored contiguously in memory as binary data, we can infer that if the case slot is at the tail, the bytes should be arranged as:

0x00 00 00 00 00 00 00 01,00 00 00 00 00 00 00 09,00 00 00 00 00 00 00 08,00 00 00 00 00 00 00 04,02  

And if the case slot is at the head, the bytes should be arranged as:

0x02,00 00 00 00 00 00 00 01,00 00 00 00 00 00 00 09,00 00 00 00 00 00 00 08,00 00 00 00 00 00 00 04  

0x is the hexadecimal prefix; we use hexadecimal here to show the arrangement concisely. Each hexadecimal digit represents half a byte (4 bits) of memory (16 = 2 ^ 4).

To keep the article short, I have not written out the assumptions for 32-bit architectures here.

Now, we are going to check which one is right by running the code below.

var aPlane: Vehicle = .Plane(windows: 1, wheels: 9, wings: 8, engines: 4)

// Int8 takes 1 byte (8 bits) memory space.
let bytePointer = withUnsafePointer(&aPlane) { UnsafePointer<Int8>($0) }

var offsetString = ""  
var contentString = ""

for offset in 0..<sizeof(Vehicle) {  
    offsetString += String(format: "%02d ", offset)
    contentString += String(format: "%02d ", bytePointer[offset])
}

print("offset:\t" + offsetString)  
print("content:\t" + contentString)  

This code prints the content of a Vehicle value byte by byte. We get:

offset:  00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32  
content: 01 00 00 00 00 00 00 00 09 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 02  

The result is slightly different from what we assumed, but it clearly tells us that the case slot is at the tail (the 02 sits at the end of the output). The difference is that each Int appears with its least significant byte first, which shows that the machine stores multi-byte integers in little-endian byte order (see Endianness). This is a property of the CPU architecture rather than of OS X itself.

Now that we know the memory layout of a non-recursive associated value enum, we can write the accessor.
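The same experiment can be sketched in current Swift with `withUnsafeBytes(of:_:)`, which exposes the raw bytes of a value without keeping a pointer around after the call. The helper `hexBytes(of:)` is hypothetical, the enum is re-declared with lowercase case names, and the bytes are printed in hexadecimal rather than decimal.

```swift
// Hypothetical re-declaration of the enum in current Swift spelling.
enum Vehicle {
    case car(windows: Int, wheels: Int)
    case ship(windows: Int, funnels: Int, anchors: Int)
    case plane(windows: Int, wheels: Int, wings: Int, engines: Int)
}

/// Hypothetical helper: returns the raw bytes of `value`,
/// each formatted as a two-digit hexadecimal string.
func hexBytes<T>(of value: T) -> [String] {
    return withUnsafeBytes(of: value) { (buffer: UnsafeRawBufferPointer) -> [String] in
        return buffer.map { (byte: UInt8) -> String in
            let digits = String(byte, radix: 16)
            return digits.count < 2 ? "0" + digits : digits
        }
    }
}

let aPlane = Vehicle.plane(windows: 1, wheels: 9, wings: 8, engines: 4)
let bytes = hexBytes(of: aPlane)

// One line of bytes: the four Ints first, the case slot last.
print(bytes.joined(separator: " "))

// The integer 1 in host byte order, then explicitly converted to big-endian.
print(hexBytes(of: 1).joined(separator: " "))
print(hexBytes(of: Int(1).bigEndian).joined(separator: " "))
```

On a little-endian machine the second line starts with 01, while the big-endian conversion on the third line ends with 01, which is the byte order our first guess assumed.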

Write the Accessor

Writing the accessor is quite easy. You can do it with either unsafeBitCast or withUnsafePointer.

// With unsafeBitCast

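extension Vehicle {
    var windows: Int {
        // Cast self (not a particular global value) to a tuple that
        // mirrors the layout: four Ints followed by the 1-byte case slot.
        let casted = unsafeBitCast(self, (Int, Int, Int, Int, Int8).self)
        return casted.0
    }
}

// With withUnsafePointer

extension Vehicle {
    var windows: Int {
        var mutableSelf = self
        // Read through the pointer inside the closure, where it is valid.
        return withUnsafePointer(&mutableSelf) {
            UnsafePointer<(Int, Int, Int, Int, Int8)>($0).memory.0
        }
    }
}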

Either of them gets the job done, but one of them, the withUnsafePointer way, can introduce issues in future maintenance.
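For readers on current Swift, here is a sketch of the unsafeBitCast accessor using today's spelling, `unsafeBitCast(_:to:)`. It re-declares the enum with lowercase case names and relies on the layout observed above (payload first, 1-byte case slot last), which is observed behavior and not a documented guarantee. The `Raw` typealias is hypothetical.

```swift
// Hypothetical re-declaration of the enum in current Swift spelling.
enum Vehicle {
    case car(windows: Int, wheels: Int)
    case ship(windows: Int, funnels: Int, anchors: Int)
    case plane(windows: Int, wheels: Int, wings: Int, engines: Int)
}

extension Vehicle {
    /// Reads the first payload word without switching over the cases.
    /// Assumes the observed layout: associated values first, case slot last.
    var windows: Int {
        // A tuple with the same size as the enum: four Ints and one byte.
        typealias Raw = (Int, Int, Int, Int, Int8)
        // unsafeBitCast traps at runtime if the two types differ in size.
        return unsafeBitCast(self, to: Raw.self).0
    }
}

let vehicles: [Vehicle] = [
    .car(windows: 4, wheels: 4),
    .ship(windows: 20, funnels: 2, anchors: 1),
    .plane(windows: 30, wheels: 6, wings: 2, engines: 4),
]

for vehicle in vehicles {
    print(vehicle.windows)
}
```

For the smaller cases, the tuple elements after the real payload hold whatever bytes happen to be there, so only the first element is meaningful for every case.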

Traps Or Pitfalls...

Wrong Type Casting

In an earlier tweet, I showed an example of this technique using the withUnsafePointer way. But that example mistakenly used an Int rather than an Int8 to represent the case slot. Because the machine is little-endian, it may still run on OS X, except when the hosting process has no right to read the 7 bytes (on 64-bit architectures) or 3 bytes (on 32-bit architectures) that follow the enum, in which case a segmentation fault crashes your app.

Incompatible Binary Boundaries Or In-Fact Wrong Type Casting

There is another issue you may run into later if you apply this technique in your app: incompatible binary boundaries.

Suppose you apply this technique, using the withUnsafePointer way, to an enum from a framework that is not compiled together with your app (such as one of Apple's frameworks), and the layout of that enum later changes without your noticing. Your implementation is still based on the old layout, and withUnsafePointer is far more tolerant of type casting mistakes, as my earlier tweet showed. The app could then crash because of the wrong type casting, and, as analyzed above, you probably would not find the bug in time.

So, please do not apply this technique to any framework that is not compiled together with your app, and always use the unsafeBitCast way, which strictly checks at runtime that the two types have the same size.
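The size check that makes unsafeBitCast the safer choice can be made visible with a small sketch in current Swift. The helper `checkedBitCast` is hypothetical: it performs the same size comparison up front and returns nil instead of trapping, so a wrong tuple shape, such as the word-sized case slot from the pitfall above, is rejected. The enum is re-declared with lowercase case names.

```swift
// Hypothetical re-declaration of the enum in current Swift spelling.
enum Vehicle {
    case car(windows: Int, wheels: Int)
    case ship(windows: Int, funnels: Int, anchors: Int)
    case plane(windows: Int, wheels: Int, wings: Int, engines: Int)
}

/// Hypothetical helper: bit-casts only when both types have the same size,
/// and returns nil instead of trapping when they do not.
func checkedBitCast<T, U>(_ value: T, to type: U.Type) -> U? {
    guard MemoryLayout<T>.size == MemoryLayout<U>.size else { return nil }
    return unsafeBitCast(value, to: type)
}

let aPlane = Vehicle.plane(windows: 1, wheels: 9, wings: 8, engines: 4)

// Matching shape: four Ints plus a 1-byte case slot.
let good = checkedBitCast(aPlane, to: (Int, Int, Int, Int, Int8).self)

// Mismatched shape: a word-sized case slot makes the tuple too large.
let bad = checkedBitCast(aPlane, to: (Int, Int, Int, Int, Int).self)

print(good?.0 as Any)
print(bad == nil)
```

A size check only catches layouts whose total size changed. A framework update that reorders fields while keeping the same size would still pass it, which is another reason to keep this technique away from types you do not compile yourself.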

How About Recursive Enums

I'm sorry. I have no idea how recursive enums are laid out in memory. With two cases, things look much the same as for non-recursive associated value enums, but with more cases they differ. Anyway, if I find something, I will write it down and post it here.

WeZZard

Independent iOS developer, World of Warcraft add-on developer. Interested in Computer Graphics and Machine Learning.

People's Republic of China https://github.com/WeZZard