Analysis of CVE-2023-32439
Introduction
JavaScriptCore, the engine powering web browsers like Safari, employs advanced optimization layers to enhance JavaScript performance. These layers employ sophisticated techniques to analyze and optimize code, improving execution speed and memory efficiency. From baseline compilation to just-in-time (JIT) compilation, these layers adapt to runtime conditions, minimizing overhead and maximizing the application's responsiveness. Leveraging a blend of adaptive optimizations and inline caching, JavaScriptCore's optimization layers play a vital role in achieving optimal performance for modern web applications.
The complex design of this system makes it a prime target for potential attackers. Its intricate nature introduces many vulnerabilities and weak spots. This broadens the area an attacker can exploit, making it an attractive target for malicious individuals. The system's interconnections and diverse functionalities increase the chances of unnoticed security flaws, adding to the risk for cyber adversaries keen on exploiting its intricacies.
This article presupposes a reasonable familiarity with JSC's internal workings.
Bug analysis
On June 21, 2023, Apple rolled out a security update for Safari 16.5.1, tagging CVE-2023-32439 to a type confusion issue. In commit 52fe95e5805c735cc1fa4d6200fcaa1912efbfea the Bugzilla bug ID cited in the patch notes is cross-referenced.
The git message reads as follows:
EnumeratorNextUpdateIndexAndMod
andHasIndexedProperty
are different DFG nodes. However, they might introduce the same heap location kind in DFGClobberize.h which might lead to hash collision. We should introduce a new locationn kind forEnumeratorNextUpdateIndexAndMode
.
The test case added is:
//@ runDefault("--watchdog=300", "--watchdog-exception-ok")
const arr = [0];
function foo() {
for (let _ in arr) {
0 in arr;
while(1);
}
}
foo();
Program analysis and abstract interpretation
To grasp the inner workings of JavaScriptCore, having a foundational knowledge of static program analysis and abstract interpretation comes in handy.
from "Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints"
A program denotes computations in some universe of objects. Abstract interpretation of programs consists in using that denotation to describe computations in another universe of abstract objects, so that the resulta of abstract execution give some informations on the actual computations. An intuitive example (which we borrow from Sintzoff [72]) is the rule of signs. The text -151517 may be understood to denote computations on the abstract universe {(+), (-), (+-)} where the semantics of arithmetic operators is defined by the rule of signs. The abstract execution -151517 ==> -(+)(+) ==> (-)(+) ==> (-), proves that -1515+17 is a negative number. Abstract interpretation is concerned by a particular underlying structure of the usual universe of computations (the sign, in our example). It gives a summary of some facets of the actual executions of a program. In general this summary is simple to obtain but inacurate (e.g. -1515+17 ==> -(+)+(+) ==> (-)+(+) ==> (+-)). Despite its fundamental incomplete results abstract interpretation allows the programmer or the compiler to answer questions which do not need full knowledge of program executions or which tolerate an imprecise answer (e.g. partial correctness proofs of programs ignoring the termination problems, type checking, program optimisations which are not carried in the absence of certainty about their feasibility, ...).
JavaScriptCore intricately models its computation; for more details, dive into the following: https://webkit.org/blog/10308/speculation-in-javascriptcore/.
Baseline JIT Compiler
The initial layer, known as the "Baseline JIT Compiler", swiftly translates JavaScript source code into native machine code, prioritising compilation speed over extensive optimisation. This first layer acts as a quick start for applications, providing immediate execution while preparing for more refined optimisations in subsequent layers.
For instance test.js
compiles to:
// ... truncated for brevity
bb#1
Predecessors: [ ]
[ 0] enter
[ 1] new_func dst:loc5, scope:loc4, functionDecl:0
// ... truncated for brevity
[ 49] resolve_scope dst:loc6, scope:loc4, var:0, resolveType:GlobalProperty, localScopeDepth:0
[ 56] get_from_scope dst:loc5, scope:loc6, var:0, getPutInfo:2048<ThrowIfNotFound|GlobalProperty|NotInitialization|NotStrictMode>, localScopeDepth:0, offset:0
[ 64] call dst:loc5, callee:loc5, argc:1, argv:12
[ 70] end value:loc5
Successors: [ ]
Basic Block #1 (bb#1) serves as the primary block where the function foo()
is invoked via opcode [64]
:
foo#C1tLOB:[0x10d44c220->0x10d42d800, NoneFunctionCall, 76]: 21 instructions (0 16-bit instructions, 0 32-bit instructions, 6 instructions with metadata); 196 bytes (120 metadata bytes); 1 parameter(s); 16 callee register(s); 5 variable(s); scope at loc4
bb#1
Predecessors: [ ]
[ 0] enter
[ 1] mov dst:loc5, src:<JSValue()>(const0)
[ 4] resolve_scope dst:loc6, scope:loc4, var:0, resolveType:GlobalProperty, localScopeDepth:0
[ 11] get_from_scope dst:loc6, scope:loc6, var:0, getPutInfo:2048<ThrowIfNotFound|GlobalProperty|NotInitialization|NotStrictMode>, localScopeDepth:0, offset:0
[ 19] mov dst:loc8, src:Int32: 0(const1)
[ 22] mov dst:loc9, src:Int32: 0(const1)
[ 25] get_property_enumerator dst:loc11, base:loc6
[ 28] jeq_ptr value:loc11, specialPointer:Cell: 0x101020e68 (0x3000042c0:[0x42c0/17088, JSPropertyNameEnumerator, (0/0, 0/0){}, NonArray, Leaf]), StructureID: 17088(const2), targetLabel:46(->74)
Successors: [ #6 #2 ]
bb#2
Predecessors: [ #1 #5 ]
[ 32] loop_hint
[ 33] check_traps
[ 34] enumerator_next propertyName:loc10, mode:loc8, index:loc9, base:loc6, enumerator:loc11
[ 41] jeq_ptr value:loc10, specialPointer:String (atomic),8Bit:(1),length:(1): $, StructureID: 16976(const3), targetLabel:33(->74)
Successors: [ #6 #3 ]
bb#3
Predecessors: [ #2 ]
[ 45] mov dst:loc5, src:loc10
[ 48] resolve_scope dst:loc12, scope:loc4, var:0, resolveType:GlobalProperty, localScopeDepth:0
[ 55] get_from_scope dst:loc13, scope:loc12, var:0, getPutInfo:2048<ThrowIfNotFound|GlobalProperty|NotInitialization|NotStrictMode>, localScopeDepth:0, offset:0
[ 63] in_by_val dst:loc14, base:loc13, property:Int32: 0(const4)
Successors: [ #4 ]
// ... truncated for brevity
The key aspects to focus on are the opcodes 25
, 34
, and 63
, constituting the structure of a for loop
[ 25] get_property_enumerator ..., ...
...
[ 34] enumerator_next ..., ...
...
[ 63] in_by_val ..., ...
Data Flow Graph (DFG)
JavaScriptCore (JSC) utilizes the Data Flow Graph (DFG) as a crucial component for optimizing JavaScript code. DFG is a representation of the program's control flow and data flow, enabling sophisticated analysis and transformation of the code to enhance performance. This dynamic approach ensures that JavaScript execution is efficient and well-tailored to the specific program's needs. The DFG plays a pivotal role in achieving optimal performance, making it an essential part of JSC's optimization strategy.
After DFG's bytecode parsing, instructions 25
, 34
, and `63 undergo conversion into its internal Intermediate Representation (IR). For instance, bytecode #34 is compiled to:
...
5 1 : D@44:<!0:-> GetLocal(JS|MustGen|UseAsOther, loc8(Q~/FlushedJSValue), R:Stack(loc8), bc#34, ExitValid) predicting None
6 1 : D@45:< 1:-> EnumeratorNextUpdateIndexAndMode(Check:Untyped:D@41, Check:Untyped:D@42, Check:Untyped:D@44, Check:Untyped:D@43, VarArgs, Int32+OriginalCopyOnWriteArray+InBounds+AsIs+Read, enumeratorModes = 1, R:World, W:Heap, Exits, ClobbersExit, bc#34, ExitValid)
...
and bytecode #63 to
...
12 2 : D@69:< 1:-> JSConstant(JS|UseAsOther, Int32: 0, bc#63, ExitValid)
13 2 : D@70:<!0:-> InByVal(Check:Untyped:D@66, Check:Untyped:D@69, Boolean|MustGen|PureInt, Int32+OriginalCopyOnWriteArray+InBounds+AsIs+Read, R:World, W:Heap, Exits, ClobbersExit, bc#63, ExitValid)
14 2 : D@71:<!0:-> MovHint(Check:Untyped:D@70, MustGen, loc14, W:SideState, ClobbersExit, bc#63, ExitInvalid)
...
Even though the git message references HasIndexedProperty, there is no apparent mention of it in the unoptimized graph.
Prediction propagation phase
Prediction propagation phase propagates predictions obtained from heap load sites via the value profiler earlier in computation, as well as from slow path executions, in order to create a prediction for every node within the graph.
setPrediction()
is used when we know that there is no way that we can change our minds about what the prediction is going to be.
For EnumeratorNextUpdateIndexAndMod
and HasIndexedProperty
case EnumeratorNextUpdateIndexAndMode: {
setTuplePredictions(SpecInt32Only, SpecInt32Only);
break;
}
...
case HasIndexedProperty: {
setPrediction(SpecBoolean);
break;
}
These two nodes are believed to possess distinct values, and this is expected to remain unchanged. Hence, considering them as identical in the value they yield would be incorrect.
Fixup phase
Fixup phase addresses inefficient segments of the graph based on our predictions. This step should occur after prediction propagation but before performing Common Subexpression Elimination (CSE).
If, during the Prediction propagation phase, the DFG anticipates that InByVal
is working on an object in search of anInt3
value, it is then replaced with a HasIndexedProperty
node.
...
case InByVal: {
if ((node->child1()->shouldSpeculateObject()
|| (node->child1()->shouldSpeculateObjectOrOther() && !m_graph.hasExitSite(node->origin.semantic, BadType)))
&& node->child2()->shouldSpeculateInt32()) {
convertToHasIndexedProperty(node);
break;
}
fixEdge<CellUse>(node->child1());
break;
}
...
Common Subexpression Elimination
Common Subexpression Elimination (CSE) is an optimization technique used in compilers. It identifies repetitive expressions—those yielding the same value—and assesses the potential benefit of substituting them with a single variable storing the computed value.
Many past exploits have leveraged the mechanisms of this optimization pass to trigger type confusion vulnerabilities, and this case is no exception. When seeking to remove a redundant node, DFG treats both EnumeratorNextUpdateIndexAndMode
and HasIndexedProperty
as interchangeable. This is due to their LocationKind
values being the same within their respective HeapLocation objects. The identical LocationKind
implies that these two nodes read the same data type from the same location, making them mergeable into one leading to a type-confusion issue.
Before CSE:
...
14 2 27: D@84:<!0:-> CheckInBounds(Int32:D@29, Check:KnownInt32:D@90, JS|MustGen|PureInt, Int32, Exits, bc#63, ExitValid)
15 2 27: D@70:< 1:-> HasIndexedProperty(Object:D@26, Int32:D@29, Check:Untyped:D@95, Check:Untyped:D@84, Boolean|VarArgs|PureInt, Bool, Int32+OriginalCopyOnWriteArray+InBoundsSaneChain+AsIs+Read, R:Butterfly_publicLength,JSObject_butterfly,IndexedInt32Properties, Exits, bc#63, ExitValid)
16 2 27: D@123:<!0:-> KillStack(MustGen, loc14, W:Stack(loc14), ClobbersExit, bc#63, ExitInvalid)
...
After CSE optimization node 70 is no longer HasIndexedProperty
...
14 2 28: D@84:<!0:-> CheckInBounds(Int32:D@29, Check:KnownInt32:D@90, JS|MustGen|PureInt, Int32, Exits, bc#63, ExitValid)
15 2 28: D@70:<!0:-> CheckVarargs(MustGen|VarArgs, Bool, bc#63, ExitValid)
16 2 28: D@42:<!0:-> KillStack(MustGen, loc14, W:Stack(loc14), ClobbersExit, bc#63, ExitInvalid)
...
Patch
This behaviour can be verified by examining the patch, which illustrates how CSE previously failed to differentiate between LocationKinds.
void clobberize( . . . ) {
...
switch (node->op()) {
...
case EnumeratorNextUpdateIndexAndMode:
case HasIndexedProperty: {
...
read(JSObject_butterfly);
ArrayMode mode = node->arrayMode();
+ LocationKind locationKind = node->op() == EnumeratorNextUpdateIndexAndMode ? EnumeratorNextUpdateIndexAndModeLoc : HasIndexedPropertyLoc;
switch (mode.type()) {
case Array::ForceExit: {
write(SideState);
return;
}
case Array::Int32: {
if (mode.isInBounds()) {
read(Butterfly_publicLength);
read(IndexedInt32Properties);
- def(HeapLocation(HasIndexedPropertyLoc, IndexedInt32Properties, graph.varArgChild(node, 0), graph.varArgChild(node, 1)), LazyNode(node));
+ def(HeapLocation(locationKind, IndexedInt32Properties, graph.varArgChild(node, 0), graph.varArgChild(node, 1)), LazyNode(node));
return;
}
break;
}
...
}
Exploitation
Identifying a way to exploit the bug might have entailed analyzing FTL's IR to identify the node misusing an alleged boolean value, which, actually is an Int32
type.
Conclusion:
EnumeratorNextUpdateIndexAndMode
and HasIndexedProperty
represent distinct DFG nodes, but they may both use the same heap location kind value during clobberization. This misuse of heap location kind has the potential to result in a type confusion bug after CSE, given that the former produces a 32-bit integer while the latter deals with boolean values.