Skip to content

Metadata Card

  • Prerequisites: Vol 3 Computer Systems (memory layout, stack and heap concepts), Vol 9 Chapter 1 (type systems)
  • Estimated time: 50 minutes
  • Core difficulty: Advanced
  • Reading mode: High focus
  • Optional skip: If only interested in Java, "Escape Analysis" details can be skipped; "Rust Ownership" is advanced
  • Completion mark: Can explain how objects are allocated in memory in Java, Python, and Rust; understand the core categories of GC

Your Progress

You step out of the type system chamber and walk deeper into the ruins. The runes on the wall change from "what data a variable holds" to "how data is stored." The ways different civilizations solve this problem strike you with awe—the Java civilization uses an "auto-sweep" mechanism (GC), the C civilization makes every programmer sweep themselves (manual management), and the Rust civilization calculates all garbage production times at construction (compile-time ownership checking).

They all want data to stay where it should be, but their understanding of "should" is completely different—like how different civilizations define "home" differently: some live in dormitories (shared heap), some build independent cottages (stack allocation), and some only live in houses with contractual obligations (ownership model).

Your Task

Every program processes data—but where is data stored, when is it created, and when is it destroyed? Different languages give wildly different answers. Java throws objects on the heap and lets GC sweep; Rust calculates at compile time who dies when; Python puts everything on the heap and relies on reference counting as a first line. This chapter breaks down the engineering philosophy behind these choices.

Chapter Layers

  • Required: Stack vs heap comparison, three basic GC strategies, value types vs reference types
  • Optional: Escape analysis and stack allocation
  • Advanced: Rust ownership model, GC-free approach

Breaking Ground · Tracing the Origin

Imagine you find a mural in the Java ruins depicting a coordinate point (3, 4). You write this seemingly innocent code in Java:

java
public class Point {
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

void draw() {
    Point p = new Point(3, 4);  // Where is p?
    // ...
}

p (the reference) is on the stack; new Point(3, 4) (the actual object) is on the heap. This is the standard answer for most GC languages.

Compare with C:

c
void draw() {
    struct Point p = {3, 4};  // p is entirely on the stack
    // ...
}

No new, no malloc; p is an 8-byte stack value. It disappears automatically when the function ends. Fast, but small—when Point is too large or needs to outlive the function's return, it won't fit on the stack, and you have to manually malloc.

Now compare Python:

python
def draw():
    p = (3, 4)  # tuple, allocated on the heap

Every Python object (including integers and tuples) is allocated on the heap. The integers 3 and 4 are also objects—each integer takes 28 bytes by default (sys.getsizeof(3) returns 28 in Python 3).

The same Point(3,4)—the memory used by the three languages is vastly different.


Stack vs Heap: A Choice

StackHeap
Allocation speedMove stack pointer, O(1)Find free block, O(n)+
ReleaseAuto pop on function returnExplicit free or GC
SizeLimited (usually MB-level), known at compile timeLarge (GB-level), runtime dynamic
Thread safetyNaturally thread-privateNeeds synchronization
Applicable scenariosLocal variables, small objects, known compile-time sizeLarge objects, dynamic size, cross-function lifetime

Why does Java put things on the heap by default? Because GC needs unified management. If some objects were on the stack, GC couldn't track them uniformly. Type erasure also means the JVM doesn't know how large T actually is, so it can't put unknown-sized data on the stack like C does.

Why can C put things on the stack? The struct size is known at compile time; sizeof(struct Point) is a constant.


Three Basic GC Strategies—Three "Auto-Sweep" Schemes in the Ruins

Different language civilizations in the ruins have different approaches to "garbage." Some assign a cleaning robot to follow every object (reference counting). Some periodically evict everyone and do a deep clean (mark-and-sweep). Some designed structures at construction time that simply don't generate garbage (no GC).

Different language designers chose different cleaning methods:

Reference Counting (Python, Swift, Objective-C)

Each object maintains a counter—how many references point to me. When it reaches 0, the object is reclaimed.

python
a = []
b = a    # a's reference count becomes 2
del a    # reference count decreases to 1
del b    # reference count decreases to 0, reclaimed

Advantage: Work is spread across each assignment; no "full pause." Disadvantage: Circular reference leaks—A references B, B references A; both counts are 1, and neither can be reclaimed.

Python uses a "generational cycle detector" to periodically scan for unreachable circular references. This is why you might occasionally notice Python memory not being released immediately.

Mark-and-Sweep + Generational Collection (Java HotSpot)

The JVM divides the heap into three "generations":

  • Young Generation: New objects go here, frequently GC'd
  • Old Generation: Objects that survive multiple GC cycles are promoted here
  • Metaspace: Class metadata

GC starts from roots (stack references, static variables) and marks all reachable objects. Unmarked objects are garbage.

java
void example() {
    Point p = new Point(3, 4);  // (1) Allocated in young generation
    Point q = p;                // (2) p and q both reference the same object
    q = new Point(5, 6);       // (3) Old Point(3,4) is unreachable — GC marks it
}
// After the function returns, p and q disappear from the stack
// Both Point objects, if no other references remain, are marked as garbage

Advantage: Can handle circular references. Disadvantage: A Full GC causes "Stop the World"—all threads pause until GC finishes.

No GC (Rust)—"Don't create garbage, don't need to sweep"

Rust takes a completely different path. Its idea is: rather than sweeping at runtime, design at construction time who does the cleaning.

It's like building a house in the ruins—when you lay the foundation, you already decide which wall each brick will go to and when that brick will be removed. No need for a cleaning crew (GC), and no need to sweep yourself (malloc/free).

Rust analyzes ownership and lifetimes at compile time. No runtime marking, no thread pausing.

rust
fn draw() {
    let p = Box::new(Point { x: 3, y: 4 });
    // p is &Point on the stack; the actual Point is on the heap
    // When the function ends, Box::drop() is called automatically, heap memory freed
}

Key difference: Rust has no GC, but it's not C-style manual free either. It determines who releases what at what time at compile time through ownership rules. The rules seem strict, but the trade-off is determinism—you know exactly where each free happens. The cost is the learning curve—you have to learn to negotiate with the compiler about "who manages this data."


Escape Analysis and Stack Allocation

Java says "objects are on the heap"—but JVM's escape analysis optimization can break this principle.

java
void draw() {
    Point p = new Point(3, 4);
    int sum = p.x + p.y;
    // p does not "escape" the draw() function
}

JVM's analysis finds: p is used only inside draw(), not returned, not placed in a global variable, not passed to another thread. So it allocates the Point object on the stack—just like C's local structs.

The bytecode compiled by javac still shows new, but the JIT compiler recognizes the escape pattern at runtime and optimizes hot code. This is the JVM's "hot-path sign"—transparent to the developer.

Compare when the object escapes:

java
Point globalPoint;

void store(Point p) {
    globalPoint = p;  // p escapes — must be heap-allocated
}

Languages that can't do escape analysis at compile time, like Python, almost never do this—because functions can be dynamically modified, parameter types can change, and the compiler can't determine reference relationships.


Value Types vs Reference Types

Value TypeReference Type
Data locationStack (or embedded in other objects)Heap; only reference on stack
Assignment semanticsCopy the entire valueCopy the reference (two references point to the same object)
null valueUsually can't be nullCan be null
Representative languagesint, struct (C#, Rust)Object, class (Java, Python)

Java's primitive types int, long, boolean are value types—int a = 5; int b = a copies 5 itself. But Integer is a reference type; Integer a = 5 (boxing) allocates an object on the heap.

java
int a = 5;
int b = a;
b = 10;
System.out.println(a);  // 5 — unaffected

Integer aBox = 5;
Integer bBox = aBox;
bBox = 10;
System.out.println(aBox);  // 5 — unaffected, but bBox becomes a new object

Same output, but the underlying mechanisms are vastly different: changing an int rewrites a stack value; changing an Integer creates a new Integer object.

Why doesn't Java make all types value types? JVM's GC is reference-tracing-based. If a value were on the stack, GC couldn't mark it as "reachable." Additionally, the size of a value type must be known at compile time—which is impossible with type-erased generics. Project Valhalla (the future of Java values) aims to solve this.

Rust does the opposite: value types by default. You write let p = Point { x: 3, y: 4 }, and the entire Point is on the stack (or embedded in the parent structure). For heap allocation, you must explicitly use Box::new().


Multi-language Comparison: Creating a "User" in Three Civilizations—Same Thing, Completely Different Approaches

Java (heap allocation, reference passing, GC reclamation):

java
class User {
    String name;
    int age;
}

void register() {
    User u = new User();  // Heap allocation
    u.name = "Alice";
    u.age = 30;
    // Function ends, u has no more references — waiting for GC
}

Rust (value semantics, compiler decides):

rust
struct User {
    name: String,  // String internally heap-allocated
    age: u32,      // On the stack
}

fn register() {
    let u = User {
        name: String::from("Alice"),
        age: 30,
    };
    // u itself is on the stack (a name pointer + age);
    // name's string data is on the heap.
    // Function ends, drop both fields; stack and heap are both freed.
}

Python (everything is an object, heap allocation):

python
class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

def register():
    u = User("Alice", 30)  # u (reference) on stack, User object on heap
    # Python doesn't require you to manage much—reference counting maintains itself

Same concept, three languages, completely different implementation strategies. No single solution satisfies "best performance," "easiest to write," and "most deterministic memory."


Common Pitfalls

  1. "GC languages can't have memory leaks" — The leak isn't memory itself, but useless references. A static List keeps adding but never removing—GC sees these objects as still "reachable" and doesn't clean them. It's like piling garbage at your doorstep, labeling it "my stuff," and then nobody can throw it away.
  2. "Rust has zero runtime overhead" — Ownership checking is compile-time, zero runtime overhead; but Box, Rc, RefCell have runtime overhead
  3. "Stack is faster than heap" — Faster at allocation (pointer moving), not access (CPU cache effect depends on data size and access patterns)

Pass Challenges

  • Warm-up: Draw a table comparing Java / C++ / Python / Rust on: default allocation location, GC strategy, value/reference type
  • Hands-on: Run java -XX:+PrintGCDetails YourProgram, observe GC logs, identify the difference between Minor GC and Full GC
  • Observe: Use sys.getsizeof() in Python to check the memory size of different objects—an int, a str, a list

Traveler's Notes

Memory management is the single biggest design decision in a language—choose GC and accept pauses, choose reference counting and handle circular references, choose ownership and accept the learning curve. There's no free lunch, only different trade-offs. Behind each language civilization's choice is a different answer to "what are you willing to sacrifice."

Next Stop Preview

A program runs; memory is allocated, threads exist. How do multiple threads coordinate? The third door in the ruins—Concurrency Model Comparison: Actor, CSP, STM—why does each civilization give a different definition of "sharing"?

Built with VitePress | Software Systems Atlas