
Metadata Card

  • Prerequisites: Vol 3 Computer Systems (instruction sets, registers, stack concepts), Vol 9 Chapters 1-2 (types and runtime)
  • Estimated time: 50 minutes
  • Core difficulty: Black Magic
  • Reading mode: High focus
  • Optional skip: Instruction set enumeration can be skimmed; focus on method invocation and exception handling
  • Completion mark: Can use javap to decompile a class file and read the core bytecode; understand the differences between method invocation instructions

Your Progress

You discover a hidden chamber in the ruins. In the center of the room sits an old projector. The slides don't show code—they show the binary intermediate product that the Java compiler produces: bytecode. You find Master Chen's notes on the wall, in black and white: "Without understanding bytecode, you don't understand Java."

Your Task

Your .java file is compiled into a .class file, which the JVM then loads and executes. If you only look at the .java source, you'll never see what the compiler does behind the scenes: generic type erasure, syntactic sugar, boxing/unboxing, string concatenation optimization. This chapter shows you how to inspect the compiled output yourself and read the bytecode.

Chapter Layers

  • Required: Class file structure overview, using javap, common bytecode instructions
  • Optional: Detailed explanation of method invocation instructions (invokevirtual/invokespecial/invokeinterface/invokestatic)
  • Advanced: Exception table structure, how Lambda expressions are implemented in bytecode

Breaking Ground · Tracing the Origin

You write a simple string concatenation in Java source code. But one question has never been answered directly—will the compiler calculate it at compile time, or leave it for runtime? Standing before the projector in the ruins, you decide to see for yourself:

```java
String s = "hello" + "world";
```

Will the compiler optimize it to "helloworld", or concatenate at runtime? You don't know. But you compile it to a class file and decompile it to find out:

```bash
javac Hello.java
javap -c Hello
```

Output looks like:

```text
0: ldc           #7                  // String helloworld
2: astore_1
3: return
```

Both operands are compile-time constants (a "constant expression" in JLS terms), so the compiler has already performed the concatenation. `ldc #7` loads the finished "helloworld" string from the constant pool. This is the truth that bytecode reveals. Master Chen was right: without understanding bytecode, you don't understand Java.
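You can also see the folding at run time without javap. The sketch below relies on two facts. First, string constants are interned, so a folded expression is the very same object as the literal "helloworld". Second, the JLS specifies that a concatenation that is not a constant expression produces a newly created String. The class name FoldingDemo is just for illustration. The code uses `==` deliberately to compare object identity; in ordinary code, compare strings with `equals`.

```java
public class FoldingDemo {
    // Both operands are literals, so javac folds this to the constant "helloworld".
    static boolean literalsFolded() {
        String s = "hello" + "world";
        return s == "helloworld";          // identity check on purpose: same interned constant
    }

    // A final local initialized with a constant is a "constant variable", so this folds too.
    static boolean finalVariableFolded() {
        final String a = "hello";
        String s = a + "world";
        return s == "helloworld";
    }

    // A non-final variable is not a constant expression: the concatenation runs at
    // run time and yields a newly created String, so the identity check fails.
    static boolean runtimeConcatFolded() {
        String a = "hello";
        String s = a + "world";
        return s == "helloworld";
    }

    public static void main(String[] args) {
        System.out.println("literals:       " + literalsFolded());       // true
        System.out.println("final variable: " + finalVariableFolded());  // true
        System.out.println("plain variable: " + runtimeConcatFolded());  // false
    }
}
```

Dropping `final` from the local variable is enough to move the work from compile time to run time.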


Class File Structure Overview

A .class file contains the following parts:

```text
ClassFile {
    u4             magic;                // 0xCAFEBABE
    u2             minor_version;
    u2             major_version;        // 65 = Java 21
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;         // public, final, abstract...
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}
```

Use javap -v Hello (verbose mode) to see the complete structure. But you don't need to remember all of it—just know:

  • Constant Pool: The symbol table for all strings, class names, method names, field names
  • Methods: Bytecode instructions + exception table + line number table
  • Attributes: Annotations, generic signatures, inner class info, etc.

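The constant pool is the JVM's "dictionary." Instructions that need a string, class, method, or field refer to a constant pool index instead of embedding the text directly. This design has two advantages. Instructions stay short, because the operand is a small index (2 bytes for most instructions, 1 byte for `ldc`) rather than a full name. And each constant is stored only once and shared by every instruction that uses it.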
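You can confirm that a class file begins exactly as the ClassFile structure says. Here is a minimal sketch that reads the first four fields with `DataInputStream`, which reads big-endian values, the byte order class files use. The class name ClassHeader and the `javaRelease` helper are illustrative. The rule "major_version - 44" holds for class files from Java 5 (major 49) onward.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassHeader {
    // The first four fields of ClassFile: magic, minor_version, major_version, constant_pool_count.
    record Header(int magic, int minorVersion, int majorVersion, int constantPoolCount) {}

    static Header read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in); // big-endian, like the class file format
        int magic = data.readInt();                     // u4
        int minor = data.readUnsignedShort();           // u2
        int major = data.readUnsignedShort();           // u2
        int poolCount = data.readUnsignedShort();       // u2; valid pool indices are 1..poolCount-1
        return new Header(magic, minor, major, poolCount);
    }

    // For class files from Java 5 (major 49) onward, the Java release is major_version - 44.
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) throws IOException {
        // Pass a path such as Hello.class; with no argument, read this program's own class file.
        try (InputStream in = args.length > 0
                ? Files.newInputStream(Path.of(args[0]))
                : ClassHeader.class.getResourceAsStream("ClassHeader.class")) {
            if (in == null) {
                System.err.println("class file not found");
                return;
            }
            Header h = read(in);
            System.out.printf("magic=0x%08X major=%d (Java %d) minor=%d constant_pool_count=%d%n",
                    h.magic(), h.majorVersion(), javaRelease(h.majorVersion()),
                    h.minorVersion(), h.constantPoolCount());
        }
    }
}
```

Compile it, then run `java ClassHeader Hello.class` and compare the numbers with what `javap -v Hello` reports.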


Stack-Based Virtual Machine: Instruction Set

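The JVM is a stack-based virtual machine: its instructions don't name registers. They work on an operand stack, plus a per-frame array of local variables where parameters also live. Take the simplest possible addition: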

```java
int add(int a, int b) {
    return a + b;
}
```

Decompiled, you see it working on the stack:

```text
0: iload_1       // Push local variable 1 (a) onto stack
1: iload_2       // Push local variable 2 (b) onto stack
2: iadd          // Pop two values, add, push result
3: ireturn       // Pop top of stack and return
```

The operand stack is not a register. iadd doesn't have a "fetch from which register" parameter—it takes two values from the top of the stack. This is what "stack-based" means.
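To make "stack-based" concrete, here is a toy interpreter that runs exactly the four instructions above. It is a hypothetical sketch, not how HotSpot actually executes code. The opcode values are the real ones from the JVM specification (iload_1 = 0x1b, iload_2 = 0x1c, iadd = 0x60, ireturn = 0xac). Each instruction is one byte long, which is why the javap offsets above run 0, 1, 2, 3. The class name ToyInterpreter is illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ToyInterpreter {
    // Real JVM opcode values for the four instructions in add(int, int).
    static final int ILOAD_1 = 0x1b, ILOAD_2 = 0x1c, IADD = 0x60, IRETURN = 0xac;

    // Executes a tiny subset of JVM bytecode using an explicit operand stack.
    static int execute(byte[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xff;          // Java bytes are signed; mask to 0..255
            switch (opcode) {
                case ILOAD_1: stack.push(locals[1]); break;
                case ILOAD_2: stack.push(locals[2]); break;
                case IADD: {
                    int right = stack.pop();       // iadd names no registers:
                    int left = stack.pop();        // it just pops two operands
                    stack.push(left + right);      // and pushes the sum
                    break;
                }
                case IRETURN: return stack.pop();
                default: throw new IllegalArgumentException(
                        "unsupported opcode 0x" + Integer.toHexString(opcode));
            }
            pc++;                                  // every instruction here is one byte long
        }
        throw new IllegalStateException("no ireturn reached");
    }

    public static void main(String[] args) {
        byte[] add = {0x1b, 0x1c, 0x60, (byte) 0xac};
        int[] locals = {0, 3, 4};  // slot 0 would hold `this` (unused here), slot 1 = a, slot 2 = b
        System.out.println(execute(add, locals));  // 7
    }
}
```

Both loads push a value, `iadd` pops two and pushes their sum, and `ireturn` pops the result.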

Why did the JVM choose stack architecture? Stack-based instruction encoding is more compact than register-based (each instruction doesn't need to specify register numbers), and the instruction set doesn't need to change across platforms. The cost is more load/store instructions—moving local variables from the variable table to the stack and back.

Compare with x86's register architecture (you don't need to know x86 assembly, just feel the difference):

```asm
; x86 — operates directly on registers
mov eax, [a]    ; From memory a to register eax
add eax, [b]    ; eax = eax + b
; Result is in eax
```

The JVM needs extra load steps, but in exchange, its instruction set is independent of any physical CPU.


Common Instructions Quick Reference

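| Type | Instructions | Effect |
|---|---|---|
| Constant | ldc, iconst_<n> | Push a constant (ldc loads it from the constant pool) |
| Load | iload, aload, fload | Push a local variable onto the stack |
| Store | istore, astore, fstore | Pop the top of the stack into a local variable |
| Arithmetic | iadd, isub, imul, idiv | Pop two values, compute, push the result |
| Object | new, getfield, putfield | Create an object; read/write instance fields |
| Method | invokevirtual, invokespecial, invokestatic, invokeinterface | Invoke a method |
| Return | ireturn, areturn, return | Return an int, a reference, or nothing (void) |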
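To connect several rows of the table to real source code, here is a small class. Its comments show the instruction sequences javac typically emits. The class name Counter is illustrative. The exact sequences can vary with the compiler version, so treat the comments as a guide and check them with `javap -c Counter`.

```java
public class Counter {
    private int count;                  // instance field: read with getfield, written with putfield

    void increment() {
        // Typically: aload_0; dup; getfield count; iconst_1; iadd; putfield count
        count++;
    }

    int get() {
        // Typically: aload_0; getfield count; ireturn
        return count;
    }

    static int demo() {
        // Typically: new Counter; dup; invokespecial Counter.<init>; astore_0
        Counter c = new Counter();
        c.increment();                  // aload_0; invokevirtual increment
        c.increment();
        return c.get();                 // aload_0; invokevirtual get; ireturn
    }

    public static void main(String[] args) {
        System.out.println(demo());     // 2
    }
}
```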

The Art of Method Invocation Instructions

Java bytecode has four classic method invocation instructions, each used in a different scenario. A fifth, invokedynamic, comes up in the Lambda section below. These four instructions are the key to understanding how polymorphism is implemented at the bytecode level.

invokevirtual: Instance Method, Dynamic Dispatch

```java
void call(Animal a) {
    a.speak();  // invokevirtual
}
```

Decompiled:

```text
0: aload_1
1: invokevirtual #12  // Method Animal.speak()V
```

invokevirtual uses dynamic dispatch: at runtime, it looks up the method table based on a's actual type. If a is a Dog, it calls Dog.speak(); if it's a Cat, it calls Cat.speak().

The bytecode records only the static type, `Animal.speak()V`. At run time, the JVM starts from the object's actual class and searches upward through its superclasses for an implementation of speak().
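Here is a runnable version of the call(Animal) example. The class names are illustrative. speak() returns a String instead of printing so the result is easy to check. All four calls go through the same invokevirtual instruction, yet each runs a different method. Puppy shows the upward search: it doesn't override speak(), so the lookup finds Dog's implementation.

```java
public class DispatchDemo {
    static class Animal {
        String speak() { return "..."; }
    }

    static class Dog extends Animal {
        @Override String speak() { return "Woof"; }
    }

    static class Cat extends Animal {
        @Override String speak() { return "Meow"; }
    }

    // No override here: lookup starts at Puppy, moves up, and finds Dog.speak().
    static class Puppy extends Dog {}

    // One call site, compiled to a single invokevirtual on Animal.speak().
    static String call(Animal a) {
        return a.speak();
    }

    public static void main(String[] args) {
        for (Animal a : new Animal[] {new Animal(), new Dog(), new Cat(), new Puppy()}) {
            System.out.println(a.getClass().getSimpleName() + " -> " + call(a));
        }
    }
}
```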

invokespecial: Constructor, Private Methods, Superclass Methods

```java
class A {}

class B extends A {
    B() {
        super();  // invokespecial A.<init>
    }

    private void helper() {}  // callers: invokespecial with javac before Java 11, invokevirtual since Java 11
}
```

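invokespecial does no dynamic dispatch: the compiler has already determined which method to call. A constructor call such as super() always runs exactly the named constructor, and a super.method() call always runs the superclass version. Private methods can't be overridden either, so javac traditionally used invokespecial for them too. Since Java 11 (JEP 181, nest-based access control), javac emits invokevirtual for private instance methods instead. The call still resolves to that one private method, without any real dispatch.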
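Here is a minimal sketch of what "fixed at compile time" means; the class names are illustrative. super.greet() always runs A's version, even on a B object that overrides greet(). And a private method in A is never replaced by a same-named private method in B.

```java
public class SuperCallDemo {
    static class A {
        String greet() { return "A.greet"; }

        // Private: not overridable, so this call always reaches A.secret().
        private String secret() { return "A.secret"; }

        String callSecret() { return secret(); }
    }

    static class B extends A {
        @Override String greet() { return "B.greet"; }

        // super.greet() compiles to invokespecial: it always runs A.greet(),
        // even though the runtime object is a B that overrides greet().
        String greetViaSuper() { return super.greet(); }

        // Same name as A's private method, but this does NOT override it.
        private String secret() { return "B.secret"; }
    }

    public static void main(String[] args) {
        B b = new B();
        System.out.println(b.greet());          // B.greet  (virtual dispatch)
        System.out.println(b.greetViaSuper());  // A.greet  (non-virtual super call)
        System.out.println(b.callSecret());     // A.secret (private method, no overriding)
    }
}
```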

invokestatic: Static Methods

```java
int x = Math.max(3, 5);  // invokestatic
```

The simplest—no instance, no dispatch.
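Because invokestatic ignores the object, a static method can only be hidden, not overridden. In the sketch below (names illustrative), calling name() through a Parent-typed variable that holds a Child still runs Parent.name(): the declared type chosen at compile time decides. Calling a static method through an instance is legal but discouraged, which is why the example suppresses javac's "static" lint warning.

```java
public class StaticBindingDemo {
    static class Parent {
        static String name() { return "Parent.name"; }
    }

    static class Child extends Parent {
        // Hides Parent.name(); static methods are never overridden.
        static String name() { return "Child.name"; }
    }

    @SuppressWarnings("static")
    static String viaDeclaredType() {
        Parent p = new Child();
        // Compiled to invokestatic on Parent.name: the declared type decides, the object is ignored.
        return p.name();
    }

    public static void main(String[] args) {
        System.out.println(viaDeclaredType());  // Parent.name
        System.out.println(Child.name());       // Child.name
    }
}
```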

invokeinterface: Interface Methods

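invokeinterface is similar to invokevirtual, but the method is declared on an interface. A class can implement many interfaces, so an interface method has no single fixed slot in the class's method table. HotSpot therefore searches an interface method table (itable) instead of indexing the virtual method table (vtable). In theory this is slightly slower. In practice the JIT usually hides the difference, for example with inline caches at call sites that only ever see one receiver class.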

```java
void call(List<?> list) {
    list.size();  // invokeinterface
}
```

Comparison of four instructions:

| Instruction | Target | Dispatch | Typical scenario |
|---|---|---|---|
| invokevirtual | Instance method | Dynamic, at run time | obj.toString() |
| invokespecial | Constructor, superclass method, private method (javac before Java 11) | Fixed at compile time | super(), super.toString() |
| invokestatic | Static method | Fixed at compile time | Math.max() |
| invokeinterface | Interface method | Dynamic, at run time (itable lookup) | list.size() |

Exception Table

try-catch in bytecode isn't an "instruction" but an exception table:

```java
void readFile(String path) {
    try {
        FileInputStream fis = new FileInputStream(path);
    } catch (IOException e) {
        System.out.println("error");
    }
}
```

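The exception table, as printed by javap -c (the offsets depend on the exact code and compiler):

```text
Exception table:
   from    to  target type
       0     9    12   Class java/io/IOException
```

This means: if an IOException, or any subclass of it, is thrown while executing the bytecode from offset 0 up to but not including offset 9, execution jumps to the handler at offset 12.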
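The lookup rule itself is simple enough to model. This sketch is illustrative: Entry, findHandler, and the offsets are hypothetical. It follows the JVM specification's rule:

- Rows are searched in order.
- The start offset is inclusive and the end offset exclusive.
- The thrown exception matches if it is an instance of the catch type, subclasses included.
- A row with no catch type (shown as `any` in javap) matches everything.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;
import java.util.OptionalInt;

public class ExceptionTableLookup {
    // One exception-table row. catchType == null stands for "any" (used by finally handlers).
    record Entry(int from, int to, int target, Class<? extends Throwable> catchType) {}

    // Scan rows in order: the first row whose [from, to) range contains pc and whose
    // type matches the thrown exception (subclasses included) supplies the handler offset.
    static OptionalInt findHandler(List<Entry> table, int pc, Throwable thrown) {
        for (Entry e : table) {
            boolean inRange = pc >= e.from() && pc < e.to();
            boolean typeMatches = e.catchType() == null || e.catchType().isInstance(thrown);
            if (inRange && typeMatches) {
                return OptionalInt.of(e.target());
            }
        }
        return OptionalInt.empty();  // no match: the frame is popped and the exception propagates
    }

    public static void main(String[] args) {
        // Illustrative table: one IOException handler guarding offsets 0 up to (not including) 9.
        List<Entry> table = List.of(new Entry(0, 9, 12, IOException.class));
        System.out.println(findHandler(table, 5, new FileNotFoundException())); // OptionalInt[12]
        System.out.println(findHandler(table, 5, new IllegalStateException())); // OptionalInt.empty
        System.out.println(findHandler(table, 9, new IOException()));           // OptionalInt.empty
    }
}
```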

How is finally implemented? javac copies the finally block into the bytecode once for every way out of the try:

- at the end of the normal path;
- at the end of each catch block;
- in an extra catch-all handler that runs the finally code and then rethrows the exception with athrow.

That way the code runs whether the method completes normally or with an exception.

```java
try {
    // ...
} finally {
    cleanup();  // Appears twice in bytecode
}
```

You can verify this with javap -c. The cleanup() call shows up once per exit path, and the exception table gains an entry whose type is `any`, pointing at the catch-all copy.
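This runnable sketch records which blocks execute; the class name FinallyDemo is illustrative. The finally block runs in three cases: on the normal path, after the catch, and when an uncaught exception passes through. In the last case the exception is rethrown afterward. Running `javap -c FinallyDemo` on it should show the code for `log.add("finally")` repeated once per exit path.

```java
import java.util.ArrayList;
import java.util.List;

public class FinallyDemo {
    // Records which blocks ran, so both paths through try/catch/finally are visible.
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            log.add("try");
            if (fail) {
                throw new IllegalStateException("boom");
            }
            log.add("try-end");
        } catch (IllegalStateException e) {
            log.add("catch");
        } finally {
            log.add("finally");  // javac copies this block onto every exit path
        }
        return log;
    }

    // An exception the catch clause doesn't handle still runs finally
    // (via the catch-all handler), and is then rethrown to the caller.
    static void runUncaught(List<String> log) {
        try {
            log.add("try");
            throw new UnsupportedOperationException("not caught here");
        } catch (IllegalStateException e) {
            log.add("catch");
        } finally {
            log.add("finally");
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false));  // [try, try-end, finally]
        System.out.println(run(true));   // [try, catch, finally]
        List<String> log = new ArrayList<>();
        try {
            runUncaught(log);
        } catch (UnsupportedOperationException e) {
            System.out.println(log + " then rethrown: " + e.getMessage());
        }
    }
}
```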


Bonus Exploration: Lambda in Bytecode

```java
list.forEach(x -> System.out.println(x));
```

A Lambda is not an anonymous inner class. At the bytecode level, the compiler generates an invokedynamic instruction, and at runtime, it creates an instance of the functional interface through LambdaMetafactory.

```text
// Decompiled (javap -v)
0: aload_0
1: invokedynamic #7, 0  // InvokeDynamic #0:accept:()Ljava/util/function/Consumer;
```

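invokedynamic was introduced in Java 7 (JSR 292), and Java 8 lambdas were the first feature javac compiled to it. The lambda body itself becomes a private synthetic method of the enclosing class; javac names it something like lambda$main$0. The first time the invokedynamic call site executes, LambdaMetafactory generates a class implementing the functional interface and links the call site. Later executions reuse that linkage, and on OpenJDK a non-capturing lambda even reuses a single instance. Compared with an anonymous inner class, no extra .class file is written per lambda. The JVM can also change how lambdas are implemented without your code being recompiled.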
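A small experiment makes the difference visible. In this sketch (the class name LambdaPeek is illustrative), the lambda and the anonymous class do the same thing. The comments describe OpenJDK and javac specifically. Three things here are implementation details, not guarantees of the Java Language Specification: the `$$Lambda` class name, the reuse of one instance for a non-capturing lambda, and the `lambda$` naming of body methods.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class LambdaPeek {
    // Non-capturing lambda: javac emits invokedynamic plus a private synthetic method for the body.
    static Supplier<String> makeLambda() {
        return () -> "hi";
    }

    // Anonymous class: javac writes a separate class file, LambdaPeek$1.class.
    static Supplier<String> makeAnonymous() {
        return new Supplier<String>() {
            @Override
            public String get() {
                return "hi";
            }
        };
    }

    // Methods javac generated to hold lambda bodies; the lambda$ prefix is a javac convention.
    static List<String> lambdaBodyMethods() {
        return Arrays.stream(LambdaPeek.class.getDeclaredMethods())
                .map(Method::getName)
                .filter(name -> name.startsWith("lambda$"))
                .sorted()
                .toList();
    }

    public static void main(String[] args) {
        // On OpenJDK, a non-capturing lambda call site hands back the same instance every time.
        System.out.println("lambda reused:    " + (makeLambda() == makeLambda()));
        System.out.println("anonymous reused: " + (makeAnonymous() == makeAnonymous()));
        System.out.println("lambda class:     " + makeLambda().getClass().getName());
        System.out.println("anonymous class:  " + makeAnonymous().getClass().getName());
        System.out.println("body methods:     " + lambdaBodyMethods());
    }
}
```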


Common Pitfalls

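  1. "Reading bytecode = learning assembly": No. The JVM instruction set has about 200 opcodes (roughly 30 in everyday use), far fewer than x86's thousands of instruction forms.
  2. "String + always creates objects at run time": Not always. javac folds constant expressions at compile time. Non-constant concatenation is compiled efficiently: into StringBuilder calls before Java 9, and since Java 9 (JEP 280) into an invokedynamic call to StringConcatFactory. But concatenating inside a loop is still a trap, because every iteration creates a new String and copies everything accumulated so far.
  3. "You have to check every exception path in bytecode": No. Looking at exception-handling bytecode is mainly useful for understanding how finally is duplicated. You don't need to memorize the instruction table either; being able to read javap output is enough.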
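Here is the loop trap from pitfall 2 in code form. This is a sketch (LoopConcat is an illustrative name), and its timing is crude rather than a proper benchmark; for real measurements, use a harness such as JMH. Both methods produce the same string, but `+=` copies the whole accumulated string on every iteration.

```java
public class LoopConcat {
    // Each += builds a brand-new String and copies everything accumulated so far,
    // so the total copying grows roughly quadratically with n.
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // One growable buffer for the whole loop: each append copies only the new characters
    // (plus occasional buffer resizing), so the work grows roughly linearly.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 20_000;
        long t0 = System.nanoTime();
        String a = withPlus(n);
        long t1 = System.nanoTime();
        String b = withBuilder(n);
        long t2 = System.nanoTime();
        System.out.println("same result: " + a.equals(b));
        System.out.printf("+= in loop: %d ms, StringBuilder: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```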

Pass Challenges

  • Warm-up: Write a simple Java class (just an add method), decompile with javap -c, and explain the meaning of each line.
  • Hands-on: Write a method with try-catch-finally, decompile it, and verify how many times finally was duplicated.
  • Observe: Compare the bytecode of String s = "a" + "b" vs String s = "a"; String t = s + "b"—can you see under what conditions the compiler does compile-time folding?

Traveler's Notes

Bytecode is where the Java compiler tells it like it is. If you don't ask, it lets you believe the syntactic sugar is real. Once you run javap, it reveals the truth about generic type erasure, string folding, and finally duplication.

Next Stop Preview

You can finally see Java's cards laid on the table. But there's a longer road ahead—some languages aren't content with just using what the compiler gives them; they want to do the compiler's work themselves. The sixth door: Metaprogramming & DSL—code that writes code.

Built with VitePress | Software Systems Atlas