Metadata Card
- Prerequisites: Vol 3 Computer Systems (instruction sets, registers, stack concepts), Vol 9 Chapters 1-2 (types and runtime)
- Estimated time: 50 minutes
- Core difficulty: Black Magic
- Reading mode: High focus
- Optional skip: Instruction set enumeration can be skimmed; focus on method invocation and exception handling
- Completion mark: Can use javap to decompile a class file and read the core bytecode; understand the differences between method invocation instructions
Your Progress
You discover a hidden chamber in the ruins. In the center of the room sits an old projector. The slides don't show code—they show the binary intermediate product that the Java compiler produces: bytecode. You find Master Chen's notes on the wall, in black and white: "Without understanding bytecode, you don't understand Java."
Your Task
Your .java file gets compiled into a .class file, then loaded and executed by the JVM. If you only look at .java-level syntax, you'll never know what the compiler does behind the scenes—generic erasure, syntactic sugar, boxing/unboxing, string concatenation optimization. This chapter lets you see the decompiled results with your own eyes and read bytecode.
Chapter Layers
- Required: Class file structure overview, using javap, common bytecode instructions
- Optional: Detailed explanation of method invocation instructions (invokevirtual/invokespecial/invokeinterface/invokestatic)
- Advanced: Exception table structure, how Lambda expressions are implemented in bytecode
Breaking Ground · Tracing the Origin
You write a simple string concatenation in Java source code. But one question has never been answered directly—will the compiler calculate it at compile time, or leave it for runtime? Standing before the projector in the ruins, you decide to see for yourself:
String s = "hello" + "world";
Will the compiler optimize it to "helloworld", or concatenate at runtime? You don't know. But you compile it to a class file and decompile it to find out:
javac Hello.java
javap -c Hello
Output looks like:
0: ldc #7 // String helloworld
2: astore_1
3: return
The compiler already calculated the concatenation: ldc #7 loads the "helloworld" string from the constant pool. This is the truth that bytecode reveals. Master Chen was right: without understanding bytecode, you don't understand Java.
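The folding seen in the javap output can also be observed from plain Java, without reading bytecode. The following is a minimal sketch, not part of the original notes: the class and method names are made up for illustration, and it deliberately compares references with == (normally a bug) because compile-time constants all resolve to the same interned constant-pool string.

```java
// Hypothetical demo class; the names are illustrative only.
class ConstantFoldingDemo {

    // Two literals: a constant expression, folded by javac into one constant.
    static boolean literalsAreFolded() {
        String s = "hello" + "world";
        return s == "helloworld"; // identity comparison on purpose
    }

    // A final local initialized with a literal is still a constant variable,
    // so the concatenation is still folded at compile time.
    static boolean finalLocalIsFolded() {
        final String h = "hello";
        String s = h + "world";
        return s == "helloworld";
    }

    // A non-final local is not a constant variable: the concatenation
    // happens at runtime and produces a fresh String object.
    static boolean plainLocalIsFolded() {
        String h = "hello";
        String s = h + "world";
        return s == "helloworld";
    }

    public static void main(String[] args) {
        System.out.println("literals folded:     " + literalsAreFolded());
        System.out.println("final local folded:  " + finalLocalIsFolded());
        System.out.println("plain local folded:  " + plainLocalIsFolded());
    }
}
```

Running javap -c on this class should show a single ldc for the first two methods and real concatenation code for the third.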
Class File Structure Overview
A .class file contains the following parts:
ClassFile {
u4 magic; // 0xCAFEBABE
u2 minor_version;
u2 major_version; // 65 = Java 21
u2 constant_pool_count;
cp_info constant_pool[constant_pool_count-1];
u2 access_flags; // public, final, abstract...
u2 this_class;
u2 super_class;
u2 interfaces_count;
u2 interfaces[interfaces_count];
u2 fields_count;
field_info fields[fields_count];
u2 methods_count;
method_info methods[methods_count];
u2 attributes_count;
attribute_info attributes[attributes_count];
}
Use javap -v Hello (verbose mode) to see the complete structure. But you don't need to remember all of it; just know:
- Constant Pool: The symbol table for all strings, class names, method names, field names
- Methods: Bytecode instructions + exception table + line number table
- Attributes: Annotations, generic signatures, inner class info, etc.
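The constant pool is the JVM's "dictionary." Instructions that name a class, field, method, or string constant carry an index into the constant pool rather than embedding the text itself. This design has two advantages: instructions stay short (usually a 2-byte index; the short form ldc uses a 1-byte index, which is why the earlier ldc #7 occupies only offsets 0 and 1), and one constant can be shared by many instructions.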
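The first four fields of the ClassFile layout above can be read back with nothing more than java.io, since DataInputStream reads big-endian values just as the class file stores them. This is a rough sketch under assumptions: the class name ClassHeaderPeek and its methods are invented for illustration, and it assumes the runtime exposes java/lang/Object.class as a resource (if not, pass any .class file stream to readHeader instead).

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper; not part of the JDK or of the original chapter.
class ClassHeaderPeek {

    /** Returns {magic, minor_version, major_version, constant_pool_count}. */
    static long[] readHeader(InputStream raw) {
        try (DataInputStream in = new DataInputStream(raw)) {
            long magic = in.readInt() & 0xFFFFFFFFL; // u4 magic
            int minor = in.readUnsignedShort();      // u2 minor_version
            int major = in.readUnsignedShort();      // u2 major_version
            int cpCount = in.readUnsignedShort();    // u2 constant_pool_count
            return new long[] { magic, minor, major, cpCount };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Assumption: Object.class is reachable as a classpath resource. */
    static long[] headerOfObjectClass() {
        InputStream in = Object.class.getResourceAsStream("/java/lang/Object.class");
        if (in == null) {
            throw new IllegalStateException("Object.class not available as a resource");
        }
        return readHeader(in);
    }

    public static void main(String[] args) {
        long[] h = headerOfObjectClass();
        System.out.printf("magic=0x%X major=%d minor=%d constant_pool_count=%d%n",
                h[0], h[2], h[1], h[3]);
    }
}
```

The major version printed depends on the JDK you run it with, so compare it against the value javap -v reports rather than expecting a fixed number.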
Stack-Based Virtual Machine: Instruction Set
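The JVM is a stack-based virtual machine: its instructions name no general-purpose registers and work on an operand stack instead, with arguments and locals kept in a numbered local variable table. Your simplest addition: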
int add(int a, int b) {
return a + b;
}
Decompiled, you see it working on the stack:
0: iload_1 // Push local variable 1 (a) onto stack
1: iload_2 // Push local variable 2 (b) onto stack
2: iadd // Pop two values, add, push result
3: ireturn // Pop top of stack and return
The operand stack is not a register. iadd doesn't have a "fetch from which register" parameter; it takes two values from the top of the stack. This is what "stack-based" means.
Why did the JVM choose stack architecture? Stack-based instruction encoding is more compact than register-based (each instruction doesn't need to specify register numbers), and the instruction set doesn't need to change across platforms. The cost is more load/store instructions—moving local variables from the variable table to the stack and back.
Compare with x86's register architecture (you don't need to know x86 assembly, just feel the difference):
; x86 — operates directly on registers
mov eax, [a] ; From memory a to register eax
add eax, [b] ; eax = eax + b
; Result is in eax
The JVM needs extra load steps, but in exchange, its instruction set is independent of any physical CPU.
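To make the stack discipline concrete, here is a toy interpreter for just the four instructions used by add. It is a teaching sketch, not how a real JVM is built: the enum Op, the class TinyStackMachine, and the use of a Deque for the operand stack are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the operand stack; names are hypothetical.
class TinyStackMachine {

    enum Op { ILOAD_1, ILOAD_2, IADD, IRETURN }

    static int run(Op[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Op op : code) {
            switch (op) {
                case ILOAD_1:
                    stack.push(locals[1]); // push local variable 1
                    break;
                case ILOAD_2:
                    stack.push(locals[2]); // push local variable 2
                    break;
                case IADD: {
                    int right = stack.pop(); // operands come from the stack,
                    int left = stack.pop();  // not from named registers
                    stack.push(left + right);
                    break;
                }
                case IRETURN:
                    return stack.pop();      // result is whatever is on top
            }
        }
        throw new IllegalStateException("code ended without IRETURN");
    }

    // Mirrors the bytecode of add(int a, int b) from the listing above.
    static int add(int a, int b) {
        Op[] code = { Op.ILOAD_1, Op.ILOAD_2, Op.IADD, Op.IRETURN };
        int[] locals = { 0, a, b }; // slot 0 stands in for `this`
        return run(code, locals);
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

Notice that IADD carries no operands of its own, which is the compactness the stack design buys.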
Common Instructions Quick Reference
| Type | Instruction | Effect |
|---|---|---|
| Load | iload, aload, fload | Push local variable onto stack |
| Store | istore, astore, fstore | Pop stack top into local variable |
| Arithmetic | iadd, isub, imul, idiv | Pop two values, compute, push result |
| Object | new, getfield, putfield | Create object, read/write fields |
| Method | invokevirtual, invokespecial, invokestatic, invokeinterface | Invoke method |
| Return | ireturn, areturn, return | Return various types |
The Art of Method Invocation Instructions
The JVM has four classic method invocation instructions, each used in a different scenario (a fifth, invokedynamic, appears in the Lambda section at the end of this chapter). This is the key to understanding "how polymorphism is implemented at the bytecode level."
invokevirtual: Instance Method, Dynamic Dispatch
void call(Animal a) {
a.speak(); // invokevirtual
}
Decompiled:
0: aload_1
1: invokevirtual #12 // Method Animal.speak:()V
invokevirtual uses dynamic dispatch: at runtime, it looks up the method table based on a's actual type. If a is a Dog, it calls Dog.speak(); if it's a Cat, it calls Cat.speak().
The bytecode level doesn't care about the actual type—it just says "starting from the runtime class, look upward for the implementation of speak."
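A small runnable version of the Animal example shows the effect of that lookup. It is only a sketch: the chapter never defines Animal, Dog, or Cat, so the class bodies here are assumptions, and speak() returns a String instead of void so the result is easy to check.

```java
// Hypothetical class hierarchy for illustration.
class DispatchDemo {

    static class Animal {
        String speak() { return "..."; }
    }

    static class Dog extends Animal {
        @Override String speak() { return "Woof"; }
    }

    static class Cat extends Animal {
        @Override String speak() { return "Meow"; }
    }

    // One call site, compiled once to invokevirtual on Animal.speak.
    // Which body runs depends on the runtime class of `a`.
    static String call(Animal a) {
        return a.speak();
    }

    public static void main(String[] args) {
        System.out.println(call(new Dog()));
        System.out.println(call(new Cat()));
        System.out.println(call(new Animal()));
    }
}
```

javap -c DispatchDemo should show the same invokevirtual instruction in call no matter how many subclasses exist.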
invokespecial: Constructor, Private Methods, Superclass Methods
class B extends A {
B() {
super(); // invokespecial
}
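private void helper() {} // Caller: invokespecial through Java 10, invokevirtual since Java 11
}
invokespecial does no dynamic dispatch: the target method is already fixed at compile time. A super() call in a constructor always names one specific constructor, and a super.method() call always names the superclass version. Private methods can't be overridden either, so javac used invokespecial for them through Java 10. Since Java 11 (nest-based access control) javac emits invokevirtual for private instance methods, and the JVM still calls the private method directly without searching subclasses.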
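The difference from invokevirtual shows up most clearly with super.method(). In the sketch below (the classes A, B, and C are invented for illustration), the super.name() call inside B always reaches A's version, even when the receiver is really a C that overrides name() again.

```java
// Hypothetical classes for illustration.
class SuperCallDemo {

    static class A {
        String name() { return "A"; }
    }

    static class B extends A {
        B() { super(); }                             // invokespecial A.<init>

        @Override String name() { return "B"; }

        String parentName() { return super.name(); } // invokespecial A.name
    }

    static class C extends B {
        @Override String name() { return "C"; }
    }

    public static void main(String[] args) {
        B b = new C();
        System.out.println(b.name());       // dynamic dispatch on the runtime class
        System.out.println(b.parentName()); // fixed target, no dispatch
    }
}
```

If super.name() were dispatched dynamically, the second call would land back in C.name(); it does not.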
invokestatic: Static Methods
int x = Math.max(3, 5); // invokestatic
The simplest: no instance, no dispatch.
invokeinterface: Interface Methods
Similar to invokevirtual, but the JVM needs to look up the interface method table (itable) instead of the virtual method table (vtable). Theoretically slightly slower—practically imperceptible.
void call(List list) {
list.size(); // invokeinterface
}
Comparison of four instructions:
| Instruction | Target | Dispatch | Typical Scenario |
|---|---|---|---|
| invokevirtual | Instance method | Runtime dynamic | obj.toString() |
| invokespecial | Constructor/super (private through Java 10) | Compile-time fixed | super(), super.method() |
| invokestatic | Static method | Compile-time fixed | Math.max() |
| invokeinterface | Interface method | Runtime dynamic (itable lookup) | list.size() |
Exception Table
try-catch in bytecode isn't an "instruction" but an exception table:
void readFile(String path) {
try {
FileInputStream fis = new FileInputStream(path);
} catch (IOException e) {
System.out.println("error");
}
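}
Decompiled exception table:
Exception table:
from to target type
0 16 19 Class java/io/IOException
This means: if an IOException is thrown while executing bytecode from offset 0 (inclusive) up to offset 16 (exclusive), control jumps to the handler at offset 19. The exact offsets depend on the compiler and the method body.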
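The lookup described by that table can be modelled in a few lines. This is a simplified sketch, not JVM source: the Entry class and findHandler method are invented here, and a null type stands in for the catch-all entries that javap prints as any. The rule it illustrates is that entries are tried in order, and the first one whose range covers the current offset and whose type matches the thrown object wins.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical model of an exception table; names are illustrative.
class ExceptionTableSketch {

    static final class Entry {
        final int from;    // inclusive start offset
        final int to;      // exclusive end offset
        final int target;  // handler offset
        final Class<? extends Throwable> type; // null models "any"

        Entry(int from, int to, int target, Class<? extends Throwable> type) {
            this.from = from;
            this.to = to;
            this.target = target;
            this.type = type;
        }
    }

    /** Returns the handler offset, or -1 if the exception leaves the method. */
    static int findHandler(Entry[] table, int pc, Throwable thrown) {
        for (Entry e : table) {
            boolean inRange = pc >= e.from && pc < e.to;
            boolean typeMatches = e.type == null || e.type.isInstance(thrown);
            if (inRange && typeMatches) {
                return e.target;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Entry[] table = { new Entry(0, 16, 19, IOException.class) };
        System.out.println(findHandler(table, 5, new FileNotFoundException("missing")));
        System.out.println(findHandler(table, 16, new IOException("outside range")));
        System.out.println(findHandler(table, 5, new RuntimeException("wrong type")));
    }
}
```

A subclass such as FileNotFoundException matches the IOException entry, which is the same rule a catch clause follows in source code.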
How is finally implemented? The compiler duplicates the finally block in the bytecode: once at the end of the normal path, once at the end of each catch block, and once more in a catch-all handler that runs the cleanup and then rethrows any exception nobody caught. That way the code runs whether the method leaves normally or with an exception.
try {
// ...
} finally {
cleanup(); // Appears twice in bytecode
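}
You can verify this with javap: the cleanup() call shows up twice in the instruction listing, and the exception table gains an entry whose type is any, which is the catch-all handler that runs the finally code before rethrowing.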
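One way to internalize the duplication is to write the expansion by hand and check that it behaves the same. The sketch below is an approximation of the shape javac produces, not its literal output; the class FinallyExpansion and its methods are made up for illustration, and a log list stands in for real cleanup work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of try/finally with a hand-written expansion.
class FinallyExpansion {

    static void withFinally(List<String> log, boolean fail) {
        try {
            log.add("body");
            if (fail) throw new IllegalStateException("boom");
        } finally {
            log.add("cleanup");
        }
    }

    // Roughly what the bytecode does: the cleanup code exists twice.
    static void handExpanded(List<String> log, boolean fail) {
        try {
            log.add("body");
            if (fail) throw new IllegalStateException("boom");
        } catch (Throwable t) {  // models the catch-all ("any") table entry
            log.add("cleanup");  // copy on the exceptional path
            throw t;             // rethrow after cleaning up
        }
        log.add("cleanup");      // copy on the normal path
    }

    static List<String> trace(boolean expanded, boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            if (expanded) handExpanded(log, fail); else withFinally(log, fail);
        } catch (IllegalStateException e) {
            log.add("propagated");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace(false, false) + " vs " + trace(true, false));
        System.out.println(trace(false, true) + " vs " + trace(true, true));
    }
}
```

Both versions should produce the same trace on the normal path and on the failing path.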
Bonus Exploration: Lambda in Bytecode
list.forEach(x -> System.out.println(x));
A Lambda is not an anonymous inner class. At the bytecode level, the compiler generates an invokedynamic instruction, and at runtime LambdaMetafactory creates an instance of the functional interface.
// Decompiled (javap -v)
0: aload_0
1: invokedynamic #7, 0 // InvokeDynamic #0:accept:()Ljava/util/function/Consumer;
invokedynamic was introduced in Java 7 and found its first mainstream use in Java 8 with Lambdas. The reason: invokedynamic lets the call target be bound at runtime. The call site is linked the first time it executes, and later executions reuse that linkage; a Lambda that captures no variables can typically reuse a single instance as well. This usually costs less than loading a separate anonymous class for each Lambda.
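The claim that a Lambda is not an anonymous inner class can be checked with reflection. In this sketch (class and method names are hypothetical), the same Consumer is written both ways; compiling it should leave a LambdaVsAnonymous$1.class file on disk for the anonymous class but no class file for the Lambda, whose implementing class is created at runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical comparison; names are illustrative only.
class LambdaVsAnonymous {

    static Consumer<String> viaLambda(List<String> sink) {
        return x -> sink.add(x); // compiled to invokedynamic
    }

    static Consumer<String> viaAnonymousClass(final List<String> sink) {
        return new Consumer<String>() { // compiled to new + invokespecial
            @Override
            public void accept(String x) {
                sink.add(x);
            }
        };
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        Consumer<String> a = viaLambda(sink);
        Consumer<String> b = viaAnonymousClass(sink);
        a.accept("x");
        b.accept("y");
        System.out.println(sink);
        System.out.println("lambda is anonymous class:    " + a.getClass().isAnonymousClass());
        System.out.println("anonymous is anonymous class: " + b.getClass().isAnonymousClass());
    }
}
```

Both objects behave identically as Consumers; only the way they come into existence differs.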
Common Pitfalls
- "Reading bytecode = learning assembly": The JVM instruction set has about 200 instructions (~30 commonly used), far fewer than x86 (thousands).
- "String + operation always creates objects": Constant expressions are folded at compile time, and other concatenations are compiled to StringBuilder calls (Java 8 and earlier) or to an invokedynamic call backed by StringConcatFactory (Java 9 and later). In-loop concatenation is still a trap, because every pass builds a new String.
- "All exceptions need bytecode checking": No, it's only for understanding finally duplication. You don't need to memorize the instruction table, just know how to read javap output.
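The in-loop concatenation trap from the list above looks like this in practice. The sketch is illustrative only (the class LoopConcat and its methods are not from the chapter): both methods return the same text, but the first allocates a new String on every pass and copies everything accumulated so far, while the second appends into one buffer.

```java
// Hypothetical example of the loop-concatenation pitfall.
class LoopConcat {

    // Each += builds a brand-new String, so total copying grows
    // quadratically with the number of iterations.
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // One StringBuilder reused across the whole loop.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus(5));
        System.out.println(withBuilder(5));
    }
}
```

Decompiling withPlus with javap -c shows the concatenation code sitting inside the loop body, which is the part the compiler cannot hoist out for you.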
Pass Challenges
- Warm-up: Write a simple Java class (just an add method), decompile with javap -c, and explain the meaning of each line.
- Hands-on: Write a method with try-catch-finally, decompile it, and verify how many times finally was duplicated.
- Observe: Compare the bytecode of String s = "a" + "b" vs String s = "a"; String t = s + "b". Can you see under what conditions the compiler does compile-time folding?
Traveler's Notes
Bytecode is the Java compiler's "tell-it-like-it-is"—if you don't ask, it pretends the syntactic sugar is real. Once you javap, it reveals the truth about generic erasure, string folding, and finally duplication.
→ Next Stop Preview
You can finally see Java's cards laid on the table. But there's a longer road ahead—some languages aren't content with just using what the compiler gives them; they want to do the compiler's work themselves. The sixth door: Metaprogramming & DSL—code that writes code.