Software Systems Atlas

Metadata Card

Prerequisites: Basic terminal operations (Ch2), Package Manager (Ch5)
Estimated time: 40 min
Core difficulty:
Completion marker: Can read a stack trace, locate problems through logs, and write minimal reproducible code

Your Progress

You're still in the workshop before setting out. More tools installed, more code written — and then you encounter your first error that leaves you helpless. Red text fills the screen for pages, and you don't even know where to start looking. The workshop master walks over, glances at the screen:

Your Task

Your program crashed. You think "the server is unreachable," but the server says "authentication failed," so you go check authentication and find it was actually "config file read wrong" — and finally, yesterday you changed application.properties and forgot to add a comma. This kind of "problem chain" is extremely common in debugging. What you need isn't clairvoyance — it's a systematic investigation method: reading stack traces, searching for error info, checking log files, writing minimal reproductions. This is what this chapter gives you — a wilderness survival manual.

Chapter Layers
Required reading: Reading stack traces, golden rules for searching error info, adding logs to critical paths, writing minimal reproducible examples
Optional reading: Log level usage scenarios (TRACE/DEBUG/INFO/WARN/ERROR), log file rotation configuration
Advanced: logback.xml detailed configuration, production log path troubleshooting, log collection pipeline (ELK)
This chapter will NOT require you to understand
Complete logback XML configuration syntax — default library config is enough
Production log rotation strategies and file path troubleshooting — learn when your project goes live
Using assertions (assert) — a development-time inspection tool, not a replacement for logging

The Breakthrough · Tracing the Origins

Scenario: The Program Crashed

Your program was running fine. You added a line of code, it compiled successfully, and you confidently ran it.

Then the screen exploded with red — dozens of lines of English letters, mixed with numbers and strange symbols, scrolling from the bottom of the screen to the top, like a stone rolling off a cliff covered in incantations.

You're completely baffled. These words piled up look like a foreign language — no, more like an encrypted foreign language. You search through them, but every word seems unintelligible: "Exception," "Thread," "NullPointer," "at" — you recognize these words, but strung together they make no sense.

You look out the window at the workshop — everything seems so foreign.

"What's all this red?" you ask the workshop master. He walks over, glances at the screen, and says: "That's the program's last words. Learn to read them, and you'll know where it died and how."

You wrote a small piece of Java code. It compiled fine, but at runtime it threw this error:

Exception in thread "main" java.lang.NullPointerException
 at com.example.Main.processData(Main.java:12)
 at com.example.Main.main(Main.java:6)

What's your first reaction?

Most people's first reaction is: "Where did I go wrong?" — but that question is too vague. Let's pause and learn to read the error message itself.

First Move: Reading the Stack Trace

"Don't panic first." The workshop master walks over, finger pointing at the top of the screen. "Read from the top — not the bottom."

"From the top?" you stare at the red blob, confused.

He points to the first line: Exception in thread "main" java.lang.NullPointerException. "This line tells you two things: the thread that crashed is called main, and the error type is NullPointerException. NullPointer means you tried to use something that doesn't exist."

He points to the lines below. "Each remaining line is a footprint — telling you how your program got to this point."

Every line in the error message is a clue.

Exception in thread "main" ← which thread crashed
java.lang.NullPointerException ← what type of error
 at com.example.Main.processData(Main.java:12) ← where it crashed (file:line)
 at com.example.Main.main(Main.java:6) ← who called it (call chain)

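How to read it: Start with the first line to get the error type, then read the "at" lines downward. The topmost "at" line is where the crash happened; each line below it is the caller of the line above. Reading the "at" lines from bottom to top gives the order in which the calls were made.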

First look at the exception type: NullPointerException → some object was null but you called a method on it
Look at the first specific location: Main.java:12 → open the file and check line 12
Look at the call chain traceback: who called processData at line 6

The workshop master points at the first "at" line on the screen, at com.example.Main.processData(Main.java:12). "See? It tells you which file and which line it crashed on. Open it and have a look."

Suppose your code is:

java

package com.example;

public class Main {
    public static void main(String[] args) {
        String input = null;
        processData(input); // Line 6: called processData
    }

    // Prints the length of the given string.
    public static void processData(String data) {
        // Calling a method on a null reference throws NullPointerException.
        int length = data.length(); // Line 12: data is null!
        System.out.println(length);
    }
}

Line 12 data.length() — data is null, calling .length() crashes it. Solution: ensure data is not null, or check before calling.

The workshop master leans his arm on the table. "You've learned the stack trace format. But when you use Python or JavaScript, error messages will wear different disguises — same structure, different clothes."

For readers using different languages:

Python's stack trace looks like this:

python

Traceback (most recent call last):
  File "main.py", line 6, in <module>
    process_data(input_val)
  File "main.py", line 3, in process_data
    length = len(data)
TypeError: object of type 'NoneType' has no len()

JavaScript (Node.js) stack trace looks like this:

TypeError: Cannot read properties of null (reading 'length')
 at processData (/app/main.js:7:18)
 at Object.<anonymous> (/app/main.js:3:1)

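See? Different format, same ingredients: error type, location, call chain. Only the order varies. Java and JavaScript print the error type first with the crash site directly below it, while Python prints the oldest call first and puts the crash site and error type at the very bottom (hence "most recent call last"). Once you know where each piece sits, reading a stack trace works the same way in any language.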
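The three ingredients can also be pulled out of an exception programmatically, which makes the structure concrete. The following is a minimal Python sketch using the standard traceback module; the function names process_data and main are made up for this illustration and simply mirror the traceback shown earlier.

```python
import traceback


def process_data(data):
    # len() raises TypeError when data is None
    return len(data)


def main():
    input_val = None
    return process_data(input_val)


try:
    main()
except TypeError as exc:
    # extract_tb lists frames oldest first, the same order Python prints them
    frames = traceback.extract_tb(exc.__traceback__)
    error_type = type(exc).__name__            # what type of error
    call_chain = [f.name for f in frames]      # who called whom
    crash_site = frames[-1]                    # where it crashed
    print("error type:", error_type)
    print("call chain:", " -> ".join(call_chain))
    print("crash site:", crash_site.name, "line", crash_site.lineno)
```

The last frame in the list is the crash site, which matches the position of the final File line in a printed Python traceback.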

Second Move: Searching Error Information

You've learned to read stack traces — found the broken line, fixed the first bug.

But a second error appears, and this time you completely don't understand it:

org.postgresql.util.PSQLException:
 ERROR: duplicate key value violates unique constraint "users_pkey"

"duplicate... duplicate..." you mutter. "Copy? What got copied?" You try to guess its meaning but go in the wrong direction and waste half an hour.

"Here you go again." The workshop master sighs. "Not every problem needs you to guess. Others have walked this path too, and they wrote the answers on the web."

Can't understand it? Don't panic. Copy the error type + key information and search:

duplicate key value violates unique constraint "users_pkey"

Search results will tell you: you tried to insert a record into the database, but the primary key already exists. Now you know to check: did you insert twice? Did you not deduplicate? Is the primary key generation logic broken?

Golden Rules for Searching Errors:

Copy the actual error text (the exception type plus the key message); don't just search for "it doesn't work"
Remove project-specific parts (package names like com.example, class names like MyApp, server IPs), keep the exception type and general description
Add language/framework tags → e.g., "NullPointerException Java" or "duplicate key PostgreSQL Spring Boot"
Prioritize Stack Overflow, GitHub Issues, official documentation — the community has likely already solved the same problem you're facing
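The rules above can be sketched as a small helper that turns a raw error message into a search query. This is only an illustration under stated assumptions: the function name build_search_query and the three regular expressions are hypothetical choices, and what counts as "project-specific" is a judgment call (here, quoted identifiers such as constraint names are dropped).

```python
import re


def build_search_query(error_text, tags=()):
    """Turn a raw error message into a search-friendly query."""
    query = error_text
    # Drop IPv4 addresses (server-specific)
    query = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "", query)
    # Drop quoted identifiers such as constraint or table names
    query = re.sub(r'"[^"]*"', "", query)
    # Shorten dotted package paths to the final class name
    query = re.sub(r"\b(?:[a-z_]\w*\.)+([A-Z]\w*)", r"\1", query)
    # Collapse leftover whitespace, then append language/framework tags
    query = " ".join(query.split())
    return " ".join([query, *tags]).strip()


raw = ('org.postgresql.util.PSQLException: ERROR: duplicate key value '
       'violates unique constraint "users_pkey"')
print(build_search_query(raw, tags=("PostgreSQL",)))
# prints: PSQLException: ERROR: duplicate key value violates unique constraint PostgreSQL
```

If a search on the generalized query returns too little, adding back one specific detail at a time (for example the constraint name) is a reasonable next step.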

A Stack Overflow search result usually looks like this:

Q: PSQLException: duplicate key value violates unique constraint
A: You're trying to insert a row with a primary key that already exists. Check if you're inserting twice or if your sequence generator is out of sync.

Focus on understanding the root cause rather than directly copy-pasting the solution. The same error can have a hundred different causes, and your situation might not be the same as someone else's.

Third Move: Reading Log Files

You can read stack traces and search for error info. But there's one problem — programs don't just crash when you're watching the terminal. What matters is: when nobody is watching the program, what went wrong?

"Your program crashed last night," your colleague says. "But you weren't watching the terminal — you were asleep. So how do you know what happened?"

You freeze. "I... I could keep the program running in the terminal?"

"That's not the solution." The workshop master shakes his head. "The program needs to record itself — like a ship's log, writing down every major and minor event. Even when nobody's watching, the record stays."

Real production programs don't throw errors directly at your terminal. They record all events during runtime — including errors — into log files.

You've deployed your Java web app, and a user says "the system reported an error," but it works fine locally. What do you do? First, find the logs:

bash

# Common log locations
/var/log/myapp/application.log
/var/log/myapp/error.log
~/logs/app.log

Each log line usually has a standard format:

2026-06-23 14:32:15.123 INFO [main] com.example.WorkshopService - Starting to process workpiece #1024
2026-06-23 14:32:15.456 ERROR [main] com.example.WorkshopService - Failed to process workpiece: material shortage
java.lang.RuntimeException: material shortage
 at com.example.MaterialService.checkStock(MaterialService.java:45)
 at com.example.WorkshopService.forgeItem(WorkshopService.java:78)

Look at what this log line tells us:

Info	Example
Timestamp	`2026-06-23 14:32:15.456`
Log level	`ERROR`
Thread	`[main]`
Source	`com.example.WorkshopService`
Message	`Failed to process workpiece: material shortage`
Stack trace	The exception stack trace that follows
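To make the field breakdown concrete, here is a minimal Python sketch that splits a log line of this shape into the fields from the table. The regular expression is an assumption fitted to the sample lines in this chapter; real applications may use a different layout, so treat it as a starting point rather than a general parser.

```python
import re

# Assumed layout: timestamp LEVEL [thread] source - message
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<source>\S+) - "
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of a log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


sample = ("2026-06-23 14:32:15.456 ERROR [main] com.example.WorkshopService"
          " - Failed to process workpiece: material shortage")
fields = parse_log_line(sample)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Stack trace lines do not start with a timestamp, so parse_log_line returns None for them; they belong to the log entry printed just before.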

Log levels help you filter quickly:

bash

# Only search for ERROR
grep "ERROR" application.log

# Search for errors in a specific time period
grep "2026-06-23 14:" application.log | grep "ERROR"

# Search for something related to a specific workpiece
grep "workpiece #1024" application.log
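One limitation of filtering line by line is that the stack trace lines following an ERROR entry do not contain the word ERROR, so a plain filter drops them. The sketch below, in Python, keeps each ERROR entry together with the lines that follow it. It assumes every new entry starts with a timestamp in the format shown earlier; the function name error_entries and the final sample line (workpiece #1025) are made up for illustration.

```python
import re

# A new log entry starts with a timestamp followed by the level
ENTRY_START = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} (?P<level>[A-Z]+) "
)


def error_entries(lines):
    """Return each ERROR entry as a list: the entry line plus its stack trace lines."""
    entries = []
    current = None
    for line in lines:
        start = ENTRY_START.match(line)
        if start:
            # A new entry begins; only track it if it is an ERROR
            current = [line] if start.group("level") == "ERROR" else None
            if current is not None:
                entries.append(current)
        elif current is not None:
            # Continuation line (stack trace) of the ERROR entry being tracked
            current.append(line)
    return entries


log_lines = [
    "2026-06-23 14:32:15.123 INFO [main] com.example.WorkshopService - Starting to process workpiece #1024",
    "2026-06-23 14:32:15.456 ERROR [main] com.example.WorkshopService - Failed to process workpiece: material shortage",
    "java.lang.RuntimeException: material shortage",
    " at com.example.MaterialService.checkStock(MaterialService.java:45)",
    " at com.example.WorkshopService.forgeItem(WorkshopService.java:78)",
    "2026-06-23 14:32:16.000 INFO [main] com.example.WorkshopService - Starting to process workpiece #1025",
]
entries = error_entries(log_lines)
for entry in entries:
    print("\n".join(entry))
```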

"Is that enough?" the workshop master asks. You nod — you can read error reports now. But then he says: "But there's one more thing: you can't wait for a bug to come before checking logs. You need to install a 'black box' while writing your code."

He pats your shoulder: "Add logs to critical paths. When problems arise later, you'll check logs instead of code."

Try It: Add Logging to Your Program

Remember the logging library we installed in the last chapter? Now use it.

python

# Python + loguru
from loguru import logger

def forge_item(item_id):
    logger.info("Starting to forge workpiece #{}", item_id)
    try:
        # heat_and_hammer is the actual forging step, defined elsewhere
        result = heat_and_hammer(item_id)
        logger.info("Workpiece #{} forged successfully", item_id)
        return result
    except Exception as e:
        logger.error("Workpiece #{} forging failed: {}", item_id, str(e))
        # logger.exception logs at ERROR level and appends the full traceback
        logger.exception("Full exception details")
        raise

You've set up a logging framework in Python. But what if you're using Java? Same thing, different syntax:

java

// Java + SLF4J
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ForgeService {
    private static final Logger log = LoggerFactory.getLogger(ForgeService.class);

    public void forgeItem(String workpieceId) {
        log.info("Starting to forge workpiece {}", workpieceId);
        try {
            // heatAndHammer is the actual forging step, defined elsewhere
            heatAndHammer(workpieceId);
            log.info("Workpiece {} forged successfully", workpieceId);
        } catch (Exception e) {
            // Passing the exception as the last argument logs its stack trace
            log.error("Workpiece {} forging failed", workpieceId, e);
        }
    }
}

Adding logs isn't like writing comments — it's installing a "black box" for your program. When something goes wrong, the log is your only witness.
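To see the "black box" actually write to a file, here is a self-contained Python sketch. It swaps in the standard library logging module instead of loguru so that it runs without installing anything. The logger name forge_service, the temporary log path, and the always-failing heat_and_hammer stub are assumptions made for this illustration.

```python
import logging
import os
import tempfile


def build_logger(log_path):
    """Create a logger that writes timestamped lines to log_path."""
    logger = logging.getLogger("forge_service")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s [%(threadName)s] %(name)s - %(message)s"))
    logger.addHandler(handler)
    return logger, handler


def heat_and_hammer(item_id):
    # Stub that always fails, so the error path is exercised
    raise RuntimeError("material shortage")


def forge_item(logger, item_id):
    logger.info("Starting to forge workpiece #%s", item_id)
    try:
        return heat_and_hammer(item_id)
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback
        logger.exception("Workpiece #%s forging failed", item_id)
        raise


log_path = os.path.join(tempfile.mkdtemp(), "app.log")
logger, handler = build_logger(log_path)
try:
    forge_item(logger, 1024)
except RuntimeError:
    pass  # the caller decides what to do; the log already has the details
handler.close()

with open(log_path, encoding="utf-8") as f:
    log_text = f.read()
print(log_text)
```

Even though the exception is caught and the terminal stays quiet, the file holds the INFO line, the ERROR line, and the full traceback.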

Fourth Move: Writing a Minimal Reproducible Example

You found the bug — but you can't fix it yourself. You need to consult a more experienced craftsman in the workshop.

You open a chat window and start typing: "My program reported an error. I have a Spring Boot application connected to a PostgreSQL database, using the HikariCP connection pool, and when I call UserService, it throws a NullPointerException..."

You send a long paragraph of text, along with five hundred lines of code.

The other person replies with two words: "Too much."

You feel a bit wronged — but they're right. You've included too much irrelevant information. The code has database connections, configuration loading, routing, caching... but the bug might be in just one line of code.

Writing a minimal reproduction is one of the most underestimated skills in debugging. You encountered a bug, but the real code is too long and complex to send directly to someone else. You need a standalone piece of code, as short as possible, that reproduces the bug.

Don't ask questions like this:

My Spring Boot app throws NullPointerException, can anyone help? Attached 500 lines of code.

Ask like this:

I have a piece of code where user.getName() throws a NullPointerException in a specific scenario. Here's a minimal reproduction:
java
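public class Main {
    static class User {
        String getName() { return "Ada"; }
    }

    // Stands in for the real lookup, which returns null in the failing scenario
    static User getUser() { return null; }

    public static void main(String[] args) {
        User user = getUser(); // returns null
        System.out.println(user.getName()); // throws NullPointerException
    }
}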

Principles of a minimal reproduction:

Remove all irrelevant code — delete everything not related to the bug
Use hardcoded data — don't read from databases or APIs, hardcode it in the code
Ensure it's independently runnable — someone else can copy-paste and run it
Include the complete error info — what error did you get?

The process of writing a minimal reproduction is itself debugging. Many times, you'll find the bug yourself halfway through writing it. This is common enough that programmers have a name for a closely related habit, rubber duck debugging: articulating your problem clearly is often how you solve it.
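One mechanical way to apply the first principle is to delete one line at a time and keep the deletion whenever the bug still reproduces. The Python sketch below illustrates that loop on a toy program stored as a list of source lines. The names minimize and still_fails, and the toy program itself, are hypothetical; using exec here is only for demonstration on trusted snippets.

```python
def minimize(lines, still_fails):
    """Repeatedly drop single lines while still_fails(lines) stays true."""
    kept = list(lines)
    changed = True
    while changed:  # repeat passes: removing one line can make another removable
        changed = False
        i = 0
        while i < len(kept):
            candidate = kept[:i] + kept[i + 1:]
            if still_fails(candidate):
                kept = candidate      # line i was irrelevant to the bug
                changed = True
            else:
                i += 1                # line i is needed to reproduce; keep it
    return kept


def still_fails(lines):
    """True only if running the snippet raises the TypeError being chased."""
    try:
        exec("\n".join(lines), {})
    except TypeError:
        return True
    except Exception:
        return False  # a different error means this is no longer the same bug
    return False


program = [
    "import math",
    "radius = 2",
    "area = math.pi * radius ** 2",
    "user = None",
    "label = 'workpiece'",
    "name_length = len(user)",
]
minimal = minimize(program, still_fails)
print("\n".join(minimal))
```

Checking for the same error type, not just any failure, matters: deleting a line that defines a variable produces a new error, which would otherwise be mistaken for the original bug.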

Advanced Adventure

Log Level Usage Guide

More detail is not always better: with too little logging you can't find anything, and with too much the key information gets drowned. The standard log levels, from lowest to highest severity:

Level	When to Use	Example
TRACE	Almost never, only for extreme debugging	Entering/exiting each iteration of a loop
DEBUG	Development-stage information	SQL statements, API request parameters
INFO	Normal milestone events	Service startup, workpiece registration, scheduled task execution
WARN	Non-fatal but notable situations	Missing config item (using default), retry, degradation
ERROR	Errors requiring human intervention	Database unreachable, payment failure, file not found

During development: Set log level to DEBUG or even TRACE for the most detailed info.
Production: Set to INFO or WARN to avoid performance overhead and excessive log volume.
When troubleshooting: Temporarily drop a specific package's level to DEBUG, then change it back after locating the issue.
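The effect of a default level plus a per-package override can be observed directly. The following Python sketch uses the standard library logging module; the logger names workshop, workshop.forge and workshop.storage, and the in-memory ListHandler, are assumptions made so the result is easy to inspect.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so the filtering is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(
            f"{record.levelname} {record.name} {record.getMessage()}")


handler = ListHandler()
workshop = logging.getLogger("workshop")
workshop.setLevel(logging.INFO)       # workshop-wide default
workshop.addHandler(handler)
workshop.propagate = False

forge = logging.getLogger("workshop.forge")
forge.setLevel(logging.DEBUG)         # temporary override for one area
storage = logging.getLogger("workshop.storage")  # inherits INFO

forge.debug("heating to 900 degrees")
storage.debug("counting ingots")      # dropped: below INFO
storage.info("inventory synced")
storage.warning("ingots running low")

for line in handler.lines:
    print(line)
```

Only the forge area's DEBUG message gets through; the storage area's DEBUG message is filtered out by the inherited INFO level.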

Advanced: logback.xml Configuration (come back when your project goes live)
During development you need enough information, but don't want to be flooded with trivial details. Imagine a large board on the workshop wall showing the detail level for different areas — "Forge area to finest detail, storage area just a summary." In Java, logback.xml does this:
xml

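<configuration>
  <!-- Console appender (added here so the levels below have somewhere to write) -->
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder><pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%thread] %logger - %msg%n</pattern></encoder>
  </appender>
  <!-- Workshop-wide: only record important events -->
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
  <!-- Forge area: record every step -->
  <logger name="com.example.WorkshopService" level="DEBUG"/>
</configuration>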
This is like saying: "Workshop-wide, only record important events, but record every step in the forge area."

Common Pitfalls

Don't Use `System.out.println`

If your code still has System.out.println("made it here") — stop. A few problems:

It only outputs to stdout, not into log files
Nobody reads stdout in production
You don't want to see workpiece info printed on the console in production

Rule: Use a logging library; don't leave System.out.println or print() calls in your code for debugging.

Advanced: Production Log Path Issues (things you need to know after going live — just understand for now)
Lesson: Confirm the log file location
You wrote an API. Local testing is perfect. Deployed to the server, user says it's broken. You log in to check the logs — no logs. After searching for half a day, you find: your log config wrote to a different path. Locally, logs go to ./logs/app.log, but in Docker deployment the working directory changes, so they go to /app/current/logs/app.log — but you've been looking at /var/log/app.log.
bash
# First confirm which directory the program is in
ps aux | grep myapp
# Find logs in the corresponding directory
cd /usr/local/myapp/
find . -name "*.log" -type f
Lesson: Log File Rotation
After three days of running, the log file has grown to 2GB. You log in and cat application.log — the terminal freezes. Log files need rotation configured — each day's logs go to a new file, old ones are compressed and archived. This is essential for production, but configure it when you need it.
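To show what rotation looks like in practice, here is a minimal Python sketch using the standard library's RotatingFileHandler. Note the swap: it rotates by file size rather than by day, so the effect is visible immediately; for the daily rotation described above, the standard library also provides TimedRotatingFileHandler. The tiny 200-byte limit, the logger name rotation_demo, and the temporary directory are assumptions chosen only for demonstration.

```python
import logging
import logging.handlers
import os
import tempfile

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "application.log")

# Start a new file before the current one reaches 200 bytes; keep 3 old files
handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=200, backupCount=3, encoding="utf-8")
logger = logging.getLogger("rotation_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

for i in range(50):
    logger.info("Processed workpiece #%d", i)
handler.close()

files = sorted(os.listdir(log_dir))
print(files)
# prints: ['application.log', 'application.log.1', 'application.log.2', 'application.log.3']
```

The newest entries are always in application.log; older files beyond the backup count are deleted, which is what keeps disk usage bounded.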

Final Challenge

Warm-up (5 min, required): Intentionally write code that throws an exception/error (like accessing an array out of bounds, using a null object), observe each line of the stack trace. Try to find the bug's location from the stack trace.
Challenge (30 min, optional): Add a logging library (loguru/slf4j/winston from last chapter) to an existing project. Add logs to critical paths (startup, requests, errors), then intentionally create a bug and find it from the logs.
Observe: Write code in two different languages (e.g., Python and Java) that throw the same error (like dividing by zero), and observe the differences and commonalities in stack trace format.
Troubleshooting: You start a Java service and get this error:

Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/JsonFactory

You clearly have the Jackson dependency in pom.xml. Where's the problem? Hint: was it mvn compile or mvn package? Can the dependency be found at runtime? (Answer: the jackson-core transitive dependency is missing from the runtime classpath, or the build artifact doesn't include its dependencies. Solution: run mvn dependency:tree to check what is resolved, or configure the Maven Shade plugin to package a fat jar.)

Checklist

After this chapter, you should be able to:

Read stack traces in any language, find bugs by looking at error type and location
Effectively search using keywords + error information
Correctly configure and write to log files in at least one of the three languages
Use grep, tail, less etc. to view and filter logs
Write minimal reproductions to isolate and report a bug
Understand why System.out.println can't replace logging

Common Sticking Points

"Can't understand the stack trace": Don't read the whole thing at once. Start with the exception type, then scan the "at" lines for the first one that comes from your own code rather than a third-party library, such as a line starting with at com.yourpackage.xxx. That's your entry point.
"No search results": Remove specific numbers, user IDs, and timestamps; keep only the error type and key description. Searching in English usually yields far more results than searching in other languages.
"Log file too big to open": Don't use cat. Use tail -n 100 app.log (last 100 lines) or less app.log (paged browsing), or grep ERROR app.log to filter.
"I fixed the error but don't know the cause": Go back to the logs, find the context around the error. If still can't tell, add more logs next time and try again. Observability isn't configured once — it's iteratively improved.
"Can't write a minimal reproduction": Start from the buggy code and delete lines one by one. Run after each deletion, until you can't delete any more and still reproduce. You might find the bug yourself in this process.
Two classic error types:
NullPointerException / NoneType has no ...: One of the most common errors in almost every language. You tried to use something that wasn't there
ClassNotFoundException / ModuleNotFoundError: Your code imported something but it can't be found at runtime — usually a dependency installation issue
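For the second error type, a quick way to confirm a dependency problem is to ask the running interpreter whether it can find the module at all. The Python sketch below uses importlib.util.find_spec from the standard library; the helper name check_dependencies and the module name definitely_not_installed_pkg are made up for illustration, and the check is limited to top-level module names.

```python
import importlib.util
import sys


def check_dependencies(module_names):
    """Return the top-level module names the running interpreter cannot find."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# json ships with Python; the second name is deliberately nonexistent
missing = check_dependencies(["json", "definitely_not_installed_pkg"])
print("interpreter:", sys.executable)
print("missing:", missing)
```

Printing sys.executable alongside the result helps catch a frequent cause: the package was installed into a different Python environment than the one running the program.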

No Need to Understand Now

Remote Debugging: Setting breakpoints in production — this is an advanced skill; logs are enough for now
APM tools (Datadog, New Relic, SkyWalking): Distributed tracing and performance monitoring, only needed for large team projects
Structured logs (JSON format logs): For log processing tools; human-readable text logs are fine at this stage
Log collection pipeline (Filebeat → Logstash → Elasticsearch): The full ELK stack — that's an ops concern; for now, just know "logs write to files"
Assertions: Development-time inspection tools. Don't rely on them in production, and don't use them for flow control or as a substitute for logging

Traveler's Notes

Programs will always have errors — but you can read stack traces, search for error info, check logs, and write minimal reproductions. These skills outlast any single programming language. Every debugging session is a process of "narrowing down the suspect range," and eventually you'll find that one comma, that one null, that one typo. It's not talent — it's a systematic method that can be practiced and optimized.

→ Preview of Next Stop

You've learned to read logs and errors. But sometimes, the program isn't running on your machine — it's inside a Docker container. You need a way to package your entire project so it runs identically on any machine. Next stop: Docker.
