Chapter 3: TCP Deep Dive
Metadata Card
| Field | Value |
|---|---|
| Difficulty | Hardcore |
| Prerequisites | Vol 4 Chapter 2 (Transport Layer Introduction), basic socket programming |
| Keywords | TCP header, three-way handshake, four-way wave, TIME_WAIT, sequence number, sliding window, congestion control, state machine |
| Core Skills | Read every bit of a TCP header; draw handshake/wave timing diagrams; understand flow and congestion control as "walking on two legs" |
Your Progress
"Before you lies the most critical segment of the post road — TCP (Transmission Confirmation Spell). It guarantees that every spell message you send arrives in order. Three mana handshakes, sliding scroll window, mana flow control — TCP transforms your connection from unreliable to reliable."
TCP is the "reliable transmission hub" of the entire post road. In the previous two chapters, you mastered the seven-layer beacon tower and the three elements (encapsulation/decapsulation/multiplexing). Now you'll crawl into TCP's incantation brain — understanding every field, every conversation, every retransmission.
After this chapter, you won't "write" TCP (that's the post road guardian array's job), but you'll be able to read every transmission flow in your mana telescope, and when debugging "slowness", you won't have to guess blindly.
Chapter Layering
- Must Read (TCP Basics): Connection establishment (three-way handshake), reliable transmission (sequence number/acknowledgment number/retransmission), connection teardown (four-way wave/TIME_WAIT), observing TCP states with `ss`
- Advanced (TCP Advanced): Sliding window/flow control, congestion control overview (specific algorithms covered in ch10), TCP state machine
- Advanced: Scapy manual three-way handshake, window scale factor, TCP options details, fast recovery details
This chapter will NOT require you to master
- Scapy-forged TCP packets for port scanning
- Formula derivation of congestion control algorithms (covered in ch10)
- SACK/Selective Acknowledgment binary format
Your Task
This task follows a TCP Basics and TCP Advanced two-step approach:
TCP Basics (Must Read):
- Understand the complete process of three-way handshake and four-way wave
- Sequence number and acknowledgment number: not packet number, but byte stream coordinates
- Use `ss` and `lsof` to observe real-time TCP states
- TCP header: understand a TCP conversation through 20 bytes
TCP Advanced (Optional):
- Sliding window and flow control: how sender and receiver coordinate speed
- Congestion control overview (detailed algorithm in ch10)
- TCP state machine: hand-drawn transition diagram of 11 states
The specific algorithms of congestion control (Cubic/BBR) are systematically covered in Chapter 10 (Congestion Control Special Topic). This chapter only provides an overview.
Breakthrough · Trace Back
3.1 TCP Header: Understand a Conversation Through 20 Bytes
Imagine you, as a post road inspector, have caught a spell message flying through the air. The message's surface is engraved with a rune sequence:
00 50 0e 8c 0a 02 23 4c 00 00 00 00 a0 02 ff d7
00 00 00 00 02 04 05 b4
As the post road administrator, you need to read everything from the first 20 bytes, the fixed header: who sent it, who it's for, how much was sent, how much the receiver has received. (The last four bytes shown, 02 04 05 b4, already belong to the options area: an MSS option announcing 1460 bytes.) It's like reading a complete letter from a string of codes.
Overwhelmed? Don't worry. Break it open.
Fixed Header (20 bytes)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Offset| Res.|N|C|E|U|A|P|R|S|F|          Window Size          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |        Urgent Pointer         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Options (Variable)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Field-by-field breakdown:
| Field | Length | Meaning |
|---|---|---|
| Source Port / Destination Port | 16 bits each | Basis for multiplexing/demultiplexing. Socket identifier = (src_ip, src_port, dst_ip, dst_port, protocol) |
| Sequence Number (SEQ) | 32 bits | Byte stream coordinate: the sequence number of the first data byte carried in this segment |
| Acknowledgment Number (ACK) | 32 bits | Sequence number of the next byte the receiver expects; it implies every earlier byte has been received |
| Data Offset | 4 bits | Header length (in 4-byte units). Min 5 (20B), Max 15 (60B) |
| Flags | 9 bits | NS CWR ECE URG ACK PSH RST SYN FIN |
| Window Size | 16 bits | How many more bytes the receiver can accept (flow control) |
| Checksum | 16 bits | Covers pseudo-header + TCP header + data |
| Urgent Pointer | 16 bits | Only meaningful when URG=1 |
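To make the table concrete, here is a minimal sketch that decodes the captured bytes from the start of this section with Python's standard `struct` module. The names `RAW`, `FLAG_NAMES` and `parse_fixed_header` are made up for this illustration, and the sketch only handles the fixed header plus the first option; it is not a full parser.

```python
import struct

# The 24 bytes shown at the start of this section: the 20-byte fixed
# header followed by the first 4 bytes of the options area.
RAW = bytes.fromhex(
    "00 50 0e 8c 0a 02 23 4c 00 00 00 00 a0 02 ff d7 "
    "00 00 00 00 02 04 05 b4"
)

# Flag names, listed from the least significant bit upward.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR", "NS"]


def parse_fixed_header(raw: bytes) -> dict:
    """Decode the 20-byte fixed TCP header (all fields are big-endian)."""
    (src_port, dst_port, seq, ack,
     offset_and_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", raw[:20]
    )
    flag_bits = offset_and_flags & 0x1FF  # low 9 bits hold the flags
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        # Data Offset (top 4 bits) counts 4-byte words.
        "header_len": (offset_and_flags >> 12) * 4,
        "flags": [n for bit, n in enumerate(FLAG_NAMES) if flag_bits & (1 << bit)],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


header = parse_fixed_header(RAW)
for field, value in header.items():
    print(f"{field:15} {value}")

# First option: kind 2 (MSS), length 4, then a 16-bit MSS value.
kind, length, mss = struct.unpack("!BBH", RAW[20:24])
print(f"option kind={kind} length={length} mss={mss}")
```

Decoded this way, the message is a SYN from port 80 to port 3724. Its Data Offset is 10, so the full header is 40 bytes and the capture shown here stops partway through the options.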
3.2 Three-Way Handshake: Who Speaks First Sends SYN
You're not mistaken: TCP connection establishment is a three-way exchange.
Client                            Server
|                                      |
|------ SYN, SEQ=1000 ---------------->|  ① Client: I want to connect
|                                      |
|<----- SYN+ACK, SEQ=5000, ACK=1001 ---|  ② Server: OK, acknowledging ①
|                                      |
|------ ACK, SEQ=1001, ACK=5001 ------>|  ③ Client: Received your SYN, starting data
|                                      |
|========= Data flow starts here ======|
Why Three Times, Not Two, Not Four?
The fatal problem with a two-way handshake: if client A sends a SYN, network delay causes a retransmission, and the old SYN later arrives at server B, a two-way handshake would make B establish a "dead connection" waiting for data that never comes. The third step of the three-way handshake confirms that A is really alive and that the SYN was not a ghost packet. Four steps are unnecessary because the server's own SYN and its ACK of the client's SYN fit in a single segment (step ②).
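The sequence arithmetic of the three steps can be sketched in a few lines. This is only a model: `three_way_handshake` is a hypothetical helper, segments are plain dicts, and nothing is sent on the wire. The one rule it encodes is that a SYN consumes one sequence number, so each side acknowledges the peer's ISN plus one (modulo 2^32).

```python
MODULUS = 2**32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Return the three handshake segments as plain dicts."""
    syn = {"from": "client", "flags": "SYN",
           "seq": client_isn, "ack": None}
    syn_ack = {"from": "server", "flags": "SYN+ACK",
               "seq": server_isn,
               "ack": (client_isn + 1) % MODULUS}  # SYN counts as one byte
    ack = {"from": "client", "flags": "ACK",
           "seq": (client_isn + 1) % MODULUS,
           "ack": (server_isn + 1) % MODULUS}
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

With the ISNs 1000 and 5000 from the diagram, the helper reproduces ACK=1001 in step ② and SEQ=1001, ACK=5001 in step ③.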
3.3 Sequence Number & Acknowledgment Number: Byte Coordinates, Not Packet Numbers
This is the concept beginners most often confuse about TCP. Many people think TCP counts by "the nth packet sent" — like numbering chapters in a textbook.
But that's not how it works at all. Post road couriers don't accept vague descriptions like "the third letter" — they need precise coordinates.
TCP doesn't count by "packets". It counts by "bytes". Like every stone slab on a post road having a unique number, rather than "the third pile of stones."
Your application layer writes 3000 bytes, TCP splits it into three segments of 1000 bytes each:
Segment ①: seq=1000, Len=1000 → covers bytes 1000-1999
Segment ②: seq=2000, Len=1000 → covers bytes 2000-2999
Segment ③: seq=3000, Len=1000 → covers bytes 3000-3999
Receiver replies: ACK=4000 → meaning "I have received every byte up to 3999, please send 4000 next."
This is why cumulative ACKs work: if segment ① is lost but ② and ③ arrive, the receiver can only reply ACK=1000, because byte 1000 is still the first byte it is missing.
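The cumulative ACK rule is easy to check with a small sketch. `cumulative_ack` is a hypothetical helper that takes the next byte the receiver expects plus the `(seq, length)` pairs that have arrived; it ignores 32-bit wraparound and SACK, which a real stack has to handle.

```python
def cumulative_ack(next_expected: int, segments: list[tuple[int, int]]) -> int:
    """Return the ACK number after the given (seq, length) segments arrived."""
    for seq, length in sorted(segments):
        if seq > next_expected:
            break  # gap in the byte stream: later segments cannot be ACKed yet
        next_expected = max(next_expected, seq + length)
    return next_expected


all_three = [(1000, 1000), (2000, 1000), (3000, 1000)]
first_lost = [(2000, 1000), (3000, 1000)]

print(cumulative_ack(1000, all_three))   # every byte arrived
print(cumulative_ack(1000, first_lost))  # segment 1 is missing
```

The first call yields 4000 and the second stays at 1000, matching the two cases described above.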
Initial Sequence Number (ISN): Why not 0?
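Even the original specification (RFC 793) did not start from 0: it drew the ISN from a clock-driven counter. Modern stacks follow RFC 6528, which adds a keyed hash of the connection's addresses and ports to that clock so the ISN is hard to predict.
- Prevents "ghost segments" from an old connection being mistaken for new data: when two connections end up using the same IP:port combination, distinct ISNs keep the old segments outside the sequence range the new connection expects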
- Security — makes forging RST segments harder (attacker needs to guess the sequence number range)
3.4 Four-Way Wave: A Graceful Farewell
After the connection is established, data flows back and forth. But eventually it's time to say goodbye.
TCP is full-duplex — both sides can talk and listen simultaneously. So saying goodbye is more complex than a one-way street: each side must independently say "I'm done talking" and wait for the other to acknowledge "I understand."
Like two tower stations ending a call, where neither side simply hangs up: A says "I'm done," B says "I understand" (but B may still have things to say), B finishes and says "I'm done too," A says "Got it." Four sentences, and none of them can be skipped.
Client                            Server
|                                      |
|------ FIN, SEQ=4000 ---------------->|  ① Client: I'm done sending
|                                      |
|<----- ACK, SEQ=8000, ACK=4001 -------|  ② Server: Got it (still sending remaining data)
|                                      |
|      One-way data remaining...       |
|                                      |
|<----- FIN, SEQ=9000, ACK=4001 -------|  ③ Server: I'm done too
|                                      |
|------ ACK, SEQ=4001, ACK=9001 ------>|  ④ Client: Received (entering TIME_WAIT)
|                                      |
TIME_WAIT: Why Wait 2MSL Before Saying Goodbye Forever?
After the client sends the final ACK, it enters the TIME_WAIT state and stays there for 2 × MSL (Maximum Segment Lifetime). MSL is typically taken to be between 30 seconds and 2 minutes; Linux fixes the whole TIME_WAIT period at 60 seconds.
Imagine standing in front of a tower, shouting your last words — "I'm leaving!" You can't just turn around and walk away; you need to wait a little, in case the other party didn't hear and asks "What did you say?" — you need to be able to reply.
Two reasons:
- Ensure the server receives the final ACK: if the client's final ACK gets lost on the post road, the server will resend its FIN, and TIME_WAIT keeps the client around to answer with another ACK. The "2" in 2MSL covers one worst-case round trip: up to one MSL for the lost ACK to expire, plus up to one MSL for the server's retransmitted FIN to arrive.
- Let all "ghost segments" of the connection die on the post road: after 2MSL, any delayed segment belonging to this connection (a late letter caused by post road congestion) has been discarded or has timed out. When you use the same port number for a new connection next time, no "ghost letter" from last month can be mistaken for legitimate data.
3.5 Sliding Window & Flow Control: Speed Matching
The receiver has a receive buffer. If the application reads slowly while the sender keeps transmitting quickly, the buffer fills up. Flow control prevents the sender from overwhelming the receiver.
Window Advertisement — "How much more can I accept?"
Every TCP segment header carries a Window Size field (16-bit, max 65535), which is like the other side telling you in real-time "how much space my hall has left":
After sender finishes, 4096 bytes of window remaining... can still send
ACK received, window becomes 8192 bytes... hall cleared, can send again
ACK received but window is 0... hall full! Stop! Wait for the receiver to say "space available"
The receiver's advertised window = remaining receive buffer size. The sender's usable window = min(receiver's advertised window, congestion window). What you can actually send depends on two things: whether the other side can accept (flow control) and whether the post road can bear it (congestion control).
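As a small sketch of that min() rule: the helper name `usable_window` is made up, and subtracting the bytes already in flight is an addition to the formula above (data sent but not yet acknowledged still occupies the window).

```python
def usable_window(rwnd: int, cwnd: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still put on the wire right now.

    rwnd: window advertised by the receiver (flow control)
    cwnd: congestion window kept by the sender (congestion control)
    bytes_in_flight: sent but not yet acknowledged
    """
    return max(0, min(rwnd, cwnd) - bytes_in_flight)


print(usable_window(rwnd=8192, cwnd=4096, bytes_in_flight=1000))  # cwnd is the limit
print(usable_window(rwnd=0, cwnd=4096, bytes_in_flight=0))        # zero window: stop
```

Whichever of the two windows is smaller decides, so a roomy receiver cannot override a congested path and an empty path cannot override a full receiver.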
Window Scaling
The window size field is only 16 bits, max 65535 bytes — far from enough for today's multi-MB buffers.
TCP negotiates the Window Scale option in the SYN and SYN+ACK segments of the three-way handshake. Each side's 16-bit window value is then shifted left by up to 14 bits, pushing the window limit to roughly 1 GiB (65535 × 2¹⁴ bytes).
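The arithmetic behind that limit, as a quick sketch (`scaled_window` is a hypothetical helper, not an operating system API):

```python
MAX_SHIFT = 14  # largest shift the Window Scale option allows


def scaled_window(raw_window: int, shift: int) -> int:
    """Real window in bytes for a 16-bit header value and a negotiated shift."""
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("window field is 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift must be between 0 and 14")
    return raw_window << shift


largest = scaled_window(0xFFFF, MAX_SHIFT)
print(largest, "bytes, about", round(largest / 2**30, 3), "GiB")
```

Because the shift is agreed once during the handshake, every later segment still carries only 16 bits; the receiver of the segment applies the shift itself.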
3.6 Congestion Control Overview: What if the Post Road is Too Crowded?
Flow control handles "whether the other side can accept" — if you send too many letters and the other tower's hall is full, it tells you "slow down."
But congestion control handles a completely different problem: It's not that the other side is full — the entire post road is jammed. Like too many cars on a highway, not your garage being full — the whole road is congested. These two are often confused, but they're fundamentally different:
| | Flow Control | Congestion Control |
|---|---|---|
| What it manages | Receiver buffer overflow | Intermediate router/link overload with packet loss |
| Who drives | Receiver reports window | Sender self-assesses |
| Method | Window advertisement | Slow start, congestion avoidance |
| Packet loss means | Receiver reading too slow? | Network too crowded! Slow down immediately! |
Four algorithms (TCP Reno, the most classic industrial variant):
① Slow Start
Not really "slow". On the contrary — it grows exponentially.
cwnd = 1 (MSS)
Every ACK received, cwnd += 1
→ Doubles every RTT
RTT 1: cwnd = 1
RTT 2: cwnd = 2
RTT 3: cwnd = 4
RTT 4: cwnd = 8
...
Until ssthresh (slow start threshold) or packet loss
Intuition: The sender doesn't know the network's capacity. It uses slow start to quickly probe the upper limit: doubling every RTT takes cwnd from 1 to over a thousand segments in about ten round trips.
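The per-ACK rule and the per-RTT doubling are the same thing seen at two scales, which a short sketch makes visible. `slow_start_rounds` is a hypothetical helper that assumes one ACK per segment, no delayed ACKs and no loss, with cwnd counted in MSS.

```python
def slow_start_rounds(ssthresh: int, initial_cwnd: int = 1) -> list[int]:
    """Return cwnd at the start of each RTT until it reaches ssthresh."""
    cwnd = initial_cwnd
    per_rtt = [cwnd]
    while cwnd < ssthresh:
        acks_this_rtt = cwnd  # one ACK comes back for every segment sent
        for _ in range(acks_this_rtt):
            if cwnd >= ssthresh:
                break  # threshold reached: slow start ends here
            cwnd += 1  # the per-ACK rule
        per_rtt.append(cwnd)
    return per_rtt


print(slow_start_rounds(ssthresh=16))
print(len(slow_start_rounds(ssthresh=1024)) - 1, "round trips to reach 1024")
```

Adding 1 per ACK looks modest, but since the number of ACKs per round equals cwnd, the window doubles each round.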
② Congestion Avoidance
When cwnd >= ssthresh, switch to linear growth:
After receiving all ACKs for an RTT, cwnd += 1
→ Additive Increase (the AI part of AIMD)
③ Fast Retransmit
When the sender receives three duplicate ACKs (for the same sequence number), it determines that segment is lost, doesn't wait for timeout, and immediately retransmits.
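A simplified sketch of the duplicate-ACK counter follows. `fast_retransmit_triggers` is a hypothetical helper that looks only at ACK numbers; a real stack also checks that the ACK carries no data and does not change the window before calling it a duplicate.

```python
def fast_retransmit_triggers(acks: list[int], threshold: int = 3) -> list[int]:
    """Return the sequence numbers a sender would fast-retransmit."""
    retransmit = []
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                retransmit.append(ack)  # resend the segment starting here
        else:
            last_ack = ack  # new data acknowledged: reset the counter
            duplicates = 0
    return retransmit


# The segment starting at byte 2000 is lost; later segments each
# provoke another ACK=2000 from the receiver.
print(fast_retransmit_triggers([1000, 2000, 2000, 2000, 2000, 5000]))
```

Note that the first ACK=2000 is an ordinary acknowledgment; only the three repeats after it count as duplicates.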
④ Fast Recovery
After fast retransmit, instead of going back to slow start, enter fast recovery:
ssthresh = cwnd / 2
cwnd = ssthresh + 3 (because three duplicate ACKs already confirmed some data)
For each duplicate ACK, cwnd += 1
When a new ACK arrives, cwnd = ssthresh → enter congestion avoidance
Modern Congestion Control Spectrum
| Algorithm | Core Idea | Target Scenario |
|---|---|---|
| TCP Reno | AIMD + Fast Recovery | Traditional wired networks, loss = congestion |
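| TCP BIC/CUBIC | BIC binary-searches for the window size; CUBIC grows it as a cubic function of the time since the last loss | High bandwidth-delay networks (CUBIC is the Linux default) |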
| TCP BBR | Bandwidth and RTT modeling, doesn't treat loss as congestion signal | Long-fat networks, wireless (loss but not congestion) |
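To see how the four Reno pieces from this section fit together, here is a toy event-driven model. `RenoWindow` and its method names are invented for this sketch, cwnd is counted in MSS, and two details go beyond the text above and are assumptions: the floor of 2 MSS on ssthresh and the timeout handler that falls back to slow start.

```python
class RenoWindow:
    """Toy Reno congestion window, counted in MSS."""

    def __init__(self, ssthresh: float = 16.0):
        self.cwnd = 1.0
        self.ssthresh = ssthresh
        self.in_fast_recovery = False

    def on_new_ack(self) -> None:
        if self.in_fast_recovery:
            self.cwnd = self.ssthresh      # deflate, go to congestion avoidance
            self.in_fast_recovery = False
        elif self.cwnd < self.ssthresh:
            self.cwnd += 1.0               # slow start: +1 per ACK
        else:
            self.cwnd += 1.0 / self.cwnd   # congestion avoidance: about +1 per RTT

    def on_triple_duplicate_ack(self) -> None:
        # Fast retransmit happens here; the window follows fast recovery.
        self.ssthresh = max(self.cwnd / 2, 2.0)  # floor of 2 is an assumption
        self.cwnd = self.ssthresh + 3
        self.in_fast_recovery = True

    def on_extra_duplicate_ack(self) -> None:
        if self.in_fast_recovery:
            self.cwnd += 1.0               # another segment has left the network

    def on_timeout(self) -> None:
        # Assumed behavior: a timeout sends the sender back to slow start.
        self.ssthresh = max(self.cwnd / 2, 2.0)
        self.cwnd = 1.0
        self.in_fast_recovery = False


w = RenoWindow(ssthresh=4.0)
for _ in range(4):
    w.on_new_ack()
print("after 4 ACKs:", w.cwnd)
w.on_triple_duplicate_ack()
print("in fast recovery:", w.cwnd, "ssthresh:", w.ssthresh)
```

The model tracks only the window size; it does not send, lose, or retransmit any segments, so it is best used to replay a sequence of events by hand.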
3.7 TCP State Machine: The Life and Death of 11 States
| # | State | Who Enters | Description |
|---|---|---|---|
| 1 | CLOSED | Initial/End | No connection |
| 2 | LISTEN | Server | Waiting for connection request |
| 3 | SYN_SENT | Client | Sent SYN, waiting for SYN+ACK |
| 4 | SYN_RCVD | Server | Received SYN, sent SYN+ACK |
| 5 | ESTABLISHED | Both | Connection fully established |
| 6 | FIN_WAIT_1 | Active closer | Sent FIN, waiting for ACK or FIN |
| 7 | FIN_WAIT_2 | Active closer | Received ACK of FIN, waiting for FIN |
| 8 | CLOSE_WAIT | Passive closer | Received FIN, waiting for app to close |
| 9 | CLOSING | Both closing | Both sent FIN but waiting for each other's ACK |
| 10 | LAST_ACK | Passive closer | App closed, sent FIN, waiting for ACK |
| 11 | TIME_WAIT | Active closer | Waiting 2MSL |
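The table can be turned into a lookup you can walk through by hand. This is a partial sketch: the event names are invented for the example, and it leaves out simultaneous open, RST handling, and the shortcut where a FIN and the ACK of your own FIN arrive together.

```python
TRANSITIONS = {
    ("CLOSED", "listen"): "LISTEN",
    ("CLOSED", "connect"): "SYN_SENT",             # sends SYN
    ("LISTEN", "recv_syn"): "SYN_RCVD",            # sends SYN+ACK
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",   # sends ACK
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",        # sends FIN
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",     # sends ACK
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv_fin"): "CLOSING",         # both sides closed at once
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",       # sends ACK
    ("CLOSING", "recv_ack"): "TIME_WAIT",
    ("CLOSE_WAIT", "close"): "LAST_ACK",           # sends FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}


def walk(events: list[str], state: str = "CLOSED") -> list[str]:
    """Return every state visited while applying the events in order."""
    visited = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]  # KeyError means an illegal move
        visited.append(state)
    return visited


client = ["connect", "recv_syn_ack", "close", "recv_ack", "recv_fin", "timeout_2msl"]
server = ["listen", "recv_syn", "recv_ack", "recv_fin", "close", "recv_ack"]
print(" -> ".join(walk(client)))
print(" -> ".join(walk(server)))
```

The client path is the active closer from the table (FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT) and the server path is the passive closer (CLOSE_WAIT, LAST_ACK).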
Common Pitfalls
| Pitfall | Cause | Solution |
|---|---|---|
| Getting SEQ and ACK mixed up | Thinking by "packet" number | Remember: every byte is a coordinate, ACK is "next coordinate" |
| Why TIME_WAIT is 2MSL | Thinking 1 MSL is enough for one-way ACK | The lost ACK can live for up to 1 MSL and the server's retransmitted FIN for up to another MSL, so the wait must cover a full round trip (2MSL) |
| Sliding window ≠ congestion window | Confusing the two separate controls | Actual window = min(flow_wnd, cong_wnd) |
| Why cwnd += 3 during fast recovery | Seems unreasonable | Three duplicate ACKs mean three segments have left the network, capacity freed |
Traveler's Notes
TCP is, in my opinion, the most beautiful protocol in computer networking.
At first, looking at the header format made me dizzy — 20 bytes, a bunch of flags, checksum, urgent pointer... I thought it was just a clunky standard.
But what truly made me fall in love with TCP was the sliding window design. Using just 16 bits of window size, it coordinates the transmission rates of both parties, and elegantly solved the "16 bits aren't enough" problem with scaling — this isn't a patch, it's the evolutionary capability of protocol design.
Later, when I wrote a toy TCP/IP stack (the kind with only 100 lines of state machine), I truly understood that the three-way handshake isn't as simple as "three steps" — in those three packets, it negotiates MSS, window scaling, timestamps, SACK — all hidden inside the TCP Options.
Don't be afraid. Look at the state machine diagram twice, capture a few packets with Python, and it'll become your muscle memory. In the next chapter, you'll see TCP's "application layer child" — HTTP — how it stands on the shoulders of these 20 bytes and rules the internet.
→ Next Stop Preview
Chapter 4: HTTP & Web Servers
The application layer protocol built on top of TCP connections — you'll write your own mini web server (Python implementation) and understand the request/response format, caching strategies, cookies, CORS — all the essentials you need to work with HTTP daily.