
Chapter 3: TCP Deep Dive


Metadata Card

| Field | Value |
| --- | --- |
| Difficulty | Hardcore |
| Prerequisites | Vol 4 Chapter 2 (Transport Layer Introduction), basic socket programming |
| Keywords | TCP header, three-way handshake, four-way wave, TIME_WAIT, sequence number, sliding window, congestion control, state machine |
| Core Skills | Read every bit of a TCP header; draw handshake/wave timing diagrams; understand flow and congestion control as "walking on two legs" |

Your Progress

"Before you lies the most critical segment of the post road — TCP (Transmission Confirmation Spell). It guarantees that every spell message you send arrives in order. Three mana handshakes, sliding scroll window, mana flow control — TCP transforms your connection from unreliable to reliable."

TCP is the "reliable transmission hub" of the entire post road. In the previous two chapters, you mastered the seven-layer beacon tower and the three elements (encapsulation/decapsulation/multiplexing). Now you'll crawl into TCP's incantation brain — understanding every field, every conversation, every retransmission.

After this chapter, you won't "write" TCP (that's the post road guardian array's job), but you'll be able to read every transmission flow in your mana telescope, and when debugging "slowness", you won't have to guess blindly.


Chapter Layering

  • Must Read (TCP Basics): Connection establishment (three-way handshake), reliable transmission (sequence number/acknowledgment number/retransmission), connection teardown (four-way wave/TIME_WAIT), observing TCP states with ss
  • Advanced (TCP Advanced): Sliding window/flow control, congestion control overview (specific algorithms covered in ch10), TCP state machine
  • Advanced: Scapy manual three-way handshake, window scale factor, TCP options details, fast recovery details

This chapter will NOT require you to master

  • Scapy-forged TCP packets for port scanning
  • Formula derivation of congestion control algorithms (covered in ch10)
  • SACK/Selective Acknowledgment binary format

Your Task

This task follows a TCP Basics and TCP Advanced two-step approach:

TCP Basics (Must Read):

  1. Understand the complete process of three-way handshake and four-way wave
  2. Sequence number and acknowledgment number: not packet number, but byte stream coordinates
  3. Use ss and lsof to observe real-time TCP states
  4. TCP header: understand a TCP conversation through 20 bytes

TCP Advanced (Optional):

  5. Sliding window and flow control: how sender and receiver coordinate speed
  6. Congestion control overview (detailed algorithm in ch10)
  7. TCP state machine: hand-drawn transition diagram of 11 states

The specific algorithms of congestion control (Cubic/BBR) are systematically covered in Chapter 10 (Congestion Control Special Topic). This chapter only provides an overview.


Breakthrough · Trace Back

3.1 TCP Header: Understand a Conversation Through 20 Bytes

Imagine you, as a post road inspector, have caught a spell message flying through the air. The message's surface is engraved with a rune sequence:

00 50 0e 8c 0a 02 23 4c 00 00 00 00 a0 02 ff d7
00 00 00 00 02 04 05 b4

As the post road administrator, you need to read everything from these bytes: who sent it, who it's for, how much was sent, how much has been received. The dump holds 24 bytes: the 20-byte fixed header followed by one 4-byte option. It's like reading a complete letter from a string of codes.

Overwhelmed? Don't worry. Break it open.

Fixed Header (20 bytes)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |        Destination Port       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Acknowledgment Number                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Offset| Res.|N|C|E|U|A|P|R|S|F|          Window Size          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Checksum             |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Options (Variable)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Field-by-field breakdown:

| Field | Length | Meaning |
| --- | --- | --- |
| Source Port / Destination Port | 16 bits each | Basis for multiplexing/demultiplexing. Socket identifier = (src_ip, src_port, dst_ip, dst_port, protocol) |
| Sequence Number (SEQ) | 32 bits | Byte stream coordinate: the position of this segment's first data byte in the byte stream, counted from the initial sequence number |
| Acknowledgment Number (ACK) | 32 bits | The sequence number of the next byte expected. Means all previous bytes were received |
| Data Offset | 4 bits | Header length (in 4-byte units). Min 5 (20B), Max 15 (60B) |
| Flags | 9 bits | NS CWR ECE URG ACK PSH RST SYN FIN (NS comes from the experimental RFC 3540; RFC 9293 treats that bit as reserved) |
| Window Size | 16 bits | How many more bytes the receiver can accept (flow control) |
| Checksum | 16 bits | Covers pseudo-header + TCP header + data |
| Urgent Pointer | 16 bits | Only meaningful when URG=1 |
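To make the field breakdown concrete, here is a minimal sketch that decodes the captured bytes from the start of this section with Python's `struct` module. The function name `parse_tcp_header` and the dictionary keys are illustrative choices, not part of any library.

```python
import struct

# The 24 bytes from the capture: 20-byte fixed header + one 4-byte option.
RAW = bytes.fromhex(
    "00500e8c0a02234c00000000a002ffd7"
    "00000000020405b4"
)

# Flag names from the least-significant bit upward (NS is the historical 9th bit).
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR", "NS"]


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the 20-byte fixed TCP header (fields are in network byte order)."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    flag_bits = off_flags & 0x01FF
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,  # Data Offset counts 4-byte words
        "flags": [n for i, n in enumerate(FLAG_NAMES) if flag_bits & (1 << i)],
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }


header = parse_tcp_header(RAW)
# First option after the fixed header: kind=2 (MSS), length=4, 16-bit value.
kind, length, mss = struct.unpack("!BBH", RAW[20:24])
print(header)
print({"option_kind": kind, "option_len": length, "mss": mss})
```

The capture decodes to a SYN from port 80 to port 3724 with a window of 65495 and an MSS option of 1460. The Data Offset is 10, so the full header is 40 bytes and the dump shows only its first option.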

3.2 Three-Way Handshake: Who Speaks First Sends SYN

You're not mistaken: TCP connection establishment is a three-way exchange.

Client                              Server
  |                                    |
  |------ SYN, SEQ=1000 -------------->|   ① Client: I want to connect
  |                                    |
  |<-- SYN+ACK, SEQ=5000, ACK=1001 ----|   ② Server: OK, acknowledging ①
  |                                    |
  |------ ACK, SEQ=1001, ACK=5001 ---->|   ③ Client: Received your SYN, starting data
  |                                    |
  |========= Data flow starts here ====|

Why Three Times, Not Two, Not Four?

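The fatal problem with a two-way handshake: suppose client A sends a SYN, it is delayed in the network, and A retransmits. If the old SYN reaches server B later, a two-way handshake would make B establish a "dead connection" and wait for data that never comes. The third step lets B confirm that A is really alive and actually wants this connection, so the old SYN is exposed as a ghost packet. Four steps are unnecessary because the server's SYN and its ACK of the client's SYN fit into a single segment (step ②).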
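The number arithmetic in the handshake can be sketched in a few lines. This is an illustrative model only: the function name and the dictionary layout are made up for this example, and no real packets are sent. The point it shows is that a SYN consumes one sequence number, which is why each side acknowledges ISN + 1.

```python
MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Return the three handshake segments for the given initial sequence numbers."""
    syn = {"from": "client", "flags": "SYN",
           "seq": client_isn, "ack": None}
    syn_ack = {"from": "server", "flags": "SYN+ACK",
               "seq": server_isn, "ack": (client_isn + 1) % MOD}
    ack = {"from": "client", "flags": "ACK",
           "seq": (client_isn + 1) % MOD, "ack": (server_isn + 1) % MOD}
    return [syn, syn_ack, ack]


for segment in three_way_handshake(1000, 5000):
    print(segment)
```

With ISNs 1000 and 5000 this reproduces the numbers in the diagram: ACK=1001 in step ②, then SEQ=1001 and ACK=5001 in step ③.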


3.3 Sequence Number & Acknowledgment Number: Byte Coordinates, Not Packet Numbers

This is the concept beginners most often confuse about TCP. Many people think TCP counts by "the nth packet sent" — like numbering chapters in a textbook.

But that's not how it works at all. Post road couriers don't accept vague descriptions like "the third letter" — they need precise coordinates.

TCP doesn't count by "packets". It counts by "bytes". Like every stone slab on a post road having a unique number, rather than "the third pile of stones."

Your application layer writes 3000 bytes, TCP splits it into three segments of 1000 bytes each:

Segment ①: seq=1000, Len=1000     → covers bytes 1000-1999
Segment ②: seq=2000, Len=1000     → covers bytes 2000-2999
Segment ③: seq=3000, Len=1000     → covers bytes 3000-3999

Receiver replies: ACK=4000 → meaning I received every byte up to 3999, please send 4000 next.

This is why ACKs are called cumulative: if segment ① is lost but ② and ③ arrive, the receiver can only reply ACK=1000, because byte 1000 is still missing and the ACK number cannot move past a gap.
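A small sketch of how a receiver could derive the cumulative ACK from the segments it holds. The function name `cumulative_ack` is hypothetical, and the model ignores 32-bit wraparound to keep the idea visible.

```python
def cumulative_ack(next_expected: int, received: list[tuple[int, int]]) -> int:
    """Return the ACK number given (seq, length) pairs received in any order.

    The ACK advances only across contiguous bytes; it stops at the first gap.
    """
    for seq, length in sorted(received):
        if seq > next_expected:      # gap: an earlier byte is still missing
            break
        next_expected = max(next_expected, seq + length)
    return next_expected


# All three segments arrived.
print(cumulative_ack(1000, [(1000, 1000), (2000, 1000), (3000, 1000)]))
# Segment ① lost, ② and ③ arrived.
print(cumulative_ack(1000, [(2000, 1000), (3000, 1000)]))
```

The first call yields 4000 and the second stays at 1000, matching the two cases described above.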

Initial Sequence Number (ISN): Why not 0?

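TCP has never simply started from 0. The original specification (RFC 793) derived the ISN from a clock that ticks roughly every 4 microseconds, and modern stacks follow RFC 6528, which adds a keyed hash of the connection's addresses and ports to that clock value so the ISN is hard to predict. This has two benefits: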

  • Prevents "ghost segments" from old connections being mistaken — two different connections accidentally using the same IP:port combination; random ISN ensures segments don't get confused
  • Security — makes forging RST segments harder (attacker needs to guess the sequence number range)

3.4 Four-Way Wave: A Graceful Farewell

After the connection is established, data flows back and forth. But eventually it's time to say goodbye.

TCP is full-duplex — both sides can talk and listen simultaneously. So saying goodbye is more complex than a one-way street: each side must independently say "I'm done talking" and wait for the other to acknowledge "I understand."

Like two tower stations ending a call — not one side just hanging up: A says "I'm done," B says "I understand" (but B may still have things to say), B finishes and says "I'm done too," A says "Got it." Four sentences, not one missing.

Client                              Server
  |                                    |
  |------ FIN, SEQ=4000 ------------->|   ① Client: I'm done sending
  |                                    |
  |<---- ACK, SEQ=8000, ACK=4001 -----|   ② Server: Got it (still sending remaining data)
  |                                    |
  |        One-way data remaining...   |
  |                                    |
  |<---- FIN, SEQ=9000, ACK=4001 -----|   ③ Server: I'm done too
  |                                    |
  |------ ACK, SEQ=4001, ACK=9001 --->|   ④ Client: Received (entering TIME_WAIT)
  |                                    |

TIME_WAIT: Why Wait 2MSL Before Saying Goodbye Forever?

After the client sends the final ACK, it enters the TIME_WAIT state and holds it for 2 × Maximum Segment Lifetime (MSL). MSL is typically taken to be between 30 seconds and 2 minutes; Linux uses a fixed 60 seconds for the whole TIME_WAIT period.

Imagine standing in front of a tower, shouting your last words — "I'm leaving!" You can't just turn around and walk away; you need to wait a little, in case the other party didn't hear and asks "What did you say?" — you need to be able to reply.

Two reasons:

  1. Ensure the server receives the final ACK. If the client's final ACK gets lost on the post road, the server will resend its FIN, and TIME_WAIT keeps the client around to resend the ACK. The "2" in 2MSL covers one worst-case round trip: up to one MSL for the final ACK to reach the server (or die on the way), plus up to one MSL for a retransmitted FIN to travel back.

  2. Let all "ghost segments" of the connection die on the post road — After 2MSL, any delayed segments belonging to this connection (late letters caused by post road congestion) will have been discarded or timed out. This way, when you use the same port number for a new connection next time, no "ghost letter" from last month will be mistaken as legitimate data for the new connection.
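To watch a four-way wave on your own machine, here is a minimal loopback sketch. It assumes TCP on 127.0.0.1 is available; the function name `half_close_demo` and the payloads are made up. The client says "I'm done" with `shutdown(SHUT_WR)`, which sends its FIN, while the server is still free to send its remaining data before closing its own direction.

```python
import socket
import threading


def half_close_demo() -> bytes:
    """Client half-closes first; server replies afterwards, then closes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def server() -> None:
        conn, _ = listener.accept()
        with conn:
            request = b""
            while chunk := conn.recv(1024):  # b"" means the client's FIN arrived
                request += chunk
            conn.sendall(b"reply to " + request)  # server -> client still open
        # Leaving the with-block closes conn, which sends the server's FIN.

    thread = threading.Thread(target=server)
    thread.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"hello")
        client.shutdown(socket.SHUT_WR)      # client FIN: "I'm done sending"
        reply = b""
        while chunk := client.recv(1024):    # keep listening until server's FIN
            reply += chunk

    thread.join()
    listener.close()
    return reply


print(half_close_demo())
```

Since the client sends the first FIN, it is the active closer. On Linux, running `ss -tan state time-wait` shortly afterwards should list the client's side of this connection waiting out TIME_WAIT.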


3.5 Sliding Window & Flow Control: Speed Matching

The receiver has a receive buffer. If the application reads slowly while the sender keeps transmitting quickly, the buffer fills up. Flow control prevents the sender from overwhelming the receiver.

Window Advertisement — "How much more can I accept?"

Every TCP segment header carries a Window Size field (16-bit, max 65535), which is like the other side telling you in real-time "how much space my hall has left":

After sender finishes, 4096 bytes of window remaining... can still send
ACK received, window becomes 8192 bytes... hall cleared, can send again
ACK received but window is 0... hall full! Stop! Wait for the receiver to say "space available"

The receiver's advertised window = remaining receive buffer size. The sender's usable window = min(receiver's advertised window, congestion window) — what you can actually send depends on two things: whether the other side can accept (flow control) and whether the post road can bear it (congestion control).
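The min() rule above can be written as a tiny helper. The function and parameter names are hypothetical, and subtracting the bytes already in flight is an addition beyond the text: it reflects that unacknowledged data already occupies part of the window.

```python
def usable_window(rwnd: int, cwnd: int, bytes_in_flight: int = 0) -> int:
    """Bytes the sender may still put on the wire right now.

    rwnd: receiver's advertised window (flow control)
    cwnd: congestion window (congestion control)
    bytes_in_flight: sent but not yet acknowledged (assumption of this sketch)
    """
    return max(0, min(rwnd, cwnd) - bytes_in_flight)


print(usable_window(rwnd=8192, cwnd=4096))                        # cwnd is the limit
print(usable_window(rwnd=0, cwnd=4096))                           # hall full: stop
print(usable_window(rwnd=8192, cwnd=65535, bytes_in_flight=4096)) # rwnd is the limit
```

Whichever window is smaller decides the rate, so a fast network cannot help a slow receiver and a roomy receiver cannot help a congested path.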

Window Scaling

The window size field is only 16 bits, max 65535 bytes — far from enough for today's multi-MB buffers.

TCP negotiates the Window Scale option during the SYN/SYN-ACK of the three-way handshake, shifting left by up to 14 bits, pushing the window limit to 1GB (65535 × 2¹⁴).
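The scaling arithmetic is a single left shift. A minimal sketch, with a hypothetical function name, that also enforces the 16-bit field and the 14-bit shift limit:

```python
def effective_window(window_field: int, shift: int) -> int:
    """Real window in bytes = 16-bit header field shifted left by the scale factor."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= 14:
        raise ValueError("window scale shift must be between 0 and 14")
    return window_field << shift


print(effective_window(65535, 0))    # no scaling
print(effective_window(65535, 14))   # maximum scaling, just under 1 GiB
```

With the maximum shift the result is 1,073,725,440 bytes, which is the "1GB" limit quoted above.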


3.6 Congestion Control Overview: What if the Post Road is Too Crowded?

Flow control handles "whether the other side can accept" — if you send too many letters and the other tower's hall is full, it tells you "slow down."

But congestion control handles a completely different problem: It's not that the other side is full — the entire post road is jammed. Like too many cars on a highway, not your garage being full — the whole road is congested. These two are often confused, but they're fundamentally different:

| Aspect | Flow Control | Congestion Control |
| --- | --- | --- |
| What it manages | Receiver buffer overflow | Intermediate router/link overload with packet loss |
| Who drives | Receiver reports window | Sender self-assesses |
| Method | Window advertisement | Slow start, congestion avoidance |
| Packet loss means | Receiver reading too slow? | Network too crowded! Slow down immediately! |

Four algorithms (TCP Reno, the most classic industrial variant):

① Slow Start

Not really "slow". On the contrary — it grows exponentially.

cwnd = 1 (MSS)
Every ACK received, cwnd += 1
→ Doubles every RTT

RTT 1: cwnd = 1
RTT 2: cwnd = 2
RTT 3: cwnd = 4
RTT 4: cwnd = 8
...
Until ssthresh (slow start threshold) or packet loss

Intuition: The sender doesn't know the network's capacity. It uses slow start to quickly probe the upper limit: doubling every RTT takes cwnd from 1 to over a thousand segments in about ten round trips.

② Congestion Avoidance

When cwnd >= ssthresh, switch to linear growth:

After receiving all ACKs for an RTT, cwnd += 1
→ Additive Increase (the AI part of AIMD)
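The two growth phases can be put side by side in a per-RTT simulation. This is a simplified sketch with no losses; the function name is hypothetical, cwnd is counted in MSS units, and capping the doubling at ssthresh is a modeling choice for readability.

```python
def cwnd_per_rtt(ssthresh: int, rtts: int, initial_cwnd: int = 1) -> list[int]:
    """Return cwnd (in MSS) at the start of each RTT, assuming no packet loss."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: doubles every RTT
        else:
            cwnd += 1                       # congestion avoidance: +1 MSS per RTT
    return history


print(cwnd_per_rtt(ssthresh=8, rtts=7))   # [1, 2, 4, 8, 9, 10, 11]
```

The output shows the knee at ssthresh: exponential growth up to 8, then one extra segment per round trip.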

③ Fast Retransmit

When the sender receives three duplicate ACKs (three extra ACKs carrying the same acknowledgment number), it concludes that the segment starting at that number is lost and retransmits it immediately instead of waiting for the timeout.

④ Fast Recovery

After fast retransmit, instead of going back to slow start, enter fast recovery:

ssthresh = cwnd / 2
cwnd = ssthresh + 3 (because three duplicate ACKs already confirmed some data)
For each duplicate ACK, cwnd += 1
When a new ACK arrives, cwnd = ssthresh → enter congestion avoidance
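The fast retransmit and fast recovery rules above can be sketched as a small event-driven class. The class and method names are hypothetical, cwnd is in MSS units, and the floor of 2 on ssthresh is an addition borrowed from RFC 5681 rather than from the pseudocode above. Normal window growth on new ACKs is left out to keep the focus on recovery.

```python
class RenoSender:
    """Minimal model of Reno's reaction to duplicate ACKs."""

    def __init__(self, cwnd: int, ssthresh: int) -> None:
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.in_fast_recovery = False
        self.retransmits = 0

    def on_duplicate_ack(self) -> None:
        self.dup_acks += 1
        if self.in_fast_recovery:
            self.cwnd += 1                 # each extra dup ACK: one more segment left the network
        elif self.dup_acks == 3:
            self.ssthresh = max(self.cwnd // 2, 2)
            self.cwnd = self.ssthresh + 3  # three dup ACKs = three segments delivered
            self.in_fast_recovery = True
            self.retransmits += 1          # fast retransmit of the missing segment

    def on_new_ack(self) -> None:
        self.dup_acks = 0
        if self.in_fast_recovery:
            self.cwnd = self.ssthresh      # deflate, continue in congestion avoidance
            self.in_fast_recovery = False


sender = RenoSender(cwnd=20, ssthresh=64)
for _ in range(3):
    sender.on_duplicate_ack()
print(sender.cwnd, sender.ssthresh, sender.retransmits)   # 13 10 1
sender.on_duplicate_ack()
sender.on_new_ack()
print(sender.cwnd, sender.in_fast_recovery)               # 10 False
```

Starting from cwnd = 20, the third duplicate ACK halves ssthresh to 10 and inflates cwnd to 13; the new ACK then deflates cwnd to 10 without a return to slow start.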

Modern Congestion Control Spectrum

| Algorithm | Core Idea | Target Scenario |
| --- | --- | --- |
| TCP Reno | AIMD + Fast Recovery | Traditional wired networks, loss = congestion |
| TCP BIC/CUBIC | BIC: binary search for the window; CUBIC: cubic window growth curve | High bandwidth-delay networks (CUBIC is the Linux default) |
| TCP BBR | Bandwidth and RTT modeling, doesn't treat loss as congestion signal | Long-fat networks, wireless (loss but not congestion) |

3.7 TCP State Machine: The Life and Death of 11 States

| # | State | Who Enters | Description |
| --- | --- | --- | --- |
| 1 | CLOSED | Initial/End | No connection |
| 2 | LISTEN | Server | Waiting for connection request |
| 3 | SYN_SENT | Client | Sent SYN, waiting for SYN+ACK |
| 4 | SYN_RCVD | Server | Received SYN, sent SYN+ACK |
| 5 | ESTABLISHED | Both | Connection fully established |
| 6 | FIN_WAIT_1 | Active closer | Sent FIN, waiting for ACK or FIN |
| 7 | FIN_WAIT_2 | Active closer | Received ACK of FIN, waiting for FIN |
| 8 | CLOSE_WAIT | Passive closer | Received FIN, waiting for app to close |
| 9 | CLOSING | Both closing | Both sent FIN but waiting for each other's ACK |
| 10 | LAST_ACK | Passive closer | App closed, sent FIN, waiting for ACK |
| 11 | TIME_WAIT | Active closer | Waiting 2MSL |
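One way to internalize the table is to encode it as a transition map and walk a connection through it. The sketch below is simplified: the event names are invented for this example, and it leaves out resets, simultaneous open, and the shortcut from FIN_WAIT_1 straight to TIME_WAIT.

```python
# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    ("CLOSED", "active_open"): "SYN_SENT",
    ("CLOSED", "passive_open"): "LISTEN",
    ("LISTEN", "recv_syn"): "SYN_RCVD",
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",      # active closer sends FIN
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",   # passive closer got FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv_fin"): "CLOSING",       # both sides closed at once
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",
    ("CLOSING", "recv_ack"): "TIME_WAIT",
    ("CLOSE_WAIT", "close"): "LAST_ACK",
    ("LAST_ACK", "recv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}


def walk(start: str, events: list[str]) -> list[str]:
    """Return every state visited, raising KeyError on an illegal transition."""
    path = [start]
    for event in events:
        path.append(TRANSITIONS[(path[-1], event)])
    return path


client = walk("CLOSED", ["active_open", "recv_syn_ack", "close",
                         "recv_ack", "recv_fin", "timeout_2msl"])
server = walk("CLOSED", ["passive_open", "recv_syn", "recv_ack",
                         "recv_fin", "close", "recv_ack"])
print(" -> ".join(client))
print(" -> ".join(server))
```

The client path passes through FIN_WAIT_1, FIN_WAIT_2, and TIME_WAIT, while the server path passes through CLOSE_WAIT and LAST_ACK, which lines up with the "Who Enters" column.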

Common Pitfalls

| Pitfall | Cause | Solution |
| --- | --- | --- |
| Getting SEQ and ACK mixed up | Thinking by "packet" number | Remember: every byte is a coordinate, ACK is "next coordinate" |
| Why TIME_WAIT is 2MSL | Thinking 1 MSL is enough for one-way ACK | The final ACK may take up to 1 MSL to arrive or expire, and a resent FIN up to 1 MSL to come back: 2MSL in total |
| Sliding window ≠ congestion window | Confusing the two separate controls | Actual window = min(flow_wnd, cong_wnd) |
| Why cwnd += 3 during fast recovery | Seems unreasonable | Three duplicate ACKs mean three segments have left the network, capacity freed |

Traveler's Notes

TCP is, in my opinion, the most beautiful protocol in computer networking.

At first, looking at the header format made me dizzy — 20 bytes, a bunch of flags, checksum, urgent pointer... I thought it was just a clunky standard.

But what truly made me fall in love with TCP was the sliding window design. Using just 16 bits of window size, it coordinates the transmission rates of both parties, and elegantly solved the "16 bits aren't enough" problem with scaling — this isn't a patch, it's the evolutionary capability of protocol design.

Later, when I wrote a toy TCP/IP stack (the kind with only 100 lines of state machine), I truly understood that the three-way handshake isn't as simple as "three steps" — in those three packets, it negotiates MSS, window scaling, timestamps, SACK — all hidden inside the TCP Options.

Don't be afraid. Look at the state machine diagram twice, capture a few packets with Python, and it'll become your muscle memory. In the next chapter, you'll see TCP's "application layer child" — HTTP — how it stands on the shoulders of these 20 bytes and rules the internet.


Next Stop Preview

Chapter 4: HTTP & Web Servers

The application layer protocol built on top of TCP connections — you'll write your own mini web server (Python implementation) and understand the request/response format, caching strategies, cookies, CORS — all the essentials you need to work with HTTP daily.
