
My question is about a concept that really confused me while reading an introduction to networking. What exactly is the distinction between an application protocol and TCP?

What I don't understand is how something like an HTTP port can be considered part of TCP when it is actually tied to the web as an application. Isn't it then application-layer data? How are the two separate? For instance, isn't the HTTP port tied to a specific application, like the web, and totally separate from other applications, like a torrent client? Doesn't it then belong to the application layer?

7 Answers


The relation between an application-layer protocol and a transport protocol is similar to the relation between language and paper. Language describes the rules for content that can be transported using paper, but language is not a subset of paper. Similarly, HTTP describes the rules for how specific data is transported using TCP, but HTTP is not a subset of TCP.

Steffen Ullrich

What I don't understand is how something like an HTTP port is considered part of TCP but is actually connected to the web as an application?

It is not an HTTP port; it is a TCP port that IANA has registered for use by HTTP. See the IANA Service Name and Transport Protocol Port Number Registry.

Ron Maupin

Each layer needs to interface to the layers above and below it. The precise details of these interfaces are not normally defined in network standards, because they are not visible "on the wire" and because they necessarily depend on the design of the system on which the network stack is implemented.

When a packet is received, each layer of the network stack must know how to deliver the data to the next layer. Ethernet has an "EtherType" field: 0x0800 is IPv4, 0x0806 is ARP, 0x86DD is IPv6, and there are many others for non-IP protocols. IP has a "protocol" field which identifies the "transport" protocol: 0x01 is ICMP, 0x06 is TCP, 0x11 is UDP. So the Ethernet implementation knows that it needs to deliver a packet to the IP implementation, which in turn knows that it needs to deliver it to the TCP/UDP implementation.
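To make this demultiplexing concrete, here is a minimal Python sketch that reads the two fields described above from raw bytes. It assumes an untagged Ethernet II frame (no VLAN tag) carrying IPv4, and the frame it builds is a hypothetical example with zeroed MAC addresses; `demux`, `ETHERTYPES` and `IP_PROTOCOLS` are illustrative names, not part of any library.

```python
from __future__ import annotations

import struct

# EtherType values listed above (16-bit field at byte offset 12 of an Ethernet II frame).
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}
# IPv4 "protocol" values listed above (one byte at offset 9 of the IPv4 header).
IP_PROTOCOLS = {0x01: "ICMP", 0x06: "TCP", 0x11: "UDP"}


def demux(frame: bytes) -> tuple[str, str | None]:
    """Return (network-layer protocol, transport protocol) for an Ethernet II frame."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    network = ETHERTYPES.get(ethertype, f"unknown 0x{ethertype:04x}")
    if network != "IPv4":
        return network, None  # this sketch only looks inside IPv4 headers
    protocol = frame[14 + 9]  # IPv4 header starts right after the 14-byte Ethernet header
    return network, IP_PROTOCOLS.get(protocol, f"unknown {protocol}")


# Hypothetical frame: zeroed MAC addresses, EtherType 0x0800, and a
# 20-byte IPv4 header whose protocol byte is 6 (TCP).
ip_header = bytearray(20)
ip_header[0] = 0x45  # version 4, header length 5 x 4 bytes
ip_header[9] = 0x06
frame = bytes(12) + struct.pack("!H", 0x0800) + bytes(ip_header)
print(demux(frame))  # ('IPv4', 'TCP')
```

Each layer only inspects its own field and hands the rest upward; nothing here looks at what the TCP payload contains.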

The interface between TCP/UDP and the application layer is a bit different, because rather than merely being the boundary between two layers in the network stack, it is normally the boundary between the operating system kernel and the applications that run on that kernel. In particular, one system may be running multiple instances of the same application, and one client application may even want to open multiple logical connections to the same server application.

Each logical TCP connection is therefore identified by a combination of two IP addresses and two port numbers. When discussing packet formats we talk about "source" and "destination" addresses and ports, while when talking about implementations on a host we talk about "local" and "remote" addresses and ports.

In UDP things are slightly different, because the operating system does not track UDP connections, but the basic idea of there being two port numbers, and of the IP/port combinations being swapped when generating replies, remains.
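A rough Python sketch of the idea: the host finds the right connection by looking up the full address/port combination, and a reply simply swaps source and destination. The `Endpoints` type, the `connections` table and the example addresses (from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24) are hypothetical; a real kernel's connection table is considerably more involved.

```python
from __future__ import annotations

from typing import NamedTuple


class Endpoints(NamedTuple):
    """Source/destination addresses and ports as carried in one packet."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def reply(self) -> Endpoints:
        # A reply swaps both the addresses and the ports.
        return Endpoints(self.dst_ip, self.dst_port, self.src_ip, self.src_port)


# Hypothetical connection table on a server host, keyed by
# (local IP, local port, remote IP, remote port).
connections: dict[tuple[str, int, str, int], str] = {
    ("192.0.2.10", 80, "198.51.100.7", 50001): "connection A",
    ("192.0.2.10", 80, "198.51.100.7", 50002): "connection B",
}


def lookup(incoming: Endpoints) -> str | None:
    # For an incoming packet, the destination is local and the source is remote.
    key = (incoming.dst_ip, incoming.dst_port, incoming.src_ip, incoming.src_port)
    return connections.get(key)


packet = Endpoints("198.51.100.7", 50002, "192.0.2.10", 80)
print(lookup(packet))  # connection B
print(packet.reply())  # source and destination swapped for the answer
```

Both table entries share the server's address and port 80; only the remote port tells the two connections apart.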

Servers generally run on a port number that is statically configured when the service is installed. Clients, on the other hand, normally use a port number chosen at random. In this way, multiple instances of the same client can happily coexist on the same host and connect to the same server.
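The following Python sketch illustrates why random client ports let several client instances coexist. It assumes IANA's dynamic port range (49152-65535); real operating systems pick from their own configurable range and may reuse a local port for connections to different remote endpoints. `pick_client_port` is a hypothetical helper, not an OS API.

```python
from __future__ import annotations

import random

# IANA's dynamic/private port range. Real operating systems use their own,
# configurable range (e.g. Linux's net.ipv4.ip_local_port_range sysctl).
EPHEMERAL_PORTS = range(49152, 65536)


def pick_client_port(in_use: set[int], rng: random.Random) -> int:
    """Pick a random local port that no other local socket is using."""
    free = [port for port in EPHEMERAL_PORTS if port not in in_use]
    if not free:
        raise OSError("no ephemeral ports left")
    return rng.choice(free)


# Two instances of the same client connecting to the same server port 80.
rng = random.Random()
in_use: set[int] = set()
for instance in ("client 1", "client 2"):
    port = pick_client_port(in_use, rng)
    in_use.add(port)
    print(f"{instance}: local port {port} -> server port 80")
```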

For most protocols there is a default port number, which servers will listen on by default and clients will connect to by default.

Peter Green

In the interest of brevity, I'm intentionally giving a high-level overview in the following answer; in particular, the names of things may differ from what you read in various sources, and this may combine certain concepts you see split apart elsewhere.

To review: we have basically three components in the communications you're talking about.

  1. The network protocol, which in this case is TCP, deals with transferring sequences of bytes between endpoints. It is of no concern to this layer if the sequence is GET /foo HTTP/1.0 (which happens to be HTTP) or MAIL FROM:<joe@example.com> (which you may recognise as SMTP); neither string has any meaning to TCP. In either case, those characters are sent and received, without loss, in that order.

  2. The application protocol, which consists of sequences of particular bytes that have meaning to the application. GET /foo HTTP/1.0 will have meaning to an HTTP server, but will be considered an error by an SMTP server.

  3. The address of each endpoint, particularly in this case that of the server. Here this includes the address of the host (e.g., 192.168.1.1) and also the address of the application on the host ("TCP port 80"). Together these make up the full address; having a port number along with the host address allows multiple applications to communicate on the same machine (e.g., you can have both an HTTP server on port 80 and an SMTP server on port 25).
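As a toy illustration of points 1 and 2, the two hypothetical Python handlers below receive exactly the same bytes, as TCP would deliver them, and only the application protocol each one implements decides whether those bytes mean anything. They are deliberately minimal and are not real HTTP or SMTP implementations.

```python
def http_server(data: bytes) -> bytes:
    """Toy HTTP/1.0 handler: accepts only a well-formed request line."""
    request_line = data.split(b"\r\n", 1)[0]
    parts = request_line.split(b" ")
    if len(parts) == 3 and parts[2].startswith(b"HTTP/"):
        return b"HTTP/1.0 200 OK\r\n\r\n"
    return b"HTTP/1.0 400 Bad Request\r\n\r\n"


def smtp_server(data: bytes) -> bytes:
    """Toy SMTP handler: understands only MAIL FROM."""
    line = data.split(b"\r\n", 1)[0]
    if line.upper().startswith(b"MAIL FROM:"):
        return b"250 OK\r\n"
    return b"500 Syntax error, command unrecognized\r\n"


# The transport hands each server the same opaque bytes; only the
# application protocol decides what they mean.
for payload in (b"GET /foo HTTP/1.0\r\n\r\n", b"MAIL FROM:<joe@example.com>\r\n"):
    print(payload.split(b"\r\n")[0], "->",
          http_server(payload).split(b"\r\n")[0], "|",
          smtp_server(payload).split(b"\r\n")[0])
```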

Where I think your confusion has arisen is in not understanding that these three things are separate and orthogonal. When you use the HTTP application protocol, it remains the HTTP application protocol whether or not you're using it with TCP, and regardless of which address you're sending the HTTP requests to.

For example, if I write down GET /hello HTTP/1.0 on a piece of paper and hand it to you, and you then write down on another piece of paper

HTTP/1.0 200 OK

Hello, world.

and give me that piece of paper, we have participated in an exchange using the HTTP protocol, though we have used neither TCP nor IP addresses.
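The same point can be shown with Python's standard `http.client.HTTPResponse` parser, which will happily read the reply from the second "piece of paper" with no network involved. This is a sketch: `Paper` is a hypothetical stand-in that only provides the `makefile()` method the parser calls on its socket argument, and constructing `HTTPResponse` directly like this is a testing trick rather than its usual use (normally `http.client.HTTPConnection` creates it).

```python
import io
from http.client import HTTPResponse


class Paper:
    """Hypothetical stand-in for a socket: just a sheet of bytes.

    HTTPResponse only calls makefile() on the object it is given."""

    def __init__(self, text: bytes):
        self._text = text

    def makefile(self, mode: str) -> io.BytesIO:
        return io.BytesIO(self._text)


# The reply written on the second sheet of paper, with HTTP's CRLF line endings.
sheet = Paper(b"HTTP/1.0 200 OK\r\n\r\nHello, world.")
response = HTTPResponse(sheet)
response.begin()  # parse the status line and the (empty) header block
print(response.status, response.reason)  # 200 OK
print(response.read())  # b'Hello, world.'
```

No TCP connection or IP address appears anywhere, yet the parser sees a perfectly ordinary HTTP/1.0 response.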

This mixing and matching of protocols and addresses is a regular thing in the real world. The protocol used between a Docker client (such as the docker command-line program) and a Docker server, which exchanges ordered sequences of bytes to request various actions (such as "start a Docker container"), is frequently used over both Unix domain networking and TCP networking. If I configure my Docker server to listen at both the Unix domain endpoint /var/run/docker.sock and the TCP endpoint 192.168.1.1:2375 (i.e., that host address and port 2375), I can send Docker protocol messages to either endpoint (each of which uses a different network protocol) and communicate with the server. To use the first one I set DOCKER_HOST=unix:///var/run/docker.sock in my process environment; to use the second one I set DOCKER_HOST=tcp://192.168.1.1:2375.
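A minimal Python sketch of how a client might interpret a DOCKER_HOST-style value, choosing the transport from the URL scheme before any Docker protocol bytes are sent. `parse_docker_host` is a hypothetical helper, not the Docker client's actual code; the fallback to unix:///var/run/docker.sock mirrors the default described later in this answer, and the sketch does not open any connection.

```python
from __future__ import annotations

import os
from urllib.parse import urlsplit


def parse_docker_host(value: str) -> tuple[str, object]:
    """Split a DOCKER_HOST-style URL into (transport family, address)."""
    url = urlsplit(value)
    if url.scheme == "unix":
        # Unix domain socket: the "address" is a filesystem path.
        return "unix", url.path
    if url.scheme == "tcp":
        # TCP: the address is a host and a port (None if the URL omits it).
        return "tcp", (url.hostname, url.port)
    raise ValueError(f"unsupported scheme: {url.scheme!r}")


# Falls back to the Unix socket path when DOCKER_HOST is unset.
print(parse_docker_host(os.environ.get("DOCKER_HOST", "unix:///var/run/docker.sock")))
print(parse_docker_host("tcp://192.168.1.1:2375"))  # ('tcp', ('192.168.1.1', 2375))
```

Whichever branch is taken, the bytes later written to that endpoint are the same Docker protocol messages.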

For convenience, popular application protocols that can be run over TCP or UDP often have a default port that is used if you specify an incomplete address that does not include the port number.

Thus, the partial address given in the URL http://192.168.1.1/foo is exactly the same address as given in http://192.168.1.1:80/foo, and in both cases TCP is also implied (because it's clearly an IP address, and plain http:// URLs are not served over UDP). The same is true of Docker: tcp://192.168.1.1 is a partial address that really specifies tcp://192.168.1.1:2375.
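Here is a small Python sketch of filling in a default port, using the standard `urllib.parse.urlsplit`. The `DEFAULT_PORTS` table and the `complete_address` helper are hypothetical, and the table's values simply follow the ports named in this thread (80 for http, 443 for https, 2375 for Docker's tcp:// addresses).

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Default ports per scheme, following this thread: 80 for http (IANA-registered),
# 443 for https, and 2375 for Docker's tcp:// addresses (a Docker client convention).
DEFAULT_PORTS = {"http": 80, "https": 443, "tcp": 2375}


def complete_address(url: str) -> tuple[str, int]:
    """Fill in the default port for a URL that omits it (KeyError for unknown schemes)."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


print(complete_address("http://192.168.1.1/foo"))     # ('192.168.1.1', 80)
print(complete_address("http://192.168.1.1:80/foo"))  # ('192.168.1.1', 80)
print(complete_address("tcp://192.168.1.1"))          # ('192.168.1.1', 2375)
```

The defaulting happens entirely in the client that parses the address; the bytes later sent over the connection are unaffected by it.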

Note that the use of a default port when not specified in the address is a feature of the particular application that's accepting an address specification from you; it is not part of the application protocol itself. Whether a default port (or any other partial or full default address) is used depends on the application:

  • For HTTP, the IANA has registered a standard default port (as mentioned in Ron Maupin's answer): if none is specified, port 80 is used, and virtually all HTTP client applications use this default if you give an incomplete address that's missing the port. But only if you're using TCP! You'll notice that the default TCP port is irrelevant in my example of a "paper-based" HTTP communication above.

  • For Docker, the default port when using the docker command-line client is 2375. This is not an IANA standard; it is something that the original developers of Docker came up with and coded into their application. As with many programs, the Docker client even lets you specify no address at all and has its own default complete address for that case (unix:///var/run/docker.sock).

cjs

A network stack is organized in layers where each lower layer provides services for an upper layer. HTTP in the application layer uses TCP in the transport layer, which uses IPv4 or IPv6 in the network layer, which might use Ethernet in the data link layer.
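To see that each layer just wraps the one above it, here is a Python sketch that builds an HTTP request inside a TCP segment inside an IPv4 packet, then peels the layers off again. It follows the real 20-byte TCP and IPv4 header layouts but is simplified (no options, checksums left at 0, made-up addresses and ports), so it is illustrative rather than something you could put on a wire. The helper names are hypothetical.

```python
from __future__ import annotations

import ipaddress
import struct


def tcp_segment(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend a 20-byte TCP header (no options; checksum left at 0 in this sketch)."""
    header = struct.pack(
        "!HHIIBBHHH",
        src_port, dst_port,
        0, 0,    # sequence and acknowledgement numbers
        5 << 4,  # data offset: 5 x 32-bit words, no options
        0x18,    # flags: PSH + ACK
        65535,   # receive window
        0, 0,    # checksum (not computed here), urgent pointer
    )
    return header + payload


def ipv4_packet(src: str, dst: str, protocol: int, payload: bytes) -> bytes:
    """Prepend a 20-byte IPv4 header (checksum left at 0 in this sketch)."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0,             # version 4 / header length 5 words, type of service
        20 + len(payload),   # total length
        0, 0,                # identification, flags/fragment offset
        64, protocol, 0,     # TTL, protocol (6 = TCP), checksum
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )
    return header + payload


def unwrap(packet: bytes) -> tuple[int, int, bytes]:
    """Peel off IPv4, then TCP: return (IP protocol, TCP destination port, payload)."""
    ip_len = (packet[0] & 0x0F) * 4
    protocol = packet[9]
    segment = packet[ip_len:]
    (dst_port,) = struct.unpack("!H", segment[2:4])
    tcp_len = (segment[12] >> 4) * 4
    return protocol, dst_port, segment[tcp_len:]


http_request = b"GET / HTTP/1.0\r\n\r\n"
packet = ipv4_packet("192.0.2.1", "192.0.2.80", 6,
                     tcp_segment(50000, 80, http_request))
print(unwrap(packet))  # (6, 80, b'GET / HTTP/1.0\r\n\r\n')
```

The HTTP bytes come back out unchanged: neither header builder ever looks inside its payload.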

Layers communicate with each other for their services, but none is part of another. Each has its own distinct functionality.

An HTTP server listens on a TCP port so that clients can connect to it. The default TCP port is 80 for HTTP and 443 for HTTPS.


Zac67

I think your question comes from a misunderstanding:

Protocols like HTTP, FTP, SMTP, ... do not necessarily use TCP; they can also use other connection-oriented protocols such as SPX or NetBIOS.

The fact that TCP is the only one of these protocols still in use today probably makes you think that HTTP requires TCP.

In the "ideal OSI model", the idea is that you can replace the upper- and lower-layer protocols independently.

IPv4 (layer 3), for example, can use different layer 2 protocols (for example Ethernet or PPP) and it can carry different layer 4 protocols (for example UDP or TCP).

And TCP (layer 4) can use different layer 3 protocols (namely: IPv4, IPv6 or IPX) and carry different layer 5 protocols (HTTP, FTP, SSH ...).

The same is true for layer 5: You can use different layer 4 protocols (such as TCP, SPX, NetBIOS ...) to carry HTTP traffic.
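A minimal Python sketch of the same idea: the HTTP/1.0 exchange below runs over a pair of connected stream sockets created with `socket.socketpair()`, which defaults to Unix domain sockets where the platform supports them, so no TCP is involved (on platforms without `AF_UNIX`, Python falls back to loopback TCP). It stands in for SPX or NetBIOS, which are hard to find today; `send_request`, `serve_one` and `read_all` are hypothetical toy helpers, not a real HTTP server.

```python
import socket


def send_request(stream: socket.socket) -> None:
    stream.sendall(b"GET /hello HTTP/1.0\r\n\r\n")


def serve_one(stream: socket.socket) -> None:
    """Toy HTTP/1.0 server: read one request, answer it, close the stream."""
    request = b""
    while not request.endswith(b"\r\n\r\n"):
        chunk = stream.recv(4096)
        if not chunk:
            break
        request += chunk
    if request.startswith(b"GET /hello "):
        stream.sendall(b"HTTP/1.0 200 OK\r\n\r\nHello, world.")
    else:
        stream.sendall(b"HTTP/1.0 404 Not Found\r\n\r\n")
    stream.close()  # in HTTP/1.0, closing the connection marks the end of the body


def read_all(stream: socket.socket) -> bytes:
    """Read until the peer closes its end."""
    data = b""
    while True:
        chunk = stream.recv(4096)
        if not chunk:
            return data
        data += chunk


client, server = socket.socketpair()
print(client.family.name)  # AF_UNIX where available
send_request(client)
serve_one(server)
print(read_all(client))
client.close()
```

The messages are small enough to sit in the socket buffers, which is what lets this single-threaded sketch send, serve and read in sequence.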

However, most of these layer-4 protocols are no longer being used today.

Martin Rosenau

TCP/IP has four defined layers: the application layer, transport layer, internet layer, and link layer. Each layer does something with the data, then passes it on to the next layer for additional handling. None of the layers deals with the data from the other layers directly; each one just adds or handles its specific part of the data and then passes it on.

Ignoring the real-world complexities involved, consider the following analogy. Suppose you want to write a letter to your friend in another country. First, you write the letter (application layer), put it into an envelope (transport layer), address the envelope (internet layer), and then put it in the mailbox (link layer). From there, the letter travels via a mail carrier to the post office, then by other means (e.g., trucks or airplanes) to the post office on the far side, then via another mail carrier to your friend's mailbox, and finally to your friend.

The TCP/IP design means that, as long as the boundary rules are followed, you can swap out the implementation of any layer for another without disrupting the entire stack. As an obvious example, the link layer might be Wi-Fi, Ethernet, microwave, IR, fiber, dial-up, etc. The internet layer doesn't care how the link layer is configured, just that it needs to route data from point A to point B. It also doesn't care what data is inside the segment created by TCP; it's just routing data across the network we call the Internet.

So you wouldn't say that one layer is a subset of another; they're all distinct layers with specific purposes. Each layer is part of a greater whole that makes internetwork data transport possible. All four layers are required for an application (ignoring some special cases, like pings), but each can be swapped in and out as needed for performance, convenience, cost, etc.

phyrfox