Blog

Intro - Basic Networking Concepts

This blog post outlines a handful of basic networking concepts, tools, and components that are important to understand for any Networking, Infrastructure, DevOps, or Site Reliability Engineer. It is not a comprehensive list, nor am I going to go in depth into each subject, however I have collected quite a few notes over the years, and wanted to aggregate and consolidate them into one place.

Concepts Covered

TCP vs UDP

Virtual Private Network (VPN)

Router

DNS Records

What happens when you type in address to URL

Control Plane vs Data Plane

Virtual LAN or VLAN (Virtual Local Area Network)

Conntrack

What is A Port?

Netstat

TLS (Transport Layer Security)

IPsec Tunnel

CAP Theorem

SRE Concepts - Monitoring, Alerting, Notifications

Certificates (SSL/ TLS) and the TLS Handshake

Border Gateway Protocol

Process vs Thread

TCP vs UDP

Both TCP and UDP use a numerical identifier for the data structures of the endpoints for host-to-host communications. Such an endpoint is known as a port and the identifier is the port number. TCP is a connection-oriented protocol and UDP is a connection-less protocol.

Transmission Control Protocol (TCP)

TCP directs packets to a specific application on a computer using a port number, and works with the IP, which defines how computers send packets of data to each other. TCP defines how to establish and maintain a network conversation through which application programs can exchange data. The Transmission Control Protocol (TCP) is one of the main protocols of the Internet protocol suite. TCP originated in the initial network implementation in which it complemented the Internet Protocol (IP). Hence, it is broadly referred to as TCP/IP.

1) LISTEN – Server is listening on a port, such as HTTP
2) SYNC-SENT – Sent a SYN request, waiting for a response
3) SYN-RECEIVED – (Server) Waiting for an ACK, occurs after sending an ACK from the server
4) ESTABLISHED – 3 way TCP handshake has completed

User Datagram Protocol (UDP)

User Datagram Protocol Communications protocol that is primarily used for establishing low-latency and loss-tolerating connections between applications on the internet. It speeds up transmissions by enabling the transfer of data before an agreement is provided by the receiving party.

(Extra) IP (Internet Protocol)

IP directs packets to a specific computer using an IP address. IP is at Layer 3 (Transport) of the OSI model (see OSI Model section below), while TCP and UDP are at Layer 4 (Transport).

TCP vs UDP

Some key differences between TCP and UDP:

TCP is connection-based, while UDP is connectionless.

TCP creates a secure communication line and verifies receipt of data, while UDP sends data without confirming receipt or checking for errors.

Speed - UDP is faster than TCP because it doesn't require additional responses from the receiver.

Data integrity - TCP only transmits complete sets of data packets, while UDP transmits whatever it can, even if some packets are lost.

Security - TCP transmissions are generally easier to keep secure than those sent via UDP.

Suitability - TCP is better for applications that prioritize data integrity, while UDP is better for applications that prioritize speed and low latency.

Examples of When TCP would be preferred, and UDP would be Preferred:

TCP:

Anything where packets dropped would be an issue:

Transferring files (photos, documents, etc...), so FTP or SFTP

Email (SMTP)

Secure Shell (SSH)

Video Streaming (e.g. Netflix)

UDP:

Where some latency or like a glitch or something isn't that big of a deal:

Real-Time Multimedia Streaming (e.g. Zoom calls)

Online Gaming

Domain Name System Queries (DNS) - UDP handles converting “www.example.com” to an IP address requests efficiently.

Simple Network Management Protocol (SNMP)

Dynamic Host Configuration Protocol (DHCP)

Routing Information Protocol (RIP)

NAT Gateway

Fundamentally, a NAT Gateway allows instances with no public IPs to access the internet. A NAT gateway is a Network Address Translation (NAT) service. Instances in a private subnet can connect to services outside your VPC but external services cannot initiate a connection with those instances. Many times NAT gateways are classified into Internet NAT gateways and Virtual Private Cloud (VPC) NAT gateways. Internet NAT gateways provide NAT services for public IP addresses, while VPC NAT gateways provide NAT services for private IP addresses. NAT Gateways provide.

It is easiest to think of NAT Gateways in terms of traffic flow. NAT gateways intercept Egress (outgoing) traffic. Or, in other words, traffic coming from inside a private network to the public network or internet. They will perform IP Translation so that the downstream service at the receiving end of the request will see the request coming from the NAT Gateway, not the service which has a private IP in a private network.

Internet Gateway

Fundamentally, an Internet Gateway (IGW) allows instances with public IPs to access the internet.

An Internet gateway is a network "node" that connects two different networks that use different protocols (rules) for communicating. In the most basic terms, an Internet gateway is where data stops on its way to or from other networks. Thanks to gateways, we can communicate and send data back and forth with each other.

Internet GW allows both inbound and outbound access to the internet. Which differs from a NAT Gateway in that the NAT Gateway only allows outbound access. Thus, an Internet GAteway allows instances with public IPs to access the internet whereas NAT Gateway allows instances with private IPs to access internet.

Routers are often Internet gateways. They are a piece of hardware that essentially connects your computer to the Internet. In home networks, it is usually something that comes with software you can install on one computer and then connect other computers to as well. Then everyone connected to your router can connect to the Internet through your ISP. While a router can be connected to more than two networks at a time, this is usually not the case for routers used at home.

In short, an Internet gateway is one of the ways that information is sent and delivered to us as we use the Internet. It is what gives us the ability to access other networks to view web pages, initiate downloads or uploads, buy things online, and more.

Default Gateway

A default gateway in simple terms is device that forwards data from one network to another. It is a node in a computer network that forwards traffic to other networks when no other route matches the destination IP address of a packet. It's the first path that information takes between systems, and it's often called the gateway. For example, in a home office with wireless internet, the router is the default gateway. In fact, the majority of time the default gateway is going to be a router.

RouteTable and Router

A routing table is a set of rules, often viewed in table format, that is used to determine where data packets traveling over an Internet Protocol (IP) network will be directed. All IP-enabled devices, including routers and switches, use routing tables.

A route table contains a set of rules, called routes, that are used to determine where network traffic from your subnet or gateway is directed.

Each subnet in your VPC must be associated with a route table, which controls the routing for the subnet (subnet route table).

Subnet

In the most basic sense, a subnet is a logical subdivision of an IP network. The practice of dividing a network into two or more networks is called subnetting. Computers that belong to a subnet are addressed with an identical most-significant bit-group in their IP addresses.

Purpose: To improve network performance. Running a vast amount of network devices on the same subnet can abstruse and complicate the network, especially if there's a lot of broadcast traffic.

For example, two subnets created in a VPC can look like: 10.0.0.0/24 and 10.1.0.0/24. Devices in each subnet are in a different broadcast domains.

Virtual Private Network (VPN)

A Virtual Private Network, or VPN, is a tool that implements a data and traffic tunneling feature. It extends a private network across a public network, and enables users to send and receive data across shared or public networks as if their computing devices were directly connected to the private network. It is a mechanism for creating a secure connection between a computing device and a computer network, or between two networks, using an insecure communication medium such as the public Internet. One downside to note is that depending on VPN options, geo location, and internet bandwidth, some VPN options may make your connection speed decrease slightly, sometimes by 10%. However, the upside of a VPN traditionally security. As a VPN is creating a point-to-point tunnel that encrypts your personal data, masks your IP address, and lets you sidestep website blocks and firewalls on the internet.

As a simple example of a use case for a VPN, say you are in Mexico and you want to access Star Sports' grid of live games, debate shows and highlights of your favorite leagues. You can establish connection with the VPN of your choice, select an Indian server that provides you with an Indian VPN address and the service will think your connection is coming from India, not from Mexico.

Peering vs VPN

Network Peering is another commonly used tool in the Inffrastucuture world. It is similer to a VPN, but has unique and key differences.

Peering - connects two networks to appear as one routed through a cloud provider’s infrastructure.

VPN - connects virtual networks over the public internet.

VPN (IPSsec) tunnel vs SSL/TLS

An IPSec tunnel allows for the implementation of a virtual private network (VPN) which an enterprise may use to securely extend its reach beyond its own network to customers, partners and suppliers. There are several types of VPN protocols for tunneling, or transmitting, data over the Internet. For example, most eCommerce sites use Secure Sockets Layer (SSL) and Transport Layer Security (TLS). Some networks utilize Secure Shell (SSH), and others use Layer 2 Tunneling Protocol (L2TP). Compared to these various types of “normal” tunnels, IPSec provides the most robust cryptographic security.

IPsec VPN is one of two common VPN protocols, or set of standards used to establish a VPN connection. IPsec is set at the IP layer, and it is often used to allow secure, remote access to an entire network (rather than just a single device). This inability to restrict users to network segments is a common concern with this protocol.

IPsec VPNs come in two types:

Tunnel mode

Transport mode

DNS

The world of computers works in 1's and 0's. Computers don't recognize names, they recognize numbers (more specifically binary). However, we as humans do not operate this way. If we were to speak the same langauge as computers, you would have to memorize all the IPs of your favorite webistes; and when you wanted to navigate to them, type that IP in the URL bar. DNS was created to solve this issue. DNS (Domain Name System) is how we map an IP address to recognizable host names. So, instead of having to memorize 192.186.60.0 if you wanted to go to google.com for example, you can just type in https://www.google.com and be taken there.

DNS Records

DNS records (aka zone files) are instructions that live in authoritative DNS servers and provide information about a domain including what IP address is associated with that domain and how to handle requests for that domain. Probably the most common, (and at the least the easiest to understand) is the A Record. An A record maps a domain name to the IP address (Version 4) of the computer hosting the domain. An A record uses a domain name to find the IP address of a computer connected to the internet. The A in A record stands for Address. Whenever you visit a web site, send an email, connect to Twitter or Facebook, or do almost anything on the Internet, the address you enter is a series of words connected with dots. Here are the other most common DNS records:

A Record - The record that holds the IP address of a domain.

AAAA Record - The record that contains the IPv6 address for a domain (as opposed to A records, which list the IPv4 address).

CNAME record - Forwards one domain or subdomain to another domain, does NOT provide an IP address.

MX record - Directs mail to an email server.

TXT record - Lets an admin store text notes in the record. These records are often used for email security.

NS record - Stores the name server for a DNS entry.

SOA record - Stores admin information about a domain.

SRV record - Specifies a port for specific services.

PTR record - Provides a domain name in reverse-lookups.

There are many other types of DNS records, however they are less commonly used. The above are what you are most likely to see or work with.

DNS Tools - Traceroute

Traceroute is a command that runs tools used for network diagnostics. These tools trace the paths data packets take from their source to their destinations, allowing administrators to better resolve connectivity issues. Before running a traceroute command, you should understand a network mechanism called “time to live” (TTL).

TTL limits how long data can “live” in an IP network. Every packet of data is assigned a TTL value. Every time a data packet reaches a hop, the TTL value is decreased by one. Another key element to understand is “round-trip time” (RTT). Traceroute ensures each hop on the way to a destination device drops a packet and sends back an ICMP error message. This means traceroute can measure the duration of time between when the data is sent and when the ICMP message is received back for each hop—giving you the RTT value for each hop.

An example of how to run traceroute:

traceroute [domain name]

traceroute facebook.com

traceroute -w 3 -q 1 -m 16 example.com

DNS Tools - Dig

Dig (Domain Information Groper) is a powerful command-line tool for querying DNS name servers. The dig command, allows you to query information about various DNS records, including host addresses, mail exchanges, and name servers. It is the most commonly used tool among system administrators for troubleshooting DNS problems because of its flexibility and ease of use.

dig IP or DOMAIN_NAME

dig HOSTNAME

dig +short HOSTNAME # (only ips)

dig @ # (bypass ttl)

dig google.com +trace

dig -x 172.217.14.238 # (look up a domain name by its IP address)

dig google.com +noall +answer. # (Detailed about answer section)

dig +noall +answer -x 172.217.14.238

Display all: dig google.com ANY

dig google.com +short

Detailed about answer section: dig google.com +noall +answer

Trace the route to get to DNS server: dig google.com +trace

To look up a domain name by its IP address: dig -x 172.217.14.238

The -x option allows you to specify the IP address instead of a domain name. This can be combined with other options:

dig +noall +answer -x 172.217.14.238

Bypass ttl:

dig @HOST

dig @8.8.8.8 google.com

More examples: https://phoenixnap.com/kb/linux-dig-command-examples

What happens when you type in address to URL

The browser checks the cache for a DNS (domain name system) record to find the corresponding IP address. To find the DNS record, the browser checks four caches.

First, it checks the browser cache. The browser maintains a repository of DNS records for a fixed duration for websites you have previously visited. So, it is the first place to run a DNS query.

Second, the browser checks the OS cache. If it is not in the browser cache, the browser will make a system call (i.e., gethostname on Windows) to your underlying computer OS to fetch the record since the OS also maintains a cache of DNS records.

Third, it checks the router cache. If it’s not on your computer, the browser will communicate with the router that maintains its’ own cache of DNS records.

Fourth, it checks the ISP cache. If all steps fail, the browser will move on to the ISP. Your ISP maintains its’ own DNS server, which includes a cache of DNS records, which the browser would check with the last hope of finding your requested URL.

If the requested URL is not in the cache, ISP’s DNS server initiates a DNS query to find the IP address of the server that hosts it.

The purpose of a DNS query is to search multiple DNS servers on the internet until it finds the correct IP address for the website. This type of search is called a recursive search since the search will repeatedly continue from a DNS server to a DNS server until it either finds the IP address we need or returns an error response saying it was unable to find it.

OSI Model

The open systems interconnection (OSI) model is a conceptual model created by the International Organization for Standardization which enables diverse communication systems to communicate using standard protocols. Essentially, the OSI provides a standard for different computer systems to be able to communicate with each other. The OSI Model can be seen as a universal language for computer networking. It is based on the concept of splitting up a communication system into seven abstract layers, each one stacked upon the last.

Important to note that when the OSI model was first created, the top three layers (Session, Presentation, and Application) had a specific function that was independent from the rest. Howver, currently the distinction between these layers can be somewhat vague. Every application is infact free to impliment the functions of Layer 5, 6, and 7 as they choose. Therefor, these three layers are often considered as a single universal "Application" layer. In fact, the other popular internet communication model, called the TCP/IP Model, does exactly that.

Layer 7) Application Layer

Layer 6) Presentation Layer

Layer 5) Session Layer

Layer 4) Transport Layer

Service to Service

Distinguish Data Streams

Addressing Scheme - Ports

0-65535: TCP - Reliability

0-65535: UDP - Effeciency

Layer 3) Network Layer

End to End

Ip Addresses

Routers

Any device with an IP Address

Layer 2) Data Link Layer

Hop to Hop

MAC Addresses

Switches

Layer 1) Physical Layer

Transporting Bits

Physical Hardware/ Cables

Wires, Cables, Wi-Fi, Repeaters, Hubs

Hub

Allows many computers to communicate with each other at once. A Hub (or Ethernet hub, active hub, network hub, repeater hub, multiport repeater) is a network hardware device for connecting multiple Ethernet devices together and making them act as a single network segment. It has multiple input/output (I/O) ports, in which a signal introduced at the input of any port appears at the output of every port except the original incoming. A hub works at the physical layer. A layer 1 network device such as a hub transfers data but does not manage any of the traffic coming through it. Any packet entering a port is repeated to the output of every other port except for the port of entry. Specifically, each bit or symbol is repeated as it flows in. A repeater hub can therefore only receive and forward at a single speed. Dual-speed hubs internally consist of two hubs with a bridge between them. Since every packet is repeated on every other port, packet collisions affect the entire network, limiting its overall capacity.

Modem

Connects LANs to the internet. “Modulator” Modulates or transforms Digital to analogue and vice versa. A modem transmits data by modulating one or more carrier wave signals to encode digital information, while the receiver demodulates the signal to recreate the original digital information. The goal is to produce a signal that can be transmitted easily and decoded reliably. Modems can be used with almost any means of transmitting analog signals, from light-emitting diodes to radio.

Switch

A network switch (also called switching hub, bridging hub, and, by the IEEE, MAC bridge) is networking hardware that connects devices on a computer network by using packet switching to receive and forward data to the destination device. A network switch is a multiport network bridge that uses MAC addresses to forward data at the data link layer (layer 2) of the OSI model. Some switches can also forward data at the network layer (layer 3) by additionally incorporating routing functionality. Such switches are commonly known as layer-3 switches or multilayer switches. Unlike repeater hubs, which broadcast the same data out of each port and let the devices pick out the data addressed to them, a network switch learns the Ethernet addresses of connected devices and then only forwards data to the port connected to the device to which it is addressed. Switches are most commonly used as the network connection point for hosts at the edge of a network. In the hierarchical internetworking model and similar network architectures, switches are also used deeper in the network to provide connections between the switches at the edge. The network switch plays an integral role in most modern Ethernet local area networks (LANs). Mid-to-large-sized LANs contain a number of linked managed switches. Small office/home office (SOHO) applications typically use a single switch, or an all-purpose device such as a residential gateway to access small office/home broadband services such as DSL or cable Internet. In most of these cases, the end-user device contains a router and components that interface to the particular physical broadband technology.

Control Plane vs Data Plane

In networking, a plane is an abstract conception of where certain processes take place. The term is used in the sense of "plane of existence." The two most commonly referenced planes in networking are the control plane and the data plane (also known as the forwarding plane).

The control plane is the part of a network that controls how data packets are forwarded — meaning how data is sent from one place to another. The process of creating a routing table, for example, is considered part of the control plane. Routers use various protocols to identify network paths, and they store these paths in routing tables.

In contrast to the control plane, which determines how packets should be forwarded, the data plane actually forwards the packets. The data plane is also called the forwarding plane.

Think of the control plane as being like the stoplights that operate at the intersections of a city.

Meanwhile, the data plane (or the forwarding plane) is more like the cars that drive on the roads, stop at the intersections, and obey the stoplights.

Control Plane

In Routing... control plane refers to the all functions and processes that determine which path to use to send the packet or frame. Control plane is responsible for populating the routing table, drawing network topology, forwarding table and hence enabling the data plane functions. Means here the router makes its decision. In a single line it can be said that it is responsible for How packets should be forwarded.

Data Plane

In Routing... data plane refers to all the functions and processes that forward packets/frames from one interface to another based on control plane logic. Routing table, forwarding table and the routing logic constitute the data plane function. Data plane packet goes through the router and incoming and outgoing of frames are done based on control plane logic. Means in single line it can be said that it is responsible for moving packets from source to destination. It is also called the Forwarding plane.

Network Topology

Network topology refers to the way data flows in a network. The control plane establishes and changes network topology. Again, think of the stoplights that function at the intersections of a city. Network topology is like the way that the roads are arranged, and the computing devices within the network are like the destinations that those roads lead to

Virtual LAN or VLAN (Virtual Local Area Network)

A virtual LAN (VLAN) is any broadcast domain that is partitioned and isolated in a computer network at the data link layer (OSI layer 2). LAN is the abbreviation for local area network and in this context virtual refers to a physical object recreated and altered by additional logic. VLANs work by applying tags to network frames and handling these tags in networking systems – creating the appearance and functionality of network traffic that is physically on a single network but acts as if it is split between separate networks. In this way, VLANs can keep network applications separate despite being connected to the same physical network, and without requiring multiple sets of cabling and networking devices to be deployed.

VLANs allow network administrators to group hosts together even if the hosts are not directly connected to the same network switch. Because VLAN membership can be configured through software, this can greatly simplify network design and deployment. Without VLANs, grouping hosts according to their resource needs the labor of relocating nodes or rewiring data links. VLANs allow devices that must be kept separate to share the cabling of a physical network and yet be prevented from directly interacting with one another. This managed sharing yields gains in simplicity, security, traffic management, and economy.

For example, a VLAN can be used to separate traffic within a business based on individual users or groups of users or their roles (e.g. network administrators), or based on traffic characteristics (e.g. low-priority traffic prevented from impinging on the rest of the network's functioning). Many Internet hosting services use VLANs to separate customers' private zones from one other, allowing each customer's servers to be grouped in a single network segment no matter where the individual servers are located in the data center. Some precautions are needed to prevent traffic "escaping" from a given VLAN, an exploit known as VLAN hopping. To subdivide a network into VLANs, one configures network equipment. Simpler equipment might partition only each physical port (if even that), in which case each VLAN runs over a dedicated network cable. More sophisticated devices can mark frames through VLAN tagging, so that a single interconnect (trunk) may be used to transport data for multiple VLANs. Since VLANs share bandwidth, a VLAN trunk can use link aggregation, quality-of-service prioritization, or both to route data efficiently.

Conntrack

Component of Netfilter used to track the state of connections to and from the machine.

NAT relies on Conntrack to function.

SNAT (source network address translation) - rewrites source ip address.

DNAT (destination network address translation) - rewrites the destination ip address

Identifies connections as tuple:

Source address

Source port

Destination address

Destination port

L4 Protocol

What is A Port?

A port is a virtual point where network connections start and end. Ports are software-based and managed by a computer's operating system. Each port is associated with a specific process or service. Ports allow computers to easily differentiate between different kinds of traffic: emails go to a different port than webpages, for instance, even though both reach a computer over the same Internet connection.

Port Number

Ports are standardized across all network-connected devices, with each port assigned a number. Most ports are reserved for certain protocols — for example, all Hypertext Transfer Protocol (HTTP) messages go to port 80. While IP addresses enable messages to go to and from specific devices, port numbers allow targeting of specific services or applications within those devices. Ports are a transport layer (layer 4) concept. Only a transport protocol such as the Transmission Control Protocol (TCP) or User Datagram Protocol (UDP) can indicate which port a packet should go to. TCP and UDP headers have a section for indicating port numbers. Network layer protocols — for instance, the Internet Protocol (IP) — are unaware of what port is in use in a given network connection. In a standard IP header, there is no place to indicate which port the data packet should go to. IP headers only indicate the destination IP address, not the port number at that IP address.

Tools To Check Open Ports

Lsof (list open files)

sudo lsof -i -P -n | grep LISTEN

See a specific port such as 22:

sudo lsof -i:22

i4 means only show ipv4 address and ports -P and -n fast output.

lsof -Pn -i4

lsof -Pn -i4 | grep LISTEN

This will tell you if any program is responding to DNS requests:

lsof -i 4udp@127.0.0.1:53 | grep -v ^COMMAND | awk '{print $1}'

There may be other programs listening for UDP requests on your hosts. This command, run as root will show you these:

lsof -i 4udp@127.0.0.1 | grep -v ^COMMAND | awk '{print $1}'

Netstat

sudo netstat -tulpn | grep LISTEN

Netcat (nc)

nc -z -v HOST PORT

nc -z -v example.com 80

nc -vz 10.82.0.139 443

Nmap

sudo nmap -sTU -O IP-address-Here

sudo nmap -sS IP

nmap IP -sT

sudo nmap 10.145.0.23 -sA

nmap 192.168.1.1 -p 21

nmap 192.168.1.1 -p 21-100

All ports:

nmap 192.168.1.1 -p-

Discovery only on ports x, no port scan:

nmap -iR 10 -PS22-25,80,113,1050,35000 -v -sn

Query the Internal DNS for hosts, list targets only

nmap 192.168.1.1-50 -sL --dns-server 192.168.1.1.

Traceroute to random targets, no port scan

nmap -iR 10 -sn -traceroute

Telnet

An application protocol used on the Internet or local area network to provide a bidirectional interactive text-oriented communication facility using a virtual terminal connection. User data is interspersed in-band with Telnet control information in an 8-bit byte oriented data connection over the Transmission Control Protocol (TCP). Alternatively, you can use the nc (or netcat) command in Terminal, as a replacement for telnet.

Telnet:

telnet IP PORT

(Alternatively) using Curl:

curl -v telnet://IP:PORT

curl -v telnet://127.0.0.1:23

Netstat

Netstat is a utility tool that displays network connections for Transmission Control Protocol, routing tables, and a number of network interface and network protocol statistics. The netstat command works in conjunction with the ifconfig command to provide a status condition of the TCP/IP network interface. The command netstat -in for example uses the -i flag to present information on the network interfaces while the -n flag prints the IP addresses instead of the host names.

netstat -tupln | grep LISTEN

netstat -plnt

netstat -lnp

netstat -tnl

netstat -n -o

netstat -r

netstat -ap TCP

All sockets:

netstat -a

Statistics:

netstat -s

Show listening sockets:

nestat -l

TCP data:

netstat -t

UDP data:

netstat -u

See route Tables (for exampe on your on MAC):

netstat -nr

See default gateway (for exampe on your on MAC):

netstat -nr | grep default

TLS (Transport Layer Security)

Transport Layer Security (TLS), the successor of the now-deprecated Secure Sockets Layer (SSL), is a cryptographic protocol designed to provide communications security over a computer network. Several versions of the protocol are widely used in applications such as email, instant messaging, and voice over IP, but its use as the Security layer in HTTPS remains the most publicly visible. The TLS protocol aims primarily to provide privacy and data integrity between two or more communicating computer applications. It runs in the application layer of the Internet and is itself composed of two layers: the TLS record and the TLS handshake protocols.

TLS evolved from a previous encryption protocol called Secure Sockets Layer (SSL), which was developed by Netscape. TLS version 1.0 actually began development as SSL version 3.1, but the name of the protocol was changed before publication in order to indicate that it was no longer associated with Netscape. Because of this history, the terms TLS and SSL are sometimes used interchangeably. HTTPS is an implementation of TLS encryption on top of the HTTP protocol, which is used by all websites as well as some other web services. Any website that uses HTTPS is therefore employing TLS encryption.

For a website or application to use TLS, it must have a TLS certificate installed on its origin server (the certificate is also known as an "SSL certificate" because of the naming confusion described above). A TLS certificate is issued by a certificate authority to the person or business that owns a domain. The certificate contains important information about who owns the domain, along with the server's public key, both of which are important for validating the server's identity.

IPsec Tunnel

IPsec tunnel mode is used between two dedicated routers, with each router acting as one end of a virtual "tunnel" through a public network. In IPsec tunnel mode, the original IP header containing the final destination of the packet is encrypted, in addition to the packet payload. To tell intermediary routers where to forward the packets, IPsec adds a new IP header. At each end of the tunnel, the routers decrypt the IP headers to deliver the packets to their destinations. An Internet Protocol Security (IPSec) tunnel is a set of standards and protocols originally developed by the Internet Engineering Task Force (IETF) to support secure communication as packets of information are transported from an IP address across network boundaries and vice versa. An IPSec tunnel allows for the implementation of a virtual private network (VPN) which an enterprise may use to securely extend its reach beyond its own network to customers, partners and suppliers.

There are several types of VPN protocols for tunneling, or transmitting, data over the Internet. For example, most eCommerce sites use Secure Sockets Layer (SSL) and Transport Layer Security (TLS). Some networks utilize Secure Shell (SSH), and others use Layer 2 Tunneling Protocol (L2TP). Compared to these various types of “normal” tunnels, IPSec provides the most robust cryptographic security.

CAP Theorem

Concept that a distributed database system can only have two of the three:

Consistency

Availability

Partitioning

Consistency

All nodes see the same data at the same time. Simply put, performing a read operation will return the value of the most recent write operation causing all nodes to return the same data. A system has consistency if a transaction starts with the system in a consistent state, and ends with the system in a consistent state.

In this model, a system can (and does) shift into an inconsistent state during a transaction, but the entire transaction gets rolled back if there is an error during any stage in the process. Imagine we have 2 different records ("Bulbasaur" and "Pikachu") at different timestamps. The output on the third partition is "Pikachu", the latest input. However, the nodes will need time to update and will not be Available on the network as often. The data is same in all the server nodes(leader or follower) no matter where you read it from (either node A or B) data is always identical implying the system has nearly instantaneous sync capabilities

Availability

Every request gets a response. (success/failure). Achieving availability in a distributed system requires that the system remains operational 100% of the time. Every client gets a response, regardless of the state of any individual node in the system.

This metric is trivial to measure: either you can submit read/write commands, or you cannot. Hence, the databases are time independent as the nodes need to be available online at all times. Unlike the previous example, we do not know if "Pikachu" or "Bulbasaur" was added first. The output could be either one. Hence why, high availability isn't feasible when analyzing streaming data at high frequency.

Partitioning

When dealing with modern distributed systems, Partition Tolerance is not an option. It's a necessity. Hence, we have to trade between Consistency and Availability. System continues to run, despite number of messages being delayed by the network between nodes.

A system that is partition-tolerant can sustain any amount of network failure that doesn't result in a failure of the entire network. Data records are sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages. system continues to respond, even after some of the server nodes fail. Implies that the system maintains all the requests/responses function somehow.

Examples

CP [Consistency/Partition Tolerance] Systems

Node01 had the data but Node02 was down, so instead of Node01 sending back the data it had it sent and error because it couldn't determine consistency.

Examples: MongoDB, Memcache, Hbase - risk of some data becoming unavailable.

AP [Availability/Partition Tolerance] Systems

What if in the above scenario, the system instead of sending ERROR (in case Node02 is down); it sends the data received from Node01. The client got the data, but was it the latest data copy stored in the system in the past? You cannot decide. You chose availability over consistency. These are AP systems.

Examples: Cassandra, Couch, Riak.

CA [Consistency/Availability] Systems

In a distributed environment, we cannot avoid "P" of CAP. So we have to choose between CP or AP systems. If we desire to have a consistent and available system, then we must forget about partition tolerance it's possible only in non-distributed systems such as oracle and SQL server.

SRE Concepts - Monitoring, Alerting, Notifications

Very brief high level overview of monitoring and alerting, from the DevOps and SRE perspective. This is meant to just introduce these concepts, and provide a short foundational definition to each. These are vary large and important concepts, especially when running a distributed system at enterprise level. If interested in more detail into SRE, and curious the difference in SRE vs DevOps, see my blog post Core Principals of SRE (Site Reliability Engineering) and SRE vs DevOps.

Monitoring

Monitoring is the process of collecting, aggregating, and analyzing the metrics provided by the components in your environment by using a monitoring solution. A monitoring system enables you to gather statistics, store, centralize and visualize metrics, events, logs, and traces in real time. A good monitoring system enables you to see the bigger picture of what is going on across your infrastructure at any time, all the time, and in real time. Monitoring solutions enable you to sample or aggregate both current and historical data. While fresh metrics are essential for troubleshooting any new issues, they are also valuable when analyzed over longer periods of time. Doing this provides insight you can see only when you zoom out and see changes, patterns, and trends over time. Good monitoring solutions will make your job easier by providing you with out-of-the-box or customizable graphs and dashboards where you can instantly see how all your apps and services interact with each other. Instead of finding your way around tables, you can now understand what’s happening at a glance.

Alerting

Alerting is the reactive element of a monitoring system that triggers actions whenever metric values change. Therefore, the most common and basic type of alert includes a threshold and an action the system needs to perform whenever alert rule conditions are met. Besides threshold-based alerts, monitoring systems often include anomaly detection-based alerting and heartbeat alerts. Alerts are often the main “interaction interface” DevOps engineers have with monitoring systems. While pretty dashboards are useful, too, you don’t want to have to look at them all the time to spot problems. You want to carry on with your life and work and be alerted when there is something you need to address.

There are two common outputs of alerts:

Notifications

Automated actions

Notifications are the most common output of alerts. Their main purpose is to alert a human being to a problem, providing as much context along the way to help the person troubleshoot and solve problems faster. Effective alert notifications must contain enough information to paint a clear picture of what happened, where, and when so that engineers can easily and quickly understand the root cause and fix it. Automated actions are not as common in monitoring systems. They can be useful in situations when automated actions are safe to perform without human intervention. An example of such action may be an automated restart of a problematic service.

Certificates (SSL/ TLS) and the TLS Handshake

CA-signed certificate

Digital document issued by a certificate authority that binds a public key to the identity of the certificate owner, thereby enabling the certificate owner to be authenticated. An identity certificate issued by a CA is digitally signed with the private key of the certificate authority.

A Certificate (also known as digital certificate, public key certificate, digital ID, or identity certificate) is a signed certificate that is obtained from a certificate authority by generating a certificate signing request (CSR). Signed certificate that is obtained from a certificate authority by generating a certificate signing request (CSR).

It typically contains:

(1) distinguished name and public key of the server or clients.

(2) common name and digital signature of the certificate authority.

(3) period of validity (certificates expire and must be renewed).

(4) administrative and extended information.

The certificate authority analyzes the CSR fields, validates the accuracy of the fields, generates a certificate, and sends it to the requester. A certificate can also be self-signed and generated by any one of many tools available, such as IBM® Sterling Certificate Wizard or OpenSSL. These tools can generate a digital certificate file and a private key file in PEM format, which you can combine using any ASCII text editor to create a key certificate file.

Certificate authority (CA)

An organization that issues digitally-signed certificates. The certificate authority authenticates the certificate owner's identity and the services that the owner is authorized to use, issues new certificates, renews existing certificates, and revokes certificates belonging to users who are no longer authorized to use them. The CA digital signature is assurance that anybody who trusts the CA can also trust that the certificate it signs is an accurate representation of the certificate owner.

Certificate signing request (CSR)

Message sent from an applicant to a CA in order to apply for a digital identity certificate. Before creating a CSR, the applicant first generates a key pair, keeping the private key secret. The CSR contains information identifying the applicant (such as a directory name in the case of an X.509 certificate), and the public key chosen by the applicant. The CSR may be accompanied by other credentials or proofs of identity required by the certificate authority, and the certificate authority may contact the applicant for further information.

Private Key

String of characters used as the private, “secret” part of a complementary public-private key pair. The symmetric cipher of the private key is used to sign outgoing messages and decrypt data that is encrypted with its complementary public key. Data that is encrypted with a public key can only be decrypted using its complementary private key. The private key is never transmitted and should never be shared with a trading partner.

Public Key

String of characters used as the publicly distributed part of a complementary public-private key pair. The asymmetric cipher of the public key is used to confirm signatures on incoming messages and encrypt data for the session key that is exchanged between server and client during negotiation for an SSL/TLS session. The public key is part of the ID (public key) certificate. This information is stored in the key certificate file and read when authentication is performed.

Session Key

Asymmetric cipher used by the client and server to encrypt data. It is generated by the SSL software. Trusted root certificate file (also known as root certificate file File that contains one or more trusted root certificates used to authenticate ID (public) certificates sent by trading partners during the Sterling Connect:Direct® protocol handshake.

Cipher Suite

A cryptographic key exchange algorithm that enables you to encrypt and decrypt files and messages with the SSL or TLS protocol.

Client Authentication

A level of authentication that requires the client to authenticate its identity to the server by sending its certificate.

Key Certificate File

File that contains the encrypted private key and the ID (public key) certificate. This file also contains the certificate common name that can be used to provide additional client authentication.

Passphrase

Passphrase used to access the private key.

Self-Signed Certificate

Digital document that is self-issued, that is, it is generated, digitally signed, and authenticated by its owner. Its authenticity is not validated by the digital signature and trusted key of a third-party certificate authority. To use self-signed certificates, you must exchange certificates with all your trading partners.

How TLS/SSL Works

The industry-standard way to add encryption for data in motion is to use TLS (the successor to SSL). There are many examples online explaining how TLS works, but here are the basics:

Some entity decides to be a "Certificate Authority" ("CA") meaning it will issue TLS certificates to websites or other services.

An entity becomes a Certificate Authority by creating a public/private key pair and publishing the public portion (typically known as the "CA Cert").

The private key is kept under the tightest possible security since anyone who possesses it could issue TLS certificates as if they were this Certificate Authority.

In fact, the consequences of a CA's private key being compromised are so disastrous that CA's typically create an "intermediate" CA keypair with their "root" CA key, and only issue TLS certificates with the intermediate key.

Your client (e.g. a web browser) can decide to trust this newly created Certificate Authority by including its CA Cert (the CA's public key) when making an outbound request to a service that uses the TLS certificate.

When CAs issue a TLS certificate ("TLS cert") to a service, they again create a public/private keypair, but this time the public key is "signed" by the CA.

That public key is what you view when you click on the lock icon in a web browser and what a service "advertises" to any clients such as web browsers to declare who it is. When we say that the CA signed a public key, we mean that, cryptographically, any possessor of the CA Cert can validate that this same CA issued this particular public key.

The public key is more generally known as the TLS cert.

The private key created by the CA must be kept secret by the service since the possessor of the private key can "prove" they are whoever the TLS cert (public key) claims to be as part of the TLS protocol.

How does that "proof" work? Your web browser will attempt to validate the TLS cert in two ways:

First, it will ensure this public key (TLS cert) is in fact signed by a CA it trusts.

Second, using the TLS protocol, your browser will encrypt a message with the public key (TLS cert) that only the possessor of the corresponding private key can decrypt. In this manner, your browser will be able to come up with a symmetric encryption key it can use to encrypt all traffic for just that one web session.

Now your client/browser has:

Declared which CA it will trust.

Verified that the service it's connecting to possesses a certificate issued by a CA it trusts.

Used that service's public key (TLS cert) to establish a secure session.

SPIFFE

When talking about TLS and Trust Domains, I think it would be appropriate to briefly touch on SPIFFE as well. SPIFFE, the Secure Production Identity Framework for Everyone, is a set of open-source standards for securely identifying software systems in dynamic and heterogeneous environments. Systems that adopt SPIFFE can easily and reliably mutually authenticate wherever they are running. (Example: Istio) It is a set of open-source specifications for a framework capable of bootstrapping and issuing identity to services across heterogeneous environments and organizational boundaries. The heart of these specifications is the one that defines short lived cryptographic identity documents – called SVIDs via a simple API. Workloads can then use these identity documents when authenticating to other workloads, for example by establishing a TLS connection or by signing and verifying a JWT token.

A SPIFFE ID is a string that uniquely and specifically identifies a workload. SPIFFE IDs may also be assigned to intermediate systems that a workload runs on (such as a group of virtual machines). SPIFFE IDs are a Uniform Resource Identifier (URI) which takes the following format: spiffe://trust domain/workload identifier. The workload identifier uniquely identifies a specific workload within a trust domain.

The trust domain corresponds to the trust root of a system. A trust domain could represent an individual, organization, environment or department running their own independent SPIFFE infrastructure. All workloads identified in the same trust domain are issued identity documents that can be verified against the root keys of the trust domain.

Border Gateway Protocol (BGP)

BGP is a dynamic Routing protocol used to manage how packets get routed between edge routers on the internet. It is a set of rules that allows networks to communicate and exchange routing information. When someone submits data via the Internet, BGP is responsible for looking at all of the available paths that data could travel and picking the best route, which usually means hopping between autonomous systems. BGP is the protocol that makes the Internet work by enabling data routing. When a user in Singapore loads a website with origin servers in Argentina, BGP is the protocol that enables that communication to happen quickly and efficiently.

The Internet is a network of networks. It is broken up into hundreds of thousands of smaller networks known as autonomous systems (ASes). Each of these networks is essentially a large pool of routers run by a single organization.

Each network is assigned a BGP ASN (autonomous system number) to designate a single administrative entity or corporation that represents a common and clearly defined routing policy on the internet. BGP is a standardized exterior gateway protocol designed to exchange routing and reachability information among autonomous systems (AS) on the Internet. BGP is classified as a path-vector routing protocol, and it makes routing decisions based on paths, network policies, or rule-sets configured by a network administrator.

For routing, BGP determines the best path for data to travel between networks, considering all available options. BGP enables networks to announce themselves as destinations or routes to destinations, and allows network operators to control traffic flow, optimize bandwidth, and improve network performance.

Route Reflector

Route reflectors reduce the number of connections required in an AS. A single router (or two for redundancy) can be made a route reflector: other routers in the AS need only be configured as peers to them. A route reflector offers an alternative to the logical full-mesh requirement of internal border gateway protocol (IBGP). A RR acts as a focal point for IBGP sessions. The purpose of the RR is concentration. Multiple BGP routers can peer with a central point, the RR – acting as a route reflector server – rather than peer with every other router in a full mesh. All the other IBGP routers become route reflector clients.

This approach, similar to OSPF's DR/BDR feature, provides large networks with added IBGP scalability. In a fully meshed IBGP network of 10 routers, 90 individual CLI statements (spread throughout all routers in the topology) are needed just to define the remote-AS of each peer: this quickly becomes a headache to administer. A RR topology could cut these 90 statements down to 18, offering a viable solution for the larger networks administered by ISPs.

A route reflector is a single point of failure, therefore at least a second route reflector may be configured in order to provide redundancy. As it is an additional peer for the other 10 routers, it comes with the additional statement count to double that minus 2 of the single Route Reflector setup. An additional 11*2-2=20 statements in this case due to adding the additional Router. Additionally, in a BGP multipath Environment this also can benefit by adding local switching/Routing throughput if the RRs are acting as traditional Routers instead of just a dedicated Route Reflector Server role.

Rules

RR (Route Reflector) servers propagate routes inside the AS based on the following rules:

If a route is received from a non-client peer, reflect to clients only and EBGP peers.

If a route is received from a client peer, reflect to all non-client peers and also to client peers, except the originator of the route and reflect to EBGP peers.

BGP confederation

Confederations are sets of autonomous systems. In common practice, only one of the confederation AS numbers is seen by the Internet as a whole. Confederations are used in very large networks where a large AS can be configured to encompass smaller more manageable internal AS's. The confederated AS is composed of multiple AS's, each confederated AS alone has iBGP fully meshed and has connections to other AS's inside the confederation. Even though these AS's have eBGP peers to AS's within the confederation, the AS's exchange routing as if they used iBGP.

In this way, the confederation preserves next hop, metric, and local preference information. To the outside world, the confederation appears to be a single AS. Confederations can be used in conjunction with route reflectors. Both confederations and route reflectors can be subject to persistent oscillation unless specific design rules, affecting both BGP and the interior routing protocol, are followed.

Process vs Thread

A process is a program that is actively running in memory, and it has its own memory space, program code, and system resources. Processes are independent of each other and don't share memory.

A thread is a lightweight unit of execution within a process that shares the process's memory space. Threads within the same process can communicate and coordinate more easily than processes.

Main Difference

Process is a self-contained unit of execution that runs independently
Thread is the smallest unit of execution within a process.

Switching threads doesn't require interaction with the OS, but switching processes does. A failure in one process doesn't directly affect other processes, but a failure in one thread can affect other threads in the same process. Processes are isolated and require less synchronization, but threads require careful synchronization due to shared resources.

Sibling "threads" (in most operating systems) share the same virtual address space, the same sockets and open files, all the same resources
"Processes," on the other hand are isolated/protected from one another, and they share nothing except when they explicitly request to share some specific thing.
In an OS that has both "processes" and "threads," a process often can be thought of as a container for one or more threads and, for all of the resources that they share.
For example, Microsoft Windows supports preemptive multitasking, which creates the effect of simultaneous execution of multiple threads from multiple processes. On a multiprocessor computer, the system can simultaneously execute as many threads as there are processors on the computer.