
Introduction to Scapy

Updated: Jul 16, 2021


Scapy

In this article we will see what Scapy is and how it works. Before that, we need to understand networks, protocols and their types, TCP port numbers, and packets. Let's start.


What is a network?


A network consists of two or more computers that use a set of common communication protocols over digital interconnections for the purpose of sharing information. Computer networks are categorized as Local Area Networks (LANs), Metropolitan Area Networks (MANs), and Wide Area Networks (WANs) based on the maximum distance the network spans. Networks that connect hosts in a room, a building, or the buildings of a campus are called LANs. The distance between hosts on a LAN can be anywhere from a few meters to about one kilometer. Networks that connect hosts within a city, or between small cities, are known as MANs. The distance between hosts on a MAN is about one to 20 kilometers. Networks that connect hosts within a state or country are known as WANs. The distance between hosts on a WAN ranges from tens of kilometers to thousands of kilometers.


A protocol is a set of rules that allows two or more devices to exchange information over the network. Protocols in routers determine a packet's path from source to destination. Hardware-implemented protocols in the network interface cards of two physically connected computers control the flow of bits on the wire between the two computers. A congestion control protocol controls the rate at which packets are transmitted between sender and receiver. Protocols are running everywhere on the internet.


Transmission Control Protocol (TCP) is a set of rules for sending data in packet form over the network and ensuring that the data is delivered successfully to its destination. Internet Protocol (IP) is a set of rules for sending and receiving data between IP addresses. An IP address is the unique identifier of a device connected to the network. The Internet Protocol has two versions, IPv4 and IPv6.


IPv4


IP stands for Internet Protocol and v4 stands for version 4. Its header has 12 fixed fields, and each IP address is 32 bits long (equivalent to 4 bytes). IP addresses are typically written in so-called dotted-decimal notation: each byte of the address is written in decimal form, so each value is between 0 and 255. For example, a typical address would be 193.32.216.9. The 193 is the decimal equivalent of the first 8 bits of the address, the 32 is the decimal equivalent of the second 8 bits, and so on.
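To make the dotted-decimal idea concrete, here is a minimal sketch using Python's standard ipaddress module. It takes the example address from the text, splits it into its four bytes, and rebuilds the 32-bit value by hand. The variable names are only illustrative.

```python
import ipaddress

# The example address from the text, in dotted-decimal notation
addr = ipaddress.IPv4Address("193.32.216.9")

# Each IPv4 address is a single 32-bit integer under the hood
as_int = int(addr)

# packed gives the 4 raw bytes; each byte is one dotted-decimal field
octets = list(addr.packed)

# Rebuild the 32-bit value by hand: shift each byte into its 8-bit slot
rebuilt = 0
for octet in octets:
    rebuilt = (rebuilt << 8) | octet

print(octets)                  # the four decimal fields
print(as_int == rebuilt)       # both routes give the same integer
print(format(as_int, "032b"))  # the address as 32 bits
```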


IPv6


New networks and IP nodes are being attached to the internet at a breathtaking rate. To respond to this need for a larger IP address space, a new version of the protocol, IPv6, was developed. IPv6 moves from a 32-bit to a 128-bit address space, which allows about 340 undecillion unique addresses. An IPv6 address is written as 8 groups of hexadecimal digits, so it contains both letters and numbers, and a run of all-zero groups can be shortened to "::". For example, a typical address would be 2002:db8::8a2f:362:7797. IPv6 has three types of addresses. In addition to unicast and multicast addresses, a new type called an anycast address has been introduced, which allows a packet addressed to an anycast address to be delivered to any one of a group of hosts.
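The example address shows fewer than 8 groups because it is written in compressed form. As a small sketch, the standard ipaddress module can expand it back to all 8 groups and confirm the 128-bit size.

```python
import ipaddress

# The example address from the text, in compressed form
addr = ipaddress.IPv6Address("2002:db8::8a2f:362:7797")

# exploded writes out all 8 groups, restoring the zeros hidden by "::"
groups = addr.exploded.split(":")

print(addr.exploded)
print(len(groups))         # number of 16-bit groups
print(addr.max_prefixlen)  # address length in bits
print(2 ** 128)            # size of the IPv6 address space
```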


TCP port number


An IP address alone is not sufficient to run a network application. On a computer running multiple applications or services, the IP address identifies only the device, not its applications. The port number identifies the application or service running on the computer. A TCP port number is used together with an IP address to uniquely identify each application. A port is a 16-bit unsigned integer, so the total number of ports available in the TCP/IP model is 65,536, and port numbers range from 0 to 65,535. Here is an example of an address with a port number: 193.32.216.9:7. In this case 193.32.216.9 is the IP address and 7 is the port number.
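As a rough illustration of how an address and a port fit together, the sketch below splits the example 193.32.216.9:7 into its two parts and checks that the port fits in 16 bits. The helper name parse_endpoint is made up for this example and is not part of any library.

```python
import ipaddress

def parse_endpoint(endpoint):
    """Split 'address:port' into a validated (IPv4Address, port) pair."""
    host, _, port_text = endpoint.rpartition(":")
    address = ipaddress.IPv4Address(host)  # raises ValueError if malformed
    port = int(port_text)
    # A port is a 16-bit unsigned integer, so it must fit in 0..65535
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port out of range: {}".format(port))
    return address, port

address, port = parse_endpoint("193.32.216.9:7")
print(address, port)
print(2 ** 16)  # number of distinct 16-bit port values
```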


What is Scapy?


Scapy is a Python program and one of the best packet manipulation tools. It enables the user to send, sniff, dissect, and forge network packets, which makes it possible to construct tools that scan, probe, or attack networks. In short, Scapy is a very powerful packet manipulation program. Tracerouting, probing, unit testing, and network scanning are some of the tasks Scapy handles. It can replace many well-known tools such as hping, arpspoof, arp-sk, arping, and p0f. We can also set the source IP and destination IP of a packet and then send it from the source to the destination.


Scapy program

Scapy mainly does two things: sending packets and receiving answers. It is also capable of sending invalid frames, injecting your own 802.11 frames, and combining techniques. Scapy sends packets, receives answers, matches requests with answers, and returns a list of packet couples (request, answer) and a list of unmatched packets. This has the big advantage over tools like Nmap or hping that an answer is not reduced to open/closed/filtered, but is the whole packet. When we communicate or exchange information across the network, all the data is transferred in packet form. A packet is a small segment of a larger message, and the packets are recombined by the device that receives them. The packet header stores information such as the source and destination IP addresses and the TCP ports.
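To show what "the header stores the addresses and ports" means at the byte level, here is a small sketch that uses only the Python standard library, not Scapy. It builds a 20-byte IPv4 header by hand, appends the two TCP port fields, and then reads the values back out. The source address 10.0.0.1 and source port 49152 are made-up values, and the checksum is left at 0 for simplicity.

```python
import socket
import struct

# Hand-built example bytes: a 20-byte IPv4 header followed by the first
# 4 bytes of a TCP header (source port, destination port).
ip_header = struct.pack(
    "!BBHHHBBH4s4s",
    (4 << 4) | 5,        # version 4, header length of 5 32-bit words
    0,                   # type of service
    40,                  # total length: 20-byte IP + 20-byte TCP header
    1,                   # identification
    0,                   # flags and fragment offset
    64,                  # time to live
    socket.IPPROTO_TCP,  # protocol carried in the payload
    0,                   # header checksum (left as 0 in this sketch)
    socket.inet_aton("10.0.0.1"),      # source address (made up)
    socket.inet_aton("193.32.216.9"),  # destination address from the text
)
tcp_ports = struct.pack("!HH", 49152, 7)
raw = ip_header + tcp_ports

# Dissect: the header length field says where the TCP header starts
ihl = (raw[0] & 0x0F) * 4
src_ip = socket.inet_ntoa(raw[12:16])
dst_ip = socket.inet_ntoa(raw[16:20])
sport, dport = struct.unpack("!HH", raw[ihl:ihl + 4])
print(src_ip, sport, "->", dst_ip, dport)
```

Scapy performs this kind of packing and unpacking for us, for many protocols, which is why we rarely need to write it by hand.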


Why is Scapy so special?


Fast Packet Design

  • Scapy's paradigm is to propose a domain-specific language that enables a powerful and fast description of any kind of packet.

  • Scapy enables the user to describe a packet or set of packets as layers that are stacked one upon another.

  • Scapy does not oblige the user to use predetermined methods or templates.


Probe once, interpret many

  • Scapy gives all the information, i.e. all the stimuli sent and all the responses received.

  • Scapy gives the complete raw data, and that data may be used many times, allowing the viewpoint to evolve during analysis. For example, a TCP port scan may be probed and the data visualized as the result of the port scan.



Pcap is an application programming interface for capturing network traffic. Wireshark is a network analyzer that can create .pcap files to collect and record data from the network.


Now let's use rdpcap() to read the pcap file and display its data.


Code Snippet :

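from scapy.all import rdpcap

# Path to the capture file to analyze
data = "202011251400-78-5k.pcap"

# rdpcap() loads the whole capture into memory as a list of packets
d = rdpcap(data)

# sessions() groups the packets into a dictionary of conversations
sessions = d.sessions()
print(sessions)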

Output :


rdpcap reading the pcap file and displaying its data

Now let's analyze the network data. We want to identify elephant TCP flows, i.e., those flows that transfer a large amount of data, in a packet trace captured at a vantage point in the ISP network.


Read the .pcap file using PcapReader() and store the source IP, destination IP, source port number, and destination port number of each TCP packet, along with the packet length, in a dictionary.


Code Snippet :

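from scapy.all import PcapReader
from scapy.layers.inet import IP, TCP

data = "202011251400-78-5k.pcap"  # same capture file as before
pkts = 0   # total packets read
flows = 0  # distinct TCP flows seen
ft = {}    # flow table: (src IP, dst IP, src port, dst port) -> bytes

# PcapReader reads packets one at a time instead of loading the whole file
for pkt in PcapReader(data):
    pkts += 1
    if IP in pkt and TCP in pkt:
        key = (pkt[IP].src, pkt[IP].dst, pkt[TCP].sport, pkt[TCP].dport)
        if key not in ft:
            ft[key] = 0
            flows += 1
        # Add up the on-the-wire lengths so each entry is the flow's total size
        ft[key] += pkt.wirelen
print(ft)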

Output :



elephant TCP flows
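Once the flow table is built, picking out the elephant flows is a matter of filtering and sorting. The sketch below uses a small hand-written table in the same shape as ft. The entries, the helper name elephant_flows, and the 1 MB threshold are all assumptions chosen for illustration, not values from the capture.

```python
# Hypothetical flow table in the same shape as ft:
# (src IP, dst IP, src port, dst port) -> bytes transferred
example_ft = {
    ("10.0.0.1", "193.32.216.9", 49152, 80): 5_000_000,
    ("10.0.0.2", "193.32.216.9", 49153, 443): 12_000,
    ("10.0.0.3", "193.32.216.9", 49154, 22): 48_000_000,
    ("10.0.0.4", "193.32.216.9", 49155, 7): 900,
}

def elephant_flows(flow_table, min_bytes):
    """Return (flow, size) pairs at or above min_bytes, largest first."""
    big = [(flow, size) for flow, size in flow_table.items()
           if size >= min_bytes]
    return sorted(big, key=lambda item: item[1], reverse=True)

# The 1 MB threshold is an arbitrary choice for this sketch
result = elephant_flows(example_ft, min_bytes=1_000_000)
for flow, size in result:
    print(flow, size)
```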

Plotting Top 100 TCP flow size distribution


Code Snippet :

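import sys
import matplotlib.pyplot as plt

topn = 100
# ft is the flow table built in the previous snippet.
# Convert flow sizes from bytes to kilobytes and keep the topn largest flows.
sizes = sorted(size / 1000 for size in ft.values())[-topn:]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.hist(sizes, bins=20, log=True)  # log-scaled y-axis
ax.set_ylabel('# of flows')
ax.set_xlabel('Data sent [KB]')
ax.set_title('Top {} TCP flow size distribution.'.format(topn))
# The first command-line argument is used as the output file prefix
plt.savefig(sys.argv[1] + '.flows.pdf', bbox_inches='tight')
plt.close()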

Output :



Plotting TCP flow size distribution


Thank You.
