Overview
When you deal with external binary data in Python, there are a couple of ways to get that data into a data structure. You can use the ctypes module to define the data structure, or you can use the struct module.
You will see both methods used when you explore tool repositories on the web. This article shows you how to use each one to read an IPv4 header off the network. It’s up to you to decide which method you prefer; either way will work fine.
ctypes is a foreign function library for Python. It provides C-compatible data types and lets you call functions in shared libraries.
struct converts between Python values and C structs that are represented as Python bytes objects.
So ctypes handles binary data types in addition to a lot of other functionality, while handling binary data is the main purpose of the struct module.
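As a rough illustration of the difference, the sketch below decodes the same four bytes both ways. The Pair structure and the sample bytes are made up for this example; they are not part of the IP header code that follows.

```python
import ctypes
import struct

data = bytes([0x01, 0x02, 0x03, 0x04])  # made-up sample bytes

# ctypes: describe the layout as a class, then map the bytes onto it
class Pair(ctypes.BigEndianStructure):
    _fields_ = [
        ("first", ctypes.c_uint16),
        ("second", ctypes.c_uint16),
    ]

pair = Pair.from_buffer_copy(data)

# struct: describe the layout as a format string, then unpack a tuple
first, second = struct.unpack("!HH", data)

print(hex(pair.first), hex(pair.second))
print(hex(first), hex(second))
```

Both approaches read the same two 16-bit values; they differ only in how the layout is declared.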
Let’s see how these two libraries are used when we need to decode an IPv4 header off the network.
First, here’s the structure of the IPv4 header. This is from the IETF RFC 791:
A summary of the contents of the internet header follows:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Example Internet Datagram Header
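Adding up the field widths in the diagram gives the size of the fixed part of the header: five 32-bit rows, or 20 bytes, before any options. Here is a quick check of that arithmetic. The dictionary keys are just labels chosen for this sketch, and the format string is an assumed one with the same field sizes as the diagram.

```python
import struct

# Field widths in bits, read off the RFC 791 diagram (options excluded)
widths = {
    "version": 4, "ihl": 4, "tos": 8, "total_length": 16,
    "identification": 16, "flags": 3, "fragment_offset": 13,
    "ttl": 8, "protocol": 8, "checksum": 16,
    "src": 32, "dst": 32,
}

fixed_header_bytes = sum(widths.values()) // 8
format_bytes = struct.calcsize("!BBHHHBBH4s4s")

print(fixed_header_bytes)   # 20
print(format_bytes)         # 20
```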
Initial Data from the Network
We need some data to work with, so let’s get a single packet from the network. This little snippet should do fine. I ran this on Linux; note that opening a raw socket requires root privileges.
import socket
import sys

def sniff(host):
    sniffer = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
    sniffer.bind((host, 0))
    # include the IP header in the captured data
    sniffer.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1)
    # read a single packet; recvfrom() returns (data, address), so keep only the data
    data, _ = sniffer.recvfrom(65535)
    return data

if __name__ == '__main__':
    if len(sys.argv) == 2:
        host = sys.argv[1]
    else:
        host = '192.168.1.69'
    buff = sniff(host)
We just grab a single raw packet from the network and put it into a variable, buff. So now that we have binary data, let’s look at how to use it.
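If you cannot open a raw socket, it can be handy to have canned data to experiment with. The sketch below builds a 20-byte IPv4 header by hand. The helper name is hypothetical, every field value (addresses, ID, TTL) is made up for illustration, and the checksum is left at zero rather than computed.

```python
import socket
import struct

def make_sample_header(src="192.168.1.69", dst="192.168.1.1"):
    """Build a hypothetical 20-byte IPv4 header for testing (checksum not computed)."""
    version_ihl = (4 << 4) | 5          # version 4, header length of 5 words
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl,                    # first byte, 0x45
        0,                              # type of service
        28,                             # total length: 20-byte header + 8 bytes of data
        0x1234,                         # identification (arbitrary)
        0,                              # flags + fragment offset
        64,                             # time to live
        socket.IPPROTO_ICMP,            # protocol number 1
        0,                              # header checksum (left as zero here)
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )

sample = make_sample_header()
print(len(sample), hex(sample[0]))
```

The resulting bytes can stand in for captured packet data when trying out the parsing code.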
ctypes module
The following code snippet defines a new class, IP, that can read a packet and parse the header into its separate fields.
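from ctypes import Structure, c_ubyte, c_ushort, c_uint32
import socket
import struct

class IP(Structure):
    # The bit-field order assumes a little-endian host (e.g. x86), where the
    # first 4-bit field maps to the low nybble of the first byte.
    # The c_ushort fields are read in host byte order, so on such a host
    # len, id, offset and sum come out byte-swapped; socket.ntohs() converts them.
    _fields_ = [
        ("ihl",          c_ubyte,  4),
        ("version",      c_ubyte,  4),
        ("tos",          c_ubyte,  8),
        ("len",          c_ushort, 16),
        ("id",           c_ushort, 16),
        ("offset",       c_ushort, 16),
        ("ttl",          c_ubyte,  8),
        ("protocol_num", c_ubyte,  8),
        ("sum",          c_ushort, 16),
        ("src",          c_uint32, 32),
        ("dst",          c_uint32, 32)
    ]

    def __new__(cls, socket_buffer=None):
        # create the instance and fill the fields from the raw bytes
        return cls.from_buffer_copy(socket_buffer)

    def __init__(self, socket_buffer=None):
        # human readable IP addresses
        self.src_address = socket.inet_ntoa(struct.pack("<L", self.src))
        self.dst_address = socket.inet_ntoa(struct.pack("<L", self.dst))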
You can see that the _fields_ structure defines each part of the header, giving the width in bits as the last argument. Being able to specify the bit width is handy. Our IP class inherits from the ctypes Structure class, which specifies that we must have a defined _fields_ structure before any instance is created.
Class Instantiation
The wrinkle with the ctypes Structure abstract base class is the __new__ method. See the documentation for full details: ctypes module.
The __new__ method takes the class reference as the first argument. It creates and returns an instance of the class, which is then passed to the __init__ method.
We create the instance normally, but underneath, Python invokes __new__, which fills out the _fields_ data structure immediately before initialization (when the __init__ method is called). As long as you’ve defined the structure beforehand, just pass the __new__ method the external (network packet) data, and the fields magically appear as attributes on your instance.
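To see the order of events, here is a stripped-down sketch with a small structure. The Tiny class and its field names are hypothetical; the point is only that the fields are already populated by the time __init__ runs.

```python
from ctypes import Structure, c_ubyte, sizeof

class Tiny(Structure):
    _fields_ = [
        ("low", c_ubyte, 4),    # 4-bit field
        ("high", c_ubyte, 4),   # 4-bit field sharing the same byte
        ("value", c_ubyte),     # one full byte
    ]

    def __new__(cls, data=None):
        # Creates the instance and copies the bytes into the fields
        return cls.from_buffer_copy(data)

    def __init__(self, data=None):
        # Fields set in __new__ are already readable here
        self.total = self.low + self.high + self.value

t = Tiny(b"\x45\x0a")
print(sizeof(Tiny), t.value, t.total)
```

Which of the two 4-bit fields receives the 4 and which the 5 depends on the host's bit-field layout, which is why the sketch only sums them.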
struct module
The struct module provides format characters that you use to specify the structure of the binary data. The first character (in our case, !) specifies the “endianness” of the data; ! means network byte order (big-endian), which is how multi-byte fields travel on the wire. See the documentation for full details: struct module.
import ipaddress
import struct

class IP:
    def __init__(self, buff=None):
        # the fixed part of the header is the first 20 bytes of the packet
        header = struct.unpack('!BBHHHBBH4s4s', buff[:20])
        self.ver = header[0] >> 4
        self.ihl = header[0] & 0xF
        self.tos = header[1]
        self.len = header[2]
        self.id = header[3]
        self.offset = header[4]
        self.ttl = header[5]
        self.protocol_num = header[6]
        self.sum = header[7]
        self.src = header[8]
        self.dst = header[9]

        # human readable IP addresses
        self.src_address = ipaddress.ip_address(self.src)
        self.dst_address = ipaddress.ip_address(self.dst)

        # map protocol constants to their names
        self.protocol_map = {1: "ICMP", 6: "TCP", 17: "UDP"}
Here are the individual parts of the header.
- B (1 byte): ver, hdrlen
- B (1 byte): tos
- H (2 bytes): total len
- H (2 bytes): identification
- H (2 bytes): flags + frag offset
- B (1 byte): ttl
- B (1 byte): protocol
- H (2 bytes): checksum
- 4s (4 bytes): src ip
- 4s (4 bytes): dst ip
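The byte-order prefix only matters for the multi-byte integer fields (H here); single bytes and 4s byte strings come out the same either way. Data on the wire is in network (big-endian) byte order, which struct spells ! or >. A small sketch with a made-up total-length value of 84:

```python
import struct

raw = b"\x00\x54"   # two bytes as they would appear on the wire (hypothetical value)

(big,) = struct.unpack("!H", raw)      # network / big-endian
(little,) = struct.unpack("<H", raw)   # little-endian reads the bytes reversed

print(big)      # 84
print(little)   # 21504
```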
Everything is pretty straightforward, but with ctypes, we could specify the bit-width of the individual pieces. With struct, there’s no format character for a nybble (4 bits), so we have to do some manipulation to get the ver and hdrlen from the first part of the header.
Binary Manipulations
The wrinkle with struct in this example is that we need to do some manipulation of header[0], which contains a single byte, but we need to create two variables from that byte, each containing a nybble.
High nybble
We have one byte, and for the ver variable we want the high-order nybble. The typical way to get the high nybble of a byte is to right-shift.
We right-shift the byte by 4 places, which is like prepending 4 zeros at the front so that the last 4 bits fall off, leaving us with the first nybble:
0 1 0 1 0 1 1 0 >> 4
-----------------------------
0 0 0 0 0 1 0 1
Low nybble
We have one byte, and for the hdrlen variable we want the low-order nybble. The typical way to get the low nybble of a byte is to AND it with 0xF (00001111):
0 1 0 1 0 1 1 0 &F
0 0 0 0 1 1 1 1
-----------------------------
0 0 0 0 0 1 1 0
Let’s look at an example in the Python REPL:
>>> m = 66
>>> m
66
>>> bin(m)
'0b1000010' # or 0100 0010
>>> bin(m>>4)
'0b100' # or 0100
>>> bin(m&0xF)
'0b10' # or 0010
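Now, more specifically to our IPv4 case, the first byte in the header is usually 0x45 = 69 decimal = 01000101 binary: version 4 and a header length of 5 (counted in 32-bit words), which is what you get when the header carries no options.
See what that looks like when we right-shift it by 4 and then AND it with 0xF: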
>>> '{0:08b}'.format(0x45)
'01000101'
>>> '{0:04b}'.format(0x45>>4)
'0100'
>>> '{0:04b}'.format(0x45&0xF)
'0101'
You don’t have to know binary manipulation backward and forward for decoding an IP header, but there are some patterns like these (shift and AND) you will see over and over again as you code and as you explore other hackers’ code.
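The same two patterns unpack the 16-bit field that holds the flags and the fragment offset: per the RFC 791 diagram, the top 3 bits are the flags and the remaining 13 bits are the offset. A sketch, assuming the field has already been read as a big-endian integer; the function name is made up for this example.

```python
def split_flags_offset(field):
    """Split the 16-bit flags/fragment-offset field into its two parts."""
    flags = field >> 13            # keep the top 3 bits
    offset = field & 0x1FFF        # mask off the low 13 bits
    return flags, offset

# 0x4000 has only the "Don't Fragment" bit set (binary 010 in the flags)
flags, offset = split_flags_offset(0x4000)
print(flags, offset)               # 2 0

dont_fragment = bool(flags & 0b010)
more_fragments = bool(flags & 0b001)
```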
That seems like a lot of work, doesn’t it? In the case where we have to do some bit shifting, it does take effort. But for many cases (e.g. ICMP), every field falls on an 8-bit (byte) boundary and so is very simple to set up. Here is an “Echo Reply” ICMP message; you can see that each parameter of the ICMP header can be defined in a struct with one of the existing format letters (BBHHH) (RFC 777):
Echo or Echo Reply Message
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Identifier          |        Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Data ...
   +-+-+-+-+-
A quick way to parse that would simply be:
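import struct

class ICMP:
    def __init__(self, buff):
        # buff must be exactly the 8-byte ICMP header;
        # '!' = network (big-endian) byte order, as used on the wire
        header = struct.unpack('!BBHHH', buff)
        self.type = header[0]
        self.code = header[1]
        self.sum = header[2]
        self.id = header[3]
        self.seq = header[4]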
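To try this without a live capture, you can pack an echo request by hand and parse it back. The values below (identifier, sequence number, payload) are arbitrary, the checksum is left at zero, and the sketch uses network byte order (!) on the assumption that the bytes come straight off the wire.

```python
import struct

ICMP_FORMAT = "!BBHHH"   # type, code, checksum, identifier, sequence number

# Hypothetical echo request: type 8, code 0, checksum not computed
packet = struct.pack(ICMP_FORMAT, 8, 0, 0, 0x1A2B, 1) + b"payload"

# Only the first 8 bytes are the header; the rest is data
header_size = struct.calcsize(ICMP_FORMAT)
icmp_type, code, checksum, ident, seq = struct.unpack(
    ICMP_FORMAT, packet[:header_size]
)

print(header_size, icmp_type, code, hex(ident), seq)
```

Slicing to the header size before unpacking matters because struct.unpack raises an error when the buffer length does not match the format.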
Conclusion
You can use either the ctypes module or the struct module to read and parse binary data. Here is an example of instantiating the class no matter which method you use. You instantiate the IP class with your packet data in the variable buff:
mypacket = IP(buff)
print(f'{mypacket.src_address} -> {mypacket.dst_address}')
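Once the header is parsed, two fields tell you where to go next: the header length (in 32-bit words) gives the offset of the payload, and the protocol number identifies what that payload is. The sketch below uses a standalone helper rather than either IP class; the helper name and the sample bytes are made up for illustration.

```python
import struct

PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}

def locate_payload(packet):
    """Return (protocol name, payload bytes) for a raw IPv4 packet."""
    ihl = packet[0] & 0xF              # low nybble: header length in 32-bit words
    header_len = ihl * 4               # convert words to bytes
    protocol_num = packet[9]           # protocol is the tenth byte of the header
    name = PROTOCOLS.get(protocol_num, str(protocol_num))
    return name, packet[header_len:]

# Hypothetical packet: a 20-byte header (protocol 1) followed by 8 bytes of data
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 1, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
name, payload = locate_payload(header + b"\x08\x00" + bytes(6))
print(name, len(payload))              # ICMP 8
```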
With ctypes, make sure you define your _fields_ structure and hand the data to it in the __new__ method. When you instantiate the class, you’ll have access to the data attributes automatically.
With struct, you define how to read the data with a format string. For data attributes that don’t lie on an 8-bit (byte) boundary, you may need to do some binary manipulation.
In short, use whichever method fits your brain. But always be aware that you may see code from others that uses a different method. Hopefully, now you’ll see it and understand it.