Parsing DNS Messages
This is a companion handout to the Iterative Domain Name resolver assignment.
To handle creation and parsing of DNS messages in your own code without using third-party DNS libraries, you will use the struct module, especially the pack, unpack, and unpack_from functions.
from struct import pack, unpack, unpack_from
- The pack() function packs values into binary data (raw bytes)
- The unpack() and unpack_from() functions unpack binary data into its individual element(s)
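As a quick illustration of the three functions, here is a minimal sketch. The values 0x1234 and 1 are arbitrary sample data chosen for this example, not taken from a real DNS message.

```python
from struct import pack, unpack, unpack_from

# ">" selects big-endian (network) byte order; "H" is an unsigned 16-bit integer.
raw = pack(">HH", 0x1234, 1)
print(raw.hex())  # 12340001

# unpack() requires the buffer length to match the format exactly.
first, second = unpack(">HH", raw)

# unpack_from() starts at a given byte offset and ignores trailing bytes,
# which is convenient when walking through a longer message.
(only_second,) = unpack_from(">H", raw, 2)
```

Note that both unpacking functions always return a tuple, even when the format describes a single value.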
Structure of DNS Messages
Section 4.1 of RFC1035 specifies five parts of a DNS Message:
- Header
- Question Section
- Answer Section
- Authority Section
- Additional Record Section
The header itself is a fixed structure of 12 bytes with the following layout[1]
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      ID                       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    QDCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ANCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    NSCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ARCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
For the purpose of illustration, the above message format can be represented by the following C struct:
Creating a DNS Query Message
Suppose you are about to create a DNS message to query the status of a particular nameserver. In this scenario, the message contains only the header: 0 questions, 0 answers, 0 authority records, and 0 additional records.
Creating the Message Header
Each message should have a unique ID so that a client sending multiple queries can match each incoming response to its outgoing query. Suppose you randomly select 0xA047 as the ID of this outgoing message. The binary representation of your header should look like:
+---------------------------------+
| 1 0 1 0 0 0 0 0 0 1 0 0 0 1 1 1 | ID = 0xA047
+---------------------------------+
| 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 | Opcode = 2
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | QDCOUNT = 0
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ANCOUNT = 0
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | NSCOUNT = 0
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ARCOUNT = 0
+---------------------------------+
and it can be created using the following snippet:
qdcount = 0
ancount = 0
nscount = 0
arcount = 0
# Flag bits, left to right: QR_Opcode_AA_TC_RD_RA_Z_RCODE
# (Python allows only single underscores as digit separators)
dns_header = pack(">HHHHHH", 0xA047, 0b0_0010_0_0_0_0_000_0000,
                  qdcount, ancount, nscount, arcount)
Now suppose you want to create a DNS query to get the name servers for gvsu.edu. Your message would have only 1 question, 0 answers, 0 authority records, and 0 additional records.
+---------------------------------+
| 1 0 1 1 0 0 0 1 0 0 0 0 0 0 1 1 | ID = 0xB103
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | Opcode = 0
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 | QDCOUNT = 1
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ANCOUNT = 0
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | NSCOUNT = 0
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ARCOUNT = 0
+---------------------------------+
The header can be created using the following snippet:
qdcount = 1
ancount = 0
nscount = 0
arcount = 0
dns_header_withq = pack(">HHHHHH", 0xB103, 0,
qdcount, ancount, nscount, arcount)
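The two headers above differ only in the ID, the Opcode, and QDCOUNT, so the packing can be wrapped in a small helper. The following is a minimal sketch, not part of the assignment: the function name build_header and its parameters are hypothetical, it only sets the Opcode and RD bits (all other flag bits stay 0), and using secrets.randbits to pick the ID is one possible choice among several.

```python
import secrets
from struct import pack

def build_header(msg_id, opcode=0, rd=0, qdcount=0, ancount=0, nscount=0, arcount=0):
    """Pack a 12-byte DNS query header (hypothetical helper for illustration)."""
    # Within the 16-bit flags word, Opcode occupies the four bits just below
    # QR (shift by 11) and RD is the ninth bit from the right (shift by 8).
    flags = ((opcode & 0xF) << 11) | ((rd & 0x1) << 8)
    return pack(">HHHHHH", msg_id, flags, qdcount, ancount, nscount, arcount)

# The two headers from this handout
status_header = build_header(0xA047, opcode=2)
query_header = build_header(0xB103, qdcount=1)

# One way to choose an unpredictable 16-bit message ID
random_id = secrets.randbits(16)
```

Building the flags word with shifts avoids counting digits in a long binary literal, at the cost of having to remember each field's position.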
Creating the Question Section
Section 4.1.2 of RFC1035 specifies that the question section has the following three parts:
- QNAME (variable length) holds the domain name as a sequence of labels
- QTYPE (2 bytes) is the question/query type
- QCLASS (2 bytes) is the class of the query
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
/                     QNAME                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QTYPE                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QCLASS                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
DNS Labels
Section 3.1 of RFC1035 specifies how to create each label: it begins with a one-byte length followed by the label itself. The last label is always the null label.
For instance, github.io is represented as the following three labels:
Length | Label | ASCII Encoding |
---|---|---|
6 | github | 06 67 69 74 68 75 62 |
2 | io | 02 69 6F |
0 | | 00 |
Altogether, the domain github.io would be encoded as 11 octets (including the last NULL).
Similarly, edu is represented as the following two labels:
Length | Label | ASCII Encoding |
---|---|---|
3 | edu | 03 65 64 75 |
0 | | 00 |
Altogether, the domain edu would be encoded as 5 octets (including the last NULL).
To create the labels for a domain name, you can use the following snippet:
# Create gvsu.edu label
my_label = bytearray()
my_label.append(4) # length of the first label
my_label.extend("gvsu".encode()) # the first label
my_label.append(3) # length of the second label
my_label.extend("edu".encode()) # the second label
my_label.append(0) # null label
# Create the question section
my_qtype = 2 # Query for Name server
my_qclass = 1 # Class for the Internet
my_question_section = my_label + pack(">HH", my_qtype, my_qclass)
# Concatenate the header and the question section
my_dnsmessage = dns_header_withq + my_question_section
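The label-by-label construction above generalizes to any dotted name. The following is a minimal sketch; the function name encode_name is hypothetical, and it assumes plain ASCII names (no internationalized domains). The 63-byte limit per label comes from RFC1035.

```python
def encode_name(domain):
    """Encode a dotted domain name as length-prefixed labels plus the null label."""
    encoded = bytearray()
    for label in domain.strip(".").split("."):
        if not label:
            continue  # skip empty pieces, e.g. when the input is the root "."
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        encoded.append(len(raw))  # one-byte length
        encoded.extend(raw)       # the label itself
    encoded.append(0)             # null label terminates the name
    return bytes(encoded)

gvsu_qname = encode_name("gvsu.edu")
print(gvsu_qname.hex(" "))  # 04 67 76 73 75 03 65 64 75 00
```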
Sending the Query over a UDP Socket and Parsing the Response
import socket
my_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP socket
# Send to Verisign at 198.41.0.4, port 53
my_socket.sendto(my_dnsmessage, ("198.41.0.4", 53))
response, addr = my_socket.recvfrom(8192)
Use struct.unpack_from() to parse the incoming response:
dns_id, flags, qcount, acount, nscount, arcount = unpack_from(">HHHHHH", response)
print(f"ID={dns_id} Flags={hex(flags)} Q={qcount} A={acount} NS={nscount} AR={arcount}")
TIP
To confirm that the packet is formatted correctly and that the remote server accepts and answers it, it is strongly recommended that you open Wireshark and apply the filter udp.port == 53 && ip.addr == 198.41.0.4 to show only DNS traffic between you and the Verisign server.
The following screenshots show the details of both DNS query (first screenshot) and response (second screenshot) messages captured from Wireshark.
The actual bytes of the DNS response message are shown below:
The first 12 bytes are part of the DNS header:
Value | Description |
---|---|
0xb103 | Message ID |
0x8200 | Various bit flags |
0x0001 | Number of questions (1) |
0x0000 | Number of answers |
0x000d | Number of authority records (13) |
0x000b | Number of additional records (11) |
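The "various bit flags" word can be split into its individual fields with shifts and masks, following the header layout from RFC1035. This is a minimal sketch; the function name decode_flags and the dictionary keys are hypothetical names chosen for this example.

```python
def decode_flags(flags):
    """Split the 16-bit DNS flags word into its named fields."""
    return {
        "qr": (flags >> 15) & 0x1,      # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 0x1,      # authoritative answer
        "tc": (flags >> 9) & 0x1,       # truncated
        "rd": (flags >> 8) & 0x1,       # recursion desired
        "ra": (flags >> 7) & 0x1,       # recursion available
        "z": (flags >> 4) & 0x7,        # reserved
        "rcode": flags & 0xF,           # response code
    }

fields = decode_flags(0x8200)
print(fields["qr"], fields["tc"], fields["rcode"])  # 1 1 0
```

For the value 0x8200 in the table above, this reports a response (QR = 1) with the TC bit set and RCODE 0.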
The next 14 bytes are the question section:
Value | Description |
---|---|
04 67 76 73 75 03 65 64 75 00 | Label gvsu.edu |
00 02 | Qtype = 2 (NS) |
00 01 | Qclass = 1 (Internet) |
Label Compression
As the screenshots above show, a DNS response message often carries multiple domain names that follow a similar pattern, such as a.edu-servers.net, b.edu-servers.net, ..., m.edu-servers.net. Using regex, these domain names could be written as the shorter expression [a-m].edu-servers.net. The DNS protocol itself does not use regex; the expression is shown here only to give you a general idea that, with a clever technique, these names can be shortened. To keep the message size small, repeated names are shortened using the "Message Compression" technique explained in Section 4.1.4 of RFC1035.
The following screenshot shows the raw bytes of an incoming DNS response that includes 13 records of domain names from *.edu-servers.net. However, in the bytes that follow the first name you can see only the single letters b (offset 0x46), f (offset 0x56), h, ... rather than complete names. This is clear evidence of label compression.
The only name that is fully readable to humans is d.edu-servers.net (starting at offset 0x26), encoded as the following bytes:
Bytes | Description |
---|---|
01 64 | Label of length 1 (d) |
0b 65 64 75 2d 73 65 72 76 65 72 73 | Label of length 11 (edu-servers) |
03 6e 65 74 | Label of length 3 (net) |
00 | Null label |
The other domain names are compressed. For instance, b.edu-servers.net is encoded as the following bytes (starting at offset 0x0045):
Bytes | Description |
---|---|
01 62 | The first label of length 1 (b ASCII 0x62) |
c0 28 | A pointer to subsequent label(s) at offset 0x28 |
When the two most significant bits of the "length byte" are both 1 (in binary, 11xx xxxx), that byte and the next byte together form a "pointer" instead of a label length. The remaining 14 bits give an offset from the start of the message, so c0 28 points to offset 0x28. Further inspection at offset 0x28 shows the following bytes:
Bytes | Description |
---|---|
0b 65 64 75 2d 73 65 72 76 65 72 73 | Label of length 11 (edu-servers) |
03 6e 65 74 | Label of length 3 (net) |
00 | Null label |
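Reading a possibly compressed name therefore means following pointers until the null label is reached. The following is a minimal sketch, not a complete parser: the function name decode_name is hypothetical, names are assumed to be ASCII, and the test message is a zero-filled buffer that contains only the two byte sequences shown above at their stated offsets.

```python
def decode_name(message, offset):
    """Return (name, next_offset) for the name that starts at `offset`."""
    labels = []
    next_offset = None  # where parsing resumes after this name
    jumps = 0
    while True:
        length = message[offset]
        if length & 0xC0 == 0xC0:
            # Pointer: the lower 14 bits are an offset from the message start.
            pointer = ((length & 0x3F) << 8) | message[offset + 1]
            if next_offset is None:
                next_offset = offset + 2  # a pointer always ends the name
            offset = pointer
            jumps += 1
            if jumps > len(message):
                raise ValueError("compression pointer loop")
            continue
        offset += 1
        if length == 0:  # null label: end of the name
            break
        labels.append(message[offset:offset + length].decode("ascii"))
        offset += length
    if next_offset is None:
        next_offset = offset
    return ".".join(labels), next_offset

# Rebuild just the relevant bytes of the response at their original offsets
message = bytearray(0x49)
message[0x26:0x39] = bytes.fromhex("01640b6564752d73657276657273036e657400")
message[0x45:0x49] = bytes.fromhex("0162c028")

print(decode_name(message, 0x26))  # ('d.edu-servers.net', 57)
print(decode_name(message, 0x45))  # ('b.edu-servers.net', 73)
```

The returned offset always refers to the position right after the name as it appears in the record (after the 2-byte pointer, if one was followed), which is where the next field of the record begins.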
[1] ASCII art copied from RFC1035