
Parsing DNS Messages

This is a companion handout to the Iterative Domain Name resolver assignment.

To handle creation and parsing of DNS messages in your own code without using third-party DNS libraries, you will use the struct module, especially the pack, unpack, and unpack_from functions.

python
from struct import pack, unpack, unpack_from
  • The pack() function packs values into binary data (raw bytes).
  • The unpack() and unpack_from() functions unpack binary data into its individual element(s).
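
As a quick illustration of the three functions, here is a minimal sketch. The values are arbitrary and not taken from a real DNS message. The format string ">HH" describes two unsigned 16-bit integers in big-endian (network) byte order, which is the layout DNS uses for its fixed-size fields.

```python
from struct import pack, unpack, unpack_from

# Pack two unsigned 16-bit integers in big-endian (network) byte order
raw = pack(">HH", 0x1234, 0x0001)
print(raw.hex())                      # 12340001

# unpack() requires the buffer length to match the format exactly
first, second = unpack(">HH", raw)

# unpack_from() reads from an offset inside a (possibly larger) buffer
(second_again,) = unpack_from(">H", raw, 2)
```

Both unpack() and unpack_from() always return a tuple, even for a single value, hence the one-element unpacking on the last line.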

Structure of DNS Messages

Section 4.1 of RFC1035 specifies five parts of a DNS Message:

  • Header
  • Question Section
  • Answer Section
  • Authority Section
  • Additional Record Section

The header itself is a fixed structure of 12 bytes with the following layout[1]

                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      ID                       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    QDCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ANCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    NSCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ARCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

For the purpose of illustration, the above message format can be represented by the following C struct:

Creating a DNS Query Message

Suppose you are about to create a DNS message to query the status of a particular nameserver. For this particular scenario, the message contains only the header: 0 questions, 0 answers, 0 authority, and 0 additional records.

Creating the Message Header

Each message should have a unique ID so that a client that sends multiple queries can match incoming responses to their outgoing queries. Suppose you randomly select 0xA047 as the ID of this outgoing message. The binary representation of your header should look like:

+---------------------------------+
| 1 0 1 0 0 0 0 0 0 1 0 0 0 1 1 1 |  ID = 0xA047
+---------------------------------+
| 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 |  Opcode = 2
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  QDCOUNT = 0
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  ANCOUNT = 0
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  NSCOUNT = 0
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  ARCOUNT = 0
+---------------------------------+

and it can be created using the following snippet:

python
qdcount = 0
ancount = 0
nscount = 0
arcount = 0
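# 16-bit flags word, fields from left to right:
# QR(1) Opcode(4) AA(1) TC(1) RD(1) RA(1) Z(3) RCODE(4)
flags = 0b0_0010_0_0_0_0_000_0000  # Opcode = 2, all other fields 0
dns_header = pack(">HHHHHH", 0xA047, flags,
  qdcount, ancount, nscount, arcount)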
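
Writing the flags word as a binary literal is easy to get wrong, so one option is to assemble it from its fields with shifts. The sketch below assumes a hypothetical helper named make_flags (it is not part of struct or of the assignment); the shift amounts follow the bit positions in the header layout shown earlier.

```python
from struct import pack

def make_flags(qr=0, opcode=0, aa=0, tc=0, rd=0, ra=0, z=0, rcode=0):
    """Combine the header flag fields into one 16-bit word (hypothetical helper)."""
    return (
        ((qr & 0x1) << 15)
        | ((opcode & 0xF) << 11)
        | ((aa & 0x1) << 10)
        | ((tc & 0x1) << 9)
        | ((rd & 0x1) << 8)
        | ((ra & 0x1) << 7)
        | ((z & 0x7) << 4)
        | (rcode & 0xF)
    )

# Opcode = 2 with every other field 0 gives 0x1000, matching the diagram above
status_flags = make_flags(opcode=2)
dns_header = pack(">HHHHHH", 0xA047, status_flags, 0, 0, 0, 0)
```

Each field is masked to its width before shifting, so an out-of-range argument cannot spill into a neighboring field.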

Now suppose you want to create a DNS query to get name servers for gvsu.edu. Your message would have only 1 question, 0 answers, 0 authority, and 0 additional records.

+---------------------------------+
| 1 0 1 1 0 0 0 1 0 0 0 0 0 0 1 1 |  ID = 0xB103
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  Opcode = 0
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 |  QDCOUNT = 1
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  ANCOUNT = 0
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  NSCOUNT = 0
+---------------------------------+
| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  ARCOUNT = 0
+---------------------------------+

The header can be created using the following snippet:

python
qdcount = 1
ancount = 0
nscount = 0
arcount = 0
dns_header_withq = pack(">HHHHHH", 0xB103, 0,
  qdcount, ancount, nscount, arcount)

Creating the Question Section

Section 4.1.2 of RFC1035 specifies that the question section has the following three parts:

  • QNAME (variable length) holds the domain name as a sequence of labels.
  • QTYPE (2 bytes) is the question/query type.
  • QCLASS (2 bytes) is the class of the query.

                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
/                     QNAME                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QTYPE                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QCLASS                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

DNS Labels

Section 3.1 of RFC1035 specifies how to create each label: it begins with a one-byte length followed by the label itself. The last label is always the null label.

For instance

  • github.io is represented as the following three labels:

    Length  Label   ASCII Encoding
    6       github  06 67 69 74 68 75 62
    2       io      02 69 6F
    0               00

    Altogether, the domain github.io would be encoded as 11 octets (including the last NULL).

  • edu is represented as the following two labels:

    Length  Label  ASCII Encoding
    3       edu    03 65 64 75
    0              00

    Altogether, the domain edu would be encoded as 5 octets (including the last NULL).

To create a label, you can use the following snippet:

python
# Create gvsu.edu label
my_label = bytearray()
my_label.append(4)                # length of the first label
my_label.extend("gvsu".encode())  # the first label
my_label.append(3)                # length of the second label
my_label.extend("edu".encode())   # the second label
my_label.append(0)                # null label

# Create the question section
my_qtype = 2  # Query for Name server
my_qclass = 1 # Class for the Internet

my_question_section = my_label + pack(">HH", my_qtype, my_qclass)

# Concatenate the header and the question section
my_dnsmessage = dns_header_withq + my_question_section
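
The label-by-label appends above generalize to any dotted name. The following is a minimal sketch under the assumption that names are plain ASCII; encode_name is a hypothetical helper, not something the assignment prescribes. The 63-octet limit on a single label comes from RFC1035.

```python
from struct import pack

def encode_name(domain):
    """Encode a dotted domain name as DNS labels (hypothetical helper)."""
    encoded = bytearray()
    for label in domain.strip(".").split("."):
        if not label:
            continue                      # "" or "." encodes as the root name
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        encoded.append(len(raw))          # one-byte length prefix
        encoded.extend(raw)               # the label itself
    encoded.append(0)                     # null label terminates the name
    return bytes(encoded)

# QTYPE = 2 (NS), QCLASS = 1 (Internet), as in the snippet above
question = encode_name("gvsu.edu") + pack(">HH", 2, 1)
```

With this helper, encode_name("github.io") is 11 octets long and encode_name("edu") is 5, in line with the counts given earlier.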

Sending the Query over a UDP Socket and Parsing the Response

python
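import socket

# Send to the Verisign-operated server at 198.41.0.4, port 53
my_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP socket
my_socket.sendto(my_dnsmessage, ("198.41.0.4", 53))
response, addr = my_socket.recvfrom(8192)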

Use struct.unpack_from() to parse the incoming response:

python
dns_id, flags, qcount, acount, nscount, arcount = unpack_from(">HHHHHH", response)
print(f"ID={dns_id} Flags={hex(flags)} Q={qcount} A={acount} NS={nscount} AR={arcount}")

TIP

To confirm that the packet is formatted correctly and that the remote server accepts and answers it, it is strongly recommended that you open Wireshark and apply the filter udp.port == 53 && ip.addr == 198.41.0.4 to show only DNS traffic between you and the VeriSign server.

The following screenshots show the details of both DNS query (first screenshot) and response (second screenshot) messages captured from Wireshark.

The actual bytes of the DNS response message are shown below:

The first 12 bytes are part of the DNS header:

Value   Description
0xb103  Message ID
0x8200  Various bit flags
0x0001  Number of questions (1)
0x0000  Number of answers
0x000d  Number of authority records (13)
0x000b  Number of additional records (11)
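
The "various bit flags" word can be split into its fields with shifts and masks that mirror the header layout. The sketch below uses a hypothetical helper, decode_flags, and rebuilds the 12 header bytes from the table above rather than from a live capture. For 0x8200 it yields QR = 1 (the message is a response) and TC = 1 (the message was truncated), with every other field 0.

```python
from struct import unpack_from

def decode_flags(flags):
    """Split the 16-bit flags word into its named fields (hypothetical helper)."""
    return {
        "qr": (flags >> 15) & 0x1,
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 0x1,
        "tc": (flags >> 9) & 0x1,
        "rd": (flags >> 8) & 0x1,
        "ra": (flags >> 7) & 0x1,
        "z": (flags >> 4) & 0x7,
        "rcode": flags & 0xF,
    }

# The six header values from the table above, as raw bytes
header = bytes.fromhex("b103 8200 0001 0000 000d 000b")
dns_id, flags, qcount, acount, nscount, arcount = unpack_from(">HHHHHH", header)
fields = decode_flags(flags)
```

Checking fields["rcode"] and fields["tc"] before parsing further is a cheap way to notice an error response or a truncated message early.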

The next 14 bytes are the question section:

Value                          Description
04 67 76 73 75 03 65 64 75 00  Label gvsu.edu
00 02                          Qtype = 2 (NS)
00 01                          Qclass = 1 (Internet)
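
Reading the question section back is the reverse of building it: walk the labels until the null label, then unpack the two 16-bit fields. The sketch below assumes a hypothetical helper, parse_question, and assumes QNAME is not compressed, which holds for the question shown above. The sample bytes are assembled from the two tables, not from a live capture.

```python
from struct import unpack_from

def parse_question(message, offset=12):
    """Parse one question entry starting at offset (hypothetical helper).

    Assumes the name is stored as plain labels, without compression pointers.
    Returns (name, qtype, qclass, next_offset).
    """
    labels = []
    while True:
        length = message[offset]
        offset += 1
        if length == 0:                   # null label ends the name
            break
        labels.append(message[offset:offset + length].decode("ascii"))
        offset += length
    qtype, qclass = unpack_from(">HH", message, offset)
    return ".".join(labels), qtype, qclass, offset + 4

sample = bytes.fromhex(
    "b103 8200 0001 0000 000d 000b"       # 12-byte header
    "04 67 76 73 75 03 65 64 75 00"       # QNAME
    "00 02 00 01"                         # QTYPE, QCLASS
)
name, qtype, qclass, next_offset = parse_question(sample)
```

The returned next_offset (26 here, that is 12 header bytes plus 14 question bytes) is where the first resource record would begin.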

Label Compression

As the screenshots above show, a DNS response often carries several domain names that share a suffix, such as a.edu-servers.net, b.edu-servers.net, ..., m.edu-servers.net. As a regex, these names could be written as the shorter expression [a-m].edu-servers.net. The DNS protocol itself does not use regex; the expression is shown only to illustrate that such names can be shortened. To keep the message small, repeated names are shortened using the "Message Compression" technique explained in Section 4.1.4 of RFC1035.

The following screenshot shows the raw bytes of an incoming DNS response that includes 13 records with domain names under edu-servers.net. However, the full name is not repeated 13 times: a few bytes later you can see only the single letters b (offset 0x46), f (offset 0x56), h, ... encoded in the response message. This is clear evidence of label compression.

The only name readable to humans is d.edu-servers.net (starting at offset 0x26), encoded as the following bytes:

 1    11                                   3          null 
01 64 0b 65 64 75 2d 73 65 72 76 65 72 73 03 6e 65 74 00
   d     e  d  u  -  s  e  r  v  e  r  s     n  e  t

The other domain names are compressed. For instance, b.edu-servers.net is encoded as the following bytes (starting at offset 0x0045):

Bytes  Description
01 62  The first label of length 1 (b ASCII 0x62)
c0 28  A pointer to subsequent label(s) at offset 0x28

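When the two most significant bits of the "length byte" are both 1 (in binary, 11xx xxxx), that byte and the next byte together form a "pointer" instead of a label length. The remaining 14 bits give the offset, counted from the start of the message, at which the rest of the name continues. Here c0 28 is 1100 0000 0010 1000 in binary, so the offset is 0x28. Further inspection at offset 0x28 shows the following bytes: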

Len:11                                   3          null
    0b 65 64 75 2d 73 65 72 76 65 72 73 03 6e 65 74 00
        e  d  u  -  s  e  r  v  e  r  s     n  e  t
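
Putting the pointer rule into code, a name reader has to follow pointers while remembering where the name ended at its original position. The following is a minimal sketch, not a definitive implementation: read_name is a hypothetical helper, the loop guard is a defensive assumption, and the test message only places the bytes quoted above at offsets 0x26 and 0x45 (the zero bytes elsewhere are padding, not real response data).

```python
def read_name(message, offset):
    """Decode a possibly compressed name (hypothetical helper).

    Returns (name, next_offset), where next_offset is the position just past
    the name as stored at the original offset.
    """
    labels = []
    next_offset = None                    # set when the first pointer is followed
    jumps = 0
    while True:
        length = message[offset]
        if (length & 0xC0) == 0xC0:       # top two bits set: this is a pointer
            pointer = ((length & 0x3F) << 8) | message[offset + 1]
            if next_offset is None:
                next_offset = offset + 2  # a pointer occupies two bytes
            offset = pointer
            jumps += 1
            if jumps > len(message):      # guard against pointer loops
                raise ValueError("compression pointer loop")
            continue
        offset += 1
        if length == 0:                   # null label ends the name
            break
        labels.append(message[offset:offset + length].decode("ascii"))
        offset += length
    if next_offset is None:
        next_offset = offset
    return ".".join(labels), next_offset

# Padding plus the two byte sequences quoted in the text (assumed layout)
d_name = bytes.fromhex("01 64 0b 65 64 75 2d 73 65 72 76 65 72 73 03 6e 65 74 00")
message = bytearray(0x49)
message[0x26:0x26 + len(d_name)] = d_name
message[0x45:0x49] = bytes.fromhex("01 62 c0 28")

full_name, after_full = read_name(message, 0x26)
short_name, after_short = read_name(message, 0x45)
```

Reading at 0x45 returns b.edu-servers.net even though only four bytes are stored there, and the returned offset (0x49) points just past the two-byte pointer rather than past the target labels.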

  1. ASCII art copied from RFC1035