the last question

comma_four.jpg

Dec 4, 2025

Spoiler warning: This contains complete solutions to all five flags from the comma.ai NeurIPS CTF. If you want to solve it yourself first, stop reading.

Comma.ai released a CTF for their NeurIPS event. This was my first CTF. Took about two days.

Starting point

The challenge was at https://comma.ai/neurips. There was an image file called comma_four.jpg.

comma_four

Flag 1 - EXIF metadata

Ran exiftool on the image.

exiftool comma_four.jpg

The UserComment field contained the first flag.

congratulations for finding the first flag{[redacted]} flags have this format flag{x} 
please submit x ([redacted] in this case) to the following link: 
[redacted] then go to https://commaai.github.io/model_reports 
for the next flag. Remember this flag, and the next ones you find, they might be useful! 
If the flag requires going to other territories, it will be very explicit, with a clear 
instruction saying to go somewhere 'for the next flag'. Good luck!
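Every stage hands back a message with the same flag{x} wrapper, so pulling out x can be scripted. A minimal sketch, assuming the flag value never contains a closing brace; the sample string is a made-up placeholder, not the real challenge text:

```python
import re

# Matches flag{...} and captures the part between the braces.
FLAG_RE = re.compile(r"flag\{([^}]*)\}")

def extract_flags(text: str) -> list[str]:
    """Return the inner value of every flag{...} occurrence, in order."""
    return FLAG_RE.findall(text)

# Placeholder message for illustration only.
sample = "well done, flag{example_value}. flags have this format flag{x}"
print(extract_flags(sample))  # ['example_value', 'x']
```

Note that the format explanation itself ("flag{x}") also matches, so the first hit is the one to submit.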

flag 1: [redacted]

Flag 2 - Zero-width unicode stego

The model_reports directory had many subdirectories. Wait… a UUID starting with the flag we just found? Interesting.

model reports

There was a readme file at [redacted]-077d-461f-9df9-dd28aa0b6b26/400/README.txt. It contained plain text about model evaluation reports. Nothing obviously suspicious:

comma.ai uses this repository to share all reports used to evaluate the models shipped to openpilot.
Examples:
- North Nevada driving model 0.10.1: e2d9c622-25a8-4ccd-8c8e-c62537b7aa0c/400/
- World Model used in 0.10.1: 923eee54-b95d-465c-a9d7-8c1064170270/90/
- Auto Encoder used in 0.10.1: 4672da0d-19f5-44f8-a5fb-2215981c9c0e/50/
- Driver Monitoring Model used in 0.10.1: 59cfd731-6f80-4857-9271-10d952165079/200/
these reports run during training and constitute our entire automated test and evaluation suite. Check them out!​‌‌​​​‌‌​‌‌​‌‌‌‌​‌‌​‌‌‌​​‌‌​​‌‌‌​‌‌‌​​‌​​‌‌​​​​‌​‌‌‌​‌​​​‌‌‌​‌​‌​‌‌​‌‌​​​‌‌​​​​‌​‌‌‌​‌​​​‌‌​‌​​‌​‌‌​‌‌‌‌​‌‌​‌‌‌​​‌‌‌​​‌‌​​‌​​​​​​‌‌​​‌‌​​‌‌​‌‌‌‌​‌‌‌​​‌​​​‌​​​​​​‌‌​​‌‌​​‌‌​‌​​‌​‌‌​‌‌‌​​‌‌​​‌​​​‌‌​‌​​‌​‌‌​‌‌‌​​‌‌​​‌‌‌​​‌​​​​​​‌‌‌​‌​​​‌‌​‌​​​​‌‌​​‌​‌​​‌​​​​​​‌‌‌​​‌‌​‌‌​​‌​‌​‌‌​​​‌‌​‌‌​‌‌‌‌​‌‌​‌‌‌​​‌‌​​‌​​​​‌​​​​​​‌‌​​‌‌​​‌‌​‌‌​​​‌‌​​​​‌​‌‌​​‌‌‌​‌‌‌‌​‌‌​​‌‌‌​​‌​​‌‌​​​​​​‌‌‌​​‌​​‌‌​‌‌​​​‌‌​​‌‌​​‌‌​‌‌​​‌‌​​‌​‌​​‌‌​​‌​​‌‌‌‌‌​‌​​‌​‌‌‌​​​‌​​​​​​‌‌​​‌‌​​‌‌​‌​​‌​‌‌​‌‌‌​​‌‌​​‌​​​​‌​​​​​​‌‌‌​‌​​​‌‌​‌​​​​‌‌​​‌​‌​​‌​​​​​​‌‌​‌‌‌​​‌‌​​‌​‌​‌‌‌‌​​​​‌‌‌​‌​​​​‌​​​​​​‌‌​​‌‌​​‌‌​‌‌​​​‌‌​​​​‌​‌‌​​‌‌‌​​‌​​​​​​‌‌​‌​​‌​‌‌​‌‌‌​​​‌​​​​​​‌‌‌​‌​​​‌‌​‌​​​​‌‌​​‌​‌​​‌​​​​​​‌‌​‌‌‌‌​‌‌‌​​​​​‌‌​​‌​‌​‌‌​‌‌‌​​‌‌‌​​​​​‌‌​‌​​‌​‌‌​‌‌​​​‌‌​‌‌‌‌​‌‌‌​‌​​​​‌​​​​​​‌‌‌​​‌​​‌‌​​‌​‌​‌‌‌​​​​​‌‌​‌‌‌‌​‌‌‌​​‌‌​‌‌​‌​​‌​‌‌‌​‌​​​‌‌​‌‌‌‌​‌‌‌​​‌​​‌‌‌‌​​‌​​‌​‌‌​​​​‌​​​​​​‌‌​​‌​​​‌‌​‌‌‌‌​‌‌​‌‌‌​​​‌​​‌‌‌​‌‌‌​‌​​​​‌​​​​​​‌‌​​‌‌​​‌‌​‌‌‌‌​‌‌‌​​‌​​‌‌​​‌‌‌​‌‌​​‌​‌​‌‌‌​‌​​​​‌​​​​​​‌‌‌​‌​​​‌‌​‌​​​​‌‌​​‌​‌​​‌​​​​​​‌‌​​​‌​​‌‌‌​​‌​​‌‌​​​​‌​‌‌​‌‌‌​​‌‌​​​‌‌​‌‌​‌​​​​‌‌​​‌​‌​‌‌‌​​‌‌​​‌​​​​‌

The file was 3,619 bytes even though the visible text was only about 500 characters. Hidden data, obviously.

curl -sL https://commaai.github.io/model_reports/[redacted].../README.txt | wc -c
# 3619

curl -sL https://commaai.github.io/model_reports/[redacted].../README.txt | cat -v
# ...Check them out!M-bM-^@M-^KM-bM-^@M-^LM-bM-^@M-^LM-bM-^@M-^K...

cat -v showed escape sequences after the visible text: M-bM-^@M-^K and M-bM-^@M-^L repeated hundreds of times. Those are the raw bytes E2 80 8B and E2 80 8C, but I didn't know what they were at the time.
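In hindsight the cat -v notation is mechanical: a byte with the high bit set is printed as M- followed by its low seven bits, and control bytes become ^ plus a letter. A rough sketch of that rendering (my own reimplementation based on how GNU cat behaves, not its actual source) shows what the UTF-8 encodings of U+200B and U+200C look like:

```python
def cat_v(data: bytes) -> str:
    """Render bytes roughly the way `cat -v` does (tab and newline pass through)."""
    out = []
    for b in data:
        prefix = ""
        if b >= 128:            # high bit set: "M-" then the low 7 bits
            prefix = "M-"
            b -= 128
        if b in (9, 10) and not prefix:
            out.append(chr(b))  # plain tab / newline are left alone
        elif b < 32:
            out.append(prefix + "^" + chr(b + 64))
        elif b == 127:
            out.append(prefix + "^?")
        else:
            out.append(prefix + chr(b))
    return "".join(out)

print(cat_v("\u200b".encode("utf-8")))  # M-bM-^@M-^K
print(cat_v("\u200c".encode("utf-8")))  # M-bM-^@M-^L
```

So each three-escape group is one zero-width character, and only the last byte (^K vs ^L) differs between the two.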

Googled “invisible unicode steganography” and found that zero-width characters (U+200B, U+200C, etc.) are commonly used to hide binary data in text. Checked if those were present:

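python3 -c "
# read as UTF-8 explicitly so the zero-width characters decode correctly
text = open('README.txt', encoding='utf-8').read()
# count every code point in the zero-width / directional-mark range
for c in sorted(set(text)):
    if 0x200b <= ord(c) <= 0x200f:
        print(f'U+{ord(c):04X}: {text.count(c)} occurrences')
"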
# U+200B: 545 occurrences
# U+200C: 487 occurrences

Only two character types: U+200B (545) and U+200C (487). 1032 bits total = 129 bytes of hidden data.
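The numbers are consistent with the file size, too. A quick check, assuming the file is UTF-8 and the visible text is plain ASCII:

```python
zero_width = 545 + 487                 # U+200B + U+200C counts from above
hidden_bytes, leftover_bits = divmod(zero_width, 8)
utf8_size = zero_width * 3             # each of these code points is 3 bytes in UTF-8
visible_bytes = 3619 - utf8_size       # what remains for the visible text

print(hidden_bytes, leftover_bits)     # 129 0
print(utf8_size, visible_bytes)        # 3096 523
```

The bit count divides evenly by 8, which is a good sign that one character really is one bit.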

Two characters means two possible bit mappings. Tried U+200B=0/U+200C=1 first.

That worked. Wrote a decoder:

import sys

# mapping: U+200B=0, U+200C=1
data = sys.stdin.read()
bits = "".join("0" if c == "\u200b" else "1" if c == "\u200c" else "" for c in data)
out = "".join(chr(int(bits[i:i+8], 2)) for i in range(0, len(bits), 8) if len(bits[i:i+8]) == 8)
print(out)

This decoded to the second flag and a message about checking the openpilot repository.

congratulations for finding the second flag{[redacted]}. find the next flag in the 
openpilot repository, don't forget the branches!

flag 2: [redacted]
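I guessed the mapping by hand, but the guess can be automated. A sketch that decodes with both mappings and keeps whichever looks more like text; encode() and the cover string are hypothetical helpers for the demo, not part of the challenge:

```python
ZW0, ZW1 = "\u200b", "\u200c"

def decode(text: str, zero: str, one: str) -> bytes:
    """Collect the two marker characters as bits and pack them MSB-first."""
    bits = "".join("0" if c == zero else "1" for c in text if c in (zero, one))
    usable = len(bits) - len(bits) % 8          # drop a trailing partial byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))

def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII or common whitespace."""
    if not data:
        return 0.0
    return sum(32 <= b < 127 or b in (9, 10, 13) for b in data) / len(data)

def best_decoding(text: str) -> bytes:
    """Try both bit mappings and return the more text-like result."""
    candidates = [decode(text, ZW0, ZW1), decode(text, ZW1, ZW0)]
    return max(candidates, key=printable_ratio)

def encode(msg: bytes) -> str:
    """Demo helper: hide msg as zero-width characters (U+200B=0, U+200C=1)."""
    return "".join(ZW1 if bit == "1" else ZW0 for byte in msg for bit in f"{byte:08b}")

cover = "visible text" + encode(b"hello flag")
print(best_decoding(cover))  # b'hello flag'
```

The wrong mapping flips every bit, so ASCII text turns into bytes above 127 and scores zero.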

Flag 3 - Text in ONNX model

Cloned the openpilot repo and checked out the neurips-driving branch.

git clone --depth=1 --no-single-branch https://github.com/commaai/openpilot.git
cd openpilot
git fetch --depth=1 origin neurips-driving
git checkout -f FETCH_HEAD

Ran strings on the model file:

strings -a selfdrive/modeld/models/driving_policy.onnx | grep -ni "flag{"

Found the third flag as plain text in the ONNX file.

congratulations for finding the third flag{[redacted]}
for the next flag: go to hf/datasets/commaai/comma2k19
the names are Vincent Rijmen and Joan Daemen
the dongle_id is b0c9d2329ad1606b
the date is 2018-08-16
the time is 21-52-30
the CAN boot time is around 17602.32
the CAN speed is x m/s
the key is md5(x:.1f).digest() # 128 bits
European Central Bank code:
55c2ffe03e69a22836834f26e4deb10d71b6b704c99faf39357587516a58b096c8ea3ecc0800fec4ac501b52cca00903011e34d604ed6b9b99e88b5571f3876bb0370

Vincent Rijmen and Joan Daemen created AES. So this is AES decryption with the key derived from speed.

flag 3: [redacted]
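For reference, the strings | grep step in this section can be approximated in pure Python. This is only a rough equivalent (printable ASCII runs of 4 or more bytes, case-insensitive match), and the blob below is a made-up stand-in for the ONNX file:

```python
import re

def printable_runs(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return runs of at least min_len printable ASCII bytes, like `strings -a`."""
    return re.findall(rb"[\x20-\x7e\t]{%d,}" % min_len, data)

def grep_runs(data: bytes, needle: bytes) -> list[bytes]:
    """Case-insensitive filter over the printable runs, like `grep -i`."""
    needle = needle.lower()
    return [run for run in printable_runs(data) if needle in run.lower()]

# Fake binary blob for illustration only.
blob = b"\x08\x01\x12\x07pytorch\x00\x00congrats flag{demo}\xff\xfe\x00ab\x00"
print(grep_runs(blob, b"flag{"))  # [b'congrats flag{demo}']
```

On the real file you would pass the contents of driving_policy.onnx read in binary mode.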

Flag 4 - AES brute force

The “European Central Bank” hint pointed to ECB mode (AES-ECB). The hex string was ciphertext. Key derivation was MD5 of a speed value formatted as a single-decimal float.
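My reading of the md5(x:.1f).digest() hint, as a sketch: format the speed with one decimal place, hash the resulting text, and use the 16-byte digest directly as an AES-128 key. The speed below is an arbitrary example, not the answer:

```python
from hashlib import md5

def derive_key(speed: float) -> bytes:
    """MD5 of the speed formatted with one decimal place (16 bytes = 128 bits)."""
    return md5(f"{speed:.1f}".encode("ascii")).digest()

key = derive_key(12.34)   # arbitrary example; formats as "12.3"
print(len(key) * 8)       # 128
```

Because the formatting rounds to one decimal, every speed in a 0.1-wide band maps to the same key, which is what makes the key space tiny.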

Car speeds are bounded, so 0-60 m/s at 0.1 increments gives 601 possibilities. Brute force it.
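A quick sanity check on that key space, assuming the 60 m/s cap. Building the strings from integers avoids any worry about float formatting, and the check at the end confirms the simpler i / 10 approach gives the same strings:

```python
def candidate_speeds(max_speed: int = 60) -> list[str]:
    """All speeds from 0.0 to max_speed in 0.1 steps, as one-decimal strings."""
    return [f"{i // 10}.{i % 10}" for i in range(max_speed * 10 + 1)]

cands = candidate_speeds()
print(len(cands), cands[0], cands[-1])  # 601 0.0 60.0

# Formatting i / 10 as a float produces exactly the same candidates.
assert all(f"{i / 10:.1f}" == s for i, s in enumerate(cands))
```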

Wrote a script that extracted the hex ciphertext directly from the ONNX file and tried each possible speed:

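from hashlib import md5
from Crypto.Cipher import AES  # provided by the pycryptodome package

# extract hex blob from onnx (see brute_force.py for full extraction logic)
hex_blob = "55c2ffe03e69a22836834f26e4deb10d71b6b704c99faf39357587516a58b096..."
data = bytes.fromhex(hex_blob)

# candidate speeds 0.0 .. 60.0 m/s in 0.1 steps
for i in range(0, 600 + 1):
    x = i / 10.0
    # key = MD5 of the speed formatted with one decimal place (16 bytes, AES-128)
    key = md5(f"{x:.1f}".encode()).digest()

    # a wrong key still decrypts, just to garbage, so look for the flag marker;
    # decrypt() only raises if the ciphertext length is not a multiple of 16,
    # and that error should surface rather than be swallowed
    p = AES.new(key, AES.MODE_ECB).decrypt(data)
    s = p.rstrip(b"\x00").decode("utf-8", "ignore")
    if "flag{" in s:
        print(f"speed: {x:.1f} m/s")
        print(s)
        break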

The correct speed was 30.8 m/s.

congratulations on finding the fourth flag{[redacted]}. go to the comma_four.jpg image 
again for the last flag. Derek Upham. left to right, top to bottom, 8x8 zig-zag. 
[first 32 bits = payload length (in bytes)] [payload]

Back to the original image. Derek Upham created jsteg.

flag 4: [redacted]

Flag 5 - jsteg

jsteg is a JPEG steganography tool. It hides data in the LSB of JPEG’s quantized DCT coefficients.

JPEG compression:

Splits image into 8x8 blocks
Applies DCT (converts to frequency domain)
Quantizes the coefficients (rounds them for compression)
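To make those steps concrete, here is a toy version of the DCT and quantization for a single 8x8 block. It is a simplified sketch: real JPEG also level-shifts the samples and uses a per-frequency quantization table, while this uses one uniform step (an assumption for brevity).

```python
import math

def dct_8x8(block):
    """Orthonormal 2D DCT-II of an 8x8 block given as a list of rows."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def quantize(coefs, step=16):
    """Divide by the step and round to integers; this is the lossy part."""
    return [[round(v / step) for v in row] for row in coefs]

flat = [[10] * 8 for _ in range(8)]   # a perfectly uniform block
q = quantize(dct_8x8(flat))
print(q[0][0])                        # 5: all the energy lands in the DC term
```

The integers in q are what ends up in the file, and they are what jsteg modifies.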

jsteg embeds bits in these quantized coefficients. Extraction rules:

Skip DC coefficient (index 0)
Skip coefficients where |value| ≤ 1
Traverse blocks row-major (left to right, top to bottom)
Within each block use zigzag scan order
Extract LSB of each remaining coefficient
First 32 bits encode payload length (bytes)
Next (length × 8) bits are the payload
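The rules above can be exercised on synthetic data before touching a real JPEG. A toy round trip, assuming each block is already a flat list of 64 coefficients in zigzag order; the embedder here is my own simplification (it grows the magnitude to fix parity) and is not how jsteg itself writes bits:

```python
def to_bits(payload: bytes) -> list[int]:
    """32-bit big-endian length header followed by the payload, MSB first."""
    raw = len(payload).to_bytes(4, "big") + payload
    return [(byte >> shift) & 1 for byte in raw for shift in range(7, -1, -1)]

def embed(blocks, payload: bytes) -> None:
    """Toy embedder: store one bit in the parity of each usable coefficient."""
    bits, pos = to_bits(payload), 0
    for blk in blocks:
        for i in range(1, 64):                 # skip DC at zigzag index 0
            if pos == len(bits):
                return
            if abs(blk[i]) <= 1:               # skip 0 and +/-1
                continue
            if abs(blk[i]) & 1 != bits[pos]:
                blk[i] += 1 if blk[i] > 0 else -1   # keeps |value| >= 2
            pos += 1
    if pos < len(bits):
        raise ValueError("not enough usable coefficients")

def extract(blocks) -> bytes:
    """Apply the extraction rules: skip DC, skip |v| <= 1, take the LSB."""
    bits = [abs(blk[i]) & 1 for blk in blocks
            for i in range(1, 64) if abs(blk[i]) > 1]
    length = int("".join(map(str, bits[:32])), 2)
    body = bits[32:32 + 8 * length]
    return bytes(int("".join(map(str, body[i:i + 8])), 2)
                 for i in range(0, len(body), 8))

# Four fake blocks with coefficient values cycling through -3..3.
blocks = [[(i % 7) - 3 for i in range(64)] for _ in range(4)]
embed(blocks, b"hi")
print(extract(blocks))  # b'hi'
```

Having a known-good round trip makes it much easier to tell a convention mistake from a bug in the coefficient reading.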

This part was annoying. Small mistakes (including zeros, including ±1 coefficients, wrong zigzag order, wrong bit order within a byte, wrong endianness for the length) make the length field decode to garbage. Instead of 51 bytes, you get 2 million bytes and the entire extraction fails. Had to get the exact jsteg convention right.

from jpegio import read

zz = [
 0,1,8,16,9,2,3,10,17,24,32,25,18,11,4,5,
 12,19,26,33,40,48,41,34,27,20,13,6,7,14,21,28,
 35,42,49,56,57,50,43,36,29,22,15,23,30,37,44,51,
 58,59,52,45,38,31,39,46,53,60,61,54,47,55,62,63
]

def bits_to_int(b):
    return int(''.join(str(x) for x in b), 2)

def bits_to_bytes(bits):
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i+8]
        if len(chunk) < 8:
            chunk += [0]*(8-len(chunk))
        out.append(int(''.join(str(x) for x in chunk), 2))
    return bytes(out)

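img = read("comma_four.jpg")
coefs = img.coef_arrays[0]  # quantized DCT coefficients of the first component (luma in a typical YCbCr JPEG)
h, w = coefs.shape

bits = []
# blocks in row-major order: left to right, top to bottom
for by in range(0, h, 8):
    for bx in range(0, w, 8):
        blk = coefs[by:by+8, bx:bx+8]
        # zigzag order within the block, skipping the DC coefficient (zz[0])
        for idx in zz[1:]:
            r, c = divmod(idx, 8)
            v = int(blk[r, c])
            if abs(v) <= 1:  # 0 and ±1 carry no data
                continue
            bits.append(abs(v) & 1)  # LSB of the magnitude

# first 32 bits: payload length in bytes, most significant bit first
length = bits_to_int(bits[:32])
payload_bits = bits[32:32 + length*8]
payload = bits_to_bytes(payload_bits)

print(payload.decode('utf-8'))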

congratulations for finding the last flag{[redacted]}
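One footnote on the hardcoded zz table: a typo in it is exactly the kind of small mistake that breaks the length field, so it is worth generating the order instead. A sketch that walks the anti-diagonals of the block and alternates direction:

```python
def zigzag_order(n: int = 8) -> list[int]:
    """Row-major indices of an n x n block, visited in JPEG zigzag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col, one anti-diagonal at a time
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)              # even diagonals run bottom-left to top-right
        order.extend(r * n + (s - r) for r in rows)
    return order

print(zigzag_order()[:10])  # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```

Comparing zigzag_order() against the literal table is a cheap way to rule out that failure mode.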

flag 5: [redacted]