printf and format strings
The printf function produces formatted output. In C, it takes the form:
int printf(const char *format, ...)
… where the first parameter is a format string and the following parameters are its arguments. For example:
printf("Hello, %s! Your user id is %d.\n", name, uid);
In a format string, format specifiers start with a % sign.
For each specifier in the format string, there should (usually) be one parameter of a matching type in the argument list:
|%x|unsigned int, prints in hex|
Many of these options can take a further numeric argument between the
% sign and the type specifier.
For example, %08x prints an integer in hexadecimal using at least 8 characters, padding with leading zeroes if necessary (so 255 prints as 000000ff).
Integer specifiers can also take a length modifier.
%hd means a short int and
%hhd is a char, whereas
%ld is a long int and
%lld a long long int.
%n is unusual in that it writes rather than reads a value: it stores the number of characters printed so far in the variable that its argument points to.
Like all integer specifiers, it too can take length modifiers.
If 8 bytes have been written so far,
%n writes the 4-byte value 0x00000008 to its argument, whereas
%hhn writes the single byte (char) 0x08.
Why %n exists at all is debatable, but it does allow some tricks such as this one, courtesy of http://stackoverflow.com/a/3402415:
int n;
printf("%s: %nFoo\n", "hello", &n);
printf("%*sBar\n", n, "");
… which prints the following:
hello: Foo
       Bar
The point is that Foo and Bar are aligned, however long the leading string (in this case hello: ) is.
The * field width, as in %*s, specifies that this conversion takes two parameters in the argument list, the first being an int that gives the field width, the second being the actual argument.
For example, printf("%*d", len, var) with len=8 has the same effect as printf("%8d", var), except, of course, that in the former the width can be adjusted at runtime.
You do not need to use the * trick in this lab, however.
%n has proven much more useful for format string attacks—due to its ability to overwrite memory locations—than it has ever been for legitimate uses; but attempts to remove it in C have met with resistance because there are apparently legitimate programs that use it too.
Implementations of printf for other languages can choose not to implement %n, though: Python’s print/format mechanism does not have it.
For more information on
printf’s many options, type
man 3 printf or look on the Internet.
printf, varargs, and the stack
printf uses C’s varargs (variable number of arguments) mechanism.
This lets it accept any number and type of arguments, but at the cost of no type checking.
Although the compiler can complain that, e.g.,
printf("Hello %s!") is probably a mistake, as it knows the semantics of
printf, the compiler cannot tell whether
printf(mystr, name) is correct—it can only warn you that this is dangerous.
When you call
printf("Hello, %s! Your user id is %d.\n", name, uid);
… the stack frame of
printf looks something like this:
uid is passed by value (copied onto the stack), as is the pointer
name (of type
char*, presumably), but not the string that it points to.
With a stack-based calling convention such as 32-bit x86 cdecl, the parameters are pushed onto the stack in reverse order, ending with the address of the format string.
(This is not specific to printf; it is simply how varargs work.)
When printf executes, it sets up an internal pointer to its first argument, the address of the format string.
Then it iterates over the format string.
Characters that are not a
% simply get printed out.
Each time it encounters a
% parameter in the format string (except
%%), it moves its pointer up one step on the stack and processes the argument.
If you call
printf with too few parameters, the effect is that the pointer ends up pointing at addresses on the stack beyond the
printf parameters; C calls this “undefined behaviour”, we call it a vulnerability.
The effect is that each %-parameter in the format string that does not have a matching argument causes printf to consume whatever value lies further up on the stack: %x prints that value, %s reads from the address it contains, and %n even writes to that address.
Above the stack frame of
printf lies the stack frame of the function that called it, with all its local variables that we can target in a format string attack.
The classical format string vulnerability is a line such as:
printf(str);
… where str is user input.
There is no reason ever to do this, as there’s no formatting involved: since you’re just printing a string,
puts(str) would do the same job (apart from appending a newline), as would
printf("%s", str), if you must.
Most compilers nowadays can warn about a printf call whose only argument is not a string literal (GCC and Clang do so with -Wformat-security), and some build configurations treat this as an error.
Why would we attack a program if we can just open the program file ourselves?
Normally, any program we run in a UNIX-based operating system has the same permissions as our user account. However, some tasks such as changing our password require access to files we can’t normally write to, such as the system password file.
Setuid programs run as the program’s owner instead of its caller. For example:
$ ls -l $(which passwd)
-rwsr-xr-x 1 root root 47032 Jan 27 2016 /usr/bin/passwd
The s in the fourth column indicates that this is a setuid program.
The r-x in columns 8-10 means that any logged-in user can execute this program. They can also make a copy of the program file and inspect, debug, or modify that copy should they wish, but making a copy creates a new file owned by themselves and turns off the setuid bit again.
Only the original program will run with root rights.
If there were a format string vulnerability in this program, we might be able to exploit it to gain root rights on the machine we’re working on.
Of course we already have a way to gain root rights on the lab vm (
sudo), but this is just to make setting up and investigating the attacks easier—a fully developed format string attack should, of course, work against a program on a machine where we have no legitimate way of getting root rights.
Compile the formatstring.c program in this lab’s directory with:
$ gcc formatstring.c -o formatstring
$ cp formatstring formatstr-root
$ sudo chown root formatstr-root
$ sudo chmod +s formatstr-root
Note the compiler warning after the first command.
You now have two programs to play with:
formatstring itself and a setuid-root version (formatstr-root).
Have a look at the program. It contains some “secret” values that we’ll try to access and some user inputs (yes, there’s a buffer overflow vulnerability here too, but that’s not today’s topic).
To make the attack slightly easier, the program prints out the memory addresses where the secrets are stored, when you run it. Notice that if you run the program several times, you will get different addresses each time.
Consider the following code in lines 27-34:
printf("Please enter a decimal integer\n");
scanf("%d", &int_input); /* getting an input from user */
printf("Please enter a string\n");
scanf("%s", user_input); /* getting a string from user */

/* Vulnerable place */
printf(user_input);
printf("\n");
The printf(user_input) call directly prints out a string that came from the input, using it as a format string.
Try running the program and entering a few strings, including ones such as
%x and see what happens.
Your tasks for this lab are:
secret to a value of your choosing.
printf("Please enter a decimal integer\n");
scanf("%d", &int_input); /* getting an input from user */
Then recompile the normal and setuid-root programs. Next, we are going to turn off address space layout randomisation. Type the following command (it stays off until you reboot or reactivate it):
$ sudo sysctl -w kernel.randomize_va_space=0
Check if this works by running the formatstring program twice in a row.
It should give you the same address for
secret both times.
Task: Repeat Tasks (i)-(iv) for the modified program.
Note: you may get unlucky and encounter a fixed address for
secret which contains control characters such as
If this happens, ask a lab assistant for help.
formatstr-root as a subprocess. You will need to use something like Python’s
pexpect to handle bidirectional communication between programs.
Your attacks must work against the setuid-root programs indicated, but you are welcome to create variations on the programs to explore what is going on.
For example, you can set the
-g compiler option and then debug the (non-setuid) program with
gdb formatstring to investigate its exact stack layout.
Instead of typing your attack by hand each time, you can put it in a file and run
$ ./formatstr-root < file
… where file is the file with your attack strings.
Place one input on each line in the file.
For example, if you want to answer 42 to the numeric input and
sss to the string one, the file should look like this with a newline at the end of each line:
42
sss
This technique also allows you to create format strings with some non-printable characters.
If you create a hex-encoded file
file.hex like this:
61 73 64 66 0a
… and run
xxd -p -r file.hex > file, then you will get the decoded data in file.
You can hex-encode too by leaving off the -r.
(You can add spaces between the bytes; they are ignored when decoding.)
Use 0a for a newline in a hex-encoded file.
Certain bytes will not work in a format string even if you write the string in hex and decode it to a file.
For example, a null byte (
00) terminates a string in C.
Other bytes are fine in format strings but
scanf, which reads in your string in the first place, will stop if it sees them; this includes space (0x20), newline (0x0a), and the other whitespace bytes 0x09, 0x0b, 0x0c and 0x0d (0x0d is used in newlines on other platforms). If your
secret value lands at an address containing one of these bytes after you have turned ASLR off in Task 2, speak to a lab assistant.
The techniques you’ll need to use include reading from a memory location on the stack and printing out its value; reading from a memory location not on the stack to print its value and writing to a memory location on the stack. What do the different format string parameters do from an attacker’s perspective?
The secret values themselves are on the heap (since
malloc is used to allocate them), but their address, in the form of the pointer
*secret, is on the stack, which is helpful for your format string attack.
However, this is a pointer to
secret; the challenge in Task 2 in particular is to attack
secret, the address of which is not on the stack—not yet.
The format string itself comes from the variable user_input.
This is a local array, not a malloc'd one, so the format string will be located in main’s stack frame, just above that of printf.
If we nudge
printf’s internal pointer up enough times, it will point at the format string itself, where we have almost complete control over the contents.