Input/Output in C
Reminder: C file handling
You should already be familiar with the following C functions from <stdio.h>, though you can look up details with man 3 FUNCNAME on a lab machine:
- FILE *fopen(const char *path, const char *mode) opens a file, returning a file handle if successful and NULL otherwise. On some systems, it makes a difference whether you open a file in text mode (r) or binary mode (rb), the difference being that the former activates an extra "line discipline" in the C library. For example, on Windows, text mode translates a lone \n character to \r\n on output. On POSIX systems, there is no difference between text and binary mode.
- int fclose(FILE *file) closes a file, and flushes any pending writes. It returns 0 on success and nonzero in case of errors.
- int fflush(FILE *file) flushes a file: for output, it makes sure any buffered characters are written out. For input, on Linux it throws away any unread input in the C library input buffer (the C standard itself leaves flushing an input stream undefined). It returns 0 on success and nonzero in case of errors. fflush(NULL) flushes all open files.
- int feof(FILE *file) returns nonzero if the file in question has reached end-of-file, and zero otherwise. This is the only correct way to test for an end-of-file condition. Note that it only becomes nonzero after a read has actually hit the end of the file.
- int ferror(FILE *file) returns nonzero if the file in question is in an error state. For example, after an fread that did not return the expected number of blocks, exactly one of feof and ferror will return nonzero, indicating what happened. In case of an error, the global variable errno will contain the error code.
- size_t fread(void *dest, size_t size, size_t num, FILE *file) reads up to num blocks of size bytes into the buffer at dest and returns the number of blocks read (for a block size of 1, this is the number of bytes). If the return value is less than num, then either an error occurred or the end of the file was reached.
- size_t fwrite(const void *buffer, size_t size, size_t num, FILE *file) writes up to num blocks of size bytes from the buffer to the file and returns the number of blocks written. If this is less than num, then an error occurred during the writing.
A number of utility functions build upon these:
- int fprintf(FILE *stream, const char *format, ...) writes formatted output to a file, usually by calling fwrite behind the scenes. printf is a shortcut for fprintf(stdout, ...). These functions do not print a newline unless you ask for one. They return the number of characters written, or a negative number if an error occurred.
- int fputs(const char *string, FILE *file) writes a zero-terminated string to the file, not including the zero terminator. puts is a shortcut that writes to stdout and adds a newline at the end.
- int fputc(int c, FILE *file) writes a single character (c is cast to an unsigned char first) and putchar is a shortcut for writing to stdout (but does not add a newline, unlike puts).
- int fgetc(FILE *file) reads a single character from a file. It returns the character as an int so that the special value EOF can be told apart from every valid byte; once you have checked for EOF, it is safe to cast the result to char. getchar is a shortcut that reads from standard input.
- char *fgets(char *buffer, int size, FILE *file) reads up to size-1 bytes into the buffer, stopping early if it finds a newline or reaches end-of-file. It then adds a zero terminator to the string. If it stopped early because of a newline, the newline is included in the buffer; end-of-file is a condition rather than a character, so nothing is stored for it. A null byte in the input does not stop the read, but it will make the resulting string appear to end early. Older versions of C included a gets version that reads from standard input and does not take a size argument; this is insecure as it can produce a buffer overflow and it should never be used. fgets is safe if the length of the buffer is at least size bytes. It returns a pointer to buffer on success and NULL if an error occurred. End-of-file counts as an error for this purpose if no characters could be read at all.
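To make the fgetc and fgets descriptions concrete, here is a minimal sketch. The helper names count_bytes_and_lines and read_line are made up for this illustration and are not part of stdio.h; the point is that the result of fgetc is kept in an int until it has been compared with EOF, and that fgets keeps the newline unless you strip it yourself.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts the bytes and newlines left in a stream.
 * Returns 0 on success and -1 if a read error occurred. */
int count_bytes_and_lines(FILE *file, long *bytes, long *lines) {
    int ch; /* int, not char: it must be able to hold EOF as well as any byte */
    *bytes = 0;
    *lines = 0;
    while ((ch = fgetc(file)) != EOF) {
        (*bytes)++;
        if (ch == '\n') {
            (*lines)++;
        }
    }
    /* fgetc returns EOF both at end of file and on error; ferror tells them apart. */
    return ferror(file) ? -1 : 0;
}

/* Hypothetical helper: reads one line into buf (capacity size) and strips
 * the trailing newline if there is one.
 * Returns 1 if something was read, 0 at end of file or on error. */
int read_line(FILE *file, char *buf, int size) {
    if (fgets(buf, size, file) == NULL) {
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0'; /* cut the string at the first newline, if any */
    return 1;
}
```

If ch were declared as char, a byte with value 0xFF could compare equal to EOF on systems where char is signed, and the loop would stop early.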
fread can return in one of three different states (fwrite only in the first two):
- The return value is equal to the number of items you asked to read/write (third argument). This means that the read/write was successful.
- The return value is not equal to the number of items, and ferror returns nonzero on the file: an error occurred. The return value indicates how many items were successfully read or written before the error occurred. errno contains more information about what happened.
- The return value is not equal to the number of items, and ferror returns zero on the file: end of file. Calling feof on the file will return nonzero. End of file when reading means exactly what it says. When writing to something with a program on the other side (pipe, socket, pty), the other side closing the connection does not show up as end of file; it is reported as a write error (for pipes and sockets typically EPIPE, usually preceded by a SIGPIPE signal).
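The cases for fread can be told apart in code roughly as follows. This is only a sketch: the type read_state and the function read_items are names invented for this example.

```c
#include <stdio.h>

/* Hypothetical names for the three outcomes of an fread call. */
typedef enum { READ_OK, READ_EOF, READ_ERROR } read_state;

/* Calls fread, stores the number of items actually read in *got,
 * and classifies the outcome. */
read_state read_items(void *dest, size_t size, size_t num, FILE *file, size_t *got) {
    *got = fread(dest, size, num, file);
    if (*got == num) {
        return READ_OK;    /* everything we asked for arrived */
    }
    if (ferror(file)) {
        return READ_ERROR; /* short read because of an error; see errno */
    }
    return READ_EOF;       /* short read because the file ended */
}
```

In the READ_EOF and READ_ERROR cases, *got still tells you how many items at the start of dest are valid.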
Exercises
For our next exercise, we investigate how the C library file functions interact with the terminal. Compile this program:
// program: input1.c //
#include <stdio.h>
#include <string.h>
#include <errno.h>

// Utility function to print an error message and return from main,
// use: return error(code, text); // in main
// prints text and the current errno value, then returns code.
int error(int ret, char* text) {
    int e = errno;
    printf("Error %s: %s (code %i).\n", text, strerror(e), e);
    return ret;
}

int main(int argc, char **argv) {
    if (argc < 2) { printf("Use: %s FILENAME\n", argv[0]); return 1; }
    printf("Opening [%s]\n", argv[1]);
    FILE *file = fopen(argv[1], "r");
    if (file == NULL) { return error(2, "opening file"); }

    char c, d;
    size_t result = fread(&c, 1, 1, file);
    if (result < 1) {
        if (ferror(file)) {
            return error(2, "reading first character");
        } else {
            puts("No first character - end of file?");
            return 2;
        }
    }
    printf("Read character: [%c].\n", c);

    result = fread(&d, 1, 1, file);
    if (result < 1) {
        if (ferror(file)) {
            return error(2, "reading second character");
        } else {
            puts("No second character - end of file?");
            return 2;
        }
    }
    printf("Read another character: [%c].\n", d);

    int e = fclose(file);
    if (e) { return error(2, "closing file."); }
    return 0;
}
The essence of the program is these four lines:
fread(&c, 1, 1, file);
printf(..., c);
fread(&d, 1, 1, file);
printf(..., d);
which read two characters from the file indicated in its first argument, then print them out. Note that we print the first character before reading the second, as this will become relevant.
The rest of the program is error handling. Some tutorials leave this out to make the code look simpler, but that sets you up for terrible habits: in the real world, errors happen and you need to handle them properly. So this program also shows the absolute minimum of error handling when you work with a file.
- Make a file called file containing some characters such as abc and run ./input1 file. Convince yourself that the program prints the first two characters (first two bytes, to be precise).
- Next, run ./input1 /dev/stdin to make it read from standard input. Notice that it blocks. Type a few characters such as abc and notice that the program prints nothing until you press ENTER, at which point both the "Read character" and "Read another character" outputs appear at once. What is the reason for this?
- Restart the program, and when it is waiting for input, type ab[BACKSPACE]c[ENTER]. Explain what you see.
- What happens if you enter only one character and then press ENTER? What if you press ENTER and haven't typed anything at all?
- If you want, verify that using fgetc / getchar produces the same behaviour on standard input: the program does not proceed past the first read command until you press ENTER. You can even try debugging with gdb to make sure.
- Insert the line setbuf(file, NULL); just after the fopen line to make sure it's not the C library buffering.
Have you answered the questions above before reading on?
- Next, on your terminal, execute stty -icanon, which turns off the "icanon" setting. (The convention for stty is that an option name in an argument turns it on, except if prefixed with a minus, which turns it off.)
- Rerun the program (./input1 /dev/stdin) and see what happens now.
- Turn "icanon" back on with stty icanon.
- Research what icanon does, for example in man stty on a lab machine or online.
- Why do BACKSPACE and ^U work in your bash shell, even with icanon turned off?
Next, we are going to experiment with the terminal in fully raw mode:
- stty -g prints the current terminal settings in a format that you can save and load again. Take a look, then store them with the shell command state=$(stty -g).
- Execute stty raw, then run your program again. The terminal looks a bit "messed up" - can you tell what is happening? Try another command like ls in raw mode too if that helps.
- Restore your terminal settings with stty $state, reading them back from the variable where you stored them. You can then reset the window with ^L.
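A program can do the same save, change and restore dance itself through the POSIX termios interface instead of relying on stty. The following is a minimal sketch under that assumption; the function names save_and_clear_lflags and restore_terminal are invented for this example, and it only clears the local-mode flags you pass in rather than reproducing everything stty raw does.

```c
#define _POSIX_C_SOURCE 200809L
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: saves the current settings of the terminal on fd
 * in *saved (like state=$(stty -g)), then switches off the given
 * local-mode flags, for example ICANON, ECHO or ISIG.
 * Returns 0 on success and -1 on failure, e.g. if fd is not a terminal. */
int save_and_clear_lflags(int fd, tcflag_t flags, struct termios *saved) {
    struct termios t;
    if (tcgetattr(fd, saved) == -1) {
        return -1;
    }
    t = *saved;
    t.c_lflag &= ~flags;
    t.c_cc[VMIN] = 1;  /* with ICANON off: a read returns once 1 byte is available */
    t.c_cc[VTIME] = 0; /* ... with no timeout */
    return tcsetattr(fd, TCSANOW, &t);
}

/* Hypothetical helper: puts back the settings saved earlier (like stty $state). */
int restore_terminal(int fd, const struct termios *saved) {
    return tcsetattr(fd, TCSANOW, saved);
}
```

A typical use would be to call save_and_clear_lflags(STDIN_FILENO, ICANON, &saved) near the start of main and restore_terminal(STDIN_FILENO, &saved) on every path out of the program, since the terminal keeps the changed settings after the program exits.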
There is an important warning here if you are ever benchmarking a program that produces (standard) output - the total running time will be the sum of the program's running time and the connected terminal's running time to process the output. For example, printing a sequence of newline characters is typically slower than printing a sequence of 'a's, even if both programs are doing putc in a loop, because the terminal has extra work to do on each newline.
Here is another C program to experiment with:
// input2.c //
// Reads characters in a loop and prints their hex codes.
// Quit with 'Q'.
#include <stdio.h>

int main(void) {
    unsigned char c;
    size_t r; // fread returns a size_t
    setbuf(stdout, NULL); // turn off buffering on standard output
    while (1) {
        r = fread(&c, 1, 1, stdin);
        if (r < 1) break; // bail out on error or eof
        if (c == 'Q') break;
        printf("(%02X) ", c);
    }
    return 0;
}
- What do you expect to happen if you run this in a "normal" terminal and type a few characters?
- Run it in a terminal with icanon turned off and type a few characters. Notice that you can still cancel it with ^C.
- Run it in a terminal in raw mode and type a few characters. What happens now when you press ^C? Luckily, we built in another way to quit the program with Q!
- In raw mode, try a few "special" keys such as ENTER, ESC, the arrow keys, function keys (F1, F2) etc.
- We turned off buffering for standard output. Why is this important - what happens otherwise and who is doing the buffering?
When you press Control-C in a terminal in "cooked" mode, the terminal sends the signal SIGINT (interrupt) to the connected program; the default behaviour for this signal is to terminate the program. In raw mode, the terminal instead just passes the control code for Control-C on to the program as ordinary input, though the program would still quit if you sent it a SIGINT another way (e.g. with the kill command from another terminal). You could also write a signal handler in your own program that reacts to SIGINT and does something else, for example just ignores the signal.
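As a rough sketch of such a signal handler, the code below uses the POSIX sigaction call to record that SIGINT arrived instead of letting it terminate the program. The names got_sigint, on_sigint and install_sigint_handler are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set to 1 by the handler; volatile sig_atomic_t is the type that is
 * safe to write from a signal handler and read from the main program. */
static volatile sig_atomic_t got_sigint = 0;

/* The handler itself: do as little as possible, just note the signal. */
static void on_sigint(int signo) {
    (void)signo;
    got_sigint = 1;
}

/* Hypothetical helper: installs on_sigint as the handler for SIGINT.
 * Returns 0 on success and -1 on failure. */
int install_sigint_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL);
}
```

After install_sigint_handler() has succeeded, pressing Control-C in a cooked terminal no longer kills the program; the main loop can check got_sigint and decide what to do. To ignore the signal entirely, you could set sa.sa_handler to SIG_IGN instead.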
When you type, say, an a, even in raw mode, the character appears on your screen even though there's no print command for it. This is the terminal echo; you can turn even this off with stty -echo, but you are now typing blind! You can still start your program with ./input2 followed by ENTER, and it will still print the hex codes of everything you type from then on until you press Q, but it will no longer show you the characters themselves. If you saved the terminal state before, then typing stty $state and ENTER will get you your echo back.