Chapter 6 Parsing files line by line

6.1 Learning objectives

  • Understand how to read files in Python

  • Learn how to iterate through lines in a file using a for loop

6.2 Create a text file

  • Let’s create a simple text file called sample.txt using the Jupyter text editor

    The file should contain the following lines:

    apple
    banana
    cherry
    date
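
If the Jupyter text editor is not at hand, the same file could also be written from Python. The following is only a sketch: write_sample and SAMPLE_LINES are names made up for this illustration, and it assumes the file should end with a newline after the last entry.

```python
from pathlib import Path

# The four entries from the listing above (illustrative constant name).
SAMPLE_LINES = ["apple", "banana", "cherry", "date"]


def write_sample(path="sample.txt"):
    """Write the sample entries to `path`, one per line, and return the path."""
    # Join the entries with newlines and end the file with a final newline.
    text = "\n".join(SAMPLE_LINES) + "\n"
    target = Path(path)
    target.write_text(text, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Creates sample.txt in the current working directory.
    write_sample()
```

Running this once from the directory that holds your scripts should leave a sample.txt there, ready for the steps that follow.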
    

6.3 Open a file stream

A file stream is like a pipeline that lets you read data from a file one piece at a time. The most common way to open a file is with the built-in open() function, which returns a file object.

  • Create a new Python script called 03-parse-text-file.py

    #!/usr/bin/env python3
    
    import sys
    
    my_file = open(sys.argv[1])
    print(my_file)
  • Save it and make it executable

    chmod +x 03-parse-text-file.py
  • Run the script with the file name as an argument

    ./03-parse-text-file.py sample.txt

    This will print something like

    <_io.TextIOWrapper name='sample.txt' mode='r' encoding='UTF-8'>

As you can see, print() shows a description of the file object rather than the file’s contents. This output just indicates that sample.txt is open in read mode ('r') with UTF-8 encoding.
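
To make the difference between the file object and its contents concrete, here is a small self-contained sketch. It writes its own throwaway file in a temporary directory (the path and the two-line contents are assumptions for the demonstration, not part of the chapter's script) and then compares the object's description with what read() returns.

```python
import os
import tempfile

# Create a throwaway file so the sketch runs on its own
# (illustrative path and contents, not the chapter's sample file).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write("apple\nbanana\n")

my_file = open(path, encoding="utf-8")
description = repr(my_file)  # the text that print(my_file) would display
mode = my_file.mode          # 'r' because the file was opened for reading
content = my_file.read()     # the actual text stored in the file
my_file.close()

print(description)
print(mode)
print(content, end="")
```

The description only names the stream and how it was opened; the text itself has to be requested from the file object, for example with read() or by looping over it.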

6.4 Add a for loop

To read the file and print each line, we can use a for loop.

  • Update 03-parse-text-file.py:

    #!/usr/bin/env python3
    
    import sys
    
    my_file = open(sys.argv[1])
    
    # A for loop over a file object yields one line at a time
    for my_line in my_file:
        # rstrip() is a string method; here it removes the trailing newline
        my_line = my_line.rstrip("\n")
        print(my_line)
    
    my_file.close()
    • We use my_file.close() to close the file after we’re done reading it. This is important to free up system resources.
  • Save it and run the script again

    ./03-parse-text-file.py sample.txt

    This will print each line without the blank line that its trailing newline would otherwise add

    apple
    banana
    cherry
    date
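
The loop can also be written with a with block, which closes the file automatically when the block ends. The sketch below is one possible variant, not the chapter's script: read_lines is a hypothetical helper name, and the file is created in a temporary directory so the example runs on its own. It also shows why rstrip("\n") is needed, since each line read from the file still carries its newline.

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of `path` with the trailing newline removed."""
    lines = []
    # The with block closes the file when it ends, even if an error occurs.
    with open(path, encoding="utf-8") as my_file:
        for my_line in my_file:
            lines.append(my_line.rstrip("\n"))
    return lines


# Build a throwaway copy of the sample file (illustrative path).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write("apple\nbanana\ncherry\ndate\n")

# Read the first line as-is to see the newline it still carries.
with open(path, encoding="utf-8") as my_file:
    raw_first = my_file.readline()

stripped = read_lines(path)

print(repr(raw_first))  # repr() makes the trailing newline visible
for line in stripped:
    print(line)
```

Because print() adds its own newline, printing the unstripped line would leave an empty line after every entry.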

6.5 Summary

Congratulations! You have just:

  • Created a Python script that reads a file

  • Used a for loop to iterate through each line in the file

  • Stripped the trailing newline from each line before printing it