🇬🇧 Discover your text! A simple letter frequency analysis tool

INTRODUCTION

Have you ever wondered which letters in the Polish language are used most frequently? Or perhaps you want to check if your favorite author has their own unique "alphabetical fingerprint"? Thanks to a simple Python program, you can analyze any text in seconds, view the results in a clear format, and export them for further analysis.

Our program is an ideal tool for pupils, students, data analysts, and anyone who is simply curious about language. Just paste any fragment of text—from a short sentence to an entire novel—and the program will take care of the rest.

Principle of Operation

At the start, the program asks the user to enter text. The text is read as a single line, and its length is limited only by your RAM, so it can be veeeery long 🤩. After entering the text, you decide whether the results should be exported to a CSV file, and then the tool gets to work. The results are presented in three ways to give you a full picture of the analyzed material:

Detailed Table: A clear table is displayed in the console showing how many times each letter of the Polish alphabet (from "a" to "ż", plus q, v, and x, which appear in loanwords) occurred in your text.

Graphical Chart: The program generates a readable bar chart. All letters of the alphabet are on the horizontal axis, and the height of the bars corresponds to the number of occurrences. This allows you to instantly see which letters dominate the text.

CSV Export: If you want to save the results for later or conduct more advanced analysis, the program offers an option to export data to a CSV file. Such a file can be easily opened in any spreadsheet software, such as Microsoft Excel or Google Sheets.
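
As an illustration of the CSV export, here is a minimal sketch that produces one row per letter using only the standard library. The function name `counts_to_csv_text` and the two-column layout are assumptions made for this example, so the file the program itself writes may be laid out differently. The semicolon separator is chosen because spreadsheet software in many European locales expects it.

```python
import csv
import io


def counts_to_csv_text(counts):
    """Return letter counts as semicolon-separated CSV text, one row per letter.

    Hypothetical helper for illustration; `counts` maps a letter to its count.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
    writer.writerow(["Letter", "Occurrences"])  # header row
    for letter, count in counts.items():
        writer.writerow([letter, count])
    return buffer.getvalue()


# To save the result, one option is to write the text with the "utf-8-sig"
# encoding, which helps Excel recognize Polish characters:
#
#     with open("letters.csv", "w", encoding="utf-8-sig", newline="") as f:
#         f.write(counts_to_csv_text({"a": 2, "ż": 1}))
```

Building the CSV as a string first keeps the formatting separate from the file handling, which makes the layout easy to inspect before anything is written to disk.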

This simple yet powerful tool opens the door to the fascinating world of text analysis and shows how much information can be extracted from ordinary words.

Below is the Python code:


PYTHON CODE

# Install required libraries
# pip install matplotlib pandas

import matplotlib.pyplot as plt
import pandas as pd
import os

def text_analyzer():
    # Polish alphabet plus q, v, and x, which appear in loanwords
    alphabet = 'aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż'
    counters = {letter: 0 for letter in alphabet}
    
    print("Letter Frequency Text Analyzer")
    print("-" * 40)
    user_text = input("Enter text for analysis: ")
    # strip() tolerates accidental spaces around the answer
    export_decision = input("Do you want to export the results to a CSV file? (yes/no): ").strip().lower()

    print("\nAnalyzing text...")
    for char in user_text.lower():
        if char in counters:
            counters[char] += 1
            
    print("Analysis complete.")
    
    print("\n## Letter Occurrence Table ##")
    print("-" * 30)
    print(f"{'Letter':<10} | {'Occurrences':<20}")
    print("-" * 30)
    for letter, count in counters.items():
        print(f"{letter:<10} | {count:<20}")
    print("-" * 30)

    all_letters = list(counters.keys())
    all_counts = list(counters.values())
    
    print("\nGenerating chart...")
    plt.figure(figsize=(18, 8))
    plt.bar(all_letters, all_counts, color='skyblue')
    plt.title('Letter Frequency in Text (All Alphabet Letters)')
    plt.xlabel('Letters')
    plt.ylabel('Number of Occurrences')
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.xticks(rotation=60, ha='right', fontsize=10)
    plt.tight_layout()
    
    print("Displaying chart. Close the chart window to continue.")
    plt.show()

    if export_decision in ('yes', 'tak'):
        print("\n## Exporting Results ##")
        try:
            export_data = {letter: [letter, counters[letter]] for letter in alphabet}
            df = pd.DataFrame(export_data, index=['Letter', 'Occurrences'])
            df.index.name = "Row Description"
            file_name = 'text_analysis.csv'
            # Using semicolon as separator for European Excel compatibility
            df.to_csv(file_name, sep=';', encoding='utf-8-sig')
            
            with open(file_name, 'a', encoding='utf-8-sig') as file:
                file.write(f'\n\nAnalyzed text:\n"{user_text}"')
                
            print(f"\n✅ Success! Results have been successfully saved to '{file_name}'")
            print(f"File location: {os.path.abspath(file_name)}")
        except Exception as e:
            print(f"\n❌ An error occurred while saving the file: {e}")
    else:
        print("\nProgram finished without exporting data (as per your decision).")

if __name__ == '__main__':
    text_analyzer()



HOW DOES THE CODE WORK?

When it runs, the program performs a few simple steps.

  1. Step 1: Preparation and Data Collection. At the very beginning, the program prepares a counter, initially set to zero, for every letter of its alphabet. Then it asks the user to enter the text for analysis and to decide whether the final results should be saved to a file.
  2. Step 2: Text Analysis. The program goes through the entire entered text, character by character. For every letter it encounters (ignoring case), it adds one to the counter assigned to that letter. Characters outside the alphabet, such as digits, spaces, and punctuation, are skipped.
  3. Step 3: Presentation of Results in a Table. After the counting is finished, the program displays a clear table in the console, showing the number of occurrences next to each letter of the alphabet.
  4. Step 4: Data Visualization. Using the collected data, the program creates a bar chart. All letters of the alphabet are placed on the horizontal axis, and the height of each bar reflects the number of occurrences of that letter.
  5. Step 5: Export to File (Optional). If the user asked for it at the beginning, the program creates a file in CSV format. It saves the number of occurrences of each letter and, for context, the original text that was analyzed. Finally, it confirms that the file has been saved and prints its location.
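
The first three steps can also be sketched as small functions that are easy to test separately from the chart and the export. This is a simplified illustration rather than the program's actual structure: the names `read_text`, `count_letters`, and `format_table` are hypothetical, and `read_text` reads a whole stream instead of a single line so that multi-line text can be piped in.

```python
import io
import sys

# Polish alphabet plus q, v, and x, matching the letters the program counts.
ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"


def read_text(stream=sys.stdin):
    """Read everything from the stream, including line breaks."""
    return stream.read()


def count_letters(text, alphabet=ALPHABET):
    """Count case-insensitive occurrences of each letter in `alphabet`."""
    counts = {letter: 0 for letter in alphabet}
    for char in text.lower():
        if char in counts:  # digits, spaces, and punctuation are skipped
            counts[char] += 1
    return counts


def format_table(counts):
    """Build the console table as a list of lines."""
    lines = [f"{'Letter':<10} | {'Occurrences':<20}", "-" * 30]
    for letter, count in counts.items():
        lines.append(f"{letter:<10} | {count:<20}")
    return lines


if __name__ == "__main__":
    # io.StringIO stands in for sys.stdin so the example runs unattended.
    sample = io.StringIO("Ala ma kota,\na kot ma Alę.")
    counts = count_letters(read_text(sample))
    print(f"Letters counted: {sum(counts.values())}")
    print(f"Table rows: {len(format_table(counts))}")
```

Keeping the counting in a function that takes a string and returns a dictionary means the same logic can serve the table, the chart, and the export without asking for input again.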
