RE: LeoThread 2024-11-23 15:13

in LeoFinance, 3 months ago

This code was generated by the small Qwen2.5-Coder 3B model that I installed locally on my machine. I'm amazed at how good it is, despite being one of the smaller coding models!

The Code (explanation below):

import os
import re

def find_files_with_ending_98(directory):
    number_pattern = r'\b\d+98\b'
    
    for root, dirs, files in os.walk(directory):
        for file in files:
            try:
                file_path = os.path.join(root, file)
                with open(file_path, 'r', encoding='utf-8-sig') as f:
                    for line_number, line in enumerate(f, start=1):
                        if re.search(number_pattern, line):
                            print(f"File: {file_path}")
                            print(f"Line {line_number}: {line.strip()}\n")
            except UnicodeDecodeError:
                print(f"Warning: Could not decode file {file_path}. Skipping.")

# Start the search from the current directory
find_files_with_ending_98('.')

Context: I wanted to find a certain number that ends with 98 in one of my older text files.

Explanation:

  1. Import Modules:

    • os is used to interact with the operating system, allowing us to traverse directories and files.
    • re is used for regular expression operations, which help in searching for patterns like numbers ending with "98".
  2. Pattern Definition:

    • The pattern \b\d+98\b is defined using a regular expression.
      • \d+ matches one or more digits.
      • 98 matches the literal string "98".
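      • The \b on each side marks a word boundary, so the whole number has to stand on its own and end in "98". A "98" buried inside a longer token (12987, id_4598) is rejected, and because \d+ needs at least one digit, a bare 98 is not matched either.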
  3. Walking Through Directory:

    • os.walk(directory) is used to traverse the directory tree starting from directory.
    • For each file found in a directory or its subdirectories, it checks if the pattern matches any line in the file.
  4. File Handling and Searching:

    • The file is opened in text mode with utf-8-sig encoding (UTF-8, with any leading BOM skipped).
    • Each line in the file is read using enumerate to get both the line number and the line content.
    • If a line matches the pattern, it prints the file path along with the line number and the matching line.
  5. Starting Search:

    • The search starts from the current directory ('.') by calling the function with this argument.

This program recursively searches all files in the specified directory and its subdirectories. For every line that contains a number ending with "98", it prints the file path, the line number, and the line itself.
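To get a feel for what the pattern accepts and rejects, here is a small sketch that runs it against a few sample lines. The sample strings are made up for illustration; only the pattern comes from the code above.

```python
import re

# Same pattern as in the function above.
number_pattern = r'\b\d+98\b'

# Hypothetical sample lines, invented for this demo.
samples = [
    "Invoice 1298 paid",  # match: digits, then 98, then a word boundary
    "Room 98",            # no match: \d+ needs at least one digit before 98
    "Serial 12987",       # no match: 98 is not at the end of the number
    "id_4598",            # no match: '_' is a word character, so there is no \b before 4598
    "Total: 3.98",        # no match: after the '.', only "98" is left
]

for text in samples:
    match = re.search(number_pattern, text)
    print(f"{text!r}: {match.group() if match else None}")
```

If a bare 98 or a decimal like 3.98 should count as a hit too, swapping \d+ for \d* (that is, r'\b\d*98\b') is one way to loosen the pattern.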

Key Changes:

  1. Encoding Specification:

    • Changed 'r', encoding='utf-8' to 'r', encoding='utf-8-sig'.
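      • The '-sig' variant tells Python to skip the BOM (Byte Order Mark) at the start of the file if one exists. With plain 'utf-8' the BOM does not raise an error, but it stays in the text as an invisible '\ufeff' character at the start of the first line.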
  2. Error Handling:

    • Added a try-except block around the file opening code to catch and handle UnicodeDecodeError.
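The difference between the two encodings is easy to check on a few raw bytes. This is just a sketch: the byte string is made up and stands in for the start of a file saved as UTF-8 with a BOM.

```python
# Bytes of a UTF-8 file that starts with a BOM (EF BB BF), followed by "1298".
raw = b'\xef\xbb\xbf1298\n'

plain = raw.decode('utf-8')         # the BOM survives as the character U+FEFF
stripped = raw.decode('utf-8-sig')  # the BOM is removed while decoding

print(plain.startswith('\ufeff'))   # True
print(repr(stripped))               # '1298\n'
```

A file without a BOM decodes the same way under both encodings, so using 'utf-8-sig' everywhere is harmless.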

With this approach, a file that is not valid UTF-8 gets skipped with a warning instead of crashing the search. Keep in mind that this is a workaround: skipped files are never searched, so a match inside a file saved in a different encoding will be missed.
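If the skipped files matter, one possible extension (not part of the generated code) is to retry with a second encoding instead of giving up. The helper name read_lines_with_fallback and the choice of 'latin-1' as the fallback are my own assumptions. Latin-1 never fails to decode, so non-ASCII characters may come out wrong, but digits are plain ASCII and stay searchable.

```python
import os
import tempfile

def read_lines_with_fallback(file_path, encodings=('utf-8-sig', 'latin-1')):
    """Hypothetical helper: try each encoding in order, return (encoding, lines)."""
    for encoding in encodings:
        try:
            with open(file_path, 'r', encoding=encoding) as f:
                return encoding, f.read().splitlines()
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    return None, []  # nothing worked; the caller can skip the file

# Demo: a file containing a Latin-1 'é' (byte 0xE9), which is invalid UTF-8 here.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'legacy.txt')
    with open(demo_path, 'wb') as f:
        f.write(b'caf\xe9 order 1298\n')
    used_encoding, lines = read_lines_with_fallback(demo_path)
    print(used_encoding, lines)
```

The returned lines could then go through the same re.search loop as before. Guessing the encoding like this is still a heuristic, so a wrong guess can garble non-ASCII text.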