This code was generated by the small Qwen2.5-coder 3B model that I installed locally on my machine. I'm amazed at how good it is despite being one of the smaller coding models!
The Code (explanation below):
import os
import re

def find_files_with_ending_98(directory):
    number_pattern = r'\b\d+98\b'
    for root, dirs, files in os.walk(directory):
        for file in files:
            try:
                file_path = os.path.join(root, file)
                with open(file_path, 'r', encoding='utf-8-sig') as f:
                    for line_number, line in enumerate(f, start=1):
                        if re.search(number_pattern, line):
                            print(f"File: {file_path}")
                            print(f"Line {line_number}: {line.strip()}\n")
            except UnicodeDecodeError:
                print(f"Warning: Could not decode file {file_path}. Skipping.")

# Start the search from the current directory
find_files_with_ending_98('.')
Context: I wanted to find a certain number that ends with 98 in one of my older text files.
Explanation:
Import Modules:
- os is used to interact with the operating system, allowing us to traverse directories and files.
- re is used for regular expression operations, which help in searching for patterns like numbers ending with "98".
Pattern Definition:
- The pattern \b\d+98\b is defined using a regular expression.
- \d+ matches one or more digits.
- 98 matches the literal string "98".
- \b matches a word boundary, so the number has to stand on its own and "98" has to be its last two digits, not the middle of a longer number. Because \d+ needs at least one digit, a bare "98" is not matched.
Walking Through Directory:
- os.walk(directory) is used to traverse the directory tree starting from directory.
File Handling and Searching:
- Each file is opened in text mode with utf-8 encoding.
- enumerate is used to get both the line number and the line content.
Starting Search:
- The search starts from the current directory ('.') by calling the function with this argument.

This program will recursively search through all files in the specified directory and its subdirectories, printing the file path and the line number for every line that contains a number ending with "98".
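To make the pattern's behavior concrete, here is a small sketch that runs \b\d+98\b against a few strings. The sample strings and the helper name find_match are made up for illustration; only the pattern itself comes from the script above.

```python
import re

# Same pattern as in the script above.
number_pattern = r'\b\d+98\b'

def find_match(text):
    """Return the matched number as a string, or None if nothing matches."""
    m = re.search(number_pattern, text)
    return m.group() if m else None

# Illustrative sample lines (not taken from any real file).
samples = [
    "Invoice 12398 was paid",  # a number ending in 98
    "Order 98 shipped",        # bare 98: \d+ needs at least one digit before it
    "Code 9812 expired",       # 98 at the start of the number, not the end
    "ref_4598 archived",       # '_' is a word character, so there is no \b before 4
    "Total: 1,598 units",      # the comma splits the number, so only 598 is matched
]

for text in samples:
    print(f"{text!r} -> {find_match(text)!r}")
```

Two things stand out: identifiers glued to the number with an underscore are missed, and numbers written with thousands separators are only matched from the last separator onward. Whether that matters depends on how the numbers appear in your files.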
Key Changes:
Encoding Specification:
- The arguments to open() changed from 'r', encoding='utf-8' to 'r', encoding='utf-8-sig'.
- The '-sig' variant tells Python to skip the BOM (Byte Order Mark) at the start of the file if it exists. With plain utf-8 the BOM is kept as a stray U+FEFF character at the start of the first line.
Error Handling:
- A try/except block catches UnicodeDecodeError, so a file that cannot be decoded is reported and skipped instead of stopping the whole search.

This approach lets the search run past files encoded in other character sets without crashing on decoding errors. However, keep in mind that this is a workaround: files that are not valid UTF-8 are skipped rather than searched, so a match inside one of them will be missed.
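If skipping undecodable files is too strict, one possible alternative is to open them with errors='replace', which swaps invalid bytes for U+FFFD instead of raising. The sketch below is an assumption about how that could look, not part of the generated script: the function name search_file_lenient and the demo file are hypothetical. It works for ASCII-compatible encodings such as Latin-1, where digits are stored as the same bytes as in UTF-8, and would not help with UTF-16 files.

```python
import os
import re
import tempfile

number_pattern = r'\b\d+98\b'

def search_file_lenient(file_path):
    """Return (line_number, line) pairs that match, without raising UnicodeDecodeError."""
    hits = []
    # errors='replace' turns undecodable bytes into U+FFFD instead of raising.
    with open(file_path, 'r', encoding='utf-8-sig', errors='replace') as f:
        for line_number, line in enumerate(f, start=1):
            if re.search(number_pattern, line):
                hits.append((line_number, line.strip()))
    return hits

# Demo: a Latin-1 file containing the byte 0xE9, which is invalid as UTF-8 here.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'legacy.txt')
    with open(path, 'wb') as f:
        f.write('caf\xe9 receipt\nnumber 44598 here\n'.encode('latin-1'))
    hits = search_file_lenient(path)
    print(hits)

# The effect of '-sig' on a UTF-8 BOM, shown on raw bytes.
bom_bytes = b'\xef\xbb\xbf12398\n'
print(repr(bom_bytes.decode('utf-8')))      # BOM kept as a leading '\ufeff'
print(repr(bom_bytes.decode('utf-8-sig')))  # BOM stripped
```

The trade-off is that replaced characters make the printed line slightly different from the original text, but the digits being searched for are left intact.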