
How To Read Text Copied From Web To Txt File Using Python

I'm learning how to read text files. I used this way: f=open('sample.txt') print(f.read()) It worked fine if I typed the txt file myself. But when I copied text from a news artic

Solution 1:

You're on Windows and trying to print to the console. The print() is throwing the exception.

The Windows console natively supports only 8-bit code pages, so any character outside your console's code page will break (despite what people say about chcp 65001).

One fix is to install and use https://github.com/Drekin/win-unicode-console. This module talks to the console API at a low level, giving support for multi-byte characters for both input and output.
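A minimal sketch of turning the module on at program start, assuming it has been installed with pip. The helper name enable_unicode_console is made up for this example; win_unicode_console.enable() is the module's own entry point. On Python 3.6 and later the interpreter already uses the Unicode console API (PEP 528), so this mainly matters on older versions.

```python
import sys


def enable_unicode_console():
    """Try to switch on win-unicode-console; return True only if it was enabled."""
    if sys.platform != "win32":
        return False  # nothing to do outside Windows
    try:
        # Third-party package: pip install win-unicode-console
        import win_unicode_console
    except ImportError:
        return False  # package not installed, leave the console as it is
    win_unicode_console.enable()
    return True


# Hypothetical usage: call once before the first print().
enabled = enable_unicode_console()
```

If the helper returns False, writing to a file with an explicit encoding is still available as a fallback.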

Alternatively, don't print to the console and write your output to a file, opened with an encoding. For example:

# body holds the text you would otherwise have printed
with open("myoutput.log", "w", encoding="utf-8") as my_log:
    my_log.write(body)

Ensure you open the file with the correct encoding.
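The same idea applies when reading the copied text. Here is a rough sketch that reads a file with an explicit encoding and writes it back out as UTF-8 instead of printing it. The function name copy_as_utf8, the temporary directory, and the assumption that the source file was saved as UTF-8 are all illustrative, not taken from the question.

```python
import os
import tempfile


def copy_as_utf8(src_path, dst_path, src_encoding="utf-8"):
    """Read src_path as text and write the same text to dst_path as UTF-8."""
    with open(src_path, "r", encoding=src_encoding) as src:
        body = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(body)
    return body


# Demo with a throwaway file containing an em-dash (U+2014).
workdir = tempfile.mkdtemp()
sample_path = os.path.join(workdir, "sample.txt")
log_path = os.path.join(workdir, "myoutput.log")

with open(sample_path, "w", encoding="utf-8") as f:
    f.write("Web text \u2014 with an em-dash")

text = copy_as_utf8(sample_path, log_path)
```

If the editor saved the file in another encoding (for example cp1252), pass that name as src_encoding instead.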

Solution 2:

I assume that you are using Python 3 from the open and print syntax you use.

The offending character u"\u2014" is an em-dash. As I assume you are using Windows, setting the console to UTF-8 (chcp 65001) could help, provided you are not using too old a version.

If it is a batch script, and if the print is only there to get traces, you could encode explicitly with errors='replace'. For example, assuming that your console uses code page 850:

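# encode with 'replace', then decode back so print() shows text rather than a bytes repr
print(f.read().encode('cp850', 'replace').decode('cp850'))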

This replaces every unmapped character with ?. Not very nice, but at least it does not raise an exception.
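Rather than hard-coding the code page, the replacement can be wrapped in a small helper that falls back to whatever encoding the console reports. This is only a sketch: the name printable is invented here, and it assumes sys.stdout.encoding reflects the console's real code page.

```python
import sys


def printable(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'."""
    # Fall back to the console's reported encoding, then to ASCII if none is set.
    encoding = encoding or sys.stdout.encoding or "ascii"
    # Round-trip through bytes so the result is a str, not a bytes object.
    return text.encode(encoding, "replace").decode(encoding)


sample = "before \u2014 after"
safe = printable(sample, "cp850")  # cp850 has no em-dash, so it is replaced
print(safe)
```

Calling printable(f.read()) with no second argument uses the console's own encoding, so characters the console can display are left untouched.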
