charset_normalizer 2.0.12 documentation

Advanced Search

The Charset Normalizer functions from_bytes, from_fp and from_path accept several optional parameters that can be tweaked, as follows:

from charset_normalizer import from_bytes
my_byte_str = '我没有埋怨,磋砣的只是一些时间。'.encode('gb18030')
results = from_bytes(
    my_byte_str,
    steps=10,  # Number of blocks to extract from my_byte_str
    chunk_size=512,  # Size of each extracted block
    threshold=0.2,  # Maximum amount of chaos allowed on the first pass
    cp_isolation=None,  # Finite list of encodings to use when searching for a match
    cp_exclusion=None,  # Finite list of encodings to avoid when searching for a match
    preemptive_behaviour=True,  # Whether to look into my_byte_str (ASCII mode) for a pre-defined encoding
    explain=False  # Print what is happening while searching for a match
)
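As a minimal sketch of how cp_isolation narrows the search, the snippet below restricts detection to two candidate encodings. The candidate list and the variable names are illustrative assumptions, not part of the library; the encoding names are written in Python codec style.

```python
from charset_normalizer import from_bytes

payload = '我没有埋怨,磋砣的只是一些时间。'.encode('gb18030')

# Illustrative assumption: only these two encodings are worth considering.
allowed = ['gb18030', 'utf_8']

results = from_bytes(payload, cp_isolation=allowed)

# Every reported match should come from the allowed list.
for match in results:
    print(match.encoding)
```

cp_exclusion works the other way around: it lists encodings to skip instead of encodings to keep.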
								

Using CharsetMatches

Here, results is a CharsetMatches object. It behaves like a list but does not implement all of the related methods. It is sorted from the start, so calling best() is sufficient to extract the most probable result.

class charset_normalizer.CharsetMatches(results: Optional[List[charset_normalizer.models.CharsetMatch]] = None)

Container holding every CharsetMatch item, ordered by default from the most probable to the least probable. It acts like a list (it is iterable) but does not implement all of the related methods.

append(item: charset_normalizer.models.CharsetMatch) → None

Insert a single match. It is inserted at the position that preserves the sort order. It can also be inserted as a submatch.

best() → Optional[charset_normalizer.models.CharsetMatch]

Simply return the first match. Equivalent to matches[0] when at least one match exists; returns None otherwise.

first() → Optional[charset_normalizer.models.CharsetMatch]

Redundant method that calls best(). Kept for backward-compatibility reasons.

List behaviour

As mentioned earlier, a CharsetMatches object behaves like a list.

# Truthiness and len() both work on results
if not results:
    print('No match for your sequence')
# Iterate over results like a list
for match in results:
    print(match.encoding, 'can properly decode your sequence using', match.alphabets, 'and language', match.language)
# Use an index to access a result
if results:
    print(str(results[0]))
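The list-like operations described here can be checked side by side. The following is a small self-contained sketch; the variable names are illustrative, and it only relies on len(), iteration, indexing and best().

```python
from charset_normalizer import from_bytes

results = from_bytes('我没有埋怨,磋砣的只是一些时间。'.encode('gb18030'))

# len() and iteration agree on the number of matches.
match_count = len(results)
encodings = [match.encoding for match in results]

# Index 0 and best() refer to the same match when results is non-empty.
if results:
    same_match = results[0] is results.best()
    print(match_count, encodings, same_match)
```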
									

Using best()

As mentioned above, a CharsetMatches object behaves like a list, and it is sorted by default when returned by from_bytes, from_fp or from_path.

Calling best() returns the most probable result, that is, the first entry of the list (index 0). It returns a CharsetMatch object, or None if there are no results.

result = results.best()
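Because best() may return None, callers usually need a guard before using the result. A minimal sketch, assuming a hypothetical helper named decode_best (it is not part of charset_normalizer):

```python
from typing import Optional

from charset_normalizer import from_bytes


def decode_best(payload: bytes) -> Optional[str]:
    """Hypothetical helper: decode payload with the most probable match, if any."""
    best_match = from_bytes(payload).best()
    if best_match is None:
        return None
    # str() on a CharsetMatch yields the payload decoded with the detected encoding.
    return str(best_match)


text = decode_best('我没有埋怨,磋砣的只是一些时间。'.encode('gb18030'))
print(text if text is not None else 'No match for your sequence')
```

Returning None instead of raising keeps the "no match" case explicit at the call site.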
						

Calling first()

Exactly the same as calling best().

Class aliases

CharsetMatches is also known as CharsetDetector, CharsetDoctor and CharsetNormalizerMatches. This is useful if you prefer a shorter class name.

Verbose output

You may want to understand why a specific encoding was not picked by charset_normalizer. All you have to do is set explain to True when calling from_bytes, from_fp or from_path.
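A minimal sketch of enabling verbose output on the sample sequence used earlier on this page. The exact wording and destination of the diagnostic messages depend on the installed version, so none are reproduced here.

```python
from charset_normalizer import from_bytes

payload = '我没有埋怨,磋砣的只是一些时间。'.encode('gb18030')

# explain=True asks the library to report its decisions while searching.
results = from_bytes(payload, explain=True)

best_match = results.best()
if best_match is not None:
    print('Selected encoding:', best_match.encoding)
```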

Copyright © 2019, Ahmed TAHRI