Python Strings encode() method - GeeksforGeeks (2024)

Python String encode() converts a string into a bytes object, using an encoding scheme specified by the caller (UTF-8 by default).

Python String encode() Method Syntax:

Syntax: encode(encoding='utf-8', errors='strict')
Parameters:
encoding: The name of the encoding scheme to use. Defaults to 'utf-8'.
errors: Decides how characters that cannot be encoded are handled, e.g. ‘strict’ raises a UnicodeEncodeError and ‘ignore’ drops the offending characters. Defaults to 'strict'. Six commonly used error handlers are:
strict – default response, which raises a UnicodeEncodeError exception on failure
ignore – drops the unencodable characters from the result
replace – replaces each unencodable character with a question mark ?
xmlcharrefreplace – inserts an XML character reference (&#NNN;) instead of the unencodable character
backslashreplace – inserts a backslash escape sequence (\xNN, \uNNNN or \UNNNNNNNN) instead of the unencodable character
namereplace – inserts a \N{…} escape sequence instead of the unencodable character
Return: Returns a bytes object containing the encoded form of the string
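
To make the error handlers listed above concrete, here is a small sketch that encodes one string to ASCII with each non-strict handler. The sample string and the variable names (text, handlers, results) are chosen only for illustration.

```python
# Compare how each non-strict error handler encodes the same string to ASCII.
text = "123-¶"  # '¶' (U+00B6, PILCROW SIGN) has no ASCII representation

handlers = ["ignore", "replace", "xmlcharrefreplace", "backslashreplace", "namereplace"]
results = {name: text.encode("ascii", errors=name) for name in handlers}

for name, encoded in results.items():
    print(f"{name:18} -> {encoded}")

# ignore             -> b'123-'
# replace            -> b'123-?'
# xmlcharrefreplace  -> b'123-&#182;'
# backslashreplace   -> b'123-\\xb6'
# namereplace        -> b'123-\\N{PILCROW SIGN}'
```

All of these handlers except strict produce a result without raising, so only xmlcharrefreplace, backslashreplace and namereplace keep enough information to recover the original character later.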

Python String encode() Method Example:

Python

print("¶".encode('utf-8'))

Output:

b'\xc2\xb6'

Example 1: Code to print encoding schemes available

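Python ships with many encoding schemes that the String encode() method can use. The code below prints the alias names registered in the standard library's encodings.aliases table. Note that a few of the listed names (such as base64, hex, zlib and rot13) are not text encodings, so passing them to str.encode() raises a LookupError.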

Python3

from encodings.aliases import aliases

# Printing list available
print("The available encodings are : ")
print(aliases.keys())

Output:

The available encodings are : 
dict_keys(['ibm039', 'iso_ir_226', '1140', 'iso_ir_110', '1252', 'iso_8859_8', 'iso_8859_3', 'iso_ir_166', 'cp367', 'uu', 'quotedprintable', 'ibm775', 'iso_8859_16_2001', 'ebcdic_cp_ch', 'gb2312_1980', 'ibm852', 'uhc', 'macgreek', '850', 'iso2022jp_2', 'hz_gb_2312', 'elot_928', 'iso8859_1', 'eucjp', 'iso_ir_199', 'ibm865', 'cspc862latinhebrew', '863', 'iso_8859_5', 'latin4', 'windows_1253', 'csisolatingreek', 'latin5', '855', 'windows_1256', 'rot13', 'ms1361', 'windows_1254', 'ibm863', 'iso_8859_14_1998', 'utf8_ucs2', '500', 'iso8859', '775', 'l7', 'l2', 'gb18030_2000', 'l9', 'utf_32be', 'iso_ir_100', 'iso_8859_4', 'iso_ir_157', 'csibm857', 'shiftjis2004', 'iso2022jp_1', 'iso_8859_2_1987', 'cyrillic', 'ibm861', 'ms950', 'ibm437', '866', 'csibm863', '932', 'iso_8859_14', 'cskoi8r', 'csptcp154', '852', 'maclatin2', 'sjis', 'korean', '865', 'u32', 'csshiftjis', 'dbcs', 'csibm037', 'csibm1026', 'bz2', 'quopri', '860', '1255', '861', 'iso_ir_127', 'iso_celtic', 'chinese', 'l8', '1258', 'u_jis', 'cspc850multilingual', 'iso_2022_jp_2', 'greek8', 'csibm861', '646', 'unicode_1_1_utf_7', 'ibm862', 'latin2', 'ecma_118', 'csisolatinarabic', 'zlib', 'iso2022jp_3', 'ksx1001', '858', 'hkscs', 'shiftjisx0213', 'base64', 'ibm857', 'maccentraleurope', 'latin7', 'ruscii', 'cp_is', 'iso_ir_101', 'us_ascii', 'hebrew', 'ansi_x3.4_1986', 'csiso2022jp', 'iso_8859_15', 'ibm860', 'ebcdic_cp_us', 'x_mac_simp_chinese', 'csibm855', '1250', 'maciceland', 'iso_ir_148', 'iso2022jp', 'u16', 'u7', 's_jisx0213', 'iso_8859_6_1987', 'csisolatinhebrew', 'csibm424', 'quoted_printable', 'utf_16le', 'tis260', 'utf', 'x_mac_trad_chinese', '1256', 'cp866u', 'jisx0213', 'csiso58gb231280', 'windows_1250', 'cp1361', 'kz_1048', 'asmo_708', 'utf_16be', 'ecma_114', 'eucjis2004', 'x_mac_japanese', 'utf8', 'iso_ir_6', 'cp_gr', '037', 'big5_tw', 'eucgb2312_cn', 'iso_2022_jp_3', 'euc_cn', 'iso_8859_13', 'iso_8859_5_1988', 'maccyrillic', 'ks_c_5601_1987', 'greek', 'ibm869', 'roman8', 'csibm500', 'ujis', 'arabic', 
'strk1048_2002', '424', 'iso_8859_11_2001', 'l5', 'iso_646.irv_1991', '869', 'ibm855', 'eucjisx0213', 'latin1', 'csibm866', 'ibm864', 'big5_hkscs', 'sjis_2004', 'us', 'iso_8859_7', 'macturkish', 'iso_2022_jp_2004', '437', 'windows_1255', 's_jis_2004', 's_jis', '1257', 'ebcdic_cp_wt', 'iso2022jp_2004', 'ms949', 'utf32', 'shiftjis', 'latin', 'windows_1251', '1125', 'ks_x_1001', 'iso_8859_10_1992', 'mskanji', 'cyrillic_asian', 'ibm273', 'tis620', '1026', 'csiso2022kr', 'cspc775baltic', 'iso_ir_58', 'latin8', 'ibm424', 'iso_ir_126', 'ansi_x3.4_1968', 'windows_1257', 'windows_1252', '949', 'base_64', 'ms936', 'csisolatin2', 'utf7', 'iso646_us', 'macroman', '1253', '862', 'iso_8859_1_1987', 'csibm860', 'gb2312_80', 'latin10', 'ksc5601', 'iso_8859_10', 'utf8_ucs4', 'csisolatin4', 'ebcdic_cp_be', 'iso_8859_1', 'hzgb', 'ansi_x3_4_1968', 'ks_c_5601', 'l3', 'cspc8codepage437', 'iso_8859_7_1987', '8859', 'ibm500', 'ibm1026', 'iso_8859_6', 'csibm865', 'ibm866', 'windows_1258', 'iso_ir_138', 'l4', 'utf_32le', 'iso_8859_11', 'thai', '864', 'euc_jis2004', 'cp936', '1251', 'zip', 'unicodebigunmarked', 'csHPRoman8', 'csibm858', 'utf16', '936', 'ibm037', 'iso_8859_8_1988', '857', 'csibm869', 'ebcdic_cp_he', 'cp819', 'euccn', 'iso_8859_2', 'ms932', 'iso_2022_jp_1', 'iso_2022_kr', 'csisolatin6', 'iso_2022_jp', 'x_mac_korean', 'latin3', 'csbig5', 'hz_gb', 'csascii', 'u8', 'csisolatin5', 'csisolatincyrillic', 'ms_kanji', 'cspcp852', 'rk1048', 'iso2022jp_ext', 'csibm273', 'iso_2022_jp_ext', 'ibm858', 'ibm850', 'sjisx0213', 'tis_620_2529_1', 'l10', 'iso_ir_109', 'ibm1125', '1254', 'euckr', 'tis_620_0', 'l1', 'ibm819', 'iso2022kr', 'ibm367', '950', 'r8', 'hex', 'cp154', 'tis_620_2529_0', 'iso_8859_16', 'pt154', 'ebcdic_cp_ca', 'ibm1140', 'l6', 'csibm864', 'csisolatin1', 'csisolatin3', 'latin6', 'iso_8859_9_1989', 'iso_8859_3_1988', 'unicodelittleunmarked', 'macintosh', '273', 'latin9', 'iso_8859_4_1988', 'iso_8859_9', 'ebcdic_cp_nl', 'iso_ir_144'])
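
Many of the names in this list are aliases for the same codec. As a rough sketch, the standard library's codecs.lookup() can be used to see which canonical codec an alias resolves to; the aliases tried here and the name "no-such-encoding" are picked only for illustration.

```python
import codecs

# codecs.lookup() resolves an alias to its codec; .name is the canonical name.
for alias in ("utf8", "u8", "latin1"):
    print(alias, "->", codecs.lookup(alias).name)

# An unknown name raises LookupError instead of returning a codec.
try:
    codecs.lookup("no-such-encoding")
    found = True
except LookupError:
    found = False
print("no-such-encoding found:", found)
```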

Example 2: Code to encode the string

Python3

string = "¶"  # a non-ASCII character

# trying to encode using utf-8 scheme
print(string.encode('utf-8'))

Output:

b'\xc2\xb6'

Errors when using wrong encoding scheme

Example 1: Python String encode() method raises UnicodeEncodeError if the chosen encoding scheme cannot represent a character

Python3

string = "¶"  # a non-ASCII character

# trying to encode using ascii scheme
print(string.encode('ascii'))

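Output:

UnicodeEncodeError: 'ascii' codec can't encode character '\xb6' in position 0: ordinal not in range(128)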
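
Rather than letting the program stop, the UnicodeEncodeError can be caught and inspected. The following is a minimal sketch; the variable names failed_codec and bad_chars are illustrative, while encoding, object, start and end are standard attributes of the exception.

```python
text = "¶"

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # The exception records which codec failed and where in the string.
    failed_codec = exc.encoding
    bad_chars = exc.object[exc.start:exc.end]
    print(f"{failed_codec} cannot encode {bad_chars!r} at position {exc.start}")
```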

Example 2: Using ‘errors’ parameter to ignore errors while encoding

Python String encode() method with the errors parameter set to ‘ignore’ silently drops any character that cannot be converted to the specified encoding scheme, so the result can be shorter than the input.

Python

string = "123-¶"  # contains a non-ASCII character

# ignore if there are any errors
print(string.encode('ascii', errors='ignore'))

Output:

b'123-'

Python Strings encode() method – FAQs

What is string encoding?

String encoding is the process of converting a string into a sequence of bytes. Encoding specifies how characters are represented in bytes, enabling data to be stored, transmitted, and interpreted correctly.
Common Encodings:
ASCII: A 7-bit encoding scheme for English characters.
UTF-8: A variable-width encoding scheme that can represent any character in the Unicode standard.
ISO-8859-1: An 8-bit encoding scheme for Western European languages.
Example of String Encoding:
text = "Hello, World!"
encoded_text = text.encode('utf-8') # Convert string to bytes using UTF-8 encoding
print(encoded_text) # Output: b'Hello, World!'
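
The example above uses only ASCII characters, so the choice of encoding is not visible in the output. As an illustrative sketch (the sample word and variable names are assumptions made for this example), a string with one accented character shows how the same text maps to different bytes under different encodings:

```python
text = "café"

utf8_bytes = text.encode("utf-8")        # 'é' becomes two bytes
latin1_bytes = text.encode("latin-1")    # 'é' becomes one byte
utf16_bytes = text.encode("utf-16-le")   # every character here takes two bytes

for label, data in [("utf-8", utf8_bytes), ("latin-1", latin1_bytes), ("utf-16-le", utf16_bytes)]:
    print(f"{label:10} {len(data)} bytes  {data}")
```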

What is UTF-8 encoding in Python?

UTF-8 (8-bit Unicode Transformation Format) is the default encoding used by encode() and decode() in Python 3, and it can encode all possible characters (code points) in Unicode. It uses one to four bytes for each character and is backward compatible with ASCII.
Characteristics of UTF-8:
Variable-Length: Uses 1 to 4 bytes for different characters.
Backward Compatibility: ASCII characters use a single byte identical to ASCII.
Universal: Can represent characters from any language or script.
Example of UTF-8 Encoding:
text = "Hello, World!"
encoded_text = text.encode('utf-8') # Encode string to UTF-8 bytes
print(encoded_text) # Output: b'Hello, World!'
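
To illustrate the variable-length property, here is a short sketch that measures how many bytes UTF-8 needs for characters from different ranges of Unicode. The four sample characters are chosen only for illustration.

```python
# One sample character for each UTF-8 length, from 1 byte up to 4 bytes.
samples = ["A", "¶", "€", "😀"]

byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in byte_lengths.items():
    print(f"U+{ord(ch):04X}: {n} byte(s) -> {ch.encode('utf-8')}")

# U+0041: 1 byte(s) -> b'A'
# U+00B6: 2 byte(s) -> b'\xc2\xb6'
# U+20AC: 3 byte(s) -> b'\xe2\x82\xac'
# U+1F600: 4 byte(s) -> b'\xf0\x9f\x98\x80'
```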

What is `decode()` in Python?

The decode() method is used to convert a sequence of bytes back into a string. It reverses the encoding process and specifies the encoding scheme to interpret the bytes.
Example of decode():
# Convert bytes back to string using UTF-8 encoding
encoded_text = b'Hello, World!'
decoded_text = encoded_text.decode('utf-8')
print(decoded_text) # Output: Hello, World!
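
decode() accepts an errors parameter as well, which matters when the bytes are not valid in the chosen encoding. The sketch below assumes a byte string that was produced with Latin-1 and is then decoded as UTF-8; the variable names are illustrative.

```python
data = b"caf\xe9"  # 'café' encoded as Latin-1, which is not valid UTF-8

try:
    data.decode("utf-8")  # strict is the default, so this raises
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

replaced = data.decode("utf-8", errors="replace")  # invalid byte -> U+FFFD
ignored = data.decode("utf-8", errors="ignore")    # invalid byte dropped
correct = data.decode("latin-1")                   # the matching codec recovers the text

print(strict_failed, repr(replaced), repr(ignored), repr(correct))
```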

What is the difference between string `encode()` and `decode()` in Python?

encode(): Converts a string into a bytes object using a specified encoding.
Syntax: string.encode(encoding='utf-8', errors='strict')
Usage: Used for converting text to bytes for storage or transmission.
Example: text.encode('utf-8')
decode(): Converts a bytes object back into a string using a specified encoding.
Syntax: bytes.decode(encoding='utf-8', errors='strict')
Usage: Used for converting bytes back to text after it has been encoded.
Example: encoded_text.decode('utf-8')
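
Put together, encode() and decode() form a round trip only when both sides use the same encoding. A minimal sketch, with variable names chosen for illustration:

```python
original = "123-¶"

encoded = original.encode("utf-8")      # str -> bytes
round_trip = encoded.decode("utf-8")    # bytes -> str, same encoding
mismatched = encoded.decode("latin-1")  # bytes -> str, different encoding

print(encoded)                 # b'123-\xc2\xb6'
print(round_trip == original)  # True
print(mismatched)              # 123-Â¶
```

Decoding with a different encoding does not always raise an error: Latin-1 accepts every byte value, so the mismatch shows up as garbled text instead.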

manjeet_04
