Python String encode() converts a string into a bytes object, using the encoding scheme you specify (UTF-8 if none is given).
Python String encode() Method Syntax:
Syntax: str.encode(encoding='utf-8', errors='strict')
Parameters:
- encoding: The name of the encoding scheme to use, such as 'utf-8' or 'ascii'. Defaults to 'utf-8'.
- errors: Decides how to handle characters that cannot be encoded. For example, ‘strict’ raises an error and ‘ignore’ skips the offending character. The commonly used error handlers are:
- strict – the default; raises a UnicodeEncodeError exception on failure
- ignore – drops unencodable characters from the result
- replace – replaces each unencodable character with a question mark (?)
- xmlcharrefreplace – inserts an XML character reference (such as &#182;) in place of each unencodable character
- backslashreplace – inserts a backslash escape sequence (\xNN, \uNNNN or \UNNNNNNNN) in place of each unencodable character
- namereplace – inserts a \N{…} escape sequence containing the character's Unicode name
Return: A bytes object containing the encoded form of the string.
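To compare the error handlers side by side, the sketch below encodes one string with each of them, using ASCII as the target so that "¶" (U+00B6) cannot be encoded. The variable names are illustrative only.

```python
# A string whose last character (U+00B6, PILCROW SIGN) is outside ASCII
text = "123-¶"

handlers = ["ignore", "replace", "xmlcharrefreplace",
            "backslashreplace", "namereplace"]

for handler in handlers:
    # Each handler decides what to emit in place of the unencodable character
    print(f"{handler:18} -> {text.encode('ascii', errors=handler)}")

try:
    text.encode("ascii", errors="strict")  # 'strict' is the default handler
except UnicodeEncodeError as exc:
    print(f"{'strict':18} -> {type(exc).__name__}")
```

The results are b'123-', b'123-?', b'123-&#182;', b'123-\\xb6' and b'123-\\N{PILCROW SIGN}', while 'strict' raises UnicodeEncodeError.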
Python String encode() Method Example:
print("¶".encode('utf-8'))
Output:
b'\xc2\xb6'
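Because encode() returns bytes rather than str, the length of the result counts bytes, not characters. A minimal check using the same "¶" character:

```python
text = "¶"
encoded = text.encode("utf-8")

print(type(encoded))  # <class 'bytes'>
print(len(text))      # 1 (one character)
print(len(encoded))   # 2 (two bytes: 0xC2 0xB6)
print(list(encoded))  # [194, 182]; indexing bytes yields integers
```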
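Example 1: Code to print the available encoding schemes
Python's encodings.aliases module provides a dictionary, aliases, that maps alias names to codec names. Printing its keys, as below, lists the encoding names Python recognizes. Note that a few of them, such as 'base64', 'zlib' and 'rot13', are bytes-to-bytes or str-to-str codecs that str.encode() rejects with a LookupError.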
from encodings.aliases import aliases

# Print the list of available encoding aliases
print("The available encodings are : ")
print(aliases.keys())
Output:
The available encodings are :
dict_keys(['ibm039', 'iso_ir_226', '1140', 'iso_ir_110', '1252', 'iso_8859_8', 'iso_8859_3', 'iso_ir_166', 'cp367', 'uu', 'quotedprintable', 'ibm775', 'iso_8859_16_2001', 'ebcdic_cp_ch', 'gb2312_1980', 'ibm852', 'uhc', 'macgreek', '850', 'iso2022jp_2', 'hz_gb_2312', 'elot_928', 'iso8859_1', 'eucjp', 'iso_ir_199', 'ibm865', 'cspc862latinhebrew', '863', 'iso_8859_5', 'latin4', 'windows_1253', 'csisolatingreek', 'latin5', '855', 'windows_1256', 'rot13', 'ms1361', 'windows_1254', 'ibm863', 'iso_8859_14_1998', 'utf8_ucs2', '500', 'iso8859', '775', 'l7', 'l2', 'gb18030_2000', 'l9', 'utf_32be', 'iso_ir_100', 'iso_8859_4', 'iso_ir_157', 'csibm857', 'shiftjis2004', 'iso2022jp_1', 'iso_8859_2_1987', 'cyrillic', 'ibm861', 'ms950', 'ibm437', '866', 'csibm863', '932', 'iso_8859_14', 'cskoi8r', 'csptcp154', '852', 'maclatin2', 'sjis', 'korean', '865', 'u32', 'csshiftjis', 'dbcs', 'csibm037', 'csibm1026', 'bz2', 'quopri', '860', '1255', '861', 'iso_ir_127', 'iso_celtic', 'chinese', 'l8', '1258', 'u_jis', 'cspc850multilingual', 'iso_2022_jp_2', 'greek8', 'csibm861', '646', 'unicode_1_1_utf_7', 'ibm862', 'latin2', 'ecma_118', 'csisolatinarabic', 'zlib', 'iso2022jp_3', 'ksx1001', '858', 'hkscs', 'shiftjisx0213', 'base64', 'ibm857', 'maccentraleurope', 'latin7', 'ruscii', 'cp_is', 'iso_ir_101', 'us_ascii', 'hebrew', 'ansi_x3.4_1986', 'csiso2022jp', 'iso_8859_15', 'ibm860', 'ebcdic_cp_us', 'x_mac_simp_chinese', 'csibm855', '1250', 'maciceland', 'iso_ir_148', 'iso2022jp', 'u16', 'u7', 's_jisx0213', 'iso_8859_6_1987', 'csisolatinhebrew', 'csibm424', 'quoted_printable', 'utf_16le', 'tis260', 'utf', 'x_mac_trad_chinese', '1256', 'cp866u', 'jisx0213', 'csiso58gb231280', 'windows_1250', 'cp1361', 'kz_1048', 'asmo_708', 'utf_16be', 'ecma_114', 'eucjis2004', 'x_mac_japanese', 'utf8', 'iso_ir_6', 'cp_gr', '037', 'big5_tw', 'eucgb2312_cn', 'iso_2022_jp_3', 'euc_cn', 'iso_8859_13', 'iso_8859_5_1988', 'maccyrillic', 'ks_c_5601_1987', 'greek', 'ibm869', 'roman8', 'csibm500', 'ujis', 'arabic', 
'strk1048_2002', '424', 'iso_8859_11_2001', 'l5', 'iso_646.irv_1991', '869', 'ibm855', 'eucjisx0213', 'latin1', 'csibm866', 'ibm864', 'big5_hkscs', 'sjis_2004', 'us', 'iso_8859_7', 'macturkish', 'iso_2022_jp_2004', '437', 'windows_1255', 's_jis_2004', 's_jis', '1257', 'ebcdic_cp_wt', 'iso2022jp_2004', 'ms949', 'utf32', 'shiftjis', 'latin', 'windows_1251', '1125', 'ks_x_1001', 'iso_8859_10_1992', 'mskanji', 'cyrillic_asian', 'ibm273', 'tis620', '1026', 'csiso2022kr', 'cspc775baltic', 'iso_ir_58', 'latin8', 'ibm424', 'iso_ir_126', 'ansi_x3.4_1968', 'windows_1257', 'windows_1252', '949', 'base_64', 'ms936', 'csisolatin2', 'utf7', 'iso646_us', 'macroman', '1253', '862', 'iso_8859_1_1987', 'csibm860', 'gb2312_80', 'latin10', 'ksc5601', 'iso_8859_10', 'utf8_ucs4', 'csisolatin4', 'ebcdic_cp_be', 'iso_8859_1', 'hzgb', 'ansi_x3_4_1968', 'ks_c_5601', 'l3', 'cspc8codepage437', 'iso_8859_7_1987', '8859', 'ibm500', 'ibm1026', 'iso_8859_6', 'csibm865', 'ibm866', 'windows_1258', 'iso_ir_138', 'l4', 'utf_32le', 'iso_8859_11', 'thai', '864', 'euc_jis2004', 'cp936', '1251', 'zip', 'unicodebigunmarked', 'csHPRoman8', 'csibm858', 'utf16', '936', 'ibm037', 'iso_8859_8_1988', '857', 'csibm869', 'ebcdic_cp_he', 'cp819', 'euccn', 'iso_8859_2', 'ms932', 'iso_2022_jp_1', 'iso_2022_kr', 'csisolatin6', 'iso_2022_jp', 'x_mac_korean', 'latin3', 'csbig5', 'hz_gb', 'csascii', 'u8', 'csisolatin5', 'csisolatincyrillic', 'ms_kanji', 'cspcp852', 'rk1048', 'iso2022jp_ext', 'csibm273', 'iso_2022_jp_ext', 'ibm858', 'ibm850', 'sjisx0213', 'tis_620_2529_1', 'l10', 'iso_ir_109', 'ibm1125', '1254', 'euckr', 'tis_620_0', 'l1', 'ibm819', 'iso2022kr', 'ibm367', '950', 'r8', 'hex', 'cp154', 'tis_620_2529_0', 'iso_8859_16', 'pt154', 'ebcdic_cp_ca', 'ibm1140', 'l6', 'csibm864', 'csisolatin1', 'csisolatin3', 'latin6', 'iso_8859_9_1989', 'iso_8859_3_1988', 'unicodelittleunmarked', 'macintosh', '273', 'latin9', 'iso_8859_4_1988', 'iso_8859_9', 'ebcdic_cp_nl', 'iso_ir_144'])
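Since aliases also lists non-text codecs, one way to check whether a name works with str.encode() is simply to try it and catch LookupError. The helper is_text_encoding below is hypothetical, not part of the standard library; codecs.lookup() is the standard way to resolve an alias to its canonical codec name. The exact number of usable aliases varies between Python versions and platforms.

```python
import codecs
from encodings.aliases import aliases

def is_text_encoding(name: str) -> bool:
    """Hypothetical helper: True if str.encode() accepts this encoding name."""
    try:
        "".encode(name)
    except LookupError:
        # Raised for unknown names and for non-text codecs such as base64
        return False
    return True

# Resolve a few aliases to their canonical codec names
for alias in ["u8", "latin1", "cp1252"]:
    print(alias, "->", codecs.lookup(alias).name)

print(is_text_encoding("utf-8"))   # True
print(is_text_encoding("base64"))  # False

# Keep only the aliases that str.encode() will accept
text_aliases = sorted(a for a in aliases if is_text_encoding(a))
print(len(text_aliases))
```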
Example 2: Code to encode the string
string = "¶"  # non-ASCII character (U+00B6 PILCROW SIGN)

# Encode using the UTF-8 scheme
print(string.encode('utf-8'))
Output:
b'\xc2\xb6'
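The same character produces different bytes under different schemes. This sketch encodes "¶" with four standard codecs and checks that decoding with the same codec restores the original string.

```python
text = "¶"  # U+00B6 PILCROW SIGN

for encoding in ["utf-8", "utf-16-le", "utf-32-be", "latin-1"]:
    encoded = text.encode(encoding)
    # Same character, different byte sequence (and length) per scheme
    print(encoding, encoded, len(encoded))
    # Decoding with the same scheme recovers the original string
    assert encoded.decode(encoding) == text
```

This prints b'\xc2\xb6' for UTF-8, b'\xb6\x00' for UTF-16-LE, b'\x00\x00\x00\xb6' for UTF-32-BE and b'\xb6' for Latin-1. The explicit -le/-be variants are used because plain 'utf-16' and 'utf-32' prepend a byte order mark.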
Errors when using the wrong encoding scheme
Example 1: Python String encode() method raises UnicodeEncodeError if the string contains a character the chosen encoding scheme cannot represent
string = "¶"  # non-ASCII character (U+00B6 PILCROW SIGN)

# Try to encode using the ASCII scheme
print(string.encode('ascii'))
Output:
UnicodeEncodeError: 'ascii' codec can't encode character '\xb6' in position 0: ordinal not in range(128)
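A UnicodeEncodeError records where the failure happened, which is handy for reporting bad input. The function find_unencodable below is a hypothetical helper that returns the start and end positions and the offending text from the exception's start, end and object attributes.

```python
def find_unencodable(text: str, encoding: str):
    """Hypothetical helper: return (start, end, offending substring) for the
    first span that cannot be encoded, or None if the string encodes cleanly."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.start, exc.end, exc.object[exc.start:exc.end]
    return None

print(find_unencodable("123-¶", "ascii"))  # (4, 5, '¶')
print(find_unencodable("123-¶", "utf-8"))  # None
```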
Example 2: Using the ‘errors’ parameter to ignore errors while encoding
When the errors parameter is set to ‘ignore’, Python String encode() drops any character that cannot be represented in the specified encoding scheme instead of raising an error.
string = "123-¶"  # contains one non-ASCII character

# Ignore any characters that cannot be encoded
print(string.encode('ascii', errors='ignore'))
Output:
b'123-'
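Besides the built-in handlers, the codecs module lets you register your own with codecs.register_error(). In this sketch the handler name 'bracketreplace' and the function bracket_replace are hypothetical; the handler receives the UnicodeEncodeError and returns the replacement text plus the position where encoding should resume.

```python
import codecs

def bracket_replace(err):
    """Hypothetical handler: replace unencodable characters with [U+XXXX]."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    replacement = "".join(f"[U+{ord(ch):04X}]" for ch in bad)
    # Return the replacement text and the index at which to resume encoding
    return replacement, err.end

codecs.register_error("bracketreplace", bracket_replace)

print("123-¶".encode("ascii", errors="bracketreplace"))  # b'123-[U+00B6]'
```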
Python Strings encode() method – FAQs
What is string encoding?
String encoding is the process of converting a string into a sequence of bytes. Encoding specifies how characters are represented in bytes, enabling data to be stored, transmitted, and interpreted correctly.
Common Encodings:
- ASCII: A 7-bit encoding scheme for English characters.
- UTF-8: A variable-width encoding scheme that can represent any character in the Unicode standard.
- ISO-8859-1: An 8-bit encoding scheme for Western European languages.
Example of String Encoding:
text = "Hello, World!"
encoded_text = text.encode('utf-8') # Convert string to bytes using UTF-8 encoding
print(encoded_text) # Output: b'Hello, World!'
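The three common encodings differ as soon as the text leaves plain English. Encoding "café" with each one shows the difference:

```python
text = "café"

print(text.encode("utf-8"))    # b'caf\xc3\xa9' (5 bytes)
print(text.encode("latin-1"))  # b'caf\xe9' (4 bytes)

try:
    text.encode("ascii")  # 'é' is outside ASCII's 7-bit range
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)  # ascii failed: ordinal not in range(128)
```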
What is UTF-8 encoding in Python?
UTF-8 (8-bit Unicode Transformation Format) is a widely used encoding scheme that can encode all possible characters (code points) in Unicode, and it is the default encoding for str.encode() and bytes.decode() in Python. It uses one to four bytes for each character and is backward compatible with ASCII.
Characteristics of UTF-8:
- Variable-Length: Uses 1 to 4 bytes for different characters.
- Backward Compatibility: ASCII characters use a single byte identical to ASCII.
- Universal: Can represent characters from any language or script.
Example of UTF-8 Encoding:
text = "Hello, World!"
encoded_text = text.encode('utf-8') # Encode string to UTF-8 bytes
print(encoded_text) # Output: b'Hello, World!'
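The variable-length and ASCII-compatibility properties listed above can be checked directly. This sketch picks one sample character for each UTF-8 length:

```python
# One sample character for each UTF-8 length (1 to 4 bytes)
samples = ["A", "é", "€", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) {encoded}")

# Backward compatibility: pure-ASCII text encodes identically in both schemes
assert "Hello".encode("utf-8") == "Hello".encode("ascii")
```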
What is decode() in Python?
The decode() method of a bytes object converts a sequence of bytes back into a string. It reverses the encoding process; its encoding argument tells Python which scheme to use when interpreting the bytes.
Example of decode():
# Convert bytes back to a string using UTF-8 encoding
encoded_text = b'Hello, World!'
decoded_text = encoded_text.decode('utf-8')
print(decoded_text)  # Output: Hello, World!
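Decoding only works if you use the same scheme that produced the bytes. This sketch shows the two typical failure modes: a wrong but valid codec silently yields garbled text, while invalid input raises UnicodeDecodeError unless an error handler such as 'replace' is given.

```python
data = "¶".encode("utf-8")  # b'\xc2\xb6'

# Decoding with the wrong scheme silently produces the wrong text ("mojibake")
print(data.decode("latin-1"))  # Â¶

# A lone 0xB6 byte is not valid UTF-8, so 'strict' decoding fails
try:
    b"\xb6".decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)  # decode failed: invalid start byte

# errors='replace' substitutes U+FFFD REPLACEMENT CHARACTER instead
print(b"\xb6".decode("utf-8", errors="replace"))
```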
What is the difference between string encode() and decode() in Python?
encode(): Converts a string into a bytes object using a specified encoding.
- Syntax: string.encode(encoding='utf-8', errors='strict')
- Usage: Used for converting text to bytes for storage or transmission.
- Example: text.encode('utf-8')
decode(): Converts a bytes object back into a string using a specified encoding.
- Syntax: bytes.decode(encoding='utf-8', errors='strict')
- Usage: Used for converting bytes back to text after it has been encoded.
- Example: encoded_text.decode('utf-8')
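The two methods are inverses when the same encoding is used, and in Python 3 they live on different types: str has encode() but no decode(), and bytes has decode() but no encode(). A short round-trip check:

```python
text = "Hello, ¶"

encoded = text.encode("utf-8")     # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str

assert decoded == text  # a round trip with the same encoding is lossless

# Each method exists only on its own type
print(hasattr(str, "encode"), hasattr(str, "decode"))      # True False
print(hasattr(bytes, "decode"), hasattr(bytes, "encode"))  # True False
```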