Text must be converted to bytes before it can be written to a file, stored in a database, or sent over a network. Encoding, at its most basic, is a method of converting characters (such as letters, punctuation, symbols, whitespace, and control characters) to numbers and, eventually, bits. Each character is mapped to a distinct bit sequence. Note that encoding is not encryption: it makes text storable and transmittable, but it does not protect it. In Python, this conversion is performed by the string encode() method, which encodes a string using the given encoding scheme. This article covers the Python encode() method, the available encoding standards, and its applications, along with examples.
Definition
- Python encode() is a built-in string method that returns an encoded version of the string, as a bytes object, according to the specified encoding standard.
- Python encode() does not encrypt or secure the string; it only converts its characters to bytes based on the specified encoding type.
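The path from character to number to bits described in the introduction can be traced for a single character. This is a minimal sketch; the sample character 'é' and the variable names are chosen only for illustration.

```python
# Trace one character from text to code point to bytes to bits.
char = 'é'

code_point = ord(char)            # Unicode code point as an integer
encoded = char.encode('utf-8')    # bytes object produced by encode()
bits = ' '.join(format(byte, '08b') for byte in encoded)

print(code_point)  # 233
print(encoded)     # b'\xc3\xa9'
print(bits)        # 11000011 10101001
```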
Python encode() Syntax
The Python string encode() method follows the syntax below:
string.encode(encoding='utf-8', errors='strict')
String encode() Parameters
Both parameters are optional, so encode() can be called without any arguments. It accepts at most 2 parameters:
- encoding (Optional) – specifies the encoding standard to be used. If no encoding is specified, Python considers UTF-8 as its default encoding standard.
- errors (Optional) – decides how encoding errors are handled. The default errors value is ‘strict’. There are 6 types of error responses:
- strict – the default response, which raises a UnicodeEncodeError exception on failure
- ignore – drops all unencodable characters from the result
- replace – replaces each unencodable character in the result with a question mark ‘?’
- xmlcharrefreplace – substitutes an XML character reference for each unencodable character
- backslashreplace – instead of each unencodable character, inserts a backslash escape sequence (\xNN, \uNNNN, or \UNNNNNNNN)
- namereplace – instead of each unencodable character, inserts a \N{…} escape sequence
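The default ‘strict’ response can be observed by encoding a string that contains non-ASCII characters to ASCII and catching the exception. This is a small sketch; the sample string and the variable names are illustrative only.

```python
text = 'Wełcøme'
failed = False
bad_char = None

try:
    text.encode('ascii')          # errors defaults to 'strict'
except UnicodeEncodeError as exc:
    failed = True
    # exc.start is the index of the first character that could not be encoded
    bad_char = text[exc.start]
    print('Cannot encode', repr(bad_char), 'with the', exc.encoding, 'codec')
```

Catching UnicodeEncodeError this way lets a program decide, per call, whether to fall back to a more lenient handler such as ‘replace’.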
Return value from encode()
The Python string encode() method returns the encoded version of the string as a bytes object.
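A short check of the return type, together with a round trip back to text using the matching bytes.decode() method, might look like the following sketch. The variable names are illustrative.

```python
original = 'PŸTHØN'

encoded = original.encode('utf-8')
print(type(encoded))                 # <class 'bytes'>
print(len(original), len(encoded))   # 6 8 (two characters need 2 bytes each)

# Decoding with the same standard recovers the original string.
decoded = encoded.decode('utf-8')
print(decoded == original)           # True
```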
Program to check all Encoding Standards in Python
Python provides a wide range of encoding formats and standards. The program below displays the alias names under which these encodings can be referenced; the exact list depends on the Python version.
from encodings.aliases import aliases

print("The available encodings are : ")
print(aliases.keys())
Output
The available encodings are :
dict_keys(['646', 'ansi_x3.4_1968', 'ansi_x3_4_1968', 'ansi_x3.4_1986', 'cp367', 'csascii', 'ibm367', 'iso646_us', 'iso_646.irv_1991', 'iso_ir_6', 'us', 'us_ascii', 'base64', 'base_64', 'big5_tw', 'csbig5', 'big5_hkscs', 'hkscs', 'bz2', '037', 'csibm037', 'ebcdic_cp_ca', 'ebcdic_cp_nl', 'ebcdic_cp_us', 'ebcdic_cp_wt', 'ibm037', 'ibm039', '1026', 'csibm1026', 'ibm1026', '1125', 'ibm1125', 'cp866u', 'ruscii', '1140', 'ibm1140', '1250', 'windows_1250', '1251', 'windows_1251', '1252', 'windows_1252', '1253', 'windows_1253', '1254', 'windows_1254', '1255', 'windows_1255', '1256', 'windows_1256', '1257', 'windows_1257', '1258', 'windows_1258', '273', 'ibm273', 'csibm273', '424', 'csibm424', 'ebcdic_cp_he', 'ibm424', '437', 'cspc8codepage437', 'ibm437', '500', 'csibm500', 'ebcdic_cp_be', 'ebcdic_cp_ch', 'ibm500', '775', 'cspc775baltic', 'ibm775', '850', 'cspc850multilingual', 'ibm850', '852', 'cspcp852', 'ibm852', '855', 'csibm855', 'ibm855', '857', 'csibm857', 'ibm857', '858', 'csibm858', 'ibm858', '860', 'csibm860', 'ibm860', '861', 'cp_is', 'csibm861', 'ibm861', '862', 'cspc862latinhebrew', 'ibm862', '863', 'csibm863', 'ibm863', '864', 'csibm864', 'ibm864', '865', 'csibm865', 'ibm865', '866', 'csibm866', 'ibm866', '869', 'cp_gr', 'csibm869', 'ibm869', '932', 'ms932', 'mskanji', 'ms_kanji', '949', 'ms949', 'uhc', '950', 'ms950', 'jisx0213', 'eucjis2004', 'euc_jis2004', 'eucjisx0213', 'eucjp', 'ujis', 'u_jis', 'euckr', 'korean', 'ksc5601', 'ks_c_5601', 'ks_c_5601_1987', 'ksx1001', 'ks_x_1001', 'gb18030_2000', 'chinese', 'csiso58gb231280', 'euc_cn', 'euccn', 'eucgb2312_cn', 'gb2312_1980', 'gb2312_80', 'iso_ir_58', '936', 'cp936', 'ms936', 'hex', 'roman8', 'r8', 'csHPRoman8', 'cp1051', 'ibm1051', 'hzgb', 'hz_gb', 'hz_gb_2312', 'csiso2022jp', 'iso2022jp', 'iso_2022_jp', 'iso2022jp_1', 'iso_2022_jp_1', 'iso2022jp_2', 'iso_2022_jp_2', 'iso_2022_jp_2004', 'iso2022jp_2004', 'iso2022jp_3', 'iso_2022_jp_3', 'iso2022jp_ext', 'iso_2022_jp_ext', 'csiso2022kr', 'iso2022kr', 
'iso_2022_kr', 'csisolatin6', 'iso_8859_10', 'iso_8859_10_1992', 'iso_ir_157', 'l6', 'latin6', 'thai', 'iso_8859_11', 'iso_8859_11_2001', 'iso_8859_13', 'l7', 'latin7', 'iso_8859_14', 'iso_8859_14_1998', 'iso_celtic', 'iso_ir_199', 'l8', 'latin8', 'iso_8859_15', 'l9', 'latin9', 'iso_8859_16', 'iso_8859_16_2001', 'iso_ir_226', 'l10', 'latin10', 'csisolatin2', 'iso_8859_2', 'iso_8859_2_1987', 'iso_ir_101', 'l2', 'latin2', 'csisolatin3', 'iso_8859_3', 'iso_8859_3_1988', 'iso_ir_109', 'l3', 'latin3', 'csisolatin4', 'iso_8859_4', 'iso_8859_4_1988', 'iso_ir_110', 'l4', 'latin4', 'csisolatincyrillic', 'cyrillic', 'iso_8859_5', 'iso_8859_5_1988', 'iso_ir_144', 'arabic', 'asmo_708', 'csisolatinarabic', 'ecma_114', 'iso_8859_6', 'iso_8859_6_1987', 'iso_ir_127', 'csisolatingreek', 'ecma_118', 'elot_928', 'greek', 'greek8', 'iso_8859_7', 'iso_8859_7_1987', 'iso_ir_126', 'csisolatinhebrew', 'hebrew', 'iso_8859_8', 'iso_8859_8_1988', 'iso_ir_138', 'csisolatin5', 'iso_8859_9', 'iso_8859_9_1989', 'iso_ir_148', 'l5', 'latin5', 'cp1361', 'ms1361', 'cskoi8r', 'kz_1048', 'rk1048', 'strk1048_2002', '8859', 'cp819', 'csisolatin1', 'ibm819', 'iso8859', 'iso8859_1', 'iso_8859_1', 'iso_8859_1_1987', 'iso_ir_100', 'l1', 'latin', 'latin1', 'maccyrillic', 'macgreek', 'maciceland', 'maccentraleurope', 'maclatin2', 'macintosh', 'macroman', 'macturkish', 'ansi', 'dbcs', 'csptcp154', 'pt154', 'cp154', 'cyrillic_asian', 'quopri', 'quoted_printable', 'quotedprintable', 'rot13', 'csshiftjis', 'shiftjis', 'sjis', 's_jis', 'shiftjis2004', 'sjis_2004', 's_jis_2004', 'shiftjisx0213', 'sjisx0213', 's_jisx0213', 'tis260', 'tis620', 'tis_620_0', 'tis_620_2529_0', 'tis_620_2529_1', 'iso_ir_166', 'u16', 'utf16', 'unicodebigunmarked', 'utf_16be', 'unicodelittleunmarked', 'utf_16le', 'u32', 'utf32', 'utf_32be', 'utf_32le', 'u7', 'utf7', 'unicode_1_1_utf_7', 'u8', 'utf', 'utf8', 'utf8_ucs2', 'utf8_ucs4', 'cp65001', 'uu', 'zip', 'zlib', 'x_mac_japanese', 'x_mac_korean', 'x_mac_simp_chinese', 
'x_mac_trad_chinese'])
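Many of the names in this list are aliases for the same codec. As a rough sketch, the standard codecs.lookup() function can resolve an alias to the codec Python actually uses, and it raises LookupError for unknown names. The helper names canonical_name and is_known_encoding are hypothetical, introduced only for this example.

```python
import codecs


def canonical_name(alias):
    """Return the codec name that Python resolves the given alias to."""
    return codecs.lookup(alias).name


def is_known_encoding(name):
    """Return True if Python can find a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


print(canonical_name('utf8'))                 # utf-8
print(canonical_name('latin10'))
print(is_known_encoding('greek'))             # True
print(is_known_encoding('no_such_encoding'))  # False
```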
Example 1: Encode to Default UTF-8 Encoding
Example
# Python program to illustrate encode()
my_str = 'Good Morning'
print('String is:', my_str)
# encodes to default utf-8
print('Encoded string is:', my_str.encode())
print('')
word = 'PŸTHØN'
print('String is:', word)
# encodes to default utf-8
print('Encoded string is:', word.encode())
Output
String is: Good Morning
Encoded string is: b'Good Morning'

String is: PŸTHØN
Encoded string is: b'P\xc5\xb8TH\xc3\x98N'
Example 2: Encoding with other Standards
Example
# Python program to illustrate encode()
my_str = 'Good Môrning'
print('String is:', my_str)
# encodes to latin10
print('Encoded string is:', my_str.encode('latin10'))
print('')
word = 'Δ π θ'
print('String is:', word)
# encodes to greek
print('Encoded string is:', word.encode('greek'))
Output
String is: Good Môrning
Encoded string is: b'Good M\xf4rning'

String is: Δ π θ
Encoded string is: b'\xc4 \xf0 \xe8'
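The same text produces different byte sequences, and different sizes, depending on the standard chosen. The sketch below compares three encodings for one sample string; the string and the sizes dictionary are illustrative only.

```python
text = 'Môrning'

# Map each encoding name to the number of bytes it needs for the same text.
sizes = {}
for encoding in ('utf-8', 'latin-1', 'utf-16-le'):
    data = text.encode(encoding)
    sizes[encoding] = len(data)
    print(encoding, len(data), data)

# utf-8 needs 2 bytes for 'ô', latin-1 needs 1, utf-16-le needs 2 per character.
```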
Example 3: Encoding with error parameter
# Python program to illustrate encode()
greet = 'Wełcøme'
print('The encoded version (with ignore) is:', greet.encode("ascii", "ignore"))
print('The encoded version (with replace) is:', greet.encode("ascii", "replace"))
print('The encoded version (with namereplace) is:', greet.encode("ascii", "namereplace"))
print('The encoded version (with backslashreplace) is:', greet.encode("ascii", "backslashreplace"))
print('The encoded version (with xmlcharrefreplace) is:', greet.encode("ascii", "xmlcharrefreplace"))
Output
The encoded version (with ignore) is: b'Wecme'
The encoded version (with replace) is: b'We?c?me'
The encoded version (with namereplace) is: b'We\\N{LATIN SMALL LETTER L WITH STROKE}c\\N{LATIN SMALL LETTER O WITH STROKE}me'
The encoded version (with backslashreplace) is: b'We\\u0142c\\xf8me'
The encoded version (with xmlcharrefreplace) is: b'We&#322;c&#248;me'
Frequently Asked Questions
Q1. What does encode() do in Python?
The Python string encode() method converts a string into a bytes object using the given encoding scheme (UTF-8 by default). Text must be in bytes form before it can be written to a file, stored in a database, or sent over a network. Encoding is not encryption, so it does not secure the data.
Q2. How do you declare an encoding in Python?
Pass the name of the encoding as the first argument to the string encode() method, which follows the syntax below:
string.encode(encoding='utf-8', errors='strict')
Q3. What is Python default encoding?
If no parameters are specified, then the Python string encode() function uses its default values. For the encoding parameter, the default value is UTF-8, and for the errors parameter, the default value is strict.
Example
msg = 'Edücatioñ'
print('Encoding with default parameters:', msg.encode())
print('Encoding with parameters:', msg.encode('utf8'))
Output
Encoding with default parameters: b'Ed\xc3\xbccatio\xc3\xb1'
Encoding with parameters: b'Ed\xc3\xbccatio\xc3\xb1'
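The default can also be inspected at runtime. As a small sketch, sys.getdefaultencoding() reports the name Python 3 uses for string encoding, and spelling both defaults out gives the same bytes as calling encode() with no arguments. The variable names are illustrative.

```python
import sys

default_name = sys.getdefaultencoding()
print(default_name)  # utf-8

msg = 'Edücatioñ'
# Passing both defaults explicitly is equivalent to passing nothing.
same = msg.encode() == msg.encode(default_name, 'strict')
print(same)  # True
```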
Q4. How do I encode a URL in Python?
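To encode a URL in Python, first import the ‘urllib.parse’ module from the standard library. Importing ‘urllib’ alone is not guaranteed to load the parse submodule. A URL can be encoded in 3 ways:
parse.quote(): Encodes spaces to ‘%20’
import urllib.parse

msg = 'I ãm a študeńt'
print(urllib.parse.quote(msg))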
Output
I%20%C3%A3m%20a%20%C5%A1tude%C5%84t
parse.quote_plus(): Encodes spaces to ‘+’
import urllib.parse

msg = 'I ãm a študeńt'
print(urllib.parse.quote_plus(msg))
Output
I+%C3%A3m+a+%C5%A1tude%C5%84t
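The two functions also differ in how they treat the ‘/’ character: quote() leaves it unescaped by default through its safe parameter, while quote_plus() escapes it. The sketch below uses a made-up path purely for illustration.

```python
from urllib.parse import quote, quote_plus

path = 'docs/my file.txt'   # hypothetical path, for illustration only

as_path = quote(path)             # '/' is kept because safe defaults to '/'
strict = quote(path, safe='')     # nothing is exempt, so '/' becomes %2F
as_query = quote_plus(path)       # '/' becomes %2F and the space becomes '+'

print(as_path)    # docs/my%20file.txt
print(strict)     # docs%2Fmy%20file.txt
print(as_query)   # docs%2Fmy+file.txt
```

As a rule of thumb, quote() suits path segments and quote_plus() suits query-string values.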
parse.urlencode(): Encodes multiple parameters
import urllib.parse

msg = {'A': 'Hi', 'website': 'www.apple.com'}
print(urllib.parse.urlencode(msg))
Output
A=Hi&website=www.apple.com
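Each of these encoders has a counterpart in urllib.parse that reverses it. The following sketch decodes the three outputs shown above back to their original values; the variable names are illustrative.

```python
from urllib.parse import unquote, unquote_plus, parse_qs

# Reverse of quote(): %XX escapes become characters again.
text_from_quote = unquote('I%20%C3%A3m%20a%20%C5%A1tude%C5%84t')

# Reverse of quote_plus(): '+' is also turned back into a space.
text_from_plus = unquote_plus('I+%C3%A3m+a+%C5%A1tude%C5%84t')

# Reverse of urlencode(): each key maps to a list of its values.
params = parse_qs('A=Hi&website=www.apple.com')

print(text_from_quote)  # I ãm a študeńt
print(text_from_plus)   # I ãm a študeńt
print(params)           # {'A': ['Hi'], 'website': ['www.apple.com']}
```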