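Is orjson faster than the json module?

Performance: orjson is a fast, correct JSON library for Python. It is a native extension written in Rust. In the benchmarks later in this document, it serializes roughly 11x to 15x faster and deserializes roughly 2x to 6x faster than the built-in json module, depending on the input. Functionality: orjson covers the same core tasks as the json module, but it is not a drop-in replacement. orjson.dumps() returns bytes rather than str, and there are no load() or dump() functions for file-like objects.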

What is Huggingface in Python?

Hugging Face is an online community where people can team up, explore, and collaborate on machine-learning projects. The Hugging Face Hub hosts over 350,000 models, 75,000 datasets, and 150,000 demo apps, all free and open to everyone.

ijson is an iterative JSON parser with standard Python iterator interfaces.

What is the use of CVzone in Python?

CVzone is a computer vision package that makes it easy to run tasks such as face detection, hand tracking, and pose estimation, as well as image processing and other AI functions.

What is the BCrypt function in Python?

bcrypt is a password-hashing algorithm that is widely considered one of the most secure choices for hashing passwords. It is available in Python through the bcrypt package. bcrypt is deliberately slow, which makes it harder for attackers to crack hashed passwords by brute force.

What is the purpose of RuntimeError in Python?

A program with a runtime error is one that passed the interpreter's syntax checks and started to execute. During the execution of one of its statements, however, an error occurred that made the interpreter stop the program and display an error message. RuntimeError itself is the built-in exception Python raises when an error is detected that does not fall into any of the other exception categories. Its message describes what went wrong.
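As a small illustration of the built-in RuntimeError, the sketch below triggers one of CPython's own RuntimeError cases: changing a dict's size while iterating over it. The function name grow_while_iterating is hypothetical. The error message comes from CPython.

```python
def grow_while_iterating(d):
    """Add keys to d while iterating over it, which CPython rejects with RuntimeError."""
    try:
        for key in d:
            # Inserting a new key changes the dict's size mid-iteration.
            d[key + "_copy"] = d[key]
    except RuntimeError as err:
        return str(err)
    return None


message = grow_while_iterating({"a": 1})
print(message)  # dictionary changed size during iteration
```

Iterating over a snapshot such as list(d) instead of d avoids the error.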

What causes MemoryError in Python?

A MemoryError means that the interpreter has run out of memory to allocate to your Python program. This may be due to an issue in the setup of the Python environment, or it may be a problem with the code itself loading too much data at the same time.
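One common way to avoid loading too much data at once is to process input incrementally. Below is a minimal sketch that assumes a line-oriented text stream. count_long_lines is a hypothetical name, and io.StringIO stands in for a real file.

```python
import io


def count_long_lines(stream, min_length):
    """Count lines longer than min_length without holding the whole input in memory."""
    count = 0
    # Iterating over a file object yields one line at a time,
    # unlike stream.read() or stream.readlines(), which load everything.
    for line in stream:
        if len(line.rstrip("\n")) > min_length:
            count += 1
    return count


data = io.StringIO("short\na much longer line\nmid-size\n")
print(count_long_lines(data, 6))  # 2
```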

orjson (2024)

orjson is a fast, correct JSON library for Python. It benchmarks as the fastest Python library for JSON and is more correct than the standard json library or other third-party libraries. It serializes dataclass, datetime, numpy, and UUID instances natively.

Its features and drawbacks compared to other Python JSON libraries:

serializes dataclass instances 40-50x as fast as other libraries
serializes datetime, date, and time instances to RFC 3339 format, e.g., "1970-01-01T00:00:00+00:00"
serializes numpy.ndarray instances 4-12x as fast with 0.3x the memory usage of other libraries
pretty prints 10x to 20x as fast as the standard library
serializes to bytes rather than str, i.e., is not a drop-in replacement
serializes str without escaping unicode to ASCII, e.g., "好" rather than "\\u597d"
serializes float 10x as fast and deserializes twice as fast as other libraries
serializes subclasses of str, int, list, and dict natively, requiring default to specify how to serialize others
serializes arbitrary types using a default hook
has strict UTF-8 conformance, more correct than the standard library
has strict JSON conformance in not supporting NaN/Infinity/-Infinity
has an option for strict JSON conformance on 53-bit integers with default support for 64-bit
does not provide load() or dump() functions for reading from/writing to file-like objects

orjson supports CPython 3.8, 3.9, 3.10, 3.11, 3.12, and 3.13. It distributes amd64/x86_64, aarch64/armv8, arm7, POWER/ppc64le, and s390x wheels for Linux, amd64 and aarch64 wheels for macOS, and amd64 and i686/x86 wheels for Windows. orjson does not and will not support PyPy. orjson does not and will not support PEP 554 subinterpreters. Releases follow semantic versioning, and serializing a new object type without an opt-in flag is considered a breaking change.

orjson is licensed under both the Apache 2.0 and MIT licenses. The repository and issue tracker is github.com/ijl/orjson, and patches may be submitted there. There is a CHANGELOG available in the repository.

Usage

Install

To install a wheel from PyPI:

pip install --upgrade "pip>=20.3" # manylinux_x_y, universal2 wheel support
pip install --upgrade orjson

To build a wheel, see packaging.

Quickstart

This is an example of serializing, with options specified, and deserializing:

>>> import orjson, datetime, numpy
>>> data = {
    "type": "job",
    "created_at": datetime.datetime(1970, 1, 1),
    "status": "🆗",
    "payload": numpy.array([[1, 2], [3, 4]]),
}
>>> orjson.dumps(data, option=orjson.OPT_NAIVE_UTC | orjson.OPT_SERIALIZE_NUMPY)
b'{"type":"job","created_at":"1970-01-01T00:00:00+00:00","status":"\xf0\x9f\x86\x97","payload":[[1,2],[3,4]]}'
>>> orjson.loads(_)
{'type': 'job', 'created_at': '1970-01-01T00:00:00+00:00', 'status': '🆗', 'payload': [[1, 2], [3, 4]]}

Migrating

orjson version 3 serializes more types than version 2. Subclasses of str, int, dict, and list are now serialized. This is faster and more similar to the standard library. It can be disabled with orjson.OPT_PASSTHROUGH_SUBCLASS. dataclasses.dataclass instances are now serialized by default and cannot be customized in a default function unless option=orjson.OPT_PASSTHROUGH_DATACLASS is specified. uuid.UUID instances are serialized by default. For any type that is now serialized, implementations in a default function and options enabling them can be removed but do not need to be. There was no change in deserialization.

To migrate from the standard library, the largest difference is that orjson.dumps returns bytes and json.dumps returns a str. Users with dict objects using non-str keys should specify option=orjson.OPT_NON_STR_KEYS. sort_keys is replaced by option=orjson.OPT_SORT_KEYS. indent is replaced by option=orjson.OPT_INDENT_2, and other levels of indentation are not supported.

Serialize

def dumps(
    __obj: Any,
    default: Optional[Callable[[Any], Any]] = ...,
    option: Optional[int] = ...,
) -> bytes: ...

dumps() serializes Python objects to JSON.

It natively serializes str, dict, list, tuple, int, float, bool, None, dataclasses.dataclass, typing.TypedDict, datetime.datetime, datetime.date, datetime.time, uuid.UUID, numpy.ndarray, and orjson.Fragment instances. It supports arbitrary types through default. It serializes subclasses of str, int, dict, list, dataclasses.dataclass, and enum.Enum. It does not serialize subclasses of tuple, to avoid serializing namedtuple objects as arrays. To avoid serializing subclasses, specify the option orjson.OPT_PASSTHROUGH_SUBCLASS.

The output is a bytes object containing UTF-8.

The global interpreter lock (GIL) is held for the duration of the call.

It raises JSONEncodeError on an unsupported type. The exception message describes the invalid object with the error message "Type is not JSON serializable: ...". To fix this, specify default.

It raises JSONEncodeError on a str that contains invalid UTF-8.

It raises JSONEncodeError on an integer that exceeds 64 bits by default or, with OPT_STRICT_INTEGER, 53 bits.

It raises JSONEncodeError if a dict has a key of a type other than str, unless OPT_NON_STR_KEYS is specified.

It raises JSONEncodeError if the output of default recurses to handling by default more than 254 levels deep.

It raises JSONEncodeError on circular references.

It raises JSONEncodeError if a tzinfo on a datetime object is unsupported.

JSONEncodeError is a subclass of TypeError. This is for compatibility with the standard library.

If the failure was caused by an exception in default, then JSONEncodeError chains the original exception as __cause__.

default

To serialize a subclass or arbitrary types, specify default as a callable that returns a supported type. default may be a function, lambda, or callable class instance. To specify that a type was not handled by default, raise an exception such as TypeError.

>>> import orjson, decimal
>>>
def default(obj):
    if isinstance(obj, decimal.Decimal):
        return str(obj)
    raise TypeError

>>> orjson.dumps(decimal.Decimal("0.0842389659712649442845"))
JSONEncodeError: Type is not JSON serializable: decimal.Decimal
>>> orjson.dumps(decimal.Decimal("0.0842389659712649442845"), default=default)
b'"0.0842389659712649442845"'
>>> orjson.dumps({1, 2}, default=default)
orjson.JSONEncodeError: Type is not JSON serializable: set

The default callable may return an object that itself must be handled by default, up to 254 times, before an exception is raised.

It is important that default raise an exception if a type cannot be handled. Otherwise Python implicitly returns None, which appears to the caller like a legitimate value and is serialized:

>>> import orjson, json, rapidjson, decimal
>>>
def default(obj):
    if isinstance(obj, decimal.Decimal):
        return str(obj)

>>> orjson.dumps({"set":{1, 2}}, default=default)
b'{"set":null}'
>>> json.dumps({"set":{1, 2}}, default=default)
'{"set": null}'
>>> rapidjson.dumps({"set":{1, 2}}, default=default)
'{"set":null}'

option

To modify how data is serialized, specify option. Each option is an integer constant in orjson. To specify multiple options, combine them with the bitwise OR operator, e.g., option=orjson.OPT_STRICT_INTEGER | orjson.OPT_NAIVE_UTC.

OPT_APPEND_NEWLINE

Append \n to the output. This is a convenience and optimization for the pattern of dumps(...) + b"\n". bytes objects are immutable, so that pattern copies the original contents.

>>> import orjson
>>> orjson.dumps([])
b'[]'
>>> orjson.dumps([], option=orjson.OPT_APPEND_NEWLINE)
b'[]\n'

OPT_INDENT_2

Pretty-print output with an indent of two spaces. This is equivalent to indent=2 in the standard library. Pretty printing is slower, and the output is larger. orjson is the fastest of the compared libraries at pretty printing and slows down much less when pretty printing than the standard library does. This option is compatible with all other options.

>>> import orjson
>>> orjson.dumps({"a": "b", "c": {"d": True}, "e": [1, 2]})
b'{"a":"b","c":{"d":true},"e":[1,2]}'
>>> orjson.dumps(
    {"a": "b", "c": {"d": True}, "e": [1, 2]},
    option=orjson.OPT_INDENT_2
)
b'{\n  "a": "b",\n  "c": {\n    "d": true\n  },\n  "e": [\n    1,\n    2\n  ]\n}'

If displayed, the indentation and linebreaks appear like this:

{
  "a": "b",
  "c": {
    "d": true
  },
  "e": [
    1,
    2
  ]
}

This measures serializing the github.json fixture as compact (52KiB) or pretty (64KiB):

Library	compact (ms)	pretty (ms)	vs. orjson
orjson	0.03	0.04	1
ujson	0.18	0.19	4.6
rapidjson	0.1	0.12	2.9
simplejson	0.25	0.89	21.4
json	0.18	0.71	17

This measures serializing the citm_catalog.json fixture, more of a worst case due to the amount of nesting and newlines, as compact (489KiB) or pretty (1.1MiB):

Library	compact (ms)	pretty (ms)	vs. orjson
orjson	0.59	0.71	1
ujson	2.9	3.59	5
rapidjson	1.81	2.8	3.9
simplejson	10.43	42.13	59.1
json	4.16	33.42	46.9

This can be reproduced using the pyindent script.

OPT_NAIVE_UTC

Serialize datetime.datetime objects without a tzinfo as UTC. This has no effect on datetime.datetime objects that have tzinfo set.

OPT_NON_STR_KEYS

Serialize dict keys of type other than str. This allows dict keys to be one of str, int, float, bool, None, datetime.datetime, datetime.date, datetime.time, enum.Enum, and uuid.UUID. For comparison, the standard library serializes str, int, float, bool, or None by default. orjson benchmarks as being faster at serializing non-str keys than other libraries. This option is slower for str keys than the default.

>>> import orjson, datetime, uuid
>>> orjson.dumps(
        {uuid.UUID("7202d115-7ff3-4c81-a7c1-2a1f067b1ece"): [1, 2, 3]},
        option=orjson.OPT_NON_STR_KEYS,
    )
b'{"7202d115-7ff3-4c81-a7c1-2a1f067b1ece":[1,2,3]}'
>>> orjson.dumps(
        {datetime.datetime(1970, 1, 1, 0, 0, 0): [1, 2, 3]},
        option=orjson.OPT_NON_STR_KEYS | orjson.OPT_NAIVE_UTC,
    )
b'{"1970-01-01T00:00:00+00:00":[1,2,3]}'

These types are generally serialized the same way they would be as values, e.g., datetime.datetime is still an RFC 3339 string and respects options affecting it. The exception is that int serialization does not respect OPT_STRICT_INTEGER.

This option has the risk of creating duplicate keys. This is because non-str objects may serialize to the same str as an existing key, e.g., {"1": true, 1: false}. The last key inserted into the dict will be serialized last, and a JSON deserializer will presumably take the last occurrence of a key (in the above, false). The first value will be lost.

This option is compatible with orjson.OPT_SORT_KEYS. If sorting is used, note that the sort is unstable and will be unpredictable for duplicate keys.

>>> import orjson, datetime
>>> orjson.dumps(
    {"other": 1, datetime.date(1970, 1, 5): 2, datetime.date(1970, 1, 3): 3},
    option=orjson.OPT_NON_STR_KEYS | orjson.OPT_SORT_KEYS
)
b'{"1970-01-03":3,"1970-01-05":2,"other":1}'

This measures serializing 589KiB of JSON comprising a list of 100 dicts. Each dict has 365 randomly sorted int keys representing epoch timestamps plus one str key, and the value for each key is a single integer. In "str keys", the keys were converted to str before serialization, and orjson still specifies option=orjson.OPT_NON_STR_KEYS (which is always somewhat slower).

Library	str keys (ms)	int keys (ms)	int keys sorted (ms)
orjson	1.53	2.16	4.29
ujson	3.07	5.65
rapidjson	4.29
simplejson	11.24	14.50	21.86
json	7.17	8.49

ujson is blank for sorting because it segfaults. json is blank because it raises TypeError on attempting to sort before converting all keys to str. rapidjson is blank because it does not support non-str keys. This can be reproduced using the pynonstr script.

OPT_OMIT_MICROSECONDS

Do not serialize the microsecond field on datetime.datetime and datetime.time instances.

>>> import orjson, datetime
>>> orjson.dumps(
        datetime.datetime(1970, 1, 1, 0, 0, 0, 1),
    )
b'"1970-01-01T00:00:00.000001"'
>>> orjson.dumps(
        datetime.datetime(1970, 1, 1, 0, 0, 0, 1),
        option=orjson.OPT_OMIT_MICROSECONDS,
    )
b'"1970-01-01T00:00:00"'

OPT_PASSTHROUGH_DATACLASS

Pass dataclasses.dataclass instances through to default. This allows customizing their output but is much slower.

>>> import orjson, dataclasses
>>>
@dataclasses.dataclass
class User:
    id: str
    name: str
    password: str

def default(obj):
    if isinstance(obj, User):
        return {"id": obj.id, "name": obj.name}
    raise TypeError

>>> orjson.dumps(User("3b1", "asd", "zxc"))
b'{"id":"3b1","name":"asd","password":"zxc"}'
>>> orjson.dumps(User("3b1", "asd", "zxc"), option=orjson.OPT_PASSTHROUGH_DATACLASS)
TypeError: Type is not JSON serializable: User
>>> orjson.dumps(
        User("3b1", "asd", "zxc"),
        option=orjson.OPT_PASSTHROUGH_DATACLASS,
        default=default,
    )
b'{"id":"3b1","name":"asd"}'

OPT_PASSTHROUGH_DATETIME

Pass datetime.datetime, datetime.date, and datetime.time instances through to default. This allows serializing datetimes to a custom format, e.g., HTTP dates:

>>> import orjson, datetime
>>>
def default(obj):
    if isinstance(obj, datetime.datetime):
        return obj.strftime("%a, %d %b %Y %H:%M:%S GMT")
    raise TypeError

>>> orjson.dumps({"created_at": datetime.datetime(1970, 1, 1)})
b'{"created_at":"1970-01-01T00:00:00"}'
>>> orjson.dumps({"created_at": datetime.datetime(1970, 1, 1)}, option=orjson.OPT_PASSTHROUGH_DATETIME)
TypeError: Type is not JSON serializable: datetime.datetime
>>> orjson.dumps(
        {"created_at": datetime.datetime(1970, 1, 1)},
        option=orjson.OPT_PASSTHROUGH_DATETIME,
        default=default,
    )
b'{"created_at":"Thu, 01 Jan 1970 00:00:00 GMT"}'

This does not affect datetimes in dict keys if using OPT_NON_STR_KEYS.

OPT_PASSTHROUGH_SUBCLASS

Pass subclasses of builtin types through to default.

>>> import orjson
>>>
class Secret(str):
    pass

def default(obj):
    if isinstance(obj, Secret):
        return "******"
    raise TypeError

>>> orjson.dumps(Secret("zxc"))
b'"zxc"'
>>> orjson.dumps(Secret("zxc"), option=orjson.OPT_PASSTHROUGH_SUBCLASS)
TypeError: Type is not JSON serializable: Secret
>>> orjson.dumps(Secret("zxc"), option=orjson.OPT_PASSTHROUGH_SUBCLASS, default=default)
b'"******"'

This does not affect serializing subclasses as dict keys if using OPT_NON_STR_KEYS.

OPT_SERIALIZE_DATACLASS

This is deprecated and has no effect in version 3. In version 2 this was required to serialize dataclasses.dataclass instances. For more, see dataclass.

OPT_SERIALIZE_NUMPY

Serialize numpy.ndarray instances. For more, see numpy.

OPT_SERIALIZE_UUID

This is deprecated and has no effect in version 3. In version 2 this was required to serialize uuid.UUID instances. For more, see UUID.

OPT_SORT_KEYS

Serialize dict keys in sorted order. The default is to serialize in an unspecified order. This is equivalent to sort_keys=True in the standard library.

This can be used to ensure the order is deterministic for hashing or tests. It has a substantial performance penalty and is not recommended in general.

>>> import orjson
>>> orjson.dumps({"b": 1, "c": 2, "a": 3})
b'{"b":1,"c":2,"a":3}'
>>> orjson.dumps({"b": 1, "c": 2, "a": 3}, option=orjson.OPT_SORT_KEYS)
b'{"a":3,"b":1,"c":2}'

This measures serializing the twitter.json fixture unsorted and sorted:

Library	unsorted (ms)	sorted (ms)	vs. orjson
orjson	0.32	0.54	1
ujson	1.6	2.07	3.8
rapidjson	1.12	1.65	3.1
simplejson	2.25	3.13	5.8
json	1.78	2.32	4.3

The benchmark can be reproduced using the pysort script.

The sorting is not collation/locale-aware:

>>> import orjson
>>> orjson.dumps({"a": 1, "ä": 2, "A": 3}, option=orjson.OPT_SORT_KEYS)
b'{"A":3,"a":1,"\xc3\xa4":2}'

This is the same sorting behavior as the standard library, rapidjson, simplejson, and ujson.

dataclass instances also serialize as maps, but this option has no effect on them.

OPT_STRICT_INTEGER

Enforce a 53-bit limit on integers. The limit is otherwise 64 bits. (The standard library's json module, by contrast, serializes integers of any size.) For more, see int.

OPT_UTC_Z

Serialize a UTC timezone on datetime.datetime instances as Z instead of +00:00.

>>> import orjson, datetime, zoneinfo
>>> orjson.dumps(
        datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=zoneinfo.ZoneInfo("UTC")),
    )
b'"1970-01-01T00:00:00+00:00"'
>>> orjson.dumps(
        datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=zoneinfo.ZoneInfo("UTC")),
        option=orjson.OPT_UTC_Z
    )
b'"1970-01-01T00:00:00Z"'

Fragment

orjson.Fragment includes already-serialized JSON in a document. This is an efficient way to include JSON blobs from a cache, JSONB field, or separately serialized object without first deserializing them to Python objects via loads().

>>> import orjson
>>> orjson.dumps({"key": "zxc", "data": orjson.Fragment(b'{"a": "b", "c": 1}')})
b'{"key":"zxc","data":{"a": "b", "c": 1}}'

It does no reformatting: orjson.OPT_INDENT_2 will not affect a compact blob, nor will a pretty-printed JSON blob be rewritten as compact.

The input must be bytes or str and given as a positional argument.

This raises orjson.JSONEncodeError if a str is given and the input is not valid UTF-8. It otherwise does no validation, and it is possible to write invalid JSON. This does not escape characters. The implementation is tested to not crash if given invalid strings or invalid JSON.

This is similar to RawJSON in rapidjson.

Deserialize

def loads(__obj: Union[bytes, bytearray, memoryview, str]) -> Any: ...

loads() deserializes JSON to Python objects. It deserializes to dict, list, int, float, str, bool, and None objects.

bytes, bytearray, memoryview, and str input are accepted. If the input exists as a memoryview, bytearray, or bytes object, it is recommended to pass it directly rather than creating an unnecessary str object. That is, use orjson.loads(b"{}") instead of orjson.loads(b"{}".decode("utf-8")). This has lower memory usage and lower latency.

The input must be valid UTF-8.

orjson maintains a cache of map keys for the duration of the process. This causes a net reduction in memory usage by avoiding duplicate strings. Keys must be at most 64 bytes to be cached, and 2048 entries are stored.

Types

dataclass

orjson serializes instances of dataclasses.dataclass natively. It serializes instances 40-50x as fast as other libraries and avoids the severe slowdown, relative to serializing dict, seen in other libraries.

All variants of dataclasses are supported, including dataclasses using __slots__, frozen dataclasses, those with optional or default attributes, and subclasses. There is a performance benefit to not using __slots__.

Library	dict (ms)	dataclass (ms)	vs. orjson
orjson	1.40	1.60	1
ujson
rapidjson	3.64	68.48	42
simplejson	14.21	92.18	57
json	13.28	94.90	59

This measures serializing 555KiB of JSON, with orjson serializing natively and other libraries using default to serialize the output of dataclasses.asdict(). This can be reproduced using the pydataclass script.

Dataclasses are serialized as maps, with every attribute serialized in the order given in the class definition:

>>> import dataclasses, orjson, typing

@dataclasses.dataclass
class Member:
    id: int
    active: bool = dataclasses.field(default=False)

@dataclasses.dataclass
class Object:
    id: int
    name: str
    members: typing.List[Member]

>>> orjson.dumps(Object(1, "a", [Member(1, True), Member(2)]))
b'{"id":1,"name":"a","members":[{"id":1,"active":true},{"id":2,"active":false}]}'

datetime

orjson serializes datetime.datetime objects to RFC 3339 format, e.g., "1970-01-01T00:00:00+00:00". This is a subset of ISO 8601 and is compatible with isoformat() in the standard library.

>>> import orjson, datetime, zoneinfo
>>> orjson.dumps(
    datetime.datetime(2018, 12, 1, 2, 3, 4, 9, tzinfo=zoneinfo.ZoneInfo("Australia/Adelaide"))
)
b'"2018-12-01T02:03:04.000009+10:30"'
>>> orjson.dumps(
    datetime.datetime(2100, 9, 1, 21, 55, 2).replace(tzinfo=zoneinfo.ZoneInfo("UTC"))
)
b'"2100-09-01T21:55:02+00:00"'
>>> orjson.dumps(
    datetime.datetime(2100, 9, 1, 21, 55, 2)
)
b'"2100-09-01T21:55:02"'

datetime.datetime supports instances with a tzinfo that is None, datetime.timezone.utc, a timezone instance from the Python 3.9+ zoneinfo module, or a timezone instance from the third-party pendulum, pytz, or dateutil/arrow libraries.

It is fastest to use the standard library's zoneinfo.ZoneInfo for timezones.

datetime.time objects must not have a tzinfo.

>>> import orjson, datetime
>>> orjson.dumps(datetime.time(12, 0, 15, 290))
b'"12:00:15.000290"'

datetime.date objects will always serialize.

>>> import orjson, datetime
>>> orjson.dumps(datetime.date(1900, 1, 2))
b'"1900-01-02"'

Errors with tzinfo result in JSONEncodeError being raised.

To disable serialization of datetime objects, specify the option orjson.OPT_PASSTHROUGH_DATETIME.

To use the "Z" suffix instead of "+00:00" to indicate UTC ("Zulu") time, use the option orjson.OPT_UTC_Z.

To assume datetimes without a timezone are UTC, use the option orjson.OPT_NAIVE_UTC.

enum

orjson serializes enums natively. Options apply to their values.

>>> import enum, datetime, orjson
>>>
class DatetimeEnum(enum.Enum):
    EPOCH = datetime.datetime(1970, 1, 1, 0, 0, 0)

>>> orjson.dumps(DatetimeEnum.EPOCH)
b'"1970-01-01T00:00:00"'
>>> orjson.dumps(DatetimeEnum.EPOCH, option=orjson.OPT_NAIVE_UTC)
b'"1970-01-01T00:00:00+00:00"'

Enums with members that are not supported types can be serialized using default:

>>> import enum, orjson
>>>
class Custom:
    def __init__(self, val):
        self.val = val

def default(obj):
    if isinstance(obj, Custom):
        return obj.val
    raise TypeError

class CustomEnum(enum.Enum):
    ONE = Custom(1)

>>> orjson.dumps(CustomEnum.ONE, default=default)
b'1'

float

orjson serializes and deserializes double-precision floats with no loss of precision and with consistent rounding.

orjson.dumps() serializes NaN, Infinity, and -Infinity, which are not compliant JSON, as null:

>>> import orjson, ujson, rapidjson, json
>>> orjson.dumps([float("NaN"), float("Infinity"), float("-Infinity")])
b'[null,null,null]'
>>> ujson.dumps([float("NaN"), float("Infinity"), float("-Infinity")])
OverflowError: Invalid Inf value when encoding double
>>> rapidjson.dumps([float("NaN"), float("Infinity"), float("-Infinity")])
'[NaN,Infinity,-Infinity]'
>>> json.dumps([float("NaN"), float("Infinity"), float("-Infinity")])
'[NaN, Infinity, -Infinity]'

int

orjson serializes and deserializes 64-bit integers by default. The supported range runs from a signed 64-bit integer's minimum (-9223372036854775808) to an unsigned 64-bit integer's maximum (18446744073709551615). This is widely compatible, but some implementations, such as web browsers, only support 53-bit integers. For those implementations, dumps() can be configured to raise a JSONEncodeError on values exceeding the 53-bit range.

>>> import orjson
>>> orjson.dumps(9007199254740992)
b'9007199254740992'
>>> orjson.dumps(9007199254740992, option=orjson.OPT_STRICT_INTEGER)
JSONEncodeError: Integer exceeds 53-bit range
>>> orjson.dumps(-9007199254740992, option=orjson.OPT_STRICT_INTEGER)
JSONEncodeError: Integer exceeds 53-bit range

numpy

orjson natively serializes numpy.ndarray and individual numpy.float64, numpy.float32, numpy.float16 (numpy.half), numpy.int64, numpy.int32, numpy.int16, numpy.int8, numpy.uint64, numpy.uint32, numpy.uint16, numpy.uint8, numpy.uintp, numpy.intp, numpy.datetime64, and numpy.bool instances.

orjson is compatible with both numpy v1 and v2.

orjson is faster than all compared libraries at serializing numpy instances. Serializing numpy data requires specifying option=orjson.OPT_SERIALIZE_NUMPY.

>>> import orjson, numpy
>>> orjson.dumps(
        numpy.array([[1, 2, 3], [4, 5, 6]]),
        option=orjson.OPT_SERIALIZE_NUMPY,
)
b'[[1,2,3],[4,5,6]]'

The array must be a contiguous C array (C_CONTIGUOUS) and one of the supported datatypes.

Note a difference between serializing numpy.float32 using ndarray.tolist() and using orjson.dumps(..., option=orjson.OPT_SERIALIZE_NUMPY): tolist() converts to a double before serializing, and orjson's native path does not. This can result in different rounding.

numpy.datetime64 instances are serialized as RFC 3339 strings, and datetime options affect them.

>>> import orjson, numpy
>>> orjson.dumps(
        numpy.datetime64("2021-01-01T00:00:00.172"),
        option=orjson.OPT_SERIALIZE_NUMPY,
)
b'"2021-01-01T00:00:00.172000"'
>>> orjson.dumps(
        numpy.datetime64("2021-01-01T00:00:00.172"),
        option=(
            orjson.OPT_SERIALIZE_NUMPY |
            orjson.OPT_NAIVE_UTC |
            orjson.OPT_OMIT_MICROSECONDS
        ),
)
b'"2021-01-01T00:00:00+00:00"'

If an array is not a contiguous C array, contains an unsupported datatype, or contains a numpy.datetime64 using an unsupported representation (e.g., picoseconds), orjson falls through to default. In default, obj.tolist() can be specified.

If an array is not in the native endianness, e.g., an array of big-endian valueson a little-endian system, orjson.JSONEncodeError is raised.

If an array is malformed, orjson.JSONEncodeError is raised.

This measures serializing 92MiB of JSON from a numpy.ndarray with dimensions of (50000, 100) and numpy.float64 values:

Library	Latency (ms)	RSS diff (MiB)	vs. orjson
orjson	194	99	1.0
ujson
rapidjson	3,048	309	15.7
simplejson	3,023	297	15.6
json	3,133	297	16.1

This measures serializing 100MiB of JSON from a numpy.ndarray with dimensions of (100000, 100) and numpy.int32 values:

Library	Latency (ms)	RSS diff (MiB)	vs. orjson
orjson	178	115	1.0
ujson
rapidjson	1,512	551	8.5
simplejson	1,606	504	9.0
json	1,506	503	8.4

This measures serializing 105MiB of JSON from a numpy.ndarray with dimensions of (100000, 200) and numpy.bool values:

Library	Latency (ms)	RSS diff (MiB)	vs. orjson
orjson	157	120	1.0
ujson
rapidjson	710	327	4.5
simplejson	931	398	5.9
json	996	400	6.3

In these benchmarks, orjson serializes natively, ujson is blank because it does not support a default parameter, and the other libraries serialize ndarray.tolist() via default. The RSS column measures peak memory usage during serialization. This can be reproduced using the pynumpy script.

orjson does not have an installation or compilation dependency on numpy. The implementation is independent, reading numpy.ndarray using PyArrayInterface.

str

orjson is strict about UTF-8 conformance. This is stricter than the standard library's json module, which will serialize and deserialize UTF-16 surrogates, e.g., "\ud800", that are invalid UTF-8.

If orjson.dumps() is given a str that does not contain valid UTF-8, orjson.JSONEncodeError is raised. If loads() receives invalid UTF-8, orjson.JSONDecodeError is raised.

orjson and rapidjson are the only compared JSON libraries to consistently error on bad input.

>>> import orjson, ujson, rapidjson, json
>>> orjson.dumps('\ud800')
JSONEncodeError: str is not valid UTF-8: surrogates not allowed
>>> ujson.dumps('\ud800')
UnicodeEncodeError: 'utf-8' codec ...
>>> rapidjson.dumps('\ud800')
UnicodeEncodeError: 'utf-8' codec ...
>>> json.dumps('\ud800')
'"\\ud800"'
>>> orjson.loads('"\\ud800"')
JSONDecodeError: unexpected end of hex escape at line 1 column 8: line 1 column 1 (char 0)
>>> ujson.loads('"\\ud800"')
''
>>> rapidjson.loads('"\\ud800"')
ValueError: Parse error at offset 1: The surrogate pair in string is invalid.
>>> json.loads('"\\ud800"')
'\ud800'

To make a best effort at deserializing bad input, first decode the bytes with a lossy error handler, such as "replace", as the errors argument:

>>> import orjson
>>> orjson.loads(b'"\xed\xa0\x80"')
JSONDecodeError: str is not valid UTF-8: surrogates not allowed
>>> orjson.loads(b'"\xed\xa0\x80"'.decode("utf-8", "replace"))
'���'

uuid

orjson serializes uuid.UUID instances to RFC 4122 format, e.g., "f81d4fae-7dec-11d0-a765-00a0c91e6bf6".

>>> import orjson, uuid
>>> orjson.dumps(uuid.UUID('f81d4fae-7dec-11d0-a765-00a0c91e6bf6'))
b'"f81d4fae-7dec-11d0-a765-00a0c91e6bf6"'
>>> orjson.dumps(uuid.uuid5(uuid.NAMESPACE_DNS, "python.org"))
b'"886313e1-3b8a-5372-9b90-0c9aee199e5d"'

Testing

The library has comprehensive tests. There are tests against fixtures in the JSONTestSuite and nativejson-benchmark repositories. It is tested to not crash against the Big List of Naughty Strings. It is tested to not leak memory. It is tested to not crash against, and not accept, invalid UTF-8. There are integration tests exercising the library's use in web servers (gunicorn using multiprocess/forked workers) and when multithreaded. It also uses some tests from the ultrajson library.

orjson is the most correct of the compared libraries. This table shows how each library handles a combined 342 JSON fixtures from the JSONTestSuite and nativejson-benchmark tests:

Library	Invalid JSON documents not rejected	Valid JSON documents not deserialized
orjson	0	0
ujson	31	0
rapidjson	6	0
simplejson	10	0
json	17	0

This shows that all libraries deserialize valid JSON, but only orjson correctly rejects the given invalid JSON fixtures. Errors are largely due to accepting invalid strings and numbers.

The table above can be reproduced using the pycorrectness script.

Performance

Serialization and deserialization performance of orjson is better than that of ultrajson, rapidjson, simplejson, or json. The benchmarks are done on fixtures of real data:

twitter.json, 631.5KiB, results of a search on Twitter for "一", containing CJK strings, dictionaries of strings and arrays of dictionaries, indented.
github.json, 55.8KiB, a GitHub activity feed, containing dictionaries of strings and arrays of dictionaries, not indented.
citm_catalog.json, 1.7MiB, concert data, containing nested dictionaries of strings and arrays of integers, indented.
canada.json, 2.2MiB, coordinates of the Canadian border in GeoJSON format, containing floats and arrays, indented.

Latency

twitter.json serialization

Library	Median latency (milliseconds)	Operations per second	Relative (latency)
orjson	0.1	8377	1
ujson	0.9	1088	7.3
rapidjson	0.8	1228	6.8
simplejson	1.9	531	15.6
json	1.4	744	11.3

twitter.json deserialization

Library	Median latency (milliseconds)	Operations per second	Relative (latency)
orjson	0.6	1811	1
ujson	1.2	814	2.1
rapidjson	2.1	476	3.8
simplejson	1.6	626	3
json	1.8	557	3.3

github.json serialization

Library	Median latency (milliseconds)	Operations per second	Relative (latency)
orjson	0.01	104424	1
ujson	0.09	10594	9.8
rapidjson	0.07	13667	7.6
simplejson	0.2	5051	20.6
json	0.14	7133	14.6

github.json deserialization

Library	Median latency (milliseconds)	Operations per second	Relative (latency)
orjson	0.05	20069	1
ujson	0.11	8913	2.3
rapidjson	0.13	8077	2.6
simplejson	0.11	9342	2.1
json	0.11	9291	2.2

citm_catalog.json serialization

Library	Median latency (milliseconds)	Operations per second	Relative (latency)
orjson	0.3	3757	1
ujson	1.7	598	6.3
rapidjson	1.3	768	4.9
simplejson	8.3	120	31.1
json	3	331	11.3

citm_catalog.json deserialization

Library	Median latency (milliseconds)	Operations per second	Relative (latency)
orjson	1.4	730	1
ujson	2.6	384	1.9
rapidjson	4	246	3
simplejson	3.7	271	2.7
json	3.7	267	2.7

canada.json serialization

Library	Median latency (milliseconds)	Operations per second	Relative (latency)
orjson	2.4	410	1
ujson	9.6	104	3.9
rapidjson	28.7	34	11.8
simplejson	49.3	20	20.3
json	30.6	32	12.6

canada.json deserialization

Library	Median latency (milliseconds)	Operations per second	Relative (latency)
orjson	3	336	1
ujson	7.1	141	2.4
rapidjson	20.1	49	6.7
simplejson	16.8	59	5.6
json	18.2	55	6.1

Memory

As of 3.7.0, orjson has higher baseline memory usage than other libraries due to a persistent buffer used for parsing. Incremental memory usage when deserializing is similar to the standard library and other third-party libraries.

The first column measures RSS after importing a library and reading the fixture. The second column measures the increase in RSS after repeatedly calling loads() on the fixture.

twitter.json

Library	import, read() RSS (MiB)	loads() increase in RSS (MiB)
orjson	15.7	3.4
ujson	16.4	3.4
rapidjson	16.6	4.4
simplejson	14.5	1.8
json	13.9	1.8

github.json

Library	import, read() RSS (MiB)	loads() increase in RSS (MiB)
orjson	15.2	0.4
ujson	15.4	0.4
rapidjson	15.7	0.5
simplejson	13.7	0.2
json	13.3	0.1

citm_catalog.json

Library	import, read() RSS (MiB)	loads() increase in RSS (MiB)
orjson	16.8	10.1
ujson	17.3	10.2
rapidjson	17.6	28.7
simplejson	15.8	30.1
json	14.8	20.5

canada.json

Library	import, read() RSS (MiB)	loads() increase in RSS (MiB)
orjson	17.2	22.1
ujson	17.4	18.3
rapidjson	18	23.5
simplejson	15.7	21.4
json	15.4	20.4

Reproducing

The above was measured using Python 3.11.9 on Linux (amd64) with orjson 3.10.6, ujson 5.10.0, python-rapidjson 1.18, and simplejson 3.19.2.

The latency results can be reproduced using the pybench and graph scripts. The memory results can be reproduced using the pymem script.

Questions

Why can't I install it from PyPI?

Probably pip needs to be upgraded to version 20.3 or later to support the latest manylinux_x_y or universal2 wheel formats.

"Cargo, the Rust package manager, is not installed or is not on PATH."

This happens when there are no binary wheels (like manylinux) for your platform on PyPI. You can install Rust through rustup or a package manager, and then it will compile.

Will it deserialize to dataclasses, UUIDs, decimals, etc., or support object_hook?

No. This requires a schema specifying what types are expected, how to handle errors, etc. This is addressed by data validation libraries a level above this.

Will it serialize to `str`?

No. bytes is the correct type for a serialized blob.

Packaging

Packaging orjson requires at least Rust 1.72 and the maturin build tool. The recommended build command is:

maturin build --release --strip

It benefits from also having a C build environment to compile a faster deserialization backend. See this project's manylinux_2_28 builds for an example using clang and LTO.

The project's own CI tests against nightly-2024-08-05 and stable 1.72. It is prudent to pin the nightly version because that channel can introduce breaking changes.

orjson is tested for amd64, aarch64, arm7, ppc64le, and s390x on Linux. On macOS, it is tested for either aarch64 or amd64 and cross-compiles for the other, depending on version. On Windows, it is tested on amd64 and i686.

There are no runtime dependencies other than libc.

The source distribution on PyPI contains all dependencies' source and can be built without network access. The file can be downloaded from https://files.pythonhosted.org/packages/source/o/orjson/orjson-${version}.tar.gz.

orjson's tests are included in the source distribution on PyPI. The requirements to run the tests are specified in test/requirements.txt. The tests should be run as part of the build. They can be run with pytest -q test.