orjson is a fast, correct JSON library for Python. It benchmarks as the fastest Python library for JSON and is more correct than the standard json library or other third-party libraries. It serializes dataclass, datetime, numpy, and UUID instances natively.
Its features and drawbacks compared to other Python JSON libraries:
- serializes dataclass instances 40-50x as fast as other libraries
- serializes datetime, date, and time instances to RFC 3339 format, e.g., "1970-01-01T00:00:00+00:00"
- serializes numpy.ndarray instances 4-12x as fast with 0.3x the memory usage of other libraries
- pretty prints 10x to 20x as fast as the standard library
- serializes to bytes rather than str, i.e., is not a drop-in replacement
- serializes str without escaping unicode to ASCII, e.g., "好" rather than "\\u597d"
- serializes float 10x as fast and deserializes twice as fast as other libraries
- serializes subclasses of str, int, list, and dict natively, requiring default to specify how to serialize others
- serializes arbitrary types using a default hook
- has strict UTF-8 conformance, more correct than the standard library
- has strict JSON conformance in not supporting NaN/Infinity/-Infinity
- has an option for strict JSON conformance on 53-bit integers with default support for 64-bit
- does not provide load() or dump() functions for reading from/writing to file-like objects
orjson supports CPython 3.8, 3.9, 3.10, 3.11, 3.12, and 3.13. It distributes amd64/x86_64, aarch64/armv8, arm7, POWER/ppc64le, and s390x wheels for Linux, amd64 and aarch64 wheels for macOS, and amd64 and i686/x86 wheels for Windows. orjson does not and will not support PyPy. orjson does not and will not support PEP 554 subinterpreters. Releases follow semantic versioning and serializing a new object type without an opt-in flag is considered a breaking change.
orjson is licensed under both the Apache 2.0 and MIT licenses. The repository and issue tracker is github.com/ijl/orjson, and patches may be submitted there. There is a CHANGELOG available in the repository.
Usage
Install
To install a wheel from PyPI:
    pip install --upgrade "pip>=20.3" # manylinux_x_y, universal2 wheel support
    pip install --upgrade orjson
To build a wheel, see packaging.
Quickstart
This is an example of serializing, with options specified, and deserializing:
    >>> import orjson, datetime, numpy
    >>> data = {
        "type": "job",
        "created_at": datetime.datetime(1970, 1, 1),
        "status": "🆗",
        "payload": numpy.array([[1, 2], [3, 4]]),
    }
    >>> orjson.dumps(data, option=orjson.OPT_NAIVE_UTC | orjson.OPT_SERIALIZE_NUMPY)
    b'{"type":"job","created_at":"1970-01-01T00:00:00+00:00","status":"\xf0\x9f\x86\x97","payload":[[1,2],[3,4]]}'
    >>> orjson.loads(_)
    {'type': 'job', 'created_at': '1970-01-01T00:00:00+00:00', 'status': '🆗', 'payload': [[1, 2], [3, 4]]}
Migrating
orjson version 3 serializes more types than version 2. Subclasses of str, int, dict, and list are now serialized. This is faster and more similar to the standard library. It can be disabled with orjson.OPT_PASSTHROUGH_SUBCLASS. dataclasses.dataclass instances are now serialized by default and cannot be customized in a default function unless option=orjson.OPT_PASSTHROUGH_DATACLASS is specified. uuid.UUID instances are serialized by default. For any type that is now serialized, implementations in a default function and options enabling them can be removed but do not need to be. There was no change in deserialization.
To migrate from the standard library, the largest difference is that orjson.dumps returns bytes and json.dumps returns a str. Users with dict objects using non-str keys should specify option=orjson.OPT_NON_STR_KEYS. sort_keys is replaced by option=orjson.OPT_SORT_KEYS. indent is replaced by option=orjson.OPT_INDENT_2 and other levels of indentation are not supported.
Serialize
    def dumps(
        __obj: Any,
        default: Optional[Callable[[Any], Any]] = ...,
        option: Optional[int] = ...,
    ) -> bytes: ...
dumps() serializes Python objects to JSON.
It natively serializes str, dict, list, tuple, int, float, bool, None, dataclasses.dataclass, typing.TypedDict, datetime.datetime, datetime.date, datetime.time, uuid.UUID, numpy.ndarray, and orjson.Fragment instances. It supports arbitrary types through default. It serializes subclasses of str, int, dict, list, dataclasses.dataclass, and enum.Enum. It does not serialize subclasses of tuple to avoid serializing namedtuple objects as arrays. To avoid serializing subclasses, specify the option orjson.OPT_PASSTHROUGH_SUBCLASS.
The output is a bytes object containing UTF-8.
The global interpreter lock (GIL) is held for the duration of the call.
It raises JSONEncodeError on an unsupported type. This exception message describes the invalid object with the error message "Type is not JSON serializable: ...". To fix this, specify default.
It raises JSONEncodeError on a str that contains invalid UTF-8.
It raises JSONEncodeError on an integer that exceeds 64 bits by default or, with OPT_STRICT_INTEGER, 53 bits.
It raises JSONEncodeError if a dict has a key of a type other than str, unless OPT_NON_STR_KEYS is specified.
It raises JSONEncodeError if the output of default recurses to handling by default more than 254 levels deep.
It raises JSONEncodeError on circular references.
It raises JSONEncodeError if a tzinfo on a datetime object is unsupported.
JSONEncodeError is a subclass of TypeError. This is for compatibility with the standard library.
If the failure was caused by an exception in default then JSONEncodeError chains the original exception as __cause__.
default
To serialize a subclass or arbitrary types, specify default as a callable that returns a supported type. default may be a function, lambda, or callable class instance. To specify that a type was not handled by default, raise an exception such as TypeError.
    >>> import orjson, decimal
    >>>
    def default(obj):
        if isinstance(obj, decimal.Decimal):
            return str(obj)
        raise TypeError
    >>> orjson.dumps(decimal.Decimal("0.0842389659712649442845"))
    JSONEncodeError: Type is not JSON serializable: decimal.Decimal
    >>> orjson.dumps(decimal.Decimal("0.0842389659712649442845"), default=default)
    b'"0.0842389659712649442845"'
    >>> orjson.dumps({1, 2}, default=default)
    orjson.JSONEncodeError: Type is not JSON serializable: set
The default callable may return an object that itself must be handled by default up to 254 times before an exception is raised.
It is important that default raise an exception if a type cannot be handled. Python otherwise implicitly returns None, which appears to the caller like a legitimate value and is serialized:
    >>> import orjson, json, rapidjson, decimal
    >>>
    def default(obj):
        if isinstance(obj, decimal.Decimal):
            return str(obj)
    >>> orjson.dumps({"set":{1, 2}}, default=default)
    b'{"set":null}'
    >>> json.dumps({"set":{1, 2}}, default=default)
    '{"set": null}'
    >>> rapidjson.dumps({"set":{1, 2}}, default=default)
    '{"set":null}'
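One way to make the "raise if unhandled" rule hard to forget is to build the hook with functools.singledispatch, so the fallback case always raises. This is a sketch of one possible layout, not a pattern prescribed by orjson. It is demonstrated with the standard library's json.dumps, which takes the same kind of default callable; passing the hook to orjson.dumps is assumed to work the same way.

```python
import decimal
import functools
import json


@functools.singledispatch
def default(obj):
    # Unregistered types land here: raise instead of implicitly returning None.
    raise TypeError(f"Type is not JSON serializable: {type(obj).__name__}")


@default.register
def _(obj: decimal.Decimal):
    return str(obj)


@default.register
def _(obj: set):
    # Sort so the output is deterministic.
    return sorted(obj)


encoded = json.dumps(
    {"price": decimal.Decimal("1.50"), "tags": {"b", "a"}},
    default=default,
)
```

Adding support for another type is then a matter of registering one more function, and the TypeError fallback never needs to be repeated.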
option
To modify how data is serialized, specify option. Each option is an integer constant in orjson. To specify multiple options, mask them together, e.g., option=orjson.OPT_STRICT_INTEGER | orjson.OPT_NAIVE_UTC.
OPT_APPEND_NEWLINE
Append \n to the output. This is a convenience and optimization for the pattern of dumps(...) + b"\n". bytes objects are immutable and this pattern copies the original contents.
    >>> import orjson
    >>> orjson.dumps([])
    b"[]"
    >>> orjson.dumps([], option=orjson.OPT_APPEND_NEWLINE)
    b"[]\n"
OPT_INDENT_2
Pretty-print output with an indent of two spaces. This is equivalent to indent=2 in the standard library. Pretty printing is slower and the output larger. orjson is the fastest compared library at pretty printing and has much less of a slowdown to pretty print than the standard library does. This option is compatible with all other options.
    >>> import orjson
    >>> orjson.dumps({"a": "b", "c": {"d": True}, "e": [1, 2]})
    b'{"a":"b","c":{"d":true},"e":[1,2]}'
    >>> orjson.dumps(
        {"a": "b", "c": {"d": True}, "e": [1, 2]},
        option=orjson.OPT_INDENT_2
    )
    b'{\n  "a": "b",\n  "c": {\n    "d": true\n  },\n  "e": [\n    1,\n    2\n  ]\n}'
If displayed, the indentation and linebreaks appear like this:
    {
      "a": "b",
      "c": {
        "d": true
      },
      "e": [
        1,
        2
      ]
    }
This measures serializing the github.json fixture as compact (52KiB) or pretty (64KiB):
Library | compact (ms) | pretty (ms) | vs. orjson |
---|---|---|---|
orjson | 0.03 | 0.04 | 1 |
ujson | 0.18 | 0.19 | 4.6 |
rapidjson | 0.1 | 0.12 | 2.9 |
simplejson | 0.25 | 0.89 | 21.4 |
json | 0.18 | 0.71 | 17 |
This measures serializing the citm_catalog.json fixture, more of a worst case due to the amount of nesting and newlines, as compact (489KiB) or pretty (1.1MiB):
Library | compact (ms) | pretty (ms) | vs. orjson |
---|---|---|---|
orjson | 0.59 | 0.71 | 1 |
ujson | 2.9 | 3.59 | 5 |
rapidjson | 1.81 | 2.8 | 3.9 |
simplejson | 10.43 | 42.13 | 59.1 |
json | 4.16 | 33.42 | 46.9 |
This can be reproduced using the pyindent script.
OPT_NAIVE_UTC
Serialize datetime.datetime objects without a tzinfo as UTC. This has no effect on datetime.datetime objects that have tzinfo set.
    >>> import orjson, datetime
    >>> orjson.dumps(
            datetime.datetime(1970, 1, 1, 0, 0, 0),
        )
    b'"1970-01-01T00:00:00"'
    >>> orjson.dumps(
            datetime.datetime(1970, 1, 1, 0, 0, 0),
            option=orjson.OPT_NAIVE_UTC,
        )
    b'"1970-01-01T00:00:00+00:00"'
OPT_NON_STR_KEYS
Serialize dict keys of type other than str. This allows dict keys to be one of str, int, float, bool, None, datetime.datetime, datetime.date, datetime.time, enum.Enum, and uuid.UUID. For comparison, the standard library serializes str, int, float, bool or None by default. orjson benchmarks as being faster at serializing non-str keys than other libraries. This option is slower for str keys than the default.
    >>> import orjson, datetime, uuid
    >>> orjson.dumps(
            {uuid.UUID("7202d115-7ff3-4c81-a7c1-2a1f067b1ece"): [1, 2, 3]},
            option=orjson.OPT_NON_STR_KEYS,
        )
    b'{"7202d115-7ff3-4c81-a7c1-2a1f067b1ece":[1,2,3]}'
    >>> orjson.dumps(
            {datetime.datetime(1970, 1, 1, 0, 0, 0): [1, 2, 3]},
            option=orjson.OPT_NON_STR_KEYS | orjson.OPT_NAIVE_UTC,
        )
    b'{"1970-01-01T00:00:00+00:00":[1,2,3]}'
These types are generally serialized how they would be as values, e.g., datetime.datetime is still an RFC 3339 string and respects options affecting it. The exception is that int serialization does not respect OPT_STRICT_INTEGER.
This option has the risk of creating duplicate keys. This is because non-str objects may serialize to the same str as an existing key, e.g., {"1": true, 1: false}. The last key to be inserted to the dict will be serialized last and a JSON deserializer will presumably take the last occurrence of a key (in the above, false). The first value will be lost.
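The "last occurrence wins" behavior can be observed on the consuming side with the standard library's json module, used here only because it offers object_pairs_hook. The reject_duplicates function is a hypothetical helper for a consumer that would rather fail than silently drop a value.

```python
import json

document = '{"1": true, "1": false}'

# The standard library keeps the last occurrence of a repeated key.
last_wins = json.loads(document)


def reject_duplicates(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in object: {keys}")
    return dict(pairs)


try:
    json.loads(document, object_pairs_hook=reject_duplicates)
    duplicates_detected = False
except ValueError:
    duplicates_detected = True
```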
This option is compatible with orjson.OPT_SORT_KEYS. If sorting is used, note the sort is unstable and will be unpredictable for duplicate keys.
    >>> import orjson, datetime
    >>> orjson.dumps(
        {"other": 1, datetime.date(1970, 1, 5): 2, datetime.date(1970, 1, 3): 3},
        option=orjson.OPT_NON_STR_KEYS | orjson.OPT_SORT_KEYS
    )
    b'{"1970-01-03":3,"1970-01-05":2,"other":1}'
This measures serializing 589KiB of JSON comprising a list of 100 dict in which each dict has both 365 randomly-sorted int keys representing epoch timestamps as well as one str key and the value for each key is a single integer. In "str keys", the keys were converted to str before serialization, and orjson still specifies option=orjson.OPT_NON_STR_KEYS (which is always somewhat slower).
Library | str keys (ms) | int keys (ms) | int keys sorted (ms) |
---|---|---|---|
orjson | 1.53 | 2.16 | 4.29 |
ujson | 3.07 | 5.65 | |
rapidjson | 4.29 | | |
simplejson | 11.24 | 14.50 | 21.86 |
json | 7.17 | 8.49 | |
ujson is blank for sorting because it segfaults. json is blank because it raises TypeError on attempting to sort before converting all keys to str. rapidjson is blank because it does not support non-str keys. This can be reproduced using the pynonstr script.
OPT_OMIT_MICROSECONDS
Do not serialize the microsecond field on datetime.datetime and datetime.time instances.
    >>> import orjson, datetime
    >>> orjson.dumps(
            datetime.datetime(1970, 1, 1, 0, 0, 0, 1),
        )
    b'"1970-01-01T00:00:00.000001"'
    >>> orjson.dumps(
            datetime.datetime(1970, 1, 1, 0, 0, 0, 1),
            option=orjson.OPT_OMIT_MICROSECONDS,
        )
    b'"1970-01-01T00:00:00"'
OPT_PASSTHROUGH_DATACLASS
Passthrough dataclasses.dataclass instances to default. This allows customizing their output but is much slower.
    >>> import orjson, dataclasses
    >>>
    @dataclasses.dataclass
    class User:
        id: str
        name: str
        password: str
    def default(obj):
        if isinstance(obj, User):
            return {"id": obj.id, "name": obj.name}
        raise TypeError
    >>> orjson.dumps(User("3b1", "asd", "zxc"))
    b'{"id":"3b1","name":"asd","password":"zxc"}'
    >>> orjson.dumps(User("3b1", "asd", "zxc"), option=orjson.OPT_PASSTHROUGH_DATACLASS)
    TypeError: Type is not JSON serializable: User
    >>> orjson.dumps(
            User("3b1", "asd", "zxc"),
            option=orjson.OPT_PASSTHROUGH_DATACLASS,
            default=default,
        )
    b'{"id":"3b1","name":"asd"}'
OPT_PASSTHROUGH_DATETIME
Passthrough datetime.datetime, datetime.date, and datetime.time instances to default. This allows serializing datetimes to a custom format, e.g., HTTP dates:
    >>> import orjson, datetime
    >>>
    def default(obj):
        if isinstance(obj, datetime.datetime):
            return obj.strftime("%a, %d %b %Y %H:%M:%S GMT")
        raise TypeError
    >>> orjson.dumps({"created_at": datetime.datetime(1970, 1, 1)})
    b'{"created_at":"1970-01-01T00:00:00"}'
    >>> orjson.dumps({"created_at": datetime.datetime(1970, 1, 1)}, option=orjson.OPT_PASSTHROUGH_DATETIME)
    TypeError: Type is not JSON serializable: datetime.datetime
    >>> orjson.dumps(
            {"created_at": datetime.datetime(1970, 1, 1)},
            option=orjson.OPT_PASSTHROUGH_DATETIME,
            default=default,
        )
    b'{"created_at":"Thu, 01 Jan 1970 00:00:00 GMT"}'
This does not affect datetimes in dict keys if using OPT_NON_STR_KEYS.
OPT_PASSTHROUGH_SUBCLASS
Passthrough subclasses of builtin types to default.
    >>> import orjson
    >>>
    class Secret(str):
        pass
    def default(obj):
        if isinstance(obj, Secret):
            return "******"
        raise TypeError
    >>> orjson.dumps(Secret("zxc"))
    b'"zxc"'
    >>> orjson.dumps(Secret("zxc"), option=orjson.OPT_PASSTHROUGH_SUBCLASS)
    TypeError: Type is not JSON serializable: Secret
    >>> orjson.dumps(Secret("zxc"), option=orjson.OPT_PASSTHROUGH_SUBCLASS, default=default)
    b'"******"'
This does not affect serializing subclasses as dict keys if using OPT_NON_STR_KEYS.
OPT_SERIALIZE_DATACLASS
This is deprecated and has no effect in version 3. In version 2 this was required to serialize dataclasses.dataclass instances. For more, see dataclass.
OPT_SERIALIZE_NUMPY
Serialize numpy.ndarray instances. For more, see numpy.
OPT_SERIALIZE_UUID
This is deprecated and has no effect in version 3. In version 2 this was required to serialize uuid.UUID instances. For more, see UUID.
OPT_SORT_KEYS
Serialize dict keys in sorted order. The default is to serialize in an unspecified order. This is equivalent to sort_keys=True in the standard library.
This can be used to ensure the order is deterministic for hashing or tests. It has a substantial performance penalty and is not recommended in general.
    >>> import orjson
    >>> orjson.dumps({"b": 1, "c": 2, "a": 3})
    b'{"b":1,"c":2,"a":3}'
    >>> orjson.dumps({"b": 1, "c": 2, "a": 3}, option=orjson.OPT_SORT_KEYS)
    b'{"a":3,"b":1,"c":2}'
This measures serializing the twitter.json fixture unsorted and sorted:
Library | unsorted (ms) | sorted (ms) | vs. orjson |
---|---|---|---|
orjson | 0.32 | 0.54 | 1 |
ujson | 1.6 | 2.07 | 3.8 |
rapidjson | 1.12 | 1.65 | 3.1 |
simplejson | 2.25 | 3.13 | 5.8 |
json | 1.78 | 2.32 | 4.3 |
The benchmark can be reproduced using the pysort script.
The sorting is not collation/locale-aware:
    >>> import orjson
    >>> orjson.dumps({"a": 1, "ä": 2, "A": 3}, option=orjson.OPT_SORT_KEYS)
    b'{"A":3,"a":1,"\xc3\xa4":2}'
This is the same sorting behavior as the standard library, rapidjson, simplejson, and ujson.
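The same ordering can be seen with plain Python string comparison, which works on code points rather than on any locale's collation rules. A short illustration using only the standard library, with the same keys as the example above:

```python
import json

keys = ["a", "ä", "A"]

# sorted() compares Unicode code points: "A" (65) < "a" (97) < "ä" (228).
by_code_point = sorted(keys)

# The standard library's sort_keys=True yields the same key order.
encoded = json.dumps({"a": 1, "ä": 2, "A": 3}, sort_keys=True, ensure_ascii=False)
```

If a locale-aware order is needed for display, it has to be applied by the consumer of the JSON, since none of the compared serializers collate.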
dataclass instances also serialize as maps but this option has no effect on them.
OPT_STRICT_INTEGER
Enforce 53-bit limit on integers. The limit is otherwise 64 bits, the same as the Python standard library. For more, see int.
OPT_UTC_Z
Serialize a UTC timezone on datetime.datetime instances as Z instead of +00:00.
    >>> import orjson, datetime, zoneinfo
    >>> orjson.dumps(
            datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=zoneinfo.ZoneInfo("UTC")),
        )
    b'"1970-01-01T00:00:00+00:00"'
    >>> orjson.dumps(
            datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=zoneinfo.ZoneInfo("UTC")),
            option=orjson.OPT_UTC_Z
        )
    b'"1970-01-01T00:00:00Z"'
Fragment
orjson.Fragment includes already-serialized JSON in a document. This is an efficient way to include JSON blobs from a cache, JSONB field, or separately serialized object without first deserializing to Python objects via loads().
    >>> import orjson
    >>> orjson.dumps({"key": "zxc", "data": orjson.Fragment(b'{"a": "b", "c": 1}')})
    b'{"key":"zxc","data":{"a": "b", "c": 1}}'
It does no reformatting: orjson.OPT_INDENT_2 will not affect a compact blob nor will a pretty-printed JSON blob be rewritten as compact.
The input must be bytes or str and given as a positional argument.
This raises orjson.JSONEncodeError if a str is given and the input is not valid UTF-8. It otherwise does no validation and it is possible to write invalid JSON. This does not escape characters. The implementation is tested to not crash if given invalid strings or invalid JSON.
This is similar to RawJSON in rapidjson.
Deserialize
def loads(__obj: Union[bytes, bytearray, memoryview, str]) -> Any: ...
loads() deserializes JSON to Python objects. It deserializes to dict, list, int, float, str, bool, and None objects.
bytes, bytearray, memoryview, and str input are accepted. If the input exists as a memoryview, bytearray, or bytes object, it is recommended to pass these directly rather than creating an unnecessary str object. That is, orjson.loads(b"{}") instead of orjson.loads(b"{}".decode("utf-8")). This has lower memory usage and lower latency.
The input must be valid UTF-8.
orjson maintains a cache of map keys for the duration of the process. This causes a net reduction in memory usage by avoiding duplicate strings. The keys must be at most 64 bytes to be cached and 2048 entries are stored.
The global interpreter lock (GIL) is held for the duration of the call.
It raises JSONDecodeError if given an invalid type or invalid JSON. This includes if the input contains NaN, Infinity, or -Infinity, which the standard library allows, but is not valid JSON.
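For comparison, this is what the standard library's leniency looks like, and how a caller can opt into strictness there through parse_constant. The snippet uses only the json module; reject_constant is a hypothetical helper name.

```python
import json
import math

# The standard library accepts the non-standard constants by default.
value = json.loads("[NaN, Infinity]")
first_is_nan = math.isnan(value[0])


def reject_constant(name):
    """parse_constant hook: called with 'NaN', 'Infinity', or '-Infinity'."""
    raise ValueError(f"{name} is not valid JSON")


try:
    json.loads("[NaN]", parse_constant=reject_constant)
    rejected = False
except ValueError:
    rejected = True
```

Code moving from json.loads to orjson.loads should expect documents that previously parsed this way to start raising.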
It raises JSONDecodeError if a combination of array or object recurses 1024 levels deep.
JSONDecodeError is a subclass of json.JSONDecodeError and ValueError. This is for compatibility with the standard library.
Types
dataclass
orjson serializes instances of dataclasses.dataclass natively. It serializes instances 40-50x as fast as other libraries and avoids a severe slowdown seen in other libraries compared to serializing dict.
It is supported to pass all variants of dataclasses, including dataclasses using __slots__, frozen dataclasses, those with optional or default attributes, and subclasses. There is a performance benefit to not using __slots__.
Library | dict (ms) | dataclass (ms) | vs. orjson |
---|---|---|---|
orjson | 1.40 | 1.60 | 1 |
ujson | | | |
rapidjson | 3.64 | 68.48 | 42 |
simplejson | 14.21 | 92.18 | 57 |
json | 13.28 | 94.90 | 59 |
This measures serializing 555KiB of JSON, orjson natively and other libraries using default to serialize the output of dataclasses.asdict(). This can be reproduced using the pydataclass script.
Dataclasses are serialized as maps, with every attribute serialized and in the order given on class definition:
    >>> import dataclasses, orjson, typing
    @dataclasses.dataclass
    class Member:
        id: int
        active: bool = dataclasses.field(default=False)
    @dataclasses.dataclass
    class Object:
        id: int
        name: str
        members: typing.List[Member]
    >>> orjson.dumps(Object(1, "a", [Member(1, True), Member(2)]))
    b'{"id":1,"name":"a","members":[{"id":1,"active":true},{"id":2,"active":false}]}'
datetime
orjson serializes datetime.datetime objects to RFC 3339 format, e.g., "1970-01-01T00:00:00+00:00". This is a subset of ISO 8601 and is compatible with isoformat() in the standard library.
    >>> import orjson, datetime, zoneinfo
    >>> orjson.dumps(
        datetime.datetime(2018, 12, 1, 2, 3, 4, 9, tzinfo=zoneinfo.ZoneInfo("Australia/Adelaide"))
    )
    b'"2018-12-01T02:03:04.000009+10:30"'
    >>> orjson.dumps(
        datetime.datetime(2100, 9, 1, 21, 55, 2).replace(tzinfo=zoneinfo.ZoneInfo("UTC"))
    )
    b'"2100-09-01T21:55:02+00:00"'
    >>> orjson.dumps(
        datetime.datetime(2100, 9, 1, 21, 55, 2)
    )
    b'"2100-09-01T21:55:02"'
datetime.datetime supports instances with a tzinfo that is None, datetime.timezone.utc, a timezone instance from the python3.9+ zoneinfo module, or a timezone instance from the third-party pendulum, pytz, or dateutil/arrow libraries.
It is fastest to use the standard library's zoneinfo.ZoneInfo for timezones.
datetime.time objects must not have a tzinfo.
    >>> import orjson, datetime
    >>> orjson.dumps(datetime.time(12, 0, 15, 290))
    b'"12:00:15.000290"'
datetime.date objects will always serialize.
    >>> import orjson, datetime
    >>> orjson.dumps(datetime.date(1900, 1, 2))
    b'"1900-01-02"'
Errors with tzinfo result in JSONEncodeError being raised.
To disable serialization of datetime objects specify the option orjson.OPT_PASSTHROUGH_DATETIME.
To use "Z" suffix instead of "+00:00" to indicate UTC ("Zulu") time, use the option orjson.OPT_UTC_Z.
To assume datetimes without timezone are UTC, use the option orjson.OPT_NAIVE_UTC.
enum
orjson serializes enums natively. Options apply to their values.
    >>> import enum, datetime, orjson
    >>>
    class DatetimeEnum(enum.Enum):
        EPOCH = datetime.datetime(1970, 1, 1, 0, 0, 0)
    >>> orjson.dumps(DatetimeEnum.EPOCH)
    b'"1970-01-01T00:00:00"'
    >>> orjson.dumps(DatetimeEnum.EPOCH, option=orjson.OPT_NAIVE_UTC)
    b'"1970-01-01T00:00:00+00:00"'
Enums with members that are not supported types can be serialized using default:
    >>> import enum, orjson
    >>>
    class Custom:
        def __init__(self, val):
            self.val = val
    def default(obj):
        if isinstance(obj, Custom):
            return obj.val
        raise TypeError
    class CustomEnum(enum.Enum):
        ONE = Custom(1)
    >>> orjson.dumps(CustomEnum.ONE, default=default)
    b'1'
float
orjson serializes and deserializes double precision floats with no loss of precision and consistent rounding.
orjson.dumps() serializes NaN, Infinity, and -Infinity, which are not compliant JSON, as null:
    >>> import orjson, ujson, rapidjson, json
    >>> orjson.dumps([float("NaN"), float("Infinity"), float("-Infinity")])
    b'[null,null,null]'
    >>> ujson.dumps([float("NaN"), float("Infinity"), float("-Infinity")])
    OverflowError: Invalid Inf value when encoding double
    >>> rapidjson.dumps([float("NaN"), float("Infinity"), float("-Infinity")])
    '[NaN,Infinity,-Infinity]'
    >>> json.dumps([float("NaN"), float("Infinity"), float("-Infinity")])
    '[NaN, Infinity, -Infinity]'
int
orjson serializes and deserializes 64-bit integers by default. The range supported is a signed 64-bit integer's minimum (-9223372036854775807) to an unsigned 64-bit integer's maximum (18446744073709551615). This is widely compatible, but there are implementations that only support 53 bits for integers, e.g., web browsers. For those implementations, dumps() can be configured to raise a JSONEncodeError on values exceeding the 53-bit range.
    >>> import orjson
    >>> orjson.dumps(9007199254740992)
    b'9007199254740992'
    >>> orjson.dumps(9007199254740992, option=orjson.OPT_STRICT_INTEGER)
    JSONEncodeError: Integer exceeds 53-bit range
    >>> orjson.dumps(-9007199254740992, option=orjson.OPT_STRICT_INTEGER)
    JSONEncodeError: Integer exceeds 53-bit range
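The 53-bit limit comes from consumers that store every number as an IEEE 754 double, which cannot represent every integer at or beyond 2**53. A small standard-library illustration; fits_in_53_bits is a hypothetical helper whose bounds are inferred from the example above, where both 9007199254740992 and its negation are rejected:

```python
MAX_SAFE_INTEGER = 2**53 - 1  # 9007199254740991


def fits_in_53_bits(value: int) -> bool:
    """Return True if value survives a round trip through a double unchanged."""
    return -MAX_SAFE_INTEGER <= value <= MAX_SAFE_INTEGER


# Past the limit, distinct integers collapse to the same double.
collides = float(2**53) == float(2**53 + 1)
```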
numpy
orjson natively serializes numpy.ndarray and individual numpy.float64, numpy.float32, numpy.float16 (numpy.half), numpy.int64, numpy.int32, numpy.int16, numpy.int8, numpy.uint64, numpy.uint32, numpy.uint16, numpy.uint8, numpy.uintp, numpy.intp, numpy.datetime64, and numpy.bool instances.
orjson is compatible with both numpy v1 and v2.
orjson is faster than all compared libraries at serializing numpy instances. Serializing numpy data requires specifying option=orjson.OPT_SERIALIZE_NUMPY.
    >>> import orjson, numpy
    >>> orjson.dumps(
        numpy.array([[1, 2, 3], [4, 5, 6]]),
        option=orjson.OPT_SERIALIZE_NUMPY,
    )
    b'[[1,2,3],[4,5,6]]'
The array must be a contiguous C array (C_CONTIGUOUS) and one of the supported datatypes.
Note a difference between serializing numpy.float32 using ndarray.tolist() or orjson.dumps(..., option=orjson.OPT_SERIALIZE_NUMPY): tolist() converts to a double before serializing and orjson's native path does not. This can result in different rounding.
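The widening step is what changes the digits. The sketch below uses the standard library's struct module as a stand-in for numpy.float32 (an assumption made so the example needs no numpy): the float32 nearest to 0.1, once widened to a double, needs many more digits to round-trip than 0.1 does.

```python
import struct


def as_float32(value: float) -> float:
    """Round a Python float to float32 precision, then widen it back to a double."""
    return struct.unpack("f", struct.pack("f", value))[0]


widened = as_float32(0.1)

# Shortest repr of the widened double; this is the kind of value a
# tolist()-style conversion would hand to a serializer.
widened_repr = repr(widened)
```

A serializer that works on the float32 directly can pick the shortest digits that identify the float32 value, which is why the two paths may print different text for the same array element.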
numpy.datetime64 instances are serialized as RFC 3339 strings and datetime options affect them.
    >>> import orjson, numpy
    >>> orjson.dumps(
        numpy.datetime64("2021-01-01T00:00:00.172"),
        option=orjson.OPT_SERIALIZE_NUMPY,
    )
    b'"2021-01-01T00:00:00.172000"'
    >>> orjson.dumps(
        numpy.datetime64("2021-01-01T00:00:00.172"),
        option=(
            orjson.OPT_SERIALIZE_NUMPY |
            orjson.OPT_NAIVE_UTC |
            orjson.OPT_OMIT_MICROSECONDS
        ),
    )
    b'"2021-01-01T00:00:00+00:00"'
If an array is not a contiguous C array, contains an unsupported datatype, or contains a numpy.datetime64 using an unsupported representation (e.g., picoseconds), orjson falls through to default. In default, obj.tolist() can be specified.
If an array is not in the native endianness, e.g., an array of big-endian values on a little-endian system, orjson.JSONEncodeError is raised.
If an array is malformed, orjson.JSONEncodeError is raised.
This measures serializing 92MiB of JSON from a numpy.ndarray with dimensions of (50000, 100) and numpy.float64 values:
Library | Latency (ms) | RSS diff (MiB) | vs. orjson |
---|---|---|---|
orjson | 194 | 99 | 1.0 |
ujson | | | |
rapidjson | 3,048 | 309 | 15.7 |
simplejson | 3,023 | 297 | 15.6 |
json | 3,133 | 297 | 16.1 |
This measures serializing 100MiB of JSON from a numpy.ndarray with dimensions of (100000, 100) and numpy.int32 values:
Library | Latency (ms) | RSS diff (MiB) | vs. orjson |
---|---|---|---|
orjson | 178 | 115 | 1.0 |
ujson | | | |
rapidjson | 1,512 | 551 | 8.5 |
simplejson | 1,606 | 504 | 9.0 |
json | 1,506 | 503 | 8.4 |
This measures serializing 105MiB of JSON from a numpy.ndarray with dimensions of (100000, 200) and numpy.bool values:
Library | Latency (ms) | RSS diff (MiB) | vs. orjson |
---|---|---|---|
orjson | 157 | 120 | 1.0 |
ujson | | | |
rapidjson | 710 | 327 | 4.5 |
simplejson | 931 | 398 | 5.9 |
json | 996 | 400 | 6.3 |
In these benchmarks, orjson serializes natively, ujson is blank because it does not support a default parameter, and the other libraries serialize ndarray.tolist() via default. The RSS column measures peak memory usage during serialization. This can be reproduced using the pynumpy script.
orjson does not have an installation or compilation dependency on numpy. The implementation is independent, reading numpy.ndarray using PyArrayInterface.
str
orjson is strict about UTF-8 conformance. This is stricter than the standard library's json module, which will serialize and deserialize UTF-16 surrogates, e.g., "\ud800", that are invalid UTF-8.
If orjson.dumps() is given a str that does not contain valid UTF-8, orjson.JSONEncodeError is raised. If loads() receives invalid UTF-8, orjson.JSONDecodeError is raised.
orjson and rapidjson are the only compared JSON libraries to consistently error on bad input.
    >>> import orjson, ujson, rapidjson, json
    >>> orjson.dumps('\ud800')
    JSONEncodeError: str is not valid UTF-8: surrogates not allowed
    >>> ujson.dumps('\ud800')
    UnicodeEncodeError: 'utf-8' codec ...
    >>> rapidjson.dumps('\ud800')
    UnicodeEncodeError: 'utf-8' codec ...
    >>> json.dumps('\ud800')
    '"\\ud800"'
    >>> orjson.loads('"\\ud800"')
    JSONDecodeError: unexpected end of hex escape at line 1 column 8: line 1 column 1 (char 0)
    >>> ujson.loads('"\\ud800"')
    ''
    >>> rapidjson.loads('"\\ud800"')
    ValueError: Parse error at offset 1: The surrogate pair in string is invalid.
    >>> json.loads('"\\ud800"')
    '\ud800'
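A str that will be rejected for this reason can be detected ahead of time with Python's own strict UTF-8 encoder, which refuses lone surrogates. This is a sketch using only built-ins; is_utf8_encodable is a hypothetical helper and not part of orjson.

```python
def is_utf8_encodable(text: str) -> bool:
    """Return True if text can be encoded as strict UTF-8."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


lone_surrogate_ok = is_utf8_encodable("\ud800")
cjk_ok = is_utf8_encodable("好")
```

A check like this can be useful at the boundary where untrusted strings enter an application, so that bad data is reported where it originates rather than at serialization time.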
To make a best effort at deserializing bad input, first decode bytes using the replace or ignore argument for errors:
    >>> import orjson
    >>> orjson.loads(b'"\xed\xa0\x80"')
    JSONDecodeError: str is not valid UTF-8: surrogates not allowed
    >>> orjson.loads(b'"\xed\xa0\x80"'.decode("utf-8", "replace"))
    '���'
uuid
orjson serializes uuid.UUID instances to RFC 4122 format, e.g., "f81d4fae-7dec-11d0-a765-00a0c91e6bf6".
    >>> import orjson, uuid
    >>> orjson.dumps(uuid.UUID('f81d4fae-7dec-11d0-a765-00a0c91e6bf6'))
    b'"f81d4fae-7dec-11d0-a765-00a0c91e6bf6"'
    >>> orjson.dumps(uuid.uuid5(uuid.NAMESPACE_DNS, "python.org"))
    b'"886313e1-3b8a-5372-9b90-0c9aee199e5d"'
Testing
The library has comprehensive tests. There are tests against fixtures in the JSONTestSuite and nativejson-benchmark repositories. It is tested to not crash against the Big List of Naughty Strings. It is tested to not leak memory. It is tested to not crash against and not accept invalid UTF-8. There are integration tests exercising the library's use in web servers (gunicorn using multiprocess/forked workers) and when multithreaded. It also uses some tests from the ultrajson library.
orjson is the most correct of the compared libraries. This table shows how each library handles a combined 342 JSON fixtures from the JSONTestSuite and nativejson-benchmark tests:
Library | Invalid JSON documents not rejected | Valid JSON documents not deserialized |
---|---|---|
orjson | 0 | 0 |
ujson | 31 | 0 |
rapidjson | 6 | 0 |
simplejson | 10 | 0 |
json | 17 | 0 |
This shows that all libraries deserialize valid JSON but only orjson correctly rejects the given invalid JSON fixtures. Errors are largely due to accepting invalid strings and numbers.
The table above can be reproduced using the pycorrectness script.
Performance
Serialization and deserialization performance of orjson is better than ultrajson, rapidjson, simplejson, or json. The benchmarks are done on fixtures of real data:
- twitter.json, 631.5KiB, results of a search on Twitter for "一", containing CJK strings, dictionaries of strings and arrays of dictionaries, indented.
- github.json, 55.8KiB, a GitHub activity feed, containing dictionaries of strings and arrays of dictionaries, not indented.
- citm_catalog.json, 1.7MiB, concert data, containing nested dictionaries of strings and arrays of integers, indented.
- canada.json, 2.2MiB, coordinates of the Canadian border in GeoJSON format, containing floats and arrays, indented.
Latency
twitter.json serialization
Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
---|---|---|---|
orjson | 0.1 | 8377 | 1 |
ujson | 0.9 | 1088 | 7.3 |
rapidjson | 0.8 | 1228 | 6.8 |
simplejson | 1.9 | 531 | 15.6 |
json | 1.4 | 744 | 11.3 |
twitter.json deserialization
Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
---|---|---|---|
orjson | 0.6 | 1811 | 1 |
ujson | 1.2 | 814 | 2.1 |
rapidjson | 2.1 | 476 | 3.8 |
simplejson | 1.6 | 626 | 3 |
json | 1.8 | 557 | 3.3 |
github.json serialization
Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
---|---|---|---|
orjson | 0.01 | 104424 | 1 |
ujson | 0.09 | 10594 | 9.8 |
rapidjson | 0.07 | 13667 | 7.6 |
simplejson | 0.2 | 5051 | 20.6 |
json | 0.14 | 7133 | 14.6 |
github.json deserialization
Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
---|---|---|---|
orjson | 0.05 | 20069 | 1 |
ujson | 0.11 | 8913 | 2.3 |
rapidjson | 0.13 | 8077 | 2.6 |
simplejson | 0.11 | 9342 | 2.1 |
json | 0.11 | 9291 | 2.2 |
citm_catalog.json serialization
Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
---|---|---|---|
orjson | 0.3 | 3757 | 1 |
ujson | 1.7 | 598 | 6.3 |
rapidjson | 1.3 | 768 | 4.9 |
simplejson | 8.3 | 120 | 31.1 |
json | 3 | 331 | 11.3 |
citm_catalog.json deserialization
Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
---|---|---|---|
orjson | 1.4 | 730 | 1 |
ujson | 2.6 | 384 | 1.9 |
rapidjson | 4 | 246 | 3 |
simplejson | 3.7 | 271 | 2.7 |
json | 3.7 | 267 | 2.7 |
canada.json serialization
Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
---|---|---|---|
orjson | 2.4 | 410 | 1 |
ujson | 9.6 | 104 | 3.9 |
rapidjson | 28.7 | 34 | 11.8 |
simplejson | 49.3 | 20 | 20.3 |
json | 30.6 | 32 | 12.6 |
canada.json deserialization
Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
---|---|---|---|
orjson | 3 | 336 | 1 |
ujson | 7.1 | 141 | 2.4 |
rapidjson | 20.1 | 49 | 6.7 |
simplejson | 16.8 | 59 | 5.6 |
json | 18.2 | 55 | 6.1 |
Memory
orjson as of 3.7.0 has higher baseline memory usage than other libraries due to a persistent buffer used for parsing. Incremental memory usage when deserializing is similar to the standard library and other third-party libraries.
This measures, in the first column, RSS after importing a library and reading the fixture, and in the second column, increases in RSS after repeatedly calling loads() on the fixture.
twitter.json
Library | import, read() RSS (MiB) | loads() increase in RSS (MiB) |
---|---|---|
orjson | 15.7 | 3.4 |
ujson | 16.4 | 3.4 |
rapidjson | 16.6 | 4.4 |
simplejson | 14.5 | 1.8 |
json | 13.9 | 1.8 |
github.json
Library | import, read() RSS (MiB) | loads() increase in RSS (MiB) |
---|---|---|
orjson | 15.2 | 0.4 |
ujson | 15.4 | 0.4 |
rapidjson | 15.7 | 0.5 |
simplejson | 13.7 | 0.2 |
json | 13.3 | 0.1 |
citm_catalog.json
Library | import, read() RSS (MiB) | loads() increase in RSS (MiB) |
---|---|---|
orjson | 16.8 | 10.1 |
ujson | 17.3 | 10.2 |
rapidjson | 17.6 | 28.7 |
simplejson | 15.8 | 30.1 |
json | 14.8 | 20.5 |
canada.json
Library | import, read() RSS (MiB) | loads() increase in RSS (MiB) |
---|---|---|
orjson | 17.2 | 22.1 |
ujson | 17.4 | 18.3 |
rapidjson | 18 | 23.5 |
simplejson | 15.7 | 21.4 |
json | 15.4 | 20.4 |
Reproducing
The above was measured using Python 3.11.9 on Linux (amd64) with orjson 3.10.6, ujson 5.10.0, python-rapidjson 1.18, and simplejson 3.19.2.
The latency results can be reproduced using the pybench and graph scripts. The memory results can be reproduced using the pymem script.
Questions
Why can't I install it from PyPI?
Probably pip needs to be upgraded to version 20.3 or later to support the latest manylinux_x_y or universal2 wheel formats.
"Cargo, the Rust package manager, is not installed or is not on PATH."
This happens when there are no binary wheels (like manylinux) for your platform on PyPI. You can install Rust through rustup or a package manager and then it will compile.
Will it deserialize to dataclasses, UUIDs, decimals, etc or support object_hook?
No. This requires a schema specifying what types are expected and how to handle errors etc. This is addressed by data validation libraries a level above this.
Will it serialize to str?
No. bytes is the correct type for a serialized blob.
Packaging
To package orjson requires at least Rust 1.72 and the maturin build tool. The recommended build command is:
maturin build --release --strip
It benefits from also having a C build environment to compile a faster deserialization backend. See this project's manylinux_2_28 builds for an example using clang and LTO.
The project's own CI tests against nightly-2024-08-05 and stable 1.72. It is prudent to pin the nightly version because that channel can introduce breaking changes.
orjson is tested for amd64, aarch64, arm7, ppc64le, and s390x on Linux. It is tested for either aarch64 or amd64 on macOS and cross-compiles for the other, depending on version. For Windows it is tested on amd64 and i686.
There are no runtime dependencies other than libc.
The source distribution on PyPI contains all dependencies' source and can be built without network access. The file can be downloaded from https://files.pythonhosted.org/packages/source/o/orjson/orjson-${version}.tar.gz.
orjson's tests are included in the source distribution on PyPI. The requirements to run the tests are specified in test/requirements.txt. The tests should be run as part of the build. It can be run with pytest -q test.
License
orjson was written by ijl <ijl@mailbox.org>, copyright 2018 - 2024, available to you under either the Apache 2 license or MIT license at your choice.