Load Part Of A JSON In Python

January 31, 2023

I have a JSON file with about 1000 data entries, for example {'1':'Action','2':'Adventure',.....'1000':'Mystery'} (the above is just an example). I am using the json.load feature, but I only want to load part of the data, such as the first 10 entries.

Solution 1:

JSON objects are unordered by specification, and before Python 3.7 Python dictionaries did not preserve order either. You also cannot control how much of an object is loaded, at least not with the standard library json module.

After loading, you could take the ten key-value pairs with the lowest key value:

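import heapq
import json

data = json.loads(json_string)  # json_string holds the raw JSON text
# heapq.nsmallest(n, iterable, key): iterating a dict yields its keys
limited = {k: data[k] for k in heapq.nsmallest(10, data, key=int)}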

heapq.nsmallest() efficiently picks out the 10 smallest keys regardless of the size of data, because it only keeps 10 candidates in its heap while scanning.
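A small self-contained check of why the key=int part matters, using made-up data: without it, keys are compared as strings, so "10" sorts before "2".

```python
import heapq
import json

# Hypothetical sample: 12 entries with string keys "12" down to "1"
json_string = json.dumps({str(i): f"genre{i}" for i in range(12, 0, -1)})
data = json.loads(json_string)

# Numeric comparison: "10" ranks after "9"
by_int = heapq.nsmallest(3, data, key=int)
# Plain string comparison: "10", "11", "12" rank before "2"
by_str = heapq.nsmallest(3, data)

print(by_int)  # ['1', '2', '3']
print(by_str)  # ['1', '10', '11']

# nsmallest returns its results in ascending order
limited = {k: data[k] for k in heapq.nsmallest(10, data, key=int)}
print(list(limited))
```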

Of course, if the keys are always consecutive and always start at 1, you may as well use a range() here:

data = json.loads(json_string)
limited = {str(k): data[str(k)] for k in range(1, 11)}
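The range() version assumes every key from "1" to "10" exists; if one is missing, data[str(k)] raises KeyError. A variant that simply skips missing keys, shown on hypothetical sample data:

```python
import json

# Hypothetical sample where key "3" is missing
json_string = '{"1": "Action", "2": "Adventure", "4": "Comedy", "11": "Mystery"}'
data = json.loads(json_string)

# Keep only the keys 1..10 that are actually present
limited = {str(k): data[str(k)] for k in range(1, 11) if str(k) in data}
print(limited)  # {'1': 'Action', '2': 'Adventure', '4': 'Comedy'}
```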

If you want the first ten pairs in the order they appear in the file, you could use the object_pairs_hook argument to json.load() and json.loads():

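import json

class FirstTenDict(dict):
    # object_pairs_hook receives a list of (key, value) pairs in document order
    def __init__(self, pairs):
        super().__init__(pairs[:10])  # keep only the first ten pairs

data = json.loads(json_string, object_pairs_hook=FirstTenDict)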

Demo of the latter approach:

>>> import json
>>> class FirstTenDict(dict):
...     def __init__(self, pairs):
...         super(FirstTenDict, self).__init__(pairs[:10])
... 
>>> json_data = '''\
... {"foo42": "bar", "foo31": "baz", "foo10": "spam", "foo44": "ham", "foo1": "eggs",
...  "foo24": "vikings", "foo21": "monty", "foo88": "python", "foo11": "eric", "foo65": "idle",
...  "foo13": "will", "foo31": "be", "foo76": "ignored"}
... '''
>>> json.loads(json_data)
{'foo42': 'bar', 'foo31': 'be', 'foo10': 'spam', 'foo44': 'ham', 'foo1': 'eggs', 'foo24': 'vikings', 'foo21': 'monty', 'foo88': 'python', 'foo11': 'eric', 'foo65': 'idle', 'foo13': 'will', 'foo76': 'ignored'}
>>> json.loads(json_data, object_pairs_hook=FirstTenDict)
{'foo42': 'bar', 'foo31': 'baz', 'foo10': 'spam', 'foo44': 'ham', 'foo1': 'eggs', 'foo24': 'vikings', 'foo21': 'monty', 'foo88': 'python', 'foo11': 'eric', 'foo65': 'idle'}
>>> import pprint
>>> pprint.pprint(_)
{'foo1': 'eggs',
 'foo10': 'spam',
 'foo11': 'eric',
 'foo21': 'monty',
 'foo24': 'vikings',
 'foo31': 'baz',
 'foo42': 'bar',
 'foo44': 'ham',
 'foo65': 'idle',
 'foo88': 'python'}
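One caveat worth checking: object_pairs_hook is called for every JSON object in the document, not only the top-level one, so FirstTenDict would also truncate any nested object with more than ten members. A small sketch with a made-up nested document:

```python
import json

class FirstTenDict(dict):
    def __init__(self, pairs):
        super().__init__(pairs[:10])

# Hypothetical document: one top-level entry holding a nested object with 12 members
nested = json.dumps({"outer": {f"k{i}": i for i in range(12)}})

# The hook also runs for the inner object, so it is cut down to 10 members
hooked = json.loads(nested, object_pairs_hook=FirstTenDict)
print(len(hooked["outer"]))  # 10
```

If you only want to trim the top level, loading normally and slicing afterwards avoids this.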

Solution 2:

You can parse JSON iteratively (that is, not all at once) using the third-party ijson library. Assuming your input really is as simple as your example:

import heapq
import itertools

import ijson  # third-party: pip install ijson

def iter_items(parser):
    # For a flat object, a string value's prefix is its key
    for prefix, event, value in parser:
        if event == 'string':
            yield prefix, value

with open('filename.json', 'rb') as infile:
    items = iter_items(ijson.parse(infile))
    # choose one of the following
    # first 10 items from the file regardless of keys
    print(dict(itertools.islice(items, 10)))
    # least 10 keys when considered as integers
    print(dict(heapq.nsmallest(10, items, key=lambda p: int(p[0]))))

The first of these stops parsing once it has 10 items, but the second still has to read the whole file; it just doesn't have to keep the whole file in memory at once. Avoiding that is premature optimization for only 1000 small key-value pairs, but I found the question interesting enough to try a library I'd never considered before, because JSON files are sometimes huge, and because of the close analogy with SAX parsers (event-based streaming parsers for XML).

By the way, if order is important, the producer of this JSON should probably use an array. But perhaps as the consumer you can't do anything about that.
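For comparison, a sketch of the consumer side if the producer emitted an array instead (the data is hypothetical): JSON arrays are ordered, so a plain list slice returns entries in the order the producer wrote them.

```python
import json

# Hypothetical producer output using an array of objects
json_string = (
    '[{"id": 1, "genre": "Action"}, '
    '{"id": 2, "genre": "Adventure"}, '
    '{"id": 3, "genre": "Comedy"}]'
)
entries = json.loads(json_string)  # decodes to a Python list

first_two = entries[:2]  # list slicing respects the array order
print([e["genre"] for e in first_two])  # ['Action', 'Adventure']
```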

Solution 3:

import json

path = 'data.json'
with open(path, 'rb') as f:
    content = json.load(f)  # pass the file object, not the filename

what_you_want = {int(k): v for k, v in content.items() if int(k) in range(1, 11)}

I don't think there is any other way with the standard json module: you must load the entire thing, and only then can you extract the keys you want.

Solution 4:

In short, you can't.

Each entry looks like JSON, but only the file as a whole is a valid JSON document.


For example:

"1":"Action" looks like JSON, but it is only a member of an object, so you cannot load it on its own.

To parse it as JSON, you need the full object syntax: {"1":"Action"}
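A quick check with the standard json module illustrates the point: the bare member is rejected, and wrapping it in braces makes it a valid document. The helper is_json_document is hypothetical, written only for this illustration.

```python
import json

fragment = '"1":"Action"'

def is_json_document(text):
    # True if json.loads accepts the whole text as a single JSON value
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

print(is_json_document(fragment))              # False
print(is_json_document("{" + fragment + "}"))  # True
print(json.loads("{" + fragment + "}"))        # {'1': 'Action'}
```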

What you need to do is still load the whole file, then assign the first 10 entries to a variable.
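A minimal sketch of that approach, assuming Python 3.7+ (where dicts keep insertion order, so "first" means first in the file) and using a generated string as a stand-in for the real file:

```python
import itertools
import json

# Hypothetical stand-in for the file contents: entries "1".."1000"
json_string = json.dumps({str(i): f"genre{i}" for i in range(1, 1001)})

data = json.loads(json_string)  # the whole document is still parsed
# Take the first 10 entries in the order they were written
first_ten = dict(itertools.islice(data.items(), 10))
print(list(first_ten))
```

With a real file, open it and call json.load(f) instead of json.loads(json_string).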

Solution 5:

You have two options:

If you use Python >= 3.1 you can use

import json
from collections import OrderedDict

decoder = json.JSONDecoder(object_pairs_hook=OrderedDict)
data = decoder.decode(datastring)  # datastring holds the raw JSON text

This will decode the whole file, but keep all key-value pairs in the same order as they were in the file.

Then you can slice the first n items with something like

result = OrderedDict((k,v) for (k,v),i in zip(data.items(), range(n)))

This isn't efficient, but you will get the first n entries (for example, the first 10) as they were written in the JSON.
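itertools.islice can express "the first n pairs" a little more directly than zipping with range(n). A sketch with a made-up input string, showing that both produce the same OrderedDict:

```python
import itertools
import json
from collections import OrderedDict

# Hypothetical input; keys deliberately out of numeric order
datastring = '{"3": "Comedy", "1": "Action", "2": "Adventure", "4": "Drama"}'
decoder = json.JSONDecoder(object_pairs_hook=OrderedDict)
data = decoder.decode(datastring)

n = 2
# The zip-with-range version from the answer
via_zip = OrderedDict((k, v) for (k, v), i in zip(data.items(), range(n)))
# The same first n pairs, taken with islice
via_islice = OrderedDict(itertools.islice(data.items(), n))

print(list(via_islice.items()))  # [('3', 'Comedy'), ('1', 'Action')]
print(via_zip == via_islice)     # True
```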

The second option, more efficient but harder, is to use an iterative JSON parser such as ijson, as @steve-jessop mentioned.

If and only if your JSON files are always flat (no nested objects or lists), as in your example in the question, the following code puts the first 10 elements into result. More complex files need more complex parser code.

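import ijson  # third-party: pip install ijson

result = {}
with open('data.json', 'rb') as f:  # placeholder filename
    for prefix, event, value in ijson.parse(f):
        if event == 'map_key' and len(result) >= 10:
            break  # stop reading once 10 entries are collected
        if prefix:  # in a flat object, a non-empty prefix is the value's key
            result[prefix] = value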
