
How To Recursively Sort YAML (with Anchors) Using CommentedMap?

I'm facing an issue with the recursive sort solution proposed here: I cannot sort a YAML file with anchors and sub-elements. The .pop method call is throwing a KeyError exception. Ex: vol

Solution 1:

My approach to solving this kind of problem is to first add the expected and necessary imports, define the input and expected output as multiline strings, and add a useful diff method to the YAML class.

String input is easier to work with than files while testing: everything is in one file (need to remove some trailing spaces?), and you cannot overwrite your input and start the next run with something different from the first.

import sys
import difflib
import ruamel.yaml
from ruamel.yaml.comments import merge_attrib

yaml_in = """\
_world:
  anchor_struct: &anchor_struct
    foo:
      val: "foo"
    bar:
      val: "foo"
  string: "string"
  newmsg: &newmsg
    msg: "msg"
    foo: "foo"
    new: "new"
  anchr_val: &anchor_val famous_val
  bool: True
elem2:
  myStruct:
    <<: *anchor_struct
  anchor_val: *anchor_val
  <<: *anchor_struct
  zzz: zorglub
  www: web
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
elem1:
  <<: *anchor_struct
  zzz: zorglub
  newmsg:
    <<: *newmsg
    msg: "msg2"
  myStruct:
    <<: *anchor_struct
  anchor_struct:
    second_elem: "second_elem"
    <<: *anchor_struct
    other_elem: "other_elem"
  www: web
  anchor_val: *anchor_val
"""

yaml_out = """\
_world:
  anchor_struct: &anchor_struct
    bar:
      val: "foo"
    foo:
      val: "foo"
  anchr_val: &anchor_val famous_val
  bool: True
  newmsg: &newmsg
    foo: "foo"
    msg: "msg"
    new: "new"
  string: "string"
elem1:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
    second_elem: "second_elem"
  anchor_val: *anchor_val
  myStruct:
    <<: *anchor_struct
  newmsg:
    <<: *newmsg
    msg: "msg2"
  www: web
  zzz: zorglub
elem2:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
  anchor_val: *anchor_val
  myStruct:
    <<: *anchor_struct
  www: web
  zzz: zorglub
"""


def diff_yaml(self, data, s, fnin="in", fnout="out"):
    # dump data if necessary and compare with s
    inl = [l.rstrip() + '\n' for l in s.splitlines()]   # trailing space at end of line disregarded
    if not isinstance(data, str):
        buf = ruamel.yaml.compat.StringIO()
        self.dump(data, buf)
        outl = buf.getvalue().splitlines(True)
    else:
        outl = [l.rstrip() + '\n' for l in data.splitlines()]
    diff = difflib.unified_diff(inl, outl, fnin, fnout)
    result = True
    for line in diff:
        sys.stdout.write(line)
        result = False
    return result

ruamel.yaml.YAML.diff = diff_yaml

yaml = ruamel.yaml.YAML()
# yaml.indent(mapping=4, sequence=4, offset=2)
yaml.boolean_representation = ["False", "True"]
yaml.preserve_quotes = True

Then make sure your expected output is valid and can be round-tripped:

dout = yaml.load(yaml_out)
buf = ruamel.yaml.compat.StringIO()
yaml.dump(dout, buf)
assert yaml.diff(dout, yaml_out)

This should produce neither output nor an assertion error (your expected output has trailing whitespace, which the diff disregards, and it uses the non-default True for booleans, hence the boolean_representation setting). If the expected output cannot be round-tripped, ruamel.yaml might not be able to dump your expected output.
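The comparison that the diff method performs can be tried in isolation, without ruamel.yaml. The following is a minimal sketch using only the standard library; the function name texts_match and the sample strings are made up for illustration. It strips trailing whitespace from every line, prints a unified diff, and returns True only when nothing differs.

```python
import difflib
import sys


def texts_match(expected, actual, fnin="in", fnout="out"):
    """Print a unified diff of two strings; return True if there is none."""
    # trailing whitespace is stripped so it never shows up as a difference
    inl = [line.rstrip() + "\n" for line in expected.splitlines()]
    outl = [line.rstrip() + "\n" for line in actual.splitlines()]
    result = True
    for line in difflib.unified_diff(inl, outl, fnin, fnout):
        sys.stdout.write(line)
        result = False
    return result


# differs only in trailing spaces: nothing is printed
same = texts_match("a: 1  \nb: 2\n", "a: 1\nb: 2\n")
# key order differs: a diff is printed
different = texts_match("a: 1\nb: 2\n", "b: 2\na: 1\n")
```

Returning a boolean is what makes the helper usable inside an assert, while the printed diff shows where the mismatch is.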

If you are stuck, you can now inspect dout to determine what your parsed input should look like.

So now try the recursive sort:

def recursive_sort_mappings(s):
    if isinstance(s, list):
        for elem in s:
            recursive_sort_mappings(elem)
        return 
    if not isinstance(s, dict):
        return
    for key in sorted(s, reverse=True):
        value = s.pop(key)
        recursive_sort_mappings(value)
        s.insert(0, key, value)

din = yaml.load(yaml_in)
recursive_sort_mappings(din)
yaml.diff(din, yaml_out)

This gives quite a bit of output, because recursive_sort_mappings doesn't know about merges and runs over all the keys, including the merged-in ones. The mapping itself tries to keep merge keys in their original position, and additionally, when a key is popped (before being reinserted in the first position), it does some magic in case the popped value exists in a merged mapping:

--- in
+++ out
@@ -1,8 +1,8 @@
 _world:
   anchor_struct: &anchor_struct
-    bar:
+    bar: &id001
       val: "foo"
-    foo:
+    foo: &id002
       val: "foo"
   anchr_val: &anchor_val famous_val
   bool: True
@@ -14,24 +14,38 @@
 elem1:
   <<: *anchor_struct
   anchor_struct:
+    bar: *id001
     <<: *anchor_struct
+    foo: *id002
     other_elem: "other_elem"
     second_elem: "second_elem"
   anchor_val: *anchor_val
+  bar: *id001
+  foo: *id002
   myStruct:
     <<: *anchor_struct
+    bar: *id001
+    foo: *id002
   newmsg:
     <<: *newmsg
+    foo: "foo"
     msg: "msg2"
+    new: "new"
   www: web
   zzz: zorglub
 elem2:
-  <<: *anchor_struct
   anchor_struct:
     <<: *anchor_struct
+    bar: *id001
+    foo: *id002
     other_elem: "other_elem"
   anchor_val: *anchor_val
+  <<: *anchor_struct
+  bar: *id001
+  foo: *id002
   myStruct:
     <<: *anchor_struct
+    bar: *id001
+    foo: *id002
   www: web
   zzz: zorglub

To solve this you need to do multiple things. First, you need to abandon .insert(), which emulates (for the Python 3 built-in OrderedDict) the method defined in the C ordereddict package ruamel.ordereddict. This emulation recreates the OrderedDict, and that leads to duplication. The Python 3 C implementation has a method that is less powerful than .insert() but useful in this case: move_to_end (which could be used in an update to the .insert() emulation in ruamel.yaml).

Second, you need to walk over only the "real" keys, not the keys provided by merges, so you cannot use a plain for key in s loop.

Third, you need the merge key to move to the top of the mapping if it is somewhere else.
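The first two points can be tried out in isolation with the standard library. The sketch below is a toy model, not ruamel.yaml's actual data structure: a plain collections.OrderedDict stands in for the mapping, and a hypothetical own_keys set plays the role of the keys that were not merged in. Calling move_to_end for each own key in sorted order reorders the mapping in place, without popping or reinserting anything.

```python
from collections import OrderedDict


def sort_own_keys(mapping, own_keys):
    """Reorder `mapping` in place so that the keys in `own_keys` end up sorted.

    Hypothetical helper for illustration only: keys not in `own_keys`
    (standing in for merged-in keys) are never touched, so they stay in
    front, in their original relative order.
    """
    for key in sorted(own_keys):
        mapping.move_to_end(key)  # moves an existing key to the last position


# 'bar' and 'foo' stand in for keys supplied by a merge
data = OrderedDict(
    [("zzz", 1), ("bar", 2), ("www", 3), ("foo", 4), ("anchor_val", 5)]
)
sort_own_keys(data, {"zzz", "www", "anchor_val"})
print(list(data))  # ['bar', 'foo', 'anchor_val', 'www', 'zzz']
```

Because every own key is moved to the end exactly once, in ascending order, the own keys finish in sorted order behind the untouched ones, and no value is ever removed from the mapping.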

(The level argument in the function below was added for debugging purposes.)

def recursive_sort_mappings(s, level=0):
    if isinstance(s, list):
        for elem in s:
            recursive_sort_mappings(elem, level=level+1)
        return
    if not isinstance(s, dict):
        return
    merge = getattr(s, merge_attrib, [None])[0]
    if merge is not None and merge[0] != 0:  # << not in first position, move it
        setattr(s, merge_attrib, [(0, merge[1])])

    for key in sorted(s._ok):  # _ok -> set of Own Keys, i.e. not merged in keys
        value = s[key]
        # print('v1', level, key, super(ruamel.yaml.comments.CommentedMap, s).keys())
        recursive_sort_mappings(value, level=level+1)
        # print('v2', level, key, super(ruamel.yaml.comments.CommentedMap, s).keys())
        s.move_to_end(key)


din = yaml.load(yaml_in)
recursive_sort_mappings(din)
assert yaml.diff(din, yaml_out)

And then the diff no longer gives output.
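For mappings without merge keys, the same walk can be tried with only the standard library. The following is a sketch under that assumption: collections.OrderedDict stands in for CommentedMap, the name recursive_sort_plain is made up for illustration, and there is no handling of anchors or merges.

```python
from collections import OrderedDict


def recursive_sort_plain(node):
    """Sort every OrderedDict found in `node` in place, descending into lists."""
    if isinstance(node, list):
        for elem in node:
            recursive_sort_plain(elem)
        return
    if not isinstance(node, OrderedDict):
        return
    # sorted() builds a list first, so reordering inside the loop is safe
    for key in sorted(node):
        recursive_sort_plain(node[key])
        node.move_to_end(key)


doc = OrderedDict([
    ("elem2", OrderedDict([("zzz", "zorglub"), ("www", "web")])),
    ("elem1", [OrderedDict([("b", 1), ("a", 2)])]),
])
recursive_sort_plain(doc)
```

After the call, the top-level keys and the keys of every nested mapping are in ascending order, and the objects are the same ones as before, only reordered.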
