How to Recursively Sort YAML (with Anchors) Using CommentedMap?
Solution 1:
The approach I take to solving this kind of problem is first to add the
expected and necessary imports, define the input and expected output as
multiline strings, and add a useful diff method to the YAML instance.
String input is easier to work with than files while testing, as everything is in one file (need to remove some trailing spaces?) and you cannot overwrite your input and start the next run with something different than the first.
import sys
import difflib
import ruamel.yaml
from ruamel.yaml.comments import merge_attrib

yaml_in = """\
_world:
  anchor_struct: &anchor_struct
    foo:
      val: "foo"
    bar:
      val: "foo"
  string: "string"
  newmsg: &newmsg
    msg: "msg"
    foo: "foo"
    new: "new"
  anchr_val: &anchor_val famous_val
  bool: True
elem2:
  myStruct:
    <<: *anchor_struct
  anchor_val: *anchor_val
  <<: *anchor_struct
  zzz: zorglub
  www: web
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
elem1:
  <<: *anchor_struct
  zzz: zorglub
  newmsg:
    <<: *newmsg
    msg: "msg2"
  myStruct:
    <<: *anchor_struct
  anchor_struct:
    second_elem: "second_elem"
    <<: *anchor_struct
    other_elem: "other_elem"
  www: web
  anchor_val: *anchor_val
"""

yaml_out = """\
_world:
  anchor_struct: &anchor_struct
    bar:
      val: "foo"
    foo:
      val: "foo"
  anchr_val: &anchor_val famous_val
  bool: True
  newmsg: &newmsg
    foo: "foo"
    msg: "msg"
    new: "new"
  string: "string"
elem1:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
    second_elem: "second_elem"
  anchor_val: *anchor_val
  myStruct:
    <<: *anchor_struct
  newmsg:
    <<: *newmsg
    msg: "msg2"
  www: web
  zzz: zorglub
elem2:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
  anchor_val: *anchor_val
  myStruct:
    <<: *anchor_struct
  www: web
  zzz: zorglub
"""

def diff_yaml(self, data, s, fnin="in", fnout="out"):
    # dump data if necessary and compare with s
    inl = [l.rstrip() + '\n' for l in s.splitlines()]  # trailing space at end of line disregarded
    if not isinstance(data, str):
        buf = ruamel.yaml.compat.StringIO()
        self.dump(data, buf)
        outl = buf.getvalue().splitlines(True)
    else:
        outl = [l.rstrip() + '\n' for l in data.splitlines()]
    diff = difflib.unified_diff(inl, outl, fnin, fnout)
    result = True
    for line in diff:
        sys.stdout.write(line)
        result = False
    return result

ruamel.yaml.YAML.diff = diff_yaml

yaml = ruamel.yaml.YAML()
# yaml.indent(mapping=4, sequence=4, offset=2)
yaml.boolean_representation = ["False", "True"]
yaml.preserve_quotes = True
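The comparison idea behind the diff method can also be tried without ruamel.yaml. The following is only a minimal sketch using the standard library: same_text is a hypothetical helper name (not part of ruamel.yaml or of the code above), and it compares two plain strings rather than dumping a data structure first. It shows why the result is True only when unified_diff produces no lines at all.

```python
import difflib
import io


def same_text(expected, actual, out=None):
    """Return True when both texts match, ignoring trailing whitespace per line.

    Hypothetical helper for illustration; diff lines are written to `out`.
    """
    out = out if out is not None else io.StringIO()
    exp = [line.rstrip() + "\n" for line in expected.splitlines()]
    act = [line.rstrip() + "\n" for line in actual.splitlines()]
    identical = True
    for line in difflib.unified_diff(exp, act, "expected", "actual"):
        out.write(line)    # any emitted line means the texts differ
        identical = False
    return identical
```

Calling same_text("a: 1  \n", "a: 1\n") treats the two strings as equal, because the trailing spaces are stripped before comparing.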
Then make sure your expected output is valid, and can be round-tripped:
dout = yaml.load(yaml_out)
buf = ruamel.yaml.compat.StringIO()
yaml.dump(dout, buf)
assert yaml.diff(dout, yaml_out)
which should give neither output nor an assertion error (there is trailing
whitespace in your expected output, as well as the non-default True
boolean). If the expected output cannot be round-tripped, ruamel.yaml might not be able to dump your expected output.
If you are stuck, you can now inspect dout
to determine what your parsed input should look like.
So now try the recursive sort:
def recursive_sort_mappings(s):
    if isinstance(s, list):
        for elem in s:
            recursive_sort_mappings(elem)
        return
    if not isinstance(s, dict):
        return
    for key in sorted(s, reverse=True):
        value = s.pop(key)
        recursive_sort_mappings(value)
        s.insert(0, key, value)

din = yaml.load(yaml_in)
recursive_sort_mappings(din)
yaml.diff(din, yaml_out)
This gives quite a bit of output, as recursive_sort_mappings doesn't know
about merges and runs over all the keys, while ruamel.yaml itself tries to keep merge keys in their original position and additionally, when popping a key (before reinserting it in the
first position), does some magic in case the popped value exists in a merged mapping:
--- in
+++ out
@@ -1,8 +1,8 @@
 _world:
   anchor_struct: &anchor_struct
-    bar:
+    bar: &id001
       val: "foo"
-    foo:
+    foo: &id002
       val: "foo"
   anchr_val: &anchor_val famous_val
   bool: True
@@ -14,24 +14,38 @@
 elem1:
   <<: *anchor_struct
   anchor_struct:
+    bar: *id001
     <<: *anchor_struct
+    foo: *id002
     other_elem: "other_elem"
     second_elem: "second_elem"
   anchor_val: *anchor_val
+  bar: *id001
+  foo: *id002
   myStruct:
     <<: *anchor_struct
+    bar: *id001
+    foo: *id002
   newmsg:
     <<: *newmsg
+    foo: "foo"
     msg: "msg2"
+    new: "new"
   www: web
   zzz: zorglub
 elem2:
-  <<: *anchor_struct
   anchor_struct:
     <<: *anchor_struct
+    bar: *id001
+    foo: *id002
     other_elem: "other_elem"
   anchor_val: *anchor_val
+  <<: *anchor_struct
+  bar: *id001
+  foo: *id002
   myStruct:
     <<: *anchor_struct
+    bar: *id001
+    foo: *id002
   www: web
   zzz: zorglub
To solve this you need to do multiple things. First, you need to abandon .insert(), which emulates (for the Python 3 built-in OrderedDict) the method defined in the C ordereddict package ruamel.ordereddict. This emulation recreates the OrderedDict, and
that leads to duplication. The Python 3 C implementation has a less powerful (than .insert()) but in this case useful
method, move_to_end (which could be used in an update to the .insert()
emulation in ruamel.yaml).
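To see why move_to_end is enough for sorting, here is a small sketch on a plain collections.OrderedDict, without ruamel.yaml and without any merge keys. The helper name sort_keys_in_place is made up for this illustration. Moving every key to the end, in sorted order, leaves the mapping sorted while keeping the same object, so nothing is recreated.

```python
from collections import OrderedDict


def sort_keys_in_place(d):
    """Reorder the keys of an OrderedDict alphabetically without rebuilding it."""
    # sorted(d) builds a separate list first, so reordering d inside the
    # loop is safe. After the loop the smallest key is first, the largest last.
    for key in sorted(d):
        d.move_to_end(key)
    return d


data = OrderedDict([("zzz", "zorglub"), ("www", "web"), ("bool", True)])
same_object = sort_keys_in_place(data)
print(list(data))
```

Because the mapping is reordered rather than rebuilt, any other reference to the same object sees the sorted order as well.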
Second, you only need to walk over the "real" keys, not those keys provided by merges, so you cannot iterate over the mapping itself, as the first version did with for key in sorted(s, reverse=True).
Third, you need the merge key to move to the top of the mapping if it is somewhere else.
(The level argument was added for debugging purposes.)
def recursive_sort_mappings(s, level=0):
    if isinstance(s, list):
        for elem in s:
            recursive_sort_mappings(elem, level=level+1)
        return
    if not isinstance(s, dict):
        return
    merge = getattr(s, merge_attrib, [None])[0]
    if merge is not None and merge[0] != 0:  # << not in first position, move it
        setattr(s, merge_attrib, [(0, merge[1])])
    for key in sorted(s._ok):  # _ok -> set of Own Keys, i.e. not merged in keys
        value = s[key]
        # print('v1', level, key, super(ruamel.yaml.comments.CommentedMap, s).keys())
        recursive_sort_mappings(value, level=level+1)
        # print('v2', level, key, super(ruamel.yaml.comments.CommentedMap, s).keys())
        s.move_to_end(key)

din = yaml.load(yaml_in)
recursive_sort_mappings(din)
assert yaml.diff(din, yaml_out)
And then the diff no longer gives output.