As a data format, yaml is extremely complicated and it has many footguns. In this post I explain some of those pitfalls by means of an example, and I suggest a few simpler and safer yaml alternatives.
A 10 minute read covering some YAML edge-cases that you should have in mind when writing complex YAML files
Most of the problems can be totally avoided by telling the YAML loader what type you’re expecting instead of forcing it to guess (e.g. provide a schema or use typed getter functions). If it has to guess, it’s no surprise that some things don’t survive the string to inferred type to desired type journey, and this is something that isn’t seen as a dealbreaker in other contexts, e.g. the multitude of languages where the string "false" evaluates to true when converted to a boolean because it’s non-empty.
In any almost other context (where boolean values exist), strings must be delimited by quotes, eliminating the ambiguity with false as string contents and the false boolean value
Putting "false" in a YAML file gives you a string, and just false on its own gives you a boolean, unless you tell the YAML library that it’s a string. Part of the point of YAML is that you don’t have to specify lots of stuff that’s redundant except when it would otherwise be ambiguous, and people misinterpret that as never having to specify anything ever.
The problem is specifically that in’t not exactly clear what’s considered ambiguous. For instance, no is the same thing as false, but as evidenced in the linked post, in the context of country codes, it means “Norway” and it’s not obvious that it might get interpreted as a boolean value.
It’s the same thing as this famous meme about implicit type conversions in JS :
TBF (I don’t often defend JS) - one of those is just “standard floating point issues” that every developer should be aware of. Computers cannot represent an infinite array of numbers between 0 and 1.
The first four of them are “just how floats work”, yeah. Has nothing to do with JavaScript.
typeofNaN// "number"
Classic, yes, very funny. “NaN stands for ‘not a number’ but it says it’s a number”. But for real though. It’s still a variable that’s the Number type, but its contents happen to be invalid. It’s Not a (Valid) Number.
The next three are just classic floating point precision moments.
The Math.max() and Math.min() ones are interesting. Seems that under the hood, both methods implicitly have a fallback “number” that it compares to any argument list you give it that will auto-lose (or at closest, tie) with any other valid number you can possibly give it, so when you give it nothing at all, they leak out. Honestly, makes sense. Kinda ludicrous it needs to have defined behavior for a zero-argument call in the first place. But JS is one of those silly languages that lets you stuff in or omit as many arguments as you want with no consequences, function signature be damned. So as long as that paradigm exists, the zero-argument case probably ought to do something, and IMO this isn’t the worst choice.
Every other one is bog standard truthy/type coercion shitlery. A demonstration of why implicit type coercion as a language feature is stupid.
Most of the problems can be totally avoided by telling the YAML loader what type you’re expecting instead of forcing it to guess (e.g. provide a schema or use typed getter functions). If it has to guess, it’s no surprise that some things don’t survive the string to inferred type to desired type journey, and this is something that isn’t seen as a dealbreaker in other contexts, e.g. the multitude of languages where the string
"false"
evaluates to true when converted to a boolean because it’s non-empty.In any almost other context (where boolean values exist), strings must be delimited by quotes, eliminating the ambiguity with
false
as string contents and thefalse
boolean valuePutting
"false"
in a YAML file gives you a string, and justfalse
on its own gives you a boolean, unless you tell the YAML library that it’s a string. Part of the point of YAML is that you don’t have to specify lots of stuff that’s redundant except when it would otherwise be ambiguous, and people misinterpret that as never having to specify anything ever.The problem is specifically that in’t not exactly clear what’s considered ambiguous. For instance,
no
is the same thing asfalse
, but as evidenced in the linked post, in the context of country codes, it means “Norway” and it’s not obvious that it might get interpreted as a boolean value.It’s the same thing as this famous meme about implicit type conversions in JS :
TBF (I don’t often defend JS) - one of those is just “standard floating point issues” that every developer should be aware of. Computers cannot represent an infinite array of numbers between 0 and 1.
The first four of them are “just how floats work”, yeah. Has nothing to do with JavaScript.
typeof NaN // "number"
Classic, yes, very funny. “NaN stands for ‘not a number’ but it says it’s a number”. But for real though. It’s still a variable that’s the Number type, but its contents happen to be invalid. It’s Not a (Valid) Number.
The next three are just classic floating point precision moments.
The
Math.max()
andMath.min()
ones are interesting. Seems that under the hood, both methods implicitly have a fallback “number” that it compares to any argument list you give it that will auto-lose (or at closest, tie) with any other valid number you can possibly give it, so when you give it nothing at all, they leak out. Honestly, makes sense. Kinda ludicrous it needs to have defined behavior for a zero-argument call in the first place. But JS is one of those silly languages that lets you stuff in or omit as many arguments as you want with no consequences, function signature be damned. So as long as that paradigm exists, the zero-argument case probably ought to do something, and IMO this isn’t the worst choice.Every other one is bog standard truthy/type coercion shitlery. A demonstration of why implicit type coercion as a language feature is stupid.