Friday, February 9

jq: serialize for wordpress

This kind of just rolls off the tongue, right?

def towp:
  if "null"==type then "N;"
  elif "boolean"==type then if . then "b:1;" else "b:0;" end
  elif "number"==type then if .==(.|floor) and .<9e15 then "i:"+(.|tostring)+";" else "d:"+(.|tostring)+";" end
  elif "string"==type then "s:"+(.|length|tostring)+":\""+.+"\";"
  else "a:"+(.|length|tostring)+":{"+(.|[to_entries|map(to_entries)[]|map(.value|towp)|add]|add)+"}"
  end;

For some reason, even though wordpress uses php, its serialized objects use a slightly different format from what's documented at -- see for a decent description and for a more official description.

Note, in particular, that in this wordpress format, strings get enclosing quotes but the reported length does not include those quotes. I do not yet know what this looks like for quotes and backslashes within a string - there are at least four possibilities (two hideously broken) for how that might be handled.

One issue, I imagine, is that when you delete an element from a php array, indices do not change for items after that point. The php serialize format does not preserve indices, but the wordpress serialize format does.
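For cross-checking what towp emits, here is a minimal sketch of the same encoding in Python (Python rather than jq only so it can be run on its own; to_wp is a hypothetical name, not anything from WordPress). It assumes the stored text is exactly what PHP's serialize() produces: string contents are written raw, with no escaping of quotes or backslashes, the length is a UTF-8 byte count that excludes the enclosing quotes, and integer keys are written out explicitly so gaps left by deleted elements survive. If your data does not match those assumptions, treat this as a starting point only.

```python
def to_wp(value):
    """Serialize a JSON-like Python value in PHP serialize() style (sketch)."""
    if value is None:
        return "N;"
    if isinstance(value, bool):  # must come before the int check: bool is an int subclass
        return "b:1;" if value else "b:0;"
    if isinstance(value, int):
        return "i:%d;" % value
    if isinstance(value, float):
        return "d:%r;" % value
    if isinstance(value, str):
        # Assumption: length is the UTF-8 byte count, quotes excluded, contents unescaped.
        return 's:%d:"%s";' % (len(value.encode("utf-8")), value)
    if isinstance(value, list):
        items = enumerate(value)      # lists get consecutive integer keys
    elif isinstance(value, dict):
        items = value.items()         # dict keys (int or str) are written as given
    else:
        raise TypeError("cannot serialize %r" % type(value).__name__)
    body = "".join(to_wp(k) + to_wp(v) for k, v in items)
    return "a:%d:{%s}" % (len(value), body)


if __name__ == "__main__":
    print(to_wp('a"b\\c'))           # s:5:"a"b\c";
    print(to_wp({0: "a", 2: "c"}))   # a:2:{i:0;s:1:"a";i:2;s:1:"c";}
```

Note that jq's length counts codepoints rather than bytes, so towp and this sketch only agree on ASCII strings.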

Anyways... it's a defining characteristic of software that "what works" generally takes precedence over "what's formally correct" (at least, if you want it to work).

Thus, although I keep harping on the distinction between "boolean" as used in software and its history:
... I still have to use the word "boolean" in my code (and in my web searches when I want to be reminded of related syntactic issues or whatever else -- on the plus side, long words that are relatively meaningless do have some advantages when searching, as long as you know you need to search for them).

But I guess that's related to one of the nice things about standards... there's so many to choose from.

And I guess that's also related to how you don't really want to use this on stuff that's already a string (unless sometimes it has to be something else, and you want to preserve that...).

And, also, related to how blogger's layout is so ornate that it pretty much has to be fixed width (which means it will be wrong for some desktops; they pretty much have to disable the theme support for phones - it's really that insane). [There is an option to revert the blog to "classic themes", but the preview on that is not really a preview at all, and I need to be doing other things now...]

And, I guess another loosely related issue is how incredibly difficult it can be to report bugs. (There's just so many people in the world - billions - and most of them do not have the frame of mind to understand what a meaningful bug report is. So as more and more people come online, things get more and more messed up and things that used to work start failing as a consequence. ... Eventually, the failures may become visible enough to get fixed anyways, but all too often huge issues can be neglected for decades, or longer. And, just figuring out where (or if) they should be fixed can be daunting.)

Anyways... here's the reverse transform:

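# NOTE: reconstructed from a damaged listing. The else/end branches, the result
# expressions, and the state layout ($P = {j: offset, op: parser state,
# len: entries left, r: last result}) are assumptions, not the original text.
def _fwp($P):
  $P.j as $j|
  .[$j:$j+1] as $typ|
  if "end"==$P.op then
    if ";"==$typ then
      $P|.j=$j+1
    else
      error("expected ';' at position "+($j|tostring)+" got '"+$typ+"' ")
    end
  elif "endarray"==$P.op then
    if "}"==$typ then
      $P|.j=$j+1
    else
      error("expected '}' at position "+($j|tostring)+" got '"+$typ+"'")
    end
  elif "start"==$P.op then
    if "N"==$typ then
      _fwp($P|.j=$j+1|.op="end"|.r=null)
    elif ":"!=.[$j+1:$j+2] then
      error("expected : at position "+($j|tostring)+" got '"+.[$j+1:$j+2]+"'")
    elif "b" == $typ then
      .[$j+2:$j+3] as $t|
      if "0"==$t then
        _fwp($P|.j=$j+3|.op="end"|.r=false)
      elif "1"==$t then
        _fwp($P|.j=$j+3|.op="end"|.r=true)
      else
        error("expected 0 or 1 at position "+($j+2|tostring)+" got '"+$t+"'")
      end
    elif ("i"==$typ) or ("d"==$typ) then
      (.[$j+2:]|match("[^;]*")) as $match|
      _fwp($P|.j=$j+2+$match.length|.op="end"|.r=($match.string|tonumber))
    elif "s"==$typ then
      (.[$j+2:]|match("[^:]*")) as $match|
      ($match.string|tonumber) as $strlen|
      ($j+4+$match.length) as $J|
      if ":\""==.[$J-2:$J] then
        .[$J:$J+$strlen] as $str|
        # the closing quote at $J+$strlen is skipped without being checked
        _fwp($P|.j=$J+$strlen+1|.op="end"|.r=$str)
      else
        error("expected ':' at "+($J|tostring)+" got '"+.[$J-1:$J]+"'")
      end
    elif "a"==$typ then
      (.[$j+2:]|match("[^:]*")) as $match|
      if 0==$match.length then
        error("invalid array length at position "+($j|tostring)+" got nothing")
      else
        ($match.string|tonumber) as $alen|
        ($j+3+$match.length) as $j|
        if ":"==.[$j-1:$j] then
          # restore the enclosing array's remaining count once the nested one is done
          _fwp($P|.j=$j|.op="startarray"|.len=$alen)|.len=$P.len
        else
          error("expected : at start of array at position "+($j|tostring)+" got '"+.[$j-1:$j]+"'")
        end
      end
    else
      error("unrecognized type "+$typ+" at position "+($j|tostring))
    end
  elif "startarray"==$P.op then
    if "{"==.[$j:$j+1] then
      if "i"==.[$j+1:$j+2] then
        _fwp($P|.j=$j+1|.op="array"|.r=[])
      elif "s"==.[$j+1:$j+2] then
        _fwp($P|.j=$j+1|.op="object"|.r={})
      elif "}"==.[$j+1:$j+2] then
        # empty: assumed to decode as [] (it could equally be {})
        _fwp($P|.j=$j+1|.op="endarray"|.r=[])
      else
        error("invalid index type '"+.[$j:$j+1]+"' at position "+($j|tostring))
      end
    else
      error("expected { at start of array at position "+($j|tostring)+" got '"+.[$j:$j+1]+"'")
    end
  elif "array"==$P.op then
    if 0==$P.len then
      _fwp($P|.op="endarray")
    else
      $P.r as $r|
      _fwp($P|.op="start") as $P|
      $P.r as $key|
      if "number"==($key|type) then
        _fwp($P|.op="start") as $P|
        $P.r as $val|
        # assumed by analogy with the object case: append, ignoring the index value
        _fwp($P|.op="array"|.len|=(.-1)|.r=$r+[$val])
      else
        error("invalid array index "+($key|tostring)+" at position "+($j|tostring))
      end
    end
  elif "object"==$P.op then
    if 0==$P.len then
      _fwp($P|.op="endarray")
    else
      $P.r as $r|
      _fwp($P|.op="start") as $P|
      $P.r as $key|
      if "string"==($key|type) then
        _fwp($P|.op="start") as $P|
        $P.r as $val|
        _fwp($P|.op="object"|.len|=(.-1)|.r=$r+{($key): $val})
      else
        error("invalid object index "+($key|tostring)+" at position "+($j|tostring))
      end
    end
  else
    error("program bug (this should never happen)")
  end;

# The initial state is an assumption (the original body was lost).
def fromwp: _fwp({j:0, op:"start"})|.r;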

You wind up needing to write a parser in jq for the reverse transform, and it mostly has to go into a single recursive function because functions can't refer to other functions which have yet to be defined, and I couldn't think of any way of breaking out significant chunks that made sense, with that limitation.

It doesn't help that (at least in version 1.5) jq's error handling is kind of useless (does not tell you where in the code the error occurred). So I threw in some forced errors to help track down problems. (I could do better about reporting where invalid numbers occurred, but try/catch in jq 1.5 mixed with error statements in code like this can lose track of context - in some cases which I found difficult to isolate I was seeing draft versions of this code trying to parse an error statement instead of the text it was supposed to be parsing.)

Another quirk here is that I threw in the support for double quotes around strings (why bother using a numeric string length that does not include those quotes? Scary design process there...) at the last minute, and I'm not bothering to check for a closing quote - I'm just skipping over that position without checking (which, ok, works just fine... but is sloppy and allows for future specification entropy in a bad way).
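For comparison, checking the closing quote costs only one extra slice. The following is a rough Python sketch of that check, not a translation of the jq above; read_string is a hypothetical helper, and it assumes the input is a bytes object and that the declared length counts bytes.

```python
def read_string(data: bytes, pos: int):
    """Parse s:<len>:"<bytes>"; starting at pos and return (text, next_pos)."""
    if data[pos:pos + 2] != b"s:":
        raise ValueError(f"expected 's:' at position {pos}")
    colon = data.index(b":", pos + 2)   # end of the length digits (ValueError if absent)
    length = int(data[pos + 2:colon])   # ValueError if the digits are missing or invalid
    start = colon + 2                   # skip ':' and the opening quote
    if data[colon + 1:start] != b'"':
        raise ValueError(f"expected opening quote at position {colon + 1}")
    end = start + length
    # Unlike the jq version, verify the closing quote and the terminator together.
    if data[end:end + 2] != b'";':
        raise ValueError(f"expected closing quote and ';' at position {end}")
    return data[start:end].decode("utf-8"), end + 2
```

Because the length decides where the string ends, an embedded quote such as in s:3:"a"b"; parses without any escaping rules.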

(Another issue with parsing numbers is that I might be asking jq to inspect the entire rest of the unparsed string to see where the number ends - I tried limiting it, using .[$j+2:30] instead of .[$j:], but jq would decide in some cases that that was an error (near the end of the string). I decided I did not want the complexity of trying to micromanage that issue - things are messy enough already and I didn't want the hard-to-debug number parsing to be even more obscure, so just went with the simple .[$j:] approach.)

This is painfully slow on large (50k) objects, but it seems to work...

I should build a report_parse_error({offset, expected}) routine based on J's multi-line reporting style and use that here. That should make for easier to read error messages.
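As a rough idea of what such a routine could produce, here is a Python sketch. The name format_parse_error, the context parameter, and the three-line layout with a caret under the offending character are all assumptions made for illustration; this is not a claim about J's actual report format.

```python
def format_parse_error(text, offset, expected, context=20):
    """Build a three-line report: message, excerpt, caret under the offset."""
    lo = max(0, offset - context)
    hi = min(len(text), offset + context)
    got = text[offset:offset + 1] or "end of input"
    return "\n".join([
        f"parse error at offset {offset}: expected {expected}, got {got!r}",
        "    " + text[lo:hi],
        "    " + " " * (offset - lo) + "^",
    ])


if __name__ == "__main__":
    print(format_parse_error("b:2;", 2, "0 or 1"))
```

The excerpt is printed as-is, so input containing newlines would need to be escaped first for the caret to line up.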

I also should build an extract_next_number($offset) filter and use it here - that would let me be smart about how big of a string I'm searching for the end of the number in (can do minimum of +30 or remainder of string), which should be a big performance win.
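The windowing idea can be sketched in Python as follows. The function name and the 30-character window come from the paragraph above; everything else (the return shape, raising ValueError) is an assumption for illustration. Python slices clamp at the end of the string, so the "minimum of +30 or remainder" logic comes for free here; a jq version would need to compute that minimum explicitly if slicing past the end misbehaves.

```python
def extract_next_number(text, offset, window=30):
    """Return (number, offset_of_semicolon) for the number starting at offset.

    Only the next `window` characters are searched, never the whole remainder.
    """
    chunk = text[offset:offset + window]   # clamps automatically near the end
    end = chunk.find(";")
    if end < 0:
        raise ValueError(f"no ';' within {window} characters of offset {offset}")
    literal = chunk[:end]
    try:
        number = int(literal)
    except ValueError:
        number = float(literal)            # raises ValueError for non-numeric text
    return number, offset + end
```

For example, on the text i:42;s:1:"x"; with offset 2 this looks at no more than 30 characters and stops at the first semicolon.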

Sunday, November 19

And now, some drive-by on religious deception

This is going to be a totally embarrassing write-up, I expect. But I need to get this off my plate.

I ran across recently, and ... well...

Article 1
WE AFFIRM that God has designed marriage to be a covenantal, sexual, procreative, lifelong union of one man and one woman, as husband and wife, and is meant to signify the covenant love between Christ and his bride the church.
WE DENY that God has designed marriage to be a homosexual, polygamous, or polyamorous relationship. We also deny that marriage is a mere human contract rather than a covenant made before God.

Exodus 21:10 "If he take him another wife; her food, her raiment, and her duty of marriage, shall he not diminish."

Article 2
WE AFFIRM that God’s revealed will for all people is chastity outside of marriage and fidelity within marriage.
WE DENY that any affections, desires, or commitments ever justify sexual intercourse before or outside marriage; nor do they justify any form of sexual immorality.

Maybe worth reviewing Joshua 2 here?

Article 10
WE AFFIRM that it is sinful to approve of homosexual immorality or transgenderism and that such approval constitutes an essential departure from Christian faithfulness and witness.
WE DENY that the approval of homosexual immorality or transgenderism is a matter of moral indifference about which otherwise faithful Christians should agree to disagree.

Ecclesiastes 3:1 "All things have time, and all things under the sun pass by their spaces."

Why am I raising these issues?

Overpopulation, overbreeding, fake morals.

By placing too much emphasis on those parts of the Bible written by this guy:

Acts 8:3 "But Saul greatly destroyed the church, and entered by houses, and drew out men and women, and betook them into prison."

... you lose out on what the rest of it was really saying.