Saturday, March 24

+/ 1 2 3 NB. six

Scott Vokes () wrote:

 I haven't looked at it lately, but I remember running into a brick wall trying to figure out how adverbs work in it. 

So I was thinking that sounds like a worthwhile project -- documenting how an adverb works in the J interpreter.

So, first off, my understanding is that the J interpreter, while written in C, is structured much like a J program.  It's doing "J things".  So, to understand what it's doing, we should start with J.

In this case, let's consider the expression:

   +/ 1 2 3

Here, we have three objects:

+ a verb
/ an adverb
1 2 3 a noun

And the first thing that the J interpreter would handle, when executing this expression is the noun 1 2 3.  Since it's a noun, internally in the interpeter it's represented by two lists.  The first list is the shape of the array (1 dimenion: 3).  The second list is the values of the array (which must have a length equal to the product of the values of first list).  There's also something to represent the C type used to represent the values, and the C length of the stored data (for memory management).

Next, J would pick up the adverb.  But it can't do anything with it yet.

Next, J would pick up the verb.  A J verb has a dyadic definition, and a monadic definition and a rank.  In this case the dyadic definition is the interesting one, but the interpeter does not know that yet (unless special code is involved, which might actually be the case here).  So, somewhere in the interpreter we are going to be dealing with both of these definitions (and the rank: 0 0 0).

Next, the interpeter would give that verb to the adverb.  And, the adverb would be building a new verb from it.  Again, the new verb would have a dyadic definition, a monadic definition and a rank.  In this case we are going to be using the monadic definition, but unless special code is involved (and it probably is, in this case), the interpeter will not know that yet.

Finally, this new verb is going to process that noun.  But the interpreter isn't quite done yet -- it's going to be dealing with rank issues before letting the underlying verb mechanisms get at the underlying data.  But, eventually, something is going to be adding up those three numbers and coming back with the value 6.

Except, of course, 6 is going to be a noun. So it's going to have a list of dimensions (an empty list), and a list of values (one value here).  Obvious, maybe, but important.

--------------

So.. next, what are the C definitions that are relevant here?

First, line 227 of j.h:

#define F2(f)           A f(J jt,A a,A w)

This (F2) is how a C function is turned into a J dyadic definition.

A is an array type (it's a pointer to the structure which represents a noun)

J is the type of an J interpreter instance (again, it's a pointer to a structure).  I think this corresponds to a context where a symbol can be resolved (a stack frame within a locale), but I am not sure, maybe I'll come back to this.

Also: in the interpreter, the name 'a' typically refers to a verb's left argument, 'w' typically refers to its right argument and 'z' typically refers to its result.  (There are historical reasons for this.)

So this is saying that when we execute a function representing a dyadic verb, that verb will get three arguments: the context it is executing in, the verb's left argument, and the verb's right argument.

So... next, line 705 of ja.h:

#define plus(x,y)                   jtplus(jt,(x),(y))

Here, we see something that looks like a definition for plus.  And notice how it's following the 'F2' pattern -- jtplus could be a valid argument for F2 -- so this looks promising for a definition for addition.

And, indeed, on line 489 of je.h:

extern F2(jtplus);

So this is the C extern, that various C modules are going to need to have access to, if they are going to be able to refer to jtplus.  Also, we have a cpp macro named 'plus' that does the J dyadic addition thing.  This is not necessarily the angle we will be coming in at, when we execute +/ but hopefully some of the patterns are becoming clear.  For example, we know know that jt is expected to be the name that locally refers to the context where a verb will be executed.

So, how does the definition of jtplus look?  It seems to be line 400 of va2.c:

F2(jtplus   ){R va2(a,w,CPLUS   );}

R is the C return statement.  We already know that F2(jtplus) is the C type signature for jtplus.  So, we just need va2 and CPLUS to understand this statement.

CPLUS is simple, it's on line 52 of jc.h

#define CPLUS      '+'          /*  43 053 2b                              */

The comment gives the numeric value of that character, in decimal, octal and hexidecimal.

va2 is defined on line 1074 of ja.h and looks like this:

#define va2(x,y,z)                  jtva2(jt,(x),(y),(z))

We know that z will be '+' and x and y will be left and right verb arguments

jtva2 has a big definition:


static A jtva2(J jt,A a,A w,C id){A z;B b,c,sp=0;C*av,*wv,*zv;I acr,af,ak,an,ar,*as,at,cv,f,m,
     mf,n,nf,*oq=jt->rank,r,*s,*sf,t,wcr,wf,wk,wn,wr,*ws,wt,zk,zn,zt;VF ado;
 RZ(a&&w);
 an=AN(a); ar=AR(a); as=AS(a); at=an?AT(a):B01; 
 wn=AN(w); wr=AR(w); ws=AS(w); wt=wn?AT(w):B01;
 if(id==CEXP&&1==wn&&FL&wt&&0.5==*DAV(w))R sqroot(a);
 if(SPARSE&at){sp=1; at=DTYPE(at);}
 if(SPARSE&wt){sp=1; wt=DTYPE(wt);}
 RZ(var(id,a,w,at,wt,&ado,&cv)); zt=rtype(cv); t=atype(cv);
 if(t&&!sp){
  b=1&&t&XNUM;
  if(t!=at)RZ(a=b?xcvt(cv&VXEQ?XMEXMT:cv&VXFC?XMFLR:cv&VXCF?XMCEIL:XMEXACT,a):cvt(t,a));
  if(t!=wt)RZ(w=b?xcvt(cv&VXEQ?XMEXMT:cv&VXCF?XMFLR:cv&VXFC?XMCEIL:XMEXACT,w):cvt(t,w));
 }
 if(jt->rank){I acn,q,wcn,zcn;
  r=jt->rank[0]; acr=MIN(ar,r); af=ar-acr; acn=prod(acr,as+af);
  r=jt->rank[1]; wcr=MIN(wr,r); wf=wr-wcr; wcn=prod(wcr,ws+wf); jt->rank=0;
  ASSERT(!ICMP(as,ws,MIN(af,wf))&&!ICMP(as+af,ws+wf,MIN(acr,wcr)),EVLENGTH);
  c=af<=wf; f=c?wf:af; q=c?af:wf; sf=c?ws:as;
  b=acr<=wcr; zcn=b?wcn:acn; m=b?acn:wcn; n=m?zcn/m:0; r=b?wcr:acr; s=b?ws+wf:as+af;
  if(zcn){RE(mf=prod(q,sf)); RE(nf=prod(f-q,q+sf));}else mf=nf=0;
  if(!sp){RE(zn=mult(mf,mult(nf,zcn))); zk=zcn*bp(zt); ak=acn*bp(AT(a)); wk=wcn*bp(AT(w));}
 }else{
  ASSERT(!ICMP(as,ws,MIN(ar,wr)),EVLENGTH);
  ak=wk=zk=af=wf=f=c=0; acr=ar; wcr=wr; sf=0; mf=nf=1;
  b=ar<=wr; zn=b?wn:an; m=b?an:wn; n=m?zn/m:0; r=b?wr:ar; s=b?ws:as;
 }
 if(sp){z=vasp(a,w,id,ado,cv,t,zt,af,acr,wf,wcr,f,r); if(!jt->jerr)R z;}
 else{
  GA(z,zt,zn,f+r,sf); ICPY(f+AS(z),s,r); 
  if(!zn)R z;
  av=CAV(a); wv=CAV(w); zv=CAV(z);
  if(1==nf) DO(mf,        ado(jt,b,m,n,zv,av,wv); zv+=zk; av+=ak; wv+=wk;)
  else if(c)DO(mf, DO(nf, ado(jt,b,m,n,zv,av,wv); zv+=zk;         wv+=wk;); av+=ak;)
  else      DO(mf, DO(nf, ado(jt,b,m,n,zv,av,wv); zv+=zk; av+=ak;        ); wv+=wk;);
  if(!jt->jerr)R cv&VRI+VRD?cvz(cv,z):z;
 }
 R NEVM<jt->jerr?(jt->rank=oq,va2(a,w,id)):0;
}    /* scalar fn primitive and f"r main control */

I have emphasized the places in this code which care about the fact that we are doing addition, instead of something else (bold, by itself, was almost invisible so I also used italics and I increased font size).  This looks like a workhorse to handle all scalar primitives.  And, it looks recursive, at the very bottom, which I think is how it handles overflow -- if a result overflows, the routine can derive new argument types and call itself again.  Interesting (not completely structural) names here are the ones that care that we are dealing with CPLUS as opposed to something else:

var
vasp
va2 (a reference back to this routine).

var gets defined on line 1078 of ja.h:

#define var(x0,x1,x2,x3,x4,x5,x6)   jtvar(jt,(x0),(x1),(x2),(x3),(x4),(x5),(x6))

so it needs a reference to the J interpreter context.  Also.. three extra arguments (beyond jt and the three arguments provided to va2()) -- that's probably rank handling (and peeking ahead at the definition of jtvar: yes, it is).

vasp gets defined a couple lines down, on line 1080 of ja.h:

#define vasp(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12)     jtvasp(jt,(x0),(x1),(x2),(x3),(x4),(x5),(x6),(x7),(x8),(x9),(x10),(x11),(x12))

This one turns out to be code for dealing with sparse data.  Interesting, but too far afield for what we are doing right now.

So let's look at jtvar:

B jtvar(J jt,C id,A a,A w,I at,I wt,VF*ado,I*cv){B b;I j,t,x;VA*q;VA2 p;
 if(jt->jerr){
  switch(VARCASE(jt->jerr,id)){
   default:      R 0;
   case VARCASE(EWIMAG,CCIRCLE ): *ado=(VF)cirZZ;   *cv=VZ+VZZ+VRD; break;
   case VARCASE(EWIMAG,CEXP    ): *ado=(VF)powZZ;   *cv=VZ+VZZ+VRD; break;
   case VARCASE(EWIRR ,CBANG   ): *ado=(VF)binDD;   *cv=VD+VDD;     break;
   case VARCASE(EWIRR ,CEXP    ): *ado=(VF)powDD;   *cv=VD+VDD;     break;
   case VARCASE(EWRAT ,CDIV    ): *ado=(VF)divQQ;   *cv=VQ+VQQ;     break;
   case VARCASE(EWRAT ,CEXP    ): *ado=(VF)powQQ;   *cv=VQ+VQQ;     break;
   case VARCASE(EWDIV0,CDIV    ): *ado=(VF)divDD;   *cv=VD+VDD;     break;
   case VARCASE(EWOV  ,CPLUS   ): *ado=(VF)plusIO;  *cv=VD+VII;     break;
   case VARCASE(EWOV  ,CMINUS  ): *ado=(VF)minusIO; *cv=VD+VII;     break;
   case VARCASE(EWOV  ,CSTAR   ): *ado=(VF)tymesIO; *cv=VD+VII;     break;
   case VARCASE(EWOV  ,CPLUSDOT): *ado=(VF)gcdIO;   *cv=VD+VII;     break;
   case VARCASE(EWOV  ,CSTARDOT): *ado=(VF)lcmIO;   *cv=VD+VII;     break;
   case VARCASE(EWOV  ,CSTILE  ): *ado=(VF)remDD;   *cv=VD+VDD;     break;
  }
  RESETERR;
 }else if(at&NUMERIC&&wt&NUMERIC){
  t=at|wt; b=1&&t&RAT+XNUM+XD+XZ;
  j=t&CMPX ? 9 : b ? (t&XZ?13:t&XD?12:t&FL?8:t&RAT?11:10) : 
      (at&B01?0:at&INT?3:6)+(wt&B01?0:wt&INT?1:2);
  q=va+vaptr[(UC)id];
  p=(q->p2)[j];
  *ado=p.f; *cv=x=p.cv; if(b&&t&FL&&!(x&VZZ))*cv+=VDD;
 }else{
  b=!HOMO(at,wt); *cv=VB;
  jt->rela=ARELATIVE(a)*(I)a;
  jt->relw=ARELATIVE(w)*(I)w;
  switch(id){
   case CEQ: *ado=b?(VF)zeroF:at&SBT?(VF)eqII:at&BOX?(VF)eqAA:
                  at&LIT?(wt&LIT?(VF)eqCC:(VF)eqCS):wt&LIT?(VF)eqSC:(VF)eqSS; break;
   case CNE: *ado=b?(VF) oneF:at&SBT?(VF)neII:at&BOX?(VF)neAA:
                  at&LIT?(wt&LIT?(VF)neCC:(VF)neCS):wt&LIT?(VF)neSC:(VF)neSS; break;
   default:
    ASSERT(at&SBT&&wt&SBT,EVDOMAIN);
    q=va+vaptr[(UC)id]; p=(q->p2)[12];
    ASSERT(p.f,EVDOMAIN);
    *ado=p.f; *cv=x=p.cv;
 }}
 R 1;
}    /* function and control for rank */

First it has a bunch of error reporting, then it checks if both arguments are numeric and ... 

  q=va+vaptr[(UC)id]; 

Here, vaptr is a table with this comment:  /* index in va[] for each ID */

And the va table itself contains this:

/* 2b +  */ {
 {{plusBB,VI    }, {plusII,VI+VII}, {plusBD,VD}, 
  {plusII,VI+VII}, {plusII,VI    }, {plusID,VD}, 
  {plusDB,VD    }, {plusDI,VD    }, {plusDD,VD}, 
  {plusZZ,VZ+VZZ}, {plusXX,VX+VXX}, {plusQQ,VQ+VQQ}, {0,0}},
 {{plusinsB,VI}, {plusinsI,VI}, {plusinsD,VD}, {plusinsZ,VZ}, {0,0},         {0,0},         {0,0}},
 {{pluspfxB,VI}, {pluspfxI,VI}, {pluspfxD,VD}, {pluspfxZ,VZ}, {pluspfxX,VX}, {pluspfxQ,VQ}, {0,0}},
 {{plussfxB,VI}, {plussfxI,VI}, {plussfxD,VD}, {plussfxZ,VZ}, {plussfxX,VX}, {plussfxQ,VQ}, {0,0}} },

... so it looks like we are doing a table lookup based on operation and type -- C gets a bit obscure for multi-dimensional arrays -- but ultimately, we are going to be running one of these "typed functions".  J doesn't want to have to decide which one until it's inspected the data, so that's why we just refer to the operation by name until after we are down in the weeds.  (And we can imagine that each of these names is a C macro and really a reference to jtplusBB or jtplusXX or whatever else.)

And, another thing is that this table lookup lets us share common code with all "scalar atomic functions" -- most of it is "boilerplate work", and at the last moment we dispatch to the appropriate handler.  We could easily be using an analogous pattern when we get into adverb evaluation.

So.. that's our first verb.

-----------------

For our adverb, let's start on line 785 of ja.h:

reduce(x,y)                 jtreduce(jt,(x),(y))

This jt stuff should be a familiar pattern by now.  Here, x is probably the verb doing the reducing and y is probably the noun being reduced.  (x and y being newer names that adopted the same role as the previous a and w.)

Looking for references to jtreduce, I see three of them in ar.c:

line 12:
static DF1(jtreduce);

line 442 (the definition of jtreduce)

line 563 (the definition of jtslash)

Taking a step back:  in J, we have two steps in using +/ 1 2 3.  Our first step was to derive a verb (monadic definition, dyadic definition, rank).  That's probably what jtslash is doing.   Our second step was to derive a noun.  That's probably what jtreduce is doing.

Taking a step back further, all references to reduce are in ar.c (aside from that definition in ja.h).  So I suppose we can get away with a static rather than an extern in a header file, for jtreduce.  

So... looking at jtslash:

F1(jtslash){A h;AF f1=jtreduce;C c;V*v;
 RZ(w);
 if(NOUN&AT(w))R evger(w,sc(GINSERT));
 v=VAV(w); 
 switch(v->id){
  case CCOMMA:  f1=jtredcat;    break;
  case CCOMDOT: f1=jtredstitch; break;
  case CSEMICO: f1=jtredsemi;   break;
  case CUNDER:  if(COPE==ID(v->g)){c=ID(v->f); if(c==CCOMMA)f1=jtredcateach; else if(c==CCOMDOT)f1=jtredstiteach;}
 }
 RZ(h=qq(w,v2(lr(w),RMAX)));
 R fdef(CSLASH,VERB, f1,jtoprod, w,0L,h, 0L, RMAX,RMAX,RMAX);
}

it has some special code for a few operations, otherwise it defines a verb (using fdef).  The monadic definition is fl which will be jtreduce if special code was not used.  Here, w is probably our plus verb.

But let's look at fdef for a moment, or rather jtfdef:

A jtfdef(J jt,C id,I t,AF f1,AF f2,A fs,A gs,A hs,I flag,I m,I l,I r){A z;V*v;
 RE(0);
 GA(z,t,1,0,0); v=VAV(z);
 v->f1    =f1?f1:jtdomainerr1;  /* monad C function */
 v->f2    =f2?f2:jtdomainerr2;  /* dyad  C function */
 v->f     =fs;                  /* monad            */
 v->g     =gs;                  /* dyad             */      
 v->h     =hs;                  /* fork right tine or other auxiliary stuff */
 v->flag  =flag;
 v->mr    =m;                   /* monadic rank     */
 v->lr    =l;                   /* left    rank     */
 v->rr    =r;                   /* right   rank     */
 v->fdep  =0;                   /* function depth   */
 v->id    =id;                  /* spelling         */
 R z;
}

Yep -- looks like struct stuffing.

So, anyways, when we evaluate this thing, we expect that it will be under control of jtreduce, which looks like this:

static DF1(jtreduce){A z;C id;I c,cv,f,m,n,r,rr[2],t,wn,wr,*ws,wt,zt;VF ado;
 RZ(w);
 wr=AR(w); r=jt->rank?jt->rank[1]:wr; f=wr-r;
 wn=AN(w); ws=AS(w); n=r?ws[f]:1;
 if(SPARSE&AT(w))R reducesp(w,self);
 wt=AT(w); wt=wn?wt:B01;
 id=vaid(VAV(self)->f);
 switch(n){
  case 0: R red0(w,self);
  case 1: R tail(w);
  case 2: RZ(reduce2(w,id,f,r,&z)); if(z)R z;
 }
 vains(id,wt,&ado,&cv);
 if(!ado)R redg(w,self);
 zt=rtype(cv); jt->rank=0;
 GA(z,zt,wn/n,MAX(0,wr-1),ws); if(1<r)ICPY(f+AS(z),f+1+ws,r-1);
 if((t=atype(cv))&&t!=wt)RZ(w=cvt(t,w));
 m=prod(f,ws); c=m?wn/m:prod(r,f+ws);
 ado(jt,m,c,n,AV(z),AV(w));
 if(jt->jerr)R jt->jerr==EWOV?(rr[1]=r,jt->rank=rr,reduce(w,self)):0; else R cv&VRI+VRD?cvz(cv,z):z;
}    /* f/"r w main control */

here we see the usual mismash of C type checking and sanity checking, That switch statement looks like it's based off of the length of the data.  Length zero data gets one handling, length 1 data gets another handling, and length 2 gets the full "reduce" handling.  So jtreduce2 is going to be doing the heavy lifting, and jtreduce is dealing with the special cases.  Probably...

So, question: is this kind of analysis worth continuing?  Should I delve deeper into any of the details here?

2 comments:

  1. Ok, thank you, I will keep this interest in mind.

    I probably will not have time this year, though -- I've been busy working on the update to www.usatoday.com (and bug fixing! and figuring out what my coworkers have done, and other forms of buck stopping) and there's a slightly related project coming up for me, and I'm behind on my online class at http://www.cs.brown.edu/courses/cs173/2012/OnLine/

    ReplyDelete