notes-computer-programming-programmingLanguageDesign-kytheNotes

Kythe

See also Kythe section in [1].

The Kythe schema

http://www.kythe.io/docs/schema/

The Kythe schema defines node and edge types for a graph in which some nodes correspond to nodes in the AST of the program, and other nodes represent other relevant conceptual entities such as types defined by the program and semantic relations between these types.

A notation for distinguishing nodes in the Kythe representation vs nodes in the AST

I'm unclear as to whether this notation is formally part of Kythe, or is just used in the comments on the web page http://www.kythe.io/docs/schema/ to explain Kythe, but in any case it's useful to understand if you are reading that webpage.

Uppercase is used to denote identifiers for specific nodes in the Kythe representation. Lowercase is used to denote Kythe 'keywords', that is, names of node and edge types which are defined in the Kythe schema.

The '@' character is used as a prefix (prefix sigil) to denote an 'anchor', which is a node representing a particular occurrence of a substring within the source code (by convention, the rest of the text of an anchor identifer after the '@' is the text of the token, that is, the string corresponding to the anchor in the source code; if the anchor contains spaces then this is enclosed by double quotes). Each anchor is associated with a 'start' and 'end' byte offset to show where it is in the source code.

The 'defines' edge type relates an anchor node (a string the source code) to a non-anchor node in the Kythe representation ('semantic node'), to show which part of the source code contains the definition of the concept denoted by the semantic node.

Note that in Kythe, the anchor defining some semantic operation such as a declaration is taken to be, not the whole string of the source code announcing or accomplishing the declaration, but rather just the identifier within the declaration; for example, in C++, if a typedef if declared by "typedef int Counter;", the type alias node corresponding to this semantic operation is said to be defined at the substring 'Counter', not at the entire substring 'typedef int Counter;'.

For example, if we have the C++ statement:

typedef int Counter;

we might write

@Counter defines TAlias

to indicate an edge of edgekind 'defines' between anchor node '@Counter' and a node named 'TAlias'. Anchor node 'Counter' refers to the occurrence of the string 'Counter' in the source code. We might then write:

TAlias aliases TInt

to indicate an edge of edgekind 'aliases' between node 'TAlias' and node 'TInt'.

In this example, node TAlias is a 'type alias node' (nodekind 'talias'), which represents the aliasing of type 'Counter' to type 'int'.

To refer to an attribute of a node ('fact'), we use the syntax 'nodename.attributename'. Some attributes are themselves groupings containing one or more subattributes. To refer to a subattribute within an attribute, we use 'nodename.attributename/subattributename'. The attribute 'node' contains subattributes for metadata about a node such as its kind. To assert that an attribute has a value, we use the syntax 'nodename.attributename value'. For example, in the above example, we said that TAlias had nodekind 'talias'. This assertion would be written:

TAlias.node/kind talias

Nodekinds and edgekinds may have subkinds; the name of the subkind is of the form 'parentkind/subkind'. For example, the edgekind 'completes' has a subkind 'completes/uniquely'.

The syntax

!{ A someedgekind B}

means that there is not an edge from A to B of kind someedgekind.

Kythe edge kinds

Notes regarding instantiation and specialization of templates

In Kythe, some template type declarations (eg in C++, a type declaration starting with "template<typename T, typename S>") are related by a 'specialization' relationship, which is a form of subtyping. A 'specific' template is said to specialize a primary template if, for example, the primary type declaration start with "template<typename T, typename S>" and the specific one is declaring the same identifier except it binds 'float' to type parameter T and leaves S unbound; in this case the second declaration is said to be a specialization of the first.

Template type declarations which are not specializations of any other template type declaration are called 'primary' and the ones which are specializations of some primary template are called 'specific'. Kythe does not explicitly record the specialization relationship between specific template type declarations, rather it is only concerned with tracking the relationship between a primary template and each of its specific templates.

What Kythe does record is: when a template T is instantiated to a concrete type C by binding types to all type parameters in T, then Kythe records (a) which template T was instantiated, and what the bindings were for the type parameters in T, and (b) which corresponding primary template P was instantiated (eg the primary template P of which T was a specialization), and what the bindings were for all of the type parameters in P (ie both the type parameters given in (a), but also any type parameters bound in the specialization from P to T). (a) is called an 'instantiates' relationship, and (b) is called a 'specializes' relationship.

For example, consider the following C++ code:

template <typename T, typename S, typename V> T v;
template <typename U> U v<int, U, long>;
float w = v<int, float, long>;

Kythe sees two templates declaring identifier 'v', followed by a declaration for identifier 'w' that initializes 'w' by setting it to 'v' (todo?). One the first line we see the first template, which has three type parameters, T, S, and V. This template is a primary template. The second line contains the second template, which specializes the primary template on the first line by binding 'int' to T and 'long' to V, and binding a new type parameter U to S. This template is a specific template which specializes the primary template in the first line. The third line references a concrete type by starting with the second template and binding 'float' to U. The concrete type in the third line is said to instantiate the specific template in the second line, with the binding of 'float' to U; it is also said to specialize the primary template in the first line, with the bindings of 'int' to T, 'float' to S, and 'long' to V.

Kythe node kinds

todo