PCA LANGUAGES AND COMPILATION

The Morphware Stable Interface is composed of several novel elements.  These include the use of streaming languages, a two-stage compilation with a virtual machine middle layer, and metadata to describe the target architecture and guide high-level compiler optimization.  The following subsections provide more detail on each of these elements.

Streaming Algorithms and Languages

As discussed in the last section, high-performance streaming sensor applications are of significant interest to the PCA program.  Applications in this form can be mapped efficiently to physical resources in several ways; for instance, kernels may be multiplexed in space or in time.  Multiplexing in space statically assigns each kernel in the graph to a disjoint set of physical resources.  In this scheme, each kernel must be assigned enough resources that its throughput matches the input rate.  In time multiplexing, kernels take turns using all of the system resources to process a block of input data; this works as long as the system can rotate through all of the kernels quickly enough to keep up with the input data.  Hybrid combinations are also possible.
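To make the space/time trade-off concrete, the following Python sketch checks whether a kernel chain keeps up with the input rate under each scheme. The cost model is an assumption made for illustration: each kernel is characterized only by a hypothetical number of operations per input item, every resource unit delivers a fixed number of operations per second, and time multiplexing pays a fixed switch overhead per kernel on every rotation. The kernel names and the functions `space_units_needed`, `space_feasible`, and `time_feasible` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    ops_per_item: float  # assumed cost model: operations per input item


def space_units_needed(kernels, input_rate, ops_per_unit):
    """Space multiplexing: each kernel gets its own whole resource units,
    enough that units * ops_per_unit >= ops_per_item * input_rate."""
    return {k.name: math.ceil(k.ops_per_item * input_rate / ops_per_unit)
            for k in kernels}


def space_feasible(kernels, input_rate, ops_per_unit, total_units):
    """True if the disjoint per-kernel allocations fit on the device."""
    needed = space_units_needed(kernels, input_rate, ops_per_unit)
    return sum(needed.values()) <= total_units


def time_feasible(kernels, input_rate, ops_per_unit, total_units,
                  block_size, switch_overhead):
    """Time multiplexing: each kernel in turn uses all units on one block.
    A full rotation (compute time plus an assumed per-kernel switch cost)
    must finish before the next block has arrived."""
    block_arrival_time = block_size / input_rate
    rotation_time = sum(
        k.ops_per_item * block_size / (total_units * ops_per_unit) + switch_overhead
        for k in kernels)
    return rotation_time <= block_arrival_time


chain = [Kernel("fir", 100), Kernel("fft", 50), Kernel("detect", 10)]
print(space_units_needed(chain, 1e6, 4e7))   # {'fir': 3, 'fft': 2, 'detect': 1}
print(space_feasible(chain, 1e6, 4e7, 5))    # False: six units needed
print(time_feasible(chain, 1e6, 4e7, 5, block_size=1000, switch_overhead=5e-5))  # True
```

In this example the chain's aggregate demand is exactly four units' worth of work, yet space multiplexing needs six units because each kernel's allocation is rounded up to whole units. Time multiplexing fits on five units as long as the switch overhead stays small; with a switch overhead of 2e-4 s, one rotation takes 1.4 ms against a 1 ms block arrival time, and the check fails.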

Traditional general-purpose languages hide many opportunities for partitioning a streaming computation from compilers and linkers.  Streaming languages map efficiently onto PCA hardware because they express data parallelism and kernel structure directly in the program source.  Compilers can then determine which data may be kept local to the compute resources and streamed directly between logic units through internal buffers, greatly improving resource utilization, power efficiency, and throughput.  Streaming languages and their associated methodology are an integral part of the MSI.
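As a software analogy only, Python generators can mimic kernels connected by internal buffers: each item passes through the whole chain before the next item is read, whereas an array-at-a-time version builds the full intermediate result between kernels. The kernels `scale` and `offset` are hypothetical, and the event log exists only to expose the firing order.

```python
def scale(stream, factor, log):
    """Hypothetical kernel: consumes one item and produces one item."""
    for x in stream:
        log.append(("scale", x))
        yield x * factor


def offset(stream, bias, log):
    """Hypothetical kernel fed directly by the previous kernel's output."""
    for x in stream:
        log.append(("offset", x))
        yield x + bias


def run_streamed(samples):
    """Chain the generators: each item flows through both kernels before
    the next item is read, so no full intermediate array is built."""
    log = []
    out = list(offset(scale(samples, 2, log), 1, log))
    return out, log


def run_materialized(samples):
    """Array-at-a-time: the whole intermediate result is stored between
    kernels (in hardware, typically written to and re-read from memory)."""
    log = []
    intermediate = list(scale(samples, 2, log))
    out = list(offset(intermediate, 1, log))
    return out, log, len(intermediate)
```

Both versions return `[1, 3, 5]` for `range(3)`, but the streamed log interleaves the kernels (`scale` on item 0, then `offset` on its result, then `scale` on item 1, and so on), while the materialized version runs `scale` over every item before `offset` starts and holds a three-item intermediate list.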

Because of the potential efficiency of streaming on PCA systems, several of the PCA hardware teams are developing new stream-based languages to help exploit the capabilities of their hardware.  Two of these, Brook and StreamIt, are currently supported by the MSI.  Although StreamIt and Brook both use streams and kernels to process data, they differ in several significant ways.  StreamIt represents a program as a single stream graph that operates on a conceptually infinite stream, while Brook supports multiple stream graphs that operate on finite streams and are controlled from a pointer-less subset of C.  Kernels in StreamIt must have static input and output rates, while Brook kernels can have dynamic rates.  StreamIt kernels can have internal state that is preserved between invocations, while Brook kernels must be stateless; also, StreamIt kernels can peek at input items without popping them from the stream, while Brook kernels must pop any items that they inspect.  Finally, StreamIt stream graphs are composed of hierarchical units, each of which has a single input stream and a single output stream, while Brook supports a flat graph of kernels, each of which can have multiple input streams and multiple output streams.  StreamIt and Brook are described in full in their respective specifications [9,10].  Rudimentary examples of each are given in the Appendix of this document.  More substantial examples of Brook code are included in Reservoir’s R-Stream 1.0 high-level compiler release [11].
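The execution rules contrasted above can be modelled in a few lines of Python. This is a sketch of the semantics only, not StreamIt or Brook syntax: `MovingAverage` behaves like a StreamIt filter with static rates (peek 3, pop 1, push 1) and state that survives between firings, while `add_kernel` behaves like a Brook kernel that is stateless and pops exactly one element from each of two finite input streams. All names are hypothetical.

```python
class MovingAverage:
    """StreamIt-style filter: static rates (peek 3, pop 1, push 1) and
    internal state that is preserved between firings."""
    PEEK, POP, PUSH = 3, 1, 1

    def __init__(self):
        self.firings = 0  # state carried across invocations

    def work(self, window):
        # 'window' holds PEEK items; the driver consumes only POP of them.
        self.firings += 1
        return [sum(window) / len(window)]  # PUSH == 1 output item


def run_streamit_filter(flt, items):
    """Fire the filter while enough input remains to satisfy its peek rate."""
    items = list(items)
    out, pos = [], 0
    while pos + flt.PEEK <= len(items):
        out.extend(flt.work(items[pos:pos + flt.PEEK]))
        pos += flt.POP  # advance by the pop rate, not the peek rate
    return out


def add_kernel(a, b):
    """Brook-style kernel: stateless; one element popped from each input."""
    return a + b


def run_brook_kernel(kernel, *streams):
    """Apply a stateless kernel element-wise over finite input streams."""
    if len({len(s) for s in streams}) != 1:
        raise ValueError("input streams must have the same length")
    return [kernel(*elems) for elems in zip(*streams)]
```

For input `[1, 2, 3, 4, 5]` the filter fires three times and produces `[2.0, 3.0, 4.0]`; because it peeks three items but pops only one, consecutive windows overlap. Under the rules stated above, a Brook kernel would instead pop every item it inspects, as `add_kernel` does when it turns `[1, 2, 3]` and `[10, 20, 30]` into `[11, 22, 33]`.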

Two Stage Compilation

The implied abstract machine model on which traditional programming languages such as C are built has become an increasingly poor match to modern processor architectures.  Greater use of multiple processors, multiple independent process flows within a processor, complex memory hierarchies, and many isolated computing systems cooperating on a single application (clusters and distributed computing) have all made real hardware diverge significantly from the computational engine for which these languages were designed.  Many approaches have been taken to preserve the usability of traditional languages while exploiting the emerging capabilities of modern computers, including domain-specific middleware, explicit communication libraries, special annotations inserted into source code to serve as compiler hints, and a host of general guidelines for structuring programs to best suit particular platforms [12].  Each of these approaches has had some success in exploiting new technology, but each has produced software that is tightly coupled to a particular deployment platform and often requires significant human effort to maximize performance for that configuration.

With a finite but growing set of source languages, APIs, and PCA target platforms, each new source language or API requires an application development toolset for every PCA target platform, and each new PCA platform requires a development toolset for every supported source language and API.  The resulting proliferation of toolsets is depicted in Figure 5.
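The growth in Figure 5 follows from simple counting. Assuming one toolset per language-platform pair, with M source languages and APIs and N target platforms:

$$T_{\text{direct}}(M, N) = M \cdot N, \qquad T_{\text{direct}}(M+1, N) - T_{\text{direct}}(M, N) = N, \qquad T_{\text{direct}}(M, N+1) - T_{\text{direct}}(M, N) = M$$

For example, the four source languages named in this section (C, C++, StreamIt, and Brook) targeting a hypothetical five platforms would need 4 · 5 = 20 toolsets, and a sixth platform would add four more.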

 

Figure 5.  Multiple source languages and multiple hardware targets require development of many different software tool sets.

The approach of building multiple independent compilers and build tools is not efficient for PCA platforms, whose configurations can change frequently.  The build-chain tools must be portable, must be able to deploy an application onto a wide variety of platform configurations, and must be able to select the optimal hardware/software configuration for a particular task and set of constraints.  To improve portability and speed the deployment of new PCA devices and PCA programming languages, the Morphware Forum recognized the need for the MSI to express explicitly the abstractions common to PCA processors.  In particular, since the source languages and Application Programming Interfaces (APIs) developed for PCA architectures (e.g., C, C++, StreamIt, Brook) target similar implicit abstract machines, and since PCA platforms share common abstract characteristics, it is natural to introduce a portability layer between these APIs and the PCA hardware that encapsulates the common abstractions.

In the MSI, this portability layer is the Stable Architecture Abstraction Layer (SAAL).  The SAAL is a set of portable APIs that encapsulate abstractions of the computing resources present in PCA devices, as well as the operations on those resources used by the MSI source languages and APIs.  It abstracts and simplifies PCA hardware for the source languages, and gives PCA hardware developers a consistent set of abstract resource types and functional support requirements.  The SAAL also simplifies the deployment of new PCA platforms and new MSI source languages and APIs by providing a single common target for each.  A new language or API need only provide a mapping to the SAAL, rather than build tools for every possible target platform, and a new PCA platform need only supply a compiler for the SAAL to work with all of the existing source languages and APIs.  This simplified toolset architecture is depicted in Figure 6.
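The single-common-target idea can be sketched as two independent registries: front ends that lower a source language to the SAAL, and back ends that compile the SAAL for one platform. Any language then reaches any platform by composition. The language keys follow this section, but the platform names, the tagged-tuple stand-in for SAAL code, and all function names are hypothetical.

```python
from itertools import product

LANGUAGES = ["C", "StreamIt", "Brook"]
PLATFORMS = ["platform_a", "platform_b", "platform_c"]  # hypothetical targets


def make_front_end(language):
    """Hypothetical front end: lowers one source language to SAAL-level
    code, modelled here as a tagged tuple."""
    def lower(source):
        return ("saal", language, source)
    return lower


def make_back_end(platform):
    """Hypothetical back end: compiles SAAL-level code for one platform."""
    def compile_saal(saal_code):
        return ("binary", platform, saal_code)
    return compile_saal


FRONT_ENDS = {lang: make_front_end(lang) for lang in LANGUAGES}
BACK_ENDS = {plat: make_back_end(plat) for plat in PLATFORMS}


def build(language, platform, source):
    """Any language reaches any platform by composing two independent tools."""
    return BACK_ENDS[platform](FRONT_ENDS[language](source))


def build_all(source):
    """Build a (toy) source for every (language, platform) pair."""
    return {(lang, plat): build(lang, plat, source)
            for lang, plat in product(FRONT_ENDS, BACK_ENDS)}


def tool_counts():
    """Compare one toolset per pair with one component per language/platform."""
    m, n = len(FRONT_ENDS), len(BACK_ENDS)
    return {"direct_toolsets": m * n, "saal_components": m + n}
```

With three front ends and three back ends, six components cover all nine language-platform pairs; adding a fourth language would add one front end rather than three new toolsets.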

 

Figure 6.  Introduction of a virtual machine layer reduces the number of software tools required.

The MSI framework thus uses two separate compilation steps.  The source languages and APIs are collectively referred to as the Stable Application Programming Interface (SAPI) layer.  Top-level input at the SAPI level is processed by the high-level compiler (HLC) appropriate to the source language(s) used.  The HLC outputs SAAL code, which is the input to the low-level compiler (LLC) for the chosen target PCA platform.  At the time of writing, the first alpha version of a PCA high-level compiler, Reservoir Inc.’s R-Stream, has been released [11].
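The division of labor between the two compilers can be illustrated with a toy pipeline. In this sketch the high-level stage parses a hypothetical one-line pipeline description into platform-independent kernels and stream edges, standing in for SAAL-level code, and the low-level stage maps that same program onto a target with a given number of processing tiles. The `SaalProgram` type, the `->` syntax, and the round-robin placement are illustrative assumptions, not the behavior of R-Stream or any other MSI tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaalProgram:
    """Hypothetical stand-in for SAAL-level code: kernels and stream edges."""
    kernels: tuple
    edges: tuple


def high_level_compile(source):
    """Stage 1 (HLC), platform-independent: parse a toy pipeline such as
    'fir -> fft -> detect' into kernels and producer/consumer edges."""
    kernels = tuple(name.strip() for name in source.split("->"))
    edges = tuple(zip(kernels, kernels[1:]))
    return SaalProgram(kernels, edges)


def low_level_compile(program, num_tiles):
    """Stage 2 (LLC), platform-specific: place each kernel on one of the
    target's processing tiles (round-robin, for illustration only)."""
    if num_tiles < 1:
        raise ValueError("target must have at least one tile")
    return {kernel: i % num_tiles for i, kernel in enumerate(program.kernels)}


program = high_level_compile("fir -> fft -> detect")
print(low_level_compile(program, num_tiles=4))  # {'fir': 0, 'fft': 1, 'detect': 2}
print(low_level_compile(program, num_tiles=2))  # {'fir': 0, 'fft': 1, 'detect': 0}
```

The same `SaalProgram` is handed to low-level compilers for two different targets without rerunning the high-level stage: on the four-tile target each kernel gets its own tile, while on the two-tile target `fir` and `detect` share tile 0.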
