Create your own Python

Model-driven development is becoming more popular, even in the automotive industry. Martin Karlsch, who is graduating as Master of Science in Software Engineering at the Hasso-Plattner-Institute in Potsdam, recently published his master's thesis on a model-driven framework for domain-specific languages (DSLs). He demonstrates it with a Python-based test automation language that he created using frodo, his prototype implementation of such a framework.

DSLs are designed for implementing special-purpose software. Unlike general-purpose languages such as Python, DSLs are strictly focused on particular and often very complex areas of application, such as pharmacy, genetics, aerospace, or automotive. Let's have a look at the latter. The complexity of software systems in modern automobiles is increasing significantly, and not only in those of German brands such as BMW, Mercedes or Audi. Most innovations in cars are software-related. These software components often must fulfill many requirements (e.g. safety regulations, real-time processing) and are integrated with plenty of other components in a broad variety of vehicle models. I think there is no driver who has not heard of at least one mass car recall due to severe software-related defects in the past few years.

So manufacturers have started to address the challenges that come with this development and are looking for ways to strongly improve testing and software quality management overall. For example, for test scenarios covering the integration of software components with other components in a prototype car, there is a need for a domain-specific language in which engineers can easily specify the automatic tests they want to run, and probably also analyze the mass of information in the test reports. It is in the nature of DSLs that their user bases are rather small. Moreover, it is not unlikely that a company facing such complex and cost-intensive challenges will find that it needs to develop its own DSL.

Wait! ... What?

Anyone who has ever developed a programming language (I have never programmed one myself, but I nevertheless felt free to make sassy design decisions) will know that this is anything but an easy task.

As I understand Martin Karlsch's master's thesis, he has managed to implement a prototype of such a framework. It builds on the existing Python grammar and constructs, which domain experts and software engineers can extend, reduce or modify so that the language is tailored to the target domain. These specifications are defined in domain meta models, syntax views with textual representations, and semantic mappings. At the end of the process the domain experts have a special-purpose Python language available. Python gets children!

I like it. I like the idea of combining Python's clean elegance and dynamic OO strengths with domain-specific elements. Martin's framework can bring the delicious Python flavor to many more gourmets, very specialized gourmets indeed. And I wouldn't be surprised if, in the not-too-distant future, this model-driven approach became a popular alternative and add-on to the rich base of Python libraries and site-packages.



pyswarm is a Free Software tool for the model-driven development of Python applications with PostgreSQL databases. Future releases should support a wide range of UML tools. Since I think that UML is already well known in today's software industry, I want to provide an overview of the Model-Driven Architecture (MDA) and some related standards of the Object Management Group (OMG), and of how they may play a role with Python and with pyswarm in particular.

pyswarm SDK 0.7.1 is still a prototype. Currently the SDK supports UML 2.0 models stored in (MagicDraw) XMI 2.1 files, which are parsed and then directly transformed into custom target application code. For the 1.0 final release several major improvements have been suggested. Some of them are tightly coupled with MDA requirements or recommendations, such as model-to-model transformation.

OMG's Four Model Layers

A good starting point is OMG's view of the modeling domain. This view covers four modeling layers. M3 is the layer of meta-meta modeling. It does not only sound very abstract, it is the most abstract layer here. OMG's key standard in the M3 layer is the Meta Object Facility (MOF). MOF is used to describe meta models such as the Unified Modeling Language (UML), and even MOF itself. UML resides in the M2 layer (meta modeling) and is also an OMG standard. A model that is described in UML resides in the M1 layer. You can see two examples of such M1 layer models in the pyswarm documentation.

The fourth layer is M0, the most concrete layer. The OMG understands M0 as the data layer. M0 represents concrete instances of elements (e.g. objects of classes or records of database tables) that have been specified in a model in the M1 layer. For pyswarm SDK 0.7.1 this means that the generator reads a model of the M1 layer (such as the one stored in the PetStore.xml file), transforms it into another M1 layer model (this time specific to the pyswarm architecture) and generates Python and SQL code, which at run time will work with a concrete M0 layer model.
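The four layers can be loosely mirrored with Python's own object model. The following is only an analogy and not part of any OMG specification or of pyswarm; the names UmlClass, Pet and layer_chain are made up for illustration. Python's built-in type describes itself, much like MOF does on M3.

```python
# Loose analogy only: Python's object model is not MOF/UML.
# UmlClass, Pet and layer_chain are hypothetical names for this sketch.

class UmlClass(type):
    """Plays the role of an M2 element: the meta model element 'Class'."""


class Pet(metaclass=UmlClass):
    """Plays the role of an M1 element: a class in a user model."""

    def __init__(self, name):
        self.name = name


# Plays the role of an M0 element: a concrete instance at run time.
rex = Pet("Rex")


def layer_chain(obj):
    """Follow type() upwards until an object describes itself."""
    chain = [obj]
    while type(chain[-1]) is not chain[-1]:
        chain.append(type(chain[-1]))
    return chain


# Walks from the M0 stand-in up to the self-describing top (M3 stand-in).
for layer, element in zip(("M0", "M1", "M2", "M3"), layer_chain(rex)):
    print(layer, element)
```

The chain stops at type because type(type) is type, which is the closest Python equivalent to MOF being described by MOF.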

Model-Driven Architecture

Furthermore there is OMG's Model-Driven Architecture (MDA), a standard for model-driven software development that is heavily based on other OMG standards. Essentially MDA is about creating software by transformations from one (UML) model to another.

To do this, MDA suggests that a Computation Independent Model (CIM) captures the domain-relevant knowledge of domain experts, in order to give software experts information about the expected purpose, terms and limitations of a software system. In short, you can consider a CIM the visualized business model for which a software system is planned. It specifies purely business requirements and carries no technical meaning.

MDA also suggests that a CIM is manually transformed into a Platform-Independent Model (PIM). A PIM identifies entities from the CIM and specifies them in more technical terms, e.g. by defining business classes, operations, attributes and packages, but a PIM is still not specific to a particular platform or language. At least theoretically it should be possible to use a PIM for any other platform. Usually only some UML extensions have to be changed, e.g. the stereotypes that are applied to elements in the PIM.

The PIM is transformed into a Platform-Specific Model (PSM), which usually should happen automatically. You can expect the PSM to be more complex than the PIM, since it represents how the PIM is adapted to a particular platform. You can check the examples in the pyswarm SDK introduction to see the differences between the PIM and the resulting PSM.

Some MDA tools expose the resulting PSM and may even allow the user to change it before it is transformed into the so-called Implementation-Specific Model (ISM). The ISM is the implementation resulting from the PSM, that is, the generated code (in the case of pyswarm, Python and SQL), documentation, and other directories and files.

In order to improve tool usability MDA also suggests the use of Transformation Record Models (TRM) created during a model-to-model transformation. A TRM would show the mapping between elements of the source model and elements of the target model.
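A model-to-model transformation that also records its mappings could be sketched roughly as follows. This is a minimal illustration and not how pyswarm or any MDA tool actually works: PimClass, PsmTable, TYPE_MAP and transform are hypothetical names, and the type mapping and the eid key column are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PimClass:
    """Hypothetical platform-independent element: a business class."""
    name: str
    attributes: dict  # attribute name -> platform-independent type


@dataclass
class PsmTable:
    """Hypothetical platform-specific element: a database table."""
    name: str
    columns: dict  # column name -> SQL type


# Assumed mapping from platform-independent types to SQL types.
TYPE_MAP = {"string": "TEXT", "integer": "INTEGER", "boolean": "BOOLEAN"}


def transform(pim_classes):
    """Transform PIM classes into PSM tables and record every mapping."""
    tables = []
    record = []  # the transformation record: (source element, target element)
    for cls in pim_classes:
        table_name = cls.name.lower()
        # The platform adds a detail the PIM does not know: a key column.
        columns = {"eid": "SERIAL PRIMARY KEY"}
        record.append((cls.name, table_name))
        for attr, pim_type in cls.attributes.items():
            columns[attr] = TYPE_MAP[pim_type]
            record.append((f"{cls.name}.{attr}", f"{table_name}.{attr}"))
        tables.append(PsmTable(table_name, columns))
    return tables, record


tables, record = transform([PimClass("Pet", {"name": "string", "age": "integer"})])
for source, target in record:
    print(source, "->", target)
```

The record list is what a TRM would hold in this sketch: for every element of the source model it names the element of the target model it was mapped to.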


Software engineering is sometimes categorized by the order in which specification and implementation are produced during a project. By this criterion, three categories are commonly identified: forward engineering (you have a specification and use it to create an implementation), reverse engineering (you are in the pitiful situation of using an implementation to create its specification), and round-trip engineering for a complete development cycle in both directions. In this sense you can consider MDA mainly a forward engineering technique. MDA does not require transformations from an ISM to a PSM or from a PSM to a PIM, nor does it encourage you to perform them. MDA also does not require that a PSM be made available to the user. Some MDA tools, though, can show users the PSM resulting from their PIM, and some store the PSM as an XMI file.

More OMG Standards

OMG has been working hard in recent years and has specified several standards that are of interest regarding MDA. Unfortunately many of them are pretty complex in their own right and have been revised multiple times. Although I was not able to keep track of all of these standards, I want to outline those that could be of interest for the re-engineering of the pyswarm SDK:

As mentioned before, there is the Meta Object Facility (MOF), especially the Essential MOF (EMOF) as the core specification of MOF, the specification of XML Metadata Interchange (XMI), and the most recent MOF 2.0/XMI 2.1 mapping specification. XMI can be used to serialize MOF-based meta models into an XML-based format, so that these meta models can be exchanged between tools, particularly UML tools.

The Object Constraint Language (OCL) is a declarative language that can be used to specify rules applicable to elements in a model. OCL was originally developed at IBM, and OMG adopted it as part of the UML standard. In the current version 2.0, OCL can be used to specify constraints in any MOF-based meta model.
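To give an idea of what such a rule expresses, here is a rough Python counterpart of a simple invariant. It is only an illustration of the declarative idea, not an OCL implementation; the Person class, the invariant names and the violated function are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical model element used only for this sketch."""
    name: str
    age: int


# Each entry states a rule that must hold for every Person.
# The first one roughly corresponds to the OCL invariant
#   context Person inv nonNegativeAge: self.age >= 0
INVARIANTS = {
    "nonNegativeAge": lambda p: p.age >= 0,
    "hasName": lambda p: len(p.name) > 0,
}


def violated(person):
    """Return the names of all invariants the given person breaks."""
    return [name for name, rule in INVARIANTS.items() if not rule(person)]


print(violated(Person("Ann", 30)))
print(violated(Person("", -1)))
```

The rules are kept apart from the class itself, which is the point of a declarative constraint language: the model states what must hold, and a tool decides when and how to check it.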

After some years of experience with MDA in daily business, some people felt the need for a standard for model transformations. OMG therefore specified the MOF Query/View/Transformation (QVT) standard. QVT includes OCL with imperative extensions and defines three domain-specific languages for specifying model transformations.


Indeed, partly because of their complexity, the standards outlined above already provide a thriving habitat for highly specialized modeling experts. I understand that there are scenarios in the software industry, in particular in large projects, where this degree of abstraction is crucial and justified. Still, I am not sure whether considering all these standards would really be a viable approach for adding support for other UML tools and XMI formats to the pyswarm SDK, especially since it is not clear whether they will be worth the effort, not only to implement them once, but also to maintain them and update them for future changes to the standards.




UML in Python projects

The UML (Unified Modeling Language) is a notation for visualizing software systems. It originated from several languages invented by the "three amigos" and others, was later standardized by the OMG after a unification process, and since 2000 has even been an ISO standard.

Long ago, during my time in custom software consultancy, we used UML diagrams to handle the domain-inherent complexity of some applications. UML does a good job of abstracting away architecture and implementation details and of visualizing the target software in a notation that is not too difficult to understand, even for domain experts who are unfamiliar with software development. The first diagrams I had to work with were implementation diagrams (those with the components) and use-case diagrams, which often described use cases across components and had detailed, structured use-case descriptions attached. Like any other instrument that reduces complexity by abstraction, UML adds some complexity of its own to a project. So it is not surprising that UML is usually applied more in large and complex projects than in small and simple ones.

As mentioned, we used UML to design the target software, resulting in plenty of diagrams and some boxes full of paper, which a small army of developers then used for the implementation. Of course there were differences between what had been specified and what was implemented, despite the big efforts to keep specification and implementation synchronized, not to mention the number of bugs that tend to find their way into growing code bases. Of course many of the countless UML CASE tools available now are able to spit out some code fragments, e.g. in Python, from the specified UML model. I prefer to speak of passive generators and flat code generation in these cases. You can often consider these generation features about as practical as code completion or code templates. They can be handy because they reduce the amount of common code you need to type, e.g. method signatures. But that's it.

These projects were nevertheless successful in the sense that the resulting apps were impressive in how they fulfilled customer requirements. But I had the feeling that the high costs and the long time it takes to create such ambitious custom software make it difficult to realize for any organization that is not a large corporation. Well, I was not alone with this view. Some years earlier OMG had started to standardize model-driven software development as the Model-Driven Architecture (MDA), and as far as I remember it was around 2002 that Markus Hillebrand brought this emerging development to my attention. Unlike me, this friend of mine is a real software engineer and blessed with a rare coding talent.

In the meantime MDA has become a widespread standard in model-driven software projects. It can be considered the missing piece in OMG's vision of how to gain more benefit from UML. MDA describes how UML can be used to transform one model into another and finally into an implementation of these models, that is, the generation of code, documentation, deployment artifacts and so on. There are plenty of MDA tools available, some of them Free Software, and most of them focus on the very complex target languages and platforms (guess why).

What I have missed in recent years is a Python tool that combines these standards with a Python-based architecture. Yes, I'm confident that Python can be used for complex software architectures (that is, n-tier architectures, distributed objects, business logic encapsulation etc.); PEAK and GNUenterprise have proved that this is technically possible. Maybe there is a project I have simply overlooked.

As far as I understand, the majority of Python users choose Python for smaller projects and, if I am not mistaken, in cases where the task requires more abstraction and efficiency they often choose one of the pythonic frameworks. Nevertheless, I am pretty sure that there are Python users who use UML models at least for requirements engineering or for the design and specification of the target software.

So, if you are one of these Pythonists I would like to learn which UML CASE tool(s) you use in Python projects, for which tasks you use them (spec, code-generation,...?) and which diagram types you use. In any case I'm curious to learn your opinion on UML in Python projects.


Transactions in user-interfaces

As you may know, the OpenSwarm SDK already imports specific UML models to generate Python apps that use PostgreSQL databases to store persistent business objects. Somehow this prototype even works, at least for the server side components (business logic).

As described in the introduction, it is planned that a developer can also add user-interface (UI) components, which are either generated by the OpenSwarm SDK, implemented manually, or a mix of both. User interfaces should of course support the transaction concept, but I wonder how you people design UIs with transactions that span more than a single call (a single call being, for example, clicking the "Delete" button of a specific record).

In case the ORM confuses you, here is a brief description of how it works in this project (otherwise you can skip this):
The back-ends use PostgreSQL transactions represented as Operation objects. Typical use would be as follows (with short Python examples and the most important SQL representations; note that the SQL queries are executed immediately, no big cache/repository is used in the app layer):

  1. Start a transaction on a component:
    Python: oOp = myComponent.createOp()
  2. Get the object with unique id 1:
    Python: numberOne = myComponent.getEntityByEID(oOp, 1)
    SQL: a "SELECT ..." to get records in some specific table
  3. Call a method on object #1 to create a new Foo object #2:
    Python: numberOne.newFoo(oOp, 'test value')
    SQL: some "INSERT ..." on several tables
  4. Commit the transaction:
    Python: oOp.commit()
In principle there are calls that create new objects, read or modify them, delete them, and add them to or remove them from collections (objects linked with one another, which is mapped to an association table).
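The four steps above can be tried out with a small stand-in. This is not the OpenSwarm implementation: the classes Operation, Entity and Component below are simplified assumptions, SQLite from the standard library replaces PostgreSQL so that the sketch runs anywhere, and the table layout is invented. Only the call names createOp, getEntityByEID, newFoo and commit are taken from the steps above.

```python
import sqlite3


class Operation:
    """Hypothetical stand-in for an Operation: one database transaction."""

    def __init__(self, conn):
        self.conn = conn

    def execute(self, sql, params=()):
        # SQL is executed immediately, nothing is cached in the app layer.
        return self.conn.execute(sql, params)

    def commit(self):
        self.conn.commit()

    def rollback(self):
        self.conn.rollback()


class Entity:
    """Hypothetical business object loaded from the entity table."""

    def __init__(self, eid, name):
        self.eid, self.name = eid, name

    def newFoo(self, oOp, value):
        # A single INSERT here; the real project touches several tables.
        cur = oOp.execute(
            "INSERT INTO foo (owner_eid, value) VALUES (?, ?)", (self.eid, value)
        )
        return cur.lastrowid


class Component:
    """Hypothetical component owning one (in-memory) database connection."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE entity (eid INTEGER PRIMARY KEY, name TEXT)")
        self.conn.execute(
            "CREATE TABLE foo (eid INTEGER PRIMARY KEY, owner_eid INTEGER, value TEXT)"
        )
        self.conn.execute("INSERT INTO entity (eid, name) VALUES (1, 'number one')")
        self.conn.commit()

    def createOp(self):
        return Operation(self.conn)

    def getEntityByEID(self, oOp, eid):
        row = oOp.execute(
            "SELECT eid, name FROM entity WHERE eid = ?", (eid,)
        ).fetchone()
        return Entity(*row) if row else None

    def countFoo(self):
        return self.conn.execute("SELECT COUNT(*) FROM foo").fetchone()[0]


myComponent = Component()
oOp = myComponent.createOp()                    # 1. start transaction
numberOne = myComponent.getEntityByEID(oOp, 1)  # 2. SELECT
numberOne.newFoo(oOp, 'test value')             # 3. INSERT
oOp.commit()                                    # 4. commit
print(myComponent.countFoo())
```

Calling oOp.rollback() instead of oOp.commit() in the last step would discard the new Foo record, which is the behavior the UI questions below rely on.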

So, given a simple case, the end-user may browse through a list of Foo objects (records) in the UI, retrieve one of them to view it in detail, and want to change one or another attribute. Assume that the Foo class has a method for changing the name attribute, Foo::update(self, oOp, name:string). For example, clicking the "Edit" button (the transaction begins here) pops up a form (or the dialog changes to an editable form) with all the fields needed by the update method. The method is called if the end-user clicks "Save" (transaction commit, so the records in the database are changed). If the end-user clicks "Cancel" instead, the transaction is rolled back and closed without any changes to the database (and the update method is of course not called).

I think this is a rather common case, and it doesn't give me any headache. Now let's make things a little more complicated. The situation is the same as above, but this time the detail view of the Foo object (or record) shows some attribute values plus a link to another object, e.g. of the class Person. Assume the end-user clicks "Edit" and the update method associated with this form expects exactly one linked object, e.g. Foo::update(self, oOp, name:string, author:Person). So the edit view shows one text input widget and an object browse & select widget, in which the end-user can browse through the list of Person objects and assign one of them to the current object. Probably the end-user even has the option to create and assign a new Person object in this very same browse & select dialog before saving the updated Foo object.

So, is the creation of the new Person object part of the same transaction as the update of the Foo object? Or is it better for usability and UI creation if the Person object is created and committed in a first transaction, before a second transaction takes over the update of the Foo object when the end-user clicks "Save"? With two different transactions, the Person object would already be stored persistently in the database when the end-user clicks "Cancel" in the Foo edit view. With a single transaction this would not be the case (the creation of the Person object would be rolled back with the entire transaction).
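The difference between the two variants can be made concrete with a small experiment. Again this is only a sketch under assumptions: SQLite stands in for PostgreSQL, the person and foo tables are invented, and make_db, edit_foo and snapshot are hypothetical helper names that do not exist in OpenSwarm.

```python
import sqlite3


def make_db():
    """Create a throwaway database with one existing Foo record."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (eid INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE foo (eid INTEGER PRIMARY KEY, name TEXT, author INTEGER)"
    )
    conn.execute("INSERT INTO foo (eid, name) VALUES (1, 'old name')")
    conn.commit()
    return conn


def edit_foo(conn, new_name, author_name, single_transaction, save):
    """Simulate the edit dialog: create a Person, then Save or Cancel."""
    # The end-user creates a new Person in the browse & select dialog.
    cur = conn.execute("INSERT INTO person (name) VALUES (?)", (author_name,))
    author_eid = cur.lastrowid
    if not single_transaction:
        # Variant with two transactions: the Person is committed right away.
        conn.commit()
    if save:
        conn.execute(
            "UPDATE foo SET name = ?, author = ? WHERE eid = 1",
            (new_name, author_eid),
        )
        conn.commit()
    else:
        # "Cancel": roll back whatever is still uncommitted.
        conn.rollback()


def snapshot(conn):
    """Return (number of persons, current name of Foo #1)."""
    persons = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    foo_name = conn.execute("SELECT name FROM foo WHERE eid = 1").fetchone()[0]
    return persons, foo_name


for single in (True, False):
    conn = make_db()
    edit_foo(conn, "new name", "Ann", single_transaction=single, save=False)
    print("single transaction:", single, "-> after Cancel:", snapshot(conn))
```

After "Cancel", the single-transaction variant leaves no Person behind, while the two-transaction variant keeps the new Person although the Foo object is unchanged. After "Save" both variants end in the same state, so the design question only shows up in the cancel path.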

Well, this example is still rather simple. I can imagine use cases that are much more complicated (involving objects of many classes, or even from different components; think of 2-phase commit and "nested" transactions).

How would a user interface support both ways of handling this (either a single transaction or multiple transactions) and, for the sake of good usability, how would this be made transparent to the user?

I guess that as long as there is no concrete and satisfying answer to these questions, there is no point in defining how developers can specify UI components in OpenSwarm application models and how those specifications can be transformed into runtime implementations.

So, any ideas?


Getting ready for the final

Today is a remarkable day for my pet project OpenSwarm, a model-driven software development tool in Python and for Python. In MDA style it uses UML 2.0 models (or, more concretely, their XML-based serialization form XMI 2.1) to generate business logic components, including database layers for PostgreSQL, by object-relational mapping (ORM).

I started this project in April 2006 after searching unsuccessfully for a multi-tier capable, distributed component architecture based on Python and for appropriate code generators. I felt the need to be able to produce our in-house applications faster and with high quality. By high quality I mean particularly that a business logic API for distributed business objects prevents business rules from being broken, and that transactions are 2-phase and ACID-compliant. I simply failed to find a tool in the Python world that matched my needs, so I decided to give it a try and started the OpenSwarm project on SourceForge.

To be honest, I never thought I would bring this project as far as a prototype. Hahaha, I don't even consider myself a programmer. Despite this barrier it somehow worked out, and there is now a release that shows some of the planned core features, so hopefully anyone else can get a picture of what OpenSwarm is supposed to become.

Well, of course there is still a long way to go to get to a first production-ready final release. Fortunately I have already received some responses to my Help Wanted postings from talented SourceForge users within the last two or three weeks. So I felt it was necessary to sketch what could be part of the final release, and today I made a major update to the OpenSwarm documentation, including a Discussion part with initial proposals for future development. I really appreciate their offers of help and hope that we can iron out some of the bad design decisions and implementations I made in the prototype, and that we can add some cool new features to OpenSwarm.

I am really excited to see where this journey will take us.