+6

Syntax Highlighting: support for user-defined languages

Toralf 6 years ago updated by Thomas Singer 5 months ago 3

I use the scripting language AutoHotkey, and I would love to see syntax highlighting for it in SmartGit.

Since I have seen several requests for syntax highlighting support for other languages, I think it would make sense for SmartGit to provide an interface that allows users to "define" their own syntax highlighting and associate it with a file extension.

If there is already a possibility to do so, please point me in that direction.

Thanks for considering.

We use ANTLR to define the lexer grammars. It is better to request each missing language in a separate ticket.

That doesn't scale though... For example, there are many domain-specific languages that are used by relatively few of your users (or even just internally in a single company) – you can't support all of them individually. Also, as existing languages evolve, your lexer grammars become outdated (e.g. interpolated and raw string literals in C# are currently broken in SmartGit).

I think you can cover most languages (including DSLs) by supporting TextMate grammar files (tmLanguage.json), which are used by VSCode, so you wouldn't have to maintain them, just use something like https://github.com/eclipse/tm4e to integrate them into SmartGit. This would require some kind of mapping between TextMate scopes and the highlighting token types used by SmartGit (configured in Preferences), or directly using TextMate themes.
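To make the scope mapping a bit more concrete, here is a rough sketch in plain Java. It assumes the usual TextMate convention that a selector such as `string` matches any scope that starts with it on a dot boundary (e.g. `string.quoted.double`), and it picks the longest matching prefix. `ScopeMapper`, the token type names other than "string" and "number", and the `ahk` scope suffix are made up for illustration; this is not SmartGit's or tm4e's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps TextMate scope names to highlighting token types. */
final class ScopeMapper {
    // Scope prefix -> token type, as a user might configure it (assumed format).
    private final Map<String, String> prefixToType = new LinkedHashMap<>();

    ScopeMapper add(String scopePrefix, String tokenType) {
        prefixToType.put(scopePrefix, tokenType);
        return this;
    }

    /** Returns the token type of the longest matching scope prefix, or the fallback. */
    String resolve(String scope, String fallback) {
        String bestPrefix = null;
        String bestType = fallback;
        for (Map.Entry<String, String> entry : prefixToType.entrySet()) {
            String prefix = entry.getKey();
            // Match whole dot-separated segments only: "string" must not match "stringy".
            boolean matches = scope.equals(prefix) || scope.startsWith(prefix + ".");
            if (matches && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
                bestType = entry.getValue();
            }
        }
        return bestType;
    }

    public static void main(String[] args) {
        ScopeMapper mapper = new ScopeMapper()
                .add("string", "string")
                .add("constant.numeric", "number")
                .add("comment", "comment");
        System.out.println(mapper.resolve("string.quoted.double.ahk", "plain"));
        System.out.println(mapper.resolve("constant.numeric.ahk", "plain"));
        System.out.println(mapper.resolve("meta.function.ahk", "plain"));
    }
}
```

Unmapped scopes fall back to a default type, so a grammar that uses scopes the user never configured would still render, just without color.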

Alternatively, you might be able to use user-provided .g4 (lexer or combined grammar) or .interp files (generated by ANTLR) to tokenize source code using ANTLR runtime's support for interpretation, see https://github.com/antlr/antlr4/blob/master/doc/interpreters.md. A disadvantage of this approach is that it doesn't support (ignores) action code in the grammar. The user would also provide a mapping from token names/numbers to colors or highlighting token types ("string", "number", etc.).
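The user-provided mapping from token names to highlighting token types could be as simple as a properties-style text file. The sketch below only covers that mapping step and uses nothing but the JDK; the file format, the `TokenStyleMap` class, the token names and the "plain" fallback are assumptions for illustration. The tokenizing itself would still be done by ANTLR's lexer interpreter as described in the linked document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical mapping from ANTLR token names to highlighting token types. */
final class TokenStyleMap {
    private final Properties styles = new Properties();

    /** Parses lines such as "STRING_LITERAL=string"; lines starting with '#' are comments. */
    static TokenStyleMap parse(String text) {
        TokenStyleMap map = new TokenStyleMap();
        try {
            map.styles.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return map;
    }

    /** Returns the configured type for a token name, or "plain" if it is unmapped. */
    String styleFor(String tokenName) {
        return styles.getProperty(tokenName, "plain");
    }

    public static void main(String[] args) {
        // Token names here are invented; real ones would come from the user's grammar.
        String config = String.join("\n",
                "# token name = highlighting token type",
                "STRING_LITERAL=string",
                "INT=number",
                "LINE_COMMENT=comment");
        TokenStyleMap map = TokenStyleMap.parse(config);
        System.out.println(map.styleFor("STRING_LITERAL"));
        System.out.println(map.styleFor("IDENTIFIER"));
    }
}
```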

A third (more complex but also more flexible) option would be to load user-provided WebAssembly modules (using a WASM runtime library for Java) with a specific interface and call their tokenize function. This means the user can provide a tokenizer/parser built in whatever programming language they happen to be using (as long as it can be compiled to WASM) with whatever logic they want. This might potentially even allow for using the languages' official compilers (like Roslyn for C#).

+1

Thanks for the input.


Regarding the new C# syntax: could you please send a short example illustrating it to smartgit@syntevo.com? Thanks in advance.