Zig Saves the Day for Cross-Platform Tree-sitter Compilation
I recently decided to give my Emacs configuration a makeover. Over time, startup had become sluggish, and I wanted to try out the new tree-sitter modes. Setting these up involves compiling tree-sitter grammars, which is easy enough on Linux. By following Mickey Petersen's excellent article, How to Get Started with Tree-Sitter, I was able to get everything up and running.
The problem is that all the guides I found work great on macOS and Linux, but not on Windows. So unfortunately this code...
(setq treesit-language-source-alist
      '((bash "https://github.com/tree-sitter/tree-sitter-bash")
        (cmake "https://github.com/uyha/tree-sitter-cmake")
        (css "https://github.com/tree-sitter/tree-sitter-css")
        (elisp "https://github.com/Wilfred/tree-sitter-elisp")
        (go "https://github.com/tree-sitter/tree-sitter-go")
        (html "https://github.com/tree-sitter/tree-sitter-html")
        (javascript "https://github.com/tree-sitter/tree-sitter-javascript" "master" "src")
        (json "https://github.com/tree-sitter/tree-sitter-json")
        (make "https://github.com/alemuller/tree-sitter-make")
        (markdown "https://github.com/ikatyang/tree-sitter-markdown")
        (python "https://github.com/tree-sitter/tree-sitter-python")
        (toml "https://github.com/tree-sitter/tree-sitter-toml")
        (tsx "https://github.com/tree-sitter/tree-sitter-typescript" "master" "tsx/src")
        (typescript "https://github.com/tree-sitter/tree-sitter-typescript" "master" "typescript/src")
        (yaml "https://github.com/ikatyang/tree-sitter-yaml")))

;; Download and compile every grammar listed above
(mapc #'treesit-install-language-grammar (mapcar #'car treesit-language-source-alist))
...which downloads and compiles the grammars flawlessly on my Linux system, would not work for me long term.
But What Is Tree-sitter?
I have an article about this coming soon, but in brief...
Tree-sitter is a parsing library and tool that analyzes the structure of code. By understanding that structure, it can more effectively provide syntax highlighting, code folding, symbol search, and structured editing. It is more flexible and easier to work with than prior solutions in the text editor space. We will look at how it does this in more depth next week :)
Zig to the Rescue
Ideally, I needed a way to automate compiling the grammars, so I could seamlessly transition across all my systems. That meant wrestling with different compilers (GCC, Clang, MSVC) on different operating systems. While doable, it sounded like a major headache, especially with the notoriously finicky MSVC.
Then I remembered reading about Zig's cross-compilation superpowers. I knew about this from an article by Andrew Kelley (creator of Zig) titled zig cc: a Powerful Drop-In Replacement for GCC/Clang.
This came in handy when I was writing the Experimenting with GUIs on the Pi Zero article. Being able to test examples beforehand by cross-compiling the C/C++ code on my 8-core 4.3 GHz CPU, before committing to timing it on the Pi Zero's single-core 1 GHz processor, sped things up immensely. So with a potential solution in mind, I turned to writing the script.
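To make the cross-compilation idea a little more concrete, here is a rough sketch of how a target triple could be chosen and passed to zig cc through its -target flag. The helper name Get-ZigTarget and the output file names are made up for illustration, and the triples are the common defaults rather than anything taken from my script. The script in this article does not pass -target at all; it compiles natively on whichever machine it runs on.

```powershell
# Hypothetical helper (not part of the script in this article):
# maps an OS/architecture pair to a Zig target triple.
function Get-ZigTarget {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('windows', 'linux', 'macos')]
        [string]$Os,

        [ValidateSet('x86_64', 'aarch64')]
        [string]$Arch = 'x86_64'
    )
    switch ($Os) {
        'windows' { return "$Arch-windows-gnu" }
        'linux'   { return "$Arch-linux-gnu" }
        'macos'   { return "$Arch-macos" }
    }
}

# Print (but do not run) one cross-compile command per platform.
foreach ($platform in 'windows', 'linux', 'macos') {
    $target = Get-ZigTarget -Os $platform
    Write-Host "zig cc -target $target -shared -o parser-$platform src/parser.c -Isrc"
}
```

The point is only that the same compiler binary can, in principle, produce output for several platforms from one machine, which is what made Zig attractive here.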
Automating the Download with PowerShell
I whipped up a PowerShell script to automate the process. I've put the core logic below, but don't worry about trying to understand all of it.
$ZigVersion = "0.11.0"
$fileExt = if ($IsWindows) { ".dll" } elseif ($IsLinux) { ".so" } elseif ($IsMacOS) { ".dylib" } else { throw "Unsupported OS" }

# Get the temp directory
# (Get-TempDirectory is not a built-in cmdlet; it is a helper from the full script)
$tempDir = Get-TempDirectory

# Determine OS and architecture
$osArch = if ($IsWindows) { "windows-x86_64" } elseif ($IsMacOS) {
    if ([System.Runtime.InteropServices.RuntimeInformation]::OSArchitecture -eq "Arm64") {
        "macos-aarch64"
    } else {
        "macos-x86_64"
    }
} elseif ($IsLinux) { "linux-x86_64" } else { "unsupported" }

# e.g. zig-windows-x86_64-0.11.0
$zigOsArch = "zig-$osArch-$ZigVersion"

# Construct the download URL with the Zig version variable
$downloadZigUrl = switch ($osArch) {
    "windows-x86_64" { "https://ziglang.org/download/$ZigVersion/$zigOsArch.zip" }
    "macos-aarch64"  { "https://ziglang.org/download/$ZigVersion/$zigOsArch.tar.xz" }
    "macos-x86_64"   { "https://ziglang.org/download/$ZigVersion/$zigOsArch.tar.xz" }
    "linux-x86_64"   { "https://ziglang.org/download/$ZigVersion/$zigOsArch.tar.xz" }
    Default          { throw "Unsupported platform: $osArch" }
}

# Check if the Zig directory already exists and -ReinstallZig is not used
if ((Test-Path -Path "./$zigOsArch") -and (-not $ReinstallZig)) {
    Write-Host "$zigOsArch already downloaded. Skipping install. Pass -ReinstallZig to redownload."
}
elseif ($ReinstallZig -and (Test-Path -Path "./$zigOsArch")) {
    Write-Host "Removing old Zig version $zigOsArch"
    Remove-Item -Path "./$zigOsArch" -Recurse -Force

    # Download and extract the file ($tempDir was already set above)
    Download-And-Extract-Zig -url $downloadZigUrl -tempDir $tempDir
    Write-Host "Platform specific Zig has been redownloaded and extracted to: $PSScriptRoot"
}
else {
    # Download and extract the file
    Download-And-Extract-Zig -url $downloadZigUrl -tempDir $tempDir
    Write-Host "Zig has been downloaded and extracted to: $PSScriptRoot"
}
I picked Zig version 0.11.0 because master is being actively worked on, so its URLs keep changing.
The script detects which operating system it is running on, which determines the platform name used in the download and the extension of the shared library.
It then interpolates those values into a string to construct the appropriate URL and downloads the matching Zig compiler. I download the archive to the platform-specific temp folder, then extract it into my .emacs.d. The Zig executable is pre-compiled, so once the folder is unpacked it's ready to go.
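The snippet above calls two helpers, Get-TempDirectory and Download-And-Extract-Zig, without showing them. What follows is only a guess at what such helpers might look like, not the versions from my script: the bodies, the extra destination parameter, and the reliance on a tar binary with xz support are all assumptions.

```powershell
# Hypothetical stand-ins for the two helpers used above.
# The real implementations live in the full script and may differ.

function Get-TempDirectory {
    # GetTempPath() returns the per-user temp folder on Windows,
    # and $TMPDIR (or /tmp) on macOS and Linux.
    return [System.IO.Path]::GetTempPath()
}

function Download-And-Extract-Zig {
    param(
        [Parameter(Mandatory)][string]$url,
        [Parameter(Mandatory)][string]$tempDir,
        # Assumption: extract into the current directory unless told otherwise.
        [string]$destination = (Get-Location).Path
    )

    # Name the local archive after the last URL segment,
    # e.g. zig-linux-x86_64-0.11.0.tar.xz
    $archive = Join-Path $tempDir (($url -split '/')[-1])
    Invoke-WebRequest -Uri $url -OutFile $archive

    if ($archive.EndsWith('.zip')) {
        Expand-Archive -Path $archive -DestinationPath $destination -Force
    }
    else {
        # Assumes a tar that can read .tar.xz archives is on PATH.
        tar -xf $archive -C $destination
        if ($LASTEXITCODE -ne 0) { throw "tar failed with exit code $LASTEXITCODE" }
    }
}
```

Splitting the archive handling by extension mirrors the URL switch above, where only the Windows download is a .zip.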
Downloading and Compiling the Grammars
After I've downloaded the Zig folder, it's time for the second part of the script: downloading the tree-sitter grammars. I have all the grammars saved in an array, which I iterate through, cloning each repository. Every tree-sitter grammar repo I've encountered has a parser.c, and sometimes a scanner.c or scanner.cc, located in the src folder. These are the files you need to compile to get your tree-sitter mode to work properly. The script collects these files and works out which of the following Zig commands to run on each one to produce the object files for a dynamic library.
zig c++ -c -fPIC $fileName -o $cppObject $($includeDirs -join ' ') -lc
zig cc -c -fPIC $fileName -o $cObject $($includeDirs -join ' ')
It does some more logic to create the appropriately named libtree-sitter-<language>.ext and, finally, copies the files over to the tree-sitter folder in my .emacs.d. And that is pretty much it. Here is a snippet of that part of the code:
$grammarUrls = @(
    "https://github.com/camdencheek/tree-sitter-dockerfile",
    "https://github.com/tree-sitter/tree-sitter-bash",
    "https://github.com/uyha/tree-sitter-cmake",
    "https://github.com/tree-sitter/tree-sitter-css",
    "https://github.com/tree-sitter/tree-sitter-c-sharp",
    "https://github.com/tree-sitter/tree-sitter-javascript",
    "https://github.com/tree-sitter/tree-sitter-json",
    "https://github.com/tree-sitter/tree-sitter-python",
    "https://github.com/ikatyang/tree-sitter-yaml"
)

$zigCompiler = (Get-ChildItem zig-*/zig).FullName

foreach ($url in $grammarUrls) {
    # e.g. tree-sitter-markdown
    $repoName = $url -split '/' | Select-Object -Last 1
    # e.g. tree-sitter-markdown to markdown
    $languageName = $repoName -replace 'tree-sitter-', ''
    $cloneDir = Join-Path $tempDir $repoName

    # Check if grammar directory already exists and -ReinstallGrammar is not used
    if ((Test-Path -Path $cloneDir) -and (-not $ReinstallGrammar)) {
        Write-Host "$repoName already cloned. Pass -ReinstallGrammar to reclone."
        continue
    }

    # If -ReinstallGrammar is used, clean up the current grammar directory
    if ($ReinstallGrammar -and (Test-Path -Path $cloneDir)) {
        Write-Host "Removing old version of $repoName"
        Remove-Item -Path $cloneDir -Recurse -Force
    }

    # Clone the grammar repository (shallow clone, latest commit only)
    Write-Host "Cloning $url to $cloneDir"
    git clone $url $cloneDir --depth=1

    # Navigate to the cloned directory
    Push-Location -Path $cloneDir

    # Compile the grammar using zig cc
    $libName = "libtree-sitter-$languageName$fileExt"
    $cSrcFiles = (Get-ChildItem src/*.c).FullName

    # Hack to filter schema.generated.cc from the yaml ts grammar
    # It's already included in scanner.cc
    $cppSrcFiles = Get-ChildItem src/*.cc |
        Where-Object { $_.Name -ne "schema.generated.cc" } |
        ForEach-Object { $_.FullName }

    $includeDirs = @('-Isrc', '-Isrc/tree-sitter')

    # Check if there are C++ source files
    if ($null -ne $cppSrcFiles) {
        # Compile C++ files using zig c++
        [System.Collections.ArrayList]$cppObjectFiles = @()
        foreach ($fileName in $cppSrcFiles) {
            $cppObject = $fileName + ".o"
            $cppObjectFiles.Add($cppObject) | Out-Null
            $cppCmd = "$zigCompiler c++ -c -fPIC $fileName -o $cppObject $($includeDirs -join ' ') -lc"
            Write-Host "Compiling C++ files with the command: $cppCmd`n"
            Invoke-Expression $cppCmd
        }

        # Compile C files using zig cc
        [System.Collections.ArrayList]$cObjectFiles = @()
        foreach ($fileName in $cSrcFiles) {
            $cObject = $fileName + ".o"
            $cObjectFiles.Add($cObject) | Out-Null
            $cCmd = "$zigCompiler cc -c -fPIC $fileName -o $cObject $($includeDirs -join ' ')"
            Write-Host "Compiling C files with the command: $cCmd`n"
            Invoke-Expression $cCmd
        }
        # ... (the snippet ends here; the full script continues)
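The snippet stops once the object files exist, so the step that turns them into libtree-sitter-<language>.ext is not shown. As a minimal sketch of how that link step could be assembled, assuming the usual clang-style -shared flag that zig cc and zig c++ accept: the helper name Get-LinkArguments and the example file names are hypothetical, and the real script may do this differently.

```powershell
# Hypothetical helper (not from the full script): builds the argument
# list for linking object files into a shared library with Zig.
function Get-LinkArguments {
    param(
        [string[]]$CObjectFiles = @(),
        [string[]]$CppObjectFiles = @(),
        [Parameter(Mandatory)][string]$LibName
    )

    # Use the C++ driver whenever C++ objects are present so that the
    # C++ standard library gets linked in; otherwise the C driver is enough.
    $driver = if ($CppObjectFiles.Count -gt 0) { 'c++' } else { 'cc' }

    return @($driver, '-shared', '-o', $LibName) + $CObjectFiles + $CppObjectFiles
}

# Example with made-up object file names for a grammar that has only parser.c
$linkArgs = Get-LinkArguments -CObjectFiles 'src/parser.c.o' -LibName 'libtree-sitter-json.so'
Write-Host "zig $($linkArgs -join ' ')"
```

With an argument array like this, the compiler can be invoked as & $zigCompiler @linkArgs, which sidesteps Invoke-Expression and the quoting problems that come with building commands as strings. The resulting library then only needs to be copied into the tree-sitter folder inside .emacs.d.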
Modifying the init.el
After I've downloaded and compiled everything, it's time to modify my init.el. The init.el file is what Emacs runs at startup, and it can be modified by the user to make Emacs do pretty much whatever you want. I have logic in mine to make sure that tree-sitter support is compiled into my Emacs. If it is, it goes through the different languages, checks if the grammar is installed, and then maps the corresponding file extension to the appropriate -ts-mode.
(if (treesit-available-p)
    (progn
      (message "Tree-Sitter is available")
      ;; Shell script files (bash-ts-mode uses the bash grammar)
      (if (treesit-language-available-p 'bash)
          (add-to-list 'auto-mode-alist '("\\.sh\\'" . bash-ts-mode)))
      ;; ... the remaining languages follow the same pattern
      ))
Some people use major-mode-remap-alist instead, but I find that solution inelegant.
Here is the link to the full PowerShell file that does all of this. That is also the link to my Emacs configuration if you would like to use it yourself. I dedicate all of it to the public domain, or the MIT license if you prefer a concrete license.
If you've never given Zig a try, I'd recommend it. I think there are compelling arguments for its existence, and it fills a niche that most new programming languages don't care to fill. And of course, I'd also recommend PowerShell as a modern cross-platform shell language. I've written two articles about it, so my bias is clear :)
1. Getting work done with PowerShell on Linux
2. The case for PowerShell On macOS and Linux
Call To Action 📣
Hi 👋 my name is Diego Crespo and I like to talk about technology, niche programming languages, and AI. I have a Twitter and a Mastodon, if you’d like to follow me on other social media platforms. If you liked the article, consider liking and subscribing. And if you haven’t, why not check out another article of mine listed below? Thank you for reading and giving me a little of your valuable time. A.M.D.G