Here are the main current themes around Abstract Syntax Trees (ASTs) as of 2026, with a snapshot of recent directions and resources.
What is an AST (recap)
- An Abstract Syntax Tree is a tree representation of source code that drops concrete syntax details (such as punctuation and grouping parentheses) and keeps the program's syntactic structure. Tools for analysis, transformation, and code generation are built on top of that structure.[7]
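As a minimal sketch of that definition, the snippet below uses Python's standard-library `ast` module (chosen here only for illustration; the sources above do not prescribe it). The one-line program and the variable names `assign`, `product`, and `grouped` are hypothetical.

```python
import ast

# Hypothetical one-line program used only for illustration.
source = "total = price * (1 + tax)"
tree = ast.parse(source)

# The grouping parentheses are concrete syntax: they decide how the
# expression nests, but no node in the tree represents them.
print(ast.dump(tree, indent=2))  # indent= requires Python 3.9+

assign = tree.body[0]    # the assignment statement
product = assign.value   # the multiplication on the right-hand side
grouped = product.right  # the parenthesised addition, now just a nested node
print(type(assign).__name__, type(product.op).__name__, type(grouped.op).__name__)
```

The nesting of `grouped` under `product` carries the information that the parentheses expressed in the source text, which is what "abstracting away concrete syntax" means in practice.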
Recent developments and trends
- ASTs are increasingly tied to AI-assisted code tasks. Several analyses and discussions point to ASTs as foundational for parsing, program understanding, and enabling transformations in workflows involving large language models and code generation pipelines.[2][4]
- Research has compared AST parsing backends (e.g., JDT, Tree-sitter, ANTLR, srcML) in terms of tree size, depth, and abstraction level, noting that the choice of parser trades structural richness against model learnability. JDT produces the smallest, shallowest trees with the highest abstraction level, ANTLR produces the largest trees with the lowest abstraction level, and Tree-sitter and srcML fall in between. Richer representations may incur heavier learning costs for models.[4][2]
- There is ongoing exploration of AST-based representations beyond traditional compilers, including code search and summarization tasks, where syntactic information from ASTs improves expressiveness and accuracy of code-related models.[4]
- Practical tooling and libraries for working with ASTs continue to proliferate, including libraries focused on patching, transforming, and generating code from ASTs, with ecosystems evolving for JavaScript/TypeScript and other languages.[10][7]
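The size, depth, and abstraction measures mentioned in the parser comparison above can be approximated for any tree. The following is a rough sketch using Python's built-in `ast` module, not JDT, Tree-sitter, srcML, or ANTLR. The function name `tree_metrics` and the exact metric definitions (node count, longest root-to-leaf path, number of distinct node types) are assumptions made for illustration and may differ from those used in the cited research.

```python
import ast


def tree_metrics(source: str) -> dict:
    """Return rough size/depth/type-count metrics for a Python AST.

    The metric definitions are illustrative assumptions, not the ones
    from any particular paper.
    """
    tree = ast.parse(source)

    def depth(node: ast.AST) -> int:
        # Depth counts nodes on the longest root-to-leaf path.
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(child) for child in children), default=0)

    nodes = list(ast.walk(tree))
    return {
        "size": len(nodes),
        "depth": depth(tree),
        "unique_types": len({type(node).__name__ for node in nodes}),
    }


flat = tree_metrics("x = 1")
nested = tree_metrics("x = f(g(h(1)))")
print(flat)
print(nested)
```

Running the same measurements over trees from different parsers for the same source file is one way to see the size and abstraction differences for yourself. The recursive `depth` helper is fine for ordinary code but could hit Python's recursion limit on extremely deep trees.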
Where to look for current discussions and implementations
- Language-focused AST research: academic repositories and arXiv preprints comparing AST parsers and their impact on downstream tasks include discussions of depth, size, and abstraction levels across JDT, Tree-sitter, srcML, and ANTLR.[2][4]
- Industry tooling and tutorials: video explainers and practitioner tutorials illustrate how ASTs feed into code analysis, formatting, and code-generation workflows, often highlighting how tools like Babel, ESLint, Prettier, or custom parsers leverage ASTs.[7]
- Community discussions and code samples: developer forums and GitHub topic pages show active usage of AST-based libraries for patching, transformations, and language-agnostic tooling experiences.[9][10]
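To make "patching" and "transformation" concrete, here is a small sketch of an AST-based rewrite using Python's standard `ast.NodeTransformer`. It is not the approach of any specific library mentioned above. The class `RenameCall`, the helper `patch`, and the function names `old_sum` and `new_sum` are all hypothetical.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite calls to one plain function name into calls to another."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            replacement = ast.Name(id=self.new, ctx=ast.Load())
            node.func = ast.copy_location(replacement, node.func)
        return node


def patch(source: str, old: str, new: str) -> str:
    """Parse, rewrite matching calls, and generate source again."""
    tree = RenameCall(old, new).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


# Hypothetical function names, used only for illustration.
print(patch("result = old_sum(old_sum(1, 2), 3)", "old_sum", "new_sum"))
```

Because the rewrite matches `Call` nodes rather than text, a variable or string that merely contains `old_sum` is left alone. One trade-off of this sketch is that `ast.unparse` regenerates the code from the tree, so comments and original formatting are lost. Tools that need to preserve them generally work on a tree that retains concrete syntax.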
Illustrative takeaway
- If you’re building a code-analysis or transformation tool today, you’ll likely choose an AST representation that balances structural richness against how easily downstream models can learn from it. For large-scale code understanding tasks, starting with an intermediate parser (e.g., Tree-sitter or srcML) can provide enough structure while keeping downstream models more efficient, whereas JDT-style trees may be preferable where abstraction and compactness matter most. This aligns with reported differences in how parsers structure code and how that affects downstream performance.[2][4]
Would you like a concise, side-by-side comparison table of the main AST parsers (JDT, Tree-sitter, srcML, ANTLR) with key characteristics (tree size, depth, abstraction level, typical use cases) and a few links to recent papers or blog posts? I can tailor it to a language you care about (e.g., JavaScript/TypeScript, Java) and to whether you prioritize parsing speed, abstraction, or downstream ML-task performance.
Citations
- Abstract syntax trees overview and usage in tooling.[7]
- Comparative findings on AST parsers (JDT, Tree-sitter, srcML, ANTLR) and their impact on tasks.[4][2]
- AST-informed code analysis and model training trends in practice.[4]
Sources
- alan.petitepomme.net: "… interpreter, pyre-ast will be able to parse/reject it as well. Furthermore, abstract syntax trees obtained from pyre-ast is guaranteed to 100% match the results obtained by Python's own ast.parse API, down to every AST node and every line/column number."
- news.ycombinator.com (ievans, June 7, 2021): "It supports many more languages (~17 at various stages of development) and being able to do AST patching as in the original is one of the capabilities we're experimenting with: https://semgrep.dev/docs/experiments/overview/#autofix Would love your feedback!"
- www.science.gov: "We apply the approach to gradually migrate the schemas of the AUTOBAYES program synthesis system to concrete syntax. Fit experiences show that this can result in a considerable reduction of the code size and an improved readability of the code. In particular, abstracting out fresh-variable generation and second-order term construction allows the formulation of larger continuous fragments and improves the locality in the schemas. … We used the recent grammar of the Arden Syntax v.2.10, and both..."
- arxiv.org: "Based on the extensive experimental results, we conclude the following findings: • The ASTs generated by different AST parsing methods differ in size and abstraction level. The size (in terms of tree size and tree depth) and abstraction level (in terms of unique types and unique tokens) of the ASTs generated by JDT are the smallest and highest, respectively. On … pets require more high-level abstract summaries in code summarization, and code snippets semantically match but contain fewer query..."
- arxiv.org: "• The ASTs generated by different AST parsing methods differ in size and abstraction level. The size (in terms of tree size and tree depth) and abstraction level (in terms of unique types and unique tokens) of the ASTs generated by JDT are the smallest and highest, respectively. On the contrary, ASTs generated by ANTLR exhibit the largest size and the lowest abstraction level. Tree-sitter and srcML are both intermediate in structure size and abstraction level between JDT and ANTLR. … • Among..."