This project is a PHP 5.5 (and older) parser written in PHP itself.
A parser is useful for static analysis and manipulation of code, and for basically any other application that deals with code programmatically. A parser constructs an Abstract Syntax Tree (AST) of the code, which lets you work with it in an abstract and robust way.
There are other ways of dealing with source code. One that PHP supports natively is using the token stream generated by `token_get_all`. The token stream is much lower level than the AST and thus has different applications: it also allows you to analyze the exact formatting of a file. On the other hand, the token stream is much harder to deal with for more complex analysis. For example, an AST abstracts away the fact that in PHP variables can be written as `$foo`, but also as `$$bar`, `${'foobar'}` or even `${!${''}=barfoo()}`. You don't have to worry about recognizing all the different syntaxes from a stream of tokens.
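To make the contrast concrete, here is a minimal sketch that collects the token names `token_get_all` yields for two of the variable forms above. The snippet string and the `$names` variable are illustrative only; `token_get_all` and `token_name` are PHP's built-in tokenizer functions.

```php
<?php
// Tokenize a snippet that uses two of the variable syntaxes mentioned above.
$code = '<?php $foo; $$bar;';

$names = array();
foreach (token_get_all($code) as $token) {
    if (is_array($token)) {
        // Array tokens have the form array(id, text, line);
        // token_name() maps the numeric id to its symbolic name.
        $names[] = token_name($token[0]);
    } else {
        // Single-character tokens such as ';' or '$' are plain strings.
        $names[] = $token;
    }
}

echo implode(' ', $names), "\n";
```

The plain variable arrives as a single `T_VARIABLE` token, while the variable-variable is split into a bare `$` followed by a `T_VARIABLE`, so a token-based tool has to reassemble such cases itself.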
Another question is: why would I want a PHP parser written in PHP? Well, PHP might not be a language especially suited for fast parsing, but processing the AST is much easier in PHP than it would be in other, faster languages like C. Furthermore, the people most likely to want programmatic PHP code analysis are PHP developers, not C developers.
The parser uses a PHP 5.5 compliant grammar, which is backwards compatible with at least PHP 5.4, PHP 5.3 and PHP 5.2 (and maybe older).
As the parser is based on the tokens returned by `token_get_all` (which is only able to lex the PHP version it runs on), a wrapper for emulating the new tokens from 5.3, 5.4 and 5.5 is also provided. This allows you to parse PHP 5.5 source code while running on PHP 5.2, for example. This emulation is very hacky and not yet perfect, but it should work well on any sane code.
The parser produces an Abstract Syntax Tree (AST), also known as a node tree. What this looks like can best be seen in an example. The program `<?php echo 'Hi', 'World';` will give you a node tree roughly looking like this:

    array(
        0: Stmt_Echo(
            exprs: array(
                0: Scalar_String(
                    value: Hi
                )
                1: Scalar_String(
                    value: World
                )
            )
        )
    )

This matches the semantics of the program: an echo statement, which takes two strings as expressions, with the values `Hi` and `World`.
You can also see that the AST does not contain any whitespace information (but most comments are saved). So using it for formatting analysis is not possible.
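As a rough illustration of why a tree is convenient to process, the sketch below walks a hand-built copy of the node tree shown above and collects every string value. Note that the tree here is modelled with plain nested arrays and the `collectStrings` helper is hypothetical; neither is the parser's actual node classes or API.

```php
<?php
// Hypothetical stand-in for the node tree: plain arrays, NOT the library's node classes.
$tree = array(
    array(
        'type'  => 'Stmt_Echo',
        'exprs' => array(
            array('type' => 'Scalar_String', 'value' => 'Hi'),
            array('type' => 'Scalar_String', 'value' => 'World'),
        ),
    ),
);

// Recursively collect the value of every Scalar_String node.
function collectStrings(array $nodes)
{
    $values = array();
    foreach ($nodes as $node) {
        if (!is_array($node)) {
            continue; // scalar leaf such as a type name or a value
        }
        if (isset($node['type']) && $node['type'] === 'Scalar_String') {
            $values[] = $node['value'];
        }
        // Descend into sub-arrays (child nodes or lists of child nodes).
        $values = array_merge($values, collectStrings($node));
    }
    return $values;
}

print_r(collectStrings($tree));
```

The traversal only has to look at node types and children; it never needs to know how the strings were quoted or spaced in the original source.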
Apart from the parser itself this package also bundles support for some other, related features: