A static type analyzer for Python code


The main loop

Processing a file

Pytype’s high-level workflow to analyze a single file [1] is:

Processing a single opcode

run_instruction is the central dispatch point for opcode analysis. For every opcode, OP, we have a corresponding byte_OP() method; run_instruction looks this method up, calls it with the current state and the opcode, and uses the return value as the new state.
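The dispatch pattern described above can be sketched in a few lines. This is a minimal illustration, not pytype's actual code: the ToyVm class, the Opcode and State tuples, and the two handlers are hypothetical stand-ins, and only the byte_OP naming scheme and the "state in, new state out" convention come from the text.

```python
import collections

# Hypothetical stand-ins for pytype's real opcode and state objects.
Opcode = collections.namedtuple("Opcode", ["name", "arg"])
State = collections.namedtuple("State", ["stack"])


class ToyVm:
  """Toy VM illustrating byte_OP dispatch; not pytype's actual class."""

  def run_instruction(self, op, state):
    # Look up the handler by opcode name, e.g. LOAD_CONST -> byte_LOAD_CONST.
    handler = getattr(self, "byte_" + op.name, None)
    if handler is None:
      raise NotImplementedError("No handler for opcode " + op.name)
    # The handler's return value becomes the new state.
    return handler(state, op)

  def byte_LOAD_CONST(self, state, op):
    # Push the constant onto a fresh copy of the stack.
    return State(state.stack + (op.arg,))

  def byte_POP_TOP(self, state, op):
    # Discard the top of the stack.
    return State(state.stack[:-1])


vm = ToyVm()
state = State(stack=())
program = [
    Opcode("LOAD_CONST", 1),
    Opcode("LOAD_CONST", "a"),
    Opcode("POP_TOP", None),
]
for op in program:
  state = vm.run_instruction(op, state)
print(state.stack)  # (1,)
```

Looking handlers up by name keeps run_instruction itself small: supporting a new opcode in this sketch only requires adding another byte_* method.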

TIP: If you want to get a feel for how pytype works, an excellent starting point is to look at some of the byte_* methods and see how they mirror the workings of the Python interpreter at a type level: popping arguments off the stack, manipulating the locals and globals dictionaries, and creating objects for classes, methods and functions.
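To make "at a type level" concrete, here is a deliberately simplified sketch in which the stack holds types rather than runtime values. The two functions borrow the byte_* naming for illustration only; their signatures and the typing rules for `+` are assumptions made for this example and are far cruder than what pytype does.

```python
# Hypothetical toy handlers: the stack holds types, not runtime values.

def byte_LOAD_CONST(stack, const):
  # Push the type of the constant instead of the constant itself.
  return stack + [type(const)]


def byte_BINARY_ADD(stack):
  # Pop two operand types and push the result type of `+`.
  *rest, left, right = stack
  if left is right and left in (int, float, str):
    result = left
  elif {left, right} == {int, float}:
    result = float
  else:
    # A real analyzer would record a type error here and keep going.
    raise TypeError(
        "unsupported operand types for +: %s and %s"
        % (left.__name__, right.__name__))
  return rest + [result]


stack = []
stack = byte_LOAD_CONST(stack, 1)
stack = byte_LOAD_CONST(stack, 2.5)
stack = byte_BINARY_ADD(stack)
print(stack)  # [<class 'float'>]
```

The interpreter would compute 3.5 at this point; the type-level version only learns that the result is a float, which is all a type checker needs.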

Two-pass analysis

Pytype performs two passes when analyzing a file, as mentioned in the workflow above.

The first pass starts with run_program(), which executes the bytecode of the Python program using pytype’s virtual machine. This first step compiles the source code, executes the bytecode and builds the typegraph for the program. Besides regular type errors, this step also checks for errors such as:

However, this step will only find errors in functions and classes that are part of the control flow graph, starting with the main function of the file. If a function or class is not reachable from main(), this pass will miss errors in that member. If the file doesn’t have a main() – i.e. it is a library – then no class or function bodies will be type checked.

Because of that, pytype uses the typegraph to run a second analysis pass by calling analyze(). This pass recursively type checks all members of the program, starting at the top level definitions. These are mostly classes, though some libraries define top-level functions.
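The difference in coverage between the two passes can be illustrated with a toy model. Everything here is hypothetical: the CALLS table, first_pass and second_pass are invented for this sketch and do not correspond to pytype's run_program() or analyze() implementations. The sketch only shows why a reachability-driven pass misses definitions that a pass over all top-level definitions will visit.

```python
# Hypothetical program model: each definition lists the definitions it calls.
CALLS = {
    "main": ["helper"],
    "helper": [],
    "unused": ["helper"],
}


def first_pass(entry, calls):
  """Visits only the definitions reachable from `entry`."""
  checked = []
  pending = [entry]
  while pending:
    name = pending.pop()
    if name in checked or name not in calls:
      continue
    checked.append(name)
    pending.extend(calls[name])
  return checked


def second_pass(calls):
  """Visits every top-level definition, reachable or not."""
  return list(calls)


reached = first_pass("main", CALLS)
missed = [name for name in second_pass(CALLS) if name not in reached]
print(reached)  # ['main', 'helper']
print(missed)   # ['unused']

# A library without main(): the first pass reaches nothing.
print(first_pass("main", {"f": [], "g": ["f"]}))  # []
```

In this model, "unused" is only ever examined by the second pass, and in the library case the second pass does all of the work.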

Both passes are performed regardless of whether pytype is run in “inference” (-o) or “check” (-C) mode. The second pass can be disabled using the --main (or -m) debug option, in which case only the code that is reachable from main() will be analyzed.

  1. process_one_file()

  2. class CallTracer

  3. run_program() 

  4. compile_src() 

  5. run_bytecode() 

  6. A frame is a segment of code, typically one method or function.

     [^run-instruction]: run_instruction()

     [^compute-types]: compute_types()

  7. analyze()