Closing the Loop: Universal Repository Representation with RPG-Encoder

Jane Luo1,‡,* Chengyu Yin1,‡,* Xin Zhang1,*,† Qingtao Li1 Steven Liu1,‡ Yiming Huang2
Jie Wu3,‡ Hao Liu1,‡ Yangyu Huang1 Yu Kang1 Fangkai Yang1 Ying Xin1 Scarlett Li1

1Microsoft Research Asia · 2UCSD · 3Tsinghua University

*Equal Contribution · †Corresponding Author · ‡Work done during internship at Microsoft

RPG-Encoder Overview

RPG-Encoder bridges the gap between Code and RPG, enabling bidirectional transformation through semantic lifting, hierarchical construction, and incremental evolution.

Abstract

Current repository agents encounter a reasoning disconnect due to fragmented representations, as existing methods rely on isolated API documentation or dependency graphs that lack semantic depth. We consider repository comprehension and generation to be inverse processes within a unified cycle: generation expands intent into implementation, while comprehension compresses implementation back into intent. To address this, we propose RPG-Encoder, a framework that generalizes the Repository Planning Graph (RPG) from a static generative blueprint into a unified, high-fidelity representation. RPG-Encoder closes the reasoning loop through three mechanisms: (1) Encoding raw code into the RPG that combines lifted semantic features with code dependencies; (2) Evolving the topology incrementally to decouple maintenance costs from repository scale, reducing overhead by 95.7%; and (3) Operating as a unified interface for structure-aware navigation. In evaluations, RPG-Encoder establishes state-of-the-art repository understanding on SWE-bench Verified with 93.7% Acc@5 and exceeds the best baseline by over 10% on SWE-bench Live Lite. These results highlight our superior fine-grained localization accuracy in complex codebases. Furthermore, it achieves 98.5% reconstruction coverage on RepoCraft, confirming RPG's high-fidelity capacity to mirror the original codebase and closing the loop between intent and implementation.

Method

RPG-Encoder Pipeline

1 RPG Structure

We define RPG as a hierarchical, dual-view graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. The node set $\mathcal{V} = \mathcal{V}_{H} \cup \mathcal{V}_{L}$ distinguishes High-level Nodes $\mathcal{V}_{H}$ representing architectural directories from Low-level Nodes $\mathcal{V}_{L}$ comprising atomic implementations (files, classes, functions). Each node $v = (f, \mathbf{m}) \in \mathcal{V}$ pairs a semantic feature $f$ describing functionality with structural metadata $\mathbf{m}$ encoding code entity attributes. The edge set $\mathcal{E}$ integrates two perspectives:

  • Functional edges $\mathcal{E}_{\text{feature}}$: establishing teleological hierarchy
  • Dependency edges $\mathcal{E}_{\text{dep}}$: mapping logical interactions including imports and calls
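The definition above can be made concrete with a small data-structure sketch. This is an illustration only, not the authors' implementation: the class names `Node`, `RPG`, `Level`, and `EdgeKind`, and all field names, are assumptions chosen to mirror the symbols $\mathcal{V}_{H}$, $\mathcal{V}_{L}$, $f$, $\mathbf{m}$, $\mathcal{E}_{\text{feature}}$, and $\mathcal{E}_{\text{dep}}$.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    HIGH = "high"  # V_H: architectural directories
    LOW = "low"    # V_L: files, classes, functions


class EdgeKind(Enum):
    FEATURE = "feature"  # E_feature: functional hierarchy
    DEP = "dep"          # E_dep: imports and calls


@dataclass
class Node:
    node_id: str
    level: Level
    feature: str                              # f: what the entity does
    meta: dict = field(default_factory=dict)  # m: code entity attributes


@dataclass
class RPG:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: set = field(default_factory=set)    # (src, dst, EdgeKind)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, src, dst, kind):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be registered nodes")
        self.edges.add((src, dst, kind))

    def view(self, kind):
        """Return the edges of one view; both views share self.nodes."""
        return {(s, d) for s, d, k in self.edges if k is kind}


# Toy repository (hypothetical names and paths).
g = RPG()
g.add_node(Node("io", Level.HIGH, "data loading", {"path": "pkg/io"}))
g.add_node(Node("io.read_csv", Level.LOW, "parse CSV files into rows",
                {"type": "function", "file": "pkg/io/csv.py"}))
g.add_node(Node("io.open_text", Level.LOW, "open text files",
                {"type": "function", "file": "pkg/io/text.py"}))
g.add_edge("io", "io.read_csv", EdgeKind.FEATURE)
g.add_edge("io", "io.open_text", EdgeKind.FEATURE)
g.add_edge("io.read_csv", "io.open_text", EdgeKind.DEP)
```

Storing the edge kind on each edge, rather than keeping two separate graphs, keeps a single node set that both views index into.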

2 RPG Encoding: Three-Phase Extraction

Phase 1: Semantic Lifting

Lifts the codebase into a discrete registry of Low-level Nodes. Extracts semantic features for functions and classes, mapping them to behavioral signatures while retaining code-level attributes as metadata.
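A minimal sketch of the registry-building step, using Python's standard `ast` module. The way semantic features are actually produced is not described here, so the first docstring line is used as a stand-in feature; that choice, the function name `lift_low_level_nodes`, and the dictionary layout are assumptions for illustration.

```python
import ast


def lift_low_level_nodes(source, file_path):
    """Register one low-level node per function or class in `source`."""
    registry = []
    for entity in ast.walk(ast.parse(source)):
        if isinstance(entity, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = (ast.get_docstring(entity) or "").strip()
            registry.append({
                # Stand-in semantic feature (assumption): first docstring line,
                # falling back to the entity name when there is no docstring.
                "feature": doc.splitlines()[0] if doc else entity.name,
                # Code-level attributes retained as metadata.
                "meta": {
                    "name": entity.name,
                    "type": "class" if isinstance(entity, ast.ClassDef) else "function",
                    "file": file_path,
                    "line": entity.lineno,
                },
            })
    return registry


SAMPLE = '''
class Cache:
    """Store recent results in memory."""

    def get(self, key):
        return None


def load(path):
    """Read a config file from disk."""
    return open(path).read()
'''

nodes = lift_low_level_nodes(SAMPLE, "pkg/cache.py")
```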

Phase 2: Structure Reorganization

Constructs High-level Nodes by recovering the latent functional topology. Performs Functional Abstraction via granularity-based compression and Hierarchical Aggregation to link nodes to semantic centroids.
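The Hierarchical Aggregation step can be pictured as a nearest-centroid assignment. The sketch below covers only that step (not Functional Abstraction) and swaps in Jaccard word overlap as the similarity measure, since the actual matching procedure is not specified here; the centroid texts and feature strings are hypothetical.

```python
def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def aggregate(features, centroids):
    """Link each low-level feature to its most similar centroid."""
    groups = {name: [] for name in centroids}
    for feat in features:
        best = max(
            centroids,
            key=lambda name: jaccard(tokens(feat), tokens(centroids[name])),
        )
        groups[best].append(feat)
    return groups


# Hypothetical centroids, each described by a few representative words.
centroids = {
    "data loading": "load read parse files data",
    "model training": "train fit optimize model",
}
features = ["parse csv files", "fit linear model", "read json data"]
groups = aggregate(features, centroids)
```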

Phase 3: Artifact Grounding

Anchors the functional manifold to physical artifacts using LCA-based bottom-up propagation. Injects dependency edges via AST analysis to complete the implementation map.
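One way to read "LCA-based bottom-up propagation" is that each functional node is anchored to the lowest common ancestor directory of everything beneath it. The sketch below follows that reading and also extracts import edges with the standard `ast` module; call edges are omitted for brevity. The function names, the tree layout, and the file paths are assumptions for illustration.

```python
import ast
import posixpath


def ground_to_directory(file_paths):
    """Lowest common ancestor directory of the files under one leaf node."""
    return posixpath.commonpath([posixpath.dirname(p) for p in file_paths])


def propagate(tree, leaf_files):
    """Bottom-up: a parent's anchor is the LCA of its children's anchors."""
    anchors = {}

    def visit(node):
        children = tree.get(node, [])
        if not children:
            anchors[node] = ground_to_directory(leaf_files[node])
        else:
            anchors[node] = posixpath.commonpath([visit(c) for c in children])
        return anchors[node]

    roots = set(tree) - {c for cs in tree.values() for c in cs}
    for root in roots:
        visit(root)
    return anchors


def import_edges(module_name, source):
    """Dependency edges (importer, imported) found by walking the AST."""
    edges = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            edges.update((module_name, alias.name) for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges.add((module_name, node.module))
    return edges


# Hypothetical functional tree and file layout.
tree = {"app": ["data", "train"], "data": [], "train": []}
leaf_files = {
    "data": ["pkg/io/csv.py", "pkg/io/json.py"],
    "train": ["pkg/train/loop.py"],
}
anchors = propagate(tree, leaf_files)
deps = import_edges("pkg.train.loop", "import os\nfrom pkg.io import csv\n")
```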

3 RPG Evolution: Incremental Maintenance

To reduce the cost of full re-generation, we maintain the graph incrementally via commit-level feature extraction and three atomic update protocols:

  • Deletions: Remove nodes for deleted entities and recursively prune empty parent categories.
  • Modifications: Re-generate semantic descriptions; update position only if functional intent shift is detected.
  • Additions: Create nodes for new entities and insert by matching semantics against existing centroids.
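The three protocols above can be sketched on a toy functional hierarchy. This is a simplified illustration under stated assumptions: `FeatureTree`, `keyword_match`, and the `intent_shifted` callback are hypothetical names, and substring matching against category names stands in for semantic matching against centroids.

```python
class FeatureTree:
    """Toy functional hierarchy with parent pointers."""

    def __init__(self):
        self.parent = {}       # node -> parent category (None for a root)
        self.children = {}     # category -> set of child nodes
        self.description = {}  # entity -> semantic description

    def add_category(self, name, parent=None):
        self.parent[name] = parent
        self.children[name] = set()
        if parent is not None:
            self.children[parent].add(name)

    def add(self, entity, description, match):
        """Addition: insert under the category chosen by `match`."""
        category = match(description, self)
        self.parent[entity] = category
        self.children[category].add(entity)
        self.description[entity] = description

    def delete(self, entity):
        """Deletion: remove the node, then prune empty ancestors upward."""
        self.description.pop(entity, None)
        node = entity
        while node is not None:
            parent = self.parent.pop(node)
            self.children.pop(node, None)
            if parent is not None:
                self.children[parent].discard(node)
                if self.children[parent]:
                    break  # parent still has members, stop pruning
            node = parent

    def modify(self, entity, description, match, intent_shifted):
        """Modification: refresh the description; move only on intent shift."""
        if intent_shifted(self.description[entity], description):
            self.delete(entity)
            self.add(entity, description, match)
        else:
            self.description[entity] = description


def keyword_match(description, tree):
    """Hypothetical matcher: first category whose name occurs in the text."""
    for category in tree.children:
        if category in description:
            return category
    raise LookupError("no matching category")


tree = FeatureTree()
tree.add_category("root")
tree.add_category("parsing", "root")
tree.add_category("network", "root")
tree.add("read_csv", "parsing of csv rows", keyword_match)
tree.add("fetch", "network download helper", keyword_match)

shifted = lambda old, new: keyword_match(old, tree) != keyword_match(new, tree)
tree.modify("fetch", "network download with retries", keyword_match, shifted)
tree.delete("read_csv")  # leaves "parsing" empty, so it is pruned as well
```

Each operation touches only the affected entity and its ancestors, which is what keeps the update cost independent of the total number of nodes in this sketch.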

4 RPG Operation: Unified Reasoning Substrate

RPG provides a queryable index where Functional and Dependency Views are partitioned by edge types but share a unified node set. Three core tools enable navigation:

SearchNode

Global retrieval by matching intent against semantic features or filtering metadata.

FetchNode

Node-level data retrieval: extracts attributes and raw source code.

ExploreRPG

Cross-view traversal along edges for navigating execution flows.
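To make the division of labor between the three tools concrete, here is a toy version over an in-memory graph. The snake_case function names, their signatures, the word-overlap ranking, and the sample nodes are all assumptions for illustration; the actual tool interfaces are not specified in this section.

```python
# Hypothetical node set shared by both views.
NODES = {
    "auth": {
        "feature": "user authentication",
        "meta": {"type": "directory"},
        "code": None,
    },
    "auth.login": {
        "feature": "verify password and start session",
        "meta": {"type": "function", "file": "auth/login.py"},
        "code": "def login(user, pw): ...",
    },
    "auth.hash_pw": {
        "feature": "hash a password with salt",
        "meta": {"type": "function", "file": "auth/crypto.py"},
        "code": "def hash_pw(pw, salt): ...",
    },
}
# Edges are tagged with the view they belong to.
EDGES = [
    ("auth", "auth.login", "feature"),
    ("auth", "auth.hash_pw", "feature"),
    ("auth.login", "auth.hash_pw", "dep"),
]


def search_node(query, node_type=None):
    """Global retrieval: rank by word overlap, optionally filter on metadata."""
    words = set(query.lower().split())
    hits = []
    for node_id, node in NODES.items():
        if node_type and node["meta"]["type"] != node_type:
            continue
        score = len(words & set(node["feature"].lower().split()))
        if score:
            hits.append((score, node_id))
    return [node_id for _, node_id in sorted(hits, key=lambda h: (-h[0], h[1]))]


def fetch_node(node_id):
    """Node-level retrieval: attributes plus raw source code."""
    node = NODES[node_id]
    return {"meta": node["meta"], "code": node["code"]}


def explore_rpg(node_id, view):
    """Traverse outgoing edges of one view ('feature' or 'dep')."""
    return [dst for src, dst, kind in EDGES if src == node_id and kind == view]


candidates = search_node("password session")
callees = explore_rpg(candidates[0], "dep")
source = fetch_node(callees[0])["code"]
```

The usage lines at the end show a typical chain: search by intent, follow a dependency edge from the top hit, then fetch the source of the node that was reached.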

Experiments

We evaluate RPG-Encoder on two challenging benchmarks: SWE-bench for fault localization and RepoCraft for repository reconstruction. Our experiments demonstrate that RPG-guided agents achieve state-of-the-art performance with significant efficiency gains across multiple backbone models.

Table 1: Comprehensive Localization Results on SWE-bench Verified and SWE-bench Live

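| Method | Verified File Acc@1 | Verified File Acc@5 | Verified File Pre | Verified File Rec | Verified Func Acc@1 | Verified Func Acc@5 | Verified Func Pre | Verified Func Rec | Live File Acc@1 | Live File Acc@5 | Live File Pre | Live File Rec | Live Func Acc@1 | Live Func Acc@5 | Live Func Pre | Live Func Rec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| o3-mini | | | | | | | | | | | | | | | | |
| Agentless | 67.1 | 88.1 | 67.0 | 64.7 | 34.7 | 60.3 | 39.4 | 33.2 | 54.2 | 78.5 | 55.6 | 47.7 | 28.8 | 54.2 | 39.3 | 25.6 |
| OrcaLoca | 67.5 | 71.9 | 68.3 | 64.0 | 46.3 | 52.9 | 48.3 | 41.5 | 35.4 | 38.0 | 36.2 | 27.6 | 23.1 | 26.1 | 25.3 | 15.6 |
| LocAgent | 62.8 | 77.2 | 64.7 | 61.4 | 32.1 | 40.5 | 33.9 | 28.9 | 47.6 | 59.4 | 49.7 | 41.2 | 23.8 | 31.0 | 26.6 | 17.7 |
| CoSIL | 66.5 | 85.7 | 66.2 | 63.6 | 52.2 | 73.3 | 54.7 | 47.1 | 60.9 | 80.8 | 66.1 | 54.8 | 43.8 | 65.1 | 51.4 | 35.6 |
| RPG-Encoder (Ours) | 78.3 | 91.2 | 80.7 | 76.8 | 58.5 | 77.8 | 62.9 | 55.1 | 73.7 | 88.2 | 77.5 | 64.5 | 56.5 | 75.6 | 64.7 | 46.9 |
| Δ best | +10.8 | +3.1 | +12.4 | +12.1 | +6.3 | +4.5 | +8.2 | +8.0 | +12.8 | +7.4 | +11.4 | +9.7 | +12.7 | +10.5 | +13.3 | +11.3 |
| GPT-4o | | | | | | | | | | | | | | | | |
| Agentless | 63.0 | 86.1 | 63.1 | 61.1 | 31.4 | 58.8 | 34.7 | 29.3 | 56.1 | 78.8 | 57.1 | 48.3 | 30.6 | 57.4 | 41.4 | 26.4 |
| OrcaLoca | 64.3 | 69.3 | 65.0 | 61.4 | 39.8 | 53.3 | 42.5 | 36.7 | 42.5 | 47.6 | 45.0 | 34.0 | 28.2 | 37.0 | 32.5 | 21.1 |
| LocAgent | 71.9 | 87.9 | 73.4 | 69.3 | 40.1 | 67.4 | 44.8 | 38.1 | 62.5 | 80.0 | 66.8 | 54.2 | 35.7 | 56.4 | 44.5 | 29.9 |
| CoSIL | 64.9 | 84.4 | 65.0 | 62.2 | 43.2 | 66.2 | 48.2 | 40.1 | 60.1 | 77.0 | 63.7 | 50.7 | 41.2 | 61.6 | 49.1 | 29.4 |
| RPG-Encoder (Ours) | 74.5 | 89.6 | 77.0 | 72.7 | 53.1 | 76.7 | 57.9 | 49.5 | 69.2 | 83.5 | 73.2 | 60.3 | 50.5 | 69.4 | 59.4 | 41.8 |
| Δ best | +2.6 | +1.7 | +3.6 | +3.4 | +9.9 | +9.3 | +9.7 | +9.4 | +6.7 | +3.5 | +6.4 | +6.1 | +9.3 | +7.8 | +10.3 | +11.9 |
| GPT-4.1 | | | | | | | | | | | | | | | | |
| Agentless | 65.2 | 90.8 | 65.7 | 63.5 | 29.3 | 49.0 | 32.7 | 26.4 | 62.0 | 85.5 | 63.0 | 54.5 | 35.1 | 59.4 | 46.0 | 25.4 |
| OrcaLoca | 75.2 | 80.0 | 76.5 | 71.3 | 55.2 | 66.7 | 59.0 | 50.1 | 56.2 | 59.6 | 57.1 | 44.2 | 42.0 | 50.5 | 46.2 | 29.1 |
| LocAgent | 79.5 | 90.9 | 80.8 | 77.2 | 32.3 | 65.6 | 36.7 | 31.2 | 74.7 | 87.9 | 76.8 | 66.1 | 43.4 | 68.7 | 52.5 | 38.7 |
| CoSIL | 69.8 | 90.6 | 70.7 | 67.6 | 51.8 | 74.5 | 55.3 | 47.0 | 62.3 | 84.7 | 67.3 | 55.6 | 48.8 | 72.2 | 58.3 | 41.2 |
| RPG-Encoder (Ours) | 82.6 | 93.2 | 83.6 | 79.3 | 68.7 | 83.4 | 71.0 | 62.4 | 78.0 | 90.5 | 81.4 | 69.0 | 64.7 | 81.9 | 72.1 | 52.6 |
| Δ best | +3.1 | +2.3 | +2.8 | +2.1 | +13.5 | +8.9 | +12.0 | +12.3 | +3.3 | +2.6 | +4.6 | +2.9 | +15.9 | +9.7 | +13.8 | +11.4 |
| GPT-5 | | | | | | | | | | | | | | | | |
| Agentless | 78.7 | 95.9 | 78.3 | 76.2 | 45.1 | 68.1 | 47.3 | 41.3 | 64.5 | 87.4 | 65.1 | 57.4 | 38.8 | 64.6 | 49.7 | 31.6 |
| OrcaLoca | 88.2 | 93.9 | 88.6 | 84.2 | 76.1 | 86.2 | 79.1 | 68.6 | 74.4 | 82.3 | 77.6 | 63.5 | 59.6 | 74.0 | 68.6 | 46.6 |
| LocAgent | 88.2 | 96.7 | 88.4 | 86.7 | 50.9 | 80.3 | 55.9 | 49.7 | 79.7 | 93.0 | 81.4 | 74.2 | 48.0 | 68.7 | 56.6 | 40.5 |
| CoSIL | 82.8 | 95.7 | 82.3 | 80.2 | 68.3 | 81.8 | 68.9 | 62.3 | 69.8 | 89.3 | 72.9 | 62.2 | 55.2 | 76.2 | 62.3 | 46.5 |
| RPG-Encoder (Ours) | 91.9 | 97.7 | 91.1 | 89.1 | 83.4 | 93.6 | 84.5 | 76.9 | 82.1 | 94.4 | 85.4 | 76.2 | 71.9 | 87.8 | 78.1 | 61.1 |
| Δ best | +3.7 | +1.0 | +2.5 | +2.4 | +7.3 | +7.4 | +5.4 | +8.3 | +2.4 | +1.4 | +4.0 | +2.0 | +12.3 | +11.6 | +9.5 | +14.5 |
| Claude-4.5-Sonnet | | | | | | | | | | | | | | | | |
| Agentless | 76.6 | 96.5 | 76.9 | 74.4 | 31.7 | 34.6 | 32.0 | 27.1 | 63.8 | 89.7 | 66.1 | 58.0 | 41.4 | 72.4 | 55.3 | 35.9 |
| OrcaLoca | 87.2 | 89.6 | 87.5 | 82.2 | 74.5 | 79.3 | 76.5 | 65.1 | 74.7 | 78.3 | 76.2 | 61.5 | 65.1 | 69.4 | 67.8 | 46.1 |
| LocAgent | 71.4 | 76.6 | 72.7 | 70.2 | 49.3 | 57.8 | 51.5 | 44.9 | 58.7 | 69.0 | 61.6 | 54.7 | 47.3 | 60.5 | 52.6 | 39.3 |
| CoSIL | 75.5 | 96.1 | 75.9 | 73.7 | 57.5 | 78.7 | 60.7 | 52.9 | 64.5 | 88.3 | 69.4 | 57.5 | 51.1 | 74.9 | 60.1 | 39.6 |
| RPG-Encoder (Ours) | 90.5 | 97.6 | 91.8 | 88.6 | 79.8 | 93.7 | 83.4 | 75.8 | 82.0 | 93.9 | 85.6 | 75.8 | 74.8 | 90.4 | 80.7 | 63.3 |
| Δ best | +3.3 | +1.1 | +4.3 | +6.4 | +5.3 | +14.4 | +6.9 | +10.7 | +7.3 | +4.2 | +9.4 | +14.3 | +9.7 | +15.5 | +12.9 | +17.2 |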
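The Δ best rows can be read as RPG-Encoder's score minus the strongest baseline in the same column. That reading is an interpretation rather than a stated definition; the short check below applies it to one column (o3-mini, SWE-bench Verified, file-level Acc@1) using the values from Table 1. The helper name `delta_best` is hypothetical.

```python
# o3-mini, SWE-bench Verified, file-level Acc@1 (values from Table 1).
baselines = {
    "Agentless": 67.1,
    "OrcaLoca": 67.5,
    "LocAgent": 62.8,
    "CoSIL": 66.5,
}
ours = 78.3


def delta_best(ours, baselines):
    """Margin over the strongest baseline, rounded to one decimal place."""
    return round(ours - max(baselines.values()), 1)


margin = delta_best(ours, baselines)  # 10.8, matching the reported +10.8
```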

Table 2: Repository Reconstruction on RepoCraft

| Framework | Backbone | Coverage (%) | Accuracy (Pass/Vote) (%) | #Files | nLOC | Code Tokens |
|---|---|---|---|---|---|---|
| Gold Projects | Human | 100.0 | 94.8 / 98.8 | 345 | 97,725 | 718,946 |
| ZeroRepo-Doc | GPT-4.1 | 64.6 | 50.0 / 63.4 | 209 | 6,079 | 158,948 |
| ZeroRepo-Doc | GPT-5-mini | 74.2 | 52.6 / 71.4 | 143 | 13,414 | 125,625 |
| ZeroRepo-RPG (Ours) | GPT-4.1 | 93.5 | 85.8 / 93.4 | 206 | 35,190 | 346,865 |
| ZeroRepo-RPG (Ours) | GPT-5-mini | 98.5 | 86.0 / 97.7 | 226 | 60,871 | 550,432 |

Key Insights

Analysis

Maintenance Cost
Cost comparison: Full reconstruction vs. incremental updates across commit history.
Agent Step Distribution
Step-wise action distributions induced by RPG interface across LLMs.
Error Distribution
Distribution of failure modes on SWE-bench Verified (100 failed trajectories per method).

BibTeX

@misc{luo2026rpgencoder,
      title={Closing the Loop: Universal Repository Representation with RPG-Encoder}, 
      author={Jane Luo and Chengyu Yin and Xin Zhang and Qingtao Li and Steven Liu and Yiming Huang and Jie Wu and Hao Liu and Yangyu Huang and Yu Kang and Fangkai Yang and Ying Xin and Scarlett Li},
      year={2026},
      eprint={2602.02084},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.02084}, 
}