Skip auto model router API call for cached inline chat #3005
base: main
Conversation
Co-authored-by: TylerLeonhardt <2644648+TylerLeonhardt@users.noreply.github.com>
// Skip router for inline chat (ChatLocation.Editor) to avoid 200ms latency penalty.
// Inline chat already resolves to fast models, so classification is not needed.
const location = chatRequest?.location ?? ChatLocation.Panel;
const isInlineChat = location === ChatLocation.Editor;
const usingRouterModel = !isInlineChat && this._configurationService.getExperimentBasedConfig(ConfigKey.TeamInternal.AutoModeRouterUrl, this._expService) !== undefined;
@copilot I still want inline chat to use the automodel routing, but I want it to skip the fetching if there's already something cached, so it should be included in the shouldRoute logic.
Updated the logic to include the inline chat check in the shouldRoute condition. Inline chat now uses automodel routing but skips the router fetch when a cache entry exists. Fixed in commit 1b34074.
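For concreteness, here is a minimal sketch of what the revised control flow might look like. Apart from ChatLocation, the names are illustrative assumptions (the cache shape, method names, and constructor parameters are not the repository's actual API):

```ts
// Sketch only: illustrates "inline chat still routes, but a cache hit skips
// the classification round-trip". Identifiers are hypothetical.

enum ChatLocation { Panel, Editor }

interface RouterDecision { modelId: string }

class AutomodeServiceSketch {
	// Assumed cache keyed by conversation id; the real service may key differently.
	private readonly _cache = new Map<string, RouterDecision>();

	constructor(
		private readonly _fetchRouterModel: (prompt: string) => Promise<RouterDecision>,
		private readonly _routerUrlConfigured: boolean,
	) {}

	async resolve(conversationId: string, prompt: string, location: ChatLocation): Promise<RouterDecision | undefined> {
		if (!this._routerUrlConfigured) {
			return undefined; // router disabled; fall back to default model selection
		}

		const isInlineChat = location === ChatLocation.Editor;
		const cached = this._cache.get(conversationId);

		// Inline chat still participates in routing, but skips the ~200ms
		// classification call whenever a cached selection already exists.
		if (isInlineChat && cached) {
			return cached;
		}

		const decision = await this._fetchRouterModel(prompt);
		this._cache.set(conversationId, decision);
		return decision;
	}
}
```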
- Inline chat still uses automodel routing but skips the router API call when a cache entry exists
- Addresses feedback to include the logic in shouldRoute instead of blocking the router path entirely
- Inline chat will route on the first call, then reuse the cached model on subsequent calls

Co-authored-by: TylerLeonhardt <2644648+TylerLeonhardt@users.noreply.github.com>
The auto model router adds ~200ms latency to inline chat due to the classification API call. This change optimizes inline chat by skipping the router API call when a cached model selection already exists, so inline chat still participates in automodel routing without paying the round-trip on every request.
Changes
Modified AutomodeService._resolveWithRouterModel() to skip the router API fetch for inline chat when a cache entry exists.
Behavior: inline chat routes through the automodel router on the first call, then reuses the cached model selection on subsequent calls.
This approach allows inline chat to benefit from smart model routing while avoiding latency penalties on repeated requests.
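A short usage example against the sketch above, illustrating the route-once-then-reuse behavior described in this PR (the fetch counter and model id are illustrative):

```ts
// The first inline chat request pays the router round-trip; repeated
// requests in the same conversation hit the cache instead.
async function demo() {
	let fetches = 0;
	const service = new AutomodeServiceSketch(
		async () => { fetches++; return { modelId: 'fast-model' }; },
		/* routerUrlConfigured */ true,
	);

	await service.resolve('conv-1', 'fix this loop', ChatLocation.Editor); // router call
	await service.resolve('conv-1', 'now add tests', ChatLocation.Editor); // cache hit
	console.log(fetches); // 1
}
demo();
```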