{"data":{"post":{"title":"I Gave GLM-5.2 Eyes","subtitle":"","isPublished":true,"createdTime":"2026-06-29T00:00:00.000Z","lastModifiedTime":null,"license":null,"tags":["AI","Agent"],"category":"Programming","file":{"childMdx":{"excerpt":"Have you ever sent an image to GLM-5.2 in OpenCode? The model says it is text-only and cannot…","code":{"body":"function _objectWithoutProperties(source, excluded) { if (source == null) return {}; var target = _objectWithoutPropertiesLoose(source, excluded); var key, i; if (Object.getOwnPropertySymbols) { var sourceSymbolKeys = Object.getOwnPropertySymbols(source); for (i = 0; i < sourceSymbolKeys.length; i++) { key = sourceSymbolKeys[i]; if (excluded.indexOf(key) >= 0) continue; if (!Object.prototype.propertyIsEnumerable.call(source, key)) continue; target[key] = source[key]; } } return target; }\n\nfunction _objectWithoutPropertiesLoose(source, excluded) { if (source == null) return {}; var target = {}; var sourceKeys = Object.keys(source); var key, i; for (i = 0; i < sourceKeys.length; i++) { key = sourceKeys[i]; if (excluded.indexOf(key) >= 0) continue; target[key] = source[key]; } return target; }\n\nconst layoutProps = {};\nreturn class MDXContent extends React.Component {\n  constructor(props) {\n    super(props);\n    this.layout = null;\n  }\n\n  render() {\n    const _this$props = this.props,\n          {\n      components\n    } = _this$props,\n          props = _objectWithoutProperties(_this$props, [\"components\"]);\n\n    return React.createElement(MDXTag, {\n      name: \"wrapper\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `Have you ever sent an image to GLM-5.2 in OpenCode? The model says it is text-only and cannot inspect visual content. 
You accept the limitation and move on.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"figure\",\n      components: components,\n      parentName: \"p\"\n    }, `\n    `, React.createElement(MDXTag, {\n      name: \"a\",\n      components: components,\n      parentName: \"figure\",\n      props: {\n        \"href\": \"/static/132076ad702789fc387b27379f1f0f11/d16b7/glm-5-2-not-spport-vision-task-1.webp\"\n      }\n    }, React.createElement(MDXTag, {\n      name: \"picture\",\n      components: components,\n      parentName: \"a\"\n    }, `\n  `, React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/132076ad702789fc387b27379f1f0f11/0cc25/glm-5-2-not-spport-vision-task-1.png\",\n        \"srcSet\": [\"/static/132076ad702789fc387b27379f1f0f11/5116e/glm-5-2-not-spport-vision-task-1.png 178w\", \"/static/132076ad702789fc387b27379f1f0f11/92f55/glm-5-2-not-spport-vision-task-1.png 356w\", \"/static/132076ad702789fc387b27379f1f0f11/0cc25/glm-5-2-not-spport-vision-task-1.png 712w\", \"/static/132076ad702789fc387b27379f1f0f11/7ae06/glm-5-2-not-spport-vision-task-1.png 1068w\", \"/static/132076ad702789fc387b27379f1f0f11/eee47/glm-5-2-not-spport-vision-task-1.png 1424w\", \"/static/132076ad702789fc387b27379f1f0f11/38407/glm-5-2-not-spport-vision-task-1.png 2136w\", \"/static/132076ad702789fc387b27379f1f0f11/58df7/glm-5-2-not-spport-vision-task-1.png 2626w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n      }\n    }), React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/132076ad702789fc387b27379f1f0f11/690c8/glm-5-2-not-spport-vision-task-1.webp\",\n        \"srcSet\": [\"/static/132076ad702789fc387b27379f1f0f11/25c8a/glm-5-2-not-spport-vision-task-1.webp 178w\", 
\"/static/132076ad702789fc387b27379f1f0f11/60698/glm-5-2-not-spport-vision-task-1.webp 356w\", \"/static/132076ad702789fc387b27379f1f0f11/690c8/glm-5-2-not-spport-vision-task-1.webp 712w\", \"/static/132076ad702789fc387b27379f1f0f11/d7e52/glm-5-2-not-spport-vision-task-1.webp 1068w\", \"/static/132076ad702789fc387b27379f1f0f11/456ef/glm-5-2-not-spport-vision-task-1.webp 1424w\", \"/static/132076ad702789fc387b27379f1f0f11/2a654/glm-5-2-not-spport-vision-task-1.webp 2136w\", \"/static/132076ad702789fc387b27379f1f0f11/d16b7/glm-5-2-not-spport-vision-task-1.webp 2626w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n      }\n    }), `\n  `, React.createElement(MDXTag, {\n      name: \"img\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/132076ad702789fc387b27379f1f0f11/d16b7/glm-5-2-not-spport-vision-task-1.webp\",\n        \"alt\": \"OpenCode prompt with an attached image where GLM-5.2 reports that it cannot inspect images.\",\n        \"title\": \"Image Input Fails in GLM-5.2\",\n        \"width\": 712,\n        \"height\": 164,\n        \"loading\": \"lazy\"\n      }\n    }))), `\n    `, React.createElement(MDXTag, {\n      name: \"figcaption\",\n      components: components,\n      parentName: \"figure\"\n    }, `\n        `, React.createElement(MDXTag, {\n      name: \"span\",\n      components: components,\n      parentName: \"figcaption\"\n    }, `\n            Image Input Fails in GLM-5.2\n        `), `\n    `))), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `The hidden failure is worse. When GLM-5.2 works with browser-use tools, the tools capture screenshots, and the model confidently reports what it supposedly sees. But the model never saw a pixel. It read the AX tree, the accessibility metadata returned by a separate snapshot call, and treated that as visual verification. 
The AX tree can confirm that a button exists, but it cannot confirm whether the button is centered, whether the text is readable, or whether two screenshots match.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"figure\",\n      components: components,\n      parentName: \"p\"\n    }, `\n    `, React.createElement(MDXTag, {\n      name: \"a\",\n      components: components,\n      parentName: \"figure\",\n      props: {\n        \"href\": \"/static/2cbdf2abc17b1a4fb20bdf994f842716/f56ad/glm-5-2-not-spport-vision-task-2.webp\"\n      }\n    }, React.createElement(MDXTag, {\n      name: \"picture\",\n      components: components,\n      parentName: \"a\"\n    }, `\n  `, React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/2cbdf2abc17b1a4fb20bdf994f842716/0cc25/glm-5-2-not-spport-vision-task-2.png\",\n        \"srcSet\": [\"/static/2cbdf2abc17b1a4fb20bdf994f842716/5116e/glm-5-2-not-spport-vision-task-2.png 178w\", \"/static/2cbdf2abc17b1a4fb20bdf994f842716/92f55/glm-5-2-not-spport-vision-task-2.png 356w\", \"/static/2cbdf2abc17b1a4fb20bdf994f842716/0cc25/glm-5-2-not-spport-vision-task-2.png 712w\", \"/static/2cbdf2abc17b1a4fb20bdf994f842716/7ae06/glm-5-2-not-spport-vision-task-2.png 1068w\", \"/static/2cbdf2abc17b1a4fb20bdf994f842716/eee47/glm-5-2-not-spport-vision-task-2.png 1424w\", \"/static/2cbdf2abc17b1a4fb20bdf994f842716/38407/glm-5-2-not-spport-vision-task-2.png 2136w\", \"/static/2cbdf2abc17b1a4fb20bdf994f842716/34c79/glm-5-2-not-spport-vision-task-2.png 2628w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n      }\n    }), React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": 
\"/static/2cbdf2abc17b1a4fb20bdf994f842716/690c8/glm-5-2-not-spport-vision-task-2.webp\",\n        \"srcSet\": [\"/static/2cbdf2abc17b1a4fb20bdf994f842716/25c8a/glm-5-2-not-spport-vision-task-2.webp 178w\", \"/static/2cbdf2abc17b1a4fb20bdf994f842716/60698/glm-5-2-not-spport-vision-task-2.webp 356w\", \"/static/2cbdf2abc17b1a4fb20bdf994f842716/690c8/glm-5-2-not-spport-vision-task-2.webp 712w\", \"/static/2cbdf2abc17b1a4fb20bdf994f842716/d7e52/glm-5-2-not-spport-vision-task-2.webp 1068w\", \"/static/2cbdf2abc17b1a4fb20bdf994f842716/456ef/glm-5-2-not-spport-vision-task-2.webp 1424w\", \"/static/2cbdf2abc17b1a4fb20bdf994f842716/2a654/glm-5-2-not-spport-vision-task-2.webp 2136w\", \"/static/2cbdf2abc17b1a4fb20bdf994f842716/f56ad/glm-5-2-not-spport-vision-task-2.webp 2628w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n      }\n    }), `\n  `, React.createElement(MDXTag, {\n      name: \"img\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/2cbdf2abc17b1a4fb20bdf994f842716/f56ad/glm-5-2-not-spport-vision-task-2.webp\",\n        \"alt\": \"OpenCode verification output where GLM-5.2 relies on an accessibility snapshot instead of actual pixels.\",\n        \"title\": \"Accessibility Snapshot Mistaken for Vision\",\n        \"width\": 712,\n        \"height\": 243,\n        \"loading\": \"lazy\"\n      }\n    }))), `\n    `, React.createElement(MDXTag, {\n      name: \"figcaption\",\n      components: components,\n      parentName: \"figure\"\n    }, `\n        `, React.createElement(MDXTag, {\n      name: \"span\",\n      components: components,\n      parentName: \"figcaption\"\n    }, `\n            Accessibility Snapshot Mistaken for Vision\n        `), `\n    `))), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `To solve both problems, I built a plugin that gives GLM-5.2 eyes in OpenCode. 
This post covers the main lessons I learned while building it:`), React.createElement(MDXTag, {\n      name: \"ol\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"li\",\n      components: components,\n      parentName: \"ol\"\n    }, `How to mix and match models with different capabilities without a model router or fusion models.`), React.createElement(MDXTag, {\n      name: \"li\",\n      components: components,\n      parentName: \"ol\"\n    }, `How to design agent-to-agent communication.`), React.createElement(MDXTag, {\n      name: \"li\",\n      components: components,\n      parentName: \"ol\"\n    }, `How to make skills trigger reliably on multimodal content.`)), React.createElement(MDXTag, {\n      name: \"h2\",\n      components: components\n    }, `Installing and Using the Plugin`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `If you want the plugin right away, here is the install command:`), React.createElement(MDXTag, {\n      name: \"pre\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"code\",\n      components: components,\n      parentName: \"pre\",\n      props: {\n        \"className\": \"language-shell\"\n      }\n    }, `opencode plugin opencode-vision -g\n`)), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `This plugin comes with a `, React.createElement(MDXTag, {\n      name: \"inlineCode\",\n      components: components,\n      parentName: \"p\"\n    }, `vision`), ` skill. 
To use it, drag an image into the input box.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"figure\",\n      components: components,\n      parentName: \"p\"\n    }, `\n    `, React.createElement(MDXTag, {\n      name: \"a\",\n      components: components,\n      parentName: \"figure\",\n      props: {\n        \"href\": \"/static/f02b6313689ce5373f8a29d98823d81d/f56ad/example-drop-image-1.webp\"\n      }\n    }, React.createElement(MDXTag, {\n      name: \"picture\",\n      components: components,\n      parentName: \"a\"\n    }, `\n  `, React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/f02b6313689ce5373f8a29d98823d81d/0cc25/example-drop-image-1.png\",\n        \"srcSet\": [\"/static/f02b6313689ce5373f8a29d98823d81d/5116e/example-drop-image-1.png 178w\", \"/static/f02b6313689ce5373f8a29d98823d81d/92f55/example-drop-image-1.png 356w\", \"/static/f02b6313689ce5373f8a29d98823d81d/0cc25/example-drop-image-1.png 712w\", \"/static/f02b6313689ce5373f8a29d98823d81d/7ae06/example-drop-image-1.png 1068w\", \"/static/f02b6313689ce5373f8a29d98823d81d/eee47/example-drop-image-1.png 1424w\", \"/static/f02b6313689ce5373f8a29d98823d81d/38407/example-drop-image-1.png 2136w\", \"/static/f02b6313689ce5373f8a29d98823d81d/34c79/example-drop-image-1.png 2628w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n      }\n    }), React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/f02b6313689ce5373f8a29d98823d81d/690c8/example-drop-image-1.webp\",\n        \"srcSet\": [\"/static/f02b6313689ce5373f8a29d98823d81d/25c8a/example-drop-image-1.webp 178w\", \"/static/f02b6313689ce5373f8a29d98823d81d/60698/example-drop-image-1.webp 356w\", 
\"/static/f02b6313689ce5373f8a29d98823d81d/690c8/example-drop-image-1.webp 712w\", \"/static/f02b6313689ce5373f8a29d98823d81d/d7e52/example-drop-image-1.webp 1068w\", \"/static/f02b6313689ce5373f8a29d98823d81d/456ef/example-drop-image-1.webp 1424w\", \"/static/f02b6313689ce5373f8a29d98823d81d/2a654/example-drop-image-1.webp 2136w\", \"/static/f02b6313689ce5373f8a29d98823d81d/f56ad/example-drop-image-1.webp 2628w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n      }\n    }), `\n  `, React.createElement(MDXTag, {\n      name: \"img\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/f02b6313689ce5373f8a29d98823d81d/f56ad/example-drop-image-1.webp\",\n        \"alt\": \"OpenCode session demonstrating an image prompt before the vision plugin handles it.\",\n        \"title\": \"Image Prompt Before Vision Routing\",\n        \"width\": 712,\n        \"height\": 145,\n        \"loading\": \"lazy\"\n      }\n    }))), `\n    `, React.createElement(MDXTag, {\n      name: \"figcaption\",\n      components: components,\n      parentName: \"figure\"\n    }, `\n        `, React.createElement(MDXTag, {\n      name: \"span\",\n      components: components,\n      parentName: \"figcaption\"\n    }, `\n            Image Prompt Before Vision Routing\n        `), `\n    `))), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `When the first time you use it, you must pick a vision-capable model detected from your configured providers.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"figure\",\n      components: components,\n      parentName: \"p\"\n    }, `\n    `, React.createElement(MDXTag, {\n      name: \"a\",\n      components: components,\n      parentName: \"figure\",\n      props: {\n        \"href\": \"/static/70271622f8191c3605644106fb3095be/f56ad/example-drop-image-2.webp\"\n    
  }\n    }, React.createElement(MDXTag, {\n      name: \"picture\",\n      components: components,\n      parentName: \"a\"\n    }, `\n  `, React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/70271622f8191c3605644106fb3095be/0cc25/example-drop-image-2.png\",\n        \"srcSet\": [\"/static/70271622f8191c3605644106fb3095be/5116e/example-drop-image-2.png 178w\", \"/static/70271622f8191c3605644106fb3095be/92f55/example-drop-image-2.png 356w\", \"/static/70271622f8191c3605644106fb3095be/0cc25/example-drop-image-2.png 712w\", \"/static/70271622f8191c3605644106fb3095be/7ae06/example-drop-image-2.png 1068w\", \"/static/70271622f8191c3605644106fb3095be/eee47/example-drop-image-2.png 1424w\", \"/static/70271622f8191c3605644106fb3095be/38407/example-drop-image-2.png 2136w\", \"/static/70271622f8191c3605644106fb3095be/34c79/example-drop-image-2.png 2628w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n      }\n    }), React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/70271622f8191c3605644106fb3095be/690c8/example-drop-image-2.webp\",\n        \"srcSet\": [\"/static/70271622f8191c3605644106fb3095be/25c8a/example-drop-image-2.webp 178w\", \"/static/70271622f8191c3605644106fb3095be/60698/example-drop-image-2.webp 356w\", \"/static/70271622f8191c3605644106fb3095be/690c8/example-drop-image-2.webp 712w\", \"/static/70271622f8191c3605644106fb3095be/d7e52/example-drop-image-2.webp 1068w\", \"/static/70271622f8191c3605644106fb3095be/456ef/example-drop-image-2.webp 1424w\", \"/static/70271622f8191c3605644106fb3095be/2a654/example-drop-image-2.webp 2136w\", \"/static/70271622f8191c3605644106fb3095be/f56ad/example-drop-image-2.webp 2628w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n      }\n    }), `\n  `, React.createElement(MDXTag, {\n     
 name: \"img\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/70271622f8191c3605644106fb3095be/f56ad/example-drop-image-2.webp\",\n        \"alt\": \"OpenCode session where the plugin starts vision routing and discovers available image-capable models.\",\n        \"title\": \"Discovering Vision Models\",\n        \"width\": 712,\n        \"height\": 267,\n        \"loading\": \"lazy\"\n      }\n    }))), `\n    `, React.createElement(MDXTag, {\n      name: \"figcaption\",\n      components: components,\n      parentName: \"figure\"\n    }, `\n        `, React.createElement(MDXTag, {\n      name: \"span\",\n      components: components,\n      parentName: \"figcaption\"\n    }, `\n            Discovering Vision Models\n        `), `\n    `))), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `Then a subagent configured with that vision-capable model evaluates the image. GLM-5.2 in the main agent receives the subagent's findings as text.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"figure\",\n      components: components,\n      parentName: \"p\"\n    }, `\n    `, React.createElement(MDXTag, {\n      name: \"a\",\n      components: components,\n      parentName: \"figure\",\n      props: {\n        \"href\": \"/static/7c1870c2a4b33d05215079edcbcfd17e/f56ad/example-drop-image-3.webp\"\n      }\n    }, React.createElement(MDXTag, {\n      name: \"picture\",\n      components: components,\n      parentName: \"a\"\n    }, `\n  `, React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/7c1870c2a4b33d05215079edcbcfd17e/0cc25/example-drop-image-3.png\",\n        \"srcSet\": [\"/static/7c1870c2a4b33d05215079edcbcfd17e/5116e/example-drop-image-3.png 178w\", 
\"/static/7c1870c2a4b33d05215079edcbcfd17e/92f55/example-drop-image-3.png 356w\", \"/static/7c1870c2a4b33d05215079edcbcfd17e/0cc25/example-drop-image-3.png 712w\", \"/static/7c1870c2a4b33d05215079edcbcfd17e/7ae06/example-drop-image-3.png 1068w\", \"/static/7c1870c2a4b33d05215079edcbcfd17e/eee47/example-drop-image-3.png 1424w\", \"/static/7c1870c2a4b33d05215079edcbcfd17e/38407/example-drop-image-3.png 2136w\", \"/static/7c1870c2a4b33d05215079edcbcfd17e/34c79/example-drop-image-3.png 2628w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n      }\n    }), React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/7c1870c2a4b33d05215079edcbcfd17e/690c8/example-drop-image-3.webp\",\n        \"srcSet\": [\"/static/7c1870c2a4b33d05215079edcbcfd17e/25c8a/example-drop-image-3.webp 178w\", \"/static/7c1870c2a4b33d05215079edcbcfd17e/60698/example-drop-image-3.webp 356w\", \"/static/7c1870c2a4b33d05215079edcbcfd17e/690c8/example-drop-image-3.webp 712w\", \"/static/7c1870c2a4b33d05215079edcbcfd17e/d7e52/example-drop-image-3.webp 1068w\", \"/static/7c1870c2a4b33d05215079edcbcfd17e/456ef/example-drop-image-3.webp 1424w\", \"/static/7c1870c2a4b33d05215079edcbcfd17e/2a654/example-drop-image-3.webp 2136w\", \"/static/7c1870c2a4b33d05215079edcbcfd17e/f56ad/example-drop-image-3.webp 2628w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n      }\n    }), `\n  `, React.createElement(MDXTag, {\n      name: \"img\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/7c1870c2a4b33d05215079edcbcfd17e/f56ad/example-drop-image-3.webp\",\n        \"alt\": \"OpenCode session asking the user to select a vision-capable model for image analysis.\",\n        \"title\": \"Selecting a Vision Model\",\n        \"width\": 712,\n        \"height\": 155,\n        \"loading\": \"lazy\"\n      }\n    }))), `\n    `, 
React.createElement(MDXTag, {\n      name: \"figcaption\",\n      components: components,\n      parentName: \"figure\"\n    }, `\n        `, React.createElement(MDXTag, {\n      name: \"span\",\n      components: components,\n      parentName: \"figcaption\"\n    }, `\n            Selecting a Vision Model\n        `), `\n    `))), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `This skill also works with images returned by computer-use and browser-use tools.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"figure\",\n      components: components,\n      parentName: \"p\"\n    }, `\n    `, React.createElement(MDXTag, {\n      name: \"a\",\n      components: components,\n      parentName: \"figure\",\n      props: {\n        \"href\": \"/static/ccdd7094f9696cf5e88f919a0c5a9000/f56ad/example-computer-use-1.webp\"\n      }\n    }, React.createElement(MDXTag, {\n      name: \"picture\",\n      components: components,\n      parentName: \"a\"\n    }, `\n  `, React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/ccdd7094f9696cf5e88f919a0c5a9000/0cc25/example-computer-use-1.png\",\n        \"srcSet\": [\"/static/ccdd7094f9696cf5e88f919a0c5a9000/5116e/example-computer-use-1.png 178w\", \"/static/ccdd7094f9696cf5e88f919a0c5a9000/92f55/example-computer-use-1.png 356w\", \"/static/ccdd7094f9696cf5e88f919a0c5a9000/0cc25/example-computer-use-1.png 712w\", \"/static/ccdd7094f9696cf5e88f919a0c5a9000/7ae06/example-computer-use-1.png 1068w\", \"/static/ccdd7094f9696cf5e88f919a0c5a9000/eee47/example-computer-use-1.png 1424w\", \"/static/ccdd7094f9696cf5e88f919a0c5a9000/38407/example-computer-use-1.png 2136w\", \"/static/ccdd7094f9696cf5e88f919a0c5a9000/34c79/example-computer-use-1.png 2628w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n 
     }\n    }), React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/ccdd7094f9696cf5e88f919a0c5a9000/690c8/example-computer-use-1.webp\",\n        \"srcSet\": [\"/static/ccdd7094f9696cf5e88f919a0c5a9000/25c8a/example-computer-use-1.webp 178w\", \"/static/ccdd7094f9696cf5e88f919a0c5a9000/60698/example-computer-use-1.webp 356w\", \"/static/ccdd7094f9696cf5e88f919a0c5a9000/690c8/example-computer-use-1.webp 712w\", \"/static/ccdd7094f9696cf5e88f919a0c5a9000/d7e52/example-computer-use-1.webp 1068w\", \"/static/ccdd7094f9696cf5e88f919a0c5a9000/456ef/example-computer-use-1.webp 1424w\", \"/static/ccdd7094f9696cf5e88f919a0c5a9000/2a654/example-computer-use-1.webp 2136w\", \"/static/ccdd7094f9696cf5e88f919a0c5a9000/f56ad/example-computer-use-1.webp 2628w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n      }\n    }), `\n  `, React.createElement(MDXTag, {\n      name: \"img\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/ccdd7094f9696cf5e88f919a0c5a9000/f56ad/example-computer-use-1.webp\",\n        \"alt\": \"OpenCode computer-use example where GLM-5.2 captures a Finder window with cua-driver, loads the vision skill, and returns a screen description.\",\n        \"title\": \"Describing a Finder Window with Computer-use Tools\",\n        \"width\": 712,\n        \"height\": 498,\n        \"loading\": \"lazy\"\n      }\n    }))), `\n    `, React.createElement(MDXTag, {\n      name: \"figcaption\",\n      components: components,\n      parentName: \"figure\"\n    }, `\n        `, React.createElement(MDXTag, {\n      name: \"span\",\n      components: components,\n      parentName: \"figcaption\"\n    }, `\n            Describing a Finder Window with Computer-use Tools\n        `), `\n    `))), React.createElement(MDXTag, {\n      name: \"h2\",\n      components: components\n    }, `The 
Architecture`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `ZCode implements vision support by routing image input to a vision-capable model included in its official subscription plan. That is why ZCode can understand the images you send it, and why that behavior disappears when you use GLM-5.2 through unofficial providers.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `OpenCode, however, does not let you configure a model router or fusion models. So how can we make OpenCode handle visual content?`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `Many providers available through OpenCode already offer vision-capable models, including OpenAI ChatGPT, Kimi for Coding, OpenCode Go, and Ollama Pro/Max. With the primitives OpenCode already provides, we can build a lightweight architecture:`), React.createElement(MDXTag, {\n      name: \"ol\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"li\",\n      components: components,\n      parentName: \"ol\"\n    }, `Create subagents that use vision-capable models to process visual content.`), React.createElement(MDXTag, {\n      name: \"li\",\n      components: components,\n      parentName: \"ol\"\n    }, `Delegate visual tasks to these subagents through a skill when needed.`)), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `With today's agent tooling, those two ideas are enough to prompt an agent into building the plugin.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `However, two details still matter:`), React.createElement(MDXTag, {\n      name: \"ol\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"li\",\n      components: components,\n      parentName: \"ol\"\n    }, `Agent-to-agent communication design`), React.createElement(MDXTag, {\n      name: 
\"li\",\n      components: components,\n      parentName: \"ol\"\n    }, `What the skill description covers`)), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `Both are critical to the quality of vision-task results.`), React.createElement(MDXTag, {\n      name: \"h2\",\n      components: components\n    }, `Agent-to-agent Communication`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `Stable agent-to-agent communication usually starts with a rigid contract that structures subagent inputs and outputs.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `However, to support a wide range of visual tasks, that contract cannot be too narrow or rigid.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `For example, if we add a field for the task purpose but allow only a small set of values, our subagents cannot handle other kinds of work.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"strong\",\n      components: components,\n      parentName: \"p\"\n    }, `Bad Design:`)), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `The following code comes from my first agent-to-agent contract design. 
It had several design flaws:`), React.createElement(MDXTag, {\n      name: \"ol\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"li\",\n      components: components,\n      parentName: \"ol\"\n    }, `The `, React.createElement(MDXTag, {\n      name: \"inlineCode\",\n      components: components,\n      parentName: \"li\"\n    }, `role`), ` field in the `, React.createElement(MDXTag, {\n      name: \"inlineCode\",\n      components: components,\n      parentName: \"li\"\n    }, `Image`), ` object is designed for comparison tasks, but not every visual task is a comparison task.`), React.createElement(MDXTag, {\n      name: \"li\",\n      components: components,\n      parentName: \"ol\"\n    }, `The `, React.createElement(MDXTag, {\n      name: \"inlineCode\",\n      components: components,\n      parentName: \"li\"\n    }, `judgment`), ` field covers only a limited set of visual tasks, and we cannot list every possible task when we design the skill.`), React.createElement(MDXTag, {\n      name: \"li\",\n      components: components,\n      parentName: \"ol\"\n    }, `The `, React.createElement(MDXTag, {\n      name: \"inlineCode\",\n      components: components,\n      parentName: \"li\"\n    }, `judgment`), ` field holds a single object, so a request can carry only one `, React.createElement(MDXTag, {\n      name: \"inlineCode\",\n      components: components,\n      parentName: \"li\"\n    }, `Alignment`), ` object. 
What if I want to check an object's alignment on both the X and Y axes?`)), React.createElement(MDXTag, {\n      name: \"pre\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"code\",\n      components: components,\n      parentName: \"pre\",\n      props: {\n        \"className\": \"language-typescript\"\n      }\n    }, `interface Image { path: string; label: string; role: \"baseline\" | \"current\" | \"reference\" }\ninterface Request {\n    id: string;\n    images: Image[];\n    judgment: Presence | Absence | Alignment | Ordering | Equality | Layout | Readability | State | Diff | Describe;\n    criteria?: string;\n    responseContract?: string;\n}\ninterface Presence { kind: \"presence\"; subject: string; expectation: string }\ninterface Absence { kind: \"absence\"; subject: string; expectation: string }\ninterface Alignment { kind: \"alignment\"; subject: string; axis: string; expectation: string; tolerance: string }\ninterface Ordering { kind: \"ordering\"; direction: string; expected: string[] }\ninterface Equality { kind: \"equality\"; subjects: string[]; threshold: string }\ninterface Layout { kind: \"layout\"; expectations: string[] }\ninterface Readability { kind: \"readability\"; subject: string }\ninterface State { kind: \"state\"; subject: string; expectedState: string }\ninterface Diff { kind: \"diff\"; baseline: string; current: string }\ninterface Describe { kind: \"describe\"; focus: string }\n`)), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"strong\",\n      components: components,\n      parentName: \"p\"\n    }, `Good Design:`)), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `A better approach is to let the agent design the contract from a prompt template and a few explicit principles:`), React.createElement(MDXTag, {\n      name: \"ol\",\n      components: components\n    }, 
React.createElement(MDXTag, {\n      name: \"li\",\n      components: components,\n      parentName: \"ol\"\n    }, `Declare the subagent's spawning prompt as a template.`), React.createElement(MDXTag, {\n      name: \"li\",\n      components: components,\n      parentName: \"ol\"\n    }, React.createElement(MDXTag, {\n      name: \"strong\",\n      components: components,\n      parentName: \"li\"\n    }, `Inside`), ` the spawning prompt, declare the subagent's `, React.createElement(MDXTag, {\n      name: \"strong\",\n      components: components,\n      parentName: \"li\"\n    }, `response schema`), ` as a template.`), React.createElement(MDXTag, {\n      name: \"li\",\n      components: components,\n      parentName: \"ol\"\n    }, React.createElement(MDXTag, {\n      name: \"strong\",\n      components: components,\n      parentName: \"li\"\n    }, `Inside`), ` the spawning prompt, add principles that require the subagent to respond with the `, React.createElement(MDXTag, {\n      name: \"strong\",\n      components: components,\n      parentName: \"li\"\n    }, `response schema`), `.`), React.createElement(MDXTag, {\n      name: \"li\",\n      components: components,\n      parentName: \"ol\"\n    }, React.createElement(MDXTag, {\n      name: \"strong\",\n      components: components,\n      parentName: \"li\"\n    }, `Outside`), ` the spawning prompt, add principles that guide the main agent to spawn the subagent with a dynamically designed `, React.createElement(MDXTag, {\n      name: \"strong\",\n      components: components,\n      parentName: \"li\"\n    }, `response schema`), `.`)), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `With this design, communication between agents stays structured while remaining dynamic enough to represent a wide range of visual tasks.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"figure\",\n   
   components: components,\n      parentName: \"p\"\n    }, `\n    `, React.createElement(MDXTag, {\n      name: \"a\",\n      components: components,\n      parentName: \"figure\",\n      props: {\n        \"href\": \"/static/3de83d44217529741ca1a8c90230ef70/e4ef5/agent-to-agent-prompt-example.webp\"\n      }\n    }, React.createElement(MDXTag, {\n      name: \"picture\",\n      components: components,\n      parentName: \"a\"\n    }, `\n  `, React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/3de83d44217529741ca1a8c90230ef70/0cc25/agent-to-agent-prompt-example.png\",\n        \"srcSet\": [\"/static/3de83d44217529741ca1a8c90230ef70/5116e/agent-to-agent-prompt-example.png 178w\", \"/static/3de83d44217529741ca1a8c90230ef70/92f55/agent-to-agent-prompt-example.png 356w\", \"/static/3de83d44217529741ca1a8c90230ef70/0cc25/agent-to-agent-prompt-example.png 712w\", \"/static/3de83d44217529741ca1a8c90230ef70/7ae06/agent-to-agent-prompt-example.png 1068w\", \"/static/3de83d44217529741ca1a8c90230ef70/eee47/agent-to-agent-prompt-example.png 1424w\", \"/static/3de83d44217529741ca1a8c90230ef70/38407/agent-to-agent-prompt-example.png 2136w\", \"/static/3de83d44217529741ca1a8c90230ef70/ad291/agent-to-agent-prompt-example.png 2988w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n      }\n    }), React.createElement(MDXTag, {\n      name: \"source\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/3de83d44217529741ca1a8c90230ef70/690c8/agent-to-agent-prompt-example.webp\",\n        \"srcSet\": [\"/static/3de83d44217529741ca1a8c90230ef70/25c8a/agent-to-agent-prompt-example.webp 178w\", \"/static/3de83d44217529741ca1a8c90230ef70/60698/agent-to-agent-prompt-example.webp 356w\", \"/static/3de83d44217529741ca1a8c90230ef70/690c8/agent-to-agent-prompt-example.webp 712w\", 
\"/static/3de83d44217529741ca1a8c90230ef70/d7e52/agent-to-agent-prompt-example.webp 1068w\", \"/static/3de83d44217529741ca1a8c90230ef70/456ef/agent-to-agent-prompt-example.webp 1424w\", \"/static/3de83d44217529741ca1a8c90230ef70/2a654/agent-to-agent-prompt-example.webp 2136w\", \"/static/3de83d44217529741ca1a8c90230ef70/e4ef5/agent-to-agent-prompt-example.webp 2988w\"],\n        \"sizes\": \"(max-width: 712px) 100vw, 712px\"\n      }\n    }), `\n  `, React.createElement(MDXTag, {\n      name: \"img\",\n      components: components,\n      parentName: \"picture\",\n      props: {\n        \"src\": \"/static/3de83d44217529741ca1a8c90230ef70/e4ef5/agent-to-agent-prompt-example.webp\",\n        \"alt\": \"Prompt template diagram showing how the main agent defines a visual task, images, response template, and response rules for a subagent.\",\n        \"title\": \"Dynamic Vision Subagent Prompt Template\",\n        \"width\": 712,\n        \"height\": 1196,\n        \"loading\": \"lazy\"\n      }\n    }))), `\n    `, React.createElement(MDXTag, {\n      name: \"figcaption\",\n      components: components,\n      parentName: \"figure\"\n    }, `\n        `, React.createElement(MDXTag, {\n      name: \"span\",\n      components: components,\n      parentName: \"figcaption\"\n    }, `\n            Dynamic Vision Subagent Prompt Template\n        `), `\n    `))), React.createElement(MDXTag, {\n      name: \"h2\",\n      components: components\n    }, `Skill Description`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `People may think multimodal support is only about user input. However, tool results can introduce multimodal content.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `This means the skill description should cover cases where tool results include multimodal content. 
In OpenCode, this is straightforward because images in tool results have two recognizable traits:`), React.createElement(MDXTag, {\n      name: \"pre\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"code\",\n      components: components,\n      parentName: \"pre\",\n      props: {\n        \"className\": \"language-yaml\"\n      }\n    }, `description: >-\n  You **MUST** use the vision skill when your model is text-only (e.g.\n  glm-5.2, deepseek-v4-pro) AND:\n  ...\n  (5) OR a tool result contains an image attachment the current model\n  cannot see (attachments[].mime = \"image/png\",\n  url = \"data:image/png;base64,...\");\n`)), React.createElement(MDXTag, {\n      name: \"h2\",\n      components: components\n    }, `Limitations`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"strong\",\n      components: components,\n      parentName: \"p\"\n    }, `Native Multimodality:`)), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `This plugin does not add native multimodality to a text-only model like GLM-5.2.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `Multimodal content carries details that text cannot fully capture. Native multimodality lets the model see those details directly.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `This plugin cannot do that. 
It sends a vision model's findings back to the main agent as text, so some visual information still gets compressed or lost.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"strong\",\n      components: components,\n      parentName: \"p\"\n    }, `Disable the Plugin:`)), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `Sometimes you may switch to a vision-capable model like GPT. In that case, keeping that model in the driver's seat for visual tasks is a better choice -- it can inspect images natively and may produce better results.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `However, plugins installed with `, React.createElement(MDXTag, {\n      name: \"inlineCode\",\n      components: components,\n      parentName: \"p\"\n    }, `opencode plugin`), ` do not appear in OpenCode's plugin management UI.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `To disable the plugin for a single task, prepend this sentence to your prompt: `, React.createElement(MDXTag, {\n      name: \"strong\",\n      components: components,\n      parentName: \"p\"\n    }, `\"You MUST not use the vision skill.\"`), ` OpenCode will then skip the `, React.createElement(MDXTag, {\n      name: \"inlineCode\",\n      components: components,\n      parentName: \"p\"\n    }, `vision`), ` skill that comes with this plugin.`), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, React.createElement(MDXTag, {\n      name: \"strong\",\n      components: components,\n      parentName: \"p\"\n    }, `Video Content:`)), React.createElement(MDXTag, {\n      name: \"p\",\n      components: components\n    }, `Models like Kimi K2.7 Code support video input. 
OpenCode does not accept video input, so this plugin does not support video either.`));\n  }\n\n}\nMDXContent.isMDXComponent = true;","scope":""},"headings":[{"value":"Installing and Using the Plugin","depth":2},{"value":"The Architecture","depth":2},{"value":"Agent-to-agent Communication","depth":2},{"value":"Skill Description","depth":2},{"value":"Limitations","depth":2}]}}},"earlierPostExcerpt":{"slug":"/post/2026/06/glm-5-2-affordable-providers-vision-and-agents-8f8c","title":"GLM 5.2: Affordable Providers, Vision, and Agents","subtitle":"","createdTime":"2026-06-26T00:00:00.000Z","tags":["AI","Agent"],"category":"Programming","file":{"childMdx":{"excerpt":"After running out of Claude Code and Codex quota last week, I tried GLM-5.2 on real code. It felt like a GPT-5.5-tier model for coding-agent work. The official China domestic plans were not practical for me: it is always out of stock, and the stability and speed were poor. So I tested alternative…"}}},"laterPostExcerpt":null},"pageContext":{"postId":"1fb7769d-0a07-5205-8b7d-d082234b1499","earlierPostId":"bd9c1ccb-b935-5d0c-b93a-95d744dc0889","laterPostId":null}}