A look at AI coding agents - Claude Code and Copilot CLI
Introduction
A few months back I wrote a post about doing some Accidental Vibe Coding using Claude. That was messing around with the Claude application, generating code and copying and pasting it back into my IDE. This time I am doing something more intentional, using the Claude Code and GitHub Copilot CLI coding agents: the Sonnet 4.5 model with Claude and the GPT-5 model with Copilot.
I have been using AI assistants for coding for quite a while now. I used GitHub Copilot when it was still in beta and, when it was quite new (before it even had a chat feature), used Tabnine at work. When these tools were released they seemed like magic. The fact they could take a description of what you wanted to do and translate it into code was amazing. The way they would complete blocks of code for you was very impressive. Looking back though, we were wowed by the tech and the fact this ghost in the machine could magically produce what appeared to be coherent code. In reality the tools were fairly limited in functionality and usability. You could get them to write relatively simple blocks of code which were often almost there but not quite functional. They saved time writing boilerplate code, completing large blocks of repeated patterns like switch blocks and, in my case, helping with remembering syntax. But you could not use AI to create entire applications. At least not functional ones.
Fast forward 3 years and the world of coding with AI is light years away from where it was. In the last 6 to 9 months alone the world has changed. We have moved away from the wonder of vibe coding, evolved with the tools and are finding more and more ways to incorporate AI coding into our workflow. Desktop AI and coding assistants are massively more capable and can generate complete working applications from scratch.
We are still figuring all this out and it means different things to different people. On a recent episode of the Dwarkesh Podcast, Dwarkesh Patel was talking to Andrej Karpathy, who shares some great insights and thinking (I highly recommend listening to or watching the episode). One of the things Karpathy mentions is that he uses AI assistants while coding as a smart auto-complete (the way we all used to exclusively use them). He uses it as a labour-saving tool for writing boilerplate code or completing the 'local' chunk of code being worked on. The way the tools work now makes this much more reliable, and I find myself still working in this way a lot as well. It's a great way to enjoy a coding task and be super productive without feeling like the AI is doing all the work for you, or, in Andrej's case, doing it in ways he knows he could do better himself.
To this end I have found myself working with AI tools in a mixture of four modes. There may well be existing terms for these, but here is how I define them:
Smart Auto-Complete - Selectively having the AI help complete chunks of code as I write
Hands off Expert Assistant - Having the AI answer questions about existing code or advise and assist when writing new code, usually via a chat interface. Things like: "how can I implement X here?", "explain this block of code", "how could I improve this function?", "is this idiomatic?"
Hands on Expert Assistant - Having the AI answer questions and implement solutions directly in the code. I guess this is a more 'agentic' type of interaction, but it's distinct from the final mode below, which is purely agentic
Independent Development - Prompting the AI to implement a complete set of functionality or a complete version of the application. The AI can work independently and produce entire working applications or sub-systems
I tend to mix these modes as I see fit, as I am sure many people do. But I find the first two massively helpful as a 'hobbyist developer' who wants to learn by doing.
The biggest win I have found using AI tooling is that it has reinvigorated my coding by removing things that in the past blocked progression. What I mean is that I no longer fear getting blocked on things, especially things that are not at all my strength (web development!) or that I have struggled to learn and got frustrated by the lack of progress, which in some cases led to an abandoned project. This doesn't often happen to me these days, as I have learnt to be much more patient with learning new things. But sometimes the need or enthusiasm just goes away. As happened with the application I used for this experiment.
But the Human Factor
In all of this, one thing I have always thought, and am even more convinced of now having seen these tools in use, is that as capable as AI tooling is, it won't be replacing a human developer any time soon. Of course, with the rate of progress being what it is, these could be famous last words. I don't think so though; there is something AI lacks, and will lack for a while I think. The missing ingredient is variation in creativity that arises from variation in personality.
Humans have infinite variety in personality and imagination. LLMs, while extremely capable, produce content that, without a human in the loop, often lacks a certain uniqueness or perhaps 'humanness' that is identifiable by other humans. It's a version of the uncanny valley. In some cases this does not matter and LLMs are the perfect automata for the job. But it's additive to the human element, not a replacement for it.
This is the superpower: it makes LLMs all the more powerful and useful. They unlock latent human creativity, allowing us to focus on the important detail where our personalities matter, rather than the mundane and mechanical.
Not to be overly utopian and gooey about it, because of course there are negatives, but I genuinely feel that the revolution in AI will create a renaissance of creativity.
A Story - The Self Hosting Dream Nightmare
Once upon a time I used to self-host everything, including my own email server (yes, this was and still is a monumentally bad idea). After a while I realised that not only was fixing computer systems my day job, it had also become my night job. Eventually I got tired of spending my spare time running my own IT support, so I binned all of it and made my tech setup as simple as possible. I vowed never to self-host anything that was a critical part of my 'tech stack'.
But then one day I noticed my smart TV was talking to hundreds of odd-looking domains on the internet. Investigating, I discovered that these were of course advertising or analytics related. I didn't remember opting into any of it. This annoyed me. So I started to look at ways to block the unwanted traffic without breaking the general functionality of the TV. And this is how it starts…
I discovered one effective way to block lots of this noise was to blackhole the DNS queries, and as luck would have it there was an open source product that did exactly that. It's called Pi-hole (yes, the name is amusing). Pi-hole is a DNS sinkhole or blackhole tool designed to run on a low-powered Raspberry Pi device or in a Docker container.
I installed it on a Raspberry Pi and slung it in a cupboard. Then I changed the DHCP settings on my router to hand out the Pi's address rather than an external DNS service, and I was in business! But… I was now, once again, self-hosting a critical part of my tech stack. Not only that, I decided it was a great idea to use it for name resolution for everything on my home network, including all of my wife's and son's devices.
Of course Pi-hole runs great! It magically just works and never fails… Until you go away on a work trip for a week. Then of course the minute you leave the house that one critical thing silently fails and your wife can't send email and your son can't play PlayStation and you forget that the Pi-hole is in the loop and spend hours trying to debug the Wi-Fi. Back in support hell.
So to give me an early warning system and remind me that the Pi-hole is doing stuff, I decided it would be a nice project to write a simple healthchecker app. Yes, yes… I know such things already exist. Yes, yes, I am very aware of Prometheus. But where is the fun in that!
The Simple Healthchecker App
To solve this problem I wrote a really simple Go app that reads in a set of hosts from a config file and periodically executes whatever healthchecks are defined against them. Initially just a simple 'ping' and/or an 'http' check, both for liveness. The app then uses healthchecks.io to report and alert on liveness via email and the excellent Pushover service. Spot the flaw? Yes… The app will fail to resolve healthchecks.io if DNS is down. I just made sure I hosted the app on a machine that didn't use my Pi-hole for resolution.
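To give a sense of how small the core really is, here is a rough sketch of the two liveness checks described above. It is illustrative rather than the actual code in my app: the hosts and interval are hard-coded here, whereas the real version reads them from the config file.

```go
package main

import (
	"fmt"
	"net/http"
	"os/exec"
	"time"
)

// ping shells out to the system ping command so we don't need raw sockets.
// (Timeout flags differ between Linux and macOS, so they are omitted here.)
func ping(addr string) bool {
	return exec.Command("ping", "-c", "1", addr).Run() == nil
}

// httpCheck treats any response with a status below 400 as "alive".
func httpCheck(url string) bool {
	client := http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode < 400
}

func main() {
	// Hard-coded for illustration, e.g. the Pi-hole and the router.
	hosts := []string{"192.168.1.53", "192.168.1.1"}
	for {
		for _, h := range hosts {
			fmt.Printf("%s ping=%v http=%v\n", h, ping(h), httpCheck("http://"+h))
		}
		time.Sleep(60 * time.Second)
	}
}
```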
I knocked this app up in an afternoon and hosted it on my Mac Mini as a daemon. Perfect! It worked quite well.
Of course this was the MVP. I wanted some extra functionality. I wanted:
- A web UI so I could see the status of checks without digging out the log
- The ability to dynamically add hosts and checks, optionally via the web UI
- HTMX for the UI
- Go's embed functionality to encapsulate all the UI components within the binary (sketched below)
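For anyone who hasn't used it, the embed approach in that last point looks roughly like this. This is a minimal sketch with made-up directory names, not the layout either agent later produced: the templates and static assets are compiled straight into the single binary at build time.

```go
package main

import (
	"embed"
	"html/template"
	"net/http"
)

// The go:embed directives pull these directories into the compiled binary.
// "templates" and "static" are illustrative names only.

//go:embed templates/*.html
var templateFS embed.FS

//go:embed static
var staticFS embed.FS

func main() {
	// Parse the embedded templates once at startup.
	tmpl := template.Must(template.ParseFS(templateFS, "templates/*.html"))

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		tmpl.ExecuteTemplate(w, "index.html", nil)
	})

	// Serve CSS (and a vendored htmx.min.js) straight from the embedded FS.
	http.Handle("/static/", http.FileServer(http.FS(staticFS)))

	http.ListenAndServe(":8080", nil)
}
```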
As I said above, I am rubbish at web development, so I knew the UI was going to be painful. But I wanted to learn, so I started by implementing a simple UI to render the config and enable or disable the checks. It was really janky and really basic:

Pretty ugly right?
I toiled away at making the UI better and more functional, but I kept hitting roadblocks and it wasn't fun any more. I still fancied the idea of a nicer, more capable UI, but I shelved it and, despite best intentions, didn't get back to it.
So this seemed like the perfect app for an experiment with AI coding tools to unblock me. It’s something that has some utility and I actually use.
Enter The AI
As mentioned above, I used Claude Code and GitHub Copilot CLI to recreate the application. I allocated the same amount of time as I had used for the first implementation: about half a day each to implement the simple healthchecker app with a functioning web UI.
The goal is to compare the two and provide some commentary about the experience and effectiveness of each.
I used this prompt as the starting point:
I want to write a simple healthcheck application in Go.
The application will take a YAML or TOML config file that will contain the list of hosts
I want to check liveness of.
By default the application will try to ping each host in the
config and return a status. Later I want to add other checks such as HTTP checks etc.
I also want to use healthcheck.io to show a failure and send
notifications if a ping fails.
Finally, want an embedded webUI that uses HTMX to build a interactive interface
that shows what hosts are configured and what checks they have against them
(i.e. ping check) I'd like to be able to use the webUI to enable and disable checks
This defines my original idea for the app but also gives some specific information to the LLM to see what approach it takes. I specifically mentioned YAML and TOML. My original application used the ubiquitous [viper](https://github.com/spf13/viper) library for config management. I also used [cobra](https://github.com/spf13/cobra) to build the CLI, but for this use case it was probably overkill. I wanted to see if the LLMs opted for a library or just implemented this functionality directly.
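For context on why skipping the libraries is a reasonable call, decoding a flat host list directly is only a handful of lines. A sketch, assuming the yaml.v3 package and field names of my own invention rather than whatever the agents generated:

```go
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// Illustrative config shape; the generated apps define their own structs.
type Host struct {
	Name   string   `yaml:"name"`
	Addr   string   `yaml:"addr"`
	Checks []string `yaml:"checks"` // e.g. ["ping", "http"]
}

type Config struct {
	Hosts []Host `yaml:"hosts"`
}

func loadConfig(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	cfg, err := loadConfig("config.yaml")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("loaded %d hosts\n", len(cfg.Hosts))
}
```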
I also used HTMX for the UI to remove the requirement for a lot of client-side JavaScript. I got very interested in the idea of hypermedia-driven applications and wanted to experiment with it. So I explicitly instructed the LLMs to use it.
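To make the hypermedia idea concrete: rather than shipping a JavaScript front end, the server returns small HTML fragments and HTMX swaps them into the page. A sketch of what an enable/disable toggle might look like in Go (routes, names and markup are my own invention, not the generated code; the routing pattern needs Go 1.22+):

```go
package main

import (
	"html/template"
	"net/http"
	"sync"
)

var (
	mu      sync.Mutex
	enabled = map[string]bool{"pi-hole": true}
)

// The fragment carries the hx-* attributes itself, so the next click
// round-trips through this same handler.
var fragment = template.Must(template.New("check").Parse(
	`<button hx-post="/toggle/{{.Name}}" hx-swap="outerHTML">
	{{.Name}}: {{if .Enabled}}enabled{{else}}disabled{{end}}
</button>`))

func toggle(w http.ResponseWriter, r *http.Request) {
	name := r.PathValue("name")
	mu.Lock()
	enabled[name] = !enabled[name]
	state := enabled[name]
	mu.Unlock()
	// Return only the updated fragment; HTMX swaps it into the page.
	fragment.Execute(w, struct {
		Name    string
		Enabled bool
	}{name, state})
}

func main() {
	http.HandleFunc("POST /toggle/{name}", toggle)
	http.ListenAndServe(":8080", nil)
}
```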
Let's dig into how each agent and LLM fared!
GitHub Copilot

GitHub Copilot CLI (still in public preview, it appears) "brings AI-powered coding assistance directly to your command line". Following in the footsteps of other CLI-based tools, Copilot runs in the shell and can write code and execute shell commands directly (once granted permission). If you already have a GitHub Copilot subscription (as I do) you can use it with the CLI tool.
When using an IDE, Copilot enables you to use multiple different LLMs with different multipliers for 'premium requests':

With the CLI, you can only use LLMs which consume premium requests:

This means that every prompt to the agent consumes premium requests at the multiplier shown.
Within the basic subscription you get 300 premium requests per month. It seems quite hard to determine exactly how many premium requests you are consuming as the agent operates. You can look on your GitHub subscriptions dashboard and it will show you where you stand. I consumed 25% of my premium requests doing this experiment. At 100% I would have to pay extra, or fall back on agent/chat using one of the 0x multiplier models in the IDE. It would be nice if you could fall back on a zero multiplier model within the CLI, but maybe there is some capability issue there.
I chose GPT-5 to use with Copilot.
I first initialised the project directory by creating an initial Go module and Git repo to give the LLM a hint and an initial 'seed' for the project.
The options you can give the agent are fairly basic compared to Claude Code (see below). It's a matter of giving it the prompt and letting it go. Which is exactly what I did.
Copilot got straight to writing a working MVP with very little up-front planning. It followed my prompt very well. It didn't use Viper (unexpected) or Cobra (expected). It used the Go embed directive to embed the UI and strictly used HTMX to create the initial UI. It went off and researched how healthchecks.io works and implemented the required integration.
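If you haven't used healthchecks.io, it works on a dead man's switch model: each check gets a unique ping URL that you hit on success, and appending /fail (or simply going quiet past the grace period) raises an alert. The integration boils down to something like this, with a placeholder UUID rather than anything from the generated code:

```go
package main

import (
	"net/http"
	"time"
)

// Placeholder ping URL; healthchecks.io issues a unique UUID per check.
const pingURL = "https://hc-ping.com/your-check-uuid"

// reportToHealthchecks signals success or explicit failure for one check run.
func reportToHealthchecks(ok bool) {
	url := pingURL
	if !ok {
		url += "/fail" // an explicit failure alerts faster than just going silent
	}
	client := http.Client{Timeout: 10 * time.Second}
	if resp, err := client.Get(url); err == nil {
		resp.Body.Close()
	}
}

func main() {
	reportToHealthchecks(true)
}
```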
The initial UI it created was extremely basic: a simple table showing the check status and state (unfortunately I lost the screenshot!). But the app functioned as per the spec. I immediately followed up by prompting the agent to improve the web UI. I asked it to implement a card-based view like the one I had tried to implement myself.
After a few iterations to refine it, including asking it to implement the 'Add host' and 'Edit' modal dialogs, it came up with this:

I had some back and forth with the agent to iterate and get the edit/add and delete functionality working right and to improve the UI layout.

I had one situation where Copilot failed to implement a working feature: the scheduler. The way it implemented it meant that the scheduler did not start on initial startup, and because the scheduler code included the code to reset the schedule, it never ran any checks. Copilot got a bit hung up debugging this and I had to add some 'printf' debugging to trace it. This was an area Copilot was weak on: logging and debugging output. Copilot took a very basic approach here, and it was the one area where I had to manually add functionality. To be fair I never asked it to log anything, but I was surprised it didn't do more of it, because logging to the console is one way the LLM can check the app is working as expected during the debug loop. Copilot preferred to write its own little test functions to do this rather than adding explicit log output.
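The scheduler bug boiled down to nothing ever kicking off the first run. The usual shape of the fix (sketched here with a stand-in runChecks function, not Copilot's actual code) is to do one pass immediately and then let the ticker take over:

```go
package main

import (
	"fmt"
	"time"
)

// runChecks stands in for whatever executes the configured checks.
func runChecks() {
	fmt.Println("running all configured checks at", time.Now().Format(time.TimeOnly))
}

func main() {
	interval := 30 * time.Second

	runChecks() // first pass straight away; don't wait for the first tick

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		runChecks()
	}
}
```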
The session took about four hours from start to 'finished' app with a UI I was happy with as a usable v1, including manually reviewing the code myself to understand the implementation and approach the LLM had taken.
As I say, I used about 25% of my premium request budget to generate the application, but it cost me nothing above the Copilot subscription I was already paying for. I was very happy with the result. My app felt more complete and I really enjoyed shaping it into the app I had wanted it to be in my original plan.
Copilot Observations and Summary
Copilot favoured building directly towards a basic working MVP over planning or building a task list and trying to build a more complete product in one go. There is nothing wrong with this approach and some may prefer it
I was very happy with how quickly Copilot could interpret follow up prompts and turn them into the functionality I required. The back and forth conversation with the agent felt natural and easy. It was great at understanding what I wanted. Iteration time was fast apart from when it got hung up running external commands
Copilot didn’t add much in the way of logging or debug output, preferring to write small disposable tests to test specific areas of focus or issue
Copilot was very good at sticking to the brief given in the original prompt. Even when iterating it would stick to the parameters of the original brief
Copilot struggled with some shell commands and often required me to manually run things. This may have been an issue with my environment and I couldn't get to the bottom of it. Copilot quickly abandoned trying different approaches to get something to work and would instead provide me with the command to run and ask for the output manually. This was OK, but got a bit cumbersome at times and slowed down iteration
Copilot was well integrated into the VS Code IDE, as you might expect from two Microsoft products. The ability to switch between agent and ask modes is very helpful. It allows me to operate in all four of the modes discussed above frictionlessly
The plethora of models available via Copilot means you can choose the best option based on requirement if you so desire. Albeit with the premium multipliers for most of them
At $10/month for a subscription and the premium request allowance, Copilot is extremely good value for money
Claude Code

Once authenticated, the latest version of Claude Code seems to have ditched the fancy splash screen it had during tech preview and is straight down to business.
With a Claude Pro subscription you have access to two Anthropic models:

The current Sonnet model (4.5) and the new (as of mid-October 2025) Haiku model (4.5). Haiku is the more efficient model in terms of compute and power and is a third of the cost of Sonnet to use. Haiku is slightly less 'powerful' than Sonnet: Anthropic define Sonnet 4.5 as 'frontier class' and Haiku as 'near frontier'. Anthropic point out that you could have Sonnet break up a complex task and farm it out to a bunch of Haiku agents. Which is a neat idea.
If the benchmarks are to be believed, Haiku looks pretty close and would probably be more than good enough for this task. Regardless, I chose Sonnet 4.5 for this experiment.
Claude Code agents are interesting. I didn't use them here, but maybe in another experiment. It's one of the things that Claude Code can do that Copilot can't. Using the /agents command you can define multiple independent agents to take on personas within your project and do specific tasks:

Claude offers a lot more options than Copilot:

Including this new Sandbox feature, which allows you to 'unleash' Claude without supervision in an environment isolated from your filesystem and network, without fear of it making catastrophic mistakes like deleting your operating system.
Anthropic are updating Claude and releasing new features at quite a pace. These and other features are clearly aimed at the enterprise to enable fast and wide adoption.
On first run in a project directory which doesn't contain one, Claude will prompt you to run '/init' on the project directory to generate a new CLAUDE.md file. It will analyse anything in the project directory along with your prompt to generate the CLAUDE.md file, which will contain project-specific settings and context to allow Claude to better understand your project and bootstrap new sessions.
I ran /init and created the basic CLAUDE.md file. If there are any source files in the directory at all, Claude will use them to build CLAUDE.md; you can then augment it with specific detail and ask Claude to keep it updated as you go. It's very slick.
I gave Claude the same prompt as I gave Copilot and set it going. Out of the gate Claude takes a different approach, one that, again to my eye, is aimed at enterprise users. Claude first builds quite a detailed plan before it starts writing any code.
Unlike Copilot, Claude favours writing a complete and more polished prototype on the first iteration. Once it's done, it proudly proclaims the application is 'Production Ready'

Not so sure about that! But the Go it produces is idiomatic in terms of code and project structure, which is expected but nice.
Like Copilot, Claude didn’t use Viper or Cobra for the config or CLI scaffold, preferring to write from scratch. This makes sense in that the requirements are very simple and importing two chunky libraries is probably overkill.
Claude also used HTMX as instructed and used embed as requested, but this is where it differed from Copilot when refining the web UI in subsequent iterations. Instead of sticking to the hypermedia paradigm, Claude preferred to use a lot of additional client-side JavaScript for the UI. I noticed it was doing this and questioned it, suggesting this wasn't the correct approach with HTMX. Claude agreed and refactored the UI. Once again proclaiming success:

Interesting that it didn’t stick to the requirements like Copilot did and went off on its own. But once re-prompted it refactored things reliably and quickly.
Claude also favoured writing much more detailed logs to the console than Copilot and was more autonomous with its debugging, often making mistakes, discovering them in the logs and fixing them before finishing the current iteration.
Claude provides slightly better summaries of what it did on each iteration than Copilot does. It's always very proud of itself, whereas Copilot was more utilitarian in its responses.
Because Claude puts more polish on the final product it didn't need additional prompting to 'pretty up' the UI. As such I never prompted it to produce a style, so the final 'production ready' UI doesn't use a card layout. I thought I would prefer the card layout, but I actually quite like the Claude version:

However, there will be many people out there who instantly recognise this as a ’thing generated by Claude’. It has a look about it that many web interfaces generated by Claude sort of share. I am sure I could change it up with more iterations, but I actually quite like it as is for now.
Because Claude was better at fixing issues as it went, it was quicker to iterate with, so I carried on a bit further with the Claude version. Since I knew I would be running this on a Mac Mini, instead of only running it as a system daemon I added a feature to daemonise the app and add a Mac menu bar item from which a user can launch the UI.

Having never tried this myself before, I had no preconceived idea how to do this in Go, so I didn't prompt Claude with any preferred library or approach. Claude found a cross-platform systray library on GitHub and implemented it in one go, which was very impressive. I had to provide it with an image for the icon though, because it was rubbish at drawing a simple icon. In theory this will work on Linux, Mac and Windows. I have only tested it on Mac so far.
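For illustration, here is roughly what a menu bar integration looks like using getlantern/systray, a commonly used cross-platform option (which may or may not be the library Claude actually picked, so treat this as an assumption rather than the generated code):

```go
package main

import (
	_ "embed"
	"os/exec"

	"github.com/getlantern/systray"
)

//go:embed icon.png
var icon []byte // the icon image supplied by hand

func main() {
	// systray.Run blocks and drives the platform event loop.
	systray.Run(onReady, func() {})
}

func onReady() {
	systray.SetIcon(icon)
	systray.SetTooltip("simple-healthchecker")

	openUI := systray.AddMenuItem("Open dashboard", "Open the web UI")
	quit := systray.AddMenuItem("Quit", "Stop the healthchecker")

	go func() {
		for {
			select {
			case <-openUI.ClickedCh:
				// macOS-only convenience; Linux/Windows would need xdg-open / start.
				exec.Command("open", "http://localhost:8080").Start()
			case <-quit.ClickedCh:
				systray.Quit()
			}
		}
	}()
}
```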
The session took the same time as Copilot, around 4 hours. As for quota use and cost, well, Claude billing requires an accounting qualification to understand; not even Claude itself can answer billing questions:

I actually had to search for something! Or rather, what I really did was ask Perplexity.
In any case, with a Pro subscription you get full access to the Sonnet and Haiku models (but not Opus) from desktop, web and Claude Code, with rate limits that reset intra-day and weekly limits on input and output tokens. You can pay as you go beyond the weekly limits, I believe, by switching to using an API key instead of the OAuth login, but I didn't have to. It equates to 10-40 prompts (depending on size) every 5 hours. Anthropic say most Pro users can expect 40-80 hours of Sonnet 4.5 within the weekly limit, which is ample for my use case and, I expect, for a large number of hobbyists. Anthropic also say it's good for 'small repositories (typically under 1000 lines of code)', which seems low to me. I didn't hit any context limits with this project, but it wasn't very taxing.
I used ~10% of my weekly quota on this experiment.
Claude Code Observations and Summary
Claude was more planning-oriented than Copilot and favoured what it considered more robust final products, although for this project that amounted to some cosmetics rather than core functionality
Claude has more features oriented towards the enterprise user than Copilot. While these are super useful, they may not be necessary for the casual user
Claude Code offers many more configuration options, such as sub-agents and sandboxing, than Copilot does. Copilot gets the job done, but doesn't extend far beyond the basic coding agent functionality
Claude's preference for verbose logging enabled it to produce a working product with fewer re-prompts than Copilot
Claude appears to be more willing to go off on tangents and take a broader interpretation of the spec, doing what it thinks is an appropriate approach. Hence it needed realigning when it took a path that wasn't what was asked for
Claude had fewer issues running command-line tools than Copilot, which again made iteration time a bit quicker
At $20/month Claude Pro is double the price of Copilot, but you get a lot of bang for your buck: Claude Code without 'premium' multipliers, as well as Claude desktop chat, web chat and now Claude Code on the web. Add to this the capabilities Claude has, such as the new skills feature, and the many more they keep adding:

(The MCP builder skill looks interesting)
Overall I was very impressed with what you get with Claude Pro. $20/month seems like a reasonable price for a nice chat application with a ton of capability, plus a coding agent.
Summary
Both Claude Code and Copilot are very capable and more than met the challenge of unblocking me on my frustrations with web UIs. I was more than happy with both results and could comfortably use either. I think I will probably use the Claude version as I added the nice menu bar option to that one. But I might tinker with the UI to make it less 'Claude default skin'
Both Claude and Copilot are great value for money. If what you mainly want is a coding agent then Copilot is a great choice, and I am sure Microsoft will keep adding more functionality. If you want the nice chat app with its extra capabilities and the more enterprisey features and operating model then Claude is an excellent option. You can't go wrong with either.
If you are in the market, one piece of advice I would give is to pick one (or both!) and lock in a yearly subscription now. We are in a golden age at the moment and the prices can only go up. Get today's price before they do!
I am very happy to have these tools in my toolbox. Not only do I use them every day in my day job, but they save me from a horrible night job as well. They unlock an ability for me to complete personal projects beyond the barrier where frustration might otherwise set in. They help me learn faster and they make coding and creating more enjoyable for me.
You can find the code generated by me, Copilot and Claude here: https://github.com/andrewsjg/simple-healthchecker
Future Experiments
In the future I'd like to update this experiment using OpenAI Codex and Gemini CLI. I'd also like to see if it's at all practical to use Codex locally with a local LLM via Ollama or Ollama Cloud.
I'd also like to dig into the various agent SDKs, as well as the various offerings from AWS and Google such as the Google Agent Development Kit, the GCP hosting options and AWS Bedrock AgentCore. I have had a play with AgentCore and was very impressed. I'd like to do more.
Watch this space!
I hope this was of some help and interest to people. If you did find it useful and interesting, feel free to drop me a line on Mastodon - @[email protected], BlueSky - @jgandrews.bsky.social or LinkedIn. If you REALLY enjoyed it, feel free to buy me a Coffee via the button below!