I only recently started documenting my code, as it helps me. Though I feel like my documentation is a bit too verbose and probably unneeded on the obvious parts of my code.
So I started adding a comment above every few lines of code that explains in a short sentence what I do or why I do it, then leave a blank line before the next block so it is easier to read.
What do you think about this?
Edit: real code example from one of my projects:
async def discord_login_callback(request: HttpRequest) -> HttpResponseRedirect:
    async def exchange_oauth2_code(code: str) -> tuple[dict | None, dict | None]:
        data = {
            'grant_type': 'authorization_code',
            'code': code,
            'redirect_uri': OAUTH2_REDIRECT_URI
        }
        headers = {
            'Content-Type': 'application/x-www-form-urlencoded'
        }
        async with httpx.AsyncClient() as client:
            # get user's access and refresh tokens
            response = await client.post(f"{BASE_API_URI}/oauth2/token", data=data, headers=headers, auth=(CLIENT_ID, CLIENT_SECRET))
            if response.status_code == 200:
                access_token, refresh_token = response.json()["access_token"], response.json()["refresh_token"]
                # get user data via discord's api
                user_data = await client.get(f"{BASE_API_URI}/users/@me", headers={"Authorization": f"Bearer {access_token}"})
                user_data = user_data.json()
                user_data.update({"access_token": access_token, "refresh_token": refresh_token})  # add tokens to user_data
                return user_data, None
            else:
                # if any error occurs, return error context
                context = generate_error_dictionary("An error occurred while trying to get user's access and refresh tokens", f"Response Status: {response.status_code}\nError: {response.content}")
                return None, context
    code = request.GET.get("code")
    user, context = await exchange_oauth2_code(code)
    # login if user's discord user data is returned
    if user:
        discord_user = await aauthenticate(request, user=user)
        await alogin(request, user=discord_user, backend="index.auth.DiscordAuthenticationBackend")
        return redirect("index")
    else:
        return render(request, "index/errorPage.html", context)
The code already describes what it does. Your comments should describe why it does that, i.e. the purpose of the code.
Yeah, my general rule of thumb is that the following 4 things should be in the documentation:
- Why?
- Why not?, which IMO is often more important as you might know a few pitfalls of things people might want to try but that aren’t being done for good reasons.
- Quirks and necessities of parameters and return values, this ensures that someone doesn’t need to skim your code just to use it.
- If applicable, context for the code’s existence, this is often helpful years down the line when trying to refactor something.
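A minimal Python sketch of what those points can look like in practice. The function name and the requirement behind it are made up for illustration; only the `str` methods are real.

```python
def normalize_username(raw: str) -> str:
    """Return the username lowercased with surrounding whitespace removed.

    Quirk of the return value: empty or whitespace-only input returns ""
    instead of raising, so callers have to check for an empty result.
    """
    # Why: usernames get compared across systems that disagree on letter case
    # (a hypothetical requirement, assumed for this sketch).
    # Why not str.casefold(): it rewrites some characters (e.g. "ß" becomes "ss"),
    # which would change what the user typed instead of only the case.
    return raw.strip().lower()


print(normalize_username("  Alice "))
```

Nothing in the comments repeats what `strip().lower()` already says; they only carry what the code cannot.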
Yep. I mostly document why the obvious or best practice solution is wrong. And the answer is usually because of reliance on other poorly written code - third party or internal.
Your code should generally be self documenting: Have variable and method names that make sense.
Use comments when you need to explain something that might not be obvious to someone reading the code.
Also have documentation for your APIs: The interfaces between your components.
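As a rough illustration of "self documenting" (the names here are invented for the example), compare a function that needs a comment to be usable with one where the names do that job:

```python
# Before: the comment has to explain what the names do not.
def calc(d, r):
    # d is the price in cents, r is the discount rate between 0 and 1
    return round(d * (1 - r))


# After: same logic, but the signature carries the explanation.
def discounted_price_cents(price_cents: int, discount_rate: float) -> int:
    return round(price_cents * (1 - discount_rate))


print(discounted_price_cents(1000, 0.25))
```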
One interesting thing I read was that commenting code can be considered a code smell. It doesn’t mean it’s bad, it just means if you find yourself having to do it you should ask yourself if there’s a better way to write the code so the comment isn’t needed. Mostly you can but sometimes you can’t.
API docs are also an exception imo especially if they are used to generate public facing documentation for someone who may not want to read your code.
Agree with you though, generally people should be able to understand what’s going on by reading your code and tests.
Great points. I’m a huge advocate for adding comments liberally, and then treating them as a code smell after.
During my team’s code reviews, anything that gets a comment invariably raises a “could we improve this so the comment isn’t needed?” conversation.
Our solution is often an added test, because the comment was there to warn future developers not to make the same mistake we did.
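A small sketch of that idea in Python, assuming a hypothetical `parse_port` helper and a made-up past incident. The warning that used to live in a comment becomes a test that fails if someone removes the check:

```python
def parse_port(value: str) -> int:
    """Parse a TCP port number from user input."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def test_parse_port_rejects_zero():
    # This used to be a comment along the lines of
    # "don't accept 0 here" (hypothetical incident).
    try:
        parse_port("0")
    except ValueError:
        return
    raise AssertionError("port 0 should be rejected")


test_parse_port_rejects_zero()
```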
I know there are documentation generators (like JSDoc in JavaScript) where you can literally write documentation in your code and have a documentation site auto-generated at each deployment. There’s definitely mixed views on this though
To my knowledge that just formats existing comments. With LLMs you could probably do 95% of the actual commenting.
Useful comments should provide context or information not already available in the code. There is no LLM that can generate good comments from the source alone
Codium does surprisingly well at generating JSDoc, and it processes your code within the context of your entire codebase. Still not quite there yet, but you might be surprised
Why wouldn’t it be able to? It can link similar code structure to data in its training set. Maybe the current ones aren’t at that level yet, but it’s hardly a stretch to make these inferences. Most of the code you write is hardly novel.
If it’s not exactly novel, how many comments do you really need?
An LLM is just gonna describe the code it sees. Good comments should include information and context that is not already in the source.
I’m mostly talking about when you need to use JSDoc format which are usually for interfaces, so it’s usually just a chore for humans.
Probably harder to get good comments inside code, but it might still be possible.
Good comments describe the “why” or rationale. Not the what. This function doesn’t need any comments at all… but it needs a far better name like logAndReturnSeed. That said, depending on what specifically you’re doing I’d probably advocate for not printing the value in this function because it feels weird so I’d probably end up writing this function like
import random

def roll_d10() -> int:
    return random.randint(1, 10)
And I, as a senior developer, think that level of comments is great.
You mentioned that this is a trivial example but the main skill in commenting is using it sparingly when it adds value - so a more realistic example might be more helpful.
The big problem in your code is that the function name isn’t descriptive. If I’m 500 lines down seeing this function called, how do I know what you’re trying to do? I’m going to have to scroll up 500 lines to find out. The function name should be descriptive.
I expected this comment section to be a mess, but actually it’s really good:
- “why not what”
- “as self-documenting as possible”
If you want an example, look at the Atom codebase. It is incredibly well done.
Great summary. The only thing I would add is that when we say “Answer Why?” we’re implicitly including “WTF?!”. It’s the one version of “what” that’s usually worth the window line space it costs, usually with a link to the unsolved upstream bug report at the heart of the mess.
I’m curious what people think about documenting intent, which in my opinion overlaps with the “what”.
Regarding self documenting: I agree, but I also think that means potentially using 5 lines when 1 would do if it makes maintenance more straightforward. This crazy perl one liner makes perfect sense today but not in 3 years.
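To make the "5 lines instead of 1" point concrete, here is a rough Python sketch (the word list and function name are invented). Both versions compute the same thing, but the second names each step:

```python
from collections import Counter

words = "the cat and the hat and the bat".split()

# Dense version: correct, but the sort key takes a moment to decode.
top_dense = sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))[:2]


# Expanded version: more lines, each one named after what it does.
def top_words(words: list[str], limit: int) -> list[tuple[str, int]]:
    counts = Counter(words)
    # Most frequent first, ties broken alphabetically.
    by_count_then_alpha = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return by_count_then_alpha[:limit]


print(top_words(words, 2))
```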
I like to do two kinds of comments:
- Summarize and explain larger parts of code at the top of classes and methods. What is their purpose, how do they tackle the problem, how should they be used, and so on.
- Add labels/subtitles to smaller chunks of code (maybe 4-10 lines) so people can quickly navigate them without having to read line by line. Stuff like “Loading data from X”, “Converting from X to Y”, “Handling case X”. Occasionally I’ll slip in a “because …” to explain unusual or unexpected circumstances, e.g. an API doesn’t follow expected standards or its own documentation. Chunks requiring more explanation than that should probably be extracted into separate methods.
There is no need to explain what every line of code is doing, coders can read the code itself for that. Instead focus on what part of the overall task a certain chunk of code is handling, and on things that might actually need explaining.
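A sketch of both kinds of comment in Python. The data shape and the refund rule are assumptions made up for the example: a summary at the top, then short labels on each chunk, with one "because" where the behavior might surprise a reader.

```python
import json


def summarize_orders(raw_json: str) -> dict:
    """Return the total amount per customer from a JSON list of orders.

    Each order is expected to look like {"customer": str, "amount": number}.
    """
    # Loading orders from JSON
    orders = json.loads(raw_json)

    # Summing amounts per customer
    totals = {}
    for order in orders:
        customer = order["customer"]
        totals[customer] = totals.get(customer, 0) + order["amount"]

    # Dropping zero totals, because refunds can cancel an order out
    # (hypothetical business rule)
    return {customer: total for customer, total in totals.items() if total != 0}


print(summarize_orders('[{"customer": "a", "amount": 5}, {"customer": "b", "amount": 3}]'))
```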
I find overly verbose comments less annoying than no comments.
Try to describe the bigger picture. Good comments allow understanding the current portion of the code without reading other code.
Also add comments later if you find yourself having to read other code to understand the code you’re currently looking at.
Comments are also a good place to write out abbreviations/acronyms.
Never optimize for sourcecode size.
Write comments for functions
“Function x creates a number and prints it to the console”
“Function x fetches new content from the fediverse”
Commenting print(“hello world”) with “print hello world” doesn’t make too much sense
Yeah I know that. I wrote that just as an attempt to show what it looks like. I won’t document a print statement in my code.
Imagine your “code” as English sentences. If it is hard to read, you might rephrase it. If something is getting long and drawn out, use paragraphs (methods and functions). At the end of the day, the easier it is to read, the better, unless there’s a performance cost that’s worthy of considering.
Like the top-level comment suggests, you should comment your methods. I would go one step further and use a standard comment format. I like Ruby, so immediately, I think YARDoc. With a YARDoc comment, you define what it does, the parameter types and descriptions, what it returns, possible exceptions that could be returned, etc.
Even better, by using standardized comments, not only does this make it easier to read by you and others, but most of the time, you get documentation rendered for free. For example, here is a library I wrote:
And here is the automatically-generated HTML documentation:
More specifically, here’s some YARDoc for a method:
And here is the generated documentation from this comment:
This style of auto-generated documentation is available for pretty much all mature languages, and I highly recommend that you hit the ground running with them 👍
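For Python, a rough counterpart is a docstring written with Sphinx-style fields (`:param:`, `:returns:`, `:raises:`), which documentation tools such as Sphinx can render. The `clamp` function below is just an illustrative example, not taken from any library mentioned above.

```python
import inspect


def clamp(value: float, low: float, high: float) -> float:
    """Restrict ``value`` to the closed interval [low, high].

    :param value: The number to restrict.
    :param low: Lower bound (inclusive).
    :param high: Upper bound (inclusive).
    :returns: ``low`` if ``value < low``, ``high`` if ``value > high``,
        otherwise ``value`` unchanged.
    :raises ValueError: If ``low`` is greater than ``high``.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


# The same text is available at runtime, e.g. for help() or IDE tooltips.
print(inspect.getdoc(clamp))
```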
Thanks. It seems interesting and useful.
I rarely read comments in code, that is from within source code anyway. I of course write comments explaining the behavior of public facing interfaces and otherwise where they serve to generate documentation, but very rarely otherwise. And I use that generated documentation. So in a roundabout way I do read comments but outside of the code base.
For instance I might use godoc to get a general idea of components but if I’m in the code I’ll be reading the code instead.
As others have said, your code generally but not always should clearly express what it does. It is fine to comment why you have decided to implement something in a way that isn’t immediately clear.
I’m not saying others don’t read comments in code; some do. I just never find myself looking at docs in code. The most important skill I have cultivated over the decades has been learning to read and follow the actual code itself.
There are several types of documentation:
- Line or block comments. Reserved for when you’re doing something non-obvious, like a hack, a workaround because of a bug that can’t be fixed yet etc. Designed to help other programmers (or yourself a few months later) to understand what’s going on. Ideally you shouldn’t have any of these but life ain’t perfect.
- If parts of your code are intended to be used as libraries, modules, APIs etc. there are standard methods of documenting those and extracting the documentation automatically in a readable format — like JavaDoc, Swagger etc. Modern IDEs will generate interface hints on the fly so most people nowadays rely on those, but they’re not a 100% substitute for the human-written description next to a class or method.
- Unit tests describe the intent for a piece of code and offer concrete pass/fail instructions. The same goes for other types of tests, like end-to-end tests, regression tests etc. All tests come with specific frameworks, which have their own methods of outlining specifications.
- Speaking of specifications those are also a very important type of documentation. Usually provided by the product owner and fleshed out by technical people like architects or team leads, they’re documented in tools like JIRA as part of the development process. They are at the core of the work done by programmers and testers.
- Speaking of processes and procedures, it helps everybody if they’re documented as well, usually in a wiki. They help a new hire get up to speed faster and they explain how the toolchains are set up for development, testing, deployment and bug fixing.
- The human interfaces are a particularly interesting and important aspect and they’re usually modeled and shared in specific tools by UX people.
- Last but not least the technical as well as business designs should be documented as well. These usually circulate as PDF, DOC, Excel, PPT over email and file shares. Typically made and contributed to by business analysts and software architects.
For new code I’m writing I’m using mostly JSDoc function headers on public methods of classes and exported functions, with one or two sentences explaining what the function does.
I also try to explain what to expect in edge cases, like when you pass an empty string, null, … stuff of that nature, for which I then create unit tests.
I also always mention if a function is pure or not or if a method changes the state of its object. On a side note, I find it odd that almost no language has a keyword for pure functions or readonly methods.
If I add a big new chunk of code that spans multiple files but is somewhat closed off, I create an md file explaining the big picture. For example, I recently added my own closed-off library to my Angular frontend that handles websocket stuff like subscribing, unsubscribing, buffering, pausing, … for which I created an md file explaining it.
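The same habit translated to Python might look roughly like this. The `initials` function is invented for the example; the docstring states purity and edge cases, and the assertions pin those edge cases down.

```python
from typing import Optional


def initials(full_name: Optional[str]) -> str:
    """Return the uppercase initials of a name.

    Pure: reads and changes no state outside its arguments.

    Edge cases:
    - None, an empty string, or a whitespace-only string returns "".
    - Repeated spaces between words are ignored.
    """
    if not full_name:
        return ""
    return "".join(word[0].upper() for word in full_name.split())


# Unit-test style checks for the documented edge cases.
assert initials(None) == ""
assert initials("   ") == ""
assert initials("ada  lovelace") == "AL"
```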
What is a pure function? Never heard that before.
Essentially a function that doesn’t produce side effects, like modifying variables outside of its scope or modifying the function parameters. This is something you should always try to incorporate into your code, as it makes it much easier to test and makes the function’s use less risky since you don’t rely on external unrelated values.
To give you an example in JavaScript, here are two ways to replace certain numbers in a list of numbers with the number 0.
First, a way to do it with a non-pure function:
let bannedNumbers = [4, 6]
const nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

function replaceWithZero(nums) {
  for (let i = 0; i < nums.length; i++) {
    if (bannedNumbers.includes(nums[i])) {
      nums[i] = 0
    }
  }
}

replaceWithZero(nums)
console.log("numbers are : ", nums)
Here the function replaceWithZero does two things that make it impure. First, it modifies its parameter, which can lead to issues. Second, it uses a non-constant variable outside of its scope (bannedNumbers), which is bad because if someone changes bannedNumbers somewhere else in the code, the behavior of the function changes.
A proper pure implementation could look something like this :
const nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

function replaceWithZero(nums) {
  const bannedNumbers = [4, 6]
  const result = []
  for (const num of nums) {
    result.push(bannedNumbers.includes(num) ? 0 : num)
  }
  return result
}

const replaced = replaceWithZero(nums)
console.log("numbers are : ", replaced)
Here we are not modifying anything outside of the function’s scope or its parameters. This means that no matter where, when and how often we call this function it will always behave the same when given the same inputs! This is the whole goal of pure functions.
Obviously in practice you can’t make everything 100% pure, for example when making an HTTP request you are always dependent on external factors. But you can try to minimize external factors by making the HTTP request, and then running the result only through pure functions.
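The same split sketched in Python, with a stand-in for the HTTP call. `fake_fetch` and `fetch_and_clean` are hypothetical names for illustration: the impure part only fetches, and everything after that goes through a pure function.

```python
from typing import Callable, Iterable


def replace_with_zero(nums: Iterable[int], banned: frozenset = frozenset({4, 6})) -> list:
    """Pure: builds a new list and never modifies the input or outside state."""
    return [0 if n in banned else n for n in nums]


def fetch_and_clean(fetch: Callable[[], list]) -> list:
    """Impure shell: all I/O happens inside `fetch`; the processing stays pure."""
    return replace_with_zero(fetch())


def fake_fetch() -> list:
    # Stand-in for a real HTTP request (assumption for this sketch).
    return list(range(10))


original = list(range(10))
print(replace_with_zero(original))  # new list with 4 and 6 replaced
print(original)                     # the input list is left untouched
print(fetch_and_clean(fake_fetch))
```

Because the pure part takes plain values, it can be tested without any network at all.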
I really wouldn’t call anything that hits the network pure, because errors are quite likely. But I guess we all put the bar at a different level, I would not count logging as a side effect yet I’ve been bitten by overly verbose logs in hot loops.
const-ness gives a mini version of purity, although nothing prevents someone from opening /etc/lol in a const function… I think GCC has a pure attribute, but I don’t think it’s enforced by the compiler, only used for optimizations.