Understanding a big codebase you have never worked on.
Get it up and running in a dev environment and start inserting changes to see what breaks where.
Revert and retry until you’ve learned where you’re supposed to be meddling.
Look at the packages. Try to break it down into architectural layers. Understand in a broad sense what each layer adds to the one before. Rage that it wasn’t so much architected as cobbled together from pieces never designed to fit together. Decry it as total garbage and recommend total rewrite.
As an advanced technique, you can usually skip the first half of this.
Everything except the last 3 words.
I think about a feature or bugfix that I want to work on, then shoehorn it in by any means necessary. Once my code is confirmed working, the planning phase begins: I go through the module(s) I’m working with line by line and match the original author’s coding style. Usually by that point I pick up a trail or discover a bunch of helper functions/libraries that I can use to replace parts of my code, and I continue from there.
As others have said, configuration files are a great way to learn that. Pick a config option you want to learn about, jump to the config loader, find where the variable gets set, then do a global search for that function. From there it starts to fall into place.
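A rough sketch of that "global search" step, in case your editor's search isn't handy. This is Python purely for illustration; the helper name `find_references`, the option name `cache_ttl` and the default file suffixes are all made up for the example, not taken from any real project.

```python
import re
from pathlib import Path


def find_references(root, name, suffixes=(".py", ".rs", ".toml")):
    """Return (path, line_number, line_text) for each whole-word match of `name`."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with one of the chosen (hypothetical) suffixes.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file, skip it
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling something like `find_references(".", "cache_ttl")` gives you the loader line plus every place the option is read, which is usually enough of a trail to start following.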
Sidenote: I also learned Rust this way. It took me around 6 months to learn the rgit codebase solely from adding features that I wanted from cgit. Now I’m at the point where rebasing from upstream to my soft-fork doesn’t mess up any of my changes, and I am able to add or fix things with relative ease. If memory serves, a proper debugger (firedbg is excellent!) was used on several occasions to track down an extremely annoying and ambiguous error message that was due to Rust’s trait system being a pain in my ass.
Start early in the commit history, see if you can understand the general shapes and concepts the project was using at the start.
Then sort of binary-search your way forward in different sized jumps and see how quickly you can get to present day without sacrificing your sanity. Completely at least.
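One way to pick those jumps, as a minimal sketch in Python. It assumes `git` is on your PATH; the function names `list_commits` and `checkpoints` are my own, and "halve the remaining distance each time" is just one possible reading of "different sized jumps".

```python
import subprocess


def list_commits(repo="."):
    """Commit hashes reachable from HEAD, oldest first."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--reverse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()


def checkpoints(commits):
    """Pick commits to read: the oldest, then repeatedly halfway to the newest."""
    if not commits:
        return []
    picks = []
    index, last = 0, len(commits) - 1
    while index < last:
        picks.append(commits[index])
        # Jump half of the remaining distance, but always move at least one commit.
        index += max(1, (last - index) // 2)
    picks.append(commits[last])
    return picks
```

With 16 commits this visits positions 0, 7, 11, 13, 14 and 15, so the early history gets big jumps and the recent history gets small ones. At each stop, `git show --stat <hash>` or a throwaway checkout is enough to see what the project looked like.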
I have two key points to understand any large codebase:
- Start with the entry point. Check the initialization process. It will most likely tell you what other parts of the code are crucial to the application. Start digging into those parts that are mentioned in the initialization process. Rinse and repeat for their dependencies which might look important. Just read and take notes if necessary. Try to understand how the application gets its stuff running. Don’t spend too much time on a specific part, just get a broad understanding and how it all flows.
- After the first step, you should start seeing some sort of patterns to how the software is made: repeating principles, common practices, overall architecture. This is the point when you should be confident enough to introduce changes to the software, therefore you should have a build environment which guarantees the application works. If it doesn’t, have someone in the team help you to get it running without any changes to the codebase. Don’t make changes until you have a working build environment.
With both done, you should already be comfortable enough to start modifying the application.
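For the first point, here is a small sketch of "start at the entry point and follow what it pulls in". It only works for a Python project and only follows absolute imports that resolve to files under the project root; `walk_from_entry` and the other names are hypothetical helpers, not part of any tool.

```python
import ast
from collections import deque
from pathlib import Path


def module_path(name, root):
    """Map a dotted module name to a file under `root`, or None if it is not local."""
    base = Path(root).joinpath(*name.split("."))
    for candidate in (base.with_suffix(".py"), base / "__init__.py"):
        if candidate.is_file():
            return candidate
    return None


def local_imports(path, root):
    """Sorted names of modules imported by `path` that live under `root`."""
    tree = ast.parse(Path(path).read_text(encoding="utf-8"))
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return sorted(n for n in names if module_path(n, root) is not None)


def walk_from_entry(entry, root):
    """Breadth-first list of (depth, module) reachable from the entry module.

    Assumes `entry` itself exists under `root`.
    """
    seen, order = {entry}, []
    queue = deque([(0, entry)])
    while queue:
        depth, name = queue.popleft()
        order.append((depth, name))
        for dep in local_imports(module_path(name, root), root):
            if dep not in seen:
                seen.add(dep)
                queue.append((depth + 1, dep))
    return order
```

The depth-1 modules are roughly "what initialization touches directly", which makes a decent reading list before going deeper. Relative imports and anything loaded dynamically are ignored here, so treat the output as a starting map, not the full picture.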
I cannot stress enough how many developers I’ve seen trying to dig into random parts of the code knowing nothing about where or how it all begins, making it super-problematic to add new features. Yeah, they can fix a bug or two, but the biggest issues start when they try to implement something new.
Pick a small bug or feature from the backlog and fix it. First iteration of a fix is probably shoehorned in there, then I try to adapt the fix to the code base. Matching the style and design of the code base is more important than my own preferences.
I’m a learn-by-doing kind of person.
If there is git history it’s often a good thing to use that to understand what tends to change together, which parts have lots of churn, etc.
There are tools for this: https://github.com/smontanari/code-forensics
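If you just want a quick look before installing anything, the basic idea can be sketched in a few lines of Python on top of plain `git log`. This is not how code-forensics works internally; the `@@commit` marker and the function names are my own choices for the example.

```python
import subprocess
from collections import Counter
from itertools import combinations


def read_log(repo=".", marker="@@commit"):
    """`git log` output: one marker line per commit, followed by the files it touched."""
    return subprocess.run(
        ["git", "-C", repo, "log", "--name-only", "--pretty=format:" + marker],
        capture_output=True, text=True, check=True,
    ).stdout


def churn_and_coupling(log_text, marker="@@commit"):
    """Count commits per file (churn) and commits per pair of files changed together."""
    commits, current = [], None
    for line in log_text.splitlines():
        if line == marker:
            current = set()
            commits.append(current)
        elif line.strip() and current is not None:
            current.add(line.strip())
    churn, pairs = Counter(), Counter()
    for files in commits:
        churn.update(files)
        pairs.update(combinations(sorted(files), 2))
    return churn, pairs
```

Something like `churn, pairs = churn_and_coupling(read_log("."))` followed by `churn.most_common(10)` and `pairs.most_common(10)` shows the hot files and the files that tend to move together. Very large commits (mass renames, formatting runs) inflate the pair counts, so you may want to filter those out.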
A good first step is to get the program running. It’s particularly good if you manage to set it up in an IDE so you can use a debugger.
Next would be to play around with the program. Does it give any print outs? Search for the printed text in the code to see where it comes from. What would it take for it to get there?
Are there any configuration files? Try to find where the configuration is loaded in the code and see where the config parameters are used. How do different settings affect the program?
Are there any API endpoints? Find where the endpoint is defined. Do you know how to call it? Figure out what you need to call it. Do you get any interesting response? Figure out why you got that response.
Keep searching for clues like these about how the program works. See how the program behaves and work out why it behaved that way.
Over time you might learn the anatomy of the code base. It will become easier and easier to navigate around the code.
Depends on what I want or need to “understand”.
I’ve worked for many years on a project (it’s a whole project ecosystem tbh with multiple projects: desktop WinForms app, server app, SQL Server, ASP.NET MVC app, ASP.NET Blazor app, mobile WPF app, sync service app). On the main project (client + server) there is one major area I haven’t visited, and another that I confidently know is not understandable to me without specific deep effort.
I recently had to work on the latter. I take a localized approach: explore what I have to do, without opening up the full-scope understanding that’d lead me to architecture refactors. I write out the method call stacks to get an overview of who calls what and when, so that I know what I have to inspect and analyze further.
I take notes where necessary, or improve and comment code where appropriate for better understanding and obviousness.
I create documentation - about concepts and architecture as appropriate and necessary.
Code should be obvious and intuitive. Concept docs should document the broader concepts.
When those concept docs exist, those are what you look at to understand app intention and behavior. They should also give you an introduction to the architecture. From there, exploring the code should be self-explanatory (but may require specific, repeated, and iterative analysis). And I take notes about what’s relevant and what I need for understanding or for the task.
Afterwards, those notes should either be integrated into the code base or docs, or be determined irrelevant for that. If I had to write them down, it’s more likely they should be part of something than not.
In what regard?
Navigating / making changes.
Use git? Use CI/CD? What do you mean?