Programming & Packaging: A place to discuss programming and packaging.

30th May 2012, 09:54 PM
Registered User
Join Date: May 2012
Location: United States
Posts: 2

Is there a systematic way to read and understand code?
(preamble)
I am a first-time poster here on Fedora Forums; I have been using Fedora and other Linux distros for a few years now. I'm in college, now studying computer engineering and some other things. I've learned most of my programming (Python, hello-worldy C/C++, and now Java) by getting a project from research or work and hacking it together as best I can.
I use Google, various Linux forums, Stack Overflow, and GitHub to get my knowledge. I use nothing but open source software for my projects.
(important bit)
As an engineering student (previously studying under a different discipline), it occurred to me that I wasn't approaching reading code in a very systematic or organized way. I thought about it some, and it didn't seem right to me. When I was told to solve a thermodynamics problem or a biomechanics problem, there was a systematic approach you could take that would invariably lead to the solution. Even for poorly defined (non-physical) problems, there are critical thinking methods you can use for all sorts of life problems, there are writing recipes for strong prose, and so forth.
Is there a systematic way to study someone else's code? If I ever do any real open-source work, I figure I'll have to read a lot of code. I want a sort of guideline, methodology, or system for doing so.
Fedora Forum seemed like the right place to ask this sort of question, because the Fedora Project is massive, cutting edge, and open source.
Thanks

30th May 2012, 10:25 PM
Registered User
Join Date: Aug 2009
Location: Waldorf, Maryland
Posts: 6,108

Re: Is there a systematic way to read and understand code?
Unfortunately, there isn't a good way.
There are a number of USEFUL ways, though none work in all instances. It is also an iterative process: you have to repeat earlier steps, looking at code again after gaining some experience with other parts of the application.
One way (IMHO the easiest) is to start with the documentation of the application. Once you get the idea of the problem the application is to solve, the code has a tendency to follow that idea, even if the idea becomes superseded by updates that change the focus.
The usual problem is that the original programmer did not include documentation with the program.
The next problem is that the original programmer did not comment the program... and worse, updates to the program did not update the comments.
This brings up the next problem: identifying the problem the application is to solve. Simple libraries (and I include glibc in this) have standards documents describing what they do, and this allows you to go back to the first method.
Object-oriented code has additional problems. You can't easily identify the scope of the problem without first knowing the object model used, and then the implementation of that model in a particular language. Many languages support only single inheritance, and this obscures some of the contortions done to make a problem area fit the implementation model (which in turn makes the code hard to follow).
OO programs introduce another issue: they mix dynamic constructs with static constructs, which makes it difficult to follow the actions taken by a program. You used to be able to run a tool like cflow and identify where major activity would occur.
That is no longer as useful, because it only does static analysis. It is good for some C code, but not for OO designs, which are too dynamic: you get a trace of the calls into the OO model, but not a trace of the functions the model calls, since those are resolved at run time.
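To make the static-versus-dynamic point concrete, here is a rough Python sketch of a cflow-style call-graph extractor built on the standard ast module. It is not how cflow itself works (cflow reads C source), and the sample program and the static_call_graph helper are made up for illustration. The static view can only record that report() calls some area() method; running the code shows it actually reaches both Square.area and Circle.area.

```python
import ast

# Hypothetical sample program: which area() runs inside report()
# depends on the object passed in at run time.
SOURCE = '''
class Square:
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side * self.side

class Circle:
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3.14159 * self.r * self.r

def report(shape):
    return shape.area()

def main():
    return [report(Square(2)), report(Circle(1))]
'''


def static_call_graph(source):
    """Map each function/method name to the names it calls (cflow-style).

    Only syntax is inspected, so a method call such as shape.area() is
    recorded as '?.area': static analysis cannot tell which class's
    area() will run.
    """
    graph = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call):
                    if isinstance(sub.func, ast.Name):
                        callees.add(sub.func.id)          # plain call: foo()
                    elif isinstance(sub.func, ast.Attribute):
                        callees.add("?." + sub.func.attr)  # receiver unknown
            # Methods with the same name in different classes are merged.
            graph.setdefault(node.name, set()).update(callees)
    return graph


if __name__ == "__main__":
    graph = static_call_graph(SOURCE)
    for caller in ("main", "report"):
        print(caller, "->", sorted(graph[caller]))
    # Executing the code reveals the dynamic dispatch the static view hides.
    namespace = {}
    exec(SOURCE, namespace)
    print(namespace["main"]())
```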
What results is a bit inefficient: you try to follow the top-level elements (as far as you can see) to get an overview of what appears to be going on. Try not to get trapped in too much detail.
I have found that if it takes more than an hour to follow a function, it is because I don't have enough background yet to understand it. This leads to trying to get an overview of the application libraries/OO model, which can also take a good bit of time. When I get tired of looking at that, I go back to the higher level and see if it has started making more sense.
Oh - keep copious notes on what you find, and don't delete first impressions - just add more notes. Sometimes you find that the original notes have what "should be", and the following notes have "what is"... and that can explain some odd goings on with the application.
What you are doing is not an "engineering" type of thing.
You are, in one sense, reengineering the thought processes of a no longer available person/team. It is more like an archaeological study than an engineering one, trying to figure out what people did, and why.

31st May 2012, 12:07 AM
Registered User
Join Date: May 2012
Location: United States
Posts: 2

Re: Is there a systematic way to read and understand code?
Thanks for the reply!
Having read that there is no one systematic way, I want to open it up to everyone who wants to put in some advice for reading code.
If you have experience reading code, let me know! I'm trying to build skills I think will help me be a better developer or engineer in the future.

31st May 2012, 06:17 AM
Registered User
Join Date: Aug 2004
Posts: 3,855

Re: Is there a systematic way to read and understand code?
Quote:
Originally Posted by Eutropia
When I was told to solve a thermodynamics problem, or a biomechanics problem, there was a systematic approach you could take that would invariably lead to the solution.
Problems in science textbooks need to have systematic solutions - it makes homework simpler to grade. Reading code is more of a "real life" problem than a textbook problem. Real life problems in science and engineering can't necessarily be solved by systematic methods.
Quote:
Even for poorly defined problems(non-physical), there are critical thinking methods you can use for all sorts of life problems, there are writing recipes for strong prose, and so forth.
Perhaps the relevant analogies for reading code are learning a local dialect of a language, or doing anthropology, since it involves learning local customs.
Quote:
Is there a systematic way to study someone else's code? If I ever do any real open-source work, I figure I'll have to read a lot of code. I want a sort of guideline, methodology, or system for doing so.
I think there are various software tools that attempt to analyze code, diagram it, etc. (I'm not enthusiastic about any I've seen.) If you can import the code into Eclipse, that gives you some help, yes?
I don't think there is a cut-and-dried approach for comprehending large pieces of software. You are basically faced with learning some people's conventions and habits. (For example, has this programmer read the Design Patterns book?) Even the documentation reveals people's peculiarities. For example, in mathematical software, unless the programmer is well trained in mathematics, the documentation may use mathematical terms incorrectly. So you must learn a person's manner of speech.
That's why I think reading code is like doing anthropology or linguistics. You have to observe behavior, both of the program and the programmer. Now, if I only understood how anthropologists work!
__________________
"Never let the task you are trying to accomplish distract you from the study of computers."

31st May 2012, 06:22 AM
Registered User
Join Date: Aug 2010
Location: Al Ain, UAE
Posts: 1,059

Re: Is there a systematic way to read and understand code?
Doxygen goes a long way.

31st May 2012, 09:22 AM
Registered User
Join Date: Apr 2006
Location: Ohio, USA
Posts: 8,302

Re: Is there a systematic way to read and understand code?
It's a wonderful question. As a professional I read a lot more code than I write, and code is often miserable to read.
--
The first question must be: "WHY are you reading the code?" What do you expect to gain?
/ Sometimes we read code as an educational experience. You might be interested in how the strace or time programs work in Linux, so you pull up the source and have a look. Perhaps you want to write a panel applet, so you pull up the source for an existing applet and study the method by example. Sometimes you want to study exactly how others have solved the same or similar problems.
/ Another reason is to study unexpected behavior or remove bugs. In this case we usually have a good characterization of the odd/bug behavior and this can help direct the reading process. Sometimes we get extremely detailed problem reports or even messages from the working code. Sometimes we get or can obtain a stack traceback or a debug session and can perhaps see the exact line of code where the 'terminal offense' occurred.
/ Often we are reading code because we need to make some substantial revision or feature addition: a big change.
A second question follows from the first: how much effort should you reasonably expend in studying the code? Do you need to read and understand the entire code-set, or is your need more specific and limited?
/ If your goal is to read for an example, then you likely have some very specific reason and can limit your reading to the relevant parts. Perhaps you see a program that does something unique and you want to see how they implement it. This sort of reading often makes up for the dry, turgid, example-less, hard-to-search, and out-of-date state of most library documentation. You may only need to read a few lines, or a few routines.
/ If code is demonstrating bugs or unexpected behavior, then the reading should proceed from the most obvious point of failure outward until the problem is encompassed. You might know the report or trace shows the problem is exactly at line xxxx of some file, but nearly always the ultimate cause of that problem is somewhere related but elsewhere. In a case like that you can't usually read the entire code-set; you want to use debugging tools and especially strong, careful, deductive logic to work your way back to the actual cause of the problem. FWIW, the ability to diagnose problems accurately is rare. Most people jump to poor conclusions with too little evidence and fail to question their conclusions when those don't fully explain the problem behavior. The problems that are most difficult to debug are ones where the evidence doesn't point to any particular part of the code.
/ When you are going to add features you MAY not need to understand every line in detail, but you will need to understand all the interfaces your new feature will touch. If you have some CLI and you want to add some new commands, you need to understand the command parser, how an action is associated with a command, and how command results are reported and interact with the others. Often there are parts of the app that have no impact on your changes, and you can largely ignore these. If you are dealing with a small app, then maybe you want to read all the code just for understanding. For example, if you wanted to add the SCTP protocol to netcat, you may as well read the whole thing. OTOH, if you want to add two-dimensional arrays to bash, or some new feature to Firefox, you'll probably want to read just the related parts of the code.
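For the debugging case, Python's standard traceback module can list the frames of a caught exception, innermost first, which gives you the point of failure to start reading outward from. This is a minimal sketch assuming a hypothetical three-function program (load_config, parse_port, start_server are invented names) where the crash surfaces in one function but the bad value comes from another.

```python
import traceback


# Hypothetical program: the crash surfaces in parse_port(), but the
# real cause is the bad value produced by load_config().
def load_config():
    return {"port": ""}           # bug: empty string instead of a number


def parse_port(cfg):
    return int(cfg["port"])       # ValueError is raised here


def start_server():
    return parse_port(load_config())


def frames_from_failure(exc):
    """Return (function name, source line) pairs, starting at the frame
    where the exception was raised and moving outward to the catcher."""
    frames = traceback.extract_tb(exc.__traceback__)
    return [(f.name, f.line) for f in reversed(frames)]


if __name__ == "__main__":
    try:
        start_server()
    except ValueError as exc:
        for name, line in frames_from_failure(exc):
            print(f"{name}: {line}")
```

Note that load_config() never appears in the trace, because it had already returned by the time int() failed. The traceback narrows where to start reading; the deduction back to the real cause is still up to you.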
Revisions and additions point to the code design as well. If the code is modular and has good internal structures for your needs, then code reading can be dramatically limited. If you wanted to add a file system, a driver, or a new security model to the Linux kernel, your code reading can be reduced to understanding those interfaces and examples of their use. If you want to make a systemic change that wasn't anticipated in the design, then you will need to read far and wide.
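As an illustration of "understand the interfaces your new feature will touch", here is a hypothetical Python mini-CLI (every name in it is invented for the example). Adding a command requires understanding three things: how a line is parsed, how a command name is mapped to a handler in the COMMANDS table, and how results are reported (here, every handler returns an (ok, message) pair). Because that structure is modular, the new upper command needs no changes anywhere else.

```python
import shlex


def cmd_echo(args):
    return True, " ".join(args)


def cmd_add(args):
    try:
        return True, str(sum(int(a) for a in args))
    except ValueError:
        return False, "add: arguments must be integers"


# Interface 2: how a command name is associated with its action.
COMMANDS = {"echo": cmd_echo, "add": cmd_add}


def run(line):
    """Parse one input line, dispatch it, and format the result."""
    words = shlex.split(line)                 # Interface 1: the parser
    if not words:
        return "error: empty command"
    handler = COMMANDS.get(words[0])
    if handler is None:
        return f"error: unknown command {words[0]!r}"
    ok, message = handler(words[1:])          # Interface 3: (ok, message)
    return message if ok else f"error: {message}"


# The new feature: one handler that follows the (ok, message) convention,
# plus one registration. The parser and run() stay untouched.
def cmd_upper(args):
    return True, " ".join(args).upper()


COMMANDS["upper"] = cmd_upper

if __name__ == "__main__":
    for line in ["add 2 3", "upper hello world", "add 2 x", "frob"]:
        print(f"{line!r} -> {run(line)}")
```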
--
In all cases, your general understanding of the problem being solved and the general method the code designers used to address the problem can make your code reading vastly easier. This is about 'motivation' in a sense. If, when you look at a routine, you understand its general purpose and the 'intention' of most of the code and calls, then you can make good leaps of faith about what is going on in other parts of the code. Sometimes those leaps are wrong in detail, so you can't become too certain until you read farther, but even with 80% accuracy you can form a mental picture of what is happening much faster, and backfill the vague loose ends as needed.
Another odd thing is whether you should read code from the top levels down or from the low levels up. IMO, for bug-fixing, bottom-up is most practical. For major changes, top-down (in selective directions) is best. To gain a clear picture of the entire code, top-down is best.
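The two reading orders can be sketched over a call graph. This is an illustration under assumptions: the CALLS graph is hypothetical (in practice it might come from a tool such as cflow or Doxygen's call graphs), top-down is a breadth-first walk from the entry point, and bottom-up is the same walk over the reversed graph, starting from the function where a bug shows up and moving out to its callers.

```python
from collections import deque

# Hypothetical call graph: each function maps to the functions it calls.
CALLS = {
    "main": ["parse_args", "run"],
    "parse_args": ["die"],
    "run": ["read_input", "process", "die"],
    "read_input": [],
    "process": ["helper"],
    "helper": [],
    "die": [],
}


def top_down(graph, entry):
    """Breadth-first from the entry point: overview first, details later."""
    order, seen, queue = [], {entry}, deque([entry])
    while queue:
        fn = queue.popleft()
        order.append(fn)
        for callee in graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return order


def callers_of(graph):
    """Reverse the graph: map each function to the functions that call it."""
    rev = {fn: [] for fn in graph}
    for fn, callees in graph.items():
        for callee in callees:
            rev.setdefault(callee, []).append(fn)
    return rev


def bottom_up(graph, start):
    """From a low-level function (e.g. where a bug appears) out to its callers."""
    return top_down(callers_of(graph), start)


if __name__ == "__main__":
    print("top-down: ", top_down(CALLS, "main"))
    print("bottom-up:", bottom_up(CALLS, "helper"))
```

With this graph, top_down(CALLS, "main") visits main, then parse_args and run, then their callees, while bottom_up(CALLS, "helper") climbs helper, process, run, main.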
Of course you need to understand the programming language well. You'll also need to understand the APIs well enough to read for meaning, and understand in detail any APIs you need to use. Having good background knowledge from reading code widely really helps: you can see that many programs use similar methods, and even the same peculiar tricks, as others.
===============
Yes, if you need to make changes, read the FUNCTIONAL documentation first; that's a good idea for debugging too. Comments in long-lived code are not infrequently dead wrong, and frequently only meaningful to the author. Code comments just fail in every way. Most modern C language projects have absolutely minimal comments, perhaps a Doxygen header line per routine and a rare comment elsewhere. Nearly worthless.
There is a difference when reading OO code, but it largely depends on style. There is an even bigger difference when reading functional language code. Of course, half the time, code in these languages is written in procedural style.
======
I'm not at all impressed with the systematic approach vs. "real life" concept. That's just an appeal to use seat-of-the-pants methods in a case where time and effort really matter. That is not good design or engineering practice.
__________________
None are more hopelessly enslaved than those who falsely believe they are free.
Johann Wolfgang von Goethe