Working on Chrome made me develop a tool for reading source code

(medium.com)

215 points · nebucnaut · 9y ago · 51 comments

40 comments · 18 top-level

barrkel · 9y ago · 9 in thread

The symbolic lookups and diagrams described are already implemented in modern IDEs. People who rely on relatively simple text editors and text / regex search may be less aware of this, of course.

I do agree that visualizations are lacking. In part this is because complete analysis is difficult, and rather than do half a job, tools don't do any job at all. In part it's because different tools do a better job: for example, instrumenting profilers are better at showing control flow.

I generally want three kinds of things in my head when reading code: control flow, data flow and data shape.

But there are different resolutions to this. Control flow may be simple, at the method level; or it may be more complex, with asynchronous callbacks, queuing systems, RPC, web service requests and orchestrations controlled by configuration.

Data flow may be simply knowing where in the code a particular attribute is updated and where it is read and used to make a decision. But it's also about where the data came from ultimately - what all the ingredients are that go into its formation - and also what other data it in turn affects.

Data shape has to do with the simple shapes of structures, but diagrams are scarcely needed for that - a glance at the definition of the structure is enough to commit it to short-term memory. More interesting are global invariants, local invariants, longer chains (how you get to one distant structure from another via links), the database model, the configuration data model, and static vs. dynamic data.

Most of the interesting information is at a higher level than can reasonably be analyzed in most Turing-complete languages. The best info comes from profiling or debugging.

AlexC04 · 9y ago

Can Visual Studio do the graphical representation of a C# codebase? A JavaScript file?

Looking at Coati.io, it can't do those languages yet, but the graphical flow chart looks really amazing to me and I'd love to try it out on my own code.

barrkel · 9y ago

Class descriptions are not only put into diagrams, they're interactive controls. Ten years ago, the model-driven development trends focused on making more and more of the code driveable by UI representations and code. People tried to model code using UML or UML-like representations, with two-way links. Tooling inferred models from existing code.

For more current approaches, see "Code Maps" in the VS 2015 release notes here - https://www.visualstudio.com/en-us/news/releasenotes/vs2015-...

Proper control flow visualization is done less often, for the reasons I explained. Profilers do it better. See e.g. the Call Graph in AQTime: https://support.smartbear.com/viewarticle/43205/ - it lets you drill through the callers and callees of each function as seen in practice. This means it can see through polymorphism, reflection, dynamic loading, function pointers - all the things that make static analysis infeasible.

stonemetal · 9y ago

Visual Studio does, but only in the "Enterprise" configuration ($6K per seat). I haven't used it, so I can't speak to its quality.

72deluxe · 9y ago

Well you have the class view which tells you a lot. And you have the object browser which is very useful too. It's not quite the same but I think it tells you a lot about a class without having to do masses of reading.

bobajeff · 9y ago

In the past, working on a large C++ codebase, I've tried using a debugger to get an understanding of how a function gets called.

Even the debugger fails when you come across indirect calls. Also in gdb multiple threads make it easy to miss something when stepping through code.

realharo · 9y ago

Visual Studio's debugger has some really nice features and interface for dealing with multi-threaded code https://channel9.msdn.com/Shows/Visual-Studio-Toolbox/C-Plus...

barrkel · 9y ago

Debuggers aren't always the best tool for figuring out control flow - accidentally step over something and you can miss it - but they are useful to explore the map of data in memory. Some debuggers let you expand out the map hierarchically, others use separate windows (e.g. the old Borland IDE Debug Inspector windows).

For exploring control flow, an instrumenting profiler is really useful, if you can easily isolate a representative code run.

Asooka · 9y ago

I would strongly suggest you check out rr - Mozilla's Record and Replay framework https://github.com/mozilla/rr . Don't let the name confuse you - it's not specific at all to Firefox. It can record a process's execution and replay it in gdb, and lets you step and continue backwards. This last point is super powerful - let's say you see that some variable has an unexpected value. You can set a watchpoint on it, then reverse-continue and gdb will stop at the point it was changed. It has been invaluable in finding some tricky bugs. Apart from that, you can also use it to explore the execution of a program, since you can freely step backwards and forwards in time. You also get a deterministic thread execution sequence each time. That might be either good or bad, depending on what you want.

bigger_cheese · 9y ago

A few years ago at work I wanted to get my head around how our large C++ code base worked, so I used a tool called Doxygen. It was pretty useful for reading and visualizing the source code. It output HTML pages and could generate dependency graphs etc. automatically; it would show you which file a function came from and let you browse by class or by individual source file. You could easily see every file a class was referenced in.

Downside was it took a long time (many hours) to scan the entire codebase.

jasim · 9y ago · 6 in thread

As someone looking in from outside, the Chrome codebase is a treat. Anecdote.

Once during the start of my career I was asked to implement a caching layer in an HTTP library. I had no idea what it was, so there was some reading up to do. There were the excellent guides from mnot.net, as well as the nicely written RFC 2616. But Chrome's code was the best of all - https://github.com/adobe/chromium/blob/master/net/http/http_.... If you want to know for a fact how Chrome decides caching, that code is kind of the heart of it.

Recently I needed to retrieve the "Rendered Font Name" that is available in the DevTools "Computed CSS" section. This is the name of the system font that Chrome finally picks, based on the Font-Family property. This is platform specific and so not in the DOM, nor available to Chrome Extensions directly. The only way it could be done was by making an extension that runs in debug mode and communicates with the browser through its remote debugger protocol. (This part of the documentation is lacking, but it is an esoteric topic anyway.) The good news was that the code that does this is well encapsulated and could be easily extracted into a command line utility. For the curious: https://chromium.googlesource.com/chromium/src/+/master/thir... (there is a nugget about real-world software in the comments)

This is, at a surface level, good code to me. Things are easy to find, there is a rhyme and rhythm to the system, and it feels welcoming. The thoughts of the people who designed that system over the years would be great to hear. Most writing about good code on the internet comes from an OO background, mostly wrt information systems. I wonder what people who've written these systems have to say about building and engineering complex software.

bengoodger · 9y ago

Chrome team co-founder/engineer/etc. here. Glad you found it useful! Some components like our net stack are particularly cleanly factored. Others have more room for improvement.

I would say that at the beginning of the project (2006-2008) we didn't have so much of a focus on platform design, just on shipping a browser as quickly as possible. Some of the abstractions from that era haven't stood the test of time as the project has scaled to many platforms, features etc.

Over the course of time we've had various refactoring projects to try and pay down some of the technical debt. The first major one was the "content refactor" from 2011. This led to the separation of the multi-process browser shell from the UI layer, which has allowed for other chromium-derived browser apps to emerge.

Today, we've observed that even this layer is a bit too complicated, so we're running more projects to try and modularize it a bit more. My mental model is that the browser is kind of like a set of system services for an ephemeral app runtime, and it's good to imagine what the APIs & separation between those things should be. To aid this we've developed a new suite of IPC tools which are way more useful than the original stuff we have used for much of the lifetime of Chrome.

Anyway this kind of thing requires an ongoing investment and a set of people who thrive on the art of API design and in grungy, challenging refactoring work. I probably have many more thoughts on this topic but this'll do for right now :-)

jasim · 9y ago

Absolutely stoked to read your response. Thank you sir.

There is a dearth of quality conversations on the internet about good code in a real-world messy context, mostly because the people who're doing serious work don't have the time to talk about it. Would be a good thing if you write more. In fact you folks should be writing books!

mattmanser · 9y ago

I'm not a fan of the comments. There are comments that are utterly useless, like:

    // Of course, there are other factors that can force a response to always be validated or re-fetched.

Gee thanks, what might those factors be? A comment like that is worse than not commenting at all.

Or the real ones that get my goat, comments which just tell you what the code that follows obviously does:

  // If there is no Date header, then assume that the server response was
  // generated at the time when we received the response.
  Time date_value;
  if (!GetDateValue(&date_value))
    date_value = response_time;

There's even a line (1019) where they divide by 10 but don't explain why they do it. That's what I personally would comment.

The code itself is fairly easy to read (though I do wonder why they bothered using TimeDelta as it just seems to make the code more complicated, and results in confusing code like this:).

    return TimeDelta();  // not fresh

groby_b · 9y ago

While I'm not on the net team - I just mangle the UI and annoy Ben ;) - I can wager a good guess: The net team used TimeDelta because for any software project of Chromium's size strong types are crucial.

If they just returned an int, you'll sooner or later see it passed through five different places, and at the end location, nobody remembers if it's ms, time ticks, seconds, or even a unix time instead of a delta. TimeDelta removes that question.

It also does things that have subtle issues you might miss if you did it "by hand", like saturated adds, multiplying with an integer value while handling overflow correctly, etc.

These things might be overkill for smaller projects, but once you have something with hundreds of contributors, every little bit helps keep the code base sane.

As for the division by 10 - look up at line 953 :)

Yes, it should be a named constant. And some of the comments could certainly be better. It's a work in progress. (And if you want to help with that work, we happily accept patches!)
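
The strong-typing argument above can be illustrated with a minimal sketch. `Duration` here is a hypothetical stand-in written for this example, not Chromium's `TimeDelta`; it only assumes the two properties the comment describes, namely that the unit is fixed inside the type and that arithmetic saturates rather than overflowing.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical illustration only; NOT Chromium's TimeDelta.
// The unit (microseconds) is an internal detail, so callers can never
// confuse seconds, milliseconds, or ticks when passing a value around.
class Duration {
 public:
  Duration() = default;  // zero-length duration

  static Duration FromSeconds(int64_t s) {
    return Duration(SaturatedMul(s, 1000000));
  }
  static Duration FromMilliseconds(int64_t ms) {
    return Duration(SaturatedMul(ms, 1000));
  }
  static Duration Max() { return Duration(kMax()); }

  int64_t InMicroseconds() const { return us_; }
  int64_t InSeconds() const { return us_ / 1000000; }

  // Saturating add: clamps at the representable limits instead of
  // invoking signed-overflow undefined behaviour.
  Duration operator+(Duration other) const {
    if (other.us_ > 0 && us_ > kMax() - other.us_) return Duration(kMax());
    if (other.us_ < 0 && us_ < kMin() - other.us_) return Duration(kMin());
    return Duration(us_ + other.us_);
  }

  bool operator==(Duration o) const { return us_ == o.us_; }
  bool operator>(Duration o) const { return us_ > o.us_; }

 private:
  explicit Duration(int64_t us) : us_(us) {}

  static int64_t kMax() { return std::numeric_limits<int64_t>::max(); }
  static int64_t kMin() { return std::numeric_limits<int64_t>::min(); }

  // Multiplies by a positive factor, clamping on overflow.
  static int64_t SaturatedMul(int64_t v, int64_t factor) {
    if (v > kMax() / factor) return kMax();
    if (v < kMin() / factor) return kMin();
    return v * factor;
  }

  int64_t us_ = 0;
};

// The freshness rule quoted elsewhere in the thread, expressed with the type.
bool IsFresh(Duration freshness_lifetime, Duration current_age) {
  return freshness_lifetime > current_age;
}
```

With a type like this, a default-constructed `Duration()` reads as "zero lifetime", which is presumably what the `// not fresh` return is meant to convey.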

squeaky-clean · 9y ago

I think the comment you mention makes perfect sense within the rest of its context. It's saying the rule for determining freshness is "response_is_fresh = (freshness_lifetime > current_age)", however that's not always the case. And looking at the code, you can see it returns true early if lifetime is equal to a new TimeDelta.

The GetFreshnessLifetime function below it then covers the additional cases where it returns that, such as the headers being set to not cache, or the expiry time being earlier than the response's time (or the current time if none is provided).

I think it also makes sense to assume the comment is letting us know that just because HttpResponseHeaders::RequiresValidation returns false, that doesn't mean that's the only thing that can make it require/not require a re-fetch.

> There's even a line (1019) where they divide by 10 but don't explain why they do it. That's what I personally would comment.

This is covered near the head of the function, line 951. Using a constant such as heuristic_scalar instead of simply using '10' would be more readable though.
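
A rough sketch of what the named-constant version might look like. It assumes the bare `10` is the common "one tenth of the time since Last-Modified" caching heuristic; the thread only says the reason is given near the top of the function. `HeuristicFreshnessLifetime` and `kHeuristicFreshnessDivisor` are hypothetical names made up for this example, and the code uses `std::chrono` rather than Chromium's own time types.

```cpp
#include <chrono>
#include <optional>

using Clock = std::chrono::system_clock;
using Seconds = std::chrono::seconds;

// Assumption: the divisor encodes "treat the response as fresh for one tenth
// of its age at the time it was served". Naming it documents the intent.
constexpr int kHeuristicFreshnessDivisor = 10;

// Hypothetical helper, not Chromium's GetFreshnessLifetime.
Seconds HeuristicFreshnessLifetime(
    std::optional<Clock::time_point> date_header,
    Clock::time_point last_modified,
    Clock::time_point response_time) {
  // If there is no Date header, assume the response was generated at the
  // time we received it (mirrors the snippet quoted earlier in the thread).
  const Clock::time_point date_value = date_header.value_or(response_time);

  // A Last-Modified value later than the Date is not usable: not fresh.
  if (last_modified > date_value) return Seconds(0);

  return std::chrono::duration_cast<Seconds>(date_value - last_modified) /
         kHeuristicFreshnessDivisor;
}
```

For instance, a response last modified 1000 seconds before its Date header would get a 100 second heuristic lifetime under this sketch.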

samfisher83 · 9y ago

The Chrome code base is not easy to understand, and it keeps getting changed even when it really doesn't need to be. For example, it used to render most of the colors in Gradient.cpp. Now that's done somewhere else, and it's hard to figure out exactly where.

realharo · 9y ago · 4 in thread

The visualization aspect looks quite interesting, but I'm not sure how many things this can do that a quality IDE doesn't already.

The example uses the author mentions ("Following code paths from method to method", "Finding where an interface is implemented and which methods get overridden", "Exploring dependencies between types and functions") all sound like pretty standard features today. Plus with an IDE you get the benefit of having them right there in the editor/debugger/etc. and much more.

I would however really like something for inspecting the run-time structure of an application's objects. Most debugger views are really clunky for looking at large amounts of data, and even the pretty-print features often don't help much. Having a zoomable graph with the objects right there in front of me would really bring my productivity to the next level.

westoncb · 9y ago

I'm working on a new kind of tool on those lines—would appreciate any feedback on whether the format I've designed so far would be useful to you (or others). http://westoncb.com/projects/avd

(I've been thinking about ways of getting to deal with arbitrary object graphs, but an important requirement was to keep it language-independent, so it just deals with common data structure formats at the moment: lists, trees, tables, graphs, hashmaps, etc. —my thinking is most difficult to debug algos are performing operations on these anyhow.)

cellularmitosis · 9y ago

This is amazing! Is the UI OpenGL? Is the visualization program a separate process? I'm guessing the main process sends info about the monitored data structures over a socket?

noselasd · 9y ago

The DDD debugger has (had? Not sure it's maintained anymore) pretty useful visualization of data - e.g. like https://bcaptain.files.wordpress.com/2013/06/ddd.png

realharo · 9y ago

That's a good start, I'll try that out when I have some time.

This specific screenshot however - a linked list - shows a situation where you most likely don't care about the exact structure like this - you only care that it's a list of elements, so most debuggers would show it to you like that - just as a list. The specific raw structure only obscures the parts you care about.

Visual Studio allows describing your custom data structures like this using XML files (https://msdn.microsoft.com/en-us/library/jj620914.aspx), or even custom graphical controls (https://code.msdn.microsoft.com/windowsdesktop/Writing-graph...) and gdb allows you to write a pretty-printer in Python (using an API which I can't find any documentation for :/ ), but it's all kinda clunky and still doesn't deal well with large amounts of data.

My ideal tool would be something that combines both of these things - you have

- An object graph that you can zoom in/out of that shows raw objects (just like in that DDD screenshot).

- An easy way to describe a custom view for your own objects (you can also switch to the raw view for an object, or switch between different defined views, etc.). Just like VS/gdb/lldb allow you to do, but a lot easier and more powerful. You could for example view a specific dictionary that contains complex objects in a tabbed interface, where the keys are tab titles, etc.

- A way to live edit these custom views - so that you can rapidly create them without restarting the debugger and restoring the state many times (Visual Studio supports this for the .natvis files).

- A powerful searching/filtering/transformation mechanism (e.g. for every object that satisfies this condition, show only this property and sort by that, etc.).

- Some way to save these configured views + filters + transformations, etc.

mwexler · 9y ago · 1 in thread

I see comments that "an IDE can do this". Perhaps so. But there are lots of data-discovery missions where you need to learn a codebase quickly that you won't be fully compiling or building yourself. Perhaps you need to replicate an approach, or understand where something in your code needs to change in order to work with the other code that you can't touch. Tools like the OP's could become very handy in understanding a codebase without bringing the entire thing into your IDE.

I agree that some pieces are only really comprehensible at runtime, but I applaud tools that reflect the need to learn a codebase without necessarily having to (or being able to) bring all the code into your IDE.

speps · 9y ago

> learn a codebase quickly that you won't be fully compiling or building yourself.

I used early versions of Coati (0.5, I think), which used clang for the backend. Loading a new project took ages, probably longer than compiling it in my case. I should try again, as they released 1.0 not too long ago.

outcoldman · 9y ago · 1 in thread

The trial doesn't actually let you try it... You can only try it on some predefined small projects. As I understand it, the only option for me to try it is to buy a license. OK, the author says that I can get a refund within 1 month if I don't like it. But anyway, that's too much effort just to try it. Author, please consider giving an option to really try it. Also: please add retina icons.

sexyForkBomb · 9y ago

I was curious about this as well. You can mail them for a "real" trial.

mrkgnao · 9y ago · 1 in thread

Chromium Code Search sounds sort of similar to Hoogle. Can anyone with experience of both confirm this?

Also -- sort of offtopic, but motivated by TFA -- I'd love some way to find out about quirks that native speakers of $lang have when they speak $otherlang.

A few common "tells" that I know, for $otherlang = English, are Hyphenating-Things-Like-This and also spaces before exclamation points or question marks !

0x4a42 · 9y ago

"spaces before exclamation points or question marks !"

This is probably related to typographic rules. For example, in French you put a space before and after double punctuation marks (!?:;" etc.) and a space after single punctuation marks (.,). One exception is the single quote, which should be neither preceded nor followed by a space.

javathrowaway · 9y ago

I feel obliged to mention this talk by Zed Shaw: https://vimeo.com/53062800

It gives an overview and interpretation of a body of neuroscience research in the context of teaching programming. I can't quite summarize the whole talk succinctly (and don't want to lure anyone with catchy titles either) — but my takeaway from it is that those "visual" programming tools are mostly useless and not going to help significantly.

The reason for that being how the brain works: switching back and forth between "visual" and "linguistic" cognition is hard, and requires specific training to do efficiently. Please turn to the talk for references.

TheMagicHorsey · 9y ago

This company is tackling the same problem with a slightly different approach. I like that they have made their tools open source and modular. Theoretically you can use their system with any language by writing a few modules.

https://sourcegraph.com/

relics443 · 9y ago

If it was hosted, and you could point it at a GitHub repo, I'd use it.

Otherwise I'd rather use whatever IDE JetBrains has for it. It might not have the exact same capabilities (or maybe it does, I didn't look closely enough), but why use another tool and context if the current one is good enough?

zfedoran · 9y ago

This reminds me a little bit of the IDA disassembler. There are moments where you might face production issues with external JS code and have no source maps available. A tree view to dissect what is going on would be useful in these situations.

mattnewton · 9y ago

This is really cool. One killer feature that I don't know if you have considered is some kind of visualization of the stack trace, showing how control passes between the objects & functions. That would really help with dynamic languages, or large codebases that have been Greenspun-d heavily. It would be very difficult to implement, especially in a cross-platform way, but I bet if you pick a language community like C++, Java or something else and focus on it, you might have better results.

mondoshawan · 9y ago

This tool looks quite useful for weird build environments with tons of completely undocumented native code such as AOSP (Android) where IDEs regularly fall over.

godmodus · 9y ago

Trying should be made more accessible.

"Looks" promising.

manishsharan · 9y ago

Java provides an easy way of tracing control flow during program execution: when I join a new contract and am responsible for maintaining legacy Java code, I use AspectJ logging aspects with before and after pointcuts. This helps me figure out the application control flow.

I wonder if something similar is available for C/C++.
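
One low-tech approximation of before/after logging in C++ is an RAII guard placed at the top of each function of interest. The sketch below is only that, a hand-written stand-in and not an AspectJ equivalent: `ScopeTrace`, `TraceLog`, `Load` and `Parse` are hypothetical names invented for this example, and the guard has to be added by hand rather than woven in automatically. It is also not thread-safe as written.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory trace sink; a real setup would write to a log file.
inline std::vector<std::string>& TraceLog() {
  static std::vector<std::string> log;
  return log;
}

// Logs "enter <name>" on construction and "leave <name>" on destruction,
// indenting by call depth. Single-threaded only.
class ScopeTrace {
 public:
  explicit ScopeTrace(const char* name) : name_(name) {
    TraceLog().push_back(Indent() + "enter " + name_);
    ++Depth();
  }
  ~ScopeTrace() {
    --Depth();
    TraceLog().push_back(Indent() + "leave " + name_);
  }
  ScopeTrace(const ScopeTrace&) = delete;
  ScopeTrace& operator=(const ScopeTrace&) = delete;

 private:
  static int& Depth() {
    static int depth = 0;
    return depth;
  }
  static std::string Indent() {
    return std::string(static_cast<std::size_t>(Depth()) * 2, ' ');
  }
  const char* name_;
};

// Example functions standing in for legacy code being explored.
int Parse() {
  ScopeTrace trace(__func__);
  return 42;
}

int Load() {
  ScopeTrace trace(__func__);
  return Parse();
}
```

Calling `Load()` appends four entries to `TraceLog()`: the enter and leave lines for `Load`, with the indented pair for `Parse` nested between them.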

zimablue · 9y ago

It's closed source?

hkon · 9y ago

Reminds me of the code bubbles and debug-canvas I used to use in Visual Studio.

fapjacks · 9y ago

Incidentally, WebKit is the hairiest pile of code I've ever seen.

general_ai · 9y ago

How does it compare to Google Kythe (internally known as Code Search): https://github.com/google/kythe?
