(And as long as I'm griping, don't get me started on all the people who think a wall of text slapped into a PNG constitutes an "infographic.")
Slavishly reproducing his methodology ignores everything we've learned since then. On the other hand, for those new to the field, reading about his work and understanding what he was trying to accomplish with the tools available at the time can open our eyes to new ways of thinking.
(As an aside, this same maxim has also helped me with things like programming tools. We don't need to use Lisp or Smalltalk for everything, but we can learn a lot from these languages, and especially from what their creators and proponents were trying to achieve with them.)
The famous graphic of Napoleon's army, which is closely associated with Tufte, is an example of a graphic that crams a lot of data into an illustration that rewards careful study. It's not actually a graphic that makes the data obvious at a quick glance.
Sometimes an illustration that best serves as a background for a knowledgeable person spending 30 minutes explaining it is a good approach. Other times you want to capture the contrast between a few numbers in a compelling way.
2. Almost every chart and graph I see would be better if its creator understood and applied Tufte’s principles.
3. His books are truly delightful to read and are not overrated.
4. If he’s so overrated, name a person or resource that would better impart a set of useful principles to create effective and accurate visual representations of information.
https://medium.economist.com/mistakes-weve-drawn-a-few-8cdd8...
Tufte, as another user correctly pointed out, is massively overrated. He is also incredibly thin-skinned and often blocks people, many of whom work in data visualization, if they criticize or critique his work or ideas.
Alberto Cairo is just as overrated, but he is more receptive to feedback.
The fact that most maps are terrible means we need to encourage better maps, not dismiss them entirely.
That said, I am being dramatic in my claim. It doesn't help that I don't have an internal map. I'm oddly good with directions, but I do not visualize getting from here to there in anything resembling a map in my mind.
So, to that end, for most maps that someone uses to show me something, a simple time series or scatter plot would have done as well. Often better.
That is to say, selection bias on my part. ;)
For the 3D one specifically, right under the graphic, the article says: "3D has a time and a place. It can be a really useful way to encode thematic data on the z-axis and make something useful. But extruding Hubei compared to the rest of the areas just doesn’t work. It’s gratuitous and adds nothing. It’s really hard to make any sense of relative amounts and that’s before we even deal with foreshortening and occlusion."
P.S. From the HN guidelines (https://news.ycombinator.com/newsguidelines.html):
> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith... Please don't comment on whether someone read an article.
First you said, "I can't count the number of times I've looked at a data visualization and wished I could sit down with the person who made it and read an Edward Tufte book to them."
I was and am 100% on board with this comment. I think the same thing often.
Then you said "There's just so few good examples out there of data visualizations that respect basic principles of visual communication, like the ones outlined in this article."
I agree, the article does a pretty good job.
Then, "They generally seem to aim more for visual impact (like the useless 3D display in the article, which you've gotta admit is striking) than for clarity, which I guess is understandable but is still too bad."
I was uncertain about this statement. You start the previous sentence by stating "There's just so few good examples..." and end with "...like the ones outlined in this article", which made it a little unclear if the ones in the article were good or not, but as I was reading it I was leaning to the good side. Then this sentence started with "They generally seem...", and since the previous sentence ended talking about the "ones outlined in this article", I associated "They" with "the ones in the article". And this sentence that started with "They generally" was negative.
Then I contributed some miscommunication. When I used "you" in the sentence I was thinking in general terms (including myself) and not you personally. I think that might have been better stated as "If one reads the article...".
Anyway, I was initially confused by your statement. Now I see what you were going for.
Edits: grammar, missing words
Like several others, I was also confused by your initial comment. At first I thought you were criticizing the article as an example of bad graphics and useless 3D.
I am no master of communication, but there is one thing that stuck in my mind from a class I took many years ago: If I am talking to someone or writing something they read, and they seem to be misinterpreting or misunderstanding me, who is responsible for that? Is it the reader or listener, or is it me?
The lesson was that I, the person doing the communicating, am responsible, not the person receiving the communication. It's usually not helpful to blame them for misunderstanding. Instead I should realize that I was probably unclear in some way, and do what I can to clear it up.
Of course there are exceptions. Sometimes people are willfully misunderstanding and don't give you a chance to clarify. I remember one friend who delighted in pouncing on me if we were casually brainstorming and I said something that wasn't exactly what I really meant. When I would correct myself they would say "Oh no, you already said XYZ and you can't take it back now!"
But I think those cases are unusual, and I've found it very helpful to avoid blaming the listener and just see how I can be more clear.
This point was clear in your top comment.
> I was praising the article for illustrating good principles of visual communication...
This point was completely unclear in your top comment.
I read your top comment three times, and each time made me feel more certain you were complaining about the site as an example of failing to implement good visualizations (until I read this comment).
Despite this ESRI-backed article on the subject, I think the popular ESRI-driven map dashboard for Coronavirus[1] has a major flaw that violates the crux of this article. Dot density maps _MUST_ be set to scale relative to your map scale, or else you get nightmare scenarios like this one[2]. This is doubly true if the dots are varying in size (which I also think is a fundamentally terrible representation, because people suck at mentally comparing areas). If I were to modify it, I would probably use a choropleth-like representation. Keep the dots equally sized and colour them different shades of red. That way nobody's brain will mislead them into thinking "this larger circle means a larger area is all infected."
[1] https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.h...
Further, two-dimensional area (circles) is a particularly terrible dimension to marry to geography, because it’s an additional (and therefore competing) spatial dimension. Color is better, but still has problems (compare Rhode Island to Texas, or populous New Jersey to unpopulated Wyoming/Alaska). And color forces you to bin, which can be misleading. Choropleths are still harder to read than a bar graph or histogram (compare California to Maine: they don’t share an axis and are irregular shapes, making it hard to compare their areas).
IMO a logarithmic bar graph is the most reasonable choice. If you want to include population density, I’d encode it with opacity and one-dimensional space (a shaded-in bar representing infection, a dark bar representing mortality, and a transparent bar representing total population). If geographic projections are that important to you, you can superimpose those bars on countries. It sucks but it gives you geographic scale. If anyone wants to build this graph, please include an adjacent, different-hued bar encoding the number of tests performed thus far.
And per the terminology in the article, that ESRI map uses proportionally scaled symbols, not dots.
That pattern holds true for nearly every province, making them all rather misleading.
Ah my previous comment was looking for this!
There are a lot of flaws with the visualizations of the infections. But using choropleth representations would need a population reference, no? I'm genuinely curious: should the range reflect the population, with the infections as the series?
It could then be enhanced with deaths per infection in certain regions, which could be further enhanced with distance to hospitals.
I sound like an ass right now, but the data is here and we should use it properly to help. With people like yourself, maybe it would be better than what's being given right now.
Honestly, it's tempting. I should dust off QGIS or Leaflet and try a few examples. But to speak candidly, I'd probably much rather play Lego with my kid tonight.
That's completely fair enough! Priorities!
https://jagjapan.maps.arcgis.com/apps/opsdashboard/index.htm...
I was surprised when you said that. Recently I was doing some georeferencing of historical maps on top of current maps, and I was very disappointed with the choices.
That was my first experience with GIS toolkits. I tried ArcGIS, QGIS and a few lesser-known ones. I was looking for a good UI/UX, partly because the goal was to teach a non-technical acquaintance to do it. There was a shareware toolkit that was exclusively for georeferencing that had a very satisfactory UI, but it had a blocking bug that would cause it to crash.
I'm not sure why this is or even if it is peculiar to GIS, or just more visible compared to the many slow-to-upgrade software fields which don't have a UI or which do more server-side. Also I think some of it is also driven by having one or more big customers who themselves refuse to upgrade.
The effect was particularly visible as an outside team was developing a greenfield iOS app for our data at the same time as our team maintained their old-new Windows app. The iOS team was able to, as they say, "move fast and break things" and gain accolades for whizbang features. It was interesting to watch them accomplish more with less computing power and a more primitive (IMO) development language (their Objective-C to our C#).
Edit: I called it the "old-new" Windows app as there had been an even older app (predating C#) which the C# app replaced. In the circle of life the once-new C# app itself became ossified and stuck with whatever short-sighted design legacy decisions were baked in. There was a lot of technical debt in the codebase.
I should say, partially replaced, as they were never (while I was there) able to convince all of the customers to upgrade from the original app, which is why I say that big customers who can simply refuse to change might be a factor.
I'm not sure how common your use case is, but maybe it does need some simpler option (does it exist in Google Earth maybe?). I think a lot of the improvements are in the form of data authoring and presentation. Even just, "hey here's a public participation map of jogging/biking routes, all measured out and hand-optimized for avoiding traffic lights. Check it out and add your own!" has gone from impossible to pretty easy since I began in the field.
I wonder if someone with the proper credentials could contact the creator of this website [0] with advice. It seems like a good idea and resource, but the map bothered me the very first time I saw it. The color coding is simply wrong and it communicates something that does not align well with reality.
> But looks can be deceptive. The fact that it looks okay is hiding a dark secret that, if you’re not aware of the fact, won’t even get noticed. The map is using totals (absolute values). There are very very few golden rules in cartography but this is one of them: you cannot map totals using a choropleth thematic mapping technique. The reason is simple. Each of our areas on the map is a different size, and has a different number of people in it. These two innate characteristics of all thematic maps means you simply cannot compare like for like across the map.
> The label tells us that Hubei region has over 65,000 cases of coronavirus. It sounds a lot. But does Hubei have 100,000 people, or possibly 100,000,000 people living there?
I definitely agree with the author: that there are very few "golden rules" in visualization, and that not depicting absolute numbers in a choropleth map is one of them. However, the author does an excellent job (with a bar chart and revised map) showing how this anti-pattern severely obfuscates how much the Hubei region is an extreme outlier.
That's even a bit of an understatement, because three infected in a city of millions can easily be an early-stage pandemic, whereas three infected in the middle of nowhere would just be three very unlucky people.
If you start moving to things like per-capita, I actually think that has the potential to be more confusing for more people.
so yes, absolute values will be highly correlated with population, but again it just depends on what you really want to highlight and communicate
Maybe you really do want the absolute value.
But I'm not sure how shading this by absolute totals – in which case, Wyoming and N.J. would be the same shade – would provide significantly more value? Sure, N.J.'s situation wouldn't be effectively invisible in the totals map, compared to the rates map. But now we have to imagine a scenario in which a person-to-person virus managed to sicken so many people (proportionally speaking) in such a large rural state compared to an extremely urban state. It's very hard to imagine a scenario in which we don't want to focus attention on Wyoming. For 10,000 Wyoming people to be infected – and only 10,000 affected in New Jersey – would almost certainly mean that the infection's original epicenter is Wyoming, and that someone from Wyoming had direct contact via travel with a New Jersey resident, (especially if Wyoming and N.J. are outliers in terms of absolute totals by state).
there is no objective right answer, and it's about intent of the communicator. in this case, rate may very well be what the intent needs to be.
The argument isn't that you never want to present absolute values.
The argument is that you never want to present absolute values on a choropleth map, explicitly because it always obfuscates the data in a way that is misleading.
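To make the totals-versus-rates point concrete, here is a minimal sketch using the hypothetical Wyoming / New Jersey scenario from upthread (10,000 cases in each). The population figures are rough approximations I'm assuming for illustration, and `cases_per_100k` is just a helper name I made up.

```python
# Hypothetical scenario from the thread: identical absolute case counts.
# Populations are rough approximations, assumed only for illustration.
regions = {
    "Wyoming": {"cases": 10_000, "population": 580_000},
    "New Jersey": {"cases": 10_000, "population": 8_900_000},
}


def cases_per_100k(cases, population):
    """Normalize a raw count to a rate per 100,000 residents."""
    return cases / population * 100_000


rates = {
    name: cases_per_100k(r["cases"], r["population"])
    for name, r in regions.items()
}

for name, rate in rates.items():
    print(f"{name}: {regions[name]['cases']:,} cases, {rate:,.0f} per 100k")
```

Shaded by totals, the two states would get the same colour. Shaded by rate, Wyoming comes out more than ten times darker, which is the difference the choropleth is supposed to show.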
The data in question should probably be in a table, if the idea is to be able to compare Hubei Province to the others. The bar chart makes the difference very clear, but it also makes it difficult to interpret the data for the other provinces.
What for though? What can you tell from that value, other than the value itself?
You cannot tell whether it's common or rare, you cannot tell the risk of anyone in a certain area to be affected, you will have a hard time showing trends because people will react to the phenomena and avoid a certain high-risk area which will then result in fewer cases in that area.
"How many hosts does the virus have in which to mutate"
"How much will the global economy be affected by the cases in this area"
The majority of meaningful information received from such a chart right now is the presence or absence of the virus. Secondary is the number of cases to indicate the stage of spread (e.g. 1 suggests maybe an outlier, 2-10 suggests early stages of contact spreading, etc).
Communicating information with an inherently exponential growth rate is just an entirely different beast.
For this reason I think the 3D projection graph is actually not as bad as made out. Sure, it's hard to tell anything about any of the other provinces compared to Hubei, but it really highlights the difference between there and other provinces.
To be willing to take on such an economic drain in order to do so makes it seem like they're treating the virus like a potential pandemic. Are the death rates for the current coronavirus outbreak substantially higher than the regular flu? What else am I missing here?
My only quibble is that the shape of the uncertainty shouldn't be a box, it should be oriented around a downward sloping line.
The passengers on the Diamond Princess cruise ship [1] could give us a better estimate of the true death rate, because they all got tested regardless of symptoms. So far 4 out of 700 people have died (about 0.6%). The death rate for people 65+ with the seasonal flu is 0.9% [2]. If the coronavirus is as deadly as the flu, we'd expect about 7 people to die; if it is 20X as deadly as the flu, we'd expect about 140 people to die.
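For anyone who wants to check the arithmetic, a quick sketch with the figures from the comment above. The variable names are mine, and it carries the comment's implicit assumption that the 65+ flu rate is the right comparison for everyone who tested positive.

```python
# Figures as given in the comment; variable names are hypothetical.
infected = 700                   # people on the ship who tested positive
observed_deaths = 4
flu_death_rate_65_plus = 0.009   # 0.9%, seasonal flu, ages 65+

# Observed death rate among those infected so far.
observed_rate = observed_deaths / infected

# Deaths we would expect if the virus were as deadly as the flu,
# and if it were 20 times as deadly.
expected_if_like_flu = infected * flu_death_rate_65_plus
expected_if_20x_flu = expected_if_like_flu * 20

print(f"observed rate: {observed_rate:.2%}")
print(f"expected deaths at flu rate: {expected_if_like_flu:.1f}")
print(f"expected deaths at 20x flu rate: {expected_if_20x_flu:.0f}")
```

With the flu rate rounded up to 1% these come out to the 7 and 140 quoted above; at exactly 0.9% they are a little lower. Either way, the observed count sits near the flu-like end of that range so far.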
[1] https://en.wikipedia.org/wiki/2020_coronavirus_outbreak_on_c...
https://www.sfchronicle.com/bayarea/article/Wuhan-coronaviru...
In the case of China, chaos resulting from the pandemic has the potential to undo Xi Jinping's reign, so his cadre has decided to take the hit on the economy and go full war mode to combat it.
That being said, it does pose a critical danger and greater mortality rate if healthcare infrastructure is overwhelmed. The drastic lockdowns do help control the spread to a degree that mitigates this possibility and allows for the ramping up of response capacity. The US should be responding with comparable force (and probably will be forced to in the coming weeks), but there's a lot going against taking action at the moment, from poor national coordination, Trump administration cuts and malfeasance, bureaucratic impediments around mass testing, and outsourced supply chains.
Because it’s spreading silently and is so impactful to its victims it’s a really big deal.
https://news.ycombinator.com/newsguidelines.html
https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
Taiwan's exclusion from the WHO and other geo-political bullying notwithstanding, the situation regarding this outbreak is dramatically different here than it is in China.
The impacts of travel restrictions on people who have recently visited Taiwan are also dramatically different than those who have recently visited China.
I don't know how to reconcile your goals with another person's rejection of any kind of normalization of Chinese threatening of Taiwanese sovereignty. Perhaps the only option is for those unwilling to let articles get away with glossing over Taiwanese sovereignty unchallenged to get banned / downvoted to oblivion. I think that's acceptable, though sad, because I like it here.
> These political divisions are arbitrary anyway.
Not sure how interstate activity and travel is regulated/limited in China (in normal times), but in the U.S., state borders are not just some imaginary political construct. Laws and services – and therefore, impact to respective populations – can drastically differ by state lines, and ignoring that is a huge mistake.
You should be concerned about the fraction of land area on which you are at high risk. If 100 people have it, and each person creates a high risk across A area (and the areas don't overlap), that is 100*A / country-area. Which is proportional to cases/area (presuming A is constant) the first statistic he used.
EDIT: if you know you are going to interact with N people, the cases per population figure is relevant again.
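A rough sketch of the two statistics being contrasted here, under the comment's own simplifications (each case puts a fixed, non-overlapping area A at high risk; contacts are drawn at random from the population). The function names and the example numbers are made up for illustration.

```python
def high_risk_area_fraction(cases, risk_area_per_case, country_area):
    """Fraction of land that is high-risk: cases * A / country-area.

    Assumes every case covers the same area A and the areas never overlap.
    """
    return cases * risk_area_per_case / country_area


def expected_infected_contacts(cases, population, contacts):
    """Expected number of infected people among N randomly chosen contacts."""
    return contacts * cases / population


# Hypothetical numbers: 100 cases, A = 1 km^2 each, in a 10,000 km^2 country
# with one million residents.
area_fraction = high_risk_area_fraction(100, 1.0, 10_000.0)
contact_risk = expected_infected_contacts(100, 1_000_000, contacts=50)

print(f"high-risk share of land: {area_fraction:.2%}")
print(f"expected infected contacts out of 50: {contact_risk:.3f}")
```

The first number scales with cases per area, the second with cases per population, so which one matters depends on whether you are worried about where you walk or whom you meet.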
Is that the case anywhere? Even city centre / suburbs will have different values, much less a whole province where A may contain an empty field or a group of residential buildings with 50 floors and shared lifts.
I could see how it would be useful for a map of a city, or maybe even at a scale of some regions... but not for comparing totals between regions.
I am gonna update the numbers for today.
https://coronaprogress.com/?color=FEFE62
https://coronaprogress.com/?color=D35BF7
I have often considered dividing by some kind of combination of area and population, but even that seems not quite right. Disregarding "victimless crimes," much crime is interactive: two or more parties must be involved, therefore the population ought to have some kind of exponent attached to it, like particles bouncing against one another in a container.
I never did puzzle this out, I am sure brighter minds than I would have come to some conclusions.
Population squared? That gives you the number of potential connections.
In the rather clumsy taxonomy of crime I created from the UCR, most violent crime -- excepting suicide -- would be collision-based. Some drug crimes like possession would not be collision-based (although it could be argued that possession involves buying which involves another person) while drug sales would be. Crimes against property are interesting -- is that another person by proxy, or should that merely be collision-less?
Where connections could eg mean: two people come in close enough proximity with each other to spread a virus.
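To put a number on the "population squared" idea: the count of distinct pairs among n people is n(n-1)/2, which grows roughly with n squared. A tiny sketch (the function name is mine):

```python
def potential_connections(n):
    """Number of distinct pairs among n people: n * (n - 1) / 2."""
    return n * (n - 1) // 2


small_town = potential_connections(1_000)
double_town = potential_connections(2_000)

print(small_town, double_town, double_town / small_town)
```

Doubling the population roughly quadruples the number of possible pairings, which is why dividing interaction-driven counts by plain population may under-correct.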
If for you the fact that a country issues internationally valid passports, prints its own money, and has a government and an army is not enough to make it a real state, then you're living in a province of China too.
For instance, I say "I saw a fox outside my house yesterday". I don't need to specify the exact species of fox, and anybody who knows that I live in Europe will know that I mean Vulpes vulpes.
There's nothing to really say the same situation happening in China, Iran, SK, Italy won't happen here.
Have a supply of food ready, minimize being in crowds, don't touch your face when you're not inside the house.
Other stuff I've been doing that isn't necessarily the right thing:
- Eating meat well done for a while
- Not eating raw veggies
- Working from home more often
- Telling sick co-workers to stay home (I'm in a tech company, there's really no excuse)
- Wash your hands.
- Cover your cough.
- Stay home.
(Last: if you're sick, if the outbreak is local, if you don't absolutely need to be somewhere.)
Ready: Pandemic preparations: Community mitigation guidelines to prevent pandemic influenza https://www.ready.gov/pandemic
Before a Pandemic
- Store a two week supply of water and food.
- Periodically check your regular prescription drugs to ensure a continuous supply in your home.
- Have any nonprescription drugs and other health supplies on hand, including pain relievers, stomach remedies, cough and cold medicines, anti-diarrhoeal medication, fluids with electrolytes, and vitamins.
- Get copies and maintain electronic versions of health records from doctors, hospitals, pharmacies and other sources and store them, for personal reference.
- Talk with family members, loved ones, neighbours, co-workers, and other frequent contacts, about how they would be cared for if they got sick, or what will be needed to care for them in your home.
During a Pandemic
Limit the Spread of Germs and Prevent Infection:
- Avoid close contact with people who are sick.
- When you are sick, keep your distance from others to protect them from getting sick too.
- Cover your mouth and nose with a tissue when coughing or sneezing. It may prevent those around you from getting sick.
- Wash your hands frequently to help protect you from germs.
- Avoid touching your eyes, nose or mouth.
- Practice other good health habits. Get plenty of sleep, be physically active, manage your stress, drink plenty of fluids, and eat nutritious food.
Adapted from: <https://www.ready.gov/pandemic>
(Most of the preparatory advice will be familiar to Bay Area residents as typical earthquake preparedness. Elsewhere it's standard preparation for major winter storms or hurricanes. Be prepared to sit tight for a few weeks.)
US CDC medical travel advisories: https://wwwnc.cdc.gov/travel/notices
United States, 2017. Draws on ~200 journal articles written 1990 - 2016. Provides a framework on response strategy to COVID-19. https://www.cdc.gov/mmwr/volumes/66/rr/rr6601a1.htm
Edit: This isn't really a fair assessment. See the comments below.
It took just 1 person to infect 600 people on a 2,700 passenger cruise ship, with many of those infections happening even after quarantine and medical staff were introduced.
That means that 1.9 million NYC people can be infected and 38,000 people can be killed from just one person.
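The NYC figures look like a straight scale-up of the ship's infection share. A sketch of that arithmetic follows; the NYC population and the 2% fatality rate are not stated in the comment, so both are my assumptions (the 2% is back-derived from 38,000 out of 1.9 million).

```python
# Figures from the comment.
ship_infected = 600
ship_passengers = 2_700

# Assumptions, not stated in the comment.
nyc_population = 8_400_000   # rough figure for New York City
fatality_rate = 0.02         # inferred from 38,000 / 1.9 million

# Share of the ship that was infected, applied unchanged to the city.
attack_rate = ship_infected / ship_passengers
nyc_infected = attack_rate * nyc_population
nyc_deaths = nyc_infected * fatality_rate

print(f"attack rate on the ship: {attack_rate:.1%}")
print(f"scaled to NYC: {nyc_infected:,.0f} infected, {nyc_deaths:,.0f} deaths")
```

This lands close to the 1.9 million and 38,000 quoted above. It only holds if a city mixes the way a cruise ship does, which is a strong assumption.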
Which is quite amazing, given that the virus was spreading there for weeks (at least) without anyone being aware of anything before all the mess was uncovered and announced.
If growth was still exponential under these conditions, we would be very very screwed.
I don't think they have done any random sampling in the testing. If you only have mild symptoms (could be it or something else) it's best to stay home rather than risk catching it at the hospital.
Edit: Downvote all you like guys - but Taiwan is an independent nation. :)
What's odd though is the author even plotted data for Taiwan, so they must have seen what they were doing..
https://twitter.com/ARTICLE19Iran/status/1231895623789576192...
Web Mercator Auxiliary Sphere is a good default for a software application to use. It is global, meaning that regardless of what data you dump onto it, it will show up on the map. It is conformal, which means if you zoom in, shapes will be preserved. If you zoom in on a town square that is actually square, it will be square on the map, too. North is up in all locations.
That being said, it's only a good default because if your users aren't knowledgeable enough to select the right projection, web Mercator aux sphere is the least bad, lowest common denominator option. When you as a user choose what projection to use to visualize your data, it's usually wrong to select web Mercator aux sphere. But if you were never going to make the effort to select the right projection anyway, it's not a completely terrible default.
Note that web Mercator is different from web Mercator aux sphere. Web Mercator is not conformal, which makes it pretty useless. Many people use the terms web Mercator and web Mercator aux sphere interchangeably, which they shouldn't.
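For reference, a minimal sketch of the textbook spherical Mercator forward formulas that this family of projections is built on. It is not meant to capture the difference between the two variants discussed above; the function names are mine, and using the WGS84 semi-major axis as the sphere radius is an assumption of the sketch.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 semi-major axis, used here as the sphere radius


def spherical_mercator(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Project longitude/latitude in degrees to x/y in metres on a sphere."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


def scale_factor(lat_deg):
    """Local linear scale of the spherical Mercator at a given latitude.

    It is the same in every direction (that is what conformal means), but it
    grows as 1 / cos(latitude), so areas far from the equator are inflated.
    """
    return 1 / math.cos(math.radians(lat_deg))


x_edge, y_equator = spherical_mercator(180.0, 0.0)
print(f"x at 180 degrees east: {x_edge:,.2f} m")
print(f"linear scale at 60 degrees north: {scale_factor(60.0):.2f}x")
```

The equal-in-all-directions scale is why a square town square stays square when you zoom in, and the 1 / cos(latitude) growth is why high-latitude regions look so much bigger than they are.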
https://m.facebook.com/457568574373257/posts/this-is-a-popul...
Because Mercator.
Likewise, in Soviet Russia, map owns you. How much of the cold war might have been put back to bed by a better map projection?
Hypothetically: If all infected submitted their map data for the last few days (anonymously - no need to identify people) and all of that data was plotted over maps, you could identify the routes and direction of infections.
I don't know if it would be anything more than an interesting visualisation of the data already collected, but the comment mentioning Edward Tufte really got me thinking how to visualise the data we have properly.
We haven't seen something spreading like this in my lifetime, and at the same time we've never had so much data on ourselves in my lifetime either. Might be a good time to put it to good use for once.
https://www.cdc.gov/eis/field-epi-manual/chapters/Describing...
There's an H1N1 model here [1] that one could use as a starting point, I imagine?
This really seems like a case of "it's not a bug, it's a feature". It may be rare (so far, anyway), but few would argue "danger and death" is an inaccurate characterization.
I really like the article though.
Maybe thick 2d lines would work better?
Jack would happily watch people die if it sold more licenses.
No, no, no. 2.2% (conservatively) is not extremely rare for a virus that we have been helpless to stop from spreading to every continent except Antarctica.
Some of the points in this discussion are pretty good, but the thing I missed is a good commentary on the temporal nature of anything like virus spread.
Google, Responsibly Facebook, Responsibly AirBNB, Responsibly
Prove me wrong.