There ain't no rules around here! We're trying to accomplish something!

Archive for February, 2009

Was Darwin overrated?

In Biology on February 15, 2009 at 2:06 am

Watch John Horgan and Carl Zimmer, preeminent science journalists, gab about Darwin (his birthday was Thursday). Darwin as Hollywood star, horizontal evolution, rethinking the tree of life, group selection — it’s all good.

As for whether Darwin is overrated, there are a couple ways of looking at it. You can focus on his contemporaries and predecessors, and note that he wasn’t alone in thinking about evolution: there was traveling naturalist Alfred Russel Wallace, whose observations about the geographic distribution of species led him to theorize about the divergence of species, and whose correspondence heavily influenced Darwin’s Origin of Species. The idea that species change over time was advanced by Robert Grant, who saw a progression in fossil animals; Robert Chambers, author of the popular-science bestseller Vestiges of the Natural History of Creation; and Charles’ grandfather Erasmus Darwin, who hypothesized that all life had a common origin. You could argue that Darwin worked in a ferment of ideas about life’s origins and variation, that he was not alone. It’s a useful perspective. Scientists are rarely lone geniuses, even if they are geniuses — they collaborate, borrow, and bicker like everybody else.

Does that make Darwin overrated? I don’t think so. He did, after all, put forward the theory of natural selection as we now know it. Vestiges was a vast, mystical treatment of the origins of the universe (complete with some racial theorizing unsavory to contemporary eyes); Origin was a cautious work, anticipating every counterargument, bolstered with pages of evidence about pigeon breeding. Darwin made evolution a subject of scientific study.

He’s also a profoundly appealing figure. Unlike, say, Newton (combative, paranoid, devoted to his alchemy), Darwin the man was genuinely likeable. He had something of the attitude of the humble, persistent noodler. He measured armadillo fossils in South America, and thought it was odd that they resembled (but were not identical to) the armadillos his expedition was roasting for dinner over the campfire. He was a rigorous observer, but he also had a useful aimless curiosity. He was an abolitionist and a loving husband and father. We can sympathize with the loss of his daughter and his doubts about the theory of evolution. If we want to put a human face on science, we could do worse than Darwin.

Facebook, evolution, and mathematical modeling

In Biology, Delights, Math on February 12, 2009 at 6:09 pm

Slate has a neat article about Facebook’s new “25 things about me” craze. (For those who have remained blissfully ignorant: thousands of users wrote notes about random personal habits or goals, and tagged their friends in an expanding web of navel-gazing.) Turns out it can be modeled like an epidemic. A user is “contagious” for about one day — the day he tags a bunch of his friends in the note. After being tagged, most users respond within one day. Then response frequency drops off exponentially.

Here’s a nice Nature Review about the mathematics of modeling infectious disease.

Biological infectiousness of influenza, HIV, and malaria.

The number of individuals that an infected person infects is given by a probability distribution. The probability that an infected person will infect another person within a small interval of time is

b(t) s dt

Here b(t) is the infectiousness at time t since infection, dt is an arbitrarily small amount of time, and s is the probability that the other person is susceptible (not yet infected).
If a group of individuals all have the same infectiousness, then the number of secondary infections caused by each infectious individual is a random number drawn from a Poisson distribution with mean R, where R is the expected number of new infections (the reproduction number).
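
To make this concrete, here’s a toy branching-process simulation in the spirit of the pieces above — my own sketch, not the model from the Slate or Nature articles. Each poster “infects” a Poisson-distributed number of friends, and each tagged friend responds after an exponentially distributed delay, which reproduces the roughly one-day response lag and the exponential drop-off described above. The parameter values are invented for illustration.

import numpy as np

def simulate_note_cascade(r_mean=2.0, mean_delay_days=1.0,
                          generations=6, seed=0):
    # Toy branching-process model of the "25 things" craze.
    # Assumptions (mine): each poster tags a Poisson(r_mean) number of
    # friends who eventually respond, and each response arrives after an
    # exponentially distributed delay with mean mean_delay_days.
    # Returns posting times in days since patient zero.
    rng = np.random.default_rng(seed)
    times = [0.0]              # patient zero posts at day 0
    current = [0.0]
    for _ in range(generations):
        nxt = []
        for t in current:
            n_children = rng.poisson(r_mean)   # secondary "infections"
            delays = rng.exponential(mean_delay_days, size=n_children)
            nxt.extend(t + d for d in delays)
        times.extend(nxt)
        current = nxt
    return np.sort(np.array(times))

if __name__ == "__main__":
    times = simulate_note_cascade()
    days, counts = np.unique(times.astype(int), return_counts=True)
    for day, count in zip(days[:10], counts[:10]):
        print(f"day {day}: {count} new notes")

With r_mean below 1 the cascade fizzles out after a few generations; above 1 it keeps growing — the same threshold behavior that governs real outbreaks.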

The interesting thing here is that the whole field of mathematical modeling of disease transmission isn’t going to be just a biological subject forever. It’s also going to be a behavioral subject. The idea that cultural ideas propagate and evolve like organisms isn’t new — it’s as old as Dawkins and his notion of “memes.” But back in the seventies he couldn’t have predicted just how concrete the similarities would be — that we could see the same differential equations governing Facebook crazes as malaria outbreaks. Watch as epidemiologists get drafted as marketing consultants in the next few years.

Reverse-engineering the brain

In Neuroscience, Technology on February 11, 2009 at 10:52 pm

Via Wired:
“The plan is to engineer the mind by reverse-engineering the brain,” says Dharmendra Modha, manager of the cognitive computing project at IBM Almaden Research Center.

In what could be one of the most ambitious computing projects ever, neuroscientists, computer engineers and psychologists are coming together in a bid to create an entirely new computing architecture that can simulate the brain’s abilities for perception, interaction and cognition. All that, while being small enough to fit into a lunch box and consuming extremely small amounts of power.

The 39-year old Modha, a Mumbai, India-born computer science engineer, has helped assemble a coalition of the country’s best researchers in a collaborative project that includes five universities, including Stanford, Cornell and Columbia, in addition to IBM.

The researchers’ goal is first to simulate a human brain on a supercomputer. Then they plan to use new nano-materials to create logic gates and transistor-based equivalents of neurons and synapses, in order to build a hardware-based, brain-like system. It’s the first attempt of its kind.

Related is the project at Harvard and UCLA to map the “connectome” — the actual neural circuitry of the central and peripheral nervous system. That’s 100 billion neurons and on the order of 100 trillion synaptic connections; it’s equivalent in scale and scope to the Human Genome Project. The key enabling technique is Brainbow, a genetic labeling method that makes individual neurons express different combinations of fluorescent proteins; for the first time, tangled neurons can be visually distinguished.

The Connectome project is a kind of science that often gets short shrift these days: “inductive” reasoning, collecting a vast library of observations first, in the hopes that they will suggest theories and future questions. It’s very much in the tradition of Victorian naturalists (Darwin was an inductivist). The idea is that the brain is so complicated and so little understood that we don’t yet know what to theorize about or where to look; the humbler and ultimately more fruitful approach is to look around. This is a high-tech version of going back to biology’s beetle-pinning roots.

Stimulus 2.0

In Policy on February 11, 2009 at 12:23 pm

The Senate passed the stimulus bill yesterday, 61 to 37. After hashing out the differences between the House and Senate versions, Congress is expected to send a final version to the President by the end of the week.

Via ProPublica, here’s a table of how the changes stack up. The largest cut in the Senate version is to the state fiscal stabilization fund, which would be used for education and for cash-strapped states’ budgets. As for science and technology: as a whole, it actually comes out ahead in the Senate version, but some areas lose.

Fossil energy research gains $4.6 billion.
Renewable energy research gains $2.5 billion.
NSF loses $1.8 billion.
NIH loses $1 billion.
DOE research loses $2 billion.
University research facilities lose $1.5 billion.
Biomedical research gains $9 billion.
NASA gains $700 million.
NOAA gains $400 million.
CDC loses $1.1 billion.

It’s not great, particularly if you’re in basic research, or work or study at a university. Education also receives significant cuts.

My thought on the stimulus, which I share with Alice Rivlin (Bill Clinton’s budget director), is that it’s full of long-term projects that don’t belong in a stimulus: “Such a long-term investment program should not be put together hastily and lumped in with the anti-recession package.” (Econ geek note: you don’t have to be of the Fama/Cochrane school — “Government spending by definition crowds out all investment, and fiscal stimulus is a logical impossibility” — to believe that this particular recovery package is flawed. I actually think Krugman’s argument makes a lot of sense.)

Most science spending (though not all; consider equipment replacement) is long-term investment, and so will come too late to stimulate. So from a macro perspective, it’s probably one of those items that shouldn’t be in the bill. But, like all other groups with an interest in federal funding, scientists can fairly worry that if we don’t get a boost now, we never will. I’m hoping, with my usual goofy optimism, that Congress will demonstrate that it takes scientific research seriously as a public good.

For the true wonks: all the changes between the House and Senate version, via TPM.

Australian bush fires and data portability

In Policy, Technology on February 10, 2009 at 6:25 pm

Google Australia has created a Flash map of the fires currently devastating southeast Australia, with fire locations and status updates. Green areas are safe; red means fires are still in progress. These are the worst fires in Australia’s history, and what’s particularly scary is that they may have been set deliberately. More than 100 people are reported to have lost their lives.

Some food for thought: Googler Paula Fox was able to build the Flash map because Victoria’s Country Fire Authority (CFA) publishes its alerts in the open standard RSS. (RSS is a standardized data format for frequently updated information, designed to be read by many kinds of programs.) But to be useful for visualization, fire data needs geographic coordinates; extensions such as GeoRSS exist for exactly this, but the CFA feed didn’t include them.

From technologist Elias Bizannes:

1) If you output data, output it in some standard structured format (like RSS, KML, etc.).
2) If you want that data to be useful for visualisation, include both time and geographic (latitude/longitude) information. Otherwise you’re hindering the public’s ability to use it.
3) Let the public use your data. The Google team spent some time ensuring they were not violating anything by using this data. Websites should be clearer about their rights of usage to enable mashers to work without fear.
4) Extend the standards. It would have helped a lot if the CFA site extended their RSS with some custom elements (in their own namespace) for the structured data about the fires: for example, a custom element whose value is “Get the hell out of here.”
5) Having all the fire departments using the same standards would have made a world of difference: build the mashup using one method and it can be immediately useful for future uses.
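
To make points 1, 2, and 4 concrete, here’s a rough sketch of what a GeoRSS item with custom fire elements could look like, built with Python’s standard library. The “fire” namespace, its element names, and the sample values are all invented for illustration — this is not an actual CFA schema.

import xml.etree.ElementTree as ET

# georss is the real GeoRSS namespace; "fire" is a made-up namespace
# standing in for the kind of custom elements suggested in point 4.
GEORSS_NS = "http://www.georss.org/georss"
FIRE_NS = "http://example.org/fire-alerts"    # hypothetical
ET.register_namespace("georss", GEORSS_NS)
ET.register_namespace("fire", FIRE_NS)

def fire_item(title, lat, lon, status, advice):
    # Build one RSS <item> carrying a location plus structured fire data.
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, f"{{{GEORSS_NS}}}point").text = f"{lat} {lon}"  # "lat lon"
    ET.SubElement(item, f"{{{FIRE_NS}}}status").text = status
    ET.SubElement(item, f"{{{FIRE_NS}}}advice").text = advice
    return item

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
# Illustrative values only, not real CFA data.
channel.append(fire_item("Example fire alert", -37.98, 145.72,
                         "Going", "Evacuate now"))
print(ET.tostring(rss, encoding="unicode"))

With location and status in structured elements, a mapping mashup can read the feed directly instead of scraping prose out of item descriptions.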

Natural disaster response needs data, and good data sharing protocols. US agencies aren’t always so good at that. During Katrina, it was the volunteer database Katrinalist that helped people find survivor information. But FEMA’s models were not made available in a way that would allow first responders to act quickly. We need to work on that.

Diffusion geometry: data takes shape

In Math on February 10, 2009 at 4:34 am

Graph of a set of 1,000 documents, arranged by similarity.


How can you organize and extract information from huge data sets of digital text documents? How do you develop automatic recommendations based on preference history? Could Google ever personalize your search results to reflect your previous interests? Essentially, this is an applied math problem, and it uses a relatively new technique called diffusion geometry. Here’s an article by Ronald Coifman and Mauro Maggioni, two leaders in the field.

Basically, we want to understand the geometry of the data. We would like data points to lie on a manifold — a smooth surface in high-dimensional space, representing some kind of relatively simple rule (just as a sphere represents the rule “all points are the same distance from the origin.”) Then, we want to guess functions on the data from a few samples, with the goal of predicting values of the functions at new points.

The first step is to define a “similarity” function on the data. For example, if you have a thousand articles, a data point might be a vector consisting of the frequency of each word in the article (note that these vectors are huge!) and the similarity might be the correlation between word vectors when larger than 0.95, and zero otherwise.
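
Here’s a minimal sketch of that first step in Python (numpy’s corrcoef does the correlation work); it assumes the articles have already been reduced to a word-count matrix, and the 0.95 cutoff is the one from the example:

import numpy as np

def similarity_matrix(word_counts, threshold=0.95):
    # word_counts: (n_docs, n_words) array of word frequencies per article.
    # Returns an (n_docs, n_docs) matrix of pairwise correlations between
    # word vectors, zeroed out wherever the correlation is below threshold.
    corr = np.corrcoef(word_counts)
    sim = np.where(corr > threshold, corr, 0.0)
    np.fill_diagonal(sim, 1.0)   # every document is fully similar to itself
    return sim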

Then, if we normalize the similarity, it’s a Markov chain; it describes a random walk, the probability of “jumping” from one document to a similar one. We can imagine a drunken ant crawling at random from one node to another, hitting similar, nearby nodes most often, but occasionally venturing to farther ones.

The eigenvectors of this Markov chain can be used as new coordinates for the data set; we can make a map of the data in space so that the distances between points approximate the “diffusion distances” on the original data, which measure how easily the random walk gets from one node to another. This gives the desired representation of the data on a low-dimensional surface.
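
Here’s a bare-bones sketch of that step, continuing from the similarity matrix above. A real diffusion-maps implementation would be more careful about sparsity and numerical issues; this just shows the normalization and eigendecomposition.

import numpy as np

def diffusion_coordinates(sim, n_coords=2, t=1):
    # sim: symmetric (n, n) similarity matrix with non-negative entries.
    # Returns an (n, n_coords) array of diffusion-map coordinates; the
    # Euclidean distance between rows approximates the diffusion distance
    # after t steps of the random walk P = D^(-1) sim.
    d = sim.sum(axis=1)                      # node degrees (row sums)

    # Work with the symmetric conjugate S = D^(-1/2) sim D^(-1/2), which has
    # the same eigenvalues as the Markov matrix P and stable eigenvectors.
    d_half = np.sqrt(d)
    S = sim / np.outer(d_half, d_half)
    eigvals, eigvecs = np.linalg.eigh(S)

    # Sort by eigenvalue, largest first, and recover right eigenvectors of P.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / d_half[:, None]

    # Drop the trivial constant eigenvector; weight the rest by eigenvalue^t.
    return (eigvals[1:n_coords + 1] ** t) * psi[:, 1:n_coords + 1]

(The eigenvector with eigenvalue 1 is constant across all nodes, which is why it gets dropped: it carries no geometric information.)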

What this means is that the data can create its own rules and categories. Instead of labeling articles “business” or “sports,” we’ll see a cluster in the data that forms its own label. It’s a kind of spontaneous organization.

Here’s a nifty tutorial from Duke; for more mathematical detail, check out this article.

Better labs

In Biology, Delights on February 10, 2009 at 2:43 am

DIYBio, “an organization that aims to help make biology a worthwhile pursuit for citizen scientists, amateur biologists, and DIY biological engineers,” has a lot of wonderful, decentralized ideas for making biology work better. One project is the SmartLab, a benchtop that can

1. identify tools (microscopes, pipettes, gel electrophoresis boxes, etc.) by barcodes or RFID tags, and display contextual information for them: how much you’ve pipetted, what you just put in the tube, and so on.

2. guide you through a protocol.

3. keep a virtual lab notebook of everything you do. Video, audio, measurements. Your electrophoresis box is “smart” and records data in real time. (A rough sketch of this logging idea follows the list.)
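
As a purely illustrative sketch of that third idea — the tag IDs, tool names, and log format are all made up; DIYBio hasn’t specified any of this — the virtual notebook is really just an append-only, timestamped event log keyed by whatever tool the bench has identified:

import json
import time

# Hypothetical registry mapping RFID/barcode tags to bench tools.
TOOL_REGISTRY = {
    "rfid:0x4F21": {"tool": "P200 pipette"},
    "rfid:0x4F22": {"tool": "gel electrophoresis box"},
}

class LabNotebook:
    # Append-only event log: every scanned tool and action becomes a
    # timestamped entry, so a protocol can be replayed or audited later.
    def __init__(self, path="notebook.jsonl"):
        self.path = path

    def record(self, tag, action, **details):
        entry = {
            "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
            "tool": TOOL_REGISTRY.get(tag, {}).get("tool", "unknown"),
            "action": action,
            **details,
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")
        return entry

notebook = LabNotebook()
notebook.record("rfid:0x4F21", "pipetted", volume_ul=20, into="tube A3")
notebook.record("rfid:0x4F22", "run started", volts=120, minutes=45)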

It’s a radical idea: human error isn’t inevitable. “Forgetting” a step in your protocol isn’t inevitable. You can work around your own fallibility.

Incidentally, this reminds me of something from my high school days as a Kid Intern in a genetics lab. The lowliest job was racking pipette tips by hand, and it was a running joke that someday someone would invent a machine that could rack tips automatically and make a fortune. Well, I found it. It was invented by a couple of guys in a kibbutz, the year I graduated.

Something lovely

In Delights, Neuroscience on February 10, 2009 at 1:55 am

This is a Ramon y Cajal drawing of a human motor cortex pyramidal cell.

Santiago Ramon y Cajal pioneered the study of the fine structure of the nervous system in the 1890s. His meticulous and lucid drawings are, by any standard, both science and art. The full text of one of his books is here.

From the Science Tattoo Emporium.

Stimulus for scientists

In Policy on February 9, 2009 at 9:20 pm

Since President Obama has pledged to “restore science to its rightful place,” researchers have been hopeful that the new economic stimulus package will include a boost to science. Fortunately, the bill includes funds for basic research: $10 billion for the National Institutes of Health, $40 billion for the Department of Energy, and more than $1 billion each for NASA, the National Science Foundation, and the National Oceanic and Atmospheric Administration. This is significantly down from the more generous House package, though: the NSF funding, for instance, has been cut from $3 billion to $1 billion.

It’s important to note that this is a package of more than $800 billion. The cost of funding science is pretty minor in comparison. But research — even and especially basic research — drives future productivity. Cosmic Variance makes the point that we’re not going to get the much-vaunted revolution in green energy without some physicists (like those at Princeton’s own Plasma Physics Lab). Basic research is an investment in the future, in the jobs that don’t exist yet.

Now there’s a fair argument that a stimulus package is the wrong place to put science spending, because of its emphasis on speed. The NSF has to allocate all its funds in three months. That’s a really tight schedule, and it almost guarantees a slapdash approach to funding. Maybe a decade-long commitment to more science funding would be better than a windfall in the stimulus package. On the other hand, politics is an imperfect endeavor, and this may be science’s best bet.

Back during the campaign, University of Chicago economist and Nobel Laureate Jim Heckman articulated his hopes that an Obama administration would focus on a “future-oriented society”:

The real question apart from the current turmoil is the longer run. Denying the value of investment in knowledge; in infrastructure; in basic science and education at all levels has been and will continue to be harmful to our long run health. In my mind Obama’s eyes are fixed more on things that will improve the US economy in the next century.

(Whole thing here.)

Let’s hope that the bill that’s likely to pass on Tuesday — and science funding over the next four years — lives up to that standard.