Monthly Archives: April 2009

Notes from Stephen Wolfram webcast

These are my raw notes from the session with Stephen Wolfram on the pre-launch of the Wolfram Alpha service at the Berkman Center. Unfortunately, I was on a really bad Internet connection and only got the sound, and I missed the first 20 minutes or so while running around trying to find something better.

Notes from Stephen Wolfram on Alpha debut

…discussion of queries:
– nutrition in a slice of cheddar
– height of Mount Everest divided by length of Golden Gate bridge
– what’s the next item in this sequence
– type in a random number, see what it knows about it
– "next total solar eclipse"

What is the technology?
– computes things, it is harder to find answers on the web the more specifically you ask
– instead, we try to compute using all kinds of formulas and models created from science and package it so that we can walk up to a web site and have it provide the answer

– four pieces of technology:
— data curation, trillions of pieces of curated data, free/licensed, feeds, verify and clean this (curate), built industrial data curation line, much of it requires human domain expertise, but you need curated data
— algorithms: methods and models, expressed in Mathematica, there is a finite number of methods and models, but it is a large number…. now 5-6 million lines of math code
— linguistic analysis to understand input, no manual or documentation, have to interpret natural language. This is a little bit different from trad NL processing. working with more limited set of symbols and words. Many new methods, has turned out that ambiguity is not such a big problem once we have mapped it onto a symbolic representation
— ability to automate presentation of things. What do you show people so they can cognitively grasp what you are, requires computational esthetics, domain knowledge.
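
To make the four pieces a bit more concrete, here is a minimal sketch of how they could fit together for one of the example queries. It is my own toy illustration, not how Wolfram Alpha is built: the function names (parse, compute, present, answer) are made up, the "curated data" is two approximate lengths in metres that I hard-coded myself, and the "linguistic analysis" only understands the phrase "divided by".

```python
import re

# Tiny stand-in for "curated data". Values are approximate, in metres,
# hard-coded for illustration and not taken from the talk.
CURATED_LENGTHS_M = {
    "height of mount everest": 8848.0,
    "length of golden gate bridge": 2737.0,
}


def parse(query):
    """Linguistic step: map free text onto a symbolic form ("divide", a, b)."""
    normalized = re.sub(r"\s+", " ", query.strip().lower())
    left, sep, right = normalized.partition(" divided by ")
    if not sep:
        return None  # this toy parser only knows one construction
    return ("divide", left, right)


def compute(expr):
    """Algorithm step: evaluate the symbolic form against the curated data."""
    _, left, right = expr
    if left not in CURATED_LENGTHS_M or right not in CURATED_LENGTHS_M:
        return None
    return CURATED_LENGTHS_M[left] / CURATED_LENGTHS_M[right]


def present(query, value):
    """Presentation step: decide what to show the person asking."""
    if value is None:
        return f"No answer for: {query}"
    return f"{query} = {value:.2f}"


def answer(query):
    """Run the whole pipeline: parse, compute, present."""
    expr = parse(query)
    value = compute(expr) if expr else None
    return present(query, value)


print(answer("height of Mount Everest divided by length of Golden Gate bridge"))
```

The point of the sketch is only the separation of concerns: once the text has been mapped onto a symbolic form, the computation no longer cares how the question was phrased.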

Will run on 10k CPUs, using Grid Mathematica.
90% of the shelves in a typical reference library we have a decent start on
provide something authoritative and then give references to something upstream that is
know about ranges of values for things, can deal with that
try to give footnotes as best we can

Q: how do you deal with keeping data current
– many people have data and want to make it available
– mechanism to contribute data and mechanism for us to audit it

first instance is for humans to interact with it
there will be a variety of APIs
intention to have a personalizable version of Alpha
metadata standards: when we open up our data repository mechanism, wn we use that can make data available

Questions from audience:

Differences of opinion in science?
– we try to give a footnote
– Most people are not exposed to science and engineering, you can do this without being a scientist

How much will you charge for this?
– website will be free
– corporate sponsors will be there as well, in sidebars
– we will know what kind of questions people ask, how can we ingest vendor information and make it available, need a wall of auditing
– professional version, subscription service

Can you combine databases, for instance to compute total mass of people in England?
– probably not automatically…
– can derive it
– "mass of people in England"
– we are working on the splat page, what happens when it doesn’t know, tries to break the query down into manageable parts
300th largest country in Europe? – answers "no known countries"
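
The two answers above ("can derive it" and "no known countries") can be sketched in a few lines. This is my own illustration under stated assumptions, not Wolfram Alpha's method: total_mass_kg and nth_largest are hypothetical helpers, the population and average mass in the usage lines are round placeholder figures rather than real statistics, and the "countries" are dummy names with dummy sizes.

```python
def total_mass_kg(population, mean_mass_kg):
    """Derive a quantity nobody tabulates from two quantities that are tabulated."""
    return population * mean_mass_kg


def nth_largest(sizes, n):
    """Return the n-th largest name by size, or a fallback when n is out of range."""
    ranked = sorted(sizes, key=sizes.get, reverse=True)
    if n < 1 or n > len(ranked):
        return "no known countries"
    return ranked[n - 1]


# Placeholder inputs, chosen to be round and NOT real data.
print(total_mass_kg(population=50_000_000, mean_mass_kg=70))

# Dummy "countries" with dummy sizes, only to exercise the fallback.
dummy_sizes = {"A": 300, "B": 100, "C": 200}
print(nth_largest(dummy_sizes, 2))
print(nth_largest(dummy_sizes, 300))
```

The fallback branch is the interesting part: asking for a rank beyond the end of the list is a perfectly well-formed question that simply has no answer in the data.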

Data sources? Population of Internet users. how do you choose?
– identifying good sources is a key problem
– we try to do it well, use experts, compare
– US government typically does a really good job
– we provide source information
– have personally been on the phone with many experts, is the data knowable?
– "based on available mortality data" or something

Technology focus in the future, aside from data curation?
– all of them need to be pushed forward
– more, better, faster of what we have, deeper into the data
– being able to deal with longer and more complicated linguistics
– being able to take pseudocode
– being able to take raw data or image input
– it takes me 5-10 years to understand what the next step is in a project…

How do you see this in contrast with semantic web?
– if the semantic web had been there, this would be much easier
– most of our data is not from the web, but from databases
– within Wolfram Alpha we have a symbolic ontology, didn’t create this as top down, mostly bottom-up from domains, merged them together when we realized similarities
– would like to do some semantic web things, expose our ontological mechanisms

At what point can we look at the formal specs for these ontologies?
– good news: All in symbolic mathematical code
– injecting new knowledge is complicated – nl is surprisingly messy, such as new terms coming in, for instance putting in people and there is this guy called "50 cent"
– exposure of ontology will happen
– the more words you need to describe the question, the harder it is
– there are holes in the data, hope that people will be motivated to fill them in

Social network? Communities?
– interesting, don’t know yet

How about more popular knowledge?
– who is the tallest of Britney Spears and 50 cent
– popular knowledge is more shallowly computable than scientific information
– linguistic horrors, book names and such, much of it clashes
– will need some popularity index, use Wikipedia a lot, can determine whether a person is important or not

The meaning of life? 42….

Integration with CYC?
– CYC is most advanced common sense reasoning system
– CYC takes what they reason about things and make it computing strengths
– human reasoning not that good when it comes to physics, more like Newton and using math

Will you provide the code?
– in Mathematica, code tends to be succinct enough that you can read it
– state of the art of synthesizing human-readable theorems is not that good yet
– humans are less efficient than automated and quantitative qa methods
– in many cases you can just ask it for the formula
– our pride lies in the integration, not in the models, for they come from the world
– "show formula"

Will this be integrated into Mathematica?
– future version will have a special mode, linguistic analysis, pop it to the server, can use the computation

How much more work on the natural language side?
– we don’t know
– pretty good at removing linguistic fluff, have to be careful
– when you look at people interacting with the system, but pretty soon they get lazy, only type in the things they need to know
– word order irrelevant, queries get pared down, we see deep structure of language
– but we don’t know how much further we need to go

How does this change the landscape of public access to knowledge?
– proprietary databases: challenge is make the right kind of deal
– we have been pretty successful
– we can convince them to make it casually available, but we would have to be careful that the whole thing can’t be lifted out
– we have yet to learn all the issues here

– have been pleasantly surprised by the extent to which people have given access
– there is a lot of genuinely good public data out there

This is a proprietary system – how do you feel about a wiki solution outcompeting you?
– that would be great, but
– making this thing is not easy, many parts, not just shovel in a lot of data
– Wikipedia is fantastic, but it has gone in particular directions. If you are looking for systematic data, properties of chemicals, for instance, over the course of the next two years, they get modified and there is not consistency left
– the most useful thing about Wikipedia is the folk knowledge you get there, what are things called, what is popular
– have thought about how to franchise out, it is not that easy
– by the way, it is free anyway…
– will we be inundated by new data? Encouraged by good automated curation pipelines. I like to believe that an ecosystem will develop, we can scale up.
– if you want this to work well, you can’t have 10K people feeding things in, you need central leadership

Interesting queries?
– "map of the cat" (this is what I call artificial stupidity)
– does not know anatomy yet
– how realtime is stock data? One minute delayed, some limitations
– there will be many novelty queries, but after that dies down, we are left with people who will want to use this every day

How will you feel if Google presents your results as part of their results?
– there are synergies
– we are generating things on the fly, this is not exposable to search engines
– one way to do it could be to prescan the search stream and see if wolfram alpha can have a chance to answer this

Role for academia?
– academia no longer accumulates data, useful for the world, but not for the university
– it is a shame that this has been seen as less academically respectable
– when chemistry was young, people went out and looked at every possible molecule
– this is much too computationally complicated for typical libraries
– historical antecedents may be Leibniz’ mechanical and computational calculators, he had the idea, but 300 years too early

When do we go live?
… a few weeks
– maybe a webcast if we dare…

Young male Russians drink, whore and fight themselves to death

This rather frightening article by Nicholas Eberstadt from World Affairs looks into the causes of Russian depopulation and falling life expectancy over the last 50 years or so. Russia is depopulating at a rate only found in really troubled countries in Africa, and the cause is high mortality, in particular among young men:

According to the U.S. Census Bureau International Data Base for 2007, Russia ranked 164 out of 226 globally in overall life expectancy. Russia is below Bolivia, South America’s poorest (and least healthy) country and lower than Iraq and India, but somewhat higher than Pakistan. For females, the Russian Federation life expectancy will not be as high as in Nicaragua, Morocco, or Egypt. For males, it will be in the same league as that of Cambodia, Ghana, and Eritrea.
In the face of today’s exceptionally elevated mortality levels for Russia’s young adults, it is no wonder that an unspecified proportion of the country’s would-be mothers and fathers respond by opting for fewer offspring than they would otherwise desire. To a degree not generally appreciated, Russia’s current fertility crisis is a consequence of its mortality crisis.

The reason is binge alcoholism (on average, one bottle of vodka per week, according to some experts), HIV, tuberculosis, accidents and violence: "No literate and urban society in the modern world faces a risk of deaths from injuries comparable to the one that Russia experiences." The consequences are dire:

In the contemporary international economy, one additional year of life expectancy at birth is associated with an increase in per capita output of about 8 percent. A decade of lost life expectancy improvement would correspond to the loss of a doubling of per capita income. By this standard, Russia’s economic as well as its demographic future is in jeopardy.
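
The arithmetic behind the "doubling" claim is easy to check. A minimal sketch, assuming (my assumption, the quote does not say so) that the roughly 8 percent per year of life expectancy compounds multiplicatively; output_multiplier is a name I made up for the illustration.

```python
def output_multiplier(years, gain_per_year=0.08):
    """Compound the per-year association over a number of years of life expectancy."""
    return (1 + gain_per_year) ** years


decade = output_multiplier(10)
print(f"{decade:.2f}")  # prints 2.16, i.e. slightly more than a doubling
```

Under simple addition instead of compounding, ten years would give 80 percent rather than a doubling, so the quoted figure only works out if the effect compounds.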

So, how to mitigate this? The author sees few solutions and recommends none.

Management is fundamentally an oral culture and analytics a literate one

Great stuff from my old pal Jim McGee: Bridging managerial and analytic cultures, part 1 and part 2.

From part the first:

Technology professionals have long struggled with getting a complex message across to management. In our honest and unguarded moments, we talk of "dumbing it down for the suits." But the challenge is more subtle than that. We need to repackage the argument to work within the frame of oral thought.

And second:

In addition to helping the analytically biased see the value of creating a compelling story, you need to help them see how and why story works differently than analysis. The best stories to drive change are not complex, literary, novels. They are epic poetry; tapping into archetypes and cliché, acknowledging tradition, grounded in the particular.

…which, of course, is why personalized examples work so well. (And work so badly when not connected to a logical argument or important point.)

In other words – there should be plenty of work for all those laid-off journalists in companies, trying to find le mot juste that will transform the numbingly complex into the directionally intuitive.

Read the whole thing – if nothing else, for the language.

Steroids for the flighty-minded

An excellent and truly scary article by Margaret Talbot in the New Yorker about the use of neuroenhancers by people who are not ill. Which is comparable to recreational plastic surgery, which I don’t like either.

Is it just me, or is cheating seen as more and more normal and not to be punished or even held in contempt? When I catch students plagiarizing (which happens with depressing frequency, partly because the tools for doing so have gotten so much better) their defense is more and more that this is normal, that you cannot expect them to come up with something original when everything is available out there on Google and Wikipedia. My retort is that I need to judge them on their own work, not others’, and that they therefore need to make it clear to me what they have done themselves and what they have found somewhere else. And their answer is that they put "Source: Wikipedia" at the bottom and therefore they are scot-free, so there.

I would get angry if this wasn’t so depressing and so pointless. I am tempted to just fail them. Not for plagiarism – which entails disciplinary committees and all sorts of make-work. Rather an F for outright stupidity.

It is some consolation that creativity is one area where neuroenhancers don’t seem to work. But they might, as the article finds, help these modern-day multitaskers concentrate on one specific task (hoping that it is a productive one and not, say, obsessively alphabetizing your library.) But neuroenhancers won’t make your ideas better – they won’t assist in spotting the prey, only in bringing it home. In the most dreary way possible:

Every era, it seems, has its own defining drug. Neuroenhancers are perfectly suited for the anxiety of white-collar competition in a floundering economy. And they have a synergistic relationship with our multiplying digital technologies: the more gadgets we own, the more distracted we become, and the more we need help in order to focus. The experience that neuroenhancement offers is not, for the most part, about opening the doors of perception, or about breaking the bonds of the self, or about experiencing a surge of genius. It’s about squeezing out an extra few hours to finish those sales figures when you’d really rather collapse into bed; getting a B instead of a B-minus on the final exam in a lecture class where you spent half your time texting; cramming for the G.R.E.s at night, because the information-industry job you got after college turned out to be deadening. Neuroenhancers don’t offer freedom. Rather, they facilitate a pinched, unromantic, grindingly efficient form of productivity.

If you find that tempting, be my guest. I am sure you can find directions via Google.

Jon Udell on observable work

Jon Udell has a great presentation over at Slideshare on how to work in observable spaces – something that should be done, to a much larger extent, by academics. I quite agree (and really need to get better at this myself).

Not sure if this is a good thing

Bill Schiano and I, between ourselves, solved this one pretty quickly. (That is, we found the computer names, not the extra thing, not mentioned on the site.)

(Incidentally, I also found SAGE, which was a pretty important computer system in its own right (as well as a computer company). Also UNIX, CEC 80 (which at least sounds like a computer) and "rank" and "crib". Oh well.)