Category Archives: Network research

Write, that I may find thee

A Google Dance (a change in how Google ranks web sites) used to happen infrequently enough that each “dance” had a name – Boston, Fritz and Brandy, for instance – but such changes now happen more than 500 times per year, with names like Panda #25 and Penguin 2.0, to mention a few relatively recent ones. (There is even a Google algorithm change “weather report”, as many of the updates are now unnamed and very frequent.) As a consequence, search engine optimization seems to me to be changing – and, funnily enough, it is becoming less and less about optimization and more and more about origination and creation.

It turns out that Google is now more and more about original content – that means, for instance, that you can no longer boost your web site simply by using Google Translate to create a French or Korean version of your content. Nor can you create lots of stuff that nobody reads – and by nobody, I mean not just that nobody reads your article, but that the incoming links are from, well, nobodies. According to my sources, Google’s algorithms have now evolved to the point where there are just two main mechanisms for generating the good Google juice (and they are related):

  1. Write something original and good, not seen anywhere else on the web.
  2. Get some incoming links from web sites with good Google juice, such as the New York Times, Boing Boing, a well-known university or, well, any of the “Big 10” domains (Wikipedia, Amazon, YouTube, Facebook, eBay (two versions), Yelp, WebMD, Walmart and Target).
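The “Google juice” that flows along links is, at its core, the idea behind PageRank: a page is important if important pages link to it. The sketch below is the textbook power-iteration version, with the commonly cited damping factor of 0.85, run on a made-up toy web (all site names are hypothetical). Google’s actual ranking combines link analysis with many other, unpublished signals, so treat this as an illustration of the principle rather than of Google’s algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank computed by power iteration.

    links maps each page to the list of pages it links to. A page with no
    outgoing links ("dangling") spreads its rank evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: blog_a and blog_b each get exactly one incoming link.
web = {
    "bigsite": ["blog_a", "bigsite_archive"],
    "bigsite_archive": ["bigsite"],
    "reader1": ["bigsite"],
    "reader2": ["bigsite"],
    "reader3": ["bigsite"],
    "nobody": ["blog_b"],
    "blog_a": [],
    "blog_b": [],
}

for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(f"{page:16} {score:.3f}")
```

Both blogs have one incoming link, but blog_a ends up ranked higher, because its link comes from a page that others link to, while blog_b’s link comes from, well, a nobody.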

The importance of the top domains is increasing, as this chart from mozcast.com shows:

[Chart from mozcast.com]

In other words, search engines are converging on the strategy the rest of the world has long used to determine what is important: if something garners the attention of the movers and shakers (and, importantly, is not a copy of something else), it must be important and hence worthy of your attention.

For the serious companies (and publishers) out there, this is good news: write well and interestingly, and you will be rewarded with more readers and more influence. It also means that companies seeking to boost their web presence may be well advised to hire good writers and create good content, rather than resort to all kinds of shady tricks: duplication of content, acquired traffic (including hiring people to search Google and click on your links and ads), and backlinks from serially created WordPress sites.
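Spotting duplicated content, at least the near-verbatim kind, is comparatively easy for a machine. One classic technique, shown here purely for illustration (Google does not publish what it actually uses), is shingling: cut each text into overlapping runs of k words and compare the resulting sets with the Jaccard similarity. A minimal sketch with k = 3 and made-up example texts:

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles: every run of k consecutive words in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = "Write something original and good, not seen anywhere else on the web."
near_copy = "Write something original and good, not seen anywhere else on the internet."
unrelated = "Craigslist is a mess that works because of who is there."

print(round(jaccard(shingles(original), shingles(near_copy)), 2))  # 0.82
print(jaccard(shingles(original), shingles(unrelated)))            # 0.0
```

Changing a single word leaves the two texts 82% similar by this measure, while unrelated text scores zero. A machine-translated copy would largely slip past this particular check, since it shares almost no word sequences with the original.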

For writers, this may be good news – perhaps there is a future for good writing and serious journalism after all. The difference is that you now write to be recognized as original by a search engine – and should a more august publication with a human behind it see what you write and publish it, that will just be a nice bonus.


What you can learn from your LinkedIn network

LinkedIn Maps is a fascinating service that lets you map out your contact network. Here is my first-level network, with 848 nodes:

[Image: LinkedIn Maps view of my first-level network]

The colors are added automatically by LinkedIn, presumably based on profile similarity and links to other networks. You have to add the labels yourself; the groupings are reasonably precise, at least for the top five groups (listed according to size and, I presume, interconnectedness).

As can be seen, I am a gatekeeper for a network of consultants and researchers in the States (the orange group), and reasonably plugged into the IT industry, primarily Norwegian (the dark blue). The others are fairly obvious, with the exception of the last category, an eclectic group of people I interact with quite a lot, but who are hard to categorize, at least from their backgrounds.

Incidentally, the “shared” map, which takes away names, provides more information for analysis. Note the yellow nodes in my green network on the right: These are the people hired by BI to manage or teach in China. They are, not in nationality but in orientation, foreigners in their own organization.

My LinkedIn policy is to accept anyone I know (i.e. have had dealings with and would like in my network), which, naturally, includes a number of students (I will friend any student of my courses as long as I can remember them, though I must admit I am a bit sloppy there.)

What is missing? Two things stand out. First, I have many contacts in Norwegian media and in the international blogosphere who aren’t here because, well, Norwegian media people use Twitter or their own outlets, and bloggers use their blogs. Hence, the commentariat is largely invisible in the LinkedIn world (except for Jill Walker Rettberg, who put me onto LinkedIn Maps). Second, a number of personal friends are not here, simply because LinkedIn is a professional network, and as such captures formal relationships, not your daily communications.

Now, what really would make me curious is what this map would look like for my Facebook, Twitter and Gmail accounts – and how they overlap. But the network in itself is interesting – and tells me that increasing the interaction between my USA network and the Norwegian IT industry wouldn’t hurt.

Two books on search and social network analysis

Social Network Analysis for Startups: Finding connections on the social web by Maksim Tsvetovat
My rating: 3 of 5 stars

Concise and well-written (like most O’Reilly stuff) book on basic social network analysis, complete with (Python, Unix-based) code and examples. You can ignore the code samples if you want to just read the book (I was able to replicate some of them using UCINet, a network analysis tool).

Liked it. Recommended.

Search Analytics for Your Site: Conversations with Your Customers by Louis Rosenfeld
My rating: 4 of 5 stars

Very straightforward and practically oriented – with lots of good examples. Search log analysis – seeing what customers are looking for and whether or not they find it – is as close to having a real, recorded and analyzable conversation with your customers as you can come, yet very few companies do it. Rosenfeld shows how to do it, and also how to find the low-hanging fruit and how to justify spending resources on it.
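The basic move, counting what people search for and noticing which searches come up empty, can be sketched in a few lines. The log format here (a list of query and result-count pairs) is a hypothetical simplification; real search logs have to be parsed first, and this only scratches the surface of what the book describes.

```python
from collections import Counter


def analyze_search_log(entries, top_n=3):
    """Summarize a site-search log.

    entries: iterable of (query, number_of_results) pairs (a hypothetical,
    simplified format). Returns the most frequent queries overall and the
    most frequent queries that returned nothing.
    """
    all_queries = Counter()
    zero_hits = Counter()
    for query, hits in entries:
        normalized = " ".join(query.lower().split())  # ignore case and spacing
        all_queries[normalized] += 1
        if hits == 0:
            zero_hits[normalized] += 1
    return all_queries.most_common(top_n), zero_hits.most_common(top_n)


log = [
    ("Opening hours", 12), ("opening  hours", 12), ("return policy", 4),
    ("gift card", 0), ("Gift Card", 0), ("opening hours", 12),
    ("store in Bergen", 0), ("return policy", 4),
]
top, failed = analyze_search_log(log)
print(top)     # the short head: what customers ask for most
print(failed)  # [('gift card', 2), ('store in bergen', 1)]
```

Queries that are both frequent and fruitless, like “gift card” here, are the low-hanging fruit: either the content is missing, or it is filed under words the customers don’t use.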

This is not rocket science – I was, quite frankly, astonished at how few companies do this. With more and more traffic coming from search engines, more and more users using search rather than hierarchical navigation, and the invisibility of dissatisfied customers (and the lost opportunities they represent), this should be high on any CIO’s agenda.

Highly recommended.


Our search-detected personalities

Personas is an interesting project at the Media Lab which takes your (or anyone else’s) name as input, determines your personality based on what it finds about you on the web, and generates a graphical representation. This is my result:

[Image: my Personas result]

…which I found rather disturbing: fame, sports and religion seem to take up way too much space here. The reason, of course, is that my name is rather common in Norway, and, for example, a formerly well-known skier skews the results, even though I seem to be the most web-known person with that name.

If you have a rare name, the result might be accurate; if your name is John Smith, you might end up with an average, possibly tilted a bit towards Pocahontas:

[Image: Personas result for “John Smith”]

Anyway – try it out. You might be surprised. And please remember – this is an art project, not an accurate representation of anything…

Update September 20: I somehow forgot to point to Naomi Haque’s blog post about Personas, which discusses how social networking changes our perception of self.

Messy works magically

Craigslist is a mess that is currently taking the mickey out of eBay and irritating Google, according to a fun article in Wired. I am not surprised. The value of a meeting place lies not in what happens there, but in who is there – and by minimizing controls and keeping most transactions face-to-face, Craigslist is eking out the value from the network with minimal investment and a business model that really isn’t a business model.

As for the messy design, well, it is quick, and you learn very fast where to click to get what you want.

The funny thing is that in Norway, the most popular website by far is vgnett.no, the online version of the biggest tabloid paper – or, rather, an online paper that shares the name, but not much else, with VG, the paper paper. The online version has its own editorial office, and its design has evolved, is still evolving, and is a perennial joke among web designers for its busy layout and ratty typeface. They would love to replace it with something akin to Aftenposten or the New York Times, where order, quality and completeness reign. VGnett would beg to differ – they know the use patterns of their audience and serve them, messy or not.

Network externalities in plain view, in other words.

Are social networks a help or a threat to headhunters?

In a currently hot YouTube video which breathlessly evangelizes the revolutionary nature of social networks, I found this statement: "80% of companies are using LinkedIn as their primary tool to find employees". In the comments this is corrected to "80 percent of companies use or are planning to use social networking to find and attract candidates this year", which sounds rather more believable. Social media is where the young people (and, eventually, us in the middle ages as well) are, so that is where you should look.

At the same time, many of the most prolific users of LinkedIn (and, at least according to this guy, Twitter), both in terms of number of contacts and other activities, are headhunters. It is these people’s business to know many people and be able to find someone who matches a company’s demands.

Headhunters are the proverbial networkers: they derive their value from knowing not just many people, but the right people. In particular, headhunters who know people in many places are valuable, because they may be the only conduit between one group and another. Your network is more valuable the fewer of your contacts are also in contact with each other.

The American sociologist Ronald S. Burt, in his book Structural Holes: The Social Structure of Competition (1992), showed that social capital accrues to those who not only know many people, but have connections across groups. If everyone were directly linked to everyone else, the network would be dense; the fact that we aren’t means that there are gaps, or structural holes, in it – hence the term. In the picture to the right, we see a social network of 9 individuals. Person A derives social capital from being the link between two groups that are otherwise only internally connected. A would make an excellent headhunter here (much as profits can only be generated if you can locate market imperfections).
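Burt’s intuition can be put into numbers. One of his measures is effective size: the number of your contacts minus the redundancy among them. For an unweighted, undirected network it is commonly computed as n − 2t/n, where n is the number of contacts and t the number of ties among those contacts. The sketch below applies it to a hypothetical 9-person network loosely modeled on the picture: two tight groups of four, bridged only by A (the exact ties are an assumption, not taken from the figure).

```python
from itertools import combinations


def build_graph(edges):
    """Undirected graph as {person: set of contacts}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def effective_size(graph, ego):
    """Burt's effective size for an unweighted, undirected network,
    computed as n - 2t/n: n contacts, t ties among those contacts."""
    contacts = graph[ego]
    n = len(contacts)
    if n == 0:
        return 0.0
    t = sum(1 for a, b in combinations(contacts, 2) if b in graph[a])
    return n - 2 * t / n


# Hypothetical network: groups B-E and F-I are each fully connected
# internally; A is the only person with ties into both groups.
edges = list(combinations("BCDE", 2)) + list(combinations("FGHI", 2))
edges += [("A", "B"), ("A", "C"), ("A", "F"), ("A", "G")]
network = build_graph(edges)

for person in sorted(network):
    print(person, effective_size(network, person))
```

A scores 3.0, while everyone inside the groups scores 2.0 or 1.0: many of their contacts already know each other, so their networks are largely redundant.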

LinkedIn is a social network, indistinguishable from a regular one (i.e., one that is not digitally facilitated) except that you can search across the network, directly up to three levels away, indirectly a bit further. Headhunters like it for this reason, and use it extensively in the early phases of locating a candidate. The trouble is that LinkedIn (not to mention the tendency of more and more people to put their CVs online on regular websites) makes searching for candidates easy for everyone else as well. In other words: while initially helpful, will the long-term result of this searchability be that headhunters are no longer necessary?
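Searching “up to three levels away” is, in essence, a breadth-first search cut off at depth three. A minimal sketch on a made-up network; the names, profiles and simple keyword matching are all hypothetical, and LinkedIn’s own search is of course far more elaborate.

```python
from collections import deque


def degrees_of_separation(graph, me, max_depth=3):
    """Breadth-first search from `me`. Returns {person: degree} for everyone
    within max_depth steps (1 = direct contact, 2 = contact of a contact...)."""
    degree = {me: 0}
    queue = deque([me])
    while queue:
        person = queue.popleft()
        if degree[person] == max_depth:
            continue  # do not look beyond the search horizon
        for contact in graph.get(person, ()):
            if contact not in degree:
                degree[contact] = degree[person] + 1
                queue.append(contact)
    del degree[me]
    return degree


def find_candidates(graph, profiles, me, keyword, max_depth=3):
    """People within reach whose (hypothetical) profile text mentions
    keyword, closest first, as (degree, person) pairs."""
    reachable = degrees_of_separation(graph, me, max_depth)
    return sorted((d, p) for p, d in reachable.items()
                  if keyword in profiles.get(p, ""))


network = {
    "me": ["ann"], "ann": ["me", "bob"], "bob": ["ann", "cat"],
    "cat": ["bob", "dan"], "dan": ["cat"],
}
profiles = {"ann": "consultant", "bob": "java developer",
            "cat": "python developer", "dan": "python developer"}

print(degrees_of_separation(network, "me"))  # {'ann': 1, 'bob': 2, 'cat': 3}
print(find_candidates(network, profiles, "me", "python"))  # [(3, 'cat')]
```

Dan, four steps away, stays invisible to this search, even though his profile matches.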

Search technology – in social networks as well as in general – lowers the transaction cost of finding something. Lower transaction costs favor coordination by markets rather than by hierarchy (or, in this case, network). Hence, the value of having a central position in that network should diminish. On the other hand, search technology (in networks in particular) allows you to extend your network, and hence increase your social capital. Which effect is stronger remains to be seen.

Anyway, this should make for interesting research. Anyone out there in headhunterland interested in talking to me about their use of these tools?

From links to seeds: Edging towards the semantic web

Wolfram Alpha just may take us one step closer to the elusive Semantic Web, by evolving a communication protocol out of its query terms.

(this is very much in ruminating form – comments welcome)

Wolfram Alpha, an exciting new kind of "computational" search engine, officially launched on May 18. Rather than looking up documents where your questions have been answered before, it actually computes the answer. The difference, as Stephen Wolfram himself has said, is that if you ask what the distance is to the moon, Google and other search engines will find you documents that tell you the average distance, whereas Wolfram Alpha will calculate what the distance is right now, and tell you that, in addition to many other facts (such as the average). Wolfram Alpha does not store answers, but creates them every time. And it primarily answers numerical, computable questions.

The difference between Google (and other search engines) and Wolfram Alpha is not so clear-cut, of course. If you ask Google "17 mpg in liters per 100km" it will calculate the result for you. And you can send Wolfram Alpha non-computational queries such as "Norway" and it will give an informational answer. The difference lies more in what kind of data the two services work against, and how they determine what to show you: Google crawls the web, tracking links and monitoring user responses, in a sense asking every page and every user of their services what they think about all web pages (mostly, of course, we don’t think anything about most of them, but in principle we do.) Wolfram Alpha works against a database of facts with a set of defined computational algorithms – it stores less and derives more. (That being said, they will both answer the question "what is the answer to life, the universe and everything" the same way….)
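The conversion in the "17 mpg" example is simple arithmetic with a twist: miles per gallon measures distance per unit of fuel, while litres per 100 km measures fuel per unit of distance, so one is the inverse of the other. With a US gallon of 3.785411784 litres and a mile of 1.609344 km, consumption in L/100 km is roughly 235.2 divided by the mpg figure. The query does not say which gallon is meant; the sketch below assumes US gallons by default and also shows the imperial case.

```python
US_GALLON_L = 3.785411784    # litres per US gallon (exact by definition)
IMPERIAL_GALLON_L = 4.54609  # litres per imperial gallon (exact by definition)
MILE_KM = 1.609344           # kilometres per international mile (exact)


def mpg_to_l_per_100km(mpg, gallon_l=US_GALLON_L):
    """Convert fuel economy (miles per gallon) to consumption (litres per
    100 km). More miles per gallon means fewer litres per 100 km."""
    km_per_litre = mpg * MILE_KM / gallon_l
    return 100.0 / km_per_litre


print(round(mpg_to_l_per_100km(17), 2))                     # 13.84
print(round(mpg_to_l_per_100km(17, IMPERIAL_GALLON_L), 2))  # 16.62
```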

While the technical differences are important and interesting, the real difference between WA and Google lies in what kind of questions they can answer – to use Clayton Christensen’s concept, the different jobs you would hire them to do. You would hire Google to find information, introductions, background and concepts – or to find that email you didn’t bother filing away in the correct folder. You would hire Alpha to answer precise questions and get the facts, rather than what the web has collectively decided are the facts.

The meaning of it all

Now – what will the long-term impact of Alpha be? Google has made us replace categorization with search – we no longer bother filing things away and remembering them, for we can find them with a few half-remembered keywords, relying on sophisticated query front-end processing and the fact that most of our not-that-great minds think depressingly alike. Wolfram Alpha, on the other hand, is quite a different animal. Back in the 80s, I once saw someone exhort their not very digital readers to think of the personal computer as a "friendly assistant who is quite stupid in everything but mathematics." Wolfram Alpha is quite a bit smarter than that, of course, but the fact is that we now have access to a service which, quite simply, will do the math and look up the facts for us. Our own personal Hermione Granger, as it were.

I think the long-term impact of Wolfram Alpha will be to further something that may not have started with Google, but certainly became apparent with them: the use of search terms (or, if you will, seeds) as references. Rather than writing out a URL, it is already common to help people find something by saying "Google this and you will find it". I have a couple of blogs and a web page, but googling my name will get you there faster (and you can misspell my last name and still not miss.) The risk in doing that, of course, is that someone can intervene. As I read in this paper, General Motors a few years ago ran an ad for a new Pontiac model, at the end of which they exhorted the audience to "Google Pontiac" to find out more. Mazda quickly set up a web page with Pontiac in it, bought some keywords on Google, and quite literally shanghaied GM’s ad.

Wolfram Alpha, on the other hand, will, given the same input, return the same answer every time. If the answer should change, it is because the underlying data has changed (or, extremely rarely, because somebody has figured out a new way of calculating it), not because someone external to the company has figured out a way to game the system. This means that we can use references to Wolfram Alpha as shorthand: enter "budget surplus" in Wolfram Alpha, and the results will stare you in the face. In the sense that math is a language for expressing certain concepts in a terse and precise way, Wolfram Alpha seeds will, I think, emerge as a notation for referring to factual information.
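Treated as a notation, a seed is just a short, precise string that expands into a full reference. A sketch of that expansion, assuming the web interface’s query URL takes the form https://www.wolframalpha.com/input?i=… (that URL pattern is an assumption and may change, and seed_to_url is a hypothetical helper):

```python
from urllib.parse import urlencode

# Assumption: the public web interface accepts a query in the "i" parameter
# at this address. The exact URL pattern is not guaranteed to stay the same.
WOLFRAM_ALPHA_INPUT = "https://www.wolframalpha.com/input"


def seed_to_url(seed):
    """Expand a query seed into a shareable URL (hypothetical helper).
    Extra whitespace is collapsed so equivalent seeds give one reference."""
    normalized = " ".join(seed.split())
    return WOLFRAM_ALPHA_INPUT + "?" + urlencode({"i": normalized})


print(seed_to_url("budget  surplus"))
# https://www.wolframalpha.com/input?i=budget+surplus
```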

A short detour into graffiti

Back in the early-to-mid-90s, Apple launched one of the first pen-based PDAs, the Apple Newton. The Newton was, for its time, an amazing technology, but for once Apple screwed it up, largely because they tried to make the device do too much. One important issue was the handwriting recognition software – it would let you write in your own handwriting, and then try to interpret it. I am a physician’s son, and I certainly took after my father in the handwriting department. Newton could not make sense of my scribbles, even if I tried to behave, and, given that handwriting recognition is hard, it took a long time doing it. I bought one, and then sent it back. Then the Palm Pilot came, and became the device to get.

The Palm Pilot did not recognize handwriting – it demanded that you, the user, write to it in a sign language called Graffiti, which recognized individual characters. Most of the characters resembled the regular characters closely enough that you could guess what they were; for the others you either had to consult a small plastic card or experiment. The feedback was rapid, so experimenting usually worked well, and pretty soon you had learned – or, rather, your hand had learned – to enter the Graffiti characters rapidly and accurately.

Wolfram Alpha works in the same way as Graffiti did: as Stephen Wolfram says in his talk at the Berkman Center, people start out writing natural language but pretty quickly trim it down to just the key concepts (a process known in search technology circles as "anti-phrasing"). In other words, by dint of patience and experimentation, we (or, at least, some of us) will learn to write queries in a notation that Wolfram Alpha understands, much like our hands learned Graffiti.
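Anti-phrasing, at its simplest, means stripping the conversational filler from a question and keeping the content words. A deliberately crude sketch: the filler list is hypothetical and tiny, real systems use much richer lists and linguistic processing, and this makes no claim about how Wolfram Alpha parses its input.

```python
import re

# Hypothetical, deliberately tiny list of conversational filler words.
FILLER = {
    "could", "would", "you", "please", "tell", "me", "i", "like", "to",
    "know", "what", "is", "the", "a", "an", "of",
}


def anti_phrase(question):
    """Reduce a natural-language question to its content words."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return " ".join(word for word in words if word not in FILLER)


print(anti_phrase("Could you please tell me what the distance to the moon is?"))
# distance moon
```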

From links to seeds to semantics

Semantics is really about symbols and shorthand – a word is created as shorthand for a more complicated concept by a process of internalization. When learning a language, rapid feedback helps (which is why I think it is easier to learn a language with a strict and terse grammar rather than a permissive one), simplicity helps, and so does a structure and culture that allows for creating new words by relying on shared context and intuitive combinations (see this great video with Stephen Fry and Jonathan Ross on language creation for some great examples.)

And this is what we need to do: gather around Wolfram Alpha and figure out the best way of interacting with the system, and then conduct "what if" analysis of what happens if we change the input just a little. To a certain extent, it is happening already, starting with people finding Easter Eggs – little jokes developers leave in programs for users to find. Pretty soon we will start figuring out the notation, and you will see web pages use Wolfram Alpha queries first as references, then as modules, then as dynamic elements.

It is sort of quirky when humans start to exchange query seeds (or search terms, if you will). It gets downright interesting when computers start doing it. It would also be part of an ongoing evolution towards increasingly meaningful messaging between computers.

When computers (or, if you will, programs) needed to exchange information in the early days, they did it in a machine-efficient manner: information was passed using shared memory addresses, hexadecimal codes, assembler instructions and other terse and efficient, but humanly unreadable, encoding schemes. Sometime in the early 80s, computers became powerful enough that the exchanges could gradually be done in human-readable format. The SMTP protocol, for instance, a standard for exchanging email, could be read and even hand-built by humans (as I remember doing in 1985, to send email outside the company network I was on.)

The World Wide Web, conceived around 1990 and live to a wider audience by 1994, had at its core an addressing system – the URL – which could be used as a general way of conversing between computers, no matter what their operating system or languages. (To the technology purists out there: yes, the WWW relies on a whole slew of other standards as well, but I am trying to make a point here.) It was rather inefficient from a machine communication perspective, but very flexible and easy to understand for developers and users alike. Over time, it has been refined from pure exchange of information to the sophisticated exchanges needed to make sure it really is you when you log into your online bank – essentially by moving from the HTML markup language towards richer standards such as XML, where you can send over not just instructions and data but also definitions and metadata.
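The human-readable SMTP exchange mentioned above is easy to reconstruct: the client sends a handful of plain-text commands, each answered by a numeric reply code from the server. The sketch below only builds the client side of such a session as text (it opens no network connection) and uses hypothetical addresses; a real client must read and check each server reply before sending the next command.

```python
def smtp_client_commands(helo_host, sender, recipient, subject, body):
    """Return the text an SMTP client sends to deliver one message.
    Server replies (such as '250 OK') are not modelled here."""
    lines = [
        f"HELO {helo_host}",
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{recipient}>",
        "DATA",
        f"From: <{sender}>",
        f"To: <{recipient}>",
        f"Subject: {subject}",
        "",  # a blank line separates the headers from the body
    ]
    for line in body.splitlines():
        # Dot-stuffing: a body line starting with '.' gets an extra '.'
        lines.append("." + line if line.startswith(".") else line)
    lines += [".", "QUIT"]  # a line with a lone '.' ends the message
    return "\r\n".join(lines) + "\r\n"  # SMTP lines end with CRLF


session = smtp_client_commands("client.example.com", "alice@example.com",
                               "bob@example.org", "Hello", "Hi Bob,\n.just testing")
print(session.replace("\r\n", "\n"), end="")
```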

The much-discussed semantic web is the natural continuation of this evolution – programming further and further away from the metal, if you will. Human requests for information from each other are imprecise, but rely on a shared understanding of what is going on, an ability to interpret results in context, and a willingness to use many clues and requests for clarification to arrive at a desired result. Observe two humans interacting over the telephone – they can have deep and rich discussions, but as soon as the conversation involves computers, they default to slow and simple communication protocols: spelling words out (sometimes using the NATO phonetic alphabet), going back and forth about where to apply mouse clicks and keystrokes, double-checking to avoid mistakes. We just aren’t that good at communicating like computers – but can computers eventually get good enough to communicate with us?

I think the solution lies in mutual adaptation, and the exchange of references to data and information in terms other than direct document addresses may just be the key to achieving that. Computer performance and functionality have always progressed in a punctuated-equilibrium fashion, alternating between integrated and modular architectures. The first mainframes were integrated, with simple terminal interfaces; they gave way to client-server architectures (exchanging SQL requests), which gave way to highly modular TCP/IP-based architectures (exchanging URLs), which may in turn give way to mainframe-like, semi-integrated data centers. I think those data centers will exchange information at a higher semantic level than any of the others – and Wolfram Alpha, with its terse but precise query structure, may just be the way to get there.