Curing Pubmedophobia

Scienceroll’s Bertalan Meskó has come up with a solution for PubMed fatigue. It’s a debilitating condition that leads to feelings of inadequacy, but it’s not the patient who feels inadequate, it’s the PubMed search engine itself. “For a site that is as vital to scientific progress as PubMed is, their search engine is shamefully bad. It’s embarrassingly, frustratingly, painfully bad,” says Anna Kushnir on her Nature Network blog.

So, Meskó has been connecting up some pipes on the interwebs to come up with Scienceroll Search, a personalized medical search engine powered by Polymeta.com. “You can choose which databases to search in and which one to exclude from your list,” he explains. “It works with well-known medical search engines and databases and we’re totally open to add new ones or remove those you don’t really like.” Something similar might be cobbled together with a personalized Google search, but I doubt it could be taken to this logical extreme with Google. So give it a try and leave feedback on Meskó’s site.
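Incidentally, if you’d rather script your own lookups than wait for a better front end, PubMed’s own E-utilities interface is easy enough to query directly. Here’s a minimal sketch in Python; the esearch endpoint and parameters are NCBI’s, but the search term and everything else is just an illustration, not part of either Scienceroll Search or Polymeta.

```python
import json
import urllib.parse
import urllib.request

def pubmed_search(term: str, retmax: int = 5) -> list:
    """Return a list of PubMed IDs (PMIDs) matching the query term."""
    params = urllib.parse.urlencode({
        "db": "pubmed",       # search the PubMed database
        "term": term,         # ordinary PubMed query syntax
        "retmode": "json",    # ask for a JSON response
        "retmax": retmax,     # cap the number of IDs returned
    })
    url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + params
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]

if __name__ == "__main__":
    # An arbitrary example query
    print(pubmed_search("circadian rhythm[Title] AND 2007[PDAT]"))
```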

Sciencebase Seedier Side

Censored elephant

Anyone would think Sciencebase resided in one of the seedier corners of the internet. Because of all the recent fuss about the new seven deadly sins, I was just checking out the visitor traffic using Google Webmaster Tools and found some quite worrying search queries that bring you, dear readers, to this cybershore.

Apparently, 4% of the searches on Google Images this week brought you looking for the periodic table of sex!

Well, I have to admit, there is a thumbnail graphic of said item on the site, mentioned in a post, Periodic Post, from August 2006. And, Sciencebase ranks #7 in Google for that phrase, so it’s perhaps not surprising. Slightly more worrying are the searches that brought people to my post Giving Good Headline, which was about press releases and the best approach to headline writing. Then there are the dozens of visitors looking for Girly Games who found my article on the psychology of video game addiction and how it apparently differs between males and females.

I’m loath to list some of the other terms people are searching for on Google Images that bring them to Sciencebase, but oh go on then, you twisted my arm: erectile dysfunction, premature ejaculation, seven deadly sins, erotic, and several I’d blush even to type.

Sexy search queries

Now, Sciencebase, for one reason or another (and it’s not because it’s that kind of site) somehow ranks quite highly for several of those terms. However, there are a few others that bring visitors to the site for which it ranks way, way down the search engine results pages (SERPs), and I don’t just mean the bottom of page 1, or page 2 even, I mean the bottom of page 73. Now, that reveals true search diligence, I’d suggest. Who on earth works their way through 72 pages of results to get to that one item?

Most visitors are not after smut, thankfully; they’re after the site’s hard-hitting science news and views with a cynical bite. Unfortunately, as I was writing this post, I realised that posting it in its original form was unlikely to reduce the tide of filth; in fact, it might simply encourage more of those seedier searches. So, I’ve removed a few of the less choice keywords, just to make it safe for work and to prevent the site from getting slapped with an adult-filtering ban.

After I’d put the finishing touches to this piece, I checked the site’s top searches in Google Webmaster Tools again, just to see if there had been any additional phrases of interest. I discovered that Sciencebase is now ranking for “polar bears willie soon” in blog searches. As a search phrase, that has to be pretty unique. I assumed it was from someone in the UK urgently tracking down Arctic bestiality sites, or Inuit innuendo, until I realised it’s actually two phrases – Polar Bears (the Arctic beast in question) and Willie Soon (the climate change skeptic) – both of which I mentioned in my recent CO2 refusenik post. It seems that Sciencebase is not quite the quagmire of filth some of the site’s visitors had hoped for after all.

Linked In Questions

Recently, I did a little blogging experiment on the business networking site LinkedIn (inspired by a post on Copyblogger). I was writing a feature article for Sciencebase about risk and the public perception of trust in science and technology. As an alternative route into the opinions of lots of members of the community, I posted an open question asking rhetorically why the public no longer trusts science and told potential respondents to let me know if they didn’t mind being quoted in the article.

The question was worded very loosely with the aim of eliciting the strongest responses possible. It’s not something I would usually do; I’d simply approach independent experts and contacts and ask their opinions directly in a more traditional journalistic way. But, like I say, this was an experiment.

Replies poured in quite quickly. One respondent thought I was crazy for imagining that the public does not trust science. “People do trust science and scientists,” he said. “Anyone who doesn’t, please stand up and be allowed to fall immediately victim to polio, the Black Death, measles, chronic sinus infection, prostate cancer, and on and on and on.” Others were in a similar vein.

They were not the kind of responses I was expecting, as if listing the various things that many people take for granted somehow measures their trust in science. In fact, one can make a similar list of the kinds of science-related topics alluded to in the research about which I was writing in the original post – GMOs, nuclear power, cloning, mobile phone radiation, stem cells, cancer risk, adverse drug reactions, superbugs, vaccines, environment, pollution, chemical weapons, biological agents, military technology.

These are all science subjects, in some sense, and are considered seriously problematic in the eyes of the public. Of course, the solutions to all those problems also lie with science, but that doesn’t detract from the fact that the public commonly distrusts science on these issues.

The article itself looked at how the public responds to such issues, specifically cancer clusters, and delved into how trust in such matters is coloured by the particular organisation or entity offering the information about the topic. Moreover, the study showed that the way people assess risk when faced with such information differs greatly depending on the source. Their weighing of the risk-benefit equation seems to change depending on whether the information comes from an official organ or a pressure group, for instance.

Strangely, another respondent accused me of bias in my writing, as if the placement of a deliberately provocative question in a public forum were somehow the writing itself rather than simply an enquiry.

I could not understand why he thought that my posing a question journalistically would preclude me from writing a neutral piece. It was his response to my initial broad question that led me to write this post, however, so maybe I should thank him for the inspiration.

As I explained, I put the question with a deliberate and strong inflection in order to provoke the strongest response from the community. That’s pretty much a standard approach to getting useful opinions from people on both sides of an argument in journalism. If you don’t believe me, listen to the way people like the BBC’s Jeremy Paxman and John Humphrys posture through their questioning in order to get the best response out of their interviewees. They often hint at a strong statement through their question, one way or the other, and interviewees will either support the premise and offer their positive opinions or else argue against it.

It’s usually best to lean away from the interviewee (not only to avoid the blows, but to inspire them to give the strongest argument for their case, and I’m not referring to Paxman or Humphrys here). Either way, you get useful comments on both sides that provide the foundations for the actual writing and so allow you to produce a neutral article that reveals the pros and cons of an issue without personal bias.

Anyway, as an experiment, it didn’t work too well, initially. However, once the community had warmed to the question and I’d added a clarification some quite useful answers that weren’t simply an attack on the question itself began to emerge.

As it turns out, none of the responses really fitted with what I wanted to report in the original post (which you can read on the Sciencebase blog under the title In What We Trust, by the way), so I intend to write another post discussing the various points raised and namechecking those members of the LinkedIn community who were happy to be quoted.

Search and Cite for Science Bloggers

Crossref for WordPress

A couple of weeks ago I was reading a post by Will Griffiths on the ChemSpider Open Chemistry Web blog about how the DOI citation system of journal article lookups might be improved. The DOI system basically assigns each research paper a unique identifier with an embedded publisher tag. Enter a DOI into a lookup box (e.g. the DOI lookup on Sciencebase, at the foot of that page) and it almost instantaneously takes you to the paper in question. I use the DOI system for references in Sciencebase posts all the time.
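For anyone curious about what happens behind that lookup box, resolving a DOI is just a matter of handing it to the doi.org resolver and following the redirect. This is a bare-bones sketch in Python rather than the code Sciencebase actually uses; the DOI shown is the one assigned to the DOI Handbook itself.

```python
import urllib.request

def resolve_doi(doi: str) -> str:
    """Return the URL that a DOI currently resolves to."""
    req = urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"User-Agent": "doi-lookup-sketch/0.1"},  # some publishers reject blank agents
    )
    with urllib.request.urlopen(req) as resp:
        return resp.geturl()  # urllib follows the redirect chain for us

if __name__ == "__main__":
    print(resolve_doi("10.1000/182"))  # the DOI Handbook's own DOI
```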

There are a few cons that counteract its various pros: not all publishers use it, and among those that do, some do not implement the DOI for their papers until they are in print, as opposed to online. Despite that, it is very useful and commonly used. Having read Griffiths’ post about OpenURL, a non-proprietary alternative to DOI, I thought maybe it would be even more powerful if the concept were taken back another step, to the author level, and I came up with the concept of a PaperID, which I blogged about on Chemspy.com. PaperID, I reasoned, could be a unique identification tag for a research paper, created by the author using a central open system (akin to the InChI code for labelling individual compounds). I’m still working out the ins and outs of this concept, and while a few correspondents have spotted potentially fatal flaws, others see it as a possible way forward.

Meanwhile, CrossRef, the association behind the publisher linking network, has just announced a beta version of a plugin for bloggers that can look up and insert DOI-enabled citations in a blog post. I’ve not investigated the plugin in detail yet, but you can download it from a CrossRef page at Sourceforge.net. The CrossRef plugin apparently allows bloggers to add a widget to search CrossRef metadata using citations or partial citations. The results of the search, with multiple hits, are displayed, and you then either click on a hit to go to the DOI, or click on an icon next to the hit to insert the citation into your blog entry. I presume they’re using plugin and widget in the accepted WordPress glossary sense of those words, as the plugin is available only for WordPress users at the moment, with a Movable Type port coming soon.
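The plugin itself runs as PHP inside WordPress, so the following is only a rough illustration of the kind of metadata lookup involved, written in Python against CrossRef’s present-day REST API at api.crossref.org (which post-dates the plugin described here); the citation string is just an example.

```python
import json
import urllib.parse
import urllib.request

def crossref_lookup(citation: str, rows: int = 3) -> list:
    """Return (title, DOI) pairs for works matching a free-text citation."""
    query = urllib.parse.urlencode({"query.bibliographic": citation, "rows": rows})
    url = "https://api.crossref.org/works?" + query
    with urllib.request.urlopen(url) as resp:
        items = json.load(resp)["message"]["items"]
    return [(item["title"][0] if item.get("title") else "(untitled)", item["DOI"])
            for item in items]

if __name__ == "__main__":
    hits = crossref_lookup("Molecular structure of nucleic acids Watson Crick 1953")
    for title, doi in hits:
        print(title, "->", "https://doi.org/" + doi)
```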

According to Geoffrey Bilder, CrossRef’s Director of Strategic Initiatives, “CrossRef is helping jumpstart the process of citing the formal literature from blogs. While there is a growing trend in scientific and academic blogging toward referring to formally published literature, until now there were few guidelines and very few tools to make that process easy.” Well, that reference to a jumpstart sounds like marketing-speak to me, as Sciencebase and dozens of other science blogs have been using DOIs for years.

Whether or not I will get around to installing what amounts to yet another WordPress plugin, I haven’t decided. I may give it a go, but if it works as well as promised, you will hopefully not see the join. Meanwhile, let me have your thoughts on the usefulness of DOI and the potential of OpenURL and PaperID in the usual place below.

Sciencebase Upgraded

UPDATE: Since I wrote this post back in February 2008, WordPress has gone through many changes; updates on Sciencebase are automated these days too, which is marvellous.

I finally upgraded the Sciencebase site to the very latest version of WordPress; it had been languishing at version 2.1.3 (can you believe it?) for far too long. Not only had there been dozens of security upgrades between that version and the current version, 2.3.3, but also various new features that the site was not making full use of.

It was a post by Wayne Liew of WayneLiewDot.com that persuaded me to do the necessary, and his recommendation of a plugin that automates the whole process was the tipping point I needed.

Having carried out the upgrade (more on the actual WordPress upgrade process here) and found only a few minor problems, like a disordered sidebar, a couple of out-of-date plugins and just one irrelevant dead plugin, and fixed those as best I could, I figured it was time for a weekend break. So my wife and I headed off to the seaside, abandoned the children with their grandparents and took off with the dog for a well-earned break in an artsy country town on the Suffolk coast. (Photos will appear soon on the Sciencebase Flickr account.) Hence this trivial and possibly pointless post.

Back with a more substantial science-based post later this week.

PaperID – An Open Source Identifier for Research Papers

As a journalist, I receive a lot of press releases that cite “forthcoming” papers. Depending on the publisher, one can usually find the paper in a pre-press state on their website. However, it’s often the case that the DOI does not go live at the same time as the embargo expires on the press release, so although I might legitimately publish an article about the research, I cannot use the DOI as the reference and must use the direct URL for the paper. Unfortunately, some publishers then move the paper when it is formally published, so the link I used ends up broken.

Moreover, this can hardly be useful for authors themselves, in that a paper that does not make the grade at the International Journal of Good Stuff and ends up being resubmitted to the Parochial Bulletin of Not So Good Stuff will gain a different identification code along the way.

Will Griffiths on ChemSpider was recently discussing the possibility of an OpenURL system. I think we could go one step further.

A simple standardized way of generating a unique identifier for each and every paper, one transportable between the different phases of the publication process, from submission to acceptance and publication, or rejection and resubmission elsewhere, would be a much better way of registering papers. The identifier would be created at the point when the final draft is ready to be mailed to the first editorial office in the chain, perhaps based on timestamp, lead author initials, and standard institution abbreviation. It could be the scientific literary equivalent of an InChIKey for each research paper.
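To make the idea a little more concrete, here is a toy sketch of how such an identifier might be minted. None of this is a worked-out specification, and the initials and institution code are purely hypothetical, but it shows the flavour: a timestamp, lead author initials and an institution abbreviation, plus a short InChIKey-style hash to keep the whole thing unique.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional

def make_paper_id(author_initials: str, institution: str,
                  submitted: Optional[datetime] = None) -> str:
    """Mint a hypothetical PaperID from a timestamp, author initials and institution."""
    submitted = submitted or datetime.now(timezone.utc)
    stamp = submitted.strftime("%Y%m%dT%H%M%SZ")
    # Hash the three components so accidental collisions are vanishingly unlikely
    seed = "|".join([stamp, author_initials.upper(), institution.upper()])
    digest = hashlib.sha1(seed.encode("utf-8")).hexdigest()[:10].upper()
    return "-".join([author_initials.upper(), institution.upper(), stamp, digest])

if __name__ == "__main__":
    # Hypothetical lead author "DB" at a hypothetical institution coded "UCAM"
    print(make_paper_id("DB", "UCAM"))
```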

There would have to be a standardized validation system, so that authors were sure to be using the right system, but that could be established relatively painlessly through the big institutions, be networked and have cross-checking to avoid duplicates. And, of course, be open source, open access.

The possibilities are endless: PaperID would create an electronic paper trail from author through preprint, in press, online, and final publication. It might even be back-extended into the area of Open Notebook Science and, equally usefully, into archival, review, and cross-referencing.

DOI is useful most of the time, and OpenURL sounds intriguing, but PaperID could be revolutionary.

No Spies Under My Bed

Computer spy

Currently, the only truly effective way consumers can stop the collection of their personal data when shopping is not to use the internet, to be paid and to pay for everything in cash, and to hide their money in their mattress.

More seriously, most of us will continue to use web services despite privacy concerns. You can try to opt out of marketing schemes or reconfigure your web browser to reject advances from sites that offer cookies or install spying applications. However, most such rejections will prevent you from trading on most e-commerce sites altogether. So, cookies will crumble, there’s no two ways about it, if you want to shop online or use web 2.0 interactive sites. You can, of course, use software to delete those cookies as soon as you’ve finished your interaction with the site, and so gain a little privacy and prevent the sites from tracking where you went after you left or recognising you when you visit a second time. But, either way, they’re going to get lots of useful information while the cookie lasts.
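For the non-technical reader, the mechanism is mundane: the shop simply hands your browser a long-lived identifier and reads it back on each visit. The sketch below is a generic illustration in Python, not any particular retailer’s code; delete the cookie and that thread of identity is cut.

```python
import uuid
from http.cookies import SimpleCookie

def tracking_cookie_header() -> str:
    """Build the Set-Cookie header a site might use to recognise a returning visitor."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = uuid.uuid4().hex                # random per-browser identifier
    cookie["visitor_id"]["path"] = "/"                     # sent with every request to the site
    cookie["visitor_id"]["max-age"] = 60 * 60 * 24 * 365   # persist for a year
    return cookie.output(header="Set-Cookie:")

if __name__ == "__main__":
    print(tracking_cookie_header())
```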

There is no single solution to preventing the increasing erosion of personal privacy on the Web, says Madan Lal Bhasin of the Business School at SungKyunKwan University in Seoul, South Korea. Writing in the latest issue of the International Journal of Internet Marketing and Advertising (2008, vol 4, pp 213-240), he describes how new e-commerce technology has increased the ability of online retailers and others to collect, monitor, target and sell personal information about their customers to third parties.

Countless companies across the globe are doing just that in ways that were not dreamed of before the advent of the Web. Moreover, the emergence of so-called social media, or web 2.0, sites such as MySpace and Facebook has led to a new generation of privacy issues that go beyond those seen with conventional e-commerce websites.

As such, online consumer and web user privacy is becoming an ever keener focal point among cyber activists as well as among governments and regulators. That said, when it is the governments themselves losing and abusing the personal data of millions of taxpayers (see recent UK news), then the notion of any government protecting one’s privacy becomes absurd. Nevertheless, in the long-term, finding a balance between absolute personal privacy and the smooth operation of commerce and social sites in cyberspace poses a significant challenge.

Bhasin and colleagues point out that there are grave dangers for corporations that collect and use personal information while ignoring legislative and regulatory warning signs on privacy. Indeed, such abuse could prove very costly, not only in terms of punitive fines from regulators but also through loss of business among customers increasingly aware of their privacy rights. In the worst-case scenario, it is not beyond the realms of possibility that a company abusing standard privacy etiquette to the extreme could collapse, should word spread and users boycott the site or mount retaliatory attacks of their own against the company’s web servers. Regardless, many companies can and do repeatedly flout the complex rules and regulations that govern privacy in the US, Europe, and elsewhere.

Technology that protects consumer privacy must work without stifling e-commerce. It must somehow be foolproof and be entirely transparent to end-users. Unfortunately, no such technology yet achieves this. There are countless personal software products, such as anti-spyware programs, cookie cutters and anonymous proxies, as well as other solutions, such as the Firefox plugins NoScript (which blocks all scripting on a website) and Adblock Plus (which blocks advertisements). These can reduce the chances of private data being sucked from an individual’s web browsing habits.

However, there are hundreds of such programs, each with a slightly different purpose. The field is heavily fragmented, and many users are not only unaware of these programs, they are also generally unaware of the existence of spyware and cookies. An additional problem arises when novice users, having heard rumour of spyware, download tools without taking advice. There are well-known legitimate tools available, but there are also many malware surrogates of those tools, which often rank higher in the search engine results pages and so are more prominent. Installation of such rogue programs can result in deeper privacy compromises than the user hoped to avert.

Similarly, software that encrypts data, deletes history files or shields your computer from apparently benevolent, but potentially malicious, applications is available, but many users are again unaware of the issues intrinsic to using cyberspace and so do not use such programs. Rogue versions of every kind of protective software exist to exploit the novice user.

Legitimate e-commerce and web 2.0 sites have transparent privacy policies. These sites and others may also use online seals of trustworthiness and browser certificates to demonstrate credibility. However, such statements and badges are only useful if the companies that display them adhere to the underlying principles. Any company could wear a badge of honour, and yet even large, well-known companies do not necessarily comply fully with their own privacy policies, and they allow trust certificates to expire, something many users simply ignore without realising the implications.
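Checking whether a site has let its certificate lapse is trivial to automate, which makes the lapses all the more careless. Here is a minimal sketch using Python’s standard ssl module; the hostname is just a placeholder.

```python
import socket
import ssl
import time

def days_until_expiry(host: str, port: int = 443) -> float:
    """Return the number of days until the host's TLS certificate expires."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()                    # parsed certificate, as a dict
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    return (expires - time.time()) / 86400

if __name__ == "__main__":
    print("example.com:", round(days_until_expiry("example.com")), "days left")
```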

“Unfortunately, there is no ‘single’ solution to stop the erosion of privacy in cyberspace – no single law that can be proposed or single technology that can be invented to stop the profilers and spies in their tracks,” Bhasin re-asserts. He concludes that, “The battle of privacy, of course, must be fought on three fronts – legal, political and technological – and each new assault must be vigilantly resisted as it occurs.” Whether or not individuals will ever have the weaponry to win the battle is a different matter; we can try, but I suspect the only truly private approach is that bundle of cash stuffed in your mattress.

Most Commented Posts on Sciencebase


If you have ever wondered what gets people chatting on the Sciencebase Blog, and why the site has now passed the 3000 newsfeed subscriber mark and is closing in on 5000, then you might like to check out this selection of recent posts that, according to a neat little WordPress plugin, are the posts with the most comments. Alex King’s Popularity Contest plugin can do the same thing, incidentally.
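For the curious, the plugin is PHP, but the heavy lifting amounts to little more than the query sketched below, written here in Python and assuming the PyMySQL driver, the default wp_ table prefix and placeholder connection details.

```python
import pymysql  # assumes the PyMySQL driver is installed

def most_commented(limit: int = 10):
    """Return (title, comment count) for the most-commented published posts."""
    conn = pymysql.connect(host="localhost", user="wp_user",
                           password="secret", database="wordpress")
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT post_title, comment_count FROM wp_posts "
                "WHERE post_status = 'publish' AND post_type = 'post' "
                "ORDER BY comment_count DESC LIMIT %s",
                (limit,),
            )
            return list(cur.fetchall())
    finally:
        conn.close()

if __name__ == "__main__":
    for title, count in most_commented():
        print(f"{count:4d}  {title}")
```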

It makes for interesting reading, not least because it reveals just how diverse the posts are that catch your interest. You can see a selection of the top-commented posts compiled in January 2008 below.

Some posts obviously pique the curiosity of school students working on science assignments, others reach out to those interested in new avenues of research in medicine, and yet others touch the raw nerves often exposed in the evolution-creation debate. Some, like the Lego and the mp3 player items simply entertain and provide a little education at the same time.

I may make this a regular feature, so watch out for an update soon, and don’t forget to click the titles that catch your eye most and leave your own comments to help keep the debate rolling along.

Spammatical Errors

Akismet traps spam

I usually ignore the comment spam folders on this website, as per my own advice. Occasionally, however, I will scan them quickly. I do so if a regular reader has commented and has emailed to say that their comment is yet to appear. Legitimate comments do sometimes get caught in the Akismet netting. I can then add the individual to the filter whitelist and approve the comment.

Spam comments usually come in one of a few limited types. The first is the straightforward nonsense list of random lewd keywords, Rx ingredients, and messages pertaining to the impossible enhancement of various organs, and it is not Messrs Hammond nor Henry Willis and his Sons to which I am referring here. The second type is the bizarre one-word message saying: “Cool!”, “Nice,” “Sorry,” or “Interesting”. When you first see this kind of message, it may give a blogger a little ego boost (about 0.000003154%). But, after the 10,376th you begin to doubt their sincerity, especially as they are usually accompanied by links to lewd keywords, Rx ingredients and the enhancement of various… you know the rest.

Anyway, there is another kind of comment spam and that is the kind that resembles a genuine comment but then lets itself down with a stupid link to a dumb site. It’s usually a brief sentence or phrase. Sometimes it will be an entirely random string of words, presumably scraped from an online text, but occasionally it will seem to actually be attempting to engage in a conversation via a blog’s comments.

You might see phrases like: “Hi Guys! What Your Site Powered By?” and a link to some expensive software, “My brother Tom’s been working real hard all year, but he’s struggling to make ends meet. How do you think he could improve his credit rating?” and a link to a credit card site, or perhaps “Let’s keep in touch we can help each other with sites,” and a link to some unknown web hosting company. Even bizarre queries such as “What effects did katrina on mississippi?” with an insurance link appear every now and then.

Of course, at this stage in blogging history, most bloggers recognise these messages as detrimental to their sites since, once again, they will have the enhancing, Rx and lewd keyword links built in. But it’s the unusual style in which some are written that intrigues me. I don’t think it says anything much about the psychology of spammers, especially those that are nothing but spewing bots, nor about anything deep taking place in English lessons. What is intriguing is how sophisticated the phrasing can be before it is let down by a slip of syntax or grammar.

For instance, a recent commenter was able to construct the following quite complex sentence: “Your website is beautifully decorated and easily navigated.” And yet they blew it with their second line: “I have enjoyed visiting the site today and visit again,” which unfortunately doesn’t parse. Similarly, “Some nice article here. thanks for it.” not only starts a “sentence” with a lower-case “t” but also contains a serious mismatch between the quantities discussed.

Admittedly, some of the less exact grammar comes from spam originating in parts of the world where the native tongue may not be English. Personally, I would be useless at spamming in Portuguese, Mandarin, Hindi, or any of a few dozen other languages. I could probably scrape through with a spam in French, German, Italian, or Spanish, although I’d have to have an international lewd word dictionary to hand to do so.

In the following comment spam, there is almost subtle use of the word “seldom”, but it lies in stark contrast to the quality of grammar in the rest of the phrase: “This is really fresh idea of the design of the site! I seldom met such in Internet. Good Work dude!”

An easy target is this comment, which appears repeatedly: “I’d prefer reading in my native language, because my knowledge of your languange is no so well. But it was interesting! Look for some my links”. Yes, if that one had escaped Akismet and I’d approved it, I can just imagine readers dashing off to look for those links, which, you guessed it, pointed to some great insurance deals on organ enhancement drugs.

PubMed Central Submission Now Mandatory

The US National Institutes of Health (NIH) has a Public Access Policy that is set to become mandatory following President Bush’s approval on December 26th, 2007. This change means that NIH-funded researchers will be obliged to submit an electronic version of any of their final, peer-reviewed manuscripts to PubMed Central as soon as the paper has been accepted for publication in a journal.

Many researchers are pleased with the move and Peter Suber outlines the implications in detail in the January issue of the SPARC Open Access Newsletter. Citing the cons are several of the non-OA publishers who claim that NIH has no rights over the intellectual property of the science it funds and that research papers should remain the copyright of the publishers. They argue that the value added by the publication process will effectively be handed over to PubMed Central by the submission process without compensation. Others argue that the publishers have had it too good for many years.

There remain several outstanding issues which will no doubt be argued over in the months to come.