Search and Cite for Science Bloggers

Crossref for WordPress

A couple of weeks ago I was reading a post by Will Griffiths on the ChemSpider Open Chemistry Web blog about how the DOI citation system of journal article lookups might be improved. The DOI system basically assigns each research paper a unique number with an embedded publisher tag. Enter a DOI into a lookup box (e.g. the DOI lookup on Sciencebase, foot of that page) and it almost instantaneously takes you to the paper in question. I use the DOI system for references in Sciencebase posts all the time.
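To make the "unique number with an embedded publisher tag" idea concrete, here is a minimal Python sketch. It assumes the usual DOI shape of a prefix beginning "10." (the publisher part), a slash, and then an item suffix, and it assumes the public resolver at https://doi.org/. The function names and the sample DOI are mine, purely for illustration.

```python
from urllib.parse import quote


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into its publisher prefix and its item suffix."""
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Build the resolver link that a DOI lookup box would send you to."""
    prefix, suffix = split_doi(doi)
    # Percent-encode unusual characters in the suffix, keeping slashes.
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


print(split_doi("10.1000/182"))
print(doi_to_url("10.1000/182"))
```

The link is built from the identifier alone, which is the whole point: the publisher can move the paper and the DOI still resolves.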

There are a few cons that counteract its various pros. For instance, not all publishers use it, and among those that do there are some who do not implement the DOI for their papers until they are in print, as opposed to online. Despite that, it is very useful and commonly used. Having read Griffiths’ post about OpenURL, a non-proprietary DOI alternative, I thought maybe it would be even more powerful if the concept were taken back another step to the author level, and I came up with the concept of a PaperID, which I blogged about on Chemspy.com. PaperID, I reasoned, could be a unique identification tag for a research paper, created by the author using a central open system (akin to the InChI code for labelling individual compounds). I’m still working out the ins and outs of this concept and, while a few correspondents have spotted potentially fatal flaws, others see it as a possible way forward.

Meanwhile, CrossRef, the association behind the publisher linking network, has just announced a beta version of a plugin for bloggers that can look up and insert DOI-enabled citations in a blog post. I’ve not investigated the plugin in detail yet, but you can download it from a CrossRef page at Sourceforge.net. The CrossRef plugin apparently allows bloggers to add a widget to search CrossRef metadata using citations or partial citations. The results of the search, with multiple hits, are displayed and you then either click on a hit to go to the DOI, or click on an icon next to the hit to insert the citation into your blog entry. I presume they’re using plugin and widget in the accepted WordPress glossary sense of those words, as the plugin is available only for WordPress users at the moment, with a Movable Type port coming soon.
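I have not looked inside the plugin, so the following is not how it works. It is only a rough Python sketch of what a "partial citation" search request might look like, assuming Crossref's public REST API at api.crossref.org (which post-dates this post) and its query.bibliographic and rows parameters. The function name and the sample query are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: Crossref's public REST API, not the plugin's own backend.
CROSSREF_WORKS = "https://api.crossref.org/works"


def citation_search_url(partial_citation: str, max_hits: int = 5) -> str:
    """Build a search URL for a free-text or partial citation."""
    params = {"query.bibliographic": partial_citation, "rows": max_hits}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


print(citation_search_url("Bhasin privacy 2008"))
```

Fetching that URL should return a list of candidate records, each with a DOI, and picking one of several hits is what the widget lets you do with a click.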

According to Geoffrey Bilder, CrossRef’s Director of Strategic Initiatives, “CrossRef is helping jumpstart the process of citing the formal literature from blogs. While there is a growing trend in scientific and academic blogging toward referring to formally published literature, until now there were few guidelines and very few tools to make that process easy.” Well, that reference to a jumpstart sounds like marketing-speak to me, as Sciencebase and dozens of other science blogs have been using DOI for years.

Whether or not I will get around to installing what amounts to yet another WordPress plugin I haven’t decided. I may give it a go, but if it works as well as is promised you will hopefully not see the join. Meanwhile, let me have your thoughts on the usefulness of DOI and the potential of OpenURL and PaperID in the usual place below.

Sciencebase Upgraded

UPDATE: Since I wrote this post back in February 2008, WordPress has gone through many changes. Updates on Sciencebase are automated these days too, which is marvellous.

I finally upgraded the Sciencebase site to the very latest version of WordPress. It had been languishing at version 2.1.3 (can you believe it?) for far too long. There had been not only dozens of security upgrades between that version and the current version 2.3.3, but also various new features that the site was not making full use of.

It was a post by Wayne Liew of WayneLiewDot.com that persuaded me to do the necessary, and his recommendation of a plugin that automates the whole process was the tipping point I needed.

Having carried out the upgrade (more on the actual WordPress upgrade process here) and found only a few minor problems, like a disordered sidebar, a couple of out-of-date plugins and just one irrelevant dead plugin, and fixed those as best as I could, I figured it was time for a weekend break. So my wife and I headed off to the seaside, abandoned the children with their grandparents and took off with the dog for a well-earned break at an artsy country town on the Suffolk coast. (Photos will appear soon on the Sciencebase Flickr account). Hence this trivial and possibly pointless post.

Back with a more substantial science-based post later this week.

PaperID – An Open Source Identifier for Research Papers

As a journalist, I receive a lot of press releases that cite “forthcoming” papers. Depending on the publisher, one can usually find the paper in a pre-press state on their website. However, it’s often the case that the DOI does not go live at the same time as the embargo expires on the press release, and so, although I might legitimately publish an article about the research, I cannot use the DOI as the reference and must use the direct URL for the paper. Unfortunately, some publishers then move the paper when it publishes, so the link I used ends up broken.

Moreover, this cannot be useful for authors themselves in that a paper that does not make the grade at the International Journal of Good Stuff and ends up being resubmitted to the Parochial Bulletin of Not So Good Stuff will gain a different identification code along the way.

Will Griffiths on ChemSpider was recently discussing the possibility of an OpenURL system. I think we could go one step further.

A simple standardized way of generating a unique identifier for each and every paper that would be transportable between different phases of the publication process from submission to acceptance and publication, or rejection and resubmission elsewhere, would be a much better way of registering papers. The identifier would be created at the point when the final draft is ready to be mailed to the first editorial office in the chain, perhaps based on timestamp, lead author initials, and standard institution abbreviation. It could be the scientific literary equivalent of an InChIkey for each research paper.
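As a thought experiment only, here is a minimal Python sketch of how such an identifier might be assembled from the three ingredients suggested above. The format, the separator, and the function name are all my own assumptions. No PaperID specification exists, and a real scheme would need a central registry to catch collisions.

```python
from datetime import datetime, timezone


def make_paper_id(initials: str, institution: str, when: datetime) -> str:
    """Compose a hypothetical PaperID: UTC timestamp, initials, institution."""
    # Normalise to UTC so the same moment gives the same stamp everywhere.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{initials.upper()}-{institution.upper()}"


# Invented example: initials "db", institution abbreviation "cam".
ready = datetime(2008, 2, 1, 9, 30, 0, tzinfo=timezone.utc)
print(make_paper_id("db", "cam", ready))
```

Because the tag is minted before the first submission and depends on nothing the publisher controls, it would travel unchanged through rejection, resubmission and final publication.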

There would have to be a standardized validation system, so that authors were sure to be using the right system, but that could be established relatively painlessly through the big institutions, be networked and have cross-checking to avoid duplicates. And, of course, be open source, open access.

The possibilities are endless. PaperID would create an electronic paper trail from author through preprint, in press, to online, and final publication. It might even be back-extended into the area of Open Notebook Science and equally usefully into archival, review, and cross-referencing.

DOI is useful most of the time, OpenURL sounds intriguing, but PaperID could be revolutionary.

No Spies Under My Bed

Currently, the only truly effective way consumers can stop the collection of their personal data when shopping is not to use the internet, to be paid and to pay for everything in cash, and to hide their money in their mattress.

More seriously, most of us will continue to use web services despite privacy concerns. You can try to opt out of marketing schemes or reconfigure your web browser to reject advances from sites that offer cookies or install spying applications. However, most such rejections will prevent you from trading on most e-commerce sites altogether. So, cookies will crumble; there’s no two ways about it if you want to shop online or use web 2.0 interactive sites. You can, of course, use software to delete those cookies as soon as you’ve finished your interaction with the site and so gain a little privacy and prevent the sites tracking where you went after you left when you visit a second time. But, either way, they’re going to get lots of useful information while the cookie lasts.

There is no single solution to preventing the increasing erosion of personal privacy on the Web, says Madan Lal Bhasin of the Business School at SungKyunKwan University in Seoul, South Korea. Writing in the latest issue of the International Journal of Internet Marketing and Advertising (2008, vol 4, pp 213-240), he describes how new e-commerce technology has increased the ability of online retailers and others to collect, monitor, target and sell personal information about their customers to third parties.

Countless companies across the globe are doing just that in ways that were not dreamed of before the advent of the Web. Moreover, the emergence of so-called social media web 2.0 sites, such as MySpace and Facebook, has led to a new generation of privacy issues that go beyond those seen with conventional e-commerce websites.

As such, online consumer and web user privacy is becoming an ever keener focal point among cyber activists as well as among governments and regulators. That said, when it is the governments themselves losing and abusing the personal data of millions of taxpayers (see recent UK news), then the notion of any government protecting one’s privacy becomes absurd. Nevertheless, in the long-term, finding a balance between absolute personal privacy and the smooth operation of commerce and social sites in cyberspace poses a significant challenge.

Bhasin and colleagues point out that there are grave dangers for corporations that collect and use personal information while ignoring the legislative and regulatory warning signs on privacy. Indeed, such abuse could prove to be very costly not only in terms of putative fines from regulators but also through loss of business among customers increasingly aware of their privacy rights. In the worst-case scenario it is not beyond the realms of possibility that a company abusing standard privacy etiquette to the extreme could collapse should word spread and users boycott the site or mount retaliatory attacks of their own against the company’s web servers. Regardless, many companies can and do repeatedly flout the complex rules and regulations that govern privacy in the US, Europe, and elsewhere.

Technology that protects consumer privacy must work without stifling e-commerce. It must somehow be foolproof and be entirely transparent to end-users. Unfortunately, no such technology yet achieves this. There are countless personal software products, such as anti-spyware programs, cookie cutters, anonymous proxies, and other solutions, such as Firefox plugins like NoScript (which blocks all scripting on a website) and AdBlockPlus (which blocks advertisements). These can reduce the chances of private data being sucked from an individual’s web browsing habits.

However, there are hundreds of such programs, each with a slightly different purpose. The field is heavily fragmented and many users are not only unaware of these programs but also generally unaware of the existence of spyware and cookies. An additional problem arises when novice users, having heard rumour of spyware, download tools without taking advice. There are well-known legitimate tools available, but there are also many instances of malware surrogates of those tools that often rank higher in the search engine results pages and so are more prominent. Installation of such rogue programs can result in deeper privacy compromises than the user hoped to avert.

Similarly, software that encrypts, deletes history files or shields your computer from apparently benevolent, but potentially malicious, applications is available but many users are again unaware of the issues intrinsic to using cyberspace and so do not use such programs. Rogue versions of every kind of protective software exist to exploit the novice user.

Legitimate e-commerce and web 2.0 sites have transparent privacy policies. These sites and others may also use online seals of trustworthiness and browser certificates that demonstrate credibility. However, such statements and badges are only useful if the companies that display them adhere to the underlying principles. Any company could wear a badge of honour, and yet even large, well-known companies do not necessarily comply fully with their own privacy policies and they allow trust certificates to expire, something many users simply ignore without realising the implications.

“Unfortunately, there is no ‘single’ solution to stop the erosion of privacy in cyberspace – no single law that can be proposed or single technology that can be invented to stop the profilers and spies in their tracks,” Bhasin re-asserts. He concludes that, “The battle of privacy, of course, must be fought on three fronts – legal, political and technological – and each new assault must be vigilantly resisted as it occurs.” Whether or not individuals will ever have the weaponry to win the battle is a different matter. We can try, but I suspect the only truly private approach is that bundle of cash stuffed in your mattress.

Most Commented Posts on Sciencebase

If you have ever wondered what gets people chatting on the Sciencebase Blog and why the site has now almost reached the 5000 newsfeed subscriber point, then you might like to check out this selection of recent posts that, according to a neat little WordPress plugin, are the posts with the most comments. Actually, Alex King’s Popularity Contest can do the same thing.

It makes for interesting reading, not least because it reveals just how diverse the posts are that catch your interest. You can see a selection of the top-commented posts compiled in January 2008 below.

Some posts obviously pique the curiosity of school students working on science assignments, others reach out to those interested in new avenues of research in medicine, and yet others touch the raw nerves often exposed in the evolution-creation debate. Some, like the Lego and the mp3 player items simply entertain and provide a little education at the same time.

I may make this a regular feature, so watch out for an update soon and don’t forget to click the titles that catch your eye most and leave your own comments to help keep the debate rolling along.

Spammatical Errors

I usually ignore the comment spam folders on this website as per my own advice. Occasionally, however, I will scan them quickly. I do so if a regular reader has commented and has emailed to say that their comment is yet to appear. Legitimate words do sometimes get caught in the Akismet netting. I can then add the individual to the filter whitelist and approve the comment.

Spam comments usually come in one of a few limited types. The first is the straightforward nonsense list of random lewd keywords, Rx ingredients, and messages pertaining to the impossible enhancement of various organs, and it is not to Messrs Hammond nor Henry Willis and his Sons to which I am referring here. The second type is the bizarre one-word message saying: “Cool!”, “Nice,” “Sorry,” and “Interesting”. When you first see this kind of message, it may give you, the blogger, a little ego boost (about 0.000003154%). But, after the 10,376th you begin to doubt their sincerity, especially as they are usually accompanied by links to lewd keywords, Rx ingredients and the enhancement of various…you know the rest.

Anyway, there is another kind of comment spam and that is the kind that resembles a genuine comment but then lets itself down with a stupid link to a dumb site. It’s usually a brief sentence or phrase. Sometimes it will be an entirely random string of words, presumably scraped from an online text, but occasionally it will seem to actually be attempting to engage in a conversation via a blog’s comments.

You might see phrases like: “Hi Guys! What Your Site Powered By?” and a link to some expensive software, “My brother Tom’s been working real hard all year, but he’s struggling to make ends meet. How do you think he could improve his credit rating?” and a link to a credit card site, or perhaps “Let’s keep in touch we can help each other with sites,” and a link to some unknown web hosting company. Even bizarre queries such as “What effects did katrina on mississippi?” with an insurance link appear every now and then.

Of course, at this stage in blogging history, most bloggers recognise these messages as detrimental to their sites as, once again, they will have the enhancing, Rx and lewd keyword links built in. But it’s the unusual style in which some are written that intrigues me. I don’t think it says anything much about the psychology of spammers, especially those that are nothing but spewing bots, nor about anything deep taking place in English lessons. What intrigues is how quite sophisticated phrasing can be let down by a slip of syntax or grammatical integrity.

For instance, a recent commenter was able to construct the following quite complex sentence: “Your website is beautifully decorated and easily navigated.” and yet they blew it with their second line: “I have enjoyed visiting the site today and visit again,” which unfortunately doesn’t parse. Similarly, “Some nice article here. thanks for it.” not only starts a “sentence” with a lower case “t” but also has a serious mismatch between the quantities discussed.

Admittedly, some of the less exact grammar comes from spam originating in parts of the world where the native tongue may not be English. Personally, I would be useless at spamming in Portuguese, Mandarin, Hindi, or any of a few dozen other languages. I could probably scrape through with a spam in French, German, Italian, or Spanish, although I’d have to have an international lewd word dictionary to hand to do so.

In the following comment spam, there is almost subtle use of the word “seldom”, but it lies in stark contrast to the quality of grammar in the rest of the phrase: “This is really fresh idea of the design of the site! I seldom met such in Internet. Good Work dude!”

An easy target is this comment, which appears repeatedly: “I’d prefer reading in my native language, because my knowledge of your languange is no so well. But it was interesting! Look for some my links”. Yes, if that one had escaped Akismet and I’d approved it I can just imagine readers dashing off to look for those links, which, you guessed it, pointed to some great insurance deals on organ enhancement drugs.

PubMed Central Submission Now Mandatory

The US’s National Institutes of Health (NIH) has a Public Access Policy that is set to become mandatory following President Bush’s approval on Dec 26th 2007. This change will mean that NIH-funded researchers will be obliged to submit an electronic version of any of their final, peer-reviewed manuscripts to PubMed Central, as soon as the paper has been accepted for publication in a journal.

Many researchers are pleased with the move and Peter Suber outlines the implications in detail in the January issue of the SPARC Open Access Newsletter. Citing the cons are several of the non-OA publishers who claim that NIH has no rights over the intellectual property of the science it funds and that research papers should remain the copyright of the publishers. They argue that the value added by the publication process will effectively be handed over to PubMed Central by the submission process without compensation. Others argue that the publishers have had it too good for many years.

There remain several outstanding issues which will no doubt be argued over in the months to come.

40320, Such a Significant Figure

I am currently writing a post about pico and femto satellites for Sciencebase. These devices are tiny compared to the enormous one-tonne behemoths many of us would picture if asked to visualise an artificial satellite (more on that later). Anyway, the earth’s escape velocity at sea level from a standing start was a figure I needed to hand while writing the piece.

I found a value in metres per second, converted it to km/h and did a quick search with Google Toolbar just to get some references and to confirm my calculation. The km/h value, as you may have guessed, comes out at about 40320. However, Google’s auto-suggest offered me a search for the phrase “40320 plain bob major”, which was odd, to say the least, but the figure would have been obvious to a bell-ringing friend of mine. He would have immediately spotted it as an astoundingly long peal of bells. In fact, this very long peal was rung in 1963 in Loughborough, England, using eight tower bells in all possible permutations; 8 multiplied by 8 factorial (8×8!) comes to 322,560 blows. Apparently, it took more than 18 hours to ring the changes all the way through.

Of course, the peal of 40320 arises because of the 8 factorial connection, 8×7×6×5×4×3×2×1 (8!) and has nothing to do with earth’s escape velocity, but it hooked me on a bit of guided searching looking for other significant mentions of the number 40320.

40320 is the number of minutes in 4 weeks and so February, with its usual 28 days, should be designated “International Factorial Appreciation Month” according to one author (except in leap years, such as 2008, of course).
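For anyone who wants to check the sums so far, here is a short Python sketch. The only figure not taken from this post is the escape velocity itself, where I have assumed the commonly quoted rounded value of 11.2 km/s (11,200 m/s).

```python
import math

# All possible orderings of eight bells: 8! changes.
changes = math.factorial(8)

# Each change strikes all eight bells once, so 8 x 8! blows in total.
blows = 8 * changes

# Minutes in four weeks: 4 weeks x 7 days x 24 hours x 60 minutes.
minutes_in_four_weeks = 4 * 7 * 24 * 60

# Assumed escape velocity of 11,200 m/s converted to km/h.
escape_velocity_ms = 11_200
escape_velocity_kmh = escape_velocity_ms * 3600 // 1000

print(changes, blows, minutes_in_four_weeks, escape_velocity_kmh)
```

Three quite different routes lead to the same 40320, while the blow count is the 322,560 quoted for the Loughborough peal.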

Kentucky 40320 is a spot on Ford Hampton Road in Kentucky, USA.

Item 40320 in the Sigma-Aldrich catalog of chemicals is 2,2-dimethylglutaric acid and bug number 40320 in Ubuntu Linux is “devhelp starts with an “empty” page area, which is not redrawn”, whatever that means. Apologies to Ubuntu fans, I’ve not been there, nor done that yet.

The PubMed ID (PMID) 40320 points to a paper in the August 1979 issue of the journal Tijdschr Diergeneeskd entitled “Relationship between the presence of meconium in newborn lambs and postnatal pH and blood gas tension levels” and Tinyurl page 40320 displays a scan of a cheque for $950 with the filename bloodmoney.jpg.

Assuming Rudolph is at the front, there are 40320 ways to arrange the other eight reindeer (this simply relies on the 8! value mentioned earlier and could apply to clusters of any eight objects). It ignores “Olive the other reindeer”, you know the one who used to “laugh and call him names”. At the time of writing there were 207 cars listed for sale according to Google that had 40320 miles on the clock and just 5 with that same number in kilometres, while according to Cancerwise, 40320 women will be diagnosed with uterine cancer this year.

40320 is the item number for a “please shower” sign at BackyardGardener.com and BIOS 40320 is the Aquatic Conservation course covering global freshwaters, science and policy at University of Notre Dame.

Most of these various facts are totally unrelated, except those invoked by 8! Amazing what you learn writing about femto satellites. If you have any other fascinating examples of the number 40320 please give them a mention in the comments box below.

An Amply Adequate Sufficiency of Tautology

As Russ Swan of Laboratory Talk pointed out in reference to my previous post on the redundancy of the phrase “male semen”, there are numerous other examples around. For instance, the phrase HIV virus is equally redundant as it literally says, “human immunodeficiency virus virus”, likewise ATM machine (automated teller machine machine), PIN number (personal identification number number) and the Sierra Nevada mountain range (Snowy mountain range mountain range). There are lots more everyday examples of interlanguage tautologies of the latter kind on Wikipedia.

But there are plenty of examples in science and technology. For instance, this patent title – RAID array configuration synchronization at power on – is just one of many examples that cite the acronym RAID followed by the word array, despite RAID already standing for “redundant array of independent disks”. Ironic indeed that the phrase itself contains the word redundant.

HIV virus shows up countless times throughout the media, and no less in scientific journal article titles, such as this one – Prevalence of HIV virus among patients. I even saw the phrase “female girls” in one reference on the subject of Rett syndrome. And, there are plenty of examples along the lines of LED display, LCD display, and DC current.

Not quite a pure rhetorical tautology, the graphics acronym TIFF is often accompanied by the word “file” as in a TIFF file, which literally means “tagged image file format file”. Same goes for the phrase pertaining to Adobe’s almost ubiquitous and much-maligned “PDF format”, which expands to “portable document format format”. Then there are phrases like DOS operating system (disk operating system operating system), Windows NT technology (Windows New Technology technology), BASIC code (Beginners’ All-purpose Symbolic Instruction Code code), and ISDN network (Integrated Services Digital Network network).
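Spotting these by machine is simple enough in principle: expand the acronym and see whether the word tacked on afterwards is already in the expansion. Here is a minimal Python sketch of that check, using a handful of the expansions quoted in this post. The lookup table and function name are purely illustrative, and a real checker would need a far bigger dictionary.

```python
# Expansions as quoted in this post (a tiny, illustrative table).
EXPANSIONS = {
    "HIV": "human immunodeficiency virus",
    "ATM": "automated teller machine",
    "PIN": "personal identification number",
    "RAID": "redundant array of independent disks",
    "TIFF": "tagged image file format",
    "PDF": "portable document format",
}


def is_redundant(phrase: str) -> bool:
    """True if the word after a known acronym repeats part of its expansion."""
    words = phrase.split()
    if len(words) != 2:
        return False
    acronym, trailing = words
    expansion = EXPANSIONS.get(acronym.upper())
    if expansion is None:
        return False  # unknown acronym: no verdict either way
    return trailing.lower() in expansion.split()


for phrase in ("PIN number", "RAID array", "TIFF file", "PIN code"):
    print(phrase, is_redundant(phrase))
```

Checking against every word of the expansion, not just the last, is what lets it catch RAID array and TIFF file as well as the tidier PIN number type.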

There’s a nice extensive, long itemised listing of redundant tautologies to be found located here, but is there any purposeful point to drawing your attention to these phrases? Not really, but they’re great fun to find so if you discover any others please let me know via the comments box.

Learn to Let Go of Your Spam Folders

In the spirit of recent posts about conversational spam and other such topics, I thought I’d let you into a little secret. My blog comment spam folder fills up every day but thanks to Akismet you never get to see the spam on the blog itself. Same goes for my GMail account spam folder (I route all email through it for that very reason). You probably find the same. Several hundred spam comments every day and the same again in email spam. It can get out of control during the holiday season when you’re not there to check every day. So, what to do with it all?

You have two options: you could quickly scan page after page of spam, which can add up to a lot of time each week looking for false positives (and that’s even if you are greasing the spam) or you could simply learn to let go of your spam folders.

Both Akismet for comment spam and GMail for email spam automatically delete the contents of their respective spam folder once entries reach a certain age. The trick is not to be tempted to keep checking the spam folders, just in case. Just let the filters do their job and ignore the contents. If there are false positives, so what? 99.999% of the stuff that is filtered (once you’ve trained the system by properly assigning definite false positives and false negatives early on) is most certainly spam.
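I don’t know how either service implements its clean-up, but the principle of age-based purging is easy to sketch in Python. Everything here is assumed for illustration: the record layout, the function name, and the max_age_days threshold, which is a made-up parameter and not a documented setting of Akismet or GMail.

```python
from datetime import datetime, timedelta


def purge_old_spam(spam, now, max_age_days=15):
    """Keep only spam entries newer than the cutoff; older ones vanish."""
    cutoff = now - timedelta(days=max_age_days)
    return [item for item in spam if item["received"] >= cutoff]


now = datetime(2008, 1, 7)
folder = [
    {"text": "Cool!", "received": datetime(2007, 12, 10)},
    {"text": "Nice", "received": datetime(2008, 1, 5)},
]
print(purge_old_spam(folder, now))
```

The folder empties itself from the oldest end without anyone ever having to open it, which is rather the point of this post.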

Do you really need to wade through page after page of ads for “lager beasts”, “vI@ gera gel”, and “dr@gs Rx online”? No? Me neither. Just learn to let go and you will feel a weight lifted from your shoulders. After I got back online following the Christmas break (other winter solstice festivals are available), Sciencebase had accumulated 14052 spam comments. One click on “Delete All” removed the whole lot from the blog’s database.

I am sure some readers will have found that no amount of training prevents a regular slurry of false positives, so those poor unfortunates may have to ignore this advice.

For those with a 99.9999% hit rate, the forget-about-it approach is such a powerful exercise in self-control that it’s almost Zen, although I’m sure the psychologists in the audience will have something to say about that (in fact please do, but make sure your comments don’t look spammy).