Archaeology and…machine learning?

Studying 2,600-year-old artifacts with algorithmic techniques

With the increase in the use of machine learning and artificial intelligence across every domain, it is now commonplace to find reports about how humans are likely to become increasingly otiose in the world to come. A more clear-eyed analysis of how technology is actually being used, however, reveals that these pronouncements are still very much premature, and that an alternate (and more plausible) outcome is one where technology doesn’t replace human labour but supplements it in complex ways.

An excellent example of this kind of work is documented in a 2016 Proceedings of the National Academy of Sciences paper by a team from Tel Aviv University. A central question in biblical scholarship concerns when exactly the various parts of the Bible were written, a question made particularly complex because we have so little background information about life 2,500 years ago. Some traction was gained through the innovative use of machine learning algorithms to estimate the level of literacy in a community, giving us an idea of whether its people would have been capable of producing a work of enormous complexity such as the Bible. The project considered 16 inscriptions found in the area of the desert fort of Arad.

Each of these was an ostracon, a piece of broken pottery used as a writing surface. Because the sherds are chipped, typically only brief excerpts survive. In addition, the writing fades over time, making individual pieces difficult to read, let alone to compare and contrast. That’s where the tech comes in.

After restoring the script as much as possible, the researchers used machine learning software to identify individual characters and then compare the same letter across different ostraca on a range of metrics: overall shape, the angles between strokes, the character’s center of gravity, and its horizontal and vertical projections. Allowing for some range of handwriting variability, the programme identified distinct authors through letters that exceeded a threshold of difference. Through this method, the authors concluded that there were a minimum of six authors for the artifacts they had.
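To make the comparison concrete, here is a minimal sketch of the kind of per-letter test described above. It is not the team’s actual pipeline: the feature names and the threshold are illustrative, and in practice each value would come from an image-processing step that is not shown here.

```python
import numpy as np

def feature_vector(letter):
    # Illustrative features only: in a real system these would be
    # computed from the restored character image.
    return np.array([
        letter["shape_score"],
        letter["mean_stroke_angle"],
        letter["centre_of_gravity_x"],
        letter["centre_of_gravity_y"],
        letter["horizontal_projection"],
        letter["vertical_projection"],
    ])

def likely_same_hand(letter_a, letter_b, threshold=1.5):
    """Compare two instances of the same letter from different ostraca.

    If the distance between feature vectors exceeds the threshold
    (chosen to allow for ordinary handwriting variability), flag the
    pair as probably written by distinct authors.
    """
    distance = np.linalg.norm(feature_vector(letter_a) - feature_vector(letter_b))
    return distance <= threshold
```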

This was clearly a case of machine learning performing tasks that humans cannot even dream of doing with the naked eye, and someone who wanted to push the narrative of a coming apocalypse of job losses could treat this research as confirming their world view. But a closer examination of the full range of methods used reveals a slightly more complex story.

Although the programme did identify at least six authors, this fact by itself says very little about the extent of the literate population. After all, it could be that only six people in the area were literate, or that many more were. To address this question, the programme’s results were analyzed by human researchers, who constructed a model of the hierarchical relationships between the authors and intended recipients of each message.

Since people from every sociopolitical stratum appeared to be represented, the authors concluded that literacy was likely widespread among the inhabitants of this area of the kingdom of Judah, near Fort Arad, around 600 BCE.

For the wider audience, the lesson from this study is that we shouldn’t be too certain that machine learning and AI will mean the end of jobs, since it is still possible to adapt older ways of working so that they incorporate technology while relying substantially on human minds and hands. Rather than prophesying about the coming machine learning revolution in general terms, we should engage in nuanced studies and projections of individual fields and sub-fields.

The future is neither completely opaque nor transparent, and what we can glean about it is almost definitely going to be fragmentary, tentative, and context-dependent, instead of a single grand narrative.


Preview of Society for Scholarly Publishing and BookExpo

Next week in the US, two big annual events will take place: the Society for Scholarly Publishing Annual Meeting in Chicago and BookExpo in New York.

The theme of the Society for Scholarly Publishing’s Annual Meeting is “Scholarly Publishing at the Crossroads: What’s working, what’s holding us back, where do we go from here?” and, as the organization celebrates its 40th anniversary, the meeting will focus on past and future practices, technology, establishing and reaching new markets, and how publishers keep up with the changing needs of researchers and academics as both authors and users.

This year’s BookExpo is “Reimagined.” According to parent company Reed Exhibitions, BookExpo will become the “first end-to-end business solution for the global publishing industry,” with attendees experiencing “how content creation, rights trading, retail strategy and consumer behavior will increase profit and give you the tools to succeed in today’s shifting marketplace.”

The two events highlight how far scholarly and STM publishing have come in embracing technology in the workflow and addressing user needs as they are today, whereas trade publishing continues to focus on print vs. digital and metadata, and predominantly on adapting an existing system rather than creating something entirely new.

Below are our highlights from both events’ programs: a selection of sessions that can help publishers improve the way they do business.

SSP Annual Meeting

Wednesday, May 30th

8:30–11:30 Pre-Meeting Seminar: Humans, AI, and Decision Making: How Do We Make Use of Data, Text Mining and Machine Learning for Better Decision Making

AI represents a suite of technologies that are already supporting and assisting human decision-making in a whole host of settings. In this seminar, we’ll discuss some of the ways in which publishers and institutions are using big data, semantics and analytics to make smarter strategic decisions.

Thursday, May 31st

10:30–12:00 pm Artificial Intelligence: How Publishers will Benefit from Artificial Intelligence?

Smart publishers are beginning to embrace AI and are weaving it into the core of their business—to source new content, to inform and improve content and for new product development. Publishers are also using AI to reduce costs in their editorial processes.

3:30–4:30 pm Strange Bedfellows: Integrating Editorial and Sales to Maximize Success

The scholarly communications landscape is increasing in complexity. Publishers can no longer afford to allow departments to operate in silos. Sales colleagues at a publishing house need to understand the goals and objectives of their Editorial colleagues—and vice versa—in order to make the most of market conditions and partner effectively.

Friday, June 1st

11:00–12:30 pm New Tools and Trends in Discovery Technologies

With over 2.5 million scholarly articles published each year—more than 8,000 each day—the glut of available scholarly content poses challenges to researchers, authors, publishers, and libraries. For authors and publishers, getting their work discovered and read, and ultimately cited, can be a career-defining challenge. Libraries compete with the open web by providing enhanced discovery services which they hope will be valued by their users. No single solution has emerged to satisfy all of these needs.


BookExpo

Thursday, May 31st

9:45am Leadership Round Table: Publishers on Publishing

This roundtable will feature CEOs from top publishing houses, including Markus Dohle, CEO of Penguin Random House; Carolyn Reidy, President and CEO of Simon & Schuster; and John Sargent, CEO of Macmillan in a powerhouse presentation that will surely be a highlight of BookExpo. Together, these leaders will reflect on industry trends, market highlights, and the power and responsibilities of publishers as global, corporate citizens. Maria A. Pallante, Association of American Publishers President and CEO, will moderate.

11:00am The Content Liberation Movement

Even well into the digital age, publishers have persisted in maintaining processes that confine their businesses to a specific format (usually, the book) and to a single business model. Forward-thinking editors today demand freedom to reuse and repurpose content in innovative, high value ways, especially on mobile devices. Content management systems, though, aren’t fast enough at identifying assets and don’t go far enough when assembling new products.

1:00 pm The State of the Publishing Industry Today

Join Jonathan Stolper, the President of NPD Books, as he breaks down the latest outlook for the US book market. Drawing on data from NPD’s BookScan, PubTrack, and Books & Consumer platforms, this presentation will deliver essential insights into the latest trends from book publishing’s most authoritative source of industry information, including:
 • A recap of key industry performance in 2017/2018
 • The significant trends in content and platform
 • The outlook for digital versus print in the next few years
 • The opportunities (and risks) for publishers and retailers in 2018 and beyond

Friday, June 1st

12:00 pm Keywords: Enhance Discoverability and Increase Sales on Amazon

Hear from technology experts and publishers about how they are using the latest machine learning and AI technology tools to increase discoverability, drive sales, and make more effective marketing decisions.



Stalking the Muse with Kanye West

A technological response to the question of the origins of creativity

Human beings have always had a close affinity for art. Our hominin ancestors etched shells hundreds of thousands of years ago, and we have continued to make and celebrate artistic achievement in an unbroken line since then. But the importance placed on art inevitably raises a question: where does creativity come from?

In Plato’s Ion, Socrates confronts the rhapsode Ion, a performer of epic poetry, and argues that while Ion’s talents are indeed impressive, they are not the application of any skill. Rather, it is divine inspiration coursing through his mind:

Many are the noble words in which poets speak concerning the actions of men; but like yourself when speaking about Homer, they do not speak of them by any rules of art: they are simply inspired to utter that to which the Muse impels them…for not by art does the poet sing, but by power divine. The poets are only the interpreters of the Gods by whom they are severally possessed.

We might reject this as quaint, but what it gets right is that creativity is not generated by an insular process cut off from others and the past, but rather through the interaction of the artist with something outside themselves. But whereas Plato attributed this “something” wholly to the work of the Gods, we now attribute it partially to prior art itself.

As a culture, we recognize that creative work is often part inspiration and part adaptation, with artists drawing on earlier works that influenced their novels, plays, or films. For example, when the smash-hit musical “Hamilton” first appeared on the scene, multiple mainstream sources (Slate, Vulture, The Guardian, The New York Times) traced the influences that inspired Lin-Manuel Miranda to create such a groundbreaking work.

Some artists very clearly outline their influences, such as the beloved children’s book writer and illustrator Maurice Sendak, who made no secret of his antecedents and sources and instead wore them on his sleeve. While reading a biography of William Blake, he stated with rare honesty:

I read Blake because I want to schlep something from him that I can eat raw, have…Why am I clinging to every word Blake says in this book? I’m trying to suck all his strength out.

And it wasn’t just Blake he was drawing from. It was his standard modus operandi, a part of his creative process:

The muse does not come pay visits, so you go out stalking, hoping that something will catch you. Where do I steal from?

While these quotes might suggest that Sendak was simply borrowing other people’s ideas, the real story is far more complicated. Sendak’s “stealing” was not mere appropriation, but a transmutation of prior work into something unseen. We can note the influences, but no one who has read Where the Wild Things Are or In the Night Kitchen can deny that these were Sendak originals, unquestionably terrific works of art.

Sendak shows that even if we draw heavily on past works for inspiration, our art can be wholly our own and new.

Or to put a modern spin on it:

This idea of inspiration sparked PageMajik’s newest idea: an AI engine that analyzes scenes and points out similar contexts and ideas in the works of great authors. For example, if you were writing a dramatic scene involving a dysfunctional family, you might be shown brief excerpts from Long Day’s Journey into Night or August: Osage County.
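As a rough illustration of how an engine might surface similar scenes, here is a minimal sketch using TF-IDF similarity. The “excerpts” are paraphrases invented for the example, and this is only one plausible approach, not a description of how PageMajik’s engine actually works; a production system would use much richer semantic models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A tiny stand-in corpus: paraphrased gists, not real excerpts.
reference_excerpts = [
    "The Tyrone family circles the same old wounds all day and into the night.",
    "The Weston family gathers, and decades of resentment spill out at the table.",
]

def most_similar_excerpt(scene_text, excerpts):
    """Return the reference excerpt most similar to the scene being written."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([scene_text] + excerpts)
    similarities = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return excerpts[similarities.argmax()], similarities.max()

excerpt, score = most_similar_excerpt(
    "A dysfunctional family argues over old grievances at the dinner table.",
    reference_excerpts,
)
print(f"Closest match (score {score:.2f}): {excerpt}")
```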

Why would this be useful for publishers?

Plagiarism and the reuse of previously published material have become serious concerns in the last few years, for self-published authors and bestselling writers alike. The system examines new submissions to make sure they don’t match previously published work.

Why would this be useful to writers?

By enabling Sendak-style inspiration, it gives authors an opportunity to boost their creative ideas: highlighting excerpts from similar works might help them figure out a plot point or find a new and interesting way to interpret a scene.

By making overt some of these influences, this system can ensure that what’s being written really does vary from earlier texts and isn’t just an accidental copy.

As someone who firmly believes we can’t know how good a tech idea is until multiple people use it independently over a decent period of time, I can’t wait to see how this works out.

Journalism in the Age of AI

How technology is upending how we produce and consume the news

An Olympic Achievement

The Washington Post’s coverage of the 2018 Winter Games in PyeongChang was somewhat unusual. Glancing through the paper’s social media feed, the articles and updates might not have looked particularly different. But that was precisely what was unusual. For, you see, much of it was not composed by a human reporter.

The Washington Post Olympics Bot (@WPOlyBot) generated constant updates on Twitter during the Games, letting viewers stay on top of the latest developments. The updates included announcements about events that were about to begin, lines about the winners of events and any notable achievements, and even periodic tallies of the cumulative medals won by each country.

These updates were based on data from sports data companies, ensuring comprehensive coverage without burdening human journalists or relying on human speed and accuracy. While this certainly eased the burden on the human journalists, it was not meant to be a total replacement for them. Rather, it was intended to “free up Post reporters and editors to add analysis, color from the scene and real insight to stories in ways only they can.”
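Under the hood, bots like this typically fill templates from a structured feed. Here is a minimal sketch of that idea; the feed format, field names, and templates are invented for illustration and are not the Post’s actual system.

```python
def render_update(event):
    """Fill in a tweet template for one item from a sports-data feed."""
    templates = {
        "start": "{event_name} begins at {start_time} KST. Follow along for live updates.",
        "result": "{winner} ({country}) wins gold in {event_name}{note}.",
        "tally": "Medal count so far: {tally}.",
    }
    fields = dict(event)
    # Optional colour (a record, an upset) is appended only when present.
    fields["note"] = f", {event['note']}" if event.get("note") else ""
    return templates[event["type"]].format(**fields)

print(render_update({
    "type": "result",
    "winner": "Ester Ledecka",
    "country": "CZE",
    "event_name": "the women's super-G",
    "note": "a stunning upset",
}))
```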

While the benefits of the Olympics Bot are very real, it also has easily spotted limitations. Its data was taken from other sites, which meant it was still dependent on human activity at some point in the chain. Moreover, as you scroll through the Twitter feed, you notice that the tweets are somewhat plain, trading style for clarity. Human journalists, then, are still quite essential to journalism.

A Dowsing Rod for Information

Although it has to be conceded that AI cannot simply replace human journalism, we can still ask whether it can help us approach the avalanche of online content produced every day. One interesting proposal comes from a recent paper by Google’s Yinfei Yang and UPenn’s Ani Nenkova, who propose testing for “content density”.

According to the authors, “content density” is a measure of how much information there actually is in a given piece of writing. It is a way of separating seriously informative articles from mere fluff, ensuring that readers can focus their finite time and energy as effectively as possible on actual content.

To get a sense of the difference between informative and non-informative content, consider the two passages the authors provide to illustrate the distinction, the first dense with reportable facts and the second mostly decorative:


The European Union’s chief trade negotiator, Peter Mandelson, urged the United States on Monday to reduce subsidies to its farmers and to address unsolved issues on the trade in services to avert a breakdown in global trade talks.

Ahead of a meeting with President Bush on Tuesday, Mr. Mandelson said the latest round of trade talks, begun in Doha, Qatar, in 2001, are at a crucial stage. He warned of a “serious potential breakdown” if rapid progress is not made in the coming months.


“ART consists of limitation,” G. K. Chesterton said. “The most beautiful part of every picture is the frame.” Well put, although the buyer of the latest multimillion-dollar Picasso may not agree.

But there are pictures—whether sketches on paper or oils on canvas—that may look like nothing but scratch marks or listless piles of paint when you bring them home from the auction house or dealer. But with the addition of the perfect frame, these works of art may glow or gleam or rustle or whatever their makers intended them to do.

Assuming that journalistic conventions will more or less remain the same, the authors designed a classifier that uses a machine learning approach to differentiate between informative and non-informative text using lexical features (e.g., words and their associated average age of acquisition, imagery, and concreteness) and syntactic features (e.g., the flow between sentences in terms of discourse relations and entity mentions).
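As a simplified stand-in for this approach, the sketch below trains a classifier on toy examples using only word features. The real paper uses the richer lexical and syntactic features listed above, so treat this as the shape of the method rather than a reproduction of it.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled data: 1 = informative, 0 = fluff.
texts = [
    "The EU's chief trade negotiator urged the US to reduce farm subsidies.",
    "Trade talks begun in Doha in 2001 are at a crucial stage, officials warned.",
    "Art consists of limitation; the most beautiful part of a picture is the frame.",
    "With the perfect frame, a painting may glow or gleam or rustle.",
]
labels = [1, 1, 0, 0]

# A linear classifier over word and bigram features.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

test = ["Officials warned of a serious breakdown in global trade talks."]
print(model.predict_proba(test))  # columns: [P(fluff), P(informative)]
```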

The classifier categorized a test set of articles from different domains with 67–75% accuracy. Admittedly this is not quite 100%, and the assumptions about steady journalistic conventions mean the model cannot be applied broadly just yet. Still, by showing that a model can select for content density better than chance, Yinfei Yang and Ani Nenkova open up the possibility of cutting down on time lost wading through the ubiquitous fluff we seem to be awash in.

As impressive as content density is as a measure of newsworthiness, one problem it cannot address is political bias. After all, there is no dearth of sites that produce article after article stuffed to the brim with deeply partisan content, so detecting content density alone is not going to be enough.

Knowhere Else to Go

A startup that tries to deal with precisely this is Knowhere News, which boasts of offering “the world’s most unbiased news”.

The way it works is quite straightforward: the site’s AI engine looks for whatever topic is popular at a given time, scours multiple articles on that topic, and then generates an unbiased version of the news. Since no original reporting is required, the actual writing can take as little as 60 seconds!

To work around the fact that not all news sources are equally reliable, human input is required to pre-set points for trustworthiness to value reliable sources over fringe views.
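Here is a minimal sketch of how pre-set trust scores might weight competing sources when deciding what goes into a story. The scores, source names, and inclusion rule are invented for illustration, not Knowhere’s actual method.

```python
# Human-assigned trust scores, set in advance (illustrative values).
trust_scores = {
    "wire_service": 0.9,
    "national_daily": 0.8,
    "partisan_blog": 0.3,
}

def well_supported_claims(articles, min_support=1.0):
    """Score each claim by the total trust of the sources reporting it."""
    support = {}
    for article in articles:
        weight = trust_scores.get(article["source"], 0.1)  # unknown: low trust
        for claim in article["claims"]:
            support[claim] = support.get(claim, 0.0) + weight
    # Keep only claims with enough trusted support to include in the story.
    return [claim for claim, score in support.items() if score >= min_support]

articles = [
    {"source": "wire_service", "claims": ["records missing"]},
    {"source": "national_daily", "claims": ["records missing", "probe ongoing"]},
    {"source": "partisan_blog", "claims": ["cover-up alleged"]},
]
print(well_supported_claims(articles))  # ['records missing']
```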

For political stories, Knowhere News even produces two additional articles, slanted left and right, in addition to the impartial version. For example, the headlines for a recent topic were:

Impartial: Whistleblower on Trump lawyer finances says records are missing

Left: Whistleblower on Trump lawyer finances fears cover-up

Right: Person who leaked Cohen’s financial information questioned

This example really emphasizes how valuable such a tool can be in our era of hyper-partisan politics. But its limitations are also clear: for one, it too depends on human journalists to create the mass of articles it works from.

More importantly, an assumption animating this project is that the “impartial” view from the center is the most appropriate one to take. While this might very well be true in some cases, there is a risk of legitimating extremist views if we are always willing to meet in the middle. For example, if one political party started moving towards fascism while the other remained moderate, the impartial view generated by Knowhere News would be a moderation of fascist claims rather than a repudiation of them. It is important to recognize that while moderation and conciliation are valuable ideals, they can be taken too far.

A Future of Robot Journalism?

Admittedly, the examples examined here don’t exactly mean pink slips for journalists just yet. The AI in question employs machine learning algorithms that rapidly sift through existing data to provide updates and create bias-free versions, but these algorithms still need human hands to create the material they draw on.

But let’s not get complacent about AI just yet; there are still many paths through which it can make inroads into original journalism. As more academics and cultural influencers become active on social media, it is conceivable that an AI system might directly engage them in journalistic activity. Interviews conducted over email don’t necessarily need a human asking the questions, putting a new spin on the Turing test. And with the network of cameras and audio devices like phones present in every locale in every community, there might one day even be enough raw data for field reporting by AI.

Granted, the technology required for these advances doesn’t look close to materializing just yet. But given the speed at which technology has been overturning entrenched assumptions, it might be hubris to be too cocky about its limitations.

GDPR—how publishers can navigate the choppy waters

If you live in Europe, the odds are that in recent months your inbox has been inundated with emails from pretty much every company you’ve ever had dealings with, asking you whether you’d like to continue to hear from them, or “opt in”. From monthly newsletters to special offers, curated content to advertisements, we as consumers have become accustomed to having our data harvested by companies who then target us with tailored and untailored marketing messages to promote their products and services.

You may have unwittingly forgotten to untick a box when you purchased flights five years ago and have been receiving weekly emails from the airline ever since. Or perhaps, in order to log into a café’s Wi-Fi once upon a time, you were subsequently asked to subscribe to direct mail from them in exchange for a silky-smooth internet connection. And now, finally, you are being given the chance to right all those wrongs and do away with all those unwanted or unsolicited emails once and for all. You may ask why this is happening and why you are being given this golden opportunity to finally cleanse your life of spam. The answer is GDPR.

What is GDPR?

During the course of the last year, citizens of Europe have been collectively rolling their eyes every time they hear any mention of something called the General Data Protection Regulation, or GDPR, as it has become more affectionately known. Coming into effect next week, the new regulation in EU law addresses data protection and privacy for all individuals in the EU, aiming to give consumers control over their personal data and to simplify the regulatory environment for international business around the continent.

In essence, this means that businesses that directly contact consumers can no longer do so without renewed affirmative consent and recorded approval from the individual before any further data is collected. In addition, consumers can demand that any data held on them be accessed, amended or completely deleted whenever they like. Any failure to comply with the new regulations could result in hefty fines and crippling penalties for businesses from the Information Commissioner’s Office (ICO).

Why should publishers care?

While consumers click “unsubscribe” and “opt out” en masse, what are the key implications for businesses? And more specifically, how are publishers likely to be affected by the new legal framework?

First and foremost, and perhaps inevitably, any company which collects consumer data and then uses it to communicate directly with them will see the impact of their direct marketing efforts weakened dramatically. While most publishers operate as B2B entities working through retailers, many have conducted, and still conduct, direct-to-consumer (B2C) marketing and sales activity. Some publishers, particularly those with recognisable and strong consumer-friendly brand identities, have had great success at building networks and communities around their content and marketing directly to book buyers. And it is these publishers who will need to be most wary of GDPR as it comes into play.

Another thing to consider is that GDPR extends beyond a company’s proprietary systems. If a publisher is using a third-party ecommerce system, for example, it is automatically considered an extension of their own customer database. Therefore, it is the responsibility of the publisher to ensure that those system providers, which harvest customer data on their behalf, are also taking measures to be fully compliant with GDPR.

Surviving the data minefield

While the ICO has sought to reaffirm on several occasions that GDPR should not be a cause for panic, the office has also stated that inaction is not an option either. Legal experts in the industry are suggesting that the first step publishers should take is to conduct a data audit to recognise what kind of personal data they hold, where it came from and with whom it has been shared, and to make efforts to track and document the relationship the company has with each individual.

If the publisher would like to continue engaging with consumers as it has done previously it will need to establish an opt-in/opt-out consent mechanism for both new and existing customers, and carefully record every communication it instigates with these individuals as well as any data collected on them in the future. Steps should also be taken to update privacy policies and notices on company websites and other relevant legal documentation.
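As an illustration of what “carefully recording” consent might look like in practice, here is a minimal sketch of an append-only consent log. The fields are assumptions for illustration only, not a statement of what GDPR requires.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentRecord:
    """One auditable entry in the consent log (illustrative fields only)."""
    customer_id: str
    channel: str        # e.g. "email newsletter"
    opted_in: bool
    source: str         # where consent was captured, e.g. "website opt-in form"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

consent_log = []

def record_consent(customer_id, channel, opted_in, source):
    """Append an entry every time consent status changes; never overwrite."""
    entry = ConsentRecord(customer_id, channel, opted_in, source)
    consent_log.append(entry)
    return entry

record_consent("C-1042", "email newsletter", True, "website opt-in form")
record_consent("C-1042", "email newsletter", False, "unsubscribe link")
```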

Finally, as it becomes more challenging for publishers to proactively engage with consumers through direct channels, it is highly likely that they will have to put more focus on search and discoverability instead. To this end, ensuring that metadata is accurate and that a publisher’s content is ubiquitous across every possible channel will be more important than ever.

There are many ways in which GDPR will likely impact the way publishers go about their day-to-day business, and a week ahead of deadline day it’s still not too late for companies to get informed, seek legal advice and start taking the necessary steps towards compliance.

What are Smart Contracts? (And Why Do Publishers Need Them?)

In last week’s blog post, we discussed how blockchain can help publishers increase revenue by automating rights information and creating “smart contracts” which could speed up the sales and licensing process. But, what exactly are smart contracts, how are they generated, and why should publishers consider using them?

The term was coined by developer Nick Szabo in 1995, in an article called “Smart Contracts” in Extropy magazine. Smart contracts digitally facilitate, verify, and enforce an agreement between two parties in a trackable way using algorithms. Each party can see the other’s progress throughout the contract process, without needing to be in the same room. As described by Tsui S. Ng in Business Law Today, “The term ‘smart contracts’ refers to computer transaction protocols that execute the terms of a contract automatically based on a set of conditions.” By translating the contract terms into a series of if-then functions, the smart contract can respond as each condition is met and move on to the next. Legal agreements can be struck almost instantly.
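To make the if-then structure concrete, here is a minimal sketch of a licensing agreement expressed as conditions and automatic responses. It is ordinary Python standing in for what would really run on a blockchain platform, and the terms are invented.

```python
class LicensingContract:
    """A toy contract: each method is one if-then clause of the agreement."""

    def __init__(self, licensee, fee_due, royalty_rate):
        self.licensee = licensee
        self.fee_due = fee_due
        self.royalty_rate = royalty_rate
        self.stage = "awaiting_payment"

    def on_payment(self, amount):
        # If the agreed fee is received, then the licence is granted.
        if self.stage == "awaiting_payment" and amount >= self.fee_due:
            self.stage = "licence_granted"

    def on_sale(self, sale_price):
        # If a licensed sale occurs, then a royalty is owed automatically.
        if self.stage == "licence_granted":
            return sale_price * self.royalty_rate
        return 0.0

contract = LicensingContract("Acme Press", fee_due=5000, royalty_rate=0.08)
contract.on_payment(5000)       # condition met: licence granted
print(contract.on_sale(24.99))  # royalty computed automatically
```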

Though Szabo conceived of smart contracts in the mid-1990s, it is only through the use of blockchain that they have begun to be utilized in the marketplace. Blockchain provides the security, real-time tracking, and accountability that make smart contracts viable for important transactions. And these contracts could prove quite lucrative for the companies that adopt them.

According to an article in Forbes, “Accenture research published at the start of 2017 showed investment banks alone could save up to $12 billion per year by adopting blockchain and smart contracts.”

For publishers, the world of contracts unfortunately continues to be predominantly ruled by paper, creating a lag in transactional payment and royalty collection. But, that doesn’t have to be the case going forward.

With the security and speed of smart contracts, publishers could dramatically change their business. “Smart contracts don’t just contain the terms of a contract but also can act in programmed ways, delivering aspects of an agreement once specific terms are fulfilled. If connected to additional resources, such as distribution networks as well as online and physical stores, the contract could automatically deal with recouping costs and paying royalties,” Tom Cox, development director for IPR License, wrote in a piece for Publishing Perspectives last fall. “If the contracts were sophisticated enough, the complex area of royalties could be handled in almost real time by the system.”

For publishers who are finding that rights transactions are ever more essential to their bottom lines, implementing a system that uses smart contracts could revolutionize their business and greatly increase revenue.

Blockchain and the Future of Publishing

In the last six months, the term “blockchain” has been cropping up in publishing conversations: at the London Book Fair earlier this month, and at both last week’s STM Conference and the Book Industry Study Group annual meeting. As these conversations occur, it is becoming clear that to many publishers the term is as foreign as “metadata” once was, with publishers unclear whether and how this technology will impact their business. In our series on blockchain, we thought it might be helpful to start by taking a step back and defining what blockchain is, before sharing how it can change publishing for the better.

Blockchain is a decentralized, digitized series of information blocks shared in a peer-to-peer network. Each block includes information from the previous block, a timestamp, and transaction data, all providing a unique and unalterable chain of information. Blockchain is the technology behind the popular cryptocurrency Bitcoin, and, for the publishing industry, it could change the way business is transacted by helping solve many of the issues currently plaguing publishers, from rights management to piracy.
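A minimal sketch of the chaining just described: each block records the previous block’s hash, a timestamp, and its transaction data, so tampering with any earlier block invalidates every hash that follows. The transaction strings are invented publishing examples.

```python
import hashlib
import json
import time

def make_block(prev_hash, transactions):
    """Build one block whose hash covers the previous hash and its data."""
    block = {
        "prev_hash": prev_hash,
        "timestamp": time.time(),
        "transactions": transactions,
    }
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

genesis = make_block("0" * 64, ["rights registered: Title A -> Publisher X"])
second = make_block(genesis["hash"], ["licence sold: Title A -> Territory B"])
print(second["prev_hash"] == genesis["hash"])  # True: the blocks are linked
```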

This is true not only for the scholarly publishing community, but also for independent, trade, journal, magazine, and every other kind of content publishing. Because blockchain technology is decentralized and secure, the most practical, long-term impact of its use in publishing will be to allow researchers, members of a publishing house, writers, and publishers to work on the same platform at the same time, providing their individual input while ensuring universal access and secure collaboration. Blockchain allows all parties to work at the same time, see what changes have been made, and have those changes attributed to the appropriate party.

One of the key areas where it can become immediately useful is digital rights management. For the licensor, tracking down ownership and permissions costs, and locating the appropriate person to speak to about licensing content and photos, is a time-consuming and difficult job. For the licensee, it is often a challenge to accurately track usage of content once access has been granted, which means potential lost financial opportunity.

Through blockchain management of rights, content can be embedded with rights information, and smart contracts can be created that allow for easy sharing, licensing, and usage. For publishers, this will increase revenue not only by automating rights information and freeing up staff for other high-level work, but also by helping keep track of important contractual components, including monies due and rights availability. As demand for more granular rights increases, this type of technology will become even more vital to efficient sales transactions, tracking, and reporting, and ultimately to the publisher’s bottom line.

Because of these advances and the opportunities they offer publishers, we are currently implementing blockchain into the next version of our workflow management system, PageMajik, to continue to improve the free flow of information into the marketplace by easing workflow constraints and reducing many time-consuming tasks in the publishing value chain. For users of PageMajik, the workflow will not be impacted, but their work will be much more secure. By improving these systems and giving writers and publishers the ability to easily write and publish their work, we hope to help change the future of publishing.

Blockchain and STM—a marriage made in heaven?

Two weeks ago, in our blog post “The AI elephant in the room”, we welcomed the fact that blockchain was to be discussed at the London Book Fair for the very first time. This week, as the crowds descend upon Philadelphia for the STM US Annual Conference, blockchain is once again on the menu, though less as a starter and more as a main course. This is the second year running that the STM Association has featured the topic in its conference programme, and it follows a similar session at the APE Conference in January, where it was also on the agenda.

It perhaps comes as little surprise that the STM sector is somewhat ahead of the curve on conversations around blockchain innovation. Whether STM is riper for disruption, more open to change, or simply more in need of it remains to be seen. But over recent years, in spite of many bemoaning slow rates of change and adoption, we’ve witnessed a great deal of effort go into transformational technology in STM, specifically in areas like Open Access, discoverability, metrics and impact measurement, and peer review.

So why is blockchain such a hot topic in STM right now? What kind of blockchain innovations can we expect to see? And how does the industry stand to gain from them?

It’s all about trust

The fact that STM had a head-start on blockchain may quite simply point to a greater need for it. In October 2017, publishers took the academic social networking site ResearchGate to court for mass-scale copyright infringement. It was the most recent in a long line of high-profile cases which have highlighted the flaws in a system still grappling with the new normal of Open Access, social media and big data.

The industry is plagued with disputes around ownership, provenance, authenticity and credibility, and battles are regularly fought around the plagiarism and misappropriation of scientific endeavours. STM’s history of trust issues, who-said-what clashes and copyright court cases makes it the perfect stomping ground for blockchain technologies. Whether new industry-wide initiatives driven by blockchain are rolled out or companies start to embed parts of blockchain technology within their individual ecosystems, scholarly communication could undoubtedly benefit from unequivocal, time-stamped records for every submission, citation, edit or transaction taking place along the chain. If any industry could do with a “network of trust”, which is how the STM Association is billing blockchain, it’s STM.

Another area of STM publishing where many predict blockchain will make inroads is peer review. Whilst widely considered the bedrock of academic publishing, traditional peer review frequently comes under fire, particularly for slowing down the publishing process. In Blockchain for Research: Perspective on a New Paradigm for Scholarly Communication, Dr Joris Van Rossum of Digital Science suggests that: “The peer review process could greatly improve through blockchain and data underlying the published results could be made available. This would not only improve reproducibility in general, but also allows reviewers to do their work more thoroughly.”

Meanwhile, as new-wave journal publishers like UK-based Veruscript seek to reward reviewers in an effort to make the peer review system more efficient and streamlined, there is clear scope to implement Bitcoin-style technology to facilitate this process.

Blockchain in action

Last week, Digital Science announced its first round of Blockchain Catalyst grants, which are awarded to “any project implementing blockchain in a scholarly or scientific context, especially those that address the dissemination of research”. The initiative was established to find, support, fund and fly the flag for those using blockchain to innovate within the sector.

The announcement of the first two projects to be awarded this grant provided a fascinating insight into where and how we might see blockchain technology applied to research in the not-so-distant future. Hong Kong-based Datax is developing a data crowdsourcing and exchange platform, while VIVO in the US is working on a value-recognition tool that rewards and incentivises researchers for their contributions.

Equally exciting is the new pilot initiative from ARTiFACTS, which launches this week, using blockchain to record a “permanent, valid and immutable” chain of records in real time, from research to peer review to post-publication.

Scholarly publishers are also discovering that blockchain can offer plenty of benefits in terms of helping them fine-tune and automate day-to-day processes. In a business like STM journal publishing, where a publisher is likely to have a range of journals to manage, with multiple articles and papers on the go, and teams of staff working across editorial and production, blockchain can offer a lifeline when it comes to version control, providing clarity on ownership and navigating digital rights management.

In the world of STM, blockchain makes perfect sense. There are several very obvious areas where this technology could be applied to great effect while making a huge impact and not necessarily forcing scholarly publishers to reinvent the wheel. It’s refreshing to see new initiatives incorporating blockchain being trialled, while others are in the works, and perhaps unsurprising to see STM as the market sector forging ahead and testing the waters before others.

The AI elephant in the room

Two years ago, almost to the day, Oxford University Professor Nick Bostrom, the Founding Director of the Future of Humanity Institute, addressed the crowd at The London Book Fair’s Quantum Conference and gave a riveting keynote talk entitled “The Machine Intelligence Revolution”.

During his presentation, he compared the likely impact of machine intelligence to that of the industrial revolution—with the latter automating manual labour and the former automating intellectual labour. He also predicted that its legacy and impact on the human condition will be even more profound, and that by 2040 we will see machines capable of human-level intelligence, and very shortly after, machines achieving super-intelligence.

While the audience at the time was familiar with terms such as Big Data and augmented reality (AR), the discussion was probably the first time many had been introduced to concepts such as AI and deep learning. On that day, Bostrom didn’t tackle the elephant in the room: “What impact will the machine intelligence revolution have on publishing?”, but the future-gazing talk put the subject on the map and gave the industry something to think long and hard about.

At the time, several delegates dismissed the content of his talk as the stuff of science fiction, a million miles away from their day job of publishing books. For some others, however, it was the starting point of a journey of introspection, where they started to ask themselves important questions such as: How can publishers benefit from machine intelligence? What will the publisher of tomorrow look like? What are the key skills which will be needed? Which roles are likely to be affected by this machine intelligence revolution? And when and how will we need to adapt our models and working practices?

Fast forward two years and, bar a few presentations from technology brands at LBF’s technology stage, Artificial Intelligence seems to have lost its prominence in the seminar programme. It remains to be seen whether or not this is because the industry is more concerned about perceived pressing short-term issues like cashing in on the growth of audiobooks and navigating global economic issues such as Brexit. It is also unclear whether and to what extent publishers today have a clear idea about its practical applications, how it will affect their businesses, and how they will adapt their practices to accommodate it, instead of being disrupted by it.

In spite of this omission, and while Bostrom’s elephant in the room arguably still remains (particularly outside technology circles), topics such as blockchain and crypto culture have, significantly, come to the fore in this year’s LBF seminar programme. It is refreshing to see certain pockets of the industry, such as the academic, children’s, and self-publishing markets, leading the way and debating these innovations on this global stage. Here are PageMajik’s top ten picks from LBF’s speaking programme, for those looking to expand their minds and look to the future:

Discoverability, Superabundance and How to Rise to the Fore

Monday, 9 April 2018, 11:30–12:15

Quantum Conference (the conference centre)

Use your Data to Drive Revenue

Tuesday, 10 April 2018, 13:00–14:00

The Faculty

Blockchain For Books: Towards An Author Centred Payment Model

Tuesday, 10 April 2018, 14:30–15:30

Olympia Room, Grand Hall

Taking the Fear Out of AI: Machine Versus Human, or Technology Enabler for Humanity

Tuesday, 10 April 2018, 15:15–15:45

The Buzz Theatre

Bringing Blockchain to Publishing: Funding Books Like Never Before

Tuesday, 10 April 2018, 15:45–16:30

Author HQ

Scaling Foreign Rights and Reprints With Automation

Tuesday, 10 April 2018, 16:00–17:00

International Export Theatre

Small Steps, Giant Leaps: The Digital Transformation Experience

Wednesday, 11 April 2018, 13:00–14:00

The Faculty

Meeting the Changing Needs of Academic Publishing

Thursday, 12 April 2018, 11:30–12:30

The Faculty

Get A Self-Publishing 3.0 Mindset (ALLi)

Thursday, 12 April 2018, 11:45–12:30

Author HQ

Disruptive Publishing

Thursday, 12 April 2018, 14:30–15:30

Children’s Hub

How the prospering independent publishing sector can become even more prosperous

As indie publishers gather later this week in Austin, Texas, for the annual IBPA Publishing University event, attendees will be buoyed by all the positive news and buzz currently enveloping the sector. Indie presses around the globe are reporting strong growth figures year-on-year. In the UK, Inpress revealed a 79 percent increase in sales across 60 small publishers at the back end of 2017. Meanwhile, in Publishers Weekly’s annual feature on fast-growing US independents last April, half of the companies featured reported triple-digit growth, making 2017 the strongest year for the sector since the publication began its deep-dive report 20 years ago.

In a world where big-name bestselling authors get snapped up by commercially savvy publishers for seven-figure advance deals, and lesser-known names flock to Amazon’s self-publishing platforms in their thousands, indies occupy the increasingly important middle ground. But what is it exactly that makes indies so appealing? And how can they build yet more on this seemingly unstoppable growth and success?

There’s something about indies

Indies tend to go about things differently compared to your average publisher, often assessing writers and their work on literary merit as opposed to commercial potential. This is appealing to many authors who, aside from wanting to make money, also want to feel that their publisher has love and passion for their work. Indies are also known to take a longer-term view, investing in a writer’s career journey rather than working with them on a title-by-title basis.

Some authors sign up to indies because they want a publishing house which shares their values and mission, while others have previously published books elsewhere but claim they didn’t receive the editorial input or the attention, commitment and dedication they felt they needed. This is a sentiment echoed by Betsy Reavley, co-founder of Bloodhound Books, in her recent interview with the Daily Telegraph: “Some publishers will get behind a particular writer, spending most of their marketing budget on them and leaving others to languish somewhat. Of course it’s about selling lots of books and making money, but it’s also about being transparent, fair and giving the same opportunity to everyone.”

In essence, author care is very much where the indies excel.

Growing pains

But as independent publishers become larger, growing their author bases and lists each year, the inevitable tends to happen. The more they take on, often without extended resources, the more difficult it becomes to offer the consistent level of author care that made them such an attractive proposition in the first place. Time that was previously spent editing manuscripts, accompanying authors on tours, and marketing and promoting their books is now spent on increasingly unmanageable workflow processes, which become a major drain on resources.

When indies expand exponentially, as they so frequently do, most do not have the appropriate IT infrastructure or tools at their disposal to cope with the dramatically increased volume of books which come their way. Their productivity is hampered and during this process of expansion the publisher’s duty of care to the author, their primary USP, is eroded.

Resolving workflow issues early

The best way to prevent this unpleasant situation from arising is to address the inevitable workflow problems as early as possible. Whether you’re a large publisher or you publish fewer than 50 titles a year, you will eventually find that keeping up with editorial processes, multiple versions, typesetting, proofing, image rights management, and cover design across multiple books becomes arduous and time-consuming. This is the right time to invest in a software solution that can take on the heavy lifting in the workflow.

At PageMajik we work closely with independent publishers of all shapes and sizes to help make their publishing processes simpler. Our publishing workflow productivity tool takes the drudgery out of publishing and can boost efficiency by as much as 40 percent, allowing indies to get back to being indies and doing what they do best.