- Global Pharma News & Resources

The rise of data objects spells the end for structured document authoring


A data-first approach to content sharing, using flexible and dynamic data objects, is consigning structured document authoring to history. Generis’ James Kelleher explains how a shift towards real-time data as a single source of truth is fast replacing the traditional structured document authoring approach.
  • Author Company: Generis Corporation
  • Author Name: James Kelleher
Editor: PharmiWeb Editor
Last Updated: 11-Oct-2022

Structured authoring was designed to be the future of document authoring – the ultimate efficiency in presenting information. But a data-driven approach to information sharing is set to address issues with version control and provide a more traceable line back to the master source of intelligence. This does mean that structured document authoring, as a much-anticipated technology proposition, is already obsolete - before it really had a chance to get off the ground.

The original concept of structured document authoring, which dates back to the 1990s, is based on building routine documents from re-usable segments of content. However, this soon comes up against practical limitations. If the approved, reusable content assets are entire paragraphs or sentences, for instance, they will typically need to be tweaked for each use case. Each edit creates a new version of that content, with implications for change management.

In the meantime, the focus of regulators, and of recommended business practice more generally, has shifted towards live data as the primary source of truth and as a means of transforming processes. This move away from document-based submissions and reporting further erodes the business case for structured document authoring.

Regulators shifting to data submissions

Although regulated documents as official records of a product won’t disappear overnight, their ‘Best Before’ date is drawing ever nearer. During the latter stages of the transition to ISO IDMP compliance in the EU, for instance, published documents will be phased out in favour of rolling data-based submissions: data that regulators can choose to analyse in their own way.

Ultimately, data-based information exchange will become the preferred norm for regulatory submissions, pharmacovigilance system master files (PSMFs) and annual product quality review (APQR) reports. In fact, pharmacovigilance (PV) case files in Europe are already submitted in data form.

Strategically, the focus of new content management investments must now be the data itself, and how this is managed so that it can be used more dynamically for publication - without the risk of a future loss of integrity or consistency between the data and any associated narrative.

Next-level structured content authoring places the emphasis on ‘data objects’. A data object might be ‘Study 123 is a 3-day short-dose study in male rabbits’, for instance. Creating a narrative now means pulling in these data objects and inserting minimal ‘joining text’ to make the data more readable in a particular context.

Here, if core information changes, updates can be made at a source level and automatically cascaded down through all use cases for that data object, without the need for extensive manual intervention.
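
As a rough illustration of the idea, the following Python sketch holds the Study 123 example as a single data object and builds two narratives from it with different joining text. The class names, field names and in-memory store are all hypothetical, chosen only for this example; a real implementation would sit on a governed master data source rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    """A set of facts held once at source (hypothetical shape)."""
    object_id: str
    fields: dict


class DataStore:
    """Toy in-memory master source, standing in for a governed data platform."""

    def __init__(self):
        self._objects = {}

    def put(self, obj):
        self._objects[obj.object_id] = obj

    def update(self, object_id, **changes):
        # The change is made once, at source level.
        self._objects[object_id].fields.update(changes)

    def get(self, object_id):
        return self._objects[object_id]


def render(store, object_id, joining_text):
    """Build a narrative by filling joining text with the current field values."""
    return joining_text.format(**store.get(object_id).fields)


store = DataStore()
store.put(DataObject("study-123", {
    "name": "Study 123",
    "duration_days": 3,
    "design": "short-dose",
    "sex": "male",
    "species": "rabbits",
}))

# Two use cases share one data object; only the joining text differs.
summary = "{name} is a {duration_days}-day {design} study in {sex} {species}."
listing = "{name}: {species}, {sex}, {duration_days} days."

before = render(store, "study-123", summary)

# Core information changes once at source...
store.update("study-123", duration_days=5)

# ...and every narrative picks up the change when it is next rendered.
after_summary = render(store, "study-123", summary)
after_listing = render(store, "study-123", listing)
```

Because the narratives store only the joining text and a reference to the data object, no copy of the study duration exists anywhere to fall out of step with the source.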

This approach to content preparation offers much more dynamism and flexibility than a structured document authoring scenario. With the persisting diversity in requirements between the different Regulatory authorities, this controlled flexibility is very useful.

Collaborating on a unified data source

Moving away from documents and even from reusable content requires a different mindset, and this is probably one of the biggest barriers for companies currently.

Relying less on Word might seem to imply that teams will need to become proficient in XML. Yet this perception is tied to the traditional treatment of content. In the new scenario, the focus is the master data, and adding to it enriches company-wide knowledge around a given product and its evolving status. Editing can be done in a new breed of user-friendly tools, whether the underlying format is data, Word or XML.

This is about teams from multiple functions all contributing to and enhancing one unified data source, rather than each continuing to enter their own particular information of interest into their respective systems (Clinical, Regulatory/RIM, etc.).

Leapfrogging legacy tech

At a conservative estimate, there is scope when working with data objects to reduce the effort of producing a final draft for approval by a factor of 10, thanks to the reduced specialist resources and fewer manual steps needed in authoring and version control. That’s in addition to huge savings in the time and effort that would otherwise be needed to manage components, including decisions about levels of granularity, rules around re-use, traceability of data into the output, and the wholesale migration of larger documents into smaller documents or components.

Data objects are already streamlining information sharing processes in major industries. The airline and automotive industries, for instance, where precision, rigour and safety are as critical as they are in life sciences, already use trusted data objects to construct content.

There is a real opportunity to skip a generation of automated content authoring and leapfrog straight to a data-first approach to process management: a solution that is much more fit for purpose, transformational, pliable and sustainable in the long term. The first step is to gather data that is good quality, complete, current and usable for the intended purposes. Then it will be possible to apply automation rules to streamline business and regulatory processes, driving productivity and enhancing patient outcomes.