How to set up an Insight Engine?


Setting up a new tool is a complex activity.
It certainly requires the technical prerequisites to be in place: for instance, good performance, a meaningful interface, access to legacy and daily data, and connectors ready for easy data registration.


This is not enough.

Any change requires time to adapt to and to be adopted. Sometimes this is fast: for example, if we install a new workstation in the lab that is 3 times faster than the old one, there are no barriers to the change.

Unfortunately, data-related projects are not so simple. Many experts, such as the Pistoia Alliance (see box 1), clearly state that only data sharing and collaboration between teams (breaking silos), companies (collaborations) and fields of expertise (translational approaches) will underpin the future of life science.

Tools are required to support this process. Insight Engines are tools that federate multiple data sources and provide visual search experiences. They are the tools required by this new way of doing science. You can find more information on why Insight Engines are useful in our previous article: “What is an Insight Engine?”

Our approach is to understand what the main search directions are, balancing the cost of recognising a piece of information against the benefits users will gain in terms of retrieval. Recognising a piece of information means being able to link it to a clear source such as an ontology or master data.

For example, for a company that is structured into projects, resolving the project code globally is a worthwhile endeavor. Other IDs, such as compound IDs or study IDs, are more specific and needed only by some departments. Finally, other metadata such as organisms, strains, tissues, therapeutic targets, symptoms, … can be specific to certain activities, and their recognition can be done only there, or automatically.
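As a rough illustration of what "recognising" a project code could look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `PRJ-001` code format, the contents of `PROJECT_CODES`, and the `recognise_projects` helper are hypothetical and do not describe any particular Insight Engine.

```python
import re
from dataclasses import dataclass

# Hypothetical master data: project codes known across the whole company.
PROJECT_CODES = {
    "PRJ-001": "Oncology lead optimisation",
    "PRJ-002": "Metabolic screening",
}

# Hypothetical naming convention; a real company would use its own pattern.
PROJECT_PATTERN = re.compile(r"\bPRJ-\d{3}\b")


@dataclass(frozen=True)
class Annotation:
    text: str    # the string found in the document
    label: str   # the master data entry it resolves to
    source: str  # the reference the link points to


def recognise_projects(document: str) -> list[Annotation]:
    """Return one annotation per project code that resolves to master data."""
    found = []
    for match in PROJECT_PATTERN.finditer(document):
        code = match.group(0)
        # Codes that look valid but are absent from master data stay unqualified.
        if code in PROJECT_CODES:
            found.append(Annotation(code, PROJECT_CODES[code], "project master data"))
    return found
```

Calling `recognise_projects("Assay results for PRJ-001 and PRJ-999.")` would annotate only `PRJ-001`, since `PRJ-999` is not in the assumed master data. Department-specific identifiers could be handled the same way, with their own reference lists maintained where they are actually used.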


But things are not carved in stone: this is about testing things and looking for feedback and benefits.


Thanks to this approach, actors can exchange meaningful information using the context, or quickly select the data needed for a specific study. Our past article, THE VALUE OF DATA QUALITY, details those benefits.

BOX 1: “Life science companies have traditionally developed their own internal infrastructures, which results in a duplication of efforts and in systems that are not interoperable. Collaboration between stakeholders will underpin the future of the life science industry, and overcoming these kinds of barriers is why The Pistoia Alliance was formed.”
Dr. Steve Arlington, President of The Pistoia Alliance.

These set-up projects need to be handled with care. Of course, the technical prerequisites need to be there, but also:

  • Governance with strong leadership and emphasis. While it brings benefits to all actors, such a tool or strategy is mandatory for none of them. With internal R&D budget constraints, it is hard to justify a tool that is not mandatory.
  • A strong emphasis on user feedback, on both the benefits and the cost of the new tool. For example, in some contexts it is appropriate to spend 5 minutes per day annotating data in order to give management a unified view of the department's activity and data. Specialists need to be listened to, and their knowledge has to be incorporated into the tool to improve its precision. It is hard to have a generic approach to costs and benefits: depending on the data (legacy/old data, IP-related data, personal data, internal/external, project/strategic data, …) and on the usage, needs and means will differ.

Thanks to Insight Engines, working on data governance is no longer theoretical.


These tools make it possible to test, to confront ontologies and master data with the data themselves in an iterative manner, and to build, as a company-wide effort, an index where all data are qualified on a best-effort basis.
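One way to picture this iterative confrontation is a simple coverage check: how many records resolve to a known term, and which unresolved values come up most often. The sketch below is only an illustration under assumptions: the `organism` field, the record layout, and the `coverage_report` helper are hypothetical, and a real deployment would rely on its own ontologies and index.

```python
from collections import Counter


def coverage_report(records, known_terms, field="organism"):
    """Return the share of records whose `field` matches a known term,
    together with the three most frequent unresolved values."""
    resolved = 0
    unresolved = Counter()
    for record in records:
        # Normalise lightly so that case and stray spaces do not hide a match.
        value = record.get(field, "").strip().lower()
        if value in known_terms:
            resolved += 1
        elif value:
            unresolved[value] += 1
    ratio = resolved / len(records) if records else 0.0
    return ratio, unresolved.most_common(3)


# Hypothetical sample data, for illustration only.
records = [
    {"organism": "Mus musculus"},
    {"organism": "mouse"},
    {"organism": "mouse"},
    {"organism": "Homo sapiens"},
]
known_terms = {"mus musculus", "homo sapiens"}

ratio, top_unresolved = coverage_report(records, known_terms)
```

In this made-up sample, half of the records are qualified, and the most frequent unresolved value ("mouse") is an obvious candidate synonym to add to the reference list before the next iteration.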

The point is not to create a new BI tool on top of the data but to change the way data are used. And this approach brings benefits very quickly.

This approach also aligns the interests of individuals, teams and the company around better data quality, which is mandatory for any data-based strategy (see box 2). For us, this convergence of interests is the best driver of a successful Insight Engine deployment.

BOX 2: “Ensuring Quality is recognized as one of the most challenging issues in Big Data era. Current approaches and solutions emerged both from academia and industry that tackled quality have not reached yet a convincing level of maturity. Evaluate the importance of assessing quality of Big Data versus the value it generates for its users (e.g. governments, businesses) is of paramount importance.”

If you’d like to talk to us about how we can help you make your scientific data more valuable, please follow the contact link on the top bar to arrange a call with one of our data management experts.

Erwan David, Chief Technical Officer

Erwan David, Co-Founder.