Thursday, October 3, 2019

Andrew Dalke: mmpdb crowdfunding consortium

How can we raise money to fund open source software development in cheminformatics? It's a hard question. Asking for donations doesn't work – companies might not even have a mechanism to make donations. Consultant-based funding doesn't work that well either, because developing a general-purpose tool costs several times more than developing a tool which only meets the specialized needs of one client, and few clients are willing to subsidize the rest of the field. Proprietary software development solves the problem by getting many people to pay for the same product. Can we learn from the success of proprietary software to raise the funds needed to improve open source software?

I have started the mmpdb crowdfunding consortium to see if crowdfunding can be used to fund further development of the matched molecular pair program mmpdb. The deadline to join is 1 February 2020 – join now!

Background

mmpdb is an open source success story. It started as the mmpa program developed by Jameed Hussain and Ceara Rea. Their employer, GSK, contributed it to the RDKit project. There was no more GSK funding, but others could study and improve the code.

Roche then funded me, Christian Kramer, and Jérôme Hert to add several improvements:

  • better support for symmetry, which results in fully canonical pair descriptions
  • support for chirality, including matching chiral with prochiral structures
  • can include the chemical environment when finding pairs
  • generate property change statistics for each pair, environment, and property type
  • parallelized fragmentation
  • fragmentation can re-use fragmentations from a previous run
  • performance speedups during indexing
  • pair, environment, and property statistics are stored in a SQLite database
  • analysis tools to propose possible transforms to an input structure, or to predict property shifts between two structures
The final code was also contributed to the RDKit project.

Now what?

Mmpdb is popular. Several people at the 2019 RDKit User Group meeting in Hamburg presented work which used it or at least referenced it.

But who supports it? Who adds features? There is no more funding from GSK or Roche, so all we have is precious and scarce volunteer time. Others might fund their own developers to improve mmpdb, but the code is pretty complicated and it will take a while for new developers to get up to speed.

Sustainability

There is a long and ongoing discussion about how to fund open source projects. I won't even attempt to summarize it here, though I will point to Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure as one starting point.

My question is, are mmpdb users willing to fund its further development? If not, the project is not sustainable. I believe they are willing; the problem is that it's hard to justify paying money for software anyone can download for free.

Crowdfunding consortium

I previously tried to develop chemfp as a purely open source commercial product. When customers bought the product, they got the software under the MIT license. This ended up being difficult, for reasons I'll likely blog about later. I now also offer chemfp with proprietary licensing, at a cheaper price.

With mmpdb, I am trying crowdfunding, along the lines of Kickstarter. The basic goals are:

  • Postgres support
  • new command-line option ("proprulecat") to export property tables as CSV
Everyone who joins will get these two features, under the existing 3-clause BSD license.
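
As a rough illustration of what exporting a property table as CSV involves, here is a minimal sketch using only Python's standard library. It is not mmpdb code and does not reflect how "proprulecat" is or will be implemented: the function name, the `property_stats` table, and its columns are all hypothetical, made up for this example.

```python
import csv
import io
import sqlite3


def export_table_as_csv(connection, table_name, output):
    """Write every row of `table_name` to `output` as CSV, with a header row."""
    # Table names cannot be passed as SQL parameters, so first confirm
    # that the table really exists before putting its name in a query.
    cursor = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    )
    if cursor.fetchone() is None:
        raise ValueError(f"no such table: {table_name!r}")

    cursor = connection.execute(f'SELECT * FROM "{table_name}"')
    writer = csv.writer(output)
    # cursor.description holds one tuple per column; item 0 is the name.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Hypothetical toy table; the real mmpdb schema is different.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE property_stats ("
    "from_smiles TEXT, to_smiles TEXT, property_name TEXT, "
    "pair_count INTEGER, avg_delta REAL)"
)
connection.executemany(
    "INSERT INTO property_stats VALUES (?, ?, ?, ?, ?)",
    [
        ("[*:1]C", "[*:1]N", "logP", 12, -0.85),
        ("[*:1]C", "[*:1]O", "logP", 7, -1.1),
    ],
)

output = io.StringIO()
export_table_as_csv(connection, "property_stats", output)
csv_text = output.getvalue()
print(csv_text)
```

The same approach should carry over to Postgres through any DB-API driver, since `cursor.description` is part of the DB-API specification, though the table-existence check would need a different catalog query.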

Beyond that are stretch goals. The one many people want is to store the chemical environment in the database as a fragment SMILES, rather than a hex-encoded SHA256 hash of the rooted Morgan fingerprints.
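
To see why a hashed environment is less convenient than a fragment SMILES, consider this minimal sketch. It is not how mmpdb computes its environment keys: the `environment_key` function and the integer values standing in for rooted Morgan fingerprint invariants are assumptions made for illustration. It only shows the general property of a hex-encoded SHA256 digest, which is that the result is a fixed-length string that cannot be read back as chemistry.

```python
import hashlib


def environment_key(invariants):
    """Return a hex-encoded SHA256 digest for a sequence of integers.

    The integers are a stand-in for fingerprint invariants; the
    serialization format used here is an arbitrary choice.
    """
    text = ",".join(str(value) for value in invariants)
    return hashlib.sha256(text.encode("ascii")).hexdigest()


# Two made-up environments which differ in a single value.
key_a = environment_key([2246728737, 864662311])
key_b = environment_key([2246728737, 864662312])

print(key_a)
print(key_b)
```

The two digests are each 64 hex characters long and look unrelated even though the inputs differ by one. A digest works as a lookup key, but a person reading the database cannot tell which environment it describes, which is the motivation for storing a fragment SMILES instead.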

As more people sign up, I'll develop mmpdb further. Many of the stretch goals are related to documentation and testing. Mmpdb was developed as a research project, and needs those sorts of infrastructure improvements to allow future growth.

If enough people join, there will definitely be future crowdfunding efforts, perhaps for a web interface, or support for categorical statistics, or other features people have asked me about.

I don't think people will pay for features that are available for free, so these changes will not be made available to the public until specific funding goals are reached.

How do you explain crowdfunding to accounting?

Don't. (Unless you really want to.) Tell them you are going to purchase a new version of mmpdb with Postgres and "proprulecat" support. You will receive these within two weeks of sending me – that is, my Sweden-based software company – a purchase order.

In addition, the purchase includes membership in the mmpdb consortium. As more people join and additional funding goals are met, I will continue to improve mmpdb, and you will get those improvements as part of your membership.

Join now!


