Fig. 1

Overview of the filtering process employed in the in silico pipeline. Initially, ~35,000 TGF-β superfamily sequences were retrieved, of which only ~19,000 were predicted to contain SPs. Of these, ~10,000 were valid records with unique nucleotide sequences. Existing computational methods predicted that ~7,000 of these would remain functional and target the extracellular space when attached to BMP2. Of these, ~2,500 had strong Kozak sequences and were taken forward to mRNA structure prediction. Sequences with the least structure at the translational start site but high global stability were considered preferable. The top five sequences from the pipeline and two manually selected alternatives were taken forward to in vitro work.