Modelling the early evolution of extracellular matrix from modern Ctenophores and Sponges. Essays in Biochemistry, 63(3), 389-405.

Animals (metazoans) include some of the most complex living organisms on Earth, with regard to their multicellularity, numbers of differentiated cell types and lifecycles. The metazoan extracellular matrix (ECM) is well-known to have major roles in the development of tissues during embryogenesis and in maintaining homeostasis throughout life, yet insight into the ECM proteins which may have contributed to the transition from unicellular eukaryotes to multicellular animals remains sparse. Recent phylogenetic studies place either ctenophores or poriferans as the closest modern relatives of the earliest-emerging metazoans. Here, we review the literature and representative genomic and transcriptomic databases for evidence of ECM and ECM-affiliated components known to be conserved in bilaterians, that are also present in ctenophores and/or poriferans. Whereas an extensive set of related proteins is identifiable in poriferans, there is a striking lack of conservation in ctenophores. From this perspective, much remains to be learnt about the composition of ctenophore mesoglea. The principal ECM-related proteins conserved between ctenophores, poriferans and bilaterians include collagen IV, laminin-like proteins, thrombospondin superfamily members, integrins, membrane-associated proteoglycans and tissue transglutaminase. These are candidates for a putative ancestral ECM that may have contributed to the emergence of the metazoans.


INTRODUCTION
Some of the most fascinating and mysterious steps in the evolution of life on earth involve the debut of multicellular organisms from single-celled ancestors. Modern multicellular lifeforms are present in both the bacterial and eukaryotic domains of life and there is evidence that multicellularity has emerged independently multiple times [1][2][3][4][5]. For eukaryotes, particularly the Metazoa (animals), the transition from unicellularity to multicellularity is complex to consider because ancestral forms are either not represented in the fossil record, or are extremely rare and difficult to identify. Genome and transcriptome sequencing projects offer new routes to consider these evolutionary transitions more systematically. For example, an analysis of genomic data from many modern prokaryotes to identify commonalities in protein-coding sequences allowed inference of the possible repertoire of proteins in a last common ancestral cell [6].
In considering central attributes of multicellular organisms, the evolution of stable mechanisms for organised cell-to-cell attachments is a key requirement of multicellularity. The most complex modern multicellular organisms, with between 3 and about 122 different cell types, are found amongst the Metazoa [5]. The single-celled eukaryotes most closely related to animals are the choanoflagellates [7] and filastereans such as Capsaspora owczarzaki [8,9]. From comparative studies of the transcriptomes and predicted proteomes of these protists, it has been deduced that the origin of metazoans probably involved the functional adaptation of pre-existing gene products, for example new or adapted roles of integrin and cadherin receptors, both of which have been identified in certain unicellular eukaryotes [10,11], as well as genetic rearrangement events that led to the origin of new types of gene products with novel functional capacities for cell interactions and inter-cellular communications [12].
A central mediator of metazoan multicellularity is the extracellular matrix (ECM), a structured extracellular network of collagens, glycoproteins, proteoglycans and associated carbohydrates such as glycosaminoglycans.
The secreted proteins that build the ECM appear to fall within the novel category of metazoan gene products, because many ECM proteins are highly conserved throughout animals and yet are not represented in choanoflagellates or filastereans [13,14]. Williams et al. [13] established that three species of these protists express distinct sets of predicted secreted proteins (identified by presence of an N-terminal secretory signal peptide and no transmembrane domain), none of which have a domain composition equivalent to a metazoan ECM protein, although individual domains common in ECM proteins are present. Specific examples are the separate domains of fibrillar collagens [15] or thrombospondins [16]. These data suggest that gene rearrangement and domain shuffling had an important role in the emergence of the large, multidomain, secreted proteins that characterise the metazoan ECM. In modern metazoans, the secretion and extracellular assembly of structural proteins of the ECM depends on many ECM-affiliated proteins, both intracellular and extracellular: for example, to effect post-translational modifications, proteolytic processing, or interactions with non-structural and matricellular proteins within the ECM [17,18]. Thus, consideration of the phylogeny of these affiliated proteins is also relevant to constructing models for the evolution of metazoan ECM.
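The selection criterion described above (a predicted N-terminal signal peptide and no transmembrane domain), followed by a comparison of domain composition, can be sketched in a few lines of Python. This is a minimal illustration only: the record type, the protein names and the placeholder domain names (`domain_A` and so on) are hypothetical, and the annotations are assumed to come from upstream prediction tools that are not specified here.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PredictedProtein:
    """Hypothetical annotation record for one predicted protein."""
    name: str
    has_signal_peptide: bool      # N-terminal secretory signal peptide predicted
    n_transmembrane: int          # number of predicted transmembrane domains
    domains: Tuple[str, ...]      # domain names in N- to C-terminal order


def is_predicted_secreted(p: PredictedProtein) -> bool:
    """Apply the criterion from the text: signal peptide and no TM domain."""
    return p.has_signal_peptide and p.n_transmembrane == 0


def shares_domains_but_not_architecture(
    p: PredictedProtein, reference: Sequence[str]
) -> bool:
    """True if p has at least one reference domain but a different ordered composition."""
    shared = set(p.domains) & set(reference)
    return bool(shared) and tuple(p.domains) != tuple(reference)


# Toy, invented records; not data from any real proteome.
records = [
    PredictedProtein("protist_hypothetical_1", True, 0, ("domain_A", "domain_C")),
    PredictedProtein("protist_hypothetical_2", True, 1, ("domain_A", "domain_B")),
    PredictedProtein("protist_hypothetical_3", False, 0, ("domain_D",)),
]
# Placeholder for the domain architecture of a metazoan ECM protein.
reference = ("domain_A", "domain_B", "domain_C")

secreted = [p for p in records if is_predicted_secreted(p)]
partial_matches = [
    p.name for p in secreted if shares_domains_but_not_architecture(p, reference)
]
print(partial_matches)
```

In this toy case only the first record passes the secretion filter, and it shares individual domains with the reference without matching its full architecture, which mirrors the pattern reported for the protist proteins.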
The distinct domain architectures of secreted proteins of choanoflagellates and metazoans suggest that additional insights into metazoan ECM evolution could be ascertained from careful comparative analysis of ECM and affiliated proteins encoded in modern species from the earliest-diverging metazoan phyla. By analogy with the analysis of a prokaryotic ancestral cell [6], proteins in common between extant species in the earliest-diverging phyla would be candidates for membership of an "ancestral ECM". Of the early-diverging metazoan phyla (Ctenophora (comb jellies), Porifera (sponges), Placozoa and Cnidaria), cnidarians are, to date, by far the most-studied with regard to their ECM and cell-adhesion mechanisms. This relates to the lengthy history of Hydra as an experimental model [18], the phylogenetic position of cnidarians as the sister group to bilaterians, and the presence of a morphologically well-defined ECM, the mesoglea, that, in Hydra, can be isolated away from the cell layers of the body wall as an acellular structure and is thus suitable for biochemical study [19,20].
Knowledge of the ECM of the other early-diverging phyla is much more sparse. Placozoa comprise an enigmatic phylum that to date includes only a few species and will not be considered further here [21]. A mesoglea between the epithelial cell layers is apparent in ctenophores, but information on its molecular composition is very limited (discussed further below). Sponges have bio-mineralised extracellular structures (spicules) embedded in a fibrillar meshwork, but most classes lack ECM as recognised morphologically in bilaterians. A limitation for the study of ctenophores and sponges until recent years has been a lack of laboratory model species or cell culture [22,23].
The limited knowledge is significant because both Porifera and Ctenophora are considered to be of earlier evolutionary origin than Cnidaria, as evidenced by the fossil record and molecular phylogeny reconstructions [24,25].
However, there remains considerable controversy over whether the sponges or the ctenophores are of earliest evolutionary origin, i.e., which phylum represents the sister group to all other animals. Historically, sponges were placed at the base of the animal evolutionary tree due to their simple morphological organisation, limited number of cell types, and the absence of recognisable nerve or muscle structures: both of the latter are present in ctenophores and cnidarians [26] (Fig. 1A). With the expansion of molecular phylogenetics, several studies have surprisingly placed ctenophores as the sister group to all other animals [e.g., 27] (Fig. 1B), whereas others continue to support the traditional "poriferans-sister" model [e.g., 28,29]. Genome sequencing of two species of ctenophores indicated major differences in the categories of encoded proteins in comparison to all other metazoans, with many proteins of bilaterians noted to be absent [30,31,32]. Thus the "ctenophore first" hypothesis remains under active discussion and investigation [33,34].
The advent of genomic and transcriptomic sequencing projects for an increasing number of poriferan and ctenophore species has revolutionised the possibility to gain insight into ECM content in sponges and ctenophores, through analysis of the predicted proteomes of individual species. This article will review the published literature and discuss our findings from a recent detailed survey of public genomic and transcriptomic databases for ECM proteins in species representing three classes of sponges and three species of ctenophores.

Known Components of ECM in Ctenophores and Poriferans
Ctenophores. The unique anatomy and ultrastructure of ctenophores have been studied by light and electron microscopy, with major interest in developmental processes, nerve and muscle tissues, and the specialised rows of locomotory ciliated combs (Fig. 1C, 1D) [35,36]. Prey capture is carried out by specific colloblast cells on extensible tentacles (most species) or by direct engulfment (Beroe species, which lack tentacles) [27,37]. The mesoglea is typically described as transparent and jelly-like. From transmission electron microscopy studies of the tentacles of Euplokamis, the mesoglea was observed to contain networks of striated fibrils, interpreted as collagen fibrils, as well as muscle fibres, mesenchymal cells and a network of nerve cells. Curious box-like, acellular, extracellular structures were also observed [38]. Later immunofluorescent staining studies of Pleurobrachia species or Beroe abyssicola also identified many cell types within the body wall mesoglea, including networks of nerve cells, muscle and other cell types [39][40][41]. Transmission electron microscopy has also provided views of a basement membrane-like layer that underlies ectodermal cells in Pleurobrachia bachei and B. ovata, but is not visible in Mnemiopsis leidyi [14].
There is very little direct knowledge of the composition of ctenophore mesoglea, but a phylogenetic study of the basement membrane proteoglycan, perlecan, concluded that perlecan is absent from M. leidyi [42]. Fidler et al. [14] detected collagen IV by immunohistochemistry as diffuse arrays in proximity to ectodermal cells in M. leidyi, and with more appearance of linear elements in Beroe and Pleurobrachia. Genomic and transcriptomic analyses identified many collagen IV paralogues, whereas small collagenous proteins (designated spongins from their initial identification in Porifera) were absent and a unique type of secreted protein, containing only a non-collagenous (NC) domain, was identified and designated NC1 protein [14].
Porifera. Adult sponges are sessile, vase-shaped animals with pores that filter water into the body cavity for food uptake by specialised choanocytes (Fig. 1E) [43,44]. The body wall consists of an epithelial bilayer supported by mineralised spicules and a meshwork of extracellular fibrils termed the mesohyl (Fig. 1F). Unlike ctenophores, overt cell-cell junctions are apparent between epithelial cells [45]. There are four extant classes of sponges (Fig. 1G) and different classes have different processes of biomineralisation. In calcareous sponges, calcium carbonate-based spicules are assembled extracellularly through carbonic anhydrase activity and possibly in association with acidic extracellular proteins [46][47][48]. In siliceous sponges, silicon dioxide spicules are templated through intracellular and extracellular processes involving (in many species) the polymerising enzyme, silicatein, and frequently with templating onto collagen fibres [49][50][51]. Silicateins are related to the cathepsin family of intracellular processing and degrading enzymes and are thought to have arisen in the sponge lineage through ancestral gene duplication and point mutation of cathepsin L [52]. Chitin has also been identified as a spicule-associated, possible template in demosponges and a glass sponge [53,54].
By electron microscopy, class Homoscleromorpha is distinguished by the presence of a basement membrane structure [55,56]. Indeed, a collagen IV cloned from a homoscleromorph sponge was shown to have a basement membrane-like localisation [55]. In addition, sponges (along with various other invertebrates) encode short-chain spongins that contribute to 10 nm microfibrils within the mesohyl. Spongins comprise around 79-100 Gly-Xaa-Yaa triplets and 3 noncollagenous regions, with the C-terminal noncollagenous regions having homology and a proposed shared evolutionary origin with the NC1 domain of collagen IV [57,58]. Collagen fibril structures have been identified by ultrastructural criteria in several sponge species [59][60][61] and molecular cloning led to the recognition of a diversity of molecular forms of collagens of the fibrillar or interrupted-triple-helix types in addition to spongins ([62][63][64] and reviewed by [65]).
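The Gly-Xaa-Yaa repeat that defines a collagenous region is simple to detect computationally. The sketch below counts the longest uninterrupted run of such triplets in a protein sequence given in one-letter code. The function name and the toy sequence are illustrative assumptions; real collagen annotation tools also handle interruptions in the repeat, which this sketch deliberately does not.

```python
def longest_gxy_run(seq: str) -> int:
    """Return the largest number of consecutive Gly-Xaa-Yaa triplets in seq.

    A triplet is any three residues whose first residue is glycine ("G");
    Xaa and Yaa may be any residue. Runs are uninterrupted: the next
    triplet must start immediately after the previous one ends.
    """
    seq = seq.upper()
    n = len(seq)
    best = 0
    for start in range(n):
        if seq[start] != "G":
            continue
        count = 0
        pos = start
        # Extend the run while a full triplet beginning with G is available.
        while pos + 3 <= n and seq[pos] == "G":
            count += 1
            pos += 3
        best = max(best, count)
    return best


# Invented toy sequence: five GPP triplets, a two-residue break, two GAP triplets.
toy = "MK" + "GPP" * 5 + "AA" + "GAP" * 2
print(longest_gxy_run(toy))
```

Applied to a predicted proteome, a count of this kind could serve as a first-pass screen for short collagenous proteins before any domain-level comparison with the NC1 domain.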
Other mechanisms may involve C-type lectins and a calcium-dependent lectin, clathrilectin [74,75], as well as self-association of carbohydrates [76].
Chemically, the sulphated polysaccharides appear very varied, with varying amounts of sulphated galactose, fucose, arabinose or hexuronic acid identified across species [77]. Examination of the structures of the acid-labile carbohydrates of glyconectins from several species identified these to include heterogeneous sulphated oligosaccharides with variable amounts of fucose, arabinose, or py(4,6)Galactose residues, and thus distinct from the repeated disaccharide units of bilaterian glycosaminoglycans [78].
With regard to other mechanisms of cell-ECM associations, integrin subunits have been cloned from several sponge species [79,80,81]. Integrin(s) have been implicated functionally in autograft fusion in Geodia cydonium [82] and the response of Microciona prolifera cells to depletion of extracellular sulphate [83]. The identification of vinculin in Oscarella pearsei and the localisation of this protein to cell-cell and cell-ECM contact sites further supports that poriferan integrins are likely to function in adhesion and cell signaling, as in bilaterians [84]. Dystroglycan-like proteins with possible laminin-binding capacity have been recognised in several sponges in addition to the dystroglycans of cnidarians and bilaterians [85].

Insights from Genomics and Transcriptomics: A Structured Survey of ECM Proteins Encoded in Poriferans and Ctenophores
The sequencing of the genome of the demosponge Amphimedon queenslandica expanded the view on candidate ECM proteins of sponges.
Analysis of the predicted proteins indicated that, even in the absence of overt basement membrane-like structures, laminin-like proteins are encoded [86,87]. Since 2010, transcriptomes for sponges of other classes and genomes and transcriptomes for several species of ctenophores have been published [31,32,88,89].
To obtain a wider view of ECM and ECM-associated proteins in ctenophores and poriferans, we surveyed sponge and ctenophore genome- and/or transcriptome-predicted proteins for selected ECM and ECM-affiliated proteins. The ECM proteins chosen for study are highly conserved in invertebrate and vertebrate bilaterians [90,91] and have known functional roles in the fibrils and meshworks of the ECM. A range of collagens were included as search tools to assist identification of possible disparate forms.
Major glycoprotein and proteoglycan receptors that tether ECM proteins at cell surfaces and extracellular proteases important for ECM dynamics in bilaterians were also included (Fig. 2A), along with intracellular proteins that are important for procollagen assembly, processing and collagen fibril formation (Fig. 2B) [92], or for the post-translational assembly of the core tetrasaccharide linker for glycosaminoglycan substitution on proteoglycan core proteins [93] (Fig. 2C). Spongin and silicatein are not present in vertebrates but were included because of their known importance in poriferan ECM. The suite of 41 proteins analysed is listed in Supplementary Table 1.
Collectively, the data demonstrate dramatic differences in the profile of conserved proteins in ctenophores versus poriferans (Fig. 3, Table 1). Many more proteins in common with bilaterian ECM are encoded in poriferans than in ctenophores. Nevertheless, the ctenophore list does include a repertoire for a basic cell-ECM adhesion system: cell-surface receptors, ECM proteins, a cross-linking enzyme and a potential ECM-proteolytic enzyme (Fig. 3). The proteins identified in ctenophores were for the most part present in all three species, with the exception of syndecan, identified only in P. bachei, and potential matrix metalloproteases, identified in M. leidyi and H. californensis.
In agreement with Fidler et al. [14], many collagen IV-like paralogues were identified. Post-translational modifications of proline and lysine residues contribute to the stability of collagen triple helices in vertebrates; however, only prolyl-4-hydroxylase, and not pro-collagen lysine dioxygenase (lysine hydroxylase), was identified in these ctenophores (Table 1, Fig. 3). The encoding of multiple integrin alpha and beta subunits (Table 1) indicates potential for diverse specificities of integrin-mediated cell adhesion, perhaps in line with the relatively large number of cell types now documented in ctenophores [94]. Silicatein-like proteins were identified in all three species; however, given the recognised general divergence of protein sequences in ctenophores [31,32], in-depth studies will be needed to determine the relationship of these to the cathepsin family (Table 1).
We examined the laminin-like proteins in more detail in view of the early evolution of collagen IV [14] and the central role of laminin in basement membrane assembly in bilaterians [95]. With the caveat that some of the identified sequences are incomplete, the laminin subunits identified present a complex picture. Although all are large proteins with many of the characteristic domains of laminins, many variations in domain organisation are apparent, including atypical domains such as thrombospondin or fibronectin III domains.
Overall, the laminin proteins of sponges are more similar to those of bilaterians, yet distinctions between beta and gamma subunits are blurred at both the sequence and domain levels. Notably, the laminin N-domain is lacking from (apparently full-length) ctenophore proteins and two alpha-like subunits but no beta- or gamma-like subunits were identified in P. bachei (Fig. 4). The numbers of alpha-like and beta/gamma-like subunits varied between species and the alpha-like subunits included at most three laminin-G domains.
Biochemical experiments will be needed to determine if these proteins are capable of forming stable heterotrimers and undergoing extracellular polymerisation or integrin-binding. For the ctenophore proteins, it is of interest whether heteromers including two different alpha subunits can be assembled.
The TGM-like proteins identified in the ctenophores and A. queenslandica sponge each include all the major domains of TGM2 but have only 30%-35% sequence identity to human TGM2. However, at the active site of TGM2, the identity is 65%-80% and the cysteine residue is completely conserved (Fig. 5A). Molecular models of the active site region for four of the sequences, constructed against secondary structure alignments of four TGM2 structures from the Protein Data Bank (4KTY_B, 1G0D_A, 1LM9_A and 2Q3Z_A), are presented overlaid with the structure of human TGM2 (4PYG) that was not used for modelling (Fig. 5B, 5C). The models demonstrate that residues within 6 Å of the active site cysteine align very well with the known structure (Fig. 5B), as do the highly-conserved residues at the active site (Fig. 5C). We predict that the ctenophore and sponge proteins should be active transglutaminases.
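The contrast between low overall identity and high identity around the active site can be made concrete with a small calculation on a pairwise alignment. The sketch below is illustrative only: the two aligned strings are invented toy sequences (not TGM2 or any real protein), and the convention of counting only columns in which both sequences have a residue is one of several in common use and is an assumption here.

```python
from typing import Optional

GAP = "-"


def percent_identity(
    aln_a: str, aln_b: str, start: int = 0, end: Optional[int] = None
) -> float:
    """Percent identity over alignment columns [start, end).

    Columns where either sequence has a gap are excluded from both the
    numerator and the denominator (an assumed convention).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = zip(aln_a[start:end], aln_b[start:end])
    compared = [(a, b) for a, b in columns if a != GAP and b != GAP]
    if not compared:
        return 0.0
    matches = sum(1 for a, b in compared if a == b)
    return 100.0 * matches / len(compared)


# Invented toy alignment with a conserved core around a cysteine (column 8).
query = "MKTLAGYQCWVFAPDE"
target = "MRSV-GYQCWVFSKNQ"

overall = percent_identity(query, target)
core = percent_identity(query, target, start=5, end=12)
print(f"overall: {overall:.1f}%  core window: {core:.1f}%")
```

The same function applied to a window centred on the catalytic residue of a real alignment would give the kind of local figure quoted in the text, provided the alignment columns are chosen consistently across sequences.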
ECM-related proteins not found in the ctenophores raise other intriguing questions about the biochemical nature of ctenophore mesoglea. It was previously noted that core proteins of secreted proteoglycans of bilaterians are not conserved in early-diverging metazoans [19,90]. However, carbohydrate has been reported as <1% of dry weight of ctenophores [98].
With the exception of glucuronyl-transferase, homologues of the bilaterian enzymes for addition of the core O-linked saccharides (Fig. 2C) are not identifiable (Table 1, Fig. 3). Whether this pathway evolved later or has been lost through lineage-specific gene losses in ctenophores is unclear.
With regard to ECM structure, no fibrillar-like collagens were identified and the collagen cross-linking enzyme lysyl oxidase was also absent, raising questions over the nature of observed striated fibrils in ctenophore mesoglea [38]. However, bone morphogenetic protein 1 (BMP1), which cleaves the C-propeptide of fibrillar procollagen [99] (Fig. 2B), was present. In bilaterians, BMP1 has many other substrates including a laminin gamma chain [99] and it may be expected that the ctenophore protein can target other substrates. In agreement with [14], spongin was not identified.
In contrast, poriferans were confirmed to encode a wider repertoire of ECM proteins including fibrillar-like collagens of various domain architectures and fibrillin, as well as SPARC and one or more thrombospondin superfamily members (see [16] for details of the thrombospondin superfamily). Collagen in the homoscleromorph sponge. The encoding of lysyl oxidase is in agreement with the detection of striated collagen fibrils in sponges [59][60][61].
Notwithstanding the unusual carbohydrate structures reported in sponges (see section above), the suite of carbohydrate-addition enzymes encoded indicate potential for addition of the O-linked core tetrasaccharide of glycosaminoglycans (Table 2). Silicateins were identified as expected in the demosponge A. queenslandica and also in the other species examined (Table 3).

Perspective
Current laboratory experiments and analyses of genome-predicted proteins indicate that ctenophore ECM has a very different protein composition from that of other metazoans. This cannot be interpreted as a result of the early phylogenetic emergence of this phylum because poriferans, traditionally considered the earliest-diverging metazoans, are found to have an array of ECM and ECM-affiliated proteins that is clearly closer to the conserved repertoire of cnidarians and bilaterians. The difference in ECM might be considered an indication that ctenophores evolved prior to poriferans, in which case the limited set of proteins conserved between ctenophores, poriferans and bilaterians can be taken to represent a prototypic "toolkit" for a minimal metazoan ECM [100]. The combination of collagen IV, laminin-like proteins and thrombospondin superfamily members is of great interest, as the concept of coordinated function of these three proteins within ECM has received little consideration.
However, other factors also need to be taken into consideration. There are estimated to be around 5,000 species of extant poriferans, yet only about 150 known species of ctenophores. This may reflect that the deep-sea lifestyle of many ctenophores makes it difficult to identify the true number of species, or alternatively could indicate very different evolutionary histories of poriferans and ctenophores. Sponges and ctenophores evolved when oxygen levels on Earth were far lower than at present [101]. Indeed, members of both phyla lack hypoxia-inducible factor (HIF), indicating that oxygen availability does not drive gene expression through the HIF pathway as in cnidarians and bilaterians [102]. In a "poriferan-first" evolutionary scenario, the limited repertoire of known ECM proteins in ctenophores would represent secondary gene losses, leading to a proposal of a relatively complex ECM in the metazoan ancestor. Many ctenophores live in the deep sea, and this environment of low oxygen, sparse food sources [101,103] and high hydrostatic pressure may have driven selection for a unique form of ECM.
Clearly, the anatomy of ctenophores does include a mesoglea and, to date, to our knowledge, an unbiased study of ctenophore mesoglea by proteomic methods has not been carried out. Only through this type of approach will a clear view of ctenophore mesoglea composition be gained. A limitation of focusing on the ECM proteins conserved with bilaterians is that possible ctenophore- or poriferan-specific ECM proteins remain undisclosed. As discussed by others, it is very likely that a considerable "hidden biology" of ctenophores and poriferans remains to be discovered [104]. Nevertheless, the positive identification of certain ECM proteins conserved between these early-emerging phyla and other metazoans increases the precision of models for an ancestral metazoan ECM.
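The reasoning used in this Perspective, that proteins shared by ctenophores, poriferans and bilaterians are candidates for an ancestral ECM, amounts to a set intersection. The sketch below illustrates the idea with a small, hand-simplified subset of protein categories taken from statements in the text; it is not a transcription of Table 1 or of the 41-protein survey, and the variable names are assumptions for illustration.

```python
# Simplified phylum-level presence sets, based only on statements in the text.
shared_core = {
    "collagen IV",
    "laminin-like",
    "thrombospondin superfamily",
    "integrin",
    "membrane-associated proteoglycan",
    "transglutaminase",
}

ctenophores = set(shared_core)
poriferans = shared_core | {"fibrillar-like collagen", "lysyl oxidase"}
bilaterians = shared_core | {"fibrillar-like collagen", "lysyl oxidase"}

# Candidates for a minimal ancestral ECM: present in all three groups.
ancestral_candidates = ctenophores & poriferans & bilaterians

# Shared by poriferans and bilaterians but not found in ctenophores.
# Under a "poriferan-first" scenario these would be candidate secondary
# losses; under a "ctenophore-first" scenario, candidate later gains.
not_in_ctenophores = (poriferans & bilaterians) - ctenophores

print(sorted(ancestral_candidates))
print(sorted(not_in_ctenophores))
```

The calculation itself is agnostic about phylogeny; only the interpretation of the second set changes with the assumed branching order, which is the point made in the preceding paragraphs.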

SUMMARY POINTS
1. Ctenophores and poriferans have distinct sets of ECM-related proteins in comparison to the most highly-conserved ECM and ECM-affiliated proteins of bilaterians.
2. In particular, ctenophores lack many of the structural ECM proteins and enzymes for addition of the core O-linked tetrasaccharide that is characteristic of bilaterian glycosaminoglycan substitutions.

Figure 5 legend (partial): [96]. The helix is highlighted in yellow as orientation for the overlay models in Fig. 5C. C, Models were prepared by HHPRED and MODELLER [110] and are shown as Aq (pink), ML03126a (green), ML25826a (salmon) and Pb3462531 (silver) overlaid with the crystal structure of human TGM2 from 4PYG.pdb (black, with the catalytic cysteine (C277) labelled). The overlays show high conservation of the sidechains of residues within a 6 Angstrom radius around the catalytic cysteine.

Table 1. The ECM-related proteins conserved with bilaterians identified from the ctenophore species studied.