Phylogeny of L1 sublineages
WGS data of 1,764 M. tuberculosis (Mtb) L1 isolates from Asia, Oceania and Africa were obtained until November 2020. Most isolates were from Thailand (36%), Vietnam (26%), India (13%) and the Philippines (9%). The rest were from other Asian countries (5%), Africa (n = 151; 9%) and Oceania (Australia and Hawaii; 2%) (Supplementary Table S1). Accession numbers of the raw paired-end reads of all Mtb isolates used in this study are provided in Supplementary Table S2.
L1.1 splits into three main sublineages, denoted L1.1.1–L1.1.33. Most (97%) of the L1.1.1 isolates were from Thailand or Vietnam. L18.104.22.168 was the most common sublineage of L1.1.1 (51%) and accounted for 85% of the samples from Vietnam but only 3% of the Thai samples. L22.214.171.124 was also mainly found in Vietnam, while most of the remaining sublineages were predominantly found in Thailand. We designated two new clades in L1.1.1: L126.96.36.199 from Vietnam and Thailand and L188.8.131.52 from West Africa10. About 3% of L1.1.1 isolates were left unclassified (Fig. 1 and Supplementary Table S3).
L1.1.2 splits into two deep-branching clades5. L184.108.40.206 consists entirely of Thai samples, while L220.127.116.11 is widespread across Africa, South Asia and mainland Southeast Asia (MSEA), and is most common in India (80%) (Supplementary Table S3). Notably, L18.104.22.168 was not identified in samples from East Asia or island Southeast Asia (ISEA).
L1.1.3 contains four distinct clades, designated L22.214.171.124–L126.96.36.199 (Fig. 1). The first three have been reported in many Asian studies5,8,13. L188.8.131.52 was identified mostly among African samples (Supplementary Table S3), particularly in Malawi (69%). Here, the fourth-level numbering of L1.1.3 clades is consistent with Palittapongarnpim et al.5 but differs from the scheme of Napier et al.6, which does not recognize L184.108.40.206 (Fig. 1).
Interestingly, one sublineage in the scheme of Napier et al.6 that was not present in our main dataset was L1.2.1, with most isolates being from patients in Europe. Therefore, we inferred a separate phylogeny of L1.2 isolates that included 410 isolates from our dataset and an additional 364 isolates from Napier et al.6 and other recent studies10,14 (Fig. 2). L1.2 clearly splits into two main branches: L1.2.1 is a small clade (8.4% of L1.2) with mostly European samples, while L1.2.2 is a large clade (91.6%) that is widespread across East Asia, MSEA and ISEA, consistent with Napier et al.6 (Table 2). Intriguingly, L1.2.1 contains a basal deep-branching clade of samples from East Timor and Papua New Guinea. L1.2.2 is classified into five sublineages, 220.127.116.11–18.104.22.168. L22.214.171.124–L126.96.36.199 were previously defined as L188.8.131.52–L184.108.40.206, respectively5. L220.127.116.11 was most common in Taiwan (31%), followed by the Philippines, Malaysia (Sabah) and Thailand (~ 10–15% each). L18.104.22.168 was mostly restricted to Thailand (89%). L22.214.171.124 was the largest sublineage (40% of L1.2). L126.96.36.199 was a small sister clade of L188.8.131.52. Both L184.108.40.206 and L220.127.116.11 were most common in the Philippines and nearby East Malaysia. L18.104.22.168 was the smallest sublineage, with most samples from Vietnam. The separation of L1.2.2 into five sublineages was supported by their intragroup average pairwise SNP distances, which were lower than the intergroup average pairwise SNP distances, and by the fixation indices (Supplementary Fig. S2).
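The comparison of intragroup and intergroup average pairwise SNP distances can be illustrated with a minimal sketch. It assumes that each isolate is represented by an equal-length string of concatenated SNP alleles; the function names and the toy sequences are hypothetical and are not taken from the study's pipeline, and the fixation index is not computed here.

```python
from itertools import combinations, product
from statistics import mean


def snp_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_intragroup_distance(group: list[str]) -> float:
    """Average SNP distance over all unordered pairs within one group."""
    return mean(snp_distance(a, b) for a, b in combinations(group, 2))


def mean_intergroup_distance(group_a: list[str], group_b: list[str]) -> float:
    """Average SNP distance over all pairs drawn from two different groups."""
    return mean(snp_distance(a, b) for a, b in product(group_a, group_b))


# Toy data (hypothetical): two groups of two isolates each.
group_1 = ["AAAA", "AAAT"]
group_2 = ["TTTT", "TTTA"]

intra_1 = mean_intragroup_distance(group_1)
intra_2 = mean_intragroup_distance(group_2)
inter = mean_intergroup_distance(group_1, group_2)
```

In this toy case both intragroup means are smaller than the intergroup mean, which is the pattern expected when a proposed split separates genuinely distinct groups.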
L1.3 (previously L1.2.2) was widespread in countries surrounding the Indian Ocean. It contains the L1.3.1 (18%) and L1.3.2 (82%) sublineages, with one basal isolate from Thailand. Almost all L1.3.1 isolates (88%) were from Eastern or Southern Africa, with a few basal isolates from India. In contrast, L1.3.2 was most common in India (36%), followed by Thailand (20%) (Supplementary Table S3).
Analysis of 68 DVRs and spoligotypes
All L1 isolates had characteristic deletions of six DVRs, DVR39–42, 44 and 48, in the CRISPR region5. A small fraction (< 3%) of L1.1.1 isolates in many sublineages had only those six deletions (Supplementary Table S4). The corresponding 777777777413771 (SIT236) spoligotype was expected to resemble that of the most recent common ancestor of L1 (L1-MRCA). Because the CRISPR region evolves mainly by deletion, sublineages with more deleted DVR blocks are expected to be more derived. We found that L1.1.1 had the lowest number of deleted DVR blocks, 4 on average, followed by L1.3.2 (4.5), L1.1.2 (5), L1.1.3 (6), L1.2.2 (6) and L1.3.1 (7) (Supplementary Fig. S4).
Some sublineages correlated well with a unique set of DVR deletions that can be used as markers (Table 1). For example, L1.1.3 isolates share DVR33 and DVR56 deletions. L22.214.171.124 contained a clade with additional deletions of DVR6–7, DVR51 and DVR57, corresponding to spoligotype 777777757413371 (SIT292). L126.96.36.199 had additional deletions of DVR7–10, DVR12–19 and DVR34, corresponding to spoligotype 700777747413771 (SIT129). All L1.2 isolates shared the DVR10 deletion. Most L1.2.1 isolates had additional deletions of DVR11 and DVR33, except for a few basal isolates. Since DVR10–11 are not used for spoligotyping, some L1.2.1 and L1.1.3 isolates have the same spoligotype, 777777757413771 (SIT591), because of the DVR33 deletion. Additional deletions of DVR4 and DVR30–31 were specific to L1.2.2 (typically 677777477413771, SIT19). L188.8.131.52 had an additional deletion of DVR17–35 (typically 674000003413771, SIT89).
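The 15-digit spoligotype codes quoted above can be decoded with a short sketch. It assumes the standard octal convention for the 43 spoligotyping spacers, in which each of the first 14 digits encodes three spacers and the final digit encodes spacer 43 alone. The function names are hypothetical. The indices returned are spacer numbers in the 43-spacer scheme, not DVR numbers, and no mapping between the two numbering systems is attempted here.

```python
def octal_to_spacers(octal_code: str) -> list[int]:
    """Decode a 15-digit spoligotype octal code into 43 flags (1 = spacer present)."""
    if len(octal_code) != 15 or not octal_code.isdigit():
        raise ValueError("expected a 15-digit code")
    bits: list[int] = []
    # Digits 1-14 each encode three consecutive spacers (spacers 1-42).
    for digit in octal_code[:14]:
        value = int(digit, 8)  # raises ValueError for '8' or '9'
        bits.extend([(value >> 2) & 1, (value >> 1) & 1, value & 1])
    # Digit 15 encodes spacer 43 on its own and must be 0 or 1.
    last = int(octal_code[14])
    if last not in (0, 1):
        raise ValueError("final digit must be 0 or 1")
    bits.append(last)
    return bits


def spacers_to_octal(bits: list[int]) -> str:
    """Encode 43 presence/absence flags as a 15-digit octal code."""
    if len(bits) != 43:
        raise ValueError("expected 43 spacer flags")
    digits = [str(bits[i] * 4 + bits[i + 1] * 2 + bits[i + 2]) for i in range(0, 42, 3)]
    digits.append(str(bits[42]))
    return "".join(digits)


def absent_spacers(octal_code: str) -> list[int]:
    """Return the 1-based indices of spacers that are absent."""
    return [i + 1 for i, bit in enumerate(octal_to_spacers(octal_code)) if bit == 0]


sit236_absent = absent_spacers("777777777413771")
```

Under this convention, the SIT236 code decodes to the absence of spacers 29–32 and 34, and encoding the decoded flags again reproduces the original code.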
Several sublineages shared a single DVR deletion block, e.g. L184.108.40.206 (DVR18–21), L220.127.116.11 (DVR57–60) and L18.104.22.168 (DVR53) (Table 1 and Supplementary Fig. S1). However, those deletions could occur in other sublineages as well, limiting their potential use as reliable markers. Deletions of some DVRs were clearly homoplastic, occurring several times independently in the evolutionary history of L1. DVR62 deletion is characteristic of both L1.3 and L22.214.171.124, and DVR62 is deleted together with DVR61 in L126.96.36.199. This deletion was also found in many other sublineages of L1.1.1. Thus, DVR62 deletion by itself is of limited genotypic value.
Several DVR deletions appeared to occur sequentially along the phylogeny. For example, most of L188.8.131.52 (n = 349, 84%) belonged to a clade with shared DVR36–37 deletions. The rest (16%) had intact DVR36–37 and were more basal to this clade, with 16 isolates having only six DVR deletions, similar to L1-MRCA (Supplementary Fig. S1 and Supplementary Table S4). Within the DVR36–37 deletion clade, there were two subclades with additional deletions of DVR23 (n = 25) or DVR29–63 (n = 35).
Extreme deletions led to the complete absence of DVRs in 7 isolates from Thailand, all belonging to L184.108.40.206 (Table 1, Supplementary Fig. S1 and Supplementary Table S4). The WGS data of these isolates had a read depth of 15–46 (median = 26) and a breadth of coverage at a sequencing depth of 20 of 23–96% (median = 74%). These isolates formed two separate clades, suggesting two independent events of complete deletion. The spoligotype-negative event was linked to the gene cas1, the deletion of which has been associated with increased vulnerability to DNA damage15.
Geographic range of L1
Sublineages of L1 were distributed preferentially around the Indian Ocean and the Western Pacific region (Fig. 3). L1.2 was predominant in ISEA and was rarely reported from Africa, while L220.127.116.11 was restricted to West Africa10. Only L18.104.22.168 and L1.3 were distributed widely across Asia and Africa, particularly in South Asia.
At the country level, some countries with > 50 samples had a distinct predominant L1 sublineage: L22.214.171.124 for Vietnam, L126.96.36.199 for India, L188.8.131.52 for Malawi, L184.108.40.206 for Taiwan, L220.127.116.11 for the Philippines and East Malaysia, and L18.104.22.168 and L22.214.171.124 for Thailand (Fig. 3, Table 2 and Supplementary Table S3). India, Thailand and Myanmar were the only countries in this dataset that harbored isolates belonging to all five common sublineages of L1, even though only 24 WGS samples were from Myanmar.
Sublineage barcoding SNPs
We identified SNPs uniquely shared by all isolates within each sublineage as markers for sublineage identification (Supplementary Tables S5 and S6). Among the 1,835 sublineage-specific SNPs, ~ 80% were in non-essential coding regions or in noncoding regions. We recommend using the full set of SNP markers when possible. We also provide a subset of 125 barcoding SNPs, prioritizing synonymous SNPs within essential genes (Supplementary Table S6).
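The marker criterion described above (SNPs present in every isolate of a sublineage and in no isolate outside it) can be expressed as a simple set operation. The following is a minimal sketch under the assumption that each isolate is summarized as a set of variant positions relative to a reference; the function name, isolate names, sublineage labels and positions are all hypothetical.

```python
def sublineage_specific_snps(
    calls: dict[str, set[int]], labels: dict[str, str]
) -> dict[str, set[int]]:
    """Map each sublineage to SNPs found in all of its isolates and in no other isolate.

    calls  : isolate name -> set of variant positions
    labels : isolate name -> sublineage label
    """
    markers: dict[str, set[int]] = {}
    for lineage in {labels[name] for name in calls}:
        members = [calls[n] for n in calls if labels[n] == lineage]
        others = [calls[n] for n in calls if labels[n] != lineage]
        shared = set.intersection(*members)        # present in every member
        seen_elsewhere = set().union(*others)      # present in any non-member
        markers[lineage] = shared - seen_elsewhere
    return markers


# Toy data (hypothetical positions and labels).
toy_calls = {
    "iso1": {100, 200, 300},
    "iso2": {100, 200},
    "iso3": {300, 400},
    "iso4": {400, 500},
}
toy_labels = {"iso1": "X", "iso2": "X", "iso3": "Y", "iso4": "Y"}

toy_markers = sublineage_specific_snps(toy_calls, toy_labels)
```

Position 300 is excluded from both marker sets in this toy example because it occurs in isolates of both groups, and position 500 is excluded because it is not shared by all isolates of its group.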
Drug resistance mutations
We used TBprofiler to predict drug resistance based on known genetic markers16 (Supplementary Fig. S5). The majority of isolates (n = 1,272, 72%) did not possess any resistance-conferring mutations and were therefore likely pan-sensitive. Isoniazid resistance was most common (n = 348, 19.7%), followed by ethionamide (n = 201, 11.4%), streptomycin (n = 188, 10.7%), rifampicin (n = 160, 9.1%), ethambutol (n = 99, 5.6%), pyrazinamide (n = 66, 3.7%) and fluoroquinolones (n = 28, 1.6%) (Supplementary Table S7). Mutations conferring resistance to other drugs were found in less than 1% of the isolates. By classifying drug resistance profiles based on the latest WHO recommendations (see Methods), we found that 13% (n = 224) of the isolates were resistant to at least one first-line drug (mono-DR) while 7% (n = 123) were MDR. About 1% of the isolates were pre-XDR. Only one isolate was XDR. Most rifampicin-resistant isolates were also resistant to isoniazid (n = 142, 89%).
The prevalence of drug resistance varied across sublineages (Supplementary Figs. S5 and S6), which were correlated with geography (Fig. 3 and Supplementary Table S3). MDR isolates were most prevalent among L126.96.36.199 (18%), L188.8.131.52 (13%) and L184.108.40.206 (33%), sublineages common in the Philippines and Malaysia (Table 2). This was consistent with a previous report of 19% MDR-TB in the Philippines11, although the MDR incidence among new TB cases there was only 2% in 201217. We caution that the prevalence of drug resistance may not be representative because of the different sampling designs used by the source studies.
Mutations associated with drug resistance were diverse. All rifampicin-resistant isolates had variants in the rpoB gene, with 31 distinct alleles associated with changes at 13 amino acid residues. For isoniazid resistance, 99% of the isolates (n = 344) had mutations in katG or fabG1/inhA involving 36 alleles, with katG Ser315Thr being most common (52%), followed by the -15C > T mutation in the fabG1 promoter (38%), with only 9 isolates having both variants (Supplementary Table S7). For the other two first-line drugs, we identified 44 alleles of pncA among 63 isolates conferring pyrazinamide resistance and 30 alleles in embA, embB and embR conferring ethambutol resistance.
A small proportion of isolates (n = 28, 1.6%) possessed mutations in gyrA or gyrB conferring resistance to fluoroquinolones, with Ala90Val and Asp94(Gly/Ala/Asn) in gyrA being the most common (n = 26). Among those isolates, 18 qualified as pre-XDR and mostly belonged to L1.1.1 or L1.3.2. The only isolate identified as XDR belonged to L220.127.116.11 from Thailand. In addition to pre-XDR mutations, it also had an insertion in Rv0678 at position 192 that made it likely to resist bedaquiline18.
We identified genetic clusters based on the number of SNP differences being within a pre-specified cut-point (Supplementary Fig. S7). Using three cut-points at 5, 12 and 20 SNPs, there were 20 (1.1%), 132 (7.6%) and 251 (14.4%) clustered isolates, respectively, with a mean cluster size of 2.0, 2.1 and 2.4 (Supplementary Table S8). The clusters were distributed across most sublineages, but were most common among L1.3 (31%) and L1.1.3 (23%) isolates, and were rare in L1.1.1 (9.6%).
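One common way to derive clusters from a SNP cut-point is single-linkage grouping, in which two isolates are linked when their pairwise SNP distance does not exceed the cut-point and clusters are the resulting connected components. The sketch below illustrates that approach only; the text does not state which linkage rule was used or whether the cut-point is inclusive, so both are assumptions, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def cluster_by_cutoff(seqs: dict[str, str], cutoff: int) -> list[set[str]]:
    """Single-linkage clusters of isolates linked at distance <= cutoff.

    Isolates with no link to any other isolate (singletons) are not reported.
    """
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= cutoff:  # inclusive cut-point (assumed)
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) > 1]


# Toy data (hypothetical): A, B and C form a chain of 1-SNP links; D is distant.
toy_seqs = {
    "A": "AAAAAAAA",
    "B": "AAAAAAAT",
    "C": "AAAAAATT",
    "D": "CCCCCCCC",
}
clusters_at_1 = cluster_by_cutoff(toy_seqs, 1)
clusters_at_0 = cluster_by_cutoff(toy_seqs, 0)
```

With single linkage, A and C fall in the same cluster at a cut-point of 1 even though they differ by 2 SNPs, because both are linked through B. This chaining behavior is one reason the choice of linkage rule affects the number of clustered isolates.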