Systematic review of response criteria and endpoints in autoimmune hepatitis by the International Autoimmune Hepatitis Group

© 2022 The Authors. Published by Elsevier B.V. on behalf of European Association for the Study of the Liver. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Background & Aims: Autoimmune hepatitis (AIH) has been well characterised and codified through the development of diagnostic criteria. These criteria have been adapted and simplified and are widely used in clinical practice. However, there is a need to update and precisely define the criteria for both treatment response and treatment endpoints. Methods: A systematic review was performed and a modified Delphi consensus process was used to identify and redefine the response criteria in autoimmune hepatitis. Results: The consensus process initiated by the International Autoimmune Hepatitis Group proposes that the term 'complete biochemical response', defined as 'normalization of serum transaminases and IgG below the upper limit of normal', be adopted, with assessment at 6 months after initiation of treatment. Insufficient response was defined as failure to meet this definition by 6 months. Non-response was defined as '<50% decrease of serum transaminases within 4 weeks after initiation of treatment'. Remission was defined as liver histology with a Hepatitis Activity Index <4/18. Intolerance to treatment was agreed to stand for 'any adverse event possibly related to treatment leading to potential drug discontinuation'.
Conclusions: These definitions provide a simple and reproducible framework to define treatment response and non-response, irrespective of the therapeutic intervention. A consensus on endpoints is urgently required to set a global standard for the reporting of study results and to enable inter-study comparisons. Future prospective database studies are needed to validate these endpoints.
Lay summary: Consensus among international experts on response criteria and endpoints in autoimmune hepatitis is lacking. A consensus on endpoints is urgently required to set a global standard for the reporting of study results and to enable the comparison of results between clinical trials. Therefore, the International Autoimmune Hepatitis Group (IAIHG) herein presents a statement on 5 agreed response criteria and endpoints: complete biochemical response, insufficient response, non-response, remission, and intolerance to treatment, which can be used to guide future reporting.

Introduction
Autoimmune hepatitis (AIH) is a rare liver disease with multiple forms of presentation. 1 It is characterised by the presence of autoantibodies, hypergammaglobulinaemia and abnormalities in liver histology, measurable by specific criteria. 2,3 There are no pathognomonic features, the pathogenesis is partly unknown, and it remains a diagnosis of exclusion. 4,5 Initial treatment consists of steroid induction therapy, in which the choice of agent depends on histological severity and fibrosis stage, followed by maintenance therapy with a steroid-sparing agent. 6 The ultimate treatment goal is to reduce long-term liver-related morbidity and mortality and to enhance quality of life. For some patients, liver transplantation will still be necessary in the acute or chronic setting. 1 The International Autoimmune Hepatitis Group (IAIHG) has provided guidance and consensus statements on codifying and characterising the disease, as well as guidance on serological and immunological testing. In its first report, in 1993, the IAIHG provided diagnostic criteria in addition to definitions relating to therapy. 7 This led to the introduction of the terms 'complete response', 'partial response', 'no response', 'treatment failure', and 'relapse'. These definitions of treatment responsiveness have been the basis for defining response in AIH over the past 30 years. However, the definitions left room for multiple interpretations. To add complexity, the American Association for the Study of Liver Diseases (AASLD) redefined these terms and added 'incomplete response' as a response criterion or endpoint, applicable when some or no improvement in the clinical, laboratory and histological features occurred despite compliance over a 2- to 3-year period. 8 Expert opinion has also been a guiding principle in the field, and indeed, terms such as 'good response' and 'insufficient response' have added to the confusion. 9
Therefore, the IAIHG considered the harmonisation of definitions a priority for 3 reasons. Firstly, it would enable the development of surrogate endpoints in AIH treatment, akin to those described in other rare liver diseases, such as primary biliary cholangitis and primary sclerosing cholangitis. [10][11][12] Secondly, it would contribute to a better understanding of outcomes for patients with AIH, based on patient data from both randomised studies and large case series of well-defined patients. Thirdly, reproducible consensus endpoints that are useful for clinical trials and grounded in current evidence, expert guidance and international consensus were considered critical for the advancement of the field. Identifying these outcome measures will help to better define the inflection points in management algorithms for patients with AIH. For that reason, we used a systematic review and a Delphi process to identify and present what we believe are the most important measures of treatment response in AIH. Finally, we sought external validation of 2 of the developed endpoints, non-response and complete biochemical response, in a large multicentre cohort of patients with AIH.

Materials and methods
Literature search
Systematic searches of MEDLINE, Embase, Web of Knowledge, and the Cochrane database were performed to identify systematic reviews of trials reporting outcomes of and/or therapeutic interventions in AIH, as well as published guidelines, consensus statements and recommendations regarding the management of AIH. This process collated all relevant endpoints used in the original trials. Only reviews published from January 1, 2010 onwards were included, to reduce the risk of using outdated or obsolete measures. Two authors (S.P. and T.G.) independently screened titles and abstracts of potentially eligible studies. Discrepancies were resolved by discussion between these 2 researchers; if they failed to agree, a third researcher could be consulted, but adjudication was never needed. Reference lists of relevant clinical studies and review articles were explored for additional studies. A detailed description of the MEDLINE search strategy is presented in Table S4. A total of 581 potential studies were retrieved, and 53 articles were evaluated in full text after abstract screening (Fig. 1). We excluded 36 studies after full-text evaluation, and our final count included 16 articles. We identified 11 (early) clinical trials in AIH. [13][14][15][16][17][18][19][20][21][22][23] Hand searching of the reference lists of included studies yielded a further 5 relevant articles that met the inclusion criteria: 1 meeting report 7 and 4 guidelines. 8,24-26 A variety of endpoints and definitions have been used over the years (Table S1). Using definitions published in the literature, various options for each endpoint of interest (non-response, complete biochemical response, insufficient response, remission, and intolerance to treatment) were drafted for entry into the Delphi survey (Table S2). Publications identified in the literature search were used to inform the Delphi process.

Definition of surrogate endpoints in AIH treatment
The IAIHG is an international, non-governmental, non-profit scientific organization based on voluntary participation that provides a platform for project-based collaborations on AIH and AIH-related diseases; it initiated the present consensus process. We first used a modified Delphi approach. For each endpoint of interest (remission, complete biochemical response, insufficient response, non-response and intolerance to treatment), a number of possible definitions were drafted by a writers' group of 11 IAIHG members. Definitions from existing guidelines were used to draft the survey items. All members of the writers' group agreed upon the final list of items. We used Google Forms (Google, Mountain View, CA, United States) to create and run the survey.
We invited all known members (active and inactive) of the IAIHG to complete the survey. We used the following template for most statements: '<remission/complete response/insufficient response/non-response/intolerance to treatment> in AIH is best defined as <item option>' (Table S1). For insufficient response, a distinction was made between insufficient response to first- and second-line AIH therapy, to investigate whether physicians would accept different definitions or time points for different therapies. Respondents rated every question on a 9-point Likert scale, from 'completely disagree' (1) to 'completely agree' (9). The median rating of each item was calculated and categorised as inappropriate (median rating 1.0-3.4), uncertain (3.5-6.5) or appropriate (6.5-9.0). A disagreement index (DI) was calculated to establish whether there was consensus on a definition (Table S1). A DI <1 indicated consensus on an item, while a DI >1 indicated non-consensus (Table S3). 27,28 The survey consisted of 2 rounds. Definitions without an item rated 'appropriate', or without consensus in the first round, were re-rated in the second round. After the second round, the results of the survey were presented and discussed during 2 panel discussions at the IAIHG research workshop on July 1st and 2nd, 2019 in Vienna. Discussion and voting took place for all endpoints still rated 'uncertain' after the second survey round, in order to reach a final consensus.
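The rating rules above can be sketched in Python. The median categories follow the text; the source does not give its DI formula (it refers to Table S1 and refs 27,28), so this sketch assumes the standard RAND/UCLA disagreement index, DI = IPR / (2.35 + 1.5 × |5 − IPR central point|), with the interpercentile range (IPR) taken between the 30th and 70th percentiles. Because the stated ranges overlap at 6.5, a median of exactly 6.5 is counted as 'uncertain' here. That boundary, the percentile interpolation method and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import median


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 1] (NumPy's default method)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    return xs[lower] + (xs[upper] - xs[lower]) * (pos - lower)


def disagreement_index(ratings):
    """Assumed RAND/UCLA disagreement index for ratings on a 1-9 scale."""
    p30, p70 = percentile(ratings, 0.30), percentile(ratings, 0.70)
    ipr = p70 - p30                        # interpercentile range (30th-70th)
    central_point = (p30 + p70) / 2        # centre of the IPR
    asymmetry = abs(5 - central_point)     # distance from the scale midpoint
    ipr_adjusted = 2.35 + 1.5 * asymmetry  # IPR adjusted for symmetry (IPRAS)
    return ipr / ipr_adjusted


def classify_median(m):
    """Map a median rating to the category used in the survey."""
    if m < 3.5:
        return "inappropriate"
    if m <= 6.5:  # assumption: 6.5 falls in 'uncertain' (stated ranges overlap)
        return "uncertain"
    return "appropriate"


def rate_item(ratings):
    """Return (median, category, DI rounded to 2 d.p., consensus flag) for one item."""
    m = median(ratings)
    di = disagreement_index(ratings)
    return m, classify_median(m), round(di, 2), di < 1  # DI < 1 taken as consensus
```

For example, ratings clustered around 8 give an 'appropriate' item with a DI of 0, whereas a panel split evenly between 1 and 9 yields a median of 5 ('uncertain') and a DI well above 1.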
The initial approach was refined during peer review, resulting in a hybrid of a Delphi consensus and a nominal group process, to reach agreement on points raised directly by the peer reviewers and to sharpen the definitions. The final outcome determined the conclusive definitions of the endpoints.

External validation of surrogate endpoints
We pursued external validation of 2 final surrogate endpoints (non-response and complete biochemical response) using data from a multicentre cohort of 404 patients with AIH from 5 European countries. The structure of the cohort and the inclusion and exclusion criteria have been published elsewhere. 29 In brief, adult patients with AIH and a simplified IAIHG score ≥6 were included in this study cohort, whereas patients with variant syndromes or competing liver diseases were excluded.
We used liver-related mortality or liver transplantation as a composite endpoint, and sex- and centre-specific upper limit of normal (ULN) values for alanine aminotransferase (ALT) and aspartate aminotransferase (AST) were used as covariates. Only patients with available serum IgG and serum transaminase values were included in this analysis. If transaminases were elevated and the IgG level within the 6-month timeframe was unknown, the patient was classified as not having a complete biochemical response.
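The classification rule for the validation cohort can be sketched as follows. It assumes that ALT, AST and IgG values measured within 6 months of treatment start are supplied together with the patient's sex- and centre-specific ULNs, that a value equal to the ULN counts as normal, and that both transaminases must be available; the function name and these conventions are illustrative, not taken from the study.

```python
from typing import Optional


def complete_response_at_6_months(
    alt: Optional[float], ast: Optional[float], igg: Optional[float],
    alt_uln: float, ast_uln: float, igg_uln: float,
) -> Optional[bool]:
    """Classify complete biochemical response (CBR) at 6 months.

    Returns True (CBR), False (no CBR) or None (not evaluable, excluded).
    Missing values are passed as None. Hypothetical helper for illustration.
    """
    if alt is None or ast is None:
        return None  # assumption: both transaminases are required
    if alt > alt_uln or ast > ast_uln:
        return False  # elevated transaminases rule out CBR, even if IgG is unknown
    if igg is None:
        return None  # normal transaminases but unknown IgG: cannot be classified
    return igg <= igg_uln  # assumption: a value equal to the ULN counts as normal
```

The asymmetry in handling missing IgG mirrors the text: an unknown IgG cannot rescue a patient with elevated transaminases, but it does prevent a patient with normal transaminases from being labelled a complete responder.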

Analysis
We used the definitions resulting from the Delphi consensus to divide our AIH cohort into 2 groups, according to whether or not patients met each definition.
Two additional descriptive analyses compared patients attaining a complete biochemical response within 6 months with (i) those attaining a complete biochemical response within 12 months but not within 6 months and (ii) those attaining normal AST and ALT levels but with elevated IgG. We made univariate comparisons using the chi-square test, Mann-Whitney U test, or Student's t test, as appropriate. We used Kaplan-Meier curves with log-rank testing for the composite endpoint of liver-related mortality or liver transplantation. p values <0.05 were considered statistically significant. Statistical analysis was performed with SPSS version 25. Ethics approval was waived after review by the local institutional review board.
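The analyses were run in SPSS; purely to illustrate what the Kaplan-Meier estimate and the two-group log-rank test compute for the composite endpoint, here is a minimal pure-Python sketch. The function names and toy data are hypothetical, an event is coded 1 (liver-related death or liver transplantation) and censoring 0, and the p value is taken from the chi-square distribution with 1 degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = event, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied observations at time t
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censorings both leave the risk set
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value) with 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        n = n_a + n_b
        d = sum(1 for ti, e, _ in data if ti == t and e)
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:  # hypergeometric variance of d_a
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
```

With real data, `times` would be follow-up durations and the 2 groups would be patients with and without a complete biochemical response within 6 months.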

Results
Delphi respondents
An invitation to participate in the survey was sent to 220 email addresses on the IAIHG mailing list. A total of 75 respondents (34%) completed one or both survey rounds. The first round was completed by 58 respondents (26%), and the second round by 50 respondents (23%). Thirty-three respondents (15%) completed both survey rounds. The IAIHG workshops on July 1st and 2nd, 2019 in Vienna were attended by 50 participants.
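The respondent counts are internally consistent: the 33 respondents who completed both rounds are counted in each round, so the number of unique respondents follows by inclusion-exclusion, and each percentage is taken over the 220 invitations:

```latex
\[
  58 + 50 - 33 = 75, \qquad
  \frac{75}{220} \approx 34\%, \quad
  \frac{58}{220} \approx 26\%, \quad
  \frac{50}{220} \approx 23\%, \quad
  \frac{33}{220} = 15\%
\]
```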

Complete biochemical response in AIH
In the survey, 6 options for the definition of complete biochemical response were assessed (Table S2). After 2 survey rounds and a discussion, broad consensus was reached (median rating 8, DI 0.16) on the term 'complete biochemical response', now defined as 'normalisation of serum transaminases and IgG below the ULN' (Table 1 and Fig. 2). There was consensus that complete biochemical response should be achieved no later than 6 months after initiation of treatment. The term 'response' was preferred over 'remission', since 'remission' reflects a less intense stage of the disease, whereas 'response' indicates a transition to a stage compatible with disease resolution. During the Delphi process, we considered normalisation of transaminases below the ULN (without serum IgG) as a definition of complete biochemical response, but this was rated as less suitable and failed to gather consensus (median rating 6, DI 1.29). Normalisation of transaminases and IgG in combination with transient elastography (median rating 7, DI 0.65) was considered promising, based on data showing that transient elastography separated severe from non-severe fibrosis after 6 months of immunosuppressive treatment. 30 Magnetic resonance elastography was not considered, owing to its limited availability.

Insufficient response in AIH
Five options for the definition of 'insufficient response' in relation to different time points were assessed in the survey (Table S2A). After 2 survey rounds and a panel discussion, there was broad consensus on defining 'insufficient response' as 'lack of a complete biochemical response' for both first-line (median rating 8, DI 0.26) and second-line (median rating 8, DI 0.16) AIH therapy. The importance of harmonising the definition of response across first-, second- and third-line therapy was considered. In the second round, consensus was reached that 'insufficient response' should be determined no later than 6 months after initiation of treatment (first-line therapy: median rating 7, DI 0.16; second-line therapy: median rating 8, DI 0.16). Making this determination at 12 months after initiation of treatment did not reach consensus or was rated as less appropriate (first-line therapy: median rating 7, DI 1.09; second-line therapy: median rating 6, DI 0.63). Other options for insufficient response, such as 'persistence of elevated serum transaminases >2 times ULN' or 'increased liver stiffness on repeated transient elastography measurements', were rated as less appropriate (median ratings 6 and 4; DIs 0.52 and 0.65, respectively). It was agreed that insufficient response to first-line and second-line therapy may only be diagnosed after standard therapy has been applied and adherence has been proven (median rating 8, DI 0.13). Standard-of-care first-line therapy for AIH consists of steroid therapy with predniso(lo)ne at a dosage of at least 0.50 mg/kg/day at initiation and a maximum of 10 mg/day during maintenance. Budesonide is considered an equivalent treatment for non-cirrhotic patients. In the case of azathioprine therapy, therapeutic 6-tioguanine levels were deemed mandatory to demonstrate adherence.
For second-line therapy, in particular in patients who do not tolerate azathioprine, mycophenolate mofetil (MMF) at a dose of at least 1 g/day, preferably 2 g/day, was suggested as an appropriate alternative. There was consensus that demonstrating therapeutic 6-tioguanine levels during 6-mercaptopurine therapy establishes adherence (median rating 8, DI 0.13).
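The preconditions for diagnosing an insufficient response can be expressed as a checklist. This sketch encodes only the dose thresholds and adherence rule stated above. The data structure and field names, the `adherence_shown` flag for patients not on azathioprine, and applying the 10 mg/day maintenance limit to predniso(lo)ne only are hypothetical simplifications, not part of the consensus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FirstLineCourse:
    """Hypothetical record of a first-line treatment course."""
    weight_kg: float
    steroid: str                             # "prednisolone", "prednisone" or "budesonide"
    induction_mg_per_day: float
    maintenance_mg_per_day: float
    cirrhosis: bool
    on_azathioprine: bool
    therapeutic_6tgn: Optional[bool] = None  # None if 6-tioguanine levels not measured
    adherence_shown: bool = False            # assumed other evidence of adherence


def standard_first_line(c: FirstLineCourse) -> bool:
    """True if the steroid regimen meets the stated standard-of-care conditions."""
    if c.steroid == "budesonide":
        return not c.cirrhosis  # equivalent only in non-cirrhotic patients
    return (c.induction_mg_per_day / c.weight_kg >= 0.5
            and c.maintenance_mg_per_day <= 10)


def adherence_proven(c: FirstLineCourse) -> bool:
    """Azathioprine requires therapeutic 6-tioguanine levels as proof of adherence."""
    if c.on_azathioprine:
        return c.therapeutic_6tgn is True
    return c.adherence_shown


def insufficient_response_assessable(c: FirstLineCourse) -> bool:
    """Insufficient response may only be diagnosed after standard, adherent therapy."""
    return standard_first_line(c) and adherence_proven(c)


def standard_mmf_dose(dose_g_per_day: float) -> bool:
    """Second-line MMF: at least 1 g/day (2 g/day preferred)."""
    return dose_g_per_day >= 1.0
```

For example, a 70 kg patient started on 40 mg/day prednisolone (about 0.57 mg/kg/day), maintained on 7.5 mg/day with azathioprine and therapeutic 6-tioguanine levels, meets the preconditions; the same patient without measured levels does not.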
Consensus regarding the definitions for complete biochemical response and insufficient response indicates that any response other than normalisation of transaminases and IgG should be classified as an insufficient response.

Non-response in AIH
Nine options were given for the definition of 'non-response'. After 2 survey rounds, several options were rated 'appropriate' by consensus; however, no single definition emerged as superior (Table S2). The options 'no improvement of serum transaminases and IgG' (median rating 8, DI 0.16) and 'no improvement of serum transaminases, serum IgG and persistent histological activity (hepatitis activity index [HAI] ≥4/18 or equivalent)' (median rating 8, DI 0.29) emerged as possible candidates after the 2 voting rounds. During the panel discussion, the definition of non-response was extensively discussed, and both definitions were considered insufficiently accurate for defining non-response. Consequently, voting took place on alternative options. It was finally agreed that non-response in AIH should be defined as '<50% reduction of serum transaminases after 4 weeks of treatment'. Under this definition, serum transaminases must still be above the ULN for a patient to be classified as a non-responder, since transaminases below the ULN indicate a potential complete biochemical response.
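The agreed non-response rule can be sketched for a single transaminase. The source does not say how ALT and AST are combined, so this hypothetical helper evaluates one analyte at a time; the baseline is assumed to be the value at treatment initiation, and a value equal to the ULN is assumed to count as normal.

```python
def non_response(baseline: float, week4: float, uln: float) -> bool:
    """<50% reduction of one transaminase after 4 weeks, while still above the ULN."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if week4 <= uln:
        return False  # normalised: a potential complete biochemical response instead
    reduction = (baseline - week4) / baseline
    return reduction < 0.5  # a reduction of exactly 50% is not non-response
```

For example, an ALT falling from 500 to 300 U/L with a ULN of 40 U/L is a 40% reduction and meets the definition, whereas a fall to 200 U/L (60%) does not; a rise above baseline gives a negative reduction and also counts as non-response.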

Remission in AIH
Four items were assessed as options for defining remission in AIH. 'Normalisation of serum transaminases and serum IgG below the ULN' was proposed as a possible candidate (median rating 8, DI 0.02). After 2 survey rounds and a panel discussion, it was agreed that remission in AIH can only truly be diagnosed histologically, and it is defined as 'liver histology with an HAI <4/18 or equivalent' (median rating 8, DI 0.32). Thus, patients with no or only minimal hepatitis (HAI 0-3) are classified as being in remission. 31,32 Another option that failed to gather consensus was normalisation of serum transaminases below the ULN (median rating 6, DI 1.04). After peer review, a further discussion took place, and it was agreed that no specific timeframe or time point can be set for achieving or testing remission. However, a liver biopsy could be performed 12 months after treatment initiation, or at any other time point during treatment for specific indications.

Intolerance to treatment in AIH
Three answer options for the definition of 'intolerance to treatment' were evaluated in the survey. The option 'any adverse event possibly related to treatment as assessed by the treating physician, leading to potential discontinuation of the drug' was judged as the best definition for intolerance to treatment (median rating 8, DI 0.29). The options 'severe corticosteroid related side effects (acne, hypertension, psychosis, diabetes, osteoporotic fractures), side effects of immunosuppression leading to discontinuation (pancreatitis, cytopenia, hepatitis, gastrointestinal symptoms, allergic reactions)' (median rating 8, DI 0.29) and 'inability to reach recommended standard dose of treatment due to adverse events' (median rating 7, DI 0.37) were assessed as less suitable.

External validation of surrogate endpoints
We were able to include 248 and 293 of the 404 eligible patients with AIH in our external validation of non-response and complete biochemical response, respectively (Fig. S1 and S2). Owing to a lack of follow-up biopsies, it was not possible to validate the surrogate endpoint of remission.

External validation of surrogate endpoint complete biochemical response
One hundred and thirty-four patients (45.7%) had a complete biochemical response within 6 months (Table S5B). Responders were less likely than patients without a complete biochemical response to have cirrhosis at baseline (18.8% vs. 29.5%; p = 0.047). Other baseline characteristics were comparable between the groups. Liver transplantation or liver-related death occurred less frequently in responders: 0% vs. 7.5% (log-rank p = 0.003) (Fig. 4 and Table S5B).
Additional descriptive analyses showed that liver transplantation or liver-related death occurred less frequently in patients with a complete biochemical response within 6 months than in those attaining a complete biochemical response within 12 months but not within 6 months (0.0% vs. 4.4% of patients) and than in those attaining normal AST and ALT levels but with elevated IgG (0.0% vs. 6.3% of patients).

Discussion
The original response criteria, published by the IAIHG in 1993, have provided a basis for the management of AIH. 7 Despite the durability of the criteria, it is apparent that these original definitions (which allowed 2 options for each individual response criterion) are too complex to apply in everyday clinical practice. Moreover, varied definitions of both response and remission have crept into national and society guidelines. 1,8,33,34 This signals a need for harmonisation and clarification.
We performed a systematic review that informed a Delphi process. Members of the review panel, comprising both the steering committee and general members of the IAIHG, met face-to-face on 2 occasions over an 18-month period in 2018 and 2019 to develop and refine the proposed definitions. Finally, consensus statements were agreed upon following a review of the results and discussions at a workshop hosted in Vienna in July 2019. We propose 5 standardised endpoints in the management of AIH.
These include a definition of complete biochemical response, defined as 'normalisation of serum transaminase activity and IgG level below the ULN, which should be achieved no later than 6 months after initiation of treatment'. During the assessment of the response criteria, we found that guidelines derived from older studies 15 suggested that transaminase activity of less than twice the ULN could be considered an adequate response. 8 The most recent randomised, double-blind study in the field applied the primary endpoint of complete biochemical remission without steroid-specific side effects. 20 In this study, complete biochemical remission was defined as serum AST and ALT within the normal range. As a consequence, and despite the lack of data from randomised clinical trials, the optimal response definition has evolved over time in clinical practice. In real-life practice, an isolated elevation of either AST or ALT may be attributable to other aetiologies (for example, alcohol-related liver disease or non-alcoholic fatty liver disease). These factors need to be thoroughly assessed and may require additional diagnostic work-up. Thus, even when an elevated ALT or AST has an obvious alternative aetiology, it precludes a complete biochemical response and will be classified as an insufficient response.
The definition of 'insufficient response' is failure to achieve a complete biochemical response, using a combination of transaminase activity and IgG levels. 20 Harmonisation is needed in view of the discordance in the literature about the timing of the first assessment after the start of therapy. 29 The response to treatment depends to a large extent on patient-related factors and on the severity of AIH at presentation. An insufficient response does not necessarily imply that the treatment regimen should be altered, but it should alert the clinician, since it is likely to have some prognostic value. The label of insufficient response has potential clinical consequences, including patient anxiety, and necessitates a search for possible underlying causes, such as non-adherence to treatment or an alternative explanation of the elevated transaminases.
We simplified the concept of non-response to a decrease of less than 50% in serum transaminases within 4 weeks of treatment initiation. The panel considered various definitions of non-response based on previous guidelines and publications. The current definition is in accordance with the original IAIHG report, 7 but it omits symptomatic improvement and no longer requires a liver biopsy showing no change in inflammatory activity after 6 months of treatment. Additionally, neither the steroid dose nor dose escalation during steroid treatment is specified. By contrast, the 2010 AASLD guidelines had suggested that an incomplete response represented 'some or no response in clinical, laboratory and histological features despite compliance with therapy after 2 to 3 years'. 8 A true non-response after initiation of standard treatment in AIH is rare, since a favourable response to steroids is a key characteristic of AIH. In patients with non-response, adherence should be explored, the diagnosis of AIH should be challenged, and the histology should be re-evaluated by an expert liver pathologist. Treatment failure, although helpful as a concept, remains difficult to define precisely, hence the need for accurate prognostic models to stratify patients at an early point in their treatment course or at presentation. 39
The concept of remission should be simple and reproducible. Previous definitions of a complete response included improvement of symptoms and the return of liver function tests to normal within 1 year of treatment initiation, followed by maintenance of normalised liver biochemistry for a further 6 months. 7 A liver biopsy was required at some point during this 18-month timeframe and, in keeping with the definition, had to show minimal disease activity. We considered including biochemical parameters in addition to histological biomarkers to define 'remission'. 40 The panel agreed that remission in AIH can only truly be diagnosed histologically (median rating 8, DI 0.32), which requires a liver biopsy. However, this is at odds with current clinical practice, in which liver biopsy is usually performed only at diagnosis, and follow-up biopsies are rarely performed and then only for specific indications. There was consensus that remission could be assessed 12 months after treatment initiation or at any other time point during treatment for specific clinical indications. A follow-up biopsy is recommended in patients with a suboptimal response to treatment, in patients with discrepancies between the transaminase and IgG responses, and ideally before complete cessation of immunosuppression, to confirm complete histological resolution of the disease (HAI <4/18). 34,41 A liver biopsy may add value in detecting concurrent liver diseases, such as steroid-induced steatohepatitis, and offer pivotal information for clinical decisions that require a high degree of certainty, such as stopping immunosuppressive therapy. 8,34 The risks of liver biopsy are a reason to consider non-invasive monitoring tools, such as transient elastography. However, studies and guidelines have confirmed that there is no available evidence that such tools can establish complete resolution of histological inflammation. 8,34,42 While transient elastography has gained traction in the community, [43][44][45] there is currently little evidence to advocate its use in AIH other than as a monitoring tool. We asked the Delphi panel about the role of transient elastography in AIH, based on data showing that it separates severe from non-severe fibrosis after 6 months or more of immunosuppressive treatment, 30 but it was rated as less appropriate for identifying remission and failed to gain consensus.
Three endpoints (complete biochemical response, insufficient response, and non-response) were externally validated in an AIH cohort. Applying the criteria for these 3 endpoints resulted in adequate differentiation with respect to liver-related death or liver transplantation. Although we demonstrated a significant difference in the composite endpoint of liver-related death or liver transplantation, the data should be interpreted with caution because of the relatively small number of events. In addition, because the further analysis comparing patients achieving a complete biochemical response within 6 months with those achieving it after 6-12 months was based on a small sample, assessment of 'insufficient response' no later than 12 months could be an alternative option. We were unable to validate the endpoint 'remission' in our AIH cohort owing to the lack of follow-up liver biopsies. 34 The endpoints 'complete biochemical response' and 'insufficient response' may be the most usable and relevant endpoints in the outpatient clinic.
Other difficulties in the management of AIH, based on previous guidelines, have become apparent. Notably, the previous AASLD practice guidelines indicated that a period of 2 to 3 years had to pass before the definition of incomplete response could be met. 8 Indeed, original data from the clinical trials of Soloway and Summerskill informed this thinking. 15,46 British guidelines suggested that, in practice, therapy should continue for at least 2 to 3 years, with normal transaminases for at least 18 months, to increase the likelihood of complete remission. 33 In retrospect, it is easy to criticise these attempts to define response, non-response and incomplete response; however, it must be acknowledged that the field has been hampered by a lack of randomised data, and most therapeutic interventions have been reported retrospectively rather than prospectively.
We believe that these more user-friendly, time point-derived definitions facilitate the design and delivery of therapeutic interventions within the field, whether they involve first-line, second-line or salvage therapies. With these unified concepts, any patient who has not achieved a complete biochemical response within 6 months of treatment has shown an insufficient response. While treatment approaches vary around the world, the most recent AASLD practice guidelines suggest that, based on the available data, both MMF- and tacrolimus-based therapy are appropriate second-line treatments. 1 In contrast, the European Reference Network for hepatological diseases, in conjunction with the European Association for the Study of the Liver guidelines, suggested that MMF be used where thiopurine therapy (azathioprine or mercaptopurine) is not tolerated, and that third-line treatment with tacrolimus be used in cases of an ongoing insufficient response. 34,41 Despite these differences in clinical practice, our definitions of response can be used irrespective of the drug regimen initiated at any given time point.
An agreed set of predefined endpoints has a number of key advantages. It increases the transparency of the therapeutic trajectory, as it signposts the key junctures of the AIH patient journey, which has clear educational benefits. The use of these endpoints contributes to greater uniformity of treatment plans for patients with AIH and allows better comparison between patients. This will benefit research collaboration, since disease endpoints can be compared, allowing benchmarking of patient outcomes in AIH. A key advantage is that these endpoints mark time points along the treatment journey, allowing timely escalation or de-escalation of therapy. They also highlight treatment responses that are not yet defined, such as relapse or partial response, which would benefit from robust definitions.
Our study also has limitations. We achieved a response rate of 34% over 2 survey rounds. We drew on the IAIHG mailing list, which contains active contributors but also a number of non-practising clinicians and scientists who have since left the field. The IAIHG is highly diverse, comprising basic scientists and clinicians, including paediatricians and pathologists, who fall outside the remit of our survey. The endpoints proposed in this paper were agreed upon by the 50 individuals who participated in the discussion during the IAIHG workshops on 1-2 July 2019 in Vienna, which is an accurate reflection of the AIH expert community. We missed an opportunity to define 'loss of response' in AIH, as it was not included in the formal Delphi process. The current retrospective real-world database of patients with AIH is too small to establish that the proposed endpoints are the best early response markers. Given the retrospective nature of the analysis, a number of relevant data points, in particular IgG, were missing. In addition, the number of events was limited, curtailing our options for a robust multivariable analysis.
In establishing a future research agenda, one of the pressing needs in the field is the incorporation of patient-reported outcome measures (PROMs) or patient preferences into therapeutic guidance. We recognise that improvements in quality of life must be a treatment goal in AIH, 47 but there are currently no PROMs validated in AIH that provide a detailed assessment of quality of life and symptoms. Second, non-invasive markers that correlate with fibrosis stage and/or hepatic inflammation, such as FibroScan, magnetic resonance elastography, or multiparametric magnetic resonance imaging, need to be validated for assessing AIH disease activity, as they have not yet earned a place in treatment and society guidelines. 1 Last, the reproducibility and validity of the proposed endpoints will need to be confirmed in both adult and paediatric cohorts. Large prospective database studies are necessary to address these questions. The European Reference Network on Hepatological Diseases (ERN RARE-LIVER) is collecting prospective data from newly diagnosed patients with AIH (R-liver), and this initiative will contribute to a better understanding of the validity of the proposed surrogate endpoints.

Conclusions
In conclusion, we established consensus among international experts on the definitions of the surrogate endpoints: complete biochemical response, insufficient response, non-response, remission, and intolerance in AIH. We encourage the AIH community to incorporate these endpoints in future studies to facilitate comparison of outcomes between studies, to properly validate these endpoints, and to share and convey data regarding the effect on long-term endpoints, such as liver transplantation and liver-related death.

Financial support
This IAIHG meeting (July 2019) was supported by the YAEL Foundation. This work has been generated within the European Reference Network on Hepatological Diseases (ERN RARE-LIVER).