Does Machine Translation Affect International Trade? Evidence from a Large Digital Platform. Erik Brynjolfsson, Xiang Hui, Meng Liu. Management Science, Vol. 65, No. 12, Sep 3 2019. https://doi.org/10.1287/mnsc.2019.3388
Abstract: Artificial intelligence (AI) is surpassing human performance in a growing number of domains. However, there is limited evidence of its economic effects. Using data from a digital platform, we study a key application of AI: machine translation. We find that the introduction of a new machine translation system has significantly increased international trade on this platform, increasing exports by 10.9%. Furthermore, heterogeneous treatment effects are consistent with a substantial reduction in translation costs. Our results provide causal evidence that language barriers significantly hinder trade and that AI has already begun to improve economic efficiency in at least one domain.
1.1. Related Literature and Contribution
1.1.1. AI and Economic Welfare. The current generation
of AI represents a revolution of prediction and classification
capabilities (e.g., Brynjolfsson and McAfee 2017).
Recent breakthroughs in ML, especially supervised
learning systems using deep neural networks, have
allowed substantial improvements in many technical
capabilities. Machines have surpassed humans at
tasks as diverse as playing the game Go (Silver et al.
2016) and recognizing cancer from medical images
(Esteva et al. 2017). There is active work converting
these breakthroughs into practical applications, such
as self-driving cars, substitutes for human-powered
call centers, and new roles for radiologists and pathologists,
but the complementary innovations required
are often costly (Brynjolfsson et al. 2019).
Machine translation has also experienced significant
improvement because of advances in ML. For
instance, the best score at the Workshop on Machine
Translation for translating English into German improved
from 23.5 in 2011 to 49.9 in 2018, according to
the widely used BLEU score, which measures how
close the MT translation output is to one or more
reference translations by linguistic experts (for details,
see Papineni et al. 2002). Much of the recent
progress in MT has been a shift from symbolic approaches
toward statistical and deep neural network
approaches. For our study, an important characteristic
of eMT is that replacing human translators with
MT or upgrading MT is relatively seamless. For instance,
for product listings on eBay, users consume
the output of the translation system but, otherwise,
need not change their buying or selling process. Although
users care about the quality of translation, it
makes no difference whether it was produced by a
human or machine. Thus, adoption of MT can be very
fast and its economic effects, especially on digital
platforms, immediate. Although much of the work on the
economic effects of AI has so far been theoretical
(Sachs and Kotlikoff 2012, Aghion et al. 2017,
Korinek and Stiglitz 2017, Acemoglu and Restrepo
2018, Agrawal et al. 2019; notably Goldfarb and
Trefler 2018 in the case of global trade), the introduction
of improved MT on eBay is an early opportunity
to assess the economic effects of AI using
plausible natural experiments.
1.1.2. Language Barriers in Trade. Empirical studies
using gravity models, which are formally derived in
Anderson and Van Wincoop (2003), have established
a robust negative correlation between bilateral trade
and language barriers. Typically, researchers regress
bilateral trade on a “common language” dummy and
find that this coefficient is strongly positive (Egger
and Lassmann 2012). However, these cross-sectional
regressions are vulnerable to endogeneity biases even
after controlling for the usual set of variables in the
gravity equation. For example, two countries with the
same official language (e.g., the United Kingdom and
Australia) can also be similar in preferences for food,
clothing, entertainment, and so forth. Without exogenous
variation in one or the other, it is impossible to
tease out the language effect on trade.
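To make the identification problem concrete, a typical log-linear gravity specification can be written as below. This is a textbook-style sketch and not the specification of any cited paper; the symbols are assumptions introduced here: X_{ij} is trade from country i to country j, Y_i and Y_j are economic sizes, D_{ij} is bilateral distance, and CommonLang_{ij} is the common-language dummy.

```latex
\ln X_{ij} = \beta_0 + \beta_1 \ln Y_i + \beta_2 \ln Y_j
           + \beta_3 \ln D_{ij} + \beta_4 \,\mathrm{CommonLang}_{ij}
           + \varepsilon_{ij}
```

If unobserved similarity in preferences enters the error term and is correlated with CommonLang_{ij}, the estimate of \beta_4 picks up that similarity as well as the language effect, which is the endogeneity concern described above.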
Our paper exploits a natural experiment on eBay
that provides exactly such an exogenous change,
namely a large reduction in the language barrier, and
assesses its effect on international trade. The online
marketplace provides us with a powerful laboratory
to study the consequences on bilateral trade after this
decrease in language barriers for a given language pair.
Our finding that a quality upgrade of machine translation
could increase exports by about 10.9% is consistent
with Lohmann (2011) and Molnar (2013), who
argue that language barriers may hinder trade far
more than previously suggested.
1.1.3. Peer-to-Peer Platforms and Matching Frictions.
Einav et al. (2016) and Goldfarb and Tucker (2017)
provide great surveys on how digital technology has
reduced matching frictions and improved market efficiency.
Reduced matching frictions affect price dispersion
as evidenced in Brynjolfsson and Smith (2000),
Brown and Goolsbee (2002), Overby and Forman
(2014), and Cavallo (2017). These reduced frictions
also mitigate geographic inequality in economic activities
in the case of ride-sharing platforms (Lam and Liu
2017, Liu et al. 2018), short-term lodging platforms
(Farronato and Fradkin 2018), crowdfunding platforms
(Catalini and Hui 2017), and e-commerce platforms
(Blum and Goldfarb 2006, Lendle et al. 2016, Cowgill
and Dorobantu 2018, Hui 2019). We contribute to this
literature by documenting the significant matching
frictions between consumers and sellers who speak
different languages. Specifically, we find that efforts
to remove language barriers increase market efficiency
substantially.
[...]
2. Background
[...]
Prior to eMT, eBay used Bing Translator for query
and item title translation. Therefore, the policy treatment
here is an improvement in translation quality. To understand
the magnitude of quality improvement, we
follow the MT evaluation literature and report qualities
based on both the BLEU score and human evaluation.
The BLEU score is an automated measure that has
been shown to highly correlate with human judgment
of quality (Callison-Burch et al. 2006). However, BLEU
scores are not easily interpretable and should not be
compared across languages (Denkowski and Lavie
2014). Generally, scores over 30 reflect understandable
translations, and scores over 50 reflect good
translation (Lavie 2010). On the other hand, although
human evaluations are highly interpretable, they are
very costly and can be less consistent.
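As a rough illustration of what the BLEU score measures, the following is a minimal sentence-level sketch in the spirit of Papineni et al. (2002): clipped n-gram precision up to 4-grams, combined by a geometric mean and multiplied by a brevity penalty. It is a simplification under stated assumptions (whitespace tokenization, no smoothing, one sentence rather than a corpus) and is not the evaluation code used in the paper; the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Simplified sentence-level BLEU on a 0-100 scale (illustrative only)."""
    cand = candidate.split()  # assumption: whitespace tokenization
    refs = [r.split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate too short to contain any n-gram of this order
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing: one empty order zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(cand)
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return 100 * bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = ["the cat sat on the mat"]
    print(bleu("the cat sat on the mat", reference))
    print(bleu("the cat sat on a mat", reference))
```

Because the score rewards overlap with a human reference at every n-gram order, it is sensitive to word order and phrasing as well as to word choice, which is one reason raw scores are hard to interpret in isolation.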
A comparison of Bing and eMT translation for item
titles from English into Spanish revealed the BLEU
score increased from 41.01 to 45.24, and human acceptance
rate (HAR) increased from 82.4% to 90.2%.
To compute HAR, three linguistic experts vote either
yes or no for translations based on adequacy only
(whether the translation is acceptable for minimum
understanding), and the majority vote is then used to
determine the translation quality. In comparison, the
BLEU score is rated based on both adequacy and fluency
because it compares the MT output with human translation.
Therefore, in cases in which the grammar and
style of translation are not of first-order importance,
such as in listing titles, one might prefer using HAR for
measuring translation quality.
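The HAR computation described above can be sketched as a majority vote per translation followed by an average over translations. This is a minimal illustration, assuming each translation receives yes/no votes encoded as booleans; the function name and the sample votes are hypothetical and are not data from the paper.

```python
def human_acceptance_rate(votes_per_translation):
    """Share of translations accepted by a strict majority of evaluators.

    votes_per_translation: list of vote lists, one list per translation,
    where True means "acceptable for minimum understanding".
    """
    if not votes_per_translation:
        raise ValueError("need at least one translation")
    accepted = 0
    for votes in votes_per_translation:
        # Strict majority of yes votes (e.g., 2 of 3 experts).
        if sum(votes) * 2 > len(votes):
            accepted += 1
    return accepted / len(votes_per_translation)


if __name__ == "__main__":
    # Made-up votes from three evaluators on four translations.
    sample_votes = [
        [True, True, False],
        [False, False, True],
        [True, True, True],
        [True, False, True],
    ]
    print(human_acceptance_rate(sample_votes))
```

With an odd number of evaluators, as in the three-expert setup described above, ties cannot occur, so every translation is either accepted or rejected.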