Sunday, December 7, 2008

Comparison of historical financial data accuracy among three providers

I had been using Yahoo Finance's historical data for some quantitative modeling I have been dabbling in. At one point, I noticed that for two assets, Yahoo did not report closing prices on the same set of days. That is, there was a day on which Yahoo reported a closing price for one but not the other. So I started investigating the extent of the data inconsistencies, and here's the upshot:
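
For illustration, this kind of mismatch can be found by comparing the sets of dates on which each asset has a close. The following is a minimal sketch, not the script used for this post; the function name `mismatched_days` and the sample prices are hypothetical, and each series is assumed to be a dict mapping a date to a closing price.

```python
from datetime import date


def mismatched_days(closes_a, closes_b):
    """Return (only_in_a, only_in_b): sorted dates that have a close in one
    series but not the other, restricted to the range both series cover."""
    start = max(min(closes_a), min(closes_b))
    end = min(max(closes_a), max(closes_b))
    days_a = {d for d in closes_a if start <= d <= end}
    days_b = {d for d in closes_b if start <= d <= end}
    return sorted(days_a - days_b), sorted(days_b - days_a)


# Made-up prices for two assets; the second has no close on 2008-11-04.
asset_a = {date(2008, 11, 3): 10.0, date(2008, 11, 4): 10.1, date(2008, 11, 5): 10.2}
asset_b = {date(2008, 11, 3): 20.0, date(2008, 11, 5): 20.3}

only_a, only_b = mismatched_days(asset_a, asset_b)
print("only in first asset:", only_a)
print("only in second asset:", only_b)
```

Restricting to the overlapping date range keeps a fund with a longer history from being reported as thousands of "extra" days.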

I compared two historical data sources and used a third to resolve disagreements. The first is Yahoo's historical data, which is free and is based on Commodity Systems, Inc. (CSI). The second data source is Xignite, which costs about $160/month and is based on some combination of DDF, Reuters, and Zacks. When the two sources disagreed, I resolved it by consulting a third, marketwatch.com, which is free and based on Interactive Data Pricing and Reference Data. The comparison covered both the set of market days and the closing prices.
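
The comparison can be sketched roughly as follows. This is not the actual script; it is a minimal illustration under several assumptions: each source is a dict mapping an ISO date string to a close, sources are compared by simple daily return rather than by raw price (the two providers quote different price levels for the same fund in the output below), the tolerance `tol` is arbitrary, and the "outlier" is taken to be whichever source's return is farther from the reference. The names `daily_returns` and `find_outliers` and all sample prices are hypothetical.

```python
def daily_returns(closes):
    """Map each date (except the earliest) to the simple return over the
    previous available close. ISO date strings sort chronologically."""
    days = sorted(closes)
    return {cur: closes[cur] / closes[prev] - 1.0 for prev, cur in zip(days, days[1:])}


def find_outliers(source_a, source_b, reference, tol=1e-3):
    """For each date where the two sources' returns differ by more than tol,
    report which source is farther from the reference return."""
    ret_a = daily_returns(source_a)
    ret_b = daily_returns(source_b)
    ret_ref = daily_returns(reference)
    report = {}
    for day in sorted(set(ret_a) & set(ret_b)):
        if abs(ret_a[day] - ret_b[day]) <= tol:
            continue  # the two sources agree closely enough
        if day not in ret_ref:
            report[day] = "unresolved"  # the reference has no data for this day
            continue
        err_a = abs(ret_a[day] - ret_ref[day])
        err_b = abs(ret_b[day] - ret_ref[day])
        report[day] = "a" if err_a > err_b else "b"
    return report


# Made-up prices: source b repeats the previous close on 2008-01-04.
src_a = {"2008-01-02": 10.0, "2008-01-03": 10.1, "2008-01-04": 10.2}
src_b = {"2008-01-02": 20.0, "2008-01-03": 20.2, "2008-01-04": 20.2}
ref = {"2008-01-02": 5.0, "2008-01-03": 5.05, "2008-01-04": 5.1}

outliers = find_outliers(src_a, src_b, ref)
print(outliers)
```

One limitation of this sketch: a return is only comparable across sources when both have the same previous trading day, so dates next to a missing day would need separate handling.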

Here is the abbreviated raw output of my comparison script for two assets, VEURX and VBISX (both Vanguard funds):


desktop:~/finance$ ./st.py veurx
Xignite/VEURX(2008-11-20 - 1994-01-04) vs Yahoo/VEURX(2008-11-13 - 1990-11-01)
WARNING: Xignite/VEURX is missing date 2006-01-13, present in Yahoo/VEURX and marketwatch.
WARNING: Xignite/VEURX is missing date 2004-05-26, present in Yahoo/VEURX and marketwatch.
WARNING: Xignite/VEURX is missing date 2004-02-25, present in Yahoo/VEURX and marketwatch.
WARNING: Xignite/VEURX is missing date 1995-07-12, present in Yahoo/VEURX and marketwatch.
Xignite/VEURX leads with 5 extra points
Yahoo/VEURX goes further back 802 points
BAD DATA on 1994-02-08
Yahoo= 0.00350 Xignite=-0.00083 1994-02-08 1994-02-08 8.60000 8.57000 12.04000 12.05000
Xignite is the outlier
BAD DATA on 1994-02-09
Yahoo=-0.00465 Xignite= 0.00000 1994-02-09 1994-02-09 8.56000 8.60000 12.04000 12.04000
Xignite is the outlier
BAD DATA on 1994-03-01
Yahoo=-0.01175 Xignite=-0.02843 1994-03-01 1994-03-01 8.41000 8.51000 11.62000 11.96000
Xignite is the outlier
BAD DATA on 1994-03-02
Yahoo=-0.01665 Xignite= 0.00000 1994-03-02 1994-03-02 8.27000 8.41000 11.62000 11.62000
Xignite is the outlier
BAD DATA on 1994-03-04
Yahoo= 0.00963 Xignite= 0.00000 1994-03-04 1994-03-04 8.39000 8.31000 11.68000 11.68000
Xignite is the outlier
BAD DATA on 1994-03-07
Yahoo= 0.01192 Xignite= 0.02226 1994-03-07 1994-03-07 8.49000 8.39000 11.94000 11.68000
Xignite is the outlier
BAD DATA on 1994-05-26
Yahoo= 0.00239 Xignite=-0.01954 1994-05-26 1994-05-26 8.39000 8.37000 11.56936 11.79995
Xignite is the outlier
BAD DATA on 1994-05-27
Yahoo=-0.01073 Xignite= 0.01127 1994-05-27 1994-05-27 8.30000 8.39000 11.69969 11.56936
Xignite is the outlier
BAD DATA on 1994-06-27
Yahoo= 0.00000 Xignite= 0.01317 1994-06-27 1994-06-27 8.10000 8.10000 11.56936 11.41898
Yahoo is the outlier
BAD DATA on 1994-06-28
Yahoo= 0.01605 Xignite= 0.00260 1994-06-28 1994-06-28 8.23000 8.10000 11.59944 11.56936
Yahoo is the outlier
BAD DATA on 1994-07-18
Yahoo= 0.00000 Xignite= 0.00993 1994-07-18 1994-07-18 8.59000 8.59000 12.23104 12.11074
Yahoo is the outlier
BAD DATA on 1994-07-19
Yahoo= 0.00815 Xignite=-0.00164 1994-07-19 1994-07-19 8.66000 8.59000 12.21099 12.23104
Yahoo is the outlier
BAD DATA on 1996-02-20
Yahoo=-0.01214 Xignite= 0.00000 1996-02-20 1996-02-20 10.58000 10.71000 15.09505 15.09505
Xignite is the outlier
BAD DATA on 1996-02-21
Yahoo= 0.00567 Xignite=-0.00699 1996-02-21 1996-02-21 10.64000 10.58000 14.98949 15.09505
Xignite is the outlier
BAD DATA on 1998-05-19
Yahoo= 0.01495 Xignite= 0.00000 1998-05-19 1998-05-19 19.69000 19.40000 27.32290 27.32290
Xignite is the outlier
BAD DATA on 1998-05-20
Yahoo= 0.01524 Xignite= 0.03048 1998-05-20 1998-05-20 19.99000 19.69000 28.15558 27.32290
Xignite is the outlier
[ ...... omitted rest for brevity ....... ]
TOTAL: Yahoo=1.51066 Xignite=1.51084
desktop:~/finance$ ./st.py vbisx
WARNING: Fabricating data for Yahoo/VBISX on 2003-11-28
Xignite/VBISX(2008-11-19 - 1994-03-10) vs Yahoo/VBISX(2008-11-10 - 1996-06-20)
WARNING: Xignite/VBISX is missing date 2004-05-26, present in Yahoo/VBISX and marketwatch.
WARNING: Xignite/VBISX is missing date 2004-02-25, present in Yahoo/VBISX and marketwatch.
Xignite/VBISX leads with 7 extra points
Xignite/VBISX goes further back 575 points
BAD DATA on 2004-04-20
Yahoo= 0.00000 Xignite=-0.00391 2004-04-20 2004-04-20 8.58000 8.58000 15.61512 15.67636
Yahoo is the outlier
BAD DATA on 2004-04-21
Yahoo=-0.00233 Xignite= 0.00098 2004-04-21 2004-04-21 8.56000 8.58000 15.63043 15.61512
Yahoo is the outlier
BAD DATA on 2008-10-31
Yahoo= 0.00101 Xignite= 0.00412 2008-10-31 2008-10-31 9.95000 9.94000 18.22287 18.14801
WARNING: All prices agree with Marketwatch
TOTAL: Yahoo=0.88037 Xignite=0.88195
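
A note on reading the BAD DATA lines: the first two numbers appear to be each source's simple daily return, and the last four appear to be the close and the previous close, first for Yahoo and then for Xignite. That reading is inferred from the figures rather than stated by the script, but the arithmetic is consistent with it, as this small check on the 1994-02-08 VEURX line suggests (the helper name `simple_return` is made up for the example):

```python
def simple_return(close, prev_close):
    """Simple (not log) return from the previous close to this close."""
    return close / prev_close - 1.0


# Closes copied from the 1994-02-08 VEURX line in the output above.
yahoo = simple_return(8.60, 8.57)
xignite = simple_return(12.04, 12.05)

line = f"Yahoo={yahoo: .5f} Xignite={xignite: .5f}"
print(line)
```

On that day the two sources disagree even on the sign of the move, which is why the line is flagged.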


In general, over a bunch of assets, Xignite's data seemed to be buggier than Yahoo's. The caveat, of course, is that this conclusion assumes marketwatch is an independent data source. I think it is, because it sometimes agrees with Yahoo and sometimes with Xignite, though that's not conclusive.

1 comment:

  1. Hi Joshua,

    I am using QSToolKit to build a quantitative analysis, and it seems all historical data from free providers (Yahoo, Google) is not reliable. Did you buy data from premiumdata.com? Is it reliable?
