hey dudes, i've just started work on scraping the afl tables site and i've already got some nice chunky CSVs. I'm using import.io for all the low-hanging fruit (stuff like "all time" player tables), and have grabbed a local copy of the entire site so I can leverage scrapy or beautifulsoup for the fiddly stuff (individual matches).
when i'm done with that i'll dump it into a sqlite DB and host it somewhere for anyone that wants it. i'm no sql wizard but it should be structured well enough that it can be refactored if need be :)
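For anyone wanting to do the same CSV-to-sqlite step, here is a minimal sketch using only Python's standard library. The table name, column names and sample rows are made up for illustration, and every column is stored as text to keep it short; a real schema would want proper types and keys.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_file):
    """Create `table` from the CSV header row and insert every data row as text."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote identifiers so column names containing spaces still work.
    columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, columns))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    conn.commit()


# Made-up sample rows standing in for a scraped CSV file.
sample = io.StringIO(
    "player,games,goals\n"
    "Example Player A,10,12\n"
    "Example Player B,8,3\n"
)
conn = sqlite3.connect(":memory:")  # use a file path instead to keep the DB on disk
load_csv_into_sqlite(conn, "players", sample)
rows = conn.execute("SELECT player, goals FROM players ORDER BY player").fetchall()
print(rows)
```

Swapping `io.StringIO(...)` for `open("some_file.csv", newline="")` would load a real file the same way.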
Awesome work, I have been wanting to work on scraping the afl tables (a goldmine) but am currently busy with uni. I would really like to check out your DB when you get it though. I have some ideas of my own to try to find some predictable behavior and will be happy to share any results I develop for some discussion!
yeah i've been interested in ML for a long time and joined the footy tipping at work, thinking i could use computers to get a bit of an edge (i know pretty much nothing about AFL). Like Sean, who posted here a couple of years ago, i've got a random forest classifier (RFC) up and running using some basic stats and have managed to squeeze a 71% success rate out of it. I think that was about a point or two below the accuracy the bookies were getting, and as odds were part of my data set, i've been focusing on squeezing more out of it, which has led to the most epic amount of scope creep :P
i know pretty much no stats/maths (enough to get me by), and enough python to do some damage, but i'm also not deluded and am taking this as a solid learning experience in ML with structured data. I've also been thinking a lot about what can be done with the data and I can tell there are a lot of dimensions and relationships to be explored, so i'm really interested in getting this data as granular as possible.
I also want to try and recover the $25 i naively invested a couple of weeks ago ;D
That's awesome Jarryd, would be keen to see what you come up with!
I haven't really worked exclusively with ML before, but I have done some small projects with swarm and evolutionary computation. I would actually like to try applying my own heuristic algorithm though, as I have had a few ideas that I would be interested to try. I come from a math/elec engineering background.
In regards to gaining an edge on the TAB, I've considered it a few times. I guess a good footy guru could probably pick on average around 70% of the games maybe? What I think would be really cool is if the algorithm could, on average, determine games with close odds.. maybe find some correlation in the stats that would suggest the TAB odds are not correct etc. Keep me posted on how you go with scraping/sorting afl tables into a DB, I know it's a tedious process!
If you were interested in making money off betting on games, one approach I saw somewhere would be to assign odds for each team and then bet if there's a large difference between what the bookies think and what your classifier thinks.
So if the TAB have a team at $4 odds while your system thinks they're better priced at $3 then you make the bet. If you bet on all opportunities like that then statistically you should come out ahead, assuming your classifier is any good.
This works better than just trying to pick the winning team, as a binary win/loss classifier probably isn't going to do any better than a heuristic that picks the bookies' favourite.. and if you're doing that then the bookies will win overall due to their house advantage.
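A rough sketch of that idea in Python, assuming decimal odds (so $4 means a $1 bet returns $4 including the stake) and a model that outputs a win probability. The function names and the 5-point `min_edge` threshold are made up for illustration, and the implied probability here ignores the bookmaker's margin.

```python
def implied_probability(decimal_odds):
    """Win probability implied by decimal odds, ignoring the bookmaker's margin."""
    return 1.0 / decimal_odds


def expected_profit(model_prob, decimal_odds, stake=1.0):
    """Expected profit of a bet, if model_prob really is the true win probability."""
    win = model_prob * (decimal_odds - 1.0) * stake
    loss = (1.0 - model_prob) * stake
    return win - loss


def is_value_bet(model_prob, decimal_odds, min_edge=0.05):
    """Bet only when the model's probability beats the implied one by min_edge."""
    return model_prob - implied_probability(decimal_odds) >= min_edge


# The example from the comment above: TAB offers $4, the model prices the team at $3.
tab_odds = 4.0
model_prob = 1.0 / 3.0
print(implied_probability(tab_odds))          # probability implied by the TAB price
print(is_value_bet(model_prob, tab_odds))     # whether the gap clears min_edge
print(expected_profit(model_prob, tab_odds))  # expected profit per $1 staked
```

The expected profit is only positive in the long run if the model's probabilities are better calibrated than the bookies', which is the big "if".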
(btw Sam i have shamelessly stolen your SQL schema ;D)
hey dudes, took a bit of a break over the last few days because i was getting slightly burnt out getting home from work and coding after a day of coding :P
i ran into a couple of snags parsing the afl tables because they're generated by javascript. selenium seems to have done the trick though, real nifty! luckily the site is actually pretty well structured, and the tables i want to grab from are being created with sanely named css id selectors. once i get the logic fully sussed for the parser (working on the game stats atm), it'll be easy enough to automate the process.
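Once the rendered HTML is in hand (for example the page source saved from selenium), pulling a table out by its id could look roughly like the sketch below. It uses the standard library's `html.parser` rather than selenium or Beautiful Soup so that it runs without extra installs, and the id `match-stats` and the sample markup are made up, not taken from the real site.

```python
from html.parser import HTMLParser


class TableById(HTMLParser):
    """Collect the cell text of every row in the <table> with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0        # > 0 while inside the target table (counts nesting)
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif self.depth and tag == "tr":
            self.rows.append([])
        elif self.depth and tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears without a <tr>
                self.rows.append([])
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.depth and self.in_cell:
            self.rows[-1][-1] += data


# Made-up markup: the id and contents are placeholders.
html = """
<table id="other"><tr><td>ignore me</td></tr></table>
<table id="match-stats">
  <tr><th>Player</th><th>Kicks</th></tr>
  <tr><td>Example Player A</td><td>14</td></tr>
</table>
"""
parser = TableById("match-stats")
parser.feed(html)
rows = [[cell.strip() for cell in row] for row in parser.rows]
print(rows)
```

Each row comes back as a list of strings, header row first, which is easy to hand to `csv.writer` or a database insert.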
not much else to say really at this stage, just chipping away at it for now!
Hey Ben, I've been meaning to update this for a while with stats from AFL.com.au but haven't found the time. If you're interested in helping out, let me know! I made a start a while ago, check out the python scraping code in the GitHub repo if you're interested.
Hi Sam,
I'm interested in building a supervised machine learning classification model to see how accurately I can predict game results for use in footy tipping. I'm downloading your data and will let you know how it goes. Can you tell me where you got the data from originally? I've been trying to scrape the AFL stats webpage but it's been very time consuming, though they appear to have more stats than shown in your schema. Still, will give it a go with your data first and see how it goes!
Cheers,
Sean
Hey Sean!
The data was mostly pulled from footywire but I'm planning to scrape directly from the afl website this season onwards. Let me know if you (or anyone else who happens to be reading this!) are at all interested in having a go at that and I can send you some python code that may help (with writing to the db and stuff like that).
It would also be good to update the games already in the db with the stats that aren't in there yet. I think there are some ranges of games that are missing some stats, which would be good to fix as well. I listed this as an issue on the github repo.
I appreciate that data munging isn't quite as exciting as machine learning so no sweat if you're not up to it. :)
Very keen to hear how you go with the prediction stuff though, been meaning to give it a go myself for a while!
Sam
Hi Sam,
I'd love to have a crack at your python code, as I think I'll need the extended stats available on the AFL website to improve my model. I don't have a lot of experience with scraping websites and the AFL site has me stumped so far. Any pointers that might help on that front would be much appreciated!
So far I've managed to get up to 65% accuracy with a Random Forest classifier, so only a bit better than flipping a coin :) I've heard rumors of ~75% accuracy being attainable but I'm not sure what data that was using or which algorithm.
Cheers,
Sean
Hey Sean,
Just committed some code to my github repo: footywire-scraper and a bit of stub code for getting data from afl.com.au.
Interestingly enough the AFL.com.au stats site has a JSON API that it uses for retrieving stats. I've had a bit of a play (check the comments at the top of the file for more details) and it seems like it might be possible just to use that to get the data, which should be a lot easier than having to scrape the HTML! You may have to change the value of the "X-media-mis-token" to get it to work but I'm not sure.
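As a rough illustration of what calling such an endpoint could look like with only the standard library: the URL and token below are placeholders (the real ones aren't given in this thread), and whether the site needs other headers or cookies is unknown. Only the `X-media-mis-token` header name comes from the comment above.

```python
import json
import urllib.request

# Placeholder endpoint and token: substitute the values your browser's
# dev tools show when the stats page loads.
API_URL = "https://example.invalid/stats/match/123"
TOKEN = "replace-with-token-from-browser-dev-tools"


def build_request(url, token):
    """Build a GET request carrying the token header mentioned above."""
    return urllib.request.Request(
        url,
        headers={"X-media-mis-token": token, "Accept": "application/json"},
    )


def fetch_json(url, token, timeout=10):
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(build_request(url, token), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network, so it is safe to inspect.
request = build_request(API_URL, TOKEN)
print(request.get_method(), request.full_url)
```

Calling `fetch_json(API_URL, TOKEN)` with real values would return the decoded JSON as Python dicts and lists.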
If you end up having to scrape it, the footywire scraper might be a bit of help but probably won't be super useful as it's a bit of a mess and specific to that site. For scraping in python I like to use the Beautiful Soup library. The developer tools in Chrome make it super easy too, as you can right click on any element to see the corresponding HTML code.
Let me know if you get stuck and need a hand anyway though, always happy to help in getting more data into the db!
75% would be pretty cool, hopefully all the extra data will get you there! I didn't really realise how many extra stats there were on the AFL site. My litmus test for this sort of sports prediction algorithm is always just seeing if it can beat the strategy of always going with the bookmakers' favourite. :) I imagine it'd be pretty hard to beat that though.
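That litmus test is quick to compute. A minimal sketch, assuming each game is a dict with decimal odds for both teams and the actual result; the field names and the four sample games are made up for illustration.

```python
def favourite_baseline_accuracy(games):
    """Fraction of games won by the team with the shorter (lower) odds.

    If both teams have equal odds, the away team is treated as the favourite.
    """
    correct = sum(
        1 for g in games
        if (g["home_odds"] < g["away_odds"]) == g["home_won"]
    )
    return correct / len(games)


def model_accuracy(predicted_home_wins, games):
    """Fraction of games where the model's home-win prediction was right."""
    correct = sum(
        1 for pred, g in zip(predicted_home_wins, games)
        if pred == g["home_won"]
    )
    return correct / len(games)


# Made-up results, not real matches.
games = [
    {"home_odds": 1.5, "away_odds": 2.6, "home_won": True},
    {"home_odds": 2.2, "away_odds": 1.7, "home_won": False},
    {"home_odds": 1.3, "away_odds": 3.5, "home_won": False},
    {"home_odds": 3.0, "away_odds": 1.4, "home_won": True},
]
baseline = favourite_baseline_accuracy(games)
model = model_accuracy([True, False, False, False], games)
print(baseline, model, model > baseline)
```

A classifier is only adding something beyond the odds if its accuracy on held-out games beats the baseline on those same games.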
Sam
PS: Those are actually links in the first paragraph, disqus just doesn't seem to want to display them differently to the rest of the text. The link is https://github.com/samvrlew... just in case
Interesting to hear about the API, though I haven't been able to find it (Google points me to http://www.programmableweb.... which in turn points me towards http://xml.afl.com.au/mobil..., but this link is broken). Do you have a working link?
Will have a go in the meantime with your scraper code and see how we go.
Have a look at my code here: https://github.com/samvrlew... for an example of using the API. To be clear, this likely isn't an API that AFL.com.au want programmatically accessed but your browser is accessing it when you visit the site so you should be able to 'trick' it into spitting back the data you want. You'll probably have to have a play around with it though to see the format of the request headers/cookies etc that it expects. It may be a bit of work to get going but it's likely that it'll be easier than scraping HTML. Up to you how you go about it though!
Hi Sam
Not sure if you still keep this up to date.. but are we able to get the data for last year? I am interested in trying to look at some patterns. Also, did you consider scraping the afl tables site? It's a fairly basic site that would enable scraping of a lot more info.