Dataset posted on 2020-05-22, 04:08, authored by Alan Healey
Raw data scraped from the Centrelink website on March 11, 2019.
The zip file contains three folders: Folders, Pages, and Text.
Pages contains each individual webpage, including its HTML code.
The Folders and Text folders both contain processed text files; the files in Folders have been classified by section, while the files in Text have not.
Individual files outside of these folders were found to cause problems during processing (such as the use of non-ASCII characters).
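For anyone reusing the data, one way to flag files that contain non-ASCII characters before processing is sketched below. This is only an illustration, not the procedure used to build the dataset; the function name `find_non_ascii_files` is hypothetical, and it assumes the zip file has already been extracted to a local directory.

```python
from pathlib import Path


def find_non_ascii_files(root):
    """Return the files under `root` whose contents are not pure ASCII.

    Hypothetical helper for illustration; `root` is assumed to be the
    directory the zip file was extracted to.
    """
    flagged = []
    for path in sorted(Path(root).rglob("*")):
        # Read raw bytes so no text encoding has to be guessed.
        if path.is_file() and not path.read_bytes().isascii():
            flagged.append(path)
    return flagged
```

Calling `find_non_ascii_files` on the extracted directory returns a sorted list of paths that can be inspected or excluded before further processing. It requires Python 3.7 or later for `bytes.isascii`.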
- School of Humanities and Social Sciences