{"id":17045,"date":"2018-04-24T10:11:05","date_gmt":"2018-04-24T14:11:05","guid":{"rendered":"http:\/\/www.bu.edu\/csmet\/?p=17045"},"modified":"2018-04-24T10:11:05","modified_gmt":"2018-04-24T14:11:05","slug":"applying-data-mining-techniques-over-big-data","status":"publish","type":"post","link":"https:\/\/www.bu.edu\/csmet\/2018\/04\/24\/applying-data-mining-techniques-over-big-data\/","title":{"rendered":"Applying data mining techniques over big data"},"content":{"rendered":"<p>Source: <em><a href=\"https:\/\/open.bu.edu\/handle\/2144\/21119\">Applying data mining techniques over big data<\/a><\/em><\/p>\n<p>The rapid development of information technology in recent decades means that data appear in a wide variety of formats: sensor data, tweets, photographs, raw data, and unstructured data. Statistics show that 800,000 petabytes of data were stored in the world in 2000. Today\u2019s internet has about 0.1 zettabytes of data (1 ZB is 10<sup>21<\/sup> bytes), and this number will reach 35 ZB by 2020. With such an overwhelming flood of information, present data management systems cannot scale to this huge amount of raw, unstructured data, known in today\u2019s parlance as Big Data. In the present study, we show the basic concepts and design of Big Data tools, algorithms, and techniques. We compare the classical data mining algorithms to the Big Data algorithms, using Hadoop\/MapReduce as a core implementation of Big Data for scalable algorithms. We implemented the K-means algorithm and the A-priori algorithm with Hadoop\/MapReduce on a 5-node Hadoop cluster. We explore NoSQL databases for semi-structured data at massive scale, using MongoDB as an example. 
Finally, we compare the performance of HDFS (Hadoop Distributed File System) and MongoDB data storage for these two algorithms.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Source: Applying data mining techniques over big data The rapid development of information technology in recent decades means that data appear in a wide variety of formats \u2014 sensor data, tweets, photographs, raw data, and unstructured data. Statistics show that there were 800,000 Petabytes stored in the world in 2000. Today\u2019s internet has about 0.1 [&hellip;]<\/p>\n","protected":false},"author":2828,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[10831],"tags":[],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/csmet\/wp-json\/wp\/v2\/posts\/17045"}],"collection":[{"href":"https:\/\/www.bu.edu\/csmet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bu.edu\/csmet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/csmet\/wp-json\/wp\/v2\/users\/2828"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/csmet\/wp-json\/wp\/v2\/comments?post=17045"}],"version-history":[{"count":1,"href":"https:\/\/www.bu.edu\/csmet\/wp-json\/wp\/v2\/posts\/17045\/revisions"}],"predecessor-version":[{"id":17046,"href":"https:\/\/www.bu.edu\/csmet\/wp-json\/wp\/v2\/posts\/17045\/revisions\/17046"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/csmet\/wp-json\/wp\/v2\/media?parent=17045"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bu.edu\/csmet\/wp-json\/wp\/v2\/categories?post=17045"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.bu.edu\/csmet\/wp-json\/wp\/v2\/tags?post=17045"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}