DOI: 10.16353/j.cnki.1000-7490.2015.03.027

Abstract: Based on the characteristics of the big data era, this paper analyzes the main challenges that massive data pose to the analysis tools of data science, and introduces the big data analysis tools that have emerged in response. It then compares three popular big data analysis tools used in data science: the R language, Rapid Miner, and Mahout. The comparison finds that the R language and Rapid Miner offer comprehensive functionality, while Mahout has the stronger capability for analyzing big data. Finally, the paper points out development trends for data science analysis tools.

Keywords: data science; R language; big data

In 2008, Nature published the special issue "Big Data" [1], and in 2011 Science published the special issue "Dealing with Data" [2]. The introduction also cites [3] (2012), Horizon 2020 [4], [5] (2014), and [6, 7].

* Project No. ZR2011GL025.

Vol. 38, 2015, No. 3
ITA

1. V. Dhar [8]; J. Leek [9]; Cyberspace [10].

2. J. Gray [11]; Google; [14]. NoSQL: Google BigTable [15]; VMware Redis [16]; Microsoft Azure Tables [17].

2.1, 2.2. N log N; N [12]; PB [13].

2.3. Google MapReduce [18]; YouTube; Yahoo; Hadoop [19]; HDFS; MapReduce; HPCC; R; Storm; Apache Drill; Rapid Miner; Mahout.
R; Hadoop.

3.1
(1) R [20]: GNU; CRAN.
(2) Rapid Miner [21]: Yale; Java; Rapid Miner 6.
(3) Apache Mahout [22]: 2008; Hadoop.
Hadoop; API; GUI.

Table 1. R, Rapid Miner, Mahout
Operating systems: Linux, Windows, Mac OSX; UNIX, Linux, FreeBSD, MacOS, Windows; Linux.
Algorithms: Naïve Bayes, K-Means, EM, Neural Network, MapReduce, SVM, Apriori, KNN.
Data formats: Excel, Arff, SPSS, Dbase, CSV, SequenceFile, Txtfile, PDF, ASCII, XML, HTML, NoSQL.
Hadoop MapReduce: Rapid Miner 6.
Output: 1D, 2D, 3D; pdf, jpg, png.

R and Hadoop [23]; Hadoop MapReduce; PB. Radoop [24]: RapidMiner, Apache Hadoop. Rapid Miner; Mahout; Hadoop; TB; GB. Mahout: HDFS; Hive; MapReduce; Java.
Mahout (Map-Reduce): Naive Bayes; Complementary Naive Bayes.

3.2. Hadoop; Mahout; R; Rapid Miner 6; 3D; MapReduce.

References

[1] Big Data - Nature [EB/OL]. [2014-04-10]. http://www.nature.com/.
[2] Dealing with Data - Science [EB/OL]. [2014-04-10]. http://www.sciencemag.
[3] [DB/OL]. [2014-04-10]. http://www.most.gov.cn/.
[4] Horizon 2020 [EB/OL]. [2014-04-10]. http://eu.mofcom.gov.cn/.
[5] 2014 [DB/OL]. [2014-04-10]. http://dc2014.codata.cn/.
[6] Data Science at NYU [EB/OL]. [2014-04-10]. http://datascience.nyu.edu/.
[7] Wikipedia. Data Science [EB/OL]. [2014-04-10]. http://en.wikipedia.org/wiki/Data_science.
[8] DHAR V. Data science and prediction [EB/OL]. [2014-04-10]. http://cacm.acm.org/magazines/2013/12/169933-data-science-and-prediction/fulltext.
[9] LEEK J. The key word in "Data Science" is not data, it is science [EB/OL]. [2014-04-20]. http://simplystatistics.org/2013/12/12/the-key-word-in-data-science-is-not-data-it-is-science/.
[10] [EB/OL]. [2014-04-20]. http://www.dataology.fudan.edu.cn.
[11] GRAY J. Jim Gray on eScience: a transformed scientific method [R]. The Fourth Paradigm: Data-intensive Scientific Discovery, 2009.
[12] HEY T. [M]. 2012.
[13] [J]. 2013, 50(1): 146-169.
[14] WONG P C, SHEN H-W, JOHNSON C R, et al. The top 10 challenges in extreme-scale visual analytics [J]. Computer Graphics and Applications, 2012, 32(4): 63-67.
[15] CHANG F, DEAN J, GHEMAWAT S, et al. Bigtable: a distributed storage system for structured data [J]. ACM Transactions on Computer Systems (TOCS), 2008, 26(2): 4.
[16] Redis [EB/OL]. [2014-05-10]. http://redis.io/.
[17] Azure Tables [EB/OL]. [2014-05-10]. http://azure.microsoft.com/.
[18] DEAN J, GHEMAWAT S. MapReduce: simplified data processing on large clusters [J]. Communications of the ACM, 2008, 51(1): 107-113.
[19] Hadoop [EB/OL]. [2014-05-20]. http://hadoop.apache.
[20] R [EB/OL]. [2014-05-20]. http://www.r-project.
[21] Rapid-I [EB/OL]. [2014-04-20]. http://rapid-i.com/content/view/181/196/.
[22] Mahout [EB/OL]. [2014-04-21]. https://mahout.apache.
[23] R [J]. 2013 (23): 19-20.
[24] PREKOPCSÁK Z, MAKRAI G, HENK T, et al. Radoop: analyzing big data with RapidMiner and Hadoop [C] // Proceedings of the 2nd RapidMiner Community Meeting and Conference (RCOMM 2011), 2011: 1-12.

Received: 2014-09-15