Learn all about Spark MLlib at UpX Academy. MLlib is Apache Spark's machine learning library, used to build machine learning models on distributed data.
Apache Pig and Apache Hive are two commonly used data processing components of the Big Data ecosystem. Knowing the differences between them helps you choose the right tool for the job at hand.
Want to learn Hadoop but don’t know Java? No problem. This post makes it super easy for Java beginners to kick-start their Hadoop journey.
Apache Flume is a distributed, reliable system for collecting high-throughput data and storing it in Hadoop storage such as HDFS. Learn the basics of Apache Flume at UpX Academy.
Learn all about Apache Sqoop at UpX Academy. Apache Sqoop is a general-purpose tool for transferring data from traditional relational databases to HDFS and vice versa. These import and export jobs can be automated by scheduling them through Apache Oozie.
A Hadoop cluster is a reservoir of heterogeneous data, both structured and unstructured, coming from a variety of sources. Apache Hive is a data warehouse tool that can easily crunch petabytes of data and works well for interactive SQL queries. Industry …
Apache Pig is a platform for analyzing large data sets by representing them as data flows. Pig uses a high-level language, Pig Latin, for data analysis.