Featured Blog Posts (178)

The Best Answers to Your Most Crucial Deep Learning Questions

(picture from www.re-work.co)

Many people keep a close eye on the fastest-moving technology trends, and deep learning is without doubt one of today's most popular buzzwords. Deep learning has made significant breakthroughs and is applied in many areas, such as facial recognition, image recognition, and game-playing systems like AlphaGo. Thus…

Added by Paul Black on December 14, 2016 at 11:30pm — No Comments

Topic Modeling in R

As part of my Twitter data analysis series, I have so far completed Movie Review using R and Document Classification using R. Today we will discover topics in tweets, i.e., mine the tweet data to uncover underlying topics, an approach known as topic modeling.

What is Topic…

Added by suresh kumar gorakala on December 23, 2015 at 8:30pm — No Comments

Fast clustering algorithms for massive datasets

Here we discuss two potential algorithms that can cluster big data sets extremely fast, as well as how to represent such complex clustering structures graphically. By extremely fast, we mean a computational complexity of order O(n), or even faster, such as O(n/log n). This is much faster than good Hierarchical Agglomerative Clustering…
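The post's two algorithms are not shown in this excerpt, so the following is a separate, minimal illustration of how clustering can run in roughly O(n) time: a hash-grid approach (a swapped-in technique, not necessarily the author's) buckets each point into a grid cell in one pass, then joins occupied cells that touch. The cell width `cell` and the name `grid_cluster` are assumptions for illustration.

```python
from collections import defaultdict, deque
from math import floor


def grid_cluster(points, cell):
    """Cluster 2-D points in roughly O(n) time using a hash grid.

    points: list of (x, y) tuples; cell: grid cell width (assumed parameter).
    Returns one integer cluster label per point, in input order.
    """
    # Pass 1: bucket every point into its grid cell (one hash insert per point).
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(floor(x / cell), floor(y / cell))].append(i)

    # Pass 2: flood-fill over occupied cells; touching cells share a label.
    labels = [-1] * len(points)
    cell_label = {}
    next_label = 0
    for start in cells:
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for i in cells[(cx, cy)]:
                labels[i] = next_label
            # Visit the 8 neighbouring cells that contain points.
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in cells and nb not in cell_label:
                        cell_label[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return labels


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.5, 0.4), (1.2, 0.3), (10, 10), (10.5, 10.2)]
    print(grid_cluster(pts, 1.0))  # [0, 0, 0, 1, 1]
```

The first three points fall in adjacent cells and form one cluster; the last two sit in a distant cell. The trade-off is that results depend on the cell width, and clusters are unions of cells rather than shapes fitted to the data.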

Added by Dr. Vincent Granville on February 23, 2013 at 10:00pm — 4 Comments

Big data set - 3.5 billion web pages - made available for all of us

This page provides a large hyperlink graph for public download. The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, the graph is the largest hyperlink graph that is available to the public outside companies…
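Working with a graph this size usually means streaming over the edge list rather than loading it. As a small sketch, assuming a hypothetical plain-text format with one `source<TAB>target` edge per line (the actual download format may differ), a single pass is enough to count incoming links per page:

```python
import io
from collections import Counter
from typing import Iterable


def in_degrees(lines: Iterable[str]) -> Counter:
    """Count incoming links per page in a single pass over an edge list.

    Assumes a hypothetical format: one "source<TAB>target" edge per line.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _source, target = line.split("\t")
        counts[target] += 1
    return counts


if __name__ == "__main__":
    edges = io.StringIO("0\t1\n0\t2\n1\t2\n3\t2\n")
    print(in_degrees(edges).most_common(1))  # [('2', 3)]
```

At 3.5 billion pages an in-memory `Counter` would not fit on most machines; in practice the counts would be sharded by target ID or computed with a distributed tool, but the single streaming pass is the same idea.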

Added by Dr. Vincent Granville on November 18, 2013 at 10:30am — No Comments

Another large data set - 250 million data points - available for download

This is the full-resolution GDELT event dataset, running from January 1, 1979, through March 31, 2013, and containing all data fields for each event record. The years 1979 through 2005, inclusive, are available as yearly downloads containing all records for each year, while starting in January 2006, data is available as monthly downloads because the number of records per month grew over time…
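The split between yearly and monthly files determines how many downloads cover the full range. A quick sketch of that arithmetic, using hypothetical period labels (not the actual GDELT file names):

```python
def gdelt_download_periods(end_year=2013, end_month=3):
    """List download units: yearly for 1979-2005, monthly from January 2006.

    The labels ("1985", "2006-01", ...) are hypothetical, not real file names.
    """
    periods = [str(year) for year in range(1979, 2006)]  # 1979..2005 inclusive
    for year in range(2006, end_year + 1):
        last_month = end_month if year == end_year else 12
        periods.extend(f"{year}-{month:02d}" for month in range(1, last_month + 1))
    return periods


if __name__ == "__main__":
    periods = gdelt_download_periods()
    print(len(periods), periods[0], periods[27], periods[-1])  # 114 1979 2006-01 2013-03
```

This yields 27 yearly periods (1979 to 2005) plus 87 monthly ones (January 2006 to March 2013), 114 downloads in total.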

Added by Dr. Vincent Granville on November 19, 2013 at 1:30pm — 2 Comments

How to Collect Big Data

You probably have plenty of questions: how is big data collected, how do organizations gather it, and how do you gather information for quantitative research? Don't worry. If you are looking for answers to these questions, you are on the right page, because here we give a complete overview of big data collection strategies. …

Added by Ayushi Mishra on November 4, 2016 at 3:00am — No Comments

Top 30 Free Web Scraping Software

Web scraping (also termed web data extraction, screen scraping, or web harvesting) is a technique for extracting data from the web and turning unstructured web data into structured data that can be stored on your local computer or in a database.

Web scraping is carried out by web scraping software tools. These tools interact with websites in the same way as you do when using a…
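As a minimal, standard-library illustration of the unstructured-to-structured step (not one of the 30 tools in the post), Python's `html.parser` can turn the links on a page into records ready for a CSV file or a database table. The names `LinkScraper` and `scrape_links` are hypothetical; fetching pages (for example with `urllib.request`) and respecting a site's terms of use are left out.

```python
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Collect every <a href="..."> on a page as a {"href", "text"} record."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record for the <a> element being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None.
            href = dict(attrs).get("href") or ""
            self._current = {"href": href, "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.records.append(self._current)
            self._current = None


def scrape_links(html):
    """Parse an HTML string and return its links as structured records."""
    parser = LinkScraper()
    parser.feed(html)
    parser.close()
    return parser.records
```

For `'<li><a href="/a">First</a></li><li><a href="/b"> Second </a></li>'` this returns `[{'href': '/a', 'text': 'First'}, {'href': '/b', 'text': 'Second'}]`, which `csv.DictWriter` could write straight to disk.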

Added by Paul Black on September 22, 2016 at 11:00pm — No Comments

Smart Business: automated sentiment analysis on top

The modern world is fast and dynamic, with a multitude of new products being launched. Marketing agencies are making a fortune by monitoring markets and delivering reports on consumers’ opinions. Today, feedback analysis is a separate field, a growing industry with an array of products and services, and the prices for those services are fairly exorbitant.

So, do vendors have a chance to cut down…

Added by Yana Yelina on August 12, 2016 at 12:00am — No Comments

Data Wars: Dawn of the Yottabyte

Big Data is an accumulation of data that is too large and complex for processing by traditional database management tools.

- Merriam-Webster

Yeah, but what really makes Big Data Big Data? This question is as fundamental to data science as the chicken-or-egg question should be to researchers at KFC. But we’re not dealing with an A/B chicken model here. It’s more like the elephant in the dark room or, scaling it up, the nearest star to our galactic…

Added by Orion Stallard on July 8, 2016 at 12:54pm — No Comments

7 Tools to Extract Text from an HTML Document

I want to share an interesting article about data scraping that might be useful for your business. The article below is mainly reprinted from here.

Text in an HTML document is the content placed between HTML tags such as <a></a> and <title></title>. Sometimes we want to extract the text from an HTML document, and there are two methods that can…
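The two methods are cut off in this excerpt. As one hedged illustration, the text between tags can be collected with a real HTML parser instead of regular expressions, skipping `<script>` and `<style>` contents, which are not visible text. `TextExtractor` and `html_to_text` are hypothetical names built on Python's standard `html.parser`:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gather the text between tags, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside <script> or <style>.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

For `'<html><head><title>Hello</title><style>p{color:red}</style></head><body><p>World <a href="#">link</a></p></body></html>'` the result is `'Hello World link'`: the stylesheet is dropped and the text from `<title>`, `<p>`, and `<a>` is kept.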

Added by Nora Choi on May 31, 2016 at 2:30am — No Comments

Hadoop YARN Explanation and Container Memory Allocations

YARN ResourceManager (the master component of the YARN service)

1) Controls the total resource capacity of the cluster.

2) For every container requested in the cluster, it enforces a minimum container size, controlled by the YARN configuration property:

yarn.scheduler.minimum-allocation-mb: 1024 (this value depends on the cluster's RAM capacity)

Description: The minimum allocation for every container request at the RM, in MBs.…
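To make the effect of the minimum allocation concrete, here is a sketch of how a request is commonly normalized, assuming Hadoop 2.x Capacity Scheduler behaviour: the request is raised to at least the minimum, rounded up to a multiple of it, and bounded by `yarn.scheduler.maximum-allocation-mb` (default 8192). Other schedulers and versions may round differently, so treat `normalize_container_mb` (a hypothetical helper) as an approximation and check your cluster's documentation.

```python
import math


def normalize_container_mb(request_mb, min_mb=1024, max_mb=8192):
    """Approximate the container memory (MB) granted for a request.

    Assumes Capacity Scheduler-style normalization: min_mb plays the role of
    yarn.scheduler.minimum-allocation-mb and max_mb of
    yarn.scheduler.maximum-allocation-mb. Real behaviour varies by version.
    """
    if request_mb > max_mb:
        # Assumption: the ResourceManager rejects requests above the maximum.
        raise ValueError(f"request of {request_mb} MB exceeds maximum {max_mb} MB")
    granted = max(request_mb, min_mb)               # never below the minimum
    granted = math.ceil(granted / min_mb) * min_mb  # round up to a multiple
    return min(granted, max_mb)                     # never above the maximum


if __name__ == "__main__":
    for ask in (500, 1500, 2048, 3000):
        print(ask, "->", normalize_container_mb(ask))
```

With the defaults, a 500 MB request gets 1024 MB and a 1500 MB request gets 2048 MB, so requests just above a multiple of the minimum waste the most memory.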

Added by skumar T on May 30, 2016 at 8:00pm — No Comments

Data has always existed; the key is the right data

What do the Library of Alexandria, the Normans, and a book have to do with data? I never thought about

The Library...

...at Alexandria was in charge of collecting all the world's knowledge, and most of the staff was occupied with the task of translating works onto papyrus paper... 1

Or the Normans and the...

Domesday Book (Latin: Liber de Wintonia "Book of…

Added by George Psistakis on May 20, 2016 at 5:20am — No Comments

Data Lakes Still Need Governance Life Vests

As a central repository and processing engine, data lakes hold great promise for raising return on data assets (RDA). Bringing analytics directly to data in its various native formats can accelerate time-to-value by…

Added by Gabriel Lowy on April 11, 2016 at 12:00pm — No Comments

The IoT User Experience Urgency

As we evolve toward a software-defined world, there’s a new user experience urgency emerging.  That’s because the definition of “user” is going to be vastly expanded.  In the Internet of Things (IoT)…

Added by Gabriel Lowy on March 30, 2016 at 9:43am — No Comments

Three Big Data Trends for 2016

Is your company poised to take advantage of three key trends in Big Data? Syncsort, a global leader in Big Data and mainframe software, recently released the results of its second annual Hadoop survey. Based on the survey results, there are three areas that companies will focus on in 2016 to realize the full potential of Big Data analytics.

First, Apache Spark will move from a talking point to deployment. Nearly 70 percent of survey respondents are interested in Apache…

Added by John McCure on January 22, 2016 at 4:00pm — No Comments

Principal Component Analysis using R

Curse of Dimensionality:

One of the most common problems in data analytics tasks such as recommendation engines and text analytics is high-dimensional, sparse data. Often we face a situation where we have a large set of features and few data points, or where the feature vectors are very high-dimensional. In such scenarios, fitting a model to the dataset results in lower predictive power. This scenario is often termed…
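PCA tackles this by projecting the data onto the few directions that carry the most variance. The post works in R (where `prcomp` is the usual entry point); as a language-neutral sketch, the standard-library Python below finds only the first principal component by power iteration on the sample covariance matrix. The function name and the fixed iteration count are assumptions; for real work, a library routine based on SVD is the safer choice.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (unit direction, variance) of the top principal component.

    rows: samples as equal-length lists of numbers (at least two samples).
    Uses power iteration on the sample covariance matrix; it assumes the data
    is not constant and the all-ones start vector is not orthogonal to the
    top eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d), dividing by n - 1.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    def matvec(v):
        return [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]

    v = [1.0] * d
    for _ in range(iterations):
        w = matvec(v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v = variance of the data projected onto v.
    variance = sum(a * b for a, b in zip(v, matvec(v)))
    return v, variance
```

For `[[1, 1], [2, 2], [3, 3], [4, 4]]` the direction is about `[0.707, 0.707]` with variance 10/3, since all the spread lies along the diagonal; keeping only that one coordinate loses no information.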

Added by suresh kumar gorakala on February 28, 2016 at 9:30pm — No Comments

Do You Really Need a Big Data Strategy?

With increasing frequency, CIOs are being asked by their senior management, “What’s our big data strategy?”  But do you really need a big data strategy?…

Added by Gabriel Lowy on January 26, 2016 at 11:48am — No Comments

Learn Everything about Sentiment Analysis using R

Today I will explain how to use R to create a basic movie review engine based on people's tweets. The review engine will be implemented as follows:
  • Get tweets from Twitter
  • Clean the data
  • Create a word cloud
  • Create a data dictionary
  • Score each tweet

Get Tweets from Twitter:

The first step is to fetch the data from Twitter. In R, we can call the Twitter API using the package…
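The cleaning and scoring steps in the list above can be sketched without the Twitter API. This Python version assumes tweets are already fetched as strings and uses tiny hypothetical word lists in place of the post's data dictionary; the score is simply the count of positive words minus the count of negative words.

```python
import re

# Hypothetical mini-dictionaries; a real engine would load a full opinion lexicon.
POSITIVE = {"good", "great", "awesome", "love", "enjoyed"}
NEGATIVE = {"bad", "boring", "awful", "hate", "worst"}


def clean(tweet):
    """Lower-case, drop URLs and @mentions, then split into words."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z']+", tweet)


def score(tweet):
    """Positive word count minus negative word count (simple lexicon score)."""
    words = clean(tweet)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


if __name__ == "__main__":
    for t in ("Great movie, I love it! http://t.co/abc",
              "@bob worst film ever, so boring"):
        print(score(t), t)
```

Here the first tweet scores 2 and the second scores -2; a positive total suggests a favourable review, while a plain word-count score misses negation such as "not good".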

Added by suresh kumar gorakala on January 11, 2016 at 6:00am — No Comments

The CFO’s New Year’s Resolution

Virtually Print Receipts for Easier Tax Audits

While many individuals will make personal resolutions this December, the New Year is also…

Added by Sai Gundavelli on December 7, 2015 at 1:52pm — No Comments

The Collective Definition of Data Lake by the Big Data Community

Yes, we are marching towards New Year 2016! What happened to the resolutions of 2014 and 2015? Quitting habits? Building habits? The road ahead? I tried them all, but I could not keep them up. So for New Year 2016: no more resolutions, just implement the plan.

Extending that idea: as we know, big data is bringing more business value to enterprises by leveraging the data lake. Data lake... what is that? “Data lake” is a loosely defined term, and its definition changes during implementation…

Added by Kumar Chinnakali on December 2, 2015 at 6:00am — No Comments

© 2017 BigDataNews.com is a subsidiary of DataScienceCentral LLC and not affiliated with Systap.
