Python for Data Science


Dec 06, 2020

Why Python?

Python is a high-level, extensible, portable, dynamic programming language. Its English-like syntax makes it easy to read, and it often lets programmers express an idea in fewer lines than Java, C, or C++. Python is also extensible with other languages: you can write performance-critical parts of your code in C or C++ and call them from Python. Because the language is simple, Python is very productive, so developers can focus on solving the problem rather than on the language itself.

It has extensive library support, with code for many purposes: string operations, web browsers, web service tools, regular expressions, documentation generation, unit testing, threading, databases, email, image manipulation, and more.
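As a small illustration of what the standard library alone can do (no third-party installs), the sketch below uses `re` for regular expressions and `sqlite3` for an in-memory database. The log lines and the table name `errors` are made up for this example.

```python
import re
import sqlite3

# Hypothetical log lines, invented for this example
log_lines = [
    "2020-12-06 ERROR disk full",
    "2020-12-06 INFO backup done",
    "2020-12-07 ERROR network down",
]

# Regular expressions: capture the date and message of each ERROR line
pattern = re.compile(r"^(\d{4}-\d{2}-\d{2}) ERROR (.+)$")
errors = []
for line in log_lines:
    match = pattern.match(line)
    if match:
        errors.append(match.groups())

# Databases: store the matches in an in-memory SQLite table and query it
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE errors (day TEXT, message TEXT)")
conn.executemany("INSERT INTO errors VALUES (?, ?)", errors)
count = conn.execute("SELECT COUNT(*) FROM errors").fetchone()[0]
conn.close()

print(errors)
print(count)
```

Both modules ship with CPython, so this runs on a plain Python install.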

Python has great libraries for data science like:

  • Pandas: Pandas is a high-level data manipulation library for Python, developed by Wes McKinney. It is built on top of NumPy, meaning Pandas needs NumPy to operate, and it provides an easy way to create, manipulate, and wrangle data.

  • NumPy: NumPy (Numerical Python) is a Python library for mathematical and logical operations on arrays. It was created by Travis Oliphant. Arrays are used very frequently in data science, where speed and resources matter. NumPy array operations are often cited as being up to 50x faster than equivalent operations on traditional Python lists.

  • Visualization: These libraries are used for data visualization.

    Matplotlib: Matplotlib is the default visualization library data scientists turn to in Python. It supports all the popular chart types (line plots, histograms, power spectra, bar charts, error charts, scatter plots, etc.).

    Seaborn: Seaborn is built on top of matplotlib and provides a simple, intuitive interface for building visualizations. Because it is built on matplotlib, it is highly compatible with it: you can start with the advanced plots that seaborn already supports and then customize them as much as you want with matplotlib.

    Plotly: Plotly is an advanced visualization tool capable of handling geographical, scientific, statistical, and financial data in an interactive manner. It also comes with high-powered analytics tooling that can be applied to computer vision, ML, deep learning, NLP, and more.

  • Scikit-learn: Scikit-learn provides a range of supervised and unsupervised machine learning algorithms through a consistent interface in Python. It is built on NumPy, SciPy, and matplotlib.
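To make the Pandas description concrete, here is a minimal sketch of the typical create, manipulate, and wrangle steps: building a DataFrame, deriving a column, filtering rows, and aggregating with `groupby`. The sales records and the column names `region`, `units`, `price`, and `revenue` are hypothetical, invented only for illustration.

```python
import pandas as pd

# Hypothetical sales records (invented for this example)
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "units": [10, 4, 7, 3, 8],
    "price": [2.5, 3.0, 2.5, 4.0, 3.0],
})

# Manipulate: derive a new column, computed for the whole column at once
df["revenue"] = df["units"] * df["price"]

# Filter: keep only rows where at least 5 units were sold
big_orders = df[df["units"] >= 5]

# Wrangle: total revenue per region, largest first
revenue_by_region = (
    df.groupby("region")["revenue"].sum().sort_values(ascending=False)
)

print(big_orders)
print(revenue_by_region)
```

The column arithmetic in `df["units"] * df["price"]` is carried out by NumPy underneath, which is why Pandas depends on it.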
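The NumPy speed claim comes from vectorization: one array operation runs in compiled code instead of a Python-level loop. The sketch below squares the same numbers both ways, checks that the results agree, and then times them with `timeit`. It prints the measured ratio rather than assuming a fixed "50x", since the actual speedup depends on the operation, the array size, and the machine. The array size `n` is an arbitrary choice.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for this example
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)  # explicit 64-bit ints avoid overflow

# Pure Python: square every element with a list comprehension
squares_list = [v * v for v in values_list]

# NumPy: the same operation, vectorized over the whole array
squares_array = values_array * values_array

# Both approaches produce the same numbers
assert squares_array.tolist() == squares_list

# Measure instead of assuming a speedup; results vary by machine
list_time = timeit.timeit(lambda: [v * v for v in values_list], number=10)
array_time = timeit.timeit(lambda: values_array * values_array, number=10)
print(f"list: {list_time:.4f}s  array: {array_time:.4f}s  "
      f"speedup: {list_time / array_time:.1f}x")
```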
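The following is a minimal Matplotlib sketch that draws two of the chart types mentioned above, a histogram and a scatter plot, side by side. It assumes you want to render without a display (for example on a server). For that reason it selects the non-interactive `Agg` backend and writes the PNG to an in-memory buffer instead of opening a window. The random data is synthetic.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files/buffers, no window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: 500 draws from a standard normal distribution
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=500)

# One figure with two panels side by side
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

ax_hist.hist(samples, bins=20)
ax_hist.set_title("Histogram")

# Each sample against the next one
ax_scatter.scatter(samples[:-1], samples[1:], s=5)
ax_scatter.set_title("Scatter plot")

# Save as PNG into memory; use fig.savefig("plot.png") to write a file instead
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```

Seaborn's axes-level functions draw onto the same kind of matplotlib `Axes` objects, which is what makes the "start with seaborn, customize with matplotlib" workflow possible.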
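Scikit-learn's "consistent interface" means that supervised and unsupervised estimators are used the same way: construct the estimator, call `fit`, then `predict` (or `fit_predict`). The sketch below shows this with `LinearRegression` (supervised) and `KMeans` (unsupervised). The data is synthetic: a line y = 2x + 1 and two well-separated groups of points.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Supervised: learn y = 2x + 1 from labeled examples (synthetic data)
X = np.arange(10, dtype=float).reshape(-1, 1)  # scikit-learn expects 2-D features
y = 2 * X.ravel() + 1

reg = LinearRegression()
reg.fit(X, y)
prediction = reg.predict(np.array([[20.0]]))  # should be close to 41

# Unsupervised: group unlabeled points into 2 clusters (synthetic data)
points = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
])
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(points)

print(reg.coef_, reg.intercept_, prediction)
print(labels)
```

Swapping in a different estimator (say, a decision tree instead of linear regression) would leave the `fit`/`predict` calls unchanged, which is the main benefit of the shared interface.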

Python vs. R:

Python is a general-purpose programming language that can handle everything from software development to web development to data mining. R, on the other hand, is a domain-specific language known for its statistical and graphical techniques.

So the question arises: which is better for data science, R or Python?

It all depends on the user's needs. If your work is statistics-heavy, you should use R.

Some Resources to learn Python:

By Mustafa Khan