
DrMaxwellEdison

Do those 10mil include the 20 or so notebooks I made for the Advent of Code 2020 challenge? Probably not the best representation of the state of data science there.


Accidental_Arnold

And 9 Million Titanic Survivor notebooks.


FuckNinjas

And what have we learned? Little girls are impervious to icebergs.


the__itis

At this scale, it still gives a general idea of what people have been working on over the past two years.


bythenumbers10

And boom goes the dynamite, or in this case, "sampling error?" goes the data scientist.


kenny10101

Are you also majoring in medicine? :D


HeyItsRaFromNZ

This really is an interesting analysis! Not surprising:

* Python is by far the dominant language (although I'd like to find out a bit more about this new nan framework)
* Python 3 has seen massive growth over the past two years
* NumPy is the single most used data science library
* NumPy and pandas is the most popular combination
* Keras is wildly popular as a deep learning framework, although PyTorch has seen massive growth recently

What I find a little disheartening is how they're being used:

1. **Half of all these notebooks have fewer than four markdown cells**
2. **Over a third of these notebooks ~~will~~ may fail if you try to run the cells in order**

A good notebook is like a great conversation. Cells should be like sentences. You make a point. You then justify that point. Make another point.

One of the greatest advantages of using a notebook over editing a plain Python script with an IDE is that *you can give context to your code*. In other words, you can provide the framework (text, equations etc.) for the code you're about to run. Then run the code. Then hopefully provide some illumination as to the result, why the reader would care, etc. If you're not going to provide context for your code, then you've abdicated the main advantage of using a notebook, and you're left with the downsides of using a notebook (version control, formatting, automation, production integration).

The second point is also potentially worrying. A good notebook really should run linearly. To take my conversation analogy further, it would be like trying to follow a conversation that keeps jumping back and forth, giving away the punchline before the setup.

Edit: Because the analysis only checked the output of the notebooks, rather than actually running them, it can't determine whether the logic itself is out of order. This statefulness of notebooks can lead to confused workflows, so committing a notebook run out of order is best avoided!


CognaticCognac

I am not a data scientist, just a chemist who sometimes uses Python. Thus, I can't reasonably judge the validity of the structure of Jupyter notebooks intended for data science. Yet I can assume that if one collects all the notebooks on GitHub, quite a large fraction of them would not be made by data scientists. And while my circle of colleagues who use notebooks is not too big, I'd say that most of them (including myself) use Jupyter notebooks just as an equivalent of, well, paper notebooks: not a structured project, but a series of thoughts, a bit of throwing things around to see how they fit, playing with functions to see which plot of the experimental data would be better, etc. The result is very far from structured, and the cells surely would not run in the mess that results, but separate bits of code get copied into a designated script, with the Jupyter notebook remaining a playground. That's just something to take into account when looking at the findings in the article.


teerre

Unfortunately, if you ever work in data science, and specifically ML, you'll quickly realize that this 1/3 mark is probably too forgiving. It's common for papers from big conferences to be at least partly unreproducible. It's a huge problem. So much so that on my team, when starting with some random paper, we immediately assume the results shown won't be reproducible. And this is for peer-reviewed, fairly big publications. Random GitHub repository? Ha! Good luck. I'm positive much more than 1/3 are actually not correct notebooks.


vectorpropio

The same has been true for scientific programming for eons. Take some funny geological paper from the '90s and try to use it as-is.


ClayQuarterCake

Then GitHub or Jupyter can't be the end-all resting place for published code in that form. The problem with GitHub is that the repository is so easy to edit. I can't tell you how many times I have had working code and then changed it into non-working code by simple virtue of the code-writing process: I get up to do something else or work on another project, and I forget how to make it work.


teerre

I wish the problem were that easy, but I highly doubt it. I think there's a deep issue with how ML research is done that leads to the situation we have today. This certainly relates to procedure, but it also relates to deeper issues like the pressure to publish, the inherent lack of understanding of how neural networks truly work, and the possibility that even the most comprehensive ablation studies are many times simply not enough.


HeyItsRaFromNZ

As a lapsed physicist, I can very much relate to this. Scientists should not be expected to be software engineers; there's already enough on your plate. However, I will say this: be kind to your future self. It can be very hard to follow the logic of your own investigation without a little bit of commentary and clean-up. This may be in an actual, physical lab notebook, or a virtual one; similar principles apply. If it was worth investigating, then it's worth spending that ten or fifteen minutes extra to make sure it runs OK and will make sense later. It doesn't have to win a Pulitzer, it can be stream-of-consciousness. But your future self, or collaborators, will thank you for it. This advice also goes for commenting code in scripts and docstrings for functions (I could personally do better with these too).


BDube_Lensman

I disagree to an extent on "scientists should not be expected to be software engineers." Perhaps not expected, but like it or not, science in 2020 ~= coding, modulo experimental work (which still has quite a bit of coding in my field). Writing legible, reproducible code usually makes your work go faster; it's just a different balance. Spend {period of time} up-front making "nicer" code, spend {less} time later on the actual science part, because you aren't fighting the code, or the code is faster.

Often, too, bad code simply walls off some scientific inquiries. E.g., "simulate an image from the camera" may take 120 ms. Seems fast, but if you need to do 110,000 of them to even start producing science, that's half of a work shift per run. At an iterative cadence, that means you almost certainly will not ask some questions. If it takes 5 µs, you will ask every question, because there is no pain or substantial waiting for the code to run.

Some areas of science (e.g. instrument design) are also big on configuration management, so you need to be able to archive your code and any inputs and say "in designing {widget}, I used {repo} at {commit} with data archive {file.zip} which has hash {md5}, the versions of major {dependencies} were {v}, this {notebook} serves as a record."
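Something like the following is a minimal sketch of that kind of design record in Python; the archive name, packages, and output file are made up for illustration, not taken from any particular field:

```python
import hashlib
import json
import subprocess
from importlib.metadata import version

def md5sum(path):
    """MD5 hash of a file, for the configuration-management record."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

# Hypothetical inputs: an archive of input data and a couple of key dependencies.
record = {
    "repo_commit": subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip(),
    "data_archive": {"file": "inputs.zip", "md5": md5sum("inputs.zip")},
    "dependencies": {pkg: version(pkg) for pkg in ("numpy", "scipy")},
}

with open("design_record.json", "w") as f:
    json.dump(record, f, indent=2)
```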


fgyoysgaxt

Agreed. Programming literacy has become an incredibly important skill. We don't expect academics to be New York Times best selling authors, but you need a certain amount of writing skills to write a paper. We need to help academics get to the same place with programming.


JohnBrownJayhawkerr1

I definitely think scripting, as is common in STEM, is non-negotiable, but for the meatier aspects of software design, I think it's reasonable to hand that off to folks who specialized in the field for a reason. The problem is, everyone wants a jack of all trades whom they only have to pay in beer money.


zed_three

Scientists need to work with research software engineers -- professionals who understand both the research and the software. Highly optimised code should be left to the RSEs


Log2

Not only that, but if your notebook doesn't run linearly, then at the very least you'll likely be the first one bitten by it when you finally need to extract some experiment from that notebook (or some poor schmuck who inherits your notebooks will be).


JohnBrownJayhawkerr1

As one of those mega-brain software developers, I mostly agree. If I were assisting you on a really complicated physics simulation, it would certainly help if I had the broad knowledge of the field provided by a high school education or whatever, just as it would help you if you knew how to set things up algorithmically, but neither one of us should be expected to be an expert in the other's field. The reason for the expectation is penny-pinching by industry and academia alike: folks above our pay grades think that "scientists" can just do it all, regardless of their domain, and it's a huge wellspring of stress to try to do your job on top of learning a whole separate specialty.


CognaticCognac

That is good advice that I'll have to keep in mind, thank you. Somehow it's harder to keep tidiness in mind for notebooks than it is for "standard" scripts.


wannabe414

That's how I use some of my notebooks as well (recent economics undergrad). But those notebooks aren't the ones I upload to GitHub, and if others are like me, those aren't the kinds of notebooks that are being analyzed.


tom2727

This, 1000%. Most notebooks are where I put my "playing around" code before I get it into shape to actually be committed into a library file (documented, linted, tested, formatted) that I actually expect others to use. I might commit a notebook to a repo, but I make it clear it's just a snapshot in time of code that I was playing around with and might never actually use, but want to keep around in case I ever want to come back to it. I find this to be the killer application for them: good for rapid iteration.


[deleted]

What do you use Python for in chemistry?


CognaticCognac

I work in electrochemistry, and most of my work is related to battery and supercapacitor electrode materials. There is only a handful of (standard) methods for analysing the materials, yet one experiment may produce hundreds of curves (usually as many as there are recharging cycles), which should be analysed for electrode capacity, energy density, polarisation, and maybe some quirks with unusual voltage plateaus (or peaks, in the case of cyclic voltammetry). For some reason this data extraction is often done manually (it's especially tiresome to persuade older professors not even to use scripts themselves, but just to trust them ["See, when you do something in Origin, you can see each step and be sure of the result. I can't be so sure about your fancy Python scripts"], but that's a different topic). I prefer to do some of the work using Python, quickly plot some figures with capacity/efficiency/stability/peak positions, and move on to the next experiment.

But that's primitive stuff, of course. There are some really cool things like battery behaviour modelling that actively use programming in electrochemistry ([PyBaMM](https://github.com/pybamm-team/PyBaMM)), yet as much as I want to, I consider myself too thick to understand the math behind it.
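For the curious, a minimal sketch of the kind of "plot it and move on" script described above; the file and column names are hypothetical:

```python
import pandas as pd
import matplotlib.pyplot as plt

# One row per charge/discharge cycle (hypothetical export from the cycler).
df = pd.read_csv("cycling_data.csv")

fig, ax = plt.subplots()
ax.plot(df["cycle"], df["discharge_capacity_mAh_g"], "o-")
ax.set_xlabel("Cycle number")
ax.set_ylabel("Discharge capacity (mAh/g)")
fig.savefig("capacity_vs_cycle.png", dpi=150)
```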


NewDateline

I think they don't say that the notebooks will fail if executed in order, but that they might, because they were saved with execution counters pointing at a non-linear execution. If done sporadically (commonly the last few cells are re-run in a long notebook, as this is most often where additions happen), it's not the end of the world, though best avoided.


tangerinelion

Ideally you would commit a state of the notebook to your repo where it is either unexecuted or executed linearly once. The in-between "development" phase where you tweak your plots and massage your data should be removed. You're spot on that the statement in point 2 is not in the original post: a third saved with non-linear execution does not mean a third require non-linear execution.


ChemEngandTripHop

As I understand it, u/NewDateline isn't talking about the "in-between development phase". You could have a notebook that runs perfectly linearly, but if you go back and run the cell before last, the execution count would not be linear, even if there is no dependency of that last cell on the penultimate one. For people who want to get rid of the execution count (one of the many notebook aspects that cause issues with git), you can use `nbformat` to remove them.
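A minimal sketch of that clean-up with `nbformat` (the notebook filename is made up); it also drops outputs, which you may or may not want:

```python
import nbformat

nb = nbformat.read("analysis.ipynb", as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.execution_count = None  # hide the non-linear counters
        cell.outputs = []            # optional: keeps diffs small
nbformat.write(nb, "analysis.ipynb")
```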


tom2727

> (commonly the last few cells are re-run in a long notebook, as this is most often where additions happen), it's not the end of the world, though best avoided

With data science, one thing I find myself doing is running the "put data into a graph" part of the notebook over and over as I tweak the look, change colors and whatnot, or maybe make multiple graphs with different slices, whereas the "query and process raw data" step is usually not something I need to rerun, and is often the most time-consuming step for large datasets.
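A minimal sketch of that split, with made-up file and column names; the slow cell runs once per session, while the plotting cell can be re-run freely:

```python
# Cell 1 -- slow: query and process the raw data (run once per session)
import pandas as pd
df = pd.read_parquet("big_extract.parquet")
df = df[df["status"] == "ok"]

# Cell 2 -- cheap: tweak colours, slices and labels, re-run as often as needed
import matplotlib.pyplot as plt
ax = df.groupby("day")["value"].mean().plot(color="tab:blue", marker="o")
ax.set_xlabel("Day")
ax.set_ylabel("Mean value")
```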


HeyItsRaFromNZ

Great point! I've edited my comment to reflect this. You're right, it's not the end of the world, but it is best if you can just 'run all cells' to make sure the notebook actually runs linearly. I have committed notebooks that I thought would run all the way through but didn't, and only found out when someone else tried to run the notebook.


HonestCanadian2016

I can't tell you how many notebooks I've downloaded in my pursuit of learning machine learning that didn't work as written. The one positive takeaway is that it's allowed me to try and find the errors in the code, which, in and of itself, has been a powerful, though unpleasant, learning experience.


HeyItsRaFromNZ

Same here. This is one of the reasons I encourage others (and myself) to make sure it actually runs linearly. I have often deleted cells after a terrible last-minute refactoring job. I now make sure I restart the kernel and run all cells before sharing a notebook, and I encourage anyone I can to do the same.


reallyserious

>A good notebook is like a great conversation. Cells should be like sentences. You make a point. You then justify that point. Make another point.
>
>One of the greatest advantages of using a notebook over editing a plain Python script with an IDE is that you can give context to your code. In other words, you can provide the framework (text, equations etc.) for the code you're about to run. Then run the code. Then hopefully provide some illumination as to the result, why the reader would care, etc.

Regular source code in scripts should ideally also be documented, so what you're describing isn't unique to notebooks. Sure, you won't easily be able to inline LaTeX renderings of equations in normal source code, but I don't see that as something particularly important. I'm sure notebooks have a place, but they're not for me. I've been developing professionally for 20+ years and I just get frustrated with the horrible web interface and lack of proper debugging abilities. Give me a real IDE so I can write normal code without all this cell nonsense.


double_en10dre

I think the key difference is that the people who typically use notebooks are *not* developers. They are scientists or researchers, and the points they make should be about the underlying business or organizational motivations for writing this code. As an example, there are many people working in finance who use Python/notebooks as a tool for ingesting data, analyzing it, and devising new strategies. If I look at their notebooks, I expect to see documentation explaining the strategy, not the code.


PediatricTactic

Some of us use notebooks just for the pretty default data frame formatting and because it's all the government lets us install.


HeyItsRaFromNZ

I hear you. I've taught in a lot of these environments. Some of the weird hoops and restrictions led to a certain amount of creativity. For example, teaching web-scraping at a (US) government department: the lecture component went fine, and we were all interacting happily with [commonly used, uncontroversial site #1]. It took a while (too long, in hindsight) to figure out why the lab wasn't working out: no access to innocuous site #2.


nickeltini

My notebooks run linearly, but I leave errors in and then completely rewrite the corrected code in the following cell, because I'm actually taking notes. These notebooks aren't published to my GitHub, obviously.


HeyItsRaFromNZ

Exactly!


FleetAdmiralFader

>left with the downsides of using a notebook (version control, formatting, automation, production integration).

I've worked in the industry long enough and been to enough conferences to know that these are no longer limitations of using notebooks. For example, the vast majority of Netflix's internal data platform uses notebooks. Have you ever heard of papermill?
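For anyone who hasn't: papermill lets you inject parameters into a notebook and execute it end-to-end from code. A minimal sketch, with made-up notebook names and parameters:

```python
import papermill as pm

# Run a template notebook top-to-bottom with injected parameters,
# saving the executed copy alongside it.
pm.execute_notebook(
    "report_template.ipynb",
    "report_2020_q4.ipynb",
    parameters={"start_date": "2020-10-01", "end_date": "2020-12-31"},
)
```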


HeyItsRaFromNZ

Sure thing, I'm not saying that these will necessarily be noticeable issues in the future. The tooling and support for notebooks has been constantly improving, which is great to see; they're far easier to install, use and deploy than they used to be. Papermill is great, but it requires a higher level of institutional buy-in, which I haven't really seen yet (I teach and consult in this space full-time for a wide variety of clients). I'm also happy to see extensions like nb-black to help with formatting. Unfortunately, relatively speaking, the users aware of linters etc. are less likely to need them!


FourFingerLouie

I have a question on this: I'm doing a data science project to show for job applications. I didn't know whether to code it in an IDE and present it like production code, or use Jupyter notebooks. Any input on the differences? I have it in the production format as of now, but I feel Jupyter would be easier to show analysis in.


HeyItsRaFromNZ

This does depend on what position you're applying for, and the skills you'd like to demonstrate.

If the main point of your demo is the analysis of the data, some modeling and giving context for the results, then notebooks are definitely the way to go. They're easier to share without the end-user having to fire up their IDE (you can put your notebook directly on GitHub or save the output as HTML). Bear in mind, if this is an entry-level position, that most prospective employers are absolutely swamped with applications. Make it as easy to consume as possible. Don't assume the reader has followed every little detail, so clearly state the problem and show clearly how you've solved it.

If the position is more on the engineering side, then you might want to keep it as a script, assuming you've formatted it nicely and commented appropriately etc. Some employers have been burned by data scientists not understanding how to deploy their work in production, so there is certainly merit in sharing your work just as a script. It can be harder to share this, however.


FourFingerLouie

Thanks for the well thought out response. Should I just do both? It wouldn't be hard to transfer the scripts into a notebook format. The jobs I'm applying for are Data Analyst/Entry Level Data Science roles.


HeyItsRaFromNZ

Great question. The fact that you're concerned about, and know the difference between, production code and exploratory data analysis is a competency trigger. If you're happy maintaining the two versions and can point to the two for these two distinct applications, then I would say that is a good idea.

Very often I make a notebook from a script I've been developing in a more typical IDE (I'm currently a fan of VSCode, although I've used PyCharm and vim in the past and love those too). It's not hard to turn that into a notebook once you're happy with it. You can load the whole script into a new notebook with:

    %load script_name.py

Find the distinct sections (I use comment blocks in scripts for this) and split the cell in two at the cursor with `Ctrl+Shift+-` (i.e. hold down Control and Shift and hit minus, while in edit mode). Then create cells above and below, and explain what you're about to do, and then how it worked out.


FourFingerLouie

Wow thank you for the response! Hopefully this project mixed with my internship experience will make for a good January application process :)


HeyItsRaFromNZ

Awesome! Best of luck!


ChemEngandTripHop

Agreed with everything you said apart from

>If you're happy maintaining the two versions

Trying to keep two codebases in different forms consistent with each other is a recipe for disaster. For u/FourFingerLouie, I'd recommend either abstracting the core functionality into a .py and then importing that into a notebook, or using something like nbdev to auto-generate the .py files from the notebooks.
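A minimal sketch of the first option (module, function and column names are hypothetical): keep the core logic in a plain, version-controlled .py file and import it from the notebook.

```python
# analysis_core.py -- lives in the repo next to the notebook
import pandas as pd

def load_and_clean(path: str) -> pd.DataFrame:
    """Load the raw CSV and apply the standard cleaning steps."""
    df = pd.read_csv(path)
    return df.dropna(subset=["value"])
```

The notebook's first cell then just does `from analysis_core import load_and_clean`, so the narrative lives in the notebook while the logic stays in one reviewable place.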


HeyItsRaFromNZ

Sure thing, I completely agree with you there. I'm not advocating maintaining distinct code-bases in general. This advice was meant purely for OP's specific case, i.e. showcasing that they can display the results of an analysis in a notebook, but also have the chops to turn that into production-ready code.

The fact is that code written for production and code written in a notebook for interactive/exploratory work do smell different. There's no perfect solution without a bit of filing off the rough edges. Loading a .py into a notebook is not a great solution, as you either miss all the context provided by the markdown, or you clutter up your original script with comments that wouldn't normally be considered good commentary style. On the other hand, using something like nbconvert dumps a huge amount of boiler-plate and puts the markdown into huge comment blocks. nbdev is a great suggestion, especially for the documentation side of things, but you still have to sprinkle in the export tags for each cell (not that I have any real experience with nbdev --- it looks like a great tool). The extra layer of complexity may not translate well for what OP would like to demonstrate.


theLastNenUser

I wonder how they accounted for people loading local datasets while iterating over 10M notebooks


Zeroflops

I think using markdown cells as an indicator of documentation quality is misleading. I use notebooks all the time but hardly ever use markdown except at key points. Almost all of my comments use standard Python comments, which makes it easier to move code to .py files if needed.


ClayQuarterCake

OK, I get the disappointed sentiment in your comment, but I don't understand what any of it means. "Give context to your code"? You use Python/NumPy/pandas because you have a body of data and you need to analyze it. You mean you want me to include all 2,000 CSV files that are 400 columns wide and 36,000 rows long? What if I don't want to, or can't, share what I am analyzing?

I understand the language as much as I need to get my job done. I make my living primarily by doing other things besides coding. I am willing to bet that 80% of your new notebook users are in my boat. It would be different if you had an army of developers getting paid to write Python code. Jupyter notebooks are great and all, but for me they are in the same category as GitHub: I will only learn as much as I need to get the job done.


jwink3101

I have to say that an analysis from a company that designs data science tools did such a bad job of plotting. All of the labels should be at least 5 font sizes bigger, or more! It is actually impossible to read some of it. They should be ashamed.

Also, to call a notebook "not consistent" because the execution order is not linear is not to say that it is, in reality, inconsistent. I know I am often jumping around while doing the analysis. I do like to rerun to ensure it is in order, but that doesn't always happen, especially with one-offs.

Finally, it would be more interesting to plot the number of X-type cells divided by the number of non-empty cells (or all cells, but I know a lot of people, myself included, have empty cells at the bottom).
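For anyone fixing the same problem in their own notebooks, a minimal sketch of bumping matplotlib's default font sizes for a whole session (the numbers are arbitrary):

```python
import matplotlib.pyplot as plt

# Applies to every figure created afterwards in this session.
plt.rcParams.update({
    "font.size": 14,        # base text size
    "axes.titlesize": 16,   # subplot titles
    "axes.labelsize": 14,   # x/y axis labels
    "xtick.labelsize": 12,
    "ytick.labelsize": 12,
    "legend.fontsize": 12,
})
```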


execrator

Agree about consistency. Most of my notebooks are "inconsistent" by this standard. I would be surprised if there was more signal than noise in the way this has been measured. The only way to know if the notebook is actually inconsistent is to execute it in natural order, then again in as-it-happened order. If you get exceptions in the former but it works in the latter, it's inconsistent.


alexeusgr

And realized there's no way you could read through all of that before the end of the universe?


BAAM19

Computers can probably read that fast.


alexeusgr

Yeah, have you seen a computer which can explain why you can pull with a string but not push?


alexeusgr

Actually now I'm curious: what is the minimal random sample size that would produce comparable results?


Paddy3118

I await your notebook of your result!


alexeusgr

It's too difficult a project; I'll code a sudoku solver or make a GAN dickpic generator instead. Or make a deepfake Putin YouTube channel and start a revolution in Russia.


Paddy3118

You can ask on r/math, but there is some paper that states that for certain "large" datasets (I think large was >65K items or so), a random sample of 3,500 should give results within X% of the results for the whole. Again, I can't remember what X was, but it was around 5% accuracy for a 3,500 sample. I used this when running simulations many years ago that took many hours to run completely, and I found I did get partial results that followed the paper.
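I can't speak for that particular paper, but the textbook margin-of-error calculation for a simple random sample gives numbers in the same ballpark. A sketch (worst-case p = 0.5, 95% confidence; the population size here is just illustrative):

```python
import math

def margin_of_error(n, population=10_000_000, z=1.96, p=0.5):
    """Approximate 95% margin of error for a simple random sample of size n."""
    fpc = math.sqrt((population - n) / (population - 1))  # barely matters here
    return z * math.sqrt(p * (1 - p) / n) * fpc

print(margin_of_error(3_500))  # ~0.017 -> about +/- 1.7%
print(margin_of_error(385))    # ~0.05  -> about +/- 5%
```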


alexeusgr

I was asking rhetorically; I think it was either a math/CS mistake or a marketing trick on the side of the researchers. But I get it: computers are good at crunching numbers, but someone has to tell them what numbers to crunch, and that someone is a person too. Me, I like abstract work more, and my arrogance is a bit annoying. I still need crunched numbers now and then, though.


enilkcals

Not really surprising that Python is the top language used in Jupyter notebooks, even though others are available. Most R users likely use RStudio, unless they're weird and stubborn like me and use Emacs + ESS (I also use Emacs as my IDE and for Jupyter notebooks, thanks to the excellent EIN package).


klotz

Do you have a recommended tutorial?


enilkcals

For what? [RStudio](https://rstudio.com), [Emacs + ESS](https://ess.r-project.org), [Emacs as a Python IDE](https://realpython.com/emacs-the-best-python-editor/), [EIN](https://github.com/millejoh/emacs-ipython-notebook)? I can't really "recommend" anything, since I've read and used the documentation over the years; it's the primary source and I delve into it as and when I need something (the exception being RStudio, which I don't use).


klotz

Thanks! I am most interested in EIN. Thank you for the links!


enilkcals

The documentation is your best bet then; I don't think EIN sees wide enough use to have people writing "This is how I used it" type articles yet. Basic usage is simple. One drawback is that you can't use JavaScript output such as [Folium](https://python-visualization.github.io/folium/) to render maps in Emacs (at least as far as I've been able to discern so far).


[deleted]

Do those include my final project for the semester?


zeroviral

Am I the only person who uses PyCharm and not notebook?


RudyChicken

I don't use either. I just write in VS Code. Am I weird?


zeroviral

Nope!! Better than using vim just to be cool. Like, if you’re on a big project, don’t use vim lol.


[deleted]

[deleted]


zeroviral

I think notebooks are strictly for data science. I make back-end applications/servers/APIs in Python, as well as automate some stuff using an automation framework with a Selenium wrapper for testing web applications. Mainly I use Java for the heavy-duty stuff, but I never needed to use a Jupyter notebook. When I did, though, it was easy to use, so I get that part.


Bobert_Fico

Notebooks are nice for repeatedly editing and running chunks of code, but PyCharm has a much better debugger. Usually I end up using PyCharm too.


[deleted]

Well said. With any ideas, I start in Jupyter; then when I'm happy with it, I put it in PyCharm. Notebooks are great for development, but I get really lazy and treat the chunks like functions/classes. I add that stuff when I transfer to PyCharm.


anotherthrowaway469

You can do both now, fyi. PyCharm has a nice ipynb editor with a debugger.


NostraDavid

How many of those were created by noob students who got taught NumPy, pandas and Matplotlib? Because I know *I* did (even though VSCode didn't have much native support and GitHub had *no* support for Jupyter back then).


[deleted]

I hope to see something more stable than TF/Keras prevail in the future. Also, there's still a lot of Python 2, more than I expected.


eebmagic

I noticed that too. Anyone know why there'd be an increase in Python 2 since 2019? Are there big libraries that are still dependent on Python 2 or something?


alcalde

Not big libraries, just small programmers. There are programmers still developing code with Delphi 7 from 2001 too. Some people just haven't let Python 2 go yet and perhaps never will.


[deleted]

Idk, that's weird, especially since Python 2 has reached EOL.


CaptainP

Why *is* Jupyter Notebook so ubiquitous for Python but not for any other language?


yubijam

It may be that Jupyter evolved around Python (e.g. `pip install jupyter`). There's a large number of languages available through the notebook/hub, though.


1337-1911

What are Jupyter notebooks?


wineandconfetti

from jupyter.org:

>The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

Basically, it's like a Google Doc that allows you to insert and execute code, to put it simply. All you need is an application that allows you to run and interpret Jupyter Notebook code; everything else is just inside it.


jwink3101

> Basically it's like a Google Doc

They are both web-based, which is absolutely a distinguishing feature of Google Docs, so this is true. But another standout Google Docs feature is collaborative editing, and to my knowledge, Jupyter notebooks don't have that. I'm sure you know this, but I wanted to add this note for others who may hear "Google Docs" and think "collaboration". And if they do have this feature, that's news to me! But awesome!


HeyItsRaFromNZ

> To my knowledge, Jupyter Notebooks don't have [inbuilt collaboration]

You're exactly right, for vanilla notebooks. This is why Jupyter developed [JupyterHub](https://jupyter.org/hub). There are also [Google Colab](https://colab.research.google.com/notebooks/) and [Zeppelin notebooks](https://zeppelin.apache.org/). The former is obviously a Google project, and hosted on the cloud, while the latter doesn't have to be, but has built-in user management, security etc.


ChemEngandTripHop

JupyterHub isn't like Google Docs, it's more like shared storage that comes with an environment. For collaborative work there's companies like [deepnote](https://deepnote.com/) who let you edit the same notebooks at the same time.


L0ngp1nk

Basically, a file that contains interactive Python code that you can execute one cell at a time. Really handy for doing data science work. If you run VSCode you can create and run them there easily.


boredinclass1

On top of what others have said, I've found them extremely valuable for marrying code examples with high-quality linked documentation. You can use HTML linking and markdown in cells to explain what you will be doing in the following code cells. It has helped move my company forward with people who need our products but aren't particularly sophisticated programmers.


HeyItsRaFromNZ

They're such a great tool for delivering to non-technical stakeholders (e.g. C-suite). You can easily output a proper HTML file or (a little less easily) a PDF report, so people don't need Jupyter to read the result.

I like them for teaching, as the students have the exact same material as I do. I can then encourage people to code along, and address/clarify things directly inline. So much better than a PowerPoint presentation!
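The HTML export can be done from the command line (`jupyter nbconvert --to html`) or from Python; a minimal sketch with a made-up filename:

```python
from nbconvert import HTMLExporter

# Render the executed notebook to a standalone HTML file.
exporter = HTMLExporter()
body, _resources = exporter.from_filename("quarterly_report.ipynb")

with open("quarterly_report.html", "w", encoding="utf-8") as f:
    f.write(body)
```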


boredinclass1

100%, this man gets it. We've made it easier for people to integrate our technology into their embedded systems because we open-source examples that are extremely readable (thanks to Python and Jupyter). It's a beautiful thing.


HeyItsRaFromNZ

> It's a beautiful thing

For sure. There's a good reason why any data science platform/service needs a similar notebook interface to gain traction among data scientists: they're just so useful. Azure (MS), Databricks, SageMaker (AWS) and Google Colab are each basically modifications of Jupyter notebooks such that they run on the respective host. Each feels very familiar---great for user buy-in!


NostraDavid

Here is an image: https://www.dataschool.io/content/images/2019/03/binder-50.png The `In [n]:` blocks are just HTML TextAreas where you can insert (usually Python) code. When you press `Ctrl+Enter` (IIRC - it's been 3 years since I used it) it *runs* that block of code. You can of course also run all blocks of code - in order - via the main menu. `Out[n]:` is the output *for that block*. It can be text and tables (as it is in this case), but also diagrams generated in pyplot. **Vscode has native support for Jupyter notebooks - press `ctrl+shift+p` and type "Create new blank Jupyter Notebook".** *Though you still need to install Python itself* Just slap `print("Hello World!")` in the box and press the play button! If you don't know what to do with it, check out https://www.dataschool.io/ and their [YouTube channel](https://www.youtube.com/user/dataschool) - it was the main source for me learning about Jupyter (better than what School gave me, lmao)


[deleted]

What are notebooks?


m4dm4d

From notebooks.org: > We are a wood-based portable non-extendible physical tool.


IAMSTILLHERE2020

What is a Jupyter?


HeyItsRaFromNZ

Jupyter is a portmanteau of **Ju**lia, **Py**thon and **R**. The idea of the project was to have a unified interface that could run multiple different languages. Jupyter itself is a self-contained web server, which provides self-contained instances of whatever language you're running (a 'kernel'; you can even run C++ kernels!) via a browser interface. You can then run commands in the browser, which get sent to the kernel, and the response is then displayed. It makes data-centric workflows a pleasure to work with, because they need this level of interactivity. Jupyter is built on the interactive Python environment (IPython). They used to be called 'IPython Notebooks', and you can see the vestiges of that, as the extension for a notebook is .ipynb


IAMSTILLHERE2020

Very informative.


[deleted]

[deleted]


IAMSTILLHERE2020

LOL.


HonestCanadian2016

I'd like to know the state of IDEs of late. I have used Jupyter Notebook almost exclusively the last few months, due to my laptop not working well with PyCharm, which is a far more robust IDE as far as I can tell. I've also worked with Atom and Spyder. My personal preference has always been Jupyter Notebook, even though it wasn't what I started with. I tend to think that the type of application one is using the IDE for dictates what they start with, and what they may eventually choose. So, program developers may find PyCharm Professional far more useful for their needs than a data scientist would, who might find Jupyter Notebook more relatable.


james_pic

I think that whilst there are problems that can be tackled with either an IDE or a notebook, they mostly solve non-overlapping problems. If you've got a problem that wants code that runs standalone (a web service, a GUI app, a game, a console application), a notebook is not going to be an elegant solution. And standalone code isn't usually going to be a clean solution for data analysis, although I've seen it just about work on dev-heavy teams for whom notebooks are too magicky. I'm not *just* saying this because part of my day job is developing against a system that runs notebooks programmatically, which is an absolute nightmare, but I do definitely advise against doing that.


Hi_Macri

print("hello world")


[deleted]

I likely contributed some to this study hah


ExHax

Jupyter notebooks are similar to the MATLAB command line.


DrSheldon_Lee_Cooper

Why is the preview image style of the link to this article so similar to what Yandex uses in Yandex.Practicum?


Comfortable_Yak_9776

Esri has it built into their software.