
RE: Citizen Science Entry 2

in STEMGeeks, 2 years ago

Thanks a lot for this second report, and congratulations on your hard work on this exercise. How long did it take you? I assume you spent a significant amount of time with the Docker environment, didn't you?

This time I have plenty of things to comment on!

Before starting MadGraph, I picked apart the acronyms in the name of the package: 'MadGraph5_aMC@NLO'. Specifically, I wanted to know what the 'aMC' and 'NLO' terms meant. After some searching, here's what I concluded:
aMC: adjoint Monte Carlo.
NLO: Next-to-Leading-Order.

The name comes from the merging of the codes MadGraph5 (Mad refers to Madison in the US, and Graph to Feynman diagrams or graphs) and MC@NLO (a Monte Carlo event generator achieving predictions at next-to-leading-order accuracy in the strong coupling; more information on this is provided in the 6th episode and the upcoming 7th episode). The extra 'a' in the name refers to automation (MadGraph5 was an automated package for predictions at leading-order accuracy; MC@NLO was not automated).

By automation, I mean that it is sufficient to specify the process of interest and the physics model, and the code does the rest.

Note the presence of '#' at the end of the MG5 commands above. Through trial and error, I found that '#' indicates a comment in MadGraph's REPL. For example, generate p p > t t~ # Collider process is a valid command.

That’s right. This follows standard Python conventions: anything to the right of the hash is ignored.
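
To make the convention concrete, here is a minimal Python sketch of how a command line could be cleaned before being interpreted. This is only an illustration under my own assumptions: `strip_comment` is a hypothetical helper, not MadGraph's actual parser, and it naively treats every '#' as the start of a comment.

```python
def strip_comment(line: str) -> str:
    """Drop everything from the first '#' onward, then trim surrounding whitespace.

    Hypothetical helper for illustration only; it does not handle a '#'
    that appears inside quotes.
    """
    return line.split("#", 1)[0].strip()


if __name__ == "__main__":
    # The command quoted above, with its trailing comment removed.
    print(strip_comment("generate p p > t t~ # Collider process"))
```

With this sketch, the command and the same command without its trailing comment reduce to the same string.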

An aside on filesystems
Initially I ran the simulation on a shared filesystem (/host/c). That was slow. So I tried both a container-local filesystem and a filesystem mounted in RAM (ramfs). For comparison, I measured the elapsed time of the 'launch' command (minus menu interactions).

That’s interesting. I don’t use Docker (I run everything locally), so I am unable to really comment on this. While RAM filesystems seem better, I assume they are limited in storage space, aren’t they? In this case, this may be a weakness.

By the way, why don't you run everything locally (possibly in a virtual environment)?

Since I'm running MadGraph in a container, there is no interactive display session. It's (nearly) headless. So I'm forced to copy graphical results (like images) to the host machine before viewing. That does incur some overhead.

So you don’t have access to an interactive terminal? That’s definitely an overhead, as this makes you unable to read error messages and capture them live (if relevant). So you use it more like a cluster on which you would submit a job and recover the output files after they are transferred locally, don’t you? This makes life complicated for testing purposes...

Once again, congratulations on completing this episode!

Cheers!


Thanks!

How long did it take you? I assume you spent a significant amount of time with the Docker environment, didn't you?

Most of the container setup time was spent in Episode 1. Now that I have a Dockerfile that specifies how to build the container, I can spin up new instances of the development environment for MadGraph5 quickly. This makes experimenting and iterating with systems-level changes quicker (like playing with filesystems).

The initial procedure of entering commands and checking the results took me a couple hours. Writing the post took me about half a day.

By automation, I mean that it is sufficient to specify the process of interest and the physics model, and the code does the rest.

Thanks for the clarifications. :)

While RAM filesystems seem better, I assume they are limited in storage space, aren’t they? In this case, this may be a weakness.

Exactly, very limited space. Another downside of ramfs is that you need to preallocate the size of the filesystem. And if the system crashes, you lose all your data. If you have enough memory, RAM filesystems are pretty good for processes that generate a lot of intermediate artifacts (like compiling large programs). But otherwise the downsides outweigh the benefits.
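
For anyone who wants to try a similar comparison on a workload with many intermediate artifacts, here is a rough Python sketch that times writing small files into a given directory. It is not the measurement I did for the 'launch' command: the function name `time_small_writes`, the file counts, and the sizes are all hypothetical choices, and because writes may be buffered by the operating system the numbers are only indicative.

```python
import os
import tempfile
import time


def time_small_writes(directory: str, n_files: int = 200, size: int = 4096) -> float:
    """Return the seconds taken to write, then delete, n_files files of `size` bytes.

    The files go into a temporary subdirectory of `directory`, which is
    removed again before the timer stops.
    """
    payload = os.urandom(size)
    start = time.perf_counter()
    with tempfile.TemporaryDirectory(dir=directory) as workdir:
        for i in range(n_files):
            path = os.path.join(workdir, f"artifact_{i}.tmp")
            with open(path, "wb") as handle:
                handle.write(payload)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: replace this tuple with the mount points you want to compare,
    # for example a shared directory and a RAM-backed one.
    for candidate in (tempfile.gettempdir(),):
        print(f"{candidate}: {time_small_writes(candidate):.3f} s")
```

Running it against each mount point in turn gives a crude side-by-side number without involving MadGraph at all.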

So you don’t have access to an interactive terminal? That’s definitely an overhead, as this makes you unable to read error messages and capture them live (if relevant). So you use it more like a cluster on which you would submit a job and recover the output files after they are transferred locally, don’t you? This makes life complicated for testing purposes...

Good question. In this case, I don't have access to a graphical display like the X Window System or Wayland. So no apps like Firefox or GIMP. But I do have access to an interactive terminal through SSH. And the container runs locally on my computer. So, thankfully, I can see the MadGraph5 process execute in real time and enter commands as you would normally at a Linux terminal.

Thanks for the clarifications and the opportunity to embark on this fun adventure. Looking forward to the next episode!

The initial procedure of entering commands and checking the results took me a couple hours. Writing the post took me about half a day.

I can easily imagine that it is also the case for the other participants. Writing the reports always takes more time than the exercises. However, I didn't include that when I mentioned that each episode should take a few hours... I actually didn't even think about it. Baah....

Exactly, very limited space. Another downside of ramfs is that you need to preallocate the size of the filesystem. And if the system crashes, you lose all your data. If you have enough memory, RAM filesystems are pretty good for processes that generate a lot of intermediate artifacts (like compiling large programs). But otherwise the downsides outweigh the benefits.

This is what I thought about RAM filesystems. Since some later exercises (in a few episodes that I have not written yet) will require simulating collisions and storing millions of events (which leads to multi-GB intermediate files), I am not sure that this will work, unless of course the machine is powerful enough. I nevertheless do not know whether it is worth testing.

Good question. In this case, I don't have access to a graphical display like the X Window System or Wayland. So no apps like Firefox or GIMP. But I do have access to an interactive terminal through SSH. And the container runs locally on my computer. So, thankfully, I can see the MadGraph5 process execute in real time and enter commands as you would normally at a Linux terminal.

Then it is perfect. You can probably access the HTML output via a text-based browser like links.

Cheers, and thanks again for this report!