beegfs goes opensource!


beegfs goes opensource!

James-2
Long awaited.


This smoking hot (many HPC scientists agree) distributed file
system will surely rock the cluster, container and High Performance
Computing worlds. [1] Now if I were only smart enough to get this
puppy into portage.......



enjoy!
James



[1] http://www.beegfs.com/content/news/



Re: beegfs goes opensource!

tanstaafl-2
On 2/25/2016 5:03 PM, James <[hidden email]> wrote:
> Long awaited.
>
> This smoking hot (many HPC scientists agree) distributed file
> system will surely rock the cluster, container and High Performance
> Computing worlds. [1] Now if I were only smart enough to get this
> puppy into portage.......

Ummmm... nothing about what license it is released under, and they want
personal info from you to download the source...

I'm not sure this is anything to jump up and down about yet...

Is this going to be another ZFS problem, where it is open source, but
linux can't make the best use of it?


Re: beegfs goes opensource!

James-2
Tanstaafl <tanstaafl <at> libertytrek.org> writes:


> Ummmm... nothing about what license it is released under, and they want
> personal info from you to download the source...

> I'm not sure this is anything to jump up and down about yet...

Agreed. Bummer. Sometimes it takes time for the folks who put up the money
for initial development to decide to do the right thing on licensing. With
file system choices so abundant, open source gets you a community involved
with patches and bug filings, so there is hope? [A] Maybe one of our
(council) leaders should drop Sven Breuner an email and ask if there is an
appropriately acceptable license for the gentoo community to use this
cluster file system routinely on gentoo.....


> Is this going to be another ZFS problem, where it is open source, but
> linux can't make the best use of it?


Excellent point about the license.  Did the license stop zfs folks
from enjoying zfs?  I know the zfs license stops some commercial folks
from deploying/using zfs. And zfs is not a routine choice in the installation
docs for gentoo.....


What I do know is that about 75% of the folks I have exchanged pleasantries
with who run clusters for High Performance Computing extol the virtues of
beegfs. Most already pay to use it, but I do not know their financial
models going forward. Hopefully they'll be like postgresql and sell/develop
for the commercial folks and let the po(linux) folk ride for free. My
biggest bottleneck in bringing apache-mesos to gentoo is choosing the node
(local) file system and the distributed file system that together give the
right mix of features and speed. Surely an ext4/beegfs or btrfs/beegfs
combination is attractive no matter what container or HPC codes you run on
top of your gentoo cluster(s).


Furthermore, Cephfs is being used to replace NFS in some
locations, so there is now growing competitive pressure among
open source distributed (cluster) file systems.



James

[A] http://www.beegfs.com/content/about-us/





Re: beegfs goes opensource!

»Q«
In reply to this post by tanstaafl-2
On Fri, 26 Feb 2016 10:47:12 -0500
Tanstaafl <[hidden email]> wrote:

> On 2/25/2016 5:03 PM, James <[hidden email]> wrote:
> > Long awaited.
> >
> > This smoking hot (many HPC scientists agree) distributed file
> > system will surely rock the cluster, container and High Performance
> > Computing worlds. [1] Now if I were only smart enough to get this
> > puppy into portage.......  
>
> Ummmm... nothing about what license it is released under, and they
> want personal info from you to download the source...
>
> I'm not sure this is anything to jump up and down about yet...
>
> Is this going to be another ZFS problem, where it is open source, but
> linux can't make the best use of it?

I dunno anything about it, but this comment says it's only source
available, not open source.
<http://insidehpc.com/2016/02/beegfs-parallel-file-system-now-open-source/#comment-113400>



Re: beegfs goes opensource!

tanstaafl-2
In reply to this post by James-2
On 2/26/2016 12:04 PM, James <[hidden email]> wrote:
> Excellent point about the license.  Did the license stop zfs folks
> from enjoying zfs?  I know the zfs license stops some commercial folks
> from deploying/using zfs. And zfs is not a routine choice in the installation
> docs for gentoo.....

I recall a list conversation about this, explaining that it would be
trivial for someone who knows how to do ebuilds, to have their own
ZFS-in-kernel system available, and that it would also be possible to
accomplish this via an overlay...

I wish someone would do it. I'd love to play with ZFS, but I don't have
the skill or time to figure out the pieces...


Re: beegfs goes opensource!

Rich Freeman
On Fri, Feb 26, 2016 at 12:48 PM, Tanstaafl <[hidden email]> wrote:

> On 2/26/2016 12:04 PM, James <[hidden email]> wrote:
>> Excellent point about the license.  Did the license stop zfs folks
>> from enjoying zfs?  I know the zfs license stops some commercial folks
>> from deploying/using zfs. And zfs is not a routine choice in the installation
>> docs for gentoo.....
>
> I recall a list conversation about this, explaining that it would be
> trivial for someone who knows how to do ebuilds, to have their own
> ZFS-in-kernel system available, and that it would also be possible to
> accomplish this via an overlay...

sys-fs/zfs-kmod

--
Rich


Re: beegfs goes opensource!

Andreas K. Huettel
In reply to this post by tanstaafl-2
On Friday, 26 February 2016, 16:47:12, Tanstaafl wrote:
> On 2/25/2016 5:03 PM, James <[hidden email]> wrote:
> > Long awaited.
> >
> > This smoking hot (many HPC scientists agree) distributed file
> > system will surely rock the cluster, container and High Performance
> > Computing worlds. [1] Now if I were only smart enough to get this
> > puppy into portage.......
>
> Ummmm... nothing about what license it is released under

Just read the web page, it's there.

"The BeeGFS client module is licensed under the GPLv2. All other BeeGFS
components are licensed under the BeeGFS EULA."

--
Andreas K. Hüttel
Gentoo Linux developer (council, perl, libreoffice)
[hidden email]
http://www.akhuettel.de/


Re: beegfs goes opensource!

Neil Bothwick
In reply to this post by tanstaafl-2
On Fri, 26 Feb 2016 12:48:21 -0500, Tanstaafl wrote:

> > Excellent point about the license.  Did the license stop zfs folks
> > from enjoying zfs?  I know the zfs license stops some commercial folks
> > from deploying/using zfs. And zfs is not a routine choice in the
> > installation docs for gentoo.....  
>
> I recall a list conversation about this, explaining that it would be
> trivial for someone who knows how to do ebuilds, to have their own
> ZFS-in-kernel system available, and that it would also be possible to
> accomplish this via an overlay...

When I was playing with ZFS I was able to build it into the kernel (by
unmasking a USE flag) so I could boot from it without an initramfs.

IMO the main problem with ZFS on Linux is that it is based on a fairly
old version. Oracle have not released the sources for the recent
versions, so useful stuff like encryption is missing, and always will be.


--
Neil Bothwick

 ... We are Dyslexics of Borg. Your ass will be laminated.


Re: beegfs goes opensource!

James-2
In reply to this post by Andreas K. Huettel
Andreas K. Hüttel <dilfridge <at> gentoo.org> writes:

> > Ummmm... nothing about what license it is released under

> Just read the web page, it's there.

> "The BeeGFS client module is licensed under the GPLv2. All other BeeGFS
> components are licensed under the BeeGFS EULA."


So how does that work on a distributed file system? Is the client-server
model that restrictive for non-commercial use? The "Introduction to
BeeGFS by ThinkParQ" states:

"The BeeGFS file system comes without licence fee: It is a “free to use”
product for end users – so whoever wants to try it for his own use, can
download it from www.beegfs.com and use it. The client is published under
GPL, and the server is covered by the Fraunhofer EULA."


But in the EULA, section 3.4 is a killer. Oh well, maybe in time
they'll sell enough support to put all the "enterprise features"
under a GPL-style license. If not, I doubt their code will stay competitive.
The fact that they are making the sources available and allowing
modifications, for internal use only, might suggest a more realistic
pathway forward. I'll bet that as soon as another open source, GPL-licensed
file system becomes competitive, they will fully GPL the code (at least
this is my hope).

Bummer, but thanks for pointing out the license restrictions.

James


Re: beegfs goes opensource!

tanstaafl-2
In reply to this post by Rich Freeman
On 2/26/2016 1:14 PM, Rich Freeman <[hidden email]> wrote:
> On Fri, Feb 26, 2016 at 12:48 PM, Tanstaafl <[hidden email]> wrote:
>> I recall a list conversation about this, explaining that it would be
>> trivial for someone who knows how to do ebuilds, to have their own
>> ZFS-in-kernel system available, and that it would also be possible to
>> accomplish this via an overlay...

> sys-fs/zfs-kmod

I would be using this on a server, so, for security reasons, no module
support.


Re: beegfs goes opensource!

Neil Bothwick
On Sat, 27 Feb 2016 22:51:13 -0500, Tanstaafl wrote:

> >> I recall a list conversation about this, explaining that it would be
> >> trivial for someone who knows how to do ebuilds, to have their own
> >> ZFS-in-kernel system available, and that it would also be possible to
> >> accomplish this via an overlay...  
>
> > sys-fs/zfs-kmod  
>
> I would be using this on a server, so, for security reasons, no module
> support.

echo sys-fs/zfs kernel-builtin >>/etc/portage/package.use

You need to unmask the kernel-builtin USE flag.
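For anyone following along, here is a minimal sketch of both steps as a shell function. It assumes Portage's standard config layout, where `package.use` may be either a single file or a directory, and where a `-flag` entry in the user profile's `/etc/portage/profile/package.use.mask` lifts a mask set by the system profile (if the flag is stable-masked instead, the entry may belong in `package.use.stable.mask`). The function name `enable_zfs_builtin` and the `zfs` file name inside a `package.use/` directory are hypothetical. It appends with `>>`, because `>` would truncate an existing `package.use` file.

```shell
#!/bin/sh
# Sketch: enable and unmask the kernel-builtin USE flag for sys-fs/zfs.
# $1 is the config root: "/" on a live system, or a scratch directory
# for a trial run. enable_zfs_builtin is a hypothetical helper name.
enable_zfs_builtin() {
    portage_dir="${1%/}/etc/portage"
    mkdir -p "$portage_dir/profile"

    # Append (>>) rather than overwrite (>): '>' would truncate an existing
    # package.use file and drop every other entry in it.
    if [ -d "$portage_dir/package.use" ]; then
        echo "sys-fs/zfs kernel-builtin" >> "$portage_dir/package.use/zfs"
    else
        echo "sys-fs/zfs kernel-builtin" >> "$portage_dir/package.use"
    fi

    # In the user profile, a leading '-' removes a mask that the system
    # profile places on this package's USE flag.
    echo "sys-fs/zfs -kernel-builtin" >> "$portage_dir/profile/package.use.mask"
}
```

Try it against a scratch directory first (for example `enable_zfs_builtin /tmp/portage-test`), then run it as root with `/`.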


--
Neil Bothwick

Why is bra singular and pants plural?


Re: beegfs goes opensource!

Andrew Savchenko
In reply to this post by James-2
Hi all,

On Thu, 25 Feb 2016 22:03:59 +0000 (UTC) James wrote:
> This smoking hot (many HPC scientists agree) distributed file
> system will surely rock the cluster, container and High Performance
> Computing worlds. [1] Now if I were only smart enough to get this
> puppy into portage.......

By the way, does anyone have any real performance comparison with
Lustre?

While it is good to have another solution available, I don't see
any real benefits of FhgFS/BeeGFS compared to Lustre these days.
At the time FhgFS was created, Lustre was indeed unable to
use multiple metadata servers, so this was a bottleneck. But now
Lustre also supports distributed metadata, so they should be on par in
this matter.

On the other hand, Lustre has a much larger community (e.g. see the
TOP500 list) and is much better tested (and even under such
conditions it has problems in some corner cases). Thus I see no
advantage in FhgFS for HPC setups.

Of course the world of parallel distributed file systems is very
diverse, so different file systems are best suited to different
tasks/workloads, but for typical IB-based HPC storage I see
no better solution than Lustre at this moment.

Best regards,
Andrew Savchenko


Re: beegfs goes opensource!

tanstaafl-2
In reply to this post by Neil Bothwick
On 2/28/2016 4:24 AM, Neil Bothwick <[hidden email]> wrote:
> On Sat, 27 Feb 2016 22:51:13 -0500, Tanstaafl wrote:
>>>> I recall a list conversation about this, explaining that it would be
>>>> trivial for someone who knows how to do ebuilds, to have their own
>>>> ZFS-in-kernel system available, and that it would also be possible to
>>>> accomplish this via an overlay...  

>>> sys-fs/zfs-kmod  

>> I would be using this on a server, so, for security reasons, no module
>> support.
>
> echo sys-fs/zfs kernel-builtin >>/etc/portage/package.use
>
> You need to unmask the kernel-builtin USE flag.

Wow...! How long has that been available?

I also recall something about being able to use the latest/greatest too,
but the overlay would have to pull the sources from Oracle...?

So I would appreciate comments on what version of ZFS this would pull in,
limitations, caveats, dangers, etc...

Also, it has been a while since I read anything - what is the current
state of BTRFS vs ZFS? Is BTRFS stable/mature enough to use for production?
What can ZFS do that BTRFS cannot?

Thanks Neil!


Re: beegfs goes opensource!

Neil Bothwick
On Sun, 28 Feb 2016 08:34:56 -0500, Tanstaafl wrote:

> >> I would be using this on a server, so, for security reasons, no
> >> module support.  
> >
> > echo sys-fs/zfs kernel-builtin >>/etc/portage/package.use
> >
> > You need to unmask the kernel-builtin USE flag.  
>
> Wow...! How long has that been available?

I last used it a couple of years ago, before I switched to btrfs.
 
> I also recall something about being able to use the latest/greatest too,
> but the overlay would have to pull the sources from Oracle...?

AFAIK the sources for more recent versions of ZFS were never released.
 

--
Neil Bothwick

Beware! The end is... <aaarrgh!>


Re: beegfs goes opensource!

Rich Freeman
In reply to this post by tanstaafl-2
On Sun, Feb 28, 2016 at 8:34 AM, Tanstaafl <[hidden email]> wrote:
>
> Also, it has been a while since I read anything - what is the current
> state of BTRFS vs ZFS? Is BTRFS stable/mature enough to use for production?
> What can ZFS do that BTRFS cannot?
>

This is obviously a topic people will have various opinions on.

I'd say that zfs is more stable, but long-term less likely to be the
mainstream linux solution.

At the current time I'd say that btrfs single-disk or in raid1 is
mature enough to be usable, with a bunch of caveats.  It definitely
isn't up there with the likes of ext4.  Besides some things just not
being handled gracefully, like low disk space, it is fairly prone to
regressions.  I've gotten the best experience by sticking
with longterm kernels and not updating to the next one until it
reaches maybe x.x.16 or so (maybe 6 months after release), and I always
check the lists before updating.
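The "wait for x.x.16 or so" rule of thumb is easy to script as a pre-update check. A minimal sketch in POSIX shell: the function name `kernel_mature_enough` is hypothetical, the threshold of 16 is only the rough figure from this post, and it does not check whether a series is actually a longterm one (that still means looking at kernel.org).

```shell
#!/bin/sh
# Sketch of the "wait until x.x.16 or so" rule of thumb for btrfs hosts.
# kernel_mature_enough is a hypothetical helper; 16 is only the rough
# threshold from the post. Succeeds (exit 0) if the patch level is high enough.
kernel_mature_enough() {
    release="${1:-$(uname -r)}"   # e.g. "4.4.3-gentoo"
    min_patch="${2:-16}"

    base="${release%%-*}"         # drop local suffix: "4.4.3-gentoo" -> "4.4.3"
    rest="${base#*.}"             # drop major version: "4.4.3" -> "4.3"
    case "$rest" in
        *.*) patch="${rest#*.}" ;;  # "4.3" -> "3"
        *)   patch=0 ;;             # "4.5" or "4.5-rc2": no patch level yet
    esac
    patch="${patch%%.*}"          # ignore any further dot-separated fields

    [ "$patch" -ge "$min_patch" ]
}
```

For example, `kernel_mature_enough 4.4.16-gentoo && echo "ok to try"`; with no argument it checks the running kernel via `uname -r`.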

As far as features go, zfs tends to have more enterprise-oriented
features and btrfs tends to have more small-system-oriented features.
For example, in zfs with 100 drives you can arrange them into 10 neat
raid-6s with 10 drives each or something along those lines, and have a
few SSDs servicing the entire array as read and write caches.  On the
other hand, if you have a 4-drive raid5 and want to turn it into a
5-drive raid5, this is impossible to do in zfs without copying all the
data off the drives, but trivial to do in btrfs.  When you have 100
drives, adding or removing 10 at a time probably isn't a big deal, but
when you have 4 drives, having to first add 5 drives and then remove the
previous 4 seems almost comical.  It just has to do with the original
goals of the two filesystems.
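To make the two layouts concrete, here is a dry-run sketch that only prints the commands. It assumes raidz2 as the ZFS counterpart of RAID-6, and the pool name, mount point and device names (`tank`, `/mnt/pool`, `disk0`..., `ssd0`..., `/dev/sde`) are hypothetical placeholders. Strictly speaking, a ZFS `cache` device (L2ARC) is a read cache, while a `log` device (SLOG) only absorbs synchronous writes.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) commands for the two layouts in the
# post. Pool, mount point and device names are hypothetical placeholders.

# $1 pool name, $2 number of raidz2 vdevs, $3 drives per vdev.
# raidz2 (double parity) is the ZFS counterpart of RAID-6. The "cache" SSD
# becomes an L2ARC read cache; the mirrored "log" SSDs become a separate
# intent log (SLOG) for synchronous writes.
zfs_layout_cmd() {
    cmd="zpool create $1"
    n=0
    g=0
    while [ "$g" -lt "$2" ]; do
        cmd="$cmd raidz2"
        i=0
        while [ "$i" -lt "$3" ]; do
            cmd="$cmd disk$n"
            n=$((n + 1))
            i=$((i + 1))
        done
        g=$((g + 1))
    done
    echo "$cmd cache ssd0 log mirror ssd1 ssd2"
}

# $1 mount point, $2 new device. Adding a fifth drive to a 4-drive btrfs
# raid5 happens in place: add the device, then run a full balance so
# existing data is restriped across all five drives.
btrfs_grow_cmds() {
    echo "btrfs device add $2 $1"
    echo "btrfs balance start $1"
}

zfs_layout_cmd tank 10 10
btrfs_grow_cmds /mnt/pool /dev/sde
```

The btrfs side is two commands regardless of array size, which is the asymmetry the post describes; a full balance rewrites every chunk, so expect it to take a long time on a full array.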

Oh, I've heard that zfs is RAM-hungry, and many of its advocates
suggest it should only be used with ECC RAM.  Honestly, I've never
seen an argument for ECC on zfs that wouldn't apply equally to any
other filesystem, but whatever.  I haven't found btrfs to be terribly
RAM-hungry, but I have found that it doesn't seem to do a good job
with IO scheduling classes (though I have no idea how zfs does on this
front).  Btrfs at least seems to accept too many writes into its queue,
and then everything backs up while it gets them all out to disk, so
processes that should be low-priority end up blocking writes even from
processes marked as realtime.  I suspect that this is just the
whole bufferbloat phenomenon in another context.

I'm not really sure what the "conservative" recommendation is.  Ext4 (or
even ext3) is the obvious one, but both zfs and btrfs have
checksumming of all data written to disk which is a huge data security
improvement.  That is a compelling feature that should give even
conservative sysadmins pause before just rejecting them, unless
they're mitigating silent corruptions in some other way.
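As one example of "mitigating silent corruptions in some other way" on a filesystem without data checksums such as ext4, here is a minimal sketch: record a SHA-256 manifest with GNU coreutils `sha256sum` and re-verify it periodically. The function names are hypothetical. Unlike a zfs or btrfs scrub, this only detects damage (repair still needs a backup), and it also flags intentional edits, so it suits archival data best.

```shell
#!/bin/sh
# Sketch: detect silent corruption on a filesystem without data checksums
# (e.g. ext4) by keeping a SHA-256 manifest and re-verifying it later.
# Needs GNU coreutils sha256sum. Keep the manifest outside the tree being
# checked, or it would end up listing itself.

# Record a checksum for every regular file under $1 into manifest $2.
make_manifest() {
    (cd "$1" && find . -type f -exec sha256sum {} +) > "$2"
}

# Re-check the files under $1 against manifest $2. Prints only mismatches
# and returns non-zero if any file changed or went missing.
verify_manifest() {
    (cd "$1" && sha256sum --quiet -c -) < "$2"
}
```

For instance, run `make_manifest /srv/archive /root/archive.sha256` once and `verify_manifest /srv/archive /root/archive.sha256` from cron (both paths hypothetical).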

--
Rich


Re: beegfs goes opensource!

tanstaafl-2
On 2/28/2016 9:09 AM, Rich Freeman <[hidden email]> wrote:
> I'm not really sure what the "conservative" recommendation is.  Ext4 (or
> even ext3) is the obvious one, but both zfs and btrfs have
> checksumming of all data written to disk which is a huge data security
> improvement.  That is a compelling feature that should give even
> conservative sysadmins pause before just rejecting them, unless
> they're mitigating silent corruptions in some other way.

This is precisely why I'm interested in it...

Thanks for the comments...


Re: beegfs goes opensource!

James-2
In reply to this post by Andrew Savchenko
Andrew Savchenko <bircoph <at> gentoo.org> writes:


> While it is good to have another solution available, I don't see
> any real benefits of FhgFS/BeeGFS compared to Lustre these days.
> At the time FhgFS was created, Lustre was indeed unable to
> use multiple metadata servers, so this was a bottleneck. But now
> Lustre also supports distributed metadata, so they should be on par in
> this matter.

Interesting thesis. I only have anecdotal information, from those
I've encountered who are willing to converse privately. Many more sites
exist than are publicized, as I think most (scientific) groups have a keen
interest in open source distributed processing.
I did notice the '9999' version of lustre in portage (science overlay), but
from reading elsewhere I was not sure it was still being actively developed.

> On the other hand, Lustre has a much larger community (e.g. see the
> TOP500 list) and is much better tested (and even under such
> conditions it has problems in some corner cases). Thus I see no
> advantage in FhgFS for HPC setups.

Strangely, the folks I have chatted up do not publish their test results,
as it would be quite a large undertaking to assure critics that the
tests are fair and equivalent, with the only difference being the
local and cluster file systems. Lustre seems to have a bad rap, but that
may be due to folks testing much earlier versions. I'm no authority on the
subject, just trying to ferret out pathways for robust cluster computing
on gentoo. Although containers are useful, my focus is on the
leanest/fastest bare-metal open source HPC approach to clusters on gentoo.


> Of course the world of parallel distributed file systems is very
> diverse, so different file systems are best suited to different
> tasks/workloads, but for typical IB-based HPC storage I see
> no better solution than Lustre at this moment.


YES. But these tests/benchmarks should also include Cephfs, gluster, and
tachyon, if not many others. [1]  Perhaps we should encourage some of our
gentoo-devs to put up a wiki for gentoo-HPC, with at least a working
framework of suggested packages, including all the DFS tricks therein?
Me, I'm just stumbling my way around, trying to figure out a reasonable
pathway to HPC on gentoo.

I thought that systemd was going to dominate these cluster-container wars
until I started reading up on Docker's acquisition of the main dev of Alpine
Linux and Docker's rapid move to 'subsume' Alpine Linux as its distro
for releases [2]. Alpine leverages OpenRC and eudev, and Docker is
preparing for commercial battle with other container offerings, so this
does suggest that the performance battle over clusters is now openly
challenging the systemd proponents for performance bragging rights. Combined
with the question of the DFS, it does suggest that published tests comparing
these different approaches would be of keen interest to a wide audience.

The only test code I am aware of for HPC on gentoo is sys-cluster/hpl,
and I'm not sure how well that would exercise DFS performance.


> Best regards,
> Andrew Savchenko

James


[1] http://www.datanami.com/2016/02/23/meet-alluxio-the-distributed-file-system-formerly-known-as-tachyon/

[2] https://www.brianchristner.io/docker-is-moving-to-alpine-linux/