- an in-memory database technology?
- the name for SAP’s cloud platform?
- an on-premise DB to run SAP ERP (to replace Oracle)?
- a full-stack proprietary web development platform?
- a marketing term to solve all problems with SAP products?
Does anyone have any hands-on experience with HANA, beyond the usual marketing BS? Is it that revolutionary? If it runs on such specialized hardware, is the speed increase that impressive?
Ultimately I think the HANA database is an overhyped in-memory database. The reason they make money with HANA is that they are successfully replacing Oracle Databases for their SAP on-premise customers.
SAP HANA is an in-memory database. That’s all it is. It’s similar to Qlikview if you’ve used that. It’s licensed per socket, so scaling up is better than scaling out.
SAP will also host HANA for you, for a fee.
See. The thing is: SAP is a company for HR departments. The kinds of departments that aren’t technical and do not have technical staff to develop things for them. (It’s an example.) Your HR department talks to SAP, they deliver some service, sometimes something that the barely tech-literate members of HR are able to build on, and then suddenly your sysadmins have to support those things in perpetuity.
Well. They market themselves as this anyway. And I have experience of our HR department buying all kinds of stuff from SAP and their child companies. (Concur, for example).
So I understand your incredulity. This company is not made for us. And most of their tech isn’t either.
And finance. And operations. And compliance, and ...
> The kinds of departments that aren’t technical
Which, in most businesses, are almost all of them. That's why they are so entrenched and so profitable, despite selling an utterly unfashionable and over-complicated stack.
I suspect they also want information back from you, and most likely in an SAP-compatible version.
This was all told to me at our IT quarterly meeting, with our new SAP deployments, as IT does not actually input the data, just runs the servers.
Funny how we run the stuff, but sometimes don't even know how it's used.
ERP applications which use HANA typically do have significant random accesses in their workload. It is due to this reason that it was initially marketed as an in-memory database. The random access performance of RAM was necessary for high performance. If HANA was just being used in in-memory mode, the speed increase is probably not that impressive. But, if there is non-volatile storage involved, then yes the speed increase would be impressive.
You can get good performance out of it, no medals there as any IMDB would give you that.
My pain points were:
- it is sold as the virtualisation layer to rule them all, until you find crazy breakages. SAP's SQL dialect is case sensitive. That'll frustrate a lot of downstream databases that you deal with, or the users thereof.
- virtual warehouse. I'm not a Kimball expert, but our resident experts came to the same conclusion as I did, that you can't really build useful VIRTUAL data marts directly from disparate downstream data sources without persisting that somewhere. That then defeats the purpose, in my quasi-professional opinion.
- I spent over a month looking at the SDA/SDI to "Big Data" parts. Very immature, found interesting instances where push-down of predicates was inconsistent and poor. It's one thing to say you've got Apache Spark integration, but it's another thing when your system can't figure out that data comes from the same system, and thus you could do some expensive computations downstream instead of saturating the network pulling a lot of data. I did my work early this year, so I would expect improvement here. The problem though was that many of us started feeling like SAP are releasing a product that should spend more time in development before coming to us. I'm entitled to say so because they charge an arm and a leg.
- data modelling/development. SQL is Turing complete, but some things are still hard to do in it. Instead of giving us more capabilities when modelling our data, they're getting "distracted" by wanting to tick marketing boxes like R support, NodeJS support, etc. I didn't spend enough time on those to feel comfortable with producing a fair opinion, so I'll reserve one.
- SDLC, security. SAP has solid products there, and I learnt quite a lot around security integration. There were some quirks, but those were largely because of our requirements, not SAP. I enjoyed working on data access and security patterns.
- IDE. They initially used a modified NetBeans, which was also available as a plugin. I'm not a NB person, but this was solid. They decided to move to a web-based IDE. I don't know why, but they deprecated certain things while their web thingy was not even feature complete. It was a growing mess when you'd find that you can only do something in X but not Y.
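The predicate push-down complaint in the list above is easier to see with a toy model. This is only a sketch: `RemoteSource`, `scan` and the row layout are hypothetical names made up for illustration, not anything from SAP's SDA/SDI or Spark. It just counts how many rows cross the "network" when the filter runs at the source versus after pulling everything.

```python
from typing import Callable, Optional

Row = dict
Predicate = Callable[[Row], bool]


class RemoteSource:
    """Hypothetical stand-in for a downstream system; counts rows it ships out."""

    def __init__(self, rows: list) -> None:
        self.rows = list(rows)
        self.rows_transferred = 0

    def scan(self, predicate: Optional[Predicate] = None) -> list:
        # If a predicate is supplied it is evaluated here, at the source,
        # so only matching rows are "sent over the network".
        selected = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_transferred += len(selected)
        return selected


def query_without_pushdown(source: RemoteSource, predicate: Predicate) -> list:
    # Pull every row, then filter locally: the whole table is transferred.
    return [r for r in source.scan() if predicate(r)]


def query_with_pushdown(source: RemoteSource, predicate: Predicate) -> list:
    # Ship the predicate to the source: only matching rows are transferred.
    return source.scan(predicate)


def make_rows(n: int) -> list:
    # One row in ten belongs to the "EU" region in this made-up data set.
    return [{"id": i, "region": "EU" if i % 10 == 0 else "US"} for i in range(n)]


if __name__ == "__main__":
    is_eu = lambda row: row["region"] == "EU"
    for query in (query_without_pushdown, query_with_pushdown):
        src = RemoteSource(make_rows(1000))
        result = query(src, is_eu)
        print(query.__name__, len(result), "rows returned,",
              src.rows_transferred, "rows transferred")
```

Both queries return the same rows; the difference is purely in how much data moves, which is the cost being complained about when push-down is inconsistent.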
To recap, a high cost for performance that you could get from a competitor, or even a well baby-sat Apache Ignite cluster (for the features they support, at least they aren't lying in their marketing). I think of it as a lame caching layer, that's useful if you're already invested in the SAP ecosystem, think of SAP CRMs or even their warehousing as a stretch.
So, no; it's not revolutionary in my view. The "specialised hardware" is marketing bait; our IT (and I too) were convinced that one could build the same hardware as was specced. When I first heard of the special hardware thing, I thought of a custom Intel with features that only SAP would have, DDR4 RAM at crazy clock speed, etc. Only to find that it's a sly way of controlling how much they charge you. "Want to add moar RAMs? Pay us so you don't void warranty, and we'll charge you more of course for the extra RAM".
To conclude my rant, I hope you've found my view useful. I rolled off my SAP HANA (notice how little I've used this acronym, I despise it) project feeling like it was making everyone in the room look bad. People were forsaking their Oracle DBs and Hadoop clusters to make design compromises for an over-marketed database thingy.
EMC did something similar with Avamar... you’d buy spray-painted SuperMicro boxes running RHEL and pay for software as well. Come node end of life, you have to buy the software and hardware again, even after paying the 20-25% maintenance.
As I replied in a sub-thread, I think Intel's marketing diagram [1] is probably useful to help separate the Optane flavors. This is about the "near DRAM" variant.
While the blog post highlights running SAP HANA (SAP's in-memory focused database), you can use them for whatever you want. The persistent part is that it's persistent across reboots. The hope is that this might make it easier to have tiered database/caching systems, since the gap between DRAM and this new "memory" is much closer than say DRAM and SSD.
[1] https://newsroom.intel.com/wp-content/uploads/sites/11/2018/...
This article from Anandtech [1] covers the SSD variant, though, and states:
> The endurance rating for both capacities is 200 GB/day for the five-year warranty period. Given the small capacity of the drives, this works out to 1.7 or 3.4 drive writes per day, which is considerably higher than normal for consumer SSDs.
[1] https://www.anandtech.com/show/12512/the-intel-optane-ssd-80...
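For anyone wanting to check the drive-writes-per-day numbers in that quote, here is the arithmetic as a small sketch. The quote does not give the two capacities; 118 GB and 58 GB are my assumption, chosen because they reproduce the quoted 1.7 and 3.4 figures.

```python
DAILY_WRITES_GB = 200   # endurance rating from the quote
WARRANTY_YEARS = 5      # warranty period from the quote

# ASSUMPTION: capacities are not stated in the quote; these values are
# picked because they match the quoted 1.7 and 3.4 drive writes per day.
ASSUMED_CAPACITIES_GB = (118, 58)


def drive_writes_per_day(daily_writes_gb: float, capacity_gb: float) -> float:
    """How many times the whole drive can be overwritten per day."""
    return daily_writes_gb / capacity_gb


def total_writes_tb(daily_writes_gb: float, years: int) -> float:
    """Total rated writes over the warranty period (ignoring leap days)."""
    return daily_writes_gb * 365 * years / 1000


if __name__ == "__main__":
    for capacity in ASSUMED_CAPACITIES_GB:
        dwpd = drive_writes_per_day(DAILY_WRITES_GB, capacity)
        print(f"{capacity} GB drive: {dwpd:.1f} drive writes per day")
    total = total_writes_tb(DAILY_WRITES_GB, WARRANTY_YEARS)
    print(f"Rated writes over warranty: {total:.0f} TB")
```

The same daily budget applies to both sizes, which is why the smaller drive ends up with the higher drive-writes-per-day figure.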
So does that mean it can be considered persistent like a regular SSD/HDD?
They're a nightmare to program because OSes do not have a good abstraction for them (at least not yet). Accessing them through the file-system seems sub-optimal (this is byte-addressable memory and not a block device). Accessing them through virtual memory is also pretty bad because they're much slower than DRAM.
Both Windows and Linux implement DAX, which, as @the8472 explained, allows bypassing page cache in memory mapped I/O. Additionally, DAX optionally allows you to flush your data directly from user-space instead of calling msync.
And that's the gist of NVM programming model [0], its entire point is to allow applications to avoid the now hugely excessive abstraction layer of traditional storage.
And I will freely admit that programming to raw memory mapped files can be difficult, but there is ongoing work on making it easier. An example of that is, excuse the shameless plug, Persistent Memory Development Kit [1], which makes writing new software for this new type of memory much simpler.
Performance of an NVDIMM is obviously hardware dependent, but the now widely accepted programming model works with the assumption that persistent memory is fast enough so that it is reasonable to stall a CPU while an instruction is accessing it. I'm not sure what hardware evaluations you are basing your claims on, but let me assure you that the HW solution being described in the blog post does not violate that assumption.
[0] - https://www.snia.org/tech_activities/standards/curr_standard...
[1] - http://pmem.io/
With DAX[0], Linux already has the ability to put a filesystem (currently ext4 and xfs) on NVDIMMs and then let userspace address them through mmap while skipping the page cache indirection. I.e. you're directly byte-addressing them through the memory controller via standard memory-mapped file abstractions. Direct block device mapping of NVDIMMs without a filesystem is also possible.
[0] https://www.kernel.org/doc/Documentation/filesystems/dax.txt
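To make the "standard memory-mapped file abstractions" point concrete, here is a minimal sketch using Python's stdlib `mmap` module. It runs against an ordinary temp file, so here the page cache is still involved; the assumption is that the same code pointed at a file on a DAX-mounted filesystem would map the persistent memory directly. The helper names and the 4096-byte size are made up for the example.

```python
import mmap
import os
import tempfile


def write_persistent(path: str, offset: int, payload: bytes, size: int = 4096) -> None:
    """Store bytes at an arbitrary byte offset through a memory mapping."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)            # make sure the file backs the whole mapping
        with mmap.mmap(fd, size) as mm:   # shared, read/write mapping by default
            # Plain byte-addressed store, no read()/write() calls involved.
            mm[offset:offset + len(payload)] = payload
            # flush() asks the OS to write the mapping back (msync on Unix).
            mm.flush()
    finally:
        os.close(fd)


def read_back(path: str, offset: int, length: int) -> bytes:
    """Load bytes from the same offset through a read-only mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == "__main__":
    # ASSUMPTION: an ordinary temp directory stands in for a DAX mount point.
    demo_path = os.path.join(tempfile.mkdtemp(), "pmem-demo.bin")
    write_persistent(demo_path, 128, b"hello")
    print(read_back(demo_path, 128, 5))
```

Note that `flush()` still goes through the kernel; flushing straight from user space with CPU cache-flush instructions is not something the stdlib exposes, which is the kind of gap libraries like PMDK are meant to fill.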
The throughput and latency are good enough that many apps can skip using memory and work straight from this. We already see this in some mobile devices that only have solid state and instant boot times because the data is always ready, with no need to shift from disk to RAM.
It'll be a while before OSes and applications take advantage, but it has the potential to be a major shift in how storage works.
NVMe is (now) a more general-purpose "speak to flash-like things" interface. This Optane memory stuff is made of "flash", but unlike our Local SSD offering (or AWS's i3) it's at a latency and throughput closer to DRAM.
Despite it being marketing material, I find the pyramid diagram [1] helpful. This blog post is about "Optane Persistent Memory".
[1] https://newsroom.intel.com/wp-content/uploads/sites/11/2018/...
Edit: Article[1] here says the latency was 40µs with 13M IOPS. If you consider RAM latency to be 100ns[2], then it looks like 400X.
[1] - https://blogs.technet.microsoft.com/filecab/2018/10/30/windo...
[2] - https://gist.github.com/jboner/2841832
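Spelling out the arithmetic on those figures, as a rough sketch: the ratio is just the two quoted latencies divided. The in-flight estimate is my own addition via Little's law (concurrency = throughput × latency) and assumes the 40µs and 13M IOPS were measured under the same load.

```python
MEASURED_LATENCY_NS = 40_000   # 40 µs, from the article
DRAM_LATENCY_NS = 100          # ~100 ns main-memory reference, from the gist
MEASURED_IOPS = 13_000_000     # 13M IOPS, from the article


def latency_ratio(slow_ns: int, fast_ns: int) -> float:
    """How many times slower one access is than the other."""
    return slow_ns / fast_ns


def requests_in_flight(iops: int, latency_ns: int) -> int:
    # Little's law: average concurrency = throughput * latency.
    # ASSUMPTION: both numbers describe the same steady-state workload.
    return iops * latency_ns // 1_000_000_000


if __name__ == "__main__":
    ratio = latency_ratio(MEASURED_LATENCY_NS, DRAM_LATENCY_NS)
    in_flight = requests_in_flight(MEASURED_IOPS, MEASURED_LATENCY_NS)
    print(f"{ratio:.0f}x the DRAM reference latency")
    print(f"about {in_flight} requests in flight on average")
```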
Why are we constantly creating pets out of what should be cattle?
The benefits of having access to that technology on GCP (or another cloud) are the usual: reduced operational burden, increased availability, flexible pricing structure, elastic scalability, etc.
Your question about pets vs cattle is a non sequitur. Nothing about this announcement or the underlying technologies is suggestive of how they should be used (or misused). It's just a new tool in the toolbox, and as a distributed systems engineer specializing in database technology, having easy access to this hardware at scale is extremely compelling.
It's pretty weird to call out people for making 'pet' servers because they're daring to attach storage to them.
[1] Redis on Optane https://redislabs.com/blog/redis-enterprise-flash-intel-opta...
[2] GraphBLAS & RedisGraph https://news.ycombinator.com/item?id=18099520