We have been using HBase for about a year in our development work and projects, from 0.17.x
through 0.19.x. Like everyone in the community, we are well aware of the critical performance and
reliability issues in these releases.
The great news is that HBase‐0.20.0 will be released soon. Jonathan Gray from Streamy,
Ryan Rawson from StumbleUpon, Michael Stack from Powerset/Microsoft, Jean‐Daniel Cryans
from OpenPlaces, and other contributors have done a great job redesigning and rewriting much
of the code to improve HBase. The two presentations [1] [2] provide more details on this release.
The primary themes of HBase‐0.20.0:
− Performance
− Real‐time and Unjavafy software implementations.
− HFile, based on BigTable’s SSTable. New file format limits index size.
− New API
− New Scanners
− New Block Cache
− Compression (LZO, GZ)
− Almost a RegionServer rewrite
− ZooKeeper integration, multiple masters (partial; 0.21 will rewrite the Master with better ZK
integration)
As a result, we will get a brand‐new, high‐performance (random access, scan, insert, …), and more
robust HBase. HBase‐0.20.0 should be a great milestone, and we thank all the developers.
The following items are very important to us:
− Insert performance: We have very large datasets that are generated quickly.
− Scan performance: Needed for data analysis in the MapReduce framework.
− Random access performance: Needed to provide real‐time services.
− Lower memory and I/O overhead: We are neither Google nor Yahoo!; we cannot operate a
big cluster with hundreds or thousands of machines, but we really do have big data.
− HFile: Modeled on SSTable. It should be a common and effective storage element.
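To illustrate why an SSTable‐style file can keep its index small, here is a minimal Python sketch of a sparse block index. It is not HFile's actual implementation: the helper names (build_blocks, lookup) are hypothetical, and blocks are cut by entry count for simplicity, whereas a real HFile sizes its blocks in bytes. The idea shown is only that the index holds one key per block rather than one key per row.

```python
from bisect import bisect_right


def build_blocks(sorted_items, block_size):
    """Split sorted (key, value) pairs into blocks and index each block's first key.

    Hypothetical helper: block_size counts entries, not bytes.
    """
    blocks = [
        sorted_items[i:i + block_size]
        for i in range(0, len(sorted_items), block_size)
    ]
    index = [block[0][0] for block in blocks]  # one index entry per block
    return blocks, index


def lookup(blocks, index, key):
    """Binary-search the sparse index, then scan only the one candidate block."""
    pos = bisect_right(index, key) - 1
    if pos < 0:
        return None  # key sorts before the first block
    for k, v in blocks[pos]:
        if k == key:
            return v
    return None


# 1000 rows stored in blocks of 100 entries need only 10 index entries.
items = sorted((f"row{i:04d}", i) for i in range(1000))
blocks, index = build_blocks(items, block_size=100)
```

With this layout, a random read costs one binary search over the small index plus a scan of a single block, so the index that has to stay in memory grows with the number of blocks, not the number of rows.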
Attachment | Size |
---|---|
hbase-0-20-0-pe-090825134516-phpapp01.pdf | 195.77 KB |