cloudeagle_bupt

Hadoop2013


Sessions:

http://hadoopsummit.org/program/

Reading list:

Optimizing MapReduce Job Performance (http://www.slideshare.net/cloudera/mr-perf)

Optimizing MapReduce job performance is often seen as something of a black art. In order to maximize performance, developers need to understand the inner workings of the MapReduce execution framework and how they are affected by various configuration parameters and MR design patterns. The talk will illustrate the underlying mechanics of job and task execution, including the map-side sort/spill, the shuffle, and the reduce-side merge, and then explain how different job configuration parameters and job design strategies affect the performance of these operations. Though the talk will cover internals, it will also provide practical tips, guidelines, and rules of thumb for better job performance. The talk is primarily targeted towards developers directly using the MapReduce API, though it will also include some tips for users of higher-level frameworks.
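As a rough illustration of the map-side sort/spill mentioned above, the sketch below estimates how many spill files a single map task writes. It is a simplified model, not Hadoop's actual accounting. It assumes each record occupies its serialized bytes plus a fixed 16-byte metadata entry in the sort buffer, and that a spill starts whenever usage reaches the buffer size times the spill threshold. The defaults (100 MB, 0.80) mirror Hadoop's `mapreduce.task.io.sort.mb` and `mapreduce.map.sort.spill.percent` (`io.sort.mb` and `io.sort.spill.percent` in older releases). Combiners and compression are ignored, and `estimate_map_spills` is a hypothetical helper.

```python
import math

# Assumed per-record metadata cost in the map-side sort buffer. Hadoop 2.x's
# MapTask keeps 16 bytes of metadata per record; treat this as an approximation.
METADATA_BYTES_PER_RECORD = 16


def estimate_map_spills(output_bytes: int, output_records: int,
                        sort_mb: int = 100, spill_percent: float = 0.80) -> int:
    """Estimate the number of spill files written by one map task.

    Simplified model (an assumption, not Hadoop's exact accounting):
    each record costs its serialized size plus fixed metadata, and a
    spill starts each time buffer usage reaches sort_mb * spill_percent.
    Combiners and map-output compression are ignored.
    """
    if output_records == 0:
        return 0
    threshold = sort_mb * 1024 * 1024 * spill_percent
    used = output_bytes + output_records * METADATA_BYTES_PER_RECORD
    # The last, partially filled buffer is flushed when the task finishes,
    # so any remainder still produces one spill file.
    return math.ceil(used / threshold)


if __name__ == "__main__":
    mb = 1024 * 1024
    print(estimate_map_spills(200 * mb, 2_000_000))               # default 100 MB buffer
    print(estimate_map_spills(200 * mb, 2_000_000, sort_mb=300))  # larger buffer
```

Under this model, a 200 MB map output of 2,000,000 records produces three spills with the default buffer but only one with a 300 MB buffer. With more than one spill, the task must merge its spill files on disk before reducers can fetch the output, so each record is written more than once. This is the kind of configuration effect the talk sets out to explain.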

Improving HBase Availability and Repair (http://www.slideshare.net/cloudera/120613-hadoopsummithbaseavailabilitybean-hsieh)

Apache HBase is a rapidly evolving, random-access distributed data store built on top of Apache Hadoop’s HDFS and Apache ZooKeeper. Drawing from real-world support experiences, this talk provides administrators insight into improving HBase’s availability and recovering from situations where HBase is not available. We share tips on the common root causes of unavailability, explain how to diagnose them, and prescribe measures for ensuring maximum availability of an HBase cluster. We discuss new features that improve recovery time, such as distributed log splitting, as well as supportability improvements. We will also describe utilities, including new failure recovery tools that we have developed and contributed, that can be used to diagnose and repair rare corruption problems on live HBase systems.

Hadoop Distributed File System Reliability and Durability at Facebook (http://www.slideshare.net/Hadoop_Summit/hadoop-distributed-file-system-at-facebook)

The Hadoop Distributed Filesystem, or HDFS, provides the storage layer to a variety of critical services at Facebook. The HDFS Namenode is often singled out as a particularly weak aspect of the design of HDFS, because it represents a single point of failure within an otherwise redundant system. To address this weakness, Facebook has been developing a highly available Namenode, known as Avatarnode. The objective of this study was to determine how much effect Avatarnode would have on overall service reliability and durability. To analyze this, we categorized, by root cause, the last two years' operational incidents in the Data Warehouse and Messages services at Facebook, a total of 66 incidents. We were able to show that approximately 10% of each service's incidents would have been prevented had Avatarnode been in place. Avatarnode would have prevented none of our incidents that involved data loss, and all of the most severe data loss incidents were the result of human error or software bugs. Our conclusion is that Avatarnode will improve the reliability of services that use HDFS, but that the HDFS Namenode accounts for only a small portion of overall operational incidents in services that use HDFS as a storage layer.

HDFS NameNode High Availability (http://www.slideshare.net/Hadoop_Summit/hdfs-namenode-high-availability)

The HDFS NameNode is a robust and reliable service as seen in practice in production at Yahoo and other customers. However, the NameNode does not have automatic failover support. A hot failover solution called HA NameNode is currently under active development (HDFS-1623). This talk will cover the architecture, design and setup. We will also discuss the future direction for HA NameNode.

Spark and Shark (http://www.slideshare.net/Hadoop_Summit/spark-and-shark)

Spark is an open source cluster computing framework that can outperform Hadoop by 30x by storing datasets in memory across jobs. Shark is a port of Apache Hive onto Spark, which provides a similar speedup for SQL queries, allowing interactive exploration of data in existing Hive warehouses. This talk will cover how both Spark and Shark are being used at various companies to accelerate big data analytics, the architecture of the systems, and where they are heading. In particular, we will show how both systems are used for large-scale machine learning, where the ability to keep data in memory across iterations yields substantial speedups, and for interactive data mining, from Shark’s SQL interface or Spark’s Scala-based console. We will also discuss an upcoming extension, Spark Streaming, that adds support for low-latency stream processing in Spark, giving users a unified interface for batch and online analytics.
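The benefit of keeping data in memory across iterations can be illustrated with a toy, pure-Python model that does not use Spark. An iterative job either reloads its input on every iteration, as a chain of MapReduce jobs would re-read it from HDFS, or loads it once and reuses it from memory, which is what calling `cache()` on an RDD asks Spark to do. The `run_iterations` function, the `CountingLoader` class, and the one-parameter "model" are hypothetical. They only count loads and say nothing about Spark's actual performance.

```python
from typing import Callable, List, Optional


def run_iterations(load: Callable[[], List[float]], iterations: int,
                   cache: bool) -> float:
    """Fit a one-parameter toy model by gradient descent.

    With cache=True the dataset is loaded once and reused from memory;
    with cache=False it is reloaded every iteration (a hypothetical
    stand-in for re-reading input from disk in each MapReduce job).
    """
    cached: Optional[List[float]] = load() if cache else None
    w = 0.0
    for _ in range(iterations):
        data = cached if cached is not None else load()
        # Gradient of the mean of 0.5 * (w - x)^2 over the data.
        grad = sum(w - x for x in data) / len(data)
        w -= 0.5 * grad  # fixed learning rate of 0.5
    return w


class CountingLoader:
    """Hypothetical data source that records how often it is read."""

    def __init__(self, values: List[float]) -> None:
        self.values = values
        self.loads = 0

    def __call__(self) -> List[float]:
        self.loads += 1
        return list(self.values)


if __name__ == "__main__":
    for use_cache in (False, True):
        loader = CountingLoader([1.0, 2.0, 3.0])
        w = run_iterations(loader, iterations=10, cache=use_cache)
        print(f"cache={use_cache}: w={w:.4f}, loads={loader.loads}")
```

Running the script prints `w=1.9980` for both modes, with `loads=10` without caching and `loads=1` with it. In this model, the results are identical and only the number of reads differs, which is where an iterative algorithm gains when each read is expensive.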
