Google Tech-Talk Computer Science Video Lectures
There are many, many more Google Tech Talk video lectures available here:
Google Tech-Talk Lectures
ABSTRACT: Nearly 60 years after the first electronic digital computer was designed at the Institute for Advanced Study (IAS) in Princeton, companies like Google are demonstrating the power of a world built from 1s and 0s. Zome is a system that models the world built from the numbers 2, 3 and 5. We will explore how these numbers are knotted together to form the structure of space, from the subatomic framework of the atom, to the geometry of life, to a recently proposed “shape” of the universe!
Creative Commons for Googlers
ABSTRACT: Creative Commons provides tools that enable the legal sharing and re-use of creative and educational materials online. Come learn about Creative Commons, what they're doing, and how Google might help. Creative Commons' general counsel will be on hand to answer questions about CC copyright licenses and other legal issues, but the presentation will focus on technical projects at Creative Commons: license-aware web search, microformats, reliable metadata-embedding in various media types, and licensing integration with user generated content platforms.
iClaustron: Open Source Grid Cluster Storage Controller
ABSTRACT: Many applications need to store petabytes of base data and many terabytes of structured data. Examples include genealogy, astronomy and biotech. This talk will discuss the requirements of the genealogy application and show how they call for very large clustered systems organized as a hierarchy of clusters. These clusters store both base data and structured data. The talk then shows how these requirements translate into a systems architecture whose essential components are off-the-shelf servers, cheap storage, clustered software and integrated cluster interconnects.
The Technology Behind Debian's Testing Release
ABSTRACT: Current Debian Project Leader, former Release Manager and all-round good guy Anthony "aj" Towns will give an in-depth look at the ideas and code that hold Debian's "testing" suite together, from its initial genesis, through basic prototypes, to the "final" implementation and the couple of rewrites it has had since. The talk will examine the numerous optimisations needed to make the ideas run in even a vaguely acceptable amount of time, as well as the various tricks and tools used in development and debugging (including malloc debugging, writing C extensions to Perl and Python, and libapt versus libdpkg).
Better, faster, smarter: Python yesterday, today ... and tomorrow
A lecture on the Python programming language, with emphasis on Python 2.5 and a historical review of versions 2.2, 2.3 and 2.4.
Security is Broken
ABSTRACT: Our computer security model is broken. Worse yet, it has never worked particularly well, and it is even less suitable for today's uses. In this talk, I explore the history behind the design of the current security model in both hardware and operating systems. Instead of evolving a more secure model over time, system designers have actually managed to make things worse, creating insecurity in depth. Most of today's systems are single-user machines: certainly desktops and laptops, but also most servers. The current security model was not designed to protect users from themselves, and this goes a long way towards explaining why security is so difficult. I end by looking at strategies for improving security, though I offer no real solutions. The point is to start thinking outside the box while adopting best practices today. What we have done in the past has not worked, and cannot work. We need to look at the security model in a new way, and that is the real point of this presentation.
High Radix Interconnection Networks
ABSTRACT: High-radix interconnection networks offer significantly better cost/performance and lower latency than conventional (low-radix) topologies. Increasing the radix is motivated by the exponential increase in router pin bandwidth over time. Increasing the radix, or degree, of a router node is a more efficient way to exploit this increasing bandwidth than making channels wider. A high radix poses several challenges in router design, because the internal structures of conventional routers (e.g., the allocators) scale quadratically with radix. A hierarchical switch organization with internal buffering yields a scalable design with near-optimal performance. A high-radix "flattened butterfly" topology, enabled by recent developments in global adaptive routing, offers twice the performance of a comparable-cost Clos network on balanced traffic. Many of these developments have been incorporated in the YARC router and interconnection network for the Cray BlackWidow supercomputer.
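The latency argument for high radix can be made concrete with a simple count. In an idealized butterfly built from radix-k routers, each stage multiplies the number of reachable terminals by k. Reaching N terminals therefore takes the smallest n with k^n >= N router hops. The sketch below assumes exactly that idealized model, ignoring wire delay, serialization and the flattened-butterfly details. The function name is hypothetical.

```python
def butterfly_hops(terminals: int, radix: int) -> int:
    """Router hops in an idealized butterfly of radix-k routers serving `terminals` ports.

    Each stage multiplies the number of reachable terminals by `radix`, so the
    hop count is the smallest n with radix**n >= terminals. Integer arithmetic
    avoids the rounding errors of math.log.
    """
    if radix < 2:
        raise ValueError("radix must be at least 2")
    hops, reach = 0, 1
    while reach < terminals:
        reach *= radix
        hops += 1
    return hops


if __name__ == "__main__":
    for k in (2, 4, 8, 16, 64):
        print(f"radix {k:>2}: {butterfly_hops(4096, k)} hops for 4096 terminals")
```

For 4096 terminals this gives 12 hops at radix 2 but only 2 at radix 64. The cost, as the abstract notes, is that structures such as allocators grow roughly quadratically with the radix.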
Data Representation/Laplace Operator
ABSTRACT: Data Representation by Graphs, Matrices, Formulas and Continued Fractions; Inverse Problems for the Laplace Operator.
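The following is an assumed illustration, not taken from the talk, of where graphs, matrices and the Laplace operator meet. The combinatorial graph Laplacian L = D - A (degree matrix minus adjacency matrix) is the standard discrete analogue of the Laplace operator. The sketch builds it from an edge list for a simple undirected graph. The function names are hypothetical.

```python
def graph_laplacian(num_vertices, edges):
    """Return the combinatorial Laplacian L = D - A as a list of rows.

    Assumes a simple undirected graph: no self-loops, no repeated edges.
    """
    L = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        L[u][u] += 1  # degree matrix D on the diagonal
        L[v][v] += 1
        L[u][v] -= 1  # minus the adjacency matrix A off the diagonal
        L[v][u] -= 1
    return L


def apply_operator(L, x):
    """Matrix-vector product L @ x with plain lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in L]


if __name__ == "__main__":
    # Path graph 0 - 1 - 2: a discretized line segment.
    L = graph_laplacian(3, [(0, 1), (1, 2)])
    print(L)                               # [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
    print(apply_operator(L, [1, 1, 1]))    # [0, 0, 0]: constants lie in the kernel
    print(apply_operator(L, [0, 1, 4]))    # [-1, -2, 3]
```

At the interior vertex the result, -2, is minus the second difference 0 - 2·1 + 4 = 2 of the sampled function x². In this sense L approximates the negative Laplacian.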
Using Statistics to Search and Annotate Pictures
ABSTRACT: The last decade has produced significant advances in content-based image retrieval, i.e. the design of computer vision systems for image search.
I will review our efforts in the area, with emphasis on the subject of semantic retrieval. This consists of learning to annotate images, in order to support natural language queries. In particular, I will argue for a retrieval framework which combines the best properties of classical "query by visual example" (QBVE) and more recent semantic methods, and which we call "query by semantic example" (QBSE). While simple, we show that, when combined with ideas from multiple instance learning, this framework can be quite powerful. It improves semantic retrieval along a number of dimensions, the most notable of which is generalization (out-of-vocabulary queries). It can also be directly compared to query by example, making it possible to quantify the gains of representing images in semantic spaces.
Our results show that these gains are quite significant, even when the semantic characterization is noisy and somewhat unreliable. This suggests an interesting hypothesis for computer vision: that it may suffice to adopt simple visual models, as long as they operate at various levels of abstraction and are learned from large amounts of data.
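Here is a minimal sketch of the query-by-semantic-example idea under stated assumptions. Each image has already been mapped, by some annotation model not shown, to a probability vector over a fixed concept vocabulary. Database images are then ranked by Kullback-Leibler divergence from the query's vector. The vocabulary, the probabilities, the file names and the choice of KL divergence are all illustrative assumptions, not the talk's exact system.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    eps guards against division by zero and log(0) when a concept has zero probability.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def rank_by_semantic_similarity(query_vec, database):
    """Return image ids sorted from most to least similar (smallest KL first)."""
    return sorted(database, key=lambda img_id: kl_divergence(query_vec, database[img_id]))


if __name__ == "__main__":
    # Hypothetical vocabulary: [sky, water, grass, building]
    database = {
        "beach.jpg": [0.40, 0.45, 0.05, 0.10],
        "park.jpg":  [0.30, 0.05, 0.55, 0.10],
        "city.jpg":  [0.20, 0.05, 0.05, 0.70],
    }
    query = [0.35, 0.50, 0.05, 0.10]  # semantic vector of a hypothetical seaside photo
    print(rank_by_semantic_similarity(query, database))  # beach.jpg ranks first
```

Because the comparison happens in concept space rather than pixel space, the same ranking code works whatever visual features produced the probability vectors.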
Badvertisements: Stealthy Click Fraud with Unwitting Accessories
ABSTRACT: We describe a new type of threat to the Internet infrastructure: a highly efficient but very well camouflaged click-fraud attack on the advertising infrastructure that uses no malware at all. The attack, which we refer to as a "badvertisement", is described and experimentally verified on several prominent advertisement schemes. This stealthy attack can be thought of as a threatening mutation of spam and phishing attacks, with which it has much in common, except that the victim is not the targeted individual but the advertiser.
Decision Making and Chance
ABSTRACT: Certain gambling games, such as roulette and craps, are games of pure chance: In repeated play, luck disappears, and the persistent gambler will go broke. Other gambling activities, such as betting on sports or the stock market, may involve an element of skill. One way to measure this is to compare the results of a gambling strategy with chance: A skillful strategy should produce long-run results that are better than would be achieved by someone who is just guessing. One can also compare a gambler’s losses with chance to see if the gambler is doing worse than chance would allow. I will discuss two recent projects that illustrate these concepts:
- Automated data mining software discovers that the Baltimore Ravens are 17-3 versus the point spread when they lost their previous game and their opponents played their previous game on the road. Do situations like this give clever gamblers an edge or are such strong win-loss records merely random flukes?
- A gambler loses $30 million betting at an online casino. Is it possible to lose this much just by chance or is the gambler being cheated? Or maybe the gambler is part of a money laundering scheme.
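Here is a rough check on the Ravens example under simplifying assumptions. Treat each game against the spread as an independent fair coin flip (p = 0.5, pushes ignored) and compute the chance of 17 or more wins in 20 games. The data-mining software scans many situations, so the sketch also estimates how many such records chance alone would produce. The figure of 1,000 situations is hypothetical.

```python
from math import comb


def prob_at_least(wins, games, p=0.5):
    """P(X >= wins) for X ~ Binomial(games, p)."""
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k) for k in range(wins, games + 1))


if __name__ == "__main__":
    tail = prob_at_least(17, 20)
    print(f"P(17+ wins in 20 fair games) = {tail:.6f}")  # about 0.0013
    situations_scanned = 1000  # hypothetical number of situations the software checks
    print(f"Expected 17-3 (or better) flukes: {situations_scanned * tail:.2f}")
```

Under these assumptions, a 17-3 record turns up by pure luck about once per thousand situations scanned. A record that was found by search, rather than predicted in advance, is therefore weak evidence of a real edge.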
The Electric Sheep and their Dreams in High Fidelity
Electric Sheep is a distributed screen-saver that harnesses idle computers into a render farm with the purpose of animating and evolving artificial life-forms known as sheep. The votes of the users form the basis for the fitness function for a genetic algorithm on a space of abstract animations. Users may also design sheep by hand for inclusion in the gene pool.
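The vote-driven genetic algorithm can be sketched as follows. Everything concrete here is an assumption for illustration:

- A genome is a short list of floats. The real sheep genomes are far richer animation parameters.
- Fitness is simply a vote count.
- Parents are chosen with probability proportional to votes (roulette-wheel selection).
- Children come from uniform crossover plus Gaussian mutation.

The function name is hypothetical.

```python
import random


def next_generation(population, votes, rng, mutation_sd=0.1, size=None):
    """Breed a new population from user-voted genomes.

    population: list of genomes (equal-length lists of floats)
    votes: non-negative vote counts, one per genome, used as fitness
    """
    if size is None:
        size = len(population)
    # +1 smoothing so unvoted genomes keep a small chance to reproduce (assumption).
    weights = [v + 1 for v in votes]
    children = []
    for _ in range(size):
        mom, dad = rng.choices(population, weights=weights, k=2)
        # Uniform crossover: each gene is copied from one parent at random.
        child = [m if rng.random() < 0.5 else d for m, d in zip(mom, dad)]
        # Gaussian mutation of every gene.
        child = [g + rng.gauss(0.0, mutation_sd) for g in child]
        children.append(child)
    return children


if __name__ == "__main__":
    rng = random.Random(42)
    population = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
    votes = [0, 12, 3, 0, 7, 1]  # hypothetical user votes for each sheep
    children = next_generation(population, votes, rng)
    print(len(children), "children, genome length", len(children[0]))
```

Hand-designed sheep fit into this scheme by simply being appended to `population` before breeding.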
This cyborg mind composed of 35,000 computers and people was used to make Dreams in High Fidelity: a painting that evolves. It consists of 55GB of high definition sheep that would have taken one computer over 100 years to render, played back to form a nonrepeating continuously morphing image.
The talk will cover the genetic code and renderer, the genetic algorithm, how error correction is built into the distributed renderer while minimizing the performance penalty, and how to distribute 750GB of video per day without paying for it. The talk will include a demo of the artwork.
Reusable Web Components with Python and Future Python Web Development
ABSTRACT: Python's Web Server Gateway Interface (WSGI) not only enables a multitude of Python web frameworks to share code when it comes to deployment, but also enables entirely new levels of re-use for Python web development. This talk focuses on explaining WSGI and the new types of re-use that WSGI middleware makes possible. It also explores new frameworks that rely heavily on WSGI, in this case Pylons. The aim is to move beyond monolithic frameworks that try to do everything themselves, toward new modes of development where you use just the parts you want and still have active development communities to interact with.
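To make WSGI middleware concrete, here is a minimal standard-library sketch. A WSGI application is a callable that takes `environ` and `start_response` (PEP 3333). Middleware is itself such a callable, wrapping another application, which is why pieces can be stacked and reused. The application, the middleware class and the header name `X-Example-Middleware` are hypothetical examples, not part of Pylons.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A tiny WSGI application."""
    body = f"Hello from {environ['PATH_INFO']}".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


class AddHeaderMiddleware:
    """WSGI middleware that appends one response header to whatever app it wraps."""

    def __init__(self, app, name, value):
        self.app = app
        self.header = (name, value)

    def __call__(self, environ, start_response):
        def start_response_with_header(status, headers, exc_info=None):
            # Intercept the wrapped app's start_response call and add our header.
            return start_response(status, headers + [self.header], exc_info)
        return self.app(environ, start_response_with_header)


# Wrapping an app yields another WSGI app, so middleware layers compose freely.
app = AddHeaderMiddleware(hello_app, "X-Example-Middleware", "on")

if __name__ == "__main__":
    environ = {}
    setup_testing_defaults(environ)  # fills in PATH_INFO="/", SERVER_NAME, etc.
    captured = {}

    def fake_start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, headers

    body = b"".join(app(environ, fake_start_response))
    print(captured["status"], captured["headers"], body)
```

The same `app` object can be handed to any WSGI server, for example `wsgiref.simple_server.make_server`.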
Nanowires and Nanocrystals for Nanotechnology
(not computer science but too interesting to miss)
ABSTRACT: Nanowires and nanocrystals are important nanomaterials with one-dimensional and zero-dimensional morphology, respectively. Here I will give an overview of research on how these nanomaterials impact critical applications: faster transistors, smaller nonvolatile memory devices, efficient solar energy conversion, high-energy batteries and nanobiotechnology.
Measuring Programmer Productivity
ABSTRACT: Developers have been programming for the last 30 years in a wide variety of programming languages. Over the years, we have all developed a feeling for what it is about a programming language that makes us productive as programmers. As part of the DARPA HPCS (High Productivity Computing Systems) program, we are developing models and tools to measure programmer productivity. We will describe our data-gathering process and our effort to model programmer workflows using timed Markov models.
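A timed Markov model attaches a time cost to each state of an ordinary Markov chain. Here is a toy illustration; the states, durations and probabilities are invented, not measurements from the HPCS study. The expected time to finish from state s satisfies T(s) = t(s) + Σ P(s→s') T(s'), with T = 0 in the absorbing "done" state. The sketch solves this by fixed-point iteration. The function name is hypothetical.

```python
def expected_time_to_done(durations, transitions, absorbing="done", tol=1e-12, max_iter=100_000):
    """Expected remaining time from each state of a timed Markov model.

    durations:   mean time spent per visit in each non-absorbing state
    transitions: {state: {next_state: probability}}, each row summing to 1
    Solves T[s] = durations[s] + sum_j P[s][j] * T[j] with T[absorbing] = 0 by
    repeated substitution (converges if 'done' is reachable from every state).
    """
    T = {s: 0.0 for s in transitions}
    T[absorbing] = 0.0
    for _ in range(max_iter):
        new_T = {absorbing: 0.0}
        for s, row in transitions.items():
            new_T[s] = durations[s] + sum(p * T[nxt] for nxt, p in row.items())
        change = max(abs(new_T[s] - T[s]) for s in T)
        T = new_T
        if change < tol:
            return T
    raise RuntimeError("did not converge")


if __name__ == "__main__":
    durations = {"edit": 10.0, "compile": 1.0, "test": 3.0}  # minutes (hypothetical)
    transitions = {
        "edit":    {"compile": 1.0},
        "compile": {"edit": 0.3, "test": 0.7},  # 30% of builds fail (hypothetical)
        "test":    {"edit": 0.4, "done": 0.6},  # 40% of test runs find a bug (hypothetical)
    }
    T = expected_time_to_done(durations, transitions)
    print(f"Expected minutes from first edit to done: {T['edit']:.1f}")  # 31.2
```

Measured workflow data would supply the real durations and transition probabilities. Productivity comparisons then reduce to comparing quantities such as T["edit"] across languages or tools.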
Sparse and large-scale learning with heterogeneous data
ABSTRACT: An important challenge for the field of machine learning is to deal with the increasing amount of data available for learning and to leverage the (also increasing) diversity of information sources that describe these data. Beyond classical vectorial data formats, data in the form of graphs, trees, strings and more have become widely available for data mining, e.g., the link structure of the World Wide Web, text, images and sounds on web pages, protein interaction networks, phylogenetic trees, etc. Moreover, for reasons of interpretability and economy, decision rules that rely on a small subset of the information sources and/or a small subset of the features describing the data are highly desirable: sparse learning algorithms are a must. This talk will outline two recent approaches that address sparse, large-scale learning with heterogeneous data, and show some applications.
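Sparsity is usually obtained with an L1 penalty. The sketch below is the generic lasso, minimizing (1/(2n))·||y - Xw||² + λ·||w||₁ by iterative soft-thresholding (ISTA). It is shown only to illustrate how an L1 penalty drives irrelevant weights to exactly zero. It is not presented as either of the talk's two approaches, which address heterogeneous sources at much larger scale. Function names and data are hypothetical.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|w|: shrink z toward 0 by t, clipping at 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_ista(X, y, lam, iters=1000):
    """Minimize (1/(2n))*||y - Xw||^2 + lam*||w||_1 by ISTA.

    X: list of n rows, each a list of d numbers; y: list of n targets.
    """
    n, d = len(X), len(X[0])
    # ||X||_F^2 / n upper-bounds the Lipschitz constant of the smooth part's gradient.
    L = sum(x * x for row in X for x in row) / n
    if L == 0:
        return [0.0] * d
    step = 1.0 / L
    w = [0.0] * d
    for _ in range(iters):
        residual = [yi - sum(xij * wj for xij, wj in zip(row, w)) for row, yi in zip(X, y)]
        grad = [-sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(d)]
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(d)]
    return w


if __name__ == "__main__":
    # Three orthogonal features; the target depends only on the first one (y = 3 * x0).
    X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
    y = [3, -3, 3, -3]
    print(lasso_ista(X, y, lam=0.5))  # approximately [2.5, 0.0, 0.0]
```

With λ = 0.5, the two irrelevant features get weight exactly zero, while the relevant weight is shrunk from 3 to 2.5. That shrinkage is the bias paid for sparsity.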
Code Generation With Ruby
A talk about code generation techniques using Ruby. The speaker covers both do-it-yourself and off-the-shelf solutions in a conversation about where Ruby is as a tool and where it is going.
Random Sampling from a Search Engine's Index
ABSTRACT: We revisit a problem introduced by Bharat and Broder almost a decade ago: how to sample random pages from a search engine's index using only the search engine's public interface?
In this paper we introduce two novel sampling techniques: a lexicon-based technique and a random walk technique. Our methods produce biased sample documents, but each sample is accompanied by a corresponding "weight", which represents the probability that this document is selected in the sample. The samples, in conjunction with the weights, are then used to simulate near-uniform samples. To this end, we use three well-known Monte Carlo simulation methods: rejection sampling, importance sampling and the Metropolis-Hastings algorithm.
We analyze our methods rigorously and prove that under plausible assumptions, our techniques are guaranteed to produce near-uniform samples from the search engine's index. Experiments on a corpus of 2.4 million documents substantiate our analytical findings and show that our algorithms do not have significant bias towards long or highly ranked documents.
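Rejection sampling, one of the Monte Carlo methods named above, shows how weighted but biased samples become uniform ones. This minimal sketch rests on two assumptions: the weight of each document is known exactly, and there is a constant c no larger than any weight. The real techniques only estimate these weights. A sample x is accepted with probability c / weight(x), so every document ends up accepted with the same overall probability c. The toy four-document "index" and the function name are hypothetical.

```python
import random
from collections import Counter


def rejection_sample_uniform(biased_sampler, weight, c, rng):
    """Return one sample that is uniform over the support of biased_sampler.

    biased_sampler(rng) -> document drawn from a known, non-uniform distribution
    weight(doc)         -> probability that biased_sampler returns doc
    c                   -> constant with 0 < c <= weight(doc) for every doc
    """
    while True:
        doc = biased_sampler(rng)
        # Over-represented documents are accepted less often, cancelling the bias.
        if rng.random() < c / weight(doc):
            return doc


if __name__ == "__main__":
    weights = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}  # hypothetical biased index
    docs = list(weights)

    def sampler(r):
        return r.choices(docs, weights=[weights[d] for d in docs])[0]

    rng = random.Random(0)
    counts = Counter(rejection_sample_uniform(sampler, weights.get, min(weights.values()), rng)
                     for _ in range(20_000))
    print(counts)  # each document appears roughly 5,000 times
```

The price of uniformity is wasted queries: here only 40% of proposals are accepted. This is why the choice among rejection sampling, importance sampling and Metropolis-Hastings matters in practice.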
A New Way to look at Networking
ABSTRACT: Today's research community congratulates itself for the success of the internet and passionately argues whether circuits or datagrams are the One True Way. Meanwhile the list of unsolved problems grows.
Security, mobility, ubiquitous computing, wireless, autonomous sensors, content distribution, the digital divide, third-world infrastructure, etc., are all poorly served by what's available from either the research community or the marketplace. I'll use various strained analogies and contrived examples to argue that network research is moribund because the only thing it knows how to do is fill in the details of a conversation between two applications. Today, as in the 1960s, problems go unsolved because of our tunnel vision, not because of their intrinsic difficulty. And now, as then, simply changing our point of view may make many hard things easy.
Privacy-Preserving Data Mining
ABSTRACT: The rapid growth of the Internet over the last decade has been startling. However, efforts to track its growth have often fallen afoul of bad data. For instance, how much traffic does the Internet now carry? The problem is not that the data is technically hard to obtain, or that it does not exist, but rather that the data is not shared. Obtaining an overall picture requires data from multiple sources, few of whom are open to sharing such data, either because doing so violates privacy legislation or because it exposes business secrets. The approaches used so far in the Internet, e.g., trusted third parties or data anonymization, have been only partially successful and are not widely adopted.
The paper presents a method for performing computations on shared data without any participants revealing their secret data. For example, one can compute the sum of traffic over a set of service providers without any service provider learning the traffic of another. The method is simple, scalable, and flexible enough to perform a wide range of valuable operations on Internet data.
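The abstract does not spell out the protocol, so the following sketches one standard technique that achieves this kind of secure sum: additive secret sharing. It is an illustration, not necessarily the paper's method. Each provider splits its private value into random shares that add up to the value modulo a large number M and sends one share to each provider, itself included. Each provider publishes only the sum of the shares it received, and those partial sums add up to the total. A single share is uniformly random and reveals nothing about the value it came from. The sketch assumes providers follow the protocol and do not collude, and it simulates all parties in one process. The modulus, traffic figures and function names are hypothetical.

```python
import secrets

MODULUS = 2**64  # must exceed any possible total (assumption for the sketch)


def make_shares(value, n_parties):
    """Split value into n random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate all parties: party i sends its j-th share to party j."""
    n = len(private_values)
    all_shares = [make_shares(v, n) for v in private_values]  # row i: shares of party i
    # Party j only ever sees column j, i.e. one share from every party.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Publishing the partial sums reveals the total and nothing else.
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    traffic_tb = [120, 75, 310, 42]  # hypothetical per-provider traffic, terabytes
    print(secure_sum(traffic_tb))    # 547
```

Sums are the simplest case. Averages follow directly, and other statistics can be built from repeated secure sums.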
Near-optimal Monitoring of Online Data Sources
ABSTRACT: Crawling the Web for interesting and relevant changes has become increasingly difficult due to the abundance of frequently changing information. Common techniques for solving such problems make use of heuristics, which do not provide performance guarantees and tend to be tailored to specific scenarios or benchmarks.
In this talk, I will present a principled approach based on mathematical optimization for monitoring high-volume online data sources. We have built and deployed a distributed system called Corona that enables clients to subscribe to Web pages and notifies clients of updates asynchronously via instant messages. Corona assigns multiple nodes to cooperatively monitor each Web page and employs a novel decentralized optimization technique for distributing the monitoring load. In its currently running form, the optimization algorithm guarantees the best update detection time on average without exceeding resource constraints on the monitoring servers. Based on simulations and measurements on our deployed system, I will show that Corona performs substantially better than commonly used heuristics.
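Here is a centralized toy version of the trade-off, under a simplified model that is an assumption of this sketch and not Corona's actual formulation. Suppose m nodes each poll a page every τ seconds with evenly staggered start times. An update then waits about τ/(2m) on average before it is detected. Weight each page by its subscriber count and hand out a fixed budget of polling nodes to minimize total weighted delay. This is a separable convex problem, which a greedy loop solves exactly by giving each extra node to the page with the largest marginal gain. Corona itself solves its optimization in a decentralized way. The function name and numbers are hypothetical.

```python
def allocate_pollers(subscribers, budget, interval=60.0):
    """Greedily assign polling nodes to pages to minimize subscriber-weighted delay.

    subscribers: subscriber count per page (used as weights)
    budget: total number of polling nodes; every page gets at least one
    interval: seconds between polls by a single node (assumed the same for all nodes)
    Returns (nodes_per_page, weighted_average_detection_delay_seconds).
    """
    n = len(subscribers)
    if budget < n:
        raise ValueError("need at least one poller per page")
    nodes = [1] * n
    for _ in range(budget - n):
        # Modelled delay for page i is interval / (2 * nodes[i]); gain of one more node:
        gains = [w * interval / 2 * (1 / m - 1 / (m + 1)) for w, m in zip(subscribers, nodes)]
        best = max(range(n), key=gains.__getitem__)
        nodes[best] += 1
    avg_delay = sum(w * interval / (2 * m) for w, m in zip(subscribers, nodes)) / sum(subscribers)
    return nodes, avg_delay


if __name__ == "__main__":
    nodes, delay = allocate_pollers([500, 50, 5], budget=12)
    print(nodes, f"{delay:.2f} s average detection delay")
```

Popular pages receive most of the polling nodes, but the diminishing returns of 1/m keep the allocation from going entirely to the single most popular page.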
- Free Computer Science Video Lecture Courses
(Courses include web application development, Lisp/Scheme programming, data structures, algorithms, machine structures, programming languages, principles of software engineering, object-oriented programming in Java, systems, computer system engineering, computer architecture, operating systems, database management systems, performance analysis, cryptography, artificial intelligence)
- More Mathematics and Theoretical Computer Science Video Lectures
(Includes algebra, elementary statistics, applied probability, finite mathematics, trigonometry with calculus, mathematical computation, pre-calculus, analytic geometry, first year calculus, business calculus, mathematical writing (by Knuth), computer science problem seminar (by Knuth), dynamic systems and chaos, computer musings (by Knuth) and other Donald E. Knuth lectures)
- Computer Science Courses
(Includes introduction to computer science and computing systems, computational complexity and quantum computing, the C programming language, multicore programming, statistics and data mining, combinatorics, software testing, evolutionary computation, deep learning, data structures and algorithms and computational origami)