GSoC Idea: MapReduce Algorithm Using Scripted Channels and Threads

MapReduce Algorithm Using Scripted Channels and Threads

SEH: Tcl's new reflected channel feature makes it possible to create and manipulate channels entirely within the interpreter, without recourse to the network layer. A channel can be opened with one communicating end in one interpreter and the other end in another. Because the two interpreters may run in separate threads, this provides a way to exchange data between threads.

This makes it possible to envision a multicore-friendly processing environment in which a "master" interpreter manages a central in-memory repository of data and parcels it out to independent processing threads on demand, via scripted channel interactions. This is just the framework needed to implement a MapReduce distributed computing architecture.
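The on-demand hand-off described above can be sketched in a language-neutral way. The following is only an illustration in Python, not a Tcl implementation: queues stand in for the scripted channels, and the names `worker`, `run_master`, `requests`, and `inbox` are hypothetical. Each worker asks the master for a chunk whenever it is idle, and the master answers with either a chunk or an end-of-work marker.

```python
import queue
import threading


def worker(requests, inbox, results, func):
    """Ask the master for work until it answers with the None sentinel."""
    while True:
        requests.put(inbox)        # "I am idle, send me a chunk"
        chunk = inbox.get()        # block until the master replies
        if chunk is None:          # sentinel: no work left
            break
        results.put(func(chunk))


def run_master(chunks, func, n_workers=3):
    """Hand out chunks on demand; return results in completion order."""
    requests = queue.Queue()       # workers -> master (assumed stand-in for a channel)
    results = queue.Queue()        # workers -> master, processed chunks
    threads = []
    for _ in range(n_workers):
        inbox = queue.Queue()      # master -> one worker
        t = threading.Thread(target=worker,
                             args=(requests, inbox, results, func))
        t.start()
        threads.append(t)

    pending = list(chunks)         # the master's central repository
    total = len(pending)
    finished = 0
    while finished < n_workers:
        inbox = requests.get()     # wait for any idle worker
        if pending:
            inbox.put(pending.pop())
        else:
            inbox.put(None)        # tell this worker to stop
            finished += 1

    for t in threads:
        t.join()
    return [results.get() for _ in range(total)]
```

For example, `sorted(run_master([[1, 2], [3, 4], [5]], sum))` gives `[3, 5, 7]`; the sort is needed because results arrive in whatever order the workers finish. The pull model keeps the master in sole control of the data, which matches the "central repository" idea: workers never see more than the chunk they asked for.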

MapReduce is a leading technique for parallel computing, a field that has grown in importance as multi-core hardware has become the norm. The purpose of this project would be to use Tcl's threads and scripted channel features to create a pure-Tcl framework for executing MapReduce-based algorithms.
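For readers new to the technique, here is a minimal sketch of the three MapReduce phases (map, shuffle, reduce) applied to a word count. It is written in Python purely for illustration and says nothing about how the Tcl framework would be built. The function names `map_words`, `shuffle`, `reduce_counts`, and `map_reduce` are hypothetical, and a standard thread pool stands in for the worker threads.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(chunk):
    """Map phase: emit a (word, 1) pair for every word in one chunk."""
    return [(word, 1) for word in chunk.lower().split()]


def shuffle(mapped):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: collapse the values for one key into a total."""
    return key, sum(values)


def map_reduce(chunks, mapper, reducer, n_workers=4):
    """Run mapper over chunks and reducer over key groups in a thread pool."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        mapped = list(pool.map(mapper, chunks))
        groups = shuffle(mapped)
        reduced = pool.map(lambda item: reducer(*item), groups.items())
        return dict(reduced)
```

Calling `map_reduce(["a b a", "b c"], map_words, reduce_counts)` returns a dictionary with the counts `a: 2`, `b: 2`, `c: 1`. The mapper and reducer are independent of each other and of the scheduling code, so a framework only needs to supply the distribution and grouping machinery.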

I would suggest that coroutines might play a role in executing this project, but I don't understand enough about them to know if that would be appropriate or practical.

Benefit for the student: learn about parallel computing, one of the most significant topics in computing today, as well as channel and threading concepts.

Benefit for the community: despite the acknowledged importance of parallel computing, it is widely agreed that the available tools are not adequate for writing software that takes full advantage of modern multi-core and cloud computing environments. A simple and stable parallel execution architecture would leverage some of Tcl's latest and best features and offer an opportunity for Tcl to take a leading role in the field.